Data Integration Musings, Circa 1991

I recently stumbled over this very old text. It is really just notes and musings, but I thought it was interesting to see some of my earliest thoughts on the data integration problem. Presented as is.

Mechanical Symbol Systems

To what extent can knowledge be thought of as sentences in an internal language of thought?
Should knowledge be seen as an essentially biological, or essentially social, phenomenon?
Can a machine be said to have intentional states, or are all meanings of internal machine representations essentially rooted in human interpretations of them?

Robot Communities

How can robots and humans share knowledge?
Can artificial reasoners act as vehicles for knowledge transfer between humans? (yes, they already are – see work on training systems)

Human Symbol Systems
Structures: Concepts, Facts and Process
Human Culture
Communication Among Individuals

Discourse

The level of discourse among humans is very complex. Researchers in the natural language processing field would tell you that human discourse is very hard to capture in computer systems. Humans of course have no problem following the subject changes and shifting contexts of discourse.
Language is the means through which humans pass information to one another. Historically, verbal communication has been the primary means of conveying information. Through verbal communication, parents teach their children, conveying not just facts, but also concepts and world view. Through socialization, children learn the locally acceptable way in which to exist in the world. Through continual human contact, all persons reinforce their understanding of the world. Culture is a locally defined set of concepts, facts and processes.

Myth

One of the most important transmission devices for human communication is myth. Myth is story-telling, and therefore is largely verbal in nature.

Ritual

Ritual also is used to communicate knowledge and reiterate beliefs among individuals. Ritual is performance, and can be used to teach process.

Information Systems Structures: Concepts, Facts and Process

The conceptual level of a standard information system may be stored in a database’s data dictionary. In some cases, the data dictionary is fairly simplistic, and may actually be hidden within the processes which maintain the database, inaccessible to outside review except by skilled programmers. More sophisticated data dictionaries, such as IBM’s Repository, and other CASE tools, make explicit the machine-level representation of the data contained in the system. The concepts stored in such devices are largely elementary and idiosyncratic.
They are elementary in that a single concept in a data dictionary will generally refer to a small item of data called variously a “column” or a “field”. What is expressed by a single entry in a data dictionary is a mapping from an application-specific concept, for instance “PART_NUMBER”, to a machine-dependent, computable format (numeric, 12 decimal digits).
A “fact” in a database sense is a single instance or example of a data dictionary concept coupled with a single value.
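To make the distinction concrete, here is a minimal Python sketch of a data dictionary entry and a “fact” built from it. The names `DictionaryEntry`, `Fact` and `make_fact`, and the reading of “12 decimal digits” as “at most 12 digits”, are illustrative assumptions, not the interface of IBM’s Repository or any CASE tool. Notice that the entry can only check the shape of a value; nothing in it captures what a part number is.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictionaryEntry:
    """One data dictionary concept: an application name mapped to a machine format."""
    name: str      # application-specific concept, e.g. "PART_NUMBER"
    datatype: str  # machine-level type; this sketch only handles "numeric"
    digits: int    # maximum number of decimal digits (assumed upper bound)

    def accepts(self, value: str) -> bool:
        # Only structure can be checked: ASCII digits and length.
        return (
            self.datatype == "numeric"
            and value.isascii()
            and value.isdigit()
            and len(value) <= self.digits
        )


@dataclass(frozen=True)
class Fact:
    """A single instance of a dictionary concept coupled with a single value."""
    concept: DictionaryEntry
    value: str


def make_fact(concept: DictionaryEntry, value: str) -> Fact:
    if not concept.accepts(value):
        raise ValueError(f"{value!r} does not fit {concept.name}")
    return Fact(concept, value)


PART_NUMBER = DictionaryEntry("PART_NUMBER", "numeric", 12)
fact = make_fact(PART_NUMBER, "000123456789")
```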

Communication Across Information Systems, Custom Approaches

Information systems typically have no provision either to generate or understand discursive communication. Typically, information shared between two information systems must be rigidly defined long before transmission begins. This takes human intervention to define transmission carriers, as well as format, and periodicity.
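As an illustration of how rigid such pre-arranged exchanges are, the Python sketch below encodes and decodes a fixed-width record using a hypothetical three-field layout. Every position and width has to be agreed by people before the first record is sent; neither program can discover or negotiate the format on its own.

```python
# Hypothetical layout agreed in advance by the people running both systems.
# Each field is (name, start offset, width); all fields are zero-padded digits.
LAYOUT = [
    ("part_number", 0, 12),
    ("quantity", 12, 5),
    ("ship_date", 17, 8),  # YYYYMMDD
]
RECORD_LENGTH = sum(width for _name, _start, width in LAYOUT)


def encode(record: dict) -> str:
    """Sender side: write each field at its agreed position and width."""
    parts = []
    for name, _start, width in LAYOUT:
        text = str(record[name])
        if not text.isdigit() or len(text) > width:
            raise ValueError(f"{name}={text!r} does not fit {width} digits")
        parts.append(text.zfill(width))
    return "".join(parts)


def decode(line: str) -> dict:
    """Receiver side: slice the record using the same agreed layout."""
    if len(line) != RECORD_LENGTH:
        raise ValueError("record does not match the agreed layout")
    return {name: line[start:start + width] for name, start, width in LAYOUT}


line = encode({"part_number": 123456789, "quantity": 40, "ship_date": 19910415})
```

If one side changes a width without telling the other, `decode` either rejects the record or slices out the wrong digits; the agreement lives with the humans, not in the data.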

Networks

The seven-layer ISO OSI model of communication was an initial attempt at defining the medium of computer communication. All computers that required communications services faced the same problems. Much of the work in networking today is geared toward building this ability to communicate. For humans, communication is through the various senses, taking advantage of the natural characteristics of the environment and the physical body. The majority of computers do not share the same senses.
Distributed systems are those in which all individual systems are connected via a network of transmission lines, and in which some level of pre-defined communication has been developed. The development of distributed database systems represents the first steps toward homogenization of mechanical symbol systems.

Electronic Data Interchange

EDI takes the communication process a step further by introducing a rudimentary level of discourse among individual enterprises. The discourse is typically restricted to payments and orders of material, and these interchanges are usually just as static as earlier developments. The difference here is that human intervention is slowly developing a cultural definition of the information format and content that may be transferred.
As standards are developed describing the exact nature and structure of the information that any company may submit or receive, more of a culture of discourse can be recognized in the process overall. The discourse is of course carried out by humans at this point, as they define a syntax and semantics for the proper transmission of information in the domain of supply, payment, and delivery (commerce).
Although it is ridiculous to talk of an “EDI culture” as a machine-based, self-defining, self-reinforcing collection of symbols in its own right, it is a step in that direction. What EDI, and especially the development of standards for EDI transmissions, represents is an initial attempt to define societal-like communication among computers. In effect, EDI is extending the means of human discourse into the realm of high-speed transaction processing. The standards being developed for the format and type of transactions allowed represent a formalization and agreement among the society of business enterprises on the future language of commerce.

Raising Consciousness in Mechanical Symbol Systems

In order to partake of the richness and flexibility of human symbol systems, machines must be given control of their own senses. They must become aware of their environment. They must become aware of their own “bodies”. This is the mind-body problem.
(Author note: the transcript cuts off here.)

Root Causes of the Data Integration Problem

The Fundamental Phenomenon – Human Behavior

4/24/2005

Writing over a century ago, Emile Durkheim and Marcel Mauss recognized and documented the true root cause of today’s data integration woes. (Primitive Classification, 1903, pages 5-6, as quoted by Mary Douglas in Natural Symbols, pages 61-62)

At the bottom of our conception of class there is the idea of circumscription with fixed and definite outlines. 

Given that this concept of classification is the basis of logic, social discourse, religion and ritual, it should not be a surprise that it also comes into play when software developers write software. They make assumptions and assertions in the design, data and code of their systems that rely on a fixed vision of the problem. Applications may be written for maximum flexibility in some ways, and still the developers intend to define the boundaries of the system, in other words, to bound and fix in place the concepts and relations the application can support.
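In code, this circumscription often takes the form of a closed enumeration. The minimal Python sketch below uses a hypothetical `CustomerType` with two categories chosen at design time; any label outside that outline is rejected rather than accommodated.

```python
from enum import Enum


class CustomerType(Enum):
    """A fixed, bounded classification decided when the system was designed."""
    RETAIL = "retail"
    WHOLESALE = "wholesale"


def classify(label: str) -> CustomerType:
    # Look the label up by value; anything outside the outline is refused.
    try:
        return CustomerType(label)
    except ValueError:
        raise ValueError(
            f"{label!r} is not a kind of customer this system can represent"
        ) from None
```

A business that starts selling through distributors must either change the code or change its practice to fit the enumeration.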

The highly successful ERP products like SAP, JD Edwards, and Oracle Financials allow tremendous flexibility to configure for different business practices. The breadth of businesses that can make these products work for them is very large. However, it is common knowledge in the ERP professional community (of installers) that there are some things in each product that just can’t be changed or accomplished. In these areas, the business is said to have to change to accommodate the tool. The whole industry of “change management” was born from the need to change the PRACTICE of business because of the ultimate limitations of these systems, limitations imposed by the conceptual boundaries their authors had to place upon them. (This is a separate subject that deserves further research.) No matter how flexible the business system is, it is ultimately, and fundamentally, a fixed and bounded symbolic system.

So how does this relate to my claim that Durkheim and Mauss unwittingly predicted the current crisis of data integration? Because they go on to point out that:

It would be impossible to exaggerate, in fact, that state of indistinction from which the human mind developed. Even today a considerable part of our popular literature, our myths, and our religions is based on a fundamental confusion of all images and ideas. They are not separated from each other, as it were, with any clarity. 

This “conceptual stew” is present in every aspect of life. The individual human mind is particularly adept at working within this broad confusion, picking and choosing what to believe is true based on internal processes. Groups of individuals, in order to communicate, will add structure and formality to certain portions through discussion and negotiation. But this “social” activity is not always accompanied by strong enforcement by the community.

As Mary Douglas (Natural Symbols, page 62) continues from Durkheim and Mauss, individuals in modern society (and increasingly this encompasses the global community) are presented with many different conceptual milieus during the course of a single day. Within each person, she indicates,

 A classification system can be coherently organized for a small part of experience, and for the rest it can leave the discrete items jangling in disorder. Or it can be highly coherent in the ordering it offers for the whole of experience, but the individuals for whom it is available may enjoy access to another competing and different system, equally coherent in itself, from which they feel free to select segments here and there eclectically, not worrying about the overall lack of coherence. Then there will be conflicts, contradictions and uncoordinated areas of classification for these people.

This not only describes a few individuals; it is my contention that it describes the whole of human experience. Nowhere in the modern world especially, except perhaps when alone with oneself, will the individual find a single, coherent, non-contradictory and comprehensive classification of the world. Instead, the individual is faced with dozens or hundreds of partial, conflicting conceptions of the world. Being the adaptable creature her ancestors evolved her to be, however, a healthy person rarely finds this utter muddle a problem. The brain is a reasoning engine built especially to handle this confusion; in fact, it thrives on it. The source of much that we call “creative” or “humorous” or “brilliant” is this ever-changing juxtaposition and jostling of different, partial conceptions. Human society expands from the breadth and complexity created by these different classification systems. Communication between strangers depends on the human capacity to process and understand commonalities and fill in the blanks in the signal.

The very thing which defines us as human, our ability to communicate across fuzzy boundaries, is also that thing that creates and exacerbates the Data Integration Problem in our software. Our software “circumscribes with fixed and definite outlines” some small aspect of our experience. In doing so, it denies the fuzziness of our larger reality, and imposes barriers between systems.
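The barrier can be shown with two hypothetical systems that each circumscribe “customer status” differently. In the Python sketch below, the status sets and the hand-written `B_TO_A` mapping are invented for illustration. The point is that any mapping between two fixed outlines has to force some distinctions together, and those distinctions cannot be recovered afterwards.

```python
# Two hypothetical systems, each with its own fixed outline of "customer status".
SYSTEM_A_STATUSES = {"ACTIVE", "INACTIVE"}
SYSTEM_B_STATUSES = {"PROSPECT", "CURRENT", "LAPSED"}

# A hand-written integration mapping from B's classification into A's.
B_TO_A = {
    "CURRENT": "ACTIVE",
    "LAPSED": "INACTIVE",
    "PROSPECT": "INACTIVE",  # a human judgment call: A has no place for prospects
}


def translate(status_b: str) -> str:
    return B_TO_A[status_b]


def unrecoverable_statuses() -> set:
    """B statuses that collapse together in A and so cannot be translated back."""
    sources = {}
    for status_b, status_a in B_TO_A.items():
        sources.setdefault(status_a, set()).add(status_b)
    return {b for group in sources.values() if len(group) > 1 for b in group}
```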

The Folk Model – What We Really Build Software From

The anthropological notion of a “folk model” can be a useful paradigm to consider when analyzing the implementation of software applications. Folk models are the proto-scientific conceptualizations of a group of people which they use to describe, understand and interact with some aspect of their collective experience.

When writing software, especially but not only within the Agile approach, it is through the elicitation and joint “discovery” of the user’s folk model that a common set of requirements for the software is defined. Ultimately, it is the closeness of fit between the folk model and the operation and symbology of the software that will determine its success or failure.

Different groups of people faced with the same or similar problems may develop largely similar folk models, and from these, different software development teams may create largely similar software applications. This is one reason why the software development process works best as a hand-crafted enterprise.

But what at first appear to be minor discrepancies between what the software model presents and what the folk model expects can grow so large that they cause the software to fail for those users, especially if the folk model was flawed or in a state of flux at the time the software tried to codify it (and really, when is a folk model not in flux?).

You Can’t Store Meaning In Software

I’ve had some recent conversations at work which made me realize I needed to make some of the implications of my other posts more obvious and explicit. In this case, while I posted a while ago about How Meaning Attaches to Data Structures, I never really carried the conversation forward.

Here is the basic, fundamental mistake that we software developers (and others) make in talking about our own software: we start thinking that the data structures and programs actually and directly hold the meaning we intend, and that if we do things right, our data structures, be they tables with rows and columns or POJOs (Plain Old Java Objects) in a Domain layer, just naturally and explicitly contain the meaning.

The problem is that whatever symbols we make in the computer, the computer can only hold structure. Our programs are only manipulating addresses in memory (or disk) and only comparing sequences of bits (themselves just voltages on wires). Through the programming process, we developers create extremely sophisticated manipulations of these bits, and we are constantly translating one sequence of bits into another in some regular, predictable way. This includes pushing our in-memory patterns onto storage media (typically constructing a different pattern of bits), and pushing our in-memory patterns onto video screens in forms directly interpretable by trained human users (such as displaying ASCII codes as characters in an alphabet forming words in a language which can be read).

This is all very powerful, and useful, but it works only because we humans have projected meaning onto the bit patterns and processes. We have written the code so that our bit symbol representing a “1” can be added to another bit symbol “1” and the program will produce a new bit symbol that we, by convention, will say represents a value of “2”.

The software doesn’t know what any of this means. We could have just as easily defined the meaning of the same signs and processing logic in some other way (perhaps, for instance, to indicate that we have received signals from two different origins, maybe to trigger other processing).
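The point can be made with a few lines of Python. The two bytes below hold the same bit pattern twice; which “meaning” they carry depends entirely on which convention the program’s author chose to apply. The third convention, reading each byte as a signal from a separate origin, is a hypothetical one echoing the example above.

```python
import struct

# Two bytes, each holding the bit pattern 00000001. The machine stores nothing more.
pattern = bytes([0b00000001, 0b00000001])

# Convention 1: each byte is a small number and the process "adds" them.
as_sum = pattern[0] + pattern[1]  # by convention we say this represents 2

# Convention 2: the pair is one 16-bit big-endian unsigned integer.
(as_uint16,) = struct.unpack(">H", pattern)  # 257 under this convention

# Convention 3 (hypothetical): byte i set to 1 means "a signal arrived from origin i".
origins = ["origin A", "origin B"]
signalled = [name for name, b in zip(origins, pattern) if b == 1]
```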

Why This Is Important

The comment was made to me that “if we can just get the conceptual model right, then the programming should be correct.” I won’t go into the conversation more deeply, but it led me to think about how to explain why that was not the best idea.

Here is my first attempt.

No matter how good a conceptual model you create, how complete, how general, how accurate to a domain, there is no way to put it into the computer. The only convention we have as programmers when we want to project meaning into software is that we define physical signs and processes which manipulate them in a way consistent with the meaning we intend.

This is true whether we manifest our conceptual model in a data model, or an object model, or a Semantic Web ontology, or a rules framework, or a set of tabs in an Excel file, or an XML schema, or … The point is that the computer can only store the sign portion of our symbols, never the concept. So if you intend to create a conceptual model of a domain and have it inform and/or direct the operation of your software, you are basically just writing more signs and processes.

Now if you want some flexibility, there are many frameworks you can use to create a symbolic “model” of a “conceptual model” and then you can tie your actual solution to this other layer of software. But in the most basic, reductionist sense, all you’ve done is write more software manipulating one set of signs in a manner that permits them to be interpreted as representing a second set of signs, which themselves only have meaning in the human interpretation.
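A minimal Python sketch of that reductionist point, assuming the “model” is stored as a plain dictionary (the names `CONCEPTUAL_MODEL`, `RENAMED_MODEL` and `conforms` are hypothetical). The validator compares strings and types with other strings and types; if every name in the model is replaced by a meaningless token, it behaves identically.

```python
# A hypothetical "conceptual model" stored as data. To the machine it is only
# nested strings; the words carry meaning only for the people who read them.
CONCEPTUAL_MODEL = {"Part": {"part_number": "int", "description": "str"}}

# The same structure with every name replaced by an arbitrary token.
RENAMED_MODEL = {"Xq": {"f1": "int", "f2": "str"}}

TYPE_SIGNS = {"int": int, "str": str}


def conforms(model: dict, entity: str, instance: dict) -> bool:
    """Check an instance against a model: signs compared with other signs."""
    spec = model[entity]
    return set(instance) == set(spec) and all(
        isinstance(instance[field], TYPE_SIGNS[type_sign])
        for field, type_sign in spec.items()
    )
```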

Q&A: Meaning Symbol Sign and Mind (Part 2)

On one of my recent posts, a commenter named “psycho” asked me some very good questions. I decided I needed to respond in more detail than just a single comment reply. I respond in pieces below, so just for context, here is psycho’s entire original comment.

But if you take more meanings, and put them together to get yet another meaning. Don’t you feel like those meanings were again like symbols creating a new meaning?

In my understanding, every bit of information is a symbol – what is represented by the invididual neurons in the brain. And if you take all related bits (that is neurons, symbols), and look at it as a whole, what you get is meaning.

The sentence is a symbol, and it is made of word-symbols. And the list of word-symbols makes a meaning. Which, when given a name (or feeling), becomes a symbol, that can be further involved in other meanings.

I’ll respond to each paragraph in separate posts, in order to get all of my thoughts down in a reasonably readable fashion. Part one covered the first paragraph. Here is part two where I cover the rest of my thoughts.

Symbols in the Mind

In my understanding, every bit of information is a symbol – what is represented by the invididual neurons in the brain. And if you take all related bits (that is neurons, symbols), and look at it as a whole, what you get is meaning.

I’m not a neurologist or any kind of brain scientist, so I could eventually be proven wrong on this, but to me, what a neuron represents is not a symbol, not a sign and not a specific meaning. I know I read somewhere of a brain experiment (using MRIs, I think) where an image of Jennifer Aniston presented visually during a brain scan caused only a single neuron to fire. I recall that the interpretation given was that the entire concept of “Jennifer Aniston” was stored in one single neuron.

I guess I just don’t buy it. What if the meaning of that neuron was more along the lines of “a famous person whose name I forget” or “I recognize a face I’ve seen on ‘Entertainment Tonight'”? The fact of it is, the experimenters drew a conclusion on a correlation that not even their subject would be able to explain or confirm.

Then there is the hypothesis that memories and meanings are distributed across the brain in a pattern suggesting a holographic storage mechanism (where damage in one area of the brain is overcome by stimulation and growth and retraining).

I think that memory and meaning are essentially EXPERIENCED things: the physical stimuli produce a complex of sensations, through re-activation of neurons, that causes the brain itself to “sense” the memory. I don’t think this qualifies as a symbolic sensation, being a much more holistic, “analog” experience not unlike the original. If every bit of information were a symbol, then I think we’d be just as hard-wired as computers to recognize only one set of sensations and meanings. Because our experience is more fluid, we can be much more creative in the aspects and portions of sensation that we recognize and name. As an individual I have full freedom to separate the signal from the noise, the foreground from the background, as I fancy. I can “slice and dice” my experience of sensation in any way that I find meaningful, and if I communicate it to you, then you can see what I see just like that. In other words, working with the “analog” of my sensations is a much more powerful, creative endeavor than merely encoding and decoding “digital” symbols.

That’s my two cents on that thought.

Q&A: Meaning Symbol Sign and Mind (Part 1)

On one of my recent posts, a commenter named “psycho” asked me some very good questions. I decided I needed to respond in more detail than just a single comment reply. I respond in pieces below, so just for context, here is psycho’s entire original comment.

But if you take more meanings, and put them together to get yet another meaning. Don’t you feel like those meanings were again like symbols creating a new meaning?

In my understanding, every bit of information is a symbol – what is represented by the invididual neurons in the brain. And if you take all related bits (that is neurons, symbols), and look at it as a whole, what you get is meaning.

The sentence is a symbol, and it is made of word-symbols. And the list of word-symbols makes a meaning. Which, when given a name (or feeling), becomes a symbol, that can be further involved in other meanings.

I’ll respond to each paragraph in a separate post, in order to get all of my thoughts down in a reasonably readable fashion. Here is part one.

Construction of Symbols

But if you take more meanings, and put them together to get yet another meaning. Don’t you feel like those meanings were again like symbols creating a new meaning?

I try to make a very strong statement of the difference between symbols, signs and their “meanings”. Perhaps I’m being too analytical, but it allows me to think about certain types of information events in a way I find useful in my profession as a data modeller. So let me try to summarize here the distinctions I make, then I’ll try to answer this question.

First, in my writings, I separate the thing represented by a symbol from the thing used as the representation. The thing represented I call the “concept” or “meaning”. The thing which is used to represent the concept I have termed “the sign”.  A symbol is the combination of the two. In fact, a specific symbol is a discrete object (or other physical manifestation) built for the express purpose of representing something else. That specific symbol has a specific meaning to someone who acts as the interpreter of that symbol.

As I have come to learn while continuing to read in this subject area, this is a somewhat idiosyncratic terminology compared to the formal terms that have grown out of semiology and linguistics. To that I say, “so be it!”, as I would have a lot of re-writing to do to make my notions conform. I think my notions are comparable, in any case, and don’t feel I need to be bogged down by the earlier vocabulary, if I can make myself clear. You can get a feel for some of my basic premises by poking around some of my permanent pages, such as the one on Syntactic Media and the Structure of Meaning.

There is obviously a lot of nuance to describing a specific symbol, and divining its specific meaning can be a difficult thing, as my recurring theme concerning “context” should indicate. However, within my descriptive scheme, whatever the meaning is, it is not a symbol. Can a symbol have several meanings? Certainly. But within a specific context at a specific time, a specific symbol will tend to have a single specific meaning, and the meaning is not so fluid.

How do you express a more complex or different idea, then? It is through the combination of SIGNS which each may represent individual POTENTIAL concepts that I am able to express my thoughts to you. By agreement (and education) we are both aware of the potential meanings that a specific word might carry. Take for example this word (sign):

blue

When I show you that word in this context, what I want you to recognize is that by itself, I am merely describing its “sign”-ness. Those four letters in that combination form a word. That word when placed into context with other words may represent several different and distinct ideas. But by itself, it is all just potential. When you read that word above, you cannot tell if I’m going to mean one of the colors we both might be able to see, or if I might be about to tell you about an emotional state, or if I might describe the nature of the content of a comedian’s act I just saw…

While I can use that sign when I describe to you any of those specific meanings, in and of itself, absent other symbols or context, it is just a sign with all of those ambiguous, potential meanings; in the context of our discussion, it has no specific meaning.

It has a form, obviously, and it has been constructed following rules which you and I now tacitly understand, just as a stop sign has been constructed following rules we have been trained to recognize.

[Photo of an actual stop sign in its normal context]

Imagine now a warehouse at the Department of Transportation where a pile of new stop signs has been delivered. Imagine they are laid flat and stacked on a pallet, just waiting to be installed on a corner near you.

While they lie in that stack, they certainly have substance, and they each have the potential to mean something, but until they are placed into a proper context (at a corner by a road) their meaning is just as ambiguous as the word sign above. If you were driving a forklift through the warehouse and came upon the pallet, would you interpret the sign right then as applying to you? Probably not! Could you say, just by looking at an individual instance of a sign, exactly which cars on which road it is intended to stop? No, of course not.

So this is the distinction between the sign and the meaning of a symbol. The sign is a physical construct. When placed into a recognized context, it represents a specific meaning. In that context, the sign will only carry that one specific meaning. If I make another instance of the sign and put it in a different context, while the signs may look the same, they will not mean the same, and hence I will have made two different symbols.

Just to be perfectly clear on the metaphor I’m presenting, here is a “pile” of signs (words) which I could use in a context to express meaning:

blue

blue blue

blue blue blue blue

Now let me use some of them, and you will see that given a context (which in this case consists of other word signs and some typical interpretations) I express different meanings (the thoughts in your head when you read them together):

once in a blue moon

blue mood

blue sky project

blue eyes crying in the rain

But make no mistake: while I have now expressed several different ideas to you using the same sign in different contexts, they are each, technically, NOT THE SAME SIGN AT ALL! Rather, they are four examples of a type of sign, just as each of the stop signs on that pallet at the DoT is an example of a type of sign, but each is uniquely, physically its own sign! This subtlety is, I think, where a lot of people’s thinking goes awry, leading to conflation and confusion of the set of all instances of a sign with all of the concepts which the SET of signs represents.
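The type/instance distinction maps naturally onto objects in a program. In the minimal Python sketch below, each `SignToken` (a hypothetical class) is one physical occurrence of the “blue” sign with its own identity and context. The `INTERPRETATIONS` table, also invented for illustration, stands in for a reader who assigns meaning only to a form in a context. A token with no context, like a stop sign on the pallet, carries only potential meaning.

```python
import itertools
from dataclasses import dataclass, field

_next_id = itertools.count(1)


@dataclass(eq=False)  # eq=False: tokens compare by identity, not by their form
class SignToken:
    """One physical occurrence of a sign: its form plus the context it sits in."""
    form: str
    context: str
    token_id: int = field(default_factory=lambda: next(_next_id))


# A hypothetical reader's interpretations: meaning attaches to form AND context.
INTERPRETATIONS = {
    ("blue", "once in a blue moon"): "rarely",
    ("blue", "blue mood"): "sad",
    ("blue", "blue sky project"): "speculative",
}


def interpret(token: SignToken) -> str:
    return INTERPRETATIONS.get((token.form, token.context), "<potential meaning only>")


contexts = ["once in a blue moon", "blue mood", "blue sky project"]
tokens = [SignToken("blue", ctx) for ctx in contexts]
on_the_pallet = SignToken("blue", "")  # a sign instance placed in no context
```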

To make this easier to see, consider the instance of the word (sign) “blue” above which I have colored red. That is a specific example of the “blue” sign, and it has a specific, concrete meaning which is entirely different from the word (sign) “blue” above which I have colored green.  The fact that both phrases have included a word (sign) of “blue” is almost coincidental, and does not actually change or alter the individual meanings of the two phrases on their own.

Finally, since I have belabored my nit-picking a bit, if I were to re-word your initial statement slightly to use the terminology I prefer on this site, it would change to:

But if you take more [signs], and put them together to get yet another meaning. Don’t you feel like those [signs] were again like symbols creating a new meaning?

And to this question, it should be clear, that my answer is “Yes, precisely: when you put other signs together, you create new meaning”.

Just What Is Meaning? A Lay Perspective

7/20/2005

The Origin of Symbols, Code and Meaning

Memories are NOT CODED. They are ANALOG recordings, not unlike phonograph records and the old photographs before the invention of digital cameras. There is some evidence that memories are stored in a manner similar to holograms within the medium of the brain. Memories may include recordings of coded information; this would be how symbols are recognized.

Only when communication between brains is needed does CODE come into play. One brain must create appropriate SYMBOLS which represent the information. These symbols must be physicalized in some manner because the only input mechanisms available to the other brain are the five senses of the body. Information is packaged and lumped, nuances and unimportant details are necessarily removed, and symbols are selected and generated. If the other brain is receptive, then the symbols are sensed by the body, evoking the memory centers of the second brain. Communication is completed if the second brain understands the code and “remembers” the meaning in its own analog memory.

The Origin of Language

The brain records sensory inputs as memory. The mind constructs an internal symbol system describing the sensory information in ways the human body can communicate or relate the information. Details of the input which the mind cannot put a name to may be remembered or memorable, but cannot be communicated. Have you ever experienced something that you were unable to describe to someone else who had not experienced it?

Two people who have experienced the same or similar types of events can have a conversation about it and begin to form a language. Language is a shortcut to memory. It is the human capacity for the invention of vocabulary that sets us apart from other creatures (and from computers). If two people share a new experience, they’ll be able to talk about it by recognizing the same features in the sensory record and describing it in terms that evoke the same memory in the other person. Eventually, they’ll form a unique vocabulary of shorthand symbolic terms and phrases to permit efficient communication. This is how strangers who meet at 12-Step Meetings are able to express and understand each other.

 But if only one of the two persons has experienced the events, there is no referent memory in one of the two. Think of the old saw “a picture is worth a thousand words”. Have you ever heard a new musical piece and tried to explain it to someone who hasn’t heard it? It takes a lot of explanation and yet is ultimately a failure.

Consider another example: wine-tasting connoisseurs.

 These people have an intense sensitivity to subtle features of taste and smell making their experience of wine very rich with information. More importantly, they have been able to attach vocabulary to these differences in unique ways that allows them to communicate with other wine experts. Of course, their success at communicating is predicated on the existence of other individuals with similar talents and experiences. When they try to explain to someone without the sensitivity of taste, their words merely confuse or sound hilariously out of place.

 This is one example of how “context” arises in human communication.

 What does this suggest for our major theme? 

  1. The features that are recognized in the sensory record depend first on the individuals whose senses recorded them.
  2. The features that are chosen for communication depend on the interests and needs of the individuals doing the communicating. Other features that at first do not seem to contribute to the remembrance of the experience are often ignored or discounted.
  3. The vocabulary describing and naming these features depends both on the individual who sensed them and on the people to whom they try to explain the sensation. Through trial and error, the person who is trying to communicate will hit upon terms that find resonance in their audience.
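The three points can be sketched as set operations. In the Python below, the feature names, the `communicated` helper and both vocabularies are hypothetical, loosely following the wine-tasting example: a speaker can pass on only features they perceived, chose to mention, and can name with a word the listener ties to the same feature.

```python
# Hypothetical features a connoisseur's senses picked out of one glass of wine (point 1).
CONNOISSEUR_PERCEIVED = {"red", "cherry", "oak", "tannin", "warm room"}
# What the connoisseur cares to talk about (point 2): the room is ignored.
CONNOISSEUR_INTERESTS = {"red", "cherry", "oak", "tannin"}

# Vocabularies map a remembered feature to the word a person uses for it.
CONNOISSEUR_VOCAB = {"red": "red", "cherry": "cherry", "oak": "oaky", "tannin": "grippy"}
NOVICE_VOCAB = {"red": "red"}  # no referent memory for the subtler features


def communicated(perceived: set, interests: set, speaker_vocab: dict, listener_vocab: dict) -> set:
    """Features that actually evoke the same memory in the listener."""
    selected = perceived & interests  # points 1 and 2: sensed, then chosen
    return {
        feature
        for feature in selected
        # point 3: both sides must tie the same word to the same feature
        if feature in speaker_vocab and listener_vocab.get(feature) == speaker_vocab[feature]
    }
```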