Context and Chomsky’s Colorless Green Ideas

Language is code. The speaker chooses the terms, sequence and intonations of their speech with the hope that the listener shares enough of the same human experience to recognize the intended meaning. Conversation is a negotiation as much as anything else. In conversation, the participants can adjust the selection of terms and details until they all reach an understanding of what is being said. This is the practical meaning of “context”, then.

Many years ago, in an effort to make a point about how syntax is different from semantics, Noam Chomsky once proposed the following sentence as an example of a grammatically correct sentence that had no discernible meaning:

Colorless green ideas sleep furiously.

In the context within which Chomsky was writing this sentence, reflective of common cultural experience of these terms among a broad community of American society, he made the claim that the sentence had no meaning. Since that time, other scholars have suggested that there may be contexts in which this construction of terms may actually be meaningful.

Here’s a quote from the english language version of Wikipedia from August 1, 2005:

This phrase can have legitimate meaning to English-Spanish bilinguals, for whom there are double-entendres about the word “green” (meaning “newly-formed”) and “sleep” (used as a verb of non-experience). An equivalent sentence [in the context understood by these English-Spanish bilinguals] would be “Newly formed, bland ideas are unexpressible in an infuriating way.”

This little example provides an excellent case study of the role context plays in communication. Never mind the fact that the sentence was first defined in a context for which it held no meaning. Since the moment of its invention, other contexts have either been recognized or constructed around the sentence in which it holds meaning.

The notion of “context” as that mileiu which drives the interpretation of a sentence such as this is the same notion that explains how the meaning of any coded message must be interpretted. This would include messages encoded in the data structures of computer systems. Data within a omputer system is constructed within and in order to support specific information recordation and transmittal of things important to a specific context. This context is the tacit agreement between the software developers and the business community on what the “typical interpretation” of those computer symbols should be.

The importance of context to the understanding of the data integration problem cannot be understated (which is why I keep coming back to it on this blog). While many theorists recognize the role context plays, and many pundits have written about the failures of computer systems when context has been ignored or mishandled, practitioners continue to develop and deploy applications with little explicit attention to context.

All computer applications written in business today are written from some point of view. This point of view establishes the context of the system. Most developers would agree with these statements. The trick is to define a system which allows the context of the system to change and evolve over time, as the business community learns and invents it. It must be a balancing act between excluding the software equivalent of Chomsky’s meaningless statement, and allowing the software to adapt as the context shifts to allow real meaning to be applied to those structures.

Example of How Meaning Is Attached to Structure

What follows is a detailed example of the thought process followed by a software developer to create a class of data structures and how meaning is attached to those structures.

Consider that the meaning of one data structure may be composed of the collection of meanings of a set of smaller structures which themselves have meaning. Take the following description as the meaning to be represented by a structure:

An employee is a human being or person. Each employee has a unique identity of their own. Each employee has a name, which may be the same as the name of a different person or employee. Being human, each employee has an age, calculated by counting the number of years since they were born up to some other point in time (such as present day).Each person of a certain age may enter into a marriage with another human being, who in turn also has their own identity and other attributes of a person.

To represent this information using data structures (i.e., to project the meaning of this information onto a data structure), we might tie the various concepts about a human being/employee to a computer-based data structure. Recognizing that a human being is an object with many additional characteristics of which we might want to know about, we might choose to project the concept of “human beings” or “people” onto a relational table and the concept of a particular individual onto one of that table’s rows (or a similar record structure).

This table would represent a set of individual human beings, and onto each row of the table would be projected the meaning of a particular human being. Saying this again in a more conventional manner, we would say that each row of the table will reference a singular and particular human being, the all of the rows will represent the set of all human beings we’ve observed in the context of our usage of the computer system.

In a more mathematical vein, we would define a projection Þ from the set of actual human beings Α onto Š, (Þ(Α) |–> Š), the set of data structures such that for any α in Α where α is a human being, there is a record or row σ in Š that represents that human being.

A record data structure being a conglomeration of fields, each of which can symbolically represent some attribute of a larger whole, then we might project additional attributes of the human being, such as their name and identifier, to particular fields within the record. If σ is the particular record structure representing a particular human being, α, then the meaning (values) of the attributes of that person could be associated with the fields, f1..fn, of that record through attribute-level projections, ψ1..ψn for attributes 1 .. n.

To represent a particular person, first we would project the reference to the person to a particular row, Þ(α) |–> σ, then we would also project the attribute facts about that person onto the individual fields of that row:

ψ1(α.1) |–> σ.f1

ψn(α.n) |–> σ.fn

Projection onto Relational Structure

When modeling a domain for incorporation into computer software, the modeler’s task is to define a set of structures which software can be written to manipulate. When that software is to use relational database management systems, then the modeler will first project the domain concepts onto abstract relational structures defined over “tuples”. These abstract structures have a well-defined mathematical nature which if followed provides very powerful manipulations. The developer projects meaning onto relations in a conventional way, such as by defining a relation of attributes to represent “PERSON” – or the set of persons, and another relation of attributes to represent “EMPLOYEE” – or the set of persons who are also employees. Having defined these relational sets, the relational algebra permits various mathematical operations/functions to be applied, such as “JOIN” and “INTERSECTION”. These functions have strictly defined properties and well-defined results over arbitrary tuples. The software developer having projected meaning onto the individual relations, he is also therefore able to project meaning on the outcomes of these operations which can then be used to manipulate large sets of data in an efficient, and semantically correct way.

As the developer creates the software however, they must keep in mind what these functions are doing on two levels, at the level of the set content and at the level of the represented domain (the referent of the sets and manipulations). Thus the intersection of the PERSON and EMPLOYEE relations should produce the subset of tuples (records, etc.) which has its own meaning derived from the initial projected meaning of the original sets. Namely, this intersection represents the set of PERSONS who are also EMPLOYEES, (which is the same, alternatively, as the set of EMPLOYEES who are also PERSONS). This is an important point about software: the meaning is not simply recorded in the data structure but the manipulations of the data by the computer themselves have specific connotations and implications on the meaning of data as it is processed.

Representational Redundancy

As a typical practice in the projection of information onto data structures within the relational model, there will usually be a repetition of the information projected onto more than one symbol. In particular, the reference to the identity of a single person will be represented both by the mere existence of a single row in the table, and also by a subset of fields on the row which the software developers have chosen (and which the software enforces) for this purpose. In other words, under common software development practices, each record/row as a conglomerate entity will represent a single person. In addition, there will be k attributes (1 <= k <= n) on that record structure whose values in combination also represent that same individual. These k attributes make up the “primary key” of the data structure. The software developer will use and repeat these columns on multiple data structures to permit additional concepts regarding the relationship between that person and other ideas also being recorded. For example, a copy of one person record’s primary key could be placed on another person record and be labelled “spouse”. The attributes which make up the primary key often have less mechanical meanings as well (for example, perhaps the primary key for our person includes the name attribute. As part of the primary key, the name value of the person merely helps to reference that person. It also in its own right represents the name of the person.

How Meaning Attaches to Data Structures: A Summary

What follows is a high level summary of how humans attach meaning to various kinds of data structures within a computer. It will serve as a good baseline account, though certainly not an exhaustive one, providing a model upon which more detailed dicussion can begin. 

 Background Terminology

Computer systems provide functionality to support the performance and record of business processes. They do that through three inter-related features: DATA, LOGIC, and PRESENTATION. The presentation consists of information displays permitting both an information visualization aspect and an information capture aspect. The logic consists of several aspects, much of it having to do with support of the presentation and manipulation of displays, but also a lot of it having to do with creation, transformation and storage of data. Data consists of sets of symbols constructed in a systematic, regular fashion using a set of data structures. Different data structures are constructed to represent different aspects of the recorded activity. It is in the relationships between the macro and micro structures where the specific detailed information captured.generated by the business process resides. By following a codified, rigid construction of its data structures, the computer system is able to record multiple recurring instances of similar events. Through the development of fixed transformations using program logic, the computer system is able to make routine, conventional conclusions about those events or observations, and it is able to maintain and retain those observations virtually indefinitely.
Data is maintained and stored in DATA STRUCTURES. The more regular these data structures are, the more easily they are interpreted by a broad audience of software developers. In most situations, the PRESENTATION of the data captured by a system to the end user of that system is in a more directly understandable form than the way that information is stored in the computer.  (This statement is not only trivially true, but in a very deep sense too, since the computer actually stores everything using more and more complex sequences of binary digits. That’s a different subject than our current presentation.)  The data structures within the computer system typically exist in two, simultaneous forms, one intended to support human reasoning (through what is often called a “logical”, “abstract” or “conceptual” model) and one supporting manipulations by the computer. Most software developers today strictly deal with the abstract model of the data for design, coding, and discussion. (There are still some developers working in assembly level code, but even that is at a more abstract level than the actual electro-mechanical machinations of the actual hardware!)
An obvious observation, at least on its face, is that different computer systems will store data representing similar ideas using different structures. We need to keep this in the back of our minds as we progress through the rest of this discussion, but it will be more directly adressed in other entries.
 A final thought concerns sets of data of similar structure, called a POPULATION. A population of data consists of some set of data symbols, all constructed using the same data structure pattern which represents a set of similar ideas. The classification of populations of data structures applies to the DATA portion of systems, represents an analogous classification of sets of observed events external to the computer system, and is affected by and affecting the LOGIC and PRESENTATION portions of the computer system. A more detailed definition of the notion of a “population” will also be treated in separate sections.

Commonalities of Structure

Many computer systems, especially those built in support of business (or other human activity) processes, are constructed using a conventional system of abstract data structures. (When I say they are “conventional” what I mean is that the majority of software developers follow conventional patterns for the construction of data structures to represent their idiosynchratic subject areas.) Whether these structures are called “objects”, “tables”, “records”, or something else, they typically take the form of a heterogenous collection of smaller structures grouped together into regular conglomerations. Instances or examples of the larger collections of data structures will each be said to “represent” individual intances of some real-world conglomerate. Each of the individual component element structures of these conglomerations will each be said to represent the individual attributes or characteristics of the real-world conglomerate object. In order to permit efficient processing by the computer,   instances of similar phenomenon will be represented by the same kind of conglomeration.
Typically, business systems will be based on a data structure called a RECORD.  Records consist of a series of “attribute data structures” all related in some fashion to each other. (A more complex structure called an “object” still has record-like attributes combined together to represent a larger whole, the nuances and variation of object-based representation is a subject for later.)  Each RECORD will stereotypically symbolize one instance of a particular concept. This could be a reference to and certain observed details of a real-world object, or it could be something more ephemereal like observations of an event. For example, one “PERSON” record would represent a single individual person.
RECORDS themselves consist of individually defined data elements or FIELDS. Each RECORD of a particular type will share the same set of FIELDS. Each FIELD will symbolize one kind of fact about the thing symbolized by the RECORD. For example, a NAME field on a PERSON record will record what the represented individual’s name is, at least as it was at the time the record was created. 
The set of all records within a system having the same structure will typically be collected and stored together, often in a data structure called a TABLE. Each TABLE will symbolize the set of KNOWN INSTANCES of whatever type of thing each record represents. TABLES are also described as having ROWS and COLUMNS. Each row of a table is one RECORD. The set of shared element-attribute structures across the set of  rows can be described as the “columns” of the table. Each column represents the set of all instances of a FIELD in the table, in other words, the same field across all records. Tables are a commonly used data structure because they readily support interpretation using relational algebra and set theoretic operations, as well as being easily presented and understood both by human and computer.  

Basic Data Structures and Their Relationships

The nomenclature of “record”, ” table”, “row”, “column” and “fields” describes the construction building blocks of an abstract syntactic medium whose usage permits humans to represent complex concepts within the computer system. By assigning names to various collections and combinations of these generic structures, humans project meaning onto them. Using diagrams called “data models”, a short hand of sorts allows the modeler to describe how the generic tables and fields relate to each other and what these relationships signify in the external world. These models also, by virtue of the typified short hand they use, allows for the generation of computer logic that can be applied to a database to support certain standard operations and manipulations of the data generated by a computer system.

Traditional data modeling results in the creation of a data dictionary which relates each structural element to a particular kind of concept. Every structure will be given a name, and if the developers are diligent, these can be associated with more fully realized text descriptions as well. Some aspects of the data structures are not described, at least typically, within a data model, such as populations or subsets of records with similar structures.

Traditional data dictionary entries record name and description of the set of all structures contained in a table. Using a set of structures to represent a set or collection of similar objects is itself a symbolic action. So not only does each row in a table represent one instance of some type of thing, and each column represents one observed (or derived) fact or attribute of that instance, but the collection of all instances of these row data structures also represents the logical set or population of these things.

The strategy for applying meaning to these data structures begins when the decision is made to treat the entirety of each record as the representation of a member of a population of like things. Being similar, then, a set of fields is conceived to capture various detailed observations regarding the things. These fields are intended to capture details about both how each thing is different from the other things in the collection, but also how different things may share similarities. Much of the business logic of the application system will be consumed by the comparisons between individual things, and the mathematical derived counts (and other metrics) of those sets of things (and of subsets within). Using the computer to compare the bit sequences contained in each field, the computer will indicate whether these contents are the same or different between different instances. Humans will then interpret the results of these comparisons by projecting the conclusion out of the computer and into the conceptual world.

For example, let’s say that we have defined the computer sequence “10101010” to represent a reference to a specific person, “Julie Smith”. If we take two different instances of bit sequences and compare them in the computer, the computer will tell us if they are the same or not. As humans, we would then interpret the purely electro-mechanical result which the computer calculated that “10101010” and “10101010” are the same as an indication that the two instances of these sequences represent the same specific person. Likewise, we would interpret a computer result indicating that two bit sequences were not the same as an indication that different people were being referred to.  This type of projection of meaning from mechanical result to logical inference is fundamental to the way humans use computers.

The specific number of fields and their bit sequence representations (data types)  that are developed within a computer application is entirely dependent on the complexity of the problem domain and the attributes of the objects required to reason over that domain. However, no matter how simple or complex, it is the projection of meaning onto the representation of these attributes in the computer and the projection of an interpretation onto the results of the computer comparisons of the physical representations which makes the computer the powerful engine that it is in our society.

How Row Subsets Represent Subpopulations
How Row Subsets Represent Subpopulations

 

The Syntactics of Speech: What a Language Permits You to Say Is Less Than What You Know

I found this article intensely interesting. It corroborates and validates some of my own ideas about how language and symbols are used in communication. Namely, it suggests that even though a language does not contain structures and syntactic rules allowing for precise designation of a concept, that does not mean that such a concept cannot be communicated and understood by someone who uses that language. It just may take a lot more time to convey the thought. It may also be difficult to confirm the listener’s understanding because the language they have available to respond is the same one as the original message (which we said could not directly convey the meaning).

NY Times article

Example Interaction Between Parent and Child Context

In a previous post, I described in general some of the relationships that could exist between and across a large organization’s sub-contexts. What follows is a short description of some actual observations of how the need for regional autonomy in the examination and collection of taxes affected the use of software data structures at the IRS.

Effect of Context on Systems and Integration Projects

July 15, 2005

Contexts lay claim to individual elements of a syntactic medium. A data structure (syntactic medium) used in more than one context by definition must contain meaningful symbols for each context. Some substructures of the data structure may be purposefully “reserved” for local definition by child contexts. In the larger, shared context, these data structures may have no meaning (see the idea of “traveller” symbols). When used by a child context, the meaning may be idiosyncratic and opaque to the broader context.

One way this might occur is through the agreement across different organizational groups that a certain structure be set aside for such uses. Two examples would include the automated systems at the IRS used respectively for tax examinations and tax collections.

Within the broad context defined by the practitioners of “Tax Examination” which the examination application supports, several child contexts have been purposefully developed corresponding to “regions” of the country. Similar organizational structure have also been defined for “Tax Collection” which the collection application supports. In both systems, portions of the syntactic media have been set aside with the express purpose of allowing the regional contexts to project additional, local meaning into the systems.

While all regions are contained in the larger “Examination” or “Collection” contexts, it was recognized that the sheer size of the respective activities was too great for the IRS central offices to be able to control and react to events on the ground in sufficient time. Hence, recognizing that the smaller regional authorities were in better position to diagnose and adjust their practices, the central authorities each ceded some control. What this allowed was that the regional centers could define customized codes to help them track these local issues, and that each application system would capture and store these local codes without disrupting the overall corporate effort.

Relying on the context defined and controlled by the central authorities would not be practical, and could even stifle innovation in the field. This led directly to the evolution of regional contexts. 

Even though each region shares the same application, and that 80 to 90% – even 95% – of the time, uses it in the same way, each region was permitted to set some of its own business rules. In support of these regional differences in practice, portions of the syntactic medium presented by each of the applications were defined as reserved for use by each region. Often this type of approach would be limited to classification elements or other informational symbols, as opposed to functional markers that would effect the operation of the application.

This strategy permits the activities across the regions to be rolled up into the larger context nearly seamlessly. If each region had been permitted to modify the functionality of the system, the ability to integrate would be quickly eroded, causing the regions to diverge and the regional contexts to share less and less with time. Eventually, such divergence could lead to the need for new bridging contexts, or in the worst case into the collapse of the unified activity of the broader context.

By permitting some regional variation in the meaning and usage of portions of the application systems, the IRS actually strengthened the overall viability of these applications, and mitigated the risk of cultural (and application system) divergence.

What a Context Is: Information Flow Theory

I’ve been busy lately, and let this discussion lapse for a bit. Let’s see if I can kickstart it again.

Sometimes, information flows from the physical world to a person who observes and interprets his perceptions to create and recognize new knowledge. Sometimes, someone creates a message in a physical medium and “sends” it into the world hoping that there will be another person who not only can sense it, but can recognize it as a message and can receive the information layered on top of the physical perception. The first is an example of simple observation and perception, the second is an example of communication within a context. While both rely on the perception of physical reality by the observer, they are fundamentally, though subtlety, different.

I have been reading a book on the mathematical theory underlying the “flow” of “information” in the real world, and while I don’t yet really understand the theory, there are some points the book is failing to make about context which I think are important.

Barwise, Jon and Seligman, Jerry, Information Flow: The Logic of Distributed Systems, Cambridge University Press, 1997.

Not that the book was intended to cover the concept of context, per se, being an attempt to lay out a mathematical/logical framework for describing the flow of information across, through and between physical systems. It does have an extensive section on “local logics” which when I understand it better may help me describe my own ideas about context in a formal manner.

What struck me, and forms the origin point of today’s discussion on contexts, is the example the authors use to introduce and illustrate the technical discussion. And as you will see, it is not just the example given, but variations of it that I will use to elucidate better what a context is and is not. To get to the meat of my thinking, here is their example, as written on pages 4-5.

Judith, a keen but inexperienced mountaineer, embarked on an ascent of Mt. Ateb. She took with her a compass, a flashlight, a topographic map, and a bar of Lindt bittersweet chocolate. The map was made ten years previously, but she judged that the mountain would not have changed too much. Reaching the peak shortly after 2 P. M. she paused to eat two squares of chocolate and reflect on the majesty of her surroundings.

At 2:10 P. M. she set about the descent. Encouraged by the ease of the day’s climb, she decided to take a different route down. It was clearly indicated on the map and clearly marked on the upper slopes, but as she descended the helpful little piles of stones left by previous hikers petered out. Before long she found herself struggling to make sense of compass bearings taken from ambiguously positioned rocky outcrops and the haphazard tree line below. By 4 P. M. Judith was hopelessly lost.

Scrambling down a scree slope, motivated only by the thought that down was a better bet than up, the loose stones betrayed her, and she tumbled a hundred feet before breaking her fall against a hardy uplands thorn. Clinging to the bush and wincing atthe pain in her left leg, she took stock. It would soon be dark. Above her lay the treacherous scree, below her were perils as yet unknown. She ate the rest of the chocolate.

Suddenly, she remembered the flashlight. It was still working. She began to flash out into the twilight. By a miracle, her signal was seen by another day hiker, who was already near the foot of the mountain. Miranda quickly recognized the dots and dashes of the SOS and hurried on to her car where she phoned Mountain Rescue. Only twenty minutes later the searchlight from a helicopter scanned the precipitous east face of Mt. Ateb, illuminating the frightened Judith, still clinging to the thorn bush but now waving joyously at the aircraft.

Was Judith Lucky That Miranda Knew Morse Code?

In a word, yes. In so many ways:

  • like the fact that Miranda was near the bottom of the mountain,
  • that she had a clear view of the side of the mountain where Judith lay,
  • that she had a cell phone at the car,
  • and most importantly, Judith was indeed lucky that Miranda knew the Morse code for “SOS”.

In fact, it was also lucky that Judith herself knew the code. So in the story as given, since Judith knew the code to use when she found herself in trouble, she used her light as the syntactic medium in which she encoded a message of her need for help. The fact that Morse code is a globally standard coding scheme simply meant that both Judith and Miranda both shared a common context without ever having met. The fact of their shared knowledge of the code provided the context by which Judith was able to get her message to Miranda.

What if Miranda Didn’t Know Morse Code

Things could have been much worse for Judith if either of them had no knowledge of the code, or if neither of them did. In addition, it was lucky that Miranda was somehow aware (or realized as she was watching) that a flashing light could be used to signify such a code, and that she then obviously deduced from the repeated pattern that someone was sending a message.

Imagine what might have happened if Miranda had seen the flashing light, but didn’t recognize it as a code, and therefore didn’t try to translate what she was seeing. Instead of reacting by calling for help, she might have thought to herself “Oh look, there’s a light up on the mountain. I wonder what that is?” but then gone on about her business.

In other words, Miranda could have observed her environment and perceived the flashing light but concluded that it was simply a physical phenomenon of no particular import. She may have perceived the signal but failed to recognize it as a message. In which case, this would show that Judith and Miranda had failed to establish a context for the communication.

What if Judith didn’t know Morse Code?

If Judith didn’t know Morse Code, perhaps she would still have started waving her flashlight around. Miranda having seen the light would have no reason to recognize a code.

Would this mean Judith would be out of luck? Not necessarily, if Miranda was also an experienced hiker. Miranda being in the context of hiking, it might occur to her that there shouldn’t be a light on that part of the mountain at that moment in time. She might think to herself that the random way the light was moving, plus its position on the mountain compared with where the safe trails were, added up to someone in distress.

In this case, a message has still been sent from Judith to Miranda, with the same result. The context that Miranda was thinking in plus her perception and prior knowledge of the mountain trails, allowed her to reach a conclusion that there was someone on the mountain in trouble. But it is important to note that Judith was not in the same context as Miranda.

In fact, if Miranda was a ranger, she may have been trained to look for and recognize the behavior of people in distress. In this example, we must conclude that Judith was not actually participating in the context with Miranda. It was Miranda’s knowledge of and mindset regarding her perceptions of the dangerous mountain environment which led her to deduce the existence of a person in trouble, not the fact of Judith’s trying to send a message.

Yes, this Judith tried to send a message, but she couldn’t have known that her random wavings would be recognized in anyway. Whereas the Judith who used Morse code actually knew of a context and encoded a very specific message using that context, with the expectation and hope that someone else might also understand it.

The difference in the two versions of the story is subtle. In both cases a message was sent, and in both a message was received and an action was taken. But in the first story, a bridging context in the form of Morse Code was called upon to carry a very specific message, while in the second story there was no bridging context. In the second story, it was entirely the perceptiveness and deductive power of Miranda’s “hiking Mt. Ateb context” which allowed her to create for herself new information: namely that “someone out there is in trouble”.

Once More, What If Judith Wasn’t In Trouble?

Let’s take one more variation of the story to enforce this last point. Let’s say that everything happened as described, except that instead of falling down the scree, Judith purposefully rappelled down the side of the mountain. And furthermore, that instead of clinging desperately to a thorn bush, that she had actually managed to establish a bivuoac in that peculiar outpost. In this version of the story, perhaps Judith is waving her flashlight around as in our second story, only this time merely to light her little campground while fixing herself dinner.

Now imagine ranger Miranda, trained as before and with a knowledge of the trails, but without prior knowledge of anyone camping where Judith found her perch. Using her same described skills of perception and deduction, Miranda may still come to the conclusion that there was someone on the side of the mountain in trouble, and would take the aforementioned steps to effect a rescue. Only she would find that Judith was not in need of help, and is now put out by the disturbance of her relaxation by the whirring chopper blades.

In this version of the story, Miranda is still in the same context as before, and uses her perceptions and the rules of that context to reach her conclusion. The fact is, and this second version of the story should make it clear, that while information did flow from Judith to Miranda just as before, we cannot call this information a “message” carried on a medium and in a context shared by Judith and Miranda. In other words, it was not a purposeful communication across a bridging context.

No, quite simply, in both of these latter examples, Miranda’s context guided her to her perception and the creation of the knowledge that Judith was on the mountain and in trouble (even if she was mistaken on this last point in the final story).

Summary

The fact that a person who presses a flashlight button does or does not intend to send a message – to communicate through that act -defines whether we classify the information flow as being a symbollic act or not. Perhaps the person does not realize or care whether there is another person watching for a flashlight in the dark. The factthat someone sees the light and acts in response does not mean that communication has occurred. Just because information has flowed does not mean that symbols have flowed.
This is a subtle distinction but an important one.

Why Comparability Is Critical To Solving The Data Integration Problem

At its most basic, the task of data integration from multiple source systems is one of recognizing the EQUIVALENCY and diagnosing the CONFLICTS among sets of symbols (the data) stored in each system’s data structures (syntactic media). Data integration is accomplished when the conflicts have been eliminated through TRANSFORMATION into new COMMON SYMBOLS which are COMPARABLE at both the syntactic and semantic levels.

The end result of data integration should be that SEMANTICALLY EQUIVALENT (or at least COMPARABLE) data structures become SYNTACTICALLY EQUIVALENT (COMPARABLE) as well. When this result is achieved, the data structures are considered COMPARABLY EQUIVALENT, and the data from the different source systems can be collapsed, combined or integrated correctly.

Structural Comparability

The issue can be characterized as one of the COMPARABILITY of data between systems.

  • Syntactic Comparability is defined by the DATA TYPE and internal DATA STRUCTURE
  • Semantic Comparability is defined by the CONCEPT or MEANING projected onto the data structure by the users of the source system
  • Two data items are COMPARABLE if they share both SYNTACTIC and SEMANTIC COMPARABILITY

Typical Conflicts

Typical conflicts occur between and among the data structures originating from different sources.

  • Syntactic Conflicts:
    • Data Type Conflicts
    • Structural Conflicts
    • Key Conflicts
  • Semantic Conflicts:
    • Scale Conflicts
    • Abstraction/Formula Conflicts
    • Domain Conflicts
  • Symbol Conflicts:
    • Naming Conflicts (Synonyms, Homonyms, Antonyms)

Syntactic Conflicts

  • Data Type Conflicts – The same concept projected onto different physical representations. Example: different codes for the same set of options
  • Structural Conflicts – For example, the same concept (referent) represented in one database by only a single attribute in one data source, but as a complete record of attributes in another source.
  • Key Conflicts – Two systems using different unique keys for the same concept.
    • As an example, from a freight rail project I once worked, one set of systems represented a “station” by using the nearest Mileboard number to the station, while another set used an industry standard designator called a “SPLC” which was a code assigned to every reported station on all rail lines in North America.
    • In this example, the two different keys conflicted syntactically (e.g., Mileboard was an integer, SPLC was a string), and semantically (e.g., Mileboards are only meaningful within the context of a single railroad, being the distance from the origin of the line, while SPLCs are universal designators within the context of North America railroads).

Semantic Conflicts

  • Scale Conflicts
    • Same data structure but representing different units. For example, corporate revenue represented as currency, but one using US Dollars and the other using CANADIAN Dollars.
  • Abstraction/Formula Conflicts
    • Same data structure and “symbol”, but two different formulas used to calculate values.
  • Domain Conflicts
    • Similar symbols and data structure, but two different sets of valid values or ranges of values.
    • For example, references to Customers in two systems each have assigned numeric identifiers, but the same customer has different assigned identifiers in each system.

Data Integration

The data integration specification documents how the symbols in two (or more) systems are similar and how they are different. The specification describes how the conflicts identified (under the rough categories described above) can be resolved to produce and combine comparable data symbols from each system. From a practical point of view, researching and documenting/describing the conflicts and similarities between symbols in two different systems is the same activity as defining the data integration specification which would be used to automate the integration.

Comparability: How Software Works

Back in 1990, I was working on a contract with NASA building a prototype database integration application. This was the dawn of the Microsoft Windows era, as Windows 3.0 had just been released (or was about to be). Oracle was still basically a start-up relational database vendor trying to reach critical mindshare. The following things did not yet exist which we take for granted today (and even think of as kind of out dated):

  • ODBC – allowing standardized access to databases from the desktop
  • Microsoft Access and similar personal data management utilities
  • Java (in fact most of the current web software stack was still just the twinkles in the eyes of their subsequent inventors)
  • Message-based engines, although EDI techniques existed
  • SOA and XML data formats
  • Screen-scrapers, user simulators, ETL utilities…

The point is, it was still largely a research project just to connect different databases that an enterprise might be using. Not only did the data representational difficulties that we face today exist back in 1990, but there was also a complete lack of infrastructure to support remote connection to databases: from network communication protocols, to query interfaces, to security and session continuity functions, even to standardized query languages (SQL was not the dominant language for accessing data back then), and more.

In this environment, NASA had asked us to prototype a generic capability that would permit them to take user search criteria, and to query three different database applications. Then, using the returned results from the three databases, our tool was to generate a single, unified query result.

While generally a successful prototype, during a critical review, it became clear to NASA and to us that maintaining such an application would be horribly expensive, so the research effort was ended, and the final report I wrote was delivered, then put into the NASA archives. It is just as well too, because within five years, much of the functional capabilities we’d prototyped had started to become available in more robust, standards-based commercial products.

What follows is a handful of excerpts from the final report, which while now out of context, still expresses some important ideas about how software symbols actually work. The gist of the excerpt describes how software establishes the comparability and sometimes the equivalence of meaning of the symbols it manipulates.

In a nutshell, software works with memory addresses with particular patterns of voltage (or magnetic field direction) representing various concepts from the human world. Software is constantly having to compare such “structures” together in order to establish either equivalence of meaning, or to alter meaning through the alteration of the pattern through heavily constrained manipulations. The key operation for the computer, therefore, is to establish whether or not two symbols are “comparable“. If they are not comparability, quite literally, then the computer cannot reliably compare them and produce a meaningful result.

Without further ado, here are the important excerpts from the research study’s final report, which I wrote and delivered to NASA in November 1990.

“Database Integration Graphical Interface Tools, Future Directions and Development Plan”, Geoff Howe, November 1990

2.2 The Comparability of Fields

There are many kinds of comparisons that can be made among fields. In databases, the simplest level of comparability is at the data type level. If two fields have the same simple data type (e.g., integer, character, fixed string, real number), then they can be compared to each other by a computer. This level of comparability is called “basal comparability”. Thus, if fields A and B are both integers, they can be combined, compared and related in any way appropriate for two integers.

However, two elements meeting the qualification for basal comparability may still be incomparable at the next level, that of the syntactic level. The syntactic level of comparability is that level in which the internal structure of a field becomes important. Examples of internal formats which might matter and might be important at this level include date formats, identification code formats, and string formats. In order to compare two fields in different formats, one or the other of these fields would have to be converted into the other format, or else both would have to be converted into a third format. The only meaningful comparisons that can be made among the fields of a database or databases must be made at the syntactic level.

As an example, suppose A is a field representing a date in Julian format, and suppose B is a field representing a date in Gregorian format. Assuming that both fields are stored as integers, comparing these dates would be meaningless because they lack the same syntactic structure. In order to compare these dates one or the other of these dates would have to be converted into the other format, or else both would have to be converted into a third format.

Unfortunately, having the same syntactic structure is not a guarantee that two fields can be compared meaningfully by a computer process. Rather, syntactic comparability is the minimum requirement for meaningful comparison by computer process. Another form of comparability must be incorporated as well, that of semantic comparability. Semantic comparability is based on the equivalence of the meanings attached to the contents of some pair of data items. The semantics of data items are not readily available to computer processes directly; a separate description in some form must be used to allow the computer to understand the semantic equivalence of concepts. Once such representation is in place, the computer should be able to reason over the semantic equivalence of concepts.

As an example of semantic comparability consider the PCASS fields, ITEM PART NUMBER from the FMEA PARTS table of the PCASFME subsystem, and CRIT_LRU_PART_# from the CRITICAI LRU table of the PCASCLRU subsystem. Under certain circumstances, both of these fields will hold the part numbers of “line replaceable units” or LRUs. Hence, these fields are semantically comparable. Given a list of the contents of ITEM PART NUMBER, and a similar list for CRIT LRU PART #, the assumption can be made that some of the same “line replaceable units” will be referenced in both lists.

Semantic comparability is useful when integrating data from different databases because it can be used to indicate the equivalence of concepts. Yet, semantic comparability does not imply syntactic comparability, and thus both must be present in order to satisfactorily integrate the values of fields from different databases. A definition of the equivalence of fields across databases can now be offered. Two fields are equivalent if they share the same base type; if their internal syntactic structure is the same; if their representational domains are the same; and if they represent the same concept in all contexts.

2.3 Heterogeneous Data Dictionary Architecture

 The approach which seems to have the most documentary support in the research for solving the integration of heterogeneous distributed databases uses a two-tiered data dictionary to support the construction of location-independent queries. The single data dictionary, used by both the single-site database management system, and the homogenous distributed environment, is split in two across the physical-conceptual boundary. This results in a two-level dictionary where one level describes in detail the physical fields of each integrated database, and the second level describes the general concepts stored across systems. For each unique concept represented by the physical level., there would be an entry in the conceptual level data dictionary describing that concept. Figure 2 shows the basic architecture of the two level data dictionary.

As an example of the difference between the conceptual and physical data dictionary levels, consider again the field PCASFME.FMEA PARTS.ITEM PART NUMBER. This is the full name of the actual field in the PCASS database. The physical level of the data dictionary would have this full name, plus the details of how this field is represented (character string, twelve places long). The conceptual level of the data dictionary would contain a description of the contents of the field, and a conceptual field name, “line replaceable unit part number”. Other fields in other tables of PCASS or in other databases may also have the same meaning. This fact poses the problem of mapping the concept to the physical field, which will be described below. Notice, however, how much easier it would be for a user to be able to recall the concept “line replaceable unit part number”, as opposed to the formal field name. This ease of recall is one of the major benefits of the two-level data dictionary being proposed. Two important relationships exist between the conceptual and physical data dictionaries. One of the relationships between fields of the conceptual level data dictionary and fields of the physical level data dictionary can be characterized as one-to-many. That is, one concept in the conceptual data dictionary could have many physical implementations. Identification of this type of relationship would be a matter of identifying and recording the semantic equivalences across system boundaries among fields at the physical level. All physical fields sharing the same meaning are examples of this one-to-many relationship.

Within the PCASS system, the concept of a line replaceable unit part number” occurs in a number of places. It has already been mentioned that both the ITEM PART NUMBER field of the FMEA_PARTS table, and the CRIT LRU PART # field of the CRITICAI_LRU table, represent this concept. The relationship between the concept and these two fields is, therefore, one-to-many.

The second type of relationship which may also be present, depending on the nature of the existing databases, relates several different concepts to a single field. This relationship is characterized as “many-to-one”. Systems which have followed strict database design rules should result in a situation where every field of the database represents one and only one concept. In practical implementations, however, it is often the case that this rule has not been thoroughly implemented, for a variety of reasons. Thus it is more than likely, especially in large database systems, that some field or set of fields may have more than one meaning under various circumstances. Often, these differences in meaning will be indicated by the values of other associated fields.

As an example of this type of relationship, consider the case of the ITEM PART NUMBER field of the PCASS table FMEA PARTS in the FMEA dataset one-more time. This field can have many meanings depending on the value of the PART TYPE field in the same table. If PART TYPE is set to “LRU”, the ITEM PART NUMBER field contains a line replaceable unit part number. If PART TYFE is set to “SRU”, the ITEM PART NUMBER field actually contains a shop replaceable unit part number. Storing both kinds of part numbers in the same structure is convenient. However, in order to use the ITEM PART NUMBER field properly, the user must know how to read and set the PART TYPE field to disambiguate the meaning of any particular instance of the record. Thus, the PART TYPE field in the physical database must hold either an “SRU” or “LRU” flag to indicate the particular meaning desired at any one time.

In the heterogeneous environment, it may be possible to find a different database in which the same two concepts which have been stored in one filed in one database, are stored in separate fields. It may in fact be possible that in one or more databases, only one of the two concepts has been stored. This is certainly the case among the separate data sets which make up the PCASS system. For example, in the PCASCLRU data set, only the “line replaceable unit part number” concept is stored (in the field, CRIT_LRU_PART_#). For this reason, the conceptual level of the data dictionary must include both concepts. Then there must be some appropriate construct within the data definition language of the data dictionary system which could express the constraints under which any particular field had any particular meaning. In order to be useful in raising the level of data location transparency, these conditional semantics must be entered into the data dictionary using this construct.

It is obvious now that the relationship between entries in the conceptual data dictionary and the physical data dictionary is truly many to many (see Figure 3). To implement such a relationship, using relational techniques, a third major structure (in addition to the set of tables supporting the conceptual data dictionary and the set of tables supporting the physical data dictionary) must be developed to mediate this relationship. This structure is described in the next section.

2.3.1 Conceptual – Physical Data Mapping

As an approach to implement this mapping from conceptual to physical structures, a table must be developed which relates every concept with the fields which represent it, and every field with the concepts it represents. This table will consist of tautological statements of the semantic equivalence of physical fields to concepts. A tautology is a logical statement that is true in all contexts and at all times. In thiis approach, the tautologies take the following form (please note that the “==” operator means “is semantically equivalent to”, not “is equal to”):

 normalized field f == field a from location A

 The normalized field f of the above example corresponds directly to an entry in the conceptual data dictionary. We call the field, f, normalized to indicate that it is a standard form. As will be described later, the comparison of values from different databases will be supported by normalizing these values into the representation described in the conceptual data dictionary for the normalized field.

Conditional semantics must now be added to the structure to support discussion. Given a general representation for a tautology, conditional semantics may be represented by adding logical operations to the right side of the equivalence. Assume that a new database, D, has a field, d1, which is equivalent to the normalized field, f, but only when certain other fields have specific values. Logically, we could represent this in the following manner:

normalized field f == field d1 from location D iff
field d2 from location D = VALUE1 AND
field d3 from location D = VALUE2 AND …
field dn from location D opn VALUEn

 In more general terms, the logical statement of the tautology would be as follows:

 R == P iff  E

where R is the normalized field representation, P is the physical field, and E is the set of equivalence constraints which apply to the relation. In our part number example, the following tautologies would be stored in the mapping:

Line Replaceable Unit Part Number == PCASFME.FMEA.PARTS.ITEM_PART_NUMBER iff PCASFME.FMEA.PARTS.PART_TYPE = “LRU”

Shop Replaceable Unit Part Number == PCASFME.FMEA.PARTS.ITEM_PART_NUMBER iff PCASFME.FMEA.PARTS.PART_TYPE = “SRU”

Line Replaceable Unit Part Number == PCASCLRU.CRITICAL_LRU_CRIT_LRU_PART_#

The condition statements are similar to condition statements in the SQL query language. In fact, this similarity is no accident, since these conditions wilt be added to any physical query in which ITEM PART NUMBER is included.

From a user’s point of view, implementing this feature allows the user to create a query over the concept of a line replaceable unit part number without having to know the conditions under which any particular field represents that concept. In addition, by representing the general – concept of a line replaceable unit part number, something the user would be very familiar with, this conceptual mapping technique has also hidden the details of the naming conventions used in each of the physical databases.

2.4.2 Integrating Data Translation Functions Into the Data Dictionary

In the simplest case, the integration of data translation functions into the data dictionary would be a matter of attaching to the data mapping tautologies described above a field which would store an indication of the type of translation which must occur to transform a result from its Location-specific form into the normalized form. This approach can be simplified further by allowing translations at the basal level to be identified by the source and target data types involved, and not recording any further information about the translation. It may not be unreasonable to assume that in certain well-defined domains, most of the translation functions required would be either identity functions or simple basal translation functions.

It is now possible to define completely the data structure required to store any arbitrary physical-conceptual field mapping tautology. The data structure would consist of the following parts:

  • concept field – a single, unique concept which the physical projection represents
  • normalized – a reference to the conceptual data dictionary entry used to represent the concept
  • physical projection – the field or set of fields from the physical data dictionary which under the conditions specified in the equivalence constraints represent the concept
  • equivalence constraints – the conditions under which the physical projection can be said to represent the concept
  • translation function – the function which must be performed on the physical projection in order to transform it into the normalized format of the normalized field

The logical statement of the tautology would be as follows:

R = Ft (P) iff E

where R is the normalized field representation, Ft is the translation function over the physical projection, P, and E is the set of equivalence constraints which apply to the relation. The exact implementation of this data structure would depend on the environment in which the system were to be developed, and would have to be specified in a physical design document. Note that instead of the “==” sign, which was defined above as “is semantically equivalent to”, has been replaced by “=” which means “is equivalent to”, and is a stronger statement. The “=” implies that not only is the left side semantically equivalent to the right, but it is also syntactically equivalent.

What’s in a Name: Not That Much, Actually

The referenced paper is seminal. The comments that appear here are largely unaltered from when I first wrote them back in 1989. I follow this older writing with some additional conclusions, looking back over twenty years of experience working with data.

September 23, 1989:

When parsing a record-based system’s data, the software developer is faced with all of the problems of data structure semantics described by W. Kent (in William Kent, “Limitations of Record Based Information Models”, ACM Transactions on Database Systems 4(1), March 1979. Also John Mylopolous and Michael Brodie (eds), Readings in Artificial Intelligence and Databases, Morgan Kaufman, San Mateo, California, 1989. [20 pp]).

Field naming problems can be handled by naming all fields with a field number, then providing synonyms for all fields. I gave each field a “name” similar to the name of the original system which was possibly meaningless. This name was to allow for maintenance and information mapping between systems. Then, using synonyms I could give a more semantically significant name to the field. The record is just a place keeper – the concept represented is buried in the code supporting the use of the record, or perhaps by agreement (explicit or implicit) among the designers and users of the system. When this agreement is verbal, or worse, implied by training, that’s when the trouble arises: idiosyncratic usage enters the picture, along with the possibly disasterous loss of meaning accompanying the departure of those whose concept is being represented.

November 1, 2009:

This note was just one of several ideas I was toying with as I worked on a thesis paper for my Masters. The project I was working on was to integrate and add expert system capabilities (using Prolog) to an existing business application built on top of COBOL fixed record structures. What it describes is the idea I used to get around the very badly named columns of the COBOL records in order to improve the effectiveness and readability of the Prolog code. The basic trick was to put into the Prolog knowledgebase multiple names for the same data structures and attach to these Prolog structures logic statements that permitted the statement (in nearly human-language terms) of logical constraints.

In later years, I have come to recognize that this problem of naming conventions within code, while important to an extent, is not as important as some practitioners think. The fact of the matter is that the computer could care less what the column name of a table is, or the variable name within a program, etc. For all the computer cares, so long as the programming code references the right data structure at the right moment consistently, the actual references might as well be unique, semantically meaningless numbers.

Naming conventions are for the humans who have to write and maintain the code, or, more generally, who have to directly interact with the data structures. And while there can often be contentious, protracted debate amongst software developers on the “right” naming convention for various situations, in my mind, it is not usually worth the amount of attention it gets during development.

If left to my own devices, then the naming convention I try to impose is as richly semantic as possible. Column names and table names are as close to expressing the intended content, down to including qualifying adjectives, and role names to an appropriate, context-specific noun. The context I select the name from is defined by the context of the problem domain for which the software is being written. I also try to be very consistent in the use of names and name parts from one end to the other of whatever system I’m working on.

If the system already has a naming convention, so long as it can be written down in a set of repeatable rules, I’ll use whatever it is. Oftentimes I find I have to rationalize and standardize terms used previously, due to the fact that at different times, different developers may have used different conventions.

I have participated in efforts at making a universal naming convention, and these have all ultimately hit a wall and been stopped (the reasons for this have been to this point the primary subject of this blog – even if I haven’t explicitly described the scenario yet). Namely, the cross-context politics, long initial duration, required ongoing maintenance activities and ultimately the diminishing returns of such efforts cause them to sink from their own weight.

But even when I have had complete control over the data structure development, and I have had time to craft the “perfect” name for each column, even when I’ve checked and double checked and triple checked that I have consistently applied the same naming convention from one end of the system to the next, once my software has gone into use, it hasn’t taken long for the user community to start redefining the meaning of some aspect of the data structure. Or, the requirement changes and the programming team must change the usage of one of my finely-crafted data structures so that it supports a new meaning, not reflected in that finely crafted name.

This can be frustrating, and it can also pose a long term hazard to the maintenance of the system, as either the original meaning or the new meaning becomes a minority of the usage. But it is not the end of the world, and it does not always break the software if the code is changed to handle the new meaning correctly.

However, it does mean that the actual name of the field no longer reflects the contents it holds. But if the code is working properly, the name no longer matters to the operation of the system. Plus, the maintenance problem such a change presents is also no big deal, so long as the revised meaning is captured in an appropriate dictionary and made available to the programming team for future reference.

Why is this the case? The real truth is that the data structure stores symbols which have a meaning within a context defined by the USERS of the software. The data structures merely represent SYNTAX of the symbols, consisting of the data type of the symbol, and the manipulations of the symbol performed by the code. So long as the manipulations are applied appropriately to the correct part of the syntax, no matter HOW it is named, then the software will manage the MEANING intended by the USERS, despite of, not because of, the naming convention of the data structure.

Hence, what’s in a name used on a data structure? From the computer’s point of view, not so much. From the human’s point of view, since the meaning can change over time, the name shouldn’t be trusted until the code has been reviewed to confirm the content. So there again, not so much…

Functions On Symbols

Data integration is a complex problem with many facets. From a semiotic point of view, quite a lot of human cognitive and communicative processing capabilities is involved in the resolution. This post is entering the discussion at a point where a number of necessary terms and concepts have not yet been described on this site. Stay tuned, as I will begin to flesh out these related ideas.

You may also find one of my permanent pages on functions to be helpful.

A Symbol Is Constructed

Recall that we are building tautologies showing equivalence of symbols. Recall that symbols are made up of both signs and concepts.

If we consider a symbol as an OBJECT, we can diagram it using a Unified Modeling Language (UML) notation. Here is a UML Class diagram of the “Symbol” class.

UML Diagram of the "Symbol" Object

UML Diagram of the "Symbol" Object

The figure above depicts how a symbol is constructed from both a set of “signs” and a set of “concepts“. The sign is the arrangement of physical properties and/or objects following an “encoding paradigm” defined by the members of a context. The “concept” is really the meaning which that same set of people (context) has projected onto the symbol. When meaning is projected onto a physical sign, then a symbol is constructed.

Functions Impact Both Structure and Meaning

Symbols within running software are constructed from physical arrangements of electronic components and the electrical and magnetic (and optical) properties of physical matter at various locations (this will be explained in more depth later). The particular arrangement and convention of construction of the sign portion of the symbol defines the syntactic media of the symbol.

Within a context, especially within the software used by that context, the same concept may be projected onto many different symbols of different physical media. To understand what happens, let’s follow an example. Let’s begin with a computer user who wants to create a symbol within a particular piece of software.

Using a mechanical device, the human user selects a button representing the desired symbol and presses it. This event is recognized by the device which generates the new instance of the symbol using its own syntactic medium, which is the pulse of current on a closed electrical circuit on a particular wire. When the symbol is placed in long term storage, it may appear as a particular arrangement of microscopic magnetic fields of various polarities in a particular location on a semi-metalic substrate. When the symbol is in the computer’s memory, it may appear as a set of voltages on various microscopic wires. Finally, when the symbol is projected onto the computer monitor for human presentation, it forms a pattern of phosphoresence against a contrasting background allowing the user to perceive it visually.

Note through all of the last paragraph, I did not mention anything about what the symbol means! The question arises, in this sequence of events, how does the meaning of the symbol get carried from the human, through all of the various physical representations within the computer, and then back out to the human again?

First of all, let’s be clear, that at any particular moment, the symbol that the human user wanted to create through his actions actually becomes several symbols – one symbol for each different syntactic representation (syntactic media) required for it to exist in each of the environments described. Some of these symbols have very short lives, while others have longer lives.

So the meaning projected onto the computer’s keyboard by the human:

  • becomes a symbol in the keyboard,
  • is then transformed into a different symbol in the running hardware and operating system,
  • is transformed into a symbol for storage on the computer’s hard drive, and
  • is also transformed into an image which the human perceives as the shape of the symbol he selected on the keyboard.

But the symbol is not actually “transforming” in the computer, at least in the conventional notion of a thing changing morphology. Instead, the primary operation of the computer is to create a series of new symbols in each of the required syntactic media described, and to discard each of the old symbols in turn.

It does this trick by applying various “functions” to the symbols. These functions may affect both the structure (syntactic media) of the symbol, but possibly also the meaning itself. Most of the time, as the symbol is copied and transferred from one form to another, the meaning does not change. Most of the functions built into the hardware making up the “human-computer interface” (HCI) are “identity” functions, transferring the originally projected concept from one syntactic media form to another. If this were not so, if the symbol printed on the key I press is not the symbol I see on the screen after the computer has “transformed” it from keyboard to wire to hard drive to wire to monitor screen, then I would expect that the computer was broken or faulty, and I would cease to use it.

Sometimes, it is necessary/desirable that the computer apply a function (or a set of functions called a “derivation“) which actually alters the meaning of one symbol (concept), creating a new symbol with a different meaning (and possibly a different structure, too).

%d bloggers like this: