Example of How Meaning Is Attached to Structure

What follows is a detailed example of the thought process followed by a software developer to create a class of data structures and how meaning is attached to those structures.

Consider that the meaning of one data structure may be composed of the collection of meanings of a set of smaller structures which themselves have meaning. Take the following description as the meaning to be represented by a structure:

An employee is a human being or person. Each employee has a unique identity of their own. Each employee has a name, which may be the same as the name of a different person or employee. Being human, each employee has an age, calculated by counting the number of years since they were born up to some other point in time (such as present day). Each person of a certain age may enter into a marriage with another human being, who in turn also has their own identity and other attributes of a person.

To represent this information using data structures (i.e., to project the meaning of this information onto a data structure), we might tie the various concepts about a human being/employee to a computer-based data structure. Recognizing that a human being is an object with many additional characteristics of which we might want to know about, we might choose to project the concept of “human beings” or “people” onto a relational table and the concept of a particular individual onto one of that table’s rows (or a similar record structure).

This table would represent a set of individual human beings, and onto each row of the table would be projected the meaning of a particular human being. Saying this again in a more conventional manner, we would say that each row of the table will reference a singular and particular human being, and all of the rows together will represent the set of all human beings we’ve observed in the context of our usage of the computer system.

In a more mathematical vein, we would define a projection Þ from the set of actual human beings Α onto Š, (Þ(Α) |–> Š), the set of data structures such that for any α in Α where α is a human being, there is a record or row σ in Š that represents that human being.

Because a record data structure is a conglomeration of fields, each of which can symbolically represent some attribute of a larger whole, we might project additional attributes of the human being, such as their name and identifier, onto particular fields within the record. If σ is the particular record structure representing a particular human being, α, then the meaning (values) of the attributes of that person could be associated with the fields, f1..fn, of that record through attribute-level projections, ψ1..ψn, for attributes 1..n.

To represent a particular person, first we would project the reference to the person to a particular row, Þ(α) |–> σ, then we would also project the attribute facts about that person onto the individual fields of that row:

ψ1(α.1) |–> σ.f1

ψn(α.n) |–> σ.fn
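As a minimal sketch of these projections in Python, a record type can play the role of σ and each field assignment the role of one ψ. The class name, field names, birth date and the use of a birth date as the stored fact behind "age" are illustrative assumptions, not part of the description above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PersonRecord:
    """sigma: the record onto which one human being is projected."""
    person_id: int    # f1: the person's identity
    name: str         # f2: the person's name (not necessarily unique)
    birth_date: date  # f3: the fact from which an age can be counted


def project_person(person_id: int, name: str, birth_date: date) -> PersonRecord:
    """The projection of one observed person onto one record."""
    return PersonRecord(person_id, name, birth_date)


def age_on(record: PersonRecord, when: date) -> int:
    """Whole years from birth up to some other point in time."""
    had_birthday = (when.month, when.day) >= (
        record.birth_date.month,
        record.birth_date.day,
    )
    return when.year - record.birth_date.year - (0 if had_birthday else 1)


# Hypothetical sample person and dates, for illustration only.
julie = project_person(1, "Julie Smith", date(1980, 6, 15))
print(age_on(julie, date(2009, 11, 1)))
```

The record itself stands for the person, while each field stands for one attribute of that person; the age is not stored but counted whenever it is needed.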

Projection onto Relational Structure

When modeling a domain for incorporation into computer software, the modeler’s task is to define a set of structures which software can be written to manipulate. When that software is to use relational database management systems, the modeler will first project the domain concepts onto abstract relational structures defined over “tuples”. These abstract structures have a well-defined mathematical nature which, if followed, provides very powerful manipulations. The developer projects meaning onto relations in a conventional way, such as by defining a relation of attributes to represent “PERSON” (the set of persons), and another relation of attributes to represent “EMPLOYEE” (the set of persons who are also employees). Having defined these relational sets, the relational algebra permits various mathematical operations/functions to be applied, such as “JOIN” and “INTERSECTION”. These functions have strictly defined properties and well-defined results over arbitrary tuples. Having projected meaning onto the individual relations, the software developer is therefore also able to project meaning onto the outcomes of these operations, which can then be used to manipulate large sets of data in an efficient and semantically correct way.

As the developer creates the software however, they must keep in mind what these functions are doing on two levels, at the level of the set content and at the level of the represented domain (the referent of the sets and manipulations). Thus the intersection of the PERSON and EMPLOYEE relations should produce the subset of tuples (records, etc.) which has its own meaning derived from the initial projected meaning of the original sets. Namely, this intersection represents the set of PERSONS who are also EMPLOYEES, (which is the same, alternatively, as the set of EMPLOYEES who are also PERSONS). This is an important point about software: the meaning is not simply recorded in the data structure but the manipulations of the data by the computer themselves have specific connotations and implications on the meaning of data as it is processed.
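The intersection described above can be sketched with Python's built-in `sqlite3` module. The table layout and the sample names are hypothetical, and both relations are given the same attributes so that the intersection is well defined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person   (person_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee (person_id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO person   VALUES (1, 'Julie Smith'), (2, 'Sam Jones'), (3, 'Ana Cruz');
    INSERT INTO employee VALUES (1, 'Julie Smith'), (3, 'Ana Cruz');
""")

# Set level: the tuples present in both relations.
# Domain level: the persons who are also employees.
rows = conn.execute("""
    SELECT person_id, name FROM person
    INTERSECT
    SELECT person_id, name FROM employee
    ORDER BY person_id
""").fetchall()

print(rows)
```

The database only compares tuples; reading the result as "persons who are also employees" is the meaning the developer carries over from the two original relations.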

Representational Redundancy

As a typical practice in the projection of information onto data structures within the relational model, there will usually be a repetition of the information projected onto more than one symbol. In particular, the reference to the identity of a single person will be represented both by the mere existence of a single row in the table, and also by a subset of fields on the row which the software developers have chosen (and which the software enforces) for this purpose. In other words, under common software development practices, each record/row as a conglomerate entity will represent a single person. In addition, there will be k attributes (1 <= k <= n) on that record structure whose values in combination also represent that same individual. These k attributes make up the “primary key” of the data structure. The software developer will use and repeat these columns on multiple data structures to permit additional concepts regarding the relationship between that person and other ideas also being recorded. For example, a copy of one person record’s primary key could be placed on another person record and be labelled “spouse”. The attributes which make up the primary key often have less mechanical meanings as well. For example, perhaps the primary key for our person includes the name attribute. As part of the primary key, the name value of the person merely helps to reference that person. It also, in its own right, represents the name of the person.
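A small sketch of this redundancy, again using `sqlite3`: the row stands for the person, the primary key stands for the same person, and a copy of another row's key is stored under the label "spouse". The column names and sample people are assumptions, and the key here is the simplest case of a single attribute (k = 1).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,                  -- the key: also "is" the person
        name      TEXT NOT NULL,
        spouse_id INTEGER REFERENCES person(person_id)  -- a copy of another row's key
    )
""")
conn.execute("INSERT INTO person VALUES (1, 'Julie Smith', NULL)")
conn.execute("INSERT INTO person VALUES (2, 'Sam Smith', 1)")
conn.execute("UPDATE person SET spouse_id = 2 WHERE person_id = 1")

# Follow the copied key from each person row to the spouse's row.
pairs = conn.execute("""
    SELECT p.name, s.name
    FROM person AS p
    JOIN person AS s ON p.spouse_id = s.person_id
    ORDER BY p.person_id
""").fetchall()

print(pairs)
```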

How Meaning Attaches to Data Structures: A Summary

What follows is a high-level summary of how humans attach meaning to various kinds of data structures within a computer. It will serve as a good baseline account, though certainly not an exhaustive one, providing a model upon which more detailed discussion can begin.

Background Terminology

Computer systems provide functionality to support the performance and record of business processes. They do that through three inter-related features: DATA, LOGIC, and PRESENTATION. The presentation consists of information displays permitting both an information visualization aspect and an information capture aspect. The logic consists of several aspects, much of it having to do with support of the presentation and manipulation of displays, but also a lot of it having to do with creation, transformation and storage of data. Data consists of sets of symbols constructed in a systematic, regular fashion using a set of data structures. Different data structures are constructed to represent different aspects of the recorded activity. It is in the relationships between the macro and micro structures that the specific detailed information captured or generated by the business process resides. By following a codified, rigid construction of its data structures, the computer system is able to record multiple recurring instances of similar events. Through the development of fixed transformations using program logic, the computer system is able to make routine, conventional conclusions about those events or observations, and it is able to maintain and retain those observations virtually indefinitely.
Data is maintained and stored in DATA STRUCTURES. The more regular these data structures are, the more easily they are interpreted by a broad audience of software developers. In most situations, the PRESENTATION of the data captured by a system to the end user of that system is in a more directly understandable form than the way that information is stored in the computer.  (This statement is not only trivially true, but in a very deep sense too, since the computer actually stores everything using more and more complex sequences of binary digits. That’s a different subject than our current presentation.)  The data structures within the computer system typically exist in two, simultaneous forms, one intended to support human reasoning (through what is often called a “logical”, “abstract” or “conceptual” model) and one supporting manipulations by the computer. Most software developers today strictly deal with the abstract model of the data for design, coding, and discussion. (There are still some developers working in assembly level code, but even that is at a more abstract level than the actual electro-mechanical machinations of the actual hardware!)
An obvious observation, at least on its face, is that different computer systems will store data representing similar ideas using different structures. We need to keep this in the back of our minds as we progress through the rest of this discussion, but it will be more directly addressed in other entries.
 A final thought concerns sets of data of similar structure, called a POPULATION. A population of data consists of some set of data symbols, all constructed using the same data structure pattern which represents a set of similar ideas. The classification of populations of data structures applies to the DATA portion of systems, represents an analogous classification of sets of observed events external to the computer system, and is affected by and affecting the LOGIC and PRESENTATION portions of the computer system. A more detailed definition of the notion of a “population” will also be treated in separate sections.

Commonalities of Structure

Many computer systems, especially those built in support of business (or other human activity) processes, are constructed using a conventional system of abstract data structures. (When I say they are “conventional” what I mean is that the majority of software developers follow conventional patterns for the construction of data structures to represent their idiosyncratic subject areas.) Whether these structures are called “objects”, “tables”, “records”, or something else, they typically take the form of a heterogeneous collection of smaller structures grouped together into regular conglomerations. Instances or examples of the larger collections of data structures will each be said to “represent” individual instances of some real-world conglomerate. The individual component element structures of these conglomerations will each be said to represent the individual attributes or characteristics of the real-world conglomerate object. In order to permit efficient processing by the computer, instances of similar phenomena will be represented by the same kind of conglomeration.
Typically, business systems will be based on a data structure called a RECORD. Records consist of a series of “attribute data structures” all related in some fashion to each other. (A more complex structure called an “object” still has record-like attributes combined together to represent a larger whole; the nuances and variations of object-based representation are a subject for later.) Each RECORD will stereotypically symbolize one instance of a particular concept. This could be a reference to and certain observed details of a real-world object, or it could be something more ephemeral like observations of an event. For example, one “PERSON” record would represent a single individual person.
RECORDS themselves consist of individually defined data elements or FIELDS. Each RECORD of a particular type will share the same set of FIELDS. Each FIELD will symbolize one kind of fact about the thing symbolized by the RECORD. For example, a NAME field on a PERSON record will record what the represented individual’s name is, at least as it was at the time the record was created. 
The set of all records within a system having the same structure will typically be collected and stored together, often in a data structure called a TABLE. Each TABLE will symbolize the set of KNOWN INSTANCES of whatever type of thing each record represents. TABLES are also described as having ROWS and COLUMNS. Each row of a table is one RECORD. The set of shared element-attribute structures across the set of  rows can be described as the “columns” of the table. Each column represents the set of all instances of a FIELD in the table, in other words, the same field across all records. Tables are a commonly used data structure because they readily support interpretation using relational algebra and set theoretic operations, as well as being easily presented and understood both by human and computer.  
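These building blocks can be sketched in a few lines of Python. The `Person` type, its two fields and the sample rows are hypothetical; the point is only how RECORD, FIELD, TABLE, ROW and COLUMN relate to one another.

```python
from typing import NamedTuple


class Person(NamedTuple):
    """A RECORD type: every Person record shares the same FIELDS."""
    person_id: int
    name: str


# A TABLE: the set of known instances, one ROW (record) per person.
table = [
    Person(1, "Julie Smith"),
    Person(2, "Sam Jones"),
    Person(3, "Julie Smith"),  # a different person who shares a name
]

# One ROW symbolizes one individual.
first_row = table[0]

# A COLUMN is the same FIELD taken across all records.
name_column = [record.name for record in table]

print(Person._fields)
print(name_column)
```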

Basic Data Structures and Their Relationships

The nomenclature of “record”, “table”, “row”, “column” and “field” describes the building blocks of an abstract syntactic medium whose usage permits humans to represent complex concepts within the computer system. By assigning names to various collections and combinations of these generic structures, humans project meaning onto them. Using diagrams called “data models”, a shorthand of sorts allows the modeler to describe how the generic tables and fields relate to each other and what these relationships signify in the external world. These models also, by virtue of the typified shorthand they use, allow for the generation of computer logic that can be applied to a database to support certain standard operations and manipulations of the data generated by a computer system.

Traditional data modeling results in the creation of a data dictionary which relates each structural element to a particular kind of concept. Every structure will be given a name, and if the developers are diligent, these can be associated with more fully realized text descriptions as well. Some aspects of the data structures are not described, at least typically, within a data model, such as populations or subsets of records with similar structures.

Traditional data dictionary entries record name and description of the set of all structures contained in a table. Using a set of structures to represent a set or collection of similar objects is itself a symbolic action. So not only does each row in a table represent one instance of some type of thing, and each column represents one observed (or derived) fact or attribute of that instance, but the collection of all instances of these row data structures also represents the logical set or population of these things.

The strategy for applying meaning to these data structures begins when the decision is made to treat the entirety of each record as the representation of a member of a population of like things. Being similar, then, a set of fields is conceived to capture various detailed observations regarding the things. These fields are intended to capture details about both how each thing is different from the other things in the collection, but also how different things may share similarities. Much of the business logic of the application system will be consumed by the comparisons between individual things, and the mathematical derived counts (and other metrics) of those sets of things (and of subsets within). Using the computer to compare the bit sequences contained in each field, the computer will indicate whether these contents are the same or different between different instances. Humans will then interpret the results of these comparisons by projecting the conclusion out of the computer and into the conceptual world.

For example, let’s say that we have defined the computer sequence “10101010” to represent a reference to a specific person, “Julie Smith”. If we take two different instances of bit sequences and compare them in the computer, the computer will tell us if they are the same or not. As humans, we would then interpret the purely electro-mechanical result which the computer calculated that “10101010” and “10101010” are the same as an indication that the two instances of these sequences represent the same specific person. Likewise, we would interpret a computer result indicating that two bit sequences were not the same as an indication that different people were being referred to.  This type of projection of meaning from mechanical result to logical inference is fundamental to the way humans use computers.
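The two steps in this example, the mechanical comparison and the human interpretation of its result, can be separated in a short Python sketch. The `REFERENTS` registry and the second bit sequence are assumptions added for illustration.

```python
# Hypothetical agreement about what each bit sequence refers to.
REFERENTS = {
    0b10101010: "Julie Smith",
    0b10101011: "some other person",
}


def same_bits(a: int, b: int) -> bool:
    """The purely mechanical step: compare two bit sequences."""
    return a == b


def interpret(a: int, b: int) -> str:
    """The human step: project the mechanical result onto the domain."""
    if same_bits(a, b):
        return f"both refer to {REFERENTS[a]}"
    return "they refer to different people"


print(interpret(0b10101010, 0b10101010))
print(interpret(0b10101010, 0b10101011))
```

Nothing in `same_bits` knows about people; the conclusion about Julie Smith exists only in the interpretation layered on top of it.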

The specific number of fields and their bit sequence representations (data types)  that are developed within a computer application is entirely dependent on the complexity of the problem domain and the attributes of the objects required to reason over that domain. However, no matter how simple or complex, it is the projection of meaning onto the representation of these attributes in the computer and the projection of an interpretation onto the results of the computer comparisons of the physical representations which makes the computer the powerful engine that it is in our society.

How Row Subsets Represent Subpopulations

Context Shifting Is Easy

Today’s discussion asks that you perform a thought experiment.

Imagine that you are sitting in a room with a bunch of other people. All of your chairs face to the front of the room where there is a large desk. A young woman walks in with a stack of papers and places them on the desk. She picks up a piece of chalk from the desk, then, still standing, she turns to face all of you, smiles and begins to speak.

Right here I’m going to pause the narrative and ask that you consider the situation. Imagine it in your head for a moment. What is the context I’ve described?

So what do I mean by context? Well if I were to say that our story so far is a very familiar context for most of us, one we all remember from childhood: an elementary school classroom, then here are some of the things you might expect to happen.

Having now stated a context, you, dear reader, should have images of yourselves sitting quietly in your desks while your teacher imparts some lesson. You also already know many of the basic ground rules of being in a classroom:

  • Pay attention to the teacher
  • Take notes
  • Don’t speak unless the teacher calls on you
  • Raise your hand if you have a question or comment and the teacher will call on you

Do you recognize this context? Feels familiar and comfortable, right? Great! Let’s hold this thought now and count slowly to twenty while we let the memories of this context play about in our heads.

Really, start counting, or you won’t get the total effect:

1, 2, 3, 4, 5

6, 7, 8, 9, 10

11, 12, 13, 14, 15

16, 17, 18, 19, 20

Now let me throw you a little curve ball and tell you that you’ve been thinking about this in the wrong way. The situation I described is not really a classroom and that woman is not a teacher. She’s an actress, presenting a one-woman show about a famous teacher. The desk is a set, the papers just props. You are not in a classroom, you are in a theater made to appear as a classroom. This is just a play and you are a member of the audience. In fact, so there’s no doubt in your mind about this, you suddenly remember you put your ticket stub in your front pocket.

Did you feel that grinding sensation in your head as you read these last few sentences? That shifting from the classroom to the theater context – you should actually be able to feel it happen in your mind. The fact that even this little bit of information has allowed you to sense a shift in context is not a trivial matter. Usually, when you switch contexts like this, it is never so palpable or apparent. We humans are switching contexts all of the time, sometimes in the same sentence. It is one of our particular talents to recognize and adjust our conceptualizations at will when the context changes.

We have just completely switched contexts and you didn’t even need to lift a finger, did you? Just by my saying “this is a play” your expectations have completely changed. Now that we’re in the “performance context”, what has happened to our mutual expectations? First of all, the roles have shifted: instead of a teacher, our woman is an actress, and you, dear reader, are not students but an audience. As a member of the audience (especially an audience witnessing a play about a teacher) here are some of the different expectations you may now have:

  • If you raise your hand, you may get an usher, but the actress will not respond to you
  • While you will still sit quietly and listen, the expectation is that at the end of the performance, you will clap your hands
  • The actress will provide the audience (hopefully) with an entertainment

So, shifting contexts is easy. And thus, I end this little monologue by pointing out that really, dear reader, we aren’t in a theater either. Instead, we’re sharing a context called “reading a blog entry”. I hope you enjoyed this little exercise!

What’s in a Name: Not That Much, Actually

The referenced paper is seminal. The comments that appear here are largely unaltered from when I first wrote them back in 1989. I follow this older writing with some additional conclusions, looking back over twenty years of experience working with data.

September 23, 1989:

When parsing a record-based system’s data, the software developer is faced with all of the problems of data structure semantics described by W. Kent (in William Kent, “Limitations of Record-Based Information Models”, ACM Transactions on Database Systems 4(1), March 1979. Also John Mylopoulos and Michael Brodie (eds), Readings in Artificial Intelligence and Databases, Morgan Kaufmann, San Mateo, California, 1989. [20 pp]).

Field naming problems can be handled by naming all fields with a field number, then providing synonyms for all fields. I gave each field a “name” similar to its name in the original system, which was possibly meaningless. This name was to allow for maintenance and information mapping between systems. Then, using synonyms I could give a more semantically significant name to the field. The record is just a place keeper: the concept represented is buried in the code supporting the use of the record, or perhaps by agreement (explicit or implicit) among the designers and users of the system. When this agreement is verbal, or worse, implied by training, that’s when the trouble arises: idiosyncratic usage enters the picture, along with the possibly disastrous loss of meaning accompanying the departure of those whose concept is being represented.

November 1, 2009:

This note was just one of several ideas I was toying with as I worked on a thesis paper for my Masters. The project I was working on was to integrate and add expert system capabilities (using Prolog) to an existing business application built on top of COBOL fixed record structures. What it describes is the idea I used to get around the very badly named columns of the COBOL records in order to improve the effectiveness and readability of the Prolog code. The basic trick was to put into the Prolog knowledgebase multiple names for the same data structures and attach to these Prolog structures logic statements that permitted the statement (in nearly human-language terms) of logical constraints.
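The original work used Prolog over COBOL records; what follows is only a loose Python sketch of the same idea, with fields identified by number and several names resolving to each number. The record layout, the legacy-style names and the friendlier synonyms are all invented for illustration.

```python
# Field number -> (start, end) position in a fixed-width record.
# Hypothetical layout, standing in for a COBOL-style record definition.
LAYOUT = {
    1: (0, 6),
    2: (6, 26),
}

# Several names, cryptic or meaningful, all resolve to the same field number.
SYNONYMS = {
    "EMP-NO-X": 1,
    "employee_id": 1,
    "NM-FLD-2": 2,
    "employee_name": 2,
}


def read_field(record: str, name: str) -> str:
    """Look up a field by any of its names and return its trimmed contents."""
    start, end = LAYOUT[SYNONYMS[name]]
    return record[start:end].strip()


record = "000042" + "Julie Smith".ljust(20)

print(read_field(record, "NM-FLD-2"))
print(read_field(record, "employee_name"))
```

Both names return the same contents because the machine only ever uses the field number; the synonyms exist for the people reading the code.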

In later years, I have come to recognize that this problem of naming conventions within code, while important to an extent, is not as important as some practitioners think. The fact of the matter is that the computer could not care less what the column name of a table is, or the variable name within a program, etc. For all the computer cares, so long as the programming code references the right data structure at the right moment consistently, the actual references might as well be unique, semantically meaningless numbers.

Naming conventions are for the humans who have to write and maintain the code, or, more generally, who have to directly interact with the data structures. And while there can often be contentious, protracted debate amongst software developers on the “right” naming convention for various situations, in my mind, it is not usually worth the amount of attention it gets during development.

If left to my own devices, then the naming convention I try to impose is as richly semantic as possible. Column names and table names are as close to expressing the intended content, down to including qualifying adjectives, and role names to an appropriate, context-specific noun. The context I select the name from is defined by the context of the problem domain for which the software is being written. I also try to be very consistent in the use of names and name parts from one end to the other of whatever system I’m working on.

If the system already has a naming convention, so long as it can be written down in a set of repeatable rules, I’ll use whatever it is. Oftentimes I find I have to rationalize and standardize terms used previously, due to the fact that at different times, different developers may have used different conventions.

I have participated in efforts at making a universal naming convention, and these have all ultimately hit a wall and been stopped (the reasons for this have been to this point the primary subject of this blog – even if I haven’t explicitly described the scenario yet). Namely, the cross-context politics, long initial duration, required ongoing maintenance activities and ultimately the diminishing returns of such efforts cause them to sink from their own weight.

But even when I have had complete control over the data structure development, and I have had time to craft the “perfect” name for each column, even when I’ve checked and double checked and triple checked that I have consistently applied the same naming convention from one end of the system to the next, once my software has gone into use, it hasn’t taken long for the user community to start redefining the meaning of some aspect of the data structure. Or, the requirement changes and the programming team must change the usage of one of my finely-crafted data structures so that it supports a new meaning, not reflected in that finely crafted name.

This can be frustrating, and it can also pose a long term hazard to the maintenance of the system, as either the original meaning or the new meaning becomes a minority of the usage. But it is not the end of the world, and it does not always break the software if the code is changed to handle the new meaning correctly.

However, it does mean that the actual name of the field no longer reflects the contents it holds. But if the code is working properly, the name no longer matters to the operation of the system. Plus, the maintenance problem such a change presents is also no big deal, so long as the revised meaning is captured in an appropriate dictionary and made available to the programming team for future reference.

Why is this the case? The real truth is that the data structure stores symbols which have a meaning within a context defined by the USERS of the software. The data structures merely represent the SYNTAX of the symbols, consisting of the data type of the symbol, and the manipulations of the symbol performed by the code. So long as the manipulations are applied appropriately to the correct part of the syntax, no matter HOW it is named, then the software will manage the MEANING intended by the USERS, in spite of, not because of, the naming convention of the data structure.

Hence, what’s in a name used on a data structure? From the computer’s point of view, not so much. From the human’s point of view, since the meaning can change over time, the name shouldn’t be trusted until the code has been reviewed to confirm the content. So there again, not so much…

Functions On Symbols

Data integration is a complex problem with many facets. From a semiotic point of view, quite a lot of human cognitive and communicative processing capability is involved in the resolution. This post is entering the discussion at a point where a number of necessary terms and concepts have not yet been described on this site. Stay tuned, as I will begin to flesh out these related ideas.

You may also find one of my permanent pages on functions to be helpful.

A Symbol Is Constructed

Recall that we are building tautologies showing equivalence of symbols. Recall that symbols are made up of both signs and concepts.

If we consider a symbol as an OBJECT, we can diagram it using a Unified Modeling Language (UML) notation. Here is a UML Class diagram of the “Symbol” class.

UML Diagram of the "Symbol" Object


The figure above depicts how a symbol is constructed from both a set of “signs” and a set of “concepts”. The sign is the arrangement of physical properties and/or objects following an “encoding paradigm” defined by the members of a context. The “concept” is really the meaning which that same set of people (context) has projected onto the symbol. When meaning is projected onto a physical sign, then a symbol is constructed.

Functions Impact Both Structure and Meaning

Symbols within running software are constructed from physical arrangements of electronic components and the electrical and magnetic (and optical) properties of physical matter at various locations (this will be explained in more depth later). The particular arrangement and convention of construction of the sign portion of the symbol defines the syntactic media of the symbol.

Within a context, especially within the software used by that context, the same concept may be projected onto many different symbols of different physical media. To understand what happens, let’s follow an example. Let’s begin with a computer user who wants to create a symbol within a particular piece of software.

Using a mechanical device, the human user selects a button representing the desired symbol and presses it. This event is recognized by the device which generates the new instance of the symbol using its own syntactic medium, which is the pulse of current on a closed electrical circuit on a particular wire. When the symbol is placed in long term storage, it may appear as a particular arrangement of microscopic magnetic fields of various polarities in a particular location on a semi-metallic substrate. When the symbol is in the computer’s memory, it may appear as a set of voltages on various microscopic wires. Finally, when the symbol is projected onto the computer monitor for human presentation, it forms a pattern of phosphorescence against a contrasting background allowing the user to perceive it visually.

Note through all of the last paragraph, I did not mention anything about what the symbol means! The question arises, in this sequence of events, how does the meaning of the symbol get carried from the human, through all of the various physical representations within the computer, and then back out to the human again?

First of all, let’s be clear, that at any particular moment, the symbol that the human user wanted to create through his actions actually becomes several symbols – one symbol for each different syntactic representation (syntactic media) required for it to exist in each of the environments described. Some of these symbols have very short lives, while others have longer lives.

So the meaning projected onto the computer’s keyboard by the human:

  • becomes a symbol in the keyboard,
  • is then transformed into a different symbol in the running hardware and operating system,
  • is transformed into a symbol for storage on the computer’s hard drive, and
  • is also transformed into an image which the human perceives as the shape of the symbol he selected on the keyboard.

But the symbol is not actually “transforming” in the computer, at least in the conventional notion of a thing changing morphology. Instead, the primary operation of the computer is to create a series of new symbols in each of the required syntactic media described, and to discard each of the old symbols in turn.

It does this trick by applying various “functions” to the symbols. These functions may affect the structure (syntactic medium) of the symbol, and possibly also the meaning itself. Most of the time, as the symbol is copied and transferred from one form to another, the meaning does not change. Most of the functions built into the hardware making up the “human-computer interface” (HCI) are “identity” functions, transferring the originally projected concept from one syntactic medium to another. If this were not so, if the symbol printed on the key I press were not the symbol I see on the screen after the computer has “transformed” it from keyboard to wire to hard drive to wire to monitor screen, then I would conclude that the computer was broken or faulty, and I would cease to use it.
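A software analogy of these identity functions, sketched in Python: each function builds a new symbol in a different syntactic medium, and the chain as a whole leaves the concept untouched. The function names and the choice of UTF-8 bytes and bit strings as stand-ins for "storage" and "wire" are assumptions made for illustration.

```python
def to_storage(symbol: str) -> bytes:
    """Create a new symbol in the 'storage' medium (UTF-8 bytes)."""
    return symbol.encode("utf-8")


def to_wire(stored: bytes) -> str:
    """Create a new symbol in the 'wire' medium (a string of bits)."""
    return "".join(f"{byte:08b}" for byte in stored)


def from_wire(bits: str) -> bytes:
    """Rebuild the stored form from the bits seen on the wire."""
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def to_screen(stored: bytes) -> str:
    """Create the symbol the user finally sees."""
    return stored.decode("utf-8")


pressed = "A"
on_wire = to_wire(to_storage(pressed))
shown = to_screen(from_wire(on_wire))

print(on_wire)
print(shown == pressed)
```

Every intermediate value is a different sign, yet the composition behaves as an identity on the concept: what was pressed is what is shown.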

Sometimes, it is necessary or desirable that the computer apply a function (or a set of functions called a “derivation”) which actually alters the meaning of one symbol (concept), creating a new symbol with a different meaning (and possibly a different structure, too).
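By way of contrast with an identity function, here is a hedged Python sketch of a derivation: symbols meaning "date of birth" and "some point in time" go in, and symbols with different meanings ("age", "may marry") come out. The function names and the minimum age of 18 are assumptions chosen purely for illustration.

```python
from datetime import date

MINIMUM_MARRIAGE_AGE = 18  # hypothetical threshold, for illustration only


def derive_age(birth_date: date, on_date: date) -> int:
    """Derive a new symbol, an age in whole years, from two dates."""
    had_birthday = (on_date.month, on_date.day) >= (birth_date.month, birth_date.day)
    return on_date.year - birth_date.year - (0 if had_birthday else 1)


def derive_may_marry(birth_date: date, on_date: date) -> bool:
    """A derivation: a chain of functions ending in yet another meaning."""
    return derive_age(birth_date, on_date) >= MINIMUM_MARRIAGE_AGE


print(derive_age(date(1980, 6, 15), date(2009, 11, 1)))
print(derive_may_marry(date(2000, 1, 1), date(2009, 11, 1)))
```

The output symbols differ from the inputs in both structure (an integer and a boolean rather than dates) and meaning.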
