How Meaning Attaches to Data Structures: A Summary

What follows is a high level summary of how humans attach meaning to various kinds of data structures within a computer. It will serve as a good baseline account, though certainly not an exhaustive one, providing a model upon which more detailed dicussion can begin. 

 Background Terminology

Computer systems provide functionality to support the performance and record of business processes. They do that through three inter-related features: DATA, LOGIC, and PRESENTATION. The presentation consists of information displays permitting both an information visualization aspect and an information capture aspect. The logic consists of several aspects, much of it having to do with support of the presentation and manipulation of displays, but also a lot of it having to do with creation, transformation and storage of data. Data consists of sets of symbols constructed in a systematic, regular fashion using a set of data structures. Different data structures are constructed to represent different aspects of the recorded activity. It is in the relationships between the macro and micro structures where the specific detailed information captured.generated by the business process resides. By following a codified, rigid construction of its data structures, the computer system is able to record multiple recurring instances of similar events. Through the development of fixed transformations using program logic, the computer system is able to make routine, conventional conclusions about those events or observations, and it is able to maintain and retain those observations virtually indefinitely.
Data is maintained and stored in DATA STRUCTURES. The more regular these data structures are, the more easily they are interpreted by a broad audience of software developers. In most situations, the PRESENTATION of the data captured by a system to the end user of that system is in a more directly understandable form than the way that information is stored in the computer.  (This statement is not only trivially true, but in a very deep sense too, since the computer actually stores everything using more and more complex sequences of binary digits. That’s a different subject than our current presentation.)  The data structures within the computer system typically exist in two, simultaneous forms, one intended to support human reasoning (through what is often called a “logical”, “abstract” or “conceptual” model) and one supporting manipulations by the computer. Most software developers today strictly deal with the abstract model of the data for design, coding, and discussion. (There are still some developers working in assembly level code, but even that is at a more abstract level than the actual electro-mechanical machinations of the actual hardware!)
An obvious observation, at least on its face, is that different computer systems will store data representing similar ideas using different structures. We need to keep this in the back of our minds as we progress through the rest of this discussion, but it will be more directly adressed in other entries.
 A final thought concerns sets of data of similar structure, called a POPULATION. A population of data consists of some set of data symbols, all constructed using the same data structure pattern which represents a set of similar ideas. The classification of populations of data structures applies to the DATA portion of systems, represents an analogous classification of sets of observed events external to the computer system, and is affected by and affecting the LOGIC and PRESENTATION portions of the computer system. A more detailed definition of the notion of a “population” will also be treated in separate sections.

Commonalities of Structure

Many computer systems, especially those built in support of business (or other human activity) processes, are constructed using a conventional system of abstract data structures. (When I say they are “conventional” what I mean is that the majority of software developers follow conventional patterns for the construction of data structures to represent their idiosynchratic subject areas.) Whether these structures are called “objects”, “tables”, “records”, or something else, they typically take the form of a heterogenous collection of smaller structures grouped together into regular conglomerations. Instances or examples of the larger collections of data structures will each be said to “represent” individual intances of some real-world conglomerate. Each of the individual component element structures of these conglomerations will each be said to represent the individual attributes or characteristics of the real-world conglomerate object. In order to permit efficient processing by the computer,   instances of similar phenomenon will be represented by the same kind of conglomeration.
Typically, business systems will be based on a data structure called a RECORD.  Records consist of a series of “attribute data structures” all related in some fashion to each other. (A more complex structure called an “object” still has record-like attributes combined together to represent a larger whole, the nuances and variation of object-based representation is a subject for later.)  Each RECORD will stereotypically symbolize one instance of a particular concept. This could be a reference to and certain observed details of a real-world object, or it could be something more ephemereal like observations of an event. For example, one “PERSON” record would represent a single individual person.
RECORDS themselves consist of individually defined data elements or FIELDS. Each RECORD of a particular type will share the same set of FIELDS. Each FIELD will symbolize one kind of fact about the thing symbolized by the RECORD. For example, a NAME field on a PERSON record will record what the represented individual’s name is, at least as it was at the time the record was created. 
The set of all records within a system having the same structure will typically be collected and stored together, often in a data structure called a TABLE. Each TABLE will symbolize the set of KNOWN INSTANCES of whatever type of thing each record represents. TABLES are also described as having ROWS and COLUMNS. Each row of a table is one RECORD. The set of shared element-attribute structures across the set of  rows can be described as the “columns” of the table. Each column represents the set of all instances of a FIELD in the table, in other words, the same field across all records. Tables are a commonly used data structure because they readily support interpretation using relational algebra and set theoretic operations, as well as being easily presented and understood both by human and computer.  

Basic Data Structures and Their Relationships

The nomenclature of “record”, ” table”, “row”, “column” and “fields” describes the construction building blocks of an abstract syntactic medium whose usage permits humans to represent complex concepts within the computer system. By assigning names to various collections and combinations of these generic structures, humans project meaning onto them. Using diagrams called “data models”, a short hand of sorts allows the modeler to describe how the generic tables and fields relate to each other and what these relationships signify in the external world. These models also, by virtue of the typified short hand they use, allows for the generation of computer logic that can be applied to a database to support certain standard operations and manipulations of the data generated by a computer system.

Traditional data modeling results in the creation of a data dictionary which relates each structural element to a particular kind of concept. Every structure will be given a name, and if the developers are diligent, these can be associated with more fully realized text descriptions as well. Some aspects of the data structures are not described, at least typically, within a data model, such as populations or subsets of records with similar structures.

Traditional data dictionary entries record name and description of the set of all structures contained in a table. Using a set of structures to represent a set or collection of similar objects is itself a symbolic action. So not only does each row in a table represent one instance of some type of thing, and each column represents one observed (or derived) fact or attribute of that instance, but the collection of all instances of these row data structures also represents the logical set or population of these things.

The strategy for applying meaning to these data structures begins when the decision is made to treat the entirety of each record as the representation of a member of a population of like things. Being similar, then, a set of fields is conceived to capture various detailed observations regarding the things. These fields are intended to capture details about both how each thing is different from the other things in the collection, but also how different things may share similarities. Much of the business logic of the application system will be consumed by the comparisons between individual things, and the mathematical derived counts (and other metrics) of those sets of things (and of subsets within). Using the computer to compare the bit sequences contained in each field, the computer will indicate whether these contents are the same or different between different instances. Humans will then interpret the results of these comparisons by projecting the conclusion out of the computer and into the conceptual world.

For example, let’s say that we have defined the computer sequence “10101010” to represent a reference to a specific person, “Julie Smith”. If we take two different instances of bit sequences and compare them in the computer, the computer will tell us if they are the same or not. As humans, we would then interpret the purely electro-mechanical result which the computer calculated that “10101010” and “10101010” are the same as an indication that the two instances of these sequences represent the same specific person. Likewise, we would interpret a computer result indicating that two bit sequences were not the same as an indication that different people were being referred to.  This type of projection of meaning from mechanical result to logical inference is fundamental to the way humans use computers.

The specific number of fields and their bit sequence representations (data types)  that are developed within a computer application is entirely dependent on the complexity of the problem domain and the attributes of the objects required to reason over that domain. However, no matter how simple or complex, it is the projection of meaning onto the representation of these attributes in the computer and the projection of an interpretation onto the results of the computer comparisons of the physical representations which makes the computer the powerful engine that it is in our society.

How Row Subsets Represent Subpopulations
How Row Subsets Represent Subpopulations

 

Leave a comment