What’s in a Name: Not That Much, Actually

The referenced paper is seminal. The comments that appear here are largely unaltered from when I first wrote them back in 1989. I follow this older writing with some additional conclusions, looking back over twenty years of experience working with data.

September 23, 1989:

When parsing a record-based system’s data, the software developer is faced with all of the problems of data structure semantics described by William Kent in “Limitations of Record Based Information Models”, ACM Transactions on Database Systems 4(1), March 1979 (also in John Mylopoulos and Michael Brodie (eds), Readings in Artificial Intelligence and Databases, Morgan Kaufmann, San Mateo, California, 1989 [20 pp]).

Field naming problems can be handled by identifying every field by a field number, then providing synonyms for each field. I gave each field a “name” similar to its name in the original system, which was possibly meaningless. This name was there to support maintenance and the mapping of information between systems. Then, using synonyms, I could give the field a more semantically significant name. The record is just a place keeper; the concept represented is buried in the code supporting the use of the record, or perhaps in an agreement (explicit or implicit) among the designers and users of the system. When this agreement is verbal, or worse, implied by training, that’s when the trouble arises: idiosyncratic usage enters the picture, along with the possibly disastrous loss of meaning accompanying the departure of those whose concept is being represented.
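The field-number-plus-synonyms idea can be sketched in a few lines of Python. This is only an illustration under assumptions, not the original implementation: the class names, field numbers, legacy names and synonyms below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDef:
    number: int            # stable identity of the field; never changes
    legacy_name: str       # name as found in the original system
    synonyms: tuple = ()   # more meaningful names layered on top


class RecordSchema:
    """Maps every known name of a field to its field number."""

    def __init__(self, fields):
        self._number_by_name = {}
        for f in fields:
            for name in (f.legacy_name, *f.synonyms):
                if name in self._number_by_name:
                    raise ValueError(f"name {name!r} is already in use")
                self._number_by_name[name] = f.number

    def get(self, record, name):
        """Look up a value by any name; the record itself is keyed by number."""
        return record[self._number_by_name[name]]


# Hypothetical fields and values, for illustration only.
schema = RecordSchema([
    FieldDef(1, "CM-FLD-07", ("customer_credit_limit",)),
    FieldDef(2, "CM-FLD-08", ("customer_status", "account_status")),
])
record = {1: 5000, 2: "A"}

print(schema.get(record, "CM-FLD-07"))              # via the legacy name
print(schema.get(record, "customer_credit_limit"))  # via a synonym, same field
```

The record only ever stores values against numbers. Every name, legacy or meaningful, is a lookup aid that can be added or retired without touching the stored data.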

November 1, 2009:

This note was just one of several ideas I was toying with as I worked on a thesis paper for my Masters. The project was to integrate and add expert system capabilities (using Prolog) to an existing business application built on top of COBOL fixed record structures. What the note describes is the idea I used to get around the very badly named columns of the COBOL records in order to improve the effectiveness and readability of the Prolog code. The basic trick was to put multiple names for the same data structures into the Prolog knowledge base, and to attach to these Prolog structures logic statements that allowed logical constraints to be stated in nearly human-language terms.
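The original work was done in Prolog, and that code is not reproduced here. As a rough Python analogue, assuming a made-up fixed-width layout and made-up field names, the trick looks something like this: parse the record by position, alias the legacy names, then write the constraint using the readable aliases.

```python
# Hypothetical fixed-width layout: (legacy_name, start, length).
LAYOUT = [
    ("FLD-01", 0, 6),
    ("FLD-02", 6, 8),
    ("FLD-03", 14, 8),
]

# Hypothetical readable aliases for the legacy names.
ALIASES = {
    "account_number": "FLD-01",
    "credit_limit": "FLD-02",
    "current_balance": "FLD-03",
}


def parse(line):
    """Slice a fixed-width line into a dict keyed by legacy field name."""
    return {name: line[start:start + length].strip()
            for name, start, length in LAYOUT}


def value(record, name):
    """Fetch a field by alias or by legacy name."""
    return record[ALIASES.get(name, name)]


def balance_within_credit_limit(record):
    """A constraint stated in terms of the readable names."""
    return int(value(record, "current_balance")) <= int(value(record, "credit_limit"))


# Illustrative record: account A12345, limit 5000, balance 1250.
line = "A12345" + "00005000" + "00001250"
record = parse(line)
print(balance_within_credit_limit(record))
```

The constraint function never mentions `FLD-02` or `FLD-03`, yet it operates on exactly those positions in the record.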

In later years, I have come to recognize that this problem of naming conventions within code, while important to an extent, is not as important as some practitioners think. The fact of the matter is that the computer could not care less what the column name of a table is, or the variable name within a program, etc. For all the computer cares, so long as the programming code consistently references the right data structure at the right moment, the actual references might as well be unique, semantically meaningless numbers.
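A small, contrived Python illustration of the point (the function names and numeric keys are invented): the two functions below perform the same computation, one with descriptive names and one with arbitrary numbers as references.

```python
def order_total(unit_price, quantity, tax_rate):
    """Readable version: names describe the intended content."""
    subtotal = unit_price * quantity
    return subtotal + subtotal * tax_rate


def f17(s):
    """Same computation; every reference is a meaningless number."""
    s[904] = s[311] * s[42]
    return s[904] + s[904] * s[7]


named = order_total(unit_price=10, quantity=3, tax_rate=0.5)
numbered = f17({311: 10, 42: 3, 7: 0.5})
print(named == numbered)
```

The machine treats both versions identically. Only a human reader has any reason to prefer the first.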

Naming conventions are for the humans who have to write and maintain the code, or, more generally, who have to directly interact with the data structures. And while there can often be contentious, protracted debate amongst software developers on the “right” naming convention for various situations, in my mind, it is not usually worth the amount of attention it gets during development.

If left to my own devices, the naming convention I try to impose is as richly semantic as possible. Column names and table names come as close as possible to expressing the intended content, down to adding qualifying adjectives and role names to an appropriate, context-specific noun. I draw the names from the problem domain for which the software is being written. I also try to be very consistent in the use of names and name parts from one end to the other of whatever system I’m working on.

If the system already has a naming convention, so long as it can be written down as a set of repeatable rules, I’ll use whatever it is. Oftentimes I find I have to rationalize and standardize terms used previously, because different developers may have used different conventions at different times.

I have participated in efforts at making a universal naming convention, and these have all ultimately hit a wall and been stopped (the reasons for this have been to this point the primary subject of this blog – even if I haven’t explicitly described the scenario yet). Namely, the cross-context politics, long initial duration, required ongoing maintenance activities and ultimately the diminishing returns of such efforts cause them to sink from their own weight.

But even when I have had complete control over the data structure development, and I have had time to craft the “perfect” name for each column, even when I’ve checked and double checked and triple checked that I have consistently applied the same naming convention from one end of the system to the next, once my software has gone into use, it hasn’t taken long for the user community to start redefining the meaning of some aspect of the data structure. Or, the requirement changes and the programming team must change the usage of one of my finely crafted data structures so that it supports a new meaning, not reflected in that finely crafted name.

This can be frustrating, and it can also pose a long term hazard to the maintenance of the system, as either the original meaning or the new meaning becomes a minority of the usage. But it is not the end of the world, and it does not always break the software if the code is changed to handle the new meaning correctly.

However, it does mean that the actual name of the field no longer reflects the contents it holds. But if the code is working properly, the name no longer matters to the operation of the system. Plus, the maintenance problem such a change presents is also no big deal, so long as the revised meaning is captured in an appropriate dictionary and made available to the programming team for future reference.
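What an “appropriate dictionary” looks like will vary from shop to shop. As one possible shape, here is a minimal Python sketch that keeps a dated history of meanings per field, so the programming team can ask what a field meant at any point in time. The class, the field name and the descriptions are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Meaning:
    effective: date     # date from which this meaning applies
    description: str    # what the field holds from that date on


class DataDictionary:
    """Keeps the history of meanings attached to each field name."""

    def __init__(self):
        self._entries = {}

    def record_meaning(self, field_name, effective, description):
        history = self._entries.setdefault(field_name, [])
        history.append(Meaning(effective, description))
        history.sort(key=lambda m: m.effective)  # keep oldest first

    def meaning_on(self, field_name, when):
        """Return the description in force on `when`, or None if there is none."""
        in_force = [m for m in self._entries.get(field_name, [])
                    if m.effective <= when]
        return in_force[-1].description if in_force else None


# Hypothetical field whose usage drifted away from its name.
dd = DataDictionary()
dd.record_meaning("ship_date", date(2005, 1, 1), "Date the order left the warehouse")
dd.record_meaning("ship_date", date(2008, 6, 1), "Date the carrier confirmed pickup")

print(dd.meaning_on("ship_date", date(2009, 11, 1)))
```

The column keeps its old name throughout. The dictionary, not the name, is what tells a maintainer which meaning the code is expected to honor.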

Why is this the case? The truth is that the data structure stores symbols which have a meaning within a context defined by the USERS of the software. The data structures merely represent the SYNTAX of the symbols, consisting of the data type of the symbol and the manipulations of the symbol performed by the code. So long as the manipulations are applied appropriately to the correct part of the syntax, no matter HOW it is named, the software will manage the MEANING intended by the USERS, in spite of, not because of, the naming convention of the data structure.

Hence, what’s in a name used on a data structure? From the computer’s point of view, not so much. From the human’s point of view, since the meaning can change over time, the name shouldn’t be trusted until the code has been reviewed to confirm the content. So there again, not so much…
