Consensus gentium
An ancient criterion of truth, the consensus gentium (Latin for agreement of the peoples), states “that which is universal among men carries the weight of truth” (Ferm, 64). A number of consensus theories of truth are based on variations of this principle. In some criteria the notion of universal consent is taken strictly, while others qualify the terms of consensus in various ways. There are versions of consensus theory in which the specific population weighing in on a given question, the proportion of the population required for consent, and the period of time needed to declare consensus vary from the classical norm.*
* “Consensus theory of truth”, Wikipedia entry, November 8, 2008.
The Data Thesaurus
October 25, 2005
So much of IT’s best practices are taken for granted that no one ever asks if there might be a better way. An example of this is in the area of data standards, enterprise data modeling, and Master Data Management (MDM). The core idea of these initiatives is to try to create a single data dictionary in which every concept important to the enterprise is recorded once, with a single standardized name and definition.
The ideal promoted by this approach is that everyone who works with data in the organization will be much more productive if they all follow one naming convention, and if every data item is documented only once. Sounds logical and practical, and yet when we look around for examples of organizations who have managed to successfully create such a document, complete it for ALL of their systems, even commercial software applications, and then who have kept it maintained and complete for more than a year or two, we find very few. In fact in my experience, which has included a number of valiant efforts, I have found no examples.
When one digs into the anecdotal reasons why such success seems so rare, some mixture of the following statements are often heard:
- The company lost its will, the sponsor left and so they cut the budget.
- It took too long, the business has redirected the staff to focus on “tactical” efforts with short return on investment cycles.
- Even after all that work, no one used it, so it was not maintained.
- We were fine until the merger, then we just haven’t been able to keep up with the document and the systems integration/consolidation activities.
- Our division and their division just never agreed.
- We got our part done, but that other group wouldn’t talk to us.
With ultimate failure at the enterprise level the more common experience, it’s surprising that no one involved in the performance and practice of data standardization has questioned what might really be going on. Lots of enterprises have had successes within smaller efforts. Major lines of business may successfully establish their own data dictionaries for specific projects. Yet very few, if any, have succeeded in translating these tactical successes into truly enterprise-altering programs.
What’s going on here is that the search for the “consensus gentium” as the Romans called it, the universal agreement on the facts and nature of the world by a group of individuals, is a never-ending effort. Staying abreast of the changes in the world that affect this consensus is increasingly impossible, if it ever was possible.
The point here is that IT and the enterprise needs to stop trying to create a single universal dictionary. It must be recognized that such a comprehensive endeavor is an impossible task for all but the most extravagantly financed IT organizations. It can’t be done because the different contexts of the enterprise are constantly morphing and changing. Keeping abreast of changes costs a tremendous amount in both time and effort, and dollars. Proving an appropriate return on investment for such an ongoing endeavour is problematic, and suffers from the problem of diminishing returns.
A better approach must be out there. One that takes advantage of the tactical point solutions that most enterprises seem to succeed with, while taking into account the practical limitations imposed by the constant press of change that occurs in any “living” enterprise. This blog attempts to document first-principles affecting the entire endeavor, and will build a case based on the human factors which create the problem in the first place.
A better approach?
Why not build data dictionaries for individual systems or even small groups (as is often the full extent attempted and completed in most organizations). But instead of trying to extend these point solutions into a universal solution, take a different approach, namely the creation of a “data thesaurus” in which portions of each context are related to each other as synonyms, but only as needed for some particular solution. This thesaurus would track the movement of information through the organization by mapping semantics through and across changes in the “syntactics” of the data carrying this information. The thesaurus would need to track the context of a definition, and that definition would be less abstract and more detailed than those created by the current state of the practice. Links across contexts within the organization would be filled in only as practicality required, as the by-product of data integration projects or system consolidation efforts.
What’s wrong with the data dictionary of today:
- obtuse naming conventions (including local standards and ISO)
- abstract data structures that have lost connection with actual data structures
- only one name for a concept, when different contexts may have their own colloquialisms – making it hard for practitioners to find “their data”, and even causing the introduction of additional entries for synonyms and aliases as if they were separate things
- abstracted or generalized definitions reflecting the “least common denominator” and losing the specificity and nuance present in the original contexts
- loss of variations and special cases
- detachment from modern software development practices like Agile, XP and even SOA
A Parable for Enterprise Data Standardization (as practiced today)
The enterprise data standard goal of choosing “one term for one concept with one definition” would be the same thing as if the United Nations convened an international standards body whose charter would be to review all human languages and then select the “one best” term for every unique concept in the world. Selection, of course, would be fairly determined to ensure that the term that “best captures” the concept, no matter what the original language was in which the idea was first expressed, would be the term selected. Besides the absurd nature of such a task, consider the practical impossibility of such a task.
First, getting sufficient representation of the world’s languages to make the process fair would require a lot of time. Once started, think of the months of argument, the years and decades that would pass before a useful body of terms would be established and agreed upon. Consider also that while these eggheads were deliberating, life around them would continue. How many new words or concepts would be coined in every language before the first missives would come out of this body? Once an initial (partial) standard was chosen, then the proselytizing would begin. Consider the difficult task of convincing the entire world to stop using the terms of their own language. How would the sale be made? Appealing to some future when “everyone will speak the same language” thus eliminating all barriers to communication most likely. As a person in this environment, how do you learn all of those terms – and remember to use them?
The absurdity of this scenario is fairly clear. Then why do so many data standardization efforts approach their very similar problem in the same way? The example above may be extreme, and some will say that I’ve exaggerated the issue, but that’s just the point I’m trying to make. When one talks with the practitioners of data standardization efforts, they almost always believe that the end goal they are striving for is nothing less than the complete standardization of the enterprise. They may realize intellectually that the job may never be finished, but they still believe that the approach is sound, and that if they can just stay at it long enough, they’ll eventually attain the return on investment and justify their long effort.
If the notion of the UN attempting a global standardization effort seems absurd, than why is the best practice of data standardization the very same approach? If we create a continuum for the application of this approach (see figure) starting at the very smallest project (perhaps the definition of the data supporting a small application system used by a subset of a larger enterprise), and ending at this global UN standardization effort, one has to wonder where along this scale does the practical success of the small effort turn into the absurd impossibility of the global effort? If we choose a point on this continuum and say “here and no further” then no doubt arguments will ensue. Probably, there will be individuals who find the parable above to not be ridiculous. Likewise, there will be others who believe that trying any standardization is a waste of time. Others might try to rationally put an end point on the chart at the point representing their current employer. These folks will find, however, that their current employer merges with another enterprise in a few months, which then raises the question is the point of absurdity further out now, at the ends of the combined organization?
Myself, I believe in being practical, as much as possible. The point of absurdity for me is reached whenever the standardization effort becomes divorced from other initiatives of the enterprise and becomes its own goal. When the data standardization focuses on the particular problem at hand, then the return on the effort can be justified. When data standardization is performed for its own sake, no matter how noble or worthy the sentiment expressed behind the effort, then it is eventually going to overextend its reach and fail.
If we all agree that at SOME point on the continuum, attempting data standardization is an absurd endeavor, then we must recognize that there is a limit to the approach of trying to define data standards. The smaller the context, the more the likelihood of success, and the more utility of the standard to that context. Once we have agreed to this premise, the next question that should leap to mind is: Why don’t our data dictionaries, tools, methods, and best practices record the context within which they are defined? Since we agree we must work within some bounds or face an absurdly huge task, why isn’t it clear from our data dictionaries that they are meaningful only within a specific context?
The XML thought leaders have recognized the importance of context, and while I don’t believe their solution will ultimately solve the problems presented by the common multi-context environments we find ourselves working in, it is at least an attempt. This construct is the “namespace” used to unambiguously tie an XML tag to a validating schema.
Data standards proponents, and many data modelers have not recognized the importance and inevitability of context to their work. They come from a background where all data must be rationalized into a single, comprehensive model, resulting in the loss of variation, ideosyncracy and colloquialism from their environments. These last simply become the “burden of legacy systems” which are anathema to the state of the practice.
Filed under: Common Information Model, Context, data, Data Thesaurus, Master Data Management, Software | Tagged: Common Information Model, data, Data Thesaurus, Master Data Management, Software | 2 Comments »