Is MDM An Attempt to Reach “Consensus Gentium”?

Consensus gentium

An ancient criterion of truth, the consensus gentium (Latin for agreement of the peoples), states “that which is universal among men carries the weight of truth” (Ferm, 64). A number of consensus theories of truth are based on variations of this principle. In some criteria the notion of universal consent is taken strictly, while others qualify the terms of consensus in various ways. There are versions of consensus theory in which the specific population weighing in on a given question, the proportion of the population required for consent, and the period of time needed to declare consensus vary from the classical norm.*

* “Consensus theory of truth”, Wikipedia entry, November 8, 2008.

The Data Thesaurus

October 25, 2005

 So much of IT’s best practices are taken for granted that no one ever asks if there might be a better way. An example of this is in the area of data standards, enterprise data modeling, and Master Data Management (MDM). The core idea of these initiatives is to try to create a single data dictionary in which every concept important to the enterprise is recorded once, with a single standardized name and definition.

The ideal promoted by this approach is that everyone who works with data in the organization will be much more productive if they all follow one naming convention, and if every data item is documented only once. Sounds logical and practical, and yet when we look around for examples of organizations who have managed to successfully create such a document, complete it for ALL of their systems, even commercial software applications, and then who have kept it maintained and complete for more than a year or two, we find very few. In fact in my experience, which has included a number of valiant efforts, I have found no examples.

When one digs into the anecdotal reasons why such success seems so rare, some mixture of the following statements are often heard:

  1. The company lost its will, the sponsor left and so they cut the budget.
  2. It took too long, the business has redirected the staff to focus on “tactical” efforts with short return on investment cycles.
  3. Even after all that work, no one used it, so it was not maintained.
  4. We were fine until the merger, then we just haven’t been able to keep up with the document and the systems integration/consolidation activities.
  5. Our division and their division just never agreed.
  6. We got our part done, but that other group wouldn’t talk to us.

With ultimate failure at the enterprise level the more common experience, it’s surprising that no one involved in the performance and practice of data standardization has questioned what might really be going on. Lots of enterprises have had successes within smaller efforts. Major lines of business may successfully establish their own data dictionaries for specific projects. Yet very few, if any, have succeeded in translating these tactical successes into truly enterprise-altering programs.

What’s going on here is that the search for the “consensus gentium” as the Romans called it, the universal agreement on the facts and nature of the world by a group of individuals, is a never-ending effort. Staying abreast of the changes in the world that affect this consensus is increasingly impossible, if it ever was possible.

 The point here is that IT and the enterprise needs to stop trying to create a single universal dictionary. It must be recognized that such a comprehensive endeavor is an impossible task for all but the most extravagantly financed IT organizations. It can’t be done because the different contexts of the enterprise are constantly morphing and changing. Keeping abreast of changes costs a tremendous amount in both time and effort, and dollars. Proving an appropriate return on investment for such an ongoing endeavour is problematic, and suffers from the problem of diminishing returns.

 A better approach must be out there. One that takes advantage of the tactical point solutions that most enterprises seem to succeed with, while taking into account the practical limitations imposed by the constant press of change that occurs in any “living” enterprise. This blog attempts to document first-principles affecting the entire endeavor, and will build a case based on the human factors which create the problem in the first place.

 A better approach?

Why not build data dictionaries for individual systems or even small groups (as is often the full extent attempted and completed in most organizations). But instead of trying to extend these point solutions into a universal solution, take a different approach, namely the creation of a “data thesaurus” in which portions of each context are related to each other as synonyms, but only as needed for some particular solution. This thesaurus would track the movement of information through the organization by mapping semantics through and across changes in the “syntactics” of the data carrying this information. The thesaurus would need to track the context of a definition, and that definition would be less abstract and more detailed than those created by the current state of the practice. Links across contexts within the organization would be filled in only as practicality required, as the by-product of data integration projects or system consolidation efforts.

 What’s wrong with the data dictionary of today:

  1. obtuse naming conventions (including local standards and ISO)
  2. abstract data structures that have lost connection with actual data structures
  3. only one name for a concept, when different contexts may have their own colloquialisms – making it hard for practitioners to find “their data”, and even causing the introduction of additional entries for synonyms and aliases as if they were separate things
  4. abstracted or generalized definitions reflecting the “least common denominator” and losing the specificity and nuance present in the original contexts
  5. loss of variations and special cases
  6. detachment from modern software development practices like Agile, XP and even SOA

A Parable for Enterprise Data Standardization (as practiced today)

The enterprise data standard goal of choosing “one term for one concept with one definition” would be the same thing as if the United Nations convened an international standards body whose charter would be to review all human languages and then select the “one best” term for every unique concept in the world. Selection, of course, would be fairly determined to ensure that the term that “best captures” the concept, no matter what the original language was in which the idea was first expressed, would be the term selected. Besides the absurd nature of such a task, consider the practical impossibility of such a task.

First, getting sufficient representation of the world’s languages to make the process fair would require a lot of time. Once started, think of the months of argument, the years and decades that would pass before a useful body of terms would be established and agreed upon. Consider also that while these eggheads were deliberating, life around them would continue. How many new words or concepts would be coined in every language before the first missives would come out of this body? Once an initial (partial) standard was chosen, then the proselytizing would begin. Consider the difficult task of convincing the entire world to stop using the terms of their own language. How would the sale be made? Appealing to some future when “everyone will speak the same language” thus eliminating all barriers to communication most likely. As a person in this environment, how do you learn all of those terms – and remember to use them?

The absurdity of this scenario is fairly clear. Then why do so many data standardization efforts approach their very similar problem in the same way? The example above may be extreme, and some will say that I’ve exaggerated the issue, but that’s just the point I’m trying to make. When one talks with the practitioners of data standardization efforts, they almost always believe that the end goal they are striving for is nothing less than the complete standardization of the enterprise. They may realize intellectually that the job may never be finished, but they still believe that the approach is sound, and that if they can just stay at it long enough, they’ll eventually attain the return on investment and justify their long effort.

If the notion of the UN attempting a global standardization effort seems absurd, than why is the best practice of data standardization the very same approach? If we create a continuum for the application of this approach (see figure) starting at the very smallest project (perhaps the definition of the data supporting a small application system used by a subset of a larger enterprise), and ending at this global UN standardization effort, one has to wonder where along this scale does the practical success of the small effort turn into the absurd impossibility of the global effort? If we choose a point on this continuum and say “here and no further” then no doubt arguments will ensue. Probably, there will be individuals who find the parable above to not be ridiculous. Likewise, there will be others who believe that trying any standardization is a waste of time. Others might try to rationally put an end point on the chart at the point representing their current employer. These folks will find, however, that their current employer merges with another enterprise in a few months, which then raises the question is the point of absurdity further out now, at the ends of the combined organization?

Where Is The Threshold of Absurdity in Data Standardization?

Where Is The Threshold of Absurdity in Data Standardization?

Myself, I believe in being practical, as much as possible. The point of absurdity for me is reached whenever the standardization effort becomes divorced from other initiatives of the enterprise and becomes its own goal. When the data standardization focuses on the particular problem at hand, then the return on the effort can be justified. When data standardization is performed for its own sake, no matter how noble or worthy the sentiment expressed behind the effort, then it is eventually going to overextend its reach and fail.

If we all agree that at SOME point on the continuum, attempting data standardization is an absurd endeavor, then we must recognize that there is a limit to the approach of trying to define data standards. The smaller the context, the more the likelihood of success, and the more utility of the standard to that context. Once we have agreed to this premise, the next question that should leap to mind is: Why don’t our data dictionaries, tools, methods, and best practices record the context within which they are defined? Since we agree we must work within some bounds or face an absurdly huge task, why isn’t it clear from our data dictionaries that they are meaningful only within a specific context?

The XML thought leaders have recognized the importance of context, and while I don’t believe their solution will ultimately solve the problems presented by the common multi-context environments we find ourselves working in, it is at least an attempt. This construct is the “namespace” used to unambiguously tie an XML tag to a validating schema.

Data standards proponents, and many data modelers have not recognized the importance and inevitability of context to their work. They come from a background where all data must be rationalized into a single, comprehensive model, resulting in the loss of variation, ideosyncracy and colloquialism from their environments. These last simply become the “burden of legacy systems” which are anathema to the state of the practice.

Unmanage Master Data Management

Master Data Management is a discipline which tries to create, maintain and manage a single, standardized conceptual information model of all of an enterprise’s data structures. Taking as its goal that all IT systems eventually will be unified under a single semantic description so that information from all corners of the business can be understood and managed as a whole.

In my opinion, while I agree with the ultimate goal of information interoperability across the enterprise, I disagree with the approach usually taken to get there. A strategy that I might call:

  • Data Management with Multiple Masters
  • Uncontrolled/Unmanaged Master Data Management
  • Associative Search on an Uncontrolled Vocabulary
  • Emergent Data Management (added 2015)
  • Master-less Data Management (added 2015)

takes a different approach. The basic strategy is to permit multiple vocabularies to exist in the enterprise (one for each major context that can be identified). Then we build a cross reference of the semantics only describing the edges between these contexts (the “bridging” contexts between organizations within the enterprise), where interfaces exist. The interfaces that would be described and captured in this way would include non-automated ones (e.g., human mediated interfaces) as well as the traditionally documented software interfaces.

Instead of requiring that the entire content of each context be documented and standardized, this approach would provide the touchpoints between contexts only. New software (or business) integration tasks which the enterprise takes on would require new interfaces and new extensions of mappings, but would only have to cover the content of the new bridging context.

Information collected and maintained under this strategy would include the categorization of data element structures as follows:

  1. Data structure syntax and basic manipulations
  2. Origin Context and element Role (for example, markers versus non-markers)
  3. Storage types: transient (not stored), temporary (e.g. staging schemas and work tables), permanent (e.g., structures which are intended to provide the longest storage
  4. “Pass-through” versus “consumed” data elements. Also called “traveller” and “fodder”, these data structures and elements have no meaning and possibly no existence (respectively) in the Target Context.

For data symbols that are just “passing through” one context to another, these would be the traveller symbols (as discussed on one of my permanent pages and in the glossary) whose structure is simply moved unchanged from one context to the next, until it reaches a context which recognizes and uses them. “Fodder” symbols are used to trigger some logic or filter to change the operation of the bridging context software, but once consumed, do not move beyond the bridge.

The problem that I have encountered with MDM efforts is that they don’t try to scope themselves to what is RECOGNIZABLY REQUIRED. Instead, the focus is on the much larger, much riskier effort of the attempted elimination of local contexts within the enterprise. MDM breaks down in the moment it becomes divorced from a practical, immediate attempt to capture just what is needed today. The moment it attempts to “bank” standard symbols ahead of their usage, the MDM process becomes speculative, and proscriptive. The likelihood of wasting time on symbology which ultimately is wrong and unused is very high, once steps past the interface and into the larger contexts are taken.

Uses of Metamorphic Models in Data Management and Governance

In the Master Data Management arena, Metamorphic Models would allow the capture of the data elements necessary to stitch together an enterprise. By recognizing the information needed to pass as markers or to act as travellers, the scope of the data governance task should be reducible to a practical minimum.

Then the data governance problem can be built up only as needed. The task becomes, properly, just another project-related activity similar to Change Control and Risk Management, instead of the academic exercise into which it often devolves.

The scope of data management should focus on and document 100% of the data being moved across interfaces, whether these interfaces are automated or human-performed. Simple data can just be documented, and the equivalence of syntax and semantics captured. Data elements that act as markers for the processes should be recorded. Also all data elements/structures intended merely to make the trip as travellers should be indicated.

This approach addresses the high-value portion of the enterprise’s data structures, while minimizing work on documenting concepts which only apply within a particular context.