Chasing the Chimera: Searching for Universal Truth in the Data Center

There’s a widespread belief in the data community (sometimes stated and sometimes just implied) that pursuing a universal Single Version of Truth not only has “obvious technical merits” but is crucial to our collective success. I have spent an entire career helping customers in many different industries codify and build business systems, including participating in more than a few attempts at establishing a single version of truth by standardizing data. Even so, in recent years I have come to a realization that surprised me: we, as an industry, have been chasing an unreachable, and possibly undesirable, chimera.

The usual answer to the size of the task is the old riddle about how to swallow an elephant: take small bites, and just keep at it. This metaphor is invoked whenever a large project to standardize an enterprise’s data begins. The problem is that creating an all-encompassing, single standard for all of the data in an organization is not really comparable to eating an elephant carcass, however large. You’re not eating a finite mass of elephant at all! A more fitting metaphor is chewing the grass at the edge of a vast plain, where it just keeps growing faster than you can chew!

The value of some data standardization cannot be denied. Re-engineering selected areas can improve data quality, timeliness, and actual value. Certainly we have seen that the wheels of e-commerce can be sped up by careful selection of the right standard. Some practitioners, however, feel that this “piecemeal” approach is insufficient and may even detract from the ultimate goal. Having seen how much good came from a little standardization and rationalization, they conclude that taking the practice to its logical conclusion should reap the ultimate benefit.

The problem with this logic is that it fails to account for the cost of completion. No matter how valuable the end state is expected to be, the sheer number of systems coming on and off line, changes to the business, external business partners, external standards bodies, and mergers and acquisitions means that these efforts will never reach it.

Some people may agree with me on this point, and others may not. However, even those who agree that success is ultimately unlikely may still take the same old approach to the problem: convening a steering committee of diverse end users, locking them in a room for weeks on end, and forcing them to define an abstract but universal data dictionary, only to find that major portions are already out of date, that major subject areas are still missing, or, worse still, that most people outside this pressure-cooker committee disagree with or do not understand the result!

An alternative approach to this search for the universal would be to recognize that diversity of meaning and representation will be a given in any sufficiently large organization of humans, and to address this inevitability directly. This can be accomplished by creating a “federated data dictionary” following these rules:

  1. Don’t attempt to “swallow the elephant”; try “mapping the terrain” instead, by creating well-documented data dictionaries for each context.
  2. Document the context that defined a concept in the first place.
  3. Only standardize as much as is necessary to knit together those portions of the enterprise that must work together, and do no more.
  4. Create a “data thesaurus”, alongside the data dictionaries, that describes and documents the equivalence of meaning between the data structures of different contexts, but only for those contexts that must touch each other across the enterprise.
  5. Focus on the points of integration between the contexts first, where data flows from one context to another.
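The rules above describe a structure rather than a product, so what follows is only a minimal Python sketch under stated assumptions. The class names (`ContextDictionary`, `FederatedDictionary`), the sample contexts (“Sales”, “Billing”), and all field names are hypothetical illustrations, not an established tool or schema. Each context keeps its own dictionary and records where its concepts came from (rules 1 and 2), while a thesaurus links terms only where two contexts must exchange data, and is applied at those integration points (rules 3 to 5).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Term:
    """One entry in a context's local data dictionary (hypothetical shape)."""
    name: str
    definition: str


@dataclass
class ContextDictionary:
    """Rules 1 and 2: a dictionary scoped to one context, noting its origin."""
    context: str
    origin: str  # who or what defined these concepts in the first place
    terms: dict[str, Term] = field(default_factory=dict)

    def define(self, name: str, definition: str) -> None:
        self.terms[name] = Term(name, definition)


class FederatedDictionary:
    """Per-context dictionaries plus a sparse thesaurus of equivalences (rule 4)."""

    def __init__(self) -> None:
        self.contexts: dict[str, ContextDictionary] = {}
        # (context, term) -> set of equivalent (context, term) pairs
        self.thesaurus: dict[tuple[str, str], set[tuple[str, str]]] = {}

    def add_context(self, ctx: ContextDictionary) -> None:
        self.contexts[ctx.context] = ctx

    def link(self, a: tuple[str, str], b: tuple[str, str]) -> None:
        """Declare two terms equivalent; both must already be defined locally."""
        for ctx, term in (a, b):
            if term not in self.contexts[ctx].terms:
                raise KeyError(f"{term!r} is not defined in context {ctx!r}")
        # Equivalence is symmetric, so record the link in both directions.
        self.thesaurus.setdefault(a, set()).add(b)
        self.thesaurus.setdefault(b, set()).add(a)

    def translate(self, source_ctx: str, term: str, target_ctx: str) -> str | None:
        """Return the equivalent term in target_ctx, or None if nothing is mapped."""
        # Sorted so that the result is deterministic if several links exist.
        for ctx, other in sorted(self.thesaurus.get((source_ctx, term), ())):
            if ctx == target_ctx:
                return other
        return None

    def translate_record(
        self, record: dict[str, object], source_ctx: str, target_ctx: str
    ) -> tuple[dict[str, object], list[str]]:
        """Rule 5: rename fields as data crosses an integration point.

        Fields with no thesaurus entry are reported rather than guessed at.
        """
        out: dict[str, object] = {}
        unmapped: list[str] = []
        for name, value in record.items():
            target = self.translate(source_ctx, name, target_ctx)
            if target is None:
                unmapped.append(name)
            else:
                out[target] = value
        return out, unmapped


if __name__ == "__main__":
    fed = FederatedDictionary()
    sales = ContextDictionary("Sales", origin="CRM team")
    sales.define("client_id", "Identifier of the customer who placed the order")
    sales.define("deal_size", "Expected contract value")
    billing = ContextDictionary("Billing", origin="Finance system")
    billing.define("account_no", "Identifier of the customer being invoiced")
    fed.add_context(sales)
    fed.add_context(billing)
    # Only the concept that crosses the Sales -> Billing boundary is mapped.
    fed.link(("Sales", "client_id"), ("Billing", "account_no"))

    record, unmapped = fed.translate_record(
        {"client_id": "C-42", "deal_size": 10000}, "Sales", "Billing"
    )
    print(record)    # {'account_no': 'C-42'}
    print(unmapped)  # ['deal_size']
```

In this sketch, unmapped fields are returned as a list instead of being forced into a shared vocabulary, so a gap at an integration point surfaces as something to review locally rather than as a reason to launch another round of enterprise-wide standardization.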

Isn’t it time we recognized that diversity exists? Maybe if we stop the never-ending chase for the universal, we’ll realize that diversity has its value too, and start doing a better job of accommodating it.


