Unmanage Master Data Management

Master Data Management (MDM) is a discipline that tries to create, maintain, and manage a single, standardized conceptual information model of all of an enterprise’s data structures. Its goal is that all IT systems will eventually be unified under a single semantic description, so that information from all corners of the business can be understood and managed as a whole.

While I agree with the ultimate goal of information interoperability across the enterprise, I disagree with the approach usually taken to get there. A strategy that I might call:

  • Data Management with Multiple Masters
  • Uncontrolled/Unmanaged Master Data Management
  • Associative Search on an Uncontrolled Vocabulary
  • Emergent Data Management (added 2015)
  • Master-less Data Management (added 2015)

takes a different approach. The basic strategy is to permit multiple vocabularies to exist in the enterprise (one for each major context that can be identified). We then build a cross-reference of the semantics describing only the edges between these contexts (the “bridging” contexts between organizations within the enterprise), where interfaces exist. The interfaces described and captured in this way would include non-automated ones (e.g., human-mediated interfaces) as well as the traditionally documented software interfaces.

Instead of requiring that the entire content of each context be documented and standardized, this approach would provide the touchpoints between contexts only. New software (or business) integration tasks which the enterprise takes on would require new interfaces and new extensions of mappings, but would only have to cover the content of the new bridging context.
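As a sketch of what such an edge-only cross-reference might look like, the following Python fragment records just the mappings that cross one interface, leaving everything internal to either context undocumented. All names here (the contexts, the element names, the notes) are hypothetical, invented purely for illustration:

```python
# Hypothetical sketch: an edge-only cross-reference between two contexts.
# Only the elements that cross the interface are mapped; the rest of each
# context's vocabulary is deliberately left undocumented.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ElementMapping:
    source_element: str   # name in the source context's vocabulary
    target_element: str   # name in the target context's vocabulary
    note: str = ""        # semantic equivalence notes, unit conversions, etc.


@dataclass
class BridgingContext:
    source_context: str
    target_context: str
    mappings: list = field(default_factory=list)


# Only the touchpoint between two (invented) organizations is described:
bridge = BridgingContext(
    source_context="Sales",
    target_context="Billing",
    mappings=[
        ElementMapping("cust_no", "account_id", "same identifier space"),
        ElementMapping("order_total", "invoice_amount", "both in USD"),
    ],
)
```

A new integration task would extend this catalog with another BridgingContext covering only the new interface, rather than forcing either side to adopt a global vocabulary.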

Information collected and maintained under this strategy would include the categorization of data element structures as follows:

  1. Data structure syntax and basic manipulations
  2. Origin Context and element Role (for example, markers versus non-markers)
  3. Storage types: transient (not stored), temporary (e.g., staging schemas and work tables), and permanent (e.g., structures intended to provide the longest-term storage)
  4. “Pass-through” versus “consumed” data elements. Also called “traveller” and “fodder”, these data structures and elements have no meaning and possibly no existence (respectively) in the Target Context.

For data symbols that are just “passing through” one context to another, these would be the traveller symbols (as discussed on one of my permanent pages and in the glossary) whose structure is simply moved unchanged from one context to the next, until it reaches a context which recognizes and uses them. “Fodder” symbols are used to trigger some logic or filter to change the operation of the bridging context software, but once consumed, do not move beyond the bridge.
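The traveller/fodder distinction can be sketched in code. In this hypothetical Python fragment, a bridging step passes traveller elements through unchanged and consumes fodder elements to steer its own behavior; the record fields and key names are invented for illustration:

```python
# Hypothetical sketch of a bridging step handling the two kinds of elements
# described above: "travellers" move through unchanged, while "fodder" is
# consumed by the bridge itself and does not move beyond it.
def bridge_record(record: dict, fodder_keys: set, consumed: dict) -> dict:
    """Return the record as it leaves the bridge.

    Fodder values steer the bridge's own logic (recorded here in
    `consumed`) and are dropped; everything else travels on unchanged.
    """
    out = {}
    for key, value in record.items():
        if key in fodder_keys:
            consumed[key] = value   # consumed: triggers bridge logic only
        else:
            out[key] = value        # traveller: moved through unchanged
    return out


flags = {}
outbound = bridge_record(
    {"cust_id": 42, "legacy_tag": "A7", "route_code": "FAST"},
    fodder_keys={"route_code"},
    consumed=flags,
)
# route_code has been consumed by the bridge; cust_id and legacy_tag
# travel on even though the bridge assigns them no meaning of its own.
```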

The problem that I have encountered with MDM efforts is that they don’t try to scope themselves to what is RECOGNIZABLY REQUIRED. Instead, the focus is on the much larger, much riskier effort of attempting to eliminate local contexts within the enterprise. MDM breaks down the moment it becomes divorced from a practical, immediate attempt to capture just what is needed today. The moment it attempts to “bank” standard symbols ahead of their usage, the MDM process becomes speculative and prescriptive. The likelihood of wasting time on symbology which is ultimately wrong and unused is very high once steps are taken past the interface and into the larger contexts.

Uses of Metamorphic Models in Data Management and Governance

In the Master Data Management arena, Metamorphic Models would allow the capture of the data elements necessary to stitch together an enterprise. By recognizing which information needs to pass as markers or to act as travellers, the scope of the data governance task should be reducible to a practical minimum.

Then the data governance problem can be built up only as needed. The task becomes, properly, just another project-related activity similar to Change Control and Risk Management, instead of the academic exercise into which it often devolves.

The scope of data management should focus on and document 100% of the data being moved across interfaces, whether these interfaces are automated or human-performed. Simple data can just be documented, and the equivalence of syntax and semantics captured. Data elements that act as markers for the processes should be recorded, and all data elements and structures intended merely to make the trip as travellers should be indicated.

This approach addresses the high-value portion of the enterprise’s data structures, while minimizing work on documenting concepts which only apply within a particular context.

EXAMPLE: Syntactic Medium in an Anchor State

Just what is an “Anchor State”? An example will explain this better.

Take an “extract-transform-load” (ETL) process in a Data Warehouse application that copies data from one system (a database) to another based on some criteria. In particular, the example organization needs to capture customers’ names for use in a business intelligence application measuring the success of marketing mass-mailings. An ETL process is defined (in terms used within the Metamorphic Modeling convention) as a transformation from a source Anchor State to a target Anchor State. The syntactic medium of the source application contains a table called “EMPLOYEE”. This data structure has been co-opted by the user organization to include customer information. The organization has chosen to use this table to represent customers since it is the only data structure available in its system that associates a person’s name with an address, telephone number, and e-mail account, and it has no other means of recording this information about its customers.

The source Anchor State has been constrained, therefore, to the “EMPLOYEE” data structure, and to the set of symbols within that medium which represent customers. That same medium, in a different Anchor State, may have been constrained to the set of “managers”.

So, how does the ETL process recognize the set of symbols within the “EMPLOYEE” data structure that represent customers? The user organization realized that the application containing this data structure also contained a table called “EMPLOYEETYPE” which contains user-defined codes for defining types of employees. This table’s primary key is a coded value stored in a field named “EMPTYPE”, which also appears as a foreign key reference in the “EMPLOYEE” table. The organization decided to create a symbol, namely a code in this EMPLOYEETYPE table, to represent the “customer type”. Then, whenever they want to record information about a customer in the EMPLOYEE table, they assign this code value to the “EMPTYPE” column on the row representing this customer.

The following figure depicts a portion of an “Entity Relation Diagram” which defines the “EMPLOYEE” and “EMPLOYEETYPE” tables in this application. It also shows a subset of the values contained within the “EMPLOYEETYPE” table, as defined by this organization.

Example Employee Table Data Model

As can be seen in the figure, there are actually three different “EMPLOYEETYPE” codes defined to represent the concept of “customer”. These are EMPTYPE values 5, 6, and 7, representing “Customers”, “Premier Customers” (which the organization has defined as important customers), and “Good Customers”. Aside from the “business practice” that these three types can be used to differentiate “customers” from other types of entities, there is nothing intrinsic to the structures that indicates this. Hence, from an application standpoint, all types are equal and will be manipulated in the same way.

From the point of view of the ETL under development, however, the significance of the usage of these three codes is critical to its proper operation. The source Anchor State for the ETL is defined as the set of raw symbols within the “EMPLOYEE” table that have one of the “customer” type code values in their corresponding EMPTYPE column. For this ETL (transformation), the EMPTYPE column and its values represent the semantic marker for “customer” in this source Anchor State. The Anchor State therefore consists of the data structures “EMPLOYEE” and “EMPLOYEETYPE”, and the constraint that only the rows of the “EMPLOYEE” table where the EMPTYPE value is 5, 6, or 7 define what the ETL should consider to be “customers”.
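The source Anchor State’s constraint can be made concrete with a small, runnable sketch. Here an in-memory SQLite database stands in for the source application; the table and column names follow the example, but the specific rows (and the non-customer code 1) are invented for illustration:

```python
# A minimal runnable sketch of the source Anchor State's constraint, using
# an in-memory SQLite stand-in for the example's EMPLOYEE structures.
# The sample rows and the non-customer type code (1) are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEETYPE (
        EMPTYPE     INTEGER PRIMARY KEY,
        DESCRIPTION TEXT
    );
    CREATE TABLE EMPLOYEE (
        ID      INTEGER PRIMARY KEY,
        NAME    TEXT,
        EMPTYPE INTEGER REFERENCES EMPLOYEETYPE (EMPTYPE)
    );
    INSERT INTO EMPLOYEETYPE VALUES
        (1, 'Employee'),
        (5, 'Customer'),
        (6, 'Premier Customer'),
        (7, 'Good Customer');
    INSERT INTO EMPLOYEE VALUES
        (1, 'Ann Staffer', 1),
        (2, 'Bob Buyer',   5),
        (3, 'Cara Client', 6);
""")

# The EMPTYPE marker constrains the Anchor State to customer rows only:
customers = conn.execute(
    "SELECT NAME FROM EMPLOYEE WHERE EMPTYPE IN (5, 6, 7) ORDER BY ID"
).fetchall()
# Ann Staffer is excluded; only the rows marked 5, 6, or 7 qualify.
```

Nothing in the schema itself distinguishes customers from employees; the `WHERE EMPTYPE IN (5, 6, 7)` clause is precisely where the ETL encodes the marker that defines this Anchor State.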


All pages are Copyright (C) 2004, 2009 by Geoffrey A. Howe
All Rights Are Reserved
