Value Proposition of Metamorphic Modeling

Integration projects are started for the following reasons: the replacement of core application systems, especially enterprise resource planning (ERP) systems; the creation of enterprise data warehousing or business intelligence capabilities; the establishment of supply chain automation and business-to-business “e-Commerce”; the automation of business processes through workflow; the replacement or introduction of infrastructure, especially “middleware”; the federation and synchronization of corporate systems due to mergers; the establishment of a “service oriented architecture”; and the introduction of Semantic Web technologies, especially the Resource Description Framework (RDF) and the Web Ontology Language (OWL).

Each of these endeavors has its merits and value to the organization, some more than others at different times in the organization’s lifetime. While these projects can share data integration requirements, they also differ vastly in requirements such as throughput, periodicity (the continuum from real-time, instant synchronization to periodic batch update), communication infrastructure, and data storage strategy. Despite the similarities in their data integration requirements, it is these other differences that have led vendors to develop products with highly diverse architectures.

One of the more interesting consequences of the diversity of integration products and architectures is that the vendors often seem unaware of the similarities between their products, especially if those products are marketed to different types of projects. The same can often be said of their customers. As a result, each new integration project is often approached as completely separate from, and unrelated to, other existing or planned integration efforts within the organization. This can mean that the organization assembles different teams of people to staff the different projects. Staffing such projects often focuses on the team’s familiarity with the chosen technologies or the languages used to invoke the integration tool, instead of their ability to understand and think through how the data ought to be integrated for the highest value. This in turn leads each project to invent its own methods for documenting (or, often, not documenting) the data integration.

Very large organizations may, over time, find themselves implementing examples of each of the types of projects listed above. If they don’t take a broad perspective on each problem, they are likely to find themselves with investments in several incompatible data integration solutions, with little or no way of reusing the knowledge of data equivalences that is embedded in each one.

With this system-architectural and market environment in mind, the Metamorphic Modeling Methodology has been defined. Metamorphic Modeling provides a language for describing and capturing the integration design details of all of an organization’s data integration efforts. It presents a standard, reusable way for an organization to produce high-quality designs for data integration of any type and for any purpose. Using it, the organization will find the following capabilities open to it: a “design for integration” can be codified and standardized, and a body of reusable work products can be developed with applicability across the full spectrum of data integration projects, resulting in less redundancy of effort and more consistency of results across the organization; data integration skills can be moved from one platform to another, making the method portable across different problem types; a cadre of practitioners can be trained in the methodology, establishing a team which can produce reliable, consistent results repeatedly, for any data integration project the organization faces; high-quality, consistent integration designs are more easily managed, and their implementation more readily measured, tested and verified; high-value business knowledge can be retained by the organization while projects are freed to find the best-quality and most cost-effective development team for the chosen architecture, even (and especially) when the organization decides to outsource the actual development effort, perhaps offshore; and established, pre-existing data integrations can be reverse-engineered into Metamorphic Modeling conventions, perhaps even automatically, where they can then be used as specifications for re-implementation in a different technology or tool.

EXAMPLE: Syntactic Medium in an Anchor State

Just what is an “Anchor State”? An example will explain this better.

Take an “extract-transform-load” (ETL) process in a Data Warehouse application that copies data from one system (a database) to another based on some criteria. In particular, the example organization needs to capture customers’ names for use in a business intelligence application measuring the success of marketing mass-mailings. An ETL process will be defined (in terms used within the Metamorphic Modeling convention) as a transformation from a source Anchor State (source) to a target Anchor State (target). The syntactic medium of the source application contains a table called “EMPLOYEE”. This data structure has been co-opted by the user organization to include customer information. The organization has chosen to use this table to represent customers because it is the only data structure available in their system that associates a person’s name with an address, telephone number and e-mail account, and the system has no other means of recording this information about its customers.

The source Anchor State has been constrained, therefore, to the “EMPLOYEE” data structure, and to the set of symbols within that medium which represent customers. That same medium, in a different Anchor State, may have been constrained to the set of “managers”.

So, how does the ETL process recognize the set of symbols within the “EMPLOYEE” data structure that represent customers? The user organization realized that the application containing this data structure also contained a table called “EMPLOYEETYPE”, which holds user-defined codes for defining types of employees. This table’s primary key is a coded value stored in a field named “EMPTYPE”, which also appears as a foreign key reference in the “EMPLOYEE” table. The organization decided to create a symbol, namely a code in this “EMPLOYEETYPE” table, to represent the “customer type”. Then, whenever they want to record information about a customer in the “EMPLOYEE” table, they assign this code value to the “EMPTYPE” column on the row representing this customer.
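The arrangement described above can be sketched with an in-memory SQLite database. This is only an illustration under stated assumptions: the table names and the “EMPTYPE” key come from the text, while the other column names (EMPLOYEE_ID, NAME, ADDRESS, PHONE, EMAIL, DESCRIPTION), the “Staff” type with code 1, the customer code value 5, and the sample rows are hypothetical.

```python
import sqlite3

# In-memory database standing in for the source application (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE EMPLOYEETYPE (
    EMPTYPE     INTEGER PRIMARY KEY,   -- coded value named in the text
    DESCRIPTION TEXT NOT NULL          -- hypothetical column
);
CREATE TABLE EMPLOYEE (
    EMPLOYEE_ID INTEGER PRIMARY KEY,   -- hypothetical column
    NAME        TEXT NOT NULL,         -- hypothetical column
    ADDRESS     TEXT,                  -- hypothetical column
    PHONE       TEXT,                  -- hypothetical column
    EMAIL       TEXT,                  -- hypothetical column
    EMPTYPE     INTEGER NOT NULL REFERENCES EMPLOYEETYPE (EMPTYPE)
);
""")

# The organization creates a code that stands for "customer".
# The value 5 is an assumption made for this sketch.
CUSTOMER_CODE = 5
conn.execute("INSERT INTO EMPLOYEETYPE VALUES (?, ?)", (1, "Staff"))
conn.execute("INSERT INTO EMPLOYEETYPE VALUES (?, ?)", (CUSTOMER_CODE, "Customers"))

# Recording a customer means inserting an EMPLOYEE row tagged with that code.
conn.execute(
    "INSERT INTO EMPLOYEE (NAME, EMAIL, EMPTYPE) VALUES (?, ?, ?)",
    ("Pat Example", "pat@example.com", CUSTOMER_CODE),
)
conn.execute(
    "INSERT INTO EMPLOYEE (NAME, EMAIL, EMPTYPE) VALUES (?, ?, ?)",
    ("Sam Staff", "sam@example.com", 1),
)

# Resolve each row's type through the foreign key reference.
rows = conn.execute(
    "SELECT e.NAME, t.DESCRIPTION "
    "FROM EMPLOYEE e JOIN EMPLOYEETYPE t ON t.EMPTYPE = e.EMPTYPE "
    "ORDER BY e.EMPLOYEE_ID"
).fetchall()
```

Nothing in the schema itself marks a row as a customer; only the code stored in the “EMPTYPE” column carries that meaning, and only by the organization's own convention.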

The following figure depicts a portion of an “Entity Relation Diagram” which defines the “EMPLOYEE” and “EMPLOYEETYPE” tables in this application. It also shows a subset of the values contained within the “EMPLOYEETYPE” table, as defined by this organization.

Example Employee Table Data Model

As can be seen in the figure, there are actually three different “EMPLOYEETYPE” codes defined to represent the concept of “customer”. These are EMP_TYPE values 5, 6, and 7, representing “Customers”, “Premier Customers” (which the organization has defined as important customers), and “Good Customers”. Aside from the “business practice” that these three types can be used to differentiate “customers” from other types of entities, there is nothing intrinsic to the structures that indicates this. Hence, from an application standpoint, all types are equal and will be manipulated in the same way.

From the point of view of the ETL under development, however, the usage of these three codes is critical to its proper operation. The source Anchor State for the ETL is defined as the set of raw symbols within the “EMPLOYEE” table that have one of the “customer” type code values in their corresponding EMPTYPE column. For this ETL (transformation), the EMPTYPE column and its values represent the semantic marker for “customer” in this source Anchor State. The Anchor State therefore consists of the data structures, “EMPLOYEE” and “EMPLOYEETYPE”, and the constraint that only the rows of the “EMPLOYEE” table where the EMP_TYPE value is 5, 6, or 7 define what the ETL should consider to be “customers”.
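One way to picture this definition is as a pair: the data structures involved, plus the constraint that selects the relevant symbols. The sketch below is not part of the Metamorphic Modeling conventions themselves. The `AnchorState` class, its field names, the sample rows, and the manager code 2 are all assumptions made for illustration. The text writes the column as both EMPTYPE and EMP_TYPE; the sketch uses EMPTYPE throughout.

```python
from dataclasses import dataclass
from typing import Callable

# Codes 5, 6 and 7 are the "customer" codes given in the text.
CUSTOMER_TYPE_CODES = frozenset({5, 6, 7})
# Hypothetical code for "managers", used only to show a second Anchor State.
MANAGER_TYPE_CODES = frozenset({2})


@dataclass(frozen=True)
class AnchorState:
    """Hypothetical model: data structures plus a constraint on their rows."""
    structures: tuple[str, ...]
    constraint: Callable[[dict], bool]

    def select(self, rows: list[dict]) -> list[dict]:
        # Keep only the rows (symbols) that satisfy the constraint.
        return [row for row in rows if self.constraint(row)]


# Source Anchor State for the ETL: EMPLOYEE rows whose type code means "customer".
source_customers = AnchorState(
    structures=("EMPLOYEE", "EMPLOYEETYPE"),
    constraint=lambda row: row["EMPTYPE"] in CUSTOMER_TYPE_CODES,
)

# The same medium constrained differently yields a different Anchor State.
source_managers = AnchorState(
    structures=("EMPLOYEE", "EMPLOYEETYPE"),
    constraint=lambda row: row["EMPTYPE"] in MANAGER_TYPE_CODES,
)

# Made-up sample rows from the EMPLOYEE table.
employee_rows = [
    {"NAME": "Ada Staff", "EMPTYPE": 1},
    {"NAME": "Max Manager", "EMPTYPE": 2},
    {"NAME": "Bo Buyer", "EMPTYPE": 5},
    {"NAME": "Cy Premier", "EMPTYPE": 6},
    {"NAME": "Di Good", "EMPTYPE": 7},
]

customer_names = [row["NAME"] for row in source_customers.select(employee_rows)]
manager_names = [row["NAME"] for row in source_managers.select(employee_rows)]
```

Keeping the constraint next to the structure names makes the business convention explicit, instead of leaving it buried in the filter clause of a single ETL job.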


All pages are Copyright (C) 2004, 2009 by Geoffrey A. Howe
All Rights Are Reserved
