Why Comparability Is Critical To Solving The Data Integration Problem

At its most basic, the task of data integration from multiple source systems is one of recognizing the EQUIVALENCY and diagnosing the CONFLICTS among sets of symbols (the data) stored in each system’s data structures (syntactic media). Data integration is accomplished when the conflicts have been eliminated through TRANSFORMATION into new COMMON SYMBOLS which are COMPARABLE at both the syntactic and semantic levels.

The end result of data integration should be that SEMANTICALLY EQUIVALENT (or at least COMPARABLE) data structures become SYNTACTICALLY EQUIVALENT (COMPARABLE) as well. When this result is achieved, the data structures are considered COMPARABLY EQUIVALENT, and the data from the different source systems can be collapsed, combined or integrated correctly.

Structural Comparability

The issue can be characterized as one of the COMPARABILITY of data between systems.

  • Syntactic Comparability is defined by the DATA TYPE and internal DATA STRUCTURE
  • Semantic Comparability is defined by the CONCEPT or MEANING projected onto the data structure by the users of the source system
  • Two data items are COMPARABLE if they share both SYNTACTIC and SEMANTIC COMPARABILITY

Typical Conflicts

Typical conflicts occur between and among the data structures originating from different sources.

  • Syntactic Conflicts:
    • Data Type Conflicts
    • Structural Conflicts
    • Key Conflicts
  • Semantic Conflicts:
    • Scale Conflicts
    • Abstraction/Formula Conflicts
    • Domain Conflicts
  • Symbol Conflicts:
    • Naming Conflicts (Synonyms, Homonyms, Antonyms)

Syntactic Conflicts

  • Data Type Conflicts – The same concept projected onto different physical representations. Example: different codes for the same set of options
  • Structural Conflicts – For example, the same concept (referent) represented in one database by only a single attribute in one data source, but as a complete record of attributes in another source.
  • Key Conflicts – Two systems using different unique keys for the same concept.
    • As an example, from a freight rail project I once worked, one set of systems represented a “station” by using the nearest Mileboard number to the station, while another set used an industry standard designator called a “SPLC” which was a code assigned to every reported station on all rail lines in North America.
    • In this example, the two different keys conflicted syntactically (e.g., Mileboard was an integer, SPLC was a string), and semantically (e.g., Mileboards are only meaningful within the context of a single railroad, being the distance from the origin of the line, while SPLCs are universal designators within the context of North America railroads).

Semantic Conflicts

  • Scale Conflicts
    • Same data structure but representing different units. For example, corporate revenue represented as currency, but one using US Dollars and the other using CANADIAN Dollars.
  • Abstraction/Formula Conflicts
    • Same data structure and “symbol”, but two different formulas used to calculate values.
  • Domain Conflicts
    • Similar symbols and data structure, but two different sets of valid values or ranges of values.
    • For example, references to Customers in two systems each have assigned numeric identifiers, but the same customer has different assigned identifiers in each system.

Data Integration

The data integration specification documents how the symbols in two (or more) systems are similar and how they are different. The specification describes how the conflicts identified (under the rough categories described above) can be resolved to produce and combine comparable data symbols from each system. From a practical point of view, researching and documenting/describing the conflicts and similarities between symbols in two different systems is the same activity as defining the data integration specification which would be used to automate the integration.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: