Identifying Master Data

The document discusses the importance of understanding master data and metadata before implementing a master data management (MDM) program. It emphasizes that the core requirements are to identify master data objects across the enterprise, standardize how they are represented, and consolidate them into a single repository. This involves three key challenges: 1) collecting and analyzing metadata from various data sources, 2) resolving differences in data structure between sources, and 3) unifying different semantics and definitions of master data objects. Resolving these challenges is imperative for effective management and sharing of master data as a centralized organizational asset.

A DataFlux White Paper

Prepared by: David Loshin

Semantics, Metadata and Identifying Master Data

Leader in Data Quality and Data Integration
www.dataflux.com | 877–846–FLUX | International: +44 (0) 1753 272 020

Once you have determined that your organization can achieve the benefits of
integrating data quality and data governance through introducing a master data
management (MDM) program, some typical early questions emerge, such as “What
architectural approaches will we take to deploy our MDM solution?” or “What are the
business approaches for acquiring the appropriate tools and technologies required for
MDM success?” These are good questions, but they are often asked prematurely. Even
before determining how to manage the enterprise master data asset, there are more
fundamental questions that need to be asked and comprehensively explored, such as:

• What data elements constitute our “master data?”

• How do we locate and isolate master data objects that exist within the
enterprise?

• How do we assess the variances between the different representations in order
to consolidate instances into a single view?

Because of the ways that diffused application architectures have evolved within
different organizations, it is likely that while there are a relatively small number of
core master objects used, there are many different ways that these objects are
modeled, represented and stored. For example, any application that must manage
contact information for individual customers will rely on a data model that maintains
the customer’s name. Yet one application will track an individual’s full name, while
others will break up the name into its first, middle and last parts. And even those
applications that track the given and family names of a customer will do it
differently: perform a quick scan of the data sets within your own organization and
you are likely to find “LAST_NAME” attributes with a wide range of field lengths.

Figure 1: Isolating master data from different data sets.

The challenges are not limited to determining what master objects are used. Indeed,
the core requirement is to find where master objects are used and to chart a strategy
for standardizing, harmonizing and consolidating them into a master repository or
registry. When the intention is to create an organizational asset that is not just
another data silo, it is imperative that your organization provide the means for both
the consolidation and integration of master data – and facilitate the most effective and
appropriate sharing of that master data.

What is Master Data?

What are the characteristics of master data? So far, the industry has been better at
describing master data than at actually defining what master data is. As a
description, master data objects are those core business objects that are used in the
different applications across the organization, along with their associated metadata,
attributes, definitions, roles, connections, and taxonomies. Master data objects are
those “things” that we care about – the things that are logged in our transaction
systems, measured and reported on in our reporting systems, and analyzed in our
analytical systems. Common examples of master data include:

• Customers

• Suppliers

• Parts

• Products

• Locations

• Contact mechanisms

For example, consider the following transaction: “David Loshin purchased seat 15B on
flight 238 from BWI to SFO on July 20, 2006.” Some of the master data elements in
this example and their types are shown in Table 1.

Master Data Object    Value

Customer              David Loshin
Product               Seat 15B
Flight                238
Location              BWI
Location              SFO

Table 1: Master data elements for a typical airline reservation.

Aside from the above description, master data objects share certain characteristics:

• The real-world objects modeled within the environment as master data objects
tend to be referenced in multiple business areas. For example, the concept of
a “vendor” may exist in the finance application at the same time as in the
procurements application.

• Master data objects are referenced in both transaction and analytic system
records. While the sales system may log and process the transactions initiated
by a “customer,” those same activities may be analyzed for the purposes of
segmentation and marketing.

• Master data objects may be classified within a semantic hierarchy, with
different levels of classification, attribution and specialization applied
depending on the application. For example, we may have a master data
category of “party,” which in turn comprises “individuals” and
“organizations.” Those parties may also be classified based on their roles, such
as “prospect,” “customer,” “supplier,” “vendor,” or “employee.”

• Master data objects may require specialized application functions to create
new instances, as well as manage the updating and removal of instance
records. Each application that involves “supplier” interaction may have a
function enabling the creation of a new supplier record.

• They are likely to have models reflected across multiple applications, possibly
embedded in legacy data structure models.

While we may see a natural hierarchy across one dimension, the taxonomies that are
applied to our data instances may actually cross multiple hierarchies. For example, a
party may be an individual, a customer and an employee – simultaneously. In turn,
the same master data categories and their related taxonomies would be used for
transactions, analysis and reporting. For example, the headers in a monthly sales
report may be derived from the master data categories (sales by customer by region
by time period). Enabling the transactional systems to refer to the same data objects
as the subsequent reporting systems ensures that the analysis reports are consistent
with the transaction systems.
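
A minimal sketch of how a party that crosses several role taxonomies at once could be modeled. The class and field names (`Party`, `PartyKind`, `Role`) and the sample party are hypothetical illustrations and are not taken from any particular MDM product.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Set


class PartyKind(Enum):
    INDIVIDUAL = "individual"
    ORGANIZATION = "organization"


class Role(Enum):
    PROSPECT = "prospect"
    CUSTOMER = "customer"
    SUPPLIER = "supplier"
    VENDOR = "vendor"
    EMPLOYEE = "employee"


@dataclass
class Party:
    """One master record; roles are a set, so a party can hold several at once."""
    name: str
    kind: PartyKind
    roles: Set[Role] = field(default_factory=set)

    def has_role(self, role: Role) -> bool:
        return role in self.roles


# Hypothetical party that is an individual, a customer and an employee simultaneously.
party = Party("Pat Example", PartyKind.INDIVIDUAL, {Role.CUSTOMER, Role.EMPLOYEE})
```

Keeping the kind (individual or organization) separate from the set of roles lets transaction and reporting systems refer to the same record regardless of which role they care about.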

Centralizing Semantic Metadata

Master data may be sprinkled across the application environment. The objective of a
master data management program is to facilitate the effective management of the set
of master data instances as a single centralized master resource. But before we can
materialize a single master record for any entity, we must be able to:

1. Discover which data resources may contain entity information

2. Understand which attributes carry identifying information

3. Extract identifying information from the data resource

4. Transform the identifying information into a standardized or canonical form

5. Establish similarity to other standardized records
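
Steps 2 through 5 above can be sketched in a few lines. This is only an illustration under stated assumptions: the source names, column names and the `IDENTIFYING_COLUMNS` mapping are hypothetical, and the similarity measure is Python's general-purpose `difflib.SequenceMatcher` rather than a dedicated matching engine.

```python
import re
from difflib import SequenceMatcher

# Step 2 (assumed already done): which columns carry identifying information.
# Source and column names are hypothetical.
IDENTIFYING_COLUMNS = {
    "crm_contacts": ["FULL_NAME"],
    "billing_accounts": ["FIRST_NAME", "LAST_NAME"],
}


def extract_identity(source, record):
    """Step 3: pull the identifying values out of one source record."""
    return " ".join(record[col] for col in IDENTIFYING_COLUMNS[source])


def canonicalize(value):
    """Step 4: lowercase, drop punctuation, and sort the remaining tokens."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", value.lower()).split()
    return " ".join(sorted(tokens))


def similarity(a, b):
    """Step 5: similarity ratio in [0, 1] between two canonical forms."""
    return SequenceMatcher(None, a, b).ratio()


crm_record = {"FULL_NAME": "Loshin, David"}
billing_record = {"FIRST_NAME": "DAVID", "LAST_NAME": "LOSHIN"}

left = canonicalize(extract_identity("crm_contacts", crm_record))
right = canonicalize(extract_identity("billing_accounts", billing_record))
score = similarity(left, right)
```

Sorting the tokens is a deliberately crude canonical form: it makes "Loshin, David" and "DAVID LOSHIN" compare equal, but it would also equate names that differ only in word order.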

This entails cataloging the data sets, their attributes, formats, data domains,
definitions, contexts and semantics, not just as an operational resource, but rather in
a way that can be used to automate master data consolidation as well as governing
the ongoing application interactions with the master repository. In other words, to be
able to manage the master data, one must first be able to manage the master
metadata. But as there is a need to resolve multiple variant models into a single view,
the interaction with the master metadata must facilitate the resolution of three critical
aspects:

• Format at the element level

• Structure at the instance level

• Semantics across all levels.

Figure 2: Preparation for a master data integration process must resolve the
differences between the syntax, structure, and semantics of different source
data sets.

These are effectively three levels of integration that need to dovetail as a prelude to
any kind of enterprise-wide integration, and they introduce three corresponding
challenges for master metadata management:

1. Collecting and analyzing master metadata

2. Resolving similarity in structure

3. Understanding and unifying master data semantics

Challenge 1: Consolidating and Analyzing Master Metadata

One approach is to analyze and document the metadata associated with all data
objects across the enterprise and use that information to guide analysts seeking out
master data. Many of the data sets may have documented some of the necessary
metadata. For example, relational database systems allow for querying table structure
and data element types, and COBOL copybooks reveal some structure and potentially
even some alias data. Some of the data may have little or no documented metadata,
such as fixed-format or character-separated files.

If the objective is to collect comprehensive and consistent metadata, as well as ensure
that the data appropriately correlates to its documented metadata, we can use data
profiling as the tool of choice. Because of its ability to apply both statistical and
analytical algorithms for characterizing data sets, data profiling can drive the empirical
assessment of structure and format metadata while simultaneously exposing
embedded data models and dependencies.
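
To make the idea of empirical assessment concrete, here is a minimal column-profiling sketch. It is not how any specific profiling tool works; the function names, the `A`/`9` pattern notation and the sample phone values are assumptions chosen for illustration.

```python
import re
from collections import Counter


def pattern_of(value):
    """Map letters to 'A' and digits to '9', keeping other characters as-is."""
    return re.sub(r"[0-9]", "9", re.sub(r"[A-Za-z]", "A", value))


def profile_column(values):
    """Summarize one column from its observed values (all given as strings)."""
    non_empty = [v for v in values if v.strip()]
    patterns = Counter(pattern_of(v) for v in non_empty)
    return {
        "count": len(values),
        "empty": len(values) - len(non_empty),
        "distinct": len(set(non_empty)),
        "max_length": max((len(v) for v in non_empty), default=0),
        "top_pattern": patterns.most_common(1)[0][0] if patterns else None,
    }


# Hypothetical values from an undocumented character-separated file.
phone_values = ["301-555-0101", "410-555-0199", "", "301-555-0101"]
phone_profile = profile_column(phone_values)
```

The observed maximum length and dominant format pattern can then be compared with whatever length and type the documentation claims for the same column.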

Our consolidated metadata repository will eventually enumerate the relevant
characteristics associated with each data set in a standardized way, including the data
set name, its type (e.g., RDBMS table, VSAM file, CSV file) and the characteristics of
each of its columns/attributes (e.g., length, data type or format pattern).
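
As a sketch of what such a repository entry might look like, the following reads table structure from a relational catalog and stores it in a uniform record. SQLite is used only because it ships with Python; the record classes (`DatasetMetadata`, `AttributeMetadata`) and the two sample tables are hypothetical.

```python
import sqlite3
from dataclasses import dataclass
from typing import List


@dataclass
class AttributeMetadata:
    name: str
    declared_type: str  # as declared in the source, e.g. "VARCHAR(20)"


@dataclass
class DatasetMetadata:
    name: str
    kind: str  # e.g. "RDBMS table", "VSAM file", "CSV file"
    attributes: List[AttributeMetadata]


def catalog_sqlite_tables(conn):
    """Build one repository record per table from SQLite's own catalog."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    catalog = []
    for table in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        attrs = [AttributeMetadata(col[1], col[2]) for col in columns]
        catalog.append(DatasetMetadata(table, "RDBMS table", attrs))
    return catalog


# Hypothetical source tables, created in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, last_name VARCHAR(20))")
conn.execute("CREATE TABLE vendor (vendor_id INTEGER PRIMARY KEY, name VARCHAR(40))")
catalog = catalog_sqlite_tables(conn)
```

Other source types (copybooks, flat files) would need their own readers, but they could populate the same record shape, which is what makes the later comparisons possible.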

At the end of this process, we will have more than simply a comprehensive catalog of
all data sets. We will also be able to review the frequency of meta-model
characteristics, such as frequently-used names, field sizes, and data types. Capturing
these values with a standard representation allows the metadata characteristics
themselves to be subjected to the kinds of statistical analysis that data profiling
provides. For example, we can assess the dependencies between common attribute
names (e.g., “CUSTOMER”) and their assigned data types (e.g., VARCHAR[20]) to
identify (and potentially standardize against) commonly-used types, sizes and
formats.
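
The dependency between attribute names and declared types can be assessed with a simple frequency count. In this sketch the repository rows are invented sample data, and `dominant_type` is a hypothetical helper name.

```python
from collections import Counter, defaultdict

# Hypothetical repository rows: (data set, attribute name, declared type).
attribute_rows = [
    ("crm.contacts", "LAST_NAME", "VARCHAR(20)"),
    ("billing.accounts", "LAST_NAME", "VARCHAR(20)"),
    ("support.tickets", "LAST_NAME", "VARCHAR(35)"),
    ("hr.staff", "LAST_NAME", "CHAR(30)"),
    ("crm.contacts", "CUSTOMER", "VARCHAR(20)"),
]

# For each attribute name, count how often each declared type occurs.
types_by_name = defaultdict(Counter)
for _dataset, name, declared_type in attribute_rows:
    types_by_name[name.upper()][declared_type] += 1


def dominant_type(name):
    """Return the most frequent declared type for a name and its share of uses."""
    counts = types_by_name[name.upper()]
    declared_type, hits = counts.most_common(1)[0]
    return declared_type, hits / sum(counts.values())


last_name_type, last_name_share = dominant_type("last_name")
```

A dominant type with a high share is a candidate for the standard representation; a low share signals the kind of variance that will need to be resolved during consolidation.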

Challenge 2: Resolving Similarity in Structure

Despite the expectation that there are many variant forms and structures for your
organization’s master data, the different underlying models of each master data object
are bound to share many commonalities. For example, the structure for practically any
“residential” customer table will contain a name, an address and a telephone number.
On the other hand, almost any vendor table will probably also contain a name, an
address and a telephone number. A closer look might suggest considering an
underlying model concept of a “party,” used as the basis for both customer and
vendor. In turn, the analyst might review any model that contains those same
identifying attributes as a structure type that can be derived from, or is related to,
a party type.

There are two aspects to structure similarity for the purpose of tracking down master
data instances. The first is seeking out overlapping structures, in which the core
attributes determined to carry identifying information for one data object overlap with
a similar set of attributes in another data object. The second is identifying derived
structures, in which one object’s set of attributes is completely embedded within
other data objects. Both cases indicate a structural relationship, and when related
attributes carry identifying information, the analyst should review those objects to
determine if they indeed represent master objects.
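
Treating each object's identifying attributes as a set gives a simple way to tell the two cases apart. This is a sketch only: it assumes attribute names have already been standardized, and the attribute sets shown are hypothetical.

```python
def structural_relationship(attrs_a, attrs_b):
    """Classify two attribute sets as 'derived', 'overlapping' or 'unrelated'."""
    a, b = set(attrs_a), set(attrs_b)
    if not a or not b:
        return "unrelated"
    if a <= b or b <= a:
        # One set is completely embedded in the other (identical sets included).
        return "derived"
    if a & b:
        return "overlapping"
    return "unrelated"


# Hypothetical identifying attributes, already normalized to common names.
party = {"name", "address", "telephone"}
customer = {"name", "address", "telephone", "loyalty_tier"}
vendor = {"name", "address", "telephone", "tax_id"}
product = {"sku", "price"}
```

Here `customer` and `vendor` each embed the `party` attributes, while they merely overlap with each other, which is the pattern that suggests a shared underlying party model.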

Challenge 3: Unifying Semantics

The third challenge focuses on the qualitative difference between pure syntactic or
structural metadata (as we can discover through the profiling process), and the
underlying semantic metadata. This involves more than just analyzing structure
similarity. It involves understanding what the data means, how that meaning is
conveyed, how that meaning “connects” data sets across the enterprise, and
approaches to capturing semantics as an attribute of your metadata framework.

As a data set’s metadata is collected, the semantic analyst must approach the
business client to understand that data object’s business meaning. One step in this
process involves reviewing the degree to which semantic consistency in data element
naming is related to overlapping data types, sizes and structures. The next step is to
document the business meanings assumed for each of the data objects, which involves
asking questions like:

• What are the definitions for the data elements?

• Or for the data sets themselves?

• Are there authoritative sources for the definitions?

• Do similar objects have different business meanings?

The answers to these questions not only help in determining which data sets truly
refer to the same underlying real-world objects, they also contribute to an
organizational resource that can be used to standardize a representation for each data
object as its definition is approved through the data governance process. Managing
that semantic metadata as a central asset enables the metadata repository to grow in
value as it consolidates semantics from different enterprise data collections.

Identifying and Qualifying Master Data

Once the semantic metadata has been collected and centralized, the analyst’s task of
identifying master data should be simplified. As more metadata representations of
similar objects and entities populate the repository, the frequency of specific models
will provide a basis for assessing whether the attributes of a represented object qualify
the data elements represented by the model as master data. By adding further
characterization data to each data set’s metadata profile, we can bring more knowledge
to the process of determining which sources can feed a master data repository, which
in turn helps the analyst.

One approach is to characterize the value set associated with each column in each
table. At the conceptual level, designating a value set using a simplified classification
scheme reduces the level of complexity associated with data variance, and allows for
loosening the constraints when comparing multiple metadata instances. For example,
we can limit ourselves to six data value classes, such as these:

1. Boolean or Flag – There are only two valid values, one representing “true” and
one representing “false.”

2. Time/Date Stamp – A value that represents a point in time.

3. Magnitude – A numeric value on a continuous range, such as a quantity or an
amount.

4. Code Enumeration – A small set of values, either used directly (e.g., using the
colors “red” and “blue”) or mapped as a numeric enumeration (e.g., 1 = “red,”
2 = “blue”).

5. Handle – A character string with limited duplication across the set that may be
used as part of an object description (e.g., name or address_line_1 fields
contain handle information).

6. Cross-Reference – An identifier that either is uniquely assigned to the record
or provides a reference to that identifier in another dataset.
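
A rough sketch of how a column's values might be assigned to one of these six classes. The rules, their order and the `max_codes` threshold are assumptions made for illustration; real value sets would need more careful tests (for example, more date formats, or distinguishing unique amounts from identifiers).

```python
from datetime import datetime


def _is_date(value):
    try:
        datetime.strptime(value, "%Y-%m-%d")  # assumes ISO dates only
        return True
    except ValueError:
        return False


def _is_number(value):
    try:
        float(value)
        return True
    except ValueError:
        return False


def classify_values(values, max_codes=10):
    """Assign one of the six value classes to a non-empty list of strings."""
    if not values:
        raise ValueError("need at least one value to classify")
    distinct = set(values)
    all_unique = len(distinct) == len(values)
    if all(_is_date(v) for v in values):
        return "Time/Date Stamp"
    if len(distinct) <= 2:
        return "Boolean or Flag"
    if all_unique and all(v.isalnum() for v in values):
        return "Cross-Reference"  # unique, identifier-like tokens
    if all(_is_number(v) for v in values):
        return "Magnitude"
    if len(distinct) <= max_codes and not all_unique:
        return "Code Enumeration"
    return "Handle"
```

For instance, `classify_values(["red", "blue", "green", "red", "blue"])` falls through to the code enumeration rule because the few distinct values repeat, while a list of unique street addresses ends up as a handle.
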

The Fractal Nature of Metadata Profiling

At this point, each data attribute can be summarized in terms of a small number of
descriptive characteristics: data type, length, data value class, etc. In turn, each data
set can be described as a collection of its component attributes. Because we are
looking for similar data sets with similar structures, formats and semantics, our job is
to assess each data set’s “identifying attribution,” try to find the collections of data
sets that share similar characteristics, and determine if they represent the same
objects.

Let’s summarize:

• We are using our tools to assess data element structure

• We are collecting this information into a metadata repository

• We use our tools to look for data attributes that share similar characteristics

• We use our tools to seek out attributes with similar names

• We analyze the data value sets and assign them into value classes

• We use our tools to detect similarities between representative data meta-models

In essence, the techniques and tools we can use for determining the sources of master
data objects are the same types of tools we use for consolidating the data into a
master repository! Using data profiling, parsing, standardization and matching, we can
facilitate the process of identifying which data sets (tables, files, spreadsheets, etc.)
represent which master data objects.
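
As one illustration of applying standardization and matching to the metadata itself, the sketch below looks for attributes with similar names. It relies on `difflib.get_close_matches` from the Python standard library; the normalization rule, the 0.8 cutoff and the sample attribute names are assumptions.

```python
import re
from difflib import get_close_matches


def normalize_name(name):
    """Standardize a name so LAST_NAME, LastName and last-name compare equal."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def similar_attributes(target, candidates, cutoff=0.8):
    """Return the candidate names whose standardized form is close to the target's."""
    by_normalized = {}
    for candidate in candidates:
        by_normalized.setdefault(normalize_name(candidate), []).append(candidate)
    hits = get_close_matches(
        normalize_name(target),
        list(by_normalized),
        n=max(len(by_normalized), 1),
        cutoff=cutoff,
    )
    return [original for key in hits for original in by_normalized[key]]


# Hypothetical attribute names gathered from several data sets.
found = similar_attributes(
    "LAST_NAME", ["LastName", "last-name", "LNAME", "ORDER_TOTAL"])
```

With the cutoff used here, abbreviations such as `LNAME` are not matched; lowering the cutoff would catch more of them at the cost of more false positives, so the analyst still reviews the results.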

Standardizing the Representation

The analyst now has a collection of master object representations. But as a prelude to
developing the consolidation road map, decisions must be made as part of the
organization’s governance process. To consolidate the variety of diverse master object
representations into a single repository, the relevant stakeholders need to agree on a
common representation as well as the underlying semantics for that representation. It
is critical that a standard representation be defined and agreed to so that the
participants expecting to benefit from the data in the master repository can effectively
share the data. And because MDM is a solution that integrates tools with policies and
procedures for data governance, there should be a process for defining and agreeing
to data standards.

Summary: Metadata Profiling Drives the Process

In effect, we have described a process for analyzing similarity of syntax, structure and
semantics as a prelude to identifying enterprise sources of master data. And since
identifying and consolidating master data representations requires
empirical analysis and similarity assessment as part of the resolution process, it is
comforting to know that the same kinds of tools and techniques that will subsequently
be used to facilitate data integration can also isolate and catalog organizational master
data.
