Data Migration First Steps
Summary
At any given time, according to industry analyst estimates, roughly two-thirds of the Fortune
1000/Global 2000 are engaged in some form of data migration or integration project—including
implementation of new ERP, CRM and e-commerce applications, data consolidations, data quality
improvements, and creation of data warehouses and data marts. These projects are driven by
increasing worldwide competition, industry consolidation, and constant pressure to increase revenues
and profits.
To achieve success, each of these projects must follow a path that begins with studying the source data
to thoroughly understand its content, structure, and quality—a process called Data Profiling. Once the
data has been profiled, an accurate set of mapping specifications must be developed based on this
profile—a process called Data Mapping. These two processes of Data Profiling and Mapping
comprise the essential first steps in any successful data migration or integration project and should be
completed prior to attempting to extract, scrub, transform, and transport the data.
A new category of software that automates many of the complex processes involved in Data Profiling
and Mapping has emerged to simplify and accelerate these projects. The remainder of this document
discusses the market drivers and opportunity for this type of software, presents an overview of Data
Profiling and Mapping, and concludes with a look at associated benefits.
According to recent studies conducted by The Standish Group, a research advisory firm based in
Dennis, Mass., 15,000 data migration projects with budgets of $3 million or greater began in 1999 at a
total cost of $95 billion. Data from other market research firms confirms the significant market
opportunity. AMR Research estimates that ERP conversion projects will generate $52 billion by 2002.
Gartner Group/Dataquest forecasts that $2.6 billion will be spent on data warehousing software in
1999, growing to $6.9 billion by 2002. International Data Corporation forecasts that e-commerce
revenues, including business-to-consumer and business-to-business, will top $1 trillion by 2003, and
will require extensive investments in supporting technology.
Unfortunately, most of these data migration, integration, and consolidation projects don’t go as
smoothly as anticipated. The Standish Group studies indicate that 88 percent of the 15,000 data
migration projects starting in 1999 will either overrun or fail. One of the primary reasons for this
extraordinary failure rate is the lack of a thorough understanding of the source data early on in these
projects. Conventional approaches to data profiling and mapping can create nearly as many problems
as they resolve—data not loading properly, poor quality data and compounded inaccuracies, time and
cost overruns and, in extreme cases, late-stage project cancellations. The old adage ‘garbage in, garbage
out’ is very applicable here. The market opportunity is enormous for an innovative solution that can
help companies solve these ongoing problems.
Fortunately, such a solution exists in the form of Migration Architect, the Data Profiling and Mapping software from Evoke Software Corporation.
Conventional Approach to Data Profiling and Mapping: Problems and Pitfalls
The conventional approach to Data Profiling starts with a large team of people (data and business
analysts, data administrators, database administrators, system designers, subject matter experts, etc.).
These people meet in a series of joint application development (JAD) sessions and attempt to extract
useful information about the content and structure of the legacy data sources by examining outdated
documentation, COBOL copy books, inaccurate metadata and, in some cases, the physical data itself.
Typically, this is a highly manual, labor-intensive process supplemented, in some cases, by semi-automated query techniques. Profiling legacy data in this way is extremely complex, time-consuming, and error-prone. Even once completed, the process leaves the team with only a limited understanding of the source data.
At that point, according to the project flow chart, it is time to move on to the mapping phase. But
since the source data is so poorly understood and inferences about it are largely based on assumptions
rather than facts, this phase typically results in an inaccurate data model and set of mapping
specifications. Based on this information, the data is extracted, scrubbed, transformed, and moved or
integrated.
Not surprisingly, in almost all cases, the new system doesn’t work correctly the first time. Then the
rework process begins: redesigning, recoding, and retesting. At best, the project incurs significant time
and cost overruns. At worst, faced with runaway costs and no clear end in sight, senior management
cancels the project, preferring to give up the promised but apparently unattainable benefits in favor of
the status quo.
Data Profiling: The First Three Steps
Migration Architect performs Data Profiling directly against the source data in three steps: Column Profiling, Dependency Profiling, and Redundancy Profiling.
Column Profiling
Column Profiling analyzes the values in each column or field of source data, inferring detailed
characteristics for each column, including data type and size, range of values, frequency and
distribution of values, cardinality, and null and uniqueness characteristics. Interactive drill-down
capabilities allow analysts to detect and analyze data content quality problems and to evaluate
discrepancies between the inferred, true metadata and the documented metadata.
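For illustration, the sketch below shows in Python the kind of per-column statistics a Column Profiling step computes. It is a minimal, hypothetical example; the function name, sample data, and type-inference heuristic are ours, not Migration Architect's.

```python
from collections import Counter

def profile_column(values):
    """Infer basic profile statistics for one column of source data.

    `values` is a list of raw field values; None marks a null.  Returns the
    characteristics named above: inferred type, size, range of values,
    frequency distribution, cardinality, and null/uniqueness measures.
    """
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)

    def is_number(v):
        try:
            float(v)
            return True
        except (TypeError, ValueError):
            return False

    # Infer a type from content, not from documentation: the column is
    # numeric only if every non-null value actually parses as a number.
    inferred_type = "numeric" if non_null and all(map(is_number, non_null)) else "text"
    return {
        "inferred_type": inferred_type,
        "max_length": max((len(str(v)) for v in non_null), default=0),
        "min_value": min(non_null, default=None),
        "max_value": max(non_null, default=None),
        "cardinality": len(counts),                 # distinct values
        "null_rate": 1 - len(non_null) / len(values) if values else 0.0,
        "is_unique": len(counts) == len(non_null),  # candidate unique key?
        "most_common": counts.most_common(3),       # top of the distribution
    }

# A column documented as numeric; profiling exposes the stray 'N/A' value,
# so the inferred (true) metadata disagrees with the documented metadata.
print(profile_column(["10", "20", "20", "N/A", None]))
```

Even this toy profile surfaces the kind of discrepancy an analyst would then drill into: the documented type says numeric, while the inferred type says text.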
Dependency Profiling
Dependency Profiling analyzes data across rows—comparing values in every column with values in
every other column—and infers all dependency relationships that exist between attributes within each
table. This process cannot be done manually. Dependency Profiling identifies primary keys and
whether or not expected dependencies (e.g., those imposed by a new application) are supported by the
data. It also identifies ‘gray-area dependencies’ (dependencies that are true most of the time, but not all the time), usually an indication of a data quality problem. Dependency Profiling is critical to the
subsequent elimination of duplicate information and the production of a true third normal form model
of the data sources.
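The following sketch illustrates the underlying idea, not Evoke's implementation: for each ordered pair of columns, measure what fraction of rows is consistent with a functional dependency, so that exact and gray-area dependencies both surface. The data and names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations

def dependency_strength(rows, a, b):
    """Fraction of rows consistent with the functional dependency a -> b.

    Rows are grouped by the value of column `a`; within each group, the rows
    agreeing with the group's most common value of column `b` count as
    consistent.  1.0 means the dependency holds exactly; values just short
    of 1.0 are the 'gray-area' dependencies described above.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[a]].append(row[b])
    consistent = sum(max(Counter(vals).values()) for vals in groups.values())
    return consistent / len(rows)

rows = [
    {"zip": "94107", "city": "San Francisco", "state": "CA"},
    {"zip": "94107", "city": "San Francisco", "state": "CA"},
    {"zip": "10001", "city": "New York",      "state": "NY"},
    {"zip": "10001", "city": "NY",            "state": "NY"},  # bad data entry
]

# Compare every column with every other column, as Dependency Profiling does.
for a, b in permutations(rows[0], 2):
    s = dependency_strength(rows, a, b)
    print(f"{a} -> {b}: {s:.2f}" + ("  (gray-area)" if s < 1.0 else ""))
```

Here zip -> state holds exactly, while zip -> city holds in only three of four rows: a gray-area dependency pointing at the inconsistent city entry.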
Redundancy Profiling
Redundancy Profiling compares data between tables of the same or different data sources, determining
which columns contain overlapping or identical sets of values. It looks for repeating patterns among an
organization’s ‘islands of information’—billing systems, sales force automation systems, post-sales
support systems, etc. Redundancy Profiling identifies attributes containing the same information but
with different names (synonyms), and attributes that have the same name but different business
meaning (homonyms). It also helps determine which columns are redundant and can be eliminated,
and which are necessary to connect information between tables (foreign keys needed for referential
integrity). Redundancy Profiling eliminates processing overhead and reduces the probability of error.
As with Dependency Profiling, this process cannot be done manually.
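One way to picture the cross-table comparison is as a value-set overlap test, sketched below with hypothetical column data; a high overlap between differently named columns flags a synonym or foreign-key candidate. This illustrates the concept only, not the product's algorithm.

```python
def value_overlap(col_a, col_b):
    """Jaccard overlap between the distinct (non-null) value sets of two
    columns.  A high score across tables flags columns carrying the same
    information under different names (synonyms) or a foreign-key candidate."""
    a, b = set(col_a) - {None}, set(col_b) - {None}
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

# Two hypothetical 'islands of information': a billing system and a
# post-sales support system that store the same customer identifiers
# under different column names.
billing_cust_no = ["C001", "C002", "C003", "C004"]
support_acct_id = ["C002", "C003", "C004", "C005"]
print(value_overlap(billing_cust_no, support_acct_id))  # 0.6 -> likely synonyms
```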
Data Mapping: The Remaining Three Steps
Once the Data Profiling process is finished, the profile results can be used to complete the remaining
three Data Mapping steps: Normalization, Model Enhancement, and Transformation Mapping.
Normalization
Using the information gathered from the first three Data Profiling steps, Migration Architect builds a
fully normalized relational model based on the consolidation of all the data. Because the model is fully
supported by the data—rather than by assumptions and inaccurate metadata—it will not fail.
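As a rough illustration of how inferred dependencies drive normalization, the toy sketch below builds one relation per determinant, in the spirit of 3NF synthesis. A real tool handles many cases this ignores (composite determinants shared across relations, transitive closure, lossless-join checks); all names here are hypothetical.

```python
def synthesize_relations(fds):
    """Toy 3NF synthesis: create one relation per distinct determinant,
    holding the determinant plus the attributes it determines.  `fds` maps
    a determinant (a tuple of columns) to the set of columns it determines."""
    return [tuple(lhs) + tuple(sorted(rhs)) for lhs, rhs in fds.items()]

# Dependencies inferred by profiling a single flat order file.  Splitting
# on them removes the duplicated customer and city/state information.
fds = {
    ("order_no",): {"order_date", "cust_no"},
    ("cust_no",):  {"cust_name", "zip"},
    ("zip",):      {"city", "state"},
}
for relation in synthesize_relations(fds):
    print(relation)
# ('order_no', 'cust_no', 'order_date')
# ('cust_no', 'cust_name', 'zip')
# ('zip', 'city', 'state')
```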
Model Enhancement
Users can modify the normalized model by adding structures to support new requirements, or by
adding indexes and denormalizing the structures to enhance performance. Then Migration Architect
produces data definition language (DDL) for the resulting model, complete with referential integrity
instructions. The DDL is tailored to Oracle, Informix, Sybase, DB2 for AIX, or ANSI SQL-92
environments as required. It may also be imported into graphical design tools such as Computer
Associates’ ERwin.
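The generated DDL might resemble the output of this small, hypothetical rendering sketch, which emits an ANSI SQL-92 CREATE TABLE statement with the referential integrity clauses described above.

```python
def render_ddl(table, columns, primary_key, foreign_keys=()):
    """Emit an ANSI SQL-92 CREATE TABLE statement, including the
    referential integrity (PRIMARY KEY / FOREIGN KEY) clauses."""
    lines = [f"    {name} {sql_type}" for name, sql_type in columns]
    lines.append(f"    PRIMARY KEY ({', '.join(primary_key)})")
    for cols, ref_table, ref_cols in foreign_keys:
        lines.append(f"    FOREIGN KEY ({', '.join(cols)})"
                     f" REFERENCES {ref_table} ({', '.join(ref_cols)})")
    return f"CREATE TABLE {table} (\n" + ",\n".join(lines) + "\n)"

print(render_ddl(
    "orders",
    columns=[("order_no", "INTEGER"), ("order_date", "DATE"),
             ("cust_no", "INTEGER")],
    primary_key=["order_no"],
    foreign_keys=[(["cust_no"], "customers", ["cust_no"])],
))
```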
Transformation Mapping
When data model modifications are complete, Migration Architect creates a set of transformation
maps that show the relationships between columns in the source files and tables in the enhanced
model, including attribute-to-attribute flows and scrubbing and transformation requirements. These
maps provide essential information to the programmers creating conversion routines to move or
integrate data.
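A transformation map can be pictured as one record per attribute-to-attribute flow, as in this hypothetical sketch; the field names and scrubbing rules are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class AttributeFlow:
    """One attribute-to-attribute flow in a transformation map: where a
    source field lands in the enhanced model, and the scrubbing or
    transformation it needs on the way."""
    source: str   # source file and field
    target: str   # target table and column
    rule: str     # scrubbing/transformation requirement

# Hypothetical entries handed to the programmers writing conversion routines.
transformation_map = [
    AttributeFlow("CUSTFILE.CUST-NO", "customers.cust_no",
                  "strip leading zeros; cast to INTEGER"),
    AttributeFlow("CUSTFILE.ST", "customers.state",
                  "map full state names to two-letter codes"),
]
for flow in transformation_map:
    print(f"{flow.source} -> {flow.target}: {flow.rule}")
```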
Industry studies suggest that conventional approaches to Data Profiling and Mapping take between three and five hours per attribute (or data element). That figure does not even include Dependency and Redundancy Profiling, complex processes that typically involve millions of comparisons between attributes within the same table and across tables of disparate sources to uncover functional dependencies, primary and foreign keys, duplicate data, and the like. (A single 100-column table, for example, contains 4,950 column pairs to evaluate, each against every row of data.)
In sharp contrast to conventional methods, Data Profiling and Mapping with Migration Architect can
be done in a fraction of the time—minutes per attribute instead of hours—while providing users with
a much more thorough understanding of the source data. Based on this understanding, projects are
completed successfully the first time, accelerating time-to-benefit and lowering project cost and risk. In
addition, an accurate data profile improves communication between IT professionals and business end
users—groups that rarely speak the same language—by providing the data facts necessary to make
objective business decisions. This results in higher data and application quality and greater end user and
customer satisfaction.
The Bottom Line
Developing an accurate profile of existing data sources is the essential first step in any successful data
migration or integration project. Data Profiling software enables a small, focused team of technical and
business users to quickly perform the highly complex tasks necessary to achieve a thorough
understanding of source data—a level of understanding that simply cannot be achieved through
conventional manual processes and semi-automated query techniques.
Data Profiling software enables data migration, integration, and consolidation projects to be completed
successfully the first time, eliminating extensive design rework and late-stage project cancellations. It
can even warn IT management if the business objectives of the project are not supported by the data,
dramatically lowering project risk and enabling valuable resources to be re-directed to other, more
fruitful projects. Finally, Data Profiling will deliver higher data and application quality, resulting in more
informed business decisions and greater revenues and profits.