Lesson4 - DATA MAPPING
Enterprise data is becoming more dispersed and voluminous by the day, and at the same time, it has
become more important than ever for businesses to leverage data and transform it into actionable
insights. Enterprises today collect information from an array of data points, and these sources
may not always speak the same language. To integrate this data and make sense of it, businesses
use data mapping, the process of establishing relationships between separate data models.
In simple words, data mapping is the process of mapping data fields from a source file to their
related target fields.
For example, in Figure 1, ‘Name,’ ‘Email,’ and ‘Phone’ fields from an Excel source are mapped
to the relevant fields in a Delimited file, which is our destination.
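As a minimal Python sketch of the Figure 1 mapping, the code below assumes the Excel rows have
already been loaded into dictionaries (reading .xlsx files would need a third-party library such as
openpyxl, which is not shown). The target field names FullName, EmailAddress, and PhoneNumber are
hypothetical, since the destination columns of the figure are not given here.

```python
import csv
import io

# Hypothetical field map: source (Excel) field -> target (delimited file) field.
FIELD_MAP = {
    "Name": "FullName",
    "Email": "EmailAddress",
    "Phone": "PhoneNumber",
}

def map_record(source_row: dict) -> dict:
    """Return a new record keyed by target field names (missing values become "")."""
    return {target: source_row.get(source, "") for source, target in FIELD_MAP.items()}

def write_delimited(rows: list, delimiter: str = ",") -> str:
    """Map every source row and serialize the result as delimited text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(FIELD_MAP.values()), delimiter=delimiter)
    writer.writeheader()
    for row in rows:
        writer.writerow(map_record(row))
    return buffer.getvalue()

if __name__ == "__main__":
    # Sample rows standing in for data read from the Excel source.
    excel_rows = [{"Name": "Ada Lovelace", "Email": "ada@example.com", "Phone": "555-0100"}]
    print(write_delimited(excel_rows))
```

Keeping the mapping in a dictionary separates "which field goes where" from the code that reads and
writes files, so the same map could be reused with another delimiter or destination.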
Mapping tasks vary in complexity, depending on the hierarchy of the data being mapped, as well
as the disparity between the structure of the source and the target. Every business application,
whether on-premise or cloud, uses metadata to explain the data fields and attributes that
constitute the data, as well as semantic rules that govern how data is stored within that
application or repository.
For example, Microsoft Dynamics CRM contains several data sets that comprise different objects,
such as Leads, Opportunities, and Competitors. Each of these data sets has several fields, like
Name, Account Owner, City, Country, Job Title, and more. The application also has a defined
schema along with attributes, enumerations, and mapping rules. Therefore, if a new record is to be
added to a data object, a data map needs to be created from the data source to the Microsoft
Dynamics CRM account.
Database mappings can have a varying degree of complexity, depending on the number of relational
database sources, their schemas, and their primary and foreign keys. In the following example,
data from three different database tables is joined and mapped to an Excel destination.
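A join-and-map of this kind can be sketched with Python's built-in sqlite3 module. The customers,
orders, and products tables and their columns are hypothetical, and the result is written as CSV as
a stand-in for the Excel destination. The foreign keys determine the join conditions, and the column
aliases act as the mapping to target field names.

```python
import csv
import io
import sqlite3

def build_sample_db() -> sqlite3.Connection:
    """Create three hypothetical related tables in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE products  (product_id  INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE orders (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(customer_id),
            product_id  INTEGER REFERENCES products(product_id),
            quantity    INTEGER
        );
        INSERT INTO customers VALUES (1, 'Acme Corp'), (2, 'Globex');
        INSERT INTO products  VALUES (10, 'Widget'), (20, 'Gadget');
        INSERT INTO orders    VALUES (100, 1, 10, 5), (101, 2, 20, 3);
        """
    )
    return conn

# The joins follow the foreign keys; the aliases are the target column names.
JOIN_QUERY = """
    SELECT o.order_id AS "Order ID",
           c.name     AS "Customer",
           p.title    AS "Product",
           o.quantity AS "Quantity"
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
    JOIN products  p ON p.product_id  = o.product_id
    ORDER BY o.order_id
"""

def export_joined(conn: sqlite3.Connection) -> str:
    """Run the join and write the result as CSV (a stand-in for the Excel target)."""
    cursor = conn.execute(JOIN_QUERY)
    headers = [column[0] for column in cursor.description]
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(headers)
    writer.writerows(cursor.fetchall())
    return buffer.getvalue()

if __name__ == "__main__":
    print(export_joined(build_sample_db()))
```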
Depending on the data management needs of an enterprise and the capabilities of the data
mapping software, data mapping is used to accomplish a range of data integration and
transformation tasks.
To leverage data and extract business value out of it, the information collected from various
external and internal sources must be unified and transformed into a format suitable for the
operational and analytical processes. This is accomplished through data mapping, which is an
integral step in various data management processes, including:
a) Data Integration
For successful data integration, the source and target data repositories must have the same data
model. However, it is rare for any two data repositories to have the same schema. Data mapping
tools help bridge the differences in the schemas of data source and destination, allowing
businesses to consolidate information from different data points easily.
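One way to picture this bridging step is a per-source field map onto a single target model. In the
sketch below, the Customer model and the crm and billing source layouts are hypothetical; each map
says which source field feeds which target attribute.

```python
from dataclasses import dataclass

@dataclass
class Customer:
    """Hypothetical target data model shared by all sources."""
    customer_id: str
    name: str
    country: str

# One field map per source: target attribute -> source field name (all hypothetical).
SOURCE_MAPS = {
    "crm": {"customer_id": "AccountId", "name": "AccountName", "country": "Country"},
    "billing": {"customer_id": "cust_no", "name": "cust_name", "country": "country_code"},
}

def to_target(source: str, record: dict) -> Customer:
    """Translate one source record into the target model using that source's map."""
    field_map = SOURCE_MAPS[source]
    return Customer(**{attr: str(record[field]) for attr, field in field_map.items()})

def consolidate(batches: dict) -> list:
    """Map the records of every source into one list of Customer objects."""
    return [to_target(source, record) for source, records in batches.items() for record in records]

if __name__ == "__main__":
    batches = {
        "crm": [{"AccountId": "A1", "AccountName": "Acme", "Country": "US"}],
        "billing": [{"cust_no": 42, "cust_name": "Globex", "country_code": "DE"}],
    }
    for customer in consolidate(batches):
        print(customer)
```

Adding a new source then means adding one more entry to the map rather than changing the target
model.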
b) Data Migration
Data migration is the process of moving data from one database to another. While there are
various steps involved in the process, creating mappings between source and target is one of the
most difficult and time-consuming tasks, particularly when done manually. Inaccurate and
invalid mappings at this stage not only impact the accuracy and completeness of data being
migrated but can even lead to the failure of the data migration project. Therefore, a code-free
mapping solution that automates the process is important for migrating data to the destination
successfully.
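Many invalid mappings can be caught before any data is moved. The following sketch, with
hypothetical column names, checks a source-to-target column mapping for unknown columns, unmapped
required target columns, and target columns fed by more than one source column. It illustrates the
kind of checks involved, not how any particular migration tool works.

```python
def validate_mapping(mapping, source_columns, target_columns, required_targets):
    """Return human-readable problems found in a source->target column mapping."""
    problems = []
    for src, tgt in mapping.items():
        if src not in source_columns:
            problems.append(f"unknown source column: {src}")
        if tgt not in target_columns:
            problems.append(f"unknown target column: {tgt}")
    # Every required target column must receive data from some source column.
    for tgt in sorted(set(required_targets) - set(mapping.values())):
        problems.append(f"required target column not mapped: {tgt}")
    # Two source columns feeding one target column usually signals a mistake.
    first_source = {}
    for src, tgt in mapping.items():
        if tgt in first_source:
            problems.append(f"target column {tgt} mapped from both {first_source[tgt]} and {src}")
        else:
            first_source[tgt] = src
    return problems

if __name__ == "__main__":
    mapping = {"cust_id": "customer_id", "cust_nm": "name", "nm2": "name", "zip": "postal"}
    for problem in validate_mapping(
        mapping,
        source_columns={"cust_id", "cust_nm", "nm2"},
        target_columns={"customer_id", "name", "email"},
        required_targets={"customer_id", "email"},
    ):
        print(problem)
```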
c) Data Warehousing
Data mapping in a data warehouse is the process of creating a connection between the source and
target tables or attributes. Using data mapping, businesses can build a logical data model and
define how data will be structured and stored in the data warehouse. The process begins with
collecting all the required information and understanding the source data. Once that is done and
a data mapping document has been created, building the transformation rules and creating
mappings is a simple process with a data mapping solution.
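A data mapping document can be represented as a list of rules, each naming a source column, a
target column, and a transformation. The sketch below uses a hypothetical sales table; the column
names and rules are illustrative assumptions, not a prescribed warehouse design.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class MappingRule:
    """One row of a data mapping document."""
    source_column: str
    target_column: str
    transform: Callable[[str], object]
    description: str

# Hypothetical mapping document for loading a sales table into the warehouse.
MAPPING_DOCUMENT = [
    MappingRule("sale_dt", "sale_date", lambda v: v.strip()[:10], "Keep the YYYY-MM-DD part"),
    MappingRule("amt", "amount", lambda v: round(float(v), 2), "Parse as a number, round to 2 places"),
    MappingRule("region", "region_code", lambda v: v.strip().upper(), "Trim and upper-case"),
]

def apply_mapping(row: dict, rules=MAPPING_DOCUMENT) -> dict:
    """Build one target row by applying each rule to its source column."""
    return {rule.target_column: rule.transform(row[rule.source_column]) for rule in rules}

if __name__ == "__main__":
    # Print the mapping document itself, then one transformed row.
    for rule in MAPPING_DOCUMENT:
        print(f"{rule.source_column} -> {rule.target_column}: {rule.description}")
    print(apply_mapping({"sale_dt": "2024-03-01 14:22:00", "amt": "19.999", "region": " emea "}))
```

Because the document is data, the same rules can be printed for review and executed, which keeps
the documentation and the actual mappings from drifting apart.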
d) Data Transformation
Because enterprise data resides in a variety of locations and formats, data transformation is
essential to break information silos and draw insights. Data mapping is the first step in data
transformation: it defines what changes will be made to the data before it is loaded into the
target database.
e) EDI File Conversion
Data mapping plays a significant role in EDI (Electronic Data Interchange) file conversion,
converting EDI files to and from various formats, such as XML, JSON, and Excel. An intuitive data
mapping tool allows the user to extract data from different sources and use built-in
transformations and functions to map data to EDI formats without writing a single line of code.
This enables seamless B2B data exchange.
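The sketch below parses a tiny EDI fragment loosely modeled on an X12 850 purchase order, assuming
"~" as the segment terminator and "*" as the element separator, and maps selected elements to a
JSON-ready structure. Real interchanges also carry envelope segments (ISA, GS) and declare their own
delimiters, which this sketch ignores; the element positions used for the PO number, date, and line
items are assumptions for illustration.

```python
import json

def parse_segments(edi_text: str, segment_sep: str = "~", element_sep: str = "*") -> list:
    """Split raw EDI text into segments, each a list of elements."""
    return [seg.strip().split(element_sep) for seg in edi_text.split(segment_sep) if seg.strip()]

def map_purchase_order(segments: list) -> dict:
    """Map selected EDI elements to a JSON-friendly purchase order (positions assumed)."""
    order = {"po_number": None, "po_date": None, "lines": []}
    for seg in segments:
        tag = seg[0]
        if tag == "BEG":
            order["po_number"] = seg[3]
            order["po_date"] = seg[5]
        elif tag == "PO1":
            order["lines"].append({
                "quantity": int(seg[2]),
                "unit": seg[3],
                "unit_price": float(seg[4]),
                "product_id": seg[7],
            })
    return order

if __name__ == "__main__":
    sample = "BEG*00*SA*PO12345**20240301~PO1*1*10*EA*9.50**VP*WIDGET-01~"
    print(json.dumps(map_purchase_order(parse_segments(sample)), indent=2))
```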
Based on the level of automation, data mapping techniques can be divided into three types:
Manual data mapping involves hand-coding the mappings between the data source and target
database. Although hand-coding initially offers unlimited flexibility for unique mapping
scenarios, it can become challenging to maintain and scale as the mapping needs of the business
grow more complex.
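A hand-coded mapping is simply a function written for one source layout. The hypothetical legacy
fields below show both sides of the trade-off: any rule can be expressed, but every new field or
rule change means editing and retesting code.

```python
def map_legacy_customer(src: dict) -> dict:
    """Hand-coded mapping from a hypothetical legacy record to a target record."""
    # Split "First Last" on the first space; anything after it becomes the last name.
    first, _, last = src["CUST_NAME"].partition(" ")
    return {
        "first_name": first,
        "last_name": last,
        # Business rule coded by hand: the legacy status flag "A" means active.
        "is_active": src["STATUS"] == "A",
        # Another hand-coded rule: phone numbers keep digits only.
        "phone": "".join(ch for ch in src["PHONE"] if ch.isdigit()),
    }

if __name__ == "__main__":
    print(map_legacy_customer({"CUST_NAME": "Grace Hopper", "STATUS": "A", "PHONE": "(555) 010-0199"}))
```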
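Schema mapping is a semi-automated technique: a data mapping tool establishes relationships between
the data source and the target schema, and IT professionals review the connections it proposes and
make any required adjustments.
Figure: Schema mapping between Database 1 and Database 2 (fields shown: Student Name, Name, ID,
SSN, Level, Major, Grades, and Marks).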
Once schema mapping is complete, the tool generates code, such as Java, C++, or C#, to perform the
required data conversion tasks. The programming language depends on the data mapping tool.
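The semi-automated workflow of schema mapping can be sketched as a tool that proposes matches and a
person who reviews them. Here the proposals come from name similarity via Python's
difflib.SequenceMatcher, a stand-in chosen for illustration (commercial tools may use other matching
techniques), and the field names are taken from the Database 1 and Database 2 example.

```python
from difflib import SequenceMatcher

def suggest_mappings(source_fields, target_fields, threshold=0.5):
    """Propose source->target pairs by name similarity for a human to review."""
    suggestions = {}
    for src in source_fields:
        best, best_score = None, 0.0
        for tgt in target_fields:
            score = SequenceMatcher(None, src.lower(), tgt.lower()).ratio()
            if score > best_score:
                best, best_score = tgt, score
        # Below the threshold, leave the field unmapped so a reviewer decides.
        suggestions[src] = best if best_score >= threshold else None
    return suggestions

if __name__ == "__main__":
    proposals = suggest_mappings(["Student Name", "ID", "Major", "Marks"],
                                 ["Name", "SSN", "Major", "Grades"])
    for src, tgt in proposals.items():
        print(f"{src} -> {tgt}  (needs review)")
```

With these sample fields, Marks is paired with Major purely on spelling and ID gets no suggestion at
all; this is exactly where the reviewer steps in (Marks more plausibly corresponds to Grades, and ID
to SSN).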
Automated data mapping tools provide a completely code-free environment for data mapping tasks of
any complexity. Mappings are created between the data source and target database in a simple
drag-and-drop manner. An automated data mapping tool also has built-in transformations to convert
data from XML to JSON, EDI to XML, XML to XLS, hierarchical to flat files, and between other
formats without writing a single line of code. Some enterprise-grade data mapping software also
offers process orchestration and job scheduling features to automate database mapping.
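Converting XML to JSON, one of the transformations mentioned above, can be sketched with Python's
standard xml.etree.ElementTree and json modules. The conventions here (attributes prefixed with "@",
stray text stored under "#text", repeated child tags collected into lists, all values kept as
strings) are assumptions for illustration; tools differ in how they handle these cases.

```python
import json
import xml.etree.ElementTree as ET

def element_to_dict(elem: ET.Element):
    """Recursively convert an XML element into dicts, lists, and strings."""
    children = list(elem)
    text = (elem.text or "").strip()
    # A leaf with no attributes becomes a plain string.
    if not children and not elem.attrib:
        return text
    # Attributes are stored with an "@" prefix (an assumed convention).
    result = {f"@{key}": value for key, value in elem.attrib.items()}
    for child in children:
        value = element_to_dict(child)
        if child.tag in result:
            # A repeated child tag is collected into a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    # Non-whitespace text next to attributes or children goes under "#text".
    if text:
        result["#text"] = text
    return result

def xml_to_json(xml_text: str) -> str:
    """Parse an XML document and serialize it as JSON keyed by the root tag."""
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: element_to_dict(root)}, indent=2)

if __name__ == "__main__":
    print(xml_to_json("<order id='7'><item>Widget</item><item>Gadget</item><total>12.5</total></order>"))
```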
Data mapping tools can be broadly grouped into three types:
1. On-Premise: Such tools are hosted on a company’s own servers and computing infrastructure.
Many on-premise data mapping tools eliminate the need for hand-coding to create complex
mappings and automate repetitive tasks in the data mapping process.
2. Cloud-Based: These tools run on cloud infrastructure rather than on a company’s own servers,
leveraging cloud technology to help a business perform its data mapping projects.
3. Open-Source: Open-source mapping tools provide a low-cost alternative to on-premise
data mapping solutions. These tools work better for small businesses with lower data
volumes and simpler use cases.
Selecting the data mapping tool that best fits the enterprise is critical to the success of any
data integration, data transformation, or data warehousing project. The process involves
identifying the unique data mapping requirements and must-have features of the business.
The key to choosing the right data mapping software is research. Online reviews on websites like
Capterra, G2 Crowd, and Software Advice can be a good starting point to shortlist data mapping
software that offers the maximum number of features. The next step is to classify the features of
these tools into three categories, namely must-haves, good-to-haves, and will-not-use, depending
on the unique data management needs of the business.
Some of the key features that a data mapping solution must have include:
a) Support for Various Data Formats and Applications
Support for various databases and for hierarchical and flat file formats, such as delimited, XML,
JSON (JavaScript Object Notation), EDI, Excel, and text files, is a basic staple of all data
mapping tools. In addition, for businesses that need to integrate structured data with
semi-structured and unstructured data sources, support for PDF, PDF forms, RTF, weblogs, etc., is
also a key feature.
If your business uses a cloud-based CRM application, such as Salesforce or Microsoft Dynamics
CRM, look for a data mapping tool that offers out-of-the-box connectivity to these enterprise
applications.
b) Code-Free Mapping Interface
To break down information silos and give both data professionals and business users access to
enterprise data, it is important to select a data mapping solution that offers a code-free way to
create data maps. User-friendly data mapping tools feature an extensive library of transformations
to fulfill the data conversion needs of an enterprise, from built-in transformations to join,
filter, and sort data to a range of expressions and functions.
c) Process Orchestration and Job Scheduling
Since data mapping jobs, if not automated, can consume a significant amount of developer time and
resources, opting for data mapping software with process orchestration capabilities can bring
cost savings to a business.
With the ability to orchestrate a complete database mapping workflow and to schedule jobs on a
time-based or event-triggered basis, these data mapping solutions automate the data mapping and
transformation process, thereby delivering analytics-ready data faster.
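Orchestration and time-based scheduling can be sketched in a few lines: steps run in dependency
order, and a trigger decides when the workflow is due. The step names and the run_workflow and
is_due helpers are hypothetical; graphlib.TopologicalSorter is part of the Python standard library
from version 3.9. An event-triggered schedule would instead call run_workflow when an event, such as
a file arriving, occurs.

```python
from datetime import datetime, timedelta
from graphlib import TopologicalSorter

# Hypothetical steps; each reads and writes a shared context dictionary.
def extract(ctx):
    ctx["rows"] = [{"Name": "Ada Lovelace"}, {"Name": "Grace Hopper"}]

def map_fields(ctx):
    ctx["mapped"] = [{"FullName": row["Name"]} for row in ctx["rows"]]

def load(ctx):
    ctx["loaded_count"] = len(ctx["mapped"])

STEPS = {"extract": extract, "map": map_fields, "load": load}
# Each step lists the steps that must finish before it runs.
DEPENDENCIES = {"map": {"extract"}, "load": {"map"}}

def run_workflow(steps, dependencies, ctx):
    """Run every step after its prerequisites and return the order used."""
    graph = {name: dependencies.get(name, set()) for name in steps}
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        steps[name](ctx)
    return order

def is_due(last_run: datetime, now: datetime, interval: timedelta) -> bool:
    """Time-based trigger: True once the interval has passed since the last run."""
    return now - last_run >= interval

if __name__ == "__main__":
    last_run = datetime(2024, 1, 1, 0, 0)
    if is_due(last_run, datetime(2024, 1, 1, 6, 0), timedelta(hours=6)):
        context = {}
        print(run_workflow(STEPS, DEPENDENCIES, context), context["loaded_count"])
```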
d) Instant Data Preview Feature for Real-Time Testing and Validation of Mappings
Mapping data to and from formats such as JSON, XML, and EDI can be complex due to the diversity
in data structures. To prevent mapping errors at design time, an effective data mapping tool
should feature an Instant Data Preview engine, which lets the user view both the raw and the
processed data at any step of the data management process.
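The idea behind a design-time preview can be sketched as running the mapping on only the first few
rows and showing each raw row next to its result, or next to the error it raised. The preview helper
and the sample mapper are hypothetical and do not reflect how any particular product's preview
engine works.

```python
from itertools import islice

def preview(rows, mapper, limit=3):
    """Apply mapper to the first `limit` rows, keeping raw input, output, and any error."""
    results = []
    for raw in islice(rows, limit):
        try:
            results.append((raw, mapper(raw), None))
        except (KeyError, ValueError, TypeError) as exc:
            # Surface the problem at design time instead of failing the full run.
            results.append((raw, None, repr(exc)))
    return results

def sample_mapper(row: dict) -> dict:
    """Hypothetical mapping under test."""
    return {"FullName": row["name"].title(), "Email": row["email"].lower()}

if __name__ == "__main__":
    rows = [
        {"name": "ada lovelace", "email": "ADA@EXAMPLE.COM"},
        {"name": "grace hopper"},  # missing email: the preview shows the error
    ]
    for raw, mapped, error in preview(rows, sample_mapper):
        print("RAW:", raw, "| MAPPED:", mapped, "| ERROR:", error)
```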
e) Synonym Dictionary for Third-Party Data
Often, companies are required to leverage incoming data from business partners, such as resellers
and suppliers. Mapping and integrating data from third parties can be challenging due to
differences in data representation. For example, one vendor might name the Order ID field
‘Order No.’ while another might name it ‘Order #’. Hence, an agile data mapping solution should
offer synonym-driven file reading and mapping to address such naming conflicts. This is done by
defining synonyms for a word in the synonym dictionary of a particular project.
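A synonym dictionary can be as simple as a lookup from every known variant to one canonical field
name. In this sketch the dictionary contents and helper names are hypothetical, and matching ignores
case and surrounding whitespace, which is an assumption about how lenient the lookup should be.

```python
# Hypothetical project-level synonym dictionary: canonical field -> known variants.
SYNONYMS = {
    "Order ID": ["Order No.", "Order #", "OrderNumber"],
    "Customer": ["Client", "Buyer"],
}

def normalize(name: str) -> str:
    """Compare names case-insensitively and ignore surrounding whitespace."""
    return name.strip().lower()

def build_lookup(synonyms: dict) -> dict:
    """Index every canonical name and synonym by its normalized form."""
    lookup = {}
    for canonical, variants in synonyms.items():
        for name in [canonical, *variants]:
            lookup[normalize(name)] = canonical
    return lookup

def remap_headers(headers, synonyms=SYNONYMS) -> list:
    """Replace each incoming header with its canonical name; keep unknown headers as-is."""
    lookup = build_lookup(synonyms)
    return [lookup.get(normalize(h), h) for h in headers]

if __name__ == "__main__":
    print(remap_headers(["Order #", "Client", "Qty"]))
    print(remap_headers(["order no.", "Buyer"]))
```

Unknown headers pass through unchanged, so a vendor file with an unexpected column still loads and
the new name can be added to the dictionary later.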