Module 2 Lesson 1

IT 2217

INTEGRATIVE PROGRAMMING AND TECHNOLOGIES 1

Precious Joan D. Samson, MIT [email protected]


Course Learning Outcome:

At the end of the semester, the students will be able to:

1. Understand data standards and formats;
2. Recognize mapping tools and techniques;
3. Design and implement mappings between different data schemas;
4. Assess the suitability of different tools and platforms for specific data mapping and exchange scenarios; and
5. Apply encryption, authentication, and authorization techniques to secure data exchange.
INTRODUCTION

Data mapping is the process of matching data fields from one source to data fields in another
source. It helps ensure that data from one source can be accurately and effectively transformed
or transferred to another destination while maintaining its integrity, consistency, and meaning.

Data mapping and exchange are important because they ensure data is transferred accurately
and consistently between sources:

Data mapping

The process of matching data fields from one source to another. It's the first step in data
processes like ETL and data integration. Data mapping creates a map that shows how data should
move from one system to another.

Data exchange

The process of sharing data between systems or databases. Data mapping is the foundation for
data exchange.

Data migration

The process of transferring data from one source to a destination. Data migration can be used to
move data from outdated sources to new systems, or from a local source to the cloud.

Data integration

The process of combining data from multiple sources into a single destination, such as a data
warehouse or workflow. Data integration can involve combining data from spreadsheets,
databases, and other sources.

Data mapping is important because it helps ensure that data maintains its integrity, consistency,
and meaning when it's transferred or transformed.
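
As a minimal illustration, a data map can be represented in code as a dictionary that relates source field names to target field names, together with a function that applies the map to a record. The field names below are hypothetical examples, not taken from any particular system:

```python
# A minimal data-mapping sketch: source field names -> target field names.
# The field names here are hypothetical examples, not a real schema.
FIELD_MAP = {
    "cust_name": "customer_name",
    "cust_email": "email_address",
    "signup_dt": "registration_date",
}

def apply_mapping(source_record: dict, field_map: dict) -> dict:
    """Return a new record whose keys follow the target schema."""
    return {target: source_record[source]
            for source, target in field_map.items()
            if source in source_record}

# Example: transform one source record into the target layout.
source = {"cust_name": "Ana Cruz", "cust_email": "ana@example.com", "signup_dt": "2024-06-01"}
print(apply_mapping(source, FIELD_MAP))
# {'customer_name': 'Ana Cruz', 'email_address': 'ana@example.com', 'registration_date': '2024-06-01'}
```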

DATA FORMATS AND STANDARDS

Some common data formats and standards used in data mapping and exchange include:

• CSV: Plain-text files with columns of data separated by commas (or by tabs in the related TSV format)
• JSON: A lightweight text format that represents data as nested key-value pairs and arrays
• XML and HTML: Markup languages; XML in particular is widely used for exchanging structured documents and data between systems
• SQL and NoSQL: Database paradigms whose relational and non-relational stores are common sources and targets for mapped data
• ETL and ELT: Extract, transform, and load (or extract, load, then transform) processes, which include data batching, transformation, and scheduling tools
• EDI and API: Standards and interfaces for exchanging data between systems; EDI defines standardized business documents, while APIs expose data programmatically
• Shapefile: A GIS format that is accepted by both commercial and open source software
• DXF: Drawing Exchange Format, a standard for exchanging graphic data developed by Autodesk
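
To make the difference between two of these plain-text formats concrete, the short sketch below (with hypothetical field names) writes the same record as both CSV and JSON using only Python's standard library:

```python
import csv
import io
import json

# One hypothetical record expressed in two common exchange formats.
record = {"product_id": 101, "name": "USB cable", "price": 3.50}

# CSV: a header row plus comma-separated values.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=record.keys())
writer.writeheader()
writer.writerow(record)
print(buffer.getvalue())   # product_id,name,price  /  101,USB cable,3.5

# JSON: nested key-value pairs, readable by most modern APIs.
print(json.dumps(record, indent=2))
```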

Differences between structured, semi-structured, and unstructured data:

The main differences between structured, semi-structured, and unstructured data are their level
of organization, format, and suitability for different types of data:

Structured data

Has a rigid format and is highly organized. Structured data is often stored in tables in databases
and is suitable for businesses that work with numerical data. Examples of structured data include
names, addresses, and credit card numbers.

Semi-structured data

Has some structure but is not strictly bound to a schema. Semi-structured data is a good choice
for combining structured and unstructured data, and is often used in the semantic web. Examples
of semi-structured data include digital photographs, which have some structured attributes like a
geotag and date, but can also be assigned tags like "yellow flower" or "white cat".

Unstructured data

Lacks a specific data model and is not organized. Unstructured data is best for data that can't be
easily structured, like text files, satellite images, and scientific data. Examples of unstructured
data include emails, social media posts, website content, and multimedia files.
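
As an illustrative (hypothetical) comparison, the same photograph's information can appear in all three forms:

```python
# Structured: a fixed row that fits a database table (id, date, latitude, longitude).
structured_row = ("IMG_0042", "2024-06-01", 14.5995, 120.9842)

# Semi-structured: JSON with a known core plus free-form tags not bound to a schema.
semi_structured = {
    "id": "IMG_0042",
    "date": "2024-06-01",
    "geotag": {"lat": 14.5995, "lon": 120.9842},
    "tags": ["yellow flower", "garden"],
}

# Unstructured: free text with no data model at all.
unstructured = "Took this photo of a yellow flower in the garden this morning."
```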

DATA MAPPING FUNDAMENTALS

Data mapping is a process that involves tracking, documenting, and integrating data elements
to ensure data accuracy when moving data from one source to another. Here are some
fundamentals of data mapping:

Steps: The basic steps of data mapping include:


o Determine data fields: Decide which data fields to include in the map
o Naming conventions: Establish standard naming conventions
o Schema logic: Define schema logic or transformation rules
o Test logic: Test the logic on a small sample
o Complete the map: Finish the data map
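
A compact sketch of these steps in code, using hypothetical field names: the map names the fields to include and their target naming convention, one transformation rule handles a schema difference (a date format change), and the logic is tested on a small sample before the full map is completed:

```python
from datetime import datetime

# Steps 1-2: decide which fields to include and fix the target naming convention.
FIELD_MAP = {"CustID": "customer_id", "SignupDate": "registration_date"}

# Step 3: schema logic / transformation rules (here: reformat a US-style date to ISO).
def transform(source_field, value):
    if source_field == "SignupDate":
        return datetime.strptime(value, "%m/%d/%Y").date().isoformat()
    return value

def map_record(record):
    return {FIELD_MAP[k]: transform(k, v) for k, v in record.items() if k in FIELD_MAP}

# Step 4: test the logic on a small sample before completing the full map.
sample = {"CustID": "C-001", "SignupDate": "06/01/2024", "Notes": "ignored field"}
assert map_record(sample) == {"customer_id": "C-001", "registration_date": "2024-06-01"}
```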

Data mapping is essential for maintaining data quality and accuracy in a data warehouse. It's also
a foundational process for organizations that want a compliant privacy program.
Types of Data Mapping

There are several types of data mapping, including manual, semi-automated, and automated:

• Manual data mapping

Analysts use coding languages like SQL, C++, or Java to connect data sources and
document the process.

• Semi-automated data mapping

Also called schema mapping, this technique combines manual and automated
processes. A developer uses software to define relationships between similar data fields,
and then an IT specialist manually adjusts the connections. This method is good for
companies with limited budgets or small amounts of data.

• Automated data mapping

A tool handles all aspects of data mapping, which can be beneficial if a team lacks a
developer. This approach is good for eliminating human error and mapping large
datasets.

Other data mapping techniques include:

• Spreadsheet-based mapping
• Code-based mapping
• Metadata-driven mapping
• Schema matching and data profiling
• Rule-based mapping
• Template-based mapping

The type of data mapping technique you choose depends on the size and complexity of your
dataset, your desired level of precision, and your available resources.

Tools and techniques for data mapping

Here are some tools and techniques for data mapping:

• Data mapping tools


These tools can automate and monitor the data mapping process, and generate
documentation and reports. Some examples include:

• IBM InfoSphere DataStage: A tool that allows users to process large amounts of
complex data, and work on metadata projects

• CloverDX: A data integration platform that allows users to build and deploy
workflows to connect and transform data from various sources
• Jitterbit: A low-code data mapping platform that allows users to connect
applications and data, automate business processes, and create ETL pipelines

• Talend Open Studio: A free data mapping tool that allows users to visually map
source data to target data types

• Altova MapForce Platform: An integrated suite of tools for graphical data mapping, conversion, and integration across formats such as XML, JSON, databases, and EDI

• Data quality tools

These tools allow users to define data mapping rules, perform data profiling, and identify
data quality issues

• Data mapping documentation

This documentation helps ensure transparency and collaboration, and provides a reference for troubleshooting and future enhancements

When choosing a data mapping tool, you can consider things like:

• Ease of use
• Flexibility and scalability
• Data transformation capabilities
• Connectivity and integration
• Data quality features
• Performance and speed
• Debugging and error handling

DATA EXCHANGE PROTOCOLS

Data exchange protocols are a set of rules that govern how data is transmitted and received
between systems, and are essential for data mapping and exchange:

Data formats

Define how data is organized and presented, allowing different systems to interpret and use it
correctly. Examples include XML, JSON, and CSV.

Data exchange protocols

Govern how data is transmitted, ensuring that data exchange is secure and smooth. Examples
include HTTP, FTP, WebSockets, EDI, cXML, and API web services.

Data mapping

Connects different data sources so they can work together and speak the same language. For example, if one system labels a customer's age as "Age" and another stores "Birth Year", data mapping records that both fields describe the same attribute and defines the transformation rule (converting birth year into age) needed to reconcile them.
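
A small sketch of that example, with hypothetical field names: the rule converts the birth year the second system stores into the age the first system expects (approximately, since month and day are ignored):

```python
from datetime import date

# System B stores "Birth Year"; System A expects "Age".
def birth_year_to_age(birth_year, today=None):
    today = today or date.today()
    return today.year - birth_year   # approximate: ignores month and day

system_b_record = {"Name": "Ana Cruz", "Birth Year": 1999}

system_a_record = {
    "Name": system_b_record["Name"],
    "Age": birth_year_to_age(system_b_record["Birth Year"]),
}
print(system_a_record)   # e.g. {'Name': 'Ana Cruz', 'Age': 26}
```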

The choice of data formats and protocols depends on several factors, including the nature of the data, the requirements of the participating entities, the specific objectives of the data exchange, and the existing technology infrastructure.

COMMON CHALLENGES IN DATA MAPPING AND EXCHANGE

A. Data Incompatibility

Data incompatibility is when data has different formats, scales, ranges, units, or types, making it
difficult to process or integrate. This can happen when combining data from different sources, or
when the chosen method isn't appropriate for the data.

Data incompatibility can cause errors, inconsistencies, or poor quality results. For example, in
Google Analytics, you can't combine certain dimensions and metrics if they aren't compatible.

Here are some strategies for handling incompatible data:


• Identify and fix errors: Find and correct any errors in the data.
• Use a hybrid method: Use a combination of methods to ensure that each unit of data is
considered.
• Validate and monitor data: This helps ensure the quality of the data integration and the
accuracy of data points.
• Use data profiling, cleansing, mapping, transformation, and validation: These strategies
can help with data integration.
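
For instance, in this hypothetical sketch two sources report the same measurement in different units and types; a small normalization step makes the records compatible before integration:

```python
# Source A reports weight in kilograms as floats; Source B in pounds as strings.
source_a = [{"item": "box", "weight_kg": 2.5}]
source_b = [{"item": "crate", "weight_lb": "11.0"}]

LB_TO_KG = 0.453592

def normalize(record):
    """Return every record with a numeric weight_kg field."""
    if "weight_kg" in record:
        return {"item": record["item"], "weight_kg": float(record["weight_kg"])}
    return {"item": record["item"], "weight_kg": float(record["weight_lb"]) * LB_TO_KG}

combined = [normalize(r) for r in source_a + source_b]
print(combined)  # both records now share the same unit, scale, and type
```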

B. Data Quality Issues

Data quality issues can impact the accuracy and reliability of data, making it difficult to make
data-driven decisions. Some common data quality issues include:

Duplicate data:

When the same information is recorded more than once, which can skew analysis and lead to
errors.

Inaccurate data:
Data that contains errors, such as missing or incorrect values, typographical errors, or other
issues.
Incomplete data:

When a dataset has gaps due to missing data or records that are missing required fields.

Inconsistent data:

When the same information across multiple data sources has mismatches in formats, units, or
spellings.

Data overload:

When a large data set is overwhelming, making it difficult to find the necessary information.

Outdated data:

When data becomes obsolete because real-world changes are never reflected in the systems that store it.

Ambiguous data:

When errors such as spelling mistakes, inconsistent formatting, or misleading column headings in large databases or data lakes make the meaning of the data unclear.
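
As a small illustrative sketch (hypothetical records), two of these issues, duplicates and incomplete records, can be detected with plain Python before data is exchanged:

```python
records = [
    {"email": "ana@example.com", "city": "Manila"},
    {"email": "ana@example.com", "city": "Manila"},   # duplicate
    {"email": "ben@example.com", "city": None},       # incomplete
]

seen, duplicates, incomplete = set(), [], []
for r in records:
    key = r["email"]
    if key in seen:
        duplicates.append(r)          # same key recorded more than once
    seen.add(key)
    if any(v in (None, "") for v in r.values()):
        incomplete.append(r)          # required value is missing

print(len(duplicates), "duplicate record(s),", len(incomplete), "incomplete record(s)")
```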

C. Scalability

One of the most obvious system scalability issues is hardware limitations. This means that the
physical components of the system, such as CPU, memory, disk, network, or power supply, are
not sufficient to meet the demand or growth of the system.

D. Security

Here are some ways to ensure data integrity and privacy during exchange:

• Secure file transfer: Use encryption, authentication, and secure protocols to protect
data during transit.
• Data encryption: Encrypt data in transit and at rest using industry-standard encryption
protocols like SSL/TLS.
• Strong authentication: Use strong authentication protocols like OAuth 2.0, API keys,
tokens, and multi-factor authentication.
• Audit trails: Maintain an audit trail to track the source of data changes, who triggered
them, and when.
• Risk management: Adopt a proactive approach to risk management.
• Blockchain: Use blockchain to create an immutable ledger for data transactions.
• Data validation: Perform risk-based validation to ensure protocols address data quality
and reliability.
• Regular backups: Have regular backups and recovery plans.
• Data versioning: Use data versioning and timestamps.
• Error handling: Have error handling mechanisms.
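
One way to illustrate shared-key authentication and integrity checking for an exchanged payload is an HMAC signature, sketched below with only Python's standard library. The shared key is a made-up example (real keys should come from a secrets manager), and transport encryption such as SSL/TLS is handled by the connection layer and is not shown:

```python
import hashlib
import hmac
import json

SHARED_KEY = b"example-shared-secret"   # hypothetical; never hard-code real keys

def sign(payload: dict) -> str:
    body = json.dumps(payload, sort_keys=True).encode()
    return hmac.new(SHARED_KEY, body, hashlib.sha256).hexdigest()

def verify(payload: dict, signature: str) -> bool:
    return hmac.compare_digest(sign(payload), signature)

# Sender attaches a signature; receiver recomputes and compares it.
message = {"order_id": 42, "amount": 99.5}
sig = sign(message)
print(verify(message, sig))                       # True
print(verify({**message, "amount": 1.0}, sig))    # False: payload was altered
```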

E. Version Control

Schema evolution is the process of managing changes to a database schema while preserving
existing data and compatibility with the old schema. Here are some best practices for handling
schema evolution in dynamic systems:

• Schema design: Use a data format that supports schema evolution, like JSON, Avro, or
Parquet.
• Version control: Implement version control to track changes and maintain backward
compatibility.
• Schema registry: Use a schema registry to store blueprints for each type of data in a
centralized repository.
• Semantic versioning: Use semantic versioning rules to guide compatibility strategies.
• Backward and forward compatibility: Design schemas to ensure smooth transitions
between schema versions.
• Regular schema reviews: Conduct regular schema reviews to ensure that schema
changes are intentional and documented.
• Change management: Implement change management processes to ensure that
schema changes are aligned with the overall design and application requirements.
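
A small illustrative sketch of backward compatibility, with hypothetical field names: records written under schema v1 lack a field that v2 adds, so a v2 reader supplies a default instead of failing:

```python
# Schema v1 records have no "currency" field; schema v2 adds it.
SCHEMA_DEFAULTS_V2 = {"currency": "PHP"}   # hypothetical default for the new field

def read_record(raw: dict) -> dict:
    """Upgrade any record to the v2 shape, tolerating older versions."""
    record = {**SCHEMA_DEFAULTS_V2, **raw}      # missing fields get defaults
    record["schema_version"] = 2
    return record

old_record = {"order_id": 1, "amount": 250.0}                     # written under v1
new_record = {"order_id": 2, "amount": 99.5, "currency": "USD"}   # written under v2

print(read_record(old_record))   # currency defaults to 'PHP'
print(read_record(new_record))   # existing value is preserved
```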

Some challenges of schema evolution include data loss, compatibility issues, inconsistent schemas, and data type mismatches.

BEST PRACTICES

• Early Validation:
o Test mappings for accuracy before full implementation.

• Documentation:
o Maintain detailed schema mappings and transformation logic.

• Use Standard Formats:
o Adhere to widely accepted formats and protocols.

• Automation:
o Employ tools to reduce manual errors and improve efficiency.

• Monitor and Audit:
o Implement logging and monitoring for data exchange processes.

FUTURE TRENDS

Some future trends in data mapping and exchange include:

• Automation
• Cloud-based solutions
• Data governance
• Real-time integration
• Improved data quality
• Machine learning

Each of these is expected to be a driving force in the future of data mapping and exchange.

Emerging technologies:

Emerging technologies like artificial intelligence (AI), blockchain, and edge computing will
enhance data processing capabilities, improve security, and streamline data integration.

Generative AI:

Generative AI will be a driving force in data management trends.

Artificial General Intelligence (AGI):

AGI is a hypothetical form of AI with broad capabilities that could match or outperform humans in many tasks.

Retrieval-augmented generation (RAG):

RAG is a technique that improves the accuracy and reliability of GenAI models by incorporating
information from external sources.

Gartner prediction:

Gartner predicts that more than 50% of critical data will be created and processed outside of the enterprise's data center and cloud by 2025.

Summary

Data Mapping and Exchange refer to the processes of aligning data fields between different
systems and transferring data in a consistent and usable format.

• Data Mapping ensures that information from a source system corresponds correctly to
the target system, enabling integration, migration, and interoperability.
• Data Exchange focuses on transferring this mapped data using standardized formats
(e.g., JSON, XML, CSV) and protocols (e.g., REST, SOAP, EDI).
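
As a closing sketch tying these two ideas together (the endpoint URL is hypothetical), a mapped record can be exchanged as JSON over a REST-style HTTP POST using Python's standard library:

```python
import json
import urllib.request

mapped_record = {"customer_name": "Ana Cruz", "age": 26}

# Hypothetical endpoint; real exchanges should use HTTPS and proper authentication.
request = urllib.request.Request(
    "https://api.example.com/customers",
    data=json.dumps(mapped_record).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# response = urllib.request.urlopen(request)   # uncomment against a live endpoint
```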

These processes are vital for seamless communication between systems, addressing challenges
like schema mismatches, data quality, and security. They are often supported by tools and
automation to enhance accuracy and efficiency.
