Data Integration and The Extraction, Transformation and Loading Processes
Global competitive pressures, demand for return on investment (ROI), management and
investor inquiry, and government regulations are forcing business managers to rethink how they
integrate and manage their businesses. A decision maker typically needs access to multiple
sources of data that must be integrated. Before data warehouses, DMs, and BI software,
providing access to data sources was a major, laborious process. Even with modern Web-based
data management tools, recognizing what data to access and providing them to the decision
maker is a nontrivial task that requires database specialists. As data warehouses grow in size, the
issues of integrating data grow as well. Business analysis needs also continue to evolve. Mergers
and acquisitions, regulatory requirements, and the introduction of new channels can drive changes
in data requirements; in addition, business users increasingly demand access to real-time, unstructured, and/or remote data.
And everything must be integrated with the contents of an existing data warehouse. Moreover,
access via PDAs and through speech recognition and synthesis is becoming more commonplace,
further complicating integration issues (Edwards, 2003). Many integration projects involve
enterprise-wide systems. Orovic (2003) provided a checklist of what works and what does not
work when attempting such a project. Properly integrating data from various databases and other
disparate sources is difficult. When it is not done properly, though, it can lead to disaster in
enterprise-wide systems such as CRM, ERP, and supply-chain projects (Nash, 2002).

Data Integration

Data integration comprises three major processes that, when correctly implemented,
permit data to be accessed and made accessible to an array of ETL and analysis tools and the
data warehousing environment: data access (i.e., the ability to access and extract data from any
data source), data federation (i.e., the integration of business views across multiple data stores),
and change capture (based on the identification, capture, and delivery of the changes made to
enterprise data sources). See Application Case 3.2 for an example of how BP Lubricant benefits
from implementing a data warehouse that integrates data from many sources. Some vendors,
such as SAS Institute, Inc., have developed strong data integration tools. The SAS Enterprise Data
Integration Server includes customer data integration tools that improve data quality in the
integration process. The Oracle Business Intelligence Suite assists in integrating data as well.
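To make the change-capture process concrete, the following is a minimal sketch in Python (using only the standard sqlite3 module). It detects inserts, updates, and deletes by comparing the current source rows against a snapshot kept from the previous extract; the customer table and all of its values are hypothetical, and real change-capture tools typically read database transaction logs rather than diffing snapshots.

import sqlite3

# Hypothetical enterprise data source.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
source.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                   [(1, "Acme", "Dallas"), (2, "Globex", "Austin"), (3, "Initech", "Tulsa")])

# Snapshot of the same table as it looked during the previous extract run.
previous = {1: ("Acme", "Dallas"), 2: ("Globex", "Houston")}

# Change capture: identify, capture, and deliver only the rows that changed.
changes = []
current_ids = set()
for cid, name, city in source.execute("SELECT id, name, city FROM customer"):
    current_ids.add(cid)
    if cid not in previous:
        changes.append(("insert", cid, name, city))
    elif previous[cid] != (name, city):
        changes.append(("update", cid, name, city))
for cid in previous.keys() - current_ids:
    changes.append(("delete", cid, None, None))

print(changes)
# [('update', 2, 'Globex', 'Austin'), ('insert', 3, 'Initech', 'Tulsa')]

Only the delta is delivered downstream, which is what lets the warehouse stay current without re-extracting every enterprise data source in full.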
At the heart of the technical side of the data warehousing process is extraction,
transformation, and load (ETL). ETL technologies, which have existed for some time, are
instrumental in the process and use of data warehouses. The ETL process is an integral
component in any data-centric project. IT managers are often faced with challenges because the
ETL process typically consumes 70% of the time in a data-centric project. The ETL process
consists of extraction (i.e., reading data from one or more databases), transformation (i.e.,
converting the extracted data from its previous form into the form required by the data warehouse
or another target database), and load (i.e., putting the data
into the data warehouse). Transformation occurs by using rules or lookup tables or by combining
the data with other data. The three database functions are integrated into one tool to pull data out
of one or more databases and place them into another, consolidated database or a data
warehouse. ETL tools also transport data between sources and targets, document how data
elements (e.g., metadata) change as they move between source and target, exchange metadata
with other applications as needed, and administer all runtime processes and operations (e.g.,
scheduling, error management, audit logs, statistics). ETL is extremely important for data
integration as well as for data warehousing. The purpose of the ETL process is to load the
warehouse with integrated and cleansed data. The data used in ETL processes can come from
any source: a mainframe application, an ERP application, a CRM tool, a flat file, an Excel
spreadsheet, or even a message queue. In Figure 3.9, we outline the ETL process.

FIGURE 3.9 The ETL Process. [Diagram: a packaged application, a legacy system, other internal applications, and a transient data source feed extract, transform, and cleanse steps whose output is loaded into the data warehouse and its data marts.]

The process of migrating data to a data warehouse involves the extraction of data from all relevant sources. Data sources may consist of files extracted from OLTP databases, spreadsheets, personal databases (e.g., Microsoft Access), or external files. Typically, all the input files are written to a set of staging tables, which are designed to facilitate the load process.
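To ground these three steps, here is a minimal sketch of the ETL flow in Python, again using only the standard sqlite3 module: it extracts rows from a hypothetical orders source, transforms them with a lookup table that standardizes an encoded attribute (one of the transformation techniques mentioned above), and loads the result into a staging table. Every table, column, and rule is invented for illustration; commercial ETL tools add the scheduling, error management, and audit logging described earlier.

import sqlite3

# Hypothetical source system.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (order_id INTEGER, region_code TEXT, amount REAL)")
source.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                   [(1, "TX", 250.0), (2, "CA", 125.5), (3, "tx", 80.0)])

# Hypothetical warehouse with a staging table to facilitate the load.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("""CREATE TABLE stg_orders
                     (order_id INTEGER, region_name TEXT, amount REAL)""")

# Transformation rule: a lookup table that standardizes an encoded attribute.
REGION_LOOKUP = {"TX": "Texas", "CA": "California"}

# Extract: read data from the source database.
rows = source.execute("SELECT order_id, region_code, amount FROM orders")

# Transform: standardize region codes; upper() cleanses inconsistent source casing.
cleansed = [(oid, REGION_LOOKUP.get(code.upper(), "Unknown"), amt)
            for oid, code, amt in rows]

# Load: put the transformed data into the warehouse staging table.
warehouse.executemany("INSERT INTO stg_orders VALUES (?, ?, ?)", cleansed)
warehouse.commit()
print(warehouse.execute("SELECT * FROM stg_orders").fetchall())
# [(1, 'Texas', 250.0), (2, 'California', 125.5), (3, 'Texas', 80.0)]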
A data warehouse contains numerous business rules that define such things as how the data will be used, summarization
rules, standardization of encoded attributes, and calculation rules. Any data quality issues
pertaining to the source files need to be corrected before the data are loaded into the data
warehouse. One of the benefits of a well-designed data warehouse is that these rules can be
stored in a metadata repository and applied to the data warehouse centrally. This differs from an
OLTP approach, which typically has data and business rules scattered throughout the system.
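As a rough illustration of this idea, the sketch below keeps business rules in a single repository (here simply a Python dict standing in for a metadata repository) and applies them centrally during loading, rather than scattering them through application code as in the OLTP approach. The rule names, columns, and values are all hypothetical.

# Centrally stored business rules: each rule lives in one repository and is
# applied uniformly to every record during the load. Names are hypothetical.
RULE_REPOSITORY = {
    "country": lambda v: {"US": "United States", "UK": "United Kingdom"}.get(v, v),
    "revenue": lambda v: round(float(v), 2),      # calculation rule
    "status":  lambda v: v.strip().upper(),       # standardization of an encoded attribute
}

def apply_rules(record: dict) -> dict:
    """Apply every centrally defined rule to the matching fields of a record."""
    return {col: RULE_REPOSITORY.get(col, lambda v: v)(val)
            for col, val in record.items()}

print(apply_rules({"country": "US", "revenue": "1234.567", "status": " active "}))
# {'country': 'United States', 'revenue': 1234.57, 'status': 'ACTIVE'}

Because the rules are data rather than scattered code, changing a summarization or standardization rule means updating the repository once, and every subsequent load picks up the change.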
The process of loading data into a data warehouse can be performed either through data
transformation tools that provide a GUI to aid in the development and maintenance of business
rules or through more traditional methods, such as developing programs or utilities to load the
data warehouse, using programming languages such as PL/SQL, C++, Java, or .NET Framework
languages. This decision is not easy for organizations. Several issues affect whether an
organization will purchase data transformation tools or build the transformation process itself:

• Data transformation tools are expensive.
• Data transformation tools may have a long learning curve.
• It is difficult to measure how well the IT organization is doing until it has learned to use the data transformation tools.

In the long run, however, a transformation-tool approach should simplify the detection and
scrubbing of data (i.e., the removal of any anomalies in the data). OLAP and data mining tools
rely on how well the data are transformed. As an example of effective ETL, Motorola, Inc., uses
ETL to feed its data warehouses. Motorola collects information from 30 different procurement
systems and sends it to its global SCM data warehouse for analysis of aggregate company
spending (see Songini, 2004). Solomon (2005) classified ETL technologies into four categories:
sophisticated, enabler, simple, and rudimentary. It is generally acknowledged that tools in the
sophisticated category will result in the ETL process being better documented and more
accurately managed as the data warehouse project evolves. Even though it is possible for
programmers to develop software for ETL, it is simpler to use an existing ETL tool. The
following are some of the important criteria in selecting an ETL tool (see Brown, 2004):

• Ability to read from and write to an unlimited number of data source architectures
• Automatic capturing and delivery of metadata
• An easy-to-use interface for the developer and the functional user

Performing extensive ETL may be a sign of
poorly managed data and a fundamental lack of a coherent data management strategy. Karacsony
(2006) indicated that there is a direct correlation between the extent of redundant data and the
number of ETL processes. When data are managed correctly as an enterprise asset, ETL efforts
are significantly reduced, and redundant data are completely eliminated. This leads to huge
savings in maintenance and greater efficiency in new development while also improving data
quality. Poorly designed ETL processes are costly to maintain, change, and update.
Consequently, it is crucial to make the proper choices in terms of the technology and tools to use
for developing and maintaining the ETL process. A number of packaged ETL tools are available.
Database vendors currently offer ETL capabilities that both enhance and compete with
independent ETL tools. SAS acknowledges the importance of data quality and offers the
industry’s first fully integrated solution that merges ETL and data quality to transform data into
strategically valuable assets. Other ETL software providers include Microsoft, Oracle, IBM,