ETL Process in Data Warehouse

The document discusses the ETL (extraction, transformation, loading) process used to integrate data from multiple source systems into a data warehouse. It describes how data is extracted from various sources, transformed for quality and consistency, and loaded into the data warehouse. Key aspects of the ETL process include extracting data from different source systems, cleaning and transforming data during the loading stage, and properly handling slowly changing dimensions when loading data into fact and dimension tables.


ETL Process in Data Warehouse
Data Warehouse
A subject-oriented, integrated, time-variant, non-updatable collection of data used in support of management decision-making processes
 Subject-oriented: e.g. customers, patients, students, products
 Integrated: consistent naming conventions, formats, encoding structures; from multiple data sources
 Time-variant: can study trends and changes
 Non-updatable: read-only, periodically refreshed; never deleted
Data Preprocessing Outline
 ETL
 Extraction
 Transformation
 Loading
ETL Architecture
[Diagram: operational data, file systems, databases, and external data feed a staging area, where the ETL engine extracts, transforms, cleanses, and normalises the data before loading it into the data warehouse and its data marts; the warehouse holds atomic data, summary data, and transient data]
ETL Overview
 Extraction, Transformation, Loading – ETL
 To get data out of the source and load it into the data warehouse – simply a process of copying data from one database to another
 Data is extracted from an OLTP database, transformed to match the data warehouse schema, and loaded into the data warehouse database (a minimal sketch follows this list)
 Many data warehouses also incorporate data from non-OLTP systems such as text files, legacy systems, and spreadsheets; such data also requires extraction, transformation, and loading
 When defining ETL for a data warehouse, it is important to think of ETL as a process, not a physical implementation
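
As a minimal sketch of this copy-and-reshape flow – assuming SQLite databases on both ends, with all table and column names purely hypothetical:

import sqlite3

# Hypothetical connections: an OLTP source and the warehouse database.
source = sqlite3.connect("oltp.db")
warehouse = sqlite3.connect("warehouse.db")

# Extract: pull rows from the operational system.
rows = source.execute(
    "SELECT order_id, customer, amount, order_date FROM orders"
).fetchall()

# Transform: reshape the rows to match the warehouse schema.
transformed = [(oid, cust.strip().upper(), float(amt), dt)
               for (oid, cust, amt, dt) in rows]

# Load: write into the warehouse fact table.
warehouse.executemany(
    "INSERT INTO fact_orders (order_id, customer, amount, order_date) "
    "VALUES (?, ?, ?, ?)",
    transformed,
)
warehouse.commit()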
ETL Overview
 ETL is often a complex combination of process and technology, involving business analysts, database designers, and application developers
 It is not a one-time event, as new data is added to the data warehouse periodically – monthly, daily, or hourly
 Because ETL is an integral, ongoing, and recurring part of a data warehouse, it should be:
 Automated
 Well documented
 Easily changeable
ETL Staging Database
 ETL operations should be performed on a relational database server separate from the source databases and the data warehouse database (sketched below)
 Creates a logical and physical separation between the source systems and the data warehouse
 Minimizes the impact of the intense periodic ETL activity on source and data warehouse databases
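
One way to honour this separation in code is to keep three distinct connections, so the intensive transformation work touches only the staging server; a sketch, with every database name hypothetical:

import sqlite3

# Three physically separate databases: reads hit the source only during
# extraction, heavy transformation work runs in staging, and the
# warehouse is touched only by the final load.
source = sqlite3.connect("source_oltp.db")    # operational system
staging = sqlite3.connect("etl_staging.db")   # dedicated ETL work area
warehouse = sqlite3.connect("warehouse.db")   # presentation layer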
Extraction
 The integration of all of the disparate systems across the enterprise is the real challenge in getting the data warehouse to a state where it is usable
 Data is extracted from heterogeneous data sources
 Each data source has its distinct set of characteristics that need to be managed and integrated into the ETL system in order to effectively extract data
Extraction
 The ETL process needs to effectively integrate systems that have different:
 DBMSs
 Operating systems
 Hardware
 Communication protocols
 Need to have a logical data map before the physical data can be transformed
 The logical data map describes the relationship between the extreme starting points and the extreme ending points of the ETL system, usually presented in a table or spreadsheet – a sketch follows this list
 The analysis of the source system is usually broken into two major phases:
 The data discovery phase
 The anomaly detection phase
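
A minimal sketch of a logical data map as a Python structure rather than a spreadsheet – every source/target pair and transformation rule here is hypothetical:

# Each entry ties an extreme starting point (source column) to an
# extreme ending point (warehouse column) and records the rule applied.
logical_data_map = [
    {"source": "crm.customers.cust_nm",
     "target": "dw.dim_customer.customer_name",
     "transformation": "trim whitespace, title-case"},
    {"source": "erp.orders.ord_dt",
     "target": "dw.fact_orders.order_date",
     "transformation": "parse DD/MM/YYYY into ISO-8601"},
    {"source": "erp.orders.amt",
     "target": "dw.fact_orders.amount",
     "transformation": "cast to decimal(12, 2)"},
]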
Extraction – Data Discovery Phase
 A key criterion for the success of the data warehouse is the cleanliness and cohesiveness of the data within it
 Once you understand what the target needs to look like, you need to identify and examine the data sources
 Understanding the content of the data is crucial for determining the best approach for retrieval, for example (a profiling sketch follows this list):
 NULL values
 Dates in non-date fields
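
A data-discovery sketch that profiles the two examples above – NULL values and dates hiding in non-date fields; the connection, table, and column names are assumptions:

import re
import sqlite3

conn = sqlite3.connect("source_oltp.db")  # hypothetical source extract

# Profile NULL values in a column the target schema requires.
nulls = conn.execute(
    "SELECT COUNT(*) FROM customers WHERE cust_nm IS NULL"
).fetchone()[0]
print(f"NULL customer names: {nulls}")

# Flag date-like strings stored in a free-text (non-date) field.
date_like = re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")
for (comment,) in conn.execute(
    "SELECT comment FROM customers WHERE comment IS NOT NULL"
):
    if date_like.search(comment):
        print("Date found in non-date field:", comment)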


Transformation
 The main step where the ETL process adds value
 Actually changes the data and provides guidance as to whether the data can be used for its intended purposes
Transformation
Data quality paradigm:
 Correct
 Unambiguous
 Consistent
 Complete
 Data quality checks are run at two places – after extraction, and after cleaning and confirming (additional checks are run at this point)
Transformation – Cleaning Data
 Anomaly detection
 Data sampling – e.g. a count(*) of the rows for each value in a department column
 Column property enforcement (a sketch of these checks follows this list)
 Null values in required columns
 Numeric values that fall outside expected highs and lows
 Columns whose lengths are exceptionally short or long
 Columns with values outside of discrete valid value sets
 Adherence to a required pattern, or membership in a set of patterns
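
Each of these column-property rules maps onto a simple profiling query; a sketch over a hypothetical staging table, where the thresholds and valid-value sets are assumptions:

import sqlite3

conn = sqlite3.connect("etl_staging.db")  # hypothetical staging area

checks = {
    # Null values in required columns
    "NULL in required dept": "SELECT COUNT(*) FROM staff WHERE dept IS NULL",
    # Numeric values outside expected highs and lows
    "salary out of range": "SELECT COUNT(*) FROM staff "
                           "WHERE salary NOT BETWEEN 10000 AND 500000",
    # Column lengths exceptionally short or long
    "odd name length": "SELECT COUNT(*) FROM staff "
                       "WHERE LENGTH(name) NOT BETWEEN 2 AND 60",
    # Values outside a discrete valid value set
    "invalid status code": "SELECT COUNT(*) FROM staff "
                           "WHERE status NOT IN ('A', 'I', 'P')",
}

for label, sql in checks.items():
    bad = conn.execute(sql).fetchone()[0]
    if bad:
        print(f"Anomaly – {label}: {bad} rows")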
Transformation – Confirming
 Structure enforcement (a sketch follows below)
 Tables have proper primary and foreign keys
 Obey referential integrity
 Data and rule value enforcement
 Simple business rules
 Logical data checks
[Diagram: staged data passes through cleaning and confirming; if fatal errors are found the process stops, otherwise it proceeds to loading]
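
A sketch of that fatal-error gate: an anti-join finds fact rows that violate referential integrity, and the load proceeds only when none are found (table and key names are hypothetical):

import sqlite3

conn = sqlite3.connect("etl_staging.db")  # hypothetical staging area

# Structure enforcement: fact rows whose customer key has no matching
# dimension row are orphans – a fatal error for this load.
orphans = conn.execute("""
    SELECT COUNT(*)
    FROM fact_orders f
    LEFT JOIN dim_customer d ON f.customer_key = d.customer_key
    WHERE d.customer_key IS NULL
""").fetchone()[0]

if orphans:
    raise SystemExit(f"Fatal errors: {orphans} orphaned fact rows – stopping")
print("Structure checks passed – proceeding to loading")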
Loading
 Loading Dimensions
 Loading Facts
Loading Dimensions
 Physically built to have the minimal set of components
 The primary key is a single field containing a meaningless unique integer – a surrogate key
 The DW owns these keys and never allows any other entity to assign them
 De-normalized flat tables – all attributes in a dimension must take on a single value in the presence of the dimension primary key
 Should possess one or more other fields that compose the natural key of the dimension
 The data loading module consists of all the steps required to administer slowly changing dimensions (SCDs) and write the dimension to disk as a physical table in the proper dimensional format, with correct primary keys, correct natural keys, and final descriptive attributes
 Creating and assigning the surrogate keys occurs in this module (a sketch follows this list)
 The table is definitely staged, since it is the object to be loaded into the presentation system of the data warehouse
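
A minimal sketch of surrogate key assignment during the dimension load – a warehouse-owned counter hands out meaningless integers keyed by the natural key (the function name and sample rows are hypothetical):

# Map each natural key to the surrogate key the warehouse assigned it.
surrogate_by_natural = {}
next_surrogate = 1

def assign_surrogate(natural_key):
    """Return the existing surrogate key, or mint the next integer."""
    global next_surrogate
    if natural_key not in surrogate_by_natural:
        surrogate_by_natural[natural_key] = next_surrogate
        next_surrogate += 1
    return surrogate_by_natural[natural_key]

# Incoming dimension rows carry only the natural key.
for customer_id, name in [("C-1001", "Acme Ltd"),
                          ("C-1002", "Globex"),
                          ("C-1001", "Acme Ltd")]:
    print(assign_surrogate(customer_id), customer_id, name)
# Prints: 1 C-1001 Acme Ltd / 2 C-1002 Globex / 1 C-1001 Acme Ltd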
Loading Dimensions
 When the DW receives notification that an existing row in a dimension has changed, it gives one of three types of responses (contrasted in the sketch below):
 Type 1
 Type 2
 Type 3
Type 1 Dimension: the changed attribute is overwritten in place, so no history is preserved
Type 2 Dimension: the current row is expired and a new row with a new surrogate key is inserted, preserving full history
Type 3 Dimension: a separate attribute holds the prior value alongside the current one, preserving limited history
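
A sketch contrasting the three responses for one changed attribute, here a customer address; the row layout and column names are assumptions:

# Current dimension row before the change arrives.
row = {"surrogate_key": 1, "customer_id": "C-1001",
       "address": "12 Old St", "prior_address": None, "is_current": True}
new_address = "98 New Ave"

# Type 1 – overwrite in place: no history survives.
type1 = {**row, "address": new_address}

# Type 2 – expire the old row and insert a new row under a new
# surrogate key: full history survives.
expired = {**row, "is_current": False}
type2_new = {"surrogate_key": 2, "customer_id": "C-1001",
             "address": new_address, "prior_address": None,
             "is_current": True}

# Type 3 – shift the old value into a dedicated column: limited history.
type3 = {**row, "address": new_address, "prior_address": row["address"]}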
