Unit 6 ETL and ELT
NDAYAMBAJE Simeon
6/25/2021 2
Data preprocessing
Why preprocessing?
Data are generally
– Incomplete: lacking attribute values, lacking certain
attributes of interest, or containing only aggregate
data
– Noisy: containing errors or outliers
– Inconsistent: containing discrepancies in codes or
names
Tasks in data preprocessing
1. Data cleaning: fill in missing values and smooth noisy
data.
2. Data integration: combining data from multiple databases,
data cubes, or files.
3. Data transformation: normalization and aggregation.
4. Data reduction: reducing the volume but producing
the same or similar analytical results.
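The four tasks above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the sample rows, the field names (region, amount, clerk) and the choice of mean imputation are assumptions, not part of the original slides. Integration is run first here so that cleaning can use values from both sources.

```python
from statistics import mean

def integrate(*sources):
    """Data integration: combine rows from several sources into one list."""
    return [row for source in sources for row in source]

def clean(rows):
    """Data cleaning: fill missing 'amount' values with the mean of the known ones."""
    known = [r["amount"] for r in rows if r["amount"] is not None]
    fill = mean(known)
    return [{**r, "amount": fill if r["amount"] is None else r["amount"]} for r in rows]

def reduce_rows(rows, keep=("region", "amount")):
    """Data reduction: keep only the attributes needed for the analysis."""
    return [{k: r[k] for k in keep} for r in rows]

def aggregate(rows):
    """Data transformation (aggregation): total amount per region."""
    totals = {}
    for r in rows:
        totals[r["region"]] = totals.get(r["region"], 0) + r["amount"]
    return totals

# Hypothetical sample data from two sources
store_a = [{"region": "North", "amount": 10.0, "clerk": "x"},
           {"region": "North", "amount": None, "clerk": "y"}]
store_b = [{"region": "South", "amount": 30.0, "clerk": "z"}]

rows = reduce_rows(clean(integrate(store_a, store_b)))
totals = aggregate(rows)
print(totals)
```

Dropping the clerk attribute reduces the volume of data without changing the per-region totals, which is the point of data reduction.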
ETL
(Extract, Transform, and Load)
What is ETL?
• ETL stands for Extraction, Transformation, and
Loading.
• The mechanism of extracting information from
source systems and bringing it into the data
warehouse is commonly called ETL.
• The ETL process is technically challenging and
requires active input from various stakeholders,
including developers, analysts, testers, and
top executives.
How does ETL work?
• ETL consists of three separate phases:
extraction, transformation, and loading.
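The three phases can be sketched as three small functions using only the Python standard library. This is a minimal sketch under assumptions: the CSV text, the sales table and its columns are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data; in practice this would be a file or a source database.
SOURCE_CSV = """id,name,amount
1, alice ,10.50
2,BOB,20
3,carol,
"""

def extract(text):
    """Extract: read raw rows from the source as dictionaries of strings."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: tidy names, convert types, and skip rows with no amount."""
    out = []
    for r in rows:
        if not r["amount"].strip():
            continue  # incomplete row, left out of the load
        out.append((int(r["id"]), r["name"].strip().title(), float(r["amount"])))
    return out

def load(conn, rows):
    """Load: write the transformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")  # stand-in for the data warehouse
load(warehouse, transform(extract(SOURCE_CSV)))
loaded = warehouse.execute("SELECT id, name, amount FROM sales ORDER BY id").fetchall()
print(loaded)
```

Note that the transformation happens in the Python process, outside the target database, before anything is loaded.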
Extraction
Data Transformation Strategies
Generalization
Generalization is the process of extracting shared
characteristics from two or more classes and combining
them into a generalized superclass.
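The definition of generalization given here can be illustrated with a small class hierarchy. This is a sketch only: the Vehicle, Car and Truck classes and their attributes are hypothetical examples, not taken from the slides.

```python
class Vehicle:
    """Generalized superclass holding what Car and Truck have in common."""

    def __init__(self, plate, wheels):
        self.plate = plate
        self.wheels = wheels

    def describe(self):
        return f"{type(self).__name__} {self.plate} with {self.wheels} wheels"


class Car(Vehicle):
    """Keeps only what is specific to cars (number of seats)."""

    def __init__(self, plate, seats):
        super().__init__(plate, wheels=4)
        self.seats = seats


class Truck(Vehicle):
    """Keeps only what is specific to trucks (payload)."""

    def __init__(self, plate, payload_tonnes):
        super().__init__(plate, wheels=6)
        self.payload_tonnes = payload_tonnes


fleet = [Car("RAB 123", seats=5), Truck("RAC 456", payload_tonnes=10)]
descriptions = [v.describe() for v in fleet]
print(descriptions)
```

The shared attributes (plate, wheels) and the shared describe method live once in the superclass instead of being repeated in each class.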
Normalization
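One common form of normalization is min-max scaling, which maps values into a fixed range such as [0, 1]. The slides do not say which method is meant, so the choice of min-max scaling, the function name and the sample incomes below are assumptions made for illustration.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal; avoid dividing by zero.
        return [new_min for _ in values]
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

incomes = [20000, 50000, 80000]  # hypothetical attribute values
normalized = min_max_normalize(incomes)
print(normalized)
```

After scaling, attributes measured in very different units can be compared on the same footing.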
Loading
Loading (cont.)
Loading can be carried out in two ways:
• Refresh: The data warehouse data is completely
rewritten, so the older data is replaced.
• Update: Only the changes applied to the
source information are added to the data
warehouse. An update is typically carried
out without deleting or modifying preexisting
data.
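The two loading modes can be contrasted with a short sketch. The sales table, its columns and the sample rows are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3

def refresh(conn, source_rows):
    """Refresh: discard the warehouse contents and rewrite them from the source."""
    conn.execute("DELETE FROM sales")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", source_rows)
    conn.commit()

def update(conn, changed_rows):
    """Update: add only the changes; preexisting rows are left untouched."""
    conn.executemany("INSERT INTO sales VALUES (?, ?)", changed_rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse
conn.execute("CREATE TABLE sales (id INTEGER, amount REAL)")

refresh(conn, [(1, 10.0), (2, 20.0)])   # initial full load
update(conn, [(3, 30.0)])               # only the new source row is added
after_update = conn.execute("SELECT id FROM sales ORDER BY id").fetchall()

refresh(conn, [(4, 40.0)])              # everything older is replaced
after_refresh = conn.execute("SELECT id FROM sales ORDER BY id").fetchall()
print(after_update, after_refresh)
```

A refresh is simpler but rewrites everything each time, while an update moves less data and keeps history in place.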
ELT (Extract, Load and Transform)
• ELT involves extracting information from the
source system and loading it directly into the
target system, instead of transforming it between
the extraction and loading phases.
• Once the data has been copied or loaded into the
target system, the transformation takes place there.
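The ELT order of operations can be sketched as follows. This is a minimal sketch under assumptions: the raw rows, the raw_sales and sales tables, and the in-memory SQLite database standing in for the target system are all hypothetical.

```python
import sqlite3

# Hypothetical rows as they arrive from the source, all still text.
raw_rows = [("1", " alice ", "10.50"), ("2", "BOB", "20"), ("3", "carol", "")]

target = sqlite3.connect(":memory:")  # stand-in for the target system

# Extract + Load: copy the source rows into a staging table unchanged.
target.execute("CREATE TABLE raw_sales (id TEXT, name TEXT, amount TEXT)")
target.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", raw_rows)

# Transform: performed afterwards, inside the target, by its own SQL engine.
target.execute("""
    CREATE TABLE sales AS
    SELECT CAST(id AS INTEGER)  AS id,
           UPPER(TRIM(name))    AS name,
           CAST(amount AS REAL) AS amount
    FROM raw_sales
    WHERE TRIM(amount) <> ''
""")
target.commit()

transformed = target.execute("SELECT id, name, amount FROM sales ORDER BY id").fetchall()
raw_count = target.execute("SELECT COUNT(*) FROM raw_sales").fetchone()[0]
print(transformed, raw_count)
```

The raw rows stay available in the target after the transformation, so the transformation can be changed and rerun without extracting again.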
Difference between ETL and ELT
Basics | ETL | ELT
Process | Data is transferred to the ETL server and moved back to the DB. High network bandwidth required. | Data remains in the DB except for cross-database loads (e.g. source to object).
Transformation | Transformations are performed in the ETL server. | Transformations are performed (in the source or) in the target.
Difference between ETL and ELT (cont.)
Basics ETL ELT
Analysis