ETL (Extract, Transform, and Load) Process
• The mechanism of extracting information from
source systems and bringing it into the data
warehouse is commonly called ETL, which
stands for Extraction, Transformation and
Loading.
• The ETL process is technically challenging and
requires active input from various stakeholders,
including developers, analysts, testers, and top
executives.
• To maintain its value as a tool for
decision-makers, the data warehouse needs to
change as the business changes.
• ETL is a recurring activity (daily, weekly,
monthly) of a data warehouse system and
needs to be agile, automated, and well
documented.
How ETL Works
• ETL consists of three separate phases: extraction,
transformation, and loading.
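The three phases can be sketched as three small functions chained together. This is a minimal illustration, not a production design: the in-memory source list, the `sales` table, and the function names `extract`, `transform`, `load`, and `run_etl` are all assumptions made for the example, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3


def extract(source_rows):
    """Extract: read raw records from a source system (here, a plain list)."""
    return list(source_rows)


def transform(rows):
    """Transform: tidy customer names and convert amounts to numbers."""
    return [
        {"customer": r["customer"].strip().title(), "amount": float(r["amount"])}
        for r in rows
    ]


def load(rows, conn):
    """Load: write the transformed rows into a (hypothetical) warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (:customer, :amount)", rows)
    conn.commit()


def run_etl(source_rows, conn):
    """Run the three phases in order."""
    load(transform(extract(source_rows)), conn)


# Hypothetical source data with inconsistent formatting.
source = [
    {"customer": "  alice smith ", "amount": "19.90"},
    {"customer": "BOB JONES", "amount": "5"},
]
warehouse = sqlite3.connect(":memory:")
run_etl(source, warehouse)
result = warehouse.execute(
    "SELECT customer, amount FROM sales ORDER BY customer"
).fetchall()
```

Keeping each phase in its own function makes it easier to schedule, test, and document the phases independently.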
Extraction
• Extraction is the operation of extracting information from a
source system for further use in a data warehouse
environment. This is the first stage of the ETL process.
• The extraction process is often one of the most
time-consuming tasks in ETL.
• The source systems might be complicated and poorly
documented, and thus determining which data needs to
be extracted can be difficult.
• The data has to be extracted periodically to supply all
changed data to the warehouse and keep it up to date.
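One common way to supply only the changed data on each periodic run is to remember when the last extraction happened and pull just the rows modified since then. The sketch below assumes the source rows carry an `updated_at` timestamp; that column, the function name `extract_changed`, and the sample data are hypothetical.

```python
from datetime import datetime


def extract_changed(rows, last_run):
    """Return only the rows modified after the previous extraction run."""
    return [r for r in rows if r["updated_at"] > last_run]


# Hypothetical source table with a last-modified timestamp per row.
source_rows = [
    {"id": 1, "updated_at": datetime(2024, 1, 1, 9, 0)},
    {"id": 2, "updated_at": datetime(2024, 1, 2, 9, 0)},
    {"id": 3, "updated_at": datetime(2024, 1, 3, 9, 0)},
]

last_run = datetime(2024, 1, 1, 12, 0)
changed = extract_changed(source_rows, last_run)
changed_ids = [r["id"] for r in changed]

# Advance the watermark so the next run starts where this one ended.
new_watermark = max(r["updated_at"] for r in changed) if changed else last_run
```

This approach only works when the source system reliably records modification times; poorly documented sources may need a full comparison against the previous extract instead.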
Cleansing
• The cleansing stage is crucial in a data warehouse
because it is supposed to improve data quality.
• The primary data cleansing features found in ETL
tools are rectification and homogenization.
• They use specific dictionaries to rectify typing
mistakes and to recognize synonyms, as well as
rule-based cleansing to enforce domain-specific
rules and define appropriate associations between
values.
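Dictionary-based rectification, synonym recognition, and rule-based checks can be illustrated with a few lookups. This is a sketch under stated assumptions: the dictionaries `TYPO_FIXES`, `SYNONYMS`, and `STATE_BY_CITY`, the five-digit ZIP rule, and the record fields are invented for the example and do not describe any particular ETL tool.

```python
import re

# Hypothetical dictionaries; a real tool would ship much larger ones.
TYPO_FIXES = {"Calfornia": "California", "Nw York": "New York"}
SYNONYMS = {"CA": "California", "Calif.": "California", "NY": "New York"}

# Hypothetical domain rules.
ZIP_RULE = re.compile(r"\d{5}")
STATE_BY_CITY = {"Los Angeles": "California", "Albany": "New York"}


def rectify_state(value):
    """Fix known typing mistakes, then map synonyms to one canonical name."""
    value = value.strip()
    value = TYPO_FIXES.get(value, value)
    return SYNONYMS.get(value, value)


def check_rules(record):
    """Return a list of rule violations for an already rectified record."""
    problems = []
    if not ZIP_RULE.fullmatch(record["zip"]):
        problems.append("zip must be five digits")
    # Association between values: a known city must sit in its state.
    expected_state = STATE_BY_CITY.get(record["city"])
    if expected_state is not None and expected_state != record["state"]:
        problems.append("city and state do not match")
    return problems


def clean(record):
    """Rectify and homogenize a record, then report rule violations."""
    cleaned = dict(record, state=rectify_state(record["state"]))
    return cleaned, check_rules(cleaned)


good, good_problems = clean(
    {"city": "Los Angeles", "state": " Calif. ", "zip": "90001"}
)
bad, bad_problems = clean({"city": "Albany", "state": "Calfornia", "zip": "1220"})
```

Here the second record is rectified to "California" but still fails both rules, which shows why rectification and rule checking are kept as separate steps.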
• The following examples show why data cleaning is
essential:
• If an enterprise wishes to contact its users or its
suppliers, a complete, accurate and up-to-date list of
contact addresses, email addresses and telephone
numbers must be available.
• If a client or supplier calls, the staff responding should
be able to find the person quickly in the enterprise
database, but this requires that the caller's name or
his/her company name is listed in the database.
• If a customer appears in the databases with two or
more slightly different names or different
account numbers, it becomes difficult to
update the customer's information.
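The last example, one customer stored under slightly different names, can be detected with approximate string matching. The sketch below uses `difflib.SequenceMatcher` from the Python standard library; the 0.85 threshold, the `normalize` helper, and the sample customers are assumptions chosen for illustration, and flagged pairs would still need review before merging.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name):
    """Lowercase, drop simple punctuation, and collapse whitespace."""
    return " ".join(name.lower().replace(".", "").replace(",", "").split())


def similarity(a, b):
    """Similarity ratio between 0.0 and 1.0 for two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_possible_duplicates(customers, threshold=0.85):
    """Return pairs of customer ids whose names look nearly identical."""
    pairs = []
    for a, b in combinations(customers, 2):
        if similarity(a["name"], b["name"]) >= threshold:
            pairs.append((a["id"], b["id"]))
    return pairs


# Hypothetical customer records; ids 101 and 102 may be the same person.
customers = [
    {"id": 101, "name": "Jon Smith"},
    {"id": 102, "name": "John Smith"},
    {"id": 103, "name": "Maria Garcia"},
]
possible_duplicates = find_possible_duplicates(customers)
```

Comparing every pair is quadratic in the number of customers, so this form suits small lists only; larger sets are usually grouped first (for example by postal code) and compared within each group.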
Transformation