ETL Basics Lesson 02

Uploaded by

singh.abhi.abhi7
Copyright
© All Rights Reserved

ETL Basics ETL Process

Now let's go through how data transformation takes place in the
data warehousing environment.

Data staging is used in cleansing, transforming, and integrating the
data.

The extraction process can be carried out either with hand-coded
methods or with tools. Tool-based extraction follows a well-defined
approach with better documentation, and its click, drag, and drop
features make the extraction process easier and more user-friendly for
programmers.

Bulk extraction requires the entire data warehouse to be refreshed
periodically: both the data already in the warehouse and the new data
to be loaded are loaded into the warehouse once again, which generates
heavy network traffic. However, this mechanism is much easier to set up
and maintain.
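
The bulk refresh described above can be sketched roughly as follows. This is a minimal illustration rather than a real loader: the function name bulk_refresh and the in-memory lists standing in for the source system and the warehouse are assumptions made for the example.

```python
def bulk_refresh(source_rows):
    """Rebuild the warehouse table from scratch (hypothetical helper)."""
    warehouse = []                       # previous contents are discarded
    for row in source_rows:
        warehouse.append(dict(row))      # every row is copied, changed or not
    return warehouse


# Hypothetical source data standing in for an operational system.
source = [
    {"id": 1, "amount": 100},
    {"id": 2, "amount": 250},
]
warehouse = bulk_refresh(source)

source.append({"id": 3, "amount": 75})   # one new row arrives in the source
warehouse = bulk_refresh(source)         # all three rows are transferred again
rows_transferred = len(warehouse)
```

Only one row is new, yet the second refresh moves all three rows. That is the source of the heavy traffic, and also the reason the logic stays so simple.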

Aggregates, such as sales totals, are often precalculated and stored in
the warehouse to speed queries that require summary totals.
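
As a rough sketch of precalculating an aggregate, the example below computes sales totals per region once, at load time, so that later queries can read the stored total instead of rescanning the detail rows. The sample rows, the region grouping, and the name build_sales_totals are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical detail rows as they might arrive from a source system.
sales = [
    {"region": "East", "amount": 120.0},
    {"region": "West", "amount": 80.0},
    {"region": "East", "amount": 30.0},
]


def build_sales_totals(rows):
    """Sum the amount per region in a single pass over the detail rows."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


# Computed once during the load and stored alongside the detail data.
sales_totals = build_sales_totals(sales)

# A summary query becomes a lookup rather than a scan.
east_total = sales_totals["East"]
```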

Data cleansing is critical to customer relationship management
initiatives.

A good example to use is cleansing customer data. Most students can
identify with receiving multiple copies of the same catalog because the
company is not doing a good data cleansing job.

The record is broken down into atomic data elements.
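
A minimal sketch of this parsing step is shown below. It assumes a made-up input layout of "First Last, Street, City, State ZIP"; real customer records are far less regular, and the function name parse_customer and the output field names are hypothetical.

```python
def parse_customer(raw):
    """Split one free-form customer line into atomic elements.

    Assumes exactly four comma-separated parts: name, street, city,
    and "STATE ZIP". This layout is an assumption for the example.
    """
    name, street, city, state_zip = [part.strip() for part in raw.split(",")]
    first_name, last_name = name.split(" ", 1)
    state, postal_code = state_zip.split()
    return {
        "first_name": first_name,
        "last_name": last_name,
        "street": street,
        "city": city,
        "state": state,
        "postal_code": postal_code,
    }


record = parse_customer("Jane Smith, 12 Main St, Springfield, IL 62701")
```

Once each element sits in its own field, later steps can correct, standardize, and match on individual elements instead of on one long string.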

External data, such as census data, is often used in this process.

Companies decide on the standards that they want to use.

Commercial data cleansing software often uses AI techniques to match
records.
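
To give a feel for record matching, the sketch below uses plain string similarity from Python's standard difflib module. This is a simple stand-in, not the AI techniques that commercial tools use, and the 0.85 threshold and function names are assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a similarity ratio between 0.0 and 1.0, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_probable_match(a, b, threshold=0.85):
    """Treat two values as the same customer if they are similar enough."""
    return similarity(a, b) >= threshold


# Two spellings of what is probably the same customer name.
same_person = is_probable_match("Jon Smith", "John Smith")

# Two clearly different names.
different_person = is_probable_match("Jon Smith", "Mary Jones")
```

In practice a match would compare several parsed and standardized fields (name, address, postal code) rather than a single string, and borderline scores would go to a person for review.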

All of the data are now combined in a standard format.

Most loads involve only changed data rather than a bulk reloading of all
of the data in the warehouse.
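
One common way to load only changes is to remember when the last load ran and pick up rows modified since then. The sketch below assumes each source row carries an "updated" date, which is an assumption for the example and not something every source system provides; the name incremental_load is also hypothetical.

```python
from datetime import date


def incremental_load(source_rows, warehouse, last_loaded):
    """Copy only rows changed after last_loaded into the warehouse dict."""
    changed = [row for row in source_rows if row["updated"] > last_loaded]
    for row in changed:
        warehouse[row["id"]] = dict(row)      # insert new or overwrite existing
    # The newest change seen becomes the starting point for the next load.
    watermark = max((row["updated"] for row in changed), default=last_loaded)
    return len(changed), watermark


# Hypothetical source rows; rows 1 and 2 were loaded in an earlier run.
source = [
    {"id": 1, "amount": 100, "updated": date(2010, 1, 5)},
    {"id": 2, "amount": 250, "updated": date(2010, 1, 9)},
    {"id": 3, "amount": 75, "updated": date(2010, 2, 1)},
]
warehouse = {1: dict(source[0]), 2: dict(source[1])}

loaded, watermark = incremental_load(source, warehouse, date(2010, 1, 9))
```

Only the one row changed since the previous load is transferred, at the cost of having to track the watermark between runs.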

The importance of metadata is now realized, even though creating it is
not glamorous work.

Metadata is the high-level, core internal documentation behind the
source code, and it acts as the lifeblood of a data warehouse.

Metadata describes not only the format and name of a data item but also
its context, i.e., why the data item is needed and what values it can
have. It also records the relationships between data elements, i.e.,
whether a data element is found in other locations and how the elements
are linked to each other. Apart from the technical details, it also
holds the business rules. The origin of the data is critical, because
end users may want to trace the data they see through OLAP tools back
to its source.

Importance of Metadata

Metadata establish the context of the Warehouse data


Metadata helps data warehouse administrators and users locate and
understand data items, both in the source systems and in the
warehouse data structures.

E.g.: The date 02/05/2010 could mean either May 2, 2010 or February
5, 2010 depending on the date convention used. Metadata describing
the format of this date field could help determine the definite and
unambiguous meaning of the data item.
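
The date example can be sketched as follows: the format recorded in the metadata decides how the raw value is read. The field names and the small dictionary standing in for a metadata repository are assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical metadata: each date field records the convention it uses.
field_metadata = {
    "order_date_eu": {"format": "%d/%m/%Y"},   # day first
    "order_date_us": {"format": "%m/%d/%Y"},   # month first
}


def parse_date_field(field_name, raw_value):
    """Parse a raw date string using the format stored in the metadata."""
    fmt = field_metadata[field_name]["format"]
    return datetime.strptime(raw_value, fmt).date()


# The same raw value resolves to two different dates.
day_first = parse_date_field("order_date_eu", "02/05/2010")     # May 2, 2010
month_first = parse_date_field("order_date_us", "02/05/2010")   # February 5, 2010
```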

Metadata facilitate the Analysis Process


Metadata must provide data warehouse end-users with the information
they need to easily perform the analysis steps. It should thus allow
users to quickly locate data that are in the warehouse.

Metadata should allow analysts to interpret data correctly by providing
information about data formats and data definitions.

Metadata are a form of Audit Trail for Data Transformation


Metadata document the transformation of source data into warehouse
data. Hence warehouse metadata must be capable of explaining how a
particular piece of warehouse data was derived from the operational
systems.

All business rules governing the transformation of data to new values or
new formats are also documented as metadata.

Metadata Improve or Maintain Data Quality


Metadata can improve or maintain warehouse data quality through the
definition of valid values for individual warehouse data items. Using a
data quality tool prior to actual loading into the warehouse, the
warehouse load images can be reviewed to check for compliance with
valid values for key data items. Data errors are quickly highlighted for
correction.

Metadata can be used as the basis for any error-correction processing
that should be done if a data error is found. Error-correction rules are
documented in the metadata repository and executed by program code
on an as-needed basis.
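
The two ideas above, valid values and error-correction rules held as metadata, can be sketched together. The field name, the valid values, and the correction mapping below are all hypothetical; a real repository would hold such rules per data item.

```python
# Hypothetical metadata: valid values and correction rules for one field.
metadata = {
    "gender": {
        "valid_values": {"M", "F", "U"},
        "corrections": {"Male": "M", "Female": "F"},
    },
}


def check_and_correct(rows, rules_by_field):
    """Validate each row before loading; fix what the rules cover.

    Returns a list of (row_index, field, value) for errors that no
    correction rule could repair.
    """
    errors = []
    for index, row in enumerate(rows):
        for field, rules in rules_by_field.items():
            value = row.get(field)
            if value in rules["valid_values"]:
                continue                        # already compliant
            fixed = rules["corrections"].get(value)
            if fixed is not None:
                row[field] = fixed              # apply the documented rule
            else:
                errors.append((index, field, value))
    return errors


# A small load image reviewed before it reaches the warehouse.
load_image = [
    {"id": 1, "gender": "F"},
    {"id": 2, "gender": "Male"},
    {"id": 3, "gender": "X"},
]
unresolved = check_and_correct(load_image, metadata)
```

Row 2 is corrected by a documented rule, while row 3 is reported for manual correction because no rule covers its value.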
