DWH by Concepts - v1
Data Warehousing Design by:
Who = Customer
What = Product
When = Time
Where = Location
Data Warehousing Concepts
1. Time Variant :-
A Data Warehouse is a time-variant database which allows
business users to analyze data with respect to
various time periods.
2. Non-Volatile :-
A Data Warehouse is a non-volatile database.
Once data enters the Data Warehouse, it does not
reflect the changes that take place in the operational
database.
3. Subject Oriented :-
A Data Warehouse is a subject-oriented database
which supports the business needs of middle-level
management in the enterprise.
EX :- Finance, Sales, HR, Loans (Bank).
4. Integrated Database :-
A Data Warehouse is an integrated database
which collects information from various
operational sources.
Historical Database :-
Types of ETL Tools :-
To implement data acquisition we need ETL products.
There are two types of ETL products.
1. GUI based ETL :- An application developer without
programming knowledge can develop the ETL process
using a simple GUI with point & click techniques.
Ex:- 1. Informatica,
2. DataStage,
3. ODI (Oracle Data Integrator).
Source System :-
A system which provides the data is known as a
source system.
There are two types of source systems.
1. Internal Sources :-
Operational databases built on any
RDBMS such as Oracle, SQL Server or DB2 are known
as internal sources.
2. External Sources :-
File systems such as flat files, XML files & Excel
sheets are known as external sources.
Target System :-
The system into which data is loaded is known
as the target system.
Enterprise Data Warehouse Architecture
[Figure: architecture diagram; labelled sources include Flat file and XML file]
Data Extraction :-
It is the process of reading data from internal
and external sources, typically operational
databases, flat files, XML files, etc.
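As a rough illustration of extraction from external sources, the sketch below reads a flat file (CSV) and an XML file into one list of uniform records using only the Python standard library. The file contents, the column names (customer_id, name, state) and the second and third customers are hypothetical sample data; reading from an operational RDBMS would additionally need a database driver and is not shown.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical flat-file (CSV) source
FLAT_FILE = """customer_id,name,state
1,Christina,Illinois
2,Rahul,Texas
"""

# Hypothetical XML source
XML_FILE = """<customers>
  <customer id="3"><name>Maria</name><state>Ohio</state></customer>
</customers>
"""

def extract_flat_file(text):
    """Read CSV rows into dictionaries keyed by the header line."""
    return list(csv.DictReader(io.StringIO(text)))

def extract_xml(text):
    """Read every <customer> element into a dictionary with the same keys."""
    root = ET.fromstring(text)
    return [
        {"customer_id": c.get("id"), "name": c.findtext("name"), "state": c.findtext("state")}
        for c in root.iter("customer")
    ]

# Both sources end up in one staging list with a common layout
rows = extract_flat_file(FLAT_FILE) + extract_xml(XML_FILE)
for row in rows:
    print(row)
```

Converting both sources to the same dictionary shape at extraction time keeps the later transformation steps independent of where a record came from.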
Data Transformation :-
It is the process of converting data into the required
business format.
1. Data Merging
2. Data Cleansing
3. Data Scrubbing
4. Data Aggregation
1. Data Merging :-
Inaccuracy :-
$ 10 → $ 10.00
$ 7.8 → $ 7.80
$ 4.879 → $ 4.88
$ 4.844 → $ 4.84
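The before/after amounts above appear to show inaccurate or inconsistently formatted values being standardized. A minimal sketch, assuming the rule is "round to two decimal places, half-up"; the function name cleanse_amount is hypothetical. Decimal is used instead of float so that values such as 4.879 round exactly as written.

```python
from decimal import Decimal, ROUND_HALF_UP

def cleanse_amount(raw):
    """Normalize an amount string such as '$ 4.879' to exactly two decimal places."""
    value = Decimal(raw.replace("$", "").strip())
    # quantize fixes the number of decimal places; ROUND_HALF_UP rounds .5 upward
    return value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

for raw in ["$ 10", "$ 7.8", "$ 4.879", "$ 4.844"]:
    print(raw, "->", f"$ {cleanse_amount(raw)}")
```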
3. Data Scrubbing :-
It is the process of deriving new data definitions from existing source
data definitions.
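To make "deriving a new definition from existing source definitions" concrete, here is a small sketch in which two new attributes are computed from fields that already exist in the source record. The field names (first_name, last_name, monthly_salary), the derived fields (full_name, annual_salary) and the sample values are all hypothetical.

```python
def scrub(record):
    """Return a copy of the record with attributes derived from existing ones."""
    derived = dict(record)  # leave the source record untouched
    derived["full_name"] = f"{record['first_name']} {record['last_name']}"
    derived["annual_salary"] = record["monthly_salary"] * 12
    return derived

# Hypothetical source record
source = {"first_name": "Christina", "last_name": "Smith", "monthly_salary": 5000}
print(scrub(source))
```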
4. Data Aggregation :-
It is the process in which multiple detail values are
summarized into a single summary value.
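A minimal sketch of aggregation: many detail rows (one per sale) are summarized into one total per product. The product names and amounts are hypothetical sample data.

```python
from collections import defaultdict

# Hypothetical detail rows: (product, sale amount)
sales = [("Laptop", 1200), ("Phone", 800), ("Laptop", 1100), ("Phone", 650), ("Tablet", 400)]

def aggregate_by_product(rows):
    """Sum the detail amounts into a single summary value per product."""
    totals = defaultdict(int)
    for product, amount in rows:
        totals[product] += amount
    return dict(totals)

print(aggregate_by_product(sales))
```

In SQL the same summary would be a GROUP BY product with SUM(amount).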
Data Loading :-
It is the process of loading the data into the Data Warehouse.
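A sketch of the loading step, assuming an in-memory SQLite database stands in for the warehouse and a hypothetical sales_fact table is the target. The rows are inserted in bulk inside one transaction, so a failure part-way through leaves the target table unchanged.

```python
import sqlite3

def load_rows(conn, rows):
    """Bulk-insert transformed rows into the target table in a single transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(
            "INSERT INTO sales_fact (product, amount) VALUES (?, ?)", rows
        )

conn = sqlite3.connect(":memory:")  # stands in for the data warehouse
conn.execute("CREATE TABLE sales_fact (product TEXT, amount INTEGER)")
load_rows(conn, [("Laptop", 2300), ("Phone", 1450)])
print(conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0])
```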
Data Mart :-
1. A Data Mart is a subset of the Enterprise Data Warehouse.
2. A Data Mart is a subject-oriented database which supports the
business needs of a single subject area, such as Sales or Finance. It supports the
business needs of middle management.
[Figure: data marts (DM) and the enterprise data warehouse (EDW)]
2. Bottom-Up Data Warehousing Approach :-
According to Ralph Kimball, we first need to develop subject-oriented
databases known as Data Marts, and then combine the Data Marts into an
Enterprise Data Warehouse.
[Figure: data marts combined into the EDW]
• Data modeling
1) Star Schema
The star schema is the simplest style of data mart schema.
The star schema consists of one or more fact tables
referencing any number of dimension tables. The star
schema is an important special case of the
snowflake schema, and is more effective for handling
simpler queries.
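A small star schema can be sketched in SQLite: one fact table holding the measure (amount) and a foreign key to each dimension, with the dimensions following the Who = Customer, What = Product, When = Time split from the start of this deck. All table names, columns and rows are hypothetical; the Where = Location dimension is left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes (who, what, when)
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT, state TEXT);
CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_time     (time_key     INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

-- Fact table: the measure plus a foreign key to each dimension
CREATE TABLE fact_sales (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    time_key     INTEGER REFERENCES dim_time(time_key),
    amount       REAL
);

INSERT INTO dim_customer VALUES (1, 'Christina', 'Illinois');
INSERT INTO dim_product  VALUES (10, 'Laptop', 'Electronics'), (11, 'Desk', 'Furniture');
INSERT INTO dim_time     VALUES (100, 2003, 1);
INSERT INTO fact_sales   VALUES (1, 10, 100, 1200.0), (1, 11, 100, 300.0);
""")

# Typical star query: join the fact table directly to its dimensions, then aggregate
query = """
SELECT p.category, t.year, SUM(f.amount)
FROM fact_sales f
JOIN dim_product p ON f.product_key = p.product_key
JOIN dim_time    t ON f.time_key    = t.time_key
GROUP BY p.category, t.year
ORDER BY p.category
"""
result = conn.execute(query).fetchall()
print(result)
```

Every dimension is one join away from the fact table, which is why star queries stay simple.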
[Figure: Star Schema]
• Data modeling
2) Snowflake schema
A snowflake schema is a logical arrangement of tables in a
multidimensional database such that the entity relationship
diagram resembles a snowflake shape. The snowflake
schema is represented by centralized fact tables which are
connected to multiple dimensions, and the dimension tables are
normalized into additional related tables.
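The sketch below shows the snowflake idea in SQLite: the category attribute is moved out of the product dimension into its own table, so the product dimension is normalized. Table names, columns and rows are hypothetical. Compared with keeping category directly in the product dimension (star style), reaching it now takes one extra join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked dimension: category attributes live in their own table
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, product_name TEXT,
                           category_key INTEGER REFERENCES dim_category(category_key));
CREATE TABLE fact_sales   (product_key INTEGER REFERENCES dim_product(product_key),
                           amount REAL);

INSERT INTO dim_category VALUES (1, 'Electronics'), (2, 'Furniture');
INSERT INTO dim_product  VALUES (10, 'Laptop', 1), (11, 'Desk', 2), (12, 'Phone', 1);
INSERT INTO fact_sales   VALUES (10, 1200.0), (11, 300.0), (12, 800.0);
""")

# fact -> product -> category: two joins to reach the category name
query = """
SELECT c.category_name, SUM(f.amount)
FROM fact_sales f
JOIN dim_product  p ON f.product_key  = p.product_key
JOIN dim_category c ON p.category_key = c.category_key
GROUP BY c.category_name
ORDER BY c.category_name
"""
result = conn.execute(query).fetchall()
print(result)
```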
[Figure: Snowflake schema]
• Slowly Changing Dimension
Type 1 :- Slowly Changing Dimension Implementation
In Type 1 Slowly Changing Dimension, the new
information simply overwrites the original information. In
other words, no history is kept.
In our example, recall that we originally have the following
table:
• SCD Type 1
After Christina moved from Illinois to California, the new
information replaces the original record, and we have the following
table:
• Advantages
This is the easiest way to handle the Slowly Changing
Dimension problem, since there is no need to keep track of
the old information.
• SCD Type 1
• Disadvantages
All history is lost. By applying this methodology, it is not
possible to trace back in history. For example, in this case,
the company would not be able to know that Christina lived
in Illinois before.
Usage: About 50% of the time.
When to use Type 1:
Type 1 slowly changing dimension should be used when it is
not necessary for the data warehouse to keep track of
historical changes.
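A minimal sketch of Type 1 using the Christina example: the dimension row is updated in place, so after the update nothing records that she lived in Illinois. The dictionary layout and the surrogate key 1001 are hypothetical.

```python
# Hypothetical customer dimension: surrogate key -> attributes
customer_dim = {1001: {"name": "Christina", "state": "Illinois"}}

def scd_type1_update(dim, key, **changes):
    """Type 1: overwrite the attributes in place; old values are discarded."""
    dim[key].update(changes)

scd_type1_update(customer_dim, 1001, state="California")
print(customer_dim)
```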
• Slowly Changing Dimension
• Type 2 :- In Type 2 Slowly Changing Dimension, a new
record is added to the table to represent the new information.
Therefore, both the original and the new record will be
present. The new record gets its own primary key.
In our example, recall we originally have the following table:
• SCD Type 2
After Christina moved from Illinois to California, the new
information is added as a new record alongside the original one, and we have the following
table:
• Advantages
This allows us to accurately keep all historical information.
• SCD Type 2
• Disadvantages
This will cause the size of the table to grow fast. In cases where
the number of rows for the table is very high to start with,
storage and performance can become a concern.
This necessarily complicates the ETL process.
Usage:
About 50% of the time.
When to use Type 2:-
• Type 2 slowly changing dimension should be used when it is
necessary for the data warehouse to track historical changes.
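A sketch of Type 2 with the same example: when Christina's state changes, her existing row is kept and a new row with its own surrogate key is appended. The is_current flag, the natural key "C1" and the starting key 1001 are assumptions added to make the current version easy to find; many designs use effective start and end dates for the same purpose. itertools.count stands in for a database sequence.

```python
from itertools import count

next_key = count(1001)  # stands in for a database sequence generator

# Each row: surrogate key, natural key, attributes, and a current-version flag
customer_dim = []

def scd_type2_upsert(dim, customer_id, name, state):
    """Type 2: keep the old row as history and append a new row with its own key."""
    current = next((r for r in dim if r["customer_id"] == customer_id and r["is_current"]), None)
    if current is not None:
        if current["name"] == name and current["state"] == state:
            return current  # nothing changed, so no new version is created
        current["is_current"] = False  # old version stays in the table
    row = {"customer_key": next(next_key), "customer_id": customer_id,
           "name": name, "state": state, "is_current": True}
    dim.append(row)
    return row

scd_type2_upsert(customer_dim, "C1", "Christina", "Illinois")
scd_type2_upsert(customer_dim, "C1", "Christina", "California")
for row in customer_dim:
    print(row)
```

Each change adds a row, which is the table growth mentioned under the disadvantages.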
• Slowly Changing Dimension
• Type 3 :- In Type 3 Slowly Changing Dimension, there will
be two columns to indicate the particular attribute of
interest, one indicating the original value, and one indicating
the current value. There will also be a column that indicates
when the current value becomes active.
In our example, recall we originally have the following table:
• SCD Type 3
• To accommodate Type 3 Slowly Changing Dimension, we
will now have the following columns:
• Customer Key
• Name
• Original State
• Current State
• Effective Date
After Christina moved from Illinois to California, the record is
updated in place: Current State becomes California while Original
State keeps Illinois, and we have the following table
(assuming the effective date of change is January 15, 2003):
• SCD Type 3
• Advantages:
This does not increase the size of the table, since the new
information overwrites the current value in the existing record.
This allows us to keep some part of history.
• SCD Type 3
• Disadvantages:
Type 3 will not be able to keep all history where an attribute is
changed more than once. For example, if Christina later
moves to Texas on December 15, 2003, the California
information will be lost.
Usage: Type 3 is rarely used in actual practice.
When to use Type 3:
• Type 3 slowly changing dimension should only be used
when it is necessary for the data warehouse to track
historical changes, and when such changes will only occur
a finite number of times.
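A sketch of Type 3 using the columns listed above and the two moves from the text (California on January 15, 2003, then Texas on December 15, 2003). Only the current-value and effective-date columns are overwritten, so after the second move the California value is gone, matching the disadvantage described. The dictionary layout, the key 1001 and the initial effective date of None are hypothetical.

```python
from datetime import date

# Type 3 row: Customer Key, Name, Original State, Current State, Effective Date
customer = {"customer_key": 1001, "name": "Christina",
            "original_state": "Illinois", "current_state": "Illinois",
            "effective_date": None}

def scd_type3_update(row, new_state, effective):
    """Type 3: overwrite only the current value; the original value is preserved."""
    row["current_state"] = new_state
    row["effective_date"] = effective

scd_type3_update(customer, "California", date(2003, 1, 15))
print(customer)
scd_type3_update(customer, "Texas", date(2003, 12, 15))
print(customer)  # California is no longer stored anywhere in the row
```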
• What is a surrogate key?
A surrogate key is a substitute for the natural primary key. It
is a unique identifier or number (normally created by a
database sequence generator) for each record of a dimension
table that can be used as the primary key of the table.
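A minimal sketch of surrogate key assignment: itertools.count plays the role of the database sequence generator, and a lookup maps each natural key to the surrogate key issued for it. The class name and the natural key strings are hypothetical. This version issues one key per natural key, as in a Type 1 dimension; under SCD Type 2 each new version of a record would receive a fresh key instead.

```python
from itertools import count

class SurrogateKeyGenerator:
    """Hands out sequential surrogate keys, one per natural key."""

    def __init__(self, start=1):
        self._sequence = count(start)  # mimics a database sequence
        self._lookup = {}              # natural key -> surrogate key

    def key_for(self, natural_key):
        if natural_key not in self._lookup:
            self._lookup[natural_key] = next(self._sequence)
        return self._lookup[natural_key]

gen = SurrogateKeyGenerator()
print(gen.key_for("CUST-IL-042"))  # 1
print(gen.key_for("CUST-CA-007"))  # 2
print(gen.key_for("CUST-IL-042"))  # 1 again: same natural key, same surrogate key
```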
Questions