DWH by Concepts - v1

Uploaded by srividya.1020
Copyright © All Rights Reserved
Data warehousing Concepts

Data Warehousing

• A Data Warehouse is a database specially designed for analyzing the business rather than for transaction processing.
• A Data Warehouse is designed to provide answers to the four basic business questions:
  Who = Customer
  What = Product
  When = Time
  Where = Location

Characteristic Features of Data Warehousing

1. Time Variant :-
A Data Warehouse is a time-variant database that allows business users to analyze the data with respect to various time periods.

EX :- Yearly, Quarterly, Monthly, Weekly & Daily.
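As a minimal sketch of time-variant analysis, assuming hypothetical sales rows stamped with a date, the same detail data can be rolled up to different time periods simply by changing the function that maps a date to its period:

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample sales rows: (sale date, amount)
sales = [
    (date(2023, 1, 15), 100),
    (date(2023, 2, 3), 50),
    (date(2023, 4, 20), 75),
    (date(2024, 1, 9), 200),
]

def total_by(period_of, rows):
    """Sum amounts per time period; period_of maps a date to a period label."""
    totals = defaultdict(int)
    for day, amount in rows:
        totals[period_of(day)] += amount
    return dict(totals)

yearly = total_by(lambda d: d.year, sales)
quarterly = total_by(lambda d: (d.year, (d.month - 1) // 3 + 1), sales)
monthly = total_by(lambda d: (d.year, d.month), sales)

print(yearly)     # {2023: 225, 2024: 200}
print(quarterly)  # {(2023, 1): 150, (2023, 2): 75, (2024, 1): 200}
```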



2. Non-Volatile :-
A Data Warehouse is a non-volatile database. Once data enters the Data Warehouse, it does not reflect the changes that take place in the operational database.

Data Base (Volatile)
Emp.ID | E.Name | E.Salary
7369 | AAAA | 8000

Data Warehouse (Non-Volatile)
Emp.ID | E.Name | E.Salary
7369 | AAAA | 8000
7369 | AAAA | 9000
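The contrast in the table above can be simulated in a few lines of Python. This is a simplified sketch, not how a warehouse is actually fed (real loads arrive through periodic ETL runs rather than per-update hooks); the `update_salary` helper and the in-memory structures are hypothetical stand-ins for the operational table and the warehouse table:

```python
# Operational database (volatile): one row per employee, updated in place.
oltp = {7369: {"name": "AAAA", "salary": 8000}}

# Data warehouse (non-volatile): rows are only ever appended, never overwritten.
dwh = [(7369, "AAAA", 8000)]

def update_salary(emp_id, new_salary):
    """Hypothetical helper: overwrite in the OLTP copy, append in the DWH copy."""
    oltp[emp_id]["salary"] = new_salary                      # old value is gone
    dwh.append((emp_id, oltp[emp_id]["name"], new_salary))   # old row is kept

update_salary(7369, 9000)
print(oltp)  # {7369: {'name': 'AAAA', 'salary': 9000}}
print(dwh)   # [(7369, 'AAAA', 8000), (7369, 'AAAA', 9000)]
```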



3. Subject Oriented :-
A Data Warehouse is a subject-oriented database that supports the business needs of middle-level management in the enterprise.
EX :- Finance, Sales, HR, Loans (Bank).

4. Integrated Database :-
A Data Warehouse is an integrated database that collects information from various operational sources.

Decision Support System :-
Since a Data Warehouse is designed to support the decision-making process, it is known as a Decision Support System (DSS).

Historical Database :-
Since it contains historical business data, it is known as a Historical Database.

Read-Only Database :-
Since the database is designed only for reading data for analysis and decision making, it is known as a Read-Only Database.

• The differences between an operational database and a Data Warehouse are as follows:

Operational Database (OLTP) | Data Warehouse (OLAP)
1. Designed to support transactional processing. | 1. Designed to support the decision-making process.
2. Data is volatile. | 2. Data is non-volatile.
3. Current data. | 3. Historical data.
4. Detailed data. | 4. Summary data.

Data Acquisition :-
Data Acquisition is the process of extracting the relevant business information, transforming the data into the required business format, and loading it into the Data Warehouse.

Data Acquisition consists of the following processes:
1. Data Extraction
2. Data Transformation
3. Data Loading
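The three steps can be sketched end to end in Python using only the standard library. This is an illustration under stated assumptions, not a real ETL tool: the CSV text stands in for a flat-file source, the in-memory SQLite database stands in for the warehouse, and the `emp_dim` table and the upper-casing rule are hypothetical:

```python
import csv
import io
import sqlite3

# Hypothetical flat-file source (header row + detail rows).
SOURCE_CSV = """emp_id,name,salary
7369,aaaa,8000
7499,bbbb,1600
"""

def extract(text):
    """Extraction: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transformation: convert raw strings into the required business format."""
    return [(int(r["emp_id"]), r["name"].upper(), float(r["salary"])) for r in rows]

def load(conn, rows):
    """Loading: insert the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS emp_dim (emp_id INTEGER, name TEXT, salary REAL)"
    )
    conn.executemany("INSERT INTO emp_dim VALUES (?, ?, ?)", rows)
    conn.commit()

target = sqlite3.connect(":memory:")  # stands in for the Data Warehouse
load(target, transform(extract(SOURCE_CSV)))
print(target.execute("SELECT * FROM emp_dim ORDER BY emp_id").fetchall())
# [(7369, 'AAAA', 8000.0), (7499, 'BBBB', 1600.0)]
```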
ETT :- Extraction, Transformation & Transportation
ETM :- Extraction, Transformation & Move
ETL :- Extraction, Transformation & Loading

Types of ETL Tools :-
To implement Data Acquisition we need ETL products. There are two types of ETL products.
1. GUI-based ETL :- An application developer without programming knowledge develops the process using a simple GUI with point-and-click techniques.
Ex:- 1. Informatica
2. DataStage
3. ODI (Oracle Data Integrator)
4. Oracle Warehouse Builder
5. Business Objects Data Integration

2. Code-based ETL :- The developer must know programming languages such as SQL and PL/SQL to implement Data Acquisition.
Ex:- 1. SQL
2. PL/SQL
3. SAS

[Figure: ETL Stage 1 and ETL Stage 2]

Source System :-
A system that provides the data is known as a Source System.
There are two types of source systems.

1. Internal Sources :-
Operational databases built on any RDBMS, such as Oracle, SQL Server or DB2, are known as Internal Sources.

2. External Sources :-
File systems such as flat files, XML files and Excel sheets are known as External Sources.

Target System :-
The system into which the data is loaded is known as the Target System.

Enterprise Data Warehouse Architecture
[Figure: Enterprise Data Warehouse architecture; the sources shown include a flat file and an XML file]

Data Extraction :-
Data Extraction is the process of reading the data from internal and external sources, typically operational databases, flat files, XML files, etc.

Data Transformation :-
Data Transformation is the process of converting the data into the required business format.

• The following Data Transformation activities take place in the staging database:

1. Data Merging
2. Data Cleansing
3. Data Scrubbing
4. Data Aggregation

1. Data Merging :-
Data Merging is the process of integrating the data from multiple input pipelines into a single output pipeline.
The minimum number of input pipelines required for Data Merging is two.
The input pipelines can have a similar structure or a dissimilar structure.
Dissimilar structure :- [Figure]
Similar structure :- [Figure]
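A hedged sketch of both cases in Python, using hypothetical employee rows: pipelines with a similar structure can be merged by stacking their rows (a union), while pipelines with dissimilar structures are typically merged by matching rows on a common key (a join). The field names and values are illustrative only:

```python
# Two hypothetical input pipelines with the SAME structure: (emp_id, name).
branch_a = [(1, "AAAA"), (2, "BBBB")]
branch_b = [(3, "CCCC")]

# Similar structure: stack the rows into one output pipeline (a union).
merged_similar = branch_a + branch_b

# Two hypothetical pipelines with DIFFERENT structures that share emp_id.
names = [(1, "AAAA"), (2, "BBBB")]
salaries = {1: 8000, 2: 9000}  # emp_id -> salary

# Dissimilar structure: match rows on the common key (an inner join).
merged_dissimilar = [
    (emp_id, name, salaries[emp_id]) for emp_id, name in names if emp_id in salaries
]

print(merged_similar)     # [(1, 'AAAA'), (2, 'BBBB'), (3, 'CCCC')]
print(merged_dissimilar)  # [(1, 'AAAA', 8000), (2, 'BBBB', 9000)]
```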
2. Data Cleansing :-
Data Cleansing is the process of correcting inconsistencies and inaccuracies in the data.
Inconsistencies :- [Figure]

Inaccuracy :-
Before | After
$ 10 | $ 10.00
$ 7.8 | $ 7.80
$ 4.879 | $ 4.88
$ 4.844 | $ 4.84
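The inaccuracy fix in the table above amounts to standardizing every amount to two decimal places. A minimal Python sketch, assuming the raw values arrive as text; `decimal.Decimal` is used instead of `float` so that values such as 4.879 are rounded exactly rather than through a binary approximation, and half-up rounding is an assumed business rule:

```python
from decimal import ROUND_HALF_UP, Decimal

raw_prices = ["10", "7.8", "4.879", "4.844"]

def cleanse_amount(text):
    """Standardize a currency amount to exactly two decimal places (half-up)."""
    return Decimal(text).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

cleaned = ["$ " + str(cleanse_amount(p)) for p in raw_prices]
print(cleaned)  # ['$ 10.00', '$ 7.80', '$ 4.88', '$ 4.84']
```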

3. Data Scrubbing :-
Data Scrubbing is the process of deriving new data definitions from the existing source data definitions.
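For illustration, a minimal Python sketch assuming a hypothetical source record with `first_name`, `last_name` and `monthly_salary` fields; the target fields `full_name` and `annual_salary` do not exist in the source and are derived from it:

```python
# Hypothetical source definition: first_name, last_name, monthly_salary.
source_rows = [
    {"first_name": "Ann", "last_name": "Lee", "monthly_salary": 8000},
]

def scrub(row):
    """Derive new target fields from existing source fields."""
    return {
        "full_name": f'{row["first_name"]} {row["last_name"]}',
        "annual_salary": row["monthly_salary"] * 12,
    }

target_rows = [scrub(r) for r in source_rows]
print(target_rows)  # [{'full_name': 'Ann Lee', 'annual_salary': 96000}]
```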

4. Data Aggregation :-
Data Aggregation is the process of summarizing multiple detail values into a single summary value.
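In SQL terms this is usually a `GROUP BY`. A small sketch using Python's built-in `sqlite3` module, with a hypothetical `sales_detail` table, collapses the detail rows into one summary row per region:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_detail (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales_detail VALUES (?, ?)",
    [("East", 100), ("East", 250), ("West", 75)],
)

# Many detail rows per region -> one summary row (total and row count) per region.
summary = conn.execute(
    "SELECT region, SUM(amount), COUNT(*) FROM sales_detail "
    "GROUP BY region ORDER BY region"
).fetchall()
print(summary)  # [('East', 350, 2), ('West', 75, 1)]
```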

Data Loading :-
Data Loading is the process of loading the transformed data into the Data Warehouse.

Data Mart :-
1. A Data Mart is a subset of the Enterprise Data Warehouse.
2. A Data Mart is a subject-oriented database that supports the business needs of a particular subject area, typically those of middle management.

Types of Data Warehousing Approaches

1. Top-Down Data Warehousing Approach :-
According to W. H. Inmon, we first develop the Enterprise Data Warehouse (EDW), and then, from the Enterprise Data Warehouse, develop subject-oriented databases known as Data Marts (DM).
[Figure: EDW feeding multiple DMs]

2. Bottom-Up Data Warehousing Approach :-
According to Ralph Kimball, we first develop subject-oriented databases known as Data Marts, and then combine the Data Marts into the Enterprise Data Warehouse.
[Figure: Data Marts combined into the EDW]

Types of Data Marts

1. Dependent Data Mart :- In the Top-Down DW approach, Data Mart development depends on the Enterprise Data Warehouse. Such Data Marts are known as Dependent Data Marts.
2. Independent Data Mart :- In the Bottom-Up DW approach, Data Mart development is independent of the Enterprise Data Warehouse. Such Data Marts are known as Independent Data Marts.

Data Warehousing Life Cycle

The following are the different phases involved in the DWH development life cycle:
1. Business Requirement Analysis
2. Data Modeling (Database Design)
3. ETL Development Life Cycle
4. ETL Testing Life Cycle
5. Report Development Life Cycle
6. Testing of Reports
7. Production & Support

• Data modeling
There are three levels of data modeling: conceptual, logical, and physical. This section explains the differences among the three, the order in which each one is created, and how to go from one level to the next.

1) Conceptual Data Model
2) Logical Data Model
3) Physical Data Model

1) Conceptual Data Model
Features of a conceptual data model include:-
• Includes the important entities and the relationships
among them.
• No attribute is specified.
• No primary key is specified.
• At this level, the data modeler attempts to identify the
highest-level relationships among the different
entities.

2) Logical Data Model
Features of a logical data model include:-
• Includes all entities and relationships among them.
• All attributes for each entity are specified.
• The primary key for each entity is specified.
• Foreign keys (keys identifying the relationship between different entities) are specified.
• Normalization occurs at this level.

3) Physical Data Model
Features of a physical data model include:-
• Specification of all tables and columns.
• Foreign keys are used to identify relationships between tables.
• Denormalization may occur based on user requirements.
• Physical considerations may cause the physical data model to be quite different from the logical data model.

• Data modeling
1) Star Schema
The star schema is the simplest style of data mart schema. It consists of one or more fact tables referencing any number of dimension tables. The star schema is an important special case of the snowflake schema, and is more effective for handling simpler queries.
[Figure: Star Schema]
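To make the structure concrete, here is a minimal star schema built in an in-memory SQLite database from Python. The table and column names are hypothetical; the four dimensions mirror the Who / What / When / Where questions from the start of this deck, and answering them takes exactly one join from the fact table to each dimension:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, product TEXT);
CREATE TABLE dim_time     (time_key     INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY, state TEXT);
-- The fact table references every dimension and holds the measure (amount).
CREATE TABLE fact_sales (
    customer_key INTEGER REFERENCES dim_customer,
    product_key  INTEGER REFERENCES dim_product,
    time_key     INTEGER REFERENCES dim_time,
    location_key INTEGER REFERENCES dim_location,
    amount       REAL
);
INSERT INTO dim_customer VALUES (1, 'Christina');
INSERT INTO dim_product  VALUES (1, 'Laptop'), (2, 'Phone');
INSERT INTO dim_time     VALUES (1, 2003);
INSERT INTO dim_location VALUES (1, 'Illinois');
INSERT INTO fact_sales   VALUES (1, 1, 1, 1, 900.0), (1, 2, 1, 1, 300.0);
""")

# Who bought what, when, and where: one join from the fact to each dimension.
rows = conn.execute("""
SELECT c.name, p.product, t.year, l.state, SUM(f.amount)
FROM fact_sales f
JOIN dim_customer c ON f.customer_key = c.customer_key
JOIN dim_product  p ON f.product_key  = p.product_key
JOIN dim_time     t ON f.time_key     = t.time_key
JOIN dim_location l ON f.location_key = l.location_key
GROUP BY c.name, p.product, t.year, l.state
ORDER BY p.product
""").fetchall()
print(rows)
# [('Christina', 'Laptop', 2003, 'Illinois', 900.0), ('Christina', 'Phone', 2003, 'Illinois', 300.0)]
```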

2) Snowflake Schema
A snowflake schema is a logical arrangement of tables in a multidimensional database such that the entity relationship diagram resembles a snowflake shape. The snowflake schema is represented by centralized fact tables connected to multiple dimensions; unlike in the star schema, the dimension tables are normalized into multiple related tables.
[Figure: Snowflake Schema]
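A hedged sketch of the difference, again with hypothetical tables in SQLite: a product dimension is snowflaked by moving its category attribute into a separate `dim_category` table, so a query by category needs one extra join through `dim_product`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked product dimension: category attributes live in their own table.
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, product TEXT,
                           category_key INTEGER REFERENCES dim_category);
CREATE TABLE fact_sales   (product_key  INTEGER REFERENCES dim_product, amount REAL);
INSERT INTO dim_category VALUES (1, 'Electronics');
INSERT INTO dim_product  VALUES (1, 'Laptop', 1), (2, 'Phone', 1);
INSERT INTO fact_sales   VALUES (1, 900.0), (2, 300.0);
""")

# Reaching the category now requires going fact -> product -> category.
rows = conn.execute("""
SELECT c.category, SUM(f.amount)
FROM fact_sales f
JOIN dim_product  p ON f.product_key  = p.product_key
JOIN dim_category c ON p.category_key = c.category_key
GROUP BY c.category
""").fetchall()
print(rows)  # [('Electronics', 1200.0)]
```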
• Slowly Changing Dimension
• Type 1 :- In Type 1 Slowly Changing Dimension, the new information simply overwrites the original information. In other words, no history is kept.
In our example, we originally have the following table:
[Table: Christina's customer record, with State = Illinois]

After Christina moved from Illinois to California, the new information replaces the original record, and we have the following table:
[Table: the same record, now with State = California]

• Advantages
This is the easiest way to handle the Slowly Changing Dimension problem, since there is no need to keep track of the old information.

• Disadvantages
All history is lost. By applying this methodology, it is not possible to trace back in history. For example, in this case, the company would not be able to know that Christina lived in Illinois before.

Usage:
About 50% of the time.
When to use Type 1:
Type 1 slowly changing dimension should be used when it is
not necessary for the data warehouse to keep track of
historical changes.
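A minimal Python sketch of the Type 1 update, assuming the dimension is held as a dictionary keyed by a hypothetical surrogate key of 1; the old state value is simply overwritten:

```python
# Hypothetical customer dimension: surrogate key -> attributes.
customer_dim = {1: {"name": "Christina", "state": "Illinois"}}

def scd_type1(dim, key, column, new_value):
    """SCD Type 1: overwrite the attribute in place; no history is kept."""
    dim[key][column] = new_value

scd_type1(customer_dim, 1, "state", "California")
print(customer_dim)  # {1: {'name': 'Christina', 'state': 'California'}}
```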

• Type 2 :- In Type 2 Slowly Changing Dimension, a new record is added to the table to represent the new information. Therefore, both the original and the new record will be present. The new record gets its own primary key.
In our example, we originally have the following table:
[Table: Christina's customer record, with State = Illinois]

After Christina moved from Illinois to California, a new record is added for the new information, and we have the following table:
[Table: two records for Christina, the original one with State = Illinois and a new one, with its own key, with State = California]

• Advantages
This allows us to accurately keep all historical information.

• Disadvantages
This will cause the size of the table to grow fast. In cases where the number of rows for the table is very high to start with, storage and performance can become a concern.
This necessarily complicates the ETL process.
Usage:
About 50% of the time.
When to use Type 2:-
• Type 2 slowly changing dimension should be used when it is necessary for the data warehouse to track historical changes.
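A minimal Python sketch of the Type 2 change, under stated assumptions: `itertools.count` stands in for the database sequence that generates surrogate keys, matching on the customer name stands in for a real natural key, and the `is_current` flag is a common addition that is not part of the description above:

```python
from itertools import count

next_key = count(start=2)  # hypothetical surrogate-key sequence; key 1 is taken

# Each row: (surrogate key, customer name, state, is_current)
customer_dim = [(1, "Christina", "Illinois", True)]

def scd_type2(dim, name, new_state):
    """SCD Type 2: retire the current row and add a new row with its own key."""
    for i, (key, row_name, state, is_current) in enumerate(dim):
        if row_name == name and is_current:
            dim[i] = (key, row_name, state, False)  # old row is kept as history
    dim.append((next(next_key), name, new_state, True))

scd_type2(customer_dim, "Christina", "California")
print(customer_dim)
# [(1, 'Christina', 'Illinois', False), (2, 'Christina', 'California', True)]
```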

• Type 3 :- In Type 3 Slowly Changing Dimension, there will be two columns to indicate the particular attribute of interest, one indicating the original value and one indicating the current value. There will also be a column that indicates when the current value became active.
In our example, we originally have the following table:
[Table: Christina's customer record, with State = Illinois]

To accommodate Type 3 Slowly Changing Dimension, we will now have the following columns:
• Customer Key
• Name
• Original State
• Current State
• Effective Date
After Christina moved from Illinois to California, the current-value column is updated, and we have the following table (assuming the effective date of change is January 15, 2003):
[Table: Christina's record with Original State = Illinois, Current State = California, Effective Date = January 15, 2003]

• Advantages:
This does not increase the size of the table, since the existing record is updated instead of a new one being added.
This allows us to keep some part of history.

• Disadvantages:
Type 3 will not be able to keep all history where an attribute is changed more than once. For example, if Christina later moves to Texas on December 15, 2003, the California information will be lost.

Usage: Type 3 is rarely used in actual practice.

When to use Type 3:
• Type 3 slowly changing dimension should only be used when it is necessary for the data warehouse to track historical changes, and when such changes will only occur a finite number of times.
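The Type 3 layout and its limitation can be sketched in Python as follows. The dictionary keys mirror the columns listed above, the customer key value and the pre-change contents of the current-state and effective-date columns are hypothetical, and the second call reproduces the Texas example from the Disadvantages:

```python
from datetime import date

# One row with the Type 3 columns; key 1 and the initial values are hypothetical.
customer = {
    "customer_key": 1,
    "name": "Christina",
    "original_state": "Illinois",
    "current_state": "Illinois",
    "effective_date": None,
}

def scd_type3(row, new_state, effective):
    """SCD Type 3: keep the original value, overwrite only the current value."""
    row["current_state"] = new_state
    row["effective_date"] = effective

scd_type3(customer, "California", date(2003, 1, 15))
scd_type3(customer, "Texas", date(2003, 12, 15))
print(customer["original_state"], customer["current_state"])  # Illinois Texas
# "California" is no longer stored anywhere: the Type 3 limitation.
```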

• What is a surrogate key?
A surrogate key is a substitute for the natural primary key. It is a unique identifier or number (normally created by a database sequence generator) for each record of a dimension table, which can be used as the primary key of the table.

A surrogate key is useful because natural keys may change.

• What is the difference between a primary key and a surrogate key?
See next slides
• Primary Key
A primary key is a special constraint on a column or set of columns.
A primary key constraint ensures that the column(s) so
designated have no NULL values, and that every value is unique.
Physically, a primary key is implemented by the database system
using a unique index, and all the columns in the primary key must
have been declared NOT NULL. A table may have only one
primary key, but it may be composite (consist of more than one
column).
• Surrogate Key
A surrogate key is any column or set of columns that can be
declared as the primary key instead of a "real" or natural key.
Sometimes there can be several natural keys that could be
declared as the primary key, and these are all called candidate
keys. So a surrogate is a candidate key. A table could actually
have more than one surrogate key, although this would be
unusual. The most common type of surrogate key is an
incrementing integer, such as an auto increment column in
MySQL, or a sequence in Oracle, or an identity column in SQL
Server.
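Both ideas can be seen in one small SQLite example from Python. SQLite's `INTEGER PRIMARY KEY AUTOINCREMENT` plays the role of the Oracle sequence, MySQL auto-increment column or SQL Server identity column mentioned above; the `dim_customer` table and the `customer_code` natural key are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# customer_key: surrogate key, assigned by the database.
# customer_code: natural (business) key coming from the source system.
conn.execute("""
CREATE TABLE dim_customer (
    customer_key  INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_code TEXT NOT NULL,
    name          TEXT
)""")
conn.executemany(
    "INSERT INTO dim_customer (customer_code, name) VALUES (?, ?)",
    [("C-100", "Christina"), ("C-200", "Ann")],
)

# If the source system renumbers a customer, only the natural key changes;
# anything referencing customer_key is unaffected.
conn.execute("UPDATE dim_customer SET customer_code = 'X-100' WHERE customer_key = 1")
print(conn.execute("SELECT * FROM dim_customer ORDER BY customer_key").fetchall())
# [(1, 'X-100', 'Christina'), (2, 'C-200', 'Ann')]

# The primary key constraint rejects a duplicate key value.
try:
    conn.execute("INSERT INTO dim_customer VALUES (1, 'C-300', 'Dup')")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```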

Questions
