0% found this document useful (0 votes)
61 views20 pages

DatawareHousing Concepts

The document discusses concepts related to data warehousing including what a data warehouse is, its key characteristics like being subject-oriented, integrated, time-varying and non-volatile, how it differs from operational transaction systems, dimensional modeling concepts, fact tables, slowly changing dimensions, extract transform load (ETL) processes, and common data warehouse architectures. Dimensional modeling involves creating hierarchies of levels to structure data around subjects like customers, products and time periods.

Uploaded by

pattabhikv
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
61 views20 pages

DatawareHousing Concepts

The document discusses concepts related to data warehousing including what a data warehouse is, its key characteristics like being subject-oriented, integrated, time-varying and non-volatile, how it differs from operational transaction systems, dimensional modeling concepts, fact tables, slowly changing dimensions, extract transform load (ETL) processes, and common data warehouse architectures. Dimensional modeling involves creating hierarchies of levels to structure data around subjects like customers, products and time periods.

Uploaded by

pattabhikv
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 20

DATA WAREHOUSING

Data Warehousing Concepts

What is a Data Warehouse?


A Data warehouse is a subject-oriented, integrated, time-varying, non-volatile collection of data in support of the management's decision making process (OR) Data warehouse is a relational database for Query and Analysis Rather than for transaction processing.
Subject-oriented(customer,products,sales,etc) Non-volatile Time-varying Integrated
(William.Inmon 1993)

Subject-oriented

A Data Warehouse is organized around major subjects, such as customer, supplier, product and sales. Rather than concentrating on day-to-day operations and transaction processing of an organization.

Integrated

A Data Warehouse is usually constructed by integrating multiple heterogeneous sources, such as relational databases, flat files, and on-line transaction records. data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attribute measures and so on.

Time Variant

Data are stored to provide information from a historical perspective (example: the past 5 -10 years). Every key structure in the data warehouse contains, either implicitly or explicitly an element of time

Non-Volatile

A data warehouse is always a physically separate store of data transformed from the application data found in the operational environment. due to this separation, a data warehouse does not require transaction processing,recovery,and concurrency control mechanisms. it usually requires only two operations in data accessing :initial loading loading of data and access of data.

Comparison of OLTP systems and data warehousing systems


OLTP Systems Data warehousing systems

Holds Current data


Stores detailed data Data is dynamic

Holds Historical data


stores detailed,lightly,and highly summarized data Data is largely static

Transaction-driven
Application -oriented Supports day-to-day decisions Serves large number of clerical/operational users

Analysis driven
Subject -oriented Supports strategic decisions Serves relatively low number of managerial users

Dimension Modeling
Dimension: Dimension is a structure which consists of levels, and hierarchies is defined on each level. Example:

SEX

MALE

FEMALE

Dimension Modeling
Example:
Profession Level 0

Engineer

Secretary

Teacher

Level 1

Chemical

Civil

Executive

Junior

Elementary

High School

Level 2

FACTS

Fact: Fact consists of whole data with primary key, foreign key relation ship with dimensions and also consists of measures. There are Three types of facts
1.ADDITIVE FACTS 2.SEMI ADDITIVE FACTS

3.NON ADDITIVE FACTS

Fact less Fact & Conform dim

Fact less fact is a fact it does not contain Measures. A Dimension which can share more than one Fact is called Conform Dimension Collection of Star Schemas and Snowflake Schemas is called Galaxy.

Star schema & Snow flake Schema

Star schema: A Centralized fact table surrounded with dimension tables having Primary, Foreign key relation ship between them is called star schema. Snow flake Schema: A normalized star schema Is called Snow flake Schema Galaxy: Collection of Star schemas and snow flake Schema is called Galaxy.

Star Schema
Sex-Dim
Sex key Sex

Date-Dim
Date key Current year Current month Current week

Fact Table
Profession Key ----------------------------Sex key ---------------------------Address key ----------------------------Date key --------------------------------------Measures (Numeric)

Conform Dim

Address Dim
Address key Country State City

Profession-Dim
Profession-key Profession-class Title Level discipline

Snow Flake Schema


Item Dim Item key Item name Type Supplier key Fact Table Time key Item key Location key ------------------------------------Rupees sold Units sold Time Dim Time key Year Quarter Month day

Supplier Dim Supplier key Supplier name Supplier address Supplier type

Location Dim Location key Street City key

City Dim City key City name State Country Pin code

TYPES OF MAPPINGS

History Simple pass through (None) Slowly growing target (Full) Slowly changing dimension (depends)

Types of SCDS

Slowly changing dimension1 Slowly changing dimension2

1. 2. 3.

Time stamping versioning Flaging Slowly changing dimension3

Slowly changing Dimension-1

SCD-1:When you does not want History use this kind of mapping (Only insert else Update takes place) it inserts the new row or Update the existing dimensions.

Slowly changing Dimension-2

SCD -2 (Time stamp):

When you want maintain full history use this kind of mapping. Inserts new and changed dimensions. Creates an effective date range to track changes.

SCD -2 (Versioning):
SCD -2 (Flaging):

Inserts new and changed dimensions. Creates a version number and increments the primary key to track changes. Inserts new and changed dimensions. Flags the current version and increments the primary key to track changes.

Slowly changing Dimension-3


SCD -3 : when you want partial history use this
kind of mapping. It Inserts new dimensions. Updates changed values in existing dimensions. Optionally uses the load date to track changes.

Data Warehouse Execution Architecture

Architecture

Source Systems
DB2/400

ETL

DATA WAREHOUSE Reporting( DSS) Servers

ODBC SQL Server Native FTP

Informatica (ETL) UNIX-HP

ODBC Native

NCR Tera Data Warehouse

Flat files

ODBC

Oracle

You might also like