Data Warehousing Concepts
Subject-oriented
A data warehouse is organized around major subjects, such as customer, supplier, product, and sales, rather than around the day-to-day operations and transaction processing of an organization.
Integrated
A data warehouse is usually constructed by integrating multiple heterogeneous sources, such as relational databases, flat files, and on-line transaction records. Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attribute measures, and so on.
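As a rough illustration of the cleaning step described above, the sketch below maps differently named and differently encoded source fields onto one warehouse convention. The field names (`sex`, `gender`, `SEX_CD`), the code values, and the helper functions are all hypothetical and are not taken from any particular source system.

```python
# Hypothetical source-specific encodings for the same attribute.
SEX_ENCODINGS = {
    "m": "MALE", "male": "MALE", "1": "MALE",
    "f": "FEMALE", "female": "FEMALE", "0": "FEMALE",
}

def normalize_sex(raw):
    """Map a source-specific code to the warehouse's single encoding."""
    key = str(raw).strip().lower()
    # Unknown codes stay visible instead of being silently guessed.
    return SEX_ENCODINGS.get(key, "UNKNOWN")

def integrate(records):
    """Rename source-specific fields to one convention and clean their values."""
    cleaned = []
    for rec in records:
        # The sources disagree on the attribute name (assumed names).
        raw = rec.get("sex", rec.get("gender", rec.get("SEX_CD")))
        cleaned.append({"customer": rec["customer"], "sex": normalize_sex(raw)})
    return cleaned

sources = [
    {"customer": "A", "sex": "M"},          # e.g. a relational database
    {"customer": "B", "gender": "female"},  # e.g. a flat file
    {"customer": "C", "SEX_CD": 1},         # e.g. an on-line transaction record
]
integrated = integrate(sources)
print(integrated)
```

After this step every record carries the same attribute name and the same set of values, which is the consistency the integration property asks for.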
Time Variant
Data are stored to provide information from a historical perspective (for example, the past 5-10 years). Every key structure in the data warehouse contains, either implicitly or explicitly, an element of time.
Non-Volatile
A data warehouse is always a physically separate store of data, transformed from the application data found in the operational environment. Because of this separation, a data warehouse does not require transaction processing, recovery, or concurrency control mechanisms. It usually requires only two data access operations: the initial loading of data and access to data.
Transaction-driven
Application-oriented; supports day-to-day decisions; serves a large number of clerical/operational users.
Analysis-driven
Subject-oriented; supports strategic decisions; serves a relatively small number of managerial users.
Dimension Modeling
Dimension: A dimension is a structure that consists of levels, with hierarchies defined over those levels. Example:
SEX
MALE
FEMALE
Example:
Level 0: Profession
Level 1: Engineer, Secretary, Teacher
Level 2: Chemical, Civil (under Engineer); Executive, Junior (under Secretary); Elementary, High School (under Teacher)
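One way to picture the Profession example above is as a nested structure that can be rolled up from Level 2 to Level 1. This is only a sketch: the parent-child assignment is read off the example, and the function names and headcount figures are made up for illustration.

```python
# Levels follow the Profession example: Level 0 -> Level 1 -> Level 2.
PROFESSION_HIERARCHY = {
    "Profession": {
        "Engineer": ["Chemical", "Civil"],
        "Secretary": ["Executive", "Junior"],
        "Teacher": ["Elementary", "High School"],
    }
}

def parent_of(member):
    """Return the Level 1 parent of a Level 2 member, or None if not found."""
    for parent, children in PROFESSION_HIERARCHY["Profession"].items():
        if member in children:
            return parent
    return None

def roll_up(counts_by_level2):
    """Aggregate Level 2 counts up to Level 1 along the hierarchy."""
    totals = {}
    for member, count in counts_by_level2.items():
        parent = parent_of(member)
        totals[parent] = totals.get(parent, 0) + count
    return totals

# Hypothetical headcount per Level 2 member.
headcount = {"Chemical": 4, "Civil": 6, "Junior": 3, "High School": 5}
by_profession = roll_up(headcount)
print(by_profession)
```

The hierarchy is what lets a query move between levels of detail, for example from individual disciplines up to whole professions.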
FACTS
Fact: A fact table contains the detailed data, with primary key/foreign key relationships to the dimensions, together with the measures. There are three types of facts:
1. Additive facts
2. Semi-additive facts
3. Non-additive facts
A factless fact table is a fact table that contains no measures. A dimension that is shared by more than one fact table is called a conformed dimension.
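The difference between additive and semi-additive facts can be sketched with a few rows. The example assumes the common usage of the terms: an additive measure can be summed across every dimension, while a semi-additive measure (an account balance is the usual illustration) can be summed across some dimensions but not across time. The rows and column names are hypothetical.

```python
# Hypothetical daily fact rows.
# sales_amount is treated as additive, balance as semi-additive.
rows = [
    {"date": "2024-01-01", "account": "A", "sales_amount": 100, "balance": 500},
    {"date": "2024-01-01", "account": "B", "sales_amount": 50, "balance": 200},
    {"date": "2024-01-02", "account": "A", "sales_amount": 30, "balance": 530},
    {"date": "2024-01-02", "account": "B", "sales_amount": 20, "balance": 220},
]

# Additive: summing over accounts and dates gives a meaningful total.
total_sales = sum(r["sales_amount"] for r in rows)

# Semi-additive: balances can be summed across accounts for one date...
latest = max(r["date"] for r in rows)
closing_balance = sum(r["balance"] for r in rows if r["date"] == latest)

# ...but summing them across dates counts the same money more than once.
misleading_total = sum(r["balance"] for r in rows)

print(total_sales, closing_balance, misleading_total)
```

A non-additive measure, such as a ratio or percentage, would not be meaningful to sum along any dimension and is usually recomputed from its components.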
Star schema: A central fact table surrounded by dimension tables, with primary key/foreign key relationships between them.
Snowflake schema: A normalized star schema.
Galaxy: A collection of star schemas and snowflake schemas.
Star Schema
Sex-Dim: Sex key, Sex
Date-Dim: Date key, Current year, Current month, Current week
Fact Table: Profession key, Sex key, Address key, Date key, Measures (numeric)
Conform Dim
Address Dim: Address key, Country, State, City
Profession-Dim: Profession key, Profession class, Title, Level, Discipline
Supplier Dim: Supplier key, Supplier name, Supplier address, Supplier type
City Dim: City key, City name, State, Country, Pin code
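A small part of this star schema can be tried out with Python's built-in `sqlite3` module. The sketch below builds only the Sex and Date dimensions and a fact table that references them. The snake_case column names, the `headcount` measure, and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sex_dim (sex_key INTEGER PRIMARY KEY, sex TEXT);
CREATE TABLE date_dim (
    date_key INTEGER PRIMARY KEY,
    current_year INTEGER,
    current_month INTEGER,
    current_week INTEGER
);
-- The fact table holds foreign keys to the dimensions plus a numeric measure.
CREATE TABLE fact_table (
    sex_key INTEGER REFERENCES sex_dim(sex_key),
    date_key INTEGER REFERENCES date_dim(date_key),
    headcount INTEGER  -- hypothetical measure
);
INSERT INTO sex_dim VALUES (1, 'MALE'), (2, 'FEMALE');
INSERT INTO date_dim VALUES (10, 2024, 1, 1), (11, 2024, 2, 6);
INSERT INTO fact_table VALUES (1, 10, 5), (2, 10, 7), (1, 11, 3), (2, 11, 4);
""")

# A typical star-schema query: join the fact table to its dimensions,
# then aggregate the measure by dimension attributes.
query = """
SELECT s.sex, d.current_month, SUM(f.headcount)
FROM fact_table AS f
JOIN sex_dim AS s ON s.sex_key = f.sex_key
JOIN date_dim AS d ON d.date_key = f.date_key
GROUP BY s.sex, d.current_month
ORDER BY s.sex, d.current_month
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The remaining dimensions (Address, Profession) would be joined in the same way, each through its own key column in the fact table.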
TYPES OF MAPPINGS
Mapping type (history kept):
Simple pass through (none)
Slowly growing target (full)
Slowly changing dimension (depends on the SCD type)
Types of SCDs
SCD-1: Use this kind of mapping when you do not want history. It inserts new dimension rows and updates existing ones in place.
SCD-2: Use this kind of mapping when you want to maintain full history. The variants differ in how they track changes.
SCD-2 (Effective date range): Inserts new and changed dimensions. Creates an effective date range to track changes.
SCD-2 (Versioning): Inserts new and changed dimensions. Creates a version number and increments the primary key to track changes.
SCD-2 (Flagging): Inserts new and changed dimensions. Flags the current version and increments the primary key to track changes.
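The behaviour of these mappings can be sketched in plain Python, with a list of dictionaries standing in for the dimension table. This is not how any ETL tool implements them. For brevity, the SCD-2 function records a version number, a current flag, and an effective date range on the same row, whereas each mapping described above uses only one of these. All names and sample values are hypothetical.

```python
from datetime import date

def scd1_apply(dim, natural_key, attrs):
    """SCD-1: insert a new row or overwrite the existing one; no history kept."""
    for row in dim:
        if row["natural_key"] == natural_key:
            row.update(attrs)
            return
    dim.append({"key": len(dim) + 1, "natural_key": natural_key, **attrs})

def scd2_apply(dim, natural_key, attrs, today):
    """SCD-2: keep old rows and add a new row for every change."""
    current = next(
        (r for r in dim if r["natural_key"] == natural_key and r["current"]),
        None,
    )
    if current is not None:
        if all(current[k] == v for k, v in attrs.items()):
            return  # nothing changed, so no new row is needed
        current["current"] = False   # flagging: old row is no longer current
        current["end_date"] = today  # effective date range: close the old row
    dim.append({
        "key": len(dim) + 1,         # new, incremented primary key
        "natural_key": natural_key,
        "version": current["version"] + 1 if current else 1,  # versioning
        "current": True,
        "start_date": today,
        "end_date": None,
        **attrs,
    })

dim1, dim2 = [], []
scd1_apply(dim1, "C1", {"city": "Pune"})
scd1_apply(dim1, "C1", {"city": "Delhi"})
scd2_apply(dim2, "C1", {"city": "Pune"}, date(2024, 1, 1))
scd2_apply(dim2, "C1", {"city": "Delhi"}, date(2024, 6, 1))
print(len(dim1), len(dim2))
```

After the same two updates, the SCD-1 table holds a single overwritten row, while the SCD-2 table holds both the old and the new version of the customer.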
Architecture
[Diagram: source systems (DB2/400, flat files, Oracle) feed the ETL layer; the connections are labeled ODBC and Native.]