
DWH Architecture & Concepts

The document describes data warehousing architecture. It discusses how data is extracted from multiple sources using ETL processes, cleaned and transformed, and then loaded into a data warehouse. The data warehouse consists of a staging layer, OLAP cube for summarization and aggregations, and a reporting layer for generating canned and ad-hoc reports. Dimensional modeling techniques are used to design the data warehouse for effective analysis of the data.

Uploaded by RajaPraveen
Copyright © All Rights Reserved

Data Warehouse Architecture

Data warehousing Architecture

OLTP → ETL → Data Warehouse → Reporting


Data warehousing Architecture

Figure: Internal/external sources (Source 1 … Source n) are extracted by push or pull into the staging layer. ETL performs cleansing, transformation and loading into the data warehouse, which holds summaries and aggregations. The reporting layer serves canned reports and ad-hoc analysis.

Sources (Internal/External) → Staging Layer (ETL) → Data Warehouse (DWH) → Reporting Layer
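The layered flow above can be sketched in a few lines of Python. This is a minimal illustration, not a production ETL design: the two in-memory "sources", their field names and the cleansing rules (trim and title-case store names, reject non-numeric amounts) are all hypothetical, and the warehouse is just a Python list.

```python
from collections import defaultdict

# Hypothetical source extracts (Source 1 ... Source n); in practice these would be
# pulled from OLTP systems or pushed as files.
SOURCE_1 = [{"store": " North ", "product": "pen", "amount": "10.50"},
            {"store": "South", "product": "pen", "amount": "bad"}]
SOURCE_2 = [{"store": "north", "product": "ink", "amount": "4.00"}]


def extract(*sources):
    """Extract: land raw rows from every source in a staging area."""
    staging = []
    for source in sources:
        staging.extend(dict(row) for row in source)
    return staging


def cleanse_and_transform(staging):
    """Cleanse and transform: standardize codes, drop rows that fail validation."""
    clean = []
    for row in staging:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # reject rows whose amount is not numeric
        clean.append({"store": row["store"].strip().title(),
                      "product": row["product"].strip().lower(),
                      "amount": amount})
    return clean


def load(warehouse, rows):
    """Load: append the cleansed detail rows to the warehouse table."""
    warehouse.extend(rows)


def summarize(warehouse):
    """Summaries/aggregations served to canned reports and ad-hoc analysis."""
    totals = defaultdict(float)
    for row in warehouse:
        totals[row["store"]] += row["amount"]
    return dict(totals)


warehouse = []
load(warehouse, cleanse_and_transform(extract(SOURCE_1, SOURCE_2)))
print(summarize(warehouse))  # {'North': 14.5}
```

Keeping each stage as a separate function mirrors the separate layers in the diagram, so a stage can be swapped (for example, a real database loader) without touching the others.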
Data warehousing Architecture (ODS)

Figure: Internal/external sources (Source 1 … Source n) are extracted by push or pull into the staging layer. ETL cleanses, transforms and loads detail data into the Operational Data Store (ODS); further transformation, summarization and aggregation populate the data warehouse, whose summaries and aggregations feed canned reports and ad-hoc analysis.

Sources (Internal/External) → Staging Layer (ETL) → ODS → Data Warehouse (DWH) → Reporting Layer

Data warehousing Architecture (ODS & Data Marts)

Figure: Internal/external sources (Source 1 … Source n) are extracted by push or pull into the staging layer, and ETL loads detail data into the ODS. Transformation, summarization and aggregation populate the data warehouse. Data marts (cubes built on conformed dimensions) are fed from the warehouse and serve canned reports and ad-hoc analysis.

Sources (Internal/External) → Staging Layer (ETL) → ODS → Data Warehouse (DWH) → Data Marts → Reporting Layer
Data Modeling
Steps in Data Modeling

• Requirement Gathering
• Analysis
• Logical Database Design
• Physical Database Design


Data Modeling Types

• Conceptual data modeling
  - Describes data requirements from a business point of view, without technical details
• Logical data modeling
  - Refines the conceptual model
  - Data-structure oriented, platform independent
• Physical data modeling
  - Detailed specification of what is physically implemented using a specific technology
Conceptual Data Model

• A conceptual model shows data through business eyes:
  - All entities that have business meaning
  - Important relationships
  - A few significant attributes in the entities
Logical Data Model
• The logical model extends and details the conceptual data model while remaining platform independent.
• It includes all required entities, attributes, key groups and relationships that represent business information and define business rules.
Physical Data Model
A physical data model may include:
• Referential integrity
• Indexes
• Views
• Alternate keys and other constraints
• Tablespaces and physical storage objects
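As a minimal sketch of how these physical elements look in practice, the snippet below uses Python's standard-library `sqlite3` module with an in-memory database. The `customer`/`orders` tables and their columns are hypothetical. SQLite has no tablespaces, so physical storage objects (which are DBMS-specific) are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,      -- primary (surrogate) key
    customer_code TEXT NOT NULL UNIQUE,     -- alternate key
    name          TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),  -- referential integrity
    amount      REAL NOT NULL CHECK (amount >= 0)                   -- other constraint
);
CREATE INDEX idx_orders_customer ON orders(customer_id);            -- index
CREATE VIEW customer_totals AS                                      -- view
    SELECT c.customer_code, SUM(o.amount) AS total
    FROM customer c JOIN orders o ON o.customer_id = c.customer_id
    GROUP BY c.customer_code;
""")

conn.execute("INSERT INTO customer VALUES (1, 'C001', 'Acme')")
conn.execute("INSERT INTO orders VALUES (10, 1, 25.0)")

try:
    conn.execute("INSERT INTO orders VALUES (11, 99, 5.0)")  # no customer 99 exists
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)

print(conn.execute("SELECT * FROM customer_totals").fetchall())  # [('C001', 25.0)]
```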
Enterprise Data Model

An enterprise data model, sometimes called a global business model, captures the information of the entire business in the form of entities.
Enterprise Data Model Example
Entity-Relationship Modeling

• Traditional modeling technique
• Technique of choice for OLTP systems
• Suited to the corporate data warehouse


Limitations of E-R Modeling

• Poor performance for analytical queries, which must join many normalized tables
• Models tend to be very complex and difficult to navigate
Dimensional Modeling
• A dimensional data model comprises one or more dimension tables and fact tables.

E.g. dimension tables: Location, Product, Time, Organization, etc.

• A dimension table stores the columns (attributes) that describe the objects in a fact table; dimension tables contain the textual descriptors of the business. Each dimension is identified by its single primary key.
• End users can easily understand and navigate the data structure.
Dimensional Modeling

• Dimensional modeling uses two basic concepts: facts (measures) and dimensions.
• It is powerful in representing the requirements of the business user in the context of database tables.
• It focuses on numeric data, such as values, counts, weights, balances and occurrences.
Dimensional Modeling

• Must identify:
  - Business process to be supported
  - Grain (level of detail)
  - Dimensions
  - Facts
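Declaring the grain up front makes it testable. The sketch below assumes a hypothetical retail sales process whose grain is one row per date, store and product; it lists key combinations that break the grain and rolls duplicate rows up to it. All column names are illustrative only.

```python
from collections import Counter

# Hypothetical design decisions for a retail sales process.
GRAIN = ("date_key", "store_key", "product_key")  # one row per day, store, product
FACTS = ("quantity", "sales_amount")              # numeric measures at that grain

rows = [
    {"date_key": 20130423, "store_key": 1, "product_key": 7, "quantity": 2, "sales_amount": 9.0},
    {"date_key": 20130423, "store_key": 1, "product_key": 8, "quantity": 1, "sales_amount": 3.5},
    {"date_key": 20130423, "store_key": 1, "product_key": 7, "quantity": 1, "sales_amount": 4.5},
]


def grain_violations(rows, grain):
    """Return the grain key combinations that occur more than once."""
    counts = Counter(tuple(row[col] for col in grain) for row in rows)
    return [key for key, n in counts.items() if n > 1]


def roll_up_to_grain(rows, grain, facts):
    """Sum the facts so that each grain key appears exactly once."""
    totals = {}
    for row in rows:
        key = tuple(row[col] for col in grain)
        acc = totals.setdefault(key, dict.fromkeys(facts, 0))
        for fact in facts:
            acc[fact] += row[fact]
    return totals


print(grain_violations(rows, GRAIN))  # [(20130423, 1, 7)]
```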
What is a Fact?
• A fact is a collection of related data items, consisting of measures and context data.
• Each fact typically represents a business item, a business transaction, or an event that can be used in analyzing the business or business process.
• Facts are measured, “continuously valued”, rapidly changing information. They can be calculated and/or derived.
Types of Facts
• Additive
Additive facts can be summed up through all of the dimensions in the fact table.
E.g. retail sales in $ (a sales amount fact).
• Semi-Additive
Semi-additive facts can be summed up for some of the dimensions in the fact table, but not the others.
E.g. a daily balance fact can be summed up through the customer dimension but not through the time dimension.
• Non-Additive
Non-additive facts cannot be summed up for any of the dimensions present in the fact table.
E.g. percentages, ratios, etc.
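The three kinds of additivity can be shown on one small, made-up data set (the days, customers and amounts are hypothetical). Sales are summed freely; balances are summed across customers but averaged across days; and a margin ratio is recomputed from its additive parts rather than summed or averaged.

```python
# Hypothetical daily snapshot rows: (day, customer, balance, sales, cost)
rows = [
    ("2013-04-23", "A", 100.0, 50.0, 40.0),
    ("2013-04-23", "B", 200.0, 30.0, 15.0),
    ("2013-04-24", "A", 120.0, 20.0, 10.0),
    ("2013-04-24", "B", 190.0, 40.0, 40.0),
]
DAY, CUSTOMER, BALANCE, SALES, COST = range(5)

# Additive: sales can be summed over any dimension (day, customer or both).
total_sales = sum(r[SALES] for r in rows)

# Semi-additive: balances can be summed across customers for a single day...
balance_on_24th = sum(r[BALANCE] for r in rows if r[DAY] == "2013-04-24")

# ...but across days they are averaged (or the period-end value is taken), not summed.
days = sorted({r[DAY] for r in rows})
daily_totals = [sum(r[BALANCE] for r in rows if r[DAY] == d) for d in days]
avg_daily_total_balance = sum(daily_totals) / len(days)

# Non-additive: a margin ratio is recomputed from its additive components...
margin = (total_sales - sum(r[COST] for r in rows)) / total_sales
# ...because averaging the per-row ratios gives a different, misleading answer.
naive_avg_ratio = sum((r[SALES] - r[COST]) / r[SALES] for r in rows) / len(rows)

print(total_sales, balance_on_24th, avg_daily_total_balance, margin)  # 140.0 310.0 305.0 0.25
```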
Classification of Facts
• Based on this classification, there are two types of fact tables:
  - Cumulative facts
  - Snapshot facts

• Cumulative facts: this type of fact table describes what has happened over a period of time.

E.g. additive facts such as total sales by product by store by day, week, month or year.

• Snapshot facts: this type of fact table describes the state of things at a particular instant in time.

E.g. semi-additive and non-additive facts.
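To contrast the two table types, the sketch below derives both from the same hypothetical account transactions: a cumulative table of monthly activity (summable) and a month-end balance snapshot (a state that should not be summed across months).

```python
from collections import defaultdict

# Hypothetical account transactions: (date, account, amount); withdrawals are negative.
transactions = [
    ("2013-04-05", "ACC1", 100.0),
    ("2013-04-20", "ACC1", -30.0),
    ("2013-05-02", "ACC1", 50.0),
    ("2013-05-10", "ACC2", 80.0),
]

# Cumulative fact table: what happened during each month (additive totals).
monthly_activity = defaultdict(float)
for date, account, amount in transactions:
    monthly_activity[(date[:7], account)] += amount

# Snapshot fact table: the state (balance) of each account at each month end.
snapshot = {}
running = defaultdict(float)
for month in sorted({d[:7] for d, _, _ in transactions}):
    for date, account, amount in transactions:
        if date[:7] == month:
            running[account] += amount
    for account, balance in running.items():
        snapshot[(month, account)] = balance

# Adding ACC1's April and May snapshots (70 + 120) would be meaningless;
# adding its April and May activity (70 + 50) gives the true net change.
print(dict(monthly_activity))
print(snapshot)
```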


Factless Fact Table
• Event tables that have no obvious numeric facts (measures) are called factless fact tables.
• Such events are often modeled as a fact table containing a series of keys, each representing a dimension that participates in the event.

Example: Promotion table

PROMO ID | Promotion   | Start Dt | End Dt | Description
2213     | Credit card | 230413   | 270413 | 10% cash back
2214     | Credit card | 280413   | 010513 | 15% cash back

In the above example, the PROMO ID values are surrogate keys, not measures.
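A minimal sketch of a factless fact table in SQLite, reusing PROMO IDs 2213 and 2214 from the example above. The table name `promotion_coverage`, its date/product/store keys and the sample rows are hypothetical; the point is that the table holds only keys, and analysis counts the rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Factless fact table: only foreign keys, no numeric measures.
CREATE TABLE promotion_coverage (
    date_key    INTEGER NOT NULL,
    product_key INTEGER NOT NULL,
    store_key   INTEGER NOT NULL,
    promo_id    INTEGER NOT NULL,
    PRIMARY KEY (date_key, product_key, store_key, promo_id)
);
INSERT INTO promotion_coverage VALUES
    (20130423, 1, 10, 2213),
    (20130423, 2, 10, 2213),
    (20130424, 1, 11, 2213),
    (20130428, 1, 10, 2214);
""")

# The "measure" comes from counting the event rows themselves.
rows = conn.execute("""
    SELECT promo_id, COUNT(*) AS product_store_days
    FROM promotion_coverage
    GROUP BY promo_id
    ORDER BY promo_id
""").fetchall()
print(rows)  # [(2213, 3), (2214, 1)]
```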
Dimension Types

• Conformed Dimension
• Junk Dimension
• Slowly Changing Dimension
• Degenerate Dimension
Dimension Types

• Conformed Dimension

A conformed dimension is a dimension that is standard across all data marts.

For example, an enterprise data warehouse's data can be segmented into a Sales data mart, Inventory and Shipping data mart, Finance data mart, Geographical data mart, HR and Management data mart, and so on. A conformed dimension such as Time or Product has the same keys, attributes and meaning in each of these marts.
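The value of a conformed dimension shows up when results from different marts are combined (drill-across). In this hypothetical sketch, a Sales mart and an Inventory and Shipping mart both use the same date dimension, so their monthly totals line up on the same month labels.

```python
# Hypothetical conformed date dimension shared by every data mart.
date_dim = {
    20130423: {"month": "2013-04"},
    20130501: {"month": "2013-05"},
}

# Fact rows from two different data marts, both keyed by the same date_key.
sales_facts = [(20130423, 100.0), (20130501, 40.0)]              # Sales data mart
shipping_facts = [(20130423, 3), (20130501, 1), (20130501, 2)]   # Inventory and Shipping mart


def by_month(facts):
    """Aggregate one mart's facts to the conformed 'month' attribute."""
    out = {}
    for date_key, value in facts:
        month = date_dim[date_key]["month"]
        out[month] = out.get(month, 0) + value
    return out


# Drill-across: because both marts use the same dimension, their results align.
sales, shipments = by_month(sales_facts), by_month(shipping_facts)
report = {m: (sales.get(m, 0), shipments.get(m, 0))
          for m in sorted(sales.keys() | shipments.keys())}
print(report)  # {'2013-04': (100.0, 3), '2013-05': (40.0, 3)}
```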
Dimension Types

• Junk Dimension

A junk dimension is used to record a collection of low-cardinality flags and indicators.
Flag data may be answers to non-generic questions, such as Yes/No, True/False or Activate/Deactivate.
Indicator data may be small text values such as Height, Width, Weight, Color or Status.
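One common way to build a junk dimension is to enumerate every combination of the flags and give each combination a surrogate key. The flag names and values below are hypothetical; the sketch shows the fact table carrying a single `junk_key` instead of several flag columns.

```python
from itertools import product

# Hypothetical low-cardinality flags/indicators that would otherwise clutter the fact table.
FLAGS = {
    "payment_type": ["Cash", "Card"],
    "is_gift": ["Yes", "No"],
    "status": ["Active", "Inactive"],
}

# Junk dimension: one row per combination, each with its own surrogate key.
names = list(FLAGS)
junk_dim = {
    key: dict(zip(names, combo))
    for key, combo in enumerate(product(*FLAGS.values()), start=1)
}
lookup = {tuple(row.values()): key for key, row in junk_dim.items()}

# A fact row then stores a single junk_key instead of three separate flag columns.
fact_row = {"sales_amount": 25.0, "junk_key": lookup[("Card", "No", "Active")]}
print(len(junk_dim), fact_row)  # 8 {'sales_amount': 25.0, 'junk_key': 7}
```

Enumerating all combinations up front keeps the dimension tiny (2 × 2 × 2 = 8 rows here); when the full cross product would be large, loading only the combinations that actually occur in the source data is a common alternative.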
Dimension Types
Figure 1:
Dimension Types
Figure 2:
Dimension Types

• Degenerate Dimension

The term degenerate dimension refers to a field that is used as a criterion of analysis but is stored in the fact table itself rather than in a dimension table.

Typically, such a field has no descriptive attributes that would justify a dimension table of its own, yet the facts are still grouped or summarized by it directly in the fact table.
Item numbers, ticket numbers, transaction numbers, etc. are examples of degenerate dimensions.
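A small SQLite sketch of a degenerate dimension: the hypothetical `ticket_number` column sits in the fact table with no dimension table behind it, yet it can still be grouped on, here to produce one total per ticket.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_fact (
    date_key      INTEGER NOT NULL,   -- FK to a date dimension (omitted here)
    product_key   INTEGER NOT NULL,   -- FK to a product dimension (omitted here)
    ticket_number TEXT    NOT NULL,   -- degenerate dimension: no table of its own
    sales_amount  REAL    NOT NULL
);
INSERT INTO sales_fact VALUES
    (20130423, 1, 'T-1001', 10.0),
    (20130423, 2, 'T-1001',  5.0),
    (20130423, 1, 'T-1002',  7.5);
""")

# Group directly on the degenerate dimension to get one total per ticket (basket).
baskets = conn.execute("""
    SELECT ticket_number, COUNT(*) AS lines, SUM(sales_amount) AS total
    FROM sales_fact
    GROUP BY ticket_number
    ORDER BY ticket_number
""").fetchall()
print(baskets)  # [('T-1001', 2, 15.0), ('T-1002', 1, 7.5)]
```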
Data Marts (DM)

- A data mart is a subset of a data warehouse.

It is similar to a data warehouse but limited in scope and purpose, and is usually aligned with one department, function, application or business unit.

Several names for DMs:

• Departmental DSS DBs
• OLAP databases
• Multi-dimensional DBs (MDDB) or cubes
• Lightly summarized tables
Data Mart Types

• Dependent data marts are fed directly by the DW, sometimes supplemented with other feeds, such as external data.

• Independent data marts are fed directly by external sources and do not use the DW.

• Embedded data marts are stored within the central DW. They can be stored relationally, as files or as cubes.
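The dependent and embedded styles can be sketched with SQLite, using hypothetical table names. A view inside the central database stands in for an embedded mart, and an attached in-memory database populated from the warehouse stands in for a dependent mart. An independent mart would instead be loaded straight from the sources, bypassing `dw_sales`.

```python
import sqlite3

dw = sqlite3.connect(":memory:")  # stands in for the central data warehouse
dw.executescript("""
CREATE TABLE dw_sales (region TEXT, department TEXT, amount REAL);
INSERT INTO dw_sales VALUES
    ('North', 'Finance', 100.0), ('South', 'Finance', 50.0), ('North', 'HR', 20.0);

-- Embedded data mart: lives inside the central DW (here as a view).
CREATE VIEW hr_mart AS
    SELECT region, SUM(amount) AS amount FROM dw_sales
    WHERE department = 'HR' GROUP BY region;

-- Dependent data mart: a separate store populated from the DW.
ATTACH DATABASE ':memory:' AS finance_mart;
CREATE TABLE finance_mart.sales_by_region AS
    SELECT region, SUM(amount) AS amount FROM dw_sales
    WHERE department = 'Finance' GROUP BY region;
""")

print(dw.execute("SELECT * FROM finance_mart.sales_by_region ORDER BY region").fetchall())
# [('North', 100.0), ('South', 50.0)]
print(dw.execute("SELECT * FROM hr_mart").fetchall())  # [('North', 20.0)]
```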
Operational Data Store (ODS)
An ODS:

• pulls together, validates, cleanses and integrates data
• provides the foundation for an integrated view of enterprise data
• supports tactical decision support, day-to-day operations and management reporting

Characteristics
• Integrated
• Subject-oriented
• Volatile (data is updated in place)
• Current valued
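The "volatile, current valued" characteristics are easiest to see next to a warehouse's history. In this hypothetical sketch, the ODS keeps one row per customer and overwrites it on change, while the warehouse appends every load with its load date.

```python
from datetime import date

ods = {}        # current valued: one row per customer, overwritten on change
warehouse = []  # historical: every load is kept together with its load date


def load_customer(customer_id, status, load_date):
    """Hypothetical loader feeding both stores from the same source record."""
    ods[customer_id] = {"status": status}                  # update in place (volatile)
    warehouse.append({"customer_id": customer_id,          # append-only history
                      "status": status,
                      "load_date": load_date})


load_customer(42, "Prospect", date(2013, 4, 23))
load_customer(42, "Active", date(2013, 5, 1))

print(ods[42])         # {'status': 'Active'}
print(len(warehouse))  # 2
```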
Types of Schemas

- Star schema
- Snowflake schema
- Fact constellation (also called integrated, galaxy or hybrid) schema
Star Schema Design

• A single fact table surrounded by denormalized dimension tables
• The fact table contains transaction-type information.
• A data mart can contain many star schemas.
• Easily understood by end users, but requires more disk storage
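A minimal star schema in SQLite, with hypothetical table and column names: one fact table whose foreign keys point at denormalized date, product and store dimensions (the category name is stored directly on each product row).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Denormalized dimensions: the category text is repeated on every product row.
CREATE TABLE dim_date    (date_key    INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store   (store_key   INTEGER PRIMARY KEY, region TEXT);

-- Central fact table: one foreign key per dimension plus numeric measures.
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    store_key   INTEGER REFERENCES dim_store(store_key),
    quantity    INTEGER,
    amount      REAL
);

INSERT INTO dim_date    VALUES (20130423, '2013-04');
INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Ink', 'Stationery'), (3, 'Mug', 'Kitchen');
INSERT INTO dim_store   VALUES (10, 'North'), (11, 'South');
INSERT INTO fact_sales  VALUES
    (20130423, 1, 10, 2, 20.0), (20130423, 2, 10, 1, 5.0), (20130423, 3, 11, 1, 8.0);
""")

# Star join: every dimension is exactly one join away from the fact table.
result = conn.execute("""
    SELECT d.month, p.category, s.region, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_date d    ON d.date_key = f.date_key
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_store s   ON s.store_key = f.store_key
    GROUP BY d.month, p.category, s.region
    ORDER BY p.category
""").fetchall()
print(result)
# [('2013-04', 'Kitchen', 'South', 8.0), ('2013-04', 'Stationery', 'North', 25.0)]
```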
Example of Star Schema
Snowflake Schema

• A single fact table surrounded by normalized dimension tables
• Used when dimensions become very large
• Less intuitive, with slower performance due to the extra joins
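For comparison, here is product data snowflaked in SQLite (all names are hypothetical): the category moves into its own table, so a query by category needs one more join than it would in a star schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized dimension: category attributes move to their own table.
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, name TEXT,
                           category_key INTEGER REFERENCES dim_category(category_key));
CREATE TABLE fact_sales   (product_key  INTEGER REFERENCES dim_product(product_key),
                           amount REAL);

INSERT INTO dim_category VALUES (100, 'Stationery'), (200, 'Kitchen');
INSERT INTO dim_product  VALUES (1, 'Pen', 100), (2, 'Ink', 100), (3, 'Mug', 200);
INSERT INTO fact_sales   VALUES (1, 20.0), (2, 5.0), (3, 8.0);
""")

# Reaching the category now takes two joins (fact -> product -> category).
result = conn.execute("""
    SELECT c.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p  ON p.product_key = f.product_key
    JOIN dim_category c ON c.category_key = p.category_key
    GROUP BY c.category
    ORDER BY c.category
""").fetchall()
print(result)  # [('Kitchen', 8.0), ('Stationery', 25.0)]
```

The category name is now stored once per category instead of once per product, which saves space in large dimensions at the cost of the extra join.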


Example of Snowflake Schema
