Schemas

Uploaded by dideep1624

Warehouse Models & Operators

∙ Data Models
− relations
− stars & snowflakes
− cubes
∙ Operators
− slice & dice
− roll-up, drill down
− pivoting
− other
CSE601 1
Multi-Dimensional Data
∙ Measures: numerical (typically additive) data tracked by the business, which can be analyzed and examined
∙ Dimensions: business parameters that define a transaction; relatively static data, such as lookup or reference tables
∙ Example: an analyst may want to view sales data (measure) by geography, by time, and by product (dimensions)

The Multi-Dimensional Model
“Sales by product line over the past six months”
“Sales by store between 1990 and 1995”
[Figure: a fact table for measures whose key columns (Prod Code, Time Code, Store Code) join it to the dimension tables (Product Info, Time Info, Store Info, ...), plus the numerical measures Sales and Qty]

Multidimensional Modeling

∙ Multidimensional modeling is a technique for structuring data around business concepts
∙ ER models describe “entities” and “relationships”
∙ Multidimensional models describe “measures” and “dimensions”

Dimensional Modeling
∙ Dimensions are organized into hierarchies
− E.g., Time dimension: days → weeks → quarters
− E.g., Product dimension: product → product line →
brand
∙ Dimensions have attributes
− Time: Date, Month, Year
− Store: StoreID, City, State, Country, Region
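The Time hierarchy can be sketched in Python as a function that maps one day to its ancestors. This is a minimal sketch under stated assumptions. The level names and string formats are illustrative. The sketch uses Date → Month → Quarter → Year, the levels listed for the Time dimension on the aggregate-table slides, rather than weeks, because a calendar week can straddle two months or two quarters and so does not nest cleanly.

```python
from datetime import date


def time_hierarchy(d: date) -> dict:
    """Return the ancestors of a day in a Date -> Month -> Quarter -> Year hierarchy."""
    quarter = (d.month - 1) // 3 + 1  # months 1-3 -> Q1, 4-6 -> Q2, ...
    return {
        "date": d.isoformat(),
        "month": f"{d.year}-{d.month:02d}",
        "quarter": f"{d.year}-Q{quarter}",
        "year": str(d.year),
    }


print(time_hierarchy(date(1990, 1, 15)))
# {'date': '1990-01-15', 'month': '1990-01', 'quarter': '1990-Q1', 'year': '1990'}
```

Rolling up along this hierarchy means grouping facts by one of these ancestor values instead of by the raw date.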

Dimension Hierarchies
∙ Store dimension: Stores → District → Region → Total
∙ Product dimension: Products → Brand → Manufacturer → Total

Schema Design
∙ Most data warehouses use a star schema to represent the multi-dimensional model.
∙ Each dimension is represented by a dimension table that describes it.
∙ A fact table connects to all dimension tables with a multiple join. Each tuple in the fact table consists of a pointer (foreign key) to each of the dimension tables that provide its multi-dimensional coordinates, and stores the measures for those coordinates.
∙ The links between the fact table in the center and the dimension tables at the extremities form a shape like a star.
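A minimal star schema can be sketched with Python's built-in sqlite3 module. The table and column names (sales_fact, product_dim, store_dim, time_dim) and all sample rows are hypothetical and serve only as illustration. What matters is the shape. The single fact table holds one foreign key per dimension plus the measures, and a query joins the fact table directly to each dimension it needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT, brand TEXT, category TEXT);
CREATE TABLE store_dim   (store_key   INTEGER PRIMARY KEY, city TEXT, region TEXT);
CREATE TABLE time_dim    (time_key    INTEGER PRIMARY KEY, day TEXT, month TEXT, year INTEGER);
-- Fact table: one foreign key per dimension, plus the numerical measures.
CREATE TABLE sales_fact (
    product_key  INTEGER REFERENCES product_dim(product_key),
    store_key    INTEGER REFERENCES store_dim(store_key),
    time_key     INTEGER REFERENCES time_dim(time_key),
    unit_sales   INTEGER,
    sale_dollars REAL,
    PRIMARY KEY (product_key, store_key, time_key)
);
INSERT INTO product_dim VALUES (1, 'Milk 1L', 'DairyCo', 'Dairy'), (2, 'Bread', 'BakeCo', 'Bakery');
INSERT INTO store_dim   VALUES (1, 'Los Angeles', 'West'), (2, 'New York', 'East');
INSERT INTO time_dim    VALUES (1, '1995-01-02', '1995-01', 1995), (2, '1995-01-03', '1995-01', 1995);
INSERT INTO sales_fact  VALUES (1, 1, 1, 34, 68.0), (2, 1, 1, 56, 112.0),
                               (2, 2, 1, 20, 40.0), (2, 1, 2, 10, 20.0);
""")

# Star join: the fact table is joined directly to each dimension table it needs.
rows = conn.execute("""
    SELECT s.region, p.category, SUM(f.unit_sales)
    FROM sales_fact f
    JOIN store_dim s   ON f.store_key = s.store_key
    JOIN product_dim p ON f.product_key = p.product_key
    JOIN time_dim t    ON f.time_key = t.time_key
    WHERE t.year = 1995
    GROUP BY s.region, p.category
    ORDER BY s.region, p.category
""").fetchall()
print(rows)  # [('East', 'Bakery', 20), ('West', 'Bakery', 66), ('West', 'Dairy', 34)]
```

Each dimension is exactly one join away from the fact table, which is what keeps star queries simple.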

Star Schema (in RDBMS)

Star Schema Example

Star Schema with Sample Data

The “Classic” Star Schema
⬥ A relational model with a one-to-many relationship between each dimension table and the fact table
⬥ A single fact table, with detail and summary data
⬥ The fact table primary key has only one key column per dimension
⬥ Each dimension is a single, highly denormalized table
∙ Benefits: easy to understand, intuitive mapping to the business entities, easy to define hierarchies, fewer physical joins, low maintenance, very simple metadata
∙ Drawbacks: summary data in the fact table yields poorer performance for summary levels; huge dimension tables can be a problem

Need for Aggregates

∙ Sizes of typical tables:
− Time dimension: 5 years × 365 days = 1,825
− Store dimension: 300 stores reporting daily sales
− Product dimension: 40,000 products in each store (about 4,000 sell in each store daily)
− Maximum number of base fact table records: about 2 billion, since 1,825 × 300 × 4,000 ≈ 2.2 billion (lowest level of detail)
∙ A query involving 1 brand, all stores, and 1 year must retrieve and summarize over 7 million fact table rows.
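The row counts above follow from multiplying the dimension sizes. The short Python check below reproduces the 2 billion figure. It also derives a per-brand product count from the 7 million rows. That count is an inference from the slide's numbers, not something the slide states.

```python
days = 5 * 365                 # Time dimension: 5 years of days = 1,825
stores = 300                   # stores reporting daily sales
products_sold_per_day = 4_000  # of 40,000 products, about 4,000 sell in a store each day

# One base fact row per (day, store, product sold that day).
max_fact_rows = days * stores * products_sold_per_day
print(f"{max_fact_rows:,}")  # 2,190,000,000 (roughly 2 billion)

# Inference, not on the slide: 7 million rows for 1 brand, all stores, 1 year
# implies about 64 of that brand's products selling per store per day.
brand_products_per_store_day = 7_000_000 / (365 * stores)
print(round(brand_products_per_store_day, 1))  # 63.9
```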

Aggregating Fact Tables
∙ Aggregate fact tables are summaries of the most granular data at higher levels along the dimension hierarchies.
[Figure: the base fact table (Product key, Time key, Store key, Unit sales, Sale dollars) with hierarchy levels Product → Category → Department, Store → Territory → Region, and Date → Month → Quarter → Year; a multi-way aggregate at Territory – Category – Month holds the data values at these higher levels]

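A multi-way aggregate such as Territory – Category – Month can be built with a GROUP BY over the base fact table joined to its dimensions, with the result stored as a new table. The sketch below uses sqlite3. All table names, column names, and rows are hypothetical. In this data, four base rows collapse into three aggregate rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, category TEXT, department TEXT);
CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, territory TEXT, region TEXT);
CREATE TABLE time_dim    (time_key INTEGER PRIMARY KEY, day TEXT, month TEXT, quarter TEXT, year INTEGER);
CREATE TABLE sales_fact  (product_key INTEGER, store_key INTEGER, time_key INTEGER,
                          unit_sales INTEGER, sale_dollars REAL);
INSERT INTO product_dim VALUES (1, 'Dairy', 'Food'), (2, 'Dairy', 'Food'), (3, 'Soap', 'Household');
INSERT INTO store_dim   VALUES (1, 'T1', 'West'), (2, 'T1', 'West'), (3, 'T2', 'East');
INSERT INTO time_dim    VALUES (1, '1995-01-02', '1995-01', '1995-Q1', 1995),
                               (2, '1995-02-01', '1995-02', '1995-Q1', 1995);
INSERT INTO sales_fact  VALUES (1, 1, 1, 10, 20.0), (2, 2, 1, 5, 15.0),
                               (3, 3, 1, 7, 14.0), (1, 1, 2, 3, 6.0);

-- Multi-way aggregate: base facts summarized to Territory x Category x Month.
CREATE TABLE agg_territory_category_month AS
SELECT s.territory AS territory, p.category AS category, t.month AS month,
       SUM(f.unit_sales) AS unit_sales, SUM(f.sale_dollars) AS sale_dollars
FROM sales_fact f
JOIN product_dim p ON f.product_key = p.product_key
JOIN store_dim s   ON f.store_key = s.store_key
JOIN time_dim t    ON f.time_key = t.time_key
GROUP BY s.territory, p.category, t.month;
""")

agg = conn.execute(
    "SELECT * FROM agg_territory_category_month ORDER BY territory, category, month"
).fetchall()
print(agg)
# [('T1', 'Dairy', '1995-01', 15, 35.0), ('T1', 'Dairy', '1995-02', 3, 6.0),
#  ('T2', 'Soap', '1995-01', 7, 14.0)]
```

Queries at or above the Territory – Category – Month level can read the smaller aggregate table instead of scanning the base facts.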
The “Fact Constellation” Schema

∙ District Fact Table: District_ID, PRODUCT_KEY, PERIOD_KEY, Dollars, Units, Price
∙ Region Fact Table: Region_ID, PRODUCT_KEY, PERIOD_KEY, Dollars, Units, Price

Aggregate Fact Tables

∙ Base table: Sales facts (Product key, Time key, Store key, Unit sales, Sale dollars), with the dimensions Product (Product key, Product, Category, Department), Store (Store key, Store name, Territory, Region) and Time (Time key, Date, Month, Quarter, Year)
∙ One-way aggregate: Sale facts (Category key, Time key, Store key, Unit sales, Sales dollars), with a Category dimension derived from Product (Category key, Category, Department)

Families of Stars

[Figure: three fact tables, each surrounded by its dimension tables]

Snowflake Schema

∙ A snowflake schema is a type of star schema, but a more complex model.
∙ “Snowflaking” is a method of normalizing the dimension tables in a star schema.
∙ The normalization eliminates redundancy.
∙ The result is more complex queries and reduced query performance.

Sales: Snowflake Schema
[Figure: a Sales fact table (Product key, Time key, Salesrep key, …) whose dimensions are snowflaked into Product → Brand → Product category and Salesperson → Territory → Region, each level linked by its own key]

Snowflaking

∙ The attributes with low cardinality in each original dimension table are removed to form separate tables. These new tables are linked back to the original dimension table through artificial keys.
− Product: Product key, Product name, Product code, Brand key
− Brand: Brand key, Brand name, Category key
− Product category: Category key, Product category
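The snowflaked Product dimension can be sketched in sqlite3 as three tables (Product → Brand → Product category) linked through integer surrogate keys. The key values, names, and the sales_fact table are hypothetical. Note that a sales-by-category query needs two more joins here than it would in a star schema, where the category would be a column of the product table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked product dimension: low-cardinality attributes moved into their
-- own tables and linked back through artificial (surrogate) keys.
CREATE TABLE category (category_key INTEGER PRIMARY KEY, product_category TEXT);
CREATE TABLE brand    (brand_key INTEGER PRIMARY KEY, brand_name TEXT,
                       category_key INTEGER REFERENCES category(category_key));
CREATE TABLE product  (product_key INTEGER PRIMARY KEY, product_name TEXT, product_code TEXT,
                       brand_key INTEGER REFERENCES brand(brand_key));
CREATE TABLE sales_fact (product_key INTEGER REFERENCES product(product_key), unit_sales INTEGER);

INSERT INTO category VALUES (1, 'Beverages'), (2, 'Dairy');
INSERT INTO brand    VALUES (10, 'Coke', 1), (20, 'DairyCo', 2);
INSERT INTO product  VALUES (100, 'Coke 330ml', 'C330', 10), (101, 'Coke 1L', 'C1L', 10),
                            (200, 'Milk 1L', 'M1L', 20);
INSERT INTO sales_fact VALUES (100, 56), (101, 4), (200, 34);
""")

# Sales by category: walk product -> brand -> category with two extra joins.
rows = conn.execute("""
    SELECT c.product_category, SUM(f.unit_sales)
    FROM sales_fact f
    JOIN product  p ON f.product_key  = p.product_key
    JOIN brand    b ON p.brand_key    = b.brand_key
    JOIN category c ON b.category_key = c.category_key
    GROUP BY c.product_category
    ORDER BY c.product_category
""").fetchall()
print(rows)  # [('Beverages', 60), ('Dairy', 34)]
```

The sketch shows both sides of the trade-off. Renaming a category touches a single row in `category`, but every category-level query pays for the extra joins.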

Snowflake Schema

∙ Advantages:
− Small saving in storage space
− Normalized structures are easier to update and maintain
∙ Disadvantages:
− The schema is less intuitive, and end users are put off by the complexity
− Browsing through the contents is difficult
− Query performance degrades because of the additional joins

What is the Best Design?

∙ Performance benchmarking can be used to determine the best design.
∙ Snowflake schema: easier to maintain dimension tables when they are very large (reduces overall space). It is not generally recommended in a data warehouse environment.
∙ Star schema: more effective for data cube browsing (fewer joins), which can improve query performance.

Aggregates
∙ Add up amounts for day 1
∙ In SQL: SELECT sum(amt) FROM SALE
WHERE date = 1
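The day-1 query can be run end to end with Python's sqlite3 module. The SALE rows below are hypothetical. The column names follow the slide's queries (date, amt, prodId), plus an assumed storeId column.

```python
import sqlite3

# Hypothetical SALE(prodId, storeId, date, amt) rows; days are numbered 1, 2, ...
SALES = [
    ("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8),  ("p1", "s1", 2, 44), ("p1", "s2", 2, 4),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
conn.executemany("INSERT INTO sale VALUES (?, ?, ?, ?)", SALES)

# Same query as on the slide: add up the amounts for day 1.
(day1_total,) = conn.execute("SELECT SUM(amt) FROM sale WHERE date = 1").fetchone()
print(day1_total)  # 81
```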

Aggregates
∙ Add up amounts by day
∙ In SQL: SELECT date, sum(amt) FROM SALE
GROUP BY date

Another Example
∙ Add up amounts by day, product
∙ In SQL: SELECT date, prodId, sum(amt) FROM SALE
GROUP BY date, prodId

[Figure: rollup goes from the by-day-and-product totals to the by-day totals; drill-down goes the other way]

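The two GROUP BY queries, and the roll-up between them, can be checked on a hypothetical SALE table. Summing the finer (date, prodId) groups over prodId gives back the by-day totals, which is what rolling up means here. Drilling down goes the other way and needs the finer data, or the base table, to be available.

```python
import sqlite3

# Hypothetical SALE(prodId, storeId, date, amt) rows.
SALES = [
    ("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8),  ("p1", "s1", 2, 44), ("p1", "s2", 2, 4),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
conn.executemany("INSERT INTO sale VALUES (?, ?, ?, ?)", SALES)

by_day = conn.execute(
    "SELECT date, SUM(amt) FROM sale GROUP BY date ORDER BY date"
).fetchall()
by_day_prod = conn.execute(
    "SELECT date, prodId, SUM(amt) FROM sale GROUP BY date, prodId ORDER BY date, prodId"
).fetchall()
print(by_day)       # [(1, 81), (2, 48)]
print(by_day_prod)  # [(1, 'p1', 62), (1, 'p2', 19), (2, 'p1', 48)]

# Roll-up: collapse prodId by summing the finer groups for each day.
rolled = {}
for day, _prod, amt in by_day_prod:
    rolled[day] = rolled.get(day, 0) + amt
print(sorted(rolled.items()) == by_day)  # True
```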
Aggregates
∙ Operators: sum, count, max, min, median, avg
∙ HAVING clause
∙ Using dimension hierarchy
− average by region (a level of the store hierarchy)
− maximum by month (a level of the date hierarchy)

Data Cube
[Figure: the same data shown as a fact table view and as a multi-dimensional cube with dimensions = 2]

3-D Cube

[Figure: a fact table view and the corresponding multi-dimensional cube with dimensions = 3, drawn as one layer per day (day 1, day 2)]

Example
∙ Dimensions: Time, Product, Store
∙ Attributes: Product (upc, price, …), Store …
∙ Hierarchies:
− Product → Brand → …
− Day → Week → Quarter
− Store → Region → Country
[Figure: a cube with axes Product (Juice, Milk, Coke, Cream, Soap, Bread), Time (M T W Th F S S) and Store (NY, SF, LA); the cells shown read Juice 10, Milk 34, Coke 56, Cream 32, Soap 12, Bread 56. Arrows mark roll-up to region, roll-up to brand and roll-up to week.]
56 units of bread sold in LA on M

Cube Aggregation: Roll-up
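∙ Example: computing sums
[Figure: the day 1 and day 2 layers of the cube summed into aggregate cells, reaching a total of 129; rollup moves toward the totals, drill-down back toward the detail]
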
Cube Operators for Roll-up

[Figure: the day 1 and day 2 layers extended with aggregate cells such as sale(s1,*,*), sale(s2,p2,*) and sale(*,*,*), with 129 as the grand total]

Extended Cube

[Figure: the extended cube for day 1 and day 2, including aggregate cells such as sale(*,p2,*)]
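The sale(store, product, day) notation, where * means "all values" on that dimension, can be sketched as a Python dictionary keyed by (store, product, day) tuples in which any coordinate may be "*". Each base fact contributes to 2³ = 8 cells, so this builds the full extended cube. The fact rows are hypothetical, and materializing every cell this way is only sensible for small examples.

```python
from collections import defaultdict
from itertools import product

# Hypothetical fact rows: (store, product, day, amount).
FACTS = [
    ("s1", "p1", 1, 12), ("s1", "p2", 1, 11), ("s3", "p1", 1, 50),
    ("s2", "p2", 1, 8),  ("s1", "p1", 2, 44), ("s2", "p1", 2, 4),
]


def build_cube(facts):
    """Return {(store, product, day): total} for every cell, where '*' means 'all values'."""
    cube = defaultdict(int)
    for store, prod, day, amt in facts:
        coords = (store, prod, day)
        # Each fact feeds 2**3 = 8 cells: every coordinate is either kept or replaced by '*'.
        for mask in product((False, True), repeat=3):
            cell = tuple("*" if star else value for star, value in zip(mask, coords))
            cube[cell] += amt
    return dict(cube)


cube = build_cube(FACTS)
print(cube[("s1", "*", "*")])   # sale(s1,*,*)  -> 67
print(cube[("s2", "p2", "*")])  # sale(s2,p2,*) -> 8
print(cube[("*", "p2", "*")])   # sale(*,p2,*)  -> 19
print(cube[("*", "*", "*")])    # sale(*,*,*)   -> 129
```

Several SQL databases, PostgreSQL among them, expose the same idea as `GROUP BY CUBE (...)`, which computes all of these group-bys in one query.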

Aggregation Using Hierarchies
[Figure: the store axis of the day 1 / day 2 cube rolled up along store → region → country]
(store s1 in Region A; stores s2, s3 in Region B)
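Rolling the store axis up its hierarchy only needs a mapping from each store to its parent level. The sketch below uses the slide's assignment (s1 in Region A; s2, s3 in Region B). The single country name and the fact rows are hypothetical. For brevity it also sums over products, so results are indexed by (region, day) or (country, day).

```python
from collections import defaultdict

# Hypothetical fact rows: (store, product, day, amount).
FACTS = [
    ("s1", "p1", 1, 12), ("s1", "p2", 1, 11), ("s3", "p1", 1, 50),
    ("s2", "p2", 1, 8),  ("s1", "p1", 2, 44), ("s2", "p1", 2, 4),
]

STORE_TO_REGION = {"s1": "Region A", "s2": "Region B", "s3": "Region B"}
REGION_TO_COUNTRY = {"Region A": "Country X", "Region B": "Country X"}  # hypothetical


def roll_up(facts, parent_of_store):
    """Sum amounts per (parent level of the store, day), aggregating over products."""
    totals = defaultdict(int)
    for store, _product, day, amt in facts:
        totals[(parent_of_store(store), day)] += amt
    return dict(totals)


by_region = roll_up(FACTS, STORE_TO_REGION.get)
by_country = roll_up(FACTS, lambda s: REGION_TO_COUNTRY[STORE_TO_REGION[s]])
print(by_region)   # {('Region A', 1): 23, ('Region B', 1): 58, ('Region A', 2): 44, ('Region B', 2): 4}
print(by_country)  # {('Country X', 1): 81, ('Country X', 2): 48}
```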

Slicing
[Figure: slicing the day 1 / day 2 cube at TIME = day 1]
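Slicing fixes one dimension to a single value and returns a lower-dimensional sub-cube. Dicing restricts several dimensions to subsets of values and keeps the cube's dimensionality. The minimal Python sketch below works on hypothetical (store, product, day, amount) facts.

```python
from collections import defaultdict

# Hypothetical fact rows: (store, product, day, amount).
FACTS = [
    ("s1", "p1", 1, 12), ("s1", "p2", 1, 11), ("s3", "p1", 1, 50),
    ("s2", "p2", 1, 8),  ("s1", "p1", 2, 44), ("s2", "p1", 2, 4),
]


def slice_time(facts, day):
    """Slice: fix TIME = day, leaving a 2-D (store, product) sub-cube."""
    cells = defaultdict(int)
    for store, product, d, amt in facts:
        if d == day:
            cells[(store, product)] += amt
    return dict(cells)


def dice(facts, stores, days):
    """Dice: keep only facts whose store and day fall in the given sets (still 3-D)."""
    return [f for f in facts if f[0] in stores and f[2] in days]


print(slice_time(FACTS, 1))
# {('s1', 'p1'): 12, ('s1', 'p2'): 11, ('s3', 'p1'): 50, ('s2', 'p2'): 8}
print(dice(FACTS, {"s1", "s2"}, {2}))
# [('s1', 'p1', 2, 44), ('s2', 'p1', 2, 4)]
```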

Slicing & Pivoting
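Pivoting changes only the presentation: the same 2-D slice is laid out with a different dimension on the rows. The sketch below takes a hypothetical TIME = day 1 slice keyed by (store, product) and prints it both ways. Cells with no sales are shown as 0, which is a presentation choice rather than a rule.

```python
# Hypothetical TIME = day 1 slice: {(store, product): amount}.
DAY1 = {("s1", "p1"): 12, ("s1", "p2"): 11, ("s3", "p1"): 50, ("s2", "p2"): 8}


def pivot(cells, rows_are_stores=True):
    """Lay a (store, product) slice out as a grid; missing cells become 0."""
    stores = sorted({s for s, _ in cells})
    products = sorted({p for _, p in cells})
    if rows_are_stores:
        return stores, products, [[cells.get((s, p), 0) for p in products] for s in stores]
    return products, stores, [[cells.get((s, p), 0) for s in stores] for p in products]


for rows_are_stores in (True, False):
    row_labels, col_labels, grid = pivot(DAY1, rows_are_stores)
    print("", *col_labels, sep="\t")
    for label, values in zip(row_labels, grid):
        print(label, *values, sep="\t")
    print()
```

The second grid is the transpose of the first; no values are added or lost, only the axes swap.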

Summary of Operations
∙ Aggregation (roll-up)
− aggregate (summarize) data to the next higher level of a dimension hierarchy
− e.g., total sales by city, year → total sales by region, year
∙ Navigation to detailed data (drill-down)
∙ Selection (slice) defines a subcube
− e.g., sales where city = ‘Gainesville’ and date = ‘1/15/90’
∙ Calculation and ranking
− e.g., top 3% of cities by average income
∙ Visualization operations (e.g., pivot)
∙ Time functions
− e.g., time average

Query & Analysis Tools
∙ Query Building
∙ Report Writers (comparisons, growth, graphs,…)
∙ Spreadsheet Systems
∙ Web Interfaces
∙ Data Mining
