Data Warehouse Data Modelling - Vincent Rainardi
SQLbits IV
Manchester
28th March 2009
Vincent Rainardi
About you
•Data warehousing
•Data modelling
•Dimensional modelling
Data Warehouse Data Modelling
•What is it
•Why is it important
•How to do it (case study)
•Miscellaneous topics (time permitting)
•Questions
Data Warehouse
Dimensional:
•Particular business events
•Query oriented
•Large data packets
•Multiple versions
•Analytics
Normalised:
•All business events
•Efficient to update
•Small data packets
•Single version
•Operational
Why is it important
[Diagram: a central table surrounded by four dimensions]
[Diagram: Subscription Event, with Date and Customer]
[Diagram: Media, Subscription and Agent tables. Columns shown across the three tables: Media Code, Media Name, Format, Unit, Fee, Discount, Paid, Agent Name, Category, Fee Type, Active Subscribers, ...]
[Diagram: the same three tables, with Agent ID added]
Role-playing dimension
Degenerate Dimension
Low cardinality
Fact Key
Next
• Slowly Changing Dimension
• Snowflake
Slowly Changing Dimension
Type 1: Overwrite old values
Before:
Key  Name  Email
1    Andy  [email protected]
After:
Key  Name  Email
1    Andy  [email protected]
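A minimal sketch of a Type 1 change, assuming the dimension is held as a Python dict keyed by surrogate key. The email addresses are hypothetical placeholders (the values on the slide are not recoverable), and `apply_type1` is an illustrative name, not something from the talk.

```python
# Hypothetical customer dimension: surrogate key -> attribute row.
dim_customer = {1: {"name": "Andy", "email": "old@example.com"}}


def apply_type1(dim, key, **changes):
    """Overwrite attributes in place; the old values are not kept."""
    dim[key].update(changes)


apply_type1(dim_customer, 1, email="new@example.com")
# The row keeps surrogate key 1; only the attribute value is replaced.
```

Because the row is updated rather than versioned, facts that already point at key 1 now report the new email for all history.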
• Valid From & Valid To (a.k.a. Effective Date & Expiry Date)
  • Used to put the right surrogate key in the fact table
  • Datetime (not date)
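One way to read the validity columns is as a lookup range: a fact row takes the surrogate key of the dimension version whose range covers the event time. The sketch below assumes an inclusive Valid From, an exclusive Valid To, and a far-future Valid To for the current version; the rows, column names and `lookup_sk` are hypothetical.

```python
from datetime import datetime

# Hypothetical dimension rows: two versions of the same customer.
# Assumption: valid_from is inclusive, valid_to is exclusive.
dim_customer = [
    {"sk": 1, "customer_id": "C1",
     "valid_from": datetime(2009, 1, 1),
     "valid_to": datetime(2009, 3, 28, 14, 30)},
    {"sk": 2, "customer_id": "C1",
     "valid_from": datetime(2009, 3, 28, 14, 30),
     "valid_to": datetime(9999, 12, 31)},
]


def lookup_sk(rows, customer_id, event_time):
    """Return the surrogate key of the version valid at event_time, or None."""
    for row in rows:
        if (row["customer_id"] == customer_id
                and row["valid_from"] <= event_time < row["valid_to"]):
            return row["sk"]
    return None


morning = lookup_sk(dim_customer, "C1", datetime(2009, 3, 28, 9, 0))   # 1
evening = lookup_sk(dim_customer, "C1", datetime(2009, 3, 28, 18, 0))  # 2
```

The two events fall on the same day but map to different versions, which a date-only range could not distinguish.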
[Diagram: a fact table surrounded by main dimensions, each linked to further dimension tables]
Snowflake
•What is it
•Why is it important
•How to do it
•Miscellaneous topics
  •Smart Date Key
  •Dimension Grain
  •Real Time Fact Table
•Questions
Smart Date Key
8 digit integer YYYYMMDD
Unknown date?
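A small sketch of the YYYYMMDD key as integer arithmetic. The slide leaves the unknown-date question open, so the sentinel `UNKNOWN_DATE_KEY = 0` is only one hypothetical choice; the function names are illustrative as well.

```python
from datetime import date

# Hypothetical sentinel: the slide asks the question but does not answer it.
UNKNOWN_DATE_KEY = 0


def to_date_key(d):
    """Encode a date as an 8 digit integer YYYYMMDD."""
    if d is None:
        return UNKNOWN_DATE_KEY
    return d.year * 10000 + d.month * 100 + d.day


def from_date_key(key):
    """Decode a YYYYMMDD key back to a date; the sentinel maps to None."""
    if key == UNKNOWN_DATE_KEY:
        return None
    return date(key // 10000, key // 100 % 100, key % 100)


key = to_date_key(date(2009, 3, 28))  # 20090328
```

A key built this way sorts in date order and can be read without joining to the date dimension, but it needs an agreed value for rows whose date is missing.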
Dimension Grain
• Dim Product Line: 2 attributes, product_key
• Dim Product: 10 attributes, product_grp_key
• Dim Product Group: 5 attributes
Combine into 1 dimension?
Snowflake: 3 tables, linked FK-PK
Attributes: PL 2, P 10, PG 5
Fact 1 -> PL -> P -> PG
Fact 2 -> P -> PG
Fact 3 -> PG
Star:
Fact 1 -> PL (17 attributes)
Fact 2 -> P (15 attributes)
Fact 3 -> PG (5 attributes)
3 tables:
• Different surrogate keys
• More flexible (attributes)
1 table with 3 views:
• Same surrogate keys
• Simpler load
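The star-side attribute counts follow from the snowflake ones: each flattened dimension carries its own attributes plus those of every coarser level. A quick check of that arithmetic, with `star_attribute_counts` as an illustrative helper name:

```python
# Attribute counts from the slide, ordered from finest to coarsest grain.
snowflake = [("Product Line", 2), ("Product", 10), ("Product Group", 5)]


def star_attribute_counts(levels):
    """For each level, sum its own attributes and those of every coarser level."""
    return {
        name: sum(count for _, count in levels[i:])
        for i, (name, _) in enumerate(levels)
    }


star = star_attribute_counts(snowflake)
# star == {"Product Line": 17, "Product": 15, "Product Group": 5}
```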
Real Time Fact Table
Updated every time a transaction happens in the source system
• Today’s transactions only
• Stored in surrogate keys
• Limited dim updates -> unknown SK
• Heap
• Union with main fact table on query
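The points above can be sketched with an in-memory SQLite database. Everything named here is an assumption for illustration: the `fact_sales` tables, the `UNKNOWN_SK = 0` convention, and the lookup dict. The slide's "Heap" is only approximated by a table with no indexes or constraints, and `UNION ALL` is used so that identical rows are not collapsed.

```python
import sqlite3

UNKNOWN_SK = 0  # hypothetical key for dimension rows that are not loaded yet

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE fact_sales (date_key INTEGER, customer_sk INTEGER, amount REAL);
    -- Real-time table: no indexes or constraints, to keep inserts cheap.
    CREATE TABLE fact_sales_rt (date_key INTEGER, customer_sk INTEGER, amount REAL);
    -- Queries read both tables through one view.
    CREATE VIEW fact_sales_all AS
        SELECT * FROM fact_sales UNION ALL SELECT * FROM fact_sales_rt;
""")

# Hypothetical surrogate key lookup; customer C9 has not reached the dimension.
customer_sk_by_id = {"C1": 1}


def insert_realtime(customer_id, date_key, amount):
    """Store today's transaction by surrogate key, falling back to UNKNOWN_SK."""
    sk = customer_sk_by_id.get(customer_id, UNKNOWN_SK)
    con.execute("INSERT INTO fact_sales_rt VALUES (?, ?, ?)", (date_key, sk, amount))


con.execute("INSERT INTO fact_sales VALUES (20090327, 1, 10.0)")  # earlier batch load
insert_realtime("C1", 20090328, 5.0)
insert_realtime("C9", 20090328, 7.0)  # unknown customer -> UNKNOWN_SK

total = con.execute("SELECT SUM(amount) FROM fact_sales_all").fetchone()[0]
rt_keys = con.execute(
    "SELECT customer_sk FROM fact_sales_rt ORDER BY amount").fetchall()
```

In a design like this, the real-time table would presumably be emptied once the day's rows are loaded into the main fact table with fully resolved keys.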