
L03B-Dimensional Modeling II

This document discusses dimensional modeling best practices for data warehouses. It covers rules for designing fact tables and dimension tables, as well as different types of dimensions like conformed dimensions, date/time dimensions, and slowly changing dimensions. The objectives are to explain how to model dimensions and facts to ensure data integrity, optimize query performance, and allow for flexible analysis of data over time. Dimensional modeling is crucial for building an effective data warehouse schema.

Uploaded by Frans Sitohang
Copyright © All Rights Reserved

Dimensional Modeling II Data Warehouse and Business Intelligence

Dimensional Modeling II
Samuel I. G. Situmeang

Modified slides provided by: Michael A. Fudge, Jr.


Data Warehousing, Syracuse University, 2017
Dimensional Modeling II Data Warehouse and Business Intelligence

Lecture Objectives
• Rules of Fact Table Design
• Rules of Dimension Table Design
• Dimension Cases in Detail
• Conformed Dimensions
• Date and Time Dimensions
• Degenerate Dimensions
• Slowly Changing Dimensions
• Role-Playing Dimensions
• Junk Dimensions
• Snowflake & Outrigger Dimensions
• Fact Table Cases in Detail
• Facts of Different Granularity
• Multiple Currencies / Units of Measure
• Factless Fact Tables
• Consolidated Fact Tables
• Do's and Don'ts of DM

Rules of Fact Table Design

• The primary key of your fact table uses the minimum number of columns possible and no surrogate key.
(It should be made up of FKs and degenerate dimensions.)
• Referential integrity is a must. Every foreign key in the fact table must have a value.
• Avoid NULLs in foreign keys by using flags, which are special values used in place of NULL.
• Ex. "No Shopper Card" in the Customer Dimension
• The granularity of your fact table should be the lowest, most detailed atomic grain captured by the business process. (discussed last time)
• Each fact should be additive, or redesigned to be as additive as possible.
• All facts must be of the same granularity.

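A minimal sketch of these rules using Python's built-in sqlite3 module rather than a production warehouse. The table and column names (dim_customer, fact_sales, order_number) and the -1 key for the "No Shopper Card" row are hypothetical. The fact table's primary key is the product FK plus the order number degenerate dimension, SQLite is told to enforce the foreign keys, and a special dimension row replaces NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on

conn.executescript("""
CREATE TABLE dim_customer (
    customer_key  INTEGER PRIMARY KEY,   -- surrogate key
    customer_id   TEXT NOT NULL,         -- business key from the source system
    customer_name TEXT NOT NULL
);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL
);
CREATE TABLE fact_sales (
    order_number TEXT    NOT NULL,       -- degenerate dimension
    product_key  INTEGER NOT NULL REFERENCES dim_product (product_key),
    customer_key INTEGER NOT NULL REFERENCES dim_customer (customer_key),
    quantity     INTEGER NOT NULL,       -- additive fact
    sales_amount REAL    NOT NULL,       -- additive fact
    PRIMARY KEY (order_number, product_key)  -- FK + degenerate dimension, no surrogate
);
INSERT INTO dim_customer VALUES (-1, 'N/A', 'No Shopper Card');  -- used instead of NULL
INSERT INTO dim_customer VALUES (1, 'C100', 'Ada Lopez');
INSERT INTO dim_product  VALUES (1, 'Apple');
""")

def customer_key_for(card_id):
    """Look up the surrogate key; fall back to the 'No Shopper Card' row instead of NULL."""
    row = conn.execute(
        "SELECT customer_key FROM dim_customer WHERE customer_id = ?", (card_id,)
    ).fetchone()
    return row[0] if row else -1

def load_sale(order_number, product_key, card_id, quantity, amount):
    conn.execute(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
        (order_number, product_key, customer_key_for(card_id), quantity, amount),
    )

load_sale("SO-1", 1, "C100", 2, 4.00)
load_sale("SO-2", 1, None, 1, 2.00)  # shopper without a card
print(conn.execute(
    "SELECT order_number, customer_key FROM fact_sales ORDER BY order_number"
).fetchall())  # [('SO-1', 1), ('SO-2', -1)]
```

Loading a fact that points at a product key with no dimension row raises sqlite3.IntegrityError, which is the referential-integrity rule doing its job.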
What's Wrong w/This Fact Table of Basketball Player Game Stats?

Stat ID (PK)  Player ID  Game ID  Shot Attempts  Shots Made  Points  Pts Per Shot  Shooting Pct
1             Jordan     1        3              2           5       1.667         0.667
2             Jordan     2        7              6           12      1.714         0.583
3             Miller     1        2              0           0       0.000         0.000
4             Miller     2        5              3           9       1.800         0.600
5             Miller     1        2              0           0       0.000         0.000

Can you find the 3 things wrong with the implementation of this fact table?

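What's Wrong w/This Fact Table?

Stat ID (PK)  Player ID  Game ID  Shot Attempts  Shots Made  Points  Pts Per Shot  Shooting Pct
1             Jordan     1        3              2           5       1.667         0.667
2             Jordan     2        7              6           12      1.714         0.583
3             Miller     1        2              0           0       0.000         0.000
4             Miller     2        5              3           9       1.800         0.600
5             Miller     1        2              0           0       0.000         0.000

• Poor PK choice: Stat ID is a surrogate key. The PK should be built from the FKs (Player ID + Game ID), which would also have prevented rows 3 and 5 from duplicating the same player and game.
• Non-additive facts: Pts Per Shot and Shooting Pct cannot be summed across games.
• Poor choice of FK (or PK): Player ID holds player names rather than keys to a player dimension.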

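To see why the ratio columns cause trouble, here is a small Python sketch (function names are hypothetical) that keeps only the additive facts from the table above, with the duplicate row dropped, and derives the ratios after summing. Averaging a stored per-game percentage gives a different, wrong season figure.

```python
from collections import defaultdict

# Additive facts only, one row per (player, game) grain.
stats = [
    # (player_id, game_id, shot_attempts, shots_made, points)
    ("Jordan", 1, 3, 2, 5),
    ("Jordan", 2, 7, 6, 12),
    ("Miller", 1, 2, 0, 0),
    ("Miller", 2, 5, 3, 9),
]

def season_ratios(rows):
    """Sum the additive facts first, then derive the ratios at query time."""
    totals = defaultdict(lambda: [0, 0, 0])  # attempts, made, points
    for player, _game, attempts, made, points in rows:
        t = totals[player]
        t[0] += attempts
        t[1] += made
        t[2] += points
    return {
        player: {"shooting_pct": made / attempts, "pts_per_shot": points / attempts}
        for player, (attempts, made, points) in totals.items()
    }

def averaged_stored_pct(rows, player):
    """What you get by averaging a stored per-game ratio: not the season percentage."""
    pcts = [made / attempts for p, _g, attempts, made, _pts in rows if p == player]
    return sum(pcts) / len(pcts)

print(round(season_ratios(stats)["Miller"]["shooting_pct"], 3))  # 0.429 (3 of 7)
print(round(averaged_stored_pct(stats, "Miller"), 3))            # 0.3
```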
Rules of Dimension Table Design

• Verbose – attribute values should be as descriptive as possible.
• Descriptive columns – it should be easy to tell what each column means.
• Complete – no null / empty values in any of the attributes.
• Discretely valued – one business entity value per row.
• Quality assured – data is clean and consistent.
• Should always contain a business key, or the legacy PK from the source system.
• Always have a surrogate primary key, so you do not introduce a dependency on an external key.

What's Wrong w/This Dimension of Products?

Prod Id  Prod Name  Prod Cat  Prod Price  Prod Region Code
A        Apple      Fruit     $2.00       E
B        Carrot     Veg       $1.50       S
C        Cherries   Friut     $3.00       S
D        Lettuce    Veg       $1.50
E        Apple      Fruit     $2.00       E

Can you find the 6 things wrong with the implementation of this dimension?

What's Wrong w/This Dimension?

Prod Id  Prod Name  Prod Cat  Prod Price  Prod Reg Code
A        Apple      Fruit     $2.00       E
B        Carrot     Veg       $1.50       S
C        Cherries   Friut     $3.00       S
D        Lettuce    Veg       $1.50
E        Apple      Fruit     $2.00       E

• No surrogate key.
• Poor descriptions: abbreviated column names (Prod Cat, Prod Reg Code).
• Not verbose: what do S & E mean?
• Not discretely valued: Apple appears twice (A and E).
• Poor data quality: "Friut" is misspelled.
• Incomplete: Lettuce has no region code.

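Several of these problems can be caught before load with simple checks. This Python sketch is an illustration only: the column names, the list of valid categories, and treating the product name as the business entity are all assumptions. It does not decode S and E, since the slide does not say what they mean.

```python
# The slide's rows, with the region left blank for Lettuce.
raw_products = [
    {"prod_id": "A", "prod_name": "Apple", "prod_cat": "Fruit", "prod_price": 2.00, "prod_reg_code": "E"},
    {"prod_id": "B", "prod_name": "Carrot", "prod_cat": "Veg", "prod_price": 1.50, "prod_reg_code": "S"},
    {"prod_id": "C", "prod_name": "Cherries", "prod_cat": "Friut", "prod_price": 3.00, "prod_reg_code": "S"},
    {"prod_id": "D", "prod_name": "Lettuce", "prod_cat": "Veg", "prod_price": 1.50, "prod_reg_code": None},
    {"prod_id": "E", "prod_name": "Apple", "prod_cat": "Fruit", "prod_price": 2.00, "prod_reg_code": "E"},
]
VALID_CATEGORIES = {"Fruit", "Veg"}  # assumed reference list

def audit_dimension(rows):
    """Return (problem, prod_id, detail) tuples for incomplete, invalid and duplicate rows."""
    issues = []
    first_id_for_name = {}
    for row in rows:
        for column, value in row.items():
            if value is None or value == "":
                issues.append(("incomplete", row["prod_id"], column))
        if row["prod_cat"] not in VALID_CATEGORIES:
            issues.append(("poor data quality", row["prod_id"], row["prod_cat"]))
        name = row["prod_name"]
        if name in first_id_for_name:
            issues.append(("not discretely valued", row["prod_id"], f"duplicates {first_id_for_name[name]}"))
        else:
            first_id_for_name[name] = row["prod_id"]
    return issues

for issue in audit_dimension(raw_products):
    print(issue)
# ('poor data quality', 'C', 'Friut')
# ('incomplete', 'D', 'prod_reg_code')
# ('not discretely valued', 'E', 'duplicates A')
```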
The Dimension Table Key

• Surrogate keys (identities, sequences, e.g. 1, 2, 3, …) are used for the primary key constraint.
• They yield the best performance for the star schema:
• most efficient joins,
• smaller indexes in the fact table,
• more rows per block in the fact table.
• They have no dependency on the primary key in the operational source data.
• This makes it easier to deal with changes to the source data.
• The dimension table still requires a natural key or business key to identify a unique row.
• Ex: Customer's email address, Employee's ID number.

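One common way to hand out surrogate keys during ETL is a lookup from business key to an integer sequence owned by the warehouse. A minimal Python sketch; SurrogateKeyMap is a hypothetical name, and a real warehouse would persist the mapping (or use an identity column) rather than keep it in memory.

```python
import itertools

class SurrogateKeyMap:
    """Assigns warehouse-owned integer keys to source business keys."""

    def __init__(self, start=1):
        self._next = itertools.count(start)
        self._by_business_key = {}

    def key_for(self, business_key):
        # Reuse the existing surrogate key, or issue the next one in sequence.
        if business_key not in self._by_business_key:
            self._by_business_key[business_key] = next(self._next)
        return self._by_business_key[business_key]

customer_keys = SurrogateKeyMap()
print(customer_keys.key_for("ann@example.com"))  # 1
print(customer_keys.key_for("bo@example.com"))   # 2
print(customer_keys.key_for("ann@example.com"))  # 1, same business key, same surrogate
```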
Dimension Cases in Detail

Conformed Dimensions

• These are master or common reference dimensions.
• Shared across business processes (fact tables) in the DW.
• Reusable: they enable drill-across and lower the time needed to develop the next star schema.
• Contain a superset of the attributes required by all fact tables.
• Two types of conformed dimensions:
• Identical dimensions – exactly the same dimension (Ex. Dates)
• Perfect subset of an existing dimension (Ex. a Brand dimension derived from the Product dimension)

Ex. Conformed Dimensions (Logical View)

Product Dimension: Product key (PK), Product description, SKU number, Brand description, Class description, Department description
Sales Fact Table: Date key (FK), Product key (FK), … other FKeys…, Sales quantity, Sales amount

Brand Dimension (a subset of the Product Dimension): Brand key (PK), Brand description, Class description, Department description
Sales Forecast Fact Table: Month key (FK), Brand key (FK), … other FKeys…, Forecast quantity, Forecast amount

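A rough Python sketch of the Brand subset and a drill-across query, under assumed sample rows: the Brand dimension is derived as the distinct brand-level attributes of Product, and Sales is rolled up from (date, product) to (month, brand) so it can sit next to Sales Forecast. All names and values are hypothetical.

```python
from collections import defaultdict

# Hypothetical Product Dimension rows:
# (product_key, product_description, sku_number, brand, class, department)
products = [
    (1, "Cola 12oz can", "SKU-001", "FizzCo", "Soda", "Beverages"),
    (2, "Cola 2L bottle", "SKU-002", "FizzCo", "Soda", "Beverages"),
    (3, "Lemonade 1L", "SKU-003", "Sunny", "Juice", "Beverages"),
]

def derive_brand_dimension(product_rows):
    """Brand Dimension as a subset of Product: distinct brand-level attributes."""
    brand_rows = sorted({(brand, cls, dept) for _pk, _d, _s, brand, cls, dept in product_rows})
    # brand description -> (brand_key, brand, class, department)
    return {row[0]: (key, *row) for key, row in enumerate(brand_rows, start=1)}

def drill_across(sales, forecast, product_rows, brands):
    """Roll Sales up to (month, brand) so it lines up with Sales Forecast."""
    brand_key_of_product = {pk: brands[brand][0] for pk, _d, _s, brand, _c, _dp in product_rows}
    actual = defaultdict(float)
    for date_key, product_key, amount in sales:  # date_key is YYYYMMDD
        actual[(date_key // 100, brand_key_of_product[product_key])] += amount
    planned = {(month_key, brand_key): amount for month_key, brand_key, amount in forecast}
    keys = sorted(set(actual) | set(planned))
    return {k: (actual.get(k, 0.0), planned.get(k, 0.0)) for k in keys}

brands = derive_brand_dimension(products)         # FizzCo -> key 1, Sunny -> key 2
sales = [(20240105, 1, 10.0), (20240120, 2, 5.0), (20240111, 3, 4.0)]
forecast = [(202401, 1, 12.0), (202401, 2, 6.0)]  # (month_key, brand_key, amount)
print(drill_across(sales, forecast, products, brands))
# {(202401, 1): (15.0, 12.0), (202401, 2): (4.0, 6.0)}
```

The drill-across only works because both fact tables share the same brand keys and descriptions, which is what conforming the dimensions buys you.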
Date and Time Dimensions

• Just about every fact table has a date and/or time dimension.
• This is the most common of conformed dimensions.
• Usually generated programmatically during the ETL process or imported from a spreadsheet.
• Acceptable to use a PK in the form YYYYMMDD.
• If you need time of day, use a separate dimension.
• Time of day should only be used if there are meaningful textual descriptions of time
• Ex. Lunch, Dinner, 1st Shift, 2nd Shift, etc.
• Elapsed time intervals are facts, not attributes.
• Ex. Minutes between when an order was received and shipped

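The slide says the date dimension is usually generated programmatically. A minimal Python sketch of that, with an illustrative (not prescribed) set of attributes and a YYYYMMDD integer key:

```python
from datetime import date, timedelta

def build_date_dimension(start, end):
    """One row per calendar day from start to end inclusive, keyed YYYYMMDD."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            "date_key": day.year * 10000 + day.month * 100 + day.day,  # e.g. 20240229
            "full_date": day.isoformat(),
            "day_name": day.strftime("%A"),      # locale-dependent text
            "month_name": day.strftime("%B"),
            "quarter": (day.month - 1) // 3 + 1,
            "year": day.year,
            "is_weekend": day.weekday() >= 5,    # Saturday = 5, Sunday = 6
        })
        day += timedelta(days=1)
    return rows

for row in build_date_dimension(date(2024, 2, 28), date(2024, 3, 1)):
    print(row["date_key"], row["full_date"], row["quarter"], row["is_weekend"])
```

Generating the calendar in code, rather than typing it in, means leap days and quarter boundaries come out right without manual checking.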
Ex. Date Dimension

Demonstrate Date and Time dimensions on SQL Server

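The SQL Server demonstration is not reproduced here. As a stand-in, this Python sketch builds a minute-grain time-of-day dimension whose rows carry a textual day part. The HHMM key and the day-part boundaries are hypothetical choices, not part of the original demo.

```python
# Hypothetical day parts: (start hour inclusive, end hour exclusive, label)
DAY_PARTS = [
    (0, 6, "Overnight"),
    (6, 11, "Breakfast"),
    (11, 14, "Lunch"),
    (14, 17, "Afternoon"),
    (17, 21, "Dinner"),
    (21, 24, "Late Night"),
]

def build_time_of_day_dimension(day_parts):
    """One row per minute of the day, keyed HHMM, labelled with a day part."""
    rows = []
    for minute_of_day in range(24 * 60):
        hh, mm = divmod(minute_of_day, 60)
        label = next(name for start, end, name in day_parts if start <= hh < end)
        rows.append({"time_key": hh * 100 + mm, "hh_mm": f"{hh:02d}:{mm:02d}", "day_part": label})
    return rows

time_of_day = build_time_of_day_dimension(DAY_PARTS)
print(len(time_of_day))  # 1440
print(time_of_day[750])  # {'time_key': 1230, 'hh_mm': '12:30', 'day_part': 'Lunch'}
```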
How do you handle Time Zones?

• Express time in coordinated universal time (UTC).
• Express it in local time, too.
• Other option: use a single time zone (for example, ET) to express all times.

Call Center Activity Fact
  Local call date key FK → local call date dimension
  UTC call date key FK → UTC call date dimension
  Local call time of day FK → local call time of day dimension
  UTC call time of day FK → UTC call time of day dimension

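A sketch of how the ETL could derive all four keys for the Call Center Activity fact from one timezone-aware timestamp. A fixed UTC-5 offset stands in for Eastern time to keep the example self-contained; real code would use a zone database (for example Python's zoneinfo) so daylight saving time is handled. The key formats (YYYYMMDD, HHMM) and function names are assumptions.

```python
from datetime import datetime, timedelta, timezone

def date_key(dt):
    return dt.year * 10000 + dt.month * 100 + dt.day   # YYYYMMDD

def time_of_day_key(dt):
    return dt.hour * 100 + dt.minute                    # HHMM

def call_center_keys(call_ts, local_tz):
    """Derive both the UTC and the local date/time-of-day keys for one call."""
    if call_ts.tzinfo is None:
        raise ValueError("call_ts must be timezone-aware")
    utc = call_ts.astimezone(timezone.utc)
    local = utc.astimezone(local_tz)
    return {
        "local_call_date_key": date_key(local),
        "utc_call_date_key": date_key(utc),
        "local_call_time_of_day_key": time_of_day_key(local),
        "utc_call_time_of_day_key": time_of_day_key(utc),
    }

# Fixed UTC-5 offset as a stand-in for Eastern Standard Time (no DST handling).
EST = timezone(timedelta(hours=-5), "EST")
keys = call_center_keys(datetime(2024, 1, 15, 3, 30, tzinfo=timezone.utc), EST)
print(keys)
```

A call at 03:30 UTC on January 15 lands on January 14 locally, which is exactly why both date keys are kept on the fact row.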
Degenerate Dimensions

• Dimension values we store in the fact table because there are too many of them to justify a dimension of their own (for example, a 1-1 relationship from fact to dimension).
• These occur in transaction fact tables that have a parent-child (one-to-many) structure.
• Ex. Order → Order Detail
• Airline Ticket → Flights
• Allow us to drill through to operational data in the ODS.
• Usually end up as part of the primary key of the fact table.

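A small Python sketch of a degenerate dimension at order-line grain: the order number lives in the fact rows with no Order dimension behind it, and grouping on it rebuilds order-level totals. The line_number column and the sample rows are hypothetical.

```python
from collections import defaultdict

# Order-line grain: (order_number, line_number, product_key, sales_amount).
# order_number is a degenerate dimension: no Order dimension table exists.
order_lines = [
    ("SO-1001", 1, 11, 2.00),
    ("SO-1001", 2, 12, 5.50),
    ("SO-1002", 1, 11, 3.00),
]

# The degenerate dimension plus the line number identify each fact row.
assert len({(o, l) for o, l, _p, _a in order_lines}) == len(order_lines)

def order_totals(lines):
    """Group by the degenerate dimension to rebuild order-level totals."""
    totals = defaultdict(float)
    for order_number, _line, _product, amount in lines:
        totals[order_number] += amount
    return dict(totals)

print(order_totals(order_lines))  # {'SO-1001': 7.5, 'SO-1002': 3.0}
```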
Slowly Changing Dimensions

• Dimensional data changes infrequently, but when it does you need a strategy for addressing the change.
• Ex: What happens when a customer has a new address, or an employee has a name change?

4 popular strategies:
Type 1: Overwrite the existing attribute
Type 2: Add a new dimension row
Type 3: Add a new dimension attribute
Mini-Dimension: Add a new dimension

• These strategies are not mutually exclusive and can be combined.

Type 1: Overwrite

• Appropriate for:
• correcting mistakes or errors in data
• changes where historical associations do not matter
• cases where the old value has no significance
• If the previous value matters, don't use this strategy: you are rewriting history.
• Problems will occur with data already aggregated on the old values, since those aggregates no longer match the dimension.
• Ex. Employee name changes, corrections, natural key edits.

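A Type 1 change is just an in-place update keyed on the natural key. This Python sketch (hypothetical names and data) shows that the surrogate key and row count stay the same while the old value disappears:

```python
# Hypothetical employee dimension; employee_id is the natural key.
employees = [
    {"employee_key": 1, "employee_id": "E-0042", "name": "Dana Smith", "dept": "Sales"},
]

def scd_type1(dimension, natural_key, **new_values):
    """Overwrite attributes in place; facts keep the same surrogate key, old value is gone."""
    for row in dimension:
        if row["employee_id"] == natural_key:
            row.update(new_values)
            return row
    raise KeyError(natural_key)

scd_type1(employees, "E-0042", name="Dana Jones")
print(employees)  # still one row, key 1, name now 'Dana Jones'
```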
Type 2: Add New Dimension Row

• Most popular strategy, as it preserves history.
• The natural key is repeated; each version of the row gets its own surrogate key.
• Old and new values are stored along with effective dates and an indicator of which row is "current".

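A minimal Python sketch of a Type 2 change, assuming a 9999-12-31 open-ended end date and inclusive end dates (the day before the new row takes effect). These conventions, and the column names, vary between teams and are assumptions here.

```python
from datetime import date, timedelta

OPEN_END = date(9999, 12, 31)  # assumed sentinel for "no end date yet"

customers = [
    {"customer_key": 1, "customer_id": "C100", "city": "Boston",
     "start_date": date(2020, 1, 1), "end_date": OPEN_END, "is_current": True},
]

def scd_type2(dimension, natural_key, changes, effective, next_key):
    """Expire the current row and append a new version; returns the next free key."""
    current = next(r for r in dimension
                   if r["customer_id"] == natural_key and r["is_current"])
    if all(current[col] == val for col, val in changes.items()):
        return next_key                                  # nothing actually changed
    current["is_current"] = False
    current["end_date"] = effective - timedelta(days=1)  # inclusive end date
    dimension.append({**current, **changes,
                      "customer_key": next_key,          # new surrogate key
                      "start_date": effective,
                      "end_date": OPEN_END,
                      "is_current": True})
    return next_key + 1

next_key = scd_type2(customers, "C100", {"city": "Denver"}, date(2024, 6, 1), next_key=2)
for row in customers:
    print(row["customer_key"], row["city"], row["start_date"], row["end_date"], row["is_current"])
# 1 Boston 2020-01-01 2024-05-31 False
# 2 Denver 2024-06-01 9999-12-31 True
```

Facts loaded before June 2024 keep pointing at key 1 (Boston), so history reports still group those sales under the old city.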
Type 3: Add A New Dimension Attribute

• Infrequently used; preserves limited history (the prior value only).
• Useful for "soft" changes where users might want to choose between the old and new attribute, or need to access both values for a time.
• The new value is written to the existing column; the old value is stored in a new column.
• This way, queries do not have to be rewritten to access the new attribute.
• Ex. Redistricting sales territories, re-charting accounting codes.

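A Type 3 change in a few lines of Python, using a hypothetical prior_<column> naming convention. Only one prior value survives; a second change overwrites it.

```python
# Hypothetical sales-rep dimension row; territory is being redistricted.
rep = {"rep_key": 7, "rep_id": "R-12", "sales_territory": "Northeast"}

def scd_type3(row, column, new_value):
    """Keep the new value in the existing column and the old one in prior_<column>."""
    row[f"prior_{column}"] = row[column]
    row[column] = new_value
    return row

scd_type3(rep, "sales_territory", "New England")
print(rep)
# {'rep_key': 7, 'rep_id': 'R-12', 'sales_territory': 'New England', 'prior_sales_territory': 'Northeast'}
```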
Mini-Dimensions: Add a new Dimension

• If attributes change frequently, consider placing them in their own "mini-dimensions".
• Most effective when you have banded values, or ranges of discrete values.

Customer Dimension: Customer key (PK), Customer ID (Nat. Key), Customer Name, …
Fact Table: Customer Key (FK), Customer Demographics Key (FK), … other FKeys…, … Facts…
Customer Demographics Dimension: Customer Demographics Key (PK), Customer Age Band, Customer Gender, Customer Income Band

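A sketch of a banded demographics mini-dimension in Python. The band boundaries, labels, and the choice to pre-build every combination are assumptions for illustration. When a customer's age or income changes, only the demographics key on new fact rows changes; the customer row is untouched.

```python
import itertools

# Hypothetical bands: (low inclusive, high exclusive, label)
AGE_BANDS = [(0, 25, "Under 25"), (25, 45, "25-44"), (45, 65, "45-64"), (65, 200, "65+")]
INCOME_BANDS = [(0, 50_000, "Under $50K"), (50_000, 100_000, "$50K-$99K"),
                (100_000, float("inf"), "$100K+")]
GENDERS = ["F", "M", "Unknown"]

def band(value, bands):
    return next(label for low, high, label in bands if low <= value < high)

# Pre-build every combination once: 4 x 3 x 3 = 36 rows, independent of customer count.
demographics_dim = {
    combo: key
    for key, combo in enumerate(
        itertools.product([b[2] for b in AGE_BANDS], GENDERS, [b[2] for b in INCOME_BANDS]),
        start=1,
    )
}

def demographics_key(age, gender, income):
    """Key to place on a new fact row, from the customer's current attributes."""
    return demographics_dim[(band(age, AGE_BANDS), gender, band(income, INCOME_BANDS))]

print(len(demographics_dim))              # 36
print(demographics_key(30, "F", 72_000))  # 11
print(demographics_key(46, "F", 72_000))  # 20 after a birthday moves the age band
```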
Role-Playing Dimensions

• The same physical dimension plays more than one logical dimensional role.
• This is most common with the date dimension.
• Stored in the same physical table, just aliased as a view for each role.
• Examples:
• Date: Order Date, Shipping Date, Delivery Date → same Date Dimension
• Address: Ship To, Bill To → same Address Dimension
• Airport: Arrival, Departure → same Airport Dimension

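A minimal sketch, using Python's sqlite3 module, of one physical date table exposed through two role views. The view and column names are hypothetical; SQL Server or another engine would apply the same idea with its own syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT, month_name TEXT);

-- One physical table, one view per role, with role-specific column names.
CREATE VIEW dim_order_date AS
    SELECT date_key AS order_date_key, full_date AS order_date, month_name AS order_month
    FROM dim_date;
CREATE VIEW dim_ship_date AS
    SELECT date_key AS ship_date_key, full_date AS ship_date, month_name AS ship_month
    FROM dim_date;

CREATE TABLE fact_orders (
    order_number   TEXT PRIMARY KEY,
    order_date_key INTEGER REFERENCES dim_date (date_key),
    ship_date_key  INTEGER REFERENCES dim_date (date_key),
    amount         REAL
);
INSERT INTO dim_date VALUES (20240130, '2024-01-30', 'January'), (20240202, '2024-02-02', 'February');
INSERT INTO fact_orders VALUES ('SO-1', 20240130, 20240202, 99.0);
""")

rows = conn.execute("""
    SELECT f.order_number, od.order_month, sd.ship_month
    FROM fact_orders f
    JOIN dim_order_date od ON od.order_date_key = f.order_date_key
    JOIN dim_ship_date  sd ON sd.ship_date_key  = f.ship_date_key
""").fetchall()
print(rows)  # [('SO-1', 'January', 'February')]
```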
Junk Dimensions

• Miscellaneous flags and text attributes that do not fit within any other dimension.
• Do not make a dimension for each one.
• Instead, place them together in their own "junk" dimension.

Invoice Indicator Id  Payment Terms  Order Mode  Ship Mode
1                     Net 10         Web         Freight
2                     Net 10         Web         Air
3                     Net 10         Fax         Freight
4                     Net 10         Fax         Air
5                     Net 10         Phone       Freight
6                     Net 10         Phone       Air
7                     Net 15         Web         Freight
8                     Net 15         Web         Air

Don't create a row in your junk dimension until you need it in a fact.

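Following the advice to create rows only on demand, here is a Python sketch of a junk dimension that assigns an id the first time a combination appears in a fact. The class and method names are hypothetical, and ids are assigned in order of first use, so they will not match the pre-enumerated table above.

```python
class JunkDimension:
    """Invoice indicator junk dimension; rows are created only when a fact needs them."""

    def __init__(self):
        self._keys = {}  # (payment_terms, order_mode, ship_mode) -> invoice_indicator_id

    def key_for(self, payment_terms, order_mode, ship_mode):
        combo = (payment_terms, order_mode, ship_mode)
        if combo not in self._keys:
            self._keys[combo] = len(self._keys) + 1
        return self._keys[combo]

    def rows(self):
        return [(key, *combo) for combo, key in self._keys.items()]

invoice_indicators = JunkDimension()
facts = [
    ("INV-1", invoice_indicators.key_for("Net 10", "Web", "Freight")),
    ("INV-2", invoice_indicators.key_for("Net 15", "Phone", "Air")),
    ("INV-3", invoice_indicators.key_for("Net 10", "Web", "Freight")),
]
print(facts)                      # [('INV-1', 1), ('INV-2', 2), ('INV-3', 1)]
print(invoice_indicators.rows())  # [(1, 'Net 10', 'Web', 'Freight'), (2, 'Net 15', 'Phone', 'Air')]
```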
Snowflake & Outrigger Dimensions

• When redundant attributes are moved to a separate table to eliminate redundancy, we get a snowflaked dimension.

Product Dimension: Product Key (PK), Product Name, Product Size Key (FK)
Product Size Dimension: Product Size Key (PK), Product Size (S, M, L), Product Size Fee

• Pros: data is back in 3NF, saves space.
• Cons: more complex for users, decreased performance.
• Sometimes this is desirable when there are a significant number of attributes in the outrigger dimension. These are the exception, not the rule!

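Going the other way, a snowflaked Product dimension can be flattened back into a star dimension by copying the outrigger's attributes onto each product row. A Python sketch with hypothetical rows:

```python
# Snowflaked: Product points at an outrigger Product Size table (hypothetical rows).
product_size = {
    1: {"product_size": "S", "product_size_fee": 0.00},
    2: {"product_size": "L", "product_size_fee": 1.50},
}
product = [
    {"product_key": 10, "product_name": "Latte", "product_size_key": 2},
    {"product_key": 11, "product_name": "Mocha", "product_size_key": 1},
]

def flatten_to_star(products, sizes):
    """Denormalize: copy the outrigger attributes onto each product row."""
    star_rows = []
    for row in products:
        flat = {k: v for k, v in row.items() if k != "product_size_key"}
        flat.update(sizes[row["product_size_key"]])
        star_rows.append(flat)
    return star_rows

print(flatten_to_star(product, product_size))
```

The flattened rows repeat the size attributes, which is the redundancy the snowflake removed; in exchange, queries need one fewer join.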
Hierarchies in Dimensions

• Fixed hierarchies – simply de-normalize as attributes
• Ex. Product: Department -> Type
• Variable-depth hierarchies – implement with a bridge table (used to resolve M-M relationships)
• Should be used only when absolutely necessary
• Negatively affects usability
• Decreases performance

Fact Table: Date Key (FK), Customer Key (FK), More Foreign Keys…, Facts ….
Customer Dimension: Customer Key (PK), Customer Name, ….
Customer Hierarchy Bridge: Parent Customer Key (PK, FK), Subsidiary Cust. Key (PK, FK), # Levels from Parent, Bottom Flag, Top Flag

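A Python sketch of how a customer hierarchy bridge might be populated from parent-child pairs and then used to roll facts up to any level. It assumes the hierarchy is a tree (no cycles, one parent per customer), writes a level-0 row pairing each customer with itself, and interprets Top Flag as "the parent is the top of the tree" and Bottom Flag as "the subsidiary is a leaf"; those flag semantics are assumptions.

```python
from collections import defaultdict

def build_hierarchy_bridge(parent_of):
    """parent_of maps customer_key -> parent customer_key (None at the top).
    Emits one bridge row per (ancestor, descendant) pair, including each
    customer paired with itself at level 0."""
    children = defaultdict(list)
    for child, parent in parent_of.items():
        if parent is not None:
            children[parent].append(child)
    bridge = []
    for ancestor in parent_of:
        stack = [(ancestor, 0)]
        while stack:
            node, depth = stack.pop()
            bridge.append({
                "parent_customer_key": ancestor,
                "subsidiary_customer_key": node,
                "levels_from_parent": depth,
                "bottom_flag": not children[node],        # subsidiary is a leaf
                "top_flag": parent_of[ancestor] is None,  # parent is the top of the tree
            })
            stack.extend((child, depth + 1) for child in children[node])
    return bridge

def rollup(bridge, facts_by_customer, parent_key):
    """Total facts for a customer plus all of its subsidiaries, at any depth."""
    return sum(facts_by_customer.get(row["subsidiary_customer_key"], 0)
               for row in bridge if row["parent_customer_key"] == parent_key)

parent_of = {1: None, 2: 1, 3: 1, 4: 2}  # 1 -> {2, 3}, 2 -> {4}
sales = {1: 100, 2: 50, 3: 25, 4: 10}
bridge = build_hierarchy_bridge(parent_of)
print(len(bridge))               # 8
print(rollup(bridge, sales, 2))  # 60
print(rollup(bridge, sales, 1))  # 185
```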
Multi-Valued Dimensions

• Almost all fact-dimension relationships are M-1.
• Sometimes there's an M-M relationship between a fact and a dimension; it is resolved with a bridge table.
• The weighting factor is between 0 and 1, and the factors should add up to 1 for each diagnosis group key.

Health Care Billing Fact: Billing Date Key (FK), Patient Key (FK), Diagnosis Group Key (FK), Bill Amount, More Facts ….
Diagnosis Dimension: Diagnosis Key (PK), ICD-9 Code, Diagnosis Description, ….
Diagnosis Group Bridge: Diagnosis Group Key (PK, FK), Diagnosis Key (PK, FK), Weighting Factor

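A Python sketch of using the weighting factor: each bill is spread across the diagnoses in its group, so per-diagnosis totals add back up to the total billed. The keys and amounts are hypothetical.

```python
import math
from collections import defaultdict

# Diagnosis Group Bridge rows: (diagnosis_group_key, diagnosis_key, weighting_factor)
bridge = [(1, 10, 0.5), (1, 20, 0.5), (2, 10, 1.0)]
# Health Care Billing Fact rows: (patient_key, diagnosis_group_key, bill_amount)
billing = [(501, 1, 200.0), (502, 2, 100.0)]

def check_weights(rows):
    """Each factor must be in [0, 1] and each group's factors must sum to 1."""
    totals = defaultdict(float)
    for group_key, _diag, weight in rows:
        if not 0 <= weight <= 1:
            raise ValueError(f"weight out of range in group {group_key}")
        totals[group_key] += weight
    bad = [g for g, t in totals.items() if not math.isclose(t, 1.0)]
    if bad:
        raise ValueError(f"weights do not sum to 1 for groups {bad}")

def bill_amount_by_diagnosis(facts, rows):
    """Allocate each bill across its diagnoses so the totals still add up."""
    check_weights(rows)
    members = defaultdict(list)
    for group_key, diag_key, weight in rows:
        members[group_key].append((diag_key, weight))
    totals = defaultdict(float)
    for _patient, group_key, amount in facts:
        for diag_key, weight in members[group_key]:
            totals[diag_key] += amount * weight
    return dict(totals)

print(bill_amount_by_diagnosis(billing, bridge))  # {10: 200.0, 20: 100.0}
```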
Check yourself: What Kind of Dimension?

• Conformed?
• Degenerate?
• Slowly Changing? & Type?
• Role Playing?
• Junk?
• Outrigger?
• M-M (Bridge)?

1. Customers (for orders and sales leads)?
2. The various classrooms on a college campus?
3. Items on a restaurant menu?
4. Parts required to repair an automobile as part of a service record?
5. The instructors who teach a college class?

Fact Table Cases in Detail

Recall 3 Types of Fact Tables

Business process → Fact table grain
1. Events or Transactions → Transaction (single event)
2. Workflows, a.k.a. Accumulating Snapshots → Accumulating Snapshot (events over time)
3. Points in time, a.k.a. Periodic Snapshots → Periodic Snapshot (point in time)

Facts of Different Granularity == NO

• A single fact table cannot have facts with different levels of granularity.
• All measurements must be at the same level of detail.
• Example:
• Measurements are captured for each order line, except for the shipping charge, which is for the entire order.
• Solutions:
• Allocate higher-level facts to the lower granularity (split the shipping charge among the items).
• Create two separate fact tables (Orders fact & Order Line fact).

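One way to allocate an order-level charge to order lines is proportionally by line amount, working in integer cents so the pieces add back to the original charge exactly. A Python sketch; proportional-by-amount is an assumed rule (weight or quantity are also common bases), and allocate_cents is a hypothetical name.

```python
def allocate_cents(total_cents, basis):
    """Split an order-level amount across lines in proportion to `basis`.
    Works in integer cents and hands leftover cents to the largest remainders,
    so the allocated pieces always sum back to the original total."""
    basis_total = sum(basis)
    if basis_total <= 0:
        raise ValueError("basis must have a positive total")
    shares = [total_cents * b // basis_total for b in basis]
    remainders = [total_cents * b % basis_total for b in basis]
    leftover = total_cents - sum(shares)
    # sorted() is stable, so ties go to the earlier line.
    by_remainder = sorted(range(len(basis)), key=lambda i: remainders[i], reverse=True)
    for i in by_remainder[:leftover]:
        shares[i] += 1
    return shares

# $10.00 shipping charge spread over three lines worth $30, $20 and $10.
line_amounts_cents = [3000, 2000, 1000]
print(allocate_cents(1000, line_amounts_cents))  # [500, 333, 167]
```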
Facts: Multiple currencies / Units of Measure

• Measurements are provided in a local currency.
• Measurements should be converted to a standardized currency, or else conversion rates must be stored.
• Similarly, in the case of multiple units of measure, conversions to all the different units of measure should be provided.
• Ex. Items are received by the box (12 in a box = Received unit factor)
Received Price = Received unit factor * unit price

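A Python sketch that stores the local amount, the rate applied, and the standardized amount side by side, plus the received-price formula above. The EUR to USD rate and the column names are made up for illustration; Decimal avoids binary rounding on money.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

def to_standard_currency(amount_local, rate_to_standard):
    """Convert a local-currency amount with the rate in effect for the transaction."""
    return (amount_local * rate_to_standard).quantize(CENT, rounding=ROUND_HALF_UP)

def received_price(received_unit_factor, unit_price):
    """Received Price = Received unit factor * unit price (e.g. 12 items per box)."""
    return received_unit_factor * unit_price

# Hypothetical fact row keeping the local amount, the rate used and the standardized amount.
fact = {"amount_local": Decimal("90.00"), "currency_code": "EUR", "rate_to_usd": Decimal("1.10")}
fact["amount_usd"] = to_standard_currency(fact["amount_local"], fact["rate_to_usd"])
print(fact["amount_usd"])                   # 99.00
print(received_price(12, Decimal("1.25")))  # 15.00
```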
Factless Fact Tables

• Business processes that do not generate quantifiable measurements
• Ex: Student attendance, college admissions
• Can easily be converted into traditional fact tables by adding a Count attribute, which is always equal to 1.
• Consider adding facts for when the event did not happen.
• Helps to perform aggregations
• Ex: Attendance % (present or absent) versus class size.

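A Python sketch of an attendance fact that records absences as well as attendance, so a percent-present figure falls out of simple counting. The present flag and the sample keys are hypothetical.

```python
from collections import defaultdict

# Attendance fact: (student_key, class_key, date_key, present_flag).
# A row exists for every enrolled student, including absences, and each
# row carries an implicit count of 1.
attendance = [
    (1, 301, 20240909, 1),
    (2, 301, 20240909, 1),
    (3, 301, 20240909, 0),
    (4, 301, 20240909, 1),
]

def attendance_pct(rows):
    """Percent present per (class, date): present count / class size."""
    present, size = defaultdict(int), defaultdict(int)
    for _student, class_key, date_key, is_present in rows:
        size[(class_key, date_key)] += 1  # the "Count = 1" fact
        present[(class_key, date_key)] += is_present
    return {k: 100.0 * present[k] / size[k] for k in size}

print(attendance_pct(attendance))  # {(301, 20240909): 75.0}
```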
Consolidated Fact Tables

• Fact tables populated from different sources may be consolidated into a single fact table.
• Level of granularity must be the same.
• Measurements are listed side by side.
• Ex. By combining forecast and actual sales amounts, a forecast/actual sales variance amount can be easily calculated and stored.

Sales Fact: Date Key (FK), Customer Key (FK), Region Key (FK), Actual Sales $
Forecast Fact: Date Key (FK), Customer Key (FK), Region Key (FK), Forecast Sales $
Sales & Forecast Fact: Date Key (FK), Customer Key (FK), Region Key (FK), Actual Sales $, Forecast Sales $, Sales Variance $

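A Python sketch of consolidating two fact tables at the same grain with a full outer join on the shared keys. Treating a missing measure as 0.0, and defining variance as actual minus forecast, are assumptions.

```python
# Two fact tables at the same grain: (date_key, customer_key, region_key) -> amount
sales_fact = {(20240105, 1, 7): 120.0, (20240105, 2, 7): 80.0}
forecast_fact = {(20240105, 1, 7): 100.0, (20240105, 3, 7): 50.0}

def consolidate(actuals, forecasts):
    """Full outer join on the shared grain; missing measures become 0.0."""
    rows = []
    for key in sorted(set(actuals) | set(forecasts)):
        actual = actuals.get(key, 0.0)
        forecast = forecasts.get(key, 0.0)
        rows.append((*key, actual, forecast, actual - forecast))  # variance stored, not recomputed
    return rows

for row in consolidate(sales_fact, forecast_fact):
    print(row)
# (20240105, 1, 7, 120.0, 100.0, 20.0)
# (20240105, 2, 7, 80.0, 0.0, 80.0)
# (20240105, 3, 7, 0.0, 50.0, -50.0)
```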
Finally: Do’s and Don’ts of DM

• Do not take a “report centric” approach


• Reuse your dimensional models for multiple reports
• Dimensional models should not be departmentally bound.
• Reuse your dimensional models for multiple departments
• Create dimensional models with the finest level of granularity.
• This will be the most flexible and scalable option.
• Use Conformed dimensions
• Helps with integration efforts
• Simplifies the process of creating the next data mart.

Resources

• Reading:
• R. Kimball and M. Ross (2002). The Data Warehouse Toolkit (2nd Edition). John Wiley & Sons.
• R. Kimball and M. Ross (2013). The Data Warehouse Toolkit (3rd Edition). John Wiley & Sons.

EOF
