Business Intelligence Guidebook – From Data Integration to Analytics

www.biguidebook.com
November 2014
Imprint: Morgan Kaufmann
Print Book ISBN: 9780124114616
eBook ISBN: 9780124115286

Chapter 9
Dimensional Modeling

Outline
• Introduction to dimensional modeling.
• High-level view of a dimensional model
- Facts
- Dimensions
- Schemas
• ER vs dimensional modeling
• Purpose of dimensional modeling
• Advanced dimensional modeling
• Dimensional modeling recap
Introduction to Dimensional Modeling
• The purpose of dimensional modeling is to enable business intelligence (BI) reporting,
query, and analysis.

• Like entity-relationship (ER) modeling, dimensional modeling is a logical design technique.

• It depicts business processes throughout an enterprise and organizes that data and its
structure in a logical way.

• It is much better suited than ER modeling for business intelligence (BI) applications and
data warehousing (DW).
High-Level View of a Dimensional Model
• There are two key entities in a dimensional model:
● Facts (measures).
● Dimensions (context).
• Example:

• The fact Tbl_Fact_Store_Sales is at the core of the dimensional model.

• Four surrounding dimensions define the store sales and put them into context:
- Tbl_Dim_Item, which is what products were sold.
- Tbl_Dim_Date, which is when those products were sold.
- Tbl_Dim_Customer, which is who bought the products.
- Tbl_Dim_Buyer, which is who bought the products for the store.
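
The example above can be sketched as a small star schema. This is only an illustration using Python's built-in sqlite3 module: the five table names come from the slide, while every column name (Item_Key, Sales_Amount, and so on) is an assumption made for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tbl_Dim_Item     (Item_Key     INTEGER PRIMARY KEY, Item_Name     TEXT);
CREATE TABLE Tbl_Dim_Date     (Date_Key     INTEGER PRIMARY KEY, Calendar_Date TEXT);
CREATE TABLE Tbl_Dim_Customer (Customer_Key INTEGER PRIMARY KEY, Customer_Name TEXT);
CREATE TABLE Tbl_Dim_Buyer    (Buyer_Key    INTEGER PRIMARY KEY, Buyer_Name    TEXT);

-- The fact table sits at the center: one foreign key per dimension plus measures.
-- All column names here are hypothetical.
CREATE TABLE Tbl_Fact_Store_Sales (
    Item_Key     INTEGER NOT NULL REFERENCES Tbl_Dim_Item(Item_Key),
    Date_Key     INTEGER NOT NULL REFERENCES Tbl_Dim_Date(Date_Key),
    Customer_Key INTEGER NOT NULL REFERENCES Tbl_Dim_Customer(Customer_Key),
    Buyer_Key    INTEGER NOT NULL REFERENCES Tbl_Dim_Buyer(Buyer_Key),
    Quantity     INTEGER NOT NULL,
    Sales_Amount REAL    NOT NULL
);
""")

# Ask SQLite which tables the fact table's foreign keys point to.
fk_rows = conn.execute("PRAGMA foreign_key_list('Tbl_Fact_Store_Sales')").fetchall()
referenced_dimensions = sorted(row[2] for row in fk_rows)  # column 2 is the target table
print(referenced_dimensions)
```

The fact table carries no descriptive text of its own; everything a report would filter or group by lives in the four dimension tables.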

High-Level View of a Dimensional Model (Example)

[Figure 9.1]
Facts
• A fact is a measurement of a business activity, such as a business event or transaction,
and is generally numeric.

• Examples of facts are sales, expenses, and inventory levels

• Numeric measurements may include counts, dollar amounts, percentages, or ratios.

• Facts can be aggregated or derived. For example, you can sum up the total revenue or
calculate the profitability of a set of sales transactions.

• Facts provide the measurements of how well or how poorly the business is
performing. A fact is also referred to as an organizational performance measure.
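
As a rough sketch of "aggregated or derived", the snippet below sums revenue (an aggregated fact) and computes profit and profit margin (derived facts) over a handful of made-up transactions. The table name, column names, and figures are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_id INTEGER, revenue REAL, cost REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, 100.0, 60.0), (2, 250.0, 200.0), (3, 50.0, 20.0)],  # hypothetical transactions
)

# Aggregated fact: total revenue. Derived fact: profit = revenue - cost.
total_revenue, total_profit = conn.execute(
    "SELECT SUM(revenue), SUM(revenue - cost) FROM sales"
).fetchone()

# A ratio derived from the two totals (note that the ratio itself is not additive).
profit_margin = total_profit / total_revenue
print(total_revenue, total_profit, round(profit_margin, 2))
```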
Facts (Cont.)
• Fact tables are normalized and contain little redundancy.

• Fact table record counts can become very large. Ninety percent of the data in a
dimensional model is typically located in the fact tables.

• The key dimensional modeling design concerns when working with the data in fact
tables are how to minimize and standardize it and make it consistent.

• Fact tables are composed of two types of columns: keys and measures

Fact table – keys
• The key columns of a fact table consist of a group of foreign keys (FKs) that point to
the primary keys of the dimension tables associated with this fact table to enable
business analysis.

• The relationships between the dimensions and the fact tables are one-to-many, similar to
ER modeling.

[Figure 9.2]
Fact table – keys (Cont.)
• The primary key of a fact table is typically a multipart key consisting of the
combination of foreign keys that can uniquely identify the fact table row.
This key may also be referred to as a compound or concatenated key.

• The multipart key may be a subset of the foreign keys. In our example, DateKey,
StoreKey, ProductKey, and CustomerKey may uniquely identify each row in the
sales fact table.

• You have two alternatives if there is no combination of foreign keys that
creates the uniqueness required for creating a primary key:
- Primary key with degenerative dimensions.
- Primary key using a surrogate key.
Fact table – primary key with degenerative dimensions
• The operational systems that record the
business transactions or events used to
populate fact tables typically create unique
identifiers related to those transactions.
• Examples of these identifiers are a sales order
number, invoice number, and shipment
tracking number.
• These identifiers are called degenerative dimensions, more commonly known as
degenerate dimensions (discussed later in this chapter).
• If combining this identifier with a subset of
foreign keys creates uniqueness, then this
multipart key will become the primary key.
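
A minimal sketch of this alternative, assuming a hypothetical FactSales table in SQLite: the source system's sales order number is stored directly in the fact table and combined with one foreign key to form the multipart primary key. The column names and order numbers are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE FactSales (
    DateKey          INTEGER NOT NULL,
    ProductKey       INTEGER NOT NULL,
    SalesOrderNumber TEXT    NOT NULL,  -- identifier carried over from the source system
    OrderQuantity    INTEGER NOT NULL,
    PRIMARY KEY (SalesOrderNumber, ProductKey)
)
""")

# Same product on the same day in two orders: the order number keeps the rows distinct.
conn.execute("INSERT INTO FactSales VALUES (20141101, 7, 'SO-1001', 2)")
conn.execute("INSERT INTO FactSales VALUES (20141101, 7, 'SO-1002', 1)")

# Repeating an (order number, product) pair violates the multipart primary key.
try:
    conn.execute("INSERT INTO FactSales VALUES (20141101, 7, 'SO-1001', 5)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM FactSales").fetchone()[0]
print(duplicate_rejected, row_count)
```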
[Figure 9.3]
Fact table – primary key is a surrogate key
• If you cannot identify unique rows with any of the methods discussed so far, create a
primary key based on a surrogate key.
• A surrogate key, which is often generated by the database system using an identity
(auto-increment) column, is an integer whose value has no business meaning.
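
The sketch below shows the idea with SQLite, whose AUTOINCREMENT integer primary key stands in here for an identity column in other database systems. The table and column names (FactSales, SalesSK) are assumptions, not names from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE FactSales (
    SalesSK       INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, no business meaning
    DateKey       INTEGER NOT NULL,
    ProductKey    INTEGER NOT NULL,
    OrderQuantity INTEGER NOT NULL
)
""")

# Two otherwise identical rows remain distinguishable by the generated key.
for _ in range(2):
    conn.execute(
        "INSERT INTO FactSales (DateKey, ProductKey, OrderQuantity) VALUES (20141101, 7, 1)"
    )

surrogate_keys = [
    row[0] for row in conn.execute("SELECT SalesSK FROM FactSales ORDER BY SalesSK")
]
print(surrogate_keys)
```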

[Figure 9.4]
Fact table – measures
• The second type of column in a fact table holds the actual measures of the business
activity, such as the sales revenue and order quantity.
• Every measurement has a grain, which is the level of detail in the measurement of an
event. For example, the grain of currency could be to the dollar amount, or be more
granular and include cents.
• A measure's granularity is determined by its data source.
• It is now an established practice to store data at the lowest level of detail that is
available from the transactional or operational systems.

[Figure 9.5]
Fact table - types of facts
• After you’ve defined the measures and their level of grain in the facts, you need to determine the
numeric attributes of the types of measures that are being stored in the fact. There are three types of
measures:

Additive facts
- The easiest to define and manage.
- It’s simply a measure of the fact table that can be added across all dimensions.
- E.g. the quantity of items you bought in an online store, such as the number of books. It can
be aggregated by all applicable dimensions, which in our example is customer, store, product,
and date.

Semiadditive facts
- These are measurements in the fact table that can be added across some dimensions but not
others.
- E.g. bank account balances, the number of students attending a class, or inventory levels.
You can’t simply add 12 months of account balances and get how much money somebody has in a
bank account. In this case, you would average those balances over 12 months.

Nonadditive facts
- Nonadditive facts are measures in fact tables that can’t be added across any dimensions.
- Examples of these include unit prices, ratios, and temperatures; even though they are numbers
they aren’t supposed to be added.
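
To make the semiadditive case concrete, here is a small sketch with invented month-end balances for two hypothetical accounts. Summing across accounts within a month is meaningful, while across months the balances are averaged rather than summed.

```python
from collections import defaultdict

# (account, month, month-end balance): hypothetical sample data
balances = [
    ("A", "2014-01", 100.0), ("A", "2014-02", 300.0),
    ("B", "2014-01", 50.0),  ("B", "2014-02", 150.0),
]

# Across the account dimension, balances are additive: total held per month.
total_by_month = defaultdict(float)
for account, month, balance in balances:
    total_by_month[month] += balance

# Across the time dimension they are not: average instead of summing.
per_account = defaultdict(list)
for account, month, balance in balances:
    per_account[account].append(balance)
average_by_account = {acct: sum(vals) / len(vals) for acct, vals in per_account.items()}

print(dict(total_by_month))
print(average_by_account)
```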
Fact table - types of facts (cont.)
• It’s important to understand the concepts of additive, semiadditive, and
nonadditive facts because aggregating or summarizing data is:
- a big part of reporting and analysis,
- one of the key benefits of using dimensional models,
- and one of the things for which they are most often used.

• After you define measures and determine whether they are additive,
nonadditive, or semiadditive facts, you need to establish how they can be
analyzed in BI.
• It is the BI team’s responsibility to ensure that the business people performing
the analysis know what type of measure they are accessing, to prevent the risk
of using data inappropriately.
Dimensions
• A dimension is an entity that establishes the business context for the
measures (facts) used by an enterprise.
• Dimensions define the who, what, where, and why of the dimensional
model, and group similar attributes into a category or subject area.
• Whereas facts are numeric, dimensions are descriptive in nature
(although some of those descriptions, such as a product list price, may
be numeric).
• Creating a dimension stores descriptive attributes in a single place, rather than
repeating them redundantly across the rows of the fact tables (i.e., it eliminates
redundancy).
Dimensions (cont.)
• From a business perspective, the key purpose of dimensions is to use their attributes to
filter and analyze data based on performance measures.

• In Figure 9.6, the dimension is a product, DimProduct, with its attributes including name,
weight, size, color, and list price. When the product dimension is joined with a sales
fact table, a business person could examine sales based on one or more of these specific
product attributes, such as analyzing sales by color or size.
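
A rough sketch of "analyzing sales by color", assuming a simplified DimProduct and a hypothetical FactSales table in SQLite. Only the DimProduct name and its name, color, and size attributes come from the slide; the key column, the fact table, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimProduct (ProductKey INTEGER PRIMARY KEY, Name TEXT, Color TEXT, Size TEXT);
CREATE TABLE FactSales  (ProductKey INTEGER REFERENCES DimProduct(ProductKey), SalesAmount REAL);
INSERT INTO DimProduct VALUES (1, 'Road Bike', 'Red', 'M');
INSERT INTO DimProduct VALUES (2, 'Helmet', 'Red', 'L');
INSERT INTO DimProduct VALUES (3, 'Gloves', 'Black', 'M');
INSERT INTO FactSales VALUES (1, 900.0);
INSERT INTO FactSales VALUES (2, 60.0);
INSERT INTO FactSales VALUES (3, 25.0);
INSERT INTO FactSales VALUES (3, 25.0);
""")

# Join the fact to the dimension, then group the measure by a dimension attribute.
sales_by_color = conn.execute("""
    SELECT p.Color, SUM(f.SalesAmount)
    FROM FactSales AS f
    JOIN DimProduct AS p ON p.ProductKey = f.ProductKey
    GROUP BY p.Color
    ORDER BY p.Color
""").fetchall()
print(sales_by_color)
```

Swapping `p.Color` for `p.Size` in the query gives the same analysis by size without touching the fact table.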

[Figure 9.6]
Dimensions (cont.)
To be useful in analysis, a dimensional attribute needs these key characteristics:
• Descriptive, so business people and those designing the BI applications can
understand it.
• Complete, with no missing values.
• Unique, because it’s critical that values are uniquely identifiable.
• Valid, so the data is useful to the business.

Dimension Hierarchy
• Another aspect of the business context created by dimensions is that they are often
hierarchical; they group things together in the ways that an enterprise measures itself.
• These hierarchies represent many-to-one relationships. Examples of hierarchies include:
- Organizational structures, such as a marketing or sales organization.
- Product or service categories.
- Geographic groupings such as sales territories.
- Time. Years break down into quarters, months, weeks, etc.
• Using BI terminology, dimensions allow you to drill up, down, and across.

[Figure 9.7]
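
Drilling up and down a hierarchy can be sketched as summing a measure at different levels of the time hierarchy. The rows, the `roll_up` helper, and the sales figures below are hypothetical and exist only to illustrate the idea.

```python
from collections import defaultdict

# (year, quarter, month, sales): hypothetical rows already joined to a date dimension
rows = [
    (2014, "Q1", "Jan", 10.0), (2014, "Q1", "Feb", 20.0),
    (2014, "Q2", "Apr", 30.0), (2015, "Q1", "Jan", 40.0),
]

def roll_up(rows, depth):
    """Sum sales at a hierarchy level: depth 1 = year, 2 = quarter, 3 = month."""
    totals = defaultdict(float)
    for *levels, sales in rows:
        totals[tuple(levels[:depth])] += sales
    return dict(totals)

by_year = roll_up(rows, 1)     # drill up to the top of the hierarchy
by_quarter = roll_up(rows, 2)  # drill down one level
print(by_year)
print(by_quarter)
```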
Dimension keys

• A key concept in constructing dimensions is that each row of a dimension table is unique.
• In a dimension table, the primary key is a single field, whereas fact tables use a
group of foreign keys as their primary key.

Dimensions - surrogate and natural keys
• One of the best practices to emerge for dimensions is
using a surrogate key as the primary key as depicted
in Figure 9.8.

• As discussed for ER modeling, designating a primary key involves selecting a key that
uniquely identifies the entity. If there is more than one possibility, they are called
candidate keys, and the keys not selected are alternate keys.

[Figure 9.8]
Dimensions - surrogate and natural keys (Cont.)
• The reasons to create a surrogate key are:

- When gathering dimensional data from multiple source systems, there are often inconsistent
or incompatible primary keys used across these systems.

- Primary keys from source systems often change over time with different naming or
numbering conventions being used at different times. Additionally, over time, source
applications may be replaced by newer systems, or mergers may create the need to replace
systems.

- Primary key consistency may be maintained by source systems for shorter periods than the
enterprise analytical needs dictate.

- Source systems may be using smart keys. (What is a smart key?)

Dimensions - surrogate and natural keys (Cont.)
What is a smart key?
Operational and transactional systems sometimes define or identify items such as products with
smart keys.
These are alphanumeric strings, perhaps 24 or 40 characters in length.
The character string is typically divided into substrings. The substrings have meaning, hence the
term smart key.
For example, the first three characters might designate what manufacturing plant the product was
built in. The next five characters might designate the materials that were used to construct the
product. The next 10 might designate the size or some other characteristics of the construction of
the product, and so on.
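
The layout described above (3 characters for the plant, 5 for materials, 10 for size) can be sketched as a simple slicing function. The function name, the field names, and the sample key are all invented for illustration; real smart keys follow whatever convention the source system defines.

```python
def parse_smart_key(smart_key):
    """Split a smart key into the substrings described above (3 + 5 + 10 characters)."""
    return {
        "plant": smart_key[0:3],
        "materials": smart_key[3:8],
        "size": smart_key[8:18],
        "remainder": smart_key[18:],  # whatever the rest of the key encodes
    }

# A made-up 24-character key assembled from its parts; the values carry no real meaning.
smart_key = "PL1" + "STEEL" + "0000012X08" + "ABCDEF"
parts = parse_smart_key(smart_key)
print(len(smart_key), parts)
```

Because business meaning is embedded in the key itself, any change to plants, materials, or sizing conventions changes the key, which is one reason a meaningless surrogate key is preferred as the dimension's primary key.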

Dimensions - surrogate and natural keys (Cont.)
• An additional best practice is to maintain the source
system’s primary key as an alternate key in the
dimension. This is also called the source system’s
natural key.

• If there are multiple source systems with natural keys, you should add an attribute that
identifies the source system. This results in a multipart alternate key to identify the
natural keys.

• In Figure 9.9, CustomerSK is the primary key in the customer dimension, CustomerNK is the
natural key in the dimension and the primary key in the source system, SOR_NK is the SOR
(system of record) indicator, and the multipart alternate key consists of the SOR_NK and
CustomerNK columns.

[Figure 9.9]
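
The customer dimension described above might be sketched in SQLite as follows. The three key column names come from the slide; the CustomerName attribute, the source system codes 'CRM' and 'ERP', the sample rows, and the `lookup_customer_sk` helper are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE DimCustomer (
    CustomerSK   INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate primary key
    SOR_NK       TEXT NOT NULL,                      -- which source system the row came from
    CustomerNK   TEXT NOT NULL,                      -- primary key in that source system
    CustomerName TEXT,
    UNIQUE (SOR_NK, CustomerNK)                      -- multipart alternate key
)
""")

# The same natural key value can arrive from two different source systems.
conn.execute(
    "INSERT INTO DimCustomer (SOR_NK, CustomerNK, CustomerName) VALUES ('CRM', '1001', 'Acme')"
)
conn.execute(
    "INSERT INTO DimCustomer (SOR_NK, CustomerNK, CustomerName) VALUES ('ERP', '1001', 'Globex')"
)

def lookup_customer_sk(sor, natural_key):
    """Resolve a source-system key to the dimension's surrogate key, or None."""
    row = conn.execute(
        "SELECT CustomerSK FROM DimCustomer WHERE SOR_NK = ? AND CustomerNK = ?",
        (sor, natural_key),
    ).fetchone()
    return row[0] if row else None

print(lookup_customer_sk("CRM", "1001"), lookup_customer_sk("ERP", "1001"))
```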
Dimensions - surrogate and natural keys (Cont.)
Benefits:
• The primary benefit of using a surrogate key as a dimension’s primary key is to provide an
identifier that is consistent and unique across source systems and time, and that is independent of
business systems.

• An additional benefit of the surrogate key is that, being integer-based, it is a great data type to
index and join in a relational model.

To summarize, dimensions should have the following characteristics:

• Unique rows.
• Surrogate keys used as primary keys.
• Non-NULL primary keys.

Dimensions – not null primary keys
• The foreign keys used as the primary key in fact tables should never
contain null values.

• For an example of how a null value gets assigned, suppose a row in a sales
fact table has a null value in the customer identifier column that is the
foreign key linked to the customer dimension. The null value was written
to that column because, while loading the data from the source systems, the ETL
process could not find a customer associated with that sale: the value was
unknown, missing, or invalid.

Dimensions – not null primary keys (Cont.)
• This condition clearly results in misleading analysis that potentially creates business risk. The best
practices that address this potential business risk include:
- Create row(s) in each dimension for use when dimensional values are unknown,
missing, or invalid, or under other conditions in which referential integrity is not met.
- Because the numbering convention for surrogate keys is a positive integer, use negative
integers such as −999 for “missing” row keys.
- Dimensional rows have surrogate keys along with attributes used for naming and describing
them. Create a standard name and description for these rows used across all dimensions, e.g.,
“Missing,” “Unknown,” or “Invalid.”
- At a minimum, designate one row per dimension table for missing values, but if it is important
to be able to identify different conditions then use multiple rows. If there are multiple
conditions handled then use standard numbering and naming for each of these conditions
across all dimensions.
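
These practices can be sketched as a small load step that falls back to a standard "Missing" row instead of writing a null foreign key. The −999 key and the "Missing" name follow the slide; the table layout, the 'N/A' natural key, the sample data, and the `customer_key_for` helper are assumptions.

```python
import sqlite3

MISSING_KEY = -999  # reserved surrogate key for the standard "Missing" row

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimCustomer (
    CustomerSK   INTEGER PRIMARY KEY,
    CustomerNK   TEXT NOT NULL,
    CustomerName TEXT NOT NULL
);
CREATE TABLE FactSales (
    CustomerSK  INTEGER NOT NULL REFERENCES DimCustomer(CustomerSK),
    SalesAmount REAL NOT NULL
);
INSERT INTO DimCustomer VALUES (-999, 'N/A', 'Missing');
INSERT INTO DimCustomer VALUES (1, 'C-1001', 'Acme');
""")

def customer_key_for(natural_key):
    """Return the matching surrogate key, or the 'Missing' row's key if there is no match."""
    row = conn.execute(
        "SELECT CustomerSK FROM DimCustomer WHERE CustomerNK = ?", (natural_key,)
    ).fetchone()
    return row[0] if row else MISSING_KEY

# One known customer, one unknown identifier, one absent identifier.
for natural_key, amount in [("C-1001", 100.0), ("C-9999", 40.0), (None, 15.0)]:
    conn.execute(
        "INSERT INTO FactSales VALUES (?, ?)", (customer_key_for(natural_key), amount)
    )

sales_by_customer = conn.execute("""
    SELECT d.CustomerName, SUM(f.SalesAmount)
    FROM FactSales AS f
    JOIN DimCustomer AS d ON d.CustomerSK = f.CustomerSK
    GROUP BY d.CustomerName
    ORDER BY d.CustomerName
""").fetchall()
print(sales_by_customer)
```

With the fallback row in place, an inner join to the dimension no longer silently drops the unmatched sales; they show up under "Missing" and the totals still reconcile.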
