First Part 27 Pages

Uploaded by

luj.20
Business Intelligence Guidebook – From Data Integration to Analytics

www.biguidebook.com
November 2014
Imprint: Morgan Kaufmann
Print Book ISBN : 9780124114616
eBook ISBN : 9780124115286

Chapter 9
Dimensional Modeling

Outline
• Introduction to dimensional modeling.
• High-level view of a dimensional model
- Facts
- Dimensions
- Schemas
• ER vs dimensional modeling
• Purpose of dimensional modeling
• Advanced dimensional modeling
• Dimensional modeling recap
Introduction to Dimensional Modeling
• The purpose of dimensional modeling is to enable business intelligence (BI) reporting, query, and analysis.

• Like entity-relationship (ER) modeling, dimensional modeling is a logical design technique.

• It depicts business processes throughout an enterprise and organizes that data and its
structure in a logical way.

• It is much better suited than ER modeling for business intelligence (BI) applications and data warehousing (DW).

High-Level View of a Dimensional Model
• There are two key entities in a dimensional model:
● Facts (measures).
● Dimensions (context).
• Example:

• The fact table Tbl_Fact_Store_Sales is at the core of the dimensional model.

• Four surrounding dimensions define the store sales and put them into context:
- Tbl_Dim_Item, which is what products were sold.
- Tbl_Dim_Date, which is when those products were sold.
- Tbl_Dim_Customer, which is who bought the products.
- Tbl_Dim_Buyer, which is who bought the products for the store.

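The example model above can be sketched as a small star schema. This is only an illustration using Python's built-in sqlite3 module: the table names come from the slide, while every column name (ItemKey, SalesAmount, and so on) and all sample rows are assumptions made for the sketch.

```python
import sqlite3

# In-memory database; only the table names come from the slide.
# All column names and sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tbl_Dim_Item     (ItemKey     INTEGER PRIMARY KEY, ItemName     TEXT);
CREATE TABLE Tbl_Dim_Date     (DateKey     INTEGER PRIMARY KEY, CalendarDate TEXT);
CREATE TABLE Tbl_Dim_Customer (CustomerKey INTEGER PRIMARY KEY, CustomerName TEXT);
CREATE TABLE Tbl_Dim_Buyer    (BuyerKey    INTEGER PRIMARY KEY, BuyerName    TEXT);

-- The fact table holds one foreign key per dimension plus the measure.
CREATE TABLE Tbl_Fact_Store_Sales (
    ItemKey     INTEGER NOT NULL REFERENCES Tbl_Dim_Item(ItemKey),
    DateKey     INTEGER NOT NULL REFERENCES Tbl_Dim_Date(DateKey),
    CustomerKey INTEGER NOT NULL REFERENCES Tbl_Dim_Customer(CustomerKey),
    BuyerKey    INTEGER NOT NULL REFERENCES Tbl_Dim_Buyer(BuyerKey),
    SalesAmount REAL    NOT NULL,
    PRIMARY KEY (ItemKey, DateKey, CustomerKey, BuyerKey)
);

INSERT INTO Tbl_Dim_Item     VALUES (1, 'Notebook'), (2, 'Pen');
INSERT INTO Tbl_Dim_Date     VALUES (20141101, '2014-11-01');
INSERT INTO Tbl_Dim_Customer VALUES (1, 'Alice');
INSERT INTO Tbl_Dim_Buyer    VALUES (1, 'Bob');
INSERT INTO Tbl_Fact_Store_Sales VALUES
    (1, 20141101, 1, 1, 12.5),
    (2, 20141101, 1, 1, 3.0);
""")

# The item dimension supplies the "what" context for the sales measure.
sales_by_item = conn.execute("""
    SELECT i.ItemName, SUM(f.SalesAmount)
    FROM Tbl_Fact_Store_Sales AS f
    JOIN Tbl_Dim_Item AS i ON i.ItemKey = f.ItemKey
    GROUP BY i.ItemName
    ORDER BY i.ItemName
""").fetchall()
print(sales_by_item)
```

The same join pattern applies to the date, customer, and buyer dimensions.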
High-Level View of a Dimensional Model (Example)

[Figure 9.1]
Facts
• A fact is a measurement of a business activity, such as a business event or transaction,
and is generally numeric.

• Examples of facts are sales, expenses, and inventory levels.

• Numeric measurements may include counts, dollar amounts, percentages, or ratios.

• Facts can be aggregated or derived. For example, you can sum up the total revenue or
calculate the profitability of a set of sales transactions.

• Facts provide the measurements of how well or how poorly the business is performing. A fact is also referred to as an organizational performance measure.

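As a small illustration of aggregated and derived facts, the sketch below sums revenue over a set of transactions and then derives profitability from the totals. The transaction rows and the revenue and cost field names are assumptions made for the example.

```python
# Hypothetical sales transactions; each row carries two numeric measures.
transactions = [
    {"revenue": 120.0, "cost": 80.0},
    {"revenue": 200.0, "cost": 150.0},
    {"revenue": 80.0, "cost": 50.0},
]

# Aggregated facts: sum each measure over the set of transactions.
total_revenue = sum(t["revenue"] for t in transactions)
total_cost = sum(t["cost"] for t in transactions)

# Derived facts: calculated from the aggregated measures.
profit = total_revenue - total_cost
profit_margin = profit / total_revenue

print(total_revenue, profit, profit_margin)
```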
Facts (Cont.)
• Fact tables are normalized and contain little redundancy.

• Fact table record counts can become very large. Ninety percent of the data in a
dimensional model is typically located in the fact tables.

• The key dimensional modeling design concerns when working with the data in fact
tables are how to minimize and standardize it and make it consistent.

• Fact tables are composed of two types of columns: keys and measures.

Fact table – keys
• The key columns of a fact table consist of a group of foreign keys (FK) that point to the primary keys of the dimension tables associated with this fact table to enable business analysis.

• The relationships between fact tables and the dimensions are one-to-many, similar to ER modeling.

[Figure 9.2]

Fact table – keys (Cont.)
• The primary key of a fact table is typically a multipart key consisting of the
combination of foreign keys that can uniquely identify the fact table row.
This key may also be referred to as a compound or concatenated key.

• The multipart key may be a subset of the foreign keys such as in our
example: DateKey, StoreKey, ProductKey, and CustomerKey may
uniquely identify each row in the sales fact table.

• You have two alternatives if there is no combination of foreign keys that creates the uniqueness required for creating a primary key:
- Primary key with degenerative dimensions.
- Primary key using a surrogate key.

Fact table – primary key with degenerative dimensions
• The operational systems that record the business transactions or events used to populate fact tables typically create unique identifiers related to those transactions.
• Examples of these identifiers are a sales order number, invoice number, and shipment tracking number.
• These identifiers are called degenerative dimensions, more commonly known as degenerate dimensions (discussed later in this chapter).
• If combining this identifier with a subset of foreign keys creates uniqueness, then this multipart key will become the primary key.

[Figure 9.3]

Fact table – primary key is a surrogate key
• If you cannot identify unique rows with any of the methods discussed so far, create a primary key based on a surrogate key.
• A surrogate key is an integer whose value is meaningless. It is often generated by the database system, for example with an IDENTITY column.

[Figure 9.4]

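The three primary-key options for a fact table can be compared with a small uniqueness check. This is a sketch under assumptions: the foreign key names follow the earlier sales example, while OrderNumber, Quantity, SalesSK, and the sample rows are hypothetical.

```python
from collections import Counter


def is_unique_key(rows, columns):
    """Return True if the given columns identify every row uniquely."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return all(n == 1 for n in counts.values())


# Hypothetical sales fact rows. OrderNumber plays the role of the
# degenerative dimension created by the operational system.
rows = [
    {"DateKey": 20141101, "StoreKey": 1, "ProductKey": 7, "CustomerKey": 3,
     "OrderNumber": "SO-1001", "Quantity": 2},
    {"DateKey": 20141101, "StoreKey": 1, "ProductKey": 7, "CustomerKey": 3,
     "OrderNumber": "SO-1002", "Quantity": 1},
    {"DateKey": 20141102, "StoreKey": 1, "ProductKey": 9, "CustomerKey": 3,
     "OrderNumber": "SO-1003", "Quantity": 5},
]

foreign_keys = ["DateKey", "StoreKey", "ProductKey", "CustomerKey"]

# Option 1: foreign keys only. Not unique here, because the same customer
# bought the same product in the same store twice on one day.
fk_only = is_unique_key(rows, foreign_keys)

# Option 2: foreign keys plus the degenerative dimension.
with_degenerative = is_unique_key(rows, foreign_keys + ["OrderNumber"])

# Option 3: fall back to a meaningless integer surrogate key.
for surrogate_key, row in enumerate(rows, start=1):
    row["SalesSK"] = surrogate_key

print(fk_only, with_degenerative, [row["SalesSK"] for row in rows])
```

In a real database the surrogate key would be generated by the database system rather than by enumerate.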
Fact table – measures
• The second type of column in a fact table is the actual measures of the business activity, such as the sales revenue and order quantity.
• Every measurement has a grain, which is the level of detail in the measurement of an event. For example, the grain of currency could be whole dollars, or it could be more granular and include cents.
• The granularity of a measure is determined by its data source.
• It is now established practice to store measures at the lowest level of detail that is available from the transactional or operational systems.

[Figure 9.5]

Fact table - types of facts
• After you’ve defined the measures and their level of grain in the facts, you need to determine the
numeric attributes of the types of measures that are being stored in the fact. There are three types of
measures:

Additive facts
- The easiest to define and manage.
- It’s simply a measure of the fact table that can be added across all dimensions.
- E.g. the quantity of items you bought in an online store, such as the number of books. It can be aggregated by all applicable dimensions, which in our example are customer, store, product, and date.

Semiadditive facts
- These are measurements in the fact table that can be added across some dimensions but not others.
- E.g. bank account balances, the number of students attending a class, or inventory levels. You can’t simply add 12 months of account balances and get how much money somebody has in a bank account. In this case, you would average those balances over 12 months.

Nonadditive facts
- Nonadditive facts are measures in fact tables that can’t be added across any dimensions.
- Examples of these include unit prices, ratios, and temperatures; even though they are numbers, they aren’t supposed to be added.

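The bank balance example can be made concrete with a short sketch. The account names and month-end balances below are invented for illustration. The balance is added across the account dimension but averaged, not summed, across time.

```python
from collections import defaultdict

# Hypothetical month-end balances: (month, account, balance).
balances = [
    ("2014-01", "A", 100.0), ("2014-01", "B", 50.0),
    ("2014-02", "A", 120.0), ("2014-02", "B", 70.0),
    ("2014-03", "A", 140.0), ("2014-03", "B", 30.0),
]

# Semiadditive: adding across the account dimension is meaningful and
# gives the total money held at the end of each month.
total_by_month = defaultdict(float)
for month, _account, balance in balances:
    total_by_month[month] += balance

# Adding across the time dimension is not meaningful for a balance.
misleading_sum = sum(total_by_month.values())

# Averaging over the months is the appropriate aggregation across time.
average_total_balance = misleading_sum / len(total_by_month)

print(dict(total_by_month), misleading_sum, average_total_balance)
```

The summed figure of 510.0 describes no real amount of money, whereas the average of 170.0 is a meaningful balance over the quarter.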
Fact table - types of facts (cont.)
• It’s important to understand the concepts of additive, semiadditive, and nonadditive facts because aggregating or summarizing data is:
- a big part of reporting and analysis,
- one of the key benefits of using dimensional models, and
- one of the things for which they are most often used.

• After you define measures and determine whether they are additive, semiadditive, or nonadditive facts, you need to establish how they can be analyzed in BI.
• It is the BI team’s responsibility to ensure that the business people performing the analysis know what type of measure they are accessing, to prevent the risk of using data inappropriately.

Dimensions
• A dimension is an entity that establishes the business context for the
measures (facts) used by an enterprise.
• Dimensions define the who, what, where, and why of the dimensional
model, and group similar attributes into a category or subject area.
• Whereas facts are numeric, dimensions are descriptive in nature
(although some of those descriptions, such as a product list price, may
be numeric).
• Creating a dimension lets attributes be stored in a single place, rather than repeating them redundantly across the rows of the fact tables (i.e., it eliminates redundancy).

Dimensions (cont.)
• From a business perspective, the key purpose of dimensions is to use their attributes to filter and analyze data based on performance measures.

• In Figure 9.6, the dimension is a product, DimProduct, with its attributes including name, weight, size, color, and list price. When the product dimension is joined with a sales fact table, a business person could examine sales based on one or more of these specific product attributes, such as analyzing sales by color or size.

[Figure 9.6]

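Analyzing sales by a product attribute amounts to joining each fact row to its dimension row and grouping by the chosen attribute. The sketch below uses plain Python structures. The products, sales rows, and the sales_by helper are hypothetical; only the attribute names come from the DimProduct description.

```python
from collections import defaultdict

# Hypothetical DimProduct rows, keyed by the dimension's primary key.
dim_product = {
    1: {"Name": "Road Bike", "Color": "Red", "Size": "M"},
    2: {"Name": "Mountain Bike", "Color": "Black", "Size": "L"},
    3: {"Name": "Helmet", "Color": "Red", "Size": "M"},
}

# Hypothetical sales fact rows: (ProductKey, SalesAmount).
fact_sales = [(1, 500.0), (2, 750.0), (3, 40.0), (1, 500.0)]


def sales_by(attribute):
    """Total the sales measure grouped by one product attribute."""
    totals = defaultdict(float)
    for product_key, amount in fact_sales:
        # Join the fact row to its dimension row through the foreign key.
        totals[dim_product[product_key][attribute]] += amount
    return dict(totals)


print(sales_by("Color"))
print(sales_by("Size"))
```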
Dimensions (cont.)
To be useful in analysis a dimensional attribute needs these key characteristics:
• Descriptive, so business people and those designing the BI applications can
understand it.
• Complete, with no missing values.
• Unique, because it’s critical that values are uniquely identifiable.
• Valid, so the data is useful to the business.

Dimension Hierarchy
• Another aspect of the business context created by dimensions is that they are often hierarchical; they group things together in the ways that an enterprise would measure itself.
• These hierarchies represent many-to-one relationships. Examples of hierarchies include:
- Organizational structures, such as a marketing or sales organization.
- Product or service categories.
- Geographic groupings such as sales territories.
- Time. Years break down into quarters, months, weeks, etc.
• Using BI terminology, dimensions allow you to drill up, down, and across.

[Figure 9.7]

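Drilling up and down a time hierarchy can be sketched as re-aggregating the same measure at different levels. The daily sales figures and the date_levels and roll_up helpers below are assumptions made for the illustration. Each date maps to exactly one month, quarter, and year, which is the many-to-one relationship described above.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales: (date, amount).
daily_sales = [
    (date(2014, 1, 15), 100.0),
    (date(2014, 2, 10), 50.0),
    (date(2014, 4, 5), 200.0),
    (date(2014, 11, 20), 25.0),
]


def date_levels(d):
    """Map a date to its member at each level of the time hierarchy."""
    quarter = (d.month - 1) // 3 + 1
    return {
        "year": f"{d.year}",
        "quarter": f"{d.year}-Q{quarter}",
        "month": f"{d.year}-{d.month:02d}",
    }


def roll_up(level):
    """Aggregate sales at the requested hierarchy level."""
    totals = defaultdict(float)
    for d, amount in daily_sales:
        totals[date_levels(d)[level]] += amount
    return dict(totals)


print(roll_up("month"))    # most detailed of the three levels
print(roll_up("quarter"))  # drill up one level
print(roll_up("year"))     # drill up to the top
```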
Dimension keys

• A key concept in constructing dimensions is that each row of a dimension table is unique.
• In a dimension table, the primary key is a single field, whereas fact tables use a grouping of foreign keys as their primary key.

Dimensions - surrogate and natural keys
• One of the best practices to emerge for dimensions is using a surrogate key as the primary key, as depicted in Figure 9.8.

• As discussed in ER modeling, the process for designating a primary key involves selecting a key that uniquely identifies the entity. If there is more than one possibility, they are called candidate keys, and the keys not selected are alternate keys.

[Figure 9.8]

Dimensions - surrogate and natural keys (Cont.)
• The reasons to create a surrogate key are:

- When gathering dimensional data from multiple source systems, there are often inconsistent
or incompatible primary keys used across these systems.

- Primary keys from source systems often change over time with different naming or
numbering conventions being used at different times. Additionally, over time, source
applications may be replaced by newer systems, or mergers may create the need to replace
systems.

- Primary key consistency may be maintained by source systems for shorter periods than the
enterprise analytical needs dictate.

- Source systems may be using smart keys. (What is a smart key?)

Dimensions - surrogate and natural keys (Cont.)
What is a smart key?
Operational and transactional systems sometimes define or identify items such as products with
smart keys.
These are alphanumeric strings, maybe 24 or 40 characters in length.
The character string is typically divided into substrings. The substrings have meaning, hence the term smart key.
For example, the first three characters might designate what manufacturing plant the product was
built in. The next five characters might designate the materials that were used to construct the
product. The next 10 might designate the size or some other characteristics of the construction of
the product, and so on.
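
To illustrate why smart keys are awkward to depend on, the sketch below splits a key using the layout described above: 3 characters for the plant, 5 for the materials, and 10 for the size. The field names, the parse_smart_key helper, and the sample key are hypothetical.

```python
# Hypothetical fixed-width layout following the example in the text.
SMART_KEY_LAYOUT = [("plant", 3), ("materials", 5), ("size", 10)]


def parse_smart_key(smart_key):
    """Split a smart key into its meaningful substrings."""
    parts = {}
    position = 0
    for name, width in SMART_KEY_LAYOUT:
        parts[name] = smart_key[position:position + width]
        position += width
    # Whatever follows the known fields is kept unparsed.
    parts["remainder"] = smart_key[position:]
    return parts


# A made-up 24-character smart key.
example = parse_smart_key("BOSSTL42LARGE00000XYZ123")
print(example)
```

Any change to the layout in the source system changes the meaning of every stored key, which is one reason to prefer a meaningless surrogate key in the dimension.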

Dimensions - surrogate and natural keys (Cont.)
• An additional best practice is to maintain the source system’s primary key as an alternate key in the dimension. This is also called the source system’s natural key.

• If there are multiple source systems with natural keys, you should add an attribute that identifies the source system. This results in a multipart alternate key to identify the natural keys.

• In Figure 9.9, CustomerSK is the primary key in the customer dimension, CustomerNK is the natural key in the dimension and the primary key in the source system, SOR_NK is the SOR (system of record) indicator, and the multipart alternate key consists of the SOR_NK and CustomerNK columns.

[Figure 9.9]

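The relationship between the surrogate key and the multipart alternate key can be sketched with a toy in-memory dimension. The column names CustomerSK, SOR_NK, and CustomerNK follow Figure 9.9. The CustomerDimension class, its get_or_create method, the source system names, and the customers are assumptions made for the example.

```python
class CustomerDimension:
    """Toy in-memory customer dimension with a surrogate primary key."""

    def __init__(self):
        self.rows = []
        # Index on the multipart alternate key (SOR_NK, CustomerNK).
        self._sk_by_alternate_key = {}

    def get_or_create(self, sor_nk, customer_nk, name):
        """Return the CustomerSK for a source row, adding it if new."""
        alternate_key = (sor_nk, customer_nk)
        if alternate_key in self._sk_by_alternate_key:
            return self._sk_by_alternate_key[alternate_key]
        customer_sk = len(self.rows) + 1  # meaningless positive integer
        self.rows.append({
            "CustomerSK": customer_sk,
            "SOR_NK": sor_nk,
            "CustomerNK": customer_nk,
            "Name": name,
        })
        self._sk_by_alternate_key[alternate_key] = customer_sk
        return customer_sk


dim = CustomerDimension()
crm_sk = dim.get_or_create("CRM", "1001", "Acme Corp")
# Same natural key value, but from a different source system.
erp_sk = dim.get_or_create("ERP", "1001", "Globex Inc")
# Reloading the first source row finds the existing surrogate key.
again_sk = dim.get_or_create("CRM", "1001", "Acme Corp")

print(crm_sk, erp_sk, again_sk)
```

Without the SOR_NK column, the two source systems' identical natural key "1001" would collide.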
Dimensions - surrogate and natural keys (Cont.)
Benefits:
• The primary benefit of using a surrogate key as a dimension’s primary key is to provide an
identifier that is consistent and unique across source systems and time, and that is independent of
business systems.

• An additional benefit of the surrogate key is that, being integer-based, it is an efficient data type to index and join on in a relational model.

To summarize, dimensions should have the following characteristics:


• Unique rows.
• Surrogate keys used as primary keys.
• Non-NULL primary keys.

Dimensions – not null primary keys
• The foreign keys used as the primary key in fact tables should never
contain null values.

• For an example of how a null value gets assigned, suppose a row in a sales fact table has a null value in the customer identifier column, which is the foreign key linked to the customer dimension. The null value was written into that column because, while loading the data from the source systems, the ETL process could not find a customer associated with that sale: the value was unknown, missing, or invalid.

Dimensions – not null primary keys (Cont.)
• This condition clearly results in misleading analysis that potentially creates business risk. The best
practices that address this potential business risk include:
- Create row(s) in each dimension that are used when dimensional values are unknown,
missing, invalid, or other conditions in which referential integrity is not met.
- Because the numbering convention for surrogate keys is a positive integer, use negative
integers such as −999 for “missing” row keys.
- Dimensional rows have surrogate keys along with attributes used for naming and describing
them. Create a standard name and description for these rows used across all dimensions, e.g.,
“Missing,” “Unknown,” or “Invalid.”
- At a minimum, designate one row per dimension table for missing values, but if it is important
to be able to identify different conditions then use multiple rows. If there are multiple
conditions handled then use standard numbering and naming for each of these conditions
across all dimensions.
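
These practices can be sketched as an ETL key lookup that falls back to a reserved "Missing" row instead of writing a null. The −999 key and the "Missing" name come from the list above. The customer rows, the natural keys, and the lookup_customer_sk helper are hypothetical.

```python
MISSING_SK = -999  # reserved negative surrogate key for the "Missing" row

# Hypothetical customer dimension, including the standard "Missing" row.
dim_customer = {
    MISSING_SK: {"Name": "Missing"},
    1: {"Name": "Alice"},
    2: {"Name": "Bob"},
}

# Hypothetical lookup from source natural key to surrogate key.
customer_sk_by_nk = {"C-100": 1, "C-200": 2}


def lookup_customer_sk(natural_key):
    """Return the surrogate key, or the "Missing" key if no match exists."""
    return customer_sk_by_nk.get(natural_key, MISSING_SK)


# Source rows: one valid customer, one missing value, one invalid value.
source_sales = [("C-100", 20.0), (None, 5.0), ("C-999", 7.5)]

fact_sales = [(lookup_customer_sk(nk), amount) for nk, amount in source_sales]

# Every fact row now joins to a dimension row; no foreign key is null.
report = [(dim_customer[sk]["Name"], amount) for sk, amount in fact_sales]
print(report)
```

To distinguish conditions such as "Unknown" or "Invalid", further reserved negative keys could be added in the same way, numbered consistently across all dimensions.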
