Defining Slowly Changing Dimensions

Star Schema (Review)


The star schema is a partly normalized data model that is commonly used to
structure a data mart. It separates the data into two categories:

• Facts: transaction details such as invoice amount and number of items sold.
• Dimensions: context to the facts such as customer information and product information.

[Figure: star schema with a central Fact Table surrounded by the Product, Organization, Customer, and Time dimensions]

Copyright © SAS Institute Inc. All rights reserved.

Slowly Changing Dimensions: Introduction


Slowly changing dimensions refers to a process that updates dimension tables
in a star schema while preserving a history of changes in the same table.
There are various business reasons for keeping historical records, including:
• trend analysis
• change tracking
• historical reporting
• archiving records


Trend Analysis

A customer dimension can be analyzed for purchasing behavior as a function of income levels.
Income-based trends in purchasing behavior can be used to optimize marketing campaigns.

Change Tracking

Student records must be tracked over time to meet state and federal reporting mandates.
The reports are used to determine education spending.

Historical Reporting

Patients change their health-care provider. Health-care records from before the change can be
distinguished from health-care records after the change.

Archiving Records

Human resources departments keep histories of employee records, including job title and salary
history.

“Slowly” Changing Dimensions


Facts and dimensions update at different speeds.

• All newly acquired facts must be added to the fact table.

• A dimension table is affected only when there is a change.


Some examples for different dimension table types include:


• When dealing with a Customer entity, it might be useful to know various demographics (gender,
age, education, profession, income level, marital status) or which products a customer is
purchasing.
• When dealing with a Patient entity, it might be useful to know information such as gender, age,
illness, current drug medications, allergies or treatment regimens.
• When dealing with a Student entity, it might be useful to know information such as grade or class
level, gender, courses previously taken, courses currently taking, housing, or records of absences
and attendance.
• When dealing with a Provider entity, it might be useful to know information such as location,
ratings, network, or hospital affiliation.
• When dealing with an Organization entity such as an employee, it might be useful to know the
hierarchical structure of jobs within the organization and how an employee fits within this structure.
It might also be useful to know information like salary, job title, and the employee’s division,
department or group.
All of these types of information can change slowly over time. For example, every time a customer
orders a product, that order is recorded in the fact table, but the customer's personal information,
such as address, most likely changes much less often than the customer places orders.

Consider a Customer Dimension


The source for the Customer Dimension table could be a Customer table
in a DBMS.

[Figure: the Customer Database in the DBMS is the source for the Customer Dimension in the analytical data mart, a star schema with Product, Organization, Customer, and Time dimensions around a Fact Table]

The Customer table in the DBMS has one row per customer.

Consider a Customer Dimension


In the initial load:
• All the rows from the DBMS table and selected columns are extracted.
• The needed calculated columns are added.

[Figure: an "Initial Load" arrow from the Customer Database in the DBMS to the Customer Dimension in the analytical data mart's star schema]

Changes in the Database


Changes over the course of normal business operations:
• New customers are added.
• Existing customer properties change.
• Old customers are deleted.

[Figure: new customers, updates, and deletes flow into the Customer Database in the DBMS, next to the analytical data mart's star schema]

Updating the Dimension Table without SCD


Rerunning the same job that was used for the initial load
• updates the customer dimension table
• overwrites historical values in the customer dimension table.

[Figure: an "Overwrite" arrow from the Customer Database in the DBMS to the Customer Dimension in the analytical data mart's star schema]

Rerunning the initial load job replaces the target table.



Updating the Dimension Table with SCD


The process of slowly changing dimensions
• adds records for new customers
• adds records for changed customers
• keeps historical records.

[Figure: an "Update with SCD" arrow from the Customer Database in the DBMS to the Customer Dimension in the analytical data mart's star schema]

Idea Exchange
What are some reasons that you might have
to keep historical records for your data?


Types of Slowly Changing Dimensions


There are three types of slowly changing dimensions:

Type 1 No historical data is retained.

Type 2 Historical data is retained in rows.

Type 3 Historical data is retained in columns.

Before explaining the three types of slowly changing dimensions, it is useful
to consider the primary/foreign key relationships in a star schema type data
model.


Star Schema Keys


A star schema uses various keys:
• Business Key: identifies a business entity in a dimension table.
• Primary Key: uniquely identifies a row in a dimension.
• Foreign Key: associates a row in the fact table with the corresponding row in a dimension table.

Primary and Foreign Keys
Primary and foreign keys are used to maintain referential integrity.
• Each dimension table in a star schema has a primary key.

• The associated foreign keys are in the fact table.


Referential integrity constraints include


• primary keys, in the dimension tables (must be unique and not null)
• foreign keys, in the fact table.
A primary key value cannot be deleted if it exists in a foreign key. A foreign key can have only values
that exist in the primary key.
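
The two rules above can be tried out with a small sketch. It uses Python's built-in sqlite3 module purely for illustration; the table names (customer_dim, sales_fact), the columns, and the sample rows are assumptions and do not come from the course data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE customer_dim ("
    " customer_id INTEGER PRIMARY KEY,"  # primary key: unique and not null
    " name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE sales_fact ("
    " sale_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer_dim (customer_id),"
    " amount REAL)"
)
conn.execute("INSERT INTO customer_dim VALUES (1, 'A Customer')")
conn.execute("INSERT INTO sales_fact VALUES (100, 1, 25.0)")


def violates_integrity(sql):
    """Return True if the statement is rejected by a key constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# A foreign key can have only values that exist in the primary key.
orphan_rejected = violates_integrity("INSERT INTO sales_fact VALUES (101, 99, 10.0)")
# A primary key value cannot be deleted while a foreign key still refers to it.
delete_rejected = violates_integrity("DELETE FROM customer_dim WHERE customer_id = 1")
```

Both statements are refused by the database, so the fact table never points at a dimension row that does not exist.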

Primary and Foreign Keys


Primary and foreign keys are also used to do the following:
• relate records in the fact table to records in the dimension tables
• join the fact table with any dimension tables for analysis and reporting


The SCD practice is mainly concerned with the relationship aspect of the primary and foreign keys
and not so much with the referential constraints.

Business Keys
Often, the business key in a dimension table can function as a primary key
in that table. Here are two examples:
• Customer_ID in the customer dimension table
• Product_ID in the product dimension table


Example of Slowly Changing Dimensions


Consider an Employee Salary table to explain the three types of slowly
changing dimensions.
In the Employee Salary table, Emp_ID
• is the business key (each value of Emp_ID uniquely identifies a business
entity, in this case an individual employee).
• can be used as a primary key.

Employee Salary Information


Emp_ID (PK) Emp_Name Year Salary
1249 K Munch 2004 $42,000


Type 1 Slowly Changing Dimension


A Type 1 slowly changing dimension is updated by replacing old values with
new values.
In the year 2005, the salary of K Munch changes. The old year (2004) and
salary ($42,000) are updated by replacing them with the new values.
With a Type 1 SCD, old values are not recoverable. The table contains current
information only.

Employee Salary Information


Emp_ID (PK) Emp_Name Year Salary
1249 K Munch 2005 $45,000
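
A minimal sketch of the Type 1 update above, assuming the dimension is held in a Python dictionary keyed by Emp_ID. The function name apply_type1 is hypothetical, and salaries are stored as plain integers.

```python
# Dimension table keyed by the business key Emp_ID, which is also the primary key here.
employee_dim = {
    1249: {"Emp_Name": "K Munch", "Year": 2004, "Salary": 42000},
}


def apply_type1(dim, emp_id, changes):
    """Type 1: overwrite the changed columns in place; add the row if it is new."""
    row = dim.setdefault(emp_id, {})
    row.update(changes)
    return dim


# The 2005 salary change replaces the 2004 values; nothing of the old row survives.
apply_type1(employee_dim, 1249, {"Year": 2005, "Salary": 45000})
```

After the call the table still has one row per employee, which is why Emp_ID can remain the primary key under Type 1.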


Type 2 Slowly Changing Dimension


A Type 2 slowly changing dimension is updated by adding a record when
a change occurs.
In this Type 2 SCD, a record is added with the new year and new salary
values. Emp_ID continues to identify K Munch as the business key.
However, with this additional row to record the change, Emp_ID can
no longer be used as a primary key.

Employee Salary Information


Emp_ID Emp_Name Year Salary
1249 K Munch 2004 $42,000
1249 K Munch 2005 $45,000

With Type 2 SCD, the business key will have duplicates, so it cannot function as a primary key.
There are several possible solutions to this problem.

Type 2 SCD with Effective/Expiration Datetime


An Effective Datetime column and an Expiration Datetime column can be
added to the table. These columns can be used to specify the current record.
The combination of Emp_ID with Effective Datetime can function as a primary
key in this table.
Employee Salary Information
Emp_ID  Effective Datetime  Emp_Name  Year  Salary   Expiration Datetime
1249    01Jan2004 12:00PM   K Munch   2004  $42,000  31Dec2004 11:59PM
1249    01Jan2005 12:00PM   K Munch   2005  $45,000
PK: Emp_ID + Effective Datetime

Using an effective date, effective datetime, begin date, or begin datetime is a common practice for
SCD Type 2.
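
The effective/expiration approach might be sketched as follows. This is an illustration only, under several assumptions: rows are Python dictionaries, a missing expiration (None) marks the current record, the old record expires one minute before the new one takes effect, and effective times are set to midnight for simplicity. The function name apply_type2 is hypothetical.

```python
from datetime import datetime, timedelta

employee_dim = [
    {"Emp_ID": 1249, "Effective": datetime(2004, 1, 1), "Emp_Name": "K Munch",
     "Year": 2004, "Salary": 42000, "Expiration": None},
]


def apply_type2(dim, emp_id, changes, effective):
    """Expire the current row for emp_id, then append a new current row."""
    current = next(
        (r for r in dim if r["Emp_ID"] == emp_id and r["Expiration"] is None),
        None,
    )
    new_row = {"Emp_ID": emp_id, "Effective": effective, "Expiration": None}
    if current is not None:
        # Assumed convention: the old row ends one minute before the new one starts.
        current["Expiration"] = effective - timedelta(minutes=1)
        # Carry unchanged columns (such as Emp_Name) over to the new row.
        new_row = {**current, **new_row}
    new_row.update(changes)
    dim.append(new_row)
    return new_row


apply_type2(employee_dim, 1249, {"Year": 2005, "Salary": 45000}, datetime(2005, 1, 1))

# Emp_ID alone now repeats, but Emp_ID plus Effective is still unique.
primary_keys = {(r["Emp_ID"], r["Effective"]) for r in employee_dim}
```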

Type 2 SCD with Retained Key/Current Indicator


Some tables are set up with a Retained Key column and an Effective Datetime
column. The combination of Retained Key with Effective Datetime functions
as a primary key. The Current Indicator column identifies the current record.
Employee Salary Information
Emp_ID  Retained Key  Effective Datetime  Emp_Name  Year  Salary   Current Indicator
1249    1             01Jan2004 12:00PM   K Munch   2004  $42,000  0
1249    1             01Jan2005 12:00PM   K Munch   2005  $45,000  1
PK: Retained Key + Effective Datetime

A retained key has a generated value.


• A retained key is typically implemented as a uniformly incrementing integer except that it retains its
value for the same business key value.
• A retained key is often added to a table if the business key is composite or otherwise complex.
• The retained key gives better performance than a complex business key in joins. But because a
retained key is not unique, it cannot be used on its own as a primary key.
• The combination of retained key with an effective date or datetime can provide a primary key.
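
One way to picture how a retained key gets its value is the sketch below. The class RetainedKeyGenerator and the composite business keys (a region code plus a local employee number) are hypothetical and are used only to show the "incrementing, but retained per business key" behavior described above.

```python
class RetainedKeyGenerator:
    """Hands out incrementing integers, reusing the value for a known business key."""

    def __init__(self):
        self._keys = {}

    def key_for(self, business_key):
        # A new business key gets the next integer; a known one keeps its value.
        if business_key not in self._keys:
            self._keys[business_key] = len(self._keys) + 1
        return self._keys[business_key]


gen = RetainedKeyGenerator()
k1 = gen.key_for(("US", 1249))  # first time this entity is seen
k2 = gen.key_for(("EU", 77))    # a different entity
k3 = gen.key_for(("US", 1249))  # the first entity again, for example a changed row
```

Because k1 and k3 are equal, the retained key repeats across the history rows of one entity, which is why it needs an effective date or datetime alongside it to form a primary key.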

Type 2 SCD with Surrogate Key/Current Indicator


Alternatively, a Surrogate Key column and a Current Indicator column can
be added to the table.
The Surrogate Key column provides a primary key and the Current Indicator
column identifies the current record. A surrogate key is a unique key that
has a generated value.
Employee Salary Information
Emp_ID  Surrogate Key  Emp_Name  Year  Salary   Current Indicator
1249    1              K Munch   2004  $42,000  0
1249    2              K Munch   2005  $45,000  1
PK: Surrogate Key

A surrogate key is a unique key that has a generated value. It is typically implemented as a uniformly
incrementing integer.
The current indicator is either 1, representing a current record, or 0, representing an old record.
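
A minimal sketch of a Type 2 update that uses a surrogate key and a current indicator, assuming rows are Python dictionaries. The names apply_type2_surrogate and current_row are hypothetical, and the counter simply continues from the one surrogate key value already in use.

```python
from itertools import count

employee_dim = [
    {"Emp_ID": 1249, "Surrogate_Key": 1, "Emp_Name": "K Munch",
     "Year": 2004, "Salary": 42000, "Current": 1},
]
next_surrogate = count(start=2)  # key 1 is already taken by the row above


def apply_type2_surrogate(dim, emp_id, changes):
    """Flag the old row as not current and append a new current row."""
    new_row = {"Emp_ID": emp_id}
    for row in dim:
        if row["Emp_ID"] == emp_id and row["Current"] == 1:
            row["Current"] = 0
            new_row = dict(row)  # carry unchanged columns over
    new_row.update(changes)
    new_row["Surrogate_Key"] = next(next_surrogate)  # unique for every row
    new_row["Current"] = 1
    dim.append(new_row)
    return new_row


def current_row(dim, emp_id):
    """Return the current record for a business key, or None if there is none."""
    return next((r for r in dim if r["Emp_ID"] == emp_id and r["Current"] == 1), None)


apply_type2_surrogate(employee_dim, 1249, {"Year": 2005, "Salary": 45000})
```

The surrogate key never repeats, so it can serve as the primary key on its own, while the current indicator answers the question "which row is the latest for this employee?"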

Type 3 Slowly Changing Dimension


A Type 3 slowly changing dimension is updated by moving the old value
to a column that holds old values and adding the new value to a column
that holds the current value.
Employee Salary Information
Emp_ID (PK) Emp_Name Year Salary Old Year Old Salary
1249 K Munch 2005 $45,000 2004 $42,000

The Type 3 slowly changing dimension is limited to retaining only a fixed
number of historical values.
Employee Salary Information
Emp_ID (PK) Emp_Name Year Salary Old Year Old Salary
1249 K Munch 2006 $50,000 2005 $45,000
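
The two updates shown in the tables above can be sketched as follows, assuming one dictionary per employee with a single pair of "old" columns. The function name apply_type3 is hypothetical.

```python
employee_dim = {
    1249: {"Emp_Name": "K Munch", "Year": 2004, "Salary": 42000,
           "Old_Year": None, "Old_Salary": None},
}


def apply_type3(dim, emp_id, year, salary):
    """Type 3: move the current values into the 'old' columns, then store the new ones."""
    row = dim[emp_id]
    row["Old_Year"], row["Old_Salary"] = row["Year"], row["Salary"]
    row["Year"], row["Salary"] = year, salary


apply_type3(employee_dim, 1249, 2005, 45000)
after_2005 = dict(employee_dim[1249])  # snapshot that matches the first table

apply_type3(employee_dim, 1249, 2006, 50000)
# With only one pair of "old" columns, the 2004 values are now gone.
```

Keeping more history under Type 3 would require adding more columns, which is the fixed limit the slide refers to.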

Updating a Star Schema


SAS Data Integration Studio provides transformations to implement Type 1
and Type 2 slowly changing dimensions.

SCD Type 1 transformation  Updates a dimension table, with surrogate key generation and change column selection.

SCD Type 2 transformation  Updates a dimension table, with configurable primary key generation and date/version/current record identification.

Lookup transformation  Loads a fact table, using dimension tables as lookup tables for key values.


The Lookup transformation is a generic lookup utility. It loads a target table with columns from a
source table and columns from any number of lookup tables.
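
The idea of loading a fact table through a key lookup can be illustrated in plain Python. This is not how the SAS Data Integration Studio transformation is implemented; it is only a sketch of the concept, and the tables, column names, sample values, and the function load_fact are all hypothetical.

```python
# Hypothetical Type 2 customer dimension with a surrogate key and a current indicator.
customer_dim = [
    {"Customer_Key": 1, "Customer_ID": "C001", "Current": 0},
    {"Customer_Key": 2, "Customer_ID": "C001", "Current": 1},
    {"Customer_Key": 3, "Customer_ID": "C002", "Current": 1},
]

# Hypothetical source rows that carry only the business key.
source_rows = [
    {"Customer_ID": "C001", "Amount": 120.0},
    {"Customer_ID": "C002", "Amount": 75.5},
    {"Customer_ID": "C999", "Amount": 10.0},  # no matching dimension row
]


def load_fact(rows, dim):
    """Swap each business key for the surrogate key of the current dimension row."""
    lookup = {d["Customer_ID"]: d["Customer_Key"] for d in dim if d["Current"] == 1}
    fact, rejected = [], []
    for row in rows:
        key = lookup.get(row["Customer_ID"])
        if key is None:
            rejected.append(row)  # keep unmatched rows aside for review
        else:
            fact.append({"Customer_Key": key, "Amount": row["Amount"]})
    return fact, rejected


fact_table, rejected_rows = load_fact(source_rows, customer_dim)
```

Setting unmatched rows aside, as done here, is one possible design choice; it keeps the fact table free of foreign key values that have no dimension row.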

Setup for the Poll


Consider the following records in a customer dimension table:


3.01 Multiple Choice Poll


Given that the business key is contained in the Customer_ID column, does
the Key column contain a retained key or a surrogate key?

a. Retained key
b. Surrogate key
c. There is not enough information to decide.
d. It contains neither a surrogate nor a retained key.
