
Dimensional Modeling


3.0 Introduction to Dimensional Modeling

Eugene Rex L. Jalao, Ph.D.


Associate Professor
Department of Industrial Engineering and Operations Research
University of the Philippines Diliman

Module 2 of the Business Intelligence and Analytics Track of
UP NEC and the UP Center of Business Intelligence
Outline for This Training

1. Introduction to Data Warehousing
2. DW Lifecycle and Project Management
– Case Study on DW PM
3. Dimensional Modeling
4. Designing Fact Tables
5. Designing Dimension Tables
– Case Study on Dimension Modeling
6. Extraction Transformation and Loading
– Case Study on ETL Planning
7. Transformation and Loading Methodologies
– Case Study on ETL

E.R. L. Jalao, UP NEC, [email protected] 2


Outline for This Session

• Inmon versus Kimball Paradigm
• What is Dimensional Modeling?
• Why not Relational Modeling?
• Examples of Dimensional Modeling
• Fact and Dimension Tables
• Designing the Dimensional Model


Inmon versus Kimball Paradigm

• Two Models for Data Warehouses
– Inmon Model
– Kimball Model


Inmon versus Kimball Paradigm

• Inmon Model
– Consists of all databases and information systems in an
organization
– Also called the CIF (Corporate Information Factory)
– Defines overall database environment as:
• Operational
• Atomic data warehouse
• Departmental
• Individual
– The Warehouse is part of the bigger whole (CIF)



Inmon versus Kimball Paradigm

Figure 3.1: Inmon Model



Inmon versus Kimball Paradigm

• Kimball Model
– The Dimensional Data Model
• Does not adhere to normalization theory
• Starts with tables
– Numeric Tables
– Context Tables
• User accessible



Inmon versus Kimball Paradigm

Figure 3.2: Kimball Model



Inmon versus Kimball Paradigm

Table 3.1: Comparison of the Inmon and Kimball Model

                       | Inmon            | Kimball
Overall Approach       | Top-Down         | Bottom-Up
Complexity of Method   | Complex          | Simple
Data Orientation       | Data Driven      | Process Oriented
Tools                  | Traditional ERDs | Dimensional Modeling
End User Accessibility | Low              | High


Inmon versus Kimball Paradigm

Table 3.2: Philosophy Comparison of the Inmon and Kimball Model

                 | Inmon                                                      | Kimball
Primary Audience | IT                                                         | End Users
Objective        | Deliver a Sound Technical Solution Based on Proven Methods | Deliver a Solution that makes it easy for end users to directly query data


Inmon versus Kimball Paradigm

Table 3.3: How to Choose, Inmon versus Kimball Model?


                              | Favors Inmon                                                   | Favors Kimball
Planning Horizon              | Strategic                                                      | Tactical
Data Integration Requirements | Enterprise-Wide Integration                                    | Individual Business Areas
Time to Delivery              | Longer Start-up Time                                           | Need for First Data Warehouse is Urgent
Cost                          | Higher start-up costs, with lower subsequent project dev costs | Lower start-up costs, with each subsequent project costing the same
Staffing Requirements         | Large Teams of Specialists                                     | Small Teams of Generalists




What is Dimensional Modeling?

• Dimensional modeling is a logical design technique for
structuring data so that
– It is intuitive for business users
– It delivers fast query performance
• Widely accepted as the preferred approach for DW
presentation.
• Simplicity is fundamental to usefulness.
• Allows software to easily navigate databases.


What is Dimensional Modeling?

Figure 3.3: The Kimball Lifecycle


What is Dimensional Modeling?

Definition 3.1: Dimensional Modeling


• Divides world into measurements and context.
• Measurements are numeric values called facts.
• Context intuitively divided into clumps called dimensions.
• Dimensions describe the “who, what, where, when, why,
and how” of the facts.



What is Dimensional Modeling?

Definition 3.2: Dimensional Model


• A dimensional model consists of a fact table containing
measurements surrounded by a halo of dimension tables
containing textual context.
• Known as a star join.
• Known as a star schema when stored in a relational
database (RDBMS).
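
Definition 3.2 can be sketched as a small star schema. This is a minimal illustration, assuming Python's built-in sqlite3 module as the RDBMS. The table names sales_fact, product_dim, and date_dim follow the query template used in this deck; the category and full_date columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (
    productkey INTEGER PRIMARY KEY,
    brand      TEXT,
    category   TEXT          -- hypothetical attribute
);
CREATE TABLE date_dim (
    datekey   INTEGER PRIMARY KEY,
    full_date TEXT,          -- hypothetical attribute
    quarter   TEXT
);
-- The fact table holds the measurements plus one key per dimension.
CREATE TABLE sales_fact (
    productkey INTEGER REFERENCES product_dim(productkey),
    datekey    INTEGER REFERENCES date_dim(datekey),
    pesos_sold REAL,
    units_sold INTEGER
);
""")

# List the tables that make up the star.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

# The dimension tables that the fact table points to (the "halo").
fact_parents = sorted(row[2] for row in conn.execute(
    "PRAGMA foreign_key_list(sales_fact)"))

print(tables)
print(fact_parents)
```

The fact table is the only table with foreign keys, and every dimension is exactly one join away from it.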



What is Dimensional Modeling?

Figure 3.4: Typical Dimensional Model


Standard SQL Query Template

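-- Dimension attribute as the row header, summed facts as the metrics
SELECT p.brand,
       SUM(f.pesos_sold),
       SUM(f.units_sold)
FROM sales_fact f
JOIN product_dim p ON f.productkey = p.productkey
JOIN date_dim d ON f.datekey = d.datekey
-- Constraint on a dimension attribute
WHERE d.quarter = '1 Q 2015'
GROUP BY p.brand
ORDER BY p.brand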



Typical Dimensional Answer Set

Brand    Pesos Sales   Unit Sales
Axon             780          263
Framis          1044          509
Widget           213          444
Zapper            95           39

Brand is a dimension attribute; Pesos Sales and Unit Sales are fact table metrics.
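
The standard query template can be tried end to end on a toy star schema. This is a sketch using Python's built-in sqlite3 module; the rows below are made up for illustration and are not the data behind the answer set shown on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (productkey INTEGER PRIMARY KEY, brand TEXT);
CREATE TABLE date_dim (datekey INTEGER PRIMARY KEY, quarter TEXT);
CREATE TABLE sales_fact (productkey INTEGER, datekey INTEGER,
                         pesos_sold REAL, units_sold INTEGER);
INSERT INTO product_dim VALUES (1, 'Axon'), (2, 'Framis');
INSERT INTO date_dim VALUES (10, '1 Q 2015'), (20, '2 Q 2015');
INSERT INTO sales_fact VALUES
    (1, 10, 100.0, 4), (1, 10, 50.0, 2),   -- Axon, 1 Q 2015
    (2, 10, 30.0, 1),                      -- Framis, 1 Q 2015
    (2, 20, 999.0, 9);                     -- Framis, 2 Q 2015 (filtered out)
""")

# Same shape as the template: join fact to dimensions, constrain on a
# dimension attribute, group by a dimension attribute, sum the facts.
answer_set = conn.execute("""
    SELECT p.brand, SUM(f.pesos_sold), SUM(f.units_sold)
    FROM sales_fact f
    JOIN product_dim p ON f.productkey = p.productkey
    JOIN date_dim d ON f.datekey = d.datekey
    WHERE d.quarter = '1 Q 2015'
    GROUP BY p.brand
    ORDER BY p.brand
""").fetchall()

for row in answer_set:
    print(row)
```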



Creating a Report by Drag and Drop



Relating a Star Schema to a Report

• Drilling down = “give me more detail” by adding a row
header (to an existing SQL request)
• Real drill down can mix hierarchical and non-hierarchical
attributes from all available dimensions
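
Drilling down by adding a row header can be sketched as adding one more column to the SELECT and GROUP BY lists. The report helper, the flavor attribute, and the sample rows are hypothetical; Python's built-in sqlite3 module stands in for the warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (productkey INTEGER PRIMARY KEY,
                          brand TEXT, flavor TEXT);
CREATE TABLE sales_fact (productkey INTEGER, units_sold INTEGER);
INSERT INTO product_dim VALUES
    (1, 'Axon', 'Lemon'), (2, 'Axon', 'Mint'), (3, 'Framis', 'Lemon');
INSERT INTO sales_fact VALUES (1, 5), (2, 7), (3, 2), (1, 1);
""")

def report(row_headers):
    """Sum units_sold grouped by the given dimension attributes."""
    cols = ", ".join(row_headers)  # trusted column names, not user input
    sql = (f"SELECT {cols}, SUM(f.units_sold) FROM sales_fact f "
           f"JOIN product_dim p ON f.productkey = p.productkey "
           f"GROUP BY {cols} ORDER BY {cols}")
    return conn.execute(sql).fetchall()

summary = report(["p.brand"])              # one row header
detail = report(["p.brand", "p.flavor"])   # drill down: add a row header

print(summary)
print(detail)
```

The second call reuses the same request and only adds a row header, so the detail rows always sum back to the summary rows.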



Dimension Attributes Yield Interesting Results
• Dimension attributes are the source of most interesting
constraints
• Examples
– Slice sales by product category, by region, by barangay
– Analyze sales effectiveness on radio promotions via the AdType
attribute in Promotions dimension
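
The radio promotion example might look like the following sketch, where the constraint is placed on a dimension attribute rather than on the fact table. The promotion_dim layout, the ad_type column name (standing in for AdType), and the rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE promotion_dim (promotionkey INTEGER PRIMARY KEY,
                            promotion_name TEXT, ad_type TEXT);
CREATE TABLE sales_fact (promotionkey INTEGER, pesos_sold REAL);
INSERT INTO promotion_dim VALUES
    (1, 'Summer Jingle', 'Radio'), (2, 'Sunday Insert', 'Newspaper'),
    (3, 'Drive-Time Spot', 'Radio');
INSERT INTO sales_fact VALUES (1, 200.0), (2, 120.0), (3, 80.0), (1, 20.0);
""")

# The WHERE clause constrains a dimension attribute, not a fact.
radio_sales = conn.execute("""
    SELECT pr.promotion_name, SUM(f.pesos_sold)
    FROM sales_fact f
    JOIN promotion_dim pr ON f.promotionkey = pr.promotionkey
    WHERE pr.ad_type = 'Radio'
    GROUP BY pr.promotion_name
    ORDER BY pr.promotion_name
""").fetchall()

print(radio_sales)
```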



Two Paradigms

• Relational Modeling
• Dimensional Modeling


Review: Relational Modeling

• Widely used method in most databases nowadays
• Data is divided into discrete entities
– Each entity becomes a table in the relational database
• Models are shown in two forms – logical and physical
• Logical models are designed to be independent of any
particular RDBMS.
– The “tables” in a logical model are called entities. The “columns”
are called attributes.


Review: Relational Modeling

• Physical models are derived from logical models but are
specific to a given RDBMS.
• Each entity has a unique identifier known as its primary
key.
• The primary key consists of one or more
attributes/columns.


Normalized Models

• Designed to eliminate redundancies. Other than keys,
each attribute may appear in only one table.
• Design objective: a Third Normal Form (3NF) model.
• Modeling business processes results in numerous data
entities/tables and a spaghetti-like interweaving of
relationships among them.
– Some ERP systems have tens of thousands of tables.
– Even a small model can be challenging.


Northwind Normalized Model



Normalized Models NOT Good for DW Systems
• Not usable by end-users – too complicated and confusing
• Not usable for DW queries – performance too slow (many
joins)


Normalized Models Best for Operational Systems
• Normalized models essential to good operational systems
– Excellent for capturing and understanding the business (rules)
• One PO, multiple Line Items
– Great for speed when processing individual transactions


Observations on Relational Models

• Normalized models look very different from dimensional
models
– Normalized models confuse business users
– Business users see their business in dimensional models
• Dimensional models may contain more content than
normalized models
– History
– Enhanced with content from external sources


Two Key Benefits of Dimensional Modeling à la Kimball
• Understandability
– Model must be easily understood by business users
– Yet represent complexities of the business
• Performance
– Fast response to queries that summarize millions of rows is essential
– Models are limited to single-level joins rather than multi-level joins
– Denormalization has a significant impact on performance
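
The difference between multi-level and single-level joins can be sketched by answering the same question against a normalized product-to-category chain and against a denormalized product dimension. All table names and rows here are hypothetical, and the sketch only compares the shape of the queries; it does not measure performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized form: product -> category (two join levels from the fact)
CREATE TABLE category (categorykey INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE product (productkey INTEGER PRIMARY KEY, categorykey INTEGER);
-- Denormalized dimension: category name repeated on each product row
CREATE TABLE product_dim (productkey INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE sales_fact (productkey INTEGER, units_sold INTEGER);

INSERT INTO category VALUES (1, 'Snacks'), (2, 'Drinks');
INSERT INTO product VALUES (10, 1), (11, 1), (12, 2);
INSERT INTO product_dim VALUES (10, 'Snacks'), (11, 'Snacks'), (12, 'Drinks');
INSERT INTO sales_fact VALUES (10, 3), (11, 4), (12, 5);
""")

multi_level = conn.execute("""
    SELECT c.category_name, SUM(f.units_sold)
    FROM sales_fact f
    JOIN product p ON f.productkey = p.productkey
    JOIN category c ON p.categorykey = c.categorykey
    GROUP BY c.category_name ORDER BY c.category_name
""").fetchall()

single_level = conn.execute("""
    SELECT p.category_name, SUM(f.units_sold)
    FROM sales_fact f
    JOIN product_dim p ON f.productkey = p.productkey
    GROUP BY p.category_name ORDER BY p.category_name
""").fetchall()

print(multi_level)
print(single_level)
```

Both queries return the same rows; the dimensional version reaches the category attribute with one join from the fact table instead of two.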



Benefits of Dimensional Models

• Predictable, Standard Framework
– Users recognize that this is “their business”
– Report writers, query tools, and user interfaces can be built into BI
tools
– Makes user interfaces more understandable
– Makes processing more efficient


Benefits of Dimensional Models

• Gracefully Extensible to Accommodate Change
– Existing tables can be changed by adding new data rows
• Data should not have to be reloaded
– No query tool or reporting tool has to be reprogrammed
– Old BI applications continue to run without yielding different
results
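
One kind of graceful change can be sketched as follows: a dimension is extended in place with a new attribute, and an existing report query keeps returning the same rows. The package_size attribute and the sample data are hypothetical, and adding a column is used here as an assumed example of a change that does not require reloading facts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (productkey INTEGER PRIMARY KEY, brand TEXT);
CREATE TABLE sales_fact (productkey INTEGER, units_sold INTEGER);
INSERT INTO product_dim VALUES (1, 'Axon'), (2, 'Framis');
INSERT INTO sales_fact VALUES (1, 5), (2, 2), (1, 1);
""")

# An "old BI application": a report query written before the change.
OLD_REPORT = """
    SELECT p.brand, SUM(f.units_sold)
    FROM sales_fact f JOIN product_dim p ON f.productkey = p.productkey
    GROUP BY p.brand ORDER BY p.brand
"""
before = conn.execute(OLD_REPORT).fetchall()

# Extend the dimension in place; no fact rows are reloaded.
conn.execute("ALTER TABLE product_dim ADD COLUMN package_size TEXT")
conn.execute("UPDATE product_dim SET package_size = 'Large' WHERE productkey = 1")

after = conn.execute(OLD_REPORT).fetchall()
print(before == after)
```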



Benefits of Dimensional Models

• Star Join Schema is Symmetrical
– Every dimension is equivalent
– All dimensions are symmetrically equal entry points to the fact table
• No concern about order in selecting tables
– Logical design can be done nearly independent of expected query
patterns
• Future queries not thought of can be accommodated easily
– User interfaces, query strategies, and SQL generated are all
symmetrical


Benefits of Dimensional Models

• Standard Approaches for Common Modeling Situations
– Role-playing dimensions
• Sales Date versus Received Date
– Slowly changing dimensions
– Heterogeneous products
• Need to track lines of business together
• But each LOB product set is highly idiosyncratic
– And more…
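
The role-playing case (Sales Date versus Received Date) can be sketched by joining one physical date dimension to the fact table twice under different aliases. The order_fact table, its key column names, and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim (datekey INTEGER PRIMARY KEY, month_name TEXT);
CREATE TABLE order_fact (sales_datekey INTEGER, received_datekey INTEGER,
                         units INTEGER);
INSERT INTO date_dim VALUES (1, 'January'), (2, 'February');
INSERT INTO order_fact VALUES (1, 1, 10), (1, 2, 4), (2, 2, 6);
""")

# date_dim plays two roles: sd = sales date, rd = received date.
rows = conn.execute("""
    SELECT sd.month_name, rd.month_name, SUM(f.units)
    FROM order_fact f
    JOIN date_dim sd ON f.sales_datekey = sd.datekey
    JOIN date_dim rd ON f.received_datekey = rd.datekey
    GROUP BY sd.datekey, sd.month_name, rd.datekey, rd.month_name
    ORDER BY sd.datekey, rd.datekey
""").fetchall()

for row in rows:
    print(row)
```

In practice the two roles are often exposed as separate views so that report column headers stay unambiguous; the aliases above are the smallest form of the same idea.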



Benefits of Dimensional Models

• Aggregate Management
– Aggregate tables are summary tables
• Example: monthly sales fact table with month dimension
– A sound aggregate strategy is essential to good performance and
economic processing
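
The monthly sales example can be sketched as a summary table built from the daily fact table and joined to a month dimension. The key formats, table names, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim (datekey INTEGER PRIMARY KEY, month_key INTEGER);
CREATE TABLE month_dim (month_key INTEGER PRIMARY KEY, month_name TEXT);
CREATE TABLE sales_fact (datekey INTEGER, pesos_sold REAL);
INSERT INTO month_dim VALUES (201501, 'January 2015'), (201502, 'February 2015');
INSERT INTO date_dim VALUES (20150105, 201501), (20150120, 201501),
                            (20150203, 201502);
INSERT INTO sales_fact VALUES (20150105, 100.0), (20150120, 50.0),
                              (20150203, 70.0), (20150203, 5.0);

-- Aggregate (summary) fact table at the month grain.
CREATE TABLE monthly_sales_fact AS
    SELECT d.month_key, SUM(f.pesos_sold) AS pesos_sold
    FROM sales_fact f JOIN date_dim d ON f.datekey = d.datekey
    GROUP BY d.month_key;
""")

monthly = conn.execute("""
    SELECT m.month_name, a.pesos_sold
    FROM monthly_sales_fact a JOIN month_dim m ON a.month_key = m.month_key
    ORDER BY a.month_key
""").fetchall()

# The aggregate must agree with the base fact table.
base_total = conn.execute("SELECT SUM(pesos_sold) FROM sales_fact").fetchone()[0]
agg_total = conn.execute("SELECT SUM(pesos_sold) FROM monthly_sales_fact").fetchone()[0]

print(monthly)
```

Monthly reports can read the much smaller aggregate table, while the base fact table remains available for daily detail.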



Star Schema Example



With Dimension Families



Sample Data



Sample Fact Table Rows



Sample Dimension Table



Sample Dimension Table



Sample Queries

• What was the best selling product category last week?

SELECT product_category, SUM(sales_dollars)
FROM sales_fact sf, sales_date sd, product p
WHERE last_week_ind = 'Y' AND <JOIN Statements>
GROUP BY product_category
-- keep only the top-ranked category (rank filter syntax varies by RDBMS)
HAVING RANK(SUM(sales_dollars)) < 2



Sample Queries

• Which stores sold the most of product category ‘ABC’ last
week?

SELECT store, SUM(sales_dollars)
FROM sales_fact sf, sales_date sd, product p
WHERE last_week_ind = 'Y' AND
product_category = 'ABC' AND <JOIN Statements>
GROUP BY store
-- keep the five top-ranked stores (rank filter syntax varies by RDBMS)
HAVING RANK(SUM(sales_dollars)) < 6
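
The two sample queries can be made runnable as a sketch, with some stated substitutions. The rank filter is replaced by ORDER BY ... DESC LIMIT n, which SQLite supports; the join conditions, the key columns, and the store dimension table are assumptions, since the slides leave the joins as a placeholder. All rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_date (datekey INTEGER PRIMARY KEY, last_week_ind TEXT);
CREATE TABLE product (productkey INTEGER PRIMARY KEY, product_category TEXT);
CREATE TABLE store (storekey INTEGER PRIMARY KEY, store TEXT);  -- hypothetical
CREATE TABLE sales_fact (datekey INTEGER, productkey INTEGER,
                         storekey INTEGER, sales_dollars REAL);
INSERT INTO sales_date VALUES (1, 'Y'), (2, 'N');
INSERT INTO product VALUES (1, 'ABC'), (2, 'XYZ');
INSERT INTO store VALUES (1, 'Diliman'), (2, 'Makati');
INSERT INTO sales_fact VALUES
    (1, 1, 1, 40.0), (1, 1, 2, 90.0), (1, 2, 1, 60.0), (2, 2, 2, 500.0);
""")

# Assumed join conditions shared by both queries.
JOINS = """
    FROM sales_fact sf
    JOIN sales_date sd ON sf.datekey = sd.datekey
    JOIN product p ON sf.productkey = p.productkey
    JOIN store s ON sf.storekey = s.storekey
"""

# Best selling product category last week (top 1).
best_category = conn.execute(f"""
    SELECT p.product_category, SUM(sf.sales_dollars) {JOINS}
    WHERE sd.last_week_ind = 'Y'
    GROUP BY p.product_category
    ORDER BY SUM(sf.sales_dollars) DESC LIMIT 1
""").fetchall()

# Stores that sold the most of category 'ABC' last week (top 5).
top_stores = conn.execute(f"""
    SELECT s.store, SUM(sf.sales_dollars) {JOINS}
    WHERE sd.last_week_ind = 'Y' AND p.product_category = 'ABC'
    GROUP BY s.store
    ORDER BY SUM(sf.sales_dollars) DESC LIMIT 5
""").fetchall()

print(best_category)
print(top_stores)
```

Unlike a true rank filter, LIMIT does not keep ties at the cutoff, so this substitution is only equivalent when the totals are distinct.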



Sample Report

• Business Analysis
– How did last month's profit relate to store size?
• Report



Designing the Dimensional Model Steps

• Establish Naming Conventions
• Do the Four-Step Dimensional Modeling Process
• Document the High Level Data Model Diagram
• Define the Data Sources
• Document the Detailed Table Designs
• Develop Detailed Bus Matrix
• Identify, Track, and Resolve Issues


Establishing Naming Conventions

• Use descriptive and consistent data names. Reasons:
– Names become column headers in reports. Column names must be
unambiguous. Example: not just City, but Customer City or Supplier
City
• Use a standard naming convention
– PrimeWord_ZeroOrMoreQualifiers_ClassWord
• Dimension names – product_key, product_category_code,
product_category_name
• Fact names – item_amount, order_amount
• Know the naming rules of your RDBMS
– ProductKey, ProductCategoryCode, …
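
A naming convention of this kind can be checked mechanically. The sketch below assumes the snake_case form of PrimeWord_ZeroOrMoreQualifiers_ClassWord and a hypothetical list of approved class words; a real project would define its own list and rules.

```python
import re

# Hypothetical list of approved class words; a real project defines its own.
CLASS_WORDS = {"key", "code", "name", "amount", "date", "count"}

# At least two lowercase words joined by underscores.
NAME_PATTERN = re.compile(r"^[a-z]+(_[a-z]+)+$")

def follows_convention(column_name):
    """True if the name is prime_word[_qualifiers]_class_word in snake case."""
    if not NAME_PATTERN.match(column_name):
        return False
    return column_name.rsplit("_", 1)[1] in CLASS_WORDS

checked = {name: follows_convention(name) for name in
           ["product_key", "product_category_code", "item_amount", "city"]}

for name, ok in checked.items():
    print(name, ok)
```

A bare name such as city fails the check, which matches the advice to qualify it as customer city or supplier city.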



Four Step Table Design Process

1. Choose the Business Process
2. Declare the Grain
3. Identify the Dimensions
4. Identify the Facts
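
The four steps can be recorded as a small design worksheet. This is only a sketch: the FactTableDesign class, its field names, and the retail sales example are hypothetical and are not part of the Kimball method itself.

```python
from dataclasses import dataclass, field

@dataclass
class FactTableDesign:
    business_process: str                            # step 1
    grain: str                                       # step 2
    dimensions: list = field(default_factory=list)   # step 3
    facts: list = field(default_factory=list)        # step 4

    def is_complete(self):
        """All four steps must have been answered."""
        return bool(self.business_process and self.grain
                    and self.dimensions and self.facts)

# Steps 1 and 2: choose the business process and declare the grain.
design = FactTableDesign(
    business_process="Retail sales",
    grain="One row per product per store per day",
)
complete_before = design.is_complete()

# Steps 3 and 4: the dimensions and facts follow from the declared grain.
design.dimensions += ["date", "product", "store"]
design.facts += ["pesos_sold", "units_sold"]
complete_after = design.is_complete()

print(complete_before, complete_after)
```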



Document the High Level Data Model Diagram
• High Level Data Model Diagram
– Used to communicate and validate with business users and senior
management
– Always follow the same convention in arranging dimensions
around the fact table, e.g., start with the date at the top
– Use the same arrangement with aggregates or omit or gray out
unused dimensions and substitute the names of shrunken
dimensions for others
– See exhibit 5


Define the Data Sources

• This is sometimes known as the Application Architecture
• More extensive descriptions are often very helpful if
you have many sources
• See exhibit 6


Document the Detailed Table Designs

• Document the detailed dimension worksheet
– Known as a Source-to-Target Map
– See Exhibit 7
• Note that spreadsheets are used extensively in metadata
documentation


Develop Detailed Bus Matrix

• The bus matrix makes several things explicit
– Business processes have several fact tables
– Explicit granularity for fact tables
– Named facts for fact tables
– Reusable conformed dimensions
• See exhibit 8
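
A bus matrix can be sketched as a mapping from business processes to the dimensions they use; dimensions that appear in more than one process are the candidates for reusable conformed dimensions. The processes and dimensions below are hypothetical and are not taken from exhibit 8.

```python
# Hypothetical bus matrix: business process -> dimensions it uses.
bus_matrix = {
    "retail_sales": {"date", "product", "store", "promotion"},
    "inventory":    {"date", "product", "store"},
    "purchasing":   {"date", "product", "supplier"},
}

def conformed_candidates(matrix):
    """Dimensions shared by two or more business processes."""
    counts = {}
    for dims in matrix.values():
        for dim in dims:
            counts[dim] = counts.get(dim, 0) + 1
    return sorted(dim for dim, n in counts.items() if n >= 2)

shared = conformed_candidates(bus_matrix)
print(shared)
```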



Identify, Track, and Resolve Issues

• Issues continually arise as the team works among its
members and with business participants
• Important to identify, track, and resolve these issues
– See issues log
• Assign someone to capture and track issues that arise at
meetings or in discussions



References

• Kimball, Ralph, Margy Ross, Warren Thornthwaite, Joy
Mundy, and Bob Becker. The Data Warehouse Lifecycle
Toolkit, Second Edition. Wiley, 2008. ISBN 978-0-470-14977-5.
• Schmitz, Michael D. UCI Irvine Data Warehousing Notes
(2014), High Performance Data Warehousing.
• Simon, Alan. CIS 391 PPT Slides.
• Jeltema, Bernie. UCI Irvine Data Warehousing Notes
(2014), Strategic Frameworks, Inc.
