Week 03 High Level Dimensional Modeling
Assignment 03:
High-Level Dimensional Modeling
Part 1: Overview
This assignment will introduce you to the high-level dimensional-modeling process. The goal of this
process is to turn functional business requirements into dimensional data warehouse (DDS)
specifications based on the Kimball technical architecture. Upon completing this lab activity you will
have learned:
Additional ways of profiling data using SQL queries to identify master data and
business processes
The process of high-level dimensional modeling, including:
o Create a high-level dimensional model diagram (Kimball: Figure 7-3, p. 304)
o Create an attribute and metrics list (Kimball: Figure 7-2, p. 294)
o Keep track of issues
Goals
Specifically, the goals of this assignment are to:
Understand the goals of the high-level dimensional modeling process and practice its steps
Master the act of profiling data and transforming functional requirements into technical
specifications for a Kimball (DDS) data warehouse architecture
Understand the value of the high-level modeling worksheet as a technical documentation tool,
which can be later used to determine how to properly build tables in our DDS
Effort
This assignment can be done individually or with a partner. If you work with a partner, do not simply
divide up the work. Collaborate with each other throughout the exercise as if you were working on the
same data warehousing team.
Technical Requirements
To complete this assignment you will need the following:
Access to the course ist-cs-dw1.ad.syr.edu SQL Server, specifically the Northwind Traders
database. You should connect to this server before starting the assignment.
The High-Level Dimensional-Modeling Excel workbook, available in the same place where you
got this assignment.
Microsoft Excel 2007 or later for editing the workbook.
IST722—Data Warehouse Assignment 03
Michael A. Fudge, Jr. High-Level Dimensional Modeling
1. Sales reporting. Senior management would like to be able to track sales by customer, employee,
product, and supplier, with the goal of establishing which products are the top sellers, which
employees place the most orders, and who are the best suppliers.
2. Order fulfillment and delivery. There is a need to analyze the order-fulfillment process to see if
the time between when the order is placed and when it is shipped can be improved.
3. Product inventory analysis. Management requires a means to track inventory, on order, and
reorder levels of products by supplier or category. Inventory levels should be snapshotted daily
and recorded into the warehouse for analysis.
4. Sales coverage analysis. An analysis of the employees and the sales territories they cover.
Part 2: Walk-Through
In this part of the assignment, we will work together to create a high-level design for the first functional
business requirement: sales reporting. Along the way we’ll profile our dimensional data and get a feel
for our facts using SQL queries against the Northwind database.
Getting Started
Connect to your SQL Server using SQL Server Management Studio, and open a query window
for the Northwind database.
Open the High-Level-Dimensional-Modeling Excel workbook, to the Bus Matrix page.
At this point you might be wondering: What does order detail look like, and how do we know it is what
we need? This is where data profiling comes into play. Let’s take a look.
NOTE: In real life you won’t strike gold so easily. You’ll have to look at several tables before you can get
a clear picture of your fact table grain.
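As a minimal sketch of this kind of profiling, the Python snippet below runs two counting queries against a tiny in-memory SQLite table. Everything in it is an assumption made for illustration: the table is a cut-down stand-in for Northwind's [Order Details] (renamed OrderDetails, without the space), the rows are sample values, and profile_grain is a hypothetical helper. On the course server you would run the equivalent SELECT statements in Management Studio.

```python
import sqlite3

# Assumption: a miniature stand-in for Northwind's [Order Details] table.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE OrderDetails (
        OrderID INTEGER, ProductID INTEGER,
        UnitPrice REAL, Quantity INTEGER, Discount REAL
    )
    """
)
conn.executemany(
    "INSERT INTO OrderDetails VALUES (?, ?, ?, ?, ?)",
    [  # sample rows for illustration only
        (10248, 11, 14.0, 12, 0.0),
        (10248, 42, 9.8, 10, 0.0),
        (10249, 14, 18.6, 9, 0.0),
        (10249, 51, 42.4, 40, 0.0),
        (10250, 41, 7.7, 10, 0.0),
    ],
)


def profile_grain(conn):
    """Return (row count, distinct orders, distinct order/product pairs)."""
    total_rows, distinct_orders = conn.execute(
        "SELECT COUNT(*), COUNT(DISTINCT OrderID) FROM OrderDetails"
    ).fetchone()
    distinct_pairs = conn.execute(
        "SELECT COUNT(*) FROM "
        "(SELECT DISTINCT OrderID, ProductID FROM OrderDetails)"
    ).fetchone()[0]
    return total_rows, distinct_orders, distinct_pairs


total_rows, distinct_orders, distinct_pairs = profile_grain(conn)
print("rows:", total_rows)
print("distinct orders:", distinct_orders)
print("distinct (order, product) pairs:", distinct_pairs)
```

If the pair count equals the row count, each row is one product on one order, which suggests a fact grain of one row per product per order. More rows than distinct orders confirms that an order spans several rows.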
For example, if you review the database diagram on page 2 of the lab, you’ll see that the Order Details
table connects directly to the Products table via a foreign key in a many-to-one relationship. Because a
product appears on multiple orders, Product is a good candidate for a dimension. Once again we can verify
that this dimension works for us and “rolls up” a couple of our known facts by writing some SQL.
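One way to check that Product rolls up our facts is to join the order details to the products and aggregate by product name. The sketch below does this in Python with an in-memory SQLite database. The two cut-down tables, their sample rows, and the helper rollup_by_product are assumptions for illustration, and quantity times unit price is used only as a simple stand-in for a sales amount.

```python
import sqlite3

# Assumption: cut-down versions of Northwind's Products and [Order Details].
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, ProductName TEXT);
    CREATE TABLE OrderDetails (
        OrderID INTEGER,
        ProductID INTEGER REFERENCES Products(ProductID),
        UnitPrice REAL,
        Quantity INTEGER
    );
    INSERT INTO Products VALUES (1, 'Chai'), (2, 'Chang');
    INSERT INTO OrderDetails VALUES
        (10001, 1, 18.0, 10),
        (10002, 1, 18.0, 5),
        (10002, 2, 19.0, 20),
        (10003, 2, 19.0, 1);
    """
)


def rollup_by_product(conn):
    """Aggregate candidate facts (quantity, amount) by the product name."""
    return conn.execute(
        """
        SELECT p.ProductName,
               COUNT(DISTINCT d.OrderID)     AS OrderCount,
               SUM(d.Quantity)               AS TotalQuantity,
               SUM(d.Quantity * d.UnitPrice) AS TotalAmount
        FROM OrderDetails AS d
        JOIN Products AS p ON p.ProductID = d.ProductID
        GROUP BY p.ProductName
        ORDER BY p.ProductName
        """
    ).fetchall()


for row in rollup_by_product(conn):
    print(row)
```

If each product comes back as a single row with sensible totals, the dimension groups the facts the way the sales-reporting requirement needs.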
Important Tip: You should always exercise caution when profiling live systems. Executing SQL queries
against production data is usually not a wise decision, as you may degrade performance. It is
important to seek the advice of a database administrator before embarking on your data-profiling
adventure!
There are cases where some other business process might need Suppliers or Categories on their own, and
therefore it would make sense to keep them as separate tables linked to Product rather than combine them
into a single dimension. This is the fundamental idea behind snowflaking.
Once you’ve identified a useful dimension, it’s time to add it to our Bus Matrix like so. In this example
we’ve added the Product dimension.
Important Tip: There should always be a many-to-one relationship between the business process table
and the master data that make up your dimension. One row in the dimension should appear many times
in the business process. For example, one product appears many times on different orders.
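The many-to-one rule can also be checked with a few queries before you commit a dimension to the Bus Matrix. The following Python sketch uses an in-memory SQLite database. The cut-down tables, the sample rows, and the helper check_many_to_one are all hypothetical and only illustrate the three checks.

```python
import sqlite3

# Assumption: tiny stand-ins for the master data (Products) and the
# business process table ([Order Details] in Northwind).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Products (ProductID INTEGER, ProductName TEXT);
    CREATE TABLE OrderDetails (OrderID INTEGER, ProductID INTEGER);
    INSERT INTO Products VALUES (1, 'Chai'), (2, 'Chang'), (3, 'Aniseed Syrup');
    INSERT INTO OrderDetails VALUES
        (10001, 1), (10002, 1), (10002, 2), (10003, 2), (10004, 1);
    """
)


def check_many_to_one(conn):
    """Return (duplicate master keys, orphan process rows, max rows per key)."""
    # A key that occurs more than once in the master data breaks the "one" side.
    duplicate_keys = conn.execute(
        """
        SELECT COUNT(*) FROM (
            SELECT ProductID FROM Products
            GROUP BY ProductID HAVING COUNT(*) > 1
        )
        """
    ).fetchone()[0]
    # Process rows whose key has no matching master row.
    orphan_rows = conn.execute(
        """
        SELECT COUNT(*)
        FROM OrderDetails AS d
        LEFT JOIN Products AS p ON p.ProductID = d.ProductID
        WHERE p.ProductID IS NULL
        """
    ).fetchone()[0]
    # Largest number of process rows that share one key (the "many" side).
    max_rows_per_key = conn.execute(
        """
        SELECT MAX(n) FROM (
            SELECT COUNT(*) AS n FROM OrderDetails GROUP BY ProductID
        )
        """
    ).fetchone()[0]
    return duplicate_keys, orphan_rows, max_rows_per_key


print(check_many_to_one(conn))
```

Zero duplicate keys and zero orphan rows, together with a maximum above one, is what a many-to-one relationship looks like in the data.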
Fast-forward through some more data profiling, and here’s a screenshot of the dimensions I’ve
discovered so far:
Important Tip: The x at the intersection of dimension and business process indicates there will be a
foreign key in our DDS connecting the business process to the dimension table. Our goal is to reuse
dimensions like Product, Customer, and so on across other business processes. This is called conforming
dimensions.
In our case, if you run a SQL query on the Orders table, you’ll see Order Date and Shipped Date. So
we’ll add both to our model:
Important Tip: These are not two different dimensions. They are the same dimension, but we need two
foreign keys back to the same table. This is referred to as a role-playing dimension.
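To make the two-foreign-keys idea concrete, here is a small Python sketch using an in-memory SQLite database. The table names DimDate and FactSales, the key format, the sample rows, and the helper orders_with_both_dates are assumptions for illustration and are not part of the workbook. The single date table is joined twice under different aliases, once for each role.

```python
import sqlite3

# Assumption: hypothetical DDS tables; one date table, two roles.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE DimDate (
        DateKey INTEGER PRIMARY KEY, FullDate TEXT, MonthName TEXT
    );
    CREATE TABLE FactSales (
        OrderID INTEGER,
        OrderDateKey   INTEGER REFERENCES DimDate(DateKey),
        ShippedDateKey INTEGER REFERENCES DimDate(DateKey)
    );
    INSERT INTO DimDate VALUES
        (19960704, '1996-07-04', 'July'),
        (19960716, '1996-07-16', 'July'),
        (19960802, '1996-08-02', 'August');
    INSERT INTO FactSales VALUES
        (10248, 19960704, 19960716),
        (10249, 19960716, 19960802);
    """
)


def orders_with_both_dates(conn):
    """Join the same date table twice, once per role."""
    return conn.execute(
        """
        SELECT f.OrderID,
               od.MonthName AS OrderMonth,
               sd.MonthName AS ShippedMonth
        FROM FactSales AS f
        JOIN DimDate AS od ON od.DateKey = f.OrderDateKey
        JOIN DimDate AS sd ON sd.DateKey = f.ShippedDateKey
        ORDER BY f.OrderID
        """
    ).fetchall()


for row in orders_with_both_dates(conn):
    print(row)
```

Only one physical date table exists. The aliases od and sd give it its two roles in the query.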
How many of a specific product category were sold? Category is an attribute of the Product
dimension, and how many is the measurement and, therefore, the fact.
Which customers have ordered the most? Customer is the dimension, and Sold Amount is the
measurement (fact).
From merely identifying the fact grain of the model you probably already have a few facts in mind (they
can be found in the business process table), but now’s the time to really nail down the facts you need in
your model. Like everything else in this step, a lot will depend on your requirements.
One important thing to recognize is not all facts appear among your source data. Some of the facts you’ll
need are derived facts. We do a little math on some of the source data values. We include the facts we
want in the Bus Matrix but explain how they are derived in the Attributes and Metrics worksheet. For
now, we’ll add the following facts to our Bus Matrix and complete it.
Important Tip: That last one is a bit tricky. The Freight value is found in the Orders table and is not
broken out per item on the order. We have made a data governance decision to divide the freight
evenly among the items in the order. In the real world we don’t make this decision; the business
users decide how it should be handled.
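The freight allocation is a good example of a derived fact, so here is a minimal Python sketch of it using an in-memory SQLite database. It assumes "items" means the line items (rows) on the order rather than the total quantity, which is exactly the kind of question you would record on the Issues List. The cut-down tables, the sample rows, and the helper allocate_freight are hypothetical.

```python
import sqlite3

# Assumption: cut-down Orders and [Order Details] tables with sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, Freight REAL);
    CREATE TABLE OrderDetails (OrderID INTEGER, ProductID INTEGER);
    INSERT INTO Orders VALUES (10001, 30.0), (10002, 12.5);
    INSERT INTO OrderDetails VALUES
        (10001, 1), (10001, 2), (10001, 3), (10002, 1);
    """
)


def allocate_freight(conn):
    """Spread each order's Freight evenly across its line items."""
    return conn.execute(
        """
        SELECT d.OrderID,
               d.ProductID,
               o.Freight / (
                   SELECT COUNT(*) FROM OrderDetails AS x
                   WHERE x.OrderID = d.OrderID
               ) AS FreightAmount
        FROM OrderDetails AS d
        JOIN Orders AS o ON o.OrderID = d.OrderID
        ORDER BY d.OrderID, d.ProductID
        """
    ).fetchall()


allocated = allocate_freight(conn)
for row in allocated:
    print(row)
```

A useful sanity check on any allocated fact is that the pieces add back up to the original: summing FreightAmount per order should return the Freight value from the Orders table.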
The idea behind the attributes and metrics is to define your facts and outline the important attributes in
your dimensions.
Completing the Attributes and Metrics page in the workbook is self-explanatory, and, therefore, I will
leave it as an exercise for you. As you complete this part, keep the following in mind:
1. Sales reporting. Senior management would like to be able to track sales by customer, employee,
product, and supplier, with the goal of establishing which products are the top sellers, which
employees place the most orders, and who are the best suppliers.
2. Order fulfillment and delivery. There is a need to analyze the order-fulfillment process to see if
the time between when the order is placed and when it is shipped can be improved.
3. Product inventory analysis. Management requires a means to track inventory, on order, and
reorder levels of products by supplier or category. Inventory levels should be snapshotted daily
and recorded into the warehouse for analysis.
4. Sales coverage analysis. An analysis of the employees and the sales territories they cover.
In this part, you will repeat the process outlined in Part 2 of the assignment for the remaining three
business processes.
When you are finished, you should have the following in your High-Level Dimensional-Modeling
workbook:
1. A completed Bus Matrix with all four business processes in it. No dimensions should repeat. To
reuse a dimension for another business process, include an X at its intersection.
2. A completed Attributes and Metrics list. Specifically, you should define facts, derived facts,
and important dimension attributes.
3. Along the way if you encounter issues or unknowns, record them under the Issues List tab so
you remember to address them at some point in the process.
Tip: Keep in mind you can model only the data you have. If it’s not in your source data (in
this case, Northwind Traders), then you cannot include it in your data warehouse!
Turning It In
Please turn in your completed High-Level Dimensional-Modeling worksheet. Make sure your name,
NetID, and date appear somewhere at the top of the Bus Matrix page.
If you worked with a partner, please indicate that in your assignment by including your partner’s name
and NetID. You should both submit the assignment individually.