Week 01 Data Profiling
Goals
Specifically, the goals are to:
- Get you familiar with how to use the tools provided in the course.
- Help you understand how to profile data: reading unknown database schemas and data, and
  deducing the intent behind table designs and data.
- Get you comfortable writing SQL SELECT queries against a relational database, as a means to
  understand the capabilities of the data and metadata that you have.
Effort
This assignment should be done individually. The work you complete should represent your own ability.
Technical Requirements
To complete this assignment you will need the following:
IST722—Data Warehouse Assignment 01
Michael A. Fudge, Jr. Data Profiling
Of course, in this class we cannot ask someone who works at Northwind Traders about customers or
orders. We can only make assumptions about how they would use the data, and for that reason we will
rely heavily on the existing database schema.
Assignment Structure
All technical assignments in this course are structured in the same manner. Their purpose is to give you
hands-on experience “doing” data warehousing, allowing you to put into practice the concepts you
learned through the class sessions. You will learn quite a bit as you work through these assignments.
This is by design. Every assignment begins with an overview section, which explains the activity, its goals,
and its requirements.
The second part of the assignment will walk you through the process of completing the activity. In this
particular assignment, it’s profiling data from the Northwind Traders database.
The third part of the assignment you must complete on your own. It is expected you will take what you
learned from your studies, class sessions, and your experiences with Parts 1 and 2, then apply them
toward the problems you face in the third part.
The end of each assignment states what you should hand in as a deliverable to demonstrate that you
completed the work satisfactorily.
Part 2: Walk-Through
In this part we will walk you through the process of data profiling. We will profile two types of sources: a
master data source and a business process.
How do you know which source is master data and which is a business process? The trick is to think of
your data at the conceptual model level:
[Figure: conceptual model. Employee and Territory are joined by an “Assigned to” relationship, with
1/M cardinality on each side.]
In this example Employee and Territory are master data. They represent categorical business data. The
many-to-many relationship between them is a business process representing who gets assigned to
which territory.
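A minimal sketch of how this conceptual model usually lands in a relational schema: the two master data entities become their own tables, and the many-to-many relationship becomes a third table holding one row per assignment. It assumes Northwind's table names (Employees, Territories, EmployeeTerritories) with the columns trimmed down, and uses an in-memory SQLite database as a stand-in for the course SQL Server. The rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite stand-in for the course's SQL Server Northwind database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Master data: one row per employee, one row per territory.
CREATE TABLE Employees (
    EmployeeID INTEGER PRIMARY KEY,
    LastName   TEXT,
    FirstName  TEXT
);
CREATE TABLE Territories (
    TerritoryID          TEXT PRIMARY KEY,
    TerritoryDescription TEXT
);
-- Business process: one row per employee-to-territory assignment.
CREATE TABLE EmployeeTerritories (
    EmployeeID  INTEGER REFERENCES Employees(EmployeeID),
    TerritoryID TEXT REFERENCES Territories(TerritoryID),
    PRIMARY KEY (EmployeeID, TerritoryID)
);
-- Made-up rows, not the real Northwind data.
INSERT INTO Employees VALUES (1, 'Lee', 'Ann'), (2, 'Park', 'Sam');
INSERT INTO Territories VALUES ('T1', 'North'), ('T2', 'South');
INSERT INTO EmployeeTerritories VALUES (1, 'T1'), (1, 'T2'), (2, 'T1');
""")

# Resolve the assignment rows back to readable names.
assignments = conn.execute("""
SELECT e.LastName, t.TerritoryDescription
FROM EmployeeTerritories et
JOIN Employees e ON e.EmployeeID = et.EmployeeID
JOIN Territories t ON t.TerritoryID = et.TerritoryID
ORDER BY e.LastName, t.TerritoryDescription
""").fetchall()

for last_name, territory in assignments:
    print(last_name, "is assigned to", territory)
```

In this sketch one employee appears in two assignment rows and one territory appears in two assignment rows, which is what the 1/M notation on each side of the relationship describes.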
In the Kimball method of data warehousing your master data become dimensions, and the business
processes, events, or transactions become fact tables. Honestly, that is a gross oversimplification, but
for now it works.
What are the business or natural keys? We’re not interested in the primary key, which is
internal to the RDBMS implementation. We’re looking for what makes each entity unique from
the business user’s vantage point. There should always be a natural key for master data, but
there might not be one for business processes.
What does “one row” of data mean? The answer to this question often reveals itself when you
discover the business key or identify the business process.
The easiest way to figure this out is by using the select statement. Observe:
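A minimal sketch of this kind of statement, assuming the table is Northwind's Employees with only a few of its columns. It runs against an in-memory SQLite stand-in rather than the course SQL Server, and the rows are made up, so only the shape of the output is meaningful.

```python
import sqlite3

# In-memory SQLite stand-in for the course's SQL Server Northwind database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (
    EmployeeID INTEGER PRIMARY KEY,
    LastName   TEXT,
    FirstName  TEXT,
    Title      TEXT
);
-- Made-up rows, not the real Northwind data.
INSERT INTO Employees VALUES
    (1, 'Lee',  'Ann', 'Sales Representative'),
    (2, 'Park', 'Sam', 'Sales Representative'),
    (3, 'Diaz', 'Eva', 'Sales Manager');
""")

# Look at every column and every row to judge what one row represents.
cursor = conn.execute("SELECT * FROM Employees")
columns = [col[0] for col in cursor.description]
rows = cursor.fetchall()

print(columns)
for row in rows:
    print(row)
```

Reading the column names next to a handful of rows is usually enough to form a first guess about what a single row stands for.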
If you want to try it for yourself, open a new query window in the Northwind database by pressing
CTRL+N, then type the select statement, and press the Execute (!) button in the toolbar.
It appears as if there’s one row for each employee. That stands to reason, but to be honest it’s not
always that simple.
What about the business key? What is used to uniquely identify the employee? Normally this would be a
Social Security number or taxpayer ID number because each row would have a unique value. With this
data it’s not clear. You might think of using a name:
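One way to test a candidate business key is to group by it and look for values that occur more than once. The sketch below assumes the Employees columns FirstName and LastName and runs against an in-memory SQLite stand-in. The rows are made up and deliberately include a repeated name to show what the check catches; it says nothing about what the real Northwind data contains.

```python
import sqlite3

# In-memory SQLite stand-in for the course's SQL Server Northwind database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (
    EmployeeID INTEGER PRIMARY KEY,
    LastName   TEXT,
    FirstName  TEXT
);
-- Made-up rows; employees 1 and 3 share a name on purpose.
INSERT INTO Employees VALUES
    (1, 'Lee',  'Ann'),
    (2, 'Park', 'Sam'),
    (3, 'Lee',  'Ann');
""")

# Any row returned here is a name shared by more than one employee.
duplicates = conn.execute("""
SELECT FirstName, LastName, COUNT(*) AS NameCount
FROM Employees
GROUP BY FirstName, LastName
HAVING COUNT(*) > 1
""").fetchall()

# The name works as a candidate key only if no duplicates come back.
is_candidate_key = not duplicates
print(duplicates, is_candidate_key)
```

An empty result only shows that the name is unique in the current data. Whether it stays unique as new employees are hired is a business question the query cannot answer.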
Here are some sample queries that might be asked as part of a functional business requirement and the
SQL statements that satisfy them.
From the screenshot we now know there is only one territory with more than one employee assigned:
New York.
We order by EmployeeCount to see the territories with the most employees first.
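A minimal sketch of a query of this shape, assuming Northwind's Territories and EmployeeTerritories tables and the column alias EmployeeCount. It runs against an in-memory SQLite stand-in with made-up rows and territory names, so the counts are illustrative only.

```python
import sqlite3

# In-memory SQLite stand-in for the course's SQL Server Northwind database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Territories (
    TerritoryID          TEXT PRIMARY KEY,
    TerritoryDescription TEXT
);
CREATE TABLE EmployeeTerritories (
    EmployeeID  INTEGER,
    TerritoryID TEXT REFERENCES Territories(TerritoryID),
    PRIMARY KEY (EmployeeID, TerritoryID)
);
-- Made-up rows, not the real Northwind data.
INSERT INTO Territories VALUES ('T1', 'North'), ('T2', 'South');
INSERT INTO EmployeeTerritories VALUES (1, 'T1'), (2, 'T1'), (3, 'T2');
""")

# Count the assignment rows per territory, largest count first.
territory_counts = conn.execute("""
SELECT t.TerritoryDescription, COUNT(et.EmployeeID) AS EmployeeCount
FROM Territories t
JOIN EmployeeTerritories et ON et.TerritoryID = t.TerritoryID
GROUP BY t.TerritoryID, t.TerritoryDescription
ORDER BY EmployeeCount DESC, t.TerritoryDescription
""").fetchall()

for description, employee_count in territory_counts:
    print(description, employee_count)
```

Adding HAVING COUNT(et.EmployeeID) > 1 after the GROUP BY clause would narrow the output to territories with more than one employee assigned.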
Write SQL queries to answer the following questions that might be associated with functional business
requirements in a data warehouse. For each of the following provide a screenshot of the SQL query and
its output, making sure your name or NetID appears in the screenshot.
1. List the customer contact names and titles sorted by company name.
2. Factoring in discounts, what is the total amount of product sold?
3. Provide a list of product category names with counts of products in each category.
4. Select a specific customer, and display that customer’s orders with total amount of product sold
for each order.
5. Select a specific employee, and display each of that employee’s orders, how it was shipped
(shipvia), the company that shipped it, and the total number of days elapsed from order date to
shipped date.
Turning It In
Please turn in a Word document with your name, NetID, and date at the top. Copy and paste your
completed Part 3 into it. Be sure you include screenshots as directed.
Do not submit a copy of this assignment file. I only need Part 3.