Lecture 3 - BISM7233 - AS - 2023
Dimensional Modelling
Avijit Sengupta
E-mail: [email protected]
Room: 510 Joyce Ackroyd (37) Building
1
Recap: Business Analytics Framework
2
Recap: Invoice example (Normalized Relations and ER Diagram)
3
[Figure: Level 1 ER diagram for the sales order example. Entities include Customer, Customer Contact Person, Customer Contact Type, Customer Industry, Customer Segment, Customer Credit Terms, Customer/Employee Assignment, Employee, Employee Qualification, Sales Order, Sales Order Item, Product, Product Category, Product Price History, Supplier, Supplier Contact, Region, Postal Area and Address Type.]
4
Recap: Querying a Database (Four Primary
Operations of SQL)
Create: INSERT
Read: SELECT
Update: UPDATE
Delete: DELETE
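A minimal sketch of the four operations, run against an in-memory SQLite database from Python. The `customer` table and its values are hypothetical and only serve to make each statement concrete.

```python
import sqlite3

# Hypothetical table for illustration; names and values are not from the lecture.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_number INTEGER PRIMARY KEY, customer_name TEXT)"
)

# Create -> INSERT
conn.execute("INSERT INTO customer VALUES (1, 'Acme Pty Ltd')")

# Read -> SELECT
query = "SELECT customer_name FROM customer WHERE customer_number = 1"
name_before = conn.execute(query).fetchone()[0]

# Update -> UPDATE
conn.execute("UPDATE customer SET customer_name = 'Acme Ltd' WHERE customer_number = 1")
name_after = conn.execute(query).fetchone()[0]

# Delete -> DELETE
conn.execute("DELETE FROM customer WHERE customer_number = 1")
remaining = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]

print(name_before, name_after, remaining)
```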
5
6
Recap: SELECT Example 1
[Slide images: two SQL queries, each shown with its result table]
8
Recap: An SQL Primer: GROUP BY
Aggregating data by particular attribute
[Slide images: two SQL queries, each shown with its result table]
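As an illustration of aggregating by an attribute, the sketch below totals the quantity requested per product. The `sales_order_item` table and its rows are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical order lines: (product_number, quantity_requested)
conn.execute(
    "CREATE TABLE sales_order_item (product_number INTEGER, quantity_requested INTEGER)"
)
conn.executemany(
    "INSERT INTO sales_order_item VALUES (?, ?)",
    [(1, 5), (1, 3), (2, 10)],
)

# GROUP BY collapses all rows that share a product_number into one row,
# and SUM is evaluated once per group.
rows = conn.execute(
    "SELECT product_number, SUM(quantity_requested) AS total_quantity "
    "FROM sales_order_item "
    "GROUP BY product_number "
    "ORDER BY product_number"
).fetchall()

print(rows)
```

Every column in the SELECT list is either the grouping attribute or an aggregate, which is the usual rule of thumb for a GROUP BY query.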
10
Recap: SQL Joins – Inner JOIN
• Inner Join the tables with foreign keys!
[Slide images: an SQL inner join query with its result table]
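A small sketch of an inner join on a foreign key, assuming hypothetical `customer` and `sales_order` tables where `sales_order.customer_number` references `customer.customer_number`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_number INTEGER PRIMARY KEY, customer_name TEXT)"
)
conn.execute(
    "CREATE TABLE sales_order ("
    " sales_order_number INTEGER PRIMARY KEY,"
    " customer_number INTEGER REFERENCES customer(customer_number))"
)
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ann"), (2, "Bob"), (3, "Cy")])
conn.executemany("INSERT INTO sales_order VALUES (?, ?)", [(100, 1), (101, 1), (102, 2)])

# The join condition matches the foreign key to the primary key it references.
rows = conn.execute(
    "SELECT c.customer_name, o.sales_order_number "
    "FROM customer AS c "
    "INNER JOIN sales_order AS o ON o.customer_number = c.customer_number "
    "ORDER BY o.sales_order_number"
).fetchall()

print(rows)
```

Customer 3 has no orders, so an inner join leaves that customer out of the result.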
NATURAL JOIN vs INNER JOIN
1. NATURAL JOIN: joins two tables on the columns that have the same attribute name and data type.
   INNER JOIN: joins two tables on the column that is explicitly specified in the ON clause.
2. NATURAL JOIN: the resulting table contains all the attributes of both tables but keeps only one copy of each common column.
   INNER JOIN: the resulting table contains all the attributes of both tables, including duplicate columns.
3. NATURAL JOIN: if no condition is specified, it returns the rows matched on the common column.
   INNER JOIN: only those records that exist in both tables are returned.
4. NATURAL JOIN syntax: SELECT * FROM table1 NATURAL JOIN table2;
   INNER JOIN syntax: SELECT * FROM table1 INNER JOIN table2 ON table1.Column_Name = table2.Column_Name;
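The difference in result columns can be checked with a short sketch in SQLite. The `product` and `price` tables are hypothetical; they share only the column `product_number`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (product_number INTEGER, product_name TEXT)")
conn.execute("CREATE TABLE price (product_number INTEGER, selling_price REAL)")
conn.execute("INSERT INTO product VALUES (1, 'Widget')")
conn.execute("INSERT INTO price VALUES (1, 9.5)")

# NATURAL JOIN: matches on the shared column name and keeps one copy of it.
natural = conn.execute("SELECT * FROM product NATURAL JOIN price")
natural_cols = [d[0] for d in natural.description]

# INNER JOIN ... ON: the condition is explicit and both copies are returned.
inner = conn.execute(
    "SELECT * FROM product INNER JOIN price "
    "ON product.product_number = price.product_number"
)
inner_cols = [d[0] for d in inner.description]

print(natural_cols)
print(inner_cols)
```

Because a natural join silently depends on column names, adding a same-named column to either table changes the join condition; an explicit ON clause avoids that surprise.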
12
ER Modelling Task
The Brisbane Movie Library purchases movies in various formats and loans them to its members for a
charge in order to make a profit. The business is designing a new information system.
The proposed new system will include an accurate catalogue to inform members of movies held in each
store by a number of different categories (e.g. action, comedy, etc.) or which movies are held featuring
their favourite actors. The catalogue will also show if a particular movie is available that day at a particular
store.
Accurate information about which members have borrowed which movies, and when movies are due to
be returned will also be available. This should encourage borrowers to return their movies promptly.
Keeping track of loans using the current membership system has proven to be slow and prone to error.
Improved turnaround of movies should increase profit.
In order to keep track of the costs involved in purchasing movies, details of purchase orders will be stored
for all movies. This information will help to select suppliers, negotiate cheaper prices for future purchases,
and help with auditing.
Each movie is allocated a rental charge and all loans are for one day (24 hour period). Occasionally, a
special member may be given a longer loan period. All overdue movies incur an excess charge of $2 per
day for each day they are late. While members will be encouraged to return movies to the store from
which they borrowed them, the new system should also make it easier to keep track of movies returned
to other stores.
Brisbane Movie Library – ER Model
14
Recap: Business Analytics Framework
16
Transactional vs Informational Databases
17
Agenda and Learning Objectives for today
18
Transactional Databases
• Support operations of an organization (running
transactions)
• Selling products, shipping, hiring, supplying
• Store data from every-day transactions
• Highly normalised to avoid redundancy of data
• Optimised for writing new data as transactions
happen (because of the normalised structure)
19
Is normalization good for analytical
decision-making purposes?
Let’s look at the two types of databases:
• Transactional databases
• used to answer operational questions
20
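Sales Transactions
[Figure: one sales transaction traced through the tables Store Information, Store Visit, Item Scan, Member Index, Item Description and SubCategory. Example values: Sam’s Club, Leesburg; September 8, 2012; member Marten Risius; Kendall Jackson Chardonnay (SubCategory: Liquor); 2 for a total of $19.88.]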
21
Transactional (Operational) Questions
Customer Service: Help! I forgot my membership card!
SELECT membership_nbr FROM MEMBER_INDEX WHERE phone_num = '555-1212'
[Figure: tables Store Information, Store Visit, Item Scan, Member Index, Item Description, SubCategory]
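The membership lookup is a typical operational query: it touches one table and returns one row. A sketch follows, using the table and column names from the slide; the rows themselves are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table and column names follow the slide; the data is hypothetical.
conn.execute(
    "CREATE TABLE MEMBER_INDEX (membership_nbr INTEGER PRIMARY KEY, phone_num TEXT)"
)
conn.executemany(
    "INSERT INTO MEMBER_INDEX VALUES (?, ?)",
    [(1001, "555-1212"), (1002, "555-3434")],
)

# A parameterised point lookup: one predicate, one table, one matching row.
rows = conn.execute(
    "SELECT membership_nbr FROM MEMBER_INDEX WHERE phone_num = ?",
    ("555-1212",),
).fetchall()

print(rows)
```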
22
Transactional (Operational)
Inventory: Where do you carry Kendall Jackson chardonnay?
SELECT item_location FROM ITEM_DESCRIP WHERE item_name = 'Kendall Jackson chardonnay'
[Figure: tables Store Information, Store Visit, Item Scan, Member Index, Item Description, SubCategory]
23
Transactional (Operational)
Customer Service: What stores are open on Sunday in Queensland?
[Figure: tables Store Information, Store Visit, Item Scan, SubCategory]
24
Analytical Questions
[Figure: the sales tables, including SubCategory, annotated with the figures 150; 9,894; 1,007,961; 48,204,709]
25
With business analytics, we are
interested in analytical queries
• One is interested in numerical aggregations
• How many?
• What is the average?
• What is the total cost?
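The three questions above correspond to the SQL aggregate functions COUNT, AVG and SUM. The sketch below assumes a hypothetical `item_scan` table with one row per scanned item.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical scan lines: (visit_nbr, item_name, total_cost)
conn.execute("CREATE TABLE item_scan (visit_nbr INTEGER, item_name TEXT, total_cost REAL)")
conn.executemany(
    "INSERT INTO item_scan VALUES (?, ?, ?)",
    [(1, "A", 10.0), (1, "B", 20.0), (2, "A", 30.0)],
)

# How many? What is the average? What is the total cost?
how_many, average, total = conn.execute(
    "SELECT COUNT(*), AVG(total_cost), SUM(total_cost) FROM item_scan"
).fetchone()

print(how_many, average, total)
```

Unlike the operational lookups, this query reads every row of the table, which is why analytical workloads favour structures optimised for reading rather than writing.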
26
Transactional vs Informational Databases
Data Structure
• Transactional: optimised for transactions (lots of writes)
• Informational: optimised for complex queries
31
Data Warehouse Features
32
Defining Features
• Subject Oriented Data
• Data warehouses are organised around particular subjects
• Data is integrated across functions
• sales, customers, products
• Data in a DW cuts across application requirements
[Figure labels: Savings Accounts, Accounts Receivable, Claims Processing, Customer Loans, Billing, Product, Customer, Account]
33
Defining Features
• Integrated Data
• Data from different systems
• Can be from different applications, operating systems, etc
• File layouts, field naming conventions could be different
• Locale information could be different
• Need to convert to a common format
• allows comparison and consolidation of data from different
sources
• Data from various sources are validated before storing them
in a data warehouse.
• Data quality is crucial to the credibility of the warehouse
34
Defining Features
• Time-Variant Data
• In application systems the data is current
• i.e. The current true (or correct) value
• In a Data Warehouse
• Data used for analysis and decision making
• Need current and past data = Historical data
• Otherwise can’t answer many analytical questions
• Data is stored as snapshots of the current values
• Snapshots are time stamped
• Data changes stored over time
• Allows
• Analysis of the past
• Relation of data to the present
• Forecasting for the future
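One way to picture time-stamped snapshots is a table that keeps every past value alongside its date. This is a sketch under assumed names (`price_snapshot`, `price_as_of`), not a prescribed warehouse design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical snapshot table: a new row is added at each load, nothing is overwritten.
conn.execute(
    "CREATE TABLE price_snapshot ("
    " product_number INTEGER, snapshot_date TEXT, selling_price REAL)"
)
conn.executemany(
    "INSERT INTO price_snapshot VALUES (?, ?, ?)",
    [(1, "2023-01-01", 10.0), (1, "2023-02-01", 12.0), (1, "2023-03-01", 11.0)],
)


def price_as_of(date):
    """Return the most recent snapshot price on or before the given ISO date."""
    row = conn.execute(
        "SELECT selling_price FROM price_snapshot "
        "WHERE product_number = 1 AND snapshot_date <= ? "
        "ORDER BY snapshot_date DESC LIMIT 1",
        (date,),
    ).fetchone()
    return row[0] if row else None


print(price_as_of("2023-02-15"), price_as_of("2023-03-10"), price_as_of("2022-12-31"))
```

An operational table holding only the current price could answer the March question but not the February one.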
35
Defining Features
• Non-Volatile Data
• Unlike transaction systems the DW doesn’t get updated
every time the data changes
• Store extracted data snapshots over time
• Data is periodically updated
• That could be every second, hour, day, week or even month
• Different data items updated with different frequencies
• Users have read access only
• all updating done automatically by ETL process and
periodically by DB Administrator
36
Defining Features
• Data Granularity
• Operational systems
• Data kept at lowest level of detail
• Summary data created by adding up the numbers
• It’s not stored
• Informational systems
• Queries usually start with summary data
• Then as analysis occurs more detailed levels of data are needed
• Data usually stored at various levels for efficiency
• Data granularity is the level of detail
• The finer the granularity, the more detailed the data (a lower level in the hierarchy)
• The lowest level of granularity is called “the grain”
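To make granularity concrete, the sketch below stores sales at a fine grain (one row per sale) and derives a coarser monthly summary from it. The `sale` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Fine grain: one row per individual sale (hypothetical data).
conn.execute("CREATE TABLE sale (sale_date TEXT, store TEXT, dollar_sales REAL)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?)",
    [
        ("2023-01-05", "A", 10.0),
        ("2023-01-20", "A", 15.0),
        ("2023-02-03", "A", 5.0),
        ("2023-01-07", "B", 8.0),
    ],
)

# Coarser grain: roll the detail rows up to one row per month and store.
monthly = conn.execute(
    "SELECT substr(sale_date, 1, 7) AS sale_month, store, SUM(dollar_sales) "
    "FROM sale "
    "GROUP BY sale_month, store "
    "ORDER BY sale_month, store"
).fetchall()

print(monthly)
```

The summary can always be rebuilt from the grain, but the individual sales cannot be recovered from the monthly rows, which is why the choice of grain limits what can be analysed later.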
37
Defining Features
• Supports management needs
• Used by end users
• Data warehouses require a simple and easy to navigate
structure
• Responses to queries should be “timely”
38
Data Warehouse Design:
Dimensional Modelling (Kimball)
39
Business Analyst World
• How much revenue did product G generate in
the last three months, broken down by month for
the south-eastern sales region, by individual
store and by promotion, compared to
estimates and to the previous version of the product?
• Analysis starts usually with a single indication of something
strange, then goes deep into the data, left to a new
dimension, right to another, up to the summary, back down
and left and right again, until the problem is identified…
40
Introduction to Dimensional Modelling
41
Dimensional Modelling- Objectives
42
Dimensional Modelling
43
Fact Table
• A fact table contains the actual business measures
(additive), called facts
• It also contains foreign keys for dimensions
Sale
keys: Time key, Store key, Customer key, Product key
facts: Unit sales, Dollar sales
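The Sale fact table can be sketched as a table whose primary key is made up of the four dimension keys and whose remaining columns are the additive measures. Column names follow the slide; the composite key and the rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE sale (
        time_key     INTEGER NOT NULL,  -- foreign key to the Time dimension
        store_key    INTEGER NOT NULL,  -- foreign key to the Store dimension
        customer_key INTEGER NOT NULL,  -- foreign key to the Customer dimension
        product_key  INTEGER NOT NULL,  -- foreign key to the Product dimension
        unit_sales   INTEGER,           -- fact (additive measure)
        dollar_sales REAL,              -- fact (additive measure)
        PRIMARY KEY (time_key, store_key, customer_key, product_key)
    )
    """
)
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?, ?, ?, ?)",
    [(1, 1, 1, 1, 2, 20.0), (1, 1, 2, 1, 1, 10.0), (2, 1, 1, 2, 3, 45.0)],
)

# Additive facts can be summed across any combination of dimensions.
total_units, total_dollars = conn.execute(
    "SELECT SUM(unit_sales), SUM(dollar_sales) FROM sale"
).fetchone()

print(total_units, total_dollars)
```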
44
Fact Table - example
45
Grain Example
• Impact
• Higher storage requirements for fine grain
• More reporting flexibility for fine grain
46
Dimension Tables
Customer (dimension): Customer key, Name, Customer type
Product (dimension): Product key, Product type, weight
Store (dimension): Store key, Address, Region
Time (dimension): Time key, Day, Month
Sale (fact): Time key, Store key, Customer key, Product key, Dollar sales, Unit sales
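A typical query against these tables joins the fact table to the dimensions it needs and groups by dimension attributes. The sketch below uses only the Product and Time dimensions; the table name `time_dim` and all rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (product_key INTEGER PRIMARY KEY, product_type TEXT)")
conn.execute("CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, day INTEGER, month TEXT)")
conn.execute("CREATE TABLE sale (time_key INTEGER, product_key INTEGER, dollar_sales REAL)")

conn.executemany("INSERT INTO product VALUES (?, ?)", [(1, "Wine"), (2, "Beer")])
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?)", [(1, 1, "2023-01"), (2, 1, "2023-02")])
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?)",
    [(1, 1, 20.0), (1, 2, 5.0), (2, 1, 30.0), (2, 1, 10.0)],
)

# Dimensions supply the "by" attributes; the fact table supplies the measure.
rows = conn.execute(
    "SELECT p.product_type, t.month, SUM(s.dollar_sales) "
    "FROM sale AS s "
    "JOIN product AS p ON p.product_key = s.product_key "
    "JOIN time_dim AS t ON t.time_key = s.time_key "
    "GROUP BY p.product_type, t.month "
    "ORDER BY p.product_type, t.month"
).fetchall()

print(rows)
```

Each dimension is one join away from the fact table, so adding Store or Customer to the analysis means adding one more join and one more grouping column.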
47
Dimension Hierarchies
Product (dimension): Product key, Product type, Product group, Product sub-group, weight
Sale (fact): Time key, Store key, Customer key, Product key, Dollar sales, Unit sales
48
Dimension Table - example
• Actual data might look like this
• Hierarchy evident in data
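A hierarchy stored in a dimension table lets the same facts be summarised at different levels. The sketch assumes that a product group contains product sub-groups; the slides list both attributes but do not state the order of the levels, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical Product dimension rows; the group value repeats on every row.
conn.execute(
    "CREATE TABLE product ("
    " product_key INTEGER PRIMARY KEY, product_group TEXT, product_sub_group TEXT)"
)
conn.execute("CREATE TABLE sale (product_key INTEGER, dollar_sales REAL)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [(1, "Beverages", "Wine"), (2, "Beverages", "Beer"), (3, "Snacks", "Chips")],
)
conn.executemany("INSERT INTO sale VALUES (?, ?)", [(1, 20.0), (2, 5.0), (3, 7.0), (1, 10.0)])

join = "FROM sale AS s JOIN product AS p ON p.product_key = s.product_key "

# Roll up to the top of the hierarchy.
by_group = conn.execute(
    "SELECT p.product_group, SUM(s.dollar_sales) " + join +
    "GROUP BY p.product_group ORDER BY p.product_group"
).fetchall()

# Drill down one level.
by_sub_group = conn.execute(
    "SELECT p.product_group, p.product_sub_group, SUM(s.dollar_sales) " + join +
    "GROUP BY p.product_group, p.product_sub_group "
    "ORDER BY p.product_group, p.product_sub_group"
).fetchall()

print(by_group)
print(by_sub_group)
```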
49
Dimensional model as an ER model
Customer: Customer key, Name, Customer type
Product: Product key, Product type, Product group, Product sub-group, weight
Store: Store key, Address, Region
Sale: Time key, Store key, Customer key, Product key, Dollar sales, Unit sales
50
Star Schema
“WHERE” dimension
“WHO” dimension
51
Designing a Dimensional Model
• Choose a Business Process
• Choose the grain of the fact table
• Choose the dimensions
• Choose the measured facts (usually numeric,
additive quantities)
• Complete the dimension tables
(Kimball, 1996)
52
Dimensional Modelling Task
• Design a dimensional model for LOANS
The Brisbane Movie Library purchases movies and loans them to its members for a charge in
order to make a profit. The business is designing a data mart and decision support system.
Management wants to analyse the borrowing patterns of members in order to better identify the
key members (most revenue per quarter). They can then focus on providing service to these
members.
Management needs to analyse the value of their movies. They want to know which movies
generate the most revenue per quarter. They don’t want to keep movies which are never (or
rarely) borrowed. If a movie has not been borrowed for 3 months, it will be sold. Movies
which generate a total rental return over 6 months which is less than their purchase price
should also be sold to help keep inventory levels down.
Management wants to analyse the performance of each store to understand which are the most
successful (in terms of profit = revenue – cost).
Design Outcomes: Normalised or
Denormalised?
• Normalisation
• Eliminates redundancy
• Storage efficiency
• Referential Integrity
• Denormalisation
• Fewer tables (fewer joins)
• Fast querying
• Design is tuned for end-user analysis (tools & cognition)
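The join trade-off can be seen by answering one question against both designs. In the sketch, the normalised design keeps category names in a separate table, while the denormalised `product_dim` repeats the name on every product row. All table names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (product_key INTEGER, dollar_sales REAL)")
conn.executemany("INSERT INTO sale VALUES (?, ?)", [(1, 20.0), (2, 5.0), (1, 10.0)])

# Normalised: the category name is stored once, in its own table.
conn.execute("CREATE TABLE product (product_key INTEGER PRIMARY KEY, category_code INTEGER)")
conn.execute("CREATE TABLE product_category (category_code INTEGER PRIMARY KEY, category_name TEXT)")
conn.executemany("INSERT INTO product VALUES (?, ?)", [(1, 10), (2, 20)])
conn.executemany("INSERT INTO product_category VALUES (?, ?)", [(10, "Liquor"), (20, "Grocery")])

# Denormalised: the category name is repeated on each product row.
conn.execute("CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, category_name TEXT)")
conn.executemany("INSERT INTO product_dim VALUES (?, ?)", [(1, "Liquor"), (2, "Grocery")])

# Two joins are needed to reach the category name.
normalised = conn.execute(
    "SELECT c.category_name, SUM(s.dollar_sales) FROM sale AS s "
    "JOIN product AS p ON p.product_key = s.product_key "
    "JOIN product_category AS c ON c.category_code = p.category_code "
    "GROUP BY c.category_name ORDER BY c.category_name"
).fetchall()

# One join is enough.
denormalised = conn.execute(
    "SELECT d.category_name, SUM(s.dollar_sales) FROM sale AS s "
    "JOIN product_dim AS d ON d.product_key = s.product_key "
    "GROUP BY d.category_name ORDER BY d.category_name"
).fetchall()

print(normalised == denormalised, denormalised)
```

Both designs give the same answer; the denormalised one trades repeated category names for a shorter, simpler query.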
54
Let’s Summarise!!
• Transactional databases are suitable for running
transactions
• They store data in a normalized structure
• Informational databases are suitable for decision-making
• Their structure is not highly normalized
55
What is Examinable:
• Differences between informational and transactional
databases/questions
• DW Features
• Developing dimensional models
56
Next Seminar
57
Basic Structure of SQL
SELECT: The select clause corresponds to the projection operation of the relational
algebra. It is used to list the attributes desired in the result of a query.
FROM: The from clause corresponds to the Cartesian product operation of the relational
algebra. It lists the relations to be scanned in the evaluation of the query.
WHERE: The where clause corresponds to the selection predicate of the relational
algebra. It consists of a predicate involving attributes of the relations that appear
in the from clause.
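A short sketch of how the clauses act on a table: the SELECT list keeps a subset of the columns (projection) and the WHERE predicate keeps a subset of the rows (selection). The `employee` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " employee_number INTEGER PRIMARY KEY,"
    " employee_name TEXT,"
    " employee_position_title TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Ann", "Manager"), (2, "Bob", "Clerk"), (3, "Cy", "Clerk")],
)

# SELECT employee_name            -> projection: one of three columns
# FROM employee                   -> the relation being scanned
# WHERE employee_position_title.. -> selection: only the rows that satisfy the predicate
rows = conn.execute(
    "SELECT employee_name "
    "FROM employee "
    "WHERE employee_position_title = 'Clerk' "
    "ORDER BY employee_number"
).fetchall()

print(rows)
```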
59
60