Lecture 3 - BISM7233 - AS - 2023

Lecture 3

Dimensional
Modelling

Avijit Sengupta
E-mail: [email protected]
Room: 510 Joyce Ackroyd (37) Building

1
Recap: Business Analytics Framework

2
Recap: Invoice example (Normalized Relations and ER Diagram)

• We can name the relations now


• Customer (CustomerNumber, CustomerName, CustomerAddress)
• Clerk (ClerkNumber, ClerkName)
• Product (ProductNumber, ProductDescription)
• Invoice (InvoiceNumber, Date, CustomerNumber, ClerkNumber)
• InvoiceLineItem (InvoiceNumber, ProductNumber, UnitPrice, Quantity)
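
As a minimal sketch, the five relations could be declared as tables using Python's built-in sqlite3 module. The column types and the composite primary key on InvoiceLineItem are assumptions; the slide only lists attribute names.

```python
import sqlite3

# Assumed column types and keys; the slide gives attribute names only.
SCHEMA = """
CREATE TABLE Customer (
    CustomerNumber  INTEGER PRIMARY KEY,
    CustomerName    TEXT,
    CustomerAddress TEXT
);
CREATE TABLE Clerk (
    ClerkNumber INTEGER PRIMARY KEY,
    ClerkName   TEXT
);
CREATE TABLE Product (
    ProductNumber      INTEGER PRIMARY KEY,
    ProductDescription TEXT
);
CREATE TABLE Invoice (
    InvoiceNumber  INTEGER PRIMARY KEY,
    Date           TEXT,
    CustomerNumber INTEGER REFERENCES Customer(CustomerNumber),
    ClerkNumber    INTEGER REFERENCES Clerk(ClerkNumber)
);
CREATE TABLE InvoiceLineItem (
    InvoiceNumber INTEGER REFERENCES Invoice(InvoiceNumber),
    ProductNumber INTEGER REFERENCES Product(ProductNumber),
    UnitPrice     REAL,
    Quantity      INTEGER,
    PRIMARY KEY (InvoiceNumber, ProductNumber)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# List the tables that were created.
table_names = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(table_names)
```

The foreign keys in Invoice and InvoiceLineItem are what the joins on the later recap slides rely on.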

3
Level 1 ER diagram

[Figure: Level 1 entity-relationship diagram of a sales database. Entities include Customer, Customer Industry, Customer Segment, Customer Credit Terms, Customer Contact Person, Address, Postal Area, Region, Sales Order, Sales Order Item, Delivery, Product, Product Category, Product Price History, Product Source, Supplier, Supplier Contact, Employee, Employee Qualification and Customer/Employee Assignment.]

4
Recap: Querying a Database (Four Primary Operations of SQL)

Operation    SQL Command
Create       INSERT
Read         SELECT
Update       UPDATE
Delete       DELETE
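
The four operations can be sketched against a single table as follows. The Clerk table comes from the invoice example; the sample values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Clerk (ClerkNumber INTEGER PRIMARY KEY, ClerkName TEXT)")

# Create: INSERT adds a new row.
conn.execute("INSERT INTO Clerk (ClerkNumber, ClerkName) VALUES (?, ?)", (1, "Alex"))

# Read: SELECT returns the rows that match.
after_insert = conn.execute("SELECT ClerkNumber, ClerkName FROM Clerk").fetchall()

# Update: UPDATE changes values in existing rows.
conn.execute("UPDATE Clerk SET ClerkName = ? WHERE ClerkNumber = ?", ("Alexis", 1))
after_update = conn.execute(
    "SELECT ClerkName FROM Clerk WHERE ClerkNumber = 1").fetchone()

# Delete: DELETE removes rows.
conn.execute("DELETE FROM Clerk WHERE ClerkNumber = ?", (1,))
remaining = conn.execute("SELECT COUNT(*) FROM Clerk").fetchone()[0]

print(after_insert, after_update, remaining)
```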

5
6
Recap: SELECT Example 1
SQL and RESULT (screenshots not recovered), annotated as follows:

• SELECT clause: the names of the attributes that we want data from in the table
• FROM clause: the TABLE (name) we want to query
7
Recap: SELECT Example 2

SQL

RESULT

8
Recap: An SQL Primer: GROUP BY
Aggregating data by particular attribute

SQL

RESULT

Logic: Count (Customer ID) will return the number of customers, and
Group BY CustType will group the result based on CustType

9
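
Since the SQL screenshot was not recovered, here is a hedged sketch of the kind of query the logic describes. The table name Customer, the column name CustomerID and the sample rows are assumptions; only CustType is named on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, CustType TEXT)")
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?)",
    [(1, "Retail"), (2, "Retail"), (3, "Wholesale"), (4, "Retail"), (5, "Online")],
)

# COUNT(CustomerID) counts customers; GROUP BY CustType gives one count per type.
counts = conn.execute("""
    SELECT CustType, COUNT(CustomerID) AS NumCustomers
    FROM Customer
    GROUP BY CustType
    ORDER BY CustType
""").fetchall()

print(counts)
```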
SQL Joins – Natural JOIN
• Natural Join: Join the tables with foreign keys where the
primary key and foreign key have the same name

SQL

RESULT

10
Recap: SQL Joins – Inner JOIN
• Inner Join the tables with foreign keys!

SQL

RESULT

[Figure: Level 1 ER diagram, repeated from the earlier recap slide]

11
NATURAL JOIN vs INNER JOIN

1. Join condition
   NATURAL JOIN: joins two tables on the columns that have the same attribute name and data type in both tables.
   INNER JOIN: joins two tables on the column(s) explicitly specified in the ON clause.
2. Columns in the result
   NATURAL JOIN: the result contains all the attributes of both tables but keeps only one copy of each common column.
   INNER JOIN: the result contains all the attributes of both tables, including the duplicated join columns.
3. Rows in the result
   NATURAL JOIN: no condition is specified, so rows are matched on the common column(s).
   INNER JOIN: only those records that have a match in both tables are returned.
4. Syntax
   NATURAL JOIN: SELECT * FROM table1 NATURAL JOIN table2;
   INNER JOIN: SELECT * FROM table1 INNER JOIN table2 ON table1.Column_Name = table2.Column_Name;
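
The difference in result columns can be checked with a small sketch in SQLite, which supports both join forms. The two cut-down tables and their rows are illustrative assumptions based on the invoice example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustomerNumber INTEGER PRIMARY KEY, CustomerName TEXT);
CREATE TABLE Invoice  (InvoiceNumber INTEGER PRIMARY KEY, CustomerNumber INTEGER);
INSERT INTO Customer VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO Invoice  VALUES (100, 1), (101, 1);
""")

def columns_and_rows(sql):
    """Run a query and return its column names and its rows."""
    cur = conn.execute(sql)
    return [d[0] for d in cur.description], cur.fetchall()

# NATURAL JOIN matches on the shared column name (CustomerNumber).
natural_cols, natural_rows = columns_and_rows(
    "SELECT * FROM Customer NATURAL JOIN Invoice")

# INNER JOIN states the join condition explicitly in the ON clause.
inner_cols, inner_rows = columns_and_rows(
    "SELECT * FROM Customer INNER JOIN Invoice "
    "ON Customer.CustomerNumber = Invoice.CustomerNumber")

print(natural_cols, len(natural_rows))
print(inner_cols, len(inner_rows))
```

Both queries return the same two matching rows here; only the number of columns differs, because the natural join keeps a single copy of CustomerNumber.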

12
ER Modelling Task
The Brisbane Movie Library purchases movies on various formats and loans them to its members for a
charge in order to make a profit. The business is designing a new information system.
The proposed new system will include an accurate catalogue to inform members of movies held in each
store by a number of different categories (e.g. action, comedy, etc.) or which movies are held featuring
their favourite actors. The catalogue will also show if a particular movie is available that day at a particular
store.
Accurate information about which members have borrowed which movies, and when movies are due to
be returned will also be available. This should encourage borrowers to return their movies promptly.
Keeping track of loans using the current membership system has proven to be slow and prone to error.
Improved turnaround of movies should increase profit.
In order to keep track of the costs involved in purchasing movies, details of purchase orders will be stored
for all movies. This information will help to select suppliers, negotiate cheaper prices for future purchases,
and help with auditing.
Each movie is allocated a rental charge and all loans are for one day (24 hour period). Occasionally, a
special member may be given a longer loan period. All overdue movies incur an excess charge of $2 per
day for each day they are late. While members will be encouraged to return movies to the store from
which they borrowed them, the new system should also make it easier to keep track of movies returned
to other stores.

13
Brisbane Movie Library – ER Model

14
Brisbane Movie Library – ER Model

15
Recap: Business Analytics Framework

16
Transactional vs Informational Databases
17
Agenda and Learning Objectives for today

By the end of this class you should be able to:
• Identify the difference between informational and transactional questions
• Explain the differences between transactional and informational databases
• Define and develop dimensional data models for data-driven decision-making

18
Transactional Databases
• Support operations of an organization (running
transactions)
• Selling products, shipping, hiring, supplying
• Store data from every-day transactions
• Highly normalised to avoid redundancy of data
• Optimised to write new data in as transactions
happen (because of normalised structure)

19
Is normalization good for analytical
decision-making purposes?
Let’s look at the two types of databases:
• Transactional databases
• used to answer operational questions

• Informational (Analytical) databases


• used to answer strategic questions

20
Sales Transactions

[Figure: one store visit on September 8, 2012 (annotated "2 for total of $19.88") traced through the tables Store Information (Leesburg Sam’s Club), Store Visit, Item Scan, Member Index (Marten Risius), Item Description (Kendall Jackson Chardonnay) and SubCategory (Liquor)]


21
Transactional (Operational) Questions

[Figure: schema diagram with tables Store Information, Store Visit, Item Scan, Member Index, Item Description, SubCategory]

Customer Service: Help! I forgot my membership card!

Select membership_nbr from MEMBER_INDEX where phone_num = '555-1212'

22
Transactional (Operational) Questions

[Figure: schema diagram with tables Store Information, Store Visit, Item Scan, Member Index, Item Description, SubCategory]

Inventory: Where do you carry Kendall Jackson chardonnay?

Select item_location from ITEM_DESCRIP where item_name = 'Kendall Jackson chardonnay'

23
Transactional (Operational) Questions

[Figure: schema diagram with tables Store Information, Store Visit, Item Scan, Member Index, Item Description, SubCategory]

Customer Service: What stores are open on Sunday in Queensland?

Select store_nbr from STORE_INFORMATION where open_Sun_flag = 'yes' and state = 'Queensland'

24
Analytical Questions

[Figure: schema diagram with tables Store Information, Store Visit, Item Scan, Member Index, Item Description, SubCategory, annotated with row counts 1,007,961; 48,204,709; 150; 432,233; 5,668,375; 9,894]

Campaign Management: How many customers purchased more than $500 worth of alcohol in our Brisbane stores this year?
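
To show why this question is harder than the operational lookups, here is a rough sketch of how it might be answered against the transactional tables. The column names, the simplified table layouts and the sample rows are all assumptions, and "this year" is taken to be 2012 to match the sample transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, heavily simplified versions of the tables in the diagram.
conn.executescript("""
CREATE TABLE store_information (store_nbr INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE item_description  (item_nbr INTEGER PRIMARY KEY, sub_category TEXT);
CREATE TABLE store_visit (visit_nbr INTEGER PRIMARY KEY, store_nbr INTEGER,
                          membership_nbr INTEGER, visit_date TEXT);
CREATE TABLE item_scan   (visit_nbr INTEGER, item_nbr INTEGER, amount REAL);

INSERT INTO store_information VALUES (1, 'Brisbane'), (2, 'Leesburg');
INSERT INTO item_description  VALUES (10, 'Liquor'), (20, 'Grocery');
INSERT INTO store_visit VALUES
    (100, 1, 501, '2012-03-01'),
    (101, 1, 501, '2012-09-08'),
    (102, 1, 502, '2012-05-05'),
    (103, 2, 503, '2012-06-06'),
    (104, 1, 504, '2011-12-24');
INSERT INTO item_scan VALUES
    (100, 10, 300.0), (101, 10, 250.0),   -- member 501: $550 of liquor in Brisbane
    (102, 10, 400.0), (102, 20, 200.0),   -- member 502: only $400 is liquor
    (103, 10, 600.0),                     -- member 503: not a Brisbane store
    (104, 10, 700.0);                     -- member 504: previous year
""")

# Four tables must be joined, filtered and aggregated to answer one question.
big_spenders = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT v.membership_nbr
        FROM item_scan AS s
        JOIN store_visit       AS v  ON v.visit_nbr  = s.visit_nbr
        JOIN store_information AS st ON st.store_nbr = v.store_nbr
        JOIN item_description  AS i  ON i.item_nbr   = s.item_nbr
        WHERE i.sub_category = 'Liquor'
          AND st.city = 'Brisbane'
          AND v.visit_date BETWEEN '2012-01-01' AND '2012-12-31'
        GROUP BY v.membership_nbr
        HAVING SUM(s.amount) > 500
    )
""").fetchone()[0]

print(big_spenders)
```

Unlike the single-table lookups on the previous slides, this query joins and aggregates across the largest tables in the diagram, which is the kind of workload a normalised transactional design is not tuned for.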

25
With business analytics, we are
interested in analytical queries
• One is interested in numerical aggregations
• How many?
• What is the average?
• What is the total cost?

• One is interested in understanding


dimensions
• Sales by state by customer type
• Sales by product by store by quarter

26
Transactional vs Informational Databases

Transactional Databases
• Focus is on supporting day to day operations
  • Recording orders
  • Processing claims
  • Making shipments
  • Generating invoices
  • Receiving cash
  • Reserving airline seats

Informational Databases
• Have a different scope & different purpose
  • Show me the top products
  • Show me problem regions
  • Tell me why (drill down)
  • View other data (drill across)
  • Show the highest margins
  • Alert me if calls are high
• Focus is on getting information at a higher level suitable for decision-making
27
Transactional vs Informational Databases
                   Transactional                                  Informational
Data Content       Current values                                 Archived, derived, summarised
Data Structure     Optimised for transactions (lots of writes)    Optimised for complex queries
Access Frequency   Very high                                      Medium
Access Type        Read, update, delete                           Read
Usage              Predictable, repetitive                        Ad hoc, random, heuristic
Response Time      Sub-seconds                                    Seconds to minutes
Users              Many                                           Relatively few


28
So for decision-making purposes,
we need an Informational Database
• Designed for analytic tasks
• Gets data from multiple locations
• Internal / external
• Intuitive and easy to use
• Allows direct access by users without IT support
• Conducive to long analysis sessions
• Read intensive
• Updated at known intervals and is stable
• Also stores historical data
• Able to allow users to run queries and get results
online
• Able to allow users to initiate reports

29
Our Solution
• The Data Warehouse!
• So, what’s a data warehouse?
• A single repository of organisational data
• Current and historical
• Integrates data from multiple sources
• Internal and external
• Extracts data from source systems, transforms,
loads into the warehouse
• “Single version of truth” – a holistic integrated view
of organization data
• Makes data available to managers/users
• Without hindering day to day transactional work
• It’s a database! But it is denormalised…
30
Q. Can we use an informational/analytical database for
supporting day to day operations?

31
Data Warehouse Features
32
Defining Features
• Subject Oriented Data
• Data warehouses are organised around particular subjects
• Data is integrated across functions
• sales, customers, products
• Data in a DW cuts across Application requirements

Operational Applications: Order Processing, Supplier Orders, Savings Accounts, Accounts Receivable, Claims Processing, Customer Billing, Loans

Data Warehouse Subjects: Supplier, Sales, Product, Customer, Account


33
Defining Features
• Integrated Data
• Data from different systems
• Can be from different applications, operating systems, etc
• File layouts, field naming conventions could be different
• Locale information could be different
• Need to convert to a common format
• allows comparison and consolidation of data from different
sources
• Data from various sources is validated before it is stored
in the data warehouse.
• Data quality is crucial to the credibility of the warehouse
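
The conversion to a common format can be sketched in a few lines of Python. Both source layouts, all field names and the validation rule are hypothetical; the point is only that differing names and date formats are mapped to one agreed layout before loading.

```python
from datetime import datetime

# Hypothetical extracts from two source systems with different conventions.
billing_rows = [{"cust_no": "17", "sale_dt": "08/09/2012", "amt": "19.88"}]  # DD/MM/YYYY, strings
web_rows = [{"CustomerId": 17, "OrderDate": "2012-09-09", "Total": 5.5}]     # ISO dates, numbers

def from_billing(row):
    """Map a billing-system row to the common warehouse layout."""
    return {
        "customer_number": int(row["cust_no"]),
        "sale_date": datetime.strptime(row["sale_dt"], "%d/%m/%Y").date().isoformat(),
        "amount": float(row["amt"]),
    }

def from_web(row):
    """Map a web-shop row to the common warehouse layout."""
    return {
        "customer_number": int(row["CustomerId"]),
        "sale_date": datetime.strptime(row["OrderDate"], "%Y-%m-%d").date().isoformat(),
        "amount": float(row["Total"]),
    }

def is_valid(row):
    """Assumed validation rule: positive customer number, non-negative amount."""
    return row["customer_number"] > 0 and row["amount"] >= 0

converted = [from_billing(r) for r in billing_rows] + [from_web(r) for r in web_rows]
integrated = [r for r in converted if is_valid(r)]

print(integrated)
```

Once both sources share field names, types and date format, their rows can be compared and consolidated in one table.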

34
Defining Features
• Time-Variant Data
• In application systems the data is current
• i.e. The current true (or correct) value
• In a Data Warehouse
• Data used for analysis and decision making
• Need current and past data = Historical data
• Otherwise can’t answer many analytical questions
• Data is stored as snapshots of the current values
• Snapshots are time stamped
• Data changes stored over time
• Allows
• Analysis of the past
• Relation of data to the present
• Forecasting for the future
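
One way to picture time-stamped snapshots is a table that gains a new row per snapshot instead of overwriting the current value. The table name, columns and prices below are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ProductPriceSnapshot (
        ProductNumber TEXT,
        SnapshotDate  TEXT,   -- time stamp of the snapshot (ISO date)
        SellingPrice  REAL
    )
""")
# Each load appends a snapshot; earlier values are kept, not overwritten.
conn.executemany("INSERT INTO ProductPriceSnapshot VALUES (?, ?, ?)", [
    ("P10", "2023-01-31", 20.0),
    ("P10", "2023-02-28", 22.0),
    ("P10", "2023-03-31", 21.0),
])

def price_as_of(product_number, date):
    """Return the price from the latest snapshot on or before the given date."""
    row = conn.execute("""
        SELECT SellingPrice FROM ProductPriceSnapshot
        WHERE ProductNumber = ? AND SnapshotDate <= ?
        ORDER BY SnapshotDate DESC
        LIMIT 1
    """, (product_number, date)).fetchone()
    return row[0] if row else None

print(price_as_of("P10", "2023-03-15"))
```

Because history is retained, the same table answers questions about the past as well as the present, which a system holding only current values cannot do.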

35
Defining Features
• Non-Volatile Data
• Unlike transaction systems the DW doesn’t get updated
every time the data changes
• Store extracted data snapshots over time
• Data is periodically updated
• That could be every second, hour, day, week or even month
• Different data items updated with different frequencies
• Users have read access only
• all updating done automatically by ETL process and
periodically by DB Administrator

36
Defining Features
• Data Granularity
• Operational systems
• Data kept at lowest level of detail
• Summary data is created by adding up the numbers
• It's not stored
• Informational systems
• Queries usually start with summary data
• Then as analysis occurs more detailed levels of data are needed
• Data usually stored at various levels for efficiency
• Data granularity is the level of detail
• The finer the granularity, the more detailed the data (the lower the level of detail)
• The lowest (most detailed) level of granularity is called “the grain”

37
Defining Features
• Supports management needs
• Used by end users
• Data warehouses require a simple and easy to navigate
structure
• Responses to queries should be “timely”

38
Data Warehouse Design:
Dimensional Modelling (Kimball)
39
Business Analyst World
• How much revenue did product G generate in
the last three months, broken down by month for
the south-eastern sales region, by individual
stores, broken down by promotions, compared to
estimates and to the previous version of the product?
• Analysis usually starts with a single indication of something
strange, then goes deep into the data, left to a new
dimension, right to another, up to the summary, back down
and left and right again, until the problem is identified…

40
Introduction to Dimensional Modelling

• Popularised by Ralph Kimball in the 1990s


• Based on the multi-dimensional model of data and
designed for retrieval-only databases
• Very simple, intuitive, and easily-understood
structure
• Also known as star schema design

41
Dimensional Modelling- Objectives

• Produce database structures that are easy for end


users to understand and write queries against
• Optimise query performance (as opposed to update
performance)

42
Dimensional Modelling

• A dimensional model consists of


• a fact table
• several dimensional tables
• hierarchies in the dimensions

• Essentially a simple and restricted type of ER model

43
Fact Table
• A fact table contains the actual business measures
(additive), called facts
• Also contains foreign keys for dimensions

Sale
  keys:  Time key, Store key, Customer key, Product key
  facts: Dollar sales, Unit sales

44
Fact Table - example

• Actual data might look like this


• Granularity, or level of detail, is a key issue
• Finest level of detail for a fact table
• Determined by the finest level of each dimension

Time-id   Store-id   Cust-id   Prod-id   Dollar sales   Unit sales
T100      S303       C101      P98       $120,000        5,000
T101      S303       C256      P98       $240,000       10,000
T102      S387       C101      P10       $456,000       27,899
T100      S234       C400      P56       $100,200        5,600

45
Grain Example

• Rough: customer postal codes (5,000), product type (200), store (300), week (52)

• Detailed: individual customer (200,000), individual product (2,000), store (200), day (365)

• Impact
• Higher storage requirements for fine grain
• More reporting flexibility for fine grain
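
The storage impact can be estimated by multiplying the dimension sizes on the slide. This gives an upper bound that assumes every combination occurs; real fact tables are sparse, so actual row counts would be lower.

```python
from math import prod

# Dimension sizes taken from the slide.
rough = {"customer postal code": 5_000, "product type": 200, "store": 300, "week": 52}
detailed = {"individual customer": 200_000, "individual product": 2_000,
            "store": 200, "day": 365}

# Upper bound on fact rows: one row per combination of dimension values.
rough_max_rows = prod(rough.values())
detailed_max_rows = prod(detailed.values())

print(f"rough grain:    {rough_max_rows:,} possible rows")
print(f"detailed grain: {detailed_max_rows:,} possible rows")
print(f"ratio:          {detailed_max_rows / rough_max_rows:,.0f}x")
```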

46
Dimension Tables
Customer (dimension): Customer key, Name, Customer type
Product (dimension): Product key, Product type, weight
Store (dimension): Store key, Address, Region
Time (dimension): Time key, Day, Month
Sale (fact): Time key, Store key, Customer key, Product key, Dollar sales, Unit sales

47
Dimension Hierarchies
Product (dimension): Product key, Product type, Product group, Product sub-group, weight
Sale (fact): Time key, Store key, Customer key, Product key, Dollar sales, Unit sales

Hierarchy within the Product dimension:
Product group, e.g. Hardware
- Product type, e.g. Tool
  - Product, e.g. Hammer

48
Dimension Table - example
• Actual data might look like this
• Hierarchy evident in data

Prod-id   Prod-Name      Prod-Group   Prod-Subgroup   Weight
P10       Hammer         Hardware     Tool            5kg
P56       10cm Nails     Hardware     Nails           1kg
P98       Plastic Pipe   Plumbing     Pipe            1kg
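
To see the hierarchy at work, the sketch below loads the sample rows from the fact table and product dimension examples and rolls sales up to product group. The SQL column names and types are assumptions, and the other three dimensions are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (
    ProdId TEXT PRIMARY KEY, ProdName TEXT, ProdGroup TEXT,
    ProdSubgroup TEXT, Weight TEXT
);
CREATE TABLE Sale (
    TimeId TEXT, StoreId TEXT, CustId TEXT,
    ProdId TEXT REFERENCES Product(ProdId),
    DollarSales REAL, UnitSales INTEGER
);
INSERT INTO Product VALUES
    ('P10', 'Hammer',       'Hardware', 'Tool',  '5kg'),
    ('P56', '10cm Nails',   'Hardware', 'Nails', '1kg'),
    ('P98', 'Plastic Pipe', 'Plumbing', 'Pipe',  '1kg');
INSERT INTO Sale VALUES
    ('T100', 'S303', 'C101', 'P98', 120000,  5000),
    ('T101', 'S303', 'C256', 'P98', 240000, 10000),
    ('T102', 'S387', 'C101', 'P10', 456000, 27899),
    ('T100', 'S234', 'C400', 'P56', 100200,  5600);
""")

# One join from fact to dimension, then aggregate the additive facts.
sales_by_group = conn.execute("""
    SELECT p.ProdGroup, SUM(s.DollarSales), SUM(s.UnitSales)
    FROM Sale AS s
    JOIN Product AS p ON p.ProdId = s.ProdId
    GROUP BY p.ProdGroup
    ORDER BY p.ProdGroup
""").fetchall()

print(sales_by_group)
```

Grouping by ProdSubgroup or ProdName instead of ProdGroup drills down the same hierarchy without changing the join.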

49
Dimensional model as an ER model
Customer (dimension): Customer key, Name, Customer type
Product (dimension): Product key, Product type, Product group, Product sub-group, weight
Store (dimension): Store key, Address, Region
Time (dimension): Time key, Day, Month
Sale (fact): Time key, Store key, Customer key, Product key, Dollar sales, Unit sales

The fact table is an intersection table.

50
Star Schema

Sales Summary (Fact Table), surrounded by four dimensions:
• Product: the “WHAT” dimension
• Retail Outlet: the “WHERE” dimension
• Customer: the “WHO” dimension
• Time: the “WHEN” dimension

51
Designing a Dimensional Model
• Choose a Business Process
• Choose the grain of the fact table
• Choose the dimensions
• Choose the measured facts (usually numeric,
additive quantities)
• Complete the dimension tables

(Kimball, 1996)

52
Dimensional Modelling Task
• Design a dimensional model for LOANS

The Brisbane Movie Library purchases movies and loans them to its members for a charge in
order to make a profit. The business is designing a data mart and decision support system.
Management wants to analyse the borrowing patterns of members in order to better identify the
key members (most revenue per quarter). They can then focus on providing service to these
members.
Management needs to analyse the value of their movies. They want to know which movies
generate the most revenue per quarter. They don’t want to keep movies which are never (or
rarely) borrowed. If a movie has not been borrowed for 3 months, it will be sold. Movies
which generate a total rental return over 6 months which is less than their purchase price
should also be sold to help keep inventory levels down.
Management wants to analyse the performance of each store to understand which are the most
successful (in terms of profit = revenue – cost).

53
Design Outcomes: Normalised or
Denormalised?
• Normalisation
• Eliminates redundancy
• Storage efficiency
• Referential Integrity

• Denormalisation
• Fewer tables (fewer joins)
• Fast querying
• Design is tuned for end-user analysis (tools & cognition)

54
Let’s Summarise!!
• Transactional databases are suitable for running
transactions
• They store data in a normalized structure
• Informational databases are suitable for decision-making
• They are not highly normalized

55
What is Examinable:
• Differences between informational and transactional
databases/questions
• DW Features
• Developing dimensional models

56
Next Seminar
57
Next Seminar

• More Dimensional Modelling

58
Basic Structure of SQL

The basic structure of an SQL expression consists of three clauses:

SELECT: The select clause corresponds to the projection operation of the relational
algebra. It is used to list the attributes desired in the result of a query.

FROM: The from clause corresponds to the Cartesian-product operation of the
relational algebra. It lists the relations to be scanned in the evaluation of the
expression.

WHERE: The where clause corresponds to the selection predicate of the relational
algebra. It consists of a predicate involving attributes of the relations that appear
in the from clause.
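
The three correspondences can be mimicked in plain Python, as a rough illustration only. The two small relations and their attribute names are made up; a real database engine would not evaluate a query this literally.

```python
from itertools import product

# Two hypothetical relations, each a list of rows.
customer = [
    {"CustomerNumber": 1, "CustomerName": "Ann"},
    {"CustomerNumber": 2, "CustomerName": "Bob"},
]
invoice = [
    {"InvoiceNumber": 100, "InvCustomerNumber": 1},
    {"InvoiceNumber": 101, "InvCustomerNumber": 2},
    {"InvoiceNumber": 102, "InvCustomerNumber": 1},
]

# FROM customer, invoice  ->  Cartesian product of the two relations.
pairs = [{**c, **i} for c, i in product(customer, invoice)]

# WHERE CustomerNumber = InvCustomerNumber  ->  selection predicate.
selected = [r for r in pairs if r["CustomerNumber"] == r["InvCustomerNumber"]]

# SELECT CustomerName, InvoiceNumber  ->  projection onto the listed attributes.
result = [(r["CustomerName"], r["InvoiceNumber"]) for r in selected]

print(len(pairs), result)
```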

59
60
