Lecture 3

The document provides information on data warehouse usage and design. It discusses the advantages of data warehouses including competitive advantage through improved decision making, consistent and high quality centralized data, and tracking of customer and historical data. It also notes potential disadvantages such as additional workload, data inflexibility, and ownership concerns. The document outlines common data warehouse applications and different design approaches and views. It provides examples of star schemas and the use of facts, dimensions, and granularity. It defines natural keys, surrogate keys, and why surrogate keys are used in data warehouses.

Uploaded by

chan chanchan

CST3340

Data Warehouse Usage and Design

Slides adapted from Jiawei Han, Micheline Kamber, Jian Pei (2011), Data Mining: Concepts and Techniques, Third Edition, The Morgan Kaufmann Series in Data Management Systems.

CST3340 _ Business Intelligence

Advantages of a Data Warehouse
• Advantages of a Data Warehouse
– Competitive Advantage.
• Gives managers access to data/information for use in the decision-making process.
– Data Quality and Consistency.
• Data stored in a central location for efficient analysis.
• Data stored in a standard format.
– Customer Relationship Management
• Keeps track of the organization’s customer base.
– Tracks Historical Data
• Allows tracking of trends, patterns and exceptions over time.

Disadvantages of a Data Warehouse
• Disadvantages of a Data Warehouse
– Extra Workload.
• Needs a team of specialist personnel to maintain.
– Data Inflexibility.
• Stores structured data.
• Stored in a standard format.
• Unstructured or semi-structured data not supported.
– Ownership Concerns.
• Departments don’t like sharing their data.
• Departments lose ownership of their data.
• Centrally stored data can lead to security issues.

Data Warehouse Usage
• A data warehouse can be used for many applications, including:
– Reporting and ad hoc queries
• Organisational reports including a variety of graphs and charts, statistical analysis and ad hoc queries.
– Multi-dimensional analysis
• Data viewed from many dimensions (viewpoints).
• OLAP operations including slice/dice, drilling and pivoting.
– Visualisation and data mining
• Visualisation using graphs and charts, e.g. Tableau.
• Data mining using algorithms such as association rules, clustering, classification and prediction to identify trends and patterns.
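
The OLAP operations named above can be sketched on a handful of in-memory rows. This is only an illustration: the sample data, the tuple layout and the function names are assumptions made for the example, not part of the lecture, and a real OLAP server works on far larger cubes.

```python
from collections import defaultdict

# Hypothetical sales rows: (quarter, city, product, units_sold)
SALES = [
    ("Q1", "London", "Tea", 10),
    ("Q1", "London", "Coffee", 5),
    ("Q1", "Leeds", "Tea", 7),
    ("Q2", "London", "Tea", 12),
    ("Q2", "Leeds", "Coffee", 3),
]


def slice_rows(rows, quarter):
    """Slice: fix one dimension (here, time) to a single value."""
    return [r for r in rows if r[0] == quarter]


def dice_rows(rows, quarters, cities):
    """Dice: restrict two dimensions to chosen sets of values."""
    return [r for r in rows if r[0] in quarters and r[1] in cities]


def roll_up(rows, dim_index):
    """Roll up: total units_sold grouped by one dimension."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[dim_index]] += row[3]
    return dict(totals)


def pivot(rows, row_dim, col_dim):
    """Pivot: cross-tabulate units_sold by two dimensions."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[row_dim]][row[col_dim]] += row[3]
    return {key: dict(cols) for key, cols in table.items()}
```

For instance, `roll_up(SALES, 1)` totals units by city, and `pivot(SALES, 0, 2)` lays out quarters against products.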

Design Views of a Data Warehouse
• Different design views
– Top-down view.
• Overall view of organizational data requirements.
• Selection of the relevant data/information.
– Data source view.
• Overall view of data being captured, stored and managed by
operational systems.
– Data warehouse view.
• View of the fact and dimension tables.
– Business query view
• Overall view of the end-user's data requirements.

Data Warehouse Design
Top-Down Approach

• Design.
– Data Warehouse designed for whole organisation.
– Enterprise Data Warehouse (EDW) built first.
– Data Marts created as subsets of the EDW.
– Mature Design.

• Advantages.
• Systematic solution
• Minimises integration problems

• Disadvantages.
• Expensive.
• Long development time.
• Lacks flexibility.

Data Warehouse Design
Bottom-Up Approach
• Design.
– Starts with experiments and prototypes.
– Departmental data marts built first.
– EDW built as a combination of the departmental data marts.
– Rapid Design.

• Advantages.
• Design, development and deployment of independent data marts.
• Flexible.
• Low cost.
• Rapid return on investment.
• Disadvantages.
• Integration problems.

Data Warehouse Development

[Figure: diagram showing the Enterprise Data Warehouse (EDW), the departmental data marts and the Enterprise Data Model.]

Data Warehouses – Conceptual Model
• Conceptual Model: uses dimensions & facts
– Star schema: A single fact table surrounded by a set of
dimension tables. Represented by a star shape.
– Snowflake schema: An extension of a star schema where
some dimension tables are split into a set of smaller tables
by normalization. Represented by a snowflake shape.
– Fact constellations (Galaxy): Multiple connected star
schemas. Several fact tables share the same dimension
tables. Represented as a collection of star shapes.

Example of a Star Schema
• Sales Fact Table: Time_SK, Product_SK, Location_SK (surrogate keys); Units_sold, Unit_cost, Revenue, Profit, Average_Sales (facts).
• Time dimension: Time_SK, Day, Week, Month, Quarter, Year.
• Product dimension: Product_SK, Name, Subcategory, Category, Supplier_Type.
• Location dimension: Location_SK, Store, Street, City, District, Country.

Star Schema

• Used to model the data in a Data Warehouse from a decision-maker's view of the business.
• Represents a subject, e.g. Sales.
• One fact table.
• Multiple dimension tables.
• Allows different views of the business facts.
• Allows users to filter, aggregate, drill down, and slice and dice the business facts.
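
A minimal sketch of a star schema, using Python's built-in sqlite3 module, may make the bullets above more concrete. The table and column names are simplified, lower-case versions of the Sales example, the sample rows are invented, and `revenue_by_quarter` is a hypothetical helper; none of this is prescribed by the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (time_sk INTEGER PRIMARY KEY, day TEXT, quarter TEXT);
CREATE TABLE product_dim (product_sk INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE location_dim (location_sk INTEGER PRIMARY KEY, store TEXT, city TEXT);
-- One fact table: foreign keys to each dimension plus numeric measures.
CREATE TABLE sales_fact (
    time_sk INTEGER REFERENCES time_dim(time_sk),
    product_sk INTEGER REFERENCES product_dim(product_sk),
    location_sk INTEGER REFERENCES location_dim(location_sk),
    units_sold INTEGER,
    revenue REAL,
    PRIMARY KEY (time_sk, product_sk, location_sk)
);
INSERT INTO time_dim VALUES (1, '2024-01-15', 'Q1'), (2, '2024-04-02', 'Q2');
INSERT INTO product_dim VALUES (1, 'Tea', 'Drinks'), (2, 'Biscuits', 'Snacks');
INSERT INTO location_dim VALUES (1, 'Store A', 'London'), (2, 'Store B', 'Leeds');
INSERT INTO sales_fact VALUES
    (1, 1, 1, 10, 25.0), (1, 2, 1, 4, 8.0), (1, 1, 2, 6, 15.0), (2, 1, 1, 12, 30.0);
""")


def revenue_by_quarter(city):
    """Filter on the location dimension, aggregate by the time dimension."""
    query = """
        SELECT t.quarter, SUM(f.revenue)
        FROM sales_fact AS f
        JOIN time_dim AS t ON t.time_sk = f.time_sk
        JOIN location_dim AS l ON l.location_sk = f.location_sk
        WHERE l.city = ?
        GROUP BY t.quarter
        ORDER BY t.quarter
    """
    return conn.execute(query, (city,)).fetchall()
```

Calling `revenue_by_quarter("London")` filters the facts through one dimension and aggregates them through another, which is the pattern the star shape is designed for.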

Fact Tables

• A fact table typically has two types of data:
• Numeric facts (measures) containing the data to be analysed.
• Foreign keys linking to the dimension tables.
• Facts (measures) can be:
• Detail-level data.
• Data that have been aggregated, e.g. sum, average.
• The most useful facts are numeric and additive.
• Each row in a fact table corresponds to an instance of the subject.
• All the measurements in a fact table must be at the same grain, which is defined by the dimension tables.
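
The preference for additive facts can be illustrated with a little arithmetic. In the sketch below (invented rows and hypothetical function names), units and revenue can simply be summed across rows, whereas an average has to be recomputed from the additive totals; averaging the per-row averages gives a different, misleading figure.

```python
# Hypothetical fact rows for one product on one day.
ROWS = [
    {"store": "A", "units_sold": 10, "revenue": 50.0},
    {"store": "A", "units_sold": 30, "revenue": 120.0},
    {"store": "B", "units_sold": 20, "revenue": 90.0},
]


def total(rows, measure):
    """Additive measures can be summed across any set of rows."""
    return sum(row[measure] for row in rows)


def average_price(rows):
    """Derive the average from additive totals rather than storing it."""
    return total(rows, "revenue") / total(rows, "units_sold")


def naive_average_price(rows):
    """Averaging per-row averages: shown only to expose the discrepancy."""
    per_row = [row["revenue"] / row["units_sold"] for row in rows]
    return sum(per_row) / len(per_row)
```

Here `average_price(ROWS)` is 260 / 60, while `naive_average_price(ROWS)` is 4.5, so the two disagree even on three rows.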

Dimension Tables
• Represent the different views of the business facts (measures).
• Allow users to browse fact data from different angles, e.g. time, product, location.
• Can be used as a filter to reduce the rows of data retrieved from a fact table.
• Allow users to aggregate fact data, e.g. consider quarterly sales rather than daily sales.
• Allow users to analyse more detailed data, e.g. sales at individual stores rather than sales in a particular city.

Granularity
• The level of aggregation of the data in the fact table.
• Defined by the lowest level of detail in the dimension tables.
• E.g. Sales Schema:
– Time : daily; Location : Individual store; Product :
product name.
– Therefore each row in the sales fact table represents
the daily sales of a particular product at individual
stores.

Example of Granularity in a Star Schema
• Lowest level for each dimension: Time: daily; Location: individual store; Product: product name.
• Therefore each row in the sales fact table represents the daily sales of a particular product at an individual store.
• Sales Fact Table: Time_SK, Product_SK, Location_SK, Units_sold, Unit_cost, Revenue, Profit, Average_Sales.
• Time dimension: Time_SK, Day, Week.
• Product dimension: Product_SK, Name, Brand.
• Location dimension: Location_SK, Store, Street.
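
One way to picture the grain is to load finer-grained source data into it. The sketch below assumes hypothetical point-of-sale transactions (timestamp, store, product, units) and sums them to one row per day, store and product, matching the grain described above; the data and the function name are illustrative only.

```python
from collections import defaultdict

# Hypothetical source transactions: (timestamp, store, product, units)
TRANSACTIONS = [
    ("2024-01-15 09:10", "Store A", "Tea", 2),
    ("2024-01-15 16:45", "Store A", "Tea", 3),
    ("2024-01-15 11:00", "Store B", "Tea", 1),
    ("2024-01-16 10:30", "Store A", "Tea", 4),
]


def to_daily_grain(transactions):
    """Aggregate to one fact row per (day, store, product)."""
    facts = defaultdict(int)
    for timestamp, store, product, units in transactions:
        day = timestamp.split(" ")[0]  # drop the time of day
        facts[(day, store, product)] += units
    return dict(facts)
```

The two morning and afternoon sales of Tea at Store A on the same day collapse into a single fact row; any detail below the daily grain is no longer available for analysis.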

Natural keys
• Also known as production keys, intelligent keys or smart keys.
• A natural key carries meaning about the data being stored, e.g. Student ID M00123456.
• Can be imported from the operational systems' data.

Surrogate Keys
• Also known as integer keys, artificial keys, non-intelligent keys or meaningless keys.
• Carry no meaning about the data.
• Used as the primary keys of the dimension tables.
• Usually generated by the data warehouse as data is added to the dimension table.
• Usually sequential integers.

Surrogate Keys Usage

A surrogate key is used as the unique identifier for the dimension tables.
– Replaces the source data primary keys (business/natural keys).
– Protects against changes in the source data systems.
– Acts as a buffer between the data warehouse and the source data systems.
– Allows integration of data from multiple sources.
– Enables rows that do not exist in the source data.
– Tracks changes over time (e.g. new customer instances when addresses change).
– Replaces text keys with integers for efficiency.
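
As a rough sketch of how a warehouse might hand out surrogate keys, the class below maps each (source system, natural key) pair to a sequential integer. The class name, method name and the "crm"/"billing" source labels are assumptions for illustration; real ETL tools and databases provide their own sequence or identity mechanisms.

```python
from itertools import count


class SurrogateKeyGenerator:
    """Assigns sequential integer keys to natural keys from any source."""

    def __init__(self):
        self._next_key = count(1)
        self._lookup = {}  # (source_system, natural_key) -> surrogate key

    def key_for(self, source_system, natural_key):
        """Return the existing surrogate key, or allocate the next one."""
        business_key = (source_system, natural_key)
        if business_key not in self._lookup:
            self._lookup[business_key] = next(self._next_key)
        return self._lookup[business_key]
```

Because the lookup includes the source system, the same natural key arriving from two systems (say "C-001" from a CRM and from billing) receives two distinct surrogate keys, and the fact table only ever stores the small integers.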

Surrogate Keys Usage Cont.

A surrogate key is used as the unique identifier for the dimension tables.
– Appears as a foreign key in the corresponding data warehouse fact table.
– The primary key of the fact table is usually the composite key made up of the foreign keys (surrogate keys) from the dimension tables.
– The fact table may have its own surrogate key.

Advantages of Surrogate Keys

A surrogate key is usually a sequential integer.
– Saves storage space.
– Allows faster joins during data processing.
– Allows slowly changing dimensions to be handled.
• E.g. a customer changes billing address: a new dimension row with a new surrogate key is added, while the natural key (Customer ID) remains the same.
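
The billing-address example can be sketched as follows. This follows the common "add a new row" treatment of a slowly changing dimension (often called Type 2); the `is_current` flag, the class names and the `upsert` method are assumptions added for the illustration and do not come from the slides.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CustomerRow:
    customer_sk: int       # surrogate key, new for every version
    customer_id: str       # natural key, stable across versions
    billing_address: str
    is_current: bool = True


class CustomerDimension:
    def __init__(self):
        self.rows: List[CustomerRow] = []

    def current(self, customer_id: str) -> Optional[CustomerRow]:
        """Return the current version of a customer, if any."""
        for row in self.rows:
            if row.customer_id == customer_id and row.is_current:
                return row
        return None

    def upsert(self, customer_id: str, billing_address: str) -> int:
        """Add a new version when the address changes; return its key."""
        existing = self.current(customer_id)
        if existing and existing.billing_address == billing_address:
            return existing.customer_sk
        if existing:
            existing.is_current = False  # keep the old row as history
        row = CustomerRow(len(self.rows) + 1, customer_id, billing_address)
        self.rows.append(row)
        return row.customer_sk
```

Facts recorded before the change keep pointing at the old surrogate key, so historical sales stay attached to the address that was valid at the time.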

Snowflake Schema
• The snowflake schema is:
– An extension of the star schema.
– One fact table.
– Dimension tables can be replaced by a set of
smaller normalised tables.
– Allows for more detailed dimensions
– Reduces storage space
– Increases processing time (more table joins)

Example of a Snowflake Schema
• Sales Fact Table: Time_SK, Product_SK, Location_SK (surrogate keys); Units_sold, Unit_cost, Revenue, Profit, Average_Sales (facts).
• Time dimension: Time_SK, Day, Week, Month, Quarter, Year.
• Product dimension: Product_SK, Name, Subcategory, Category, Supplier_ID.
• Supplier table (normalised out of Product): Supplier_ID, City, Type.
• Location dimension: Location_SK, Store, Street, City_ID.
• City table (normalised out of Location): City_ID, City, District, Country.
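
The trade-off in the snowflake example (less repeated data, one more join) can be sketched by splitting a flat Location dimension into a location table and a city table. The sample rows and function names below are invented for illustration; the column split mirrors the Location and City_ID columns in the example above.

```python
# Hypothetical flat (star-schema) Location dimension rows.
FLAT_LOCATION = [
    {"location_sk": 1, "store": "Store A", "street": "1 High St",
     "city": "London", "district": "Camden", "country": "UK"},
    {"location_sk": 2, "store": "Store B", "street": "7 Mill Rd",
     "city": "London", "district": "Camden", "country": "UK"},
    {"location_sk": 3, "store": "Store C", "street": "3 Dock Ln",
     "city": "Leeds", "district": "West Yorkshire", "country": "UK"},
]


def snowflake_location(flat_rows):
    """Normalise city attributes into their own table, keyed by city_id."""
    city_ids = {}
    city_table, location_table = [], []
    for row in flat_rows:
        city_key = (row["city"], row["district"], row["country"])
        if city_key not in city_ids:
            city_ids[city_key] = len(city_ids) + 1
            city_table.append({"city_id": city_ids[city_key],
                               "city": row["city"],
                               "district": row["district"],
                               "country": row["country"]})
        location_table.append({"location_sk": row["location_sk"],
                               "store": row["store"],
                               "street": row["street"],
                               "city_id": city_ids[city_key]})
    return location_table, city_table


def city_of(location_sk, location_table, city_table):
    """The extra join a snowflake query needs: location -> city."""
    location = next(r for r in location_table
                    if r["location_sk"] == location_sk)
    city = next(c for c in city_table if c["city_id"] == location["city_id"])
    return city["city"]
```

The city details for London are stored once instead of twice, but finding the city of a store now takes a second lookup.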

Fact Constellation

• A fact constellation:
• Consists of multiple connected star schemas.
• Has several fact tables that share some dimensions.
• Allows a more flexible schema.
• Requires more complex queries.
• Requires more processing time.
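
A small sketch of why shared dimensions matter: two fact tables that reference the same product dimension can be summarised separately and then lined up on the shared key. The rows, the reduced column sets and the function names are assumptions for illustration only.

```python
from collections import defaultdict

# Shared dimension: product_sk -> product name (hypothetical data).
PRODUCT_DIM = {1: "Tea", 2: "Coffee"}

# Two fact tables that both reference the shared product dimension.
SALES_FACT = [(1, 10), (2, 4), (1, 6)]   # (product_sk, units_sold)
DELIVERY_FACT = [(1, 20), (2, 5)]        # (product_sk, quantity)


def total_by_product(fact_rows):
    """Sum the measure of one fact table per product surrogate key."""
    totals = defaultdict(int)
    for product_sk, measure in fact_rows:
        totals[product_sk] += measure
    return totals


def sold_versus_delivered():
    """Combine both fact tables through the shared dimension."""
    sold = total_by_product(SALES_FACT)
    delivered = total_by_product(DELIVERY_FACT)
    return {name: (sold.get(sk, 0), delivered.get(sk, 0))
            for sk, name in PRODUCT_DIM.items()}
```

Each fact table is aggregated on its own before the results are combined, which avoids double counting but is also part of why constellation queries are more complex.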

Example of Fact Constellation
• Sales Fact Table: Time_SK, Product_SK, Location_SK (surrogate keys); Units_sold, Unit_cost, Revenue, Profit, Average_Sales (facts).
• Delivery Fact Table: Time_SK, Product_SK, To_Location_SK, From_Location_SK (surrogate keys); Quantity, Revenue (facts).
• Time dimension: Time_SK, Day, Week, Month, Quarter, Year.
• Product dimension: Product_SK, Name, Subcategory, Category, Supplier_Type.
• Location dimension: Location_SK, Store, Street, City, District, Country.
• Both fact tables reference the Time, Product and Location dimensions.

Reading
• Chapter 4, section 4.2:
– Jiawei Han, Micheline Kamber, Jian Pei (2011), Data Mining: Concepts and Techniques, Third Edition, The Morgan Kaufmann Series in Data Management Systems.
