L7. Multidimensional Modeling

The document discusses multidimensional modeling techniques for structuring data. Multidimensional models describe measures, which are numerical data points, and dimensions, which provide context for the measures. Dimensions can be organized hierarchically. Data warehouses typically use a star schema with fact and dimension tables to represent multidimensional models, forming a multi-dimensional data cube. Aggregate functions can perform calculations across this cube.

Multidimensional Modeling
• Multidimensional modeling is a technique for structuring
data around business concepts
• ER models describe “entities” and “relationships”
• Multidimensional models describe “measures” and
“dimensions”

03/12/2023 2
Multi-Dimensional Data
• Measures - numerical (and additive)
data tracked by the business that can be
analyzed and examined
• Dimensions - business parameters that
define a transaction, relatively static
data such as lookup or reference tables
• Example: Analyst may want to view
sales data (measure) by geography, by
time, and by product (dimensions)
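A minimal sketch of this example in Python, assuming a few made-up sales records (the field names and numbers are illustrative and do not come from the slides). The measure is summed, while whichever dimension is chosen supplies the grouping:

```python
from collections import defaultdict

# Hypothetical sales records: three dimensions plus one additive measure.
sales = [
    {"geography": "LA", "time": "1990-Q1", "product": "Bread", "amount": 56},
    {"geography": "LA", "time": "1990-Q2", "product": "Milk", "amount": 34},
    {"geography": "NY", "time": "1990-Q1", "product": "Bread", "amount": 10},
]

def total_by(records, dimension, measure="amount"):
    """Sum the measure for each distinct value of the chosen dimension."""
    totals = defaultdict(int)
    for row in records:
        totals[row[dimension]] += row[measure]
    return dict(totals)

by_geography = total_by(sales, "geography")  # {'LA': 90, 'NY': 10}
by_product = total_by(sales, "product")      # {'Bread': 66, 'Milk': 34}
```

The same records answer each question; only the dimension passed to total_by changes.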

The Multi-Dimensional Model

Example queries:
• “Sales by product line over the past six months”
• “Sales by store between 1990 and 1995”

[Figure: a fact table surrounded by its dimension tables]
• Fact table for measures: Prod Code, Time Code, Store Code (key columns joining the fact table to the dimension tables); Sales, Qty (numerical measures)
• Dimension tables: Store Info, Product Info, Time Info
Dimensional Modeling
• Dimensions are organized into
hierarchies
• E.g., Time dimension: days → weeks → quarters
• E.g., Product dimension: product → product line → brand
• Dimensions have attributes
• Time: Date, Month, Year
• Store: StoreID, City, State, Country, Region

Dimension Hierarchies
• Store dimension: Total → Region → District → Stores
• Product dimension: Total → Manufacturer → Brand → Products
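As a rough illustration of a dimension hierarchy, the sketch below labels one day with the week and quarter it belongs to, following the days → weeks → quarters example. The function name and the use of ISO week numbering are assumptions made for this sketch. An ISO week can straddle two quarters, so a strict hierarchy needs a convention for such weeks.

```python
from datetime import date

def time_levels(d):
    """Return the label of each hierarchy level that contains the given day."""
    iso = d.isocalendar()              # (ISO year, ISO week, ISO weekday)
    quarter = (d.month - 1) // 3 + 1   # months 1-3 -> Q1, 4-6 -> Q2, ...
    return {
        "day": d.isoformat(),
        "week": f"{iso[0]}-W{iso[1]:02d}",
        "quarter": f"{d.year}-Q{quarter}",
    }

levels = time_levels(date(1990, 1, 15))
# {'day': '1990-01-15', 'week': '1990-W03', 'quarter': '1990-Q1'}
```

Grouping measures by one of these labels instead of by the raw date is what moves an analysis up the hierarchy.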

Schema Design
• Most data warehouses use a star schema to represent the
multidimensional model.
• Each dimension is represented by a dimension table that
describes it.
• A fact table connects to all dimension tables through a multi-way
join. Each tuple in the fact table holds a pointer (foreign key) to each
dimension table, which together give its multidimensional
coordinates, and stores the measures for those coordinates.
• The links between the fact table in the center and the dimension
tables in the extremities form a shape like a star.
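A small star schema can be sketched with Python's built-in sqlite3 module. The table names, column names, and rows below are hypothetical, loosely modeled on the Prod Code, Time Code, Store Code, Sales, and Qty columns of the earlier figure; this is an illustration under those assumptions, not the schema used in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (prod_code TEXT PRIMARY KEY, product_line TEXT);
CREATE TABLE store_dim   (store_code TEXT PRIMARY KEY, city TEXT);
CREATE TABLE time_dim    (time_code INTEGER PRIMARY KEY, year INTEGER);
-- Fact table: one key column per dimension plus the numerical measures.
CREATE TABLE sales_fact (
    prod_code  TEXT    REFERENCES product_dim(prod_code),
    time_code  INTEGER REFERENCES time_dim(time_code),
    store_code TEXT    REFERENCES store_dim(store_code),
    sales REAL,
    qty   INTEGER,
    PRIMARY KEY (prod_code, time_code, store_code)
);
INSERT INTO product_dim VALUES ('p1', 'Dairy'), ('p2', 'Bakery');
INSERT INTO store_dim   VALUES ('s1', 'LA'), ('s2', 'NY');
INSERT INTO time_dim    VALUES (1, 1990), (2, 1995);
INSERT INTO sales_fact  VALUES
    ('p1', 1, 's1', 12.0, 3),
    ('p2', 1, 's1', 11.0, 2),
    ('p1', 2, 's2', 44.0, 9);
""")

# "Sales by product line": join the fact table to one dimension and aggregate.
sales_by_line = conn.execute("""
    SELECT p.product_line, SUM(f.sales)
    FROM sales_fact AS f
    JOIN product_dim AS p ON p.prod_code = f.prod_code
    GROUP BY p.product_line
    ORDER BY p.product_line
""").fetchall()
# [('Bakery', 11.0), ('Dairy', 56.0)]
```

Each query touches the fact table once and joins only the dimensions it needs, which is the property the star shape is meant to provide.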

Star Schema (in RDBMS)
[Figure: star schema in an RDBMS]

Star Schema Example
[Figure: star schema example]

Star Schema with Sample Data
[Figure: star schema with sample data]

The “Classic” Star Schema
A relational model with a one-to-many relationship
between each dimension table and the fact table.
• A single fact table, with detail and summary data
• Fact table primary key has only one key column per
dimension
• Each dimension is a single table, highly denormalized
• Benefits: easy to understand, intuitive mapping to the
business entities, easy to define hierarchies, fewer
physical joins, low maintenance, very simple metadata
• Drawbacks: summary data in the fact table yields poorer
performance for summary levels; huge dimension tables can
be a problem

Snowflake Schema
• A snowflake schema is a variant of the star schema, but a more
complex model.
• “Snowflaking” is a method of normalizing the dimension
tables in a star schema.
• The normalization eliminates redundancy.
• The result is more complex queries and reduced query
performance.

Sales: Snowflake Schema

[Figure: snowflake schema for sales, with normalized dimension tables]
• Sales fact: Product key, Time key, Customer key, ….
• Product: Product key, Product name, Product code, Brand key
• Brand: Brand key, Brand name, Category key
• Category: Category key, Product category
• Salesrep: Salesrep key, Salesperson name, Territory key
• Territory: Territory key, Territory name, Region key
• Region: Region key, Region name
Snowflaking
• The attributes with low cardinality in each original
dimension table are removed to form separate tables.
These new tables are linked back to the original
dimension table through artificial keys.

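[Figure: the Product dimension snowflaked into three linked tables]
• Product: Product key, Product name, Product code, Brand key
• Brand: Brand key, Brand name, Category key
• Category: Category key, Product category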
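The snowflaking step can be sketched in plain Python. The denormalized rows and the function name are hypothetical; the output tables mirror the Product, Brand, and Category tables in the figure, with artificial integer keys generated in order of first appearance.

```python
# Hypothetical denormalized Product dimension, as it might look in a star schema.
product_dim = [
    {"product_key": 1, "product_name": "Whole Milk", "brand_name": "Acme", "product_category": "Dairy"},
    {"product_key": 2, "product_name": "Skim Milk", "brand_name": "Acme", "product_category": "Dairy"},
    {"product_key": 3, "product_name": "Rye Bread", "brand_name": "Baker", "product_category": "Bakery"},
]

def snowflake(rows):
    """Split brand and category attributes into their own tables with artificial keys."""
    category_keys = {}  # product_category -> category_key
    brand_keys = {}     # (brand_name, category_key) -> brand_key
    products = []
    for row in rows:
        category_key = category_keys.setdefault(
            row["product_category"], len(category_keys) + 1)
        brand_key = brand_keys.setdefault(
            (row["brand_name"], category_key), len(brand_keys) + 1)
        products.append({"product_key": row["product_key"],
                         "product_name": row["product_name"],
                         "brand_key": brand_key})
    brands = [{"brand_key": key, "brand_name": name, "category_key": cat}
              for (name, cat), key in brand_keys.items()]
    categories = [{"category_key": key, "product_category": name}
                  for name, key in category_keys.items()]
    return products, brands, categories

products, brands, categories = snowflake(product_dim)
```

The brand and category names are now stored once each, at the cost of two extra joins whenever a query needs the category of a product.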

Snowflake Schema
• Advantages:
• Small saving in storage space
• Normalized structures are easier to update and maintain
• Disadvantages:
• Schema is less intuitive, and end-users are put off by the
complexity
• Browsing through the contents is more difficult
• Query performance degrades because of additional joins

What is the Best Design?
• Performance benchmarking can be used to determine
which design is best.
• Snowflake schema: dimension tables are easier to maintain
when they are very large (reduces overall
space). It is not generally recommended in a data
warehouse environment.
• Star schema: more effective for data cube browsing
(fewer joins), which can improve query performance.

Aggregates
· Add up amounts for day 1
· In SQL: SELECT sum(amt) FROM SALE
WHERE date = 1
· Result: 81
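The query can be tried against an in-memory SQLite database. The rows below are hypothetical (the slide's table was not recovered); they were chosen so that day 1 adds up to the 81 shown on the slide. The storeId column is likewise an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SALE (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
# Hypothetical rows: (prodId, storeId, date, amt)
conn.executemany("INSERT INTO SALE VALUES (?, ?, ?, ?)", [
    ("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4),
])

# The WHERE clause keeps only day 1; sum() then collapses those rows to one value.
day1_total = conn.execute("SELECT sum(amt) FROM SALE WHERE date = 1").fetchone()[0]
# day1_total is 81 for these rows
```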

Aggregates
· Add up amounts by day
· In SQL: SELECT date, sum(amt) FROM SALE
GROUP BY date

Another Example
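· Add up amounts by day, product
· In SQL: SELECT date, prodId, sum(amt) FROM SALE
GROUP BY date, prodId
· Rollup: from totals by (day, product) to totals by day
· Drill-down: from totals by day back to totals by (day, product)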
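The relationship between the two groupings can be checked directly. This sketch assumes hypothetical SALE rows and an added storeId column; it computes totals by (day, product), rolls them up in Python, and compares the result with the coarser GROUP BY date query.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SALE (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
# Hypothetical rows: (prodId, storeId, date, amt)
conn.executemany("INSERT INTO SALE VALUES (?, ?, ?, ?)", [
    ("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4),
])

# Finer level: one total per (day, product).
by_day_product = conn.execute(
    "SELECT date, prodId, sum(amt) FROM SALE "
    "GROUP BY date, prodId ORDER BY date, prodId").fetchall()

# Coarser level: one total per day.
by_day = conn.execute(
    "SELECT date, sum(amt) FROM SALE GROUP BY date ORDER BY date").fetchall()

# Rolling up the finer result by dropping the product gives the coarser one.
rolled_up = defaultdict(int)
for day, _prod, amt in by_day_product:
    rolled_up[day] += amt
```

This works because sum is additive; an average could not be rolled up this way without also carrying the counts.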

Aggregates
• Operators: sum, count, max, min, median,
average
• “Having” clause
• Using dimension hierarchy
• average by region (within store)
• maximum by month (within date)
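A sketch of a HAVING clause combined with a dimension hierarchy, again using sqlite3. The STORE table, its region column, and all rows are assumptions for illustration; the store-to-region assignment follows the one given later in the slides (s1 in Region A; s2 and s3 in Region B).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STORE (storeId TEXT PRIMARY KEY, region TEXT);
CREATE TABLE SALE  (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER);
INSERT INTO STORE VALUES ('s1', 'A'), ('s2', 'B'), ('s3', 'B');
INSERT INTO SALE VALUES
    ('p1', 's1', 1, 12), ('p2', 's1', 1, 11), ('p1', 's3', 1, 50),
    ('p2', 's2', 1, 8),  ('p1', 's1', 2, 44), ('p1', 's2', 2, 4);
""")

# Average sale amount by region, keeping only regions whose total exceeds 65.
# WHERE filters rows before grouping; HAVING filters the groups afterwards.
big_regions = conn.execute("""
    SELECT st.region, avg(s.amt)
    FROM SALE AS s
    JOIN STORE AS st ON st.storeId = s.storeId
    GROUP BY st.region
    HAVING sum(s.amt) > 65
    ORDER BY st.region
""").fetchall()
```

With these rows Region A totals 67 and Region B totals 62, so only Region A survives the HAVING filter.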

Data Cube

[Figure: a fact table shown next to the equivalent multi-dimensional cube; dimensions = 2]

3-D Cube
[Figure: a fact table shown next to the equivalent multi-dimensional cube, with one layer per day (day 1, day 2); dimensions = 3]
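The two views hold the same information. In the sketch below, hypothetical fact rows are turned into a 3-D cube keyed by (product, store, day) coordinates, and then into a 2-D cube by summing over the day dimension. The row values and names are assumptions, since the figure's data was not recovered.

```python
from collections import defaultdict

# Hypothetical fact table rows: (prodId, storeId, date, amt)
fact_rows = [
    ("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4),
]

# 3-D cube: one cell per (product, store, day) coordinate.
cube_3d = {(prod, store, day): amt for prod, store, day, amt in fact_rows}

# 2-D cube: drop the day dimension by summing over it.
cube_2d = defaultdict(int)
for (prod, store, _day), amt in cube_3d.items():
    cube_2d[(prod, store)] += amt
cube_2d = dict(cube_2d)
```

Coordinates with no sales, such as ("p2", "s3"), are simply absent from the dictionary, which mirrors how a fact table stores only the non-empty cells.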

Example
Dimensions: Time, Product, Store
Attributes:
• Product (upc, price, …)
• Store …
Hierarchies:
• Product → Brand → …
• Day → Week → Quarter
• Store → Region → Country
[Figure: 3-D cube with axes Store (NY, SF, LA), Product (Juice, Milk, Coke, Cream, Soap, Bread), and Time (M, T, W, Th, F, S, S); arrows mark roll-up to region, roll-up to brand, and roll-up to week; sample cell: 56 units of bread sold in LA on M]

Cube Aggregation: Roll-up
Example: computing sums
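[Figure: a 3-D cube (day 1, day 2, ...) summed along its dimensions; the value 129 is shown; rollup moves toward the sums and drill-down moves back toward the detailed cells]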
Cube Operators for Roll-up
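[Figure: the cube (day 1, day 2, ...) with roll-up cells written as sale(s1,*,*), sale(s2,p2,*), and sale(*,*,*), where * stands for all values of that dimension; the value 129 is shown]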
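The * notation can be sketched as a small Python function in which "*" matches every value of a dimension. The fact rows are hypothetical, chosen so that the grand total equals the 129 shown in the figure, and the argument order (store, product, day) is inferred from sale(s2,p2,*).

```python
# Hypothetical fact table rows: (prodId, storeId, date, amt)
fact_rows = [
    ("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4),
]

def sale(store, product, day, rows=fact_rows):
    """Sum amt over the rows matching the coordinates; '*' matches any value."""
    return sum(
        amt
        for prod, st, d, amt in rows
        if store in ("*", st) and product in ("*", prod) and day in ("*", d)
    )

store_s1_total = sale("s1", "*", "*")  # all products, all days for store s1
s2_p2_total = sale("s2", "p2", "*")    # product p2 in store s2, all days
grand_total = sale("*", "*", "*")      # the whole cube
```

Each additional * removes one dimension from the result, so sale(*,*,*) is the single value at the top of the roll-up.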

Aggregation Using Hierarchies

[Figure: the cube (day 1, day 2) aggregated along the hierarchy store → region → country]
(store s1 in Region A;
stores s2, s3 in Region B)
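Aggregating along a hierarchy amounts to replacing each member by its parent and summing. In this sketch the store-to-region mapping is the one stated above, while the per-store totals and the country name are hypothetical.

```python
from collections import defaultdict

# Hypothetical totals per store.
store_totals = {"s1": 67, "s2": 12, "s3": 50}

# Hierarchy: store -> region (from the slide), region -> country (assumed).
store_to_region = {"s1": "Region A", "s2": "Region B", "s3": "Region B"}
region_to_country = {"Region A": "Country X", "Region B": "Country X"}

def roll_up(totals, parent_of):
    """Aggregate totals one level up the hierarchy described by parent_of."""
    rolled = defaultdict(int)
    for member, amount in totals.items():
        rolled[parent_of[member]] += amount
    return dict(rolled)

region_totals = roll_up(store_totals, store_to_region)
# {'Region A': 67, 'Region B': 62}
country_totals = roll_up(region_totals, region_to_country)
# {'Country X': 129}
```

Applying roll_up once per level walks up the hierarchy, and the grand total is preserved at every level.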

Slicing
[Figure: the slice TIME = day 1 taken from the cube (day 1, day 2)]
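A slice fixes one dimension to a single value and keeps the remaining dimensions. The sketch below uses a hypothetical cube keyed by (product, store, day); the function name and data are assumptions.

```python
# Hypothetical 3-D cube: (prodId, storeId, day) -> amt
cube = {
    ("p1", "s1", 1): 12, ("p2", "s1", 1): 11, ("p1", "s3", 1): 50,
    ("p2", "s2", 1): 8, ("p1", "s1", 2): 44, ("p1", "s2", 2): 4,
}

def slice_cube(cells, axis, value):
    """Keep the cells where the given axis equals value, then drop that axis."""
    return {
        key[:axis] + key[axis + 1:]: amt
        for key, amt in cells.items()
        if key[axis] == value
    }

# TIME = day 1: the result is a 2-D cube over (product, store).
day1 = slice_cube(cube, axis=2, value=1)
```

The slice has one dimension fewer than the original cube, and it can itself be rolled up or sliced again.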

Slicing & Pivoting
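Pivoting changes which dimension is shown on the rows and which on the columns, without changing any values. A minimal sketch, assuming a hypothetical 2-D slice keyed by (product, store):

```python
# Hypothetical 2-D slice for one day: (prodId, storeId) -> amt
day1 = {("p1", "s1"): 12, ("p2", "s1"): 11, ("p1", "s3"): 50, ("p2", "s2"): 8}

def to_grid(cells, row_axis, col_axis):
    """Lay out 2-D cells as (row labels, column labels, grid); empty cells are None."""
    rows = sorted({key[row_axis] for key in cells})
    cols = sorted({key[col_axis] for key in cells})
    lookup = {(key[row_axis], key[col_axis]): amt for key, amt in cells.items()}
    grid = [[lookup.get((r, c)) for c in cols] for r in rows]
    return rows, cols, grid

# Products as rows, stores as columns.
product_rows, store_cols, by_product = to_grid(day1, row_axis=0, col_axis=1)
# Pivot: stores as rows, products as columns.
store_rows, product_cols, by_store = to_grid(day1, row_axis=1, col_axis=0)
```

For a 2-D slice the pivoted grid is the transpose of the original, so the same numbers are simply read along the other axis.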

Summary of Operations
• Aggregation (roll-up)
• aggregate (summarize) data to the next higher level of a
dimension hierarchy
• e.g., total sales by city, year → total sales by region, year
• Navigation to detailed data (drill-down)
• Selection (slice) defines a subcube
• e.g., sales where city =‘Gainesville’ and date = ‘1/15/90’
• Calculation and ranking
• e.g., top 3% of cities by average income
• Visualization operations (e.g., Pivot)
• Time functions
• e.g., time average
