L7. Multidimensional Modeling
Multidimensional Modeling
• Multidimensional modeling is a technique for structuring
data around business concepts
• ER models describe “entities” and “relationships”
• Multidimensional models describe “measures” and
“dimensions”
03/12/2023 2
Multi-Dimensional Data
• Measures - numerical (and additive)
data tracked in the business that can be
analyzed and examined
• Dimensions - business parameters that
define a transaction, relatively static
data such as lookup or reference tables
• Example: Analyst may want to view
sales data (measure) by geography, by
time, and by product (dimensions)
The Multi-Dimensional Model
...
Dimensional Modeling
• Dimensions are organized into
hierarchies
• E.g., Time dimension: days → weeks → quarters
• E.g., Product dimension: product → product line → brand
• Dimensions have attributes
Time: Date, Month, Year
Store: StoreID, City, State, Country, Region
Dimension Hierarchies
Store Dimension: Total → Region → District → Stores
Product Dimension: Total → Manufacturer → Brand → Products
Schema Design
• Most data warehouses use a star schema to represent the
multi-dimensional model.
• Each dimension is represented by a dimension table that
describes it.
• A fact table connects to all dimension tables through a multi-way
join. Each tuple in the fact table holds a pointer (foreign key) to
each dimension table, which gives its multi-dimensional
coordinates, and stores the measures for those coordinates.
• The links between the fact table in the center and the dimension
tables at the extremities form a star shape.
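The structure described above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names (sales_fact, time_dim, store_dim, product_dim), the columns, and every row are assumptions made for the example and do not come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables (hypothetical names and rows).
CREATE TABLE time_dim    (time_key INTEGER PRIMARY KEY, date TEXT, month TEXT, year INTEGER);
CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, city TEXT, state TEXT, region TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, name TEXT, brand TEXT);

-- Fact table: one foreign key per dimension plus the measure.
CREATE TABLE sales_fact (
    time_key    INTEGER REFERENCES time_dim(time_key),
    store_key   INTEGER REFERENCES store_dim(store_key),
    product_key INTEGER REFERENCES product_dim(product_key),
    amount      REAL,
    PRIMARY KEY (time_key, store_key, product_key)
);

INSERT INTO time_dim VALUES (1, '2023-01-01', 'Jan', 2023), (2, '2023-02-01', 'Feb', 2023);
INSERT INTO store_dim VALUES (1, 'Los Angeles', 'CA', 'West'), (2, 'New York', 'NY', 'East');
INSERT INTO product_dim VALUES (1, 'Milk', 'BrandA'), (2, 'Bread', 'BrandB');
INSERT INTO sales_fact VALUES (1, 1, 1, 10.0), (1, 2, 2, 20.0), (2, 1, 2, 5.0);
""")

# Star join: the fact table joins to each dimension table it needs.
star_join = """
SELECT s.region, t.month, SUM(f.amount)
FROM sales_fact f
JOIN time_dim t    ON f.time_key = t.time_key
JOIN store_dim s   ON f.store_key = s.store_key
JOIN product_dim p ON f.product_key = p.product_key
GROUP BY s.region, t.month
ORDER BY s.region, t.month
"""
rows = conn.execute(star_join).fetchall()
print(rows)
```

Each fact row carries only keys and the measure; the descriptive attributes used for grouping (region, month) live in the dimension tables.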
Star Schema (in RDBMS)
Star Schema Example
Star Schema with Sample Data
The “Classic” Star Schema
• A relational model with a one-to-many relationship
between each dimension table and the fact table
• A single fact table, with detail and summary data
• The fact table's primary key has exactly one key column per
dimension
• Each dimension is a single table, highly denormalized
• Benefits: easy to understand, intuitive mapping to business
entities, easy to define hierarchies, fewer physical joins,
low maintenance, very simple metadata
• Drawbacks: summary data in the fact table yields poorer
performance for summary levels; huge dimension tables can be a
problem
Snowflake Schema
• Snowflake schema is a type of star schema but a more
complex model.
• “Snowflaking” is a method of normalizing the dimension
tables in a star schema.
• The normalization eliminates redundancy.
• The result is more complex queries and reduced query
performance.
Sales: Snowflake Schema
[Figure: snowflaked Sales schema. Labels recovered from the diagram: Category (Category key, Product category); Brand (Brand key, Brand name, Category key); Region (Region key, Region name); Salesrep]
Snowflaking
• The attributes with low cardinality in each original
dimension table are removed to form separate tables.
These new tables are linked back to the original
dimension table through artificial keys.
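A minimal sketch of snowflaking a single product dimension, again using sqlite3. The table names (product_star, product, brand, category), the rows, and the assumption that each brand belongs to exactly one category are all hypothetical; the generated integer keys play the role of the artificial keys mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Denormalized star-style dimension: brand and category repeat on every row.
CREATE TABLE product_star (product_key INTEGER PRIMARY KEY, name TEXT,
                           brand_name TEXT, category_name TEXT);
INSERT INTO product_star VALUES
    (1, 'Cola 1L',  'BrandA', 'Drinks'),
    (2, 'Cola 2L',  'BrandA', 'Drinks'),
    (3, 'Rye loaf', 'BrandB', 'Bakery');

-- Snowflaked version: low-cardinality attributes move to their own tables.
CREATE TABLE category (category_key INTEGER PRIMARY KEY, category_name TEXT UNIQUE);
CREATE TABLE brand    (brand_key INTEGER PRIMARY KEY, brand_name TEXT UNIQUE,
                       category_key INTEGER REFERENCES category(category_key));
CREATE TABLE product  (product_key INTEGER PRIMARY KEY, name TEXT,
                       brand_key INTEGER REFERENCES brand(brand_key));

INSERT INTO category (category_name)
    SELECT DISTINCT category_name FROM product_star;
INSERT INTO brand (brand_name, category_key)
    SELECT DISTINCT ps.brand_name, c.category_key
    FROM product_star ps JOIN category c ON c.category_name = ps.category_name;
INSERT INTO product (product_key, name, brand_key)
    SELECT ps.product_key, ps.name, b.brand_key
    FROM product_star ps JOIN brand b ON b.brand_name = ps.brand_name;
""")

# Reading a product's category from the snowflake needs two extra joins.
snowflake_rows = conn.execute("""
    SELECT p.name, b.brand_name, c.category_name
    FROM product p
    JOIN brand b    ON p.brand_key = b.brand_key
    JOIN category c ON b.category_key = c.category_key
    ORDER BY p.product_key
""").fetchall()
star_rows = conn.execute(
    "SELECT name, brand_name, category_name FROM product_star ORDER BY product_key"
).fetchall()
print(snowflake_rows)
```

Both layouts return the same rows; the snowflake stores each brand and category name once, at the cost of the additional joins.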
Snowflake Schema
• Advantages:
• Small saving in storage space
• Normalized structures are easier to update and maintain
• Disadvantages:
• The schema is less intuitive, and end-users are put off by the
complexity
• Browsing through the contents is more difficult
• Query performance degrades because of the additional joins
What is the Best Design?
• Performance benchmarking can be used to determine
the best design.
• Snowflake schema: dimension tables are easier to maintain
when they are very large (it reduces overall
space). It is not generally recommended in a data
warehouse environment.
• Star schema: more effective for data cube browsing
because it needs fewer joins, which can improve query performance.
Aggregates
· Add up amounts for day 1
· In SQL: SELECT sum(amt) FROM SALE
WHERE date = 1
· Result: 81
Aggregates
· Add up amounts by day
· In SQL: SELECT date, sum(amt) FROM SALE
GROUP BY date
Another Example
· Add up amounts by day, product
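· In SQL: SELECT date, prodId, sum(amt) FROM SALE
GROUP BY date, prodId
· Rollup: from totals by (date, prodId) to totals by date
· Drill-down: from totals by date back to totals by (date, prodId)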
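The aggregate queries on these slides can be tried against a small in-memory table. The six SALE rows below are assumed sample data, chosen only so that the day 1 total matches the 81 shown on the slide; the slides do not list the rows themselves. In the last query, prodId is included in the SELECT list so that each result row identifies its product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SALE (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
# Assumed sample rows (not given in the slides).
conn.executemany("INSERT INTO SALE VALUES (?, ?, ?, ?)", [
    ("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4),
])

# Add up amounts for day 1.
day1_total = conn.execute("SELECT sum(amt) FROM SALE WHERE date = 1").fetchone()[0]

# Add up amounts by day.
by_day = conn.execute(
    "SELECT date, sum(amt) FROM SALE GROUP BY date ORDER BY date"
).fetchall()

# Add up amounts by day and product (the drill-down of by_day).
by_day_product = conn.execute(
    "SELECT date, prodId, sum(amt) FROM SALE "
    "GROUP BY date, prodId ORDER BY date, prodId"
).fetchall()

# Rollup: collapsing the product column of the finer result gives by_day again.
rolled_up = {}
for date, _prod, total in by_day_product:
    rolled_up[date] = rolled_up.get(date, 0) + total

print(day1_total, by_day, by_day_product)
```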
Aggregates
• Operators: sum, count, max, min, median,
average
• “Having” clause
• Using dimension hierarchy
• average by region (within store)
• maximum by month (within date)
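As a rough illustration of the operators and the "Having" clause, the sketch below groups sales by store and keeps only groups whose total exceeds a threshold. The SALE rows and the threshold of 20 are assumptions for the example; median is left out of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SALE (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
# Assumed sample rows (not given in the slides).
conn.executemany("INSERT INTO SALE VALUES (?, ?, ?, ?)", [
    ("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4),
])

# WHERE filters rows before grouping; HAVING filters the groups afterwards.
having_rows = conn.execute("""
    SELECT storeId, sum(amt), count(*), max(amt), min(amt), avg(amt)
    FROM SALE
    GROUP BY storeId
    HAVING sum(amt) > 20
    ORDER BY storeId
""").fetchall()

for store, total, n, largest, smallest, average in having_rows:
    print(store, total, n, largest, smallest, round(average, 2))
```

Store s2 is dropped by the HAVING condition because its total (8 + 4) does not exceed 20.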
Data Cube
dimensions = 2
3-D Cube
[Figure: the fact table view shown beside the equivalent multi-dimensional cube, with one layer per day (day 1, day 2); dimensions = 3]
Example
• Dimensions: Time, Product, Store
• Attributes: Product (upc, price, …), Store …
• Hierarchies:
• Product → Brand → …
• Day → Week → Quarter
• Store → Region → Country
[Figure: cube with axes Store (NY, SF, LA), Product (Juice, Milk, Coke, Cream, Soap, Bread), and Time (M, T, W, Th, F, S, S); arrows mark roll-up to region, roll-up to brand, and roll-up to week]
• Example cell: 56 units of bread sold in LA on M
Cube Aggregation: Roll-up
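Example: computing sums
[Figure: the day 1 and day 2 layers of the cube with sums computed along each dimension; the overall total is 129. Rollup moves from detail to totals; drill-down moves back to detail.]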
Cube Operators for Roll-up
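• A * in a position means "aggregate over all values of that dimension"
• sale(s1,*,*): total for store s1 over all products and days
• sale(s2,p2,*): total for store s2 and product p2 over all days
• sale(*,*,*): grand total (129 in the figure)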
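The sale(...) notation can be mimicked with a small Python function in which "*" matches every value of a dimension. The function name follows the slide, but its signature, the argument order (store, product, day), and the fact rows are assumptions; the rows were picked so that the grand total equals the 129 shown in the figure.

```python
# Assumed fact rows: (store, product, day, amount). Not given in the slides.
FACTS = [
    ("s1", "p1", 1, 12), ("s1", "p2", 1, 11), ("s3", "p1", 1, 50),
    ("s2", "p2", 1, 8), ("s1", "p1", 2, 44), ("s2", "p1", 2, 4),
]

ALL = "*"  # wildcard: aggregate over every value of that dimension


def sale(store=ALL, product=ALL, day=ALL):
    """Sum the amounts of all facts matching the given coordinates."""
    total = 0
    for s, p, d, amt in FACTS:
        if store in (ALL, s) and product in (ALL, p) and day in (ALL, d):
            total += amt
    return total


print(sale("s1", ALL, ALL))   # store s1, all products, all days
print(sale("s2", "p2", ALL))  # store s2, product p2, all days
print(sale(ALL, ALL, ALL))    # grand total
```

Each additional * rolls the cube up along one more dimension, so sale(*,*,*) collapses the whole cube to a single number.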
Aggregation Using Hierarchies
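• Hierarchy: store → region → country
• Example: store s1 is in Region A; stores s2 and s3 are in Region B
[Figure: the day 1 and day 2 layers of the cube aggregated from store level up to region level]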
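Rolling up along the store hierarchy amounts to joining the facts to a store-to-region mapping and grouping by the higher level. In the sketch below the mapping (s1 in Region A; s2 and s3 in Region B) comes from the slide, while the STORE table name, the SALE rows, and the omission of the country level are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SALE  (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER);
CREATE TABLE STORE (storeId TEXT PRIMARY KEY, region TEXT);

-- Assumed sample facts (not given in the slides).
INSERT INTO SALE VALUES
    ('p1', 's1', 1, 12), ('p2', 's1', 1, 11), ('p1', 's3', 1, 50),
    ('p2', 's2', 1, 8),  ('p1', 's1', 2, 44), ('p1', 's2', 2, 4);

-- Store-to-region mapping from the slide.
INSERT INTO STORE VALUES ('s1', 'A'), ('s2', 'B'), ('s3', 'B');
""")

# Roll up from store to region, keeping the day dimension.
by_region_day = conn.execute("""
    SELECT st.region, s.date, sum(s.amt)
    FROM SALE s JOIN STORE st ON s.storeId = st.storeId
    GROUP BY st.region, s.date
    ORDER BY st.region, s.date
""").fetchall()

# Roll up further: region totals over all days.
by_region = conn.execute("""
    SELECT st.region, sum(s.amt)
    FROM SALE s JOIN STORE st ON s.storeId = st.storeId
    GROUP BY st.region
    ORDER BY st.region
""").fetchall()

print(by_region_day)
print(by_region)
```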
Slicing
[Figure: slicing the cube on TIME = day 1 keeps only the day 1 layer]
Slicing & Pivoting
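A small sketch of both operations in plain Python: slicing fixes one dimension to a single value, and pivoting swaps which dimension labels the rows and which labels the columns. The helper names (slice_by_day, crosstab) and the fact rows are hypothetical.

```python
# Assumed fact rows: (product, store, day, amount). Not given in the slides.
FACTS = [
    ("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4),
]


def slice_by_day(facts, day):
    """Slice: keep only cells with TIME = day, dropping the time dimension."""
    return [(prod, store, amt) for prod, store, d, amt in facts if d == day]


def crosstab(cells, swap=False):
    """Arrange (product, store, amount) cells as a nested dict.

    By default rows are products and columns are stores; swap=True pivots
    the view so rows are stores and columns are products.
    """
    table = {}
    for prod, store, amt in cells:
        row, col = (store, prod) if swap else (prod, store)
        table.setdefault(row, {})
        table[row][col] = table[row].get(col, 0) + amt
    return table


day1 = slice_by_day(FACTS, 1)
by_product = crosstab(day1)            # rows: products, columns: stores
by_store = crosstab(day1, swap=True)   # pivoted: rows: stores, columns: products
print(by_product)
print(by_store)
```

The pivot changes only the orientation of the view; the cell values of the slice stay the same.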
Summary of Operations
• Aggregation (roll-up)
• aggregate (summarize) data to the next higher level of a
dimension hierarchy
• e.g., total sales by city, year → total sales by region, year
• Navigation to detailed data (drill-down)
• Selection (slice) defines a subcube
• e.g., sales where city =‘Gainesville’ and date = ‘1/15/90’
• Calculation and ranking
• e.g., top 3% of cities by average income
• Visualization operations (e.g., Pivot)
• Time functions
• e.g., time average