
COMP 430 Intro. To Database Systems: Denormalization & Dimensional Modeling

This document discusses dimensional modeling, an alternative approach to entity-relationship (ER) modeling for database design. Dimensional modeling emphasizes fast retrieval and aggregation of historical data. It models data with fact tables containing numeric, additive data and dimension tables in one-to-many relationships with facts. Dimensional modeling starts by identifying potential queries and facts, and aims to support each query with one fact table and related dimensions. It may begin with an ER model but then denormalizes into a star or starflake schema to optimize for queries. Facts represent the center of a multidimensional data cube, and dimensions define the axes.

COMP 430

Intro. to Database Systems


Denormalization & Dimensional Modeling
Some consequences of normalization
• Data redundancy is reduced or eliminated.
• Relations are broken into smaller, related tables.
• Using all the attributes from the original relation requires joining
these smaller tables.
Denormalization
Deliberately reintroducing some redundancy, so that we can access
data faster.

DENORMALIZED
DATA AHEAD
Example
Technique: Add duplicate fields
Technique: Add computed fields

Example
Technique: Join tables
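
The three techniques above can be sketched on a small schema. This is only an illustration, not the slides' original example: the customer, orders, order_line, and order_line_wide tables and all their columns are hypothetical, and Python's built-in sqlite3 module stands in for a real DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized starting point (hypothetical tables).
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id),
        -- Technique: add duplicate fields (a copy of customer.name).
        customer_name TEXT,
        -- Technique: add computed fields (sum over this order's lines).
        order_total REAL
    );
    CREATE TABLE order_line (
        order_id INTEGER REFERENCES orders(order_id),
        sku TEXT,
        units INTEGER,
        unit_price REAL
    );

    INSERT INTO customer VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders (order_id, customer_id) VALUES (10, 1), (11, 2);
    INSERT INTO order_line VALUES
        (10, 'A', 2, 5.0), (10, 'B', 1, 3.5), (11, 'A', 4, 5.0);

    -- Fill the redundant columns from the normalized data.
    UPDATE orders SET
        customer_name = (SELECT name FROM customer c
                         WHERE c.customer_id = orders.customer_id),
        order_total = (SELECT SUM(units * unit_price) FROM order_line l
                       WHERE l.order_id = orders.order_id);

    -- Technique: join tables (store the join result as its own table).
    CREATE TABLE order_line_wide AS
        SELECT l.order_id, l.sku, l.units, l.unit_price,
               o.customer_id, c.name AS customer_name
        FROM order_line l
        JOIN orders o ON o.order_id = l.order_id
        JOIN customer c ON c.customer_id = o.customer_id;
""")

# Reads of the redundant columns now need no join.
order_rows = conn.execute(
    "SELECT order_id, customer_name, order_total FROM orders ORDER BY order_id"
).fetchall()
wide_count = conn.execute("SELECT COUNT(*) FROM order_line_wide").fetchone()[0]
```

The cost is that customer_name, order_total, and order_line_wide must all be kept in step with the base tables on every later update.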
Less common denormalization techniques
• Duplicating a commonly-used subset of table fields
• Splitting some table rows into different tables
  • Frequently- vs. rarely-used data
  • Data for different regions
  • Common subclasses of data
When to denormalize?

Typically used when some or all of the following apply:


• Many queries need to join the data
• Joining the data is expensive – uses scans, rather than indices
• Computing derived data is expensive – complex queries or complex functions
Dealing with redundant data
Still want data consistency, but now it requires work.
Is full consistency required at all times?

Techniques:
• Stored procedures act as API for updating DB. They add the redundant data.
• Triggers check (and fix?) consistency.
• Application code carefully maintains consistency during updates.
• Reconcile data as background process.
• Reconcile data during system maintenance.
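
Two of these techniques, a trigger and a reconciliation check, might look like the following. This is a minimal sketch under assumptions: the orders and order_line tables and the find_inconsistent_orders helper are hypothetical, and the trigger covers inserts only (updates and deletes of order lines would need their own triggers).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        order_total REAL NOT NULL DEFAULT 0   -- redundant, derived data
    );
    CREATE TABLE order_line (
        order_id INTEGER REFERENCES orders(order_id),
        units INTEGER,
        unit_price REAL
    );

    -- Trigger keeps the redundant order_total in step with new lines.
    CREATE TRIGGER order_line_insert AFTER INSERT ON order_line
    BEGIN
        UPDATE orders
        SET order_total = order_total + NEW.units * NEW.unit_price
        WHERE order_id = NEW.order_id;
    END;
""")

conn.execute("INSERT INTO orders (order_id) VALUES (10)")
conn.execute("INSERT INTO order_line VALUES (10, 2, 5.0)")
conn.execute("INSERT INTO order_line VALUES (10, 1, 3.5)")


def find_inconsistent_orders(conn):
    """Reconciliation check: orders whose stored total differs from the
    total recomputed from the order lines."""
    return conn.execute("""
        SELECT o.order_id
        FROM orders o
        LEFT JOIN order_line l ON l.order_id = o.order_id
        GROUP BY o.order_id
        HAVING ABS(o.order_total
                   - COALESCE(SUM(l.units * l.unit_price), 0)) > 1e-9
    """).fetchall()


stored_total = conn.execute(
    "SELECT order_total FROM orders WHERE order_id = 10"
).fetchone()[0]
```

A check like find_inconsistent_orders could run as a background process or during maintenance, which fits the case where full consistency is not required at all times.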
Dimensional modeling
In a tiny, brief nutshell
An alternative to ER modeling

Only a brief overview.


So, we’ll view it through our lens of ER modeling + denormalization.

Emphasizes decision making & use of historical data


• Fast retrieval & aggregation of data
• Less concern with updating data & maintaining consistency while updating
DB design often resembles multiple starflakes
Starflake = tree of junction table & child tables in 1-to-many relationships

College(collegeid, …)
Course(crn, …)
Student(sid, …, collegeid, …, home_stateid)
State(stateid, …, country)
Enrollment(crn, sid, …)
Teach(crn, instr_id, …)
Instructor(instr_id, …)

Typical:
• Few cycles.
• Super/sub classes implemented only with superclass table.
Starflakes can be compressed to stars
Star = junction table & one level of child tables in 1-to-many relationships

Course(crn, …)
Student(sid, …, college, …, home_state)
Enrollment(crn, sid, …)
Teach(crn, instr_id, …)
Instructor(instr_id, …)

Denormalize by joining each child’s tree.
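
Joining a child's tree can be sketched for Student, whose subtree is College and State. The elided columns (the "…") are not known, so student_name, college_name, and state_name below are hypothetical placeholders, as is the StudentDim table name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Starflake: Student points to College and State child tables.
    -- Columns other than the keys and country are hypothetical.
    CREATE TABLE College (collegeid INTEGER PRIMARY KEY, college_name TEXT);
    CREATE TABLE State (
        stateid INTEGER PRIMARY KEY, state_name TEXT, country TEXT
    );
    CREATE TABLE Student (
        sid INTEGER PRIMARY KEY,
        student_name TEXT,
        collegeid INTEGER REFERENCES College(collegeid),
        home_stateid INTEGER REFERENCES State(stateid)
    );
    INSERT INTO College VALUES (1, 'Engineering'), (2, 'Humanities');
    INSERT INTO State VALUES (48, 'Texas', 'USA'), (6, 'California', 'USA');
    INSERT INTO Student VALUES (100, 'Ada', 1, 48), (101, 'Grace', 2, 6);

    -- Star: join Student's whole subtree into one dimension table.
    CREATE TABLE StudentDim AS
        SELECT s.sid, s.student_name,
               c.college_name AS college,
               st.state_name AS home_state,
               st.country AS home_country
        FROM Student s
        JOIN College c ON c.collegeid = s.collegeid
        JOIN State st ON st.stateid = s.home_stateid;
""")

student_dim = conn.execute(
    "SELECT sid, college, home_state, home_country FROM StudentDim ORDER BY sid"
).fetchall()
```

College and state values are now repeated in every student row, which is the redundancy traded for one fewer level of joins.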
Fact & dimension tables
Facts: The junction tables are the most important data – the facts
• E.g.: store purchases, class enrollments, click data
• Generally the largest tables
• Key data often numeric & additive – e.g., quantity bought, cost per unit,
advertisement views

Dimensions: The child tables in 1-to-many relationship with facts


• E.g., stores, customers, sales people, sales period
Dimensional modeling process
• Centered on identifying the business model
• Identifying the potential queries
• Identifying the facts – the data used in such queries
• Each query should use only one fact table & its dimension tables.

• Possibly start with an ER model


• Identify which junction tables serve as fact tables
• Use only surrogate keys
• Often add time dimension
• Denormalize into starflake or star schema
ER model

Customer(customer_ID, customer_name, purchase_profile, credit_profile, address)
Product(sku, description, brand, category)
Store(store_id, store_name, address, district, floor_type)
Clerk(clerk_id, clerk_name, clerk_grade)
Promotion(promotion_id, promotion_name, price_type, ad_type)
Order(order_id, customer_id, store_id, clerk_id, date)
OrderLine(order_id, sku, promotion_id, dollars_sold, units_sold, dollars_cost)
Dimensional model

Fact table:
Order(time_key, store_key, clerk_key, product_key, customer_key, promotion_key, dollars_sold, units_sold, dollars_cost)

Dimension tables:
Customer(customer_key, customer_name, purchase_profile, credit_profile, address)
Product(product_key, sku, description, brand, category)
Store(store_key, store_id, store_name, address, district, floor_type)
Clerk(clerk_key, clerk_id, clerk_name, clerk_grade)
Promotion(promotion_key, promotion_name, price_type, ad_type)
Time(time_key, SQL_date, day_of_week, month)
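
A typical query against the dimensional model above joins the fact table to a few dimensions and aggregates. The sketch below keeps only the Time and Store dimensions and a subset of their columns. The table names order_fact, time_dim, and store_dim are my own choice (ORDER is an SQL keyword), and the rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Two dimensions from the model, keyed by surrogate keys.
    CREATE TABLE time_dim (
        time_key INTEGER PRIMARY KEY,
        SQL_date TEXT, day_of_week TEXT, month TEXT
    );
    CREATE TABLE store_dim (
        store_key INTEGER PRIMARY KEY,
        store_id TEXT, store_name TEXT, district TEXT
    );
    -- Fact table: foreign keys to dimensions plus additive measures.
    CREATE TABLE order_fact (
        time_key INTEGER REFERENCES time_dim(time_key),
        store_key INTEGER REFERENCES store_dim(store_key),
        dollars_sold REAL, units_sold INTEGER, dollars_cost REAL
    );
    INSERT INTO time_dim VALUES
        (1, '2024-01-15', 'Mon', 'Jan'),
        (2, '2024-01-16', 'Tue', 'Jan'),
        (3, '2024-02-01', 'Thu', 'Feb');
    INSERT INTO store_dim VALUES
        (1, 'S-01', 'Downtown', 'North'),
        (2, 'S-02', 'Mall', 'South');
    INSERT INTO order_fact VALUES
        (1, 1, 10.0, 2, 6.0),
        (2, 1, 5.0, 1, 3.0),
        (2, 2, 8.0, 4, 5.0),
        (3, 2, 20.0, 5, 12.0);
""")

# One fact table, its dimensions, and an aggregation over the measures.
sales_by_month_district = conn.execute("""
    SELECT t.month, s.district, SUM(f.dollars_sold), SUM(f.units_sold)
    FROM order_fact f
    JOIN time_dim t ON t.time_key = f.time_key
    JOIN store_dim s ON s.store_key = f.store_key
    GROUP BY t.month, s.district
    ORDER BY t.month, s.district
""").fetchall()
```

Every join here goes from the fact table directly to a dimension, so the query never needs more than one level of joins.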
View fact table as multidimensional data cube

Facts are the data in the cube.

Each dimension table represents a dimension of the cube.

Facts might be pre-aggregated along each dimension or combination of dimensions.
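
Pre-aggregation along every combination of dimensions can be sketched in plain Python. The build_cube function, the three dimension names, and the sample facts are all hypothetical. A real system would more likely compute these totals inside the database.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("month", "district", "category")

# Fact rows with their dimension values already looked up (sample data).
facts = [
    {"month": "Jan", "district": "North", "category": "Toys", "units_sold": 2},
    {"month": "Jan", "district": "South", "category": "Toys", "units_sold": 4},
    {"month": "Feb", "district": "South", "category": "Books", "units_sold": 5},
    {"month": "Jan", "district": "North", "category": "Books", "units_sold": 1},
]


def build_cube(rows, dimensions, measure):
    """Sum `measure` for every combination of dimensions, including the
    empty combination (the grand total)."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in dims)
                totals[key] += row[measure]
            cube[dims] = dict(totals)
    return cube


cube = build_cube(facts, DIMENSIONS, "units_sold")

# Lookups replace aggregation at query time.
jan_total = cube[("month",)][("Jan",)]
grand_total = cube[()][()]
```

With 3 dimensions there are 2^3 = 8 combinations to store, so the number of pre-aggregated tables doubles with each added dimension.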
Sometimes things are still messy

Not all data fit nicely into facts + 1-to-many dimensions.

Leads to exceptions from this simple presentation.
