Session 4 - Datawarehousing
data warehouse
Data modeling: Schemas
• The star schema is the most common schema for dimensional models: a fact table
surrounded by multiple dimension tables
• The star schema is a compromise between a fully normalized and a fully
denormalized model
• Facts are stored in normalized tables. Dimensions, on the other hand, are
denormalized tables containing attributes that would be spread across multiple
tables in a 3NF data model
• Star schemas have “flattened” (denormalized) dimension tables: each hierarchy is
flattened into one table
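As a minimal sketch of the layout described above, the following uses Python's built-in sqlite3 module to build one fact table surrounded by two denormalized dimension tables. All table names, column names, and rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical star schema: one fact table, two denormalized dimensions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The product hierarchy (product -> category -> department) is
    -- flattened into a single dimension table.
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY,
        product_name TEXT,
        category TEXT,
        department TEXT
    );
    CREATE TABLE dim_store (
        store_key INTEGER PRIMARY KEY,
        store_name TEXT,
        city TEXT,
        state TEXT
    );
    -- The fact table holds one foreign key per dimension plus the measures.
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        store_key INTEGER REFERENCES dim_store(store_key),
        quantity INTEGER,
        sales_amount REAL
    );
""")
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?, ?)", [
    (1, "Whole milk 1L", "Milk", "Dairy"),
    (2, "Cheddar 200g", "Cheese", "Dairy"),
    (3, "Sourdough loaf", "Bread", "Bakery"),
])
conn.executemany("INSERT INTO dim_store VALUES (?, ?, ?, ?)", [
    (1, "Store A", "Springfield", "IL"),
    (2, "Store B", "Madison", "WI"),
])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", [
    (1, 1, 10, 12.0),
    (2, 1, 4, 16.0),
    (3, 2, 5, 20.0),
    (1, 2, 6, 7.5),
])

# A single join per dimension reaches any attribute of that dimension.
sales_by_department = conn.execute("""
    SELECT p.department, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.department
    ORDER BY p.department
""").fetchall()
print(sales_by_department)
```

Because the department attribute lives directly in the product dimension, the query needs only one join to group sales by department.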
data warehouse
Data modeling: Schemas
• Advantages of normalizing the dimension tables:
• Small savings in storage space
• Normalized structures are easier to update and maintain
• Disadvantages:
• The schema is less intuitive, and end users are put off by the complexity
• Browsing through the contents becomes more difficult
• Degraded query performance because of additional joins
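The "additional joins" trade-off can be illustrated with a small sketch, assuming the list above refers to keeping a dimension hierarchy in separate normalized tables rather than one flattened table. The example uses Python's built-in sqlite3 module; every table name, column name, and row is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized product hierarchy: three tables instead of one.
    CREATE TABLE department (
        department_key INTEGER PRIMARY KEY, department_name TEXT);
    CREATE TABLE category (
        category_key INTEGER PRIMARY KEY, category_name TEXT,
        department_key INTEGER REFERENCES department(department_key));
    CREATE TABLE product (
        product_key INTEGER PRIMARY KEY, product_name TEXT,
        category_key INTEGER REFERENCES category(category_key));
    CREATE TABLE fact_sales (product_key INTEGER, sales_amount REAL);

    INSERT INTO department VALUES (1, 'Dairy'), (2, 'Bakery');
    INSERT INTO category VALUES (10, 'Milk', 1), (11, 'Cheese', 1), (12, 'Bread', 2);
    INSERT INTO product VALUES
        (100, 'Whole milk 1L', 10), (101, 'Cheddar 200g', 11),
        (102, 'Sourdough loaf', 12);
    INSERT INTO fact_sales VALUES (100, 12.0), (101, 16.0), (102, 20.0), (100, 7.5);

    -- Flattened alternative: the same attributes in a single table.
    CREATE TABLE dim_product_flat AS
        SELECT p.product_key, p.product_name, c.category_name, d.department_name
        FROM product AS p
        JOIN category AS c ON c.category_key = p.category_key
        JOIN department AS d ON d.department_key = c.department_key;
""")

# Normalized dimension: three joins to reach the department name.
normalized_query = """
    SELECT d.department_name, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN product AS p ON p.product_key = f.product_key
    JOIN category AS c ON c.category_key = p.category_key
    JOIN department AS d ON d.department_key = c.department_key
    GROUP BY d.department_name ORDER BY d.department_name
"""
# Flattened dimension: one join gives the same answer.
flat_query = """
    SELECT p.department_name, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_product_flat AS p ON p.product_key = f.product_key
    GROUP BY p.department_name ORDER BY p.department_name
"""
normalized_result = conn.execute(normalized_query).fetchall()
flat_result = conn.execute(flat_query).fetchall()
print(normalized_result)
print(flat_result)
```

Both queries return the same totals; the normalized layout stores each department name once but needs two more joins per query.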
data warehouse
Data modeling: Fact table sizes
• Maximum number of base fact table records: 1825 × 300 × 4000 × 1 = 2.19 billion
(roughly 2 billion)
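The estimate can be checked with a few lines of Python. The slide gives only the four numbers, so the factor labels below are assumptions added for readability (for instance, 1825 is read here as 5 years of 365 days).

```python
from math import prod

# Factor labels are assumptions; only the numbers come from the slide.
factors = {
    "days of history": 1825,        # assumed: 5 years x 365 days
    "stores": 300,                  # assumed label
    "products sold per store per day": 4000,  # assumed label
    "promotions per sold product": 1,         # assumed label
}

# The upper bound is the product of the sizes of all dimensions.
max_fact_rows = prod(factors.values())
print(f"{max_fact_rows:,}")
```

The exact product is 2,190,000,000 rows, which the slide rounds to 2 billion.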
data warehouse
Data modeling
Kimball approach
According to Kimball, there are four key decisions that must be made during the
design of a dimensional model:
1. Select the business process
2. Declare the grain
3. Identify the dimensions
4. Identify the facts
It is also important to decide on the duration of the database, that is, how far
back in time the historical data should go.
Source: Kimball & Ross (2002)
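One way to keep track of these decisions is a small checklist object. The sketch below is only an illustration: the class `DimensionalDesign`, its fields, and the example values are hypothetical and are not part of Kimball's method.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DimensionalDesign:
    """Hypothetical record of the design decisions, in the order listed above."""
    business_process: str = ""
    grain: str = ""
    dimensions: List[str] = field(default_factory=list)
    facts: List[str] = field(default_factory=list)
    history_years: Optional[int] = None  # duration of the database

    def open_decisions(self) -> List[str]:
        """Return the decisions that have not been made yet."""
        checks = [
            ("Select the business process", bool(self.business_process)),
            ("Declare the grain", bool(self.grain)),
            ("Identify the dimensions", bool(self.dimensions)),
            ("Identify the facts", bool(self.facts)),
            ("Decide the duration of the database",
             self.history_years is not None),
        ]
        return [name for name, done in checks if not done]


# Example values are assumptions made up for this sketch.
design = DimensionalDesign(
    business_process="retail sales",
    grain="one row per product per store per day",
)
print(design.open_decisions())
```

With only the first two decisions filled in, the remaining three are reported as open, which mirrors the idea that the grain is declared before dimensions and facts are identified.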
data warehouse
Data modeling
• A retail business has 100 grocery stores spread across five states
• Each store has a full complement of departments, including grocery, frozen foods,
dairy, meat, produce, bakery, floral, and health/beauty aids
• Each store has approximately 60,000 individual products, called stock keeping
units (SKUs), on its shelves
• Data is collected at the cash registers as customers purchase products
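To make the data collection step concrete, the sketch below rolls individual cash register scans up to one row per day, store, and SKU. The record layout, identifiers, dates, and the choice of a daily grain are all assumptions for illustration; the slide does not specify them.

```python
from collections import defaultdict

# Hypothetical register scans: (date, store_id, sku, quantity, amount_cents).
scans = [
    ("2024-01-05", "store_001", "SKU-000123", 1, 249),
    ("2024-01-05", "store_001", "SKU-000123", 2, 498),
    ("2024-01-05", "store_001", "SKU-004567", 1, 399),
    ("2024-01-05", "store_002", "SKU-000123", 1, 249),
]

# Assumed grain: one row per (date, store, SKU).
# Each value holds [total quantity, total amount in cents].
daily = defaultdict(lambda: [0, 0])
for date, store_id, sku, quantity, amount_cents in scans:
    totals = daily[(date, store_id, sku)]
    totals[0] += quantity
    totals[1] += amount_cents

for key in sorted(daily):
    print(key, daily[key])
```

Amounts are kept as integer cents so the totals are exact; the four scans collapse into three rows at this grain.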