Week 3 - Data Warehouse Design

The document outlines data warehouse design principles, focusing on data modeling techniques such as Entity-Relationship (ER) diagrams and dimensional modeling. It explains star and snowflake schemas, detailing their structures, advantages, and limitations, along with the roles of fact and dimension tables in organizing data. A practical exercise is included to demonstrate how to create a data model for an online retail store, emphasizing the iterative nature of data warehousing design.

Uploaded by

moroansoma23

Week 3: Data Warehouse Design

1. Data Modeling Techniques


• Data Modeling: The process of formally describing the data and information within an organization. It involves defining entities, their attributes, and the relationships between them.
• Key Techniques:
  o Entity-Relationship (ER) Diagrams:
    ▪ A visual representation of data using entities (boxes representing real-world objects such as customers and products), attributes (characteristics of entities, such as customer name or product price), and relationships (connections between entities, such as "Customer" orders "Product").
    ▪ Example:
      ▪ Entities: Customer, Product, Order
      ▪ Attributes:
        ▪ Customer: CustomerID, CustomerName, Address
        ▪ Product: ProductID, ProductName, Price
        ▪ Order: OrderID, OrderDate, CustomerID (foreign key referencing Customer), ProductID (foreign key referencing Product)
      ▪ Relationships:
        ▪ Customer places Order (one-to-many)
        ▪ Product is included in Order (many-to-many). Note that a single ProductID column on Order can record only one product per order row, so a relational design usually resolves this relationship through a junction entity (for example, an OrderItem entity holding OrderID and ProductID).
    ▪ ER Diagram Symbols (Chen notation):
      ▪ Rectangle: Entity
      ▪ Ellipse: Attribute
      ▪ Diamond: Relationship
      ▪ Lines: Connect entities to relationships and carry cardinality notations such as 1:1, 1:N, and N:M
  o Dimensional Modeling:
    ▪ Specifically designed for data warehouses.
    ▪ Focuses on organizing data around business dimensions (e.g., Time, Customer, Product) and measures (e.g., Sales, Revenue, Quantity).
    ▪ Aims to simplify data analysis by making it easier to answer business questions.
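The ER example above can be turned into relational tables. The following is a minimal sketch using Python's built-in sqlite3 module, not a definitive design. The OrderItem junction table, its Quantity column, and all sample rows are assumptions added for illustration; they show one common way to implement the many-to-many relationship between Product and Order.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

conn.executescript("""
CREATE TABLE Customer (
    CustomerID   INTEGER PRIMARY KEY,
    CustomerName TEXT NOT NULL,
    Address      TEXT
);
CREATE TABLE Product (
    ProductID   INTEGER PRIMARY KEY,
    ProductName TEXT NOT NULL,
    Price       REAL NOT NULL
);
-- ORDER is a reserved word in SQL, so the table name is quoted.
CREATE TABLE "Order" (
    OrderID    INTEGER PRIMARY KEY,
    OrderDate  TEXT NOT NULL,
    CustomerID INTEGER NOT NULL REFERENCES Customer(CustomerID)
);
-- Hypothetical junction table: one row per (order, product) pair.
CREATE TABLE OrderItem (
    OrderID   INTEGER NOT NULL REFERENCES "Order"(OrderID),
    ProductID INTEGER NOT NULL REFERENCES Product(ProductID),
    Quantity  INTEGER NOT NULL,
    PRIMARY KEY (OrderID, ProductID)
);
""")

# Made-up sample data.
conn.execute("INSERT INTO Customer VALUES (1, 'Ada', '1 Main St')")
conn.executemany("INSERT INTO Product VALUES (?, ?, ?)",
                 [(10, "Keyboard", 45.0), (11, "Mouse", 20.0)])
conn.executemany('INSERT INTO "Order" VALUES (?, ?, ?)',
                 [(100, "2024-01-05", 1), (101, "2024-01-09", 1)])
conn.executemany("INSERT INTO OrderItem VALUES (?, ?, ?)",
                 [(100, 10, 1), (100, 11, 2), (101, 11, 1)])

# One-to-many: a single customer owns several orders.
orders_per_customer = conn.execute(
    'SELECT COUNT(*) FROM "Order" WHERE CustomerID = 1').fetchone()[0]

# Many-to-many, direction 1: one order contains several products.
products_in_order_100 = [row[0] for row in conn.execute(
    "SELECT p.ProductName FROM OrderItem oi "
    "JOIN Product p ON p.ProductID = oi.ProductID "
    "WHERE oi.OrderID = 100 ORDER BY p.ProductName")]

# Many-to-many, direction 2: one product appears in several orders.
orders_with_mouse = [row[0] for row in conn.execute(
    "SELECT OrderID FROM OrderItem WHERE ProductID = 11 ORDER BY OrderID")]

# The foreign key rejects an order that points at a non-existent customer.
try:
    conn.execute('INSERT INTO "Order" VALUES (?, ?, ?)', (102, "2024-01-10", 999))
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

print(orders_per_customer, products_in_order_100, orders_with_mouse, fk_enforced)
```

Keeping ProductID out of the Order table and placing it in the junction table is what allows an order to hold any number of products.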
2. Star Schema and Snowflake Schema
• Star Schema:
  o The simplest and most common data warehouse schema.
  o Structure:
    ▪ A central fact table containing measures (e.g., sales amount, quantity sold).
    ▪ Multiple dimension tables surrounding the fact table, each representing a dimension (e.g., Time, Customer, Product).
    ▪ The fact table contains foreign keys that link to the primary keys of the dimension tables.
  o Example:
    ▪ Fact Table: Sales
      ▪ Attributes: SaleID, ProductID, CustomerID, TimeID, QuantitySold, SalesAmount
    ▪ Dimension Tables:
      ▪ Product: ProductID, ProductName, Category, Brand
      ▪ Customer: CustomerID, CustomerName, City, Country
      ▪ Time: TimeID, Date, Year, Month, Day
  o Advantages: Simple to understand and query (every dimension is a single join away from the fact table); efficient for basic reporting.
  o Limitations: Denormalized dimension tables store redundant data (e.g., the same Category and Brand values repeated on every matching Product row); limited flexibility for complex analysis.
• Snowflake Schema:
  o An extension of the star schema.
  o Structure:
    ▪ Dimension tables are further normalized (broken down into smaller, more granular tables).
    ▪ The branching sub-dimension tables give the diagram a "snowflake" shape.
  o Example:
    ▪ Product Dimension (in Snowflake):
      ▪ Product: ProductID, ProductName, CategoryID (foreign key referencing Category), BrandID (foreign key referencing Brand)
      ▪ Category: CategoryID, CategoryName
      ▪ Brand: BrandID, BrandName
      ▪ (Relationships: Product belongs to Category, Product belongs to Brand)
  o Advantages: Reduces data redundancy, improves data integrity, better for complex analysis.
  o Disadvantages: More complex to design and query than a star schema, because reaching a normalized attribute requires additional joins.
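To make the trade-off between the two schemas concrete, the sketch below builds the Product dimension both ways and answers the same question ("sales amount per category") against each. It uses Python's built-in sqlite3 module. The table names ProductStar and ProductSnow, the trimmed-down Sales fact table, and every sample row are assumptions made for this illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star variant: one denormalized Product dimension.
CREATE TABLE ProductStar (
    ProductID INTEGER PRIMARY KEY, ProductName TEXT, Category TEXT, Brand TEXT
);
-- Snowflake variant: the same dimension normalized into three tables.
CREATE TABLE Category (CategoryID INTEGER PRIMARY KEY, CategoryName TEXT);
CREATE TABLE Brand    (BrandID INTEGER PRIMARY KEY, BrandName TEXT);
CREATE TABLE ProductSnow (
    ProductID   INTEGER PRIMARY KEY,
    ProductName TEXT,
    CategoryID  INTEGER REFERENCES Category(CategoryID),
    BrandID     INTEGER REFERENCES Brand(BrandID)
);
-- Fact table shared by both variants (Customer and Time keys omitted for brevity).
CREATE TABLE Sales (
    SaleID INTEGER PRIMARY KEY, ProductID INTEGER,
    QuantitySold INTEGER, SalesAmount REAL
);
""")

# Made-up sample data. Note 'Electronics' and 'Acme' are stored twice in ProductStar
# but only once each in the snowflake tables.
conn.executemany("INSERT INTO ProductStar VALUES (?, ?, ?, ?)", [
    (1, "Laptop", "Electronics", "Acme"),
    (2, "Phone", "Electronics", "Acme"),
    (3, "Desk", "Furniture", "OakCo"),
])
conn.executemany("INSERT INTO Category VALUES (?, ?)", [(1, "Electronics"), (2, "Furniture")])
conn.executemany("INSERT INTO Brand VALUES (?, ?)", [(1, "Acme"), (2, "OakCo")])
conn.executemany("INSERT INTO ProductSnow VALUES (?, ?, ?, ?)", [
    (1, "Laptop", 1, 1), (2, "Phone", 1, 1), (3, "Desk", 2, 2),
])
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?)", [
    (1, 1, 1, 1000.0), (2, 2, 2, 1200.0), (3, 3, 1, 300.0), (4, 1, 1, 1000.0),
])

# Star schema: one join from the fact table to the dimension.
star_result = conn.execute("""
    SELECT p.Category, SUM(s.SalesAmount)
    FROM Sales s
    JOIN ProductStar p ON p.ProductID = s.ProductID
    GROUP BY p.Category
    ORDER BY p.Category
""").fetchall()

# Snowflake schema: a second join is needed to reach the category name.
snowflake_result = conn.execute("""
    SELECT c.CategoryName, SUM(s.SalesAmount)
    FROM Sales s
    JOIN ProductSnow p ON p.ProductID = s.ProductID
    JOIN Category c ON c.CategoryID = p.CategoryID
    GROUP BY c.CategoryName
    ORDER BY c.CategoryName
""").fetchall()

print(star_result)
print(snowflake_result)
```

Both queries return the same totals. The star version needs fewer joins, while the snowflake version stores each category and brand name only once.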
3. Fact and Dimension Tables
• Fact Tables:
  o Store numerical measurements or facts about the business.
  o Contain foreign keys that link to dimension tables.
  o Examples:
    ▪ Sales: SalesAmount, QuantitySold, UnitCost
    ▪ Orders: OrderID, OrderDate, OrderAmount
    ▪ Inventory: ProductID, WarehouseID, QuantityInStock
• Dimension Tables:
  o Provide context and meaning to the facts.
  o Contain attributes that describe the dimensions of the business.
  o Examples:
    ▪ Customer: CustomerID, CustomerName, Address, City, Country
    ▪ Product: ProductID, ProductName, Category, Brand, Color
    ▪ Time: TimeID, Date, Year, Month, Day, Hour
    ▪ Geography: Country, Region, City
  o Key Characteristics:
    ▪ Usually contain a single primary key (often a surrogate key - an artificial key generated for a table).
    ▪ Attributes are typically descriptive and non-additive (e.g., CustomerName, ProductName).
    ▪ Often contain hierarchies (e.g., Country > Region > City).
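The interplay between additive measures, surrogate keys, and hierarchies can be sketched in a few lines of plain Python. The example below is illustrative only: the helper names build_time_dimension and roll_up, the date range, and the fact rows are all assumptions. It generates a small Time dimension keyed by a surrogate TimeID, then sums a measure at two levels of the Year > Month hierarchy.

```python
from collections import defaultdict
from datetime import date, timedelta


def build_time_dimension(start, days):
    """Return one Time dimension row per calendar day, keyed by a surrogate TimeID."""
    rows = []
    for offset in range(days):
        d = start + timedelta(days=offset)
        rows.append({
            "TimeID": offset + 1,  # surrogate key: a sequence number with no business meaning
            "Date": d.isoformat(),
            "Year": d.year,
            "Month": d.month,
            "Day": d.day,
        })
    return rows


# Four days spanning a month boundary: 30 Jan, 31 Jan, 1 Feb, 2 Feb 2024.
time_dim = build_time_dimension(date(2024, 1, 30), 4)
time_by_id = {row["TimeID"]: row for row in time_dim}

# Hypothetical fact rows: (TimeID foreign key, SalesAmount, QuantitySold).
sales_facts = [(1, 100.0, 2), (2, 50.0, 1), (3, 75.0, 3), (4, 25.0, 1), (4, 10.0, 1)]


def roll_up(facts, level):
    """Sum SalesAmount grouped by the given dimension attributes, e.g. ("Year", "Month")."""
    totals = defaultdict(float)
    for time_id, amount, _quantity in facts:
        dim_row = time_by_id[time_id]  # follow the foreign key into the dimension
        key = tuple(dim_row[attr] for attr in level)
        totals[key] += amount  # additive measure: safe to sum at any level
    return dict(totals)


sales_by_month = roll_up(sales_facts, ("Year", "Month"))
sales_by_year = roll_up(sales_facts, ("Year",))

print(sales_by_month)
print(sales_by_year)
```

The fact rows carry only the key and the numbers; the descriptive attributes used for grouping (Year, Month) live in the dimension row that the key points to.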
Practical: Creating a Data Model for a Sample Business Scenario
• Scenario: Let's consider an online retail store.
• Steps:
  1. Identify Entities: Customer, Product, Order, Time, Employee
  2. Define Attributes:
    ▪ Customer: CustomerID, CustomerName, Address, Email, Phone
    ▪ Product: ProductID, ProductName, Description, Price, Category, Brand
    ▪ Order: OrderID, OrderDate, OrderAmount, CustomerID, EmployeeID
    ▪ Time: TimeID, Date, Year, Month, Day, Hour
    ▪ Employee: EmployeeID, EmployeeName
  3. Define Relationships:
    ▪ Customer places Order (one-to-many)
    ▪ Product is included in Order (many-to-many)
    ▪ Employee handles Order (one-to-many)
  4. Create ER Diagram: (Use a tool like Lucidchart or draw it manually)
  5. Design Star Schema:
    ▪ Fact Table: Sales
      ▪ Attributes: SaleID, ProductID, CustomerID, TimeID, EmployeeID, QuantitySold, SalesAmount
    ▪ Dimension Tables: Customer, Product, Time, Employee
  6. Consider Snowflake Schema:
    ▪ Product Dimension:
      ▪ Product: ProductID, ProductName, Description, CategoryID (foreign key referencing Category), BrandID (foreign key referencing Brand)
      ▪ Category: CategoryID, CategoryName
      ▪ Brand: BrandID, BrandName
    ▪ Customer Dimension:
      ▪ Customer: CustomerID, CustomerName, Email, AddressID (foreign key referencing Address)
      ▪ Address: AddressID, Street, City, State, Country
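Step 5 can be tried out directly. The sketch below creates the retail star schema with Python's built-in sqlite3 module, loads a handful of rows, and runs two typical reporting queries. Column lists are shortened relative to step 2, and all names, e-mail addresses, and amounts are made-up sample values; treat it as a starting point under those assumptions rather than a finished design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
-- Dimension tables
CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT, Email TEXT);
CREATE TABLE Product  (ProductID INTEGER PRIMARY KEY, ProductName TEXT, Category TEXT, Brand TEXT);
CREATE TABLE Time     (TimeID INTEGER PRIMARY KEY, Date TEXT, Year INTEGER, Month INTEGER, Day INTEGER);
CREATE TABLE Employee (EmployeeID INTEGER PRIMARY KEY, EmployeeName TEXT);

-- Fact table: one foreign key per dimension plus the measures
CREATE TABLE Sales (
    SaleID       INTEGER PRIMARY KEY,
    ProductID    INTEGER NOT NULL REFERENCES Product(ProductID),
    CustomerID   INTEGER NOT NULL REFERENCES Customer(CustomerID),
    TimeID       INTEGER NOT NULL REFERENCES Time(TimeID),
    EmployeeID   INTEGER NOT NULL REFERENCES Employee(EmployeeID),
    QuantitySold INTEGER NOT NULL,
    SalesAmount  REAL NOT NULL
);
""")

# Dimensions are loaded first so the fact rows have keys to reference.
conn.executemany("INSERT INTO Customer VALUES (?, ?, ?)",
                 [(1, "Ada", "ada@example.com"), (2, "Grace", "grace@example.com")])
conn.executemany("INSERT INTO Product VALUES (?, ?, ?, ?)",
                 [(1, "Laptop", "Electronics", "Acme"), (2, "Desk", "Furniture", "OakCo")])
conn.executemany("INSERT INTO Time VALUES (?, ?, ?, ?, ?)",
                 [(1, "2024-01-15", 2024, 1, 15), (2, "2024-02-03", 2024, 2, 3)])
conn.executemany("INSERT INTO Employee VALUES (?, ?)", [(1, "Sam"), (2, "Lee")])

# Columns: SaleID, ProductID, CustomerID, TimeID, EmployeeID, QuantitySold, SalesAmount
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (1, 1, 1, 1, 1, 1, 1000.0),
    (2, 2, 1, 1, 2, 2, 600.0),
    (3, 1, 2, 2, 1, 1, 1000.0),
    (4, 2, 2, 2, 1, 1, 300.0),
])

# Business question 1: quantity and revenue per month and product category.
sales_by_month_category = conn.execute("""
    SELECT t.Month, p.Category, SUM(s.QuantitySold), SUM(s.SalesAmount)
    FROM Sales s
    JOIN Time t    ON t.TimeID = s.TimeID
    JOIN Product p ON p.ProductID = s.ProductID
    GROUP BY t.Month, p.Category
    ORDER BY t.Month, p.Category
""").fetchall()

# Business question 2: revenue per employee.
sales_by_employee = conn.execute("""
    SELECT e.EmployeeName, SUM(s.SalesAmount)
    FROM Sales s
    JOIN Employee e ON e.EmployeeID = s.EmployeeID
    GROUP BY e.EmployeeName
    ORDER BY e.EmployeeName
""").fetchall()

print(sales_by_month_category)
print(sales_by_employee)
```

Each new business question only changes which dimensions are joined and which attributes appear in GROUP BY; the fact table itself stays the same.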
This practical exercise will help you understand how to apply the concepts of data modeling,
star schema, and snowflake schema in a real-world scenario.
Remember: Data warehousing is an iterative process. The design of the data warehouse will
evolve over time as business requirements and data needs change.
