Introduction To Data Warehouse
Second release
30th Jan, 2025
Contents
Data Warehouse
Characteristics of Data Warehouse
Data Mart
Characteristics of Data Mart
Operational Data Store (ODS)
Characteristics of ODS
Key Differences
Approaches of Data Warehouse
Inmon Approach
The key characteristics of the Inmon approach
Kimball Approach
The key characteristics of the Kimball approach
Key Differences
Dimensions & Facts
Types of Data Warehouse Schemas
Star Schema
Components of a Star Schema
Snowflake Schema
Components of a Snowflake Schema
Galaxy Schema
Components of a Galaxy Schema
Dimension Tables in a Data Warehouse
Conformed Dimensions
Non-Conformed Dimensions
Degenerate Dimensions
Role-Playing Dimensions
Junk Dimensions
Factless Fact Tables
Bridge Table
Slowly Changing Dimension (SCD)
Type 0 SCD: No Changes
Data Warehouse
• A Data Warehouse is a centralized
repository that stores data from various
sources in a single location.
Characteristics of Data Warehouse
• Centralized repository
• Stores data from multiple sources
• Designed for business intelligence and
analytics
• Contains historical data
• Supports complex queries and analysis
Data Mart
• A Data Mart is a subset of a Data
Warehouse that contains a specific set of
data for a particular business area or
department.
Characteristics of ODS
• Stores current and near-real-time data
• Designed for operational reporting and
analysis
• Contains a small amount of historical data
• Optimized for fast data ingestion and
query performance
• Supports real-time or near-real-time data
integration
Key Differences
• A Data Warehouse is a centralized
repository that stores data from multiple
sources, while a Data Mart is a subset of
a Data Warehouse that contains a
specific set of data for a business area or
department.
Approaches of Data Warehouse
• Inmon Approach
• Kimball Approach
Inmon Approach
Kimball Approach
Key Differences
• Inmon focuses on a top-down, enterprise-
wide data warehouse built first, while
Kimball focuses on bottom-up data marts
built per business process and tied
together through conformed dimensions.
Types of Data Warehouse Schemas
• Star Schema
• Snowflake Schema
• Galaxy Schema / Fact Constellation
Star Schema
• A star schema is a type of data
warehouse modeling that consists of a
central fact table surrounded by dimension
tables.
• Dimension Tables:
o Descriptive: Provide attributes that
describe the data in the fact table.
o Hierarchical Structure: Often have a
hierarchical structure, such as a Customer
dimension with levels like Customer ID,
Country, Region, and City.
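A minimal sketch of a star schema, using Python's built-in sqlite3 module. The table names (fact_sales, dim_customer, dim_product), the columns, and the sample rows are all assumptions made for illustration; only the Customer hierarchy levels come from the slide above.

```python
import sqlite3

# Hypothetical star schema: one central fact table, two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,
    customer_id  TEXT,
    country      TEXT,
    region       TEXT,
    city         TEXT
);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT
);
CREATE TABLE fact_sales (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    sales_amount REAL
);
INSERT INTO dim_customer VALUES
    (1, 'C-1', 'Country A', 'Region A1', 'City X'),
    (2, 'C-2', 'Country A', 'Region A2', 'City Y'),
    (3, 'C-3', 'Country B', 'Region B1', 'City Z');
INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO fact_sales VALUES
    (1, 1, 100.0), (2, 1, 50.0), (3, 2, 25.0), (1, 2, 10.0);
""")

def sales_by_country(conn):
    """Aggregate the fact table at the Country level of the Customer hierarchy."""
    return conn.execute("""
        SELECT c.country, SUM(f.sales_amount)
        FROM fact_sales AS f
        JOIN dim_customer AS c ON c.customer_key = f.customer_key
        GROUP BY c.country
        ORDER BY c.country
    """).fetchall()

print(sales_by_country(conn))
```

Every dimension is one join away from the fact table, which is what keeps star-schema queries short.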
Snowflake Schema
• A snowflake schema is a variation of the
star schema, where dimension tables are
normalized into multiple related tables.
Year
Month Table:
Month Key (Primary Key),
Short Month Name,
Long Month Name,
Quarter
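The Month table above could sit behind a date dimension as sketched here with Python's sqlite3 module. The dim_date and fact_sales tables, their columns, and the sample rows are assumptions for illustration; only the Month table's columns come from the slide.

```python
import sqlite3

# Hypothetical snowflake: the date dimension points to a separate Month table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_month (
    month_key        INTEGER PRIMARY KEY,
    short_month_name TEXT,
    long_month_name  TEXT,
    quarter          TEXT
);
CREATE TABLE dim_date (
    date_key  INTEGER PRIMARY KEY,
    full_date TEXT,
    month_key INTEGER REFERENCES dim_month(month_key)
);
CREATE TABLE fact_sales (
    date_key     INTEGER REFERENCES dim_date(date_key),
    sales_amount REAL
);
INSERT INTO dim_month VALUES
    (1, 'Jan', 'January', 'Q1'),
    (4, 'Apr', 'April',   'Q2');
INSERT INTO dim_date VALUES
    (20250115, '2025-01-15', 1),
    (20250130, '2025-01-30', 1),
    (20250410, '2025-04-10', 4);
INSERT INTO fact_sales VALUES
    (20250115, 10.0), (20250130, 20.0), (20250410, 5.0);
""")

def sales_by_quarter(conn):
    """Reaching Quarter takes two joins: fact -> date -> month."""
    return conn.execute("""
        SELECT m.quarter, SUM(f.sales_amount)
        FROM fact_sales AS f
        JOIN dim_date  AS d ON d.date_key  = f.date_key
        JOIN dim_month AS m ON m.month_key = d.month_key
        GROUP BY m.quarter
        ORDER BY m.quarter
    """).fetchall()

print(sales_by_quarter(conn))
```

The extra join is the usual trade-off of a snowflake schema: less repeated data in the dimension, at the cost of longer queries.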
Galaxy Schema
• A galaxy schema is a type of data
warehouse modeling that consists of
multiple fact tables that share dimension
tables.
Dimension Tables in a Data Warehouse
• In a data warehouse, dimension tables
are used to describe the data in the fact
tables.
Conformed Dimensions
• A Conformed dimension is a dimension
table that is used across multiple fact
tables, ensuring consistency and data
integrity.
• The same dimension attributes are used in
all associated fact tables.
• The values in the dimension table are
consistent across different fact tables.
• The level of detail in the dimension table
is appropriate for all fact tables.
• Example: Consider a data warehouse with
two fact tables: Sales and Returns.
Conformed dimension: Customer
Attributes: {Customer ID, Customer Name,
Address, City, Country}
This ensures consistent customer
information across both fact tables.
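The Sales and Returns example can be sketched with Python's sqlite3 module as two fact tables sharing one Customer dimension. The table names, the amount columns, and the sample rows are assumptions for illustration.

```python
import sqlite3

# Hypothetical setup: two fact tables that share one conformed Customer dimension.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_key  INTEGER PRIMARY KEY,
    customer_name TEXT
);
CREATE TABLE fact_sales (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    sales_amount REAL
);
CREATE TABLE fact_returns (
    customer_key  INTEGER REFERENCES dim_customer(customer_key),
    return_amount REAL
);
INSERT INTO dim_customer VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO fact_sales   VALUES (1, 100.0), (1, 50.0), (2, 70.0);
INSERT INTO fact_returns VALUES (1, 20.0);
""")

def sales_and_returns_by_customer(conn):
    """Aggregate each fact table separately, then line the results up
    on the shared dimension so rows are not multiplied by the join."""
    return conn.execute("""
        SELECT c.customer_name,
               COALESCE(s.total, 0.0),
               COALESCE(r.total, 0.0)
        FROM dim_customer AS c
        LEFT JOIN (SELECT customer_key, SUM(sales_amount) AS total
                   FROM fact_sales GROUP BY customer_key) AS s
               ON s.customer_key = c.customer_key
        LEFT JOIN (SELECT customer_key, SUM(return_amount) AS total
                   FROM fact_returns GROUP BY customer_key) AS r
               ON r.customer_key = c.customer_key
        ORDER BY c.customer_name
    """).fetchall()

print(sales_and_returns_by_customer(conn))
```

Because both fact tables use the same customer keys, the two totals can be compared side by side without any mapping step.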
Non-Conformed Dimensions
• A Non-conformed dimension is a
dimension table that has different
attributes or granularity when used in
different fact tables.
• It provides a unique view of the data for a
specific business process.
• The values in the dimension table may not
be consistent across all fact tables.
• The level of detail in the dimension table
may vary between fact tables.
• Example: The Product dimension used by
the Sales fact table might include Product
Price, while the one used by Returns
might include Return Reason.
Sales might have product-level
granularity, while Returns might have
product-variant-level granularity.
Degenerate Dimensions
• A Degenerate dimension is a dimension
key stored directly in the fact table, with
no dimension table of its own.
• It has no descriptive attributes; it is
typically a transaction identifier such as
an order or invoice number.
• It is used to group or filter the fact rows
that belong to the same transaction.
• Example: Consider a Sales fact table with
columns like {Order ID, Product ID,
Customer ID, and Sales Amount}.
Order ID has no attributes beyond the
identifier itself, so it stays in the fact table
instead of getting its own dimension table.
Degenerate dimension: Order ID
Role-Playing Dimensions
• A Role-playing dimension is a dimension
table that plays multiple roles in a data
warehouse, typically by being referenced
several times from the same fact table.
• Reduces the number of dimension tables
required.
• It can provide different perspectives on the
same data.
• The level of detail in the dimension can be
adjusted to suit different analytical needs.
• Example: Date Dimension.
A single Date Dimension can be
referenced by one fact table in multiple
roles:
o Order Date: For analyzing data by the
date an order was placed.
o Ship Date: For analyzing data by the
date an order was shipped.
o Delivery Date: For analyzing data by
the date an order was delivered.
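One common way a single dimension plays several roles is to join it more than once under different aliases. The sketch below uses Python's sqlite3 module; the fact_orders table and its order_date_key and ship_date_key columns are assumptions for illustration and do not come from the slides.

```python
import sqlite3

# Hypothetical fact table that references the same date dimension twice.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (
    date_key  INTEGER PRIMARY KEY,
    full_date TEXT
);
CREATE TABLE fact_orders (
    order_id       INTEGER,
    order_date_key INTEGER REFERENCES dim_date(date_key),
    ship_date_key  INTEGER REFERENCES dim_date(date_key)
);
INSERT INTO dim_date VALUES (1, '2025-01-01'), (2, '2025-01-03');
INSERT INTO fact_orders VALUES (100, 1, 2);
""")

def order_and_ship_dates(conn):
    """Join dim_date twice: once in the order-date role, once in the ship-date role."""
    return conn.execute("""
        SELECT f.order_id, od.full_date, sd.full_date
        FROM fact_orders AS f
        JOIN dim_date AS od ON od.date_key = f.order_date_key
        JOIN dim_date AS sd ON sd.date_key = f.ship_date_key
        ORDER BY f.order_id
    """).fetchall()

print(order_and_ship_dates(conn))
```

Only one physical date table exists; the aliases od and sd give it its two roles.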
Junk Dimensions
• Junk dimensions are used to store
low-cardinality, categorical data that is not
suitable for inclusion in other dimensions.
• These dimensions often contain attributes
that are frequently filtered or used in
reporting, but do not have a significant
impact on the overall data model.
• They contain categorical data, such as
flags, indicators, or codes.
• Can simplify the data model by
consolidating multiple attributes into a
single dimension.
• Example: A Customer dimension might
include attributes like {Customer ID,
Customer Name, Address, and City}.
If there are a few common customer
types, such as “Retail”, “Wholesale”, and
“Government”, they can be stored in a
junk dimension called Customer Type.
Bridge Table
• Bridge tables in a data warehouse serve a
similar purpose to their counterparts in
transactional databases: they facilitate
many-to-many relationships between
dimensions. However, their design and
usage often differ due to the analytical
nature of a data warehouse.
• Bridge tables typically connect two or
more dimensions.
• Example: In a sales data warehouse, a
bridge table might connect the “Product”
dimension, the “Customer” dimension, and
the “Order” dimension to represent the
products purchased by customers in
different orders.
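A minimal sketch of a bridge table, using Python's sqlite3 module and reduced to two dimensions to keep it short. The names dim_order, dim_product, and bridge_order_product, along with the sample rows, are assumptions for illustration.

```python
import sqlite3

# Hypothetical bridge table resolving a many-to-many relationship
# between an Order dimension and a Product dimension.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_order   (order_key   INTEGER PRIMARY KEY, order_number TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE bridge_order_product (
    order_key   INTEGER REFERENCES dim_order(order_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    PRIMARY KEY (order_key, product_key)
);
INSERT INTO dim_order   VALUES (1, 'SO-1'), (2, 'SO-2');
INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO bridge_order_product VALUES (1, 1), (1, 2), (2, 1);
""")

def products_in_order(conn, order_number):
    """Walk the bridge from one order to all of its products."""
    rows = conn.execute("""
        SELECT p.product_name
        FROM dim_order AS o
        JOIN bridge_order_product AS b ON b.order_key = o.order_key
        JOIN dim_product AS p ON p.product_key = b.product_key
        WHERE o.order_number = ?
        ORDER BY p.product_name
    """, (order_number,)).fetchall()
    return [name for (name,) in rows]

def orders_with_product(conn, product_name):
    """Walk the bridge the other way, from one product to all of its orders."""
    rows = conn.execute("""
        SELECT o.order_number
        FROM dim_product AS p
        JOIN bridge_order_product AS b ON b.product_key = p.product_key
        JOIN dim_order AS o ON o.order_key = b.order_key
        WHERE p.product_name = ?
        ORDER BY o.order_number
    """, (product_name,)).fetchall()
    return [number for (number,) in rows]

print(products_in_order(conn, "SO-1"))
print(orders_with_product(conn, "Widget"))
```

The bridge holds one row per pairing, so the relationship can be read in either direction.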
Surrogate Key
• A surrogate key is a system-generated,
unique identifier for each record in a
dimension table.
Composite Key
• A composite key is a combination of two
or more columns that uniquely identify a
record.
Primary Key
• The primary key is a column (or set of
columns) that uniquely identifies each row
in a table.
Foreign Key
• A foreign key is a column in a fact table
that references the primary key of a
dimension table.
• Example: Version_Number or
Effective_Date to distinguish between
historical and current records.
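The four key types can be sketched together with Python's sqlite3 module. The tables, the customer_id natural key, and the sample rows are assumptions for illustration; Version_Number is taken from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_key   INTEGER PRIMARY KEY AUTOINCREMENT, -- surrogate key, used as the primary key
    customer_id    TEXT NOT NULL,                     -- identifier from the source system
    version_number INTEGER NOT NULL,
    city           TEXT,
    UNIQUE (customer_id, version_number)              -- composite key
);
CREATE TABLE fact_sales (
    customer_key INTEGER NOT NULL
        REFERENCES dim_customer(customer_key),        -- foreign key
    sales_amount REAL
);
""")

# Two versions of the same customer: a historical row and a current row.
conn.execute(
    "INSERT INTO dim_customer (customer_id, version_number, city) VALUES ('C-1', 1, 'City X')"
)
conn.execute(
    "INSERT INTO dim_customer (customer_id, version_number, city) VALUES ('C-1', 2, 'City Y')"
)
surrogate_keys = [
    key for (key,) in conn.execute(
        "SELECT customer_key FROM dim_customer ORDER BY version_number"
    )
]

def insert_sale(conn, customer_key, amount):
    """Return True if the fact row was accepted, False if the foreign key rejected it."""
    try:
        conn.execute("INSERT INTO fact_sales VALUES (?, ?)", (customer_key, amount))
        return True
    except sqlite3.IntegrityError:
        return False

print(surrogate_keys)
print(insert_sale(conn, 2, 10.0), insert_sale(conn, 99, 10.0))
```

Each version of the customer gets its own surrogate key, so a fact row can point at the exact version that was current when the sale happened.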
Database Triggers
▪ Triggers can be set up on tables to
capture changes (inserts, updates,
deletes) and log them into a
separate change table.
▪ This method is straightforward but
can introduce overhead on the
source database.
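A minimal sketch of trigger-based change capture, using SQLite through Python's sqlite3 module. The customer table, the customer_changes change table, and the single-letter operation codes are assumptions for illustration; trigger syntax differs in other databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, city TEXT);

-- Separate change table that the triggers write into.
CREATE TABLE customer_changes (
    change_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    operation   TEXT,
    customer_id INTEGER,
    city        TEXT
);

CREATE TRIGGER customer_ins AFTER INSERT ON customer BEGIN
    INSERT INTO customer_changes (operation, customer_id, city)
    VALUES ('I', NEW.customer_id, NEW.city);
END;
CREATE TRIGGER customer_upd AFTER UPDATE ON customer BEGIN
    INSERT INTO customer_changes (operation, customer_id, city)
    VALUES ('U', NEW.customer_id, NEW.city);
END;
CREATE TRIGGER customer_del AFTER DELETE ON customer BEGIN
    INSERT INTO customer_changes (operation, customer_id, city)
    VALUES ('D', OLD.customer_id, OLD.city);
END;
""")

# Ordinary writes on the source table; the triggers log each one.
conn.execute("INSERT INTO customer VALUES (1, 'City X')")
conn.execute("UPDATE customer SET city = 'City Y' WHERE customer_id = 1")
conn.execute("DELETE FROM customer WHERE customer_id = 1")

changes = conn.execute(
    "SELECT operation, customer_id, city FROM customer_changes ORDER BY change_id"
).fetchall()
print(changes)
```

Each write to customer now costs one extra insert, which is the overhead mentioned above.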
Log-Based CDC
▪ This method involves reading the
database transaction logs to
capture changes.
▪ It is often more efficient than
triggers because it adds little
overhead to the transactions on
the source database.
▪ Log-based CDC can capture
changes in real-time and is
commonly used in enterprise-level
solutions.
Timestamp-Based CDC
• In this approach, a timestamp column is
added to the source tables to track when
records were last modified.
• The CDC process queries the source
tables for records with timestamps greater
than the last processed timestamp.
• This method is simple but may miss
changes: if a row is updated several times
between two CDC runs only its latest state
is seen, and hard deletes are not detected.
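The polling step can be sketched as follows with Python's sqlite3 module. The customer table, the last_modified column name, and the ISO-8601 text timestamps are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,
    city          TEXT,
    last_modified TEXT  -- ISO-8601 text, so string order matches time order
);
INSERT INTO customer VALUES
    (1, 'City X', '2025-01-01T10:00:00'),
    (2, 'City Y', '2025-01-02T09:30:00'),
    (3, 'City Z', '2025-01-03T08:15:00');
""")

def extract_changes(conn, last_processed):
    """Return rows modified after last_processed, plus the new high-water mark."""
    rows = conn.execute(
        """
        SELECT customer_id, city, last_modified
        FROM customer
        WHERE last_modified > ?
        ORDER BY last_modified
        """,
        (last_processed,),
    ).fetchall()
    # Keep the old mark when nothing changed, otherwise advance to the newest row.
    new_mark = rows[-1][2] if rows else last_processed
    return rows, new_mark

rows, mark = extract_changes(conn, "2025-01-01T10:00:00")
print(rows, mark)
```

The returned mark would be stored by the CDC process and passed in on the next run. The strict `>` comparison skips any row that shares the stored timestamp exactly.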
Change Data Tables
• Some databases provide built-in features
for CDC, where changes are automatically
tracked and stored in change data tables.
For example, SQL Server has a feature
called Change Data Capture that allows
for easy tracking of changes.
ETL Tools
• Many Extract, Transform, Load (ETL) tools
offer built-in CDC capabilities, allowing
organizations to configure CDC as part of
their data integration workflows.
• These tools can handle various methods
of CDC and provide a user-friendly
interface for managing data flows.
Kerolos Alfons
@LinkedIn