Week 3
Data Warehousing
Data warehouse architecture
Data Mart
• Fact
• Dimension
• Attributes
• Fact Table
• Dimension Table
Fact
• Facts are the measurements or metrics produced by a
business process. For a Sales business process, a
fact would be the quarterly sales number.
Dimension
• A dimension provides the context surrounding a business
process event. In simple terms, dimensions give the who, what,
and where of a fact. In the Sales business process, for the fact
quarterly sales number, the dimensions would be:
• Who – Customer
• Where – Location
• What – Product
Attributes
• Attributes are the characteristics of a dimension
in dimensional data modeling.
• In the Location dimension, the attributes can be:
• State
• Country
• Zip code, etc.
• Attributes are used to search, filter, or classify facts.
• Dimension tables contain attributes.
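As a minimal sketch of how attributes filter facts, assume a hypothetical Location dimension and a few made-up sales facts. All names and values below are illustrative assumptions, not course data.

```python
# Hypothetical Location dimension: each row has a key and is
# described by its attributes (state, country, zip_code).
location_dim = {
    1: {"state": "Texas", "country": "USA", "zip_code": "73301"},
    2: {"state": "Ontario", "country": "Canada", "zip_code": "M5H"},
}

# Made-up facts: each sales amount points at a location key.
sales_facts = [
    {"location_key": 1, "sales_amount": 100.0},
    {"location_key": 2, "sales_amount": 250.0},
    {"location_key": 1, "sales_amount": 50.0},
]


def total_sales_by_attribute(attribute, value):
    """Sum the facts whose location has the given attribute value."""
    return sum(
        fact["sales_amount"]
        for fact in sales_facts
        if location_dim[fact["location_key"]][attribute] == value
    )


usa_total = total_sales_by_attribute("country", "USA")
print(usa_total)
```

The facts themselves carry only a key and a number; the attribute used for filtering lives in the dimension.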
Fact Table:
• A fact table contains records that hold the facts (measures)
together with keys that reference the different dimension tables.
• These records allow users to analyze different
aspects of their business, which can aid in
decision-making and improving the business.
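A fact table of this kind can be sketched with Python's built-in sqlite3 module. The table names (dim_customer, dim_product, fact_sales), columns, and rows are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, name TEXT);

-- Each fact row holds one measure plus keys into the dimension tables.
CREATE TABLE fact_sales (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    sales_amount REAL
);

INSERT INTO dim_customer VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO dim_product  VALUES (10, 'Laptop'), (20, 'Phone');
INSERT INTO fact_sales VALUES (1, 10, 1200.0), (2, 20, 800.0), (1, 20, 750.0);
""")

# Analyze one aspect of the business: total sales per customer.
sales_by_customer = conn.execute("""
    SELECT c.name, SUM(f.sales_amount)
    FROM fact_sales f
    JOIN dim_customer c ON c.customer_key = f.customer_key
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(sales_by_customer)
```

Swapping the join to dim_product would answer the same question per product without changing the fact table.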
Dimension table:
Facts
Dimensions
SCD Type 1
• No history is stored: a new value overwrites the old one. Taking an
Address dimension as an example, we do not keep any address history
for an individual.
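A minimal sketch of a Type 1 change, assuming a hypothetical Address dimension held in a Python dict (the customer id and addresses are made up):

```python
# Hypothetical Address dimension keyed by customer id.
address_dim = {101: {"name": "Alice", "address": "12 Oak St"}}


def scd1_update(dim, customer_id, new_address):
    """Type 1: overwrite in place, so the old address is lost."""
    dim[customer_id]["address"] = new_address


scd1_update(address_dim, 101, "99 Pine Ave")
print(address_dim[101])
```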
SCD Type 2
• This is the most commonly used of these types. The complete
history is maintained, along with dates that identify the period
during which each record is valid.
• These dates are usually called start_date and end_date. An empty
end date means the record is the active one.
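A minimal sketch of a Type 2 change, using the start_date and end_date columns named above. The row layout, customer id, and helper names (scd2_update, active_row) are assumptions for illustration.

```python
from datetime import date

# Hypothetical Address dimension: one row per version of an address.
address_dim = [
    {"customer_id": 101, "address": "12 Oak St",
     "start_date": date(2020, 1, 1), "end_date": None},
]


def scd2_update(rows, customer_id, new_address, change_date):
    """Type 2: close the active row, then append a new active row."""
    for row in rows:
        if row["customer_id"] == customer_id and row["end_date"] is None:
            row["end_date"] = change_date
    rows.append({"customer_id": customer_id, "address": new_address,
                 "start_date": change_date, "end_date": None})


def active_row(rows, customer_id):
    """Return the row whose end_date is empty (the current version)."""
    return next(r for r in rows
                if r["customer_id"] == customer_id and r["end_date"] is None)


scd2_update(address_dim, 101, "99 Pine Ave", date(2023, 6, 1))
print(active_row(address_dim, 101)["address"])
```

The old row stays in the table with its validity period closed, which is what preserves the full history.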
SCD Type 3
• Only a limited history is stored.
• For example, a business may want to keep only the current address and
the previous address of each individual, without storing all
earlier addresses.
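A minimal sketch of a Type 3 change, assuming the limited history is kept in two hypothetical columns, current_address and previous_address:

```python
# Hypothetical Address dimension with one extra column for history.
address_dim = {
    101: {"current_address": "12 Oak St", "previous_address": None},
}


def scd3_update(dim, customer_id, new_address):
    """Type 3: shift current to previous, then store the new value."""
    row = dim[customer_id]
    row["previous_address"] = row["current_address"]
    row["current_address"] = new_address


scd3_update(address_dim, 101, "99 Pine Ave")
scd3_update(address_dim, 101, "5 Elm Rd")
print(address_dim[101])
```

After the second change the original address is gone: only the current and the immediately previous values survive.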
Conformed Dimension
• A dimension that can be used by multiple fact tables
and has the same meaning across the model is
called a Conformed Dimension.
• For example, a dimension holding a list of
places can be used across multiple fact
tables.
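As a minimal sketch, assume one hypothetical place dimension shared by two made-up fact tables, one for sales and one for shipments:

```python
# One place dimension, shared by both fact tables below.
place_dim = {1: "North", 2: "South"}

# Two hypothetical fact tables; each row is (place_key, measure).
sales_facts = [(1, 500.0), (2, 300.0), (1, 200.0)]
shipment_facts = [(1, 4), (2, 7)]


def totals_by_place(facts):
    """Aggregate any fact table using the same place dimension."""
    totals = {}
    for place_key, value in facts:
        name = place_dim[place_key]
        totals[name] = totals.get(name, 0) + value
    return totals


sales_by_place = totals_by_place(sales_facts)
shipments_by_place = totals_by_place(shipment_facts)
print(sales_by_place, shipments_by_place)
```

Because both results are keyed by the same place names, they can be compared side by side.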
Junk Dimension
• A junk dimension combines several low-cardinality
flags and attributes into a single dimension
table rather than modeling each as a separate dimension
table.
• Cardinality:
• The number of distinct values in a table column relative to the number of rows in the table.
Repeated values in the column do not count.
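A minimal sketch of both ideas, assuming a hypothetical orders table with two flags (is_gift, payment_type). The column names and values are made up for illustration.

```python
from itertools import product

# Hypothetical source rows with two low-cardinality flags.
orders = [
    {"order_id": 1, "is_gift": "Y", "payment_type": "card"},
    {"order_id": 2, "is_gift": "N", "payment_type": "cash"},
    {"order_id": 3, "is_gift": "N", "payment_type": "card"},
    {"order_id": 4, "is_gift": "Y", "payment_type": "card"},
]


def distinct_count(rows, column):
    """Number of distinct values in a column; repeats do not count."""
    return len({row[column] for row in rows})


# Junk dimension: one row (one key) per combination of flag values.
junk_dim = {
    combo: key
    for key, combo in enumerate(
        product(["Y", "N"], ["card", "cash"]), start=1)
}

# The fact rows carry a single junk key instead of two flag columns.
fact_rows = [
    {"order_id": o["order_id"],
     "junk_key": junk_dim[(o["is_gift"], o["payment_type"])]}
    for o in orders
]

print(distinct_count(orders, "is_gift"), "distinct values in", len(orders), "rows")
print(fact_rows)
```

Here is_gift has 2 distinct values across 4 rows (low cardinality), while order_id has 4 distinct values across 4 rows (high cardinality).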
Degenerate Dimension
Role-playing Dimension
Normalization
Denormalization
Star Schema
• Each dimension in a star schema is represented
by only one dimension table.
• This dimension table contains the set of attributes.
• The following diagram shows the sales data of a
company with respect to the four dimensions,
namely time, item, branch, and location.
• There is a fact table at the center. It contains the
keys to each of the four dimensions.
• The fact table also contains the measures, namely
dollars sold and units sold.
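The star schema described above can be sketched with Python's built-in sqlite3 module. The four dimensions and the two measures follow the description; the table names, column names, and sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One table per dimension, each holding that dimension's attributes.
CREATE TABLE dim_time     (time_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
CREATE TABLE dim_item     (item_key INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE dim_branch   (branch_key INTEGER PRIMARY KEY, branch_name TEXT);
CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY, city TEXT);

-- Fact table at the center: four dimension keys plus two measures.
CREATE TABLE fact_sales (
    time_key     INTEGER REFERENCES dim_time(time_key),
    item_key     INTEGER REFERENCES dim_item(item_key),
    branch_key   INTEGER REFERENCES dim_branch(branch_key),
    location_key INTEGER REFERENCES dim_location(location_key),
    dollars_sold REAL,
    units_sold   INTEGER
);

INSERT INTO dim_time     VALUES (1, 'Q1', 2024), (2, 'Q2', 2024);
INSERT INTO dim_item     VALUES (1, 'Laptop'), (2, 'Phone');
INSERT INTO dim_branch   VALUES (1, 'Main');
INSERT INTO dim_location VALUES (1, 'Springfield');
INSERT INTO fact_sales VALUES
    (1, 1, 1, 1, 1000.0, 2),
    (1, 2, 1, 1,  500.0, 1),
    (2, 1, 1, 1, 1500.0, 3),
    (1, 1, 1, 1,  500.0, 1);
""")

# Dollars and units sold per quarter and item: one join per dimension used.
sales_summary = conn.execute("""
    SELECT t.quarter, i.item_name, SUM(f.dollars_sold), SUM(f.units_sold)
    FROM fact_sales f
    JOIN dim_time t ON t.time_key = f.time_key
    JOIN dim_item i ON i.item_key = f.item_key
    GROUP BY t.quarter, i.item_name
    ORDER BY t.quarter, i.item_name
""").fetchall()
print(sales_summary)
```

Every dimension is exactly one join away from the fact table, which is what gives the schema its star shape.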
Star Schema