DW 2 - Conceptual Data Warehouse Design
Conceptual Modeling of Data Warehouses
➔ Conceptual modeling of a DW transforms the semi-formal business
requirements specification into a formalized conceptual multidimensional schema
that shows the highest-level relationships between the different entities.
➔ Characteristics of the conceptual data model in the DW:
● It contains the essential entities and the relationships among them.
● No attribute is specified.
● No primary key is specified.
Example: conceptual schema of the Northwind data warehouse (figure)
Conceptual Modeling of Data Warehouses: Terminology
• A schema is composed of a set of dimensions and a set of facts
(which hold the measures).
• A fact consists of quantifying values stored in measures and a qualifying
context which is determined through dimension levels.
• A dimension is composed of either one level or one or more hierarchies.
• A hierarchy is in turn composed of a set of levels.
• A level is analogous to an entity type in the ER model.
• Instances of a level are called members.
• A level has a set of attributes that describe the characteristics of its
members (additional information related to a dimension level).
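The terminology above can be sketched as a small set of Python data classes. This is only an illustration: the class names, the `leaf_levels` helper, and the sample members are assumptions, and a one-level dimension is modeled here as a hierarchy with a single level.

```python
from dataclasses import dataclass, field

@dataclass
class Level:
    name: str
    attributes: list = field(default_factory=list)  # describe the members
    members: list = field(default_factory=list)     # instances of the level

@dataclass
class Hierarchy:
    name: str
    levels: list  # Level objects, ordered from leaf to root

@dataclass
class Dimension:
    name: str
    hierarchies: list  # Hierarchy objects

@dataclass
class Fact:
    name: str
    measures: list    # quantifying values
    dimensions: list  # qualifying context

def leaf_levels(fact):
    """Names of the leaf level of every hierarchy attached to the fact."""
    return [h.levels[0].name for d in fact.dimensions for h in d.hierarchies]

# Illustrative instance (sample data, not taken from the slides)
product = Level("Product", ["ProductName"], ["Chai", "Chang"])
category = Level("Category", ["CategoryName"], ["Beverages"])
product_dim = Dimension("Product", [Hierarchy("Categories", [product, category])])
sales = Fact("Sales", ["Quantity"], [product_dim])
print(leaf_levels(sales))  # ['Product']
```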
Multidimensional Schema
➔ A star schema consists of a fact table with a single table for each dimension.
➔ A snowflake schema is a variation on the star schema in which the dimension
tables from a star schema are organized into a hierarchy by normalizing them.
Example: Star schema
A star schema for sales in a manufacturing company. The sales fact table
includes quantity, price, and other relevant metrics. SALESREP, CUSTOMER,
PRODUCT, and TIME are the dimension tables.
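A minimal sketch of such a star schema, using Python's built-in `sqlite3` module. Only two of the four dimensions are shown for brevity; the column names, the sample rows, and the reading of `price` as a unit price are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product  (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
-- The fact table references one row of each dimension table.
CREATE TABLE sales (
    product_id  INTEGER REFERENCES product(product_id),
    customer_id INTEGER REFERENCES customer(customer_id),
    quantity    INTEGER,
    price       REAL
);
INSERT INTO product  VALUES (1, 'Bolt', 'Fasteners'), (2, 'Nut', 'Fasteners'),
                            (3, 'Drill', 'Tools');
INSERT INTO customer VALUES (1, 'Acme', 'Paris'), (2, 'Globex', 'Lyon');
INSERT INTO sales    VALUES (1, 1, 10, 0.5), (2, 1, 20, 0.25),
                            (3, 2, 1, 80.0), (1, 2, 4, 0.5);
""")

# One join per dimension is enough to aggregate a measure by any dimension attribute.
rows = conn.execute("""
    SELECT p.category, SUM(s.quantity * s.price) AS revenue
    FROM sales s JOIN product p ON p.product_id = s.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(rows)  # [('Fasteners', 12.0), ('Tools', 80.0)]
```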
Example: Snowflake schema
In the snowflake schema, the normalized version now extends to eleven tables:
the low-cardinality attributes in each original dimension table are
removed to form separate tables.
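The normalization step can be sketched in plain Python: a low-cardinality attribute is moved out of a dimension table into its own lookup table and replaced by a foreign key. The function name `snowflake`, the generated `<attr>_id` key, and the sample rows are assumptions for illustration.

```python
def snowflake(rows, attr):
    """Split the low-cardinality column `attr` out of a dimension table.

    Returns (dimension_rows, lookup_rows): each dimension row keeps a foreign
    key `<attr>_id`, and each distinct value is stored once in the lookup table.
    """
    ids = {}  # attribute value -> surrogate key
    dimension = []
    for row in rows:
        attr_id = ids.setdefault(row[attr], len(ids) + 1)
        slim = {k: v for k, v in row.items() if k != attr}
        slim[f"{attr}_id"] = attr_id
        dimension.append(slim)
    lookup = [{f"{attr}_id": i, attr: value} for value, i in ids.items()]
    return dimension, lookup

product_star = [
    {"product_id": 1, "name": "Bolt", "category": "Fasteners"},
    {"product_id": 2, "name": "Nut", "category": "Fasteners"},
    {"product_id": 3, "name": "Drill", "category": "Tools"},
]
product_dim, category_dim = snowflake(product_star, "category")
print(category_dim)
# [{'category_id': 1, 'category': 'Fasteners'}, {'category_id': 2, 'category': 'Tools'}]
```

The price of the reduced redundancy is an extra join whenever a query groups by the moved attribute.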
Hierarchies
➔ Hierarchies are key elements in analytical applications, since they provide the means to
represent the data under analysis at different abstraction levels.
➔ There exist many kinds of hierarchies.
◆ Balanced Hierarchies
◆ Unbalanced Hierarchies
◆ Generalized Hierarchies
◆ Alternative Hierarchies
◆ Parallel Hierarchies
◆ Nonstrict Hierarchies
Balanced Hierarchies
A balanced hierarchy has only one path at the schema level, where all
levels are mandatory.
Example: a balanced hierarchy in which there is the same number of levels
from each individual product to the root of the hierarchy.
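At the instance level, this property can be checked with a short sketch: every leaf member must be the same number of steps away from the root. The child-to-parent dictionary and the member names below are illustrative assumptions.

```python
def depth(member, parent):
    """Number of steps from a member up to the root."""
    steps = 0
    while member in parent:
        member = parent[member]
        steps += 1
    return steps

def is_balanced(parent):
    """True if all leaf members have the same distance to the root."""
    leaves = set(parent) - set(parent.values())
    return len({depth(leaf, parent) for leaf in leaves}) <= 1

# Product -> Category -> All (sample members)
parent = {
    "Chai": "Beverages", "Chang": "Beverages", "Tofu": "Produce",
    "Beverages": "All", "Produce": "All",
}
print(is_balanced(parent))                           # True
print(is_balanced({**parent, "Gift card": "All"}))   # False: one product skips Category
```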
Unbalanced Hierarchies
➔ An unbalanced hierarchy has only one path at the schema level, where
at least one level is not mandatory.
➔ At the instance level, there can be parent members without associated
child members.
➔ A special case of unbalanced hierarchies is recursive hierarchies, also
called parent-child hierarchies, where the same level is linked by the two
roles of a parent-child relationship.
Example: a hierarchy schema in which a bank is composed of several branches,
where a branch may have agencies; further, an agency may have ATMs.
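Because some branches have no agencies and some agencies have no ATMs, a roll-up cannot assume a fixed number of levels. One way to handle this, sketched below, is to walk member-level parent links until the root is reached. The member names and the transaction counts are hypothetical.

```python
from collections import defaultdict

# child member -> parent member; "branch 2" is a parent without children
parent = {
    "branch 1": "bank X",
    "branch 2": "bank X",
    "agency 11": "branch 1",
    "ATM 111": "agency 11",
}
# Measure recorded at the member where the transaction took place
transactions = {"branch 2": 40, "agency 11": 25, "ATM 111": 10}

def roll_up(values, parent):
    """Add each value to its own member and to every ancestor of that member."""
    totals = defaultdict(int)
    for member, value in values.items():
        node = member
        while node is not None:
            totals[node] += value
            node = parent.get(node)  # None once the root has been processed
    return dict(totals)

totals = roll_up(transactions, parent)
print(totals["bank X"], totals["branch 1"], totals["branch 2"])  # 75 35 40
```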
Generalized Hierarchies
Hierarchies are called generalized hierarchies when the members of a level
are of different types.
Example:
Customers can be either companies or persons. Measures pertaining
to customers must be aggregated differently according to the customer type:
for companies the aggregation path is Customer → Sector → Branch,
while for persons it is Customer → Profession → Branch.
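A minimal sketch of how such a roll-up could follow the path that matches each member's type. The `PATHS` table, the record layout, and the sample customers are assumptions for illustration.

```python
# Aggregation path per customer type, as in the example above
PATHS = {
    "company": ["Customer", "Sector", "Branch"],
    "person": ["Customer", "Profession", "Branch"],
}

customers = [
    {"Customer": "Acme", "type": "company", "Sector": "Automotive",
     "Branch": "Industry", "amount": 500},
    {"Customer": "Globex", "type": "company", "Sector": "Retail",
     "Branch": "Services", "amount": 300},
    {"Customer": "Ann", "type": "person", "Profession": "Teacher",
     "Branch": "Services", "amount": 50},
    {"Customer": "Bob", "type": "person", "Profession": "Engineer",
     "Branch": "Industry", "amount": 70},
]

def roll_up(customers, level):
    """Sum `amount` per member of `level`, skipping customers whose path lacks it."""
    totals = {}
    for c in customers:
        if level not in PATHS[c["type"]]:
            continue  # e.g. persons have no Sector
        totals[c[level]] = totals.get(c[level], 0) + c["amount"]
    return totals

print(roll_up(customers, "Sector"))  # {'Automotive': 500, 'Retail': 300}
print(roll_up(customers, "Branch"))  # {'Industry': 570, 'Services': 350}
```

The two paths split after Customer and join again at Branch, so only the shared levels cover every customer.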
Alternative Hierarchies
➔ Alternative hierarchies represent the situation where at the schema level,
there are several nonexclusive hierarchies that share at least the leaf level.
➔ A child member is associated with more than one parent member and these
parent members belong to different levels.
➔ Alternative hierarchies are needed when we want to analyze measures from
a unique perspective (e.g., time) using alternative aggregations.
➔ In a generalized hierarchy, a child member is related to only one of the
paths, whereas in an alternative hierarchy, a child member is related to all
paths, and the user must choose one of them for analysis.
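For the time perspective, two alternative aggregations could be calendar quarters and fiscal quarters. The sketch below assumes a fiscal year that starts in April and is named after its starting calendar year; both choices, like the sample sales, are illustrative assumptions.

```python
from collections import defaultdict
from datetime import date

FISCAL_START_MONTH = 4  # assumption: the fiscal year starts in April

def calendar_quarter(d):
    return (d.year, (d.month - 1) // 3 + 1)

def fiscal_quarter(d):
    months_into_fiscal_year = (d.month - FISCAL_START_MONTH) % 12
    fiscal_year = d.year if d.month >= FISCAL_START_MONTH else d.year - 1
    return (fiscal_year, months_into_fiscal_year // 3 + 1)

def aggregate(sales, path):
    """Roll daily amounts up along the hierarchy chosen by the user."""
    totals = defaultdict(int)
    for day, amount in sales:
        totals[path(day)] += amount
    return dict(totals)

sales = [(date(2024, 1, 15), 100), (date(2024, 3, 31), 50), (date(2024, 4, 1), 25)]
print(aggregate(sales, calendar_quarter))  # {(2024, 1): 150, (2024, 2): 25}
print(aggregate(sales, fiscal_quarter))    # {(2023, 4): 150, (2024, 1): 25}
```

Every day belongs to both paths, which is why one of them has to be picked for a given analysis.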
Parallel Hierarchies
➔ Parallel hierarchies arise when a dimension has several hierarchies
associated with it, accounting for different analysis criteria.
➔ The component hierarchies may be of different kinds.
➔ Parallel hierarchies can be dependent or independent depending on
whether the component hierarchies share levels.
Example
An example of a dimension that has two parallel independent hierarchies.
The hierarchy ProductGroups is used for grouping products according to
categories or departments, while the hierarchy DistributorLocation groups
them according to distributors’ divisions or regions.
Nonstrict Hierarchies
➔ A hierarchy that has at least one many-to-many relationship is called
nonstrict; otherwise, it is called strict.
➔ Whether a hierarchy is strict or not is orthogonal to its kind.
Example: a nonstrict hierarchy where an employee may be assigned to several cities.
Double-Counting Problem
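A small sketch of the problem: when an employee is assigned to several cities, a naive roll-up adds the full measure to every parent, so the totals at the city and state levels exceed the real total. The employee names and salary figures are hypothetical.

```python
from collections import defaultdict

city_state = {"Atlanta": "Georgia", "Orlando": "Florida", "Tampa": "Florida"}
# Many-to-many: Janet is assigned to three cities
employee_cities = {"Janet": ["Atlanta", "Orlando", "Tampa"], "Paul": ["Atlanta"]}
salary = {"Janet": 90, "Paul": 60}

by_city = defaultdict(int)
for employee, cities in employee_cities.items():
    for city in cities:
        by_city[city] += salary[employee]  # full salary added to every parent

by_state = defaultdict(int)
for city, amount in by_city.items():
    by_state[city_state[city]] += amount

true_total = sum(salary.values())
print(true_total, sum(by_city.values()), dict(by_state))
# 150 330 {'Georgia': 150, 'Florida': 180}
```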
Transforming a nonstrict hierarchy into a strict one
➔ Create a new parent member for each group of parent members
linked to a single child member in a many-to-many relationship.
Ex: A new member that represents the three cities Atlanta, Orlando, and
Tampa will be created. A new member must also be created at the state
level, since the three cities belong to two states.
➔ Ignore the existence of several parent members and choose one
of them as the primary member.
Ex: We may choose the city of Atlanta.
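Both options can be sketched in a few lines. The naming scheme for the fused members (names joined with " + ") and the rule that the first listed city is the primary one are assumptions made for illustration, as is the employee data.

```python
city_state = {"Atlanta": "Georgia", "Orlando": "Florida", "Tampa": "Florida"}
employee_cities = {"Janet": ["Atlanta", "Orlando", "Tampa"], "Paul": ["Atlanta"]}

def fuse(names):
    """Name of the new member that stands for a whole group of parents."""
    return " + ".join(sorted(set(names)))

def fuse_parents(child_parents):
    """Option 1: one new parent member per group of parents."""
    return {child: fuse(parents) for child, parents in child_parents.items()}

def primary_parent(child_parents):
    """Option 2: keep a single primary parent (here: the first one listed)."""
    return {child: parents[0] for child, parents in child_parents.items()}

fused_city = fuse_parents(employee_cities)
# The fused city member also needs a single parent at the state level.
fused_state = {
    fuse(cities): fuse(city_state[c] for c in cities)
    for cities in employee_cities.values()
}
print(fused_city["Janet"])                            # Atlanta + Orlando + Tampa
print(fused_state["Atlanta + Orlando + Tampa"])       # Florida + Georgia
print(primary_parent(employee_cities)["Janet"])       # Atlanta
```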
Distributing Attribute
➔ A distributing attribute indicates how measures are distributed among
several parent members in many-to-many relationships.
Ex: One-third of the value of the measure is accounted for in each of the three cities.
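A minimal sketch of a roll-up weighted by a distributing attribute. Exact fractions are used so that the shares add up to the original measure; the employees, shares, and salaries are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

# (child, parent, share): the share is the distributing attribute
assignments = [
    ("Janet", "Atlanta", Fraction(1, 3)),
    ("Janet", "Orlando", Fraction(1, 3)),
    ("Janet", "Tampa", Fraction(1, 3)),
    ("Paul", "Atlanta", Fraction(1)),
]
salary = {"Janet": 90, "Paul": 60}

def distribute(assignments, measure):
    """Give each parent only its share of the child's measure."""
    totals = defaultdict(Fraction)
    for child, parent, share in assignments:
        totals[parent] += share * measure[child]
    return dict(totals)

by_city = distribute(assignments, salary)
print({city: int(amount) for city, amount in by_city.items()})
# {'Atlanta': 90, 'Orlando': 30, 'Tampa': 30}
print(sum(by_city.values()) == sum(salary.values()))  # True: nothing is counted twice
```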
Transforming into Independent Dimensions
➔ This can only be applied when the exact distribution of the measures is known.
Ex: when the amounts of salary paid for working in the different sections
are known.
Facts with Multiple Granularities
Multidimensional Normal Forms (MNFs)
➔ MNFs ensure correct measure aggregation in the presence of complex hierarchies.
➔ The first multidimensional normal form (1MNF) requires each measure to be
uniquely identified by the set of associated leaf levels.
➔ 1MNF is the basis for correct schema design.
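One way to probe this requirement on sample data is to test which levels actually determine a measure. In the hypothetical joint-account data below, Balance is already determined by Date and Account, so attaching Client to the same fact repeats the balance and a plain sum counts it twice. The `determines` helper and the rows are assumptions; this is a data-level check, not a full normal-form test.

```python
def determines(rows, levels, measure):
    """True if rows that agree on `levels` always agree on `measure`."""
    seen = {}
    for row in rows:
        key = tuple(row[level] for level in levels)
        if seen.setdefault(key, row[measure]) != row[measure]:
            return False
    return True

rows = [  # account A1 is a joint account of clients C1 and C2
    {"Date": "2024-01-31", "Account": "A1", "Client": "C1", "Balance": 100},
    {"Date": "2024-01-31", "Account": "A1", "Client": "C2", "Balance": 100},
    {"Date": "2024-01-31", "Account": "A2", "Client": "C2", "Balance": 40},
]

# Balance depends on Date and Account only, so Client is not part of its identifier.
print(determines(rows, ["Date", "Account"], "Balance"))  # True

naive_total = sum(r["Balance"] for r in rows)
true_total = sum({(r["Date"], r["Account"]): r["Balance"] for r in rows}.values())
print(naive_total, true_total)  # 240 140
```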
Example: double counting.
Many-to-Many Dimensions
➔ In a many-to-many dimension, several members of the dimension
participate in the same fact member.