Data Warehousing: Logical Design
Mirjana Mazuran
mazuran@elet.polimi.it
Outline
Introduction
Logical design
ROLAP
Star schema
ROLAP
Snowflake schema
- Each (primary) dimension is represented by a relation:
  - the primary key of the relation is the primary key of the dimension
  - the attributes of the relation depend directly on the primary key
  - a set of foreign keys gives access to information at different levels of aggregation. This information belongs to the secondary dimensions and is stored in dedicated relations
- A fact is represented by a relation such that:
  - the primary key of the relation is the set of primary keys imported from all and only the primary dimension tables
  - the attributes of the relation are the measures of the fact
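The two rules above can be sketched in SQL DDL. This is only an illustration under assumed names: the tables Category, Product, Store and SalesFact and all their columns are hypothetical and do not come from the exercises. Product is a primary dimension that reaches a secondary dimension (Category) through a foreign key, and the fact table takes its primary key from the primary dimensions only.

```sql
-- Hypothetical secondary dimension: one level of aggregation above Product.
CREATE TABLE Category (
    CategoryID  INTEGER PRIMARY KEY,
    Description VARCHAR(100)
);

-- Hypothetical primary dimension: its attributes depend directly on ProductID;
-- CategoryID is the foreign key that leads to the secondary dimension.
CREATE TABLE Product (
    ProductID  INTEGER PRIMARY KEY,
    Name       VARCHAR(100),
    CategoryID INTEGER REFERENCES Category (CategoryID)
);

-- A second hypothetical primary dimension with no secondary dimension.
CREATE TABLE Store (
    StoreID INTEGER PRIMARY KEY,
    City    VARCHAR(100)
);

-- Fact table: the primary key is the set of keys imported from the primary
-- dimensions only (Category is not referenced here); the remaining columns
-- are the measures of the fact.
CREATE TABLE SalesFact (
    ProductID INTEGER NOT NULL REFERENCES Product (ProductID),
    StoreID   INTEGER NOT NULL REFERENCES Store (StoreID),
    Quantity  INTEGER,
    Cost      DECIMAL(10, 2),
    PRIMARY KEY (ProductID, StoreID)
);
```

In a star schema the Category attributes would instead be stored as columns of Product, which removes one join at the price of redundancy.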
Exercise 1: A possible solution
Snowflake schema
FACT Sales
MEASURES Quantity, Cost
DIMENSIONS Customer, Area, Time, Wine → Class
SalesTable (WineCode, CustomerCode, OrderTimeCode, AreaCode, Quantity, Cost)
Customer (CustomerCode, Name, Address, Phone)
Wine (WineCode, ClassCode, Name, Vintage, BottlePrice, CasePrice)
Class (ClassCode, Name, Region)
Time (TimeCode, Date, Year)
Area (AreaCode, Description)
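As a sketch of how this snowflake schema is queried, the statement below rolls the sales up from Wine to Class. It has to follow the Wine → Class foreign key, which is the extra join a snowflake schema introduces. Table and column names are those of the schema above; grouping by Region is only an example of an aggregation level, not part of the original exercise.

```sql
-- Total quantity and cost per class region (roll-up Wine -> Class).
SELECT C.Region,
       SUM(S.Quantity) AS TotalQuantity,
       SUM(S.Cost)     AS TotalCost
FROM SalesTable S, Wine W, Class C
WHERE S.WineCode = W.WineCode
  AND W.ClassCode = C.ClassCode
GROUP BY C.Region;
```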
Exercise 2
Real estate agency
Let us consider the case of a real estate agency whose database is composed of the following tables:
OWNER (IDOwner, Name, Surname, Address, City, Phone)
ESTATE (IDEstate, IDOwner, Category, Area, City, Province, Rooms, Bedrooms, Garage, Meters)
CUSTOMER (IDCust, Name, Surname, Budget, Address, City, Phone)
AGENT (IDAgent, Name, Surname, Office, Address, City, Phone)
AGENDA (IDAgent, Data, Hour, IDEstate, ClientName)
VISIT (IDEstate, IDAgent, IDCust, Date, Duration)
SALE (IDEstate, IDAgent, IDCust, Date, AgreedPrice, Status)
RENT (IDEstate, IDAgent, IDCust, Date, Price, Status, Time)
- Goal:
  - Provide a supervisor with an overview of the situation. The supervisor must have a global view of the business, in terms of the estates the agency deals with and of the agents' work.
- Questions:
  1. Design a conceptual schema for the DW.
  2. What facts and dimensions do you consider?
  3. Design a Star Schema or Snowflake Schema for the DW.
- Write the following SQL queries:
  - How many customers have visited properties of at least 3 different categories?
  - What is the average duration of visits per property category?
  - Who has paid the highest price among the customers that have viewed properties of at least 3 different categories?
  - Who has bought a flat for the highest price in each month?
  - What kind of property sold for the highest price in each city and month?
Exercise 2: A possible solution
Facts and dimensions
Points 1 and 2 are left as homework. In particular, you are required to discuss the facts of interest (from your point of view) and then define, for each fact, the attribute tree, dimensions and measures, with the corresponding glossary.
The following ideas are used in the rest of the solution:
- supervisors should be able to monitor the sales of the agency
  FACT Sales
  MEASURES OfferPrice, AgreedPrice, Status
  DIMENSIONS EstateID, OwnerID, CustomerID, AgentID, TimeID
- supervisors should be able to monitor the work of the agents by analyzing the visits to the estates the agents are in charge of
  FACT Viewing
  MEASURES Duration
  DIMENSIONS EstateID, CustomerID, AgentID, TimeID
Exercise 2: A possible solution
Star schema
SalesTable (EstateID, OwnerID, CustomerID, AgentID, TimeID, OfferPrice, AgreedPrice, Status)
ViewingTable (EstateID, CustomerID, AgentID, TimeID, Duration)
Estate (EstateID, Category, Area, City, Province, Rooms, Bedrooms, Garage, Meters, Sheet, Map)
Owner (OwnerID, Name, Surname, Address, City, Phone)
Customer (CustomerID, Name, Surname, Budget, Address, City, Phone)
Agent (AgentID, Name, Surname, Office, Address, City, Phone)
Time (TimeID, Day, Month, Year)
Exercise 2: A possible solution
Snowflake schema
SalesTable (EstateID, TimeID, CustomerID, OwnerID, AgentID, OfferPrice, AgreedPrice, Status)
ViewingTable (TimeID, EstateID, CustomerID, AgentID, Duration)
Estate (EstateID, PlaceID, Category, Rooms, Bedrooms, Garage, Meters, Sheet, Map)
Owner (OwnerID, PlaceID, Name, Surname, Address, Phone)
Customer (CustomerID, PlaceID, Name, Surname, Budget, Address, Phone)
Agent (AgentID, PlaceID, Name, Surname, Office, Address, Phone)
Time (TimeID, Day, Month, Year)
Place (PlaceID, City, Province, Area)
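Moving City, Province and Area into Place changes how queries reach them. As a sketch (not part of the original solution), counting the sales per city of the estate needs one more join in the snowflake schema than in the star schema, where City is a column of Estate:

```sql
-- Snowflake schema: the city of an estate is reached through Place.
SELECT P.City, COUNT(*) AS NumSales
FROM SalesTable S, Estate E, Place P
WHERE S.EstateID = E.EstateID
  AND E.PlaceID = P.PlaceID
GROUP BY P.City;
```

On the star schema the same result would be obtained by dropping Place from the query and grouping by E.City.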
Exercise 2: A possible solution
SQL queries on the star schema
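- How many customers have visited properties of at least 3 different categories?
SELECT COUNT(*)
FROM (SELECT V.CustomerID
      FROM ViewingTable V, Estate E
      WHERE V.EstateID = E.EstateID
      GROUP BY V.CustomerID
      HAVING COUNT(DISTINCT E.Category) >= 3) AS C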
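The second query requested by the exercise, the average duration of visits per property category, has no solution on the slides. A possible sketch on the star schema, assuming Duration is stored as a number (for example, minutes):

```sql
-- Average visit duration for each property category.
SELECT E.Category, AVG(V.Duration) AS AvgDuration
FROM ViewingTable V, Estate E
WHERE V.EstateID = E.EstateID
GROUP BY E.Category;
```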
- Who has paid the highest price among the customers that have viewed properties of at least 3 different categories?
CREATE VIEW Cust3Cat AS
SELECT V.CustomerID
FROM ViewingTable V, Estate E
WHERE V.EstateID = E.EstateID
GROUP BY V.CustomerID
HAVING COUNT(DISTINCT E.Category) >= 3;

SELECT C.CustomerID
FROM Cust3Cat C, SalesTable S
WHERE C.CustomerID = S.CustomerID AND S.AgreedPrice IN
  (SELECT MAX(S1.AgreedPrice)
   FROM Cust3Cat C1, SalesTable S1
   WHERE C1.CustomerID = S1.CustomerID)
- Who has bought a flat for the highest price in each month?
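No solution for this question survives on the slide. One possible sketch on the star schema follows the same pattern as the other queries: a correlated subquery finds the highest agreed price among the flat sales of the same month and year. The category value 'Flat' is an assumption about how flats are encoded in Estate.Category.

```sql
-- Buyers of the most expensive flat in each month of each year.
SELECT C.CustomerID, C.Name, C.Surname, T.Month, T.Year, S.AgreedPrice
FROM SalesTable S, Customer C, Estate E, Time T
WHERE S.CustomerID = C.CustomerID
  AND S.EstateID = E.EstateID
  AND S.TimeID = T.TimeID
  AND E.Category = 'Flat'            -- assumed encoding of flats
  AND S.AgreedPrice = (
      SELECT MAX(S1.AgreedPrice)
      FROM SalesTable S1, Estate E1, Time T1
      WHERE S1.EstateID = E1.EstateID
        AND S1.TimeID = T1.TimeID
        AND E1.Category = 'Flat'     -- same assumption as above
        AND T1.Month = T.Month
        AND T1.Year = T.Year);
```

If several flats were sold at the same top price in a month, all their buyers are returned.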
- What kind of property sold for the highest price in each city and month?
SELECT E.Category, E.City, T.Month, T.Year, S.AgreedPrice
FROM SalesTable S, Time T, Estate E
WHERE S.TimeID = T.TimeID AND E.EstateID = S.EstateID
  AND (S.AgreedPrice, E.City, T.Month, T.Year) IN (
    SELECT MAX(S1.AgreedPrice), E1.City, T1.Month, T1.Year
    FROM SalesTable S1, Time T1, Estate E1
    WHERE S1.TimeID = T1.TimeID AND E1.EstateID = S1.EstateID
    GROUP BY T1.Month, T1.Year, E1.City)
Exam 9/1/07
Travel agency
Exam 9/1/07: A possible solution
Reverse engineering
[ER diagram obtained by reverse engineering. Entities: TRIP, CATEGORY, DESTINATION, TYPE, GUIDE, NATION. Attributes shown: code, description, name, duration, surname, address, birthday, sex, continent. Relationships are labelled "of" and "to", with cardinalities (0,1), (1,1) and (0,N).]
Exam 9/1/07: A possible solution
Facts, measures, dimensions, attribute tree
FACT Trip
MEASURES PartecipantNr, Duration, Income
DIMENSIONS Partecipant, Place, Guide, Time, Category
[Attribute tree for the Trip fact. Attributes shown: IDTrip, Duration, IdCategory, IdDestin, IdType, IDGuide, IDPartecipant, IDNation, Name, Surname, Sex, Address, Birthday, Description, continent, Day, Month, Year. The figure also marks a grafting step.]
Exam 9/1/07: A possible solution
Attribute tree, fact schema
[Figures: attribute tree and fact schema. The Trip fact has measures Duration, Income and PartecipNr. Attributes shown: IDPartecip, IDGuide, IdCategory, IdDestin, IDTrip, Nationality, Birthday, Description, IdType, Continent, Day, Month, Year.]
Trip (IdCategory, IdTime, IdDestin, IdPartecipant, IdGuide, Duration, Income, PartecipNr)
Category (IdCategory, Description)
Partecipant (IdPartecipant, IdNation, Birthday)
Time (IdTime, Day, Month, Year)
Guide (IdGuide, IdNation, Birthday, Sex)
Destination (IdDestin, IdNation, Name, Type)
Nation (IdNation, Name, Continent)
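As a sketch of how this schema is navigated, the query below totals the income by continent of the destination. It goes from the Trip fact table to Destination and then to Nation, the secondary dimension shared by Partecipant, Guide and Destination. It is an illustration only, not one of the exam queries.

```sql
-- Total income per continent of the trip destination.
SELECT N.Continent, SUM(T.Income) AS TotalIncome
FROM Trip T, Destination D, Nation N
WHERE T.IdDestin = D.IdDestin
  AND D.IdNation = N.IdNation
GROUP BY N.Continent;
```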
Exam 9/1/07: A possible solution
SQL queries
- Average trip duration for a given place
SELECT AVG(T.Duration)
FROM Trip T, Destination D
WHERE T.IdDestin = D.IdDestin AND D.Name = 'Place'
- Average trip duration for a given trip category and month
SELECT AVG(T.Duration)
FROM Trip T, Time Ti
WHERE T.IdTime = Ti.IdTime AND T.IdCategory = 'Category' AND Ti.Month = 'Month'