Data Warehousing by Example
1. Why ?
2. The Approach
8. Retail Banks
1. Why ?
The purpose of this document is to present our Best Practice approach to Data Warehouse design
based on more than 15 years' experience.
We are publishing it on Kindle, as cheaply as possible, in order to encourage constructive criticism so
that we can improve the book.
We would be very glad to have your comments at [email protected].
About 5 years ago, a teacher emailed me to say that his students found my Data Models boring and
were falling asleep in class !!!
So I began to wonder how I could make them more interesting and still easy to relate to.
My conclusion was that I could take everyday events to use as examples.
That is why I have used Football, Malaysia and the Olympics.
Of course, a holiday in Malaysia is not something we do every day, and the Olympics is not an everyday
event ;-0)
2. The Approach
In this Section we will discuss our Approach to the design of an Enterprise Data Model with
associated Data Warehouses and how it applies to a Day at the Olympics and a Holiday in Malaysia.
2.1 Data Architecture
This Architecture supports Data Migration into an Enterprise Data Warehouse to meet
BI requirements.
It shows the major Layers in an End-to-End Architecture for Data Migration from Data Sources, into a
Data Warehouse and finally to a BI Layer to deliver data to the end-user.
[Figure: Data Architecture Layers, from Data Sources (Salesforce, SAP, Mobile, etc.) to Dimensional Models (Stars and Snowflakes)]
Start by reviewing the list of candidates on the Database Answers Web Site :
http://www.databaseanswers.org/data_models/enterprise_data_models.htm
http://www.databaseanswers.org/data_models/subject_area_data_models.htm
http://www.databaseanswers.org/data_models/canonical_data_models/index.htm
A Sales Receipt
A Bill of Lading
A Contract
A Delivery Note
An early example of the use of the Canonical Data Model (CDM) is to map data from Data Sources to
the EDM.
This is a good opportunity to review the design of both the CDM and the EDM.
You can see examples of how this works in practice in the Chapters on A Day at the Olympics and
A Holiday in Malaysia.
The current list of top-level Entities that feature in the CDM includes :
A Party Entity is implied but does not appear at the top-level to simplify the layout of the CDM.
Industry-specific :
http://www.databaseanswers.org/data_models/insurance_data_warehouses/common_data_model.htm
This shows the components used in the design of an Enterprise Data Model (EDM) with associated
Subject Area Models, based on Industry-specific Models.
Each Data Source is reviewed against the Canonical Data Model and the appropriate Message
formats are defined. Then the data in each Message is mapped to the Enterprise and Industry-specific
Models.
The current Enterprise Data Models are defined on this page of the Database Answers Web Site :
http://www.databaseanswers.org/data_models/enterprise_data_models.htm
http://www.databaseanswers.org/data_models/index.htm
[Figure: Data Sources (eg Sales Receipts) and Messages, with Generic Subject Area Models (eg Customers) and Industry Specific Models (eg Insurance)]
Compare your Draft with the off-the-Shelf Enterprise Data Models on this Page :
http://www.databaseanswers.org/data_models/enterprise_data_models.htm
http://www.databaseanswers.org/data_models/subject_area_data_models.htm
http://www.databaseanswers.org/data_models/generic_data_models.htm
http://www.databaseanswers.org/data_models/canonical_data_models/index.htm
http://www.databaseanswers.org/data_models/football_clubs_tutorial/index.htm
We decide that Clubs employ Players and play Games against other Clubs.
So in a Data Model it begins to look like this :-
So far so good - except that when we think about how this works in practice, we realise that not all
Players play in every Game.
Therefore we need a more flexible way of identifying which players play in each Game.
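One way to picture this more flexible structure is a separate link Entity that records which Players appeared in which Game. The sketch below is illustrative only: the class and attribute names (such as PlayerInGame) and the sample rows are our own assumptions and are not taken from the Data Model itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Player:
    player_id: int
    club_id: int          # the Club that employs this Player
    player_name: str


@dataclass(frozen=True)
class Game:
    game_id: int
    home_club_id: int
    away_club_id: int


@dataclass(frozen=True)
class PlayerInGame:
    """Link Entity (hypothetical name): one row per Player who played in a Game."""
    game_id: int
    player_id: int


def players_in_game(game_id, appearances, players):
    """Return the names of the Players recorded as playing in the given Game."""
    by_id = {p.player_id: p for p in players}
    return [by_id[a.player_id].player_name
            for a in appearances if a.game_id == game_id]


# Invented sample data: Player B is employed by Club 10 but did not play in Game 100.
players = [Player(1, 10, "Player A"), Player(2, 10, "Player B"), Player(3, 20, "Player C")]
game = Game(100, home_club_id=10, away_club_id=20)
appearances = [PlayerInGame(100, 1), PlayerInGame(100, 3)]

print(players_in_game(game.game_id, appearances, players))
```

Because appearances are held separately from employment, a Player can be on a Club's books without being tied to every Game that the Club plays.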
The additional data items that we know about the Entities are shown in this 3NF Data Model which
shows all Attributes.
Let's discuss what we know about Clubs :
1. Because our Clubs entity refers to all Clubs, both ours and others, we need to identify our Club.
Therefore, we add a data item called Our_Home_Team_YN which we set to either Y or N as
appropriate.
We will always want to record the Club Name, and we will usually want to know the Manager, and
the Club Colours, and perhaps a description, such as the history of the Club.
To allow for future work, we include a data item called Other_Details.
In passing, let's point out that we list these items in a consistent order in each Entity definition,
so that all our Data Models share the same layout.
We show Keys and Flags (such as YN values) at the beginning, with all other items listed
alphabetically. The additional items in the other Entities follow similar thinking.
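The ordering convention is simple enough to express as a small helper. This is only a sketch: apart from Our_Home_Team_YN and Other_Details, the attribute names below (Club_ID, Club_Name and so on) are assumed spellings of the items described in the text.

```python
def order_attributes(attributes, keys=(), flags=()):
    """Return attribute names with Keys first, then Flags, then the rest A to Z."""
    special = set(keys) | set(flags)
    others = sorted(a for a in attributes if a not in special)
    return ([k for k in keys if k in attributes]
            + [f for f in flags if f in attributes]
            + others)


# Assumed attribute names for the Clubs entity, in no particular order.
club_attributes = ["Other_Details", "Club_Name", "Our_Home_Team_YN",
                   "Manager", "Club_ID", "Club_Colours", "Description"]

ordered = order_attributes(club_attributes,
                           keys=["Club_ID"],
                           flags=["Our_Home_Team_YN"])
print(ordered)
```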
This is how it is shown on the BBC Web Site at 8 pm on Saturday, February 9th, 2013 :-
This is the data that we need to produce these statistics, listed by Number of Points, in descending
order :
Club Name
Game Date, from the start of the season up to the present
Number of Points, where the Result Code for a Win = 3, Draw = 1 and Loss = 0
In order to introduce an analytical Framework, we can define a Key Performance Indicator (KPI 1)
as the Standing that we have defined above.
Then we can create an SQL View as a neat way to bring the data together and populate it to a BI
front-end. So we would have :-
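A minimal sketch of such a View, using SQLite from Python so that it can be run as it stands. The table and column names, the season start date and the sample results are all assumptions made for illustration. The points per Result Code are held in a small reference table (3, 1 and 0 here, the usual league scoring), so they can be changed without touching the View.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Clubs (Club_ID INTEGER PRIMARY KEY, Club_Name TEXT);
CREATE TABLE Result_Codes (Result_Code TEXT PRIMARY KEY, Points INTEGER);
CREATE TABLE Club_Results (Club_ID INTEGER, Game_Date TEXT, Result_Code TEXT);

-- Assumed scoring: Win = 3, Draw = 1, Loss = 0.
INSERT INTO Result_Codes VALUES ('W', 3), ('D', 1), ('L', 0);

-- Invented sample data.
INSERT INTO Clubs VALUES (1, 'Club A'), (2, 'Club B');
INSERT INTO Club_Results VALUES
    (1, '2012-08-18', 'W'), (2, '2012-08-18', 'L'),
    (1, '2012-08-25', 'D'), (2, '2012-08-25', 'D');

-- KPI 1: Standing, from an assumed season start up to the present.
CREATE VIEW KPI_1_Standings AS
SELECT c.Club_Name, SUM(rc.Points) AS Number_of_Points
FROM Club_Results r
JOIN Clubs c ON c.Club_ID = r.Club_ID
JOIN Result_Codes rc ON rc.Result_Code = r.Result_Code
WHERE r.Game_Date BETWEEN '2012-08-01' AND date('now')
GROUP BY c.Club_Name;
""")

# The BI front-end would read the View, listed by Number of Points, descending.
standings = conn.execute(
    "SELECT Club_Name, Number_of_Points FROM KPI_1_Standings "
    "ORDER BY Number_of_Points DESC"
).fetchall()
print(standings)
```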
4. BI on the Beach
A Data Warehouse is frequently developed to meet an organisation's requirements for
Business Intelligence (BI) data.
One of the most common requirements is to provide Key Performance Indicators (KPIs).
For a Retail business, a typical Financial KPI would be Sales or Profitability per staff headcount, and a
Performance KPI would be On-Time-Delivery (OTD).
At Database Answers we have designed a generic approach that we call BMEWS, which stands for
Business Monitoring and Early Warning System.
The remainder of this Section describes how BMEWS could be applied to monitor the risks to a Data
Warehouse, allowing the CIO or Security Manager to relax on the beach.
This page describes this BI on the Beach application :
http://www.databaseanswers.org/data_models/bmews_bi_on_the_beach/index.htm
[Figure: BMEWS Platform with Data Extracts, KPIs and a BI Layer (Analyses & Reports, KPI Dashboard, Feedback)]
[Figure: Data Sources, Data Warehouse, Financial Data Mart, Performance Data Mart, Business Intelligence Layer, Governance, Risk and Compliance]
[Figure: Key Risk Indicators (KRIs), with Situation Reports, KRI Dashboard and Feedback]
Compliance with Best Practice in Data Management, such as Data Consistency checks on
Master Data Management, Orphan Orders, and so on.
5.2 Introduction
In this Chapter we use a trip to the Olympics to discuss an approach to the implementation of a
Reference Data Architecture and the design of a Data Warehouse.
A Canonical Data Model (CDM) is central to this and we discuss the benefits of using Design
Patterns based on a CDM.
During the trip, I used the Ticket I had bought online, ate lunch and watched the Judo competition.
http://www.databaseanswers.org/data_models/canonical_data_models/index.htm
http://www.databaseanswers.org/data_models/canonical_data_model/index.htm
[Figure: Canonical Data Model Entities: Suppliers, Services, Locations, Customers, Address, Credit Card, Customer Services, Documents, Staff]
[Figure: Message fields DATE, LOCATION, PRICE and DETAILS, with Entities: Supplier Chains, Suppliers, Services, Customers, Address (Location), Credit Card (Payment Methods), Customer Services, Staff, Products, Venues (Locations), Outcome]
[Figure: CUSTOMERS; SERVICES (eg Checkin); EVENTS (eg Purchase a Ticket); LOCATION (eg Judo Venue); DOCUMENTS (eg My Tickets for Train and Olympics); CREDIT CARD; STAFF]
This shows the fields in the Message for this Event :
EVENT           | DATE             | LOCATION | PRICE      | DETAILS
Purchase Ticket | Date of Purchase |          | Seat Price | Date, Time of Competition
[Figure: Customers, Address, Credit Card, Services, Location, Customer Service, Document, Staff]
[Figure: Buy Lunch Event: Restaurant, Customers, Menu, Credit Card, Staff, Sales Receipt]
This shows the fields in the Message for this Event :
EVENT     | DATE | LOCATION   | PRICE       | DETAILS
Buy Lunch |      | Restaurant | Total Price |
[Figure: Supplier, Customers, Address, Credit Card, Services, Location, Customer Service, Document, Staff]
[Figure: Watch the Judo Event: My Ticket, Audience (Customers), Judo Venue, Staff (Security, etc), Competition, Outcome]
This shows the fields in the Message for this Event :
EVENT          | DATE       | LOCATION | PRICE | DETAILS
Watch the Judo | Event Date |          |       | Outcome / Result
[Figure: Supplier, Customers, Address, Credit Card, Services, Location, Customer Service, Document, Staff, Outcome, Restaurant Orders, Ticket Sales]
This is a simple example of how this data could be displayed using a Green Traffic light :-
Columns : DISCIPLINE | BUDGET | TARGET | ACHIEVEMENT | TRAFFIC LIGHT (RED/AMBER/GREEN)
Disciplines : Archery, Judo, Wrestling
Values shown : BUDGET 8 million, TARGET 0-2 medals, ACHIEVEMENT 2 medals
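The Traffic Light itself can be derived from the Target and the Achievement. The thresholds in this sketch are our own assumption (Green when the top of the Target range is reached, Amber when the Achievement falls inside the range, Red below it); the book does not state the exact rule.

```python
def traffic_light(target_low, target_high, achieved):
    """Classify an Achievement against a Target range (assumed thresholds)."""
    if achieved >= target_high:
        return "GREEN"
    if achieved >= target_low:
        return "AMBER"
    return "RED"


# Target of 0-2 medals with an Achievement of 2 medals, as in the example above.
print(traffic_light(0, 2, 2))
```

With a Target of 0-2 medals and an Achievement of 2 medals, this rule gives the Green light used in the example.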
5.14 Conclusion
This Chapter has presented a Method for designing a Data Warehouse following a Canonical Data
Model and Messages.
We have validated the Method by designing a Data Warehouse for a Day at the Olympics.
I would be pleased to have your comments and you can email me at [email protected].
6. Holiday in Malaysia
6.1 Management Summary
In this Chapter we use a trip to Malaysia to discuss an approach to the implementation of a
Reference Data Architecture and the design of a Data Warehouse.
A Canonical Data Model (CDM) is central to this and we discuss the benefits of using Design
Patterns based on a CDM.
During the trip, my wife and I stayed in three Hotels, hired a car and visited a number of Tourist
Attractions, including an Elephant Sanctuary, a Crocodile Farm and an Underwater World on
Langkawi Island in Malaysia.
After we returned to England I found myself thinking that the trip would provide a good opportunity
to develop an interesting and User-Friendly Tutorial on Data Warehouses.
The design of the Data Models reflects the scope and the fact that the overall aim is to provide data
for Business Intelligence.
We also try to keep in mind that a well-designed Data Model should be good to look at and it should
be possible to tell a story based on the Model.
6.1.1 The Approach
http://www.databaseanswers.org/data_models/canonical_data_model/index.htm
[Figure: Event 3 - Go Shopping: Data Warehouse; Sales Receipts (Coffee Bean Receipt, Elephant Sanctuary, Starbucks in Langkawi)]
A short Slide Show has been created to give you an overview of how this approach works in practice
and how the Canonical Data Model is central.
http://www.databaseanswers.org/bmews_slideshow_book_tutorial/index.htm
Do these Steps for all Events
[Figure: Web Services (Design, Implementation) and Data Sources / Databases: CSV, ODBC, Oracle, SAP, SQL Server, etc.]
The Integrated Data Platform is a specific example of a more general Data Virtualization Layer.
[Figure: Integrated Data Platform: Staging Area, ODS 1 Shipments etc., Customer Master Index, Product Master Catalogue, Dimensional Models, Semantic Models, BI Layer]
The Canonical Data Model is used as a Template for a Design Pattern for an ERD for a Business Event.
This Model appears on this page on our Database Answers Web Site :
http://www.databaseanswers.org/data_models/canonical_data_models/index.htm
It provides a stripped-down Event-oriented Model that applies to a wide range of business and
everyday situations.
We use it as a standard to translate data into a common format suitable for loading into a Data
Warehouse.
We have used ERWin for this Data Model.
This allows us to show Many-to-Many Relationships in a very concise and economical style.
When we come to use the CDM we will expand these into One-to-Many Relationships.
CDM                  | EVENT : Hotel Check-In | COMMENTS
Customers            | Guests                 |
Documents            |                        |
Events               | Hotel Check-In         |
Locations            | Hotel Address          |
Organisations        | Hotel Chains           |
Organisations        | Hotels                 |
Organisations        |                        |
Products or Services | Room Reservation       |
Third Parties        | N/A                    |
Generic              | Specific, eg Hotel Check-in
Supplier             | Eg Hotel
Date                 |
Customer             | Customer, Guest
Products or Services | Room, Meals, etc.
Unit Price           |
From Date            |
To Date              |
Total Price          |
2. Definition of each item which is in common use and where a clear unique understanding is important.
Validation
Item        | Validation         | Clean-up | Transform
Unit Price  | Cannot be negative |          |
From Date   |                    |          |
To Date     |                    |          |
Total Price | Cannot be negative |          |
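Validation rules of this kind are easy to apply as each Message is loaded. The sketch below checks the two rules listed for Unit Price and Total Price; the dictionary keys, the sample values and the extra date check are assumptions added for illustration.

```python
from datetime import date


def validate_message(msg):
    """Return a list of validation errors for one Message (empty if clean)."""
    errors = []
    if msg["unit_price"] < 0:
        errors.append("Unit Price cannot be negative")
    if msg["total_price"] < 0:
        errors.append("Total Price cannot be negative")
    # Assumption, not one of the listed rules: a period cannot end before it starts.
    if msg["to_date"] < msg["from_date"]:
        errors.append("To Date is before From Date")
    return errors


# Invented sample Messages.
good = {"unit_price": 100.0, "total_price": 300.0,
        "from_date": date(2013, 1, 10), "to_date": date(2013, 1, 13)}
bad = dict(good, unit_price=-5.0)

print(validate_message(good))
print(validate_message(bad))
```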
[Figure: Staging Area, ODS, 3NF Data Warehouse, Master Data, Data Warehouse, Bus with Conformed Dimensions (eg Calendar Date, Location) and Conformed Facts (eg Profitability, Performance), Dimensional Models, Semantic Layer, BI User View]
The MDM Component provides a Single View of the truth for Customers and Products.
The principle behind the Single View of both Customers and Products is the
same :
If possible, define one Data Source as the Golden Record or Master Record.
For example, where Salesforce is in place, it often provides the Master Customer record.
In this case, the GetCustomer Service will try to match the characteristics of a new Customer, such as
Customer name and address, against existing Customers.
A Customer Master Index (CMI) maintains a link between the single master Customer record and the
Source Data Customers.
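As an illustration of the link that the CMI maintains, here is a minimal in-memory sketch. The class, its method names and the identifiers used (M-001, SF-17, HR-204) are hypothetical; a real CMI would be a database table or a service.

```python
from collections import defaultdict


class CustomerMasterIndex:
    """Links one master Customer record to its Source Data Customers."""

    def __init__(self):
        self._master_of = {}                   # (source, source_id) -> master_id
        self._sources_of = defaultdict(list)   # master_id -> [(source, source_id)]

    def link(self, master_id, source, source_id):
        key = (source, source_id)
        previous = self._master_of.get(key)
        if previous is not None:
            # Re-linking moves the source record to the new master.
            self._sources_of[previous].remove(key)
        self._master_of[key] = master_id
        self._sources_of[master_id].append(key)

    def master_for(self, source, source_id):
        return self._master_of.get((source, source_id))

    def sources_for(self, master_id):
        return list(self._sources_of.get(master_id, []))


cmi = CustomerMasterIndex()
cmi.link("M-001", "Salesforce", "SF-17")
cmi.link("M-001", "Hotel Reservations", "HR-204")
print(cmi.master_for("Hotel Reservations", "HR-204"))
print(cmi.sources_for("M-001"))
```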
This Data Model shows how the approach above will be implemented in our Reference Data
Architecture.
For this Event, we must plan for loading Customer data into the Data Warehouse.
This requires that we establish a Single View of the Customer.
In the UK, the name Joe Bloggs is used whereas in the States it would be John Doe.
Joe, of course, is an abbreviation of Joseph.
On official documents, the name would be spelled Joseph, whereas in everyday conversation, it
would normally be Joe.
Therefore we have to allow for the possibility that a Joe Bloggs might be the same person as
Joseph Bloggs.
Resolving this ambiguity requires us to define a set of Business Rules that can
be run whenever we load a Customer whose identity might be ambiguous.
Therefore, our solution to this problem of establishing a Single View of Customer Joe or Joseph is to
have a Rules Engine where we can define and execute a Rule like Joe is equivalent to Joseph.
The recommended practice to implement the Customer Master Index is to use Web Services for Get,
Update and Put facilities.
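A Rule such as Joe is equivalent to Joseph can be held as data and applied before Customer names are compared. This is only a sketch of the idea, not a real Rules Engine: the function names are our own, and in practice the match would also take the address and other characteristics into account.

```python
# Each Rule maps a familiar form of a name to its official form.
NAME_EQUIVALENTS = {"joe": "joseph"}


def canonical_name(full_name):
    """Lower-case the name and apply the equivalence Rules word by word."""
    words = full_name.lower().split()
    return " ".join(NAME_EQUIVALENTS.get(word, word) for word in words)


def same_customer_name(name_a, name_b):
    """True if the two names are equal once the Rules have been applied."""
    return canonical_name(name_a) == canonical_name(name_b)


print(same_customer_name("Joe Bloggs", "Joseph Bloggs"))
print(same_customer_name("Joe Bloggs", "John Doe"))
```

New Rules can then be added to the table without changing the matching logic.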
6.2.12.3 Products
From KL we took an internal flight to Langkawi island where we checked in at the Berjaya Hotel and
Resort.
For each Event, we check to see whether we can derive a Design Pattern based on our Canonical
Data Model. If we can, then we have validated our CDM.
6.3.1.4 Data Warehouse for Hotel Check-In
6.4 Events 1 to 10
6.4.0 Adding new Data Sources
A very important activity that we have to plan for is adding new Data Sources to our Data
Warehouse.
This is often as a direct response to a new business requirement.
The Steps that we will follow are those that we describe in the succession of Events in this Appendix.
This Section discusses how the Canonical Data Model (CDM), shown earlier, applies to the Event of
Checking-in to a Hotel. The CDM provides a Design Pattern for the Event-oriented Data Models that
we need.
The Design Pattern based on the Hotel Check-in Event looks like this :-
[Figure: Hotels, Rooms, Customers, Hotel Reservations, Staff, Hotel Address]
Business Rules
It is very good practice to write out the Business Rules that define the conditions that the logic of the
Model must comply with.
They can then be reviewed and agreed with a Subject Matter Expert (SME).
If it is appropriate, we can use a Business Rules Engine to automate the implementation of the Rules.
In this case, the Rules look like this :
A Customer has one and only one Address.
A Customer has one or more Addresses.
A Hotel belongs to one and only one Hotel Chain.
A Hotel has one and only one Address.
A Reservation is associated with one Customer.
A Reservation is associated with one member of Staff.
A Room belongs to one and only one Hotel.
A Room Card or Key is associated with one and only one Room and Reservation.
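Rules written out in this way can be checked mechanically. The sketch below tests two of them (a Room belongs to one and only one Hotel, and a Reservation is associated with one Customer) against sample data; the data layout, the function name and the values are assumptions, and a Business Rules Engine would normally take the place of hand-written checks like these.

```python
def check_rules(rooms, reservations):
    """Return a list of Rule violations found in the data (empty if none)."""
    violations = []

    # Rule: a Room belongs to one and only one Hotel.
    hotels_by_room = {}
    for room_id, hotel_id in rooms:
        hotels_by_room.setdefault(room_id, set()).add(hotel_id)
    for room_id, hotel_ids in sorted(hotels_by_room.items()):
        if len(hotel_ids) != 1:
            violations.append(f"Room {room_id} belongs to {len(hotel_ids)} Hotels")

    # Rule: a Reservation is associated with one Customer.
    for reservation in reservations:
        if reservation.get("customer_id") is None:
            violations.append(
                f"Reservation {reservation['reservation_id']} has no Customer")
    return violations


# Invented sample data: Room 102 is wrongly recorded against two Hotels.
rooms = [(101, "H1"), (102, "H1"), (102, "H2")]
reservations = [{"reservation_id": 1, "customer_id": 7},
                {"reservation_id": 2, "customer_id": None}]

print(check_rules(rooms, reservations))
```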
This Table shows how the Entities in our Hotel Check-In Data Model map on to our Design Pattern
based on our Canonical Data Model (CDM).
We are very happy to see that it does because it helps to validate the CDM.
CDM                  | EVENT : Hotel Check-In | COMMENTS
Customers            | Guests                 |
Documents            |                        |
Events               | Rent a Room            |
Locations            | Hotel Address          |
Organisations        | Hotel                  |
Organisations        | Hotel Chains           |
Organisations        |                        |
Products or Services |                        |
Third Parties        | N/A                    |
This shows how the Generic Message Template applies to the Hotel Check-In Event.
It defines the Source Data for this Event.
TBD stands for To Be Determined.
Generic              | Specific
Supplier             | Berjaya Hotel
Date & Time          | Check-in Date & Time
Customer Details     | Barry's Name and Credit Card Details
Products or Services | Room number
Unit Price           | Price per Night
From Date            | From Date
To Date              | To Date
Total Price          | TBD
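The Generic Message Template can be pictured as a simple record type with one field per column. In this sketch the field names, the sample values and the rule for deriving the Total Price (Price per Night multiplied by the number of nights) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class EventMessage:
    supplier: str
    event_date_time: str
    customer_details: str
    product_or_service: str
    unit_price: float
    from_date: date
    to_date: date
    total_price: Optional[float] = None   # None plays the role of TBD


def derive_total_price(msg):
    """Assumed rule: Total Price = Unit Price x number of nights."""
    nights = (msg.to_date - msg.from_date).days
    return msg.unit_price * nights


# Invented values for a Hotel Check-in Message.
check_in = EventMessage(
    supplier="Berjaya Hotel",
    event_date_time="2013-01-10 14:00",
    customer_details="Name and Credit Card Details",
    product_or_service="Room 101",
    unit_price=100.0,
    from_date=date(2013, 1, 10),
    to_date=date(2013, 1, 13),
)
check_in.total_price = derive_total_price(check_in)
print(check_in.total_price)
```

The same record type can carry the Messages for the other Events, because only the values change from one Event to the next.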
At this point, the Data Warehouse will contain data only for the Hotel Check-in Event and therefore
it will look like the CDM.
This diagram shows the Entities in the Data Model.
The attributes are shown in the Dimensional Model in the next Section.
In this diagram, we have positioned the Room Card/Key so that it corresponds to the Document
Entity in the CDM.
We have also shown the Hotel_Address entity with its correct relationship to the Hotel entity,
whereas in the CDM it is shown, for convenience, in a Many-to-Many Relationship with the
Events entity.
This is the same Third-Normal Form Data Warehouse as the one shown earlier in this Section.
Derived data must not appear in an ERD, therefore the Room Card/Key does not appear in this ERD
because the data is derived from data already recorded.
This Section discusses how the Canonical Data Model (CDM), shown in Section 2.1, applies to the
Event of Hiring a Car. The CDM provides a Design Pattern for the Event-oriented Data Models that
we need.
The Design Pattern based on the Car Hire Event looks like this :-
[Figure: Rental Office, Cars, Customers, Office Address, Staff]
This Section discusses how the Canonical Data Model applies to the Event of Hiring a Car.
We hired a car from a local Car Hire company at the airport when we landed on Langkawi island.
We would normally create a Subject Area Model for Cars, to show details such as Car Make and
Model.
This Table shows how the Entities in our Car Hire Data Model map on to our Design Pattern based on
our Canonical Data Model (CDM).
We are very happy to see that it does because it helps to validate the CDM.
CDM                  | EVENT : Car Hire | COMMENTS
Customers            |                  |
Documents            |                  |
Events               |                  |
Locations            |                  |
Organisations        |                  |
Products or Services | Car Hire         |
Third Parties        | N/A              |
We hired a car from a local rental company in the airport in Langkawi, which worked out very well.
Generic              | Specific
Car Hire Company     | ABC Car Hires
Date & Time          | Check-in Date & Time
Customer Details     |
Products or Services | Car Reg Number
Unit Price           | Rental charge per Day
From Date            | From Date
To Date              | To Date
Total Price          | ---
At this point, we add the Car Hire data to the data for the Hotel Check-in Event which is already in
the Data Warehouse.
After a little thought, we have combined Car Hire and Hotel Reservations into Suppliers and Services.
This is Event 2 where we add the Dimensions and Facts for Hotel Check-in to the existing ones for
Car Hire.
Harrods is very popular in Malaysia, as you can tell from the customers browsing in the store.
Here we have Data Models for Receipts from Harrods, Starbucks and Tesco in Malaysia.
This shows a consolidated Receipt that provides a generic view of the three specific examples above.
This Section discusses how the Canonical Data Model applies to the Shopping Event.
We go Shopping, which is when the long-suffering husband says one of three things :
1. It's a tough job but someone's got to do it
2. When the going gets tough, the tough go shopping
3. Yes, dear
But usually, we survive the experience ;-0)
This is how the CDM Design Pattern applies to the Shopping Event :
[Figure: Event : Go Shopping, with Retail Chains, Stores, Products, Customers, Store Address, Staff, Sales Receipt]
This Section discusses how the Canonical Data Model applies to the Event of Shopping.
We went shopping at a number of stores in Malaysia.
We would normally create a Subject Area Model for Shopping.
This Table shows how the Entities in our Shopping Data Model map on to our Design Pattern based
on our Canonical Data Model (CDM).
We are very happy to see that it does because it helps to validate the CDM.
CDM                  | EVENT : Go Shopping | COMMENTS
Customers            | Customers           |
Documents            | Sales Receipt       |
Events               | Go Shopping         |
Locations            | Stores              |
Organisations        |                     |
Products or Services | Retail Products     |
Third Parties        | N/A                 |
The Message Format for this Event will resemble the Sales Receipt.
Generic              | Specific
Store Name           | Harrods
Date & Time          | Visit Date & Time
Customer Details     | Cash (Anonymous) or Barry's Name and Credit Card Details
Products or Services | One or many Products
Unit Price           | Purchase Price
From Date            | N/A
To Date              | N/A
Total Price          | To be calculated
At this point, we add the Shopping data for Event 3 to the data for the Car Hire and Hotel Check-in
Events which is already in the Data Warehouse, so our design looks like this :-
At this point, we would normally consider creating a Glossary of Terms to establish agreed
definitions of the words that are in common use.
The Dimensional Model will have data for Shopping, Car Hire and Hotel Reservations.
6.4.4.0 Discussion
6.4.4.0.1 Elephants
Elephants, especially in the small numbers that you see in a Sanctuary or a Circus, frequently have
names and we often know their ages.
However, this is not true for crocodiles.
Therefore, we store names and ages for elephants but not for crocodiles so here is the Data Model
for Elephants :-
This Section discusses how the Canonical Data Model applies to the Event of Visiting an Elephant
Sanctuary.
We were on Langkawi island where there are a lot of interesting things to see and do.
My wife voted for a trip to the Elephant Sanctuary because she thinks baby Elephants are very cute.
So we decided on the Elephant Sanctuary, then the Crocodile Farm and finally the Underwater
World.
The Underwater World has an overhead aquarium, and I have always wanted to see fish going over my head.
It also has a number of individual Attractions, including the Fish Aquarium and the Penguin Area.
Here is our ticket for the Elephant Ride (called a Dumbo Boarding Pass !!!) :-
In our Canonical Data Model (CDM) this is an example of a Document related to an Event.
In other words, this is an example of how we are able to validate our CDM.
[Figure: Design Pattern for Visiting a Tourist Attraction: Customers, Complexes, Elephant Sanctuary, Attraction Address, Staff, Tickets]
This Section discusses how the Canonical Data Model applies to the Event of Visiting a Tourist
Attraction.
If we need to include more detail we would probably create a Subject Area Model for Tourist
Attraction.
This Table shows how the Entities in our Tourist Attraction Data Model map on to our Design Pattern
based on our Canonical Data Model (CDM).
We are very happy to see that it does because it helps to validate the CDM.
CDM                  | EVENT : Visit a Tourist Attraction | COMMENTS
Customers            | Tourists                           |
Documents            | Tickets                            |
Events               |                                    |
Locations            | Address of Elephant Sanctuary      |
Organisations        |                                    |
Products or Services |                                    |
Third Parties        | N/A                                |
COMMENTS : An Elephant Sanctuary is an example of a Tourist Attraction
Generic          | Specific
                 | eg Elephant Sanctuary
Date & Time      | Visit Date & Time
Customer Details | N/A
Services         | Attraction
Unit Price       | Entry Fee
From Date        | N/A
To Date          | N/A
Total Price      | As determined
This consolidates the Tourist Attraction Entities with the existing Entities for Car Hire, Hotel Check-in
and Shopping.
At this point, data for Event 4, the Elephant Sanctuary (Tourist Attractions), is added to data for Cars,
Hotels and Shopping.
In Langkawi we were very impressed to see a brave guy sitting on the back of a crocodile.
Later we found that he was an employee and somehow he had trained the crocodile to let him sit on
its back.
6.4.5.0 Discussion
This Section discusses some of the implications for Data Modeling of combining a Visit to the
Crocodile Farm with a Visit to the Elephant Sanctuary.
6.4.5.0.1 Adding Crocodiles to Elephant Data Model
We can see that the Crocodile Entity looks very similar to the Elephant Entity.
The only difference is that we often know the name and age for an Elephant because they are
somehow more user-friendly than Crocodiles.
We never know the age and name of a Crocodile !!!
When we try to produce a combined Model for both elephants and crocodiles this is our first draft.
This Section discusses how the Canonical Data Model applies to the Event of Visiting a Crocodile
Farm.
We would expect this to be identical to a Visit to the Elephant Sanctuary.
But it is worth the effort of compiling the Mapping Analysis so that we can double-check the
situation.
Sure enough, after we complete the Mapping, we can see that the logic is identical.
Therefore we do not need to change the Design Pattern or the Data Warehouse.
The Dimensional Model will simply have additional data for the Crocodile Farm.
6.4.5.2 Mapping to the CDM
This Section discusses how the Canonical Data Model applies to the Event of Visiting a Crocodile
Farm.
We can see that the Data Model for Crocodile Farm is identical to that for the Elephant Sanctuary.
We can simply create a Data Model for Tourist Attraction and create Event Types of Visits to a
Crocodile Farm and an Elephant Sanctuary.
Therefore, we handle Elephant Sanctuaries and Crocodile Farms as different sorts of Reference Data.
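Treating the kinds of Attraction as Reference Data means that a new kind is a new row, not a new Entity. A minimal sketch, in which the type codes, the function name and the sample visits are all invented for illustration:

```python
# Reference Data: one entry per kind of Tourist Attraction.
attraction_types = {
    "ELEPHANT": "Elephant Sanctuary",
    "CROCODILE": "Crocodile Farm",
}


def record_visit(visits, attraction_type, visit_date, entry_fee):
    """Append one Visit Event, rejecting unknown kinds of Attraction."""
    if attraction_type not in attraction_types:
        raise ValueError(f"Unknown Attraction type: {attraction_type}")
    visit = {"attraction_type": attraction_type,
             "visit_date": visit_date,
             "entry_fee": entry_fee}
    visits.append(visit)
    return visit


visits = []
record_visit(visits, "ELEPHANT", "2013-01-15", 30.0)
record_visit(visits, "CROCODILE", "2013-01-16", 20.0)

# A further kind of Attraction only needs a new row of Reference Data.
attraction_types["AQUARIUM"] = "Aquarium"
record_visit(visits, "AQUARIUM", "2013-01-17", 25.0)
print(len(visits))
```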
This Table shows how the Entities in our Crocodile Farm Data Model map on to our Design Pattern
based on our Canonical Data Model (CDM).
We are very happy to see that it does because it helps to validate the CDM.
CDM                  | EVENT : Visit to Crocodile Farm | COMMENTS
Customers            | Tourists                        |
Documents            | Tickets                         |
Events               |                                 |
Locations            |                                 |
Organisations        |                                 |
Products or Services |                                 |
Third Parties        | N/A                             |
Generic          | Specific
                 | eg Crocodile Farm
Date & Time      | Visit Date & Time
Customer Details | N/A
Services         | Attraction
Unit Price       | Entry Fee
From Date        | N/A
To Date          | N/A
Total Price      | As determined
When we review this Data Model, we can see that the logic of the Elephant Sanctuary applies
equally to the Crocodile Farm.
In other words, our Consolidated Data Warehouse is identical, and we simply add to the Reference
Data as another kind of Tourist Attraction.
When we review the Dimensional Model, we can see that we can accommodate Crocodile
Farms by simply adding to the Reference Data.
http://www.underwaterworldlangkawi.com.my/
From a Data Modelling point of view, visiting an Aquarium is identical to visiting a Crocodile Farm or
Elephant Sanctuary.
Therefore we do not need a separate CDM, and we go through the process of mapping simply to
confirm that it is identical.
This Table shows how the Entities in our Visit the Aquarium Event map on to our Design Pattern
based on our Canonical Data Model (CDM).
CDM           | EVENT : Visit to Aquarium | COMMENTS
Customers     | Tourists                  |
Documents     | Tickets                   |
Events        | Visit to Aquarium         |
Locations     | Address of Aquarium       |
Organisations | Aquarium Owners           |
Services      |                           |
Third Parties | N/A                       |
This Section shows how the Design Pattern looks for Penguins.
6.4.8.2 Mapping to the CDM
This Section discusses how the CDM applies to the Penguin Area.
When we check our CDM we can see that it applies in the same way as it does to Elephants and
Crocodiles.
That is, we pay for a Service and receive a Document, in the form of a ticket, that allows us to enter
the Attraction.
Therefore we do not need a separate CDM, and we go through the process of mapping simply to
confirm that it is identical.
This Table shows how the Entities in our Tourist Attraction Data Model map on to our Design Pattern
based on our Canonical Data Model (CDM).
CDM           | EVENT : Visit to Penguins | COMMENTS
Customers     | Tourists                  |
Documents     | Tickets                   |
Events        | Visit to Penguins         |
Locations     |                           |
Organisations |                           |
Services      |                           |
Third Parties | N/A                       |
Generic          | Specific
                 | Underwater World
Date & Time      | Visit Date & Time
Customer Details | N/A
Services         | Penguin Area
Unit Price       | Free
From Date        | N/A
To Date          | N/A
Total Price      | Free
This diagram shows the Design Pattern of the Canonical Data Model adapted for Tourist Attractions.
The Credit Card that I used was, of course, associated with me, but it was also associated with
payment of the Hotel Bill so there is a relationship between the Credit Card and the Hotel Guest (ie
Customer) and between the Credit Card and the Total Hotel Bill.
This Section discusses how the Canonical Data Model (CDM), shown in Section 2.1, applies to the
Event of Checking-out of a Hotel. The CDM provides a Design Pattern for the Event-oriented Data
Models that we need.
The Design Pattern based on the Hotel Check-out Event looks like this :-
[Figure: Hotel Chains, Hotels, Room, Meals and Extras, Customers, Hotel Address, Staff, Receipt]
This Table shows how the Entities in our Hotel Check-out Data Model map on to our Design Pattern
based on our Canonical Data Model (CDM).
We are very happy to see that it does because it helps to validate the CDM.
CDM                  | EVENT : Hotel Check-Out | COMMENTS
Customers            | Guest                   |
Documents            | Receipts                |
Events               |                         |
Locations            | Hotel Address           |
Organisations        |                         |
Products or Services |                         |
Third Parties        | N/A                     |
This shows how the Generic Message Template applies to the Hotel Check-Out Event.
It defines the Source Data for this Event.
Generic              | Specific
Supplier             | Shangrila Hotel
Date & Time          | Checkin Date & Time
Customer Details     | Barry's Name and Credit Card Details
Products or Services | Room, Meal, Services
Unit Price           | Price per Night
From Date            | From Date
To Date              | To Date
Total Price          | Derived
At this point, we add the Shopping data for Event 3 to the data for the Car Hire and Hotel Check-in
Events which is already in the Data Warehouse, so our design looks like this :-
At this point, the Dimensional Model will have the complete set of Dimensions and Facts.
This Data Model shows them all consolidated into a single Dimensional Model.
An alternative design with more than one Fact Table is shown in the BI discussion.
This arrangement is normally called multiple Data Marts, which require Conformed Dimensions.
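The role of a Conformed Dimension can be sketched with two small Fact Tables that share one Calendar Date Dimension. The table names, keys and figures below are invented for illustration; the point is that both Data Marts can be summarised on the same basis because they use the same Dimension.

```python
# Conformed Dimension: one Calendar Date table shared by both Data Marts.
dim_calendar_date = {
    20130209: {"calendar_date": "2013-02-09", "month": "2013-02"},
    20130216: {"calendar_date": "2013-02-16", "month": "2013-02"},
    20130302: {"calendar_date": "2013-03-02", "month": "2013-03"},
}

# Fact Table in a Financial Data Mart (invented figures).
fact_sales = [
    {"date_key": 20130209, "revenue": 100},
    {"date_key": 20130216, "revenue": 200},
    {"date_key": 20130302, "revenue": 50},
]

# Fact Table in a Performance Data Mart (invented figures).
fact_complaints = [
    {"date_key": 20130209, "complaints": 2},
    {"date_key": 20130302, "complaints": 1},
]


def total_by_month(facts, measure):
    """Sum a measure by month, looking the month up in the shared Dimension."""
    totals = {}
    for row in facts:
        month = dim_calendar_date[row["date_key"]]["month"]
        totals[month] = totals.get(month, 0) + row[measure]
    return totals


revenue_by_month = total_by_month(fact_sales, "revenue")
complaints_by_month = total_by_month(fact_complaints, "complaints")
print(revenue_by_month)
print(complaints_by_month)
```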
[Figure: Maersk, Maersk Line, Shipping Services, Customers, Port Locations, Staff, Shipping Contract]
This Table is your starting-point for defining how the Entities correspond to the Entities in your
Stopping for a Coffee Event.
Replace the question marks by your answers.
CDM Generic Entities | Specific         | COMMENTS
Customers            | Customer         |
Documents            | Contracts        |
Events               |                  |
Locations            | Booking Office   |
Organisations        | Maersk           |
Organisations        |                  |
Products or Services | Shipping Service |
Third Parties        | N/A              |
This shows how the Generic Message Template applies to the Specific Hotel Check-In Event.
Generic              | Specific
Supplier             | Maersk Line
Date & Time          | Check-in Date & Time
Customer Details     | Barry's Name and Credit Card Details
Products or Services | Shipping Service
Unit Price           | Quoted price to ship Cargo
From Date            | From Date
To Date              | To Date
Total Price          | ---
In general, we might be shipping many things and not just one thing, so our Data Model needs to provide for
this. We will call these things Commodities.
We can do this by simply starting with a many-to-many relationship between the Commodities entity and the
Shipments entity and resolving it into two one-to-many relationships, so that our Model looks like this :-
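In table form, the resolution introduces a link table between Shipments and Commodities. The sketch below uses SQLite from Python so that it can be run directly; the table names, column names and sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Shipments (Shipment_ID INTEGER PRIMARY KEY, Shipment_Details TEXT);
CREATE TABLE Commodities (Commodity_ID INTEGER PRIMARY KEY, Commodity_Name TEXT);

-- Link table: one row per Commodity carried in a Shipment.
-- Each side of the original many-to-many becomes a one-to-many into this table.
CREATE TABLE Shipment_Commodities (
    Shipment_ID  INTEGER NOT NULL REFERENCES Shipments (Shipment_ID),
    Commodity_ID INTEGER NOT NULL REFERENCES Commodities (Commodity_ID),
    Quantity     INTEGER,
    PRIMARY KEY (Shipment_ID, Commodity_ID)
);

-- Invented sample data.
INSERT INTO Shipments VALUES (1, 'First shipment'), (2, 'Second shipment');
INSERT INTO Commodities VALUES (10, 'Cars'), (20, 'Furniture');
INSERT INTO Shipment_Commodities VALUES (1, 10, 1), (1, 20, 5), (2, 10, 2);
""")

rows = conn.execute("""
    SELECT s.Shipment_ID, c.Commodity_Name
    FROM Shipment_Commodities sc
    JOIN Shipments s ON s.Shipment_ID = sc.Shipment_ID
    JOIN Commodities c ON c.Commodity_ID = sc.Commodity_ID
    ORDER BY s.Shipment_ID, c.Commodity_Name
""").fetchall()
print(rows)
```

The composite primary key stops the same Commodity from being recorded twice against one Shipment.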
When we think about how we would show this Data Model in a generic format, this is the result :-
At this point, we add the Car Shipping data to the data for all the other Events which is already in the
Data Warehouse, so our design looks like this :-
[Diagram: KPIs (Profitability, Performance) and Source Data (Revenue Data, Costs Data, Complaints, Customer Count)]
The initial Data Architecture will include this basic structure of KPIs and Source Data.
This will provide a simple but flexible framework which can be enhanced in a controlled manner.
We will use this as a starting-point for discussions to establish the requirements with the business
users.
Which Hilton had the most complaints last week as a percentage of total Customers ?
What was the most popular Tourist Attraction over the past six months ?
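To illustrate how the first question could be answered from the Source Data, here is a small sketch. The hotel names and the weekly figures are invented for the example, and the function name is our own.

```python
# Hypothetical weekly figures per hotel: (complaints, total customers).
weekly_figures = {
    "Hilton A": (6, 300),
    "Hilton B": (4, 100),
    "Hilton C": (9, 600),
}


def complaint_rate(complaints: int, customers: int) -> float:
    """Complaints as a percentage of total Customers."""
    return 100.0 * complaints / customers if customers else 0.0


rates = {hotel: complaint_rate(c, n) for hotel, (c, n) in weekly_figures.items()}

# The hotel with the highest percentage, not the highest raw count.
worst_hotel = max(rates, key=rates.get)
```

Note that the hotel with the most complaints in absolute terms is not necessarily the one with the highest percentage, which is why the KPI is expressed as a percentage of total Customers.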
http://benchmark.kpilibrary.com/
http://www.smartkpis.com/i_kpi/industries/professional-services/
http://www.microstrategy.com/cloud/personal/
4) Stacey Barr
Stacey is a prolific writer on KPIs and Performance Measurement :
http://www.staceybarr.com/products/performancemeasureblueprint.html
Here we consider when we should use multiple Data Marts with Conformed Dimensions to meet our BI
requirements.
After we checked out of the Shangri-la, we took a limo to Penang Airport for the first leg of our trip
back home to London.
With plenty of time to spare, we got a coffee at the Coffee Bean, which was started by Herbert Hyman in
California in 1963. It is now an international operation, and very popular in Malaysia.
Looking for a suitable photo, I came across an excellent one on this page : http://www.airliners.net/aviation-forums/trip_reports/reae.main/126525/
[Diagram: Supplier, Products or Services, Customers, Address, Staff, Documents]
This Table is your starting-point for defining how the Entities correspond to the Entities in your
Stopping for a Coffee Event.
Replace the question marks by your answers.
CDM Generic Entities
Customers                ???
Documents                ???
Events
Locations                ???
Organisations            ???
Organisations            ???
Organisations
Products or Services     Room Reservation
Third Parties            N/A
COMMENTS
This shows how the Generic Message Template applies to the Specific Hotel Check-In Event.
Generic                 Specific
Supplier                Shangrila Hotel
Date & Time             Check-in Date & Time
Customer Details        Barry's Name and Credit Card Details
Products or Services    Room number
Unit Price              Price per Night
From Date               From Date
To Date                 To Date
Total Price             ---
This is the same Third-Normal-Form Data Warehouse as the one shown earlier in this Section.
This is the Dimensional Model for the Hotel Check-in Event, which you can use as your starting-point.
We have 9 different types of Events and our first Data Model looks like this :-
When we think about the Entities, it seems intuitive that visiting a Crocodile Farm must be similar to
visiting an Elephant Sanctuary.
Then we can review the discussion in the Section above to convince ourselves to treat them the
same way.
At this stage, we are thinking about the underlying relationships between the entities that we have
determined are in the scope of our study.
We have started to group entities together in a way that helps us to reduce the overall size of the
Data Model by identifying generic characteristics in the specific entities.
However, every Data Model should tell a story and now we have to consider how our story can be
told.
For example, we would say Products are found in Stores and therefore we should show a one-to-many relationship between Stores and Products.
We can see that it is the other way round in Version 3 because of the way it has evolved, so now we
have to change it to make it correct.
At this stage, we start thinking about the story we want to tell and how we should structure the Data
Model to reflect the story.
For example, we would say Products are found in Stores and therefore we should show a one-to-many relationship between Stores and Products.
At this point, we like the left-hand side of the Data Model and we can say :
1. There is a Go Shopping Event which means buying Products, rather than going to Stores.
2. This allows us to show Products at the same level as Stores which means that we can
maintain consistency with the terminology we have used elsewhere in this document.
3. We have decided that Services include Hiring a Car and Staying in a Hotel.
So now we turn to the right-hand side of the Data Model.
At this stage, we start thinking about the story we want to tell and how we should structure the Data
Model to reflect the story.
On the right-hand side, we show Species as a Tourist Attraction, whereas in fact, people say 'Let's go to
the Elephant Sanctuary', so the entity that should be related to the Tourist Attraction is the Elephant
Sanctuary, rather than the Elephant.
Of course, people can also say 'Let's go and see the Elephants' but in the physical world, the reality
is that Elephants are housed in an Elephant Sanctuary and it is the Sanctuary that they actually go and
see.
So we conclude that it is OK to show the Tourist Attractions and their relationships.
We still have one or two questions to resolve but we are quite content because the Model looks
good and the overall logic is good.
6.8.8 Conclusion
7. Retail Sales
7.1 Development Framework
In this Section, we discuss the Framework which contains two elements :
1. The Development Approach
2. Components used in the Approach
The Approach involves these Steps :
1. Identify the business Events
2. Define a Message for each Event
3. Map each Event to the Entities and Attributes in the CDM
4. Determine whether the CDM should be extended.
5. Create an Industry-specific CDM if appropriate.
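The mapping Steps above can be sketched in a few lines. This is a hedged illustration only: the CDM contents, the Message fields and the function name are hypothetical, and a real CDM would hold far more Entities and Attributes.

```python
# Hypothetical CDM: entity name -> attribute names.
cdm = {
    "Customers": {"customer_name", "customer_address"},
    "Products": {"product_number", "product_name", "retail_price"},
    "Stores": {"store_number", "store_name", "store_address"},
}

# A Message for one Event (a Customer Purchase), with each field
# mapped to an (entity, attribute) pair in the CDM.
purchase_message = {
    "Store Number": ("Stores", "store_number"),
    "Product Number": ("Products", "product_number"),
    "Retail Price": ("Products", "retail_price"),
    "Card Number": ("Cards", "card_number"),
}


def unmapped_fields(message, model):
    """List the Message fields that the CDM cannot hold yet."""
    return [
        field
        for field, (entity, attribute) in message.items()
        if attribute not in model.get(entity, set())
    ]


# Any field returned here suggests that the CDM should be extended.
extensions_needed = unmapped_fields(purchase_message, cdm)
```

In this invented example the Card Number has no home in the CDM, which is the kind of finding that would lead us to extend the CDM or create an Industry-specific one.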
The Components include :
- Generic and Industry-specific Canonical Data Models (CDMs)
- Generic and Industry-specific Data Warehouse designs
- Core Entities and Core Subject Area Models : Customers, Products and Suppliers.
  o Generic (Horizontal)
  o Industry-specific
http://www.databaseanswers.org/data_models/enterprise_data_models.htm
http://www.databaseanswers.org/data_models/industries_index.htm
[Diagram: Generic Subject Area Models (eg Customers), Industry Specific Models (eg Insurance), Canonical Entities (eg Contract), Messages, Data Sources (eg Sales Receipts)]
http://www.databaseanswers.org/data_models/canonical_data_models/index.htm
In passing, we should add that we are discussing only purchases made by Customers in a Store.
Online Purchases can be added later.
We can see similarities between the ARTS Model and our Canonical Data Model shown above.
This is because the Canonical Model reflects the most common universal structure which is Events,
Organisations, People and Products or Services.
Credit/Debit Card
o Type (eg Amex, Visa), last 4 digits, Expiry Date, Transaction Authorisation Number
We can consider this to be our Operational Data Store, and from it we can generate details for the
following Things of Interest :
Customer Cards
Products
Purchases
Stores
[Diagram: Sales Receipt, Stores, Tills, Products, Customers, Cards (Credit/Debit)]
Of course, when a Customer uses a Card then we know some details of name, address and so on.
http://www.databaseanswers.org/mdm_master_data_management.htm
If a Customer pays by cash then we know nothing about them, and we would consider this to be an
Anonymous Customer.
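One common way to handle this, shown here only as a sketch, is to reserve a single Customer record for all cash purchases. The reserved ID, the function name and the sample Card are hypothetical, and treating an unrecognised Card as anonymous is our own simplifying assumption.

```python
ANONYMOUS_CUSTOMER_ID = 0  # hypothetical reserved key for cash purchases


def customer_id_for_purchase(card_number, known_cards):
    """Return the Customer ID for a purchase, or the Anonymous Customer for cash."""
    if card_number is None:
        # Cash purchase: we know nothing about the Customer.
        return ANONYMOUS_CUSTOMER_ID
    # Assumption: a Card we have not seen before is also treated as anonymous.
    return known_cards.get(card_number, ANONYMOUS_CUSTOMER_ID)


# Hypothetical lookup from Card Number to Customer ID.
known_cards = {"XXXX-XXXX-XXXX-1234": 42}

cash_customer = customer_id_for_purchase(None, known_cards)
card_customer = customer_id_for_purchase("XXXX-XXXX-XXXX-1234", known_cards)
```

This keeps every Customer Purchase linked to a Customer, so that queries do not have to treat cash purchases as a special case.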
The Message for this Event includes the Store Number, Name and Address.
Store Number
Store Name
Store Address
The Message for this Event includes the Product Type, Number, Short Name and Retail Price.
Product Type
Product Number
Product Name
Retail Price
Here we add the Products Entity (shown in yellow) but, of course, there is no relationship between
Products and Stores at this point.
The Message for this Event includes the Customer Name, Address, Card Number, Merchant Name,
Amount, Date and Time of Purchase.
Customer Name
Customer Address
Card Number
Credit or Debit
Card Type
Expiry Date
In this diagram we have not shown the Reference Data Entities because they add too much to the
detail which makes the diagrams more difficult to understand at a glance without adding
significantly to the information contained.
At this point, the Customer Purchase generates the Sales Receipt that records the details that we
have examined up to this point.
Therefore we add the Customer Purchase entity that we will use to add details of Cards, Products,
Purchase and Stores.
We have separated the details of the Store, Products and Customer and put them in separate
Entities. In the Customer Purchase entity, they are therefore replaced by ID fields that link to those Entities.
This results in a simple elegant Data Model that makes it very clear what the Entities are and how
they are related.
When the time comes, it will be very straightforward for us to generate a Physical Data Model, with
the SQL to create the Tables in a Database.
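As a rough sketch of what that SQL might look like, assuming SQLite and table and column names that we have adapted from the Messages in this Chapter, the Customer Purchase Table below holds only ID fields that link to the other Tables. A real Physical Data Model would be generated by a modelling tool and would contain more detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign keys off by default
conn.executescript("""
CREATE TABLE stores (
    store_id INTEGER PRIMARY KEY,
    store_number TEXT,
    store_name TEXT,
    store_address TEXT
);
CREATE TABLE products (
    product_id INTEGER PRIMARY KEY,
    product_type TEXT,
    product_number TEXT,
    product_name TEXT,
    retail_price NUMERIC
);
CREATE TABLE cards (
    card_id INTEGER PRIMARY KEY,
    card_number TEXT,
    card_type TEXT,
    expiry_date TEXT
);
-- The Event that ties the other Tables together through ID fields.
CREATE TABLE customer_purchases (
    purchase_id INTEGER PRIMARY KEY,
    store_id INTEGER NOT NULL REFERENCES stores (store_id),
    product_id INTEGER NOT NULL REFERENCES products (product_id),
    card_id INTEGER REFERENCES cards (card_id),  -- NULL for a cash purchase
    purchase_datetime TEXT,
    amount NUMERIC
);
""")

# Confirm which Tables were created.
table_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
```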
This diagram shows how the Customer Purchase is the Event that ties all the data together and, of
course, reflects the business event that is the foundation of the business.
This Model shows only the Entity names.
This makes it suitable for discussion with business users and Subject Matter Experts.
7.9.3 The Complete Data Model (showing Ref Data and Attributes)
This version of the Model shows the Entity names, the Attributes and the Reference Data.
This makes it suitable for discussion with developers and anybody interested in the details,
especially of the relationships between the Entities.
This version of the Model shows the Entity names, the Attributes and the Reference Data, and is
taken from this page :
http://www.databaseanswers.org/data_models/arts_retail_and_can_data_model/arts_and_canonical_model.htm
This version of the Model shows only the Entity names to facilitate comparison with the ARTS
Model.
8. Retail Banks
Here we describe a simple Event-Driven Approach to Data Warehouse Design.
It also appears on our Database Answers Web Site with this Data Model for Retail Banks :
http://www.databaseanswers.org/data_models/retail_banks/index.htm
The Message for this Event includes the data required to set up a new Account.
The Message for this Event includes the Account Number, the Date and the Amount of the deposit.
The Message for this Event shows what is printed on a typical Sales Receipt, including Card Number,
Merchant Name, Amount, Date and Time of Purchase.
The Message for this Event includes Customer name, Account Number, Date and Amount of
Statement.
The Message for this Event includes Account Number and Date to be Closed.
9.2 Banking
You will find this page interesting because it discusses Banking Data Warehouses
http://www.databaseanswers.org/data_models/banking_data_warehouses/index.htm
The bottom of this page on Investment Banking Models shows three versions of a Data Warehouse
as it evolved through facilitated Workshops with Client Management :
http://www.databaseanswers.org/data_models/investment_banking/index.htm
It is rather surprising to find that the majority of Subject Area Models are Generic.
However, based on our experience over ten years of working with Investment Banks, we would say
that the devil is in the detail. In other words, once we start to develop more detailed Models for a
specific Bank, we would find a great deal of bank-specific detail.
This discussion is recorded on this page :
http://www.databaseanswers.org/data_models/enterprise_data_model_for_investment_banks/index.htm
This Model shows Federated Data Marts for Corporate Service and Treasury with just one shared
Conformed Dimension, which is Time.
Settlements are somewhat like Payments but money can be exchanged both ways and the details
can be more complex than conventional Payments. Therefore, we categorise Settlements as specific
to Investment Banking.
Settlements - http://www.databaseanswers.org/data_models/investment_banking/settlements.htm
9.4 Insurance
9.4.1 Insurance Top-Level 3NF Model
This Section contains Links to our Web Site and shows the Insurance Enterprise Data Model and
Common Data Model.
Here's a 3NF Insurance Data Model :
http://www.databaseanswers.org/data_models/enterprise_data_model_for_insurance/index.htm
http://www.databaseanswers.org/data_models/insurance_data_warehouses/common_data_model.htm
http://www.databaseanswers.org/data_models/customers_and_purchases_data_warehouse/index.htm
http://www.databaseanswers.org/data_models/canonical_data_models/index.htm
http://www.databaseanswers.org/data_models/banking_data_warehouses/generic_dimensional_model.htm
This shows the four most common Dimensions of Customer, Location, Product and Calendar (Time period).
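A Dimensional Model with these four Dimensions can be sketched as a simple Star. This is an illustration under our own assumptions: SQLite is used for convenience, and the Fact Table, the column names and the sample rows are hypothetical rather than taken from the Model on the Web Site.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY, location_name TEXT);
CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_calendar (calendar_key INTEGER PRIMARY KEY, calendar_date TEXT);

-- One Fact Table with a foreign key to each of the four Dimensions.
CREATE TABLE fact_transactions (
    customer_key INTEGER REFERENCES dim_customer (customer_key),
    location_key INTEGER REFERENCES dim_location (location_key),
    product_key  INTEGER REFERENCES dim_product (product_key),
    calendar_key INTEGER REFERENCES dim_calendar (calendar_key),
    amount NUMERIC
);

-- Hypothetical sample rows (only the Product Dimension is populated here).
INSERT INTO dim_product VALUES (1, 'Current Account'), (2, 'Savings Account');
INSERT INTO fact_transactions VALUES (1, 1, 1, 1, 100), (2, 1, 1, 1, 50), (1, 1, 2, 1, 25);
""")

# Total amount by Product: a typical query against a Star.
totals = conn.execute("""
    SELECT p.product_name, SUM(f.amount)
    FROM fact_transactions f
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()
```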
http://www.databaseanswers.org/data_models/banking_data_warehouses/federated_data_marts.htm
It shows two Data Marts related by Conformed Dimensions, including Account, Customer, Location and
Status.
10.4.3 Retail
This page shows Retail Conformed Data Marts :
http://www.databaseanswers.org/data_models/retail_customers/retail_customers_data_mart.htm
11.1 Why ?
We have sometimes found ourselves in situations where there are multiple Data Warehouses
provided from different sources that need to be integrated into one Generic design.
11.2 How ?
We have developed a Top-to-Bottom Approach that is based on these Steps :
1. Establish a Business Mission Statement and agree it with the business and Subject Matter
Experts (SMEs).
2. Define some Key Performance Indicators (KPIs)
3. Use the KPIs to trace the required data down to the Generic Data Warehouse.
4. Establish Generic designs for the Conceptual Models, Semantic Layer, Data Marts and Data
Warehouse
5. Create a Data Dictionary
6. Determine mapping from the Source Data Models to the Target Models.
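The last two Steps can be illustrated with a very small sketch. The source model names, the column names and the function name are all hypothetical; the point is only that each source column is looked up in the Data Dictionary to find its name in the Generic Data Warehouse.

```python
# Hypothetical Data Dictionary entries:
# (source model, source column) -> column in the Generic Data Warehouse.
mapping = {
    ("warehouse_a", "cust_nm"): "customer_name",
    ("warehouse_a", "rev_amt"): "revenue_amount",
    ("warehouse_b", "CustomerName"): "customer_name",
    ("warehouse_b", "Revenue"): "revenue_amount",
}


def to_generic(source_model, row):
    """Rename the columns of one source row to the Generic Data Warehouse names."""
    return {mapping[(source_model, column)]: value for column, value in row.items()}


# Rows from two different source Data Warehouses end up in the same shape.
row_a = to_generic("warehouse_a", {"cust_nm": "Acme", "rev_amt": 120})
row_b = to_generic("warehouse_b", {"CustomerName": "Acme", "Revenue": 80})
```

A column that is missing from the mapping raises a KeyError here, which is a deliberate choice in this sketch so that gaps in the Data Dictionary are noticed rather than silently ignored.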
[Diagram: Data Dictionary, Mapping, Generic Data Warehouse Model]
http://www.databaseanswers.org/data_models/telecomms/index.htm
This example shows a top-down view of the Layers in a BI Data Architecture for Telecomms :
Conceptual Models
Semantic Layer
Data Marts
Data Warehouse
On the Database Answers Web Site, we also show the Teradata Communications Logical Data Model :
http://www.databaseanswers.org/data_models/teradata_communications_data_model/index.htm
The specifications for these three were taken from the GameChanger page on the Web Site for the
TeliaSonera Telecoms company in Sweden :
http://gamechanger.teliasonera.com/en/
We have taken these three as examples, but without any implication that they are realistic.
That would be determined in discussion with Subject Matter Experts.
This shows an example for Telecomms. The Approach is to map the Source Data Warehouse
Model to a single Generic Model.
An example for Telecomms shows a Traffic Data Mart and a Revenue Data Mart, with Conformed
Dimensions, which ensure common values for shared Dimensions.
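The idea of a Conformed Dimension can be sketched as follows. We assume SQLite and invent the table names, column names and sample values; the only point being made is that both Data Marts use the same Customer Dimension, so their results can be reported side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One Conformed Dimension, shared by both Data Marts.
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT);

-- Traffic Data Mart and Revenue Data Mart, each with its own Fact Table.
CREATE TABLE fact_traffic (
    customer_key INTEGER REFERENCES dim_customer (customer_key),
    call_minutes INTEGER
);
CREATE TABLE fact_revenue (
    customer_key INTEGER REFERENCES dim_customer (customer_key),
    revenue INTEGER
);

-- Hypothetical sample rows.
INSERT INTO dim_customer VALUES (1, 'Customer A'), (2, 'Customer B');
INSERT INTO fact_traffic VALUES (1, 100), (1, 50), (2, 30);
INSERT INTO fact_revenue VALUES (1, 20), (2, 5), (2, 10);
""")

# Summarise each Data Mart separately, then combine through the shared Dimension.
report = conn.execute("""
    SELECT d.customer_name, t.minutes, r.revenue
    FROM dim_customer d
    JOIN (SELECT customer_key, SUM(call_minutes) AS minutes
          FROM fact_traffic GROUP BY customer_key) t
      ON t.customer_key = d.customer_key
    JOIN (SELECT customer_key, SUM(revenue) AS revenue
          FROM fact_revenue GROUP BY customer_key) r
      ON r.customer_key = d.customer_key
    ORDER BY d.customer_key
""").fetchall()
```

Each Fact Table is aggregated before the join so that rows from one Data Mart are not multiplied by rows from the other.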
For our Data Warehouse, we need a Third-Normal-Form Data Model, so we have taken this one as a
suitable candidate :
http://www.databaseanswers.org/data_models/customers_and_phone_bills/index.htm
12. Conclusion
In this book we have presented an approach to Data Warehousing that we have used with great
success in past assignments with a wide variety of Clients.
Our intention is to update it regularly as a Kindle e-Book, to keep it timely, and to establish it as a
reference book of Best Practice.
We would be very happy to hear from you if you have any comments or suggestions for
improvement.
Please email us at [email protected]