DWDM 2020 Lecture02 Datawarehouses
Motivation
Outline of today's lecture
1. Introduction
2. Multidimensional data model
a. Measures/Facts
b. Dimensions
c. Dimensional (dependent) attributes
d. Visualization of multidimensional data
e. Queries on multidimensional data
3. Reference architecture of a Data Warehouse
a. Subtasks in loading data
b. Overview of the DW reference architecture
c. DW requirements
d. Some DW examples
1. Introduction: Data Warehouse
[Figure: several customers, databases (DB), and database administrators]
A simple example (not yet a Data Warehouse)
Relation sales_data:
Note: turnover = Umsatz (German)
The same data in another representation (a DW with 2 dimensions)
[Figure: two-dimensional table with a Time axis (20.12.2017, 21.12.2017)]
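The step from the flat relation to the two-dimensional representation can be sketched in a few lines of Python. This is only an illustration: the rows, the city names other than Rostock, and the helper name to_two_dimensions are assumptions, not taken from the slides.

```python
from collections import defaultdict

# Hypothetical rows of the relation sales_data: (date, city, turnover).
sales_data = [
    ("20.12.2017", "Rostock", 1200),
    ("20.12.2017", "Berlin", 900),
    ("21.12.2017", "Rostock", 700),
    ("21.12.2017", "Berlin", 1100),
]


def to_two_dimensions(rows):
    """Arrange flat rows as a table indexed by the dimensions time and city."""
    table = defaultdict(dict)
    for date, city, turnover in rows:
        # Rows that hit the same cell are summed up.
        table[date][city] = table[date].get(city, 0) + turnover
    return {date: dict(cities) for date, cities in table.items()}


table = to_two_dimensions(sales_data)
print(table["20.12.2017"]["Rostock"])  # 1200
```

Each cell of the table is addressed by one member per dimension, which is the idea the cube generalizes to more dimensions.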
Data model of a Data Warehouse with three dimensions
[Figure: cube with the axes Time (Q1, Q2, Q3, Q4, sum), product (TV, PC, MP3, sum), and Country (USA, Canada, Mexico, sum); annotation: "Total sales of TV sets in the USA"]
2a) Facts/Measures
Detailed or aggregated (numerical) values
Describing the business or application context
Examples:
Revenue, profit, loss, ...
Types:
Additive facts:
additive calculation possible over all consolidation levels, e.g. purchase value
Semi-additive facts:
additive calculation only over selected hierarchies, e.g. inventory
(adding over different places is possible, adding over time leads to a
wrong result); reason: the objects to be added must be disjoint
Non-additive facts:
no additive calculation possible, e.g. averages or percentages
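The three fact types can be illustrated with a small Python sketch. All numbers, day names, and place names below are made up for the illustration.

```python
# Hypothetical purchases: (day, place, purchase value) -> additive fact.
purchases = [
    ("Mon", "Rostock", 100),
    ("Mon", "Rostock", 200),
    ("Tue", "Rostock", 300),
    ("Tue", "Berlin", 50),
]

# Hypothetical inventory snapshots: (day, place, items in stock)
# -> semi-additive fact.
inventory = [
    ("Mon", "Rostock", 10),
    ("Mon", "Berlin", 5),
    ("Tue", "Rostock", 8),
    ("Tue", "Berlin", 7),
]

# Additive: summing over every dimension gives a meaningful total.
total_purchase_value = sum(value for _, _, value in purchases)

# Semi-additive: summing stock over places on one day is meaningful ...
stock_on_monday = sum(qty for day, _, qty in inventory if day == "Mon")
# ... but summing stock over time counts the same items repeatedly.
misleading_sum_over_time = sum(qty for _, place, qty in inventory if place == "Rostock")


def average(values):
    return sum(values) / len(values)


# Non-additive: the average of per-place averages differs from the
# overall average, so averages cannot simply be combined.
per_place = {}
for _, place, value in purchases:
    per_place.setdefault(place, []).append(value)
average_of_averages = average([average(v) for v in per_place.values()])
overall_average = average([value for _, _, value in purchases])

print(total_purchase_value)                  # 650
print(stock_on_monday)                       # 15
print(average_of_averages, overall_average)  # 125.0 162.5
```

For non-additive facts, a common workaround is to store the additive components (here: sum and count) and derive the average at query time.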
2b) Dimensions
Describe the meaning of facts
Finite set of dimensions (n >= 2)
Classification levels
Product hierarchy
Hierarchies in dimensions
Dimension members:
Nodes of a classification hierarchy
Classification level describes degree of compression
Forms:
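Simple hierarchies, e.g. (top-down):
Top → Country → Federal state → City → Shop address
Top → Product category → Product group → Product
Parallel hierarchies, e.g.:
[Figure: time hierarchy with the levels Top, Year, Quarter, Week, Month, Day, in which Quarter/Month and Week lie on parallel paths; a further example with the levels Top, Group, Branch, Region, Area, Company, Department]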
Each higher hierarchy level contains the aggregated
values of exactly one lower hierarchy level
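A simple classification hierarchy can be sketched as a parent mapping in which every node has exactly one parent. The members and turnover values below are illustrative assumptions; only the level structure (city, federal state, country, Top) follows the slide.

```python
# Hypothetical simple hierarchy: every member has exactly one parent.
PARENT = {
    "Rostock": "Mecklenburg-Vorpommern",
    "Schwerin": "Mecklenburg-Vorpommern",
    "Munich": "Bavaria",
    "Mecklenburg-Vorpommern": "Germany",
    "Bavaria": "Germany",
    "Germany": "Top",
}


def path_to_top(member):
    """Follow the parent links from a member up to the Top node."""
    path = [member]
    while path[-1] != "Top":
        path.append(PARENT[path[-1]])
    return path


def aggregate_to_parents(values):
    """Aggregate the values of one level into the next higher level."""
    result = {}
    for member, value in values.items():
        parent = PARENT[member]
        result[parent] = result.get(parent, 0) + value
    return result


city_turnover = {"Rostock": 1200, "Schwerin": 300, "Munich": 500}
state_turnover = aggregate_to_parents(city_turnover)
country_turnover = aggregate_to_parents(state_turnover)

print(path_to_top("Rostock"))
print(state_turnover)    # {'Mecklenburg-Vorpommern': 1500, 'Bavaria': 500}
print(country_turnover)  # {'Germany': 2000}
```

In a parallel hierarchy a member would have more than one parent path, so a single PARENT dictionary would no longer be sufficient.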
2c) Category attributes
Attributes in the hierarchies
Primary attributes
Classification attributes
higher-level attributes of the dimensions
Example: customer group, city, state, country
Dimensional attributes
Other attributes like address, telephone number, branch name ...
Depending on a primary attribute or classification attribute
stored together with the classification attributes
adding additional information to the dimensions
[Figure: example hierarchy Top, Federal state, City, Shop address]
Cube
(actually a hypercuboid)
The data structure:
Cube (actually a cuboid):
Basis for the multidimensional data model
Edges of the cube: dimensions
Cells: one or more (!) key figures as a function of the dimensions
Number of dimensions: dimensionality of the cube
Visualization:
2 dimensions: table
3 dimensions: cube
More than 3 dimensions: multidimensional structure
Definition of a cube
Schema of a cube
Set of dimensions (dimension schemas) DS
Set of facts (measures) M
Formal:
C = (DS, M) = ({D1, ..., Dn}, {M1, ..., Mm})
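The formal schema C = (DS, M) maps quite directly onto a small data structure. The following is a minimal sketch, assuming the hypothetical class names CubeSchema and Cube and a dictionary as cell storage; it is not a prescribed implementation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CubeSchema:
    """Schema C = (DS, M): dimension names DS and measure names M."""
    dimensions: tuple  # DS = {D1, ..., Dn}
    measures: tuple    # M = {M1, ..., Mm}

    @property
    def dimensionality(self):
        return len(self.dimensions)


@dataclass
class Cube:
    schema: CubeSchema
    # Maps one coordinate per dimension to the facts stored in that cell.
    cells: dict = field(default_factory=dict)

    def set_cell(self, coords, **facts):
        if len(coords) != self.schema.dimensionality:
            raise ValueError("need exactly one coordinate per dimension")
        unknown = set(facts) - set(self.schema.measures)
        if unknown:
            raise ValueError(f"unknown measures: {sorted(unknown)}")
        self.cells[tuple(coords)] = dict(facts)


schema = CubeSchema(dimensions=("time", "product", "country"),
                    measures=("turnover", "units"))
cube = Cube(schema)
# A cell may hold one or more facts.
cube.set_cell(("Q1", "TV", "USA"), turnover=1200, units=3)
print(schema.dimensionality)  # 3
```

Storing only the filled cells in a dictionary is one possible design choice; dense array storage would be an alternative.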
2d) Graphical representation, example of a cube
The individual cells contain one or more facts each
2e) OLAP Operations
Operations on the multidimensional model:
1. Slice:
Cutting "slices" out of a cube
Reduces the dimensionality
E.g.: all values of one year
2. Dice:
Cutting out a "partial cube"
Preserves the dimensionality,
changes the hierarchy objects
E.g.: values of certain
products or regions
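Slice and dice can be sketched on a cube stored as a dictionary from coordinates to a fact. The cell values and the function names slice_cube and dice_cube are assumptions made for this illustration.

```python
# Hypothetical cube: (time, product, country) -> sales.
cells = {
    ("Q1", "TV", "USA"): 10,
    ("Q1", "PC", "USA"): 20,
    ("Q2", "TV", "USA"): 30,
    ("Q1", "TV", "Canada"): 40,
}


def slice_cube(cells, axis, member):
    """Slice: fix one dimension to a single member and drop that dimension."""
    return {
        key[:axis] + key[axis + 1:]: value
        for key, value in cells.items()
        if key[axis] == member
    }


def dice_cube(cells, allowed):
    """Dice: keep all dimensions, but restrict some of them to member sets.

    allowed maps an axis index to the set of members to keep.
    """
    return {
        key: value
        for key, value in cells.items()
        if all(key[axis] in members for axis, members in allowed.items())
    }


q1_slice = slice_cube(cells, axis=0, member="Q1")
tv_usa = dice_cube(cells, {1: {"TV"}, 2: {"USA"}})

print(q1_slice)  # keys now have 2 coordinates: (product, country)
print(tv_usa)    # keys still have 3 coordinates
```

The slice result has one dimension less than the input, while the dice result keeps all three dimensions and only contains fewer cells.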
OLAP Operations
3. Roll-up
Generating new information by aggregating the data along the
consolidation path (dimension hierarchies)
Example: Day → Month → Quarter → Year
(the number of dimensions in the result is unchanged, the number
of values decreases)
4. Drill-down
Complementary to roll-up
Navigation from aggregated data to detailed data along the
dimension hierarchy
5. Drill-across
Change from one cube to another
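Roll-up along the time hierarchy Day → Month → Quarter → Year can be sketched as re-grouping detailed facts by a coarser level. The facts and the names LEVELS and roll_up are illustrative assumptions.

```python
from collections import defaultdict
from datetime import date

# Each classification level maps a day to the member it belongs to.
LEVELS = {
    "day": lambda d: d.isoformat(),
    "month": lambda d: f"{d.year}-{d.month:02d}",
    "quarter": lambda d: f"{d.year}-Q{(d.month - 1) // 3 + 1}",
    "year": lambda d: str(d.year),
}


def roll_up(daily_facts, level):
    """Aggregate day-level facts to the given level of the time hierarchy."""
    result = defaultdict(int)
    for day, value in daily_facts.items():
        result[LEVELS[level](day)] += value
    return dict(result)


# Hypothetical day-level turnover.
daily_turnover = {
    date(2017, 1, 15): 300,
    date(2017, 12, 20): 1200,
    date(2017, 12, 21): 700,
}

by_quarter = roll_up(daily_turnover, "quarter")
by_year = roll_up(daily_turnover, "year")
# Drill-down goes the other way: it needs the detailed data again,
# here by re-aggregating the day-level facts at a finer level.
by_month = roll_up(daily_turnover, "month")

print(by_quarter)  # {'2017-Q1': 300, '2017-Q4': 1900}
print(by_year)     # {'2017': 2200}
```

The number of dimensions stays the same in every result; only the number of values shrinks as the level gets coarser.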
OLAP Operations
[Figure: drill-down from quarters (1st Quarter, 2nd Quarter) to months (January to June) and roll-up in the opposite direction, with sum rows; example country: Mexico]
OLAP Operations
Roll-up and drill-down in a tabular representation
OLAP Operations
6. Rotating (or pivoting)
Rotating the cube by swapping the dimensions
For analyzing the data from different perspectives
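Pivoting only changes the order of the dimensions, not the cell contents. A minimal sketch, assuming a two-dimensional cube stored as a dictionary and the hypothetical helpers pivot and as_table:

```python
# Hypothetical 2-dimensional cube: (time, product) -> sales.
cells = {
    ("Q1", "TV"): 10,
    ("Q1", "PC"): 20,
    ("Q2", "TV"): 30,
}


def pivot(cells, order):
    """Reorder the dimensions of every cell key according to order."""
    return {tuple(key[i] for i in order): value for key, value in cells.items()}


def as_table(cells):
    """View a 2-dimensional cube as rows (first dimension) and columns."""
    table = {}
    for (row, column), value in cells.items():
        table.setdefault(row, {})[column] = value
    return table


by_time = as_table(cells)                   # rows: quarters, columns: products
by_product = as_table(pivot(cells, (1, 0)))  # rows: products, columns: quarters

print(by_time)     # {'Q1': {'TV': 10, 'PC': 20}, 'Q2': {'TV': 30}}
print(by_product)  # {'TV': {'Q1': 10, 'Q2': 30}, 'PC': {'Q1': 20}}
```

Both tables contain exactly the same values; the rotation only decides which dimension is read as rows and which as columns.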
OLAP Operations
7. Split and Merge
Also referred to as Nest and Unnest
3a) How does the data get into the
Data Warehouse?
Multi-step process for inserting the data,
called the ETL process (Extraction, Transformation, Load)
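The three ETL steps can be sketched as three small functions. The source rows, field names, and cleaning rules below are assumptions for illustration only; a real ETL process would read from operational systems and write to a database.

```python
from datetime import datetime

# Hypothetical rows as they might arrive from an operational source system.
source_rows = [
    {"date": "20.12.2017", "city": " rostock ", "turnover": "1200,50"},
    {"date": "21.12.2017", "city": "BERLIN", "turnover": "900"},
]


def extract(source):
    """Extraction: copy the raw rows out of the source."""
    return [dict(row) for row in source]


def transform(rows):
    """Transformation: unify date format, spelling, and number format."""
    cleaned = []
    for row in rows:
        cleaned.append({
            "day": datetime.strptime(row["date"], "%d.%m.%Y").date(),
            "city": row["city"].strip().title(),
            # Decimal comma to decimal point before converting.
            "turnover": float(row["turnover"].replace(",", ".")),
        })
    return cleaned


def load(rows, warehouse):
    """Load: append the cleaned rows to the target store."""
    warehouse.extend(rows)
    return len(rows)


warehouse = []  # stand-in for the Data Warehouse tables
loaded = load(transform(extract(source_rows)), warehouse)
print(loaded, warehouse[0]["city"], warehouse[0]["turnover"])  # 2 Rostock 1200.5
```

Keeping the three steps as separate functions mirrors the multi-step structure of the process and makes each step testable on its own.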
3b) Reference architecture
[Figure: reference architecture of a Data Warehouse system with the components Monitor, Extraction, Work Space, Transformation, Loading, Basis Database, Loading, Data Warehouse, Analysis, Data Warehouse Manager, Metadata Manager, and Repository; the legend distinguishes data flow and control flow]
Description of this slide: in the lecture
No summary for today, only an outlook
Next lectures:
Different storage methods
1. based on relational databases or
2. multidimensional storage
Data Warehouse queries: OLAP (Online Analytical
Processing)