DWDM 2020 Lecture02 Datawarehouses

This document provides an overview of multidimensional data modeling and data warehousing reference architectures. It discusses: 1. The multidimensional data model which structures data into facts (measures) and dimensions to support analysis. Dimensions describe measures from different perspectives in hierarchies. 2. Common operations on multidimensional data like slicing, dicing, and rolling up which allow analyzing specific subsets and aggregations of the data. 3. Reference architectures for data warehouses which load transformed and aggregated data from transaction systems to support analysis and decision making. Data is stored redundantly in a multidimensional structure optimized for queries.

Uploaded by

xainshah
Copyright: © All Rights Reserved

Data Warehouses, Business Intelligence, Data Mining

Lecture 2: Multidimensional Data Model, Reference Architecture

Meike Klettke
Fakultät für Informatik und Elektrotechnik
[email protected]

Motivation

• Content of the last lecture:
  • Aims and tasks of a Data Warehouse

• Today: HOW this is realized:
  1. Multidimensional data model
     • Structure
     • Queries
  2. Reference architecture of a Data Warehouse

Outline of today's lecture
1. Introduction
2. Multidimensional data model
a. Measures/ Facts
b. Dimensions
c. Dimensional (dependent) attributes
d. Visualization of multidimensional data
e. Queries to multidimensional data
3. Reference architecture of a Data Warehouse
a. Subtasks in loading data
b. Overview of the DW reference architecture
c. DW requirements
d. Some DW examples

1. Introduction: Data Warehouse

[Figure: three operational databases (DB), each with its own customers and database administrator]

• Redundant data storage
• Specific modeling (multidimensional data)
• Transformed and selected data
• Asynchronous update

2. Multidimensional Data Model

• Thus we need to find an appropriate data structure for data warehouses
  • Consider which typical queries occur
  • and on which data

Data model for Data Warehouses

... simple example ... (not yet a Data Warehouse)
• Relation sales_data:

Store     Date        Turnover
Rostock   2017-12-20  1200
Schwerin  2017-12-20  1400
Bützow    2017-12-20   850
Rostock   2017-12-21  1300
Schwerin  2017-12-21  1200
Bützow    2017-12-21   670

Aggregate turnover of a day:

    select sum(turnover)
    from sales_data
    where date = '2017-12-20'

Aggregate turnover of the Rostock store:

    select sum(turnover)
    from sales_data
    where store = 'Rostock'

(turnover = German: Umsatz)

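The two queries above can be tried out with Python's built-in sqlite3 module. This is only a minimal sketch: the in-memory database and the helper name total_turnover are assumptions made for illustration, while the table and its values are taken from the slide.

```python
import sqlite3

# In-memory database holding the sales_data relation from the slide.
conn = sqlite3.connect(":memory:")
conn.execute("create table sales_data (store text, date text, turnover integer)")
rows = [
    ("Rostock", "2017-12-20", 1200),
    ("Schwerin", "2017-12-20", 1400),
    ("Bützow", "2017-12-20", 850),
    ("Rostock", "2017-12-21", 1300),
    ("Schwerin", "2017-12-21", 1200),
    ("Bützow", "2017-12-21", 670),
]
conn.executemany("insert into sales_data values (?, ?, ?)", rows)


def total_turnover(where_column, value):
    """Hypothetical helper: sum the turnover for one store or one date."""
    # Only known column names are allowed, so building the query text is safe.
    if where_column not in ("store", "date"):
        raise ValueError("unknown column: " + where_column)
    query = f"select sum(turnover) from sales_data where {where_column} = ?"
    return conn.execute(query, (value,)).fetchone()[0]


turnover_of_day = total_turnover("date", "2017-12-20")
turnover_of_rostock = total_turnover("store", "Rostock")
print(turnover_of_day, turnover_of_rostock)
```

Both queries scan the same relation and differ only in the attribute they filter on, which is the observation that motivates the two-dimensional view on the next slide.
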
.. in another representation (which is a DW with 2 dimensions)

                  Time
Store      2017-12-20  2017-12-21   Sum
Rostock          1200        1300  2500
Schwerin         1400        1200  2600
Bützow            850         670  1520
Sum              3450        3170  6620  (total sum of all sales)

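The two-dimensional representation with its row, column and total sums can be derived from the flat relation. The following is a small sketch, assuming the cells are kept in a plain dictionary keyed by (store, date); the variable names are illustrative only.

```python
from collections import defaultdict

# Rows of the sales_data relation: (store, date, turnover).
sales = [
    ("Rostock", "2017-12-20", 1200),
    ("Schwerin", "2017-12-20", 1400),
    ("Bützow", "2017-12-20", 850),
    ("Rostock", "2017-12-21", 1300),
    ("Schwerin", "2017-12-21", 1200),
    ("Bützow", "2017-12-21", 670),
]

# One cell per combination of the two dimensions store and time.
cells = {(store, day): turnover for store, day, turnover in sales}

# Marginal sums: one per store (row) and one per day (column).
row_sums = defaultdict(int)
col_sums = defaultdict(int)
for (store, day), turnover in cells.items():
    row_sums[store] += turnover
    col_sums[day] += turnover

# The total of all sales is the sum over either margin.
grand_total = sum(row_sums.values())
print(dict(row_sums), dict(col_sums), grand_total)
```
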
Data model of a Data Warehouse with three dimensions

[Figure from Han/Kamber: Data Mining. Cube with the axes Time (Q1, Q2, Q3, Q4, sum), products (TV, PC, MP3, sum) and Country (USA, Canada, Mexico, sum). Annotated cells: "Total sales of TV sets in the USA" and "Total of all sales".]

2. Multidimensional data model

• Data model to support the analysis of data
• Data analysis during the decision-making process
  • Business key indicators (revenues, profits, losses) are the focus; these are the measures or facts
  • Description (or specification) of these measures from different perspectives = dimensions
    • Seasonal, regional, according to product groups, etc.
  • Subdivision of the evaluation dimensions is possible (for instance: day, month, quarter, year) = hierarchical subdivision, consolidation levels

2a) Facts/Measures
• Measures/Facts
  • Detailed or compressed (numerical) values (example cell value: 1200)
  • Describe the business or application context
  • Examples:
    • Revenue (German: Einnahmen), profit, loss, ...
• Types:
  • Additive facts:
    • Additive calculation is possible over all consolidation levels, e.g. purchase value
  • Semi-additive facts:
    • Additive calculation only over selected hierarchies, e.g. inventory (addition over different places is possible, addition over time leads to a wrong result); reason: the objects to be added must be disjoint
  • Non-additive facts:
    • No additive calculation possible, e.g. averages or percent values

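The difference between the three types of facts can be made concrete with a few numbers. All values below are hypothetical and only serve as an illustration; the sketch assumes inventory is recorded as units on hand per store and day.

```python
# Semi-additive fact: hypothetical stock levels (units on hand) per store and day.
inventory = {
    ("Rostock", "2017-12-20"): 40,
    ("Schwerin", "2017-12-20"): 60,
    ("Rostock", "2017-12-21"): 35,
    ("Schwerin", "2017-12-21"): 55,
}

# Adding over places is meaningful: the goods in different stores are disjoint.
stock_on_day = sum(
    units for (store, day), units in inventory.items() if day == "2017-12-20"
)

# Adding over time is not: the same goods are counted once per day.
misleading_sum = sum(
    units for (store, day), units in inventory.items() if store == "Rostock"
)

# Non-additive fact: hypothetical (turnover, number of sales) per store.
store_stats = {"Rostock": (1200, 10), "Schwerin": (1400, 20)}
average_per_store = {s: t / n for s, (t, n) in store_stats.items()}

# Combining the per-store averages does not give the overall average ...
mean_of_averages = sum(average_per_store.values()) / len(average_per_store)
# ... which has to be recomputed from the additive facts underneath.
overall_average = (
    sum(t for t, _ in store_stats.values()) / sum(n for _, n in store_stats.values())
)
print(stock_on_day, misleading_sum, mean_of_averages, overall_average)
```

The non-additive case suggests a practical rule of thumb: store the additive components (here turnover and number of sales) and derive averages or percentages at query time.
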
2b) Dimensions
[Figure: example cell with the dimension values Rostock and 20.12.2017 and the fact 1200]
• Describe the meaning of facts
• Finite set of dimensions (n >= 2)
• Provide the orthogonal structure of the data space
  • No functional dependencies between dimensions
• Examples of dimensions: product, time, place
• Every dimension has a schema
  • Day, week, year
  • County, state, country
  • Product group, product category, product family
• ... and values
  • (1, 2, 3, ..., 31), (1, ..., 52), (1900, ..., 2017)
  • (...), (Berlin, NRW, ...), (D, F, ...)

Dimension

Dimension schema          Classification nodes
(classification levels)
Top
Year                      2013  2014  2015  2017
Quarter                   I  II  III  IV  I  II  III  IV
Month                     Jan  Feb  Mar  Oct  Nov  Dec
Day                       1 ... 31  1 ... 28

Product hierarchy

[Figure: product hierarchy. From: Geppert, ETZ Zürich, Lecture "Data Warehouse"]

• Elements of a level can be ordered (have a given order)
  • Ordered: Time
  • Unordered: Products

Hierarchies in dimensions
• Dimension members:
  • Nodes of a classification hierarchy
  • Classification level describes the degree of compression
• Forms: simple hierarchies and parallel hierarchies

[Figure: example hierarchies below Top. Level names shown: Country, Federal state, City, Shop address; Product category, Product group, Product; Group, Branch, Region, Area, Company, Department; Year, Quarter, Week, Month, Day.]

Higher hierarchy levels contain the aggregated values of exactly one lower hierarchy level

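A time dimension is a common example of parallel hierarchies: a day can be consolidated via month and quarter to a year, or via calendar week to a year, but weeks do not fit into months. The sketch below assumes ISO 8601 week numbering for the week path; the function names are hypothetical.

```python
from datetime import date


def calendar_path(d):
    """Consolidation path Day -> Month -> Quarter -> Year."""
    quarter = (d.month - 1) // 3 + 1
    return {
        "day": d.isoformat(),
        "month": f"{d.year}-{d.month:02d}",
        "quarter": f"{d.year}-Q{quarter}",
        "year": d.year,
    }


def week_path(d):
    """Parallel consolidation path Day -> Week -> Year (ISO 8601 weeks)."""
    iso_year, iso_week, _weekday = d.isocalendar()
    return {
        "day": d.isoformat(),
        "week": f"{iso_year}-W{iso_week:02d}",
        "year": iso_year,
    }


day = date(2017, 12, 20)
print(calendar_path(day))
print(week_path(day))
```

Both paths start at the same primary level (day), which is why they belong to one dimension rather than to two independent ones.
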
2c) Category attributes
Attributes in the hierarchies
• Primary attributes
  • Define the most detailed level
  • Example: shop address, product, second, ...
• Classification attributes
  • Higher-level attributes of the dimensions
  • Example: customer group, city, state, country
• Dimensional attributes
  • Other attributes like address, telephone number, branch name, ...
  • Depend on a primary attribute or classification attribute
  • Stored together with the classification attributes
  • Add additional information to the dimensions

[Figure: hierarchy Top > Country > Federal state > City > Shop address]

Cube (actually hypercuboid)
The data structure:
• Cube (actually cuboid):
  • Basis for the multidimensional data model
  • Edges of the cube: dimensions
  • Cells: one or more (!) key figures as a function of the dimensions
  • Number of dimensions: dimensionality of the cube
• Visualization:
  • 2 dimensions: table
  • 3 dimensions: cube
  • More than 3 dimensions: multidimensional structure

Definition of a cube

• Schema of a cube
  • Set of dimensions (schemas) DS
  • Set of facts (measures) M
• Formal:
  • C = (DS, M) = ({D_1, ..., D_n}, {M_1, ..., M_m})

• Dimensions have to be independent,
  • that means: no functional dependencies between attributes of different dimensions

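The definition C = (DS, M) maps directly onto a small data structure. This is a sketch under stated assumptions rather than a reference implementation: the class name Cube, the method set_cell and the measure units_sold are hypothetical, and dimensions are represented by their names only.

```python
from dataclasses import dataclass, field


@dataclass
class Cube:
    dimensions: tuple  # DS = (D1, ..., Dn), here just the dimension names
    measures: tuple    # M = (M1, ..., Mm), the names of the facts
    # One entry per cell: coordinate tuple -> {measure name: value}
    cells: dict = field(default_factory=dict)

    @property
    def dimensionality(self):
        return len(self.dimensions)

    def set_cell(self, coordinates, **facts):
        """Store one or more facts in the cell addressed by the coordinates."""
        if len(coordinates) != len(self.dimensions):
            raise ValueError("one coordinate per dimension is required")
        unknown = set(facts) - set(self.measures)
        if unknown:
            raise ValueError(f"unknown measures: {sorted(unknown)}")
        self.cells.setdefault(tuple(coordinates), {}).update(facts)


sales_cube = Cube(dimensions=("store", "date"), measures=("turnover", "units_sold"))
# units_sold is an invented second measure: a cell may hold several facts.
sales_cube.set_cell(("Rostock", "2017-12-20"), turnover=1200, units_sold=30)
print(sales_cube.dimensionality, sales_cube.cells)
```
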
2d) Graphical representation, example of a cube

[Figure: example cube] The individual cells contain one or more facts each.

2e) OLAP Operations
Operations on the multidimensional model:

1. Slice:
   • Cutting "slices" out of a cube
   • Reduces the dimensionality
   • E.g.: all values of a year

2. Dice:
   • Cutting out a "partial cube"
   • Preserves the dimensionality, changes the hierarchy objects
   • E.g.: values of certain products or regions

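Slice and dice can be sketched on a cube stored as a dictionary from coordinate tuples to values. The cell values and the function names slice_cube and dice_cube are hypothetical; the dimension values follow the three-dimensional example from earlier in the lecture.

```python
DIMENSIONS = ("product", "country", "quarter")

# Hypothetical sales figures, one value per cell.
cube = {
    ("TV", "USA", "Q1"): 10,
    ("TV", "USA", "Q2"): 12,
    ("PC", "USA", "Q1"): 7,
    ("PC", "Canada", "Q1"): 5,
    ("MP3", "Mexico", "Q2"): 3,
    ("TV", "Canada", "Q2"): 4,
}


def slice_cube(cube, dimension, value):
    """Fix one dimension to a single value and drop it: n -> n-1 dimensions."""
    i = DIMENSIONS.index(dimension)
    return {key[:i] + key[i + 1:]: v for key, v in cube.items() if key[i] == value}


def dice_cube(cube, **allowed):
    """Keep the cells whose coordinates lie in the allowed value sets.

    The result is a partial cube with the same number of dimensions.
    """
    def keep(key):
        return all(key[DIMENSIONS.index(d)] in values for d, values in allowed.items())

    return {key: v for key, v in cube.items() if keep(key)}


first_quarter = slice_cube(cube, "quarter", "Q1")
us_tv_and_pc = dice_cube(cube, product={"TV", "PC"}, country={"USA"})
print(first_quarter)
print(us_tv_and_pc)
```

The keys of the sliced result have two components, while the diced result keeps all three, which mirrors the statements on dimensionality above.
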
OLAP Operations
3. Roll-up
   • Generating new information by aggregating the data along the consolidation path (dimension hierarchies)
   • Example: Day → Month → Quarter → Year
   • (The number of dimensions in the result is unchanged, the number of values is reduced)

4. Drill-down
   • Complementary to roll-up
   • Navigation from aggregated data to detailed data along the dimension hierarchy

5. Drill-across
   • Change from one cube to another

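A roll-up along the time hierarchy can be sketched as a re-keying of each cell to the coarser level followed by a sum. The sketch assumes an additive fact (turnover); the January 2018 value is invented to show more than one month, and the names LEVELS and roll_up are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Detailed data at day level: (store, day) -> turnover.
daily = {
    ("Rostock", date(2017, 12, 20)): 1200,
    ("Rostock", date(2017, 12, 21)): 1300,
    ("Schwerin", date(2017, 12, 20)): 1400,
    ("Schwerin", date(2017, 12, 21)): 1200,
    ("Rostock", date(2018, 1, 2)): 900,  # hypothetical extra value
}

# Each level of the consolidation path maps a day to its classification node.
LEVELS = {
    "day": lambda d: d.isoformat(),
    "month": lambda d: f"{d.year}-{d.month:02d}",
    "quarter": lambda d: f"{d.year}-Q{(d.month - 1) // 3 + 1}",
    "year": lambda d: str(d.year),
}


def roll_up(cells, level):
    """Aggregate turnover from day level up to the given time level."""
    to_level = LEVELS[level]
    result = defaultdict(int)
    for (store, day), turnover in cells.items():
        result[(store, to_level(day))] += turnover
    return dict(result)


by_month = roll_up(daily, "month")
by_year = roll_up(daily, "year")
print(by_month)
print(by_year)
```

The result still has two dimensions (store and time) but fewer values. Drill-down is the way back to a finer level, which only works if the detailed data is still available; an aggregate alone cannot be split into its parts.
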
OLAP Operations

[Figure: roll-up and drill-down on a cube. Labels shown: months January to June, 1. Quarter, 2. Quarter, sum, Mexico; arrows "roll up" and "drill down".]

OLAP Operations
• Roll-up and drill-down in a tabular representation

OLAP Operations
6. Rotating (or pivoting)
   • Rotating the cube by swapping the dimensions
   • For analyzing the data from different perspectives

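Pivoting changes only the order of the axes, not the cells. A minimal sketch, assuming the cube is a dictionary keyed by coordinate tuples; the function name pivot is hypothetical.

```python
def pivot(cube, dimensions, new_order):
    """Return the same cells with the dimension axes rearranged."""
    positions = [dimensions.index(name) for name in new_order]
    rotated = {
        tuple(key[p] for p in positions): value for key, value in cube.items()
    }
    return rotated, tuple(new_order)


cube = {
    ("Rostock", "2017-12-20"): 1200,
    ("Rostock", "2017-12-21"): 1300,
    ("Schwerin", "2017-12-20"): 1400,
}

# Swap the two axes: stores become columns, dates become rows.
rotated, rotated_dimensions = pivot(cube, ("store", "date"), ("date", "store"))
print(rotated_dimensions, rotated)
```
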
OLAP Operations
7. Split and Merge
   • Also referred to as Nest and Unnest
   • Merge/nest means that n dimensions are shown in n-x dimensions
   • All combinations of the dimension values are generated

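One possible reading of merge/nest is that two dimensions are combined into a single composite axis, so that a three-dimensional cube can be shown as a table. The sketch below follows that reading; the function name nest and the cell values are hypothetical.

```python
from itertools import product


def nest(cube, dimensions, outer, inner):
    """Merge two dimensions into one composite axis: n -> n-1 dimensions."""
    o, i = dimensions.index(outer), dimensions.index(inner)
    rest = [p for p in range(len(dimensions)) if p not in (o, i)]

    # All combinations of the two value sets form the merged axis,
    # including combinations for which the cube holds no cell.
    outer_values = sorted({key[o] for key in cube})
    inner_values = sorted({key[i] for key in cube})
    merged_axis = list(product(outer_values, inner_values))

    nested = {}
    for key, value in cube.items():
        new_key = ((key[o], key[i]),) + tuple(key[p] for p in rest)
        nested[new_key] = value
    new_dimensions = (f"{outer}/{inner}",) + tuple(dimensions[p] for p in rest)
    return nested, new_dimensions, merged_axis


cube = {
    ("TV", "USA", "Q1"): 10,
    ("TV", "USA", "Q2"): 12,
    ("PC", "Canada", "Q1"): 5,
}
nested, nested_dimensions, merged_axis = nest(
    cube, ("product", "country", "quarter"), "country", "quarter"
)
print(nested_dimensions)
print(merged_axis)
print(nested)
```

Split/unnest would be the inverse step: the composite coordinate is taken apart again into its two components.
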
3. Data Warehousing Process

• After introducing the multidimensional model with
  • multidimensional data (facts, dimensions) and
  • operations (dice, slice, roll-up, drill-down, rotate)
• we now focus on the process and a reference architecture for Data Warehouses

3a) How does the data get into the Data Warehouse?
• Multi-step process to insert the data
  • Called the ETL process (Extraction, Transformation, Load)
• Runs periodically / frequently
• Must be an automated process and is thus also part of the Data Warehouse

Reference architecture of a Data Warehouse (see next slide)

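The three ETL steps can be sketched as three small functions that are chained by one automated entry point. Everything specific here is an assumption for illustration: the source rows, the cleaning rules and the function names are invented, and a real process would read from the operational systems and be triggered by a scheduler.

```python
import sqlite3

# Hypothetical export from a transaction system: raw strings, local date format.
SOURCE_ROWS = [
    {"store": " rostock ", "date": "20.12.2017", "turnover": "1200"},
    {"store": "Schwerin", "date": "20.12.2017", "turnover": "1400"},
    {"store": "Bützow", "date": "21.12.2017", "turnover": ""},  # incomplete record
]


def extract():
    """Extraction: read the relevant rows from the data source."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transformation: clean names, unify the date format, drop unusable rows."""
    cleaned = []
    for row in rows:
        if not row["turnover"]:
            continue  # skip records without a fact value
        day, month, year = row["date"].split(".")
        cleaned.append(
            (row["store"].strip().title(), f"{year}-{month}-{day}", int(row["turnover"]))
        )
    return cleaned


def load(conn, rows):
    """Load: write the transformed rows into the warehouse table."""
    conn.execute(
        "create table if not exists sales_data (store text, date text, turnover integer)"
    )
    conn.executemany("insert into sales_data values (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn):
    """One automated run; in practice this would be scheduled periodically."""
    load(conn, transform(extract()))


warehouse = sqlite3.connect(":memory:")
run_etl(warehouse)
loaded = warehouse.execute(
    "select store, date, turnover from sales_data order by store"
).fetchall()
print(loaded)
```
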
3b) Reference architecture

[Figure: reference architecture of a Data Warehouse system, illustration following Bauer, A.; Günzel, H. Components shown: Data Sources, Monitor, Extraction, Work Space, Transformation, Loading, Basis Database, Loading, Data Warehouse, Analysis, Data Warehouse Manager, Metadata Manager, Repository. Legend: data flow, control flow. Description of this slide: in the lecture.]

Metadata repository
• "... identified as key success factor in DWH ..."
• Extension of the database system catalog
• Storing all relevant data warehouse metadata
  • Source descriptions
  • Schemas, data types
  • Process descriptions, scripts (ETL)
  • Access rights
  • View definitions
  • Authors
  • Version control
  • Configuration management, ...
• Objectives:
  • Transparency of the processes
  • Avoiding misinterpretations
  • Technical description of the DWH

3c) Requirements for a Data Warehouse
• Independence between data sources and analysis systems
  • Concerning: availability, load, current changes
• Stable storage of integrated and derived data (persistence)
• Reusability of stored data
• Possibility of conducting arbitrary analyses on the data in the warehouse
• Support of individual views (e.g. in terms of structure, time, geographic characteristics)
• Extensibility: integration of new sources
• Automation of processes (!!)
• Focus on the purpose: analysis of data

No summary for today, only an outlook

We have already seen:
• Multidimensional data model of Data Warehouses

Next lectures:
• Different storage methods
  1. based on relational databases or
  2. multidimensional storage
• Data Warehouse queries: OLAP (Online Analytical Processing)

Literature
• Veit Köppen, Gunter Saake, Kai-Uwe Sattler: Data Warehouse Technologien: Technische Grundlagen, mitp Professional, 2012
• Ulf Leser: Data Warehousing und Data Mining (lecture), HU Berlin, summer semester 2007
• Kai-Uwe Sattler, Gunter Saake: Data Warehouses (lecture), TU Magdeburg
• Wolfgang Lehner: Datenbanktechnologie für Data-Warehouse-Systeme, dPunkt, Heidelberg, 2002
• A. Bauer, H. Günzel: Data-Warehouse-Systeme - Architektur, Entwicklung, Anwendung, dPunkt, Heidelberg, 2000
• Rico Landefeld: Blockseminar Data Warehousing (lecture), Lehrstuhl für Datenbanken und Informationssysteme, Universität Jena, 2005
• Matthias Goeken: Entwicklung von Data-Warehouse-Systemen: Anforderungsmanagement, Modellierung, Implementierung, Deutscher Universitäts-Verlag, Wiesbaden, 2006
• Neckel, Knobloch: Customer Relationship Analytics, dpunkt.verlag, 2005
