0% found this document useful (0 votes)

401 views55 pages

Data Warehousing (Advanced Query Processing) : Carsten Binnig Donald Kossmann

The document discusses data warehousing and online analytical processing (OLAP), including the differences between OLTP and OLAP systems, common data warehouse architectures like star schemas, and SQL extensions used for aggregation queries in data warehousing. It also provides examples of typical OLAP queries and concepts like drill-down, roll-up, and data cubes used to enable interactive exploration and analysis of aggregated data.

Uploaded by

Priya Shah

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPT, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

401 views55 pages

Data Warehousing (Advanced Query Processing) : Carsten Binnig Donald Kossmann

Uploaded by

Priya Shah

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPT, PDF, TXT or read online on Scribd

You are on page 1/ 55

Data Warehousing

(Advanced Query Processing)

Carsten Binnig
Donald Kossmann
http://www.systems.ethz.ch
Selected References
• General
– Chaudhuri, Dayal: An Overview of Data Warehousing and OLAP Technology.
SIGMOD Record 1997
– Lehner: Datenbanktechnologie für Data Warehouse Systeme. Dpunkt Verlag 2003
– (…)
• New Operators and Algorithms
– Agrawal, Srikant: Fast Algorithms for Association Rule Mining. VLDB 1994
– Barateiro, Galhardas: A Survey of Data Quality Tools. Datenbank Spektrum 2005
– Börszonyi, Kossmann, Stocker: Skyline Operator. ICDE 2001
– Carey, Kossmann: On Saying Enough Already in SQL. SIGMOD 1997
– Dalvi, Suciu: Efficient Query Evaluation on Probabilistic Databases. VLDB 2004
– Gray et al.: Data Cube... ICDE 1996
– Helmer: Evaluating different approaches for indexing fuzzy sets. Fuzzy Sets and
Systems 2003
– Olken: Database Sampling - A Survey. Technical Report LBL.
– (…)
References (ctd.)
• Projects & Systems
– Aurora, Borealis Systems (Brown, Brandeis, MIT)
– Dittrich, Kossmann, Kreutz: Bridging the Gap between OLAP and
SQL. VLDB 2005 (Btell Demo)
– Garlic (IBM) - Haas et al. VLDB 1997
– STREAM (Stanford)
– Telegraph (Berkeley)
– Trio (Stanford)
• SQL Extensions
– Jensen et al. The Consensus Glossary of Temporal Database
Concepts. Dagstuhl 1997 (also see TSQL)
– Kimball, Strehlo: Why decision support fails and how to fix it.
SIGMOD Record 1995
– Witkowski et al.: Spreadsheets in RDBMS. SIGMOD 2003
History of Databases
• Age of Transactions (70s - 00s)
– Goal: reliability - make sure no data is lost
– 60s: IMS (hierarchical data model)
– 80s: Oracle (relational data model)
• Age of Business Intelligence (95 -)
– Goal: analyze the data -> make business decisions
– Aggregate data for boss. Tolerate imprecision!
– SAP BW, Microstrategy, Cognos, … (rel. model)
• Age of „Data for the Masses“
– Goal: everybody has access to everything, M2M
– Google (text), Cloud (XML, JSON: Services)
Purpose of this Class
• Age of Transactions (70s - 00s)
– Goal: reliability - make sure no data is lost
– 60s: IMS (hierarchical data model)
– 80s: Oracle (relational data model)
• Age of Business Intelligence (95 -)
– Goal: analyze the data -> make business decisions
– Aggregate data for boss. Tolerate imprecision!
– SAP BW, Microstrategy, Cognos, … (rel. model)
• Age of „Data for the Masses“
– Goal: everybody has access to everything, M2M
– Google (text), Cloud (XML, JSON: Services)
Overview
• Motivation and Architecture
• SQL Extensions for Data Warehousing (DSS)
• Algorithms and Query Processing Techniques
• ETL, Virtual Databases (Data Integration)
• Parallel Databases, Cloud
• Data Mining
• Probabilistic Databases
• Industry Talk
OLTP vs. OLAP
• OLTP – Online Transaction Processing
– Many small transactions
(point queries: UPDATE or INSERT)
– Avoid redundancy, normalize schemas
– Access to consistent, up-to-date database
• OLTP Examples:
– Flight reservation (see IS-G)
– Order Management, Procurement, ERP
• Goal: 6000 Transactions per second
(Oracle 1995)
OLTP vs. OLAP
• OLAP – Online Analytical Processing
– Big queries (all the data, joins); no Updates
– Redundancy a necessity (Materialized Views, special-
purpose indexes, de-normalized schemas)
– Periodic refresh of data (daily or weekly)
• OLAP Examples
– Management Information (sales per employee)
– Statistisches Bundesamt (Volkszählung)
– Scientific databases, Bio-Informatics
• Goal: Response Time of seconds / few minutes
OLTP vs. OLAP (Water and Oil)
• Lock Conflicts: OLAP blocks OLTP
• Database design:
– OLTP normalized, OLAP de-normalized
• Tuning, Optimization
– OLTP: inter-query parallelism, heuristic optimization
– OLAP: intra-query parallelism, full-fledged optimization
• Freshness of Data:
– OLTP: serializability
– OLAP: reproducability
• Precision:
– OLTP: ACID
– OLAP: Sampling, Confidence Intervals
Solution: Data Warehouse
• Special Sandbox for OLAP
• Data input using OLTP systems
• Data Warehouse aggregates and replicates data
(special schema)
• New Data is periodically uploaded to Warehouse
• Old Data is deleted from Warehouse
– Archiving done by OLTP system for legal reasons
Architecture
OLTP OLAP

OLTP Applications GUI, Spreadsheets

DB1
DB2
Data Warehouse
DB3
Data Warehouses in the real World
• First industrial projects in 1995
• At beginning, 80% failure rate of projects
• Consultants like Accenture dominate market
• Why difficult: Data integration + cleaning,
poor modeling of business processes in warehous
• Data warehouses are expensive
(typically as expensive as OLTP system)
• Success Story: WalMart - 20% cost reduction
because of Data Warehouse (just in time...)
Products and Tools
• Oracle 11g, IBM DB2, Microsoft SQL Server, ...
– All data base vendors
• SAP Business Information Warehouse
– ERP vendors
• MicroStrategy, Cognos
– Specialized vendors
– „Web-based EXCEL“
• Niche Players (e.g., Btell)
– Vertical application domain
Star Schema (relational)
Dimension Table
(e.g., POS)

Dimension Table Dimension Table

(e.g. Customer) (e.g., Time)

Fact Table
(e.g., Order)

Dimension Table
Dimension Table (e.g., Product)
(e.g., Supplier)
Fact Table (Order)
No. Cust. Date ... POS Price Vol. TAX
001 Heinz 13.5. ... Mainz 500 5 7.0
002 Ute 17.6. ... Köln 500 1 14.0
003 Heinz 21.6. ... Köln 700 1 7.0
004 Heinz 4.10. ... Mainz 400 7 7.0
005 Karin 4.10. ... Mainz 800 3 0.0
006 Thea 7.10. ... Köln 300 2 14.0
007 Nobbi 13.11. ... Köln 100 5 7.0
008 Sarah 20.12 ... Köln 200 4 7.0
Fact Table
• Structure:
– key (e.g., Order Number)
– Foreign key to all dimension tables
– measures (e.g., Price, Volume, TAX, …)
• Store moving data (Bewegungsdaten)
• Very large and normalized
Dimension Table (PoS)
Name Manager City Region Country Tel.
Mainz Helga Mainz South D 1422
Köln Vera Hürth South D 3311

• De-normalized: City -> Region -> Country

• Avoid joins
• fairly small and constant size
• Dimension tables store master data (Stammdaten)
• Attributes are called Merkmale in German
Snowflake Schema
• If dimension tables get too large
– Partition the dimension table
• Trade-Off
– Less redundancy (smaller tables)
– Additional joins needed
• Exercise: Do the math!
Typical Queries
SELECT d1.x, d2.y, d3.z, sum(f.z1), avg(f.z2)
FROM Fact f, Dim1 d1, Dim2 d2, Dim3 d3
WHERE a < d1.feld < b AND d2.feld = c AND
Join predicates
GROUP BY d1.x, d2.y, d3.z;

• Select by Attributes of Dimensions

– E.g., region = „south“
• Group by Attributes of Dimensions
– E.g., region, month, quarter
• Aggregate on measures
– E.g., sum(price * volumen)
Example
SELECT f.region, z.month, sum(a.price * a.volume)
FROM Order a, Time z, PoS f
WHERE a.pos = f.name AND a.date = z.date
GROUP BY f.region, z.month

South May 2500

North June 1200
South October 5200
North October 600
Drill-Down und Roll-Up
• Add attribute to GROUP BY clause
– More detailed results (e.g., more fine-grained results)
• Remove attribute from GROUP BY clause
– More coarse-grained results (e.g., big picture)
• GUIs allow „Navigation“ through Results
– Drill-Down: more detailed results
– Roll-Up: less detailed results
• Typical operation, drill-down along hierarchy:
– E.g., use „city“ instead of „region“
Data Cube
Product
Sales by
Product and
Year

Year
all

Balls
alle
2000
Nets 1999
1998 Region
North South all
Moving Sums, ROLLUP
• Example:
GROUP BY ROLLUP(country, region, city)
Give totals for all countries and regions
• This can be done by using the ROLLUP Operator
• Attention: The order of dimensions in the GROUP
BY clause matters!!!
• Again: Spreadsheets (EXCEL) are good at this
• The result is a table! (Completeness of rel. model!)
ROLLUP alla IBM UDB

SELECT Country, Region, City, sum(price*vol)

FROM Orders a, PoS f
WHERE a.pos = f.name
GROUP BY ROLLUP(Country, Region, City)
ORDER BY Country, Region, City;

Also works for other aggregate functions; e.g., avg().

Result of ROLLUP Operator

D North Köln 1000

D North (null) 1000
D South Mainz 3000
D South München 200
D South (null) 3200
D (null) (null) 4200
Summarizability (Unit)
• Legal Query
SELECT product, customer, unit, sum(volume)
FROM Order
GROUP BY product, customer, unit;
• Legal Query (product -> unit)
SELECT product, customer, sum(volume)
FROM Order
GROUP BY product, customer;
• Illegal Query (add „kg“ to „m“)!!!
SELECT customer, sum(volume)
FROM Order
GROUP BY customer;
Summarizability (de-normalized data)
Region Customer Product Volume Populat.
South Heinz Balls 1000 3 Mio.
South Heinz Nets 500 3 Mio.
South Mary Balls 800 3 Mio.
South Mary Nets 700 3 Mio.
North Heinz Balls 1000 20 Mio.
North Heinz Nets 500 20 Mio.
North Mary Balls 800 20 Mio.
North Mary Nets 700 20 Mio.
Customer, Product -> Revenue
Region -> Population
Summarizability (de-normalized data)
• What is the result of the following query?

SELECT region, customer, product, sum(volume)

FROM Order
GROUP BY ROLLUP(region, customer, product);

• All off-the-shelf databases get this wrong!

• Problem: Total Revenue is 3000 (not 6000!)
• BI Tools get it right: keep track of functional
dependencies
• Problem arises if reports involve several unrelated
measures.
Cube Operator

• Operator that computes all „combinations“

• Result contains „(null)“ Values to encode „all“

SELECT product, year, region, sum(price * vol)

FROM Orders
GROUP BY CUBE(product, year, region);
Result of Cube Operator
Product Region Year Revenue
Netze Nord 1998 ...
Bälle Nord 1998 ...
(null) Nord 1998 ...
Netze Süd 1998 ...
Bälle Süd 1998 ...
(null) Süd 1998 ...
Netze (null) 1998 ...
Bälle (null) 1998 ...
(null) (null) 1998 ...
Visualization as Cube
Product

Year
all

Balls
all
2000
Nets 1999
1998 Region
North South all
Pivot Tables
• Define „columns“ by group by predicates
• Not a SQL standard! But common in products
•Reference: Cunningham, Graefe, Galindo-Legaria: PIVOT
and UNPIVOT: Optimization and Execution Strategies in an
RDBMS. VLDB 2004
UNPIVOT (material, factory)
PIVOT (material, factory)
Btell Demo

http://www.btell.de
Top N
• Many applications require top N queries
• Example 1 - Web databases
– find the five cheapest hotels in Madison
• Example 2 - Decision Support
– find the three best selling products
– average salary of the 10,000 best paid employees
– send the five worst batters to the minors
• Example 3 - Multimedia / Text databases
– find 10 documents about „database“ and „web“.
• Queries and updates, any N, all kinds of data
Key Observation
Top N queries cannot be expressed well in SQL

SELECT * FROM Hotels h

WHERE city = Madison AND
5 > (SELECT count(*) FROM Hotels h1
WHERE city = Madison AND h1.price < h.price);

• So what do you do?

– Implement top N functionality in your application
– Extend SQL and the database management system
Implementation of Top N in the App
• Applications use SQL to get as close as possible
• Get results ordered, consume only N objects
and/or specify predicate to limit # of results
SELECT * SELECT *
FROM Hotels FROM Hotels
WHERE city = Madison WHERE city = Madison
ORDER BY price; AND price < 70;

– either too many results, poor performance

– or not enough results, user must ask query again
– difficult for nested top N queries and updates
Extend SQL and DBMS
• STOP AFTER clause specifies number of results
SELECT *
FROM Hotels
WHERE city = Madison
ORDER BY price
STOP AFTER 5 [WITH TIES];

• Returns five hotels (plus ties)

• Challenge: extend query processor, performance
Top N Queries Revisited
• Example: The five cheapest hotels
SELECT *
FROM Hotels
ORDER BY price
STOP AFTER 5;

• What happens if you have several criteria?

Nearest Neighbor Search
• Cheap and close to the beach
SELECT *
FROM Hotels
ORDER BY distance * x + price * y
STOP AFTER 5;

• How to set x and y ?

Skyline Queries
• Hotels which are close to the beach and cheap.

distance
x x x
x x
x x
x x x x x x
x x
x x Convex Hull
x
x x
Top 5 x Skyline (Pareto Curve)
price

Literatur: Maximum Vector Problem. [Kung et al. 1975]

Syntax of Skyline Queries
• Additional SKYLINE OF clause
[Börszönyi, Kossmann, Stocker 2001]

• Cheap & close to the beach

SELECT *
FROM Hotels
WHERE city = ´Nassau´
SKYLINE OF distance MIN, price MIN;
Flight Reservation
• Book flight from Washington DC to San Jose

SELECT *
FROM Flights
WHERE depDate < ´Nov-13´
SKYLINE OF price MIN,
distance(27750, dept) MIN,
distance(94000, arr) MIN,
(`Nov-13` - depDate) MIN;
Visualisation (VR)
• Skyline of NY (visible buildings)

SELECT * FROM Buildings

WHERE city = `New York`
SKYLINE OF h MAX, x DIFF, z MIN;
Location-based Services
• Cheap Italian restaurants that are close
• Query with current location as parameter

SELECT *
FROM Restaurants
WHERE type = `Italian`
SKYLINE OF price MIN, d(addr, ?) MIN;
Skyline and Standard SQL
• Skyline can be expressed as nested Queries

SELECT *
FROM Hotels h
WHERE NOT EXISTS (
SELECT * FROM Hotels
WHERE h.price >= price AND h.d >= d
AND (h.price > price OR h.d > d))

• Such queries are quite frequent in practice

• The response time is desastrous
Online Aggregation
• Get approximate result very quickly
• Result (conf. intervals) get better over time
• Based on random sampling (difficult!)
• No product supports this yet

SELECT cust, avg(price)

Cust Avg +/- Conf
FROM Order Heinz 1375 5% 90%
GROUP BY cust Ute 2000 5% 90%
Karin - - -
Time Travel
• Give results of query AS OF a certain point in time
• Idea: Database is a sequence of states
– DB1, DB2, DB3, … DBn
– Each commit of a transaction creates a new state
– To each state, associate time stamp and version number
• Idea builds on top of „serialization“
– Time travel mostly relevant for OLTP system in order to
get reproducable results or recover old data
• Implementation (Oracle - Flashback)
– Leverage versioned data store + snapshot semantics
– Chaining of versions of records
– Specialized index structures (add time as a „parameter“)
Time Travel Syntax
• Give me avg(price) per customer as of last week
SELECT cust, avg(price)
FROM Order AS OF MAR-23-2007
GROUP BY cust

• Can use timestamp or version number

– Special built-in functions to convert timestamp <-> von
– None of this is standardized (all Oracle specific)
Notification (Oracle)
• Inform me when account drops below 1000
SELECT *
FROM accounts a
WHEN a.balance < 1000

• Based on temporal model

– Query state transitions; monitor transition: false->true
– No notification if account stays below 1000
• Some issues:
– How to model „delete“?
– How to create an RSS / XML stream of events?
DBMS for Data Warehouses
• ROLAP – Extend RDBMS
– Special Star-Join Techniques
– Bitmap Indexes
– Partition Data by Time (Bulk Delete)
– Materialized Views
• MOLAP – special multi-dimensional systems
– Implement cube as (multi-dim.) array
– Pro: potentially fast (random access in array)
– Problem: array is very sparse
• Religious war (ROLAP wins in industry)
Materialized Views
• Compute the result of a query using the
result of another query
• Principle: Subsumption
– The set of all German researchers is a subset of
the set of all researchers
– If query asks for German researchers, use set of
all researchers, rather than all people
• Subsumption works well for GROUP BY
Materialized View

SELECT product, year, region, sum(price * vol)

FROM Order
GROUP BY product, year, region;

GROUP BY Materialized
product, year View

SELECT product, year, sum(price * vol)

FROM Order
GROUP BY product, year;
Computation Graph of Cube
{}

{product} {year} {region}

{product, year} {product, region} {year, region}

{product, year, region}

Power BI Notes
100% (11)
Power BI Notes
104 pages
SnowFlake Notes
100% (1)
SnowFlake Notes
40 pages
Azure Databricks Course Slide Deck V4
100% (4)
Azure Databricks Course Slide Deck V4
308 pages
Snowflake Scenario Based Interview Questions
100% (2)
Snowflake Scenario Based Interview Questions
20 pages
Data Analytics With MS Excel Power BI This Book Will Transform You Into Data Analytics Expert
100% (3)
Data Analytics With MS Excel Power BI This Book Will Transform You Into Data Analytics Expert
183 pages
Barclays Data Engineer Interview Questions
No ratings yet
Barclays Data Engineer Interview Questions
17 pages
Microsoft Power BI Cookbook by Greg Deckler
100% (19)
Microsoft Power BI Cookbook by Greg Deckler
655 pages
Azure Databricks
67% (6)
Azure Databricks
69 pages
Power Query - Reference Book PDF
100% (16)
Power Query - Reference Book PDF
236 pages
Azure Data Factory
77% (13)
Azure Data Factory
52 pages
Power BI DAX Simplified B099SBN1XP
93% (15)
Power BI DAX Simplified B099SBN1XP
542 pages
Data Build Tool (DBT)
No ratings yet
Data Build Tool (DBT)
65 pages
Azure Databricks Course Slide Deck
75% (4)
Azure Databricks Course Slide Deck
169 pages
PastPapers Harony P6 2024
No ratings yet
PastPapers Harony P6 2024
335 pages
EDW-ETL Migration Approaches With Databricks
No ratings yet
EDW-ETL Migration Approaches With Databricks
34 pages
MGH Data Analysis With Microsoft Power BI 126045861X
92% (13)
MGH Data Analysis With Microsoft Power BI 126045861X
808 pages
DMDW Notes
100% (1)
DMDW Notes
62 pages
Collect, Transform and Combine Data Using Power BI and Power Query in Excel (Business Skills)
85% (13)
Collect, Transform and Combine Data Using Power BI and Power Query in Excel (Business Skills)
543 pages
Data Analysis With Databricks
75% (4)
Data Analysis With Databricks
80 pages
MultiDimensional Data Model
No ratings yet
MultiDimensional Data Model
22 pages
Power BI Data Analysis and Visualization
100% (9)
Power BI Data Analysis and Visualization
289 pages
Introduction To Data Structures - BCA
100% (1)
Introduction To Data Structures - BCA
27 pages
Applied Microsoft Power BI Bring Your Data To Life
100% (13)
Applied Microsoft Power BI Bring Your Data To Life
592 pages
Slido Question On Module 1
No ratings yet
Slido Question On Module 1
4 pages
Learn Power BI Step by Step Guide To Building Your Own Reports (CDO Advisors Book 1)
100% (5)
Learn Power BI Step by Step Guide To Building Your Own Reports (CDO Advisors Book 1)
87 pages
Data Warehouse Schemas For Decision Support
No ratings yet
Data Warehouse Schemas For Decision Support
13 pages
POWER BI Tutorial
89% (9)
POWER BI Tutorial
77 pages
PowerBI Presentation
100% (6)
PowerBI Presentation
155 pages
100 Dataengineering Interview Questions TRRaveendra 1694654407
No ratings yet
100 Dataengineering Interview Questions TRRaveendra 1694654407
58 pages
Power BI
100% (6)
Power BI
32 pages
DW Concepts
100% (1)
DW Concepts
40 pages
Azure Databricks Interview
100% (2)
Azure Databricks Interview
35 pages
OLAP2
No ratings yet
OLAP2
53 pages
Power BI MVP Book 1089210515 PDF
100% (10)
Power BI MVP Book 1089210515 PDF
495 pages
8085 Assembly Language Programs
No ratings yet
8085 Assembly Language Programs
29 pages
Azure Data Factory
100% (2)
Azure Data Factory
10 pages
OLAP1
100% (1)
OLAP1
31 pages
01-Structures, Unions, Enumerations, File Processing
No ratings yet
01-Structures, Unions, Enumerations, File Processing
52 pages
The Data Analytics Handbook2 PDF
100% (6)
The Data Analytics Handbook2 PDF
43 pages
C++ Data Structures
100% (1)
C++ Data Structures
6 pages
SQL Interview Questions PDF
88% (43)
SQL Interview Questions PDF
48 pages
Architecting A Data Lake
100% (8)
Architecting A Data Lake
60 pages
Online Random File Generator
No ratings yet
Online Random File Generator
2 pages
LectureNotes Data Warehousing
No ratings yet
LectureNotes Data Warehousing
126 pages
Neo4j-Manual-2 0 0
No ratings yet
Neo4j-Manual-2 0 0
591 pages
Lowell Fryman USGS 2017 Final
No ratings yet
Lowell Fryman USGS 2017 Final
18 pages
Introduction To Data Warehousing and Business Intelligence
No ratings yet
Introduction To Data Warehousing and Business Intelligence
72 pages
Data Warehousing & OLAP
No ratings yet
Data Warehousing & OLAP
57 pages
Lecture 3: Business Intelligence: OLAP, Data Warehouse, and Column Store
No ratings yet
Lecture 3: Business Intelligence: OLAP, Data Warehouse, and Column Store
119 pages
Reydisp Manual
No ratings yet
Reydisp Manual
63 pages
DW Notes
No ratings yet
DW Notes
72 pages
Data Warehousing: Data Models and OLAP Operations: by Kishore Jaladi
No ratings yet
Data Warehousing: Data Models and OLAP Operations: by Kishore Jaladi
41 pages
Lecture 2 Data Access
No ratings yet
Lecture 2 Data Access
40 pages
Data Warehousing: Data Models and OLAP Operations
No ratings yet
Data Warehousing: Data Models and OLAP Operations
39 pages
Unit 2
No ratings yet
Unit 2
34 pages
OLAP and Data Warehousing: Slides Courtesy Of: Julia Stoyanovitch
No ratings yet
OLAP and Data Warehousing: Slides Courtesy Of: Julia Stoyanovitch
46 pages
Elective-I Advanced Database Management Systems
No ratings yet
Elective-I Advanced Database Management Systems
67 pages
03 Data Warehousing Data Mining MIM
No ratings yet
03 Data Warehousing Data Mining MIM
48 pages
Data Mining - 3 PDF
No ratings yet
Data Mining - 3 PDF
62 pages
Data Integration - Solution Strategy
No ratings yet
Data Integration - Solution Strategy
37 pages
User Manual: Version 3.3 - February 2015
No ratings yet
User Manual: Version 3.3 - February 2015
108 pages
OLAP (Online Analytical Processing) : Zalpa Rathod (39) Yatin Puthran (37) Mayuri Pawar (35) Mitesh Patil
No ratings yet
OLAP (Online Analytical Processing) : Zalpa Rathod (39) Yatin Puthran (37) Mayuri Pawar (35) Mitesh Patil
37 pages
Big Data Analytics Notes
No ratings yet
Big Data Analytics Notes
9 pages
Data Warehousing AND Data Mining
No ratings yet
Data Warehousing AND Data Mining
51 pages
DWM Unit 1
No ratings yet
DWM Unit 1
67 pages
OLAP Vs OLTP 1635783645
No ratings yet
OLAP Vs OLTP 1635783645
44 pages
What Is A Data Warehouse?
No ratings yet
What Is A Data Warehouse?
47 pages
DMDW 6
No ratings yet
DMDW 6
41 pages
Introduction To Datawarehousing: Duration: 45 Minutes (Approx.) Abhishek Ranjan
No ratings yet
Introduction To Datawarehousing: Duration: 45 Minutes (Approx.) Abhishek Ranjan
32 pages
Datawarehousing
No ratings yet
Datawarehousing
71 pages
An Introduction To MSBI & DWH by QuontraSolutions
No ratings yet
An Introduction To MSBI & DWH by QuontraSolutions
32 pages
Lecture - Notes C++
No ratings yet
Lecture - Notes C++
24 pages
Data Warehousing & OLAP (Business Intellegent)
No ratings yet
Data Warehousing & OLAP (Business Intellegent)
31 pages
Adbms: Data Warehousing OLAP Technology
No ratings yet
Adbms: Data Warehousing OLAP Technology
57 pages
Lect4 IoT NetConf Yang1
No ratings yet
Lect4 IoT NetConf Yang1
38 pages
Difference Between OLAP and OLTP: Feature OLAP (Online Analytical Processing) OLTP (Online Transaction Processing)
No ratings yet
Difference Between OLAP and OLTP: Feature OLAP (Online Analytical Processing) OLTP (Online Transaction Processing)
34 pages
BA Module 1
No ratings yet
BA Module 1
52 pages
Decision Support, Data Warehousing, and OLAP
No ratings yet
Decision Support, Data Warehousing, and OLAP
48 pages
Unit 2
No ratings yet
Unit 2
32 pages
CS 345: Topics in Data Warehousing - Lecture 2
No ratings yet
CS 345: Topics in Data Warehousing - Lecture 2
27 pages
Chapter-V Memory Management
No ratings yet
Chapter-V Memory Management
43 pages
Decision Support, Data Warehousing, and OLAP: by Prof. Sham Navathe
No ratings yet
Decision Support, Data Warehousing, and OLAP: by Prof. Sham Navathe
37 pages
BusinessIntelligence 2023
No ratings yet
BusinessIntelligence 2023
36 pages
DW - Rolap Molap Holap
No ratings yet
DW - Rolap Molap Holap
48 pages
Data Warehousing: Data Models and OLAP Operations
No ratings yet
Data Warehousing: Data Models and OLAP Operations
41 pages
CS554 - Advanced Database Systems Homework 8: Undo-Log Records
No ratings yet
CS554 - Advanced Database Systems Homework 8: Undo-Log Records
15 pages
DM 6
No ratings yet
DM 6
29 pages
ML Module1
No ratings yet
ML Module1
56 pages
Chapter6 DataWareHousing Final
No ratings yet
Chapter6 DataWareHousing Final
46 pages
Data Engineering 101 Sample DW
No ratings yet
Data Engineering 101 Sample DW
23 pages
Chapter 3 Data Warehouse & OLAP
No ratings yet
Chapter 3 Data Warehouse & OLAP
17 pages
Teradata SQL Cheat Sheet - Free Download - ETL With SQL
No ratings yet
Teradata SQL Cheat Sheet - Free Download - ETL With SQL
31 pages
COMP321F Database
No ratings yet
COMP321F Database
31 pages
Data Warehousing Unit 1,2
No ratings yet
Data Warehousing Unit 1,2
9 pages
Lecture 3 - 2 - OLAP, Data Warehouse, and Column Store
No ratings yet
Lecture 3 - 2 - OLAP, Data Warehouse, and Column Store
60 pages
Data Warehouse Unit 4 Complete
No ratings yet
Data Warehouse Unit 4 Complete
21 pages
Serial Key Maker API
No ratings yet
Serial Key Maker API
16 pages
Creating and Configuring Alarms: Struxureware Building Operation 1.9
No ratings yet
Creating and Configuring Alarms: Struxureware Building Operation 1.9
26 pages
By - Shaik Yasir Ahmed
No ratings yet
By - Shaik Yasir Ahmed
30 pages
OLTP and OLAP
No ratings yet
OLTP and OLAP
46 pages
Uart Controller v1 1
No ratings yet
Uart Controller v1 1
10 pages
Lecture No. 15
No ratings yet
Lecture No. 15
11 pages
Radio Frequency Modems: Technical Note #15
No ratings yet
Radio Frequency Modems: Technical Note #15
5 pages
ADBMS Lec1
No ratings yet
ADBMS Lec1
46 pages
Data Mining Finals Notes
No ratings yet
Data Mining Finals Notes
35 pages
Backup
No ratings yet
Backup
5 pages
InteliConfig Install Suite
No ratings yet
InteliConfig Install Suite
2 pages
Computer Architecture Introduction
No ratings yet
Computer Architecture Introduction
4 pages
PHP 2
No ratings yet
PHP 2
4 pages
DXF Codes and AutoLISP Da
No ratings yet
DXF Codes and AutoLISP Da
4 pages
DataWarehousing - Powerpoint Canadien Cs - Sfu.ca 2e Version
No ratings yet
DataWarehousing - Powerpoint Canadien Cs - Sfu.ca 2e Version
14 pages
Facts and Opinions
No ratings yet
Facts and Opinions
1 page
Data Warehousing and Decision Support
No ratings yet
Data Warehousing and Decision Support
8 pages
TP Error 0247
No ratings yet
TP Error 0247
3 pages
Microsoft SQL Server 2014 Business Intelligence Development Beginner’s Guide
From Everand
Microsoft SQL Server 2014 Business Intelligence Development Beginner’s Guide
Reza Rad
No ratings yet
Programming Microsoft Dynamics® NAV 2013
From Everand
Programming Microsoft Dynamics® NAV 2013
David A. Studebaker
No ratings yet
Google Cloud Platform for Data Engineering: From Beginner to Data Engineer using Google Cloud Platform
From Everand
Google Cloud Platform for Data Engineering: From Beginner to Data Engineer using Google Cloud Platform
alasdair gilchrist
5/5 (1)
Fabricated Steel Plate Work World Summary: Market Values & Financials by Country
From Everand
Fabricated Steel Plate Work World Summary: Market Values & Financials by Country
Editorial DataGroup
No ratings yet
100 Puzzles to Learn Data Warehousing
From Everand
100 Puzzles to Learn Data Warehousing
Cristian Scutaru
No ratings yet

Data Warehousing (Advanced Query Processing) : Carsten Binnig Donald Kossmann

Uploaded by

Data Warehousing (Advanced Query Processing) : Carsten Binnig Donald Kossmann

Uploaded by

Data Warehousing

(Advanced Query Processing)

OLTP Applications GUI, Spreadsheets

Dimension Table Dimension Table

• De-normalized: City -> Region -> Country

• Select by Attributes of Dimensions

South May 2500

SELECT Country, Region, City, sum(price*vol)

Also works for other aggregate functions; e.g., avg().

D North Köln 1000

SELECT region, customer, product, sum(volume)

• All off-the-shelf databases get this wrong!

• Operator that computes all „combinations“

SELECT product, year, region, sum(price * vol)

SELECT * FROM Hotels h

• So what do you do?

– either too many results, poor performance

• Returns five hotels (plus ties)

• What happens if you have several criteria?

• How to set x and y ?

Literatur: Maximum Vector Problem. [Kung et al. 1975]

• Cheap & close to the beach

SELECT * FROM Buildings

• Such queries are quite frequent in practice

SELECT cust, avg(price)

• Can use timestamp or version number

• Based on temporal model

SELECT product, year, region, sum(price * vol)

SELECT product, year, sum(price * vol)

{product} {year} {region}

{product, year} {product, region} {year, region}

{product, year, region}

You might also like