OLAP and Data Warehousing

Slides courtesy of:

Julia Stoyanovitch
Columbia University

Surajit Chaudhuri
Microsoft Research, Redmond, WA, USA

Umeshwar Dayal
Hewlett-Packard Labs., Palo Alto, CA, USA
What is OLAP?
 On-Line Analytical Processing
 Information technology to help the knowledge
worker (executive, manager, analyst) make faster
and better decisions.
 OLAP is an element of decision support systems
(DSS).

© Surajit Chaudhuri, Umeshwar Dayal 23

Running Example: Car Sales
 Cars: carId, make, model, color

 Dealers: dealerId, city, state

 Time of Sale: tid, year, month, day

 Sales: carId, dealerId, tid, price
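As a minimal sketch, the running example can be set up as four tables in an in-memory SQLite database. The slides give only the column names; the column types, the table name `TimeOfSale`, the helper `build_car_sales_db`, and the sample row are assumptions made for illustration.

```python
import sqlite3


def build_car_sales_db() -> sqlite3.Connection:
    """Create the four running-example tables in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Cars       (carId INTEGER PRIMARY KEY, make TEXT, model TEXT, color TEXT);
        CREATE TABLE Dealers    (dealerId INTEGER PRIMARY KEY, city TEXT, state TEXT);
        CREATE TABLE TimeOfSale (tid INTEGER PRIMARY KEY, year INTEGER, month INTEGER, day INTEGER);
        -- Sales is the fact table: one row per sale, keyed by the three dimensions.
        CREATE TABLE Sales (
            carId    INTEGER REFERENCES Cars(carId),
            dealerId INTEGER REFERENCES Dealers(dealerId),
            tid      INTEGER REFERENCES TimeOfSale(tid),
            price    REAL
        );
        """
    )
    return conn


conn = build_car_sales_db()
# Hypothetical sample data: a red VW Golf sold in Boston, MA.
conn.execute("INSERT INTO Cars VALUES (1, 'VW', 'Golf', 'red')")
conn.execute("INSERT INTO Dealers VALUES (123, 'Boston', 'MA')")
conn.execute("INSERT INTO TimeOfSale VALUES (1, 2005, 4, 11)")
conn.execute("INSERT INTO Sales VALUES (1, 123, 1, 18000.0)")

# Joining the fact table to its dimensions recovers the descriptive attributes.
row = conn.execute(
    """
    SELECT c.model, d.state, t.year, s.price
    FROM Sales s
    JOIN Cars c       ON s.carId = c.carId
    JOIN Dealers d    ON s.dealerId = d.dealerId
    JOIN TimeOfSale t ON s.tid = t.tid
    """
).fetchone()
print(row)
```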

OLTP Queries: Examples

 Create a new sales record that indicates that a red VW Golf was sold in Boston, MA

 See how many black and silver VW Passats were sold at dealership #123 on April 11th 2005

OLAP Queries: Examples
 Analyze comparative sales of the different colors of VW Golf by state

 See which months are particularly favorable to the sale of different VW models and colors

 Rank VW dealerships by revenue, displaying a ranked list of dealerships and % differences in sales between each dealership and the one ranked 1 place higher
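
The ranking query in the last bullet can be sketched in a few lines. This is only an illustration under assumed inputs: the dealer names and revenue figures are made up, and `rank_by_revenue` is a hypothetical helper, not something defined in the slides.

```python
def rank_by_revenue(revenue_by_dealer):
    """Return (rank, dealer, revenue, pct_diff) tuples, highest revenue first.

    pct_diff is the % difference in revenue relative to the dealership
    ranked one place higher (None for the top-ranked dealership).
    """
    ranked = sorted(revenue_by_dealer.items(), key=lambda kv: kv[1], reverse=True)
    result = []
    previous = None
    for rank, (dealer, revenue) in enumerate(ranked, start=1):
        if previous is None or previous == 0:
            pct_diff = None  # no higher-ranked dealership, or nothing to compare against
        else:
            pct_diff = 100.0 * (revenue - previous) / previous
        result.append((rank, dealer, revenue, pct_diff))
        previous = revenue
    return result


# Hypothetical revenue totals per dealership.
revenue = {"dealer_123": 500_000.0, "dealer_7": 400_000.0, "dealer_42": 300_000.0}
ranked = rank_by_revenue(revenue)
for line in ranked:
    print(line)
```

Note that the query needs the aggregated revenue of every dealership before any row of the answer can be produced, which is typical of OLAP workloads.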
OLAP vs. OLTP
                     OLTP                        OLAP
User                 Clerk, IT professional      Knowledge worker
Function             Day to day operations       Decision support
DB design            Application-oriented        Subject-oriented
                     (E-R based)                 (Star, snowflake)
Data                 Current, Isolated           Historical, Consolidated
View                 Detailed, Flat relational   Summarized, Multidimensional
Usage                Structured, Repetitive      Ad hoc
Unit of work         Short, simple transaction   Complex query
Access               Read/write                  Read mostly
Operations           Index/hash on prim. key     Lots of scans
# Records accessed   Tens                        Millions
# Users              Thousands                   Hundreds
Db size              100 MB - GB                 100 GB - TB
Metric               Trans. throughput           Query throughput, response
OLAP Queries: Challenges
 Many AND, OR in the WHERE clause
 Self-join, nested sub-queries
» Last year’s sales vs this year’s sales for each product
» Show reps for whom every sale has been more than $15000
 Extensive use of aggregation, often on related datasets
 Aggregation over time periods
 Ranking
 Use of statistical functions
 Very large datasets
 Expectation of an interactive response time

OLAP Query Tools
 Goal of OLAP is to support ad-hoc querying for
the business analyst (Power user)
 Business analysts are familiar with spreadsheets
 Extend spreadsheet analysis model to work with
warehouse data
» Large data set
» Semantically enriched to understand business terms
(e.g., time, geography)
» Combined with reporting features
 Multidimensional view of data is the foundation of
OLAP.
Multidimensional Data Model
 Database is a set of facts (points) in a
multidimensional space
 A fact has a measure dimension
» quantity that is analyzed, e.g., sale amount, budget
 A set of dimensions with respect to which data is
analyzed
» e.g., store, product, date associated with a sale amount
 Dimensions form a sparsely populated coordinate
system
 Each dimension has a set of attributes
» e.g., owner, city and county of store


Attribute Hierarchies
 Attributes of a dimension may be related
 An m:1 dependency is most common
 Dependency graph may be:
» Hierarchy: e.g.,
city -> state -> country
» Lattice:
date -> month -> year
date -> week -> year
 Hierarchies are most common
 Dependencies influence choice of operations and
data representation


Multidimensional Data
Sales volume as a function of product, time, geography
[Figure: data cube with dimensions Color (Red, Green, Blue, White, Silver, Black), State (WI, CA, NY), and Date (1-7); fact data: sales volume in $100]
 Dimensions: Color, State, Date
 Attributes: e.g., date (year, month, day)
 Attribute hierarchies and lattice:
» Product -> Category -> Industry
» City -> State -> Country
» Date -> Month -> Quarter -> Year
» Date -> Week -> Year

ROLAP and MOLAP
 Relational OLAP (ROLAP)
» Relational and Specialized Relational DBMS to store
and manage warehouse data
» OLAP middleware to support missing pieces
– Optimize for each DBMS backend
– Aggregation Navigation Logic
– Additional tools and services
 Multidimensional OLAP (MOLAP)
» Array-based storage structures
» Direct access to array data structures


Multiple Aggregations
 Create a 2-dimensional spreadsheet that shows
sum of sales by year as well as by model of car
 Each subtotal requires a separate aggregate query

[Figure: spreadsheet grid of STATE by YEAR, with a "Sum by Year" margin and a "Sum By State" margin]

Example:
Multiple Aggregations
        WI    CA   Total
2003    63    81     144
2004    38   107     145
2005    75    35     110
Total  176   223     399
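
The table can be reproduced from its six interior cells. The sketch below keeps one accumulator per margin; in SQL, each margin would correspond to its own GROUP BY query. The function name `cross_tab` is an assumption for illustration.

```python
from collections import defaultdict

# Base facts (year, state, sales), taken from the interior cells of the table.
facts = [
    (2003, "WI", 63), (2003, "CA", 81),
    (2004, "WI", 38), (2004, "CA", 107),
    (2005, "WI", 75), (2005, "CA", 35),
]


def cross_tab(rows):
    """Compute the interior cells plus the three kinds of subtotal."""
    by_cell = defaultdict(int)   # GROUP BY year, state
    by_year = defaultdict(int)   # GROUP BY year  (right-hand Total column)
    by_state = defaultdict(int)  # GROUP BY state (bottom Total row)
    grand_total = 0              # no GROUP BY    (bottom-right corner)
    for year, state, sales in rows:
        by_cell[(year, state)] += sales
        by_year[year] += sales
        by_state[state] += sales
        grand_total += sales
    return dict(by_cell), dict(by_year), dict(by_state), grand_total


by_cell, by_year, by_state, grand_total = cross_tab(facts)
print(by_year)      # {2003: 144, 2004: 145, 2005: 110}
print(by_state)     # {'WI': 176, 'CA': 223}
print(grand_total)  # 399
```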

Generalization: The Data Cube
 Base tuples
 Aggregate tuples:
» one aggregation for each subset of dimensions
(powerset)
» exponential number of subsets, but can optimize the
computation
 Example
» N = 3 dimensions
– model = {Golf, Jetta}
– color = {red, black, white}
– state = {NY, CA, WI}
» How many aggregate tuples in the data cube?
– face: aggregated over 1 dimension; edge: aggregated over 2 dimensions; corner: aggregated over all 3
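
One way to answer the question is to enumerate the cube. The sketch below assumes the cube is dense, i.e., every (model, color, state) combination actually occurs in the base data; with sparse data the counts are upper bounds. The function name `cube_cells` is an assumption for illustration.

```python
from itertools import combinations, product

dimensions = {
    "model": ["Golf", "Jetta"],
    "color": ["red", "black", "white"],
    "state": ["NY", "CA", "WI"],
}


def cube_cells(dims):
    """Enumerate every cell of the full cube; '*' marks an aggregated dimension."""
    names = list(dims)
    cells = []
    # One group-by for each subset of dimensions that is aggregated away.
    for k in range(len(names) + 1):
        for aggregated in combinations(names, k):
            domains = [["*"] if n in aggregated else dims[n] for n in names]
            cells.extend(product(*domains))
    return cells


cells = cube_cells(dimensions)
base = [c for c in cells if "*" not in c]
aggregates = [c for c in cells if "*" in c]
print(len(base), len(aggregates), len(cells))  # 18 30 48
```

The total is (2+1) x (3+1) x (3+1) = 48 cells: 18 base tuples plus 30 aggregate tuples (21 with one dimension aggregated, 8 with two, and 1 grand total).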

Operations on Multidimensional
Data Model
 Aggregation (roll-up) of detailed data to create summary
data
 Navigation to detailed data (drill-down) from summary
 Selection (slice) defines a subcube
– Project the cube on fewer dimensions by specifying
coordinates of remaining dimensions
– e.g., sales where state = NY and month = Jan
 Calculation
– Within a dimension, e.g., (sales - expense) by state
– Across dimensions
 Ranking
– top 3% of states by average sales
 Window Queries
Roll-up and Drill-Down
 Roll-Up: Use of aggregation
» dimension reduction:
– e.g., total sales by state by color
– e.g., total sales by state
» navigating attribute hierarchy:
– e.g., sales by city -> total sales by state -> total sales by
country
– e.g., total sales by city and year -> total sales by state and year
-> total sales by country
 Drill-Down: Inverse operation of roll-up
» Provides the data set that was aggregated
– e.g., show “base” data for total sales figure for CA state
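
Roll-up along an attribute hierarchy and the matching drill-down can be sketched as follows. The city names, the sales figures, and the helper names `roll_up` and `drill_down` are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical city -> state mapping (one level of the hierarchy) and city-level sales.
city_to_state = {"LA": "CA", "SF": "CA", "Madison": "WI"}
sales_by_city = {"LA": 40, "SF": 25, "Madison": 30}


def roll_up(measure_by_child, child_to_parent):
    """Aggregate a measure one level up the attribute hierarchy."""
    totals = defaultdict(int)
    for child, value in measure_by_child.items():
        totals[child_to_parent[child]] += value
    return dict(totals)


def drill_down(parent, measure_by_child, child_to_parent):
    """Return the detailed data that was aggregated into one parent value."""
    return {c: v for c, v in measure_by_child.items() if child_to_parent[c] == parent}


sales_by_state = roll_up(sales_by_city, city_to_state)
print(sales_by_state)                                      # {'CA': 65, 'WI': 30}
print(drill_down("CA", sales_by_city, city_to_state))      # {'LA': 40, 'SF': 25}
```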


Slice and Dice
 What colors of Golf are not doing so well?

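-- model and color are attributes of Cars, so join Sales with Cars
SELECT c.color, SUM(s.price)
FROM Sales s JOIN Cars c ON s.carId = c.carId
WHERE c.model = 'Golf'   -- slicing
GROUP BY c.color         -- dicing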

 Keep slicing if results are uniform

More Examples
Q: Given a query, which values from the CUBE do
we need to retrieve?

A: To answer a query Q use tuples T s.t.

» If Q groups by A, T must have a non-* value in its
component for A
» If Q slices by A = b, T must have the value b (not * or
any other value) in its component for A
» If Q neither groups nor slices by A, then T has to have
* in its component for A

Pivot (Rotate)

[Figure: data cube of sales volume (fact data, in $100) with dimensions Product (Juice, Cola, Milk, Cream, Toothpaste, Soap), City (LA, SF, NY), and Month (1-7)]
Result: cross tabulation
Warehouse Database Schema
 Entity-Relationship design techniques not
appropriate
 Design should reflect multidimensional view
 Typical schemas:
» Star Schema
» Snowflake Schema
» Fact Constellation Schema


Example of a Star Schema
Fact table: OrderNo, SalespersonID, CustomerNo, ProdNo, DateKey, CityName, Quantity, TotalPrice
Dimension tables:
» Order: OrderNo, OrderDate
» Customer: CustomerNo, CustomerName, CustomerAddress, City
» Salesperson: SalespersonID, SalespersonName, City, Quota
» Product: ProdNo, ProdName, ProdDescr, Category, CategoryDescr, UnitPrice, QOH
» Date: DateKey, Date, Month, Year
» City: CityName, State, Country

Star Schema and Variants
 A single fact table and a single table for each
dimension
 Generated keys are used for performance and
maintenance reasons
 Fact constellation: Multiple Fact tables that share
common dimension tables
» Example: ProjectedExpense and ActualExpense may
share dimensional tables
 Snowflake Schema: Represents dimensional
hierarchy by normalization


Example of a Snowflake Schema
Fact table: OrderNo, SalespersonID, CustomerNo, DateKey, CityName, ProdNo, Quantity, TotalPrice
Dimension tables (normalized along their hierarchies):
» Order: OrderNo, OrderDate
» Customer: CustomerNo, CustomerName, CustomerAddress, City
» Salesperson: SalespersonID, SalespersonName, City, Quota
» Product: ProdNo, ProdName, ProdDescr, Category, UnitPrice, QOH
– Category: CategoryName, CategoryDescr
» Date: DateKey, Date, Month
– Month: Month, Year
– Year: Year
» City: CityName, State
– State: StateName, Country

Performance Considerations
 Normalization of dimension tables is often unnecessary
» Read-only data, so no update anomalies
» Fewer joins with denormalized tables – better performance
 Pre-computation of summary tables
» Re-use can speed up performance
» How can we use pre-computed results effectively?
 Data is very large, dimension data often sparse
» Crucial to use indexes effectively
» Need for new indexing techniques: bitmap indexes, join
indexes

Bit Map Index
 An alternative representation of RID-list
 Comparison, join and aggregation operations are
reduced to bit arithmetic
 Especially advantageous for low-cardinality
domains
» Significant reduction in space and I/O (30:1)
» Adapted for higher cardinality domains
» Compression (e.g., run-length encoding) exploited
» Upper Bound of 2R words for any bitmap over R rows
[Hasan & Sinha, 1997]


Bitmap Index Example
Gender bitmap   Base table                        Rating bitmap
M  F            custid  name  gender  rating      1  2  3  4  5
1  0            112     Joe   M       3           0  0  1  0  0
1  0            115     Ram   M       5           0  0  0  0  1
0  1            119     Sue   F       5           0  0  0  0  1
1  0            116     Woo   M       4           0  0  0  1  0
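
The example can be sketched with Python integers used as bit vectors, where bit i of a bitmap stands for row i of the base table. The helper names `build_bitmap_index` and `matching_rows` are assumptions; real systems store compressed bitmaps rather than plain integers.

```python
# Base table from the example: (custid, name, gender, rating).
rows = [
    (112, "Joe", "M", 3),
    (115, "Ram", "M", 5),
    (119, "Sue", "F", 5),
    (116, "Woo", "M", 4),
]


def build_bitmap_index(rows, column):
    """Map each distinct column value to an int whose bit i is set when row i has it."""
    index = {}
    for i, row in enumerate(rows):
        index[row[column]] = index.get(row[column], 0) | (1 << i)
    return index


def matching_rows(bitmap, rows):
    """Return the rows whose bit is set in the bitmap."""
    return [row for i, row in enumerate(rows) if (bitmap >> i) & 1]


gender = build_bitmap_index(rows, 2)
rating = build_bitmap_index(rows, 3)

# The predicate "gender = 'M' AND rating = 5" becomes a single bitwise AND.
hits = gender["M"] & rating[5]
print([r[1] for r in matching_rows(hits, rows)])  # ['Ram']

# Counting the male customers is a population count on one bitmap.
print(bin(gender["M"]).count("1"))  # 3
```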
Join Index
 A traditional index maps the value in a column to a
list of rows with that value
 A join index maintains the relationship between an
attribute value of a dimension and the matching
rows in the fact table
 A join index may span multiple dimensions
(composite join index)
» Use the join index to identify regions of the Cartesian product
that are of interest
» e.g., few people in Southern California may buy umbrellas

Algorithm Using Bitmapped Join
Indexes
 [O’Neil&Graefe95]
 Maintain bit mapped join indexes between each
dimension table and the fact table
 To answer a query over multiple dimensions
» Take intersection of join indexes until the set of
candidate fact tuples is small
» Do foreign key joins with rest of the dimension tables
» Look up the fact table
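
The intersection step can be sketched as follows, again using integers as bitmaps over the fact rows. The tiny schema, its data, and the helper `build_join_index` are assumptions for illustration; the foreign key joins with the remaining dimension tables are omitted.

```python
# Hypothetical star schema fragment: fact rows reference dimension keys.
dealers = {1: {"state": "CA"}, 2: {"state": "NY"}}
cars = {10: {"color": "red"}, 11: {"color": "blue"}}
facts = [  # (dealerId, carId, price)
    (1, 10, 100),
    (1, 11, 200),
    (2, 10, 300),
    (1, 10, 400),
]


def build_join_index(facts, fk_position, dimension, attribute):
    """Bitmapped join index: dimension attribute value -> bitmap of matching fact rows."""
    index = {}
    for i, fact in enumerate(facts):
        value = dimension[fact[fk_position]][attribute]
        index[value] = index.get(value, 0) | (1 << i)
    return index


state_index = build_join_index(facts, 0, dealers, "state")
color_index = build_join_index(facts, 1, cars, "color")

# Intersect the join indexes to get the candidate fact tuples ...
candidates = state_index["CA"] & color_index["red"]
# ... and look up only those rows in the fact table.
total = sum(f[2] for i, f in enumerate(facts) if (candidates >> i) & 1)
print(total)  # 500
```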

Join Index over Star Schema
[Figure: the star schema from "Example of a Star Schema": a fact table surrounded by the Order, Customer, Salesperson, Product, Date, and City dimension tables]
ROLAP:
Handling of Aggregate Views
 Important component for ROLAP Servers
 Choice of aggregate views to materialize
 Physical representation of Materialized Views in
the star schema
 Logic for Aggregation Navigation
» make optimum use of materialized aggregates to
answer a query

ROLAP: Choice of Aggregate
Views to Materialize
 Storage can increase dramatically if precomputed
views are not chosen properly
 Must take into account queries in the workload,
their frequencies and their costs
 The decision must be taken in the broader context
of physical database design
» e.g., should take into account the choice of indexes
 Heuristic approaches adopted in products

ROLAP: Using Materialized
Views Through Selection
 A query can use a view through a selection if
» each selection condition C on each dimension d
in the query logically implies a condition C’ on
dimension d in the view
 Example: A view has sum(sales) by product and
by year for products introduced after 1991
» OK to use for sum(sales) by product for
products introduced after 1992
» CANNOT use for sum(sales) for products
introduced after 1989
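
The implication test in the example can be sketched for the special case where every condition has the form "introduced after year X". The dictionary layout and the helper names `implies_after` and `view_usable` are assumptions; a real optimizer has to reason about arbitrary predicates.

```python
def implies_after(query_year, view_year):
    """'introduced after query_year' logically implies 'introduced after view_year'."""
    return query_year >= view_year


def view_usable(query_conditions, view_conditions):
    """Check that the query's condition implies the view's condition on each dimension.

    Both arguments map a dimension name to the year X of an 'after X' condition.
    A dimension missing from a dict carries no condition (always true).
    """
    for dim, view_year in view_conditions.items():
        if dim not in query_conditions:
            return False  # the query needs rows that the view filtered out
        if not implies_after(query_conditions[dim], view_year):
            return False
    return True


# View: sum(sales) by product and by year for products introduced after 1991.
view = {"product_introduced": 1991}
print(view_usable({"product_introduced": 1992}, view))  # True
print(view_usable({"product_introduced": 1989}, view))  # False
```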

Using Materialized Views
through Group By (Roll Up)
 The view V may be applicable via roll-up if, for
every grouping attribute g of the query Q:
» Q has Group By a1, ..., g, ..., an
» V has Group By a1, ..., h, ..., an
» Attribute g is higher than h in the attribute
hierarchy
» Aggregation functions are distributive
 Example: Compute “sum(sales) by category” from
the view “sum(sales) by product”
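
The distributivity requirement can be illustrated with a small sketch. SUM and COUNT can be re-aggregated from the view, whereas an average cannot be obtained by averaging the per-product averages. The view contents, the category mapping, and the helper `roll_up_sum_count` are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical materialized view: product -> (sum of sales, number of sales).
view = {"Cola": (60, 4), "Juice": (10, 1), "Soap": (30, 3)}
# Hypothetical product -> category hierarchy.
category_of = {"Cola": "Beverages", "Juice": "Beverages", "Soap": "Hygiene"}


def roll_up_sum_count(view, parent_of):
    """Re-aggregate SUM and COUNT at the coarser level; both are distributive."""
    totals = defaultdict(lambda: [0, 0])
    for product, (partial_sum, partial_count) in view.items():
        totals[parent_of[product]][0] += partial_sum
        totals[parent_of[product]][1] += partial_count
    return {category: tuple(pair) for category, pair in totals.items()}


by_category = roll_up_sum_count(view, category_of)
print(by_category)  # {'Beverages': (70, 5), 'Hygiene': (30, 3)}

# The correct average is derived from the rolled-up sum and count ...
correct_avg = by_category["Beverages"][0] / by_category["Beverages"][1]
# ... while averaging the per-product averages gives a different answer.
naive_avg = (60 / 4 + 10 / 1) / 2
print(correct_avg, naive_avg)  # 14.0 12.5
```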

Data Warehouse
 A decision support database that is maintained
separately from the organization’s operational
databases.
 A data warehouse is a
– subject-oriented,
– integrated,
– time-varying,
– non-volatile
collection of data that is used primarily in
organizational decision making.

-- W.H. Inmon, Building the Data Warehouse, 1992.

Why Separate Data Warehouse
 Performance
» Op dbs designed & tuned for known trans. workloads.
» Complex OLAP queries would degrade performance
for operational transactions.
» Special data organization, access & implementation
methods needed for multidimensional views & queries.
 Function
» Missing data: Decision support requires historical data,
which op dbs do not typically maintain.
» Data consolidation: Decision support requires data
consolidation (aggregation, summarization) from many
heterogeneous sources: op dbs, external sources.
» Data quality: Different sources typically use
inconsistent data representations, codes, and formats,
which have to be reconciled.

Data Warehousing Architecture
[Figure: data sources (operational dbs, external sources) -> Extract, Transform, Transport -> data warehouse and data marts, with a metadata repository and monitoring & administration -> Serve -> OLAP servers -> front-end tools (OLAP, query/reporting, data mining)]

Data Warehouse vs. Data Marts
 Enterprise data warehouse: collects all information
about subjects (customers, products, sales, assets,
personnel) that span the entire organization.
» Requires extensive business modeling.
» May take years to design and build.
 Data Marts: Departmental subsets that focus on
selected subjects.
» Marketing data mart: customer, products, sales.
» Faster roll out, but complex integration in the long run.
 Virtual warehouse: views over operational dbs
» materialize some summary views for efficient query
processing
» easier to build
» requires excess capacity on operational db servers.

Three-Tier Architecture
 Warehouse database server
» almost always a relational DBMS; rarely flat files.
 OLAP servers
» Relational OLAP (ROLAP): extended relational
DBMS that maps operations on multidimensional data
to standard relational operations.
» Multidimensional OLAP (MOLAP): special purpose
server that directly implements multidimensional data
and operations.
 Clients
» Query and reporting tools.
» Analysis tools.
» Data mining tools.

Populating & Refreshing the
Warehouse
 Data extraction
 Data cleaning
 Data transformation
» Convert from legacy/host format to warehouse format
 Load
» Sort, summarize, consolidate, compute views, check
integrity, build indexes, partition
 Refresh
» Propagate updates from sources to the warehouse.

Data Cleaning
 Why?
» Data warehouse contains data that is analyzed for
business decisions
» More data and multiple sources could mean more errors
» Results in incorrect analysis
 Detecting data anomalies and rectifying them early
has huge payoffs
 Important to identify tools that work together well
 Long Term Solution
» Change business practices and data entry tools
» Repository for metadata

Load
 Issues:
» huge volumes of data to be loaded
» small time window (usually at night) when the
warehouse can be taken off-line
» when to build indexes and summary tables
» allow system administrator to monitor status, cancel,
suspend, resume load, or change load rate
» restart after failure with no loss of data integrity.
 Techniques:
» batch load utility: sort input records on clustering key
and use sequential I/O; build indexes and derived tables
» sequential loads still too long (~100 days for TB)
» use parallelism and incremental techniques.
Parallel Load
Pipelined and partitioned parallelism

Source tables -> Scan -> Sort runs -> Merge runs -> Table insert -> Target tables

Build index record -> Sort runs -> Merge runs -> Index insert -> Target index

[Barclay, Barnes, Gray, Sundaresan: Loading Databases Using Dataflow Parallelism]

Incremental Load
 Full load may still take too long.
» entire load is a (long) batch transaction
» replace old table with new after transaction commits
» use periodic checkpoints; after failure, restart from last
checkpoint.
 Use incremental loads during refresh to reduce data
volume
» insert only updated tuples
» now, incremental load conflicts with queries
» break into sequence of shorter transactions (every
~1000 records, every few seconds)
» coordinate this sequence of transactions: must ensure
consistency between base tables and derived tables &
indices.
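
Breaking an incremental load into a sequence of short transactions can be sketched with SQLite. The table layout, the generated delta rows, and the helper `incremental_load` are assumptions for illustration; keeping derived tables and indices consistent with the base table is not handled here.

```python
import sqlite3
from itertools import islice


def incremental_load(conn, updated_rows, batch_size=1000):
    """Insert rows in short transactions of batch_size rows; return the commit count."""
    rows = iter(updated_rows)
    commits = 0
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            return commits
        # The connection context manager commits on success and rolls back
        # the current batch on error, so queries only wait for one batch.
        with conn:
            conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?)", batch)
        commits += 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (carId INTEGER, dealerId INTEGER, tid INTEGER, price REAL)")

# Hypothetical delta: only the tuples that changed since the last refresh.
delta = [(i, 1, 1, 100.0) for i in range(2500)]
n_commits = incremental_load(conn, delta)
loaded = conn.execute("SELECT COUNT(*) FROM Sales").fetchone()[0]
print(n_commits, loaded)  # 3 2500
```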

Refresh
 Issues:
» when to refresh
– on every update: too expensive, only necessary if
OLAP queries need current data (e.g., up-to-the-
minute stock quotes)
– periodically (e.g., every 24 hours, every week) or
after “significant” events
– refresh policy set by administrator based on user
needs and traffic
– possibly different policies for different sources.
» how to refresh.

Refresh Techniques
 Full extract from base tables
» read entire source table or database: expensive
» may be the only choice for legacy databases or files.
 Incremental techniques (related to work on active dbs)
» detect & propagate changes on base tables: replication
servers
– snapshots & triggers (Oracle)
– transaction shipping (Sybase)
» logical correctness
– computing changes to star tables
– computing changes to derived and summary tables
– optimization: only significant changes
» transactional correctness: incremental load.
