2025-Handouts - OLAP - Lecture 1

Chapter 4 discusses the fundamental concepts and techniques of data warehousing and online analytical processing (OLAP). It defines a data warehouse as a subject-oriented, integrated, time-variant, and nonvolatile collection of data that supports decision-making processes. The chapter also covers data warehouse architecture, models, and the extraction, transformation, and loading (ETL) processes necessary for effective data management.


Data Mining: Concepts and Techniques (3rd ed.)
Chapter 4
Source: Jiawei Han, Micheline Kamber, and Jian Pei
University of Illinois at Urbana-Champaign & Simon Fraser University

Chapter 4: Data Warehousing and On-line Analytical Processing
◼ Data Warehouse: Basic Concepts
◼ Data Warehouse Modeling: Data Cube and OLAP
◼ Data Warehouse Design and Usage
◼ Data Warehouse Implementation
◼ Data Generalization by Attribute-Oriented Induction
◼ Summary

What is a Data Warehouse?
◼ Defined in many different ways, but not rigorously.
  ◼ A decision support database that is maintained separately from the organization’s operational database
  ◼ Support information processing by providing a solid platform of consolidated, historical data for analysis.
◼ “A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process.” (W. H. Inmon)
◼ Data warehousing:
  ◼ The process of constructing and using data warehouses

Data Warehouse: Subject-Oriented
◼ Organized around major subjects, such as customer, product, sales
◼ Focusing on the modeling and analysis of data for decision makers, not on daily operations or transaction processing
◼ Provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process

Data Warehouse: Integrated
◼ Constructed by integrating multiple, heterogeneous data sources
  ◼ relational databases, flat files, on-line transaction records
◼ Data cleaning and data integration techniques are applied.
  ◼ Ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources
    ◼ E.g., Hotel price: currency, tax, breakfast covered, etc.
  ◼ When data is moved to the warehouse, it is converted.

Data Warehouse: Time Variant
◼ The time horizon for the data warehouse is significantly longer than that of operational systems
  ◼ Operational database: current value data
  ◼ Data warehouse data: provide information from a historical perspective (e.g., past 5-10 years)
◼ Every key structure in the data warehouse
  ◼ Contains an element of time, explicitly or implicitly
  ◼ But the key of operational data may or may not contain “time element”
Data Warehouse: Nonvolatile
◼ A physically separate store of data transformed from the operational environment
◼ Operational update of data does not occur in the data warehouse environment
  ◼ Does not require transaction processing, recovery, and concurrency control mechanisms
  ◼ Requires only two operations in data accessing:
    ◼ initial loading of data and access of data

OLTP vs. OLAP

                      OLTP                                 OLAP
users                 clerk, IT professional               knowledge worker
function              day to day operations                decision support
DB design             application-oriented                 subject-oriented
data                  current, up-to-date, detailed,       historical, summarized,
                      flat relational, isolated            multidimensional, integrated, consolidated
usage                 repetitive                           ad-hoc
access                read/write,                          lots of scans
                      index/hash on prim. key
unit of work          short, simple transaction            complex query
# records accessed    tens                                 millions
# users               thousands                            hundreds
DB size               100MB-GB                             100GB-TB
metric                transaction throughput               query throughput, response

Why a Separate Data Warehouse?
◼ High performance for both systems
  ◼ DBMS: tuned for OLTP: access methods, indexing, concurrency control, recovery
  ◼ Warehouse: tuned for OLAP: complex OLAP queries, multidimensional view, consolidation
◼ Different functions and different data:
  ◼ missing data: Decision support requires historical data which operational DBs do not typically maintain
  ◼ data consolidation: DS requires consolidation (aggregation, summarization) of data from heterogeneous sources
  ◼ data quality: different sources typically use inconsistent data representations, codes and formats which have to be reconciled
◼ Note: There are more and more systems which perform OLAP analysis directly on relational databases

Data Warehouse: A Multi-Tiered Architecture
[Figure: four tiers, left to right. Data Sources (operational DBs, other sources) feed Extract / Transform / Load / Refresh, overseen by a Monitor & Integrator with Metadata. Data Storage holds the Data Warehouse and Data Marts. The OLAP Engine (OLAP Server) serves the Front-End Tools: analysis, query, reports, data mining.]

Three Data Warehouse Models
◼ Enterprise warehouse
  ◼ collects all of the information about subjects spanning the entire organization
◼ Data Mart
  ◼ a subset of corporate-wide data that is of value to a specific group of users. Its scope is confined to specific, selected groups, such as a marketing data mart
    ◼ Independent vs. dependent (directly from warehouse) data mart
◼ Virtual warehouse
  ◼ A set of views over operational databases
  ◼ Only some of the possible summary views may be materialized

Extraction, Transformation, and Loading (ETL)
◼ Data extraction
  ◼ get data from multiple, heterogeneous, and external sources
◼ Data cleaning
  ◼ detect errors in the data and rectify them when possible
◼ Data transformation
  ◼ convert data from legacy or host format to warehouse format
◼ Load
  ◼ sort, summarize, consolidate, compute views, check integrity, and build indices and partitions
◼ Refresh
  ◼ propagate the updates from the data sources to the warehouse
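The ETL steps on the slide (extraction, cleaning, transformation, load) can be sketched as a small pipeline. This is a minimal illustration, not a description of any real ETL tool: the two source layouts, the field names, and the rule "skip rows that cannot be parsed" are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical extracts from two operational systems with inconsistent formats.
SOURCE_A = [{"cust": "C1", "amount": "100.50"},
            {"cust": "C2", "amount": "n/a"}]
SOURCE_B = [{"customer_id": "C1", "amt_cents": 2500},
            {"customer_id": "C3", "amt_cents": 9900}]

def extract():
    """Data extraction: pull rows from multiple heterogeneous sources."""
    for row in SOURCE_A:
        yield "A", row
    for row in SOURCE_B:
        yield "B", row

def clean_and_transform(tagged_rows):
    """Cleaning + transformation: drop unparseable rows, unify the schema."""
    for source, row in tagged_rows:
        try:
            if source == "A":
                yield {"customer": row["cust"], "dollars": float(row["amount"])}
            else:
                yield {"customer": row["customer_id"],
                       "dollars": row["amt_cents"] / 100}
        except (KeyError, ValueError):
            continue  # assumed policy: skip rows that cannot be rectified

def load(rows):
    """Load: consolidate and summarize into a toy warehouse table."""
    warehouse = defaultdict(float)
    for row in rows:
        warehouse[row["customer"]] += row["dollars"]
    return dict(sorted(warehouse.items()))

warehouse = load(clean_and_transform(extract()))
print(warehouse)
```

The refresh step would rerun the same pipeline on newly arrived source rows; it is left out to keep the sketch short.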
Metadata Repository
◼ Meta data is the data defining warehouse objects. It stores:
  ◼ Description of the structure of the data warehouse
    ◼ schema, view, dimensions, hierarchies, derived data defn, data mart locations and contents
  ◼ Operational meta-data
    ◼ data lineage (history of migrated data and transformation path), currency of data (active, archived, or purged), monitoring information (warehouse usage statistics, error reports, audit trails)
  ◼ The algorithms used for summarization
  ◼ The mapping from operational environment to the data warehouse
  ◼ Data related to system performance
    ◼ warehouse schema, view and derived data definitions
  ◼ Business data
    ◼ business terms and definitions, ownership of data, charging policies

From Tables and Spreadsheets to Data Cubes
◼ A data warehouse is based on a multidimensional data model which views data in the form of a data cube
◼ A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions
  ◼ Dimension tables, such as item (item_name, brand, type), or time (day, week, month, quarter, year)
  ◼ Fact table contains measures (such as dollars_sold) and keys to each of the related dimension tables
◼ In data warehousing literature, an n-D base cube is called a base cuboid. The topmost 0-D cuboid, which holds the highest level of summarization, is called the apex cuboid. The lattice of cuboids forms a data cube.

Cube: A Lattice of Cuboids
0-D (apex) cuboid:  all
1-D cuboids:        time; item; location; supplier
2-D cuboids:        time,item; time,location; time,supplier; item,location; item,supplier; location,supplier
3-D cuboids:        time,item,location; time,item,supplier; time,location,supplier; item,location,supplier
4-D (base) cuboid:  time, item, location, supplier
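The lattice of cuboids for the four dimensions time, item, location, and supplier can be enumerated mechanically: each cuboid is one subset of the dimensions. The sketch below assumes no concept hierarchies (one level per dimension); the function name is made up for the example.

```python
from itertools import combinations

def cuboid_lattice(dimensions):
    """Return {k: [k-D cuboids]}, one cuboid per subset of the dimensions."""
    return {k: list(combinations(dimensions, k))
            for k in range(len(dimensions) + 1)}

lattice = cuboid_lattice(["time", "item", "location", "supplier"])
for k, cuboids in lattice.items():
    print(f"{k}-D:", cuboids)

total = sum(len(cuboids) for cuboids in lattice.values())
print("total cuboids:", total)
```

The 0-D entry is the single empty grouping (the apex cuboid) and the 4-D entry is the base cuboid.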

Conceptual Modeling of Data Warehouses
◼ Modeling data warehouses: dimensions & measures
  ◼ Star schema: A fact table in the middle connected to a set of dimension tables
  ◼ Snowflake schema: A refinement of star schema where some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to snowflake
  ◼ Fact constellations: Multiple fact tables share dimension tables, viewed as a collection of stars, therefore called galaxy schema or fact constellation

Example of Star Schema
Sales Fact Table:  time_key, item_key, branch_key, location_key
        Measures:  units_sold, dollars_sold, avg_sales
time:      time_key, day, day_of_the_week, month, quarter, year
item:      item_key, item_name, brand, type, supplier_type
branch:    branch_key, branch_name, branch_type
location:  location_key, street, city, state_or_province, country
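A star schema is queried by joining the fact table to the dimension tables it references and aggregating a measure. The sketch below builds a cut-down version of the sales star schema in an in-memory SQLite database. Only two dimensions are kept, the table names time_dim and item_dim are chosen for the example, and all rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT);
CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES time_dim(time_key),
    item_key     INTEGER REFERENCES item_dim(item_key),
    units_sold   INTEGER,
    dollars_sold REAL
);
INSERT INTO time_dim VALUES (1, 'Q1', 2004), (2, 'Q2', 2004);
INSERT INTO item_dim VALUES (10, 'TV', 'BrandA'), (11, 'PC', 'BrandB');
INSERT INTO sales_fact VALUES
    (1, 10, 2, 800.0), (1, 11, 1, 1200.0), (2, 10, 3, 1200.0);
""")

# Join the fact table to its dimensions, then aggregate the measure.
rows = conn.execute("""
    SELECT t.year, i.item_name, SUM(f.dollars_sold)
    FROM sales_fact f
    JOIN time_dim t ON f.time_key = t.time_key
    JOIN item_dim i ON f.item_key = i.item_key
    GROUP BY t.year, i.item_name
    ORDER BY i.item_name
""").fetchall()
print(rows)
```

In a snowflake variant, item_dim would hold a supplier_key pointing to a separate supplier table instead of storing supplier_type directly.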
Example of Snowflake Schema
Sales Fact Table:  time_key, item_key, branch_key, location_key
        Measures:  units_sold, dollars_sold, avg_sales
time:      time_key, day, day_of_the_week, month, quarter, year
item:      item_key, item_name, brand, type, supplier_key
supplier:  supplier_key, supplier_type
branch:    branch_key, branch_name, branch_type
location:  location_key, street, city_key
city:      city_key, city, state_or_province, country

Example of Fact Constellation
Sales Fact Table:     time_key, item_key, branch_key, location_key
           Measures:  units_sold, dollars_sold, avg_sales
Shipping Fact Table:  time_key, item_key, shipper_key, from_location, to_location
           Measures:  dollars_cost, units_shipped
time:      time_key, day, day_of_the_week, month, quarter, year
item:      item_key, item_name, brand, type, supplier_type
branch:    branch_key, branch_name, branch_type
location:  location_key, street, city, province_or_state, country
shipper:   shipper_key, shipper_name, location_key, shipper_type

A Concept Hierarchy: Dimension (location)
all:      all
region:   Europe ... North_America
country:  Germany ... Spain (Europe); Canada ... Mexico (North_America)
city:     Frankfurt ... (Germany); Vancouver ... Toronto (Canada)
office:   L. Chan ... M. Wind

Data Cube Measures: Three Categories
◼ Distributive: if the result derived by applying the function to n aggregate values is the same as that derived by applying the function on all the data without partitioning
  ◼ E.g., count(), sum(), min(), max()
◼ Algebraic: if it can be computed by an algebraic function with M arguments (where M is a bounded integer), each of which is obtained by applying a distributive aggregate function
  ◼ E.g., avg(), min_N(), standard_deviation()
◼ Holistic: if there is no constant bound on the storage size needed to describe a subaggregate.
  ◼ E.g., median(), mode(), rank()
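The three measure categories can be checked on a small partitioned data set. The sketch below uses made-up numbers: it combines per-partition sums (distributive), derives the average from sum and count (algebraic), and shows that combining per-partition medians does not give the true median (holistic).

```python
from statistics import median

partitions = [[1, 2, 3], [4, 5], [6, 7, 8, 100]]
all_values = [v for part in partitions for v in part]

# Distributive: the sum of partial sums equals the sum over all the data.
partial_sums = [sum(p) for p in partitions]
total = sum(partial_sums)

# Algebraic: avg() is computed from two distributive aggregates, sum and count.
partial_counts = [len(p) for p in partitions]
average = sum(partial_sums) / sum(partial_counts)

# Holistic: per-partition medians are not enough to recover the true median.
median_of_medians = median(median(p) for p in partitions)
true_median = median(all_values)

print(total, average, median_of_medians, true_median)
```

This is why a median cannot be rolled up from stored subaggregates the way a sum or an average can.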

View of Warehouses and Hierarchies
Specification of hierarchies
◼ Schema hierarchy
  day < {month < quarter; week} < year
◼ Set_grouping hierarchy
  {1..10} < inexpensive

Multidimensional Data
◼ Sales volume as a function of product, month, and region
Dimensions: Product, Location, Time
Hierarchical summarization paths:
  Product:   Industry > Category > Product
  Location:  Region > Country > City > Office
  Time:      Year > Quarter > Month (or Week) > Day
[Figure: cube with axes Product, Region, Month]
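Both kinds of hierarchy on the slide can be written as mapping functions from a low-level value to its higher-level concept. The sketch below is a hedged illustration: the label formats and the "other" bucket for prices outside {1..10} are assumptions, and the week branch of the schema hierarchy is omitted.

```python
from datetime import date

def time_levels(d):
    """Schema hierarchy day < month < quarter < year for one date."""
    return {
        "day": d.isoformat(),
        "month": f"{d.year}-{d.month:02d}",
        "quarter": f"{d.year}-Q{(d.month - 1) // 3 + 1}",
        "year": d.year,
    }

def price_group(price):
    """Set-grouping hierarchy: {1..10} < inexpensive (other ranges assumed)."""
    return "inexpensive" if 1 <= price <= 10 else "other"

levels = time_levels(date(2004, 8, 15))
print(levels)
print(price_group(7), price_group(250))
```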
A Sample Data Cube
[Figure: 3-D cube with dimensions Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (TV, PC, VCR, sum), and Country (U.S.A., Canada, Mexico, sum). One highlighted cell holds the total annual sales of TVs in U.S.A.; the corner cell is (All, All, All).]

Cuboids Corresponding to the Cube
0-D (apex) cuboid:  all
1-D cuboids:        product; date; country
2-D cuboids:        product,date; product,country; date,country
3-D (base) cuboid:  product, date, country

Typical OLAP Operations
◼ Roll up (drill-up): summarize data
  ◼ by climbing up hierarchy or by dimension reduction
◼ Drill down (roll down): reverse of roll-up
  ◼ from higher level summary to lower level summary or detailed data, or introducing new dimensions
◼ Slice and dice: project and select
◼ Pivot (rotate):
  ◼ reorient the cube, visualization, 3D to series of 2D planes
◼ Other operations
  ◼ drill across: involving (across) more than one fact table
  ◼ drill through: through the bottom level of the cube to its back-end relational tables (using SQL)

[Figure: Fig. 3.10 Typical OLAP Operations]
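Roll-up, slice, and dice can be illustrated on a toy cube stored as a dictionary keyed by (city, quarter, item). This is a sketch under stated assumptions: the cell values, the city-to-country mapping, and the function names are invented for the example, and real OLAP servers work on much richer structures.

```python
from collections import defaultdict

# Toy base cuboid: (city, quarter, item) -> dollars_sold. Values are made up.
CUBE = {
    ("Vancouver", "Q1", "TV"): 100, ("Vancouver", "Q2", "TV"): 150,
    ("Toronto", "Q1", "TV"): 200, ("Toronto", "Q1", "PC"): 300,
    ("Chicago", "Q2", "PC"): 250,
}
CITY_TO_COUNTRY = {"Vancouver": "Canada", "Toronto": "Canada", "Chicago": "USA"}

def roll_up_location(cube):
    """Roll up by climbing the location hierarchy: city -> country."""
    out = defaultdict(int)
    for (city, quarter, item), value in cube.items():
        out[(CITY_TO_COUNTRY[city], quarter, item)] += value
    return dict(out)

def roll_up_drop_item(cube):
    """Roll up by dimension reduction: aggregate the item dimension away."""
    out = defaultdict(int)
    for (city, quarter, _item), value in cube.items():
        out[(city, quarter)] += value
    return dict(out)

def slice_quarter(cube, quarter):
    """Slice: select on a single dimension (quarter = given value)."""
    return {k: v for k, v in cube.items() if k[1] == quarter}

def dice(cube, cities, items):
    """Dice: select on two or more dimensions at once."""
    return {k: v for k, v in cube.items() if k[0] in cities and k[2] in items}

print(roll_up_location(CUBE))
print(slice_quarter(CUBE, "Q1"))
```

Drill-down is the reverse direction and needs the finer-grained cells, so it cannot be computed from a rolled-up result alone.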

A Star-Net Query Model
[Figure: radial lines from a central point, one per dimension, each marked with abstraction levels. Customer Orders: CONTRACTS, ORDER. Shipping Method: AIR-EXPRESS, TRUCK. Time: ANNUALLY, QTRLY, DAILY. Product: PRODUCT LINE, PRODUCT ITEM, PRODUCT GROUP. Location: CITY, COUNTRY, REGION. Organization: SALES PERSON, DISTRICT, DIVISION. Also shown: Customer, Promotion. Each circle is called a footprint.]

Browsing a Data Cube
◼ Visualization
◼ OLAP capabilities
◼ Interactive manipulation
Design of Data Warehouse: A Business Analysis Framework
◼ Four views regarding the design of a data warehouse
  ◼ Top-down view
    ◼ allows selection of the relevant information necessary for the data warehouse
  ◼ Data source view
    ◼ exposes the information being captured, stored, and managed by operational systems
  ◼ Data warehouse view
    ◼ consists of fact tables and dimension tables
  ◼ Business query view
    ◼ sees the perspectives of data in the warehouse from the view of end-user

Data Warehouse Design Process
◼ Top-down, bottom-up approaches or a combination of both
  ◼ Top-down: Starts with overall design and planning (mature)
  ◼ Bottom-up: Starts with experiments and prototypes (rapid)
◼ From software engineering point of view
  ◼ Waterfall: structured and systematic analysis at each step before proceeding to the next
  ◼ Spiral: rapid generation of increasingly functional systems, short turnaround time
◼ Typical data warehouse design process
  ◼ Choose a business process to model, e.g., orders, invoices, etc.
  ◼ Choose the grain (atomic level of data) of the business process
  ◼ Choose the dimensions that will apply to each fact table record
  ◼ Choose the measure that will populate each fact table record

Data Warehouse Development: A Recommended Approach
[Figure: Define a high-level corporate data model; model refinement leads to Data Marts on one side and the Enterprise Data Warehouse on the other; these lead to Distributed Data Marts and finally to a Multi-Tier Data Warehouse.]

Data Warehouse Usage
◼ Three kinds of data warehouse applications
  ◼ Information processing
    ◼ supports querying, basic statistical analysis, and reporting using crosstabs, tables, charts and graphs
  ◼ Analytical processing
    ◼ multidimensional analysis of data warehouse data
    ◼ supports basic OLAP operations, slice-dice, drilling, pivoting
  ◼ Data mining
    ◼ knowledge discovery from hidden patterns
    ◼ supports associations, constructing analytical models, performing classification and prediction, and presenting the mining results using visualization tools

From On-Line Analytical Processing (OLAP) to On-Line Analytical Mining (OLAM)
◼ Why online analytical mining?
  ◼ High quality of data in data warehouses
    ◼ DW contains integrated, consistent, cleaned data
  ◼ Available information processing structure surrounding data warehouses
    ◼ ODBC, OLEDB, Web accessing, service facilities, reporting and OLAP tools
  ◼ OLAP-based exploratory data analysis
    ◼ Mining with drilling, dicing, pivoting, etc.
  ◼ On-line selection of data mining functions
    ◼ Integration and swapping of multiple mining functions, algorithms, and tasks
Efficient Data Cube Computation
◼ Data cube can be viewed as a lattice of cuboids
  ◼ The bottom-most cuboid is the base cuboid
  ◼ The top-most cuboid (apex) contains only one cell
  ◼ How many cuboids are there in an n-dimensional cube where dimension i has $L_i$ levels?
    $T = \prod_{i=1}^{n} (L_i + 1)$
◼ Materialization of data cube
  ◼ Materialize every cuboid (full materialization), none (no materialization), or some (partial materialization)
  ◼ Selection of which cuboids to materialize
    ◼ Based on size, sharing, access frequency, etc.
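The cuboid-count formula from the Efficient Data Cube Computation slide multiplies, for every dimension, its number of hierarchy levels plus one (the extra one stands for the virtual top level "all"). A minimal sketch, with a function name chosen for the example:

```python
from math import prod

def total_cuboids(levels_per_dimension):
    """T = product over all dimensions of (L_i + 1)."""
    return prod(levels + 1 for levels in levels_per_dimension)

# Four dimensions without hierarchies (one level each): 2**4 cuboids.
print(total_cuboids([1, 1, 1, 1]))

# Ten dimensions with four levels each: 5**10 cuboids.
print(total_cuboids([4] * 10))
```

The second call shows why full materialization quickly becomes impractical as dimensions and levels grow.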

The “Compute Cube” Operator
◼ Cube definition and computation in DMQL
    define cube sales [item, city, year]: sum (sales_in_dollars)
    compute cube sales
◼ Transform it into a SQL-like language (with a new operator cube by, introduced by Gray et al.’96)
    SELECT item, city, year, SUM (amount)
    FROM SALES
    CUBE BY item, city, year
◼ Need to compute the following Group-Bys
    (city, item, year),
    (city, item), (city, year), (item, year),
    (city), (item), (year)
    ()
[Figure: lattice of the group-bys, from () at the top through (city), (item), (year) and (city, item), (city, year), (item, year) down to (city, item, year)]

Indexing OLAP Data: Bitmap Index
◼ Index on a particular column
◼ Each value in the column has a bit vector: bit-op is fast
◼ The length of the bit vector: # of records in the base table
◼ The i-th bit is set if the i-th row of the base table has the value for the indexed column
◼ not suitable for high cardinality domains
  ◼ A recent bit compression technique, Word-Aligned Hybrid (WAH), makes it work for high cardinality domain as well [Wu, et al. TODS’06]

Base table
Cust  Region   Type
C1    Asia     Retail
C2    Europe   Dealer
C3    Asia     Dealer
C4    America  Retail
C5    Europe   Dealer

Index on Region
RecID  Asia  Europe  America
1      1     0       0
2      0     1       0
3      1     0       0
4      0     0       1
5      0     1       0

Index on Type
RecID  Retail  Dealer
1      1       0
2      0       1
3      0       1
4      1       0
5      0       1
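The bitmap index from the slide can be rebuilt in a few lines using Python integers as bit vectors. The sketch uses the five-row base table (Cust, Region, Type) from the slide. Bit i of each vector corresponds to row i counted from zero, which is an implementation choice for the example, not something the slide prescribes.

```python
BASE_TABLE = [  # (Cust, Region, Type) rows from the slide's base table
    ("C1", "Asia", "Retail"), ("C2", "Europe", "Dealer"),
    ("C3", "Asia", "Dealer"), ("C4", "America", "Retail"),
    ("C5", "Europe", "Dealer"),
]

def build_bitmap_index(rows, column):
    """Map each distinct value to an int whose i-th bit marks row i (0-based)."""
    index = {}
    for i, row in enumerate(rows):
        index[row[column]] = index.get(row[column], 0) | (1 << i)
    return index

def matching_customers(bits):
    """Decode a bit vector back into the customers whose bit is set."""
    return [row[0] for i, row in enumerate(BASE_TABLE) if (bits >> i) & 1]

region_idx = build_bitmap_index(BASE_TABLE, 1)
type_idx = build_bitmap_index(BASE_TABLE, 2)

# "Region = Europe AND Type = Dealer" becomes a single bitwise AND.
hits = region_idx["Europe"] & type_idx["Dealer"]
print(matching_customers(hits))
```

Each distinct value needs its own vector, which is why an uncompressed bitmap index grows poorly on high-cardinality columns.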

Indexing OLAP Data: Join Indices
◼ Join index: JI(R-id, S-id) where R (R-id, …) ⋈ S (S-id, …)
◼ Traditional indices map the values to a list of record ids
  ◼ It materializes relational join in JI file and speeds up relational join
◼ In data warehouses, join index relates the values of the dimensions of a star schema to rows in the fact table.
  ◼ E.g. fact table: Sales and two dimensions city and product
    ◼ A join index on city maintains for each distinct city a list of R-IDs of the tuples recording the Sales in the city
  ◼ Join indices can span multiple dimensions

Efficient Processing OLAP Queries
◼ Determine which operations should be performed on the available cuboids
  ◼ Transform drill, roll, etc. into corresponding SQL and/or OLAP operations, e.g., dice = selection + projection
◼ Determine which materialized cuboid(s) should be selected for OLAP op.
  ◼ Let the query to be processed be on {brand, province_or_state} with the condition “year = 2004”, and there are 4 materialized cuboids available:
    1) {year, item_name, city}
    2) {year, brand, country}
    3) {year, brand, province_or_state}
    4) {item_name, province_or_state} where year = 2004
    Which should be selected to process the query?
◼ Explore indexing structures and compressed vs. dense array structs in MOLAP
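The cuboid-selection question on the Efficient Processing OLAP Queries slide can be explored with a small check: a materialized cuboid can answer the query only if every queried attribute is at the same or a coarser level than some attribute the cuboid stores, and the "year = 2004" condition can still be applied. The sketch assumes the hierarchies item_name < brand and city < province_or_state < country, which the slide does not state explicitly. All names in the code are invented for the example.

```python
# Assumed concept hierarchies, listed from finest to coarsest level.
HIERARCHIES = {
    "item": ["item_name", "brand"],
    "location": ["city", "province_or_state", "country"],
    "time": ["year"],
}

def level_of(attr):
    """Return (dimension, level index) for an attribute."""
    for dim, levels in HIERARCHIES.items():
        if attr in levels:
            return dim, levels.index(attr)
    raise KeyError(attr)

def can_derive(target, available):
    """True if `target` equals or can be rolled up from an available attribute."""
    t_dim, t_lvl = level_of(target)
    return any(level_of(a) == (t_dim, lvl)
               for a in available for lvl in range(t_lvl + 1))

def can_answer(attrs, prefiltered):
    """Check the slide's query: {brand, province_or_state} where year = 2004."""
    dims_ok = all(can_derive(g, attrs) for g in ("brand", "province_or_state"))
    filter_ok = "year = 2004" in prefiltered or can_derive("year", attrs)
    return dims_ok and filter_ok

CUBOIDS = {  # cuboid number -> (stored attributes, conditions already applied)
    1: ({"year", "item_name", "city"}, set()),
    2: ({"year", "brand", "country"}, set()),
    3: ({"year", "brand", "province_or_state"}, set()),
    4: ({"item_name", "province_or_state"}, {"year = 2004"}),
}

usable = [n for n, (attrs, pre) in CUBOIDS.items() if can_answer(attrs, pre)]
print(usable)
```

Under these assumptions cuboid 2 is ruled out because country is coarser than province_or_state. Choosing among the remaining candidates then depends on cost factors such as cuboid size and available indices, which this sketch does not model.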
OLAP Server Architectures
◼ Relational OLAP (ROLAP)
  ◼ Use relational or extended-relational DBMS to store and manage warehouse data and OLAP middleware
  ◼ Include optimization of DBMS backend, implementation of aggregation navigation logic, and additional tools and services
  ◼ Greater scalability
◼ Multidimensional OLAP (MOLAP)
  ◼ Sparse array-based multidimensional storage engine
  ◼ Fast indexing to pre-computed summarized data
◼ Hybrid OLAP (HOLAP) (e.g., Microsoft SQLServer)
  ◼ Flexibility, e.g., low level: relational, high-level: array
◼ Specialized SQL servers (e.g., Redbricks)
  ◼ Specialized support for SQL queries over star/snowflake schemas

Attribute-Oriented Induction
◼ Proposed in 1989 (KDD ‘89 workshop)
◼ Not confined to categorical data nor particular measures
◼ How is it done?
  ◼ Collect the task-relevant data (initial relation) using a relational database query
  ◼ Perform generalization by attribute removal or attribute generalization
  ◼ Apply aggregation by merging identical, generalized tuples and accumulating their respective counts
  ◼ Interaction with users for knowledge presentation

Attribute-Oriented Induction: An Example
Example: Describe general characteristics of graduate students in the University database
◼ Step 1. Fetch relevant set of data using an SQL statement, e.g.,
    Select * (i.e., name, gender, major, birth_place, birth_date, residence, phone#, gpa)
    from student
    where student_status in {“Msc”, “MBA”, “PhD” }
◼ Step 2. Perform attribute-oriented induction
◼ Step 3. Present results in generalized relation, cross-tab, or rule forms

Class Characterization: An Example

Initial Relation
Name            Gender  Major    Birth-Place            Birth_date  Residence                 Phone #   GPA
Jim Woodman     M       CS       Vancouver, BC, Canada  8-12-76     3511 Main St., Richmond   687-4598  3.67
Scott Lachance  M       CS       Montreal, Que, Canada  28-7-75     345 1st Ave., Richmond    253-9106  3.70
Laura Lee       F       Physics  Seattle, WA, USA       25-8-70     125 Austin Ave., Burnaby  420-5232  3.83
…               …       …        …                      …           …                         …         …
Generalization:
Removed         Retained  Sci, Eng, Bus  Country        Age range   City                      Removed   Excl, VG, ..

Prime Generalized Relation
Gender  Major    Birth_region  Age_range  Residence  GPA        Count
M       Science  Canada        20-25      Richmond   Very-good  16
F       Science  Foreign       25-30      Burnaby    Excellent  22
…       …        …             …          …          …          …

Cross-tab of Gender by Birth_Region
Gender  Canada  Foreign  Total
M       16      14       30
F       10      22       32
Total   26      36       62

Basic Principles of Attribute-Oriented Induction
◼ Data focusing: task-relevant data, including dimensions, and the result is the initial relation
◼ Attribute-removal: remove attribute A if there is a large set of distinct values for A but (1) there is no generalization operator on A, or (2) A’s higher level concepts are expressed in terms of other attributes
◼ Attribute-generalization: If there is a large set of distinct values for A, and there exists a set of generalization operators on A, then select an operator and generalize A
◼ Attribute-threshold control: typical 2-8, specified/default
◼ Generalized relation threshold control: control the final relation/rule size
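Attribute removal, attribute generalization, and count accumulation can be sketched on the three visible rows of the initial relation from the class characterization example. The generalization operators below (the major-to-field mapping, the Canada/Foreign split, the GPA thresholds) are assumptions for illustration. The slides show only the generalized labels, not the rules that produce them.

```python
from collections import Counter

# Visible rows of the initial relation: (name, gender, major, birth_place, gpa).
INITIAL = [
    ("Jim Woodman", "M", "CS", "Vancouver, BC, Canada", 3.67),
    ("Scott Lachance", "M", "CS", "Montreal, Que, Canada", 3.70),
    ("Laura Lee", "F", "Physics", "Seattle, WA, USA", 3.83),
]

# Assumed generalization operators; mappings and thresholds are illustrative.
MAJOR_TO_FIELD = {"CS": "Science", "Physics": "Science"}

def birth_region(place):
    return "Canada" if place.endswith("Canada") else "Foreign"

def gpa_level(gpa):
    if gpa >= 3.75:
        return "Excellent"
    return "Very-good" if gpa >= 3.5 else "Good"

def generalize(rows):
    """Drop name (attribute removal), generalize the rest, merge equal tuples."""
    return Counter(
        (gender, MAJOR_TO_FIELD[major], birth_region(place), gpa_level(gpa))
        for _name, gender, major, place, gpa in rows
    )

prime_relation = generalize(INITIAL)
for generalized_tuple, count in prime_relation.items():
    print(generalized_tuple, count)
```

The name attribute is removed because it has many distinct values and no generalization operator, which matches the attribute-removal rule.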
Attribute-Oriented Induction: Basic Algorithm
◼ InitialRel: Query processing of task-relevant data, deriving the initial relation.
◼ PreGen: Based on the analysis of the number of distinct values in each attribute, determine generalization plan for each attribute: removal? or how high to generalize?
◼ PrimeGen: Based on the PreGen plan, perform generalization to the right level to derive a “prime generalized relation”, accumulating the counts.
◼ Presentation: User interaction: (1) adjust levels by drilling, (2) pivoting, (3) mapping into rules, cross tabs, visualization presentations.

Presentation of Generalized Results
◼ Generalized relation:
  ◼ Relations where some or all attributes are generalized, with counts or other aggregation values accumulated.
◼ Cross tabulation:
  ◼ Mapping results into cross tabulation form (similar to contingency tables).
◼ Visualization techniques:
  ◼ Pie charts, bar charts, curves, cubes, and other visual forms.
◼ Quantitative characteristic rules:
  ◼ Mapping generalized result into characteristic rules with quantitative information associated with it, e.g.,
    $grad(x) \wedge male(x) \Rightarrow birth\_region(x) = \text{“Canada”}\,[t: 53\%] \vee birth\_region(x) = \text{“foreign”}\,[t: 47\%]$
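The t-weights in the quantitative characteristic rule follow directly from the gender by birth_region cross-tab in the class characterization example: each t-weight is a generalized tuple's share of its class total. A minimal sketch, where the dictionary layout and the function name are chosen for the example:

```python
# Cross-tab counts from the class characterization example.
CROSSTAB = {
    "M": {"Canada": 16, "Foreign": 14},
    "F": {"Canada": 10, "Foreign": 22},
}

def t_weights(row):
    """Percentage share of each generalized value within one class."""
    total = sum(row.values())
    return {value: round(100 * count / total) for value, count in row.items()}

male_weights = t_weights(CROSSTAB["M"])
print(male_weights)
```

For male graduate students this gives 16/30 and 14/30, which round to the 53% and 47% shown in the rule.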

Mining Class Comparisons
◼ Comparison: Comparing two or more classes
◼ Method:
  ◼ Partition the set of relevant data into the target class and the contrasting class(es)
  ◼ Generalize both classes to the same high level concepts
  ◼ Compare tuples with the same high level descriptions
  ◼ Present for every tuple its description and two measures
    ◼ support - distribution within single class
    ◼ comparison - distribution between classes
  ◼ Highlight the tuples with strong discriminant features
◼ Relevance Analysis:
  ◼ Find attributes (features) which best distinguish different classes

Concept Description vs. Cube-Based OLAP
◼ Similarity:
  ◼ Data generalization
  ◼ Presentation of data summarization at multiple levels of abstraction
  ◼ Interactive drilling, pivoting, slicing and dicing
◼ Differences:
  ◼ OLAP has systematic preprocessing, query independent, and can drill down to rather low level
  ◼ AOI has automated desired level allocation, and may perform dimension relevance analysis/ranking when there are many relevant dimensions
  ◼ AOI works on the data which are not in relational forms

Summary
◼ Data warehousing: A multi-dimensional model of a data warehouse
  ◼ A data cube consists of dimensions & measures
  ◼ Star schema, snowflake schema, fact constellations
  ◼ OLAP operations: drilling, rolling, slicing, dicing and pivoting
◼ Data Warehouse Architecture, Design, and Usage
  ◼ Multi-tiered architecture
  ◼ Business analysis design framework
  ◼ Information processing, analytical processing, data mining, OLAM (Online Analytical Mining)
◼ Implementation: Efficient computation of data cubes
  ◼ Partial vs. full vs. no materialization
  ◼ Indexing OLAP data: Bitmap index and join index
  ◼ OLAP query processing
  ◼ OLAP servers: ROLAP, MOLAP, HOLAP
◼ Data generalization: Attribute-oriented induction
References (I)
◼ S. Agarwal, R. Agrawal, P. M. Deshpande, A. Gupta, J. F. Naughton, R. Ramakrishnan, and S. Sarawagi. On the computation of multidimensional aggregates. VLDB’96
◼ D. Agrawal, A. E. Abbadi, A. Singh, and T. Yurek. Efficient view maintenance in data warehouses. SIGMOD’97
◼ R. Agrawal, A. Gupta, and S. Sarawagi. Modeling multidimensional databases. ICDE’97
◼ S. Chaudhuri and U. Dayal. An overview of data warehousing and OLAP technology. ACM SIGMOD Record, 26:65-74, 1997
◼ E. F. Codd, S. B. Codd, and C. T. Salley. Beyond decision support. Computer World, 27, July 1993.
◼ J. Gray, et al. Data cube: A relational aggregation operator generalizing group-by, cross-tab and sub-totals. Data Mining and Knowledge Discovery, 1:29-54, 1997.
◼ A. Gupta and I. S. Mumick. Materialized Views: Techniques, Implementations, and Applications. MIT Press, 1999.
◼ J. Han. Towards on-line analytical mining in large databases. ACM SIGMOD Record, 27:97-107, 1998.
◼ V. Harinarayan, A. Rajaraman, and J. D. Ullman. Implementing data cubes efficiently. SIGMOD’96
◼ J. Hellerstein, P. Haas, and H. Wang. Online aggregation. SIGMOD'97

References (II)
◼ C. Imhoff, N. Galemmo, and J. G. Geiger. Mastering Data Warehouse Design: Relational and Dimensional Techniques. John Wiley, 2003
◼ W. H. Inmon. Building the Data Warehouse. John Wiley, 1996
◼ R. Kimball and M. Ross. The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling. 2ed. John Wiley, 2002
◼ P. O’Neil and G. Graefe. Multi-table joins through bitmapped join indices. SIGMOD Record, 24:8–11, Sept. 1995.
◼ P. O'Neil and D. Quass. Improved query performance with variant indexes. SIGMOD'97
◼ Microsoft. OLEDB for OLAP programmer's reference version 1.0. In http://www.microsoft.com/data/oledb/olap, 1998
◼ S. Sarawagi and M. Stonebraker. Efficient organization of large multidimensional arrays. ICDE'94
◼ A. Shoshani. OLAP and statistical databases: Similarities and differences. PODS’00.
◼ D. Srivastava, S. Dar, H. V. Jagadish, and A. V. Levy. Answering queries with aggregation using views. VLDB'96
◼ P. Valduriez. Join indices. ACM Trans. Database Systems, 12:218-246, 1987.
◼ J. Widom. Research problems in data warehousing. CIKM’95
◼ K. Wu, E. Otoo, and A. Shoshani. Optimal Bitmap Indices with Efficient Compression. ACM Trans. on Database Systems (TODS), 31(1): 1-38, 2006

Surplus Slides

Compression of Bitmap Indices
◼ Bitmap indexes must be compressed to reduce I/O costs and minimize CPU usage, since the majority of the bits are 0’s
◼ Two compression schemes:
  ◼ Byte-aligned Bitmap Code (BBC)
  ◼ Word-Aligned Hybrid (WAH) code
◼ Time and space required to operate on compressed bitmap is proportional to the total size of the bitmap
◼ Optimal on attributes of low cardinality as well as those of high cardinality.
◼ WAH outperforms BBC by about a factor of two

No ratings yet
12 DataWarehouse
55 pages
Chapter 1
No ratings yet
Chapter 1
44 pages
Exercicios SQL
No ratings yet
Exercicios SQL
6 pages
MySQL Essay
No ratings yet
MySQL Essay
9 pages
Module 9 - Advanced SQL Query
No ratings yet
Module 9 - Advanced SQL Query
6 pages
A Crash Course in Caching - Part 2 - by Alex Xu
No ratings yet
A Crash Course in Caching - Part 2 - by Alex Xu
9 pages
For Pregnant Examinee, Please Refer To The Printed Names With The Triple A (AAA) Legend
No ratings yet
For Pregnant Examinee, Please Refer To The Printed Names With The Triple A (AAA) Legend
19 pages
Static Vs Dynamic Query Optimization (23027119-003, Qaiser Ali)
No ratings yet
Static Vs Dynamic Query Optimization (23027119-003, Qaiser Ali)
9 pages
DBMS 12
No ratings yet
DBMS 12
3 pages
Selvarani Mylsamy: "We Swim in A Sea of Data and The Sea Level Is Rising Rapidly."
No ratings yet
Selvarani Mylsamy: "We Swim in A Sea of Data and The Sea Level Is Rising Rapidly."
33 pages
Pega Training Syllabus: BPM Overview, Project Implementation Methodology, Class Structures & Hierarchy
No ratings yet
Pega Training Syllabus: BPM Overview, Project Implementation Methodology, Class Structures & Hierarchy
4 pages
Assignment 04
No ratings yet
Assignment 04
7 pages
To Read Image Files From An Oracle APEX Application Hosted On Tomcat With The Image Directory On A Linux Server
No ratings yet
To Read Image Files From An Oracle APEX Application Hosted On Tomcat With The Image Directory On A Linux Server
2 pages
Cadm Mid
No ratings yet
Cadm Mid
5 pages
Quiz - Module-4 - Genta Yusuf Madhani - 1201174352 - Fri-033 - Tuesday - Shift-4 - WLN
No ratings yet
Quiz - Module-4 - Genta Yusuf Madhani - 1201174352 - Fri-033 - Tuesday - Shift-4 - WLN
5 pages
Yogita Aher Python Developer 2023
No ratings yet
Yogita Aher Python Developer 2023
1 page
THE STEP BY STEP GUIDE FOR SUCCESSFUL IMPLEMENTATION OF DATA LAKE-LAKEHOUSE-DATA WAREHOUSE: "THE STEP BY STEP GUIDE FOR SUCCESSFUL IMPLEMENTATION OF DATA LAKE-LAKEHOUSE-DATA WAREHOUSE"
From Everand
THE STEP BY STEP GUIDE FOR SUCCESSFUL IMPLEMENTATION OF DATA LAKE-LAKEHOUSE-DATA WAREHOUSE: "THE STEP BY STEP GUIDE FOR SUCCESSFUL IMPLEMENTATION OF DATA LAKE-LAKEHOUSE-DATA WAREHOUSE"
AJIT DASH
2/5 (2)
The Snowflake Handbook: Optimizing Data Warehousing and Analytics
From Everand
The Snowflake Handbook: Optimizing Data Warehousing and Analytics
Robert Johnson
No ratings yet
Mastering Apache Iceberg: Managing Big Data in a Modern Data Lake
From Everand
Mastering Apache Iceberg: Managing Big Data in a Modern Data Lake
Robert Johnson
No ratings yet
Learn Data Warehousing in 24 Hours
From Everand
Learn Data Warehousing in 24 Hours
Alex Nordeen
No ratings yet


