Data Warehouses and Data Cubes
Specification of hierarchies
Schema hierarchy
day < {month < quarter; week} < year
Set-grouping hierarchy
{1..10} < inexpensive
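The two kinds of hierarchy can be sketched in a few lines of Python. This is a minimal illustration that assumes the schema hierarchy is stored as a map from each level to the levels directly above it. The names `SCHEMA_PARENTS`, `is_finer`, `PRICE_GROUPS`, and `price_group` are hypothetical. Week sits below year but beside month and quarter, so the schema hierarchy is a partial order, not a chain.

```python
from typing import Optional

# Schema hierarchy from the slide: day < {month < quarter; week} < year.
# Each level maps to the levels directly above it (a partial order, not a chain).
SCHEMA_PARENTS = {
    "day": ["month", "week"],
    "month": ["quarter"],
    "quarter": ["year"],
    "week": ["year"],
    "year": [],
}

def is_finer(lower: str, upper: str) -> bool:
    """True if `lower` can be rolled up to `upper` by following the hierarchy."""
    stack = list(SCHEMA_PARENTS[lower])
    seen = set()
    while stack:
        level = stack.pop()
        if level == upper:
            return True
        if level not in seen:
            seen.add(level)
            stack.extend(SCHEMA_PARENTS[level])
    return False

# Set-grouping hierarchy: values 1..10 are grouped under "inexpensive".
# Only this group comes from the slide; any further groups would be assumptions.
PRICE_GROUPS = {"inexpensive": range(1, 11)}

def price_group(price: int) -> Optional[str]:
    """Return the group a raw price value belongs to, or None if ungrouped."""
    for group, values in PRICE_GROUPS.items():
        if price in values:
            return group
    return None
```

With this representation, `is_finer("day", "quarter")` holds while `is_finer("week", "quarter")` does not, because weeks do not nest inside quarters.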
Han: Data Cubes 11
A Sample Data Cube
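[Figure: a sales data cube with dimensions Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), product (TV, PC, VCR, sum), and Country (U.S.A., Canada, Mexico, sum); a highlighted cell holds the total annual sales of TV in U.S.A.]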
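The `sum` entries along each dimension of such a cube are aggregate cells. A minimal sketch of how they are filled is shown below. It assumes a tiny made-up fact table and uses the literal string `"sum"` to mark an aggregated dimension, then computes every group-by over the three dimensions (2^3 = 8 cuboids). All names and numbers are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("date", "product", "country")

# Hypothetical fact rows: (quarter, product, country, sales). Values are made up.
FACTS = [
    ("1Qtr", "TV", "U.S.A.", 100),
    ("2Qtr", "TV", "U.S.A.", 120),
    ("3Qtr", "TV", "U.S.A.", 90),
    ("4Qtr", "TV", "U.S.A.", 150),
    ("1Qtr", "PC", "Canada", 80),
    ("3Qtr", "VCR", "Mexico", 40),
]

def compute_cube(facts):
    """Return {cell: total sales}; "sum" marks a dimension that is aggregated away."""
    cube = defaultdict(int)
    n = len(DIMS)
    for *coords, sales in facts:
        # Every fact contributes to one cell in each of the 2**n cuboids (group-bys).
        for k in range(n + 1):
            for kept in combinations(range(n), k):
                cell = tuple(coords[i] if i in kept else "sum" for i in range(n))
                cube[cell] += sales
    return dict(cube)
```

Here `compute_cube(FACTS)[("sum", "TV", "U.S.A.")]` is the total annual sales of TV in U.S.A. (460 with these made-up rows). Practical cube computation uses much more efficient algorithms; this naive loop only shows what the aggregate cells contain.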
Visualization
OLAP capabilities
Interactive manipulation
Typical OLAP Operations
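[Figure: star-net query model. Radial lines represent dimensions, including Time (ANNUALLY, QTRLY, DAILY), Product (PRODUCT ITEM, PRODUCT GROUP, PRODUCT LINE), Location, Organization, and Promotion; other level labels shown include CITY, COUNTRY, REGION, DISTRICT, DIVISION, SALES PERSON, ORDER, and TRUCK. Each circle is called a footprint.]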
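Each circle on a dimension's line marks an abstraction level. Moving between footprints on the Time line (DAILY, QTRLY, ANNUALLY) therefore corresponds to rolling up or drilling down. The following minimal sketch shows a roll-up along Time and a slice that fixes the Product dimension to one item. It assumes hypothetical daily records and helper names (`DAILY`, `TIME_LEVELS`, `roll_up`, `slice_cube`) that are not part of the original material.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales records at the DAILY footprint: (day, product item, amount).
DAILY = [
    (date(2023, 1, 15), "TV", 5),
    (date(2023, 2, 3), "TV", 7),
    (date(2023, 5, 20), "PC", 4),
    (date(2024, 1, 9), "TV", 6),
]

def quarter_of(day):
    """Map a date to its calendar quarter label, e.g. 2023-Q1."""
    return f"{day.year}-Q{(day.month - 1) // 3 + 1}"

# Time-dimension footprints coarser than DAILY, each given as a key function.
TIME_LEVELS = {"QTRLY": quarter_of, "ANNUALLY": lambda day: day.year}

def roll_up(records, to_level):
    """Roll up along Time: re-aggregate daily amounts at a coarser level."""
    key = TIME_LEVELS[to_level]
    totals = defaultdict(int)
    for day, item, amount in records:
        totals[(key(day), item)] += amount
    return dict(totals)

def slice_cube(cells, item):
    """Slice: keep only the cells whose Product value equals `item`."""
    return {cell: v for cell, v in cells.items() if cell[1] == item}
```

For example, `slice_cube(roll_up(DAILY, "ANNUALLY"), "TV")` keeps only the yearly TV totals. Drill-down is the inverse move, back toward DAILY, which requires the finer-grained records to be kept.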
Discovery-Driven Exploration of Data Cubes
Hypothesis-driven: exploration by the user, over a huge search space
Discovery-driven (Sarawagi et al. '98)
Pre-compute measures indicating exceptions to guide the user in the
data analysis, at all levels of aggregation
Exception: a cell value significantly different from the value
anticipated by a statistical model
Visual cues such as background color are used to reflect the
degree of exception of each cell
Computation of the exception indicators (model fitting and
computing the SelfExp, InExp, and PathExp values) can be
overlapped with cube construction
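The idea behind the exception indicators can be sketched on a single two-dimensional slice (products by quarter). This is a simplified stand-in, not the statistical model of Sarawagi et al. The expected value of each cell comes from a plain additive fit (grand mean plus row effect plus column effect). SelfExp, the surprise of a cell's own value, is taken as the absolute residual divided by the standard deviation of all residuals. InExp, the surprise hidden beneath an aggregate cell, is taken as the largest SelfExp among the quarter cells under each product's total. PathExp is not modeled here. Function names and data are hypothetical.

```python
from statistics import mean, pstdev

def expected_values(table):
    """Additive two-way fit: grand mean + row effect + column effect."""
    n_rows, n_cols = len(table), len(table[0])
    grand = mean([v for row in table for v in row])
    row_eff = [mean(row) - grand for row in table]
    col_eff = [mean([table[i][j] for i in range(n_rows)]) - grand for j in range(n_cols)]
    return [[grand + row_eff[i] + col_eff[j] for j in range(n_cols)]
            for i in range(n_rows)]

def self_exp(table):
    """Absolute residual of each cell, scaled by the residuals' standard deviation."""
    fit = expected_values(table)
    residuals = [[obs - exp for obs, exp in zip(row, fit_row)]
                 for row, fit_row in zip(table, fit)]
    sigma = pstdev([r for row in residuals for r in row])
    if sigma == 0:  # perfectly additive table: no cell is surprising
        return [[0.0] * len(row) for row in residuals]
    return [[abs(r) / sigma for r in row] for row in residuals]

def in_exp(self_exp_table):
    """Simplified InExp of each row aggregate: max SelfExp among its cells."""
    return [max(row) for row in self_exp_table]

# Hypothetical sales (rows: TV, PC, VCR; columns: 1Qtr..4Qtr).
SALES = [
    [10, 12, 14, 16],
    [20, 22, 24, 26],
    [5, 7, 30, 11],  # an additive pattern would give 9 for 3Qtr
]
```

In this made-up table, the VCR value for 3Qtr departs from an otherwise additive pattern. That cell therefore gets the highest SelfExp, and the VCR row gets the highest InExp, so a tool following this approach would shade them most strongly.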
Examples: Discovery-Driven Data Cubes
http://www.bi-verdict.com/
http://www.bi-verdict.com/fileadmin/FreeAnalyses/Comment_OLAP_revival.htm
K. Ross and D. Srivastava. Fast computation of sparse datacubes. In Proc. 1997 Int.
Conf. Very Large Data Bases, pages 116-125, Athens, Greece, August 1997.
K. A. Ross, D. Srivastava, and D. Chatziantoniou. Complex aggregation at multiple granularities. In
Proc. Int. Conf. of Extending Database Technology (EDBT'98), pages 263-277, Valencia, Spain, March
1998.
S. Sarawagi, R. Agrawal, and N. Megiddo. Discovery-driven exploration of OLAP data cubes. In
Proc. Int. Conf. of Extending Database Technology (EDBT'98), pages 168-182, Valencia, Spain,
March 1998.
E. Thomsen. OLAP Solutions: Building Multidimensional Information Systems. John Wiley & Sons,
1997.
Y. Zhao, P. M. Deshpande, and J. F. Naughton. An array-based algorithm for simultaneous
multidimensional aggregates. In Proc. 1997 ACM-SIGMOD Int. Conf. Management of Data, pages
159-170, Tucson, Arizona, May 1997.