DM104 - Evaluation of Business Performance

This document discusses cubes and multidimensional analysis for data warehousing. It defines cubes as multi-dimensional arrays representing data along different measures and dimensions. Dimensions represent attributes such as time, products and customers. Fact tables contain measures and links to dimension tables. Cubes allow analysis by grouping and consolidating data along different dimensions and hierarchies within dimensions. Common OLAP operations on cubes include slice, dice, pivot and roll-up/down.
DM104 – Evaluation of Business Performance Page 1 of 15

Cubes and Multidimensional Analysis

CUBES AND MULTIDIMENSIONAL ANALYSIS


• Hierarchies of concepts and OLAP operations
• Materialization of cubes of data

Welcome Notes:
WELCOME BSIS STUDENTS!

I. INTRODUCTION:

This module discusses cubes and multidimensional analysis. A data cube is used to represent data along some
measure of interest. Although called a "cube", it can be 2-dimensional, 3-dimensional, or higher-dimensional.
Each dimension represents some attribute in the database and the cells in the data cube represent the
measure of interest. The module also explains the data analysis process that groups data by dimensions and
measures.

II. OBJECTIVES:

At the end of this module, you should be able to:


1. Apply multidimensional analysis.
2. Differentiate dimension tables and fact tables.
3. Illustrate star, snowflake, and galaxy schema.
4. Identify hierarchies of concepts and OLAP operations.

III. PRELIMINARY ACTIVITIES:

Before you proceed to the main lesson, test yourself in this activity.

Direction: Write TRUE if the statement is correct, otherwise write FALSE.


_______ 1. A data cube is a multi-dimensional array of values.
_______ 2. OLAP stands for online analytical programming.
_______ 3. The pivot operation produces a rotation of the axes, swapping some dimensions to obtain a
different view of a data cube.
_______ 4. A slice operation allows navigation through a data cube from aggregated and consolidated
information to more detailed information.
_______ 5. Through the roll-down operation, the value of an attribute is selected and fixed along one dimension.

GREAT!!!
You may now proceed to the main lesson.

IV. DEVELOPMENT OF THE LESSON

Based on the preliminary activity, what did you notice?
________________________________________________________
CONGRATULATIONS!
You may now proceed to the lesson.

Cubes and multidimensional analysis


The design of data warehouses and data marts is based on a multidimensional paradigm for data
representation that provides at least two major advantages: on the functional side, it can guarantee fast
response times even to complex queries, while on the logical side the dimensions naturally match the criteria
followed by knowledge workers to perform their analyses.
The multidimensional representation is based on a star schema which contains two types of data tables:
dimension tables and fact tables.

Dimension tables. In general, dimensions are associated with the entities around which the processes of
an organization revolve. Dimension tables then correspond to primary entities contained in the data
warehouse, and in most cases they directly derive from master tables stored in OLTP systems, such as
customers, products, sales, locations and time. Each dimension table is often internally structured according
to hierarchical relationships. For example, the temporal dimension is usually based upon two major
hierarchies: {day, week, year} and {day, month, quarter, year}. Similarly, the location dimension may be
hierarchically organized as {street, zip code, city, province, region, country, area}. Products in their turn have
hierarchical structures such as {item, family, type} in the manufacturing industry and {item, category,
department} in the retail industry. In a way, dimensions predetermine the main paths along which OLAP
analyses will presumably be developed.

Fact tables. Fact tables usually refer to transactions and contain two types of data:
• Links to dimension tables, which are required to properly reference the information contained in each fact
table;
• Numerical values of the attributes that characterize the corresponding transactions and that represent
the actual target of the subsequent OLAP analyses.
For example, a fact table may contain sales transactions and make reference to several dimension tables,
such as customers, points of sale, products, suppliers, time. The corresponding measures of interest are
attributes such as quantity of items sold, unit price and discount. In this example the fact table allows analysts
to evaluate the trends of sales over time, either total, or referred to a single customer, or referred to a group
of customers, that can be identified through any hierarchy induced by the dimension table associated with
the customers. The analyst may also evaluate the trend over time of sales percentages relative to customers
located in a specific region.
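
The sales example above can be sketched as a small star schema. This is only an illustration: the table names, column names and sample rows are assumptions made for this sketch, not part of the module, and Python's built-in sqlite3 module stands in for a real data warehouse.

```python
import sqlite3

# In-memory database; every table, column and row here is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables (simplified)
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, item TEXT, category TEXT);
    CREATE TABLE time_dim (time_id     INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);

    -- Fact table: links to the dimension tables plus numerical measures
    CREATE TABLE sales (
        customer_id INTEGER REFERENCES customer(customer_id),
        product_id  INTEGER REFERENCES product(product_id),
        time_id     INTEGER REFERENCES time_dim(time_id),
        quantity    INTEGER,
        unit_price  REAL,
        discount    REAL
    );

    INSERT INTO customer VALUES (1, 'Ana', 'Asia'), (2, 'Bob', 'USA');
    INSERT INTO product  VALUES (1, 'TV', 'Electronics'), (2, 'PC', 'Electronics');
    INSERT INTO time_dim VALUES (1, 'Q1', 2024), (2, 'Q2', 2024);
    INSERT INTO sales VALUES
        (1, 1, 1, 2, 300.0, 0.0),
        (1, 2, 2, 1, 800.0, 0.25),
        (2, 1, 1, 3, 300.0, 0.0),
        (2, 2, 2, 2, 800.0, 0.0);
""")

# Trend of sales over time for groups of customers identified by region.
sales_trend_by_region = conn.execute("""
    SELECT c.region, t.quarter,
           SUM(s.quantity * s.unit_price * (1 - s.discount)) AS revenue
    FROM sales AS s
    JOIN customer AS c ON c.customer_id = s.customer_id
    JOIN time_dim AS t ON t.time_id = s.time_id
    GROUP BY c.region, t.quarter
    ORDER BY c.region, t.quarter
""").fetchall()

for region, quarter, revenue in sales_trend_by_region:
    print(region, quarter, revenue)
```

The fact table holds only keys and measures; the descriptive attributes used for grouping (here the region and the quarter) come from the dimension tables through the joins.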

Figure 1: Example of Star Schema

Figure 2: Example of Snowflake Schema

Figure 1 shows the star schema associated with the fact table representing sales transactions. The fact
table is placed in the middle of the schema and is linked to the dimension tables through appropriate
references. The measures in the fact table appear in bold type.

Sometimes dimension tables are connected in their turn to other dimension tables, as shown in Figure
2, through a process of partial data normalization, in order to reduce memory use. In the given example
the dimension table referring to the location is in turn hierarchically connected with the dimension table
containing geographical information. This brings about a snowflake schema.

Figure 3: Example of Galaxy Schema

A data warehouse includes several fact tables, interconnected with dimension tables, linked in their turn
with other dimension tables. This type of schema, shown in Figure 3, is termed a galaxy schema.
A fact table connected with n dimension tables may be represented by an n-dimensional data cube where
each axis corresponds to a dimension. Multidimensional cubes are a natural extension of the popular two-
dimensional spreadsheets, which can be interpreted as two-dimensional cubes. For instance, consider a
sales fact table developed along the three dimensions of {time, product, region}. Suppose we select only two
dimensions for the analysis, such as {time, product}, having preset the region attribute along the three values
{USA, Asia, Europa}. In this way we obtain the three two-dimensional tables in which the rows correspond to
quarters of a year and the columns to products (see Tables 3.3–3.5). The cube shown in Figure 4 is a three-
dimensional illustration of the same sales fact table. Atomic data are represented by 36 cells that can be
obtained by crossing all possible values along the three dimensions: time {Q1, Q2, Q3, Q4}, region {USA,
Asia, Europa} and product {TV, PC, DVD}. These atomic cells can be supplemented by 44 cells
corresponding to the summary values obtained through consolidation along one or more dimensions, as
shown by the cube in the figure.
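
The counts of 36 atomic cells and 44 summary cells can be checked with a short enumeration. The sketch below assumes that a consolidated position along a dimension is represented by an extra marker value (called ALL here, a name chosen only for this illustration), so each dimension contributes its own values plus one.

```python
from itertools import product

ALL = "ALL"  # hypothetical marker meaning "consolidated along this dimension"

dimensions = {
    "time": ["Q1", "Q2", "Q3", "Q4"],
    "region": ["USA", "Asia", "Europa"],
    "product": ["TV", "PC", "DVD"],
}

# Every cell has one coordinate per dimension; each coordinate is either a
# concrete value or the ALL marker.
cells = list(product(*[values + [ALL] for values in dimensions.values()]))

atomic_cells = [cell for cell in cells if ALL not in cell]
summary_cells = [cell for cell in cells if ALL in cell]

print(len(atomic_cells), len(summary_cells), len(cells))  # 36 44 80
```

In other words, the full cube has (4 + 1) x (3 + 1) x (3 + 1) = 80 cells, of which 4 x 3 x 3 = 36 are atomic and the remaining 44 are summaries.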

Suppose that the sales fact table also contains a fourth dimension represented by the suppliers. The
corresponding data cube constitutes a structure in four-dimensional space and therefore cannot be
represented graphically. However, we can obtain four logical views composed of three-dimensional cubes,
called cuboids, inside the four-dimensional cube, by fixing the values of one dimension.

More generally, starting from a fact table linked to n dimension tables, it is possible to obtain a lattice of
cuboids, each of them corresponding to a different level of consolidation along one or more dimensions. This
type of aggregation is equivalent in structured query language (SQL) to a sum query with a GROUP BY
clause. Figure 5 illustrates the lattice composed of the cuboids obtained from the data cube defined
along the four dimensions {time, product, region, supplier}.
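
As a rough sketch of this lattice, the cuboids can be enumerated as the subsets of dimensions kept in the grouping. The table name sales and the measure column amount are assumptions made for this example; the generated SQL text is only printed, not executed.

```python
from itertools import combinations

dimensions = ("time", "product", "region", "supplier")

# One cuboid per subset of dimensions kept in the grouping,
# listed from the most detailed (all four) to the most consolidated (none).
lattice = [
    subset
    for size in range(len(dimensions), -1, -1)
    for subset in combinations(dimensions, size)
]

def group_by_query(kept):
    """Build the SQL text for the cuboid that keeps the given dimensions."""
    if not kept:
        return "SELECT SUM(amount) FROM sales"
    columns = ", ".join(kept)
    return f"SELECT {columns}, SUM(amount) FROM sales GROUP BY {columns}"

for kept in lattice:
    print(len(kept), group_by_query(kept))
```

With four dimensions and no hierarchies the enumeration yields 16 cuboids: one 4-dimensional, four 3-dimensional, six 2-dimensional, four 1-dimensional and one 0-dimensional.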

Figure 4: Example of Three-Dimensional Cube

Figure 5: Lattice of Cuboids Derived from a Four-Dimensional Cube

The cuboid associated with the atomic data, which therefore does not imply any type of consolidation, is
called a base cuboid. At the other extreme, the apex cuboid is defined as the cuboid corresponding to the
consolidation along all dimensions, therefore associated with the grand total of the measure of interest.

Hierarchies of concepts and OLAP operations


In many instances, OLAP analyses are based on hierarchies of concepts to consolidate the data and to
create logical views along the dimensions of a data warehouse. A concept hierarchy defines a set of maps
from a lower level of concepts to a higher level.
For example, the {location} dimension may originate a totally ordered hierarchy, as shown in Figure 6,
developing along the {address, municipality, province, country} relationship. The temporal dimension, on the
other hand, originates a partially ordered hierarchy, also shown in Figure 6.
Specific hierarchy types may be predefined in the software platform used for the creation and
management of a data warehouse, as in the case of the dimensions shown in Figure 6. For other hierarchies
it is necessary for analysts to explicitly define the relationships among concepts.
Hierarchies of concepts are also used to perform several visualization operations dealing with data cubes
in a data warehouse.
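
A concept hierarchy can be sketched as a set of functions, each mapping a concept to the next higher level. The example below covers the temporal dimension only and makes two assumptions of its own: the function names are invented for this sketch, and weeks are taken to be ISO weeks as computed by Python's datetime module.

```python
from datetime import date

def month_of(day):
    """day -> (year, month)"""
    return (day.year, day.month)

def quarter_of(month):
    """(year, month) -> (year, quarter)"""
    year, month_number = month
    return (year, (month_number - 1) // 3 + 1)

def week_of(day):
    """day -> (ISO year, ISO week number)"""
    iso = day.isocalendar()
    return (iso[0], iso[1])

day = date(2024, 5, 15)
print(month_of(day), quarter_of(month_of(day)), week_of(day))

# The same week can contain days from two different months, so {week} and
# {month} are not comparable levels: the hierarchy is only partially ordered.
end_of_january, start_of_february = date(2024, 1, 31), date(2024, 2, 1)
same_week = week_of(end_of_january) == week_of(start_of_february)
same_month = month_of(end_of_january) == month_of(start_of_february)
print(same_week, same_month)
```

The location hierarchy, by contrast, is totally ordered because each level nests completely inside the next one.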

Roll-up. A roll-up operation, also termed drill-up, consists of an aggregation of data in the cube, which can
be obtained alternatively in the following two ways.
• Proceeding upwards to a higher level along a single dimension defined over a concept hierarchy.
For example, for the {location} dimension it is possible to move upwards from the {city} level to the
{province} level and to consolidate the measures of interest through a group-by conditioned sum
over all records for which the city belongs to the same province.

Figure 6: Hierarchies of Concepts


• Reducing by one dimension. For example, the removal of the {time} dimension leads to consolidated
measures through the sum over all time periods existing in the data cube.
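
Both ways of rolling up can be sketched on a tiny cube stored as a dictionary. The city and province names, the numbers and the function names are all placeholders chosen for this illustration.

```python
from collections import defaultdict

# Hypothetical mapping from the {city} level to the {province} level.
CITY_TO_PROVINCE = {"City A": "Province X", "City B": "Province X", "City C": "Province Y"}

# Cells of a cube over (city, quarter) -> units sold (made-up numbers).
sales = {
    ("City A", "Q1"): 4, ("City A", "Q2"): 6,
    ("City B", "Q1"): 1, ("City B", "Q2"): 2,
    ("City C", "Q1"): 5, ("City C", "Q2"): 5,
}

def roll_up_location(cells):
    """Way 1: climb the hierarchy from the city level to the province level."""
    totals = defaultdict(int)
    for (city, quarter), units in cells.items():
        totals[(CITY_TO_PROVINCE[city], quarter)] += units
    return dict(totals)

def roll_up_drop_time(cells):
    """Way 2: remove the time dimension by summing over all periods."""
    totals = defaultdict(int)
    for (city, _quarter), units in cells.items():
        totals[city] += units
    return dict(totals)

by_province = roll_up_location(sales)
by_city = roll_up_drop_time(sales)
print(by_province)
print(by_city)
```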

Roll-down. A roll-down operation, also referred to as drill-down, is the opposite operation to roll-up. It allows
navigation through a data cube from aggregated and consolidated information to more detailed information.
The effect is to reverse the result achieved through a roll-up operation. A drill-down operation can therefore
be carried out in two ways.
• Shifting down to a lower level along a single dimension hierarchy. For example, in the case of the
{location} dimension, it is possible to shift from the {province} level to the {city} level and to
disaggregate the measures of interest of each province over the cities that belong to it.
• Adding one dimension. For example, the introduction of the {time} dimension leads to the disaggregation
of the measures of interest over all time periods existing in a data cube.

Slice and dice. Through the slice operation the value of an attribute is selected and fixed along one
dimension. For example, Table 3.3 has been obtained by fixing the region at the {USA} value. The dice
operation obtains a cube in a subspace by selecting values along several dimensions simultaneously.
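
A minimal sketch of the two operations, assuming the cube is stored as a dictionary keyed by (time, region, product). The cell values are simply consecutive integers used as filler, and the function names are invented for this example.

```python
from itertools import product

TIMES = ("Q1", "Q2", "Q3", "Q4")
REGIONS = ("USA", "Asia", "Europa")
PRODUCTS = ("TV", "PC", "DVD")

# Filler values 1..36, one per atomic cell.
cube = {cell: n for n, cell in enumerate(product(TIMES, REGIONS, PRODUCTS), start=1)}

def slice_region(cells, region):
    """Slice: fix one value along the region dimension; the result is a 2D table."""
    return {(t, p): v for (t, r, p), v in cells.items() if r == region}

def dice(cells, times, regions, products):
    """Dice: keep a sub-cube by restricting several dimensions at once."""
    return {
        (t, r, p): v
        for (t, r, p), v in cells.items()
        if t in times and r in regions and p in products
    }

usa_table = slice_region(cube, "USA")
sub_cube = dice(cube, {"Q1", "Q2"}, {"USA", "Asia"}, {"TV"})
print(len(usa_table), len(sub_cube))  # 12 4
```

The slice drops the fixed dimension from the keys, while the dice keeps all three dimensions and only narrows the range of values along each.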

Pivot. The pivot operation, also referred to as rotation, produces a rotation of the axes, swapping some
dimensions to obtain a different view of a data cube.
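
For a two-dimensional view, the pivot can be sketched as swapping rows and columns. The table below uses made-up numbers, and the nested-dictionary layout is just one possible representation.

```python
# Rows are quarters, columns are products (made-up numbers).
table = {
    "Q1": {"TV": 10, "PC": 4},
    "Q2": {"TV": 12, "PC": 6},
}

def pivot(rows):
    """Swap the row and column axes of a nested-dict table."""
    rotated = {}
    for row_key, columns in rows.items():
        for column_key, value in columns.items():
            rotated.setdefault(column_key, {})[row_key] = value
    return rotated

by_product = pivot(table)
print(by_product)  # {'TV': {'Q1': 10, 'Q2': 12}, 'PC': {'Q1': 4, 'Q2': 6}}
```

The data are unchanged by the rotation; applying pivot twice returns the original table.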

Materialization of cubes of data


OLAP analyses developed by knowledge workers may need to access the information associated with
several cuboids, based on the specific queries and analyses being carried out. In order to guarantee
adequate response time, it might be useful to design a data warehouse where all (or at least a large portion
of) values of the measures of interest associated with all possible cuboids are pre-calculated. This approach
is termed full materialization of the information relative to the data cubes.
Observe that where hierarchies of concepts are missing, it is possible to form $2^n$ distinct cuboids from all
possible combinations of n dimensions. The existence of hierarchies along different dimensions makes the
number of distinct cuboids even greater. If $L_i$ denotes the number of hierarchical levels associated with the
ith dimension, for an n-dimensional data cube it is possible to calculate the full number of cuboids, given by

$$T = \prod_{i=1}^{n} (L_i + 1),$$

where the additional 1 accounts for the fully consolidated level of each dimension.

For example, if a data cube includes 5 dimensions, and if each of these dimensions includes 3 hierarchical
levels, the number of cuboids is equal to $4^5 = 2^{10} \approx 10^3$. It is clear that the full materialization of the cuboids
for all the cubes associated with the fact tables of a data warehouse would impose storage requirements that
could hardly be sustained over time, considering the rate at which new records are gathered.
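
The count in this example can be reproduced with a one-line product. The sketch assumes the total is the product of (levels + 1) over all dimensions, which is consistent with the 5-dimension, 3-level example; the function name is chosen only for this illustration.

```python
from math import prod

def total_cuboids(levels_per_dimension):
    """Product of (L_i + 1) over all dimensions; the +1 is the consolidated level."""
    return prod(levels + 1 for levels in levels_per_dimension)

print(total_cuboids([3, 3, 3, 3, 3]))  # 1024, i.e. 4**5
print(total_cuboids([1, 1, 1, 1]))     # 16, i.e. 2**4 when each dimension has one level
```

The second call shows that the case without hierarchies is recovered when every dimension has a single level.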
For all of the above reasons, it is necessary to strike a balance between the need for fast access to
information, which would suggest the full materialization of the cuboids, and the need to keep memory use
within reasonable limits. As a consequence, preventive materialization should be carried out only for those
cuboids that are most frequently accessed, while for the others the computation should be carried out on
demand only when actual queries requesting the associated information are performed. This latter approach
is referred to as partial materialization of the information relative to the data cubes.
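
The trade-off can be sketched as a small store of pre-calculated cuboids with a fallback that computes the others on request. The class name, the sample records and the choice of which cuboids count as frequently accessed are assumptions made for this example, not a description of any particular product.

```python
from collections import defaultdict

DIMENSIONS = ("time", "product", "region")

# Made-up atomic records: (time, product, region, amount)
RECORDS = [
    ("Q1", "TV", "USA", 10),
    ("Q1", "PC", "Asia", 5),
    ("Q2", "TV", "USA", 7),
]

def aggregate(kept):
    """Sum the measure over every dimension that is not in `kept`."""
    positions = [DIMENSIONS.index(name) for name in kept]
    totals = defaultdict(int)
    for record in RECORDS:
        totals[tuple(record[p] for p in positions)] += record[-1]
    return dict(totals)

class PartiallyMaterializedCube:
    def __init__(self, frequently_accessed):
        # Pre-calculate only the cuboids expected to be queried often.
        self.materialized = {kept: aggregate(kept) for kept in frequently_accessed}
        self.computed_on_demand = 0

    def query(self, kept):
        if kept in self.materialized:
            return self.materialized[kept]
        # Any other cuboid is computed only when a query asks for it.
        self.computed_on_demand += 1
        return aggregate(kept)

cube = PartiallyMaterializedCube(frequently_accessed=[("time",), ()])
by_time = cube.query(("time",))        # served from the pre-calculated store
by_product = cube.query(("product",))  # computed on demand
print(by_time, by_product, cube.computed_on_demand)
```

Full materialization would correspond to passing every cuboid of the lattice as frequently_accessed, at the cost of storing all of them.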

We have just finished the discussion on cubes and multidimensional analysis. Let's move on to the next
level of activities and exercises, which let you demonstrate the skills and knowledge you have gained.

V. ANALYSIS, APPLICATION AND EXPLORATION


ACTIVITY 1
Name: _________________________________ Date: __________________
Year & Section: _________________________ Score: _________________
Direction: Read each statement below and fill in the missing words from the word bank.

OLAP | Paradigm | Dimensions | Data Warehouse | Multidimensional
Fact Table | Roll-Down | Roll-Up | Hierarchies of Concepts | Pivot

1. The design of data warehouses and data marts is based on a multidimensional __________.
2. __________ representation is based on a star schema which contains dimension tables and fact tables.
3. __________ are associated with the entities around which the processes of an organization revolve.
4. __________ is placed in the middle of the schema and is linked to the dimension tables through
appropriate references.
5. __________ includes several fact tables, interconnected with dimension tables, linked in their turn with
other dimension tables.
6. __________ analyses are based on hierarchies of concepts to consolidate the data and to create logical
views.
7. __________ are also used to perform several visualization operations dealing with data cubes in a data
warehouse.
8. __________ operation performs aggregation on a data cube either by climbing up the hierarchy or by
dimension reduction.
9. __________ operation navigates from less detailed data to more detailed data.
10. __________ operation produces a rotation of the axes, swapping some dimensions to obtain a different
view of a data cube.

Finally, let us summarize what we have discussed today.



VI. GENERALIZATION

Name: _________________________________ Date: __________________
Year & Section: _________________________ Score: _________________
Direction: Answer the following questions.
1. What are dimension tables?
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________

2. What are fact tables?
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________

3. What are multidimensional cubes?
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________

KUDOS!
You have come to the end of Module 5.
OOPS! Don’t forget that you still have an assignment to do.
Here it is….

VII. ASSIGNMENT

Name: _________________________________ Date: __________________
Year & Section: _________________________ Score: _________________
Direction: Write TRUE if the statement is correct, otherwise write FALSE.
_______ 1. The design of data warehouses and data marts is based on a multidimensional paradigm.
_______ 2. OLAP stands for online analytical programming.
_______ 3. The pivot operation produces a rotation of the axes, swapping some dimensions to obtain a
different view of a data cube.
_______ 4. The roll-down operation performs aggregation on a data cube either by climbing up the hierarchy
or by dimension reduction.
_______ 5. The roll-up operation navigates from less detailed data to more detailed data.
_______ 6. Fact tables are associated with the entities around which the processes of an organization
revolve.
_______ 7. Dimensions are placed in the middle of the schema and are linked to the dimension tables through
appropriate references.
_______ 8. A data warehouse includes several fact tables, interconnected with dimension tables, linked in
their turn with other dimension tables.
_______ 9. OLAP analyses are based on hierarchies of concepts to consolidate the data and to create
logical views.
_______ 10. Hierarchies of concepts are also used to perform several visualization operations dealing with
data cubes in a data warehouse.

After your long journey of reading and accomplishing the module, let us now
challenge your mind by answering the evaluation part of this module.

VIII. EVALUATION

Name: _________________________________ Date: __________________
Year & Section: _________________________ Score: _________________
Direction: Read each sentence/situation carefully, select the BEST answer among the choices and
encircle its corresponding letter.
1. What does OLAP stand for?
A. Online Analytical Programming
B. Online Analytical Processing
C. Online Analytical Paradigm
D. Online Analytical Production
2. It is a multidimensional array of values used to represent data.
A. Data Cubes B. Dimensions C. Fact Tables D. Dimension Tables
3. These are associated with the entities around which the processes of an organization revolve.
A. Data Cubes B. Dimensions C. Fact Tables D. Dimension Tables
4. It is placed in the middle of the schema and is linked to the dimension tables through appropriate
references.
A. Data Cubes B. Dimensions C. Fact Tables D. Dimension Tables
5. It consists of several fact tables, interconnected with dimension tables, linked in their turn with other
dimension tables.
A. Data Cubes B. Data Warehouse C. Fact Tables D. Dimension Tables
6. This operation produces a rotation of the axes, swapping some dimensions to obtain a different view of
a data cube.
A. Roll-Up B. Roll-Down C. Pivot D. Slice and Dice
7. It performs aggregation on a data cube either by climbing up the hierarchy or by dimension reduction.
A. Roll-Up B. Roll-Down C. Pivot D. Slice and Dice
8. This operation navigates from less detailed data to more detailed data.
A. Roll-Up B. Roll-Down C. Pivot D. Slice and Dice
9. It is the simplest style of data mart schema and is the approach most widely used to develop data
warehouses and dimensional data marts.
A. Star Schema B. Snowflake Schema C. Galaxy Schema D. None of these.
10. A technology that performs multidimensional analysis of business data and provides the capability for
complex calculations, trend analysis, and sophisticated data modeling.
A. Data Cubes B. Data Warehouse C. Fact Tables D. OLAP

CONGRATULATIONS on reaching the end of this module!


You may now proceed to the next module.
Don’t forget to submit all the exercises, activities and portfolio
on ___________________.
KEEP UP THE GOOD WORK.
Well Done!!!
