DM104 - Evaluation of Business Performance
Welcome Notes:
WELCOME BSIS STUDENTS!
I. INTRODUCTION:
This module discusses cubes and multidimensional analysis. A data cube is used to represent data along some
measure of interest. Although called a "cube", it can be 2-dimensional, 3-dimensional, or higher-dimensional.
Each dimension represents some attribute in the database, and the cells in the data cube represent the
measure of interest. The module also explains the data analysis process that groups data by dimensions and
measurements.
II. OBJECTIVES:
Before you proceed to the main lesson, test yourself in this activity.
GREAT!!!
You may now proceed to the main lesson.
Based on the preliminary activities, what did you notice about it?
________________________________________________________
CONGRATULATIONS!
You may now proceed to the lesson.
Dimension tables. In general, dimensions are associated with the entities around which the processes of
an organization revolve. Dimension tables then correspond to primary entities contained in the data
warehouse, and in most cases they directly derive from master tables stored in OLTP systems, such as
customers, products, sales, locations and time. Each dimension table is often internally structured according
to hierarchical relationships. For example, the temporal dimension is usually based upon two major
hierarchies: {day, week, year} and {day, month, quarter, year}. Similarly, the location dimension may be
hierarchically organized as {street, zip code, city, province, region, country, area}. Products in their turn have
hierarchical structures such as {item, family, type} in the manufacturing industry and {item, category,
department} in the retail industry. In a way, dimensions predetermine the main paths along which OLAP
analyses will presumably be developed.
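A hierarchy such as {day, month, quarter, year} can be pictured as a chain of mappings from a detailed value to its coarser levels. The short Python sketch below is only an illustration; the function name `time_hierarchy` and the label formats are assumptions, not something prescribed by the module.

```python
from datetime import date

def time_hierarchy(day: date) -> dict:
    """Return the levels of the {day, month, quarter, year} hierarchy for one day."""
    quarter = (day.month - 1) // 3 + 1  # months 1-3 -> Q1, 4-6 -> Q2, and so on
    return {
        "day": day.isoformat(),
        "month": f"{day.year}-{day.month:02d}",
        "quarter": f"{day.year}-Q{quarter}",
        "year": str(day.year),
    }

levels = time_hierarchy(date(2023, 8, 15))
print(levels)
```

Every sales record dated 15 August 2023 can then be consolidated at the month, quarter or year level simply by grouping on the corresponding label.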
Fact tables. Fact tables usually refer to transactions and contain two types of data:
Links to dimension tables, which are required to properly reference the information contained in each fact
table;
Numerical values of the attributes that characterize the corresponding transactions and that represent
the actual target of the subsequent OLAP analyses.
For example, a fact table may contain sales transactions and make reference to several dimension tables,
such as customers, points of sale, products, suppliers, time. The corresponding measures of interest are
attributes such as quantity of items sold, unit price and discount. In this example the fact table allows analysts
to evaluate the trends of sales over time, either total, or referred to a single customer, or referred to a group
of customers, that can be identified through any hierarchy induced by the dimension table associated with
the customers. The analyst may also evaluate the trend over time of sales percentages relative to customers
located in a specific region.
DM104 – Evaluation of Business Performance Page 4 of 15
Cubes and Multidimensional Analysis
Figure 1 shows the star schema associated with the fact table representing sales transactions. The fact
table is placed in the middle of the schema and is linked to the dimension tables through appropriate
references. The measures in the fact table appear in bold type.
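To make the star schema concrete, the sketch below builds a tiny one with Python's built-in `sqlite3` module. All table names, column names and sample rows are invented for illustration; only the overall shape (one fact table with measures, linked to dimension tables) comes from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, item TEXT, category TEXT);
CREATE TABLE time_dim (time_id     INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
-- Fact table: links to the dimension tables plus numerical measures.
CREATE TABLE sales (
    customer_id INTEGER REFERENCES customer(customer_id),
    product_id  INTEGER REFERENCES product(product_id),
    time_id     INTEGER REFERENCES time_dim(time_id),
    quantity    INTEGER,
    unit_price  REAL,
    discount    REAL
);
INSERT INTO customer VALUES (1, 'Ana', 'Asia'), (2, 'Bob', 'Europe');
INSERT INTO product  VALUES (1, 'TV', 'Electronics'), (2, 'PC', 'Electronics');
INSERT INTO time_dim VALUES (1, 'Q1', 2023), (2, 'Q2', 2023);
INSERT INTO sales VALUES
    (1, 1, 1, 2, 300.0, 0.0),
    (1, 2, 2, 1, 800.0, 0.1),
    (2, 1, 1, 3, 300.0, 0.0),
    (2, 1, 2, 1, 300.0, 0.0);
""")

# Trend of quantity sold over time, per customer region.
rows = conn.execute("""
    SELECT c.region, t.quarter, SUM(s.quantity) AS total_quantity
    FROM sales s
    JOIN customer c ON c.customer_id = s.customer_id
    JOIN time_dim t ON t.time_id = s.time_id
    GROUP BY c.region, t.quarter
    ORDER BY c.region, t.quarter
""").fetchall()
print(rows)
```

The query joins the fact table to two of its dimensions and sums one measure, which is the kind of analysis described above for sales by customer region over time.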
Sometimes dimension tables are connected in their turn to other dimension tables, as shown in Figure
2, through a process of partial data normalization, in order to reduce memory use. In the given example
the dimension table referring to the location is in turn hierarchically connected with the dimension table
containing geographical information. This brings about a snowflake schema.
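The idea of a dimension table pointing to another dimension table can be sketched with two small lookup tables. The names `location`, `geography` and `country_of`, and all the placeholder values, are assumptions made up for this example.

```python
# Hypothetical snowflaked dimension: each location row points to a row in a
# separate geography table instead of repeating province and country.
geography = {
    10: {"province": "Province X", "country": "Country P"},
    20: {"province": "Province Y", "country": "Country Q"},
}
location = {
    1: {"city": "City A", "geo_id": 10},
    2: {"city": "City B", "geo_id": 10},
    3: {"city": "City C", "geo_id": 20},
}

def country_of(location_id: int) -> str:
    """Follow location -> geography to find the country (two lookups)."""
    geo_id = location[location_id]["geo_id"]
    return geography[geo_id]["country"]

print(country_of(2))
```

Province and country are stored once per geography row rather than once per city, which saves memory at the cost of an extra lookup.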
A data warehouse usually includes several fact tables, interconnected with dimension tables, which are linked
in their turn with other dimension tables. This type of schema, shown in Figure 3, is termed a galaxy schema.
A fact table connected with n dimension tables may be represented by an n-dimensional data cube where
each axis corresponds to a dimension. Multidimensional cubes are a natural extension of the popular two-
dimensional spreadsheets, which can be interpreted as two-dimensional cubes. For instance, consider a
sales fact table developed along the three dimensions of {time, product, region}. Suppose we select only two
dimensions for the analysis, such as {time, product}, having preset the region attribute along the three values
{USA, Asia, Europe}. In this way we obtain three two-dimensional tables in which the rows correspond to
quarters of a year and the columns to products (see Tables 3.3–3.5). The cube shown in Figure 4 is a
three-dimensional illustration of the same sales fact table. Atomic data are represented by 36 cells that can be
obtained by crossing all possible values along the three dimensions: time {Q1, Q2, Q3, Q4}, region {USA,
Asia, Europe} and product {TV, PC, DVD}. These atomic cells can be supplemented by 44 cells
corresponding to the summary values obtained through consolidation along one or more dimensions, as
shown by the cube in the figure.
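The counts of 36 atomic cells and 44 summary cells can be checked with a few lines of Python. The marker value `ALL`, standing for "consolidated along this dimension", is an assumption introduced only for this sketch.

```python
from itertools import product

time = ["Q1", "Q2", "Q3", "Q4"]
region = ["USA", "Asia", "Europe"]
products = ["TV", "PC", "DVD"]
ALL = "ALL"  # hypothetical marker for a dimension that has been consolidated

# Atomic cells: one per combination of actual values.
atomic = list(product(time, region, products))

# Extended cube: each dimension may also take the consolidated value ALL.
extended = list(product(time + [ALL], region + [ALL], products + [ALL]))

# Summary cells are those with at least one consolidated dimension.
summary = [cell for cell in extended if ALL in cell]

print(len(atomic), len(summary), len(extended))
```

The extended cube has 5 x 4 x 4 = 80 cells, of which 36 are atomic and the remaining 44 hold summary values.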
Suppose that the sales fact table also contains a fourth dimension represented by the suppliers. The
corresponding data cube constitutes a structure in four-dimensional space and therefore cannot be
represented graphically. However, we can obtain four logical views composed of three-dimensional cubes,
called cuboids, inside the four-dimensional cube, by fixing the values of one dimension.
More generally, starting from a fact table linked to n dimension tables, it is possible to obtain a lattice of
cuboids, each of them corresponding to a different level of consolidation along one or more dimensions. This
type of aggregation is equivalent in structured query language (SQL) to a SUM query with a GROUP BY
clause. Figure 5 illustrates the lattice composed of the cuboids obtained from the data cube defined
along the four dimensions {time, product, region, supplier}.
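When no hierarchies are considered, each cuboid in the lattice corresponds to one subset of the dimensions that is kept, with the remaining dimensions consolidated. The following sketch, with the variable names chosen for illustration, enumerates the lattice for the four dimensions above.

```python
from itertools import combinations

dimensions = ["time", "product", "region", "supplier"]

# Level k of the lattice holds the cuboids that keep exactly k dimensions.
lattice = {
    k: list(combinations(dimensions, k))
    for k in range(len(dimensions) + 1)
}

for k, cuboids in lattice.items():
    print(k, cuboids)

total = sum(len(cuboids) for cuboids in lattice.values())
print(total)
```

The lattice contains 1 + 4 + 6 + 4 + 1 = 16 cuboids, that is 2^4, ranging from the cuboid that keeps no dimension to the one that keeps all four.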
The cuboid associated with the atomic data, which therefore does not imply any type of consolidation, is
called a base cuboid. At the other extreme, the apex cuboid is defined as the cuboid corresponding to the
consolidation along all dimensions, therefore associated with the grand total of the measure of interest.
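The base and apex cuboids can be illustrated with a small group-by function over made-up fact rows. The function name `cuboid` and the sample data are assumptions for this sketch; three dimensions are used to keep it short.

```python
from collections import defaultdict

# Hypothetical fact rows: (time, product, region, quantity)
facts = [
    ("Q1", "TV", "USA", 5),
    ("Q1", "PC", "USA", 3),
    ("Q2", "TV", "Asia", 4),
    ("Q2", "TV", "USA", 2),
]
DIMS = ("time", "product", "region")

def cuboid(keep):
    """Sum quantity grouped by the dimensions in `keep` (like SUM with GROUP BY)."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in idx)
        totals[key] += row[-1]
    return dict(totals)

base = cuboid(DIMS)          # base cuboid: no consolidation at all
by_time = cuboid(("time",))  # intermediate cuboid: consolidated over product and region
apex = cuboid(())            # apex cuboid: grand total

print(base)
print(by_time)
print(apex)
```

The apex cuboid holds a single cell with the grand total of 14 units, while the base cuboid keeps one cell per atomic combination.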
Roll-up. A roll-up operation, also termed drill-up, consists of an aggregation of data in the cube, which can
be obtained alternatively in the following two ways.
Proceeding upwards to a higher level along a single dimension defined over a concept hierarchy.
For example, for the {location} dimension it is possible to move upwards from the {city} level to the
{province} level and to consolidate the measures of interest through a group-by conditioned sum
over all records whereby the city belongs to the same province.
Reducing by one dimension. For example, removing the {time} dimension consolidates the measures
of interest through a sum over all time periods existing in the data cube.
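A roll-up from the {city} level to the {province} level can be sketched as follows. The city and province names are placeholders, and `roll_up` is a hypothetical helper written for this example.

```python
from collections import defaultdict

# Placeholder hierarchy and measures, invented for illustration.
city_to_province = {"City A": "Province X", "City B": "Province X", "City C": "Province Y"}
sales_by_city = {"City A": 120, "City B": 80, "City C": 50}

def roll_up(measures, parent_of):
    """Consolidate a measure one level up a hierarchy by summing per parent."""
    totals = defaultdict(int)
    for child, value in measures.items():
        totals[parent_of[child]] += value
    return dict(totals)

sales_by_province = roll_up(sales_by_city, city_to_province)
print(sales_by_province)
```

All cities that belong to the same province contribute to a single consolidated value for that province.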
Roll-down. A roll-down operation, also referred to as drill-down, is the opposite operation to roll-up. It allows
navigation through a data cube from aggregated and consolidated information to more detailed information.
The effect is to reverse the result achieved through a roll-up operation. A drill-down operation can therefore
be carried out in two ways.
Shifting down to a lower level along a single dimension hierarchy. For example, in the case of the
{location} dimension, it is possible to shift from the {province} level to the {city} level and to
disaggregate the measures of interest over all records whereby the city belongs to the same
province.
Adding one dimension. For example, introducing the {time} dimension disaggregates
the measures of interest over all time periods existing in the data cube.
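Both forms of drill-down can be seen by regrouping the same detailed records at a finer grain. The records, field names and the helper `view` below are assumptions made for this sketch.

```python
from collections import defaultdict

# Hypothetical detailed records: (province, city, quarter, quantity)
records = [
    ("Province X", "City A", "Q1", 10),
    ("Province X", "City A", "Q2", 20),
    ("Province X", "City B", "Q1", 5),
    ("Province Y", "City C", "Q1", 7),
]
FIELDS = ("province", "city", "quarter")

def view(group_by):
    """Sum quantity grouped by the listed fields."""
    idx = [FIELDS.index(f) for f in group_by]
    totals = defaultdict(int)
    for rec in records:
        totals[tuple(rec[i] for i in idx)] += rec[-1]
    return dict(totals)

by_province = view(("province",))                    # aggregated starting point
by_city = view(("province", "city"))                 # lower level of the location hierarchy
by_province_quarter = view(("province", "quarter"))  # one more dimension: time

print(by_province)
print(by_city)
print(by_province_quarter)
```

Each drilled-down view splits the province totals into more detailed cells, and the detailed cells still add up to the original totals.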
Slice and dice. Through the slice operation the value of an attribute is selected and fixed along one
dimension. For example, Table 3.3 has been obtained by fixing the region at the {USA} value. The dice
operation obtains a cube in a subspace by selecting several dimensions simultaneously.
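Slice and dice can be sketched on a cube stored as a dictionary keyed by coordinates. The helper names `slice_cube` and `dice_cube`, the storage layout and the numbers are assumptions for illustration; here dice is interpreted as restricting the values on several dimensions at once.

```python
# Hypothetical cube stored as {(time, product, region): quantity}
cube = {
    ("Q1", "TV", "USA"): 5, ("Q1", "PC", "USA"): 3,
    ("Q1", "TV", "Asia"): 4, ("Q2", "TV", "USA"): 2,
    ("Q2", "PC", "Asia"): 6, ("Q2", "DVD", "Europe"): 1,
}

def slice_cube(cube, axis, value):
    """Fix one dimension at a single value and drop that axis from the keys."""
    return {k[:axis] + k[axis + 1:]: v for k, v in cube.items() if k[axis] == value}

def dice_cube(cube, allowed):
    """Keep cells whose coordinates fall in the allowed sets on each listed axis."""
    return {k: v for k, v in cube.items()
            if all(k[axis] in values for axis, values in allowed.items())}

usa = slice_cube(cube, 2, "USA")                     # two-dimensional table for USA
sub = dice_cube(cube, {0: {"Q1"}, 1: {"TV", "PC"}})  # sub-cube: Q1, TV or PC

print(usa)
print(sub)
```

The slice returns a table with one dimension fewer, while the dice returns a smaller cube that keeps all three dimensions.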
Pivot. The pivot operation, also referred to as rotation, produces a rotation of the axes, swapping some
dimensions to obtain a different view of a data cube.
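For a two-dimensional view, a pivot amounts to swapping rows and columns. The sketch below uses made-up numbers and a hypothetical helper named `pivot`.

```python
# Two-dimensional view: rows are quarters, columns are products (made-up numbers).
rows = ["Q1", "Q2"]
cols = ["TV", "PC", "DVD"]
table = [
    [5, 3, 0],
    [2, 6, 1],
]

def pivot(row_labels, col_labels, values):
    """Swap the two axes: former columns become rows and vice versa."""
    rotated = [list(col) for col in zip(*values)]
    return col_labels, row_labels, rotated

new_rows, new_cols, rotated = pivot(rows, cols, table)
print(new_rows, new_cols)
print(rotated)
```

The data are unchanged; only the orientation of the view differs, with products now listed down the rows and quarters across the columns.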
For example, if a data cube includes 5 dimensions, and if each of these dimensions includes 3 hierarchical
levels, each dimension can appear at any of its 3 levels or be fully aggregated, so the number of cuboids is
equal to (3 + 1)^5 = 4^5 = 2^10 = 1024 ≈ 10^3. It is clear that the full materialization of the cuboids
for all the cubes associated with the fact tables of a data warehouse would impose storage requirements that
could hardly be sustained over time, considering the rate at which new records are gathered.
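The count in this example follows from multiplying, over all dimensions, the number of hierarchical levels plus one (the extra one standing for full aggregation of that dimension). A minimal sketch, with `count_cuboids` as a hypothetical helper name:

```python
from math import prod

def count_cuboids(levels_per_dimension):
    """Number of cuboids: product over dimensions of (hierarchical levels + 1)."""
    return prod(levels + 1 for levels in levels_per_dimension)

print(count_cuboids([3, 3, 3, 3, 3]))  # 5 dimensions with 3 levels each
print(count_cuboids([1, 1, 1, 1]))     # 4 dimensions with a single level each
```

The first call gives 1024 cuboids; the second gives 16, the size of a lattice over four dimensions without hierarchies.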
For all of the above reasons, it is necessary to strike a balance between the need for fast access to
information, which would suggest the full materialization of the cuboids, and the need to keep memory use
within reasonable limits. As a consequence, preventive materialization should be carried out only for those
cuboids that are most frequently accessed, while for the others the computation should be carried out on
demand only when actual queries requesting the associated information are performed. This latter approach
is referred to as partial materialization of the information relative to the data cubes.
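Partial materialization can be sketched as a store that precomputes a chosen set of cuboids and computes the others only when queried. The class `PartialCube`, its attributes and the sample data are assumptions for illustration, not a description of how any particular OLAP product works.

```python
from collections import defaultdict

# Hypothetical fact rows: (time, product, region, quantity)
facts = [
    ("Q1", "TV", "USA", 5),
    ("Q1", "PC", "USA", 3),
    ("Q2", "TV", "Asia", 4),
]
DIMS = ("time", "product", "region")

class PartialCube:
    def __init__(self, facts, dims, materialize):
        self.facts, self.dims = facts, dims
        # Precompute only the cuboids expected to be accessed most often.
        self.store = {frozenset(keep): self._compute(keep) for keep in materialize}
        self.computed_on_demand = 0

    def _compute(self, keep):
        """Sum quantity grouped by the dimensions in `keep`."""
        idx = [i for i, d in enumerate(self.dims) if d in keep]
        totals = defaultdict(int)
        for row in self.facts:
            totals[tuple(row[i] for i in idx)] += row[-1]
        return dict(totals)

    def query(self, keep):
        key = frozenset(keep)
        if key in self.store:        # materialized in advance: fast access
            return self.store[key]
        self.computed_on_demand += 1  # otherwise computed when requested
        return self._compute(keep)

cube = PartialCube(facts, DIMS, materialize=[("time",)])
print(cube.query(("time",)))    # served from the precomputed store
print(cube.query(("region",)))  # computed on demand
print(cube.computed_on_demand)
```

Choosing which cuboids go into `materialize` is the balance described above: more precomputed cuboids mean faster answers but more memory.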
1. The design of data warehouses and data marts is based on a multidimensional __________.
2. __________ representation is based on a star schema which contains dimension tables and fact tables.
3. __________ are associated with the entities around which the processes of an organization revolve.
4. __________ is placed in the middle of the schema and is linked to the dimension tables through
appropriate references.
5. __________ includes several fact tables, interconnected with dimension tables, linked in their turn with
other dimension tables.
6. __________ analyses are based on hierarchies of concepts to consolidate the data and to create logical
views.
7. __________ are also used to perform several visualization operations dealing with data cubes in a data
warehouse.
8. __________ operation performs aggregation on a data cube either by climbing up the hierarchy or by
dimension reduction.
9. __________ operation navigates from less detailed data to more detailed data.
10. __________ operation produces a rotation of the axes, swapping some dimensions to obtain a different
view of a data cube.
VI. GENERALIZATION
KUDOS!
You have come to the end of Module 5.
OOPS! Don’t forget that you still have an assignment to do.
Here it is….
VII. ASSIGNMENT
After your long journey of reading and accomplishing the module, let us now
challenge your mind by answering the evaluation part of this module.
VIII. EVALUATION