A Multi-Dimensional Data Model
From Tables and Spreadsheets to Data Cubes
• A data warehouse is based on a multidimensional data model, which views
data in the form of a data cube.
• In general terms, dimensions are the perspectives or entities with respect
to which an organization wants to keep records.
• A data cube, such as sales, allows data to be modeled and viewed in
multiple dimensions. It is defined by dimensions and facts:
– Dimension tables, such as item (item_name, brand, type) or time (day,
week, month, quarter, year)
– A fact table, which contains measures (such as dollars_sold) and keys to
each of the related dimension tables
• Each dimension may have a table associated with it, called a dimension
table.
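To make the split between dimension tables and the fact table concrete, here is a minimal sketch in plain Python. Everything in it is hypothetical: the dictionaries `item_dim` and `time_dim`, the key values, and the sales figures are invented for illustration. Each fact row holds one key per dimension plus the measure dollars_sold, and a cube cell is addressed by one attribute value from each dimension.

```python
from collections import defaultdict

# Hypothetical dimension tables: surrogate key -> descriptive attributes.
item_dim = {
    1: {"item_name": "TV", "brand": "Acme", "type": "home_entertainment"},
    2: {"item_name": "Laptop", "brand": "Zeta", "type": "computer"},
}
time_dim = {
    10: {"month": 1, "quarter": "Q1", "year": 2024},
    11: {"month": 4, "quarter": "Q2", "year": 2024},
}

# Hypothetical fact table: one key per dimension plus the measure dollars_sold.
sales_fact = [
    {"time_key": 10, "item_key": 1, "dollars_sold": 400.0},
    {"time_key": 10, "item_key": 2, "dollars_sold": 900.0},
    {"time_key": 11, "item_key": 1, "dollars_sold": 350.0},
]


def build_cube(facts, item_attr, time_attr):
    """Sum dollars_sold into cells addressed by (item attribute, time attribute)."""
    cube = defaultdict(float)
    for row in facts:
        # Follow each foreign key into its dimension table.
        item_value = item_dim[row["item_key"]][item_attr]
        time_value = time_dim[row["time_key"]][time_attr]
        cube[(item_value, time_value)] += row["dollars_sold"]
    return dict(cube)


cube = build_cube(sales_fact, "type", "quarter")
```

Calling `build_cube(sales_fact, "brand", "year")` views the same facts along different attributes of the same dimensions; only the dimension lookups change, not the fact table.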
• Star schema: The most common modeling paradigm is the star schema,
in which the data warehouse contains (1) a large central table (fact
table) containing the bulk of the data, with no redundancy, and (2) a set
of smaller attendant tables (dimension tables), one for each dimension.
The schema graph resembles a starburst, with the dimension tables
displayed in a radial pattern around the central fact table.
Example 3.1 Star schema.
• A star schema for AllElectronics sales is shown in Figure
3.4. Sales are considered along four dimensions,
namely, time, item, branch, and location.
• The schema contains a central fact table for sales that
contains keys to each of the four dimensions, along with
two measures: dollars_sold and units_sold. To minimize
the size of the fact table, dimension identifiers (such as
time_key and item_key) are system-generated identifiers.
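A star schema like the one in Example 3.1 can be sketched with Python's built-in `sqlite3` module. This is an illustration under assumptions, not the schema of Figure 3.4: the table names (`sales_fact`, `time_dim`, and so on), the attribute columns beyond those named in the text, and all rows are hypothetical. The query shows the typical star join, where the fact table joins directly to each dimension table it needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical dimension tables (one per dimension).
CREATE TABLE time_dim     (time_key INTEGER PRIMARY KEY, day INTEGER, month INTEGER,
                           quarter TEXT, year INTEGER);
CREATE TABLE item_dim     (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT, type TEXT);
CREATE TABLE branch_dim   (branch_key INTEGER PRIMARY KEY, branch_name TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, city TEXT, country TEXT);
-- Central fact table: one key per dimension plus the two measures.
CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES time_dim(time_key),
    item_key     INTEGER REFERENCES item_dim(item_key),
    branch_key   INTEGER REFERENCES branch_dim(branch_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    dollars_sold REAL,
    units_sold   INTEGER
);
""")
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?, ?, ?)",
                 [(1, 5, 1, "Q1", 2024), (2, 20, 4, "Q2", 2024)])
conn.executemany("INSERT INTO item_dim VALUES (?, ?, ?, ?)",
                 [(1, "TV", "Acme", "home_entertainment"), (2, "Laptop", "Zeta", "computer")])
conn.executemany("INSERT INTO branch_dim VALUES (?, ?)", [(1, "Downtown")])
conn.executemany("INSERT INTO location_dim VALUES (?, ?, ?)", [(1, "Vancouver", "Canada")])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?, ?)",
                 [(1, 1, 1, 1, 400.0, 2), (1, 2, 1, 1, 900.0, 1), (2, 1, 1, 1, 350.0, 1)])

# Star join: the fact table joins directly to each dimension it needs.
rows = conn.execute("""
    SELECT t.quarter, i.type, SUM(s.dollars_sold), SUM(s.units_sold)
    FROM sales_fact AS s
    JOIN time_dim AS t ON s.time_key = t.time_key
    JOIN item_dim AS i ON s.item_key = i.item_key
    GROUP BY t.quarter, i.type
    ORDER BY t.quarter, i.type
""").fetchall()
```

Because every dimension is one join away from the fact table, queries over any subset of dimensions stay short; the cost is that dimension tables may repeat values (for example, the same country on many location rows).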
• Snowflake schema: The snowflake schema is a variant of the star
schema model, where some dimension tables are normalized, thereby
further splitting the data into additional tables. The resulting
schema graph forms a shape similar to a snowflake.
Example 3.2 Snowflake schema.
• A snowflake schema for AllElectronics sales is given in Figure 3.5.
Here, the sales fact table is identical to that of the star schema in Figure 3.4. The main
difference between the two schemas is in the definition of the dimension tables.
• The single dimension table for item in the star schema is normalized in the snowflake
schema, resulting in new item and supplier tables. For example, the item dimension
table now contains the attributes item_key, item_name, brand, type, and supplier_key,
where supplier_key links to the supplier dimension table, which is keyed by supplier_key.
• Similarly, the single dimension table for location in the star schema can be normalized
into two new tables: location and city. The city_key in the new location table links to the
city dimension.
• Notice that further normalization can be performed on province_or_state and country in
the snowflake schema of Figure 3.5.
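The effect of normalizing the location dimension can be sketched with `sqlite3`. The table and column names and all rows below are hypothetical; the point is that once city attributes move into their own table, a query that groups sales by city needs one extra join compared with a star schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical snowflaked location dimension: city attributes live in city_dim.
CREATE TABLE city_dim (city_key INTEGER PRIMARY KEY, city TEXT,
                       province_or_state TEXT, country TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, street TEXT,
                           city_key INTEGER REFERENCES city_dim(city_key));
CREATE TABLE sales_fact (location_key INTEGER REFERENCES location_dim(location_key),
                         dollars_sold REAL);
INSERT INTO city_dim VALUES (1, 'Vancouver', 'British Columbia', 'Canada'),
                            (2, 'Toronto', 'Ontario', 'Canada');
INSERT INTO location_dim VALUES (10, '123 Main St', 1), (11, '9 King St', 2),
                                (12, '77 Oak Ave', 1);
INSERT INTO sales_fact VALUES (10, 400.0), (11, 250.0), (12, 100.0);
""")

# fact -> location -> city: two joins instead of the single join a star schema needs.
by_city = conn.execute("""
    SELECT c.city, SUM(s.dollars_sold)
    FROM sales_fact AS s
    JOIN location_dim AS l ON s.location_key = l.location_key
    JOIN city_dim AS c ON l.city_key = c.city_key
    GROUP BY c.city
    ORDER BY c.city
""").fetchall()
```

The trade-off shown here is the usual one: each city's province and country are stored once instead of once per location row, at the price of an additional join at query time.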
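• Fact constellation: Sophisticated applications may require multiple fact tables to
share dimension tables. Such a schema can be viewed as a collection of stars and is
called a fact constellation (or galaxy schema). For example, a fact constellation
schema for AllElectronics specifies two fact tables, sales and shipping. The sales table
definition is identical to that of the star schema (Figure 3.4). The shipping
table has five dimensions, or keys: item_key, time_key, shipper_key, from_location,
and to_location, and two measures: dollars_cost and units_shipped.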
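• Dimension tables can be shared between fact tables. For example, the dimension
tables for time, item, and location are shared by both the sales and shipping fact
tables.
• A data warehouse collects information about subjects that span the entire
organization, so its scope is enterprise-wide. A data mart is a departmental subset
of the data warehouse that focuses on selected subjects, so its scope is
department-wide.
• For data marts, the star or snowflake schema is commonly used.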
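A fact constellation can be sketched as two fact tables that reference the same dimension table. In this hypothetical `sqlite3` example, `sales_fact` and `shipping_fact` both use `item_dim`; the location dimension that from_location and to_location would reference is omitted for brevity, and all names and rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One shared (hypothetical) dimension table, referenced by both fact tables.
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE sales_fact (item_key INTEGER REFERENCES item_dim(item_key),
                         dollars_sold REAL);
-- from_location / to_location would reference a location dimension (omitted here).
CREATE TABLE shipping_fact (item_key INTEGER REFERENCES item_dim(item_key),
                            from_location INTEGER, to_location INTEGER,
                            dollars_cost REAL, units_shipped INTEGER);
INSERT INTO item_dim VALUES (1, 'TV'), (2, 'Laptop');
INSERT INTO sales_fact VALUES (1, 400.0), (1, 350.0), (2, 900.0);
INSERT INTO shipping_fact VALUES (1, 10, 11, 30.0, 2), (2, 10, 12, 45.0, 1);
""")

# Aggregate each fact table separately, then line the results up through item_dim.
rows = conn.execute("""
    SELECT i.item_name, s.total_sold, sh.total_cost
    FROM item_dim AS i
    JOIN (SELECT item_key, SUM(dollars_sold) AS total_sold
          FROM sales_fact GROUP BY item_key) AS s ON s.item_key = i.item_key
    JOIN (SELECT item_key, SUM(dollars_cost) AS total_cost
          FROM shipping_fact GROUP BY item_key) AS sh ON sh.item_key = i.item_key
    ORDER BY i.item_name
""").fetchall()
```

Each fact table is aggregated before the join on purpose: joining the two fact tables row by row would pair every sale of an item with every shipment of that item and inflate both sums.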
Cube Definition Syntax (BNF) in DMQL
Defining Star Schema in DMQL
Defining Snowflake Schema in DMQL
Defining Fact Constellation in DMQL
• Distributive: An aggregate function is distributive if it can be computed in a
distributed manner, as follows.
• Suppose the data are partitioned into n sets. We apply the function to each
partition, resulting in n aggregate values. If combining the n aggregate values
gives the same result as applying the function to the entire data set (without
partitioning), the function can be computed in a distributed manner. For example,
count() can be computed for a data cube by first partitioning the cube into a set
of subcubes, computing count() for each subcube, and then summing up the counts
obtained for each subcube.
• Hence, count() is a distributive aggregate function. For the same reason, sum(),
min(), and max() are distributive aggregate functions.
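The partition-then-combine test above can be checked directly in Python. This sketch pairs each aggregate with the function used to merge its partial results (for count(), the partial counts are summed, as in the text); the data values and the `partition` and `distributed` helpers are hypothetical. For contrast, it also shows that median() does not pass the test.

```python
import statistics


def partition(data, n):
    """Split data into at most n contiguous, non-empty partitions."""
    size = max(1, -(-len(data) // n))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


# name -> (function applied to each partition, function that merges the partials)
DISTRIBUTIVE = {
    "count": (len, sum),  # partial counts are merged by summing
    "sum": (sum, sum),
    "min": (min, min),
    "max": (max, max),
}


def distributed(name, data, n):
    """Aggregate each partition separately, then merge the n partial values."""
    local, merge = DISTRIBUTIVE[name]
    return merge([local(part) for part in partition(data, n)])


data = [4.0, 8.0, 15.0, 16.0, 23.0, 42.0, 7.0]
for name, (local, _) in DISTRIBUTIVE.items():
    # The distributed result must equal the result on the unpartitioned data.
    assert distributed(name, data, 3) == local(data), name

# For contrast, median() is not distributive: the median of the partition
# medians generally differs from the median of the whole data set.
median_of_medians = statistics.median(statistics.median(p) for p in partition(data, 3))
```

Here the three partitions have medians 8.0, 23.0, and 7.0, whose median is 8.0, while the median of all seven values is 15.0, so no partition-wise shortcut of this form works for median().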
Concept Hierarchies
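• Roll-up: The roll-up operation (also called the drill-up operation by some
vendors) performs aggregation on a data cube, either by climbing up a concept
hierarchy for a dimension or by dimension reduction. When roll-up is performed
by dimension reduction, one or more dimensions are removed from the given cube.
For example, consider a sales data cube containing only the two dimensions
location and time. Roll-up may be performed by removing the time dimension,
resulting in an aggregation of the total sales by location, rather than by
location and by time.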
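Both forms of roll-up can be sketched on a small two-dimensional cube stored as a Python dict. The cube values, the city names, and the city-to-country hierarchy are hypothetical. `roll_up_by_reduction` removes a dimension by summing over it; `roll_up_by_hierarchy` replaces each member of a dimension with its parent in a concept hierarchy.

```python
from collections import defaultdict

# Hypothetical 2-D sales cube: (city, quarter) -> dollars_sold.
sales_cube = {
    ("Vancouver", "Q1"): 600.0, ("Vancouver", "Q2"): 700.0,
    ("Toronto", "Q1"): 1100.0, ("Toronto", "Q2"): 900.0,
    ("Chicago", "Q1"): 500.0, ("Chicago", "Q2"): 400.0,
}

# Hypothetical concept hierarchy for location: city -> country.
CITY_TO_COUNTRY = {"Vancouver": "Canada", "Toronto": "Canada", "Chicago": "USA"}


def roll_up_by_reduction(cube, drop_axis):
    """Remove one dimension by summing the measure over it."""
    result = defaultdict(float)
    for key, value in cube.items():
        result[key[:drop_axis] + key[drop_axis + 1:]] += value
    return dict(result)


def roll_up_by_hierarchy(cube, axis, parent_of):
    """Climb one level of a concept hierarchy by mapping each member to its parent."""
    result = defaultdict(float)
    for key, value in cube.items():
        new_key = key[:axis] + (parent_of[key[axis]],) + key[axis + 1:]
        result[new_key] += value
    return dict(result)


by_location = roll_up_by_reduction(sales_cube, drop_axis=1)        # time removed
by_country = roll_up_by_hierarchy(sales_cube, 0, CITY_TO_COUNTRY)  # city -> country
```

Both functions only sum values, which is valid because sum() is distributive; a holistic measure such as a median could not be rolled up from cell values this way.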
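• Drill-down: Drill-down is the reverse of roll-up. It navigates from less
detailed data to more detailed data, either by stepping down a concept
hierarchy for a dimension or by introducing additional dimensions.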
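Drill-down needs data at a finer granularity than the current view, so this sketch keeps hypothetical base facts at month level and aggregates them to whichever level of a hypothetical month-to-quarter hierarchy is requested. Moving from the quarterly view to the monthly view steps down the time hierarchy; drilling down by adding a dimension is not shown.

```python
from collections import defaultdict

# Hypothetical base facts at month granularity: (city, month, dollars_sold).
base_facts = [
    ("Vancouver", 1, 150.0), ("Vancouver", 2, 200.0), ("Vancouver", 3, 250.0),
    ("Vancouver", 4, 300.0), ("Vancouver", 5, 180.0), ("Vancouver", 6, 220.0),
]


def month_to_quarter(month):
    """Hypothetical time hierarchy: months 1-3 -> Q1, 4-6 -> Q2, and so on."""
    return f"Q{(month - 1) // 3 + 1}"


def view(facts, time_level):
    """Aggregate the base facts at the requested time level: 'quarter' or 'month'."""
    cube = defaultdict(float)
    for city, month, dollars in facts:
        time_key = month_to_quarter(month) if time_level == "quarter" else month
        cube[(city, time_key)] += dollars
    return dict(cube)


quarterly = view(base_facts, "quarter")  # less detailed
monthly = view(base_facts, "month")      # drill-down: one level lower in the hierarchy
```

The monthly cells for each quarter sum back to the quarterly cell, which is what makes drill-down and roll-up inverse views of the same data.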
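• Slice: The slice operation performs a selection on one dimension of the given
cube, resulting in a subcube. Figure 3.10 shows a slice operation where the sales
data are selected from the central cube for the dimension time using a selection
criterion on time.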
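A slice can be sketched as a filter on one coordinate of a cube stored as a Python dict. The three-dimensional cube, its values, and the criterion value "Q2" are hypothetical and chosen only for illustration; they are not the data or criterion of Figure 3.10.

```python
# Hypothetical 3-D sales cube: (city, quarter, item_type) -> dollars_sold.
sales_cube = {
    ("Vancouver", "Q1", "home_entertainment"): 300.0,
    ("Vancouver", "Q2", "home_entertainment"): 350.0,
    ("Toronto", "Q1", "computer"): 800.0,
    ("Toronto", "Q2", "computer"): 900.0,
    ("Toronto", "Q2", "home_entertainment"): 120.0,
}

TIME_AXIS = 1  # position of the time coordinate in each cell key


def slice_cube(cube, axis, value):
    """Keep cells whose coordinate on `axis` equals `value`, then drop that axis."""
    return {
        key[:axis] + key[axis + 1:]: measure
        for key, measure in cube.items()
        if key[axis] == value
    }


q2_slice = slice_cube(sales_cube, TIME_AXIS, "Q2")  # a 2-D (city, item_type) subcube
```

The sliced dimension is dropped from the keys because every remaining cell shares the same time value, so the result is a subcube with one fewer dimension.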