Data Warehouse Modeling
Data warehouse modeling is the process of designing the schemas of the detailed
and summarized information of the data warehouse. The goal of data warehouse
modeling is to develop a schema describing the reality, or at least the part of it,
that the data warehouse is needed to support.
Data warehouse modeling is an essential stage of building a data warehouse for two
main reasons. Firstly, through the schema, data warehouse users can visualize the
relationships among the warehouse data and so use the data with greater ease.
Secondly, a well-designed schema allows an effective data warehouse structure to
emerge, which helps decrease the cost of implementing the warehouse and improves
the efficiency of using it.
Older detail data is stored in some form of mass storage. It is infrequently
accessed and is kept at a level of detail consistent with the current detailed data.
Lightly summarized data is data extracted from the low level of detail found at the
current detailed level, and it is usually stored on disk. When building the data
warehouse, the designer has to decide what unit of time the summarization is done
over and which attributes the summarized data will contain.
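The two design decisions above (the unit of time and the attributes to keep) can be
made concrete with a small sketch. The records, the choice of the month as the time
unit, and the name summarize_by_month are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical current-detail records: (sale date, product, amount).
detail = [
    (date(2024, 1, 5), "phone", 100.0),
    (date(2024, 1, 20), "phone", 150.0),
    (date(2024, 2, 3), "laptop", 900.0),
]

def summarize_by_month(records):
    """Lightly summarize detail records.

    Time unit chosen here: the calendar month.
    Attribute kept here: the total amount (the product is dropped).
    """
    totals = defaultdict(float)
    for day, _product, amount in records:
        totals[(day.year, day.month)] += amount
    return dict(totals)

monthly = summarize_by_month(detail)
```

Here monthly maps each (year, month) pair to its total amount, so the three detail
rows collapse into two summary rows.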
Highly summarized data is compact and directly available and can even be found
outside the warehouse.
Metadata is the final element of the data warehouse. It is of a different dimension
from the other warehouse data in that it is not drawn from the operational
environment; instead, it is used as:
o A directory to help the DSS analyst locate the contents of the data
warehouse.
o A guide to the mapping of data as it is transformed from the operational
environment to the data warehouse environment.
o A guide to the method used for summarization between the current detailed
data, the lightly summarized data, and the highly summarized data.
The objective of the data modeling life cycle is primarily the creation of a storage
area for business information. That storage area is produced by the logical and
physical data modeling stages, as shown in the figure.
Conceptual Data Model
A conceptual data model identifies the highest-level relationships between the
different entities.
The only information shown in the conceptual data model is the entities that
describe the data and the relationships between those entities. No other
information is shown in the conceptual data model.
Logical Data Model
A logical data model describes the data in as much detail as possible, without
regard to how it will be physically implemented in the database. The primary
objective of logical data modeling is to document the business data structures,
processes, rules, and relationships in a single view: the logical data model.
The steps for designing the logical data model are as follows:
The steps for designing the physical data model are as follows:
Enterprise Warehouse
An enterprise warehouse collects all of the information about subjects spanning the
entire organization. It supports corporate-wide data integration, usually from one or
more operational systems or external data providers, and it is cross-functional in
scope. It generally contains detailed information as well as summarized information
and can range in size from a few gigabytes to hundreds of gigabytes, terabytes,
or beyond.
Data Mart
A data mart contains a subset of corporate-wide data that is of value to a specific
group of users. The scope is confined to particular selected subjects. For example,
a marketing data mart may restrict its subjects to customers, items, and sales. The
data contained in data marts tends to be summarized.
Independent Data Mart: An independent data mart is sourced from data captured
from one or more operational systems or external data providers, or from data
generated locally within a particular department or geographic area.
Dependent Data Mart: A dependent data mart is sourced directly from the enterprise
data warehouse.
Virtual Warehouses
A virtual warehouse is a set of views over the operational databases. For
efficient query processing, only some of the possible summary views may be
materialized. A virtual warehouse is simple to build but requires excess capacity on
the operational database servers.
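The idea of a warehouse built from views can be sketched with Python's built-in
sqlite3 module. The table sales, the view sales_by_city, and the sample rows are
hypothetical; the point is only that a view stores a query rather than data, so every
read runs against the operational table.

```python
import sqlite3

# Hypothetical operational table; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (city TEXT, item TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("Toronto", "phone", 100.0),
        ("Toronto", "laptop", 900.0),
        ("Chicago", "phone", 120.0),
    ],
)

# A view is a stored query: nothing is copied, so each read of the view
# is executed against the operational table itself.
conn.execute(
    "CREATE VIEW sales_by_city AS "
    "SELECT city, SUM(amount) AS total FROM sales GROUP BY city"
)

def sales_by_city():
    """Read the summary view and return {city: total}."""
    rows = conn.execute("SELECT city, total FROM sales_by_city ORDER BY city")
    return dict(rows)

summary = sales_by_city()
```

Because the summary is recomputed on every read, new operational rows show up
immediately, and the query cost lands on the operational server. That is the extra
capacity the paragraph above refers to.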
Roll-Up
The roll-up operation (also known as drill-up or aggregation operation) performs
aggregation on a data cube, either by climbing up a concept hierarchy for a
dimension or by dimension reduction. Roll-up is like zooming out on the data cube.
The figure shows the result of a roll-up operation performed on the dimension
location. The hierarchy for location is defined as the order street < city <
province or state < country. The roll-up operation aggregates the data by ascending
the location hierarchy from the level of city to the level of country.
Example
Consider the following cube, which records how many days in each week had a
given temperature:
Temperature  64  65  68  69  70  71  72  75  80  81  83
Week1         1   0   1   0   1   0   0   0   0   0   1
Week2         0   0   0   1   0   0   1   2   0   1   0
Suppose that we want to set up levels (hot (80-85), mild (70-75), cool (64-69)) for
temperature in the above cube.
To do this, we have to group the columns and add up the values according to the
concept hierarchy. This operation is known as a roll-up.
Temperature  cool  mild  hot
Week1           2     1    1
Week2           1     3    1
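The grouping step can be sketched in a few lines of Python. The weekly counts and the
level ranges are taken from the example above; the names levels, level_of, and
roll_up are illustrative only.

```python
# Weekly counts per recorded temperature, copied from the example table.
temperatures = [64, 65, 68, 69, 70, 71, 72, 75, 80, 81, 83]
cube = {
    "Week1": [1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1],
    "Week2": [0, 0, 0, 1, 0, 0, 1, 2, 0, 1, 0],
}

# Concept hierarchy: temperature value -> level, using the ranges in the text.
levels = {"cool": range(64, 70), "mild": range(70, 76), "hot": range(80, 86)}

def level_of(temp):
    """Map one temperature value to its level in the concept hierarchy."""
    for name, temps in levels.items():
        if temp in temps:
            return name
    raise ValueError(f"no level defined for temperature {temp}")

def roll_up(cube):
    """Sum the counts of all temperatures that fall into the same level."""
    result = {}
    for week, counts in cube.items():
        totals = {name: 0 for name in levels}
        for temp, count in zip(temperatures, counts):
            totals[level_of(temp)] += count
        result[week] = totals
    return result

rolled = roll_up(cube)
```

Each week keeps its row, but the eleven temperature columns collapse into the three
level columns, which is the "climbing up the hierarchy" that roll-up describes.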
Drill-Down
The drill-down operation (also called roll-down) is the reverse of roll-up.
Drill-down is like zooming in on the data cube. It navigates from less detailed data
to more detailed data. Drill-down can be performed either by stepping down a
concept hierarchy for a dimension or by adding additional dimensions.
Because a drill-down adds more detail to the given data, it can also be performed
by adding a new dimension to a cube. For example, a drill-down on the central cube
of the figure can occur by introducing an additional dimension, such as customer
group.
Example
Drill-down adds more detail to the given data. Here the time dimension is stepped
down from weeks to days:
Temperature  cool  mild  hot
Day 1           0     0    0
Day 2           0     0    0
Day 3           0     0    1
Day 4           0     1    0
Day 5           1     0    0
Day 6           0     0    0
Day 7           1     0    0
Day 8           0     0    0
Day 9           1     0    0
Day 10          0     1    0
Day 11          0     1    0
Day 12          0     1    0
Day 13          0     0    1
Day 14          0     0    0
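A minimal sketch of the relationship between the two levels of detail follows. It
assumes that the three columns of the day-level table are cool, mild, and hot, in that
order, and that days 1-7 form week 1 and days 8-14 form week 2; the function names are
illustrative.

```python
# Day-level counts as (cool, mild, hot), copied from the table above.
daily = {
    1: (0, 0, 0), 2: (0, 0, 0), 3: (0, 0, 1), 4: (0, 1, 0), 5: (1, 0, 0),
    6: (0, 0, 0), 7: (1, 0, 0), 8: (0, 0, 0), 9: (1, 0, 0), 10: (0, 1, 0),
    11: (0, 1, 0), 12: (0, 1, 0), 13: (0, 0, 1), 14: (0, 0, 0),
}

def week_of(day):
    """Assumed time hierarchy: days 1-7 are week 1, days 8-14 are week 2."""
    return (day - 1) // 7 + 1

def drill_down(week):
    """Step down the time hierarchy: return the day-level rows of one week."""
    return {day: row for day, row in daily.items() if week_of(day) == week}

def roll_up_week(week):
    """Reverse operation: aggregate the day-level rows into one week-level row."""
    rows = drill_down(week).values()
    return tuple(sum(column) for column in zip(*rows))

week1_days = drill_down(1)
```

Drilling down on a week returns its seven day-level rows, and rolling those rows up
again gives back the single week-level row, which shows that the two operations are
inverses of each other.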
Slice
A slice is a subset of the cube corresponding to a single value for one or more
members of the dimensions. For example, a slice operation is executed when the
user wants a selection on one dimension of a three-dimensional cube, resulting
in a two-dimensional slice. So, the slice operation performs a selection on one
dimension of the given cube, thus resulting in a subcube.
For example, if we make the selection temperature = cool on the day-level cube, we
obtain the following cube:
Temperature  cool
Day 1           0
Day 2           0
Day 3           0
Day 4           0
Day 5           1
Day 6           0
Day 7           1
Day 8           0
Day 9           1
Day 10          0
Day 11          0
Day 12          0
Day 13          0
Day 14          0
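The slice can be sketched as fixing one dimension to a single value. The cube below
stores only the non-zero cells of the day-level table from the drill-down example
(an assumed sparse representation), and slice_by_temperature is an illustrative name.

```python
DAYS = range(1, 15)

# Non-zero cells of the day-by-temperature cube from the drill-down example.
# Every (day, level) pair that is not listed here has the value 0.
cells = {
    (3, "hot"): 1, (4, "mild"): 1, (5, "cool"): 1, (7, "cool"): 1,
    (9, "cool"): 1, (10, "mild"): 1, (11, "mild"): 1, (12, "mild"): 1,
    (13, "hot"): 1,
}

def slice_by_temperature(level):
    """Fix the temperature dimension to one value; only the day dimension remains."""
    return {day: cells.get((day, level), 0) for day in DAYS}

cool_slice = slice_by_temperature("cool")
```

The result has one dimension fewer than the cube it was taken from: a mapping from
day to count for the single selected temperature level.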
Dice
The dice operation defines a subcube by performing a selection on two or more
dimensions.
For example, applying the selection (time = day 3 OR time = day 4) AND
(temperature = cool OR temperature = hot) to the original cube, we get the
following subcube (still two-dimensional):
Temperature  cool  hot
Day 3           0    1
Day 4           0    0
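The same selection can be written as a short sketch. As an assumption, the cube stores
only the non-zero cells of the day-level table from the drill-down example; dice is an
illustrative function name.

```python
# Non-zero cells of the day-by-temperature cube from the drill-down example.
# Every (day, level) pair that is not listed here has the value 0.
cells = {
    (3, "hot"): 1, (4, "mild"): 1, (5, "cool"): 1, (7, "cool"): 1,
    (9, "cool"): 1, (10, "mild"): 1, (11, "mild"): 1, (12, "mild"): 1,
    (13, "hot"): 1,
}

def dice(days, levels):
    """Select on both dimensions at once and return the resulting subcube."""
    return {
        day: {level: cells.get((day, level), 0) for level in levels}
        for day in days
    }

# (time = day 3 OR time = day 4) AND (temperature = cool OR temperature = hot)
subcube = dice([3, 4], ["cool", "hot"])
```

Unlike a slice, the subcube keeps both dimensions; each one is simply restricted to
the selected values, which reproduces the two-by-two table above.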
The dice operation on the cube based on the following selection criteria involves
three dimensions.
Pivot
The pivot operation is also called rotation. Pivot is a visualization operation that
rotates the data axes in view to provide an alternative presentation of the data. It
may involve swapping the rows and columns or moving one of the row dimensions
into the column dimensions.
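Swapping rows and columns, the simplest form of pivot, can be sketched as follows.
The input is the two-by-two subcube from the dice example; pivot is an illustrative
function name.

```python
# Rows are days and columns are temperature levels (the dice example's subcube).
table = {
    "Day 3": {"cool": 0, "hot": 1},
    "Day 4": {"cool": 0, "hot": 0},
}

def pivot(table):
    """Rotate the axes: former column keys become row keys and vice versa."""
    rotated = {}
    for row_key, row in table.items():
        for col_key, value in row.items():
            rotated.setdefault(col_key, {})[row_key] = value
    return rotated

by_temperature = pivot(table)
```

No value changes during a pivot; only the presentation does, and pivoting twice
returns the original table.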
Other OLAP operations may include ranking the top-N or bottom-N elements in lists,
as well as calculating moving averages, growth rates, interest, internal rates of
return, depreciation, currency conversions, and statistical functions.
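Three of these operations are simple enough to sketch directly. The sales figures and
the function names top_n, moving_average, and growth_rate are hypothetical and serve
only as an illustration.

```python
def top_n(totals, n):
    """Rank the entries of {name: value} and keep the n largest."""
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]

def moving_average(values, window):
    """Average each run of `window` consecutive values."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]

def growth_rate(previous, current):
    """Relative change from one period to the next."""
    return (current - previous) / previous

# Hypothetical sales totals per item and per period.
sales_by_item = {"phone": 120, "laptop": 900, "tablet": 300}
sales_by_period = [10, 20, 30, 40]

best_two = top_n(sales_by_item, 2)
smoothed = moving_average(sales_by_period, 2)
```

A bottom-N ranking follows from the same sort with reverse=False.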