
Data Warehouse Modeling

Data warehouse modeling involves designing schemas to organize detailed and summarized information to support business intelligence needs. Effective modeling creates a schema that visualizes relationships between warehouse data and allows for efficient querying. Data warehouse modeling differs from operational database modeling by focusing on supporting complex queries over historical data rather than transactions. Key aspects of data warehouse modeling include conceptual, logical, and physical modeling stages to map business requirements to database structures.

Data warehouse modeling is the process of designing the schemas of the detailed
and summarized information of the data warehouse. The goal of data warehouse
modeling is to develop a schema describing the reality, or at least the part of it,
that the data warehouse is needed to support.

Data warehouse modeling is an essential stage of building a data warehouse for two
main reasons. Firstly, the schema lets data warehouse clients visualize the
relationships among the warehouse data and so use the data with greater ease.
Secondly, a well-designed schema allows an effective data warehouse structure to
emerge, which helps decrease the cost of implementing the warehouse and improves
the efficiency of using it.

Data modeling in data warehouses is different from data modeling in operational
database systems. The primary function of data warehouses is to support decision
support system (DSS) processes. Thus, the objective of data warehouse modeling is
to make the data warehouse efficiently support complex queries on long-term
information.

In contrast, data modeling in operational database systems targets efficient
support for simple transactions such as retrieving, inserting, deleting, and
changing data. Moreover, data warehouses are designed for users with general
knowledge of the enterprise, whereas operational database systems are oriented
more toward software specialists who build distinct applications.

The data warehouse model is illustrated in the given diagram.


The data within the specific warehouse itself has a particular architecture with the
emphasis on various levels of summarization, as shown in figure:

The current detail data is of central importance because it:

o Reflects the most recent happenings, which are commonly the most
interesting.
o Is voluminous, as it is stored at the lowest level of granularity.
o Is almost always stored on disk, which is fast to access but
expensive and difficult to manage.

Older detail data is stored in some form of mass storage. It is infrequently
accessed and is kept at a level of detail consistent with the current detailed data.

Lightly summarized data is data distilled from the low level of detail found at the
current detailed level, and it is usually stored on disk. When building the data
warehouse, the designer has to decide what unit of time the summarization is done
over and which attributes the summarized data will contain.

Highly summarized data is compact and directly available and can even be found
outside the warehouse.

Metadata is the final component of the data warehouse. It is of a different
dimension in that it is not the same as data drawn from the operational
environment, but it is used as:

o A directory to help the DSS analyst locate the contents of the data
warehouse.
o A guide to the mapping of data as it is transformed from the operational
environment to the data warehouse environment.
o A guide to the method used for summarization between the current detailed
data, the lightly summarized data, and the highly summarized data.

Data Modeling Life Cycle


In this section, we define a data modeling life cycle. It is a straightforward
process of transforming the business requirements into a design that fulfills the
goals for storing, maintaining, and accessing the data within IT systems. The result
is a logical and physical data model for an enterprise data warehouse.

The objective of the data modeling life cycle is primarily the creation of a storage
area for business information. That area comes from the logical and physical data
modeling stages, as shown in Figure:
Conceptual Data Model
A conceptual data model recognizes the highest-level relationships between the
different entities.

Characteristics of the conceptual data model

o It contains the essential entities and the relationships among them.
o No attribute is specified.
o No primary key is specified.

The only information shown in the conceptual data model is the entities that
describe the data and the relationships between those entities. Nothing else is
shown at this level.
Logical Data Model
A logical data model defines the information in as much detail as possible,
without regard to how it will be physically implemented in the database. The primary
objective of logical data modeling is to document the business data structures,
processes, rules, and relationships in a single view: the logical data model.

Features of a logical data model

o It includes all entities and the relationships among them.
o All attributes for each entity are specified.
o The primary key for each entity is stated.
o Referential integrity is specified (foreign key relationships).

The steps for designing the logical data model are as follows:

o Specify primary keys for all entities.
o List the relationships between different entities.
o List all attributes for each entity.
o Normalize the model.
o No data types are listed at this stage.
Physical Data Model
A physical data model describes how the model will be implemented in the database.
It shows all table structures, column names, data types, constraints, primary
keys, foreign keys, and relationships between tables. The purpose of physical data
modeling is to map the logical data model to the physical structures of the RDBMS
hosting the data warehouse. This involves defining physical RDBMS structures, such
as tables and the data types to use when storing the information. It may also
include the definition of new data structures for enhancing query performance.

Characteristics of a physical data model

o All tables and columns are specified.
o Foreign keys are used to identify relationships between tables.

The steps for physical data model design are as follows:

o Convert entities to tables.
o Convert relationships to foreign keys.
o Convert attributes to columns.
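
The three conversion steps above can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration: the customer and sale entities, their columns, and the sample rows are hypothetical and do not come from the text, and SQLite stands in for whichever RDBMS actually hosts the warehouse.

```python
import sqlite3

# Hypothetical entities (Customer, Sale) chosen only for illustration.
DDL = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,   -- entity -> table, with a primary key
    name        TEXT NOT NULL,         -- attribute -> column with a data type
    city        TEXT
);
CREATE TABLE sale (
    sale_id     INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL
        REFERENCES customer(customer_id),  -- relationship -> foreign key
    amount      REAL NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript(DDL)
conn.execute("INSERT INTO customer VALUES (1, 'Ada', 'Toronto')")
conn.execute("INSERT INTO sale VALUES (10, 1, 99.5)")


def insert_orphan_sale(connection):
    """Try to insert a sale for a missing customer; return True if it is rejected."""
    try:
        connection.execute("INSERT INTO sale VALUES (11, 999, 5.0)")
    except sqlite3.IntegrityError:
        return True
    return False


orphan_rejected = insert_orphan_sale(conn)
print("orphan sale rejected:", orphan_rejected)
```

The rejected insert shows the referential integrity that the logical model specifies being enforced by the physical structures.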
Types of Data Warehouse Models

Enterprise Warehouse
An enterprise warehouse collects all of the information about subjects spanning the
entire organization. It provides corporate-wide data integration, usually from one or
more operational systems or external data providers, and it is cross-functional in
scope. It generally contains detailed as well as summarized information, and can
range in size from a few gigabytes to hundreds of gigabytes, terabytes, or beyond.

An enterprise data warehouse may be implemented on traditional mainframes, UNIX
super servers, or parallel architecture platforms. It requires extensive business
modeling and may take years to design and build.

Data Mart
A data mart contains a subset of corporate-wide data that is of value to a specific
group of users. The scope is confined to particular selected subjects. For example,
a marketing data mart may restrict its subjects to customer, item, and sales. The
data contained in data marts tends to be summarized.

Data marts fall into two categories:

Independent Data Mart: An independent data mart is sourced from data captured
from one or more operational systems or external data providers, or from data
generated locally within a particular department or geographic area.

Dependent Data Mart: Dependent data marts are sourced directly from enterprise
data warehouses.

Virtual Warehouses
A virtual warehouse is a set of views over operational databases. For efficient
query processing, only some of the possible summary views may be materialized. A
virtual warehouse is simple to build but requires excess capacity on operational
database servers.
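
As a rough sketch of the idea, the snippet below defines a summary view directly over an operational table using Python's sqlite3 module. The orders table, its rows, and the view name sales_by_city are hypothetical. SQLite views are never materialized, so the query is recomputed against the operational table every time, which is the extra load on operational servers mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical operational table and rows, for illustration only.
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, city TEXT, amount REAL);
INSERT INTO orders VALUES (1, 'Toronto', 100.0),
                          (2, 'Toronto', 50.0),
                          (3, 'Vancouver', 70.0);

-- The "virtual warehouse": a summary view, not a separate copy of the data.
CREATE VIEW sales_by_city AS
    SELECT city, SUM(amount) AS total
    FROM orders
    GROUP BY city;
""")

rows = conn.execute(
    "SELECT city, total FROM sales_by_city ORDER BY city"
).fetchall()
print(rows)
```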

OLAP Operations in the Multidimensional Data Model
In the multidimensional model, the data is organized into multiple dimensions,
and each dimension includes multiple levels of abstraction defined by concept
hierarchies. This organization gives users the flexibility to view data from
different perspectives. A number of OLAP data cube operations exist to materialize
these different views, allowing interactive querying and analysis of the data at hand.
Hence, OLAP provides a user-friendly environment for interactive data analysis.

Consider the OLAP operations that can be performed on multidimensional data.


The figure shows a data cube for the sales of a shop. The cube contains the
dimensions location, time, and item, where location is aggregated with respect to
city values, time is aggregated with respect to quarters, and item is aggregated
with respect to item types.

Roll-Up
The roll-up operation (also known as drill-up or aggregation) performs
aggregation on a data cube, either by climbing up a concept hierarchy for a
dimension or by dimension reduction. Roll-up is like zooming out on the data cube.
The figure shows the result of a roll-up operation performed on the dimension
location. The hierarchy for location is defined as the total order street < city <
province or state < country. The roll-up operation aggregates the data by ascending
the location hierarchy from the level of the city to the level of the country.

When a roll-up is performed by dimension reduction, one or more dimensions are
removed from the cube. For example, consider a sales data cube having two
dimensions, location and time. Roll-up may be performed by removing the time
dimension, resulting in an aggregation of the total sales by location, rather than
by location and by time.
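
Both forms of roll-up can be sketched in a few lines of Python. The sales figures and the city-to-country mapping below are hypothetical values chosen for illustration, and the cube is modeled simply as a dictionary keyed by (city, quarter).

```python
from collections import defaultdict

# Hypothetical sales figures keyed by (city, quarter); not taken from the text.
SALES = {
    ("Toronto", "Q1"): 100,
    ("Toronto", "Q2"): 150,
    ("Vancouver", "Q1"): 80,
    ("Chicago", "Q1"): 200,
    ("Chicago", "Q2"): 50,
}

# Assumed one-step concept hierarchy: city -> country.
CITY_TO_COUNTRY = {"Toronto": "Canada", "Vancouver": "Canada", "Chicago": "USA"}


def roll_up_location(cube):
    """Climb the location hierarchy from city to country, keeping time."""
    totals = defaultdict(int)
    for (city, quarter), amount in cube.items():
        totals[(CITY_TO_COUNTRY[city], quarter)] += amount
    return dict(totals)


def roll_up_drop_time(cube):
    """Dimension reduction: remove time, leaving total sales by city."""
    totals = defaultdict(int)
    for (city, _quarter), amount in cube.items():
        totals[city] += amount
    return dict(totals)


by_country = roll_up_location(SALES)
by_city = roll_up_drop_time(SALES)
print(by_country)
print(by_city)
```

In both cases the result has fewer cells than the input: the first keeps two dimensions at a coarser level, the second keeps one dimension at the original level.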


Example
Consider the following cube, which records how many days in each week reached a
given temperature:

Temperature   64  65  68  69  70  71  72  75  80  81  83
Week1          1   0   1   0   1   0   0   0   0   0   1
Week2          0   0   0   1   0   0   1   2   0   1   0

Suppose we want to set up the levels hot (80-85), mild (70-75), and cool (64-69) for
temperature in the above cube.
To do this, we group the columns and add up the values according to the concept
hierarchy. This operation is known as a roll-up.

By doing this, we obtain the following cube:

Temperature   cool  mild  hot
Week1            2     1    1
Week2            1     3    1

The roll-up operation groups the information by levels of temperature.

The following diagram illustrates how roll-up works.
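
The grouping step in the temperature example can be reproduced with a short Python sketch. The data is the weekly cube from the example; the function names are illustrative only.

```python
# Weekly cube from the example: counts of days per recorded temperature.
TEMPS = [64, 65, 68, 69, 70, 71, 72, 75, 80, 81, 83]
CUBE = {
    "Week1": [1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1],
    "Week2": [0, 0, 0, 1, 0, 0, 1, 2, 0, 1, 0],
}


def level(temp):
    """Map a temperature to its level in the concept hierarchy."""
    if 64 <= temp <= 69:
        return "cool"
    if 70 <= temp <= 75:
        return "mild"
    if 80 <= temp <= 85:
        return "hot"
    raise ValueError(f"temperature {temp} is outside the defined levels")


def roll_up(cube):
    """Group the temperature columns by level and add up the counts."""
    result = {}
    for week, counts in cube.items():
        totals = {"cool": 0, "mild": 0, "hot": 0}
        for temp, count in zip(TEMPS, counts):
            totals[level(temp)] += count
        result[week] = totals
    return result


rolled = roll_up(CUBE)
for week, totals in rolled.items():
    print(week, totals)
```

Running the sketch gives cool 2, mild 1, hot 1 for Week1 and cool 1, mild 3, hot 1 for Week2.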

Drill-Down
The drill-down operation (also called roll-down) is the reverse of roll-up.
Drill-down is like zooming in on the data cube. It navigates from less detailed data
to more detailed data. Drill-down can be performed either by stepping down a
concept hierarchy for a dimension or by adding dimensions.

The figure shows a drill-down operation performed on the dimension time by stepping
down a concept hierarchy that is defined as day < month < quarter < year.
Drill-down occurs by descending the time hierarchy from the level of the quarter to
the more detailed level of the month.

Because a drill-down adds more detail to the given data, it can also be performed
by adding a new dimension to a cube. For example, a drill-down on the central cube
of the figure can occur by introducing an additional dimension, such as customer
group.

Example
Drill-down adds more details to the given data

Temperature   cool  mild  hot
Day 1            0     0    0
Day 2            0     0    0
Day 3            0     0    1
Day 4            0     1    0
Day 5            1     0    0
Day 6            0     0    0
Day 7            1     0    0
Day 8            0     0    0
Day 9            1     0    0
Day 10           0     1    0
Day 11           0     1    0
Day 12           0     1    0
Day 13           0     0    1
Day 14           0     0    0

The following diagram illustrates how Drill-down works.
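
A small Python sketch can show how the daily cube relates to the weekly one. It assumes, as the example implies but does not state, that Week1 covers days 1 to 7 and Week2 covers days 8 to 14. Drilling down simply returns the stored daily cells behind a week; it is only possible because the detailed data is kept.

```python
# Daily cube from the example: day -> (cool, mild, hot) counts.
DAILY = {
    1: (0, 0, 0), 2: (0, 0, 0), 3: (0, 0, 1), 4: (0, 1, 0), 5: (1, 0, 0),
    6: (0, 0, 0), 7: (1, 0, 0), 8: (0, 0, 0), 9: (1, 0, 0), 10: (0, 1, 0),
    11: (0, 1, 0), 12: (0, 1, 0), 13: (0, 0, 1), 14: (0, 0, 0),
}


def week_of(day):
    """Assumption: Week1 is days 1-7 and Week2 is days 8-14."""
    return "Week1" if day <= 7 else "Week2"


def weekly_totals(daily):
    """Roll the daily cube up to weeks by summing each temperature level."""
    totals = {}
    for day, counts in daily.items():
        current = totals.get(week_of(day), (0, 0, 0))
        totals[week_of(day)] = tuple(a + b for a, b in zip(current, counts))
    return totals


def drill_down(daily, week):
    """Return the more detailed daily cells that make up one week."""
    return {day: counts for day, counts in daily.items() if week_of(day) == week}


weekly = weekly_totals(DAILY)
week2_days = drill_down(DAILY, "Week2")
print(weekly)
print(week2_days)
```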

Slice
A slice is a subset of the cube corresponding to a single value chosen on one
dimension. For example, a slice operation is executed when the user wants a
selection on one dimension of a three-dimensional cube, resulting in a
two-dimensional slice. So, the slice operation performs a selection on one
dimension of the given cube, thus resulting in a subcube.

For example, if we make the selection temperature = cool, we obtain the following
cube:
Temperature   cool
Day 1            0
Day 2            0
Day 3            0
Day 4            0
Day 5            1
Day 6            0
Day 7            1
Day 8            0
Day 9            1
Day 10           0
Day 11           0
Day 12           0
Day 13           0
Day 14           0

The following diagram illustrates how Slice works.


Here the slice is performed on the dimension "time" using the criterion time = "Q1".

It forms a new subcube by selecting a single value on one dimension.
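
The temperature = cool selection can be sketched in Python as follows. The daily cube is the one from the drill-down example; the helper name slice_cube is illustrative only.

```python
LEVELS = ("cool", "mild", "hot")

# Daily cube from the drill-down example: day -> (cool, mild, hot) counts.
DAILY = {
    1: (0, 0, 0), 2: (0, 0, 0), 3: (0, 0, 1), 4: (0, 1, 0), 5: (1, 0, 0),
    6: (0, 0, 0), 7: (1, 0, 0), 8: (0, 0, 0), 9: (1, 0, 0), 10: (0, 1, 0),
    11: (0, 1, 0), 12: (0, 1, 0), 13: (0, 0, 1), 14: (0, 0, 0),
}


def slice_cube(cube, level):
    """Fix the temperature dimension to one value, leaving only the day dimension."""
    index = LEVELS.index(level)
    return {day: counts[index] for day, counts in cube.items()}


cool_slice = slice_cube(DAILY, "cool")
cool_days = [day for day, count in cool_slice.items() if count]
print(cool_slice)
print("days with a cool reading:", cool_days)
```

The result has one dimension fewer than the input, which is what distinguishes a slice from a dice.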

Dice
The dice operation defines a subcube by performing a selection on two or more
dimensions.

For example, applying the selection (time = day 3 OR time = day 4) AND
(temperature = cool OR temperature = hot) to the original cube, we get the
following subcube (still two-dimensional):
Temperature   cool  hot
Day 3            0    1
Day 4            0    0

Consider the following diagram, which shows the dice operations.

The dice operation on the cube is based on the following selection criteria, which
involve three dimensions:

o (location = "Toronto" or "Vancouver")
o (time = "Q1" or "Q2")
o (item = "Mobile" or "Modem")
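
The three criteria above can be applied with a simple filter. In the Python sketch below the cube is a dictionary keyed by (location, time, item); the cell values and the extra cells outside the selection are hypothetical, added only so the filter has something to exclude.

```python
# Hypothetical cells keyed by (location, time, item); values are illustrative.
CUBE = {
    ("Toronto", "Q1", "Mobile"): 605,
    ("Toronto", "Q2", "Modem"): 14,
    ("Vancouver", "Q1", "Mobile"): 1087,
    ("Vancouver", "Q3", "Modem"): 38,   # excluded: time is not Q1 or Q2
    ("Chicago", "Q1", "Mobile"): 854,   # excluded: location not selected
    ("Toronto", "Q1", "Phone"): 400,    # excluded: item not selected
}


def dice(cube, locations, times, items):
    """Keep only the cells whose coordinates satisfy all three selections."""
    return {
        key: value
        for key, value in cube.items()
        if key[0] in locations and key[1] in times and key[2] in items
    }


subcube = dice(
    CUBE,
    locations={"Toronto", "Vancouver"},
    times={"Q1", "Q2"},
    items={"Mobile", "Modem"},
)
print(subcube)
```

Unlike a slice, the subcube keeps all three dimensions; each one is just restricted to fewer values.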

Pivot
The pivot operation is also called rotation. Pivot is a visualization operation that
rotates the data axes in view to provide an alternative presentation of the data. It
may involve swapping the rows and columns or moving one of the row dimensions
into the column dimensions.

Consider the following diagram, which shows the pivot operation.
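
Swapping rows and columns can be sketched in Python on the two-dimensional subcube from the dice example (days 3 and 4, cool and hot). The nested-dictionary layout and the function name pivot are choices made for this illustration.

```python
def pivot(table):
    """Swap the row and column axes of a nested dict {row: {column: value}}."""
    rotated = {}
    for row, cells in table.items():
        for column, value in cells.items():
            rotated.setdefault(column, {})[row] = value
    return rotated


# Subcube from the dice example: rows are days, columns are temperature levels.
by_day = {
    "Day 3": {"cool": 0, "hot": 1},
    "Day 4": {"cool": 0, "hot": 0},
}

by_temperature = pivot(by_day)
print(by_temperature)
```

No values change and nothing is aggregated; pivoting twice returns the original layout.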


Other OLAP Operations
Drill-across executes queries involving more than one fact table. The drill-through
operation makes use of relational SQL facilities to drill through the bottom level
of a data cube down to its back-end relational tables.

Other OLAP operations may include ranking the top-N or bottom-N elements in lists,
as well as computing moving averages, growth rates, interest, internal rates of
return, depreciation, currency conversions, and statistical functions.
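
Three of these operations (top-N ranking, moving average, and growth rate) are simple enough to sketch in Python. The item and monthly sales figures are hypothetical, and real OLAP engines provide such functions natively; this only illustrates what they compute.

```python
import heapq


def top_n(measures, n):
    """Return the n (name, value) pairs with the largest values, largest first."""
    return heapq.nlargest(n, measures.items(), key=lambda pair: pair[1])


def moving_average(values, window):
    """Average of each run of `window` consecutive values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def growth_rates(values):
    """Period-over-period growth as a fraction of the previous value."""
    return [(cur - prev) / prev for prev, cur in zip(values, values[1:])]


# Hypothetical measures, for illustration only.
item_sales = {"Mobile": 605, "Modem": 14, "Phone": 400, "Security": 825}
monthly_sales = [10, 20, 30, 40, 50]

best_two = top_n(item_sales, 2)
smoothed = moving_average(monthly_sales, window=3)
growth = growth_rates(monthly_sales)
print(best_two)
print(smoothed)
print(growth)
```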

OLAP offers analytical modeling capabilities, including a calculation engine for
deriving ratios, variance, and so on, and for computing measures across multiple
dimensions. It can generate summarizations, aggregations, and hierarchies at each
granularity level and at every dimension intersection. OLAP also provides functional
models for forecasting, trend analysis, and statistical analysis. In this context, the
OLAP engine is a powerful data analysis tool.
