DWDM Unit 1

The document provides an overview of data warehousing, detailing its definition, characteristics, goals, and benefits. It explains the differences between operational databases and data warehouses, outlines the types of users and metadata involved, and introduces the concept of data marts along with their types and implementation steps. Overall, it emphasizes the importance of data warehousing in supporting decision-making and business intelligence efforts.

Uploaded by

Siddharth

Data Warehousing and Data Mining

Unit 1: Introduction to Data Warehousing

Data Warehouse
A data warehouse is a relational database management system (RDBMS) built to meet the requirements of decision support rather than transaction processing. It can be loosely described as any centralized data repository that can be queried for business benefit: a database that stores information oriented to satisfying decision-making requests. It comprises a group of decision-support technologies aimed at enabling knowledge workers (executives, managers, and analysts) to make better and faster decisions. Data warehousing therefore supplies architectures and tools that help business executives systematically organize, understand, and use their information to make strategic decisions.
A data warehouse environment contains an extraction, transformation, and loading (ETL) solution, an online analytical processing (OLAP) engine, client analysis tools, and other applications that manage the process of gathering data and delivering it to business users.

What is a Data Warehouse?


A Data Warehouse (DW) is a relational database that is designed for query and analysis rather than transaction processing. It includes historical data derived from transaction data from single or multiple sources.
A Data Warehouse provides integrated, enterprise-wide, historical data and focuses on providing support for
decision-makers for data modeling and analysis.

A Data Warehouse is a group of data specific to the entire organization, not only to a particular group of users.
It is not used for daily operations and transaction processing but used for making decisions.
A Data Warehouse can be viewed as a data system with the following attributes:

It is a database designed for investigative tasks, using data from various applications.

It supports a relatively small number of clients with relatively long interactions.

It includes current and historical data to provide a historical perspective of information.

Its usage is read-intensive.

It contains a few large tables.

"A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management's decisions." (W. H. Inmon)

Characteristics of Data Warehouse



Subject-Oriented
A data warehouse focuses on the modeling and analysis of data for decision-makers. Therefore, data warehouses typically provide a concise and straightforward view of a particular subject, such as customer, product, or sales, instead of the organization's global ongoing operations. This is achieved by excluding data that are not useful for the subject and including all data the users need to understand it.

Integrated
A data warehouse integrates various heterogeneous data sources, such as RDBMSs, flat files, and online transaction records. Data cleaning and integration are performed during warehousing to ensure consistency in naming conventions, attribute types, and so on among the different data sources.



Time-Variant
Historical information is kept in a data warehouse. For example, one can retrieve data from 3 months, 6 months, 12 months ago, or even further back. This contrasts with a transaction system, where often only the most current data is kept.
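The time-variant idea can be made concrete with a small sketch. The table name, columns, and values below are invented for illustration, and Python's built-in sqlite3 module stands in for the warehouse:

```python
import sqlite3

# In-memory database standing in for a warehouse table with a time dimension.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [
    ("2023-01-15", 100.0),   # historical rows are retained, not overwritten
    ("2023-06-20", 250.0),
    ("2023-11-05", 175.0),
])

# Unlike an OLTP system, the warehouse can answer questions about any past
# period, because every historical row is still there.
total_h1 = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE sale_date < '2023-07-01'"
).fetchone()[0]
print(total_h1)  # 350.0
```

A transaction system that overwrote each sale with its latest state could not answer this first-half-of-the-year question after the fact.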

Non-Volatile
The data warehouse is a physically separate data store, into which data is transformed from the source operational RDBMS. Operational updates of data do not occur in the data warehouse; that is, update, insert, and delete operations are not performed there. Data access usually requires only two procedures: initial loading of data and read access to data. The warehouse therefore does not require transaction processing, recovery, or concurrency-control capabilities, which allows a substantial speedup of data retrieval. Non-volatile means that, once entered into the warehouse, data should not change.



Goals of Data Warehousing
Support reporting as well as analysis

Maintain the organization's historical information

Serve as the foundation for decision-making

Need for Data Warehouse


Data Warehouse is needed for the following reasons:

1. Business users: Business users require a data warehouse to view summarized data from the past. Since these people are non-technical, the data may be presented to them in an elementary form.

2. Store historical data: A data warehouse is required to store time-variant data from the past. This input is used for various purposes.



3. Make strategic decisions: Some strategies depend upon the data in the data warehouse, so the data warehouse contributes to making strategic decisions.

4. Data consistency and quality: By bringing data from different sources to a common place, the user can effectively ensure uniformity and consistency in the data.

5. Fast response time: The data warehouse has to be ready for somewhat unexpected loads and types of queries, which demands a significant degree of flexibility and quick response time.

Benefits of Data Warehouse


1. Understand business trends and make better forecasting decisions.

2. Data warehouses are designed to perform well with enormous amounts of data.

3. The structure of data warehouses is more accessible for end-users to navigate, understand, and query.

4. Queries that would be complex in many normalized databases could be easier to build and maintain in data
warehouses.

5. Data warehousing is an efficient method to manage demand for lots of information from lots of users.

6. Data warehousing provides the capability to analyze large amounts of historical data.

Prerequisites
Before learning about Data Warehouse, you must have the fundamental knowledge of basic database concepts
such as schema, ER model, structured query language, etc.

Difference between Operational Database and Data Warehouse
| Operational Database | Data Warehouse |
| --- | --- |
| Operational systems are designed to support high-volume transaction processing. | Data warehousing systems are typically designed to support high-volume analytical processing (i.e., OLAP). |
| Usually concerned with current data. | Usually concerned with historical data. |
| Data is mainly updated regularly according to need. | Non-volatile: new data may be added regularly, but once added it is rarely changed. |
| Designed for real-time business dealings and processes. | Designed for analysis of business measures by subject area, categories, and attributes. |
| Optimized for a simple set of transactions, generally adding or retrieving a single row at a time per table. | Optimized for bulk loads and large, complex, unpredictable queries that access many rows per table. |
| Optimized for validation of incoming data during transactions; uses validation data tables. | Loaded with consistent, valid data; requires no real-time validation. |
| Supports thousands of concurrent clients. | Supports a few concurrent clients relative to OLTP. |
| Widely process-oriented. | Widely subject-oriented. |
| Usually optimized to perform fast inserts and updates of relatively small volumes of data. | Usually optimized to perform fast retrievals of relatively large volumes of data. |
| "Data in". | "Data out". |
| A small number of records accessed per query. | A large number of records accessed per query. |
| Relational databases are created for online transaction processing (OLTP). | Data warehouses are designed for online analytical processing (OLAP). |
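The contrast between the two access styles can be sketched with a toy example. The orders table and its rows are hypothetical, and sqlite3 stands in for both kinds of system:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")

# OLTP-style access: one small transaction that adds a single row.
conn.execute("INSERT INTO orders VALUES ('North', 120.0)")
conn.commit()

conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("North", 80.0), ("South", 200.0), ("South", 50.0)])

# OLAP-style access: a read-only query that scans many rows per table
# and summarizes them by subject area (here, region).
rows = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('North', 200.0), ('South', 250.0)]
```

The first statement touches one row and must be validated in real time; the second touches every row and is the kind of workload a warehouse is tuned for.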

What are Data Warehouse Users?


There are various types of data warehouse users, which are as follows −

Statisticians − There are generally only a handful of sophisticated analysts (statisticians and operations research types) in any organization. Though few, they are among the best users of the data warehouse: those whose work can contribute to closed-loop systems that deeply affect the operations and profitability of the organization.
These users must come to love the data warehouse, and usually that is not difficult. They are very self-sufficient and need only to be pointed to the database and given some simple instructions about how to get to the information and what times of day are best for running large queries that retrieve data for analysis with their sophisticated tools.

Knowledge Workers − A relatively small number of analysts produce the bulk of new queries and analyses against the data warehouse. These are the users who have the “Designer” or “Analyst” versions of the user access tools.

After a few iterations, their queries and documents generally get published for the benefit of the information
consumers. Knowledge Workers are often intensely engaged with the data warehouse design and place the
highest demands on the ongoing data warehouse operations team for training and support.

Information Consumers − Most users of the data warehouse are information consumers; they will probably never compose a true ad hoc query. They use static or simple interactive reports that others have developed. It is easy to forget about these users, because they usually interact with the data warehouse only through the work products of others.

Executives − Executives are a special case of the information consumers group. Few executives issue their own queries, but an executive’s slightest musing can generate a flurry of activity among the other types of users. A wise data warehouse designer will develop a polished digital dashboard for executives, provided it is easy and economical to do so. Generally this must follow other data warehouse work, but it never hurts to impress the bosses.

What is Meta Data?


Metadata is data about data, or documentation about the information that is required by the users. In data warehousing, metadata is one of the essential aspects.

Metadata includes the following:

1. The location and descriptions of warehouse systems and components.

2. Names, definitions, structures, and content of data warehouse and end-user views.

3. Identification of authoritative data sources.

4. Integration and transformation rules used to populate data.

5. Integration and transformation rules used to deliver information to end-user analytical tools.

6. Subscription information for information delivery to analysis subscribers.

7. Metrics used to analyze warehouse usage and performance.

8. Security authorizations, access control list, etc.

Metadata is used for building, maintaining, managing, and using the data warehouse. It gives users the access they need to understand the content and find data.

Several examples of metadata are:


1. A library catalog may be considered metadata. The catalog metadata consists of several predefined components representing specific attributes of a resource, and each component can have one or more values. These components could be the name of the author, the name of the document, the publisher's name, the publication date, and the categories to which it belongs.

2. The table of contents and the index in a book may be treated as metadata for the book.

3. Suppose we say that a data item about a person is 80. This must be defined by noting that it is the person's weight and that the unit is kilograms. Therefore, (weight, kilograms) is the metadata about the data value 80.

4. Other examples of metadata are data about the tables and figures in a report. A table has a name (e.g., its title), and the column names of the table may be treated as metadata. The figures also have titles or names.
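The third example above, the (weight, kilograms) pair, can be sketched in a few lines. All field names here are illustrative assumptions, not a standard metadata schema:

```python
# A data value on its own is ambiguous; metadata supplies its meaning.
raw_value = 80

# Metadata describing the value, following the (weight, kilograms) example.
metadata = {
    "attribute": "weight",
    "unit": "kilograms",
    "source": "patient_records",   # hypothetical authoritative data source
    "type": "integer",
}

# Pairing value and metadata yields a self-describing record.
described = {"value": raw_value, **metadata}
print(described["attribute"], described["value"], described["unit"])
```

Without the metadata dictionary, the bare 80 could be an age, a score, or a weight in pounds; with it, the value is unambiguous.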



Why is metadata necessary in a data warehouse?
First, it acts as the glue that links all parts of the data warehouse.

Next, it provides information about the contents and structures to the developers.

Finally, it opens the doors to the end-users and makes the contents recognizable in their terms.

Metadata is like a nerve center. Various processes during the building and administering of the data warehouse generate parts of the data warehouse metadata, and each process also uses parts of the metadata generated by the others. In the data warehouse, metadata assumes a key position and enables communication among the various processes. It acts as a nerve centre of the data warehouse.

Types of Metadata:
There are many types of metadata that can be used to describe different aspects of data, such as its content,
format, structure, and provenance. Some common types of metadata include:

1. Descriptive metadata: This type of metadata provides information about the content, structure, and format of
data, and may include elements such as title, author, subject, and keywords. Descriptive metadata helps to
identify and describe the content of data and can be used to improve the discoverability of data through search
engines and other tools.

2. Administrative metadata: This type of metadata provides information about the management and technical
characteristics of data, and may include elements such as file format, size, and creation date. Administrative
metadata helps to manage and maintain data over time and can be used to support data governance and
preservation.

3. Structural metadata: This type of metadata provides information about the relationships and organization of
data, and may include elements such as links, tables of contents, and indices. Structural metadata helps to
organize and connect data and can be used to facilitate the navigation and discovery of data.

4. Provenance metadata: This type of metadata provides information about the history and origin of data, and may
include elements such as the creator, date of creation, and sources of data. Provenance metadata helps to
provide context and credibility to data and can be used to support data governance and preservation.

5. Rights metadata: This type of metadata provides information about the ownership, licensing, and access
controls of data, and may include elements such as copyright, permissions, and terms of use. Rights metadata
helps to manage and protect the intellectual property rights of data and can be used to support data governance
and compliance.

6. Educational metadata: This type of metadata provides information about the educational value and learning
objectives of data, and may include elements such as learning outcomes, educational levels, and competencies.
Educational metadata can be used to support the discovery and use of educational resources, and to support
the design and evaluation of learning environments.

What is Data Mart?


A data mart is a subset of an organizational data store, generally oriented to a specific purpose or primary data subject, which may be distributed to support business needs. Data marts are analytical data stores designed to focus on particular business functions for a specific community within an organization. Data marts are usually derived from subsets of data in a data warehouse, though in the bottom-up data warehouse design methodology the data warehouse is created from the union of organizational data marts.
The fundamental use of a data mart is for Business Intelligence (BI) applications. BI is used to gather, store, access, and analyze records. Data marts can be used by smaller businesses to utilize the data they have accumulated, since they are less expensive than implementing a full data warehouse.

Reasons for creating a data mart


Creates a collective view of data for a group of users

Easy access to frequently needed data

Ease of creation

Data Warehousing and Data Mining 7


Improves end-user response time

Lower cost than implementing a complete data warehouse

Its potential users are more clearly defined than in a comprehensive data warehouse

It contains only essential business data and is less cluttered.

Types of Data Marts


There are mainly two approaches to designing data marts. These approaches are

Dependent Data Marts

Independent Data Marts

Dependent Data Marts


A dependent data mart is a logical subset of a physical subset of a larger data warehouse. In this technique, the data marts are treated as subsets of the data warehouse: first a data warehouse is created, and then the various data marts are created from it. These data marts depend on the data warehouse and extract the essential records from it. Because the data warehouse creates the data marts, there is no need for data mart integration. This is also known as the top-down approach.

Independent Data Marts


The second approach uses independent data marts (IDM). Here, independent data marts are created first, and then a data warehouse is designed using these multiple independent data marts. Because all the data marts are designed independently, integration of the data marts is required. This is also termed the bottom-up approach, as the data marts are integrated to develop the data warehouse.

Other than these two categories, one more type exists, called the "Hybrid Data Mart."

Hybrid Data Marts


A hybrid data mart allows us to combine input from sources other than a data warehouse. This can be helpful in many situations, especially when ad hoc integrations are needed, such as after a new group or product is added to the organization.

Steps in Implementing a Data Mart


The significant steps in implementing a data mart are to design the schema, construct the physical storage,
populate the data mart with data from source systems, access it to make informed decisions and manage it over
time. So, the steps are:

Designing
The design step is the first in the data mart process. This phase covers all of the functions from initiating the request
for a data mart through gathering data about the requirements and developing the logical and physical design of the
data mart.
It involves the following tasks:

1. Gathering the business and technical requirements

2. Identifying data sources

3. Selecting the appropriate subset of data

4. Designing the logical and physical architecture of the data mart.

Constructing
This step involves creating the physical database and the logical structures associated with the data mart to provide fast and efficient access to the data.
It involves the following tasks:



1. Creating the physical database and logical structures, such as tablespaces, associated with the data mart.

2. Creating the schema objects, such as tables and indexes, described in the design step.

3. Determining how best to set up the tables and access structures.

Populating
This step includes all of the tasks related to getting data from the source, cleaning it up, modifying it to the right format and level of detail, and moving it into the data mart.
It involves the following tasks:

1. Mapping data sources to target data structures

2. Extracting data

3. Cleansing and transforming the information.

4. Loading data into the data mart

5. Creating and storing metadata
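The populating tasks above can be sketched as a toy pipeline. The source rows, field names, and cleansing rules are illustrative assumptions, not part of any real data mart:

```python
# Extract: rows as they might arrive from a source system.
source_rows = [
    {"cust": " Alice ", "amt": "100.5"},
    {"cust": "Bob", "amt": "abc"},        # bad record, dropped by cleansing
    {"cust": "Carol", "amt": "75"},
]

def cleanse_and_transform(row):
    """Trim names and convert amounts; return None for invalid records."""
    try:
        return {"customer": row["cust"].strip(), "amount": float(row["amt"])}
    except ValueError:
        return None

# Transform and load: only clean rows reach the "data mart" (a list
# standing in for a target table).
data_mart = [t for r in source_rows
             if (t := cleanse_and_transform(r)) is not None]
print(data_mart)
```

A real populate step would also record the source-to-target mapping and the rejection rules as metadata, so the load is repeatable and auditable.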

Accessing
This step involves putting the data to use: querying the data, analyzing it, creating reports, charts and graphs and
publishing them.
It involves the following tasks:

1. Setting up an intermediate layer (meta layer) for the front-end tool to use. This layer translates database structures and object names into business terms, so that end users can interact with the data mart using words that relate to business functions.

2. Setting up and managing database structures, such as summarized tables, that help queries submitted through the front-end tools execute rapidly and efficiently.

Managing
This step involves managing the data mart over its lifetime. In this step, management functions are performed, such as:

1. Providing secure access to the data.

2. Managing the growth of the data.

3. Optimizing the system for better performance.

4. Ensuring the availability of data even in the event of system failures.

Difference between Data Warehouse and Data Mart

| Data Warehouse | Data Mart |
| --- | --- |
| A vast repository of information collected from various organizations or departments within a corporation. | A subset of a data warehouse, architected to meet the requirements of a specific user group. |
| May hold multiple subject areas. | Holds only one subject area, for example Finance or Sales. |
| Holds very detailed information. | May hold more summarized data. |
| Works to integrate all data sources. | Concentrates on integrating data from a given subject area or set of source systems. |
| The fact constellation schema is used. | The star and snowflake schemas are used. |
| A centralized system. | A decentralized system. |
| Data warehousing is data-oriented. | A data mart is project-oriented. |

Data Warehouse Architecture


A data warehouse architecture is a method of defining the overall architecture of data communication, processing, and presentation that exists for end-client computing within the enterprise. Each data warehouse is different, but all



are characterized by standard vital components.
Production applications such as payroll, accounts payable, product purchasing, and inventory control are designed for online transaction processing (OLTP). Such applications gather detailed data from day-to-day operations.

Data Warehouse applications are designed to support the user ad-hoc data requirements, an activity recently
dubbed online analytical processing (OLAP). These include applications such as forecasting, profiling, summary
reporting, and trend analysis.
Production databases are updated continuously, either by hand or via OLTP applications. In contrast, a warehouse database is updated from operational systems periodically, usually during off-hours. As OLTP data accumulates in production databases, it is regularly extracted, filtered, and then loaded into a dedicated warehouse server that is accessible to users. As the warehouse is populated, it must be restructured: tables are de-normalized, data is cleansed of errors and redundancies, and new fields and keys are added to reflect the users' needs for sorting, combining, and summarizing data.
Data warehouses and their architectures vary depending upon the elements of an organization's situation.
Three common architectures are:

Data Warehouse Architecture: Basic

Data Warehouse Architecture: With Staging Area

Data Warehouse Architecture: With Staging Area and Data Marts

Data Warehouse Architecture: Basic

Operational System
In data warehousing, an operational system refers to a system that is used to process the day-to-day transactions of an organization.

Flat Files
Flat files store data in plain files and can also serve as a source of data for the warehouse.
Meta Data
A set of data that defines and gives information about other data.
Metadata is used in the data warehouse for a variety of purposes, including:
Metadata summarizes necessary information about data, which can make finding and working with particular instances of data easier. For example, author, date created, date modified, and file size are examples of very basic document metadata.

Metadata is used to direct a query to the most appropriate data source.


Lightly and highly summarized data
This area of the data warehouse stores all the predefined lightly and highly summarized (aggregated) data generated by the warehouse manager.
The goal of the summarized information is to speed up query performance. The summarized data is updated continuously as new information is loaded into the warehouse.
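A minimal sketch of how a lightly summarized table might be derived from detail-level rows; the months, products, and quantities are invented for illustration:

```python
from collections import defaultdict

# Detail-level facts as they might arrive from the warehouse load:
# (month, product, quantity) tuples.
detail = [
    ("2023-01", "widgets", 10),
    ("2023-01", "gadgets", 5),
    ("2023-02", "widgets", 7),
]

# A lightly summarized table: totals per month, maintained as data loads.
summary = defaultdict(int)
for month, product, qty in detail:
    summary[month] += qty

# Queries against the summary avoid rescanning every detail row.
print(dict(summary))  # {'2023-01': 15, '2023-02': 7}
```

In a real warehouse the warehouse manager would maintain such aggregates incrementally, adding each newly loaded batch into the running totals.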
End-User access Tools

The principal purpose of a data warehouse is to provide information to business managers for strategic decision-making. These users interact with the warehouse using end-client access tools.
The examples of some of the end-user access tools can be:

Reporting and Query Tools

Application Development Tools

Executive Information Systems Tools

Online Analytical Processing Tools

Data Mining Tools



Data Warehouse Architecture: With Staging Area
Operational data must be cleaned and processed before it is put into the warehouse.
This can be done programmatically, although most data warehouses use a staging area (a place where data is processed before entering the warehouse).
A staging area simplifies data cleansing and consolidation for operational data coming from multiple source systems, especially for enterprise data warehouses where all relevant data of an enterprise is consolidated.

Data Warehouse Staging Area is a temporary location where a record from source systems is copied.

Data Warehouse Architecture: With Staging Area and Data Marts


We may want to customize our warehouse's architecture for multiple groups within our organization.
We can do this by adding data marts. A data mart is a segment of a data warehouse that provides information for reporting and analysis on a section, unit, department, or operation in the company, e.g., sales, payroll, production, etc.
The figure illustrates an example where purchasing, sales, and stocks are separated. In this example, a financial
analyst wants to analyze historical data for purchases and sales or mine historical information to make predictions
about customer behavior.

Properties of Data Warehouse Architectures


The following architecture properties are necessary for a data warehouse system:

1. Separation: Analytical and transactional processing should be kept apart as much as possible.
2. Scalability: Hardware and software architectures should be simple to upgrade as the data volume to be managed and processed, and the number of user requirements to be met, progressively increase.
3. Extensibility: The architecture should be able to accommodate new operations and technologies without redesigning the whole system.
4. Security: Monitoring access is necessary because of the strategic data stored in the data warehouse.
5. Administerability: Data warehouse management should not be complicated.

Types of Data Warehouse Architectures

Single-Tier Architecture
A single-tier architecture is rarely used in practice. Its purpose is to minimize the amount of data stored; to reach this goal, it removes data redundancies.
The figure shows that the only layer physically available is the source layer. In this method, the data warehouse is virtual: it is implemented as a multidimensional view of operational data created by specific middleware, or an intermediate processing layer.

The weakness of this architecture lies in its failure to meet the requirement for separation between analytical and transactional processing. Analysis queries are applied to operational data after the middleware interprets them. In this way, queries affect transactional workloads.

Two-Tier Architecture



The requirement for separation plays an essential role in defining the two-tier architecture for a data warehouse
system, as shown in fig:

Although it is typically called a two-layer architecture to highlight the separation between the physically available sources and the data warehouse, it in fact consists of four subsequent data flow stages:

1. Source layer: A data warehouse system uses heterogeneous sources of data. That data is stored initially in corporate relational databases or legacy databases, or it may come from information systems outside the corporate walls.

2. Data staging: The data stored in the sources should be extracted, cleansed to remove inconsistencies and fill gaps, and integrated to merge heterogeneous sources into one standard schema. The so-called Extraction, Transformation, and Loading (ETL) tools can combine heterogeneous schemata and extract, transform, cleanse, validate, filter, and load source data into the data warehouse.

3. Data warehouse layer: Information is stored in one logically centralized repository: the data warehouse. The data warehouse can be accessed directly, but it can also be used as a source for creating data marts, which partially replicate data warehouse contents and are designed for specific enterprise departments. Metadata repositories store information on sources, access procedures, data staging, users, data mart schemas, and so on.

4. Analysis: In this layer, integrated data is efficiently and flexibly accessed to issue reports, dynamically analyze information, and simulate hypothetical business scenarios. It should feature aggregate information navigators, complex query optimizers, and customer-friendly GUIs.

Three-Tier Architecture
The three-tier architecture consists of the source layer (containing multiple source systems), the reconciled layer, and the data warehouse layer (containing both data warehouses and data marts). The reconciled layer sits between the source data and the data warehouse.
The main advantage of the reconciled layer is that it creates a standard reference data model for the whole enterprise. At the same time, it separates the problems of source data extraction and integration from those of data warehouse population. In some cases, the reconciled layer is also used directly to better accomplish some operational tasks, such as producing daily reports that cannot be satisfactorily prepared using the corporate applications, or generating data flows to feed external processes periodically so as to benefit from cleaning and integration.

Data Warehousing - Schemas


A schema is a logical description of the entire database. It includes the name and description of records of all record types, including all associated data items and aggregates. Much like a database, a data warehouse also requires a schema to be maintained. A database uses the relational model, while a data warehouse uses the Star, Snowflake, or Fact Constellation schema. In this chapter, we will discuss the schemas used in a data warehouse.

What is Star Schema?


A star schema is the elementary form of a dimensional model, in which data are organized into facts and dimensions. A fact is an event that is counted or measured, such as a sale or a log-in. A dimension contains reference data about the fact, such as date, item, or customer.

A star schema is a relational schema whose design represents a multidimensional data model. It is the simplest data warehouse schema. It is called a star schema because the entity-relationship diagram of this schema resembles a star, with points diverging from a central table. The center of the schema consists of a large fact table, and the points of the star are the dimension tables.

Data Warehousing and Data Mining 12


Fact Tables
A fact table in a star schema contains the facts and is connected to the dimension tables. A fact table has two types
of columns: those that contain facts and those that are foreign keys to the dimension tables. The primary key of a
fact table is generally a composite key made up of all of its foreign keys.

A fact table may contain either detail-level facts or facts that have been aggregated (fact tables containing
aggregated facts are often called summary tables instead). A fact table generally contains facts at the same level
of aggregation.

Dimension Tables
A dimension is a structure usually composed of one or more hierarchies that categorize data. If a dimension has
no hierarchies and levels, it is called a flat dimension or list. The primary key of each dimension table
is part of the composite primary key of the fact table. Dimensional attributes help to define the dimensional values.
They are generally descriptive, textual values. Dimension tables are usually much smaller than fact tables.

For example, fact tables store data about sales, while dimension tables store data about geographic regions
(markets, cities), clients, products, times, and channels.
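The structure described above can be sketched in SQL, here via Python's sqlite3. All table and column names are illustrative (loosely following the SALES example later in this section), not a prescribed design; the point is the shape: one fact table whose composite primary key is built from foreign keys into the dimension tables.

```python
import sqlite3

# In-memory database for illustration; names are hypothetical.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE time_dim    (time_key     INTEGER PRIMARY KEY, day TEXT, month TEXT, quarter TEXT, year INTEGER);
CREATE TABLE item_dim    (item_key     INTEGER PRIMARY KEY, item_name TEXT, brand TEXT, type TEXT);
CREATE TABLE branch_dim  (branch_key   INTEGER PRIMARY KEY, branch_name TEXT, branch_type TEXT);
CREATE TABLE location_dim(location_key INTEGER PRIMARY KEY, street TEXT, city TEXT, state TEXT, country TEXT);

-- Fact table: one foreign key per dimension, plus the measures.
CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES time_dim(time_key),
    item_key     INTEGER REFERENCES item_dim(item_key),
    branch_key   INTEGER REFERENCES branch_dim(branch_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    rupees_sold  REAL,
    units_sold   INTEGER,
    PRIMARY KEY (time_key, item_key, branch_key, location_key)
);
""")
print([r[0] for r in con.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")])
```

Note how each dimension table is one denormalized table reachable from the fact table in a single join, which is what gives the star its simple join paths.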

Characteristics of Star Schema


The star schema is well suited to data warehouse database design because of the following features:

It creates a denormalized database that can quickly provide query responses.

It provides a flexible design that can be changed or extended easily throughout the development cycle, and as
the database grows.

Its design parallels how end users typically think of and use the data.

It reduces the complexity of metadata for both developers and end users.

Advantages of Star Schema


Star schemas are easy for end users and applications to understand and navigate. With a well-designed schema,
users can quickly analyze large, multidimensional data sets.

The main advantages of star schemas in a decision-support environment are:

Query Performance
Because a star schema database has a small number of tables and clear join paths, queries run faster than they do
against OLTP systems. Small single-table queries, frequently against a dimension table, are almost instantaneous.
Large join queries that involve multiple tables take only seconds or minutes to run.
In a star schema database design, the dimensions are connected only through the central fact table. When two
dimension tables are used in a query, only one join path, intersecting the fact table, exists between those two tables.



This design feature ensures accurate and consistent query results.

Load performance and administration


Structural simplicity also decreases the time required to load large batches of records into a star schema database.
By defining facts and dimensions and separating them into different tables, the impact of a load is reduced.
Dimension tables can be populated once and occasionally refreshed. New facts can be added regularly and
selectively by appending records to the fact table.

Built-in referential integrity


A star schema has referential integrity built in when data is loaded. Referential integrity is enforced because
each record in a dimension table has a unique primary key, and all keys in the fact table are legitimate foreign keys
drawn from the dimension tables. A record in the fact table that is not correctly related to a dimension cannot be
given the correct key value for retrieval.

Easily Understood
A star schema is simple to understand and navigate, with dimensions joined only through the fact table. These joins
are meaningful to the end user because they represent the fundamental relationships between parts of the
underlying business. Users can also browse dimension table attributes before constructing a query.

Disadvantage of Star Schema


There are some situations that a star schema cannot model; for example, the relationship between a user and a
bank account cannot be described as a star schema because the relationship between them is many-to-many.
Example: Suppose a star schema is composed of a fact table, SALES, and several dimension tables connected to it
for time, branch, item, and geographic locations.

The TIME table has columns for day, month, quarter, and year. The ITEM table has columns item_key,
item_name, brand, type, and supplier_type. The BRANCH table has columns branch_key, branch_name, and
branch_type. The LOCATION table has columns of geographic data, including street, city, state, and country.

In this scenario, the SALES table contains only four columns with IDs from the dimension tables TIME, ITEM,
BRANCH, and LOCATION, instead of four columns for time data, four columns for item data, three columns for
branch data, and four columns for location data. Thus, the size of the fact table is significantly reduced. When
we need to change an item, we need only make a single change in the dimension table, instead of making many
changes in the fact table.

We can create even more complex star schemas by normalizing a dimension table into several tables. The
normalized dimension table is called a Snowflake.

What is Snowflake Schema?


A snowflake schema is a variant of the star schema. "A schema is known as a snowflake if one or more dimension
tables do not connect directly to the fact table but must join through other dimension tables."
The snowflake schema is an expansion of the star schema in which each point of the star explodes into more points.
It is called a snowflake schema because its diagram resembles a snowflake. Snowflaking is a
method of normalizing the dimension tables in a star schema. When all the dimension tables are normalized
entirely, the resulting structure resembles a snowflake with the fact table in the middle.

Snowflaking is used to improve the performance of specific queries. The schema is diagrammed with each fact
surrounded by its associated dimensions, and those dimensions related to other dimensions, branching out into
a snowflake pattern.

The snowflake schema consists of one fact table linked to many dimension tables, which can in turn be linked to
other dimension tables through many-to-one relationships. Tables in a snowflake schema are generally normalized
to third normal form. Each dimension table represents exactly one level in a hierarchy.
The following diagram shows a snowflake schema with two dimensions, each having three levels. A snowflake
schema can have any number of dimensions, and each dimension can have any number of levels.



Example: Figure shows a snowflake schema with a Sales fact table, with Store, Location, Time, Product, Line, and
Family dimension tables. The Market dimension has two dimension tables with Store as the primary dimension table,
and Location as the outrigger dimension table. The product dimension has three dimension tables with Product as
the primary dimension table, and the Line and Family table are the outrigger dimension tables.

A star schema stores all the attributes of a dimension in one denormalized table. This needs more disk space than a
more normalized snowflake schema. Snowflaking normalizes a dimension by moving attributes with low
cardinality into separate dimension tables that relate to the core dimension table through foreign keys. Snowflaking
for the sole purpose of minimizing disk space is not recommended, because it can adversely impact query
performance.

In a snowflake schema, tables are normalized to remove redundancy: dimension tables are decomposed into
multiple dimension tables.

Figure shows a simple STAR schema for sales in a manufacturing company. The sales fact table includes quantity,
price, and other relevant metrics. SALESREP, CUSTOMER, PRODUCT, and TIME are the dimension tables.

The STAR schema for sales, as shown above, contains only five tables, whereas the normalized version
extends to eleven tables. Notice that in the snowflake schema, the attributes with low cardinality in each
original dimension table are moved out to form separate tables. These new tables are connected back to the original
dimension tables through artificial keys.

A snowflake schema is designed for flexible querying across more complex dimensions and relationships. It is
suitable for many-to-many and one-to-many relationships between dimension levels.

Advantage of Snowflake Schema


1. The primary advantage of the snowflake schema is the improvement in query performance due to minimized
disk storage requirements and joins against smaller lookup tables.

2. It provides greater scalability in the interrelationship between dimension levels and components.

3. There is no redundancy, so it is easier to maintain.

Disadvantage of Snowflake Schema


1. The primary disadvantage of the snowflake schema is the additional maintenance effort required due to the
increased number of lookup tables.

2. Queries are more complex and hence more difficult to understand.

3. More tables mean more joins, and therefore longer query execution times.

What is Fact Constellation Schema?


A fact constellation has two or more fact tables sharing one or more dimensions. It is also called a Galaxy schema
or a multi-fact star schema.

The fact constellation schema describes the logical structure of a data warehouse or data mart. It can be designed
with a collection of denormalized fact tables and shared, conformed dimension tables.

The fact constellation schema is a sophisticated design in which summarizing information is difficult. It can be
implemented with aggregate fact tables, or by decomposing a complex fact table into independent simple fact
tables.

Example: A fact constellation schema is shown in the figure below.



This schema defines two fact tables, sales, and shipping. Sales are treated along four dimensions, namely, time,
item, branch, and location. The schema contains a fact table for sales that includes keys to each of the four
dimensions, along with two measures: Rupee_sold and units_sold. The shipping table has five dimensions, or keys:
item_key, time_key, shipper_key, from_location, and to_location, and two measures: Rupee_cost and units_shipped.
The primary disadvantage of the fact constellation schema is that it is a more challenging design because many
variants for specific kinds of aggregation must be considered and selected.
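A simplified sketch of such a constellation: only the shared time and item dimensions are modeled here, the column names follow the text, and the sample data is made up. The shared dimensions are what let the two fact tables be analyzed together.

```python
import sqlite3

# Two fact tables (sales and shipping) sharing the time and item dimensions.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, quarter TEXT);
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    item_key INTEGER REFERENCES item_dim(item_key),
    rupees_sold REAL, units_sold INTEGER
);
CREATE TABLE shipping_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    item_key INTEGER REFERENCES item_dim(item_key),
    shipper_key INTEGER, from_location INTEGER, to_location INTEGER,
    rupee_cost REAL, units_shipped INTEGER
);
""")
con.execute("INSERT INTO time_dim VALUES (1, 'Q1')")
con.execute("INSERT INTO item_dim VALUES (7, 'soda')")
con.execute("INSERT INTO sales_fact VALUES (1, 7, 500.0, 50)")
con.execute("INSERT INTO shipping_fact VALUES (1, 7, 3, 100, 200, 80.0, 45)")

# The conformed dimensions allow a cross-fact query: units sold vs shipped.
row = con.execute("""
    SELECT i.item_name, s.units_sold, sh.units_shipped
    FROM sales_fact s
    JOIN shipping_fact sh USING (time_key, item_key)
    JOIN item_dim i USING (item_key)
""").fetchone()
print(row)
```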

Data Warehouse Applications


The application areas of the data warehouse are:
Information Processing

It deals with querying, statistical analysis, and reporting via tables, charts, or graphs. Nowadays, information
processing in a data warehouse is done by constructing low-cost, web-based access tools, typically integrated with
web browsers.

Analytical Processing
It supports various online analytical processing operations such as drill-down, roll-up, and pivoting. Historical data
is processed in both summarized and detailed form.
OLAP is implemented on data warehouses or data marts. The primary objective of OLAP is to support the ad hoc
querying needed by decision support systems (DSS). The multidimensional view of data is fundamental to OLAP
applications. OLAP is an operational view, not a data structure or schema, and the complex nature of OLAP
applications requires a multidimensional view of the data.

Data Mining
It helps in analyzing hidden patterns and associations, constructing analytical models, performing classification
and prediction, and presenting the mining results using visualization tools.

Data mining is the process of discovering meaningful new correlations, patterns, and trends by sifting through large
amounts of data stored in repositories, using pattern recognition technologies as well as statistical and
mathematical techniques.

It is the process of selection, exploration, and modeling of large quantities of data to discover regularities or
relations that are at first unknown, in order to obtain precise and useful results for the owner of the database.

It is the process of inspecting and analyzing, by automatic or semi-automatic means, large quantities of data to
discover meaningful patterns and rules.

What is OLAP (Online Analytical Processing)?


OLAP stands for On-Line Analytical Processing. OLAP is a category of software technology that enables
analysts, managers, and executives to gain insight into data through fast, consistent, interactive access to a
wide variety of possible views of information that has been transformed from raw data to reflect the real
dimensionality of the enterprise as understood by the user.
OLAP implements multidimensional analysis of business data and supports complex calculations, trend
analysis, and sophisticated data modeling. It is rapidly becoming the essential foundation for intelligent
solutions including business performance management, planning, budgeting, forecasting, financial reporting,
analysis, simulation models, knowledge discovery, and data warehouse reporting. OLAP enables end users to
perform ad hoc analysis of data in multiple dimensions, providing the insight and understanding they require
for better decision making.

Who uses OLAP and Why?


OLAP applications are used by a variety of functions within an organization.
Finance and accounting:

Budgeting

Activity-based costing



Financial performance analysis

Financial modeling

Sales and Marketing

Sales analysis and forecasting

Market research analysis

Promotion analysis

Customer analysis

Market and customer segmentation

Production

Production planning

Defect analysis

OLAP cubes have two main purposes. The first is to provide business users with a data model that is more intuitive
to them than a tabular model. This model is called a dimensional model.
The second purpose is to enable fast query responses that are usually difficult to achieve using tabular models.

How OLAP Works?


Fundamentally, OLAP has a very simple concept: it pre-calculates most of the queries that are typically very hard to
execute over tabular databases, namely aggregation, joining, and grouping. These queries are calculated during a
process that is usually called 'building' or 'processing' the OLAP cube. This process typically happens overnight,
and by the time end users get to work, the data will have been updated.
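A toy illustration of this pre-calculation idea, using a made-up two-dimensional fact set and a plain dictionary as the "cube" (the wildcard convention and all data are assumptions for illustration): every aggregate is computed once up front, so later queries become lookups instead of scans.

```python
from itertools import product

# Made-up sample facts: (region, quarter, amount).
rows = [
    ("east", "Q1", 100), ("east", "Q2", 150),
    ("west", "Q1", 200), ("west", "Q2", 250),
]
ALL = "*"  # wildcard meaning "aggregated over this dimension"

# "Build" phase: pre-compute every (region, quarter) aggregate,
# including the rolled-up wildcard combinations.
cube = {}
for region, quarter, amount in rows:
    for r, q in product((region, ALL), (quarter, ALL)):
        cube[(r, q)] = cube.get((r, q), 0) + amount

# "Query" phase: no aggregation work is left to do.
print(cube[("east", ALL)])  # total for east across all quarters: 250
print(cube[(ALL, ALL)])     # grand total: 700
```

Real OLAP servers store such pre-aggregations far more compactly, but the division into a slow build phase and fast lookups is the same.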

OLAP Guidelines (Dr.E.F.Codd Rule)


Dr. E.F. Codd, the "father" of the relational model, formulated a list of 12 guidelines and requirements as the
basis for selecting OLAP systems:

1) Multidimensional Conceptual View: This is the central feature of an OLAP system. A multidimensional view
makes it possible to carry out operations like slice and dice.
2) Transparency: The technology, the underlying data repository, computing operations, and the dissimilar nature
of the source data should be totally transparent to users. Such transparency helps to improve the efficiency and
productivity of the users.
3) Accessibility: The system provides access only to the data that is actually required to perform the particular
analysis, presenting a single, coherent, and consistent view to the user. The OLAP system must map its own logical
schema to the heterogeneous physical data stores and perform any necessary transformations. The OLAP layer
should sit between the data sources (e.g., data warehouses) and the OLAP front end.

4) Consistent Reporting Performance: Users should not experience any significant degradation in reporting
performance as the number of dimensions or the size of the database increases. That is, the performance of OLAP
should not suffer as the number of dimensions increases. Users must observe consistent run time, response time,
and machine utilization every time a given query is run.
5) Client/Server Architecture: The server component of the OLAP tools should be sufficiently intelligent that the
various clients can be attached with a minimum of effort and integration programming. The server should be
capable of mapping and consolidating data between dissimilar databases.

6) Generic Dimensionality: An OLAP system should treat each dimension as equivalent in both its structure and
operational capabilities. Additional operational capabilities may be granted to selected dimensions, but such
additional capabilities should be grantable to any dimension.

7) Dynamic Sparse Matrix Handling: The system should adapt its physical schema to the specific analytical model
being created and loaded so that sparse matrix handling is optimized. When encountering a sparse matrix, the
system must be able to dynamically deduce the distribution of the data and adjust storage and access to obtain and
maintain a consistent level of performance.
8) Multiuser Support: OLAP tools must provide concurrent data access, data integrity, and access security.



9) Unrestricted Cross-dimensional Operations: The system should recognize dimensional hierarchies and
automatically perform roll-up and drill-down operations within a dimension or across dimensions.

10) Intuitive Data Manipulation: Data manipulation fundamental to the consolidation path, such as reorientation
(pivoting), drill-down, and roll-up, should be accomplished naturally and precisely via point-and-click and
drag-and-drop actions on the cells of the analytical model, avoiding the use of menus or multiple trips to a user
interface.
11) Flexible Reporting: The system gives business users the ability to arrange columns, rows, and cells in a
manner that facilitates easy manipulation, analysis, and synthesis of data.
12) Unlimited Dimensions and Aggregation Levels: The number of data dimensions should be unlimited. Each of
these generic dimensions must allow a practically unlimited number of user-defined aggregation levels within
any given consolidation path.

Characteristics of OLAP
In the FASMI characterization of OLAP systems, the term is derived from the first letters of the following characteristics:

Fast
The system should deliver most responses to the user within about five seconds, with the simplest analyses taking
no more than one second and very few taking more than 20 seconds.

Analysis
The system should be able to cope with any business logic and statistical analysis that is relevant for the
application and the user, while remaining easy enough for the target user. Although some pre-programming may
be needed, the system must allow the user to define new ad hoc calculations as part of the analysis and to report
on the data in any desired way, without having to program; this excludes products (like Oracle Discoverer) that do
not allow adequate end-user-oriented calculation flexibility.

Share
The system should implement all the security requirements for confidentiality and, if multiple write access is
needed, concurrent update locking at an appropriate level. Not all applications need users to write data back, but
for the increasing number that do, the system should be able to handle multiple updates in a timely, secure
manner.

Multidimensional
This is the basic requirement. An OLAP system must provide a multidimensional conceptual view of the data,
including full support for hierarchies, as this is certainly the most logical way to analyze businesses and organizations.

Information
The system should be able to hold all the data needed by the applications. Data sparsity should be handled in an
efficient manner.
The main characteristics of OLAP are as follows:

1. Multidimensional conceptual view: OLAP systems let business users have a dimensional and logical view of
the data in the data warehouse. It helps in carrying out slice and dice operations.

2. Multi-user support: Since OLAP systems are shared, they should provide normal database operations,
including retrieval, update, concurrency control, integrity, and security.

3. Accessibility: OLAP acts as a mediator between data warehouses and front-end tools, sitting between the data
sources (e.g., data warehouses) and the OLAP front end.

4. Storing OLAP results: OLAP results are kept separate from data sources.

5. Uniform reporting performance: Increasing the number of dimensions or the database size should not
significantly degrade the reporting performance of the OLAP system.



6. OLAP provides for distinguishing between zero values and missing values so that aggregates are computed
correctly.

7. An OLAP system should ignore missing values and compute correct aggregate values.

8. OLAP facilitates interactive querying and complex analysis for users.

9. OLAP allows users to drill down for greater detail or roll up to aggregate metrics along a single business
dimension or across multiple dimensions.

10. OLAP provides the ability to perform intricate calculations and comparisons.

11. OLAP presents results in a number of meaningful ways, including charts and graphs.

Benefits of OLAP
OLAP holds several benefits for businesses: -

1. OLAP helps managers in decision-making through the multidimensional views of data that it is efficient at
providing, thus increasing their productivity.

2. OLAP applications are self-sufficient owing to the inherent flexibility they provide to the organized databases.

3. It facilitates simulation of business models and problems through extensive management of analysis
capabilities.

4. In conjunction with a data warehouse, OLAP can be used to reduce the application backlog, speed up data
retrieval, and reduce query drag.

Motivations for using OLAP


1) Understanding and improving sales: For enterprises that have many products and use a number of channels
for selling them, OLAP can help in finding the most profitable products and the most popular channels. In some
cases, it may even be possible to find the most profitable customers. For example, consider the telecommunication
industry with only one product, communication minutes: there is a large amount of data if a company
wants to analyze the sales of the product for every hour of the day (24 hours), distinguish between weekdays and
weekends (2 values), and split the regions to which calls are made into 50 regions.

2) Understanding and decreasing the costs of doing business: Improving sales is one way of improving a business;
another is to analyze costs and control them as much as possible without affecting sales. OLAP can assist
in analyzing the costs related to sales. In some cases, it may also be possible to identify expenditures that
produce a high return on investment (ROI). For example, recruiting a top salesperson may involve high costs, but
the revenue generated by the salesperson may justify the investment.

Types of OLAP
There are three main types of OLAP servers:

ROLAP stands for Relational OLAP, an application based on relational DBMSs.


MOLAP stands for Multidimensional OLAP, an application based on multidimensional DBMSs.

HOLAP stands for Hybrid OLAP, an application using both relational and multidimensional techniques.

Relational OLAP (ROLAP) Server


These are intermediate servers that stand between a relational back-end server and client front-end tools.

They use a relational or extended-relational DBMS to store and manage warehouse data, and OLAP middleware to
provide the missing pieces.
ROLAP servers include optimizations for each DBMS back end, implementation of aggregation navigation logic, and
additional tools and services.
ROLAP technology tends to have greater scalability than MOLAP technology.



ROLAP systems work primarily from the data that resides in a relational database, where the base data and
dimension tables are stored as relational tables. This model permits multidimensional analysis of the data.
The technique relies on manipulating the data stored in the relational database to give the appearance of traditional
OLAP's slicing and dicing functionality. In essence, each action of slicing and dicing is equivalent to adding a
"WHERE" clause to the SQL statement.
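A small sketch of this equivalence, assuming a hypothetical sales schema in SQLite (table names and data are made up): the "slice" of the cube at one city is literally a WHERE clause, and the roll-up over the remaining dimension is a GROUP BY.

```python
import sqlite3

# ROLAP stores the cube as ordinary relational tables.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE sales_fact (location_key INTEGER, quarter TEXT, units_sold INTEGER);
""")
con.executemany("INSERT INTO location_dim VALUES (?, ?)",
                [(1, "Mumbai"), (2, "Delhi")])
con.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)",
                [(1, "Q1", 10), (1, "Q2", 20), (2, "Q1", 30)])

# "Slice" the cube at city = 'Mumbai': a multidimensional operation
# becomes an ordinary SQL filter plus an aggregation.
rows = con.execute("""
    SELECT f.quarter, SUM(f.units_sold)
    FROM sales_fact f JOIN location_dim l USING (location_key)
    WHERE l.city = 'Mumbai'
    GROUP BY f.quarter ORDER BY f.quarter
""").fetchall()
print(rows)
```

Each further dice (e.g. restricting to a quarter as well) simply adds another predicate to the WHERE clause.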

Relational OLAP Architecture


ROLAP Architecture includes the following components

Database server.

ROLAP server.

Front-end tool.

Relational OLAP (ROLAP) is the latest and fastest-growing OLAP technology segment in the market. This method
allows multiple multidimensional views of two-dimensional relational tables to be created, avoiding the need to
structure the data around the desired view.

Some products in this segment have sophisticated SQL engines to handle the complexity of multidimensional
analysis. This includes creating multiple SQL statements to handle user requests, being 'RDBMS-aware', and
being capable of generating SQL statements tuned to the optimizer of the DBMS engine.

Advantages
Can handle large amounts of data: The data size limitation of ROLAP technology depends on the data size of
the underlying RDBMS; ROLAP itself does not restrict the data amount.
Can leverage RDBMS functionality: The RDBMS already comes with many features, and ROLAP technologies,
sitting on top of the RDBMS, can take advantage of them.

Disadvantages
Performance can be slow: Because each ROLAP report is effectively one or more SQL queries against the relational
database, query time can be long if the underlying data size is large.
Limited by SQL functionality: ROLAP technology relies on generating SQL statements to query the relational
database, and SQL statements do not fit all needs.

Multidimensional OLAP (MOLAP) Server


A MOLAP system is based on a native logical model that directly supports multidimensional data and operations.
Data are stored physically in multidimensional arrays, and positional techniques are used to access them.

One significant distinction of MOLAP from ROLAP is that data are summarized and stored in an optimized format
in a multidimensional cube, instead of in a relational database. In the MOLAP model, data are structured into
proprietary formats according to clients' reporting requirements, with the calculations pre-generated on the
cubes.
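A toy sketch of this positional, array-based storage (the dimension values and data are made up): the cell for a (city, quarter) pair is located by array index rather than by a relational join or scan.

```python
# Dense 2-D "cube": one axis per dimension, cells addressed by position.
cities = ["Mumbai", "Delhi"]          # dimension 1
quarters = ["Q1", "Q2", "Q3", "Q4"]   # dimension 2
city_pos = {c: i for i, c in enumerate(cities)}
quarter_pos = {q: i for i, q in enumerate(quarters)}

# The array itself holds only measures; dimension members map to indices.
cube = [[0] * len(quarters) for _ in cities]
cube[city_pos["Mumbai"]][quarter_pos["Q1"]] = 10
cube[city_pos["Mumbai"]][quarter_pos["Q2"]] = 20

# Positional access: no join, no table scan.
print(cube[city_pos["Mumbai"]][quarter_pos["Q2"]])  # 20
print(sum(cube[city_pos["Mumbai"]]))                # 30, a roll-up over quarters
```

Real MOLAP engines add compression for sparse cells and pre-computed aggregates, but the positional addressing shown here is the core idea.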

MOLAP Architecture
MOLAP Architecture includes the following components

Database server.

MOLAP server.

Front-end tool.

A MOLAP structure primarily reads precompiled data, and has limited capabilities to dynamically create
aggregations or to evaluate results that have not been pre-calculated and stored.
Applications requiring iterative and comprehensive time-series analysis of trends are well suited to MOLAP
technology (e.g., financial analysis and budgeting).



Examples include Arbor Software's Essbase, Oracle's Express Server, Pilot Software's Lightship Server, Sniper's
TM/1, Planning Science's Gentium, and Kenan Technology's Multiway.
Some of the problems faced by clients relate to maintaining support for multiple subject areas in an RDBMS.
Some vendors address these problems by allowing access from MOLAP tools to detailed data in an RDBMS.

This can be very useful for organizations with performance-sensitive multidimensional analysis requirements and
that have built or are in the process of building a data warehouse architecture that contains multiple subject areas.

An example would be the creation of sales data measured by several dimensions (e.g., product and sales region) to
be stored and maintained in a persistent structure. This structure would be provided to reduce the application
overhead of performing calculations and building aggregation during initialization. These structures can be
automatically refreshed at predetermined intervals established by an administrator.

Advantages
Excellent Performance: A MOLAP cube is built for fast information retrieval, and is optimal for slicing and dicing
operations.

Can perform complex calculations: All calculations have been pre-generated when the cube is created. Hence,
complex calculations are not only possible, they also return quickly.

Disadvantages
Limited in the amount of data it can handle: Because all calculations are performed when the cube is built, it
is not possible to include a large amount of data in the cube itself.

Requires additional investment: Cube technology is generally proprietary and does not already exist in the
organization. Therefore, to adopt MOLAP technology, chances are other investments in human and capital
resources are needed.

Hybrid OLAP (HOLAP) Server


HOLAP incorporates the best features of MOLAP and ROLAP in a single architecture. HOLAP systems keep the
larger quantities of detailed data in relational tables, while the aggregations are stored in pre-calculated cubes.
HOLAP can also drill through from the cube down to the relational tables for detailed data. Microsoft SQL Server
2000, for example, provides a hybrid OLAP server.

Advantages of HOLAP
1. HOLAP provides the benefits of both MOLAP and ROLAP.

2. It provides fast access at all levels of aggregation.

3. HOLAP balances the disk space requirement, as it stores only the aggregate data on the OLAP server while the
detail records remain in the relational database, so no duplicate copy of the detail records is maintained.

Disadvantages of HOLAP
1. HOLAP architecture is very complicated because it supports both MOLAP and ROLAP servers.

Other Types
There are also less popular types of OLAP styles that one may come across every so often. Some of the less
common variants in the OLAP industry are listed below.

Web-Enabled OLAP (WOLAP) Server


WOLAP refers to an OLAP application that is accessible via a web browser. Unlike traditional client/server OLAP
applications, WOLAP has a three-tier architecture consisting of three components: a client, middleware, and a
database server.

Desktop OLAP (DOLAP) Server


DOLAP permits a user to download a section of the data from the database or another source and work with that
dataset locally on their desktop.



Mobile OLAP (MOLAP) Server
Mobile OLAP enables users to access and work on OLAP data and applications remotely through the use of their
mobile devices.

Spatial OLAP (SOLAP) Server


SOLAP includes the capabilities of both Geographic Information Systems (GIS) and OLAP into a single user
interface. It facilitates the management of both spatial and non-spatial data.

OLAP Operations in the Multidimensional Data Model


In the multidimensional model, data are organized into multiple dimensions, and each dimension contains
multiple levels of abstraction defined by concept hierarchies. This organization provides users with the flexibility to
view data from various perspectives. A number of OLAP data cube operations exist to materialize these different
views, allowing interactive querying and analysis of the data at hand. Hence, OLAP supports a user-friendly
environment for interactive data analysis.

Consider the OLAP operations to be performed on multidimensional data. The figure shows a data cube
for the sales of a shop. The cube contains the dimensions location, time, and item, where location is
aggregated with respect to city values, time is aggregated with respect to quarters, and item is aggregated with
respect to item types.

Roll-Up
The roll-up operation (also known as drill-up or aggregation operation) performs aggregation on a data cube, by
climbing down concept hierarchies, i.e., dimension reduction. Roll-up is like zooming-out on the data cubes. Figure
shows the result of roll-up operations performed on the dimension location. The hierarchy for the location is defined
as the Order Street, city, province, or state, country. The roll-up operation aggregates the data by ascending the
location hierarchy from the level of the city to the level of the country.

When a roll-up is performed by dimension reduction, one or more dimensions are removed from the cube. For example, consider a sales data cube having two dimensions, location and time. Roll-up may be performed by removing the time dimension, resulting in an aggregation of the total sales by location, rather than by location and by time.

Example
Consider the following cubes illustrating temperature of certain days recorded weekly:

Temperature   64   65   68   69   70   71   72
Week1          1    0    1    0    1    0    0
Week2          0    0    0    1    0    0    1

Suppose we want to set up levels (hot (80-85), mild (70-75), cool (64-69)) for temperature in the above cube.

To do this, we group the columns and add up the values according to the concept hierarchy. This operation is known as a roll-up.

By doing this, we obtain the following cube:

Temperature   cool   mild   hot
Week1            2      1     0
Week2            1      1     0

The roll-up operation groups the information by levels of temperature.

The following diagram illustrates how roll-up works.
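The roll-up above can be sketched in code. This is a minimal illustration, not a real OLAP engine: the concept hierarchy is just a function mapping each raw temperature to its level, and roll-up sums the cell counts within each level. The level ranges follow the text; any value outside them is treated as hot here.

```python
# Minimal sketch of roll-up via a concept hierarchy (illustrative only).

def level(temp):
    """Concept hierarchy: map a raw temperature to its level."""
    if 64 <= temp <= 69:
        return "cool"
    if 70 <= temp <= 75:
        return "mild"
    return "hot"  # the text defines hot as 80-85

def roll_up(cube, temps):
    """Aggregate {week: [count per temperature]} up to {week: {level: count}}."""
    rolled = {}
    for week, counts in cube.items():
        agg = {"cool": 0, "mild": 0, "hot": 0}
        for temp, count in zip(temps, counts):
            agg[level(temp)] += count
        rolled[week] = agg
    return rolled

temps = [64, 65, 68, 69, 70, 71, 72]
cube = {"Week1": [1, 0, 1, 0, 1, 0, 0],
        "Week2": [0, 0, 0, 1, 0, 0, 1]}
print(roll_up(cube, temps))
# {'Week1': {'cool': 2, 'mild': 1, 'hot': 0}, 'Week2': {'cool': 1, 'mild': 1, 'hot': 0}}
```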

Drill-Down


The drill-down operation (also called roll-down) is the reverse of roll-up. Drill-down is like zooming in on the data cube. It navigates from less detailed data to more detailed data. Drill-down can be performed either by stepping down a concept hierarchy for a dimension or by adding additional dimensions. The figure shows a drill-down operation performed on the dimension time by stepping down a concept hierarchy defined as day < month < quarter < year. Drill-down proceeds by descending the time hierarchy from the level of quarter to the more detailed level of month.

Because a drill-down adds more detail to the given data, it can also be performed by adding a new dimension to a cube. For example, a drill-down on the central cube of the figure can occur by introducing an additional dimension, such as customer group.

Example
Drill-down adds more detail to the given data. For example, descending from the week level to the day level gives the following cube:

Temperature   cool   mild   hot
Day 1            0      0     0
Day 2            0      0     0
Day 3            0      0     1
Day 4            0      1     0
Day 5            1      0     0
Day 6            0      0     0
Day 7            1      0     0
Day 8            0      0     0
Day 9            1      0     0
Day 10           0      1     0
Day 11           0      1     0
Day 12           0      1     0
Day 13           0      0     1
Day 14           0      0     0

The following diagram illustrates how Drill-down works.
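Drill-down can be sketched similarly. The key point is that the warehouse already stores the finer granularity; drilling down simply navigates to it, and the week-level view is re-derived by aggregation. The day-to-week mapping (Week1 = days 1-7, Week2 = days 8-14) is an assumption made for illustration.

```python
# Sketch of drill-down: the day level is the stored finest granularity,
# and the week level is just an aggregate of it (illustrative only).

DAY_CUBE = {  # counts per temperature level, from the day table above
    1:  {"cool": 0, "mild": 0, "hot": 0},
    2:  {"cool": 0, "mild": 0, "hot": 0},
    3:  {"cool": 0, "mild": 0, "hot": 1},
    4:  {"cool": 0, "mild": 1, "hot": 0},
    5:  {"cool": 1, "mild": 0, "hot": 0},
    6:  {"cool": 0, "mild": 0, "hot": 0},
    7:  {"cool": 1, "mild": 0, "hot": 0},
    8:  {"cool": 0, "mild": 0, "hot": 0},
    9:  {"cool": 1, "mild": 0, "hot": 0},
    10: {"cool": 0, "mild": 1, "hot": 0},
    11: {"cool": 0, "mild": 1, "hot": 0},
    12: {"cool": 0, "mild": 1, "hot": 0},
    13: {"cool": 0, "mild": 0, "hot": 1},
    14: {"cool": 0, "mild": 0, "hot": 0},
}

def drill_down(week):
    """Navigate from a week to its underlying day-level rows (assumed mapping)."""
    start = {"Week1": 1, "Week2": 8}[week]
    return {day: DAY_CUBE[day] for day in range(start, start + 7)}

def roll_up_to_week(week):
    """The inverse direction: aggregate the day rows back to the week level."""
    days = drill_down(week)
    return {lv: sum(row[lv] for row in days.values())
            for lv in ("cool", "mild", "hot")}

print(roll_up_to_week("Week1"))  # {'cool': 2, 'mild': 1, 'hot': 1}
```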

Slice
A slice is a subset of the cube corresponding to a single value for one or more members of a dimension. For example, a slice operation is performed when the user wants a selection on one dimension of a three-dimensional cube, resulting in a two-dimensional slice. So, the slice operation performs a selection on one dimension of the given cube, thus resulting in a sub-cube.

For example, if we make the selection temperature = cool, we obtain the following cube:

Temperature   cool
Day 1            0
Day 2            0
Day 3            0
Day 4            0
Day 5            1
Day 6            0
Day 7            1
Day 8            0
Day 9            1
Day 10           0
Day 11           0
Day 12           0
Day 13           0
Day 14           0

The following diagram illustrates how Slice works.

Here, slice is performed on the dimension time using the criterion time = "Q1".

It forms a new sub-cube by selecting one or more dimensions.
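The slice operation can be sketched as selecting a single member on one dimension, which drops that dimension from the result. The small day x temperature cube below is a three-day excerpt used only for illustration.

```python
# Minimal sketch of slice: fix one dimension to a single value, so that
# dimension disappears from the result (illustrative only).

cube = {
    "Day 3": {"cool": 0, "mild": 0, "hot": 1},
    "Day 4": {"cool": 0, "mild": 1, "hot": 0},
    "Day 5": {"cool": 1, "mild": 0, "hot": 0},
}

def slice_cube(cube, temperature_level):
    """Select temperature = <level>; the result keeps only the day dimension."""
    return {day: row[temperature_level] for day, row in cube.items()}

print(slice_cube(cube, "cool"))  # {'Day 3': 0, 'Day 4': 0, 'Day 5': 1}
```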

Dice
The dice operation defines a sub-cube by performing a selection on two or more dimensions.

For example, applying the selection (time = day 3 OR time = day 4) AND (temperature = cool OR temperature = hot) to the original cube, we get the following sub-cube (still two-dimensional):

Temperature   cool   hot
Day 3            0     1
Day 4            0     0

Consider the following diagram, which shows the dice operations.

The dice operation on the cube based on the following selection criteria involves three dimensions:

(location = "Toronto" or "Vancouver")

(time = "Q1" or "Q2")

(item =" Mobile" or "Modem")
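A sketch of dice on the day x temperature example (the same idea extends to the three-dimension selection on location, time, and item). The small cube below is illustrative.

```python
# Minimal sketch of dice: select on two or more dimensions at once,
# keeping only the chosen members of each (illustrative only).

cube = {
    "Day 3": {"cool": 0, "mild": 0, "hot": 1},
    "Day 4": {"cool": 0, "mild": 1, "hot": 0},
    "Day 5": {"cool": 1, "mild": 0, "hot": 0},
}

def dice(cube, days, levels):
    """Keep only the selected days AND the selected temperature levels."""
    return {d: {lv: cube[d][lv] for lv in levels} for d in days}

print(dice(cube, ["Day 3", "Day 4"], ["cool", "hot"]))
# {'Day 3': {'cool': 0, 'hot': 1}, 'Day 4': {'cool': 0, 'hot': 0}}
```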

Pivot
The pivot operation is also called rotation. Pivot is a visualization operation that rotates the data axes in order to provide an alternative presentation of the data. It may involve swapping the rows and columns, or moving one of the row dimensions into the column dimensions.

Consider the following diagram, which shows the pivot operation.
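Pivot can be sketched as transposing the row and column dimensions of a two-dimensional view; no cell values change, only the presentation. The week x temperature-level view below is illustrative.

```python
# Minimal sketch of pivot (rotation): swap the row and column dimensions
# of a 2-D view without changing any cell values (illustrative only).

view = {
    "Week1": {"cool": 2, "mild": 1},
    "Week2": {"cool": 1, "mild": 1},
}

def pivot(view):
    """Transpose rows and columns of a {row: {col: value}} view."""
    rotated = {}
    for row, cells in view.items():
        for col, value in cells.items():
            rotated.setdefault(col, {})[row] = value
    return rotated

print(pivot(view))
# {'cool': {'Week1': 2, 'Week2': 1}, 'mild': {'Week1': 1, 'Week2': 1}}
```

Note that pivoting twice returns the original view, since rotation only rearranges the presentation.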

Other OLAP Operations


The drill-across operation executes queries involving more than one fact table. The drill-through operation makes use of relational SQL facilities to drill through the bottom level of a data cube down to its back-end relational tables.

Other OLAP operations may include ranking the top-N or bottom-N elements in lists, as well as computing moving averages, growth rates, interest, internal rates of return, depreciation, currency conversions, and statistical functions.
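Two of these analytic operations can be sketched directly: ranking the top-N elements of a measure, and a simple trailing moving average. The sales figures below are made up purely for illustration.

```python
# Minimal sketches of top-N ranking and a moving average (illustrative only).

def top_n(measure, n):
    """Return the n (key, value) pairs with the largest values."""
    return sorted(measure.items(), key=lambda kv: kv[1], reverse=True)[:n]

def moving_average(series, window):
    """Trailing moving average over a numeric series."""
    return [sum(series[i - window + 1 : i + 1]) / window
            for i in range(window - 1, len(series))]

sales = {"Mobile": 120, "Modem": 80, "Phone": 150, "TV": 60}
print(top_n(sales, 2))                      # [('Phone', 150), ('Mobile', 120)]
print(moving_average([10, 20, 30, 40], 2))  # [15.0, 25.0, 35.0]
```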
OLAP offers analytical modeling capabilities, including a calculation engine for deriving ratios, variances, etc., and for computing measures across multiple dimensions. It can generate summarizations, aggregations, and hierarchies at each granularity level and at every dimension intersection. OLAP also provides functional models for forecasting, trend analysis, and statistical analysis. In this context, the OLAP engine is a powerful data analysis tool.
