
UNIT V

Need for Data Warehouse – Database versus Data Warehouse – Multidimensional Data Model – Schemas for
Multidimensional Databases – OLAP operations – OLAP versus OLTP – Data Warehouse Architecture –
Extraction, Transformation and Loading (ETL).

What is Database?
A database is a collection of related data which represents some elements of the real world. It is designed to be built and populated with data for a specific task, and it is a building block of your data solution.

Need for Data Warehouse

What is a Data Warehouse?

A data warehouse is a tool that manages data after, and outside of, the operational systems. Thus, it is not a replacement for the operational systems but a major tool that acquires data from them. Data warehousing technology has evolved in business applications for the process of strategic decision making. Data warehouses are sometimes considered key components of the IT strategy and architecture of an organisation.

Formal Definition of a Data Warehouse

A data warehouse, as defined by W. H. Inmon, is a subject-oriented, integrated, nonvolatile, time-variant collection of data that supports the decision-making of management. Data warehouses provide controlled access to data for complex analysis, knowledge discovery, and decision-making.

Uses and Advantages of Data Warehousing

A data warehouse offers the following advantages:


1. It provides historical information that can be used in many different forms of comparative and competitive analysis.
2. It enhances the quality of the data and tries to make it complete.
3. It can help support disaster recovery, not on its own but together with other backup resources.
4. One of the major advantages a data warehouse offers is that it brings together large collections of historical data from many operational databases, which may be heterogeneous in nature, so that they can be analysed through one data warehouse interface. It can thus be said to be a ONE STOP portal of the historical information of an organisation. It can also be used to determine many trends through the use of data mining techniques.
5. A data warehouse supports various business intelligence applications, such as online analytical processing (OLAP), decision-support systems (DSS) and data mining.

Why Use Data Warehouse?


Here are important reasons for using a data warehouse:
i) A data warehouse helps business users to access critical data from several sources, all in one place.
ii) It provides consistent information on various cross-functional activities.
iii) It helps you to integrate many sources of data, reducing the stress on the production system.
iv) A data warehouse helps you to reduce the total turnaround time (TAT) for analysis and reporting.
v) A data warehouse helps users to access critical data from different sources in a single place, so it saves the user's time in retrieving data from multiple sources. You can also access data from the cloud easily.
vi) A data warehouse allows you to store a large amount of historical data, to analyze different periods and trends, and to make future predictions.
vii) It enhances the value of operational business applications and customer relationship management systems.
viii) It separates analytics processing from transactional databases, improving the performance of both systems.
ix) Stakeholders and users may be overestimating the quality of data in the source systems; a data warehouse provides more accurate reports.

Characteristics of Data Warehouses

Data warehouses have the following important features:


1) Multidimensional conceptual view: A data warehouse contains data from many operational systems; thus, instead of a simple table, its data can be represented in multidimensional form.
2) Unlimited dimensions and unrestricted cross-dimensional operations: Since the data is available in multidimensional form, it requires a schema that is different from the relational schema.
3) Dynamic sparse matrix handling: This feature is much needed, as a data warehouse contains huge amounts of data.
4) Client/server architecture: This feature helps a data warehouse to be accessed in a controlled environment by multiple users.
5) Accessibility and transparency, intuitive data manipulation and consistent reporting performance: This is one of the major features of the data warehouse. A data warehouse contains huge amounts of data; however, that should not be the reason for bad performance or bad user interfaces. Since the objectives of a data warehouse are clear, it has to support easy-to-use interfaces, strong data manipulation, support for applying and reporting various analyses, and user-friendly output.

The Basic Components of a Data Warehouse


A data warehouse basically consists of three components: the data sources, the ETL process, and the schema of the data warehouse, including metadata. The analytical reports are not a part of the data warehouse itself but belong to the major business application areas built on it, including OLAP and DSS.
The Data Sources

The data of the data warehouse can be obtained from many operational systems. A data warehouse interacts with the environment that provides most of its source data. By the term environment, we mean traditionally developed applications. In a large installation, hundreds or even thousands of such database systems or file-based systems may exist, with plenty of redundant data.
The warehouse database obtains most of its data from such different forms of legacy systems: files and databases. Data may also be sourced from external sources as well as from other organisational systems, for example, an office system. This data needs to be integrated into the warehouse. But how do we integrate the data of these large numbers of operational systems into the data warehouse system? We need the help of ETL tools to do so. These tools capture the data that is required to be put in the data warehouse database.

Data of Data Warehouse A data warehouse has an integrated, “subject-oriented”, “time-variant” and
“nonvolatile” collection of data.
The basic characteristics of the data of a data warehouse can be described in the following way:
i) Integration: Integration means bringing together data of multiple, dissimilar operational sources on the basis
of an enterprise data model. The enterprise data model can be a basic template that identifies and defines the
organisation’s key data items uniquely. It also identifies the logical relationships between them ensuring
organisation wide consistency in terms of:
• Data naming and definition: standardising, for example, on the naming of "student enrolment number" across systems.
• Encoding structures: standardising, for example, on gender being represented by "M" for male and "F" for female, or on the first two digits of the enrolment number representing the year of admission.
• Measurement of variables: adopting a standard for data relating to measurements, for example, all units being expressed in the metric system, or all monetary details being given in Indian Rupees.
ii) Subject Orientation: The second characteristic of the data warehouse’s data is that its design and structure
can be oriented to important objects of the organisation. These objects such as STUDENT, PROGRAMME,
REGIONAL CENTRES etc., are in contrast to its operational systems, which may be designed around
applications and functions such as ADMISSION, EXAMINATION and RESULT DECLARATIONS (in the case
of a University).
iii) Time-Variance: The third defining characteristic of the database of data warehouse is that it is time-variant,
or historical in nature. The entire data in the data warehouse is/was accurate at some point of time. This is, in
contrast with operational data that changes over a shorter time period. The data warehouse’s data contains data
that is date-stamped, and which is historical data.

Figure 4 illustrates this characteristic of the data warehouse.

iv) Non-volatility (static nature) of data: Data warehouse data is loaded on to the data warehouse database and
is subsequently scanned and used, but is not updated in the same classical sense as operational system’s data
which is updated through the transaction processing cycles.
Decision Support and Analysis Tools

A data warehouse may support many OLAP and DSS tools. Such decision support applications would typically access the data warehouse database through a standard query language protocol; an example of such a language is SQL. These applications may be of three categories: simple query and reporting, decision support systems, and executive information systems.

Meta Data Directory

The meta data directory component is the repository of the information stored in the data warehouse. The meta data can be used by general users as well as data administrators. It contains the following information:
i) the structure of the contents of the data warehouse database,
ii) the source of the data,
iii) the data transformation processing requirements, such that, data can be passed from the legacy systems
into the data warehouse database,
iv) the process summarisation of data,
v) the data extraction history, and
vi) how the data needs to be extracted from the data warehouse.

Meta data has several roles to play and uses in the data warehouse system. For an end user, meta data
directories also provide some additional information, such as what a particular data item would mean in
business terms. It also identifies the information on reports, spreadsheets and queries related to the data of
concern. All database management systems (DBMSs) have their own data dictionaries that serve a similar
purpose. Information from the data dictionaries of the operational system forms a valuable source of
information for the data warehouse’s meta data directory.

Data Warehouse Design Process: A data warehouse can be built using a top-down approach, a bottom-up
approach, or a combination of both.

A Three-Tier Data Warehouse Architecture

BOTTOM TIER: The bottom tier is a warehouse database server that is almost always a relational database system. Back-end tools and utilities are used to feed data into the bottom tier from operational databases or other external sources (such as customer profile information provided by external consultants). These tools and utilities perform data extraction, cleaning, and transformation (e.g., to merge similar data from different sources into a unified format), as well as load and refresh functions to update the data warehouse. The data are extracted using application program interfaces known as gateways. A gateway is supported by the underlying DBMS and allows client programs to generate SQL code to be executed at a server. Examples of gateways include ODBC (Open Database Connectivity) and OLE DB (Object Linking and Embedding, Database) by Microsoft, and JDBC (Java Database Connectivity). This tier also contains a metadata repository, which stores information about the data warehouse and its contents.
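As a rough illustration, a client program might reach such a gateway from Python through the pyodbc driver manager. This is only a sketch; the DSN, credentials and table name below are hypothetical:

    import pyodbc  # ODBC driver manager ("gateway") for Python

    # Connect through an ODBC gateway; the DSN "SalesOps" is hypothetical.
    conn = pyodbc.connect("DSN=SalesOps;UID=etl_user;PWD=secret")
    cursor = conn.cursor()

    # The gateway lets the client send SQL to be executed at the server.
    cursor.execute(
        "SELECT order_id, amount, order_date FROM orders WHERE order_date >= ?",
        "2024-01-01")
    for order_id, amount, order_date in cursor.fetchall():
        print(order_id, amount, order_date)

    conn.close()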

MIDDLE TIER: The middle tier is an OLAP server that is typically implemented using either (1) a relational OLAP (ROLAP) model, that is, an extended relational DBMS that maps operations on multidimensional data to standard relational operations; or (2) a multidimensional OLAP (MOLAP) model, that is, a special-purpose server that directly implements multidimensional data and operations.

TOP TIER: The top tier is a front-end client layer, which contains query and reporting tools, analysis tools, and/or data mining tools (e.g., for trend analysis, prediction, and so on).

The warehouse design process consists of the following steps:


i) Choose a business process to model, for example, orders, invoices, shipments, inventory, account
administration, sales, or the general ledger. If the business process is organizational and involves
multiple complex object collections, a data warehouse model should be followed. However, if the
process is departmental and focuses on the analysis of one kind of business process, a data mart model
should be chosen.

ii) Choose the grain of the business process. The grain is the fundamental, atomic level of data to be
represented in the fact table for this process, for example, individual transactions, individual daily
snapshots, and so on.

iii) Choose the dimensions that will apply to each fact table record. Typical dimensions are time, item,
customer, supplier, warehouse, transaction type, and status.

iv) Choose the measures that will populate each fact table record. Typical measures are numeric additive
quantities like dollars sold and units sold.
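To make steps (ii) to (iv) concrete, here is a minimal sketch (an illustration under assumed names, not a prescribed design) that declares a hypothetical sales fact table at daily grain, with time, item and location dimensions and two additive measures, using Python's built-in sqlite3:

    import sqlite3

    conn = sqlite3.connect(":memory:")  # throwaway database for illustration

    # Steps (iii) and (iv): dimensions and measures chosen for the sales process.
    conn.executescript("""
    CREATE TABLE dim_time     (time_key INTEGER PRIMARY KEY, day TEXT, month TEXT, year INTEGER);
    CREATE TABLE dim_item     (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT, type TEXT);
    CREATE TABLE dim_location (loc_key  INTEGER PRIMARY KEY, city TEXT, state TEXT, country TEXT);

    -- Step (ii): the grain is one row per item, per location, per day.
    CREATE TABLE fact_sales (
        time_key INTEGER REFERENCES dim_time(time_key),
        item_key INTEGER REFERENCES dim_item(item_key),
        loc_key  INTEGER REFERENCES dim_location(loc_key),
        dollars_sold REAL,    -- additive measure
        units_sold   INTEGER  -- additive measure
    );
    """)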

Data Extraction, Transformation and Loading (ETL)

The first step in data warehousing is to perform extraction, transformation, and loading of data into the data warehouse. This is called ETL, that is, Extraction, Transformation, and Loading. ETL refers to the methods involved in accessing and manipulating data available in various sources and loading it into a target data warehouse.

Initially, ETL was performed using SQL programs; however, tools are now available for the ETL process.

Manual ETL was complex, as it required writing complex code to extract data from many sources. ETL tools are very powerful and offer many advantages over manual ETL. ETL is a step-by-step process.

As a first step, it maps the data structure of a source system to the structure of the target data warehousing system. In the second step, it cleans up the data using the process of data transformation, and finally, it loads the data into the target system.
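A minimal sketch of this three-step flow in Python is given below; the source rows, field names and exchange rate are invented for illustration:

    # Step 1: map source field names to the warehouse structure (hypothetical names).
    FIELD_MAP = {"cust": "customer_name", "amt_usd": "amount", "dt": "sale_date"}

    def extract():
        # Stand-in for reading rows from an operational source system.
        return [{"cust": " alice ", "amt_usd": 100.0, "dt": "2024-01-05"}]

    def transform(rows, usd_to_inr=83.0):  # assumed exchange rate
        out = []
        for row in rows:
            mapped = {FIELD_MAP[k]: v for k, v in row.items()}
            mapped["customer_name"] = mapped["customer_name"].strip().title()  # cleanse
            mapped["amount"] = mapped["amount"] * usd_to_inr  # unify the currency
            out.append(mapped)
        return out

    def load(rows, target):
        target.extend(rows)  # stand-in for inserting into the warehouse tables

    warehouse = []
    load(transform(extract()), warehouse)
    print(warehouse)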

What happens during the ETL Process?


ETL is a three-stage process.

During the Extraction phase, the desired data is identified and extracted from many different sources. These sources may be databases or non-database sources. Sometimes, when it is difficult to identify the desirable data, more data than necessary is extracted; this is followed by the identification of the relevant data within the extracted data. The process of extraction may sometimes involve some basic transformation. For example, if the data is being extracted from two sales databases where the sales in one of the databases are in Dollars and in the other in Rupees, then a simple transformation would be required on the data. The size of the extracted data may vary from several hundred kilobytes to hundreds of gigabytes, depending on the data sources and business systems. Even the time frame for the extracted data may vary: in some data warehouses, data extraction may take a few days or hours, while others require real-time updates. An example of a situation where the volume of extracted data, even in real time, may be very high is a web server.

The extraction process involves data cleansing and data profiling. Data cleansing can be defined as the process of removing inconsistencies among the data. For example, a state name may be written in many ways, and it may be misspelt too: the state Uttar Pradesh may be written as U.P., UP, Uttar Pradesh, Utter Pradesh, etc. The cleansing process may try to correct the spellings as well as resolve such inconsistencies. But how does the cleansing process do that? One simple way may be to create a database of the states with some possible fuzzy matching algorithms that map the various variants onto one state name, thus cleansing the data to a great extent. Data profiling involves creating the necessary data from the point of view of the data warehouse application. Another concern here is to eliminate duplicate data. For example, an address list collected from different sources may be merged as well as purged to create an address profile with no duplicate data.
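One simple way to implement such fuzzy matching is sketched below with Python's standard difflib; the canonical list is abridged, and the variant spellings are taken from the example above:

    from difflib import get_close_matches

    # Canonical database of state names (abridged for illustration).
    STATES = ["Uttar Pradesh", "Madhya Pradesh", "Tamil Nadu"]
    ABBREVIATIONS = {"U.P.": "Uttar Pradesh", "UP": "Uttar Pradesh"}  # exact lookups

    def cleanse_state(raw):
        raw = raw.strip()
        if raw in ABBREVIATIONS:
            return ABBREVIATIONS[raw]
        # Fuzzy-match misspellings such as "Utter Pradesh" to the closest state.
        matches = get_close_matches(raw, STATES, n=1, cutoff=0.8)
        return matches[0] if matches else raw  # leave unmatched values untouched

    for variant in ["U.P.", "UP", "Uttar Pradesh", "Utter Pradesh"]:
        print(variant, "->", cleanse_state(variant))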

One of the most time-consuming tasks, data transformation and loading, follows the extraction stage. This process includes the following: use of data filters, validation of the data against the existing data, checking for data duplication, and information aggregation. Transformations are used to transform the source data according to the requirements of the data warehouse. The process of transformation should ensure the quality of the data that needs to be loaded into the target data warehouse.

Some of the common transformations are:

Filter Transformation: Filter transformations are used to filter out the rows in a mapping that do not meet specific conditions. For example, the list of employees of the Sales department who made sales above Rs. 50,000/- may be filtered out.

Joiner Transformation: This transformation is used to join the data of one or more tables that may be stored at two different locations and could belong to two different sources of data, which may be relational or from other sources, such as XML data.

Aggregator Transformation: Such transformations perform aggregate calculations on the extracted data, for example, finding the sum or the average.

Sorting Transformation: This transformation creates an ordering of the required data, based on the application requirements of the data warehouse.
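The sketch below imitates these four transformations on in-memory rows; the employee names and sales figures are invented:

    sales = [
        {"emp": "Asha", "dept": "Sales", "amount": 75000},
        {"emp": "Ravi", "dept": "Sales", "amount": 30000},
        {"emp": "Meena", "dept": "Sales", "amount": 90000},
    ]
    regions = {"Asha": "North", "Ravi": "South", "Meena": "North"}

    # Filter: keep only rows meeting a condition (sales above Rs. 50,000).
    filtered = [r for r in sales if r["amount"] > 50000]

    # Joiner: attach the region from a second source, keyed on employee name.
    joined = [{**r, "region": regions[r["emp"]]} for r in filtered]

    # Aggregator: total sales per region.
    totals = {}
    for r in joined:
        totals[r["region"]] = totals.get(r["region"], 0) + r["amount"]

    # Sorting: order the aggregated result by total, highest first.
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    print(ranked)  # [('North', 165000)]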
Once the data of the data warehouse has been properly extracted and transformed, it is loaded into the data warehouse. This process requires the creation and execution of programs that perform this task. One of the key concerns here is propagating updates; sometimes, this problem is equated to the problem of maintaining materialised views.

When should we perform the ETL process for the data warehouse? The ETL process should normally be performed during the night, or at such times when the load on the operational systems is low. The integrity of the extracted data can be ensured by synchronising the different operational applications that feed the data warehouse with the data of the data warehouse itself.

ETL (Extract, Transform, and Load) Process

What is ETL?

The mechanism of extracting information from source systems and bringing it into the data warehouse is
commonly called ETL, which stands for Extraction, Transformation and Loading.

The ETL process requires active inputs from various stakeholders, including developers, analysts, testers and top executives, and is technically challenging.

To maintain its value as a tool for decision-makers, a data warehouse needs to change with business changes. ETL is a recurring activity (daily, weekly, monthly) of a data warehouse system and needs to be agile, automated, and well documented.

How Does ETL Work?

ETL consists of three separate phases:


Extraction

o Extraction is the operation of extracting information from a source system for further use in a data
warehouse environment. This is the first stage of the ETL process.
o The extraction process is often one of the most time-consuming tasks in ETL.
o The source systems might be complicated and poorly documented, and thus determining which data needs
to be extracted can be difficult.
o The data has to be extracted several times in a periodic manner to supply all
changed data to the warehouse and keep it up-to-date.

Cleansing

The cleansing stage is crucial in a data warehouse technique because it is supposed to improve
data quality.

The primary data cleansing features found in ETL tools are rectification and homogenization.

They use specific dictionaries to rectify typing mistakes and to recognize synonyms, as well as rule-based cleansing to enforce domain-specific rules and define appropriate associations between values.

The following examples show the importance of data cleansing:

If an enterprise wishes to contact its users or its suppliers, a complete, accurate and up-to-date list of contact addresses, email addresses and telephone numbers must be available.

If a client or supplier calls, the staff responding should be quickly able to find the person in the enterprise database, but this requires that the caller's name, or his/her company name, is listed in the database.

If a user appears in the databases with two or more slightly different names or different account
numbers, it becomes difficult to update the customer's information.

Transformation

Transformation is the core of the reconciliation phase. It converts records from their operational source format into a particular data warehouse format. If we implement a three-layer architecture, this phase outputs our reconciled data layer.

The following points must be rectified in this phase:

o Loose texts may hide valuable information. For example, "XYZ PVT Ltd" does not explicitly show that this is a private limited company.
o Different formats can be used for individual data items. For example, a date can be saved as a string or as three integers.
Following are the main transformation processes aimed at populating the reconciled data layer:

o Conversion and normalization that operate on both storage formats and units of measure
to make data uniform.
o Matching that associates equivalent fields in different sources.
o Selection that reduces the number of source fields and records.

Cleansing and Transformation processes are often closely linked in ETL tools.

Loading

Load is the process of writing the data into the target database. During the load step, it is necessary to ensure that the load is performed correctly and with as few resources as possible. Loading can be carried out in two ways:

1. Refresh: Data warehouse data is completely rewritten, meaning that the older data is replaced. Refresh is usually used in combination with static extraction to populate a data warehouse initially.
2. Update: Only the changes applied to the source information are added to the data warehouse. An update is typically carried out without deleting or modifying preexisting data. This method is used in combination with incremental extraction to update data warehouses regularly.
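A toy sketch of the two modes, assuming a last-loaded timestamp ("watermark") is tracked for incremental updates:

    source = [
        {"id": 1, "amount": 10, "updated": "2024-01-01"},
        {"id": 2, "amount": 20, "updated": "2024-02-01"},
    ]
    warehouse = [{"id": 1, "amount": 10, "updated": "2024-01-01"}]

    def refresh(src):
        # Refresh: completely rewrite the warehouse data (used with static extraction).
        return list(src)

    def update(src, wh, watermark):
        # Update: append only changes newer than the watermark (incremental
        # extraction), without deleting or modifying preexisting data.
        return wh + [row for row in src if row["updated"] > watermark]

    warehouse = update(source, warehouse, watermark="2024-01-01")
    print(warehouse)  # row 2 has been appended; row 1 is untouched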

Selecting an ETL Tool

Selecting an appropriate ETL tool is an important decision that has to be made when building an ODS or data warehousing application.

The ETL tools are required to provide coordinated access to multiple data sources so that
relevant data may be extracted from them.

An ETL tool would generally contain facilities for data cleansing, reorganization, transformation, aggregation, calculation and automatic loading of data into the target database.

An ETL tool should provide a simple user interface that allows data cleansing and data
transformation rules to be specified using a point-and-click approach.

When all mappings and transformations have been defined, the ETL tool should automatically generate the data extract/transformation/load programs, which typically run in batch mode. Examples of such platforms include MarkLogic, Oracle and Amazon Redshift.

Data Warehouse vs. Database:

Workloads
  Data Warehouse: Analytical.
  Database: Transactional and operational.

Characteristics
  Data Warehouse: It is subject-focused, since it provides information on a certain topic rather than on a company's current activities; the data also has to be stored in a unanimously acceptable, common format.
  Database: Removes redundancy and offers security. It allows for numerous data views.

Data Type
  Data Warehouse: It stores both historical and current data. It is possible that the data is out of date.
  Database: The data in the database is updated.

Orientation
  Data Warehouse: Might not be updated; depends on the frequency of ETL processes.
  Database: Real-time.

Purpose
  Data Warehouse: Designed to analyze.
  Database: Designed to record.

Tables and Joins
  Data Warehouse: Tables and joins are straightforward since they are denormalized.
  Database: A database's tables and joins are complicated because they are normalized.

Availability
  Data Warehouse: Data is updated from source systems when needed.
  Database: It is available in real-time.

Technique
  Data Warehouse: Analyze data.
  Database: Capture data.

Query Type
  Data Warehouse: Complex queries are utilized for analytical reasons.
  Database: Simple transaction queries are implemented.

Schema Flexibility
  Data Warehouse: Fixed and pre-defined schema definition for ingest.
  Database: Flexible or rigid schema, based on the type of database.

Users
  Data Warehouse: Data scientists and business analysts.
  Database: Application developers.

Processing Method
  Data Warehouse: It uses OLAP (Online Analytical Processing).
  Database: It makes use of OLTP (Online Transactional Processing).

Storage Limit
  Data Warehouse: Data from any number of applications is stored.
  Database: Generally confined to a particular application.

Usage
  Data Warehouse: Data modeling approaches are employed for designing. It permits you to analyze your enterprise.
  Database: ER modeling approaches are employed for designing. It aids in the execution of basic business procedures.

Applications
  Data Warehouse: Healthcare sector, airlines, retail chains, insurance sector, banking, and telecommunication.
  Database: Banking, universities, airlines, finance, telecommunication, manufacturing, sales and production, and HR management.

Pros
  Data Warehouse: A data warehouse allows business users to access vital data from several sources in one location. It delivers consistent information on numerous cross-functional tasks. It aids in the integration of several data sources in order to alleviate the load on the production system.
  Database: It provides data security and access. A database provides a number of methods for storing and retrieving data. Databases function as an efficient handler to balance the needs of various applications using the same data.

Cons
  Data Warehouse: Adding additional data sources takes effort and comes at a considerable cost. Problems with the data warehouse can sometimes go undiscovered for years. Data warehouses require a lot of upkeep. Data extraction, loading, and cleaning can be time-consuming.
  Database: The cost of hardware and software for creating a database system is quite high, which might raise your organization's budget. Because many DBMS systems are complicated, training users to utilize the DBMS is essential. Data owners might lose control of their data, generating concerns about ownership, security, and privacy.

Data Warehouse Models:

Data Modeling
The process of developing a visual representation of an entire information system or sections to express
connections between data points and structures is known as data modeling.

Why is data modeling important?


Data modeling ensures that all data objects required by the database are correctly represented; the omission of data will lead to inaccurate reports and incorrect results. Data modeling aims to illustrate the types of data used and stored within the system, the relationships between them, how they can be grouped and organized, and their formats and attributes.

The Data Model gives a clear picture of business requirements.

The life cycle of Data Modeling


 Gathering Business Requirements
 Conceptual Data Modeling
 Logical Data Modeling
 Physical Data Modeling
 Development of Schema / the database
 Maintenance of the data model from time to time, as per requirement

Types of the data model


Data modeling facilitates the creation of a conceptual model and the establishment of relationships between
items.
 Conceptual Data Model
 Logical Data Model
 Physical Data Model

Conceptual Data Model


Conceptual models are usually built as part of gathering early project requirements.
The conceptual model defines what the system contains.
This data model focuses on finding the data used in a business instead of the processing flow.
The main objective of this data model is to organize and establish business rules and concepts.
It includes entity classes, properties and constraints, relationships, and the necessary security and data integrity
requirements.
 Its primary purpose is to establish entities, attributes, and relationships between two entities.
 Business stakeholders or data architects create it
 Its purpose is to capture various business rules.

Logical Data Model


The logical data model maps the rules and data structures and includes the required data elements, such as tables, columns, etc.
A logical data model consists of tables, documents, descriptions, etc. The document structures are defined in this
model.
This data model is always present in the root package object.
This type of data model helps create the physical model base. There is no secondary or primary key defined in
this model.
 It defines the structure of data elements and their relationships also.
 Business analysts and data architects create it.

Physical Data Model


In a physical data model, we care about how the system can store the actual data.
It manages the replication, shards, etc., physically.
It defines the components and services which are required to build a database. It is created by using the database
language and queries.
The physical data model provides database column keys, constraints, and RDBMS features.
 We create various schemas, abstraction of schemas, and different mapping types in these data models.
 Database administrators and developers create it.
 It is the actual implementation of the Database.
Data Warehouse Modeling
According to the definition by Bill Inmon, "Data Warehouse is a subject-oriented, integrated, non-volatile and
time-variant collection of data in support of management's decision."
A data warehouse is a storage and reporting system for data. Data is often collected from various sources before
being transported to a data warehouse for long-term storage and analysis. This data is organized so that users
from various divisions or departments within an organization may access and evaluate it as needed.

How Data warehouse helps to improve business processes


 'Structured' data is simpler to report on in a data warehouse.
 With a data warehouse, we can report from multiple data sources simultaneously.
 Historical reporting and trend analysis are feasible with a data warehouse.
 A data warehouse minimizes reporting errors and saves a lot of time.

Types of Data Warehouses Models


So, mainly there are three different types of data warehouse models
1. Enterprise warehouse
2. Data Mart
3. Virtual warehouse

Enterprise Warehouse
An Enterprise database brings together various functional areas of an organization and brings them together in a
unified manner. An enterprise data warehouse structures and stores all company's business data for analytics
querying and reporting. It collects all of the information about subjects spanning the entire organization.
The goal of the Enterprise data warehouse is to provide a complete overview of any particular object in the data
model.
It mainly contains detailed summarized information and can range from a few gigabytes to hundreds of
gigabytes, terabytes, or maybe beyond.

Data Mart
It is a data store designed for a particular department of an organization or company. Data Mart is a subset of the
data warehouse where all the information related to specific business area is stored usually oriented to a specific
task.
A data mart contains a subset of corporate-wide data that is of value to a specific group of users. The scope is
confined to specific selected subjects. For example, a marketing data mart may confine its subjects to customer,
item, and sales. The data contained in data marts tends to be summarized.

Data marts are usually implemented on low-cost departmental servers that are UNIX/LINUX- or Windows-
based. The implementation cycle of a data mart is more likely to be measured in weeks rather than months or
years. However, it may involve complex integration in the long run if its design and planning were not
enterprise-wide.

Reasons for creating a data mart:
 Easy access to frequently used data
 It improves end-user response time
 It is easy to create
 It costs less to build than a full data warehouse

Types of data mart


There are two types of data mart
1. Dependent Data Mart
2. Independent Data Mart

Dependent data mart


The dependent data mart is built by drawing data from a central data warehouse.

Independent data mart

The independent data mart is built by drawing on operational or external data sources, or both, and is created without the help of a data warehouse.

Virtual warehouse
A virtual data warehouse gives you a quick overview of your data. It has metadata in it. It connects to several
data sources with the use of middleware. They are quick because they allow users to filter the most critical data
from various older applications.
A virtual warehouse is easy to set up, but it requires more database server capacity.

Meta Data Repository:


Metadata are data about data. When used in a data warehouse, metadata are the data that define warehouse
objects. Metadata are created for the data names and definitions of the given warehouse. Additional metadata
are created and captured for time stamping any extracted data, the source of the extracted data, and missing
fields that have been added by data cleaning or integration processes.

A metadata repository should contain the following:

A description of the structure of the data warehouse, which includes the warehouse schema, view,
dimensions, hierarchies, and derived data definitions, as well as data mart locations and contents.
Operational metadata, which include data lineage (history of migrated data and the sequence of
transformations applied to it), currency of data (active, archived, or purged), and monitoring information
(warehouse usage statistics, error reports, and audit trails).

The algorithms used for summarization, which include measure and dimension definition algorithms, data on
granularity, partitions, subject areas, aggregation, summarization, and predefined queries and reports.

The mapping from the operational environment to the data warehouse, which includes source databases and
their contents, gateway descriptions, data partitions, data extraction, cleaning, transformation rules and defaults,
data refresh and purging rules, and security (user authorization and access control).

Data related to system performance, which include indices and profiles that improve data access and retrieval
performance, in addition to rules for the timing and scheduling of refresh, update, and replication cycles.

Business metadata, which include business terms and definitions, data ownership information, and charging
policies.

MULTIDIMENSIONAL DATA MODELING FOR DATA WAREHOUSING

 A multidimensional model is a data structure technique for data warehousing tools. Ralph Kimball first introduced the dimensional model.
 A multidimensional model displays data in the form of a data cube.
 The data cube enables data to be modeled and viewed in multiple dimensions. The cube is defined by dimensions and facts.
 The dimensions are the perspectives or entities with respect to which an organization saves data or records.
 The dimensional model is basically developed to read, analyze and summarize numeric information like values, counts, weights and balances in a data warehouse.
 This model has a unique way of storing data that offers several advantages. It contrasts with the relational model, which minimizes redundancy by normalizing ER models.
 Because the data is stored along its dimensions in a simple way, it is easily retrieved and used to generate reports.
Example of Multi-Dimensional Data Model
A shopkeeper may build a sales data warehouse in which he wants to keep a record of the stores' sales with respect to the dimensions product, time, and location. These three dimensions keep the record of these things; for example, they record how many sales were made in a given month.
Each dimension has a table, called a dimension table, which further describes the dimension. The dimension table for products may contain attributes like product name, brand, price, and type.
The whole model is built around a specific theme, for example, the sales data warehouse created here. The theme is represented by the fact table.
The fact table consists of numerical measures. It may contain the name of each fact and all the measurements of the related dimensions.

Elements of the dimensional data model


1. Dimension
2. Fact
3. Attribute
4. Dimension table
5. Fact table

Dimension
A dimension provides context about a business process event. In other words, dimensions specify the who, what and where of a fact. For example, in the Sales business process, for the fact "quarterly sales number", the dimensions may contain:
Who – names of customers
Where – location/place
What – product name
Fact
Facts are the numerical measurements from your business process. For a Sales business process, a measurement may be the sales number.
Attribute
The characteristics of a dimension are called attributes. For the location dimension, the attributes would be:
Country
City
Zipcode
State
Dimension table
A dimension table contains the dimensions of a fact. The dimension table is joined to the fact table via a foreign key. Dimension tables are de-normalized tables, with the dimension attributes as different columns. A dimension table describes the characteristics of the facts with the help of its attributes.
Fact table
The fact table is known as the primary table in a dimensional model. It consists of foreign keys to the dimension tables and the measurements/facts.

The multidimensional data model is an integral part of On-Line Analytical Processing, or OLAP. One way to understand the multidimensional data model is to view it as a cube. The table on the left contains detailed sales data by product, market and time. The cube on the right associates the sales number (units sold) with the dimensions product type, market and time, with the unit variables organized as cells in an array.
This cube can be expanded to include another array, price, which can be associated with all or only some dimensions. As the number of dimensions increases, the number of cube cells increases exponentially.
Dimensions are hierarchical in nature: the time dimension may contain hierarchies for years, quarters, months, weeks and days, while GEOGRAPHY may contain country, state, city, etc.
A data warehouse is a huge collection of data, and such data may involve grouping on multiple attributes. For example, the enrolment data of the students of a University may be represented using a student schema such as: Student_enrolment(year, programme, region, number). Some typical data values may be as follows (these values are also shown in the figure below; in an actual situation, almost all the values would be filled in):
• In the year 2002, BCA enrolment at Region (Regional Centre Code) RC-07 (Delhi) was 350.
• In the year 2003, BCA enrolment at Region RC-07 was 500.
• In the year 2002, MCA enrolment at all the regions was 8000.

To define the student number here, we need to refer to three attributes: the year, the programme and the region. Each of these attributes is identified as a dimension attribute.

Thus, the data of the student_enrolment table can be modeled using dimension attributes (year, programme, region) and a measure attribute (number). Such data is referred to as multidimensional data. A data warehouse may therefore use multidimensional matrices, referred to as the data cube model. The multidimensional data of a corporate data warehouse, for example, would have the fiscal period, product and branch dimensions. If the dimensions of the matrix are greater than three, then it is called a hypercube. Query performance in multidimensional matrices that lend themselves to dimensional formatting can be much better than that of the relational data model. The following figure represents the multidimensional data of a University:

Multidimensional data may be a little difficult to analyse. Therefore, multidimensional data may be displayed on a certain pivot; for example, consider the following table:
The table given above shows the multidimensional data as a cross-tabulation, also referred to as a pivot table. The cross-tabulation is done on any two dimensions, keeping the other dimensions fixed as ALL. For example, the table above has the two dimensions Year and Programme; the third dimension, Region, has the fixed value ALL for the given table. Please note that the cross-tabulation shown in the table above is different from a relation. The relational representation for the data of the table above may be:

A cross-tabulation can be performed on any two dimensions. The operation of changing the dimensions in a cross-tabulation is termed pivoting. If a cross-tabulation is done with a value other than ALL for the fixed third dimension, it is called slicing; for example, a slice can be created for Region code RC-07, instead of ALL the regions, in the cross-tabulation of regions. The operation is called dicing if the values of multiple dimensions are fixed. Multidimensional data allows data to be displayed at various levels of granularity. An operation that converts data with a fine granularity to a coarse granularity using aggregation is termed a rollup operation; for example, creating the cross-tabulation for ALL regions is a rollup operation. On the other hand, an operation that moves from a coarse granularity to a fine granularity is known as a drill-down operation; for example, moving from the cross-tabulation on ALL regions back to the multidimensional data is a drill-down operation. Please note: for the drill-down operation, we need the original data, or data at any finer level of granularity.
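These operations can be sketched quickly with pandas; the RC-07 enrolment figures come from the bullets above, while the RC-29 rows are invented to make the pivot interesting:

    import pandas as pd

    # Student_enrolment(year, programme, region, number); RC-29 rows are invented.
    df = pd.DataFrame([
        (2002, "BCA", "RC-07", 350), (2003, "BCA", "RC-07", 500),
        (2002, "BCA", "RC-29", 200), (2003, "BCA", "RC-29", 250),
        (2002, "MCA", "RC-07", 600), (2003, "MCA", "RC-07", 700),
    ], columns=["year", "programme", "region", "number"])

    # Cross-tabulation (pivot) on year x programme, with region rolled up to ALL.
    print(pd.pivot_table(df, values="number", index="year",
                         columns="programme", aggfunc="sum"))

    # Slicing: fix the third dimension to RC-07 instead of ALL.
    rc07 = df[df["region"] == "RC-07"]
    print(pd.pivot_table(rc07, values="number", index="year",
                         columns="programme", aggfunc="sum"))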

A multidimensional storage model contains two types of tables: the dimension tables and the fact table. The
dimension tables have tuples of dimension attributes, whereas the fact tables have one tuple each for a
recorded fact. In order to relate a fact to a dimension, we may have to use pointers. Let us demonstrate this
with the help of an example. Consider the University data warehouse where one of the data tables is the Student
enrolment table. The three dimensions in such a case would be:
• Year
• Programme, and
• Region

Schema Design:
Stars, Snowflakes, and Fact Constellations: Schemas for Multidimensional Databases
The entity-relationship data model is commonly used in the design of relational databases, where a database schema consists of a set of entities and the relationships between them. Such a data model is appropriate for online transaction processing.

A data warehouse, however, requires a concise, subject-oriented schema that facilitates on-line data analysis.
The most popular data model for a data warehouse is a multidimensional model. Such a model can exist in the
form of a star schema, a snowflake schema, or a fact constellation schema.

Let us look at each of these schema types.

Star schema: A fact table in the middle is connected to a set of dimension tables. It contains: (1) a large central table (the fact table) containing the bulk of the data, with no redundancy, and (2) a set of smaller attendant tables (dimension tables), one for each dimension. The schema graph resembles a starburst, with the dimension tables displayed in a radial pattern around the central fact table.

Snowflake schema: A refinement of the star schema in which some dimensional hierarchies are further split (normalized) into a set of smaller dimension tables, forming a shape similar to a snowflake. However, the snowflake structure can reduce the effectiveness of browsing, since more joins will be needed.

Example 1:

The star schema for such data is shown in the figure below.

In the above figure, the fact table points to the different dimension tables, thus ensuring the reliability of the data. Please notice that each dimension table is a table for a single dimension only, and that is why this schema is known as a star schema. However, a dimension table may not be normalised; thus, a new schema, named the snowflake schema, was created. A snowflake schema has normalised but hierarchical dimension tables. For example, consider the star schema shown in the figure: if, in the Region dimension table, the value of the field Rcphone is multivalued, then the Region dimension table is not normalised, and we can create a snowflake schema for such a situation.
Data warehouse storage can also utilise indexing to support high-performance access. Dimensional data can be indexed in a star schema to tuples in the fact table by using a join index. Data warehouse storage also facilitates access to summary data due to the nonvolatile nature of the data.
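For instance, a typical star-join query against the hypothetical fact and dimension tables declared in the earlier sqlite3 sketch touches the fact table once and filters and groups through the dimension tables:

    # Reusing the conn and tables from the earlier sqlite3 sketch.
    rows = conn.execute("""
        SELECT t.year, l.state, SUM(f.dollars_sold) AS total_sales
        FROM fact_sales f
        JOIN dim_time t     ON f.time_key = t.time_key
        JOIN dim_location l ON f.loc_key  = l.loc_key
        WHERE t.year = 2024 AND l.country = 'India'
        GROUP BY t.year, l.state
    """).fetchall()
    print(rows)  # empty until the tables are populated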

Example 2: Star schema:


Snowflake schema

Fact constellation.
Multiple fact tables share dimension tables; viewed as a collection of stars, this is therefore called a galaxy schema or fact constellation.
In data warehousing, there is a distinction between a data warehouse and a data mart.

A data warehouse collects information about subjects that span the entire organization, such as customers,
items, sales, assets, and personnel, and thus its scope is enterprise-wide. For data warehouses, the fact
constellation schema is commonly used, since it can model multiple, interrelated subjects.

A data mart, on the other hand, is a departmental subset of the data warehouse that focuses on selected subjects, and thus its scope is department-wide.
For data marts, the star or snowflake schemas are commonly used, since both are geared toward modeling single subjects, although the star schema is more popular and efficient.

Measures: Their Categorization and Computation:


"How are measures computed?" To answer this question, we first study how measures can be categorized. Note that a multidimensional point in the data cube space can be defined by a set of dimension-value pairs, for example, ⟨time = "Q1", location = "Vancouver", item = "computer"⟩. A data cube measure is a numerical function that can be evaluated at each point in the data cube space.

A measure value is computed for a given point by aggregating the data corresponding to the respective
dimension-value pairs defining the given point.

Measures can be organized into three categories (i.e., distributive, algebraic, holistic), based on the kind of
aggregate functions used.
Distributive: An aggregate function is distributive if it can be computed in a distributed manner as follows.

Suppose the data are partitioned into n sets. We apply the function to each partition, resulting in n aggregate
values. If the result derived by applying the function to the n aggregate values is the same as that derived by
applying the function to the entire data set (without partitioning), the function can be computed in a distributed
manner.

For example, count() can be computed for a data cube by first partitioning the cube into a set of subcubes,
computing count() for each subcube, and then summing up the counts obtained for each subcube. Hence,
count() is a distributive aggregate function.

For the same reason, sum(), min(), and max() are distributive aggregate functions. A measure is distributive if it
is obtained by applying a distributive aggregate function. Distributive measures can be computed efficiently
because they can be computed in a distributive manner.
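A tiny check of this property for sum(), partitioning a list and comparing against the unpartitioned result:

    data = [3, 1, 4, 1, 5, 9, 2, 6]

    # Partition the data into n sets and aggregate each partition separately.
    partitions = [data[:3], data[3:6], data[6:]]
    partial_sums = [sum(p) for p in partitions]

    # Applying the function to the partial aggregates gives the same answer as
    # applying it to the whole data set, so sum() is distributive.
    assert sum(partial_sums) == sum(data)
    print(sum(partial_sums), sum(data))  # 31 31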
OLAP & OLTP

Data warehouses are not designed for transaction processing; rather, they support increased efficiency in query processing. A data warehouse is therefore a very useful support for the analysis of data. But are there any tools that can utilise the data warehouse to extract useful analytical information?

On Line Analytical Processing (OLAP)


Online Analytical Processing (OLAP) is a category of software tools that provides analysis of data for business decisions. OLAP systems allow users to analyze database information from multiple database systems at one time. The primary objective is data analysis, not data processing.

Example of OLAP: any data warehouse system is an OLAP system.


Uses of OLAP are as follows:
• A company might compare their mobile phone sales in September with sales in October, then compare those results with sales from another location, which may be stored in a separate database.
• Amazon analyzes purchases by its customers to come up with a personalized homepage with products likely to interest its customers.

It is an approach for performing analytical queries and statistical analysis of multidimensional data. OLAP tools
can be put in the category of business intelligence tools along with data mining.

Some of the typical applications of OLAP may include reporting of sales projections, judging the performance
of a business, budgeting and forecasting etc.

OLAP tools require multidimensional data and distributed query-processing capabilities. Thus, OLAP has data
warehouse as its major source of information and query processing. But how do OLAP tools work?
In an OLAP system, a data analyst would like to see different cross-tabulations by interactively selecting the required attributes. Thus, the queries in an OLAP system are expected to be executed extremely quickly. The basic data model that may be supported by OLAP is the star schema, and the OLAP tool may be compatible with a data warehouse.

Let us try to give an example of how OLAP is more suited to a data warehouse than to a relational database. An OLAP tool creates aggregations of information; for example, the sales figures of a salesperson can be grouped (aggregated) for a product and a period. This data can also be grouped for the sales projections of the salesperson over the regions (North, South) or states or cities, thus producing an enormous amount of aggregated data. If we used a relational database, we would be generating such data many times over. However, this data has many dimensions, so it is an ideal candidate for representation through a data warehouse. The OLAP tool can thus be used directly on the data of the data warehouse to answer many analytical queries in a short time span. The term OLAP is sometimes confused with OLTP.

OLAP is an approach to answering multi-dimensional analytical (MDA) queries swiftly. OLAP is part of the
broader category of business intelligence, which also encompasses relational database, report writing and data
mining.

OLAP tools enable users to analyze multidimensional data interactively from multiple perspectives.
OLAP consists of three basic analytical operations:
➢Consolidation (Roll-Up)
➢Drill-Down
➢Slicing And Dicing
Consolidation involves the aggregation of data that can be accumulated and computed in one or more
dimensions. For example, all sales offices are rolled up to the sales department or sales division to anticipate
sales trends.
The drill-down is a technique that allows users to navigate through the details. For instance, users can view the
sales by individual products that make up a region’s sales.
Slicing and dicing is a feature whereby users can take out (slicing) a specific set of data of the OLAP cube and
view (dicing) the slices from different viewpoints.
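A dictionary-keyed sketch of these three operations on a small invented cube of units sold:

    from collections import defaultdict

    # Cube cells: (office, product, month) -> units sold (invented figures).
    cube = {
        ("Office A", "Phone", "Sep"): 10, ("Office A", "Phone", "Oct"): 12,
        ("Office B", "Phone", "Sep"): 7,  ("Office B", "Phone", "Oct"): 9,
    }

    # Consolidation (roll-up): aggregate the offices up to the sales division.
    by_month = defaultdict(int)
    for (office, product, month), units in cube.items():
        by_month[month] += units
    print(dict(by_month))  # {'Sep': 17, 'Oct': 21}

    # Slicing: take out the cells for one fixed month.
    september = {k: v for k, v in cube.items() if k[2] == "Sep"}

    # Dicing: fix several dimensions at once to view a subcube.
    subcube = {k: v for k, v in cube.items() if k[0] == "Office A" and k[2] == "Oct"}
    print(september, subcube)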

OLAP Implementation/ Types of OLAP:


1. Relational OLAP (ROLAP):
2. Multidimensional OLAP (MOLAP):
3. Hybrid OLAP (HOLAP):

This classical form of OLAP implementation uses multidimensional arrays in memory to store multidimensional data. Such an implementation of OLAP is referred to as Multidimensional OLAP (MOLAP). MOLAP is faster, as it stores data in an already processed, aggregated form using dimension and fact tables.

MOLAP is the 'classic' form of OLAP and is sometimes referred to as just OLAP.
MOLAP stores this data in an optimized multi-dimensional array storage, rather than in a relational database.
Therefore it requires the pre-computation and storage of information in the cube - the operation known as
processing.
MOLAP tools generally utilize a pre-calculated data set referred to as a data cube. The data cube contains all
the possible answers to a given range of questions.
MOLAP tools have a very fast response time and the ability to quickly write back data into the data set.
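A crude illustration of such pre-computation, materializing every aggregate combination ("all the possible answers") for a tiny two-dimensional data set; the rows and the cuboid naming are invented:

    from itertools import combinations
    from collections import defaultdict

    rows = [("Phone", "Delhi", 5), ("Phone", "Mumbai", 3), ("TV", "Delhi", 2)]
    DIMS = ("item", "city")

    # Pre-compute one cuboid per subset of dimensions (the full cube lattice).
    cube = {}
    for keep in range(len(DIMS) + 1):
        for dims in combinations(range(len(DIMS)), keep):
            cuboid = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in dims)  # dropped dimensions roll up to ALL
                cuboid[key] += row[-1]
            cube[dims] = dict(cuboid)

    # Any aggregate query is now a lookup rather than a scan.
    print(cube[(0,)][("Phone",)])  # units of Phone across all cities -> 8
    print(cube[()][()])            # grand total -> 10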

Relational OLAP (ROLAP) stores data directly in relational databases. ROLAP creates multidimensional views upon request, rather than in advance as in MOLAP. ROLAP may be used on complex data with a wide number of fields. ROLAP works directly with relational databases: the base data and the dimension tables are stored as relational tables, and new tables are created to hold the aggregated information. It depends on a specialized schema design.

This methodology relies on manipulating the data stored in the relational database to give the appearance of
traditional OLAP's slicing and dicing functionality. In essence, each action of slicing and dicing is equivalent to
adding a "WHERE" clause in the SQL statement.

ROLAP tools do not use pre-calculated data cubes but instead pose the query to the standard relational database
and its tables in order to bring back the data required to answer the question.
ROLAP tools feature the ability to ask any question because the methodology does not limit to the contents of a
cube. ROLAP also has the ability to drill down to the lowest level of detail in the database.

Hybrid OLAP (HOLAP):


There is no clear agreement across the industry as to what constitutes Hybrid OLAP, except that a database will
divide data between relational and specialized storage.
For example, for some vendors, a HOLAP database will use relational tables to hold the larger quantities of
detailed data, and use specialized storage for at least some aspects of the smaller quantities of more-aggregate or
less-detailed data.
HOLAP addresses the shortcomings of MOLAP and ROLAP by combining the capabilities of both approaches.
HOLAP tools can utilize both pre-calculated cubes and relational data sources.
OLTP is online transaction processing.

Online transaction processing shortly known as OLTP supports transaction oriented applications in a 3-tier
architecture. OLTP administers day to day transaction of an organization. The primary objective is data
processing and not data analysis.

Example of an OLTP system: an ATM centre. Assume that a couple has a joint account with a bank. One day, both simultaneously reach different ATM centres at precisely the same time and want to withdraw the total amount present in their bank account. However, the person who completes the authentication process first will get the money. In this case, the OLTP system makes sure that the withdrawn amount will never be more than the amount present in the bank account. The key point to note here is that OLTP systems are optimized for transactional superiority rather than data analysis.

OLTP systems focus on highly concurrent transactions and better commit protocols that support a high rate of update transactions. On the other hand, OLAP focuses on good query-evaluation and query-optimisation algorithms.

In this cube we can observe that each side of the cube represents one of the elements of the question: the x-axis represents time, the y-axis represents the products, and the z-axis represents the different centres. The cells of the cube represent the number of products sold, or can represent the price of the items.

This figure also gives a different understanding of the drill-down operation.

DATA CUBE
A data cube in a data warehouse is a multidimensional structure used to store data. The data cube was initially intended for OLAP tools, which could easily access the multidimensional data; however, the data cube can also be used for data mining. A data cube stores precomputed data and eases online analytical processing.

A data cube represents data in terms of dimensions and facts, and is used to represent aggregated data. A data cube is basically categorized into two main kinds: the multidimensional data cube and the relational data cube.

When it comes to a cube, we all think of it as a three-dimensional structure, but in data warehousing we can implement an n-dimensional data cube.

The dimensions of a data cube are the perspectives, angles or entities with respect to which the enterprise wants to store the data.
Now, how does it help the analyst to analyze and extract the data?
Let us take an example: consider that we have data about AllElectronics sales.

Here we can store the sales data along many perspectives or dimensions, such as sales over all time, sales at all branches, sales at all locations, and sales of all items.
The figure below shows the data cube for AllElectronics sales.

Each dimension has a dimension table which contains a further description of that dimension. For example, a
branch dimension may have branch_name, branch_code, branch_address, etc.

A multidimensional data model like the data cube is always based on a theme, which is termed a fact. In the
above example of the AllElectronics data set, we have stored data based on the sales of electronic items.

So, here the fact is sales. A fact has a fact table associated with it.

The fact table holds the data in numeric form, denoting numeric measures such as the number of units of an
item sold or the sales of a particular branch in a particular year. A minimal sketch of such a cube is given
below, after which we move on to data cube classification.
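The following Python sketch stores a small cube in a NumPy array; the dimensions (time, item, branch) and the unit counts are illustrative assumptions in the spirit of the AllElectronics example:

    # A 3-D data cube: cube[t, i, b] holds the fact (units sold) for
    # quarter t, item i, and branch b. Dimension tables are simplified
    # to plain lists of member names.
    import numpy as np

    time_dim   = ["Q1", "Q2"]
    item_dim   = ["TV", "Phone"]
    branch_dim = ["B1", "B2"]

    cube = np.array([[[120,  90], [200, 150]],     # Q1: TV, Phone per branch
                     [[130, 100], [210, 160]]])    # Q2: TV, Phone per branch

    # "TV sales in Q1 at branch B2" is a single cell of the cube:
    print(cube[time_dim.index("Q1"), item_dim.index("TV"),
               branch_dim.index("B2")])            # 90

    # "Sales of all items over all time, per branch" aggregates two axes:
    print(cube.sum(axis=(0, 1)))                   # one total per branch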

Data Cube Classification


Data cube can be classified into two main categories as discussed below:
1. Multidimensional Data Cube (MOLAP)
Multidimensional arrays are used to store the data, which ensures a multidimensional view of the data. The
multidimensional data cube helps in storing large amounts of data. It implements indexing on each dimension of
the data cube, which improves storing, accessing, and retrieving data from the cube.

2. Relational Data Cube (ROLAP)


You can consider the relational data cube as an 'extended version of a relational DBMS'. Relational tables are
used to store the data, and each relational table represents a dimension of the data cube.
The relational data cube implements SQL to calculate the aggregated data (a sketch follows below), but when it
comes to performance, the relational data cube is slower than the multidimensional data cube. However, the
relational data cube scales well for steadily increasing data.
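A minimal sketch of the ROLAP approach with sqlite3, where an aggregate is computed by SQL over a relational fact table; the schema and rows are illustrative assumptions:

    # ROLAP-style aggregation: no precomputed cube cells, just SQL GROUP BY
    # over the relational fact table.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales(item TEXT, city TEXT, units INT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                     [("TV", "Chennai", 120), ("TV", "Delhi", 90),
                      ("Phone", "Chennai", 200), ("Phone", "Delhi", 150)])

    # Total units per item, aggregated on the fly:
    for row in conn.execute("SELECT item, SUM(units) FROM sales GROUP BY item"):
        print(row)      # ('Phone', 350) then ('TV', 210)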

You can even get a combination of both the relational data cube and the multidimensional data cube, which is
termed a hybrid data cube.

The hybrid data cube (HOLAP) inherits features such as scalability from the relational data cube, and it
inherits faster computation from the multidimensional data cube.

Operations on Data Cube/OLAP operations:


Now let us discuss the operations that can be performed on a data cube in order to view data from different angles.
There are four basic operations that can be implemented on a data cube, discussed below.

In the multidimensional model, the records are organized into various dimensions, and each dimension includes
multiple levels of abstraction described by concept hierarchies. This organization provides users with the
flexibility to view data from various perspectives.

A number of OLAP data cube operations exist to materialize these different views, allowing interactive querying
and searching of the records at hand. Hence, OLAP supports a user-friendly environment for interactive data
analysis.

Consider the OLAP operations to be performed on multidimensional data. The figure shows a data cube for the
sales of a shop. The cube contains the dimensions location, time, and item, where location is aggregated with
respect to city values, time with respect to quarters, and item with respect to item types.

1. Roll Up
The roll-up operation (also known as the drill-up or aggregation operation) summarizes or aggregates the data on
a cube, either by climbing up a concept hierarchy for a dimension or by dimension reduction. Roll-up is like
zooming out on the data cube.
The figure below shows an example of a roll-up operation performed on the location dimension of the data cube
we have seen above. The hierarchy for location is defined as the total order street < city < province or
state < country. The roll-up operation aggregates the data by ascending the location hierarchy from the level of
the city to the level of the country.
When a roll-up is performed by dimension reduction, one or more dimensions are removed from the cube. For
example, consider a sales data cube having two dimensions, location and time. Roll-up may be performed by
removing the time dimension, resulting in an aggregation of the total sales by location, rather than by
location and by time.

Example
Consider the following cubes illustrating the temperature of certain days, recorded weekly.

Suppose we want to set up levels (hot (80-85), mild (70-75), cool (64-69)) in temperature from the above cubes.
To do this, we have to group the columns and add up the values according to the concept hierarchy. This
operation is known as a roll-up.
By doing this, we obtain the following cube:
The roll-up operation groups the information by levels of temperature.
The following diagram illustrates how roll-up works.
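As a minimal sketch, assuming a small city-level sales table and pandas, roll-up along the location hierarchy (city to country) can be expressed as a mapping followed by re-aggregation; the cities and figures are illustrative assumptions:

    # Roll-up: climb the location hierarchy city -> country, then aggregate.
    import pandas as pd

    sales = pd.DataFrame({
        "city":  ["Chicago", "New York", "Toronto", "Vancouver"],
        "units": [440, 1560, 395, 1000],
    })
    city_to_country = {"Chicago": "USA", "New York": "USA",
                       "Toronto": "Canada", "Vancouver": "Canada"}

    sales["country"] = sales["city"].map(city_to_country)  # step up the hierarchy
    print(sales.groupby("country")["units"].sum())         # Canada 1395, USA 2000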

2. Drill Down
When the drill-down operation is performed on a dimension, the data on that dimension is fragmented into a more
granular form.
In the figure below you can see the drill-down operation on the time dimension, where the quarters Q1 and Q2 are
fragmented into months.
The drill-down operation (also called roll-down) is the reverse of roll-up. Drill-down is like zooming in on the
data cube. It navigates from less detailed data to more detailed data. Drill-down can be performed either by
stepping down a concept hierarchy for a dimension or by adding additional dimensions.

The figure shows a drill-down operation performed on the dimension time by stepping down a concept hierarchy
which is defined as day < month < quarter < year. Drill-down occurs by descending the time hierarchy from the
level of the quarter to the more detailed level of the month. Because a drill-down adds more detail to the given
data, it can also be performed by adding a new dimension to a cube. For example, a drill-down on the central
cube of the figure can occur by introducing an additional dimension, such as customer group.
Example
Drill-down adds more detail to the given data.
The following diagram illustrates how drill-down works.
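A minimal sketch of drill-down with pandas, assuming the warehouse keeps month-level detail so the quarterly view is just a coarser grouping; the months and figures are illustrative assumptions:

    # Drill-down: move from the quarter level to the finer month level of
    # the time hierarchy. The detail data must exist at month granularity.
    import pandas as pd

    sales = pd.DataFrame({
        "month":   ["Jan", "Feb", "Mar", "Apr", "May", "Jun"],
        "quarter": ["Q1",  "Q1",  "Q1",  "Q2",  "Q2",  "Q2"],
        "units":   [150,   100,   150,   120,   180,   140],
    })

    print(sales.groupby("quarter")["units"].sum())             # rolled-up view
    print(sales.groupby("month", sort=False)["units"].sum())   # drilled-down view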
3. Slice and Dice
The slice operation picks one dimension of the data cube and forms a subcube from it. The figure below
represents the slice operation on a data cube, where the data cube is sliced based on time.

The dice operation selects two or more dimensions to form a subcube. In the figure below you can see that
the subcube is formed by selecting dimensions such as location, item, and time.
A slice is a subset of the cube corresponding to a single value for one or more members of a
dimension. For example, a slice operation is executed when the customer wants a selection on one
dimension of a three-dimensional cube, resulting in a two-dimensional slice. So, the slice operation
performs a selection on one dimension of the given cube, thus resulting in a subcube.

For example, if we make the selection temperature = cool, we will obtain the following cube:

The following diagram illustrates how slice works. Here slice is functioning on the dimension time using the
criterion time = "Q1". It forms a new subcube by selecting a single value on one dimension.

The dice operation defines a subcube by performing a selection on two or more dimensions.
For example, applying the selection (time = day 3 OR time = day 4) AND (temperature = cool OR
temperature = hot) to the original cube, we get the following subcube (still two-dimensional).
A dice operation on the cube based on the following selection criteria involves three dimensions:
o (location = "Toronto" or "Vancouver")
o (time = "Q1" or "Q2")
o (item = "Mobile" or "Modem")
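A minimal sketch of slice and dice over a flattened fact table with pandas, reusing the selection criteria above; the rows and unit counts are illustrative assumptions:

    # Slice: a selection on ONE dimension. Dice: selections on TWO or more.
    import pandas as pd

    sales = pd.DataFrame({
        "time":     ["Q1", "Q1", "Q2", "Q2"],
        "location": ["Toronto", "Vancouver", "Toronto", "Vancouver"],
        "item":     ["Mobile", "Modem", "Mobile", "Modem"],
        "units":    [100, 80, 120, 90],
    })

    slice_q1 = sales[sales["time"] == "Q1"]          # slice: time = "Q1"
    dice = sales[sales["time"].isin(["Q1", "Q2"])    # dice: three dimensions
                 & sales["location"].isin(["Toronto", "Vancouver"])
                 & sales["item"].isin(["Mobile", "Modem"])]
    print(slice_q1)
    print(dice)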

4. Pivot
Pivot is not a computational operation; it rotates the data cube in order to view the data from different
orientations.
The figure below shows the pivot operation performed on the data cube.
So, these are the four basic operations that can be performed on the data cube.
The pivot operation is also called rotation. Pivot is a visualization operation which rotates the data axes in
order to provide an alternative presentation of the data. It may involve swapping the rows and columns, or
moving one of the row dimensions into the column dimensions.
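A minimal sketch of pivot with pandas, where transposing the cross-tabulated view swaps the row and column dimensions; the data is an illustrative assumption:

    # Pivot (rotation): present the same measures with the axes swapped.
    import pandas as pd

    sales = pd.DataFrame({
        "item":     ["Mobile", "Modem", "Mobile", "Modem"],
        "location": ["Toronto", "Toronto", "Vancouver", "Vancouver"],
        "units":    [100, 80, 120, 90],
    })

    view = sales.pivot_table(index="item", columns="location", values="units")
    print(view)      # items as rows, locations as columns
    print(view.T)    # pivoted: locations as rows, items as columns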

Other OLAP Operations


The drill-across operation executes queries involving more than one fact table. The drill-through operation
makes use of relational SQL facilities to drill through the bottom level of a data cube down to its back-end
relational tables.
Other OLAP operations may include ranking the top-N or bottom-N elements in lists, as well as computing moving
averages, growth rates, interest, internal rates of return, depreciation, currency conversions, and statistical
functions.
OLAP offers analytical modeling capabilities, including a calculation engine for deriving ratios, variances,
etc., and for computing measures across various dimensions. It can generate summarization, aggregation, and
hierarchies at each granularity level and at every dimension intersection. OLAP also provides functional models
for forecasting, trend analysis, and statistical analysis. In this context, the OLAP engine is a powerful data
analysis tool.
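Two of the operations listed above, top-N ranking and a moving average, can be sketched with pandas as follows; the monthly series is an illustrative assumption:

    # Top-N ranking and a 3-month moving average over a monthly sales series.
    import pandas as pd

    monthly = pd.Series([150, 100, 150, 120, 180, 140],
                        index=["Jan", "Feb", "Mar", "Apr", "May", "Jun"])

    print(monthly.nlargest(2))               # top-2 months by sales
    print(monthly.rolling(window=3).mean())  # 3-month moving average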

Difference between OLTP and OLAP


Below are the differences between OLTP and OLAP in a data warehouse:

Process: OLTP is an online transactional system; it manages database modification. OLAP is an online analysis
and data retrieval process.

Characteristic: OLTP is characterized by large numbers of short online transactions. OLAP is characterized by a
large volume of data.

Functionality: OLTP is an online database modifying system. OLAP is an online database query management system.

Method: OLTP uses a traditional DBMS. OLAP uses the data warehouse.

Query: OLTP queries insert, update, and delete information in the database. OLAP queries are mostly select
operations.

Table: Tables in an OLTP database are normalized. Tables in an OLAP database are not normalized.

Source: OLTP and its transactions are the sources of data. Different OLTP databases become the source of data
for OLAP.

Data Integrity: An OLTP database must maintain data integrity constraints. An OLAP database is not frequently
modified, so data integrity is not an issue.

Response time: OLTP response time is in milliseconds. OLAP response time is in seconds to minutes.

Data quality: The data in an OLTP database is always detailed and organized. The data in an OLAP process might
not be organized.

Usefulness: OLTP helps to control and run fundamental business tasks. OLAP helps with planning,
problem-solving, and decision support.

Operation: OLTP allows read/write operations. OLAP allows mostly read and rarely write operations.

Audience: OLTP is a customer-oriented process. OLAP is a market-oriented process.

Query Type: OLTP queries are standardized and simple. OLAP queries are complex and involve aggregations.

Back-up: OLTP requires complete backups of the data combined with incremental backups. OLAP only needs a backup
from time to time; backup is not as important as for OLTP.

Design: OLTP database design is application oriented (for example, the design changes with the industry, such
as retail, airline, or banking). OLAP database design is subject oriented (for example, the design changes with
subjects such as sales, marketing, or purchasing).

User type: OLTP is used by data-critical users such as clerks, DBAs, and database professionals. OLAP is used
by data-knowledge users such as workers, managers, and CEOs.

Purpose: OLTP is designed for real-time business operations. OLAP is designed for analysis of business measures
by category and attribute.

Performance metric: For OLTP, transaction throughput is the performance metric. For OLAP, query throughput is
the performance metric.

Number of users: An OLTP database allows thousands of users. An OLAP database allows only hundreds of users.

Productivity: OLTP helps to increase the user's self-service and productivity. OLAP helps to increase the
productivity of business analysts.

Challenge: For OLTP, data warehouses historically have been development projects which may prove costly to
build. For OLAP, an OLAP cube is not an open SQL server data warehouse, so technical knowledge and experience
are essential to manage the OLAP server.

Process: OLTP provides fast results for daily used data. OLAP ensures that responses to queries are
consistently quick.

Characteristic: OLTP is easy to create and maintain. OLAP lets the user create a view with the help of a
spreadsheet.

Style: OLTP is designed to have fast response time and low data redundancy, and is normalized. An OLAP data
warehouse is created uniquely so that it can integrate different data sources for building a consolidated
database.

Benefits of using OLAP services


 OLAP creates a single platform for all types of business analytical needs,
including planning, budgeting, forecasting, and analysis.
 The main benefit of OLAP is the consistency of information and
calculations.
 Security restrictions can easily be applied to users and objects to comply
with regulations and protect sensitive data.

Benefits of OLTP method


 It administers daily transactions of an organization.
 OLTP widens the customer base of an organization by simplifying
individual processes.
Drawbacks of OLAP service

 Implementation and maintenance are dependent on IT professionals,
because traditional OLAP tools require a complicated modeling
procedure.
 OLAP tools need cooperation between people of various departments
to be effective, which might not always be possible.

Drawbacks of OLTP method


 If an OLTP system faces hardware failure, online transactions are
severely affected.
 OLTP systems allow multiple users to access and change the same
data at the same time, which can at times create an unprecedented
situation.
