
Eval of Business Performance - Module 1

The document provides an overview of data warehousing including key concepts and terminology. It discusses what a data warehouse is, why it is separated from operational databases, its features and applications. It also covers different types of data warehouses and compares them with operational databases.

Uploaded by

Daniela Samia
Copyright
© All Rights Reserved

[Evaluation of Business Performance]

1
[Introduction to Data Warehousing]

Module 1: Data Warehousing

Course Learning Outcomes:


1. To gain an overview of Data Warehousing
2. To understand the concepts of Data Warehousing
3. To know the terminology used in Data Warehousing

Introduction
The term "Data Warehouse" was first coined by Bill Inmon in 1990.
According to Inmon, a data warehouse is a subject-oriented, integrated,
time-variant, and non-volatile collection of data. This data helps analysts
make informed decisions in an organization.

An operational database changes frequently, every day, because of the
transactions that take place. Suppose a business executive wants to analyze
earlier feedback on some data, such as a product, a supplier, or a consumer.
The executive will have no data available to analyze, because the earlier
data has been overwritten by later transactions.

A data warehouse provides generalized and consolidated data in a
multidimensional view. Along with this generalized and consolidated view of
data, a data warehouse also provides Online Analytical Processing
(OLAP) tools. These tools support interactive and effective analysis of data
in a multidimensional space. This analysis results in data generalization and
data mining.

Data mining functions such as association, clustering, classification, and
prediction can be integrated with OLAP operations to enhance the interactive
mining of knowledge at multiple levels of abstraction. That is why the data
warehouse has become an important platform for data analysis and
online analytical processing.
Understanding a Data Warehouse
• A data warehouse is a database that is kept separate from the organization's operational
database.
• A data warehouse is not updated frequently.
• It holds consolidated historical data, which helps the organization analyze its
business.


• A data warehouse helps executives organize, understand, and use their data to make
strategic decisions.

• Data warehouse systems help integrate diverse application systems.

• A data warehouse system helps in the analysis of consolidated historical data.


Why a Data Warehouse is separated from Operational Databases
A data warehouse is kept separate from operational databases for the following reasons:
• An operational database is built for well-known tasks and workloads, such as
searching for particular records and indexing. In contrast, data warehouse queries are often
complex and present a general form of data.
• Operational databases support concurrent processing of multiple transactions.
Concurrency control and recovery mechanisms are required for operational databases to
ensure robustness and consistency of the database.
• An operational database query can both read and modify data, while an OLAP query
needs only read-only access to stored data.
• An operational database maintains current data. A data warehouse, on the other hand,
maintains historical data.
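
The difference between the two kinds of query can be sketched with Python's built-in sqlite3 module. This is only an illustration: the orders table, its columns, and its rows are hypothetical, and a real warehouse would run on a separate system rather than on the operational table itself.

```python
import sqlite3

# Hypothetical "orders" table and rows, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, item TEXT, year INTEGER, amount REAL, status TEXT)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?)", [
    (1, "laptop", 2019, 900.0, "shipped"),
    (2, "phone", 2019, 500.0, "shipped"),
    (3, "laptop", 2020, 950.0, "shipped"),
    (4, "phone", 2020, 450.0, "pending"),
])

# OLTP-style work: find one record by its key and modify it in place.
conn.execute("UPDATE orders SET status = 'shipped' WHERE order_id = 4")
conn.commit()
status = conn.execute("SELECT status FROM orders WHERE order_id = 4").fetchone()[0]

# OLAP-style work: a read-only query that scans many rows and summarizes them.
sales_by_year = conn.execute(
    "SELECT year, SUM(amount) FROM orders GROUP BY year ORDER BY year"
).fetchall()
```

The first query touches a single row and changes it; the second reads every row and changes nothing, which is the access pattern a warehouse is designed for.
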
Data Warehouse Features
The key features of a data warehouse are discussed below:
1. Subject Oriented − A data warehouse is subject oriented because it provides information
about a subject rather than about the organization's ongoing operations. These subjects can be
products, customers, suppliers, sales, revenue, etc. A data warehouse does not focus on
ongoing operations; rather, it focuses on modelling and analysis of data for decision
making.
2. Integrated − A data warehouse is constructed by integrating data from heterogeneous
sources such as relational databases, flat files, etc. This integration enhances the effective
analysis of data.
3. Time Variant − The data collected in a data warehouse is identified with a particular time
period. The data in a data warehouse provides information from a historical point of
view.
4. Non-volatile − Non-volatile means that previous data is not erased when new data is added
to it. A data warehouse is kept separate from the operational database, and therefore
frequent changes in the operational database are not reflected in the data warehouse.
Note − A data warehouse does not require transaction processing, recovery, and concurrency
controls, because it is physically stored separately from the operational database.
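
The time-variant and non-volatile properties can be illustrated with a minimal sketch. Everything here is an assumption made for the example: the record_price helper, the product name, the prices, and the dates are hypothetical.

```python
from datetime import date

# Hypothetical price records, used only for illustration.
operational = {}   # operational store: one current value per product
warehouse = []     # warehouse: append-only rows, each tagged with a date

def record_price(product, price, as_of):
    """Apply one change to both stores."""
    operational[product] = price                 # overwrites the previous value
    warehouse.append((product, price, as_of))    # keeps every earlier value

record_price("laptop", 900.0, date(2020, 1, 1))
record_price("laptop", 950.0, date(2020, 2, 1))
record_price("laptop", 920.0, date(2020, 3, 1))

# The operational store knows only the latest price.
current_price = operational["laptop"]

# The warehouse can still answer questions about earlier periods.
price_history = [
    (as_of.isoformat(), price)
    for product, price, as_of in warehouse
    if product == "laptop"
]
```

After the three updates, the operational store holds a single value, while the warehouse still holds one dated row per change.
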
Data Warehouse Applications
As discussed before, a data warehouse helps business executives organize, analyze, and use
their data for decision making. A data warehouse serves as part of a plan-execute-assess
"closed-loop" feedback system for enterprise management. Data warehouses are widely used
in the following fields:

• Financial services
• Banking services
• Consumer goods
• Retail sectors
• Controlled manufacturing

Types of Data Warehouse

Information processing, analytical processing, and data mining are the three types of data
warehouse applications, discussed below:

• Information Processing − A data warehouse allows the data stored in it to be processed. The
data can be processed by means of querying, basic statistical analysis, and reporting using
crosstabs, tables, charts, or graphs.
• Analytical Processing − A data warehouse supports analytical processing of the
information stored in it. The data can be analyzed by means of basic OLAP operations,
including slice-and-dice, drill-down, drill-up, and pivoting.
• Data Mining − Data mining supports knowledge discovery by finding hidden patterns and
associations, constructing analytical models, and performing classification and prediction.
The mining results can be presented using visualization tools.
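
The basic OLAP operations named above can be sketched in plain Python. This is a simplified illustration, not how an OLAP engine works internally; the facts list, its values, and the aggregate helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales facts: (year, quarter, item, city, units). Illustrative only.
facts = [
    (2019, "Q1", "phone", "Delhi", 10),
    (2019, "Q2", "phone", "Delhi", 12),
    (2019, "Q1", "laptop", "Mumbai", 5),
    (2020, "Q1", "phone", "Mumbai", 8),
    (2020, "Q1", "laptop", "Delhi", 7),
]

def aggregate(rows, key):
    """Sum the units of all rows, grouped by the value key(row)."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[4]
    return dict(totals)

# Slice: fix one dimension to a single value (city = "Delhi").
delhi = [r for r in facts if r[3] == "Delhi"]

# Dice: restrict two or more dimensions (year 2019 and item "phone").
phones_2019 = [r for r in facts if r[0] == 2019 and r[2] == "phone"]

# Drill up (roll up): summarize at a coarser level, here by year.
by_year = aggregate(facts, key=lambda r: r[0])

# Drill down: move to a finer level, here year and quarter.
by_quarter = aggregate(facts, key=lambda r: (r[0], r[1]))

# Pivot: rotate the view so that items are rows and cities are columns.
pivot = defaultdict(dict)
for (item, city), units in aggregate(facts, key=lambda r: (r[2], r[3])).items():
    pivot[item][city] = units
```

Each operation is a different combination of filtering and grouping over the same facts, which is why they can all be served from one multidimensional store.
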
The following table compares a data warehouse (OLAP) with an operational database (OLTP):

Sr.No. | Data Warehouse (OLAP) | Operational Database (OLTP)
1. | It involves historical processing of information. | It involves day-to-day processing.
2. | OLAP systems are used by knowledge workers such as executives, managers, and analysts. | OLTP systems are used by clerks, DBAs, or database professionals.
3. | It is used to analyze the business. | It is used to run the business.
4. | It focuses on information out. | It focuses on data in.
5. | It is based on Star Schema, Snowflake Schema, and Fact Constellation Schema. | It is based on the Entity Relationship Model.
6. | It is subject oriented. | It is application oriented.
7. | It contains historical data. | It contains current data.
8. | It provides summarized and consolidated data. | It provides primitive and highly detailed data.
9. | It provides a summarized and multidimensional view of data. | It provides a detailed and flat relational view of data.
10. | The number of users is in hundreds. | The number of users is in thousands.
11. | The number of records accessed is in millions. | The number of records accessed is in tens.
12. | The database size is from 100 GB to 100 TB. | The database size is from 100 MB to 100 GB.
13. | These are highly flexible. | It provides high performance.
13. These are highly flexible. It provides high performance.

Terminologies
In this section, we will discuss some of the most commonly used terms in data
warehousing.
Metadata
Metadata is simply defined as data about data. Data that is used to represent other data is
known as metadata. For example, the index of a book serves as metadata for the contents of the
book. In other words, metadata is the summarized data that leads us to the
detailed data.
In terms of a data warehouse, we can define metadata as follows:
• Metadata is a road map to the data warehouse.
• Metadata in a data warehouse defines the warehouse objects.
• Metadata acts as a directory. This directory helps the decision support system locate the
contents of a data warehouse.
Metadata Repository
The metadata repository is an integral part of a data warehouse system. It contains the following
metadata:
• Business metadata − It contains the data ownership information, business definitions, and changing
policies.
• Operational metadata − It includes currency of data and data lineage. Currency of data refers to
whether the data is active, archived, or purged. Lineage of data means the history of the data migrated
and the transformations applied to it.
• Data for mapping from the operational environment to the data warehouse − This metadata includes source
databases and their contents, data extraction, data partitioning, cleaning, transformation rules, and data refresh
and purging rules.
• The algorithms for summarization − It includes dimension algorithms, data on granularity,
aggregation, summarizing, etc.
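
As a rough sketch, a metadata repository can be pictured as a catalogue with one entry per warehouse object. The TableMetadata class, its field names, and the sample entry below are all hypothetical; real warehouse products define their own metadata models.

```python
from dataclasses import dataclass, field

@dataclass
class TableMetadata:
    # All field names and values are hypothetical, for illustration only.
    name: str
    owner: str           # business metadata: data ownership
    definition: str      # business metadata: business definition
    currency: str        # operational metadata: "active", "archived", or "purged"
    lineage: list = field(default_factory=list)        # operational metadata: history of the data
    source_tables: list = field(default_factory=list)  # mapping metadata: where the data comes from
    granularity: str = "daily"                          # summarization metadata

repository = {}

def register(meta):
    """Add one warehouse object to the repository."""
    repository[meta.name] = meta

def locate(currency):
    """Act as a directory: list the warehouse objects with the given currency."""
    return sorted(name for name, meta in repository.items() if meta.currency == currency)

register(TableMetadata(
    name="sales_fact",
    owner="Sales department",
    definition="One row per item sold per day",
    currency="active",
    lineage=["extracted from orders", "amounts converted to one currency"],
    source_tables=["orders", "order_lines"],
))

active_objects = locate("active")
```

The locate function plays the directory role described above: a decision support system can ask the repository where the data is without scanning the data itself.
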
Data Cube
A data cube helps us represent data in multiple dimensions. It is defined by dimensions and
facts. The dimensions are the entities with respect to which an enterprise preserves the
records.

Illustration of Data Cube


Suppose a company wants to keep track of sales records, with the help of a sales data warehouse,
with respect to time, item, branch, and location. These dimensions allow the company to keep track of
monthly sales and of the branch at which the items were sold. There is a table associated with each
dimension, known as a dimension table. For example, the "item" dimension table may
have attributes such as item_name, item_type, and item_brand.
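
A dimension table of this kind can be sketched with Python's built-in sqlite3 module. The item attributes come from the text; the item_key column, the sales fact table, and all sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension table for "item", with the attributes named in the text.
    CREATE TABLE item (
        item_key   INTEGER PRIMARY KEY,  -- hypothetical surrogate key
        item_name  TEXT,
        item_type  TEXT,
        item_brand TEXT
    );
    -- Hypothetical fact table: one row per sale, pointing at the dimension.
    CREATE TABLE sales (
        item_key   INTEGER REFERENCES item(item_key),
        month      TEXT,
        branch     TEXT,
        units_sold INTEGER
    );
""")
conn.executemany("INSERT INTO item VALUES (?, ?, ?, ?)", [
    (1, "Model A", "phone", "BrandX"),
    (2, "Model B", "laptop", "BrandY"),
])
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    (1, "2020-01", "North", 10),
    (1, "2020-02", "North", 4),
    (2, "2020-01", "South", 3),
])

# Monthly units sold per item type: join the facts to the dimension, then group.
monthly_by_type = conn.execute("""
    SELECT s.month, i.item_type, SUM(s.units_sold)
    FROM sales AS s JOIN item AS i ON i.item_key = s.item_key
    GROUP BY s.month, i.item_type
    ORDER BY s.month, i.item_type
""").fetchall()
```

The fact table stores only keys and measures, while the descriptive attributes live once in the dimension table and are joined in when a report needs them.
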
The following table represents the 2-D view of Sales Data for a company with respect to time,
item, and location dimensions.

In this 2-D table, however, we have records with respect to time and item only. The sales for New
Delhi are shown with respect to the time and item dimensions, according to the type of items sold. If we
want to view the sales data with one more dimension, say the location dimension, then a 3-D
view is useful. The 3-D view of the sales data with respect to time, item, and location is
shown in the table below:


The above 3-D table can be represented as a 3-D data cube, as shown in the following figure:
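
The relationship between the 2-D view and the 3-D cube can also be sketched in code. The figures below are made up for illustration and are not the values from the module's tables; view_2d and total_by are hypothetical helper names.

```python
# Hypothetical figures; each cell is keyed by (time, item, location).
cube = {
    ("Q1", "mobile", "New Delhi"): 100,
    ("Q1", "modem", "New Delhi"): 40,
    ("Q2", "mobile", "New Delhi"): 120,
    ("Q1", "mobile", "Mumbai"): 80,
    ("Q2", "modem", "Mumbai"): 30,
}

def view_2d(cube, location):
    """Fix the location dimension to get a 2-D (time x item) table."""
    return {(t, i): v for (t, i, loc), v in cube.items() if loc == location}

def total_by(cube, axis):
    """Sum the facts along one dimension: 0 = time, 1 = item, 2 = location."""
    totals = {}
    for key, value in cube.items():
        totals[key[axis]] = totals.get(key[axis], 0) + value
    return totals

new_delhi = view_2d(cube, "New Delhi")   # the 2-D view for one city
by_location = total_by(cube, 2)          # totals along the third dimension
```

Fixing the location to "New Delhi" yields the 2-D view of time against item; keeping location as a key gives the full 3-D cube.
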

Data Mart
Data marts contain a subset of organization-wide data that is valuable to specific groups of
people in an organization. In other words, a data mart contains only the data that is specific
to a particular group. For example, the marketing data mart may contain only data related to
items, customers, and sales. Data marts are confined to subjects.
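
The marketing example can be sketched with Python's built-in sqlite3 module, copying only the relevant subjects into a separate database. The table layouts, the payroll table, and the sample rows are hypothetical, and a real data mart would usually be loaded by an ETL process rather than a plain copy.

```python
import sqlite3

# Hypothetical organization-wide warehouse with four subject tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (item_name TEXT);
    CREATE TABLE customers (customer_name TEXT);
    CREATE TABLE sales (item_name TEXT, customer_name TEXT, amount REAL);
    CREATE TABLE payroll (employee TEXT, salary REAL);
    INSERT INTO sales VALUES ('phone', 'Asha', 500.0), ('laptop', 'Ravi', 900.0);
    INSERT INTO payroll VALUES ('Meera', 4000.0);
""")

# The marketing data mart is a separate database holding only the subjects
# that the marketing group needs: items, customers, and sales.
conn.execute("ATTACH DATABASE ':memory:' AS marketing_mart")
for table in ("items", "customers", "sales"):
    conn.execute(f"CREATE TABLE marketing_mart.{table} AS SELECT * FROM main.{table}")

mart_tables = [
    row[0] for row in conn.execute(
        "SELECT name FROM marketing_mart.sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
mart_sales_rows = conn.execute("SELECT COUNT(*) FROM marketing_mart.sales").fetchone()[0]
```

The payroll table stays in the warehouse only; the mart exposes just the subset that its users work with.
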
Points to Remember About Data Marts
• Windows-based or Unix/Linux-based servers are used to implement data marts. They
are implemented on low-cost servers.
• The implementation cycle of a data mart is short, measured in
weeks rather than months or years.
• The life cycle of data marts may become complex in the long run if their planning and design
are not organization-wide.
• Data marts are small in size.
• Data marts are customized by department.
• The source of a data mart is a departmentally structured data warehouse.
• Data marts are flexible.
The following figure shows a graphical representation of data marts.

Virtual Warehouse
A view over an operational data warehouse is known as a virtual warehouse. A virtual warehouse is
easy to build, but building one requires excess capacity on the operational database
servers.
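
A minimal sketch of the idea, using a database view in Python's built-in sqlite3 module. The orders table, the sales_by_branch view, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical operational table.
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, branch TEXT, amount REAL);
    INSERT INTO orders (branch, amount)
        VALUES ('North', 100.0), ('North', 50.0), ('South', 70.0);

    -- The "virtual warehouse": a view, so no summarized data is stored separately.
    CREATE VIEW sales_by_branch AS
        SELECT branch, SUM(amount) AS total FROM orders GROUP BY branch;
""")

before = conn.execute("SELECT * FROM sales_by_branch ORDER BY branch").fetchall()

# A new operational transaction is visible at once, because every query on the
# view is recomputed against the operational table.
conn.execute("INSERT INTO orders (branch, amount) VALUES ('South', 30.0)")
after = conn.execute("SELECT * FROM sales_by_branch ORDER BY branch").fetchall()
```

Because the view is recomputed on every query, the analytical workload runs on the operational server itself, which is why the spare capacity mentioned above is needed.
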


