Eval of Business Performance - Module 1
[Introduction to Data Warehousing]
Introduction
The term "Data Warehouse" was first coined by Bill Inmon in 1990.
According to Inmon, a data warehouse is a subject-oriented, integrated,
time-variant, and non-volatile collection of data. This data helps analysts
make informed decisions in an organization.
Course Module
[Evaluation of Business Performance]
A data warehouse helps executives to organize, understand, and use their data to make
strategic decisions. It also serves as part of a "closed-loop" feedback system for
enterprise management. Data warehouses are widely used in the following fields:
Financial services
Banking services
Consumer goods
Retail sectors
Controlled manufacturing
Information processing, analytical processing, and data mining are the three types of data
warehouse applications that are discussed below:
Information Processing − A data warehouse allows the data stored in it to be processed.
The data can be processed by means of querying, basic statistical analysis, and reporting
using crosstabs, tables, charts, or graphs.
Analytical Processing − A data warehouse supports analytical processing of the
information stored in it. The data can be analyzed by means of basic OLAP operations,
including slice-and-dice, drill down, drill up, and pivoting.
Data Mining − Data mining supports knowledge discovery by finding hidden patterns and
associations, constructing analytical models, and performing classification and prediction.
The mining results can be presented using visualization tools.
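The basic OLAP operations named above can be sketched in a few lines of Python. This is only a minimal illustration, not how an OLAP server is implemented: the sales rows, the field names, and the helper functions aggregate, dice, and pivot are all assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical fact rows: (quarter, month, item, city, units_sold)
SALES = [
    ("Q1", "Jan", "Mobile", "Delhi", 10),
    ("Q1", "Feb", "Mobile", "Delhi", 20),
    ("Q1", "Jan", "Modem", "Delhi", 5),
    ("Q1", "Jan", "Mobile", "Mumbai", 7),
    ("Q2", "Apr", "Modem", "Mumbai", 8),
]
FIELDS = ("quarter", "month", "item", "city")


def aggregate(rows, dims):
    """Sum units_sold grouped by the given dimension names."""
    idx = [FIELDS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


def dice(rows, **allowed):
    """Keep rows whose value on each named dimension is in the allowed set."""
    return [
        row for row in rows
        if all(row[FIELDS.index(d)] in vals for d, vals in allowed.items())
    ]


def pivot(rows, row_dim, col_dim):
    """Cross-tabulate units_sold: row_dim down the side, col_dim across."""
    table = defaultdict(dict)
    for (r, c), units in aggregate(rows, (row_dim, col_dim)).items():
        table[r][c] = units
    return dict(table)


# Drill down: view the data at a finer grain (quarter and month).
by_month = aggregate(SALES, ("quarter", "month"))
# Drill up (roll up): view the data at a coarser grain (quarter only).
by_quarter = aggregate(SALES, ("quarter",))

# Slice: fix one dimension to a single value.
delhi_slice = dice(SALES, city={"Delhi"})
# Dice: restrict two or more dimensions at once.
q1_mobile = dice(SALES, quarter={"Q1"}, item={"Mobile"})

# Pivot: rotate the view so items become rows and cities become columns.
item_by_city = pivot(SALES, "item", "city")
```

Here a slice is treated as a dice on a single dimension, which keeps the sketch short; real OLAP tools usually expose the two as separate operations.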
Sr.No. | Data Warehouse (OLAP) | Operational Database (OLTP)
-------|-----------------------|----------------------------
1 | It involves historical processing of information. | It involves day-to-day processing.
2 | OLAP systems are used by knowledge workers such as executives, managers, and analysts. | OLTP systems are used by clerks, DBAs, or database professionals.
3 | It is used to analyze the business. | It is used to run the business.
4 | It focuses on information out. | It focuses on data in.
5 | It is based on the Star Schema, Snowflake Schema, and Fact Constellation Schema. | It is based on the Entity Relationship Model.
6 | It is subject oriented. | It is application oriented.
7 | It contains historical data. | It contains current data.
8 | It provides summarized and consolidated data. | It provides primitive and highly detailed data.
9 | It provides a summarized and multidimensional view of data. | It provides a detailed and flat relational view of data.
10 | The number of users is in hundreds. | The number of users is in thousands.
11 | The number of records accessed is in millions. | The number of records accessed is in tens.
12 | The database size is from 100 GB to 100 TB. | The database size is from 100 MB to 100 GB.
13 | It is highly flexible. | It provides high performance.
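The difference between "data in" and "information out" can be illustrated with two kinds of queries. The sketch below uses Python's built-in sqlite3 module and a single in-memory table purely for convenience; the orders table, its columns, and its rows are hypothetical, and in practice the two workloads would normally run on separate systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
    "order_year INTEGER, item TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 2022, "Mobile", 100.0),
        (2, 2022, "Modem", 40.0),
        (3, 2023, "Mobile", 120.0),
        (4, 2023, "Mobile", 80.0),
    ],
)

# OLTP-style access: update and read one current record by its key.
conn.execute("UPDATE orders SET amount = 90.0 WHERE order_id = 4")
one_order = conn.execute(
    "SELECT item, amount FROM orders WHERE order_id = 4"
).fetchone()

# OLAP-style access: scan the history and return summarized data.
yearly = conn.execute(
    "SELECT order_year, item, SUM(amount) FROM orders "
    "GROUP BY order_year, item ORDER BY order_year, item"
).fetchall()
```

The first query touches a single row, while the second reads every row to produce a consolidated result, which mirrors rows 8 and 11 of the table.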
Terminologies
In this chapter, we will discuss some of the most commonly used terms in data
warehousing.
Metadata
Metadata is simply defined as data about data. Data that is used to represent other data is
known as metadata. For example, the index of a book serves as metadata for the contents of the
book. In other words, we can say that metadata is the summarized data that leads us to the
detailed data.
In terms of a data warehouse, we can define metadata as follows:
Metadata is a road map to the data warehouse.
Metadata in a data warehouse defines the warehouse objects.
Metadata acts as a directory. This directory helps the decision support system to locate the
contents of a data warehouse.
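As a rough illustration of metadata acting as a directory, the sketch below keeps a small catalog that describes warehouse objects without holding any of their rows. The catalog entries, the table and column names, and the locate function are all hypothetical.

```python
# Hypothetical catalog: each entry describes a warehouse object, not its rows.
CATALOG = {
    "sales_fact": {
        "kind": "fact table",
        "location": "warehouse.sales_fact",
        "columns": ["time_key", "item_key", "location_key", "units_sold"],
    },
    "item_dim": {
        "kind": "dimension table",
        "location": "warehouse.item_dim",
        "columns": ["item_key", "item_name", "item_type"],
    },
}


def locate(column):
    """Return the names of warehouse objects that contain the given column."""
    return sorted(
        name for name, entry in CATALOG.items() if column in entry["columns"]
    )


tables_with_item_key = locate("item_key")
```

A decision support system could consult such a catalog first and only then query the tables it points to.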
Metadata Repository
A metadata repository is an integral part of a data warehouse system. It contains the following
metadata:
Business metadata − It contains the data ownership information, business definitions, and changing
policies.
Operational metadata − It includes the currency of data and data lineage. Currency of data refers to
the data being active, archived, or purged. Lineage of data means the history of the data migrated and
the transformations applied to it.
Data for mapping from the operational environment to the data warehouse − This metadata includes
source databases and their contents, data extraction, data partitioning, cleaning and transformation
rules, and data refresh and purging rules.
The algorithms for summarization − These include dimension algorithms, data on granularity,
aggregation, summarizing, etc.
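One way to picture a repository entry is as a record that combines these kinds of metadata for a single warehouse table. The sketch below is an assumption-laden illustration: the TableMetadata class, its fields, and the sample values are hypothetical and do not correspond to any particular product.

```python
from dataclasses import dataclass, field
from enum import Enum


class Currency(Enum):
    """Currency of data: active, archived, or purged."""
    ACTIVE = "active"
    ARCHIVED = "archived"
    PURGED = "purged"


@dataclass
class TableMetadata:
    name: str
    owner: str        # business metadata: data ownership
    definition: str   # business metadata: business definition
    source: str       # mapping from the operational environment
    currency: Currency = Currency.ACTIVE          # operational metadata
    lineage: list = field(default_factory=list)   # operational metadata

    def record_step(self, step):
        """Append one migration or transformation step to the lineage."""
        self.lineage.append(step)


meta = TableMetadata(
    name="sales_fact",
    owner="Sales department",
    definition="One row per item sold per day per location",
    source="orders_db.order_lines",
)
meta.record_step("extracted from orders_db.order_lines")
meta.record_step("cleaned: dropped rows with missing item_key")
meta.currency = Currency.ARCHIVED
```

The lineage list records the history of the data in order, and the currency field records whether the data is still active.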
Data Cube
A data cube helps us represent data in multiple dimensions. It is defined by dimensions and
facts. The dimensions are the entities with respect to which an enterprise preserves the
records.
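A data cube can be sketched as a mapping from one value per dimension to a fact. In the minimal example below the dimensions are time, item, and location, and the fact is units sold; the records and their numbers are made up for illustration, and the view_2d helper is hypothetical.

```python
from collections import defaultdict

# Hypothetical sales records: (time, item, location, units_sold)
RECORDS = [
    ("Q1", "Mobile", "New Delhi", 605),
    ("Q1", "Modem", "New Delhi", 825),
    ("Q2", "Mobile", "New Delhi", 680),
    ("Q1", "Mobile", "Chennai", 340),
    ("Q1", "Mobile", "New Delhi", 5),  # second record for the same cell
]

# The cube maps each (time, item, location) cell to its fact, units sold.
cube = defaultdict(int)
for time, item, location, units in RECORDS:
    cube[(time, item, location)] += units


def view_2d(cube, location):
    """Fix the location dimension to get a 2-D (time, item) table."""
    return {
        (time, item): units
        for (time, item, loc), units in cube.items()
        if loc == location
    }


new_delhi = view_2d(cube, "New Delhi")
```

Fixing one dimension to a single value turns the 3-D cube into a 2-D table, which is the relationship between the 2-D and 3-D views of the sales data.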
In the 2-D table, we have records with respect to time and item only. The sales for New
Delhi are shown with respect to the time and item dimensions, according to the type of items sold.
If we want to view the sales data with one more dimension, say, the location dimension, then a 3-D
view would be useful. The 3-D view of the sales data with respect to time, item, and location is
shown in the table below:
The above 3-D table can be represented as a 3-D data cube, as shown in the following figure:
Data Mart
Data marts contain a subset of organization-wide data that is valuable to specific groups of
people in an organization. In other words, a data mart contains only the data that is specific
to a particular group. For example, the marketing data mart may contain only data related to
items, customers, and sales. Data marts are confined to subjects.
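The marketing example can be sketched as selecting a few subject areas out of a larger warehouse. Everything in the snippet below is hypothetical: the warehouse contents, the subject names other than items, customers, and sales, and the build_data_mart function.

```python
# Hypothetical warehouse: subject area -> rows held for the whole organization.
WAREHOUSE = {
    "items": [{"item_id": 1, "name": "Mobile"}],
    "customers": [{"customer_id": 7, "city": "New Delhi"}],
    "sales": [{"item_id": 1, "customer_id": 7, "units": 3}],
    "payroll": [{"employee_id": 42, "salary": 50000}],
    "suppliers": [{"supplier_id": 9, "name": "Acme"}],
}

# The subjects that the marketing group cares about.
MARKETING_SUBJECTS = ("items", "customers", "sales")


def build_data_mart(warehouse, subjects):
    """Copy only the named subject areas out of the warehouse."""
    return {name: list(warehouse[name]) for name in subjects}


marketing_mart = build_data_mart(WAREHOUSE, MARKETING_SUBJECTS)
```

The resulting mart is smaller than the warehouse and contains nothing about payroll or suppliers, which is what being confined to subjects means here.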
Points to Remember About Data Marts
Windows-based or Unix/Linux-based servers are used to implement data marts. They
are implemented on low-cost servers.
The implementation cycle of a data mart is measured in short periods of time, i.e., in
weeks rather than months or years.
The life cycle of data marts may be complex in the long run, if their planning and design
are not organization-wide.
Data marts are small in size.
Data marts are customized by department.
The source of a data mart is a departmentally structured data warehouse.
Data marts are flexible.
The following figure shows a graphical representation of data marts.
Virtual Warehouse
A view over an operational data warehouse is known as a virtual warehouse. A virtual warehouse is
easy to build, but building one requires excess capacity on the operational database servers.
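A virtual warehouse can be approximated with an ordinary database view. The sketch below uses Python's built-in sqlite3 module and assumes the view is defined directly over an operational table; the orders table, the sales_summary view, and the summary function are hypothetical names chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, item TEXT, units INTEGER)"
)
conn.executemany(
    "INSERT INTO orders (item, units) VALUES (?, ?)",
    [("Mobile", 3), ("Modem", 2), ("Mobile", 4)],
)

# The "virtual warehouse": a view, so no summarized copy of the data is stored.
conn.execute(
    "CREATE VIEW sales_summary AS "
    "SELECT item, SUM(units) AS total_units FROM orders GROUP BY item"
)


def summary():
    """Query the view; the aggregation runs against the operational table."""
    return conn.execute(
        "SELECT item, total_units FROM sales_summary ORDER BY item"
    ).fetchall()


before = summary()
conn.execute("INSERT INTO orders (item, units) VALUES ('Modem', 5)")
after = summary()  # reflects the new order without any reload step
```

Because every query on the view is computed from the operational table at read time, the analytical load falls on the operational server, which is why excess capacity is required.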