
Unit 1 Data Warehouse Fundamentals: Structure

This document provides an introduction and overview of data warehousing and data mining. It describes the key differences between online transaction processing (OLTP) systems and data warehouse systems. OLTP systems are designed for transactional operations like purchases, whereas data warehouses are designed to support analysis and decision making through consolidated data from various sources. The document outlines the objectives and characteristics of data warehouses, including being subject-oriented, integrated, non-volatile, and time-variant. Common functions that data warehouses support for analysis are also described such as roll-ups, drill-downs, pivots, slices and dices.

Uploaded by

Amit Parab
Data Warehousing and Data Mining

Unit 1

Data Warehouse Fundamentals

Structure:
1.1 Introduction
Objectives
1.2 OLTP Systems
1.3 Characteristics & Functions of a Data Warehouse
1.4 Advantages and Applications of Data Warehouse
1.5 Top-Down and Bottom-Up Development Methodology
1.6 Tools for Data Warehouse Development
1.7 Data Warehouse Types
1.8 Summary
1.9 Terminal Questions
1.10 Answers

1.1 Introduction
Data Warehouses and Data Warehouse applications are designed primarily
to support executives, senior managers, and business analysts in making
complex business decisions. Data Warehouse applications provide the
business community with access to accurate consolidated information from
various internal and external sources. The goal of using a Data Warehouse
is to have an efficient way of managing information and analyzing data.
Nowadays, corporate organizations generate gigabytes of data daily and
store this data in various database systems. But the question is, how
efficiently do people use such a huge amount of data to control and monitor
their business performance? Are they able to get timely information without
errors? Are they able to get useful data for analysis? The answer to these
questions is the Data Warehouse. So what is a Data Warehouse, and how is it
used? You will get the answers to these questions in the subsequent
paragraphs.
Objectives:
After studying this unit, you will be able to:
describe the differences between Online Transaction Processing (OLTP)
Systems and Data Warehouse systems
define the characteristics of a Data Warehouse
describe the functionality of Data Warehouse
Sikkim Manipal University

B1633

Page No.: 1

explain various methodologies for Data Warehouse Development
(Top-Down and Bottom-Up)
describe the tools available for Data Warehouse development

1.2 OLTP Systems


Transaction processing is a type of computer processing that takes place in
the presence of a computer user. It provides for an immediate response to a
user request (or transaction). When a large number of transactions are
taken and stored to be dealt with at a later time (without the presence of a
user), the process is known as batch processing. Examples of
transaction processing include automated teller machines, credit card
authorizations, online bill payments, self-checkout stations at grocery stores,
the trading of stocks over the Internet, and various other forms of electronic
commerce.
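The contrast between transaction processing and batch processing described above can be illustrated with a minimal sketch. The account name, the amounts, and the helper names `process` and `run_batch` are hypothetical and chosen only for illustration; a real system would also persist every change durably.

```python
from collections import deque

# Hypothetical account store used only for this illustration.
balances = {"ACC-1": 500.0}

def process(txn):
    """Apply one withdrawal and return the new balance."""
    account, amount = txn
    if balances[account] < amount:
        raise ValueError("insufficient funds")
    balances[account] -= amount
    return balances[account]

# Transaction processing: the user is present and gets an immediate response.
immediate = process(("ACC-1", 100.0))

# Batch processing: requests are only queued now and are applied later,
# without a user waiting for the result.
pending = deque([("ACC-1", 50.0), ("ACC-1", 25.0)])

def run_batch(queue):
    """Drain the queue and return the balance after each queued request."""
    results = []
    while queue:
        results.append(process(queue.popleft()))
    return results

batch_results = run_batch(pending)
```

In the online case the caller receives the new balance as part of the same interaction, while in the batch case the work is deferred until `run_batch` is called.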
Transaction processing systems are the backbone of an organization
because they update the database constantly. At any given moment, if
someone needs an inventory balance, an account balance or the total
current value of a financial portfolio, the OLTP provides it. The OLTP market
is a demanding one, often requiring 24x7 operations.
Every business has to deal with some form of transactions. How a company
decides to manage these transactions can be an important factor in its
success. As a business grows, its number of transactions usually grows as
well. Careful planning must be done in order to ensure that transaction
management does not become too complex. Transaction processing is a
tool that can help growing businesses deal with their increasing number of
transactions.
Definition:
OLTP is an Online Transaction Processing System that handles day-to-day
business transactions. Examples are Railway Reservation Systems, Online
Store Purchases, etc. These systems handle a tremendous amount of data
daily. But the common questions that business personnel ask are as
follows:
Are these systems good enough for analyzing your business?
Can we predict our business over a long period of time?
Can we forecast the business with this data?
OLTP systems alone cannot answer all these questions. Once again, the
answer is a Data Warehouse. So it is time to know the differences between
OLTP and Data Warehouse systems.
Differences between OLTP and Data Warehouse
Application databases are OLTP (On-Line Transaction Processing)
systems where every transaction has to be recorded as and when it
occurs. Consider the scenario where a bank ATM has disbursed cash to
a customer but was unable to record this event in the bank records. If
this happens frequently, the bank wouldn't stay in business for too long.
So the banking system is designed to make sure that every transaction
gets recorded within the time you stand before the ATM machine.
A Data Warehouse (DW), on the other hand, is a database (yes, you are
right, it's a database) that is designed for facilitating querying and
analysis. Often designed as OLAP (On-Line Analytical Processing)
systems, these databases contain read-only data that can be queried
and analyzed far more efficiently as compared to your regular OLTP
application databases. In this sense an OLAP system is designed to be
read-optimized.
Separation from your application database also ensures that your
business intelligence solution is scalable (your bank and ATMs don't go
down just because the CFO asked for a report), better documented and
managed.
Creation of a DW leads to a direct increase in quality of analysis as the
table structures are simpler (you keep only the needed information in
simpler tables), standardized (well-documented table structures), and
often de-normalized (to reduce the linkages between tables and the
corresponding complexity of queries). Having a well-designed DW is the
foundation on which successful BI (Business Intelligence)/Analytics
initiatives are built.
Data Warehouses usually store many months or years of data. This is to
support historical analysis. OLTP systems usually store data from only a
few weeks or months. The OLTP system stores historical data only as
needed to successfully meet the requirements of the current transaction.
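The effect of de-normalization on query complexity can be sketched with Python's built-in `sqlite3` module. The schema below (tables `customer`, `account`, `withdrawal` and the warehouse table `dw_withdrawal`) is a hypothetical example invented for this illustration, not a schema taken from any real banking system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# OLTP side: normalized tables, one row per entity (hypothetical schema).
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE account (account_id INTEGER PRIMARY KEY, customer_id INTEGER);
CREATE TABLE withdrawal (
    txn_id INTEGER PRIMARY KEY,
    account_id INTEGER,
    txn_date TEXT,
    amount REAL
);
INSERT INTO customer VALUES (1, 'Asha', 'Pune'), (2, 'Ravi', 'Mumbai');
INSERT INTO account VALUES (10, 1), (20, 2);
INSERT INTO withdrawal VALUES
    (100, 10, '2023-01-05', 200.0),
    (101, 10, '2023-02-11', 150.0),
    (102, 20, '2023-02-12', 300.0);
""")

# Warehouse side: one de-normalized table derived from the OLTP data.
conn.execute("""
CREATE TABLE dw_withdrawal AS
SELECT w.txn_date, c.name AS customer, c.city, w.amount
FROM withdrawal w
JOIN account a ON a.account_id = w.account_id
JOIN customer c ON c.customer_id = a.customer_id
""")

def total_by_city_oltp():
    """Analysis against the normalized tables needs two joins."""
    return conn.execute("""
        SELECT c.city, SUM(w.amount)
        FROM withdrawal w
        JOIN account a ON a.account_id = w.account_id
        JOIN customer c ON c.customer_id = a.customer_id
        GROUP BY c.city ORDER BY c.city
    """).fetchall()

def total_by_city_dw():
    """The same question against the warehouse table needs no joins."""
    return conn.execute(
        "SELECT city, SUM(amount) FROM dw_withdrawal GROUP BY city ORDER BY city"
    ).fetchall()
```

Both functions return the same totals; the difference is that the warehouse query reads a single table, which is the reduction in linkages the text refers to.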

Table 1.1: OLTP vs. Data Warehouses

Property            OLTP                  Data Warehouse
Data model          3NF                   Multidimensional
Indexes             Few                   Many
Joins               Many                  Some
Duplicate data      Normalized            Denormalized
Aggregate data      Rare                  Common
Queries             Mostly predefined     Mostly ad hoc
Nature of queries   Mostly simple         Mostly complex
Updates             All the time          Not allowed, only refreshed
Historical data     Often not available   Essential

Self Assessment Questions
1. OLTP stands for ________________________
2. OLTP handles day-to-day business transactions (true/false)
3. Updates on the Data Warehouse are allowed (true/false)
4. Data Warehouse is a database that is designed for facilitating
_________ and __________.

1.3 Characteristics & Functions of a Data Warehouse


What is a Data Warehouse?
A Data Warehouse is a relational database that is designed for query
and analysis rather than for transaction processing.
It usually contains historical data derived from transaction data, but it can
include data from other sources. It separates analysis workload from
transaction workload and enables an organization to consolidate data from
several sources.
Bill Inmon (known as the Father of Data Warehouse) describes Data
Warehouse characteristics, which are as follows:
Subject-Oriented
Data Warehouses are designed to help you analyze data. For example, to
learn more about your company's sales data, you can build a Warehouse
that concentrates on sales. Using this warehouse, you can answer
questions like, "Who was our best customer for this item last year?" This
ability to define a Data Warehouse by subject matter, sales in this case,
makes the Data Warehouse Subject Oriented.
Integrated
Integration is closely related to Subject Orientation. Data Warehouses must
put data from disparate sources into a consistent format. They must resolve
such problems as naming conflicts and inconsistencies among units of
measures. When they achieve this, they are said to be integrated.
Note:
During Integration time, several business rules and constraints have to
be incorporated.
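As an illustration of integration, the following minimal sketch brings rows from two source systems into one consistent format. The source layouts, the field names, and the helper functions are all hypothetical; the only external fact used is the definition of the pound as 0.45359237 kg.

```python
# Two hypothetical source systems that describe the same kind of sale
# with different field names and different units of weight.
store_a = [{"cust_name": "Asha", "weight_kg": 2.0, "amount_inr": 500.0}]
store_b = [{"customer": "Ravi", "weight_lb": 4.0, "amount_inr": 300.0}]

LB_TO_KG = 0.45359237  # one international pound in kilograms

def from_store_a(row):
    """Resolve the naming conflict: cust_name becomes customer."""
    return {
        "customer": row["cust_name"],
        "weight_kg": row["weight_kg"],
        "amount_inr": row["amount_inr"],
    }

def from_store_b(row):
    """Resolve the unit inconsistency: pounds become kilograms."""
    return {
        "customer": row["customer"],
        "weight_kg": round(row["weight_lb"] * LB_TO_KG, 3),
        "amount_inr": row["amount_inr"],
    }

def integrate(a_rows, b_rows):
    """Return all rows in the single warehouse format."""
    return [from_store_a(r) for r in a_rows] + [from_store_b(r) for r in b_rows]

warehouse_rows = integrate(store_a, store_b)
```

After integration every row has the same field names and the same unit, so the rows can be analyzed together regardless of which source they came from.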
Non-Volatile
Non-volatile means that, once entered into the warehouse, data should not
change. This is logical because the purpose of a Warehouse is to enable
you to analyze what has occurred. This means that the data cannot be deleted
and must be held so that it can be analyzed in the future.
Time-Variant
In order to discover trends in business, analysts need large amounts of
data. This is very much in contrast to online transaction processing (OLTP)
systems, where performance requirements demand that historical data be
moved to an archive. A Data Warehouse's focus on change over time is
what is meant by the term, time variant.
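The non-volatile and time-variant characteristics can be contrasted with an operational store in a short sketch. The class names `OltpBalance` and `WarehouseBalanceHistory` and the sample balances are hypothetical and serve only to illustrate the idea.

```python
from datetime import date

class OltpBalance:
    """Operational store: keeps only the current value, overwriting in place."""

    def __init__(self):
        self.balances = {}

    def record(self, account, balance, as_of):
        self.balances[account] = balance  # the previous value is lost

class WarehouseBalanceHistory:
    """Warehouse store: append-only rows, each carrying the date it refers to."""

    def __init__(self):
        self._rows = []

    def load(self, account, balance, as_of):
        self._rows.append((account, as_of, balance))  # never updated or deleted

    def balance_on(self, account, day):
        """Return the latest balance known on or before the given day."""
        known = [(d, b) for (a, d, b) in self._rows if a == account and d <= day]
        return max(known, key=lambda item: item[0])[1] if known else None

    def trend(self, account):
        """Return the full history for an account, oldest first."""
        return sorted((d, b) for (a, d, b) in self._rows if a == account)

oltp = OltpBalance()
dw = WarehouseBalanceHistory()
month_ends = [
    (date(2023, 1, 31), 1000.0),
    (date(2023, 2, 28), 750.0),
    (date(2023, 3, 31), 900.0),
]
for day, balance in month_ends:
    oltp.record("ACC-1", balance, day)
    dw.load("ACC-1", balance, day)
```

The operational store can answer only "what is the balance now?", whereas the warehouse history can also answer "what was the balance in mid-February?" and show the change over time.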
Functionality of Data Warehouses
Data Warehouses exist to facilitate complex, data-intensive and frequent
ad hoc queries. Data Warehouses must provide far greater and more
efficient query support than is demanded of transactional databases. Data
Warehouses provide the following functionality:
Roll-up: Data is summarized with increased generalization.
Drill-down: Increasing levels of detail are revealed.
Pivot: Cross-tabulation (that is, rotation) is performed.
Slice and Dice: Performing projection operations on the dimensions.
Sorting: Data is sorted by ordinal value.
Selection: Data is available by value or range.
Derived or Computed Attributes: Attributes are computed by
operations on stored data and values are derived.
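The operations listed above can be tried out on a handful of rows in plain Python. The sales facts and the helper names `aggregate`, `slice_` and `pivot` are hypothetical. Slice and dice are modelled here as fixing the value of one or of several dimensions, which is one common reading of the terms.

```python
from collections import defaultdict

# Hypothetical sales facts: (year, quarter, region, product, amount)
FACTS = [
    (2023, "Q1", "East", "Pen", 100),
    (2023, "Q1", "West", "Pen", 80),
    (2023, "Q2", "East", "Pen", 120),
    (2023, "Q2", "East", "Ink", 50),
    (2023, "Q2", "West", "Ink", 70),
]
DIMS = {"year": 0, "quarter": 1, "region": 2, "product": 3}

def aggregate(facts, by):
    """Sum the amount grouped by the named dimensions."""
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[DIMS[d]] for d in by)
        totals[key] += row[4]
    return dict(totals)

def slice_(facts, **fixed):
    """Keep only the facts whose dimensions match the given values."""
    return [r for r in facts if all(r[DIMS[d]] == v for d, v in fixed.items())]

def pivot(facts, rows, cols):
    """Cross-tabulate the totals: one dict per row value, keyed by column value."""
    table = defaultdict(dict)
    for (r, c), total in aggregate(facts, [rows, cols]).items():
        table[r][c] = total
    return dict(table)

roll_up = aggregate(FACTS, ["year"])                  # more general: per year
drill_down = aggregate(FACTS, ["year", "quarter"])    # more detail: per quarter
east_only = slice_(FACTS, region="East")              # slice: one dimension fixed
diced = slice_(FACTS, region="East", quarter="Q2")    # dice: two dimensions fixed
by_region_product = pivot(FACTS, "region", "product")
# Sorting by a derived attribute: total sales per product, largest first.
top_products = sorted(
    aggregate(FACTS, ["product"]).items(), key=lambda kv: kv[1], reverse=True
)
```

Roll-up and drill-down are the same aggregation run at different levels of detail, which is why a single `aggregate` helper covers both.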
Self Assessment Questions
5. Data Warehouse is defined as subject oriented, integrated, time variant
and ___________ .
6. Data Warehouse contains only aggregated data and individual
transactions (true/false)

1.4 Advantages and Applications of Data Warehouse


A Data Warehouse provides a common data model for data, regardless of
the data source. This makes it easier to report and analyze information than
it would be if multiple data models from disparate sources were used to
retrieve information such as sales invoices, order receipts, general ledger
charges, etc.
Prior to loading data into the Data Warehouse, inconsistencies are
identified and resolved. This greatly simplifies reporting and analysis.
Information in the Data Warehouse is under the control of Data
Warehouse users so that, even if the source system data is purged over
time, the information in the warehouse can be stored safely for extended
periods of time.
Because they are separate from operational systems, Data Warehouses
provide fast retrieval of data without slowing down operational systems.
Data Warehouses facilitate Decision Support System applications such
as trend reports (e.g., the items with the most sales in a particular area
within the last two years), exception reports, and reports that show
actual performance versus goals.
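A report of actual performance versus goals, together with a simple exception report, might look like the following minimal sketch. The monthly figures, the 10% tolerance, and the function names are assumptions made purely for illustration.

```python
# Hypothetical monthly sales goals and actual figures for one region.
GOALS = {"2023-01": 1000, "2023-02": 1000, "2023-03": 1200}
ACTUALS = {"2023-01": 1100, "2023-02": 700, "2023-03": 1150}

def actual_vs_goal(actuals, goals):
    """Return (month, actual, goal, variance) for every month with a goal."""
    report = []
    for month in sorted(goals):
        actual = actuals.get(month, 0)
        report.append((month, actual, goals[month], actual - goals[month]))
    return report

def exception_report(actuals, goals, tolerance=0.10):
    """List the months in which the goal was missed by more than the tolerance."""
    return [
        month
        for month, actual, goal, variance in actual_vs_goal(actuals, goals)
        if variance < -tolerance * goal
    ]

performance = actual_vs_goal(ACTUALS, GOALS)
exceptions = exception_report(ACTUALS, GOALS)
```

Only months that fall outside the tolerance appear in the exception report, so a manager sees the cases that need attention rather than every row.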
Applications of Data Warehouse
The Data Warehouse is primarily used for:
Revenue Management
Customer-Relationship Management
Fraud Detection
Crew Payroll-Management Applications
Sales Analysis for Business Organization
Note: The primary purpose of a Data Warehouse is to analyze the Business
to meet future goals.
1.5 Top-Down and Bottom-Up Development Methodology


Despite the fact that Data Warehouses can be designed in a number of
different ways, they all share a number of important characteristics. Most
Data Warehouses are Subject Oriented. This means that the information
that is in the Data Warehouse is stored in a way that allows it to be
connected to objects or events that occur in reality.
Another characteristic that is frequently seen in Data Warehouses is called
Time Variant. A time variant Data Warehouse will allow changes in the
information to be monitored and recorded over time. Data from all the
programs that are used by a particular institution will be stored in the Data
Warehouse and integrated together. The first Data Warehouses were developed in
the 1980s. As societies entered the information age, there was a large
demand for efficient methods of storing information.
Many of the systems that existed in the 1980s were not powerful enough to
store and manage large amounts of data. There were a number of reasons
for this. The systems that existed at the time took too long to report and
process information. Many of these systems were not designed to analyze
or report information. In addition to this, the computer programs that were
necessary for reporting information were both costly and slow. To solve
these problems, companies began designing computer databases that
placed an emphasis on managing and analyzing information. These were
the first Data Warehouses, and they could obtain data from a variety of
different sources, and some of these include PCs and mainframes.
Spreadsheet programs have also played an important role in the
development of Data Warehouses. By the end of the 1990s, the technology
had greatly advanced, and was much lower in cost. The technology has
continued to evolve to meet the demands of those who are looking for more
functions and speed. There are four advances in Data Warehouse
technology that have allowed it to evolve. These advances are offline
operational databases, real time Data Warehouses, offline Data
Warehouses, and integrated Data Warehouses.
The offline operational database is a system in which the information within
the database of an operational system is copied to a server that is offline.
When this is done, the operational system will perform at a much higher
level. As the name implies, a real time Data Warehouse system will be
updated every time an event occurs. For example, if a customer orders a
product, a real time Data Warehouse will automatically update the
information in real time.
With the integrated Data Warehouse, transactions will be transferred back to
the operational systems each day, and this will allow the data to easily be
analyzed by companies and organizations. There are a number of layers
that will be present in the typical Data Warehouse. Some of these layers
are the source data layer, reporting layer, Data Warehouse layer, and
transformation layer. There are a number of different data sources for Data
Warehouses. Some popular forms of data sources are Teradata, Oracle
database, and Microsoft SQL Server.
Another important concept that is related to Data Warehouses is called data
transformation. As the name suggests, data transformation is a process in
which information transferred from specific sources is cleaned and loaded
into a repository.
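The data transformation process just described, in which source data is cleaned and then loaded into a repository, can be sketched as three small steps. The raw order rows, the table name `fact_order`, and the cleaning rules are hypothetical; an in-memory SQLite database from Python's standard library stands in for the repository.

```python
import sqlite3

# Hypothetical raw rows as they might arrive from a source system.
RAW_ORDERS = [
    {"order_id": "1", "customer": "  asha ", "amount": "500.00"},
    {"order_id": "2", "customer": "RAVI", "amount": "not available"},
    {"order_id": "3", "customer": "Meera", "amount": "120.50"},
]

def extract():
    """Read the rows from the source (here, simply a list in memory)."""
    return list(RAW_ORDERS)

def transform(rows):
    """Clean the rows: fix types, tidy names, reject rows with a bad amount."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # rejected: the amount is not a number
        clean.append((int(row["order_id"]), row["customer"].strip().title(), amount))
    return clean

def load(conn, rows):
    """Write the cleaned rows into the repository table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_order "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO fact_order VALUES (?, ?, ?)", rows)
    conn.commit()

repository = sqlite3.connect(":memory:")
load(repository, transform(extract()))
loaded = repository.execute(
    "SELECT order_id, customer, amount FROM fact_order ORDER BY order_id"
).fetchall()
```

Keeping the three steps as separate functions makes it easy to change the cleaning rules without touching how data is read or stored.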

1.6 Tools for Data Warehouse Development


To maintain a leading edge in this highly competitive business world, the
managers of every organization, big or small, strive to get the right
information at the right time. As a result, every organization is now
developing and deploying Data Marts (a Data Mart is a small Data Warehouse
with a limited scope, usually holding departmental-level data) and Data
Warehouses to obtain the Business Intelligence needed to take strategic decisions.
The demand for software engineers with exposure to Data Warehouse
development tools has grown exponentially in the last few years. The
following are some of the popular tools for Data Warehouse development.
Business Objects
COGNOS
SAS/ Warehouse Administrator
SAS/Enterprise Studio
Informatica
Oracle Warehouse Builder etc

1.7 Data Warehouse Types


Real Time Data Warehouse
Data warehouses at this stage are updated every time an operational
system performs a transaction (e.g., an order, a delivery, or a
booking).
Federated Data Warehouse
A Federated Data Warehouse is the integration of heterogeneous
business intelligence systems set to provide analytical capabilities
across different functions of an organization. It is a realistic method to
achieve a single version of the truth across the organization.
Distributed Data Warehouse
Distributed Data Warehouses are those in which components are
distributed across a number of physical databases. These Data
Warehouses usually involve the most redundant data and, as a
consequence, the most complex loading and updating processes.
Self Assessment Questions
7. List the types of data warehouse.
8. _______________________ data Warehouse will allow changes in the
information to be monitored and recorded over time.

1.8 Summary

A Data Warehouse is a type of computer database that is responsible for
collecting and storing the information of a particular organization. The
goal of using a Data Warehouse is to have an efficient way of managing
information and analyzing data.
Transaction processing is a type of computer processing that takes
place in the presence of a computer user.
The main purpose of a Data Warehouse is to manage and analyze
data dynamically, whereas OLTP systems are mainly used for transaction
processing/capturing. OLTP systems alone are not suited to data analysis.
Bill Inmon (called the father of Data Warehouse) has described a Data
Warehouse as a subject-oriented, integrated, non-volatile, time-variant
database for analytical purposes.
Data Warehouses can be developed using top-down or bottom-up
methodologies.

Popular Data Warehouse tools include Cognos, Informatica, and SAS.

1.9 Terminal Questions


1. What is a Data Warehouse? Mention its advantages.
2. Explain the functionality of a Data Warehouse.
3. Explain the Top-Down and Bottom-up Data Warehouse development
Methodologies.
4. What is a Data Mart?
5. Differentiate between OLTP and Data Warehouses.
6. Differentiate between Data marts and Data Warehouses.

1.10 Answers
Self Assessment Questions
1. On-Line Transaction Processing
2. True
3. False
4. Query and Analysis
5. Non-Volatile
6. True
7. Real time, federated and distributed
8. time variant
Terminal Questions
1. A Data Warehouse is a relational database that is designed for query
and analysis rather than for transaction processing. Refer section 1.3 &
1.4
2. Data Warehouses are designed to help you analyze data. Refer
section 1.3
3. Data Warehouses can be developed using top-down or bottom-up
methodologies. Refer section 1.5
4. A Data Mart is a small Data Warehouse with a limited scope,
usually holding departmental-level data. Refer section 1.6
5. OLTP systems record every transaction as and when it occurs, whereas
Data Warehouses, often designed as OLAP (On-Line Analytical Processing)
systems, contain read-only data that can be queried and analyzed
efficiently. Refer section 1.2
6. Refer section 1.6