
Various Applications of Data Warehouse

The document discusses various applications of data warehousing. It begins by describing the problems with only having transactional systems and the need for a centralized data warehouse. It then defines a data warehouse as a managed database that is subject-oriented, integrated, time-variant, and non-volatile. The document outlines the key components of a data warehouse including hardware, database management system, front-end access tools, and other tools. It also describes the process of getting data into the warehouse through extraction, transformation, and loading steps. Finally, it discusses the benefits and costs of implementing a data warehouse.

Uploaded by

Rafiul Hasan

Various Applications of Data Warehouse

Rafiul Hasan Khan


ID: 1220617
420-DF1-GC-Datawarehouse
Introduction

• The Past and The Problem
• What is a Data Warehouse?
• Components of a Data Warehouse
• OLAP, Metadata, Data Mining
• Getting the Data in
• Benefits vs. Costs
• Conclusion
The Past and The Problem

• Organizations had only scattered transactional systems, with data spread among different systems
• Transactional systems were not designed for decision support analysis
• Data constantly changes on transactional systems
• Lack of historical data
• Resources were often taxed by serving both needs on the same systems
The Past and The Problem

• Operational databases are designed to record transactions from daily operations. They are optimized to efficiently update or create individual records
• A database for analysis, on the other hand, needs to be geared toward flexible requests or queries (ad hoc queries, statistical analysis)
What is a Data Warehouse?

Data warehousing is an architectural model designed to gather data from various sources into a single unified data model for analysis purposes.
What Is a Data Warehouse?

The term was introduced in 1990 by William Inmon

A managed database in which the data is:
• Subject Oriented

• Integrated

• Time Variant

• Non Volatile
Subject Oriented

• Organized around major subject areas in the enterprise (Sales, Inventory, Financial, etc.)
• Only includes data which is used in the decision making processes
• Elements used for transactional processing are removed
Integrated

• Data from different sources are brought together and consolidated
• The data is cleaned and made consistent

Example – Bank Systems using Different Codes
Loan Department – COMM
Transactional System – C
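
The bank example above can be sketched as a small lookup that maps each source system's local code to one warehouse-wide value. This is a minimal sketch, not the author's method: only the codes COMM and C come from the slide, while the system names, the unified value "COMMERCIAL", and the function name are assumptions made for illustration.

```python
# Hypothetical mapping from (source system, local code) to one warehouse code.
# Only "COMM" and "C" come from the slide; everything else is assumed.
CODE_MAP = {
    ("loan_department", "COMM"): "COMMERCIAL",
    ("transactional_system", "C"): "COMMERCIAL",
}


def unify_customer_type(source_system: str, local_code: str) -> str:
    """Return the warehouse-wide code, or raise if the pair is unmapped."""
    key = (source_system, local_code.strip().upper())
    if key not in CODE_MAP:
        raise ValueError(f"No mapping for {local_code!r} from {source_system!r}")
    return CODE_MAP[key]


# Both source systems end up with the same integrated value.
loan_code = unify_customer_type("loan_department", "COMM")
txn_code = unify_customer_type("transactional_system", "c")
```

Raising on an unmapped code, rather than passing it through, keeps inconsistent values from silently entering the warehouse.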
Time Variant

• Data in a Data Warehouse contains both current and historical information
• Operational Systems contain only current data

Systems typically retain data:
Operational Systems – 60 to 90 Days
Data Warehouse – 5 to 10 Years
Non Volatile

• Operational systems have continually changing data
• Data Warehouses continually absorb current data and integrate it with their existing data (Aggregate or Summary tables)

An example of volatile data would be an account balance at a bank
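
The contrast between a volatile account balance and non-volatile warehouse data can be sketched as follows. This is an illustration under assumptions: the account ID, amounts, dates, and function names are invented, and a real warehouse would load snapshots in batches from a database rather than from an in-memory dictionary.

```python
from datetime import date

# Operational system: one mutable balance per account; updates overwrite it.
operational_balance = {"ACC-1": 100.0}

# Warehouse: append-only list of dated snapshots; existing rows never change.
warehouse_snapshots = []


def apply_transaction(account, amount):
    """Operational update: the previous balance is lost."""
    operational_balance[account] += amount


def load_snapshot(as_of):
    """Warehouse load: absorb the current balances as new dated rows."""
    for account, balance in operational_balance.items():
        warehouse_snapshots.append((account, as_of, balance))


apply_transaction("ACC-1", -30.0)
load_snapshot(date(2024, 1, 5))
apply_transaction("ACC-1", 50.0)
load_snapshot(date(2024, 1, 9))
```

After both loads the operational system knows only the latest balance, while the warehouse holds one row per load date, which is also what makes it time variant.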
What Is a Data Warehouse?

• Not a product, it is a process
• Combination of hardware and software
• The concept of a Data Warehouse is not new, but the technology that enables it is
What Is a Data Warehouse?

Can often be set up as one VLDB (Very Large Database) or a collection of subject areas called Data Marts.

There are now tools which “unify” these Data Marts and make them appear as a single database.
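
One simple way to picture "unifying" Data Marts is a database view that joins them, so users query one object. The sketch below uses Python's built-in sqlite3 module; the table names, columns, and sample rows are assumptions, and commercial unification tools work across separate databases rather than inside one in-memory file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Two hypothetical subject-area marts.
    CREATE TABLE sales_mart (product TEXT, units_sold INTEGER);
    CREATE TABLE inventory_mart (product TEXT, units_on_hand INTEGER);
    INSERT INTO sales_mart VALUES ('widget', 40), ('gadget', 15);
    INSERT INTO inventory_mart VALUES ('widget', 10), ('gadget', 60);

    -- The view makes both marts appear as a single table.
    CREATE VIEW unified_product AS
        SELECT s.product, s.units_sold, i.units_on_hand
        FROM sales_mart AS s
        JOIN inventory_mart AS i ON i.product = s.product;
    """
)

unified_rows = conn.execute(
    "SELECT product, units_sold, units_on_hand "
    "FROM unified_product ORDER BY product"
).fetchall()
```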
What Is a Data Warehouse?
[Figure: Transformation of Data to Information. Layers, listed from the Information end to the Data end: Exploration / Analysis; SQL Reporting; Relational Warehouse; Cleansing & Normalization; Transaction Processing]


Components of a Data Warehouse
Four General Components:
• Hardware
• DBMS - Database Management System
• Front End Access Tools
• Other Tools

In all components scalability is vital

Scalability is the ability to grow as your data and processing needs increase
Components of a Data Warehouse - Hardware
• Power - # of Processors, Memory, I/O Bandwidth, and Speed of the Bus
• Availability – Redundant equipment
• Disk Storage - Speed and enough storage for the loaded data set
• Backup Solution - Automated, allowing for incremental backups and archiving of older data
Components of a Data Warehouse - DBMS
• Physical storage capacity of the DBMS
• Loading, indexing, and processing speed
• Availability
• Ability to handle your data needs
• Operational integrity, reliability, and manageability
Components of a Data Warehouse - Front End & Other Tools

• Query Tools (SQL & GUI based)
• Report Writers
• Metadata Repositories
• OLAP (Online Analytical Processing)
• Data Mining Products
Components of a Data Warehouse – Metadata Repositories

Metadata is Data about Data. Users and Developers often need a way to find information on the data they use.
Information can include:
• Source System(s) of the Data, contact information
• Related tables or subject areas
• Programs or Processes which use the data
• Population rules (Update or Insert and how often)
• Status of the Data Warehouse’s processing and condition
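
The kinds of information listed above can be sketched as one record per warehouse table. This is a hypothetical structure, not a standard: the class, field names, and sample values are all assumptions chosen to mirror the bullets.

```python
from dataclasses import dataclass, field


@dataclass
class TableMetadata:
    """One hypothetical metadata repository entry for a warehouse table."""
    table_name: str
    source_systems: list            # source system(s) of the data
    source_contact: str             # contact information
    related_tables: list = field(default_factory=list)
    used_by: list = field(default_factory=list)   # programs or processes
    population_rule: str = "insert"                # "insert" or "update"
    refresh_frequency: str = "daily"               # how often it is populated
    last_load_status: str = "unknown"              # processing status


repository = {}


def register(meta):
    """Add or replace the entry for a table."""
    repository[meta.table_name] = meta


def tables_fed_by(source_system):
    """Answer a typical user question: which tables come from this system?"""
    return sorted(
        name for name, meta in repository.items()
        if source_system in meta.source_systems
    )


register(TableMetadata("customer_dim", ["loans", "billing"], "dw-team@example.com"))
register(TableMetadata("sales_fact", ["billing"], "dw-team@example.com",
                       related_tables=["customer_dim"], population_rule="insert"))
```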
Components of a Data Warehouse – OLAP Tools
OLAP - Online Analytical Processing. It works by aggregating detail data and viewing it by dimensions
• Gives the ability to “Drill Down” into the detail data
• Decision Support Analysis Tool
• Multidimensional DB focusing on retrieval of precalculated data
• Ends the “big reports” with large amounts of detailed data
• These tools are often graphical and can run on a “thin client” such as a web browser
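
Aggregating detail data by dimensions and then drilling down can be sketched in a few lines. The sample rows, dimension names, and function names are assumptions; note that this sketch computes totals on demand, whereas the OLAP tools described above retrieve precalculated aggregates.

```python
from collections import defaultdict

# Made-up detail rows with two dimensions (region, product) and one measure.
sales = [
    {"region": "East", "product": "widget", "amount": 100},
    {"region": "East", "product": "gadget", "amount": 50},
    {"region": "West", "product": "widget", "amount": 70},
    {"region": "East", "product": "widget", "amount": 30},
]


def roll_up(rows, dimensions):
    """Sum the measure for each combination of the chosen dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row["amount"]
    return dict(totals)


def drill_down(rows, **filters):
    """Return the detail rows behind one aggregated cell."""
    return [r for r in rows if all(r[k] == v for k, v in filters.items())]


by_region = roll_up(sales, ["region"])
by_region_product = roll_up(sales, ["region", "product"])
east_detail = drill_down(sales, region="East")
```

Here `by_region` is the summary a user would start from, `by_region_product` adds a second dimension, and `east_detail` is the drill-down into the rows behind the East total.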
Components of a Data Warehouse – Data Mining
• Answers the questions you didn’t know to ask
• Analyzes great amounts of data (usually contained in a Data Warehouse) and looks for trends in the data
• Technology now allows us to do this better than in the past
Components of a Data Warehouse – Data Mining
• The most famous example is the Huggies - Heineken case
• Used in the retail sector to analyze buying habits
• Used in financial areas to detect fraud
• Used in the stock market to find trends
• Used in scientific research
• Used in national security
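
The retail use case, analyzing buying habits, is often illustrated by counting which items are bought together. The sketch below is a deliberately simple co-occurrence count, not a full association-rule algorithm, and the baskets are invented data.

```python
from collections import Counter
from itertools import combinations

# Invented shopping baskets; each set is one purchase.
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"milk", "bread"},
    {"diapers", "milk"},
]


def pair_counts(all_baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in all_baskets:
        # Sorting gives each pair one canonical order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(baskets)
top_pair, top_count = counts.most_common(1)[0]
```

A pair that appears far more often than the rest is the kind of trend nobody thought to ask about in advance.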
Getting the Data In

• Data will come from multiple databases and files within the organization
• Also can come from outside sources

• Examples:
• Weather Reports
• Demographic information by Zip Code
Getting the Data In

Three Steps:

1. Extraction Phase

2. Transformation Phase

3. Loading Phase
Getting the Data In

Extraction Phase:
• Source systems export data via files, or populate the warehouse directly when the databases can “talk” to each other
• The exported files are transferred to the Data Warehouse server and placed in a staging area
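
The file-based path of the extraction phase can be sketched with the standard library alone. The file name, column names, and sample rows are assumptions, and a temporary directory stands in for both the source system's export location and the warehouse server's staging area.

```python
import csv
import shutil
import tempfile
from pathlib import Path


def export_to_file(rows, fieldnames, export_path):
    """Source-system side: write rows to a CSV export file."""
    with open(export_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def transfer_to_staging(export_path, staging_dir):
    """Warehouse side: copy the export file into the staging area."""
    staging_dir.mkdir(parents=True, exist_ok=True)
    staged_path = staging_dir / export_path.name
    shutil.copy2(export_path, staged_path)
    return staged_path


def read_staged(staged_path):
    """Read the staged file back; CSV values come back as strings."""
    with open(staged_path, newline="") as handle:
        return list(csv.DictReader(handle))


source_rows = [{"id": 1, "state": "PA"}, {"id": 2, "state": "Penna"}]
with tempfile.TemporaryDirectory() as tmp:
    export_path = Path(tmp) / "customers.csv"
    export_to_file(source_rows, ["id", "state"], export_path)
    staged_path = transfer_to_staging(export_path, Path(tmp) / "staging")
    staged_rows = read_staged(staged_path)
```

Nothing is cleaned at this point: the staged rows still carry the source system's inconsistencies, which is the job of the next phase.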
Getting the Data In

Transformation Phase:
• Takes data and turns it into a form that is suitable for insertion into the warehouse
• Combines related data
• Removes redundancies
• Common Codes (Commercial Customer)
• Spelling Mistakes (Lozenges)
• Consistency (PA, Pa, Penna, Pennsylvania)
• Formatting (Addresses)
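
Several of the transformations above (consistency, spelling, and removing redundancies) can be sketched together. The state variants come from the slide; the misspelling "lozengers", the customer names, and the function names are assumptions for illustration.

```python
# State variants from the slide, all mapped to one consistent value.
STATE_ALIASES = {"pa": "PA", "penna": "PA", "pennsylvania": "PA"}

# Hypothetical misspelling; a real list would come from data profiling.
SPELLING_FIXES = {"lozengers": "lozenges"}


def clean_state(value):
    """Map known variants to one code; leave unknown values untouched."""
    key = value.strip().rstrip(".").lower()
    return STATE_ALIASES.get(key, value.strip())


def clean_product(value):
    """Lower-case the product name and correct known misspellings."""
    key = value.strip().lower()
    return SPELLING_FIXES.get(key, key)


def transform(rows):
    """Clean each row, then drop rows that became exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        record = (
            row["customer"].strip().title(),
            clean_state(row["state"]),
            clean_product(row["product"]),
        )
        if record not in seen:
            seen.add(record)
            cleaned.append(record)
    return cleaned


raw_rows = [
    {"customer": "acme corp", "state": "Pa", "product": "Lozengers"},
    {"customer": "Acme Corp", "state": "Pennsylvania", "product": "lozenges"},
    {"customer": "Bolt Ltd", "state": "Penna", "product": "lozenges"},
]
clean_rows = transform(raw_rows)
```

The first two raw rows differ only in spelling and formatting, so they collapse into a single record once cleaned.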
Getting the Data In

Loading Phase:
• Places the cleaned data into the DBMS in its final, usable form
• Compares data from the source systems and the Data Warehouse
• Documents the load information for the users
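
The three loading tasks above (place the data, compare counts, document the load) can be sketched with Python's built-in sqlite3 module. The table names, columns, and the row-count comparison are assumptions; production loads typically use the DBMS's bulk loader and richer reconciliation checks.

```python
import sqlite3
from datetime import datetime, timezone


def load(conn, staged_rows):
    """Insert cleaned rows, compare counts, and record the result."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_dim (customer TEXT, state TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS load_log "
        "(loaded_at TEXT, source_rows INTEGER, warehouse_rows INTEGER, status TEXT)"
    )
    count_sql = "SELECT COUNT(*) FROM customer_dim"
    before = conn.execute(count_sql).fetchone()[0]
    conn.executemany("INSERT INTO customer_dim VALUES (?, ?)", staged_rows)
    loaded = conn.execute(count_sql).fetchone()[0] - before

    # Compare what the source supplied with what the warehouse received.
    status = "ok" if loaded == len(staged_rows) else "mismatch"

    # Document the load so users can see when and how it ran.
    conn.execute(
        "INSERT INTO load_log VALUES (?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), len(staged_rows), loaded, status),
    )
    conn.commit()
    return status


conn = sqlite3.connect(":memory:")
load_status = load(conn, [("Acme Corp", "PA"), ("Bolt Ltd", "PA")])
log_rows = conn.execute(
    "SELECT source_rows, warehouse_rows, status FROM load_log"
).fetchall()
```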
Benefits vs. Costs
Benefits

• Creates a single point for all data
• System is optimized and designed specifically for analysis
• Access data without impacting the operational systems
• Users can access the data directly without direct help from the IT department
Costs

• Cost of implementation & maintenance (hardware, software, and staffing)
• Lack of compatibility between components
• Data from many sources are hard to combine, leading to data integrity issues
• Bad designs and practices can lead to costly failures
Conclusion

• What is a Data Warehouse?
• Components of a Data Warehouse
• How the Data Gets In
• OLAP, Metadata, and Data Mining
• Benefits vs. Costs
