0% found this document useful (0 votes)
20 views24 pages

Introduction To Data Warehousing

Uploaded by

Filmy Studio
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
20 views24 pages

Introduction To Data Warehousing

Uploaded by

Filmy Studio
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 24

Introduction to

Data Warehousing
From DBMS to Decision Support
• DBMSs widely used to maintain transactional
data
• Attempts to use of these data for analysis,
exploration, identification of trends etc. has led
to Decision Support Systems.
• Rapid Growth since mid 70’s
• DBMSs vendors have answered this trend by
adding new features to existing products
• Rarely enough
DBs for Decision Support
• Trend towards Data Warehousing
• Data Warehousing – consolidation of data
from several databases which are in turn
maintained by individual business units
along with historical and summary
information
Characteristics of TPSs

Characteristic OLTP

Typical operation Update

Level of analytical requirements Low

Screens Unchanging

Amount of data per transaction Small

Data level Detailed

Age of data Current

Orientation Records
TPS
TPS vs
vs Decision
Decision Support
Support

OLTP Complex Analysis

Information to support Historical information


day-to-day service to analyze

Data stored at transaction Data needs to be integrated


level

Database design: Normalized Database design:


Denormalized, star schema
MIS and Decision Support

Ad hoc access

Production
platforms

Operational reports Decision makers

• MIS systems provided business data


• Reports were developed on request
• Reports provided little analysis capability
• no personal ad hoc access to data
Analyzing Data from
Operational Systems
• Data structures are complex
• Systems are designed for high performance and
ERP
throughput
• Data is not meaningfully represented
• Data is dispersed
• TPS systems unsuitable for intensive queries

Production
platforms

Operational reports
Data Extract Processing

Operational systems Extracts Decision makers

• End user computing offloaded from


the operational environment
• User’s own data
Management Issues

Operational systems Extracts Decision makers

Extract explosion
• Duplicated effort
• Multiple technologies
• Obsolete reports
• No metadata
Data Quality Issues
• No common time basis
• Different calculation algorithms
• Different levels of extraction
• Different levels of granularity
• Different data field names
• Different data field meanings
• Missing information
• No data correction rules
• No drill-down capability
From Extract to Warehouse DSS

Internal and Data warehouse Decision makers


external systems

• Controlled
• Reliable
• Quality information
• Single source of
data
Data Warehousing Architecture
External Data Sources

Visualisation

Extract Clean
Metadata
Transform Load respository Serves
OLAP
Refresh

Operational Databases Data Warehouse


Data Mining
Business Motivators
• Provide superior services and
products
• Know the business
• New products
• Invest in customers
• Retain customers
• Invest in technology
• Reinvent to face new challenges
Centralised data warehouse
Corporate
Analyst
data-
warehouse
Mainframe Server Analyst
Corporate
Financial
Marketing
Analyst
Manufacturing
Distribution

Federated data warehouse


Financial Analyst

Corporate Marketing Analyst


data
Mainframe warehouse Manufacturing Analyst

Distribution Analyst
Tiered data warehouse

Mainframe

Tier 3 (detailed data) Corporate data warehouse

Tier 2 (summarized data) Local data mart

Tier 1 (highly summarized data) Workstation Analyst


Data Warehouses Vs Data
Marts
Data
Data Mart
Warehouse

Property Data Warehouse Data Mart

Scope Enterprise Department

Subjects Multiple Single-subject

Data Source Many Few

Size (typical) 100 GB to > 1 TB < 100 GB

Implementation time Months to years Months


End-user Access Tools
• High performance is achieved by pre-planning the
requirements for joins, summations, and periodic
reports by end-users.

• There are five main groups of access tools:


– Data reporting and query tools
– Application development tools
– Executive information system (EIS) tools
– Online analytical processing (OLAP) tools
– Data mining tools
Data Usage - $1000 questions
Verification Discovery
What is the average sale for What is the best predictor
in-store and catalog of sales?
customers?
What is the average high What are the best
school GPA of students who predictors of college
graduate from college graduation?
compared to those who do
not?

Need to complement RDBMS technology with a flexible,


multidimensional view of data
The Functionality of OLAP
• Rotate and drill down
• Create and examine calculated data
• Determine comparative or relative
differences.
• Perform exception and trend analysis.
• Perform advanced analytical functions
The star structure

Product Region
Model Nation
Type Facts District
Color Product Dealer
Region
Time
Channel
Revenue
Channel Expenses Time
Units Week
Year
Multidimensional Database
Customer Store Model Store

Time Time

SALES FINANCE

Product

The data is found at the intersection of


dimensions.
Data Mining
Data mining functions
• Associations
– 85 percent of customers who buy a certain brand of wine also buy a
certain type of pasta
• Sequential patterns
– 32 percent of female customers who order a red jacket within six months
buy a gray skirt
• Classifying
– Frequent customers are those with incomes about $50,000 and having
two or more children
• Clustering
– Market segmentation
• Predicting
– predict the revenue value of a new customer based on that personal
demographic variables

You might also like