Data Warehousing & Data Mining (SE-409): Lecture-2
Introduction and Background
Huma Ayub
Software Engineering department
Program
Classical SDLC
Requirements gathering
Analysis
Design
Programming
Testing
Integration
Implementation
DWH-Ahsan Abdullah 3
How is it Different?
• Does not follow the traditional development
model
DWH
Program
Requirements
DWH SDLC (CLDS)
Implement warehouse
Integrate data
Test for bias
Program w.r.t data
Design DSS system
Analyze results
Understand requirements
Data Warehouse Vs. OLTP
DWH
SELECT balance, age, sal, gender
FROM customer_table, tx_table
WHERE age BETWEEN 30 AND 40
  AND Education = 'graduate'
  AND customer_table.CustID = tx_table.Customer_ID;
Data Warehouse Vs. OLTP
OLTP: OnLine Transaction Processing (MIS or Database System)
OLTP                           DWH
Primary key used               Primary key NOT used
No concept of primary index    Primary index used
Few rows returned              Many rows returned
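The contrast in the comparison above can be tried on a toy database. The following is a minimal sketch using Python's built-in sqlite3 module. Only the table and column names come from the query on the earlier slide; the column types, the TxID column, and all sample rows are assumptions invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_table (
        CustID    INTEGER PRIMARY KEY,
        age       INTEGER,
        gender    TEXT,
        Education TEXT
    );
    CREATE TABLE tx_table (
        TxID        INTEGER PRIMARY KEY,   -- hypothetical column
        Customer_ID INTEGER REFERENCES customer_table(CustID),
        balance     REAL,
        sal         REAL
    );
""")

# Hypothetical sample data.
customers = [
    (1, 35, "F", "graduate"),
    (2, 52, "M", "graduate"),
    (3, 31, "M", "graduate"),
    (4, 38, "F", "matric"),
]
transactions = [
    (10, 1, 1200.0, 50000.0),
    (11, 1, 900.0, 50000.0),
    (12, 2, 300.0, 70000.0),
    (13, 3, 4500.0, 65000.0),
    (14, 4, 150.0, 30000.0),
]
conn.executemany("INSERT INTO customer_table VALUES (?, ?, ?, ?)", customers)
conn.executemany("INSERT INTO tx_table VALUES (?, ?, ?, ?)", transactions)

# OLTP style: fetch one record through its primary key.
oltp_rows = conn.execute(
    "SELECT CustID, age, gender FROM customer_table WHERE CustID = ?", (3,)
).fetchall()

# DWH style: select by attributes; no primary key value in the predicate.
dwh_rows = conn.execute("""
    SELECT balance, age, sal, gender
    FROM customer_table
    JOIN tx_table ON customer_table.CustID = tx_table.Customer_ID
    WHERE age BETWEEN 30 AND 40
      AND Education = 'graduate'
""").fetchall()

print(len(oltp_rows), "row from the OLTP-style lookup")
print(len(dwh_rows), "rows from the DWH-style query")
```

Even on five transactions the attribute-based query returns several rows while the key lookup returns one; on real warehouse volumes the gap is far larger.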
Putting the pieces together
[Figure: data warehouse architecture]
Data sources: semistructured sources, www data, archived data, operational databases
Extract, Transform, Load (ETL) into the data warehouse (with metadata)
Data marts
Tools: MOLAP, ROLAP, query/reporting, data analysis, data mining
Users: business users, IT users
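The extract, transform, load (ETL) step named in the figure can be sketched in a few lines. This is an illustrative toy, not a description of any particular ETL tool: the two source lists, the field names, and the sales_fact table are all assumptions made up for the example.

```python
import sqlite3

# Extract: hypothetical records from two operational sources
# that encode the same facts in different ways.
source_a = [{"cust": "C1", "amount": "120.50", "gender": "m"}]
source_b = [{"cust": "C2", "amount": "80", "gender": "Female"}]


def extract():
    return source_a + source_b


def transform(records):
    """Standardize types and codes so all sources agree."""
    gender_codes = {"m": "M", "male": "M", "f": "F", "female": "F"}
    cleaned = []
    for r in records:
        cleaned.append(
            (r["cust"], float(r["amount"]), gender_codes[r["gender"].lower()])
        )
    return cleaned


def load(conn, rows):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact "
        "(cust TEXT, amount REAL, gender TEXT)"
    )
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract()))
loaded = warehouse.execute(
    "SELECT cust, amount, gender FROM sales_fact ORDER BY cust"
).fetchall()
print(loaded)
```

Keeping the three stages as separate functions mirrors the diagram: each source can change its format without touching the load step.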
Types & Typical Applications of DWH
Types of data warehouse
• Financial
• Telecommunication
• Insurance
• Human Resource
• Global
• Exploratory
Types of data warehouse
Financial
First data warehouse that an organization
builds. This is appealing because:
Types of data warehouse
Telecommunication
Dominated by the sheer volume of data.
Types of data warehouse
Insurance
Insurance data warehouses are similar to other
data warehouses, but with a few exceptions.
They store very old data, used for actuarial
processing (risk assessment).
A typical business may change dramatically over
40-50 years, but insurance does not.
In retailing or telecom there are a few important
dates, but in the insurance environment there are
many dates of many kinds.
Operational business cycles are long, measured in years.
Processing time is measured in months, so the operating
speed is different.
Transactions are not gathered and processed, but
are kept in a kind of "frozen" state.
This calls for a unique approach to design and
implementation.
Typical Applications
The impact on an organization's core business is to streamline
operations and maximize profitability.
• Fraud detection.
• Profitability analysis.
• Direct mail/database marketing.
• Credit risk prediction.
• Customer retention modeling.
• Yield management.
• Inventory management.
Typical Applications
Fraud detection
Typical Applications
Profitability Analysis
• Every bank knows whether it is profitable or not.
• But it does not know which customers are profitable.
• Typically more than 50% are NOT profitable.
• The question is which ones.
• Balance alone is not enough; transactional behavior is
the key.
• Restructure products and pricing strategies.
• Life-time profitability models (next 3-5 years).
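The point that balance alone is not enough can be illustrated with a small calculation. The sketch below is hypothetical throughout: the 2% margin on balances, the per-transaction channel costs, and the two customers are assumed figures chosen only to show how transactional behavior changes the picture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (cust_id INTEGER PRIMARY KEY, balance REAL);
    CREATE TABLE tx (cust_id INTEGER, channel TEXT);
    CREATE TABLE channel_cost (channel TEXT PRIMARY KEY, cost REAL);
""")

# Two customers with identical balances (assumed data).
conn.executemany(
    "INSERT INTO account VALUES (?, ?)", [(1, 10000.0), (2, 10000.0)]
)
# Assumed cost to the bank of serving one transaction per channel.
conn.executemany(
    "INSERT INTO channel_cost VALUES (?, ?)", [("branch", 10.0), ("atm", 1.0)]
)
# Customer 1 visits the branch often; customer 2 rarely transacts.
conn.executemany(
    "INSERT INTO tx VALUES (?, ?)", [(1, "branch")] * 30 + [(2, "atm")] * 2
)

MARGIN = 0.02  # assumed margin the bank earns on the balance

# Profit per customer = margin on balance minus cost of their transactions.
profit = dict(conn.execute("""
    SELECT a.cust_id,
           a.balance * :margin - COALESCE(SUM(c.cost), 0) AS profit
    FROM account a
    LEFT JOIN tx t ON t.cust_id = a.cust_id
    LEFT JOIN channel_cost c ON c.channel = t.channel
    GROUP BY a.cust_id, a.balance
    ORDER BY a.cust_id
""", {"margin": MARGIN}).fetchall())

for cust_id, value in profit.items():
    print(cust_id, round(value, 2))
```

Under these assumptions both customers look identical by balance, yet the one with heavy branch usage costs more to serve than the bank earns from the account.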
Typical Applications
Direct mail marketing
• Targeted marketing.
• Offer a high-bandwidth package only to selected users,
not to all users.
• Identify them from call detail records of web surfing.
• Saves marketing expense, saving pennies.
Typical Applications
Credit risk prediction
Normalization
What is normalization?
What are the goals of normalization?
• Eliminate redundant data.
• Ensure data dependencies make sense.
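The first goal, eliminating redundant data, can be seen in a small before-and-after sketch. The tables, names, and rows below are invented for illustration and are not part of the lecture material.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Unnormalized: the customer's city is repeated on every order.
    CREATE TABLE orders_flat (
        order_id INTEGER PRIMARY KEY,
        cust_name TEXT, cust_city TEXT, item TEXT
    );
    INSERT INTO orders_flat VALUES
        (1, 'Ali',  'Lahore', 'pen'),
        (2, 'Ali',  'Lahore', 'ink'),
        (3, 'Sara', 'Taxila', 'book');

    -- Normalized: facts about a customer are stored in one place only.
    CREATE TABLE customer (
        cust_id INTEGER PRIMARY KEY,
        cust_name TEXT UNIQUE, cust_city TEXT
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        cust_id INTEGER REFERENCES customer(cust_id),
        item TEXT
    );
    INSERT INTO customer (cust_name, cust_city)
        SELECT DISTINCT cust_name, cust_city FROM orders_flat;
    INSERT INTO orders
        SELECT f.order_id, c.cust_id, f.item
        FROM orders_flat f JOIN customer c ON c.cust_name = f.cust_name;
""")

# How many times is Ali's city stored in each design?
city_copies_flat = conn.execute(
    "SELECT COUNT(*) FROM orders_flat WHERE cust_name = 'Ali'"
).fetchone()[0]
city_copies_norm = conn.execute(
    "SELECT COUNT(*) FROM customer WHERE cust_name = 'Ali'"
).fetchone()[0]
print(city_copies_flat, city_copies_norm)
```

In the flat table a change of city must be applied to every order row; in the normalized design it is a single update, and the city depends only on the customer it describes.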
Rules for First Normal Form
The first normal form expects you to follow a few simple rules while designing your
database, and they are:
For example, if you have a column dob to store the dates of birth of a set of people,
then you must not store the names of some of them in that column alongside
the dates of birth of others. It should hold only a date of birth for all
the records/rows.
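A database can help enforce the dob example. The sketch below uses SQLite through Python's sqlite3 module; because SQLite does not strictly enforce declared column types, a CHECK constraint is used here as one possible guard. The person table and its rows are hypothetical, and the pattern only checks the YYYY-MM-DD shape, not whether the date is a real calendar date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE person (
        name TEXT NOT NULL,
        -- Accept only values shaped like YYYY-MM-DD in the dob column.
        dob  TEXT CHECK (dob GLOB '[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]')
    )
""")

# A date of birth is accepted.
conn.execute("INSERT INTO person VALUES (?, ?)", ("Ali", "1990-05-17"))

# A name placed in the dob column is rejected by the constraint.
try:
    conn.execute("INSERT INTO person VALUES (?, ?)", ("Sara", "Sara"))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
print(rejected, row_count)
```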
Rules for First Normal Form
Rule 3: Unique name for Attributes/Columns
This rule expects that each column in a table should have a unique name. This is to
avoid confusion at the time of retrieving data or performing any other operation on
the stored data.
If two or more columns have the same name, the DBMS cannot tell which one
a query refers to.
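In practice a DBMS usually refuses such a table outright. The short sketch below shows this with SQLite via Python's sqlite3 module; the student table is a made-up example, and other database systems may word the error differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Try to create a table in which two columns share the name "name".
try:
    conn.execute(
        "CREATE TABLE student (roll_no INTEGER, name TEXT, name TEXT)"
    )
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)

print(error_message)
```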