Data Warehousing and Data Mining
Overview
Introduction
Data Warehousing
Data Warehousing vs. OLAP
Data Mining
Extraction of interesting knowledge (rules, regularities, patterns, constraints) from data in large databases
[Chart: share of respondents (0% to 30%) by data warehouse size, initial vs. projected 2Q96. Size bands: 5GB, 5-9GB, 10-19GB, 20-49GB, 50-99GB, 100-249GB, 250-499GB, 500GB-1TB. Source: META Group, Inc.]
non-volatile
Metadata Repository
Application Areas
Industry                 Application
Finance                  Credit card analysis
Insurance                Claims, fraud analysis
Telecommunication        Call record analysis
Transport                Logistics management
Consumer goods           Promotion analysis
Data service providers   Value added data
Utilities                Power usage analysis
Warehouse
Subject oriented
Used to analyze business
Summarized and refined
Snapshot data
Integrated data
Data Warehouse
Performance relaxed
Large volumes accessed at a time (millions)
Mostly read (batch update)
Redundancy present
Database size 100 GB to a few terabytes
Discovery Model
Select
Are the data adequate to describe the phenomena the data mining analysis is attempting to model?
Can you enhance internal customer records with external lifestyle and demographic data?
Are the data stable: will the mined attributes be the same after the analysis?
If you are merging databases, can you find a common field for linking them?
How current and relevant are the data to the business goal?
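The question about a common linking field can be made concrete with a small sketch. It assumes internal customer records and external demographic records that share a field; the names `link_records`, `customer_id`, `balance`, and `age_band`, and the sample rows, are hypothetical and chosen only for illustration.

```python
def link_records(internal, external, key):
    """Attach external attributes to internal records through a shared key.

    Returns the linked records and the key values that had no external match.
    """
    lookup = {row[key]: row for row in external}
    linked, unmatched = [], []
    for row in internal:
        match = lookup.get(row[key])
        if match is None:
            # No external record for this key: keep the internal data as-is.
            unmatched.append(row[key])
            linked.append(dict(row))
        else:
            # Internal values take precedence if both sources share a field.
            linked.append({**match, **row})
    return linked, unmatched


# Hypothetical sample data.
customers = [
    {"customer_id": 1, "balance": 1200},
    {"customer_id": 2, "balance": 300},
    {"customer_id": 3, "balance": 50},
]
demographics = [
    {"customer_id": 1, "age_band": "30-39"},
    {"customer_id": 3, "age_band": "50-59"},
]

linked, unmatched = link_records(customers, demographics, "customer_id")
print(linked)
print("no external match for:", unmatched)
```

The list of unmatched keys gives a quick read on how well the external source covers the internal records, which bears on whether the combined data are adequate for the analysis.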
Prepare
Establish strategies for handling missing data, extraneous noise, and outliers
Identify redundant variables in the dataset and decide which fields to exclude
Decide on a log or square transformation, if necessary
Visually inspect the dataset to get a feel for the database
Determine the distribution frequencies of the data
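Several of the preparation steps above can be sketched in a few lines of standard-library Python. This is one possible set of choices, not a prescribed method: median imputation for missing values, the 1.5 x IQR rule for outliers, and log(1 + x) as the log transformation are all assumptions made for illustration, as are the sample data and function names.

```python
import math
import statistics
from collections import Counter


def fill_missing(values):
    """Replace None with the median of the observed values (one possible strategy)."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def iqr_outliers(values):
    """Return the values lying more than 1.5 * IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in values if v < low or v > high]


def log_transform(values):
    """Apply log(1 + x) to compress a right-skewed, non-negative field."""
    return [math.log1p(v) for v in values]


# Hypothetical sample data: a numeric field with gaps and a categorical field.
incomes = [32, 35, None, 38, 41, 44, None, 47, 50, 900]
regions = ["north", "south", "north", "east", "north", "south"]

filled = fill_missing(incomes)
outliers = iqr_outliers(filled)
transformed = log_transform(filled)
region_counts = Counter(regions)  # distribution frequencies of a categorical field

print("filled:", filled)
print("outliers:", outliers)
print("region frequencies:", dict(region_counts))
```

Whether to drop, cap, or keep a flagged outlier depends on the business goal; the sketch only identifies candidates.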
Neural Network
Decision Trees