Data Mining
Sameer Deshmukh
Outline
Data Mining
Data Warehousing
Q ‘n’ A
Conclusion
Historical Perspective
1960s:
Data collection, database creation, IMS and network DBMS
1970s:
Relational data model, relational DBMS implementation
1980s:
RDBMS, advanced data models (extended-relational, OO, deductive, etc.) and application-oriented DBMS (spatial, scientific, engineering, etc.)
1990s-2000s:
Data mining and data warehousing, multimedia databases, and Web databases
Data Mining
Definition
Data mining automates the process of locating and extracting hidden patterns and knowledge from large volumes of data.
In simple words
Searching for new knowledge
Why do we need data mining?
Predictive Model
Descriptive Model
Predictive Model
Prediction
determining how certain attributes will behave in the future
Regression
mapping a data item to a real-valued prediction variable
Classification
categorization of data based on combinations of attributes
Time Series Analysis
examining values of attributes with respect to time
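As a minimal sketch of regression used for prediction, the Python below fits a straight line y = a + b*x by ordinary least squares and then predicts the value for a future x. The function names fit_linear and predict and the monthly sales figures are hypothetical, chosen only for illustration; real projects would usually rely on a statistics library and richer models.

```python
def fit_linear(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (intercept, slope) of the least-squares line through (xs, ys)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and co-deviation of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model: tuple[float, float], x: float) -> float:
    """Map an input x to a real-valued prediction using the fitted line."""
    intercept, slope = model
    return intercept + slope * x


if __name__ == "__main__":
    # Hypothetical data: sales observed in months 1 to 4
    months = [1, 2, 3, 4]
    sales = [10.0, 12.0, 14.0, 16.0]
    model = fit_linear(months, sales)
    print(predict(model, 5))  # 18.0
```

The same fit-then-predict pattern carries over to classification and time series analysis; only the model and the target (a class label or a future point in time) change.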
Descriptive Model
Clustering
grouping the most closely related data items together into clusters
Data Summarization
extracting representative information about a database
Association Rules
associations defined between data items to form relationships
Sequence Discovery
determining sequential patterns in data based on the time sequence of actions
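Association rules are usually judged by two measures: support (how often the items occur together) and confidence (how often the right-hand side occurs when the left-hand side does). The sketch below computes both and does a brute-force search for single-item rules over item pairs. It is a hypothetical illustration, not the Apriori algorithm, and the shopping baskets are invented.

```python
from itertools import combinations


def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    if not transactions:
        return 0.0
    wanted = set(itemset)
    hits = sum(1 for t in transactions if wanted <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) as support(A | C) / support(A)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / base


def pair_rules(transactions, min_support, min_confidence):
    """Brute-force search for rules {x} -> {y} over all item pairs (not Apriori)."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        s = support(transactions, {a, b})
        if s < min_support:
            continue  # pair is too rare to yield a useful rule
        for lhs, rhs in ((a, b), (b, a)):
            c = confidence(transactions, {lhs}, {rhs})
            if c >= min_confidence:
                rules.append((lhs, rhs, s, c))
    return rules


if __name__ == "__main__":
    # Hypothetical market baskets
    baskets = [
        {"bread", "milk"},
        {"bread", "butter"},
        {"bread", "milk", "butter"},
        {"milk"},
    ]
    print(pair_rules(baskets, 0.5, 0.8))  # [('butter', 'bread', 0.5, 1.0)]
```

Here every basket with butter also has bread, so butter -> bread has confidence 1.0, while bread -> butter only reaches about 0.67 and is filtered out.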
Data mining process
Problem Definition
Creating Database
Exploring database
Evaluation Phase
How to distinguish?
Purpose
Database: Transactional Applications.
Functionality
Optimized for data retrieval, not routine transaction processing.
Structure
Performance
Data Warehousing
Modern Organizations' Needs
Companies are spread worldwide.
They have
Many Data Sources
Different Operational Systems
Different Schemas
They need data for
Complex Analysis
Knowledge Discovery
Decision Making
Solution?
Solution: a Data Warehouse.
Data Warehouse: Definition?
There is no single definition.
Data Warehouse
A collection of information gathered from multiple sources, stored under a unified schema at a single site, and mainly intended for decision-support applications.
"A subject-oriented, integrated, nonvolatile, time-variant collection of data in support of management's decisions."
~ W.H. Inmon
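To illustrate what "stored under a unified schema" means in practice, here is a minimal sketch assuming two hypothetical operational sources that record sales with different field names, date formats, and units. Each source gets its own mapping function into one shared record type, Sale, so the warehouse can query both uniformly. All field names and sample rows are assumptions made up for this example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Sale:
    """Hypothetical unified warehouse record."""
    customer: str
    amount: float   # currency units, not cents
    sale_date: str  # ISO 8601, YYYY-MM-DD
    source: str     # which operational system the row came from


def from_source_a(row: dict) -> Sale:
    # Hypothetical source A: cust_name, total (string), date as MM/DD/YYYY
    day = datetime.strptime(row["date"], "%m/%d/%Y").date()
    return Sale(row["cust_name"].strip(), float(row["total"]), day.isoformat(), "A")


def from_source_b(row: dict) -> Sale:
    # Hypothetical source B: customer, amount_cents (int), day as YYYY-MM-DD
    day = datetime.strptime(row["day"], "%Y-%m-%d").date()
    return Sale(row["customer"].strip(), row["amount_cents"] / 100, day.isoformat(), "B")


def load_unified(rows_a: list[dict], rows_b: list[dict]) -> list[Sale]:
    """Map every source row into the single shared schema."""
    return [from_source_a(r) for r in rows_a] + [from_source_b(r) for r in rows_b]


if __name__ == "__main__":
    rows_a = [{"cust_name": "Acme ", "total": "120.50", "date": "03/15/2024"}]
    rows_b = [{"customer": "Globex", "amount_cents": 9900, "day": "2024-03-16"}]
    for sale in load_unified(rows_a, rows_b):
        print(sale)
```

Keeping one mapper per source means a new source only needs a new mapper; the unified schema and everything that queries it stay unchanged.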
[Figure: Warehouses Are Very Large Databases. Chart of respondents (0% to 35%), initial vs. projected 2Q96.]
[Figure: Data warehouse architecture. Labels: Data Source 2, ..., Data Source n, Data Loaders, OLAP, DSSI, ESI.]
Building a Data Warehouse
When & how to gather data
Source-driven architecture
Destination-driven architecture
What schema to use
Data Cleansing
The task of correcting and preprocessing data
How to propagate updates
What data to summarize
And many more issues.
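The data cleansing step can be sketched as a small Python function, assuming hypothetical customer records keyed by customer_id. It normalizes whitespace and capitalization, fills a missing city with a placeholder, drops rows that have no key, and keeps only the first record per key. Real cleansing pipelines use much richer matching and correction rules; the keep-first policy and the field names here are simplifying assumptions.

```python
def clean_records(records: list[dict]) -> list[dict]:
    """Normalize, drop unkeyed rows, and deduplicate hypothetical customer records."""
    seen = set()
    cleaned = []
    for rec in records:
        cid = rec.get("customer_id")
        if cid is None:
            continue  # cannot integrate a row without its key
        if cid in seen:
            continue  # keep-first deduplication (a simplifying assumption)
        seen.add(cid)
        # Collapse runs of whitespace, then use consistent capitalization
        name = " ".join(str(rec.get("name") or "").split()).title()
        city = " ".join(str(rec.get("city") or "").split()).title() or "Unknown"
        cleaned.append({"customer_id": cid, "name": name, "city": city})
    return cleaned


if __name__ == "__main__":
    raw = [
        {"customer_id": 1, "name": "  alice   SMITH ", "city": "pune"},
        {"customer_id": 1, "name": "Alice Smith", "city": "Pune"},
        {"customer_id": None, "name": "Bob"},
        {"customer_id": 2, "name": "carol", "city": ""},
    ]
    for row in clean_records(raw):
        print(row)
```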
Summary
What is Data Warehousing?
Data Warehouse.
Data Warehouse – Architecture
Data Warehouse vs. Data Mining
Conclusion
Your data is full of undiscovered gems;
start digging!