Ch1 Overview KDD - ML
ARIN-2137
KNOWLEDGE DISCOVERY & DATA MINING
TOPIC 1
10
10°C (temperature)
It’s cold
Figure labels: Value, Volume; Generate, Disseminate, Rapid Response; EDP, MIS, DSS
EDP: Electronic Data Processing
MIS: Management Information Systems
DSS: Decision Support Systems
KDD: a non-trivial process
valid: justified patterns/models
novel: previously unknown
The KDD Process
1. Collect and preprocess data: create/select target database (data warehousing); select sampling technique and sample data
2. Extract patterns/models (data mining): select DM task(s); select DM method(s); extract knowledge
3. Test knowledge
4. Refine knowledge
Figure: factors behind KDD
Data rich, knowledge poor (the resource)
Data mining technology mature
Enabling technology (interactive MIS, OLAP, parallel computing, Web, etc.)
KDD Challenges
Scalability
Dimensionality
Complex and heterogeneous data
Data ownership
Data Mining
Regression
Dependency modeling
Deviation and change detection: discovering the most significant changes in the data
Summarization: finding a compact description for a subset of data
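Two of the tasks named here, summarization (a compact description for a subset of data) and deviation and change detection (the most significant changes in the data), can be illustrated with a minimal Python sketch. The function names, the choice of summary statistics, and the "largest jump between consecutive values" criterion are assumptions made for illustration, not definitions taken from the slides.

```python
import statistics

def summarize(values):
    """Summarization: a compact description of a subset of data."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }

def largest_change(values):
    """Deviation and change detection: find the position and size of the
    largest jump between consecutive values."""
    changes = [
        (abs(b - a), i + 1)  # (size of jump, index of the later value)
        for i, (a, b) in enumerate(zip(values, values[1:]))
    ]
    size, index = max(changes)
    return index, size

readings = [10, 11, 10, 12, 25, 26, 25]
summary = summarize(readings)
index, size = largest_change(readings)  # jump of 13 arriving at index 4
```

Real change-detection methods usually compare against a model of expected behaviour rather than only adjacent values; the adjacent-difference rule is used here just to keep the sketch short.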
Linear regression
Logistic regression
Classification Trees
Naïve Bayes
K-nearest neighbours
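As a rough illustration of two of the methods listed above, the sketch below fits a one-variable linear regression by least squares and classifies a point with k-nearest neighbours. It uses only the Python standard library; the function names, the squared Euclidean distance, and the toy data are assumptions chosen for the example.

```python
from collections import Counter

def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.
    `train` is a list of (features, label) pairs."""
    def dist(point):
        # Squared Euclidean distance; the ordering matches true distance.
        return sum((a - b) ** 2 for a, b in zip(point, query))
    nearest = sorted(train, key=lambda pair: dist(pair[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy usage: points lying exactly on y = 2x + 1, and two labelled clusters.
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
train = [((1, 1), "A"), ((2, 1), "A"), ((1, 2), "A"),
         ((8, 8), "B"), ((9, 8), "B"), ((8, 9), "B")]
label = knn_predict(train, (2, 2))
```

Linear regression predicts a numeric value, while k-nearest neighbours (like logistic regression, classification trees, and Naïve Bayes) is used here for classification.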
Potential Applications
Business information
Manufacturing information