UNESCO Courses: Module On Knowledge Discovery and Data Mining
This presentation summarizes the content and organization of lectures in module Knowledge Discovery and Data Mining
Objectives
This course provides:
- fundamental techniques and knowledge issues in KDD
- practical use and tools
- case studies of KDD applications
Prerequisites and Content
Prerequisites:
- experience of computer use
- basics of databases and statistics
- programming skills (for advanced levels)
Content: Introduction, Lectures, and Conclusion
KDD: A Definition
KDD is the automatic extraction of non-obvious, hidden knowledge from large volumes of data.
Data volumes of 10^6 to 10^12 bytes: we never see the whole data set, nor can we put it into the memory of a computer.
Knowledge is integrated information, including facts and their relations, that has been perceived, discovered, or learned as our mental pictures. Knowledge can be considered as data at a high level of abstraction and generalization.
16, M, 0, 32, 32, 0, 0, 0, SUBACUTE, 38, 2, 0, 0, 15, -, +, 12600, 4, 0,abnormal, abnormal, +, 41, 39, 2, 44, 57, F, -, ABPC+CZX, ?, ? ,negative, ?, n, n, ABSCESS, VIRUS ...
The record above mixes numerical attributes, categorical attributes, missing values (?), and class labels.
IF cell_poly <= 220 AND Risk = n AND Loc_dat = + AND Nausea > 15 THEN Prediction = VIRUS [87.5%] [confidence, predictive accuracy]
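To make the rule concrete, here is a minimal sketch of how such a rule could be checked against records and how its confidence (the share of covered records whose class matches the prediction) could be computed. The attribute names come from the rule above; the records, their values, and the class attribute name `Diag` are hypothetical and invented only for illustration.

```python
# Hypothetical records: attribute names follow the rule above, but all values
# and the class attribute "Diag" are invented for illustration.
def rule_covers(rec):
    """Return True if the record satisfies the IF-part of the rule."""
    return (rec["cell_poly"] <= 220
            and rec["Risk"] == "n"
            and rec["Loc_dat"] == "+"
            and rec["Nausea"] > 15)

def rule_confidence(records, predicted="VIRUS", cls="Diag"):
    """Fraction of covered records whose class equals the rule's prediction."""
    covered = [r for r in records if rule_covers(r)]
    if not covered:
        return 0.0
    hits = sum(1 for r in covered if r[cls] == predicted)
    return hits / len(covered)

records = [
    {"cell_poly": 100, "Risk": "n", "Loc_dat": "+", "Nausea": 20, "Diag": "VIRUS"},
    {"cell_poly": 150, "Risk": "n", "Loc_dat": "+", "Nausea": 30, "Diag": "VIRUS"},
    {"cell_poly": 200, "Risk": "n", "Loc_dat": "+", "Nausea": 16, "Diag": "ABSCESS"},
    {"cell_poly": 210, "Risk": "n", "Loc_dat": "+", "Nausea": 40, "Diag": "VIRUS"},
    {"cell_poly": 900, "Risk": "y", "Loc_dat": "-", "Nausea": 5, "Diag": "ABSCESS"},
]
print(rule_confidence(records))  # 0.75: three of the four covered records are VIRUS
```

The last record is not covered by the rule, so it does not affect the confidence; only records that satisfy every condition count.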
People have gathered and stored so much data because they believe that valuable assets are implicitly encoded within it. Raw data, however, is rarely of direct benefit.
Its true value depends on the ability to extract information useful for decision support (for example, to build a knowledge base used by an inference engine).
- Tradition: via knowledge engineers; manual data analysis is impractical
- New trend: via automatic programs
[Figure: evolution of information systems from EDP to MIS to DSS, with growing data volume and need for rapid response]
EDP: Electronic Data Processing; MIS: Management Information Systems; DSS: Decision Support Systems
The KDD process
- Data warehousing
- Data preparation: find important attributes and value ranges; normalize values; transform values
- Data mining: select DM task(s); select DM method(s); extract patterns/models (knowledge)
- Test knowledge
- Refine knowledge
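The "normalize values" and "transform values" steps can be illustrated with two common operations: min-max normalization to a fixed range and equal-width discretization into bins. This is a minimal sketch assuming these two techniques as representative examples; the slides do not name specific methods, and the function names and sample values are hypothetical.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale numeric values into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant attribute: map everything to new_min
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]

def equal_width_bins(values, k):
    """Transform numeric values into k equal-width bin indices 0..k-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / k
    # The maximum value falls exactly on the upper edge, so clamp it to bin k-1.
    return [min(int((v - lo) / width), k - 1) for v in values]

ages = [16, 32, 41, 57]  # hypothetical attribute values
print(min_max_normalize(ages))
print(equal_width_bins(ages, 2))  # [0, 0, 1, 1]
```

Normalization keeps the numeric scale comparable across attributes, while binning turns a numerical attribute into a categorical one, which some mining methods require.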
KDD draws on three fields:
- Statistics: infer information from data (deduction and induction, mainly numeric data)
- Databases: store, access, search, and update data (deduction)
- Machine Learning: computer algorithms that improve automatically through experience (mainly induction, symbolic data)
Potential Applications of KDD
- Business information: marketing and sales data analysis, investment analysis, loan approval, fraud detection, etc.
- Manufacturing information
- Scientific information
- Personal information
Data mining technology is mature. Typical data mining tasks:
- Classification
- Clustering
- Regression
- Change and deviation detection: discovering the most significant changes in the data
- Dependency modeling
- Summarization
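As an illustration of the clustering task, the sketch below groups one-dimensional values with k-means, one standard clustering method chosen here as an example (the slides do not prescribe a method). The function name, the data, and the initial centers are hypothetical.

```python
def kmeans_1d(points, centers, iters=10):
    """Plain k-means on 1-D data: assign each point to its nearest center,
    then move each center to the mean of its points, until nothing changes."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: abs(p - centers[j]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

centers, clusters = kmeans_1d([1, 2, 3, 10, 11, 12], centers=[1, 10])
print(centers)   # [2.0, 11.0]
print(clusters)  # [[1, 2, 3], [10, 11, 12]]
```

Unlike classification, no class labels are given: the groups emerge only from the distances between values.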
Classification
What factors determine cancerous cells?
Examples (data) are fed to a data mining algorithm (a classification algorithm), which produces general patterns. Common methods:
- Rule induction
- Decision tree
- Neural network
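A very simple form of rule induction is the "1R" (OneR) scheme: for each attribute, build one rule per attribute value that predicts the majority class, then keep the attribute whose rules make the fewest errors on the examples. The sketch below applies it to hypothetical cell examples described by number of nuclei and number of tails; the data and the function name are invented for illustration, and OneR is used only as a stand-in for the rule-induction methods listed above.

```python
from collections import Counter, defaultdict

def one_r(examples, attributes, label="class"):
    """Return (attribute, {value: predicted_class}, training_errors)
    for the single attribute whose majority-class rules err least."""
    best = None
    for attr in attributes:
        counts = defaultdict(Counter)  # value -> Counter of class labels
        for ex in examples:
            counts[ex[attr]][ex[label]] += 1
        rules = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(sum(c.values()) - c[rules[v]] for v, c in counts.items())
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    return best

# Hypothetical training examples.
cells = [
    {"nuclei": 1, "tails": 1, "class": "healthy"},
    {"nuclei": 1, "tails": 2, "class": "healthy"},
    {"nuclei": 2, "tails": 2, "class": "cancerous"},
    {"nuclei": 2, "tails": 1, "class": "cancerous"},
    {"nuclei": 1, "tails": 2, "class": "cancerous"},
    {"nuclei": 2, "tails": 2, "class": "cancerous"},
]
attr, rules, errors = one_r(cells, ["nuclei", "tails"])
print(attr, rules, errors)  # nuclei {1: 'healthy', 2: 'cancerous'} 1
```

The learned rules read as "IF #nuclei = 1 THEN healthy; IF #nuclei = 2 THEN cancerous", which is the kind of general pattern the classification step aims to produce.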
[Figure: two example classification rules, with certainty = 92% and certainty = 87%]
[Figure: decision tree that splits cells on #nuclei (1 or 2) and #tails (1 or 2), with leaves labelled cancerous or healthy]
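Decision-tree learners such as ID3 decide which attribute to test at a node by its information gain: the drop in class entropy after splitting on that attribute. This is a minimal sketch on hypothetical cell examples (the exact tree in the figure is not recoverable, so the data and function names are assumptions); it shows why one attribute would be preferred over the other at the root.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(examples, attr, label="class"):
    """Entropy of the whole set minus the weighted entropy after splitting on attr."""
    base = entropy([ex[label] for ex in examples])
    n = len(examples)
    remainder = 0.0
    for value in {ex[attr] for ex in examples}:
        subset = [ex[label] for ex in examples if ex[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return base - remainder

# Hypothetical examples: #nuclei separates the classes perfectly, #tails does not.
cells = [
    {"nuclei": 1, "tails": 1, "class": "healthy"},
    {"nuclei": 1, "tails": 2, "class": "healthy"},
    {"nuclei": 2, "tails": 1, "class": "cancerous"},
    {"nuclei": 2, "tails": 2, "class": "cancerous"},
]
for a in ("nuclei", "tails"):
    print(a, information_gain(cells, a))
root = max(("nuclei", "tails"), key=lambda a: information_gain(cells, a))
print("root test:", root)  # root test: nuclei
```

A full learner would apply the same choice recursively to each branch until the leaves are pure or no attributes remain.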
[Figure: cells classified as Healthy or Cancerous]