SIMS 422: Knowledge Inference Systems & Applications

This document outlines a presentation on knowledge inference systems and applications. It discusses the objectives of the course which are to provide fundamental techniques of knowledge discovery and data mining, issues in practical use and tools, and case studies of applications. It also outlines the prerequisite knowledge expected and content to be covered, including an overview of KDD, mining association rules, decision trees, and cluster analysis. Potential applications of KDD discussed include business, manufacturing, scientific, and personal information analysis.

Uploaded by

Harihar kalia
Copyright
© Attribution Non-Commercial (BY-NC)

SIMS 422

Knowledge Inference
Systems & Applications

Slides by H. T. Bao

1
Outline of the presentation

Objectives, Prerequisite and Content
Brief Introduction to Lectures
Discussion and Conclusion

2
Objectives
This course provides:

• fundamental techniques of knowledge discovery and data mining (KDD)
• issues in KDD practical use and tools
• case studies of KDD applications
3
Prerequisite for the course
Nothing special, but the following are expected:

• experience with computer use
• basics of databases, statistics, and mathematics
• programming skills
4
Content of the course
• Overview of KDD
• Mining association rules
• Mining action rules
• Decision tree induction
• Distributed knowledge systems and distributed
query answering
• Cluster analysis
5
Outline of the presentation

Objectives, Prerequisite and Content
Brief Introduction to Lectures
Discussion and Conclusion

6
Brief introduction to lectures
Overview of KDD

7
Lecture 1: Overview of KDD
1. What is KDD and why?

2. The KDD Process

3. KDD Applications

4. Data Mining Methods

5. Challenges for KDD

8
KDD: A Definition
KDD is the automatic extraction of non-obvious,
hidden knowledge from large volumes of data.

10^6-10^12 bytes: we never see the whole data set, so we put it in the memory of computers.
Then run data mining algorithms.
What is the knowledge? How to represent and use it?

9
Data, Information, Knowledge
We often see data as a string of bits, or numbers and
symbols, or “objects” which we collect daily.

Information is data stripped of redundancy, and reduced to the minimum necessary to characterize the data.

Knowledge is integrated information, including facts and their relations, which have been perceived, discovered, or learned as our “mental pictures”.

Knowledge can be considered data at a high level of abstraction and generalization.

10
From Data to Knowledge
Medical Data by Dr. Tsumoto, Tokyo Med. & Dent. Univ., 38 attributes
...
10, M, 0, 10, 10, 0, 0, 0, SUBACUTE, 37, 2, 1, 0,15,-,-, 6000, 2, 0, abnormal, abnormal,-, 2852, 2148, 712, 97,
49, F,-,multiple,,2137, negative, n, n, ABSCESS,VIRUS
12, M, 0, 5, 5, 0, 0, 0, ACUTE, 38.5, 2, 1, 0,15, -,-, 10700,4,0,normal, abnormal, +, 1080, 680, 400, 71, 59,
F,-,ABPC+CZX,, 70, negative, n, n, n, BACTERIA, BACTERIA
15, M, 0, 3, 2, 3, 0, 0, ACUTE, 39.3, 3, 1, 0,15, -, -, 6000, 0,0, normal, abnormal, +, 1124, 622, 502, 47, 63, F,
-,FMOX+AMK, , 48, negative, n, n, n, BACTE(E), BACTERIA
16, M, 0, 32, 32, 0, 0, 0, SUBACUTE, 38, 2, 0,   0, 15, -, +, 12600, 4, 0,abnormal, abnormal, +, 41, 39, 2, 44,
57, F, -, ABPC+CZX, ?, ? ,negative, ?, n, n, ABSCESS,   VIRUS
...

Legend: numerical attribute, categorical attribute, missing values, class labels

IF cell_poly <= 220 AND Risk = n AND Loc_dat = + AND Nausea > 15
THEN Prediction = VIRUS [87.5%]
[confidence, predictive accuracy]
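
As a rough illustration of how an IF-THEN rule of this kind can be checked against records, the Python sketch below applies the rule conditions to a few made-up records and computes the confidence, taken here as the fraction of covered records whose class matches the prediction. The records, the dictionary field names, and the helper names are hypothetical; only the rule conditions come from the slide.

```python
# Hypothetical records; field names mirror the attributes used in the rule.
records = [
    {"cell_poly": 100, "Risk": "n", "Loc_dat": "+", "Nausea": 20, "diagnosis": "VIRUS"},
    {"cell_poly": 150, "Risk": "n", "Loc_dat": "+", "Nausea": 18, "diagnosis": "VIRUS"},
    {"cell_poly": 200, "Risk": "n", "Loc_dat": "+", "Nausea": 30, "diagnosis": "BACTERIA"},
    {"cell_poly": 900, "Risk": "n", "Loc_dat": "-", "Nausea": 5, "diagnosis": "BACTERIA"},
]


def rule_covers(record):
    """Return True if the record satisfies the IF part of the rule."""
    return (
        record["cell_poly"] <= 220
        and record["Risk"] == "n"
        and record["Loc_dat"] == "+"
        and record["Nausea"] > 15
    )


def rule_confidence(data, predicted="VIRUS"):
    """Fraction of covered records whose class equals the predicted class."""
    covered = [r for r in data if rule_covers(r)]
    if not covered:
        return None  # the rule covers nothing, so confidence is undefined
    correct = sum(1 for r in covered if r["diagnosis"] == predicted)
    return correct / len(covered)


confidence = rule_confidence(records)
print(f"confidence = {confidence:.1%}")
```

On real data the same calculation would run over the full table, and a rule would normally be kept only if it also covers enough records to be trustworthy.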
11
Data Rich Knowledge Poor
How to acquire knowledge for knowledge-based systems remains the main difficult and crucial problem.

[Figure: "?" pointing to a knowledge base and an inference engine]
Tradition: via knowledge engineers
New trend: via automatic programs

People gathered and stored so much data because they think some valuable assets are implicitly coded within it.
Raw data is rarely of direct benefit. Its true value depends on the ability to extract information useful for decision support.
Impractical Manual Data Analysis

12
Benefits of Knowledge Discovery

[Figure: value plotted against volume, with EDP, MIS, and DSS marked; other labels: Generate, Disseminate, Rapid Response]
EDP: Electronic Data Processing
MIS: Management Information Systems
DSS: Decision Support Systems
13
Lecture 1: Overview of KDD
1. What is KDD and why?

2. The KDD Process

3. KDD Applications

4. Data Mining Methods

5. Challenges for KDD

14
The KDD process
The non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data - Fayyad, Piatetsky-Shapiro, Smyth (1996)

non-trivial process: multiple process
valid: justified patterns/models
novel: previously unknown
useful: can be used
understandable: by human and machine


15
The Knowledge Discovery Process
1. Understand the domain and define problems
2. Collect and preprocess data
3. Data mining: extract patterns/models
4. Interpret and evaluate discovered knowledge
5. Put the results into practical use

Data mining: a step in the KDD process consisting of methods that produce useful patterns or models from the data, under some acceptable computational efficiency limitations.

KDD is inherently interactive and iterative.
16
The KDD Process
[Flow diagram; the original figure marks five stages, numbered 1 to 5. Boxes in reading order:]
- Data organized by function; data warehousing
- Create/select target database
- Select sampling technique and sample data
- Supply missing values; eliminate noisy data
- Normalize values; transform values; create derived attributes; find important attributes & value ranges
- Select DM task(s); select DM method(s); extract knowledge; test knowledge; refine knowledge
- Transform to different representation
- Query & report generation; aggregation & sequences; advanced methods
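
Two of the preprocessing boxes named on this slide, supplying missing values and normalizing values, can be sketched in a few lines of Python. The slide does not say which techniques are meant, so mean imputation and min-max scaling are assumptions chosen for simplicity, and the sample values and function names are hypothetical.

```python
def fill_missing(values):
    """Replace None with the mean of the observed values (one simple choice)."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


temperatures = [37.0, 38.5, None, 39.0]  # hypothetical attribute with a gap
filled = fill_missing(temperatures)
normalized = min_max_normalize(filled)
print(filled)
print(normalized)
```

Other choices, such as filling with the median or scaling to zero mean and unit variance, fit the same two-step shape.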
17
Main Contributing Areas of KDD
[Figure: KDD at the center of three contributing areas]

Statistics: infer info from data (deduction & induction, mainly numeric data)

Databases: store, access, search, update data (deduction)
[data warehouses: integrated data]
[OLAP: On-Line Analytical Processing]

Machine Learning: computer algorithms that improve automatically through experience (mainly induction, symbolic data)

18
Lecture 1: Overview of KDD
1. What is KDD and why?

2. The KDD Process

3. KDD Applications

4. Data Mining Methods

5. Challenges for KDD

19
Potential Applications
Business information
- Marketing and sales data analysis
- Investment analysis
- Loan approval
- Fraud detection
- etc.

Manufacturing information
- Controlling and scheduling
- Network management
- Experiment result analysis
- etc.

Scientific information
- Sky survey cataloging
- Biosequence databases
- Geosciences: Quakefinder
- etc.

Personal information

20
KDD: Opportunity and Challenges
[Figure: factors driving KDD]
- Competitive pressure
- Data rich, knowledge poor (the resource)
- Data mining technology mature
- Enabling technology (interactive MIS, OLAP, parallel computing, Web, etc.)
21
KDD: A New and Fast Growing Area
KDD workshops: since 1989.
International conferences: KDD (USA), first in 1995;
PAKDD (Asia), first in 1997; PKDD (Europe), first in 1997.
ECML’04/PKDD’04 (in Pisa, Italy)

Industry interests and competition: IBM, Microsoft, Silicon Graphics, Sun, Boeing, NASA, SAS, SPSS, …
About 80% of the Fortune 500 companies are involved in data mining projects or using data mining systems.

JAPAN: FGCS Project (logic programming and reasoning).

“Knowledge Discovery is the most desirable end-product of computing.”
Wiederhold, Stanford Univ.
22
Lecture 1: Overview of KDD
1. What is KDD and why?

2. The KDD Process

3. KDD Applications

4. Data Mining Methods

5. Challenges for KDD

23
Primary Tasks of Data Mining
Classification: finding the description of several predefined classes and classifying a data item into one of them.

Clustering: identifying a finite set of categories or clusters to describe the data.

Regression: mapping a data item to a real-valued prediction variable.

Dependency modeling: finding a model which describes significant dependencies between variables.

Deviation and change detection: discovering the most significant changes in the data.

Summarization: finding a compact description for a subset of data.
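
To make one of these tasks concrete, the sketch below illustrates regression, that is, mapping a data item to a real-valued prediction variable. It fits a straight line by ordinary least squares, which is only one possible regression method and is an assumption here, since the slide names no specific technique. The data points and function name are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
prediction = slope * 5.0 + intercept  # predicted value for a new item x = 5
print(slope, intercept, prediction)
```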
24
Classification
“What factors determine cancerous cells?”

[Figure: examples (cancerous cell data) are fed to a data mining algorithm, here a classification algorithm, which produces general patterns]

Classification algorithms:
- Rule induction
- Decision tree
- Neural network

25
Classification: Rule Induction
“What factors determine whether a cell is cancerous?”

If Color = light
and Tails = 1
and Nuclei = 2
Then Healthy Cell (certainty = 92%)

If Color = dark
and Tails = 2
and Nuclei = 2
Then Cancerous Cell (certainty = 87%)
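
The two rules on this slide can be applied mechanically. The following Python sketch stores each rule as a set of attribute conditions plus a label and certainty, and returns the first rule that matches a cell. The dictionary keys, the first-match strategy, and the choice to return None for uncovered cells are assumptions for illustration; the conditions and certainty values are taken from the slide.

```python
# Each rule: (conditions, predicted label, certainty from the slide).
RULES = [
    ({"color": "light", "tails": 1, "nuclei": 2}, "healthy", 0.92),
    ({"color": "dark", "tails": 2, "nuclei": 2}, "cancerous", 0.87),
]


def classify(cell):
    """Return (label, certainty) of the first matching rule, or None."""
    for conditions, label, certainty in RULES:
        if all(cell.get(attr) == value for attr, value in conditions.items()):
            return label, certainty
    return None  # no rule covers this cell


print(classify({"color": "light", "tails": 1, "nuclei": 2}))
print(classify({"color": "dark", "tails": 2, "nuclei": 2}))
print(classify({"color": "dark", "tails": 1, "nuclei": 1}))
```

A cell that matches no rule is left unclassified here; a fuller rule set would usually add a default class.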

26
Classification: Decision Trees

[Figure: decision tree. The root splits on Color (dark / light), the next level splits on #nuclei (1 / 2), a further level splits on #tails (1 / 2), and the leaves are labeled healthy or cancerous.]
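
When a decision tree is induced from data, the attribute to split on is chosen at each node. The slide does not state the criterion, so the sketch below assumes information gain (the ID3-style entropy reduction) and computes it for each attribute on a small made-up set of cells. The data, attribute keys, and function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attribute, target="label"):
    """Entropy of the target minus the weighted entropy after splitting."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in set(r[attribute] for r in rows):
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical training cells, not taken from the slides.
cells = [
    {"color": "dark", "nuclei": 2, "tails": 2, "label": "cancerous"},
    {"color": "dark", "nuclei": 2, "tails": 1, "label": "cancerous"},
    {"color": "light", "nuclei": 1, "tails": 1, "label": "healthy"},
    {"color": "light", "nuclei": 2, "tails": 1, "label": "healthy"},
]

gains = {a: information_gain(cells, a) for a in ("color", "nuclei", "tails")}
best = max(gains, key=gains.get)  # attribute chosen for the root split
print(gains, best)
```

In this toy set color separates the classes perfectly, so it gets the highest gain and would become the root; the procedure then repeats on each branch.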

27
Classification: Neural Networks
“What factors determine whether a cell is cancerous?”

[Figure: neural network with inputs Color = dark, # nuclei = 1, # tails = 2 and outputs Healthy, Cancerous]

28
