SIMS 422: Knowledge Inference Systems & Applications

This document outlines a presentation on knowledge inference systems and applications. It discusses the objectives of the course which are to provide fundamental techniques of knowledge discovery and data mining, issues in practical use and tools, and case studies of applications. It also outlines the prerequisite knowledge expected and content to be covered, including an overview of KDD, mining association rules, decision trees, and cluster analysis. Potential applications of KDD discussed include business, manufacturing, scientific, and personal information analysis.

Uploaded by

Harihar kalia


SIMS 422

Knowledge Inference
Systems & Applications

Slides by H. T. Bao

1
Outline of the presentation

• Objectives, Prerequisite and Content
• Brief Introduction to Lectures
• Discussion and Conclusion

2
Objectives
This course provides:

• fundamental techniques of knowledge discovery and data mining (KDD)

• issues in the practical use of KDD and its tools

• case studies of KDD applications
3
Prerequisite for the course
Nothing special, but the following are expected:

• experience using computers

• basics of databases, statistics, and mathematics

• programming skills
4
Content of the course
• Overview of KDD
• Mining association rules
• Mining action rules
• Decision tree induction
• Distributed knowledge systems and distributed
query answering
• Cluster analysis
5
Outline of the presentation

• Objectives, Prerequisite and Content
• Brief Introduction to Lectures
• Discussion and Conclusion

6
Brief introduction to lectures
Overview of KDD

7
Lecture 1: Overview of KDD
1. What is KDD, and why?

2. The KDD Process

3. KDD Applications

4. Data Mining Methods

5. Challenges for KDD

8
KDD: A Definition
KDD is the automatic extraction of non-obvious,
hidden knowledge from large volumes of data.

10^6 to 10^12 bytes: we never see the whole data set, so we put it in the memory of computers.
Then run data mining algorithms.
What is the knowledge? How to represent and use it?

9
Data, Information, Knowledge
We often see data as a string of bits, or numbers and
symbols, or “objects” which we collect daily.

Information is data stripped of redundancy, and reduced to the minimum necessary to characterize the data.

Knowledge is integrated information, including facts and their relations, which have been perceived, discovered, or learned as our “mental pictures”.
Knowledge can be considered data at a high level of abstraction and generalization.

10
From Data to Knowledge
Medical Data by Dr. Tsumoto, Tokyo Med. & Dent. Univ., 38 attributes
...
10, M, 0, 10, 10, 0, 0, 0, SUBACUTE, 37, 2, 1, 0,15,-,-, 6000, 2, 0, abnormal, abnormal,-, 2852, 2148, 712, 97,
49, F,-,multiple,,2137, negative, n, n, ABSCESS,VIRUS
12, M, 0, 5, 5, 0, 0, 0, ACUTE, 38.5, 2, 1, 0,15, -,-, 10700,4,0,normal, abnormal, +, 1080, 680, 400, 71, 59,
F,-,ABPC+CZX,, 70, negative, n, n, n, BACTERIA, BACTERIA
15, M, 0, 3, 2, 3, 0, 0, ACUTE, 39.3, 3, 1, 0,15, -, -, 6000, 0,0, normal, abnormal, +, 1124, 622, 502, 47, 63, F,
-,FMOX+AMK, , 48, negative, n, n, n, BACTE(E), BACTERIA
16, M, 0, 32, 32, 0, 0, 0, SUBACUTE, 38, 2, 0, 0, 15, -, +, 12600, 4, 0,abnormal, abnormal, +, 41, 39, 2, 44,
57, F, -, ABPC+CZX, ?, ? ,negative, ?, n, n, ABSCESS, VIRUS
...

Legend: numerical attributes, categorical attributes, missing values, class labels

IF cell_poly <= 220 AND Risk = n AND Loc_dat = + AND Nausea > 15
THEN Prediction = VIRUS [87.5%]
(the bracketed value is the rule's confidence, or predictive accuracy)
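
A minimal sketch of how a rule like the one on this slide could be checked against records, and how its confidence (the share of covered records whose class matches the prediction) might be computed. The attribute names are taken from the rule. The records, their values, and the helper names `rule_fires` and `rule_confidence` are hypothetical and do not come from Dr. Tsumoto's data set.

```python
# Hypothetical records: attribute names follow the rule on the slide,
# but every value here is invented purely for illustration.
records = [
    {"cell_poly": 150, "Risk": "n", "Loc_dat": "+", "Nausea": 20, "label": "VIRUS"},
    {"cell_poly": 200, "Risk": "n", "Loc_dat": "+", "Nausea": 18, "label": "VIRUS"},
    {"cell_poly": 100, "Risk": "n", "Loc_dat": "+", "Nausea": 30, "label": "BACTERIA"},
    {"cell_poly": 900, "Risk": "n", "Loc_dat": "+", "Nausea": 25, "label": "BACTERIA"},
    {"cell_poly": 180, "Risk": "p", "Loc_dat": "-", "Nausea": 5, "label": "VIRUS"},
]


def rule_fires(record):
    """Return True when the IF part of the rule holds for this record."""
    return (
        record["cell_poly"] <= 220
        and record["Risk"] == "n"
        and record["Loc_dat"] == "+"
        and record["Nausea"] > 15
    )


def rule_confidence(records, predicted="VIRUS"):
    """Fraction of covered records whose label equals the predicted class."""
    covered = [r for r in records if rule_fires(r)]
    if not covered:
        return None  # the rule covers nothing, so confidence is undefined
    correct = sum(1 for r in covered if r["label"] == predicted)
    return correct / len(covered)


confidence = rule_confidence(records)
print(f"confidence on the toy records: {confidence:.3f}")
```

On these toy records the rule covers three records and is right on two of them, so the number printed has nothing to do with the percentage quoted on the slide.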
11
Data Rich Knowledge Poor
People have gathered and stored so much data because they think some valuable assets are implicitly coded within it.
Raw data is rarely of direct benefit. Its true value depends on the ability to extract information useful for decision support.
Manual data analysis is impractical.

How to acquire knowledge for knowledge-based systems (knowledge base + inference engine) remains the main difficult and crucial problem.
Tradition: via knowledge engineers
New trend: via automatic programs

12
Benefits of Knowledge Discovery

[Figure: chart with Value and Volume axes showing EDP, MIS, and DSS; labels: Generate, Disseminate, Rapid Response]

EDP: Electronic Data Processing
MIS: Management Information Systems
DSS: Decision Support Systems
13
Lecture 1: Overview of KDD
1. What is KDD, and why?

2. The KDD Process

3. KDD Applications

4. Data Mining Methods

5. Challenges for KDD

14
The KDD process
“The non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data” (Fayyad, Piatetsky-Shapiro, Smyth, 1996)

• non-trivial process: a multi-step process
• valid: justified patterns/models
• novel: previously unknown
• useful: can be used
• understandable: by human and machine

15
The Knowledge Discovery Process
1. Understand the domain and define problems
2. Collect and preprocess data
3. Data mining: extract patterns/models
4. Interpret and evaluate discovered knowledge
5. Put the results to practical use

Data mining is a step in the KDD process consisting of methods that produce useful patterns or models from the data, under some acceptable computational efficiency limitations.

KDD is inherently interactive and iterative.
16
The KDD Process
Step 1: data organized by function (data warehousing); create/select target database; select sampling technique and sample data
Step 2: supply missing values; eliminate noisy data; normalize values; transform values; create derived attributes; find important attributes & value ranges
Step 3: select DM task(s); select DM method(s); extract knowledge
Step 4: test knowledge; refine knowledge
Step 5: transform to different representation; query & report generation; aggregation & sequences; advanced methods
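
Two of the preprocessing operations named on this slide, supplying missing values and normalizing values, can be sketched as follows. This is only an illustration under stated assumptions: mean imputation and min-max scaling are one common choice each, not necessarily what the course uses, and the function names and sample values are hypothetical.

```python
def fill_missing(values):
    """Replace None entries with the mean of the known values (mean imputation)."""
    known = [v for v in values if v is not None]
    if not known:
        raise ValueError("cannot impute: every value is missing")
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical numeric attribute with one missing entry.
temperatures = [37.0, 38.5, None, 38.0]
filled = fill_missing(temperatures)
normalized = min_max_normalize(filled)
print(filled)
print(normalized)
```

Other strategies, such as dropping incomplete records or standardizing to zero mean and unit variance, fit the same place in the process.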
17
Main Contributing Areas of KDD
[Figure: KDD at the intersection of three areas]

• Statistics: infer info from data (deduction & induction, mainly numeric data)
• Databases: store, access, search, update data (deduction)
  [data warehouses: integrated data]
  [OLAP: On-Line Analytical Processing]
• Machine Learning: computer algorithms that improve automatically through experience (mainly induction, symbolic data)

18
Lecture 1: Overview of KDD
1. What is KDD, and why?

2. The KDD Process

3. KDD Applications

4. Data Mining Methods

5. Challenges for KDD

19
Potential Applications
Business information
- Marketing and sales data analysis
- Investment analysis
- Loan approval
- Fraud detection
- etc.

Manufacturing information
- Controlling and scheduling
- Network management
- Experiment result analysis
- etc.

Scientific information
- Sky survey cataloging
- Biosequence databases
- Geosciences: Quakefinder
- etc.

Personal information

20
KDD: Opportunity and Challenges
[Figure: four factors converging on KDD]

• Competitive pressure
• Data rich, knowledge poor (the resource)
• Mature data mining technology
• Enabling technology (interactive MIS, OLAP, parallel computing, Web, etc.)
21
KDD: A New and Fast Growing Area
KDD workshops: since 1989.
International conferences: KDD (USA), first in 1995; PAKDD (Asia), first in 1997; PKDD (Europe), first in 1997.
ECML’04/PKDD’04 (in Pisa, Italy)

Industry interest and competition: IBM, Microsoft, Silicon Graphics, Sun, Boeing, NASA, SAS, SPSS, …
About 80% of the Fortune 500 companies are involved in data mining projects or using data mining systems.

JAPAN: FGCS Project (logic programming and reasoning).

“Knowledge Discovery is the most desirable end-product of computing.”
Wiederhold, Stanford Univ.
22
Lecture 1: Overview of KDD
1. What is KDD, and why?

2. The KDD Process

3. KDD Applications

4. Data Mining Methods

5. Challenges for KDD

23
Primary Tasks of Data Mining
• Classification: finding the description of several predefined classes and classifying a data item into one of them.
• Clustering: identifying a finite set of categories or clusters to describe the data.
• Regression: mapping a data item to a real-valued prediction variable.
• Dependency modeling: finding a model which describes significant dependencies between variables.
• Deviation and change detection: discovering the most significant changes in the data.
• Summarization: finding a compact description for a subset of data.
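
As an illustration of the regression task (mapping a data item to a real-valued prediction variable), here is a minimal sketch that fits a straight line by ordinary least squares. The choice of a linear model, the function name `fit_line`, and the sample points are assumptions made for the example. The slide does not prescribe a particular regression method.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept on paired samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical points that lie exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)


def predict(x):
    """Map a new data item x to a real-valued prediction."""
    return slope * x + intercept


print(slope, intercept, predict(5.0))
```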
24
Classification
“What factors determine cancerous cells?”

[Figure: examples (cancerous cell data) are fed to a data mining algorithm (a classification algorithm), which produces general patterns]

Classification algorithms:
- Rule induction
- Decision tree
- Neural network

25
Classification: Rule Induction
“What factors determine whether a cell is cancerous?”

If Color = light
and Tails = 1
and Nuclei = 2
Then Healthy Cell (certainty = 92%)

If Color = dark
and Tails = 2
and Nuclei = 2
Then Cancerous Cell (certainty = 87%)
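
The two rules above can be applied mechanically. The sketch below encodes them as data and returns the first rule that matches a cell. The attribute values and certainties are taken from the slide, while the representation (a list of tuples), the lowercase key names, and the function name `classify` are assumptions made for illustration.

```python
# Each rule: (conditions that must all hold, predicted class, certainty).
# Conditions and certainties are the ones stated on the slide.
RULES = [
    ({"color": "light", "tails": 1, "nuclei": 2}, "healthy", 0.92),
    ({"color": "dark", "tails": 2, "nuclei": 2}, "cancerous", 0.87),
]


def classify(cell):
    """Return (label, certainty) of the first matching rule, or None if no rule fires."""
    for conditions, label, certainty in RULES:
        if all(cell.get(attr) == value for attr, value in conditions.items()):
            return label, certainty
    return None


print(classify({"color": "dark", "tails": 2, "nuclei": 2}))
print(classify({"color": "dark", "tails": 1, "nuclei": 1}))
```

A cell that matches neither rule yields `None`, because the two rules do not cover every combination of attribute values.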

26
Classification: Decision Trees

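[Figure: decision tree. The root splits on Color (dark / light). Each branch then splits on #nuclei (1 / 2). Two of the #nuclei branches end directly in leaves (cancerous, healthy); the other two split further on #tails (1 / 2), each ending in the leaves healthy and cancerous.]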
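
A decision tree of this shape can be stored as nested dictionaries and evaluated by walking from the root to a leaf. In the sketch below, the exact placement of the leaves is an assumption: the flattened figure does not show which branch each leaf hangs from, so this is one reading that agrees with the two rules on the rule-induction slide. The key names and the function `predict` are likewise hypothetical.

```python
# Assumed tree layout: internal nodes are dicts, leaves are class-label strings.
TAILS_SPLIT = {"attribute": "tails", "branches": {1: "healthy", 2: "cancerous"}}

TREE = {
    "attribute": "color",
    "branches": {
        "dark": {
            "attribute": "nuclei",
            "branches": {1: "cancerous", 2: TAILS_SPLIT},
        },
        "light": {
            "attribute": "nuclei",
            "branches": {1: "healthy", 2: TAILS_SPLIT},
        },
    },
}


def predict(tree, cell):
    """Follow the branch matching each tested attribute until a leaf is reached."""
    node = tree
    while isinstance(node, dict):
        value = cell[node["attribute"]]
        node = node["branches"][value]
    return node


print(predict(TREE, {"color": "dark", "nuclei": 2, "tails": 2}))
print(predict(TREE, {"color": "light", "nuclei": 1, "tails": 2}))
```

Unlike the two-rule list, a tree assigns a class to every combination of attribute values it tests.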

27
Classification: Neural Networks
“What factors determine whether a cell is cancerous?”

[Figure: neural network with input units (Color = dark, # nuclei = 1, …, # tails = 2) connected to the output units Healthy and Cancerous]
