WK 6 Nearest Neighbor Classifier and Bayesian Classifier 1 PPT

Uploaded by walid49161

Summer 2021

Data Mining and Machine Learning (CSE 321)
Topic – 5: Classification
(Alternative Techniques)
Course Teacher:
Md. Tarek Habib
Assistant Professor
Department of Computer Science and Engineering
Daffodil International University
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1
Topic Contents
• Nearest-Neighbor Classifier

• Bayesian Classifier


• “Introduction to Data Mining,” Pang-Ning Tan, Michael Steinbach and Vipin Kumar, Addison-Wesley, 2006.
☞ Chapter 5 (Classification: Alternative Techniques)

Nearest Neighbor Classifiers

● Basic idea:
– If it walks like a duck, quacks like a duck, then
it’s probably a duck

[Figure: compute the distance from the test record to the training records, then choose k of the “nearest” records]


Nearest-Neighbor Classifiers

● Requires three things

– The set of stored records
– Distance Metric to compute
distance between records
– The value of k, the number of
nearest neighbors to retrieve

● To classify an unknown record:

– Compute distance to other
training records
– Identify k nearest neighbors
– Use class labels of nearest
neighbors to determine the
class label of unknown record
(e.g., by taking majority vote)
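The three steps above can be sketched in Python as follows. This is a minimal illustration, not the textbook's code: the function names `euclidean` and `knn_classify`, the list-of-(attributes, label) record format, and the tiny training set are all hypothetical.

```python
import math
from collections import Counter


def euclidean(p, q):
    """Euclidean distance between two equal-length numeric records."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def knn_classify(train, test_record, k):
    """Label test_record by majority vote among its k nearest training records.

    train is a list of (attributes, label) pairs. On a tied vote, Counter
    returns the tied label that appears first in distance order.
    """
    # Compute the distance from the test record to every training record.
    distances = [(euclidean(x, test_record), label) for x, label in train]
    # Identify the k nearest neighbors.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Use their class labels to decide, here by simple majority vote.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical stored records: two small clusters labelled "A" and "B".
train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
         ((5.0, 5.0), "B"), ((5.2, 4.9), "B")]
print(knn_classify(train, (1.1, 1.0), k=3))  # A
print(knn_classify(train, (4.8, 5.1), k=3))  # B
```

Note that "training" here is nothing more than keeping the `train` list around; all the work happens when a record is classified.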


Definition of Nearest Neighbor

The k-nearest neighbors of a record x are the data points that have the k smallest distances to x


Nearest Neighbor Classification

● Compute distance between two points:

– Euclidean distance: $d(p, q) = \sqrt{\sum_i (p_i - q_i)^2}$

● Determine the class from nearest neighbor list

– Take the majority vote of class labels among the k-nearest neighbors
– Weight the vote according to distance
◆ weight factor, $w = 1/d^2$
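A sketch of the distance-weighted vote with $w = 1/d^2$, assuming numeric records and Euclidean distance; the function name and the data are hypothetical. The example is chosen so that weighting changes the answer: one "A" neighbor at distance 1 outvotes two "B" neighbors at distance 2, because 1/1² = 1 is larger than 1/2² + 1/2² = 0.5, whereas a plain majority vote would pick "B".

```python
from collections import defaultdict


def weighted_knn_classify(train, test_record, k):
    """k-NN in which each neighbor's vote is weighted by w = 1 / d**2."""
    # Squared Euclidean distance d**2 to every training record, nearest first.
    neighbors = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, test_record)), label)
        for x, label in train
    )[:k]
    scores = defaultdict(float)
    for d_squared, label in neighbors:
        if d_squared == 0.0:
            # An exact match would get infinite weight, so it decides outright.
            return label
        scores[label] += 1.0 / d_squared
    return max(scores, key=scores.get)


# Hypothetical 1-D data: one "A" at distance 1 and two "B"s at distance 2 from 0.
train = [((1.0,), "A"), ((2.0,), "B"), ((-2.0,), "B")]
print(weighted_knn_classify(train, (0.0,), k=3))  # A (1.0 beats 0.25 + 0.25)
```

Returning the label of an exact match is one way to avoid division by zero; other conventions exist, so treat it as a design choice of this sketch.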


Nearest Neighbor Classification…

● Choosing the value of k:

– If k is too small, the classifier is sensitive to noise points
– If k is too large, the neighborhood may include points from other classes
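To see both failure modes, the hypothetical one-dimensional data set below has class "A" at 0 to 3, class "B" at 8 to 12, and one noise point at 1.5 that carries the wrong label "B". A self-contained sketch (names hypothetical) classifies x = 1.4 with a plain majority vote for three values of k:

```python
from collections import Counter

# Hypothetical 1-D training set: class "A" at 0..3, class "B" at 8..12,
# plus one noise point at 1.5 that carries the wrong label "B".
train = [(0.0, "A"), (1.0, "A"), (2.0, "A"), (3.0, "A"), (1.5, "B"),
         (8.0, "B"), (9.0, "B"), (10.0, "B"), (11.0, "B"), (12.0, "B")]


def knn_1d(train, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


for k in (1, 3, 9):
    print(k, knn_1d(train, 1.4, k))
# 1 B  <- k too small: the single nearest point is the noise point
# 3 A
# 9 B  <- k too large: the neighborhood is dominated by the distant class B
```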


Nearest Neighbor Classification…

● Scaling issues
– Attributes may have to be scaled to prevent
distance measures from being dominated by
one of the attributes
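– Example:
◆ height of a person may vary from 1.5m to 1.8m
◆ weight of a person may vary from 90lb to 300lb
◆ income of a person may vary from $10K to $1M
◆ without scaling, income differences (up to about $1M) would swamp differences in height (at most 0.3m)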
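Min-max scaling is one common way to address this; the slide does not prescribe a method, so treat it as an assumption. The sketch rescales each attribute to [0, 1] using the ranges quoted on the slide; the two people compared are hypothetical. Unscaled, their 1K dollar income difference alone contributes about 96% of the squared distance, even though it is tiny relative to the income range. After scaling, height and weight, which differ by their full ranges, dominate instead.

```python
import math

# Attribute ranges quoted on the slide: height (m), weight (lb), income ($).
RANGES = {"height": (1.5, 1.8), "weight": (90.0, 300.0),
          "income": (10_000.0, 1_000_000.0)}
ATTRS = ("height", "weight", "income")


def raw_vector(person):
    """Attribute values in their original units."""
    return [person[a] for a in ATTRS]


def min_max_scale(person):
    """Rescale each attribute to [0, 1] using its (assumed known) min and max."""
    return [(person[a] - RANGES[a][0]) / (RANGES[a][1] - RANGES[a][0]) for a in ATTRS]


# Two hypothetical people: extreme height/weight difference, similar income.
alice = {"height": 1.5, "weight": 90.0, "income": 50_000.0}
bob = {"height": 1.8, "weight": 300.0, "income": 51_000.0}

raw = math.dist(raw_vector(alice), raw_vector(bob))
scaled = math.dist(min_max_scale(alice), min_max_scale(bob))
income_share = (bob["income"] - alice["income"]) ** 2 / raw ** 2
print(f"raw distance {raw:.1f}, income share {income_share:.0%}")
print(f"scaled distance {scaled:.3f}")
```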


Nearest-Neighbor Classification…

● k-NN classifiers are lazy learners

– They do not build models explicitly
– Unlike eager learners such as decision tree induction and rule-based systems
– Classifying unknown records is relatively expensive, because each one must be compared with the stored training records
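The lazy/eager distinction shows up directly in code structure. In this sketch (the class `LazyKNN` and its `fit`/`predict` methods are hypothetical names), training merely stores the records, and all distance computation is deferred to prediction time, so each prediction performs one distance computation per stored record.

```python
import math
from collections import Counter


class LazyKNN:
    """Lazy learner: fit() only memorizes; all the work happens in predict()."""

    def __init__(self, k):
        self.k = k
        self.records = []

    def fit(self, records):
        # No explicit model is built; "training" just stores (attributes, label) pairs.
        self.records = list(records)
        return self

    def predict(self, x):
        # Every query scans all stored records: N distance computations.
        nearest = sorted(self.records, key=lambda r: math.dist(r[0], x))[: self.k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]


model = LazyKNN(k=3).fit([((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"),
                          ((5, 5), "B"), ((6, 5), "B")])
print(model.predict((0.5, 0.5)))  # A
print(model.predict((5.5, 5.0)))  # B
```

An eager learner such as a decision tree would instead spend its effort in `fit` and answer each query cheaply.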


Bayes Classifier

● Predicting whether a person is at risk of heart disease, for example, depends on attributes such as a healthy diet and exercise.
● There might be other factors, such as heredity, smoking or alcohol abuse, that also affect it
● A probabilistic framework for solving classification problems
● Conditional Probability:
$P(C \mid A) = \frac{P(A, C)}{P(A)}$
$P(A \mid C) = \frac{P(A, C)}{P(C)}$

● Bayes theorem:
$P(C \mid A) = \frac{P(A \mid C)\, P(C)}{P(A)}$


Example of Bayes Theorem

● Given:
– A doctor knows that meningitis causes a stiff neck 50% of the time
– Prior probability of any patient having meningitis is 1/50,000
– Prior probability of any patient having a stiff neck is 1/20

● If a patient has a stiff neck, what is the probability that he/she has meningitis?
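Plugging the slide's numbers into Bayes theorem, with M = meningitis and S = stiff neck: $P(M \mid S) = P(S \mid M)\,P(M) / P(S) = (0.5 \times 1/50000) / (1/20) = 0.0002$. The same arithmetic as a small Python check (the helper name `bayes_posterior` is hypothetical):

```python
def bayes_posterior(likelihood, prior, evidence):
    """Bayes theorem: P(C | A) = P(A | C) * P(C) / P(A)."""
    return likelihood * prior / evidence


# Numbers from the slide: M = meningitis, S = stiff neck.
p_s_given_m = 0.5        # meningitis causes a stiff neck 50% of the time
p_m = 1 / 50_000         # prior probability of meningitis
p_s = 1 / 20             # prior probability of a stiff neck

p_m_given_s = bayes_posterior(p_s_given_m, p_m, p_s)
print(p_m_given_s)       # about 0.0002, i.e. roughly 1 patient in 5,000
```

Even given a stiff neck, meningitis remains unlikely, because its prior probability is so small.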


Bayesian Classifiers

● Consider each attribute and class label as random variables

● Given a record with attributes (A1, A2,…,An)

– Goal is to predict class C
– Specifically, we want to find the value of C that
maximizes P(C| A1, A2,…,An )

● Can we estimate P(C| A1, A2,…,An ) directly from data?


Bayesian Classifiers

● Approach:
– compute the posterior probability P(C | A1, A2, …, An) for all values of C using the Bayes theorem
$P(C \mid A_1, A_2, \ldots, A_n) = \frac{P(A_1, A_2, \ldots, A_n \mid C)\, P(C)}{P(A_1, A_2, \ldots, A_n)}$

– Choose the value of C that maximizes P(C | A1, A2, …, An)

– Equivalent to choosing the value of C that maximizes P(A1, A2, …, An | C) P(C), since the denominator P(A1, A2, …, An) is the same for every value of C

● How to estimate P(A1, A2, …, An | C)?

● Assume independence among attributes Ai when the class is given:
– P(A1, A2, …, An | Cj) = P(A1 | Cj) P(A2 | Cj) … P(An | Cj)

– Can estimate P(Ai | Cj) for all Ai and Cj.

– A new point is classified to Cj if P(Cj) Π P(Ai | Cj) is maximal.
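The decision rule can be sketched as below. The function name and dictionary layout are hypothetical, and the probabilities in the usage example are made-up placeholders rather than estimates from any data set. One design choice goes beyond the slide: the product P(Cj) Π P(Ai | Cj) is computed as a sum of logarithms, which ranks the classes the same way (log is increasing) but avoids floating-point underflow when there are many attributes.

```python
import math


def naive_bayes_predict(priors, cond_probs, record):
    """Return the class Cj that maximizes P(Cj) * prod_i P(Ai | Cj).

    priors:     {class: P(class)}
    cond_probs: {class: {(attribute, value): P(attribute = value | class)}}
    record:     {attribute: value}
    """
    def log_score(c):
        probs = [priors[c]] + [cond_probs[c][(a, v)] for a, v in record.items()]
        if any(p == 0.0 for p in probs):
            return -math.inf  # a zero factor makes the whole product zero
        # Summing logs ranks classes the same way as multiplying probabilities,
        # but does not underflow to 0.0 when there are many attributes.
        return sum(math.log(p) for p in probs)

    return max(priors, key=log_score)


# Made-up probabilities for two classes and two attributes (placeholders only).
priors = {"C1": 0.3, "C2": 0.7}
cond_probs = {
    "C1": {("A1", "x"): 0.2, ("A2", "u"): 0.6},
    "C2": {("A1", "x"): 0.5, ("A2", "u"): 0.3},
}
record = {"A1": "x", "A2": "u"}
print(naive_bayes_predict(priors, cond_probs, record))  # C2 (0.105 > 0.036)
```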


How to Estimate Probabilities from Data?

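● Class: $P(C) = N_c / N$, where $N_c$ is the number of records in class C and N is the total number of records

– e.g., P(No) = 7/10,
P(Yes) = 3/10

● For discrete attributes:

$P(A_i \mid C_k) = |A_{ik}| / N_{c_k}$
– where |Aik| is the number of instances that have attribute value Ai and belong to class Ck, and Nck is the number of instances in class Ck
– Examples:
P(Status=Married|No) = 4/7
P(Refund=Yes|Yes) = 0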
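The slide's training table did not survive extraction. The sketch below assumes it is the 10-record tax-evasion data set used in Tan, Steinbach and Kumar's chapter (attributes Refund, Marital Status, Taxable Income; class Evade). This reconstruction is an assumption, but it reproduces every count quoted here. `Fraction` keeps the estimates exact so they can be compared with the slide's 7/10 and 4/7; the helper names are hypothetical.

```python
from collections import Counter
from fractions import Fraction

# Assumed reconstruction of the training table (not shown on the slide).
# Columns: Refund, Marital Status, Taxable Income (in K), Evade (the class).
RECORDS = [
    ("Yes", "Single", 125, "No"), ("No", "Married", 100, "No"),
    ("No", "Single", 70, "No"), ("Yes", "Married", 120, "No"),
    ("No", "Divorced", 95, "Yes"), ("No", "Married", 60, "No"),
    ("Yes", "Divorced", 220, "No"), ("No", "Single", 85, "Yes"),
    ("No", "Married", 75, "No"), ("No", "Single", 90, "Yes"),
]
COLUMNS = {"Refund": 0, "Status": 1}  # discrete attributes only

class_counts = Counter(r[3] for r in RECORDS)  # N_c for each class


def prior(c):
    """P(C) = N_c / N."""
    return Fraction(class_counts[c], len(RECORDS))


def cond_prob(attr, value, c):
    """P(A_i = value | C_k) = |A_ik| / N_ck, a relative frequency."""
    col = COLUMNS[attr]
    matches = sum(1 for r in RECORDS if r[col] == value and r[3] == c)
    return Fraction(matches, class_counts[c])


print(prior("No"), prior("Yes"))              # 7/10 3/10
print(cond_prob("Status", "Married", "No"))   # 4/7
print(cond_prob("Refund", "Yes", "Yes"))      # 0
```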

How to Estimate Probabilities from Data?

● For continuous attributes:

– Discretize the range into bins
◆ one ordinal attribute per bin
◆ violates independence assumption

– Two-way split: (A < v) or (A > v)

◆ choose only one of the two splits as new attribute
– Probability density estimation:
◆ Assume attribute follows a normal distribution
◆ Use data to estimate parameters of distribution
(e.g., mean and standard deviation)
◆ Once probability distribution is known, can use it to
estimate the conditional probability P(Ai|c)
How to Estimate Probabilities from Data?

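● Normal distribution:
$P(A_i \mid c_j) = \frac{1}{\sqrt{2\pi\sigma_{ij}^2}}\, e^{-\frac{(A_i - \mu_{ij})^2}{2\sigma_{ij}^2}}$

– One for each (Ai, cj) pair

● For (Income, Class=No):

– If Class=No
◆ sample mean = 110
◆ sample variance = 2975
$P(\text{Income}=120 \mid \text{No}) = \frac{1}{\sqrt{2\pi}\,(54.54)}\, e^{-\frac{(120-110)^2}{2(2975)}} = 0.0072$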
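A sketch of the density-estimation approach for (Income, Class=No), reproducing the slide's mean of 110 and variance of 2975. The seven income values are the Class=No records of the tax-evasion table assumed for this example (a reconstruction; the table is not shown on the slide). `statistics.variance` is the sample variance with an n − 1 denominator, which is what gives 2975. The helper name `normal_pdf` is hypothetical.

```python
import math
import statistics

# Taxable income (in K) of the seven Class=No records (reconstructed data).
income_no = [125, 100, 70, 120, 60, 220, 75]

mu = statistics.mean(income_no)        # sample mean: 110
var = statistics.variance(income_no)   # sample variance (n - 1 denominator): 2975


def normal_pdf(x, mu, var):
    """Normal density N(mu, var) evaluated at x, used as P(A_i = x | c_j)."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


print(round(normal_pdf(120, mu, var), 4))  # 0.0072
```

Strictly, the result is a density value rather than a probability; naive Bayes uses it in place of P(Ai | cj).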

Example of Naïve Bayes Classifier
Given a test record X = (Refund = No, Married, Income = 120K):

● P(X|Class=No) = P(Refund=No|Class=No)
× P(Married| Class=No)
× P(Income=120K| Class=No)
= 4/7 × 4/7 × 0.0072 = 0.0024

● P(X|Class=Yes) = P(Refund=No| Class=Yes)
× P(Married| Class=Yes)
× P(Income=120K| Class=Yes)
= 1 × 0 × 1.2 × 10⁻⁹ = 0

Since P(X|No)P(No) > P(X|Yes)P(Yes)
Therefore P(No|X) > P(Yes|X)
=> Class = No
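The whole calculation can be sketched end to end. The training records are an assumed reconstruction of the textbook's tax-evasion table, which is not shown on the slide. Categorical attributes use relative frequencies and Taxable Income uses a per-class normal density; all function names are hypothetical.

```python
import math
import statistics

# Assumed reconstruction of the training table:
# (Refund, Marital Status, Taxable Income in K, Class)
RECORDS = [
    ("Yes", "Single", 125, "No"), ("No", "Married", 100, "No"),
    ("No", "Single", 70, "No"), ("Yes", "Married", 120, "No"),
    ("No", "Divorced", 95, "Yes"), ("No", "Married", 60, "No"),
    ("Yes", "Divorced", 220, "No"), ("No", "Single", 85, "Yes"),
    ("No", "Married", 75, "No"), ("No", "Single", 90, "Yes"),
]


def gaussian(x, values):
    """Normal density at x, with mean and sample variance estimated from values."""
    mu, var = statistics.mean(values), statistics.variance(values)
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def likelihood(record, c):
    """P(X | Class=c) under the naive (class-conditional independence) assumption."""
    refund, status, income = record
    rows = [r for r in RECORDS if r[3] == c]
    p_refund = sum(r[0] == refund for r in rows) / len(rows)
    p_status = sum(r[1] == status for r in rows) / len(rows)
    p_income = gaussian(income, [r[2] for r in rows])
    return p_refund * p_status * p_income


def classify(record):
    """Return the class maximizing P(X | C) P(C), plus the scores themselves."""
    classes = sorted({r[3] for r in RECORDS})
    scores = {c: likelihood(record, c) * sum(r[3] == c for r in RECORDS) / len(RECORDS)
              for c in classes}
    return max(scores, key=scores.get), scores


X = ("No", "Married", 120)             # Refund=No, Married, Income=120K
print(round(likelihood(X, "No"), 5))   # 0.00235, which the slide reports as 0.0024
print(likelihood(X, "Yes"))            # 0.0, because P(Married | Yes) = 0
print(classify(X)[0])                  # No
```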

Naïve Bayes Classifier

● If one of the conditional probabilities is zero, then the entire expression becomes zero
● Probability estimation (m-estimate):
$P(A_i \mid y_j) = \frac{n_c + m\,p}{n + m}$

n: number of instances from class yj
p: prior probability
m: parameter
nc: number of training examples from class yj that have attribute value Ai
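A sketch of the m-estimate applied to the zero count behind P(Refund=Yes|Yes) = 0 from the earlier example (0 of the 3 Class=Yes records). The values m = 3 and p = 1/2 (Refund has two possible values) are illustrative choices, not from the slide; with m = 0 the estimate reduces to the plain relative frequency nc / n. The function name `m_estimate` is hypothetical.

```python
def m_estimate(n_c, n, m, p):
    """m-estimate (n_c + m * p) / (n + m) of a class-conditional probability.

    n_c: training examples of class y_j that have the attribute value
    n:   training examples of class y_j
    m:   equivalent sample size (user-chosen parameter)
    p:   prior estimate of the probability
    """
    return (n_c + m * p) / (n + m)


# P(Refund=Yes | Yes): 0 of the 3 Class=Yes records have Refund=Yes.
print(0 / 3)                              # 0.0, which zeroes the whole product
print(m_estimate(0, 3, m=3, p=0.5))       # 0.25 with the illustrative m and p
print(m_estimate(0, 3, m=0, p=0.5))       # 0.0: m = 0 gives back n_c / n
```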

Example of Naïve Bayes Classifier

A: attributes
M: mammals
N: non-mammals

P(A|M)P(M) > P(A|N)P(N)

=> Mammals

Naïve Bayes (Summary)

● Robust to isolated noise points

● Handles missing values by ignoring the instance during probability estimate calculations

● Robust to irrelevant attributes

● Independence assumption may not hold for some attributes
– Use other techniques such as Bayesian Belief Networks (BBN)