Mod09-ppt2-ML_in_Image_Classification

The document provides an overview of three machine learning classification methods: K-Nearest Neighbors (KNN), Support Vector Machines (SVM), and Naïve Bayes. It explains the principles behind each method, including instance-based classification, optimal separation in SVM, and Bayesian probability in Naïve Bayes. Additionally, it highlights the advantages and limitations of these classifiers in practical applications.


MACHINE LEARNING

CLASSIFICATION
K NEAREST NEIGHBOUR (KNN)
SUPPORT VECTOR MACHINE
NAÏVE BAYES

DR. SHILOAH ELIZABETH D


Assistant Professor
Department of Computer Science and Engineering
Anna University
SUPERVISED LEARNING
Instance-Based Classifiers
• Store the training records
• Use training records to predict the class label of unseen cases
• Examples:
• Rote-learner - Memorizes the entire training data and performs classification only
if the attributes of a record exactly match one of the training examples
• Nearest neighbor - Uses the k “closest” points (nearest neighbors) to perform
classification
Nearest Neighbor Classifiers
• Requires three things
• The set of stored records
• Distance Metric to compute distance
between records
• The value of k, the number of nearest
neighbors to retrieve
• To classify an unknown record:
• Compute distance to other training
records
• Identify k nearest neighbors
• Use class labels of nearest neighbors
to determine the class label of
unknown record (e.g., by taking
majority vote)
• The k-nearest neighbors of a record x are the data points that have the k smallest
distances to x
Nearest Neighbor
• Compute the distance between two points, e.g., using the Euclidean distance:
d(p, q) = √( Σ_i (p_i − q_i)² )
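A minimal sketch of this distance computation in plain Python (the function name and the sample points are illustrative):

import math

def euclidean_distance(p, q):
    # Euclidean distance between two attribute vectors p and q
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

print(euclidean_distance((3, 7), (7, 7)))   # 4.0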
Nearest Neighbor Classification
• Determine the class from the nearest-neighbor list, e.g., by taking a majority vote
of their class labels
Nearest Neighbor Classification
• Choosing the value of k:
• If k is too small, sensitive to noise points
• If k is too large, neighborhood may include points from other classes
Nearest Neighbor Classification
• Normalization - attributes may need to be rescaled so that no single attribute
dominates the distance measure (see the sketch below)
• Curse of Dimensionality - distances become less meaningful as the number of
attributes grows
• k-NN classifiers are lazy learners
• They do not build models explicitly
• Unlike eager learners such as decision tree induction and rule-based systems
• Classifying unknown records is relatively expensive
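A minimal sketch of min-max normalization, one common way to rescale attributes before computing distances (the attribute values are illustrative):

def min_max_normalize(values):
    # Rescale a list of attribute values to the range [0, 1]
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Without rescaling, incomes in dollars would dominate ages in years
print(min_max_normalize([25, 35, 45, 60]))                # ages
print(min_max_normalize([20000, 50000, 80000, 120000]))   # incomes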
Example
We have data from a questionnaire survey (asking people's opinion) and from objective testing, with two
attributes (acid durability and strength), to classify whether a special paper tissue is good or not.
Here are four training samples:
X1 = Acid Durability (seconds) and X2 = Strength (kg/square meter); Y = Classification
(7, 7, Bad); (7, 4, Bad); (3, 4, Good); (1, 4, Good)
Now the factory produces a new paper tissue that passes the laboratory test with X1 = 3 and X2 = 7.
Without another expensive survey, can we guess what the classification of this new tissue is?

Using Euclidean distance with k = 3, the squared distances from the query (3, 7) are 9 for (3, 4), 13 for (1, 4),
16 for (7, 7) and 25 for (7, 4), so the three nearest neighbours are (3, 4, Good), (1, 4, Good) and (7, 7, Bad).
We have 2 Good and 1 Bad; since 2 > 1, we conclude that the new paper tissue that passes the laboratory test
with X1 = 3 and X2 = 7 belongs to the Good category.
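A minimal KNN sketch in plain Python that reproduces this example (k = 3 and Euclidean distance are assumptions consistent with the vote above; the slides do not state them explicitly):

from collections import Counter
import math

# Training samples: (acid durability, strength) -> class
training = [((7, 7), "Bad"), ((7, 4), "Bad"), ((3, 4), "Good"), ((1, 4), "Good")]

def knn_classify(query, data, k=3):
    # Sort training samples by distance to the query, then take a majority vote among the k nearest
    by_distance = sorted(data, key=lambda item: math.dist(query, item[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

print(knn_classify((3, 7), training))   # -> 'Good'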
SUPPORT VECTOR MACHINE (SVM)
SUPPORT VECTOR MACHINE (SVM)
• SVM is a kernel method
• SVMs often give better classification performance than other ML algorithms on
reasonably sized datasets.
• They do not work well on extremely large datasets, since they involve a data
matrix inversion, which is very expensive.
SVM – When data is linearly separable
OPTIMAL SEPARATION

Three different classification lines. Is there any reason why one is better than the others?
OPTIMAL SEPARATION
• All three of the lines that are drawn separate out the two classes,
• so in some sense they are ‘correct’, and
• the Perceptron would stop its training if it reached any one of them.
• We prefer a line that runs through the middle of the separation
between the datapoints from the two classes,
• staying approximately equidistant from the data in both classes.
• If we pick the lines shown in the left or right graphs,
• then there is a chance that a datapoint from one class will be on the wrong
side of the line,
• just because we have put the line tight up against some of the datapoints we
have seen in the training set.
The Margin and Support Vectors

The margin is the largest region we can put that separates the classes without there being any points inside, where the
box is made from two lines that are parallel to the decision boundary.
The classifier in the middle of the Figure has the largest margin of the three. It has the imaginative name of the
maximum margin (linear) classifier.
The datapoints in each class that lie closest to the classification line are called support vectors.
SVM
• Using the argument that the best classifier is the one that goes
through the middle of no-man’s land, we can now make two
arguments:
• the margin should be as large as possible, and
• the support vectors are the most useful datapoints because they are the ones
that we might get wrong.
• This leads to an interesting feature of these algorithms:
• after training we can throw away all of the data except for the support
vectors, and use them for classification
SVM
• Computing optimal decision boundary from a given set of datapoints
• w - weight vector (a vector, not a matrix, since there is only one output)
• x - input vector
• Output y = w · x + b, with b being the contribution from the bias weight
• We use the classifier line by saying that
• any x value that gives a positive value for w · x + b is above the line, and so is
an example of the ‘+’ class,
• any x that gives a negative value is in the ‘o’ class.
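A minimal sketch of this decision rule (the weight vector and bias values below are made up purely for illustration):

import numpy as np

w = np.array([1.0, -2.0])   # illustrative weight vector
b = 0.5                     # illustrative bias

def classify(x):
    # '+' class if w . x + b is positive, 'o' class otherwise
    return '+' if np.dot(w, x) + b > 0 else 'o'

print(classify(np.array([3.0, 0.5])))   # w . x + b = 2.5 -> '+'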
SVM
• Let us include our no-man’s land (the margin region):
• a point lies inside the grey box if the absolute value of w · x + b is less than our
margin M.
• w · x is the inner (or scalar) product of w and x.
• It can also be written as wᵀx, which means that we can treat the vectors as
degenerate matrices and use the normal matrix multiplication rules.
• For a given margin value M we can say that
• any point x where wᵀx + b ≥ M is a plus, and
• any point where wᵀx + b ≤ −M is a circle.
• The actual separating hyperplane is specified by wᵀx + b = 0.
SVM
• Support vector: a point x⁺ that lies on the ‘+’ class boundary line, so that
wᵀx⁺ + b = M
• If we want to find the closest point that lies on the boundary line for the ‘o’
class, then we travel perpendicular to the ‘+’ boundary line until we hit the
‘o’ boundary line.
• The point that we hit is the closest point, and we call it x⁻.
• The distance travelled to get to the separating hyperplane is M,
• and the distance from x⁺ to x⁻ is 2M.
• To write down the margin size M in terms of w:
• w is perpendicular to the classifier line and to the ‘+’ and ‘o’ boundary lines,
• so the direction travelled from x⁺ to x⁻ is along w.
• Rescaling w to a unit vector w/||w||, we see that the margin is 1/||w||.
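A short sketch using scikit-learn's linear SVM to recover w, b, the support vectors and the margin 1/||w|| (the toy dataset below is an assumption, chosen only to be linearly separable; it is not from the slides):

import numpy as np
from sklearn.svm import SVC

# Two linearly separable clusters (toy data for illustration)
X = np.array([[1, 1], [2, 1], [1, 2], [4, 4], [5, 4], [4, 5]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6)   # a large C approximates a hard margin
clf.fit(X, y)

w = clf.coef_[0]                    # weight vector of the separating hyperplane
b = clf.intercept_[0]               # bias term
margin = 1.0 / np.linalg.norm(w)    # distance from the hyperplane to each boundary line

print("support vectors:\n", clf.support_vectors_)
print("w =", w, "b =", b, "margin =", margin)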
Bayesian Classification: Why?
• A statistical classifier: performs probabilistic prediction, i.e.,
predicts class membership probabilities
• Foundation: Based on Bayes’ Theorem.
• Performance: A simple Bayesian classifier, naïve Bayesian
classifier, has comparable performance with decision tree and
selected neural network classifiers
• Incremental: Each training example can incrementally
increase/decrease the probability that a hypothesis is correct —
prior knowledge can be combined with observed data
• Standard: Even when Bayesian methods are computationally
intractable, they can provide a standard of optimal decision
making against which other methods can be measured
Bayes’ Theorem: Basics
• Total Probability Theorem: P(B) = Σ_{i=1}^{M} P(B | A_i) P(A_i)
• Bayes’ Theorem: P(H | X) = P(X | H) P(H) / P(X)
• Let X be a data sample (“evidence”): class label is unknown
• Let H be a hypothesis that X belongs to class C
• Classification is to determine P(H|X) (i.e., the posteriori probability): the
probability that the hypothesis holds given the observed data sample X
• P(H) (prior probability): the initial probability
• E.g., X will buy computer, regardless of age, income, …
• P(X): probability that sample data is observed
• P(X|H) (likelihood): the probability of observing the sample X, given that
the hypothesis holds
• E.g., Given that X will buy computer, the prob. that X is 31..40,
medium income
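A small numeric sketch of Bayes' theorem for this buying-a-computer setting (the prior, likelihood and evidence values below are made up for illustration):

# P(H): prior that a customer buys a computer (illustrative value)
p_h = 0.6
# P(X|H): likelihood of the profile "31..40, medium income" among buyers (illustrative)
p_x_given_h = 0.3
# P(X): overall probability of observing that profile (illustrative)
p_x = 0.25

# Bayes' theorem: P(H|X) = P(X|H) * P(H) / P(X)
p_h_given_x = p_x_given_h * p_h / p_x
print(p_h_given_x)   # 0.72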
Prediction Based on Bayes’ Theorem
• Given training data X, posteriori probability of a hypothesis H,
P(H|X), follows the Bayes’ theorem

P(H | X) = P(X | H) P(H) / P(X)
• Informally, this can be viewed as
posteriori = likelihood x prior/evidence
• Predicts X belongs to Ci iff the probability P(Ci|X) is the highest
among all the P(Ck|X) for all the k classes
• Practical difficulty: It requires initial knowledge of many
probabilities, involving significant computational cost

Classification Is to Derive the Maximum Posteriori
• Let D be a training set of tuples and their associated class labels,
and each tuple is represented by an n-D attribute vector X = (x1,
x2, …, xn)
• Suppose there are m classes C1, C2, …, Cm.
• Classification is to derive the maximum posteriori, i.e., the
maximal P(Ci|X)
• This can be derived from Bayes’ theorem
P(Ci | X) = P(X | Ci) P(Ci) / P(X)
• Since P(X) is constant for all classes, P(Ci | X) ∝ P(X | Ci) P(Ci), so only
P(X | Ci) P(Ci)
needs to be maximized

Naïve Bayes Classifier
• A simplified assumption: attributes are conditionally
independent (i.e., no dependence relation between attributes):
P(X | Ci) = ∏_{k=1}^{n} P(x_k | Ci) = P(x_1 | Ci) × P(x_2 | Ci) × … × P(x_n | Ci)

• This greatly reduces the computation cost: only the class distribution needs to
be counted
• If Ak is categorical, P(xk|Ci) is the # of tuples in Ci having value xk
for Ak divided by |Ci, D| (# of tuples of Ci in D)
• If Ak is continuous-valued, P(xk|Ci) is usually computed based on a Gaussian
distribution with mean μ and standard deviation σ:
g(x, μ, σ) = (1 / (√(2π) σ)) · exp( −(x − μ)² / (2σ²) )
and P(xk|Ci) is
P(xk | Ci) = g(xk, μ_Ci, σ_Ci)
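A minimal sketch of this Gaussian estimate for a continuous attribute (the mean, standard deviation and test value below are illustrative):

import math

def gaussian(x, mu, sigma):
    # Gaussian density g(x, mu, sigma), used as P(x_k | C_i) for continuous attributes
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

# Example: suppose the ages in class C_i have mean 38 and standard deviation 12
print(gaussian(35, 38.0, 12.0))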
Naïve Bayes Classifier: Training Dataset
Class: C1: buys_computer = ‘yes’; C2: buys_computer = ‘no’
Data to be classified: X = (age <= 30, income = medium, student = yes, credit_rating = fair)

age     income   student  credit_rating  buys_computer
<=30    high     no       fair           no
<=30    high     no       excellent      no
31…40   high     no       fair           yes
>40     medium   no       fair           yes
>40     low      yes      fair           yes
>40     low      yes      excellent      no
31…40   low      yes      excellent      yes
<=30    medium   no       fair           no
<=30    low      yes      fair           yes
>40     medium   yes      fair           yes
<=30    medium   yes      excellent      yes
31…40   medium   no       excellent      yes
31…40   high     yes      fair           yes
>40     medium   no       excellent      no
Naïve Bayes Classifier: An Example
• P(Ci):
P(buys_computer = “yes”) = 9/14 = 0.643
P(buys_computer = “no”) = 5/14 = 0.357
• Compute P(X|Ci) for each class:
P(age = “<=30” | buys_computer = “yes”) = 2/9 = 0.222
P(age = “<=30” | buys_computer = “no”) = 3/5 = 0.6
P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444
P(income = “medium” | buys_computer = “no”) = 2/5 = 0.4
P(student = “yes” | buys_computer = “yes”) = 6/9 = 0.667
P(student = “yes” | buys_computer = “no”) = 1/5 = 0.2
P(credit_rating = “fair” | buys_computer = “yes”) = 6/9 = 0.667
P(credit_rating = “fair” | buys_computer = “no”) = 2/5 = 0.4
• X = (age <= 30, income = medium, student = yes, credit_rating = fair)
P(X|Ci):
P(X | buys_computer = “yes”) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
P(X | buys_computer = “no”) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019
P(X|Ci) * P(Ci):
P(X | buys_computer = “yes”) * P(buys_computer = “yes”) = 0.044 x 0.643 = 0.028
P(X | buys_computer = “no”) * P(buys_computer = “no”) = 0.019 x 0.357 = 0.007
• Therefore, X belongs to class “buys_computer = yes”
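A plain-Python sketch that reproduces this computation from the 14 training tuples (attribute order: age, income, student, credit_rating; the last field is the class label):

from collections import Counter

data = [
    ("<=30", "high",   "no",  "fair",      "no"),
    ("<=30", "high",   "no",  "excellent", "no"),
    ("31…40", "high",  "no",  "fair",      "yes"),
    (">40",  "medium", "no",  "fair",      "yes"),
    (">40",  "low",    "yes", "fair",      "yes"),
    (">40",  "low",    "yes", "excellent", "no"),
    ("31…40", "low",   "yes", "excellent", "yes"),
    ("<=30", "medium", "no",  "fair",      "no"),
    ("<=30", "low",    "yes", "fair",      "yes"),
    (">40",  "medium", "yes", "fair",      "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31…40", "medium","no",  "excellent", "yes"),
    ("31…40", "high",  "yes", "fair",      "yes"),
    (">40",  "medium", "no",  "excellent", "no"),
]

X = ("<=30", "medium", "yes", "fair")   # tuple to classify

class_counts = Counter(row[-1] for row in data)
scores = {}
for c, n_c in class_counts.items():
    prior = n_c / len(data)                       # P(Ci)
    likelihood = 1.0
    for attr_index, value in enumerate(X):        # naive independence assumption
        matches = sum(1 for row in data if row[-1] == c and row[attr_index] == value)
        likelihood *= matches / n_c               # P(x_k | Ci)
    scores[c] = prior * likelihood                # P(X | Ci) * P(Ci)

print(scores)                       # yes: ~0.028, no: ~0.007
print(max(scores, key=scores.get))  # 'yes'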
Avoiding the Zero-Probability Problem
• Naïve Bayesian prediction requires each conditional probability to be
non-zero; otherwise, the predicted probability will be zero
P(X | Ci) = ∏_{k=1}^{n} P(x_k | Ci)
• Ex. Suppose a dataset with 1000 tuples, income=low (0),
income= medium (990), and income = high (10)
• Use Laplacian correction (or Laplacian estimator)
• Adding 1 to each case
Prob(income = low) = 1/1003
Prob(income = medium) = 991/1003
Prob(income = high) = 11/1003
• The “corrected” prob. estimates are close to their
“uncorrected” counterparts
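A minimal sketch of the Laplacian correction applied to the income counts in this example:

counts = {"low": 0, "medium": 990, "high": 10}   # raw counts from the 1000 tuples

# Add 1 to each value's count; the denominator grows by the number of distinct values
total = sum(counts.values()) + len(counts)       # 1000 + 3 = 1003
corrected = {value: (n + 1) / total for value, n in counts.items()}

print(corrected)   # low: 1/1003, medium: 991/1003, high: 11/1003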
Naïve Bayes Classifier: Comments
• Advantages
• Easy to implement
• Good results obtained in most of the cases
• Disadvantages
• Assumption: class conditional independence, therefore loss of
accuracy
• Practically, dependencies exist among variables
• E.g., in hospitals, patient data includes a profile (age, family history, etc.),
symptoms (fever, cough, etc.) and diseases (lung cancer, diabetes, etc.)
• Dependencies among these cannot be modeled by Naïve Bayes
Classifier
• How to deal with these dependencies? Bayesian Belief Networks