
Module -6

Boundary Description & Object Recognition


Topic : SVM and Classifier

MR. MRUGENDRA VASMATKAR


Assistant Professor, EXTC
VES Institute of Technology
References
1. Milan Sonka, Vaclav Hlavac, Roger Boyle, "Image Processing, Analysis, and Machine Vision", Cengage Engineering, 3rd Edition, 2013.
2. Digital Image Processing – Ramesh Jayaraman.
3. Digital Image Processing – Sreedhar.

Marks in Exam: 30 marks – Book 1

No numericals
Object Recognition
• An object is a physical unit, usually represented in image analysis and computer vision by a region in a segmented image.
• The set of objects can be divided into disjoint subsets that, from the classification point of view, have some common features; these subsets are called classes.
• Object recognition is based on assigning classes to objects, and the device that does these assignments is called the CLASSIFIER.
Pattern Recognition
• The classifier (similarly to a human) does not decide about the class from the object itself; rather, sensed object properties serve this purpose.
• Properties such as texture, specific weight, hardness, etc., are used instead. This sensed description of the object is called the PATTERN.
• Object recognition and pattern recognition are considered synonymous.
Classification in Euclidean Space
• A classifier is a partition of the feature space x into disjoint decision regions
• Each region has a label attached
• Regions with the same label need not be contiguous
• For a new test point, find which decision region it is in, and predict the corresponding label

• Decision boundaries = boundaries between decision regions

• The "dual representation" of decision regions

• We can characterize a classifier by the equations for its decision boundaries

• Learning a classifier = searching for the decision boundaries that optimize our objective function
A Simple Classifier: Minimum Distance Classifier
• Training
• Separate the training vectors by class
• Compute the mean of each class, m_k, k = 1, …, K

• Prediction
• Compute the closest mean to a test vector x' (using Euclidean distance)
• Predict the corresponding class

• In the 2-class case, the decision boundary is the hyperplane that lies halfway between the 2 means and is orthogonal to the line connecting them

• This is a very simple-minded classifier – it is easy to think of cases where it will not work very well
A Simple Classifier: Minimum Distance Classifier
[Figure: two-class training data plotted in the Feature 1 vs. Feature 2 plane]
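A minimal NumPy sketch of the training and prediction steps above; the feature values and labels are made up for illustration:

```python
import numpy as np

# Toy 2-D training data (hypothetical values for illustration)
X_train = np.array([[1.0, 2.0], [2.0, 1.5], [6.0, 7.0], [7.0, 6.5]])
y_train = np.array([0, 0, 1, 1])

# Training: compute the mean vector m_k of each class
means = {k: X_train[y_train == k].mean(axis=0) for k in np.unique(y_train)}

# Prediction: assign a test point to the class with the closest mean
def predict(x):
    return min(means, key=lambda k: np.linalg.norm(x - means[k]))

print(predict(np.array([2.0, 2.0])))  # -> 0 (closer to the class-0 mean)
print(predict(np.array([6.5, 6.0])))  # -> 1
```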
Linear Classifiers
• Linear classifier = single linear decision boundary (for the 2-class case)
• We can always represent a linear decision boundary by a linear equation:
w1·x1 + w2·x2 + … + wd·xd = Σj wj·xj = wᵀx = 0
• In d dimensions, this defines a (d − 1)-dimensional hyperplane
• d = 3: we get a plane; d = 2: we get a line

• For prediction we simply check whether Σj wj·xj > 0

• The wj are the weights (parameters)

• Learning consists of searching the d-dimensional weight space for the set of weights (the linear boundary) that minimizes an error measure
• A threshold can be introduced by a "dummy" feature that is always one; its weight corresponds to (the negative of) the threshold

• Note that a minimum distance classifier is a special (restricted) case of a linear classifier
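A small sketch of the prediction rule Σj wj·xj > 0, using the always-one "dummy" feature to absorb the threshold; the weights and test points are hypothetical:

```python
import numpy as np

# Hypothetical weights for a 2-D problem; w[0] is the weight of the
# dummy always-one feature, i.e. the negative of the threshold.
w = np.array([-6.0, 1.0, 1.0])   # boundary: x1 + x2 - 6 = 0

def predict(x):
    x_aug = np.r_[1.0, x]        # prepend the dummy feature
    return 1 if w @ x_aug > 0 else -1

print(predict([5.0, 4.0]))   # 5 + 4 - 6 > 0  -> class +1
print(predict([1.0, 2.0]))   # 1 + 2 - 6 < 0  -> class -1
```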
Linear Classifiers
[Figure: the same data in the Feature 1 vs. Feature 2 plane showing "A Possible Decision Boundary", and a second panel showing "Another Possible Decision Boundary"]
Classifier Principles
[Figure: Minimum Error Decision Boundary in the Feature 1 vs. Feature 2 plane]
Minimum Distance Classifier
[Figure: decision boundary of the minimum distance classifier for the same data]
INTRODUCTION
Think of a support vector machine as a "road machine" that separates the left-side and right-side cars, buildings, and pedestrians, and makes the lane as wide as possible. The cars and buildings that are closest to the road are the support vectors.
What is a Support Vector Machine (Classifier)?

1. The Support Vector Machine (the "road machine") is responsible for finding the decision boundary that separates the different classes and maximizes the margin.
2. Margins are the (perpendicular) distances between the line and the dots closest to the line.
Support Vector Machine (SVM)
• A Support Vector Machine (SVM) is a supervised machine learning algorithm that can be employed for both classification and regression purposes. SVMs are more commonly used in classification problems, and that is what we will focus on in this module.
Support Vectors

• Support vectors are the data points nearest to the hyperplane, the
points of a data set that, if removed, would alter the position of the
dividing hyperplane. Because of this, they can be considered the
critical elements of a data set.
SVM in linearly separable cases

Obviously, infinitely many lines exist that separate the red and green dots in the example above. SVM needs to find the optimal line under the constraint of classifying both classes correctly:

1. Follow the constraint: only look at the separating hyperplanes (e.g. separating lines), i.e. hyperplanes that classify the classes correctly

2. Conduct optimization: pick the one that maximizes the margin


What is a Hyperplane?
• As a simple example, for a classification task with only two features
(like the image above), you can think of a hyperplane as a line that
linearly separates and classifies a set of data.
• Intuitively, the further from the hyperplane our data points lie, the
more confident we are that they have been correctly classified. We
therefore want our data points to be as far away from the hyperplane
as possible, while still being on the correct side of it.
• So when new testing data is added, whichever side of the hyperplane it lands on decides the class that we assign to it.
What is a Hyperplane?

A hyperplane is an (n − 1)-dimensional subspace of an n-dimensional space.

1. For a 2-dimensional space, its hyperplane is 1-dimensional, which is just a line.
2. For a 3-dimensional space, its hyperplane is 2-dimensional, which is a plane that slices the cube.
Finding the Right Hyperplane
Mathematical Expression
Mathematical Expression
What is a Separating Hyperplane?
• Assuming the label y is either 1 (for green) or −1 (for red),
• all three lines below are separating hyperplanes,
• because they all share the same property: above the line is green; below the line is red.
What is a Separating Hyperplane?

Combined Rule
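The combined rule is commonly written as y·(wᵀx + b) > 0 for every training point: the label's sign and the side of the hyperplane must agree. A small check of this condition on a hypothetical hyperplane and made-up points:

```python
import numpy as np

# Hypothetical separating hyperplane w.x + b = 0 and labelled points
w, b = np.array([1.0, 1.0]), -3.0
X = np.array([[0.5, 0.5], [1.0, 1.0], [3.0, 2.0], [4.0, 3.0]])
y = np.array([-1, -1, 1, 1])   # -1 = red, +1 = green

# Combined rule: the hyperplane separates the data iff y_i * (w.x_i + b) > 0 for all i
is_separating = np.all(y * (X @ w + b) > 0)
print(is_separating)   # True for this particular hyperplane
```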
What is margin?
• Let's say we have a hyperplane – line X
• Calculate the perpendicular distance from each of the 40 dots to line X; that gives 40 different distances
• Out of the 40, the smallest distance is our margin!
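Under the usual formulation, the perpendicular distance from a dot xi to the line wᵀx + b = 0 is |wᵀxi + b| / ||w||, and the margin is the smallest such distance. A sketch with a hypothetical line and made-up dots:

```python
import numpy as np

w, b = np.array([1.0, 1.0]), -3.0          # hypothetical line X: x1 + x2 - 3 = 0
points = np.array([[0.5, 0.5], [1.0, 1.0], [3.0, 2.0], [4.0, 3.0]])

# Perpendicular distance of every dot to the line, then take the smallest
distances = np.abs(points @ w + b) / np.linalg.norm(w)
print(distances.min())   # the margin of this particular line
```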
What is margin?
• The distance from either dashed line to the solid line is the margin. We can think of this optimal line as the mid-line of the widest strip we can possibly fit between the red and green dots.
To sum up, SVM in the linearly separable case:
1. Constrain/ensure that each observation is on the correct side of the hyperplane
2. Pick the optimal line so that the distance from the closest dots to the hyperplane, the so-called margin, is maximized
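In scikit-learn, both steps are carried out by fitting an SVC with a linear kernel; a minimal sketch on made-up, linearly separable data (the very large C approximates a hard margin):

```python
import numpy as np
from sklearn.svm import SVC

# Made-up, linearly separable 2-D data
X = np.array([[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel='linear', C=1e6)     # very large C ~ hard margin
clf.fit(X, y)

print(clf.support_vectors_)           # the dots that define the margin
print(clf.predict([[2, 2], [6, 6]]))  # -> [0, 1]
```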
SVM in linearly non-separable cases

• In the linearly separable case, SVM tries to find the hyperplane that maximizes the margin, with the condition that both classes are classified correctly.
• In reality, datasets are probably never linearly separable, so the condition of being 100% correctly classified by a hyperplane will never be met.
• SVM addresses non-linearly separable cases by introducing two concepts: Soft Margin and Kernel Tricks.
No Clear Hyperplane
Non-Separable Hyperplane
[Figure: an example of linearly non-separable data]
Example
Solution
• Soft Margin: try to find a line to separate the classes, but tolerate one or a few misclassified dots (e.g. the dots circled in red)
• Kernel Trick: try to find a non-linear decision boundary
Soft Margin
• Two types of misclassification are tolerated by SVM under soft margin:

1. The dot is on the wrong side of the decision boundary but on the correct side of (or on) the margin
2. The dot is on the wrong side of the decision boundary and on the wrong side of the margin
Degree of tolerance

Applying Soft Margin, SVM tolerates a few dots being misclassified and tries to balance the trade-off between finding a line that maximizes the margin and minimizing the misclassification.

How much tolerance (softness) we want to allow when finding the decision boundary is an important hyper-parameter for the SVM (both linear and nonlinear solutions). In Sklearn, it is represented as the penalty term 'C'.
The bigger C is, the more penalty SVM gets when it makes a misclassification; therefore, the narrower the margin is and the fewer support vectors the decision boundary will depend on.
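A sketch of how the penalty term C is passed to scikit-learn's SVC; the overlapping-blobs dataset is just an illustration:

```python
from sklearn.svm import SVC
from sklearn.datasets import make_blobs

# Two overlapping blobs, so a few dots will always be misclassified
X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.5, random_state=0)

soft = SVC(kernel='linear', C=0.01).fit(X, y)   # wide margin, more tolerance
hard = SVC(kernel='linear', C=100).fit(X, y)    # narrow margin, less tolerance

print(len(soft.support_))   # typically more support vectors than...
print(len(hard.support_))   # ...the large-C model
```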
Kernel Trick

• What the Kernel Trick does is utilize the existing features, apply some transformations, and create new features.
• Those new features are the key for SVM to find the nonlinear decision boundary.
• We can choose 'linear', 'poly', 'rbf', 'sigmoid', 'precomputed' or a callable as our kernel/transformation.
[Figure: decision boundaries produced by the 'linear', 'poly', and 'rbf' kernels]
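The transformation is selected through the kernel argument of scikit-learn's SVC; a brief sketch on a made-up dataset of concentric circles:

```python
from sklearn.svm import SVC
from sklearn.datasets import make_circles

# Concentric circles: not separable by any straight line
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

for kernel in ['linear', 'poly', 'rbf', 'sigmoid']:
    clf = SVC(kernel=kernel).fit(X, y)
    # Compare how well each kernel fits this data (rbf typically does best here)
    print(kernel, clf.score(X, y))
```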
Polynomial Kernel
• Think of the polynomial kernel as a transformer/processor to generate
new features by applying the polynomial combination of all the
existing features.
Polynomial Kernel

Existing feature: X = np.array([-2, -1, 0, 1, 2])
Label: Y = np.array([1, 1, 0, 1, 1])
With only this feature, it is impossible for us to find a line that separates the yellow (1) and purple (0) dots.

But if we apply the transformation X² we get:
New feature: X² = np.array([4, 1, 0, 1, 4])
By combining the existing feature and the new feature, we can certainly draw a line to separate the yellow and purple dots.

A support vector machine with a polynomial kernel can generate a non-linear decision boundary using those polynomial features.
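The one-dimensional example above can be reproduced directly in NumPy: the original feature X is not separable by a single threshold, but the transformed feature X² is (e.g. at X² = 0.5):

```python
import numpy as np

X = np.array([-2, -1, 0, 1, 2])      # existing feature (from the slide)
Y = np.array([ 1,  1, 0, 1, 1])      # labels
X_new = X ** 2                       # new feature: [4, 1, 0, 1, 4]

# In the (X, X**2) plane the classes are separable, e.g. by the line X**2 = 0.5
pred = (X_new > 0.5).astype(int)
print(np.array_equal(pred, Y))       # True: every dot is classified correctly
```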
Radial Basis Function (RBF) Kernel
• The Radial Basis Function kernel acts as a transformer/processor that generates new features by measuring the distance between all other dots and a specific dot (or dots), the centers. The most popular/basic RBF kernel is the Gaussian Radial Basis Function.
• Gamma (γ) controls the influence of the new features Φ(x, center) on the decision boundary.
• The higher the gamma, the more influence the features will have on the decision boundary, and the more wiggly the boundary will be.
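A minimal sketch of the Gaussian RBF feature, assuming the standard form Φ(x, center) = exp(−γ·||x − center||²), with a made-up center; it shows how γ controls how quickly a dot's influence falls off with distance:

```python
import numpy as np

def rbf_feature(x, center, gamma):
    """Gaussian RBF: exp(-gamma * ||x - center||^2)."""
    return np.exp(-gamma * np.sum((x - center) ** 2))

x, center = np.array([2.0, 0.0]), np.array([0.0, 0.0])
for gamma in [0.1, 1.0, 10.0]:
    # Larger gamma -> the feature decays faster away from the center,
    # so each training point influences a smaller neighbourhood
    print(gamma, rbf_feature(x, center, gamma))
```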
Radial Basis Function (RBF) Kernel
[Figure: illustration of the RBF kernel]
Effect of Gamma
• Similar to the penalty term C in the soft margin, Gamma is a hyperparameter that we can tune when we use SVM with a kernel.
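Both hyper-parameters can be tuned together, for example with a cross-validated grid search; the dataset and the grid values below are only an illustration:

```python
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_moons

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)   # toy non-linear data

param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [0.01, 0.1, 1, 10]}
search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)   # the (C, gamma) pair with the best cross-validated score
```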
To sum up, SVM in the linearly non-separable case:
• By combining the soft margin (tolerance of misclassifications) and the kernel trick, the Support Vector Machine is able to construct a decision boundary for linearly non-separable cases.
• Hyper-parameters like C and Gamma control how wiggly the SVM decision boundary can be:
• the higher the C, the more penalty SVM is given when it misclassifies, and therefore the less wiggly the decision boundary will be
• the higher the gamma, the more influence the feature data points will have on the decision boundary, and thereby the more wiggly the boundary will be
Pros & Cons of Support Vector
Machines
• Pros
• Accuracy
• Works well on smaller cleaner datasets
• It can be more efficient because it uses a subset of training points
• Cons
• Isn’t suited to larger datasets as the training time with SVMs can be
high
• Less effective on noisier datasets with overlapping classes
Applications
• Text classification tasks such as category assignment, detecting spam
and sentiment analysis.
• Image recognition challenges
• Aspect-based recognition and color-based classification.
• Handwritten digit recognition, such as postal automation services.
Thank You
