
Support Vector Machine for Classification
Instructor: Seunghoon Hong
Recap: image representation

Recap: non-parametric approach for classification
[Figure: training images (cat, dog) compared against a test image]
1. Compute the distance between feature vectors
2. Find the nearest neighbor in the training data
3. Use the nearest neighbor's label
Recap: parametric approach for classification
● The nearest neighbor algorithm is a specific instantiation of a non-parametric model for classification.
● As an alternative, we can parameterize a decision function and learn its parameters from the training data (see the sketch below).

[Figure: nearest neighbor (non-parametric model) vs. a linear model with learned parameters (parametric model)]
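For intuition, here is a minimal NumPy sketch of the two approaches; the data, shapes, and variable names are illustrative and not taken from the slides.

```python
import numpy as np

# Toy data: 100 "training images" as feature vectors, plus one test image.
X_train = np.random.randn(100, 4096)
y_train = np.random.randint(0, 2, size=100)         # 0 = dog, 1 = cat (illustrative)
x_test = np.random.randn(4096)

# Non-parametric (nearest neighbor): the "model" is the whole training set.
dists = np.linalg.norm(X_train - x_test, axis=1)    # 1. distance to every training sample
nn_label = y_train[np.argmin(dists)]                # 2-3. take the nearest neighbor's label

# Parametric (linear model): the model is just the parameters W, b learned from
# the training data; the training set itself is no longer needed at test time.
W = np.random.randn(2, 4096)                        # placeholder values; learned in practice
b = np.random.randn(2)
linear_label = np.argmax(W @ x_test + b)
```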
Today’s agenda
● Support Vector Machine (SVM)
Example: separable 2D data

[Figure: 2D scatter of positive and negative samples]
Example: determining a good classifier

[Figure: several linear decision boundaries, each separating the positive and negative samples]

All decision boundaries lead to perfect classification.
→ which boundary is better?
Example: determining a good classifier

Issues? The decision boundary is awkwardly close to the negative samples
→ it may not generalize well to unseen examples.
Support Vector Machine (SVM)
● Max-margin classifier: maximizing the margin of the classifier will lead to better generalization.

[Figure: positive and negative samples with the max-margin decision boundary and its margin]
Support Vector Machine (SVM)
● Let’s assume that we have a set of linearly separable data
Support Vector Machine (SVM)
● Our decision rule for a sample xᵢ with label yᵢ ∈ {+1, −1}:
  w·xᵢ + b ≥ +1  if  yᵢ = +1
  w·xᵢ + b ≤ −1  if  yᵢ = −1

  Note that we can generalize these rules to an arbitrarily large constant (i.e., C instead of 1), but we set the constant to 1 for mathematical convenience.
Support Vector Machine (SVM)
● Our decision rule: yᵢ(w·xᵢ + b) ≥ 1
● For the samples closest to the decision boundary (i.e., the support vectors), the constraint is tight:
  yᵢ(w·xᵢ + b) = 1
Support Vector Machine (SVM)
● Let x⁺ and x⁻ be a positive and a negative support vector.
● Quantifying the margin: project the difference between the positive and negative support vectors onto the unit normal of the decision plane,
  margin = (x⁺ − x⁻) · w / ||w||
Support Vector Machine (SVM)
● Quantifying the margin: by the definition of support vectors, w·x⁺ + b = +1 and w·x⁻ + b = −1, so
  margin = (x⁺ − x⁻) · w / ||w|| = ((1 − b) − (−1 − b)) / ||w|| = 2 / ||w||
Support Vector Machine (SVM)
● Problem of maximizing the margin:
  max 2 / ||w||  subject to  yᵢ(w·xᵢ + b) ≥ 1 for all i
● Learning objective (max-margin classifier; SVM):
  min ½ ||w||²  subject to  yᵢ(w·xᵢ + b) ≥ 1 for all i
Support Vector Machine (SVM)
● Lagrangian formulation
○ Incorporate the constraints using Lagrange multipliers αᵢ ≥ 0:
  L(w, b, α) = ½ ||w||² − Σᵢ αᵢ [yᵢ(w·xᵢ + b) − 1]

What conditions should the optimal w, b satisfy?
Support Vector Machine (SVM)
● From the optimality condition ∂L/∂w = 0:
  w = Σᵢ αᵢ yᵢ xᵢ
  → the (optimal) decision boundary is computed by a linear combination of the data!
Support Vector Machine (SVM)
● From the optimality condition ∂L/∂b = 0:
  Σᵢ αᵢ yᵢ = 0
Support Vector Machine (SVM)
● From the optimality conditions: w = Σᵢ αᵢ yᵢ xᵢ and Σᵢ αᵢ yᵢ = 0.

Rewrite the original objective by exploiting these conditions:
  L = ½ ||w||² − Σᵢ αᵢ [yᵢ(w·xᵢ + b) − 1]
    = Σᵢ αᵢ − ½ Σᵢ Σⱼ αᵢ αⱼ yᵢ yⱼ (xᵢ·xⱼ)
Support Vector Machine (SVM)
● Dual form of the SVM objective:
  max over α of  Σᵢ αᵢ − ½ Σᵢ Σⱼ αᵢ αⱼ yᵢ yⱼ (xᵢ·xⱼ)  subject to  αᵢ ≥ 0 and Σᵢ αᵢ yᵢ = 0

● This is a convex quadratic program (quadratic objective, linear constraints).
● We can find the solution using any Quadratic Programming (QP) solver, as sketched below.
● The obtained solution is the global optimum (no local optima), so the optimality of the solution is always guaranteed.
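Since the dual is a small quadratic program, it can be handed to a generic QP solver. Below is a minimal sketch for the hard-margin, linearly separable case using the cvxopt package (an assumption; any QP solver works), with an illustrative function name and support-vector threshold.

```python
import numpy as np
from cvxopt import matrix, solvers

def fit_linear_svm_dual(X, y):
    """X: (n, d) features; y: (n,) labels in {-1, +1}, assumed linearly separable."""
    n = X.shape[0]
    y = y.astype(float)
    K = X @ X.T                              # Gram matrix of inner products x_i . x_j
    P = matrix(np.outer(y, y) * K)           # quadratic term  y_i y_j (x_i . x_j)
    q = matrix(-np.ones(n))                  # maximize sum(alpha)  ->  minimize -sum(alpha)
    G = matrix(-np.eye(n))                   # -alpha_i <= 0,  i.e.  alpha_i >= 0
    h = matrix(np.zeros(n))
    A = matrix(y.reshape(1, -1))             # equality constraint  sum_i alpha_i y_i = 0
    b = matrix(0.0)
    alpha = np.ravel(solvers.qp(P, q, G, h, A, b)["x"])

    w = (alpha * y) @ X                      # w = sum_i alpha_i y_i x_i
    sv = alpha > 1e-6                        # support vectors: alpha_i > 0
    bias = np.mean(y[sv] - X[sv] @ w)        # average over support vectors for stability
    return w, bias, alpha
```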
Support Vector Machine (SVM)
● Parameters for the max-margin hyperplane
○ Weight coefficient: w = Σᵢ αᵢ yᵢ xᵢ
  ■ Any data point with αᵢ = 0 will not contribute to w.
  ■ It turns out that only the support vectors have αᵢ > 0.
○ Bias parameter: b = yᵢ − w·xᵢ for any support vector xᵢ
  ■ This follows from the fact that yᵢ(w·xᵢ + b) = 1 for all support vectors.
  ■ We usually take the average over all support vectors for numerical stability.
Support Vector Machine (SVM)
● Testing: classify a new sample x by the sign of the decision function,
  f(x) = sign(w·x + b) = sign(Σᵢ αᵢ yᵢ (xᵢ·x) + b),
  where the sum effectively runs over the support vectors only (see the sketch below).
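As a quick check of this view of testing, the following scikit-learn sketch (library availability assumed; data is synthetic) recomputes the decision value from the fitted model's support vectors and the stored products αᵢyᵢ (dual_coef_), and compares it with decision_function.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=60, centers=2, random_state=0)
clf = SVC(kernel="linear", C=1e6).fit(X, y)     # very large C approximates the hard margin

x_test = X[:1]
# dual_coef_ holds alpha_i * y_i for each support vector
manual = (clf.dual_coef_ @ (clf.support_vectors_ @ x_test.T) + clf.intercept_).item()
print(np.isclose(manual, clf.decision_function(x_test)[0]))   # True
print(clf.predict(x_test))                                    # predicted class label
```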
Linear SVM, Non-separable case
● Soft margin
○ Introduce slack variables ξᵢ ≥ 0.
○ Allow a training example to lie within the margin or even on the wrong side of the linear separator.

● New objective function with slack variables:
  min ½ ||w||² + C Σᵢ ξᵢ  subject to  yᵢ(w·xᵢ + b) ≥ 1 − ξᵢ and ξᵢ ≥ 0,
  where C weighs the margin against the slack penalty (see the sketch below).
Non-linear SVM
● So far, we have assumed that our data is (almost) linearly separable.
● What if our data is not linearly separable?

Non-linear SVM
● We want to map the data from the original input space to some higher-dimensional space where separating the training data with a linear classifier is much easier.

[Figure: projection of the data to a higher-dimensional space, followed by a linear SVM]
Non-linear SVM

How do we design the mapping φ such that it maps the data to a linearly separable space? (See the toy example below.)
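As a toy illustration (NumPy and scikit-learn assumed; the data and the hand-picked feature map are for illustration only), 1-D data that is not linearly separable becomes separable after lifting it with φ(x) = (x, x²):

```python
import numpy as np
from sklearn.svm import SVC

# Class 0 lies near the origin, class 1 lies far from it: no single threshold on x separates them.
x = np.concatenate([np.random.uniform(-1, 1, 50),
                    np.random.uniform(2, 3, 25),
                    np.random.uniform(-3, -2, 25)])
y = np.array([0] * 50 + [1] * 50)

X_lifted = np.stack([x, x ** 2], axis=1)        # phi(x) = (x, x^2)
clf = SVC(kernel="linear").fit(X_lifted, y)
print(clf.score(X_lifted, y))                   # 1.0: separable in the lifted space
```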
Kernel method in SVM
● Kernel trick: define a kernel as a dot product between the mapped features,
  k(xᵢ, xⱼ) = φ(xᵢ)·φ(xⱼ)
● The SVM decision function can then be written as:
  f(x) = sign(Σᵢ αᵢ yᵢ k(xᵢ, x) + b)
● This allows us to just define the kernel k without knowing the explicit form of the mapping function φ!
Widely-used kernels
● Linear kernel: k(x, y) = x·y
● Polynomial kernel: k(x, y) = (x·y + c)^d
● Gaussian (Radial Basis Function, RBF) kernel: k(x, y) = exp(−||x − y||² / 2σ²)
● Histogram intersection kernel: k(x, y) = Σᵢ min(xᵢ, yᵢ)
● And many others... (see the sketches below)
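For concreteness, the kernels above can be written as plain functions of two feature vectors; this NumPy sketch uses common parameterizations (the names c, d, and sigma and their default values are illustrative and vary across libraries).

```python
import numpy as np

def linear_kernel(x, y):
    return x @ y                                              # <x, y>

def polynomial_kernel(x, y, c=1.0, d=3):
    return (x @ y + c) ** d                                   # (<x, y> + c)^d

def rbf_kernel(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))   # exp(-||x - y||^2 / 2 sigma^2)

def histogram_intersection_kernel(x, y):
    return np.sum(np.minimum(x, y))                           # sum_i min(x_i, y_i), for histogram features
```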
Non-linear SVM
● Optimizing the SVM objective with a kernel: the dual depends on the data only through inner products!

❏ Dual form of linear SVM:
  max over α of  Σᵢ αᵢ − ½ Σᵢ Σⱼ αᵢ αⱼ yᵢ yⱼ (xᵢ·xⱼ)
❏ SVM with kernel: replace the inner product with the kernel,
  max over α of  Σᵢ αᵢ − ½ Σᵢ Σⱼ αᵢ αⱼ yᵢ yⱼ k(xᵢ, xⱼ)

● The same optimization techniques can be used to solve the kernel SVM.
● We only need to know the kernel k; we don't have to know the mapping φ explicitly (see the sketch below).
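A scikit-learn sketch (library assumed; the data, kernel width, and helper name are illustrative) that makes this point operational: the SVM is trained on a precomputed Gram matrix, so only kernel values are ever supplied and the mapping φ never appears.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_moons

X, y = make_moons(n_samples=200, noise=0.1, random_state=0)   # not linearly separable

def rbf_gram(A, B, sigma=0.5):
    # k(a, b) = exp(-||a - b||^2 / (2 sigma^2)) for every pair of rows of A and B
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

clf = SVC(kernel="precomputed").fit(rbf_gram(X, X), y)
print(clf.score(rbf_gram(X, X), y))    # at test time, pass k(test points, training points)
```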
Non-linear SVM
● Linear vs. non-linear SVM
Hyper-parameter tuning
● Hyper-parameters
○ The weight C on the sum of slack variables in the soft-margin SVM
○ Kernel parameters (e.g., the polynomial degree d or the RBF width σ)

● How do we select appropriate values for the hyper-parameters?
Cross-validation
● A naïve approach
○ Select hyper-parameter values that minimize the training error.
○ OVERFITTING!

● A better approach: cross-validation (see the sketch below)
1. Divide the training dataset into K parts.
2. Set aside one of the parts for validation.
3. Learn SVMs with the remaining K−1 parts, varying the hyper-parameters.
4. Evaluate the errors of the learned models on the validation set.
5. Repeat steps 2–4 and compute the mean error per hyper-parameter set.
6. Select the hyper-parameter set with the lowest mean error.
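A compact sketch of this procedure with scikit-learn (assumed installed), tuning the slack weight C and the RBF parameter gamma by 5-fold cross-validation; the synthetic data and grid values are illustrative.

```python
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_moons

X_train, y_train = make_moons(n_samples=300, noise=0.2, random_state=0)

param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1, 10]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)   # 5-fold cross-validation
search.fit(X_train, y_train)

print(search.best_params_)   # hyper-parameter set with the lowest mean validation error
print(search.best_score_)    # its mean validation accuracy
```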
Multi-class SVM
● One-versus-all
○ Training: learn an SVM for each class vs. all the others.
○ Testing: apply each SVM to the test example and assign the class of the SVM that returns the highest decision value.

● One-versus-one
○ Training: learn an SVM for each pair of classes.
○ Testing: each learned SVM "votes" for a class to assign to the test example.

Both strategies are sketched below.
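The wrappers in scikit-learn (assumed installed) implement both strategies directly; the toy data and the choice of LinearSVC as the base binary classifier are illustrative.

```python
from sklearn.svm import LinearSVC
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=150, centers=3, random_state=0)   # 3-class toy problem

ovr = OneVsRestClassifier(LinearSVC()).fit(X, y)   # one SVM per class vs. the rest
ovo = OneVsOneClassifier(LinearSVC()).fit(X, y)    # one SVM per pair of classes

print(len(ovr.estimators_), len(ovo.estimators_))  # 3 binary SVMs vs. 3 = C(3,2) pairwise SVMs
print(ovr.predict(X[:5]), ovo.predict(X[:5]))
```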
SVM Resources
● References
○ C. Cortes and V. Vapnik, Support-vector networks, Machine Learning 20(3): 273–297, 1995.
○ N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-based Learning Methods, Cambridge University Press, Cambridge, 2000.
○ B. Schölkopf and A. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, MIT Press, 2002.

● Libraries and software packages


○ LIBSVM: http://www.csie.ntu.edu.tw/~cjlin/libsvm/
○ LIBLINEAR: http://www.csie.ntu.edu.tw/~cjlin/liblinear/
○ SVM light: http://svmlight.joachims.org/
Questions?
Review: image classification pipeline

[Figure: image → feature extractor → classifier]
Next
● Introduction to neural networks
