Support Vector Machines
Mehul Motani
Electrical & Computer Engineering
National University of Singapore
Email: [email protected]
© Mehul Motani Support Vector Machines 1
AI and Machine Learning
• “AI is the new electricity. Just as 100 years ago
electricity transformed industry after industry, AI will
now do the same.” – Andrew Ng
• Machine learning is about learning from data.
• Supervised learning – Learning from data with labels
which serve a supervisory purpose
• Unsupervised learning – Learning from data without
labels, which allows tasks such as clustering.
• Reinforcement learning – Learning from data without
labels, but with feedback (rewards) from the environment.
© Mehul Motani Support Vector Machines 2
Support Vector Machines (SVM)
• SVM is a supervised learning algorithm
– Useful for both classification and regression problems
• Linear SVM – Maximum-Margin Classifier
– Formalize notion of the best linear separator
• Optimization Problem with Lagrangian Multipliers
– Technique to solve a constrained optimization problem
• Nonlinear SVM – Extending Linear SVM with Kernels
– Project data into higher-dimensional space to make it
linearly separable.
– Create nonlinear classifiers by applying the kernel trick to
maximum-margin hyperplanes.
– Complexity: Depends only on the number of training
examples, not on dimensionality of the kernel space!
© Mehul Motani Support Vector Machines 3
SVM – A Brief History
• Pre-1980: Almost all learning methods learned linear decision
surfaces.
– Linear learning methods have nice theoretical properties
• 1980’s: Decision trees and Neural Nets allowed efficient
learning of non-linear decision surfaces
– Little theoretical basis and all suffer from local minima
• 1990’s: Efficient learning algorithms for non-linear functions
based on computational learning theory developed
• Support Vector Machines
– The original SVM algorithm was invented by Vapnik and
Chervonenkis in 1963.
– Nonlinear SVMs using the kernel trick were first introduced in a
conference paper by Boser, Guyon and Vapnik in 1992.
– The SVM with soft margin was proposed by Cortes and Vapnik in
1993 and published in 1995.
© Mehul Motani Support Vector Machines 4
Supervised Learning: Linear Separators
• Binary classification can be viewed as the task of
separating classes in feature space.
• Two features: x1 and x2
• Two classes: red and blue
• Linear separator given by the line:
wTx + b = 0 (1)
where x = [x1 x2]T and w = [w1 w2]T
• Classification
wTx + b < 0 → blue (-1)
wTx + b > 0 → red (+1)
• Classifier function
f(x) = sign(wTx + b)
• New data: Green dot will be classified as red class (+1)
[Figure: data points of the two classes in the (x1, x2) plane, separated by the line wTx + b = 0, with the regions wTx + b > 0 and wTx + b < 0 on either side]
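A minimal NumPy sketch of this classifier; the weights w and b below are illustrative values, not learned ones:

```python
import numpy as np

# Hypothetical separator parameters (in practice these are learned by the SVM).
w = np.array([2.0, -1.0])   # weight vector [w1, w2]
b = -0.5                    # bias term

def classify(x):
    """Return +1 (red) or -1 (blue) depending on which side of w^T x + b = 0 the point lies."""
    return 1 if np.dot(w, x) + b > 0 else -1

# A new data point (the "green dot") gets assigned to one of the two classes.
x_new = np.array([1.5, 0.5])
print(classify(x_new))   # sign(w^T x + b)
```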
© Mehul Motani Support Vector Machines 5
Linear Separators
• There are many possible linear separators!
• Which of the linear separators is optimal?
• The linear SVM solution defines an objective and finds the
linear separator which maximizes that objective.
© Mehul Motani Support Vector Machines 6
Linear Separators and Margin
• Consider three new data points: A, B, C (green dots), all of which are classified as class 0.
• How confident are you that point A is class 0?
• What about point C?
• What about point B?
• Intuitively, we are more confident about point A than point C.
• Intuition: if a point is far from the separating hyperplane (i.e., large margin), then we may be more confident in our prediction.
[Figure: points from Class 0 and Class 1 separated by a hyperplane, with new points A, B, and C at decreasing distances from the boundary]
© Mehul Motani Support Vector Machines 7
Classification Margin
• Data points closest to the hyperplane are called the support
vectors (circled data points)
• The margin ρ of the separator is the width of the band between the support vectors of the two classes.
• Note that the separator is completely defined by its support vectors.
What is the distance, r, from data point xi to the separator?
For x1 and x2 on the separating hyperplane:
wT(x1 − x2) = 0 ⇒ w ⊥ hyperplane (1)
Let Q be the projection of xi onto the hyperplane, i.e., Q = xi − r w/||w||. Since Q lies on the hyperplane:
wT(xi − r w/||w||) + b = 0 ⇒ r = (wTxi + b) / ||w|| (2)
[Figure: separating hyperplane with the support vectors circled, the margin ρ, and the distance r from point xi to the hyperplane]
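A short numerical check of equation (2), using made-up values for w, b, and xi:

```python
import numpy as np

# Illustrative (not learned) hyperplane parameters and a data point.
w = np.array([3.0, 4.0])    # so that ||w|| = 5
b = 1.0
x_i = np.array([2.0, 1.0])

# Distance from x_i to the hyperplane w^T x + b = 0, per equation (2).
r = (np.dot(w, x_i) + b) / np.linalg.norm(w)
print(r)   # (3*2 + 4*1 + 1) / 5 = 2.2

# Sanity check: the projection Q = x_i - r * w/||w|| lies on the hyperplane.
Q = x_i - r * w / np.linalg.norm(w)
print(np.dot(w, Q) + b)     # ~0 up to floating-point error
```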
© Mehul Motani Support Vector Machines 8
Maximum Margin Classification
• Maximizing the margin is provably good and intuitive
– Larger margin leads to lower generalization error (Vapnik).
• Implies that only support vectors matter; other training
examples can be ignored → SVM is stable and robust to outliers
[Figure: maximum-margin separator with margin ρ between the two classes]
© Mehul Motani Support Vector Machines 9
Linear SVM Mathematically
• Let the training set be S = {(xi, yi)}i=1,2,...,n with xi ∈ Rd and yi ∈ {-1, 1}.
• Suppose we have a separating hyperplane with margin ρ,
weight vector w and scalar b.
• Then for each training example (xi, yi):
wTxi + b ≤ -ρ/2 if yi = -1
wTxi + b ≥ ρ/2 if yi = +1    ⇔    yi(wTxi + b) ≥ ρ/2 (1)
• For every support vector xs the above inequality is an equality.
After rescaling w and b by ρ/2 in the equality, we obtain that the
distance between each xs and the hyperplane is
r = ys(wTxs + b) / ||w|| = 1 / ||w|| (2)
• Then the margin can be expressed through the (rescaled) w and b as:
ρ = 2r = 2 / ||w|| (3)
© Mehul Motani Support Vector Machines 10
Linear SVMs Mathematically (cont.)
• Then we can formulate the quadratic optimization problem:
"
Find w and b such that ! = #
is maximized (a)
(1)
and for all (xi, yi) ∈ S: yi (wTxi + b) ≥ 1 (b)
• We can reformulate the problem in (1) as follows:
Find w and b such that
Φ(w) = ||w||² = wTw is minimized (2)
and for all (xi, yi) ∈ S: yi (wTxi + b) ≥ 1
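One way to see the maximum-margin solution in practice is scikit-learn's linear-kernel SVC with a very large C, which closely approximates the hard-margin problem in (2). A sketch, assuming scikit-learn is installed; the toy data is made up:

```python
import numpy as np
from sklearn.svm import SVC

# Tiny linearly separable toy set: class +1 on the right, class -1 on the left.
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 1.0],
              [-2.0, -2.0], [-3.0, -1.0], [-1.5, -2.5]])
y = np.array([1, 1, 1, -1, -1, -1])

# Very large C ~ hard margin: misclassifications are (almost) forbidden.
clf = SVC(kernel='linear', C=1e10).fit(X, y)

w = clf.coef_[0]          # learned weight vector
b = clf.intercept_[0]     # learned bias
print("w =", w, " b =", b)
print("margin = 2/||w|| =", 2 / np.linalg.norm(w))
print("support vectors:", clf.support_vectors_)
```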
© Mehul Motani Support Vector Machines 11
Solving the Optimization Problem
Primal: Find w and b such that
Φ(w) =wTw is minimized (1)
and for all (xi, yi) ∈ S: yi (wTxi + b) ≥ 1
• Need to optimize a quadratic function subject to linear constraints.
• Quadratic optimization problems are a well-known class of mathematical
programming problems for which several (non-trivial) algorithms exist.
• Solution involves constructing a dual problem where a Lagrange multiplier αi
is associated with every inequality constraint in the primal problem:
Dual: Find α1…αn such that
Q(α) =Σαi - ½ΣΣαiαjyiyjxiTxj is maximized and
(1) Σαiyi = 0 (2)
(2) αi ≥ 0 for all αi
See https://en.wikipedia.org/wiki/Quadratic_programming
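The dual can also be handed to a general-purpose constrained optimizer. The sketch below uses SciPy's SLSQP, which handles the equality constraint Σαiyi = 0 and the bounds αi ≥ 0; the toy data is made up, and real SVM solvers exploit the QP structure rather than calling a generic routine like this:

```python
import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data (made up for illustration).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n = len(y)

# H[i, j] = y_i y_j x_i^T x_j, the matrix appearing in the dual objective.
H = (y[:, None] * X) @ (y[:, None] * X).T

def neg_dual(alpha):
    # Negative of Q(alpha) = sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j x_i^T x_j
    return -(alpha.sum() - 0.5 * alpha @ H @ alpha)

constraints = {'type': 'eq', 'fun': lambda a: a @ y}   # sum_i alpha_i y_i = 0
bounds = [(0, None)] * n                               # alpha_i >= 0

res = minimize(neg_dual, x0=np.zeros(n), method='SLSQP',
               bounds=bounds, constraints=constraints)
alpha = res.x
print("alphas:", np.round(alpha, 4))   # non-zero entries mark the support vectors
```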
© Mehul Motani Support Vector Machines 12
The Optimization Problem Solution
• Given a solution α1…αn to the dual problem, solution to the primal is:
w =Σαiyixi b = yk - Σαiyixi Txk for any αk > 0 (1)
• Each non-zero αi indicates that corresponding xi is a support vector.
• Then the classifying function is (note that we don’t need w explicitly):
f(x) = ΣαiyixiTx + b (2)
• The quantity xTy is called the inner product or dot product between the
vector x and the vector y.
• Notice that the solution relies on the inner product between the test point x
and the support vectors xi – we will return to this later.
• Also keep in mind that solving the optimization problem involved computing
the inner products xiTxj between all training points.
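As a quick check of (1) and (2) with scikit-learn (a sketch, assuming scikit-learn is available): for a fitted linear-kernel SVC, dual_coef_ stores the products αiyi for the support vectors, so w, b, and f(x) can be reconstructed and compared against the library's own decision function.

```python
import numpy as np
from sklearn.svm import SVC

# Toy data (made up for illustration).
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 1.0],
              [-2.0, -2.0], [-3.0, -1.0], [-1.5, -2.5]])
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel='linear', C=1e10).fit(X, y)

# Reconstruct w = sum_i alpha_i y_i x_i using only the support vectors.
alpha_y = clf.dual_coef_[0]              # alpha_i * y_i for each support vector
w = alpha_y @ clf.support_vectors_
b = clf.intercept_[0]

# f(x) = sum_i alpha_i y_i x_i^T x + b, evaluated at a new point.
x_new = np.array([1.0, 0.5])
f_manual = alpha_y @ (clf.support_vectors_ @ x_new) + b
print(f_manual, clf.decision_function([x_new])[0])   # the two values should agree
```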
© Mehul Motani Support Vector Machines 13
Linear and nonlinear data models
A B C
D E F
© Mehul Motani Support Vector Machines 14
Soft Margin Classification
• What if the training set is not linearly separable?
• Slack variables ξi can be added to allow misclassification of
difficult or noisy examples; the resulting margin is called a soft margin.
What should our quadratic optimization criterion be?
Minimize:
(1/2) wTw + C Σk=1..R ξk
The first term maximizes the margin; the second term penalizes misclassifications.
[Figure: soft-margin separator with slack variables ξi and ξj for points on the wrong side of the margin]
Note: ξ is the Greek letter Xi and is pronounced ‘zai’ or ‘ksi’.
© Mehul Motani Support Vector Machines 15
Hard margin vs Soft margin
• The hard-margin SVM formulation:
Find w and b such that
Φ(w) =wTw is minimized (1)
and for all (xi ,yi) ∈ S: yi (wTxi + b) ≥ 1
• Modified soft-margin SVM formulation with slack variables:
Find w and b such that
Φ(w) =wTw + CΣξi is minimized
and for all (xi, yi) ∈ S: yi (wTxi + b) ≥ 1 – ξi, ξi ≥ 0 (2)
• Parameter C can be viewed as a way to control overfitting
– It trades off the relative importance of maximizing the margin and fitting the training
data.
– Larger C → the higher the penalty for misclassifications. This leads to smaller
margins but fewer misclassifications, which amounts to overfitting.
– Smaller C → the lower the penalty for misclassifications. This leads to larger margins
but more misclassifications.
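A small experiment illustrating this trade-off (a sketch assuming scikit-learn; the noisy toy data is made up): with a small C the margin 2/||w|| is wider and more points become support vectors, while a large C shrinks the margin to fit the training data more tightly.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two noisy, overlapping Gaussian blobs (made-up data).
X = np.vstack([rng.normal(loc=-1.0, scale=1.0, size=(50, 2)),
               rng.normal(loc=+1.0, scale=1.0, size=(50, 2))])
y = np.array([-1] * 50 + [1] * 50)

for C in [0.01, 1.0, 100.0]:
    clf = SVC(kernel='linear', C=C).fit(X, y)
    margin = 2 / np.linalg.norm(clf.coef_[0])
    print(f"C={C:>6}: margin={margin:.3f}, "
          f"#support vectors={len(clf.support_)}, "
          f"training accuracy={clf.score(X, y):.2f}")
```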
© Mehul Motani Support Vector Machines 16
Soft Margin Classification – Solution
• Dual problem is identical to separable case:
Find α1…αN such that
Q(α) =Σαi - ½ΣΣαiαjyiyjxiTxj is maximized and (1)
(1) Σαiyi = 0
(2) 0 ≤ αi ≤ C for all αi
• Again, xi with non-zero αi will be support vectors.
• Solution to the soft margin SVM is:
w =Σαiyixi
b = yk(1- ξk) - ΣαiyixiTxk for any k s.t. αk > 0 (in practice, pick k with 0 < αk < C so that ξk = 0) (2)
Note: We don’t need to compute w explicitly for classification:
f(x) = ΣαiyixiTx + b
Note: If the 2-norm penalty for slack variables, CΣξi², were used in the primal
objective, we would need additional Lagrange multipliers for the slack variables…
© Mehul Motani Support Vector Machines 17
Theoretical Justification for Maximum Margins
• VC dimension is a measure of the complexity of a classifier. The more
complex the classifier, the more prone it is to overfitting.
• Vapnik proved the following:
The class of optimal linear separators has VC dimension h bounded
from above as
h ≤ min{ ⌈D²/ρ²⌉, m0 } + 1 (1)
where ρ is the margin, D is the diameter of the smallest sphere that
can enclose all of the training examples, and m0 is the dimensionality.
• Intuitively, this implies that regardless of dimensionality m0 we can
minimize the VC dimension by maximizing the margin ρ.
• Thus, complexity of the classifier is kept small regardless of
dimensionality.
© Mehul Motani Support Vector Machines 18
Summary of Linear SVMs
• The classifier is a separating hyperplane.
• Most “important” training points are support vectors; they
define the hyperplane.
• Quadratic optimization algorithms can identify which training
points xi are support vectors with non-zero Lagrangian
multipliers αi.
• Both in the dual formulation of the problem and in the solution,
the training points appear only inside inner products:
Find α1…αN such that
Q(α) = Σαi - ½ΣΣαiαjyiyjxiTxj is maximized and
(1) Σαiyi = 0
(2) 0 ≤ αi ≤ C for all αi          (1)
f(x) = ΣαiyixiTx + b               (2)
© Mehul Motani Support Vector Machines 19
Non-linear SVMs
Consider this noisy dataset (panel A).
How about this dataset (panel B)?
But what are we going to do if the dataset is just too hard (panel C)?
How about… mapping the data to a higher-dimensional space (panel D)?
[Figure: panels A–D show 1-D datasets along the x-axis; in panel D the data is lifted to the (x, x²) plane, where it becomes linearly separable]
© Mehul Motani Support Vector Machines 20
Non-linear SVMs: Feature spaces
• General idea: the original feature space is mapped to some
higher-dimensional feature space where the training set is
separable:
Lifting function Φ: x → φ(x)
[Figure: panel A shows the original feature space; panel B shows the data mapped by φ into the higher-dimensional feature space, where a separating hyperplane exists]
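A concrete instance of such a lifting (a sketch with made-up 1-D data): points with |x| small belong to one class and |x| large to the other, which no single threshold on x can separate, but the map φ(x) = (x, x²) makes them separable by a horizontal line in the lifted space.

```python
import numpy as np

# 1-D data: class -1 sits near the origin, class +1 further out (not separable by one threshold).
x = np.array([-3.0, -2.5, -0.5, 0.0, 0.7, 2.8, 3.2])
y = np.array([1, 1, -1, -1, -1, 1, 1])

# Lifting function phi: x -> (x, x^2); in the lifted plane, the line x^2 = 2 separates the classes.
phi = np.column_stack([x, x ** 2])
predictions = np.where(phi[:, 1] > 2.0, 1, -1)
print(predictions)                          # matches y: the lifted data is linearly separable
print(np.array_equal(predictions, y))       # True
```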
© Mehul Motani Support Vector Machines 21
The “Kernel Trick”
• The linear SVM classifier relies on the inner product between vectors, for
example: K(xi,xj)=xiTxj
• If every datapoint is mapped into high-dimensional space via some
transformation Φ: x → φ(x), the inner product becomes: K(xi,xj)= φ(xi) Tφ(xj)
• A kernel function is a function that is equivalent to an inner product in some
higher dimensional feature space.
• Example: 2-dimensional vectors x = [x1 x2]T
– Let K(xi,xj) = (1 + xiTxj)² (1)
– Need to show that K(xi,xj) = φ(xi)Tφ(xj) for some φ(x):
K(xi,xj) = (1 + xiTxj)² = 1 + xi1²xj1² + 2 xi1xj1 xi2xj2 + xi2²xj2² + 2xi1xj1 + 2xi2xj2 (2)
= [1 xi1² √2 xi1xi2 xi2² √2xi1 √2xi2] [1 xj1² √2 xj1xj2 xj2² √2xj1 √2xj2]T
⇒ K(xi,xj) = φ(xi)Tφ(xj), where φ(x) = [1 x1² √2 x1x2 x2² √2x1 √2x2]T (3)
• Thus, a kernel function implicitly maps data to a high-dimensional space
(without the need to compute each φ(x) explicitly).
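A quick numerical check of equation (3), with arbitrary 2-D vectors:

```python
import numpy as np

def K(xi, xj):
    # Polynomial kernel (1 + xi^T xj)^2 from equation (1).
    return (1.0 + np.dot(xi, xj)) ** 2

def phi(x):
    # Explicit feature map from equation (3).
    x1, x2 = x
    return np.array([1.0, x1**2, np.sqrt(2) * x1 * x2, x2**2,
                     np.sqrt(2) * x1, np.sqrt(2) * x2])

xi = np.array([0.3, -1.2])
xj = np.array([2.0, 0.5])
print(K(xi, xj), np.dot(phi(xi), phi(xj)))   # the two numbers agree (up to rounding)
```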
© Mehul Motani Support Vector Machines 22
What Functions are Kernels?
• For some functions K(xi,xj) checking that K(xi,xj)= φ(xi) Tφ(xj) can be
cumbersome.
• Mercer’s theorem: Every positive semi-definite symmetric function is a
valid kernel.
• A positive semi-definite symmetric function corresponds to a positive
semi-definite symmetric Gram matrix:
K = [ K(x1,x1)  K(x1,x2)  K(x1,x3)  …  K(x1,xn)
      K(x2,x1)  K(x2,x2)  K(x2,x3)  …  K(x2,xn)
      …         …         …         …  …
      K(xn,x1)  K(xn,x2)  K(xn,x3)  …  K(xn,xn) ]    (1)
Check out the discussion at: https://www.quora.com/What-is-the-kernel-trick
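An empirical version of this check (a sketch, not a proof of Mercer's theorem): build the Gram matrix of a candidate kernel on some sample points and verify that its eigenvalues are non-negative.

```python
import numpy as np

def rbf_kernel(xi, xj, sigma=1.0):
    # Gaussian (RBF) kernel, which is known to be a valid (positive semi-definite) kernel.
    return np.exp(-np.linalg.norm(xi - xj) ** 2 / (2 * sigma ** 2))

rng = np.random.default_rng(42)
X = rng.normal(size=(20, 3))          # 20 arbitrary sample points in R^3

# Gram matrix K[i, j] = K(x_i, x_j) as in (1).
G = np.array([[rbf_kernel(xi, xj) for xj in X] for xi in X])

eigvals = np.linalg.eigvalsh(G)       # symmetric matrix -> real eigenvalues
print("smallest eigenvalue:", eigvals.min())   # >= 0 up to numerical error
```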
© Mehul Motani Support Vector Machines 23
Examples of Kernel Functions
1. Linear: K(xi,xj)= xiTxj
– Mapping Φ: x → φ(x), where φ(x) is x itself
2. Polynomial of power p: K(xi,xj)= (1+ xiTxj)p
– Mapping Φ: x → φ(x), where φ(x) has C(d+p, p) = (d+p choose p) dimensions, where d is the
original feature space dimension.
3. Gaussian (radial-basis function): K(xi,xj) = exp( −||xi − xj||² / (2σ²) )
– Mapping Φ: x → φ(x), where φ(x) is infinite-dimensional: every point is mapped
to a function (a Gaussian); combination of functions for support vectors is the
separator.
• Note: the higher-dimensional space still has intrinsic dimensionality d (the mapping is
not onto), but linear separators in it correspond to non-linear separators in the
original space.
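The three kernels above, written out in NumPy (a sketch; the values of p and σ are free parameters chosen here for illustration):

```python
import numpy as np

def linear_kernel(xi, xj):
    return np.dot(xi, xj)

def polynomial_kernel(xi, xj, p=3):
    return (1.0 + np.dot(xi, xj)) ** p

def gaussian_kernel(xi, xj, sigma=1.0):
    return np.exp(-np.linalg.norm(xi - xj) ** 2 / (2 * sigma ** 2))

xi, xj = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(linear_kernel(xi, xj), polynomial_kernel(xi, xj), gaussian_kernel(xi, xj))
```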
© Mehul Motani Support Vector Machines 24
Non-linear SVMs Mathematically
• Dual problem formulation:
Find α1…αn such that
Q(α) =Σαi - ½ΣΣαiαjyiyjK(xi, xj) is maximized and
(1)
(1) Σαiyi = 0
(2) αi ≥ 0 for all αi
• The solution is:
f(x) = ΣαiyiK(xi, x) + b (2)
• Optimization techniques for finding αi’s remain the same!
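A short illustration (a sketch, assuming scikit-learn; the concentric-circles dataset is a standard synthetic example): on data that no linear separator can handle, an SVM with the Gaussian kernel separates the classes well.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: not linearly separable in the original 2-D space.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_svm = SVC(kernel='linear', C=1.0).fit(X, y)
rbf_svm = SVC(kernel='rbf', gamma=1.0, C=1.0).fit(X, y)

print("linear kernel accuracy:", linear_svm.score(X, y))   # near chance level
print("RBF kernel accuracy:   ", rbf_svm.score(X, y))      # close to 1.0
```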
© Mehul Motani Support Vector Machines 25
Nonlinear SVM - Summary
• In summary, linear SVM locates a separating
hyperplane in the feature space and classifies
points in that space
• Nonlinear SVM lifts the problem to a higher
dimensional space and performs linear SVM in the
higher dimensional space.
• This corresponds to a nonlinear separator in the
original feature space.
• The algorithm does not need to represent the
higher-dimensional space explicitly; it simply defines a
kernel function, which plays the role of the inner
product in the high-dimensional feature space.
© Mehul Motani Support Vector Machines 26
Properties of SVM
• Sparseness of solution when dealing with large data sets as only
support vectors are used to specify the separating hyperplane
• Ability to handle large feature spaces as the complexity does
not depend on the dimensionality of the feature space
• Overfitting can be controlled by soft margin approach
• Mathematically nice – a simple convex optimization problem
which is guaranteed to converge to a single global solution
• Supported by theory and intuition
• SVM empirically works very well
– Text (and hypertext) categorization, image classification,
– Protein classification, Disease classification
– Hand-written character recognition
© Mehul Motani Support Vector Machines 27
Weakness of SVM
• SVM is sensitive to noise
- A relatively small number of mislabeled examples can dramatically decrease
the performance
• Standard SVM only considers two classes
• Question: How to do multi-class classification with SVM?
• Answer: Build multiple SVMs
1. With m classes, learn m SVMs
– SVM 1 learns “Output = 1” vs “Output != 1”
– SVM 2 learns “Output = 2” vs “Output != 2”
– …
– SVM m learns “Output = m” vs “Output != m”
2. To predict the output for a new input, just predict with each SVM and
find out which one puts the prediction the furthest into the positive
region.
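A minimal one-vs-rest sketch along these lines (assuming scikit-learn; binary SVC is the building block, and the iris dataset stands in for any m-class problem):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)     # 3 classes -> train 3 binary SVMs
classes = np.unique(y)

# SVM k learns "class == k" (+1) vs "class != k" (-1).
models = {k: SVC(kernel='linear', C=1.0).fit(X, (y == k).astype(int) * 2 - 1)
          for k in classes}

def predict(x):
    # Pick the class whose SVM pushes the point furthest into the positive region.
    scores = {k: models[k].decision_function([x])[0] for k in classes}
    return max(scores, key=scores.get)

preds = np.array([predict(x) for x in X])
print("training accuracy:", (preds == y).mean())
```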
© Mehul Motani Support Vector Machines 28
SVM Summary
• SVMs in their modern kernelized form were proposed by Boser, Guyon and
Vapnik in 1992 and gained increasing popularity in the late 1990s.
• SVMs are currently among the best performers for a number of
classification tasks ranging from text to genomic data.
• SVMs can be applied to complex data types beyond feature vectors
(e.g. graphs, sequences, relational data) by designing kernel functions
for such data.
• Tuning SVMs remains a black art: selecting a specific kernel and
parameters is usually done in a try-and-see manner.
• Some references on VC-dimension and Support Vector Machines:
• C.J.C. Burges. A tutorial on support vector machines for pattern
recognition. Data Mining and Knowledge Discovery, 2(2):121-167, 1998.
• The VC/SRM/SVM Bible: Statistical Learning Theory by Vladimir
Vapnik, Wiley-Interscience, 1998
• https://en.wikipedia.org/wiki/Support_vector_machine
© Mehul Motani Support Vector Machines 29
What do data engineers and thieves have in common?
© Mehul Motani Support Vector Machines 30