
Tutorial on SVM: motivations, formulations and extensions

Support Vector Machines

Chris Ding

Many slides adapted from Andrew Moore and UT Austin

Read: A tutorial on Support Vector Machines, by Chris Burges


Key points:

-- max-margin
-- use f(x) = +1, -1 to set the scaling constant
-- optimization: the dual optimization problem
-- KKT conditions, complementary slackness condition
-- separable (hard-margin) SVM vs non-separable (soft-margin) SVM
-- Kernel trick
-- XOR problem

Three homeworks
Perceptron

• Binary classification can be viewed as the task of separating classes in feature space:

    wTx + b = 0  (decision boundary)
    wTx + b > 0  →  f(x) = +1
    wTx + b < 0  →  f(x) = -1

    f(x) = sign(wTx + b)

• Many separating lines exist. Which of them is better? Which is optimal? The goal is to do better than the perceptron.
• This is a decision boundary problem: a discriminant function determines the decision boundary.
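To make the decision rule concrete, here is a tiny Python sketch (the weight vector w, bias b, and test points are made-up illustrative values, not taken from the slides):

import numpy as np

w = np.array([1.0, -2.0])   # illustrative weight vector
b = 0.5                     # illustrative bias

def f(x):
    # Linear decision rule: +1 on one side of the hyperplane w^T x + b = 0, -1 on the other
    return 1 if w @ x + b > 0 else -1

print(f(np.array([3.0, 1.0])), f(np.array([0.0, 2.0])))   # prints: 1 -1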
Classification Margin (1992)

• Distance from example xi to the separator is r = |wTxi + b| / ||w||
• Examples closest to the hyperplane are support vectors.
• Margin ρ of the separator is the distance between the two lines f(x) = +1 and f(x) = -1.
Maximum Margin Classification
• Maximizing the margin is good according to intuition and
PAC theory.
• Implies that only support vectors matter; other training examples are ignorable.
Linear SVM Mathematically
• Let training set {(xi, yi)}i=1..n, xi ∈ Rd, yi ∈ {-1, 1} be separated by
a hyperplane with margin ρ. Then for each training example (xi, yi):

    wTxi + b ≤ -ρ/2 if yi = -1
    wTxi + b ≥  ρ/2 if yi = +1        ⇔        yi(wTxi + b) ≥ ρ/2

• For every support vector xs the above inequality is an equality.
After rescaling w and b by ρ/2 in the equality, we obtain that the
distance between each xs and the hyperplane is

    r = ys(wTxs + b) / ||w|| = 1 / ||w||

• The margin can then be expressed through the (rescaled) w and b as:

    ρ = 2r = 2 / ||w||
Set the derivative w.r.t. w equal to 0; set the derivative w.r.t. b equal to 0.
KKT complementary slackness condition
Linear SVMs Mathematically (cont.)

• The quadratic optimization problem:


Find w and b such that
    ρ = 2/||w|| is maximized
and for all (xi, yi), i=1..n : yi(wTxi + b) ≥ 1

Which can be reformulated as:


Find w and b such that
Φ(w) = ||w||2=wTw is minimized
and for all (xi, yi), i=1..n : yi (wTxi + b) ≥ 1
Primal optimization problem transformed into dual optimization problem
KKT complementary slackness condition
Solving the Optimization Problem

Find w and b such that


Φ(w) =wTw is minimized
and for all (xi, yi), i=1..n : yi (wTxi + b) ≥ 1

• A quadratic objective with linear constraints. This original optimization problem is called the primal optimization problem.
• A Lagrange multiplier αi is associated with every inequality constraint. The final optimization is to solve the dual problem:

Find α1…αn such that


Q(α) =Σαi - ½ΣΣαiαjyiyjxiTxj is maximized and
(1) Σαiyi = 0
(2) αi ≥ 0 for all αi
The Optimization Problem Solution

• Given a solution α1…αn to the dual problem, the solution to the primal is:

    w = Σαiyixi        b = yk - ΣαiyixiTxk   for any αk > 0

• Each non-zero αi indicates that the corresponding xi is a support vector.
• The classifying function is (note that we don't need w explicitly):

    f(x) = ΣαiyixiTx + b
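As a concrete illustration of solving the dual and then recovering w, b, and f(x), here is a minimal Python sketch. It uses a generic solver (scipy.optimize.minimize with SLSQP) rather than a dedicated SVM/QP package, and a small made-up separable data set; it is a sketch of the formulas above, not a production SVM solver.

import numpy as np
from scipy.optimize import minimize

X = np.array([[2.0, 2.0], [0.0, 2.0], [1.0, 0.0], [-1.0, 0.0]])  # made-up separable points
y = np.array([1.0, 1.0, -1.0, -1.0])                             # labels in {-1, +1}

H = (y[:, None] * X) @ (y[:, None] * X).T          # H[i, j] = y_i y_j x_i^T x_j

def neg_Q(a):                                       # maximize Q(alpha) = minimize -Q(alpha)
    return -(a.sum() - 0.5 * a @ H @ a)

res = minimize(neg_Q, np.zeros(len(y)),
               bounds=[(0, None)] * len(y),                          # alpha_i >= 0
               constraints={"type": "eq", "fun": lambda a: a @ y})   # sum_i alpha_i y_i = 0
alpha = res.x

w = (alpha * y) @ X                 # w = sum_i alpha_i y_i x_i
k = int(np.argmax(alpha))           # pick any support vector (alpha_k > 0)
b = y[k] - w @ X[k]                 # b = y_k - sum_i alpha_i y_i x_i^T x_k

f = lambda x: w @ x + b             # sign(f(x)) is the predicted class
print(np.round(alpha, 3), np.round(w, 3), round(float(b), 3))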
Soft Margin Classification
(Data has noise, not completely separable)

• What if the training set is not linearly separable?


• Slack variables ξi can be added to allow misclassification
of noisy examples.

Proposed in the 1970s.
Soft Margin Classification Mathematically

• The old formulation:


Find w and b such that
Φ(w) =wTw is minimized
and for all (xi ,yi), i=1..n : yi (wTxi + b) ≥ 1

• Modified formulation incorporates slack variables:

Find w and b such that


Φ(w) =wTw + CΣξi is minimized
and for all (xi, yi), i=1..n : yi(wTxi + b) ≥ 1 – ξi,  ξi ≥ 0

• Parameter C can be viewed as a way to control overfitting: it “trades off” the


relative importance of maximizing the margin and fitting the training data.
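To see the effect of C in practice, here is a minimal sketch using scikit-learn's SVC, which implements a soft-margin SVM of this form; the overlapping Gaussian data and the particular C values are illustrative choices only.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)),      # class -1
               rng.normal(2.0, 1.0, (50, 2))])     # class +1, overlapping -> slack needed
y = np.hstack([-np.ones(50), np.ones(50)])

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # small C: wide margin, many (bounded) support vectors; large C: narrower margin, fewer
    print(C, "support vectors:", clf.n_support_.sum())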
Soft Margin Classification – Solution
Invented 1995
• The dual problem is almost identical to the separable case (it would not be
identical if the 2-norm penalty CΣξi² for slack variables were used in the primal
objective; then we would need additional Lagrange multipliers for the slack variables):
Find α1…αN such that
Q(α) =Σαi - ½ΣΣαiαjyiyjxiTxj is maximized and
(1) Σαiyi = 0
(2) 0 ≤ αi ≤ C for all αi
• Again, xi with non-zero αi will be support vectors.
• Solution to the dual problem is:

    w = Σαiyixi
    b = yk(1 - ξk) - ΣαiyixiTxk   for any k s.t. αk > 0

• Again, we don't need to compute w explicitly for classification:

    f(x) = ΣαiyixiTx + b
Theoretical Justification for Maximum
Margins
• Vapnik has proved the following:
The class of optimal linear separators has VC dimension h bounded from above as

    h ≤ min(D²/ρ², m0) + 1

where ρ is the margin, D is the diameter of the smallest sphere that
encloses all of the training examples, and m0 is the dimensionality.

• Intuitively, this implies that regardless of dimensionality m0 we can


minimize the VC dimension by maximizing the margin ρ.

• Complexity of the classifier is kept small regardless of data dimensionality.


Non-linear SVMs: Feature spaces
• General idea: the original feature space can always be mapped
to some higher-dimensional feature space where the training set is
separable

Φ: x → φ(x)
Non-linear SVMs
• Some datasets are linearly separable in the original 1D space x.
• Other datasets are not linearly separable in x.
• Mapping the data to a higher-dimensional space, e.g. z = (x, x²), can make them separable.
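A small numerical sketch of this 1D example (the point values are made up): no threshold on x separates the middle cluster from the outer points, but after the map z = (x, x²) a threshold on the second coordinate does.

import numpy as np

x_pos = np.array([-0.5, 0.0, 0.5])        # middle cluster, label +1
x_neg = np.array([-2.0, -1.5, 1.5, 2.0])  # outer points, label -1

def phi(x):                                # map each point to z = (x, x^2)
    return np.stack([x, x ** 2], axis=1)

print(phi(x_pos))   # second coordinate <= 0.25 for all +1 points
print(phi(x_neg))   # second coordinate >= 2.25 for all -1 points -> z2 = 1 separates them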
Kernel Trick

• Both in the dual formulation of the problem and in the solution


training points appear only inside inner products:

Find α1…αN such that


Q(α) = Σαi - ½ΣΣαiαjyiyjxiTxj is maximized and
(1) Σαiyi = 0
(2) 0 ≤ αi ≤ C for all αi

    f(x) = ΣαiyixiTx + b

• Thus the explicit coordinates (the mapping function φ) are not required.


Kernel trick
• Example: polynomial kernel

Transform the 2D input space to a 3D feature space:

    x = (x1, x2),   φ(x) = (x1², x2², √2 x1x2)
    z = (z1, z2),   φ(z) = (z1², z2², √2 z1z2)

    φ(x) · φ(z) = (x1², x2², √2 x1x2) · (z1², z2², √2 z1z2)
                = x1²z1² + x2²z2² + 2 x1z1 x2z2
                = (x1z1 + x2z2)²
                = (x · z)² = K(x, z)
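A quick numeric check of this identity (the two 2D vectors are arbitrary):

import numpy as np

def phi(v):                                 # explicit map (v1^2, v2^2, sqrt(2) v1 v2)
    return np.array([v[0] ** 2, v[1] ** 2, np.sqrt(2) * v[0] * v[1]])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

print(phi(x) @ phi(z))          # inner product in the 3D feature space
print((x @ z) ** 2)             # kernel K(x, z) = (x . z)^2 -- same value, no explicit map needed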
Kernel trick + QP
• The max margin classifier can be found by solving

    argmax_α  Σj αj - ½ Σj,k αjαk yjyk (φ(xj) · φ(xk))
  = argmax_α  Σj αj - ½ Σj,k αjαk yjyk K(xj, xk)

• The weight vector (no need to compute and store it) is

    w = Σj αj yj φ(xj)

• The decision function is

    h(x) = sign(Σj αj yj (φ(x) · φ(xj)) + b) = sign(Σj αj yj K(x, xj) + b)
Copyright © 2001, 2003, Andrew W. Moore


SVM Kernel Functions
• Use kernel functions which compute

    (zi · zj) = (φ(xi) · φ(xj)) = K(xi, xj)

• The inner-product kernel K(a, b) = a · b is the (simplest) linear kernel
• Polynomial kernel: K(a, b) = (a · b + 1)^d
• Beyond polynomials, other very high dimensional basis functions form practical, useful kernels:
• Radial-basis-style kernel function:

    K(a, b) = exp(-||a - b||² / (2σ²))

• Neural-network-style kernel function:

    K(a, b) = tanh(κ a · b - δ)

  σ, κ and δ are model parameters chosen by CV.
Copyright © 2001, 2003, Andrew W. Moore
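For reference, minimal NumPy sketches of these kernel functions; the parameter names follow the slide (σ → sigma, κ → kappa, δ → delta) and the default values are illustrative only.

import numpy as np

def linear_kernel(a, b):
    return a @ b

def poly_kernel(a, b, d=2):
    return (a @ b + 1) ** d

def rbf_kernel(a, b, sigma=1.0):
    return np.exp(-np.sum((a - b) ** 2) / (2 * sigma ** 2))

def tanh_kernel(a, b, kappa=1.0, delta=0.0):
    return np.tanh(kappa * (a @ b) - delta)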
What Functions are Kernels?
• For some functions K(xi,xj), checking that K(xi,xj) = φ(xi)Tφ(xj), i.e., finding φ(xi), can be difficult.
• Mercer's theorem:
  Every positive semi-definite symmetric function is a kernel
• Positive semi-definite symmetric functions correspond to a positive semi-
definite symmetric Gram matrix:

    K = | K(x1,x1)  K(x1,x2)  K(x1,x3)  …  K(x1,xn) |
        | K(x2,x1)  K(x2,x2)  K(x2,x3)  …  K(x2,xn) |
        |    …         …         …      …     …     |
        | K(xn,x1)  K(xn,x2)  K(xn,x3)  …  K(xn,xn) |
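A minimal sketch of checking Mercer's condition numerically: build the Gram matrix for a sample of points and verify its eigenvalues are (numerically) non-negative. The RBF kernel with σ = 1 and the random sample points are illustrative choices.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))                      # illustrative sample points

# RBF Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)) with sigma = 1
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq_dists / 2.0)

eigvals = np.linalg.eigvalsh(K)                   # symmetric matrix -> real eigenvalues
print(eigvals.min() >= -1e-10)                    # True: K is positive semi-definite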
Checkerboard data: a standard linear classifier cannot separate the two classes.
(Colour shades show the f(x) value.)

Solve the XOR problem using kernel SVM

Inputs are {0,1}²; note the relabelling 0 ↔ (-1).
Final output of the SVM: (see figure)
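A minimal sketch of the XOR example using scikit-learn's SVC with the degree-2 polynomial kernel K(a, b) = (a·b + 1)²; relabelling the 0/1 inputs to ±1 follows the slide's note, and the large C value is just one way to approximate the hard-margin case.

import numpy as np
from sklearn.svm import SVC

X01 = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1])                 # XOR labels in {-1, +1}

X = 2 * X01 - 1                              # relabel inputs: 0 -> -1, 1 -> +1

# gamma=1, coef0=1, degree=2 gives the kernel (a . b + 1)^2
clf = SVC(kernel="poly", degree=2, gamma=1, coef0=1, C=1e6).fit(X, y)
print(clf.predict(X))                        # [-1  1  1 -1]: XOR is separated in feature space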
Examples of Kernel Functions
• Linear: K(xi,xj)= xiTxj
– Mapping Φ: x → φ(x), where φ(x) is x itself

• Polynomial of power p: K(xi,xj) = (1 + xiTxj)^p
  – Mapping Φ: x → φ(x), where φ(x) has (d+p choose p) dimensions

• Gaussian (radial-basis function): K(xi,xj) = exp(-||xi - xj||² / (2σ²))
– Mapping Φ: x → φ(x), where φ(x) is infinite-dimensional: every point is
mapped to a function (a Gaussian); combination of functions for support
vectors is the separator.

• Higher-dimensional space still has intrinsic dimensionality d


(the mapping is not onto). Linear separators in it correspond to
non-linear separators in original space.
Non-linear SVMs Mathematically

• Dual problem formulation:


Find α1…αn such that
Q(α) =Σαi - ½ΣΣαiαjyiyjK(xi, xj) is maximized and
(1) Σαiyi = 0
(2) αi ≥ 0 for all αi

• The solution is: f(x) = ΣαiyiK(xi, x) + b

• Optimization techniques for finding αi’s remain the same!


SVM and other linear classifiers can only separate the space into 2 classes.

How to do multi-class classification (k > 2)?

Use a 2-class classifier to do k-class classification

• One vs others (see the code sketch after this list)
  – Build a classifier for each class against all other classes combined
  – Need to train K such classifiers
  – Use the largest score to determine the final class
• One vs one
  – Train K(K-1)/2 classifiers, each separating one class from another
  – Use majority voting to obtain the final class
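A minimal sketch of both strategies using scikit-learn's generic wrappers around a binary linear SVM; the 3-class blob data is made up.

from sklearn.datasets import make_blobs
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
from sklearn.svm import SVC

X, y = make_blobs(n_samples=150, centers=3, random_state=0)   # 3-class toy data

ovr = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)      # trains K classifiers
ovo = OneVsOneClassifier(SVC(kernel="linear")).fit(X, y)       # trains K(K-1)/2 classifiers

print(ovr.predict(X[:5]), ovo.predict(X[:5]))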


Multi-label Classification
• Classes are mutually exclusive
– Each handwritten letter belongs to exactly one class
– A student is either a 1st-, 2nd-, 3rd- or 4th-year student, and cannot be both (or more)
– The common case: multi-class exclusive classification
• Classes are mutually non-exclusive
– An article on drug design could also discuss the drug
company’s (and market) economics.
– An image has sky, building, road etc.
– Multi-class inclusive classification (multi-label classification)


One vs Others: more details

• Build a classifier between each class and its


complementary set (docs from all other classes).
• Given a test object, evaluate it for membership in each class.
• Assign the document to the class with:
– maximum score
– maximum confidence
– maximum probability
?

?
?

50
SVM applications
• SVMs were originally proposed by Boser, Guyon and Vapnik in 1992
and gained increasing popularity in late 1990s.
• SVMs are currently among the best performers for a number of
classification tasks ranging from text to genomic data.
• SVMs can be applied to complex data types beyond feature vectors (e.g.
graphs, sequences, relational data) by designing kernel functions for
such data.
• SVM techniques have been extended to a number of tasks such as
regression [Vapnik et al. ’97], principal component analysis [Schölkopf et
al. ’99], etc.
• Most popular optimization algorithms for SVMs use decomposition to
hill-climb over a subset of αi’s at a time, e.g. SMO [Platt ’99] and
[Joachims ’99]
• Tuning SVMs remains a black art: selecting a specific kernel and
parameters is usually done in a try-and-see manner
• The most popular SVM software is LIBSVM from C.-J. Lin
Homework SVM1: SVM1a, SVM1b

Solve the SVM problem for the following data: find w and b.

SVM1a:
  X1: (1, 1),  y = +1
  X2: (-1, 1), y = +1
  X3: (0, -1), y = -1

SVM1b:
  X1: (1, 1, +1)
  X2: (-1, 1, +1)
  X3: (0, -1, -1)
  X4: (0, -2, -1)
Homework SVM2 (Optional. TA will provide some code example):

Generate 200 data points in 2 dimensions, 100 per class.
Make the 2 classes close enough that they are non-separable.
Run an SVM solver on the data. Set C = 0.01 or 0.1.
Find the data points where alpha_i = 0 or alpha_i = C.
Find the data points with ksi_i > 0.

Plot the lines f(x) = 1, 0, -1, adding red circles to the data points where
alpha_i = 0 and squares to the data points where alpha_i = C.

Explain what the ksi_i > 0 data points are.

See example in SVM slides.
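For reference, a minimal sketch of one way to set up this homework with scikit-learn and matplotlib. The data generation, the C value, and the plotting details are illustrative choices; note that SVC exposes yi·alpha_i for the support vectors through dual_coef_, from which alpha_i can be recovered, and the slack ksi_i can be computed as max(0, 1 - yi f(xi)).

import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)),
               rng.normal(1.5, 1.0, (100, 2))])      # two overlapping classes
y = np.hstack([-np.ones(100), np.ones(100)])

C = 0.1
clf = SVC(kernel="linear", C=C).fit(X, y)

# alpha_i is zero for non-support vectors; |dual_coef_| gives alpha_i for support vectors
alpha = np.zeros(len(y))
alpha[clf.support_] = np.abs(clf.dual_coef_.ravel())

f_vals = clf.decision_function(X)
ksi = np.maximum(0, 1 - y * f_vals)                  # slack variables ksi_i

# contour lines f(x) = -1, 0, +1
xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 300),
                     np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 300))
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.contour(xx, yy, Z, levels=[-1, 0, 1], linestyles=["--", "-", "--"])
plt.scatter(X[:, 0], X[:, 1], c=y, cmap="coolwarm", s=15)

at_zero = np.isclose(alpha, 0)                       # alpha_i = 0 (non-support vectors)
at_C = np.isclose(alpha, C)                          # alpha_i = C (bounded support vectors)
plt.scatter(X[at_zero, 0], X[at_zero, 1], facecolors="none", edgecolors="red", s=60)
plt.scatter(X[at_C, 0], X[at_C, 1], marker="s", facecolors="none", edgecolors="black", s=60)
plt.show()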


Homework SVM3 (optional) :
Derive the dual problem from the primal problem, using a
quadratic penalty on the ksi's (slack variables).
