0% found this document useful (0 votes)

9 views77 pages

10 SVM

The document provides an overview of Support Vector Machines (SVMs) and their advantages in classification, regression, and novelty detection. It discusses the mathematical foundations of SVMs, including linear classifiers, optimization problems, and the significance of maximizing the margin for better generalization. Additionally, it covers the learning process through quadratic programming and the handling of noisy datasets.

Uploaded by

ngocongthanhsg0812

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

9 views77 pages

10 SVM

Uploaded by

ngocongthanhsg0812

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 77

Trịnh Tấn Đạt

Khoa CNTT – Đại Học Sài Gòn

Email: [email protected]
Website: https://sites.google.com/site/ttdat88/
Contents
 Introduction
 Review of Linear Algebra
 Classifiers & Classifier Margin
 Linear SVMs: Optimization Problem
 Hard Vs Soft Margin Classification
 Non-linear SVMs
Introduction
 Competitive with other classification methods
 Relatively easy to learn
 Kernel methods give an opportunity to extend the idea to
 Regression
 Density estimation
 Kernel PCA
 Etc.

3
Advantages of SVMs - 1
 A principled approach to classification, regression and novelty detection
 Good generalization capabilities
 Hypothesis has an explicit dependence on data, via support vectors – hence,
can readily interpret model

4
Advantages of SVMs - 2
 Learning involves optimization of a convex function (no local minima as in
neural nets)
 Only a few parameters are required to tune the learning machine (unlike lots of
weights and learning parameters, hidden layers, hidden units, etc as in neural
nets)

5
Prerequsites
 Vectors, matrices, dot products
 Equation of a straight line in vector notation
 Familiarity with
 Perceptron is useful
 Mathematical programming will be useful
 Vector spaces will be an added benefit
 The more comfortable you are with Linear Algebra, the easier this material will
be

6
What is a Vector ?
 Think of a vector as a directed line segment in
N-dimensions! (has “length” and a 
“direction”)   
v = b 
 Basic idea: convert geometry in higher
dimensions into algebra!  c 
 Once you define a “nice” basis along each
dimension: x-, y-, z-axis …
y
 Vector becomes a 1 x N matrix!
 v = [a b c]T v
 Geometry starts to become linear algebra on
vectors like v! x

7
Vector Addition: A+B
v+ w = ( x1 , x 2 ) + ( y1 , y 2 ) = ( x1 + y1 , x 2 + y 2 )
A+B

A
A+B = C
(use the head-to-tail method to
B combine vectors)
C
B

8
Scalar Product: av

a v = a ( x1 , x 2 ) = ( ax1 , ax 2 )

Change only the length (“scaling”), but keep direction fixed.

Sneak peek: matrix operation (Av) can change length,

direction and also dimensionality!
9
Vectors: Magnitude (Length) and Phase (direction)
v = ( x , x ,  , x )T
1 2 n
n
v =  x2 (Magnitude or “2-norm”)
i
i =1
If v = 1, a unit vector

Alternate representations:
(unit vector => pure direction)
Polar coords: (||v||, )
Complex numbers: ||v||ej
y

||v||

 “phase”
x

10
Inner (dot) Product: v.w or wTv

v 
w v.w = ( x1 , x 2 ).( y1 , y 2 ) = x1 y1 + x 2 . y 2

The inner product is a SCALAR!

v.w = ( x1 , x 2 ).( y1 , y 2 ) =|| v ||  || w || cos 

v.w = 0  v ⊥ w
If vectors v, w are “columns”, then dot product is wTv
11
Projections w/ Orthogonal Basis
 Get the component of the vector on each axis:
 dot-product with unit vector on each axis!

Aside: this is what Fourier transform does!

Projects a function onto a infinite number of orthonormal basis functions: (ej or ej2n), and
adds the results up (to get an equivalent “representation” in the “frequency” domain).

12
Projection: Using Inner Products -1

p = a (aTx)
||a|| = aTa = 1
13
Projection: Using Inner Products -2
p = a (aTb)/ (aTa)
Note: the “error vector” e = b-p
is orthogonal (perpendicular) to p.
i.e. Inner product: (b-p)Tp = 0

14
Review of Linear Algebra - 1
 Consider
w1x1+ w2x2 + b = 0 = wTx + b = w.x + b
 In the x1x2-coordinate system, this is the equation of a straight
line
Proof: Rewrite this as
x2 = (w1/w2)x1 + (1/w2) b = 0
Compare with y = m x + c
This is the equation of a straight line with slope m = (w1/w2) and
intercept c = (1/w2)
15
Review of Liner Algebra - 2
1. w.x = 0 is the eqn of a st line through origin
2. w. x + b = 0 is the eqn of any straight line
3. w. x + b = +1 is the eqn of a straight line parallel to (2)
on the positive side of Eqn (1) at a distance 1
4. w. x + b = -1 is the eqn of a straight line parallel to (2)
on the negative side of Eqn (1) at a distance 1

16
Define a Binary Classifier
▪ Define f as a classifier
▪ f = f (w, x, b) = sign (w.x + b)
▪ If f = +1, x belongs to Class 1
▪ If f = - 1, x belongs to Class 2
▪ We call f a linear classifier because
w.x + b = 0 is a straight line.
This line is called the class boundary

17

Linear Classifiers
x f yest
f(x,w,b) = sign(w x + b)
denotes +1 w x + b>0
denotes -1

How would you classify

this data?

w x + b<0

18

Linear Classifiers
x f yest
f(x,w,b) = sign(w x + b)
denotes +1
denotes -1

How would you classify

this data?

19

Linear Classifiers
x f yest
f(x,w,b) = sign(w x + b)
denotes +1
denotes -1

How would you classify

this data?

20

Linear Classifiers
x f yest
f(x,w,b) = sign(w x + b)
denotes +1
denotes -1

Any of these would be

fine..

..but which is best?

21

Linear Classifiers
x f yest
f(x,w,b) = sign(w x + b)
denotes +1
denotes -1

How would you classify

this data?

Misclassified
to +1 class
22

Classifier Margin
x f yest
f(x,w,b) = sign(w x + b)
denotes +1
denotes -1 Define the margin of a
linear classifier as the
width that the boundary
could be increased by
before hitting a
datapoint.

23

Maximum Margin
x f yest
1. Maximizing the margin is good according to intuition
and PAC theory f(x,w,b) = sign(w x + b)
denotes +1 2. Implies that only support vectors are important;
denotes -1 other training examples The
are ignorable.
maximum margin
3. Empirically it works verylinear classifier is the
very well.
linear classifier with the
maximum margin.
Support Vectors are
those datapoints that This is the simplest kind
the margin pushes up of SVM (Called an LSVM)
against

Linear SVM
24
Significance of Maximum Margin - 1
 From the perspective of statistical learning theory, the motivation for
considering the Binary Classifier SVM’s comes from theoretical bounds on
generalization error
 These bounds have two important features

25
Significance of Maximum Margin - 2
 The upper bound on the generalization error does not depend upon the
dimensionality of the space
 The bound is minimized by maximizing the margin

26
SVMs: Three Main Ideas
1. Define an optimal hyperplane for a linearly separable case:
1. One that maximizes the margin
2. Solve the optimization problem
2. Extend the definition to non-linearly separable cases:
1. Have a penalty term for misclassifications
3. Map data to a high dimensional space where it is easier to
classify with linear decision surfaces:
1. reformulate problem so that data is mapped implicitly to this space

27
Setting Up the Optimization Problem
Var1 The hyperplanes
when k=1 are
canonical planes

 
 w x + b = k
w
 
w  x + b = −k k Var2
k
 
w x + b = 0

28
An Observation
 The vector w is perpendicular  Let u and v be two vectors in the
to the Plus plane. Why? Plus plane. Then what is
 Why choose wx+b = +1 and w.(u-v)?
wx+b = -1 as the planes defining  Because sign (wx+b) has TWO
margin boundaries? degrees of freedom and what
matters in their ratio

29
Linear SVM Mathematically
x+ M=Margin Width

Also
What we know:
x+ = x- + λ w
 w . x+ + b = +1
 w . x- + b = -1 |x+ - x-|= M
 w . (x+-x-) = 2

30
Width of the Margin

 What we know:

w. x + + b = +1 M = || x - x || = || l w ||
+ -

w. x - + b = -1 = l || w || = l w.w
x+ = x- + lw 2 w.w 2
= =
+
- x || = M
|| x - w.w w.w
2
l=
w.w
31
Learning the Maximum Margin Classifier
 Given a guess of w and b we can compute
 whether all data points are in the correct half-planes
 the width of the margin
 Now we just need to write a program to search the space
of w’s and b’s to find the widest margin that matches all
the data points. How?
 Gradient descent? Matrix Inversion? EM? Newton’s
Method?
32
Learning via Quadratic Programming
 QP is a well-studied class of optimization algorithms to
maximize a quadratic function of some real-valued variables
subject to linear constraints.

33
Linear SVM Mathematically
◼ Goal: 1) Correctly classify all training data
wxi + b  1 if yi = +1
wxi + b £ -1 if yi = -1
yi ( wxi + b)  1 for all i
2
2) Maximize the Margin M=
1 T || w ||
same as minimize w w
2
◼ We can formulate a Quadratic Optimization Problem and solve for w and b
1 T
◼ Minimize F(w) = w w
2
subject to yi ( wxi + b)  1 i

34
Solving the Constrained Minimization
 Classical method is to minimize the associated un-constrained problem using
Lagrange multipliers. That is minimize

 1   N 
L( w, b) = w.w −   i  yi (( w.xi ) + b) − 1
2 i =1
 This is done by finding the saddle points:

¶L = 0 gives åai yi = 0
¶b

35
..contd

¶L = 0 gives w = åai yi xi
¶w
 Which when substituted back in L tells us that we should maximize the
functional:
N N
W (a ) = åai - å aia j yi y j (xi - x j )
1
i=1 2 i, j=1
 Subject to alphas greater than or equal to 0

36
...contd
 Subject to the constraints

ai ³ 0
 and

åa i yi = 0
i=1

37
Decision Surface
 The decision surface then is defined by

æ N ö
D(z) = sign ç åai yi (xi - z) + b ÷
è i ø
where z is a test vector

38
Solving the Optimization Problem
Find w and b such that
Φ(w) =½ wTw is minimized;
and for all {(xi ,yi)}: yi (wTxi + b) ≥ 1
◼ Need to optimize a quadratic function subject to linear constraints.
◼ Quadratic optimization problems are a well-known class of mathematical
programming problems, and many (rather intricate) algorithms exist for solving
them.
◼ The solution involves constructing a dual problem where a Lagrange multiplier
αi is associated with every constraint in the primary problem:

Find α1…αN such that

Q(α) =Σαi - ½ΣΣαiαjyiyjxiTxj is maximized and
(1) Σαiyi = 0
(2) αi ≥ 0 for all αi

39
The Optimization Problem Solution
◼ The solution has the form:
w =Σαiyixi b= yk- wTxk for any xk such that αk 0

◼ Each non-zero αi indicates that corresponding xi is a support vector.

◼ Then the classifying function will have the form:
f(x) = ΣαiyixiTx + b
◼ Notice that it relies on an inner product between the test point x and the
support vectors xi – we will return to this later.
◼ Also keep in mind that solving the optimization problem involved computing
the inner products xiTxj between all pairs of training points.

40
Dataset with noise

denotes +1 ◼ Hard Margin: So far we required

denotes -1 ◼ all data points be classified correctly
◼ Allowed NO training errors
◼ What if the training set is noisy?
- Solution 1: use very powerful kernels

OVERFITTING!

41
Soft Margin Classification
Slack variables ξi can be added to allow misclassification of
difficult or noisy examples.

What should our quadratic

optimization criterion be?
Minimize
R
1
w.w + C  ε k
2 k =1

42
Hard Margin Vs Soft Margin
◼ The old formulation:
Find w and b such that
Φ(w) =½ wTw is minimized and for all {(xi ,yi)}
yi (wTxi + b) ≥ 1

◼ The new formulation incorporating slack variables:

Find w and b such that

Φ(w) =½ wTw + CΣξi is minimized and for all {(xi ,yi)}
yi (wTxi + b) ≥ 1- ξi and ξi ≥ 0 for all i

◼ Parameter C can be viewed as a way to control overfitting.

◼ This is “constrained optimization”

43
Linear SVMs: Summary
◼ The classifier is a separating hyperplane.
◼ Most “important” training points are support vectors; they define the
hyperplane.
◼ Quadratic optimization algorithms can identify which training points xi are
support vectors with non-zero Lagrangian multipliers αi.
◼ Both in the dual formulation of the problem and in the solution training points
appear only inside dot productas:
Find α1…αN such that
Q(α) =Σαi - ½ΣΣαiαjyiyjxiTxj is maximized and
(1) Σαiyi = 0
(2) 0 ≤ αi ≤ C for all αi

f(x) = ΣαiyixiTx + b

44
Why Go to Dual Formulation?
 The vector w could be infinite-dimensional and poses problems
computationally
 There are only as many Lagrange variables, “alphas”, as there are training
instances
 As a bonus, it turns out that the “alphas” are non-zero only for the support
vectors (far fewer in number than the training data)

45
Non-linear SVM
Disadvantages of Linear Decision Surfaces
Var1

Var2
47
Advantages of Non-Linear Surfaces
Var1

Var2
48
Linear Classifiers in High-Dimensional Spaces
Constructed Feature 2
Var1

Var2
Constructed Feature 1

Find function (x) to map to a

different space

49
Non-linear SVMs
◼ Datasets that are linearly separable with some noise work out great:

0 x

◼ But what are we going to do if the dataset is just too hard?

0 x

◼ How about… mapping data to a higher-dimensional space:

0 x

50
 The last figure can be thought of also as a nonlinear basis function in two
dimensions
 That is, we used the basis z = (x,x2)

51
Non-linear SVMs: Feature spaces
◼ General idea: the original input space can always be mapped to some higher-
dimensional feature space where the training set is separable:

Φ: x → φ(x)

52
What is the Mapping Function?
 The idea is to achieve linear separation by mapping the data into a higher
dimensional space
 Let us call Φ the function that achieves this mapping.
 What is this Φ?

53
Let us recall the formula we used earlier -
Linear SVM lecture:
The dual formulation

54
Dual Formula – 1 (Linear SVM)
R
1 R R

k =1
 k −    k l Qkl ,Qkl = yk yl ( xk .xl )
2 k =1 l =1

subject to R
0   k  C fork   k y k = 0
k =1

 Notice the dot product

55
Dual Formula – 2 (Linear SVM soft margin)
R
w = åa k yk xk
k=1

b = yK (1- x K ) - xK . wK
where
K = arg max k a k
Now classify, using f (x, w,b) = sign(w. x + b)
56
For the Non-linear Case....
 Let us replace the inner product (xi . xj) by
Φ(xi). Φ(xj) to define the operations in the new higher
dimensional space
 If there is a “kernel function” K such that
K(xi, xj) = Φ(xi). Φ(xj) = Φ(xi)TΦ(xj)
then we do not need to know Φ explicitly.
 This strategy is preferred because the alternative of working
with Φ is expensive, computationally.

57
Dual in New (Feature) Space
R
1 R R
max åa k - å åa ka l Qkl where Qkl = yk yl (F(xk ).F(xl ))
k=1 2 k=1 l=1
R
s.t. 0 £ a k £ C, "k and åa k yk = 0
k=1

Now define

w= å a k yk F(xk ); b = yK (1- x K ) - xK . wK
k s.t. a k >0

where K = argmax k (a k )K = arg max k (a k )

Classify with f (x, w, b) = sign(w. F(x) + b)

58
Computational Burden
 Because we’re working in a higher-dimension space (and potentially even an
infinite-dimensional space), calculating φ(xi)T φ(xj) may be intractable.
 We must do R2/2 dot products to evaluate Q
 Each dot product requires m2/2 additions and multiplications
 The whole thing costs R2m2/4.
 Too high!!! Or, does it? Really?

59
How to Avoid the Burden?
 Instead, we have the kernel trick, which tells us that
 K(xi, xj) = (1 + xi . xj)2 = φ(xi) . φ(xj)
for the given φ. Thus, we can simplify our
calculations.
 Re-writing the dual in terms of the kernel,
é ù
maxa ³0 êåai - åaia j yi y j K(xi , x j )ú
1
êë 2 i, j úû

60
Decision Function
 To classify a novel instance x once you’ve learned the optimal αi parameters,
all you have to do is calculate

f (x) = sign(w. x + b) = sign(åai yi K(xi , x) + b)

(by setting w = åa i yif (xi ) and using the kernel trick).

61
A Note
 Note that αi is only non-zero for instances φ(xi) on or
near the boundary - those are called the support vectors
since they alone specify the decision boundary.
 We can toss out the other data points once training is
complete. Thus, we only sum over the xi which constitute
the support vectors.

62
Consider a Φ. Φas shown below
é 1 ù é 1 ù
ê ú ê ú
ê 2 a1 ú ê 2b1 ú
ê ú ê ú
ê ú ê ú
ê 2am ú ê 2bm ú
ê ú ê ú
ê a2
1 ú ê b12 ú
ê ú ê ú
ê a2
2
ú ê b22 ú
ê ú ê ú
ê ú ê ú
ê am
2
ú ê bm
2
ú
ê ú ê ú
2a1a2 2b1b2
F(a). F(b) = ê ú. ê ú
ê 2a1a3 ú ê 2b1b3 ú
ê ú ê ú
ê ú ê ú
ê 2a1am ú ê 2b1bm ú
ê ú ê ú
ê 2a2 a3 ú ê 2b2 b3 ú
ê ú ê ú
ê ú ê ú
ê 2a2 am ú ê 2b2 bm ú
ê ú ê ú
ê ú ê ú
ê ú ê ú
ê 2am-1am ú ê 2bm-1bm ú
ê
ë ú
û ê
ë ú
û

63
Collecting terms in the dot product
 First term = 1 +
 Next m terms = m

 Next m terms = å 2a b i i
i=1
m

 Rest = åa 2
i bi2
i=1
m
m m å 2a b
å å 2a a b b
i i
i=1
i j i j
 Therefore i=1 j=i+1

m m m m
F(a). F(b) = 1+ 2å ai bi + å ai b + å å 2ai a j bi b j 2
i
2

i=1 i=1 i=1 j=i+1

64
Out of Curiosity
(1+ a. b) = (a. b) + 2(a. b) +1
2 2

æ ö æ m ö
m 2

= ç å ai bi ÷ + 2 ç å ai bi ÷ +1
è i=1 ø è i=1 ø
m m æ m ö
= å å ai bi a j b j + 2 ç å ai bi ÷ +1
i=1 j=1 è i=1 ø
m m m æ m ö
= å (ai bi )2 + 2å å ai bi a j b j + 2 ç å ai bi ÷ +1
i=1 i=1 j=i+1 è i=1 ø

65
Both are Same
 Comparing term by term, we see
 Φ.Φ = (1 + a.b)2
 But computing the right side is lot more efficient, O(m) (m
additions and multiplications)
 Let us call (1 + a.b)2 = K(a,b) = Kernel

66
Φ in “Kernel Trick” Example
2-dimensional vectors x = [x1 x2];
Let K(xi,xj)=(1 + xiTxj)2,
Need to show that K(xi,xj)= φ(xi) Tφ(xj):
K(xi,xj) = (1 + xiTxj)2
= 1+ xi12xj12 + 2 xi1xj1 xi2xj2+ xi22xj22 + 2xi1xj1 + 2xi2xj2
= [1 xi12 √2 xi1xi2 xi22 √2xi1 √2xi2]T
[1 xj12 √2 xj1xj2 xj22 √2xj1 √2xj2]
= φ(xi) Tφ(xj),
where φ(x) = [1 x12 √2 x1x2 x22 √2x1 √2x2]

67
Other Kernels
 Beyond polynomials there are other high dimensional basis
functions that can be made practical by finding the right kernel
function

68
Examples of Kernel Functions
◼ Linear: K(xi,xj)= xi Txj

◼ Polynomial of power p: K(xi,xj)= (1+ xi Txj)p

◼ Gaussian (radial-basis function network):

2
xi − x j
K ( x i , x j ) = exp(− )
2 2

◼ Sigmoid: K(xi,xj)= tanh(β0xi Txj + β1)

69
 The function we end up optimizing is

R
1 R R
åak - 2 ååa kalQkl where Qkl = yk yl K(xk , xl )
k=1 k=1 l=1
R
s.t. 0 £ a k £ C, "k and åa k yk = 0
k=1

70
Multi-class classification
Multi-class classification
 One versus all classification
Multi-class SVM
Multi-class SVM
SVM Software
 Python: scikit-learn module
 LibSVM (C++)
 SVMLight (C)
 Torch (C++)
 Weka (Java)
…

75
Research
 One-class SVM (unsupervised learning): outlier detection
 Weibull-calibrated SVM (W-SVM) / PI -SVM: open set recognition
Homework
 CIFAR-10 image recognition using SVM
 The CIFAR-10 dataset consists of 60000 32x32 color images in 10 classes, with
6000 images per class. * There are 50000 training images and 10000 test images.
 These are the classes in the dataset: airplane, automobile, bird, cat, deer, dog, frog,
horse, ship, truck

 Hint : https://github.com/wikiabhi/Cifar-10
https://github.com/mok232/CIFAR-10-Image-Classification

Support Vector Machines
No ratings yet
Support Vector Machines
57 pages
12 - Bài Toán Phân L P - SVM - v2
No ratings yet
12 - Bài Toán Phân L P - SVM - v2
138 pages
21 Support Vector Machines 03-10-2024
No ratings yet
21 Support Vector Machines 03-10-2024
72 pages
ML - Lec 8-SVM As A Linear Classifier
No ratings yet
ML - Lec 8-SVM As A Linear Classifier
78 pages
Support Vector Machine (SVM)
No ratings yet
Support Vector Machine (SVM)
74 pages
Lec 06 SVM
No ratings yet
Lec 06 SVM
34 pages
Final - Support Vector Machine - Class - Modifie
No ratings yet
Final - Support Vector Machine - Class - Modifie
69 pages
SVM PCA Kmeans
No ratings yet
SVM PCA Kmeans
121 pages
Chapter 8
No ratings yet
Chapter 8
52 pages
Lecture 7 - SVM
No ratings yet
Lecture 7 - SVM
125 pages
L5 SVMs
No ratings yet
L5 SVMs
37 pages
cs221 Lecture11
No ratings yet
cs221 Lecture11
71 pages
Introduction To Support Vector Machines: Andrew Moore CMU
No ratings yet
Introduction To Support Vector Machines: Andrew Moore CMU
40 pages
Chapter 8
No ratings yet
Chapter 8
103 pages
ELP On Mushroom Cultivation
No ratings yet
ELP On Mushroom Cultivation
19 pages
Lecture Slides-Week12
100% (1)
Lecture Slides-Week12
41 pages
Support Vector Machine
No ratings yet
Support Vector Machine
19 pages
Support Vector Machines
No ratings yet
Support Vector Machines
33 pages
22-Kernel Tricks Shit
No ratings yet
22-Kernel Tricks Shit
43 pages
SVM Student
No ratings yet
SVM Student
40 pages
315 F19 14 SVM 1
No ratings yet
315 F19 14 SVM 1
33 pages
5d. Support Vector Machine
No ratings yet
5d. Support Vector Machine
2 pages
Time Series Forecasting by Using Wavelet Kernel SVM
No ratings yet
Time Series Forecasting by Using Wavelet Kernel SVM
52 pages
ML Lec SVM Linear
No ratings yet
ML Lec SVM Linear
19 pages
Unit - 2
No ratings yet
Unit - 2
15 pages
Lecture 18 - SVM
No ratings yet
Lecture 18 - SVM
54 pages
SVM-CDing2024 11 15
No ratings yet
SVM-CDing2024 11 15
54 pages
Support Vector Machines
No ratings yet
Support Vector Machines
13 pages
SEL-487B-1: Bus Differential and Breaker Failure Relay
100% (1)
SEL-487B-1: Bus Differential and Breaker Failure Relay
726 pages
Lecture 9 - SVMs
No ratings yet
Lecture 9 - SVMs
8 pages
Intro SVM New Example PDF
100% (1)
Intro SVM New Example PDF
56 pages
SVM Tutorial
No ratings yet
SVM Tutorial
34 pages
Support Vector Machine Classifiers
No ratings yet
Support Vector Machine Classifiers
44 pages
Support Vector Machine
No ratings yet
Support Vector Machine
52 pages
SVM Notes
No ratings yet
SVM Notes
40 pages
Lec5 Support Vector Machine
No ratings yet
Lec5 Support Vector Machine
28 pages
Machine Learning - Open Elective - Part III
No ratings yet
Machine Learning - Open Elective - Part III
90 pages
BASIC CBLM9 Work in A Diverse Environment
100% (2)
BASIC CBLM9 Work in A Diverse Environment
55 pages
Chapter 5 - Support Vector Machine: Prepared By: Shier Nee, SAW
No ratings yet
Chapter 5 - Support Vector Machine: Prepared By: Shier Nee, SAW
44 pages
Support Vector Machines: Javier B Ejar Cbea
No ratings yet
Support Vector Machines: Javier B Ejar Cbea
44 pages
W12 SVM
No ratings yet
W12 SVM
52 pages
4 - SVM
No ratings yet
4 - SVM
58 pages
Support Vector Machine
No ratings yet
Support Vector Machine
45 pages
Support Vector Machines: (Vapnik, 1979)
No ratings yet
Support Vector Machines: (Vapnik, 1979)
34 pages
Support Vector Machine
No ratings yet
Support Vector Machine
55 pages
SVM Tutorial
100% (1)
SVM Tutorial
34 pages
Lec06 SVM
No ratings yet
Lec06 SVM
25 pages
Fluorescence Micros
No ratings yet
Fluorescence Micros
22 pages
Tutorial4 SVM
No ratings yet
Tutorial4 SVM
8 pages
Introduction To: Support Vector Machines
No ratings yet
Introduction To: Support Vector Machines
53 pages
SVM Tutorial
No ratings yet
SVM Tutorial
31 pages
Handbook of Econometrics Volume 3
No ratings yet
Handbook of Econometrics Volume 3
620 pages
Background of The Study vs. Literature Review
100% (3)
Background of The Study vs. Literature Review
6 pages
Intoduction To Ista
No ratings yet
Intoduction To Ista
14 pages
Dissertation Zusammenfassung Schreiben
100% (2)
Dissertation Zusammenfassung Schreiben
6 pages
GS4 Ethics Notes by @CSEWhy
No ratings yet
GS4 Ethics Notes by @CSEWhy
26 pages
Support Vector Machine
No ratings yet
Support Vector Machine
50 pages
SVM Scribe Notes
No ratings yet
SVM Scribe Notes
16 pages
10 SVM
No ratings yet
10 SVM
23 pages
Support Vector Machine
No ratings yet
Support Vector Machine
35 pages
CS-13410 Introduction To Machine Learning
No ratings yet
CS-13410 Introduction To Machine Learning
33 pages
Support Vector Machines: Xiaojin Zhu
No ratings yet
Support Vector Machines: Xiaojin Zhu
41 pages
SVM PRESENTATION
No ratings yet
SVM PRESENTATION
34 pages
CE118 Project Part 1
No ratings yet
CE118 Project Part 1
42 pages
Data Science For Civil Engineering Unit 3 Notes-1
No ratings yet
Data Science For Civil Engineering Unit 3 Notes-1
29 pages
Unit 1 Family Life Lesson 2 Language
No ratings yet
Unit 1 Family Life Lesson 2 Language
76 pages
Diversity Race Module Allen
No ratings yet
Diversity Race Module Allen
47 pages
Global Citizenship Essay
No ratings yet
Global Citizenship Essay
2 pages
American Culture and Drug Abuse
No ratings yet
American Culture and Drug Abuse
1 page
Prisoners Rights Presentation
No ratings yet
Prisoners Rights Presentation
16 pages
Performance Task in ENGLISH 10
No ratings yet
Performance Task in ENGLISH 10
2 pages
HSC English 2nd Paper 2024 (All Board)
No ratings yet
HSC English 2nd Paper 2024 (All Board)
1 page
Spanos - Past-Life Ids Ufos Satanic Abuse
No ratings yet
Spanos - Past-Life Ids Ufos Satanic Abuse
8 pages
Probability
No ratings yet
Probability
38 pages
1 s2.0 S0924013620301187 Main
No ratings yet
1 s2.0 S0924013620301187 Main
13 pages
Exemplos Betas
No ratings yet
Exemplos Betas
12 pages
Support Vector Machines (SVM) : Y.H. Hu
No ratings yet
Support Vector Machines (SVM) : Y.H. Hu
25 pages
Career Opportunities - Food Security Cluster Coordinator - WFP
No ratings yet
Career Opportunities - Food Security Cluster Coordinator - WFP
4 pages
San Chit
No ratings yet
San Chit
2 pages
English Questions
No ratings yet
English Questions
6 pages
Effects of Habitat Fragmentation On The Persistence of Medium and Large Mammal Species in The Brazilian Savanna of Goiás State
No ratings yet
Effects of Habitat Fragmentation On The Persistence of Medium and Large Mammal Species in The Brazilian Savanna of Goiás State
9 pages
Investigational Device Exemption (IDE) - FDA
No ratings yet
Investigational Device Exemption (IDE) - FDA
2 pages
Iaad 2023
No ratings yet
Iaad 2023
4 pages
ACR-Orientation Work Arrangement
No ratings yet
ACR-Orientation Work Arrangement
10 pages
Jayson Dr. Palisoc Domain 3 Diversity of Learners
No ratings yet
Jayson Dr. Palisoc Domain 3 Diversity of Learners
7 pages
SVM Tutorial
No ratings yet
SVM Tutorial
34 pages
Student's Solutions Manual and Supplementary Materials for Econometric Analysis of Cross Section and Panel Data, second edition
From Everand
Student's Solutions Manual and Supplementary Materials for Econometric Analysis of Cross Section and Panel Data, second edition
Jeffrey M. Wooldridge
No ratings yet
A-level Maths Revision: Cheeky Revision Shortcuts
From Everand
A-level Maths Revision: Cheeky Revision Shortcuts
Scool Revision
3.5/5 (8)
Calculus I Essentials
From Everand
Calculus I Essentials
Editors of REA
1/5 (1)
Calculus-II (Mathematics) Question Bank
From Everand
Calculus-II (Mathematics) Question Bank
Mohmmad Khaja Shareef
No ratings yet

10 SVM

Uploaded by

10 SVM

Uploaded by

Trịnh Tấn Đạt

Khoa CNTT – Đại Học Sài Gòn

Change only the length (“scaling”), but keep direction fixed.

Sneak peek: matrix operation (Av) can change length,

The inner product is a SCALAR!

v.w = ( x1 , x 2 ).( y1 , y 2 ) =|| v ||  || w || cos 

Aside: this is what Fourier transform does!

How would you classify

How would you classify

How would you classify

Any of these would be

..but which is best?

How would you classify

Find α1…αN such that

◼ Each non-zero αi indicates that corresponding xi is a support vector.

denotes +1 ◼ Hard Margin: So far we required

What should our quadratic

◼ The new formulation incorporating slack variables:

Find w and b such that

◼ Parameter C can be viewed as a way to control overfitting.

Find function (x) to map to a

◼ But what are we going to do if the dataset is just too hard?

◼ How about… mapping data to a higher-dimensional space:

 Notice the dot product

where K = argmax k (a k )K = arg max k (a k )

Classify with f (x, w, b) = sign(w. F(x) + b)

f (x) = sign(w. x + b) = sign(åai yi K(xi , x) + b)

i=1 i=1 i=1 j=i+1

◼ Polynomial of power p: K(xi,xj)= (1+ xi Txj)p

◼ Gaussian (radial-basis function network):

◼ Sigmoid: K(xi,xj)= tanh(β0xi Txj + β1)

You might also like