Lec5 Support Vector Machine
Overview
This is a constrained optimization problem: split the data with a hyperplane.
[Figure: two classes of points in the (x1, x2) plane separated by a hyperplane; the points closest to the hyperplane are the support vectors.]
This hyperplane best splits the data because it is as far as possible from these support vectors, which is another way of saying we maximized the margin.
Intuition behind SVM
• Points (instances) are like vectors p = (x1, x2, ..., xn).
• SVM finds the closest two points from the two classes (see figure); they support (define) the best separating line/plane.
• The planes through these support vectors satisfy
  w · x+ + b = +1  and  w · x- + b = -1.
• Subtracting the two equations and projecting (x+ - x-) onto the unit normal w/||w|| gives the width of the margin (a quick numeric check follows):
  width = (x+ - x-) · w/||w|| = 2/||w||
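As a numeric sanity check of this width formula (my own illustration, not part of the slides; the weight vector and points are made up), take one point on each margin plane and project their difference onto the unit normal:

```python
import numpy as np

# Hypothetical weight vector and bias, chosen so the planes are easy to write down.
w = np.array([3.0, 4.0])          # ||w|| = 5
b = -5.0

# A point on the plus plane (w·x + b = +1) and one on the minus plane (w·x + b = -1).
x_plus = np.array([2.0, 0.0])                 # 3*2 + 4*0 - 5 = +1
x_minus = x_plus - 2.0 * w / np.dot(w, w)     # shift by -2w/||w||^2, so w·x + b = -1

width = np.dot(x_plus - x_minus, w / np.linalg.norm(w))
print(width, 2 / np.linalg.norm(w))           # both 0.4, i.e. width = 2 / ||w||
```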
Support Vector Machine
Linearly separable data
Margin = 2 / ||w||
[Figure: linearly separable data with the decision boundary w^T x + b = 0, the two margin planes w^T x + b = 1 and w^T x + b = -1, and the support vectors lying on the margin planes.]
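The same picture in code, as a minimal sketch assuming scikit-learn is available (the toy data below is made up): fit a linear SVM with a large C to approximate the hard margin, then read off the support vectors and the margin 2 / ||w||.

```python
import numpy as np
from sklearn.svm import SVC

# Made-up linearly separable data: class -1 on the left, class +1 on the right.
X = np.array([[1.0, 1.0], [1.5, 0.5], [1.0, -1.0],
              [4.0, 1.0], [4.5, -0.5], [5.0, 0.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # large C ~ hard margin

w = clf.coef_[0]
b = clf.intercept_[0]
print("support vectors:\n", clf.support_vectors_)
print("w =", w, "b =", b)
print("margin = 2 / ||w|| =", 2 / np.linalg.norm(w))
```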
SVM as a minimization problem
• Maximizing 2 / ||w|| is the same as minimizing ||w|| / 2 (equivalently ||w||^2 / 2),
  subject to yi (w · xi + b) >= 1 for every training point (xi, yi) (a small worked sketch follows).
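To make this concrete, here is a small sketch (my own aside, not from the slides) that hands the hard-margin problem min ||w||^2 / 2 subject to yi (w · xi + b) >= 1 to a general-purpose constrained optimizer:

```python
import numpy as np
from scipy.optimize import minimize

# Made-up separable data with labels +1 / -1, as in the slides.
X = np.array([[1.0, 1.0], [1.0, -1.0], [4.0, 0.0], [5.0, 1.0]])
y = np.array([-1, -1, 1, 1])

def objective(params):
    w = params[:-1]
    return 0.5 * np.dot(w, w)               # minimize ||w||^2 / 2

def margin_constraints(params):
    w, b = params[:-1], params[-1]
    return y * (X @ w + b) - 1.0            # each entry must be >= 0

res = minimize(objective, x0=np.zeros(X.shape[1] + 1),
               constraints={"type": "ineq", "fun": margin_constraints},
               method="SLSQP")

w, b = res.x[:-1], res.x[-1]
print("w =", w, "b =", b, "margin =", 2 / np.linalg.norm(w))
```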
[Figure: two classes of training points labelled 1-10 with their Lagrange multipliers; most points have α = 0, and only the support vectors have α > 0 (α1 = 0.8, α8 = 0.6, α6 = 1.4).]
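In library terms (my own aside, assuming scikit-learn): a fitted SVC keeps only the points with non-zero α, i.e. the support vectors, which is exactly what the figure is showing.

```python
import numpy as np
from sklearn.svm import SVC

# Made-up separable data; most points should end up with alpha = 0.
X = np.array([[1.0, 1.0], [1.0, -1.0], [0.0, 0.0],
              [4.0, 0.0], [5.0, 1.0], [5.0, -1.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)

print(clf.support_)      # indices of the support vectors; all other points have alpha = 0
print(clf.dual_coef_)    # y_i * alpha_i, stored for the support vectors only
```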
Example
• Here we select 3 support vectors to start with.
• They are S1, S2 and S3.
[Figure: the three support vectors plotted in the (x1, x2) plane: S1 = (2, 1), S2 = (2, -1) and S3 = (4, 0).]
Example
• Here we will use vectors augmented with a 1 as a bias input,
and for clarity we will differentiate these with an over-tilde.
That is:
s̃1 = (2, 1, 1),  s̃2 = (2, -1, 1),  s̃3 = (4, 0, 1)
Example
α1 s̃1 · s̃1 + α2 s̃2 · s̃1 + α3 s̃3 · s̃1 = -1   (negative class)
α1 s̃1 · s̃2 + α2 s̃2 · s̃2 + α3 s̃3 · s̃2 = -1   (negative class)
α1 s̃1 · s̃3 + α2 s̃2 · s̃3 + α3 s̃3 · s̃3 = +1   (positive class)
• Substituting s̃1 = (2, 1, 1), s̃2 = (2, -1, 1) and s̃3 = (4, 0, 1) gives:
6α1 + 4α2 + 9α3 = -1
4α1 + 6α2 + 9α3 = -1
9α1 + 9α2 + 17α3 = +1
• Solving these 3 simultaneous equations we get: α1 = α2 = -3.25 and α3 = 3.5 (checked numerically below).
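The three simultaneous equations can be verified with a minimal NumPy sketch (the coefficients are the dot products from the slide):

```python
import numpy as np

# Gram matrix of the augmented support vectors and the target labels (-1, -1, +1).
A = np.array([[6.0, 4.0, 9.0],
              [4.0, 6.0, 9.0],
              [9.0, 9.0, 17.0]])
t = np.array([-1.0, -1.0, 1.0])

alphas = np.linalg.solve(A, t)
print(alphas)    # [-3.25 -3.25  3.5 ]
```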
• The hyperplane that discriminates the positive class from the negative class is given by:
  w̃ = Σi αi s̃i
  w̃ = (-3.25) · (2, 1, 1) + (-3.25) · (2, -1, 1) + (3.5) · (4, 0, 1) = (1, 0, -3)
• Since the vectors were augmented with a 1 as a bias input, the last entry of w̃ is the offset b. The separating hyperplane is therefore y = w · x + b with w = (1, 0) and b = -3.
[Figure: the training points and the resulting separating hyperplane, the vertical line x1 = 3.]
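The final arithmetic can be checked the same way; a short NumPy sketch of the slide's computation of w̃, plus a sanity check of the decision function on the three support vectors:

```python
import numpy as np

S = np.array([[2.0,  1.0, 1.0],    # s~1
              [2.0, -1.0, 1.0],    # s~2
              [4.0,  0.0, 1.0]])   # s~3
alphas = np.array([-3.25, -3.25, 3.5])

w_tilde = alphas @ S               # weighted sum of the augmented support vectors
print(w_tilde)                     # [ 1.  0. -3.]  ->  w = (1, 0), b = -3

w, b = w_tilde[:2], w_tilde[2]
for s in S:
    print(np.dot(w, s[:2]) + b)    # -1, -1, +1: each support vector sits on its margin plane
```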
Kernel trick
SVM Algorithm
[Figure: an easy 1-dimensional dataset centred at x = 0, separable by a single threshold, with a positive "plane" on one side and a negative "plane" on the other.]
Harder 1-dimensional dataset
[Figure: a 1-dimensional dataset centred at x = 0 in which the two classes cannot be separated by any single threshold.]
• Remember how permitting non-linear basis functions made linear regression so much nicer? Let's permit them here too:
  zk = (xk, xk^2)
[Figure: after mapping each point xk to zk = (xk, xk^2), the harder 1-dimensional dataset becomes linearly separable in two dimensions.]
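A sketch of this basis-function trick in code (scikit-learn assumed; the 1-D data is made up to mimic the "harder" dataset): lift each x to z = (x, x^2) and a plain linear SVM separates the lifted points.

```python
import numpy as np
from sklearn.svm import SVC

# Made-up "harder" 1-D dataset: the negative class sits between two groups of positives.
x = np.array([-3.0, -2.5, -2.0, -0.5, 0.0, 0.5, 2.0, 2.5, 3.0])
y = np.array([1, 1, 1, -1, -1, -1, 1, 1, 1])

# No single threshold on x separates the classes, but the lifted points are separable.
Z = np.column_stack([x, x ** 2])

clf = SVC(kernel="linear", C=1e6).fit(Z, y)
print(clf.score(Z, y))   # expect 1.0: perfectly separated in the lifted space
```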
Non-linear SVMs: Feature spaces
Φ: x → φ(x)
[Figure: data that is not linearly separable in the input space becomes linearly separable after mapping it with Φ: x → φ(x) into a higher-dimensional feature space.]
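The kernel trick supplies the dot products φ(x) · φ(x') directly, without ever computing φ. A minimal sketch (my own, assuming scikit-learn): pass SVC a custom kernel K(u, v) = (u · v + 1)^2, which corresponds implicitly to a quadratic feature map.

```python
import numpy as np
from sklearn.svm import SVC

# The same made-up "harder" 1-D dataset, kept in its original 1-feature form.
x = np.array([[-3.0], [-2.5], [-2.0], [-0.5], [0.0], [0.5], [2.0], [2.5], [3.0]])
y = np.array([1, 1, 1, -1, -1, -1, 1, 1, 1])

def quad_kernel(U, V):
    # Gram matrix of the implicit quadratic feature map.
    return (U @ V.T + 1.0) ** 2

clf = SVC(kernel=quad_kernel, C=1e6).fit(x, y)
print(clf.score(x, y))   # expect 1.0: separable in the implicit feature space
```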