SVM Intro
Nipun Batra
April 25, 2023
IIT Gandhinagar
SUPPORT VECTOR MACHINES: A CLASSIFICATION TECHNIQUE
[Figure: two classes of points plotted against two features, with a separating hyperplane drawn between them]
[Figure: the same data with the separating hyperplane and the margin on either side of it marked]
[Figure: the separating hyperplane with its margin; the points lying on the margin are labelled as support vectors]
HYPERPLANE VS. # DIMENSIONS
In 1D, the separating hyperplane is a point.
In 2D, it is a line.
In 3D, it is a plane.
[Figure: sketches of a point separating points on a line, a line separating points in the plane, and a plane separating points in 3D]
WHICH HYPERPLANE?
[Figure: the same linearly separable data drawn several times, each time with a different hyperplane that separates the two classes, showing that many separating hyperplanes exist]
EQUATION OF HYPERPLANE
How to define it?
Let P₀ be a fixed point on the plane and P be any other point on the plane.
Let w̄ be the vector normal (⊥) to the plane at P₀.
Since P and P₀ both lie on the plane, the vector x̄ − x̄₀ (from P₀ to P) lies in the plane, so w̄ ⊥ (x̄ − x̄₀):
w̄ · (x̄ − x̄₀) = 0
or, w̄ · x̄ − w̄ · x̄₀ = 0
or, w̄ · x̄ + b = 0, where b = −w̄ · x̄₀
[Figure: a plane containing P and P₀, the normal vector w̄ at P₀, and the origin marked]
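As a quick numerical check of this definition, here is a minimal sketch assuming NumPy; the plane and the points are made-up illustrative values:

import numpy as np

# Normal vector and a fixed point on the plane (illustrative values).
w = np.array([2.0, -1.0, 3.0])
x0 = np.array([1.0, 1.0, 1.0])
b = -w @ x0                      # b = -w . x0

# Any point of the form x0 + v with v orthogonal to w lies on the plane.
v = np.array([1.0, 2.0, 0.0])    # w . v = 2 - 2 + 0 = 0
x = x0 + v
print(w @ x + b)                 # ~0, so x satisfies w . x + b = 0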
DISTANCE BETWEEN PARALLEL HYPERPLANES
Consider two parallel hyperplanes w̄ · x̄ + b₁ = 0 and w̄ · x̄ + b₂ = 0.
[Figure: two parallel hyperplanes with the common normal vector w̄ and the origin marked]
Distance between 2 parallel hyperplanes
The distance between w̄ · x̄ + b₁ = 0 and w̄ · x̄ + b₂ = 0 is |b₁ − b₂| / ||w̄||.
1
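A small numerical sanity check of this distance formula (a sketch assuming NumPy; the vectors are arbitrary illustrative values):

import numpy as np

w = np.array([3.0, 4.0])         # ||w|| = 5
b1, b2 = 2.0, -8.0

# Closed-form distance between w.x + b1 = 0 and w.x + b2 = 0
d_formula = abs(b1 - b2) / np.linalg.norm(w)    # |2 - (-8)| / 5 = 2

# Check: take a point on the first hyperplane and move along w/||w||
# until w.x + b2 = 0; the step length is the distance.
x1 = -b1 * w / (w @ w)                          # a point with w.x1 + b1 = 0
step = -(w @ x1 + b2) / np.linalg.norm(w)
print(d_formula, abs(step))                     # both 2.0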
FORMULATION
[Figure: linearly separable data with the +1 class on one side and the −1 class on the other; the separating hyperplane w̄ · x̄ + b = 0 lies between the parallel hyperplanes w̄ · x̄ + b = +1 and w̄ · x̄ + b = −1]

MARGIN: the distance between the hyperplanes w̄ · x̄ + b = +1 and w̄ · x̄ + b = −1 is
margin = 2 / ||w̄||

GOAL: MAXIMIZE the margin 2 / ||w̄||, i.e. MINIMIZE ||w̄||,
subject to correctly labelling all points:
if yi = +1, then w̄ · x̄i + b ≥ +1
if yi = −1, then w̄ · x̄i + b ≤ −1
i.e. yi (w̄ · x̄i + b) ≥ 1
Primal Formulation
Objective
Minimize (1/2) ||w̄||²
s.t. yi (w̄ · x̄i + b) ≥ 1 ∀i

where, for w̄ = (w1, w2, ...wn),
||w̄|| = √(w̄ᵀ w̄) = √(w1² + w2² + ... + wn²)
2
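To make the objective and constraints concrete, here is a tiny check on made-up 2D data (the points, w and b are purely illustrative, not part of the lecture):

import numpy as np

X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w = np.array([0.5, 0.5])         # a candidate hyperplane w.x + b = 0
b = 0.0

print(0.5 * np.dot(w, w))        # objective (1/2)||w||^2 = 0.25
print(y * (X @ w + b))           # constraint values yi(w.xi + b); all >= 1, so feasible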
EXAMPLE (IN 1D)
[Figure: labelled points on a number line with the separating hyperplane, here a single point, between the two classes]
Simple Exercise
x y
1 1
2 1
−1 −1
−2 −1
Separating Hyperplane: wx + b = 0
3
Simple Exercise
x1   y
1    1
2    1
−1   −1
−2   −1

Applying yi (w1 xi + b) ≥ 1 to each point:
⇒ 1(w1 + b) ≥ 1
⇒ 1(2w1 + b) ≥ 1
⇒ −1(−w1 + b) ≥ 1
⇒ −1(−2w1 + b) ≥ 1
4
Simple Exercise
wmin = 1, b = 0
Separating hyperplane: w · x + b = 0, i.e. x = 0
5
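A minimal sketch of solving this primal QP numerically for the toy dataset, assuming the cvxpy package is installed (the variable names are mine):

import cvxpy as cp
import numpy as np

X = np.array([1.0, 2.0, -1.0, -2.0])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = cp.Variable()
b = cp.Variable()
constraints = [y[i] * (w * X[i] + b) >= 1 for i in range(len(X))]
prob = cp.Problem(cp.Minimize(0.5 * cp.square(w)), constraints)
prob.solve()
print(w.value, b.value)   # should come out close to w = 1, b = 0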
Simple Exercise
6
Primal Formulation is a Quadratic Program
Generally:
⇒ Minimize Quadratic(x)
⇒ such that, Linear(x)

Question
For x = (x1, x2),
minimize (1/2) ||x||²
s.t. x1 + x2 − 1 ≥ 0
7
[Figure: hand sketch of minimizing the quadratic objective subject to the linear constraint, with the solution point marked]
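The same kind of solver answers the question above; a sketch assuming SciPy is available (by symmetry the minimizer should be x = (0.5, 0.5)):

import numpy as np
from scipy.optimize import minimize

objective = lambda x: 0.5 * np.dot(x, x)                  # (1/2)||x||^2
con = {"type": "ineq", "fun": lambda x: x[0] + x[1] - 1}  # x1 + x2 - 1 >= 0
res = minimize(objective, x0=np.array([1.0, 1.0]), constraints=[con])
print(res.x)                                              # approximately [0.5, 0.5]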
Converting to Dual Problem
Minimize (1/2) ||w̄||²
s.t. yi (w̄ · x̄i + b) ≥ 1 ∀i

L(w̄, b, α1, α2, ...αn) = (1/2) Σ_{i=1}^{d} wi² − Σ_{i=1}^{N} αi (yi (w̄ · x̄i + b) − 1),   ∀ αi ≥ 0

∂L/∂b = 0 ⇒ Σ_{i=1}^{n} αi yi = 0
8
Converting to Dual Problem
∂L/∂w = 0 ⇒ w̄ − Σ_{i=1}^{n} αi yi x̄i = 0
w̄ = Σ_{i=1}^{N} αi yi x̄i

L(w̄, b, α1, α2, ...αn) = (1/2) Σ_{i=1}^{d} wi² − Σ_{i=1}^{N} αi (yi (w̄ · x̄i + b) − 1)
= (1/2) ||w̄||² − Σ_{i=1}^{N} αi yi w̄ · x̄i − Σ_{i=1}^{N} αi yi b + Σ_{i=1}^{N} αi
= Σi αi + (Σi αi yi x̄i) · (Σj αj yj x̄j) / 2 − Σi Σj αi yi αj yj x̄j · x̄i
9
Converting to Dual Problem
L(α) = Σ_{i=1}^{N} αi − (1/2) Σ_{i=1}^{N} Σ_{j=1}^{N} αi αj yi yj x̄i · x̄j
10
Question
αi (yi (w̄ · x̄i + b) − 1) = 0 ∀i as per KKT complementary slackness
What is αi for support vector points?
11
EXAMPLE (IN 1D)
[Figure: the 1D example data and its separating hyperplane (a point), shown again]
Revisiting the Simple Example
x1   y
1    1
2    1
−1   −1
−2   −1

L(α) = Σ_{i=1}^{4} αi − (1/2) Σ_{i=1}^{4} Σ_{j=1}^{4} αi αj yi yj x̄i · x̄j,   αi ≥ 0
Σi αi yi = 0,   αi (yi (w̄ · x̄i + b) − 1) = 0
12
Revisiting the Simple Example
L(α1, α2, α3, α4) = α1 + α2 + α3 + α4
− (1/2) { α1 α1 × (1 ∗ 1) × (1 ∗ 1)
+ α1 α2 × (1 ∗ 1) × (1 ∗ 2)
+ α1 α3 × (1 ∗ −1) × (1 ∗ −1)
+ ...
+ α4 α4 × (−1 ∗ −1) × (−2 ∗ −2) }
13
Revisiting the Simple Example
Taking α1 = α3 = α (and α2 = α4 = 0), the dual reduces to:
Maximize over α:  2α − (1/2)(4α²)
14
Revisiting the Simple Example
∂/∂α (2α − 2α²) = 0 ⇒ 2 − 4α = 0
⇒ α = 1/2
∴ α1 = 1/2, α2 = 0, α3 = 1/2, α4 = 0
w̄ = Σ_{i=1}^{N} αi yi x̄i = 1/2 × 1 × 1 + 0 × 1 × 2 + 1/2 × (−1) × (−1) + 0 × (−1) × (−2)
= 1/2 + 1/2 = 1
15
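A numerical check of this dual solution, maximizing L(α) under the constraints with SciPy (a sketch; the helper names are mine):

import numpy as np
from scipy.optimize import minimize

X = np.array([1.0, 2.0, -1.0, -2.0])
y = np.array([1.0, 1.0, -1.0, -1.0])
G = np.outer(y, y) * np.outer(X, X)             # G[i, j] = yi yj xi xj

def neg_dual(a):                                # minimize the negative of L(alpha)
    return -(a.sum() - 0.5 * a @ G @ a)

cons = {"type": "eq", "fun": lambda a: a @ y}   # sum_i alpha_i yi = 0
res = minimize(neg_dual, x0=np.full(4, 0.5),
               bounds=[(0, None)] * 4, constraints=[cons])
print(res.x)                                    # approximately [0.5, 0, 0.5, 0]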
Revisiting the Simple Example
Finding b:
For the support vectors we have,
yi (w̄ · x̄i + b) − 1 = 0
or, yi (w̄ · x̄i + b) = 1
or, yi² (w̄ · x̄i + b) = yi
or, w̄ · x̄i + b = yi   (∵ yi² = 1)
or, b = yi − w̄ · x̄i
In practice, b = (1/N_SV) Σ_{i=1}^{N_SV} (yi − w̄ · x̄i)
16
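Continuing in the same vein, w and b can be recovered from the α's (a sketch using the α values derived above):

import numpy as np

X = np.array([1.0, 2.0, -1.0, -2.0])
y = np.array([1.0, 1.0, -1.0, -1.0])
alpha = np.array([0.5, 0.0, 0.5, 0.0])

w = np.sum(alpha * y * X)              # w = sum_i alpha_i yi xi = 1
sv = alpha > 1e-8                      # support vectors have alpha_i > 0
b = np.mean(y[sv] - w * X[sv])         # average over support vectors
print(w, b)                            # 1.0  0.0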
Obtaining the Solution
b = (1/2) {(1 − (1)(1)) + (−1 − (1)(−1))}
= (1/2) {0 + 0} = 0
∴ w = 1 and b = 0
17
Making Predictions
ŷ(xi) = SIGN(w · xi + b)
For xtest = 3: ŷ(3) = SIGN(1 × 3 + 0) = +ve class
18
Making Predictions
Alternatively,
ŷ(xTEST) = SIGN(w̄ · x̄TEST + b)
= SIGN( Σ_{j=1}^{N_SV} αj yj x̄j · x̄TEST + b )
In our example,
α1 = 1/2; α2 = 0; α3 = 1/2; α4 = 0
ŷ(3) = SIGN( (1/2) × 1 × (1 × 3) + 0 + (1/2) × (−1) × (−1 × 3) + 0 )
= SIGN(6/2) = SIGN(3) = +1
19
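Both prediction forms can be checked in a few lines (a sketch with the values from this example):

import numpy as np

X = np.array([1.0, 2.0, -1.0, -2.0])
y = np.array([1.0, 1.0, -1.0, -1.0])
alpha = np.array([0.5, 0.0, 0.5, 0.0])
w, b, x_test = 1.0, 0.0, 3.0

print(np.sign(w * x_test + b))                      # primal form: +1
print(np.sign(np.sum(alpha * y * X * x_test) + b))  # dual form:   +1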
Non-Linearly Separable Data
[Figure: original 1D data on the real line where one class lies between the two groups of the other class, so no single threshold separates them]
20
[Figure: the original 1D data in R (around −1, 0, 1, ...) cannot be separated by a single threshold; after a non-linear transformation into R², the two classes become linearly separable and a max-margin separating hyperplane with its margin can be drawn in the R² space]
[Figure: similarly, 2D data that is not linearly separable in R² becomes separable by a hyperplane after adding a third coordinate x3 = f(x1, x2)]
Linear SVMs in higher dimensions
Linear SVM:
Maximize
L(α) = Σ_{i=1}^{N} αi − (1/2) Σ_{i=1}^{N} Σ_{j=1}^{N} αi αj yi yj xi · xj
After mapping the data with ϕ, the same dual uses ϕ(xi) · ϕ(xj) in place of xi · xj:
L(α) = Σ_{i=1}^{N} αi − (1/2) Σ_{i=1}^{N} Σ_{j=1}^{N} αi αj yi yj ϕ(xi) · ϕ(xj)
21
Linear SVMs in higher dimensions: Steps
ϕ : Rd → RD
Step 1: transform every point xi to ϕ(xi) ∈ RD.
Step 2: compute the dot products ϕ(xi) · ϕ(xj) in the dual.
Q. What if D >> d?
Both steps are expensive!
22
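To make the explicit-transformation route concrete, here is a sketch that maps the 1D dataset used later in the lecture with ϕ(x) = (√2 x, x²) and fits a linear SVM on the transformed points (assuming scikit-learn is installed; the large C is my choice to mimic a hard margin):

import numpy as np
from sklearn.svm import SVC

x = np.array([-1.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([-1, 1, 1, 1, -1])

# Explicit transformation phi: R -> R^2, phi(x) = (sqrt(2) x, x^2)
Phi = np.column_stack([np.sqrt(2) * x, x ** 2])

clf = SVC(kernel="linear", C=1e6).fit(Phi, y)
print(clf.predict(Phi))   # the transformed data is now linearly separable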
Kernel Trick
[Figure: instead of explicitly transforming the original 1D data in R into R², the kernel trick computes the dot products in the transformed space directly from the original data]
23
KERNEL TRICK
ϕ(xi) · ϕ(xj) = ? = K(xi, xj)
K(xi, xj) = (1 + xi · xj)² − 1
= 1 + 2 xi xj + xi² xj² − 1
= 2 xi xj + xi² xj²
= (√2 xi) · (√2 xj) + xi² · xj²
= <√2 xi, xi²> · <√2 xj, xj²>
= ϕ(xi) · ϕ(xj)

So the kernel K(xi, xj) = (1 + xi xj)² − 1 computes the dot product in the transformed space ϕ(x) = (√2 x, x²) directly from the original 1D data.
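A short numerical check of this identity (a sketch assuming NumPy):

import numpy as np

phi = lambda x: np.array([np.sqrt(2) * x, x ** 2])   # phi(x) = (sqrt(2) x, x^2)
xi, xj = 2.0, 3.0
print((1 + xi * xj) ** 2 - 1)      # kernel value: 48.0
print(phi(xi) @ phi(xj))           # same dot product in the transformed space: 48.0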
ORIGINAL DATASET          TRANSFORMED DATASET
x    y                    √2·x     x²    y
−1   −1                   −√2      1     −1
0    1                    0        0     1
1    1                    √2       1     1
2    1                    2√2      4     1
3    −1                   3√2      9     −1

Explicit transformation: ϕ(x1) = <√2 x1, x1²> and ϕ(x2) = <√2 x2, x2²> each cost 2 multiplications; the dot product ϕ(x1) · ϕ(x2) costs another 2 multiplications + 1 addition.

Kernel: K(x1, x2) = (1 + x1 x2)² − 1 works directly on the original 1D data:
x1 · x2 → 1 multiplication
1 + x1 x2 → 1 addition
(1 + x1 x2)² − 1 → 1 multiplication (squaring) and 1 subtraction
The kernel gives the same value without ever constructing the transformed dataset.
Kernel Trick
24
Some Kernels
25
Kernels
Q) For x̄ = (x1, x2), what space does the kernel K(x̄, x̄′) = (1 + x̄ · x̄′)³ belong to?
x̄ ∈ R², ϕ(x̄) ∈ R?
K(x, z) = (1 + x1 z1 + x2 z2)³
= ...
ϕ corresponds to the monomials <1, x1, x2, x1², x2², x1² x2, x1 x2², x1³, x2³, x1 x2>
10 dimensional?
26
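A quick way to see which monomials appear is to expand the kernel symbolically (a sketch assuming SymPy is installed):

from sympy import symbols, expand

x1, x2, z1, z2 = symbols("x1 x2 z1 z2")
K = expand((1 + x1 * z1 + x2 * z2) ** 3)
print(K)            # each term pairs a monomial in (x1, x2) with the matching monomial in (z1, z2)
print(len(K.args))  # 10 terms, matching the 10-dimensional feature space above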
Does RBF involve dot product in lower-dimensional space?
K(x, z) = e^{−γ||x−z||²} = e^{−γ(x−z)²}
(x − z)² = x² − 2xz + z²
K(x, z) = e^{−γ(x² − 2xz + z²)} = e^{−γx²} e^{2γxz} e^{−γz²}
Notice that the term e^{2γxz} involves the dot product xz of the original data points x and z in the one-dimensional feature space.
27
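This factorization is easy to verify numerically (a sketch; the values of γ, x, z are arbitrary):

import numpy as np

gamma, x, z = 0.5, 0.7, -1.2
lhs = np.exp(-gamma * (x - z) ** 2)
rhs = np.exp(-gamma * x ** 2) * np.exp(2 * gamma * x * z) * np.exp(-gamma * z ** 2)
print(lhs, rhs)   # identical up to floating point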
What space does the RBF kernel lie in?
K(x, z) = e^{−γ||x−z||²}
= e^{−γ(x−z)²}
Now:
e^α = Σ_{n=0}^{∞} αⁿ / n!
∴ the feature space corresponding to e^{−γ(x−z)²} is ∞ dimensional!!
28
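The ∞-dimensional feature map can be approximated by truncating the series. A sketch (my construction, assuming NumPy) uses ϕ_n(x) = e^{−γx²} √((2γ)ⁿ/n!) xⁿ; the truncated dot product approaches the RBF kernel value:

import numpy as np
from math import factorial

gamma, x, z, N = 0.5, 0.7, -1.2, 30

def phi(t):
    n = np.arange(N)
    fact = np.array([factorial(k) for k in n], dtype=float)
    return np.exp(-gamma * t ** 2) * np.sqrt((2 * gamma) ** n / fact) * t ** n

print(phi(x) @ phi(z))                  # truncated feature-space dot product
print(np.exp(-gamma * (x - z) ** 2))    # exact RBF kernel value: nearly equal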
Interpretation of RBF
[Figure: handwritten sketch illustrating the RBF interpretation]
29