
Support Vector Machines

Nipun Batra
April 25, 2023
IIT Gandhinagar
SUPPORT VECTOR MACHINES: CLASSIFICATION

[Hand-drawn figure: two classes of points plotted against Feature 1 and Feature 2, with a separating hyperplane drawn between the classes.]

IDEA: DRAW A SEPARATING HYPERPLANE

[Hand-drawn figure: the same two-class data with a separating hyperplane and the margin marked on either side of it.]

IDEA: MAXIMIZE THE MARGIN


[Hand-drawn figure: the max-margin separating hyperplane, with the points lying exactly on the margin marked as support vectors.]

SUPPORT VECTORS: POINTS ON BOUNDARY / MARGIN


HYPERPLANE vs. # DIMENSIONS

# Dimensions    Hyperplane
1D              point
2D              line
3D (& more)     plane (hyperplane)
WHICH HYPERPLANE?

[Hand-drawn figures: the same two-class data drawn several times, each with a different separating hyperplane — many hyperplanes separate the data, so which one should we pick?]
EQUATION OF HYPERPLANE

How to define a hyperplane?
Let P0 (position vector x⃗0) be one point on the plane, and let w⃗ be the normal vector to the plane at P0.
For any point P (position vector x⃗) on the plane, the vector x⃗ − x⃗0 lies in the plane, and hence is perpendicular to w⃗:

w⃗ · (x⃗ − x⃗0) = 0
or, w⃗ · x⃗ − w⃗ · x⃗0 = 0
or, w⃗ · x⃗ + b = 0,  where b = −w⃗ · x⃗0
DISTANCE BETWEEN 2 PARALLEL HYPERPLANES

Equations of the two planes:

w⃗ · x⃗ + b1 = 0
w⃗ · x⃗ + b2 = 0

For a point x⃗1 on plane 1 and x⃗2 on plane 2, we have:

x⃗2 = x⃗1 + t w⃗
D = |t w⃗| = |t| ∥w⃗∥

We can rewrite as follows:

w⃗ · x⃗2 + b2 = 0
⇒ w⃗ · (x⃗1 + t w⃗) + b2 = 0
⇒ w⃗ · x⃗1 + t∥w⃗∥² + b1 − b1 + b2 = 0
⇒ t = (b1 − b2)/∥w⃗∥²
⇒ D = t∥w⃗∥ = (b1 − b2)/∥w⃗∥
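A quick numerical check of this formula (a minimal Python sketch; the vector w⃗ and offsets b1, b2 are arbitrary example values, not from the slides):

```python
import math

# Two parallel hyperplanes w.x + b1 = 0 and w.x + b2 = 0 (example values)
w = [3.0, 4.0]          # normal vector, ||w|| = 5
b1, b2 = 2.0, -8.0

norm_w = math.sqrt(sum(wi * wi for wi in w))

# Closed-form distance derived above: D = (b1 - b2) / ||w||
D = (b1 - b2) / norm_w

# Independent check: take a point x1 on plane 1 and walk along w until we
# land on plane 2; the step length |t| * ||w|| is the distance.
x1 = [-b1 * wi / norm_w**2 for wi in w]      # closest point of plane 1 to origin
t = (b1 - b2) / norm_w**2                    # from w.(x1 + t*w) + b2 = 0
x2 = [x1i + t * wi for x1i, wi in zip(x1, w)]
assert abs(sum(wi * x2i for wi, x2i in zip(w, x2)) + b2) < 1e-9  # x2 lies on plane 2
D_walk = abs(t) * norm_w

print(D, D_walk)   # both 2.0
```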
FORMULATION

[Hand-drawn figure: the two classes with the max-margin separating hyperplane; the margin hyperplanes pass through the closest points of each class.]

Place the margin hyperplanes at:

w⃗ · x⃗ + b = +1 (touching the +1 class)
w⃗ · x⃗ + b = −1 (touching the −1 class)

By the distance formula, MARGIN = 2/∥w⃗∥.

GOAL: MAXIMIZE THE MARGIN

MAXIMIZE 2/∥w⃗∥  ⇔  MINIMIZE ∥w⃗∥

while correctly labelling the points:

if yi = +1:  w⃗ · x⃗i + b ≥ +1
if yi = −1:  w⃗ · x⃗i + b ≤ −1

i.e.  yi(w⃗ · x⃗i + b) ≥ 1
Primal Formulation

Objective
Minimize (1/2)∥w∥²
s.t. yi(w · xi + b) ≥ 1 ∀i

Q) What is ∥w∥?

For w = (w1, w2, ..., wn)ᵀ:

∥w∥ = √(wᵀw) = √(w1² + w2² + ... + wn²)
EXAMPLE (IN 1D)

[Hand-drawn figure: 1-D points on a line with a separating point between the two classes.]

Simple Exercise

 x    y
 1    1
 2    1
−1   −1
−2   −1

Separating hyperplane: wx + b = 0
Simple Exercise

Constraint for each point: yi(w xi + b) ≥ 1

(x = 1,  y = 1)   ⇒ 1(w + b) ≥ 1
(x = 2,  y = 1)   ⇒ 1(2w + b) ≥ 1
(x = −1, y = −1)  ⇒ −1(−w + b) ≥ 1
(x = −2, y = −1)  ⇒ −1(−2w + b) ≥ 1
Simple Exercise

Minimum values satisfying the constraints ⇒ w = 1 and b = 0

∴ Max-margin classifier: w · x + b = 0 ⇒ x = 0
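We can check this solution directly (a small Python sketch; the data and the claimed optimum are the ones from the slides):

```python
# Toy 1-D dataset from the slides: (x, y) pairs
data = [(1, 1), (2, 1), (-1, -1), (-2, -1)]

w, b = 1.0, 0.0   # claimed max-margin solution

# All margin constraints y_i (w x_i + b) >= 1 must hold
margins = [y * (w * x + b) for x, y in data]
assert all(m >= 1 for m in margins)

# The support vectors (x = +1 and x = -1) sit exactly on the margin
assert margins[0] == 1 and margins[2] == 1

# Geometric margin = 2 / |w|
print(2 / abs(w))   # 2.0
```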
Primal Formulation is a Quadratic Program

Generally:
⇒ Minimize Quadratic(x)
⇒ such that Linear(x) constraints hold

Question

x = (x1, x2)
minimize (1/2)∥x∥²
s.t. x1 + x2 − 1 ≥ 0
[Hand-drawn figure: contours of the quadratic objective with the linear constraint x1 + x2 − 1 ≥ 0; the solution lies where a contour first touches the constraint line.]
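In practice one hands such problems to a QP solver; purely as an illustration, a few steps of projected gradient descent also solve this tiny instance (the step size and iteration count below are arbitrary assumptions, not from the slides):

```python
# minimize (1/2)||x||^2  subject to  x1 + x2 - 1 >= 0
x = [5.0, -3.0]   # arbitrary starting point
eta = 0.1         # step size (assumption)

for _ in range(500):
    # gradient of (1/2)||x||^2 is x itself
    x = [xi - eta * xi for xi in x]
    # project back onto the feasible half-plane x1 + x2 >= 1 if violated
    gap = 1.0 - (x[0] + x[1])
    if gap > 0:
        x = [x[0] + gap / 2, x[1] + gap / 2]

print(x)   # approximately [0.5, 0.5]
```

The iterates converge to the point where the smallest objective contour touches the constraint line.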
Converting to Dual Problem

Primal ⇒ Dual conversion using Lagrange multipliers:

Minimize (1/2)∥w̄∥²
s.t. yi(w̄ · x̄i + b) ≥ 1 ∀i

L(w̄, b, α1, α2, ..., αN) = (1/2) Σ_{i=1}^{d} wi² − Σ_{i=1}^{N} αi (yi(w̄ · x̄i + b) − 1),   ∀ αi ≥ 0

∂L/∂b = 0 ⇒ Σ_{i=1}^{N} αi yi = 0

∂L/∂w = 0 ⇒ w̄ − Σ_{i=1}^{N} αi yi x̄i = 0,  i.e.  w̄ = Σ_{i=1}^{N} αi yi x̄i

Substituting w̄ back into L:

L = (1/2)∥w̄∥² − Σ_i αi yi w̄ · x̄i − b Σ_i αi yi + Σ_i αi
  = Σ_i αi + (1/2) (Σ_i αi yi x̄i) · (Σ_j αj yj x̄j) − Σ_i αi yi (Σ_j αj yj x̄j) · x̄i
Converting to Dual Problem

L(α) = Σ_{i=1}^{N} αi − (1/2) Σ_{i=1}^{N} Σ_{j=1}^{N} αi αj yi yj x̄i · x̄j

Minimize (1/2)∥w̄∥²               ⇒  Maximize L(α)
s.t. yi(w̄ · x̄i + b) ≥ 1 ∀i          s.t. Σ_{i=1}^{N} αi yi = 0,  αi ≥ 0 ∀i
Question

Question:
αi (yi(w̄ · x̄i + b) − 1) = 0 ∀i, as per KKT complementary slackness.
What is αi for support-vector points?

Answer: For support vectors,

w̄ · x̄i + b = −1 (−ve class)
w̄ · x̄i + b = +1 (+ve class)

so yi(w̄ · x̄i + b) − 1 = 0 for i ∈ {support vector points}
∴ αi ≠ 0 is possible only where i ∈ {support vector points}
For all non-support-vector points, αi = 0
Revisiting the Simple Example

 x    y
 1    1
 2    1
−1   −1
−2   −1

L(α) = Σ_{i=1}^{4} αi − (1/2) Σ_{i=1}^{4} Σ_{j=1}^{4} αi αj yi yj x̄i x̄j,   αi ≥ 0

Σ αi yi = 0,   αi (yi(w̄ · x̄i + b) − 1) = 0
Revisiting the Simple Example

L(α1, α2, α3, α4) = α1 + α2 + α3 + α4
  − (1/2) { α1 α1 × (1 × 1) × (1 × 1)
          + α1 α2 × (1 × 1) × (1 × 2)
          + α1 α3 × (1 × −1) × (1 × −1)
          + ...
          + α4 α4 × (−1 × −1) × (−2 × −2) }

(each term is αi αj × (yi yj) × (xi xj))

How to solve? ⇒ Use the QP solver!!
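Absent a QP solver, we can at least evaluate L(α) in Python and confirm that the candidate derived on the following slides, α = (1/2, 0, 1/2, 0), is feasible and beats every nearby feasible candidate on a coarse grid (a sanity check, not a solver):

```python
import itertools

xs = [1, 2, -1, -2]
ys = [1, 1, -1, -1]

def L(a):
    """Dual objective: sum(a_i) - 1/2 * sum_ij a_i a_j y_i y_j x_i x_j."""
    quad = sum(a[i] * a[j] * ys[i] * ys[j] * xs[i] * xs[j]
               for i in range(4) for j in range(4))
    return sum(a) - 0.5 * quad

def feasible(a):
    # constraints: a_i >= 0 and sum_i a_i y_i = 0
    return all(ai >= 0 for ai in a) and abs(sum(ai * yi for ai, yi in zip(a, ys))) < 1e-9

a_star = [0.5, 0.0, 0.5, 0.0]
assert feasible(a_star)

# a crude grid search over feasible candidates never beats a_star
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
best = max(L(a) for a in itertools.product(grid, repeat=4) if feasible(a))
assert L(a_star) >= best - 1e-9
print(L(a_star))   # 0.5
```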
Revisiting the Simple Example

For this trivial example:

We know that only the points at x = ±1 take part in the constraints actively (they are the support vectors).
Thus α2 = α4 = 0.
By symmetry, α1 = α3 = α (say), which also satisfies Σ yi αi = 0.

L(α1, α2, α3, α4) = 2α − (1/2) { α²(1)(1)(1)(1) + α²(1)(−1)(1)(−1)
                               + α²(−1)(1)(−1)(1) + α²(−1)(−1)(−1)(−1) }

Maximize over α:  2α − (1/2)(4α²)
Revisiting the Simple Example

∂/∂α (2α − 2α²) = 0  ⇒  2 − 4α = 0
⇒ α = 1/2
∴ α1 = 1/2, α2 = 0, α3 = 1/2, α4 = 0

w⃗ = Σ_{i=1}^{N} αi yi x̄i
  = 1/2 × 1 × 1 + 0 × 1 × 2 + 1/2 × (−1) × (−1) + 0 × (−1) × (−2)
  = 1/2 + 1/2 = 1
Revisiting the Simple Example

Finding b:
For the support vectors we have
yi(w⃗ · x⃗i + b) − 1 = 0
or, yi(w̄ · x̄i + b) = 1
or, yi²(w̄ · x̄i + b) = yi
or, w̄ · x̄i + b = yi  (∵ yi² = 1)
or, b = yi − w̄ · x̄i

In practice, b = (1/N_SV) Σ_{i=1}^{N_SV} (yi − w̄ · x̄i)
Obtaining the Solution

b = (1/2) {(1 − (1)(1)) + (−1 − (1)(−1))}
  = (1/2) {0 + 0}
  = 0

∴ w = 1 and b = 0
Making Predictions

ŷ(xi) = SIGN(w · xi + b)
For xtest = 3:  ŷ(3) = SIGN(1 × 3 + 0) = +ve class
Making Predictions

Alternatively,

ŷ(x_TEST) = SIGN(w̄ · x̄_TEST + b)
          = SIGN( Σ_{i=1}^{N_SV} αi yi x̄i · x̄_test + b )

In our example,
α1 = 1/2; α2 = 0; α3 = 1/2; α4 = 0

ŷ(3) = SIGN( 1/2 × 1 × (1 × 3) + 0 + 1/2 × (−1) × (−1 × 3) + 0 )
     = SIGN(6/2) = SIGN(3) = +1
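The dual-form prediction is easy to reproduce in Python (the α values, data, and b are taken from the worked example above):

```python
xs = [1, 2, -1, -2]
ys = [1, 1, -1, -1]
alphas = [0.5, 0.0, 0.5, 0.0]
b = 0.0

def predict(x_test):
    # y_hat = SIGN( sum_i alpha_i y_i (x_i . x_test) + b )
    s = sum(a * y * (x * x_test) for a, y, x in zip(alphas, ys, xs)) + b
    return 1 if s >= 0 else -1

print(predict(3))    # +1, matching SIGN(3) from the slide
print(predict(-3))   # -1
```

Only the support vectors (non-zero αi) actually contribute to the sum.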
Non-Linearly Separable Data

[Hand-drawn figure: a 1-D dataset in R with the two classes interleaved — the positive points in the middle, the negative points on either side.]

Data not separable in R.

Can we still use SVM?
Yes!
How? Project the data to a higher-dimensional space.
[Hand-drawn figures: the original 1-D data in R is transformed to R² (e.g. via ϕ(x) = (x, x²)); in R² the classes become linearly separable, and a max-margin separating hyperplane with its margin can be drawn. A second sketch shows 2-D data that is not linearly separable in R² becoming separable by a plane after adding a third coordinate x3 = f(x1, x2).]
Linear SVMs in higher dimensions

Linear SVM:
Maximize

L(α) = Σ_{i=1}^{N} αi − (1/2) Σ_{i=1}^{N} Σ_{j=1}^{N} αi αj yi yj x̄i · x̄j

such that the constraints are satisfied.

After a transformation ϕ:

L(α) = Σ_{i=1}^{N} αi − (1/2) Σ_{i=1}^{N} Σ_{j=1}^{N} αi αj yi yj ϕ(x̄i) · ϕ(x̄j)
Linear SVMs in higher dimensions: Steps

1. Compute ϕ(x) for each point, where ϕ : R^d → R^D
2. Compute the dot products in the R^D space

Q) What if D ≫ d?
Both steps become expensive!
Kernel Trick

• Can we compute K(x̄i, x̄j), such that
• K(x̄i, x̄j) = ϕ(x̄i) · ϕ(x̄j), where
• K(x̄i, x̄j) is some function of the dot product in the original dimension
• ϕ(x̄i) · ϕ(x̄j) is the dot product in high dimensions (after transformation)
KERNEL TRICK

[Hand-drawn figure: the original 1-D data and the transformed data in R², with ϕ(x) = (√2 x, x²).]

K(xi, xj) = (1 + xi xj)² − 1
          = 1 + 2 xi xj + xi² xj² − 1
          = 2 xi xj + xi² xj²
          = (√2 xi)(√2 xj) + (xi²)(xj²)
          = ⟨√2 xi, xi²⟩ · ⟨√2 xj, xj²⟩
          = ϕ(xi) · ϕ(xj)
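This identity is easy to verify numerically (a minimal Python check of the algebra above, over a few arbitrary sample points):

```python
import math

def phi(x):
    # transformation to R^2: phi(x) = (sqrt(2) x, x^2)
    return (math.sqrt(2) * x, x * x)

def K(xi, xj):
    # kernel computed purely in the original 1-D space
    return (1 + xi * xj) ** 2 - 1

for xi in [-2.0, -1.0, 0.0, 0.5, 3.0]:
    for xj in [-2.0, -1.0, 0.0, 0.5, 3.0]:
        dot = phi(xi)[0] * phi(xj)[0] + phi(xi)[1] * phi(xj)[1]
        assert abs(K(xi, xj) - dot) < 1e-9
```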

ORIGINAL DATASET → TRANSFORMED DATASET

 x    y    |   √2·x    x²    y
−1   −1    |   −√2     1    −1
 0    1    |    0      0     1
 1    1    |    √2     1     1
 2    1    |   2√2     4     1
 3   −1    |   3√2     9    −1

Calculation without the kernel trick:

ϕ(x1) = ⟨√2 x1, x1²⟩ : 2 multiplications
ϕ(x2) = ⟨√2 x2, x2²⟩ : 2 multiplications
ϕ(x1) · ϕ(x2)        : 2 multiplications + 1 addition
Calculation with the kernel trick:

K(x1, x2) = (1 + x1 x2)² − 1

x1 · x2           → 1 multiplication
1 + x1 x2         → 1 addition
(1 + x1 x2)²      → 1 multiplication
(1 + x1 x2)² − 1  → 1 addition

No need to compute ϕ explicitly at all.
Kernel Trick

Q) Why did we use the dual form?

Kernels again!!
The dual form involves the data only through dot products x̄i · x̄j, which can be replaced directly by K(x̄i, x̄j). The primal form doesn't allow the kernel trick: there we would have to compute ϕ(x) explicitly and then take dot products in D dimensions.
Some Kernels

1. Linear: K(x̄1, x̄2) = x̄1 · x̄2
2. Polynomial: K(x̄1, x̄2) = (p + x̄1 · x̄2)^q
3. Gaussian: K(x̄1, x̄2) = e^(−γ∥x̄1 − x̄2∥²), where γ = 1/(2σ²)
   — also called the Radial Basis Function (RBF) kernel
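The three kernels can be written directly as small Python functions (a sketch; vectors are represented as tuples and γ, p, q are explicit parameters):

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def linear(x1, x2):
    return dot(x1, x2)

def polynomial(x1, x2, p=1.0, q=2):
    return (p + dot(x1, x2)) ** q

def rbf(x1, x2, gamma=0.5):
    # gamma = 1 / (2 sigma^2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-gamma * sq_dist)

u, v = (1.0, 2.0), (3.0, -1.0)
print(linear(u, v))        # 1.0
print(polynomial(u, v))    # 4.0
print(rbf(u, u))           # 1.0: RBF of a point with itself is always 1
```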
Kernels

Q) For x̄ = (x1, x2), what space does the kernel K(x̄, x̄′) = (1 + x̄ · x̄′)³ correspond to?

x̄ ∈ R²,  ϕ(x̄) ∈ R?

K(x, z) = (1 + x1 z1 + x2 z2)³
        = ...
with feature-map components
⟨1, x1, x2, x1², x2², x1² x2, x1 x2², x1³, x2³, x1 x2⟩

10-dimensional!
Does RBF involve dot product in lower-dimensional space?

Assuming x is a one-dimensional vector, we can rewrite the RBF kernel as:

K(x, z) = e^(−γ∥x − z∥²) = e^(−γ(x − z)²)

Expanding the squared term, we get:

(x − z)² = x² − 2xz + z²

Substituting this back into the RBF kernel, we get:

K(x, z) = e^(−γ(x² − 2xz + z²)) = e^(−γx²) · e^(2γxz) · e^(−γz²)

Notice that the term e^(2γxz) contains the dot product of the original data points x and z in the one-dimensional feature space.
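The factorisation can be confirmed numerically (a minimal Python check; γ and the sample values of x, z are arbitrary):

```python
import math

gamma = 0.7
for x in [-1.5, 0.0, 2.0]:
    for z in [-0.5, 1.0, 3.0]:
        lhs = math.exp(-gamma * (x - z) ** 2)
        # factored form: exp(-gamma x^2) * exp(2 gamma x z) * exp(-gamma z^2)
        rhs = (math.exp(-gamma * x * x)
               * math.exp(2 * gamma * x * z)
               * math.exp(-gamma * z * z))
        assert abs(lhs - rhs) < 1e-9
```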
What space does the RBF kernel lie in?

Q) For x̄ = x, what space does the RBF kernel lie in?

K(x, z) = e^(−γ∥x − z∥²) = e^(−γ(x − z)²)

Now:

e^α = Σ_{n=0}^{∞} α^n / n!

∴ e^(−γ(x − z)²) is ∞-dimensional!!
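Each truncation of this series corresponds to a finite-dimensional feature map; Python shows the partial sums of e^α converging quickly (an illustration of the expansion, not part of the slides):

```python
import math

def exp_partial(alpha, n_terms):
    # truncated Taylor series: sum_{n=0}^{n_terms-1} alpha^n / n!
    return sum(alpha ** n / math.factorial(n) for n in range(n_terms))

alpha = 1.3
for n in [2, 5, 10, 15]:
    print(n, exp_partial(alpha, n))   # approaches math.exp(1.3)

assert abs(exp_partial(alpha, 20) - math.exp(alpha)) < 1e-12
```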
Interpretation of RBF

• ŷ(x) = sign( Σ_{i=1}^{n} αi yi K(x, xi) + b )
• K(x, xi) = e^(−γ∥x − xi∥²) is the RBF kernel evaluated at x and xi
• ŷ(x) = sign( Σ αi yi e^(−γ∥x − xi∥²) + b )
• −∥x − xi∥² corresponds to the radial term
• Σ αi yi is the activation component
• e^(−γ∥x − xi∥²) is the basis component
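This interpretation can be sketched in Python for 1-D toy data (the α, b, and γ values here are illustrative assumptions, not values fitted for the RBF kernel):

```python
import math

# support vectors, labels, and illustrative dual weights (assumptions)
sv_x   = [1.0, -1.0]
sv_y   = [1, -1]
alphas = [0.5, 0.5]
b, gamma = 0.0, 1.0

def y_hat(x):
    # sign( sum_i alpha_i y_i exp(-gamma ||x - x_i||^2) + b ):
    # each support vector contributes a Gaussian "bump" centred on itself
    s = sum(a * y * math.exp(-gamma * (x - xi) ** 2)
            for a, y, xi in zip(alphas, sv_y, sv_x)) + b
    return 1 if s >= 0 else -1

print(y_hat(0.9))    # +1: dominated by the nearby positive support vector
print(y_hat(-2.0))   # -1: dominated by the negative support vector
```

A query point is classified by whichever support vectors' bumps dominate at its location.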