Class 04: Features + Kernels
Lorenzo Rosasco
Linear functions

$$f(x) = w^\top x.$$
- $f \leftrightarrow w$ is one to one,
- inner product $\langle f, \bar f \rangle_H := w^\top \bar w$,
- norm/metric $\|f - \bar f\|_H := \|w - \bar w\|$.

Since
$$|f(x) - \bar f(x)| \le \|x\| \, \|w - \bar w\|, \qquad \forall x \in X,$$
then
$$w_j \to w \ \Rightarrow\ f_j(x) \to f(x), \qquad \forall x \in X.$$
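The bound is Cauchy–Schwarz applied to the difference of the weight vectors (spelled out for completeness):
$$|f(x) - \bar f(x)| = |(w - \bar w)^\top x| \le \|x\|\,\|w - \bar w\|.$$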
Consider (regularized) least squares
$$\min_{w \in \mathbb{R}^d} \frac{1}{n} \sum_{i=1}^n (y_i - w^\top x_i)^2 + \lambda \|w\|^2, \qquad \lambda \ge 0.$$
We noted that
$$\hat w_\lambda = X_n^\top c = \sum_{i=1}^n x_i c_i \ \Leftrightarrow\ \hat f_\lambda(x) = \sum_{i=1}^n x^\top x_i c_i.$$
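The coefficients come from an $n \times n$ linear system; this is the linear-case instance of the formula given below for general feature maps:
$$c = (X_n X_n^\top + n\lambda I)^{-1} \hat Y.$$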
[Figure: examples of a linear model used for regression and for classification.]

Note: the spaces are linear, NOT the functions!
Features and feature maps

Consider a feature map $\Phi : X \to \mathbb{R}^p$, $\Phi(x) = (\varphi_1(x), \dots, \varphi_p(x))$, with components $\varphi_j : X \to \mathbb{R}$ for $j = 1, \dots, p$.
- $X$ need not be $\mathbb{R}^d$.
- We can also write
$$f(x) = \sum_{j=1}^p w_j \varphi_j(x).$$
The equation
$$f(x) = w^\top \Phi(x) = \sum_{j=1}^p w_j \varphi_j(x)$$
suggests thinking of the features as some form of basis. Different choices of features
$$\varphi_j : X \to \mathbb{R}, \qquad j = 1, \dots, p,$$
can be considered.
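For instance, a minimal sketch of a polynomial feature map on $X = \mathbb{R}$ (the helper name `phi` is mine, not from the slides):

```python
import numpy as np

# Polynomial features on X = R: Phi(x) = (1, x, x^2, ..., x^(p-1)).
def phi(x, p):
    return np.array([x**j for j in range(p)])

w = np.array([1.0, -2.0, 0.5])   # weights in R^p, here p = 3
f = lambda x: w @ phi(x, p=3)    # f(x) = w^T Phi(x) = 1 - 2x + 0.5 x^2
print(f(2.0))                    # -> -1.0
```

The function is nonlinear in $x$ but still linear in the parameters $w$.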
Feature design/engineering
- vision: SIFT, HOG
- audio: MFCC
- ...
In this case the problem becomes
$$\min_{w \in \mathbb{R}^p} \frac{1}{n} \sum_{i=1}^n (y_i - w^\top \Phi(x_i))^2 + \lambda \|w\|^2, \qquad \lambda \ge 0,$$
equivalent to
$$\min_{f \in H_\Phi} \frac{1}{n} \sum_{i=1}^n (y_i - f(x_i))^2 + \lambda \|f\|_{H_\Phi}^2, \qquad \lambda \ge 0.$$
Let $\hat\Phi \in \mathbb{R}^{n \times p}$ with $(\hat\Phi)_{ij} = \varphi_j(x_i)$. Then
$$\hat w_\lambda = (\hat\Phi^\top \hat\Phi + n\lambda I)^{-1} \hat\Phi^\top \hat Y, \qquad \text{time } O(np^2 \vee p^3), \ \text{mem. } O(np \vee p^2),$$
but also
$$\hat w_\lambda = \hat\Phi^\top (\hat\Phi \hat\Phi^\top + n\lambda I)^{-1} \hat Y, \qquad \text{time } O(pn^2 \vee n^3), \ \text{mem. } O(np \vee n^2).$$
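A minimal numerical sketch (data and $\lambda$ are assumptions for illustration) checking that the two expressions agree, so one can solve whichever system is smaller:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 50, 10, 0.1
Phi = rng.standard_normal((n, p))   # feature matrix, (Phi)_ij = phi_j(x_i)
Y = rng.standard_normal(n)

# Primal: p x p system, preferable when p << n.
w_primal = np.linalg.solve(Phi.T @ Phi + n * lam * np.eye(p), Phi.T @ Y)

# Dual: n x n system, preferable when n << p.
w_dual = Phi.T @ np.linalg.solve(Phi @ Phi.T + n * lam * np.eye(n), Y)

print(np.allclose(w_primal, w_dual))   # -> True
```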
Analogously to before,
$$\hat w_\lambda = \hat\Phi^\top c = \sum_{i=1}^n \Phi(x_i) c_i \ \Leftrightarrow\ \hat f_\lambda(x) = \sum_{i=1}^n \Phi(x)^\top \Phi(x_i)\, c_i,$$
with
$$c = (\hat\Phi \hat\Phi^\top + n\lambda I)^{-1} \hat Y, \qquad (\hat\Phi \hat\Phi^\top)_{ij} = \Phi(x_i)^\top \Phi(x_j).$$
Note that
$$\Phi(x)^\top \Phi(\bar x) = \sum_{s=1}^p \varphi_s(x) \varphi_s(\bar x).$$
- Can we consider $p = \infty$?
For $X = \mathbb{R}$ consider
$$\varphi_j(x) = x^{j-1} e^{-x^2 \gamma} \sqrt{\frac{(2\gamma)^{j-1}}{(j-1)!}}, \qquad j = 1, \dots, \infty$$
(note $\varphi_1(x) = e^{-x^2 \gamma}$). Then
$$\sum_{j=1}^\infty \varphi_j(x) \varphi_j(\bar x) = \sum_{j=1}^\infty x^{j-1} e^{-x^2 \gamma} \sqrt{\frac{(2\gamma)^{j-1}}{(j-1)!}}\; \bar x^{\,j-1} e^{-\bar x^2 \gamma} \sqrt{\frac{(2\gamma)^{j-1}}{(j-1)!}}$$
$$= e^{-x^2 \gamma} e^{-\bar x^2 \gamma} \sum_{j=1}^\infty \frac{(2\gamma)^{j-1}}{(j-1)!} (x \bar x)^{j-1} = e^{-x^2 \gamma} e^{-\bar x^2 \gamma} e^{2\gamma x \bar x} = e^{-|x - \bar x|^2 \gamma}.$$
In other words,
$$\Phi(x)^\top \Phi(\bar x) = \sum_{j=1}^\infty \varphi_j(x) \varphi_j(\bar x) = k(x, \bar x).$$
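A small numerical sketch (scalar inputs; the truncation level `J` is an ad hoc assumption) showing the truncated expansion recovering the Gaussian kernel:

```python
import numpy as np
from math import factorial, exp

def phi(x, J, gamma):
    """First J components of the Gaussian-kernel feature map on X = R."""
    return np.array([x**(j - 1) * exp(-x**2 * gamma)
                     * np.sqrt((2 * gamma)**(j - 1) / factorial(j - 1))
                     for j in range(1, J + 1)])

x, xb, gamma = 0.7, -0.3, 1.5
approx = phi(x, J=30, gamma=gamma) @ phi(xb, J=30, gamma=gamma)
exact = exp(-(x - xb)**2 * gamma)
print(abs(approx - exact))   # ~0, up to series truncation error
```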
We have
$$\hat f_\lambda(x) = \sum_{i=1}^n \Phi(x)^\top \Phi(x_i)\, c_i = \sum_{i=1}^n k(x, x_i) c_i,$$
$$c = (\hat K + n\lambda I)^{-1} \hat Y, \qquad (\hat K)_{ij} = \Phi(x_i)^\top \Phi(x_j) = k(x_i, x_j).$$
$\hat K$ is the kernel matrix, the Gram (inner products) matrix of the data.
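Putting the pieces together, a minimal kernel ridge regression sketch with the Gaussian kernel; the synthetic data, `gamma`, and `lam` are assumptions for illustration:

```python
import numpy as np

def gauss_kernel(X, Xb, gamma):
    """Gram matrix K_ij = exp(-||x_i - xb_j||^2 * gamma)."""
    d2 = ((X[:, None, :] - Xb[None, :, :])**2).sum(-1)
    return np.exp(-d2 * gamma)

rng = np.random.default_rng(0)
n, lam, gamma = 100, 1e-3, 2.0
X = rng.uniform(-3, 3, size=(n, 1))
Y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)

K = gauss_kernel(X, X, gamma)
c = np.linalg.solve(K + n * lam * np.eye(n), Y)   # c = (K + n*lam*I)^{-1} Y

X_test = np.linspace(-3, 3, 5).reshape(-1, 1)
f_hat = gauss_kernel(X_test, X, gamma) @ c        # f(x) = sum_i k(x, x_i) c_i
print(np.c_[np.sin(X_test[:, 0]), f_hat])         # truth vs. estimate
```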
A function $k : X \times X \to \mathbb{R}$ is positive definite if every kernel matrix it defines is positive semidefinite.
- Equivalently,
$$\sum_{i,j=1}^n k(x_i, x_j) a_i a_j \ge 0,$$
for any $a_1, \dots, a_n \in \mathbb{R}$, $x_1, \dots, x_n \in X$.

Assume $\Phi : X \to \mathbb{R}^p$, $p \le \infty$, and $k(x, \bar x) = \Phi(x)^\top \Phi(\bar x)$. Note that
$$\sum_{i,j=1}^n k(x_i, x_j) a_i a_j = \sum_{i,j=1}^n \Phi(x_i)^\top \Phi(x_j) a_i a_j = \Big\| \sum_{i=1}^n \Phi(x_i) a_i \Big\|^2 \ge 0.$$
Clearly $k$ is symmetric.
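A minimal numerical sketch of this identity (random stand-ins for the feature vectors; names are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 5
Phi = rng.standard_normal((n, p))        # rows play the role of Phi(x_i)
K = Phi @ Phi.T                          # K_ij = Phi(x_i)^T Phi(x_j)

a = rng.standard_normal(n)
quad = a @ K @ a                         # sum_ij k(x_i, x_j) a_i a_j
sq_norm = np.linalg.norm(Phi.T @ a)**2   # || sum_i Phi(x_i) a_i ||^2
print(np.isclose(quad, sq_norm), quad >= 0)   # -> True True
```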
Classic examples
- linear: $k(x, \bar x) = x^\top \bar x$
- polynomial: $k(x, \bar x) = (x^\top \bar x + 1)^s$
- Gaussian: $k(x, \bar x) = e^{-\|x - \bar x\|^2 \gamma}$
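These are straightforward to implement; a sketch of vectorized Gram-matrix versions, with `s` and `gamma` as free hyperparameters (names are mine, not from the slides):

```python
import numpy as np

def linear_kernel(X, Xb):
    return X @ Xb.T

def poly_kernel(X, Xb, s=3):
    return (X @ Xb.T + 1.0)**s

def gaussian_kernel(X, Xb, gamma=1.0):
    d2 = ((X[:, None, :] - Xb[None, :, :])**2).sum(-1)
    return np.exp(-d2 * gamma)

# Each returns the Gram matrix for rows of X against rows of Xb.
X = np.random.default_rng(0).standard_normal((4, 2))
print(gaussian_kernel(X, X).shape)   # -> (4, 4)
```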
Given a pos. def. $k$, consider functions of the form
$$f(x) = \sum_{i=1}^N k(x, x_i) a_i,$$
with inner product
$$\langle f, \bar f \rangle_{H_k} = \sum_{i=1}^N \sum_{j=1}^{\bar N} k(x_i, \bar x_j)\, a_i \bar a_j.$$
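In coefficient form the inner product is a bilinear form in the kernel matrix between the two sets of centers. A minimal sketch, with assumed data and the Gaussian kernel redefined for self-containment:

```python
import numpy as np

def gaussian_kernel(X, Xb, gamma=1.0):
    d2 = ((X[:, None, :] - Xb[None, :, :])**2).sum(-1)
    return np.exp(-d2 * gamma)

rng = np.random.default_rng(2)
X1, a1 = rng.standard_normal((4, 2)), rng.standard_normal(4)   # defines f
X2, a2 = rng.standard_normal((6, 2)), rng.standard_normal(6)   # defines f_bar

inner = a1 @ gaussian_kernel(X1, X2) @ a2            # <f, f_bar>_{H_k}
norm_f = np.sqrt(a1 @ gaussian_kernel(X1, X1) @ a1)  # ||f||_{H_k}
print(inner, norm_f)
```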
Theorem
Given a pos. def. $k$ there exists $\Phi$ s.t. $k(x, \bar x) = \langle \Phi(x), \Phi(\bar x) \rangle_{H_k}$ and $H_\Phi \simeq H_k$.
Roughly speaking,
$$f(x) = w^\top \Phi(x) \quad \simeq \quad f(x) = \sum_{i=1}^N k(x, x_i) a_i.$$
- reproducing property
- reproducing kernel Hilbert spaces (RKHS)
- Mercer theorem (Karhunen-Loève expansion)
- Gaussian processes
- Cameron-Martin spaces
Definition
A RKHS $H$ is a Hilbert space with a function $k : X \times X \to \mathbb{R}$ s.t.
- $k_x = k(x, \cdot) \in H$,
- and, for all $f \in H$,
$$f(x) = \langle f, k_x \rangle_{H}.$$
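The reproducing property can be sanity-checked on kernel expansions: for $f = \sum_i a_i k(\cdot, x_i)$, the inner product with $k_x$ (an expansion with a single coefficient $1$ at $x$) reduces to point evaluation. A sketch under assumed data:

```python
import numpy as np

def gaussian_kernel(X, Xb, gamma=1.0):
    d2 = ((X[:, None, :] - Xb[None, :, :])**2).sum(-1)
    return np.exp(-d2 * gamma)

rng = np.random.default_rng(3)
Xs, a = rng.standard_normal((5, 2)), rng.standard_normal(5)  # f = sum a_i k(., x_i)
x = rng.standard_normal((1, 2))                              # evaluation point

f_x = (gaussian_kernel(x, Xs) @ a)[0]       # direct evaluation f(x)
inner = a @ gaussian_kernel(Xs, x)[:, 0]    # <f, k_x> = sum_i a_i k(x_i, x)
print(np.isclose(f_x, inner))               # -> True
```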
Theorem
If $H$ is a RKHS, then $k$ is pos. def.
The evaluation functionals
$$e_x(f) = f(x)$$
are continuous:
$$|e_x(f) - e_x(\bar f)| \lesssim \|f - \bar f\|_{H_k}, \qquad \forall x \in X,$$
since $e_x(f) = \langle f, k_x \rangle_{H_k}$.
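Spelling out the hidden constant (a standard step, added for completeness): by the reproducing property and Cauchy–Schwarz,
$$|e_x(f) - e_x(\bar f)| = |\langle f - \bar f, k_x \rangle_{H_k}| \le \|k_x\|_{H_k} \|f - \bar f\|_{H_k} = \sqrt{k(x, x)}\, \|f - \bar f\|_{H_k}.$$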
Theorem
A Hilbert space with continuous evaluation functionals is a RKHS.
Plus:
- pos. def. functions
- reproducing property
- RKHS