
Homework for the Course

“Machine Learning with Kernel Methods”

Amine HADJI (Master MASH, [email protected])
Pierre SANNE (Master MASH, [email protected])
Benoit DESVIGNES (Master MASH, [email protected])

1 Combination Rules for Kernels

1. We denote K = αK1 + βK2 with α, β ≥ 0. K1 and K2 are p.d. kernels, so they are symmetric. Therefore
K(x, y) = αK1(x, y) + βK2(x, y) = αK1(y, x) + βK2(y, x) = K(y, x).
For all N ∈ N, (x1, x2, ..., xN) ∈ X^N and (a1, a2, ..., aN) ∈ R^N,
$$\sum_{i=1}^{N}\sum_{j=1}^{N} a_i a_j K(x_i, x_j) = \alpha \sum_{i=1}^{N}\sum_{j=1}^{N} a_i a_j K_1(x_i, x_j) + \beta \sum_{i=1}^{N}\sum_{j=1}^{N} a_i a_j K_2(x_i, x_j).$$
By definition of a p.d. kernel, both double sums on the right-hand side are non-negative, and α ≥ 0, β ≥ 0, so
$$\sum_{i=1}^{N}\sum_{j=1}^{N} a_i a_j K(x_i, x_j) \ge 0.$$
We can conclude that K is a p.d. kernel.
2. We denote K = K1K2. K1 and K2 are p.d. kernels, so they are symmetric. Therefore
K(x, y) = K1(x, y)K2(x, y) = K1(y, x)K2(y, x) = K(y, x).
Before proving that K is a p.d. kernel, we first prove that if A and B are two p.s.d. matrices and C is the matrix with entries c_ij = a_ij b_ij (their entrywise product), then C is p.s.d.
A and B are p.s.d., so we can take them as covariance matrices: let u and v be two independent random vectors such that E(u) = E(v) = 0, Var(u) = A and Var(v) = B, and let w be the random vector with components w_i = u_i v_i. Then E(w) = 0 and E(w_i w_j) = E(u_i u_j)E(v_i v_j) = a_ij b_ij = c_ij, so Var(w) = C and C must be p.s.d.
Now, let N ∈ N and (x1, x2, ..., xN) ∈ X^N. The two Gram matrices K1 = (K1(x_i, x_j))_{i,j} and K2 = (K2(x_i, x_j))_{i,j} are p.s.d. because K1 and K2 are p.d. kernels. So the matrix K = (K1(x_i, x_j)K2(x_i, x_j))_{i,j} = (K(x_i, x_j))_{i,j} is also p.s.d.
We can conclude that K is a p.d. kernel.
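As a quick numerical sanity check of questions 1 and 2 (not part of the proof), one can verify that a non-negative combination and the entrywise product of two Gram matrices have no negative eigenvalues. The sketch below is a minimal illustration; the kernels (xy and e^{−(x−y)²}), the sample points and the weights are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=20)

# Gram matrices of two p.d. kernels on the same points:
# the linear kernel K1(x, y) = xy and the Gaussian kernel K2(x, y) = exp(-(x - y)^2).
K1 = np.outer(x, x)
K2 = np.exp(-(x[:, None] - x[None, :]) ** 2)

alpha, beta = 0.3, 1.7                     # arbitrary non-negative weights
K_sum = alpha * K1 + beta * K2             # question 1: non-negative combination
K_prod = K1 * K2                           # question 2: entrywise (Hadamard) product

for name, K in [("alpha*K1 + beta*K2", K_sum), ("K1 * K2", K_prod)]:
    lam_min = np.linalg.eigvalsh(K).min()  # smallest eigenvalue of the symmetric matrix
    print(f"{name}: smallest eigenvalue = {lam_min:.2e}")  # ~0 or positive, up to rounding
```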
3. We have that ∀x, y ∈ X, lim_{n→+∞} Kn(x, y) = K(x, y).
(Kn) is a sequence of p.d. kernels, so ∀n ∈ N, Kn(x, y) = Kn(y, x), hence lim_{n→+∞} Kn(x, y) = lim_{n→+∞} Kn(y, x). Therefore K(x, y) = K(y, x).
For all N ∈ N, (x1, x2, ..., xN) ∈ X^N and (a1, a2, ..., aN) ∈ R^N,
$$\sum_{i=1}^{N}\sum_{j=1}^{N} a_i a_j K(x_i, x_j) = \sum_{i=1}^{N}\sum_{j=1}^{N} a_i a_j \lim_{n\to+\infty} K_n(x_i, x_j) = \lim_{n\to+\infty} \sum_{i=1}^{N}\sum_{j=1}^{N} a_i a_j K_n(x_i, x_j).$$
We know that $\big(\sum_{i=1}^{N}\sum_{j=1}^{N} a_i a_j K_n(x_i, x_j)\big)_{n\ge 0}$ is a non-negative sequence because (Kn) is a sequence of p.d. kernels, and the limit of a non-negative sequence is non-negative, so
$$\sum_{i=1}^{N}\sum_{j=1}^{N} a_i a_j K(x_i, x_j) \ge 0.$$
We can conclude that K is a p.d. kernel.
4. $\exp\{K_1(x, y)\} = \lim_{n\to+\infty} \sum_{i=0}^{n} \frac{K_1(x, y)^i}{i!}$.
K1 is a p.d. kernel, so by question 2 we know that K1^i is a p.d. kernel for every i ∈ N (for i = 0 it is the constant kernel 1, which is trivially p.d.). Moreover 1/i! > 0 for all i ∈ N, so by question 1, $\big(\sum_{i=0}^{n} \frac{K_1(x, y)^i}{i!}\big)_{n\ge 0}$ is a sequence of p.d. kernels.
By question 3, we can conclude that $\exp\{K_1(x, y)\} = \lim_{n\to+\infty} \sum_{i=0}^{n} \frac{K_1(x, y)^i}{i!}$ is a p.d. kernel.
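A minimal numerical illustration of question 4 (illustrative only), using the linear kernel K1(x, y) = xy as an arbitrary p.d. kernel: the partial sums of the exponential series and their entrywise limit exp(K1) both give Gram matrices with no negative eigenvalues, up to rounding.

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=15)
K1 = np.outer(x, x)                          # Gram matrix of the p.d. kernel K1(x, y) = xy

# Partial sums of the exponential series are p.d. kernels (questions 1 and 2),
# and they converge entrywise to exp(K1(x, y)), which is p.d. by question 3.
partial = sum(K1 ** i / factorial(i) for i in range(20))
exact = np.exp(K1)                           # entrywise exponential

print("min eigenvalue of the partial sum:", np.linalg.eigvalsh(partial).min())
print("min eigenvalue of exp(K1):", np.linalg.eigvalsh(exact).min())
print("max abs difference:", np.abs(partial - exact).max())
```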

2 Quiz: Positive Definite Kernels

• K(x, y) = 1/(1 − xy)
K is obviously symmetric. We have X = (−1, 1), so |xy| < 1. Therefore
$$K(x, y) = \frac{1}{1 - xy} = \sum_{i=0}^{+\infty} (xy)^i.$$
K1(x, y) = xy is a p.d. kernel, and $\big(\sum_{i=0}^{n} (xy)^i\big)_{n\ge 0}$ is a sequence of p.d. kernels by questions 1 and 2; therefore, by question 3, K(x, y) is a p.d. kernel.
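A quick numerical check of this bullet (illustrative only): for points sampled in (−0.9, 0.9), the Gram matrix of 1/(1 − xy) agrees with the partial sums of the geometric series and has no negative eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-0.9, 0.9, size=25)                    # points in X = (-1, 1)

K = 1.0 / (1.0 - np.outer(x, x))                       # K(x, y) = 1 / (1 - xy)
geom = sum(np.outer(x, x) ** i for i in range(200))    # partial sums of the geometric series

print("max abs difference from the series:", np.abs(K - geom).max())
print("min eigenvalue:", np.linalg.eigvalsh(K).min())  # non-negative up to rounding
```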

• K(x, y) = 2^{xy}
K(x, y) = 2^{xy} = e^{xy log 2}. Obviously K is symmetric. K1(x, y) = xy log(2) is a p.d. kernel since log(2) ≥ 0 (question 1), so by question 4, 2^{xy} = e^{xy log 2} is a p.d. kernel.
• K(x, y) = log(1 + xy)
We can take a counterexample to prove that it is not a p.d. kernel.
Let us take N = 2, x = (1, 2), and a = (log 3, −log 2). We have
$$\sum_{i=1}^{2}\sum_{j=1}^{2} a_i a_j \log(1 + x_i x_j) = (\log 3)^2 \log 2 - 2(\log 2)(\log 3)^2 + (\log 2)^2 \log 5 = \log 2\,\big(\log 2\,\log 5 - (\log 3)^2\big) \approx -0.06 < 0,$$
therefore log(1 + xy) is not a p.d. kernel.
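The counterexample can be checked numerically (illustrative sketch): both the quadratic form above and the smallest eigenvalue of the 2×2 Gram matrix at x = (1, 2) are negative.

```python
import numpy as np

x = np.array([1.0, 2.0])
K = np.log1p(np.outer(x, x))                 # Gram matrix of log(1 + xy) at x = (1, 2)
a = np.array([np.log(3.0), -np.log(2.0)])    # vector used in the counterexample above

print("quadratic form a^T K a:", a @ K @ a)                   # about -0.06 < 0
print("min eigenvalue of K:", np.linalg.eigvalsh(K).min())    # negative
```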

• K(x, y) = e^{−(x−y)²}
K(x, y) = e^{−x²−y²+2xy}. Obviously, K is symmetric.
For all N ∈ N, (x1, x2, ..., xN) ∈ X^N and (a1, a2, ..., aN) ∈ R^N, writing $a_i' = a_i e^{-x_i^2}$,
$$\sum_{i=1}^{N}\sum_{j=1}^{N} a_i a_j K(x_i, x_j) = \sum_{i=1}^{N}\sum_{j=1}^{N} a_i a_j e^{-x_i^2 - x_j^2 + 2 x_i x_j} = \sum_{i=1}^{N}\sum_{j=1}^{N} a_i' a_j' e^{2 x_i x_j} \ge 0,$$
because e^{2xy} is a p.d. kernel (question 4 applied to the p.d. kernel 2xy). We can conclude that K is a p.d. kernel.
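A small numerical illustration of the rescaling argument (arbitrary sample points and coefficients): the quadratic form of the Gaussian Gram matrix equals the quadratic form of e^{2xy} taken with the rescaled coefficients a_i' = a_i e^{−x_i²}.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=20)
a = rng.normal(size=20)

K = np.exp(-(x[:, None] - x[None, :]) ** 2)   # Gaussian kernel e^{-(x - y)^2}
E = np.exp(2.0 * np.outer(x, x))              # p.d. kernel e^{2xy}
a_prime = a * np.exp(-x ** 2)                 # rescaled coefficients a_i' = a_i e^{-x_i^2}

print("a^T K a:", a @ K @ a)                                  # non-negative
print("a'^T E a' (same value):", a_prime @ E @ a_prime)
print("min eigenvalue of K:", np.linalg.eigvalsh(K).min())
```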

• K(x, y) = cos(x + y)
We can take a counterexample to prove that it is not a p.d. kernel.
Let us take N = 2, x = (π/2, π/2), and a = (1, 1). We have
$$\sum_{i=1}^{2}\sum_{j=1}^{2} a_i a_j \cos(x_i + x_j) = \cos(\pi) + \cos(\pi) + \cos(\pi) + \cos(\pi) = -4 < 0,$$
therefore cos(x + y) is not a p.d. kernel.
• K(x, y) = cos(x − y)
Obviously, K is symmetric. cos(x − y) = cos(x) cos(y) + sin(x) sin(y).
We denote φ(t) = (cos(t), sin(t)). Then cos(x − y) = ⟨φ(x), φ(y)⟩_{R²}. We can conclude that cos(x − y) is a p.d. kernel.
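The feature map can be checked numerically (illustrative sketch): the Gram matrix of cos(x − y) coincides with ΦΦᵀ, where the rows of Φ are the features (cos(x_i), sin(x_i)).

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(0.0, 2.0 * np.pi, size=30)

phi = np.column_stack([np.cos(x), np.sin(x)])   # explicit feature map phi(t) = (cos t, sin t)
K_feat = phi @ phi.T                            # <phi(x_i), phi(x_j)>
K = np.cos(x[:, None] - x[None, :])             # cos(x_i - x_j)

print("max abs difference:", np.abs(K - K_feat).max())        # ~0
print("min eigenvalue:", np.linalg.eigvalsh(K).min())         # non-negative up to rounding
```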

• K(x, y) = min(x, y)
Obviously, K is symmetric.
Let N ∈ N, (x1, x2, ..., xN) ∈ X^N and (a1, a2, ..., aN) ∈ R^N, with the x_i non-negative. Then
$$\sum_{i=1}^{N}\sum_{j=1}^{N} a_i a_j \min(x_i, x_j) = \sum_{i=1}^{N}\sum_{j=1}^{N} a_i a_j \int_0^{+\infty} \mathbf{1}_{s\le x_i}\,\mathbf{1}_{s\le x_j}\,ds = \int_0^{+\infty} \Big(\sum_{i=1}^{N} a_i \mathbf{1}_{s\le x_i}\Big)^2 ds \ge 0.$$
We conclude that K is a p.d. kernel.
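A numerical illustration of the integral representation (points, coefficients and grid are illustrative): the quadratic form of the min Gram matrix is well approximated by a discretization of ∫ (Σ_i a_i 1_{s ≤ x_i})² ds.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(0.0, 5.0, size=20)            # non-negative points
a = rng.normal(size=20)

K = np.minimum.outer(x, x)                    # Gram matrix of min(x, y)
quad = a @ K @ a

# Discretized integral of (sum_i a_i 1_{s <= x_i})^2 over s on a fine grid.
s = np.linspace(0.0, x.max(), 100_000)
ds = s[1] - s[0]
indicator = (s[None, :] <= x[:, None]).astype(float)   # 1_{s <= x_i}
integral = ((a @ indicator) ** 2).sum() * ds

print("a^T K a:", quad)
print("discretized integral:", integral)                      # close to the quadratic form
print("min eigenvalue of K:", np.linalg.eigvalsh(K).min())    # non-negative up to rounding
```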

• K(x, y) = max(x, y)
We can take a counterexample to prove that it is not a p.d. kernel.
Let us take N = 2, a = (1, −1), x = (1, 2). We have
$$\sum_{i=1}^{2}\sum_{j=1}^{2} a_i a_j \max(x_i, x_j) = 1 - 2 - 2 + 2 = -1 < 0,$$
therefore K is not a p.d. kernel.
• K(x, y) = min(x, y)/max(x, y)
Obviously, K is symmetric.
K(x, y) = min(x, y) · min(1/x, 1/y) for x, y ∈ R∗+, since 1/max(x, y) = min(1/x, 1/y). min(x, y) is a p.d. kernel on R∗+, which implies that min(1/x, 1/y) is also a p.d. kernel on R∗+ (it is the min kernel composed with the mapping x ↦ 1/x). Therefore, by question 2, we can conclude that K is a p.d. kernel.
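A quick numerical check (illustrative only): the factorization min(x, y)/max(x, y) = min(x, y) · min(1/x, 1/y) holds entrywise, and the resulting Gram matrix has no negative eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.uniform(0.1, 10.0, size=25)                           # points in R*+

K = np.minimum.outer(x, x) / np.maximum.outer(x, x)           # min(x, y) / max(x, y)
K_factored = np.minimum.outer(x, x) * np.minimum.outer(1.0 / x, 1.0 / x)

print("max abs difference:", np.abs(K - K_factored).max())    # ~0: the factorization holds
print("min eigenvalue:", np.linalg.eigvalsh(K).min())         # non-negative up to rounding
```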

• K(x, y) = GCD(x, y)
Obviously, K is symmetric.
For every x ∈ N*, write $x = \prod_{i=1}^{+\infty} p_i^{\phi_i(x)}$, where p = (2, 3, 5, 7, ...) is the sequence of prime numbers and φ(x) ∈ N^N (e.g. φ(36) = (2, 2, 0, 0, ...)). Then
$$GCD(x, y) = \prod_{i=1}^{+\infty} p_i^{\min(\phi_i(x), \phi_i(y))} = \lim_{n\to+\infty} \prod_{i=1}^{n} p_i^{\min(\phi_i(x), \phi_i(y))}.$$
For all i ∈ N*, min(φ_i(x), φ_i(y)) is a p.d. kernel (it is the min kernel composed with the mapping φ_i), so $p_i^{\min(\phi_i(x), \phi_i(y))} = e^{\min(\phi_i(x), \phi_i(y)) \log p_i}$ is a p.d. kernel by question 4. It implies that for all n ∈ N*, $\prod_{i=1}^{n} p_i^{\min(\phi_i(x), \phi_i(y))}$ is a p.d. kernel by question 2.
We can therefore conclude that GCD(x, y) is a p.d. kernel by question 3.
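A quick numerical check of this claim (illustrative only): the Gram matrix of GCD on the integers 1, ..., 30 has no negative eigenvalues.

```python
import numpy as np
from math import gcd

n = np.arange(1, 31)                                   # points 1, 2, ..., 30
K = np.array([[gcd(int(i), int(j)) for j in n] for i in n], dtype=float)

print("min eigenvalue of the GCD Gram matrix:", np.linalg.eigvalsh(K).min())   # non-negative
```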

• K(x, y) = LCM(x, y)
We can take a counterexample to prove that it is not a p.d. kernel.
Let us take N = 2, a = (1, −1), and x = (1, 2). We have
$$\sum_{i=1}^{2}\sum_{j=1}^{2} a_i a_j\, LCM(x_i, x_j) = 1 - 2 - 2 + 2 = -1 < 0,$$
therefore K is not a p.d. kernel.
• K(x, y) = GCD(x, y)/LCM(x, y)
Obviously, K is symmetric. With the notation of the previous questions,
$$K(x, y) = \prod_{i=1}^{+\infty} p_i^{\min(\phi_i(x), \phi_i(y)) - \max(\phi_i(x), \phi_i(y))}.$$
Let K1(x, y) = e^{−max(x, y)} = e^{min(−x, −y)} = min(e^{−x}, e^{−y}) with x, y ∈ N and e^{−x}, e^{−y} ∈ R+. As we know that min(x, y) is a p.d. kernel on R+, K1 is a p.d. kernel. The same argument shows that $p_i^{-\max(\phi_i(x), \phi_i(y))} = \min\big(p_i^{-\phi_i(x)}, p_i^{-\phi_i(y)}\big)$ is a p.d. kernel, and $p_i^{\min(\phi_i(x), \phi_i(y))}$ is a p.d. kernel as in the previous question, so by question 2 their product $p_i^{\min(\phi_i(x), \phi_i(y)) - \max(\phi_i(x), \phi_i(y))}$ is a p.d. kernel for all i ∈ N*. By question 2 again, for all n ∈ N*, $\prod_{i=1}^{n} p_i^{\min(\phi_i(x), \phi_i(y)) - \max(\phi_i(x), \phi_i(y))}$ is a p.d. kernel. We can conclude that K is
a p.d. kernel by question 3.
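The same kind of numerical check applies here (illustrative only; LCM is computed from GCD).

```python
import numpy as np
from math import gcd

n = np.arange(1, 31)
G = np.array([[gcd(int(i), int(j)) for j in n] for i in n], dtype=float)
L = np.outer(n, n) / G                                 # LCM(i, j) = i * j / GCD(i, j)
K = G / L                                              # GCD(x, y) / LCM(x, y)

print("min eigenvalue:", np.linalg.eigvalsh(K).min())  # non-negative up to rounding
```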

3 Covariance Operators in RKHS

1. K(x, y) = xy with x, y ∈ R.
$$\mathcal{H}_K = \{f : \mathbb{R} \to \mathbb{R},\ x \mapsto ax \,:\, a \in \mathbb{R}\}, \qquad \langle f_1, f_2\rangle_{\mathcal{H}_K} = a_1 a_2, \qquad \|f\|_{\mathcal{H}_K} = |a|,$$
$$B_K = \{f : \mathbb{R} \to \mathbb{R},\ x \mapsto ax \,:\, |a| \le 1\},$$
$$C_n^K = \max_{|a|\le 1,\,|b|\le 1} \mathrm{cov}_n(aX, bY) = \max_{|a|\le 1,\,|b|\le 1} ab\,\mathrm{cov}_n(X, Y) = |\mathrm{cov}_n(X, Y)|.$$
2.
$$C_n^K = \max_{f, g \in B_K} \mathrm{cov}_n(f(X), g(Y)) = \frac{1}{n} \max_{g \in B_K}\Big( -\min_{f \in B_K} \sum_{i=1}^{n} f(x_i)\Big[\frac{1}{n}\sum_{j=1}^{n} g(y_j) - g(y_i)\Big]\Big).$$
Because the function inside the min is convex (linear) with respect to the f(x_i), there exists µ > 0 such that
$$C_n^K = \frac{1}{n} \max_{g \in B_K}\Big( -\min_{f \in \mathcal{H}_K} \Big\{\sum_{i=1}^{n} f(x_i)\Big[\frac{1}{n}\sum_{j=1}^{n} g(y_j) - g(y_i)\Big] + \mu \|f\|_{\mathcal{H}_K}^2\Big\}\Big).$$
By the representer theorem, $f(x) = \sum_{i=1}^{n} \alpha_i K(x, x_i)$; by symmetry, $g(y) = \sum_{i=1}^{n} \beta_i K(y, y_i)$. It implies that
$$C_n^K = \max_{\alpha^\top K_x \alpha \le 1,\ \beta^\top K_y \beta \le 1} \frac{1}{n}\, \alpha^\top K_x K_y \beta - \frac{1}{n^2}\, \alpha^\top K_x \mathbf{1}\mathbf{1}^\top K_y \beta,$$
so
$$C_n^K = \max_{\alpha^\top K_x \alpha \le 1,\ \beta^\top K_y \beta \le 1} \frac{1}{n}\, \alpha^\top K_x \Big(I_n - \frac{1}{n}\mathbf{1}\mathbf{1}^\top\Big) K_y \beta = \max_{\|\tilde{\alpha}\|\le 1,\ \|\tilde{\beta}\|\le 1} \frac{1}{n}\, \tilde{\alpha}^\top K_x^{1/2} P K_y^{1/2} \tilde{\beta} = \frac{1}{n}\,\big\|K_x^{1/2} P K_y^{1/2}\big\|_2,$$
with $\tilde{\alpha} = K_x^{1/2}\alpha$, $\tilde{\beta} = K_y^{1/2}\beta$, $P = I_n - \frac{1}{n}\mathbf{1}\mathbf{1}^\top$ and $\|\cdot\|_2$ the spectral norm.
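A small NumPy sketch of the final formula (not part of the derivation): the Gaussian kernel, the synthetic data and the helper names (gram, sqrtm_psd) are illustrative choices, and the matrix square roots are computed by eigendecomposition.

```python
import numpy as np

def gram(z, bandwidth=1.0):
    """Gram matrix of the Gaussian kernel exp(-(z_i - z_j)^2 / bandwidth) (illustrative choice)."""
    return np.exp(-(z[:, None] - z[None, :]) ** 2 / bandwidth)

def sqrtm_psd(K):
    """Symmetric square root of a p.s.d. matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(K)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

rng = np.random.default_rng(7)
n = 50
x = rng.normal(size=n)
y = x + 0.3 * rng.normal(size=n)              # dependent sample, so C_n^K should be clearly > 0

Kx, Ky = gram(x), gram(y)
P = np.eye(n) - np.ones((n, n)) / n           # centering matrix I_n - (1/n) 1 1^T

M = sqrtm_psd(Kx) @ P @ sqrtm_psd(Ky)
C_nK = np.linalg.norm(M, ord=2) / n           # (1/n) * spectral norm of Kx^{1/2} P Ky^{1/2}
print("C_n^K =", C_nK)
```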

4 Some Basic Learning Bounds

1.
$$\begin{aligned}
|R_\phi(f, x) - R_\phi(g, x)| &\le |\phi(f(x)) - \phi(g(x))| + \lambda\,\big|\|f\|_{\mathcal{H}_K}^2 - \|g\|_{\mathcal{H}_K}^2\big| \\
&\le L\,|f(x) - g(x)| + \lambda\,\|f - g\|_{\mathcal{H}_K}\,\|f + g\|_{\mathcal{H}_K} \qquad \big(\big|\|f\|^2 - \|g\|^2\big| = |\langle f - g, f + g\rangle_{\mathcal{H}_K}|\big) \\
&\le L\,|\langle f, K_x\rangle_{\mathcal{H}_K} - \langle g, K_x\rangle_{\mathcal{H}_K}| + 2\lambda R\,\|f - g\|_{\mathcal{H}_K} \qquad \big(\|f + g\|_{\mathcal{H}_K} \le \|f\|_{\mathcal{H}_K} + \|g\|_{\mathcal{H}_K} \le 2R\big) \\
&= L\,|\langle f - g, K_x\rangle_{\mathcal{H}_K}| + 2\lambda R\,\|f - g\|_{\mathcal{H}_K} \\
&\le L\,\|K_x\|_{\mathcal{H}_K}\,\|f - g\|_{\mathcal{H}_K} + 2\lambda R\,\|f - g\|_{\mathcal{H}_K} \qquad \text{(Cauchy-Schwarz inequality)} \\
&\le \big(L\sqrt{\kappa} + 2\lambda R\big)\,\|f - g\|_{\mathcal{H}_K},
\end{aligned}$$
since $\|K_x\|_{\mathcal{H}_K}^2 = K(x, x) \le \kappa$.
2.
$$\begin{aligned}
R_\phi(f, x) + R_\phi(f_x, x) &= \phi(f(x)) + \phi(f_x(x)) + \lambda\big(\|f\|_{\mathcal{H}_K}^2 + \|f_x\|_{\mathcal{H}_K}^2\big) \\
&\ge 2\,\phi\Big(\frac{f(x) + f_x(x)}{2}\Big) + \lambda\big(\|f\|_{\mathcal{H}_K}^2 + \|f_x\|_{\mathcal{H}_K}^2\big) \qquad \text{(convexity of } \phi\text{)} \\
&\ge 2\, R_\phi\Big(\frac{f + f_x}{2}, x\Big) + \lambda\Big(\|f\|_{\mathcal{H}_K}^2 + \|f_x\|_{\mathcal{H}_K}^2 - 2\Big\|\frac{f + f_x}{2}\Big\|_{\mathcal{H}_K}^2\Big).
\end{aligned}$$
Since ∀g, R_φ(f_x, x) ≤ R_φ(g, x), in particular R_φ((f + f_x)/2, x) ≥ R_φ(f_x, x), so
$$R_\phi(f, x) + R_\phi(f_x, x) \ge 2\, R_\phi(f_x, x) + \frac{\lambda}{2}\big(\|f\|_{\mathcal{H}_K}^2 + \|f_x\|_{\mathcal{H}_K}^2 - 2\langle f, f_x\rangle_{\mathcal{H}_K}\big).$$
So we have that $R_\phi(f, x) - R_\phi(f_x, x) \ge \frac{\lambda}{2}\,\|f - f_x\|_{\mathcal{H}_K}^2$.
3. ∀x ∈ X, ∀f ∈ B_K:
$$|\Psi(f, x)| \le C_1\,\|f - f_x\|_{\mathcal{H}_K} \ \Rightarrow\ \Psi(f, x)^2 \le C_1^2\,\|f - f_x\|_{\mathcal{H}_K}^2 \qquad \text{thanks to question 1 (with } C_1 = L\sqrt{\kappa} + 2\lambda R\text{),}$$
$$\|f - f_x\|_{\mathcal{H}_K}^2 \le \frac{1}{C_2}\,\Psi(f, x) \qquad \text{thanks to question 2 (with } C_2 = \lambda/2\text{),}$$
$$\Rightarrow\ \Psi(f, x)^2 \le \frac{C_1^2}{C_2}\,\Psi(f, x) \quad \forall x \in X \ \Rightarrow\ \mathbb{E}\big[\Psi(f, X)^2\big] \le \frac{C_1^2}{C_2}\,\mathbb{E}\big[\Psi(f, X)\big].$$
So, ∀f ∈ B_K, $\mathbb{E}[\Psi(f, X)^2] \le C\,\mathbb{E}[\Psi(f, X)]$ with $C = \frac{C_1^2}{C_2} = \frac{2(L\sqrt{\kappa} + 2\lambda R)^2}{\lambda}$.