Kernel Method Homework
Benoit DESVIGNES
master MASH
[email protected]
1. We denote K = αK1 + βK2, with α, β ≥ 0. K1 and K2 are p.d. kernels, so they are symmetric. Therefore
K(x, y) = αK1(x, y) + βK2(x, y) = αK1(y, x) + βK2(y, x) = K(y, x).
∀N ∈ ℕ, (x1, x2, ..., xN) ∈ X^N and (a1, a2, ..., aN) ∈ ℝ^N,
∑_{i=1}^N ∑_{j=1}^N a_i a_j K(x_i, x_j) = ∑_{i=1}^N ∑_{j=1}^N a_i a_j (αK1(x_i, x_j) + βK2(x_i, x_j)) = α ∑_{i=1}^N ∑_{j=1}^N a_i a_j K1(x_i, x_j) + β ∑_{i=1}^N ∑_{j=1}^N a_i a_j K2(x_i, x_j).
By definition of a p.d. kernel, we have:
∑_{i=1}^N ∑_{j=1}^N a_i a_j K1(x_i, x_j) ≥ 0 and ∑_{i=1}^N ∑_{j=1}^N a_i a_j K2(x_i, x_j) ≥ 0.
Since α, β ≥ 0, it follows that ∑_{i=1}^N ∑_{j=1}^N a_i a_j K(x_i, x_j) ≥ 0.
We can conclude that K is a p.d. kernel.
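As a quick numerical sanity check (not part of the proof), here is a minimal Python sketch, assuming numpy is available; the two kernels and the weights are arbitrary illustrative choices:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=12)                                  # arbitrary sample points

# Gram matrices of two p.d. kernels: linear K1(x, y) = xy and Gaussian K2(x, y) = exp(-(x - y)^2)
K1 = np.outer(x, x)
K2 = np.exp(-(x[:, None] - x[None, :]) ** 2)

alpha, beta = 0.7, 2.5                                   # any non-negative weights
K = alpha * K1 + beta * K2

# the combined Gram matrix should have no negative eigenvalue (up to numerical round-off)
print(np.linalg.eigvalsh(K).min())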
2. We denote K = K1K2. K1 and K2 are p.d. kernels, so they are symmetric. Therefore
K(x, y) = K1(x, y)K2(x, y) = K1(y, x)K2(y, x) = K(y, x).
Before proving that K is a p.d. kernel, we first prove that if A and B are two p.s.d. matrices and C is the matrix with entries c_ij = a_ij b_ij, then C is p.s.d.
A and B are p.s.d., so they are covariance matrices (for instance of centered Gaussian vectors). Let u and v be two independent random vectors such that E(u) = E(v) = 0, Var(u) = A and Var(v) = B, and let w be the random vector with components w_i = u_i v_i.
Then E(w) = 0 and, by independence of u and v, E(w_i w_j) = E(u_i v_i u_j v_j) = E(u_i u_j)E(v_i v_j) = a_ij b_ij = c_ij. So Var(w) = C, and C must be p.s.d.
Now, let N ∈ ℕ and (x1, x2, ..., xN) ∈ X^N. The two matrices (K1(x_i, x_j))_{i,j} and (K2(x_i, x_j))_{i,j} are p.s.d. because K1 and K2 are p.d. So the matrix (K1(x_i, x_j)K2(x_i, x_j))_{i,j} = (K(x_i, x_j))_{i,j} is also p.s.d.
We can conclude that K is a p.d. kernel.
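The Schur-product argument can be illustrated numerically with a minimal Python sketch, assuming numpy; the random p.s.d. matrices are arbitrary:

import numpy as np

rng = np.random.default_rng(1)
n = 15
M, N = rng.normal(size=(n, n)), rng.normal(size=(n, n))
A, B = M @ M.T, N @ N.T                                  # two random p.s.d. matrices

C = A * B                                                # entrywise (Schur/Hadamard) product, c_ij = a_ij * b_ij

# C should have no negative eigenvalue (up to numerical round-off)
print(np.linalg.eigvalsh(C).min())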
3. We have that ∀x, y ∈ X, lim_{n→+∞} Kn(x, y) = K(x, y).
(Kn) is a sequence of p.d. kernels, so ∀n ∈ ℕ, Kn(x, y) = Kn(y, x), and therefore
K(x, y) = lim_{n→+∞} Kn(x, y) = lim_{n→+∞} Kn(y, x) = K(y, x).
∀N ∈ ℕ, (x1, x2, ..., xN) ∈ X^N and (a1, a2, ..., aN) ∈ ℝ^N,
∑_{i=1}^N ∑_{j=1}^N a_i a_j K(x_i, x_j) = ∑_{i=1}^N ∑_{j=1}^N a_i a_j lim_{n→+∞} Kn(x_i, x_j) = lim_{n→+∞} ∑_{i=1}^N ∑_{j=1}^N a_i a_j Kn(x_i, x_j).
We know that (∑_{i=1}^N ∑_{j=1}^N a_i a_j Kn(x_i, x_j))_{n≥0} is a non-negative sequence because the Kn are p.d. kernels, and the limit of a non-negative sequence is non-negative, so:
∀N ∈ ℕ, (x1, x2, ..., xN) ∈ X^N and (a1, a2, ..., aN) ∈ ℝ^N, ∑_{i=1}^N ∑_{j=1}^N a_i a_j K(x_i, x_j) ≥ 0.
We can conclude that K is a p.d. kernel.
4. exp{K1(x, y)} = lim_{n→+∞} ∑_{i=0}^n K1(x, y)^i / i!.
K1 is a p.d. kernel, so by question 2, K1^i is a p.d. kernel for all i ≥ 1 (and K1^0 = 1 is trivially p.d.). Since 1/i! > 0 for all i ∈ ℕ, question 1 shows that (∑_{i=0}^n K1(x, y)^i / i!)_{n≥0} is a sequence of p.d. kernels.
By question 3, we can conclude that exp{K1(x, y)} = lim_{n→+∞} ∑_{i=0}^n K1(x, y)^i / i! is a p.d. kernel.
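To illustrate questions 3 and 4 together, a minimal Python sketch, assuming numpy; the sample points and the linear kernel K1(x, y) = xy are arbitrary choices. The Gram matrices of the truncated exponential series stay p.s.d. and converge entrywise to the Gram matrix of exp{K1(x, y)}:

import numpy as np
from math import factorial

x = np.array([-1.5, -0.5, 0.0, 1.0, 2.0])    # arbitrary sample points
K1 = np.outer(x, x)                          # Gram matrix of the p.d. kernel K1(x, y) = xy

partial = np.zeros_like(K1)
for i in range(30):
    partial += K1 ** i / factorial(i)        # entrywise power: Gram matrix of K1(x, y)^i
    # each partial sum is a p.d. kernel (questions 1 and 2), so no negative eigenvalue beyond round-off
    assert np.linalg.eigvalsh(partial).min() > -1e-8

# the pointwise limit is exp(K1(x, y)); its Gram matrix is p.s.d. as well (question 3)
print(np.abs(partial - np.exp(K1)).max())    # should be tiny
print(np.linalg.eigvalsh(np.exp(K1)).min())  # should be >= 0 up to round-off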
• K(x, y) = 2^{xy}
K(x, y) = 2^{xy} = e^{xy log(2)}. Obviously K is symmetric. The linear kernel xy is a p.d. kernel and log(2) ≥ 0, so xy log(2) is a p.d. kernel by question 1, and by question 4 we have that 2^{xy} = e^{xy log(2)} is a p.d. kernel.
• K(x, y) = log(1 + xy)
We can take a counter-example to prove that it is not a p.d. kernel (see also the numerical check sketched after this list).
Let us take N = 2, (x1, x2) = (1, 5), and a = (−3, 1).
We have ∑_{i=1}^2 ∑_{j=1}^2 a_i a_j log(1 + x_i x_j) = 9 log(2) − 6 log(6) + log(26) = 4 log(2) − 6 log(3) + log(13) ≈ −1.25 < 0, therefore log(1 + xy) is not a p.d. kernel.
• K(x, y) = e^{−(x−y)^2}
K(x, y) = e^{−x^2−y^2+2xy}. Obviously, K is symmetric.
∀N ∈ ℕ, (x1, x2, ..., xN) ∈ X^N and (a1, a2, ..., aN) ∈ ℝ^N,
∑_{i=1}^N ∑_{j=1}^N a_i a_j K(x_i, x_j) = ∑_{i=1}^N ∑_{j=1}^N a_i a_j e^{−x_i^2−x_j^2+2x_i x_j}. We denote a'_i = a_i e^{−x_i^2}.
So ∑_{i=1}^N ∑_{j=1}^N a_i a_j K(x_i, x_j) = ∑_{i=1}^N ∑_{j=1}^N a'_i a'_j e^{2x_i x_j} ≥ 0, because e^{2xy} is a p.d. kernel (2xy is p.d. and we apply question 4). We can conclude that K is a p.d. kernel.
• K(x, y) = cos(x + y)
We can take a counter-example to prove that it is not a p.d. kernel.
Let us take N = 2, (x1, x2) = (π/2, π/2), and a = (1, 1).
We have ∑_{i=1}^2 ∑_{j=1}^2 a_i a_j cos(x_i + x_j) = cos(π) + cos(π) + cos(π) + cos(π) = −4 < 0, therefore cos(x + y) is not a p.d. kernel.
• K(x, y) = cos(x − y)
Obviously, K is symmetric. cos(x − y) = cos(x) cos(y) + sin(x) sin(y).
We denote φ(t) = (cos(t), sin(t)). Then cos(x − y) = ⟨φ(x), φ(y)⟩_{ℝ^2}. We can conclude that cos(x − y) is a p.d. kernel.
• K(x, y) = min(x, y)
Obviously, K is symmetric. For x, y ≥ 0 we can write min(x, y) = ∫_0^{+∞} 1_{s≤x} 1_{s≤y} ds.
Let N ∈ ℕ, (x1, x2, ..., xN) ∈ X^N and (a1, a2, ..., aN) ∈ ℝ^N.
∑_{i=1}^N ∑_{j=1}^N a_i a_j min(x_i, x_j) = ∑_{i=1}^N ∑_{j=1}^N a_i a_j ∫_0^{+∞} 1_{s≤x_i} 1_{s≤x_j} ds = ∫_0^{+∞} (∑_{i=1}^N a_i 1_{s≤x_i})^2 ds ≥ 0.
We conclude that K is a p.d. kernel on ℝ_+.
• K(x, y) = max(x, y)
We can take a counter-example to prove that it is not a p.d. kernel.
Let us take N = 2, a = (1, −1), x = (1, 2). We have ∑_{i=1}^2 ∑_{j=1}^2 a_i a_j max(x_i, x_j) = −1 < 0, therefore K is not a p.d. kernel.
• K(x, y) = min(x, y)/max(x, y)
Obviously, K is symmetric. For x, y ∈ ℝ*_+, K(x, y) = min(x, y) · min(1/x, 1/y). min(x, y) is a p.d. kernel for all x, y ∈ ℝ*_+, which implies that min(1/x, 1/y) is a p.d. kernel for all x, y ∈ ℝ*_+ (compose the inputs with t ↦ 1/t). Therefore, by question 2, we can conclude that K is a p.d. kernel.
• K(x, y) = LCM(x, y)
We can take a counter-example to prove that it is not a p.d. kernel. With N = 2, a = (1, −1), x = (1, 2), we have ∑_{i=1}^2 ∑_{j=1}^2 a_i a_j LCM(x_i, x_j) = 1 − 2 − 2 + 2 = −1 < 0, therefore K is not a p.d. kernel.
• K(x, y) = GCD(x, y)/LCM(x, y)
Obviously, K is symmetric. Writing p_i for the i-th prime and φ_i(x) for the exponent of p_i in the prime factorization of x, we have
K(x, y) = ∏_{i=1}^{+∞} p_i^{min(φ_i(x), φ_i(y)) − max(φ_i(x), φ_i(y))}.
Let K1(x, y) = e^{−max(x, y)} = e^{min(−x, −y)} = min(e^{−x}, e^{−y}) with x, y ∈ ℕ and e^{−x}, e^{−y} ∈ ℝ_+. As we know that min(x, y) is a p.d. kernel on ℝ_+, K1 is a p.d. kernel. The same argument with e replaced by p_i shows that p_i^{−max(φ_i(x), φ_i(y))} = min(p_i^{−φ_i(x)}, p_i^{−φ_i(y)}) and p_i^{min(φ_i(x), φ_i(y))} = min(p_i^{φ_i(x)}, p_i^{φ_i(y)}) are p.d. kernels, so by question 2 their product, p_i^{min(φ_i(x), φ_i(y)) − max(φ_i(x), φ_i(y))}, is a p.d. kernel for all i ∈ ℕ*.
By question 2 again, ∀n ∈ ℕ*, ∏_{i=1}^n p_i^{min(φ_i(x), φ_i(y)) − max(φ_i(x), φ_i(y))} is a p.d. kernel, and since K is the pointwise limit of these finite products, question 3 allows us to conclude that K is a p.d. kernel.
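As a sanity check of the examples above (not part of the proofs), a minimal Python sketch, assuming numpy; the sample points are arbitrary. The smallest Gram eigenvalue should be non-negative (up to round-off) for the p.d. kernels and clearly negative for the counter-examples:

import numpy as np
from math import gcd, cos, log

x = [0.2, 0.5, 1.0, 1.5, 2.0]                    # arbitrary positive points
n = [1, 2, 3, 4, 6]                              # arbitrary positive integers for gcd/lcm

def gram(k, pts):
    return np.array([[k(a, b) for b in pts] for a in pts])

lcm = lambda a, b: a * b // gcd(a, b)
kernels = {
    "2^{xy}":        lambda a, b: 2.0 ** (a * b),
    "log(1+xy)":     lambda a, b: log(1 + a * b),
    "exp(-(x-y)^2)": lambda a, b: np.exp(-(a - b) ** 2),
    "cos(x+y)":      lambda a, b: cos(a + b),
    "cos(x-y)":      lambda a, b: cos(a - b),
    "min":           min,
    "max":           max,
    "min/max":       lambda a, b: min(a, b) / max(a, b),
    "LCM":           lcm,
    "GCD/LCM":       lambda a, b: gcd(a, b) / lcm(a, b),
}

for name, k in kernels.items():
    pts = n if name in ("LCM", "GCD/LCM") else x
    lam = np.linalg.eigvalsh(gram(k, pts)).min()
    print(f"{name:15s} smallest Gram eigenvalue: {lam:+.3e}")
# the p.d. kernels give a value >= 0 up to round-off; log(1+xy), cos(x+y), max and LCM give clearly negative values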
1. K(x, y) = xy with x, y ∈ ℝ.
H_K = {f : ℝ → ℝ, x ↦ ax, a ∈ ℝ}, with ⟨f1, f2⟩_{H_K} = a1 a2 and ‖f‖_{H_K} = |a|.
B_K = {f : ℝ → ℝ, x ↦ ax, |a| ≤ 1}.
C_n^K = max_{|a|≤1, |b|≤1} cov_n(aX, bY) = max_{|a|≤1, |b|≤1} ab cov_n(X, Y) = |cov_n(X, Y)|.
2. C_n^K = max_{f,g ∈ B_K} cov_n(f(X), g(Y)) = max_{g ∈ B_K} (− min_{f ∈ B_K} (1/n) ∑_{i=1}^n f(x_i) [(1/n) ∑_{j=1}^n g(y_j) − g(y_i)]).
Because the function inside the min is convex in f and the constraint set B_K is a ball, there exists µ > 0 such that
C_n^K = max_{g ∈ B_K} (− min_{f ∈ H_K} {(1/n) ∑_{i=1}^n f(x_i) [(1/n) ∑_{j=1}^n g(y_j) − g(y_i)] + µ‖f‖^2_{H_K}}).
By the representer theorem, f(x) = ∑_{i=1}^n α_i K(x, x_i). By symmetry, g(y) = ∑_{i=1}^n β_i K(y, y_i).
It implies that
C_n^K = max_{α^T K_x α ≤ 1, β^T K_y β ≤ 1} (1/n) α^T K_x K_y β − (1/n^2) α^T K_x 1 1^T K_y β
 = max_{α^T K_x α ≤ 1, β^T K_y β ≤ 1} (1/n) α^T K_x (I_n − (1/n) 1 1^T) K_y β
 = max_{‖α̃‖ ≤ 1, ‖β̃‖ ≤ 1} (1/n) α̃^T K_x^{1/2} P K_y^{1/2} β̃
 = (1/n) ‖K_x^{1/2} P K_y^{1/2}‖_2,
with α̃ = K_x^{1/2} α, β̃ = K_y^{1/2} β, P = I_n − (1/n) 1 1^T the centering matrix, and ‖·‖_2 the spectral norm.
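A minimal numerical sketch of the final formula, assuming numpy; the correlated sample and the linear kernel are arbitrary illustrative choices. For K(x, y) = xy the value (1/n)‖K_x^{1/2} P K_y^{1/2}‖_2 should coincide with |cov_n(X, Y)| found in question 1:

import numpy as np

rng = np.random.default_rng(3)
n = 50
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)             # arbitrary correlated sample

def psd_sqrt(K):
    """Symmetric square root of a p.s.d. matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(K)
    return V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T

Kx, Ky = np.outer(x, x), np.outer(y, y)      # Gram matrices of the linear kernel K(x, y) = xy
P = np.eye(n) - np.ones((n, n)) / n          # centering matrix P = I_n - (1/n) 1 1^T

# C_n^K = (1/n) * spectral norm of Kx^{1/2} P Ky^{1/2}
C = np.linalg.norm(psd_sqrt(Kx) @ P @ psd_sqrt(Ky), ord=2) / n

# for the linear kernel this should match |cov_n(X, Y)| from question 1 (biased 1/n covariance)
cov_n = np.mean(x * y) - np.mean(x) * np.mean(y)
print(C, abs(cov_n))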
1.
|R_φ(f, x) − R_φ(g, x)| ≤ |φ(f(x)) − φ(g(x))| + λ |‖f‖^2_{H_K} − ‖g‖^2_{H_K}|
 ≤ L |f(x) − g(x)| + λ (‖f‖_{H_K} + ‖g‖_{H_K}) |‖f‖_{H_K} − ‖g‖_{H_K}|
 ≤ L |⟨f, K_x⟩_{H_K} − ⟨g, K_x⟩_{H_K}| + 2λR ‖f − g‖_{H_K}    (since ‖f‖_{H_K}, ‖g‖_{H_K} ≤ R and |‖f‖_{H_K} − ‖g‖_{H_K}| ≤ ‖f − g‖_{H_K})
 = L |⟨f − g, K_x⟩_{H_K}| + 2λR ‖f − g‖_{H_K}
 ≤ L ‖K_x‖_{H_K} ‖f − g‖_{H_K} + 2λR ‖f − g‖_{H_K}    (Cauchy-Schwarz inequality)
 ≤ (L√κ + 2λR) ‖f − g‖_{H_K},
since ‖K_x‖_{H_K} = √(K(x, x)) ≤ √κ.
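A numerical sanity check of this Lipschitz bound, as a minimal sketch assuming numpy and taking R_φ(f, x) = φ(f(x)) + λ‖f‖^2_{H_K} as in the chain above; the Gaussian kernel, the hinge loss and all numerical constants are illustrative assumptions, not taken from the problem statement:

import numpy as np

rng = np.random.default_rng(4)
z = rng.normal(size=8)                                   # points spanning a finite-dim. subspace of H_K
K = np.exp(-(z[:, None] - z[None, :]) ** 2)              # Gaussian kernel, so kappa = sup K(x, x) = 1

phi = lambda u: np.maximum(0.0, 1.0 - u)                 # hinge loss, L = 1
L, kappa, lam, R = 1.0, 1.0, 0.1, 2.0

def draw_in_ball(radius):
    """f = sum_i a_i K(., z_i), rescaled so that ||f||_{H_K} <= radius."""
    a = rng.normal(size=len(z))
    return a * radius * rng.uniform() / np.sqrt(a @ K @ a)

def risk(a, j):
    """R_phi(f, z_j) = phi(f(z_j)) + lam * ||f||^2_{H_K} for f with coefficients a."""
    return phi(K[j] @ a) + lam * (a @ K @ a)

for _ in range(1000):
    a, b = draw_in_ball(R), draw_in_ball(R)
    dist = np.sqrt((a - b) @ K @ (a - b))                # ||f - g||_{H_K}
    j = rng.integers(len(z))
    assert abs(risk(a, j) - risk(b, j)) <= (L * np.sqrt(kappa) + 2 * lam * R) * dist + 1e-12
print("Lipschitz bound held on all random draws")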
2.
R_φ(f, x) + R_φ(f_x, x) = φ(f(x)) + φ(f_x(x)) + λ (‖f‖^2_{H_K} + ‖f_x‖^2_{H_K})
 ≥ 2 φ((f(x) + f_x(x))/2) + λ (‖f‖^2_{H_K} + ‖f_x‖^2_{H_K})    (by convexity of φ)
 ≥ 2 R_φ((f + f_x)/2, x) + λ (‖f‖^2_{H_K} + ‖f_x‖^2_{H_K} − 2 ‖(f + f_x)/2‖^2_{H_K}).
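The displayed inequality can also be checked numerically. A minimal sketch, assuming numpy and reusing the same finite-dimensional representation as above with the (convex) hinge loss; note that by the parallelogram identity the last bracket equals (λ/2)‖f − f_x‖^2_{H_K}:

import numpy as np

rng = np.random.default_rng(5)
z = rng.normal(size=8)
K = np.exp(-(z[:, None] - z[None, :]) ** 2)              # Gaussian kernel Gram matrix
phi = lambda u: np.maximum(0.0, 1.0 - u)                 # hinge loss (convex)
lam = 0.1

def risk(a, j):
    """R_phi(f, z_j) = phi(f(z_j)) + lam * ||f||^2_{H_K} for f = sum_i a_i K(., z_i)."""
    return phi(K[j] @ a) + lam * (a @ K @ a)

for _ in range(1000):
    a, b = rng.normal(size=8), rng.normal(size=8)        # coefficients of f and f_x
    j = rng.integers(len(z))
    lhs = risk(a, j) + risk(b, j) - 2 * risk((a + b) / 2, j)
    # lam * (||f||^2 + ||f_x||^2 - 2 ||(f + f_x)/2||^2) = (lam/2) * ||f - f_x||^2_{H_K}
    rhs = 0.5 * lam * ((a - b) @ K @ (a - b))
    assert lhs >= rhs - 1e-10
print("convexity inequality held on all random draws")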