Lecture 34
minimize (1/2) W^T W + C Σ_{i=1}^n (ξ_i + ξ'_i)
subject to y_i − W^T Φ(X_i) − b ≤ ε + ξ_i,   i = 1, . . . , n
           W^T Φ(X_i) + b − y_i ≤ ε + ξ'_i,  i = 1, . . . , n
           ξ_i ≥ 0, ξ'_i ≥ 0,                i = 1, . . . , n
• We have added the term W^T W to the objective
function. It acts like a model-complexity penalty in a
regularization context.
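As a concrete illustration (not from the lecture), the constrained problem above can be rewritten using the ε-insensitive loss and minimized directly by subgradient descent. The data, the hyperparameters, and the C/n scaling below are all illustrative assumptions, and Φ is taken to be the identity:

```python
import numpy as np

# Sketch: minimize the (equivalent) unconstrained SVR primal
#   0.5*||w||^2 + (C/n) * sum_i max(0, |y_i - w.x_i - b| - eps)
# by subgradient descent. The slacks xi_i, xi'_i of the constrained
# form correspond to the positive parts of the residuals beyond eps.
def svr_subgradient(X, y, C=10.0, eps=0.1, lr0=0.05, iters=5000):
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for t in range(iters):
        r = y - X @ w - b                        # residuals
        # sign pattern of active eps-insensitive terms
        s = np.where(r > eps, 1.0, 0.0) - np.where(r < -eps, 1.0, 0.0)
        grad_w = w - (C / n) * (X.T @ s)         # subgradient w.r.t. w
        grad_b = -(C / n) * s.sum()              # subgradient w.r.t. b
        lr = lr0 / np.sqrt(t + 1.0)              # decaying step size
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

X = np.linspace(0.0, 3.0, 50).reshape(-1, 1)
y = 2.0 * X[:, 0] + 1.0                          # exact line y = 2x + 1
w, b = svr_subgradient(X, y)
```

Because of the W^T W term, the learned slope tends to the smallest value that keeps the data inside the ε-tube, slightly below the true slope 2.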
PR NPTEL course – p.31/135
• As before, we can form the Lagrangian and, using
the Kuhn-Tucker conditions, obtain the optimal
values of W and b.
|f(X) − f(X′)| = |W^T (X − X′)|.
• For all X, X′ with |W^T (X − X′)| ≥ 2ε,
||X − X′|| is smallest when
|W^T (X − X′)| = 2ε and (X − X′) is parallel to W.
That is, X − X′ = ± 2ε W / (W^T W).
• Thus, m_ε(f) = 2ε / ||W||.
• Thus, in our SVR optimization problem, minimizing
W^T W promotes learning of smoother models.
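The margin formula can be checked numerically; the weight vector and ε below are arbitrary example values, not from the lecture:

```python
import numpy as np

W = np.array([3.0, 4.0])   # arbitrary example weight vector, ||W|| = 5
eps = 0.5

# Displacement parallel to W achieving |W^T (X - X')| exactly 2*eps:
d = 2 * eps * W / (W @ W)
assert np.isclose(abs(W @ d), 2 * eps)

# Its length equals the claimed margin 2*eps / ||W|| (= 0.2 here).
margin = np.linalg.norm(d)
```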
Solving the SVM optimization problem
• Using the Kuhn-Tucker conditions, we have
∂L/∂µ_i = 0 and µ_1 + µ_2 − µ_3 = 0.
• This gives us four equations, but we have seven
unknowns. We use the complementary slackness
conditions on the α_i.
• We have α_i µ_i = 0. Essentially, we need to guess
which µ_i > 0.
• In this simple problem we know all µ_i > 0.
• We can take j = 1, 2, or 3.
• With j = 1 we get b∗ = 1 − (4 + 0 − 2) = −1.
• With j = 3 we get b∗ = −1 − (1 + 1 − 2) = −1.
• If we solved our optimization problem correctly, we
should get the same b∗ for every choice of j.
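This consistency check is easy to reproduce in code. Since the lecture's data points and multipliers are not shown here, the snippet uses a hypothetical two-point example whose hard-margin SVM dual solution is known in closed form:

```python
import numpy as np

# Hypothetical two-point example (NOT the lecture's data):
# x1 = (1, 0) with y1 = +1 and x2 = (-1, 0) with y2 = -1.
# For this data the hard-margin dual solution is alpha = (0.5, 0.5).
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])
alpha = np.array([0.5, 0.5])

K = X @ X.T   # linear kernel matrix
# b* = y_j - sum_i alpha_i y_i K(x_i, x_j), computed for each
# support vector j; a correct dual solution gives the same value.
b_star = [y[j] - np.sum(alpha * y * K[:, j]) for j in (0, 1)]
```

Here both choices of j yield b∗ = 0, mirroring how j = 1 and j = 3 both gave b∗ = −1 on the slide.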