Intro To Linear Programming
In terms of matrices,
$$
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_d \end{pmatrix}
=
\begin{pmatrix}
g_0 & g_{-1} & \cdots & g_{-d+1} \\
g_1 & g_0 & \cdots & g_{-d+2} \\
\vdots & \vdots & \ddots & \vdots \\
g_{d-1} & g_{d-2} & \cdots & g_0
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_d \end{pmatrix}
$$
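As a numerical sketch (the coefficients and the dimension d below are made up for illustration), the Toeplitz matrix above, with entry $g_{i-j}$ in position $(i,j)$, can be built with scipy.linalg.toeplitz and checked against the convolution sum $y_i = \sum_j g_{i-j} x_j$:

```python
import numpy as np
from scipy.linalg import toeplitz

d = 4
rng = np.random.default_rng(0)
# Hypothetical filter coefficients g_{-d+1}, ..., g_0, ..., g_{d-1}.
g = {k: rng.standard_normal() for k in range(-d + 1, d)}

# Toeplitz matrix G with G[i, j] = g_{i-j}:
# first column g_0, g_1, ..., g_{d-1}; first row g_0, g_{-1}, ..., g_{-d+1}.
col = [g[i] for i in range(d)]
row = [g[-j] for j in range(d)]
G = toeplitz(col, row)

x = rng.standard_normal(d)
y = G @ x

# Each entry of y is the convolution sum y_i = sum_j g_{i-j} x_j.
y_direct = np.array([sum(g[i - j] * x[j] for j in range(d)) for i in range(d)])
assert np.allclose(y, y_direct)
```

The point of the check is that multiplying by a Toeplitz matrix and convolving with the coefficient sequence are the same linear map.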
[Illustration: block matrices with entries marked $\ast$, showing the banded Toeplitz sparsity pattern of convolution matrices.]
MA3K1 Mathematics of Machine Learning April 10, 2021
Solution (24) Let $h : \mathbb{R}^d \to Y$ be any classifier. Define the smallest perturbation that moves a data point into a different class as
$$\Delta(x; h) = \min_{r} \{ \|r\| : h(x + r) \neq h(x) \}.$$
For a linear classifier with only two classes, where $h(x) = \operatorname{sign}(w^T x + b)$, we get the vector that moves the point $x$ to the boundary by solving the optimization problem (as for SVMs)
$$r^* = \operatorname*{argmin}_{r} \ \frac{1}{2}\|r\|^2 \quad \text{subject to} \quad w^T(x + r) + b = 0.$$
The solution is $r^* = -\frac{w^T x + b}{\|w\|^2}\, w$, whose norm is
$$\|r^*\| = \frac{|w^T x + b|}{\|w\|}.$$
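The closed form can be verified numerically; the weights below are made-up values for illustration.

```python
import numpy as np

# Hypothetical two-class linear classifier h(x) = sign(w^T x + b).
w = np.array([3.0, -4.0])   # ||w|| = 5
b = 2.0
x = np.array([1.0, 1.0])    # w^T x + b = 3 - 4 + 2 = 1

# Minimal perturbation moving x onto the boundary {y : w^T y + b = 0}.
r_star = -(w @ x + b) / (w @ w) * w

# Its norm matches the closed form |w^T x + b| / ||w||.
dist = abs(w @ x + b) / np.linalg.norm(w)

print(np.linalg.norm(r_star), dist)  # both 0.2
assert np.isclose(np.linalg.norm(r_star), dist)
assert np.isclose(w @ (x + r_star) + b, 0.0)  # x + r* lies on the boundary
```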
Assume that we now have $k$ linear functions $f_1, \ldots, f_k$, with $f_i(x) = w_i^T x + b_i$, and a classifier $h : \mathbb{R}^d \to \{1, \ldots, k\}$ that assigns to each $x$ the index $j$ of the largest value $f_j(x)$ (this corresponds to the one-versus-rest setting for multiclass classification). Let $x$ be such that $\max_j f_j(x) = f_k(x)$ and define the linear functions
$$g_i(y) = f_i(y) - f_k(y), \qquad 1 \le i \le k - 1.$$
Then
$$x \in \bigcap_{1 \le i \le k-1} \{ y : g_i(y) < 0 \}.$$
[Figure: a point $x$ enclosed by hyperplanes $H_1, \ldots, H_5$ bounding a polyhedron.]
Figure 6: The distance to misclassification is the radius of the largest enclosed ball in
a polyhedron P .
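A minimal sketch with made-up weights: since each $g_i$ is affine, the distance from $x$ to misclassification is the smallest distance from $x$ to one of the hyperplanes $\{y : g_i(y) = 0\}$, i.e. the radius of the largest ball around $x$ contained in the polyhedron.

```python
import numpy as np

rng = np.random.default_rng(1)
k, d = 4, 3

# Hypothetical linear functions f_i(x) = w_i^T x + b_i.
W = rng.standard_normal((k, d))
bvec = rng.standard_normal(k)

x = rng.standard_normal(d)
j = int(np.argmax(W @ x + bvec))  # predicted class: index of the largest f_j(x)

# g_i(y) = f_i(y) - f_j(y) = (w_i - w_j)^T y + (b_i - b_j) for i != j.
others = [i for i in range(k) if i != j]
g_vals = np.array([(W[i] - W[j]) @ x + (bvec[i] - bvec[j]) for i in others])
assert np.all(g_vals < 0)  # x lies in the intersection of the halfspaces

# Radius of the largest ball around x inside the polyhedron:
# the smallest point-to-hyperplane distance |g_i(x)| / ||w_i - w_j||.
radius = min(
    abs(g) / np.linalg.norm(W[i] - W[j]) for g, i in zip(g_vals, others)
)
print(radius)
```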
Solution (25) The discriminator $D$ would like to achieve, on average, a large value on $X_0 = G_0(Z_0)$ and a small value on $X_1 = G_1(Z_1)$. Using the logarithm, this can be expressed as the problem of maximizing
$$\int \rho_{X_0}(x) \log D(x) + \rho_{X_1}(x) \log(1 - D(x)) \, dx.$$
We choose $D(x)$ so that the integrand becomes maximal. So considering the function
$$y = \frac{\rho_{X_0}(x)}{\rho_{X_0}(x) + \rho_{X_1}(x)},$$
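The pointwise maximization behind this choice can be checked numerically: for fixed densities $a = \rho_{X_0}(x) > 0$ and $b = \rho_{X_1}(x) > 0$ (the values below are made up), $y \mapsto a \log y + b \log(1-y)$ is maximized over $(0,1)$ at $y = a/(a+b)$.

```python
import numpy as np

a, b = 0.7, 0.2  # hypothetical density values at a fixed point x


def phi(y):
    # Integrand as a function of the discriminator value y = D(x).
    return a * np.log(y) + b * np.log(1 - y)


# Setting phi'(y) = a/y - b/(1 - y) = 0 gives y* = a / (a + b).
y_star = a / (a + b)

# Grid check: no y in (0, 1) does better than y*.
grid = np.linspace(1e-6, 1 - 1e-6, 100001)
assert phi(y_star) >= phi(grid).max()
print(y_star)  # ≈ 0.778
```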