Lecture06_separable
Spring 2020
Stanley Chan
Outline
Notations
    Input Space, Output Space, Hypothesis
    Discriminant Function
    Geometry of Discriminant Function
Separating Hyperplane
    Normal Vector
    Distance from Point to Plane
Linear Separability
    Which set is linearly separable?
    Separating Hyperplane Theorem
    What if theorem fails?
Binary Case
w ∈ R^d : linear coefficients
w_0 ∈ R : bias / offset
Define the overall parameter
θ = {w, w_0} ∈ R^{d+1}.
Example:
If d = 2, then
g(x) = w_2 x_2 + w_1 x_1 + w_0.
g(x) = 0 means
x_2 = −(w_1/w_2) x_1 − (w_0/w_2),
where −(w_1/w_2) is the slope and −(w_0/w_2) is the y-intercept.
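A minimal Python sketch of this slope/intercept reading, using made-up coefficients w_1, w_2, w_0 (not from the lecture):

```python
import numpy as np

# Hypothetical coefficients for illustration: g(x) = w2*x2 + w1*x1 + w0
w1, w2, w0 = 1.0, 2.0, -4.0

slope = -w1 / w2          # slope of the boundary line
intercept = -w0 / w2      # y-intercept of the boundary line

# Any x1 we pick should land on the boundary, i.e., g(x) = 0
x1 = 3.0
x2 = slope * x1 + intercept
g = w2 * x2 + w1 * x1 + w0
print(slope, intercept, g)   # g should be 0 (up to floating point)
```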
Linear Discriminant Function
In high dimensions, the solution set of
g(x) = w^T x + w_0 = 0
is a hyperplane.
Separating Hyperplane:
H = {x | g(x) = 0} = {x | w^T x + w_0 = 0}.
x ∈ H means x is on the decision boundary.
w/‖w‖_2 is the normal vector of H.
To see why, pick any two points x_1, x_2 ∈ H, so that
w^T x_1 + w_0 = 0, and w^T x_2 + w_0 = 0.
Subtracting the two equations,
w^T (x_1 − x_2) = (w^T x_1 + w_0) − (w^T x_2 + w_0) = 0,
i.e., w is orthogonal to every direction lying inside H.
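A small numerical check of this orthogonality, with an invented w and w_0 (not from the slides):

```python
import numpy as np

w = np.array([1.0, 2.0, -1.0])   # hypothetical normal direction
w0 = 0.5

def point_on_H(free, w=w, w0=w0):
    """Given the first d-1 coordinates, solve for the last one so w^T x + w0 = 0."""
    x_last = -(w0 + w[:-1] @ free) / w[-1]
    return np.append(free, x_last)

x1 = point_on_H(np.array([1.0, 1.0]))
x2 = point_on_H(np.array([-2.0, 3.0]))

print(w @ (x1 - x2))   # ~0: w is orthogonal to every direction inside H
```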
Decompose x_0 = x_p + η (w/‖w‖_2), where x_p is the projection of x_0 onto H and η is the signed distance. x_p is on H. So
g(x_p) = w^T x_p + w_0 = 0.
Therefore, we can show that
g(x_0) = w^T x_0 + w_0
       = w^T (x_p + η w/‖w‖_2) + w_0
       = g(x_p) + η‖w‖_2 = η‖w‖_2.
Distance from x_0 to g(x) = 0
So the distance is
η = g(x_0)/‖w‖_2.
Conclusion:
x_p = x_0 − (g(x_0)/‖w‖_2) · (w/‖w‖_2),
where g(x_0)/‖w‖_2 is the (signed) distance and w/‖w‖_2 is the normal vector.
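A sketch of the distance and projection formulas in Python, assuming a hypothetical w, w_0, and query point x_0:

```python
import numpy as np

w = np.array([1.0, 2.0, -1.0])    # hypothetical hyperplane parameters
w0 = 0.5
x0 = np.array([3.0, -1.0, 2.0])   # an arbitrary query point

g = w @ x0 + w0
eta = g / np.linalg.norm(w)               # signed distance from x0 to H
xp = x0 - (g / np.linalg.norm(w)**2) * w  # projection of x0 onto H

print(eta)
print(w @ xp + w0)   # ~0: the projection lands on the hyperplane
```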
Distance from x_0 to g(x) = 0
Alternative Solution: x_p is the solution of
minimize_x (1/2)‖x − x_0‖^2 subject to w^T x + w_0 = 0.
Let the Lagrangian be
L(x, λ) = (1/2)‖x − x_0‖^2 − λ(w^T x + w_0).
The stationarity conditions imply
∇_x L(x, λ) = (x − x_0) − λw = 0,
∇_λ L(x, λ) = −(w^T x + w_0) = 0.
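As a sanity check of the Lagrangian route, here is a sketch that solves the constrained problem numerically with SciPy and compares it against the closed-form projection; the hyperplane and query point are invented:

```python
import numpy as np
from scipy.optimize import minimize

w = np.array([1.0, 2.0, -1.0])     # hypothetical hyperplane parameters
w0 = 0.5
x_query = np.array([3.0, -1.0, 2.0])

# Numerical solution of: min_x 0.5*||x - x_query||^2  s.t.  w^T x + w0 = 0
res = minimize(
    fun=lambda x: 0.5 * np.sum((x - x_query) ** 2),
    x0=np.zeros(3),
    constraints=[{"type": "eq", "fun": lambda x: w @ x + w0}],
)

# Closed-form projection from the Lagrangian derivation below
xp = x_query - ((w @ x_query + w0) / np.linalg.norm(w) ** 2) * w
print(res.x, xp)  # the two should agree up to solver tolerance
```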
Distance from x_0 to g(x) = 0
Let us do some derivation. The stationarity conditions are
∇_x L(x, λ) = (x − x_0) − λw = 0,
∇_λ L(x, λ) = −(w^T x + w_0) = 0.
The first gives x = x_0 + λw. Substituting into the constraint:
w^T x + w_0 = w^T (x_0 + λw) + w_0
⇒ 0 = w^T x_0 + λ‖w‖_2^2 + w_0
⇒ 0 = g(x_0) + λ‖w‖_2^2
⇒ λ = −g(x_0)/‖w‖_2^2
⇒ x = x_0 − (g(x_0)/‖w‖_2^2) w.
Therefore, we arrive at the same result:
x_p = x_0 − (g(x_0)/‖w‖_2) · (w/‖w‖_2),
where g(x_0)/‖w‖_2 is the distance and w/‖w‖_2 is the normal vector.
Separating Hyperplane Theorem
Theorem: If C1 and C2 are two disjoint convex sets, then there exists a separating hyperplane
g(x) = w^T x + w_0,
such that g(x) > 0 for all x ∈ C1 and g(x) < 0 for all x ∈ C2.
Remark: The theorem above provides sufficiency but not necessity for linear separability.
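A tiny sketch illustrating the remark: two non-convex sets (hypothetical crescent-shaped point clouds, invented here) that are nevertheless linearly separable, so convexity is sufficient but not necessary:

```python
import numpy as np

# Two invented NON-convex sets: points on two crescent-shaped arcs
t = np.linspace(0, np.pi, 50)
C1 = np.c_[np.cos(t), np.sin(t) + 1.5]    # upper arc, x2 in [1.5, 2.5]
C2 = np.c_[np.cos(t), -np.sin(t) - 1.5]   # lower arc, x2 in [-2.5, -1.5]

# The hyperplane g(x) = x2 (w = (0, 1), w0 = 0) separates them anyway
w, w0 = np.array([0.0, 1.0]), 0.0
print(np.all(C1 @ w + w0 > 0), np.all(C2 @ w + w0 < 0))  # True True
```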
Separating Hyperplane Theorem
Pictorial “proof”:
Pick two points x^* ∈ C1 and y^* ∈ C2 such that the distance between the two sets is minimized.
Define the mid-point x_0 = (x^* + y^*)/2.
Draw the separating hyperplane through x_0 with normal vector w = x^* − y^*.
Convexity implies that the inner product w^T (x − x_0) is positive for every x ∈ C1 (a numerical sketch follows below).
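A sketch of this construction on two finite, well-separated point clouds (synthetic data invented here; for finite sets the closest pair can be found by brute force):

```python
import numpy as np

# Synthetic, well-separated point clouds standing in for C1 and C2
rng = np.random.default_rng(0)
C1 = rng.normal(loc=[+3.0, 0.0], scale=0.5, size=(50, 2))
C2 = rng.normal(loc=[-3.0, 0.0], scale=0.5, size=(50, 2))

# Step 1: pick x*, y* minimizing the distance between the sets (brute force)
d = np.linalg.norm(C1[:, None, :] - C2[None, :, :], axis=2)
i, j = np.unravel_index(d.argmin(), d.shape)
x_star, y_star = C1[i], C2[j]

# Steps 2-3: midpoint x0 and normal w = x* - y* define the hyperplane
w = x_star - y_star
x0 = (x_star + y_star) / 2
w0 = -w @ x0                      # so that g(x) = w^T x + w0 = w^T (x - x0)

# Step 4: check the signs on both clouds
print(np.all(C1 @ w + w0 > 0), np.all(C2 @ w + w0 < 0))  # expect: True True
```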
[Figures: Example 1 and Example 2 of the construction]
Reading on the Separating Hyperplane:
Duda, Hart, and Stork, Pattern Classification, Chapters 5.1 and 5.2.
Princeton ORFE-523, Lecture 5 on separating hyperplanes: http://www.princeton.edu/~amirali/Public/Teaching/ORF523/S16/ORF523_S16_Lec5_gh.pdf
Cornell ORIE-6300, Lecture 6 on separating hyperplanes: https://people.orie.cornell.edu/dpw/orie6300/fall2008/Lectures/lec06.pdf
Caltech lecture note: http://www.its.caltech.edu/~kcborder/Notes/SeparatingHyperplane.pdf
Proof of Separating Hyperplane Theorem
The hyperplane through x_0 with normal w = x^* − y^* gives
g(x) = w^T (x − x_0)
     = (x^* − y^*)^T (x − (x^* + y^*)/2)
     = (x^* − y^*)^T x − (‖x^*‖^2 − ‖y^*‖^2)/2.
According to the picture, we want g(x) > 0 for all x ∈ C1.
Suppose not. Assume there is some x ∈ C1 such that
g(x) = (x^* − y^*)^T x − (‖x^*‖^2 − ‖y^*‖^2)/2 < 0.
See if we can find a contradiction.
Proof of Separating Hyperplane Theorem
C1 is convex.
Pick any x ∈ C1.
Let x^* ∈ C1 be the point attaining the minimum distance to y^*.
Let 0 ≤ λ ≤ 1.
Construct a point
x_λ = (1 − λ)x^* + λx.
Convexity means
x_λ ∈ C1.
So we must have
‖x_λ − y^*‖ ≥ ‖x^* − y^*‖,
because x^* already minimizes the distance from C1 to y^*.
Proof of Separating Hyperplane Theorem
‖x_λ − y^*‖^2 = ‖(1 − λ)x^* + λx − y^*‖^2
= ‖x^* − y^* + λ(x − x^*)‖^2
= ‖x^* − y^*‖^2 + 2λ(x^* − y^*)^T (x − x^*) + λ^2 ‖x − x^*‖^2
= ‖x^* − y^*‖^2 + 2λ w^T (x − x^*) + λ^2 ‖x − x^*‖^2.
Remember: by the supposition, w^T (x − x_0) < 0.
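A quick numerical verification of this expansion with arbitrary made-up vectors (it is an algebraic identity, so any vectors work):

```python
import numpy as np

rng = np.random.default_rng(1)
x, x_star, y_star = rng.normal(size=(3, 4))  # three arbitrary vectors in R^4
lam = 0.3
w = x_star - y_star

x_lam = (1 - lam) * x_star + lam * x
lhs = np.linalg.norm(x_lam - y_star) ** 2
rhs = (np.linalg.norm(x_star - y_star) ** 2
       + 2 * lam * w @ (x - x_star)
       + lam ** 2 * np.linalg.norm(x - x_star) ** 2)
print(np.isclose(lhs, rhs))  # True: the expansion is an identity
```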
Proof of Separating Hyperplane Theorem
‖x_λ − y^*‖^2 = ‖x^* − y^*‖^2 + 2λ w^T (x − x^*) + λ^2 ‖x − x^*‖^2
< ‖x^* − y^*‖^2 + 2λ(w^T x_0 − w^T x^*) + λ^2 ‖x − x^*‖^2
= ‖x^* − y^*‖^2 + 2λ((‖x^*‖^2 − ‖y^*‖^2)/2 − w^T x^*) + λ^2 ‖x − x^*‖^2
= ‖x^* − y^*‖^2 − λ‖x^* − y^*‖^2 + λ^2 ‖x − x^*‖^2
= ‖x^* − y^*‖^2 − λA + λ^2 B,   where A = ‖x^* − y^*‖^2 and B = ‖x − x^*‖^2
= ‖x^* − y^*‖^2 − λ(A − λB).
Here the strict inequality uses the supposition w^T x < w^T x_0, and w^T x_0 = (‖x^*‖^2 − ‖y^*‖^2)/2.
Now, pick λ such that A − λB > 0. Then −λ(A − λB) < 0. This requires
λ < A/B = ‖x^* − y^*‖^2 / ‖x − x^*‖^2.
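A numeric illustration with hand-picked vectors satisfying the supposition w^T (x − x_0) < 0, confirming that any 0 < λ < A/B strictly shrinks the distance (all numbers invented):

```python
import numpy as np

x_star = np.array([1.0, 0.0])
y_star = np.array([-1.0, 0.0])
w = x_star - y_star                  # = (2, 0)
x0 = (x_star + y_star) / 2           # = (0, 0)

x = np.array([-0.5, 2.0])            # chosen so that w^T (x - x0) < 0
assert w @ (x - x0) < 0

A = np.linalg.norm(x_star - y_star) ** 2
B = np.linalg.norm(x - x_star) ** 2
lam = 0.5 * A / B                    # any 0 < lam < A/B works

x_lam = (1 - lam) * x_star + lam * x
print(np.linalg.norm(x_lam - y_star) < np.linalg.norm(x_star - y_star))  # True
```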
Proof of Separating Hyperplane Theorem
Therefore, if we choose λ such that A − λB > 0, i.e.,
λ < A/B = ‖x^* − y^*‖^2 / ‖x − x^*‖^2,
then ‖x_λ − y^*‖^2 < ‖x^* − y^*‖^2, which contradicts the minimality of ‖x^* − y^*‖. Hence no such x exists, and g(x) > 0 for all x ∈ C1.
Conclusion:
If x ∈ C1 , then g (x) > 0.
By symmetry, if x ∈ C2 , then g (x) < 0.
And we have found the separating hyperplane (w , w0 ).
Q&A 1: What is a convex set?