ECE595 / STAT598: Machine Learning I

Lecture 06 Linear Separability

Spring 2020

Stanley Chan

School of Electrical and Computer Engineering


Purdue University

© Stanley Chan 2020. All Rights Reserved.
Overview

Outline

Goal: Understand the geometry of linear separability.

Notations
Input Space, Output Space, Hypothesis
Discriminant Function
Geometry of Discriminant Function
Separating Hyperplane
Normal Vector
Distance from Point to Plane
Linear Separability
Which set is linearly separable?
Separating Hyperplane Theorem
What if theorem fails?

Supervised Classification
The goal of supervised classification is to construct a decision boundary
such that the two classes can be (maximally) separated.

Terminology
Input vectors: x1, x2, . . . , xN.
E.g., images, speech, EEG signals, ratings, etc.
Input space: X. Every xn ∈ X.
Labels: y1, y2, . . . , yN.
Label space: Y. Every yn ∈ Y.
If labels are binary, e.g., yn = ±1, then Y = {+1, −1}.
The labels themselves are arbitrary: {+1, −1} and {0, 1} make no difference.
Target function f : X → Y. Unknown.
Relationship: yn = f(xn).
Hypothesis h : X → Y. Ideally, we want h(x) ≈ f(x), ∀x ∈ X.
Binary Case
If we restrict ourselves to a binary classifier, then

    h(x) = 1,       if g(x) > 0
           0,       if g(x) < 0
           either,  if g(x) = 0

g : X → R is called a discriminant function.

g(x) > 0: x lives on the positive side of g.
g(x) < 0: x lives on the negative side of g.
g(x) = 0: the decision boundary.

You can also define

    h(x) = +1,      if g(x) > 0
           −1,      if g(x) < 0
           either,  if g(x) = 0

No difference as far as the decision is concerned.
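The decision rule above is easy to state in code. Below is a minimal Python sketch (not from the slides; the weights are made up), assuming a linear discriminant g(x) = w^T x + w0:

    import numpy as np

    def g(x, w, w0):
        # Linear discriminant g(x) = w^T x + w0
        return np.dot(w, x) + w0

    def h(x, w, w0):
        # Decision rule: +1 on the positive side, -1 on the negative side.
        # On the boundary g(x) = 0 we break the tie arbitrarily (here: +1).
        return +1 if g(x, w, w0) >= 0 else -1

    # Hypothetical 2-D example
    w, w0 = np.array([1.0, 2.0]), -1.0
    print(h(np.array([2.0, 3.0]), w, w0))    # g = 7 > 0  -> +1
    print(h(np.array([-1.0, -1.0]), w, w0))  # g = -4 < 0 -> -1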
Linear Discriminant Function
A linear discriminant function takes the form

    g(x) = w^T x + w0.

w ∈ R^d: linear coefficients
w0 ∈ R: bias / offset
Define the overall parameter θ = {w, w0} ∈ R^{d+1}.

Example: If d = 2, then

    g(x) = w2 x2 + w1 x1 + w0.

g(x) = 0 means

    x2 = −(w1/w2) x1 − (w0/w2),

where −w1/w2 is the slope and −w0/w2 is the y-intercept.
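As a quick numerical sketch of the d = 2 case (hypothetical weights, not from the slides), the slope and y-intercept of the decision boundary can be read off directly from w and w0:

    import numpy as np

    # Hypothetical 2-D linear discriminant g(x) = w2*x2 + w1*x1 + w0
    w1, w2, w0 = 3.0, 1.5, -2.0

    slope = -w1 / w2          # -2.0
    intercept = -w0 / w2      # 4/3

    # Any point on the line x2 = slope*x1 + intercept satisfies g(x) = 0.
    x1 = 5.0
    x2 = slope * x1 + intercept
    print(np.isclose(w2 * x2 + w1 * x1 + w0, 0.0))  # True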
Outline

Goal: Understand the geometry of linear separability.

Notations
Input Space, Output Space, Hypothesis
Discriminant Function
Geometry of Discriminant Function
Separating Hyperplane
Normal Vector
Distance from Point to Plane
Linear Separability
Which set is linearly separable?
Separating Hyperplane Theorem
What if theorem fails?

Linear Discriminant Function

In high dimensions, the set of points where

    g(x) = w^T x + w0 = 0

is a hyperplane.

Separating Hyperplane:

    H = {x | g(x) = 0}
      = {x | w^T x + w0 = 0}

x ∈ H means x is on the decision boundary.
w/‖w‖₂ is the unit normal vector of H.
Why is w the Normal Vector?

Pick x1 and x2 from H.

So g(x1) = 0 and g(x2) = 0. This means:

    w^T x1 + w0 = 0,  and  w^T x2 + w0 = 0.

Consider the difference vector x1 − x2.

x1 − x2 is a tangent vector on the surface of H. Check:

    w^T (x1 − x2) = (w^T x1 + w0) − (w^T x2 + w0) = 0.

So w is perpendicular to x1 − x2, hence it is the normal.

Normalize to w/‖w‖₂ so that it has unit norm.
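A small numerical check of this fact, with a hypothetical w and w0: pick two points on H and verify that their difference is orthogonal to w.

    import numpy as np

    rng = np.random.default_rng(0)
    w = np.array([1.0, -2.0, 0.5])
    w0 = 3.0

    def point_on_H(w, w0, rng):
        # Project a random point onto the hyperplane w^T x + w0 = 0.
        x = rng.normal(size=w.shape)
        return x - (w @ x + w0) / (w @ w) * w

    x1, x2 = point_on_H(w, w0, rng), point_on_H(w, w0, rng)
    print(np.isclose(w @ x1 + w0, 0.0), np.isclose(w @ x2 + w0, 0.0))  # True True
    print(np.isclose(w @ (x1 - x2), 0.0))                              # True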
Distance from x 0 to g (x) = 0
Pick a point xp on H:
xp is the closest point to x0.
x0 − xp is in the normal direction, so, for some scalar η > 0,

    x0 − xp = η · w/‖w‖₂.

xp is on H, so

    g(xp) = w^T xp + w0 = 0.

Therefore, we can show that

    g(x0) = w^T x0 + w0
          = w^T (xp + η · w/‖w‖₂) + w0
          = g(xp) + η‖w‖₂
          = η‖w‖₂.
Distance from x 0 to g (x) = 0
So the distance is

    η = g(x0)/‖w‖₂.

The closest point xp is

    xp = x0 − η · w/‖w‖₂
       = x0 − (g(x0)/‖w‖₂) · (w/‖w‖₂).

Conclusion:

    xp = x0 − (g(x0)/‖w‖₂) · (w/‖w‖₂),

where g(x0)/‖w‖₂ is the distance and w/‖w‖₂ is the unit normal vector.
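A short sketch with made-up numbers that evaluates the distance η = g(x0)/‖w‖₂ and the projection xp, and confirms that xp lands on the hyperplane:

    import numpy as np

    w = np.array([2.0, -1.0])
    w0 = 4.0
    x0 = np.array([3.0, 1.0])

    g_x0 = w @ x0 + w0                     # g(x0) = 2*3 - 1*1 + 4 = 9
    eta = g_x0 / np.linalg.norm(w)         # distance from x0 to the plane
    xp = x0 - eta * w / np.linalg.norm(w)  # closest point on the plane

    print(eta)                                            # 9 / sqrt(5) ≈ 4.0249
    print(np.isclose(w @ xp + w0, 0.0))                   # True: xp lies on g(x) = 0
    print(np.isclose(np.linalg.norm(x0 - xp), abs(eta)))  # True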
Distance from x 0 to g (x) = 0
Alternative Solution:

We can also obtain the same result by solving the optimization

    xp = argmin_x (1/2)‖x − x0‖²  subject to  w^T x + w0 = 0.

The Lagrangian is

    L(x, λ) = (1/2)‖x − x0‖² − λ(w^T x + w0).

The stationarity conditions imply

    ∇x L(x, λ) = (x − x0) − λw = 0,
    ∇λ L(x, λ) = w^T x + w0 = 0.
Distance from x 0 to g (x) = 0
Let us do some derivation:

    ∇x L(x, λ) = (x − x0) − λw = 0,
    ∇λ L(x, λ) = w^T x + w0 = 0.

The first condition gives x = x0 + λw. Substituting into the second:

    w^T x + w0 = w^T (x0 + λw) + w0 = 0
    ⇒ w^T x0 + λ‖w‖₂² + w0 = 0
    ⇒ g(x0) + λ‖w‖₂² = 0
    ⇒ λ = −g(x0)/‖w‖₂²
    ⇒ x = x0 − (g(x0)/‖w‖₂²) w.

Therefore, we arrive at the same result:

    xp = x0 − (g(x0)/‖w‖₂) · (w/‖w‖₂),

where g(x0)/‖w‖₂ is the distance and w/‖w‖₂ is the unit normal vector.
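As a sanity check, the same constrained problem can be handed to a generic solver and compared against the closed form. This is a hedged sketch using scipy's SLSQP (the numbers are hypothetical and this is not part of the lecture):

    import numpy as np
    from scipy.optimize import minimize

    w = np.array([2.0, -1.0])
    w0 = 4.0
    x0 = np.array([3.0, 1.0])

    # Closed-form projection: xp = x0 - (g(x0)/||w||^2) * w
    xp_closed = x0 - (w @ x0 + w0) / (w @ w) * w

    # Numerical solution of: min (1/2)||x - x0||^2  s.t.  w^T x + w0 = 0
    res = minimize(
        fun=lambda x: 0.5 * np.sum((x - x0) ** 2),
        x0=np.zeros(2),
        constraints=[{"type": "eq", "fun": lambda x: w @ x + w0}],
        method="SLSQP",
    )

    print(xp_closed)                                  # [-0.6  2.8]
    print(res.x)                                      # ≈ [-0.6  2.8]
    print(np.allclose(res.x, xp_closed, atol=1e-5))   # True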
Outline

Goal: Understand the geometry of linear separability.

Notations
Input Space, Output Space, Hypothesis
Discriminant Function
Geometry of Discriminant Function
Separating Hyperplane
Normal Vector
Distance from Point to Plane
Linear Separability
Which set is linearly separable?
Separating Hyperplane Theorem
What if theorem fails?

Which one is Linearly Separable? Which one is Not?

Separating Hyperplane Theorem
Can we always find a separating hyperplane?
No.
Unless the classes are linearly separable.
If convex and not overlapping, then yes.

Theorem (Separating Hyperplane Theorem)


Let C1 and C2 be two closed convex sets such that C1 ∩ C2 = ∅. Then,
there exists a linear function

    g(x) = w^T x + w0,

such that g(x) > 0 for all x ∈ C1 and g(x) < 0 for all x ∈ C2.

Remark: The theorem above provides sufficiency but not necessity for
linear separability.
Separating Hyperplane Theorem
Pictorial “proof”:
Pick two points x* and y* such that the distance between the two sets is
minimized.
Define the mid-point x0 = (x* + y*)/2.
Draw the separating hyperplane through x0 with normal w = x* − y*.
Convexity implies w^T (x − x0) > 0 for every x ∈ C1 (and, by symmetry,
w^T (x − x0) < 0 for every x ∈ C2).
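A hedged numerical illustration of this construction, using two disjoint Euclidean balls as the convex sets (all numbers are made up): the closest points lie on the segment joining the centers, and the resulting hyperplane separates samples drawn from the two balls.

    import numpy as np

    rng = np.random.default_rng(1)

    # Two disjoint closed balls: C1 = B(c1, r1), C2 = B(c2, r2)
    c1, r1 = np.array([4.0, 4.0]), 1.0
    c2, r2 = np.array([0.0, 0.0]), 1.5
    u = (c1 - c2) / np.linalg.norm(c1 - c2)

    # Closest pair of points between the two balls (along the center line)
    x_star = c1 - r1 * u
    y_star = c2 + r2 * u

    # Hyperplane through the midpoint with normal w = x* - y*
    w = x_star - y_star
    x0 = (x_star + y_star) / 2.0
    g = lambda x: w @ (x - x0)

    def sample_ball(c, r, n):
        # Uniform-ish samples inside a ball (random direction * random radius)
        d = rng.normal(size=(n, c.size))
        d /= np.linalg.norm(d, axis=1, keepdims=True)
        return c + d * (r * rng.uniform(size=(n, 1)) ** 0.5)

    print(all(g(x) > 0 for x in sample_ball(c1, r1, 500)))  # True
    print(all(g(x) < 0 for x in sample_ball(c2, r2, 500)))  # True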
Linearly Separable?
I have data {x1, . . . , xN}.
Closed. Convex. Non-overlapping.
Separating hyperplane theorem: I can find a line.
Victory?
Not quite.

When Theory Fails

Theorem (Separating Hyperplane Theorem)


Let C1 and C2 be two closed convex sets such that C1 ∩ C2 = ∅. Then,
there exists a linear function

    g(x) = w^T x + w0,

such that g(x) > 0 for all x ∈ C1 and g(x) < 0 for all x ∈ C2.

Finding a separating hyperplane for the training set does not imply it
will work for the testing set.
The separating hyperplane theorem is more often used in theoretical
analysis, by assuming properties of the testing set.
If a dataset is linearly separable, then you are guaranteed to find a
perfect classifier. Then you can say how good the classifier you
designed is compared to the perfect one.
Linear Classifiers Do Not Work

Example 1 Example 2

The intrinsic geometry of the two classes could be bad.

The training set could lack training samples.

Solution 1: Use non-linear classifiers, e.g., g(x) = x^T W x + w^T x + w0.
Solution 2: Kernel methods, e.g., radial basis functions.
Solution 3: Extract features, e.g., g(x) = w^T φ(x).
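As a sketch of Solution 3 (a toy example of my own, not from the lecture): concentric rings are not linearly separable in the raw coordinates, but they become separable after a simple quadratic feature map φ(x) = (x1, x2, x1² + x2²).

    import numpy as np

    rng = np.random.default_rng(0)

    def ring(radius, n):
        # Points scattered around a circle of the given radius
        angles = rng.uniform(0, 2 * np.pi, n)
        r = radius + 0.1 * rng.normal(size=n)
        return np.c_[r * np.cos(angles), r * np.sin(angles)]

    X = np.vstack([ring(1.0, 100), ring(3.0, 100)])   # class -1 inside, +1 outside
    y = np.r_[-np.ones(100), np.ones(100)]

    def phi(X):
        # Quadratic feature map: (x1, x2, x1^2 + x2^2)
        return np.c_[X, (X ** 2).sum(axis=1)]

    # In feature space, g(z) = w^T z + w0 with w = (0, 0, 1), w0 = -4
    # separates the rings, since x1^2 + x2^2 is about 1 inside and 9 outside.
    w, w0 = np.array([0.0, 0.0, 1.0]), -4.0
    pred = np.sign(phi(X) @ w + w0)
    print((pred == y).mean())   # 1.0 (up to the small radial noise)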
Reading List

Separating Hyperplane:
Duda, Hart, and Stork's Pattern Classification, Sections 5.1 and 5.2.
Princeton ORFE-523, Lecture 5 on separating hyperplanes:
http://www.princeton.edu/~amirali/Public/Teaching/ORF523/S16/ORF523_S16_Lec5_gh.pdf
Cornell ORIE-6300, Lecture 6 on separating hyperplanes:
https://people.orie.cornell.edu/dpw/orie6300/fall2008/Lectures/lec06.pdf
Caltech lecture note:
http://www.its.caltech.edu/~kcborder/Notes/SeparatingHyperplane.pdf

Appendix

Proof of Separating Hyperplane Theorem

Conjecture: Let us see if this is the correct hyperplane:

    g(x) = w^T (x − x0)
         = (x* − y*)^T (x − (x* + y*)/2)
         = (x* − y*)^T x − (‖x*‖² − ‖y*‖²)/2.

According to the picture, we want g(x) > 0 for all x ∈ C1.
Suppose not. Assume there is some x ∈ C1 with

    g(x) = (x* − y*)^T x − (‖x*‖² − ‖y*‖²)/2 < 0.

Let us see if we can find a contradiction.
Proof of Separating Hyperplane Theorem
C1 is convex.
Pick x ∈ C1.
Pick x* ∈ C1.
Let 0 ≤ λ ≤ 1.
Construct the point

    xλ = (1 − λ)x* + λx.

Convexity means xλ ∈ C1.

So we must have

    ‖xλ − y*‖ ≥ ‖x* − y*‖.
Proof of Separating Hyperplane Theorem

Pick an arbitrary point x ∈ C1.

x* is fixed already.
Pick xλ along the line segment connecting x and x*.
Convexity implies xλ ∈ C1.
So ‖xλ − y*‖ ≥ ‖x* − y*‖. If not, something is wrong.

Let us do some algebra:

    ‖xλ − y*‖² = ‖(1 − λ)x* + λx − y*‖²
               = ‖x* − y* + λ(x − x*)‖²
               = ‖x* − y*‖² + 2λ(x* − y*)^T (x − x*) + λ²‖x − x*‖²
               = ‖x* − y*‖² + 2λ w^T (x − x*) + λ²‖x − x*‖².

Remember: w^T (x − x0) < 0 by assumption.
Proof of Separating Hyperplane Theorem

kx λ − y ∗ k2 = kx ∗ − y ∗ k2 + 2λw T (x − x ∗ ) + λ2 kx − x ∗ k2
< kx ∗ − y ∗ k2 + 2λ(w T x 0 − w T x ∗ ) + λ2 kx − x ∗ k2
 ∗ 2
kx k − ky ∗ k2
 
∗ ∗ 2 T ∗
= kx − y k + 2λ −w x
2
+ λ2 kx − x ∗ k2
= kx ∗ − y ∗ k2 − λkx ∗ − y ∗ k2 + λ2 kx − x ∗ k2
| {z } | {z }
=A =B
∗ ∗ 2 2
= kx − y k − λA + λ B
= kx ∗ − y ∗ k2 − λ(A − λB).
Now, pick an x such that A − λB > 0. Then −λ(A − λB) < 0.
A kx ∗ − y ∗ k2
λ< = .
B kx − x ∗ k2 c Stanley Chan 2020. All Rights Reserved.
30 / 34
Proof of Separating Hyperplane Theorem
Therefore, if we choose λ ∈ (0, 1] such that A − λB > 0, i.e.,

    0 < λ < A/B = ‖x* − y*‖² / ‖x − x*‖²,

then −λ(A − λB) < 0, and so

    ‖xλ − y*‖² < ‖x* − y*‖² − λ(A − λB)
               < ‖x* − y*‖².

Contradiction, because ‖x* − y*‖² should be the smallest!

Conclusion:
If x ∈ C1, then g(x) > 0.
By symmetry, if x ∈ C2, then g(x) < 0.
So we have found the separating hyperplane (w, w0).
Q&A 1: What is a convex set?

A set C is convex if the following condition is met:

Pick any x ∈ C and y ∈ C, and let 0 < λ < 1. If λx + (1 − λ)y is also in
C for every such x, y, and λ, then C is convex.
Basically, it says that you can pick any two points in the set and draw the
line segment between them. If the segment always stays inside the set, then
the set is convex.
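One way to turn this definition into a (randomized) numerical test, sketched below with toy membership functions of my own: sample pairs of points in the set and check that points on the connecting segment stay inside. A disk passes; an annulus fails.

    import numpy as np

    rng = np.random.default_rng(0)

    def looks_convex(in_set, sampler, n_pairs=2000):
        # Randomized check of the definition: for sampled x, y in C and
        # lambda in (0, 1), is lambda*x + (1 - lambda)*y also in C?
        for _ in range(n_pairs):
            x, y = sampler(), sampler()
            lam = rng.uniform(0.01, 0.99)
            if not in_set(lam * x + (1 - lam) * y):
                return False
        return True

    def sample_disk():        # points in the unit disk (convex)
        p = rng.uniform(-1, 1, 2)
        return p if np.linalg.norm(p) <= 1 else sample_disk()

    def sample_annulus():     # points in the ring 0.5 <= |x| <= 1 (not convex)
        p = sample_disk()
        return p if np.linalg.norm(p) >= 0.5 else sample_annulus()

    print(looks_convex(lambda p: np.linalg.norm(p) <= 1, sample_disk))            # True
    print(looks_convex(lambda p: 0.5 <= np.linalg.norm(p) <= 1, sample_annulus))  # False (w.h.p.)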
Q&A 2: Is there a way to check whether two sets are
linearly separable?
No, at least I do not know of one.
The best you can do is to check whether a training set is linearly
separable.
To do so, solve the hard-margin SVM. If you can solve it with zero training
error, then you have found a separating hyperplane. If the hard-margin SVM
has no solution, then the training set is not separable.
Checking the testing set is impossible unless you know the
distributions of the samples. But if you know the distributions, you
can derive a formula to check linear separability.
For example, two Gaussians are never linearly separable, because no matter
how unlikely, you can always find a sample that lands on the wrong side.
Uniform distributions, in contrast, can be linearly separable (e.g., when
their supports are disjoint convex sets).
Bottom line: Linear separability, in my opinion, is more of a
theoretical tool to describe the intrinsic property of the problem. It
is not for computational purposes.
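A hedged sketch of that check (my own construction, not from the lecture): for a finite training set, separability is a linear feasibility problem, namely finding w and w0 with yn(w^T xn + w0) ≥ 1 for all n, which a generic LP solver can answer with a zero objective.

    import numpy as np
    from scipy.optimize import linprog

    def is_linearly_separable(X, y):
        # Feasibility LP over z = (w, w0): y_n (w^T x_n + w0) >= 1 for all n,
        # written as A_ub z <= b_ub with A_ub = -y_n * [x_n, 1], b_ub = -1.
        n, d = X.shape
        A_ub = -y[:, None] * np.hstack([X, np.ones((n, 1))])
        b_ub = -np.ones(n)
        res = linprog(c=np.zeros(d + 1), A_ub=A_ub, b_ub=b_ub,
                      bounds=[(None, None)] * (d + 1))
        return res.success

    rng = np.random.default_rng(0)
    X1 = rng.normal(loc=[+3, +3], size=(50, 2))   # well-separated clouds
    X2 = rng.normal(loc=[-3, -3], size=(50, 2))
    X = np.vstack([X1, X2])
    y = np.r_[np.ones(50), -np.ones(50)]
    print(is_linearly_separable(X, y))            # True (with high probability)

    y_shuffled = rng.permutation(y)               # random labels: almost surely not separable
    print(is_linearly_separable(X, y_shuffled))   # False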
Q&A 3: If two sets are not convex, how do I know whether they are
linearly separable?

You can look at the convex hulls.

A convex hull is the smallest convex set that contains the original set.
If the two convex hulls do not overlap, then the sets are linearly separable.
For additional information about convex sets and convex hulls, you can
check Chapter 2 of
https://web.stanford.edu/class/ee364a/lectures.html
