
Lecture 5

Linear discriminant analysis


Linear discriminant function

There are many different ways to represent a two-class pattern classifier. One way is in terms of a discriminant function g(x).

[Diagram: a sample x is passed through the discriminant g, giving an output of +1 or −1.]
For a new sample x and a given discriminant function, we decide that x belongs to Class 1 if g(x) > 0, and to Class 2 otherwise.

A discriminant function that is a linear combination of the components of x can be written as

g(x) = w^T x + w_0

where w is called the weight vector and w_0 the threshold weight.
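As a quick sketch (not part of the original notes), this decision rule might look as follows in code; the weight values below are placeholders chosen only for illustration:

```python
import numpy as np

def classify(x, w, w0):
    """Return +1 (Class 1) if g(x) = w^T x + w0 > 0, otherwise -1 (Class 2)."""
    g = np.dot(w, x) + w0
    return 1 if g > 0 else -1

# Arbitrary illustrative weights (assumed values, not from the notes)
w = np.array([1.0, 1.0])   # weight vector
w0 = -150.0                # threshold weight
print(classify(np.array([80.0, 85.0]), w, w0))  # +1 -> Class 1
print(classify(np.array([70.0, 60.0]), w, w0))  # -1 -> Class 2
```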
The equation g(x) = 0 defines the decision
surface that separates data samples assigned
to Class 1 from data samples assigned to Class
2. This is a hyperplane when g(x) is linear.

Two vectors a and b are normal to each other if a^T b = 0. In the figure below, [3, 4] and [−4, 3] are normal to each other; in algebraic terms,

[3, 4][−4, 3]^T = 3 × (−4) + 4 × 3 = 0

[Figure: the vectors [3, 4] and [−4, 3] drawn from the origin, meeting at a right angle.]
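A one-line numeric check of this orthogonality (an illustrative sketch, not from the notes):

```python
import numpy as np

a = np.array([3.0, 4.0])
b = np.array([-4.0, 3.0])
# a and b are normal (orthogonal) when their inner product a^T b is zero
print(np.dot(a, b))  # 3*(-4) + 4*3 = 0.0
```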


If two points x_1, x_2 are both on the decision surface, then

g(x_1) = g(x_2) = 0
w^T x_1 + w_0 = w^T x_2 + w_0 = 0
w^T (x_1 − x_2) = 0

This means that w is normal to any vector lying in the hyperplane ((x_1 − x_2) is a vector lying on the decision surface, since it starts at x_2 and ends at x_1).

[Figure: a point x in the region g > 0, its projection x_p onto the hyperplane g = 0, the normal vector w, and the half-spaces g > 0 and g < 0.]
Write

x = x_p + r w/||w||

where x_p is the projection of x onto the hyperplane and r is the distance from x to the hyperplane. Then

g(x) = w^T x + w_0
     = w^T [x_p + r w/||w||] + w_0
     = w^T x_p + w_0 + r (w^T w)/||w||
     = (w^T x_p + w_0) + r ||w||
     = r ||w||

since w^T x_p + w_0 = g(x_p) = 0 (x_p lies on the hyperplane).

Hence the distance of any data point to the hyperplane is given by

r = g(x)/||w||
In particular, when x = [0, 0]^T,

r = w_0/||w||

A linear discriminant function divides the feature space by a hyperplane whose orientation is determined by the normal vector w and whose location is determined by w_0. If w_0 = 0, the hyperplane passes through the origin. If w_0 > 0, the origin is on the positive side of the hyperplane.
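A small sketch of the distance formulas above (not part of the original notes); the weight vector and threshold below are arbitrary illustrative values:

```python
import numpy as np

def signed_distance(x, w, w0):
    """Signed distance r = g(x) / ||w|| from point x to the hyperplane g(x) = 0."""
    return (np.dot(w, x) + w0) / np.linalg.norm(w)

w = np.array([3.0, 4.0])   # normal vector (assumed values)
w0 = 10.0                  # threshold weight (assumed value)

print(signed_distance(np.array([0.0, 0.0]), w, w0))  # origin: w0/||w|| = 10/5 = 2.0
print(signed_distance(np.array([1.0, 2.0]), w, w0))  # (3 + 8 + 10)/5 = 4.2
```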

[Figure: the hyperplane g = 0 with normal vector w; the origin lies at distance |w_0|/||w|| from it, and a point x in the region g > 0 lies at distance |g(x)|/||w||.]

Example 1: In order to select the best candidates, an over-subscribed secondary school sets an entrance exam on two subjects, English and Mathematics. The marks of 5 applicants are listed in the table below, and the decision to accept requires an average mark above 75.

(i) Show that this decision rule is equivalent to using a linear discriminant function.

(ii) Plot the decision hyperplane, indicating the half-planes for Accept and Reject and the locations of the 5 applicants.

Candidate No.   English   Math   Decision
1               80        85     Accept
2               70        60     Reject
3               50        75     Reject
4               90        70     Accept
5               85        75     Accept
Solution: (i) Denote the marks of English and Math as x_1 and x_2, respectively. The decision rule is: if (x_1 + x_2)/2 > 75, accept; otherwise reject. This is equivalent to using the linear discriminant function

g(x) = x_1 + x_2 − 150

with decision rule: if g(x) > 0, accept; otherwise reject.

(ii) To plot g(x) = 0, the easiest way is to set x_1 = 0 and find the value of x_2 such that g(x) = 0, i.e. 0 = 0 + x_2 − 150, so x_2 = 150. Hence [0, 150]^T is on the hyperplane.

Likewise, we can set x_2 = 0 and find the value of x_1 such that g(x) = 0, i.e. 0 = x_1 + 0 − 150, so x_1 = 150. Hence [150, 0]^T is on the hyperplane.

Plot a straight line linking [0, 150]^T and [150, 0]^T.
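The classification of the five candidates can also be checked numerically; the following sketch (not part of the original notes) applies g(x) = x_1 + x_2 − 150 to the marks from the table:

```python
import numpy as np

# Marks [English, Math] for the 5 candidates; discriminant g(x) = x1 + x2 - 150
marks = np.array([[80, 85], [70, 60], [50, 75], [90, 70], [85, 75]], dtype=float)
g = marks.sum(axis=1) - 150.0

for i, gi in enumerate(g, start=1):
    print(f"Candidate {i}: g(x) = {gi:+.0f} -> {'Accept' if gi > 0 else 'Reject'}")
# Candidates 1, 4 and 5 are accepted; candidates 2 and 3 are rejected.
```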



[Figure: the decision line g(x) = 0 joining [0, 150]^T and [150, 0]^T, with candidates 1, 4 and 5 in the Accept half-plane (g(x) > 0) and candidates 2 and 3 in the Reject half-plane (g(x) < 0).]

Figure: The solution to Example 1 (ii).


There are many ways of determining the linear discriminant function g(x) given a set of training data samples. One way is to assign target values to the labelled data samples, e.g. +1 for one class and −1 for the other, and then adjust the weights of the linear discriminant function accordingly.

Using the same example, a set of linear equations can be constructed from the values in the previous table.

80 w_1 + 85 w_2 + w_0 = 1
70 w_1 + 60 w_2 + w_0 = −1
50 w_1 + 75 w_2 + w_0 = −1
90 w_1 + 70 w_2 + w_0 = 1
85 w_1 + 75 w_2 + w_0 = 1
There are 5 equations but only 3 unknown parameters, so there is no exact solution. Instead, the weights are determined by minimizing the overall error between the two sides.
The solution to this problem is often based on the least squares estimate,

[w_1, w_2, w_0]^T = (X^T X)^{-1} X^T y

where each row of X holds a candidate's marks together with a constant 1, and y holds the target values:

X = [ 80  85  1 ]        y = [  1 ]
    [ 70  60  1 ]            [ −1 ]
    [ 50  75  1 ]            [ −1 ]
    [ 90  70  1 ]            [  1 ]
    [ 85  75  1 ]            [  1 ]

This gives

[w_1, w_2, w_0]^T = [0.0571, 0.0580, −8.3176]^T


So g(x) = 0.0571 x_1 + 0.0580 x_2 − 8.3176. Note that, up to a rescaling, this defines the same hyperplane as g(x) = x_1 + 1.0160 x_2 − 145.6684, which is close to the hyperplane used in Example 1.
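The same estimate can be reproduced with an off-the-shelf least-squares solver; the following sketch (not part of the original notes) applies NumPy's lstsq to the matrices above:

```python
import numpy as np

# Each row is [English, Math, 1]; targets are +1 for Accept and -1 for Reject
X = np.array([[80, 85, 1],
              [70, 60, 1],
              [50, 75, 1],
              [90, 70, 1],
              [85, 75, 1]], dtype=float)
y = np.array([1, -1, -1, 1, 1], dtype=float)

# Least squares estimate, equivalent to (X^T X)^{-1} X^T y
w = np.linalg.lstsq(X, y, rcond=None)[0]
print(w)  # approximately [0.0571, 0.0580, -8.3176]
```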
