Two-Class Discriminant Analysis
On many occasions, we will be faced with the problem of determining to
which of two populations a given observation belongs. For example, we might be
interested in determining whether a patient has a certain disease given the results
of a blood test, or whether a bank customer will repay a loan given the information
available about him/her.
In what follows, we discuss two-class discriminant analysis. We will focus our
attention mainly on linear discriminants in which the separation between classes
is given by hyperplanes. When we know the distribution of the classes, their
prior probabilities and the cost of misclassifying an observation, we can build the
Bayesian discriminant.
Bayes Discriminant Analysis
Let X = [X1 , . . . , Xp ]T be a p-dimensional observation belonging to population
c1 or to population c2 . Our objective will be to determine a decision function
d : Rp → {c1 , c2 }, defined by
$$d(X) = \begin{cases} c_1 & \text{if } X \in R_1, \\ c_2 & \text{if } X \in R_2 = R_1^c. \end{cases}$$
To do this, we assume that we know the density functions f1 (X) and f2 (X)
of these two populations, and their prior probabilities p1 and p2 , respectively
(p1 + p2 = 1).
Knowing this information, we can use Bayes’ theorem to calculate the posterior
probabilities. These are given by:
$$k_1(X) = P(c_1 \mid X) = \frac{f_1(X)\,p_1}{f_1(X)\,p_1 + f_2(X)\,p_2},$$
$$k_2(X) = P(c_2 \mid X) = \frac{f_2(X)\,p_2}{f_1(X)\,p_1 + f_2(X)\,p_2}.$$
Using the previous equations, we assign X to class c1 if k1(X) ≥ k2(X).
Example. A veterinary clinic has recently treated a Persian cat and they forgot
to record its sex. They know that the weight of females follows a Normal distribution with mean 4.1 kg and standard deviation 3.8 kg, while the weight of males follows a Normal distribution with mean 5 kg and standard deviation 4 kg. If one out of every three cats they care for is female and the weight of this cat is 4.7 kg, is it more likely that the cat is male or female?
Given that one out of every three cats they care for is female, the prior
probabilities for female and male are p1 = 1/3 and p2 = 2/3, respectively.
The weight of the female cats follows a Normal distribution with mean 4.1 kg and standard deviation 3.8 kg. Therefore
$$f_1(X) = \frac{1}{3.8\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X - 4.1}{3.8}\right)^2}.$$
Similarly, the weight of the male cats follows a Normal distribution with mean 5 kg and standard deviation 4 kg. Hence
$$f_2(X) = \frac{1}{4\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X - 5}{4}\right)^2}.$$
The weight of this cat is 4.7 kg. Therefore
$$k_1(4.7) = \frac{\frac{1}{3}\, f_1(4.7)}{\frac{1}{3}\, f_1(4.7) + \frac{2}{3}\, f_2(4.7)} \approx 0.343$$
and
$$k_2(4.7) = \frac{\frac{2}{3}\, f_2(4.7)}{\frac{1}{3}\, f_1(4.7) + \frac{2}{3}\, f_2(4.7)} \approx 0.657$$
Since k1(4.7) < k2(4.7), it is more likely that the cat is male.
The source code to obtain the previous result is
# Density of the female weight distribution: Normal with mean 4.1 and sd 3.8
f1 = function(x) {
  1 / (3.8 * sqrt(2 * pi)) * exp(-1 / 2 * ((x - 4.1) / 3.8)^2)
}

# Density of the male weight distribution: Normal with mean 5 and sd 4
f2 = function(x) {
  1 / (4 * sqrt(2 * pi)) * exp(-1 / 2 * ((x - 5) / 4)^2)
}

# Posterior probabilities for a cat weighing 4.7 kg
k1 = 1 / 3 * f1(4.7) / (1 / 3 * f1(4.7) + 2 / 3 * f2(4.7))
k2 = 2 / 3 * f2(4.7) / (1 / 3 * f1(4.7) + 2 / 3 * f2(4.7))
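The densities f1 and f2 above can equivalently be evaluated with R's built-in dnorm function; as a quick check (the name k1_dnorm is only illustrative):

# Posterior for the female class using dnorm(x, mean, sd) from base R
k1_dnorm = 1 / 3 * dnorm(4.7, 4.1, 3.8) /
  (1 / 3 * dnorm(4.7, 4.1, 3.8) + 2 / 3 * dnorm(4.7, 5, 4))   # approximately 0.343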
Sometimes the cost of misclassifying an observation may depend on its true
class. For example, it is more costly to classify a sick patient as healthy and not
provide adequate treatment than to classify a healthy one as sick and have him/her
receive a treatment that is harmless to him/her. Therefore, in certain problems it
is necessary to include a cost matrix such as the following:
                  Choice
               c1          c2
Truth   c1     0           L(1, 2)
        c2     L(2, 1)     0
where there is no loss if the observation is correctly classified.
In these situations, the expected error of classifying X as c1 when the true class
is c2 is
$$\epsilon_1 = k_2(X)\,L(2,1)$$
and the expected error of classifying it as c2 when the true class is c1 is
$$\epsilon_2 = k_1(X)\,L(1,2).$$
In this scenario, X ∈ R1 if the expected error from wrongly classifying X as c1 is no greater than the expected error from wrongly classifying it as c2:
\begin{align*}
X \in R_1 &\iff \epsilon_1 \le \epsilon_2 \\
&\iff k_2(X)\,L(2,1) \le k_1(X)\,L(1,2) \\
&\iff f_2(X)\,p_2\,L(2,1) \le f_1(X)\,p_1\,L(1,2) \\
&\iff \frac{f_1(X)}{f_2(X)} \ge \frac{p_2\,L(2,1)}{p_1\,L(1,2)}
\end{align*}
Thus, we have proven the following theorem.
Theorem. The Bayes solution to the classification problem is given by the region
$$R_1 = \left\{ X \;:\; \frac{f_1(X)}{f_2(X)} \ge \frac{p_2\,L(2,1)}{p_1\,L(1,2)} \right\}.$$
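As a rough illustration of how the theorem can be applied in R, the following sketch classifies the 4.7 kg cat from the example when misclassification costs are taken into account; the cost values L21 and L12 are purely illustrative assumptions, not taken from the text:

# Bayes rule with costs: assign to c1 when the likelihood ratio exceeds the threshold
L21 = 5                                   # assumed cost L(2,1): choosing c1 when the truth is c2
L12 = 1                                   # assumed cost L(1,2): choosing c2 when the truth is c1
p1 = 1 / 3; p2 = 2 / 3                    # priors from the cat example
ratio = dnorm(4.7, 4.1, 3.8) / dnorm(4.7, 5, 4)
threshold = (p2 * L21) / (p1 * L12)
decision = if (ratio >= threshold) "c1" else "c2"   # here c1 = female, c2 = male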
Next, we analyze the Bayesian solution in the particular case that the densities
are Normal.
Discrimination between two Normal populations with the same
variance-covariance matrix
If f1 and f2 are Normal densities with the same variance-covariance matrix Σ, that is,
$$f_1(X) = \frac{1}{\sqrt{(2\pi)^p\,|\Sigma|}}\; e^{-\frac{1}{2}(X-\mu_1)^T \Sigma^{-1} (X-\mu_1)},$$
$$f_2(X) = \frac{1}{\sqrt{(2\pi)^p\,|\Sigma|}}\; e^{-\frac{1}{2}(X-\mu_2)^T \Sigma^{-1} (X-\mu_2)}.$$
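For reference, a direct implementation of this p-dimensional density in R could look as follows (a sketch; the function name dmvnormal is our own, not a base R function):

# p-dimensional Normal density with mean vector mu and covariance matrix Sigma
dmvnormal = function(x, mu, Sigma) {
  p = length(mu)
  as.numeric(1 / sqrt((2 * pi)^p * det(Sigma)) *
    exp(-1 / 2 * t(x - mu) %*% solve(Sigma) %*% (x - mu)))
}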
The classification region is given by
\begin{align*}
X \in R_1 &\iff \frac{f_1(X)}{f_2(X)} \ge c, \quad \text{where } c = \frac{p_2\,L(2,1)}{p_1\,L(1,2)} \\
&\iff e^{-\frac{1}{2}(X-\mu_1)^T\Sigma^{-1}(X-\mu_1) + \frac{1}{2}(X-\mu_2)^T\Sigma^{-1}(X-\mu_2)} \ge c \\
&\iff -\frac{1}{2}(X-\mu_1)^T\Sigma^{-1}(X-\mu_1) + \frac{1}{2}(X-\mu_2)^T\Sigma^{-1}(X-\mu_2) \ge \ln(c) \\
&\iff -(X-\mu_1)^T\Sigma^{-1}(X-\mu_1) + (X-\mu_2)^T\Sigma^{-1}(X-\mu_2) \ge 2\ln(c) \\
&\iff -X^T\Sigma^{-1}X + X^T\Sigma^{-1}\mu_1 + \mu_1^T\Sigma^{-1}X - \mu_1^T\Sigma^{-1}\mu_1 \\
&\qquad\quad + X^T\Sigma^{-1}X - X^T\Sigma^{-1}\mu_2 - \mu_2^T\Sigma^{-1}X + \mu_2^T\Sigma^{-1}\mu_2 \ge 2\ln(c) \\
&\iff \underbrace{X^T\Sigma^{-1}(\mu_1-\mu_2)}_{\text{real number}} + \underbrace{(\mu_1^T-\mu_2^T)\Sigma^{-1}X}_{\text{real number}} - \mu_1^T\Sigma^{-1}\mu_1 + \mu_2^T\Sigma^{-1}\mu_2 \ge 2\ln(c) \\
&\iff X^T\Sigma^{-1}(\mu_1-\mu_2) + X^T\Sigma^{-1}(\mu_1-\mu_2) - \mu_1^T\Sigma^{-1}\mu_1 + \mu_2^T\Sigma^{-1}\mu_2 \ge 2\ln(c) \\
&\iff 2\,X^T\Sigma^{-1}(\mu_1-\mu_2) - \mu_1^T\Sigma^{-1}\mu_1 + \mu_2^T\Sigma^{-1}\mu_2 \ge 2\ln(c) \\
&\iff X^T\Sigma^{-1}(\mu_1-\mu_2) - \frac{1}{2}\mu_1^T\Sigma^{-1}\mu_1 + \frac{1}{2}\mu_2^T\Sigma^{-1}\mu_2 \ge \ln(c)
\end{align*}
Remark. The previous expression depends on the discriminant
$$l = \Sigma^{-1}(\mu_1 - \mu_2).$$
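A small numerical sketch of this decision rule in R, built around the discriminant l; the means, covariance matrix, observation and threshold below are all illustrative values, not taken from the text:

mu1 = c(0, 0); mu2 = c(2, 1)                  # illustrative class means
Sigma = matrix(c(1, 0.3, 0.3, 1), nrow = 2)   # illustrative common covariance matrix
c0 = 1                                        # threshold c = p2*L(2,1) / (p1*L(1,2)), assumed equal to 1
x = c(1.5, 0.2)                               # observation to classify
Sinv = solve(Sigma)
l = Sinv %*% (mu1 - mu2)                      # discriminant l = Sigma^{-1}(mu1 - mu2)
score = t(x) %*% l -
  1 / 2 * t(mu1) %*% Sinv %*% mu1 + 1 / 2 * t(mu2) %*% Sinv %*% mu2
in_R1 = as.numeric(score) >= log(c0)          # TRUE means x is classified as c1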
Theorem. The vector l has the property that it maximizes the function
$$\varphi(d) = \frac{\left(E_{c_1}(d^T X) - E_{c_2}(d^T X)\right)^2}{V(d^T X)},$$
where $E_{c_1}(d^T X)$ and $E_{c_2}(d^T X)$ are the expected values of the projection $d^T X$ in each class, respectively.
Proof.
\begin{align*}
\varphi(d) &= \frac{\left(E_{c_1}(d^T X) - E_{c_2}(d^T X)\right)^2}{V(d^T X)} \\
&= \frac{\left(d^T(\mu_1 - \mu_2)\right)^2}{d^T \Sigma d} \\
&= \frac{\left(d^T(\mu_1 - \mu_2)\right)\left(d^T(\mu_1 - \mu_2)\right)^T}{d^T \Sigma d} \\
&= \frac{d^T(\mu_1 - \mu_2)(\mu_1 - \mu_2)^T d}{d^T \Sigma d}
\end{align*}
Since $\varphi(kd) = \varphi(d)$ for any scalar $k \neq 0$, we may restrict the search to directions d with $d^T \Sigma d = 1$. Using Lagrange multipliers on the problem
$$\max_{d} \; d^T(\mu_1 - \mu_2)(\mu_1 - \mu_2)^T d \quad \text{subject to} \quad d^T \Sigma d = 1,$$
we get
$$\max_{\lambda, d} \; d^T(\mu_1 - \mu_2)(\mu_1 - \mu_2)^T d - \lambda\left(d^T \Sigma d - 1\right).$$
Differentiating with respect to d and setting the derivative to zero,
\begin{align*}
2(\mu_1 - \mu_2)(\mu_1 - \mu_2)^T d - 2\lambda\,\Sigma d &= 0 \\
(\mu_1 - \mu_2)(\mu_1 - \mu_2)^T d &= \lambda\,\Sigma d \\
\frac{1}{\lambda}\,\Sigma^{-1}(\mu_1 - \mu_2)\underbrace{(\mu_1 - \mu_2)^T d}_{\text{real number}} &= d \\
\underbrace{\frac{1}{\lambda}\,(\mu_1 - \mu_2)^T d}_{\text{real number}}\;\Sigma^{-1}(\mu_1 - \mu_2) &= d \\
k \cdot \Sigma^{-1}(\mu_1 - \mu_2) &= d
\end{align*}
where $k = \frac{1}{\lambda}(\mu_1 - \mu_2)^T d$ is a scalar, therefore
$$d \propto \Sigma^{-1}(\mu_1 - \mu_2).$$
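A quick numerical check in R that the direction Σ⁻¹(µ1 − µ2) indeed maximizes φ; the means and covariance matrix are the same illustrative values used in the previous sketch:

# phi(d): squared between-class separation of the projection, divided by its variance
phi = function(d, mu1, mu2, Sigma) {
  as.numeric(t(d) %*% (mu1 - mu2))^2 / as.numeric(t(d) %*% Sigma %*% d)
}
mu1 = c(0, 0); mu2 = c(2, 1); Sigma = matrix(c(1, 0.3, 0.3, 1), nrow = 2)
d_opt = solve(Sigma) %*% (mu1 - mu2)
phi(d_opt, mu1, mu2, Sigma)     # the maximal value of phi
phi(c(1, 1), mu1, mu2, Sigma)   # any other direction gives a value no larger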
Fisher Discriminant Analysis
Let {X1 , · · · , Xn } be n observations of dimension p belonging to one of two classes
c1 or c2 . The number of observations of each class is n1 and n2 , respectively, where
n1 + n2 = n. Suppose now that the two classes have p-dimensional means m1 and
m2 defined by
$$m_i = \frac{1}{n_i} \sum_{n \in c_i} X_n.$$
We would like to find a linear combination of predictors, w ∈ Rp , that maximizes
the separation between the classes and also minimizes the within-class variance in
the projected data. The Fisher criterion is defined as the ratio of the between-class
variance to the within-class variance and is given by
$$J(w) = \frac{(m_2 - m_1)^2}{s_1^2 + s_2^2},$$
where $m_i = w^T m_i$ is the projected mean of class $c_i$ (the same symbol is used, with a slight abuse of notation, for the class mean and its projection) and
$$s_i^2 = \sum_{n \in c_i} \left(w^T X_n - m_i\right)^2$$
is the within-class scatter of the projected data.
The numerator can be written as
\begin{align*}
(m_2 - m_1)^2 &= \left(w^T m_2 - w^T m_1\right)^2 \\
&= \left(w^T(m_2 - m_1)\right)^2 \\
&= w^T(m_2 - m_1)\left(w^T(m_2 - m_1)\right)^T \\
&= w^T(m_2 - m_1)(m_2 - m_1)^T w \\
&= w^T S_B w
\end{align*}
where $S_B = (m_2 - m_1)(m_2 - m_1)^T$ is the between-class scatter matrix.
Similarly,
\begin{align*}
s_1^2 &= \sum_{n \in c_1}\left(w^T X_n - m_1\right)^2 \\
&= \sum_{n \in c_1}\left(w^T X_n - w^T m_1\right)^2 \\
&= \sum_{n \in c_1}\left(w^T(X_n - m_1)\right)^2 \\
&= \sum_{n \in c_1}\left(w^T(X_n - m_1)\right)\left(w^T(X_n - m_1)\right)^T \\
&= \sum_{n \in c_1} w^T(X_n - m_1)(X_n - m_1)^T w \\
&= w^T \left(\sum_{n \in c_1}(X_n - m_1)(X_n - m_1)^T\right) w \\
&= w^T S_1 w
\end{align*}
The same reasoning applies to $s_2^2$. Therefore, the Fisher criterion can be rewritten as
$$J(w) = \frac{w^T S_B w}{w^T S_W w},$$
where $S_W = S_1 + S_2$.
Differentiating with respect to w and setting the derivative to zero,
\begin{align*}
\frac{(2 S_B w)\, w^T S_W w - w^T S_B w\, (2 S_W w)}{\left(w^T S_W w\right)^2} &= 0 \\
(S_B w)\underbrace{w^T S_W w}_{\text{real number}} - \underbrace{w^T S_B w}_{\text{real number}}\,(S_W w) &= 0 \\
(S_B w) - \underbrace{\frac{w^T S_B w}{w^T S_W w}}_{\lambda}\,(S_W w) &= 0 \\
(S_B w) - \lambda\,(S_W w) &= 0 \\
S_B w &= \lambda\, S_W w
\end{align*}
If $S_W$ has full rank, then
$$S_W^{-1} S_B w = \lambda w,$$
so $w$ is an eigenvector of $S_W^{-1} S_B$.
Remark.
\begin{align*}
S_B w &= (m_2 - m_1)\underbrace{(m_2 - m_1)^T w}_{\alpha} = \alpha\,(m_2 - m_1) \\
S_B w &= \lambda\, S_W w \\
\lambda\, S_W w &= \alpha\,(m_2 - m_1) \\
w &\propto S_W^{-1}(m_2 - m_1)
\end{align*}
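To make the construction concrete, the following R sketch computes Fisher's direction on simulated two-class data; the sample sizes, dimension and distributions are illustrative assumptions:

set.seed(1)
X1 = matrix(rnorm(40, mean = 0), ncol = 2)      # 20 observations of class c1 in R^2
X2 = matrix(rnorm(40, mean = 2), ncol = 2)      # 20 observations of class c2 in R^2
m1 = colMeans(X1); m2 = colMeans(X2)            # class means
S1 = t(sweep(X1, 2, m1)) %*% sweep(X1, 2, m1)   # within-class scatter of c1
S2 = t(sweep(X2, 2, m2)) %*% sweep(X2, 2, m2)   # within-class scatter of c2
SW = S1 + S2                                    # total within-class scatter
w = solve(SW) %*% (m2 - m1)                     # Fisher direction, proportional to SW^{-1}(m2 - m1)
proj1 = X1 %*% w; proj2 = X2 %*% w              # projections of each class onto w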