
MAS 408 - Discriminant Analysis

Discriminant analysis is a classification technique used to predict group membership. It analyzes variables measured on known training data to develop rules for assigning new observations to one of the known groups. The method determines which variables best distinguish between the groups and uses these to build a model. New data can then be classified based on the model by predicting which group it is most likely to belong to.


DISCRIMINANT ANALYSIS

Adapted from PSU Online Notes

Discriminant analysis

– is a classification problem.

– assumes that two or more groups (clusters or populations) are known a priori, and classifies one or more new observations into one of the known populations based on some measured characteristics.

Example 1

Data were collected on two species of insects in the genus Chaetocnema: (a) Ch. concinna and (b) Ch. heikertlingeri. Three variables were measured on each insect:

– width of the 1st joint of the tarsus (legs)

– width of the 2nd joint of the tarsus

– width of the aedeagus (reproductive organ)

The objective is to obtain a classification rule for identifying the insect species based on these
three variables.
Let P(πi |x) denote the conditional probability that an observation came from population πi given the observed values of the multivariate vector of variables x.

– We classify an observation to the population for which the value of P(πi |x) is greatest.

– This is the most probable group given the observed values of x.

Notation

– Suppose that we have g populations (groups) and that the ith population is denoted as
πi .

– Let pi = P(πi ), be the probability that a randomly selected observation is in population


πi .

– Let f(x|πi ) be the conditional probability density function of the multivariate set of
variables, given that the observation came from population πi .

The probability of interest is
$$P(\text{member of } \pi_i \mid \text{we observe } \mathbf{x}) = \frac{P(\text{member of } \pi_i \text{ and we observe } \mathbf{x})}{P(\text{we observe } \mathbf{x})} \tag{1}$$

– The numerator in Equation (1) is the likelihood that a randomly selected observation is both from population πi and has the value x. This likelihood equals $p_i f(\mathbf{x}\mid\pi_i)$.

– The denominator in Equation (1) is the unconditional likelihood (over all populations) that we observe x. This likelihood equals $\sum_{j=1}^{g} p_j f(\mathbf{x}\mid\pi_j)$.

The posterior probability that an observation is a member of population πi is
$$p(\pi_i \mid \mathbf{x}) = \frac{p_i f(\mathbf{x}\mid\pi_i)}{\sum_{j=1}^{g} p_j f(\mathbf{x}\mid\pi_j)} \tag{2}$$

– The classification rule is to assign observation x to the population for which Equation (2)
is the greatest.

– The denominator in Equation (2) is the same for all posterior probabilities (for the various
populations) so it is equivalent to say that we will classify an observation to the population
for which pi f(x|πi ) is greatest.
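As a sketch of this rule, the following Python snippet classifies an observation by maximizing pi f(x|πi ), assuming multivariate normal class densities. The means, covariances, and priors here are hypothetical, purely for illustration:

```python
import numpy as np

def mvn_pdf(x, mu, sigma):
    """Multivariate normal density f(x | pi_i)."""
    p = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(sigma, diff)           # (x-mu)' Sigma^{-1} (x-mu)
    const = (2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(sigma))
    return float(np.exp(-0.5 * quad) / const)

def classify(x, priors, means, covs):
    """Assign x to the population i maximizing p_i * f(x | pi_i)."""
    scores = [p * mvn_pdf(x, mu, S) for p, mu, S in zip(priors, means, covs)]
    return int(np.argmax(scores))

# Two hypothetical bivariate normal populations
means = [np.array([0.0, 0.0]), np.array([5.0, 5.0])]
covs = [np.eye(2), np.eye(2)]
priors = [0.5, 0.5]
print(classify(np.array([0.2, 0.1]), priors, means, covs))  # prints 0
```

Since the denominator of Equation (2) is common to all groups, comparing pi f(x|πi ) directly is enough; the snippet never normalizes.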

Case of two populations

In the case of two populations we express the classification rule in terms of the ratio of the two posterior probabilities. We classify to population 1 when
$$\frac{p_1 f(\mathbf{x}\mid\pi_1)}{p_2 f(\mathbf{x}\mid\pi_2)} > 1 \tag{3}$$
Equation (3) can be written as
$$\frac{f(\mathbf{x}\mid\pi_1)}{f(\mathbf{x}\mid\pi_2)} > \frac{p_2}{p_1} \tag{4}$$
The rule of assigning observation x to the population for which Equation (2) is greatest is equivalent to assigning it to the population that maximizes
$$\log\big(p_i f(\mathbf{x}\mid\pi_i)\big) \tag{5}$$

Steps in Discriminant Analysis

Discriminant analysis is a 7-step procedure. The steps are:

1. Collect training data.

– Training data are data with known group memberships.

– We know which population contains each subject.

2. Prior Probability.

– pi represents the expected proportion of observations that belong to population πi .

There are three common choices:

a) Equal priors: $\hat{p}_i = \frac{1}{g}$, useful if we believe that all of the population sizes are equal.

b) Arbitrary priors selected according to the investigator's beliefs regarding the relative population sizes, subject to $\hat{p}_1 + \hat{p}_2 + \cdots + \hat{p}_g = 1$.

c) Estimated priors:
$$\hat{p}_i = \frac{n_i}{N}$$
where $n_i$ is the number of observations from population πi in the training data, and $N = n_1 + n_2 + \cdots + n_g$.
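The estimated-priors choice in (c) is just each group count divided by the total training size; for instance (with hypothetical group counts):

```python
# Hypothetical training-data group sizes n_1, n_2, n_3
n = [11, 12, 27]
N = sum(n)                        # N = n_1 + n_2 + n_3 = 50
priors = [n_i / N for n_i in n]   # estimated priors p_i = n_i / N
print(priors)                     # sums to 1 by construction
```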

3. Bartlett's test.

– The population mean vectors must differ for there to be a case for Discriminant Analysis (DA).

– Bartlett's test is used to determine whether the variance-covariance matrices are homogeneous for all populations involved.

– The result of this test determines whether to use Linear or Quadratic DA.

– Linear DA is for homogeneous variance-covariance matrices:
$$\Sigma_1 = \Sigma_2 = \cdots = \Sigma_g = \Sigma$$

– Quadratic DA is for heterogeneous variance-covariance matrices:
$$\Sigma_i \neq \Sigma_j \text{ for some } i \neq j$$

4. Estimate the parameters of the conditional probability density functions f(X|πi ).


The following standard assumptions are made:

a) The data from group i have common mean vector µi .

b) The data from group i have common variance-covariance matrix Σ.

c) Independence: the subjects are independently sampled.

d) Normality: the data are multivariate normally distributed.

5. Compute discriminant functions - the rule to classify the new object into one of the known
populations.

6. Use cross validation to estimate misclassification probabilities.

– This is a diagnostic procedure to assess the efficacy of the discriminant analysis.

– You will have some prior rules about what constitutes an acceptable misclassification rate. These rules could include questions like, "What is the cost of misclassification?" For example, in a medical study to help diagnose cancer, there are two costs to consider:

– the cost of incorrectly labeling someone as having cancer when they do not. This
could result in some emotional distress!

– the cost of misclassifying someone as not having cancer when they actually do.
The cost is obviously higher if early detection improves cure rates.

– Cross-validation is used to assess the classification probability.

7. Classify observations with unknown group memberships.

Linear Discriminant Analysis

Assume that in population πi the probability density function of x is multivariate normal with
mean vector µi and variance-covariance matrix Σ (same for all populations). Then

$$f(\mathbf{x}\mid\pi_i) = \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}} \exp\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)' \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu}_i)\right] \tag{6}$$

Recall:

– We classify to the population with largest pi f(x|πi ).

– This is equivalent to the population with largest log pi f(x|πi ).

In this case, our decision rule is based on the Linear Score Function, a function of the population
means for each of our g populations, µi , as well as the pooled variance-covariance matrix.

Linear Score Function

The Linear Score Function is:
$$s_i^L(\mathbf{x}) = -\frac{1}{2}\boldsymbol{\mu}_i' \Sigma^{-1} \boldsymbol{\mu}_i + \boldsymbol{\mu}_i' \Sigma^{-1}\mathbf{x} + \log p_i = d_{i0} + \sum_{j=1}^{p} d_{ij} x_j + \log p_i \tag{7}$$
where
$$d_{i0} = -\frac{1}{2}\boldsymbol{\mu}_i' \Sigma^{-1} \boldsymbol{\mu}_i, \qquad d_{ij} = j\text{th element of } \boldsymbol{\mu}_i' \Sigma^{-1}$$

Linear Discriminant Function


$$d_i^L(\mathbf{x}) = -\frac{1}{2}\boldsymbol{\mu}_i' \Sigma^{-1} \boldsymbol{\mu}_i + \boldsymbol{\mu}_i' \Sigma^{-1}\mathbf{x} = d_{i0} + \sum_{j=1}^{p} d_{ij} x_j \tag{8}$$

Equation (7) is computed for each population; we then plug in the observed values and assign the unit to the population with the largest score. However, Equation (7) is in terms of population parameters, which must be estimated from training data in which the population membership is known.
Discriminant analysis requires estimates of:

– Prior probabilities: pi = P(πi ); i = 1, 2, . . . , g

– The population mean vectors, µi = E(x|πi ); i = 1, 2, . . . , g, estimated by the sample mean vectors

– The common variance-covariance matrix, Σ = var(x|πi ); i = 1, 2, . . . , g, estimated by the pooled variance-covariance matrix

Conditional Density Function Parameters

Population Means: µi is estimated by substituting in the sample means x̄i .


Variance-Covariance matrix: Let Si denote the sample variance-covariance matrix for population i. Then Σ is estimated by the pooled variance-covariance matrix
$$S_p = \frac{\sum_{i=1}^{g}(n_i - 1)S_i}{\sum_{i=1}^{g}(n_i - 1)} \tag{9}$$
Substituting Sp into the Linear Score Function gives the estimated linear score function:
$$\hat{s}_i^L(\mathbf{x}) = -\frac{1}{2}\bar{\mathbf{x}}_i' S_p^{-1}\bar{\mathbf{x}}_i + \bar{\mathbf{x}}_i' S_p^{-1}\mathbf{x} + \log \hat{p}_i = \hat{d}_{i0} + \sum_{j=1}^{p} \hat{d}_{ij} x_j + \log \hat{p}_i \tag{10}$$

where
$$\hat{d}_{i0} = -\frac{1}{2}\bar{\mathbf{x}}_i' S_p^{-1}\bar{\mathbf{x}}_i \qquad \text{and} \qquad \hat{d}_{ij} = j\text{th element of } \bar{\mathbf{x}}_i' S_p^{-1}$$
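A minimal Python sketch of Equations (9) and (10), computing the pooled covariance matrix and the estimated linear scores from training samples. The two training samples below are made-up illustrative data, not from any real study:

```python
import numpy as np

def pooled_cov(samples):
    """Pooled variance-covariance matrix S_p, Equation (9)."""
    num = sum((len(X) - 1) * np.cov(X, rowvar=False) for X in samples)
    den = sum(len(X) - 1 for X in samples)
    return num / den

def linear_scores(x, samples, priors):
    """Estimated linear score s^L_i(x) for each group, Equation (10)."""
    Sp_inv = np.linalg.inv(pooled_cov(samples))
    scores = []
    for X, p in zip(samples, priors):
        xbar = X.mean(axis=0)  # sample mean vector estimates mu_i
        s = -0.5 * xbar @ Sp_inv @ xbar + xbar @ Sp_inv @ x + np.log(p)
        scores.append(float(s))
    return scores

# Hypothetical training samples from two groups (rows = observations)
X1 = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X2 = X1 + 5.0
scores = linear_scores(np.array([0.5, 0.5]), [X1, X2], [0.5, 0.5])
print(np.argmax(scores))  # prints 0: classified into group 0
```

The new unit is assigned to the group whose estimated score is largest, exactly as in step 5 of the procedure.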

Classification with Two Multivariate Normal Populations

Misclassification costs and prior probabilities

The criterion for classification is to minimize the expected cost of misclassification. It is minimized by classifying a unit with measurement x0 into population 1 if
$$\frac{f_1(\mathbf{x}_0)}{f_2(\mathbf{x}_0)} > \frac{c(1|2)\,p_2}{c(2|1)\,p_1} \tag{11}$$
Then the estimated minimum Expected Cost of Misclassification (ECM) rule for two normal populations allocates x0 to population 1 if
$$(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)^T S_p^{-1}\mathbf{x}_0 - \frac{1}{2}(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)^T S_p^{-1}(\bar{\mathbf{x}}_1 + \bar{\mathbf{x}}_2) > \log\left[\frac{c(1|2)\,p_2}{c(2|1)\,p_1}\right] \tag{12}$$
otherwise, it allocates x0 to population 2.


Note

– c(1|2) is the cost when a population 2 observation is incorrectly classified into population 1.

– c(2|1) is the cost when a population 1 observation is incorrectly classified into population 2.

– p1 and p2 are the prior probabilities for populations 1 and 2, respectively.
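The two-population ECM rule in Equation (12) can be sketched as follows; the sample means and pooled covariance in the example are hypothetical values chosen only to exercise the rule:

```python
import numpy as np

def ecm_allocate(x0, xbar1, xbar2, Sp, p1, p2, c12, c21):
    """Allocate x0 by the estimated minimum-ECM rule, Equation (12).

    c12 = c(1|2): cost of classifying a population-2 unit into population 1.
    c21 = c(2|1): cost of classifying a population-1 unit into population 2.
    Returns 1 or 2.
    """
    d = xbar1 - xbar2
    w = np.linalg.solve(Sp, d)                   # Sp^{-1} (xbar1 - xbar2)
    lhs = w @ x0 - 0.5 * w @ (xbar1 + xbar2)     # left side of Equation (12)
    rhs = np.log((c12 * p2) / (c21 * p1))        # right side of Equation (12)
    return 1 if lhs > rhs else 2

# Hypothetical sample means and pooled covariance
xbar1, xbar2 = np.array([0.0, 0.0]), np.array([2.0, 2.0])
Sp = np.eye(2)
print(ecm_allocate(np.array([0.0, 0.0]), xbar1, xbar2, Sp, 0.5, 0.5, 1.0, 1.0))  # prints 1
```

With equal costs and equal priors the right side is log(1) = 0, and the rule reduces to allocating by which sample mean x0 is closer to in the Sp metric.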

Exercise

1. Page 650, Exercise 11.1.

2. Suppose that n1 = 11 and n2 = 12 observations are sampled from two different bivariate
normal distributions that have a common covariance matrix Σ and possibly different mean
vectors µ1 and µ2 . The sample mean vectors and pooled covariance matrix are:

a) Report the estimate of the formula for Fisher's linear discriminant function. Explain how this function is used to classify.

b) Consider an observation x0 = [1, 2]T on a new experimental unit. Was this unit more
likely to have come from population 1 or population 2? (Assume equal misclassification
costs and equal prior probabilities).

c) Classify the unit in part (b) assuming prior probabilities .35 and .65 of observing a unit
from populations 1 and 2, respectively. Also, assume the cost of misclassifying a unit
from population 1 into population 2 is ten times greater than the cost of misclassifying
a unit from population 2 into population 1.
