Multivariate Analysis (Slides 8)
Discriminant Analysis
• We want to be able to use knowledge of labelled data (i.e., those whose
group membership is known) in order to classify the group membership of
unlabelled data.
• We previously considered the k-nearest neighbours technique for this
problem.
• We shall now consider the alternative approaches of:
– LDA (linear discriminant analysis)
– QDA (quadratic discriminant analysis)
LDA & QDA
• Unlike k-Nearest Neighbours (and all the other techniques covered so far),
both LDA and QDA assume a probability distribution for the data.
• Once we introduce distributions (and parameters for these distributions),
we can quantify our uncertainty over the structure of the data.
• As far as classification is concerned, this means that we can consider the
probability of group assignment.
• The distinction between a point that is assigned a probability of 0.51 to one
group and 0.49 to another, against a point that is assigned a probability of
0.99 to one group and 0.01 to another, can be quite important.
Multivariate Normal Distribution
• Let x^T = (x_1, x_2, ..., x_m), where x_1, x_2, ..., x_m are random variables.
• The Multi-Variate Normal (MVN) distribution has two parameters:
– Mean µ, an m-dimensional vector.
– Covariance matrix Σ, with dimension m × m.
• A vector x is said to follow a MVN distribution, denoted x ∼ MVN(µ, Σ),
if it has the following probability density function:

  f(x | µ, Σ) = (2π)^{-m/2} |Σ|^{-1/2} exp{ -(1/2) (x − µ)^T Σ^{-1} (x − µ) }
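• As an illustration (not part of the original slides), the density above can be evaluated directly and checked against scipy.stats.multivariate_normal; the mean and covariance below are made-up example values.

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 0.0])                      # example mean vector (m = 2)
Sigma = np.array([[1.0, 0.8],
                  [0.8, 3.0]])                 # example covariance matrix

def mvn_density(x, mu, Sigma):
    """Evaluate f(x | mu, Sigma) from the formula on the slide."""
    m = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)          # (x - mu)^T Sigma^{-1} (x - mu)
    norm_const = (2 * np.pi) ** (m / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm_const

x = np.array([1.0, -0.5])
print(mvn_density(x, mu, Sigma))                        # direct evaluation
print(multivariate_normal(mean=mu, cov=Sigma).pdf(x))   # scipy, should agree
```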
Multivariate Normal Distribution
• The MVN distribution is very useful when modelling multivariate data.
• Notice that the region where the density exceeds any threshold C > 0 is an ellipsoidal region centred at µ:

  {x : f(x | µ, Σ) > C} = { x : (x − µ)^T Σ^{-1} (x − µ) < −2 log[ C (2π)^{m/2} |Σ|^{1/2} ] }
Normal Contours
• For example, the contour plot of a MVN with mean µ = (0, 0)^T and covariance

  Σ = [ 1.0  0.8 ]
      [ 0.8  3.0 ]

  is:

  [Figure: contour plot of this MVN density]
Normal Contours: Data
• Sampling from this distribution and overlaying the results on the contour
plot gives:
  [Figure: points sampled from this MVN overlaid on its contour plot]
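• A minimal sketch (assuming numpy, scipy and matplotlib are available) of how such a figure could be produced, reusing the illustrative parameters from the earlier contour plot.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.8],
                  [0.8, 3.0]])

# Density evaluated on a grid, for the contour plot
xs, ys = np.meshgrid(np.linspace(-4, 4, 200), np.linspace(-6, 6, 200))
density = multivariate_normal(mean=mu, cov=Sigma).pdf(np.dstack([xs, ys]))

# Sample from the same distribution and overlay the points on the contours
sample = rng.multivariate_normal(mu, Sigma, size=200)
plt.contour(xs, ys, density)
plt.scatter(sample[:, 0], sample[:, 1], s=10, alpha=0.6)
plt.show()
```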
Shape of Scatter
• If we assume that the data within each group follows a MVN distribution
with mean µk and covariance Σk , then we also assume that the scatter is
roughly elliptical.
• The mean sets the location of this scatter and the covariance sets the shape
of the ellipse.
  [Figure: two scatter plots illustrating roughly elliptical within-group scatter]
Mahalanobis Distance
• The Mahalanobis distance from a point x to a mean µ is D, where D² = (x − µ)^T Σ^{-1} (x − µ).
• Two points have the same Mahalanobis distance if they are on the same
ellipsoid centered on µ (as defined earlier).
  [Figure: elliptical contours centred on µ; points on the same contour have the same Mahalanobis distance from µ]
Which Is Closest?
• Suppose we wish to find the mean µk that a point x is closest to as
measured by Mahalanobis distance.
• That is, we want to find the k that minimizes the expression:
  (x − µ_k)^T Σ_k^{-1} (x − µ_k)
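• A minimal sketch of this rule in Python; the group means and covariance matrices below are invented purely for illustration.

```python
import numpy as np

def mahalanobis_sq(x, mu, Sigma):
    """Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu)."""
    diff = x - mu
    return float(diff @ np.linalg.solve(Sigma, diff))

# Invented parameters for two groups of 2-dimensional data
mus = [np.array([0.0, 0.0]), np.array([3.0, 1.0])]
Sigmas = [np.array([[1.0, 0.8], [0.8, 3.0]]),
          np.array([[2.0, -0.3], [-0.3, 1.0]])]

x = np.array([1.5, 0.5])
d2 = [mahalanobis_sq(x, mu, S) for mu, S in zip(mus, Sigmas)]
print("closest group (0-indexed):", int(np.argmin(d2)))
```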
When Covariance is Equal
• If Σ_k = Σ for all k, then the previous expression becomes:

  (x − µ_k)^T Σ^{-1} (x − µ_k) = x^T Σ^{-1} x − 2 x^T Σ^{-1} µ_k + µ_k^T Σ^{-1} µ_k

• Since x^T Σ^{-1} x does not depend on k, minimizing this over k is equivalent to maximizing the linear function x^T Σ^{-1} µ_k − (1/2) µ_k^T Σ^{-1} µ_k.
Estimating Equal Covariance
• In LDA we need to pool the covariance matrices of individual classes.
• Remember that the sample covariance matrix Q for a set of n observations of dimension m is the matrix whose elements are

  q_ij = (1 / (n − 1)) ∑_{k=1}^{n} (x_{ki} − x̄_i)(x_{kj} − x̄_j)

  for i = 1, 2, ..., m and j = 1, 2, ..., m.
• Then, with g groups, where group l contains n_l observations with sample covariance matrix Q_l and n = n_1 + ... + n_g, the pooled covariance matrix is defined as:

  Q_p = (1 / (n − g)) ∑_{l=1}^{g} (n_l − 1) Q_l

• Hence:

  W = ∑_{l=1}^{g} (n_l − 1) Q_l = (n − g) Q_p

  where W denotes the within-group sum of squares and cross-products matrix.
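• A minimal sketch of this estimate, assuming the data are held as a list of per-class numpy arrays with one observation per row; the example data are simulated.

```python
import numpy as np

def pooled_covariance(groups):
    """Q_p = (1 / (n - g)) * sum_l (n_l - 1) Q_l over the g groups."""
    g = len(groups)
    n = sum(len(X) for X in groups)
    m = groups[0].shape[1]
    W = np.zeros((m, m))                       # within-group SSCP matrix
    for X in groups:
        Q_l = np.cov(X, rowvar=False)          # sample covariance of group l
        W += (len(X) - 1) * Q_l
    return W / (n - g)

# Simulated example: two groups of 2-dimensional observations
rng = np.random.default_rng(1)
groups = [rng.normal(size=(20, 2)), rng.normal(loc=2.0, size=(30, 2))]
print(pooled_covariance(groups))
```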
Modelling Assumptions
• Discriminant analysis assumes that observations from group k follow a
MVN distribution with mean µk and covariance Σk .
• That is
  f(x_i | i ∈ k) = f(x_i | µ_k, Σ_k) = (2π)^{-m/2} |Σ_k|^{-1/2} exp{ -(1/2) (x_i − µ_k)^T Σ_k^{-1} (x_i − µ_k) }
• Hence,
  P(i ∈ k | x_i) > P(i ∈ l | x_i)
    ⇔ π_k f(x_i | µ_k, Σ_k) > π_l f(x_i | µ_l, Σ_l)
    ⇔ log π_k − (1/2) log|Σ_k| − (1/2)(x_i − µ_k)^T Σ_k^{-1} (x_i − µ_k)
        > log π_l − (1/2) log|Σ_l| − (1/2)(x_i − µ_l)^T Σ_l^{-1} (x_i − µ_l)
Linear Discriminant Analysis
• If equal covariances are assumed, then P(i ∈ k | x_i) > P(i ∈ l | x_i) if and only if:

  log π_k + x_i^T Σ^{-1} µ_k − (1/2) µ_k^T Σ^{-1} µ_k > log π_l + x_i^T Σ^{-1} µ_l − (1/2) µ_l^T Σ^{-1} µ_l.
• Hence the name linear discriminant analysis.
• If πk = 1/K for all k, then this reduces further:
  ( x_i − (1/2)(µ_k + µ_l) )^T Σ^{-1} (µ_k − µ_l) > 0
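• A minimal sketch of the LDA rule above (shared covariance matrix; the means, covariance and priors are invented for illustration); a point is assigned to the class with the largest linear score.

```python
import numpy as np

def lda_scores(x, mus, Sigma, priors):
    """log pi_k + x^T Sigma^{-1} mu_k - 0.5 mu_k^T Sigma^{-1} mu_k for each class k."""
    Sigma_inv = np.linalg.inv(Sigma)
    return np.array([np.log(pi) + x @ Sigma_inv @ mu - 0.5 * mu @ Sigma_inv @ mu
                     for mu, pi in zip(mus, priors)])

mus = [np.array([0.0, 0.0]), np.array([3.0, 1.0])]   # invented group means
Sigma = np.array([[1.0, 0.5], [0.5, 2.0]])           # shared covariance matrix
priors = [0.5, 0.5]

x = np.array([1.5, 0.5])
print("assigned class:", int(np.argmax(lda_scores(x, mus, Sigma, priors))))
```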
Quadratic Discriminant Analysis
• No simplification arises in the unequal covariance case, hence P(i ∈ k | x_i) > P(i ∈ l | x_i) if and only if:

  log π_k − (1/2) log|Σ_k| − (1/2)(x_i − µ_k)^T Σ_k^{-1} (x_i − µ_k)
    > log π_l − (1/2) log|Σ_l| − (1/2)(x_i − µ_l)^T Σ_l^{-1} (x_i − µ_l)
• Hence the name quadratic discriminant analysis.
• If π_k = 1/K for all k, the log π terms cancel, but the rule remains quadratic in x_i.
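• The corresponding sketch for QDA, now with one (invented) covariance matrix per class; the scores are quadratic in x.

```python
import numpy as np

def qda_scores(x, mus, Sigmas, priors):
    """log pi_k - 0.5 log|Sigma_k| - 0.5 (x - mu_k)^T Sigma_k^{-1} (x - mu_k)."""
    scores = []
    for mu, S, pi in zip(mus, Sigmas, priors):
        diff = x - mu
        quad = diff @ np.linalg.solve(S, diff)
        scores.append(np.log(pi) - 0.5 * np.log(np.linalg.det(S)) - 0.5 * quad)
    return np.array(scores)

mus = [np.array([0.0, 0.0]), np.array([3.0, 1.0])]   # invented group means
Sigmas = [np.array([[1.0, 0.8], [0.8, 3.0]]),        # one covariance per group
          np.array([[2.0, -0.3], [-0.3, 1.0]])]
priors = [0.5, 0.5]

x = np.array([1.5, 0.5])
print("assigned class:", int(np.argmax(qda_scores(x, mus, Sigmas, priors))))
```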
Summary
• In LDA the decision boundary between class k and class l is given by:
  log[ P(k|x) / P(l|x) ] = log(π_k / π_l) + log[ f(x|k) / f(x|l) ] = 0
• Unlike k-nearest neighbours, both LDA and QDA are model-based classifiers in which P(data | group) is assumed to follow a MVN distribution:
  – The model-based assumption means that probabilities of class membership can be computed.
  – The MVN assumption means that the scatter within each group is assumed to be roughly elliptical.
• Whilst LDA assumes that all groups share the same covariance matrix, QDA permits different covariance structures between groups.
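• Both classifiers are also available off the shelf; a minimal sketch assuming scikit-learn, fitted to simulated labelled data, which also returns the class-membership probabilities discussed above.

```python
import numpy as np
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)

# Simulated labelled data: two MVN groups with different covariances
rng = np.random.default_rng(2)
X = np.vstack([rng.multivariate_normal([0, 0], [[1, 0.8], [0.8, 3]], 50),
               rng.multivariate_normal([3, 1], [[2, -0.3], [-0.3, 1]], 50)])
y = np.array([0] * 50 + [1] * 50)                  # known group labels

lda = LinearDiscriminantAnalysis().fit(X, y)       # pooled (shared) covariance
qda = QuadraticDiscriminantAnalysis().fit(X, y)    # one covariance per group

x_new = np.array([[1.5, 0.5]])
print(lda.predict_proba(x_new))    # P(group | x) under the LDA model
print(qda.predict_proba(x_new))    # P(group | x) under the QDA model
```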