
Multivariate Analysis (slides 8)

• Today we consider Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA).
• These are used if it is assumed that there exists a set of k groups within the
data and that there is a subset of the data that is labelled, i.e., whose group
membership is known.
• Discriminant analysis refers to a set of ‘supervised’ statistical techniques
where the class information is used to help reveal the structure of the data.
• This structure then allows the ‘classification’ of future observations.

Discriminant Analysis
• We want to be able to use knowledge of labelled data (i.e., those whose
group membership is known) in order to classify the group membership of
unlabelled data.
• We previously considered the k-nearest neighbours technique for this
problem.
• We shall now consider the alternative approaches of:
– LDA (linear discriminant analysis)
– QDA (quadratic discriminant analysis)

LDA & QDA
• Unlike k-Nearest Neighbours (and all the other techniques so far covered),
both LDA and QDA assume the use of a distribution over the data.
• Once we introduce distributions (and parameters for these distributions),
we can quantify our uncertainty over the structure of the data.
• As far as classification is concerned, this means that we can consider the
probability of group assignment.
• The distinction between a point that is assigned probabilities of 0.51 and 0.49 for two groups, and a point that is assigned probabilities of 0.99 and 0.01, can be quite important.

Multivariate Normal Distribution
• Let xᵀ = (x1, x2, ..., xm), where x1, x2, ..., xm are random variables.
• The Multi-Variate Normal (MVN) distribution has two parameters:
– Mean µ, an m-dimensional vector.
– Covariance matrix Σ, with dimension m × m.
• A vector x is said to follow a MVN distribution, denoted x ∼ MVN(µ, Σ), if it has the following probability density function:

      f(x | µ, Σ) = (2π)^(−m/2) |Σ|^(−1/2) exp[ −(1/2) (x − µ)ᵀ Σ⁻¹ (x − µ) ]

• Here |Σ| denotes the determinant of Σ.
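
• A minimal Python sketch of this density calculation, assuming NumPy and SciPy are available (the evaluation point x and the parameter values are hypothetical):

    # Evaluate the MVN density directly from the formula and with
    # scipy.stats.multivariate_normal, to check the two agree.
    import numpy as np
    from scipy.stats import multivariate_normal

    mu = np.array([0.0, 0.0])                 # mean vector (m = 2)
    Sigma = np.array([[1.0, 0.8],
                      [0.8, 3.0]])            # covariance matrix
    x = np.array([1.0, -0.5])                 # hypothetical point

    m = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.inv(Sigma) @ diff  # (x - mu)^T Sigma^{-1} (x - mu)
    f_manual = np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** m * np.linalg.det(Sigma))

    f_scipy = multivariate_normal(mean=mu, cov=Sigma).pdf(x)
    print(f_manual, f_scipy)                   # the two values should match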

Multivariate Normal Distribution
• The MVN distribution is very useful when modelling multivariate data.
• Notice:
      {x : f(x | µ, Σ) > C} = { x : (x − µ)ᵀ Σ⁻¹ (x − µ) < −2 log[ C (2π)^(m/2) |Σ|^(1/2) ] }

• This corresponds to an m-dimensional ellipsoid centered at point µ.


• If it is assumed that the data within a group k follows a MVN distribution
with mean µk and covariance Σk , then the scatter of the data should be
roughly elliptical.
• The mean fixes the location of the scatter and the covariance affects the
shape of the ellipsoid.

Normal Contours
   
• For example, the contour plot of a MVN with mean µ = (0, 0)ᵀ and covariance Σ = [[1, 0.8], [0.8, 3]] is:
[Figure: contour plot of the MVN density, showing elliptical contours]

Normal Contours: Data
• Sampling from this distribution and overlaying the results on the contour
plot gives:
[Figure: points sampled from the MVN overlaid on its contour plot]
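
• A rough Python sketch of this kind of figure, assuming NumPy, SciPy and Matplotlib are available (the sample size and plotting grid are arbitrary choices):

    # Sample from the MVN above and overlay the points on its density contours.
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.stats import multivariate_normal

    mu = np.array([0.0, 0.0])
    Sigma = np.array([[1.0, 0.8],
                      [0.8, 3.0]])
    rv = multivariate_normal(mean=mu, cov=Sigma)

    rng = np.random.default_rng(1)
    sample = rng.multivariate_normal(mu, Sigma, size=200)   # draw 200 points
    xs, ys = np.meshgrid(np.linspace(-4, 4, 200), np.linspace(-6, 6, 200))
    density = rv.pdf(np.dstack((xs, ys)))                   # density on the grid

    plt.contour(xs, ys, density)                            # elliptical contours
    plt.scatter(sample[:, 0], sample[:, 1], s=8)            # sampled points
    plt.show()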

Shape of Scatter
• If we assume that the data within each group follows a MVN distribution
with mean µk and covariance Σk , then we also assume that the scatter is
roughly elliptical.
• The mean sets the location of this scatter and the covariance sets the shape
of the ellipse.

[Figure: two simulated MVN scatters, differing in mean (location) and covariance (elliptical shape)]
Mahalanobis Distance
• The Mahalanobis distance from a point x to a mean µ is D, where

      D² = (x − µ)ᵀ Σ⁻¹ (x − µ).

• Two points have the same Mahalanobis distance if they are on the same
ellipsoid centered on µ (as defined earlier).
[Figure: elliptical contour centred at µ; points on the same ellipsoid are equidistant from µ in Mahalanobis distance]
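
• A small Python sketch of the Mahalanobis distance, checked against SciPy's mahalanobis function (the point and parameter values are hypothetical):

    # Compute D from the definition and compare with scipy.spatial.distance.
    import numpy as np
    from scipy.spatial.distance import mahalanobis

    mu = np.array([0.0, 0.0])
    Sigma = np.array([[1.0, 0.8],
                      [0.8, 3.0]])
    Sigma_inv = np.linalg.inv(Sigma)
    x = np.array([2.0, 1.0])                  # hypothetical point

    diff = x - mu
    D = np.sqrt(diff @ Sigma_inv @ diff)      # D^2 = (x - mu)^T Sigma^{-1} (x - mu)
    print(D, mahalanobis(x, mu, Sigma_inv))   # the two values should agree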
Which Is Closest?
• Suppose we wish to find the mean µk that a point x is closest to as
measured by Mahalanobis distance.
• That is, we want to find the k that minimizes the expression:

      (x − µk)ᵀ Σk⁻¹ (x − µk)

• The point x is closer to µk than it is to µl (under Mahalanobis distance) when:

      (x − µk)ᵀ Σk⁻¹ (x − µk) < (x − µl)ᵀ Σl⁻¹ (x − µl).

• This is a quadratic expression in x.
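
• A short Python sketch of this nearest-mean rule with a separate covariance per group (all parameter values are hypothetical):

    # Find which group mean a point is closest to in Mahalanobis distance,
    # allowing a different covariance matrix per group (the quadratic case).
    import numpy as np

    x = np.array([1.0, 2.0])                                  # hypothetical point
    means = [np.array([0.0, 0.0]), np.array([3.0, 1.0])]      # mu_k for k = 1, 2
    covs = [np.array([[1.0, 0.8], [0.8, 3.0]]),
            np.array([[2.0, 0.0], [0.0, 0.5]])]               # Sigma_k for k = 1, 2

    d2 = [(x - mu) @ np.linalg.inv(S) @ (x - mu) for mu, S in zip(means, covs)]
    closest = int(np.argmin(d2))                              # index of nearest mean
    print(d2, closest)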

When Covariance is Equal
• If Σk = Σ for all k, then the previous expression becomes:

      (x − µk)ᵀ Σ⁻¹ (x − µk) < (x − µl)ᵀ Σ⁻¹ (x − µl).

• Expanding both sides, the common quadratic term xᵀ Σ⁻¹ x cancels, so this can be simplified to:

      −2 xᵀ Σ⁻¹ µk + µkᵀ Σ⁻¹ µk < −2 xᵀ Σ⁻¹ µl + µlᵀ Σ⁻¹ µl

      ⇔ −2 µkᵀ Σ⁻¹ x + µkᵀ Σ⁻¹ µk < −2 µlᵀ Σ⁻¹ x + µlᵀ Σ⁻¹ µl

• This is now a linear expression in x.
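
• A quick numerical check, in Python, that the quadratic and linear forms of the comparison give the same decision (parameter values are hypothetical):

    # With a common covariance Sigma, the x^T Sigma^{-1} x term is identical on
    # both sides, so the comparison reduces to a rule that is linear in x.
    import numpy as np

    Sigma_inv = np.linalg.inv(np.array([[1.0, 0.8], [0.8, 3.0]]))
    mu_k, mu_l = np.array([0.0, 0.0]), np.array([3.0, 1.0])   # hypothetical means
    x = np.array([1.0, 2.0])

    lhs = (x - mu_k) @ Sigma_inv @ (x - mu_k) < (x - mu_l) @ Sigma_inv @ (x - mu_l)
    rhs = (-2 * mu_k @ Sigma_inv @ x + mu_k @ Sigma_inv @ mu_k
           < -2 * mu_l @ Sigma_inv @ x + mu_l @ Sigma_inv @ mu_l)
    print(lhs, rhs)                                           # same decision either way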

Estimating Equal Covariance
• In LDA we need to pool the covariance matrices of individual classes.
• Remember that the sample covariance matrix Q for a set of n observations of dimension m is the matrix whose elements are

      qij = (1 / (n − 1)) ∑_{k=1}^{n} (xki − x̄i)(xkj − x̄j)

  for i = 1, 2, . . . , m and j = 1, 2, . . . , m.
• Then the pooled covariance matrix is defined as:

      Qp = (1 / (n − g)) ∑_{l=1}^{g} (nl − 1) Ql

  where g is the number of classes, Ql is the estimated sample covariance matrix for class l, nl is the number of data points in class l, whilst n is the total number of data points.
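
• A minimal Python sketch of the pooled covariance calculation for g = 2 classes, using NumPy's sample covariance on hypothetical data:

    # Pooled covariance Q_p from per-class sample covariances.
    import numpy as np

    rng = np.random.default_rng(0)
    X1 = rng.normal(size=(40, 2))            # n_1 = 40 points in class 1 (hypothetical)
    X2 = rng.normal(size=(60, 2))            # n_2 = 60 points in class 2 (hypothetical)

    Q1 = np.cov(X1, rowvar=False)            # sample covariance (divides by n_1 - 1)
    Q2 = np.cov(X2, rowvar=False)
    n1, n2, g = len(X1), len(X2), 2
    n = n1 + n2

    Qp = ((n1 - 1) * Q1 + (n2 - 1) * Q2) / (n - g)
    print(Qp)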
Estimating Equal Covariance
• This formula arises from summing the squares and cross products over data
points in all classes:

      Wij = ∑_{l=1}^{g} ∑_{k=1}^{nl} (xki − x̄li)(xkj − x̄lj)

for i = 1, . . . , m and j = 1, . . . , m.
• Hence:

      W = ∑_{l=1}^{g} (nl − 1) Ql

• Given n data points falling in g groups, we have n − g degrees of freedom because we need to estimate the g group means.
• This results in the previous formula for the pooled covariance matrix:

      Qp = W / (n − g)
Modelling Assumptions
• Both LDA and QDA are parametric statistical methods.
• To classify a new observation x into one of the known K groups, we need P(x ∈ k | x) for k = 1, . . . , K.
• That is, we need to know the posterior probability of belonging to each group, given the data.
• We then classify the new observation as belonging to the class which has
largest posterior probability.
• Bayes’ Theorem states that the posterior probability of observation x belonging to group k is:

      P(x ∈ k | x) = πk f(x | x ∈ k) / ∑_{l=1}^{K} πl f(x | x ∈ l)
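
• A small Python sketch of this posterior calculation for two groups with assumed (hypothetical) means, covariances and priors:

    # Posterior group probabilities via Bayes' theorem with MVN group densities.
    import numpy as np
    from scipy.stats import multivariate_normal

    x = np.array([1.0, 0.5])                                  # new observation (hypothetical)
    means = [np.array([0.0, 0.0]), np.array([3.0, 1.0])]
    covs = [np.array([[1.0, 0.8], [0.8, 3.0]]),
            np.array([[2.0, 0.0], [0.0, 0.5]])]
    priors = np.array([0.5, 0.5])                             # pi_k, summing to 1

    likelihoods = np.array([multivariate_normal(m, S).pdf(x)
                            for m, S in zip(means, covs)])
    posterior = priors * likelihoods / np.sum(priors * likelihoods)
    print(posterior, posterior.argmax())                      # classify to the maximum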

Modelling Assumptions
• Discriminant analysis assumes that observations from group k follow a
MVN distribution with mean µk and covariance Σk .
• That is
      f(xi | i ∈ k) = f(xi | µk, Σk) = (2π)^(−m/2) |Σk|^(−1/2) exp[ −(1/2) (xi − µk)ᵀ Σk⁻¹ (xi − µk) ]

• Discriminant analysis (as presented here) also assumes values for πk = P(i ∈ k), which is the proportion of population objects belonging to class k (this can be known or estimated).
• Note that ∑_{k=1}^{K} πk = 1.
• Typically, πk = 1/K is used.
• πk are sometimes referred to as prior probabilities.
• We can then compute P(i ∈ k|x) and assign data points to groups so as to
maximise this probability.
Some Calculations
• The probability of an observation i belonging to group k conditional on xi
being known satisfies:

      P(i ∈ k | xi) ∝ πk f(xi | µk, Σk).

• Hence,

      P(i ∈ k | xi) > P(i ∈ l | xi) ⇔ πk f(xi | µk, Σk) > πl f(xi | µl, Σl)

• Taking logarithms and substituting in the probability density function for a MVN distribution, we find after simplification:

      log πk − (1/2) log |Σk| − (1/2) (xi − µk)ᵀ Σk⁻¹ (xi − µk)
        > log πl − (1/2) log |Σl| − (1/2) (xi − µl)ᵀ Σl⁻¹ (xi − µl)
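
• A Python sketch of this comparison written as a per-group score; the group with the larger score has the larger posterior probability (all parameter values are hypothetical):

    import numpy as np

    def discriminant_score(x, pi_k, mu_k, Sigma_k):
        # log pi_k - 0.5 log|Sigma_k| - 0.5 (x - mu_k)^T Sigma_k^{-1} (x - mu_k)
        _, logdet = np.linalg.slogdet(Sigma_k)
        diff = x - mu_k
        return np.log(pi_k) - 0.5 * logdet - 0.5 * diff @ np.linalg.inv(Sigma_k) @ diff

    x = np.array([1.0, 0.5])
    score_k = discriminant_score(x, 0.5, np.array([0.0, 0.0]),
                                 np.array([[1.0, 0.8], [0.8, 3.0]]))
    score_l = discriminant_score(x, 0.5, np.array([3.0, 1.0]),
                                 np.array([[2.0, 0.0], [0.0, 0.5]]))
    print(score_k > score_l)   # True means x is assigned to group k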
Linear Discriminant Analysis
• If equal covariances are assumed, then P(i ∈ k | xi) > P(i ∈ l | xi) if and only if:

      log πk + xiᵀ Σ⁻¹ µk − (1/2) µkᵀ Σ⁻¹ µk > log πl + xiᵀ Σ⁻¹ µl − (1/2) µlᵀ Σ⁻¹ µl.
• Hence the name linear discriminant analysis.
• If πk = 1/K for all k, then this reduces further to:

      (xi − (1/2)(µk + µl))ᵀ Σ⁻¹ (µk − µl) > 0
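
• A minimal Python sketch of the linear discriminant score under a shared covariance (parameter values are hypothetical):

    import numpy as np

    def lda_score(x, pi_k, mu_k, Sigma_inv):
        # log pi_k + x^T Sigma^{-1} mu_k - 0.5 mu_k^T Sigma^{-1} mu_k
        return np.log(pi_k) + x @ Sigma_inv @ mu_k - 0.5 * mu_k @ Sigma_inv @ mu_k

    Sigma_inv = np.linalg.inv(np.array([[1.0, 0.8], [0.8, 3.0]]))
    mu_k, mu_l = np.array([0.0, 0.0]), np.array([3.0, 1.0])
    x = np.array([1.0, 2.0])
    # classify x to group k when its score exceeds group l's score
    print(lda_score(x, 0.5, mu_k, Sigma_inv) > lda_score(x, 0.5, mu_l, Sigma_inv))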

Quadratic Discriminant Analysis
• No simplification arises in the unequal covariance case, hence
P(i ∈ k|xi ) > P(i ∈ l|xi ) if and only if:
      log πk − (1/2) log |Σk| − (1/2) (xi − µk)ᵀ Σk⁻¹ (xi − µk)
        > log πl − (1/2) log |Σl| − (1/2) (xi − µl)ᵀ Σl⁻¹ (xi − µl)
• Hence the name quadratic discriminant analysis.
• If πk = 1/K for all k, then some simplification arises.

Summary
• In LDA the decision boundary between class k and class l is given by:
      log [ P(k | x) / P(l | x) ] = log(πk / πl) + log [ f(x | k) / f(x | l) ] = 0

• Unlike k-nearest neighbours, both LDA and QDA are model-based classifiers where P(data | group) is assumed to follow a MVN distribution:
– The model-based assumption allows for the generation of the probability of class membership.
– The MVN assumption means that groups are assumed to follow an
elliptical shape.
• Whilst LDA assumes groups have the same covariance matrix, QDA
permits different covariance structures between groups.
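
• For practical use, scikit-learn provides implementations of both methods; a minimal sketch on hypothetical simulated data:

    import numpy as np
    from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                               QuadraticDiscriminantAnalysis)

    rng = np.random.default_rng(0)
    X = np.vstack([rng.multivariate_normal([0, 0], [[1, 0.8], [0.8, 3]], 50),
                   rng.multivariate_normal([3, 1], [[1, 0.8], [0.8, 3]], 50)])
    y = np.repeat([0, 1], 50)                         # known group labels

    lda = LinearDiscriminantAnalysis().fit(X, y)      # shared covariance
    qda = QuadraticDiscriminantAnalysis().fit(X, y)   # per-group covariances
    x_new = np.array([[1.0, 0.5]])
    print(lda.predict_proba(x_new), qda.predict_proba(x_new))

• By default both scikit-learn estimators take the priors πk from the class proportions in the training data, rather than πk = 1/K.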
