
Lecture 14: Discriminant Analysis

CS109A Introduction to Data Science


Pavlos Protopapas and Kevin Rader
Lecture Outline

• Discriminant Analysis
• LDA for one predictor
• LDA for p > 1
• QDA
• Comparison of Classification Methods (so far)



Recall the Heart Data (for classification)
The response variable Y is Yes/No.

Age Sex ChestPain RestBP Chol Fbs RestECG MaxHR ExAng Oldpeak Slope Ca Thal AHD

63 1 typical 145 233 1 2 150 0 2.3 3 0.0 fixed No

67 1 asymptomatic 160 286 0 2 108 1 1.5 2 3.0 normal Yes

67 1 asymptomatic 120 229 0 2 129 1 2.6 2 2.0 reversable Yes

37 1 nonanginal 130 250 0 0 187 0 3.5 3 0.0 normal No

41 0 nontypical 130 204 0 2 172 0 1.4 1 0.0 normal No



Discriminant Analysis for Classification



Linear Discriminant Analysis (LDA)

Linear discriminant analysis (LDA) takes a different approach to classification than logistic regression. Rather than attempting to model the conditional distribution of Y given X, P(Y = k|X = x), LDA models the distribution of the predictors X given the different categories that Y takes on, P(X = x|Y = k).

In order to flip these distributions around to model P(Y = k|X = x), an analyst uses Bayes' theorem.

In this setting with one feature (one X), Bayes' theorem can then
be written as:
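In the usual notation, writing π_k = P(Y = k) for the prior and f_k(x) = P(X = x|Y = k) for the class-conditional density, the standard Bayes' rule expression is:

$$P(Y = k \mid X = x) \;=\; \frac{\pi_k\, f_k(x)}{\sum_{l=1}^{K} \pi_l\, f_l(x)}$$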

What does this mean?


LDA (cont.)

The left-hand side, P(Y = k|X = x), is called the posterior probability and gives the probability that the observation is in the kth category given that the feature, X, takes on a specific value, x. The numerator on the right is the conditional distribution of the feature within category k, f_k(x), times the prior probability that the observation is in the kth category, π_k.

The Bayes classifier then assigns the observation to the group for which the posterior probability is the largest.

Inventor of LDA: R.A. Fisher
The 'Father' of Statistics. More famous for work in genetics
(statistically concluded that Mendel's genetic experiments were
'massaged').
Novel statistical work includes:
• Experimental Design
• ANOVA
• F-test (why do you think it's called the F-test?)
• Exact test for 2 x 2 tables
• Maximum Likelihood Theory
• Use of the α = 0.05 significance level: “The value for which P = .05,
or 1 in 20, is 1.96 or nearly 2; it is convenient to take this point
as a limit in judging whether a deviation is to be considered
significant or not”.
• And so much more...

LDA for one predictor

LDA has the simplest form when there is just one predictor/feature (p = 1). In order to estimate f_k(x), we have to assume it comes from a specific distribution. If X is quantitative, what distribution do you think we should use? One common assumption is that f_k(x) comes from a Normal distribution:
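That is, within class k the density is a Normal with class-specific mean μ_k and variance σ_k²:

$$f_k(x) \;=\; \frac{1}{\sqrt{2\pi}\,\sigma_k}\,\exp\!\left(-\frac{(x-\mu_k)^2}{2\sigma_k^2}\right)$$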

In shorthand notation, this is often written as X | Y = k ∼ N(μ_k, σ_k²), meaning the distribution of the feature X within category k is Normally distributed with mean μ_k and variance σ_k².

LDA for one predictor (cont.)
An extra assumption that the variances are equal across classes, σ_1² = ... = σ_K² = σ², will simplify our lives.
Plugging this assumed likelihood into the Bayes' formula
(to get the posterior) results in:
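With a common variance σ² the Normal normalizing constants cancel, giving:

$$P(Y = k \mid X = x) \;=\; \frac{\pi_k\,\exp\!\left(-\frac{(x-\mu_k)^2}{2\sigma^2}\right)}{\sum_{l=1}^{K} \pi_l\,\exp\!\left(-\frac{(x-\mu_l)^2}{2\sigma^2}\right)}$$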

The Bayes classifier assigns an observation with feature value x to the class that maximizes this posterior. How should we maximize? Since the log is a monotone transformation, we can take the log of this expression and rearrange to simplify our maximization...

LDA for one predictor (cont.)

So we maximize the following simplified expression:
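In the usual notation this simplified expression, often written δ_k(x), is:

$$\delta_k(x) \;=\; x \cdot \frac{\mu_k}{\sigma^2} \;-\; \frac{\mu_k^2}{2\sigma^2} \;+\; \log \pi_k$$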

How does this simplify if we have just two classes (K = 2) and if we set our prior probabilities to be equal? This is equivalent to choosing a decision boundary for x for which:
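With K = 2 and π_1 = π_2, setting δ_1(x) = δ_2(x) gives the familiar midpoint rule:

$$x \;=\; \frac{\mu_1 + \mu_2}{2}$$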

Intuitively, why does this expression make sense? What do we use in practice?

LDA for one predictor (cont.)

In practice we don't know the true mean, variance, and prior. So we estimate them with the classical estimates and plug them into the expression (the standard plug-in estimators are sketched below), where n is the total sample size and n_k is the sample size within class k (thus, n = n_1 + ... + n_K).
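A sketch of the usual estimators (class-wise sample means and the pooled variance estimate):

$$\hat{\mu}_k \;=\; \frac{1}{n_k}\sum_{i:\,y_i = k} x_i
\qquad\qquad
\hat{\sigma}^2 \;=\; \frac{1}{n - K}\sum_{k=1}^{K}\,\sum_{i:\,y_i = k} \left(x_i - \hat{\mu}_k\right)^2$$
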
LDA for one predictor (cont.)

This classifier works great if the classes are about equal in proportion, but can easily be extended to unequal class sizes.
Instead of assuming all priors are equal, we instead set
the priors to match the 'prevalence' in the data set:
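That is, each prior is estimated by the corresponding sample proportion:

$$\hat{\pi}_k \;=\; \frac{n_k}{n}$$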

Note: we can use a prior probability from knowledge of the subject as well; for example, if we expect the test set to have a different prevalence than the training set. How could we do this in the Dem. vs. Rep. data set?

LDA for one predictor (cont.)

Plugging all of these estimates back into the original logged maximization formula we get:
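In the usual plug-in form:

$$\hat{\delta}_k(x) \;=\; x \cdot \frac{\hat{\mu}_k}{\hat{\sigma}^2} \;-\; \frac{\hat{\mu}_k^2}{2\hat{\sigma}^2} \;+\; \log \hat{\pi}_k$$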

Thus this classifier is called the linear discriminant classifier: this discriminant function is a linear function of x.


Illustration of LDA when p = 1



LDA when p > 1



LDA when p > 1

LDA generalizes 'nicely' to the case when there is more than one predictor.

Instead of assuming the one predictor is Normally distributed, it assumes that the set of predictors for each class is 'multivariate Normally distributed' (shorthand: MVN). What does that mean?

This means that the vector of X for an observation has a multidimensional Normal distribution with a mean vector, μ, and a covariance matrix, Σ.

Multivariate Normal Distribution

Here is a visualization of the Multivariate Normal distribution with 2 variables:
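A minimal sketch of how such a visualization could be produced in Python; the mean vector and covariance matrix below are illustrative values assumed for the picture, not values from the lecture:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal

# Illustrative (assumed) parameters of a bivariate Normal
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])

# Evaluate the density on a grid and draw its contours
x1, x2 = np.meshgrid(np.linspace(-4, 4, 200), np.linspace(-5, 5, 200))
grid = np.dstack((x1, x2))
density = multivariate_normal(mean=mu, cov=Sigma).pdf(grid)

plt.contourf(x1, x2, density, levels=20, cmap="viridis")
plt.xlabel("$x_1$")
plt.ylabel("$x_2$")
plt.title("Bivariate Normal density")
plt.colorbar(label="density")
plt.show()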



MVN Distribution

The joint PDF of the Multivariate Normal distribution, MVN(μ, Σ), is:
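In standard form:

$$f(x) \;=\; \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}}\,\exp\!\left(-\tfrac{1}{2}\,(x-\mu)^{T}\Sigma^{-1}(x-\mu)\right)$$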

where μ is a p-dimensional mean vector and |Σ| is the determinant of the p × p covariance matrix Σ.
Let's do a quick dimension analysis sanity check...
What do μ and Σ look like?


LDA when p > 1

Discriminant analysis in the multiple predictor case assumes the set of predictors for each class is then multivariate Normal: X | Y = k ∼ MVN(μ_k, Σ_k).

Just like with LDA for one predictor, we make an extra assumption that the covariances are equal in each group, Σ_1 = ... = Σ_K = Σ, in order to simplify our lives.

Now plugging this assumed likelihood into the Bayes' formula (to
get the posterior) results in:
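With MVN class-conditional densities f_k and common covariance Σ, the posterior keeps the same Bayes' rule form:

$$P(Y = k \mid X = x) \;=\; \frac{\pi_k\, f_k(x)}{\sum_{l=1}^{K} \pi_l\, f_l(x)},
\qquad
f_k(x) \;=\; \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}}\,\exp\!\left(-\tfrac{1}{2}\,(x-\mu_k)^{T}\Sigma^{-1}(x-\mu_k)\right)$$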



LDA when p > 1 (cont.)

Then doing the same steps as before (taking the log and maximizing), we see that the classification for an observation, based on its predictor vector x, will be the class that maximizes the discriminant function below (the maximum of the K values δ_1(x), ..., δ_K(x)).
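Following the usual multivariate LDA derivation, with common covariance Σ:

$$\delta_k(x) \;=\; x^{T}\Sigma^{-1}\mu_k \;-\; \tfrac{1}{2}\,\mu_k^{T}\Sigma^{-1}\mu_k \;+\; \log \pi_k$$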

Note: this is just the vector-matrix version of the formula we saw earlier in the lecture for p = 1.

What do we have to estimate now with the vector-matrix version? How many parameters are there? There are pK means, K prior proportions, and the p(p + 1)/2 distinct entries of the shared covariance matrix Σ to estimate.

LDA when K > 2

The linear discriminant nature of LDA still holds not only when p > 1, but also when K > 2. A picture can be very illustrative:



Quadratic Discriminant Analysis (QDA)



Quadratic Discriminant Analysis (QDA)

A generalization to linear discriminant analysis is quadratic discriminant analysis (QDA).

Why do you suppose the choice in name?

The implementation is just a slight variation on LDA. Instead of assuming the covariances of the MVN distributions within classes are equal, we instead allow them to be different.

This relaxation of an assumption completely changes the picture...

QDA in a picture

A picture can be very illustrative:



QDA (cont.)
When performing QDA, classification for an observation based on its predictors is equivalent to maximizing the following over the K classes:
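In the standard QDA form, with class-specific mean μ_k and covariance Σ_k:

$$\delta_k(x) \;=\; -\tfrac{1}{2}\log|\Sigma_k| \;-\; \tfrac{1}{2}\,(x-\mu_k)^{T}\Sigma_k^{-1}(x-\mu_k) \;+\; \log \pi_k$$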

Notice the 'quadratic form' of this expression. Hence the name QDA.
Now how many parameters are there to be estimated? There are pK means, K prior proportions, and K · p(p + 1)/2 covariance parameters (a separate p × p covariance matrix for each class) to estimate. This could slow us down very much if K is large...

Discriminant Analysis in Python

LDA is already implemented in Python via the sklearn.discriminant_analysis module, through the LinearDiscriminantAnalysis class.

QDA is in the same module and is the QuadraticDiscriminantAnalysis class.

It's very easy to use. Let's see how this works



Discriminant Analysis in Python (cont.)
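A minimal sketch (not the exact notebook from the lecture), assuming the Heart data is available as 'Heart.csv' with predictors 'Age' and 'MaxHR' and response 'AHD' (Yes/No):

import pandas as pd
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                            QuadraticDiscriminantAnalysis)

# Assumed file name and predictor choice -- adjust to the actual data set
heart = pd.read_csv("Heart.csv")
X = heart[["Age", "MaxHR"]].values
y = (heart["AHD"] == "Yes").astype(int).values

# Fit LDA and QDA with their default settings
lda = LinearDiscriminantAnalysis().fit(X, y)
qda = QuadraticDiscriminantAnalysis().fit(X, y)

print("LDA training accuracy:", lda.score(X, y))
print("QDA training accuracy:", qda.score(X, y))

# Posterior probabilities P(Y = 1 | X = x) for the first five observations
print(lda.predict_proba(X[:5])[:, 1])

Both estimators expose predict, predict_proba, and score, so they drop into the same workflow as LogisticRegression.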



QDA vs. LDA

So both QDA and LDA take a similar approach to solving this classification problem: they use Bayes' rule to flip the conditional probability statement and assume observations within each class are multivariate Normal (MVN) distributed.

QDA differs in that it does not assume a common covariance across classes for these MVNs. What advantage does this have? What disadvantage does this have?



QDA vs. LDA (cont.)

So generally speaking, when should QDA be used over LDA? LDA over QDA?

The extra covariance parameters that need to be estimated in QDA not only slow us down, but also allow for another opportunity for overfitting. Thus if your training set is small, LDA should perform better for out-of-sample prediction, aka, predicting future observations (how do we mimic this process?).



Comparison of Classification Methods (so far)



Comparison of Classification Methods (so far)

We have seen 3 major methods for doing classification:


• Logistic Regression
• k-NN
• Discriminant Analysis (LDA and QDA)
For a specific problem, which approach should be used?

Well of course, it depends on the nature of the data. So how should we decide?

Visualize the data!


Six Classification Models We'll Compare

Let's investigate which method will work the best (as measured by lowest overall classification error rate), by considering 6 different models for 4 different data sets (each data set has a pair of predictors... you can think of them as the first 2 PCA components, to come later in the lecture). The 6 models to consider are (sketched in code below):
• A logistic regression with only 'linear' main effects
• A logistic regression with 'linear' and 'quadratic' effects
• LDA
• QDA
• k-NN where k = 3
• k-NN where k = 25
What else will also be important to measure (besides error rate)?
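A rough sketch of how these six models could be set up with sklearn; it assumes a predictor array X and label vector y for one of the simulated data sets already exist (those names are illustrative, not from the lecture code):

from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                            QuadraticDiscriminantAnalysis)
from sklearn.neighbors import KNeighborsClassifier

# The six classifiers compared in this lecture
models = {
    "logistic (linear)": LogisticRegression(),
    "logistic (quadratic)": make_pipeline(PolynomialFeatures(degree=2),
                                          LogisticRegression()),
    "LDA": LinearDiscriminantAnalysis(),
    "QDA": QuadraticDiscriminantAnalysis(),
    "k-NN, k=3": KNeighborsClassifier(n_neighbors=3),
    "k-NN, k=25": KNeighborsClassifier(n_neighbors=25),
}

for name, model in models.items():
    model.fit(X, y)   # X, y assumed to be defined for the simulated data set
    # Training error rate; a held-out test set would be needed to judge overfitting
    print(f"{name}: error rate = {1 - model.score(X, y):.3f}")
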
Which method should perform better? #1

n = 20,000, p = 2, K = 2, π_1 = π_2 = 0.5

Notice anything fishy about our answers? What did Kevin do? What should he have done?



Easy to implement in Python



Which method should perform better? #2

n = 20,000, p = 2, K = 2, π_1 = π_2 = 0.5



Which method should perform better? #3

n = 20,000, p = 2, K = 2, π_1 = π_2 = 0.5



Which method should perform better? #4

n = 20,000, p = 2, K = 2, π_1 = π_2 = 0.5



Summary of Results

Generally speaking:
• LDA outperforms Logistic Regression if the distribution of
predictors is reasonably MVN (with constant covariance).
• QDA outperforms LDA if the covariances are not the same in
the groups.
• k-NN outperforms the others if the decision boundary is
extremely non-linear.
• Of course, we can always adapt our models (logistic and
LDA/QDA) to include polynomial terms, interaction terms, etc...
to improve classification (watch out for overfitting!)
• In order of computational speed (generally speaking, it
depends on K, p, and n of course):
LDA > QDA > Logistic > k-NN
