Lecture 14: Discriminant Analysis
• Discriminant Analysis
• LDA for one predictor
• LDA for p > 1
• QDA
• Comparison of Classification Methods (so far)
[Table: preview of the Heart data set, with columns Age, Sex, ChestPain, RestBP, Chol, Fbs, RestECG, MaxHR, ExAng, Oldpeak, Slope, Ca, Thal, AHD]
In this setting with one feature (one X), Bayes' theorem can then be written as:

P(Y = k | X = x) = (π_k f_k(x)) / (π_1 f_1(x) + ... + π_K f_K(x))

The left-hand side, P(Y = k | X = x), is called the posterior probability and gives the probability that the observation is in the kth category given that the feature, X, takes on a specific value, x. The numerator on the right is the conditional distribution of the feature within category k, f_k(x), times the prior probability, π_k, that the observation is in the kth category.
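To make the formula concrete, here is a minimal sketch of the posterior calculation for one predictor, assuming Gaussian class-conditional densities with a shared standard deviation (the one-predictor LDA assumption); the means, standard deviation, and priors are made-up illustration values:

```python
import numpy as np
from scipy.stats import norm

# Made-up parameters for K = 2 classes with one predictor:
# class-conditional densities f_k(x) = N(mu_k, sigma^2) with a
# shared sigma (the LDA assumption), and prior probabilities pi_k.
mus = np.array([0.0, 2.0])      # class means mu_k (assumed values)
sigma = 1.0                     # shared standard deviation (assumed)
priors = np.array([0.5, 0.5])   # priors pi_1, pi_2

def posterior(x):
    """P(Y = k | X = x) = pi_k f_k(x) / sum_l pi_l f_l(x)."""
    likelihoods = norm.pdf(x, loc=mus, scale=sigma)  # f_k(x) for each k
    numerators = priors * likelihoods                # pi_k f_k(x)
    return numerators / numerators.sum()             # normalize over classes

print(posterior(1.0))  # halfway between the means -> [0.5, 0.5]
print(posterior(2.5))  # closer to the second class's mean -> posterior favors it
```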
For p > 1 predictors, we assume the features within each class follow a multivariate normal distribution, X | Y = k ~ N(μ_k, Σ_k), with likelihood:

f_k(x) = 1 / ((2π)^{p/2} |Σ_k|^{1/2}) exp( -(1/2) (x - μ_k)^T Σ_k^{-1} (x - μ_k) )
Just like with LDA for one predictor, we make an extra assumption that the covariances are equal in each group, Σ_1 = Σ_2 = ... = Σ_K = Σ, in order to simplify our lives.
Now plugging this assumed likelihood into Bayes' formula (to get the posterior) and taking logs shows that we should classify an observation to the class with the largest discriminant score:

δ_k(x) = x^T Σ^{-1} μ_k - (1/2) μ_k^T Σ^{-1} μ_k + log(π_k)

which is linear in x.
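Once the means, shared covariance, and priors are estimated, the scores are cheap to evaluate. Below is a sketch of the discriminant-score computation; the means, covariance, and priors are made-up values for illustration:

```python
import numpy as np

# Assumed (made-up) LDA parameters for p = 2 predictors, K = 2 classes.
mu = np.array([[0.0, 0.0],
               [2.0, 1.0]])        # class means mu_k, one per row
Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])     # shared covariance matrix
priors = np.array([0.5, 0.5])      # priors pi_k

Sigma_inv = np.linalg.inv(Sigma)

def discriminant_scores(x):
    """delta_k(x) = x^T Sigma^{-1} mu_k - (1/2) mu_k^T Sigma^{-1} mu_k + log pi_k."""
    linear_term = mu @ Sigma_inv @ x                             # x^T Sigma^{-1} mu_k
    quad_term = 0.5 * np.einsum('ki,ij,kj->k', mu, Sigma_inv, mu)
    return linear_term - quad_term + np.log(priors)

x = np.array([1.0, 0.5])
scores = discriminant_scores(x)
print(scores, "-> predict class", scores.argmax())
```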
The linear discriminant nature of LDA holds not only when p > 1, but also when K > 2. A picture can be very illustrative:

[Figure: LDA's linear decision boundaries between the classes]
QDA in a picture:

[Figure: QDA's quadratic decision boundaries]
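QDA drops the equal-covariance assumption: each class gets its own Σ_k, so the quadratic term in x no longer cancels and the decision boundaries become quadratic. Here is a sketch comparing the two in scikit-learn on simulated data where the class covariances differ (all data and parameters are made up):

```python
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis)

rng = np.random.default_rng(0)

# Simulate two classes whose covariance matrices differ, so QDA's
# per-class covariance estimates should pay off over LDA's single
# pooled estimate.
X0 = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], size=500)
X1 = rng.multivariate_normal([1.0, 1.0], [[2.0, 1.2], [1.2, 2.0]], size=500)
X = np.vstack([X0, X1])
y = np.repeat([0, 1], 500)

lda = LinearDiscriminantAnalysis().fit(X, y)
qda = QuadraticDiscriminantAnalysis().fit(X, y)
print("LDA training accuracy:", lda.score(X, y))
print("QDA training accuracy:", qda.score(X, y))
```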
Let's investigate which method works best (as measured by the lowest overall classification error rate) by considering 6 different models on 4 different data sets. Each data set has a pair of predictors; you can think of them as the first 2 PCA components (to come later in the lecture). The 6 models to consider are (a scikit-learn sketch of all six follows the list):
• A logistic regression with only 'linear' main effects
• A logistic regression with 'linear' and 'quadratic' effects
• LDA
• QDA
• k-NN where k = 3
• k-NN where k = 25
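As promised, here is one way the six models might be set up in scikit-learn. The 'quadratic' logistic regression is built by expanding the features with degree-2 polynomial terms; make_moons is only a stand-in data set (not one of the lecture's four scenarios) to keep the sketch self-contained:

```python
from sklearn.datasets import make_moons
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

models = {
    "logistic (linear)": LogisticRegression(),
    "logistic (linear + quadratic)": make_pipeline(
        PolynomialFeatures(degree=2, include_bias=False),
        LogisticRegression()),
    "LDA": LinearDiscriminantAnalysis(),
    "QDA": QuadraticDiscriminantAnalysis(),
    "k-NN, k = 3": KNeighborsClassifier(n_neighbors=3),
    "k-NN, k = 25": KNeighborsClassifier(n_neighbors=25),
}

# Stand-in data set, split so error is measured on held-out points.
X, y = make_moons(n_samples=2000, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, model in models.items():
    error = 1 - model.fit(X_train, y_train).score(X_test, y_test)
    print(f"{name}: test error rate = {error:.3f}")
```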
What else will be important to measure (besides error rate)?
Which method should perform better? (Scenarios #1–#4)
In each simulated scenario: n = 20,000, p = 2, K = 2, π_1 = π_2 = 0.5.
[Figures: scatterplots of the four simulated data sets]
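The slides' figures are not reproduced here, but a data set matching the stated sizes can be simulated. Everything beyond n, p, K, and the priors (the class means and the shared covariance) is an assumption, chosen here to give one plausible scenario, the MVN-with-equal-covariance setting that favors LDA:

```python
import numpy as np

rng = np.random.default_rng(109)
n, p, K = 20_000, 2, 2          # sizes from the slide
priors = np.array([0.5, 0.5])   # pi_1 = pi_2 = 0.5

# Assumed scenario: both classes MVN with a shared covariance
# (the setting that favors LDA); means and covariance are made up.
means = [np.array([0.0, 0.0]), np.array([1.5, 1.0])]
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])

y = rng.choice(K, size=n, p=priors)   # draw class labels from the priors
X = np.empty((n, p))
for k in range(K):
    idx = (y == k)
    X[idx] = rng.multivariate_normal(means[k], Sigma, size=idx.sum())
```

The resulting (X, y) can be dropped into the model-comparison loop sketched earlier.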
Generally speaking:
• LDA outperforms Logistic Regression if the distribution of the predictors is reasonably multivariate normal (MVN), with constant covariance across groups.
• QDA outperforms LDA if the covariances are not the same in
the groups.
• k-NN outperforms the others if the decision boundary is
extremely non-linear.
• Of course, we can always adapt our models (logistic and LDA/QDA) to include polynomial terms, interaction terms, etc., to improve classification (but watch out for overfitting!).
• In order of computational speed (generally speaking; it depends on K, p, and n, of course):
LDA > QDA > Logistic > k-NN (see the timing sketch below)
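That ranking can be spot-checked with a rough, machine-dependent timing sketch. Note that k-NN is cheap to "fit" (it just stores the training data) and pays its cost at prediction time, so the sketch times fitting plus prediction together:

```python
import time
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis)
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(20_000, 2))
y = (X[:, 0] + X[:, 1] + rng.normal(size=20_000) > 0).astype(int)

for name, model in [("LDA", LinearDiscriminantAnalysis()),
                    ("QDA", QuadraticDiscriminantAnalysis()),
                    ("Logistic", LogisticRegression()),
                    ("k-NN (k = 25)", KNeighborsClassifier(n_neighbors=25))]:
    start = time.perf_counter()
    model.fit(X, y).predict(X)   # time fitting plus prediction together
    print(f"{name}: {time.perf_counter() - start:.3f} s")
```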