Lec-04 - Linear Discriminant Analysis
[Figure: left, linear decision boundaries found by LDA; right, quadratic decision boundaries obtained using LDA]
Bayes Decision Boundary
• The test error rate of a classification problem is minimized when a new observation is assigned to the class $j$ for which $P(Y = j \mid X = x_0)$ is largest
• This rule is called the Bayes classifier
• In a two-class problem with $j \in \{1, 2\}$, predict class 1 if
  $$P(Y = 1 \mid X = x_0) > 0.5$$
• The Bayes classifier produces the lowest possible test error rate, called the Bayes error rate
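The two-class rule above can be sketched in a few lines of Python (the function name is ours, not from the lecture):

```python
def bayes_predict(p1):
    """Two-class Bayes classifier: predict class 1 when
    Pr(Y = 1 | X = x0) > 0.5, otherwise class 2."""
    return 1 if p1 > 0.5 else 2
```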
Bayes Theorem for Classification
• By Bayes' theorem,
  $$\Pr(Y = k \mid X = x) = \frac{\Pr(X = x \mid Y = k)\,\Pr(Y = k)}{\Pr(X = x)} = \frac{\pi_k f_k(x)}{\sum_l \pi_l f_l(x)}$$
• where $f_k(x) = \Pr(X = x \mid Y = k)$ is the conditional density of $X$ in class $k$, and $\pi_k = \Pr(Y = k)$ is the prior probability of class $k$
• We need to know the class posterior $\Pr(Y = k \mid X = x)$ for optimal classification
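The posterior formula can be sketched directly in Python (the helper names and the toy triangular densities are illustrative assumptions, not from the lecture):

```python
def posteriors(x, priors, densities):
    """Pr(Y = k | X = x) = pi_k * f_k(x) / sum_l pi_l * f_l(x)."""
    weighted = [pi * f(x) for pi, f in zip(priors, densities)]
    total = sum(weighted)
    return [w / total for w in weighted]

# Example: two classes with simple triangular densities on [0, 1].
f1 = lambda x: 2 * (1 - x)   # density peaked at 0
f2 = lambda x: 2 * x         # density peaked at 1
p = posteriors(0.5, [0.5, 0.5], [f1, f2])  # equal posteriors at x = 0.5
```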
Linear Discriminant Analysis for One Predictor
• An observation is assigned to the class for which $p_k(x) = \Pr(Y = k \mid X = x)$ is greatest
• The class-conditional density is assumed to be Gaussian:
  $$f_k(x) = \frac{1}{\sqrt{2\pi}\,\sigma_k} \exp\!\left(-\frac{1}{2\sigma_k^2}(x - \mu_k)^2\right)$$
  where $\mu_k$ and $\sigma_k^2$ are the mean and variance parameters of the $k$th class
• For Linear Discriminant Analysis, it is assumed that
  $$\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_K^2 = \sigma^2$$
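The Gaussian density above translates directly to code; this is a minimal sketch using only the standard library:

```python
import math

def gaussian_density(x, mu, sigma):
    """f_k(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)
```

At the mean, the density reaches its maximum value $1/(\sqrt{2\pi}\,\sigma)$.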
• Substituting the Gaussian density with common variance $\sigma^2$ gives
  $$p_k(x) = \Pr(Y = k \mid X = x) = \frac{\pi_k f_k(x)}{\sum_l \pi_l f_l(x)} = \frac{\pi_k \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{1}{2\sigma^2}(x - \mu_k)^2\right)}{\sum_l \pi_l \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{1}{2\sigma^2}(x - \mu_l)^2\right)}$$
• The Bayes classifier assigns an observation at $X = x$ to the class for which $p_k(x)$ is largest
• Taking logs and discarding terms that are the same for every class, this is equivalent to assigning the observation to the class for which the discriminant $\delta_k(x)$ is largest:
  $$\delta_k(x) = \frac{x\,\mu_k}{\sigma^2} - \frac{\mu_k^2}{2\sigma^2} + \log(\pi_k)$$
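The discriminant rule can be sketched as follows (function names are ours; classes are indexed from 0 for simplicity):

```python
import math

def delta(x, mu_k, sigma2, pi_k):
    """delta_k(x) = x*mu_k/sigma^2 - mu_k^2/(2 sigma^2) + log(pi_k)."""
    return x * mu_k / sigma2 - mu_k ** 2 / (2 * sigma2) + math.log(pi_k)

def classify(x, mus, sigma2, priors):
    """Assign x to the class (0-based index) with the largest discriminant."""
    scores = [delta(x, mu, sigma2, pi) for mu, pi in zip(mus, priors)]
    return scores.index(max(scores))
```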
Bayes Decision Boundary
• For $K = 2$ and $\pi_1 = \pi_2$, an observation is assigned to class 1 if
  $$2x(\mu_1 - \mu_2) > \mu_1^2 - \mu_2^2$$
• The Bayes decision boundary corresponds to the point where
  $$x = \frac{\mu_1^2 - \mu_2^2}{2(\mu_1 - \mu_2)} = \frac{\mu_1 + \mu_2}{2}$$
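A quick numerical check of the midpoint boundary (the means and variance below are made-up values for illustration): at $x = (\mu_1 + \mu_2)/2$ the two discriminants are exactly equal.

```python
import math

def delta(x, mu, sigma2, pi):
    # Linear discriminant from the previous slide.
    return x * mu / sigma2 - mu ** 2 / (2 * sigma2) + math.log(pi)

mu1, mu2, sigma2 = -1.25, 1.25, 1.0   # illustrative values
boundary = (mu1 + mu2) / 2             # midpoint of the two class means
gap = delta(boundary, mu1, sigma2, 0.5) - delta(boundary, mu2, sigma2, 0.5)
```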
Parameter Estimation
• The class means are estimated by
  $$\hat{\mu}_k = \frac{1}{n_k} \sum_{i: y_i = k} x_i$$
• If no knowledge of the prior probability $\pi_k$ is available, it can be estimated by
  $$\hat{\pi}_k = \frac{n_k}{N}$$
• The pooled variance estimate is
  $$\hat{\sigma}^2 = \frac{1}{N - K} \sum_{k=1}^{K} \sum_{i: y_i = k} (x_i - \hat{\mu}_k)^2$$
• The LDA classifier plugs these estimates into the discriminant for an observation $X = x$:
  $$\hat{\delta}_k(x) = \frac{x\,\hat{\mu}_k}{\hat{\sigma}^2} - \frac{\hat{\mu}_k^2}{2\hat{\sigma}^2} + \log(\hat{\pi}_k)$$
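The plug-in estimates can be computed in a few lines (a sketch with our own function name, using the $(N - K)$ divisor from the slide):

```python
def lda_fit(xs, ys):
    """One-predictor LDA estimates: class means, priors, pooled variance."""
    classes = sorted(set(ys))
    N, K = len(xs), len(classes)
    n = {k: sum(1 for y in ys if y == k) for k in classes}
    mu = {k: sum(x for x, y in zip(xs, ys) if y == k) / n[k] for k in classes}
    pi = {k: n[k] / N for k in classes}
    sigma2 = sum((x - mu[y]) ** 2 for x, y in zip(xs, ys)) / (N - K)
    return mu, pi, sigma2

mu, pi, sigma2 = lda_fit([0, 2, 4, 6], [0, 0, 1, 1])
```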
Multivariate Gaussian
With $p$ predictors, each class density is modeled as a multivariate Gaussian,
$$f_k(x) = \frac{1}{(2\pi)^{p/2} |\Sigma_k|^{1/2}} \exp\!\left(-\frac{1}{2}(x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k)\right)$$
LDA is the special case in which the covariance matrix is assumed to be the same for all classes:
$$\Sigma_k = \Sigma \quad \forall k$$
LDA with Multiple Predictors
Estimated values:
$$\hat{\pi}_k = \frac{N_k}{N}$$
$$\hat{\mu}_k = \frac{\sum_{g_i = k} x_i}{N_k}$$
$$\hat{\Sigma} = \frac{\sum_{k=1}^{K} \sum_{g_i = k} (x_i - \hat{\mu}_k)(x_i - \hat{\mu}_k)^T}{N - K}$$
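The pooled covariance estimate $\hat{\Sigma}$ can be sketched with plain nested lists (a real implementation would use NumPy; the function name is ours):

```python
def pooled_covariance(X, y):
    """Pooled within-class covariance with the (N - K) divisor."""
    classes = sorted(set(y))
    N, K, p = len(X), len(classes), len(X[0])
    means = {}
    for k in classes:
        rows = [x for x, yi in zip(X, y) if yi == k]
        means[k] = [sum(col) / len(rows) for col in zip(*rows)]
    S = [[0.0] * p for _ in range(p)]
    for x, yi in zip(X, y):
        d = [xi - mi for xi, mi in zip(x, means[yi])]  # x_i - mu_k
        for i in range(p):
            for j in range(p):
                S[i][j] += d[i] * d[j]                 # accumulate outer product
    return [[S[i][j] / (N - K) for j in range(p)] for i in range(p)]

Sigma_hat = pooled_covariance([[0, 0], [2, 0], [4, 2], [6, 2]], [0, 0, 1, 1])
```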
Example (Default Data)
• The LDA classifier predicts Yes when P(Default = Yes | X) > 0.5 (the Bayes classifier threshold)
• Overall training error rate: 2.75%
• Error rate among individuals who actually defaulted: 75.7%
• Sensitivity, the percentage of true defaulters that are identified: 24.3%
• Specificity, the percentage of non-defaulters that are correctly identified: $1 - 23/9667 = 99.8\%$
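Sensitivity and specificity follow directly from the confusion-matrix counts. The counts below are chosen to be consistent with the rates quoted above (333 true defaulters of whom 81 are flagged; 9667 non-defaulters of whom 23 are wrongly flagged):

```python
def sens_spec(tp, fn, tn, fp):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
    return tp / (tp + fn), tn / (tn + fp)

sens, spec = sens_spec(tp=81, fn=252, tn=9644, fp=23)
```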
Example (Default Data)
[Figure: fraction of defaulters incorrectly classified]
[Figure: left, two Gaussian classes with common correlation between $X_1$ and $X_2$; right, two Gaussian classes with different covariances. Bayes decision boundary shown as a purple dashed line, LDA as a black dotted line, QDA as a green solid line]
Example: Stock Market Data
> plot(lda.fit)
Test Error Rate