Reference+Material LDA
1. Overview
2. Key Concepts & Terminologies
3. LDA Implementation
4. Conclusion
5. Use Cases of LDA
6. Further Reading
• This leads to the conclusion that LDA serves purposes similar to both logistic regression (for classification) and Principal Component Analysis (for dimensionality reduction). Let us briefly compare LDA with these two techniques.
1. Multi-class classification –
Logistic regression is primarily intended to be used as a binary classifier, although it can be extended to multi-class classification.
All three limitations of logistic regression described above are usually handled by LDA, which can thus be used as an alternative classifier in certain situations.
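The dimensionality-reduction use of LDA mentioned above can be sketched in plain NumPy: projecting onto Fisher's discriminant direction w = Σ⁻¹(µ1 − µ0) compresses the data to one dimension while preserving class separation. This is a minimal sketch; the synthetic data and variable names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two 2-D Gaussian classes sharing a (roughly) common covariance.
X0 = rng.normal([0.0, 0.0], 1.0, (300, 2))
X1 = rng.normal([3.0, 1.0], 1.0, (300, 2))

# Fisher's discriminant direction: w = Sigma^-1 (mu1 - mu0).
mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
Sw = np.cov(np.vstack([X0 - mu0, X1 - mu1]).T)  # pooled within-class covariance
w = np.linalg.solve(Sw, mu1 - mu0)

# Projecting onto w reduces 2-D points to 1-D scores that still separate the classes.
z0, z1 = X0 @ w, X1 @ w
```

Unlike PCA, which picks directions of maximum overall variance, this direction is chosen to maximise between-class separation relative to within-class scatter.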
3. Outlier Treatment –
Although linear models are very sensitive to outlier values, whether a high-valued observation is treated as an outlier depends greatly on the particular use case. Once that decision is made, outliers can be treated to avoid skewing the basic statistics.
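As one concrete (hypothetical) treatment, out-of-range values can be winsorized to the Tukey IQR fences before fitting a linear model; the multiplier k = 1.5 is a common convention, not something prescribed by LDA itself:

```python
import numpy as np

def clip_outliers(x, k=1.5):
    """Winsorize values outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return np.clip(x, q1 - k * iqr, q3 + k * iqr)

values = np.array([9.0, 10.0, 11.0, 10.5, 9.5, 120.0])  # 120 is a likely outlier
cleaned = clip_outliers(values)  # 120 is pulled down to the upper fence
```

Clipping (rather than deleting) keeps the sample size intact while limiting the outlier's pull on the mean and covariance estimates that LDA relies on.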
LDA can be derived from a simple probabilistic model which models the class-conditional distribution of the data for each class k. Predictions can then be obtained by using Bayes' rule, for each training sample x:

P(k | x) = P(x | k) P(k) / Σ_l P(x | l) P(l)    (eq. 1)

All expressions have their usual meaning (as discussed earlier). LDA models P(x | k) as a multivariate Gaussian with class mean µk and a covariance Σ shared by all classes:

P(x | k) = (2π)^(-d/2) |Σ|^(-1/2) exp( -½ (x - µk)ᵗ Σ⁻¹ (x - µk) )    (eq. 2)

Substituting the probability distribution (eq. 2) into Bayes' rule (eq. 1), and taking logarithms to simplify, we get:

log P(k | x) = -½ (x - µk)ᵗ Σ⁻¹ (x - µk) + log P(k) + C    (eq. 3)

where C collects the terms that do not depend on k.
• The term (x - µk)ᵗ Σ⁻¹ (x - µk) corresponds to the Mahalanobis distance between the sample and the mean. The Mahalanobis distance tells how close x is to the mean µk, while also accounting for the variance of each feature.
• We can thus interpret LDA as assigning x to the class whose mean is the closest in terms of Mahalanobis distance, while also accounting for the class prior probabilities.
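As a small numeric illustration (the values are chosen arbitrarily), a point two units away along a high-variance feature is "closer" in Mahalanobis terms than its Euclidean distance suggests:

```python
import numpy as np

x = np.array([2.0, 0.0])
mu = np.array([0.0, 0.0])
cov = np.array([[4.0, 0.0],   # feature 1 has variance 4 (std 2)
                [0.0, 1.0]])  # feature 2 has variance 1

d = x - mu
maha = float(np.sqrt(d @ np.linalg.inv(cov) @ d))  # distance in "standard units"
eucl = float(np.linalg.norm(d))

print(maha, eucl)  # 1.0 2.0
```

Being 2 units away along a feature whose standard deviation is 2 is only one "standard unit" of deviation, so the Mahalanobis distance (1.0) is half the Euclidean distance (2.0).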
P(k) is the prior probability that the native class for x is k, and has to be specified by the user. By default all classes usually receive the equal prior P(k) = 1/number_of_classes. Alternatively, P(k) can be estimated as the count of occurrences of class k divided by the total number of occurrences across all classes.
P(x|k) is the probability of observing point x given that the class being dealt with is k. The main issue in finding a value for this term is that the variables are continuous, not discrete; hence we need to compute a Probability Density Function (PDF).
Each x is then substituted into the two equations above, and the point is classified to the class for which P(Class | Data) is the highest.
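The whole procedure — empirical priors P(k), a shared covariance for the Gaussian PDF, and the arg-max over log P(Class | Data) — can be sketched in NumPy. This is a minimal sketch on synthetic data; the function names are my own:

```python
import numpy as np

def lda_fit(X, y):
    """Estimate class means, empirical priors P(k), and the shared covariance."""
    classes = np.unique(y)
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    priors = np.array([(y == k).mean() for k in classes])
    centred = X - means[np.searchsorted(classes, y)]  # subtract each row's class mean
    cov = centred.T @ centred / (len(X) - len(classes))  # pooled covariance
    return classes, means, priors, np.linalg.inv(cov)

def lda_predict(X, classes, means, priors, cov_inv):
    """Assign each x to the class maximising -1/2 * Mahalanobis^2 + log P(k)."""
    scores = []
    for mu, p in zip(means, priors):
        d = X - mu
        maha_sq = np.einsum('ij,jk,ik->i', d, cov_inv, d)  # squared Mahalanobis distance
        scores.append(-0.5 * maha_sq + np.log(p))
    return classes[np.argmax(scores, axis=0)]

# Toy check: two well-separated Gaussian blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0, 0], 1.0, (200, 2)),
               rng.normal([4, 4], 1.0, (200, 2))])
y = np.repeat([0, 1], 200)
acc = (lda_predict(X, *lda_fit(X, y)) == y).mean()
```

Note that the constant terms of the Gaussian PDF drop out of the arg-max because the covariance is shared, which is exactly why only the Mahalanobis term and the log-prior remain in the score.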
2. LDA will fail when the discriminatory information is not in the mean but rather in the variance of the data.
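This failure mode is easy to demonstrate: when two classes share a mean but differ in variance, a mean-based rule (as LDA uses) is near chance, while a variance-aware rule, in the spirit of quadratic discriminant analysis, separates them well. A toy sketch on synthetic 1-D data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two 1-D classes with the same mean (0) but very different variances.
x0 = rng.normal(0.0, 0.5, 500)  # class 0: small variance
x1 = rng.normal(0.0, 3.0, 500)  # class 1: large variance
X = np.concatenate([x0, x1])
y = np.concatenate([np.zeros(500), np.ones(500)])

# Mean-based (LDA-like) rule: assign to the class with the nearer estimated mean.
m0, m1 = x0.mean(), x1.mean()
lda_pred = (np.abs(X - m1) < np.abs(X - m0)).astype(float)
lda_acc = (lda_pred == y).mean()  # near chance: the means coincide

# Variance-aware (QDA-like) rule: per-class Gaussian log-likelihoods.
v0, v1 = x0.var(), x1.var()
ll0 = -0.5 * np.log(v0) - (X - m0) ** 2 / (2 * v0)
ll1 = -0.5 * np.log(v1) - (X - m1) ** 2 / (2 * v1)
qda_acc = ((ll1 > ll0).astype(float) == y).mean()  # clearly above chance
```

The shared-covariance assumption behind eq. 2 erases exactly the information that distinguishes these two classes, which is why the quadratic rule succeeds where the linear one cannot.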
3. Medical Field:
Linear discriminant analysis (LDA) is used to classify a patient's disease state as mild, moderate, or severe based on the patient's various parameters and the medical treatment they are undergoing. This helps doctors intensify or reduce the pace of the treatment.