
MLS 2 - Classification


Classification

This file is meant for personal use by [email protected] only.


Sharing or publishing the contents in part or full is liable for legal action.
Proprietary content. © Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited.
Topics covered so far
1. Intro: Classification

2. Gaussian Models

3. Logistic Regression

4. Performance Assessments

5. K-Nearest Neighbors

Discussion questions
1. Why do we use logistic regression?
2. What is a confusion matrix and how can you interpret it?
3. Why is accuracy not always a good performance measure?
4. How to choose the threshold using the Precision-Recall curve?
5. Is there a performance measure that can cover both Precision and Recall?
6. How does the K-NN algorithm work? How to identify K in this algorithm?

Why do we use logistic regression?
● Logistic Regression is a supervised learning algorithm that is used for classification problems, i.e., where the
dependent variable is categorical.
● In logistic regression, we use the Sigmoid function to calculate the probability of the dependent variable.
● Real-life applications of logistic regression include churn prediction, spam detection, etc.
● Unlike linear regression, which fits a straight line to the data, logistic regression fits an S-shaped (sigmoid) curve whose output stays between 0 and 1.
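
As a rough illustration of how the Sigmoid function turns a linear score into a probability, here is a minimal sketch. The coefficient values (INTERCEPT, SLOPE) and the 0.5 cut-off are assumptions chosen only for this example; they are not fitted to any data.

```python
import math


def sigmoid(z: float) -> float:
    """Squash any real number z into the range 0 to 1."""
    # Two branches keep math.exp from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(x: float, intercept: float, slope: float) -> float:
    """Probability of class 1 for a single feature value x."""
    return sigmoid(intercept + slope * x)


# Hypothetical coefficients, for illustration only.
INTERCEPT, SLOPE = -4.0, 2.0

for x in (0.0, 2.0, 4.0):
    p = predict_probability(x, INTERCEPT, SLOPE)
    label = 1 if p >= 0.5 else 0  # assumed default threshold of 0.5
    print(f"x={x:.1f}  P(y=1)={p:.3f}  predicted class={label}")
```

The linear part (intercept + slope * x) can take any value, but the Sigmoid keeps the output between 0 and 1, so it can be read as a probability.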

Confusion matrix
It is used to measure the performance of a classification algorithm.

                               Actual Values
                               Positive (1)    Negative (0)
Predicted      Positive (1)        TP              FP
Values         Negative (0)        FN              TN

It can be used to calculate the following metrics:

1. Accuracy: Proportion of correctly predicted results among the total number of observations

Accuracy = (TP+TN)/(TP+FP+FN+TN)

2. Precision: Proportion of true positives to all the predicted positives, i.e., how valid the predictions are

Precision = (TP)/(TP+FP)

3. Recall: Proportion of true positives to all the actual positives, i.e., how complete the predictions are

Recall = (TP)/(TP+FN)
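
The three formulas can be checked with a small sketch like the one below. The label lists are made-up illustration data, and returning 0.0 when a denominator is zero is an assumed convention, not something the slides specify.

```python
def confusion_counts(actual, predicted):
    """Count TP, FP, FN and TN for binary labels (1 = positive, 0 = negative)."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == 1 and a == 1:
            tp += 1
        elif p == 1 and a == 0:
            fp += 1
        elif p == 0 and a == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp, fp):
    # Assumed convention: 0.0 when nothing was predicted positive.
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    # Assumed convention: 0.0 when there are no actual positives.
    return tp / (tp + fn) if (tp + fn) else 0.0


# Made-up labels, for illustration only.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]

tp, fp, fn, tn = confusion_counts(actual, predicted)
print("TP, FP, FN, TN =", tp, fp, fn, tn)
print("Accuracy  =", accuracy(tp, fp, fn, tn))
print("Precision =", precision(tp, fp))
print("Recall    =", recall(tp, fn))
```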
Why accuracy is not always a good performance measure
Accuracy is simply the overall % of correct predictions and can be high even for useless models.

Example:
Total # of patients: 100
Cancer rate: 2%
# of patients having cancer: 2
Model: predicts that no one has cancer
Accuracy: 98%
Result: misses out on 2 critical patients having cancer

● Here, accuracy will be 98%, even if we simply predict that every patient does not have cancer.
● In this case, Recall should be used as a measure of model performance; high recall implies fewer false negatives.
● Fewer false negatives imply a lower chance of 'missing' a cancer patient, i.e., predicting a cancer patient as one not having cancer.
● This is where we need other metrics to evaluate model performance.
● The other important metrics are Recall and Precision:
○ Recall - What % of actual 1s did the model capture in prediction?
○ Precision - What % of predicted 1s are actual 1s?
● There is a tradeoff - as you try to increase the Recall, the Precision will reduce and vice versa.
● This tradeoff can be used to figure out the right threshold to use for the model.
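
The cancer example can be reproduced in a few lines. This is only a sketch of the arithmetic on this slide: 100 patients, 2 of whom have cancer, and a model that predicts "no cancer" for everyone.

```python
# 1 = has cancer, 0 = does not have cancer.
actual = [1] * 2 + [0] * 98   # 2 of the 100 patients have cancer
predicted = [0] * 100         # the model predicts that no one has cancer

correct = sum(1 for a, p in zip(actual, predicted) if a == p)
accuracy = correct / len(actual)

true_positives = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
false_negatives = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
recall = true_positives / (true_positives + false_negatives)

print(f"Accuracy: {accuracy:.0%}")                 # high, yet the model is useless
print(f"Recall:   {recall:.0%}")                   # no cancer patient was captured
print(f"Missed cancer patients: {false_negatives}")
```

Accuracy rewards the 98 easy negatives, while recall exposes that both actual positives were missed.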

How to choose thresholds using the Precision-Recall curve?
● The Precision-Recall curve is a useful measure of the success of prediction when the classes are imbalanced.
● The curve shows the tradeoff between the precision and the recall for different thresholds.
● It can be used to select an optimal threshold as required to improve the model performance.
● In the example curve, the precision and the recall are almost equal when the threshold is around 0.4.
● If we want a higher precision, we can increase the threshold.
● If we want a higher recall, we can decrease the threshold.

Choosing different thresholds can completely change the model's performance. It is important to think about what constitutes the 'sweet spot'.
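
To make the tradeoff concrete, the sketch below sweeps a few thresholds over made-up predicted probabilities and reports precision and recall at each one. The data, the helper names, and the "smallest gap between precision and recall" rule for picking a threshold are all assumptions for illustration; in practice the 'sweet spot' depends on the cost of false positives versus false negatives.

```python
def precision_recall_at(threshold, actual, scores):
    """Precision and recall when scores >= threshold are predicted as class 1."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    # Assumed conventions for empty denominators.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def most_balanced_threshold(thresholds, actual, scores):
    """Threshold at which precision and recall are closest to each other."""
    def gap(t):
        p, r = precision_recall_at(t, actual, scores)
        return abs(p - r)
    return min(thresholds, key=gap)


# Made-up predicted probabilities and true labels, for illustration only.
scores = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]
actual = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
thresholds = [i / 10 for i in range(1, 10)]

for t in thresholds:
    p, r = precision_recall_at(t, actual, scores)
    print(f"threshold={t:.1f}  precision={p:.2f}  recall={r:.2f}")

print("Most balanced threshold:", most_balanced_threshold(thresholds, actual, scores))
```

On this toy data, raising the threshold increases precision and lowers recall, which mirrors the behaviour described in the bullets.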

Is there a performance measure that can cover both Precision and Recall?

● F1 Score is a measure that takes into account both Precision and Recall.
● The F1 Score is the harmonic mean of Precision and Recall. Therefore, this score takes both false positives and false
negatives into account.

F1 Score = 2 × (Precision × Recall)/(Precision + Recall)

● The highest possible value of the F1 score is 1, indicating perfect precision and recall, and the lowest possible value is 0, which occurs when either precision or recall is 0.
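
A minimal sketch of the harmonic mean follows. The function name and the sample precision and recall values are assumptions for illustration, and returning 0.0 when both inputs are 0 is an assumed convention.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # assumed convention to avoid division by zero
    return 2 * precision * recall / (precision + recall)


# Illustrative (precision, recall) pairs.
for p, r in [(1.0, 1.0), (0.5, 1.0), (0.9, 0.1), (0.0, 1.0)]:
    arithmetic_mean = (p + r) / 2
    print(f"precision={p:.1f}  recall={r:.1f}  "
          f"F1={f1_score(p, r):.3f}  arithmetic mean={arithmetic_mean:.3f}")
```

Because it is a harmonic mean, the F1 Score is pulled towards the smaller of the two values, so a model cannot score well by excelling at only one of them.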

K-Nearest Neighbours (K-NN) algorithm
This algorithm uses features from the training data to predict the values of new data points, which means the
new data point will be assigned a value based on how similar it is to the data points in the training set. We can
define its working in the following steps:
● Step 1: We need to choose the value of K, i.e., the number of nearest data points to consider. K can be any
positive integer.
● Step 2: For each point in the test data, do the following:
○ Calculate the distance between the test point and each training point using a distance method such as Euclidean or Manhattan. The most commonly used method is the Euclidean distance.
○ Sort the training points in ascending order of distance.
○ Choose the top K rows from the sorted array.
○ Assign the test point the most frequent class among those K rows.
● Step 3: Repeat this process until all the test points have been assigned a class.
To identify K, we try different values of K and plot the test error against them. The lower the test error, the better the value of K.
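
The steps above can be sketched in plain Python as follows. The toy data, the function names, and the tie-breaking behaviour (ties go to the class seen first among the nearest neighbours) are assumptions for illustration; this is not a tuned or efficient implementation.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two points of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_points, train_labels, query, k):
    """Classify one query point by majority vote among its k nearest neighbours."""
    # Distance from the query to every training point.
    distances = [(euclidean(p, query), label)
                 for p, label in zip(train_points, train_labels)]
    # Sort in ascending order of distance and keep the top k rows.
    distances.sort(key=lambda pair: pair[0])
    top_k_labels = [label for _, label in distances[:k]]
    # Most frequent class wins; ties go to the class encountered first.
    return Counter(top_k_labels).most_common(1)[0][0]


def error_rate(train_points, train_labels, test_points, test_labels, k):
    """Fraction of test points that are misclassified for a given k."""
    wrong = sum(1 for point, label in zip(test_points, test_labels)
                if knn_predict(train_points, train_labels, point, k) != label)
    return wrong / len(test_points)


# Made-up data: two clusters plus one noisy "B" point near cluster "A".
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (2.2, 2.4),
                (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
train_labels = ["A", "A", "A", "B", "B", "B", "B"]
test_points = [(1.2, 1.4), (6.8, 6.2), (2.5, 2.5)]
test_labels = ["A", "B", "A"]

# Try several values of K and compare the test error.
for k in (1, 3, 5):
    err = error_rate(train_points, train_labels, test_points, test_labels, k)
    print(f"K={k}  test error={err:.2f}")
```

On this toy data, K=1 is misled by the single noisy training point, while a larger K smooths it out, which is the kind of pattern the error-versus-K plot is meant to reveal.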
Case Study

Appendix

LDA vs QDA

Linear Discriminant Analysis (LDA)              Quadratic Discriminant Analysis (QDA)

It is a linear classifier and much less         It is a non-linear classifier and more
flexible than QDA                               flexible than LDA

It assumes a common covariance matrix           It assumes that each class has its own
for all the classes                             covariance matrix

It is preferred when the training set           It is preferred when the training set
has only a few observations                     is very large

It can be used as a dimensionality              It cannot be used as a dimensionality
reduction technique                             reduction technique

Happy Learning!