Lecture 11 - 09.09.24 Classification Part 1
Classification - 1
• Introduction to Classification
• Logistic Regression
• Decision Trees
• Classification Metrics
• Confusion Matrix
• Q&A
Introduction to Classification and Classification Algorithms
Classification
Logistic Regression
Logistic regression is a simple and widely used method for binary
classification. It uses the logistic (sigmoid) function to model the
probability of the positive class.
Decision Trees
A decision tree is a hierarchical model that partitions the
feature space into a set of rectangular regions. Each leaf
node represents a class label.
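The idea of axis-aligned splits ending in class-label leaves can be sketched as a tiny hand-built tree. All feature names, thresholds, and labels below are invented for illustration, not learned from data:

```python
# Minimal illustration of how a decision tree partitions feature space:
# each internal node tests one feature against a threshold, and each
# leaf node holds a class label. Thresholds and labels are hypothetical.

def tree_predict(petal_length, petal_width):
    """A tiny two-level decision tree with made-up thresholds."""
    if petal_length < 2.5:          # first axis-aligned split
        return "setosa"             # leaf: class label
    elif petal_width < 1.7:         # second split, on another feature
        return "versicolor"
    else:
        return "virginica"

print(tree_predict(1.4, 0.2))  # falls in the region petal_length < 2.5
```

Each `if`/`elif` branch corresponds to one rectangular region of the feature space.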
Random Forest
• Random forest is an ensemble method that combines
multiple decision trees to improve the prediction
accuracy.
• It creates a set of decision trees on randomly selected
subsets of the training data and then combines their
predictions.
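The two bullets above (random subsets, then combined predictions) can be sketched with one-split "stumps" standing in for full trees. The data, the stump rule, and the seed are all invented for illustration:

```python
from collections import Counter
import random

# Sketch of the random-forest idea: train simple classifiers on random
# bootstrap samples of the training data, then combine their predictions
# by majority vote. The "trees" here are one-split stumps on a 1-D
# feature; the dataset is synthetic.

def bootstrap(rng, data):
    """Sample with replacement until both classes are present."""
    while True:
        sample = [rng.choice(data) for _ in data]
        if len({y for _, y in sample}) == 2:
            return sample

def train_stump(sample):
    """Split threshold: midpoint between the two class means."""
    zeros = [x for x, y in sample if y == 0]
    ones = [x for x, y in sample if y == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2

def forest_predict(stumps, x):
    """Majority vote over all stumps."""
    votes = Counter(1 if x > t else 0 for t in stumps)
    return votes.most_common(1)[0][0]

rng = random.Random(0)
data = [(x, 0) for x in (1.0, 1.2, 1.5, 2.0)] + \
       [(x, 1) for x in (3.0, 3.3, 3.8, 4.1)]
stumps = [train_stump(bootstrap(rng, data)) for _ in range(5)]
print(forest_predict(stumps, 3.5))  # majority of stumps vote class 1
```

A real random forest also samples a random subset of features at each split; with a single feature that part is omitted here.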
Naive Bayes
• It is a probabilistic classifier that uses Bayes'
theorem to predict the class label of a new
instance.
• Naive Bayes assumes that the features are
conditionally independent given the class label.
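Bayes' theorem plus the conditional-independence assumption reduces to "prior times a product of per-feature likelihoods". A minimal sketch on made-up categorical data (all counts invented):

```python
# Naive Bayes sketch: score each class c as
#   P(c) * Π_i P(feature_i | c)
# and predict the class with the highest score. The dataset below
# ((outlook, windy) -> play?) is invented for illustration.

data = [
    (("sunny", "no"), "yes"), (("sunny", "yes"), "no"),
    (("rainy", "yes"), "no"), (("rainy", "no"), "yes"),
    (("sunny", "no"), "yes"), (("rainy", "yes"), "no"),
]

def nb_predict(features):
    best, best_score = None, -1.0
    for c in {y for _, y in data}:
        rows = [x for x, y in data if y == c]
        score = len(rows) / len(data)        # prior P(c)
        for i, f in enumerate(features):     # likelihoods P(f_i | c),
            score *= sum(1 for x in rows if x[i] == f) / len(rows)
        if score > best_score:               # (independence assumption)
            best, best_score = c, score
    return best

print(nb_predict(("sunny", "no")))  # -> "yes"
```

Real implementations add smoothing (e.g., Laplace) so an unseen feature value does not zero out the whole product; that is omitted here for brevity.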
Binary Classification
Multi-Class Classification
• Multi-class classification involves more than two classes (labels)
• Relevant examples include:
• Handwritten digit recognition
• Facial expression classification
• Popular algorithms used for multi-class classification:
• Naïve Bayes
• Random Forest
• Decision Trees
• SVM
Introduction to Classification
Multi-Label Classification
Multi-Class: placing each song in your collection into one specific folder, such as by year or by music director. Once placed, the song belongs to (is inside) that specific folder only.
Multi-Label: tagging your songs in your media player under different playlists. The same song can be part of multiple playlists.
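The folder/playlist analogy translates directly into data structures (song titles invented): multi-class assigns each item exactly one label, multi-label a set of labels.

```python
# Multi-class: exactly ONE label per song (one folder).
multi_class = {
    "Song A": "2019",
    "Song B": "2021",
}

# Multi-label: a SET of labels per song (any number of playlists).
multi_label = {
    "Song A": {"workout", "road trip"},
    "Song B": {"road trip"},
    "Song C": set(),       # a song may belong to no playlist at all
}

print(len(multi_label["Song A"]))  # -> 2 playlists
```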
Classification
Summary
1. True or False: In multi-class classification, each instance can belong to only one class.
Answer: True. Explanation: In multi-class classification each instance receives exactly one label; in multi-label classification, by contrast, each instance can be associated with multiple classes at the same time. For example, a movie could be classified as both "action" and "comedy".
2. Answer: Linear Regression. Explanation: Linear Regression is primarily used for regression tasks, where the goal is to predict a continuous numerical value. It is not typically used for classification problems, which involve predicting discrete class labels.
Logistic Regression
Logistic Regression
• Linear regression maps input values to output values in a continuous
domain
• The logistic regression model is based on the logistic function, which maps any
real-valued input to a probability value between 0 and 1.
Logistic Regression
• Compute w∙x+b
• Pass it through the sigmoid function: σ(w∙x+b)
• Treat it as a probability
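The three steps above, sketched with toy weights and bias (w and b are invented, not fitted):

```python
import math

def sigmoid(z):
    """Squash any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, w, b):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b   # step 1: w·x + b
    return sigmoid(z)                              # step 2: sigmoid

p = predict_proba(x=[2.0, -1.0], w=[0.5, 0.25], b=0.0)
# step 3: treat p as P(y = 1 | x); classify positive if p >= 0.5
print(round(p, 3), "positive" if p >= 0.5 else "negative")
```

Note sigmoid(0) = 0.5, so the decision boundary p = 0.5 corresponds exactly to the hyperplane w·x + b = 0.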
1. Which of the following is the standard activation function used in Logistic Regression?
a) ReLU b) Sigmoid c) Tanh d) Softmax
2. Which of the following loss functions is typically used in Logistic Regression? a) Mean
Squared Error b) Cross-Entropy Loss c) Hinge Loss d) Huber Loss
3. A Logistic Regression model outputs a probability of 0.7 for a certain instance. If the
classification threshold is 0.6, how will this instance be classified in a binary problem?
Solutions
1. Answer: b) Sigmoid. The sigmoid function maps any real-valued input to a probability between 0 and 1.
2. Answer: b) Cross-Entropy Loss (also known as Log Loss) is the standard loss
function for Logistic Regression, as it measures the performance of a
classification model whose output is a probability value between 0 and 1.
3. Answer: Positive class. The predicted probability (0.7) exceeds the threshold (0.6), so the instance is classified as positive.
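Cross-entropy loss for a single prediction can be computed directly from its definition; note how a confident wrong probability is penalized far more than a confident correct one:

```python
import math

def log_loss(y_true, p):
    """Cross-entropy (log) loss for one instance; y_true is 0 or 1,
    p is the predicted probability of class 1."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

print(round(log_loss(1, 0.9), 3))  # confident and correct: small loss
print(round(log_loss(1, 0.1), 3))  # confident and wrong: large loss
```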
Reference: Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow – Géron, 3rd Edition
Accuracy
• Based on the three-fold cross validation our model has obtained an accuracy of
96.47%
• Which is reasonably a good estimate for the given model
But is this metric sufficient?
Confusion Matrix
Precision
• Assesses the accuracy of positive predictions made by a model
• Ratio of true positive predictions to the total number of positive
predictions (both true positives and false positives)
• Gauges the proportion of correctly predicted positive instances out of all
instances the model predicted as positive
• Valuable for evaluating the model's capability to avoid making incorrect
positive predictions and to minimize false positives
Precision = TP / (TP + FP)
Confusion Matrix
Recall
• Ability of a model to correctly identify all relevant instances in the dataset
• Ratio of true positive predictions to the total number of actual positive
instances
• Quantifies the model's effectiveness in capturing and "recalling" instances of a
particular class, thereby providing insight into its ability to minimize false
negatives (missed instances)
Recall = TP / (TP + FN)
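Both metrics follow directly from confusion-matrix counts; the counts below are the ones used in the worked exercise later in this deck (TP = 80, FP = 20, FN = 10):

```python
def precision(tp, fp):
    """Fraction of positive predictions that are correct."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of actual positives that are found."""
    return tp / (tp + fn)

tp, fp, fn = 80, 20, 10
print(f"precision = {precision(tp, fp):.3f}")   # 0.800
print(f"recall    = {recall(tp, fn):.3f}")      # 0.889
```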
Confusion Matrix
• Precision can be pushed to 100% by making only a single positive prediction
and ensuring it is correct (since FP = 0, precision = 1); this typically comes
at the cost of recall
• Recall is also known as sensitivity or the True Positive Rate
• Recall can be increased by reducing the number of False Negatives (i.e.,
positive instances mistakenly classified as negative)
Need for high precision
• Instances where reducing false positives is essential include scenarios like ensuring the
safety of videos for children.
• While allowing a few false negatives (labeling a child-safe video as unsafe) might be
acceptable, the focus is primarily on preventing any non-child-safe video from being
marked as safe (a false positive).
Need for high recall
• Instances where reducing false negatives is crucial include situations such as identifying
shoplifters in a high-end jewellery store's surveillance video.
• In this case, the priority is to minimize instances where shoplifters are not detected (false
negatives).
• This might involve adjusting the model to accept more false positives (misidentifying
non-shoplifters as shoplifters) as a trade-off.
Confusion Matrix
F1 Score
• The F1 score is the harmonic mean of precision and recall:
F1 = 2 × (Precision × Recall) / (Precision + Recall)
• Range:
• The F1 score ranges between 0 and 1, with higher values
indicating better model performance.
• Class Imbalance:
• Useful when dealing with imbalanced datasets, where one
class is more prevalent than the other.
• Balancing Precision and Recall:
• Helps find a middle ground between identifying true positive
instances (precision) and capturing all relevant positive
instances (recall).
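The harmonic mean is high only when both precision and recall are high, which is what makes F1 useful for imbalanced problems. A direct computation:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(round(f1(0.8, 8 / 9), 3))   # balanced scores -> high F1
print(round(f1(1.0, 0.01), 3))    # great precision, tiny recall -> low F1
```

Contrast this with the arithmetic mean, which would give a misleading 0.505 in the second case.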
Test your understanding
1. In a binary classification problem, which of the following is NOT a component of a confusion
matrix? a) True Positives b) False Negatives c) True Negatives d) Actual Positives
2. True or False: Accuracy is always the best metric to evaluate a classification model, regardless
of class imbalance.
3. Which metric is defined as the ratio of correctly predicted positive samples to the total
predicted positive samples? a) Recall b) Precision c) F1-score d) Specificity
4. What does the area under the ROC curve (AUC-ROC) represent? a) The model's ability to
distinguish between classes b) The model's accuracy c) The model's precision d) The model's
recall
5. In a binary classification problem, a model achieves the following results: True Positives: 80
False Positives: 20 False Negatives: 10 True Negatives: 90 Calculate the model's precision and
recall.
Test your understanding
1. Answer: d) Actual Positives Explanation: A confusion matrix typically contains True Positives, True
Negatives, False Positives, and False Negatives. "Actual Positives" is a sum of True Positives and False
Negatives, not a direct component of the matrix.
2. Answer: False Explanation: Accuracy can be misleading in cases of class imbalance. Other metrics like
precision, recall, or F1-score may be more appropriate depending on the problem and class distribution.
3. Answer: b) Precision Explanation: Precision is defined as TP / (TP + FP), where TP is True Positives and FP
is False Positives. It measures the accuracy of positive predictions.
4. Answer: a) The model's ability to distinguish between classes Explanation: AUC-ROC represents the
model's ability to distinguish between positive and negative classes across various thresholds. A higher
AUC indicates better discrimination.
5. Answer: Precision = TP / (TP + FP) = 80 / (80 + 20) = 0.8 or 80% Recall = TP / (TP + FN) = 80 / (80 + 10) =
0.889 or 88.9%
Multiclass Classification
Multiclass Classification
• We have learnt how binary classifiers classify data into either a positive or a negative class.
• However, not all binary classifiers (e.g., SVM and SGD classifiers) are inherently equipped to
handle multi-class classification problems
• We can build an ensemble of binary classification models to perform multi-class
classification
• There are two main strategies
• One vs All (One vs Rest, OvR)
• One vs One (OvO)
Multiclass Classification
One vs All
One vs One
Total Classifiers = N × (N − 1) / 2
Summary
• One vs One builds N × (N − 1)/2 classifiers, while One vs All uses N classifiers, for a multi-
class classification problem with N classes
• For large datasets with many classes, OvR can be challenging, as each "one vs rest" split
may produce a highly imbalanced dataset
• OvO deals with a comparatively smaller dataset per model (since only 2 classes
are involved per classifier); however, the number of models is much higher than in OvR
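The trade-off in the summary can be made concrete by computing both classifier counts for a few class sizes:

```python
def n_ovr(n_classes):
    """One vs All: one 'class vs rest' model per class."""
    return n_classes

def n_ovo(n_classes):
    """One vs One: one model per unordered pair of classes."""
    return n_classes * (n_classes - 1) // 2

for n in (3, 10, 100):
    print(f"N = {n:>3}: OvR = {n_ovr(n):>3} models, OvO = {n_ovo(n):>4} models")
```

OvO's model count grows quadratically with N, but each of its models trains on only two classes' worth of data, while every OvR model sees the entire (often imbalanced) dataset.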