ML CH 5
Evaluation Metrics
• Classification Accuracy
• Confusion Matrix
• F1 Score
• Recall
• Precision
Confusion Matrix
A confusion matrix is a table that visualizes the performance of a
classification model by comparing predicted and actual values across
different classes. It's a handy tool for evaluating the effectiveness of a
model in terms of true positives, true negatives, false positives, and
false negatives.
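A minimal sketch of building a confusion matrix in Python, assuming scikit-learn is available (the slides do not name a library); the labels below are hypothetical, chosen so the counts match the Table 1 values used in the calculations that follow:

from sklearn.metrics import confusion_matrix

# Hypothetical binary labels: 1 = positive class, 0 = negative class
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1]

# For binary labels, ravel() unpacks the 2x2 matrix in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")  # TP=5, TN=5, FP=2, FN=1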
Ex: [Figure: example confusion matrix; overall accuracy = 0.91]
Classification Accuracy
Classification accuracy is the accuracy we generally mean whenever we use the term accuracy. We calculate it as the ratio of correct predictions to the total number of input samples:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Let's use the data of the model outcomes from Table 1 (TP = 5, TN = 5, FP = 2, FN = 1) to calculate the accuracy of a simple classification model:
Accuracy = (5 + 5) / (5 + 1 + 5 + 2)
= 10 / 13
≈ 0.77
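The same arithmetic as a quick Python sketch, using the assumed Table 1 counts:

tp, tn, fp, fn = 5, 5, 2, 1  # assumed Table 1 counts
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(round(accuracy, 2))  # 0.77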
Classification Accuracy
an accuracy score above 0.7 describes an average model
performance, whereas a score above 0.9 indicates a good model.
However, the relevance of the score is determined by the task.
Accuracy alone may not provide a complete picture of model
performance, especially In scenarios where class imbalance exists in
the dataset.
Precision
Precision measures the correctness of positive predictions:
Precision = TP / (TP + FP)
Using the classification model outcomes from Table 1 above, precision is calculated as
Precision = 5 / (5 + 2)
= 5 / 7
≈ 0.71
Precision can be thought of as a quality metric; higher precision
indicates that an algorithm provides more relevant results than
irrelevant ones. It is solely focused on the correctness of positive
predictions, with no attention to the correct detection of negative
predictions.
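A one-line Python check of the precision calculation above (counts from Table 1):

tp, fp = 5, 2  # Table 1 counts
precision = tp / (tp + fp)
print(round(precision, 2))  # 0.71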
Recall
Recall, also called sensitivity or the true positive rate (TPR), measures the model's ability to detect positive events correctly: TPR tells us what proportion of the positive class got correctly classified. It is the percentage of accurately predicted positive events out of all actual positive events. To calculate the recall of a classification model, the formula is
Recall = TP / (TP + FN)
Using the Table 1 outcomes:
Recall = 5 / (5 + 1)
= 5 / 6
≈ 0.83
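The recall calculation as a Python sketch (counts from Table 1):

tp, fn = 5, 1  # Table 1 counts
recall = tp / (tp + fn)
print(round(recall, 2))  # 0.83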
A high recall score indicates that the classifier predicts the majority of the relevant results correctly. However, the recall metric does not take into account the potential repercussions of false positives.
Ideally, we want to build classifiers with high precision and high recall. But that's not always possible: a classifier with high recall may have low precision, meaning it captures the majority of positive classes but produces a considerable number of false positives. Hence, we use the F1 score metric to balance this precision-recall trade-off.
Difference between Precision and Recall
• This question is very common among machine learning engineers and data researchers. The choice between precision and recall depends on the type of problem being solved.
• Use precision when it matters that the samples predicted as positive really are positive, i.e., when false positives are the costly errors.
• On the other end, if the goal is to detect all positive samples, use recall, i.e., when false negatives are the costly errors. Here, we care less about how the negative samples are classified.
F1 Score
The F1 score is the harmonic mean of precision and recall:
F1 = 2 * (Precision * Recall) / (Precision + Recall)
Here, you can observe that the harmonic mean of precision and recall creates a balanced measurement, i.e., the model's precision is not optimized at the price of recall, or vice versa. As a result, the F1 score metric directs real-world decision-making more accurately.
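A short sketch of the harmonic-mean calculation, using the exact Table 1 precision and recall:

precision, recall = 5 / 7, 5 / 6  # exact Table 1 values
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))  # 0.77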
F1 Score Application in Machine Learning
Medical Diagnostics
In medical diagnostics, it is important to achieve high recall when detecting positive occurrences, even if doing so means sacrificing some precision. For instance, the F1 score of a cancer detection classifier should minimize the possibility of false negatives, i.e., patients with malignant cancer whom the classifier wrongly predicts as benign.
Sentiment Analysis
For natural language processing (NLP) tasks like sentiment analysis, recognizing both
positive and negative sentiments in textual data allows businesses to assess public
opinion, consumer feedback, and brand sentiment. Hence, the F1 score allows for an
efficient evaluation of sentiment analysis models by taking precision and recall into
account when categorizing sentiments.
Fraud Detection
In fraud detection, by considering both precision (the accuracy with which fraudulent cases are discovered) and recall (the capacity to identify all instances of fraud), the F1 score enables practitioners to assess fraud detection models more accurately. For instance:
[Figure: evaluation metrics for a credit card fraud detection model]
F1 Score Limitations
Dataset Class Imbalance
For imbalanced data, when one class significantly outweighs the other, the regular F1
score metric might not give a true picture of the model's performance. This is because
the regular F1 score gives precision and recall equal weight, but in datasets with
imbalances, achieving high precision or recall for the minority class may result in a
lower F1 score due to the majority class's strong influence.
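An illustrative sketch of this effect, assuming scikit-learn and a hypothetical 95:5 imbalanced dataset; a degenerate classifier that always predicts the majority class scores high accuracy but an F1 of zero for the minority class:

from sklearn.metrics import accuracy_score, f1_score

# Hypothetical imbalanced dataset: 95 negatives, 5 positives
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # degenerate model: always predicts the majority class

print(accuracy_score(y_true, y_pred))             # 0.95 -- looks strong
print(f1_score(y_true, y_pred, zero_division=0))  # 0.0 -- minority class never detected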
Contextual Dependence
The interpretation of an F1 score depends on the particular problem domain and task objectives. What counts as a high or low F1 score differs between applications, since each application imposes its own precision-recall requirements.
Performance Metrics for Regression
Mean Squared Error (MSE)
MSE = (1/n) * Σ (xi - yi)^2
where:
• xi represents the actual or observed value for the i-th data point.
• yi represents the predicted value for the i-th data point.
• n is the number of data points.
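A small Python sketch of the formula, with hypothetical observed and predicted values:

x = [3.0, 5.0, 2.5, 7.0]  # xi: observed values (hypothetical)
y = [2.8, 5.4, 2.9, 6.6]  # yi: predicted values
n = len(x)
mse = sum((xi - yi) ** 2 for xi, yi in zip(x, y)) / n
print(round(mse, 2))  # 0.13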
R Squared Error
R2 = 1 - (SSR / SST)
Where:
• R2 is the R-Squared.
• SSR represents the sum of squared residuals between the predicted values and actual values.
• SST represents the total sum of squares, which measures the total variance in the dependent variable.
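The same hypothetical data can illustrate R-Squared; SSR is the squared error of the predictions and SST the squared deviation of the observations from their mean:

x = [3.0, 5.0, 2.5, 7.0]  # observed values (hypothetical)
y = [2.8, 5.4, 2.9, 6.6]  # predicted values
mean_x = sum(x) / len(x)
ssr = sum((xi - yi) ** 2 for xi, yi in zip(x, y))  # sum of squared residuals
sst = sum((xi - mean_x) ** 2 for xi in x)          # total sum of squares
r2 = 1 - ssr / sst
print(round(r2, 3))  # ≈ 0.959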
Adjusted R Squared
Adjusted R-Squared penalizes R-Squared for adding predictors that do not improve the model:
Adjusted R2 = 1 - ((1 - R2) * (n - 1) / (n - k - 1))
where:
• n is the number of observations.
• k is the number of independent variables (predictors).
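A sketch continuing the R-Squared example above, assuming a hypothetical model with one predictor (k = 1) fitted on the four observations:

r2 = 0.959   # R-Squared from the sketch above
n, k = 4, 1  # hypothetical: 4 observations, 1 predictor
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(adj_r2, 3))  # ≈ 0.94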
Thank You!!!
www.paruluniversity.ac.in