
EVALUATION METRICS

Machine Learning

Sai

Machine learning models are evaluated using metrics that measure their performance on the given task. The choice of evaluation metric depends on the type of problem (regression, classification, clustering, etc.), the dataset, and the business objectives.

Here's an in-depth look at evaluation metrics for the different types of machine learning problems.

1. Classification Metrics
Used to evaluate models where the goal is to predict categories or labels (e.g., spam
detection, image classification).

1.1 Accuracy

• Formula: $\text{Accuracy} = \dfrac{\text{Number of Correct Predictions}}{\text{Total Number of Predictions}}$
• When to Use: Datasets with a roughly balanced class distribution.
• Limitation: Misleading for imbalanced datasets (e.g., 95% accuracy when
95% of data belongs to one class).
• Example Use Case: Email spam classification with balanced classes.
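
As a quick illustration (a minimal sketch, not part of the original notes; the toy labels below are made up), accuracy can be computed with scikit-learn's accuracy_score:

```python
from sklearn.metrics import accuracy_score

# Toy ground-truth and predicted labels (hypothetical values for illustration).
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

# 6 of 8 predictions match, so accuracy = 6 / 8 = 0.75.
print(accuracy_score(y_true, y_pred))  # 0.75
```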

1.2 Precision
• Formula: $\text{Precision} = \dfrac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}$

• When to Use: When false positives are costly (e.g., predicting cancer when it
doesn't exist).
• Interpretation: High precision means fewer false alarms.
• Improvement: Reduce model's tendency to misclassify negatives as positives
(e.g., threshold adjustment).
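
A minimal sketch of precision and the threshold adjustment mentioned above, assuming synthetic data and an illustrative threshold of 0.7 (neither comes from these notes):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score

# Synthetic binary classification data (illustrative only).
X, y = make_classification(n_samples=500, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

probs = model.predict_proba(X)[:, 1]  # predicted probability of the positive class

# Default threshold of 0.5 vs. a stricter 0.7, evaluated on the training
# data just for illustration: raising the threshold typically trades
# recall for higher precision (fewer false alarms).
for threshold in (0.5, 0.7):
    y_pred = (probs >= threshold).astype(int)
    print(threshold, precision_score(y, y_pred))
```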

1.3 Recall (Sensitivity)

• Formula: $\text{Recall} = \dfrac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}$
• When to Use: When false negatives are costly (e.g., missing a cancer
diagnosis).
• Interpretation: High recall means the model captures most actual positives.
• Improvement: Train the model to reduce false negatives (e.g., adding more
positive samples).
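
Likewise, a small sketch for recall with made-up labels (4 actual positives, of which the model finds 3):

```python
from sklearn.metrics import recall_score

# Hypothetical labels: 4 actual positives, 3 detected, 1 missed.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0]

# Recall = TP / (TP + FN) = 3 / (3 + 1) = 0.75.
print(recall_score(y_true, y_pred))  # 0.75
```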

1.4 F1-Score
• Formula: $F_1 = 2 \cdot \dfrac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$
• When to Use: When precision and recall are equally important.
• Interpretation: Balance between false positives and false negatives.
• Example Use Case: Fraud detection where both false alarms and missed
frauds are critical.
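
A short sketch tying precision, recall, and F1 together, reusing the same toy labels as in the recall sketch above (purely illustrative):

```python
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0]

p = precision_score(y_true, y_pred)  # 3 / (3 + 1) = 0.75
r = recall_score(y_true, y_pred)     # 3 / (3 + 1) = 0.75
# F1 is the harmonic mean of precision and recall.
print(f1_score(y_true, y_pred), 2 * p * r / (p + r))  # both 0.75
```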

1.5 ROC-AUC (Receiver Operating Characteristic - Area Under Curve)

• ROC Curve: Plots True Positive Rate (TPR) vs. False Positive Rate (FPR).
• AUC: Area under the ROC curve; a value close to 1 indicates a good
classifier.
• When to Use: To compare models, especially with imbalanced datasets.
• Interpretation: High AUC means the model separates classes well.
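
A minimal sketch, assuming hypothetical labels and predicted probabilities, of how the ROC curve and AUC could be computed with scikit-learn:

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical true labels and predicted positive-class probabilities.
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]

fpr, tpr, thresholds = roc_curve(y_true, scores)  # points on the ROC curve
print(roc_auc_score(y_true, scores))  # closer to 1 => better separation
```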

1.6 Log Loss

• Formula: $\text{Log Loss} = -\frac{1}{N}\sum_{i=1}^{N}\left[\,y_i \log(p_i) + (1 - y_i)\log(1 - p_i)\,\right]$
• When to Use: For classifiers that output probabilities.
• Interpretation: Lower log loss indicates better probability estimates.
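
A small sketch comparing log loss for confident versus hedging probability estimates (all values are made up for illustration):

```python
from sklearn.metrics import log_loss

# Hypothetical true labels and predicted probabilities for the positive class.
y_true = [1, 0, 1, 0]
p_confident = [0.9, 0.1, 0.8, 0.2]   # well-calibrated and confident
p_uncertain = [0.6, 0.4, 0.5, 0.5]   # hedging near 0.5

# Lower log loss rewards confident, correct probability estimates.
print(log_loss(y_true, p_confident))  # ~0.16 (smaller)
print(log_loss(y_true, p_uncertain))  # ~0.60 (larger)
```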

2. Regression Metrics
Used for models that predict continuous outputs (e.g., house prices, stock prices).

2.1 Mean Absolute Error (MAE)

• Formula: $\text{MAE} = \frac{1}{N}\sum_{i=1}^{N} |y_i - \hat{y}_i|$
• When to Use: To understand the average magnitude of errors.
• Interpretation: Lower MAE means smaller average deviation from the true values.
• Improvement: Use models that capture trends more accurately.
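
A minimal sketch with hypothetical prices (in thousands of dollars) showing MAE as the average absolute deviation:

```python
from sklearn.metrics import mean_absolute_error

# Hypothetical true and predicted house prices (in $1000s).
y_true = [300, 250, 400, 150]
y_pred = [310, 240, 380, 160]

# Average absolute deviation: (10 + 10 + 20 + 10) / 4 = 12.5.
print(mean_absolute_error(y_true, y_pred))  # 12.5
```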

2.2 Mean Squared Error (MSE)


• Formula: $\text{MSE} = \frac{1}{N}\sum_{i=1}^{N} (y_i - \hat{y}_i)^2$
• When to Use: When larger errors should be penalized more heavily.
• Interpretation: Squaring the errors makes MSE sensitive to outliers.
• Improvement: Remove or down-weight outliers.

2.3 Root Mean Squared Error (RMSE)

• Formula: $\text{RMSE} = \sqrt{\text{MSE}}$
• When to Use: Similar to MSE but interpretable in the same units as the target
variable.
• Example Use Case: Predicting housing prices where large deviations are
undesirable.
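
A small sketch covering both MSE (2.2) and RMSE, reusing the hypothetical prices from the MAE example; note how RMSE comes back in the target's units:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = [300, 250, 400, 150]
y_pred = [310, 240, 380, 160]

mse = mean_squared_error(y_true, y_pred)  # (100 + 100 + 400 + 100) / 4 = 175.0
rmse = np.sqrt(mse)                       # ~13.2, in $1000s like the target
print(mse, rmse)
```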

2.4 R-Squared (Coefficient of Determination)

• Formula: $R^2 = 1 - \dfrac{\text{Sum of Squared Errors (Residuals)}}{\text{Total Sum of Squares}}$
• When to Use: To explain the variance captured by the model.
• Interpretation: Close to 1 indicates a good fit.
• Limitation: Can be misleading with non-linear data.
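
A minimal sketch of R² on the same hypothetical prices:

```python
from sklearn.metrics import r2_score

y_true = [300, 250, 400, 150]
y_pred = [310, 240, 380, 160]

# Fraction of the variance in y_true explained by the predictions.
print(r2_score(y_true, y_pred))  # ~0.98, close to 1 => good fit
```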

3. Clustering Metrics
Used for unsupervised learning tasks (e.g., customer segmentation).

3.1 Silhouette Score

• Range: [-1, 1]
• When to Use: To measure how similar a point is to its own cluster compared with other clusters.
• Interpretation: Higher scores indicate well-separated clusters.
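
A minimal sketch, assuming synthetic blob data and k-means with k = 3 (both illustrative choices), of how the silhouette score could be computed:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic clustered data (illustrative only).
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Scores near 1 mean points sit well inside their own cluster.
print(silhouette_score(X, labels))
```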

3.2 Davies-Bouldin Index

• When to Use: Evaluate the compactness and separation of clusters.
• Interpretation: Lower values indicate better clustering.
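
The same setup works for the Davies-Bouldin index, which scikit-learn also provides:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Lower values indicate compact, well-separated clusters.
print(davies_bouldin_score(X, labels))
```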

3.3 Dunn Index

• When to Use: To measure the ratio between the smallest inter-cluster distance and the largest intra-cluster distance.
• Interpretation: Higher values indicate better clustering.
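
scikit-learn has no built-in Dunn index, so here is a small NumPy sketch implementing the definition above (the toy points are made up):

```python
import numpy as np

def dunn_index(X, labels):
    """Smallest inter-cluster distance divided by largest intra-cluster diameter."""
    clusters = [X[labels == k] for k in np.unique(labels)]

    # Largest intra-cluster distance (cluster diameter) across all clusters.
    max_diameter = max(
        np.linalg.norm(c[:, None] - c[None, :], axis=-1).max() for c in clusters
    )
    # Smallest distance between points belonging to different clusters.
    min_separation = min(
        np.linalg.norm(a[:, None] - b[None, :], axis=-1).min()
        for i, a in enumerate(clusters)
        for b in clusters[i + 1:]
    )
    return min_separation / max_diameter

# Two tight, well-separated toy clusters => a high Dunn index.
X = np.array([[0, 0], [0, 1], [5, 5], [5, 6]])
labels = np.array([0, 0, 1, 1])
print(dunn_index(X, labels))  # separation ~6.4 / diameter 1.0 => ~6.4
```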

4. Use Cases and Interpretations


| Problem Type | Metric | Use Case | What It Tells Us | How to Improve |
| --- | --- | --- | --- | --- |
| Binary Classification | Accuracy, F1, AUC-ROC | Spam detection | How well the model distinguishes between classes | Balance the dataset, adjust thresholds, improve features |
| Multi-class Classification | Precision, Recall, F1 | Image classification | Balance of precision and recall across classes | Use weighted metrics, improve class-specific features |
| Regression | MAE, RMSE, R-Squared | House price prediction | Accuracy of predicted values | Remove outliers, fine-tune model, normalize data |
| Clustering | Silhouette Score | Customer segmentation | Quality of clusters | Tune cluster count (k), improve distance measures |

5. Best Practices for Model Evaluation


1. Choose Metrics Based on the Problem:
o Classification: Accuracy, Precision, Recall, F1.
o Regression: MAE, MSE, RMSE.
o Clustering: Silhouette Score, Davies-Bouldin Index.
2. Cross-Validation:
o Use k-fold cross-validation to assess model performance on different subsets of data (a minimal sketch follows this list).
3. Visualize Metrics:
o Use confusion matrices, ROC curves, and scatter plots to understand
model performance.

4. Consider Business Objectives:
o Choose metrics aligned with the problem's real-world impact (e.g., recall for medical diagnosis).
5. Analyze Failure Cases:
o Study misclassifications or large errors to identify areas for
improvement.
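
A minimal sketch of point 2, assuming synthetic data, logistic regression, and F1 as the scoring metric (all illustrative choices):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic data (illustrative); 5-fold cross-validation scored with F1.
X, y = make_classification(n_samples=500, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5, scoring="f1")
print(scores.mean(), scores.std())  # average performance and its spread
```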
