Evaluation Metrics
Machine Learning
Sai
Machine learning models are evaluated using metrics that measure their
performance on the given task. The choice of evaluation metric depends on the type
of problem (e.g., regression, classification, clustering, etc.), the dataset, and the
business objectives.
Here’s an in-depth look at evaluation metrics for different types of machine learning problems.
1. Classification Metrics
Used to evaluate models where the goal is to predict categories or labels (e.g., spam
detection, image classification).
1.1 Accuracy
• Formula: Accuracy = Number of Correct Predictions / Total Number of Predictions
• When to Use: Balanced datasets with equal class distribution.
• Limitation: Misleading for imbalanced datasets (e.g., 95% accuracy when
95% of data belongs to one class).
• Example Use Case: Email spam classification with balanced classes.
1.2 Precision
• Formula: Precision = True Positives / (True Positives + False Positives)
• When to Use: When false positives are costly (e.g., predicting cancer when it
doesn't exist).
• Interpretation: High precision means fewer false alarms.
• Improvement: Reduce the model's tendency to misclassify negatives as positives (e.g., by adjusting the decision threshold).
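A minimal sketch using scikit-learn's precision_score, with illustrative labels where 1 is the positive class (e.g., "has cancer"):

from sklearn.metrics import precision_score

# Illustrative labels; 1 = positive class
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_pred = [0, 1, 1, 1, 0, 0, 0, 1]

# Precision = TP / (TP + FP): of everything predicted positive, how much was correct
print(precision_score(y_true, y_pred))  # 0.75 (3 true positives, 1 false positive)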
1.4 F1-Score
• Formula: F1 = 2 · (Precision · Recall) / (Precision + Recall)
• When to Use: When precision and recall are equally important.
• Interpretation: Balance between false positives and false negatives.
• Example Use Case: Fraud detection where both false alarms and missed
frauds are critical.
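A minimal sketch comparing precision, recall, and F1 on the same illustrative labels:

from sklearn.metrics import precision_score, recall_score, f1_score

# Illustrative labels for an imbalanced problem (few positives)
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 0, 1]
y_pred = [0, 0, 1, 0, 1, 0, 1, 0, 0, 1]

p = precision_score(y_true, y_pred)   # TP / (TP + FP)
r = recall_score(y_true, y_pred)      # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)         # harmonic mean of precision and recall
print(p, r, f1)  # 0.75 0.75 0.75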
1.5 ROC Curve and AUC
• ROC Curve: Plots the True Positive Rate (TPR) against the False Positive Rate (FPR) across classification thresholds.
• AUC: Area under the ROC curve; a value close to 1 indicates a good
classifier.
• When to Use: To compare models, especially with imbalanced datasets.
• Interpretation: High AUC means the model separates classes well.
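A minimal sketch of ROC-AUC with scikit-learn; note that roc_auc_score expects predicted scores or probabilities rather than hard labels (the values below are illustrative):

from sklearn.metrics import roc_auc_score

# Illustrative true labels and predicted probabilities of the positive class
y_true  = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]

# AUC close to 1.0 means the model ranks positives above negatives
print(roc_auc_score(y_true, y_score))  # ≈ 0.89 (8 of 9 positive/negative pairs ranked correctly)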
2. Regression Metrics
Used for models that predict continuous outputs (e.g., house prices, stock prices).
2.1 Mean Absolute Error (MAE)
• Formula: MAE = (1/N) · Σᵢ |yᵢ − ŷᵢ|, the average absolute difference between the true values yᵢ and the predictions ŷᵢ.
• When to Use: Understand average magnitude of errors.
• Interpretation: Lower MAE means smaller average deviations from the true values.
• Improvement: Use models that capture trends more accurately.
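A minimal sketch of MAE with scikit-learn; the house-price-style numbers are illustrative:

from sklearn.metrics import mean_absolute_error

# Illustrative true and predicted continuous values
y_true = [200.0, 150.0, 320.0, 275.0]
y_pred = [210.0, 140.0, 300.0, 280.0]

# MAE = mean of |y_i - y_hat_i|
print(mean_absolute_error(y_true, y_pred))  # 11.25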
2.2 R² (Coefficient of Determination)
• Formula: R² = 1 − (Sum of Squared Errors (Residuals) / Total Sum of Squares)
• When to Use: To explain the variance captured by the model.
• Interpretation: Close to 1 indicates a good fit.
• Limitation: Can be misleading with non-linear data.
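A minimal sketch of R² with scikit-learn, using illustrative regression values:

from sklearn.metrics import r2_score

# Illustrative true and predicted continuous values
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

# R^2 = 1 - (residual sum of squares / total sum of squares)
print(r2_score(y_true, y_pred))  # ≈ 0.949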
3. Clustering Metrics
Used for unsupervised learning tasks (e.g., customer segmentation).
3.1 Silhouette Score
• Range: [-1, 1]
• When to Use: Measure how similar an object is to its cluster vs. other
clusters.
• Interpretation: Higher scores indicate well-separated clusters.
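A minimal sketch of the silhouette score with scikit-learn, clustering synthetic blobs with KMeans (the dataset and parameter choices below are illustrative):

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Generate illustrative, well-separated clusters
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# Cluster the data and score the resulting assignment
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)
print(silhouette_score(X, labels))  # close to 1 for well-separated clusters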