WEEK 08
PERFORMANCE METRICS FOR CLASSIFICATION ALGORITHMS
Classification metrics:
Accuracy
Precision
Recall (Sensitivity)
Specificity
F1-Score
PERFORMANCE METRICS FOR REGRESSION ML ALGORITHMS
Mean Absolute Error (MAE):
Measures the average absolute difference between predicted and actual values. Lower
MAE indicates better performance.
Mean Squared Error (MSE):
Calculates the average squared difference between predicted and actual values.
More sensitive to outliers than MAE.
Lower MSE indicates better performance.
Root Mean Squared Error (RMSE): Square root of MSE.
Provides the error in the same units as the target variable.
Lower RMSE indicates better performance.
R-squared (R²):
Measures the proportion of variance in the dependent variable explained by the independent
variables.
Higher R² indicates better fit.
R² typically ranges from 0 to 1, though it can be negative for a model that fits worse than simply predicting the mean.
Adjusted R-squared:
Similar to R², but penalizes for the number of independent variables.
Useful when comparing models with different numbers of features.
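As an illustrative sketch (not from the original slides), the regression metrics above can be computed with NumPy and scikit-learn; y_true, y_pred, and the feature count p below are made-up values.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical actual and predicted values
y_true = np.array([3.0, 5.0, 2.5, 7.0, 4.5])
y_pred = np.array([2.8, 5.4, 2.0, 6.5, 5.0])

mae = mean_absolute_error(y_true, y_pred)   # average absolute difference
mse = mean_squared_error(y_true, y_pred)    # average squared difference
rmse = np.sqrt(mse)                         # same units as the target variable
r2 = r2_score(y_true, y_pred)               # proportion of variance explained

# Adjusted R² penalizes for the number of predictors p, given n samples
n, p = len(y_true), 2                       # p = 2 is an assumed feature count
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(f"MAE={mae:.3f}  MSE={mse:.3f}  RMSE={rmse:.3f}  R2={r2:.3f}  AdjR2={adj_r2:.3f}")
```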
CHOOSING THE RIGHT METRIC FOR REGRESSION MODELS
MAE:
Good for understanding the average magnitude of error; less sensitive to outliers than MSE.
MSE / RMSE:
Penalize large errors more heavily; RMSE reports the error in the same units as the target variable.
R² / Adjusted R²:
Useful for judging how much variance the model explains and for comparing models with different numbers of features.
Additional Considerations:
Domain-Specific Metrics:
In some domains, specific metrics may be more relevant. For example, in financial forecasting, Mean Absolute
Percentage Error (MAPE) might be used (a short sketch follows this list).
Data Distribution:
The distribution of the target variable can influence the choice of metric. For example, if the target variable is
skewed or contains outliers, the choice between RMSE (which penalizes large errors heavily) and MAE (which is more robust to them) becomes especially important.
Model Interpretation:
While these metrics are essential for model evaluation, it's also important to interpret the model's predictions in the context of the problem domain.
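As a minimal sketch of MAPE (not part of the original slides), assuming the actual values are never zero; recent scikit-learn versions also provide mean_absolute_percentage_error. The numbers below are made up for illustration.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error (%); assumes y_true contains no zeros."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

# Hypothetical forecast vs. actual values
print(mape([100, 250, 400], [110, 240, 380]))  # ≈ 6.3
```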
BINARY CLASSIFICATION
MULTICLASS CLASSIFICATION
CONFUSION MATRIX
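As a sketch of what a confusion matrix looks like in code (labels below are made up), scikit-learn's confusion_matrix handles both the binary and the multiclass case; rows correspond to actual classes and columns to predicted classes.

```python
from sklearn.metrics import confusion_matrix

# Binary case
y_true_bin = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred_bin = [1, 0, 0, 1, 0, 1, 1, 0]
print(confusion_matrix(y_true_bin, y_pred_bin))
# Layout for binary labels [0, 1]:
# [[TN, FP],
#  [FN, TP]]

# Multiclass case: one row and column per class
y_true_mc = ["cat", "dog", "bird", "cat", "dog", "bird"]
y_pred_mc = ["cat", "dog", "cat", "cat", "bird", "bird"]
print(confusion_matrix(y_true_mc, y_pred_mc, labels=["bird", "cat", "dog"]))
```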
PERFORMANCE METRICS FOR CLASSIFICATION ALGORITHMS
Confusion Matrix-Based Metrics:
Accuracy: Overall, how often is the model correct?
Calculated as (TP + TN) / (TP + TN + FP + FN).
Precision: Of all the cases predicted as positive, how many are actually positive?
Calculated as TP / (TP + FP).
Recall (Sensitivity): Of all the actual positive cases, how many did the model correctly identify?
Calculated as TP / (TP + FN).
Specificity: Of all the actual negative cases, how many did the model correctly identify?
Calculated as TN / (TN + FP).
F1-Score: Harmonic mean of precision and recall. A good measure of balance between precision and recall.
Calculated as 2 × (Precision × Recall) / (Precision + Recall).
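A minimal sketch (with hypothetical predictions) showing these formulas computed from the confusion matrix counts and cross-checked against scikit-learn's built-in scorers; note that specificity has no dedicated helper and is derived from the counts.

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

# Hypothetical binary labels and predictions
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy    = (tp + tn) / (tp + tn + fp + fn)
precision   = tp / (tp + fp)
recall      = tp / (tp + fn)       # sensitivity
specificity = tn / (tn + fp)       # computed from counts; no direct sklearn helper
f1          = 2 * precision * recall / (precision + recall)

# The manual values match the library functions
assert abs(accuracy  - accuracy_score(y_true, y_pred))  < 1e-12
assert abs(precision - precision_score(y_true, y_pred)) < 1e-12
assert abs(recall    - recall_score(y_true, y_pred))    < 1e-12
assert abs(f1        - f1_score(y_true, y_pred))        < 1e-12
print(accuracy, precision, recall, specificity, f1)
```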
Other Metrics:
ROC Curve (Receiver Operating Characteristic Curve):
Plots the true positive rate (sensitivity) against the false positive rate (1 − specificity)
at various threshold settings.
AUC-ROC (Area Under the Curve):
Measures the overall performance of the model across all possible threshold settings.
Log Loss:
Measures how far the model's predicted probabilities are from the true labels; confident but wrong predictions are penalized heavily. Lower log loss indicates better performance.
Cohen's Kappa:
Measures the agreement between the predicted and actual classifications, accounting
for chance agreement.
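A sketch of these metrics with made-up labels and predicted probabilities; roc_curve, roc_auc_score, log_loss, and cohen_kappa_score are all available in sklearn.metrics.

```python
from sklearn.metrics import roc_curve, roc_auc_score, log_loss, cohen_kappa_score

# Hypothetical true labels and predicted positive-class probabilities
y_true  = [0, 0, 1, 1, 0, 1, 0, 1]
y_proba = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7]

fpr, tpr, thresholds = roc_curve(y_true, y_proba)  # points on the ROC curve
auc = roc_auc_score(y_true, y_proba)               # area under that curve
ll  = log_loss(y_true, y_proba)                    # penalizes confident wrong probabilities

# Cohen's kappa compares hard class predictions, so threshold the probabilities first
y_pred = [int(p >= 0.5) for p in y_proba]
kappa = cohen_kappa_score(y_true, y_pred)

print(f"AUC={auc:.3f}  LogLoss={ll:.3f}  Kappa={kappa:.3f}")
```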
CHOOSING THE RIGHT METRIC
The choice of metric depends on the specific problem and the
desired outcome. Consider the following factors:
Imbalanced Classes:
If the dataset has imbalanced classes, precision, recall, and F1-score can be more
informative than accuracy (see the sketch at the end of this section).
Cost-Sensitive Learning:
If misclassifications have different costs, precision, recall, and F1-score can be
weighted accordingly.
Overall Performance:
Accuracy is a good overall measure of performance, but it can be misleading in
imbalanced datasets.
Trade-off Between Precision and Recall:
The ROC curve and AUC-ROC help visualize the closely related trade-off between the true positive rate and the false positive rate; a precision-recall curve shows the trade-off between precision and recall directly.
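To illustrate the imbalanced-classes point above, here is a sketch with an artificial 95:5 class split, where a model that always predicts the majority class looks strong on accuracy but scores zero on precision, recall, and F1.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Artificial imbalanced data: 95 negatives, 5 positives
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100   # a "model" that always predicts the majority class

print("accuracy :", accuracy_score(y_true, y_pred))                     # 0.95, looks good
print("precision:", precision_score(y_true, y_pred, zero_division=0))   # 0.0
print("recall   :", recall_score(y_true, y_pred, zero_division=0))      # 0.0
print("f1       :", f1_score(y_true, y_pred, zero_division=0))          # 0.0
```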