Performance Metrics in Machine Learning (ML) and Deep Learning (DL)
In both Machine Learning (ML) and Deep Learning (DL), the performance of
a model is evaluated using various metrics, depending on the type of task
being solved, whether it is classification, regression, or another specialized
task. For regression tasks, there are specific performance metrics that help
assess how well a model predicts continuous numerical values.
Here, we will cover the key regression metrics, their applications, formulas,
and use cases; a short code sketch follows each metric.
1.) R² Score (Coefficient of Determination)
• The R² score (also known as the Coefficient of Determination) is one of the most commonly used metrics for
evaluating the performance of regression models. It measures how well the model’s predictions approximate the actual
values. An R² score of 1.0 indicates perfect predictions, while a score of 0.0 means the model performs no better than
predicting the mean of the target variable.
Applications:
• R² is used to assess the goodness-of-fit of a regression model.
• It’s most commonly applied in linear regression and generalized
linear models.
Interpretation:
• R² = 1: Perfect model.
• R² = 0: Model doesn’t explain any of the variance.
• Negative R²: Model performs worse than a model that just
predicts the mean of the target variable.
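As a minimal sketch (the array values below are made up purely for illustration), R² can be computed by hand from its definition, R² = 1 − SS_res / SS_tot, or with scikit-learn's r2_score:

import numpy as np
from sklearn.metrics import r2_score

# Made-up example values, used only to illustrate the computation
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# R^2 = 1 - SS_res / SS_tot
ss_res = np.sum((y_true - y_pred) ** 2)           # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, r2_score(y_true, y_pred))        # the two values agree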
2.) Mean Absolute Error (MAE)
• Mean Absolute Error (MAE) measures the average magnitude of errors between predicted values and
actual values. It provides a straightforward interpretation of prediction errors in the same units as the
target variable.
Applications:
• MAE is commonly used for regression tasks where the
emphasis is on minimizing the absolute differences between
actual and predicted values.
• It is easy to interpret, especially when dealing with non-technical
stakeholders.
Interpretation:
• The lower the MAE, the better the model’s predictions are.
• MAE = 0 means perfect predictions.
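A minimal sketch with the same made-up values; MAE is simply the average of the absolute differences, and scikit-learn's mean_absolute_error returns the same number:

import numpy as np
from sklearn.metrics import mean_absolute_error

# Made-up example values for illustration
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# MAE = (1/n) * sum(|y_true - y_pred|)
mae_manual = np.mean(np.abs(y_true - y_pred))

print(mae_manual, mean_absolute_error(y_true, y_pred))  # 0.5 for both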
3.) Mean Absolute Percentage Error (MAPE)
• Mean Absolute Percentage Error (MAPE) is used to express the prediction accuracy of the model as a percentage. It
represents the average absolute percentage difference between the actual and predicted values. MAPE is particularly
useful when comparing the performance of different models across datasets of varying scales.
Applications:
• MAPE is used in forecasting models, especially in time series
forecasting (e.g., sales predictions, stock prices).
• It is highly effective when you want to understand percentage
errors and provide a normalized metric for the model’s
performance.
Interpretation:
• MAPE = 0% means perfect predictions.
• A higher MAPE indicates that the model is less accurate.
• However, MAPE is sensitive to zero values in the actual
dataset, which can cause undefined results.
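A minimal NumPy sketch, assuming all actual values are non-zero so the percentage stays defined (the numbers are made up); recent scikit-learn versions also offer mean_absolute_percentage_error, which returns a fraction rather than a percentage:

import numpy as np

# Made-up example values; every actual value is non-zero so MAPE is defined
y_true = np.array([100.0, 200.0, 150.0, 80.0])
y_pred = np.array([110.0, 190.0, 140.0, 100.0])

# MAPE = (100/n) * sum(|(y_true - y_pred) / y_true|)
mape = 100 * np.mean(np.abs((y_true - y_pred) / y_true))
print(f"MAPE: {mape:.2f}%")  # about 11.67% for these values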
4.) Mean Directional Accuracy (MDA)
• Mean Directional Accuracy (MDA) measures the model’s ability to predict the correct direction of change, i.e.,
whether the next value will move up or down relative to the previous observation. It is a common metric in
applications like stock market prediction, where directionality is often more important than the exact predicted value.
Applications:
• MDA is especially important in financial
forecasting and time series prediction
where the direction of movement is
more critical than the exact magnitude of
the change.
Interpretation:
• MDA = 1 means the model always
predicts the correct direction.
• MDA = 0 means the model always predicts the
wrong direction; a value around 0.5 indicates it is
no better than random guessing.
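scikit-learn does not ship an MDA function, so the sketch below computes it directly with NumPy on a made-up series; the direction of each predicted step is compared with the direction of the actual step, both measured from the previous actual observation:

import numpy as np

# Made-up time-series values for illustration
actual    = np.array([100.0, 102.0, 101.0, 105.0, 104.0])
predicted = np.array([100.0, 101.5, 102.0, 104.0, 103.0])

# Direction of the actual change vs. direction of the predicted change,
# both taken relative to the previous actual observation
actual_dir    = np.sign(actual[1:] - actual[:-1])
predicted_dir = np.sign(predicted[1:] - actual[:-1])

mda = np.mean(actual_dir == predicted_dir)
print(f"MDA: {mda:.2f}")  # fraction of steps where the direction matched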
5.) Explained Variance Score (EVS)
• The Explained Variance Score (EVS) measures the proportion of variance in the target variable that is explained by the
model. It provides insight into the model’s ability to capture the variability in the data.
Applications:
• EVS is useful in cases where the proportion of explained
variance is important, such as in linear regression, ridge
regression, or principal component analysis (PCA).
Interpretation:
• EVS = 1 means the model explains all the variance.
• EVS = 0 means the model explains none of the variance.
• Negative EVS means the model is worse than predicting the
mean value.
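A minimal sketch using made-up values; scikit-learn exposes this metric as explained_variance_score, and it can also be computed directly as 1 − Var(y_true − y_pred) / Var(y_true):

import numpy as np
from sklearn.metrics import explained_variance_score

# Made-up example values for illustration
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# EVS = 1 - Var(residuals) / Var(y_true)
evs_manual = 1 - np.var(y_true - y_pred) / np.var(y_true)

print(evs_manual, explained_variance_score(y_true, y_pred))  # the two values agree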
Classification Performance Metrics
Accuracy is a simple metric that measures the overall correctness of a classification model, calculated as the ratio of correctly predicted instances
(both true positives and true negatives) to the total number of instances. It is effective when the classes in the dataset are balanced. However,
accuracy can be misleading in imbalanced datasets where one class vastly outnumbers the other, as a model could perform well simply by
predicting the majority class.
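A minimal sketch with made-up binary labels, using scikit-learn's accuracy_score:

from sklearn.metrics import accuracy_score

# Made-up binary labels for illustration
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Accuracy = (TP + TN) / total predictions
print(accuracy_score(y_true, y_pred))  # 6 of 8 correct -> 0.75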
Precision focuses on the accuracy of positive predictions, measuring the proportion of true positives out of all predicted positives. It is crucial in
scenarios where false positives are costly, such as in fraud detection or email spam filtering, where predicting a negative as positive could lead to
unnecessary actions or expenses.
Recall, also known as sensitivity or the true positive rate, calculates the proportion of true positives out of all actual positives. This metric is
important when the cost of missing positive instances is high, such as in medical diagnostics, where failing to identify a disease (false negative)
could have serious consequences.
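Precision and recall can be read off the same made-up labels used above; the sketch below uses scikit-learn's precision_score and recall_score:

from sklearn.metrics import precision_score, recall_score

# Same made-up labels as in the accuracy example
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Precision = TP / (TP + FP); Recall = TP / (TP + FN)
print(precision_score(y_true, y_pred))  # 3 of the 4 predicted positives are correct -> 0.75
print(recall_score(y_true, y_pred))     # 3 of the 4 actual positives are recovered -> 0.75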
F1 Score combines both precision and recall into a single metric by taking their harmonic mean. It is particularly useful when dealing with
imbalanced datasets, as it provides a balance between precision and recall, penalizing extreme values in either metric. It is often preferred over
accuracy when there is a need to handle both false positives and false negatives effectively.
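Continuing with the same made-up labels, scikit-learn's f1_score returns the harmonic mean of the precision and recall computed above:

from sklearn.metrics import f1_score

# Same made-up labels as in the previous examples
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# F1 = 2 * (precision * recall) / (precision + recall)
print(f1_score(y_true, y_pred))  # harmonic mean of 0.75 and 0.75 -> 0.75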
ROC-AUC (Receiver Operating Characteristic - Area Under Curve) measures the trade-off between the true positive rate and the false positive
rate across different thresholds. The AUC represents the model’s ability to distinguish between classes, with a higher AUC indicating better overall
performance. It is a powerful metric for evaluating binary classifiers, especially in imbalanced datasets, as it does not depend on a fixed threshold
for classification.
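ROC-AUC is computed from predicted scores or probabilities rather than hard labels; the scores below are made up for illustration and passed to scikit-learn's roc_auc_score:

from sklearn.metrics import roc_auc_score

# Made-up labels and predicted probabilities for the positive class
y_true  = [1, 0, 1, 1, 0, 1, 0, 0]
y_score = [0.9, 0.2, 0.4, 0.8, 0.3, 0.7, 0.6, 0.1]

# AUC is threshold-independent: it reflects how well positives are ranked above negatives
print(roc_auc_score(y_true, y_score))  # 0.9375 for these made-up scores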
Confusion Matrix provides a detailed breakdown of a model’s performance by showing the counts of true positives, false positives, true negatives,
and false negatives. It helps to visualize where the model is making errors and is the foundation for many of the other performance metrics. It is
especially useful when precision and recall are important and helps in understanding how the model is performing across different classes.
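A minimal sketch with the same made-up labels; scikit-learn's confusion_matrix returns the counts with actual classes as rows and predicted classes as columns:

from sklearn.metrics import confusion_matrix

# Same made-up labels as in the earlier classification examples
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# For binary 0/1 labels the layout is:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))
# [[3 1]
#  [1 3]]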