Unit III IML Final
Classification
Evaluating the performance of your classification model is crucial to ensure its accuracy and
effectiveness. While accuracy is important, it’s just one piece of the puzzle. There are several other
evaluation metrics that provide a more comprehensive understanding of your model’s performance.
This article will discuss these metrics and how they can guide you in making the right decisions to
improve your model’s predictive power.
Classification Metrics in Machine Learning
Classification is the task of predicting a class label for given input data. In binary classification, there are only two possible output classes (i.e., a dichotomy), while in multiclass classification more than two classes can be present. I'll focus only on binary classification here.
A very common example of binary classification is spam detection, where the input data could include
the email text and metadata (sender, sending time), and the output label is either “spam” or “not spam.”
(See Figure.) The two classes are also sometimes called "positive" and "negative," or "class 1" and "class 0."
There are many ways to measure classification performance. Accuracy, the confusion matrix, log loss, and AUC-ROC are some of the most popular metrics, and precision-recall is also widely used for classification problems.
When a model reports an accuracy of 99%, you might think it is performing very well, but accuracy alone can be misleading in some situations. Consider the example below.
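For instance (with made-up numbers), suppose a spam filter is evaluated on 1,000 emails of which only 10 are spam. A model that blindly labels every email as "not spam" scores 99% accuracy yet catches no spam at all. A minimal sketch of this, assuming scikit-learn is available:

from sklearn.metrics import accuracy_score, recall_score

# Hypothetical labels: 990 legitimate emails (0) and 10 spam emails (1)
y_true = [0] * 990 + [1] * 10
# A "model" that always predicts "not spam"
y_pred = [0] * 1000

print(accuracy_score(y_true, y_pred))  # 0.99 -- looks excellent
print(recall_score(y_true, y_pred))    # 0.0  -- every spam email is missed

Accuracy looks excellent here only because the classes are heavily imbalanced, which is exactly why the metrics below matter.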
Confusion Matrix
A confusion matrix is a performance measurement for machine learning classification problems where the output can be two or more classes. It is a table of the combinations of predicted and actual values.
True Positive: We predicted positive and it’s true. In the image, we predicted that a woman is
pregnant and she actually is.
True Negative: We predicted negative and it’s true. In the image, we predicted that a man is
not pregnant and he actually is not.
False Positive (Type 1 Error): We predicted positive and it’s false. In the image, we predicted
that a man is pregnant but he actually is not.
False Negative (Type 2 Error): We predicted negative and it’s false. In the image, we predicted
that a woman is not pregnant but she actually is.
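As a quick illustration (labels invented for this sketch, assuming scikit-learn), the four counts can be read straight out of the confusion matrix:

from sklearn.metrics import confusion_matrix

# 1 = positive (e.g. "pregnant"), 0 = negative; made-up labels
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

# Rows are actual classes, columns are predicted classes
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")  # TP=3, TN=3, FP=1, FN=1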
We have discussed accuracy; now let's look at some other metrics derived from the confusion matrix.
Precision
Precision tells us how many of the cases predicted as positive actually turned out to be positive. It is useful when a False Positive is a bigger concern than a False Negative, for example in music or video recommendation systems and e-commerce websites, where wrong results could lead to customer churn and harm the business.
Precision for a label is defined as the number of true positives divided by the number of predicted
positives.
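In confusion-matrix terms, Precision = TP / (TP + FP). A small sketch, reusing the made-up labels from the confusion-matrix example above:

from sklearn.metrics import precision_score

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

# 3 true positives out of 4 predicted positives
print(precision_score(y_true, y_pred))  # 0.75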
Recall (Sensitivity)
It explains how many of the actual positive cases we were able to predict correctly with our model.
Recall is a useful metric in cases where False Negative is of higher concern than False Positive.
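In confusion-matrix terms, Recall = TP / (TP + FN), i.e., the fraction of actual positives the model finds. Continuing the same made-up labels:

from sklearn.metrics import recall_score

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

# 3 of the 4 actual positives were caught
print(recall_score(y_true, y_pred))  # 0.75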
F1 Score
It gives a combined idea about Precision and Recall metrics. It is maximum when Precision is equal to
Recall.
F1 Score is the harmonic mean of precision and recall.
The F1 score punishes extreme values more. It can be an effective evaluation metric in the following cases:
When FP and FN are equally costly.
When adding more data doesn't effectively change the outcome.
When the number of True Negatives is high.
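As a formula, F1 = 2 × (Precision × Recall) / (Precision + Recall). A short sketch with the same made-up labels:

from sklearn.metrics import f1_score

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

# Harmonic mean of precision (0.75) and recall (0.75)
print(f1_score(y_true, y_pred))  # 0.75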
AUC-ROC
The Receiver Operating Characteristic (ROC) is a probability curve that plots the TPR (True Positive Rate) against the FPR (False Positive Rate) at various threshold values, separating the 'signal' from the 'noise'.
The Area Under the Curve (AUC) measures the ability of a classifier to distinguish between classes; in the graph, it is the area enclosed by the ROC curve and the X and Y axes.
From the graph shown below, the greater the AUC, the better the model is at separating the positive and negative classes across threshold points. When AUC is equal to 1, the classifier can perfectly distinguish all Positive and Negative class points. When AUC is equal to 0, the classifier predicts all Negatives as Positives and vice versa. When AUC is 0.5, the classifier cannot distinguish between the Positive and Negative classes at all.
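A minimal sketch of computing AUC from predicted probabilities, assuming scikit-learn and using invented scores:

from sklearn.metrics import roc_auc_score

y_true  = [0, 0, 1, 1, 0, 1]
# Predicted probability of the positive class from some hypothetical classifier
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]

print(roc_auc_score(y_true, y_score))  # ~0.89 -- close to 1, so the scores separate the classes well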
Working of AUC
In a ROC curve, the X-axis shows the False Positive Rate (FPR) and the Y-axis shows the True Positive Rate (TPR). A higher X value means a larger number of False Positives (FP) relative to True Negatives (TN), while a higher Y value means a larger number of True Positives (TP) relative to False Negatives (FN). So the choice of threshold depends on how you want to balance FP against FN.
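The sketch below (same invented scores, assuming scikit-learn) lists the (FPR, TPR) point produced at each threshold, which is exactly the trade-off described above:

from sklearn.metrics import roc_curve

y_true  = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]

fpr, tpr, thresholds = roc_curve(y_true, y_score)
for f, t, thr in zip(fpr, tpr, thresholds):
    # Each threshold gives one (FPR, TPR) point on the ROC curve
    print(f"threshold={thr:.2f}  FPR={f:.2f}  TPR={t:.2f}")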
Log Loss
Log loss (logistic loss), or cross-entropy loss, is one of the major metrics to assess the performance of a classification problem.
For a single sample with true label y∈{0,1} and a probability estimate p=Pr(y=1), the log loss is:
Log loss = −(y · log(p) + (1 − y) · log(1 − p))
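A quick sketch with scikit-learn's log_loss, using invented probabilities:

from sklearn.metrics import log_loss

y_true = [1, 0, 1, 1]
# Predicted probability of class 1 for each sample
y_prob = [0.9, 0.1, 0.8, 0.4]

# Averages -(y*log(p) + (1-y)*log(1-p)) over the samples
print(log_loss(y_true, y_prob))  # ~0.34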
Mean Absolute Error (MAE)
Mean absolute error, or L1 loss, is one of the simplest and most easily understood loss functions and evaluation metrics. It is computed by averaging the absolute differences between predicted and actual values across the dataset. Mathematically, it is the arithmetic mean of the absolute errors, focusing solely on their magnitude, irrespective of direction. A lower MAE indicates better model accuracy.
MAE = (1/n) Σᵢ |yᵢ − ŷᵢ|
where yᵢ is the actual value, ŷᵢ is the predicted value, and n is the number of samples.
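A minimal sketch with made-up values, assuming scikit-learn:

from sklearn.metrics import mean_absolute_error

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5,  0.0, 2.0, 8.0]

# Mean of |3-2.5|, |-0.5-0|, |2-2|, |7-8| = (0.5 + 0.5 + 0 + 1) / 4
print(mean_absolute_error(y_true, y_pred))  # 0.5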
Mean Bias Error (MBE)
In Mean Bias Error, bias reflects the tendency of a measurement process to overestimate or underestimate a parameter. Bias has a single direction, positive or negative: a positive bias means the model tends to overestimate, while a negative bias means it tends to underestimate. MBE calculates the mean difference between predicted and actual values, quantifying the overall bias without taking absolute values. It is similar to MAE, differing only in that it does not take the absolute value. Caution is needed with MBE, as positive and negative errors can cancel each other out.
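scikit-learn has no dedicated MBE helper, so here is a short NumPy sketch with the same made-up values:

import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 8.0])

# Signed errors can cancel: (-0.5 + 0.5 + 0.0 + 1.0) / 4
mbe = np.mean(y_pred - y_true)
print(mbe)  # 0.25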
Relative Absolute Error (RAE)
RAE measures the performance of a predictive model and is expressed as a ratio. A good model has an RAE close to zero, with zero being the best value; a value of one means the model is no better than simply predicting the mean of the target, and values above one mean it is worse. This error shows how the total absolute residual relates to the total absolute deviation of the target values from their mean.
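RAE likewise has no dedicated scikit-learn helper; a NumPy sketch under the definition above, with the same made-up values:

import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 8.0])

# Sum of absolute errors relative to sum of absolute deviations of y_true from its mean
rae = np.sum(np.abs(y_true - y_pred)) / np.sum(np.abs(y_true - y_true.mean()))
print(rae)  # ~0.24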
Mean Squared Error (MSE)
MSE is also known as quadratic loss because the penalty is proportional not to the error but to the square of the error. Squaring the error gives higher weight to outliers and produces a smooth gradient for small errors.
Optimization algorithms benefit from this penalization of large errors, as it helps find optimum parameter values using the least-squares method. MSE can never be negative since the errors are squared, and its value ranges from zero to infinity. MSE grows quadratically as the error increases. A good model has an MSE close to zero, indicating a better fit to the data.
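A minimal MSE sketch with the same made-up values, assuming scikit-learn:

from sklearn.metrics import mean_squared_error

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5,  0.0, 2.0, 8.0]

# Mean of 0.25, 0.25, 0.0, 1.0 -- the squared error weights the largest error most
print(mean_squared_error(y_true, y_pred))  # 0.375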