2-Training and Testing Models, Evaluation Metrics-01-07-2023

Training and Testing Models

• Machine Learning algorithms enable machines to make predictions and solve problems on the basis of past observations or experiences.
• An algorithm takes these experiences or observations from the training data that is fed to it.
• Further, one of the great things about ML algorithms is that they can learn and improve over time on their own, as they are trained with relevant training data.
• Once the model has been trained enough with the relevant training data, it is tested with the test data.
• We can understand the whole process of training and testing in three steps (a minimal Python sketch follows this list):
• Feed: First, we train the model by feeding it the training input data.
• Define: The training data is tagged with the corresponding outputs (in Supervised Learning), and the model transforms the training data into feature vectors or a set of numeric data features.
• Test: In the last step, we test the model by feeding it the test data (an unseen dataset). This step checks whether the model has been trained effectively and can generalize well.
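A minimal sketch of this feed/define/test workflow, assuming scikit-learn and a synthetic dataset (both are illustrative choices, not prescribed by the slides):

# Feed / Define / Test sketch with scikit-learn (illustrative assumption)
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Feed: labelled training data
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define: the model learns a mapping from feature vectors to labels
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Test: evaluate generalization on unseen data
print("Test accuracy:", model.score(X_test, y_test))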

Training vs Testing
• The main difference between training data and testing data is that training data is the subset of the original data used to train the machine learning model, whereas testing data is used to check the accuracy of the model.
• The training dataset is generally larger than the testing dataset. Common ratios for splitting train and test datasets are 80:20, 70:30, or 90:10 (see the sketch below).
• Training data is well known to the model since it is used for training, whereas testing data is like unseen/new data to the model.
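A short sketch of those split ratios, assuming scikit-learn's train_test_split (an illustrative choice of helper):

# Common train/test split ratios expressed with scikit-learn
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(-1, 1)   # 100 toy samples
y = np.arange(100)

for test_size in (0.2, 0.3, 0.1):   # 80:20, 70:30, 90:10
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=0)
    print(f"train={len(X_train)}  test={len(X_test)}")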
Evaluation Metrics
• Performance metrics are used to evaluate the effectiveness of a machine learning model.

Performance Metrics for Regression
• Regression analysis is a subfield of supervised machine learning.
• It aims to model the relationship between a certain number of features and a continuous target variable.
• The following performance metrics are used for evaluating a regression model (a scikit-learn sketch follows the list):
• Mean Absolute Error (MAE)
• Mean Squared Error (MSE)
• Root Mean Squared Error (RMSE)
• R-Squared
• Adjusted R-Squared
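A hedged sketch of these metrics, assuming scikit-learn and the toy values used in the worked examples below (actual = 5, 10, 15, 20; predicted = 4.8, 10.6, 14.3, 20.1):

# Regression metrics on the slides' toy predictions
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([5, 10, 15, 20])
y_pred = np.array([4.8, 10.6, 14.3, 20.1])

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)                      # square root of MSE
r2 = r2_score(y_true, y_pred)

print(f"MAE={mae:.3f}  MSE={mse:.3f}  RMSE={rmse:.3f}  R^2={r2:.4f}")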

Mean Squared Error (MSE)
• MSE is the average of the squared differences between actual and predicted values: MSE = (1/n) * Σ (actual_i - predicted_i)^2.
• Because the error term is squared, MSE is more sensitive to outliers than Mean Absolute Error (MAE).
• Thus, for actual values (5, 10, 15, 20) and predictions (4.8, 10.6, 14.3, 20.1):
MSE = 1/4 * ((5-4.8)^2 + (10-10.6)^2 + (15-14.3)^2 + (20-20.1)^2) = 0.225

Root Mean Squared Error (RMSE)
• Since MSE contains squared error terms, we take the square root of the MSE, which gives the Root Mean Squared Error: RMSE = sqrt(MSE). This puts the error back in the same units as the target variable.
• Thus, RMSE = (0.225)^0.5 ≈ 0.474

R-Squared
• R-squared is calculated by dividing the sum of squared residuals from the regression model (SS_res) by the total sum of squares of errors from the average model (SS_tot) and subtracting the result from 1: R^2 = 1 - SS_res / SS_tot.
• R-squared is also known as the Coefficient of Determination.
• It measures the degree to which the input variables explain the variation of the output/predicted variable (a short sketch follows).
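A minimal sketch of R^2 computed from its definition, reusing the toy values above (assumed NumPy implementation):

# R^2 = 1 - SS_res / SS_tot
import numpy as np

y_true = np.array([5, 10, 15, 20])
y_pred = np.array([4.8, 10.6, 14.3, 20.1])

ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares (average model)
r2 = 1 - ss_res / ss_tot
print(f"R^2 = {r2:.4f}")                         # ~0.9928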
Adjusted R-squared
• Adjusted R^2 = 1 - (1 - R^2) * (N - 1) / (N - p - 1), where N is the total sample size (number of rows) and p is the number of predictors (number of columns).
• The limitation of R-squared is that it either stays the same or increases with the addition of more variables, even if they have no relationship with the output variable.
• To overcome this limitation, Adjusted R-squared penalizes you for adding variables that do not improve the existing model.
• Hence, if you are building a linear regression on multiple variables, it is always suggested that you use Adjusted R-squared to judge the goodness of the model.
• If there is only one input variable, R-squared and Adjusted R-squared are the same (a small sketch of the formula follows).
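A sketch of that formula (the example values of R^2, N, and p are illustrative assumptions):

# Adjusted R^2 = 1 - (1 - R^2) * (N - 1) / (N - p - 1)
def adjusted_r2(r2: float, n_samples: int, n_predictors: int) -> float:
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)

# Adding predictors (p) without improving R^2 lowers the adjusted score
print(adjusted_r2(r2=0.90, n_samples=50, n_predictors=5))    # ~0.8886
print(adjusted_r2(r2=0.90, n_samples=50, n_predictors=10))   # ~0.8744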
Performance Metrics for Classification
• Classification is the problem of identifying to which of a set of categories/classes a new observation belongs, based on a training set of data containing records whose class labels are known.
• The following performance metrics are used for evaluating a classification model:
• Accuracy
• Precision and Recall
• Specificity
• F1-score
• AUC-ROC

• To understand the different metrics, we must first understand the confusion matrix.
• A confusion matrix is a table that is often used to describe the performance of a classification model (or "classifier") on a set of test data for which the true values are known (a small scikit-learn example follows).
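A hedged sketch of building a confusion matrix with scikit-learn (the label vectors below are made up for illustration):

# Confusion matrix for a binary problem
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TN={tn}  FP={fp}  FN={fn}  TP={tp}")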

For Multiclass
(Slide figure not reproduced: the confusion matrix generalizes to multiple classes, with one row and one column per class.)
• TN- True negatives (actual 0 predicted 0)
• TP- True positives (actual 1 predicted 1)
• FP- False positives (actual 0 predicted 1)
• FN- False Negatives (actual 1 predicted 0)
• Consider the following values for the confusion
matrix-
• True negatives (TN) = 300
• True positives (TP) = 500
• False negatives (FN) = 150
• False positives (FP) = 50

Accuracy
• Accuracy is defined as the ratio of the number of correct predictions to the total number of predictions: Accuracy = (TP + TN) / (TP + TN + FP + FN). Its value lies in [0, 1].
• In general, higher accuracy means a better model (TP and TN must be high).
• However, accuracy is not a useful metric in the case of an imbalanced dataset (a dataset with an uneven distribution of classes).
• Say we have data for 1000 patients, of which 50 have cancer and 950 do not. A dumb model that always predicts "no cancer" will have an accuracy of 95%, but it is of no practical use, since in this case we want the number of False Negatives to be at a minimum.
• Thus, we have different metrics like recall, precision, F1-score, etc.
• Thus, Accuracy using the above values will be (500+300)/(500+50+150+300) = 800/1000 = 80%

Recall
• Recall = TP / (TP + FN). It is a useful metric in the case of cancer detection, where we want to minimize the number of False Negatives, since we don't want the model to mark a patient suffering from cancer as safe.
• On the other hand, predicting a healthy patient as cancerous is not as big an issue, since further diagnosis will make it clear that he does not have cancer. Recall is also known as Sensitivity.
• Thus, Recall using the above values will be 500/(500+150) = 500/650 = 76.92%

Precision
• Precision = TP / (TP + FP). It is useful when we want to reduce the number of False Positives.
• Consider a system that predicts whether a received e-mail is spam or not. Taking spam as the positive class, we do not want the system to predict non-spam e-mails (important e-mails) as spam, i.e., the aim is to reduce the number of False Positives.
• Thus, Precision using the above values will be 500/(500+50) = 500/550 = 90.91%

Specificity
• Specificity is defined as the ratio of True Negatives to the sum of True Negatives and False Positives: Specificity = TN / (TN + FP).
• We want the value of specificity to be high. Its value lies in [0, 1].
• Thus, Specificity using the above values will be 300/(300+50) = 300/350 = 85.71%
F1-score
• F1-score is a metric that combines Precision and Recall; it is the harmonic mean of the two: F1 = 2 * (Precision * Recall) / (Precision + Recall).
• Its value lies in [0, 1] (the higher the value, the better the F1-score).
• Using Precision = 0.9091 and Recall = 0.7692, F1-score = 0.8333 = 83.33% (a sketch verifying all of the classification metrics above follows).
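A hedged verification sketch computing Accuracy, Recall, Precision, Specificity, and F1-score directly from the confusion-matrix counts used above (TP=500, TN=300, FP=50, FN=150):

# Classification metrics from raw confusion-matrix counts
TP, TN, FP, FN = 500, 300, 50, 150

accuracy = (TP + TN) / (TP + TN + FP + FN)
recall = TP / (TP + FN)                    # sensitivity
precision = TP / (TP + FP)
specificity = TN / (TN + FP)
f1 = 2 * precision * recall / (precision + recall)

print(f"Accuracy    = {accuracy:.2%}")     # 80.00%
print(f"Recall      = {recall:.2%}")       # 76.92%
print(f"Precision   = {precision:.2%}")    # 90.91%
print(f"Specificity = {specificity:.2%}")  # 85.71%
print(f"F1-score    = {f1:.2%}")           # 83.33%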

AUC-ROC
• The AUC (Area Under the Curve) - ROC (Receiver Operating Characteristic) curve is one of the most important evaluation metrics for checking a classification model's performance.
• The ROC curve is plotted with FPR on the X-axis and TPR on the Y-axis, and AUC is the area under that curve.
• An AUC of 1 corresponds to a perfect classifier and 0.5 to random guessing; if the value is less than 0.5, then the model is even worse than a random guessing model (a short sketch follows).
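A sketch of computing the ROC curve and AUC with scikit-learn (the labels and scores below are illustrative assumptions):

# ROC curve (FPR vs TPR) and AUC
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.3])  # predicted probabilities for class 1

fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("FPR:", fpr)
print("TPR:", tpr)
print("AUC:", roc_auc_score(y_true, y_score))   # 1.0 = perfect, 0.5 = random guessing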