
Introduction to Machine Learning

Course Teacher:
Dr. M. Shahidur Rahman
Professor, DoCSE, SUST
Model Performance and Evaluation Metrics

Topics covered:
 Evaluation Metrics
 Model Performance Evaluation
 Model Selection
Model Performance and Evaluation Metrics

 In the classification domain, the simplest visualization of the success of a model is normally described using the confusion matrix.
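
As an illustration, a confusion matrix can be computed with scikit-learn as follows; the labels and predictions below are made-up placeholder values:

from sklearn.metrics import confusion_matrix

# Hypothetical true labels and model predictions (placeholder values)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]]
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()
print(cm)
print(f"TP={tp}, FP={fp}, TN={tn}, FN={fn}")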
Evaluation Metrics

 Accuracy: (TP + TN) / (TP + TN + FP + FN)

 True positive rate (TPR) or recall or hit rate or sensitivity: TP / (TP + FN)

 Precision or positive predictive value: TP / (TP + FP)

 F1 Score: 2 × (Precision × Recall) / (Precision + Recall)
Evaluation Metrics…

 Specificity or true negative rate (TNR): TN / (TN + FP)

 Miss rate or false negative rate (FNR): FN / (FN + TP)

 False Positive Rate (FPR): FP / (FP + TN)
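
As a quick illustration of these definitions, the base measures can be computed directly from the four confusion-matrix counts; the counts used below are arbitrary placeholders:

def rates(tp, fp, tn, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)           # TPR / sensitivity / hit rate
    precision = tp / (tp + fp)        # positive predictive value
    specificity = tn / (tn + fp)      # TNR
    fpr = fp / (fp + tn)              # false positive rate
    fnr = fn / (fn + tp)              # miss rate
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, specificity, fpr, fnr, f1

print(rates(tp=40, fp=10, tn=45, fn=5))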


Evaluation Metrics…

 Accuracy and classification error are informative measures of success when the data is balanced in terms of the classes.
 When the data is imbalanced, i.e., one class is represented in larger proportion than the other class in the dataset, these measures become biased towards the majority class and give a wrong estimate of success.
 In such cases, base measures, such as true positive rate (TPR), false positive rate (FPR), true negative rate (TNR), and false negative rate (FNR), become useful.
 Metrics such as the F1 score combine the base measures to give an overall measure of success.
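
A small illustration of this bias, using made-up numbers: a classifier that always predicts the majority class reaches 95% accuracy on a 95:5 imbalanced set, while its TPR is 0.

# Hypothetical imbalanced test set: 95 negatives, 5 positives
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100              # a "classifier" that always predicts the majority class

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
print(accuracy)        # 0.95 -- looks good
print(tp / (tp + fn))  # TPR (recall) = 0.0 -- reveals the failure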
Evaluation Metrics…

 The curve that plots TPR and FPR for a classifier at various thresholds is known as the receiver-operating characteristic (ROC) curve.
 Precision and recall can be plotted at different thresholds, giving the precision-recall curve (PRC).
 The areas under each curve are respectively known as auROC and auPRC and are popular metrics of performance.
 In particular, auPRC is generally considered to be an informative metric in the presence of imbalanced classes.
ROC AUC

 A perfect classifier would fall into the top-left corner of the graph with a TPR of 1 and an FPR of 0.
 Based on the ROC curve, we compute the ROC area under the curve (ROC AUC) to characterize the performance of a classification model.
 Higher ROC AUC means better classification performance.
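
For illustration, one common way to obtain the ROC curve and ROC AUC is scikit-learn's roc_curve and roc_auc_score; the labels and predicted scores below are placeholder values:

from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical true labels and predicted probabilities (placeholder values)
y_true  = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.9]

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # points of the ROC curve
auc = roc_auc_score(y_true, y_score)               # area under that curve
print(auc)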
Regression Evaluation Metrics

 Average prediction error: (1/n) Σ (yᵢ - ŷᵢ)

 Mean absolute error (MAE): (1/n) Σ |yᵢ - ŷᵢ|

 Root mean squared error (RMSE): √( (1/n) Σ (yᵢ - ŷᵢ)² )

 Relative squared error (RSE) is used when two errors are measured in different units: Σ (yᵢ - ŷᵢ)² / Σ (yᵢ - ȳ)²

(Here yᵢ is the actual value, ŷᵢ the predicted value, ȳ the mean of the actual values, and n the number of examples.)
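
These regression metrics can be computed in a few lines of NumPy; the targets and predictions below are placeholder values:

import numpy as np

# Hypothetical targets and predictions (placeholder values)
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

mae  = np.mean(np.abs(y_true - y_pred))
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
rse  = np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)
print(mae, rmse, rse)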
Ratio for partitioning a dataset into training and test datasets
 In general, we don't want to allocate too much information to the test set.
 However, the smaller the test set, the more inaccurate the estimation of the
generalization error.
 Dividing a dataset into training and test datasets is all about balancing this
tradeoff.
 In practice, the most commonly used splits are 60:40, 70:30, or 80:20,
depending on the size of the initial dataset.
 For large datasets, 90:10 or 99:1 splits are also common and appropriate.
 For example, if the dataset contains more than 100,000 training examples, it
might be fine to withhold only 10,000 examples for testing in order to get a
good estimate of the generalization performance.
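
For example, an 80:20 split can be produced with scikit-learn's train_test_split; the Iris data here is only a stand-in dataset for illustration:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)   # stand-in dataset

# 80:20 split; stratify keeps the class proportions similar in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1)
print(X_train.shape, X_test.shape)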
Underfitting and overfitting

 A model can also suffer from underfitting (high bias), which means that our model is not complex enough to capture the pattern in the training data well and also suffers from low performance on unseen data.
 If a model is too complex for a given training dataset (too many parameters in the model), it tends to overfit the training data and does not generalize well to unseen data.
Debugging algorithms with learning and validation curves
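
As a sketch of the idea (the model and dataset are stand-ins, not the ones used in the slides), a learning curve can be generated with scikit-learn's learning_curve and inspected for the underfitting and overfitting patterns described above:

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = load_breast_cancer(return_X_y=True)   # stand-in dataset

# Training and validation accuracy at increasing training-set sizes
sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=10000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=10)
print(sizes)
print(train_scores.mean(axis=1))  # high train, low validation -> overfitting
print(val_scores.mean(axis=1))    # both low -> underfitting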
Hyperparameter tuning

 Validation techniques are meant to answer the question of how to select a model(s) with the right hyperparameter values.
 Hyperparameters are parameters set before training a machine learning model. They are not learned from the data but are manually configured to optimize model performance. Examples: learning rate (𝛼), number of trees, kernel type in SVM.

[Figure: validation curve for hyperparameter C, the inverse regularization parameter of the LogisticRegression classifier, where C=1 provides the best performance.]
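
A sketch of a validation curve over C for a LogisticRegression classifier, along the lines of the figure described above; the dataset and parameter range are stand-ins chosen only for illustration:

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import validation_curve

X, y = load_breast_cancer(return_X_y=True)   # stand-in dataset

# Validation accuracy for different values of the inverse regularization strength C
param_range = [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]
train_scores, val_scores = validation_curve(
    LogisticRegression(max_iter=10000), X, y,
    param_name="C", param_range=param_range, cv=10)
for C, score in zip(param_range, val_scores.mean(axis=1)):
    print(C, round(score, 3))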
Holdout cross-validation

 A classic approach for estimating the generalization performance of ML models is holdout cross-validation, in which the dataset is split into separate training and test (and, for model selection, validation) sets.
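
A minimal sketch of a holdout split into training, validation, and test sets; the 60:20:20 proportions and the Iris data are assumptions used only for illustration:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)   # stand-in dataset

# Holdout validation: 60% train, 20% validation (for tuning), 20% test (final estimate)
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=1)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=1)
print(len(X_train), len(X_val), len(X_test))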
K-fold cross validation

 The validation process needs a large number of labeled data points for creating the training set and the validation set.
 Collecting a large labeled set is usually difficult.
 In such cases, instead of physically separating the training set and validation set, k-fold cross-validation is used.
K-fold cross validation…

 Once we have found satisfactory hyperparameter values, we can retrain the model on the complete training dataset and obtain a final performance estimate using the independent test dataset.
 The value of k in k-fold cross-validation is typically k = 10.
 A special case of k-fold cross-validation is the leave-one-out cross-validation (LOOCV) method, where k = n, the number of training examples.
 LOOCV is recommended for working with very small datasets.
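
A sketch of 10-fold cross-validation (and LOOCV as the special case k = n) with scikit-learn's cross_val_score; the model and dataset are stand-ins:

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, LeaveOneOut

X, y = load_breast_cancer(return_X_y=True)   # stand-in dataset
model = LogisticRegression(max_iter=10000)

# 10-fold cross-validation: mean accuracy over the 10 held-out folds
scores = cross_val_score(model, X, y, cv=10)
print(scores.mean(), scores.std())

# LOOCV is the special case k = n (one fold per training example)
loo_scores = cross_val_score(model, X, y, cv=LeaveOneOut())
print(loo_scores.mean())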
Model Selection

 Use cross-validation or k-fold cross-validation for fine-tuning the performance of an ML model by varying its hyperparameter values.
 Choose the model that performs best on relevant criteria such as accuracy.
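
One common way to combine k-fold cross-validation with hyperparameter search is GridSearchCV; the model, grid, and dataset below are stand-ins chosen only for illustration:

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_breast_cancer(return_X_y=True)   # stand-in dataset

# 10-fold CV over a small grid of C values; the best model is selected by mean accuracy
grid = GridSearchCV(LogisticRegression(max_iter=10000),
                    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
                    cv=10, scoring="accuracy")
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)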
