Machine Learning # 2
Course Teacher:
Dr. M. Shahidur Rahman
Professor, DoCSE, SUST
Model Performance and Evaluation Metrics
Topics covered:
Evaluation Metrics
Model Performance Evaluation
Model Selection
Model Performance and Evaluation Metrics
Accuracy: the fraction of all predictions that are correct, (TP + TN) / (TP + TN + FP + FN).
F1 Score: the harmonic mean of precision and recall, 2 · (Precision · Recall) / (Precision + Recall).
Evaluation Metrics…
Specificity: the true negative rate, TN / (TN + FP).
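A minimal sketch of how these confusion-matrix-based metrics could be computed with scikit-learn (the labels below are made-up example data, not taken from the course material):

```python
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score, f1_score

# Hypothetical ground-truth and predicted labels for a binary problem
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 0])

# Confusion matrix layout for binary labels {0, 1}:
# [[TN, FP],
#  [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)          # also called sensitivity / TPR
specificity = tn / (tn + fp)     # true negative rate
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f}  f1={f1:.3f}  specificity={specificity:.3f}")
# Cross-check against scikit-learn's built-in implementations
print(accuracy_score(y_true, y_pred), f1_score(y_true, y_pred))
```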
The curve that plots the true positive rate (TPR = TP / (TP + FN)) against the false positive rate (FPR = FP / (FP + TN)) at various classification thresholds is known as the receiver operating characteristic (ROC) curve.
Precision and recall can likewise be plotted at different thresholds, giving the precision-recall curve (PRC).
The areas under each curve are respectively known as auROC and auPRC
and are popular metrics of performance.
In particular, auPRC is generally considered to be an informative metric
in the presence of imbalanced classes.
ROC AUC
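A minimal sketch, using scikit-learn on a synthetic imbalanced dataset (an assumed example, not part of the slides), of how the ROC and precision-recall curve points and the auROC / auPRC scores could be computed:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import (roc_curve, precision_recall_curve,
                             roc_auc_score, average_precision_score)

# Synthetic, imbalanced binary dataset (about 10% positives)
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = clf.predict_proba(X_test)[:, 1]   # predicted probability of class 1

fpr, tpr, roc_thresholds = roc_curve(y_test, scores)            # ROC curve points
precision, recall, pr_thresholds = precision_recall_curve(y_test, scores)

print("auROC:", roc_auc_score(y_test, scores))
print("auPRC:", average_precision_score(y_test, scores))  # average precision, a common auPRC estimate
```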
Relative squared error (RSE) is used when errors need to be compared across targets measured in different units, because it normalizes the model's squared error by the squared error of a simple mean predictor:
RSE = Σᵢ (yᵢ − ŷᵢ)² / Σᵢ (yᵢ − ȳ)², where ȳ is the mean of the observed values.
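A minimal sketch of RSE as a NumPy helper (relative_squared_error is a hypothetical function name introduced here only for illustration):

```python
import numpy as np

def relative_squared_error(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    numerator = np.sum((y_true - y_pred) ** 2)            # model's squared error
    denominator = np.sum((y_true - y_true.mean()) ** 2)   # mean-predictor's squared error
    return numerator / denominator

# RSE < 1 means the model beats the naive mean predictor
print(relative_squared_error([3.0, 5.0, 2.5, 7.0], [2.5, 5.0, 4.0, 8.0]))
```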
Ratio for partitioning a dataset into training and test datasets
In general, we don't want to allocate too much of the data to the test set, since that data is withheld from training the model.
However, the smaller the test set, the more inaccurate the estimation of the
generalization error.
Dividing a dataset into training and test datasets is all about balancing this
tradeoff.
In practice, the most commonly used splits are 60:40, 70:30, or 80:20,
depending on the size of the initial dataset.
For large datasets, 90:10 or 99:1 splits are also common and appropriate.
For example, if the dataset contains more than 100,000 training examples, it
might be fine to withhold only 10,000 examples for testing in order to get a
good estimate of the generalization performance.
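A minimal sketch of an 80:20 split with scikit-learn's train_test_split (the Iris dataset is used here only as a stand-in):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,       # 80:20 split; 0.1 or 0.01 may suit very large datasets
    stratify=y,          # keep the class proportions in both partitions
    random_state=42)     # reproducible shuffling

print(len(X_train), "training examples,", len(X_test), "test examples")
```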
Underfitting and overfitting
A model can suffer from underfitting (high bias), meaning it is not complex enough to capture the pattern in the training data well, so it performs poorly on unseen data as well.
If a model is too complex for a given training dataset (it has too many parameters), it tends to overfit the training data and does not generalize well to unseen data (high variance).
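A minimal sketch (an assumed example, not from the slides) contrasting underfitting and overfitting by fitting polynomials of increasing degree to noisy data and comparing training and test error:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Noisy samples of a sine wave
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 1, 60))[:, None]
y = np.sin(2 * np.pi * X.ravel()) + rng.normal(scale=0.2, size=60)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

for degree in (1, 4, 15):   # too simple, reasonable, too complex
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```

High error on both splits (degree 1) signals underfitting, while a low training error paired with a much higher test error (degree 15) signals overfitting.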
Debugging algorithms with learning and validation curves
Hyperparameter tuning
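A minimal sketch of a validation curve with scikit-learn, sweeping a single hyperparameter (the SVM's gamma, chosen here only as an example) and comparing cross-validated training and validation scores:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.svm import SVC
from sklearn.model_selection import validation_curve

X, y = load_digits(return_X_y=True)
param_range = np.logspace(-6, -1, 5)   # candidate gamma values

# 5-fold cross-validated scores for each candidate value of gamma
train_scores, val_scores = validation_curve(
    SVC(), X, y,
    param_name="gamma",
    param_range=param_range,
    cv=5)

for g, tr, va in zip(param_range,
                     train_scores.mean(axis=1),
                     val_scores.mean(axis=1)):
    print(f"gamma={g:.0e}  train acc={tr:.3f}  validation acc={va:.3f}")
```

A training score far above the validation score points to overfitting for that hyperparameter value, while low scores on both point to underfitting.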