
Lec 10

This document provides an overview of the contents of Lecture 10 of the EE2211 Introduction to Machine Learning course. It discusses dataset partitioning into training, validation, and test sets. It also covers evaluation metrics for assessing model performance and gives an example of using a validation set to select hyperparameters when training a random forest model for face classification.

Uploaded by

kiuclairdelune

EE2211 Introduction to Machine Learning
Lecture 10

Wang Xinchao
[email protected]

© Copyright EE, NUS. All Rights Reserved.
Course Contents
• Introduction and Preliminaries (Xinchao)
– Introduction
– Data Engineering
– Introduction to Linear Algebra, Probability and Statistics
• Fundamental Machine Learning Algorithms I (Helen)
– Systems of linear equations
– Least squares, Linear regression
– Ridge regression, Polynomial regression
• Fundamental Machine Learning Algorithms II (Helen)
– Over-fitting, bias/variance trade-off
– Optimization, Gradient descent
– Decision Trees, Random Forest
• Performance and More Algorithms (Xinchao)
– Performance Issues
– K-means Clustering
– Neural Networks
[Important] In the Final, there are no coding questions for Xinchao’s part! Although you will see some in the tutorials, they won’t be tested.
EE2211: Learning Outcome
A Summary of Module Content
• I am able to understand the formulation of a machine learning task
– Lecture 1 (feature extraction + classification)
– Lecture 4 to Lecture 9 (regression and classification)
– Lecture 11 and Lecture 12 (clustering and neural network)
• I am able to relate the fundamentals of linear algebra and probability to machine
learning
– Lecture 2 (recap of probability and linear algebra)
– Lecture 4 to Lecture 8 (regression and classification)
– Lecture 12 (neural network)
• I am able to prepare the data for supervised learning and unsupervised learning
– Lecture 1 (feature extraction), Page 26 to 31 [For supervised and unsupervised]
– Lecture 2 (data wrangling) [For supervised and unsupervised]
– Lecture 10 (Training/Validation/Test) [For supervised]
– Programming Exercises in tutorials
• I am able to evaluate the performance of a machine learning algorithm
– Lecture 5 to Lecture 9 (evaluate the difference between labels and predictions)
– Lecture 10 (evaluation metrics)
• I am able to implement regression and classification algorithms
– Lecture 5 to Lecture 9
Outline
• Dataset Partition:
– Training/Validation/Testing

• Cross Validation

• Evaluation Metrics
– Evaluating the quality of a trained classifier

We will talk about many metrics. It is OK if you can’t memorize them all, but intuition is important!
A Real-world Scenario
• We would like to train a Random Forest for face classification (i.e., to tell whether an image is a human face or not)

[Figure: example images of faces and non-faces]
Suppose these data points are all we have, and we want to use them t[…] algorithm’s performance on new unseen data.
– We will have one dataset to train the Random Forest
– We will have tunable (hyper)parameters for the Random Forest, for example, the number of trees in the Random Forest
• Shall we use 100 trees?
• Shall we use 200 trees?
• …
We need to decide on the parameter
– Once we decide on the number of trees, we will test the Random Forest with the selected parameter on unseen test data.

[Figure: the trained Random Forest labels each unseen test image as a face ("Yes!") or a non-face ("No!")]
Training, Validation, and Test
• In real-world applications,
– We don’t have test data, since they are unseen
– Imagine you develop a face detector app: you don’t know whom you will test it on
• In lab practice,
– We divide the dataset into three parts
• Training set: for training the ML models
• Validation set: for validation, i.e., choosing the parameter or model
• Test set (hidden from training!): for testing the “real” performance and generalization
– NEVER touch test data during training!!!
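The three-way partition can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function name `split_dataset`, the fixed seed, and the 150-sample toy dataset are assumptions made for the example, not part of the lecture material.

```python
import random


def split_dataset(samples, n_val, n_test, seed=0):
    """Shuffle `samples` and split them into training, validation, and test lists."""
    if n_val + n_test >= len(samples):
        raise ValueError("validation and test sets must leave room for training data")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)        # fixed seed keeps the split reproducible
    test_set = shuffled[:n_test]                 # set aside first, never used for training
    val_set = shuffled[n_test:n_test + n_val]    # used only to choose parameters or models
    train_set = shuffled[n_test + n_val:]        # used to fit the models
    return train_set, val_set, test_set


# Hypothetical dataset of 150 sample indices, split 100/25/25.
train_set, val_set, test_set = split_dataset(range(150), n_val=25, n_test=25)
print(len(train_set), len(val_set), len(test_set))  # 100 25 25
```

Shuffling before slicing is a design choice that avoids a split that follows the original ordering of the data (for example, samples sorted by class).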


[Figure: the dataset partitioned into a training set, a validation set, and a test set]

Example: Assume I want to build a Random Forest. I have a parameter to decide: shall I have
• 100 trees?
• 200 trees?

What we do next is use the training set to train two classifiers:
1) $C_1$: Random Forest with 100 trees
2) $C_2$: Random Forest with 200 trees

They have the following validation accuracy:
1. $C_1$: Random Forest with 100 trees: validation accuracy 90%
2. $C_2$: Random Forest with 200 trees: validation accuracy 88%

Which one should we choose for the real application, i.e., testing?

The one with the higher validation accuracy, i.e., the Random Forest with 100 trees!
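The selection rule in this example amounts to taking the candidate with the largest validation accuracy. A minimal sketch, assuming the accuracies have already been measured on the validation set (the helper name `select_by_validation` and the dictionary keys are hypothetical):

```python
def select_by_validation(validation_accuracy):
    """Return the name of the candidate with the highest validation accuracy."""
    return max(validation_accuracy, key=validation_accuracy.get)


# Validation accuracies from the example: 90% for 100 trees, 88% for 200 trees.
validation_accuracy = {
    "C1 (100 trees)": 0.90,
    "C2 (200 trees)": 0.88,
}
best = select_by_validation(validation_accuracy)
print(best)  # C1 (100 trees)
```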

Python Demo: lec10.ipynb

• Problem Setup
– Dataset used: Iris dataset
• Link: https://scikit-learn.org/stable/datasets/toy_dataset.html#iris-dataset
– Training/Validation/Test: 100/25/25
– Machine Learning Task and Model: Polynomial regression
– Parameters to select: Order 1 to 10
In the Final, no coding questions for Xinchao’s part!
k-fold Cross Validation
• In practice, we do k-fold cross validation (illustrated here with 4-fold cross validation)

Step 1: Take out the test set from the dataset.
Step 2: We partition the remaining part of the dataset (after taking out the test set) into k equal parts (equal in terms of number of samples).
Step 3: We run k folds (i.e., k times) of experiments. Within each fold, we use one part as the validation set and the k-1 remaining parts as the training set. We use different validation sets for different folds.

| Fold   | Part 1     | Part 2     | Part 3     | Part 4     |
| Fold 1 | Train      | Train      | Train      | Validation |
| Fold 2 | Train      | Train      | Validation | Train      |
| Fold 3 | Train      | Validation | Train      | Train      |
| Fold 4 | Validation | Train      | Train      | Train      |
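The fold construction can be sketched with index lists only. This is a minimal sketch under two assumptions that are not in the slides: the number of samples is divisible by k, and the parts are contiguous index ranges (in practice the data would usually be shuffled first). The name `k_fold_splits` is hypothetical.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, val_indices) for each of the k folds."""
    if n_samples % k != 0:
        raise ValueError("this sketch assumes n_samples is divisible by k")
    part_size = n_samples // k
    parts = [list(range(p * part_size, (p + 1) * part_size)) for p in range(k)]
    for fold in range(k):
        val_part = k - 1 - fold  # fold 1 validates on the last part, fold k on the first
        val_indices = parts[val_part]
        train_indices = [i for p in range(k) if p != val_part for i in parts[p]]
        yield train_indices, val_indices


# 8 samples remaining after the test set has been taken out, split into 4 folds.
splits = list(k_fold_splits(n_samples=8, k=4))
for fold, (train_indices, val_indices) in enumerate(splits, start=1):
    print(f"Fold {fold}: train={train_indices} val={val_indices}")
```

Every sample is used for validation exactly once and for training k-1 times, which is what makes the averaged validation score less dependent on one particular split.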

Suppose we have two candidates:
1) $C_1$: Random Forest with 100 trees
2) $C_2$: Random Forest with 200 trees

Classifiers trained in each fold, where $C_i^{(j)}$ denotes candidate $i$ trained in fold $j$:
Fold 1: $C_1^{(1)}$, $C_2^{(1)}$
Fold 2: $C_1^{(2)}$, $C_2^{(2)}$
Fold 3: $C_1^{(3)}$, $C_2^{(3)}$
Fold 4: $C_1^{(4)}$, $C_2^{(4)}$

Step 3.1: Within each fold, if we have n parameter/model candidates, we will train n models and check their validation performance.
Example: which one to select for test?

| Classifier                              | Fold 1: accuracy on Validation Set 1 | Fold 2: accuracy on Validation Set 2 | Fold 3: accuracy on Validation Set 3 | Fold 4: accuracy on Validation Set 4 | Average accuracy on all validation sets |
| Classifier with Param1 (e.g. 100 trees) | 88% ($C_1^{(1)}$)                    | 89% ($C_1^{(2)}$)                    | 93% ($C_1^{(3)}$)                    | 92% ($C_1^{(4)}$)                    | 90.5%                                   |
| Classifier with Param2 (e.g. 200 trees) | 90% ($C_2^{(1)}$)                    | 88% ($C_2^{(2)}$)                    | 91% ($C_2^{(3)}$)                    | 91% ($C_2^{(4)}$)                    | 90%                                     |

Step 4: We select the parameter/model with the best average validation performance over the k folds.
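Step 4 is an average followed by an arg-max. The sketch below reuses the per-fold accuracies from the example table; the variable names and dictionary keys are hypothetical.

```python
# Per-fold validation accuracies taken from the example table.
validation_accuracy = {
    "Param1 (100 trees)": [0.88, 0.89, 0.93, 0.92],
    "Param2 (200 trees)": [0.90, 0.88, 0.91, 0.91],
}


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Average each candidate over the k folds, then keep the best average.
average_accuracy = {name: mean(acc) for name, acc in validation_accuracy.items()}
best = max(average_accuracy, key=average_accuracy.get)

for name, avg in average_accuracy.items():
    print(f"{name}: average validation accuracy {avg:.3f}")
print("selected:", best)
```

Note that Param2 wins in fold 1 but loses on average, so the selection is made on the average rather than on any single fold.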
Other common partitioning:
• 10-Fold CV
• 5-Fold CV
• 3-Fold CV

We may decide on the size of the test set, for example, 15%, 20%, or 30% of the whole dataset, with the rest used for training/validation.
• The test set contains examples that the learning algorithm has never seen before,
• So test performance shows how well our model generalizes.

Example:

Xinchao uses k-fold cross validation to obtain an optimal parameter for his model (e.g., decision tree). With this parameter, the model achieves an accuracy of 0.8 on the test set.

Helen uses k-fold cross validation to obtain an optimal parameter for her model (e.g., random forest). With this parameter, the model achieves an accuracy of 0.9 on the test set.

We can say that Helen’s model generalizes better than Xinchao’s model.

Take-home message:
Validation performance -> Selecting parameters!
Test performance -> The “real” performance of a model with the selected parameter!
Training, Validation, and Test
• However, validation is not always used:
– Validation is used when you need to pick parameters or models
– If you have no models or parameters to compare, you may consider partitioning the data into only training and test sets
Evaluation Metrics
Regression

[Figure: test samples]

Mean Square Error:
$$\mathrm{MSE} = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n}$$

Mean Absolute Error:
$$\mathrm{MAE} = \frac{\sum_{i=1}^{n} |y_i - \hat{y}_i|}{n}$$

where $y_i$ denotes the target output and $\hat{y}_i$ denotes the predicted output for sample $i$.
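Both regression metrics translate directly into code. The sketch below is a plain-Python illustration; the function names and the four target/prediction pairs are made up for the example.

```python
def mean_square_error(targets, predictions):
    """MSE: average of the squared differences between targets and predictions."""
    n = len(targets)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(targets, predictions)) / n


def mean_absolute_error(targets, predictions):
    """MAE: average of the absolute differences between targets and predictions."""
    n = len(targets)
    return sum(abs(y - y_hat) for y, y_hat in zip(targets, predictions)) / n


# Hypothetical test samples: target outputs and the model's predicted outputs.
targets = [1.0, 2.0, 3.0, 4.0]
predictions = [1.5, 2.0, 2.0, 5.0]

mse = mean_square_error(targets, predictions)    # (0.25 + 0 + 1 + 1) / 4 = 0.5625
mae = mean_absolute_error(targets, predictions)  # (0.5 + 0 + 1 + 1) / 4 = 0.625
print(mse, mae)
```

Because the errors are squared, MSE penalizes the two errors of size 1 much more heavily than the error of size 0.5, whereas MAE weighs all errors linearly.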

Classification
Class-1: Positive Class
Class-2: Negative Class

Confusion Matrix

|                  | Class-1 (predicted) | Class-2 (predicted) |
| Class-1 (actual) | 7 (TP)              | 7 (FN)              |
| Class-2 (actual) | 2 (FP)              | 25 (TN)             |

TP: True Positive
FN: False Negative (i.e., Type II Error)
FP: False Positive (i.e., Type I Error)
TN: True Negative
For the confusion matrix with 7 (TP), 7 (FN), 2 (FP), and 25 (TN):
• How many samples in the dataset have the real label of Class-2?
• How many samples are there in total?
• How many samples are correctly classified? How many are incorrectly classified?
Confusion Matrix for Binary Classification

|            | $\hat{P}$ (predicted)  | $\hat{N}$ (predicted) |                                  |
| P (actual) | TP                     | FN                    | Recall = TP/(TP+FN)              |
| N (actual) | FP                     | TN                    |                                  |
|            | Precision = TP/(TP+FP) |                       | Accuracy = (TP+TN)/(TP+TN+FP+FN) |
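As an illustration, the three formulas can be applied to the earlier confusion matrix with 7 (TP), 7 (FN), 2 (FP), and 25 (TN). The function name `binary_metrics` is a hypothetical helper for this sketch.

```python
def binary_metrics(tp, fn, fp, tn):
    """Precision, recall, and accuracy from the four confusion-matrix counts."""
    return {
        "precision": tp / (tp + fp),                  # of the predicted positives, how many are right
        "recall": tp / (tp + fn),                     # of the actual positives, how many are found
        "accuracy": (tp + tn) / (tp + tn + fp + fn),  # of all samples, how many are right
    }


metrics = binary_metrics(tp=7, fn=7, fp=2, tn=25)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Here the precision is 7/9, the recall is 7/14 = 0.5, and the accuracy is 32/41, so the classifier looks noticeably better on accuracy and precision than on recall.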

Cost Matrix for Binary Classification

|            | $\hat{P}$ (predicted)       | $\hat{N}$ (predicted)       |
| P (actual) | $C_{p,p} \cdot \mathrm{TP}$ | $C_{p,n} \cdot \mathrm{FN}$ |
| N (actual) | $C_{n,p} \cdot \mathrm{FP}$ | $C_{n,n} \cdot \mathrm{TN}$ |

Total cost:
$$C_{p,p} \cdot \mathrm{TP} + C_{p,n} \cdot \mathrm{FN} + C_{n,p} \cdot \mathrm{FP} + C_{n,n} \cdot \mathrm{TN}$$

Main idea: assign different penalties to different entries, with higher penalties for more severe results.

Usually, $C_{p,p}$ and $C_{n,n}$ are set to 0; $C_{n,p}$ and $C_{p,n}$ may or may not be equal.
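The total cost is a weighted sum of the four confusion-matrix counts. In the sketch below, the function name, the default costs, the counts, and the choice of a tenfold penalty for false negatives are all illustrative assumptions rather than values from the lecture.

```python
def total_cost(tp, fn, fp, tn, c_pp=0.0, c_pn=1.0, c_np=1.0, c_nn=0.0):
    """Total cost = C_pp*TP + C_pn*FN + C_np*FP + C_nn*TN."""
    return c_pp * tp + c_pn * fn + c_np * fp + c_nn * tn


# Hypothetical counts: 5 TP, 5 FN, 5 FP, 985 TN.
equal_cost = total_cost(tp=5, fn=5, fp=5, tn=985)                # 5*1 + 5*1 = 10.0
fn_heavy_cost = total_cost(tp=5, fn=5, fp=5, tn=985, c_pn=10.0)  # 5*10 + 5*1 = 55.0
print(equal_cost, fn_heavy_cost)
```

With correct predictions costing 0, only the errors contribute, and raising one error cost makes the same confusion matrix look much worse when that kind of error is the severe one.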
• Example of cost matrix
– Assume we would like to develop a self-driving car system
– We have an ML system that detects pedestrians using a camera, by conducting binary classification
• When it detects a person (positive class), the car should stop
• When no person is detected (negative class), the car keeps going

True Positive (cost $C_{p,p}$): There is a person; ML detects the person and the car stops
True Negative (cost $C_{n,n}$): There is no person; the car keeps going
False Positive (cost $C_{n,p}$): There is no person; ML detects a person and the car stops
False Negative (cost $C_{p,n}$): There is a person; ML fails to detect the person and the car keeps going

Question: $C_{n,p}$ ? $C_{p,n}$ (>, <, or =)

Credit: automotiveworld.com
• Handling unbalanced data
– Assume we have 1000 samples, of which 10 are positive and 990 are negative

|                  | Class-1 (predicted) | Class-2 (predicted) |
| Class-1 (actual) | 5 (TP)              | 5 (FN)              |
| Class-2 (actual) | 5 (FP)              | 985 (TN)            |

– Accuracy = 990/1000 = 0.99!
– Yet, half of the Class-1 samples are classified as Class-2!

The goal is to highlight the problems in the results!

In this case, we shall
1) Use a cost matrix, assigning different costs to each entry
2) Use Precision and Recall! Precision = 0.5 and Recall = 0.5
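For reference, the three numbers in this example follow from the confusion-matrix counts TP = 5, FN = 5, FP = 5, TN = 985 by the standard definitions of accuracy, precision, and recall:

```latex
\begin{align*}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} = \frac{5 + 985}{1000} = 0.99 \\
\text{Precision} &= \frac{TP}{TP + FP} = \frac{5}{5 + 5} = 0.5 \\
\text{Recall}    &= \frac{TP}{TP + FN} = \frac{5}{5 + 5} = 0.5
\end{align*}
```

The 985 true negatives dominate the accuracy, while precision and recall ignore them entirely, which is why they expose the poor performance on the positive class.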
(True Positive Rate) TPR = TP/(TP+FN), i.e., Recall
(False Negative Rate) FNR = FN/(TP+FN)
(True Negative Rate) TNR = TN/(FP+TN)
(False Positive Rate) FPR = FP/(FP+TN)

TPR + FNR = 1 (100% of positive-class data)
TNR + FPR = 1 (100% of negative-class data)
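The four rates and the two identities can be checked numerically. This is a small sketch; the function name `rates` and the counts (5 TP, 5 FN, 5 FP, 985 TN) are illustrative assumptions.

```python
def rates(tp, fn, fp, tn):
    """TPR, FNR, TNR, and FPR from the four confusion-matrix counts."""
    return {
        "TPR": tp / (tp + fn),  # share of actual positives predicted positive (recall)
        "FNR": fn / (tp + fn),  # share of actual positives predicted negative
        "TNR": tn / (fp + tn),  # share of actual negatives predicted negative
        "FPR": fp / (fp + tn),  # share of actual negatives predicted positive
    }


r = rates(tp=5, fn=5, fp=5, tn=985)
print(r)
print(r["TPR"] + r["FNR"], r["TNR"] + r["FPR"])  # both sums equal 1 up to rounding
```

TPR and FNR share the denominator TP+FN (all actual positives), and TNR and FPR share FP+TN (all actual negatives), which is why each pair sums to 1.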

Prediction function y = f(x)

| Sample       | N1   | N2   | P1   | N3  | P2   | P3   |
| Input x      | -4   | -3   | -2.5 | -2  | -1.5 | -0.5 |
| Prediction y | -1.1 | -0.5 | -0.1 | 0.2 | 0.6  | 0.9  |
| Actual label | -1   | -1   | 1    | -1  | 1    | 1    |

If the threshold is set to y = 0:
N3, P2, P3 will be taken as +1
P1, N2, N1 will be taken as -1

|            | $\hat{P}$ (predicted) | $\hat{N}$ (predicted) |
| P (actual) | TP = 2                | FN = 1                |
| N (actual) | FP = 1                | TN = 2                |
We can change the threshold!

If the threshold is set to y = 0.4:
P2, P3 will be taken as +1
N3, P1, N2, N1 will be taken as -1

|            | $\hat{P}$ (predicted) | $\hat{N}$ (predicted) |
| P (actual) | TP = 2                | FN = 1                |
| N (actual) | FP = 0                | TN = 3                |
TP, FP, FN, TN will change with respect to the threshold!

| Threshold | Taken as +1 | Taken as -1    | TP | FN | FP | TN |
| y = 0     | N3, P2, P3  | P1, N2, N1     | 2  | 1  | 1  | 2  |
| y = 0.4   | P2, P3      | N3, P1, N2, N1 | 2  | 1  | 0  | 3  |
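The effect of the threshold can be reproduced with the six samples from the example. In this sketch the function name is hypothetical, and treating a prediction as positive when it is strictly greater than the threshold is an assumption (no prediction here equals a threshold, so the choice does not affect the result).

```python
# Predictions y = f(x) and actual labels for samples N1, N2, P1, N3, P2, P3.
predictions = [-1.1, -0.5, -0.1, 0.2, 0.6, 0.9]
labels = [-1, -1, 1, -1, 1, 1]


def confusion_counts(predictions, labels, threshold):
    """Return (TP, FN, FP, TN) when predictions above `threshold` are taken as +1."""
    tp = fn = fp = tn = 0
    for y, label in zip(predictions, labels):
        predicted_positive = y > threshold
        if label == 1 and predicted_positive:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


print(confusion_counts(predictions, labels, threshold=0.0))  # (2, 1, 1, 2)
print(confusion_counts(predictions, labels, threshold=0.4))  # (2, 1, 0, 3)
```

Raising the threshold from 0 to 0.4 moves N3 (prediction 0.2) from the predicted-positive side to the predicted-negative side, turning one false positive into a true negative.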

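Confusion Matrix for Multicategory Classification

|                | $\hat{P}_1$ (predicted) | $\hat{P}_2$ (predicted) | … | $\hat{P}_C$ (predicted) |
| $P_1$ (actual) | $P_{1,\hat{1}}$         | $P_{1,\hat{2}}$         | … | $P_{1,\hat{C}}$         |
| $P_2$ (actual) | $P_{2,\hat{1}}$         | $P_{2,\hat{2}}$         | … | $P_{2,\hat{C}}$         |
| ⁞              | ⁞                       | ⁞                       |   | ⁞                       |
| $P_C$ (actual) | $P_{C,\hat{1}}$         | $P_{C,\hat{2}}$         | … | $P_{C,\hat{C}}$         |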
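A multicategory confusion matrix can be filled by counting (actual, predicted) pairs, with rows indexed by the actual class and columns by the predicted class. The sketch below assumes classes are encoded as integers 0, 1, 2; the function name and the seven labelled samples are made up for illustration.

```python
def confusion_matrix(actual, predicted, n_classes):
    """Return a matrix whose entry [a][p] counts samples of actual class a predicted as p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


# Hypothetical labels for seven samples from three classes.
actual = [0, 0, 1, 1, 2, 2, 2]
predicted = [0, 1, 1, 1, 2, 0, 2]

matrix = confusion_matrix(actual, predicted, n_classes=3)
for row in matrix:
    print(row)

# Correct predictions lie on the diagonal, so accuracy is the trace over the total.
accuracy = sum(matrix[c][c] for c in range(3)) / len(actual)
print(accuracy)
```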

Other Issues
• Computational speed and memory consumption are also important factors
– Especially for mobile or edge devices

• Other factors
– Parallelizability, modularity, maintainability

• Not the focus of this module
Practice Question

Suppose we have a dataset of 550 samples. We take out n samples as the test set, and run k-fold cross validation on the remaining samples.

Within each fold, we know that the number of training samples is three times as large as the number of validation samples, and two times as large as the number of test samples.

1. What is k?
2. What is n?
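One way to work through the question, assuming the remaining 550 - n samples are split into k equal parts so that each fold has one validation part and k - 1 training parts:

```latex
\begin{align*}
\text{validation samples per fold} &= \frac{550 - n}{k}, \qquad
\text{training samples per fold} = \frac{(k-1)(550 - n)}{k} \\
\text{training} = 3 \times \text{validation}
  &\;\Rightarrow\; k - 1 = 3 \;\Rightarrow\; k = 4 \\
\text{training} = 2 \times \text{test}
  &\;\Rightarrow\; \tfrac{3}{4}(550 - n) = 2n
  \;\Rightarrow\; 1650 = 11n \;\Rightarrow\; n = 150
\end{align*}
```

As a check, 550 - 150 = 400 samples remain, giving 100 validation and 300 training samples per fold; 300 is three times 100 and two times 150.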
