Machine Learning
Lecture 10
Wang Xinchao
[email protected]
© Copyright EE, NUS. All Rights Reserved.
Course Contents
• Introduction and Preliminaries (Xinchao)
– Introduction
– Data Engineering
– Introduction to Linear Algebra, Probability and Statistics
• Fundamental Machine Learning Algorithms I (Helen)
– Systems of linear equations
– Least squares, Linear regression
– Ridge regression, Polynomial regression
• Fundamental Machine Learning Algorithms II (Helen)
– Over-fitting, bias/variance trade-off
– Optimization, Gradient descent
– Decision Trees, Random Forest
• Performance and More Algorithms (Xinchao)
– Performance Issues
– K-means Clustering
– Neural Networks
[Important] In the Final, there are no coding questions for Xinchao’s part! Although you will see some coding in the tutorials, it won’t be tested.
EE2211: Learning Outcome
A Summary of Module Content
• I am able to understand the formulation of a machine learning task
– Lecture 1 (feature extraction + classification)
– Lecture 4 to Lecture 9 (regression and classification)
– Lecture 11 and Lecture 12 (clustering and neural network)
• I am able to relate the fundamentals of linear algebra and probability to machine
learning
– Lecture 2 (recap of probability and linear algebra)
– Lecture 4 to Lecture 8 (regression and classification)
– Lecture 12 (neural network)
• I am able to prepare the data for supervised learning and unsupervised learning
– Lecture 1 (feature extraction), Page 26 to 31 [For supervised and unsupervised]
– Lecture 2 (data wrangling) [For supervised and unsupervised]
– Lecture 10 (Training/Validation/Test) [For supervised]
– Programming Exercises in tutorials
• I am able to evaluate the performance of a machine learning algorithm
– Lecture 5 to Lecture 9 (evaluate the difference between labels and predictions)
– Lecture 10 (evaluation metrics)
• I am able to implement regression and classification algorithms
– Lecture 5 to Lecture 9
Outline
• Dataset Partition:
– Training/Validation/Testing
• Cross Validation
• Evaluation Metrics
– Evaluating the quality of a trained classifier
A Real-world Scenario
• We would like to train a Random Forest for face
classification (i.e., to tell whether an image is a human face or not)
[Figure: example images labeled Faces and Non-faces]
A Real-world Scenario
• We would like to train a Random Forest for face
classification (i.e., to tell whether an image is a human face or not)
– We will have one dataset to train the Random Forest
Suppose these data points are all we have, and we want to use them to evaluate the algorithm’s performance on new unseen data.
A Real-world Scenario
• We would like to train a Random Forest for face
classification (i.e., to tell whether an image is a human face or not)
– We will have one dataset to train the Random Forest
– We will have tunable (hyper)parameters for the Random Forest.
For example, the number of trees in the Random Forest
• Shall we use 100 trees?
• Shall we use 200 trees?
• …
We need to decide on the parameter
[Figure: Random Forest with Tree 1, Tree 2, …]
A Real-world Scenario
• We would like to train a Random Forest for face
classification (i.e., to tell whether an image is a human face or not)
– We will have one dataset to train the Random Forest
– We will have tunable (hyper)parameters for the Random Forest.
For example, the number of trees in the Random Forest
• Shall we use 100 trees?
• Shall we use 200 trees?
• …
We need to decide on the parameter
– Once we decide the number of trees, we will test the Random Forest
with the selected parameter on unseen test data.
[Figure: unseen test images with predictions Yes! / No!]
Training, Validation, and Test
• In real-world applications,
– We don’t have test data, since they are unseen
– Imagine you develop a face detector app: you don’t know whom you will test it on
• In lab practice,
– We divide the dataset into three parts:
• Training set: for training the ML models
• Validation set: for choosing the parameter or model
• Test set: for testing the “real” performance and generalization; hidden from training!
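The three-way split can be sketched with scikit-learn’s train_test_split applied twice. This is a minimal toy illustration (not the lecture’s code); the array sizes are made up to mirror a 100/25/25 split:

```python
# Toy sketch of a training/validation/test split using scikit-learn.
# The data and split sizes are made up for illustration.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(150).reshape(150, 1)   # toy features
y = np.arange(150)                   # toy targets

# First take out the test set (hidden from training).
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=25, random_state=0)
# Then split the remainder into training and validation sets.
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=25, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 100 25 25
```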
Training, Validation, and Test
Example: Assume I want to build a Random Forest. I have a parameter to decide: shall I have
• 100 Trees?
• 200 Trees?
Training, Validation, and Test
Python Demo:
lec10.ipynb
• Problem Setup
– Dataset used: IRIS dataset
• Link: https://scikit-learn.org/stable/datasets/toy_dataset.html#iris-dataset
– Training/Validation/Test: 100/25/25
– Machine Learning Task and Model: Polynomial regression
– Parameters to select: Order 1 to 10
In the Final, no coding questions for Xinchao’s part!
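The demo’s order-selection loop is not reproduced here; the following is a hedged sketch of the same idea (IRIS data, polynomial orders 1 to 10, pick the order with the lowest validation MSE) using scikit-learn’s PolynomialFeatures and LinearRegression — the split sizes and random_state are assumptions:

```python
# Hedged sketch (not the actual lec10.ipynb): select the polynomial
# order by validation error, then evaluate on the held-out test set.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

X, y = load_iris(return_X_y=True)
# 100/25/25 split, as in the problem setup.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=25, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=25, random_state=0)

best_order, best_mse = None, float("inf")
for order in range(1, 11):
    poly = PolynomialFeatures(order)
    model = LinearRegression().fit(poly.fit_transform(X_train), y_train)
    mse = mean_squared_error(y_val, model.predict(poly.transform(X_val)))
    if mse < best_mse:
        best_order, best_mse = order, mse

# Report the generalization performance with the selected order.
poly = PolynomialFeatures(best_order)
model = LinearRegression().fit(poly.fit_transform(X_train), y_train)
test_mse = mean_squared_error(y_test, model.predict(poly.transform(X_test)))
print(best_order, test_mse)
```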
k-fold Cross Validation
• In practice, we do the k-fold cross validation
4-fold cross validation
Step 1: Take out the test set from the dataset.
k-fold Cross Validation
• In practice, we do the k-fold cross validation
4-fold cross validation
Step 2: Partition the remaining part of the dataset (after taking out the test set) into k equal parts (equal in terms of number of samples).
k-fold Cross Validation
• In practice, we do the k-fold cross validation
4-fold cross validation
Fold 1: Train on the other three parts and validate on the held-out part; one classifier is trained for each candidate parameter setting.
k-fold Cross Validation
• The test set contains examples that the learning
algorithm has never seen before,
• so the test performance shows how well our model
generalizes.
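The fold rotation can be sketched with scikit-learn’s KFold; the toy data sizes and random_state below are assumptions, chosen so that a 4-fold split gives equal parts:

```python
# Minimal sketch of 4-fold cross validation with scikit-learn's KFold:
# hold out a test set first, then rotate the validation fold.
import numpy as np
from sklearn.model_selection import KFold, train_test_split

X = np.arange(40).reshape(40, 1)   # toy features
y = np.arange(40) % 2              # toy labels

# Step 1: take out the test set.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=8, random_state=0)

# Steps 2-3: split the remaining 32 samples into k = 4 equal parts;
# each part serves as the validation set exactly once.
kf = KFold(n_splits=4, shuffle=True, random_state=0)
fold_sizes = []
for train_idx, val_idx in kf.split(X_rest):
    fold_sizes.append((len(train_idx), len(val_idx)))

print(fold_sizes)  # [(24, 8), (24, 8), (24, 8), (24, 8)]
```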
Outline
• Dataset Partition:
– Training/Validation/Testing
• Cross Validation
• Evaluation Metrics
– Evaluating the quality of a trained classifier
Evaluation Metrics
Regression (computed over the n test samples)
• Mean Square Error: MSE = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²
• Mean Absolute Error: MAE = (1/n) Σ_{i=1}^{n} |y_i − ŷ_i|
where y_i denotes the target output and ŷ_i denotes the predicted output for sample i.
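Both metrics are one-liners with NumPy; the targets and predictions below are made up for illustration:

```python
# Computing MSE and MAE for made-up targets and predictions.
import numpy as np

y = np.array([1.0, 2.0, 3.0])        # targets y_i (made up)
y_hat = np.array([1.5, 1.5, 3.0])    # predictions ŷ_i (made up)

mse = np.mean((y - y_hat) ** 2)      # Σ (y_i − ŷ_i)² / n  → 1/6
mae = np.mean(np.abs(y - y_hat))     # Σ |y_i − ŷ_i| / n   → 1/3
print(mse, mae)
```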
Evaluation Metrics
Classification
Class-1: Positive Class
Class-2: Negative Class
Confusion Matrix
                    Class-1 (predicted)   Class-2 (predicted)
Class-1 (actual)    7 (TP)                7 (FN)
Class-2 (actual)    2 (FP)                25 (TN)
TP: True Positive
FN: False Negative (i.e., Type II Error)
FP: False Positive (i.e., Type I Error)
TN: True Negative
Evaluation Metrics
Classification
                    Class-1 (predicted)   Class-2 (predicted)
Class-1 (actual)    7 (TP)                7 (FN)
Class-2 (actual)    2 (FP)                25 (TN)
• How many samples in the dataset have the real label of Class-2?
• How many samples are there in total?
• How many samples are correctly classified? How many are incorrectly classified?
Evaluation Metrics
Classification
              P (predicted)   N (predicted)
P (actual)    TP              FN
N (actual)    FP              TN
• Recall = TP / (TP + FN)
• Precision = TP / (TP + FP)
• Accuracy = (TP + TN) / (TP + TN + FP + FN)
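Applying these definitions to the confusion matrix from the earlier face/non-face example (TP = 7, FN = 7, FP = 2, TN = 25):

```python
# Recall, precision, and accuracy from the slide's confusion matrix.
TP, FN, FP, TN = 7, 7, 2, 25

recall = TP / (TP + FN)                     # 7/14 = 0.5
precision = TP / (TP + FP)                  # 7/9  ≈ 0.778
accuracy = (TP + TN) / (TP + TN + FP + FN)  # 32/41 ≈ 0.780
print(recall, precision, accuracy)
```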
Evaluation Metrics
Classification
Cost Matrix for Binary Classification
              P (predicted)    N (predicted)
P (actual)    C_{p,p} × TP     C_{p,n} × FN
N (actual)    C_{n,p} × FP     C_{n,n} × TN
Total cost: C_{p,p} × TP + C_{p,n} × FN + C_{n,p} × FP + C_{n,n} × TN
Usually, C_{p,p} and C_{n,n} are set to 0; C_{n,p} and C_{p,n} may or may not be equal.
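A worked instance of the total-cost formula: the counts reuse the earlier confusion matrix, and the cost values are made up, with a false negative penalized five times more than a false positive:

```python
# Total cost for made-up cost values (C_pp = C_nn = 0, as is usual).
TP, FN, FP, TN = 7, 7, 2, 25
C_pp, C_pn, C_np, C_nn = 0, 5, 1, 0   # hypothetical costs

total_cost = C_pp * TP + C_pn * FN + C_np * FP + C_nn * TN
print(total_cost)  # 0*7 + 5*7 + 1*2 + 0*25 = 37
```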
Evaluation Metrics
• Example of cost matrix
– Assume we would like to develop a self-driving car system
– We have an ML system that detects pedestrians using a camera,
by conducting a binary classification
• When it detects a person (positive class), the car should stop
• When no person is detected (negative class), the car keeps going
True Positive (cost 𝑪𝒑,𝒑 )
There is person, ML detects person and car stops
Evaluation Metrics
Classification
Prediction function y = f(x)
sample         N1     N2     P1     N3     P2     P3
input x        -4     -3     -2.5   -2     -1.5   -0.5
prediction y   -1.1   -0.5   -0.1   0.2    0.6    0.9
actual label   -1     -1     1      -1     1      1
With threshold y = 0, for the actual-negative samples: FP = 1, TN = 2
Evaluation Metrics
Classification
Prediction function y = f(x). We can change the threshold!
sample         N1     N2     P1     N3     P2     P3
input x        -4     -3     -2.5   -2     -1.5   -0.5
prediction y   -1.1   -0.5   -0.1   0.2    0.6    0.9
actual label   -1     -1     1      -1     1      1
With threshold y = 0.4, for the actual-negative samples: FP = 0, TN = 3
Evaluation Metrics
Classification:
TP, FP, FN, TN will change wrt thresholds!
If the threshold is set to y = 0:
N3, P2, P3 will be taken as +1; P1, N2, N1 will be taken as -1
              P (predicted)   N (predicted)
P (actual)    TP = 2          FN = 1
N (actual)    FP = 1          TN = 2
If the threshold is set to y = 0.4:
P2, P3 will be taken as +1; N3, P1, N2, N1 will be taken as -1
              P (predicted)   N (predicted)
P (actual)    TP = 2          FN = 1
N (actual)    FP = 0          TN = 3
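The threshold dependence can be reproduced by sweeping over the slide’s six samples:

```python
# Recomputing TP/FP/FN/TN for the slide's six samples at two thresholds.
y_pred = [-1.1, -0.5, -0.1, 0.2, 0.6, 0.9]   # f(x) outputs
y_true = [-1, -1, 1, -1, 1, 1]               # actual labels

def counts(threshold):
    """Return (TP, FP, FN, TN) when outputs above threshold are labeled +1."""
    TP = FP = FN = TN = 0
    for pred, actual in zip(y_pred, y_true):
        label = 1 if pred > threshold else -1
        if label == 1 and actual == 1:
            TP += 1
        elif label == 1 and actual == -1:
            FP += 1
        elif label == -1 and actual == 1:
            FN += 1
        else:
            TN += 1
    return TP, FP, FN, TN

print(counts(0.0))   # threshold y = 0:   (2, 1, 1, 2)
print(counts(0.4))   # threshold y = 0.4: (2, 0, 1, 3)
```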
Evaluation Metrics
Classification
Confusion Matrix for Multicategory Classification
For K classes, the confusion matrix is K × K: the entry in row k, column j counts the samples whose actual class is k and whose predicted class is j; the diagonal entries are the correct classifications.
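scikit-learn’s confusion_matrix builds this K × K table directly; the labels below are made up for illustration:

```python
# Multicategory confusion matrix: entry [i, j] counts samples of
# actual class i predicted as class j. Labels are made up.
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2]

cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[1 1 0]
#  [0 2 0]
#  [1 0 2]]
```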
Other Issues
• Computational speed and memory consumption are also
important factors
– Especially for mobile or edge devices
• Other factors
– Parallelizability, Modularity, Maintainability
Practice Question
1. What is k?
2. What is n?