
EE2211 Introduction to Machine

Learning
Lecture 7

Thomas Yeo
[email protected]

Electrical and Computer Engineering Department


National University of Singapore

Acknowledgement: EE2211 development team


Thomas Yeo, Kar-Ann Toh, Chen Khong Tham, Helen Zhou, Robby Tan & Haizhou

© Copyright EE, NUS. All Rights Reserved.


Course Contents
• Introduction and Preliminaries (Haizhou)
– Introduction
– Data Engineering
– Introduction to Probability and Statistics
• Fundamental Machine Learning Algorithms I (Kar-Ann / Helen)
– Systems of linear equations
– Least squares, Linear regression
– Ridge regression, Polynomial regression
• Fundamental Machine Learning Algorithms II (Thomas)
– Over-fitting, bias/variance trade-off
– Optimization, Gradient descent
– Decision Trees, Random Forest
• Performance and More Algorithms (Haizhou)
– Performance Issues
– K-means Clustering
– Neural Networks

2
Fundamental ML Algorithms:
Overfitting, Bias-Variance Tradeoff

Module III Contents


• Overfitting, underfitting & model complexity
• Regularization
• Bias-variance trade-off
• Loss function
• Optimization
• Gradient descent
• Decision trees
• Random forest

3
Regression Review
• Goal: Given feature(s) x, we want to predict target y
– x can be 1-D or more than 1-D
– y is 1-D
• Two types of input data
– Training set {(x_i, y_i)}, i = 1, …, N_train
– Test set {(x_j, y_j)}, j = 1, …, N_test
• Learning/Training
– Training set used to estimate regression coefficients
• Prediction/Testing/Evaluation
– Prediction performed on test set to evaluate performance

4
Regression Review: Linear Case
• x is 1-D & y is 1-D
• Linear relationship between x & y: y ≈ w_0 + w_1 x
• Illustration (4 training samples)
• Training/Learning (primal) on training set: w = (XᵀX)⁻¹Xᵀy
• Prediction/Testing/Evaluation on test set: ŷ_test = X_test w

8
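The training and prediction steps above can be sketched in NumPy; the data and variable names below are illustrative, not from the slides:

```python
import numpy as np

# 4 training samples: column of ones (bias term) + 1-D feature
x_train = np.array([0.0, 1.0, 2.0, 3.0])
y_train = np.array([1.0, 3.0, 5.0, 7.0])          # exactly y = 1 + 2x
X = np.column_stack([np.ones_like(x_train), x_train])

# Training/Learning (primal): w = (X^T X)^{-1} X^T y
w = np.linalg.inv(X.T @ X) @ X.T @ y_train

# Prediction on an unseen test sample
x_test = np.array([4.0])
X_test = np.column_stack([np.ones_like(x_test), x_test])
y_pred = X_test @ w
```

In practice `np.linalg.lstsq` is preferred over the explicit inverse for numerical stability; the inverse is shown here only to mirror the primal formula.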
Regression Review: Polynomial
• x is 1-D (or more than 1-D) & y is 1-D
• Polynomial relationship between x & y
• Quadratic illustration (4 training samples, x is 1-D): P has columns [1, x, x²]
• Training/Learning (primal) on training set: w = (PᵀP)⁻¹Pᵀy
• Prediction/Testing/Evaluation on test set: ŷ_test = P_test w

12
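The quadratic case follows the same recipe, assuming a [1, x, x²] design matrix (the true coefficients below are invented for illustration):

```python
import numpy as np

# 4 training samples drawn from a quadratic, x is 1-D
x = np.array([-1.0, 0.0, 1.0, 2.0])
y = 2.0 - x + 0.5 * x**2                      # true coefficients: [2, -1, 0.5]

# Polynomial design matrix P with columns [1, x, x^2]
P = np.column_stack([np.ones_like(x), x, x**2])

# Training (primal): w = (P^T P)^{-1} P^T y
w = np.linalg.inv(P.T @ P) @ P.T @ y

# Prediction at a test point
x_t = np.array([3.0])
P_t = np.column_stack([np.ones_like(x_t), x_t, x_t**2])
y_hat = P_t @ w
```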
Note on Training & Test Sets
• Linear is a special case of polynomial => use “P” instead of “X” from now on
• Training/Learning (primal) on training set: w = (PᵀP)⁻¹Pᵀy
• Prediction/Testing/Evaluation on test set: ŷ_test = P_test w
• There should be zero overlap between training & test sets
• Important goal of regression: prediction on new, unseen data, i.e., the test set
• Why is the test set important for evaluation?

14
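The zero-overlap requirement can be enforced by shuffling indices and taking disjoint slices; a minimal sketch (the 7/3 split ratio is an arbitrary choice, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10
x = np.arange(N, dtype=float)
y = 2.0 * x + 1.0

# Shuffle indices, then take disjoint slices: no sample appears in both sets
idx = rng.permutation(N)
train_idx, test_idx = idx[:7], idx[7:]
x_train, y_train = x[train_idx], y[train_idx]
x_test, y_test = x[test_idx], y[test_idx]

# Zero overlap between training & test sets
assert set(train_idx) & set(test_idx) == set()
```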
Questions?

18
Overfitting Example

          Training Set Fit   Test Set Fit
Order 9   Good               Bad
Order 1   Bad                Bad
Order 2   Good               Good

21
Overfitting Example

[Figure: order 9 fit with blue lines marking big and very big prediction errors at the red test crosses]

• If we take one of the blue lines and compute the square of its length, this is called the “squared error” for that particular data point
• If we average the squared errors across all the red crosses, we get the mean squared error (MSE) on the test set

24
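The MSE computation described above, as a minimal sketch with made-up numbers:

```python
import numpy as np

# Test-set targets (the red crosses) and model predictions; values are illustrative
y_test = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 1.5, 3.0, 5.0])

errors = y_test - y_pred            # lengths of the "blue lines"
squared_errors = errors ** 2        # squared error for each data point
mse = squared_errors.mean()         # mean squared error (MSE) on the test set
```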
Underfitting Example

          Training Set Fit   Test Set Fit
Order 9   Good               Bad
Order 1   Bad                Bad
Order 2   Good               Good

25
“Just Nice”

          Training Set Fit   Test Set Fit
Order 9   Good               Bad
Order 1   Bad                Bad
Order 2   Good               Good

30
Overfitting & Underfitting

          Training Set Fit   Test Set Fit
Order 9   Good               Bad            ← Overfitting
Order 1   Bad                Bad            ← Underfitting
Order 2   Good               Good

32
Overfitting & Underfitting
• Overfitting occurs when a model predicts the training data well, but predicts new data (e.g., from the test set) poorly
• Reason 1
– Model is too complex for the data
– Previous example: fit an order 9 polynomial to 10 data points
• Reason 2
– Too many features but too few training examples
– Even a linear model can overfit, e.g., a linear model with 9 input features plus a bias term (i.e., w is 10-D) and 10 data points in the training set => the data might not be enough to estimate the 10 unknowns well
• Solutions
– Use simpler models (e.g., lower order polynomial)
– Use regularization (see next part of lecture)

33
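The order 9 vs. order 2 behaviour described above can be reproduced numerically; a sketch assuming a synthetic order 2 ground truth (the coefficients and noise level are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def design(x, order):
    """Polynomial design matrix with columns [1, x, ..., x^order]."""
    return np.column_stack([x**k for k in range(order + 1)])

def fit(x_tr, y_tr, order):
    # lstsq instead of an explicit inverse: the order 9 system is ill-conditioned
    w, *_ = np.linalg.lstsq(design(x_tr, order), y_tr, rcond=None)
    return w

def mse(w, order, x, y):
    return float(np.mean((y - design(x, order) @ w) ** 2))

# 10 noisy training points generated from an order 2 polynomial
f = lambda x: 1.0 + 2.0 * x - 3.0 * x**2
x_tr = np.linspace(-1.0, 1.0, 10)
y_tr = f(x_tr) + 0.3 * rng.standard_normal(10)
x_te = np.linspace(-0.95, 0.95, 50)          # unseen test inputs
y_te = f(x_te) + 0.3 * rng.standard_normal(50)

w9 = fit(x_tr, y_tr, 9)                      # 10 unknowns, 10 points: interpolates the noise
w2 = fit(x_tr, y_tr, 2)

train_mse_9 = mse(w9, 9, x_tr, y_tr)         # ~0: "good" training-set fit
test_mse_9 = mse(w9, 9, x_te, y_te)          # typically much larger: overfitting
test_mse_2 = mse(w2, 2, x_te, y_te)          # typically stays near the noise level
```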
Overfitting & Underfitting
• Underfitting is the inability of the trained model to predict the targets in the training set
• Reason 1
– Model is too simple for the data
– Previous example: fit an order 1 polynomial to 10 data points that came from an order 2 polynomial
– Solution: try a more complex model
• Reason 2
– Features are not informative enough
– Solution: try to develop more informative features

37
Overfitting / Underfitting Schematic

[Figure: underfitting regime (left) vs. overfitting regime (right)]

40
Questions?

41
Regularization
• Regularization is an umbrella term for methods that force the learning algorithm to build less complex models
• Motivation 1: Solve an ill-posed problem
– For example, estimating a 10th order polynomial with just 5 data points
• Motivation 2: Reduce overfitting
• For example, in the previous lecture, we added λ‖w‖² to the squared-error objective:
J(w) = ‖y − Pw‖² + λ‖w‖²
• Minimizing J with respect to w, the primal solution is
ŵ = (PᵀP + λI)⁻¹Pᵀy
• For λ > 0, the matrix PᵀP + λI becomes invertible (Motivation 1)
• λ > 0 might also perform better on the test set, i.e., reduce overfitting (Motivation 2) – will show example later

48
Regularization
• Consider the minimization from the previous slide:
J(w) = ‖y − Pw‖² + λ‖w‖²
– ‖y − Pw‖²: cost function quantifying the data-fitting error on the training set
– λ‖w‖²: regularization

51
Regularization
• Consider the minimization from the previous slide:
J(w) = ‖y − Pw‖² + λ‖w‖²
– the λ‖w‖² term is the L2 regularization
• Encourages w to be small (also called shrinkage or weight decay) => constrains model complexity
• More generally, most machine learning algorithms can be formulated as the following optimization problem:
min_w Data-Loss(w) + λ · Regularization(w)
• Data-Loss(w) quantifies the fitting error on the training set given parameters w: smaller error => better fit to training data
• Regularization(w) penalizes more complex models

58
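The primal ridge solution and its two motivations can be sketched as follows (the data here are random and purely illustrative):

```python
import numpy as np

def ridge_fit(P, y, lam):
    """Primal ridge solution: w = (P^T P + lam * I)^{-1} P^T y."""
    d = P.shape[1]
    return np.linalg.solve(P.T @ P + lam * np.eye(d), P.T @ y)

# Ill-posed example: 3 samples, 5 unknowns -> P^T P alone is singular,
# but P^T P + lam * I is invertible for lam > 0 (Motivation 1)
rng = np.random.default_rng(0)
P = rng.standard_normal((3, 5))
y = rng.standard_normal(3)

w_small = ridge_fit(P, y, lam=0.1)
w_large = ridge_fit(P, y, lam=100.0)

# Larger lam shrinks w toward zero (shrinkage / weight decay)
assert np.linalg.norm(w_large) < np.linalg.norm(w_small)
```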
Regularization Example

                 Training Set Fit   Test Set Fit
Order 9          Good               Bad
Order 9, λ = 1   Good               Good

61
Questions?

62
Bias versus Variance
• Suppose we are trying to predict the red target below:

[Figure: blue predictions scattered around a red target, one panel per bias/variance combination]

– Low Bias, Low Variance: blue predictions on average close to red target; low variability among predictions
– Low Bias, High Variance: blue predictions on average close to red target; large variability among predictions
– High Bias, Low Variance: blue predictions on average not close to red target; low variability among predictions
– High Bias, High Variance: blue predictions on average not close to red target; high variability among predictions

67
Bias + Variance Trade off
• Test error = Bias² + Variance + Irreducible Noise

[Figure: high bias, low variance at one extreme; high variance, low bias at the other]

70
Bias + Variance Example
• Simulate data from an order 2 polynomial (+ noise)
• Randomly sample 10 training samples each time
• Fit with an order 2 polynomial: low variance, low bias – order 2 achieves lower test error
• Fit with an order 4 polynomial: high variance, low bias

[Figure: 4th order polynomial fits (left) vs. 2nd order polynomial fits (right)]

79
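A sketch of the simulation described above; the true polynomial, noise level, and number of repeats are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: 1.0 - x + 0.5 * x**2            # true order 2 polynomial
x_grid = np.linspace(-1.0, 1.0, 50)           # points where fits are compared

def design(x, order):
    """Polynomial design matrix with columns [1, x, ..., x^order]."""
    return np.column_stack([x**k for k in range(order + 1)])

def simulate(order, n_repeats=200):
    """Repeatedly draw 10 noisy training samples and fit an 'order'-degree polynomial."""
    preds = np.empty((n_repeats, x_grid.size))
    for r in range(n_repeats):
        x_tr = rng.uniform(-1.0, 1.0, 10)     # randomly sample 10 training samples
        y_tr = f(x_tr) + 0.2 * rng.standard_normal(10)
        w, *_ = np.linalg.lstsq(design(x_tr, order), y_tr, rcond=None)
        preds[r] = design(x_grid, order) @ w
    # Bias^2: squared gap between the average fit and the truth;
    # Variance: spread of the fits around their own average
    bias_sq = float(np.mean((preds.mean(axis=0) - f(x_grid)) ** 2))
    variance = float(np.mean(preds.var(axis=0)))
    return bias_sq, variance

bias2_order2, var_order2 = simulate(order=2)
bias2_order4, var_order4 = simulate(order=4)  # more flexible fit: higher variance
```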
Questions?

80
Bias-Variance Decomposition Theorem

81
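For a fixed test input x, assuming squared-error loss, data generated as y = f(x) + ε with zero-mean noise of variance σ², and a model f̂_D trained on a random training set D, the standard statement of the theorem (consistent with the summary line "Test error = Bias² + Variance + Irreducible Noise") is:

```latex
\mathbb{E}_{D,\varepsilon}\!\left[(y - \hat{f}_D(x))^2\right]
= \underbrace{\left(\mathbb{E}_D[\hat{f}_D(x)] - f(x)\right)^2}_{\text{Bias}^2}
+ \underbrace{\mathbb{E}_D\!\left[\left(\hat{f}_D(x) - \mathbb{E}_D[\hat{f}_D(x)]\right)^2\right]}_{\text{Variance}}
+ \underbrace{\sigma^2}_{\text{Irreducible Noise}}
```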
Summary
• Overfitting, underfitting & model complexity
– Overfitting: low error in training set, high error in test set
– Underfitting: high error in both training & test sets
– Overly complex models can overfit; overly simple models can underfit
• Regularization (e.g., L2 regularization)
– Solves “ill-posed” problems (e.g., more unknowns than data points)
– Reduces overfitting
• Bias-Variance Decomposition Theorem
– Test error = Bias² + Variance + Irreducible Noise
– Can be interpreted as trading off bias & variance:
• Overly complex models can have high variance, low bias
• Overly simple models can have low variance, high bias
• Interpretation of the bias-variance tradeoff is not always true (see tutorial)

94
