EE2211 Introduction To Machine Learning
Lecture 7
Thomas Yeo
[email protected]
Fundamental ML Algorithms:
Overfitting, Bias-Variance Tradeoff
Regression Review
• Goal: Given feature(s) x, we want to predict target y
– x can be 1-D or more than 1-D
– y is 1-D
• Two types of input data
– Training set {(x_i, y_i)}, i = 1, …, N
– Test set {(x_j, y_j)}, j = 1, …, M, held out from learning
• Learning/Training
– Training set used to estimate regression coefficients w
• Prediction/Testing/Evaluation
– Prediction performed on test set to evaluate performance
Regression Review: Linear Case
• x is 1-D & y is 1-D
• Linear relationship between x & y: y = w0 + w1·x
• Illustration (4 training samples): [figure: straight line fitted through 4 data points]
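A minimal sketch of this fit (not from the slides; the four sample values below are made up for illustration), using ordinary least squares on a design matrix with a bias column:

```python
import numpy as np

# Four made-up training samples for illustration
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.9, 4.1, 6.2, 7.8])

# Design matrix X: each row is [1, x_i] (bias column + feature)
X = np.column_stack([np.ones_like(x), x])

# Least-squares estimate of w = [w0, w1]
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w)  # intercept w0 and slope w1
```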
Regression Review: Polynomial
• x is 1-D (or more than 1-D) & y is 1-D
• Polynomial relationship between x & y, e.g., y = w0 + w1·x + w2·x²
• Quadratic illustration (4 training samples, x is 1-D): [figure: parabola fitted through 4 data points]
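The same least-squares machinery handles the quadratic case once the design matrix carries polynomial terms; a small sketch with made-up values:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])    # made-up samples
y = np.array([2.2, 5.1, 10.3, 16.8])

# Quadratic design matrix P: each row is [1, x_i, x_i^2]
P = np.column_stack([np.ones_like(x), x, x**2])
w, *_ = np.linalg.lstsq(P, y, rcond=None)
print(w)  # [w0, w1, w2]
```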
Note on Training & Test Sets
• Linear is a special case of polynomial => use “P” instead of “X” from now on
• Training/Learning (primal) on training set: w = (PᵀP)⁻¹Pᵀy
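A hedged sketch of the primal solution and test-set evaluation (the helper names are my own; P_train, y_train, etc. are assumed built as in the earlier sketches):

```python
import numpy as np

def train_primal(P, y):
    # Primal least-squares solution: w = (P^T P)^{-1} P^T y
    # (assumes P^T P is invertible, i.e., enough samples for the unknowns)
    return np.linalg.solve(P.T @ P, P.T @ y)

def mse(P, y, w):
    # Mean squared prediction error on a given set
    return np.mean((P @ w - y) ** 2)

# Usage: w = train_primal(P_train, y_train); mse(P_test, y_test, w)
```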
Questions?
Overfitting Example
[Figure: order-9 polynomial fit passes through all training points exactly, but shows a big prediction error on new test points]
Underfitting Example
[Figure: order-1 polynomial fit to data from an order-2 polynomial; the straight line misses the curvature, fitting both training and test points poorly]
“Just Nice”
[Figure: order-2 polynomial fit follows the underlying trend, fitting both training and test points well]
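A small simulation (my own sketch; the quadratic ground truth and noise level are assumed) that reproduces all three regimes above: underfitting at order 1, “just nice” at order 2, overfitting at order 9:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n):
    # Data from an order-2 polynomial plus noise (assumed setup)
    x = rng.uniform(-1, 1, n)
    return x, 1 + 2 * x + 3 * x**2 + rng.normal(0, 0.3, n)

def poly_design(x, order):
    return np.column_stack([x**k for k in range(order + 1)])

x_tr, y_tr = sample(10)    # 10 training points, as in the slides
x_te, y_te = sample(1000)  # large test set

for order in (1, 2, 9):
    w, *_ = np.linalg.lstsq(poly_design(x_tr, order), y_tr, rcond=None)
    err = lambda x, y: np.mean((poly_design(x, order) @ w - y) ** 2)
    # order 1: both errors high; order 2: both low; order 9: train ~0, test large
    print(order, err(x_tr, y_tr), err(x_te, y_te))
```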
Overfitting & Underfitting
• Overfitting occurs when a model predicts the training data well, but predicts new data (e.g., from the test set) poorly
• Reason 1
– Model is too complex for the data
– Previous example: Fit order 9 polynomial to 10 data points
• Reason 2
– Too many features but number of training examples too small
– Even a linear model can overfit, e.g., a linear model with 9 input features (i.e., with the bias term, x is 10-D) and 10 data points in the training set => data might not be enough to estimate the 10 unknowns well
• Solutions
– Use simpler models (e.g., lower order polynomial)
– Use regularization (see next part of lecture)
Overfitting & Underfitting
• Underfitting is the inability of a trained model to predict the targets in the training set
• Reason 1
– Model is too simple for the data
– Previous example: Fit order 1 polynomial to 10 data points that came from an order 2 polynomial
– Solution: Try a more complex model
• Reason 2
– Features are not informative enough
– Solution: Try to develop more informative features
Overfitting / Underfitting Schematic
[Figure: training and test error versus model complexity; underfitting regime on the left (both errors high), overfitting regime on the right (training error keeps falling while test error rises)]
Questions?
Regularization
• Regularization is an umbrella term for methods that force the learning algorithm to build less complex models.
• Motivation 1: Solve an ill-posed problem
– For example, estimate a 10th order polynomial with just 5 data points
• Motivation 2: Reduce overfitting
• For example, in the previous lecture, we added λI: w = (PᵀP + λI)⁻¹Pᵀy
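A minimal sketch of this ridge (L2-regularized) solution, assuming the same P and y as in the earlier sketches:

```python
import numpy as np

def ridge_fit(P, y, lam):
    # Regularized least squares: w = (P^T P + lam * I)^{-1} P^T y
    # lam > 0 makes the matrix invertible even when P^T P is singular
    d = P.shape[1]
    return np.linalg.solve(P.T @ P + lam * np.eye(d), P.T @ y)
```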
Regularization
• Consider the minimization from the previous slide:
L2-Regularization: w = argmin ‖Pw − y‖² + λ‖w‖²
• Encourage w to be small (also called shrinkage or weight-decay) => constrain model complexity
• More generally, most machine learning algorithms can be formulated as the following optimization problem:
minimize over parameters: Loss(parameters; training data) + λ · Regularizer(parameters)
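To connect the closed-form solution with this objective view, a quick sketch (assumptions as before) evaluating the L2-regularized objective; the ridge solution from the earlier sketch should score no higher than any perturbed weights:

```python
import numpy as np

def l2_objective(P, y, w, lam):
    # ||Pw - y||^2 + lam * ||w||^2
    return np.sum((P @ w - y) ** 2) + lam * np.sum(w ** 2)

# Usage sketch: w = ridge_fit(P, y, lam)
# l2_objective(P, y, w, lam) <= l2_objective(P, y, w + delta, lam) for any delta
```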
Regularization Example

                 Training Set Fit   Test Set Fit
Order 9          Good               Bad
Order 9, λ = 1   Good               Good
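A sketch reproducing this table qualitatively (simulated data, since the slide's dataset is not available): order 9 unregularized fits the training set near perfectly but tests badly, while λ = 1 recovers a good test fit:

```python
import numpy as np

rng = np.random.default_rng(1)

def make(n):
    x = rng.uniform(-1, 1, n)
    y = 1 + 2 * x + 3 * x**2 + rng.normal(0, 0.3, n)  # assumed ground truth
    P = np.column_stack([x**k for k in range(10)])     # order-9 features
    return P, y

P_tr, y_tr = make(10)
P_te, y_te = make(1000)

for lam in (0.0, 1.0):
    w = np.linalg.solve(P_tr.T @ P_tr + lam * np.eye(10), P_tr.T @ y_tr)
    print(lam, np.mean((P_tr @ w - y_tr) ** 2), np.mean((P_te @ w - y_te) ** 2))
```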
Questions?
Bias versus Variance
• Suppose we are trying to predict the red target below:
[Figure: dartboard diagrams, blue predictions scattered around a red target, one panel per bias/variance combination]
– Low Bias, Low Variance: blue predictions on average close to red target; low variability among predictions
– Low Bias, High Variance: blue predictions on average close to red target; large variability among predictions
– High Bias, Low Variance: blue predictions on average not close to red target; low variability among predictions
– High Bias, High Variance: blue predictions on average not close to red target; large variability among predictions
Bias + Variance Tradeoff
• Test error = Bias² + Variance + Irreducible Noise
Bias + Variance Example
• Simulate data from an order 2 polynomial (+ noise)
• Randomly sample 10 training samples each time
• Fit with order 2 polynomial: low variance, low bias => order 2 achieves lower test error
• Fit with order 4 polynomial: high variance, low bias
[Figure: left panel, 4th order polynomial fits from repeated training sets (widely spread curves); right panel, 2nd order polynomial fits (tightly clustered curves)]
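A sketch (same assumed simulation setup as earlier) that estimates bias² and variance directly by refitting over many random training sets:

```python
import numpy as np

rng = np.random.default_rng(2)
x_grid = np.linspace(-1, 1, 50)
truth = 1 + 2 * x_grid + 3 * x_grid**2   # assumed order-2 ground truth

def poly_design(x, order):
    return np.column_stack([x**k for k in range(order + 1)])

def fit_once(order):
    # One random training set of 10 samples, as in the slides
    x = rng.uniform(-1, 1, 10)
    y = 1 + 2 * x + 3 * x**2 + rng.normal(0, 0.3, 10)
    w, *_ = np.linalg.lstsq(poly_design(x, order), y, rcond=None)
    return poly_design(x_grid, order) @ w   # predictions on a fixed grid

for order in (2, 4):
    preds = np.array([fit_once(order) for _ in range(500)])
    bias2 = np.mean((preds.mean(axis=0) - truth) ** 2)
    variance = np.mean(preds.var(axis=0))
    print(order, bias2, variance)  # both low bias; order 4 has higher variance
```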
Questions?
Bias-Variance Decomposition Theorem
• Setup: targets generated as y = f(x) + ε, with E[ε] = 0 and Var(ε) = σ²; estimator f̂ trained on a randomly drawn training set
• Theorem: the expected test error at x decomposes as
E[(y − f̂(x))²] = Bias[f̂(x)]² + Var[f̂(x)] + σ²
where Bias[f̂(x)] = E[f̂(x)] − f(x)
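The slide equations did not survive extraction; a standard derivation consistent with the summary below is sketched here, assuming ε is zero-mean and independent of f̂:

```latex
\begin{aligned}
\mathbb{E}\big[(y - \hat f(x))^2\big]
 &= \mathbb{E}\big[(f(x) + \varepsilon - \hat f(x))^2\big] \\
 % cross term vanishes: eps is zero-mean and independent of f-hat
 &= \mathbb{E}\big[(f(x) - \hat f(x))^2\big] + \sigma^2 \\
 % add and subtract E[f-hat(x)]; the cross term again has zero mean
 &= \big(f(x) - \mathbb{E}[\hat f(x)]\big)^2
    + \mathbb{E}\big[(\hat f(x) - \mathbb{E}[\hat f(x)])^2\big] + \sigma^2 \\
 &= \underbrace{\mathrm{Bias}[\hat f(x)]^2}_{\text{bias squared}}
    + \underbrace{\mathrm{Var}[\hat f(x)]}_{\text{variance}}
    + \underbrace{\sigma^2}_{\text{irreducible noise}}
\end{aligned}
```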
Summary
• Overfitting, underfitting & model complexity
– Overfitting: low error in training set, high error in test set
– Underfitting: high error in both training & test sets
– Overly complex models can overfit; overly simple models can underfit
• Regularization (e.g., L2 regularization)
– Solve “ill-posed” problems (e.g., more unknowns than data points)
– Reduce overfitting
• Bias-Variance Decomposition Theorem
– Test error = Bias² + Variance + Irreducible Noise
– Can be interpreted as trading off bias & variance:
• Overly complex models can have high variance, low bias
• Overly simple models can have low variance, high bias
• Interpretation of the Bias-Variance tradeoff is not always true (see tutorial)
© Copyright EE, NUS. All Rights Reserved.