Machine Learning Basics
Lecture slides for Chapter 5 of Deep Learning
www.deeplearningbook.org
Ian Goodfellow
2016-09-26
Linear Regression
[Figure 5.1: A linear regression problem, with a training set consisting of ten data points, each containing one feature. Because there is only one feature, the weight vector contains only a single parameter to learn, w1. (Left) "Linear regression example": y plotted against x1. (Right) "Optimization of w": MSE(train) plotted against w1.]
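What the figure illustrates can be reproduced in a few lines. The sketch below uses a synthetic stand-in for the ten training points (the data is assumed, not the book's): fit the single weight w1 with the normal equations and report MSE(train).

import numpy as np

rng = np.random.default_rng(0)

# Assumed synthetic training set: ten examples, one feature, no bias term.
X = np.linspace(-1.0, 1.0, 10).reshape(-1, 1)      # design matrix, shape (10, 1)
y = 1.2 * X[:, 0] + 0.1 * rng.normal(size=10)      # roughly linear targets

# Normal equations: w = (X^T X)^{-1} X^T y minimizes the training MSE.
w = np.linalg.solve(X.T @ X, X.T @ y)

mse_train = np.mean((X @ w - y) ** 2)
print("w1 =", w[0], " MSE(train) =", mse_train)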
Underfitting and Overfitting in Polynomial Estimation

...more parameters than training examples. We have little chance of choosing a solution that generalizes well when so many wildly different solutions exist. In this example, the quadratic model is perfectly matched to the true structure of the task, so it generalizes well to new data.

[Figure 5.2: We fit three models to this example training set. Three panels (linear, quadratic, and degree-9 fits), each plotting y against x0.]
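A hedged sketch of the same experiment, with synthetic data standing in for the book's training set: ten points are drawn from an assumed quadratic with noise, then degree-1, degree-2, and degree-9 polynomials are fit and compared on held-out data.

import numpy as np

rng = np.random.default_rng(0)

def quadratic_data(m):
    x = rng.uniform(-1, 1, size=m)
    y = 1.0 - 2.0 * x + 3.0 * x**2 + 0.1 * rng.normal(size=m)  # assumed true function
    return x, y

x_train, y_train = quadratic_data(10)
x_test, y_test = quadratic_data(200)

for degree in (1, 2, 9):
    # Polynomial regression is linear regression on polynomial features.
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: MSE(train) = {train_mse:.3f}, MSE(test) = {test_mse:.3f}")

Typically the degree-1 model underfits, the degree-2 model generalizes well, and the degree-9 model fits the training points almost exactly while the test error grows.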
Generalization gap

[Figure 5.3: Typical relationship between capacity and error. Training and test error behave differently. At the left end of the graph, training error and generalization error are both high. Horizontal axis: capacity, with the optimal capacity marked; the gap between the training and generalization error curves is the generalization gap.]
Training Set Size
[Figure 5.4: Top panel: error (MSE), from 0.0 to 3.5, versus the number of training examples (10^0 to 10^5, log scale), with curves for the Bayes error, the train and test error of a quadratic model, and the train and test error of a model with optimal capacity. Bottom panel: optimal capacity (polynomial degree), from 0 to 20, versus the number of training examples.]
Weight Decay

...of how we can control a model's tendency to overfit or underfit via weight decay, we can train a high-degree polynomial regression model with different values of λ. See figure 5.5 for the results.

[Figure 5.5: We fit a high-degree polynomial regression model to our example training set. Three panels, each plotting y against x.]
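A minimal sketch of weight decay on this kind of model, again with assumed synthetic data: degree-9 polynomial features and ridge-regularized least squares for a few values of λ (for simplicity the bias term is regularized along with the other weights).

import numpy as np

rng = np.random.default_rng(0)

# Assumed synthetic training set with a quadratic true function.
x = rng.uniform(-1, 1, size=10)
y = 1.0 - 2.0 * x + 3.0 * x**2 + 0.1 * rng.normal(size=10)

degree = 9
X = np.vander(x, degree + 1)          # degree-9 polynomial features, shape (10, 10)

for lam in (0.0, 0.1, 10.0):
    # Weight decay (L2 regularization): w minimizes MSE(train) + lam * ||w||^2,
    # giving w = (X^T X + lam I)^{-1} X^T y.
    w = np.linalg.solve(X.T @ X + lam * np.eye(degree + 1), X.T @ y)
    train_mse = np.mean((X @ w - y) ** 2)
    print(f"lambda = {lam:5.1f}: ||w||^2 = {w @ w:10.3f}, MSE(train) = {train_mse:.4f}")

Larger λ shrinks the weights and raises the training error; λ = 0 recovers the unregularized, overfitting-prone fit.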
Bias and Variance

...between the estimator and the true value of the parameter θ. As is clear from equation 5.54, evaluating the MSE incorporates both the bias and the variance.

[Figure 5.6: As capacity increases (x-axis), bias (dotted) tends to decrease and variance (dashed) tends to increase, yielding another U-shaped curve for generalization error (bold curve). The optimal capacity is marked on the capacity axis.]
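For reference, the decomposition the text cites (equation 5.54 in the book) is the standard bias-variance decomposition of the mean squared error of an estimator \hat{\theta}_m:

\mathrm{MSE} = \mathbb{E}\big[(\hat{\theta}_m - \theta)^2\big] = \mathrm{Bias}(\hat{\theta}_m)^2 + \mathrm{Var}(\hat{\theta}_m)

It follows by adding and subtracting \mathbb{E}[\hat{\theta}_m] inside the square and expanding; the cross term vanishes in expectation.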
Decision Trees

[Figure 5.7: A decision tree and the regions into which it divides the input space; nodes are labeled with binary strings (0, 1, 00, 01, 10, 11, 010, 011, 110, 111, 1110, 1111) indicating the path of left/right choices from the root.]
Principal Components Analysis

[Figure 5.8: PCA learns a linear projection that aligns the direction of greatest variance with the axes of the new space. (Left) The original data consists of samples of x; in this space, the variance might occur along directions that are not axis-aligned (axes x1, x2). (Right) The transformed data z = x^T W now varies most along the axis z1; the direction of second-most variance is now along z2.]
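A short PCA sketch matching the figure's setup, on assumed synthetic data (the rotation angle and scales below are made up): center the data, take the right singular vectors of the centered design matrix as W, and project with z = x^T W.

import numpy as np

rng = np.random.default_rng(0)

# Assumed 2-D data whose direction of greatest variance is not axis-aligned.
angle = np.deg2rad(30)
R = np.array([[np.cos(angle), -np.sin(angle)],
              [np.sin(angle),  np.cos(angle)]])
X = (rng.normal(size=(500, 2)) * np.array([10.0, 2.0])) @ R.T

# PCA: center the data, then take the right singular vectors as the columns of W.
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
W = Vt.T                                  # principal directions, greatest variance first

Z = X_centered @ W                        # z = x^T W for each example
print("variance along z1, z2:", Z.var(axis=0))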
Curse of Dimensionality
[Figure 5.10: Illustration of how the nearest neighbor algorithm breaks up the input space into regions. An example (represented here by a circle) within each region defines the region boundary.]
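A brief sketch of the predictor the figure describes, with a hypothetical training set: 1-nearest neighbor returns the output of the closest training example, so every query point inside a region receives that region's example output.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training set: 2-D inputs with real-valued outputs.
X_train = rng.uniform(-1, 1, size=(20, 2))
y_train = rng.normal(size=20)

def nearest_neighbor_predict(x_query, X_train, y_train):
    # Return the output of the single closest training example (1-NN).
    distances = np.linalg.norm(X_train - x_query, axis=1)
    return y_train[np.argmin(distances)]

print(nearest_neighbor_predict(np.array([0.25, -0.4]), X_train, y_train))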
Manifold Learning

...the manifold to vary from one point to another. This often happens when a manifold intersects itself. For example, a figure eight is a manifold that has a single dimension in most places but two dimensions at the intersection at the center.

[Figure 5.12: two-dimensional data sampled near a one-dimensional manifold.]
QMUL Dataset
[Figure 5.13: Training examples from the QMUL Multiview Face Dataset (Gong et al., 2000), in which the subjects were asked to move in such a way as to cover the two-dimensional manifold corresponding to two angles of rotation. We would like learning algorithms to discover and disentangle such manifold coordinates. Figure 20.6 illustrates such a feat.]