Machine Learning Basics
Lecture slides for Chapter 5 of Deep Learning
www.deeplearningbook.org
Ian Goodfellow
2016-09-26
Linear Regression
[Figure 5.1: A linear regression problem, with a training set consisting of ten data points, each containing one feature. Because there is only one feature, the weight vector contains only a single parameter to learn, w1. (Left) "Linear regression example": y plotted against x1. (Right) "Optimization of w": MSE(train) plotted against w1.]
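What the figure illustrates can be reproduced in a few lines. The sketch below uses a synthetic stand-in for the ten training points (the data is assumed, not the book's): fit the single weight w1 with the normal equations and report MSE(train).

import numpy as np

rng = np.random.default_rng(0)

# Assumed synthetic training set: ten examples, one feature, no bias term.
X = np.linspace(-1.0, 1.0, 10).reshape(-1, 1)      # design matrix, shape (10, 1)
y = 1.2 * X[:, 0] + 0.1 * rng.normal(size=10)      # roughly linear targets

# Normal equations: w = (X^T X)^{-1} X^T y minimizes the training MSE.
w = np.linalg.solve(X.T @ X, X.T @ y)

mse_train = np.mean((X @ w - y) ** 2)
print("w1 =", w[0], " MSE(train) =", mse_train)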
Underfitting and Overfitting in Polynomial Estimation

...more parameters than training examples. We have little chance of choosing a solution that generalizes well when so many wildly different solutions exist. In this example, the quadratic model is perfectly matched to the true structure of the task, so it generalizes well to new data.

[Figure 5.2: We fit three models to this example training set. Three panels (linear, quadratic, and degree-9 fits), each plotting y against x0.]
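A hedged sketch of the same experiment, with synthetic data standing in for the book's training set: ten points are drawn from an assumed quadratic with noise, then degree-1, degree-2, and degree-9 polynomials are fit and compared on held-out data.

import numpy as np

rng = np.random.default_rng(0)

def quadratic_data(m):
    x = rng.uniform(-1, 1, size=m)
    y = 1.0 - 2.0 * x + 3.0 * x**2 + 0.1 * rng.normal(size=m)  # assumed true function
    return x, y

x_train, y_train = quadratic_data(10)
x_test, y_test = quadratic_data(200)

for degree in (1, 2, 9):
    # Polynomial regression is linear regression on polynomial features.
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: MSE(train) = {train_mse:.3f}, MSE(test) = {test_mse:.3f}")

Typically the degree-1 model underfits, the degree-2 model generalizes well, and the degree-9 model fits the training points almost exactly while the test error grows.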
Generalization gap

[Figure 5.3: Typical relationship between capacity and error. Training and test error behave differently. At the left end of the graph, training error and generalization error are both high. Horizontal axis: capacity, with the optimal capacity marked; the gap between the training and generalization error curves is the generalization gap.]
Training Set Size
[Figure 5.4: Top panel: error (MSE), from 0.0 to 3.5, versus the number of training examples (10^0 to 10^5, log scale), with curves for the Bayes error, the train and test error of a quadratic model, and the train and test error of a model with optimal capacity. Bottom panel: optimal capacity (polynomial degree), from 0 to 20, versus the number of training examples.]
Weight Decay

...of how we can control a model's tendency to overfit or underfit via weight decay, we can train a high-degree polynomial regression model with different values of λ. See figure 5.5 for the results.

[Figure 5.5: We fit a high-degree polynomial regression model to our example training set. Three panels, each plotting y against x.]
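A minimal sketch of weight decay on this kind of model, again with assumed synthetic data: degree-9 polynomial features and ridge-regularized least squares for a few values of λ (for simplicity the bias term is regularized along with the other weights).

import numpy as np

rng = np.random.default_rng(0)

# Assumed synthetic training set with a quadratic true function.
x = rng.uniform(-1, 1, size=10)
y = 1.0 - 2.0 * x + 3.0 * x**2 + 0.1 * rng.normal(size=10)

degree = 9
X = np.vander(x, degree + 1)          # degree-9 polynomial features, shape (10, 10)

for lam in (0.0, 0.1, 10.0):
    # Weight decay (L2 regularization): w minimizes MSE(train) + lam * ||w||^2,
    # giving w = (X^T X + lam I)^{-1} X^T y.
    w = np.linalg.solve(X.T @ X + lam * np.eye(degree + 1), X.T @ y)
    train_mse = np.mean((X @ w - y) ** 2)
    print(f"lambda = {lam:5.1f}: ||w||^2 = {w @ w:10.3f}, MSE(train) = {train_mse:.4f}")

Larger λ shrinks the weights and raises the training error; λ = 0 recovers the unregularized, overfitting-prone fit.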
Bias and Variance

...between the estimator and the true value of the parameter θ. As is clear from equation 5.54, evaluating the MSE incorporates both the bias and the variance.

[Figure 5.6: As capacity increases (x-axis), bias (dotted) tends to decrease and variance (dashed) tends to increase, yielding another U-shaped curve for generalization error (bold curve). The optimal capacity is marked on the capacity axis.]
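For reference, the decomposition the text cites (equation 5.54 in the book) is the standard bias-variance decomposition of the mean squared error of an estimator \hat{\theta}_m:

\mathrm{MSE} = \mathbb{E}\big[(\hat{\theta}_m - \theta)^2\big] = \mathrm{Bias}(\hat{\theta}_m)^2 + \mathrm{Var}(\hat{\theta}_m)

It follows by adding and subtracting \mathbb{E}[\hat{\theta}_m] inside the square and expanding; the cross term vanishes in expectation.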
Decision Trees

[Figure 5.7: A decision tree and the regions into which it divides the input space; nodes are labeled with binary strings (0, 1, 00, 01, 10, 11, 010, 011, 110, 111, 1110, 1111) indicating the path of left/right choices from the root.]
Principal Components Analysis

[Figure 5.8: PCA learns a linear projection that aligns the direction of greatest variance with the axes of the new space. (Left) The original data consists of samples of x; in this space, the variance might occur along directions that are not axis-aligned (axes x1, x2). (Right) The transformed data z = x^T W now varies most along the axis z1; the direction of second-most variance is now along z2.]
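A short PCA sketch matching the figure's setup, on assumed synthetic data (the rotation angle and scales below are made up): center the data, take the right singular vectors of the centered design matrix as W, and project with z = x^T W.

import numpy as np

rng = np.random.default_rng(0)

# Assumed 2-D data whose direction of greatest variance is not axis-aligned.
angle = np.deg2rad(30)
R = np.array([[np.cos(angle), -np.sin(angle)],
              [np.sin(angle),  np.cos(angle)]])
X = (rng.normal(size=(500, 2)) * np.array([10.0, 2.0])) @ R.T

# PCA: center the data, then take the right singular vectors as the columns of W.
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
W = Vt.T                                  # principal directions, greatest variance first

Z = X_centered @ W                        # z = x^T W for each example
print("variance along z1, z2:", Z.var(axis=0))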
Curse of Dimensionality
[Figure 5.10: Illustration of how the nearest neighbor algorithm breaks up the input space into regions. An example (represented here by a circle) within each region defines the region boundary.]
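A brief sketch of the predictor the figure describes, with a hypothetical training set: 1-nearest neighbor returns the output of the closest training example, so every query point inside a region receives that region's example output.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training set: 2-D inputs with real-valued outputs.
X_train = rng.uniform(-1, 1, size=(20, 2))
y_train = rng.normal(size=20)

def nearest_neighbor_predict(x_query, X_train, y_train):
    # Return the output of the single closest training example (1-NN).
    distances = np.linalg.norm(X_train - x_query, axis=1)
    return y_train[np.argmin(distances)]

print(nearest_neighbor_predict(np.array([0.25, -0.4]), X_train, y_train))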
Manifold Learning

...the manifold to vary from one point to another. This often happens when a manifold intersects itself. For example, a figure eight is a manifold that has a single dimension in most places but two dimensions at the intersection at the center.

[Figure 5.12: two-dimensional data sampled near a one-dimensional manifold.]
QMUL Dataset
[Figure 5.13: Training examples from the QMUL Multiview Face Dataset (Gong et al., 2000), in which the subjects were asked to move in such a way as to cover the two-dimensional manifold corresponding to two angles of rotation. We would like learning algorithms to discover and disentangle such manifold coordinates. Figure 20.6 illustrates such a feat.]