

Machine Learning Basics
Lecture slides for Chapter 5 of Deep Learning
www.deeplearningbook.org
Ian Goodfellow
2016-09-26

Linear Regression

[Figure 5.1: left panel "Linear regression example" (y versus x1); right panel "Optimization of w" (MSE(train) versus w1).]
Figure 5.1: A linear regression problem, with a training set consisting of ten data points, each containing one feature. Because there is only one feature, the weight vector contains only a single parameter to learn, w1. (Left) Observe that linear regression learns to set w1 such that the line y = w1 x comes as close as possible to passing through all the training points.
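
The slide contains no code, but a minimal sketch of what the two panels depict might look like the following. The ten-point, one-feature dataset and the value of the true weight are assumptions (the slide's actual data are not provided); the weight w1 is obtained from the normal equations, and the training MSE is the quantity the right panel plots as a function of w1.

```python
import numpy as np

# Toy stand-in for Figure 5.1: ten data points, one feature, no bias term.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(10, 1))            # design matrix, one column
true_w = np.array([0.75])                            # hypothetical "true" weight
y = X @ true_w + rng.normal(scale=0.1, size=10)      # noisy targets

# Normal equations: w = (X^T X)^{-1} X^T y
w = np.linalg.solve(X.T @ X, X.T @ y)

train_mse = np.mean((X @ w - y) ** 2)
print(f"learned w1 = {w[0]:.3f}, training MSE = {train_mse:.4f}")
```
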
Underfitting and Overfitting in Polynomial Estimation

[Figure 5.2: three panels, "Underfitting", "Appropriate capacity", and "Overfitting" (y versus x0 in each).]

Figure 5.2: We fit three models to this example training set (generated from an underlying quadratic function). The overfitting model has more parameters than training examples; we have little chance of choosing a solution that generalizes well when so many wildly different solutions exist. In this example, the quadratic model is perfectly matched to the true structure of the task, so it generalizes well to new data.
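
A hedged sketch of the three fits, assuming NumPy and a synthetic quadratic ground truth (the slide's actual data are not given): degree 1 underfits, degree 2 matches the true structure, and degree 9 drives the training error to nearly zero while having little chance of generalizing.

```python
import numpy as np

# Synthetic training set with a quadratic ground truth (an assumption).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=10)
y = x ** 2 + rng.normal(scale=0.05, size=10)

for degree, label in [(1, "underfitting"), (2, "appropriate capacity"), (9, "overfitting")]:
    coeffs = np.polyfit(x, y, deg=degree)            # least-squares polynomial fit
    train_mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(f"degree {degree:>2} ({label}): training MSE = {train_mse:.5f}")
```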

Generalization and Capacity


[Figure 5.3: error versus capacity, with curves for training error and generalization error; the underfitting zone lies to the left of the optimal capacity, the overfitting zone to the right, and the generalization gap is the distance between the two curves.]
Figure 5.3: Typical relationship between capacity and error. Training and test error behave differently. At the left end of the graph, training error and generalization error are both high (the underfitting zone); to the right of the optimal capacity, the generalization gap widens (the overfitting zone).
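
To make the U-shaped relationship concrete, the sketch below (an illustration under assumed synthetic data, not the figure's actual source) sweeps model capacity, here polynomial degree, and prints training versus held-out error: training error keeps falling while test error eventually rises.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample(n):
    # Assumed synthetic task: smooth target plus noise.
    x = rng.uniform(-1.0, 1.0, size=n)
    return x, np.sin(3 * x) + rng.normal(scale=0.1, size=n)

x_train, y_train = sample(20)
x_test, y_test = sample(200)

for degree in range(1, 13):                          # capacity = polynomial degree
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"capacity {degree:>2}: train {train_err:.4f}  test {test_err:.4f}")
```
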
Training Set Size
[Figure 5.4: top panel, error (MSE) versus number of training examples (10^0 to 10^5, log scale), showing the Bayes error together with train and test error for a quadratic model and for a model of optimal capacity; bottom panel, optimal capacity (polynomial degree) versus number of training examples.]
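
A rough illustration of the top panel's trend, under the assumption of a synthetic quadratic task whose fixed noise level plays the role of the Bayes error: with the capacity held fixed at a quadratic model, test MSE approaches the noise floor as the training set grows.

```python
import numpy as np

rng = np.random.default_rng(2)

def quadratic_task(n):
    # Assumed synthetic task; the noise variance stands in for the Bayes error.
    x = rng.uniform(-1.0, 1.0, size=n)
    return x, x ** 2 + rng.normal(scale=0.1, size=n)

x_test, y_test = quadratic_task(2000)

for n_train in (10, 100, 1000, 10000):
    x_tr, y_tr = quadratic_task(n_train)
    coeffs = np.polyfit(x_tr, y_tr, deg=2)           # fixed quadratic capacity
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"n_train = {n_train:>6}: test MSE = {test_mse:.4f}")
```
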
Weight Decay

As an example of how we can control a model's tendency to overfit or underfit via weight decay, we can train a high-degree polynomial regression model with different values of λ; see figure 5.5 for the results.

[Figure 5.5: three panels, "Underfitting (excessive λ)", "Appropriate weight decay (medium λ)", and "Overfitting (λ → 0)" (y versus x0 in each).]

Figure 5.5: We fit a high-degree polynomial regression model to our example training set.
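
A minimal sketch of the experiment the figure describes, assuming NumPy and synthetic data (the training set and the specific λ values are assumptions): a degree-9 polynomial is fit with L2 weight decay via the closed-form ridge solution, for an excessive, a moderate, and a vanishing λ.

```python
import numpy as np

rng = np.random.default_rng(3)

def sample(n):
    # Assumed synthetic data; the slides' actual training set is not provided.
    x = rng.uniform(-1.0, 1.0, size=n)
    return x, np.sin(3 * x) + rng.normal(scale=0.1, size=n)

x_tr, y_tr = sample(15)
x_te, y_te = sample(200)
degree = 9
X_tr, X_te = np.vander(x_tr, degree + 1), np.vander(x_te, degree + 1)

for lam, label in [(10.0, "excessive lambda"), (0.05, "medium lambda"), (1e-8, "lambda -> 0")]:
    # Closed-form weight-decay (ridge) solution: w = (X^T X + lambda I)^{-1} X^T y
    w = np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(degree + 1), X_tr.T @ y_tr)
    print(f"{label:<17} train MSE {np.mean((X_tr @ w - y_tr) ** 2):.4f}  "
          f"test MSE {np.mean((X_te @ w - y_te) ** 2):.4f}")
```
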
Bias and Variance

The MSE measures the overall expected deviation between the estimator and the true value of the parameter θ. As is clear from equation 5.54, evaluating the MSE incorporates both the bias and the variance:

MSE = E[(θ̂ − θ)²] = Bias(θ̂)² + Var(θ̂).

The most desirable estimators are those with small MSE; these are estimators that manage to keep both their bias and variance somewhat in check.

[Figure 5.6: bias, variance, and generalization error versus capacity; underfitting zone to the left of the optimal capacity, overfitting zone to the right.]

Figure 5.6: As capacity increases (x-axis), bias (dotted) tends to decrease and variance (dashed) tends to increase, yielding another U-shaped curve for generalization error.
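
The decomposition can be estimated empirically by refitting an estimator on many resampled training sets. The sketch below (the synthetic task, query point, and capacities are assumptions, not taken from the slides) measures squared bias and variance of polynomial predictions at a single query point for three capacities; bias shrinks and variance grows as capacity increases.

```python
import numpy as np

rng = np.random.default_rng(4)

def true_f(x):
    return np.sin(3 * x)                             # assumed ground-truth function

x0 = 0.5                                             # query point for the decomposition
n_repeats, n_train = 500, 20

for degree in (1, 3, 9):
    preds = np.empty(n_repeats)
    for r in range(n_repeats):                       # refit on independent training sets
        x = rng.uniform(-1.0, 1.0, size=n_train)
        y = true_f(x) + rng.normal(scale=0.1, size=n_train)
        preds[r] = np.polyval(np.polyfit(x, y, deg=degree), x0)
    bias_sq = (preds.mean() - true_f(x0)) ** 2
    variance = preds.var()
    print(f"degree {degree}: bias^2 = {bias_sq:.5f}  variance = {variance:.5f}")
```
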
Decision Trees

[Figure 5.7: a decision tree whose nodes are labeled with binary strings (0, 1, 00, 01, 10, 11, 010, 011, 110, 111, 1110, 1111), each string recording the sequence of left/right choices that leads to that node, shown together with the corresponding partition of the input space into regions.]


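
As a small, hedged illustration of the axis-aligned splits the figure depicts, assuming scikit-learn is available (the slides themselves show no code, and the quadrant-labeling task is a made-up example): a shallow tree is fit to toy 2-D data and its learned rules are printed.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy 2-D dataset (an assumption): class 1 iff the point lies in the upper-right quadrant.
rng = np.random.default_rng(5)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = ((X[:, 0] > 0) & (X[:, 1] > 0)).astype(int)

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["x1", "x2"]))  # axis-aligned splits
print("training accuracy:", tree.score(X, y))
```
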
Principal Components Analysis

[Figure 5.8: left panel, the original data plotted in the (x1, x2) coordinates; right panel, the transformed data plotted in the (z1, z2) coordinates.]

Figure 5.8: PCA learns a linear projection that aligns the direction of greatest variance with the axes of the new space. (Left) The original data consists of samples of x. In this space, the variance might occur along directions that are not axis-aligned. (Right) The transformed data z = x⊤W now varies most along the axis z1. The direction of second-most variance is now along z2.
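
A minimal NumPy sketch of the projection the caption describes, with the correlated input data generated synthetically as an assumption: the principal directions are taken from the SVD of the centered data, and the projected coordinates z1, z2 come out ordered by variance.

```python
import numpy as np

# Correlated synthetic data (an assumption), roughly like the left panel of Figure 5.8.
rng = np.random.default_rng(6)
cov = np.array([[3.0, 2.0],
                [2.0, 3.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=500)

Xc = X - X.mean(axis=0)                              # center the data
# Right singular vectors of the centered data are the principal directions.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
W = Vt.T                                             # columns = principal components

Z = Xc @ W                                           # z = x^T W for each example
print("variance along z1, z2:", Z.var(axis=0).round(3))  # most variance is along z1
```
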
Curse of Dimensionality

Figure 5.9: As the number of relevant dimensions of the data increases (from left to right), the number of configurations of interest may grow exponentially. (Left) In this one-dimensional example, we have one variable for which we only care to distinguish 10 regions of interest. With enough examples falling within each of these regions (each region corresponds to a cell in the illustration), learning algorithms can easily generalize correctly. A straightforward way to generalize is to estimate the value of the target function within each cell.
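
The exponential growth the caption refers to is easy to quantify: if each relevant dimension is divided into a fixed number of regions, the number of cells (and hence of examples needed to populate them) is that number raised to the dimensionality. A tiny sketch, taking 10 regions per dimension as in the one-dimensional example:

```python
# If each relevant dimension is split into `bins_per_dim` regions of interest,
# a grid-based learner needs bins_per_dim ** d cells -- ideally with at least
# one training example falling in every cell.
bins_per_dim = 10
for d in (1, 2, 3, 10):
    print(f"{d:>2} dimension(s) -> {bins_per_dim ** d:,} cells to cover")
```
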
Nearest Neighbor

Figure 5.10: Illustration of how the nearest neighbor algorithm breaks up the input space into regions. An example (represented here by a circle) within each region defines the region boundary.
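
A minimal sketch of the rule the figure illustrates, assuming NumPy and made-up toy data: each query point inherits the label of its closest training example, which implicitly partitions the input space into the depicted regions.

```python
import numpy as np

def one_nearest_neighbor(X_train, y_train, X_query):
    """Predict by copying the label of the closest training example (1-NN)."""
    # Pairwise squared Euclidean distances, shape (n_query, n_train).
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    return y_train[d2.argmin(axis=1)]

# Toy data (an assumption): the label is the sign of the first coordinate.
rng = np.random.default_rng(7)
X_train = rng.uniform(-1.0, 1.0, size=(50, 2))
y_train = (X_train[:, 0] > 0).astype(int)
X_query = rng.uniform(-1.0, 1.0, size=(5, 2))
print(one_nearest_neighbor(X_train, y_train, X_query))
```
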
Manifold Learning

The dimensionality of a manifold can vary from one point to another. This often happens when a manifold intersects itself: for example, a figure eight is a manifold that has a single dimension in most places but two dimensions at the intersection at the center.

[Figure 5.11: scatterplot of the two-dimensional samples.]

Figure 5.11: Data sampled from a distribution in a two-dimensional space that is actually concentrated near a one-dimensional manifold, like a twisted string. The solid line indicates the underlying manifold that the learner should infer.
Uniformly Sampled Images

[Figure 5.12]
QMUL Dataset

Figure 5.13: Training examples from the QMUL Multiview Face Dataset (Gong et al.), in which the subjects were asked to move in such a way as to cover the two-dimensional manifold corresponding to two angles of rotation. We would like learning algorithms to be able to discover and disentangle such manifold coordinates. Figure 20.6 illustrates such a feat.
