Exercises On Backpropagation
Laurenz Wiskott
Institut für Neuroinformatik
Ruhr-Universität Bochum, Germany, EU
30 January 2017
Contents

1 Supervised learning
1.1 Introduction
1.4 Online learning rule
1.5 Examples

2 Supervised learning in multilayer networks
2.1.1 Exercise: Chain rule in a three-layer network
Teaching/Material/, where you can also find other teaching material such as programming exercises. The table of contents of the lecture notes is reproduced here to give an orientation as to when the exercises can reasonably be solved. For the best learning effect, I recommend first seriously trying to solve the exercises yourself before looking at the solutions.
1 Supervised learning
1.1 Introduction
Let $y^\mu$ be the scalar output of a network for a training pattern indexed with $\mu$, and $s^\mu$ the required output value. The error of a network over all $M$ training patterns is often defined as

$$E_2 := \frac{1}{M} \sum_{\mu} \frac{1}{2} \left( y^\mu - s^\mu \right)^2 . \qquad (1)$$
What are the advantages and disadvantages of these three measures? Calculate the derivative with respect to $y^1$. What is the role of the parameter ($> 0$)?
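As a quick check of the derivative asked for above, here is a minimal NumPy sketch; the function names and example values are mine, not from the notes, and the derivative of $E_2$ with respect to a single output $y^\nu$ works out to $(y^\nu - s^\nu)/M$:

```python
import numpy as np

# Minimal sketch (function names and values are illustrative, not from the
# notes): the error E_2 and its derivative with respect to one output y^nu.
def E2(y, s):
    """E_2 = (1/M) * sum_mu 0.5 * (y^mu - s^mu)^2."""
    return np.mean(0.5 * (y - s) ** 2)

def dE2_dy(y, s, nu):
    """dE_2/dy^nu = (y^nu - s^nu) / M; only pattern nu contributes."""
    return (y[nu] - s[nu]) / len(y)

y = np.array([0.2, 0.9, 0.4])       # network outputs y^mu
s = np.array([0.0, 1.0, 0.5])       # desired outputs s^mu
print(E2(y, s), dE2_dy(y, s, 1))
```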
(a) $E := x + y$ . (1)

(b) $E := x^2 + 2y^2$ . (2)
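For intuition about how gradient descent behaves on these two error surfaces, here is a minimal sketch; the step size $\eta$, start point, and number of steps are arbitrary choices of mine, not part of the exercise:

```python
import numpy as np

# Minimal sketch: plain gradient descent with fixed step size eta.
def descend(grad, start, eta=0.1, steps=20):
    p = np.array(start, dtype=float)
    for _ in range(steps):
        p -= eta * grad(p)          # (x, y) <- (x, y) - eta * grad E
    return p

# (a) E = x + y: the gradient (1, 1) is constant, so the iterate moves in a
# straight line forever and never converges (E is unbounded below).
p_a = descend(lambda p: np.array([1.0, 1.0]), start=[1.0, 1.0])

# (b) E = x^2 + 2y^2: the gradient is (2x, 4y); the y-coordinate shrinks
# faster than x, so the path bends toward the x-axis before approaching the
# minimum at the origin.
p_b = descend(lambda p: np.array([2 * p[0], 4 * p[1]]), start=[1.0, 1.0])
print(p_a, p_b)
```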
1.4 Online learning rule
1.5 Examples
Let the training set be $(x^\mu, s^\mu)$ for $\mu = 1, \ldots, M$, where $x^\mu$ is the input vector and $s^\mu$ the desired output, and let the error function be

$$E := \frac{1}{M} \sum_{\mu=1}^{M} \underbrace{\frac{1}{2} \left( y(x^\mu) - s^\mu \right)^2}_{=: E^\mu} . \qquad (3)$$
1. Try to get an intuition for E and describe how it differs from the linear case. Illustrate your statements
with a graph.
2. Derive from $E$ an incremental learning rule that uses gradient descent and is applied separately to each training example (see the sketch below).
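A minimal sketch of such an incremental rule, assuming plain gradient descent on the per-pattern error $E^\mu = \frac{1}{2}(y(x^\mu) - s^\mu)^2$. Since the concrete form of $y(x)$ is not fixed here, the unit and its derivative are passed in as functions; the quadratic unit below is purely illustrative:

```python
import numpy as np

# Incremental (online) gradient descent on the per-pattern error
#   E^mu = 0.5 * (y(x^mu) - s^mu)^2,
# giving the update  w <- w - eta * (y(x^mu) - s^mu) * dy/dw.
# The unit y and its derivative dy_dw are passed in as functions because the
# concrete form of y(x) is not fixed by this excerpt; eta is a chosen step size.
def online_step(w, x, s, y, dy_dw, eta=0.05):
    err = y(w, x) - s                    # residual y(x^mu) - s^mu
    return w - eta * err * dy_dw(w, x)   # gradient step on E^mu

# Purely illustrative non-linear unit y = (w^T x)^2 with dy/dw = 2 (w^T x) x:
y = lambda w, x: np.dot(w, x) ** 2
dy_dw = lambda w, x: 2 * np.dot(w, x) * x

w = np.array([0.5, -0.3])
for x_mu, s_mu in [(np.array([1.0, 2.0]), 1.0),
                   (np.array([0.0, 1.0]), 0.25)]:
    w = online_step(w, x_mu, s_mu, y, dy_dw)
print(w)
```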
The unit shall learn the training data $(x^\mu, s^\mu)$ for $\mu = 1, \ldots, M$, with $x^\mu$ denoting the input vectors and $s^\mu$ the desired output values. The error function is given by

$$F_w := \frac{1}{2} \sum_{\mu=1}^{M} \left( y(x^\mu) - s^\mu \right)^2 . \qquad (2)$$
1. For which values of $M$ can one generally expect an exact solution, i.e. $y(x^\mu) = s^\mu$ for all $\mu$?
2. Are there cases in which no exact solution exists even though M is small enough?
3. Derive a closed-form expression for the weight vector $w$ that minimizes the error function under general conditions (important for cases in which no exact solution exists).
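For question 3, here is a sketch of the standard least-squares route, assuming a linear unit $y(x) = w^T x$ (an assumption of mine, not stated in the excerpt): setting the gradient of $F_w$ to zero yields the normal equations, and the pseudoinverse covers the general case.

```python
import numpy as np

# Sketch assuming a linear unit y(x) = w^T x (my assumption). With the
# inputs stacked as rows of X,
#   F_w = 0.5 * ||X w - s||^2,
# and setting the gradient X^T (X w - s) to zero gives the normal equations
#   X^T X w = X^T s.
# The pseudoinverse solves these and also covers the degenerate cases
# (no exact solution, or infinitely many).
X = np.array([[1.0, 2.0],
              [3.0, 1.0],
              [2.0, 2.0]])          # M = 3 input vectors of dimension N = 2
s = np.array([1.0, 2.0, 2.0])       # desired outputs s^mu
w = np.linalg.pinv(X) @ s           # least-squares minimizer of F_w
print(w, X @ w - s)                 # residual vanishes only for an exact fit
```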
2 Supervised learning in multilayer networks
Consider a three-layer network with high-dimensional input $x$ and scalar output $a$ defined by

$$y_j := \sum_i u_{ji} x_i , \qquad (1)$$

$$z_k := \sum_j v_{kj} y_j , \qquad (2)$$

$$a := \sum_k w_k z_k . \qquad (3)$$
1. Make a sketch of the network and mark connections and units with the variables used above.
2. Calculate the derivative of $a$ with respect to $w_k$.
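Since $a$ depends on $w_k$ only through the product $w_k z_k$, the derivative is simply $\partial a / \partial w_k = z_k$. The following sketch (shapes and values arbitrary, chosen by me) checks this against a finite-difference estimate:

```python
import numpy as np

# Forward pass of the (linear) three-layer network from equations (1)-(3)
# and a finite-difference check of da/dw_k. All shapes and values arbitrary.
rng = np.random.default_rng(0)
x = rng.normal(size=5)              # high-dimensional input x_i
U = rng.normal(size=(4, 5))         # first-layer weights u_ji
V = rng.normal(size=(3, 4))         # second-layer weights v_kj
w = rng.normal(size=3)              # output weights w_k

def forward(w):
    y = U @ x                       # y_j = sum_i u_ji x_i       (1)
    z = V @ y                       # z_k = sum_j v_kj y_j       (2)
    return w @ z, z                 # a   = sum_k w_k z_k        (3)

a, z = forward(w)
k, eps = 1, 1e-6
w_pert = w.copy()
w_pert[k] += eps
numeric = (forward(w_pert)[0] - a) / eps
print(z[k], numeric)                # da/dw_k = z_k matches the estimate
```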