
Exercises on Backpropagation

Laurenz Wiskott
Institut für Neuroinformatik
Ruhr-Universität Bochum, Germany, EU

30 January 2017

Contents

1 Supervised learning 2
  1.1 Introduction 2
  1.2 Error function 2
    1.2.1 Exercise: Error functions 2
  1.3 Gradient descent 2
    1.3.1 Exercise: Gradients 2
  1.4 Online learning rule 3
  1.5 Examples 3
    1.5.1 Nonlinear regression in x 3
    1.5.2 Linear regression in x? 3
    1.5.3 Exercise: Learning rule for a nonlinear unit 3
    1.5.4 Exercise: Closed form solution of the linear regression problem 3

2 Supervised learning in multilayer networks 4
  2.1 Multilayer networks 4
    2.1.1 Exercise: Chain rule in a three-layer network 4
  2.2 Error backpropagation 4

3 Sample applications (slides) 4

© 2017 Laurenz Wiskott (homepage https://www.ini.rub.de/PEOPLE/wiskott/). This work (except for all figures from other sources, if present) is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/4.0/. Figures from other sources have their own copyright, which is generally indicated. Do not distribute parts of these lecture notes showing figures with non-free copyrights (here usually figures I have the rights to publish but you don't, like my own published figures).

Several of my exercises (not necessarily on this topic) were inspired by papers and textbooks by other authors. Unfortunately, I did not document that well, because initially I did not intend to make the exercises publicly available, and now I cannot trace it back anymore. So I cannot give as much credit as I would like to. The concrete versions of the exercises are certainly my own work, though.

These exercises complement my corresponding lecture notes, available at https://www.ini.rub.de/PEOPLE/wiskott/Teaching/Material/, where you can also find other teaching material such as programming exercises. The table of contents of the lecture notes is reproduced here to give an orientation as to when the exercises can reasonably be solved. For the best learning effect, I recommend first seriously trying to solve the exercises yourself before looking at the solutions.

1 Supervised learning

1.1 Introduction

1.2 Error function

1.2.1 Exercise: Error functions

Let y^μ be the scalar output of a network for a training pattern indexed with μ, and s^μ the required output value. The error of a network over all M training patterns is often defined as

E_2 := \frac{1}{M} \sum_\mu \frac{1}{2} \left( y^\mu - s^\mu \right)^2 .    (1)

Why is this error measure so popular? Discuss it in comparison to

E_{||} := \frac{1}{M} \sum_\mu \left| y^\mu - s^\mu \right| ,    (2)

and E_\varepsilon := \frac{1}{M} \sum_\mu \frac{\left( y^\mu - s^\mu \right)^2}{\varepsilon^2 + \left( y^\mu - s^\mu \right)^2} .    (3)

What are the advantages and disadvantages of these three measures? Calculate the derivative with respect to y^1. What is the role of the parameter ε > 0?
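
Before doing the calculation by hand, it can help to see how the three measures behave numerically. The following Python sketch is only an illustration; the toy targets, the outlier value, and the choice ε = 0.5 are arbitrary assumptions, not part of the exercise.

```python
import numpy as np

def E2(y, s):
    # Mean squared error, Eq. (1): (1/M) * sum of (1/2)(y - s)^2
    return np.mean(0.5 * (y - s) ** 2)

def E_abs(y, s):
    # Mean absolute error, Eq. (2)
    return np.mean(np.abs(y - s))

def E_eps(y, s, eps=0.5):
    # Saturating error, Eq. (3): each summand is bounded by 1
    d2 = (y - s) ** 2
    return np.mean(d2 / (eps ** 2 + d2))

s = np.array([0.0, 1.0, 2.0, 3.0])               # desired outputs s^mu
y_good = s + np.array([0.1, -0.1, 0.1, -0.1])     # small errors everywhere
y_outlier = s + np.array([0.1, -0.1, 0.1, 5.0])   # one large outlier

for name, f in [("E_2", E2), ("E_||", E_abs), ("E_eps", E_eps)]:
    print(name, f(y_good, s), f(y_outlier, s))
```

Comparing the two printed columns shows how strongly each measure is dominated by a single outlier, which is one of the points to discuss above.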

1.3 Gradient descent

1.3.1 Exercise: Gradients

For each of the following functions in x and y

1. draw the function with contour lines,

2. illustrate the minima and the gradient of the function, and


3. decide if the function could serve as an error function.

(a) E := x + y ,    (1)

(b) E := x^2 + 2 y^2 ,    (2)

(c) E := \cos(x) + \sin(y) .    (3)
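
If you want to check your hand-drawn sketches, a small plotting sketch like the following can be used. It is only an optional aid; the plotting range [-3, 3] and the grid resolutions are arbitrary assumptions, and it uses numerical gradients so that it does not give away the analytic calculation.

```python
import numpy as np
import matplotlib.pyplot as plt

# The three candidate error functions (a)-(c)
functions = [
    ("E = x + y",           lambda x, y: x + y),
    ("E = x^2 + 2 y^2",     lambda x, y: x**2 + 2 * y**2),
    ("E = cos(x) + sin(y)", lambda x, y: np.cos(x) + np.sin(y)),
]

# Fine grid for contours, coarse grid for gradient arrows
xf, yf = np.meshgrid(np.linspace(-3, 3, 200), np.linspace(-3, 3, 200))
xc, yc = np.meshgrid(np.linspace(-3, 3, 15), np.linspace(-3, 3, 15))

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, (label, f) in zip(axes, functions):
    ax.contour(xf, yf, f(xf, yf), levels=15)
    # Numerical gradient; with meshgrid's default indexing, axis 0 is y and axis 1 is x
    dE_dy, dE_dx = np.gradient(f(xc, yc), yc[:, 0], xc[0, :])
    ax.quiver(xc, yc, -dE_dx, -dE_dy)   # arrows show the descent direction (negative gradient)
    ax.set_title(label)
    ax.set_xlabel("x")
    ax.set_ylabel("y")
plt.tight_layout()
plt.show()
```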

1.4 Online learning rule

1.5 Examples

1.5.1 Nonlinear regression in x

1.5.2 Linear regression in x?

1.5.3 Exercise: Learning rule for a nonlinear unit

Given the output of a nonlinear unit

y(x) := \sigma\left( \sum_{i=0}^{N} w_i x_i \right) ,    (1)

with \sigma(z) := \tanh(z) .    (2)

Let the training set be (x^μ, s^μ) for μ = 1, ..., M, where x^μ is the input vector and s^μ the desired output, and the error function be

E := \frac{1}{M} \sum_{\mu=1}^{M} \underbrace{\frac{1}{2} \left( y(x^\mu) - s^\mu \right)^2}_{=: E^\mu} .    (3)

1. Try to get an intuition for E and describe how it differs from the linear case. Illustrate your statements
with a graph.

2. Derive an incremental learning rule from E that uses a gradient descent method and that is applied
separately to each training example.
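
Once you have derived the incremental rule on paper, you can compare it against a numerical gradient on a single training example. The sketch below is only a checking aid under arbitrary assumptions (random toy data, learning rate η = 0.1, and a generic update w ← w − η ∇E^μ); it does not reveal the analytic form of the gradient.

```python
import numpy as np

rng = np.random.default_rng(0)

def y(w, x):
    # Nonlinear unit: y(x) = tanh(sum_i w_i x_i), Eqs. (1)-(2)
    return np.tanh(w @ x)

def E_mu(w, x, s):
    # Error on a single pattern: (1/2)(y(x) - s)^2
    return 0.5 * (y(w, x) - s) ** 2

def numerical_grad(w, x, s, h=1e-6):
    # Central differences; replace this with your analytically derived gradient
    g = np.zeros_like(w)
    for i in range(len(w)):
        dw = np.zeros_like(w)
        dw[i] = h
        g[i] = (E_mu(w + dw, x, s) - E_mu(w - dw, x, s)) / (2 * h)
    return g

# Toy problem: four weights w_0..w_3 and a single training pattern (x^mu, s^mu)
w = rng.normal(size=4)
x_mu, s_mu = rng.normal(size=4), 0.5
eta = 0.1

for step in range(5):
    w -= eta * numerical_grad(w, x_mu, s_mu)   # incremental (per-pattern) gradient step
    print(step, E_mu(w, x_mu, s_mu))
```

If your analytic rule is correct, substituting it for numerical_grad should produce the same decreasing error values.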

1.5.4 Exercise: Closed form solution of the linear regression problem

Given a linear unit with output value

y(x) := \sum_{i=1}^{N} w_i x_i .    (1)

The unit shall learn training data (x^μ, s^μ) for μ = 1, ..., M, with x^μ indicating input vectors and s^μ the desired output values. The error function is given by

F_w := \frac{1}{2} \sum_{\mu=1}^{M} \left( y(x^\mu) - s^\mu \right)^2 .    (2)

1. Up to which M does there generally exist an exact solution w^+, such that

   s^\mu = \sum_i w_i^+ x_i^\mu    (3)

   for all μ?
2. Are there cases in which no exact solution exists even though M is small enough?
3. Derive a closed form expression for the weight vector w that minimizes the error function under
general conditions (important for cases in which no exact solution exists).

Hint: Write everything compactly with vectors and matrices.
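
After deriving the closed-form weight vector by hand, you can verify it numerically. The following sketch is only a checking aid; the random toy data are an arbitrary assumption, and np.linalg.lstsq merely stands in for a generic least-squares solver that minimizes the same error function, so its result should match your closed-form expression.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy training data: M patterns of dimension N (the rows of X are the x^mu)
M, N = 20, 3
X = rng.normal(size=(M, N))
s = rng.normal(size=M)

# Numerical minimizer of F_w = (1/2) * sum_mu (w . x^mu - s^mu)^2
w_lstsq, *_ = np.linalg.lstsq(X, s, rcond=None)

# Plug your own closed-form expression in here and compare, e.g.:
# w_closed = ...           # your derived formula in terms of X and s
# print(np.allclose(w_closed, w_lstsq))

print("least-squares solution:", w_lstsq)
print("residual error F_w:", 0.5 * np.sum((X @ w_lstsq - s) ** 2))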

2 Supervised learning in multilayer networks

2.1 Multilayer networks

2.1.1 Exercise: Chain rule in a three-layer network

Consider a three-layer network with high-dimensional input x and scalar output a defined by

y_j := \sum_i u_{ji} x_i ,    (1)

z_k := \sum_j v_{kj} y_j ,    (2)

a := \sum_k w_k z_k .    (3)

1. Make a sketch of the network and mark connections and units with the variables used above.

2. Calculate the derivative of a with respect to w_k.

3. Calculate the derivative of a with respect to v_{kj}.

4. Calculate the derivative of a with respect to u_{ji}.

5. Let the target value be s and the error of the network be defined as

   E := \frac{1}{2} (s - a)^2 .    (4)

   Calculate the derivative of the error with respect to u_{ji}.
6. Now assume there are M input patterns x_i^μ and required target values s^μ, both indexed with a superscript μ. Let a^μ indicate the output of the network for input pattern x_i^μ and the error be the average over the individual errors, i.e.

   E := \frac{1}{M} \sum_\mu \frac{1}{2} \left( s^\mu - a^\mu \right)^2 .    (5)

   Calculate the derivative of the averaged error with respect to u_{ji}.
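
A finite-difference check is a convenient way to verify derivatives obtained with the chain rule. The sketch below is only an optional aid with arbitrarily chosen layer sizes and a hypothetical helper numerical_dE_dU; it estimates ∂E/∂u_{ji} for the single-pattern error of Eq. (4) numerically, so you can compare it entry-wise with your analytic result.

```python
import numpy as np

rng = np.random.default_rng(2)

def forward(U, V, w, x):
    # Three-layer network of Eqs. (1)-(3): y = U x, z = V y, a = w . z
    y = U @ x
    z = V @ y
    return w @ z

def E(U, V, w, x, s):
    # Single-pattern error of Eq. (4)
    return 0.5 * (s - forward(U, V, w, x)) ** 2

# Arbitrary sizes: 5 inputs x_i, 4 units y_j, 3 units z_k
U = rng.normal(size=(4, 5))
V = rng.normal(size=(3, 4))
w = rng.normal(size=3)
x = rng.normal(size=5)
s = 1.0

def numerical_dE_dU(U, V, w, x, s, h=1e-6):
    # Central-difference estimate of dE/du_ji, one entry at a time
    G = np.zeros_like(U)
    for j in range(U.shape[0]):
        for i in range(U.shape[1]):
            dU = np.zeros_like(U)
            dU[j, i] = h
            G[j, i] = (E(U + dU, V, w, x, s) - E(U - dU, V, w, x, s)) / (2 * h)
    return G

print(numerical_dE_dU(U, V, w, x, s))
# Compare this matrix entry-wise with your chain-rule expression for dE/du_ji.
```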

2.2 Error backpropagation

3 Sample applications (slides)
