Assignment 4
Posted on November 21 - Due by November 29, 2:00pm
On the course homepage, you will find the neural network code that I showed in class in a
file named “MNIST_demo.c”. To make it work, you will also need to download the MNIST data
files from the website at:
http://yann.lecun.com/exdb/mnist/
(a) Modify the code so that, after each epoch, it writes the training-set error and the
test-set error to a file. Let the network run for at least 150 epochs (if you let it run
overnight, you may as well perform more epochs) and plot training and test error as
functions of the epoch number, as in Figure 9.9 (right panel) of Chapter 9, which I sent you.
If your plot looks completely different from Figure 9.9, please let me know and we can
check what went wrong.
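The internals of MNIST_demo.c are not reproduced here, so the following is only a minimal sketch of the logging step. It assumes the training loop already computes a training error and a test error once per epoch; the function name `log_epoch_errors` and the CSV layout are hypothetical choices, not part of the course code. It would be called once at the end of every epoch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of MNIST_demo.c): appends one line
   "epoch,train_error,test_error" to a CSV file so that the learning
   curves can be plotted later with any plotting tool.
   Epoch 1 starts a fresh file with a header; later epochs append.
   Returns 0 on success, -1 if the file cannot be opened. */
int log_epoch_errors(const char *path, int epoch,
                     double train_error, double test_error)
{
    assert(path != NULL && epoch >= 1);
    FILE *f = fopen(path, epoch == 1 ? "w" : "a");
    if (f == NULL)
        return -1;
    if (epoch == 1)
        fprintf(f, "epoch,train_error,test_error\n");
    fprintf(f, "%d,%.6f,%.6f\n", epoch, train_error, test_error);
    fclose(f);
    return 0;
}
```

Opening and closing the file every epoch costs almost nothing compared with an epoch of training, and it means the curve data written so far survives even if an overnight run is interrupted.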
(b) See what happens if you use fewer neurons in the two hidden layers. You can choose
any numbers, but make at least one layer significantly smaller than in the original
network. Again, plot the training and test errors across epochs. How do the results differ
from the initial ones?
(c) Restore the original number of neurons, and now change the dropout rate. Originally,
25% of input units (see line 164 in the code) and 50% of hidden-layer neurons (see line
172) are randomly chosen to drop out, i.e., to give no output. Note that when we run the
network in production (non-training) mode without dropout, all units are active, so we need
to scale the neurons' outputs accordingly (by the fraction of units kept during training) to
keep the overall activation at the same level (lines 180 and 182). Change the dropout rate
for the hidden-layer and/or input-layer units to whichever value you like, and run the
network again for at least 150 epochs. Plot the results and compare them to the ones you
got in (a).
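The difference between training mode and production mode can be illustrated with the following sketch of standard dropout. It assumes a layer's outputs are held in an array of doubles; the function `apply_dropout` is hypothetical and the course code may organize this differently. In production mode it multiplies every output by the keep probability 1 − p, so that the expected total activation reaching the next layer matches what that layer saw during training.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch of standard (non-inverted) dropout for one layer.
   drop_rate is the fraction of units to silence, e.g. 0.25 for the
   input layer or 0.5 for the hidden layers in the original setup. */
void apply_dropout(double *out, int n, double drop_rate, int training)
{
    assert(out != NULL && n >= 0);
    assert(drop_rate >= 0.0 && drop_rate < 1.0);
    for (int i = 0; i < n; i++) {
        if (training) {
            /* Each unit is silenced independently with probability drop_rate. */
            if ((double)rand() / ((double)RAND_MAX + 1.0) < drop_rate)
                out[i] = 0.0;
        } else {
            /* Production mode: every unit is active, so scale outputs by
               the keep probability to match the expected training activation. */
            out[i] *= 1.0 - drop_rate;
        }
    }
}
```

A common alternative, "inverted" dropout, instead divides the surviving outputs by 1 − p during training and leaves production mode unscaled; both keep the expected activation consistent.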
(d) Add noise to the data by randomly choosing a certain percentage of pixels in both the
training and test images and flipping their intensity, so that intensity i becomes 255 - i.
For example, a pixel with the original value 10 has the value 245 after this
transformation. The easiest way to do this is to modify the readData() function. You
can choose any percentage you like, and any of the three networks that you created
above. Again, run it for at least 150 epochs and compare the results to those of the
original network (whichever one you chose).
Here is an idea for an ANN that would make you rich if it performed well. This ANN
predicts the results of soccer matches. The network receives information about the two
competing teams and the conditions of the match and is supposed to predict how many
goals each team will score. With this knowledge, you could bet on the predicted winner
and make a lot of money.
Let us say that every team consists of 20 players. You provide the following input
data to the network:
• The skill level of every player on each of the two teams. Skill is rated by a group
of soccer reporters on a scale from 0 (“is unable to kick the ball”) to 10 (“world
class player”).
• The number of matches that each team has played during the last two weeks. There
are never more than seven matches in that period of time.
• The statistics of former matches between the same two teams within the past 10
years (e.g., Team A won 30% of the matches, Team B 45%, and 25% of the matches
were tied).
• The continent that each team comes from (North America, South America, Europe,
Africa, Asia, or Australia).
• Where the match takes place (Team A’s stadium, Team B’s stadium, or a neutral
venue).
• The phase of the soccer season (early season vs. late season).
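As an illustration of one possible input format (not the only valid answer), the sketch below scales every numeric input to the range 0 to 1 and uses one-of-N (one-hot) codes for the categorical inputs, giving 2·20 + 2 + 3 + 2·6 + 3 + 1 = 61 input values. The struct `Match`, its fields, and `encode_match` are hypothetical names, and the sketch assumes the 20 players of each team are always listed in a consistent order.

```c
#include <assert.h>

#define PLAYERS 20
#define CONTINENTS 6
#define N_INPUTS (2 * PLAYERS + 2 + 3 + 2 * CONTINENTS + 3 + 1) /* = 61 */

enum Venue { HOME_A, HOME_B, NEUTRAL };

/* Hypothetical match record; field names are illustrative only. */
typedef struct {
    int skill_a[PLAYERS], skill_b[PLAYERS]; /* ratings 0..10 */
    int matches_a, matches_b;               /* 0..7 in the last two weeks */
    double win_a, win_b, tie;               /* head-to-head fractions */
    int continent_a, continent_b;           /* 0..5 */
    enum Venue venue;
    int late_season;                        /* 0 = early, 1 = late */
} Match;

/* Writes N_INPUTS values in [0, 1] into x and returns how many were written. */
int encode_match(const Match *m, double x[N_INPUTS])
{
    int k = 0;
    for (int i = 0; i < PLAYERS; i++) x[k++] = m->skill_a[i] / 10.0;
    for (int i = 0; i < PLAYERS; i++) x[k++] = m->skill_b[i] / 10.0;
    x[k++] = m->matches_a / 7.0;
    x[k++] = m->matches_b / 7.0;
    x[k++] = m->win_a;
    x[k++] = m->win_b;
    x[k++] = m->tie;
    /* One-hot codes: continents and venues are categories, not quantities. */
    for (int c = 0; c < CONTINENTS; c++) x[k++] = (m->continent_a == c);
    for (int c = 0; c < CONTINENTS; c++) x[k++] = (m->continent_b == c);
    for (int v = HOME_A; v <= NEUTRAL; v++) x[k++] = (m->venue == v);
    x[k++] = m->late_season ? 1.0 : 0.0;
    assert(k == N_INPUTS);
    return k;
}
```

One-hot codes avoid implying that, say, Africa lies "between" Europe and Asia, which a single numeric continent code would suggest to the network.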
You want to build and train a backpropagation network that, based on this information, is
able to predict the number of goals each team will score. Describe an appropriate way of
formatting the input, interpreting the output, collecting exemplars, constructing the
network, training the network, and testing the network. Give reasons for the decisions that
you make. Describe everything in great detail so that a computer programmer who does
not know anything about ANNs would be able to successfully build this network
application, predict results, and become rich. The programmer can look up the BPN
equations for training and operation in a book, but needs precise explanations for
everything else. Please help him/her out!
The following is a network of linear neurons, that is, neurons whose output is identical to
their net input, x⋅w (in other words, the output function that translates net input into output
is simply the identity function). These neurons do not receive any “dummy” inputs (biases
or offsets). The numbers in the circles indicate the output of a neuron, and the labels of
connections indicate the value of the corresponding weight.
[Figure: three-layer network of linear neurons with network input (2, 1); connection labels give the weights.]
(a) Just as a warm-up exercise, compute the output of the hidden-layer and the output-layer
neurons for the given input (2, 1).
(b) Only mandatory for CS670: Show that a network of linear neurons, such as this one,
always computes a linear function, regardless of its number of layers and neurons.
Hint: A function y = f(x) is linear if and only if it can be expressed as y = Ax for some
matrix A.
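Written out, the hint leads to a short argument. As notation introduced only for this derivation, let the weights between layer ℓ − 1 and layer ℓ form a matrix W^{(ℓ)} whose row j contains the incoming weights of neuron j, so that each entry of W^{(ℓ)} h^{(ℓ−1)} is exactly that neuron's net input x⋅w and hence, for linear neurons, its output.

```latex
\begin{aligned}
\mathbf{h}^{(1)} &= W^{(1)}\mathbf{x}, \qquad
\mathbf{h}^{(\ell)} = W^{(\ell)}\mathbf{h}^{(\ell-1)} \quad (\ell = 2,\dots,L),\\
\mathbf{y} &= \mathbf{h}^{(L)}
  = W^{(L)}\Bigl(W^{(L-1)}\cdots\bigl(W^{(1)}\mathbf{x}\bigr)\Bigr)
  = \underbrace{W^{(L)}W^{(L-1)}\cdots W^{(1)}}_{A}\,\mathbf{x}
  = A\mathbf{x}.
\end{aligned}
```

Because matrix multiplication is associative, A is a single fixed matrix no matter how many layers L there are or how many neurons each layer has, so the network computes y = Ax.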
(c) Only mandatory for CS670: Given that our three-layer network computes a linear
function, we suddenly notice that our network is wastefully large. It must be possible
to compute exactly the same function with a two-layer network. Draw such a network,
including all of its weights, that only consists of an input layer and an output layer and
computes the same function as the network shown above. Hint: In the network above,
determine how the output of each output-layer neuron depends on the two network
inputs, and then you should be able to find the correct weights for the two-layer
network. There is a more elegant way of deriving the solution that is related to (b), but
any correct solution gets full points, regardless of your approach.
Explain in your own words how the concepts of gradient descent and backpropagation
are related to each other.
Please put your answers to all questions in a single text file and upload it to your course
directory. Alternatively, you can submit some or all answers as a hardcopy at the start of
the class.