Feedforward Propagation
We first implement feedforward propagation for the neural network using the weights we have already been given.
Then we will implement the backpropagation algorithm to learn the parameters ourselves.
Here we use the terms weights and parameters interchangeably.
Our neural network has 3 layers: an input layer, a hidden layer and an output layer. Recall
that the inputs are 20 x 20 grayscale images "unrolled" to form 400 input features which we
feed into the neural network, so our input layer has 400 neurons. The hidden layer has 25
neurons and the output layer has 10 neurons, corresponding to the 10 digits (or classes) our
model predicts. The +1 in the figure above represents the bias term.
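For reference, a minimal sketch of these sizes as variables, using the names that are passed around in the code below:

import numpy as np

input_layer_size = 400   # 20 x 20 pixel images, unrolled into a feature vector
hidden_layer_size = 25   # units in the hidden layer
num_labels = 10          # output classes, one per digit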
We have been provided with a set of already trained network parameters. These are stored in
ex4weights.mat; they will be loaded into theta1 and theta2 and then unrolled into a vector
nn_params. The parameters have dimensions that are sized for a neural network with 25
units in the second layer and 10 output units (corresponding to the 10 digit classes).
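A sketch of this loading step, assuming the .mat file stores the two matrices under the keys 'Theta1' and 'Theta2':

from scipy.io import loadmat

weights = loadmat('ex4weights.mat')
theta1 = weights['Theta1']   # 25 x 401
theta2 = weights['Theta2']   # 10 x 26
# unroll both matrices into a single parameter vector (column-major, to match the later code)
nn_params = np.hstack((theta1.ravel(order='F'), theta2.ravel(order='F')))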
1.3 Feedforward and cost function
First we will implement the cost function, followed by the gradient for the neural network (for which
we use the backpropagation algorithm). Recall that the cost function for the neural network with
regularization is:
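Written out for our network with K = 10 output classes (the bias weights are excluded from the regularization term):

J(\Theta) = \frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\Big[-y_k^{(i)}\log\big((h_\Theta(x^{(i)}))_k\big) - (1-y_k^{(i)})\log\big(1-(h_\Theta(x^{(i)}))_k\big)\Big] + \frac{\lambda}{2m}\Big[\sum_{j=1}^{25}\sum_{k=1}^{400}\big(\Theta^{(1)}_{j,k}\big)^2 + \sum_{j=1}^{10}\sum_{k=1}^{25}\big(\Theta^{(2)}_{j,k}\big)^2\Big]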
sigmoid function
def sigmoid(z):
    # logistic function, applied element-wise
    return 1/(1+np.exp(-z))
cost function
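A minimal sketch of the cost computation; the name nnCostFunc and its signature follow the gradient-checking code later in this post, the parameters are assumed to be unrolled column-major (order='F'), and the labels in y are assumed to run from 1 to 10:

def nnCostFunc(nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, lmbda):
    m = X.shape[0]
    # roll the flat parameter vector back into the two weight matrices
    theta1 = nn_params[:hidden_layer_size * (input_layer_size + 1)].reshape(
        (hidden_layer_size, input_layer_size + 1), order='F')
    theta2 = nn_params[hidden_layer_size * (input_layer_size + 1):].reshape(
        (num_labels, hidden_layer_size + 1), order='F')

    # feedforward pass, adding the bias column at each layer
    a1 = np.hstack((np.ones((m, 1)), X))
    a2 = np.hstack((np.ones((m, 1)), sigmoid(a1 @ theta1.T)))
    a3 = sigmoid(a2 @ theta2.T)                      # m x 10 output activations

    # one-hot encode the labels (assumed to be 1..10)
    y_matrix = np.eye(num_labels)[y.flatten().astype(int) - 1]

    # cross-entropy cost plus regularization (bias weights excluded)
    J = (-1 / m) * np.sum(y_matrix * np.log(a3) + (1 - y_matrix) * np.log(1 - a3))
    J += (lmbda / (2 * m)) * (np.sum(theta1[:, 1:] ** 2) + np.sum(theta2[:, 1:] ** 2))
    return J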
2 Backpropagation
In this part of the exercise, you will implement the backpropagation algorithm to compute the
gradients for the neural network. Once you have computed the gradient, you will be able to
train the neural network by minimizing the cost function using an advanced optimizer such as
fmincg.
def randInitializeWeights(L_in, L_out):
    # random values in [-epsilon, epsilon] to break symmetry between units
    epsilon = 0.12
    return np.random.rand(L_out, L_in+1) * 2 * epsilon - epsilon

initial_theta1 = randInitializeWeights(input_layer_size, hidden_layer_size)
initial_theta2 = randInitializeWeights(hidden_layer_size, num_labels)

# unrolling parameters into a single column vector
nn_initial_params = np.hstack((initial_theta1.ravel(order='F'), initial_theta2.ravel(order='F')))
2.3 Backpropagation
Backpropagation is not such a complicated algorithm once you get the hang of it.
I strongly urge you to watch Andrew Ng's videos on backprop multiple times.
In summary, we do the following by looping through every training example:
1. Run forward propagation to get the output activation a3.
2. Calculate the error term d3, obtained by subtracting the actual output from our calculated output a3.
3. For the hidden layer, the error term d2 can be calculated as shown in the sketch below.
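In symbols, with g' the sigmoid gradient and the bias weight dropped from Theta2:

\delta^{(2)} = \big(\Theta^{(2)}\big)^T \delta^{(3)} \odot g'(z^{(2)})

A sketch of the three steps for a single training example, assuming theta1 and theta2 hold the current weight matrices, X and y are the training images and labels (labels 1 to 10), and regularization of the gradients is omitted:

def sigmoidGradient(z):
    # derivative of the sigmoid: g'(z) = g(z) * (1 - g(z))
    return sigmoid(z) * (1 - sigmoid(z))

m = X.shape[0]
Delta1 = np.zeros_like(theta1)   # gradient accumulators
Delta2 = np.zeros_like(theta2)

for t in range(m):
    # 1. forward propagate example t
    a1 = np.hstack((1, X[t]))                # 401-vector with bias
    z2 = theta1 @ a1
    a2 = np.hstack((1, sigmoid(z2)))         # 26-vector with bias
    a3 = sigmoid(theta2 @ a2)                # 10-vector of output activations

    # 2. output error term: calculated output minus one-hot actual output
    y_vec = np.zeros(num_labels)
    y_vec[int(y.flat[t]) - 1] = 1
    d3 = a3 - y_vec

    # 3. hidden-layer error term (bias weight column dropped from theta2)
    d2 = (theta2[:, 1:].T @ d3) * sigmoidGradient(z2)

    # accumulate the gradients
    Delta1 += np.outer(d2, a1)
    Delta2 += np.outer(d3, a2)

theta1_grad = Delta1 / m
theta2_grad = Delta2 / m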
2.4 Gradient checking
Why do we need gradient checking? To make sure that our backprop algorithm has no bugs
in it and works as intended. We can approximate the partial derivative of the cost function with
respect to each parameter using the central-difference formula:
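\frac{\partial}{\partial\theta_i} J(\theta) \approx \frac{J(\theta + \epsilon\, e_i) - J(\theta - \epsilon\, e_i)}{2\epsilon}

Here e_i is the i-th unit vector and \epsilon = 10^{-4} (the myeps used in the code below).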
The gradients computed using backprop and the numerical approximation should agree to at
least 4 significant digits to make sure that our backprop implementation is bug-free.
def checkGradient(nn_initial_params, nn_backprop_Params, input_layer_size,
                  hidden_layer_size, num_labels, myX, myy, mylambda=0.):
    myeps = 0.0001
    flattened = nn_initial_params
    flattenedDs = nn_backprop_Params
    n_elems = len(flattened)
    # Pick ten random elements, compute numerical gradient, compare to respective D's
    for i in range(10):
        x = int(np.random.rand() * n_elems)
        epsvec = np.zeros((n_elems, 1))
        epsvec[x] = myeps
        cost_high = nnCostFunc(flattened + epsvec.flatten(), input_layer_size, hidden_layer_size,
                               num_labels, myX, myy, mylambda)
        cost_low = nnCostFunc(flattened - epsvec.flatten(), input_layer_size, hidden_layer_size,
                              num_labels, myX, myy, mylambda)
        mygrad = (cost_high - cost_low) / float(2 * myeps)
        print("Element: {0}. Numerical Gradient = {1:.9f}. BackProp Gradient = {2:.9f}.".format(
            x, mygrad, flattenedDs[x]))
2.5 Learning parameters using fmincg
After you have successfully implemented the neural network cost function and gradient
computation, the next step is to use fmincg to learn a good set of parameters for the neural
network. theta_opt contains the unrolled parameters we have just learnt, which we roll back
into theta1_opt and theta2_opt.
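A sketch of this step, with scipy's fmin_cg standing in for Octave's fmincg. Here nnGradFunc is a hypothetical name for a function that wraps the backprop loop above and returns the unrolled gradient vector, sharing nnCostFunc's signature; X and y are the training images and labels:

from scipy import optimize

# nnGradFunc (hypothetical) returns the unrolled gradient for the given parameter vector
theta_opt = optimize.fmin_cg(f=nnCostFunc, x0=nn_initial_params, fprime=nnGradFunc,
                             args=(input_layer_size, hidden_layer_size, num_labels, X, y, 1.),
                             maxiter=50)   # regularization strength of 1 assumed

# roll the learnt vector back into the two weight matrices
theta1_opt = theta_opt[:hidden_layer_size * (input_layer_size + 1)].reshape(
    (hidden_layer_size, input_layer_size + 1), order='F')
theta2_opt = theta_opt[hidden_layer_size * (input_layer_size + 1):].reshape(
    (num_labels, hidden_layer_size + 1), order='F')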