Chapter 5
Working with hidden layers
ABSTRACT
The latest neural network Python implementation built in Chapter 4 supports working
with any number of inputs but without hidden layers. This chapter extends the implemen-
tation to work with a single hidden layer with just 2 hidden neurons. In later chapters,
more hidden layers and neurons will be supported.
FIG. 5.1 ANN architecture with 3 inputs, 1 output, and 1 hidden layer with 2 neurons.
In addition to the weights between the input and hidden layers, there are 2 weights, W41 and W42, connecting the 2 hidden neurons to the output neuron.
How does the gradient descent algorithm work with these parameters? The
answer will be clear after discussing the theory of the forward and backward
passes. The next section discusses the theory of the forward pass.
Forward pass
In the forward pass, the neurons in the hidden layer accept the inputs from the
input layer in addition to their weights. For each hidden neuron, the sum of
products SOP between the inputs and their weights is calculated.
For the first hidden neuron, it accepts the 3 inputs X1, X2, and X3 in addition
to their weights W11, W21, and W31. The SOP for this neuron is calculated by
summing the products between each input and its weight. The SOP is calculated
in the next equation.
SOP1 = X1*W11 + X2*W21 + X3*W31
For reference, the SOP for the first hidden neuron is labeled SOP1 in
Fig. 5.1. For the second hidden neuron, its SOP, which is labeled SOP2, is cal-
culated in the next equation.
SOP2 = X1*W12 + X2*W22 + X3*W32
After calculating the SOP for the 2 hidden neurons, next is to feed the SOP of
each neuron to an activation function. Remember that the function used up to this
time is the sigmoid function which is calculated as given in the next equation.
sigmoid(SOP) = 1/(1 + e^(-SOP))
By feeding SOP1 to the sigmoid function, the result is Activ1 as calcu-
lated by the next equation.
Activ1 = 1/(1 + e^(-SOP1))
For the second hidden neuron, its activation function output is Activ2 as calculated by the next equation:
Activ2 = 1/(1 + e^(-SOP2))
Remember that in the forward pass, the outputs of a layer are regarded as the inputs to the next layer. That is, the outputs of the hidden layer, which are Activ1 and Activ2 as labeled in Fig. 5.1, are regarded as the inputs to the output layer.
The process repeats for calculating the SOP in the output layer neuron. Each
input to the output neuron has a weight. For the first input Activ1, its weight is
W41. For the second input Activ2, its weight is W42. The SOP for the output
neuron is labeled SOP3 and calculated as follows:
SOP3 = Activ1*W41 + Activ2*W42
SOP3 is fed to the sigmoid function to return Activ3 as given in the next
equation.
Predicted = Activ3 = 1/(1 + e^(-SOP3))
Note that the output of the activation function Activ3 is regarded as the
predicted output of the network. After the network makes its prediction,
next is to calculate the error using the squared error function.
error = (Predicted - Target)^2
To make the calculations concrete, consider a numerical example in which the network accepts the inputs X1 = 0.1, X2 = 0.4, and X3 = 4.1, and the desired (target) output is 0.2. Tables 5.2 and 5.3 list the initial weights of the 2 hidden neurons.
TABLE 5.2 Initial weights for the first neuron in the hidden layer.
W11 = 0.481, W21 = 0.299, W31 = 0.192
TABLE 5.3 Initial weights for the second neuron in the hidden layer.
W12 = 0.329, W22 = 0.548, W32 = 0.214
For the first neuron in the hidden layer, the next equation calculates its SOP (SOP1). The result is SOP1 = 0.9549.
SOP1 = X1*W11 + X2*W21 + X3*W31 = 0.1*0.481 + 0.4*0.299 + 4.1*0.192 = 0.9549
The next equation calculates the SOP for the second hidden neuron, which is SOP2 = 1.1295.
SOP2 = X1*W12 + X2*W22 + X3*W32 = 0.1*0.329 + 0.4*0.548 + 4.1*0.214 = 1.1295
After feeding SOP1 and SOP2 to the sigmoid function, the result is calcu-
lated according to the next equations.
Activ1 = 1/(1 + e^(-SOP1)) = 1/(1 + e^(-0.9549)) = 0.722
Activ2 = 1/(1 + e^(-SOP2)) = 1/(1 + e^(-1.1295)) = 0.756
The outputs of the hidden layer, Activ1 = 0.722 and Activ2 = 0.756, are regarded as the inputs to the next layer, which is the output layer. As a result, the values of the output layer neuron's inputs are 0.722 and 0.756. Using the initial output-layer weights W41 = 0.882 and W42 = 0.567, the next equation calculates the SOP for this neuron, which is SOP3 = 1.066.
SOP3 = Activ1*W41 + Activ2*W42 = 0.722*0.882 + 0.756*0.567 = 1.066
SOP3 is fed to the sigmoid function to return the predicted output as calcu-
lated in the next equation. The predicted output is 0.744.
Predicted = Activ3 = 1/(1 + e^(-SOP3)) = 1/(1 + e^(-1.066)) = 0.744
After the predicted output is calculated, next is to calculate the prediction error according to the next equation, which results in an error equal to 0.296.
error = (Predicted - Target)^2 = (0.744 - 0.2)^2 = 0.296
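The forward-pass arithmetic above can be checked with a short Python sketch. The variable names here (w_hidden1, w_hidden2, w41, w42) are chosen only for readability and are not the names used by the chapter's implementation.

import numpy as np

def sigmoid(sop):
    # Sigmoid activation: 1 / (1 + e^-sop)
    return 1.0 / (1.0 + np.exp(-sop))

x = np.array([0.1, 0.4, 4.1])                 # inputs X1, X2, X3
target = 0.2                                  # desired output
w_hidden1 = np.array([0.481, 0.299, 0.192])   # W11, W21, W31
w_hidden2 = np.array([0.329, 0.548, 0.214])   # W12, W22, W32
w41, w42 = 0.882, 0.567                       # output-layer weights

sop1 = np.sum(x * w_hidden1)                  # 0.9549
sop2 = np.sum(x * w_hidden2)                  # 1.1295
activ1 = sigmoid(sop1)                        # ~0.722
activ2 = sigmoid(sop2)                        # ~0.756
sop3 = activ1 * w41 + activ2 * w42            # ~1.066
predicted = sigmoid(sop3)                     # ~0.744
error = (predicted - target) ** 2             # ~0.296
print(sop1, sop2, activ1, activ2, sop3, predicted, error)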
Calculating the prediction error of the network signals the end of the for-
ward pass. The next section discusses the theory of the backward pass.
Backward pass
In the backward pass, the goal is to calculate the gradients that update the network weights. Because the backward pass starts from where the forward pass ended, the gradients of the last layer are calculated first, and the calculation then moves backward until reaching the input layer. Let's start by calculating the gradients of the weights between the hidden layer and the output layer.
The chain of derivatives relating the error to W41 is:
dError/dW41 = dError/dPredicted * dPredicted/dSOP3 * dSOP3/dW41
where dError/dPredicted = 2*(Predicted - Target), dPredicted/dSOP3 is the sigmoid derivative Predicted*(1 - Predicted), and dSOP3/dW41 = Activ1.
Similar to calculating the error to W41 derivative, the error to W42 derivative is easily calculated. The only term that changes from the previous chain is the last one. Rather than calculating the SOP3 to W41 derivative, now the SOP3 to W42 derivative is calculated, which is given in the next equation.
dSOP3/dW42 = Activ2
Finally, the error to W42 gradient is calculated according to the next equation.
dError/dW42 = dError/dPredicted * dPredicted/dSOP3 * dSOP3/dW42
At this point, the gradients for all weights between the hidden layer and the
output layer are successfully calculated. Next is to calculate the gradients for the
weights between the input layer and the hidden layer.
Of the derivatives in this chain, the first 2 are the same as the first 2 derivatives used in the previous chain, which are:
1. Error to Predicted derivative.
2. Predicted to SOP3 derivative.
The next derivative in the chain is the derivative of SOP3 with respect to
Activ1 and Activ2. The derivative of SOP3 to Activ1 helps to calculate the
gradients of the weights connected to the first hidden neuron which are W11, W21,
and W31. The derivative of SOP3 to Activ2 helps to calculate the gradients of
the weights connected to the second hidden neuron which are W12, W22, and W32.
Starting with Activ1, here is the equation relating SOP3 to Activ1.
SOP3 = Activ1*W41 + Activ2*W42
The SOP3 to Activ1 derivative is calculated as given in the next equation.
dSOP3/dActiv1 = W41
Similarly, the SOP3 to Activ2 derivative is calculated as given in the next
equation.
dSOP3/dActiv2 = W42
After calculating the derivatives of SOP3 to both Activ1 and Activ2, the
next derivatives in the chain to be calculated are:
1. The derivative of Activ1 to SOP1.
2. The derivative of Activ2 to SOP2.
The derivative of Activ1 to SOP1 is calculated by substituting SOP1 into the sigmoid function's derivative, as given in the next equation. The resulting derivative will be used for updating the weights of the first hidden neuron, which are W11, W21, and W31.
dActiv1/dSOP1 = (1/(1 + e^(-SOP1))) * (1 - 1/(1 + e^(-SOP1)))
Similarly, the Activ2 to SOP2 derivative is calculated according to the
next equation. This will be used for updating the weights of the second hidden
neuron which are W12, W22, and W32.
dActiv2/dSOP2 = (1/(1 + e^(-SOP2))) * (1 - 1/(1 + e^(-SOP2)))
In order to update the first hidden neuron's weights W11, W21, and W31, the last derivatives to calculate are the derivatives of SOP1 with respect to each of these weights. Here is the equation relating SOP1 to these weights.
SOP1 = X1*W11 + X2*W21 + X3*W31
The derivatives of SOP1 to all of these 3 weights are given in the next
equations.
dSOP1/dW11 = X1
dSOP1/dW21 = X2
dSOP1/dW31 = X3
Here is the equation relating SOP2 to the second hidden neuron’s weights
W12, W22, and W32.
SOP2 = X1*W12 + X2*W22 + X3*W32
The derivatives of SOP2 to W12, W22, and W32 are given in the next equations.
dSOP2/dW12 = X1
dSOP2/dW22 = X2
dSOP2/dW32 = X3
After calculating all derivatives in the chains from the error to all hidden weights, the next step is to multiply them to calculate the gradient of each weight.
For the weights connected to the first hidden neuron (W11, W21, and W31),
their gradients are calculated using the chains given in the next equations. Note
that all of these chains share all derivatives except for the last one.
dError/dW11 = dError/dPredicted * dPredicted/dSOP3 * dSOP3/dActiv1 * dActiv1/dSOP1 * dSOP1/dW11
dError/dW21 = dError/dPredicted * dPredicted/dSOP3 * dSOP3/dActiv1 * dActiv1/dSOP1 * dSOP1/dW21
dError/dW31 = dError/dPredicted * dPredicted/dSOP3 * dSOP3/dActiv1 * dActiv1/dSOP1 * dSOP1/dW31
For the weights connected to the second hidden neuron (W12, W22, and W32), the chains are built in the same way, except that the derivatives of SOP3 to Activ2, Activ2 to SOP2, and SOP2 to each weight are used instead.
Let's now substitute the values from the forward pass to calculate these gradients numerically, starting with the weights between the hidden and output layers. The first derivatives in the chain are the error to Predicted derivative and the Predicted to SOP3 derivative, followed by the derivatives of SOP3 to the 2 output-layer weights.
dError/dPredicted = 2*(Predicted - Target) = 2*(0.744 - 0.2) = 1.088
dPredicted/dSOP3 = Predicted*(1 - Predicted) = 0.744*(1 - 0.744) = 0.191
dSOP3/dW41 = Activ1 = 0.722
dSOP3/dW42 = Activ2 = 0.756
Once all derivatives in the chains connecting the error to the 2 output layer's weights W41 and W42 are prepared, the gradients can be calculated as in the next equations. The gradients are 0.15 and 0.157.
dError/dW41 = dError/dPredicted * dPredicted/dSOP3 * dSOP3/dW41 = 1.088 * 0.191 * 0.722 = 0.15
dError/dW42 = dError/dPredicted * dPredicted/dSOP3 * dSOP3/dW42 = 1.088 * 0.191 * 0.756 = 0.157
Moving to the weights between the input and hidden layers, the next derivatives to calculate are the derivatives of SOP3 to Activ1 and Activ2, which are equal to the output-layer weights W41 and W42.
dSOP3/dActiv1 = W41 = 0.882
dSOP3/dActiv2 = W42 = 0.567
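As a quick check of these numbers, the following sketch recomputes the output-layer gradients from the forward-pass values; the variable names are illustrative only, and the printed results match the rounded values in the text only approximately.

# Values taken from the worked example
predicted, target = 0.744, 0.2
activ1, activ2 = 0.722, 0.756

g_error_predicted = 2 * (predicted - target)               # dError/dPredicted ~ 1.088
g_predicted_sop3 = predicted * (1 - predicted)             # dPredicted/dSOP3 ~ 0.191
grad_w41 = g_error_predicted * g_predicted_sop3 * activ1   # ~0.15
grad_w42 = g_error_predicted * g_predicted_sop3 * activ2   # ~0.157
print(grad_w41, grad_w42)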
The next 2 derivatives are:
1. The derivative of Activ1 to SOP1.
2. The derivative of Activ2 to SOP2.
These derivatives are calculated according to the next equations.
dActiv1/dSOP1 = (1/(1 + e^(-SOP1))) * (1 - 1/(1 + e^(-SOP1))) = (1/(1 + e^(-0.9549))) * (1 - 1/(1 + e^(-0.9549))) = 0.2
dActiv2/dSOP2 = (1/(1 + e^(-SOP2))) * (1 - 1/(1 + e^(-SOP2))) = (1/(1 + e^(-1.1295))) * (1 - 1/(1 + e^(-1.1295))) = 0.185
Before calculating the gradients for the weights of the first hidden neuron, there are 3 derivatives to be calculated, which are:
1. The derivative of SOP1 to W11.
2. The derivative of SOP1 to W21.
3. The derivative of SOP1 to W31.
Their calculations are given in the next equations.
dSOP1/dW11 = X1 = 0.1
dSOP1/dW21 = X2 = 0.4
dSOP1/dW31 = X3 = 4.1
By multiplying the derivatives in the chain from the error to each of the 3 weights
of the first hidden neuron (W11, W21, and W31), their gradients are calculated ac-
cording to the next equations. The gradients are 0.004, 0.015, and 0.15.
dError/dW11 = dError/dPredicted * dPredicted/dSOP3 * dSOP3/dActiv1 * dActiv1/dSOP1 * dSOP1/dW11 = 1.088 * 0.191 * 0.882 * 0.2 * 0.1 = 0.004
dError/dW21 = 1.088 * 0.191 * 0.882 * 0.2 * 0.4 = 0.015
dError/dW31 = 1.088 * 0.191 * 0.882 * 0.2 * 4.1 = 0.15
For the 3 weights of the second hidden neuron (W12, W22, and W32), there
are 3 remaining derivatives to be calculated which are:
1. The derivative of SOP2 to W12.
2. The derivative of SOP2 to W22.
3. The derivative of SOP2 to W32.
These derivatives are calculated according to the next equations.
dSOP2/dW12 = X1 = 0.1
dSOP2/dW22 = X2 = 0.4
dSOP2/dW32 = X3 = 4.1
By multiplying the derivatives in the chain from the error to each of the 3 weights
of the second hidden neuron (W12, W22, and W32), their gradients are calculated
according to the next equations. The gradients are 0.002, 0.009, and 0.089.
dError/dW12 = dError/dPredicted * dPredicted/dSOP3 * dSOP3/dActiv2 * dActiv2/dSOP2 * dSOP2/dW12 = 1.088 * 0.191 * 0.567 * 0.185 * 0.1 = 0.002
dError/dW22 = 1.088 * 0.191 * 0.567 * 0.185 * 0.4 = 0.009
dError/dW32 = 1.088 * 0.191 * 0.567 * 0.185 * 4.1 = 0.089
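The same check can be done for all 6 hidden-layer gradients at once. The sketch below uses the rounded values from the worked example, so the printed gradients match the text only approximately.

# Rounded values from the worked example
x = [0.1, 0.4, 4.1]                      # X1, X2, X3
w41_old, w42_old = 0.882, 0.567          # dSOP3/dActiv1 and dSOP3/dActiv2
activ1, activ2 = 0.722, 0.756
common = 1.088 * 0.191                   # dError/dPredicted * dPredicted/dSOP3

d_activ1_sop1 = activ1 * (1 - activ1)    # ~0.2
d_activ2_sop2 = activ2 * (1 - activ2)    # ~0.185

grads_neuron1 = [common * w41_old * d_activ1_sop1 * xi for xi in x]  # ~[0.004, 0.015, 0.15]
grads_neuron2 = [common * w42_old * d_activ2_sop2 * xi for xi in x]  # ~[0.002, 0.009, 0.089]
print(grads_neuron1, grads_neuron2)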
Updating weights
After calculating the gradients for all weights in the network, the next equations update all network weights, assuming that the learning_rate is 0.001.
W11new = W11old - learning_rate * dError/dW11 = 0.481 - 0.001*0.004 = 0.480996
W21new = W21old - learning_rate * dError/dW21 = 0.299 - 0.001*0.015 = 0.298985
W31new = W31old - learning_rate * dError/dW31 = 0.192 - 0.001*0.15 = 0.19185
W12new = W12old - learning_rate * dError/dW12 = 0.329 - 0.001*0.002 = 0.328998
W22new = W22old - learning_rate * dError/dW22 = 0.548 - 0.001*0.009 = 0.547991
W32new = W32old - learning_rate * dError/dW32 = 0.214 - 0.001*0.089 = 0.213911
W41new = W41old - learning_rate * dError/dW41 = 0.882 - 0.001*0.15 = 0.88185
W42new = W42old - learning_rate * dError/dW42 = 0.567 - 0.001*0.157 = 0.566843
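The same updates can be computed at once by stacking the old weights and their gradients into arrays. This is only a sketch of the arithmetic, not the layout used by the chapter's code.

import numpy as np

learning_rate = 0.001
# W11, W21, W31, W12, W22, W32, W41, W42
old_weights = np.array([0.481, 0.299, 0.192, 0.329, 0.548, 0.214, 0.882, 0.567])
gradients   = np.array([0.004, 0.015, 0.150, 0.002, 0.009, 0.089, 0.150, 0.157])

new_weights = old_weights - learning_rate * gradients
print(new_weights)
# [0.480996 0.298985 0.19185  0.328998 0.547991 0.213911 0.88185  0.566843]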
At this point, the network weights have been updated for only 1 iteration. The forward and backward pass calculations can be repeated for a number of iterations until the desired output is reached.
If the calculations are repeated just once more, the error is reduced from 0.296 to 0.29543095, an error reduction of only 0.000569049.
Note that setting the learning rate to a value higher than 0.001 may help increase the speed of error reduction.
After understanding the theory behind how the ANN architecture of this
chapter works in both the forward and backward passes, the next section starts
its Python implementation. Note that the implementation is highly dependent on
the implementations developed previously in Chapters 3 and 4. Hence, it is very
important to have a solid understanding of how the previous implementations
work before building on them.
Python™ implementation
The complete code that implements an ANN with 3 inputs, 1 hidden layer with 2 neurons, and 1 output neuron, and that optimizes it using the gradient descent algorithm, is listed below.
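The listing below is a minimal sketch of such an implementation, consistent with the walkthrough in the following subsections. The variable names w1_3, w2_3, w3_2, w3_2_old, sop1, sop2, sop3, sig1, sig2, g1 to g5, grad_hidden_output, and the helpers sigmoid() and update_w() follow the walkthrough; the other helper names (error(), error_predicted_deriv(), sigmoid_sop_deriv(), sop_w_deriv()) and the x and target variable names are assumed here for illustration. The initial weight values are taken from the numerical example; they could equally be initialized randomly.

import numpy as np

def sigmoid(sop):
    # Sigmoid activation: 1 / (1 + e^-sop)
    return 1.0 / (1.0 + np.exp(-sop))

def error(predicted, target):
    # Squared error between the predicted and target outputs
    return np.power(predicted - target, 2)

def error_predicted_deriv(predicted, target):
    # Derivative of the error with respect to the predicted output
    return 2 * (predicted - target)

def sigmoid_sop_deriv(sop):
    # Derivative of the sigmoid function with respect to its input (SOP)
    return sigmoid(sop) * (1.0 - sigmoid(sop))

def sop_w_deriv(x):
    # Derivative of a sum of products with respect to a weight is the input it multiplies
    return x

def update_w(w, grad, learning_rate):
    # Gradient descent update rule
    return w - learning_rate * grad

# Inputs and desired output
x = np.array([0.1, 0.4, 4.1])
target = np.array([0.2])
learning_rate = 0.001

# Network weights
w1_3 = np.array([0.481, 0.299, 0.192])   # W11, W21, W31 (first hidden neuron)
w2_3 = np.array([0.329, 0.548, 0.214])   # W12, W22, W32 (second hidden neuron)
w3_2 = np.array([0.882, 0.567])          # W41, W42 (output neuron)
w3_2_old = w3_2

# Forward pass
sop1 = np.sum(w1_3 * x)
sop2 = np.sum(w2_3 * x)
sig1 = sigmoid(sop1)
sig2 = sigmoid(sop2)
sop3 = np.sum(w3_2 * np.array([sig1, sig2]))
predicted = sigmoid(sop3)
err = error(predicted, target)

# Backward pass: weights between the hidden and output layers
g1 = error_predicted_deriv(predicted, target)
g2 = sigmoid_sop_deriv(sop3)
g3 = np.array([sop_w_deriv(sig1), sop_w_deriv(sig2)])
grad_hidden_output = g3 * g2 * g1
w3_2 = update_w(w3_2, grad_hidden_output, learning_rate)

# Backward pass: weights of the first hidden neuron
g3 = sop_w_deriv(w3_2_old[0])
g4 = sigmoid_sop_deriv(sop1)
g5 = sop_w_deriv(x)
grad_hidden1_input = g5 * g4 * g3 * g2 * g1
w1_3 = update_w(w1_3, grad_hidden1_input, learning_rate)

# Backward pass: weights of the second hidden neuron
g3 = sop_w_deriv(w3_2_old[1])
g4 = sigmoid_sop_deriv(sop2)
g5 = sop_w_deriv(x)
grad_hidden2_input = g5 * g4 * g3 * g2 * g1
w2_3 = update_w(w2_3, grad_hidden2_input, learning_rate)

w3_2_old = w3_2
print(predicted, err)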
At first, the inputs and the output are prepared using these 2 lines.
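These lines presumably resemble the following sketch; the names x and target are assumptions, and the values come from the numerical example.

import numpy as np

x = np.array([0.1, 0.4, 4.1])    # the 3 inputs X1, X2, X3
target = np.array([0.2])         # the desired output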
The network weights are prepared according to the next lines, which define the following 3 variables:
1. w1_3: An array holding the 3 weights connecting the 3 inputs to the first
hidden neuron (W11, W21, and W31).
2. w2_3: An array holding the 3 weights connecting the 3 inputs to the second
hidden neuron (W12, W22, and W32).
3. w3_2: An array with 2 weights for the connections between the hidden layer
neurons and the output neuron (W41 and W42).
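A sketch of these lines, using the initial weights of the numerical example (the original code may instead initialize them randomly):

import numpy as np

w1_3 = np.array([0.481, 0.299, 0.192])   # W11, W21, W31
w2_3 = np.array([0.329, 0.548, 0.214])   # W12, W22, W32
w3_2 = np.array([0.882, 0.567])          # W41, W42
w3_2_old = w3_2                          # copy used later in the backward pass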
After preparing the inputs and the weights, the next section works through
the forward pass.
Forward pass
The code of the forward pass is listed in the next block. It starts by calculating
the sum of products for the 2 hidden neurons and saving them into the variables
sop1 and sop2.
These 2 variables are passed to the sigmoid() function and the results are
saved in the variables sig1 and sig2. These 2 variables are multiplied by the
2 weights connected to the output neuron to return sop3.
sop3 is also applied as input to the sigmoid() function to return the
predicted output. Finally, the error is calculated.
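A sketch of this block, continuing from the complete listing above (it assumes x, target, the weight arrays, and the sigmoid() and error() helpers defined there):

# Forward pass
sop1 = np.sum(w1_3 * x)                        # SOP of the first hidden neuron
sop2 = np.sum(w2_3 * x)                        # SOP of the second hidden neuron

sig1 = sigmoid(sop1)                           # Activ1
sig2 = sigmoid(sop2)                           # Activ2

sop3 = np.sum(w3_2 * np.array([sig1, sig2]))   # SOP of the output neuron
predicted = sigmoid(sop3)                      # predicted output
err = error(predicted, target)                 # squared error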
After the forward pass is complete, next is to go through the backward pass.
Backward pass
The part of the code responsible for updating the weights between the hidden and output layers is given in the next code.
The derivative of the error to the predicted output is calculated and saved in the
variable g1. g2 holds the predicted output to SOP3 derivative. The derivatives of
SOP3 to both W41 and W42 are calculated and saved in the vector g3. Note that
g1 and g2 will be used while calculating the gradients of the hidden neurons.
After calculating all derivatives required to calculate the gradients for the weights W41 and W42, the gradients are calculated and saved in the grad_hidden_output vector. Finally, these 2 weights are updated using the update_w() function by passing the old weights, gradients, and learning rate.
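A sketch of this part, again continuing from the listing above (error_predicted_deriv(), sigmoid_sop_deriv(), and sop_w_deriv() are the assumed helper names for the individual derivatives):

# Backward pass: weights between the hidden and output layers
g1 = error_predicted_deriv(predicted, target)          # dError/dPredicted
g2 = sigmoid_sop_deriv(sop3)                           # dPredicted/dSOP3
g3 = np.array([sop_w_deriv(sig1), sop_w_deriv(sig2)])  # dSOP3/dW41 and dSOP3/dW42

grad_hidden_output = g3 * g2 * g1                      # gradients of W41 and W42
w3_2 = update_w(w3_2, grad_hidden_output, learning_rate)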
After updating the weights between the hidden and output layers, next is to
work on the weights between the input and hidden layers.
The next code updates the weights connected to the first hidden neuron. g3 represents the SOP3 to Activ1 derivative. Because this derivative is calculated using the old weights' values, the old weights are saved into the w3_2_old variable so that they can be used in this step. g4 represents the Activ1 to SOP1 derivative. Finally, g5 represents the derivatives of SOP1 to the weights (W11, W21, and W31).
Based on the derivatives saved in g3, g4, and g5, the gradients of the first
hidden neuron’s weights are calculated by multiplying the variables g1 to g5.
Based on the calculated gradients, the weights are updated.
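A sketch of this step (note that w3_2_old holds the output-layer weights saved before the update above):

# Backward pass: weights of the first hidden neuron (W11, W21, W31)
g3 = sop_w_deriv(w3_2_old[0])      # dSOP3/dActiv1 = old W41
g4 = sigmoid_sop_deriv(sop1)       # dActiv1/dSOP1
g5 = sop_w_deriv(x)                # dSOP1/dW11, dSOP1/dW21, dSOP1/dW31

grad_hidden1_input = g5 * g4 * g3 * g2 * g1
w1_3 = update_w(w1_3, grad_hidden1_input, learning_rate)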
Similar to the 3 weights connected to the first hidden neuron, the other 3 weights
connected to the second hidden neuron are updated according to the next code.
At the end of the code, the w3_2_old variable is set equal to w3_2.
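A corresponding sketch for the second hidden neuron, ending with the update of w3_2_old:

# Backward pass: weights of the second hidden neuron (W12, W22, W32)
g3 = sop_w_deriv(w3_2_old[1])      # dSOP3/dActiv2 = old W42
g4 = sigmoid_sop_deriv(sop2)       # dActiv2/dSOP2
g5 = sop_w_deriv(x)                # dSOP2/dW12, dSOP2/dW22, dSOP2/dW32

grad_hidden2_input = g5 * g4 * g3 * g2 * g1
w2_3 = update_w(w2_3, grad_hidden2_input, learning_rate)

w3_2_old = w3_2                    # keep the updated output-layer weights for the next iteration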
By reaching this step, the entire code for implementing the neural network
in Fig. 5.1 is complete. The next subsection lists the code that trains the network
in a number of iterations.
Complete code
The previously discussed code trains the network for just a single iteration. The next code uses a loop to go through a number of iterations in which the weights are updated.
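Below is a sketch of how the single-iteration code could be wrapped in a training loop. It assumes the imports, helper functions, inputs, and initial weights from the listing earlier in this chapter; the iteration count of 80,000 is an assumption and simply needs to be large enough for the prediction to approach the target.

predictions, errors = [], []

for iteration in range(80000):
    # Forward pass
    sop1 = np.sum(w1_3 * x)
    sop2 = np.sum(w2_3 * x)
    sig1 = sigmoid(sop1)
    sig2 = sigmoid(sop2)
    sop3 = np.sum(w3_2 * np.array([sig1, sig2]))
    predicted = sigmoid(sop3)
    predictions.append(predicted)
    errors.append(error(predicted, target))

    # Backward pass: output-layer weights
    g1 = error_predicted_deriv(predicted, target)
    g2 = sigmoid_sop_deriv(sop3)
    g3 = np.array([sop_w_deriv(sig1), sop_w_deriv(sig2)])
    w3_2 = update_w(w3_2, g3 * g2 * g1, learning_rate)

    # Backward pass: first hidden neuron's weights
    g3 = sop_w_deriv(w3_2_old[0])
    g4 = sigmoid_sop_deriv(sop1)
    g5 = sop_w_deriv(x)
    w1_3 = update_w(w1_3, g5 * g4 * g3 * g2 * g1, learning_rate)

    # Backward pass: second hidden neuron's weights
    g3 = sop_w_deriv(w3_2_old[1])
    g4 = sigmoid_sop_deriv(sop2)
    g5 = sop_w_deriv(x)
    w2_3 = update_w(w2_3, g5 * g4 * g3 * g2 * g1, learning_rate)

    w3_2_old = w3_2

print(predictions[-1], errors[-1])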
After the iterations complete, Fig. 5.2 shows how the predicted output
changes for each iteration. The network is able to reach the desired output
(0.2) successfully.
Fig. 5.3 shows how the error changes for each iteration.
FIG. 5.2 The network prediction at each iteration.
FIG. 5.3 The prediction error at each iteration.
Conclusion
Continuing the implementation of the ANN started in Chapters 3 and 4, this
chapter implemented an ANN with a hidden layer that has just 2 hidden neu-
rons. This chapter discussed the theory of how an ANN with 3 inputs, 1 hid-
den layer with 2 hidden neurons, and 1 output neuron works. Based on a numerical example, all steps in the forward and backward passes were covered. Finally, the Python implementation was discussed.
In Chapter 6, the implementation will be extended to use any number of hid-
den neurons within a single hidden layer.