
Chapter 5

Working with hidden layers


Chapter outline
ANN with 1 hidden layer with 2 neurons
Forward pass
Forward pass math calculations
Backward pass
  Output layer weights
  Hidden layer weights
Backward pass math calculations
  Output layer weights gradients
  Hidden layer weights gradients
Updating weights
Python™ implementation
  Forward pass
  Backward pass
  Complete code
Conclusion

ABSTRACT
The latest neural network Python implementation built in Chapter 4 supports working
with any number of inputs but without hidden layers. This chapter extends the implemen-
tation to work with a single hidden layer with just 2 hidden neurons. In later chapters,
more hidden layers and neurons will be supported.

ANN with 1 hidden layer with 2 neurons


Similar to the strategy used for the input layer, which started with a fixed
number of inputs before being generalized to work with any number of inputs,
this chapter builds a hidden layer with just 2 neurons.
This section discusses the network architecture shown in Fig. 5.1. The net-
work has 3 inputs, 1 hidden layer with 2 neurons, and 1 output neuron.
Each of the 3 inputs is connected to the 2 hidden neurons. Thus, there are
2(3) = 6 connections. For each of the 6 connections, there is a different weight.
The weights between the input and hidden layers are labeled as Wzy where z
refers to the input layer neuron index and y refers to the index of the hidden
neuron. Note that the weights between the layers with indices K-1 and K can be
directly called the weights of layer K.
The weight for the connection between the first input X1 and the first hid-
den neuron is W11. The weight W12 is for the connection between X1 and the
second hidden neuron. Regarding X2, the weights W21 and W22 are for the
connections to the first and second hidden neurons, respectively. Similarly, X3
has 2 weights W31 and W32.


FIG. 5.1 ANN architecture with 3 inputs, 1 output, and 1 hidden layer with 2 neurons.

In addition to the weights between the input and hidden layers, there are 2 weights
connecting the 2 hidden neurons to the output neuron which are W41 and W42.
How does the gradient descent algorithm work with these parameters? The
answer will be clear after discussing the theory of the forward and backward
passes. The next section discusses the theory of the forward pass.

Forward pass
In the forward pass, the neurons in the hidden layer accept the inputs from the
input layer in addition to their weights. For each hidden neuron, the sum of
products SOP between the inputs and their weights is calculated.
For the first hidden neuron, it accepts the 3 inputs X1, X2, and X3 in addition
to their weights W11, W21, and W31. The SOP for this neuron is calculated by
summing the products between each input and its weight. The SOP is calculated
in the next equation.
SOP1 = X1*W11 + X2*W21 + X3*W31

For reference, the SOP for the first hidden neuron is labeled SOP1 in
Fig. 5.1. For the second hidden neuron, its SOP, which is labeled SOP2, is cal-
culated in the next equation.
SOP2 = X1*W12 + X2*W22 + X3*W32

After calculating the SOP for the 2 hidden neurons, next is to feed the SOP of
each neuron to an activation function. Remember that the function used up to this
time is the sigmoid function which is calculated as given in the next equation.
sigmoid(SOP) = 1 / (1 + e^(-SOP))
By feeding SOP1 to the sigmoid function, the result is Activ1 as calcu-
lated by the next equation.
Activ1 = 1 / (1 + e^(-SOP1))

For the second hidden neuron, its activation function output is Activ2 as
calculated by the next equation:
Activ2 = 1 / (1 + e^(-SOP2))
Remember that in the forward pass, the outputs of a layer are regarded as the in-
puts to the next layer. That is, the outputs of the hidden layer, which are Activ1
and Activ2 as labeled in Fig. 5.1, are regarded as the inputs to the output layer.
The process repeats for calculating the SOP in the output layer neuron. Each
input to the output neuron has a weight. For the first input Activ1, its weight is
W41. For the second input Activ2, its weight is W42. The SOP for the output
neuron is labeled SOP3 and calculated as follows:
SOP3 = Activ1*W41 + Activ2*W42

SOP3 is fed to the sigmoid function to return Activ3 as given in the next
equation.
Predicted = Activ3 = 1 / (1 + e^(-SOP3))
Note that the output of the activation function Activ3 is regarded as the
predicted output of the network. After the network makes its prediction,
next is to calculate the error using the squared error function.
error = (Predicted - Target)^2

For a better understanding, the next section discusses an example to go
through the math calculations behind the forward pass.

Forward pass math calculations


According to the architecture in Fig. 5.1, there are 3 inputs (X1, X2, and
X3) and 1 output Y. The values of the 3 inputs and the output of a single sample
are listed in Table 5.1.
Regarding the network weights, Table 5.2 lists the weights for the first neu-
ron in the hidden layer.
The weights for the second hidden neuron are listed in Table 5.3.
The final weights are the ones connected to the output neuron which are
given in Table 5.4.

TABLE 5.1 Sample input for architecture in Fig. 5.1.


X1 X2 X3 Y
0.1 0.4 4.1 0.2

TABLE 5.2 Initial weights for the first neuron in the hidden layer.
W11 W21 W31
0.481 0.299 0.192

TABLE 5.3 Initial weights for the second neuron in the hidden layer.
W12 W22 W32
0.329 0.548 0.214

TABLE 5.4 Initial weights for the output neuron.


W41 W42
0.882 0.567

For the first neuron in the hidden layer, the next equation calculates its SOP
(SOP1). The result is SOP1 = 0.9549.
SOP1 = X1*W11 + X2*W21 + X3*W31 = 0.1*0.481 + 0.4*0.299 + 4.1*0.192 = 0.9549
The next equation calculates the SOP for the second hidden neuron which
is SOP2 = 1.1295.

SOP2 = X1*W12 + X2*W22 + X3*W32 = 0.1*0.329 + 0.4*0.548 + 4.1*0.214 = 1.1295

After feeding SOP1 and SOP2 to the sigmoid function, the result is calcu-
lated according to the next equations.
Activ1 = 1 / (1 + e^(-SOP1)) = 1 / (1 + e^(-0.9549)) = 0.722

Activ2 = 1 / (1 + e^(-SOP2)) = 1 / (1 + e^(-1.1295)) = 0.756
The outputs of the hidden layer, Activ1 = 0.722 and Activ2 = 0.756,
are regarded as the inputs to the next layer, which is the output layer. As a result,
the values of the output layer neuron’s inputs are 0.722 and 0.756. The next
equation calculates the SOP for this neuron which is SOP3 = 1.066.

SOP3 = Activ1*W41 + Activ2*W42 = 0.722*0.882 + 0.756*0.567 = 1.066



SOP3 is fed to the sigmoid function to return the predicted output as calcu-
lated in the next equation. The predicted output is 0.744.
Predicted = Activ3 = 1 / (1 + e^(-SOP3)) = 1 / (1 + e^(-1.066)) = 0.744
After the predicted output is calculated, next is to calculate the prediction
error according to the next equation which results in an error equal to 0.296.

error = (Predicted - Target)^2 = (0.744 - 0.2)^2 = 0.296
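As a quick check on these numbers, the following short Python snippet (not part of the chapter's listing; the variable names simply mirror the text) reproduces the forward pass for the sample in Table 5.1:

```python
import numpy

x = numpy.array([0.1, 0.4, 4.1])            # inputs X1, X2, X3 (Table 5.1)
target = 0.2                                # desired output Y

w1_3 = numpy.array([0.481, 0.299, 0.192])   # W11, W21, W31 (Table 5.2)
w2_3 = numpy.array([0.329, 0.548, 0.214])   # W12, W22, W32 (Table 5.3)
w3_2 = numpy.array([0.882, 0.567])          # W41, W42 (Table 5.4)

def sigmoid(sop):
    return 1.0 / (1.0 + numpy.exp(-sop))

sop1 = numpy.sum(w1_3 * x)                  # 0.9549
sop2 = numpy.sum(w2_3 * x)                  # 1.1295
sig1, sig2 = sigmoid(sop1), sigmoid(sop2)   # 0.722, 0.756
sop3 = w3_2[0] * sig1 + w3_2[1] * sig2      # 1.066
predicted = sigmoid(sop3)                   # 0.744
error = (predicted - target) ** 2           # 0.296
print(sop1, sop2, sig1, sig2, sop3, predicted, error)
```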

Calculating the prediction error of the network signals the end of the for-
ward pass. The next section discusses the theory of the backward pass.

Backward pass
In the backward pass, the goal is to calculate the gradients that update the net-
work weights. Because we start from where we ended in the forward pass, the
gradients of the last layer are calculated first, and then we move backward
until reaching the input layer. Let's start by calculating the gradients of the
weights between the hidden layer and the output layer.

Output layer weights


Because there is no explicit equation relating both the error and the output lay-
er’s weights (W41 and W42), it is preferred to use the chain rule. What are the
derivatives in the chain from the error to the output layer’s weights?
Starting with the first weight, we need to find the derivative of the error with
respect to W41. The error function is used for this purpose.

error = (Predicted - Target)^2

The error function has 2 terms which are:


1. Predicted
2. Target
Because the error function does not include the weight W41 as an explicit term,
one of these terms should lead to that weight. Of the 2 terms in the error
function, which one leads to the weight W41? It is Predicted because the
other term Target is constant.
The first derivative to calculate is the derivative of the error with respect
to the predicted output as calculated in the next equation.
dError/dPredicted = 2 * (Predicted - Target)

Next is to calculate the derivative of Predicted to SOP3 by substituting SOP3
in the derivative of the sigmoid function, as given in the next equation.
dPredicted/dSOP3 = (1 / (1 + e^(-SOP3))) * (1 - 1 / (1 + e^(-SOP3)))
The next derivative is the SOP3 to W41 derivative. To follow up, here is the
equation that relates both SOP3 and W41.

SOP3 = Activ1*W41 + Activ2*W42

The derivative of SOP3 to W41 is given in the next equation.


dSOP3/dW41 = Activ1
By calculating all derivatives in the chain from the error to W41, the W41
gradient is calculated by multiplying all of these derivatives as given in the next
equation.
dError/dW41 = dError/dPredicted * dPredicted/dSOP3 * dSOP3/dW41

Similar to calculating the error to W41 derivative, the error to W42 de-
rivative is easily calculated. The only term that changes from the previous equa-
tion is the last one. Rather than calculating the SOP3 to W41 derivative, now
the SOP3 to W42 derivative is calculated which is given in the next equation.
dSOP3/dW42 = Activ2
Finally, the error to W42 gradient is calculated according to the next equation.
dError/dW42 = dError/dPredicted * dPredicted/dSOP3 * dSOP3/dW42

At this point, the gradients for all weights between the hidden layer and the
output layer are successfully calculated. Next is to calculate the gradients for the
weights between the input layer and the hidden layer.

Hidden layer weights


The generic chain of derivatives from the error to any of the weights of the hid-
den layer is given in the next equation where Wzy means the weight connecting
the input neuron with index z with the hidden neuron indexed y.
dError/dWzy = dError/dPredicted * dPredicted/dSOP3 * dSOP3/dActivy * dActivy/dSOPy * dSOPy/dWzy

Of the derivatives in the chain, the first 2 derivatives are the first 2 ones used
in the previous chain which are:
1. Error to Predicted derivative.
2. Predicted to SOP3 derivative.
The next derivative in the chain is the derivative of SOP3 with respect to
Activ1 and Activ2. The derivative of SOP3 to Activ1 helps to calculate the
gradients of the weights connected to the first hidden neuron which are W11, W21,
and W31. The derivative of SOP3 to Activ2 helps to calculate the gradients of
the weights connected to the second hidden neuron which are W12, W22, and W32.
Starting with Activ1, here is the equation relating SOP3 to Activ1.
SOP3 = Activ1*W41 + Activ2*W42
The SOP3 to Activ1 derivative is calculated as given in the next equation.
dSOP3/dActiv1 = W41
Similarly, the SOP3 to Activ2 derivative is calculated as given in the next
equation.
dSOP3/dActiv2 = W42
After calculating the derivatives of SOP3 to both Activ1 and Activ2, the
next derivatives in the chain to be calculated are:
1. The derivative of Activ1 to SOP1.
2. The derivative of Activ2 to SOP2.
The derivative of Activ1 to SOP1 is calculated by substituting SOP1
in the sigmoid function's derivative, as given in the next equation. The resulting
derivative will be used for updating the weights of the first hidden neuron, which
are W11, W21, and W31.
dActiv1/dSOP1 = (1 / (1 + e^(-SOP1))) * (1 - 1 / (1 + e^(-SOP1)))
Similarly, the Activ2 to SOP2 derivative is calculated according to the
next equation. This will be used for updating the weights of the second hidden
neuron which are W12, W22, and W32.
dActiv2/dSOP2 = (1 / (1 + e^(-SOP2))) * (1 - 1 / (1 + e^(-SOP2)))
In order to update the first hidden neuron’s weights W11, W21, and W31,
the last derivative to calculate is the derivative of SOP1 to each of these
weights. Here is the equation relating SOP1 to all of these weights.
SOP1 = X1*W11 + X2*W21 + X3*W31

The derivatives of SOP1 to all of these 3 weights are given in the next
equations.
dSOP1/dW11 = X1

dSOP1/dW21 = X2

dSOP1/dW31 = X3
Here is the equation relating SOP2 to the second hidden neuron’s weights
W12, W22, and W32.
SOP2 = X1*W12 + X2*W22 + X3*W32

The derivatives of SOP2 to W12, W22, and W32 are given in the next equations.
dSOP2/dW12 = X1

dSOP2/dW22 = X2

dSOP2/dW32 = X3
After calculating all derivatives in the chains from the error to all hidden
weights, next is to multiply them for calculating the gradient of each weight.
For the weights connected to the first hidden neuron (W11, W21, and W31),
their gradients are calculated using the chains given in the next equations. Note
that all of these chains share all derivatives except for the last one.
dError/dW11 = dError/dPredicted * dPredicted/dSOP3 * dSOP3/dActiv1 * dActiv1/dSOP1 * dSOP1/dW11

dError/dW21 = dError/dPredicted * dPredicted/dSOP3 * dSOP3/dActiv1 * dActiv1/dSOP1 * dSOP1/dW21

dError/dW31 = dError/dPredicted * dPredicted/dSOP3 * dSOP3/dActiv1 * dActiv1/dSOP1 * dSOP1/dW31
For the weights connected to the second hidden neuron (W12, W22, and
W32), their gradients are calculated using the chains given in the next equations.
Note that all of these chains share all derivatives except for the last derivative.

dError/dW12 = dError/dPredicted * dPredicted/dSOP3 * dSOP3/dActiv2 * dActiv2/dSOP2 * dSOP2/dW12

dError/dW22 = dError/dPredicted * dPredicted/dSOP3 * dSOP3/dActiv2 * dActiv2/dSOP2 * dSOP2/dW22

dError/dW32 = dError/dPredicted * dPredicted/dSOP3 * dSOP3/dActiv2 * dActiv2/dSOP2 * dSOP2/dW32
At this time, the chains for calculating the gradients for all weights in the
entire network are successfully prepared. The next equations summarize these
chains.
dError/dW41 = dError/dPredicted * dPredicted/dSOP3 * dSOP3/dW41

dError/dW42 = dError/dPredicted * dPredicted/dSOP3 * dSOP3/dW42

dError/dW11 = dError/dPredicted * dPredicted/dSOP3 * dSOP3/dActiv1 * dActiv1/dSOP1 * dSOP1/dW11

dError/dW21 = dError/dPredicted * dPredicted/dSOP3 * dSOP3/dActiv1 * dActiv1/dSOP1 * dSOP1/dW21

dError/dW31 = dError/dPredicted * dPredicted/dSOP3 * dSOP3/dActiv1 * dActiv1/dSOP1 * dSOP1/dW31

dError/dW12 = dError/dPredicted * dPredicted/dSOP3 * dSOP3/dActiv2 * dActiv2/dSOP2 * dSOP2/dW12

dError/dW22 = dError/dPredicted * dPredicted/dSOP3 * dSOP3/dActiv2 * dActiv2/dSOP2 * dSOP2/dW22

dError/dW32 = dError/dPredicted * dPredicted/dSOP3 * dSOP3/dActiv2 * dActiv2/dSOP2 * dSOP2/dW32
After calculating all gradients, next is to update the weights according to the
next equation.
Wnew = Wold - learning_rate * grad
By discussing the steps of calculating the gradients and updating the weights,
the next section continues the math example started previously to do the back-
ward pass calculations.

Backward pass math calculations


In this section, the values for all derivatives are calculated followed by calculat-
ing the weights’ gradients. Of all derivatives in the chains, the first 2 derivatives
are shared across all the chains.
Given the values of the predicted and target outputs, the first derivative in all
chains is calculated in the next equation.
dError/dPredicted = 2 * (Predicted - Target) = 2 * (0.744 - 0.2) = 1.088
The second derivative in all chains is between Predicted and SOP3
which is calculated according to the next equation.
dPredicted/dSOP3 = (1 / (1 + e^(-SOP3))) * (1 - 1 / (1 + e^(-SOP3))) = (1 / (1 + e^(-1.066))) * (1 - 1 / (1 + e^(-1.066))) = 0.191
Besides the first 2 derivatives, the others change for some chains. The next
subsection calculates the derivative for the output layer. The subsequent subsec-
tion works on the derivatives of the hidden layer.

Output layer weights gradients


For calculating the gradients of the 2 output layer's weights W41 and W42, there
are 2 remaining derivatives in the chain which are:
1. The derivative of SOP3 to W41.
2. The derivative of SOP3 to W42.
These 2 derivatives are calculated in the next equations.
dSOP3/dW41 = Activ1 = 0.722

dSOP3/dW42 = Activ2 = 0.756
Once all derivatives in the chain connecting the error to the 2 output layer’s
weights W41 and W42 are prepared, the gradients can be calculated as in the
next equations. The gradients are 0.15 and 0.157.
dError/dW41 = dError/dPredicted * dPredicted/dSOP3 * dSOP3/dW41 = 1.088 * 0.191 * 0.722 = 0.15

dError/dW42 = dError/dPredicted * dPredicted/dSOP3 * dSOP3/dW42 = 1.088 * 0.191 * 0.756 = 0.157
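These two gradients can also be checked in Python by continuing the snippet from the forward pass math section (only a sketch; it reuses the variables defined there, and the printed values match 0.15 and 0.157 up to rounding):

```python
g1 = 2 * (predicted - target)               # dError/dPredicted = 1.088
g2 = sigmoid(sop3) * (1 - sigmoid(sop3))    # dPredicted/dSOP3 = 0.191
g3 = numpy.array([sig1, sig2])              # dSOP3/dW41 and dSOP3/dW42
grad_hidden_output = g1 * g2 * g3           # approximately [0.15, 0.157]
print(grad_hidden_output)
```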
After the gradients for W41 and W42 are calculated, the next section works
on calculating the gradients of the hidden neurons.

Hidden layer weights gradients


According to the chains of derivatives of the hidden neurons, the next 2 deriva-
tives to be calculated are:
1. The derivative of SOP3 to Activ1.
2. The derivative of SOP3 to Activ2.
These 2 derivatives are calculated in the next equations.
dSOP3/dActiv1 = W41 = 0.882

dSOP3/dActiv2 = W42 = 0.567
The next 2 derivatives are:
1. The derivative of Activ1 to SOP1.
2. The derivative of Activ2 to SOP2.
These derivatives are calculated according to the next equations.
dActiv1/dSOP1 = (1 / (1 + e^(-0.9549))) * (1 - 1 / (1 + e^(-0.9549))) = 0.2

dActiv2/dSOP2 = (1 / (1 + e^(-1.1295))) * (1 - 1 / (1 + e^(-1.1295))) = 0.185
Before calculating the gradient for the weights of the first hidden neuron,
there are 3 derivatives to be calculated which are:
1. The derivative of SOP1 to W11.
2. The derivative of SOP1 to W21.
3. The derivative of SOP1 to W31.
Their calculations are given in the next equations.
dSOP1/dW11 = X1 = 0.1

dSOP1/dW21 = X2 = 0.4

dSOP1/dW31 = X3 = 4.1

By multiplying the derivatives in the chain from the error to each of the 3 weights
of the first hidden neuron (W11, W21, and W31), their gradients are calculated ac-
cording to the next equations. The gradients are 0.004, 0.015, and 0.15.
dError/dW11 = dError/dPredicted * dPredicted/dSOP3 * dSOP3/dActiv1 * dActiv1/dSOP1 * dSOP1/dW11 = 1.088 * 0.191 * 0.882 * 0.2 * 0.1 = 0.004

dError/dW21 = dError/dPredicted * dPredicted/dSOP3 * dSOP3/dActiv1 * dActiv1/dSOP1 * dSOP1/dW21 = 1.088 * 0.191 * 0.882 * 0.2 * 0.4 = 0.015

dError/dW31 = dError/dPredicted * dPredicted/dSOP3 * dSOP3/dActiv1 * dActiv1/dSOP1 * dSOP1/dW31 = 1.088 * 0.191 * 0.882 * 0.2 * 4.1 = 0.15

For the 3 weights of the second hidden neuron (W12, W22, and W32), there
are 3 remaining derivatives to be calculated which are:
1. The derivative of SOP2 to W12.
2. The derivative of SOP2 to W22.
3. The derivative of SOP2 to W32.
These derivatives are calculated according to the next equations.
dSOP2/dW12 = X1 = 0.1

dSOP2/dW22 = X2 = 0.4

dSOP2/dW32 = X3 = 4.1
By multiplying the derivatives in the chain from the error to each of the 3 weights
of the second hidden neuron (W12, W22, and W32), their gradients are calculated
according to the next equations. The gradients are 0.002, 0.009, and 0.089.
dError/dW12 = dError/dPredicted * dPredicted/dSOP3 * dSOP3/dActiv2 * dActiv2/dSOP2 * dSOP2/dW12 = 1.088 * 0.191 * 0.567 * 0.185 * 0.1 = 0.002

dError/dW22 = dError/dPredicted * dPredicted/dSOP3 * dSOP3/dActiv2 * dActiv2/dSOP2 * dSOP2/dW22 = 1.088 * 0.191 * 0.567 * 0.185 * 0.4 = 0.009

dError/dW32 = dError/dPredicted * dPredicted/dSOP3 * dSOP3/dActiv2 * dActiv2/dSOP2 * dSOP2/dW32 = 1.088 * 0.191 * 0.567 * 0.185 * 4.1 = 0.089
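The same check can be extended to the 6 hidden-layer gradients, again reusing the variables from the previous snippets (a sketch only; values match the text up to rounding):

```python
# First hidden neuron (W11, W21, W31)
g3_1 = w3_2[0]                               # dSOP3/dActiv1 = 0.882
g4_1 = sigmoid(sop1) * (1 - sigmoid(sop1))   # dActiv1/dSOP1 = 0.2
g5 = x                                       # dSOP1/dW11, dSOP1/dW21, dSOP1/dW31
grad_hidden1 = g1 * g2 * g3_1 * g4_1 * g5    # approximately [0.004, 0.015, 0.15]

# Second hidden neuron (W12, W22, W32)
g3_2 = w3_2[1]                               # dSOP3/dActiv2 = 0.567
g4_2 = sigmoid(sop2) * (1 - sigmoid(sop2))   # dActiv2/dSOP2 = 0.185
grad_hidden2 = g1 * g2 * g3_2 * g4_2 * g5    # approximately [0.002, 0.009, 0.089]
print(grad_hidden1, grad_hidden2)
```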
By calculating the gradients for all weights in the network, the next subsec-
tion updates the weights.

Updating weights
After calculating the gradients for all weights in the network, the next equations
update all network weights, assuming that the learning_rate is 0.001.
W11new = W11old - learning_rate * dError/dW11 = 0.481 - 0.001 * 0.004 = 0.480996

W21new = W21old - learning_rate * dError/dW21 = 0.299 - 0.001 * 0.015 = 0.298985

W31new = W31old - learning_rate * dError/dW31 = 0.192 - 0.001 * 0.15 = 0.19185

W12new = W12old - learning_rate * dError/dW12 = 0.329 - 0.001 * 0.002 = 0.328998

W22new = W22old - learning_rate * dError/dW22 = 0.548 - 0.001 * 0.009 = 0.547991

W32new = W32old - learning_rate * dError/dW32 = 0.214 - 0.001 * 0.089 = 0.213911

W41new = W41old - learning_rate * dError/dW41 = 0.882 - 0.001 * 0.15 = 0.88185

W42new = W42old - learning_rate * dError/dW42 = 0.567 - 0.001 * 0.157 = 0.566843
At this time, the network weights have been updated for only 1 iteration. The for-
ward and backward pass calculations could be repeated for a number of itera-
tions until the desired output is reached.
If the calculations are repeated only once, the error is reduced from
0.296 to 0.29543095; that is, the error reduction is only 0.000569049.
Note that setting the learning rate to a value higher than 0.001 may help to
increase the speed of error reduction.

After understanding the theory behind how the ANN architecture of this
chapter works in both the forward and backward passes, the next section starts
its Python implementation. Note that the implementation is highly dependent on
the implementations developed previously in Chapters 3 and 4. Hence, it is very
important to have a solid understanding of how the previous implementations
work before building over them.

Python™ implementation
The complete code that implements an ANN with 3 inputs, 1 hidden layer with
2 neurons, and 1 output neuron, and optimizes it using the gradient descent
algorithm, is built piece by piece in the next subsections and listed in full in the
Complete code subsection.

At first, the inputs and the output are prepared using these 2 lines.
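The original listing is not reproduced in this copy, so here is a minimal sketch of what those 2 lines likely look like, using the sample from Table 5.1 (the numpy import is added so the snippet stands on its own):

```python
import numpy

x = numpy.array([0.1, 0.4, 4.1])   # the 3 inputs X1, X2, X3
target = numpy.array([0.2])        # the desired output Y
```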

The network weights are prepared according to the next lines, which define
the following 3 variables:
1. w1_3: An array holding the 3 weights connecting the 3 inputs to the first
hidden neuron (W11, W21, and W31).
2. w2_3: An array holding the 3 weights connecting the 3 inputs to the second
hidden neuron (W12, W22, and W32).
3. w3_2: An array with 2 weights for the connections between the hidden layer
neurons and the output neuron (W41 and W42).
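A sketch of this initialization is given below; numpy.random.rand() is assumed here, though any small random values would do. The w3_2_old copy and the learning_rate value are added because they are used later in the backward pass, as described in the following subsections.

```python
# Weights of the first hidden neuron (W11, W21, W31)
w1_3 = numpy.random.rand(3)
# Weights of the second hidden neuron (W12, W22, W32)
w2_3 = numpy.random.rand(3)
# Weights of the output neuron (W41, W42)
w3_2 = numpy.random.rand(2)

w3_2_old = w3_2.copy()   # previous output-layer weights, used in the backward pass
learning_rate = 0.001
```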

After preparing the inputs and the weights, the next section works through
the forward pass.

Forward pass
The code of the forward pass is listed in the next block. It starts by calculating
the sum of products for the 2 hidden neurons and saving them into the variables
sop1 and sop2.
These 2 variables are passed to the sigmoid() function and the results are
saved in the variables sig1 and sig2. These 2 variables are multiplied by the
2 weights connected to the output neuron to return sop3.
sop3 is also applied as input to the sigmoid() function to return the
predicted output. Finally, the error is calculated.
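A sketch of the forward pass code described above is shown next; it continues from the inputs and weights prepared earlier, and the sigmoid() helper is assumed to be defined as in the previous chapters.

```python
def sigmoid(sop):
    return 1.0 / (1.0 + numpy.exp(-sop))

# Sum of products for the 2 hidden neurons
sop1 = numpy.sum(w1_3 * x)
sop2 = numpy.sum(w2_3 * x)

# Hidden layer outputs
sig1 = sigmoid(sop1)
sig2 = sigmoid(sop2)

# Output neuron
sop3 = w3_2[0] * sig1 + w3_2[1] * sig2
predicted = sigmoid(sop3)

# Squared error
err = (predicted - target) ** 2
```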

After the forward pass is complete, next is to go through the backward pass.

Backward pass
The part of the code responsible for updating the weights between the hidden
and output layer is given in the next code.
The derivative of the error to the predicted output is calculated and saved in the
variable g1. g2 holds the predicted output to SOP3 derivative. The derivatives of
SOP3 to both W41 and W42 are calculated and saved in the vector g3. Note that
g1 and g2 will be used while calculating the gradients of the hidden neurons.
After calculating all derivatives required to calculate the gradients for the
weights W41 and W42, the gradients are calculated and saved in the grad_
hidden_output vector. Finally, these 2 weights are updated using the
update_w() function by passing the old weights, gradients, and learning rate.
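A sketch matching this description follows; the update_w() helper is assumed here to apply the rule Wnew = Wold - learning_rate * grad from the earlier chapters.

```python
def update_w(w, grad, learning_rate):
    return w - learning_rate * grad

g1 = 2 * (predicted - target)                # dError/dPredicted
g2 = sigmoid(sop3) * (1.0 - sigmoid(sop3))   # dPredicted/dSOP3
g3 = numpy.array([sig1, sig2])               # dSOP3/dW41 and dSOP3/dW42

grad_hidden_output = g1 * g2 * g3
w3_2 = update_w(w3_2, grad_hidden_output, learning_rate)
```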

After updating the weights between the hidden and output layers, next is to
work on the weights between the input and hidden layers.
The next code updates the weights connected to the first hidden neuron. g3
represents the SOP3 to Activ1 derivative. Because this derivative is calculated
using the old weights' values, the old weights are saved into the w3_2_old vari-
able to be used in this step. g4 represents the Activ1 to SOP1 derivative.
Finally, g5 represents the SOP1 to weights (W11, W21, and W31) derivatives.

Based on the derivatives saved in g3, g4, and g5, the gradients of the first
hidden neuron’s weights are calculated by multiplying the variables g1 to g5.
Based on the calculated gradients, the weights are updated.
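A sketch of this step, continuing the previous block and following the variable names used in the text:

```python
g3 = w3_2_old[0]                             # dSOP3/dActiv1 (old value of W41)
g4 = sigmoid(sop1) * (1.0 - sigmoid(sop1))   # dActiv1/dSOP1
g5 = x                                       # dSOP1/dW11, dSOP1/dW21, dSOP1/dW31

grad_hidden1_input = g1 * g2 * g3 * g4 * g5
w1_3 = update_w(w1_3, grad_hidden1_input, learning_rate)
```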
Similar to the 3 weights connected to the first hidden neuron, the other 3 weights
connected to the second hidden neuron are updated according to the next code.
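A corresponding sketch for the second hidden neuron is given below; it ends with the w3_2_old assignment mentioned next.

```python
g3 = w3_2_old[1]                             # dSOP3/dActiv2 (old value of W42)
g4 = sigmoid(sop2) * (1.0 - sigmoid(sop2))   # dActiv2/dSOP2
g5 = x                                       # dSOP2/dW12, dSOP2/dW22, dSOP2/dW32

grad_hidden2_input = g1 * g2 * g3 * g4 * g5
w2_3 = update_w(w2_3, grad_hidden2_input, learning_rate)

w3_2_old = w3_2.copy()   # keep the updated output weights for the next iteration
```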

At the end of the code, the w3_2_old variable is set equal to w3_2.

By reaching this step, the entire code for implementing the neural network
in Fig. 5.1 is complete. The next subsection lists the code that trains the network
in a number of iterations.

Complete code
The previously discussed code just trains the network for a single iteration. The
next code uses a loop for going through a number of iterations in which the
weights are updated.
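Since the original listing is not reproduced in this copy, the following self-contained sketch assembles the pieces discussed above into one script. The error() helper, the 80,000-iteration count (matching the range shown in Figs. 5.2 and 5.3), and the history lists used for plotting are assumptions, not the author's exact code.

```python
import numpy

def sigmoid(sop):
    return 1.0 / (1.0 + numpy.exp(-sop))

def error(predicted, target):
    return numpy.power(predicted - target, 2)

def update_w(w, grad, learning_rate):
    return w - learning_rate * grad

x = numpy.array([0.1, 0.4, 4.1])
target = numpy.array([0.2])
learning_rate = 0.001

w1_3 = numpy.random.rand(3)   # weights of the first hidden neuron
w2_3 = numpy.random.rand(3)   # weights of the second hidden neuron
w3_2 = numpy.random.rand(2)   # weights of the output neuron
w3_2_old = w3_2.copy()

predicted_history = []
error_history = []

for k in range(80000):
    # Forward pass
    sop1 = numpy.sum(w1_3 * x)
    sop2 = numpy.sum(w2_3 * x)
    sig1 = sigmoid(sop1)
    sig2 = sigmoid(sop2)
    sop3 = w3_2[0] * sig1 + w3_2[1] * sig2
    predicted = sigmoid(sop3)
    err = error(predicted, target)

    predicted_history.append(predicted)
    error_history.append(err)

    # Backward pass: output layer weights (W41, W42)
    g1 = 2 * (predicted - target)
    g2 = sigmoid(sop3) * (1.0 - sigmoid(sop3))
    g3 = numpy.array([sig1, sig2])
    grad_hidden_output = g1 * g2 * g3
    w3_2 = update_w(w3_2, grad_hidden_output, learning_rate)

    # Backward pass: first hidden neuron weights (W11, W21, W31)
    g3 = w3_2_old[0]
    g4 = sigmoid(sop1) * (1.0 - sigmoid(sop1))
    g5 = x
    grad_hidden1_input = g1 * g2 * g3 * g4 * g5
    w1_3 = update_w(w1_3, grad_hidden1_input, learning_rate)

    # Backward pass: second hidden neuron weights (W12, W22, W32)
    g3 = w3_2_old[1]
    g4 = sigmoid(sop2) * (1.0 - sigmoid(sop2))
    g5 = x
    grad_hidden2_input = g1 * g2 * g3 * g4 * g5
    w2_3 = update_w(w2_3, grad_hidden2_input, learning_rate)

    w3_2_old = w3_2.copy()

print("Predicted:", predicted, "Error:", err)
```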

After the iterations complete, Fig. 5.2 shows how the predicted output
changes for each iteration. The network is able to reach the desired output
(0.2) successfully.
Fig. 5.3 shows how the error changes for each iteration.

FIG. 5.2 Network prediction vs. iteration for the ANN architecture in Fig. 5.1.

FIG. 5.3 Network error vs. iteration for the ANN architecture in Fig. 5.1.

Conclusion
Continuing the implementation of the ANN started in Chapters 3 and 4, this
chapter implemented an ANN with a hidden layer that has just 2 hidden neu-
rons. This chapter discussed the theory of how an ANN with 3 inputs, 1 hid-
den layer with 2 hidden neurons, and 1 output neuron works. Based on a numerical
example, all steps in the forward and backward passes were covered. Finally, the
Python implementation was discussed.
In Chapter 6, the implementation will be extended to use any number of hid-
den neurons within a single hidden layer.
