Chapter 5
Working with hidden layers
ABSTRACT
The latest neural network Python implementation built in Chapter 4 supports working
with any number of inputs but without hidden layers. This chapter extends the implemen-
tation to work with a single hidden layer with just 2 hidden neurons. In later chapters,
more hidden layers and neurons will be supported.
FIG. 5.1 ANN architecture with 3 inputs, 1 output, and 1 hidden layer with 2 neurons.
In addition to the weights between the input and hidden layers, there are 2 weights, W41 and W42, connecting the 2 hidden neurons to the output neuron.
How does the gradient descent algorithm work with these parameters? The
answer will be clear after discussing the theory of the forward and backward
passes. The next section discusses the theory of the forward pass.
Forward pass
In the forward pass, the neurons in the hidden layer accept the inputs from the
input layer in addition to their weights. For each hidden neuron, the sum of
products SOP between the inputs and their weights is calculated.
For the first hidden neuron, it accepts the 3 inputs X1, X2, and X3 in addition
to their weights W11, W21, and W31. The SOP for this neuron is calculated by
summing the products between each input and its weight. The SOP is calculated
in the next equation.
SOP1 = X1*W11 + X2*W21 + X3*W31
For reference, the SOP for the first hidden neuron is labeled SOP1 in
Fig. 5.1. For the second hidden neuron, its SOP, which is labeled SOP2, is cal-
culated in the next equation.
SOP2 = X1*W12 + X2*W22 + X3*W32
After calculating the SOP for the 2 hidden neurons, next is to feed the SOP of
each neuron to an activation function. Remember that the function used up to this
time is the sigmoid function which is calculated as given in the next equation.
sigmoid(SOP) = 1/(1 + e^(-SOP))
By feeding SOP1 to the sigmoid function, the result is Activ1 as calcu-
lated by the next equation.
Activ1 = 1/(1 + e^(-SOP1))
For the second hidden neuron, its activation function output is Activ2 as calculated by the next equation:
Activ2 = 1/(1 + e^(-SOP2))
Remember that in the forward pass, the outputs of a layer are regarded as the inputs to the next layer. That is, the outputs of the hidden layer, which are Activ1 and Activ2 as labeled in Fig. 5.1, are regarded as the inputs to the output layer.
The process repeats for calculating the SOP in the output layer neuron. Each
input to the output neuron has a weight. For the first input Activ1, its weight is
W41. For the second input Activ2, its weight is W42. The SOP for the output
neuron is labeled SOP3 and calculated as follows:
SOP3 = Activ1*W41 + Activ2*W42
SOP3 is fed to the sigmoid function to return Activ3 as given in the next
equation.
Predicted = Activ3 = 1/(1 + e^(-SOP3))
Note that the output of the activation function Activ3 is regarded as the
predicted output of the network. After the network makes its prediction,
next is to calculate the error using the squared error function.
error = (Predicted - Target)^2
To make the calculations concrete, consider a numerical example in which the network accepts the inputs X1 = 0.1, X2 = 0.4, and X3 = 4.1, and the desired (target) output is 0.2. Tables 5.2 and 5.3 list the initial weights of the 2 hidden neurons.
TABLE 5.2 Initial weights for the first neuron in the hidden layer.
W11 = 0.481, W21 = 0.299, W31 = 0.192
TABLE 5.3 Initial weights for the second neuron in the hidden layer.
W12 = 0.329, W22 = 0.548, W32 = 0.214
For the first neuron in the hidden layer, the next equation calculates its SOP (SOP1). The result is SOP1 = 0.9549.
SOP1 = X1*W11 + X2*W21 + X3*W31 = 0.1*0.481 + 0.4*0.299 + 4.1*0.192 = 0.9549
The next equation calculates the SOP for the second hidden neuron, which is SOP2 = 1.1295.
SOP2 = X1*W12 + X2*W22 + X3*W32 = 0.1*0.329 + 0.4*0.548 + 4.1*0.214 = 1.1295
After feeding SOP1 and SOP2 to the sigmoid function, the result is calcu-
lated according to the next equations.
Activ1 = 1/(1 + e^(-SOP1)) = 1/(1 + e^(-0.9549)) = 0.722
Activ2 = 1/(1 + e^(-SOP2)) = 1/(1 + e^(-1.1295)) = 0.756
The outputs of the hidden layer, Activ1 = 0.722 and Activ2 = 0.756, are regarded as the inputs to the next layer, which is the output layer. As a result, the values of the output layer neuron's inputs are 0.722 and 0.756. Using the initial output-layer weights W41 = 0.882 and W42 = 0.567, the next equation calculates the SOP for this neuron, which is SOP3 = 1.066.
SOP3 = Activ1*W41 + Activ2*W42 = 0.722*0.882 + 0.756*0.567 = 1.066
SOP3 is fed to the sigmoid function to return the predicted output as calcu-
lated in the next equation. The predicted output is 0.744.
Predicted = Activ3 = 1/(1 + e^(-SOP3)) = 1/(1 + e^(-1.066)) = 0.744
After the predicted output is calculated, next is to calculate the prediction error according to the next equation, which results in an error equal to 0.296.
error = (Predicted - Target)^2 = (0.744 - 0.2)^2 = 0.296
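The forward-pass arithmetic above can be checked with a short Python sketch. The variable names here (w_hidden1, w_hidden2, w41, w42) are chosen only for readability and are not the names used by the chapter's implementation.

import numpy as np

def sigmoid(sop):
    # Sigmoid activation: 1 / (1 + e^-sop)
    return 1.0 / (1.0 + np.exp(-sop))

x = np.array([0.1, 0.4, 4.1])                 # inputs X1, X2, X3
target = 0.2                                  # desired output
w_hidden1 = np.array([0.481, 0.299, 0.192])   # W11, W21, W31
w_hidden2 = np.array([0.329, 0.548, 0.214])   # W12, W22, W32
w41, w42 = 0.882, 0.567                       # output-layer weights

sop1 = np.sum(x * w_hidden1)                  # 0.9549
sop2 = np.sum(x * w_hidden2)                  # 1.1295
activ1 = sigmoid(sop1)                        # ~0.722
activ2 = sigmoid(sop2)                        # ~0.756
sop3 = activ1 * w41 + activ2 * w42            # ~1.066
predicted = sigmoid(sop3)                     # ~0.744
error = (predicted - target) ** 2             # ~0.296
print(sop1, sop2, activ1, activ2, sop3, predicted, error)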
Calculating the prediction error of the network signals the end of the for-
ward pass. The next section discusses the theory of the backward pass.
Backward pass
In the backward pass, the goal is to calculate the gradients that update the network weights. Because the backward pass starts from where the forward pass ended, the gradients of the last layer are calculated first, and the calculation then moves backward until reaching the input layer. Let's start by calculating the gradients of the weights between the hidden layer and the output layer.
The chain of derivatives relating the error to W41 is:
dError/dW41 = dError/dPredicted * dPredicted/dSOP3 * dSOP3/dW41
where dError/dPredicted = 2*(Predicted - Target), dPredicted/dSOP3 is the sigmoid derivative Predicted*(1 - Predicted), and dSOP3/dW41 = Activ1.
Similar to calculating the error to W41 derivative, the error to W42 derivative is easily calculated. The only term that changes from the previous chain is the last one. Rather than calculating the SOP3 to W41 derivative, now the SOP3 to W42 derivative is calculated, which is given in the next equation.
dSOP3/dW42 = Activ2
Finally, the error to W42 gradient is calculated according to the next equation.
dError/dW42 = dError/dPredicted * dPredicted/dSOP3 * dSOP3/dW42
At this point, the gradients for all weights between the hidden layer and the
output layer are successfully calculated. Next is to calculate the gradients for the
weights between the input layer and the hidden layer.
Of the derivatives in this chain, the first 2 are the same as the first 2 derivatives used in the previous chain, which are:
1. Error to Predicted derivative.
2. Predicted to SOP3 derivative.
The next derivative in the chain is the derivative of SOP3 with respect to
Activ1 and Activ2. The derivative of SOP3 to Activ1 helps to calculate the
gradients of the weights connected to the first hidden neuron which are W11, W21,
and W31. The derivative of SOP3 to Activ2 helps to calculate the gradients of
the weights connected to the second hidden neuron which are W12, W22, and W32.
Starting with Activ1, here is the equation relating SOP3 to Activ1.
SOP3 = Activ1*W41 + Activ2*W42
The SOP3 to Activ1 derivative is calculated as given in the next equation.
dSOP3/dActiv1 = W41
Similarly, the SOP3 to Activ2 derivative is calculated as given in the next
equation.
dSOP3/dActiv2 = W42
After calculating the derivatives of SOP3 to both Activ1 and Activ2, the
next derivatives in the chain to be calculated are:
1. The derivative of Activ1 to SOP1.
2. The derivative of Activ2 to SOP2.
The derivative of Activ1 to SOP1 is calculated by substituting SOP1 into the sigmoid function's derivative, as given in the next equation. The resulting derivative will be used for updating the weights of the first hidden neuron, which are W11, W21, and W31.
dActiv1/dSOP1 = (1/(1 + e^(-SOP1))) * (1 - 1/(1 + e^(-SOP1)))
Similarly, the Activ2 to SOP2 derivative is calculated according to the
next equation. This will be used for updating the weights of the second hidden
neuron which are W12, W22, and W32.
dActiv2/dSOP2 = (1/(1 + e^(-SOP2))) * (1 - 1/(1 + e^(-SOP2)))
In order to update the first hidden neuron's weights W11, W21, and W31, the last derivatives to calculate are the derivatives of SOP1 with respect to each of these weights. Here is the equation relating SOP1 to these weights.
SOP1 = X1*W11 + X2*W21 + X3*W31
The derivatives of SOP1 to all of these 3 weights are given in the next
equations.
dSOP1/dW11 = X1
dSOP1/dW21 = X2
dSOP1/dW31 = X3
Here is the equation relating SOP2 to the second hidden neuron’s weights
W12, W22, and W32.
SOP2 = X1*W12 + X2*W22 + X3*W32
The derivatives of SOP2 to W12, W22, and W32 are given in the next equations.
dSOP2/dW12 = X1
dSOP2/dW22 = X2
dSOP2/dW32 = X3
After calculating all derivatives in the chains from the error to all hidden weights, the next step is to multiply them to calculate the gradient of each weight.
For the weights connected to the first hidden neuron (W11, W21, and W31),
their gradients are calculated using the chains given in the next equations. Note
that all of these chains share all derivatives except for the last one.
dError/dW11 = dError/dPredicted * dPredicted/dSOP3 * dSOP3/dActiv1 * dActiv1/dSOP1 * dSOP1/dW11
dError/dW21 = dError/dPredicted * dPredicted/dSOP3 * dSOP3/dActiv1 * dActiv1/dSOP1 * dSOP1/dW21
dError/dW31 = dError/dPredicted * dPredicted/dSOP3 * dSOP3/dActiv1 * dActiv1/dSOP1 * dSOP1/dW31
For the weights connected to the second hidden neuron (W12, W22, and W32), the chains are built in the same way, except that the derivatives of SOP3 to Activ2, Activ2 to SOP2, and SOP2 to each weight are used instead.
Let's now substitute the values from the forward pass to calculate these gradients numerically, starting with the weights between the hidden and output layers. The first derivatives in the chain are the error to Predicted derivative and the Predicted to SOP3 derivative, followed by the derivatives of SOP3 to the 2 output-layer weights.
dError/dPredicted = 2*(Predicted - Target) = 2*(0.744 - 0.2) = 1.088
dPredicted/dSOP3 = Predicted*(1 - Predicted) = 0.744*(1 - 0.744) = 0.191
dSOP3/dW41 = Activ1 = 0.722
dSOP3/dW42 = Activ2 = 0.756
Once all derivatives in the chains connecting the error to the 2 output layer's weights W41 and W42 are prepared, the gradients can be calculated as in the next equations. The gradients are 0.15 and 0.157.
dError/dW41 = dError/dPredicted * dPredicted/dSOP3 * dSOP3/dW41 = 1.088 * 0.191 * 0.722 = 0.15
dError/dW42 = dError/dPredicted * dPredicted/dSOP3 * dSOP3/dW42 = 1.088 * 0.191 * 0.756 = 0.157
Moving to the weights between the input and hidden layers, the next derivatives to calculate are the derivatives of SOP3 to Activ1 and Activ2, which are equal to the output-layer weights W41 and W42.
dSOP3/dActiv1 = W41 = 0.882
dSOP3/dActiv2 = W42 = 0.567
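As a quick check of these numbers, the following sketch recomputes the output-layer gradients from the forward-pass values; the variable names are illustrative only, and the printed results match the rounded values in the text only approximately.

# Values taken from the worked example
predicted, target = 0.744, 0.2
activ1, activ2 = 0.722, 0.756

g_error_predicted = 2 * (predicted - target)               # dError/dPredicted ~ 1.088
g_predicted_sop3 = predicted * (1 - predicted)             # dPredicted/dSOP3 ~ 0.191
grad_w41 = g_error_predicted * g_predicted_sop3 * activ1   # ~0.15
grad_w42 = g_error_predicted * g_predicted_sop3 * activ2   # ~0.157
print(grad_w41, grad_w42)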
The next 2 derivatives are:
1. The derivative of Activ1 to SOP1.
2. The derivative of Activ2 to SOP2.
These derivatives are calculated according to the next equations.
dActiv1/dSOP1 = (1/(1 + e^(-SOP1))) * (1 - 1/(1 + e^(-SOP1))) = (1/(1 + e^(-0.9549))) * (1 - 1/(1 + e^(-0.9549))) = 0.2
dActiv2/dSOP2 = (1/(1 + e^(-SOP2))) * (1 - 1/(1 + e^(-SOP2))) = (1/(1 + e^(-1.1295))) * (1 - 1/(1 + e^(-1.1295))) = 0.185
Before calculating the gradients for the weights of the first hidden neuron, there are 3 derivatives to be calculated, which are:
1. The derivative of SOP1 to W11.
2. The derivative of SOP1 to W21.
3. The derivative of SOP1 to W31.
Their calculations are given in the next equations.
dSOP1/dW11 = X1 = 0.1
dSOP1/dW21 = X2 = 0.4
dSOP1/dW31 = X3 = 4.1
By multiplying the derivatives in the chain from the error to each of the 3 weights
of the first hidden neuron (W11, W21, and W31), their gradients are calculated ac-
cording to the next equations. The gradients are 0.004, 0.015, and 0.15.
dError/dW11 = dError/dPredicted * dPredicted/dSOP3 * dSOP3/dActiv1 * dActiv1/dSOP1 * dSOP1/dW11 = 1.088 * 0.191 * 0.882 * 0.2 * 0.1 = 0.004
dError/dW21 = 1.088 * 0.191 * 0.882 * 0.2 * 0.4 = 0.015
dError/dW31 = 1.088 * 0.191 * 0.882 * 0.2 * 4.1 = 0.15
For the 3 weights of the second hidden neuron (W12, W22, and W32), there
are 3 remaining derivatives to be calculated which are:
1. The derivative of SOP2 to W12.
2. The derivative of SOP2 to W22.
3. The derivative of SOP2 to W32.
These derivatives are calculated according to the next equations.
dSOP2/dW12 = X1 = 0.1
dSOP2/dW22 = X2 = 0.4
dSOP2/dW32 = X3 = 4.1
By multiplying the derivatives in the chain from the error to each of the 3 weights
of the second hidden neuron (W12, W22, and W32), their gradients are calculated
according to the next equations. The gradients are 0.002, 0.009, and 0.089.
dError/dW12 = dError/dPredicted * dPredicted/dSOP3 * dSOP3/dActiv2 * dActiv2/dSOP2 * dSOP2/dW12 = 1.088 * 0.191 * 0.567 * 0.185 * 0.1 = 0.002
dError/dW22 = 1.088 * 0.191 * 0.567 * 0.185 * 0.4 = 0.009
dError/dW32 = 1.088 * 0.191 * 0.567 * 0.185 * 4.1 = 0.089
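The same check can be done for all 6 hidden-layer gradients at once. The sketch below uses the rounded values from the worked example, so the printed gradients match the text only approximately.

# Rounded values from the worked example
x = [0.1, 0.4, 4.1]                      # X1, X2, X3
w41_old, w42_old = 0.882, 0.567          # dSOP3/dActiv1 and dSOP3/dActiv2
activ1, activ2 = 0.722, 0.756
common = 1.088 * 0.191                   # dError/dPredicted * dPredicted/dSOP3

d_activ1_sop1 = activ1 * (1 - activ1)    # ~0.2
d_activ2_sop2 = activ2 * (1 - activ2)    # ~0.185

grads_neuron1 = [common * w41_old * d_activ1_sop1 * xi for xi in x]  # ~[0.004, 0.015, 0.15]
grads_neuron2 = [common * w42_old * d_activ2_sop2 * xi for xi in x]  # ~[0.002, 0.009, 0.089]
print(grads_neuron1, grads_neuron2)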
Updating weights
After calculating the gradients for all weights in the network, the next equations update all network weights, assuming that the learning_rate is 0.001.
W11new = W11old - learning_rate * dError/dW11 = 0.481 - 0.001*0.004 = 0.480996
W21new = W21old - learning_rate * dError/dW21 = 0.299 - 0.001*0.015 = 0.298985
W31new = W31old - learning_rate * dError/dW31 = 0.192 - 0.001*0.15 = 0.19185
W12new = W12old - learning_rate * dError/dW12 = 0.329 - 0.001*0.002 = 0.328998
W22new = W22old - learning_rate * dError/dW22 = 0.548 - 0.001*0.009 = 0.547991
W32new = W32old - learning_rate * dError/dW32 = 0.214 - 0.001*0.089 = 0.213911
W41new = W41old - learning_rate * dError/dW41 = 0.882 - 0.001*0.15 = 0.88185
W42new = W42old - learning_rate * dError/dW42 = 0.567 - 0.001*0.157 = 0.566843
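The same updates can be computed at once by stacking the old weights and their gradients into arrays. This is only a sketch of the arithmetic, not the layout used by the chapter's code.

import numpy as np

learning_rate = 0.001
# W11, W21, W31, W12, W22, W32, W41, W42
old_weights = np.array([0.481, 0.299, 0.192, 0.329, 0.548, 0.214, 0.882, 0.567])
gradients   = np.array([0.004, 0.015, 0.150, 0.002, 0.009, 0.089, 0.150, 0.157])

new_weights = old_weights - learning_rate * gradients
print(new_weights)
# [0.480996 0.298985 0.19185  0.328998 0.547991 0.213911 0.88185  0.566843]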
At this point, the network weights have been updated for only 1 iteration. The forward and backward pass calculations can be repeated for a number of iterations until the desired output is reached.
If the calculations are repeated just once more, the error is reduced from 0.296 to 0.29543095, an error reduction of only 0.000569049.
Note that setting the learning rate to a value higher than 0.001 may help increase the speed of error reduction.
After understanding the theory behind how the ANN architecture of this
chapter works in both the forward and backward passes, the next section starts
its Python implementation. Note that the implementation is highly dependent on
the implementations developed previously in Chapters 3 and 4. Hence, it is very
important to have a solid understanding of how the previous implementations
work before building on them.
Python™ implementation
The complete code that implements an ANN with 3 inputs, 1 hidden layer with 2 neurons, and 1 output neuron, and that optimizes it using the gradient descent algorithm, is listed below.
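The listing below is a minimal sketch of such an implementation, consistent with the walkthrough in the following subsections. The variable names w1_3, w2_3, w3_2, w3_2_old, sop1, sop2, sop3, sig1, sig2, g1 to g5, grad_hidden_output, and the helpers sigmoid() and update_w() follow the walkthrough; the other helper names (error(), error_predicted_deriv(), sigmoid_sop_deriv(), sop_w_deriv()) and the x and target variable names are assumed here for illustration. The initial weight values are taken from the numerical example; they could equally be initialized randomly.

import numpy as np

def sigmoid(sop):
    # Sigmoid activation: 1 / (1 + e^-sop)
    return 1.0 / (1.0 + np.exp(-sop))

def error(predicted, target):
    # Squared error between the predicted and target outputs
    return np.power(predicted - target, 2)

def error_predicted_deriv(predicted, target):
    # Derivative of the error with respect to the predicted output
    return 2 * (predicted - target)

def sigmoid_sop_deriv(sop):
    # Derivative of the sigmoid function with respect to its input (SOP)
    return sigmoid(sop) * (1.0 - sigmoid(sop))

def sop_w_deriv(x):
    # Derivative of a sum of products with respect to a weight is the input it multiplies
    return x

def update_w(w, grad, learning_rate):
    # Gradient descent update rule
    return w - learning_rate * grad

# Inputs and desired output
x = np.array([0.1, 0.4, 4.1])
target = np.array([0.2])
learning_rate = 0.001

# Network weights
w1_3 = np.array([0.481, 0.299, 0.192])   # W11, W21, W31 (first hidden neuron)
w2_3 = np.array([0.329, 0.548, 0.214])   # W12, W22, W32 (second hidden neuron)
w3_2 = np.array([0.882, 0.567])          # W41, W42 (output neuron)
w3_2_old = w3_2

# Forward pass
sop1 = np.sum(w1_3 * x)
sop2 = np.sum(w2_3 * x)
sig1 = sigmoid(sop1)
sig2 = sigmoid(sop2)
sop3 = np.sum(w3_2 * np.array([sig1, sig2]))
predicted = sigmoid(sop3)
err = error(predicted, target)

# Backward pass: weights between the hidden and output layers
g1 = error_predicted_deriv(predicted, target)
g2 = sigmoid_sop_deriv(sop3)
g3 = np.array([sop_w_deriv(sig1), sop_w_deriv(sig2)])
grad_hidden_output = g3 * g2 * g1
w3_2 = update_w(w3_2, grad_hidden_output, learning_rate)

# Backward pass: weights of the first hidden neuron
g3 = sop_w_deriv(w3_2_old[0])
g4 = sigmoid_sop_deriv(sop1)
g5 = sop_w_deriv(x)
grad_hidden1_input = g5 * g4 * g3 * g2 * g1
w1_3 = update_w(w1_3, grad_hidden1_input, learning_rate)

# Backward pass: weights of the second hidden neuron
g3 = sop_w_deriv(w3_2_old[1])
g4 = sigmoid_sop_deriv(sop2)
g5 = sop_w_deriv(x)
grad_hidden2_input = g5 * g4 * g3 * g2 * g1
w2_3 = update_w(w2_3, grad_hidden2_input, learning_rate)

w3_2_old = w3_2
print(predicted, err)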
At first, the inputs and the output are prepared using these 2 lines.
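These lines presumably resemble the following sketch; the names x and target are assumptions, and the values come from the numerical example.

import numpy as np

x = np.array([0.1, 0.4, 4.1])    # the 3 inputs X1, X2, X3
target = np.array([0.2])         # the desired output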
The network weights are prepared according to the next lines, which define the following 3 variables:
1. w1_3: An array holding the 3 weights connecting the 3 inputs to the first
hidden neuron (W11, W21, and W31).
2. w2_3: An array holding the 3 weights connecting the 3 inputs to the second
hidden neuron (W12, W22, and W32).
3. w3_2: An array with 2 weights for the connections between the hidden layer
neurons and the output neuron (W41 and W42).
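A sketch of these lines, using the initial weights of the numerical example (the original code may instead initialize them randomly):

import numpy as np

w1_3 = np.array([0.481, 0.299, 0.192])   # W11, W21, W31
w2_3 = np.array([0.329, 0.548, 0.214])   # W12, W22, W32
w3_2 = np.array([0.882, 0.567])          # W41, W42
w3_2_old = w3_2                          # copy used later in the backward pass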
After preparing the inputs and the weights, the next section works through
the forward pass.
Forward pass
The code of the forward pass is listed in the next block. It starts by calculating
the sum of products for the 2 hidden neurons and saving them into the variables
sop1 and sop2.
These 2 variables are passed to the sigmoid() function and the results are
saved in the variables sig1 and sig2. These 2 variables are multiplied by the
2 weights connected to the output neuron to return sop3.
sop3 is also applied as input to the sigmoid() function to return the
predicted output. Finally, the error is calculated.
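A sketch of this block, continuing from the complete listing above (it assumes x, target, the weight arrays, and the sigmoid() and error() helpers defined there):

# Forward pass
sop1 = np.sum(w1_3 * x)                        # SOP of the first hidden neuron
sop2 = np.sum(w2_3 * x)                        # SOP of the second hidden neuron

sig1 = sigmoid(sop1)                           # Activ1
sig2 = sigmoid(sop2)                           # Activ2

sop3 = np.sum(w3_2 * np.array([sig1, sig2]))   # SOP of the output neuron
predicted = sigmoid(sop3)                      # predicted output
err = error(predicted, target)                 # squared error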
After the forward pass is complete, next is to go through the backward pass.
Backward pass
The part of the code responsible for updating the weights between the hidden and output layers is given in the next code.
The derivative of the error to the predicted output is calculated and saved in the
variable g1. g2 holds the predicted output to SOP3 derivative. The derivatives of
SOP3 to both W41 and W42 are calculated and saved in the vector g3. Note that
g1 and g2 will be used while calculating the gradients of the hidden neurons.
After calculating all derivatives required to calculate the gradients for the weights W41 and W42, the gradients are calculated and saved in the grad_hidden_output vector. Finally, these 2 weights are updated using the update_w() function by passing the old weights, gradients, and learning rate.
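A sketch of this part, again continuing from the listing above (error_predicted_deriv(), sigmoid_sop_deriv(), and sop_w_deriv() are the assumed helper names for the individual derivatives):

# Backward pass: weights between the hidden and output layers
g1 = error_predicted_deriv(predicted, target)          # dError/dPredicted
g2 = sigmoid_sop_deriv(sop3)                           # dPredicted/dSOP3
g3 = np.array([sop_w_deriv(sig1), sop_w_deriv(sig2)])  # dSOP3/dW41 and dSOP3/dW42

grad_hidden_output = g3 * g2 * g1                      # gradients of W41 and W42
w3_2 = update_w(w3_2, grad_hidden_output, learning_rate)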
After updating the weights between the hidden and output layers, next is to
work on the weights between the input and hidden layers.
The next code updates the weights connected to the first hidden neuron. g3 represents the SOP3 to Activ1 derivative. Because this derivative is calculated using the old weights' values, the old weights are saved into the w3_2_old variable so that they can be used in this step. g4 represents the Activ1 to SOP1 derivative. Finally, g5 represents the derivatives of SOP1 to the weights (W11, W21, and W31).
Based on the derivatives saved in g3, g4, and g5, the gradients of the first
hidden neuron’s weights are calculated by multiplying the variables g1 to g5.
Based on the calculated gradients, the weights are updated.
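A sketch of this step (note that w3_2_old holds the output-layer weights saved before the update above):

# Backward pass: weights of the first hidden neuron (W11, W21, W31)
g3 = sop_w_deriv(w3_2_old[0])      # dSOP3/dActiv1 = old W41
g4 = sigmoid_sop_deriv(sop1)       # dActiv1/dSOP1
g5 = sop_w_deriv(x)                # dSOP1/dW11, dSOP1/dW21, dSOP1/dW31

grad_hidden1_input = g5 * g4 * g3 * g2 * g1
w1_3 = update_w(w1_3, grad_hidden1_input, learning_rate)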
Similar to the 3 weights connected to the first hidden neuron, the other 3 weights
connected to the second hidden neuron are updated according to the next code.
At the end of the code, the w3_2_old variable is set equal to w3_2.
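A corresponding sketch for the second hidden neuron, ending with the update of w3_2_old:

# Backward pass: weights of the second hidden neuron (W12, W22, W32)
g3 = sop_w_deriv(w3_2_old[1])      # dSOP3/dActiv2 = old W42
g4 = sigmoid_sop_deriv(sop2)       # dActiv2/dSOP2
g5 = sop_w_deriv(x)                # dSOP2/dW12, dSOP2/dW22, dSOP2/dW32

grad_hidden2_input = g5 * g4 * g3 * g2 * g1
w2_3 = update_w(w2_3, grad_hidden2_input, learning_rate)

w3_2_old = w3_2                    # keep the updated output-layer weights for the next iteration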
By reaching this step, the entire code for implementing the neural network
in Fig. 5.1 is complete. The next subsection lists the code that trains the network
in a number of iterations.
Complete code
The previously discussed code trains the network for just a single iteration. The next code uses a loop to go through a number of iterations in which the weights are updated.
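Below is a sketch of how the single-iteration code could be wrapped in a training loop. It assumes the imports, helper functions, inputs, and initial weights from the listing earlier in this chapter; the iteration count of 80,000 is an assumption and simply needs to be large enough for the prediction to approach the target.

predictions, errors = [], []

for iteration in range(80000):
    # Forward pass
    sop1 = np.sum(w1_3 * x)
    sop2 = np.sum(w2_3 * x)
    sig1 = sigmoid(sop1)
    sig2 = sigmoid(sop2)
    sop3 = np.sum(w3_2 * np.array([sig1, sig2]))
    predicted = sigmoid(sop3)
    predictions.append(predicted)
    errors.append(error(predicted, target))

    # Backward pass: output-layer weights
    g1 = error_predicted_deriv(predicted, target)
    g2 = sigmoid_sop_deriv(sop3)
    g3 = np.array([sop_w_deriv(sig1), sop_w_deriv(sig2)])
    w3_2 = update_w(w3_2, g3 * g2 * g1, learning_rate)

    # Backward pass: first hidden neuron's weights
    g3 = sop_w_deriv(w3_2_old[0])
    g4 = sigmoid_sop_deriv(sop1)
    g5 = sop_w_deriv(x)
    w1_3 = update_w(w1_3, g5 * g4 * g3 * g2 * g1, learning_rate)

    # Backward pass: second hidden neuron's weights
    g3 = sop_w_deriv(w3_2_old[1])
    g4 = sigmoid_sop_deriv(sop2)
    g5 = sop_w_deriv(x)
    w2_3 = update_w(w2_3, g5 * g4 * g3 * g2 * g1, learning_rate)

    w3_2_old = w3_2

print(predictions[-1], errors[-1])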
After the iterations complete, Fig. 5.2 shows how the predicted output
changes for each iteration. The network is able to reach the desired output
(0.2) successfully.
Fig. 5.3 shows how the error changes for each iteration.
FIG. 5.2 The network prediction at each iteration.
FIG. 5.3 The prediction error at each iteration.
Conclusion
Continuing the implementation of the ANN started in Chapters 3 and 4, this
chapter implemented an ANN with a hidden layer that has just 2 hidden neu-
rons. This chapter discussed the theory of how an ANN with 3 inputs, 1 hid-
den layer with 2 hidden neurons, and 1 output neuron works. Based on a numerical example, all steps in the forward and backward passes were covered. Finally, the Python implementation was discussed.
In Chapter 6, the implementation will be extended to use any number of hid-
den neurons within a single hidden layer.