
Chapter 6: Feed Forward Neural Network

1.3 Feed Forward Neural Networks

This is the basic structure of an Artificial Neural Network

Observations from this network


1) There is 1 Input Layer, 2 Hidden Layers and 1 Output Layer
2) The Input Layer has 3 nodes with values $x_1, x_2, x_3$, depicting 3 input values
3) The Input Layer also has a bias node $b_1$
4) Hidden Layer 1 has 3 nodes and 1 bias node $b_2$
5) Hidden Layer 2 has 3 nodes and 1 bias node $b_3$
6) The Output Layer has 2 nodes
7) The Input Layer is Layer 0
8) Hidden Layer 1 is Layer 1
9) Hidden Layer 2 is Layer 2
10) The Output Layer is Layer 3
11) There are 9 weights connecting the Input Layer nodes to the Hidden Layer 1 nodes:
$w^1_{11}$ = the weight connecting the 1st node of the Input Layer to the 1st node of Hidden Layer 1
$w^1_{12}$ = the weight connecting the 1st node of the Input Layer to the 2nd node of Hidden Layer 1
$w^1_{13}$ = the weight connecting the 1st node of the Input Layer to the 3rd node of Hidden Layer 1
$w^1_{21}$ = the weight connecting the 2nd node of the Input Layer to the 1st node of Hidden Layer 1
$w^1_{22}$ = the weight connecting the 2nd node of the Input Layer to the 2nd node of Hidden Layer 1
$w^1_{23}$ = the weight connecting the 2nd node of the Input Layer to the 3rd node of Hidden Layer 1
$w^1_{31}$ = the weight connecting the 3rd node of the Input Layer to the 1st node of Hidden Layer 1
$w^1_{32}$ = the weight connecting the 3rd node of the Input Layer to the 2nd node of Hidden Layer 1
$w^1_{33}$ = the weight connecting the 3rd node of the Input Layer to the 3rd node of Hidden Layer 1
12) There are 9 weights connecting the Hidden Layer 1 nodes to the Hidden Layer 2 nodes:
$w^2_{11}$ = the weight connecting the 1st node of Hidden Layer 1 to the 1st node of Hidden Layer 2
$w^2_{12}$ = the weight connecting the 1st node of Hidden Layer 1 to the 2nd node of Hidden Layer 2
$w^2_{13}$ = the weight connecting the 1st node of Hidden Layer 1 to the 3rd node of Hidden Layer 2
$w^2_{21}$ = the weight connecting the 2nd node of Hidden Layer 1 to the 1st node of Hidden Layer 2
$w^2_{22}$ = the weight connecting the 2nd node of Hidden Layer 1 to the 2nd node of Hidden Layer 2
$w^2_{23}$ = the weight connecting the 2nd node of Hidden Layer 1 to the 3rd node of Hidden Layer 2
$w^2_{31}$ = the weight connecting the 3rd node of Hidden Layer 1 to the 1st node of Hidden Layer 2
$w^2_{32}$ = the weight connecting the 3rd node of Hidden Layer 1 to the 2nd node of Hidden Layer 2
$w^2_{33}$ = the weight connecting the 3rd node of Hidden Layer 1 to the 3rd node of Hidden Layer 2

13) There are 6 weights connecting the Hidden Layer 2 nodes to the Output Layer nodes:
$w^3_{11}$ = the weight connecting the 1st node of Hidden Layer 2 to the 1st node of the Output Layer
$w^3_{12}$ = the weight connecting the 1st node of Hidden Layer 2 to the 2nd node of the Output Layer
$w^3_{21}$ = the weight connecting the 2nd node of Hidden Layer 2 to the 1st node of the Output Layer
$w^3_{22}$ = the weight connecting the 2nd node of Hidden Layer 2 to the 2nd node of the Output Layer
$w^3_{31}$ = the weight connecting the 3rd node of Hidden Layer 2 to the 1st node of the Output Layer
$w^3_{32}$ = the weight connecting the 3rd node of Hidden Layer 2 to the 2nd node of the Output Layer

14) All bias nodes connect to the next layer with weight 1, so each simply adds its bias value to that layer


15) Each hidden layer node is divided into 2 parts:
Pre-activation: $a^L_i$ (collected in the vector $a^L$)
Activation: $h^L_i$ (collected in the vector $h^L$)
16) These are the pre-activation and activation values of Hidden Layer 1:
$a^1_1$ = pre-activation of the 1st node of Hidden Layer 1
$a^1_2$ = pre-activation of the 2nd node of Hidden Layer 1
$a^1_3$ = pre-activation of the 3rd node of Hidden Layer 1
$h^1_1$ = activation of the 1st node of Hidden Layer 1
$h^1_2$ = activation of the 2nd node of Hidden Layer 1
$h^1_3$ = activation of the 3rd node of Hidden Layer 1

17) These are the pre-activation and activation values of Hidden Layer 2:
$a^2_1$ = pre-activation of the 1st node of Hidden Layer 2
$a^2_2$ = pre-activation of the 2nd node of Hidden Layer 2
$a^2_3$ = pre-activation of the 3rd node of Hidden Layer 2
$h^2_1$ = activation of the 1st node of Hidden Layer 2
$h^2_2$ = activation of the 2nd node of Hidden Layer 2
$h^2_3$ = activation of the 3rd node of Hidden Layer 2

18) These are the pre-activation and activation values of the Output Layer:
$a^3_1$ = pre-activation of the 1st node of the Output Layer
$a^3_2$ = pre-activation of the 2nd node of the Output Layer
$h^3_1$ = activation of the 1st node of the Output Layer
$h^3_2$ = activation of the 2nd node of the Output Layer

19) For the Output Layer, the activation is the network output: $h^3 = o$
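To tie these observations together, here is a minimal NumPy sketch of this exact 3-3-3-2 network. The random weight values, the zero biases and the use of the sigmoid at every layer (including the output) are illustrative assumptions, not something fixed by the figure.

    import numpy as np

    def sigmoid(a):
        # logistic sigmoid, applied element-wise
        return 1.0 / (1.0 + np.exp(-a))

    # Layer 0 (Input Layer): 3 input values x1, x2, x3 (illustrative numbers)
    x = np.array([0.5, -1.0, 2.0])

    # Weight matrices, stored as (nodes in next layer) x (nodes in previous layer),
    # so row j, column i holds w^L_{ij}: the weight from node i to node j.
    W1 = np.random.randn(3, 3)   # 9 weights: Input Layer -> Hidden Layer 1
    W2 = np.random.randn(3, 3)   # 9 weights: Hidden Layer 1 -> Hidden Layer 2
    W3 = np.random.randn(2, 3)   # 6 weights: Hidden Layer 2 -> Output Layer

    # One bias per node of the receiving layer (the bias nodes feed in a constant 1)
    b1, b2, b3 = np.zeros(3), np.zeros(3), np.zeros(2)

    # Pre-activations a^L and activations h^L, layer by layer
    a1 = W1 @ x + b1
    h1 = sigmoid(a1)             # Hidden Layer 1
    a2 = W2 @ h1 + b2
    h2 = sigmoid(a2)             # Hidden Layer 2
    a3 = W3 @ h2 + b3
    h3 = sigmoid(a3)             # Output Layer: h^3 = o

    print(h3)                    # 2 output values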

Let's now understand the naming convention for a general ANN.

1) The input to the network is $x$, an n-dimensional vector: $x_i \in \mathbb{R}^n$
2) The network contains L total layers (not counting the Input Layer)
3) The network contains L−1 Hidden Layers, each having n neurons
4) The Output Layer contains k neurons (corresponding to k classes)
5) Each neuron in the hidden layers can be divided into 2 parts: a pre-activation and an activation.

Pre-activation vector of layer L:
$$a^L = \begin{bmatrix} a^L_1 \\ a^L_2 \\ a^L_3 \\ \vdots \\ a^L_n \end{bmatrix}$$

Activation vector of layer L:
$$h^L = \begin{bmatrix} h^L_1 \\ h^L_2 \\ h^L_3 \\ \vdots \\ h^L_n \end{bmatrix}$$
6) The weights between the (L−1)th layer and the Lth layer are collected in the matrix $W^L$:
$$W^L = \begin{bmatrix} w^L_{11} & w^L_{21} & w^L_{31} & \cdots & w^L_{n1} \\ w^L_{12} & w^L_{22} & w^L_{32} & \cdots & w^L_{n2} \\ w^L_{13} & w^L_{23} & w^L_{33} & \cdots & w^L_{n3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ w^L_{1n} & w^L_{2n} & w^L_{3n} & \cdots & w^L_{nn} \end{bmatrix} \;\rightarrow\; W^L \in \mathbb{R}^{n \times n}$$

7) The biases of the Lth layer (with n neurons) are collected in the vector $B^L$:
$$B^L = \begin{bmatrix} b^L_1 \\ b^L_2 \\ b^L_3 \\ \vdots \\ b^L_n \end{bmatrix} \;\rightarrow\; B^L \in \mathbb{R}^n$$
8) The weights between the last hidden layer (with n neurons) and the Output Layer (with k neurons):
$$W^L = \begin{bmatrix} w^L_{11} & w^L_{21} & w^L_{31} & \cdots & w^L_{n1} \\ w^L_{12} & w^L_{22} & w^L_{32} & \cdots & w^L_{n2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ w^L_{1k} & w^L_{2k} & w^L_{3k} & \cdots & w^L_{nk} \end{bmatrix} \;\rightarrow\; W^L \in \mathbb{R}^{k \times n}$$

9) The biases between the last hidden layer and the Output Layer (with k neurons):
$$B^L = \begin{bmatrix} b^L_1 \\ b^L_2 \\ b^L_3 \\ \vdots \\ b^L_k \end{bmatrix} \;\rightarrow\; B^L \in \mathbb{R}^k$$
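As a quick check on these shapes, here is a small sketch that allocates the parameters of such a generic network; the concrete sizes L = 3, n = 4, k = 2 and the random initialization are arbitrary assumptions for illustration.

    import numpy as np

    L, n, k = 3, 4, 2                        # assumed: L layers, n hidden neurons, k classes

    W, B = {}, {}
    for layer in range(1, L):                # hidden layers 1 .. L-1
        W[layer] = np.random.randn(n, n)     # W^layer in R^{n x n}
        B[layer] = np.zeros(n)               # B^layer in R^n
    W[L] = np.random.randn(k, n)             # output layer: W^L in R^{k x n}
    B[L] = np.zeros(k)                       # B^L in R^k

    for layer in range(1, L + 1):
        print("layer", layer, "W shape:", W[layer].shape, "B shape:", B[layer].shape)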

Meaning of Pre-activation and Activation

Pre-activation at Layer L:
$$a^L = (W^L)^T h^{L-1} + B^L$$
$$\begin{bmatrix} a^L_1 \\ a^L_2 \\ a^L_3 \\ \vdots \\ a^L_n \end{bmatrix} = \begin{bmatrix} w^L_{11} & w^L_{12} & w^L_{13} & \cdots & w^L_{1n} \\ w^L_{21} & w^L_{22} & w^L_{23} & \cdots & w^L_{2n} \\ w^L_{31} & w^L_{32} & w^L_{33} & \cdots & w^L_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ w^L_{n1} & w^L_{n2} & w^L_{n3} & \cdots & w^L_{nn} \end{bmatrix} \begin{bmatrix} h^{L-1}_1 \\ h^{L-1}_2 \\ h^{L-1}_3 \\ \vdots \\ h^{L-1}_n \end{bmatrix} + \begin{bmatrix} b^L_1 \\ b^L_2 \\ b^L_3 \\ \vdots \\ b^L_n \end{bmatrix}$$
Activation at Layer L:
$$h^L = g(a^L)$$
$$\begin{bmatrix} h^L_1 \\ h^L_2 \\ h^L_3 \\ \vdots \\ h^L_n \end{bmatrix} = g\left( \begin{bmatrix} a^L_1 \\ a^L_2 \\ a^L_3 \\ \vdots \\ a^L_n \end{bmatrix} \right)$$

Here $g(\cdot)$ is the activation function; for example, $g$ can be the logistic sigmoid activation function
$$g(x) = \sigma(x) = \frac{1}{1 + e^{-x}}$$
That means
$$h^L = \frac{1}{1 + e^{-a^L}}$$
applied element-wise to $a^L$.
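These two equations are the entire forward pass. Below is a minimal sketch of it in NumPy; it assumes each W[l] is stored as an (output × input) matrix, so the text's $(W^L)^T h^{L-1}$ product can be written as W[l] @ h without an explicit transpose, and it uses the sigmoid as g at every layer. The sizes and random values are illustrative assumptions.

    import numpy as np

    def sigmoid(a):
        # g(x) = 1 / (1 + e^{-x}), applied element-wise
        return 1.0 / (1.0 + np.exp(-a))

    def forward(x, W, B, L):
        """Compute h^L = o for input x, given parameters W[1..L], B[1..L]."""
        h = x                            # h^0 is the input vector
        for l in range(1, L + 1):
            a = W[l] @ h + B[l]          # pre-activation a^l
            h = sigmoid(a)               # activation h^l = g(a^l)
        return h

    # self-contained usage with assumed sizes: n = 4, k = 2, L = 3
    L, n, k = 3, 4, 2
    W = {l: np.random.randn(n, n) for l in range(1, L)}
    W[L] = np.random.randn(k, n)
    B = {l: np.zeros(n) for l in range(1, L)}
    B[L] = np.zeros(k)

    print(forward(np.random.randn(n), W, B, L))   # k output values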

Loss function
How "bad" the network performs is measured using some notion of the difference between the actual output and the predicted output.
Some popular loss functions are:

1) Mean Squared Error Loss

$$L(\theta) = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2$$
$\theta$ is the vector containing the trainable parameters, $\theta = \begin{bmatrix} w \\ b \end{bmatrix}$
It basically says:
• Find the output y from your neural network for a given data point (input vector)
• Find the squared error $(\hat{y} - y)^2$, where $\hat{y}$ is the actual output/label for the given data point (input vector)
• Do this for all data points
• Find the average of all the squared errors
• That average is the Mean Squared Error Loss of your ANN
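A rough sketch of that recipe, assuming the network outputs for all N data points have already been collected in an array (the variable names are illustrative):

    import numpy as np

    def mse_loss(y_hat, y_pred):
        # y_hat: actual outputs/labels, y_pred: network outputs, both of shape (N,)
        # Mean Squared Error: average of the squared errors over all N data points
        return np.mean((y_hat - y_pred) ** 2)

    # toy example
    y_hat  = np.array([1.0, 0.0, 1.0])
    y_pred = np.array([0.9, 0.2, 0.7])
    print(mse_loss(y_hat, y_pred))   # 0.04666...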

2) Cross Entropy Loss

This loss function is used when we are dealing with categorical data, where the output is usually a probability distribution that tells how sure the neural network is that the given data point (input vector) belongs to a particular class out of the different classes.
For example: animal classification using physical traits as the input vector.

$$L(\theta) = \sum_{i=1}^{n} P(x_i)\,\log\!\left(\frac{1}{Q(x_i)}\right)$$
Here
P(X) is the actual probability distribution
Q(X) is the predicted probability distribution
More about Cross Entropy in Chapter 2.
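A minimal sketch of this formula for a single data point, assuming P and Q are given as arrays over the n classes; the small epsilon added before the log is only there to avoid log of zero and is not part of the formula above.

    import numpy as np

    def cross_entropy(P, Q, eps=1e-12):
        # P: actual (e.g. one-hot) distribution over the n classes
        # Q: predicted distribution from the network
        # L = sum_i P_i * log(1 / Q_i)
        return np.sum(P * np.log(1.0 / (Q + eps)))

    # toy example: 3 classes, the true class is the first one
    P = np.array([1.0, 0.0, 0.0])
    Q = np.array([0.7, 0.2, 0.1])
    print(cross_entropy(P, Q))   # -log(0.7), about 0.357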

Learning Parameters of Feed Forward Neural Networks

The parameters of our Feed Forward Neural Network are:
1) Input vector X
2) Output vector Y
3) Weight matrices $W^L$, where L is the layer index
4) Bias vectors $B^L$, where L is the layer index
Out of all of these, the only things we can control in order to improve our network's performance, that is, to reduce the value of the loss function, are the weights and the biases.

General Representation of weights

The weights between the Lth layer with n nodes and the (L−1)th layer with m nodes are in $W^L$:
$$W^L = \begin{pmatrix} w^L_{11} & \cdots & w^L_{n1} \\ \vdots & \ddots & \vdots \\ w^L_{1m} & \cdots & w^L_{nm} \end{pmatrix}$$
General Representation of biases

The biases of a layer with m neurons are collected in the vector $B^L$:
$$B^L = \begin{pmatrix} b^L_1 \\ b^L_2 \\ \vdots \\ b^L_m \end{pmatrix}$$
Gradient Descent algorithm

t ← 0;
epochs ← 1000;
while t < epochs do
    w_{t+1} = w_t − η ∇w;
    b_{t+1} = b_t − η ∇b;
    t ← t + 1;
end
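A direct translation of this loop into Python might look like the sketch below; the gradient functions grad_w and grad_b are placeholders for whatever procedure (for example backpropagation) supplies the gradients, and the learning rate and epoch count simply mirror the pseudocode.

    def gradient_descent(w, b, grad_w, grad_b, eta=0.1, epochs=1000):
        # w, b: current parameters
        # grad_w(w, b), grad_b(w, b): functions returning the gradients of the loss
        t = 0
        while t < epochs:
            gw = grad_w(w, b)
            gb = grad_b(w, b)
            w = w - eta * gw      # w_{t+1} = w_t - eta * grad w
            b = b - eta * gb      # b_{t+1} = b_t - eta * grad b
            t += 1
        return w, b

    # toy usage: minimise L(w, b) = (w - 3)^2 + (b + 1)^2
    w_opt, b_opt = gradient_descent(0.0, 0.0,
                                    grad_w=lambda w, b: 2 * (w - 3),
                                    grad_b=lambda w, b: 2 * (b + 1))
    print(w_opt, b_opt)   # approaches 3 and -1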

For representational simplicity, let's store our learnable parameters in another vector θ:

$$\theta = \begin{bmatrix} W^1 \\ W^2 \\ W^3 \\ \vdots \\ W^{L-1} \\ B^1 \\ B^2 \\ B^3 \\ \vdots \\ B^{L-1} \end{bmatrix}
= \begin{bmatrix}
\begin{pmatrix} w^1_{11} & \cdots & w^1_{n_1 1} \\ \vdots & \ddots & \vdots \\ w^1_{1 n_2} & \cdots & w^1_{n_1 n_2} \end{pmatrix} \\[3mm]
\begin{pmatrix} w^2_{11} & \cdots & w^2_{n_2 1} \\ \vdots & \ddots & \vdots \\ w^2_{1 n_3} & \cdots & w^2_{n_2 n_3} \end{pmatrix} \\[3mm]
\vdots \\[3mm]
\begin{pmatrix} w^{L-1}_{11} & \cdots & w^{L-1}_{n_{L-1} 1} \\ \vdots & \ddots & \vdots \\ w^{L-1}_{1 n_L} & \cdots & w^{L-1}_{n_{L-1} n_L} \end{pmatrix} \\[3mm]
\begin{pmatrix} b^1_1 \\ b^1_2 \\ \vdots \\ b^1_{n_2} \end{pmatrix} \\[3mm]
\begin{pmatrix} b^2_1 \\ b^2_2 \\ \vdots \\ b^2_{n_3} \end{pmatrix} \\[3mm]
\vdots \\[3mm]
\begin{pmatrix} b^{L-1}_1 \\ b^{L-1}_2 \\ \vdots \\ b^{L-1}_{n_L} \end{pmatrix}
\end{bmatrix}$$

The algorithm will now become:

t ← 0;
epochs ← 1000;
θ = [W B];
while t < epochs do
    θ_{t+1} = θ_t − η ∇θ;
    t ← t + 1;
end
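One way to realise this single-vector update in code is to flatten every weight matrix and bias vector into one long array, take the step, and unflatten again. The sketch below does this for an assumed two-layer parameter set and a stubbed-out gradient; backpropagation, which would supply the real ∇θ, is not derived in this chapter.

    import numpy as np

    def flatten_params(W, B, L):
        # stack [W^1 ... W^L, B^1 ... B^L] into one parameter vector theta
        parts = [W[l].ravel() for l in range(1, L + 1)]
        parts += [B[l].ravel() for l in range(1, L + 1)]
        return np.concatenate(parts)

    def unflatten_params(theta, W, B, L):
        # write the entries of theta back into matrices/vectors of the original shapes
        i = 0
        for store in (W, B):
            for l in range(1, L + 1):
                size = store[l].size
                store[l] = theta[i:i + size].reshape(store[l].shape)
                i += size
        return W, B

    # tiny usage example with assumed shapes
    L = 2
    W = {1: np.random.randn(3, 3), 2: np.random.randn(2, 3)}
    B = {1: np.zeros(3), 2: np.zeros(2)}

    theta = flatten_params(W, B, L)
    eta = 0.1
    grad_theta = np.ones_like(theta)     # stub gradient; backpropagation would supply this
    theta = theta - eta * grad_theta     # theta_{t+1} = theta_t - eta * grad theta
    W, B = unflatten_params(theta, W, B, L)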

Here
$$\nabla\theta = \begin{bmatrix} \dfrac{\partial L(\theta)}{\partial W} \\[3mm] \dfrac{\partial L(\theta)}{\partial B} \end{bmatrix}
= \begin{bmatrix} \dfrac{\partial L(\theta)}{\partial \begin{bmatrix} W^1 \\ W^2 \\ W^3 \\ \vdots \\ W^{L-1} \end{bmatrix}} \\[10mm] \dfrac{\partial L(\theta)}{\partial \begin{bmatrix} B^1 \\ B^2 \\ B^3 \\ \vdots \\ B^{L-1} \end{bmatrix}} \end{bmatrix} \quad \ldots(1)$$

And
$$\frac{\partial L(\theta)}{\partial W} = \frac{\partial L(\theta)}{\partial \begin{bmatrix} W^1 \\ W^2 \\ W^3 \\ \vdots \\ W^{L-1} \end{bmatrix}} = \begin{bmatrix} \dfrac{\partial L(\theta)}{\partial W^1} \\[3mm] \dfrac{\partial L(\theta)}{\partial W^2} \\[3mm] \dfrac{\partial L(\theta)}{\partial W^3} \\[3mm] \dfrac{\partial L(\theta)}{\partial W^4} \\[3mm] \vdots \\[3mm] \dfrac{\partial L(\theta)}{\partial W^{L-1}} \end{bmatrix} \quad \ldots(2)$$
Solving for each element in the vector:

$$\frac{\partial L(\theta)}{\partial W^1} = \begin{bmatrix}
\dfrac{\partial L(\theta)}{\partial w^1_{11}} & \dfrac{\partial L(\theta)}{\partial w^1_{21}} & \cdots & \dfrac{\partial L(\theta)}{\partial w^1_{n_1 1}} \\[3mm]
\dfrac{\partial L(\theta)}{\partial w^1_{12}} & \dfrac{\partial L(\theta)}{\partial w^1_{22}} & \cdots & \dfrac{\partial L(\theta)}{\partial w^1_{n_1 2}} \\[3mm]
\vdots & \vdots & \ddots & \vdots \\[3mm]
\dfrac{\partial L(\theta)}{\partial w^1_{1 n_2}} & \dfrac{\partial L(\theta)}{\partial w^1_{2 n_2}} & \cdots & \dfrac{\partial L(\theta)}{\partial w^1_{n_1 n_2}}
\end{bmatrix} \quad \ldots(3)$$

$$\frac{\partial L(\theta)}{\partial W^2} = \begin{bmatrix}
\dfrac{\partial L(\theta)}{\partial w^2_{11}} & \dfrac{\partial L(\theta)}{\partial w^2_{21}} & \cdots & \dfrac{\partial L(\theta)}{\partial w^2_{n_2 1}} \\[3mm]
\dfrac{\partial L(\theta)}{\partial w^2_{12}} & \dfrac{\partial L(\theta)}{\partial w^2_{22}} & \cdots & \dfrac{\partial L(\theta)}{\partial w^2_{n_2 2}} \\[3mm]
\vdots & \vdots & \ddots & \vdots \\[3mm]
\dfrac{\partial L(\theta)}{\partial w^2_{1 n_3}} & \dfrac{\partial L(\theta)}{\partial w^2_{2 n_3}} & \cdots & \dfrac{\partial L(\theta)}{\partial w^2_{n_2 n_3}}
\end{bmatrix} \quad \ldots(4)$$

$$\frac{\partial L(\theta)}{\partial W^{L-1}} = \begin{bmatrix}
\dfrac{\partial L(\theta)}{\partial w^{L-1}_{11}} & \dfrac{\partial L(\theta)}{\partial w^{L-1}_{21}} & \cdots & \dfrac{\partial L(\theta)}{\partial w^{L-1}_{n_{L-1} 1}} \\[3mm]
\dfrac{\partial L(\theta)}{\partial w^{L-1}_{12}} & \dfrac{\partial L(\theta)}{\partial w^{L-1}_{22}} & \cdots & \dfrac{\partial L(\theta)}{\partial w^{L-1}_{n_{L-1} 2}} \\[3mm]
\vdots & \vdots & \ddots & \vdots \\[3mm]
\dfrac{\partial L(\theta)}{\partial w^{L-1}_{1 n_L}} & \dfrac{\partial L(\theta)}{\partial w^{L-1}_{2 n_L}} & \cdots & \dfrac{\partial L(\theta)}{\partial w^{L-1}_{n_{L-1} n_L}}
\end{bmatrix} \quad \ldots(5)$$
Putting the values of (3), (4) and (5) in (2), (2) becomes:

$$\frac{\partial L(\theta)}{\partial W} = \begin{bmatrix}
\begin{bmatrix} \dfrac{\partial L(\theta)}{\partial w^1_{11}} & \cdots & \dfrac{\partial L(\theta)}{\partial w^1_{n_1 1}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial L(\theta)}{\partial w^1_{1 n_2}} & \cdots & \dfrac{\partial L(\theta)}{\partial w^1_{n_1 n_2}} \end{bmatrix} \\[8mm]
\begin{bmatrix} \dfrac{\partial L(\theta)}{\partial w^2_{11}} & \cdots & \dfrac{\partial L(\theta)}{\partial w^2_{n_2 1}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial L(\theta)}{\partial w^2_{1 n_3}} & \cdots & \dfrac{\partial L(\theta)}{\partial w^2_{n_2 n_3}} \end{bmatrix} \\[8mm]
\vdots \\[8mm]
\begin{bmatrix} \dfrac{\partial L(\theta)}{\partial w^{L-1}_{11}} & \cdots & \dfrac{\partial L(\theta)}{\partial w^{L-1}_{n_{L-1} 1}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial L(\theta)}{\partial w^{L-1}_{1 n_L}} & \cdots & \dfrac{\partial L(\theta)}{\partial w^{L-1}_{n_{L-1} n_L}} \end{bmatrix}
\end{bmatrix} \quad \ldots(2)$$
Now,
$$\frac{\partial L(\theta)}{\partial B} = \frac{\partial L(\theta)}{\partial \begin{bmatrix} B^1 \\ B^2 \\ B^3 \\ \vdots \\ B^{L-1} \end{bmatrix}} = \begin{bmatrix} \dfrac{\partial L(\theta)}{\partial B^1} \\[3mm] \dfrac{\partial L(\theta)}{\partial B^2} \\[3mm] \vdots \\[3mm] \dfrac{\partial L(\theta)}{\partial B^{L-1}} \end{bmatrix} \quad \ldots(6)$$
$$\frac{\partial L(\theta)}{\partial B^1} = \frac{\partial L(\theta)}{\partial \begin{pmatrix} b^1_1 \\ b^1_2 \\ \vdots \\ b^1_{n_2} \end{pmatrix}} = \begin{bmatrix} \dfrac{\partial L(\theta)}{\partial b^1_1} \\[3mm] \dfrac{\partial L(\theta)}{\partial b^1_2} \\[3mm] \vdots \\[3mm] \dfrac{\partial L(\theta)}{\partial b^1_{n_2}} \end{bmatrix} \quad \ldots(7)$$

$$\frac{\partial L(\theta)}{\partial B^2} = \frac{\partial L(\theta)}{\partial \begin{pmatrix} b^2_1 \\ b^2_2 \\ \vdots \\ b^2_{n_3} \end{pmatrix}} = \begin{bmatrix} \dfrac{\partial L(\theta)}{\partial b^2_1} \\[3mm] \dfrac{\partial L(\theta)}{\partial b^2_2} \\[3mm] \vdots \\[3mm] \dfrac{\partial L(\theta)}{\partial b^2_{n_3}} \end{bmatrix} \quad \ldots(8)$$

$$\frac{\partial L(\theta)}{\partial B^{L-1}} = \frac{\partial L(\theta)}{\partial \begin{pmatrix} b^{L-1}_1 \\ b^{L-1}_2 \\ \vdots \\ b^{L-1}_{n_L} \end{pmatrix}} = \begin{bmatrix} \dfrac{\partial L(\theta)}{\partial b^{L-1}_1} \\[3mm] \dfrac{\partial L(\theta)}{\partial b^{L-1}_2} \\[3mm] \vdots \\[3mm] \dfrac{\partial L(\theta)}{\partial b^{L-1}_{n_L}} \end{bmatrix} \quad \ldots(9)$$
Putting the values of (7), (8) and (9) in (6), (6) becomes:

$$\frac{\partial L(\theta)}{\partial B} = \begin{bmatrix}
\begin{bmatrix} \dfrac{\partial L(\theta)}{\partial b^1_1} \\ \dfrac{\partial L(\theta)}{\partial b^1_2} \\ \vdots \\ \dfrac{\partial L(\theta)}{\partial b^1_{n_2}} \end{bmatrix} \\[8mm]
\begin{bmatrix} \dfrac{\partial L(\theta)}{\partial b^2_1} \\ \dfrac{\partial L(\theta)}{\partial b^2_2} \\ \vdots \\ \dfrac{\partial L(\theta)}{\partial b^2_{n_3}} \end{bmatrix} \\[8mm]
\vdots \\[8mm]
\begin{bmatrix} \dfrac{\partial L(\theta)}{\partial b^{L-1}_1} \\ \dfrac{\partial L(\theta)}{\partial b^{L-1}_2} \\ \vdots \\ \dfrac{\partial L(\theta)}{\partial b^{L-1}_{n_L}} \end{bmatrix}
\end{bmatrix} \quad \ldots(6)$$
Therefore, putting the values of (2) and (6) in (1), (1) becomes:

$$\nabla\theta = \begin{bmatrix} \dfrac{\partial L(\theta)}{\partial W} \\[3mm] \dfrac{\partial L(\theta)}{\partial B} \end{bmatrix} = \begin{bmatrix}
\begin{bmatrix} \dfrac{\partial L(\theta)}{\partial w^1_{11}} & \cdots & \dfrac{\partial L(\theta)}{\partial w^1_{n_1 1}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial L(\theta)}{\partial w^1_{1 n_2}} & \cdots & \dfrac{\partial L(\theta)}{\partial w^1_{n_1 n_2}} \end{bmatrix} \\[8mm]
\begin{bmatrix} \dfrac{\partial L(\theta)}{\partial w^2_{11}} & \cdots & \dfrac{\partial L(\theta)}{\partial w^2_{n_2 1}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial L(\theta)}{\partial w^2_{1 n_3}} & \cdots & \dfrac{\partial L(\theta)}{\partial w^2_{n_2 n_3}} \end{bmatrix} \\[8mm]
\vdots \\[8mm]
\begin{bmatrix} \dfrac{\partial L(\theta)}{\partial w^{L-1}_{11}} & \cdots & \dfrac{\partial L(\theta)}{\partial w^{L-1}_{n_{L-1} 1}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial L(\theta)}{\partial w^{L-1}_{1 n_L}} & \cdots & \dfrac{\partial L(\theta)}{\partial w^{L-1}_{n_{L-1} n_L}} \end{bmatrix} \\[8mm]
\begin{bmatrix} \dfrac{\partial L(\theta)}{\partial b^1_1} \\ \dfrac{\partial L(\theta)}{\partial b^1_2} \\ \vdots \\ \dfrac{\partial L(\theta)}{\partial b^1_{n_2}} \end{bmatrix} \\[8mm]
\begin{bmatrix} \dfrac{\partial L(\theta)}{\partial b^2_1} \\ \dfrac{\partial L(\theta)}{\partial b^2_2} \\ \vdots \\ \dfrac{\partial L(\theta)}{\partial b^2_{n_3}} \end{bmatrix} \\[8mm]
\vdots \\[8mm]
\begin{bmatrix} \dfrac{\partial L(\theta)}{\partial b^{L-1}_1} \\ \dfrac{\partial L(\theta)}{\partial b^{L-1}_2} \\ \vdots \\ \dfrac{\partial L(\theta)}{\partial b^{L-1}_{n_L}} \end{bmatrix}
\end{bmatrix} \quad \ldots(1)$$
This is our final form of $\nabla\theta$.

Using this, we can update our weights and biases according to the gradient of the loss function with respect to each individual weight and bias.
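Since this chapter does not derive backpropagation, a finite-difference approximation can stand in for ∇θ to make the update concrete. The sketch below perturbs every individual weight and bias, estimates the corresponding entry of ∇θ numerically for a Mean Squared Error loss on one data point, and then applies θ = θ − η∇θ entry by entry; the network sizes, input and target are assumed purely for illustration.

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def forward(x, W, B, L):
        h = x
        for l in range(1, L + 1):
            h = sigmoid(W[l] @ h + B[l])    # h^l = g((W^l) h^{l-1} + B^l)
        return h

    def loss(x, y_hat, W, B, L):
        # Mean Squared Error for a single data point (y_hat = actual output)
        return np.mean((y_hat - forward(x, W, B, L)) ** 2)

    def numerical_gradient(param, x, y_hat, W, B, L, eps=1e-6):
        # dL/d(param[idx]) is approximated by (L(+eps) - L(-eps)) / (2 eps), entry by entry
        grad = np.zeros_like(param)
        for idx in np.ndindex(param.shape):
            old = param[idx]
            param[idx] = old + eps
            l_plus = loss(x, y_hat, W, B, L)
            param[idx] = old - eps
            l_minus = loss(x, y_hat, W, B, L)
            param[idx] = old
            grad[idx] = (l_plus - l_minus) / (2 * eps)
        return grad

    # assumed toy network: 3 inputs, one hidden layer of 3 neurons, 2 outputs
    L = 2
    W = {1: np.random.randn(3, 3), 2: np.random.randn(2, 3)}
    B = {1: np.zeros(3), 2: np.zeros(2)}
    x, y_hat, eta = np.array([0.5, -1.0, 2.0]), np.array([1.0, 0.0]), 0.1

    # compute all gradients at the current parameters, then take one step
    grads_W = {l: numerical_gradient(W[l], x, y_hat, W, B, L) for l in range(1, L + 1)}
    grads_B = {l: numerical_gradient(B[l], x, y_hat, W, B, L) for l in range(1, L + 1)}
    for l in range(1, L + 1):
        W[l] -= eta * grads_W[l]
        B[l] -= eta * grads_B[l]

    print(loss(x, y_hat, W, B, L))   # loss after one update step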
