
L2: Neural Networks Review
Deep Learning
CS427/CS5310/EE414/EE513

By Murtaza Taj
Derivatives

Function
• f(x) = x^n
  d(f(x))/dx = n x^(n−1)
• f(x) = x^3
  d(f(x))/dx = 3x^(3−1) = 3x^2

Product Rule
• f(x, y) = x^n y^m
  f′(x, y) = x^n ∂(y^m)/∂y + y^m ∂(x^n)/∂x
• f(x, y) = x^m / y^n = x^m y^(−n)

Chain Rule
• z = x^m,  y = 1 + z^n
  ∂y/∂x = (∂y/∂z)(∂z/∂x) = (n z^(n−1))(m x^(m−1))
Biological Neuron

  w = {w0, w1, w2, ⋯}^T
  x = {1, x1, x2, ⋯}^T
Analytical vs. Iterative Solution

  w = {w0, w1, w2, ⋯}^T
  x^(i) = {1, x1^(i), x2^(i), ⋯}^T

  y^(i) = w^T x^(i) = ∑_j wj xj^(i)

  E = ∑_i (t^(i) − y^(i))^2 = ∑_i (t^(i) − w^T x^(i))^2

Direct/Analytical Approach
• Solve the linear system Ax = b

Iterative Approach
• ∂E/∂wj = −∑_i (t^(i) − y^(i)) xj^(i)
• wj ← wj − η ∂E/∂wj,  i.e.  wj ← wj + Δwj  with  Δwj = −η ∂E/∂wj
• Δwj = η ∑_i (t^(i) − y^(i)) xj^(i)
• where η is the learning rate
Linear Regression: Two Approaches

Direct/Analytical Approach: solve Ax = b

Iterative Approach: wj ← wj − η ∂E/∂wj
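To make the contrast concrete, here is a minimal NumPy sketch (not part of the slides) that fits the same linear model both ways on synthetic data: once by solving the normal equations, the Ax = b route, and once with the iterative update wj ← wj − η ∂E/∂wj. The data, learning rate, and iteration count are assumptions chosen for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.c_[np.ones(100), rng.uniform(-1, 1, (100, 2))]  # rows are x = {1, x1, x2}
    w_true = np.array([0.5, -1.0, 2.0])
    t = X @ w_true + 0.01 * rng.standard_normal(100)       # targets with a little noise

    # Direct/analytical approach: normal equations (X^T X) w = X^T t, i.e. Ax = b
    w_direct = np.linalg.solve(X.T @ X, X.T @ t)

    # Iterative approach: gradient descent on E = sum_i (t^(i) - w^T x^(i))^2
    w = np.zeros(3)
    eta = 0.005                              # learning rate (assumed value)
    for _ in range(5000):
        y = X @ w                            # forward pass: y^(i) = w^T x^(i)
        grad = -(X.T @ (t - y))              # dE/dwj = -sum_i (t^(i) - y^(i)) xj^(i)
        w = w - eta * grad                   # wj <- wj - eta * dE/dwj

    print(w_direct)                          # both end up close to w_true
    print(w)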
Key Computation: Forward Pass

  y = w^T x

  x → [w] → y
Key Computation: Forward Pass

  y = w^T x

  E = (1/n) ∑_{i∈train} (t^(i) − y^(i))^2

  x → [w] → y → E
Key Computation: Forward/Backward Pass

  x → [w] → y → E

  y^(i) = w^T x^(i) = ∑_j wj xj^(i)

  E = (1/n) ∑_{i∈train} (t^(i) − y^(i))^2

Backward pass, built from the local gradients {∂y^(i)/∂wj, ∂y^(i)/∂xj^(i)} and ∂E/∂y^(i):

  ∂E/∂wj = ∑_i (∂y^(i)/∂wj)(∂E/∂y^(i)) = −(2/n) ∑_{i∈train} (t^(i) − y^(i)) xj^(i)

  ∂E/∂xj^(i) = −(2/n) (t^(i) − y^(i)) wj
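As a sanity check on the gradient formula above, the analytic result can be compared against a finite-difference estimate. A small throwaway sketch (the data and perturbation size are arbitrary):

    import numpy as np

    rng = np.random.default_rng(1)
    X = np.c_[np.ones(8), rng.standard_normal((8, 2))]   # rows are x^(i) = {1, x1, x2}
    t = rng.standard_normal(8)
    w = rng.standard_normal(3)
    n = len(t)

    def loss(w):
        return np.mean((t - X @ w) ** 2)                 # E = (1/n) sum_i (t^(i) - y^(i))^2

    # Analytic gradient from the slide: dE/dwj = -(2/n) sum_i (t^(i) - y^(i)) xj^(i)
    grad = -(2.0 / n) * X.T @ (t - X @ w)

    # Numerical gradient via central differences
    eps = 1e-6
    numeric = np.array([(loss(w + eps * e) - loss(w - eps * e)) / (2 * eps)
                        for e in np.eye(3)])

    print(np.max(np.abs(grad - numeric)))                # tiny (~1e-9), the two agree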
Gradient Descent

• Δwj ← Initialize to ZEROS
• For each iteration
  • For each training example (x^(i), t^(i))
    • For each weight wj
      Δwj ← Δwj + η xj^(i) (t^(i) − y^(i))
  • For each weight wj
    wj ← wj + Δwj

so that, per iteration,  Δwj = η ∑_i xj^(i) (t^(i) − y^(i))
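The loop above, written out in NumPy for the linear neuron. This is a sketch with made-up data and learning rate; Δw is reset to zero at the start of each pass, which the slide leaves implicit:

    import numpy as np

    rng = np.random.default_rng(2)
    X = np.c_[np.ones(50), rng.uniform(-1, 1, (50, 2))]   # each row: {1, x1, x2}
    t = X @ np.array([0.2, 1.5, -0.7])                    # targets from a known linear model
    w = np.zeros(3)
    eta = 0.01                                            # learning rate (assumed)

    for _ in range(200):                                  # for each iteration
        dw = np.zeros_like(w)                             # Δw initialized to zeros
        for x_i, t_i in zip(X, t):                        # for each training example
            y_i = w @ x_i                                 # forward pass: y = w^T x
            dw += eta * x_i * (t_i - y_i)                 # Δwj ← Δwj + η xj (t − y)
        w = w + dw                                        # wj ← wj + Δwj

    print(w)                                              # approaches [0.2, 1.5, -0.7]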
Gradient Descent

Left Image: https://medium.com/analytics-vidhya/gradient-descent-part-2-the-math-c23060a96a13
Right Image: https://hduongtrong.github.io/2015/11/23/coordinate-descent/
Example: Computational Graph
Computation Graph

• For example, here is a simple mathematical equation:

  y = (a + b) × (b − c)
  y = ab − ac + b^2 − bc

• We want to calculate the gradient w.r.t. a, b, c:  ∂y/∂a, ∂y/∂b, ∂y/∂c

  ∂y/∂a = b − c
  ∂y/∂b = a + b + b − c
  ∂y/∂c = −(a + b)
Computation Graph

• For example, here is a simple mathematical equation:

  y = (a + b) × (b − c)

• We can write a computation graph of the above equation as follows:

  a, b → (+) → d
  b, c → (−) → e
  d, e → (×) → y
Computation Graph - Forward Pass

• Let us consider a = 5, b = −2, c = 4
  y = (a + b) × (b − c)

• First we will use these values to compute d and e:

  d = a + b = 5 − 2 = 3
  e = b − c = −2 − 4 = −6
Computation Graph - Forward Pass

• Let us consider a = 5, b = −2, c = 4
  y = (a + b) × (b − c)

• Now we will use d and e to compute y:

  d = a + b = 5 − 2 = 3
  e = b − c = −2 − 4 = −6
  y = d × e = 3 × (−6) = −18
Computation Graph - Backward Pass

• We start the backward pass by finding the derivative of the final output with respect to the final output (itself!):

  ∂y/∂y = 1

• Our computational graph now carries the forward values d = 3, e = −6, y = −18 together with ∂y/∂y = 1 at the output.
Computation Graph - Backward Pass

• Now, given y = d × e:

  ∂y/∂d = e = b − c = −6
  ∂y/∂e = d = a + b = 3
Computation Graph - Backward Pass

• We want to calculate the gradient w.r.t. a, b, c:  ∂y/∂a, ∂y/∂b, ∂y/∂c
  (y = (a + b) × (b − c),  d = a + b,  e = b − c,  y = d × e)

• From the chain rule we have:

  ∂y/∂a = (∂y/∂d)(∂d/∂a) = e × 1 = −6
Computation Graph - Backward Pass

• From the chain rule we have:

  ∂y/∂c = (∂y/∂e)(∂e/∂c) = d × (−1) = −3
Computation Graph - Backward Pass

• From the chain rule we have (b feeds both d and e, so the two paths add):

  ∂y/∂b = (∂y/∂d)(∂d/∂b) + (∂y/∂e)(∂e/∂b) = e × 1 + d × 1 = −6 + 3 = −3
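The whole worked example fits in a few lines of Python. A minimal sketch of the forward and backward passes done by hand, with d and e named as on the slides:

    # Forward pass: y = (a + b) * (b - c) with a = 5, b = -2, c = 4
    a, b, c = 5, -2, 4
    d = a + b                       # d = 3
    e = b - c                       # e = -6
    y = d * e                       # y = -18

    # Backward pass: start from dy/dy = 1 and apply the chain rule
    dy_dy = 1
    dy_dd = e * dy_dy               # dy/dd = e = -6
    dy_de = d * dy_dy               # dy/de = d = 3
    dy_da = dy_dd * 1               # a only feeds d            -> -6
    dy_dc = dy_de * (-1)            # c only feeds e (as -c)    -> -3
    dy_db = dy_dd * 1 + dy_de * 1   # b feeds both d and e      -> -3

    print(y, dy_da, dy_db, dy_dc)   # -18 -6 -3 -3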
DIY Example

• g = (x + y)z
DIY Examples

https://ekababisong.org/gcp-ml-seminar/tensorflow/
Types of Neurons
Types of Neurons

Each takes inputs x weighted by {w0, w1, w2, w3, ⋯} and computes an output f(x, w):

• Linear Neuron
• Logistic Neuron
• Perceptron
• Potentially more. Require a convex loss function for gradient descent.

Slide Credit: HKUST
Logistic Neuron
Logistic Neuron

• These give a real-valued output that is a smooth and bounded function of their total input:

  z = w0 + ∑_j xj wj

  y = 1 / (1 + e^(−z))

• They have nice derivatives which make learning easy.
Key Computation: Forward-Prop

  x → [z = w0 + ∑_j wj xj] → z → [y = 1 / (1 + e^(−z))] → y
Key Computation: Forward-Prop

  x → [w] → z → y → E

  z = w0 + ∑_j wj xj

  y = 1 / (1 + e^(−z))

  E = (1/2)(t − y)^2

The backward pass will need ∂E/∂wj and ∂E/∂xj, built from the local gradients {∂z/∂wj, ∂z/∂xj}, ∂y/∂z, and ∂E/∂y.
Key Computation: Back-Prop

  x → [w] → z → y → E

  z = w0 + ∑_j wj xj,   y = 1 / (1 + e^(−z)),   E = (1/2)(t − y)^2

  ∂E/∂wj = (∂z/∂wj)(∂y/∂z)(∂E/∂y) = −xj y(1 − y)(t − y)

Local gradients along the path: {xj, wj} for z, y(1 − y) for the sigmoid, and −(t − y) for E; ∂E/∂xj uses the same chain with ∂z/∂xj = wj.
Derivation of Logistic Neuron

  y = 1 / (1 + e^(−z)) = (1 + e^(−z))^(−1)

  ∂y/∂z = −1 · (−e^(−z)) / (1 + e^(−z))^2 = [1 / (1 + e^(−z))] · [e^(−z) / (1 + e^(−z))] = y(1 − y)

  because  e^(−z) / (1 + e^(−z)) = [(1 + e^(−z)) − 1] / (1 + e^(−z)) = 1 − 1/(1 + e^(−z)) = 1 − y
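A quick numerical check of the identity ∂y/∂z = y(1 − y); a throwaway sketch with arbitrary test points:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    z = np.linspace(-4, 4, 9)
    y = sigmoid(z)

    analytic = y * (1 - y)                                        # y(1 - y)
    eps = 1e-6
    numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)   # central difference

    print(np.max(np.abs(analytic - numeric)))                     # ~1e-10, the identity holds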
Key Computation: Back-Prop

  x → [w] → z → y → E

  z = w0 + ∑_j wj xj,   y = 1 / (1 + e^(−z)),   E = (1/2) ∑_{i∈train} (t − y)^2

  ∂E/∂wj = ∑_i (∂z/∂wj)(∂y/∂z)(∂E/∂y) = −∑_i xj y(1 − y)(t − y)

Local gradients along the path: {xj, wj}, y(1 − y), and −∑_i (t − y).
Key Computation: Back-Prop

• Iterative Solution

  wj ← wj − η ∂E/∂wj

  wj ← wj + Δwj,  with  Δwj = −η ∂E/∂wj

  Δwj = η ∑_i xj^(i) y^(i)(1 − y^(i))(t^(i) − y^(i))

(figure: E plotted against w, with slope ∂E/∂wj)
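Putting the logistic neuron's forward pass and this update rule together. A small sketch on made-up, linearly separable data; the data, learning rate, and iteration count are assumptions:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(3)
    X = np.c_[np.ones(100), rng.uniform(-2, 2, (100, 2))]   # rows: {1, x1, x2}
    t = (X[:, 1] + X[:, 2] > 0).astype(float)               # targets in {0, 1}
    w = np.zeros(3)
    eta = 0.1                                                # learning rate (assumed)

    for _ in range(500):
        y = sigmoid(X @ w)                                   # forward pass
        dw = eta * X.T @ (y * (1 - y) * (t - y))             # Δwj = η Σ_i xj y(1-y)(t-y)
        w = w + dw                                           # wj ← wj + Δwj

    y = sigmoid(X @ w)
    print(np.mean((y > 0.5) == (t > 0.5)))                   # training accuracy, close to 1.0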
Activation Functions

(figure: sigmoid vs tanh)
Rectified Linear Units (ReLU)

• ReLU: max(0, z)   [Krizhevsky et al., NIPS12]
• Leaky ReLU: max(0.1z, z)

(C) Dhruv Batra
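For reference, the activations mentioned on the last two slides written as plain NumPy functions (a minimal sketch; the 0.1 leak factor follows the slide):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))   # smooth, bounded in (0, 1)

    def tanh(z):
        return np.tanh(z)                 # smooth, bounded in (-1, 1), zero-centered

    def relu(z):
        return np.maximum(0.0, z)         # ReLU: max(0, z)

    def leaky_relu(z):
        return np.maximum(0.1 * z, z)     # Leaky ReLU: max(0.1z, z)

    z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
    print(relu(z), leaky_relu(z))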
Multi-Layer Neural Networks
Perceptron

Example: w = [0.79, 0.96, 0.66]

Implements a linear function f(x1, x2) = w0 + w1 x1 + w2 x2; threshold it to get a linear decision boundary.

  1  →(w0)
  x1 →(w1)→ f(x1, x2)
  x2 →(w2)
AND Function

(figures only)
OR Function

With weights {w0, w1, w2} = {1, 1, 1} and inputs x1, x2 ∈ {−1, +1}, the weighted sum w0×1 + w1×x1 + w2×x2 for the four input combinations is:

  1×1 − 1×1 − 1×1 = −1
  1×1 − 1×1 + 1×1 = 1
  1×1 + 1×1 − 1×1 = 1
  1×1 + 1×1 + 1×1 = 3

Thresholding at zero gives the OR outputs −1, +1, +1, +1.
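These sums can be reproduced with a hard-threshold perceptron over ±1 inputs. A sketch; the weights (bias 1 for OR, bias −1 for AND, input weights 1) are the ones shown on the XOR slide below and are one valid choice, not the only one:

    def perceptron(x1, x2, w0, w1, w2):
        # hard threshold on w0*1 + w1*x1 + w2*x2, inputs and outputs in {-1, +1}
        z = w0 * 1 + w1 * x1 + w2 * x2
        return 1 if z > 0 else -1

    def OR(x1, x2):
        return perceptron(x1, x2, w0=1, w1=1, w2=1)    # weighted sums: -1, 1, 1, 3

    def AND(x1, x2):
        return perceptron(x1, x2, w0=-1, w1=1, w2=1)   # positive only for (+1, +1)

    for x1 in (-1, 1):
        for x2 in (-1, 1):
            print(x1, x2, "OR:", OR(x1, x2), "AND:", AND(x1, x2))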
What to do in case of a non-linear problem? XOR

Idea: Stack a bunch of them together.

Consider XOR:

  x1   x2   XOR(x1, x2)
  -1   -1      -1
  -1    1       1
   1   -1       1
   1    1      -1

No set of weights for a single perceptron can separate these classes.
XOR

XOR(x1, x2) = OR(AND(x1, ¬x2), AND(¬x1, x2))

  x1   x2   AND(x1, ¬x2)   AND(¬x1, x2)   OR(AND(x1, ¬x2), AND(¬x1, x2))
  -1   -1        -1              -1                    -1
  -1    1        -1               1                     1
   1   -1         1              -1                     1
   1    1        -1              -1                    -1
XOR

XOR(x1, x2) = OR(AND(x1, ¬x2), AND(¬x1, x2))

Perceptron weights {w0, w1, w2}:   OR: {1, 1, 1}    AND: {-1, 1, 1}

Stacked network:
  h1 = AND(x1, ¬x2):  bias -1, weight  1 from x1, weight -1 from x2
  h2 = AND(¬x1, x2):  bias -1, weight -1 from x1, weight  1 from x2
  XOR(x1, x2) = OR(h1, h2):  bias 1, weight 1 from h1, weight 1 from h2
XOR

XOR(x1, x2) = OR(AND(x1, ¬x2), AND(¬x1, x2))

(figures: decision boundaries of the OR, AND, and XOR units)

By combining two Perceptrons, we are able to create a non-linear decision boundary.
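The construction above written out with the slide's weights; a minimal sketch in which the threshold maps positive sums to +1 and everything else to −1:

    def step(z):
        return 1 if z > 0 else -1              # hard threshold over the ±1 encoding

    def xor(x1, x2):
        h1 = step(-1 + 1 * x1 - 1 * x2)        # h1 = AND(x1, ¬x2)
        h2 = step(-1 - 1 * x1 + 1 * x2)        # h2 = AND(¬x1, x2)
        return step(1 + 1 * h1 + 1 * h2)       # output = OR(h1, h2)

    for x1 in (-1, 1):
        for x2 in (-1, 1):
            print(x1, x2, "->", xor(x1, x2))   # -1 -1 -> -1, -1 1 -> 1, 1 -1 -> 1, 1 1 -> -1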
A Powerful Model

Yaser Abu Mostafa, Caltech
Demo:

http://playground.tensorflow.org/
Readings

• Gradient Descent
• Forward and Backward pass of Neural Network
• [Nikhil Buduma CH1] Neural, Perceptron, Regression, Logistic Regression
• [Nikhil Buduma CH2] Training feedforward NN, back prop., Gradient Descent
Multilayer Networks

Two inputs, one hidden layer, and one output layer:

  x1, x2 → h1, h2  (weights w1, w2, w3, w4; bias b1)
  h1, h2 → o1, o2  (weights w5, w6 into o1 and w7, w8 into o2; bias b2)

Each unit first computes its net input z and then its activation y; we name these zh1 → yh1, zh2 → yh2 for the hidden units and zo1 → yo1, zo2 → yo2 for the output units.
Key Computation: Back-Prop

  ∂E/∂w5 = ?

(same network as above)
Key Computation: Back-Prop

  ∂ET/∂w5 = (∂zo1/∂w5)(∂yo1/∂zo1)(∂Eo1/∂yo1 + ∂Eo2/∂yo1),   with ET = Eo1 + Eo2

          = yh1 · yo1(1 − yo1) · [−(to1 − yo1)]        (the ∂Eo2/∂yo1 term is zero)
Key Computation: Back-Prop

  ∂E/∂w1 = ?

(same network as above)
Key Computation: Back-Prop

  ∂E/∂w1 = (∂zh1/∂w1)(∂yh1/∂zh1)(∂E/∂yh1)

where, since yh1 feeds both outputs,

  ∂E/∂yh1 = ∂Eo1/∂yh1 + ∂Eo2/∂yh1
          = (∂zo1/∂yh1)(∂yo1/∂zo1)(∂Eo1/∂yo1) + (∂zo2/∂yh1)(∂yo2/∂zo2)(∂Eo2/∂yo2)
Key Computation: Back-Prop

  ∂E/∂w1 = (∂zh1/∂w1)(∂yh1/∂zh1)[(∂zo1/∂yh1)(∂yo1/∂zo1)(∂Eo1/∂yo1) + (∂zo2/∂yh1)(∂yo2/∂zo2)(∂Eo2/∂yo2)]

         = x1 · yh1(1 − yh1) · [−w5 yo1(1 − yo1)(to1 − yo1) − w7 yo2(1 − yo2)(to2 − yo2)]
Gradient Descent

• Δwj ← Initialize to ZEROS
• For each iteration
  • For each training example (x^(i), t^(i))
    • For each weight wj
      Δwj ← Δwj − η ∂E/∂wj,  e.g. for w1:
      Δw1 ← Δw1 + η x1 yh1(1 − yh1)[w5 yo1(1 − yo1)(to1 − yo1) + w7 yo2(1 − yo2)(to2 − yo2)]
  • For each weight wj
    wj ← wj + Δwj
Key Computation: Back-Prop

  ∂E/∂w1 = ?

Same network, now extended with a third output unit o3 (computing zo3 → yo3) connected to the hidden layer by weights w9 and w10.
Numerical Example
A Step by Step Backpropagation Example

https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/
Multilayer Networks

Recall the network: x1, x2 feed the hidden units (zh1 → yh1, zh2 → yh2) through weights w1–w4 and bias b1; the hidden units feed the outputs (zo1 → yo1, zo2 → yo2) through weights w5–w8 and bias b2.
A Step by Step Backpropagation Example

(figure: the example network)
A Step by Step Backpropagation Example

The goal of backpropagation is to optimize the weights so that the neural network can learn how to correctly map arbitrary inputs to outputs.

For the rest of this tutorial we're going to work with a single training set: given inputs 0.05 and 0.10, we want the neural network to output 0.01 and 0.99.
A Step by Step Backpropagation Example

The Forward Pass

Total net input for h1:  zh1 = w1 * x1 + w2 * x2 + b1 * 1

We then squash it using the logistic function to get the output of h1:

  yh1 = 1 / (1 + e^(−zh1))

Carrying out the same process for h2 we get zh2 and yh2.
A Step by Step Backpropagation Example

Total net input for o1:  zo1 = w5 * yh1 + w6 * yh2 + b2 * 1

  yo1 = 1 / (1 + e^(−zo1))

For o2 we get zo2 and yo2 in the same way.
A Step by Step Backpropagation Example

Calculating the Error

  Eo1 = (1/2)(to1 − yo1)^2

  ET = Eo1 + Eo2
A Step by Step Backpropagation Example

The Backward Pass - Output Layer

Consider w5. We want to know how much a change in w5 affects the total error, aka ∂ET/∂w5.

By applying the chain rule we know that:

  ∂ET/∂w5 = (∂zo1/∂w5)(∂yo1/∂zo1)(∂ET/∂yo1)

Visually, here's what we're doing:

  w5 → zo1 → yo1 → Eo1 = (1/2)(to1 − yo1)^2,   ET = Eo1 + Eo2
A Step by Step Backpropagation Example

The Backward Pass:  ∂ET/∂w5 = (∂zo1/∂w5)(∂yo1/∂zo1)(∂ET/∂yo1)

How much does the total error change w.r.t. the output?

  ET = (1/2)(to1 − yo1)^2 + (1/2)(to2 − yo2)^2

  ∂ET/∂yo1 = 2 · (1/2)(to1 − yo1)^(2−1) · (−1) + 0 = −(to1 − yo1)
A Step by Step Backpropagation Example

The Backward Pass:  ∂ET/∂w5 = (∂zo1/∂w5)(∂yo1/∂zo1)(∂ET/∂yo1)

How much does the output of o1 change w.r.t. its total net input?

  yo1 = 1 / (1 + e^(−zo1))

  ∂yo1/∂zo1 = yo1(1 − yo1)
A Step by Step Backpropagation Example

The Backward Pass:  ∂ET/∂w5 = (∂zo1/∂w5)(∂yo1/∂zo1)(∂ET/∂yo1)

How much does the total net input of o1 change with respect to w5?

  zo1 = w5 * yh1 + w6 * yh2 + b2 * 1

  ∂zo1/∂w5 = 1 * yh1 = 0.593269992
A Step by Step Backpropagation Example

The Backward Pass - Putting it all together:

  ∂ET/∂w5 = (∂zo1/∂w5)(∂yo1/∂zo1)(∂ET/∂yo1) = −(to1 − yo1) * yo1(1 − yo1) * yh1

  w5+ = w5 − η ∂ET/∂w5
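The forward pass and the w5 update above can be reproduced in a few lines of NumPy. The inputs and targets (0.05, 0.10 → 0.01, 0.99) come from the slides; the initial weights, biases, and learning rate η = 0.5 are the values used in the linked tutorial and should be treated as assumptions here:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    x1, x2 = 0.05, 0.10                       # inputs (from the slides)
    t_o1, t_o2 = 0.01, 0.99                   # targets (from the slides)
    w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30   # initial weights (assumed, as in the tutorial)
    w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
    b1, b2 = 0.35, 0.60
    eta = 0.5                                 # learning rate (assumed)

    # Forward pass
    y_h1 = sigmoid(w1 * x1 + w2 * x2 + b1)    # ≈ 0.593269992, matching the slide
    y_h2 = sigmoid(w3 * x1 + w4 * x2 + b1)
    y_o1 = sigmoid(w5 * y_h1 + w6 * y_h2 + b2)
    y_o2 = sigmoid(w7 * y_h1 + w8 * y_h2 + b2)
    E_total = 0.5 * (t_o1 - y_o1) ** 2 + 0.5 * (t_o2 - y_o2) ** 2

    # Backward pass for w5: dET/dw5 = (dzo1/dw5)(dyo1/dzo1)(dET/dyo1)
    dE_dw5 = y_h1 * y_o1 * (1 - y_o1) * (-(t_o1 - y_o1))
    w5_new = w5 - eta * dE_dw5

    print(y_h1, E_total, dE_dw5, w5_new)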
A Step by Step Backpropagation Example

The Backward Pass

We can repeat this process to get the new weights w6, w7, and w8.
A Step by Step Backpropagation Example

The Backward Pass - Hidden Layer

Continue the backwards pass by calculating new values for w1, w2, w3, and w4:

  ∂ET/∂w1 = (∂zh1/∂w1)(∂yh1/∂zh1)(∂ET/∂yh1)

  ∂ET/∂yh1 = ∂Eo1/∂yh1 + ∂Eo2/∂yh1,   since ET = Eo1 + Eo2
A Step by Step Backpropagation Example

The Backward Pass

  ∂Eo1/∂yh1 = (∂zo1/∂yh1)(∂yo1/∂zo1)(∂Eo1/∂yo1) = −w5 yo1(1 − yo1)(to1 − yo1)

Following the same process for ∂Eo2/∂yh1, we get:

  ∂Eo2/∂yh1 = −w7 yo2(1 − yo2)(to2 − yo2)

Therefore:  ∂ET/∂yh1 = ∂Eo1/∂yh1 + ∂Eo2/∂yh1
A Step by Step Backpropagation Example

The Backward Pass - Hidden Layer

  ∂ET/∂w1 = (∂zh1/∂w1)(∂yh1/∂zh1)(∂ET/∂yh1)

  zh1 = w1 * x1 + w2 * x2 + b1 * 1,   so  ∂zh1/∂w1 = x1

  yh1 = 1 / (1 + e^(−zh1)),   so  ∂yh1/∂zh1 = yh1(1 − yh1)

Combining the three factors gives ∂ET/∂w1; similarly for w2, w3, and w4.
Take Home - Written Assignment

Derive the formula for:

  ∂E/∂w2 = ?    ∂E/∂w6 = ?
  ∂E/∂w3 = ?    ∂E/∂w7 = ?
  ∂E/∂w4 = ?    ∂E/∂w8 = ?

Find the value for:

  Δw2, Δw3, Δw4
  Δw6, Δw7, Δw8
Readings

• Handwritten Notes on LMS
• Linear Algebra Background
  • https://www.deeplearningbook.org/contents/linear_algebra.html
• Probability background
  • https://www.deeplearningbook.org/contents/prob.html
• ML Background
  • https://www.deeplearningbook.org/contents/ml.html
