Networks Review
Deep Learning
CS427/CS5310/EE414/EE513
By Murtaza Taj
Derivatives
• Power rule: f(x) = x^n  ⟹  d(f(x))/dx = n x^(n−1) d(x)/dx = n x^(n−1)
  Example: f(x) = x^3  ⟹  d(f(x))/dx = 3x^(3−1) d(x)/dx = 3x^2
• Product rule: f(x, y) = x^n y^m  ⟹  f′(x, y) = x^n ∂(y^m)/∂y + y^m ∂(x^n)/∂x
  Likewise f(x, y) = x^m / y^n = x^m y^(−n)
• Chain rule: z = x^m, y = 1 + z^n  ⟹  ∂y/∂x = (∂z/∂x)(∂y/∂z) = (m x^(m−1))(n z^(n−1))
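These rules can be sanity-checked symbolically; a minimal sketch with sympy (my own illustration, not part of the slides):

```python
import sympy as sp

x, n, m = sp.symbols('x n m', positive=True)

# Power rule: d(x^n)/dx − n·x^(n−1) should simplify to 0
print(sp.simplify(sp.diff(x**n, x) - n * x**(n - 1)))

# Concrete case: d(x^3)/dx = 3x^2
print(sp.diff(x**3, x))

# Chain rule: z = x^m, y = 1 + z^n  =>  dy/dx = (m·x^(m−1))·(n·z^(n−1))
z = x**m
y = 1 + z**n
print(sp.simplify(sp.diff(y, x) - (m * x**(m - 1)) * (n * z**(n - 1))))
```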
Biological Neuron

w^T = {w0, w1, w2, ⋯}
x^T = {1, x1, x2, ⋯}
Analytical vs. Iterative Solution

y = w^T x
Key Computation: Forward Pass

y = w^T x
E = (1/n) Σ_{i∈train} (t^(i) − y^(i))²
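A minimal sketch of this forward pass in NumPy (the data and weights are made-up illustrative values; the first input column is the constant 1 that multiplies the bias weight w0):

```python
import numpy as np

X = np.array([[1.0, 0.5, 1.5],    # each row is one example; first column is the bias input 1
              [1.0, 2.0, -1.0]])
t = np.array([1.0, 0.0])          # targets t^(i)
w = np.array([0.1, 0.2, -0.3])    # weights {w0, w1, w2}

y = X @ w                         # forward pass: y^(i) = w^T x^(i)
E = np.mean((t - y) ** 2)         # E = (1/n) * sum_i (t^(i) - y^(i))^2
print(y, E)
```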
Key Computation: Forward/Backward Pass

y^(i) = w^T x^(i) = Σ_j w_j x_j^(i)
E = (1/n) Σ_{i∈train} (t^(i) − y^(i))²

∂E/∂w_j = Σ_i (∂y^(i)/∂w_j)(∂E/∂y^(i)) = −(2/n) Σ_{i∈train} (t^(i) − y^(i)) x_j^(i)
∂E/∂x_j = −(2/n) Σ_{i∈train} (t^(i) − y^(i)) w_j
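A sketch of the weight gradient above in NumPy, checked against a central-difference estimate (same made-up data as the forward-pass sketch):

```python
import numpy as np

X = np.array([[1.0, 0.5, 1.5],
              [1.0, 2.0, -1.0]])
t = np.array([1.0, 0.0])
w = np.array([0.1, 0.2, -0.3])

def error(w):
    y = X @ w
    return np.mean((t - y) ** 2)

# Analytic gradient from the slide: dE/dw_j = -(2/n) * sum_i (t^(i) - y^(i)) x_j^(i)
y = X @ w
grad = -(2.0 / len(t)) * X.T @ (t - y)

# Numerical check with central differences
eps = 1e-6
num_grad = np.array([(error(w + eps * e) - error(w - eps * e)) / (2 * eps)
                     for e in np.eye(len(w))])
print(grad, num_grad)   # the two should agree closely
```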
Gradient Descent

• Δw_j ← initialize to zeros
• For each iteration
  • For each training example (x^(i), t^(i))
    • For each weight w_j
      Δw_j ← Δw_j + η x_j^(i) (t^(i) − y^(i))
  • For each weight w_j
    w_j ← w_j + Δw_j

Over one pass this accumulates Δw_j = η Σ_i x_j^(i) (t^(i) − y^(i)).
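A minimal sketch of this loop in Python (the data, learning rate η, and iteration count are made-up illustrative choices):

```python
import numpy as np

X = np.array([[1.0, 0.5, 1.5],
              [1.0, 2.0, -1.0],
              [1.0, -1.0, 0.5]])
t = np.array([1.0, 0.0, 0.5])
w = np.zeros(X.shape[1])
eta = 0.05

for _ in range(200):                      # "for each iteration"
    dw = np.zeros_like(w)                 # Δw initialized to zeros
    for x_i, t_i in zip(X, t):            # "for each training example"
        y_i = w @ x_i
        dw += eta * x_i * (t_i - y_i)     # Δw_j ← Δw_j + η x_j^(i) (t^(i) − y^(i))
    w = w + dw                            # w_j ← w_j + Δw_j
print(w)
```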
Gradient Descent

y = (a + b) × (b − c)
y = ab − ac + b² − bc
Computation Graph

y = (a + b) × (b − c)
y = ab − ac + b² − bc

• We want to calculate the gradient w.r.t. a, b, c:
  ∂y/∂a, ∂y/∂b, ∂y/∂c
• For example, ∂y/∂a = b − c
Computation Graph

y = (a + b) × (b − c)
y = ab − ac + b² − bc

[Graph: a and b feed a "+" node, b and c feed a "−" node, and the two results feed a "×" node that produces y.]
Computation Graph - Forward Pass

• Let us consider a = 5, b = −2, c = 4
y = (a + b) × (b − c)
d = a + b = 5 − 2 = 3
e = b − c = −2 − 4 = −6
Computation Graph - Forward Pass

• Let us consider a = 5, b = −2, c = 4
y = (a + b) × (b − c)
d = a + b = 5 − 2 = 3
e = b − c = −2 − 4 = −6
y = d × e = 3 × (−6) = −18
Computation Graph - Backward Pass

• We start the backward pass by finding the derivative of the final output with respect to the final output (itself!):
∂y/∂y = 1
Computation Graph - Backward Pass

• Now, given y = d × e:
∂y/∂d = e = b − c = −6
∂y/∂e = d = a + b = 3
Computation Graph - Backward Pass

d = a + b,  e = b − c,  y = d × e = (a + b) × (b − c)
∂y/∂d = e = −6
∂y/∂e = d = 3
Computation Graph - Backward Pass

• From the chain rule we have ∂y/∂c = (∂y/∂e)(∂e/∂c) = d × (−1) = −3

y = (a + b) × (b − c)
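The whole forward/backward walkthrough above fits in a few lines of Python; a minimal sketch (the ∂y/∂b branch is not derived on the slides, but it follows the same chain rule with the two paths through d and e added):

```python
# Manual forward and backward pass over the graph y = (a + b) * (b - c),
# with the same values a = 5, b = -2, c = 4.
a, b, c = 5.0, -2.0, 4.0

# Forward pass
d = a + b            # d = 3
e = b - c            # e = -6
y = d * e            # y = -18

# Backward pass (chain rule, from the output back to the inputs)
dy_dd = e                           # ∂y/∂d = e = -6
dy_de = d                           # ∂y/∂e = d = 3
dy_da = dy_dd * 1.0                 # ∂y/∂a = ∂y/∂d · ∂d/∂a = -6
dy_db = dy_dd * 1.0 + dy_de * 1.0   # b feeds both d and e, so the two paths add: e + d = -3
dy_dc = dy_de * (-1.0)              # ∂y/∂c = ∂y/∂e · ∂e/∂c = d · (-1) = -3

print(y, dy_da, dy_db, dy_dc)
```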
Computation Graph - Backward Pass

• g = (x + y) z
DIY Examples

https://ekababisong.org/gcp-ml-seminar/tensorflow/
Types of Neurons

Each neuron computes f(x, w) from inputs weighted by w0, w1, w2, w3:
• Linear Neuron
• Logistic Neuron
• Perceptron
• Potentially more. Require a convex loss function for gradient descent.

Slide Credit: HKUST
Logistic Neuron

z = w0 + Σ_j w_j x_j
y = 1 / (1 + e^(−z))
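A minimal sketch of this neuron in Python (the weights and inputs are made-up illustrative values):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w0 = 0.1                       # bias weight
w = np.array([0.4, -0.6])      # input weights
x = np.array([1.5, 2.0])       # inputs

z = w0 + w @ x                 # z = w0 + sum_j w_j x_j
y = sigmoid(z)                 # y = 1 / (1 + e^(-z))
print(z, y)
```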
Key Computation: Forward-Prop

z = w0 + Σ_j w_j x_j
y = 1 / (1 + e^(−z))
E = (1/2) (t − y)²

The backward pass needs ∂E/∂w_j and ∂E/∂x_j, built from {∂z/∂w_j, ∂z/∂x_j}, ∂y/∂z, and ∂E/∂y.
Key Computation: Back-Prop

z = w0 + Σ_j w_j x_j
y = 1 / (1 + e^(−z))
E = (1/2) (t − y)²

∂E/∂w_j = (∂z/∂w_j)(∂y/∂z)(∂E/∂y) = −x_j y(1 − y)(t − y)
∂E/∂x_j follows the same chain with ∂z/∂x_j = w_j, giving −w_j y(1 − y)(t − y)
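A sketch of this backward pass for a single training example in Python (values are made-up; the chain of factors matches the slide):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w0, w = 0.1, np.array([0.4, -0.6])
x = np.array([1.5, 2.0])
t = 1.0

z = w0 + w @ x
y = sigmoid(z)
E = 0.5 * (t - y) ** 2

dE_dy = -(t - y)                # from E = 1/2 (t - y)^2
dy_dz = y * (1 - y)             # sigmoid derivative (derived on the next slide)
dE_dw = x * dy_dz * dE_dy       # = -x_j * y(1 - y) * (t - y)
dE_dw0 = 1.0 * dy_dz * dE_dy    # bias weight: dz/dw0 = 1
print(E, dE_dw, dE_dw0)
```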
Derivation of Logistic Neuron

y = 1 / (1 + e^(−z)) = (1 + e^(−z))^(−1)

∂y/∂z = −1 · (−e^(−z)) / (1 + e^(−z))² = [1 / (1 + e^(−z))] · [e^(−z) / (1 + e^(−z))] = y(1 − y)

because e^(−z) / (1 + e^(−z)) = [(1 + e^(−z)) − 1] / (1 + e^(−z)) = 1 − 1 / (1 + e^(−z)) = 1 − y
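A quick numerical check of ∂y/∂z = y(1 − y) using central differences (an illustration, not from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

eps = 1e-6
for z in [-2.0, 0.0, 1.5]:
    numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
    analytic = sigmoid(z) * (1 - sigmoid(z))
    print(z, numeric, analytic)   # the two columns should match to about 6 decimals
```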
Key Computation: Back-Prop

z = w0 + Σ_j w_j x_j
y = 1 / (1 + e^(−z))
E = (1/2) Σ_{i∈train} (t − y)²

∂E/∂w_j = Σ_i (∂z/∂w_j)(∂y/∂z)(∂E/∂y) = −Σ_i x_j y(1 − y)(t − y)
Key Computation: Back-Prop

• Iterative Solution (gradient descent on E):
w_j ← w_j + Δw_j, where Δw_j = −η ∂E/∂w_j
i.e. w_j ← w_j − η ∂E/∂w_j
sigmoid vs tanh
Rectified Linear Units (ReLU)

Perceptron
Example: w = [0.79, 0.96, 0.66]
Implements a linear function f(x1, x2).
AND Function
OR Function

With weights [1, 1, 1] (bias weight first) on ±1 inputs, the weighted sum w0·1 + w1·x1 + w2·x2 gives:
• (−1, −1): 1 × 1 − 1 × 1 − 1 × 1 = −1
• (−1, 1): 1 × 1 − 1 × 1 + 1 × 1 = 1
• (1, −1): 1 × 1 + 1 × 1 − 1 × 1 = 1
• (1, 1): 1 × 1 + 1 × 1 + 1 × 1 = 3

The sign of the sum reproduces OR.
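A sketch of these perceptrons in Python on ±1 inputs. The OR weights (bias 1, input weights 1) are the ones implied by the computations above; the AND weights (bias −1, input weights 1) are my own choice for illustration:

```python
import numpy as np

def perceptron(w0, w, x):
    """Threshold unit: +1 if the weighted sum is positive, else -1."""
    return 1 if w0 + np.dot(w, x) > 0 else -1

inputs = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
for x in inputs:
    print(x,
          "OR:", perceptron(1.0, [1.0, 1.0], x),
          "AND:", perceptron(-1.0, [1.0, 1.0], x))
```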
What to do in case of a non-linear problem? XOR

No set of weights can separate these classes.

XOR Function

Consider XOR:
x1   x2   XOR(x1, x2)
-1   -1   -1
-1    1    1
 1   -1    1
 1    1   -1

No set of weights can separate these classes.
XOR

XOR(x1, x2) = OR(AND(x1, ¬x2), AND(¬x1, x2))

x1   x2   AND(x1, ¬x2)   AND(¬x1, x2)   OR(AND(x1, ¬x2), AND(¬x1, x2))
-1   -1        -1             -1                 -1
-1    1        -1              1                  1
 1   -1         1             -1                  1
 1    1        -1             -1                 -1
XOR

[Network diagram: x1 and x2 feed two hidden perceptrons, h1 = AND(x1, ¬x2) and h2 = AND(¬x1, x2), whose outputs feed a final perceptron computing XOR(x1, x2) = OR(h1, h2); the weights shown are ±1.]
XOR

By combining two Perceptrons, we are able to create a non-linear decision boundary.
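A sketch of this two-layer XOR network in Python on ±1 inputs; the weight values are a natural choice for the AND/OR decomposition above, not necessarily the exact numbers on the slide's figure:

```python
import numpy as np

def unit(w0, w, x):
    """Threshold perceptron: +1 if the weighted sum is positive, else -1."""
    return 1 if w0 + np.dot(w, x) > 0 else -1

def xor_net(x1, x2):
    h1 = unit(-1.0, [1.0, -1.0], [x1, x2])   # h1 = AND(x1, ¬x2)
    h2 = unit(-1.0, [-1.0, 1.0], [x1, x2])   # h2 = AND(¬x1, x2)
    return unit(1.0, [1.0, 1.0], [h1, h2])   # XOR(x1, x2) = OR(h1, h2)

for x1, x2 in [(-1, -1), (-1, 1), (1, -1), (1, 1)]:
    print(x1, x2, xor_net(x1, x2))
```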
A Powerful Model

http://playground.tensorflow.org/
Readings
• Gradient Descent
• Forward and Backward pass of Neural Network
• [Nikhil Buduma CH1] Neural, Perceptron, Regression, Logistic Regression
• [Nikhil Buduma CH2] Training feedforward NN, back prop., Gradient Descent
Multilayer Networks

[Network diagram: inputs x1, x2 connect to hidden units h1, h2 through weights w1–w4 and bias b1; h1, h2 connect to outputs o1, o2 through weights w5–w8 and bias b2. Each unit computes a pre-activation z and an activation y, written z_h1, y_h1, z_h2, y_h2 for the hidden layer and z_o1, y_o1, z_o2, y_o2 for the output layer.]
Key Computation: Back-Prop

∂E/∂w5 = ?
Key Computation: Back-Prop

∂E_T/∂w5 = (∂z_o1/∂w5)(∂y_o1/∂z_o1)(∂E_T/∂y_o1) = y_h1 · y_o1(1 − y_o1) · (−(t_o1 − y_o1))
Key Computation: Back-Prop

∂E/∂w1 = ?
Key Computation: Back-Prop

∂E/∂w1 = (∂z_h1/∂w1)(∂y_h1/∂z_h1)(∂E/∂y_h1)

where ∂E/∂y_h1 collects two paths through the output layer:
(∂z_o1/∂y_h1)(∂y_o1/∂z_o1)(∂E_o1/∂y_o1) and (∂z_o2/∂y_h1)(∂y_o2/∂z_o2)(∂E_o2/∂y_o2)
Key Computation: Back-Prop

∂E/∂w1 = (∂z_h1/∂w1)(∂y_h1/∂z_h1)(∂E/∂y_h1)

∂E/∂y_h1 = (∂z_o1/∂y_h1)(∂y_o1/∂z_o1)(∂E_o1/∂y_o1) + (∂z_o2/∂y_h1)(∂y_o2/∂z_o2)(∂E_o2/∂y_o2)

[If the network had a third output o3 connected to h1 and h2 through w9 and w10, a third term (∂z_o3/∂y_h1)(∂y_o3/∂z_o3)(∂E_o3/∂y_o3) would be added in the same way.]
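A sketch of these two chains on the 2-2-2 network in Python (sigmoid units, E_T = E_o1 + E_o2); all numbers are made-up illustrative values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x1, x2 = 0.5, 0.3
t_o1, t_o2 = 1.0, 0.0
w1, w2, w3, w4 = 0.1, 0.2, 0.3, 0.4      # input -> hidden
w5, w6, w7, w8 = 0.5, 0.6, 0.7, 0.8      # hidden -> output
b1, b2 = 0.1, 0.1

# Forward pass
z_h1 = w1*x1 + w2*x2 + b1; y_h1 = sigmoid(z_h1)
z_h2 = w3*x1 + w4*x2 + b1; y_h2 = sigmoid(z_h2)
z_o1 = w5*y_h1 + w6*y_h2 + b2; y_o1 = sigmoid(z_o1)
z_o2 = w7*y_h1 + w8*y_h2 + b2; y_o2 = sigmoid(z_o2)

# dE/dw5 = (dz_o1/dw5)(dy_o1/dz_o1)(dE/dy_o1)
dE_dw5 = y_h1 * y_o1*(1 - y_o1) * (-(t_o1 - y_o1))

# dE/dw1 = (dz_h1/dw1)(dy_h1/dz_h1)(dE/dy_h1),
# where dE/dy_h1 adds the contributions through o1 and o2
dE_dyh1 = (w5 * y_o1*(1 - y_o1) * (-(t_o1 - y_o1))
           + w7 * y_o2*(1 - y_o2) * (-(t_o2 - y_o2)))
dE_dw1 = x1 * y_h1*(1 - y_h1) * dE_dyh1

print(dE_dw5, dE_dw1)
```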
Numerical Example

A Step by Step Backpropagation Example

https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/
Multilayer Networks

[Recap of the same 2-2-2 network: inputs x1, x2 → hidden h1, h2 (weights w1–w4, bias b1) → outputs o1, o2 (weights w5–w8, bias b2), with pre-activations z and activations y at each unit.]
A Step by Step Backpropagation Example
A Step by Step Backpropagation Example

For the rest of this tutorial we're going to work with a single training set: given inputs 0.05 and 0.10, we want the neural network to output 0.01 and 0.99.
A Step by Step Backpropagation Example

y_o1 = 1 / (1 + e^(−z_o1))

For o2 we get y_o2 in the same way.
A Step by Step Backpropagation Example

E_o1 = (1/2) (t_o1 − y_o1)²
A Step by Step Backpropagation Example

z_o1 → y_o1 → E_o1 = (1/2) (t_o1 − y_o1)²

E_T = E_o1 + E_o2
A Step by Step Backpropagation Example
A Step by Step Backpropagation Example

E_T = E_o1 + E_o2
A Step by Step Backpropagation Example

Following the same process for ∂E_o2/∂y_h1, we get:

∂E_T/∂w1 = (∂z_h1/∂w1)(∂y_h1/∂z_h1)(∂E_T/∂y_h1)

Similarly, the same chain gives the gradients for the other weights.
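A numerical cross-check of the hand-derived chain for ∂E_T/∂w1 in Python, using the inputs (0.05, 0.10) and targets (0.01, 0.99) stated above; the weights and biases below are placeholder values (the linked post uses its own initial weights):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.05, 0.10])
t = np.array([0.01, 0.99])
b1, b2 = 0.35, 0.60
W_h = np.array([[0.15, 0.20],   # [w1, w2] -> h1
                [0.25, 0.30]])  # [w3, w4] -> h2
W_o = np.array([[0.40, 0.45],   # [w5, w6] -> o1
                [0.50, 0.55]])  # [w7, w8] -> o2

def total_error(W_h, W_o):
    y_h = sigmoid(W_h @ x + b1)
    y_o = sigmoid(W_o @ y_h + b2)
    return 0.5 * np.sum((t - y_o) ** 2)

# Analytic dE_T/dw1 via the chain from the previous slides
y_h = sigmoid(W_h @ x + b1)
y_o = sigmoid(W_o @ y_h + b2)
dE_dyh1 = np.sum(W_o[:, 0] * y_o * (1 - y_o) * (-(t - y_o)))
dE_dw1 = x[0] * y_h[0] * (1 - y_h[0]) * dE_dyh1

# Finite-difference check on w1
eps = 1e-6
W_plus = W_h.copy();  W_plus[0, 0] += eps
W_minus = W_h.copy(); W_minus[0, 0] -= eps
numeric = (total_error(W_plus, W_o) - total_error(W_minus, W_o)) / (2 * eps)
print(dE_dw1, numeric)   # should agree closely
```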
Take Home - Written Assignment

Derive the formula for:
∂E/∂w2 = ?    ∂E/∂w6 = ?
∂E/∂w3 = ?    ∂E/∂w7 = ?
∂E/∂w4 = ?    ∂E/∂w8 = ?
Readings