Session 4
Pierre Michel
[email protected]
M2 EBDS
2021
“Your ML methods seem pretty good but...”
1. Deep Learning
What is a neuron?
A neuron takes inputs $x_1, \dots, x_n$ (here $n = 3$), computes the weighted sum $z = \sum_{j} W_j x_j + b$, and outputs $h_{W,b}(x) = f(z)$, where $f$ is an activation function.
[Figure: a single neuron with inputs $x_1, x_2, x_3$ and output $h_{W,b}(x)$]
Activation functions
Note: a neuron using the rectified linear function is called a rectified linear unit (ReLU).
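To make this concrete, here is a minimal NumPy sketch (not from the slides) of a single ReLU neuron; the input and weight values are arbitrary:

```python
import numpy as np

def relu(z):
    # rectified linear activation: f(z) = max(0, z)
    return np.maximum(0.0, z)

x = np.array([0.5, -1.0, 2.0])   # inputs x1, x2, x3 (arbitrary values)
W = np.array([0.1, 0.4, 0.3])    # one weight per input (arbitrary values)
b = 0.2                          # bias term

h = relu(W @ x + b)              # h_{W,b}(x) = f(W.x + b)
print(h)
```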
Activation functions
[Plot: the sigmoid, tanh, and rectified linear activation functions $f(z)$ over $z \in [-2, 2]$]
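A small sketch reproducing the three plotted functions in NumPy (the function names are mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    return np.tanh(z)

def rectified_linear(z):
    return np.maximum(0.0, z)

z = np.linspace(-2, 2, 9)
for f in (sigmoid, tanh, rectified_linear):
    print(f.__name__, np.round(f(z), 3))
```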
[Figure 3: a network with input layer $x_1, x_2, x_3$ (Layer 1), a hidden layer with activations $a_1^{(2)}, a_2^{(2)}, a_3^{(2)}$ (Layer 2), an output layer producing $h_{W,b}(x)$ (Layer 3), and a bias unit ($+1$) in Layers 1 and 2]
Notations
Let $z_i^{(l)}$ be the weighted sum of inputs to neuron $i$ in layer $l$; for example:
$$z_i^{(2)} = \sum_{j=1}^{3} W_{ij}^{(1)} x_j + b_i^{(1)}$$
Moreover, we have:
$$a_i^{(l)} = f(z_i^{(l)}), \qquad \text{or in vector form: } a^{(2)} = f(z^{(2)})$$
Forward propagation
$$z^{(l+1)} = W^{(l)} a^{(l)} + b^{(l)}, \qquad a^{(l+1)} = f(z^{(l+1)})$$
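A minimal NumPy sketch of forward propagation, assuming sigmoid activations and the 3-3-1 architecture of Figure 3 (the parameter values are random placeholders):

```python
import numpy as np

def sigmoid(z):
    # assumption: sigmoid activations (the slides also mention tanh and ReLU)
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Forward propagation: z^{(l+1)} = W^{(l)} a^{(l)} + b^{(l)},
    then a^{(l+1)} = f(z^{(l+1)})."""
    a = x
    for W, b in zip(weights, biases):
        z = W @ a + b      # weighted sum of inputs plus bias
        a = sigmoid(z)     # activations of the next layer
    return a               # the network output h_{W,b}(x)

# Example with the 3-3-1 architecture of Figure 3, random parameters:
rng = np.random.default_rng(0)
weights = [rng.normal(0, 0.01, (3, 3)), rng.normal(0, 0.01, (1, 3))]
biases = [rng.normal(0, 0.01, 3), rng.normal(0, 0.01, 1)]
print(forward(np.array([1.0, 2.0, 3.0]), weights, biases))
```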
Different architectures
[Figure: a deeper architecture with inputs $x_1, x_2, x_3$, two hidden layers (Layers 2 and 3), an output layer producing $h_{W,b}(x)$ (Layer 4), and bias units ($+1$) in Layers 1–3]
Cost function
For a single training example $(x, y)$, we define the cost:
$$J(W, b; x, y) = \frac{1}{2} \left\| h_{W,b}(x) - y \right\|^2$$
We thus define, for the $m$ examples, the overall cost function:
$$J(W, b) = \left[ \frac{1}{m} \sum_{i=1}^{m} J(W, b; x^{(i)}, y^{(i)}) \right] + \frac{\lambda}{2} \sum_{l=1}^{n_l - 1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \left( W_{ji}^{(l)} \right)^2$$
$$\phantom{J(W, b)} = \left[ \frac{1}{m} \sum_{i=1}^{m} \frac{1}{2} \left\| h_{W,b}(x^{(i)}) - y^{(i)} \right\|^2 \right] + \frac{\lambda}{2} \sum_{l=1}^{n_l - 1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \left( W_{ji}^{(l)} \right)^2$$
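A sketch of this overall cost in NumPy, assuming the predictions $h_{W,b}(x^{(i)})$ have already been computed (the helper name `cost` and the example values are mine):

```python
import numpy as np

def cost(predictions, targets, weights, lam):
    """Overall cost J(W, b): average squared error over the m examples,
    plus the weight-decay term (lambda/2) * sum of squared weights."""
    m = len(targets)
    data_term = (1.0 / m) * sum(0.5 * np.sum((p - y) ** 2)
                                for p, y in zip(predictions, targets))
    decay_term = 0.5 * lam * sum(np.sum(W ** 2) for W in weights)
    return data_term + decay_term

preds = [np.array([0.9]), np.array([0.2])]   # h_{W,b}(x^{(i)}) for m = 2 examples
ys = [np.array([1.0]), np.array([0.0])]      # targets y^{(i)}
Ws = [np.ones((3, 3)), np.ones((1, 3))]      # dummy weight matrices
print(cost(preds, ys, Ws, lam=0.01))
```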
Random initialization
Parameters are initialized to small random values near zero:
$$W_{ij}^{(l)} \sim \mathcal{N}(0, \epsilon^2), \qquad b_i^{(l)} \sim \mathcal{N}(0, \epsilon^2)$$
for some small $\epsilon$.
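A sketch of this initialization in NumPy, assuming $\epsilon = 0.01$ and the layer sizes of Figure 3:

```python
import numpy as np

rng = np.random.default_rng(42)
eps = 0.01                       # assumption: a typical small standard deviation
layer_sizes = [3, 3, 1]          # s_1, s_2, s_3 as in Figure 3

# W^{(l)} has shape (s_{l+1}, s_l); b^{(l)} has shape (s_{l+1},)
weights = [rng.normal(0.0, eps, (s_next, s))
           for s, s_next in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [rng.normal(0.0, eps, s_next) for s_next in layer_sizes[1:]]
```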
Gradient descent
Here is one iteration of gradient descent:
$$W_{ij}^{(l)} = W_{ij}^{(l)} - \alpha \frac{\partial}{\partial W_{ij}^{(l)}} J(W, b)$$
$$b_i^{(l)} = b_i^{(l)} - \alpha \frac{\partial}{\partial b_i^{(l)}} J(W, b)$$
with
$$\frac{\partial}{\partial W_{ij}^{(l)}} J(W, b) = \left[ \frac{1}{m} \sum_{i=1}^{m} \frac{\partial}{\partial W_{ij}^{(l)}} J(W, b; x^{(i)}, y^{(i)}) \right] + \lambda W_{ij}^{(l)}$$
$$\frac{\partial}{\partial b_i^{(l)}} J(W, b) = \frac{1}{m} \sum_{i=1}^{m} \frac{\partial}{\partial b_i^{(l)}} J(W, b; x^{(i)}, y^{(i)})$$
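A sketch of one such iteration in NumPy, assuming the gradients summed over the $m$ examples are already available (the function name is mine):

```python
import numpy as np

def gradient_step(weights, biases, delta_W, delta_b, alpha, lam, m):
    """One iteration of gradient descent. delta_W[l] and delta_b[l] hold
    the gradients summed over the m examples (computed by backpropagation)."""
    for l in range(len(weights)):
        weights[l] -= alpha * (delta_W[l] / m + lam * weights[l])
        biases[l] -= alpha * (delta_b[l] / m)
    return weights, biases
```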
1.3. Training a neural network
Backpropagation algorithm
• Feedforward pass: compute all the activations in layers $L_2, \dots, L_{n_l}$.
• Output layer: for each neuron $i$ in layer $n_l$, set
$$\delta_i^{(n_l)} = \frac{\partial}{\partial z_i^{(n_l)}} \frac{1}{2} \left\| y - h_{W,b}(x) \right\|^2 = -(y_i - a_i^{(n_l)}) \, f'(z_i^{(n_l)})$$
• Hidden layers: for each hidden layer, $l = n_l - 1, n_l - 2, \dots, 2$, set
$$\delta^{(l)} = \left( (W^{(l)})^T \delta^{(l+1)} \right) \bullet f'(z^{(l)})$$
where $\bullet$ denotes the element-wise product.
• Updates: with $\Delta W^{(l)}$ and $\Delta b^{(l)}$ accumulating the per-example gradients $\delta^{(l+1)} (a^{(l)})^T$ and $\delta^{(l+1)}$ over the $m$ examples, update:
$$W^{(l)} = W^{(l)} - \alpha \left[ \frac{1}{m} \Delta W^{(l)} + \lambda W^{(l)} \right]$$
$$b^{(l)} = b^{(l)} - \alpha \left[ \frac{1}{m} \Delta b^{(l)} \right]$$
Loop over the training set a given number of times (each pass is called an epoch).
Note: by default, the batch size equals $m$; it can be reduced to some $q < m$, so that each update uses only $q$ examples.
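Putting the pieces together, here is a NumPy sketch of one feedforward/backpropagation pass for a single example, assuming sigmoid activations and the squared-error cost (names and shapes are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def backprop(x, y, weights, biases):
    """One feedforward/backpropagation pass for a single example (x, y)."""
    # Feedforward pass: store z^{(l)} and a^{(l)} for every layer.
    a, zs, activations = x, [], [x]
    for W, b in zip(weights, biases):
        z = W @ a + b
        zs.append(z)
        a = sigmoid(z)
        activations.append(a)
    # Output layer: delta^{(n_l)} = -(y - a^{(n_l)}) * f'(z^{(n_l)}).
    delta = -(y - activations[-1]) * sigmoid_prime(zs[-1])
    grad_W = [np.outer(delta, activations[-2])]
    grad_b = [delta]
    # Hidden layers: delta^{(l)} = (W^{(l)T} delta^{(l+1)}) * f'(z^{(l)}).
    for l in range(len(weights) - 2, -1, -1):
        delta = (weights[l + 1].T @ delta) * sigmoid_prime(zs[l])
        grad_W.insert(0, np.outer(delta, activations[l]))
        grad_b.insert(0, delta)
    return grad_W, grad_b

# Example with the 3-3-1 architecture of Figure 3, random parameters:
rng = np.random.default_rng(0)
weights = [rng.normal(0, 0.01, (3, 3)), rng.normal(0, 0.01, (1, 3))]
biases = [rng.normal(0, 0.01, 3), rng.normal(0, 0.01, 1)]
gW, gb = backprop(np.array([1.0, 0.5, -0.5]), np.array([1.0]), weights, biases)
```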
1.4. Deep learning in Python
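The slides' own code is not reproduced here; as a minimal sketch, a network like those above can be trained in Python with scikit-learn's MLPClassifier (the dataset and hyperparameters below are illustrative choices, not the course's):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Two hidden layers of 10 ReLUs; alpha is the weight-decay (lambda) penalty.
clf = MLPClassifier(hidden_layer_sizes=(10, 10), activation="relu",
                    alpha=1e-4, max_iter=1000, random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))   # accuracy on held-out data
```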