Lecture 04: Back Propagation
Lecturer: Hongpu Liu, PyTorch Tutorial @ SLAM Research Group
Compute the gradient in a simple network
A neuron as a linear model

Linear model: $\hat{y} = x \cdot \omega$ (the inputs $x$ and $\omega$ feed a multiplication node that outputs $\hat{y}$).

Gradient descent update:

$\omega = \omega - \alpha \frac{\partial loss}{\partial \omega}$

For a single sample, the gradient has the analytic form

$\frac{\partial loss_n}{\partial \omega} = 2 \cdot x_n \cdot (x_n \cdot \omega - y_n)$
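As a concrete illustration (a sketch, not from the slides; the sample values and learning rate are assumed), one step of this analytic update in plain Python:

# One gradient-descent step for the linear model y_hat = x * w,
# using the analytic gradient d(loss)/dw = 2 * x * (x * w - y).
x, y = 2.0, 4.0          # one training sample (assumed values)
w = 1.0                  # current weight
alpha = 0.01             # learning rate (assumed)

grad = 2 * x * (x * w - y)      # = 2 * 2 * (2 - 4) = -8
w = w - alpha * grad            # w moves from 1.0 to 1.08
print(w)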
What about a complicated network?

With many layers and many weights, writing the analytic gradient $\frac{\partial loss}{\partial \omega} = \,?$ by hand for every parameter becomes infeasible. This motivates computing gradients on a computational graph.
Computational Graph

A two-layer neural network:

$\hat{y} = W_2 (W_1 \cdot X + b_1) + b_2$

The graph is built node by node from the input $X$. First layer: $X$ and the weight $W_1$ feed a matrix-multiplication node (MM), and the bias $b_1$ is added (ADD). Second layer: the result and $W_2$ feed another MM node, and $b_2$ is added, producing $\hat{y}$.
What is the problem with this two-layer network?

Expanding the expression shows that the two linear layers collapse into one:

$\hat{y} = W_2 (W_1 \cdot X + b_1) + b_2$
$\;\;\; = W_2 W_1 \cdot X + (W_2 b_1 + b_2)$
$\;\;\; = W \cdot X + b$

However many linear layers we stack, the composition is still a linear model. A nonlinear function is therefore required by each layer: in the graph, a nonlinearity $\sigma$ is applied to the output of the first layer ($W_1$ MM, $b_1$ ADD) before it enters the second layer ($W_2$ MM).
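To make the collapse concrete, here is a small numerical check (a sketch, not from the slides; the layer sizes and the use of torch.nn.Linear are my own choices):

import torch

# Two stacked linear layers with no activation in between.
lin1 = torch.nn.Linear(3, 4)
lin2 = torch.nn.Linear(4, 2)

x = torch.randn(5, 3)
y_two_layers = lin2(lin1(x))

# Collapse them into a single linear map: W = W2 @ W1, b = W2 @ b1 + b2.
W = lin2.weight @ lin1.weight
b = lin2.weight @ lin1.bias + lin2.bias
y_one_layer = x @ W.t() + b

print(torch.allclose(y_two_layers, y_one_layer, atol=1e-6))  # True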
The Composition of Functions and the Chain Rule
Chain Rule, step by step

Consider a single node $f(x, \omega)$ with inputs $x$ and $\omega$ and output $z$, which flows onward to the final loss $L$.

1. Create the computational graph (forward): compute $z = f(x, \omega)$ and pass it forward toward the loss.

2. Compute the local gradients at the node: $\frac{\partial z}{\partial x}$ and $\frac{\partial z}{\partial \omega}$.

3. Receive the gradient from the successive node: the backward pass hands the node $\frac{\partial L}{\partial z}$.

4. Use the chain rule to compute the gradients (backward):

$\frac{\partial L}{\partial x} = \frac{\partial L}{\partial z} \cdot \frac{\partial z}{\partial x}$

$\frac{\partial L}{\partial \omega} = \frac{\partial L}{\partial z} \cdot \frac{\partial z}{\partial \omega}$

These are then passed back to the preceding nodes.
Example: $f = x \cdot \omega$ with $x = 2$, $\omega = 3$

Local gradients: $\frac{\partial z}{\partial x} = \omega$ and $\frac{\partial z}{\partial \omega} = x$.

Forward: $z = x \cdot \omega = 6$.

Backward: the successive node supplies $\frac{\partial L}{\partial z} = 5$. By the chain rule:

$\frac{\partial L}{\partial x} = \frac{\partial L}{\partial z} \cdot \frac{\partial z}{\partial x} = 5 \cdot \omega = 15$

$\frac{\partial L}{\partial \omega} = \frac{\partial L}{\partial z} \cdot \frac{\partial z}{\partial \omega} = 5 \cdot x = 10$
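The same numbers can be reproduced with PyTorch autograd (a sketch, not in the slides; the loss $L = 5z$ is a hypothetical stand-in whose gradient with respect to $z$ is exactly 5):

import torch

x = torch.tensor(2.0, requires_grad=True)
w = torch.tensor(3.0, requires_grad=True)

z = x * w             # forward: z = 6
L = 5 * z             # stand-in loss with dL/dz = 5
L.backward()          # backward pass through the graph

print(x.grad.item())  # 15.0 = 5 * w
print(w.grad.item())  # 10.0 = 5 * x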
Computational Graph of Linear Model

Linear model: $\hat{y} = x \cdot \omega$, with training sample $x = 1$, $y = 2$ and weight $\omega = 1$.

Forward pass: the multiplication node computes $\hat{y} = x \cdot \omega = 1$, the subtraction node computes the residual $r = \hat{y} - y = -1$, and the squaring node computes $loss = r^2 = 1$.

Local gradients at each node:

$\frac{\partial (x\omega)}{\partial \omega} = x, \qquad \frac{\partial (\hat{y} - y)}{\partial \hat{y}} = 1, \qquad \frac{\partial r^2}{\partial r} = 2r$

Backward pass, from the loss back to the weight:

$\frac{\partial loss}{\partial r} = 2r = -2$

$\frac{\partial loss}{\partial \hat{y}} = \frac{\partial loss}{\partial r} \cdot \frac{\partial r}{\partial \hat{y}} = -2 \cdot 1 = -2$

$\frac{\partial loss}{\partial \omega} = \frac{\partial loss}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial \omega} = -2 \cdot x = -2 \cdot 1 = -2$
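This hand computation can be checked directly with autograd (a sketch; it mirrors the code the following slides build up):

import torch

w = torch.tensor([1.0], requires_grad=True)
x, y = 1.0, 2.0

y_hat = x * w               # forward: y_hat = 1
loss = (y_hat - y) ** 2     # loss = 1
loss.backward()             # backward through the graph

print(w.grad.item())        # -2.0, matching the hand computation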
Exercise 4-1: Compute the gradient with the computational graph

Same graph as above ($\hat{y} = x \cdot \omega$, $r = \hat{y} - y$, $loss = r^2$), now with $x = 2$, $y = 4$: compute $\frac{\partial loss}{\partial \omega}$.
Exercise 4-2: Compute the gradient of the affine model

Affine model $\hat{y} = x \cdot \omega + b$, with $x = 1$, $y = 2$, $b = 2$, and $loss = loss(\hat{y}, y)$: compute $\frac{\partial loss}{\partial \omega}$ and $\frac{\partial loss}{\partial b}$.
Tensor in PyTorch
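In PyTorch, a Tensor carries its value in .data and, once backward() has run, the gradient of the loss with respect to it in .grad (itself a Tensor). A minimal sketch:

import torch

w = torch.Tensor([1.0])   # the value lives in w.data
w.requires_grad = True    # ask autograd to track operations on w

print(w.data)             # tensor([1.])
print(w.grad)             # None: no backward pass has run yet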
Implementation of linear model with PyTorch
w = torch.Tensor([1.0])   # initial weight, w = 1.0
w.requires_grad = True    # track gradients for w in the computational graph
The forward computation is an ordinary Python function:

def forward(x):
    return x * w          # linear model: y_hat = x * w

Because w requires a gradient, every expression involving it (x * w here, and the loss built from it) is recorded as a computational graph. One detail recurs in the implementation: backward() accumulates gradients, so after each weight update the stored gradient must be reset:

w.grad.data.zero_()       # clear w.grad before the next backward pass
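Assembled into a complete training loop (a sketch in the style of these slides: the dataset x_data = [1.0, 2.0, 3.0], y_data = [2.0, 4.0, 6.0], the learning rate 0.01, and the epoch count are assumptions, not visible in the extracted slides):

import torch

x_data = [1.0, 2.0, 3.0]           # assumed running example
y_data = [2.0, 4.0, 6.0]

w = torch.Tensor([1.0])
w.requires_grad = True

def forward(x):
    return x * w                    # y_hat = x * w

def loss(x, y):
    y_pred = forward(x)
    return (y_pred - y) ** 2        # squared error, as in the graph

for epoch in range(100):
    for x, y in zip(x_data, y_data):
        l = loss(x, y)              # forward: builds the graph
        l.backward()                # backward: fills w.grad
        w.data = w.data - 0.01 * w.grad.data   # update outside the graph
        w.grad.data.zero_()         # clear the gradient for the next step
    print("epoch:", epoch, "loss:", l.item())

print("predict x=4:", forward(4).item())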
Forward/Backward in PyTorch

The running example again: $x = 1$, $y = 2$, with the graph $\hat{y} = x \cdot \omega$, $r = \hat{y} - y$, $loss = r^2$.
Forward in PyTorch

w = torch.Tensor([1.0])   # weight tensor
w.requires_grad = True    # track gradients

l = loss(x, y)            # the forward pass builds the graph; here loss = 1
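A quick way to see that the forward pass really built a graph (a sketch): the loss tensor carries a grad_fn linking it to the operation that produced it.

x, y = 1.0, 2.0
l = loss(x, y)
print(l)                  # tensor([1.], grad_fn=<PowBackward0>)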
Backward in PyTorch

Calling backward() on the loss walks the graph in reverse and deposits each gradient in the corresponding tensor; afterwards $\frac{\partial loss}{\partial \omega}$ is available as w.grad.

l.backward()
Update weight in PyTorch

After l.backward() has filled w.grad, the weight is updated from the stored gradient. The update must act on w.data so that the update itself is not recorded in the computational graph.
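A sketch of the update step (the learning rate 0.01 is an assumed value):

w.data = w.data - 0.01 * w.grad.data   # plain gradient-descent step, outside the graph
w.grad.data.zero_()                    # reset the gradient for the next iteration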
Exercise 4-3: Compute gradients using the computational graph
Exercise 4-4: Compute gradients using PyTorch
PyTorch Tutorial
04. Back Propagation