Lecture 03: Gradient Descent
Lecturer: Hongpu Liu, PyTorch Tutorial @ SLAM Research Group
Revision

Linear Model (the bias term is dropped to simplify the model):

$\hat{y} = x \cdot \omega$

x (hours)    y (points)
1            2
2            4
3            6
4            ?
Revision

[Figure: the training points (Hours vs. Points) for x = 1, 2, 3 together with the true line $y = 2x$ fitted by the linear model $\hat{y} = x \cdot \omega$]
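As a quick check (an illustration, not from the slides), plugging the table's inputs into the model with the true weight $\omega = 2$ reproduces the known outputs and predicts the unknown one:

w = 2.0                        # true weight of the line y = 2x

def forward(x):                # linear model y_hat = x * w
    return x * w

for x in [1.0, 2.0, 3.0, 4.0]:
    print(x, forward(x))       # prints 2.0, 4.0, 6.0, and the prediction 8.0 for x = 4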
Optimization Problem

Find the weight that minimizes the cost over the training set:

$\omega^* = \arg\min_{\omega} cost(\omega)$
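To make the optimization problem concrete, the cost can be evaluated on a grid of candidate weights; a minimal sketch (the grid values are illustrative, not from the slides):

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

def cost(w):
    # mean squared error of y_hat = x * w over the training set
    return sum((x * w - y) ** 2 for x, y in zip(x_data, y_data)) / len(x_data)

for w in [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]:
    print('w =', w, 'cost =', round(cost(w), 2))
# the cost is smallest at w = 2.0, where it is exactly 0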
Gradient Descent Algorithm

[Figure: the cost as a function of $\omega$, marking the initial guess and the global cost minimum]

Starting from an initial guess of $\omega$, we want to reach the global cost minimum.
[Figure: from the initial guess, in which direction should $\omega$ move?]

Gradient:

$\dfrac{\partial cost}{\partial \omega}$

The gradient points in the direction in which the cost increases fastest, so the weight is moved in the opposite direction.

Update:

$\omega = \omega - \alpha \dfrac{\partial cost}{\partial \omega}$

where $\alpha$ is the learning rate. Gradient descent repeats this update: evaluate the gradient at the current $\omega$, then update $\omega$.
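A minimal sketch of one such update step, using a finite-difference gradient purely to illustrate the rule; the analytic gradient derived next is what the implementation below actually uses:

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

def cost(w):
    # mean squared error of y_hat = x * w on the training set
    return sum((x * w - y) ** 2 for x, y in zip(x_data, y_data)) / len(x_data)

def numerical_gradient(w, eps=1e-6):
    # central-difference approximation of d(cost)/dw
    return (cost(w + eps) - cost(w - eps)) / (2 * eps)

w, alpha = 1.0, 0.01                    # initial guess and learning rate
w = w - alpha * numerical_gradient(w)   # one gradient descent step
print(w, cost(w))                       # w moves from 1.0 to about 1.09 and the cost drops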
Derivative of the cost (the gradient):

$\dfrac{\partial cost(\omega)}{\partial \omega} = \dfrac{\partial}{\partial \omega} \dfrac{1}{N} \sum_{n=1}^{N} (x_n \cdot \omega - y_n)^2$

$= \dfrac{1}{N} \sum_{n=1}^{N} \dfrac{\partial}{\partial \omega} (x_n \cdot \omega - y_n)^2$

$= \dfrac{1}{N} \sum_{n=1}^{N} 2 \cdot (x_n \cdot \omega - y_n) \cdot \dfrac{\partial (x_n \cdot \omega - y_n)}{\partial \omega}$

$= \dfrac{1}{N} \sum_{n=1}^{N} 2 \cdot x_n \cdot (x_n \cdot \omega - y_n)$

Update:

$\omega = \omega - \alpha \dfrac{\partial cost}{\partial \omega} = \omega - \alpha \dfrac{1}{N} \sum_{n=1}^{N} 2 \cdot x_n \cdot (x_n \cdot \omega - y_n)$
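Plugging the training set and the initial guess $\omega = 1.0$ into the last line gives a quick sanity check (these numbers reappear in the training log below):

$\dfrac{\partial cost}{\partial \omega}\bigg|_{\omega=1} = \dfrac{1}{3}\Big(2 \cdot 1 \cdot (1 - 2) + 2 \cdot 2 \cdot (2 - 4) + 2 \cdot 3 \cdot (3 - 6)\Big) = \dfrac{-2 - 8 - 18}{3} = -\dfrac{28}{3} \approx -9.33$

so the first update with $\alpha = 0.01$ is $\omega = 1.0 - 0.01 \cdot (-9.33) \approx 1.09$, which matches Epoch 0 in the output.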
Implementation

x_data = [1.0, 2.0, 3.0]              # training set inputs
y_data = [2.0, 4.0, 6.0]              # training set targets

w = 1.0                               # initial guess of the weight

def forward(x):                       # the linear model y_hat = x * w
    return x * w

def cost(xs, ys):                     # define the cost function (MSE)
    cost = 0
    for x, y in zip(xs, ys):
        y_pred = forward(x)
        cost += (y_pred - y) ** 2
    return cost / len(xs)

def gradient(xs, ys):                 # define the gradient function
    grad = 0
    for x, y in zip(xs, ys):
        grad += 2 * x * (x * w - y)
    return grad / len(xs)

print('Predict (before training)', 4, forward(4))
for epoch in range(100):              # train for 100 epochs of gradient descent
    cost_val = cost(x_data, y_data)
    grad_val = gradient(x_data, y_data)
    w -= 0.01 * grad_val              # update w with learning rate 0.01
    print('Epoch:', epoch, 'w=', w, 'cost=', cost_val)
print('Predict (after training)', 4, forward(4))

Output (cost in each epoch):

Predict (before training) 4 4.0
Epoch: 0 w= 1.09 cost= 4.67
Epoch: 1 w= 1.18 cost= 3.84
Epoch: 2 w= 1.25 cost= 3.15
Epoch: 3 w= 1.32 cost= 2.59
Epoch: 4 w= 1.39 cost= 2.13
Epoch: 5 w= 1.44 cost= 1.75
Epoch: 6 w= 1.50 cost= 1.44
Epoch: 7 w= 1.54 cost= 1.18
Epoch: 8 w= 1.59 cost= 0.97
Epoch: 9 w= 1.62 cost= 0.80
Epoch: 10 w= 1.66 cost= 0.66
…………
Epoch: 90 w= 2.00 cost= 0.00
Epoch: 91 w= 2.00 cost= 0.00
Epoch: 92 w= 2.00 cost= 0.00
Epoch: 93 w= 2.00 cost= 0.00
Epoch: 94 w= 2.00 cost= 0.00
Epoch: 95 w= 2.00 cost= 0.00
Epoch: 96 w= 2.00 cost= 0.00
Epoch: 97 w= 2.00 cost= 0.00
Epoch: 98 w= 2.00 cost= 0.00
Epoch: 99 w= 2.00 cost= 0.00
Predict (after training) 4 8.00
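The cost in each epoch is typically visualized as a training curve; a minimal sketch (matplotlib is an assumption on top of the slide's code):

import matplotlib.pyplot as plt

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]
w = 1.0

def forward(x):
    return x * w

def cost(xs, ys):
    return sum((forward(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient(xs, ys):
    return sum(2 * x * (x * w - y) for x, y in zip(xs, ys)) / len(xs)

cost_list = []
for epoch in range(100):
    cost_list.append(cost(x_data, y_data))   # record the cost before each update
    w -= 0.01 * gradient(x_data, y_data)

plt.plot(range(100), cost_list)
plt.xlabel('Epoch')
plt.ylabel('Cost')
plt.show()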
Stochastic Gradient Descent

Instead of the gradient of the cost over the whole training set, use the gradient of the loss of a single sample:

$\omega = \omega - \alpha \dfrac{\partial loss}{\partial \omega}, \qquad \dfrac{\partial loss_n}{\partial \omega} = 2 \cdot x_n \cdot (x_n \cdot \omega - y_n)$
Implementation of SGD

def forward(x):
    return x * w
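A minimal per-sample SGD training loop consistent with the formulas above; the loss and gradient helpers and the print statements are assumptions mirroring the batch version:

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]
w = 1.0

def forward(x):
    return x * w

def loss(x, y):                          # squared error of a single sample
    return (forward(x) - y) ** 2

def gradient(x, y):                      # d(loss_n)/dw = 2 * x * (x * w - y)
    return 2 * x * (x * w - y)

print('Predict (before training)', 4, forward(4))
for epoch in range(100):
    for x, y in zip(x_data, y_data):     # update w after every single sample
        w -= 0.01 * gradient(x, y)
        l = loss(x, y)
    print('Epoch:', epoch, 'w=', w, 'loss=', l)
print('Predict (after training)', 4, forward(4))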