Lecture 03 Gradient Descent

The document discusses gradient descent for linear regression. It introduces a linear model to fit data points, with the goal of minimizing the mean squared error cost function. Gradient descent is presented as an optimization algorithm that starts with an initial guess for the model parameter and iteratively updates it in the direction of steepest descent according to the gradient of the cost function, until a minimum is reached. Pseudocode is provided to implement gradient descent for linear regression, with updates calculated based on the derivative of the cost function with respect to the model parameter.


PyTorch Tutorial

03. Gradient Descent

Lecturer: Hongpu Liu, PyTorch Tutorial @ SLAM Research Group
Revision

• What would be the best model for the data?


• Linear model?

x (hours)    y (points)
1            2
2            4
3            6
4            ?

To simplify the model, use the Linear Model:

    $\hat{y} = x \cdot \omega$
Revision

Linear Model:

    $\hat{y} = x \cdot \omega$

[Figure: the training points (1, 2), (2, 4), (3, 6) and the true line, with hours (0 to 5) on the x-axis and points (0 to 14) on the y-axis.]
Revision

The machine starts with a random guess, $\omega$ = random value.


Revision

Mean Square Error:

$$\mathrm{cost}(\omega) = \frac{1}{N}\sum_{n=1}^{N}\left(\hat{y}_n - y_n\right)^2$$

It can be found that when $\omega = 2$, the cost is minimal.
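
The claim that the cost is minimal at $\omega = 2$ can be checked by brute force. Below is a minimal sketch (not part of the original slides) that evaluates the cost on a grid of candidate weights:

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

def cost(w):
    # Mean Square Error of the model y_hat = x * w over the training set
    return sum((x * w - y) ** 2 for x, y in zip(x_data, y_data)) / len(x_data)

# Evaluate the cost on a grid of candidate weights: 0.0, 0.1, ..., 4.0
candidates = [k / 10 for k in range(0, 41)]
best_w = min(candidates, key=cost)
print(best_w, cost(best_w))   # prints: 2.0 0.0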

Optimization Problem

Mean Square Error:

$$\mathrm{cost}(\omega) = \frac{1}{N}\sum_{n=1}^{N}\left(\hat{y}_n - y_n\right)^2$$

Optimization Problem:

$$\omega^{*} = \arg\min_{\omega}\,\mathrm{cost}(\omega)$$
Gradient Descent Algorithm

[Figure: the cost curve plotted against ω. Starting from an initial guess, the goal is to reach the global cost minimum, but in which direction should ω move?]

Gradient:

$$\frac{\partial\,\mathrm{cost}}{\partial \omega}$$

Update ($\alpha$ is the learning rate):

$$\omega = \omega - \alpha\,\frac{\partial\,\mathrm{cost}}{\partial \omega}$$

Gradient Descent: repeat the gradient computation and the update step.
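
To make the update rule concrete, here is a minimal sketch (not part of the slides) of a single update step on the training data above, using the gradient formula derived on the next slide:

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

w = 1.0       # initial guess
alpha = 0.01  # learning rate

# d cost / d w = (1/N) * sum( 2 * x_n * (x_n * w - y_n) )
grad = sum(2 * x * (x * w - y) for x, y in zip(x_data, y_data)) / len(x_data)

w = w - alpha * grad   # the update rule: w <- w - alpha * d cost / d w
print(grad, w)         # grad is about -9.33, so w moves from 1.0 to about 1.09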
Gradient Descent Algorithm

Derivative of the cost function:

$$
\begin{aligned}
\frac{\partial\,\mathrm{cost}(\omega)}{\partial \omega}
&= \frac{\partial}{\partial \omega}\,\frac{1}{N}\sum_{n=1}^{N}\left(x_n \cdot \omega - y_n\right)^2 \\
&= \frac{1}{N}\sum_{n=1}^{N}\frac{\partial}{\partial \omega}\left(x_n \cdot \omega - y_n\right)^2 \\
&= \frac{1}{N}\sum_{n=1}^{N} 2\left(x_n \cdot \omega - y_n\right)\,\frac{\partial\left(x_n \cdot \omega - y_n\right)}{\partial \omega} \\
&= \frac{1}{N}\sum_{n=1}^{N} 2 \cdot x_n \cdot \left(x_n \cdot \omega - y_n\right)
\end{aligned}
$$

Update:

$$\omega = \omega - \alpha\,\frac{\partial\,\mathrm{cost}}{\partial \omega}
         = \omega - \alpha\,\frac{1}{N}\sum_{n=1}^{N} 2 \cdot x_n \cdot \left(x_n \cdot \omega - y_n\right)$$
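
As a sanity check on the derivation (not part of the original slides), the analytic gradient can be compared against a central finite-difference estimate of the cost:

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

def cost(w):
    return sum((x * w - y) ** 2 for x, y in zip(x_data, y_data)) / len(x_data)

def gradient(w):
    # Analytic gradient: (1/N) * sum( 2 * x_n * (x_n * w - y_n) )
    return sum(2 * x * (x * w - y) for x, y in zip(x_data, y_data)) / len(x_data)

w, eps = 1.0, 1e-6
numeric = (cost(w + eps) - cost(w - eps)) / (2 * eps)   # central difference
print(gradient(w), numeric)   # both are approximately -9.33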
Implementation
Prepare the training set:

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

The complete program:

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

w = 1.0  # initial guess of the weight

def forward(x):
    # Linear model: y_hat = x * w
    return x * w

def cost(xs, ys):
    # Mean Square Error over the whole training set
    cost = 0
    for x, y in zip(xs, ys):
        y_pred = forward(x)
        cost += (y_pred - y) ** 2
    return cost / len(xs)

def gradient(xs, ys):
    # d cost / d w = (1/N) * sum( 2 * x_n * (x_n * w - y_n) )
    grad = 0
    for x, y in zip(xs, ys):
        grad += 2 * x * (x * w - y)
    return grad / len(xs)

print('Predict (before training)', 4, forward(4))

for epoch in range(100):
    cost_val = cost(x_data, y_data)
    grad_val = gradient(x_data, y_data)
    w -= 0.01 * grad_val  # update: w = w - alpha * gradient, with alpha = 0.01
    print('Epoch:', epoch, 'w=', w, 'loss=', cost_val)

print('Predict (after training)', 4, forward(4))
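
The constant 0.01 in the update is the learning rate $\alpha$. As a brief aside (not from the slides), the sketch below, with its own copies of the data and helpers, illustrates why the value matters: a small rate converges slowly, while a rate that is too large overshoots and diverges.

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

def mse(w):
    return sum((x * w - y) ** 2 for x, y in zip(x_data, y_data)) / len(x_data)

def grad(w):
    return sum(2 * x * (x * w - y) for x, y in zip(x_data, y_data)) / len(x_data)

for alpha in (0.01, 1.0):
    w = 1.0
    for epoch in range(5):
        w = w - alpha * grad(w)
    # alpha = 0.01 moves w slowly toward 2.0; alpha = 1.0 overshoots and blows up
    print('alpha =', alpha, 'w after 5 epochs =', w, 'cost =', mse(w))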
Initial guess of weight:

w = 1.0
Define the model (the Linear Model $\hat{y} = x \cdot \omega$):

def forward(x):
    return x * w
Define the cost function (Mean Square Error):

def cost(xs, ys):
    cost = 0
    for x, y in zip(xs, ys):
        y_pred = forward(x)
        cost += (y_pred - y) ** 2
    return cost / len(xs)

$$\mathrm{cost}(\omega) = \frac{1}{N}\sum_{n=1}^{N}\left(\hat{y}_n - y_n\right)^2$$
Define the gradient function:

def gradient(xs, ys):
    grad = 0
    for x, y in zip(xs, ys):
        grad += 2 * x * (x * w - y)
    return grad / len(xs)

$$\frac{\partial\,\mathrm{cost}}{\partial \omega} = \frac{1}{N}\sum_{n=1}^{N} 2 \cdot x_n \cdot \left(x_n \cdot \omega - y_n\right)$$
Do the update in the training loop:

for epoch in range(100):
    cost_val = cost(x_data, y_data)
    grad_val = gradient(x_data, y_data)
    w -= 0.01 * grad_val

$$\omega = \omega - \alpha\,\frac{\partial\,\mathrm{cost}}{\partial \omega}$$
Cost in each epoch (program output):

Predict (before training) 4 4.0
Epoch: 0 w= 1.09 cost= 4.67
Epoch: 1 w= 1.18 cost= 3.84
Epoch: 2 w= 1.25 cost= 3.15
Epoch: 3 w= 1.32 cost= 2.59
Epoch: 4 w= 1.39 cost= 2.13
Epoch: 5 w= 1.44 cost= 1.75
Epoch: 6 w= 1.50 cost= 1.44
Epoch: 7 w= 1.54 cost= 1.18
Epoch: 8 w= 1.59 cost= 0.97
Epoch: 9 w= 1.62 cost= 0.80
Epoch: 10 w= 1.66 cost= 0.66
…………
Epoch: 90 w= 2.00 cost= 0.00
Epoch: 91 w= 2.00 cost= 0.00
Epoch: 92 w= 2.00 cost= 0.00
Epoch: 93 w= 2.00 cost= 0.00
Epoch: 94 w= 2.00 cost= 0.00
Epoch: 95 w= 2.00 cost= 0.00
Epoch: 96 w= 2.00 cost= 0.00
Epoch: 97 w= 2.00 cost= 0.00
Epoch: 98 w= 2.00 cost= 0.00
Epoch: 99 w= 2.00 cost= 0.00
Predict (after training) 4 8.00
Stochastic Gradient Descent

Gradient Descent uses the derivative of the cost function, averaged over all N samples:

$$\omega = \omega - \alpha\,\frac{\partial\,\mathrm{cost}}{\partial \omega},
\qquad
\frac{\partial\,\mathrm{cost}}{\partial \omega} = \frac{1}{N}\sum_{n=1}^{N} 2 \cdot x_n \cdot \left(x_n \cdot \omega - y_n\right)$$

Stochastic Gradient Descent uses the derivative of the loss function of a single sample:

$$\omega = \omega - \alpha\,\frac{\partial\,\mathrm{loss}_n}{\partial \omega},
\qquad
\frac{\partial\,\mathrm{loss}_n}{\partial \omega} = 2 \cdot x_n \cdot \left(x_n \cdot \omega - y_n\right)$$
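
The practical difference between the two schemes is how often the weight is updated. A minimal sketch (not from the slides) comparing one epoch of each on the same data:

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]
alpha = 0.01

# Batch gradient descent: one update per epoch, using the averaged gradient.
w = 1.0
grad = sum(2 * x * (x * w - y) for x, y in zip(x_data, y_data)) / len(x_data)
w = w - alpha * grad
print('GD  after one epoch: w =', w)

# Stochastic gradient descent: one update per sample, so 3 updates per epoch.
w = 1.0
for x, y in zip(x_data, y_data):
    grad = 2 * x * (x * w - y)   # gradient of the single-sample loss
    w = w - alpha * grad
print('SGD after one epoch: w =', w)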
Implementation of SGD

Calculate the loss function:

def loss(x, y):
    y_pred = forward(x)
    return (y_pred - y) ** 2

Loss Function:

$$\mathrm{loss} = \left(\hat{y} - y\right)^2 = \left(x \cdot \omega - y\right)^2$$

The complete program:

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

w = 1.0

def forward(x):
    return x * w

def loss(x, y):
    # squared error of a single sample
    y_pred = forward(x)
    return (y_pred - y) ** 2

def gradient(x, y):
    # d loss_n / d w = 2 * x_n * (x_n * w - y_n)
    return 2 * x * (x * w - y)

print('Predict (before training)', 4, forward(4))

for epoch in range(100):
    for x, y in zip(x_data, y_data):
        grad = gradient(x, y)
        w = w - 0.01 * grad          # update after every single sample
        print("\tgrad: ", x, y, grad)
        l = loss(x, y)
    print("progress:", epoch, "w=", w, "loss=", l)

print('Predict (after training)', 4, forward(4))
Calculate the gradient of the loss function:

def gradient(x, y):
    return 2 * x * (x * w - y)

Derivative of the Loss Function:

$$\frac{\partial\,\mathrm{loss}_n}{\partial \omega} = 2 \cdot x_n \cdot \left(x_n \cdot \omega - y_n\right)$$
Update the weight with the gradient of every single sample in the training set:

for epoch in range(100):
    for x, y in zip(x_data, y_data):
        grad = gradient(x, y)
        w = w - 0.01 * grad
        print("\tgrad: ", x, y, grad)
        l = loss(x, y)
    print("progress:", epoch, "w=", w, "loss=", l)
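
As with batch gradient descent, running this program drives w toward 2.0, so the prediction for x = 4 approaches 8.0; the difference is that each epoch now performs one weight update per training sample (three updates here) instead of a single update on the averaged gradient.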
PyTorch Tutorial
03. Gradient Descent

