
CSC 311 Intro to Machine Learning

Gradient Descent

Alice Gao
Learning Objectives
By the end of this lecture, you should be able to

• Explain the rationale behind the gradient descent update rule (direction and step size).
• Derive the gradient descent updates for linear regression with multiple features.
• Compare and contrast the direct solution and gradient descent.



Outline
1. Gradient Descent



GRADIENT DESCENT



Why Use Gradient Descent?
• A general method for optimizing a function

• Easier to implement than direct solution

• More efficient than direct solution


• Each GD update costs 𝑂(𝑁𝐷)
• rather than 𝑂(𝐷³) for matrix inversion in the direct solution
• much cheaper if 𝐷 is large (i.e., high-dimensional data); see the sketch below
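To make the cost comparison concrete, here is a minimal NumPy sketch (not from the slides) contrasting one gradient descent update with the direct least-squares solution; the data, dimensions, and variable names are made-up illustrations.

import numpy as np

# Toy data: N examples, D features (illustrative values only).
N, D = 1000, 50
X = np.random.randn(N, D)
t = np.random.randn(N)
w = np.zeros(D)
alpha = 0.1

# One gradient descent update: dominated by two matrix-vector
# products with X, i.e., O(N D) work per update.
w = w - alpha * (X.T @ (X @ w - t)) / N

# Direct solution via the normal equations: forming X^T X costs
# O(N D^2) and solving the D-by-D system costs O(D^3).
w_direct = np.linalg.solve(X.T @ X, X.T @ t)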



What is Gradient Descent?
An iterative method for finding a (local) minimum of a function.

Consider a scalar function 𝐹: ℝ → ℝ.


We want to minimize the function 𝐹(𝑤).

Gradient descent procedure:


1. Start with a random point 𝑤₀.
2. Apply an update rule iteratively
until a stopping condition is met (see the sketch below).
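As a minimal sketch (not part of the original slides), the procedure for a scalar function might look like the following; the example function 𝐹(𝑤) = (𝑤 − 3)² and all names are illustrative assumptions.

import numpy as np

def gradient_descent(dF, w0, alpha=0.1, tol=1e-8, max_iters=10_000):
    # Minimize a scalar function given its derivative dF, starting from w0.
    w = w0
    for _ in range(max_iters):
        step = alpha * dF(w)      # move opposite to the gradient
        w = w - step
        if abs(step) < tol:       # stop once the updates become tiny
            break
    return w

# Example: F(w) = (w - 3)^2, so dF/dw = 2(w - 3); the minimum is at w = 3.
w_min = gradient_descent(lambda w: 2 * (w - 3), w0=np.random.randn())
print(w_min)  # approximately 3.0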



In which direction should we move?
Sign of the update should be the same as / opposite of the sign of the gradient.

[Figure: a curve 𝐹(𝑤) with two points 𝑤₁ and 𝑤₂ marked on the 𝑤-axis, annotated 𝑑𝐹(𝑤₁)/𝑑𝑤 < 0 and 𝑑𝐹(𝑤₂)/𝑑𝑤 > 0.]

At 𝑤 = 𝑤₁, the derivative is positive/negative, and we want to increase/decrease 𝑤.

At 𝑤 = 𝑤₂, the derivative is positive/negative, and we want to increase/decrease 𝑤.



What is the size of each update?
Each update’s size should be _______________________ the gradient’s magnitude.

[Figure: a curve 𝐹(𝑤) with steep and flat regions.]

When the curve is steep, we are likely close to/far from the minimum, and the gradient’s magnitude is small/large.

When the curve is flat, we are likely close to/far from the minimum, and the gradient’s magnitude is small/large.



Gradient Descent Update Rule
To minimize the function 𝐹(w), we use the update rule:

w ← w − 𝛼 ∇w 𝐹(w)
or

𝑤₁ ← 𝑤₁ − 𝛼 ∂𝐹(w)/∂𝑤₁   ⋯   𝑤_D ← 𝑤_D − 𝛼 ∂𝐹(w)/∂𝑤_D

Each update:
• Direction: negative of the gradient’s sign
• Size: proportional to the gradient’s magnitude



Gradient Descent: When Do We Stop?
In theory:
• Stop when w stops changing (convergence).

In practice (sketched below):
• Stop when the change in 𝐹(w) is small enough.
• Stop when we are tired of waiting.
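A small sketch of the practical stopping rules (the function and parameter names are illustrative assumptions): stop when the decrease in 𝐹(w) falls below a tolerance, or after a fixed iteration budget.

def minimize(F, grad_F, w, alpha=0.1, tol=1e-6, max_iters=10_000):
    # Run gradient descent until the change in F(w) is small enough
    # or the iteration budget runs out.
    F_old = F(w)
    for _ in range(max_iters):
        w = w - alpha * grad_F(w)
        F_new = F(w)
        if abs(F_old - F_new) < tol:   # change in F(w) is small enough
            break
        F_old = F_new
    return w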



Choosing a Learning Rate
Too small: takes too long to converge.
Too large: may diverge (see the example below).
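A tiny numerical illustration (not from the slides), using 𝐹(𝑤) = 𝑤² so 𝑑𝐹/𝑑𝑤 = 2𝑤: a learning rate of 0.1 shrinks 𝑤 toward the minimum at 0, while 1.1 makes |𝑤| grow every step.

w_small, w_large = 1.0, 1.0
for _ in range(20):
    w_small -= 0.1 * 2 * w_small   # alpha = 0.1: w -> 0.8 w each step, converges to 0
    w_large -= 1.1 * 2 * w_large   # alpha = 1.1: w -> -1.2 w each step, magnitude diverges
print(w_small, w_large)            # roughly 0.01 vs. roughly 38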



Gradient Descent Update for Linear Regression
The general update rule:
w ← w − 𝛼 ∇w ℰ(𝐰)

Update rule for linear regression:

w ← w − (𝛼/𝑁) Xᵀ(Xw − t)

or

w ← w − (𝛼/𝑁) ∑ᵢ₌₁ᴺ x⁽ⁱ⁾ (wᵀx⁽ⁱ⁾ − t⁽ⁱ⁾)
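The vectorized update can be implemented directly; the following is a minimal sketch assuming a squared-error cost ℰ(w) = (1/2𝑁) ∑ᵢ (wᵀx⁽ⁱ⁾ − t⁽ⁱ⁾)², with an illustrative function name and default settings.

import numpy as np

def gd_linear_regression(X, t, alpha=0.01, num_iters=1000):
    # X: N x D design matrix (include a column of ones for a bias term);
    # t: length-N target vector. Applies w <- w - (alpha / N) X^T (X w - t).
    N, D = X.shape
    w = np.zeros(D)
    for _ in range(num_iters):
        w = w - (alpha / N) * (X.T @ (X @ w - t))
    return w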



Summary: Gradient Descent
w ← w − 𝛼 ∇w ℰ(𝐰)

Update rule
• Direction: negative of the gradient.
• Magnitude: proportional to the gradient’s magnitude.

Terminating conditions
• When the change in ℰ is small enough.
• After a fixed number of iterations.

Learning rate
• Too small: takes too long to converge.
• Too large: may diverge.



A Modular Approach to ML

Model: describes relationships between variables.

Loss function: quantifies how badly the model fits the data.

Optimization algorithm: fits a model that minimizes the loss.

