Linear Regression With One Variable: Gradient Descent
Machine Learning
Slides from CS-229 by Andrew Ng
Recap
Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
Parameters: $\theta_0, \theta_1$
Cost Function: $J(\theta_0, \theta_1) = \dfrac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
Goal: $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$
Recap
(for fixed $\theta_1$, this is a function of $x$)   (function of the parameter $\theta_1$)
[Left: plot of $h_\theta(x)$ against $x$ for the training data. Right: plot of the cost $J(\theta_1)$ against $\theta_1$. For the value of $\theta_1$ shown, $J(\theta_1) \approx 2.3$.]
Idea
Have some function $J(\theta_0, \theta_1)$
Want $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$
Outline:
• Start with some $\theta_0, \theta_1$
• Keep changing $\theta_0, \theta_1$ to reduce $J(\theta_0, \theta_1)$,
until we hopefully end up at a minimum (see the sketch below)
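This outline maps directly onto a small loop. The sketch below shows the generic pattern on an arbitrary differentiable cost; the quadratic toy cost, alpha = 0.1, and 100 iterations are illustrative assumptions, not values from the slides.

def gradient_descent(grad, theta, alpha=0.1, num_iters=100):
    """Generic gradient descent: repeatedly step opposite the gradient."""
    for _ in range(num_iters):
        theta = theta - alpha * grad(theta)   # small step downhill
    return theta

# Toy example: minimize J(theta) = (theta - 3)^2, whose gradient is 2*(theta - 3).
theta_min = gradient_descent(lambda t: 2 * (t - 3), theta=0.0)
print(theta_min)   # converges toward 3.0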
Intuitively
[Surface plot of $J(\theta_0, \theta_1)$ over the $\theta_0$, $\theta_1$ plane: starting from an initial point, each gradient descent step moves downhill to a nearby minimum.]
Intuitively
[Same surface plot of $J(\theta_0, \theta_1)$, with a slightly different starting point: gradient descent can end up at a different local minimum.]
Gradient descent algorithm
repeat until convergence {
    $\theta_j := \theta_j - \alpha \dfrac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$   (simultaneously update for $j = 0$ and $j = 1$)
}
Notation: ":=" denotes assignment, as in $a := b$; $\alpha$ is the learning rate.
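The "simultaneously update" caveat matters: both partial derivatives must be evaluated at the old $(\theta_0, \theta_1)$ before either parameter is overwritten. A minimal sketch of one correct update step (the gradient functions dJ_dtheta0 and dJ_dtheta1 are assumed to be supplied by the caller):

def gradient_step(theta0, theta1, alpha, dJ_dtheta0, dJ_dtheta1):
    """One gradient descent step with a simultaneous parameter update."""
    # Evaluate both derivatives at the *current* parameters first...
    temp0 = theta0 - alpha * dJ_dtheta0(theta0, theta1)
    temp1 = theta1 - alpha * dJ_dtheta1(theta0, theta1)
    # ...and only then overwrite them. Updating theta0 before computing
    # dJ_dtheta1 would mix old and new values, a common bug.
    return temp0, temp1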
Linear Regression With One Variable: Gradient Descent Intuition
Machine Learning
Gradient descent algorithm (single parameter)
$\theta_1 := \theta_1 - \alpha \dfrac{d}{d\theta_1} J(\theta_1)$
If the slope $\dfrac{d}{d\theta_1} J(\theta_1) \ge 0$, then $\theta_1 := \theta_1 - \alpha \cdot (\text{positive number})$, so $\theta_1$ decreases and moves toward the minimum.
If the slope $\dfrac{d}{d\theta_1} J(\theta_1) \le 0$, then $\theta_1 := \theta_1 - \alpha \cdot (\text{negative number})$, so $\theta_1$ increases and moves toward the minimum.
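A two-line numerical check of this sign argument, using an assumed quadratic cost $J(\theta_1) = (\theta_1 - 2)^2$ whose derivative is $2(\theta_1 - 2)$:

alpha = 0.3
dJ = lambda theta1: 2 * (theta1 - 2)   # derivative of the assumed cost (theta1 - 2)^2

# Right of the minimum: slope is positive, so the update moves theta1 left.
print(5.0 - alpha * dJ(5.0))   # 3.2  (decreased toward 2)
# Left of the minimum: slope is negative, so the update moves theta1 right.
print(0.0 - alpha * dJ(0.0))   # 1.2  (increased toward 2)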
If α is too small, gradient descent can be slow.
If α is too large, gradient descent can overshoot the minimum; it may fail to converge, or even diverge.
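A small experiment on the same assumed quadratic cost $J(\theta_1) = (\theta_1 - 2)^2$ illustrates both failure modes; the specific α values are illustrative, not from the slides.

def run(alpha, steps=20, theta=5.0):
    """Gradient descent on the assumed cost (theta - 2)^2; returns the final theta."""
    for _ in range(steps):
        theta = theta - alpha * 2 * (theta - 2)
    return theta

print(run(alpha=0.01))   # too small: still far from 2 after 20 steps (slow progress)
print(run(alpha=0.5))    # well chosen: reaches the minimum at 2.0
print(run(alpha=1.1))    # too large: the iterates oscillate with growing magnitude (diverge)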
Gradient descent can converge to a local minimum, even with the learning rate α fixed.
As we approach a local minimum, gradient descent will automatically take smaller steps, because the derivative term shrinks. So there is no need to decrease α over time.
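The sketch below makes the shrinking-step behavior concrete on the same assumed quadratic cost with a fixed α: the step size |α · dJ/dθ| falls as θ approaches the minimum.

theta, alpha = 5.0, 0.3
for i in range(5):
    step = alpha * 2 * (theta - 2)          # alpha * derivative of (theta - 2)^2
    theta = theta - step
    print(f"iteration {i}: step size = {abs(step):.3f}, theta = {theta:.3f}")
# The printed step sizes decrease every iteration even though alpha never changes.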
Linear Regression With One Variable: Gradient Descent for Linear Regression
Machine Learning
Gradient descent algorithm / Linear Regression Model
Gradient descent: repeat until convergence { $\theta_j := \theta_j - \alpha \dfrac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$   (for $j = 0$ and $j = 1$) }
Linear regression model: $h_\theta(x) = \theta_0 + \theta_1 x$, with cost $J(\theta_0, \theta_1) = \dfrac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
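Plugging the linear regression model into the update rule requires the two partial derivatives of $J$. A short derivation, applying the chain rule to the cost function above:

$\dfrac{\partial}{\partial \theta_j} J(\theta_0, \theta_1) = \dfrac{\partial}{\partial \theta_j} \dfrac{1}{2m} \sum_{i=1}^{m} \left( \theta_0 + \theta_1 x^{(i)} - y^{(i)} \right)^2 = \dfrac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) \dfrac{\partial}{\partial \theta_j} \left( \theta_0 + \theta_1 x^{(i)} \right)$

Since $\dfrac{\partial}{\partial \theta_0} \left( \theta_0 + \theta_1 x^{(i)} \right) = 1$ and $\dfrac{\partial}{\partial \theta_1} \left( \theta_0 + \theta_1 x^{(i)} \right) = x^{(i)}$:

$\dfrac{\partial}{\partial \theta_0} J = \dfrac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$   and   $\dfrac{\partial}{\partial \theta_1} J = \dfrac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$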
Gradient descent algorithm (Linear Regression)
repeat until convergence {
    $\theta_0 := \theta_0 - \alpha \dfrac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$
    $\theta_1 := \theta_1 - \alpha \dfrac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$
}   (update $\theta_0$ and $\theta_1$ simultaneously)
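A minimal sketch of these updates in Python; the synthetic data, α = 0.1, and the iteration count are assumptions for illustration.

import numpy as np

def batch_gradient_descent(x, y, alpha=0.1, num_iters=1000):
    """Batch gradient descent for h(x) = theta0 + theta1 * x."""
    m = len(x)
    theta0, theta1 = 0.0, 0.0
    for _ in range(num_iters):
        error = theta0 + theta1 * x - y                  # h_theta(x^(i)) - y^(i), all i at once
        grad0 = error.sum() / m                          # dJ/dtheta0
        grad1 = (error * x).sum() / m                    # dJ/dtheta1
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1   # simultaneous update
    return theta0, theta1

# Assumed synthetic data roughly following y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])
print(batch_gradient_descent(x, y))   # close to (1.1, 2.0)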
Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
Cost Function: $J(\theta_0, \theta_1) = \dfrac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
$J$ is a convex function (bowl-shaped): it has a single global minimum and no other local minima, so gradient descent with a suitable learning rate always converges to the global minimum.
(for fixed $\theta_0, \theta_1$, this is a function of $x$)   (function of the parameters $\theta_0, \theta_1$)
[Sequence of slides: left, the current hypothesis $h_\theta(x)$ plotted against the training data; right, the contour plot of $J(\theta_0, \theta_1)$, where each gradient descent step moves the parameters steadily closer to the minimum until the fitted line matches the data.]
(Batch) Gradient Descent algorithm
repeat until convergence {
    $\theta_0 := \theta_0 - \alpha \dfrac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$
    $\theta_1 := \theta_1 - \alpha \dfrac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$
}   (update $\theta_0$ and $\theta_1$ simultaneously)
“Batch” Gradient Descent (BGD)
“Batch”: each step of gradient descent uses all m training examples.
(Stochastic) Gradient Descent algorithm
repeat until convergence {
    for i = 1 to m {
        $\theta_0 := \theta_0 - \alpha \left( h_\theta(x^{(i)}) - y^{(i)} \right)$
        $\theta_1 := \theta_1 - \alpha \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$
    }   (update $\theta_0$ and $\theta_1$ simultaneously)
}
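The same sketch as before, adapted to the stochastic updates above; the data, α, and epoch count are again illustrative assumptions, and shuffling each pass is a common practical choice not shown on the slide.

import numpy as np

def stochastic_gradient_descent(x, y, alpha=0.05, num_epochs=50, seed=0):
    """SGD for h(x) = theta0 + theta1 * x: one parameter update per training example."""
    rng = np.random.default_rng(seed)
    theta0, theta1 = 0.0, 0.0
    m = len(x)
    for _ in range(num_epochs):
        for i in rng.permutation(m):                 # visit examples in random order
            error = theta0 + theta1 * x[i] - y[i]    # h_theta(x^(i)) - y^(i)
            theta0, theta1 = theta0 - alpha * error, theta1 - alpha * error * x[i]
    return theta0, theta1

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # same assumed data as the batch example
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])
print(stochastic_gradient_descent(x, y))   # noisier path, but ends near the BGD answer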
“Stochastic” Gradient Descent (SGD)
“Stochastic”: each step of gradient descent uses a single training example, or a small mini-batch of data (especially in deep learning).
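For completeness, a mini-batch variant sits between the two extremes: each update averages the gradient over a small batch rather than one example or all m. The batch size of 2 and the other settings below are assumptions for illustration.

import numpy as np

def minibatch_gradient_descent(x, y, alpha=0.1, batch_size=2, num_epochs=100, seed=0):
    """Mini-batch gradient descent for h(x) = theta0 + theta1 * x."""
    rng = np.random.default_rng(seed)
    theta0, theta1 = 0.0, 0.0
    m = len(x)
    for _ in range(num_epochs):
        order = rng.permutation(m)
        for start in range(0, m, batch_size):
            idx = order[start:start + batch_size]        # indices of this mini-batch
            error = theta0 + theta1 * x[idx] - y[idx]    # residuals on the batch only
            theta0 -= alpha * error.mean()               # average gradient over the batch
            theta1 -= alpha * (error * x[idx]).mean()
    return theta0, theta1

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # same assumed data as above
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])
print(minibatch_gradient_descent(x, y))   # comparable fit to the batch and stochastic versions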
Batch vs. Stochastic Gradient Descent
• BGD is computationally expensive on large datasets, since every single update requires a pass over all m training examples.