
Linear Regression With One Variable


2. Linear Regression with One Variable (Univariate Linear Regression)

MODEL REPRESENTATION:

➢ Other notation:
X = space of input values
Y = space of output values
Training set = list of m training examples (x(i), y(i)); i = 1, 2, …, m

Hypothesis Function: the function produced by feeding the training
data (inputs and outputs, since this is supervised learning) to the
learning algorithm; it can then be used to predict the o/p for new
input data.
➢ For a supervised problem: h : X → Y such that h(x) is a “good”
predictor for the corresponding value of y.
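As a minimal sketch (the parameter values here are hypothetical, just for illustration), the univariate hypothesis h(x) = Θ0 + Θ1·x can be written as:

```python
# Hypothesis for univariate linear regression: h(x) = theta0 + theta1 * x.
def h(x, theta0, theta1):
    """Predict the output y for input x using the current parameters."""
    return theta0 + theta1 * x

# Example: with theta0 = 1 and theta1 = 2, the prediction for x = 3 is 7.
print(h(3, 1, 2))  # 7
```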

COST FUNCTION: measures the accuracy of our hypothesis function. We
choose the parameters of h(x) so that h(x) is close to y for the data
in the training set.
The cost function is the target of a minimization:
J(Θ0, Θ1) = (1/2m) · Σ from i=1 to m of (h(x(i)) − y(i))^2
i.e., we try to minimize (1/2m) times the sum of squared differences
b/w the predicted values and the actual values given in the dataset.
➢ “J” is the cost function, also called the squared error cost
function or mean squared error.
➢ The (1/2m) factor averages the squared differences.
➢ Minimizing means finding the values of the Θ parameters for which
the cost function is smallest.

Cost ➔ an average of the differences between all the results of the
hypothesis on inputs from the x's and the actual outputs y's.
➢ The mean is halved as a convenience for the computation of the
gradient descent: differentiating the square function produces a
factor of 2 that cancels out the 1/2 term.

COST FUNCTION INTUITION: The training set data is scattered on the
x–y plane, and we try to draw a straight line through it. Goal → find
the best-fitting line.
Ideally, the line would pass through every point of our training data
set. In such a case, the value of “J” will be 0.
➢ For simplicity, let h(x) = Θ1·x (i.e., Θ0 = 0):
For Θ1 = 1 ➔ h(x) = x ➔ J(Θ1) = 0
For Θ1 = 0.5 ➔ h(x) = 0.5x ➔ J(Θ1) ≈ 0.58
For Θ1 = 0 ➔ h(x) = 0 ➔ J(Θ1) ≈ 2.33
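These values can be checked numerically. A minimal sketch, assuming the three-point training set (1,1), (2,2), (3,3) (not stated explicitly above, but it reproduces the quoted J values):

```python
# Squared-error cost J(theta1) for h(x) = theta1 * x on an assumed
# three-point training set (1,1), (2,2), (3,3).
xs = [1, 2, 3]
ys = [1, 2, 3]

def J(theta1):
    m = len(xs)
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

print(J(1.0))            # 0.0 -> the line fits every point exactly
print(round(J(0.5), 2))  # 0.58
print(round(J(0.0), 2))  # 2.33
```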

➢ J(Θ1) is the average of the squared differences b/w h(x) and y:
➢ h(x) = predicted value at the given training data
➢ y = actual value of the o/p in the training data

➢ Vertical lines (in the plot) represent this difference.

➢ For different values of Θ1, we try to minimize J(Θ1) {the error},
whose minimum here occurs at Θ1 = 1. Therefore, we choose Θ1 = 1 as
our best-fitting curve: h(x) = x.

≫ For a more complex h(x) fxn like h(x) = Θ0 + Θ1x, we have to plot
J(Θ0, Θ1) in 3D, as J can be different for different combinations of
Θ0 and Θ1.
These surfaces are more easily represented using contour figures: a
contour plot is a graph that contains many contour lines, and a
contour line of a two-variable function has a constant value at all
points on the line.

Graph b/w Θ0 and Θ1: each ellipse is the set of combinations of Θ0
and Θ1 for which the value of J is the same. Points not on the drawn
ellipses are also valid; each corresponds to its own unique value of J.
➢ The best combination of Θ0 and Θ1 (the one which minimizes
J(Θ0, Θ1)) lies around the center of the innermost ellipse.

GRADIENT DESCENT: an algorithm to minimize J(Θ0, Θ1)

➢ Start with some Θ0, Θ1
➢ Keep changing Θ0, Θ1 to reduce J until a minimum is reached
Repeat until convergence: Θj := Θj − α · (∂/∂Θj) J(Θ0, Θ1), for j = 0, 1

Here, we start at some value of (Θ0, Θ1) and keep going down the
J(Θ0, Θ1) surface until we reach a local minimum. We will know that
we have succeeded when our cost function is at the very bottom of one
of the pits in our graph.
➢ If we start at a different value of (Θ0, Θ1), we may end up at a
different minimum.
Note that we are not graphing x and y themselves, but the parameter
range of our hypothesis function and the cost resulting from
selecting a particular set of parameters.

The slope of the tangent is the derivative at that point, and it
gives us a direction to move in. We make steps down the cost function
in the direction of steepest descent. The size of each step is
determined by the parameter α, called the learning rate.
SIMULTANEOUS UPDATE: first we calculate the new values for both Θ0
and Θ1; only then do we update them. So the order of execution of the
statements is:
temp0 := Θ0 − α · (∂/∂Θ0) J(Θ0, Θ1)
temp1 := Θ1 − α · (∂/∂Θ1) J(Θ0, Θ1)
Θ0 := temp0
Θ1 := temp1

Here, if we updated Θ0 before actually calculating the new Θ1, the Θ0
used in the equation for Θ1 would be the new Θ0, not the one we
wanted to minimize for.
At each iteration j, one should simultaneously update all the
parameters Θ0, Θ1, …, Θn. Updating a specific parameter prior to
calculating another one on the jth iteration would yield a wrong
implementation.
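A sketch of the two update orders, using toy gradient functions (g0 and g1 below are hypothetical, chosen only so that the coupling between the parameters is visible):

```python
# Correct simultaneous update: both gradients are evaluated at the
# OLD (theta0, theta1) before either parameter is assigned.
def simultaneous_update(theta0, theta1, alpha, grad0, grad1):
    temp0 = theta0 - alpha * grad0(theta0, theta1)
    temp1 = theta1 - alpha * grad1(theta0, theta1)
    return temp0, temp1

# Buggy sequential update: theta1's gradient sees the NEW theta0.
def wrong_update(theta0, theta1, alpha, grad0, grad1):
    theta0 = theta0 - alpha * grad0(theta0, theta1)
    theta1 = theta1 - alpha * grad1(theta0, theta1)
    return theta0, theta1

# Toy coupled gradients (hypothetical, not derived from any real J):
g0 = lambda t0, t1: t0 + t1
g1 = lambda t0, t1: t0 - t1

print(simultaneous_update(1.0, 2.0, 0.1, g0, g1))
print(wrong_update(1.0, 2.0, 0.1, g0, g1))
```

With these toy gradients the two versions disagree in the second component (2.1 vs. 2.13): exactly the wrong-implementation trap described above.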

GRADIENT DESCENT INTUITION: for simplicity, we use only one
parameter:
h(x) = Θ1·x
→ α is positive
For a value of Θ1, if the slope of J(Θ1) is positive: Θ1 decreases
For a negative slope of J(Θ1): Θ1 increases
Θ1 eventually converges to its minimum.

If the value of α is too small: gradient descent takes baby steps
towards the min and converges slowly.
If α is too large: gradient descent takes huge steps; in that case
gradient descent may even overshoot the min whenever the diff b/w
the current Θ and Θmin is less than the jump in Θ (α × derivative of
J), and it may then keep moving further and further away from the
min (diverge).
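A minimal sketch of all three regimes on the convex toy cost J(Θ) = Θ², whose derivative is 2Θ and whose minimum sits at Θ = 0 (this cost and these α values are illustrative, not from the text):

```python
# One-parameter gradient descent on J(theta) = theta**2 (dJ/dtheta = 2*theta).
def descend(theta, alpha, steps):
    for _ in range(steps):
        theta = theta - alpha * 2 * theta  # theta_new = (1 - 2*alpha) * theta
    return theta

print(descend(1.0, 0.01, 10))  # too small: still far from 0 after 10 steps
print(descend(1.0, 0.4, 10))   # reasonable: very close to 0
print(descend(1.0, 1.1, 10))   # too large: |theta| grows each step (diverges)
```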

➢ Therefore, we should adjust the learning rate to ensure that the
gradient descent algorithm converges in a reasonable time.
➢ If Θ is already at its local minimum, the slope will be 0, and
thus Θ won't change.

Even if the learning rate α is fixed, the slope gets smaller as we
approach the minimum, so the steps automatically become smaller.
For a linear regression model, the derivatives of J are:
For Θ0 – derivative w.r.t. Θ0: (∂/∂Θ0) J = (1/m) · Σ from i=1 to m of (h(x(i)) − y(i))
For Θ1 – derivative w.r.t. Θ1: (∂/∂Θ1) J = (1/m) · Σ from i=1 to m of (h(x(i)) − y(i)) · x(i)
➢ For a linear regression model, the cost curve is always convex
(bowl-shaped).
It therefore has only one optimum → the global minimum (reached,
assuming the learning rate α is not too large).
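Putting the update rule and these derivatives together, a minimal batch gradient descent sketch (the dataset here is made up, generated from y = 1 + 2x so the correct optimum is known in advance):

```python
# Batch gradient descent for h(x) = theta0 + theta1 * x.
# Each step uses ALL m training examples, per the derivative formulas above.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # y = 1 + 2x, so the minimum is at (1, 2)

theta0, theta1 = 0.0, 0.0
alpha = 0.05
m = len(xs)

for _ in range(5000):
    errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
    grad0 = sum(errors) / m                             # dJ/dtheta0
    grad1 = sum(e * x for e, x in zip(errors, xs)) / m  # dJ/dtheta1
    # Simultaneous update: both gradients came from the old thetas.
    theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1

print(round(theta0, 3), round(theta1, 3))  # 1.0 2.0
```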

➢ On the contour plot, we start with any value of Θ0 and Θ1 and then
minimize J.
➢ We approach the min as we move towards the center.
Starting at an arbitrary value of Θ0 and Θ1, we minimize J(Θ0, Θ1)
with our gradient descent algo.

J is a complicated quadratic function, and the ellipses shown above
are its contours.
Batch Gradient Descent → each step of gradient descent uses all the
training examples.
The point of all this is that if we start with a guess for our
hypothesis and then repeatedly apply these gradient descent update
equations, our hypothesis will become more and more accurate.
