ML-Unit I - Linear Regression

- The document discusses linear regression, starting from data collected from product reviews: each review has a rating, review text, vote count, and product quality label.
- It then uses the collected data to predict the number of votes a future review will receive, based on the word count of its review text. This is framed as a regression problem, since the goal is to predict a continuous output value.
- Examples of simple linear regression are worked through, first using only the vote data and calculating its mean and residuals. The goal of linear regression is to minimize the sum of squared residuals and so find the best-fitting line.


Machine Learning

Dr. Sunil Saumya


IIIT Dharwad
Linear Regression
Data Collection

Rating: 1 | Vote: 52 | Product Quality: Low
Review Text: "It's stops working just after 1 month .. It's a fault in it's design or circuit ..maybe it's not sweat proff.. but I would suggest don't buy it.."
Data Collection Continue..

Rating: 1 | Vote: 52 | Product Quality: Low
Review Text: "It's stops working just after 1 month .. It's a fault in it's design or circuit ..maybe it's not sweat proff.. but I would suggest don't buy it.."

Rating: 1 | Vote: 6 | Product Quality: Low
Review Text: "Not good"

Rating: 5 | Vote: 42 | Product Quality: High
Review Text: "It's a killer one... I've been using Sennheiser cx180 for a year and it's damaged, so I got this one for me. So….."
Data Collection Continue..

Rating: 1 | Vote: 52 | Product Quality: Low (0)
Review Text: "It's stops working just after 1 month .. It's a fault in it's design or circuit ..maybe it's not sweat proff.. but I would suggest don't buy it.."

Rating: 1 | Vote: 6 | Product Quality: Low (0)
Review Text: "Not good"

Rating: 5 | Vote: 42 | Product Quality: High (1)
Review Text: "It's a killer one... I've been using Sennheiser cx180 for a year and it's damaged, so I got this one for me. So….."

Rating: 1 | Vote: ?? | Product Quality: Low (0)
Review Text: "Nit a good product! Mine stopped working with in 1st month of purchase! And now iam struggling with warranty claim!"
Data Collection Continue..

Rating: 1 | Word Count: 27 | Vote: 52 | Product Quality: Low (0)
Review Text: "It's stops working just after 1 month .. It's a fault in it's design or circuit ..maybe it's not sweat proff.. but I would suggest don't buy it.."

Rating: 1 | Word Count: 2 | Vote: 6 | Product Quality: Low (0)
Review Text: "Not good"

Rating: 5 | Word Count: 100 | Vote: 42 | Product Quality: High (1)
Review Text: "It's a killer one... I've been using Sennheiser cx180 for a year and it's damaged, so I got this one for me. So….."

Rating: 1 | Word Count: 20 | Vote: ?? | Product Quality: Low (0)
Review Text: "Nit a good product! Mine stopped working with in 1st month of purchase! And now iam struggling with warranty claim!"
Problem Statement

Dataset:
Word Count | Vote
27         | 52
2          | 6
100        | 42
20         | ??

● Problem statement:
  ○ For every review (given in terms of word count in the dataset) that will be posted on the e-commerce website, predict how many votes it will receive.
● Here, we have both input and output given in the dataset, so it is a supervised problem.
● Second, looking at the vote column (the output), it contains continuous values, not categorical values.
  ○ Hence, it is a regression problem.
One-dimensional data

Dataset:
Sl. No. | Vote
1       | 5
2       | 17
3       | 11
4       | 8
5       | 14
6       | 5

● Unfortunately, while storing the data we collected only the vote and not the review word count.
● So, this is the best data we have now, and we have to find what the vote will be for the next review.
● How will you predict the vote count for a future review based only on this data?
Data Visualization

[Scatter plot of the six vote values from the dataset above.]
Best line for the given data

[The same scatter plot with a horizontal line drawn through it at Ŷ = 10.]
“Mean”: Best line for the given data

● With only one variable, and no other information, the best prediction for the next measurement is the mean itself:
      Ŷ = (5 + 17 + 11 + 8 + 14 + 5) / 6 = 10
● The variability in the vote can only be explained by the vote itself.
“Goodness of fit” for the Vote

● A residual is the difference between an observed vote and the line Ŷ = 10.
● Residuals are also known as errors.
● Residuals always add up to zero.
  ○ In this case the residuals above the line sum to +12 and those below the line sum to -12.
Squaring the residuals

Sl. No. | R  | R²
1       | -5 | 25
2       | +7 | 49
3       | +1 | 1
4       | -2 | 4
5       | +4 | 16
6       | -5 | 25

Sum of squared errors (SSE) = 25 + 49 + 1 + 4 + 16 + 25 = 120

● Why square the residuals?
  ○ It makes them positive.
  ○ It emphasizes larger deviations from the mean.
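To make the arithmetic above concrete, here is a minimal sketch in Python that reproduces the mean prediction, the zero-sum residuals, and the SSE of 120 for the vote data (the variable names are ours, not from the slides):

```python
votes = [5, 17, 11, 8, 14, 5]

# With only one variable, the best prediction is the mean itself.
y_hat = sum(votes) / len(votes)          # 10.0

residuals = [v - y_hat for v in votes]   # [-5, 7, 1, -2, 4, -5]
sse = sum(r ** 2 for r in residuals)     # 120.0

print(f"mean = {y_hat}, sum of residuals = {sum(residuals)}, SSE = {sse}")
```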
Important points

● The goal of simple linear regression is to create a linear model that minimizes the sum of squares of the residuals/errors (SSE).
● When conducting simple linear regression with two variables, we will determine how well that line “fits” the data by comparing it to this type of line, where we pretend the second variable does not exist.
● If a two-variable regression model looks like this example, the other variable does nothing to explain the dependent variable.
Important points

● Simple linear regression is really a comparison of two models.
  ○ One is where the independent variable does not even exist.
  ○ The other uses the best-fit regression line.
● If there is only one variable in the dataset, the best prediction is given by the mean of the dependent variable.
● The difference between the best-fit line and an observed value is called the residual (or error).
● The residuals are squared and summed together to give the sum of squared residuals/errors (SSE).
● Simple linear regression is designed to find the best-fitting line through the data, the one that minimizes the SSE.
Linear Regression with independent variable

● Linear regression is a statistical method for finding the relationship between independent and dependent variables.
● Why do we call them Independent and Dependent variables?
  ○ (The example dataset here has years of experience as the input and salary in 1000$ as the output.)
  ○ Our independent variable is independent because we cannot mathematically determine the years of experience.
  ○ But we can determine/predict the salary column values (the dependent variable) based on years of experience.
Linear Regression with independent variable

● If you look at the data, the dependent column values (Salary in 1000$) increase or decrease based on years of experience.
● Total Sum of Squares (SST):
  ○ The SST is the sum of all squared differences between the mean of a sample and the individual values in that sample. It is represented mathematically by the formula

      SST = Σᵢ (yᵢ − ȳ)²
Linear Regression with independent variable

● Total Sum of Squares (SST):
  ○ For the salary dataset, the total sum of squares SST comes out to 5226.19.
Ordinary Least Square (OLS) Linear Regression

● The linear regression model's objective is to make the SSE value as small as possible.
● OLS works on the slope-intercept form of a line to determine the relationship between the independent variables and the dependent variable.
● The slope-intercept equation is
      y = mx + b
  where ‘m’ is the slope, ‘x’ is the independent variable, and ‘b’ is the intercept.
Ordinary Least Square (OLS) Linear Regression

● To use the OLS method, we apply the formulas below to find the slope and intercept of the equation y = mx + b:

      m = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ (xᵢ − x̄)²
      b = ȳ − m·x̄
Ordinary Least Square (OLS) Linear Regression

● Plugging the salary data into these formulas:

      m = 1037.8 / 216.19 = 4.80
      b = 45.44 − 4.80 × 7.56 = 9.15

● Hence, y = mx + b becomes
      y = 4.80x + 9.15
Ordinary Least Square (OLS) Linear Regression

● Let’s compare our OLS method result with MS-Excel.
● Yes, we can test our linear regression best-fit line in Microsoft Excel.

      Our OLS method output       → y = 4.80x + 9.15
      MS-Excel Linear Reg. output → y = 4.79x + 9.18
Ordinary Least Square (OLS) Linear Regression

● Let us calculate SSE again by using our output equation y = 4.79x + 9.18:

      SSE before OLS (predicting the mean): 5226.19
      SSE with OLS: 245.38
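The salary dataset itself appears only in the slide images, but this before/after comparison is easy to reproduce on any dataset. A minimal helper, with hypothetical usage shown in comments (all names are ours):

```python
def sse(xs, ys, predict):
    """Sum of squared errors of a prediction function over a dataset."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys))

# Hypothetical usage, comparing the two models above:
#   sse(xs, ys, lambda x: y_bar)       # "before OLS": always predict the mean
#   sse(xs, ys, lambda x: m * x + b)   # "with OLS":   predict with the fitted line
```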
Linear Regression: on review Dataset

Dataset:
Word Count | Vote
27         | 52
2          | 6
100        | 42
40         | 38
14         | 30
20         | ??

● In our dataset we have only one independent variable (or input, x), “Word count”; therefore, we can use simple linear regression.
● As we know, a linear relationship is always represented by a straight line.
Linear Regression

● Now, let's come back to our main dataset and find the relationship between Word count (X) and Vote (Y).
OLS Linear Regression

● We know that a linear relationship can be obtained by drawing a straight line between Word count (X) and Vote (Y), which is given as:
      Y = mx + c
  where m = slope and c = intercept.
OLS Linear Regression

● First, compute the averages of X and Y:

      Avg (x) = (27 + 2 + 100 + 40 + 14) / 5 = 36.6
      Avg (y) = (52 + 6 + 42 + 38 + 30) / 5 = 33.6
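From these means, the OLS formulas shown earlier give the slope and intercept. A minimal sketch of the full calculation in Python (variable names are ours; the numbers in the comments are what the formulas yield for this data, not values from the slides):

```python
word_count = [27, 2, 100, 40, 14]   # X
votes      = [52, 6, 42, 38, 30]    # Y

x_bar = sum(word_count) / len(word_count)   # 36.6
y_bar = sum(votes) / len(votes)             # 33.6

# OLS closed form: m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², c = ȳ − m·x̄
num = sum((x - x_bar) * (y - y_bar) for x, y in zip(word_count, votes))
den = sum((x - x_bar) ** 2 for x in word_count)
m = num / den            # ≈ 0.24
c = y_bar - m * x_bar    # ≈ 24.77

print(f"Y = {m:.2f}x + {c:.2f}")
# The unseen review with word count 20 would then be predicted to
# receive roughly m*20 + c ≈ 29.6 votes.
```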
OLS Linear Regression

[Plot of the review dataset with the fitted straight line drawn through the points.]
OLS Linear Regression

But, is it the best line?
Linear Regression: choosing best line

● But, is it the best line?
  ○ We can get multiple lines if we change the values of m and c in the equation Y = mx + c.
● To get the best line we will use the gradient descent algorithm.
● Idea: choose m and c such that f(x) is close to y for our training examples (x, y).
● Therefore, we need to minimize the difference between f(x) and y.
Re-writing hypothesis for finding best line

● The best regression line is the one for which we get the least error.
● Objective: of all possible lines, find the one that minimizes the distance between the predicted y values (on the line) and the true y values.

      Hypothesis Function: h(x) = w0 + w1x
Cost function: Mean Squared Error

● Objective: of all possible lines, find the one that minimizes the distance between the predicted y values (on the line) and the true y values.

      Hypothesis Function: h(x) = w0 + w1x

● In other words, find the w0 and w1 that minimize the cost function J(w) over our n training examples (xᵢ, yᵢ):

      J(w0, w1) = (1/2n) Σᵢ (h(xᵢ) − yᵢ)²

  (The factor 1/2 is conventional; it cancels when taking derivatives.)
Cost function (MSE): Intuition

● As a simplification for the moment, let’s set w0 to be zero.
● This means that our line will pass through the origin.
● Our hypothesis is then h(x) = 0 + w1x = w1x.
● Our cost function is then

      J(w1) = (1/2n) Σᵢ (w1xᵢ − yᵢ)²

● Our goal is to find the w1 that minimizes J(w1).
Cost function (MSE): Intuition

● Suppose we have the following three training examples (shown as a plot on the slides), and consider w0 = 0.
● Let’s consider the cost J(w1) associated with different values of w1, e.g., w1 = 0.5, w1 = 0, and w1 = 2.
● [For each choice of w1, the slides plot the line h(x) = w1x against the training points, together with the corresponding point on the J(w1) curve.]
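The three training examples themselves appear only in the slide images. As an illustration, the sketch below assumes the classic toy set {(1, 1), (2, 2), (3, 3)} (our assumption, not from the slides) and evaluates J(w1) at the values used above:

```python
# Assumed toy training set (not from the slides): y = x exactly.
data = [(1, 1), (2, 2), (3, 3)]

def cost(w1, data):
    """Squared-error cost J(w1) for h(x) = w1 * x (with w0 fixed at 0)."""
    n = len(data)
    return sum((w1 * x - y) ** 2 for x, y in data) / (2 * n)

for w1 in [0.0, 0.5, 1.0, 2.0]:
    print(f"w1 = {w1}: J(w1) = {cost(w1, data):.3f}")
# J is 0 at w1 = 1 (the line through the points) and grows as w1 moves
# away in either direction, tracing out a bowl-shaped curve.
```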
Cost function (MSE): of two parameters (W0, W1)

[The slides plot the cost function J(w0, w1) over both parameters together.]
Gradient Descent

● We want to find the line that best fits the data, i.e., we want to find the w0 and w1 that minimize the cost J(w0, w1).

Gradient descent algorithm:

  Step 1: Start with some w0 and w1 (e.g., w0 = 0 and w1 = 0).

  Step 2: Keep changing w0 and w1 to reduce the cost J(w0, w1) until hopefully we end up at a minimum.
Gradient Descent; W0 = 0

● We want to find the line (passing through the origin) that best fits the data, i.e., we want to find the w1 that minimizes the cost J(w1).

Gradient descent algorithm:

  Step 1: Start with some w1 (e.g., w1 = 0).

  Step 2: Keep changing w1 to reduce the cost J(w1) until hopefully we end up at a minimum.
Gradient Descent: choosing learning rate

[Figures on the slides illustrate the descent steps and the effect of the learning rate α: if α is too small, convergence is slow; if α is too large, the steps can overshoot the minimum and fail to converge.]
Gradient Descent

Initialize: w0 = 0 and w1 = 0

Repeat until convergence (updating w0 and w1 simultaneously):

      wj := wj − α · ∂J(w0, w1)/∂wj      for j = 0, 1

where α is the learning rate.
Batch Gradient Descent

● With batch gradient descent, we consider all data points each time we update a weight parameter.

Initialize: w0 = 0 and w1 = 0

Repeat until convergence:

      w0 := w0 − α · (1/n) Σᵢ (h(xᵢ) − yᵢ)
      w1 := w1 − α · (1/n) Σᵢ (h(xᵢ) − yᵢ) · xᵢ
Stochastic Gradient Descent

● With stochastic gradient descent, we consider a single data point each time we update a weight parameter.

Initialize: w0 = 0 and w1 = 0

Repeat until convergence, iterating over each data point (x, y):

      w0 := w0 − α · (h(x) − y)
      w1 := w1 − α · (h(x) − y) · x
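A minimal batch-gradient-descent sketch in Python, run here on the review dataset from earlier (the learning rate, iteration count, and variable names are our choices, not from the slides):

```python
# Review dataset from earlier: word count (x) vs. votes (y).
xs = [27, 2, 100, 40, 14]
ys = [52, 6, 42, 38, 30]

def batch_gradient_descent(xs, ys, alpha=0.0003, iters=200_000):
    """Fit h(x) = w0 + w1*x by batch gradient descent on the
    squared-error cost J = (1/2n) * sum((h(x) - y)**2)."""
    w0, w1 = 0.0, 0.0
    n = len(xs)
    for _ in range(iters):
        errors = [(w0 + w1 * x) - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / n
        grad1 = sum(e * x for e, x in zip(errors, xs)) / n
        w0 -= alpha * grad0   # update both weights simultaneously
        w1 -= alpha * grad1
    return w0, w1

w0, w1 = batch_gradient_descent(xs, ys)
print(f"h(x) = {w0:.2f} + {w1:.2f}x")
# As iterations grow this should approach the OLS closed-form
# answer computed earlier (w1 ≈ 0.24, w0 ≈ 24.8).
```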
Multivariate Linear Regression

● Univariate linear regression uses a single input feature:
      h(x) = w0 + w1x
● Multivariate linear regression extends the hypothesis to several input features x1, …, xd:
      h(x) = w0 + w1x1 + w2x2 + … + wdxd
● The cost function and the gradient descent updates carry over, with one weight per feature.
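A sketch of a multivariate fit using numpy's least-squares solver (np.linalg.lstsq is a standard numpy function; the two-feature toy matrix is our own invention):

```python
import numpy as np

# Toy design: two features per example (hypothetical numbers).
X = np.array([[27, 1], [2, 5], [100, 3], [40, 2], [14, 4]], dtype=float)
y = np.array([52, 6, 42, 38, 30], dtype=float)

# Prepend a column of ones so w0 acts as the intercept.
Xb = np.hstack([np.ones((X.shape[0], 1)), X])
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
print(w)   # [w0, w1, w2]
```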
Multivariate Linear Regression: feature scaling

● Features may have very different ranges!
● Don’t forget to perform feature scaling, e.g., subtract each feature’s mean and divide by each feature’s standard deviation.
● Then the features will have the same scale.
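A small sketch of the standardization described above (function and variable names are ours):

```python
def standardize(column):
    """Scale a feature to zero mean and unit standard deviation."""
    n = len(column)
    mean = sum(column) / n
    std = (sum((v - mean) ** 2 for v in column) / n) ** 0.5
    return [(v - mean) / std for v in column]

word_count = [27, 2, 100, 40, 14]
print(standardize(word_count))   # values are now centered around 0
```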
Polynomial Regression

● When a straight line does not fit the data well, we can fit a polynomial in x instead:
      h(x) = w0 + w1x + w2x² + … + wkxᵏ
● This is still linear in the weights: treating x, x², …, xᵏ as separate features reduces polynomial regression to multivariate linear regression.
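A minimal sketch using numpy's polynomial fitting (np.polyfit and np.poly1d are standard numpy functions; the data is the review dataset, and the degree choice is ours):

```python
import numpy as np

xs = np.array([27, 2, 100, 40, 14], dtype=float)   # word counts
ys = np.array([52, 6, 42, 38, 30], dtype=float)    # votes

# Degree-2 polynomial fit: returns coefficients [w2, w1, w0].
coeffs = np.polyfit(xs, ys, deg=2)
model = np.poly1d(coeffs)

print(model(20))   # predicted votes for a 20-word review
```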
Overfitting and Underfitting

● Overfitting and underfitting are two crucial concepts in machine learning and are among the most prevalent causes of poor performance of a machine learning model.
● We will look at overfitting and underfitting for the regression problem.
Overfitting and Underfitting

● Underfitting:
  ○ When a model has not learned the patterns in the training data well and is unable to generalize to new data, it is known as underfitting.
  ○ An underfit model has poor performance on the training data and will produce unreliable predictions.
  ○ Underfitting occurs due to high bias and low variance.

      Underfit hypothesis: w1x + b
Overfitting and Underfitting

● Optimum fit:
  ○ As a second variation of the model, if you fit a quadratic function to the data (treating x and x² as two features), then when you fit the parameters w1 and w2 you can get a curve that fits the data somewhat better.
  ○ When your learning algorithm also does well on examples that are not in the training set, that is called generalization.

      Well-fit hypothesis: w1x + w2x² + b
Overfitting and Underfitting

● Overfit:
  ○ The problem is that if the model learns the training data too well, it fails to capture the true relationship between input and output, and thus gives poor validation accuracy (results on unseen data) even though it exhibits good accuracy on the training data.
  ○ This is called overfitting and is a very common problem in machine learning.

      Overfit hypothesis: w1x + w2x² + w3x³ + w4x⁴ + b
Overfitting and Underfitting
Overfitting may occur when we have
too many features and the learned
hypothesis fits the training data very
well but fails to generalize to new
examples.
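To illustrate the contrast described above, here is a small sketch (with synthetic data of our own making, not from the slides) that fits polynomials of increasing degree and compares training error with error on held-out points:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a noisy quadratic trend (our assumption).
x = np.linspace(0, 5, 12)
y = 3 + 2 * x - 0.5 * x**2 + rng.normal(0, 0.5, x.size)

train_x, train_y = x[::2], y[::2]   # every other point for training
test_x, test_y = x[1::2], y[1::2]   # held-out points for validation

for deg in (1, 2, 4):
    model = np.poly1d(np.polyfit(train_x, train_y, deg))
    train_sse = np.sum((model(train_x) - train_y) ** 2)
    test_sse = np.sum((model(test_x) - test_y) ** 2)
    print(f"degree {deg}: train SSE = {train_sse:.2f}, test SSE = {test_sse:.2f}")
# Typically, degree 1 underfits (both errors high), degree 2 fits well,
# and higher degrees drive the train SSE toward zero while the test SSE
# tends to inflate: overfitting.
```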
