Linear Regression: Normal Equation and Gradient Descent
Figure 1: Example of a House (Area vs Cost) Data Set
The best way to model this relationship is to plot a graph between the
cost and the area of the house. The area of the house is represented on
the X-axis, while the cost is represented on the Y-axis. What will
regression do? It will try to fit a line through these points. The left
image shows the plotted points and the right image shows the line
fitting those points.
Figure 2: Plotting Area of the House (X-axis) vs Cost of the House (Y-axis)
If we want to predict the cost of a house with an area of 1100 sq. feet,
we can read it off the fitted line, as shown in the image below. As you
can see, the cost of an 1100 sq. feet house is about 35.
Figure 3: Finding the Cost of a House When Its Area Is 1100 sq. feet
Figure 5: Hypothesis h(x)
h(x) represents the line mathematically. Since for now we have only one
input feature, the equation will be linear, and it resembles the line
equation “y = mx + c”. Now we will see what effect choosing the values
of theta has on the line.
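Written out, the hypothesis from Figure 5 is the standard single-feature form, with theta_0 playing the role of the intercept c and theta_1 the slope m:

```latex
h_\theta(x) = \theta_0 + \theta_1 x
```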
Figure 6: The value of theta affects the slope and intercept of the line, as you can see in the left
and right images.
Why linear? Linear is the basic building block. We will get into more
complex problems later, which may require the use of non-linear
functions or higher-degree polynomials.
How do we best fit our data? To best fit our data, we have to choose the
values of the thetas such that the difference between h(x) and y is
minimum. To measure this we will define an error function, as you can
see in the below right image.
Figure 7: Error Function Derivation (source: www.google.com/images)
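Written out, the squared-error function derived in Figure 7 is the standard cost over the m training points (the 1/2 factor just simplifies the derivative later):

```latex
J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2
```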
Figure 8.1: Plotting h(x) when theta_0 = 0 and theta_1 = 1
Figure 8.2: Calculation of Error for Figure 8.1
Figure 8.3: Plotting h(x) when theta_0 = 0 and theta_1 = 0.5
• First, with theta_0 = 0 and theta_1 = 1, the line passes exactly
through the given points, so the error is 0.
• Then we repeated the same process with theta_0 = 0 and theta_1 = 0.5,
and the error we got is 0.58, as you can also see in the image. The
line is not a good fit to the given points.
Figure 8.4: Calculation of Error for Figure 8.3
Figure 8.5: Graph between the Error Function and Theta_1
• Now, if we take more values of theta, we will get something like the
hand-drawn diagram (bottom right). As we can see, the minimum is at
theta_1 = 1; the short sketch below reproduces these numbers.
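As a quick check, here is a minimal Python sketch of the error computation above. It assumes the classic three-point dataset (1, 1), (2, 2), (3, 3), which is not stated explicitly in the text but reproduces the 0 and 0.58 values shown in Figures 8.2 and 8.4:

```python
# Squared-error cost J(theta_1) for h(x) = theta_1 * x (theta_0 fixed at 0).
# The dataset is an assumption: the classic points (1,1), (2,2), (3,3).
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

def cost(theta_1):
    m = len(xs)
    return sum((theta_1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

print(cost(1.0))   # 0.0   -- the line fits the points exactly
print(cost(0.5))   # 0.583 -- matches the ~0.58 error in Figure 8.4
```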
Figure 9: 3D Plot While Considering Both Thetas (source: www.google.com/images)
In the below images you will see the contour plots of the error
function. What do these ellipses represent in the image?
• Any point on the same ellipse will give us the same value of the error
function J. For example, the three points marked in pink in the below
left image all have the same value of the error function.
• The red point describes the hypothesis: for the left image you get
theta_0 (intercept) = 800 and theta_1 (slope) = -0.5, and in the
below right image theta_0 (intercept) = 500 and theta_1 (slope) = 0,
so you get a line parallel to the x-axis.
• In the left image the red point is far from the center of the ellipses
and the line is not a good fit, but in the right image the red point
is closer to the center of the ellipses and the line is a better fit
than in the left image. So we can conclude that the center of the
ellipses is the minimum, i.e. the optimal values of the thetas, which
give the best fit for the given data points. A small plotting sketch
follows the figure below.
Figure 10: Contour Plots (source: www.google.com/images)
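For readers who want to reproduce a plot like Figure 10, here is a minimal matplotlib sketch. The area/cost numbers are hypothetical stand-ins for the house dataset, but the contours of J over (theta_0, theta_1) behave the same way:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical area/cost data, standing in for the house dataset.
x = np.array([500.0, 800.0, 1000.0, 1200.0, 1500.0])
y = np.array([20.0, 28.0, 33.0, 38.0, 45.0])
m = len(x)

# Evaluate J(theta_0, theta_1) on a grid of candidate parameters.
t0 = np.linspace(-10, 30, 200)
t1 = np.linspace(0.0, 0.06, 200)
T0, T1 = np.meshgrid(t0, t1)
J = np.zeros_like(T0)
for xi, yi in zip(x, y):
    J += (T0 + T1 * xi - yi) ** 2
J /= 2 * m

# Each contour line connects parameter pairs with equal error, like the
# ellipses in Figure 10; the minimum sits at the center of the ellipses.
plt.contour(T0, T1, J, levels=30)
plt.xlabel("theta_0 (intercept)")
plt.ylabel("theta_1 (slope)")
plt.show()
```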
We have written our cost function, but how do we minimize it? We have
two solutions to the problem:
• Gradient Descent
• Normal Equations
Gradient Descent
Imagine we are standing at the top of a hill and we look 360 degrees
around us. We want to take a small step in the direction that will take
us downhill. The best choice would be the direction of steepest
descent. Once we reach a new point, we follow the same steps until we
reach the ground, as in the image below.
Figure 11: Gradient Descent (source: https://codesachin.wordpress.com/tag/gradient-descent/)
There is one more thing. Are you sure that you will always reach the
same minimum? With gradient descent you can't be sure. Let's take the
mountain problem: if you start a few steps to your right, it is
entirely possible that you end up at a completely different minimum, as
shown in the image below.
Figure 12: Problem with Gradient Descent (source: https://codesachin.wordpress.com/tag/gradient-descent/)
• Alpha is the learning rate, which describes how big a step you take.
• The derivative gives you the slope of the line tangent to the cost at
the current 'theta', which can be either positive or negative; its
sign tells us whether to increase or decrease 'theta'. The update
rule below makes this precise.
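Putting those two bullets together, the update derived in Figure 13 is the standard simultaneous update for each parameter, repeated until convergence:

```latex
\theta_j := \theta_j - \alpha \, \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)
```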
Figure 13: Gradient Descent Derivation
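Here is a minimal sketch of batch gradient descent for the one-feature case, assuming the same three-point dataset as before; the alpha and iteration count are arbitrary illustrative choices, not values from the article:

```python
# Batch gradient descent for h(x) = theta_0 + theta_1 * x.
# Dataset and hyperparameters are assumptions for illustration.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]
m = len(xs)

theta_0, theta_1 = 0.0, 0.0
alpha = 0.1  # learning rate: how big a step we take

for _ in range(1000):
    # Partial derivatives of J with respect to each theta.
    errors = [theta_0 + theta_1 * x - y for x, y in zip(xs, ys)]
    grad_0 = sum(errors) / m
    grad_1 = sum(e * x for e, x in zip(errors, xs)) / m
    # Simultaneous update of both parameters.
    theta_0 -= alpha * grad_0
    theta_1 -= alpha * grad_1

print(theta_0, theta_1)  # approaches (0, 1), the best fit found earlier
```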
. . .
Normal Equations
While gradient descent is an iterative process, normal equations find
the optimum solution in one go. They use matrix multiplication. The
formulas and notations are explained in the images. The below right
image shows what X and y will be in our example. The first column of X
is always 1, because it is multiplied by theta_0, which we know is the
intercept.
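In matrix form, the closed-form solution shown in Figure 14 is:

```latex
\theta = (X^\top X)^{-1} X^\top y
```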
Figure 14: Normal Equation with Example
• The ‘theta’ matrix
• The ‘X’ matrix
• The hypothesis is expanded
• Expand the error equation, then take the derivative with respect to
theta and set it equal to 0. A runnable sketch follows the figure below.
Figure 17: Normal Equation Derivation Part 2
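A minimal NumPy sketch of the normal equation, again assuming the toy dataset; np.linalg.solve is used instead of an explicit inverse for numerical stability:

```python
import numpy as np

# Toy dataset (an assumption, as before).
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])

# Design matrix X: first column of ones multiplies theta_0 (the intercept).
X = np.column_stack([np.ones_like(x), x])

# theta = (X^T X)^{-1} X^T y, solved as a linear system.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # [0. 1.] -- intercept 0, slope 1, found without iterating
```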
. . .
Comparison Between Gradient Descent
and Normal Equations
• We need to choose alpha and an initial value of theta in the case of
gradient descent, but with normal equations we don't have to choose
alpha or an initial theta.
. . .