Derivation of Normal Equations
1 Linear Regression
In these notes I want to explain how the normal equations for linear regression are derived. To do this, I will be using vector and matrix calculus. Please remember that you can find these details on the internet on different websites such as Wikipedia; the idea is the same but the symbols might be different. I am going to stay consistent with my course contents.
We have defined the following hypothesis function:
h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \dots + \theta_n x_n,   (1)
where each x is an n-dimensional vector, or observation. Our goal is to minimise the cost function J(θ)
using least squares as follows:
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2,   (2)
where x^{(i)} is the ith observation out of m samples and y^{(i)} is the ground truth (class label) for that
ith observation. While doing these calculations, the rule of thumb is to check the dimensions of the resulting
vectors. This gives you a good idea of whether you have done the calculus correctly.
So here I redefine the vectors in our problem, just as a reminder. The regression coefficients in our problem
are θ, defined as
\theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{bmatrix} \in \mathbb{R}^{n+1}   (3)
Each of the m inputs x is a vector of n + 1 dimensions, where we add x_0 = 1 for convenience. Therefore our
hypothesis function becomes:
h_\theta(x) = \theta^T x,   (4)
where this equation represents the dot product of the two vectors. We define the 'design matrix', referred
to as X, using the following notation:
X = \begin{bmatrix} \cdots & x^{(1)T} & \cdots \\ \cdots & x^{(2)T} & \cdots \\ & \vdots & \\ \cdots & x^{(m)T} & \cdots \end{bmatrix}   (5)
as a matrix of m rows, in which the ith row is the ith sample (the vector x^{(i)T}). With this, we can rewrite the
least-squares cost as follows, replacing the explicit sum by matrix multiplication:
J(\theta) = \frac{1}{2m} (X\theta - y)^T (X\theta - y)   (6)
Using some matrix transpose identities, we can simplify this a bit. I will throw away the fraction part \frac{1}{2m},
since we are going to equate the derivative to zero anyway. So here are the equations:

J(\theta) \propto (X\theta - y)^T (X\theta - y) = (X\theta)^T (X\theta) - (X\theta)^T y - y^T (X\theta) + y^T y   (7)

= \theta^T X^T X \theta - (X\theta)^T y - y^T (X\theta) + y^T y   (8)

Note that Xθ is a vector, and so is y. So when we take the dot product of one with the other, it does not matter what the
order is (as long as the dimensions work out), i.e. (X\theta)^T y = y^T (X\theta). So we can further simplify:

J(\theta) \propto \theta^T X^T X \theta - 2 (X\theta)^T y + y^T y   (9)
This is the matrix representation of the cost function J that we wished to minimise. Remember that with the
system of linear equations we took the partial derivative of the cost function and set it equal to
zero, and using this derivative we ran the gradient descent algorithm to find the values of θ that minimise
this cost function.
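As a quick sanity check of (6) and (9), the following sketch (again on made-up data) confirms that the matrix form of the cost matches the explicit sum, and that the simplified expression (9) is just 2m times the cost in (6):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3                       # made-up sizes for illustration
X = rng.normal(size=(m, n))       # design matrix (each row is one sample)
y = rng.normal(size=m)
theta = rng.normal(size=n)

# Explicit sum, equation (2).
J_sum = np.sum((X @ theta - y) ** 2) / (2 * m)

# Matrix form, equation (6).
r = X @ theta - y
J_mat = (r @ r) / (2 * m)

# Expanded form, equation (9), without the 1/(2m) factor.
J_expanded = theta @ X.T @ X @ theta - 2 * (X @ theta) @ y + y @ y

print(np.isclose(J_sum, J_mat))               # True
print(np.isclose(J_expanded, 2 * m * J_mat))  # True: (9) equals 2m times (6)
```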
Now we have to take partial derivatives with respect to vectors, which is slightly uncomfortable, but I will
try to explain it.
Let's work out the derivatives in terms of vectors. Let's consider the following function:
f(v) = a^T v   (10)
We know how to take the partial derivative of the above function by expanding it. So let's define the partial
derivatives first:
\frac{\partial}{\partial v} f = \begin{bmatrix} \frac{\partial f}{\partial v_1} \\ \frac{\partial f}{\partial v_2} \\ \vdots \\ \frac{\partial f}{\partial v_n} \end{bmatrix}   (11)
f(v) = a^T v   (12)

= a_1 v_1 + a_2 v_2 + \dots + a_n v_n   (13)

\frac{\partial f}{\partial v_1} = a_1, \quad \frac{\partial f}{\partial v_2} = a_2, \quad \dots, \quad \frac{\partial f}{\partial v_n} = a_n   (14)

\frac{\partial}{\partial v} f = a.   (16)
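To convince yourself of (16) numerically, here is a small sketch (assuming NumPy, with made-up vectors) that compares a finite-difference gradient of f(v) = a^T v against a:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
a = rng.normal(size=n)
v = rng.normal(size=n)

f = lambda v: a @ v   # f(v) = a^T v, equation (10)

# Central finite-difference gradient, component by component.
eps = 1e-6
grad_fd = np.array([(f(v + eps * np.eye(n)[i]) - f(v - eps * np.eye(n)[i])) / (2 * eps)
                    for i in range(n)])

print(np.allclose(grad_fd, a))   # True: d(a^T v)/dv = a, equation (16)
```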
Let's keep working through the partial derivatives with respect to vectors. Let's define

P(\theta) = 2 (X\theta)^T y.   (17)

Here, please verify that the dimensions of the above equation are consistent. Now let's expand the above
equation:
P(\theta) = 2 \left( \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ & & \vdots & \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix} \begin{bmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_n \end{bmatrix} \right)^T \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}   (18)

P(\theta) = 2 \begin{bmatrix} x_{11}\theta_1 + \cdots + x_{1n}\theta_n \\ x_{21}\theta_1 + \cdots + x_{2n}\theta_n \\ \vdots \\ x_{m1}\theta_1 + \cdots + x_{mn}\theta_n \end{bmatrix}^T \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}   (19)
We can write the above equations in a much more compact form using summation signs:
P(\theta) = 2 \sum_{r=1}^{m} (x_{r1}\theta_1 + \cdots + x_{rn}\theta_n) y_r   (21)

= 2 \sum_{r=1}^{m} \sum_{c=1}^{n} y_r x_{rc} \theta_c   (22)
\frac{\partial P}{\partial \theta_1} = 2 (x_{11} y_1 + \cdots + x_{m1} y_m)

\frac{\partial P}{\partial \theta_2} = 2 (x_{12} y_1 + \cdots + x_{m2} y_m)

\vdots

\frac{\partial P}{\partial \theta_n} = 2 (x_{1n} y_1 + \cdots + x_{mn} y_m)   (23)
So we can write the above equations in matrix form as follows:

\frac{\partial}{\partial \theta} P = 2 X^T y   (24)

\frac{\partial}{\partial \theta} P = \frac{\partial}{\partial \theta} \left( 2 (X\theta)^T y \right) = 2 X^T y   (25)
Take a moment to convince yourself this is true. It is just collecting the individual components of X (transposed) into a
matrix and the individual components of y into a vector. Since X is an m-by-n matrix and y is an m-by-1
column vector, the dimensions of X^T y work out and the result is an n-by-1 column vector.
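Here is a similar hedged check (made-up data, NumPy) that the gradient of P(θ) = 2(Xθ)^T y is indeed 2X^T y:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 6, 3
X = rng.normal(size=(m, n))
y = rng.normal(size=m)
theta = rng.normal(size=n)

P = lambda t: 2 * (X @ t) @ y   # P(theta) = 2 (X theta)^T y, equation (17)

# Central finite-difference gradient of P at theta.
eps = 1e-6
grad_fd = np.array([(P(theta + eps * np.eye(n)[i]) - P(theta - eps * np.eye(n)[i])) / (2 * eps)
                    for i in range(n)])

print(np.allclose(grad_fd, 2 * X.T @ y))   # True: dP/dtheta = 2 X^T y, equations (24)-(25)
```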
So we have just computed the second term of the vector derivative of J. Now let's go back to the full definition
of J and see how to compute the derivative of its first term. Let's call it Q:
Q(\theta) = \theta^T X^T X \theta   (26)
This will be slightly more complex, but here we go. Please note that there is a transpose of X.
Q(\theta) = \begin{pmatrix} \theta_1 & \cdots & \theta_n \end{pmatrix} \begin{bmatrix} x_{11} & x_{21} & \cdots & x_{m1} \\ x_{12} & x_{22} & \cdots & x_{m2} \\ & & \vdots & \\ x_{1n} & x_{2n} & \cdots & x_{mn} \end{bmatrix} \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ & & \vdots & \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix} \begin{bmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_n \end{bmatrix}   (27)
The two matrices in the centre multiply to give the 'X-squared' matrix X^T X, which is n-by-n. The element in row r and column c
of this square matrix is:
\sum_{i=1}^{m} x_{ir} x_{ic},   (28)
and this X-squared matrix is symmetric. Let's call each element of this squared matrix X^2_{rc}. Now the
above equation becomes:
Q(\theta) = \begin{pmatrix} \theta_1 & \cdots & \theta_n \end{pmatrix} \begin{bmatrix} X^2_{11}\theta_1 + \cdots + X^2_{1n}\theta_n \\ X^2_{21}\theta_1 + \cdots + X^2_{2n}\theta_n \\ \vdots \\ X^2_{n1}\theta_1 + \cdots + X^2_{nn}\theta_n \end{bmatrix}   (29)
Q(\theta) = \theta_1 (X^2_{11}\theta_1 + \cdots + X^2_{1n}\theta_n) + \theta_2 (X^2_{21}\theta_1 + \cdots + X^2_{2n}\theta_n) + \cdots + \theta_n (X^2_{n1}\theta_1 + \cdots + X^2_{nn}\theta_n)   (30)
\frac{\partial Q}{\partial \theta_1} = (2\theta_1 X^2_{11} + \theta_2 X^2_{12} + \cdots + \theta_n X^2_{1n}) + \theta_2 X^2_{21} + \cdots + \theta_n X^2_{n1}   (31)
Remember that X^2 (that is, X^T X) is a symmetric matrix, so X^2_{12} = X^2_{21}. Hence,

\frac{\partial Q}{\partial \theta_1} = 2\theta_1 X^2_{11} + 2\theta_2 X^2_{12} + \cdots + 2\theta_n X^2_{1n}   (32)
\frac{\partial Q}{\partial \theta} = 2 X^2 \theta = 2 X^T X \theta   (33)
Hence,
\frac{\partial}{\partial \theta} \theta^T X^T X \theta = 2 X^T X \theta.   (34)
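And the corresponding numerical check for (34), again on made-up data:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 6, 3
X = rng.normal(size=(m, n))
theta = rng.normal(size=n)

Q = lambda t: t @ X.T @ X @ t   # Q(theta) = theta^T X^T X theta, equation (26)

# Central finite-difference gradient of Q at theta.
eps = 1e-6
grad_fd = np.array([(Q(theta + eps * np.eye(n)[i]) - Q(theta - eps * np.eye(n)[i])) / (2 * eps)
                    for i in range(n)])

print(np.allclose(grad_fd, 2 * X.T @ X @ theta))   # True: dQ/dtheta = 2 X^T X theta, equation (34)
```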
Now let's look back at the matrix form of the cost function J given in (9) and insert the partial derivatives.
\frac{\partial J}{\partial \theta} = \frac{\partial}{\partial \theta} \left( \theta^T X^T X \theta - 2 (X\theta)^T y + y^T y \right)   (35)

= 2 X^T X \theta - 2 X^T y = 0   (36)
or

2 X^T X \theta = 2 X^T y   (37)

\theta = (X^T X)^{-1} X^T y   (38)
which is the normal equation that we use to calculate θ in closed form using matrix notation.
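As an illustration of (38), the following sketch solves the normal equation on made-up data, verifies that the gradient (36) vanishes at the solution, and cross-checks against NumPy's least-squares routine. (Solving the linear system (37) directly is numerically preferable to forming the inverse explicitly.)

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 20, 4
X = rng.normal(size=(m, n))
y = rng.normal(size=m)

# Normal equation (38): theta = (X^T X)^{-1} X^T y,
# computed by solving the linear system (37) rather than inverting X^T X.
theta = np.linalg.solve(X.T @ X, X.T @ y)

# The gradient (36) should vanish at the minimiser.
grad = 2 * X.T @ X @ theta - 2 * X.T @ y
print(np.allclose(grad, 0))

# Cross-check against NumPy's least-squares solver.
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(theta, theta_lstsq))
```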
Now that you know how to calculate partial derivatives with respect to vectors and matrices, let's have a look at how to
derive the normal equations for linear regression using standard derivative rules. Let's recap the cost function for linear regression:
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2,   (39)
As before, we can use some matrix transpose identities to write this cost in matrix form (the constant factor \frac{1}{2m} does not affect where the minimum is, since we are going to set the derivative to zero anyway):

J(\theta) = \frac{1}{2m} \left( \theta^T X^T X \theta - 2 y^T (X\theta) + y^T y \right).
Now our goal is simply to find the values of θ that minimise the cost function J(θ). To do that, we
use the concepts from our FCS: to minimise a function, all we need to do is take the derivative of
the function that we want to minimise and set it equal to zero. We just have to remember one thing: here
we are dealing with vectors and matrices, not scalar variables. So let's have a look at some of the rules that we
are going to use.
If we take the derivative of a linear function of θ, f(θ) = Aθ, with respect to (w.r.t.) θ, then the partial derivative
comes out to be:

\frac{\partial}{\partial \theta} (A\theta) = A^T   (44)
Secondly, for a symmetric matrix A (as X^T X is),

\frac{\partial}{\partial \theta} (\theta^T A \theta) = 2 A^T \theta   (45)
Applying rules (44) and (45) to the matrix form of J(θ):

\frac{\partial J(\theta)}{\partial \theta} = \frac{\partial}{\partial \theta} \, \frac{1}{2m} \left( \theta^T X^T X \theta - 2 y^T (X\theta) + y^T y \right)   (46)

= \frac{1}{2m} \left( 2 X^T X \theta - 2 X^T y + 0 \right)   (47)
All that we need to do now is to set the first derivative of the function equal to zero and solve for θ.
\frac{1}{2m} \left( 2 X^T X \theta - 2 X^T y + 0 \right) = 0,   (48)

2 X^T X \theta = 2 X^T y,   (49)

X^T X \theta = X^T y,   (50)

\theta = (X^T X)^{-1} X^T y,   (51)
which is the equation for solving θ. This is called the Normal Equation for linear regression.
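Putting the whole recipe together, here is a minimal end-to-end sketch on made-up data and with hypothetical variable names: it builds the design matrix with the x_0 = 1 column as in (5), solves the normal equation (51), and uses the fitted θ for prediction:

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 50, 2
features = rng.normal(size=(m, n))        # raw inputs, made up for illustration
X = np.c_[np.ones(m), features]           # design matrix with x0 = 1 prepended, as in (5)

true_theta = np.array([1.0, -2.0, 0.5])   # hypothetical "true" coefficients [theta_0, theta_1, theta_2]
y = X @ true_theta + 0.01 * rng.normal(size=m)   # noisy labels

# Normal equation (51): theta = (X^T X)^{-1} X^T y, solved as a linear system.
theta = np.linalg.solve(X.T @ X, X.T @ y)

# The fitted theta is used with the hypothesis h_theta(x) = theta^T x for prediction.
x_new = np.array([1.0, 0.3, -1.2])        # new input, with x0 = 1 included
print(theta)           # should be close to true_theta
print(x_new @ theta)   # predicted value for the new input
```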