0% found this document useful (0 votes)

26 views12 pages

regression

Uploaded by

Chandan kumar Mohanta

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

26 views12 pages

regression

Uploaded by

Chandan kumar Mohanta

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 12

M.Sc.

Applications of Mathematics
Project : Application of Least Squares Method in Regression Analysis
Presented by : Tamal Kanti Panja, Shouvik Sardar, Suvadip Roy

INTRODUCTION :

• What is Regression?
Regression is a statistical measure that attempts to determine the strength of the relationship
between one dependent variable (usually denoted by Y) and a single or a series of other changing
variables (known as independent variables).

• What is Regression Analysis?

In statistics, Regression Analysis is a statistical technique for estimating the relationships
among variables. It includes many techniques for modeling and analyzing several variables, when
the focus is on the relationship between a dependent variable and one or more independent variables.
More specifically, regression analysis helps us to understand how the typical value of the dependent
variable changes when any one of the independent variables is varied, while the other independent
variables are held fixed.

• What is the objective?

In Regression Analysis, our objective is to get the exact form of the approximate relationship
between a dependent variable and one or more independent variables, which is widely used for pre-
diction and forecasting purposes.
The idea behind Regression Analysis is to verify that a function Y = f (X) fits a given data
set {(x1 , y1 ), (x2 , y2 ), . . . . . . , (xn , yn )} after obtaining the parameters that identify the function f (X).
The value X represents one or more independent variables.
The function f (X) can be, for example, a linear function, i.e.
Y = mX + b or Y = b0 + b1 X1 + b2 X2 + · · · · · · + bm Xm ,
a polynomial function, i.e.
Y = b0 + b1 X + b2 X 2 + · · · · · · + bp X p ,
or other non-linear functions such as :
(i) Exponential function, i.e. Y = abX ,
(ii) Logistic function, i.e.
e(b0 +b1 X) 1
Y = (b +b X)
= −(b0 +b0 X) ,
e 0 1 +1 e +1
(iii) Trigonometric function, i.e. Y = g(sin(bX)) + E, where g(.) is a known polynomial function
and E stands for error.

Note: We use the above non-linear functions in some particular phenomena such as :
(i) Exponential function is used in census and for population prediction.
(ii) Logistic function is used for predicting the outcome of a categorical data. Categorical data
refers to a data which has more than one source of variation.
(iii) Trigonometric function is used in case of share market purpose.
Though there are various types of regression model, in general purposes we emphasize on linear
and polynomial equations only.
• Why linear and polynomial equations only?
We use linear and polynomial equations because
(i) Linear and polynomial equations are linear in parameters. So these are easier to deal with
than any other types of equations.
(ii) Analysis of linear and polynomial equations are less time, money and labour consuming.
(iii) In most of the cases, the regression equation can be well approximated by linear or polyno-
mial equations with less error.

• What is the procedure of obtaining a regression equation?

The procedure of obtaining a regression equation consists in postulating a form of the function
to be fitted, Y = f (X), which will depend, in general, on a number of parameters, say {b0 , b1 , . . . , bk }.
Then we choose a suitable method to determine the values of those parameters. The most commonly
used method for this purpose is the Least Squares Method. In this context, we will describe how
least squares method is useful in regression analysis. However, there are many other useful ways of
regression analysis but here we are interested only on least squares method since this is the most
useful, suitable and commonly used method for fitting a linear or polynomial equation to a given set
of observations on a dependent and one or more independent variable(s).

METHOD OF LEAST SQUARES :

Suppose (xi , yi ), i = 1, 2, . . . . . . , n be n paired observations on two variables X and Y where

Y is the study variable and X is the explanatory variable. Let Y = f (a1 , a2 , . . . , ak ; X) be the
empirical formula to be fitted on the data where the form of the empirical formula is obtained either
by graphical display or through any other method. Here, a1 , a2 , . . . , ak are the unknown parameters
to be estimated on the basis of the given data.
Here, we assume that the empirical formula to be fitted is linear in parameters a1 , a2 , . . . , ak .
We also assume that only Y values are subjected to error while the X values are free from errors.
Naturally, we measure the residuals (error part) along the Y-axis only.
Here, ei = Residual for ith paired observation = yi –f (a1 , a2 , . . . , ak ; xi ), i = 1, 2, . . . , n
Xn
We define S2 = e2i as the Residual Sum of Squares (RSS)
i=1
In Least Squares Method, we assume that the best fitted curve is that for which RSS is mini-
mum. In ideal situation, best fitted curve will be provided by
RSS = 0
Xn
⇔ e2i = 0
i=1
⇔ ei = 0 ∀ i = 1, 2, . . . , n

But such a situation will be realised very rarely in practice. For other situations, we shall try to
minimize the RSS with respect to the unknown parameters a1 , a2 , . . . , ak . For this purpose, we
equate the partial derivatives of S 2 with respect to a1 , a2 , . . . , ak separately to 0 to generate as many
equations as the no. of parameters. These equations are known as normal equations.
Mathematically,
∂S 2
= 0 ∀ j = 1, 2, . . . , k are k normal equations in k unknowns
∂aj
Finally, these normal equations are solved simultaneously to get the estimates of a1 , a2 , . . . , ak . Let
the estimates be aˆ1 , aˆ2 , . . . , aˆk . Then the fitted equation is
Y = f (aˆ1 , aˆ2 , . . . , aˆk ; X)
• Fitting of equation when it is linear in parameters :

Polynomial and multiple linear equations are linear in parameters. So, in this context we
will fit polynomial and multiple linear equations on the basis of observed data on dependent and
independent variables.
1. To fit Y = a0 +a1 X +a2 X 2 +· · ·+ap X p (pth degree polynomial) on n paired observations
(xi , yi ), i = 1, 2, . . . , n.
n
X
Here, S = 2
(yi − a0 − a1 xi − a2 x2i − · · · − ap xpi )2
i=1

Normal equations are :

n
∂S 2 X 
=0⇒ (yi − a0 − a1 xi − a2 x2i − · · · − ap xpi ) = 0


∂a0


i=1


n

∂S 2 X 

xi (yi − a0 − a1 xi − a2 x2i − · · · − ap xpi ) = 0

=0⇒ 

∂a1


i=1


n 
∂S 2 X 

=0⇒ x2i (yi − a0 − a1 xi − a2 x2i − · · · − ap xpi ) = 0 · · · · · · (∗)
∂a2 i=1 

.. 

. 


.. 

.



n

2 
∂S X 
xpi (yi − a0 − a1 xi − a2 x2i − · · · − ap xpi ) = 0

=0⇒ 

∂ap

i=1

(∗) : (p+1) normal equations for fitting a pth degree polynomial.

Matrix Representation of Normal Equations :

The (p+1) normal equations can be expressed in matrix notation as
 n n    X n 
a
X X p
 n xi · · · xi  0 yi 
 a1  
i=1 i=1
     i=1 
 n n n  a   n 
X X X 2  X
p+1  
 2
  
 xi xi · · · xi   ..    xi y i 

 i=1  . =  ⇔ M a = c, say

i=1 i=1 i=1
 . . . .   ..  
 
. 
 .. .. .. ..  .   .. 
  .   
n n n . n
.
 X X X   X 
p p+1 2p  p
 
xi xi ··· xi xi y i
  
ap
i=1 i=1 i=1 i=1

Given n paired observations (xi , yi ), i = 1, 2, . . . , n ; we can obtain the elements of M and c

whereas a = (a0 , a1 , a2 , . . . , ap )T is unknown. So, we have a linear system of equations with
(p+1) unknowns. So, the unknown a can be obtained by solving the linear system of equations
M a = c and we get â = (aˆ0 , aˆ1 , aˆ2 , . . . , aˆp )T as the Least Square Estimate of a and the fitted
equation is
Y = aˆ0 + aˆ1 X + aˆ2 X 2 + · · · + aˆp X p

Here, if we take p = 1, we get the linear equation as Y = aˆ0 + aˆ1 X

For p = 2, we get the quadratic equation as Y = aˆ0 + aˆ1 X + aˆ2 X 2 ,
for p = 3, the cubic equation as Y = aˆ0 + aˆ1 X + aˆ2 X 2 + aˆ3 X 3 and so on.
Ways to solve the linear system of equations M a = c :
(i) We can solve the linear system of equations M a = c by Gauss Elimination method.
(ii) Since the matrix M is a square matrix, we can apply LU factorisation method also.
(iii) Again, we can see that the matrix M is symmetric and it can be shown that it is a positive
definite matrix (since the determinant value of all the principal order sub-matrices are positive).
So, we can use Cholesky factorisation here to solve the linear system.
(iv) Moreover, depending on the data, the matrix M may have other properties that would lead
us to solve the system by some other methods but those properties are completely uncertain.
(v) If the matrix M be non-singular, we can use the crude method to solve the system and get
the estimate of a as â = M −1 c .
(vi) If the matrix M be non-singular, i.e. M has full column rank, we can apply QR factorisa-
tion also.

2. To fit Y = b0 + b1 X1 + b2 X2 + · · · + bm Xm (multiple linear equation) on n sets of

observations (x1i , x2i , . . . , xmi , yi ), i = 1, 2, . . . , n.
n
X
Here, S 2 = (yi − b0 − b1 x1i − b2 x2i − · · · − bm xmi )2
i=1

Normal equations are :

n
∂S 2 X 
=0⇒ (yi − b0 − b1 x1i − b2 x2i − · · · − bm xmi ) = 0


∂b0


i=1


n

∂S 2 X 


=0⇒ x1i (yi − b0 − b1 x1i − b2 x2i − · · · − bm xmi ) = 0 

∂b1


i=1


n 
∂S 2 X 

=0⇒ x2i (yi − b0 − b1 x1i − b2 x2i − · · · − bm xmi ) = 0 · · · · · · (∗∗)
∂b2 i=1 

.. 

. 


.. 

.



n

2 
∂S X 

=0⇒ xmi (yi − b0 − b1 x1i − b2 x2i − · · · − bm xmi ) = 0 

∂bm

i=1

(∗∗) : (m+1) normal equations for fitting a multiple linear equation on m independent variables.

Matrix Representation of Normal Equations :

The (m+1) normal equations can be expressed in matrix notation as
 n n    X n 
b
X X
n x1i ··· xmi  0 yi
b1 
  
 i=1 i=1
   i=1 
 n n n  b   n 
 X X
2
X  2   X 
x1i x1i ··· x1i xmi  .   x1i yi
 ..  = 
 

 ⇔ N b = d, say
 
 i=1 i=1 i=1   i=1
.. .. ..  ...   ..

 .. 
 . . . . 


  . 
  ..   X
 
n n n   n
.
 X X X 
xmi xmi x1i · · · x2mi xmi yi
   
bm
i=1 i=1 i=1 i=1
Given n sets of observations (x1i , x2i , . . . , xmi , yi ), i = 1, 2, . . . , n ; we can obtain the elements of
N and d whereas b = (b0 , b1 , b2 , . . . , bm )T is unknown. So, we have a linear system of equations
with (m+1) unknowns. So, the unknown b can be obtained by solving the linear system of
equations N b = d and we get b̂ = (bˆ0 , bˆ1 , bˆ2 , . . . , bˆm )T as the Least Square Estimate of b and the
fitted equation is
Y = bˆ0 + bˆ1 X1 + bˆ2 X2 + · · · + bˆm Xm

Here, if we take p = 1, we get the simple linear equation as Y = bˆ0 + bˆ1 X. (Since here we have
only one independent variable X1 , we can take it as X.)

Ways to solve the linear system of equations N b = d :

(i) We can solve the linear system of equations N b = d by Gauss Elimination method.
(ii) Since the matrix N is a square matrix, we can apply LU factorisation method also.
(iii) Again, we can see that the matrix N is symmetric and it can be shown that it is a positive
definite matrix (since the determinant value of all the principal order sub-matrices are positive).
So, we can use Cholesky factorisation here to solve the linear system.
(iv) Moreover, depending on the data, the matrix N may have other properties that would lead
us to solve the system by some other methods but those properties are completely uncertain.
(v) If the matrix N be non-singular, we can use the crude method to solve the system and get
the estimate of a as b̂ = N −1 d .
(vi) If the matrix N be non-singular, i.e. N has full column rank, we can apply QR factorisation
also.

• A measure of goodness of fit :

A measure of goodness of fit for fitting a pth degree polynomial Y = aˆ0 +aˆ1 X+aˆ2 X 2 +· · ·+aˆp X p ,
where aˆ0 , aˆ1 , aˆ2 , . . . , aˆp are least square estimates of a0 , a1 , a2 , . . . , ap obtained by solving (p+1) nor-
mal equations, is given by
X n X n
RSS = 2
ei = (yi − ŷi )2 , where ŷi = aˆ0 + aˆ1 xi + aˆ2 x2i + · · · + aˆp xpi
i=1 i=1
n
X
= (yi − aˆ0 − aˆ1 xi − · · · − aˆp xpi )(yi − aˆ0 − aˆ1 xi − · · · − aˆp xpi )
i=1
n
X n
X
= (yi − aˆ0 − aˆ1 xi − · · · − aˆp xpi )yi − aˆ0 (yi − aˆ0 − aˆ1 xi − · · · − aˆp xpi )
i=1 i=1
n
X n
X
−aˆ1 xi (yi − aˆ0 − aˆ1 xi − · · · − aˆp xpi ) − aˆ2 x2i (yi − aˆ0 − aˆ1 xi − · · · − aˆp xpi )
i=1 i=1
n
X
− · · · · · · − aˆp xpi (yi − aˆ0 − aˆ1 xi − · · · − aˆp xpi )
i=1
n
X
= (yi − aˆ0 − aˆ1 xi − · · · − aˆp xpi )yi (by normal equations)
i=1
n
X n
X n
X n
X n
X
= yi2 − aˆ0 yi − aˆ1 xi yi − aˆ2 x2i yi − · · · · · · − aˆp xpi yi
i=1 i=1 i=1 i=1 i=1
n
X
Note that, in addition to the calculations made earlier for solving normal equations, only yi2 is
i=1
required for the determination of RSS.
Similarly, for fitting a multilinear equation with m independent variables Y = bˆ0 + bˆ1 X1 +
bˆ2 X2 + · · · + bˆm Xm , where bˆ0 , bˆ1 , bˆ2 , . . . , bˆm are least square estimates of b0 , b1 , b2 , . . . , bm obtained by
solving (m+1) normal equations, a measure of goodness of fit can be given by
n
X n
X
RSS = e2i = (yi − ŷi )2 , where ŷi = bˆ0 + bˆ1 x1i + bˆ2 x2i + · · · + bˆm xmi
i=1 i=1
n
X
= (yi − bˆ0 − bˆ1 x1i − · · · − bˆm xmi )(yi − bˆ0 − bˆ1 x1i − · · · − bˆm xmi )
i=1
n
X n
X
= (yi − bˆ0 − bˆ1 x1i − · · · − bˆm xmi )yi − bˆ0 (yi − bˆ0 − bˆ1 x1i − · · · − bˆm xmi )
i=1 i=1
n
X n
X
−bˆ1 x1i (yi − bˆ0 − bˆ1 x1i − · · · − bˆm xmi ) − bˆ2 x2i (yi − bˆ0 − bˆ1 x1i − · · · − bˆm xmi )
i=1 i=1
n
X
− · · · · · · − bˆm xmi (yi − bˆ0 − bˆ1 x1i − · · · − bˆm xmi )
i=1
n
X
= (yi − bˆ0 − bˆ1 x1i − · · · − bˆm xmi )yi (by normal equations)
i=1
Xn n
X n
X n
X n
X
= yi2 − bˆ0 yi − bˆ1 x1i yi − bˆ2 x2i yi − · · · · · · − bˆm xmi yi
i=1 i=1 i=1 i=1 i=1

Here also, we can see that, in addition to the calculations made earlier for solving normal equations,
Xn
only yi2 is required for the determination of RSS.
i=1
We see that RSS is actually the measure of sum of square errors (SSE). A small value of RSS
denotes that an equation fitted to the given data set is good. In case of polynomial fitting, it can
be shown that RSS is a non-increasing function of the degree of the polynomial to be fitted. So,
RSS reduces with the increase of the degree of polynomial. Again, the RSS will decrease with the
increase of the number of explanatory variables in case of multilinear fitting. By doing this we can
minimize the error upto a certain threshold.

• Fitting of equation when it is not linear in parameters :

We can use least square method for fitting transformed equation using the transformed data
if the empirical relation can be made linear in parameters with suitable transformations.
1. To fit Y = abX (exponential equation) on n paired observations (xi , yi ), i = 1, 2, . . . , n.
Here, we take logarithm to both sides of the equation and get
log Y = log a + X log b
⇒ Z = A + BX, where A = log a, B = log b and Z = log Y

We fit Z = A + BX to the transformed data (xi , zi ), i = 1, 2, . . . , n; where zi = log yi ,

i = 1, 2, . . . , n. Suppose, Â and B̂ are the least squares estimates of A and B. Then we
get the estimates of a and b as â = eÂ and b̂ = eB̂ respectively and get the fitted equation as

Y = âb̂X
APPLICATION OF LEAST SQUARES METHOD IN REGRESSION :

Our main objective is to show how the least squares method is applied in the regression anal-
ysis to get the regression equation rather than doing a whole regression analysis. In this context, we
will discuss only two types of regression model as follows:
(i) Simple Linear Regression
(ii) Multiple Linear Regression
We will not discuss the polynomial regression, since the problem of polynomial regression can be
extended from the problem of simple linear regression. Actually, simple linear regression is a partic-
ular form of polynomial regression. Again, the problem of exponential regression can be reduced to
a simple linear regression problem. So,we will discuss these two regression models and will compare
them with an example. Before starting that discussion, we first need to know what is simple linear
regression and what is multiple linear regression.
1. Simple Linear Regression :
Definition : Simple Linear Regression is a statistical technique that uses only one explanatory
variable to predict the outcome of a response variable. The goal of simple linear regression
(SLR) is to model the relationship between an explanatory and a response variable.
For example, we can think of the height of a human body as the explanatory variable and weight
as the response variable. Then we could try to predict the weight on the basis of height. Some
more such examples are:
(i) Production and sale of a large business house,
(ii) Age and blood pressure of a human body,
(iii) Weight of a new born baby and age of the mother, etc.
A simple linear regression equation can be taken as Y = a0 + a1 X, where Y is the dependent or
response variable and X is the independent or explanatory variable. Here, we try to estimate
the value of Y on the basis of X. Given a set of paired observations on the variables X and Y ,
we can use the least square method to get the estimates of a0 and a1 as aˆ0 and aˆ1 respectively
and obtain the simple linear regression equation of Y on X as Y = aˆ0 + aˆ1 X. (When we take
the regression equation in such a way that Y is a function of X, i.e. we take Y as the response
variable and X as the explanatory variable, then we call the regression equation as Y on X. In
the reverse, we call it as the regression equation of X on Y .)
2. Multiple Linear Regression :
Definition : Multiple Linear Regression is a statistical technique that uses several explanatory
variables to predict the outcome of a response variable. The goal of multiple linear regression
(MLR) is to model the relationship between the explanatory and response variables.
It is often the case that a response variable may depend on more than one explanatory variable.
For example, human weight could reasonably be expected to depend on both the height and the
age of the person. Furthermore, possible explanatory variables often co-vary with one another
(e.g. sea surface temperatures and sea-level pressures). This makes it impossible to subtract
out the effects of the factors separately by performing successive linear regressions for each
individual factor. It is necessary in such cases to perform multiple regression defined by an
extended linear model.
A multiple linear regression equation can be taken as Y = b0 + b1 X1 + b2 X2 + · · · + bm Xm , where
Y is the dependent or response variable and X1 , X2 , . . . , Xm are the independent or explanatory
variables. Here, we try to estimate the value of Y on the basis of X1 , X2 , . . . , Xm . Given a set
of tuples of observations on the variables X1 , X2 , . . . , Xm and Y , we can use the least square
method to get the estimates of b0 , b1 , . . . , bm as bˆ0 , bˆ1 , . . . , bˆm respectively and obtain the multiple
linear regression equation as Y = bˆ0 + bˆ1 X1 + bˆ2 X2 + · · · + bˆm Xm .
• Elaboration with an Example :

Here, we have a dataset (Table 1) on systolic blood pressure of 11 persons with their ages
(in years) and weight (in pounds). In this context, we will deal with this dataset and will try to
obtain a simple and a multiple linear regression equation to obtain an estimate of the systolic blood
pressure of a person with respect to age (in case of simple linear regression) and with respect to
both age and weight (in case of multiple linear regression). Before that, we need to clear our concept
about systolic blood pressure.

What is Systolic Blood Pressure?

Systolic blood pressure is the amount of pressure that blood exerts on vessels while the heart
is beating. In a blood pressure reading (such as 120/80, this is the normal value), it is the number
on the top. If the top and bottom blood pressures are both too high, a person is said to have high
blood pressure. If only the top number is higher than 140, the person has a condition called isolated
systolic hypertension.

There are many physical factors, such as age, weight, diet, exercise, disease, drugs or alco-
hol etc. that influence systolic blood pressure. Here, we consider the first two physical factors
only.
i yi x1i x2i
1 132 52 173
2 143 59 184
3 153 67 194
4 162 73 211
5 154 64 196
6 168 74 220
7 137 54 188
8 149 61 188
9 159 65 207
10 128 46 167
11 166 72 217
Table 1 : Showing the data on systolic blood pressure (yi ) with age in years (x1i ) and weight in pounds (x2i )
corresponding to ith person of total 11 persons

At first, we fit a simple linear regression equation to the data to estimate systolic blood pressure
(BP) with respect to age only. For this, we take our regression equation to be fitted as Y = a0 +a1 X1 ,
where Y is the response variable denoting the systolic BP and X1 is the explanatory variable denoting
the age (in years).
Now, by Least Squares Method, for fitting Y = a0 + a1 X1 , a simple linear equation, we have
the normal equations in matrix form as:
 Xn   X n 
n x1i   yi 
 a0

i=1  i=1
= X  ⇔ M a = c, say
 
n n n
a
 X 
 X
2  1  
x1i x1i x1i yi
i=1 i=1 i=1

Here, n = 11
Using scilab, we get
n
X n
X n
X n
X
x1i = 687, x21i = 43737, yi = 1651, x1i yi = 104328
i=1 i=1 i=1 i=1

11 687 1651
∴ M= and c =
687 43737 104328
We get the estimate of a as

aˆ0 −1 58.705515
â = =M c=
aˆ1 1.4632305
Hence, the fitted equation is Y = 58.705515 + 1.4632305 X1
The estimated values of Y obtained by this equation are given below (in Table 2):
i yi ŷi simple ei = yi − ŷi simple
1 132 134.7935 -2.7934997
2 143 145.03611 -2.0361129
3 153 156.74196 -3.7419567
4 162 165.52134 -3.5213395
5 154 152.35227 1.6477347
6 168 166.98457 1.0154301
7 137 137.71996 -0.7199606
8 149 147.96257 1.0374261
9 159 153.8155 5.1845043
10 128 126.01412 1.9858831
11 166 164.05811 1.941891
Table 2 : Showing the observed (yi ) and expected (yˆi simple ) value on systolic BP with the error (ei = yi − yˆi simple )
due to this fitting corresponding to ith person of total 11 persons

From Table 2, considering the errors, we can say that the fitted simple linear regression equation to
the observed data is moderately good. We compute RSS (Residual Sum of Squares) for comparison
with other models as:
X n n
X
RSSsimple = e2i = (yi − ŷi simple )2 = 78.285949
i=1 i=1

Now, we fit a multiple linear regression equation to the data to estimate systolic BP with respect
to both age and weight. For this, we take our regression equation to be fitted as Y = b0 +b1 X1 +b2 X2 ,
where Y is the response variable denoting the systolic BP, X1 and X2 are the explanatory variables
denoting the age (in years) and the weight (in pounds) respectively.
Now, by Least Squares Method, for fitting Y = b0 + b1 X1 + b2 X2 , a multiple linear equation,
we have the normal equations in matrix form as:

X n X n  
X n 
 n x1i x2i   yi 
 i=1 i=1     i=1 
 b0
 n n n n
 X X X  X 
2
  

 x 1i x 1i x 1i x 2i


 b 1
 = 
 x 1i y i
 ⇔ N b = d, say

 i=1 i=1 i=1  b2  i=1 
 X n Xn X n   Xn 
x2i x2i x1i x22i x2i yi
   
i=1 i=1 i=1 i=1
n
X n
X n
X n
X
Using scilab, with the previous calculations of x1i , x21i , yi , x1i yi , we get
i=1 i=1 i=1 i=1
n
X n
X n
X n
X n
X
x2i = 2146, x1i x2i = x2i x1i = 135530, x22i = 421708, x2i yi = 324401
i=1 i=1 i=1 i=1 i=1
   
11 687 2146 1651
∴ N =  687 43737 135530  and d =  104328 
2146 135530 421708 324401
We get the estimate of b as
 ˆ   
b0 31.60052
b̂ =  bˆ1  = N −1 d =  0.8663002 
bˆ2 0.3300308

Hence, the fitted equation is Y = 31.60052 + 0.8663002 X1 + 0.3300308 X2

The estimated values of Y obtained by this equation are given below (in Table 3):

i yi ŷi multiple ei = yi − ŷi multiple

1 132 133.74345 -1.7434543
2 143 143.43789 -0.4378943
3 153 153.6686 -0.6686038
4 162 164.47693 -2.4769282
5 154 151.72976 2.2702353
6 168 168.31351 -0.3135054
7 137 140.42652 -3.4265163
8 149 146.49062 2.5093821
9 159 156.2264 2.7735967
10 128 126.56547 1.4345317
11 166 165.92084 0.0791566
Table 3 : Showing the observed (yi ) and expected (yˆi multiple ) value on systolic BP with the error (ei = yi − yˆi multiple )
due to this fitting corresponding to ith person of total 11 persons

From Table 3, considering the errors, we can say that the fitted multiple linear regression equation
to the observed data is very good. We compute RSS (Residual Sum of Squares) for comparison with
other models as: n n
X X
2
RSSmultiple = ei = (yi − ŷi multiple )2 = 42.860841
i=1 i=1

Now, comparing RSSsimple and RSSmultiple , we can clearly say that for the observed data,
multiple linear regression equation fitting is better than simple linear regression equation fitting. This
implies that we can estimate the systolic BP better taking into consideration the weight alongwith
the age than taking the age only.

From the above example, we can conclude that for estimating the value of response variable
taking into consideration several explanatory variables is better than taking only one explanatory
variable. The RSS will decrease with the increase of the number of explanatory variables but that
does not mean that we can increase the number of explanatory variables as much as we want because
the increasing of the number of explanatory variables is more time, money and labour consuming.
CONCLUSION :

Regression analysis is itself a huge subject of discussion and subject to various considerations
also. It is important to state few points about this, such as:
X While we are doing some linear regression problem, we may deal with simple linear regression or
multiple linear regressions. In a model, different factors have different extent of effects, so if we
consider larger number of variables in the model we would be able to explain larger amount of
information for the dependent variable. So it is clear that error is inversely related with number
of factors taken into account of the model.
X Here, in this project, we have deliberately considered that a mathematical model can be fitted
on some data set and then we have shown that there exist various types of mathematical model
and of course those models have different extent of fitting with the original data, whereas in
regression analysis, there is another notion called inferential problem where we test whether
the effect of a particular factor is present in the problem or not. Then if some factors have no
effect on a particular problem, we drop those effects when we fix the model. And then we go
on with the important factors but there is some technical problems related to Statistics that
should be taken in to account. While we test the presence of a factor’s effect it may happen
that our test gives the conclusion that there is no significant effect of a factor in the model but
we should practically think about it and then make a conclusion. For an example, if we are to
fit a regression model for predicting the cut off marks for admission in an engineering college,
taking marks of the students of that college in different subjects, it may happen that someone
is interested to test whether a particular subject has significant effect in the model or not. For
that a test is conducted for cross validation. In practice, testing the effect of a particular subject
both of English and Mathematics, the test may give the result that there is no significant effect
of that subject in each case but we should accept the result for English and reject the result for
Mathematics to make the model appropriate.
X Another problem related to regression theory is the problem of predicting a value of a dependent
variable while the independent variable is a very unlikely value for the set, on which the model
is fitted. In this case, though the model is well fitted with observed data so far, it may give high
error for the unlikely value of the independent variable. For an example, if a regression model
is fitted on heights and weights for some adults, where weight is to be predicted for some given
height, and the data set for height lies between 5 feet to 5.11 feet. Then if we want to predict
the weight while the height is above 7 feet, it may results in high error.
X Sometimes it may happen that a regression model can be fitted on some data set, but in reality
there is no meaning of fitting such a model. For an example, we may construct some linear
relationship between sizes of shoes with their I.Q level, and may be the model is well fitted with
the original data set, but actually there is no relationship between these two.
So, for regression analysis, we should take various factor in to our account which are related
to the problem, while we are to fit an appropriate model, otherwise it will be worthless, and the
decision should be taken with practical ideas about the field of interest. So, it is convenient to state
a quotation at the end: “I am in full favor of keeping dangerous weapons from the fools. Let’s start
with STATISTICS.”
• References :
– Fundamentals of Statistics (Volume I), Eighth Edition (2008) [The World Press Private
Limited, Kolkata] – A. M. Gun, M. K. Gupta, B. Dasgupta.
– Linear Algebra and Its Applications, Fourth Edition – Gilbert Strang.
– Statistical Inference, Second Edition – George Casella, Roger L. Berger.
– Least squares - Wikipedia, the free encyclopedia
weblink: en.wikipedia.org/wiki/Least_squares
– Linear regression - Wikipedia, the free encyclopedia
weblink: en.wikipedia.org/wiki/Linear_regression
– Regression analysis - Wikipedia, the free encyclopedia
weblink: en.wikipedia.org/wiki/Regression_analysis
– Simple linear regression - Wikipedia, the free encyclopedia
weblink: en.wikipedia.org/wiki/Simple_linear_regression
– Multiple Linear Regression
weblink: www.stat.yale.edu/Courses/1997-98/101/linmult.htm
– Multiple Linear Regression (MLR) Definition - Investopedia
weblink: www.investopedia.com/terms/m/mlr.asp
– Data source for the elaborate example:
weblink: http://college.cengage.com/mathematics/brase/understandable_stat-
istics/7e/students/datasets/mlr/frames/mlr02.html

Biology Practical (1) SSC
No ratings yet
Biology Practical (1) SSC
24 pages
08 Curvefitting w Interpolation (1)
No ratings yet
08 Curvefitting w Interpolation (1)
64 pages
P&S unit 2
No ratings yet
P&S unit 2
42 pages
Nursing Care of Children Principles and Practice 4th Edition James Test Bank pdf download
100% (3)
Nursing Care of Children Principles and Practice 4th Edition James Test Bank pdf download
43 pages
Canadian Triage Acuity Scale CTAS Interactive Quick Look Booklet
No ratings yet
Canadian Triage Acuity Scale CTAS Interactive Quick Look Booklet
28 pages
Cmitalk
No ratings yet
Cmitalk
167 pages
REg03
No ratings yet
REg03
39 pages
MAF3821 2024 Part1
100% (1)
MAF3821 2024 Part1
35 pages
(Original PDF) Pathophysiology - E-Book 6th Edition 2024 Scribd Download
100% (1)
(Original PDF) Pathophysiology - E-Book 6th Edition 2024 Scribd Download
55 pages
Python Notes
No ratings yet
Python Notes
25 pages
Cardiac Science Product Catalog
No ratings yet
Cardiac Science Product Catalog
32 pages
Diab Module 7
No ratings yet
Diab Module 7
36 pages
Yuvita Fitrianti NE1814201110081
No ratings yet
Yuvita Fitrianti NE1814201110081
3 pages
Big O
No ratings yet
Big O
28 pages
Nursing Process Paper
100% (1)
Nursing Process Paper
13 pages
L7-CurveFitting(LeastSquaresRegression)
No ratings yet
L7-CurveFitting(LeastSquaresRegression)
45 pages
DA Unit-3
No ratings yet
DA Unit-3
11 pages
CA2123 Lecture 9 11
No ratings yet
CA2123 Lecture 9 11
119 pages
Machine Learning (CSO851) - Lecture 02
No ratings yet
Machine Learning (CSO851) - Lecture 02
74 pages
Submission For The Reclassification of A Medicine: Topical Minoxidil
No ratings yet
Submission For The Reclassification of A Medicine: Topical Minoxidil
33 pages
ECH 3128 Topic 6 Curve Fitting 1
No ratings yet
ECH 3128 Topic 6 Curve Fitting 1
22 pages
Regression
No ratings yet
Regression
60 pages
Manejo Odontologico Del Paciente Hipertenso
No ratings yet
Manejo Odontologico Del Paciente Hipertenso
14 pages
Least Squares Method
No ratings yet
Least Squares Method
36 pages
Chapter2 Regression SimpleLinearRegressionAnalysis
No ratings yet
Chapter2 Regression SimpleLinearRegressionAnalysis
41 pages
2022 Eng IAT
No ratings yet
2022 Eng IAT
30 pages
3.2 Least Square and Polynomial Regression
No ratings yet
3.2 Least Square and Polynomial Regression
39 pages
MAFE208IU-L6 - Least Squares Regression
No ratings yet
MAFE208IU-L6 - Least Squares Regression
45 pages
unit-5 -notes
No ratings yet
unit-5 -notes
41 pages
4 - Multiple Linear Regressions
No ratings yet
4 - Multiple Linear Regressions
61 pages
6 Multiple Regression
No ratings yet
6 Multiple Regression
26 pages
Linear Ization 1
No ratings yet
Linear Ization 1
24 pages
An Introduction and Guide To Effective Doppler Assessment
No ratings yet
An Introduction and Guide To Effective Doppler Assessment
6 pages
Vital Signs 2021
No ratings yet
Vital Signs 2021
59 pages
Unscheduled Visit - SynAct-CS002
No ratings yet
Unscheduled Visit - SynAct-CS002
11 pages
Linear Algebra Spring Project 2024099270 Chominhyeok
No ratings yet
Linear Algebra Spring Project 2024099270 Chominhyeok
4 pages
NSTP Reviewer 1
No ratings yet
NSTP Reviewer 1
8 pages
GS2024_QP_Phys_IPhD
No ratings yet
GS2024_QP_Phys_IPhD
40 pages
Book 9 PDF
No ratings yet
Book 9 PDF
33 pages
Me310 5 Regression PDF
No ratings yet
Me310 5 Regression PDF
15 pages
ML Lec-3
No ratings yet
ML Lec-3
11 pages
4-Curve Fitting and Interpolation
No ratings yet
4-Curve Fitting and Interpolation
48 pages
M Iii 118 127
No ratings yet
M Iii 118 127
10 pages
Least Square Regression: Numerical Methods ECE 410
No ratings yet
Least Square Regression: Numerical Methods ECE 410
44 pages
Regression Analysis Material
No ratings yet
Regression Analysis Material
12 pages
1468564504EText (CH 3, M 1
No ratings yet
1468564504EText (CH 3, M 1
16 pages
MAFE208IU-L6 - Least Squares Regression
No ratings yet
MAFE208IU-L6 - Least Squares Regression
47 pages
Nursing Informatics
No ratings yet
Nursing Informatics
3 pages
Linear Regression
No ratings yet
Linear Regression
47 pages
PS Unit - Iv
No ratings yet
PS Unit - Iv
19 pages
TACN
No ratings yet
TACN
16 pages
Multivariate-Polynomial-Regression
No ratings yet
Multivariate-Polynomial-Regression
4 pages
Least Squares Curve Fitting: Numerical Methods
No ratings yet
Least Squares Curve Fitting: Numerical Methods
39 pages
Chapter 5 Fitting Other Models
No ratings yet
Chapter 5 Fitting Other Models
22 pages
11 - Management Post Operative Low Cardiac Output Syndrome
100% (1)
11 - Management Post Operative Low Cardiac Output Syndrome
46 pages
Curve Fitting - Lecturers - 2
No ratings yet
Curve Fitting - Lecturers - 2
21 pages
Curve Fitting: There Are Two General Approaches For Curve Fitting
No ratings yet
Curve Fitting: There Are Two General Approaches For Curve Fitting
63 pages
ECON 5350 Class Notes Least Squares: 2.1 The Problem
No ratings yet
ECON 5350 Class Notes Least Squares: 2.1 The Problem
4 pages
Mit18 100af20 hw1
No ratings yet
Mit18 100af20 hw1
2 pages
L4 Emt 2101 Engineering Mathematics Iii
No ratings yet
L4 Emt 2101 Engineering Mathematics Iii
25 pages
Curve Fitting
No ratings yet
Curve Fitting
21 pages
Chapter Four-2
No ratings yet
Chapter Four-2
30 pages
Module-IV Curve Fitting & Statistical Methods: RV Institute of Technology & Management
No ratings yet
Module-IV Curve Fitting & Statistical Methods: RV Institute of Technology & Management
28 pages
Physical Diagnosis - Learning Objectives
No ratings yet
Physical Diagnosis - Learning Objectives
4 pages
Proposal 2
No ratings yet
Proposal 2
14 pages
imc2016-day1-questions
No ratings yet
imc2016-day1-questions
1 page
5_6122829974731751861
No ratings yet
5_6122829974731751861
8 pages
Lesson 1. Pediatric Ward
No ratings yet
Lesson 1. Pediatric Ward
9 pages
Paper On Polynomial Regression
No ratings yet
Paper On Polynomial Regression
7 pages
5.1 Frog Embryology Slides
No ratings yet
5.1 Frog Embryology Slides
10 pages
Multiple Linear Reegression
No ratings yet
Multiple Linear Reegression
21 pages
Chapter 2 SOLVING NONLINEAR EQUATION 3
No ratings yet
Chapter 2 SOLVING NONLINEAR EQUATION 3
14 pages
Mece-00 1:econometric Methods: Course Code: Asst
No ratings yet
Mece-00 1:econometric Methods: Course Code: Asst
22 pages
Association Between The 8-Item Morisky Medication Adherence Scale (MMAS-8) and Blood Pressure Control
No ratings yet
Association Between The 8-Item Morisky Medication Adherence Scale (MMAS-8) and Blood Pressure Control
10 pages
Linear Regression Course
No ratings yet
Linear Regression Course
22 pages
Zykeria Hunt Hamilton - Review Sheet Ch. 16 Vital Signs
No ratings yet
Zykeria Hunt Hamilton - Review Sheet Ch. 16 Vital Signs
1 page
PROMYS India 2024 Application
100% (1)
PROMYS India 2024 Application
4 pages
Chapter Two
No ratings yet
Chapter Two
13 pages
Curve Fitting and Interpolation
No ratings yet
Curve Fitting and Interpolation
14 pages
NA 1.CurveFitting
No ratings yet
NA 1.CurveFitting
12 pages
Full Medical Examination Form For Foreign Workers: Work Pass Division
No ratings yet
Full Medical Examination Form For Foreign Workers: Work Pass Division
1 page
Dr. Siti Mariam Binti Abdul Rahman Faculty of Mechanical Engineering Office: T1-A14-01C E-Mail: Mariam4528@salam - Uitm.edu - My
No ratings yet
Dr. Siti Mariam Binti Abdul Rahman Faculty of Mechanical Engineering Office: T1-A14-01C E-Mail: Mariam4528@salam - Uitm.edu - My
30 pages
Math - de Moivre Theorem by DR D K Yadav As On 30.03.20 Min
No ratings yet
Math - de Moivre Theorem by DR D K Yadav As On 30.03.20 Min
23 pages
imc2018-day1-solutions (1)
No ratings yet
imc2018-day1-solutions (1)
3 pages
Group 2: Magallon, Buenaventura Sallaberger, Jayrahnicole Agliam, Jessica May Diño, Fortunato
No ratings yet
Group 2: Magallon, Buenaventura Sallaberger, Jayrahnicole Agliam, Jessica May Diño, Fortunato
12 pages
Citizen Blood Pressure Monitor CH-452
No ratings yet
Citizen Blood Pressure Monitor CH-452
14 pages
Stats Main
No ratings yet
Stats Main
18 pages
Curve Fitting: Fitting A Straight Line
No ratings yet
Curve Fitting: Fitting A Straight Line
3 pages
B.Sc. (I) Sta Tistics Paper - III Unit - III (Part-I) The Method of Le Ast Squares
No ratings yet
B.Sc. (I) Sta Tistics Paper - III Unit - III (Part-I) The Method of Le Ast Squares
8 pages
Curve Fitting
No ratings yet
Curve Fitting
48 pages
B40i Spec Modules ENG
No ratings yet
B40i Spec Modules ENG
5 pages
Modified Braden Q Scale
100% (1)
Modified Braden Q Scale
1 page
Physical Education Yoga and Life Style Notes
No ratings yet
Physical Education Yoga and Life Style Notes
10 pages
Principles Physiotherapy
100% (2)
Principles Physiotherapy
259 pages
Curve Fitting, B-Splines & Approximations
No ratings yet
Curve Fitting, B-Splines & Approximations
14 pages
Lecture25 Ps
No ratings yet
Lecture25 Ps
10 pages
Least Square Regression
No ratings yet
Least Square Regression
13 pages
Performance of Differential Evolution Method in Least Squares Fitting of Some Typical Nonlinear Curves
No ratings yet
Performance of Differential Evolution Method in Least Squares Fitting of Some Typical Nonlinear Curves
21 pages
Worked Examples in Mathematics for Scientists and Engineers
From Everand
Worked Examples in Mathematics for Scientists and Engineers
G. Stephenson
No ratings yet
A-level Maths Revision: Cheeky Revision Shortcuts
From Everand
A-level Maths Revision: Cheeky Revision Shortcuts
Scool Revision
3.5/5 (8)

regression

Uploaded by

regression

Uploaded by

M.Sc.

• What is Regression Analysis?

• What is the objective?

• What is the procedure of obtaining a regression equation?

METHOD OF LEAST SQUARES :

Suppose (xi , yi ), i = 1, 2, . . . . . . , n be n paired observations on two variables X and Y where

Normal equations are :

(∗) : (p+1) normal equations for fitting a pth degree polynomial.

Matrix Representation of Normal Equations :

Given n paired observations (xi , yi ), i = 1, 2, . . . , n ; we can obtain the elements of M and c

Here, if we take p = 1, we get the linear equation as Y = aˆ0 + aˆ1 X

2. To fit Y = b0 + b1 X1 + b2 X2 + · · · + bm Xm (multiple linear equation) on n sets of

Normal equations are :

Matrix Representation of Normal Equations :

Ways to solve the linear system of equations N b = d :

• A measure of goodness of fit :

• Fitting of equation when it is not linear in parameters :

We fit Z = A + BX to the transformed data (xi , zi ), i = 1, 2, . . . , n; where zi = log yi ,

What is Systolic Blood Pressure?

Hence, the fitted equation is Y = 31.60052 + 0.8663002 X1 + 0.3300308 X2

i yi ŷi multiple ei = yi − ŷi multiple

You might also like