
Department of Statistics & O.R.

Aligarh Muslim University, Aligarh

BA/BSc I Semester

Introduction to Statistics (STBMN 1003)

by

Dr. Haseeb Athar


Unit - 3
Regression Analysis
 Simple Linear Regression
 Multiple Linear Regression

Lecture notes by Dr. Haseeb Athar, Department of Statistics & O.R., A.M.U., Aligarh
Linear Regression Analysis
Regression analysis is the next step up after correlation; it is used when we want to predict
the value of a variable based on the value of another variable. In this case, the variable we are
using to predict the other variable's value is called the independent variable or sometimes the
predictor variable. The variable we are wishing to predict is called the dependent variable or
sometimes the outcome variable.
Assumptions
 The dependent and independent variables should be quantitative.
 Variables are approximately normally distributed.
 There is a linear relationship between the two variables.

Lines of Regression
A line of regression is the line which gives the best estimate of the value of one variable for
any specified value of the other variable. Hence the line of regression is the line of best fit,
obtained by the principle of least squares.
Simple Linear Regression Analysis
Let us suppose that in the bivariate distribution $(X, Y)$, $Y$ is the dependent and $X$
is the independent variable, and the line of regression of $Y$ on $X$ is
$$Y = \alpha + \beta X.$$
Hence, according to the principle of least squares, the normal equations for estimating $\alpha$ and
$\beta$ are
$$\sum Y = n\alpha + \beta \sum X \qquad (1)$$
$$\sum XY = \alpha \sum X + \beta \sum X^2 \qquad (2)$$
If the regression line passes through $(\bar{X}, \bar{Y})$, then
$$\bar{Y} = \alpha + \beta \bar{X} \qquad (3)$$
Let
$$\sigma_{XY} = \mathrm{Cov}(X, Y) = \frac{1}{n}\sum XY - \bar{X}\,\bar{Y}$$
or
$$\frac{1}{n}\sum XY = \sigma_{XY} + \bar{X}\,\bar{Y} \qquad (4)$$
Also,
$$\sigma_X^2 = \frac{1}{n}\sum X^2 - \bar{X}^2$$
or
$$\frac{1}{n}\sum X^2 = \sigma_X^2 + \bar{X}^2 \qquad (5)$$
Now divide (2) by $n$:
$$\frac{1}{n}\sum XY = \alpha \cdot \frac{1}{n}\sum X + \beta \cdot \frac{1}{n}\sum X^2$$
$$\sigma_{XY} + \bar{X}\,\bar{Y} = \alpha \bar{X} + \beta\,(\sigma_X^2 + \bar{X}^2) \qquad (6)$$
Now multiply (3) by $\bar{X}$, we get
$$\bar{X}\,\bar{Y} = \alpha \bar{X} + \beta \bar{X}^2 \qquad (7)$$
After solving (6) and (7) (subtracting (7) from (6) gives $\sigma_{XY} = \beta\,\sigma_X^2$), we have
$$\hat{\beta} = \frac{\sigma_{XY}}{\sigma_X^2} = \frac{\sum XY - \dfrac{\left(\sum X\right)\left(\sum Y\right)}{n}}{\sum X^2 - \dfrac{\left(\sum X\right)^2}{n}}$$
$$\hat{\alpha} = \bar{Y} - \hat{\beta}\,\bar{X} = \frac{1}{n}\left(\sum Y - \hat{\beta} \sum X\right),$$
where $\beta$ is the slope of the regression line and $\alpha$ is the intercept.
Regression Coefficients
We know that the line of regression of $Y$ on $X$ is
$$Y = \alpha + \beta X.$$
Here the slope $\beta$ of the regression line, which is also called the regression coefficient, can
be defined in another way, as below:
Suppose the regression line passes through $(\bar{X}, \bar{Y})$; then
$$\bar{Y} = \alpha + \beta \bar{X}.$$
Thus, we have
$$Y - \bar{Y} = \beta\,(X - \bar{X})$$
or
$$\beta = \frac{\sigma_{XY}}{\sigma_X^2}$$
or
$$\beta = \frac{r\,\sigma_X \sigma_Y}{\sigma_X^2} = r\,\frac{\sigma_Y}{\sigma_X} \qquad (\text{since } \sigma_{XY} = r\,\sigma_X \sigma_Y)$$
or $b_{YX} = r\,\dfrac{\sigma_Y}{\sigma_X}$,
where $b_{YX}$ denotes the regression coefficient of $Y$ on $X$.

Similarly, for the regression line of $X$ on $Y$, i.e. $X = \alpha^* + \beta^* Y$,
$$\beta^* = \frac{\sigma_{XY}}{\sigma_Y^2} = r\,\frac{\sigma_X}{\sigma_Y}$$
or $b_{XY} = r\,\dfrac{\sigma_X}{\sigma_Y}$,
where $b_{XY}$ denotes the regression coefficient of $X$ on $Y$.
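The identity $b_{YX}\,b_{XY} = r^2$ implied by these definitions can be checked numerically. The sketch below uses hypothetical data and population (divide-by-$n$) moments, matching the notes' convention:

```python
# Numerical check (hypothetical data) that b_YX = r*sy/sx, b_XY = r*sx/sy,
# and hence that the product of the regression coefficients equals r^2.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# population standard deviations and covariance (divide by n, as in the notes)
sx = (sum((xi - xbar) ** 2 for xi in x) / n) ** 0.5
sy = (sum((yi - ybar) ** 2 for yi in y) / n) ** 0.5
cov = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / n

r = cov / (sx * sy)
b_yx = r * sy / sx   # regression coefficient of Y on X
b_xy = r * sx / sy   # regression coefficient of X on Y
print(b_yx, b_xy, b_yx * b_xy, r ** 2)
```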

Why two regression lines?

When there is a reasonable amount of scatter, we can draw two different regression lines
depending upon which variable we consider to be the more accurately measured. The first is the
line of regression of $Y$ on $X$, which can be used to estimate $Y$ for a given $X$. The other is the
line of regression of $X$ on $Y$, used to estimate $X$ for a given $Y$.

[Scatter diagrams: the two regression lines under positive correlation and negative correlation]

Coefficient of Determination
The coefficient of determination (denoted by $R^2$) is a key output of regression analysis. It is
interpreted as the proportion of the variance in the dependent variable that is predictable from
the independent variable.
 With linear regression, the coefficient of determination is equal to the square of the
correlation coefficient between the $X$ and $Y$ variables.
 An $R^2$ of 0 means that the dependent variable cannot be predicted from the
independent variable.
 An $R^2$ of 1 means the dependent variable can be predicted without error from the
independent variable.
 An $R^2$ between 0 and 1 indicates the extent to which the dependent variable is
predictable. An $R^2$ of 0.10 means that 10 percent of the variance in $Y$ is predictable
from $X$; an $R^2$ of 0.20 means that 20 percent is predictable; and so on.
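A small sketch (hypothetical data) confirming that, for simple linear regression, $R^2 = 1 - SS_{res}/SS_{tot}$ equals the squared correlation coefficient:

```python
# For simple linear regression, R^2 computed from residuals equals r^2.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
sxx = sum((a - xbar) ** 2 for a in x)
syy = sum((b - ybar) ** 2 for b in y)

beta = sxy / sxx                      # fitted slope
alpha = ybar - beta * xbar            # fitted intercept
y_hat = [alpha + beta * a for a in x]

ss_res = sum((b - h) ** 2 for b, h in zip(y, y_hat))
r_squared = 1 - ss_res / syy          # proportion of variance explained
r = sxy / (sxx * syy) ** 0.5          # correlation coefficient
print(r_squared, r ** 2)
```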

Standard Error (SE) of the Estimates

The SE of the estimates is used to measure the reliability of the estimated regression equation and
the variability of the observed values around the regression line. The larger the SE, the greater
the variability of the data points around the regression line. If the SE is zero, all data points
lie exactly on the regression line. The formula for calculating the SE of the estimates is
$$SE = \sqrt{\frac{1}{n}\sum \left(Y - \hat{Y}\right)^2},$$
where $Y$ is the observed value and $\hat{Y}$ is the expected (fitted) value.
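The SE formula can be sketched as follows; the observed and fitted values below are hypothetical (the fitted values come from an assumed line $\hat{Y} = 2.2 + 0.6X$), not from the notes:

```python
# SE of the estimates: SE = sqrt((1/n) * sum((Y - Yhat)^2)).
import math

def standard_error(observed, fitted):
    n = len(observed)
    return math.sqrt(sum((y - yh) ** 2 for y, yh in zip(observed, fitted)) / n)

y = [2.0, 4.0, 5.0, 4.0, 5.0]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]  # fitted values from the assumed line 2.2 + 0.6*X
print(standard_error(y, y_hat))
```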
Some important properties
Property 1: If there is a perfect correlation between the data (in other words, if all the points
lie on a straight line), then the two regression lines will be the same.
The line of regression of $Y$ on $X$ is
$$Y - \bar{Y} = r\,\frac{\sigma_Y}{\sigma_X}\,(X - \bar{X}).$$
In the case of perfect correlation we have $r = \pm 1$.
Therefore,
$$Y - \bar{Y} = \pm\,\frac{\sigma_Y}{\sigma_X}\,(X - \bar{X})$$
or
$$\frac{Y - \bar{Y}}{\sigma_Y} = \pm\,\frac{X - \bar{X}}{\sigma_X},$$
which is also what the regression line of $X$ on $Y$, i.e. $X - \bar{X} = r\,\frac{\sigma_X}{\sigma_Y}\,(Y - \bar{Y})$,
reduces to when $r = \pm 1$.

Thus, in the case of perfect correlation, both lines of regression coincide.

Property 2: The correlation coefficient is the geometric mean of the regression coefficients.
Indeed,
$$b_{YX}\,b_{XY} = r\,\frac{\sigma_Y}{\sigma_X} \cdot r\,\frac{\sigma_X}{\sigma_Y} = r^2,$$
so $r = \pm\sqrt{b_{YX}\,b_{XY}}$, the sign of $r$ being the common sign of the two regression
coefficients.
Property 3: If one of the regression coefficients is greater than unity, then the other must be less
than unity.
Suppose $b_{YX} > 1$, or equivalently
$$\frac{1}{b_{YX}} < 1.$$
We know that $b_{YX}\,b_{XY} = r^2 \le 1$, thus
$$b_{XY} \le \frac{1}{b_{YX}}$$
or
$$b_{XY} < 1.$$
Property 4: The arithmetic mean of the regression coefficients is greater than or equal to the
correlation coefficient (for $r > 0$). That is,
$$\frac{b_{YX} + b_{XY}}{2} \ge r.$$
Proof. Suppose that
$$\frac{b_{YX} + b_{XY}}{2} \ge r$$
or
$$r\,\frac{\sigma_Y}{\sigma_X} + r\,\frac{\sigma_X}{\sigma_Y} \ge 2r$$
or, dividing both sides by $r > 0$ and multiplying by $\sigma_X \sigma_Y$,
$$\sigma_X^2 + \sigma_Y^2 \ge 2\,\sigma_X \sigma_Y,$$
which implies
$$(\sigma_X - \sigma_Y)^2 \ge 0,$$
which is always true.

Therefore,
$$\frac{b_{YX} + b_{XY}}{2} \ge r.$$
Property 5: Regression coefficients are independent of the change of origin but not of scale.

Let
$$U = \frac{X - a}{h} \quad \text{and} \quad V = \frac{Y - b}{k},$$
where $a, b, h, k$ are constants and $h, k > 0$.
Then
$$\sigma_X^2 = h^2\,\sigma_U^2, \quad \sigma_Y^2 = k^2\,\sigma_V^2 \quad \text{and} \quad \mathrm{Cov}(X, Y) = hk\,\mathrm{Cov}(U, V).$$
Therefore
$$b_{YX} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X^2} = \frac{hk\,\mathrm{Cov}(U, V)}{h^2\,\sigma_U^2} = \frac{k}{h}\,b_{VU}.$$
Similarly,
$$b_{XY} = \frac{h}{k}\,b_{UV}.$$
This shows that the regression coefficient is not independent of the change of scale.

Further, if we put $h = k = 1$, then $b_{YX} = b_{VU}$ and $b_{XY} = b_{UV}$.
Thus, it is independent of the change of origin.
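Property 5 can be verified numerically. In this hypothetical sketch, shifting the origin leaves $b_{YX}$ unchanged, while rescaling $X$ by $h$ and $Y$ by $k$ multiplies the coefficient by $h/k$ (since $b_{VU} = (h/k)\,b_{YX}$ for $U = X/h$, $V = Y/k$):

```python
# Effect of change of origin and scale on the regression coefficient b_YX.
def b_yx(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    return sxy / sxx

# hypothetical data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

b0 = b_yx(x, y)
# shift of origin: U = X - 10, V = Y + 3  -> coefficient unchanged
b_shift = b_yx([a - 10 for a in x], [b + 3 for b in y])
# change of scale: U = X/h, V = Y/k with h=2, k=5 -> coefficient becomes (h/k)*b0
h, k = 2.0, 5.0
b_scale = b_yx([a / h for a in x], [b / k for b in y])
print(b0, b_shift, b_scale)
```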
Example 1: The data related to advertisement expenditure and sales for 12 months of a car
company are given in the following table
Advt. Expenditure
(in Lacs) 1.0 1.2 1.5 2.0 2.2 2.5 3.0 3.1 3.8 4.0 4.0 4.2
Sales (in Lacs) 55 60 50 58 55 58 61 60 55 56 50 61

Estimate the regression lines and then

a) estimate the value of sales when the company decides to spend Rs 5,50,000 on advertising
during the next quarter;
b) estimate the amount to be spent on advertisement to achieve a sales target of 75 Lacs;
c) compute the standard error of the estimates.
Solution:
Advt. Expend (X)   Sales (Y)   X²      Y²      XY

1.0                55          1.00    3025    55
1.2                60          1.44    3600    72
1.5 50 2.25 2500 75
2.0 58 4.00 3364 116
2.2 55 4.84 3025 121
2.5 58 6.25 3364 145
3.0 61 9.00 3721 183
3.1 60 9.61 3600 186
3.8 55 14.44 3025 209
4.0 56 16.00 3136 224
4.0 50 16.00 2500 200
4.2 61 17.64 3721 256.2
Total: 32.5        679         102.47  38581   1842.2

The regression equation of $Y$ on $X$ is given by
$$Y - \bar{Y} = b_{YX}\,(X - \bar{X})$$
This can also be written as
$$Y = (\bar{Y} - b_{YX}\,\bar{X}) + b_{YX}\,X$$
or $Y = \alpha + \beta X$,

where
$$\hat{\beta} = b_{YX} = \frac{\sum XY - \dfrac{\left(\sum X\right)\left(\sum Y\right)}{n}}{\sum X^2 - \dfrac{\left(\sum X\right)^2}{n}} = \frac{1842.2 - \dfrac{32.5 \times 679}{12}}{102.47 - \dfrac{32.5 \times 32.5}{12}} = 0.22$$
$$\hat{\alpha} = \bar{Y} - \hat{\beta}\,\bar{X} = \frac{1}{n}\left(\sum Y - \hat{\beta}\sum X\right) = \frac{1}{12}\,(679 - 0.22 \times 32.5) = 55.99$$
Therefore, the estimated regression equation of $Y$ on $X$ is
$$\hat{Y} = 55.99 + 0.22\,X \qquad (*)$$

The regression equation of $X$ on $Y$ is given by
$$X - \bar{X} = b_{XY}\,(Y - \bar{Y})$$
This can also be written as
$$X = (\bar{X} - b_{XY}\,\bar{Y}) + b_{XY}\,Y$$
or $X = \alpha^* + \beta^* Y$,

where
$$\hat{\beta}^* = b_{XY} = \frac{\sum XY - \dfrac{\left(\sum X\right)\left(\sum Y\right)}{n}}{\sum Y^2 - \dfrac{\left(\sum Y\right)^2}{n}} = \frac{1842.2 - \dfrac{32.5 \times 679}{12}}{38581 - \dfrac{679 \times 679}{12}} = 0.02$$
$$\hat{\alpha}^* = \bar{X} - \hat{\beta}^*\,\bar{Y} = \frac{1}{n}\left(\sum X - \hat{\beta}^* \sum Y\right) = \frac{1}{12}\,(32.5 - 0.02 \times 679) = 1.58$$
Therefore, the estimated regression equation of $X$ on $Y$ is
$$\hat{X} = 1.58 + 0.02\,Y \qquad (**)$$

a) The estimated sales when the advertisement expenditure is Rs. 5.5 Lacs, using (*), is
$$\hat{Y} = 55.99 + 0.22\,X = 55.99 + 0.22 \times 5.5 = 57.2 \text{ Lacs}.$$

b) The amount to be spent on advertisement to achieve a sales target of 75 Lacs, estimated
using (**), is
$$\hat{X} = 1.58 + 0.02\,Y = 1.58 + 0.02 \times 75 = 3.08 \text{ Lacs}.$$

c) Standard error (SE) of estimates
X X̂ ( X  Xˆ ) 2 Y Yˆ (Y  Yˆ ) 2
1.0 2.68 2.82 55 56.21 1.46
1.2 2.78 2.50 60 56.25 14.03
1.5 2.58 1.17 50 56.32 39.94
2.0 2.74 0.55 58 56.43 2.46
2.2 2.68 0.23 55 56.47 2.17
2.5 2.74 0.06 58 56.54 2.13
3.0 2.80 0.04 61 56.65 18.92
3.1 2.78 0.10 60 56.67 11.08
3.8 2.68 1.25 55 56.83 3.33
4.0 2.70 1.69 56 56.87 0.76
4.0 2.58 2.02 50 56.87 47.20
4.2 2.80 1.96 61 56.91 16.70
Total: Σ(X − X̂)² = 14.38, Σ(Y − Ŷ)² = 160.19
The standard error of the estimates for sales is
$$SE = \sqrt{\frac{1}{n}\sum \left(Y - \hat{Y}\right)^2} = \sqrt{\frac{160.19}{12}} = 3.65$$
The standard error of the estimates for expenditure on advertisement is
$$SE = \sqrt{\frac{1}{n}\sum \left(X - \hat{X}\right)^2} = \sqrt{\frac{14.38}{12}} = 1.09$$
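As a cross-check (not in the original notes), re-computing Example 1 from the raw data with unrounded coefficients reproduces the slope, intercept, and SE of the notes up to rounding:

```python
# Recompute Example 1: regression of sales (Y) on advertising expenditure (X),
# from the column totals n=12, sum X=32.5, sum Y=679, sum X^2=102.47, sum XY=1842.2.
import math

x = [1.0, 1.2, 1.5, 2.0, 2.2, 2.5, 3.0, 3.1, 3.8, 4.0, 4.0, 4.2]
y = [55, 60, 50, 58, 55, 58, 61, 60, 55, 56, 50, 61]
n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(a * a for a in x)
sum_xy = sum(a * b for a, b in zip(x, y))

beta = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)   # ~0.22
alpha = (sum_y - beta * sum_x) / n                                # ~55.99
y_hat = [alpha + beta * a for a in x]
se = math.sqrt(sum((b - h) ** 2 for b, h in zip(y, y_hat)) / n)   # ~3.65
print(round(beta, 2), round(alpha, 2), round(se, 2))
```

The small differences in the last decimal place come from the notes rounding the slope to 0.22 before computing the intercept and the fitted values.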
Practice Exercises
1. Given that the variance of X is 9 and the regression equations are
8X − 10Y + 66 = 0
40X − 18Y = 214,
find (i) the mean values of X and Y, and (ii) rXY and σY.
2. Find the most likely price of a commodity in Mumbai corresponding to the price of Rs. 70
at Delhi from the following:
                       Delhi    Mumbai
Average price          65       67
Standard deviation     2.5      3.5
The correlation coefficient between the prices of the commodity in the two cities is 0.8.
3. The following table gives the demand and price for a commodity for 6 days.
Price (Rs.) : 4 3 6 9 12 10
Demand (mds) : 46 65 50 30 15 25
i) Obtain the value of correlation coefficient.
ii) Develop the estimating regression equations.
iii) Compute the standard error of estimate.

iv) Predict Demand for price (Rs.) = 5, 8, and 11.
v) Compute coefficient of determination and give your comment on the distribution.

Multiple Linear Regression Analysis

Let us consider a distribution involving three random variables $(X_1, X_2, X_3)$. Then the
regression equation of $X_1$ on $X_2$ and $X_3$ is given as
$$X_1 = a + b_{12.3}\,X_2 + b_{13.2}\,X_3 \qquad (8)$$
The coefficients $b_{12.3}$ and $b_{13.2}$ are known as the partial regression coefficients of $X_1$ on $X_2$
and of $X_1$ on $X_3$, respectively.
Now, according to the principle of least squares, the normal equations for estimating $a$, $b_{12.3}$
and $b_{13.2}$ are
$$\sum X_1 = na + b_{12.3}\sum X_2 + b_{13.2}\sum X_3 \qquad \text{(i)}$$
$$\sum X_1 X_2 = a\sum X_2 + b_{12.3}\sum X_2^2 + b_{13.2}\sum X_2 X_3 \qquad \text{(ii)} \qquad (9)$$
$$\sum X_1 X_3 = a\sum X_3 + b_{12.3}\sum X_2 X_3 + b_{13.2}\sum X_3^2 \qquad \text{(iii)}$$

Suppose the regression plane passes through $(\bar{X}_1, \bar{X}_2, \bar{X}_3)$; then
$$\bar{X}_1 = a + b_{12.3}\,\bar{X}_2 + b_{13.2}\,\bar{X}_3 \qquad (10)$$
Now, subtracting (10) from (8), we get
$$(X_1 - \bar{X}_1) = b_{12.3}\,(X_2 - \bar{X}_2) + b_{13.2}\,(X_3 - \bar{X}_3) \qquad (11)$$
Let $y_1 = X_1 - \bar{X}_1$, $y_2 = X_2 - \bar{X}_2$ and $y_3 = X_3 - \bar{X}_3$.
Then equation (11) becomes
$$y_1 = b_{12.3}\,y_2 + b_{13.2}\,y_3$$
Now, substituting $X_1 = y_1 + \bar{X}_1$, $X_2 = y_2 + \bar{X}_2$ and $X_3 = y_3 + \bar{X}_3$ into (ii) and (iii) of
the normal equations (9), and noting that $\sum y_1 = \sum (X_1 - \bar{X}_1) = 0$, $\sum y_2 = \sum (X_2 - \bar{X}_2) = 0$
and $\sum y_3 = \sum (X_3 - \bar{X}_3) = 0$, we get
$$\sum y_1 y_2 = b_{12.3}\sum y_2^2 + b_{13.2}\sum y_2 y_3 + n\bar{X}_2\,(a + b_{12.3}\bar{X}_2 + b_{13.2}\bar{X}_3 - \bar{X}_1) \qquad (12)$$
and
$$\sum y_1 y_3 = b_{12.3}\sum y_2 y_3 + b_{13.2}\sum y_3^2 + n\bar{X}_3\,(a + b_{12.3}\bar{X}_2 + b_{13.2}\bar{X}_3 - \bar{X}_1) \qquad (13)$$
Since, in view of (10), $a + b_{12.3}\bar{X}_2 + b_{13.2}\bar{X}_3 - \bar{X}_1 = 0$, equations (12) and (13) reduce to
$$\sum y_1 y_2 = b_{12.3}\sum y_2^2 + b_{13.2}\sum y_2 y_3 \qquad (14)$$
$$\sum y_1 y_3 = b_{12.3}\sum y_2 y_3 + b_{13.2}\sum y_3^2 \qquad (15)$$
Now, after solving equations (14) and (15) for $b_{12.3}$ and $b_{13.2}$, we get
$$\hat{b}_{12.3} = \frac{\left(\sum y_1 y_2\right)\left(\sum y_3^2\right) - \left(\sum y_1 y_3\right)\left(\sum y_2 y_3\right)}{\left(\sum y_2^2\right)\left(\sum y_3^2\right) - \left(\sum y_2 y_3\right)^2}$$
and
$$\hat{b}_{13.2} = \frac{\left(\sum y_1 y_3\right)\left(\sum y_2^2\right) - \left(\sum y_1 y_2\right)\left(\sum y_2 y_3\right)}{\left(\sum y_2^2\right)\left(\sum y_3^2\right) - \left(\sum y_2 y_3\right)^2}.$$

Also, $\hat{a} = \bar{X}_1 - \hat{b}_{12.3}\,\bar{X}_2 - \hat{b}_{13.2}\,\bar{X}_3$. Thus, in view of (8), the estimated regression equation is
$$\hat{X}_1 = \hat{a} + \hat{b}_{12.3}\,X_2 + \hat{b}_{13.2}\,X_3 \qquad (16)$$
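Equations (14)-(16) can be sketched directly in code. The data below are hypothetical and generated to lie exactly on the plane $X_1 = 1 + 2X_2 + 3X_3$, so the fitted coefficients recover those values:

```python
# Fit X1 = a + b12.3*X2 + b13.2*X3 by solving the deviation-form normal
# equations (14)-(15) with Cramer's rule, as derived above.
def fit_plane(x1, x2, x3):
    n = len(x1)
    m1, m2, m3 = sum(x1) / n, sum(x2) / n, sum(x3) / n
    y1 = [v - m1 for v in x1]   # deviations from the means
    y2 = [v - m2 for v in x2]
    y3 = [v - m3 for v in x3]
    s = lambda u, v: sum(a * b for a, b in zip(u, v))  # sum of products
    det = s(y2, y2) * s(y3, y3) - s(y2, y3) ** 2
    b12_3 = (s(y1, y2) * s(y3, y3) - s(y1, y3) * s(y2, y3)) / det
    b13_2 = (s(y1, y3) * s(y2, y2) - s(y1, y2) * s(y2, y3)) / det
    a = m1 - b12_3 * m2 - b13_2 * m3   # from equation (10)
    return a, b12_3, b13_2

# hypothetical exact-fit data generated from X1 = 1 + 2*X2 + 3*X3
x2 = [1.0, 2.0, 3.0, 4.0, 2.0]
x3 = [2.0, 1.0, 4.0, 3.0, 5.0]
x1 = [1 + 2 * a + 3 * b for a, b in zip(x2, x3)]
print(fit_plane(x1, x2, x3))  # recovers approximately (1.0, 2.0, 3.0)
```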
Regression Coefficients in Terms of Simple Correlation Coefficients
The variance of $X_1$ is given by
$$\sigma_1^2 = \frac{1}{n}\sum (X_1 - \bar{X}_1)^2 = \frac{1}{n}\sum y_1^2,$$
so that $\sum y_1^2 = n\,\sigma_1^2$. Similarly,
$$\sum y_2^2 = n\,\sigma_2^2 \quad \text{and} \quad \sum y_3^2 = n\,\sigma_3^2.$$
We know that the correlation coefficient between $X_1$ and $X_2$ is given by
$$r_{12} = \frac{\sum (X_1 - \bar{X}_1)(X_2 - \bar{X}_2)}{\sqrt{\sum (X_1 - \bar{X}_1)^2\,\sum (X_2 - \bar{X}_2)^2}} = \frac{\sum y_1 y_2}{\sqrt{\left(\sum y_1^2\right)\left(\sum y_2^2\right)}}$$
or
$$r_{12} = \frac{\sum y_1 y_2}{n\,\sigma_1 \sigma_2},$$
so that $\sum y_1 y_2 = n\,\sigma_1 \sigma_2\,r_{12}$. Similarly,
$$\sum y_1 y_3 = n\,\sigma_1 \sigma_3\,r_{13} \quad \text{and} \quad \sum y_2 y_3 = n\,\sigma_2 \sigma_3\,r_{23}.$$
Substituting the above values into equations (14) and (15) and cancelling the common factors
$n\sigma_2$ and $n\sigma_3$ respectively, we get
$$\sigma_1 r_{12} = b_{12.3}\,\sigma_2 + b_{13.2}\,\sigma_3 r_{23} \qquad (17)$$
and
$$\sigma_1 r_{13} = b_{13.2}\,\sigma_3 + b_{12.3}\,\sigma_2 r_{23} \qquad (18)$$

After solving equations (17) and (18) for the two partial regression coefficients, we get
$$b_{12.3} = \left(\frac{r_{12} - r_{13}\,r_{23}}{1 - r_{23}^2}\right)\frac{\sigma_1}{\sigma_2} \qquad (19)$$
and
$$b_{13.2} = \left(\frac{r_{13} - r_{12}\,r_{23}}{1 - r_{23}^2}\right)\frac{\sigma_1}{\sigma_3} \qquad (20)$$
Also note that the standard error of estimate of the dependent variable $X_1$ from its estimated
value $\hat{X}_1$, based on the estimated regression equation (16), is
$$\sigma_{1.23} = \sqrt{\frac{1}{n}\sum \left(X_1 - \hat{X}_1\right)^2} \qquad (21)$$
An alternative method of computing $\sigma_{1.23}$ in terms of the simple correlation coefficients is given
by
$$\sigma_{1.23} = \sigma_1 \sqrt{\frac{1 - r_{12}^2 - r_{13}^2 - r_{23}^2 + 2\,r_{12} r_{13} r_{23}}{1 - r_{23}^2}} \qquad (22)$$

Problem 3: Suppose in a trivariate distribution , , ,


. Find (i) , (ii) , (iii) and (iv) .
