STB1003 - Unit-3 BSC
BA/BSc I Semester
Lecture notes by Dr. Haseeb Athar, Department of Statistics & O.R., A.M.U., Aligarh
Linear Regression Analysis
Regression analysis is the next step after correlation; it is used when we want to predict
the value of one variable from the value of another variable. The variable we use to make
the prediction is called the independent variable or sometimes the predictor variable.
The variable we wish to predict is called the dependent variable or sometimes the
outcome variable.
Assumptions
The dependent and independent variables should be quantitative.
Variables are approximately normally distributed.
There is a linear relationship between the two variables.
Lines of Regression
A line of regression is the line that gives the best estimate of the value of one variable for
any specified value of the other variable. Hence the line of regression is the line of best fit,
and it is obtained by the principle of least squares.
Simple Linear Regression Analysis
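Let us suppose that in the bivariate distribution $(X_i, Y_i)$, $i = 1, 2, \ldots, n$, $Y$ is the dependent variable and
$X$ is the independent variable, and the line of regression of $Y$ on $X$ is
$$Y = \alpha + \beta X.$$
Hence, according to the principle of least squares, the normal equations for estimating $\alpha$ and
$\beta$ are
$$\sum Y = n\alpha + \beta \sum X \tag{1}$$
$$\sum XY = \alpha \sum X + \beta \sum X^2 \tag{2}$$
If the regression line passes through $(\bar{X}, \bar{Y})$, then
$$\bar{Y} = \alpha + \beta \bar{X} \tag{3}$$
Let
$$\mu_{XY} = \mathrm{Cov}(X, Y) = \frac{1}{n}\sum XY - \bar{X}\bar{Y}$$
or
$$\frac{1}{n}\sum XY = \mu_{XY} + \bar{X}\bar{Y} \tag{4}$$
Also,
$$\sigma_X^2 = \frac{1}{n}\sum X^2 - \bar{X}^2$$
or
$$\frac{1}{n}\sum X^2 = \sigma_X^2 + \bar{X}^2 \tag{5}$$
Now divide (2) by $n$; then
$$\frac{1}{n}\sum XY = \alpha \cdot \frac{1}{n}\sum X + \beta \cdot \frac{1}{n}\sum X^2$$
which, by (4) and (5), gives
$$\mu_{XY} + \bar{X}\bar{Y} = \alpha \bar{X} + \beta(\sigma_X^2 + \bar{X}^2) \tag{6}$$
Now multiply (3) by $\bar{X}$; we get
$$\bar{X}\bar{Y} = \alpha \bar{X} + \beta \bar{X}^2 \tag{7}$$
After solving (6) and (7), we have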
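$$\hat{\beta} = \frac{\mu_{XY}}{\sigma_X^2} = \frac{\sum XY - \frac{\sum X \sum Y}{n}}{\sum X^2 - \frac{(\sum X)^2}{n}}$$
$$\hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X} = \frac{1}{n}\left(\sum Y - \hat{\beta}\sum X\right),$$
where $\hat{\beta}$ is the slope of the regression line and $\hat{\alpha}$ is the intercept.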
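The sum formulas for the slope and intercept can be sketched in a few lines of Python. The function name `fit_line` and the four data points are illustrative assumptions, not part of the notes; the points are chosen to lie exactly on $Y = 1 + 2X$ so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Least-squares line Y = alpha + beta * X computed from the sum formulas."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    # Slope: (sum XY - sum X * sum Y / n) / (sum X^2 - (sum X)^2 / n)
    beta = (sum_xy - sum_x * sum_y / n) / (sum_xx - sum_x ** 2 / n)
    # Intercept: (sum Y - beta * sum X) / n
    alpha = (sum_y - beta * sum_x) / n
    return alpha, beta


# Hypothetical data lying exactly on Y = 1 + 2X.
alpha, beta = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(alpha, beta)
```

Because the points are exactly collinear, the fitted intercept and slope should recover 1 and 2 up to floating-point rounding.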
Regression Coefficients
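We know that the line of regression of $Y$ on $X$ is
$$Y = \alpha + \beta X.$$
Here the slope $\beta$ of the regression line, which is also called the regression coefficient, can be defined
in another way, as below:
Suppose the regression line passes through $(\bar{X}, \bar{Y})$; then
$$\bar{Y} = \alpha + \beta \bar{X}.$$
Thus, we have
$$Y - \bar{Y} = \beta (X - \bar{X})$$
or
$$Y - \bar{Y} = \frac{\mu_{XY}}{\sigma_X^2}(X - \bar{X})$$
or (since $\mu_{XY} = r\,\sigma_X \sigma_Y$)
$$Y - \bar{Y} = r\frac{\sigma_Y}{\sigma_X}(X - \bar{X})$$
or
$$Y - \bar{Y} = b_{YX}(X - \bar{X}),$$
where $b_{YX} = r\dfrac{\sigma_Y}{\sigma_X}$.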
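As a quick numerical check, the regression coefficient of $Y$ on $X$ can be computed both as $\mathrm{Cov}(X, Y)/\sigma_X^2$ and as $r\,\sigma_Y/\sigma_X$. The sketch below uses a small made-up data set (an assumption for illustration only) and also confirms that the product of the two regression coefficients equals $r^2$.

```python
import math


def moments(xs, ys):
    """Return covariance and standard deviations, all with divisor n."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov, sd_x, sd_y


# Hypothetical data, not taken from the notes.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]

cov, sd_x, sd_y = moments(xs, ys)
r = cov / (sd_x * sd_y)

b_yx_from_cov = cov / sd_x ** 2   # Cov(X, Y) / var(X)
b_yx_from_r = r * sd_y / sd_x     # r * sd_Y / sd_X
b_xy = r * sd_x / sd_y            # regression coefficient of X on Y

print(r, b_yx_from_cov, b_yx_from_r, b_yx_from_r * b_xy)
```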
Coefficient of Determination
The coefficient of determination (denoted by $R^2$) is a key output of regression analysis. It is
interpreted as the proportion of the variance in the dependent variable that is predictable from
the independent variable.
With linear regression, the coefficient of determination is equal to the square of the
correlation coefficient between the $X$ and $Y$ variables.
An $R^2$ of 0 means that the dependent variable cannot be predicted from the
independent variable.
An $R^2$ of 1 means the dependent variable can be predicted without error from the
independent variable.
An $R^2$ between 0 and 1 indicates the extent to which the dependent variable is
predictable. An $R^2$ of 0.10 means that 10 percent of the variance in $Y$ is predictable
from $X$; an $R^2$ of 0.20 means that 20 percent is predictable; and so on.
Therefore,
$$R^2 = r^2,$$
where $r$ is the correlation coefficient between $X$ and $Y$.
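The statement that the coefficient of determination equals the squared correlation coefficient in simple linear regression can be checked numerically. The sketch below computes $R^2$ as one minus the ratio of residual to total sum of squares and compares it with $r^2$; the data and variable names are illustrative assumptions.

```python
# Hypothetical data, not taken from the notes.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
s_xx = sum((x - mean_x) ** 2 for x in xs)
s_yy = sum((y - mean_y) ** 2 for y in ys)

# Least-squares fit of Y on X.
beta = s_xy / s_xx
alpha = mean_y - beta * mean_x

# R^2 as the proportion of the variance in Y explained by the fit.
residual_ss = sum((y - (alpha + beta * x)) ** 2 for x, y in zip(xs, ys))
r_squared_from_fit = 1 - residual_ss / s_yy

# Square of the correlation coefficient.
r_squared_from_corr = s_xy ** 2 / (s_xx * s_yy)

print(r_squared_from_fit, r_squared_from_corr)
```

For these five points both quantities come out to about 0.64, so roughly 64 percent of the variance in $Y$ is predictable from $X$.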
Property 3: If one of the regression coefficients is greater than unity, then the other must be less
than unity.
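Suppose
$$b_{YX} > 1$$
or
$$\frac{1}{b_{YX}} < 1.$$
Since $r^2 = b_{YX}\, b_{XY}$ and $r^2 \le 1$, we have
$$b_{YX}\, b_{XY} \le 1$$
which implies
$$b_{XY} \le \frac{1}{b_{YX}}$$
or
$$b_{XY} < 1.$$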
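Property 5: Regression coefficients are independent of a change of origin but not of scale.
Let
$$U = \frac{X - a}{h}$$
and
$$V = \frac{Y - b}{k},$$
where $h > 0$, $k > 0$.
Then $r_{XY} = r_{UV}$, $\sigma_X = h\,\sigma_U$ and $\sigma_Y = k\,\sigma_V$.
Therefore
$$b_{YX} = r_{XY}\frac{\sigma_Y}{\sigma_X} = r_{UV}\frac{k\,\sigma_V}{h\,\sigma_U} = \frac{k}{h}\, b_{VU}.$$
Similarly,
$$b_{XY} = \frac{h}{k}\, b_{UV}.$$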
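Property 5 can be illustrated numerically. The sketch below assumes the transformation $U = (X - a)/h$, $V = (Y - b)/k$ with arbitrarily chosen constants, and checks that shifting the origin alone leaves the regression coefficient unchanged while rescaling multiplies it by $k/h$. The data, constants, and the helper name `slope` are assumptions for illustration.

```python
def slope(xs, ys):
    """Regression coefficient of ys on xs: Cov(x, y) / var(x)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    return s_xy / s_xx


# Hypothetical data and constants.
xs = [10, 12, 15, 19, 24]
ys = [40, 45, 43, 52, 60]
a, h = 15, 2   # origin and scale for X
b, k = 45, 5   # origin and scale for Y

b_yx = slope(xs, ys)

# Change of origin only: the coefficient is unchanged.
b_shifted = slope([x - a for x in xs], [y - b for y in ys])

# Change of origin and scale: b_YX = (k / h) * b_VU.
us = [(x - a) / h for x in xs]
vs = [(y - b) / k for y in ys]
b_vu = slope(us, vs)

print(b_yx, b_shifted, (k / h) * b_vu)
```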
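or $Y = \alpha + \beta X$,
where
$$\hat{\beta} = b_{YX} = \frac{\mu_{XY}}{\sigma_X^2} = \frac{\sum XY - \frac{\sum X \sum Y}{n}}{\sum X^2 - \frac{(\sum X)^2}{n}} = \frac{1842.2 - \frac{32.5 \times 679}{12}}{102.47 - \frac{32.5 \times 32.5}{12}} = 0.22$$
$$\hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X} = \frac{1}{n}\left(\sum Y - \hat{\beta}\sum X\right) = \frac{1}{12}(679 - 0.22 \times 32.5) = 55.99$$
Therefore, the estimated regression equation of $Y$ on $X$ is
$$\hat{Y} = 55.99 + 0.22X \tag{*}$$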
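For the regression of $X$ on $Y$,
$$X - \bar{X} = b_{XY}(Y - \bar{Y})$$
$$X = (\bar{X} - b_{XY}\bar{Y}) + b_{XY}Y$$
or $X = \alpha^* + \beta^* Y$,
where
$$\hat{\beta}^* = b_{XY} = \frac{\mu_{XY}}{\sigma_Y^2} = \frac{\sum XY - \frac{\sum X \sum Y}{n}}{\sum Y^2 - \frac{(\sum Y)^2}{n}} = \frac{1842.2 - \frac{32.5 \times 679}{12}}{38581 - \frac{679 \times 679}{12}} = 0.02$$
$$\hat{\alpha}^* = \bar{X} - \hat{\beta}^*\bar{Y} = \frac{1}{n}\left(\sum X - \hat{\beta}^*\sum Y\right) = \frac{1}{12}(32.5 - 0.02 \times 679) = 1.58$$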
a) The estimated sale when advertisement expenditure is Rs. 5.5 lacs, using (*), is
$$\hat{Y} = 55.99 + 0.22X = 55.99 + 0.22 \times 5.5 = 57.2 \text{ lacs.}$$
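The worked example can be reproduced with a short script. The data below are read from the $X$ (advertisement expenditure) and $Y$ (sale) columns of the table in part (c); the helper name `fit_line` is an assumption. Note that the notes round the slope to 0.22 (and 0.02) before computing the intercepts, so the unrounded intercepts printed here (about 55.98 and 1.57) differ slightly from 55.99 and 1.58.

```python
def fit_line(xs, ys):
    """Least-squares line ys = alpha + beta * xs from the sum formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    beta = (sum_xy - sum_x * sum_y / n) / (sum_xx - sum_x ** 2 / n)
    alpha = (sum_y - beta * sum_x) / n
    return alpha, beta


# X = advertisement expenditure, Y = sale (both in Rs. lacs).
expenditure = [1.0, 1.2, 1.5, 2.0, 2.2, 2.5, 3.0, 3.1, 3.8, 4.0, 4.0, 4.2]
sale = [55, 60, 50, 58, 55, 58, 61, 60, 55, 56, 50, 61]

alpha, beta = fit_line(expenditure, sale)            # Y on X
alpha_star, beta_star = fit_line(sale, expenditure)  # X on Y

# Part (a): estimated sale for an expenditure of Rs. 5.5 lacs.
predicted_sale = alpha + beta * 5.5

print(round(alpha, 2), round(beta, 2))
print(round(alpha_star, 2), round(beta_star, 2))
print(round(predicted_sale, 1))
```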
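c) Standard error (SE) of estimates

| $X$ | $\hat{X}$ | $(X - \hat{X})^2$ | $Y$ | $\hat{Y}$ | $(Y - \hat{Y})^2$ |
|---|---|---|---|---|---|
| 1.0 | 2.68 | 2.82 | 55 | 56.21 | 1.46 |
| 1.2 | 2.78 | 2.50 | 60 | 56.25 | 14.03 |
| 1.5 | 2.58 | 1.17 | 50 | 56.32 | 39.94 |
| 2.0 | 2.74 | 0.55 | 58 | 56.43 | 2.46 |
| 2.2 | 2.68 | 0.23 | 55 | 56.47 | 2.17 |
| 2.5 | 2.74 | 0.06 | 58 | 56.54 | 2.13 |
| 3.0 | 2.80 | 0.04 | 61 | 56.65 | 18.92 |
| 3.1 | 2.78 | 0.10 | 60 | 56.67 | 11.08 |
| 3.8 | 2.68 | 1.25 | 55 | 56.83 | 3.33 |
| 4.0 | 2.70 | 1.69 | 56 | 56.87 | 0.76 |
| 4.0 | 2.58 | 2.02 | 50 | 56.87 | 47.20 |
| 4.2 | 2.80 | 1.96 | 61 | 56.91 | 16.70 |
| Total | | 14.38 | | | 160.19 |

The standard error of estimate for sale is
$$SE = \sqrt{\frac{1}{n}\sum (Y - \hat{Y})^2} = \sqrt{\frac{160.19}{12}} = 3.65$$
The standard error of estimate for expenditure on advertisement is
$$SE = \sqrt{\frac{1}{n}\sum (X - \hat{X})^2} = \sqrt{\frac{14.38}{12}} = 1.09$$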
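The two standard errors can be recomputed directly from the data in the table. This is a minimal sketch, assuming the divisor $n$ used in the notes (not $n - 2$); the helper names `fit_line` and `standard_error` are illustrative.

```python
import math


def fit_line(xs, ys):
    """Least-squares line ys = alpha + beta * xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    beta = s_xy / s_xx
    return mean_y - beta * mean_x, beta


def standard_error(xs, ys):
    """Root mean squared residual of ys about its regression on xs (divisor n)."""
    alpha, beta = fit_line(xs, ys)
    residual_ss = sum((y - (alpha + beta * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(residual_ss / len(xs))


# X = advertisement expenditure, Y = sale, as tabulated above.
expenditure = [1.0, 1.2, 1.5, 2.0, 2.2, 2.5, 3.0, 3.1, 3.8, 4.0, 4.0, 4.2]
sale = [55, 60, 50, 58, 55, 58, 61, 60, 55, 56, 50, 61]

se_sale = standard_error(expenditure, sale)          # Y regressed on X
se_expenditure = standard_error(sale, expenditure)   # X regressed on Y

print(round(se_sale, 2), round(se_expenditure, 2))
```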
Practice Exercises
1. Given that the variance of $X$ is 9 and the regression equations are
$$8X - 10Y + 66 = 0$$
$$40X - 18Y = 214$$
Find (i) the mean values of $X$ and $Y$ and (ii) $r_{XY}$ and $\sigma_Y$.
2. Find the most likely price of a commodity in Mumbai corresponding to the price of Rs. 70
at Delhi from the following

| | Delhi | Mumbai |
|---|---|---|
| Average price | 65 | 67 |
| Standard deviation | 2.5 | 3.5 |

Correlation coefficient between prices of commodity in two cities is 0.8
3. The following table gives the demand and price for a commodity for 6 days.
Price (Rs.) : 4 3 6 9 12 10
Demand (mds) : 46 65 50 30 15 25
i) Obtain the value of correlation coefficient.
ii) Develop the estimating regression equations.
iii) Compute the standard error of estimate.
iv) Predict Demand for price (Rs.) = 5, 8, and 11.
v) Compute coefficient of determination and give your comment on the distribution.
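Since, in view of (10), $a + b_{12.3}\bar{X}_2 + b_{13.2}\bar{X}_3 - \bar{X}_1 = 0$, equations (12) and (13) reduce to
$$\sum y_1 y_2 = b_{12.3}\sum y_2^2 + b_{13.2}\sum y_2 y_3 \tag{14}$$
$$\sum y_1 y_3 = b_{12.3}\sum y_2 y_3 + b_{13.2}\sum y_3^2 \tag{15}$$
where $y_i = X_i - \bar{X}_i$ denotes the deviation of $X_i$ from its mean.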
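Now, after solving equations (14) and (15) for $b_{12.3}$ and $b_{13.2}$, we get
$$\hat{b}_{12.3} = \frac{\sum y_1 y_2 \sum y_3^2 - \left(\sum y_1 y_3\right)\left(\sum y_2 y_3\right)}{\sum y_2^2 \sum y_3^2 - \left(\sum y_2 y_3\right)^2}$$
and
$$\hat{b}_{13.2} = \frac{\sum y_1 y_3 \sum y_2^2 - \left(\sum y_1 y_2\right)\left(\sum y_2 y_3\right)}{\sum y_2^2 \sum y_3^2 - \left(\sum y_2 y_3\right)^2}.$$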
Also $\hat{a} = \bar{X}_1 - \hat{b}_{12.3}\bar{X}_2 - \hat{b}_{13.2}\bar{X}_3$. Thus, in view of (8), the estimated regression line is
$$\hat{X}_1 = \hat{a} + \hat{b}_{12.3}X_2 + \hat{b}_{13.2}X_3 \tag{16}$$
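The closed-form estimates of the two partial regression coefficients and the intercept translate directly into code. The sketch below is one possible implementation; the function name `fit_plane` and the data are assumptions, with the data generated exactly from $X_1 = 1 + 2X_2 + 3X_3$ so that the expected answer is known.

```python
def fit_plane(x1, x2, x3):
    """Estimate a, b12.3, b13.2 in X1 = a + b12.3 * X2 + b13.2 * X3."""
    n = len(x1)
    m1, m2, m3 = sum(x1) / n, sum(x2) / n, sum(x3) / n
    # Deviations from the means: y_i = X_i - mean(X_i).
    y1 = [v - m1 for v in x1]
    y2 = [v - m2 for v in x2]
    y3 = [v - m3 for v in x3]

    def cross(u, v):
        return sum(p * q for p, q in zip(u, v))

    s12, s13, s23 = cross(y1, y2), cross(y1, y3), cross(y2, y3)
    s22, s33 = cross(y2, y2), cross(y3, y3)

    # Common denominator; zero when X2 and X3 are perfectly correlated.
    denom = s22 * s33 - s23 ** 2
    b12_3 = (s12 * s33 - s13 * s23) / denom
    b13_2 = (s13 * s22 - s12 * s23) / denom
    a = m1 - b12_3 * m2 - b13_2 * m3
    return a, b12_3, b13_2


# Hypothetical data generated from X1 = 1 + 2*X2 + 3*X3.
x2 = [1, 2, 3, 4, 5]
x3 = [2, 1, 4, 3, 6]
x1 = [1 + 2 * p + 3 * q for p, q in zip(x2, x3)]

a, b12_3, b13_2 = fit_plane(x1, x2, x3)
print(a, b12_3, b13_2)
```

The formulas break down when the denominator is zero, that is, when $X_2$ and $X_3$ are perfectly linearly related.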
Regression Coefficients in Terms of Simple Correlation Coefficients
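The variance of $X_1$ is given by
$$\sigma_1^2 = \frac{1}{n}\sum (X_1 - \bar{X}_1)^2 = \frac{1}{n}\sum y_1^2,$$
so that
$$\sum y_1^2 = n\sigma_1^2.$$
Similarly,
$$\sum y_2^2 = n\sigma_2^2 \quad \text{and} \quad \sum y_3^2 = n\sigma_3^2.$$
We know the correlation coefficient between $X_1$ and $X_2$ is given by
$$r_{12} = \frac{\sum (X_1 - \bar{X}_1)(X_2 - \bar{X}_2)}{\sqrt{\sum (X_1 - \bar{X}_1)^2 \sum (X_2 - \bar{X}_2)^2}} = \frac{\sum y_1 y_2}{\sqrt{\sum y_1^2 \sum y_2^2}}$$
or
$$r_{12} = \frac{\sum y_1 y_2}{n\sigma_1\sigma_2},$$
so that
$$\sum y_1 y_2 = n\sigma_1\sigma_2 r_{12}.$$
Similarly,
$$\sum y_1 y_3 = n\sigma_1\sigma_3 r_{13}$$
and
$$\sum y_2 y_3 = n\sigma_2\sigma_3 r_{23}.$$
Substituting the above values in equations (14) and (15), we get
$$\sigma_1 r_{12} = b_{12.3}\sigma_2 + b_{13.2}\sigma_3 r_{23} \tag{17}$$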
$$\sigma_{1.23} = \sqrt{\frac{1}{n}\sum (X_1 - \hat{X}_1)^2} \tag{21}$$
An alternative method of computing $\sigma_{1.23}$ in terms of simple correlation coefficients is given
by
$$\sigma_{1.23} = \sigma_1\sqrt{\frac{1 - r_{12}^2 - r_{13}^2 - r_{23}^2 + 2r_{12}r_{13}r_{23}}{1 - r_{23}^2}} \tag{22}$$
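The residual form (21) and the correlation form (22) of $\sigma_{1.23}$ should agree for any data set in which $X_2$ and $X_3$ are not perfectly correlated. The sketch below checks this on made-up numbers; the function name and data are assumptions, and the divisor $n$ is used throughout, as in the notes.

```python
import math


def sigma_1_23_both_ways(x1, x2, x3):
    """Return sigma_1.23 from the residuals (21) and from the correlations (22)."""
    n = len(x1)
    m1, m2, m3 = sum(x1) / n, sum(x2) / n, sum(x3) / n
    y1 = [v - m1 for v in x1]
    y2 = [v - m2 for v in x2]
    y3 = [v - m3 for v in x3]

    def cross(u, v):
        return sum(p * q for p, q in zip(u, v))

    s11, s22, s33 = cross(y1, y1), cross(y2, y2), cross(y3, y3)
    s12, s13, s23 = cross(y1, y2), cross(y1, y3), cross(y2, y3)

    # Partial regression coefficients from the least-squares solution.
    denom = s22 * s33 - s23 ** 2
    b12_3 = (s12 * s33 - s13 * s23) / denom
    b13_2 = (s13 * s22 - s12 * s23) / denom

    # (21): X1 - X1_hat equals y1 - b12.3 * y2 - b13.2 * y3.
    residual_ss = sum(
        (p - b12_3 * q - b13_2 * w) ** 2 for p, q, w in zip(y1, y2, y3)
    )
    from_residuals = math.sqrt(residual_ss / n)

    # (22): expression in the simple correlation coefficients.
    sigma1 = math.sqrt(s11 / n)
    r12 = s12 / math.sqrt(s11 * s22)
    r13 = s13 / math.sqrt(s11 * s33)
    r23 = s23 / math.sqrt(s22 * s33)
    ratio = (1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23) / (1 - r23**2)
    from_correlations = sigma1 * math.sqrt(ratio)
    return from_residuals, from_correlations


# Hypothetical data, not taken from the notes.
x1 = [10, 8, 20, 17, 30]
x2 = [1, 2, 3, 4, 5]
x3 = [2, 1, 4, 3, 6]

from_residuals, from_correlations = sigma_1_23_both_ways(x1, x2, x3)
print(from_residuals, from_correlations)
```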