2 - Simple Linear Regression

The simple linear regression model relates the response $Y$ to the regressor $X$:

$Y_i = \beta_0 + \beta_1 X_i + u_i$

where $\beta_0, \beta_1$ are the regression coefficients and $u_i$ is the random (error) term.

Estimation Process

Regression model: $y = \beta_0 + \beta_1 x + e$
Regression equation: $E(y) = \beta_0 + \beta_1 x$
Unknown parameters: $\beta_0, \beta_1$
Sample data: $(x_1, y_1), \dots, (x_n, y_n)$
Estimated regression equation: $\hat y = b_0 + b_1 x$
The sample statistics $b_0, b_1$ provide estimates of $\beta_0$ and $\beta_1$.
Estimation Of Parameters In SLR by OLS
Assume $E(u_i) = 0$, $E(u_i^2) = \sigma_u^2$, $E(u_i u_j) = 0$ for $i \neq j$, and $u_i \sim N(0, \sigma_u^2)$.

We know that

$Y_i = \beta_0 + \beta_1 X_i + u_i$, so $u_i = Y_i - \beta_0 - \beta_1 X_i$

$S = \sum u_i^2 = \sum (Y_i - \beta_0 - \beta_1 X_i)^2$  ... (I)

Partially differentiate w.r.t. $\beta_0$ and $\beta_1$:

$\dfrac{\partial S}{\partial \beta_0} = 2 \sum (Y_i - \beta_0 - \beta_1 X_i)(-1)$

$\dfrac{\partial S}{\partial \beta_1} = 2 \sum (Y_i - \beta_0 - \beta_1 X_i)(-X_i)$

Setting $\dfrac{\partial S}{\partial \beta_0} = \dfrac{\partial S}{\partial \beta_1} = 0$ gives the normal equations:

$\sum Y_i = n b_0 + b_1 \sum X_i$  ... (1)

$\sum X_i Y_i = b_0 \sum X_i + b_1 \sum X_i^2$  ... (2)

Multiply eq. (1) by $\sum X_i$, multiply eq. (2) by $n$, and subtract; after solving, we get

$b_1 = \dfrac{n \sum XY - \sum X \sum Y}{n \sum X^2 - (\sum X)^2} = \dfrac{\sum XY - (\sum X \sum Y)/n}{\sum X^2 - (\sum X)^2/n}$  and  $b_0 = \bar Y - b_1 \bar X$
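As a concrete check of these formulas, the sketch below computes $b_0$ and $b_1$ directly from the closed-form solution of the normal equations; the small data set is made up purely for illustration.

import numpy as np

# Hypothetical data, for illustration only
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(X)

# Slope and intercept from the solved normal equations
b1 = (n * np.sum(X * Y) - np.sum(X) * np.sum(Y)) / (n * np.sum(X**2) - np.sum(X)**2)
b0 = Y.mean() - b1 * X.mean()

print(b0, b1)  # values agree with np.polyfit(X, Y, 1)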
Alternative Forms For b0 and b1

1) $b_1 = \dfrac{\sum x_i y_i}{\sum x_i^2} = \dfrac{\sum x_i Y_i}{\sum x_i^2}$  where $x_i = X_i - \bar X$ and $y_i = Y_i - \bar Y$

2) $b_1 = \sum w_i Y_i$  where $w_i = \dfrac{x_i}{\sum x_i^2}$

3) $b_1 = \dfrac{S_{XY}}{S_{XX}}$

4) $b_1 = \dfrac{m_{xy}}{m_{xx}}$  where $m_{xx} = \dfrac{\sum (X - \bar X)^2}{n}$

1) $b_0 = \bar Y - b_1 \bar X$

2) $b_0 = \dfrac{\sum Y_i}{n} - \bar X \sum w_i Y_i$

3) $b_0 = \sum \left( \dfrac{1}{n} - \bar X w_i \right) Y_i$  where the $w_i$ are the least squares weights
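A quick numerical sanity check, on the same made-up data as before, that the alternative forms all produce identical estimates:

import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
x, y = X - X.mean(), Y - Y.mean()   # deviations from the means
w = x / np.sum(x**2)                # least squares weights

b1_forms = [np.sum(x*y)/np.sum(x**2), np.sum(x*Y)/np.sum(x**2), np.sum(w*Y)]
b0_forms = [Y.mean() - b1_forms[0]*X.mean(), np.sum((1/len(X) - X.mean()*w)*Y)]

assert np.allclose(b1_forms, b1_forms[0]) and np.allclose(b0_forms, b0_forms[0])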
Some Important Results About Wi
• We assume that the $X_i$ values are fixed constants in repeated sampling, and hence the $w_i$ are fixed constants in repeated sampling.
1) $\sum_i w_i = 0$  where $w_i = \dfrac{x_i}{\sum x_i^2}$,

since $\sum_i w_i = \dfrac{\sum x_i}{\sum x_i^2} = \dfrac{\sum (X_i - \bar X)}{\sum x_i^2} = 0$

2) $\sum_i w_i^2 = \dfrac{1}{\sum x_i^2}$,

since $\sum_i w_i^2 = \sum_i \left( \dfrac{x_i}{\sum x_i^2} \right)^2 = \dfrac{\sum x_i^2}{\left( \sum x_i^2 \right)^2} = \dfrac{1}{\sum x_i^2}$

3) $\sum_i w_i x_i = \sum_i w_i X_i = 1$,

since $\sum_i w_i x_i = \dfrac{\sum x_i^2}{\sum x_i^2} = 1$
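These three identities are exact and easy to confirm numerically (again on hypothetical X values):

import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x = X - X.mean()
w = x / np.sum(x**2)

assert np.isclose(np.sum(w), 0.0)                    # property 1
assert np.isclose(np.sum(w**2), 1.0/np.sum(x**2))    # property 2
assert np.isclose(np.sum(w*x), 1.0)                  # property 3
assert np.isclose(np.sum(w*X), 1.0)                  # same, since sum(w) = 0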
Mean, Variance, Covariance and Correlation of OLS estimators of β0 & β1
MEAN of b1
The least squares estimates are functions of the sample data. Since the data are likely to change from sample to sample, the estimates will change from one sample to another.
$b_1 = \sum_i w_i Y_i = \sum_i w_i (\beta_0 + \beta_1 X_i + u_i) = \beta_0 \sum_i w_i + \beta_1 \sum_i w_i X_i + \sum_i w_i u_i$

$b_1 = \beta_1 + \sum_i w_i u_i$,  as $\sum_i w_i = 0$ and $\sum_i w_i X_i = 1$

We know that $b_1 = \beta_1 + \sum w_i u_i$, so

$E(b_1) = \beta_1 + \sum w_i E(u_i)$  where the $w_i$ are constants

$E(b_1) = \beta_1$,  as $E(u_i) = 0$
Variance of b1

$\mathrm{Var}(b_1) = E\left[ b_1 - E(b_1) \right]^2 = E\left[ b_1 - \beta_1 \right]^2 = E\left[ \sum w_i u_i \right]^2$

$= E\left[ \sum w_i^2 u_i^2 + 2 \sum_{i<j} w_i w_j u_i u_j \right]$

$= \sum w_i^2 E(u_i^2) + 2 \sum_{i<j} w_i w_j E(u_i u_j)$

$= \sigma_u^2 \sum w_i^2$,  where $E(u_i u_j) = 0$, $E(u_i^2) = \sigma_u^2$, and $\sum w_i^2 = \dfrac{1}{\sum x_i^2}$

$\mathrm{Var}(b_1) = \dfrac{\sigma_u^2}{\sum x_i^2}$

The greater the variation in the X values, the smaller the variance of $b_1$, and hence the greater the precision with which $\beta_1$ can be estimated.
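A small Monte Carlo sketch (with assumed parameter values) illustrating that, across repeated samples with X held fixed, $E(b_1) \approx \beta_1$ and $\mathrm{Var}(b_1) \approx \sigma_u^2 / \sum x_i^2$:

import numpy as np

rng = np.random.default_rng(0)
beta0, beta1, sigma_u = 2.0, 0.5, 1.0        # assumed population values
X = np.linspace(1, 10, 20)                   # fixed in repeated sampling
x = X - X.mean()
w = x / np.sum(x**2)

b1_draws = [np.sum(w * (beta0 + beta1*X + rng.normal(0, sigma_u, X.size)))
            for _ in range(20000)]

print(np.mean(b1_draws), beta1)                       # simulated mean ~ beta1
print(np.var(b1_draws), sigma_u**2 / np.sum(x**2))    # simulated variance ~ theory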
MEAN of b0

$b_0 = \sum \left( \dfrac{1}{n} - \bar X w_i \right) Y_i$

$= \sum \left( \dfrac{1}{n} - \bar X w_i \right) (\beta_0 + \beta_1 X_i + u_i)$

$= \beta_0 \sum \left( \dfrac{1}{n} - \bar X w_i \right) + \beta_1 \sum \left( \dfrac{1}{n} - \bar X w_i \right) X_i + \sum \left( \dfrac{1}{n} - \bar X w_i \right) u_i$

$= \beta_0 \left( \dfrac{n}{n} - \bar X \sum w_i \right) + \beta_1 \left( \dfrac{\sum X_i}{n} - \bar X \sum w_i X_i \right) + \sum \left( \dfrac{1}{n} - \bar X w_i \right) u_i$

$= \beta_0 + \beta_1 (\bar X - \bar X) + \sum \left( \dfrac{1}{n} - \bar X w_i \right) u_i = \beta_0 + \sum \left( \dfrac{1}{n} - \bar X w_i \right) u_i$

$E(b_0) = \beta_0 + \sum \left( \dfrac{1}{n} - \bar X w_i \right) E(u_i)$

$E(b_0) = \beta_0$,  where $E(u_i) = 0$
Variance of b0

$\mathrm{Var}(b_0) = E\left[ b_0 - \beta_0 \right]^2$,  where $b_0 - \beta_0 = \sum \left( \dfrac{1}{n} - \bar X w_i \right) u_i$

Put $\lambda_i = \dfrac{1}{n} - \bar X w_i$, so that

$\mathrm{Var}(b_0) = E\left[ \sum \lambda_i u_i \right]^2 = E\left[ \sum \lambda_i^2 u_i^2 + 2 \sum_{i<j} \lambda_i \lambda_j u_i u_j \right]$

$= \sum \lambda_i^2 E(u_i^2) + 2 \sum_{i<j} \lambda_i \lambda_j E(u_i u_j) = \sigma_u^2 \sum \lambda_i^2$

$\sum \lambda_i^2 = \sum \left( \dfrac{1}{n} - \bar X w_i \right)^2 = \sum \left( \dfrac{1}{n^2} - \dfrac{2 \bar X w_i}{n} + \bar X^2 w_i^2 \right) = \dfrac{1}{n} + \bar X^2 \sum w_i^2$  (since $\sum w_i = 0$)

$\mathrm{Var}(b_0) = \sigma_u^2 \left( \dfrac{1}{n} + \dfrac{\bar X^2}{\sum x^2} \right)$
Covariance of (b0, b1)

$\mathrm{Cov}(b_0, b_1) = E\left[ (b_0 - E(b_0))(b_1 - E(b_1)) \right]$,  where $b_0 - \beta_0 = \bar u - \bar X (b_1 - \beta_1)$  (from $b_0 = \bar Y - b_1 \bar X$ and $\bar Y = \beta_0 + \beta_1 \bar X + \bar u$)

$= E\left[ \{ \bar u - \bar X (b_1 - \beta_1) \}(b_1 - \beta_1) \right] = E\left[ \bar u (b_1 - \beta_1) \right] - \bar X E(b_1 - \beta_1)^2$

$E\left[ \bar u (b_1 - \beta_1) \right] = E\left[ \dfrac{1}{n} \sum_i u_i \sum_j w_j u_j \right] = \dfrac{1}{n} \sum_i w_i E(u_i^2) = \dfrac{\sigma_u^2}{n} \sum w_i = 0$

$\mathrm{Cov}(b_0, b_1) = -\bar X \, \mathrm{Var}(b_1) = -\dfrac{\bar X \sigma_u^2}{\sum x^2}$

The nature of the covariance between b0 and b1 depends on the sign of the mean of the X values. If the mean of the X values is positive, the covariance will be negative: if the slope coefficient is overestimated, the intercept coefficient will be underestimated.
Correlation between (b0, b1)

$\mathrm{Corr}(b_0, b_1) = \dfrac{\mathrm{cov}(b_0, b_1)}{\sqrt{\mathrm{var}(b_0)\,\mathrm{var}(b_1)}}$

where $\mathrm{cov}(b_0, b_1) = -\dfrac{\bar X \sigma_u^2}{\sum x^2}$,  $\mathrm{var}(b_1) = \dfrac{\sigma_u^2}{\sum x^2}$,  $\mathrm{var}(b_0) = \sigma_u^2 \left( \dfrac{1}{n} + \dfrac{\bar X^2}{\sum x^2} \right)$

$\mathrm{Corr}(b_0, b_1) = \dfrac{-\bar X \sigma_u^2 / \sum x^2}{\sqrt{\dfrac{\sigma_u^2}{\sum x^2} \cdot \sigma_u^2 \left( \dfrac{1}{n} + \dfrac{\bar X^2}{\sum x^2} \right)}} = \dfrac{-\bar X}{\sqrt{\dfrac{\sum x^2}{n} + \bar X^2}}$

Using $\sum x^2 = \sum X^2 - n \bar X^2$:

$\dfrac{\sum x^2}{n} + \bar X^2 = \dfrac{\sum X^2 - n \bar X^2 + n \bar X^2}{n} = \dfrac{\sum X^2}{n}$

$\mathrm{Corr}(b_0, b_1) = \dfrac{-\bar X}{\sqrt{\sum X^2 / n}} = \dfrac{-\sqrt{n}\, \bar X}{\sqrt{\sum X^2}}$
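A Monte Carlo sketch (assumed parameters, illustrative only) that checks both the covariance and correlation formulas across repeated samples:

import numpy as np

rng = np.random.default_rng(1)
beta0, beta1, sigma_u = 2.0, 0.5, 1.0     # assumed population values
X = np.linspace(1, 10, 20)
x = X - X.mean()
n = X.size

draws = []
for _ in range(20000):
    Y = beta0 + beta1*X + rng.normal(0, sigma_u, n)
    b1 = np.sum(x*Y) / np.sum(x**2)
    draws.append((Y.mean() - b1*X.mean(), b1))
b0s, b1s = np.array(draws).T

print(np.cov(b0s, b1s)[0, 1], -X.mean()*sigma_u**2/np.sum(x**2))          # covariance
print(np.corrcoef(b0s, b1s)[0, 1], -np.sqrt(n)*X.mean()/np.sqrt(np.sum(X**2)))  # correlation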
Estimating the Variance of the Disturbance Term

• Let $e_i$ be the estimate of $u_i$ made from the sample data:

$e_i = Y_i - \hat Y_i = (\beta_0 + \beta_1 X_i + u_i) - (b_0 + b_1 X_i)$,  where $Y_i = \beta_0 + \beta_1 X_i + u_i$

$= u_i - (b_0 - \beta_0) - (b_1 - \beta_1) X_i$

Since $b_0 = \bar Y - b_1 \bar X$ and $\bar Y = \beta_0 + \beta_1 \bar X + \bar u$, we have $b_0 - \beta_0 = \bar u - (b_1 - \beta_1) \bar X$, so

$e_i = (u_i - \bar u) - (b_1 - \beta_1)(X_i - \bar X) = (u_i - \bar u) - (b_1 - \beta_1) x_i$

Squaring and summing:

$\sum e_i^2 = \sum (u_i - \bar u)^2 + (b_1 - \beta_1)^2 \sum x_i^2 - 2 (b_1 - \beta_1) \sum (u_i - \bar u) x_i$

so that $E\left( \sum e_i^2 \right) = A + B - C$, say. First consider A:

$A = E \sum (u_i - \bar u)^2 = E\left[ \sum u_i^2 - n \bar u^2 \right] = \sum E(u_i^2) - \dfrac{1}{n} E\left[ \sum u_i^2 + 2 \sum_{i<j} u_i u_j \right]$

$= n \sigma_u^2 - \dfrac{1}{n} \left( n \sigma_u^2 \right) = (n - 1) \sigma_u^2$,  since $E(u_i u_j) = 0$ and $E(u_i^2) = \sigma_u^2$
Now consider B:

$B = E\left[ (b_1 - \beta_1)^2 \sum x_i^2 \right] = \sum x_i^2 \, E(b_1 - \beta_1)^2 = \sum x_i^2 \cdot \dfrac{\sigma_u^2}{\sum x_i^2} = \sigma_u^2$

And C:

$C = E\left[ 2 (b_1 - \beta_1) \sum (u_i - \bar u) x_i \right] = 2 E\left[ (b_1 - \beta_1) \sum x_i u_i \right]$,  since $\bar u \sum x_i = 0$

where $b_1 - \beta_1 = \sum u_i w_i = \dfrac{\sum x_i u_i}{\sum x_i^2}$, so

$C = \dfrac{2}{\sum x_i^2} E\left( \sum x_i u_i \right)^2 = \dfrac{2}{\sum x_i^2} E\left[ \sum x_i^2 u_i^2 + 2 \sum_{i<j} x_i x_j u_i u_j \right]$

$= \dfrac{2}{\sum x_i^2} \left[ \sum x_i^2 E(u_i^2) + 2 \sum_{i<j} x_i x_j E(u_i u_j) \right]$,  where $E(u_i u_j) = 0$ and $E(u_i^2) = \sigma_u^2$

$= \dfrac{2 \sigma_u^2 \sum x_i^2}{\sum x_i^2} = 2 \sigma_u^2$

Hence $E\left( \sum e_i^2 \right) = (n - 1) \sigma_u^2 + \sigma_u^2 - 2 \sigma_u^2 = (n - 2) \sigma_u^2$, so $\hat \sigma_u^2 = \dfrac{\sum e_i^2}{n - 2}$ is an unbiased estimator of $\sigma_u^2$.
3) $\sum e_i \hat Y_i = 0$

Proof:

$\sum e_i \hat Y_i = \sum e_i (b_0 + b_1 X_i) = b_0 \sum e_i + b_1 \sum e_i X_i$

where $\sum e_i = 0$ by the first property and $\sum e_i X_i = 0$ by the second property, so $\sum e_i \hat Y_i = 0$.
Properties of Residual e

4) The variance of $e_i$ is not constant, i.e. $v(e_i) \neq$ constant, even though $v(u_i)$ is constant.

5) The estimated residuals are autocorrelated, i.e. $\mathrm{cov}(e_i, e_j) \neq 0$, even though the $u_i$'s are not autocorrelated.

6) $\mathrm{Cov}(b_1, e) = 0$
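A minimal sketch, on hypothetical data, confirming the exact algebraic residual properties $\sum e_i = 0$, $\sum e_i X_i = 0$, and $\sum e_i \hat Y_i = 0$:

import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
x = X - X.mean()
b1 = np.sum(x * (Y - Y.mean())) / np.sum(x**2)
b0 = Y.mean() - b1 * X.mean()
yhat = b0 + b1 * X
e = Y - yhat                                # estimated residuals

assert np.isclose(np.sum(e), 0.0)           # property 1
assert np.isclose(np.sum(e * X), 0.0)       # property 2
assert np.isclose(np.sum(e * yhat), 0.0)    # property 3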
COV(b1, e)

$\mathrm{Cov}(b_1, e_i) = E\left[ (b_1 - E(b_1))(e_i - E(e_i)) \right]$

$= E\left[ (b_1 - \beta_1) \{ (u_i - \bar u) - (b_1 - \beta_1) x_i \} \right]$,  as $e_i = (u_i - \bar u) - (b_1 - \beta_1) x_i$

$= E\left[ (b_1 - \beta_1)(u_i - \bar u) \right] - x_i E(b_1 - \beta_1)^2 = E\left[ \left( \sum_j w_j u_j \right)(u_i - \bar u) \right] - \dfrac{x_i \sigma_u^2}{\sum x^2}$

Now

$E\left[ u_i \sum_j w_j u_j \right] = E\left[ u_i (w_1 u_1 + w_2 u_2 + \dots + w_i u_i + \dots + w_n u_n) \right] = w_i E(u_i^2) = w_i \sigma_u^2$

and

$E\left[ \bar u \sum_j w_j u_j \right] = \dfrac{1}{n} E\left[ \sum_j w_j u_j^2 + \sum_{i \neq j} w_j u_i u_j \right] = \dfrac{\sigma_u^2}{n} \sum_j w_j = 0$

Therefore  $\mathrm{Cov}(b_1, e_i) = w_i \sigma_u^2 - \dfrac{x_i \sigma_u^2}{\sum x^2} = w_i \sigma_u^2 - w_i \sigma_u^2 = 0$
Estimation of Parameters of Simple Linear Regression by MLE

Consider the population regression model

$Y_i = \beta_0 + \beta_1 X_i + u_i$,  where $u_i \sim N(0, \sigma_u^2)$

The likelihood function is

$L(\beta_0, \beta_1, \sigma_u^2) = \prod_{i=1}^{n} \dfrac{1}{\sigma_u \sqrt{2\pi}} \exp\left( -\dfrac{u_i^2}{2 \sigma_u^2} \right)$

$L(\beta_0, \beta_1, \sigma_u^2) = \dfrac{1}{(2\pi \sigma_u^2)^{n/2}} \exp\left( -\dfrac{1}{2 \sigma_u^2} \sum u_i^2 \right)$

$L(\beta_0, \beta_1, \sigma_u^2) = \dfrac{1}{(2\pi \sigma_u^2)^{n/2}} \exp\left( -\dfrac{1}{2 \sigma_u^2} \sum (Y_i - \beta_0 - \beta_1 X_i)^2 \right)$

$\ln L(\beta_0, \beta_1, \sigma_u^2) = -\dfrac{n}{2} \ln 2\pi - \dfrac{n}{2} \ln \sigma_u^2 - \dfrac{1}{2 \sigma_u^2} \sum (Y_i - \beta_0 - \beta_1 X_i)^2$

Maximizing $\ln L$ with respect to $\beta_0$ and $\beta_1$ gives the same estimators as OLS, while maximizing with respect to $\sigma_u^2$ gives

$\tilde \sigma_u^2 = \dfrac{\sum (Y_i - b_0 - b_1 X_i)^2}{n} = \dfrac{\sum e_i^2}{n}$
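A sketch (assumed data, using scipy.optimize) showing numerically that maximizing this log-likelihood reproduces the OLS slope and intercept, with the variance estimate $\sum e_i^2 / n$:

import numpy as np
from scipy.optimize import minimize

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(X)

def neg_log_lik(theta):
    b0, b1, log_s2 = theta            # parametrize by log variance for stability
    s2 = np.exp(log_s2)
    resid = Y - b0 - b1 * X
    return 0.5*n*np.log(2*np.pi) + 0.5*n*np.log(s2) + np.sum(resid**2)/(2*s2)

res = minimize(neg_log_lik, x0=[0.0, 0.0, 0.0])
b0_ml, b1_ml, s2_ml = res.x[0], res.x[1], np.exp(res.x[2])

# Compare with OLS and with sum(e^2)/n
x = X - X.mean()
b1_ols = np.sum(x*(Y - Y.mean()))/np.sum(x**2)
b0_ols = Y.mean() - b1_ols*X.mean()
e = Y - b0_ols - b1_ols*X
print(b0_ml, b0_ols, b1_ml, b1_ols, s2_ml, np.sum(e**2)/n)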
MLE of variance of disturbance term is a biased estimator

Prove that $\tilde \sigma_u^2 = \sum e_i^2 / n$ is a biased estimator of $\sigma_u^2$, i.e. $E(\tilde \sigma_u^2) \neq \sigma_u^2$.

Proof:

As $E\left( \sum e_i^2 \right) = (n - 2) \sigma_u^2$, dividing both sides by $n$ gives

$E\left( \dfrac{\sum e_i^2}{n} \right) = \dfrac{n - 2}{n} \sigma_u^2$

$E(\tilde \sigma_u^2) = \sigma_u^2 - \dfrac{2}{n} \sigma_u^2$

So the MLE estimator of $\sigma_u^2$ is a biased estimator with bias $-\dfrac{2 \sigma_u^2}{n}$, but the bias decreases as $n$ increases.

Note: $\lim_{n \to \infty} E(\tilde \sigma_u^2) = \sigma_u^2$, i.e. the MLE estimator is asymptotically unbiased.
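A quick simulation (assumed parameters) of this bias: the average of $\sum e_i^2 / n$ over many samples falls below $\sigma_u^2$ by about $2\sigma_u^2/n$, while $\sum e_i^2/(n-2)$ is on target:

import numpy as np

rng = np.random.default_rng(2)
beta0, beta1, sigma_u = 2.0, 0.5, 1.0
X = np.linspace(1, 10, 10)
x = X - X.mean()
n = X.size

mle_vars, ols_vars = [], []
for _ in range(20000):
    Y = beta0 + beta1*X + rng.normal(0, sigma_u, n)
    b1 = np.sum(x*Y)/np.sum(x**2)
    b0 = Y.mean() - b1*X.mean()
    sse = np.sum((Y - b0 - b1*X)**2)
    mle_vars.append(sse/n)
    ols_vars.append(sse/(n-2))

print(np.mean(mle_vars), (n-2)/n * sigma_u**2)   # biased downward
print(np.mean(ols_vars), sigma_u**2)             # unbiased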
Linearity

b1 is a linear estimator of β1:

$b_1 = \sum W_i Y_i = W_1 Y_1 + W_2 Y_2 + \dots + W_n Y_n$

i.e. we can write $b_1$ as a linear combination of the random observations $Y_i$.

E.g. the sample mean is a linear estimator:

$\bar X = a_1 X_1 + a_2 X_2 + \dots + a_n X_n$,  where the $a_i$ are constants

$\bar X = \dfrac{1}{n} X_1 + \dfrac{1}{n} X_2 + \dots + \dfrac{1}{n} X_n$
Unbiasedness

$E(b_1) = \beta_1$

i.e. the mean of the slope estimator over all possible samples is equal to the population slope coefficient.
Minimum variance

The least squares estimator possesses the smallest sampling variance within the class of linear unbiased estimators of β1. (Non-linear or biased estimators obtained from other methods may have a smaller variance.)

All estimators
  ⊃ Linear estimators
    ⊃ Linear unbiased estimators
      • OLS
Minimization of the variance of b1

We want to prove that any other linear unbiased estimator of the parameter, say b1* (linear, unbiased), obtained from any other method has a bigger variance than the least squares estimator.

Let $b_1^*$ be another linear, unbiased estimator of β1, defined by $b_1^* = \sum_i C_i Y_i$.

$b_1^* = \sum_i C_i (\beta_0 + \beta_1 X_i + u_i) = \beta_0 \sum_i C_i + \beta_1 \sum_i C_i X_i + \sum_i C_i u_i$

$E(b_1^*) = \beta_0 \sum_i C_i + \beta_1 \sum_i C_i X_i$

so unbiasedness requires $\sum_i C_i = 0$ and $\sum_i C_i X_i = 1$, in which case $b_1^* - \beta_1 = \sum_i C_i u_i$.

$\mathrm{var}(b_1^*) = E\left( \sum_i C_i u_i \right)^2 = \sum_i C_i^2 E(u_i^2) + 2 \sum_{i<j} C_i C_j E(u_i u_j) = \sigma_u^2 \sum_i C_i^2$,  since $E(u_i u_j) = 0$

Writing $C_i = w_i + d_i$, the constraints $\sum C_i = 0$ and $\sum C_i X_i = 1$ imply $\sum d_i = 0$ and $\sum d_i X_i = 0$, and hence $\sum w_i d_i = 0$. Therefore

$\sum C_i^2 = \sum w_i^2 + \sum d_i^2 \geq \sum w_i^2$

with equality only when every $d_i = 0$, i.e. $C_i = w_i$; so $\mathrm{var}(b_1^*) \geq \mathrm{var}(b_1)$, and OLS has minimum variance.
Sampling Distribution of b1

We know that $b_1 = \sum w_i Y_i$.

The shape of the distribution of b1 depends on that of the $Y_i$, where $Y_i = \beta_0 + \beta_1 X_i + u_i$.
Distribution of dependent variable

$Y_i = \beta_0 + \beta_1 X_i + u_i$

MEAN:
$E(Y_i) = E[\beta_0 + \beta_1 X_i + u_i] = \beta_0 + \beta_1 X_i$

VARIANCE:
$\mathrm{var}(Y_i) = \mathrm{var}[\beta_0 + \beta_1 X_i + u_i] = 0 + 0 + \mathrm{var}(u_i) = \sigma_u^2$

SHAPE OF Yi:
Since β0 and β1 are constants, and the values of the explanatory variable are fixed in repeated sampling, the shape of the distribution of Yi is determined by the shape of the distribution of ui, which is normal by assumption.

$Y_i \sim N(\beta_0 + \beta_1 X_i, \sigma_u^2)$

i.e. the distributions of ui and Yi are identical except that their means differ. In fact, the distribution of ui is just the distribution of Yi translated to a zero mean.
Sampling distribution of b1

Theorem: a linear combination of normally distributed random variables is also normally distributed.

Hence $b_1 = \sum w_i Y_i$ is normally distributed:  $b_1 \sim N\left( \beta_1, \dfrac{\sigma_u^2}{\sum x_i^2} \right)$
Test of significance of β1

The Z-test is applicable if
i) the variance of Y, i.e. $\sigma_u^2$, is known, or
ii) the variance of Y is unknown but the sample is sufficiently large (n > 30).

Case I: when the population variance of Y is known,

$Z = \dfrac{b_1 - \beta_1}{\sqrt{\dfrac{\sigma_u^2}{\sum x^2}}}$

In this formula there is one random variable, b1, which is normally distributed; therefore Z is also a random variable and is normally distributed, i.e. $Z \sim N(0, 1)$.

We can prove E(Z) = 0 and var(Z) = 1:

$E(Z) = \dfrac{E(b_1) - \beta_1}{\sqrt{\sigma_u^2 / \sum x^2}} = \dfrac{\beta_1 - \beta_1}{\sqrt{\sigma_u^2 / \sum x^2}} = 0$

$\mathrm{var}(Z) = E\left[ Z - E(Z) \right]^2 = \dfrac{E\left[ b_1 - E(b_1) \right]^2}{\sigma_u^2 / \sum x^2} = \dfrac{\sigma_u^2 / \sum x^2}{\sigma_u^2 / \sum x^2} = 1$
Test of significance of b1

If the population variance of Y is unknown and the sample is small, then we can use the t-test to test β1.

We know that

$Z = \dfrac{b_1 - \beta_1}{\sqrt{\dfrac{\sigma_u^2}{\sum x^2}}} \sim N(0, 1)$

and

$\dfrac{\sum e_i^2}{\sigma_u^2} = \dfrac{\sum (Y_i - \hat Y_i)^2}{\sigma_u^2} \sim \chi^2(n - 2)$
Test of significance of b1

t random variable: the ratio of a standard normal variable to the square root of an independent $\chi^2$ variable divided by its degrees of freedom.

According to the definition,

$t = \dfrac{\dfrac{b_1 - \beta_1}{\sqrt{\sigma_u^2 / \sum x_i^2}}}{\sqrt{\dfrac{\sum e_i^2 / \sigma_u^2}{n - 2}}} = \dfrac{b_1 - \beta_1}{\sqrt{\dfrac{\sum e_i^2}{(n - 2) \sum x_i^2}}} = \dfrac{b_1 - \beta_1}{\sqrt{\dfrac{\hat \sigma_u^2}{\sum x_i^2}}}$

Note: as $\mathrm{cov}(b_1, e_i) = 0$, the numerator and denominator are independent.
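A sketch of this t-test for H0: β1 = 0 on the hypothetical data, with scipy.stats supplying the t distribution:

import numpy as np
from scipy import stats

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(X)
x = X - X.mean()

b1 = np.sum(x*(Y - Y.mean()))/np.sum(x**2)
b0 = Y.mean() - b1*X.mean()
e = Y - b0 - b1*X
sigma2_hat = np.sum(e**2)/(n - 2)          # unbiased estimate of sigma_u^2

t_stat = b1 / np.sqrt(sigma2_hat/np.sum(x**2))
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)   # two-sided test
print(t_stat, p_value)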
INTERPRETATIONS OF TESTS FOR SLOPE

If H0: β1 = 0 is not rejected, the estimated slope carries no linear information about Y: the fitted line reduces to $\hat y_i = \bar y$ for every $x_i$.

[Figure: scatter plot of the points $(x_i, y_i)$ with the horizontal fitted line $\hat y_i = \bar y$ passing through $(\bar x, \bar y)$.]
Partition of total variation in Y into Explained & Unexplained Variation

Total variation in Y $= \sum (Y_i - \bar Y)^2 = \sum \left[ (\hat Y_i - \bar Y) + (Y_i - \hat Y_i) \right]^2$

$= \sum (\hat Y_i - \bar Y)^2 + \sum (Y_i - \hat Y_i)^2 + 2 \sum (\hat Y_i - \bar Y)(Y_i - \hat Y_i)$

The cross-product term vanishes:

$\sum (\hat Y_i - \bar Y) e_i = \sum (b_0 + b_1 X_i) e_i - \bar Y \sum e_i = b_0 \sum e_i + b_1 \sum X_i e_i = 0$,  as $\sum e_i = 0$ and $\sum X_i e_i = 0$

Hence  $\sum (Y_i - \bar Y)^2 = \sum (\hat Y_i - \bar Y)^2 + \sum (Y_i - \hat Y_i)^2$
[Figure: decomposition of the variance of Y into the part explained by X (SSR) and the part NOT explained by X (SSE).]
Relation between SSR & SSE
Alternative forms of RegSS & ESS

$\hat Y_i - \bar Y = (b_0 + b_1 X_i) - (b_0 + b_1 \bar X) = b_1 (X_i - \bar X)$,  since $\bar Y = b_0 + b_1 \bar X$

$\sum (\hat Y_i - \bar Y)^2 = b_1^2 \sum (X_i - \bar X)^2 = b_1^2 \sum x_i^2$

1) $\mathrm{RegSS} = b_1^2 \sum x_i^2$

2) $\mathrm{RegSS} = b_1 \sum x y$,  from $b_1 = \dfrac{\sum x y}{\sum x_i^2}$:  $\mathrm{RegSS} = b_1 \dfrac{\sum x y}{\sum x_i^2} \sum x_i^2 = b_1 \sum x y$

3) $\mathrm{RegSS} = R^2 \sum y^2$,  as $R^2 = \dfrac{\mathrm{RegSS}}{\mathrm{TotalSS}}$

$\mathrm{ESS} = \mathrm{TotalSS} - \mathrm{RegSS} = \sum y^2 - R^2 \sum y^2 = \sum y^2 (1 - R^2)$
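A numerical check, on the same hypothetical data, that TotalSS = RegSS + ESS and that the alternative forms of RegSS coincide:

import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
x, y = X - X.mean(), Y - Y.mean()
b1 = np.sum(x*y)/np.sum(x**2)
b0 = Y.mean() - b1*X.mean()
yhat = b0 + b1*X

tss = np.sum(y**2)
reg_ss = np.sum((yhat - Y.mean())**2)
ess = np.sum((Y - yhat)**2)

assert np.isclose(tss, reg_ss + ess)
assert np.isclose(reg_ss, b1**2 * np.sum(x**2))
assert np.isclose(reg_ss, b1 * np.sum(x*y))
print(reg_ss/tss)   # R-squared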
R-Squared

• The R-squared statistic, also called the coefficient of determination, is the proportion of response variation explained by the explanatory variable:

$R^2 = \dfrac{\text{Total sum of squares} - \text{Residual sum of squares}}{\text{Total sum of squares}}$
Interpreting R2
• R2 takes on values between 0 and 1, with
higher R2 indicating a stronger linear
association.
• If the residuals are all zero (a perfect fit),
then R2 is 1. If the least squares line has
slope 0, R2 will be 0.
• R2 is useful as a unitless summary of the
strength of linear association.
Caveats about R2
– R2 is not useful for assessing model adequacy, i.e., whether the simple linear regression model holds (use residual plots), or whether there is an association (use the test of $H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$).
– A good R2 depends on the context. In precise laboratory work, R2 values under 90% might be too low, but in social science contexts, where a single variable rarely explains a great deal of variation in the response, R2 values of 50% may be considered remarkably good.
Prove that the test of significance of regression is equivalent to the test of significance of RHO

$H_0: \beta_1 = 0 \Leftrightarrow \rho^2 = 0$

As $b_1 \sim N(\beta_1, \mathrm{var}(b_1))$, hence

$\dfrac{b_1 - \beta_1}{\sqrt{\mathrm{var}(b_1)}} \sim N(0, 1)$

$\dfrac{[b_1 - \beta_1]^2}{\mathrm{var}(b_1)} \sim \chi^2(1)$  ... (1)

$\dfrac{\sum e^2}{\sigma_u^2} \sim \chi^2(n - 2)$  ... (2)

Taking the ratio of (1) to (2), each divided by its degrees of freedom, and using $\mathrm{var}(b_1) = \sigma_u^2 / \sum x_i^2$ (the unknown $\sigma_u^2$ cancels):

$F = \dfrac{[b_1 - \beta_1]^2 \sum x_i^2}{\dfrac{\sum e^2}{n - 2}} \sim F(1, n - 2)$

Under $H_0: \beta_1 = 0$,

$F = \dfrac{b_1^2 \sum x_i^2 / 1}{\dfrac{\sum e^2}{n - 2}} = \dfrac{\mathrm{RegSS} / 1}{\dfrac{\mathrm{ErrorSS}}{n - 2}} = \dfrac{R^2 \sum y_i^2 / 1}{\dfrac{(1 - R^2) \sum y_i^2}{n - 2}}$

$F = \dfrac{R^2}{\dfrac{1 - R^2}{n - 2}} \sim F(1, n - 2)$,  which is the test statistic for testing Rho = 0.
Prove that F(1, v) = t²(v) for testing significance of regression

$t = \dfrac{b_1 - \beta_1}{\sqrt{\hat \sigma_u^2 / \sum x_i^2}}$

Under $H_0: \beta_1 = 0$,

$t^2 = \dfrac{b_1^2 \sum x_i^2}{\hat \sigma_u^2} = \dfrac{b_1 \sum x y}{\dfrac{\sum e^2}{n - 2}} = \dfrac{R^2 \sum y_i^2}{\dfrac{(1 - R^2) \sum y_i^2}{n - 2}} = \dfrac{R^2}{\dfrac{1 - R^2}{n - 2}} = F$

Hence proved.
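A numerical check of both identities, F = t² and F = (n-2)R²/(1-R²), on the hypothetical data:

import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(X)
x, y = X - X.mean(), Y - Y.mean()

b1 = np.sum(x*y)/np.sum(x**2)
e = y - b1*x                          # residuals (intercept cancels in deviation form)
s2 = np.sum(e**2)/(n - 2)
r2 = 1 - np.sum(e**2)/np.sum(y**2)

t_stat = b1/np.sqrt(s2/np.sum(x**2))
F_stat = b1**2 * np.sum(x**2)/s2

assert np.isclose(F_stat, t_stat**2)
assert np.isclose(F_stat, r2/((1 - r2)/(n - 2)))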
• Prediction interval and confidence interval

Two intervals can be used to describe how closely the predicted value will match the true value of y:
– Prediction interval: for a particular value of Y.
– Confidence interval: for the expected value of Y.
1 Linearity

The estimator $\hat Y_0$ is a linear estimator:

$\hat Y_0 = b_0 + b_1 X_0 = \sum \left( \dfrac{1}{n} - \bar X w_i \right) Y_i + X_0 \sum w_i Y_i$

$= \sum \left[ \dfrac{1}{n} + (X_0 - \bar X) w_i \right] Y_i$

So $\hat Y_0$ is a linear estimator of $E(Y_0)$.

2 Unbiasedness

$\hat Y_0$ is an unbiased estimator:

$\hat Y_0 = b_0 + b_1 X_0$

$E(\hat Y_0) = E(b_0) + E(b_1) X_0 = \beta_0 + \beta_1 X_0 = E(Y_0)$

i.e. $\hat Y_0$ is an unbiased estimator of $E(Y_0)$.
3 Variance

$\mathrm{var}(\hat Y_0) = E\left[ \hat Y_0 - E(\hat Y_0) \right]^2 = E\left[ (b_0 + b_1 X_0) - (\beta_0 + \beta_1 X_0) \right]^2$

$= \mathrm{var}(b_0) + X_0^2 \, \mathrm{var}(b_1) + 2 X_0 \, \mathrm{cov}(b_0, b_1)$

$= \sigma_u^2 \left( \dfrac{1}{n} + \dfrac{\bar X^2}{\sum x^2} \right) + \dfrac{X_0^2 \sigma_u^2}{\sum x^2} - \dfrac{2 X_0 \bar X \sigma_u^2}{\sum x^2}$

$= \sigma_u^2 \left( \dfrac{1}{n} + \dfrac{\bar X^2 + X_0^2 - 2 X_0 \bar X}{\sum x^2} \right)$

$\mathrm{var}(\hat Y_0) = \sigma_u^2 \left( \dfrac{1}{n} + \dfrac{(X_0 - \bar X)^2}{\sum x^2} \right)$
Prove that estimator of E(Yo) is BLUE

Define the point estimator $\hat Y_0$ as a linear function of the $Y_i$, where $E(Y_0) = \beta_0 + \beta_1 X_0$:

$\hat Y_0 = \sum C_i Y_i = \sum C_i (\beta_0 + \beta_1 X_i + u_i) = \beta_0 \sum C_i + \beta_1 \sum C_i X_i + \sum C_i u_i$

$E(\hat Y_0) = \beta_0 \sum C_i + \beta_1 \sum C_i X_i$

which is an unbiased estimator of $E(Y_0)$ iff

$\sum C_i = 1$  and  $\sum C_i X_i = X_0$

Under the above conditions,

$\mathrm{var}(\hat Y_0) = E\left[ \sum C_i^2 u_i^2 + 2 \sum_{i<j} C_i C_j u_i u_j \right] = \sum C_i^2 E(u_i^2) + 2 \sum_{i<j} C_i C_j E(u_i u_j) = \sigma_u^2 \sum C_i^2$,  since $E(u_i u_j) = 0$

Now minimize $\sum C_i^2$ subject to the two constraints by Lagrange's multiplier method:

$Z = \sum C_i^2 - 2 \lambda \left( \sum C_i - 1 \right) - 2 \mu \left( \sum C_i X_i - X_0 \right)$

Partially differentiate w.r.t. $C_i$, $\lambda$ and $\mu$:

$\dfrac{\partial Z}{\partial C_i} = 2 C_i - 2 \lambda - 2 \mu X_i = 0$  ... (1)

$\dfrac{\partial Z}{\partial \lambda} = -2 \left( \sum C_i - 1 \right) = 0$  ... (2)

$\dfrac{\partial Z}{\partial \mu} = -2 \left( \sum C_i X_i - X_0 \right) = 0$  ... (3)

From (1),  $C_i = \lambda + \mu X_i$  ... (A)

Taking the sum of (A) and using (2):  $\sum C_i = n \lambda + \mu n \bar X = 1$, so

$\lambda = \dfrac{1}{n} - \mu \bar X$  ... (4)

Multiplying (A) by $X_i$, summing, and using (3):

$\sum C_i X_i = \lambda \sum X_i + \mu \sum X_i^2 = X_0$

$\left( \dfrac{1}{n} - \mu \bar X \right) n \bar X + \mu \sum X_i^2 = X_0$

$\mu \left( \sum X_i^2 - n \bar X^2 \right) = X_0 - \bar X$

$\mu = \dfrac{X_0 - \bar X}{\sum x_i^2}$  ... (5)   (since $\sum X_i^2 - n \bar X^2 = \sum (X_i - \bar X)^2 = \sum x_i^2$)

Putting (5) in (4):

$\lambda = \dfrac{1}{n} - \dfrac{(X_0 - \bar X) \bar X}{\sum x_i^2}$  ... (6)

Putting the values (5) and (6) in (A):

$C_i = \dfrac{1}{n} + \dfrac{(X_0 - \bar X)(X_i - \bar X)}{\sum x_i^2} = \dfrac{1}{n} + \dfrac{(X_0 - \bar X) x_i}{\sum x_i^2}$

Putting the values of $C_i$ in $\hat Y_0 = \sum C_i Y_i$:

$\hat Y_0 = \sum \left[ \dfrac{1}{n} + \dfrac{(X_0 - \bar X) x_i}{\sum x_i^2} \right] Y_i = \bar Y + (X_0 - \bar X) \dfrac{\sum x_i Y_i}{\sum x_i^2}$

$\hat Y_0 = \bar Y - b_1 \bar X + b_1 X_0$,  where $b_1 = \dfrac{\sum x_i Y_i}{\sum x_i^2}$

$\hat Y_0 = b_0 + b_1 X_0$,  with $b_0 = \bar Y - b_1 \bar X$

Now, the variance of $\hat Y_0$ is given by

$\mathrm{var}(\hat Y_0) = \sigma_u^2 \sum C_i^2 = \sigma_u^2 \sum \left[ \dfrac{1}{n} + \dfrac{(X_0 - \bar X) x_i}{\sum x_i^2} \right]^2$

$= \sigma_u^2 \left[ \dfrac{1}{n} + \dfrac{(X_0 - \bar X)^2 \sum x_i^2}{\left( \sum x_i^2 \right)^2} + \dfrac{2 (X_0 - \bar X) \sum x_i}{n \sum x_i^2} \right]$

$= \sigma_u^2 \left[ \dfrac{1}{n} + \dfrac{(X_0 - \bar X)^2}{\sum x_i^2} \right]$,  since $\sum x_i = 0$

Hence proved: the least squares predictor $\hat Y_0 = b_0 + b_1 X_0$ is the best linear unbiased estimator of $E(Y_0)$.
Prediction of a single value of Y

An important application of the regression model is the prediction of Y corresponding to a specified level of the regressor variable X. Suppose the given value of the explanatory variable is X0, so that our task is to predict the corresponding value of Y, say Y0.

Y0 is a random variable with values scattered around the point on the population regression line corresponding to X0.

If we knew the population parameters, our predictor of Y0 would be its mean, $E(Y_0) = \beta_0 + \beta_1 X_0$, which defines a point on the population line. This is the best predictor of Y0 in the sense that the variance of Y0 around E(Y0) is smaller than around any other point.

In reality E(Y0) is not known and has to be estimated; the estimator is the corresponding point on the sample regression line, $\hat Y_0 = b_0 + b_1 X_0$.

The prediction error

$e_0 = Y_0 - \hat Y_0 = u_0 - (b_0 - \beta_0) - (b_1 - \beta_1) X_0$

is normally distributed with mean zero:

$E(e_0) = E(Y_0 - \hat Y_0) = E(u_0) - E(b_0 - \beta_0) - X_0 E(b_1 - \beta_1) = 0$
Now,

$\mathrm{var}(e_0) = E\left[ e_0 - E(e_0) \right]^2 = E(e_0^2)$

Since $u_0$, the disturbance of the new observation, is independent of the sampling errors in $b_0$ and $b_1$,

$\mathrm{var}(e_0) = \sigma_u^2 + \mathrm{var}(b_0) + X_0^2 \, \mathrm{var}(b_1) + 2 X_0 \, \mathrm{cov}(b_0, b_1)$

$= \sigma_u^2 + \sigma_u^2 \left( \dfrac{1}{n} + \dfrac{\bar X^2 + X_0^2 - 2 X_0 \bar X}{\sum x^2} \right)$

$\mathrm{var}(e_0) = \sigma_u^2 \left( 1 + \dfrac{1}{n} + \dfrac{(X_0 - \bar X)^2}{\sum x^2} \right)$
Since e0 is a linear combination of normal variables, it is also normally distributed, with mean zero and variance $\sigma_u^2 \left( 1 + \dfrac{1}{n} + \dfrac{(X_0 - \bar X)^2}{\sum x^2} \right)$:

$e_0 \sim N\left( 0, \; \sigma_u^2 \left[ 1 + \dfrac{1}{n} + \dfrac{(X_0 - \bar X)^2}{\sum x^2} \right] \right)$

The prediction interval is based on

$t = \dfrac{(Y_0 - \hat Y_0) - 0}{S.E.(Y_0 - \hat Y_0)} = \dfrac{Y_0 - \hat Y_0}{\sqrt{\hat \sigma_u^2 \left( 1 + \dfrac{1}{n} + \dfrac{(X_0 - \bar X)^2}{\sum x^2} \right)}}$

giving

$\hat Y_0 \pm t_{\alpha/2} \sqrt{\hat \sigma_u^2 \left( 1 + \dfrac{1}{n} + \dfrac{(X_0 - \bar X)^2}{\sum x^2} \right)}$,  where d.f. = n - 2
What’s the Difference?

Prediction interval:

$\hat Y_0 \pm t_{(\alpha/2, v)} \, \hat \sigma_u \sqrt{1 + \dfrac{1}{n} + \dfrac{(X_0 - \bar X)^2}{\sum x_i^2}}$

Confidence interval:

$\hat Y_0 \pm t_{(\alpha/2, v)} \, \hat \sigma_u \sqrt{\dfrac{1}{n} + \dfrac{(X_0 - \bar X)^2}{\sum x_i^2}}$
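A closing sketch computing both intervals at a new point X0, with scipy.stats supplying the t quantile (the data and the value X0 = 3.5 are hypothetical):

import numpy as np
from scipy import stats

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(X)
x = X - X.mean()

b1 = np.sum(x*(Y - Y.mean()))/np.sum(x**2)
b0 = Y.mean() - b1*X.mean()
sigma_hat = np.sqrt(np.sum((Y - b0 - b1*X)**2)/(n - 2))

X0 = 3.5                               # hypothetical new X value
y0_hat = b0 + b1*X0
t_crit = stats.t.ppf(0.975, df=n - 2)  # 95% two-sided, d.f. = n - 2

core = 1/n + (X0 - X.mean())**2/np.sum(x**2)
ci = (y0_hat - t_crit*sigma_hat*np.sqrt(core),
      y0_hat + t_crit*sigma_hat*np.sqrt(core))        # for E(Y0)
pi = (y0_hat - t_crit*sigma_hat*np.sqrt(1 + core),
      y0_hat + t_crit*sigma_hat*np.sqrt(1 + core))    # for a single Y0
print(ci, pi)   # the prediction interval is always the wider one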