2-Simple Linear Regression

This document provides an overview of simple linear regression (SLR) modeling. It discusses: 1) The basic SLR model equation relating a response variable Y to a predictor variable X. 2) How least squares estimation is used to estimate the regression coefficients β0 and β1 from a sample data set. 3) Formulas for computing the estimated regression coefficients b0 and b1 using the ordinary least squares (OLS) method.


Simple Linear Regression Model

The model relates the response Y_i to the regressor X_i:

Y_i = β0 + β1·X_i + u_i

where β0 and β1 are the regression coefficients and u_i is the random (disturbance) term.
Estimation Process

Regression model: y = β0 + β1·x + e, with regression equation E(y) = β0 + β1·x and unknown parameters β0, β1.
Sample data: pairs (x_1, y_1), ..., (x_n, y_n).
From the sample we obtain the estimated regression equation ŷ = b0 + b1·x, where the sample statistics b0 and b1 provide estimates of β0 and β1.
Estimation Of Parameters In SLR by OLS

Assumptions: E(u_i) = 0, E(u_i²) = σ_u², E(u_i·u_j) = 0 for i ≠ j, and u_i ~ N(0, σ_u²).

We know that
Y_i = β0 + β1·X_i + u_i
u_i = Y_i − β0 − β1·X_i

S = Σu_i² = Σ(Y_i − β0 − β1·X_i)²        ...(I)

Partially differentiate with respect to β0 and β1:
∂S/∂β0 = 2·Σ(Y_i − β0 − β1·X_i)·(−1)
∂S/∂β1 = 2·Σ(Y_i − β0 − β1·X_i)·(−X_i)

Setting ∂S/∂β0 = ∂S/∂β1 = 0 gives the normal equations
ΣY_i = n·b0 + b1·ΣX_i                     ...(1)
ΣX_i·Y_i = b0·ΣX_i + b1·ΣX_i²             ...(2)

Multiply eq. (1) by ΣX_i and eq. (2) by n; after solving, we get

b1 = [n·ΣXY − ΣX·ΣY] / [n·ΣX² − (ΣX)²] = [ΣXY − (ΣX·ΣY)/n] / [ΣX² − (ΣX)²/n]
b0 = Ȳ − b1·X̄
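As a quick numerical illustration (not part of the original slides), here is a minimal NumPy sketch of these formulas on made-up data; np.polyfit is used only as a cross-check that it returns the same coefficients.

```python
# Minimal sketch of the OLS sum formulas above, on made-up data.
import numpy as np

rng = np.random.default_rng(0)
X = np.linspace(0, 10, 20)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, size=X.size)  # illustrative data only

n = X.size
# b1 = [n*SXY - SX*SY] / [n*SXX - (SX)^2],  b0 = Ybar - b1*Xbar
b1 = (n * np.sum(X * Y) - np.sum(X) * np.sum(Y)) / (n * np.sum(X**2) - np.sum(X)**2)
b0 = Y.mean() - b1 * X.mean()
print(b0, b1)
print(np.polyfit(X, Y, 1))  # cross-check: returns [slope, intercept]
```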
Alternative Forms For b0 and b1

(1) b1 = Σx_i·y_i / Σx_i² = Σx_i·Y_i / Σx_i²,   where x_i = X_i − X̄ and y_i = Y_i − Ȳ
(2) b1 = Σw_i·Y_i,   where w_i = x_i / Σx_i²
(3) b1 = S(XY) / S(XX)
(4) b1 = m_xy / m_xx,   where m_xx = Σ(X_i − X̄)² / n

(1) b0 = Ȳ − b1·X̄
(2) b0 = ΣY_i/n − X̄·Σw_i·Y_i
(3) b0 = Σ(1/n − w_i·X̄)·Y_i,   where the w_i are the least squares weights
Some Important Results About w_i
• We assume that the X_i values are fixed constants in repeated sampling, and hence the w_i are fixed constants in repeated sampling.

1) Σw_i = 0,   where w_i = x_i / Σx_i²,   since Σw_i = Σx_i / Σx_i² = Σ(X_i − X̄) / Σx_i² = 0

2) Σw_i² = 1 / Σx_i²,   since Σw_i² = Σ(x_i / Σx_i²)² = Σx_i² / (Σx_i²)² = 1 / Σx_i²

3) Σw_i·x_i = 1,   since Σw_i·x_i = Σx_i² / Σx_i² = 1   (and likewise Σw_i·X_i = 1)
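The three results above are easy to check numerically. A small sketch, assuming an arbitrary illustrative set of X values:

```python
# Numerical check of the results about the least-squares weights w_i = x_i / sum(x_i^2).
import numpy as np

X = np.array([1.0, 3.0, 4.0, 6.0, 8.0])   # assumed illustrative design
x = X - X.mean()
w = x / np.sum(x**2)

print(np.isclose(w.sum(), 0.0))                      # 1) sum(w_i) = 0
print(np.isclose(np.sum(w**2), 1.0 / np.sum(x**2)))  # 2) sum(w_i^2) = 1 / sum(x_i^2)
print(np.isclose(np.sum(w * x), 1.0))                # 3) sum(w_i * x_i) = 1
print(np.isclose(np.sum(w * X), 1.0))                #    and sum(w_i * X_i) = 1
```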
Mean, Variance, Covariance and Correlation of the OLS Estimators of β0 & β1
MEAN of b1
The least squares estimates are functions of the sample data; since the data change from sample to sample, the estimates will change from one sample to another.

b1 = Σw_i·Y_i = Σw_i·(β0 + β1·X_i + u_i)
   = β0·Σw_i + β1·Σw_i·X_i + Σw_i·u_i
b1 = β1 + Σw_i·u_i        since Σw_i = 0 and Σw_i·X_i = 1

We know that
b1 = β1 + Σw_i·u_i
E(b1) = β1 + Σw_i·E(u_i)        since the w_i are constants
E(b1) = β1                      as E(u_i) = 0
Variance of b1
Var(b1) = σ²_b1 = E[b1 − E(b1)]² = E[b1 − β1]² = E[Σw_i·u_i]²
        = E[Σw_i²·u_i² + 2·ΣΣ_{i<j} w_i·w_j·u_i·u_j]
        = Σw_i²·E(u_i²) + 2·ΣΣ_{i<j} w_i·w_j·E(u_i·u_j)
        = σ_u²·Σw_i² + 0        since E(u_i·u_j) = 0, E(u_i²) = σ_u², and Σw_i² = 1/Σx_i²

Var(b1) = σ_u² / Σx_i²

The variance of b1 is inversely proportional to Σx_i²: the larger the variation in the X values, the smaller the variance of b1, and hence the greater the precision with which β1 can be estimated.
MEAN of b0
b0 = Σ(1/n − X̄·w_i)·Y_i
   = Σ(1/n − X̄·w_i)·(β0 + β1·X_i + u_i)
   = Σ[β0/n + β1·X_i/n + u_i/n − X̄·w_i·β0 − X̄·w_i·β1·X_i − X̄·w_i·u_i]
   = β0 + β1·X̄ + (1/n)·Σu_i − X̄·β0·Σw_i − X̄·β1·Σw_i·X_i − X̄·Σw_i·u_i
   = β0 + β1·X̄ + (1/n)·Σu_i − X̄·β1 − X̄·Σw_i·u_i        since Σw_i = 0 and Σw_i·X_i = 1
   = β0 + Σ(1/n − X̄·w_i)·u_i

E(b0) = β0 + Σ(1/n − X̄·w_i)·E(u_i) = β0        since E(u_i) = 0
Variance of b0
Var(b0) = E[b0 − β0]²,   where b0 − β0 = Σ(1/n − X̄·w_i)·u_i
        = E[Σ(1/n − X̄·w_i)·u_i]²

Put λ_i = 1/n − X̄·w_i:
        = E[Σλ_i²·u_i² + 2·ΣΣ_{i<j} λ_i·λ_j·u_i·u_j]
        = Σλ_i²·E(u_i²) + 2·ΣΣ_{i<j} λ_i·λ_j·E(u_i·u_j)
        = σ_u²·Σλ_i²
        = σ_u²·Σ(1/n − X̄·w_i)²
        = σ_u²·[Σ(1/n²) − 2·(X̄/n)·Σw_i + X̄²·Σw_i²]
        = σ_u²·[1/n + X̄²/Σx²]        since Σw_i = 0 and Σw_i² = 1/Σx²

Var(b0) = σ_u²·[1/n + X̄²/Σx²]
Covariance of (b0, b1)
Cov(b0, b1) = E[(b0 − E(b0))·(b1 − E(b1))]
where b0 − β0 = Ȳ − b1·X̄ − β0 = β0 + β1·X̄ + ū − b1·X̄ − β0 = ū − (b1 − β1)·X̄

Cov(b0, b1) = E[{ū − X̄·(b1 − β1)}·(b1 − β1)] = E[ū·(b1 − β1)] − X̄·E(b1 − β1)²

E[ū·(b1 − β1)] = E[(1/n)·Σu_i · Σw_i·u_i] = (1/n)·[Σw_i·E(u_i²) + 2·ΣΣ_{i<j} w_i·w_j·E(u_i·u_j)] = (σ_u²/n)·Σw_i = 0

So Cov(b0, b1) = −X̄·Var(b1) = −X̄·σ_u² / Σx²

The sign of the covariance between b0 and b1 depends on the sign of the mean of the X-values. If the mean of the X-values is positive, the covariance will be negative: if the slope coefficient is overestimated, the intercept coefficient will be underestimated.
Correlation between (b0, b1)
Correlation(b0, b1) = Cov(b0, b1) / √[Var(b0)·Var(b1)]
where Cov(b0, b1) = −X̄·σ_u²/Σx²,   Var(b1) = σ_u²/Σx²,   Var(b0) = σ_u²·(1/n + X̄²/Σx²)

Correlation(b0, b1) = [−X̄·σ_u²/Σx²] / √[(σ_u²/Σx²)·σ_u²·(1/n + X̄²/Σx²)]
                    = −X̄ / √[Σx²·(1/n + X̄²/Σx²)]
                    = −X̄ / √[Σx²/n + X̄²]

Using Σx² = ΣX² − n·X̄², we have Σx²/n + X̄² = ΣX²/n, so

ρ(b0, b1) = −X̄ / √(ΣX²/n) = −√n·X̄ / √(ΣX²)
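As a sketch of how these formulas behave numerically, the snippet below plugs an assumed design (X values) and an assumed known σ_u² into the expressions for Var(b1), Var(b0), Cov(b0, b1) and their correlation, and cross-checks the simplified form −X̄/√(ΣX²/n):

```python
# Numerical sketch of Var(b1), Var(b0), Cov(b0, b1) and Corr(b0, b1) from the formulas above.
import numpy as np

X = np.array([2.0, 4.0, 5.0, 7.0, 9.0, 12.0])  # assumed design, for illustration only
sigma_u2 = 1.0                                  # sigma_u^2 assumed known here
n = X.size
x = X - X.mean()
Sxx = np.sum(x**2)

var_b1 = sigma_u2 / Sxx
var_b0 = sigma_u2 * (1.0 / n + X.mean()**2 / Sxx)
cov_b0_b1 = -X.mean() * sigma_u2 / Sxx
corr = cov_b0_b1 / np.sqrt(var_b0 * var_b1)

print(var_b1, var_b0, cov_b0_b1, corr)
print(-X.mean() / np.sqrt(np.sum(X**2) / n))    # same correlation via the simplified form
```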
Estimating The Variance Of The Disturbance Term
• Let e_i be the estimate of u_i made from the sample data.

e_i = Y_i − Ŷ_i
    = β0 + β1·X_i + u_i − b0 − b1·X_i        where Y_i = β0 + β1·X_i + u_i
    = u_i − (b0 − β0) − (b1 − β1)·X_i        and, since Ȳ = β0 + β1·X̄ + ū and b0 = Ȳ − b1·X̄,
                                             b0 − β0 = ū − (b1 − β1)·X̄
    = (u_i − ū) + (b1 − β1)·X̄ − (b1 − β1)·X_i
    = (u_i − ū) − (b1 − β1)·(X_i − X̄)
e_i = (u_i − ū) − (b1 − β1)·x_i

Now taking the sum after squaring both sides:
Σe_i² = Σ[(u_i − ū) − (b1 − β1)·x_i]²
Σe_i² = Σ(u_i − ū)² + (b1 − β1)²·Σx_i² − 2·(b1 − β1)·Σ(u_i − ū)·x_i

E(Σe_i²) = E[Σ(u_i − ū)²] + E[(b1 − β1)²·Σx_i²] − E[2·(b1 − β1)·Σ(u_i − ū)·x_i]        ...(1)

Consider
A = E[Σ(u_i − ū)²],   B = E[(b1 − β1)²·Σx_i²],   C = E[2·(b1 − β1)·Σ(u_i − ū)·x_i]

First consider A:
A = E[Σ(u_i − ū)²] = E[Σu_i² − (Σu_i)²/n] = ΣE(u_i²) − (1/n)·E(Σu_i)²
  = n·σ_u² − (1/n)·E[Σu_i² + 2·ΣΣ_{i<j} u_i·u_j]
  = n·σ_u² − (1/n)·[ΣE(u_i²) + 2·ΣΣ_{i<j} E(u_i·u_j)]
  = n·σ_u² − (1/n)·n·σ_u²        since E(u_i·u_j) = 0 and E(u_i²) = σ_u²
  = (n − 1)·σ_u²

Now consider B:
B = E[(b1 − β1)²·Σx_i²] = Σx_i²·E(b1 − β1)² = Σx_i²·σ_u²/Σx_i² = σ_u²

Now consider C:
C = E[2·(b1 − β1)·Σ(u_i − ū)·x_i]
  = 2·E[(b1 − β1)·(Σx_i·u_i − ū·Σx_i)]        where b1 − β1 = Σw_i·u_i and w_i = x_i/Σx_i²
  = 2·E[(Σw_i·u_i)·(Σx_i·u_i)]                since Σx_i = 0
  = 2·E[(Σx_i·u_i)² / Σx_i²]
  = (2/Σx_i²)·E[Σx_i²·u_i² + 2·ΣΣ_{i<j} x_i·x_j·u_i·u_j]
  = (2/Σx_i²)·[Σx_i²·E(u_i²) + 2·ΣΣ_{i<j} x_i·x_j·E(u_i·u_j)]        with E(u_i·u_j) = 0, E(u_i²) = σ_u²
  = (2/Σx_i²)·σ_u²·Σx_i² = 2·σ_u²

Putting these values in (1):
E(Σe_i²) = (n − 1)·σ_u² + σ_u² − 2·σ_u² = (n − 2)·σ_u²
E[Σe_i² / (n − 2)] = σ_u²
E(σ̂_u²) = σ_u²

So an unbiased estimate of σ_u² is Σe_i² / (n − 2), where e_i = Y_i − Ŷ_i.
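A minimal sketch of this estimator on made-up data (the true σ_u is assumed only so the output has a known target to land near):

```python
# Residuals and the unbiased variance estimate sigma_hat^2 = sum(e_i^2) / (n - 2).
import numpy as np

rng = np.random.default_rng(1)
X = np.arange(1.0, 31.0)
Y = 1.0 + 0.8 * X + rng.normal(0, 2.0, size=X.size)  # assumed true sigma_u = 2

x = X - X.mean()
b1 = np.sum(x * Y) / np.sum(x**2)
b0 = Y.mean() - b1 * X.mean()
e = Y - (b0 + b1 * X)                  # residuals e_i = Y_i - Yhat_i
sigma2_hat = np.sum(e**2) / (X.size - 2)
print(sigma2_hat)                      # should land in the neighbourhood of sigma_u^2 = 4
```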
Properties of Residuals e
1) Σe_i = 0
Proof:
Σe_i = Σ(Y_i − Ŷ_i) = ΣY_i − ΣŶ_i        where e_i = Y_i − Ŷ_i and ΣŶ_i = Σ(b0 + b1·X_i)
     = ΣY_i − [n·b0 + b1·ΣX_i] = ΣY_i − ΣY_i        by the first normal equation
Σe_i = 0,   so ΣY_i = ΣŶ_i and the mean of the fitted values equals Ȳ.

2) Σe_i·X_i = 0
Proof:
Σe_i·X_i = Σ(Y_i − Ŷ_i)·X_i = ΣX_i·Y_i − ΣX_i·Ŷ_i
         = ΣX_i·Y_i − ΣX_i·(b0 + b1·X_i)
         = ΣX_i·Y_i − [b0·ΣX_i + b1·ΣX_i²]
         = ΣX_i·Y_i − ΣX_i·Y_i = 0        by the second normal equation

3) Σe_i·Ŷ_i = 0
Proof:
Σe_i·Ŷ_i = Σe_i·(b0 + b1·X_i) = b0·Σe_i + b1·Σe_i·X_i = 0
using Σe_i = 0 (first property) and Σe_i·X_i = 0 (second property).
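These three properties hold exactly for any least squares fit (with an intercept) and can be verified numerically; a small sketch with illustrative data:

```python
# Numerical check of the residual properties: sum(e) = 0, sum(e*X) = 0, sum(e*Yhat) = 0.
import numpy as np

rng = np.random.default_rng(2)
X = np.linspace(-5, 5, 25)
Y = 3.0 - 1.2 * X + rng.normal(size=X.size)   # illustrative data only

x = X - X.mean()
b1 = np.sum(x * Y) / np.sum(x**2)
b0 = Y.mean() - b1 * X.mean()
Yhat = b0 + b1 * X
e = Y - Yhat

print(np.isclose(e.sum(), 0.0))           # property 1
print(np.isclose(np.sum(e * X), 0.0))     # property 2
print(np.isclose(np.sum(e * Yhat), 0.0))  # property 3
```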
Properties of Residuals e
4) The variance of e_i is not constant, i.e. Var(e_i) ≠ constant, even though Var(u_i) is constant.
5) The estimated residuals are autocorrelated, i.e. Cov(e_i, e_j) ≠ 0, even though the u_i's are not autocorrelated.
6) Cov(b1, e) = 0
COV(b1, e)
Cov(b1, e_i) = E[(b1 − E(b1))·(e_i − E(e_i))]
             = E[(b1 − β1)·{(u_i − ū) − (b1 − β1)·x_i}]        since e_i = (u_i − ū) − (b1 − β1)·x_i
             = E[(b1 − β1)·(u_i − ū)] − x_i·E(b1 − β1)²
             = E[(Σw_j·u_j)·u_i] − E[(1/n)·Σu_j · Σw_j·u_j] − x_i·σ_u²/Σx_i²
             = E[u_i·(w_1·u_1 + w_2·u_2 + ... + w_i·u_i + ... + w_n·u_n)] − (1/n)·[Σw_j·E(u_j²) + 2·ΣΣ_{j<k} w_j·w_k·E(u_j·u_k)] − w_i·σ_u²
             = w_i·E(u_i²) − (σ_u²/n)·Σw_j − w_i·σ_u²
             = w_i·σ_u² − 0 − w_i·σ_u²        since Σw_j = 0
             = 0
Estimation Of Parameters Of Simple Linear Regression By MLE
Consider the population regression model
Y_i = β0 + β1·X_i + u_i,   where u_i ~ N(0, σ_u²)

The likelihood function is
L(β0, β1, σ_u²) = Π_{i=1..n} [1/√(2π·σ_u²)]·exp[−(u_i − 0)²/(2σ_u²)]
                = (2π·σ_u²)^(−n/2)·exp[−(1/(2σ_u²))·Σu_i²]
                = (2π·σ_u²)^(−n/2)·exp[−(1/(2σ_u²))·Σ(Y_i − β0 − β1·X_i)²]

Taking logs on both sides (L and ln L are both monotonically increasing functions, so maximizing L is equivalent to maximizing ln L):

ln L(β0, β1, σ_u²) = −(n/2)·ln(2π) − (n/2)·ln(σ_u²) − (1/(2σ_u²))·Σ(Y_i − β0 − β1·X_i)²
ln L(β0, β1, σ_u²) = −(n/2)·ln(2π) − (n/2)·ln(σ_u²) − (1/(2σ_u²))·Σ(Y_i − β0 − β1·X_i)²

Now differentiate with respect to β0, β1 and σ_u² and equate to zero:

∂lnL/∂β0 = (1/σ_u²)·Σ(Y_i − β0 − β1·X_i)·(1) = 0
   ⇒ ΣY_i = n·β̂0 + β̂1·ΣX_i                    ...(1)
∂lnL/∂β1 = (1/σ_u²)·Σ(Y_i − β0 − β1·X_i)·(X_i) = 0
   ⇒ ΣX_i·Y_i = β̂0·ΣX_i + β̂1·ΣX_i²            ...(2)
∂lnL/∂σ_u² = −n/(2σ_u²) + (1/(2(σ_u²)²))·Σ(Y_i − β0 − β1·X_i)² = 0        ...(3)

From eq. (1) and eq. (2):
β̂0 = Ȳ − β̂1·X̄,   β̂1 = Σxy / Σx²
so the MLE and OLS estimators of β0 and β1 are the same.

From eq. (3):
σ̂_u² = (1/n)·Σ(Y_i − β̂0 − β̂1·X_i)² = Σe_i² / n
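A sketch of this equivalence, assuming SciPy is available: maximizing the log-likelihood numerically reproduces the OLS intercept and slope, while the MLE variance estimate divides the residual sum of squares by n rather than n − 2.

```python
# Numerical MLE for the SLR model, compared against the OLS closed form (illustrative data).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
X = np.linspace(0, 10, 40)
Y = 1.5 + 0.7 * X + rng.normal(0, 1.0, size=X.size)
n = X.size

def neg_log_lik(params):
    b0, b1, log_s2 = params          # sigma_u^2 parametrized on the log scale to stay positive
    s2 = np.exp(log_s2)
    resid = Y - b0 - b1 * X
    return 0.5 * n * np.log(2 * np.pi * s2) + 0.5 * np.sum(resid**2) / s2

res = minimize(neg_log_lik, x0=np.array([Y.mean(), 0.0, 0.0]),
               method="Nelder-Mead", options={"maxiter": 10000, "maxfev": 10000})
b0_ml, b1_ml, s2_ml = res.x[0], res.x[1], np.exp(res.x[2])

# OLS closed form for comparison
x = X - X.mean()
b1_ols = np.sum(x * Y) / np.sum(x**2)
b0_ols = Y.mean() - b1_ols * X.mean()
rss = np.sum((Y - b0_ols - b1_ols * X) ** 2)
print(b0_ml, b1_ml, s2_ml)
print(b0_ols, b1_ols, rss / n)       # MLE variance uses n, not n - 2
```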
MLE of the variance of the disturbance term is a biased estimator
Prove that σ̂_u² = Σe_i²/n is a biased estimator of σ_u², i.e. E(σ̂_u²) ≠ σ_u².
Proof:
As E(Σe_i²) = (n − 2)·σ_u², dividing both sides by n:
E(Σe_i²/n) = [(n − 2)/n]·σ_u²
E(σ̂_u²) = σ_u² − (2/n)·σ_u² ≠ σ_u²

So the MLE estimator of σ_u² is a biased estimator, with bias −(2/n)·σ_u², but the bias decreases as n increases.
Note: lim_{n→∞} E(σ̂_u²) = σ_u²
That is, σ̂_u² is an asymptotically unbiased estimator of σ_u².
Gauss-Markov Theorem
Under the assumptions of the simple linear regression model, the least squares estimators b0 and b1 are linear, unbiased, and have minimum variance among all linear unbiased estimators (BLUE).
Linearity
b1 is a linear estimator of β1:
b1 = Σw_i·Y_i = w_1·Y_1 + w_2·Y_2 + ... + w_n·Y_n
so b1 can be written as a linear combination of the random observations Y_i.
For example, the sample mean is a linear estimator:
X̄ = a_1·X_1 + a_2·X_2 + ... + a_n·X_n,   where the a_i are constants,
X̄ = (1/n)·X_1 + (1/n)·X_2 + ... + (1/n)·X_n
Unbiasedness
E(b1) = β1
i.e. the mean of the slope estimator over all possible samples is equal to the population slope coefficient.
Minimum variance
The least squares estimator possesses the smallest sampling variance within the class of linear unbiased estimators of β1. (Nonlinear or biased estimators obtained from other methods may have a smaller variance.)

Nested classes of estimators: all estimators ⊃ linear estimators ⊃ linear unbiased estimators ⊃ OLS.
Minimization of the variance of b1
We want to prove that any other linear unbiased estimator of the parameter, say b1* (linear, unbiased), obtained from any other method, has a larger variance than the least squares estimator.

Let b1* be another linear, unbiased estimator of β1 defined by b1* = ΣC_i·Y_i.
b1* = ΣC_i·(β0 + β1·X_i + u_i) = β0·ΣC_i + β1·ΣC_i·X_i + ΣC_i·u_i

Taking expectations on both sides:
E(b1*) = β0·ΣC_i + β1·ΣC_i·X_i + ΣC_i·E(u_i)        with E(u_i) = 0
E(b1*) = β0·ΣC_i + β1·ΣC_i·X_i
So b1* will be an unbiased estimator iff ΣC_i = 0 and ΣC_i·X_i = 1.

Now, the variance of b1* under the above restrictions:
Var(b1*) = E[b1* − β1]²,   and since b1* = β0·ΣC_i + β1·ΣC_i·X_i + ΣC_i·u_i, we have b1* − β1 = ΣC_i·u_i
Var(b1*) = E[ΣC_i·u_i]²
         = E[ΣC_i²·u_i² + 2·ΣΣ_{i<j} C_i·C_j·u_i·u_j]
         = ΣC_i²·E(u_i²) + 2·ΣΣ_{i<j} C_i·C_j·E(u_i·u_j)        with E(u_i·u_j) = 0
         = σ_u²·ΣC_i²

So minimizing Var(b1*) amounts to minimizing ΣC_i², subject to ΣC_i = 0 and ΣC_i·X_i = 1.

Consider the Lagrange function
Z = ΣC_i² − 2λ1·ΣC_i − 2λ2·(ΣC_i·X_i − 1)

∂Z/∂C_i = 2C_i − 2λ1 − 2λ2·X_i
∂Z/∂λ1 = −2·ΣC_i
∂Z/∂λ2 = −2·(ΣC_i·X_i − 1)

Putting ∂Z/∂C_i = ∂Z/∂λ1 = ∂Z/∂λ2 = 0, we get
C_i = λ1 + λ2·X_i        ...(1)
ΣC_i = 0                 ...(2)
ΣC_i·X_i = 1             ...(3)

Taking the sum of equation (1):
ΣC_i = n·λ1 + λ2·ΣX_i
0 = n·λ1 + λ2·ΣX_i        from equation (2)
λ1 = −λ2·X̄.   Put into equation (1):
C_i = λ2·(X_i − X̄) = λ2·x_i        ...(4)

Multiply equation (4) by X_i and take the sum on both sides:
ΣC_i·X_i = λ2·Σx_i·X_i
1 = λ2·Σx_i·X_i        from equation (3)
λ2 = 1/Σx_i·X_i = 1/Σ(X_i − X̄)·X_i = 1/(ΣX_i² − n·X̄²) = 1/Σx_i²

Put into equation (4):
C_i = x_i / Σx_i² = w_i

So the variance of b1* is minimized within the class of linear unbiased estimators when C_i = w_i, which is the least squares estimator.
Sampling Distribution of OLS estimators
• Since the least squares estimators of the regression parameters are random variables, they have a sampling distribution (mean, variance and shape).

Sampling Distribution of b1
We know that
b1 = Σw_i·Y_i
so the shape of the distribution of b1 depends on that of Y_i, where Y_i = β0 + β1·X_i + u_i.
Now, first we find the distribution of Y_i.
Distribution of the dependent variable
Y_i = β0 + β1·X_i + u_i
MEAN:      E(Y_i) = E[β0 + β1·X_i + u_i] = β0 + β1·X_i
VARIANCE:  Var(Y_i) = Var[β0 + β1·X_i + u_i] = 0 + 0 + Var(u_i) = σ_u²
SHAPE OF Y_i:
Since β0 and β1 are constants, and the values of the explanatory variable are fixed in repeated sampling, the shape of the distribution of Y_i is determined by the shape of the distribution of u_i, which is normal by assumption.
Y_i ~ N(β0 + β1·X_i, σ_u²)
i.e. the distributions of u_i and Y_i are identical except that their means differ. In fact, the distribution of u_i is just the distribution of Y_i translated to a zero mean.
Sampling distribution of b1
Since b1 is a linear combination of the observations Y_i, which are normally distributed, b1 is also normally distributed, with mean β1 and variance σ²_b1, i.e.

b1 ~ N(β1, σ_u²/Σx_i²)

Theorem: a linear combination of normally distributed random variables is also normally distributed.
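A Monte Carlo sketch (all parameters are assumed, purely for illustration) of this sampling distribution: with X held fixed across repeated samples, the simulated b1 values have mean close to β1 and variance close to σ_u²/Σx_i².

```python
# Simulated sampling distribution of b1 under repeated sampling with fixed X.
import numpy as np

rng = np.random.default_rng(4)
beta0, beta1, sigma_u = 2.0, 0.5, 1.5          # assumed population values
X = np.linspace(0, 20, 30)                     # X fixed in repeated sampling
x = X - X.mean()

b1_draws = []
for _ in range(5000):
    Y = beta0 + beta1 * X + rng.normal(0, sigma_u, size=X.size)
    b1_draws.append(np.sum(x * Y) / np.sum(x**2))
b1_draws = np.array(b1_draws)

print(b1_draws.mean(), beta1)                          # E(b1) = beta1
print(b1_draws.var(), sigma_u**2 / np.sum(x**2))       # Var(b1) = sigma_u^2 / sum(x_i^2)
```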
Test of significance of β1
The Z-test is applicable if
i) the variance of Y, i.e. σ_u², is known, or
ii) the variance of Y is unknown but the sample is sufficiently large (n > 30).

Case I: when the population variance of Y is known,
Z = (b1 − β1) / √(σ_u²/Σx²)

In this formula there is one random variable, b1, which is normally distributed; therefore Z is also a random variable and is normally distributed, Z ~ N(0, 1).
We can prove E(Z) = 0 and Var(Z) = 1:

E(Z) = E[(b1 − β1)/√(σ_u²/Σx²)] = [E(b1) − β1]/√(σ_u²/Σx²) = (β1 − β1)/√(σ_u²/Σx²) = 0

Var(Z) = E[Z − E(Z)]² = E(b1 − β1)² / (σ_u²/Σx²) = (σ_u²/Σx²) / (σ_u²/Σx²) = 1
Test of significance of b1
If the population variance of Y is unknown and the sample is small, we use the t-test to test β1.
We know that
Z = (b1 − β1)/√(σ_u²/Σx²)   and   Σe_i²/σ_u² = Σ(Y_i − Ŷ_i)²/σ_u² ~ χ²(n − 2)

First of all, we show why the quantity Σe_i²/σ_u² has (n − 2) degrees of freedom.
If E(Y) were known, Σ(Y_i − E(Y_i))²/σ_u² would be a χ² variable with n degrees of freedom. But E(Y) = β0 + β1·X is unknown, so we use Ŷ = b0 + b1·X in its place; estimating the two parameters costs two degrees of freedom, so
Σ(Y_i − Ŷ_i)²/σ_u² ~ χ²(n − 2)

t random variable:
the ratio of a standard normal variable to the square root of an independent χ² variable divided by its degrees of freedom.
According to the definition,
t = [(b1 − β1)/√(σ_u²/Σx_i²)] / √[(Σe_i²/σ_u²)/(n − 2)]
  = (b1 − β1) / √[Σe_i²/((n − 2)·Σx_i²)]
  = (b1 − β1) / √(σ̂_u²/Σx_i²)

Note: since Cov(b1, e_i) = 0, the numerator and denominator are independent.
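A sketch of the resulting t-test for H0: β1 = 0 on made-up data, assuming SciPy is available for the p-value:

```python
# t-test for H0: beta1 = 0 in simple linear regression.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
X = np.linspace(1, 15, 20)
Y = 4.0 + 0.3 * X + rng.normal(0, 1.0, size=X.size)  # illustrative data only
n = X.size

x = X - X.mean()
b1 = np.sum(x * Y) / np.sum(x**2)
b0 = Y.mean() - b1 * X.mean()
e = Y - b0 - b1 * X
sigma2_hat = np.sum(e**2) / (n - 2)          # unbiased estimate of sigma_u^2

se_b1 = np.sqrt(sigma2_hat / np.sum(x**2))
t_stat = (b1 - 0.0) / se_b1                  # test statistic under H0: beta1 = 0
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)
print(t_stat, p_value)
```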
INTERPRETATIONS OF TESTS FOR SLOPE
If H0: β1 = 0 is not rejected, one of the following is true:

• For a true underlying straight-line model, x provides little or no help in predicting y; that is, ȳ is essentially as good as b0 + b1·x for predicting y.
• The true underlying relationship between x and y is not linear; that is, the true model may involve quadratic, cubic, or other more complex functions of x.
INTERPRETATIONS OF TESTS FOR SLOPE
If H0: β1 = 0 is rejected, the following are true:

• x provides significant information for predicting y; that is, the model b0 + b1·x is far better than the naive model ȳ for predicting y.
• A better model might have, for example, a curvilinear term, although there is a definite linear component.
Partition of total variation in Y into Explained & Unexplained Variation

[Scatter plot with fitted line ŷ_i = b0 + b1·x_i: for an observed point (x_i, y_i), the total deviation y_i − ȳ is split at the fitted point (x_i, ŷ_i) into the unexplained part y_i − ŷ_i and the explained part ŷ_i − ȳ, measured from the point of means (x̄, ȳ).]
Partition of total variation in Y into Explained & Unexplained Variation
Total variation in Y = Σ(Y_i − Ȳ)²
                     = Σ[(Ŷ_i − Ȳ) + (Y_i − Ŷ_i)]²
                     = Σ(Ŷ_i − Ȳ)² + Σ(Y_i − Ŷ_i)² + 2·Σ(Ŷ_i − Ȳ)·(Y_i − Ŷ_i)

Consider the cross-product term:
Σ(Ŷ_i − Ȳ)·(Y_i − Ŷ_i) = ΣŶ_i·(Y_i − Ŷ_i) − Ȳ·Σ(Y_i − Ŷ_i)        and Σ(Y_i − Ŷ_i) = Σe_i = 0
                        = Σ(b0 + b1·X_i)·e_i
                        = b0·Σe_i + b1·ΣX_i·e_i = 0        since Σe_i = 0 and ΣX_i·e_i = 0

So Total variation = Σ(Ŷ_i − Ȳ)² + Σ(Y_i − Ŷ_i)² = Explained S.S + Unexplained S.S

• Variation in y = SSR + SSE
• SSE (Sum of Squares Error) measures the amount of variation in y that remains unexplained (i.e. due to error).
• SSR (Sum of Squares Regression) measures the amount of variation in y explained by variation in the independent variable x.
[Diagram: the total variance of Y to be explained by the predictor (SST) splits into the variance explained by X (SSR) and the variance not explained by X (SSE).]
Relation between SSR & SSE

SST = SSR + SSE
Alternative forms of RegSS & ESS
RegSS = Σ(Ŷ_i − Ȳ)²
Since Ŷ_i = b0 + b1·X_i and Ȳ = b0 + b1·X̄, we have Ŷ_i − Ȳ = b1·(X_i − X̄), so:

(1) RegSS = Σ[b1·(X_i − X̄)]² = b1²·Σx_i²
(2) RegSS = b1²·Σx_i² = b1·Σxy        using b1 = Σxy/Σx_i²
(3) RegSS = R²·Σy²                    since R² = RegSS/TotalSS

ESS = TotalSS − RegSS = Σy² − R²·Σy² = Σy²·(1 − R²)
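A short numerical sketch (illustrative data) of the partition SST = SSR + SSE, the alternative form RegSS = b1²·Σx_i², and R²:

```python
# Sum-of-squares partition and R^2 for a simple linear regression fit.
import numpy as np

rng = np.random.default_rng(6)
X = np.linspace(0, 8, 25)
Y = 1.0 + 0.9 * X + rng.normal(0, 1.0, size=X.size)  # illustrative data only

x = X - X.mean()
b1 = np.sum(x * Y) / np.sum(x**2)
b0 = Y.mean() - b1 * X.mean()
Yhat = b0 + b1 * X

SST = np.sum((Y - Y.mean())**2)
SSR = np.sum((Yhat - Y.mean())**2)
SSE = np.sum((Y - Yhat)**2)

print(np.isclose(SST, SSR + SSE))             # the partition holds
print(np.isclose(SSR, b1**2 * np.sum(x**2)))  # alternative form of RegSS
print(SSR / SST)                              # R^2
```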
R-Squared
• The R-squared statistic, also called the coefficient of determination, is the percentage of response variation explained by the explanatory variable:

R² = (Total sum of squares − Residual sum of squares) / Total sum of squares

• It is a unitless measure of the strength of the relationship between x and y.
Interpreting R²
• R² takes on values between 0 and 1, with higher R² indicating a stronger linear association.
• If the residuals are all zero (a perfect fit), then R² is 1. If the least squares line has slope 0, R² will be 0.
• R² is useful as a unitless summary of the strength of linear association.
Caveats about R²
– R² is not useful for assessing model adequacy, i.e. whether the simple linear regression model holds (use residual plots), or whether or not there is an association (use the test of H0: β1 = 0 vs. H1: β1 ≠ 0).
– A good R² depends on the context. In precise laboratory work, R² values under 90% might be too low, but in social science contexts, where a single variable rarely explains a great deal of the variation in the response, R² values of 50% may be considered remarkably good.
Prove that the test of significance of regression is equivalent to the test of significance of ρ
H0: β1 = 0, i.e. ρ² = 0
As b1 ~ N(β1, Var(b1)), hence
(b1 − β1)/√Var(b1) ~ N(0, 1)
[b1 − β1]²/Var(b1) ~ χ²(1)                ...(1)
Σe²/σ_u² ~ χ²(n − 2)                      ...(2)

We know that F is the ratio of two independent χ² variables, each divided by its respective degrees of freedom. From (1) and (2), and since Cov(b1, e) = 0 (so the two χ² variables are independent),

F = {[b1 − β1]²/(σ_u²/Σx_i²)} / {Σe²/[σ_u²·(n − 2)]} ~ F(1, n − 2)

Under H0: β1 = 0,
F = b1²·Σx_i² / [Σe²/(n − 2)] ~ F(1, n − 2)
  = RegS.S / [ErrorS.S/(n − 2)]
  = R²·Σy_i² / [(1 − R²)·Σy_i²/(n − 2)]
  = R² / [(1 − R²)/(n − 2)] ~ F(1, n − 2)

which is the test statistic for testing ρ = 0.
Prove that F(1, v) = t²(v) for testing the significance of regression
t = (b1 − β1) / √(σ̂_u²/Σx_i²)
Under H0: β1 = 0,
t² = b1²·Σx_i²/σ̂_u² = b1²·Σx_i² / [Σe²/(n − 2)]
Using RegSS = b1²·Σx_i² = R²·Σy_i² (from b1 = Σxy/Σx_i²) and Σe² = (1 − R²)·Σy_i²:
t² = R²·Σy_i² / [(1 − R²)·Σy_i²/(n − 2)] = R² / [(1 − R²)/(n − 2)] = F
Hence proved.
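A numerical sketch (illustrative data) of both identities: the squared t statistic equals the F statistic, and both can be written in terms of R² as above.

```python
# Check that t^2 equals the R^2-based F statistic for the regression.
import numpy as np

rng = np.random.default_rng(7)
X = np.linspace(0, 5, 15)
Y = 2.0 + 1.1 * X + rng.normal(0, 0.8, size=X.size)  # illustrative data only
n = X.size

x = X - X.mean()
b1 = np.sum(x * Y) / np.sum(x**2)
b0 = Y.mean() - b1 * X.mean()
e = Y - b0 - b1 * X
sigma2_hat = np.sum(e**2) / (n - 2)

t_stat = b1 / np.sqrt(sigma2_hat / np.sum(x**2))     # t under H0: beta1 = 0
R2 = 1.0 - np.sum(e**2) / np.sum((Y - Y.mean())**2)
F_from_R2 = R2 / ((1.0 - R2) / (n - 2))

print(t_stat**2, F_from_R2)                          # these agree: F(1, n-2) = t^2(n-2)
```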
• Prediction interval and confidence interval
Two intervals can be used to discover how closely the predicted value will match the true value of y.
– Prediction interval: for a particular value of Y.
– Confidence interval: for the expected value of Y.

The prediction interval:
ŷ ± t_{α/2}·s_e·√[1 + 1/n + (x_g − x̄)²/Σ(x_i − x̄)²]

The confidence interval:
ŷ ± t_{α/2}·s_e·√[1/n + (x_g − x̄)²/Σ(x_i − x̄)²]

The prediction interval is wider than the confidence interval.
Let X0 denote the level of X for which we wish to estimate the mean response.
The mean response when X = X0 is denoted by E(Y0) = β0 + β1·X0; its sample estimate is
Ŷ0 = Ê(Y0) = b0 + b1·X0

Properties of Ŷ0
Since Ŷ0 is a random variable, it has its own mean, variance and shape.

(1) Linearity
The estimator Ŷ0 is a linear estimator:
Ŷ0 = b0 + b1·X0 = Σ(1/n − X̄·w_i)·Y_i + X0·Σw_i·Y_i
   = Σ[1/n + (X0 − X̄)·w_i]·Y_i
So Ŷ0 is a linear estimator of E(Y0).

(2) Unbiasedness
Ŷ0 is an unbiased estimator.
Ŷ0 = b0 + b1·X0
Taking expectations on both sides:
E(Ŷ0) = E(b0) + E(b1)·X0 = β0 + β1·X0
E(Ŷ0) = E(Y0)
i.e. Ŷ0 is an unbiased estimator of E(Y0).

(3) Variance
Var(Ŷ0) = E[Ŷ0 − E(Ŷ0)]²
        = E[b0 + b1·X0 − β0 − β1·X0]²
        = Var(b0) + X0²·Var(b1) + 2·X0·Cov(b0, b1)
        = σ_u²·(1/n + X̄²/Σx²) + X0²·σ_u²/Σx² − 2·X0·X̄·σ_u²/Σx²
          using Var(b0) = σ_u²·(1/n + X̄²/Σx²),  Var(b1) = σ_u²/Σx²,  Cov(b0, b1) = −X̄·σ_u²/Σx²
        = σ_u²·[1/n + (X̄² + X0² − 2·X0·X̄)/Σx²]
        = σ_u²·[1/n + (X0 − X̄)²/Σx²]
Prove that the estimator of E(Y0) is BLUE
Define the point estimator Ŷ0 as a linear function of the Y_i:
Ŷ0 = ΣC_i·Y_i        (by the 1st property of Ŷ0)
where the weights C_i are to be chosen so as to make Ŷ0 a BLUE.

With E(Y0) = β0 + β1·X0,
Ŷ0 = ΣC_i·(β0 + β1·X_i + u_i) = β0·ΣC_i + β1·ΣC_i·X_i + ΣC_i·u_i
E(Ŷ0) = β0·ΣC_i + β1·ΣC_i·X_i
which is an unbiased estimator of E(Y0) iff ΣC_i = 1 and ΣC_i·X_i = X0.

Under the above conditions,
Var(Ŷ0) = E[Ŷ0 − E(Ŷ0)]² = E[ΣC_i·u_i]²
        = E[ΣC_i²·u_i² + 2·ΣΣ_{i<j} C_i·C_j·u_i·u_j]
        = ΣC_i²·E(u_i²) + 2·ΣΣ_{i<j} C_i·C_j·E(u_i·u_j)        with E(u_i·u_j) = 0
        = σ_u²·ΣC_i²

Now by Lagrange's multiplier method, minimize ΣC_i² subject to the two constraints:
Z = ΣC_i² − 2λ·(ΣC_i − 1) − 2μ·(ΣC_i·X_i − X0)

Partially differentiate with respect to C_i, λ and μ and equate to zero:
C_i = λ + μ·X_i             ...(A)
ΣC_i = 1                    ...(B)
ΣC_i·X_i = X0               ...(C)

Taking the sum of equation (A):
ΣC_i = n·λ + μ·ΣX_i
1 = n·λ + μ·n·X̄        so λ = 1/n − μ·X̄        ...(4)

Multiply equation (A) by X_i and take the sum:
ΣC_i·X_i = λ·ΣX_i + μ·ΣX_i²
X0 = (1/n − μ·X̄)·n·X̄ + μ·ΣX_i² = X̄ + μ·(ΣX_i² − n·X̄²) = X̄ + μ·Σx_i²
μ = (X0 − X̄)/Σx_i²                             ...(5)
Put (5) in (4):
λ = 1/n − X̄·(X0 − X̄)/Σx_i²                     ...(6)

Putting the values of (5) and (6) in (A):
C_i = 1/n − X̄·(X0 − X̄)/Σx_i² + (X0 − X̄)·X_i/Σx_i²
    = 1/n + (X0 − X̄)·(X_i − X̄)/Σx_i²
    = 1/n + (X0 − X̄)·x_i/Σx_i²

Putting these values of C_i in Ŷ0 = ΣC_i·Y_i:
Ŷ0 = Σ[1/n + (X0 − X̄)·x_i/Σx_i²]·Y_i
   = Ȳ + (X0 − X̄)·Σx_i·Y_i/Σx_i²
   = Ȳ − b1·X̄ + b1·X0        where b1 = Σx_i·Y_i/Σx_i²
Ŷ0 = b0 + b1·X0,   with b0 = Ȳ − b1·X̄

Now, the variance of Ŷ0 is given by
Var(Ŷ0) = σ_u²·ΣC_i²
        = σ_u²·Σ[1/n + (X0 − X̄)·x_i/Σx_i²]²
        = σ_u²·[Σ(1/n²) + (X0 − X̄)²·Σx_i²/(Σx_i²)² + 2·(X0 − X̄)·Σx_i/(n·Σx_i²)]
        = σ_u²·[1/n + (X0 − X̄)²/Σx_i²]        since Σx_i = 0
Hence proved.
Prediction of a single value of Y
An important application of the regression model is the prediction of Y corresponding to a specified level of the regressor variable X. Suppose the given value of the explanatory variable is X0, so that our task is to predict the value of Y, say Y0.
Y0 is a random variable whose values are scattered around the point on the population regression line corresponding to X0.
If we knew the population parameters, our predictor of Y0 would be its mean, E(Y0) = β0 + β1·X0, which defines a point on the population line. This is the best predictor of Y0 in the sense that the variance of Y0 around E(Y0) is smaller than around any other point.
In reality E(Y0) is not known and has to be estimated; the estimator is the corresponding point on the sample regression line, Ŷ0 = b0 + b1·X0.
The actual value of Y0 will differ from the predicted value because:
1- The value of Y0 will not be equal to E(Y0).
2- The sample regression line will not be the same as the population regression line, because of sampling error.
The difference between the actual value and the predicted value is called the forecast error, i.e. e0 = Y0 − Ŷ0.
e0 = Y0 − Ŷ0 = β0 + β1·X0 + u0 − b0 − b1·X0
e0 = u0 − (b0 − β0) − (b1 − β1)·X0
e0 is normally distributed with mean zero:
E(e0) = E(Y0 − Ŷ0) = E(u0) − E(b0 − β0) − X0·E(b1 − β1) = 0
Now, Var(e0) = E[e0 − E(e0)]² = E(e0²)
Var(e0) = E[u0 − {(b0 − β0) + X0·(b1 − β1)}]²        where e0 = u0 − {(b0 − β0) + X0·(b1 − β1)}
        = E(u0²) + E[(b0 − β0) + X0·(b1 − β1)]² − 2·E[u0·{(b0 − β0) + X0·(b1 − β1)}]
        = E(u0²) + E(b0 − β0)² + X0²·E(b1 − β1)² + 2·X0·E[(b0 − β0)·(b1 − β1)] − 0
          (the cross term with u0 is zero because u0 belongs to the new observation and is independent of the sample disturbances that determine b0 and b1)
        = σ_u² + Var(b0) + X0²·Var(b1) + 2·X0·Cov(b0, b1)
        = σ_u² + σ_u²·(1/n + X̄²/Σx²) + X0²·σ_u²/Σx² − 2·X̄·X0·σ_u²/Σx²
        = σ_u²·[1 + 1/n + (X̄² + X0² − 2·X̄·X0)/Σx²]
        = σ_u²·[1 + 1/n + (X0 − X̄)²/Σx²]
Since e0 is a linear combination of normal variables, it is also normally distributed, with mean zero and variance σ_u²·[1 + 1/n + (X0 − X̄)²/Σx²]:

e0 ~ N(0, σ_u²·[1 + 1/n + (X0 − X̄)²/Σx²])

The prediction interval is based on
t = [(Y0 − Ŷ0) − 0] / S.E.(Y0 − Ŷ0) = (Y0 − Ŷ0) / √(σ̂_u²·[1 + 1/n + (X0 − X̄)²/Σx²])

giving
Ŷ0 ± t_{α/2}·√(σ̂_u²·[1 + 1/n + (X0 − X̄)²/Σx²]),   with d.f. = n − 2
What's the Difference?
Prediction interval:
Ŷ0 ± t_{(α/2, v)}·√(σ̂_u²·[1 + 1/n + (X0 − X̄)²/Σx_i²])
Confidence interval:
Ŷ0 ± t_{(α/2, v)}·√(σ̂_u²·[1/n + (X0 − X̄)²/Σx_i²])

The prediction interval contains the extra "1 +" under the square root; the confidence interval does not.

The prediction interval is used to estimate one value of y (at a given x); the confidence interval is used to estimate the mean value of y (at a given x).

The confidence interval estimate of the expected value of y will be narrower than the prediction interval for the same given value of x and confidence level. This is because there is less error in estimating a mean value than in predicting an individual value.
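A closing sketch (illustrative data, SciPy assumed for the t quantile, and X0 chosen arbitrarily) that computes both intervals and shows the prediction interval is the wider one:

```python
# 95% confidence interval for E(Y0) and 95% prediction interval for a single Y0 at X0.
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
X = np.linspace(0, 10, 30)
Y = 5.0 + 0.6 * X + rng.normal(0, 1.2, size=X.size)  # illustrative data only
n = X.size

x = X - X.mean()
b1 = np.sum(x * Y) / np.sum(x**2)
b0 = Y.mean() - b1 * X.mean()
sigma2_hat = np.sum((Y - b0 - b1 * X) ** 2) / (n - 2)

X0 = 7.5                                    # level of X at which we predict (assumed)
Y0_hat = b0 + b1 * X0
t_crit = stats.t.ppf(0.975, df=n - 2)       # two-sided 95% intervals, d.f. = n - 2

leverage = 1.0 / n + (X0 - X.mean()) ** 2 / np.sum(x**2)
ci = Y0_hat + np.array([-1, 1]) * t_crit * np.sqrt(sigma2_hat * leverage)          # mean response
pi = Y0_hat + np.array([-1, 1]) * t_crit * np.sqrt(sigma2_hat * (1.0 + leverage))  # single value
print(ci, pi)                               # the prediction interval is wider
```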
