
SIMPLE LINEAR REGRESSION MODEL

The expected value of the distribution of Y given X_i is functionally related to X_i:

E(Y \mid X_i) = \beta_1 + \beta_2 X_i : the linear population regression function.

1 and 2 are unknown but fixed parameters known as the regression


coefficients. Ther are also known as the intercept and slope coefficients
respectively.

LINEAR REGRESSION MODELS

                                Model linear in variables
                                Yes          No
Model linear in     Yes         LRM          LRM
parameters          No          NLRM         NLRM

NLRM: models that are nonlinear in the parameters, for example exponential forms such as Y_i = \beta_1 e^{\beta_2 X_i} + \epsilon_i. (Quadratic and cubic functions of X_i are nonlinear in the variables but, being linear in the parameters, are still LRMs.)

We may express the deviation of an individual Y_i around its expected value as follows:

\epsilon_i = Y_i - E(Y \mid X_i)
Y_i = E(Y \mid X_i) + \epsilon_i

\epsilon_i is the stochastic disturbance or the stochastic error term. It is an unobservable random variable taking positive or negative values.
If E(Y \mid X_i) is assumed to be linear in X_i,

Y_i = E(Y \mid X_i) + \epsilon_i = \beta_1 + \beta_2 X_i + \epsilon_i
Taking expected value on both sides,

E(Y_i \mid X_i) = E[E(Y \mid X_i)] + E(\epsilon_i \mid X_i)
              = E(Y \mid X_i) + E(\epsilon_i \mid X_i)

Here E(Y \mid X_i), once the value of X_i is fixed, is a constant; we have made use of the fact that the expected value of a constant is that constant itself.

Since E(Y_i \mid X_i) = E(Y \mid X_i), the above equation implies E(\epsilon_i \mid X_i) = 0.

THE SAMPLE REGRESSION FUNCTION

The sample counterpart of the population regression function may be written as:

\hat{Y}_i = \hat\beta_1 + \hat\beta_2 X_i

\hat{Y}_i: estimator of E(Y \mid X_i)
\hat\beta_1: estimator of \beta_1
\hat\beta_2: estimator of \beta_2
Note: An estimator, also known as a (sample) statistic, is simply a rule, a formula,
or method that tells how to estimate the population parameter from the information
provided by the sample at hand.

In terms of the sample regression function, the observed Yi can be expressed as:

Y_i = \hat{Y}_i + \hat\epsilon_i
And in terms of the population regression function, it can be expressed as:

Y_i = E(Y \mid X_i) + \epsilon_i

The OLS method presupposes fulfillment of the following set of assumptions:

1. Zero mean of disturbances (\epsilon_i): that is, E(\epsilon_i) = 0 for all i.


2. Homoskedasticity or constant variance of \epsilon_i: Var(\epsilon_i) = E(\epsilon_i^2) = \sigma^2 (a constant) for all i.

3. Serial independence of \epsilon_i: Cov(\epsilon_i, \epsilon_j) = E(\epsilon_i \epsilon_j) = 0 for all i \neq j. If this assumption is violated and the error term observations are correlated, autocorrelation is present.
4. Non-stochasticity of Xi: this means the series for Xi is fixed in repeated
samples.
5. Exogeneity: the disturbance term and the explanatory variables are
independently distributed. That is, they are not correlated with each other.
6. Linearity: The model must be linear in parameters because OLS is a linear
estimation technique.
7. Normality: the disturbance term \epsilon_i is assumed to be normally distributed.
OLS ESTIMATION

Y_i = \hat\beta_1 + \hat\beta_2 X_i + \hat\epsilon_i

One possible criterion is to select \hat\beta_1 and \hat\beta_2 so as to make \sum \hat\epsilon_i = 0.

Therefore, \sum (Y_i - \hat\beta_1 - \hat\beta_2 X_i) = 0,

which, on dividing through by n, gives:

\bar{Y} = \hat\beta_1 + \hat\beta_2 \bar{X}
Hence the parameters must be chosen in such a way that the estimated line passes through (\bar{X}, \bar{Y}).
We apply the least squares criterion, which requires that the values of the parameters are chosen in such a way that \sum \hat\epsilon_i^2 is minimized.

We can write:

\sum \hat\epsilon_i^2 = \sum (Y_i - \hat\beta_1 - \hat\beta_2 X_i)^2

The necessary conditions for minimization of \sum \hat\epsilon_i^2 are:

\frac{\partial \sum \hat\epsilon_i^2}{\partial \hat\beta_1} = 0

\frac{\partial \sum \hat\epsilon_i^2}{\partial \hat\beta_2} = 0

Applying these conditions we obtain the normal equations:


\sum Y_i = n \hat\beta_1 + \hat\beta_2 \sum X_i

\sum X_i Y_i = \hat\beta_1 \sum X_i + \hat\beta_2 \sum X_i^2
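
As an illustration, here is a minimal Python sketch (not part of the original note; the data are hypothetical) that solves the two normal equations for \hat\beta_1 and \hat\beta_2 with numpy:

import numpy as np

# Hypothetical sample data, for illustration only
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(X)

# Normal equations in matrix form:
#   n*b1      + sum(X)*b2   = sum(Y)
#   sum(X)*b1 + sum(X^2)*b2 = sum(X*Y)
A = np.array([[n, X.sum()],
              [X.sum(), (X ** 2).sum()]])
b = np.array([Y.sum(), (X * Y).sum()])
beta1_hat, beta2_hat = np.linalg.solve(A, b)
print(beta1_hat, beta2_hat)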

PROPERTIES OF OLS REGRESSION LINE

1. The regression line passes through the point of means (\bar{X}, \bar{Y}).


2. The residuals \hat\epsilon_i have zero covariance with the sample X values, and also with \hat{Y}_i, which represents the estimated or predicted values of Y_i.

Proof:

By definition,

\mathrm{cov}(X_i, \hat\epsilon_i) = \frac{1}{n} \sum (X_i - \bar{X})(\hat\epsilon_i - \bar{\hat\epsilon})
= \frac{1}{n} \sum (X_i - \bar{X}) \hat\epsilon_i [since \bar{\hat\epsilon} = 0]
= \frac{1}{n} \sum X_i \hat\epsilon_i - \bar{X} \cdot \frac{1}{n} \sum \hat\epsilon_i
= \frac{1}{n} \sum X_i \hat\epsilon_i [since \sum \hat\epsilon_i = 0]

Now the condition \frac{\partial \sum \hat\epsilon_i^2}{\partial \hat\beta_2} = 0 implies:

\frac{\partial}{\partial \hat\beta_2} \sum (Y_i - \hat\beta_1 - \hat\beta_2 X_i)^2 = 0
\Rightarrow -2 \sum X_i (Y_i - \hat\beta_1 - \hat\beta_2 X_i) = 0
\Rightarrow -2 \sum X_i \hat\epsilon_i = 0
\Rightarrow \sum X_i \hat\epsilon_i = 0

Thus, \mathrm{cov}(X_i, \hat\epsilon_i) = \frac{1}{n} \sum X_i \hat\epsilon_i = 0.

Again, \hat{Y}_i = \hat\beta_1 + \hat\beta_2 X_i implies that \hat{Y}_i is a linear function of X_i, so that

\mathrm{cov}(\hat{Y}_i, \hat\epsilon_i) = 0.
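
A quick numerical check of these properties (a sketch, not from the note; it reuses the hypothetical data from the earlier snippet and np.polyfit to fit the line):

import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b2, b1 = np.polyfit(X, Y, 1)   # slope and intercept of the fitted line
Y_hat = b1 + b2 * X            # predicted values
e = Y - Y_hat                  # residuals

# Each sum should be zero up to floating-point error:
print(e.sum(), (X * e).sum(), (Y_hat * e).sum())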

3. The estimated coefficients \hat\beta_1 and \hat\beta_2 may be computed using the following formulae:

\hat\beta_1 = \bar{Y} - \hat\beta_2 \bar{X} \qquad (1)

\hat\beta_2 = \frac{\sum x_i y_i}{\sum x_i^2} \qquad (2)

where x_i = X_i - \bar{X} and y_i = Y_i - \bar{Y}.
(1) is a rearrangement of the first normal equation. (2) follows from substituting (1) into the second normal equation, as shown below:
\sum X_i Y_i = (\bar{Y} - \hat\beta_2 \bar{X}) \sum X_i + \hat\beta_2 \sum X_i^2
\Rightarrow \sum X_i Y_i = \bar{Y} \sum X_i - \hat\beta_2 \bar{X} \sum X_i + \hat\beta_2 \sum X_i^2
\Rightarrow \hat\beta_2 [\sum X_i^2 - \bar{X} \sum X_i] = \sum X_i Y_i - \bar{Y} \sum X_i
\Rightarrow \hat\beta_2 [\sum X_i^2 - \frac{1}{n} (\sum X_i)^2] = \sum X_i Y_i - \frac{1}{n} \sum X_i \sum Y_i
\Rightarrow \hat\beta_2 \sum x_i^2 = \sum x_i y_i
\Rightarrow \hat\beta_2 = \frac{\sum x_i y_i}{\sum x_i^2}

Alternatively, \hat\beta_2 = \frac{\sum x_i y_i / n}{\sum x_i^2 / n} = \frac{[\sum (X_i - \bar{X})(Y_i - \bar{Y})] / n}{\sum (X_i - \bar{X})^2 / n} = \frac{\mathrm{cov}(X_i, Y_i)}{\mathrm{var}(X_i)}
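
A minimal Python sketch of formulae (1) and (2), including the covariance/variance form (again with hypothetical data; np.cov(..., bias=True) and np.var use the divisor n, matching the population formulas above):

import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

x = X - X.mean()                                   # x_i = X_i - X-bar
y = Y - Y.mean()                                   # y_i = Y_i - Y-bar

beta2_hat = (x * y).sum() / (x ** 2).sum()         # slope, equation (2)
beta1_hat = Y.mean() - beta2_hat * X.mean()        # intercept, equation (1)

beta2_alt = np.cov(X, Y, bias=True)[0, 1] / np.var(X)  # cov(X, Y) / var(X)
print(beta2_hat, beta2_alt)                        # the two slope estimates agree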

4. The total variation in Y_i may be expressed as the sum of two components: the variation ‘explained’ by the estimated regression line and the variation ‘unexplained’ by the estimated regression line.
y_i = \hat{y}_i + \hat\epsilon_i, where \hat{y}_i = \hat{Y}_i - \bar{Y}.
Squaring and summing over all observations,
\sum y_i^2 = \sum (\hat{y}_i + \hat\epsilon_i)^2
= \sum \hat{y}_i^2 + \sum \hat\epsilon_i^2 + 2 \sum \hat{y}_i \hat\epsilon_i
= \sum \hat{y}_i^2 + \sum \hat\epsilon_i^2 [since \mathrm{cov}(\hat{y}_i, \hat\epsilon_i) = 0]

\sum y_i^2 : TSS; \quad \sum \hat{y}_i^2 : ESS; \quad \sum \hat\epsilon_i^2 : RSS

TSS = ESS + RSS

ESS = TSS - RSS = \sum y_i^2 - \sum \hat\epsilon_i^2 = \sum \hat{y}_i^2
= \sum (\hat\beta_2 x_i)^2 = \hat\beta_2^2 \sum x_i^2 = \frac{(\sum x_i y_i)^2}{\sum x_i^2} [since \hat{y}_i = \hat\beta_2 x_i]

Therefore, ESS = \hat\beta_2 \sum x_i y_i.
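
The decomposition can be verified numerically; the following sketch (hypothetical data as before) checks that TSS = ESS + RSS and that ESS = \hat\beta_2 \sum x_i y_i:

import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b2, b1 = np.polyfit(X, Y, 1)
Y_hat = b1 + b2 * X
e = Y - Y_hat

TSS = ((Y - Y.mean()) ** 2).sum()        # total sum of squares
ESS = ((Y_hat - Y.mean()) ** 2).sum()    # explained sum of squares
RSS = (e ** 2).sum()                     # residual sum of squares

print(np.isclose(TSS, ESS + RSS))                                      # True
print(np.isclose(ESS, b2 * ((X - X.mean()) * (Y - Y.mean())).sum()))   # True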
PROPERTIES OF ESTIMATORS

Two groups: small sample properties and large sample properties.

SMALL SAMPLE PROPERTIES

Unbiasedness:
An estimator \hat\beta is said to be an unbiased estimator of \beta if its mean or expected value is equal to the value of the true population parameter \beta, that is, E(\hat\beta) = \beta.
This means that if repeated samples of a given size are drawn and \hat\beta is computed for each sample, the average of such \hat\beta values would be equal to \beta.
If E(\hat\beta) \neq \beta, i.e. E(\hat\beta) - \beta \neq 0, then \hat\beta is said to be biased, and the extent of the bias of \hat\beta is measured by E(\hat\beta) - \beta.

Minimum Variance or Bestness:


An estimator \hat\beta is said to be a minimum variance or best estimator of \beta if its variance is less than the variance of any other estimator, say \beta^*.
Thus, when Var(\hat\beta) < Var(\beta^*), \hat\beta is called the minimum variance or best estimator of \beta.

Efficiency:
\hat\beta is an efficient estimator if the following two conditions are satisfied together:
i. \hat\beta is unbiased, and
ii. Var(\hat\beta) < Var(\beta^*)

An efficient estimator is also called a minimum variance unbiased estimator (MVUE) or best unbiased estimator.
Linearity
An estimator is said to have the property of linearity if it is possible to
express it as a linear combination of sample observations.

Mean-squared error (MSE)


Sometimes a difficult choice problem arises while comparing two
estimators. Suppose we have two different estimators of which one has
lower bias, but higher variance, compared with the other.
In other words, when
|bias(\hat\beta)| < |bias(\beta^*)|
but Var(\hat\beta) > Var(\beta^*).

How do we choose between the two estimators?

In this situation, where one estimator has a larger bias but a smaller variance
than the other estimator, it is intuitively plausible to consider a trade-off
between the two characteristics.

This notion is given a precise, formal expression in the mean-squared error.

The mean-squared error of \hat\beta is defined as:

MSE(\hat\beta) = E[\hat\beta - \beta]^2
= E[\{\hat\beta - E(\hat\beta)\} + \{E(\hat\beta) - \beta\}]^2
= E\{\hat\beta - E(\hat\beta)\}^2 + E\{E(\hat\beta) - \beta\}^2 + 2 E[\{\hat\beta - E(\hat\beta)\}\{E(\hat\beta) - \beta\}]
= Var(\hat\beta) + (Bias\,\hat\beta)^2

(the cross-product term vanishes)

According to the mean-squared error property, if MSE(\hat\beta) < MSE(\beta^*), we say that \hat\beta has lower mean-squared error and accept it as an estimator of \beta.
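
To make the bias-variance trade-off concrete, here is a small Monte Carlo sketch (entirely illustrative; the estimators, parameter values and shrinkage factor are our own choices, not from the note). It compares the unbiased sample mean with a deliberately shrunk, biased estimator of a population mean and reports each estimator's simulated MSE:

import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 1.0, 2.0, 10, 20_000    # hypothetical population and sample size

est_unbiased = np.empty(reps)
est_shrunk = np.empty(reps)
for r in range(reps):
    sample = rng.normal(mu, sigma, n)
    est_unbiased[r] = sample.mean()          # unbiased, larger variance
    est_shrunk[r] = 0.9 * sample.mean()      # biased towards zero, smaller variance

def mse(est, true_value):
    return np.mean((est - true_value) ** 2)  # simulated MSE = variance + bias^2

print(mse(est_unbiased, mu), mse(est_shrunk, mu))
# With these particular values the biased estimator attains the lower MSE,
# illustrating why MSE can favour a slightly biased but less variable estimator.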
LARGE SAMPLE OR ASYMPTOTIC PROPERTIES

These properties relate to the distribution of an estimator when the sample size is large and approaches infinity. The important properties are:

Asymptotic unbiasedness

\hat\beta is an asymptotically unbiased estimator of \beta if

\lim_{n \to \infty} E(\hat\beta) = \beta

This means that the estimator \hat\beta, which is otherwise biased, becomes unbiased as the sample size approaches infinity. It is to be noted that if an estimator is unbiased, it is also asymptotically unbiased, but the reverse is not necessarily true.
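
A standard example (added here for illustration; it is not in the note): the variance estimator with divisor n, \hat\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \bar{Y})^2, has E(\hat\sigma^2) = \frac{n-1}{n} \sigma^2, so it is biased for any finite n, yet \lim_{n \to \infty} E(\hat\sigma^2) = \sigma^2, making it asymptotically unbiased.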

Consistency
Whether or not an estimator is consistent is understood by looking at the

behavior of its bias and variance as the sample size approaches infinity.
\hat\beta is a consistent estimator if

\lim_{n \to \infty} [E(\hat\beta) - \beta] = 0

\lim_{n \to \infty} Var(\hat\beta) = 0
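
Applied to the OLS slope estimator (a step left implicit in the note): E(\hat\beta_2) = \beta_2 for every sample size, and, as derived in the variance proof below, Var(\hat\beta_2) = \sigma^2 / \sum x_i^2, which tends to zero as n \to \infty provided \sum x_i^2 grows without bound. Both conditions are therefore satisfied, so \hat\beta_2 is a consistent estimator of \beta_2.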
Under the assumptions listed earlier, the ordinary least squares estimators are BLUE (best linear unbiased estimators).

Proof:

Unbiasedness of \hat\beta_2

\hat\beta_2 = \frac{\sum x_i y_i}{\sum x_i^2}
= \frac{\sum x_i (Y_i - \bar{Y})}{\sum x_i^2}
= \frac{\sum x_i Y_i - \bar{Y} \sum x_i}{\sum x_i^2}
= \frac{\sum x_i Y_i}{\sum x_i^2} [since \sum x_i = 0]

\hat\beta_2 = \sum w_i Y_i, where w_i = \frac{x_i}{\sum x_i^2}
It follows,
\sum w_i = \frac{\sum x_i}{\sum x_i^2} = 0

\sum w_i X_i = \frac{\sum x_i X_i}{\sum x_i^2} = \frac{\sum (X_i - \bar{X}) X_i}{\sum (X_i - \bar{X})^2} = 1

\sum w_i^2 = \frac{\sum x_i^2}{(\sum x_i^2)^2} = \frac{1}{\sum x_i^2}

Now we have,

\hat\beta_2 = \sum w_i Y_i
= \sum w_i (\beta_1 + \beta_2 X_i + \epsilon_i)
= \beta_1 \sum w_i + \beta_2 \sum w_i X_i + \sum w_i \epsilon_i
= \beta_2 + \sum w_i \epsilon_i

Taking expectations on both sides,


E ( ˆ )  E (    wi  i )
    wi E (  i )
 [ E(  i )  0]

Linearity of \hat\beta_2

We know,

\hat\beta_2 = \sum w_i Y_i

Since the w_i's are a set of fixed values, we may write:

\hat\beta_2 = w_1 Y_1 + w_2 Y_2 + \dots + w_n Y_n

Thus \hat\beta_2 has the property of linearity.

Minimum variance or bestness for \hat\beta_2

In order to prove the minimum variance property, we shall compute the variance of \hat\beta_2 and show that it is no larger than the variance of any other linear unbiased estimator.
\hat\beta_2 = \beta_2 + \sum w_i \epsilon_i
\Rightarrow \hat\beta_2 - \beta_2 = \sum w_i \epsilon_i
\Rightarrow \hat\beta_2 - E(\hat\beta_2) = \sum w_i \epsilon_i [since E(\hat\beta_2) = \beta_2]
Thus,
Var(\hat\beta_2) = E[\hat\beta_2 - E(\hat\beta_2)]^2
= E(\sum w_i \epsilon_i)^2
= E(\sum w_i^2 \epsilon_i^2 + 2 \sum_{i \neq j} w_i w_j \epsilon_i \epsilon_j)
= \sum w_i^2 E(\epsilon_i^2) + 2 \sum_{i \neq j} w_i w_j E(\epsilon_i \epsilon_j)
= \sigma^2 \sum w_i^2 [since E(\epsilon_i^2) = \sigma^2 and E(\epsilon_i \epsilon_j) = 0 for all i \neq j]

Var(\hat\beta_2) = \sigma^2 \sum w_i^2 = \frac{\sigma^2}{\sum x_i^2}
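
A hedged simulation sketch (illustrative only; the parameter values, the fixed X series and the error distribution are assumptions made for the example) that checks both E(\hat\beta_2) = \beta_2 and Var(\hat\beta_2) = \sigma^2 / \sum x_i^2 by repeated sampling:

import numpy as np

rng = np.random.default_rng(1)
beta1, beta2, sigma = 1.0, 2.0, 1.5
X = np.linspace(0.0, 10.0, 30)           # X fixed in repeated samples (assumption 4)
x = X - X.mean()
reps = 20_000

b2_draws = np.empty(reps)
for r in range(reps):
    eps = rng.normal(0.0, sigma, X.size)              # zero-mean, homoskedastic errors
    Y = beta1 + beta2 * X + eps
    b2_draws[r] = (x * (Y - Y.mean())).sum() / (x ** 2).sum()

print(b2_draws.mean())                                # close to beta2 = 2.0 (unbiasedness)
print(b2_draws.var(), sigma ** 2 / (x ** 2).sum())    # empirical vs theoretical variance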

Now consider some other estimator, say \beta_2^*, such that

\beta_2^* = \sum c_i Y_i

where c_i (i = 1, 2, 3, \dots, n) represents a set of weights.

Var(\beta_2^*) = \sigma^2 \sum c_i^2

To compare Var(\hat\beta_2) with Var(\beta_2^*), consider the expression:

c_i = w_i + (c_i - w_i)
c_i^2 = w_i^2 + (c_i - w_i)^2 + 2 w_i (c_i - w_i)
\sum c_i^2 = \sum w_i^2 + \sum (c_i - w_i)^2 + 2 \sum w_i (c_i - w_i)

Note that, provided \beta_2^* is unbiased (which requires \sum c_i = 0 and \sum c_i X_i = 1),

\sum w_i (c_i - w_i) = \sum w_i c_i - \sum w_i^2 = \frac{\sum x_i c_i}{\sum x_i^2} - \frac{1}{\sum x_i^2} = 0,

since \sum x_i c_i = \sum c_i X_i - \bar{X} \sum c_i = 1.

Thus,
Var(\beta_2^*) = \sigma^2 [\sum w_i^2 + \sum (c_i - w_i)^2]
= Var(\hat\beta_2) + \sigma^2 \sum (c_i - w_i)^2

Since \sum (c_i - w_i)^2 > 0 unless c_i = w_i for all i, Var(\hat\beta_2) \leq Var(\beta_2^*), and we conclude that \hat\beta_2 is a minimum variance or best estimator.
