Ref. CH 3 Gujarati Book


Session 3 : Lecture Outline

Problem of Estimation

• Problem of Estimation:
– Ordinary Least Squares Method
– Method of Moment Estimation Procedure
– Maximum Likelihood Estimation Procedure

• Classical Linear Regression Model: Assumptions

• Precision of the Estimators: The Standard Errors of the Least Squares Estimators

• Gauss-Markov Theorem

• Coefficient of Determination
Ref. Ch 3 Gujarati Book
Simple Linear Regression Model
Finds a linear relationship between:
- one independent variable X, and
- one dependent variable Y.
First prepare a scatter plot to verify that the data show a linear trend; use
alternative approaches if the relationship is not linear, as sketched below.
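For example, a minimal Python sketch of this first step (matplotlib assumed available), using the ten (Y, X) observations tabulated later in these slides:

```python
# Minimal sketch: scatter plot of the ten (Y, X) observations from the slides,
# to check visually whether a straight line is a reasonable description.
import matplotlib.pyplot as plt

X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]   # independent variable
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]      # dependent variable

plt.scatter(X, Y)
plt.xlabel("X (independent variable)")
plt.ylabel("Y (dependent variable)")
plt.title("Does the relationship look linear?")
plt.show()
```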
Simple Linear Regression Model: Estimation

Model:

Yi = β1 + β2 Xi + ui

where
Y = dependent variable
X = independent variable
β1 = intercept (constant) term
β2 = slope coefficient
ui = error (random disturbance) term

Methods to Estimate the SRF (estimators)

1. Least Squares Method (Ordinary Least Squares, OLS)
2. Method of Moments Estimation
3. Maximum Likelihood Estimation
Simple Linear Regression Model: Estimation

Yi     Xi
70     80
65     100
90     120
95     140
110    160
115    180
120    200
140    220
155    240
150    260
Simple Linear Regression Model: Estimation
Model:
Yi = β̂1 + β̂2 Xi + ûi        (SRF)
Ŷi = β̂1 + β̂2 Xi              (fitted line)
ûi = Yi − β̂1 − β̂2 Xi   or   ûi = Yi − Ŷi

Yi     Xi      ûi        Ŷi
70     80      4.82      65.18
65     100   −10.36      75.36
90     120     4.45      85.55
95     140    −0.73      95.73
110    160     4.09     105.91
115    180    −1.09     116.09
120    200    −6.27     126.27
140    220     3.55     136.45
155    240     8.36     146.64
150    260    −6.82     156.82
Simple Linear Regression Model: Estimation
I. Method of Ordinary Least Squares (OLS)

We estimate the intercept and slope by minimizing the vertical distances between the
data points and the estimated sample regression function, i.e. by minimizing the sum
of squared residuals

ûi = Yi − Ŷi = Yi − (β̂1 + β̂2 Xi)

min_{β̂1, β̂2}  Σ_{i=1}^{n} ûi²  =  min_{β̂1, β̂2}  Σ_{i=1}^{n} (Yi − β̂1 − β̂2 Xi)²  ≡  min_{β̂1, β̂2}  S(β̂1, β̂2)

We obtain β̂1 and β̂2 by taking the derivatives of S(β̂1, β̂2) with respect to β̂1 and β̂2
(the first-order conditions) and setting them equal to zero.
First and Second Order Conditions

First order conditions:

1)  ∂S(β̂1, β̂2)/∂β̂1 = −2 Σ_{i=1}^{n} (Yi − β̂1 − β̂2 Xi) = 0

2)  ∂S(β̂1, β̂2)/∂β̂2 = −2 Σ_{i=1}^{n} (Yi − β̂1 − β̂2 Xi) Xi = 0

Second order conditions: mostly satisfied, since S(β̂1, β̂2) is a convex
(sum of squares) function.

Two Normal Equations of OLS

Σ Yi = n β̂1 + β̂2 Σ Xi

Σ Yi Xi = β̂1 Σ Xi + β̂2 Σ Xi²

Solving these two equations simultaneously gives the slope and intercept estimators.
Estimation of Slope and Intercept
Simplifying the two normal equations together gives:

1)  β̂2 = Σ_{i=1}^{n} (Xi − X̄)(Yi − Ȳ) / Σ_{i=1}^{n} (Xi − X̄)²  =  Σ xi yi / Σ xi²        Slope coefficient

2)  β̂1 = Ȳ − β̂2 X̄                                                                         Intercept

where X̄ and Ȳ are the sample means, and xi = Xi − X̄, yi = Yi − Ȳ denote deviations
from those means.

Estimated Regression Model:  Ŷi = β̂1 + β̂2 Xi

Estimated Regression Model for the data above:  Ŷi = 24.455 + 0.509 Xi
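As an illustration, here is a minimal Python sketch (not part of the original slides) that applies the slope and intercept formulas above to the ten observations tabulated earlier; up to rounding, it reproduces the estimated model just quoted.

```python
# Minimal sketch of the OLS formulas, applied to the ten observations
# from the earlier table. Pure Python, no libraries required.
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]
n = len(X)

x_bar = sum(X) / n
y_bar = sum(Y) / n

# Slope: sum of x_i * y_i over sum of x_i^2, with x_i, y_i in deviation form
beta2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y)) / \
        sum((xi - x_bar) ** 2 for xi in X)
beta1 = y_bar - beta2 * x_bar                      # intercept: Y_bar - beta2 * X_bar

Y_hat = [beta1 + beta2 * xi for xi in X]           # fitted values
u_hat = [yi - yh for yi, yh in zip(Y, Y_hat)]      # residuals

print(f"beta1_hat = {beta1:.4f}, beta2_hat = {beta2:.4f}")
# Prints approximately: beta1_hat = 24.4545, beta2_hat = 0.5091
```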
Simple Linear Regression Model: Estimation
II. Deriving OLS Using the Method of Moments (MoM)

• Another way of establishing the OLS formulas is through the Method of Moments
approach, developed by Karl Pearson (1894).
• The basic idea of this method is to equate certain sample characteristics, such as
the mean, to the corresponding population expected values.
• Method of moments estimation rests on the law of large numbers.
The Method of Moments (MM) and GMM
Two types of moment restrictions are used:
1. Unconditional moments, e.g. E(X − µ) = 0, which says the population mean of X is µ.
2. Conditional moments, e.g. E(u | X) = 0, the form used in the derivation below.
Simple Linear Regression Model: Estimation (MoM)
• To derive the OLS estimates we need to realize that our main
assumption, E(u|x) = E(u) = 0, also implies that Cov(x, u) = E(xu) = 0.

• We can write these two restrictions just in terms of x, y, β1 and β2, since
u = y − β1 − β2x:

• E(y − β1 − β2x) = 0
• E[x(y − β1 − β2x)] = 0

• These are called moment restrictions.


Simple Linear Regression Model: Estimation (MoM)

• We want to choose values of the parameters that ensure that the sample
versions of our moment restrictions hold.
• The sample versions are as follows:

(1/n) Σ_{i=1}^{n} (yi − β̂1 − β̂2 xi) = 0

(1/n) Σ_{i=1}^{n} xi (yi − β̂1 − β̂2 xi) = 0

Given the definition of a sample mean, and properties of summation, we can
rewrite the first condition as

ȳ = β̂1 + β̂2 x̄,   i.e.   β̂1 = ȳ − β̂2 x̄        => OLS estimated intercept

Simple Linear Regression Model: Estimation (MoM)

Substituting β̂1 = ȳ − β̂2 x̄ into the second sample moment condition and solving for β̂2 gives

β̂2 = Σ (xi − x̄)(yi − ȳ) / Σ (xi − x̄)²        => OLS estimated slope
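A minimal sketch (NumPy assumed available): the two sample moment conditions are just the OLS normal equations, so solving them as a 2×2 linear system reproduces the same estimates as before.

```python
# Minimal sketch: solve the sample moment conditions directly and confirm
# they give the same numbers as the OLS formulas.
import numpy as np

X = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)
Y = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150], dtype=float)
n = len(X)

# (1/n) * sum(Y - b1 - b2*X) = 0  and  (1/n) * sum(X*(Y - b1 - b2*X)) = 0,
# rearranged into the linear system A @ [b1, b2] = b:
A = np.array([[n,       X.sum()],
              [X.sum(), (X ** 2).sum()]])
b = np.array([Y.sum(), (X * Y).sum()])

beta1, beta2 = np.linalg.solve(A, b)
print(beta1, beta2)   # approximately 24.4545 and 0.5091, identical to OLS
```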


Statistical Properties of OLS Estimators
Estimated regression model: Ŷi = β̂1 + β̂2 Xi
1. The OLS estimators are expressed solely in terms of the observable quantities
2. They are point estimators
3. Once the OLS estimates are obtained, the sample regression line can be
easily obtained. This regression line has the following properties

i)   It passes through the sample means of Y and X, i.e. through the point (X̄, Ȳ).

ii)  The mean of the fitted values Ŷi = β̂1 + β̂2 Xi equals the mean of the actual Yi.

iii) The mean of the residuals is zero: Σ ûi = 0.

iv)  Cov(β̂1, β̂2) = −X̄ · var(β̂2).

v)   The residuals are uncorrelated with Xi: Σ ûi Xi = 0.

A numerical check of properties (i)-(iii) and (v) is sketched below.
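A minimal numerical check (NumPy assumed) of properties (i)-(iii) and (v) for the example data; property (iv) concerns repeated sampling, so it is not checked here.

```python
# Minimal sketch: recompute the OLS quantities for the example data and verify
# the algebraic properties of the fitted regression line.
import numpy as np

X = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)
Y = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150], dtype=float)

x = X - X.mean()
beta2 = (x * (Y - Y.mean())).sum() / (x ** 2).sum()
beta1 = Y.mean() - beta2 * X.mean()
Y_hat = beta1 + beta2 * X
u_hat = Y - Y_hat

print(np.isclose(beta1 + beta2 * X.mean(), Y.mean()))        # (i)  line passes through (X_bar, Y_bar)
print(np.isclose(Y_hat.mean(), Y.mean()))                    # (ii) mean of fitted Y equals mean of Y
print(np.isclose(u_hat.sum(), 0.0, atol=1e-6))               # (iii) residuals sum to zero
print(np.isclose((u_hat * X).sum(), 0.0, atol=1e-6))         # (v)  residuals uncorrelated with X
```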
Some theorems
In deviation form, the SRF can be written as yi = β̂2 xi + ûi, with fitted values ŷi = β̂2 xi.

R² = r², i.e. in the two-variable model the coefficient of determination equals the
squared sample correlation coefficient between Y and X.
Assumptions of CLRM

1: The model is linear in the parameters and variables.


Yi = β1 + β2 Xi + ui

2: The X values are fixed in repeated sampling; X is nonstochastic.

3: Zero mean of the disturbance u


• Given the value of X, the mean, or expected value, of the
disturbance term ui is zero:

E(ui | Xi) = 0
Assumptions of CLRM
4: Homoscedasticity or equal
variance of ui
• Given the value of X, the variance of the disturbance
term ui is the same for all observations:

var(ui | Xi) = E[ui − E(ui | Xi)]²
             = E(ui² | Xi)        (using Assumption 3)
             = σ²
Assumptions of CLRM

5: No autocorrelation between the disturbances

• Given any two X values, Xi and Xj (i ≠ j), the correlation between
the corresponding disturbances ui and uj is zero:

cov(ui, uj | Xi, Xj) = E{[ui − E(ui | Xi)] [uj − E(uj | Xj)]}
                     = E(ui | Xi) E(uj | Xj)        (using Assumption 3)
                     = 0
Autocorrelation: Residual Plot
[Figure: two scatter plots of ût against ût−1 — the left panel shows positive
autocorrelation, the right panel shows negative autocorrelation.]
Assumptions of CLRM

6: Zero covariance between Xi and ui or E(Xi ui )=0

cov(ui, Xi) = E{[ui − E(ui | Xi)] [Xi − E(Xi)]}
            = E[ui (Xi − E(Xi))]              (using Assumption 3)
            = E(ui Xi) − E(ui) E(Xi)          (since E(Xi) is nonstochastic)
            = E(ui Xi)                        (since E(ui) = 0)
            = 0                               (by assumption)
Assumptions of CLRM
7: The number of observations (n) must be greater than the number of
parameters to be estimated (k); otherwise we face micronumerosity.

8: Variability in X values
• Technically, Var(X) must be a finite positive number.

9: The regression model is correctly specified.
There is no specification error or bias in the model used for empirical analysis.

10: There is no perfect multicollinearity, i.e. there are no perfect linear
relationships among the explanatory variables.
Simple Linear Regression Model: Classical Assumptions
● Assumptions for the Classical Linear Regression Model:

1. The regression model is linear in the parameters

2. X values are fixed in repeated sampling

3. Zero mean value of disturbance ui

4. Homoscedasticity or equal variance of ui

5. No autocorrelation between the disturbances

6. Zero covariance between u and X

7. The number of observations n must be greater than the number of parameters to be estimated

8. Variability in X values

9. The regression model is correctly specified

10. There is no perfect multicollinearity.
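For illustration only, a minimal sketch (NumPy assumed) of a data-generating process consistent with these assumptions, using the population values β1 = 17, β2 = 0.60 and σ = 11.32 quoted later in these slides. Drawing the errors from a normal distribution is an extra assumption made here purely to run the simulation; normality is not one of the ten CLRM assumptions.

```python
# Minimal sketch of a CLRM-consistent data-generating process.
import numpy as np

rng = np.random.default_rng(0)

beta1_true, beta2_true, sigma = 17.0, 0.60, 11.32
X = np.arange(80, 261, 20, dtype=float)            # X fixed in repeated sampling (nonstochastic)
u = rng.normal(loc=0.0, scale=sigma, size=X.size)  # E(u|X) = 0, var(u|X) = sigma^2, independent draws
Y = beta1_true + beta2_true * X + u                # linear in the parameters

print(np.round(Y, 2))
```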


Precision or S.E. of Least Squares Estimators

Population regression results (true values):

β1 = 17.00
β2 = 0.60
σ = 11.32
σ² = 128.42

Sample regression function (SRF) estimated from the data:

Ŷ = 24.455 + 0.509 X
Precision or S.E. of Least Squares Estimators

● The standard errors for the OLS estimates can be obtained as follows:

var(β̂1) = 41.0881        se(β̂1) = 6.41
var(β̂2) = 0.0016         se(β̂2) = 0.04

cov(β̂1, β̂2) = −X̄ · var(β̂2) = −170 × 0.0016 = −0.272

The more variation there is in X, the smaller the variance of β̂2 and the more precise the estimate of the slope.
Variance of β̂2

var(β̂2) = σ² / Σ_{i=1}^{n} (Xi − X̄)²

[Figure: scatter plot in which the variation of X is relatively small; the slope
estimate is very imprecise.]
Variance of β̂2

var(β̂2) = σ² / Σ_{i=1}^{n} (Xi − X̄)²

[Figure: scatter plot in which the variation of X is much bigger; the slope
estimate is much more precise.]
Precision or S.E. of Least Squares Estimators

However, since we do not know the variance of the error term (the population
variance), we estimate it from the sample as follows:

σ̂² = Σ ûi² / (n − k)

where n − k is the number of degrees of freedom (k = 2 in the two-variable model).

N.B. This sample variance of the residuals is an unbiased estimator of the
population variance of the error term.
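A minimal sketch (NumPy assumed) computing σ̂² and the standard errors for the example data, using the standard two-variable formulas var(β̂2) = σ̂²/Σxi² and var(β̂1) = σ̂² ΣXi²/(n Σxi²), with xi in deviation form; the printed values may differ slightly from the rounded figures quoted on the earlier slide.

```python
# Minimal sketch: residual variance, standard errors and the slope-intercept covariance.
import numpy as np

X = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)
Y = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150], dtype=float)
n, k = len(X), 2

x = X - X.mean()
beta2 = (x * (Y - Y.mean())).sum() / (x ** 2).sum()
beta1 = Y.mean() - beta2 * X.mean()
u_hat = Y - (beta1 + beta2 * X)

sigma2_hat = (u_hat ** 2).sum() / (n - k)           # unbiased estimate of the error variance
var_b2 = sigma2_hat / (x ** 2).sum()
var_b1 = sigma2_hat * (X ** 2).sum() / (n * (x ** 2).sum())
cov_b1_b2 = -X.mean() * var_b2                      # covariance of intercept and slope estimators

print(f"sigma2_hat = {sigma2_hat:.4f}")
print(f"se(beta1) = {np.sqrt(var_b1):.4f}, se(beta2) = {np.sqrt(var_b2):.4f}")
print(f"cov(beta1, beta2) = {cov_b1_b2:.4f}")
```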
Precision or S.E. of Least Squares Estimators

• Three important elements determine the precision of the estimates:

1. The magnitude of the “noise”

2. The variance of X

3. The number of observations


Precision of the estimates

[Figure: fitted line Ŷi = β̂1 + β̂2 Xi, showing an observation Yi, its fitted value Ŷi,
the residual, and the intercept and slope of the line.]
1. Variance of ui

[Figure: scatter around the fitted line Ŷi = β̂1 + β̂2 Xi; the noise can be large… or not.]
2. Variation in X

[Figures: scatter plots of the true relationship, comparing the variance in X to the
variance in u; one panel shows the sample actually drawn ("and this is your sample…").]

3. Number of observations

[Figures: scatter plots of the true relationship with different numbers of observations,
again relative to the variance in u.]
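A minimal sketch (NumPy assumed) of how these three elements move the precision of the slope, via se(β̂2) = σ / √(Σ(Xi − X̄)²). The particular X designs and σ values below are illustrative choices, not taken from the slides.

```python
# Minimal sketch: the standard error of the slope shrinks with less noise,
# more spread in X, and more observations.
import numpy as np

def se_slope(sigma, X):
    X = np.asarray(X, dtype=float)
    return sigma / np.sqrt(((X - X.mean()) ** 2).sum())

X_narrow = np.linspace(150, 190, 10)   # little variation in X
X_wide   = np.linspace(80, 260, 10)    # much more variation in X
X_more   = np.linspace(80, 260, 40)    # same range, more observations

print(se_slope(11.32, X_narrow))  # large noise + narrow X: largest standard error
print(se_slope(11.32, X_wide))    # same noise, wider X: smaller
print(se_slope(5.0,  X_wide))     # less noise: smaller still
print(se_slope(11.32, X_more))    # more observations: smaller than the 10-point design
```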
Precision or S.E. of Least Squares Estimators

Covariance between the slope and the intercept:

Cov(β̂1, β̂2) = −X̄ · var(β̂2)

Since var(β̂2) is always positive, the sign of the covariance between the slope
and the intercept depends on the sign of X̄.

If X̄ is positive, the covariance will be negative.

Thus, if the slope coefficient is overestimated (the line is too steep), the
intercept will be underestimated (too small). A small simulation illustrating
this is sketched below.
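An illustrative simulation sketch (NumPy assumed, not from the slides) using the population values β1 = 17, β2 = 0.60, σ = 11.32 and the same X design: across repeated samples the intercept and slope estimates covary negatively, and the covariance is close to −X̄ · var(β̂2).

```python
# Minimal sketch: simulate repeated samples and compare the empirical covariance
# of (beta1_hat, beta2_hat) with -X_bar * var(beta2_hat).
import numpy as np

rng = np.random.default_rng(1)
X = np.arange(80, 261, 20, dtype=float)
x = X - X.mean()
b1_draws, b2_draws = [], []

for _ in range(20000):
    Y = 17.0 + 0.60 * X + rng.normal(0.0, 11.32, size=X.size)
    b2 = (x * (Y - Y.mean())).sum() / (x ** 2).sum()
    b1 = Y.mean() - b2 * X.mean()
    b1_draws.append(b1)
    b2_draws.append(b2)

cov_sim = np.cov(b1_draws, b2_draws)[0, 1]
print(cov_sim, -X.mean() * np.var(b2_draws))   # both negative, roughly -0.66 here
```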
Properties of OLS Estimators
(Gauss-Markov Theorem)
Under the assumptions of the CLRM, the least squares estimators are the
Best Linear Unbiased Estimators (BLUE).

– Best: the OLS estimator has the smallest variance
  (smallest margin of error, i.e. the most precise estimate).

– Linear: the OLS estimator is a linear function of Yi.

– Unbiased: the expected value of the OLS estimator equals the true
  population value:

  E(β̂i) = βi
Gauss Markov Theorem
(1) Linear means linear in the dependent variable.

β̂2 = Σ_{i=1}^{N} (Xi − X̄)(Yi − Ȳ) / Σ_{i=1}^{N} (Xi − X̄)²

may be rewritten as

β̂2 = Σ_{i=1}^{N} (Xi − X̄) Yi / Σ_{i=1}^{N} (Xi − X̄)²  =  Σ_{i=1}^{N} wi Yi,
where  wi = (Xi − X̄) / Σ_{i=1}^{N} (Xi − X̄)².

– β̂2 is therefore a linear function of Yi.
– β̂1 can similarly be written as a linear function of Yi.
Gauss Markov Theorem

(2) Unbiased
– The expected value of the estimator is the true underlying parameter:

E(β̂2) = β2

(3) Efficiency (or minimum variance)

Var(β̂2,OLS) ≤ Var(β̃2)

– Of all the linear, unbiased estimators β̃2, the OLS estimator β̂2,OLS has the
smallest variance.
Gauss Markov Theorem
(4) Consistency: an estimator is called consistent if it converges
stochastically to the true parameter value with probability approaching one
as the sample size increases indefinitely. This implies

plim_{n→∞} β̂2 = β2,        where n is the sample size,

i.e.  Pr{ |β̂2 − β2| < δ } → 1  as n → ∞, for any small δ > 0.

A sufficient condition is
(1) E(β̂2) → β2
(2) Var(β̂2) → 0
as n goes to infinity.

Similarly, the result can be generalised to β̂1. A small Monte Carlo check of
unbiasedness and consistency is sketched below.
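A minimal Monte Carlo sketch (NumPy assumed; normal errors and the population values from the earlier slides are illustrative assumptions): the mean of β̂2 across simulated samples stays near the true 0.60 at every sample size (unbiasedness), while its variance shrinks toward zero as n grows (consistency).

```python
# Minimal sketch: simulate many samples per sample size and summarise beta2_hat.
import numpy as np

rng = np.random.default_rng(2)

def beta2_hat(X, Y):
    x = X - X.mean()
    return (x * (Y - Y.mean())).sum() / (x ** 2).sum()

for n in (10, 100, 1000):
    X = np.linspace(80, 260, n)
    draws = np.array([beta2_hat(X, 17.0 + 0.60 * X + rng.normal(0.0, 11.32, size=n))
                      for _ in range(5000)])
    # mean stays near 0.60; variance shrinks toward 0 as n grows
    print(n, round(draws.mean(), 4), round(draws.var(), 6))
```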
The Overall Goodness of Fit: R²
This measure helps to determine the goodness of fit, i.e. how well the sample
regression line fits the data.

TSS = Σ(Yi − Ȳ)²   (total variability of the dependent variable Y about its mean)

ESS = Σ(Ŷi − Ȳ)²   (variability in Y explained by the sample regression)

RSS = Σ(Yi − Ŷi)²  (variability in Y left unexplained by the explanatory variable X)

[Figure: scatter plot with the fitted line; this regression line gives the minimum
RSS among all possible straight lines.]
The Overall Goodness of Fit: r2 or R2

Decomposition of the Variance of Yi

Yi = Ŷi + ûi
or, in deviation form,
yi = ŷi + ûi

Squaring this equation and summing over the sample, we obtain

Σ yi² = Σ ŷi² + Σ ûi² + 2 Σ ŷi ûi

Σ yi² = Σ ŷi² + Σ ûi²                (since Σ ŷi ûi = 0)

TSS = ESS + RSS


The Overall Goodness of Fit: r2 or R2
Then, dividing TSS = ESS + RSS through by TSS:

1 = ESS/TSS + RSS/TSS = Σ(Ŷi − Ȳ)² / Σ(Yi − Ȳ)²  +  Σ ûi² / Σ(Yi − Ȳ)²

We define r² as

r² = Σ(Ŷi − Ȳ)² / Σ(Yi − Ȳ)² = ESS/TSS

or

r² = 1 − Σ ûi² / Σ(Yi − Ȳ)² = 1 − RSS/TSS

The coefficient of determination measures the proportion or percentage of the
total variation in Y explained by the regression model.

Equivalently, TSS = ESS + RSS can be written as

Σ yi² = r² Σ yi² + (1 − r²) Σ yi²
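A minimal sketch (NumPy assumed) computing r² both ways for the example data; the two formulas give the same value, about 0.96.

```python
# Minimal sketch: r^2 as ESS/TSS and as 1 - RSS/TSS for the ten observations.
import numpy as np

X = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)
Y = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150], dtype=float)

x = X - X.mean()
beta2 = (x * (Y - Y.mean())).sum() / (x ** 2).sum()
beta1 = Y.mean() - beta2 * X.mean()
Y_hat = beta1 + beta2 * X
u_hat = Y - Y_hat

TSS = ((Y - Y.mean()) ** 2).sum()
ESS = ((Y_hat - Y.mean()) ** 2).sum()
RSS = (u_hat ** 2).sum()

print(ESS / TSS)       # r^2 = ESS / TSS
print(1 - RSS / TSS)   # r^2 = 1 - RSS / TSS  (same value)
```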
Problems with r² or R²
1. Spurious regression (see the sketch below).

2. A high R² may simply reflect a high correlation of Xt with another variable Zt.

3. Correlation does not necessarily imply causality.

4. Time-series equations almost always generate higher R² values than
cross-section equations.

5. A low R² does not mean a wrong choice of Xt.

6. R² values from equations with different functional forms of Yt are not comparable.

7. R² can be negative if the model is a bad fit, i.e. if RSS > TSS.
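An illustrative sketch of point 1 (NumPy assumed; simulated data, not from the slides): regressing one random walk on another, independent random walk frequently yields a high R² even though there is no true relationship.

```python
# Minimal spurious-regression sketch: two independent random walks.
import numpy as np

rng = np.random.default_rng(3)
T = 200
Yt = np.cumsum(rng.normal(size=T))   # random walk
Xt = np.cumsum(rng.normal(size=T))   # independent random walk

x = Xt - Xt.mean()
b2 = (x * (Yt - Yt.mean())).sum() / (x ** 2).sum()
b1 = Yt.mean() - b2 * Xt.mean()
resid = Yt - (b1 + b2 * Xt)
R2 = 1 - (resid ** 2).sum() / ((Yt - Yt.mean()) ** 2).sum()
print(R2)   # often surprisingly large for unrelated series
```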


Thanks
