Econometrics - Lecture Notes
2008/09

INTRODUCTION TO ECONOMETRICS
(ECON. 352)

HASSEN A. (M.Sc.)
JIMMA UNIVERSITY
CHAPTER ONE
INTRODUCTION
1.1 The Econometric Approach
1.2 Models, Economic Models & Econometric Models
1.3 Types of Data for Econometric Analysis
Mathematical economics: expressing economic theory in mathematical form.
Economic statistics: data presentation & description.
Mathematical statistics: estimation & testing techniques.
2008/09
ECONOMETRIC
MODEL
ECONOMIC MODEL
MODEL .
JIMMA UNIVERSITY HASSEN A. CHAPTER 1 - 8
4. Obtain data.
5. Estimate the parameters of the model: How? 3 methods!
   Suppose $\hat{C}_i = 184.08 + 0.8Y_i$.
6. Hypothesis testing: Is 0.8 statistically < 1?
7. Interpret the results & use the model for policy or forecasting:
   A 1 Br. increase in income induces an 80-cent rise in consumption, on average.
   If Y = 0, then average C = 184.08.
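A minimal sketch of steps 4-7 in Python, on simulated data (these are not the lecture's figures; the values 184.08 and 0.8 are reused only to generate the illustration):

   import numpy as np

   rng = np.random.default_rng(0)
   income = rng.uniform(500, 3000, size=50)              # hypothetical income in Birr
   consumption = 184.08 + 0.8 * income + rng.normal(0, 50, size=50)

   # Step 5: estimate the parameters of the line by OLS
   slope, intercept = np.polyfit(income, consumption, deg=1)
   print(f"C_hat = {intercept:.2f} + {slope:.3f} * Y")

   # Step 7: interpretation - a 1 Br. rise in income raises consumption
   # by roughly `slope` Birr (about 80 cents here), on average.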
CHAPTER TWO
SIMPLE LINEAR REGRESSION
2.1 The Concept of Regression Analysis
2.2 The Simple Linear Regression Model
2.3 The Method of Least Squares
2.4 Properties of Least-Squares Estimators and the Gauss-Markov Theorem
2.5 Residuals and Goodness of Fit
2.6 Confidence Intervals and Hypothesis Testing in Regression Analysis
2.7 Prediction with the Simple Linear Regression
Use the SRF $Y_i = f(\hat{Y}_i, e_i)$ to estimate the PRF $Y_i = f(E[Y|X_i], \varepsilon_i)$.

From the PRF: $Y_i = E[Y_i|X_i] + \varepsilon_i$, so $\varepsilon_i = Y_i - \alpha - \beta X_i$, since $E[Y_i|X_i] = \alpha + \beta X_i$.

From the SRF: $Y_i = \hat{Y}_i + e_i$, so $e_i = Y_i - \hat{Y}_i = Y_i - \hat{\alpha} - \hat{\beta}X_i$, since $\hat{Y}_i = \hat{\alpha} + \hat{\beta}X_i$.
[Figure: the PRF $E[Y|X_i] = \alpha + \beta X_i$ with observed points $O_1, \dots, O_4$ at $X_1, \dots, X_4$ and disturbances $\varepsilon_1, \dots, \varepsilon_4$; the intercept is $\alpha$.]

[Figure: the SRF $\hat{Y}_i = \hat{\alpha} + \hat{\beta}X_i$ drawn against the PRF $Y_i = \alpha + \beta X_i$. The disturbances $\varepsilon_i$ and the residuals $e_i$ are not identical: here $\varepsilon_1 < e_1$, $\varepsilon_2 = e_2$, $\varepsilon_3 < e_3$, $\varepsilon_4 > e_4$.]
Minimize $RSS = \sum_{i=1}^{n} e_i^2$.

We could think of minimizing RSS by successively choosing pairs of values for $\hat{\alpha}$ and $\hat{\beta}$ until RSS is made as small as possible. But we will use differential calculus (which turns out to be a lot easier).

Why the squares of the residuals? Why not just minimize the sum of the residuals? To prevent negative residuals from cancelling positive ones. Because the deviations are first squared, then summed, there are no cancellations between positive and negative values.

$\min_{\hat{\alpha},\hat{\beta}} \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n}(Y_i - \hat{\alpha} - \hat{\beta}X_i)^2$
First-order condition with respect to $\hat{\alpha}$:
$\sum_{i=1}^{n}(Y_i - \hat{\alpha} - \hat{\beta}X_i) = 0 \;\Rightarrow\; \bar{Y} - \hat{\alpha} - \hat{\beta}\bar{X} = 0 \;\Rightarrow\; \hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X}$

First-order condition with respect to $\hat{\beta}$:
$\sum_{i=1}^{n} Y_iX_i = \hat{\alpha}\sum_{i=1}^{n} X_i + \hat{\beta}\sum_{i=1}^{n} X_i^2$

Solve $\hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X}$ and $\sum Y_iX_i = \hat{\alpha}\sum X_i + \hat{\beta}\sum X_i^2$ (called the normal equations) simultaneously!

$\sum Y_iX_i = (\bar{Y} - \hat{\beta}\bar{X})\sum X_i + \hat{\beta}\sum X_i^2$
$\Rightarrow \sum Y_iX_i = \bar{Y}\sum X_i - \hat{\beta}\bar{X}\sum X_i + \hat{\beta}\sum X_i^2$
$\Rightarrow \sum Y_iX_i - \bar{Y}\sum X_i = \hat{\beta}\sum X_i^2 - \hat{\beta}\bar{X}\sum X_i$
$\Rightarrow \sum Y_iX_i - \bar{Y}\sum X_i = \hat{\beta}(\sum X_i^2 - \bar{X}\sum X_i)$

1. $\hat{\beta} = \dfrac{\sum Y_iX_i - n\bar{X}\bar{Y}}{\sum X_i^2 - n\bar{X}^2}$, because $\bar{X} = \dfrac{\sum X_i}{n} \Leftrightarrow \sum X_i = n\bar{X}$.
Alternative expressions for $\hat{\beta}$:

2. $\hat{\beta} = \dfrac{\sum(X_i - \bar{X})(Y_i - \bar{Y})}{\sum(X_i - \bar{X})^2} = \dfrac{\sum xy}{\sum x^2}$, where $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$.

For $\hat{\alpha}$ just use $\hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X}$. Or, if you wish:
$\hat{\alpha} = \bar{Y} - \bar{X}\left[\dfrac{\sum Y_iX_i - n\bar{X}\bar{Y}}{\sum X_i^2 - n\bar{X}^2}\right]$
$\Rightarrow \hat{\alpha} = \dfrac{[\sum X_i^2 - n\bar{X}^2]\bar{Y} - \bar{X}[\sum Y_iX_i - n\bar{X}\bar{Y}]}{\sum X_i^2 - n\bar{X}^2}$
$\Rightarrow \hat{\alpha} = \dfrac{\bar{Y}\sum X_i^2 - n\bar{X}^2\bar{Y} - \bar{X}\sum Y_iX_i + n\bar{X}^2\bar{Y}}{\sum X_i^2 - n\bar{X}^2}$
$\Rightarrow \hat{\alpha} = \dfrac{\bar{Y}\sum X_i^2 - \bar{X}\sum Y_iX_i}{\sum X_i^2 - n\bar{X}^2} = \dfrac{(\sum Y_i)(\sum X_i^2) - (\sum X_i)(\sum Y_iX_i)}{n(\sum X_i^2 - n\bar{X}^2)}$
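A minimal sketch of these two formulas in Python, on made-up (X, Y) arrays; the result is cross-checked against numpy's own least-squares fit:

   import numpy as np

   X = np.array([5., 6., 7., 8., 9., 10., 11.])
   Y = np.array([7.1, 8.0, 8.6, 9.9, 10.2, 11.4, 11.9])

   x = X - X.mean()                                  # x_i = X_i - X_bar
   y = Y - Y.mean()                                  # y_i = Y_i - Y_bar
   beta_hat = (x * y).sum() / (x ** 2).sum()         # beta_hat = sum(xy) / sum(x^2)
   alpha_hat = Y.mean() - beta_hat * X.mean()        # alpha_hat = Y_bar - beta_hat * X_bar

   assert np.allclose([beta_hat, alpha_hat], np.polyfit(X, Y, 1))
   print(alpha_hat, beta_hat)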
2. $\sum_{i=1}^{n}[(Y_i - \hat{\alpha} - \hat{\beta}X_i)X_i] = 0$; equivalently, $\sum e_iX_i = 0$.

[Figure: scatter of Y against X with the fitted line $\hat{Y}_i = \hat{\alpha} + \hat{\beta}X_i$.]

Worked example (n = 10). [Table of $Y_i$, $y_i = Y_i - \bar{Y}$, $y_i^2$, $x_i = X_i - \bar{X}$, $x_i^2$; column sums: $\sum y_i = 0$, $\sum y_i^2 = 30.4$, $\sum x_i = 0$, $\sum x_i^2 = 28$.]

With $\bar{X} = 8$, $\bar{Y} = 9.6$ and $\sum x_iy_i = 21$:
$\hat{\beta} = \dfrac{\sum x_iy_i}{\sum x_i^2} = \dfrac{21}{28} = 0.75$  and  $\hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X} = 9.6 - 0.75(8) = 3.6$.

[Table of fitted values and residuals; column sums: $\sum \hat{Y}_i = 96$, $\sum e_i = 0$, $\sum e_i^2 = 14.65$.]
Linearity of $\hat{\beta}$:
$\hat{\beta} = \dfrac{\sum x_i(Y_i - \bar{Y})}{\sum x_i^2} = \dfrac{\sum x_iY_i}{\sum x_i^2} - \dfrac{\bar{Y}\sum x_i}{\sum x_i^2} = \dfrac{\sum x_iY_i}{\sum x_i^2}$ (since $\sum x_i = 0$)
$\Rightarrow \hat{\beta} = k_1Y_1 + k_2Y_2 + \dots + k_nY_n$, where $k_i = \dfrac{x_i}{\sum x_i^2}$.

Note that:
(1) $\sum x_i^2$ is a constant.
(2) Because $x_i$ is non-stochastic, $k_i$ is also non-stochastic.
(3) $\sum k_i = \sum\left(\dfrac{x_i}{\sum x_i^2}\right) = \dfrac{\sum x_i}{\sum x_i^2} = 0$.
(4) $\sum k_ix_i = \sum\left(\dfrac{x_i}{\sum x_i^2}\right)x_i = \dfrac{\sum x_i^2}{\sum x_i^2} = 1$.
(5) $\sum k_i^2 = \sum\left(\dfrac{x_i}{\sum x_i^2}\right)^2 = \dfrac{\sum x_i^2}{(\sum x_i^2)^2} = \dfrac{1}{\sum x_i^2}$.
(6) $\sum k_iX_i = \sum\left(\dfrac{x_i}{\sum x_i^2}\right)X_i = \sum\left(\dfrac{x_i}{\sum x_i^2}\right)(x_i + \bar{X}) = \dfrac{\sum x_i^2}{\sum x_i^2} + \dfrac{\bar{X}\sum x_i}{\sum x_i^2} = 1$.
Unbiasedness of $\hat{\beta}$ (using $\hat{\beta} = \beta + \sum k_i\varepsilon_i$):
$E(\hat{\beta}) = E(\beta) + E(k_1\varepsilon_1 + k_2\varepsilon_2 + \dots + k_n\varepsilon_n)$
$E(\hat{\beta}) = E(\beta) + (\sum k_i)\,E(\varepsilon_i)$
$E(\hat{\beta}) = \beta + (\sum k_i)(0)$
$E(\hat{\beta}) = \beta$

Variance of $\hat{\beta}$:
$\mathrm{var}(\hat{\beta}) = k_1^2\sigma^2 + k_2^2\sigma^2 + \dots + k_n^2\sigma^2 = \sigma^2\sum k_i^2 = \dfrac{\sigma^2}{\sum x_i^2}$

For any other linear unbiased estimator $\tilde{\beta} = \sum w_iY_i$:
$\mathrm{var}(\tilde{\beta}) = w_1^2\sigma^2 + w_2^2\sigma^2 + \dots + w_n^2\sigma^2 = \sigma^2\sum w_i^2$

$\Rightarrow \mathrm{var}(\tilde{\beta}) > \mathrm{var}(\hat{\beta})$. We have $\mathrm{var}(\tilde{\beta}) = \mathrm{var}(\hat{\beta})$ if and only if all the $d_i$'s (where $d_i = w_i - k_i$) are zero and thus $\sum d_i^2 = 0$.
Linearity of $\hat{\alpha}$: $\hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X}$
$\Rightarrow \hat{\alpha} = \bar{Y} - \bar{X}\{\sum k_iY_i\}$
$\Rightarrow \hat{\alpha} = \bar{Y} - \bar{X}\{k_1Y_1 + k_2Y_2 + \dots + k_nY_n\}$
$\Rightarrow \hat{\alpha} = \tfrac{1}{n}(Y_1 + Y_2 + \dots + Y_n) - \{\bar{X}k_1Y_1 + \bar{X}k_2Y_2 + \dots + \bar{X}k_nY_n\}$
$\Rightarrow \hat{\alpha} = (\tfrac{1}{n} - \bar{X}k_1)Y_1 + (\tfrac{1}{n} - \bar{X}k_2)Y_2 + \dots + (\tfrac{1}{n} - \bar{X}k_n)Y_n$
$\Rightarrow \hat{\alpha} = f_1Y_1 + f_2Y_2 + \dots + f_nY_n$, where $f_i = \tfrac{1}{n} - \bar{X}k_i$.

Variance of $\hat{\alpha}$:
$\mathrm{var}(\hat{\alpha}) = f_1^2\sigma^2 + f_2^2\sigma^2 + \dots + f_n^2\sigma^2 = \sigma^2\sum f_i^2$
$\mathrm{var}(\hat{\alpha}) = \sigma^2\left\{\sum\left(\tfrac{1}{n^2} + \bar{X}^2k_i^2 - \tfrac{2}{n}\bar{X}k_i\right)\right\}$
$\mathrm{var}(\hat{\alpha}) = \sigma^2\left\{\tfrac{1}{n} + \bar{X}^2\sum k_i^2 - \tfrac{2}{n}\bar{X}\sum k_i\right\}$
$\mathrm{var}(\hat{\alpha}) = \sigma^2\left\{\tfrac{1}{n} + \bar{X}^2\sum k_i^2\right\} = \sigma^2\left(\tfrac{1}{n} + \dfrac{\bar{X}^2}{\sum x_i^2}\right)$,  or  $\mathrm{var}(\hat{\alpha}) = \sigma^2\dfrac{\sum X_i^2}{n\sum x_i^2}$.

Note that:
$\sum f_i = \sum(\tfrac{1}{n} - \bar{X}k_i) = 1 - \bar{X}\sum k_i = 1$  and  $\sum f_i^2 = \tfrac{1}{n} + \dfrac{\bar{X}^2}{\sum x_i^2}$.

For any other linear unbiased estimator $\tilde{\alpha} = \sum z_iY_i$:
$\mathrm{var}(\tilde{\alpha}) = z_1^2\sigma^2 + z_2^2\sigma^2 + \dots + z_n^2\sigma^2 = \sigma^2\sum z_i^2$

Let us now compare $\mathrm{var}(\hat{\alpha})$ and $\mathrm{var}(\tilde{\alpha})$! Suppose $z_i \neq f_i$, and let the relationship between them be given by $d_i = z_i - f_i$.
$\sum d_i^2 = \sum z_i^2 + \sum f_i^2 - 2\sum z_if_i = \sum z_i^2 + \sum f_i^2 - 2\left\{\sum\left[z_i\left(\tfrac{1}{n} - \bar{X}\dfrac{x_i}{\sum x_i^2}\right)\right]\right\}$
$\Rightarrow \sum d_i^2 = \sum z_i^2 + \sum f_i^2 - 2\left\{\tfrac{1}{n}\sum z_i - \dfrac{\bar{X}}{\sum x_i^2}\sum z_ix_i\right\}$
$\Rightarrow \sum d_i^2 = \sum z_i^2 + \sum f_i^2 - 2\left\{\tfrac{1}{n} - \dfrac{\bar{X}}{\sum x_i^2}(-\bar{X})\right\}$
(using $\sum z_i = 1$ and $\sum z_ix_i = -\bar{X}$, which follow from the unbiasedness of $\tilde{\alpha}$)
$\Rightarrow \sum d_i^2 = \sum z_i^2 + \sum f_i^2 - 2\sum f_i^2$
$\Rightarrow \sum d_i^2 = \sum z_i^2 - \sum f_i^2$
$\Rightarrow \sum z_i^2 = \sum d_i^2 + \sum f_i^2$
$\Rightarrow \sum z_i^2 > \sum f_i^2$
$\Rightarrow \sigma^2\sum z_i^2 > \sigma^2\sum f_i^2$
$\Rightarrow \mathrm{var}(\tilde{\alpha}) > \mathrm{var}(\hat{\alpha})$. We have $\mathrm{var}(\tilde{\alpha}) = \mathrm{var}(\hat{\alpha})$ if and only if all the $d_i$'s and $\sum d_i^2$ are zero.
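A small Monte Carlo sketch of the Gauss-Markov result: with fixed X's, both the OLS slope and a crude "endpoint" estimator $(Y_n - Y_1)/(X_n - X_1)$ are linear and unbiased, but the OLS slope has the smaller sampling variance. The data-generating values (alpha = 2, beta = 0.5, sigma = 1) are arbitrary choices for the illustration:

   import numpy as np

   rng = np.random.default_rng(1)
   X = np.linspace(1, 20, 20)                 # fixed regressor values
   alpha, beta, sigma = 2.0, 0.5, 1.0
   x = X - X.mean()

   ols, endpoint = [], []
   for _ in range(20000):
       Y = alpha + beta * X + rng.normal(0, sigma, X.size)
       ols.append((x * (Y - Y.mean())).sum() / (x ** 2).sum())
       endpoint.append((Y[-1] - Y[0]) / (X[-1] - X[0]))

   print("mean  OLS %.3f  endpoint %.3f" % (np.mean(ols), np.mean(endpoint)))
   print("var   OLS %.5f  endpoint %.5f" % (np.var(ols), np.var(endpoint)))
   # Both means are close to 0.5; the OLS variance is clearly the smaller one.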
$\sum(Y_i - \bar{Y})^2 = \sum(\hat{Y}_i - \bar{Y} + e_i)^2$
$\sum y_i^2 = \sum(\hat{y}_i + e_i)^2$
$\sum y_i^2 = \sum\hat{y}_i^2 + \sum e_i^2 + 2\sum\hat{y}_ie_i$
$\Rightarrow$ TSS = ESS + RSS (since $\sum\hat{y}_ie_i = 0$).

2. $R^2 = \dfrac{ESS}{TSS} = \dfrac{\sum(\hat{\beta}x_i)^2}{\sum y^2} = \hat{\beta}^2\dfrac{\sum x^2}{\sum y^2}$

The OLS regression coefficients are chosen in such a way as to minimize the sum of the squares of the residuals. Thus it automatically follows that they maximize $R^2$.

TSS = ESS + RSS $\Rightarrow 1 = \dfrac{ESS}{TSS} + \dfrac{RSS}{TSS} \Rightarrow \dfrac{ESS}{TSS} = 1 - \dfrac{RSS}{TSS}$
$\Rightarrow$ 3. $R^2 = 1 - \dfrac{\sum e_i^2}{\sum y^2}$

4. $R^2 = \dfrac{ESS}{TSS} = \hat{\beta}\dfrac{\sum xy}{\sum y^2} = \dfrac{15.75}{30.4} = 0.5181$

$R^2 = \dfrac{\sum xy}{\sum x^2}\cdot\dfrac{\sum xy}{\sum y^2}$
$\Rightarrow$ 5. $R^2 = \dfrac{(\sum xy)^2}{\sum x^2\sum y^2}$   $\Rightarrow$ 6. $R^2 = \dfrac{[\mathrm{cov}(X,Y)]^2}{\mathrm{var}(X)\times\mathrm{var}(Y)}$

Note: $RSS = (1 - R^2)\sum y^2$.
To sum up:
Use $\hat{Y}_i = \hat{\alpha} + \hat{\beta}X_i$ to estimate $E[Y|X_i] = \alpha + \beta X_i$.
OLS: $\min_{\hat{\alpha},\hat{\beta}} \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n}(Y_i - \hat{\alpha} - \hat{\beta}X_i)^2$
$\hat{\beta} = \dfrac{\sum xy}{\sum x^2}$,  $\hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X}$

Given the assumptions of the linear regression model, the estimators $\hat{\alpha}$ and $\hat{\beta}$ have the smallest variance of all linear and unbiased estimators of $\alpha$ and $\beta$.
$\mathrm{var}(\hat{\beta}) = \dfrac{\sigma^2}{\sum x_i^2}$,  $\mathrm{var}(\hat{\alpha}) = \sigma^2\left(\dfrac{1}{n} + \dfrac{\bar{X}^2}{\sum x_i^2}\right) = \sigma^2\dfrac{\sum X_i^2}{n\sum x_i^2}$

To sum up (continued):
$\sum y_i^2 = \sum\hat{y}_i^2 + \sum e_i^2$, i.e. TSS = ESS + RSS
$R^2 = \dfrac{ESS}{TSS} = \dfrac{\sum\hat{y}^2}{\sum y^2}$,  with $\sum\hat{y}^2 = \hat{\beta}\sum xy = \hat{\beta}^2\sum x^2$

For the worked example:
$\mathrm{var}(\hat{\beta}) = \dfrac{\sigma^2}{\sum x_i^2} = \dfrac{\sigma^2}{28} \approx 0.0357\,\sigma^2$
$\mathrm{var}(\hat{\alpha}) = \sigma^2\left(\dfrac{1}{n} + \dfrac{\bar{X}^2}{\sum x_i^2}\right) = \sigma^2\left(\dfrac{1}{10} + \dfrac{64}{28}\right) \approx 2.3857\,\sigma^2$
$E(RSS) = E(\sum e_i^2) = (n-2)\sigma^2$

Thus, if we define $\hat{\sigma}^2 = \dfrac{\sum e_i^2}{n-2}$, then:
$E(\hat{\sigma}^2) = \left(\dfrac{1}{n-2}\right)E(\sum e_i^2) = \left(\dfrac{1}{n-2}\right)(n-2)\sigma^2 = \sigma^2$
$\Rightarrow \hat{\sigma}^2 = \dfrac{\sum e_i^2}{n-2}$ is an unbiased estimator of $\sigma^2$.
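A sketch pulling the pieces together in Python on a simulated sample (illustrative values, not the lecture's example): residuals, R-squared, the unbiased variance estimator with n - 2 degrees of freedom, and the two standard errors:

   import numpy as np

   rng = np.random.default_rng(2)
   n = 10
   X = rng.uniform(2, 14, n)
   Y = 3.6 + 0.75 * X + rng.normal(0, 1.2, n)

   x, y = X - X.mean(), Y - Y.mean()
   beta_hat = (x * y).sum() / (x ** 2).sum()
   alpha_hat = Y.mean() - beta_hat * X.mean()
   e = Y - (alpha_hat + beta_hat * X)                 # residuals

   TSS = (y ** 2).sum()
   RSS = (e ** 2).sum()
   R2 = 1 - RSS / TSS                                  # R^2 = 1 - RSS/TSS
   sigma2_hat = RSS / (n - 2)                          # unbiased estimator of sigma^2
   se_beta = np.sqrt(sigma2_hat / (x ** 2).sum())
   se_alpha = np.sqrt(sigma2_hat * (1 / n + X.mean() ** 2 / (x ** 2).sum()))
   print(R2, sigma2_hat, se_alpha, se_beta)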
$\varepsilon_i \sim N(0, \sigma^2) \;\Rightarrow\; Y_i \sim N(\alpha + \beta X_i,\ \sigma^2)$

$\hat{\beta} \sim N\!\left(\beta,\ \dfrac{\sigma^2}{\sum x_i^2}\right)$,  $\hat{\alpha} \sim N\!\left(\alpha,\ \sigma^2\dfrac{\sum X_i^2}{n\sum x_i^2}\right)$

$\dfrac{\hat{\alpha} - \alpha}{\hat{se}(\hat{\alpha})} \sim t_{n-2}$, where $\hat{se}(\hat{\alpha}) = \hat{\sigma}\sqrt{\dfrac{\sum X_i^2}{n\sum x_i^2}}$

$\dfrac{\hat{\beta} - \beta}{\sigma/\sqrt{\sum x_i^2}} \sim N(0,1)$,  $\dfrac{\hat{\beta} - \beta}{\hat{se}(\hat{\beta})} \sim t_{n-2}$, where $\hat{se}(\hat{\beta}) = \dfrac{\hat{\sigma}}{\sqrt{\sum x_i^2}}$ and $\hat{\sigma}^2 = \dfrac{\sum e_i^2}{n-2}$.
Confidence intervals for $\alpha$ and $\beta$:
$P\left\{-t^{n-2}_{\alpha/2} \le \dfrac{\hat{\alpha} - \alpha}{\hat{se}(\hat{\alpha})} \le t^{n-2}_{\alpha/2}\right\} = 1 - \alpha$
$\Rightarrow$ 100(1 − α)% two-sided CI for $\alpha$: $\hat{\alpha} \pm (t^{n-2}_{\alpha/2})\,\hat{se}(\hat{\alpha})$
Similarly, 100(1 − α)% two-sided CI for $\beta$: $\hat{\beta} \pm (t^{n-2}_{\alpha/2})\,\hat{se}(\hat{\beta})$

CI for $\sigma^2$:
$P\{\chi^2_{1-(\alpha/2);df} \le \chi^2_{df} \le \chi^2_{(\alpha/2);df}\} = 1 - \alpha$
$\Rightarrow P\left\{\chi^2_{1-(\alpha/2);(n-2)} \le \dfrac{(n-2)\hat{\sigma}^2}{\sigma^2} \le \chi^2_{(\alpha/2);(n-2)}\right\} = 1 - \alpha$
$\Rightarrow P\left\{\dfrac{1}{\chi^2_{1-(\alpha/2);(n-2)}} \ge \dfrac{\sigma^2}{(n-2)\hat{\sigma}^2} \ge \dfrac{1}{\chi^2_{(\alpha/2);(n-2)}}\right\} = 1 - \alpha$
$\Rightarrow P\left\{\dfrac{1}{\chi^2_{(\alpha/2);(n-2)}} \le \dfrac{\sigma^2}{(n-2)\hat{\sigma}^2} \le \dfrac{1}{\chi^2_{1-(\alpha/2);(n-2)}}\right\} = 1 - \alpha$

CI for $\sigma^2$ (continued):
$\Rightarrow P\left\{\dfrac{(n-2)\hat{\sigma}^2}{\chi^2_{(\alpha/2);n-2}} \le \sigma^2 \le \dfrac{(n-2)\hat{\sigma}^2}{\chi^2_{1-(\alpha/2);n-2}}\right\} = 1 - \alpha$
$\Rightarrow$ 100(1 − α)% two-sided CI for $\sigma^2$:
$\left[\dfrac{(n-2)\hat{\sigma}^2}{\chi^2_{(\alpha/2);n-2}},\ \dfrac{(n-2)\hat{\sigma}^2}{\chi^2_{1-(\alpha/2);n-2}}\right]$  OR  $\left[\dfrac{RSS}{\chi^2_{(\alpha/2);n-2}},\ \dfrac{RSS}{\chi^2_{1-(\alpha/2);n-2}}\right]$
95% CI for $\alpha$ and $\beta$: $1 - \alpha = 0.95 \Rightarrow \alpha = 0.05 \Rightarrow \alpha/2 = 0.025$

95% CI for $\alpha$: $3.6 \pm (t^{8}_{0.025})(2.09) = 3.6 \pm (2.306)(2.09) = 3.6 \pm 4.8195$: $[-1.2195,\ 8.4195]$

95% CI for $\beta$: $0.75 \pm (t^{8}_{0.025})(0.256) = 0.75 \pm (2.306)(0.256) = 0.75 \pm 0.5903$: $[0.1597,\ 1.3403]$

$\chi^2_{\alpha/2;n-2}$: $\chi^2_{0.025;8} = 17.5$;  $\chi^2_{1-(\alpha/2);n-2}$: $\chi^2_{0.975;8} = 2.18$
$\Rightarrow$ 95% CI for $\sigma^2$: $\left[\dfrac{14.65}{17.5},\ \dfrac{14.65}{2.18}\right] = [0.84,\ 6.72]$
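A sketch reproducing these intervals from the example's own numbers (alpha_hat = 3.6, beta_hat = 0.75, standard errors 2.09 and 0.256, RSS = 14.65, n = 10), using scipy only for the t and chi-square critical values:

   from scipy import stats

   df = 8
   t_crit = stats.t.ppf(0.975, df)                          # about 2.306
   print("alpha:", 3.6 - t_crit * 2.09, 3.6 + t_crit * 2.09)       # about [-1.22, 8.42]
   print("beta :", 0.75 - t_crit * 0.256, 0.75 + t_crit * 0.256)   # about [0.16, 1.34]

   RSS = 14.65
   chi_hi = stats.chi2.ppf(0.975, df)                       # about 17.5
   chi_lo = stats.chi2.ppf(0.025, df)                       # about 2.18
   print("sigma^2:", RSS / chi_hi, RSS / chi_lo)            # about [0.84, 6.72]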
$\mathrm{var}(\hat{Y}_P - Y_P) = \sigma^2\dfrac{\sum X_i^2}{n\sum x_i^2} + \dfrac{\sigma^2X_0^2}{\sum x_i^2} - \dfrac{2X_0\sigma^2\bar{X}}{\sum x_i^2} + \sigma^2$

$\mathrm{var}(\hat{Y}_P - Y_P) = \sigma^2\left[1 + \dfrac{1}{n} + \dfrac{(X_0 - \bar{X})^2}{\sum x_i^2}\right]$  (predicting an individual value of Y at $X_0$)

$\Rightarrow \mathrm{var}(\hat{Y}_P^* - Y_P^*) = \sigma^2\left[\dfrac{1}{n} + \dfrac{(X_0 - \bar{X})^2}{\sum x_i^2}\right]$  (predicting the average value of Y, given $X_0$)

Again, the variance increases the farther away the value of $X_0$ is from $\bar{X}$.
The variance (the standard error) of the prediction error is smaller in this case (of predicting the average value of Y, given X) than that of predicting an individual value of Y, given X.
$\hat{se}(\hat{Y}_P) = \hat{\sigma}\sqrt{1 + \dfrac{1}{n} + \dfrac{(X_0 - \bar{X})^2}{\sum x_i^2}} = 1.35\sqrt{1 + \dfrac{1}{10} + \dfrac{(6-8)^2}{28}} = 1.35(1.115) \approx 1.505$

Point prediction:
[Average sales | advertising of 600 Birr] = 8,100 Birr.

Interval prediction (95% CI):
$\hat{se}(\hat{Y}_P^*) = \hat{\sigma}\sqrt{\dfrac{1}{n} + \dfrac{(X_0 - \bar{X})^2}{\sum x_i^2}} = 1.35\sqrt{\dfrac{1}{10} + \dfrac{(6-8)^2}{28}} = 1.35(0.493) \approx 0.667$
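A minimal sketch of the two prediction standard errors, using the example's numbers (sigma_hat = 1.35, n = 10, X_bar = 8, sum of squared deviations 28, X_0 = 6):

   import math

   sigma_hat, n, X_bar, sum_x2, X0 = 1.35, 10, 8.0, 28.0, 6.0
   core = 1 / n + (X0 - X_bar) ** 2 / sum_x2

   se_mean = sigma_hat * math.sqrt(core)          # predicting average Y at X_0: ~0.667
   se_indiv = sigma_hat * math.sqrt(1 + core)     # predicting an individual Y:  ~1.505
   print(se_mean, se_indiv)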
1. $Y = \alpha + \beta X + \varepsilon \Rightarrow dY = \beta\,dX \Rightarrow \beta = \dfrac{dY}{dX}$ = slope.
$\beta$ is the (average) change in Y resulting from a unit change in X.

2. $Y = e^{\alpha + \beta X + \varepsilon} \Rightarrow \ln Y = \alpha + \beta X + \varepsilon$
$\Rightarrow d(\ln Y) = \beta\,dX \Rightarrow \dfrac{1}{Y}dY = \beta\,dX \Rightarrow \beta = \dfrac{(dY/Y)}{dX} = \dfrac{\text{relative } \Delta \text{ in } Y}{\text{absolute } \Delta \text{ in } X}$
$\Rightarrow \beta(\times 100) = \dfrac{(dY/Y)\times 100}{dX} = \dfrac{\text{\%age } \Delta \text{ in } Y}{dX}$
$\Rightarrow$ %age Δ in Y = $\beta\cdot dX\ (\times 100)$.
$\beta(\times 100)$ is the (average) percentage change in Y resulting from a unit change in X.

3. $\ln Y = \alpha + \beta\ln X + \varepsilon$
$\Rightarrow \beta = \dfrac{d(\ln Y)}{d(\ln X)} = \dfrac{dY/Y}{dX/X} = \dfrac{\text{\%age } \Delta \text{ in } Y}{\text{\%age } \Delta \text{ in } X}$ = elasticity.
$\beta$ is the (average) percentage change in Y resulting from a percentage change in X.
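A quick sketch of the log-log (elasticity) case: simulated data are generated with a known elasticity (0.8, an arbitrary choice) and it is recovered by regressing ln(Y) on ln(X):

   import numpy as np

   rng = np.random.default_rng(3)
   X = rng.uniform(1, 50, 200)
   elasticity = 0.8
   Y = np.exp(0.5 + elasticity * np.log(X) + rng.normal(0, 0.05, X.size))

   slope, intercept = np.polyfit(np.log(X), np.log(Y), 1)
   print("estimated elasticity:", slope)   # close to 0.8: a 1% rise in X raises Y by ~0.8%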
STATA SESSION
CHAPTER THREE
MULTIPLE LINEAR REGRESSION

3.6 Statistical Inferences in Multiple Linear Regression
3.7 Prediction with Multiple Linear Regression
3.1 Introduction: The Multiple Linear Regression
The relationship between a dependent variable & two or more independent variables is a linear function:

$Y_i = \beta_0 + \beta_1X_{1i} + \beta_2X_{2i} + \dots + \beta_KX_{Ki} + \varepsilon_i$
(population Y-intercept $\beta_0$, population slopes $\beta_1, \dots, \beta_K$, random error $\varepsilon_i$)

$Y_i = \hat{\beta}_0 + \hat{\beta}_1X_{1i} + \hat{\beta}_2X_{2i} + \dots + \hat{\beta}_KX_{Ki} + e_i$
(dependent/response variable, independent/explanatory variables, and residual $e_i$, for the sample)

What changes as we move from simple to multiple regression?
1. Potentially more explanatory power with more variables;
2. The ability to control for other variables (and the interaction of the various explanatory variables: correlations and multicollinearity);
3. Harder to visualize: drawing a line through three or more (n)-dimensional space;
4. The $R^2$ is no longer simply the square of the correlation coefficient between Y and X.
Slope ($\beta_j$): Ceteris paribus, Y changes by $\beta_j$ for every 1-unit change in $X_j$, on average.
Y-intercept ($\beta_0$): The average value of Y when all $X_j$'s are zero. (May not be meaningful all the time.)
A multiple linear regression model is defined to be linear in the regression parameters rather than in the variables.

3.2 Assumptions of the Multiple Linear Regression
6. n > K+1. (Number of observations > number of parameters to be estimated.) The number of parameters is K+1 in this case ($\beta_0, \beta_1, \dots, \beta_K$).
7. $\varepsilon_i \sim N(0, \sigma^2)$. Normally distributed errors.

Additional assumption:
8. No perfect multicollinearity: that is, no exact linear relation exists between any subset of explanatory variables.
In the presence of a perfect (deterministic) linear relationship between/among any set of the $X_j$'s, the impact of a single variable ($\beta_j$) cannot be identified.
More on multicollinearity in a later chapter!

3.3 Estimation: The Method of OLS
$\begin{pmatrix}\sum y_ix_{1i}\\ \sum y_ix_{2i}\end{pmatrix} = \begin{bmatrix}\sum x_{1i}^2 & \sum x_{1i}x_{2i}\\ \sum x_{2i}x_{1i} & \sum x_{2i}^2\end{bmatrix}\begin{pmatrix}\hat{\beta}_1\\ \hat{\beta}_2\end{pmatrix}$,  i.e.  $F = A\cdot\hat{\beta}$

Solve for the coefficients. Determinant: $|A| = (\sum x_{1i}^2)(\sum x_{2i}^2) - (\sum x_{1i}x_{2i})^2$

$|A_1| = \begin{vmatrix}\sum y_ix_{1i} & \sum x_{1i}x_{2i}\\ \sum y_ix_{2i} & \sum x_{2i}^2\end{vmatrix} = (\sum y_ix_{1i})(\sum x_{2i}^2) - (\sum x_{1i}x_{2i})(\sum y_ix_{2i})$,  $\hat{\beta}_1 = \dfrac{|A_1|}{|A|}$

$|A_2| = \begin{vmatrix}\sum x_{1i}^2 & \sum y_ix_{1i}\\ \sum x_{1i}x_{2i} & \sum y_ix_{2i}\end{vmatrix} = (\sum y_ix_{2i})(\sum x_{1i}^2) - (\sum x_{1i}x_{2i})(\sum y_ix_{1i})$,  $\hat{\beta}_2 = \dfrac{|A_2|}{|A|}$

$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{X}_1 - \hat{\beta}_2\bar{X}_2$
3.3 Estimation: The Method of OLS (matrix form)

$Y = X\hat{\beta} + e$, with Y of order $n\times 1$, X of order $n\times(K+1)$, $\hat{\beta}$ of order $(K+1)\times 1$, and e of order $n\times 1$:

$\begin{bmatrix}e_1\\ e_2\\ e_3\\ \vdots\\ e_n\end{bmatrix} = \begin{bmatrix}Y_1\\ Y_2\\ Y_3\\ \vdots\\ Y_n\end{bmatrix} - \begin{bmatrix}1 & X_{11} & X_{21} & X_{31} & \dots & X_{K1}\\ 1 & X_{12} & X_{22} & X_{32} & \dots & X_{K2}\\ 1 & X_{13} & X_{23} & X_{33} & \dots & X_{K3}\\ \vdots & & & & & \vdots\\ 1 & X_{1n} & X_{2n} & X_{3n} & \dots & X_{Kn}\end{bmatrix}\begin{bmatrix}\hat{\beta}_0\\ \hat{\beta}_1\\ \hat{\beta}_2\\ \vdots\\ \hat{\beta}_K\end{bmatrix}$

i.e., $e = Y - X\hat{\beta}$.
$RSS = \sum e_i^2 = e_1^2 + e_2^2 + \dots + e_n^2 = (e_1\ e_2\ \dots\ e_n)\begin{pmatrix}e_1\\ e_2\\ \vdots\\ e_n\end{pmatrix} \;\Rightarrow\; RSS = e'e$

$RSS = (Y - X\hat{\beta})'(Y - X\hat{\beta}) = Y'Y - Y'X\hat{\beta} - \hat{\beta}'X'Y + \hat{\beta}'X'X\hat{\beta}$
Since $Y'X\hat{\beta}$ is a constant (a scalar), $Y'X\hat{\beta} = (Y'X\hat{\beta})' = \hat{\beta}'X'Y$.
$\Rightarrow RSS = Y'Y - 2\hat{\beta}'X'Y + \hat{\beta}'(X'X)\hat{\beta}$

F.O.C.: $\dfrac{\partial(RSS)}{\partial\hat{\beta}} = 0 \;\Rightarrow\; -2X'Y + 2X'X\hat{\beta} = 0 \;\Rightarrow\; -2X'(Y - X\hat{\beta}) = 0$
$\Rightarrow X'e = 0$:  $\begin{bmatrix}1 & 1 & \dots & 1\\ X_{11} & X_{12} & \dots & X_{1n}\\ X_{21} & X_{22} & \dots & X_{2n}\\ \vdots & & & \vdots\\ X_{K1} & X_{K2} & \dots & X_{Kn}\end{bmatrix}\begin{pmatrix}e_1\\ e_2\\ \vdots\\ e_n\end{pmatrix} = \begin{pmatrix}0\\ 0\\ \vdots\\ 0\end{pmatrix}$

1. $\sum e_i = 0$   2. $\sum e_iX_{ji} = 0$  $(j = 1, 2, \dots, K)$

$\hat{\beta} = \begin{pmatrix}\hat{\beta}_0\\ \hat{\beta}_1\\ \vdots\\ \hat{\beta}_K\end{pmatrix}$,  $X'X = \begin{bmatrix}1 & 1 & \dots & 1\\ X_{11} & X_{12} & \dots & X_{1n}\\ \vdots & & & \vdots\\ X_{K1} & X_{K2} & \dots & X_{Kn}\end{bmatrix}\begin{bmatrix}1 & X_{11} & \dots & X_{K1}\\ 1 & X_{12} & \dots & X_{K2}\\ \vdots & & & \vdots\\ 1 & X_{1n} & \dots & X_{Kn}\end{bmatrix} = \begin{bmatrix}n & \sum X_1 & \dots & \sum X_K\\ \sum X_1 & \sum X_1^2 & \dots & \sum X_1X_K\\ \vdots & & & \vdots\\ \sum X_K & \sum X_KX_1 & \dots & \sum X_K^2\end{bmatrix}$

$X'Y = \begin{bmatrix}1 & 1 & \dots & 1\\ X_{11} & X_{12} & \dots & X_{1n}\\ \vdots & & & \vdots\\ X_{K1} & X_{K2} & \dots & X_{Kn}\end{bmatrix}\begin{bmatrix}Y_1\\ Y_2\\ \vdots\\ Y_n\end{bmatrix} = \begin{bmatrix}\sum Y\\ \sum YX_1\\ \vdots\\ \sum YX_K\end{bmatrix}$
$\hat{\beta} = (X'X)^{-1}(X'Y)$:

$\begin{pmatrix}\hat{\beta}_0\\ \hat{\beta}_1\\ \hat{\beta}_2\\ \vdots\\ \hat{\beta}_K\end{pmatrix} = \begin{bmatrix}n & \sum X_1 & \sum X_2 & \dots & \sum X_K\\ \sum X_1 & \sum X_1^2 & \sum X_1X_2 & \dots & \sum X_1X_K\\ \sum X_2 & \sum X_2X_1 & \sum X_2^2 & \dots & \sum X_2X_K\\ \vdots & & & & \vdots\\ \sum X_K & \sum X_KX_1 & \sum X_KX_2 & \dots & \sum X_K^2\end{bmatrix}^{-1}\begin{pmatrix}\sum Y\\ \sum YX_1\\ \sum YX_2\\ \vdots\\ \sum YX_K\end{pmatrix}$

(a $(K+1)\times 1$ vector)
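A minimal sketch of $\hat{\beta} = (X'X)^{-1}X'Y$ with numpy: build X with a column of ones, solve the normal equations, and check that $X'e = 0$. The data are simulated for the illustration:

   import numpy as np

   rng = np.random.default_rng(4)
   n = 30
   X1, X2 = rng.normal(5, 2, n), rng.normal(10, 3, n)
   Y = 1.0 + 2.0 * X1 - 0.5 * X2 + rng.normal(0, 1, n)

   X = np.column_stack([np.ones(n), X1, X2])          # n x (K+1) design matrix
   beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)       # (X'X)^-1 X'Y
   e = Y - X @ beta_hat

   print(beta_hat)
   print(X.T @ e)        # numerically zero: the normal equations X'e = 0 hold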
Partial effect: holding the other variable constant, or after eliminating the effect of the other variable.
Thus, $\hat{\beta}_1$ is interpreted as measuring the effect of $X_1$ on Y after eliminating the effect of $X_2$ on $X_1$.

3.5 Partial Correlations and Coefficients of Determination

Let $e_{12,i} = x_{1i} - b_{12}x_{2i}$ be the residual from regressing $x_1$ on $x_2$. Then:
$b_{ye} = \dfrac{\sum y\,e_{12}}{\sum e_{12}^2} = \dfrac{\sum y(x_1 - b_{12}x_2)}{\sum(x_1 - b_{12}x_2)^2} = \dfrac{\sum yx_1 - b_{12}\sum yx_2}{\sum x_1^2 + b_{12}^2\sum x_2^2 - 2b_{12}\sum x_1x_2}$
But $b_{12} = \dfrac{\sum x_1x_2}{\sum x_2^2}$.
$\Rightarrow b_{ye} = \dfrac{\sum yx_1 - \left(\dfrac{\sum x_1x_2}{\sum x_2^2}\right)\sum yx_2}{\sum x_1^2 + \left(\dfrac{\sum x_1x_2}{\sum x_2^2}\right)^2\sum x_2^2 - 2\left(\dfrac{\sum x_1x_2}{\sum x_2^2}\right)\sum x_1x_2}$

$\Rightarrow b_{ye} = \dfrac{\sum yx_1 - \left(\dfrac{\sum x_1x_2}{\sum x_2^2}\right)\sum yx_2}{\sum x_1^2 + \dfrac{(\sum x_1x_2)^2}{\sum x_2^2} - 2\dfrac{(\sum x_1x_2)^2}{\sum x_2^2}}$

$\Rightarrow b_{ye} = \dfrac{\sum x_2^2\sum yx_1 - \sum x_1x_2\sum yx_2}{\sum x_1^2\sum x_2^2 - (\sum x_1x_2)^2}$

$\Rightarrow b_{ye} = \hat{\beta}_1$
For instance:
a) How much does $X_2$ explain after $X_1$ is already included in the regression equation? Or,
b) How much does $X_1$ explain after $X_2$ is included?
These are measured by the coefficients of partial determination: $r^2_{y2\cdot1}$ and $r^2_{y1\cdot2}$, respectively.

$r_{y1\cdot2} = \dfrac{r_{y1} - r_{y2}\,r_{12}}{\sqrt{(1 - r_{y2}^2)(1 - r_{12}^2)}}$,  $r_{y2\cdot1} = \dfrac{r_{y2} - r_{y1}\,r_{12}}{\sqrt{(1 - r_{y1}^2)(1 - r_{12}^2)}}$
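A sketch of the partial-correlation formula on simulated data, checked against the "regression on residuals" idea (partialling $X_2$ out of both Y and $X_1$); all numbers here are illustrative:

   import numpy as np

   rng = np.random.default_rng(5)
   n = 500
   x2 = rng.normal(size=n)
   x1 = 0.6 * x2 + rng.normal(size=n)
   y = 1.0 * x1 + 0.5 * x2 + rng.normal(size=n)

   r = np.corrcoef([y, x1, x2])
   r_y1, r_y2, r_12 = r[0, 1], r[0, 2], r[1, 2]

   r_y1_2 = (r_y1 - r_y2 * r_12) / np.sqrt((1 - r_y2**2) * (1 - r_12**2))

   # Same idea via residuals: correlate the parts of y and x1 not explained by x2.
   res_y = y - np.polyval(np.polyfit(x2, y, 1), x2)
   res_x1 = x1 - np.polyval(np.polyfit(x2, x1, 1), x2)
   print(r_y1_2, np.corrcoef(res_y, res_x1)[0, 1])   # the two numbers agree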
$0 \le r_{y1}^2 + r_{y2}^2 + r_{12}^2 - 2r_{y1}r_{y2}r_{12} \le 1$

6. $r_{y2} = r_{12} = 0$ does not mean that $r_{y1} = 0$:
that Y & $X_2$ and $X_1$ & $X_2$ are uncorrelated does not mean that Y and $X_1$ are uncorrelated.
Adding $X_2$ to the model reduces the RSS by:
$RSS_{SIMP} - RSS_{MULT} = (1 - R^2_{y\cdot1})\sum y^2 - (1 - R^2_{y\cdot12})\sum y^2 = (R^2_{y\cdot12} - R^2_{y\cdot1})\sum y^2$

This is the Coefficient of Partial Determination (the square of the coefficient of partial correlation).
We include $X_2$ if the reduction in RSS (or the increase in ESS) is significant.
But, when exactly? We will see later!
The pieces involved are:
1. $R^2_{y\cdot12}$: the proportion of $\sum y_i^2$ explained by $X_1$ & $X_2$ jointly;
2. $R^2_{y\cdot1}$: the proportion of $\sum y_i^2$ explained by $X_1$ alone;
3. $1 - R^2_{y\cdot1}$: the proportion of $\sum y_i^2$ that $X_1$ leaves unexplained.
Coefficients of Partial Determination:
$r^2_{y2\cdot1} = \dfrac{R^2_{y\cdot12} - R^2_{y\cdot1}}{1 - R^2_{y\cdot1}}$,   $r^2_{y1\cdot2} = \dfrac{R^2_{y\cdot12} - R^2_{y\cdot2}}{1 - R^2_{y\cdot2}}$
$R^2$ values cannot be compared across models with a different (e.g. transformed) dependent variable.
Reason: TSS, ESS, and RSS depend on the units in which the regressand $Y_i$ is measured.
For instance, the TSS for Y is not the same as the TSS for log(Y).
$R^2 = \dfrac{\sum\hat{y}^2}{\sum y^2} = 1 - \dfrac{\sum e^2}{\sum y^2}$,  while  $\bar{R}^2 = 1 - \dfrac{\sum e^2/[n-(K+1)]}{\sum y^2/(n-1)}$  (dividing TSS and RSS by their df).

K + 1 represents the number of parameters to be estimated.

$\bar{R}^2 = 1 - \left[\dfrac{\sum e^2}{\sum y^2}\cdot\dfrac{n-1}{n-K-1}\right]$
$\bar{R}^2 = 1 - (1 - R^2)\cdot\left(\dfrac{n-1}{n-K-1}\right)$,  i.e.  $1 - \bar{R}^2 = (1 - R^2)\cdot\left(\dfrac{n-1}{n-K-1}\right)$

As long as $K \ge 1$: $1 - \bar{R}^2 > 1 - R^2 \Rightarrow \bar{R}^2 < R^2$. In general, $\bar{R}^2 \le R^2$.
As n grows larger (relative to K), $\bar{R}^2 \to R^2$.
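A one-function sketch of the adjusted R-squared formula, evaluated with this chapter's example values ($R^2$ = 0.9945, n = 5, K = 2):

   def adjusted_r2(r2: float, n: int, k: int) -> float:
       """R_bar^2 = 1 - (1 - R^2) * (n - 1) / (n - K - 1)."""
       return 1 - (1 - r2) * (n - 1) / (n - k - 1)

   print(adjusted_r2(0.9945, 5, 2))   # about 0.989, matching the example below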
4. $\bar{R}^2$ should never be the sole criterion for choosing between/among models:
- Consider expected signs & values of coefficients,
- Look for results consistent with economic theory or reasoning (possible explanations), ...

Numerical Example:
[Data table with n = 5 observations on Y, $X_1$ and $X_2$; column sums: $\sum Y = 150$, $\sum X_1 = 25$, $\sum X_2 = 50$.]
$\begin{pmatrix}\hat{\beta}_0\\ \hat{\beta}_1\\ \hat{\beta}_2\end{pmatrix} = \begin{bmatrix}n & \sum X_1 & \sum X_2\\ \sum X_1 & \sum X_1^2 & \sum X_1X_2\\ \sum X_2 & \sum X_2X_1 & \sum X_2^2\end{bmatrix}^{-1}\begin{pmatrix}\sum Y\\ \sum YX_1\\ \sum YX_2\end{pmatrix}$

$\begin{pmatrix}\hat{\beta}_0\\ \hat{\beta}_1\\ \hat{\beta}_2\end{pmatrix} = \begin{bmatrix}5 & 25 & 50\\ 25 & 141 & 262\\ 50 & 262 & 510\end{bmatrix}^{-1}\begin{pmatrix}150\\ 812\\ 1552\end{pmatrix} = \begin{bmatrix}40.825 & 4.375 & -6.25\\ 4.375 & 0.625 & -0.75\\ -6.25 & -0.75 & 1\end{bmatrix}\begin{pmatrix}150\\ 812\\ 1552\end{pmatrix} = \begin{pmatrix}-23.75\\ -0.25\\ 5.5\end{pmatrix}$
1. $TSS = \sum y^2 = \sum Y^2 - n\bar{Y}^2 = 272$

$ESS = \hat{\beta}_1^2\sum x_1^2 + \hat{\beta}_2^2\sum x_2^2 + 2\hat{\beta}_1\hat{\beta}_2\sum x_1x_2$

5. $\bar{R}^2 = 1 - \dfrac{RSS/(n-K-1)}{TSS/(n-1)} = 1 - \dfrac{1.5/2}{272/4} \Rightarrow \bar{R}^2 = 0.9890$

Regressing Y on $X_1$:
$\hat{\beta}_{y1} = \dfrac{\sum yx_1}{\sum x_1^2} = \dfrac{\sum YX_1 - n\bar{X}_1\bar{Y}}{\sum X_1^2 - n\bar{X}_1^2} = \dfrac{62}{16} = 3.875$

6. $R^2_{y\cdot1} = \dfrac{ESS_{SIMP}}{TSS} = \dfrac{\hat{\beta}_{y\cdot1}\sum yx_1}{\sum y^2} = \dfrac{3.875\times 62}{272} = 0.8833$
$RSS_{SIMP} = (1 - 0.8833)(272) = 0.1167(272) = 31.75$
$X_1$ (education) alone explains about 88.33% of the differences in wages, and leaves about 11.67% (= 31.75) unexplained.

7. $R^2_{y\cdot12} - R^2_{y\cdot1} = 0.9945 - 0.8833 = 0.1112$
$(R^2_{y\cdot12} - R^2_{y\cdot1})\sum y^2 = 0.1112(272) = 30.25$

8. $r^2_{y2\cdot1} = \dfrac{R^2_{y\cdot12} - R^2_{y\cdot1}}{1 - R^2_{y\cdot1}} = \dfrac{0.9945 - 0.8833}{1 - 0.8833} = 0.9528$
3.6 Statistical Inferences in Multiple Linear Regression

$\varepsilon_i \sim N(0, \sigma^2)$

$\hat{\beta}_1 \sim N(\beta_1, \mathrm{var}(\hat{\beta}_1))$,  with  $\mathrm{var}(\hat{\beta}_1) = \dfrac{\sigma^2}{\sum x_{1i}^2(1 - r_{12}^2)}$

$\hat{\beta}_2 \sim N(\beta_2, \mathrm{var}(\hat{\beta}_2))$,  with  $\mathrm{var}(\hat{\beta}_2) = \dfrac{\sigma^2}{\sum x_{2i}^2(1 - r_{12}^2)}$

$\hat{\beta}_0 \sim N(\beta_0, \mathrm{var}(\hat{\beta}_0))$

$\mathrm{cov}(\hat{\beta}_1, \hat{\beta}_2) = \dfrac{-r_{12}\,\sigma^2}{(1 - r_{12}^2)\sqrt{\sum x_{1i}^2\sum x_{2i}^2}}$,  where  $r_{12}^2 = \dfrac{(\sum x_{1i}x_{2i})^2}{\sum x_{1i}^2\sum x_{2i}^2}$

$\hat{\sigma}^2 = \dfrac{RSS}{n-3}$ is an unbiased estimator of $\sigma^2$ (two-regressor case).

Note: $\sum x_{2i}^2(1 - r_{12}^2)$ is the RSS from regressing $X_2$ on $X_1$.
$\mathrm{var\text{-}cov}(\hat{\beta}) = \sigma^2(X'X)^{-1} = \sigma^2\begin{bmatrix}n & \sum X_1 & \dots & \sum X_K\\ \sum X_1 & \sum X_1^2 & \dots & \sum X_1X_K\\ \vdots & & & \vdots\\ \sum X_K & \sum X_KX_1 & \dots & \sum X_K^2\end{bmatrix}^{-1}$

Estimated: $\widehat{\mathrm{var\text{-}cov}}(\hat{\beta}) = \hat{\sigma}^2(X'X)^{-1}$

Note that:
(a) $(X'X)^{-1}$ is the same matrix we use to derive the OLS estimates, and
(b) $\hat{\sigma}^2 = \dfrac{RSS}{n-3}$ in the case of two regressors.
In general, $\hat{\sigma}^2 = \dfrac{RSS}{n-K-1}$.

Note: Ceteris paribus, the higher the correlation coefficient between $X_1$ & $X_2$ ($r_{12}$), the less precise the estimates $\hat{\beta}_1$ & $\hat{\beta}_2$ will be, i.e., the wider the CIs.

To test hypotheses about, and construct intervals for, an individual $\beta_j$, use:
$\dfrac{\hat{\beta}_j - \beta_j^*}{\hat{se}(\hat{\beta}_j)} \sim t_{n-K-1}$, for all $j = 0, 1, \dots, K$.
$\dfrac{RSS}{\sigma^2} = \dfrac{(n-K-1)\hat{\sigma}^2}{\sigma^2} \sim \chi^2_{n-K-1}$

Tests of several parameters, and of several linear functions of parameters, are F-tests.

Procedure for conducting F-tests:
1. Compute the RSS from regressing Y on all $X_j$'s (URSS = Unrestricted Residual Sum of Squares).
2. Compute the RSS from the regression with the hypothesized/specified values of the parameters ($\beta$'s) (RRSS = Restricted RSS).

A special F-test of common interest is to test the null that none of the X's influence Y (i.e., that our regression is useless!):
Test H0: $\beta_1 = \beta_2 = \dots = \beta_K = 0$ vs. H1: H0 is not true.

$URSS = (1 - R^2)\sum y_i^2 = \sum y_i^2 - \sum_{j=1}^{K}\left\{\hat{\beta}_j\sum_{i=1}^{n} x_{ji}y_i\right\}$,  $RRSS = \sum y_i^2$.

$\Rightarrow \dfrac{(RRSS - URSS)/K}{URSS/(n-K-1)} = \dfrac{R^2/K}{(1 - R^2)/(n-K-1)} \sim F_{K,\,n-K-1}$
$(X'X)^{-1} = \begin{bmatrix}5 & 25 & 50\\ 25 & 141 & 262\\ 50 & 262 & 510\end{bmatrix}^{-1} = \begin{bmatrix}40.825 & 4.375 & -6.25\\ 4.375 & 0.625 & -0.75\\ -6.25 & -0.75 & 1\end{bmatrix}$

$\sigma^2$ is estimated by: $\hat{\sigma}^2 = \dfrac{RSS}{n-K-1} = \dfrac{1.5}{2} = 0.75$

$\widehat{\mathrm{var\text{-}cov}}(\hat{\beta}) = 0.75\begin{bmatrix}40.825 & 4.375 & -6.25\\ 4.375 & 0.625 & -0.75\\ -6.25 & -0.75 & 1\end{bmatrix} = \begin{bmatrix}30.61875 & 3.28125 & -4.6875\\ 3.28125 & 0.46875 & -0.5625\\ -4.6875 & -0.5625 & 0.75\end{bmatrix}$
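A sketch verifying this matrix arithmetic with numpy, using the example's own $X'X$, $X'Y$ and $\hat{\sigma}^2$ = 0.75:

   import numpy as np

   XtX = np.array([[ 5.,  25.,  50.],
                   [25., 141., 262.],
                   [50., 262., 510.]])
   XtY = np.array([150., 812., 1552.])

   XtX_inv = np.linalg.inv(XtX)        # matches the 40.825, 4.375, -6.25, ... matrix
   beta_hat = XtX_inv @ XtY            # [-23.75, -0.25, 5.5]
   var_cov = 0.75 * XtX_inv            # estimated var-cov matrix of beta_hat

   print(beta_hat)
   print(np.sqrt(np.diag(var_cov)))    # standard errors of beta_0, beta_1, beta_2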
c) $t_c = \dfrac{\hat{\beta}_0 - 0}{\hat{se}(\hat{\beta}_0)} = \dfrac{-23.75}{\sqrt{30.61875}} \approx -4.29$

d) $F_c = \dfrac{R^2/K}{(1 - R^2)/(n-K-1)} = \dfrac{0.9945/2}{0.0055/2} \approx 180.82$

$F_c = \dfrac{(RRSS - URSS)/J}{URSS/(n-K-1)} = \dfrac{(12.08 - 1.5)/1}{1.5/2} \approx 14.11$,   $F_{tab} = F^{0.05}_{1,2} = 18.51$

$t_{tab} = t^{0.025}_{1} = 12.706$
$|t_{cal}| < t_{tab} \Rightarrow$ do not reject the null.
The same result as the F-test, but the F-test is easier to handle.
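A sketch reproducing these test statistics from the example's numbers, with scipy supplying the critical values:

   import math
   from scipy import stats

   # t-test for the intercept: beta_0_hat = -23.75, var(beta_0_hat) = 30.61875
   t_c = -23.75 / math.sqrt(30.61875)
   print(t_c, stats.t.ppf(0.975, 2))        # about -4.29 vs. critical value 4.303

   # Overall F-test: R^2 = 0.9945, K = 2, n = 5
   F_c = (0.9945 / 2) / ((1 - 0.9945) / 2)
   print(F_c, stats.f.ppf(0.95, 2, 2))      # about 180.8 vs. critical value 19.0

   # Restricted-vs-unrestricted F-test: RRSS = 12.08, URSS = 1.5, J = 1
   F_r = ((12.08 - 1.5) / 1) / (1.5 / 2)
   print(F_r, stats.f.ppf(0.95, 1, 2))      # about 14.11 vs. critical value 18.51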
To sum up:
Assuming that our model is correctly specified and all the assumptions are satisfied,
- Education (after controlling for experience) doesn't have a significant influence on wages.
- In contrast, experience (after controlling for education) is a significant determinant of wages.
- The intercept parameter is also insignificant (though at the margin). Less important!
- Overall, the model explains a significant portion of the observed wage pattern.
- We cannot reject the claim that the coefficients of the two regressors are equal.

3.7 Prediction with Multiple Linear Regression
Note:
- Even if the $R^2$ for the SRF is very high, it does not necessarily mean that our forecasts are good.
- The accuracy of our prediction depends on the stability of the coefficients between the period used for estimation and the period used for prediction.
- More care must be taken when the values of the regressors (X's) themselves are forecasts.
CHAPTER FOUR
VIOLATING THE ASSUMPTIONS OF THE CLASSICAL LINEAR REGRESSION MODEL (CLRM)

4.1 Introduction
Outline:
1. Small Samples (A1?)
2. Multicollinearity (A2?)
3. Non-Normal Errors (A4?)
4. Non-IID Errors (A3?):
   A. Heteroskedasticity (A3.1?)
   B. Autocorrelation (A3.2?)
5. Endogeneity (A5?)

4.3 Multicollinearity
The normal equations for the slope coefficients are:
$\sum Y_iX_{1i} = \hat{\beta}_0\sum X_{1i} + \hat{\beta}_1\sum X_{1i}^2 + \hat{\beta}_2\sum X_{1i}X_{2i}$
$\sum Y_iX_{2i} = \hat{\beta}_0\sum X_{2i} + \hat{\beta}_1\sum X_{1i}X_{2i} + \hat{\beta}_2\sum X_{2i}^2$

Suppose $X_{2i} = 2X_{1i}$ (perfect collinearity). Then the normal equations collapse to:
$\sum Y_i = n\hat{\beta}_0 + [\hat{\beta}_1 + 2\hat{\beta}_2]\sum X_{1i}$
$\sum Y_iX_{1i} = \hat{\beta}_0\sum X_{1i} + [\hat{\beta}_1 + 2\hat{\beta}_2]\sum X_{1i}^2$

$\Rightarrow \begin{pmatrix}\sum Y_i\\ \sum Y_iX_{1i}\end{pmatrix} = \begin{bmatrix}n & \sum X_{1i}\\ \sum X_{1i} & \sum X_{1i}^2\end{bmatrix}\begin{pmatrix}\hat{\beta}_0\\ \hat{\beta}_1 + 2\hat{\beta}_2\end{pmatrix}$

The number of $\beta$'s to be estimated is greater than the number of independent equations. So, if two or more X's are perfectly correlated, the individual coefficients cannot be estimated; only the combinations can:
$\hat{\alpha} = \hat{\beta}_1 + 2\hat{\beta}_2 = \dfrac{\sum Y_iX_{1i} - n\bar{X}_1\bar{Y}}{\sum X_{1i}^2 - n\bar{X}_1^2}$  and  $\hat{\beta}_0 = \bar{Y} - [\hat{\beta}_1 + 2\hat{\beta}_2]\bar{X}_1$
Sources of multicollinearity:
- Including a variable computed from other variables in the model (e.g. using family income, mother's income & father's income together).
- Adding many polynomial terms to a model, especially if the range of the X variable is small.
- Or, it may just happen that variables are highly correlated (without any fault of the researcher).

Detecting multicollinearity:
- The classic case of multicollinearity occurs when $R^2$ is high (& significant), but none of the X's is significant (some of the X's may even have the wrong sign).
- As $R_j^2$ increases, $VIF_j$ rises.
- If $X_j$ is perfectly correlated with the other X's, $VIF_j = \infty$. Implication for precision (or CIs)???
- Thus, a large VIF is a sign of serious/severe (or "intolerable") multicollinearity.
- There is no universally agreed cutoff point on VIF (or any other measure).
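A minimal sketch of a VIF check, assuming the usual definition $VIF_j = 1/(1 - R_j^2)$, where $R_j^2$ comes from regressing $X_j$ on the other regressors (simulated data):

   import numpy as np

   def vif(X: np.ndarray, j: int) -> float:
       """VIF of column j of X (X holds the regressors only, no constant)."""
       others = np.delete(X, j, axis=1)
       Z = np.column_stack([np.ones(len(X)), others])
       fitted = Z @ np.linalg.lstsq(Z, X[:, j], rcond=None)[0]
       resid = X[:, j] - fitted
       r2_j = 1 - resid.var() / X[:, j].var()
       return 1 / (1 - r2_j)

   rng = np.random.default_rng(6)
   x1 = rng.normal(size=200)
   x2 = 0.95 * x1 + 0.05 * rng.normal(size=200)       # nearly collinear with x1
   X = np.column_stack([x1, x2, rng.normal(size=200)])
   print([round(vif(X, j), 1) for j in range(3)])     # large VIFs flag x1 and x2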
Solutions to multicollinearity:
- Solutions depend on the sources of the problem.
- The formula below is indicative of some solutions:
$\widehat{\mathrm{var}}(\hat{\beta}_j) = \dfrac{\hat{\sigma}^2}{\sum x_{ji}^2(1 - R_j^2)} = \dfrac{\sum e_i^2}{(n-K-1)\sum x_{ji}^2(1 - R_j^2)}$
- More precision is attained with lower variances, which come, among other things, from:
  c) greater variation in the values of each $X_j$, ceteris paribus;
  d) less correlation between regressors, ceteris paribus.
- Thus, serious multicollinearity may be solved by using one or more of the following:
  1. "Increasing sample size" (if possible). ???
  2. Utilizing a priori information on parameters may help.
4.4 Non-Normal Errors
- With large samples, thanks to the central limit theorem, hypothesis testing may proceed even if the distribution of errors deviates from normality.
- Tests are generally asymptotically valid.

4.5.1 Heteroskedasticity
Detecting heteroskedasticity:
A. Graphical Method
- Run OLS and plot the squared residuals against the fitted values of Y (Ŷ) or against each X.
  # In Stata (after regression): rvfplot
- The graph may show some relationship (linear, ...).
B. A Formal Test:
- The most often-used test for heteroskedasticity is the Breusch-Pagan (BP) test.
- H0: homoskedasticity vs. Ha: heteroskedasticity.
- Regress ũ² on Ŷ, or ũ² on the original X's, the X²'s and, if there is enough data, the cross-products of the X's.
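A sketch of the Breusch-Pagan idea done by hand with numpy on simulated data: regress the squared OLS residuals on the regressors and use the auxiliary R-squared in the common LM form (LM = n·R², compared with a chi-square with as many df as auxiliary regressors); the LM form is assumed here since the slide with the test statistic is not shown:

   import numpy as np
   from scipy import stats

   rng = np.random.default_rng(7)
   n = 300
   x = rng.uniform(1, 10, n)
   y = 2 + 0.5 * x + rng.normal(0, 0.3 * x, n)        # error variance grows with x

   X = np.column_stack([np.ones(n), x])
   b = np.linalg.lstsq(X, y, rcond=None)[0]
   u2 = (y - X @ b) ** 2                               # squared residuals

   g = np.linalg.lstsq(X, u2, rcond=None)[0]           # auxiliary regression
   r2_aux = 1 - ((u2 - X @ g) ** 2).sum() / ((u2 - u2.mean()) ** 2).sum()
   LM = n * r2_aux
   print(LM, stats.chi2.sf(LM, df=1))                  # small p-value: heteroskedastic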
4.5.2 Autocorrelation
A. Stochastic Regressors
- Many economic variables are stochastic, and it is only for ease that we assumed fixed X's.
- For instance, the set of regressors may include:
  * a lagged dependent variable (Y_{t-1}), or
  * an X characterized by a measurement error.

B. Measurement Error
- Measurement error in the regressand (Y) only does not cause bias in the OLS estimators, as long as the measurement error is not systematically related to one or more of the regressors.
- Measurement error in a regressor, by contrast, makes the OLS estimators biased and inconsistent.
- SOLUTION: IV/2SLS REGRESSION!
3. Inclusion of irrelevant variables: when one or more irrelevant variables are wrongly included in the model, e.g. estimating $Y = \beta_0 + \beta_1X_1 + \beta_2X_2 + \beta_3X_3 + u$ when the correct model is $Y = \beta_0 + \beta_1X_1 + \beta_2X_2 + u$.
$F_{cal} = \dfrac{(RRSS - URSS)/(K+1)}{URSS/[n - 2(K+1)]}$
4. Find the critical value $F_{K+1,\,n-2(K+1)}$ from the table.
5. Reject the null of stable parameters (and favor Ha: that there is a structural break) if $F_{cal} > F_{tab}$.
2. Ordinal variables:
   - Answers to yes/no (or scaled) questions, ...
The effect of some quantitative variable may differ between groups/categories:
   - Returns to education may differ between sexes or between ethnic groups ...
Consider $Y_i = \beta_0 + \beta_1D_i + \varepsilon_i$ with $D_i = \begin{cases}1 & \text{for } i \in \text{group 1}\\ 0 & \text{for } i \notin \text{group 1}\end{cases}$

- If D = 0: $E(Y) = E(Y|D=0) = \beta_0$
- If D = 1: $E(Y) = E(Y|D=1) = \beta_0 + \beta_1$
- Thus, the difference between the two groups (in mean values of Y) is: $E(Y|D=1) - E(Y|D=0) = \beta_1$.
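A small sketch on made-up wage data: with a single dummy D, the OLS slope equals the difference in group means and the intercept equals the mean of the D = 0 group:

   import numpy as np

   rng = np.random.default_rng(8)
   D = np.repeat([0, 1], 50)
   wage = np.where(D == 1, 60, 45) + rng.normal(0, 5, 100)

   X = np.column_stack([np.ones(100), D])
   b0, b1 = np.linalg.lstsq(X, wage, rcond=None)[0]

   print(b0, wage[D == 0].mean())                            # beta_0_hat = mean of group D = 0
   print(b1, wage[D == 1].mean() - wage[D == 0].mean())      # beta_1_hat = difference in means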
- constant = D1 + D2 !!! E.g., with regressor columns [constant, X, D1, D2]:
$X = \begin{bmatrix}1 & X_{11} & 1 & 0\\ 1 & X_{12} & 1 & 0\\ 1 & X_{13} & 0 & 1\\ \vdots & \vdots & \vdots & \vdots\end{bmatrix}$
$\begin{cases}Y_t = C_t + I_t\\ C_t = \alpha + \beta Y_t + U_t\end{cases}$

- $Y_t$ & $C_t$ are endogenous (simultaneously determined) and $I_t$ is exogenous.
- Reduced form: expresses each endogenous variable as a function of the exogenous variables:
$Y_t = \dfrac{\alpha}{1-\beta} + \dfrac{1}{1-\beta}I_t + \dfrac{U_t}{1-\beta}$,   $C_t = \dfrac{\alpha}{1-\beta} + \dfrac{\beta}{1-\beta}I_t + \dfrac{U_t}{1-\beta}$
- $Y_t$, in $C_t = \alpha + \beta Y_t + U_t$, is correlated with $U_t$.
- The OLS estimators for $\beta$ (the MPC) & $\alpha$ (autonomous consumption) are biased and inconsistent.
- Solution: IV/2SLS
… THE END …
GOOD LUCK!