ECO 401 Econometrics: SI 2021, Week 2, 14 September
An econometric model consists of a systematic part and a random, unpredictable component $\varepsilon$ that we will call the random error.
Suppose we want to approximate a variable $y$ by a linear combination of other variables, $x_2$ to $x_K$, and a constant:
$\beta_1 + \beta_2 x_2 + \cdots + \beta_K x_K$
Using vector notation, our econometric model is $y_i = x_i'\beta + \varepsilon_i$, $i = 1, \dots, N$.
Coefficients in this approximation can be determined by Ordinary Least Squares (OLS), which minimizes the
sum of squared differences between $y$ and the linear combination:
$$S(\beta) \equiv \sum_{i=1}^{N} (y_i - x_i'\beta)^2$$
Ordinary Least Squares (OLS): Recap
• How to find the 'best' $b$? Ordinary Least Squares (OLS) minimizes the sum of squared residuals: $\min_b \sum_{i=1}^{N} e_i^2 = \min_b \, e'e$
• This is a straightforward minimization problem; note that $e'e = (y - Xb)'(y - Xb)$, so $e'e = y'y - 2y'Xb + b'X'Xb$
• Differentiating with respect to $b$ and collecting terms gives the first-order conditions (FOC): $-2(X'y - X'Xb) = 0$
• The solution to the FOC is simple: $b_{OLS} = (X'X)^{-1}X'y$
• In summation notation this is: $b_{OLS} = \left(\sum_{i=1}^{N} x_i x_i'\right)^{-1} \sum_{i=1}^{N} x_i y_i$
• In the simple case of $K = 2$, with one regressor and a constant, we can plot the variables in a graph ($x$ on the horizontal axis and $y$ on the vertical axis). Let $\bar{x}$ and $\bar{y}$ be the sample means; then:
$$slope = b_2^{OLS} = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{N}(x_i - \bar{x})^2}, \qquad intercept = b_1^{OLS} = \bar{y} - b_2^{OLS}\,\bar{x}$$
(a numerical check of both formulas follows below)
The Gauss-Markov assumptions
: < @9
Remember 𝑏 = ∑789 𝑥𝑖 𝑥7 ∑:
789 𝑥𝑖 𝑦7
Under assumptions (A1), (A2), (A3) and (A4):
$y_i = \beta_1 + \beta_2 x_{i2} + \beta_3 x_{i3} + \varepsilon_i$
The variance of the OLS estimator $b_2$ can be written as
$$V\{b_2\} = \frac{\sigma^2}{1 - r_{23}^2}\left[\sum_{i=1}^{N}(x_{i2} - \bar{x}_2)^2\right]^{-1}$$
or, equivalently,
$$V\{b_2\} = \frac{\sigma^2}{1 - r_{23}^2}\,\frac{1}{N}\left[\frac{1}{N}\sum_{i=1}^{N}(x_{i2} - \bar{x}_2)^2\right]^{-1} \qquad (2.37)$$
where $r_{23}$ is the sample correlation between $x_2$ and $x_3$.
Estimator properties
We estimate the variance of the error term $\sigma^2$ by the sampling variance of the residuals, $s^2 = \frac{1}{N-K}\sum_{i=1}^{N} e_i^2$.
Under assumptions (A1)-(A4), $s^2$ is unbiased for $\sigma^2$. The square root of the estimated variance of $b_k$ is the standard error of $b_k$.
Estimated variance of OLS estimator
We can think of the standard error as measuring how precisely we have estimated the population mean (via the sample mean) or any other statistic: it is a measure of the accuracy of an estimator.
As the sample size gets bigger and bigger, the standard error will shrink, reflecting the
fact that our estimate for the mean, or another statistic, will become more and more
precise.
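A small simulation can illustrate this shrinkage (a sketch with made-up population parameters): across repeated samples, the spread of the sample mean falls like $\sigma/\sqrt{N}$.

import numpy as np

rng = np.random.default_rng(1)
for N in (25, 100, 400):
    # 10,000 samples of size N from a population with mean 5 and sigma 2
    draws = rng.normal(loc=5.0, scale=2.0, size=(10_000, N))
    means = draws.mean(axis=1)
    # empirical spread of the sample mean vs. the theoretical sigma / sqrt(N)
    print(N, means.std(), 2.0 / np.sqrt(N))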
From the variance expression (2.37) above:
• More variation in the regressor values (a larger sample variance of $x_2$) leads to a more accurate estimator
• A larger error variance $\sigma^2$ produces a larger variance of the estimator; a low value of $\sigma^2$ means the observations lie closer to the regression line (a numerical sketch follows below)
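The following sketch (simulated data; all names and numbers are illustrative) computes the estimated variance of $b_2$ via equation (2.37), with $\sigma^2$ replaced by $s^2$, and cross-checks it against the corresponding element of $s^2 (X'X)^{-1}$:

import numpy as np

rng = np.random.default_rng(2)
N = 200
x2 = rng.normal(size=N)
x3 = 0.5 * x2 + rng.normal(size=N)            # deliberately correlated regressors
y = 1.0 + 2.0 * x2 - 1.0 * x3 + rng.normal(size=N)

X = np.column_stack([np.ones(N), x2, x3])
K = X.shape[1]
b = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ b
s2 = resid @ resid / (N - K)                  # unbiased estimator of sigma^2

r23 = np.corrcoef(x2, x3)[0, 1]               # sample correlation of x2 and x3
var_b2 = s2 / (1 - r23**2) / np.sum((x2 - x2.mean()) ** 2)   # equation (2.37) with s2
se_b2 = np.sqrt(var_b2)                       # standard error of b2

# cross-check against the (2,2) element of s2 * (X'X)^{-1}
V = s2 * np.linalg.inv(X.T @ X)
assert np.isclose(var_b2, V[1, 1])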
$wage_i = \beta_1 + \beta_2\, male_i + \varepsilon_i$
The interpretation is: the expected wage of a person, given his or her gender, is $\beta_1 + \beta_2\, male_i$.
That is, the expected wage of an arbitrary male is $\beta_1 + \beta_2$; for an arbitrary female it is $\beta_1$.
Table 2.1: OLS estimates of the wage equation (table not reproduced here)
The expected hourly wage differential between males and females is $1.17 with a standard error of $0.11.
What do the assumptions mean?
(A2): {ε1 ,… εN} is independent of {x1,… xN}: knowing a person’s gender provides
no information about unobservables affecting this person’s wage.
(A3) Homoskedasticity, V{εi} = σ2: variance is the same for males and females.
(A5) Normality: there is no reason why εi would be normal here (e.g. negative wages are not possible).
A convenient fifth assumption is that all error terms are independent and have a normal distribution. We specify $\varepsilon_i \sim NID(0, \sigma^2)$, which is shorthand for: all $\varepsilon_i$ are independent drawings from a normal distribution with mean 0 and variance $\sigma^2$ ("normally and independently distributed").
Summary of Assumptions
MR1: $E(e_i) = 0$
MR2: $var(e_i) = \sigma^2$
MR3: $cov(y_i, y_j) = cov(e_i, e_j) = 0$ for $i \neq j$
MR4: $cov(x_i, e_i) = 0$
MR5: the explanatory variables are not exact linear functions of the other explanatory variables (no exact collinearity)
If assumptions of the Multiple Regression Model (MR1-MR5) hold, the least squares
estimators are the best linear unbiased estimators (BLUE) of the parameters (Gauss-
Markov Theorem).
How well does the line fit the observations?
The quality of the linear approximation offered by the model can be measured by the $R^2$, the goodness-of-fit.
• The $R^2$ indicates the proportion of the variance of $y$ that is explained by the linear combination of the $x$ variables, i.e. by the model. The formula is developed below.
4.1 Least Squares Prediction
yi = β1 + β2 xi + ei
The estimated relationship can be used:
1. to explain how the dependent variable ($y_i$) changes as the independent variable ($x_i$) changes
2. to predict $y_0$ given an $x_0$
Writing $y_i - \bar{y} = (\hat{y}_i - \bar{y}) + \hat{e}_i$, squaring and summing both sides, and using the fact that the cross-product term $\sum_i (\hat{y}_i - \bar{y})\hat{e}_i$ equals zero, we obtain
$$\sum_i (y_i - \bar{y})^2 = \sum_i (\hat{y}_i - \bar{y})^2 + \sum_i \hat{e}_i^2$$
Goodness-of-fit…
$$\sum_i (y_i - \bar{y})^2 = \sum_i (\hat{y}_i - \bar{y})^2 + \sum_i \hat{e}_i^2$$
$\sum_i (y_i - \bar{y})^2$ = Total Sum of Squares = TSS
$\sum_i (\hat{y}_i - \bar{y})^2$ = Sum of Squares due to the regression = ESS
$\sum_i (y_i - \hat{y}_i)^2 = \sum_i \hat{e}_i^2$ = Sum of Squares due to Error = RSS
Coefficient of determination
$$R^2 = \frac{ESS}{TSS} \qquad \text{or} \qquad R^2 = 1 - \frac{RSS}{TSS}$$
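A quick numerical illustration of the two equivalent formulas (simulated data, illustrative only; the equivalence requires the model to contain an intercept, so that TSS = ESS + RSS):

import numpy as np

rng = np.random.default_rng(3)
N = 100
x = rng.normal(size=N)
y = 1.0 + 2.0 * x + rng.normal(size=N)
X = np.column_stack([np.ones(N), x])          # model includes an intercept
b = np.linalg.solve(X.T @ X, X.T @ y)

yhat = X @ b
TSS = np.sum((y - y.mean()) ** 2)
ESS = np.sum((yhat - y.mean()) ** 2)
RSS = np.sum((y - yhat) ** 2)
assert np.isclose(TSS, ESS + RSS)             # decomposition holds with an intercept
R2 = ESS / TSS                                # equals 1 - RSS/TSS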
[Figure: explained and unexplained components of $y_i$]
4.2 Measuring Goodness-of-fit
If $R^2 = 1$, all the sample data fall exactly on the fitted least squares line and the model fits the data "perfectly". If the sample data for $y$ and $x$ are uncorrelated and show no linear association, then $R^2 = 0$.
$$r_{xy} = \frac{s_{xy}}{s_x s_y}$$
where
$s_{xy} = \sum_i (x_i - \bar{x})(y_i - \bar{y})/(N-1)$
$s_x = \sqrt{\sum_i (x_i - \bar{x})^2/(N-1)}$
$s_y = \sqrt{\sum_i (y_i - \bar{y})^2/(N-1)}$
• The sample correlation coefficient rxy has a value between -1 and 1, and it measures the strength of the linear
association between observed values of x and y
$R^2$ and $r_{xy}$
In the simple regression model, $r_{xy}^2 = R^2$.
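A short self-contained check of this identity (simulated data, illustrative only):

import numpy as np

rng = np.random.default_rng(3)
N = 100
x = rng.normal(size=N)
y = 1.0 + 2.0 * x + rng.normal(size=N)
# fitted values from the simple regression of y on x (with a constant)
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
yhat = y.mean() + slope * (x - x.mean())
R2 = np.sum((yhat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)
assert np.isclose(np.corrcoef(x, y)[0, 1] ** 2, R2)   # r_xy^2 == R^2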
• In Table 2.1, the $R^2$ of 3.2% means that only approximately 3.2% of the variation in individual wages can be attributed to gender differences: a low $R^2$
• In other words, the gender difference "explains" only 3.2% of the variance of wages
• Apparently, many other observable and unobservable factors affect a person's wage besides gender
• This does not necessarily imply that the estimates in Table 2.1 are incorrect or useless; rather, amendments to the model are required
5.2.2 Least Squares Estimates Using Hamburger Chain Data
• Interpretation: let us look at another example, where we estimate the relationship between a firm's sales revenue and its price and advertising expenditure.
• 44.8% of the variation in sales revenue is explained by the variation in price and by the variation in
the level of advertising expenditure.
• In our sample, 55.2% of the variation in revenue is left unexplained and is due to variation in the error
term or to variation in other variables that implicitly form part of the error term.
Goodness-of-fit: adjusted R²
• $R^2$ will never decrease if a variable is added. Therefore we define the adjusted $R^2$, which penalizes extra regressors through a degrees-of-freedom correction, as
$$\bar{R}^2 = 1 - \frac{RSS/(N-K)}{TSS/(N-1)}$$
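A minimal helper implementing this definition (the function name is mine, not from the text):

import numpy as np

def adjusted_r2(y, X):
    """Adjusted R^2 = 1 - [RSS/(N-K)] / [TSS/(N-1)]; penalizes extra regressors."""
    N, K = X.shape
    b = np.linalg.solve(X.T @ X, X.T @ y)
    RSS = np.sum((y - X @ b) ** 2)
    TSS = np.sum((y - y.mean()) ** 2)
    return 1.0 - (RSS / (N - K)) / (TSS / (N - 1))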
OLS is the best (most efficient) linear unbiased estimation technique, provided the Gauss-Markov assumptions hold.
Often, economic theory implies certain restrictions on the coefficients, for example $\beta_k = 0$.
We then have to test whether an estimated non-zero value simply occurred by chance.
We can check whether our estimates deviate 'significantly' from these restrictions by
means of a statistical test.
If they do, we reject the null hypothesis that the restrictions are true.
Tests based on OLS estimates
• To perform a test, we need a test statistic
• A test statistic is a quantity we can compute from our sample that has a known distribution under the assumption that the null hypothesis is true.
• Next, we must decide if the computed value is likely to come from this
distribution, which indicates that the null hypothesis is likely to hold, or
not.
• The most common test is the 𝑡-test. It can be used to test a single
restriction.
• Suppose the null hypothesis is $\beta_k = q$ for some given value $q$. Then consider the test statistic $t = (b_k - q)/se(b_k)$
• If the null hypothesis is true and assumptions (A2) and (A5) hold, then $t$ has a $t$-distribution with $N - K$ degrees of freedom (a worked sketch follows below)
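A sketch of the computation. The coefficient and standard error echo Table 2.1's wage differential ($1.17, $0.11); $q$, $N$ and $K$ are assumed here for illustration, not taken from the text. scipy's t distribution supplies the p-value and critical value:

from scipy import stats

# illustrative values: b_k and se_k echo Table 2.1; q, N, K are assumptions
b_k, se_k, q, N, K = 1.17, 0.11, 0.0, 3294, 2
t = (b_k - q) / se_k
p_value = 2 * stats.t.sf(abs(t), df=N - K)             # two-sided p-value
reject_5pct = abs(t) > stats.t.ppf(0.975, df=N - K)    # critical value ~ 1.96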
Tests involving one parameter
• Suppose we want to test the null hypothesis ($H_0$) that the true coefficient is $\beta_k = 0$
• We reject the null hypothesis if the absolute value of $t$ (the $t$-ratio) is 'too large'
• (critical values: 1.64 at the 10%, 1.96 at the 5%, and 2.58 at the 1% significance level)
• At the 5% significance level (95% confidence), we reject the null hypothesis if the absolute value of $t$ is larger than 1.96.
• The ratio $t = b_k / se(b_k)$ is the $t$-value and is routinely supplied by any regression package, for the null hypothesis $\beta_k = 0$
• When this hypothesis is rejected, it is said that "$b_k$ differs significantly from zero", or "the corresponding $x$ variable has a statistically significant impact on $y$", or simply "the $x$ variable is statistically significant"
• It is important to also look at the magnitude of the coefficient (economic significance)
5.5 Hypothesis Testing: tests involving one parameter
We need to ask whether the data provide any evidence to suggest that y
is related to each of the explanatory variables
• Testing this null hypothesis is sometimes called a test of significance for the
explanatory variable xk
• Null hypothesis: $H_0: \beta_k = 0$
• Alternative hypothesis: $H_1: \beta_k \neq 0$
• Test statistic: $t = \dfrac{b_k}{se(b_k)} \sim t_{(N-K)}$
Do males earn more than females?
• Suppose we want to test whether $J$ coefficients are jointly equal to zero: $H_0: \beta_2 = \beta_3 = \dots = \beta_K = 0$
• The alternative is that one or more of the restrictions under the null hypothesis does not hold
• The easiest way to obtain a test statistic for this is to estimate the model twice:
– once without the restrictions (the full model)
– once with the restrictions imposed, omitting the corresponding $x$ variables (because the corresponding $\beta$s are zero)
Tests involving more parameters
• Let the $R^2$s of the two models be $R_1^2$ (unrestricted) and $R_0^2$ (restricted) goodness-of-fit.
– Note that $R_1^2 \geq R_0^2$
• The restrictions are unlikely to be valid if the difference between the two $R^2$s is 'large'.
• The test can be interpreted as testing whether the increase in $R^2$, moving from the restricted model to the more general model, is significant.
• A test statistic can be computed as
$$F = \frac{(R_1^2 - R_0^2)/J}{(1 - R_1^2)/(N - K)}$$
which, under the null hypothesis, follows an $F$ distribution with $J$ and $N - K$ degrees of freedom. A sketch follows below.
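A hedged sketch of this computation (the function name and the numbers in the example call are mine, purely illustrative):

from scipy import stats

def f_test_from_r2(r2_full, r2_restr, J, N, K):
    """F statistic for J zero restrictions, built from the two R^2 values."""
    F = ((r2_full - r2_restr) / J) / ((1.0 - r2_full) / (N - K))
    p = stats.f.sf(F, J, N - K)       # upper-tail p-value of F(J, N-K)
    return F, p

# hypothetical example call: 3 restrictions, N = 500 observations, K = 6 regressors
F, p = f_test_from_r2(0.40, 0.35, J=3, N=500, K=6)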
• The p-value is the smallest significance level at which the null hypothesis is rejected.
• If the p-value is smaller than the size $\alpha$ of the test (e.g. 0.05), we reject the null hypothesis.
• This allows you to perform the test without checking tables of critical
values.
Next week we will extend these topics and look at:
• Marginal effects
• Asymptotic properties of OLS