Introduction To Econometrics
Eugene Kaciak, Ph.D.
Faculty of Business, Brock University
St. Catharines, Ontario, Canada
E-mail: [email protected]
Personal website: http://spartan.ac.brocku.ca/~ekaciak
Introduction
Econometrics is the integration of economic theory, mathematics, and statistical techniques for the purpose of:
• Testing hypotheses about economic phenomena
• Estimating coefficients of economic relationships
• Forecasting or predicting future values of economic variables or phenomena.
In econometrics, data sets come in four main types:
• Cross-sectional data set: a sample of objects taken at a given point in time
• Time series data set: observations on variables over time
• Pooled cross sections: cross-sectional data for different objects taken at different time periods
• Panel (or longitudinal) data set: cross-sectional data for the same objects taken at different time periods.
Econometric research, in general, involves the following three stages:
• Specification of the model, together with the a priori theoretical expectations about the sign and the size of the parameters of the function.
• Collection of data on the variables of the model and estimation of the coefficients of the function with appropriate techniques.
• Evaluation of the estimated coefficients of the function on the basis of economic, statistical, and econometric criteria.
Simple Regression Analysis
1. Testing hypotheses about the relationship between
• a dependent (or explained, endogenous, predicted, response, effect, regressand) variable Y and
• one independent (or explanatory, exogenous, predictor, control, causal, regressor) variable X
2. Prediction
The simple linear regression model is:
Yi = β0 + β1X1i + ui,  i = 1, 2, …, n observations
where:
• β0 + β1X1 is the population regression line (function)
• β0 is the intercept of the population regression line
• β1 is the slope of the population regression line
• ui is the error term
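To make the model concrete, here is a minimal Python sketch that generates data from this equation; the parameter values, the range of X, and the sample size are illustrative assumptions, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 100                   # sample size (illustrative)
beta0, beta1 = 3.6, 0.75  # population intercept and slope (illustrative values)

X1 = rng.uniform(5, 10, size=n)    # the regressor
u = rng.normal(0.0, 1.0, size=n)   # error term: mean 0, constant variance
Y = beta0 + beta1 * X1 + u         # the simple linear regression model
```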
Assumptions of Simple Regression Analysis
The error term ui is assumed to be:
• normally distributed
• with expected value = 0
• with constant variance σ² (this is called homoscedasticity; if the variance σ² is not constant, we have heteroscedasticity)
• Any two error terms ui, uj (i ≠ j) are uncorrelated; if they are correlated, this condition is called autocorrelation.
• The variable X1 assumes fixed values in repeated sampling (so that X1i and ui are also uncorrelated).
Ordinary least-squares (OLS) estimators are best linear unbiased estimators (BLUE).
Lack of bias (an unbiased estimator) means that the expected value of the estimate b of parameter β equals β, i.e., E(b) = β.
OLS estimators are the best (B in BLUE) among all unbiased (U in BLUE) linear (L in BLUE) estimators (E in BLUE).
This is known as the Gauss-Markov theorem and represents the most important justification for using OLS.
Another desired feature of an estimator is consistency.
An estimator is consistent if, as the sample size approaches infinity, its value approaches the true parameter in the limit (i.e., it is asymptotically unbiased) and its distribution collapses on the true parameter.
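A small simulation illustrates consistency: the OLS slope estimate collapses on the true slope as n grows. A minimal sketch, with the true parameter values assumed for the demonstration:

```python
import numpy as np

rng = np.random.default_rng(1)
beta0, beta1 = 2.0, 0.5  # true parameters (assumed for illustration)

for n in (10, 100, 10_000, 1_000_000):
    X = rng.uniform(0, 10, size=n)
    Y = beta0 + beta1 * X + rng.normal(0, 1, size=n)
    # OLS slope: b1 = Sxy / Sxx
    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    print(n, round(b1, 4))  # b1 approaches 0.5 as n increases
```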
The OLS estimators of β0 and β1 are:
b1 = Sxy/Sxx
b0 = avgY – b1(avgX)
where Sxx = Σ(X – avgX)², Syy = Σ(Y – avgY)², and Sxy = Σ(X – avgX)(Y – avgY).
Here is another practical way of calculating b1:
b1 = [ΣXY – n(avgX)(avgY)] / [ΣX² – n(avgX)²]
Example
Estimate the model Y = β0 + β1X + u, where X is labor hours of work and Y is output, based on a sample of n = 10:

i:   1   2   3   4   5   6   7   8   9  10
X:  10   7  10   5   8   8   6   7   9  10
Y:  11  10  12   6  10   7   9  10  11  10
We compute:
ΣX = 80, ΣY = 96, avgX = 8, avgY = 9.6
ΣXY = 789, ΣX² = 668, ΣY² = 952
Sxx = ΣX² – n(avgX)² = 668 – 10(8²) = 28
Syy = ΣY² – n(avgY)² = 952 – 10(9.6²) = 30.4
Sxy = ΣXY – n(avgX)(avgY) = 789 – 10(8)(9.6) = 21
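These sums are easy to verify numerically; a minimal Python sketch using the sample above:

```python
import numpy as np

X = np.array([10, 7, 10, 5, 8, 8, 6, 7, 9, 10], dtype=float)
Y = np.array([11, 10, 12, 6, 10, 7, 9, 10, 11, 10], dtype=float)
n = len(X)

Sxx = np.sum(X ** 2) - n * X.mean() ** 2       # 28.0
Syy = np.sum(Y ** 2) - n * Y.mean() ** 2       # 30.4
Sxy = np.sum(X * Y) - n * X.mean() * Y.mean()  # 21.0
print(Sxx, Syy, Sxy)
```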
Find the OLS estimates of β0 and β1:
Sxx = 28, Syy = 30.4, Sxy = 21
b1 = Sxy/Sxx = 21/28 = 0.75
b0 = avgY – b1(avgX) = 9.6 – 0.75(8) = 3.6
Ŷ = 3.6 + 0.75X
The constant term has no natural interpretation: it captures the mean of Y as well as the average effect of omitted variables.
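The same estimates can be reproduced in code; a minimal self-contained sketch using the equivalent deviation-from-mean forms of Sxx and Sxy:

```python
import numpy as np

X = np.array([10, 7, 10, 5, 8, 8, 6, 7, 9, 10], dtype=float)
Y = np.array([11, 10, 12, 6, 10, 7, 9, 10, 11, 10], dtype=float)

Sxx = np.sum((X - X.mean()) ** 2)              # 28.0
Sxy = np.sum((X - X.mean()) * (Y - Y.mean()))  # 21.0

b1 = Sxy / Sxx                 # 0.75
b0 = Y.mean() - b1 * X.mean()  # 3.6
print(f"Y_hat = {b0:.2f} + {b1:.2f}X")
```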
Standard errors of parameter estimates
Given: Sxx = 28, Syy = 30.4, Sxy = 21
Residual Sum of Squares (RSS) = Syy – b1Sxy = 30.4 – 0.75(21) = 14.65
Standard Error of Regression (SER or ŝ):
ŝ = SER = √[RSS/(n – 2)] = √(14.65/8) = √1.83 = 1.35
Note also that RSS = (SER)²[n – (k+1)], where k = # of X's.
Sxx = 28, Syy = 30.4, Sxy = 21; RSS = 14.65; ŝ = SER = 1.35
Standard Error of b0 = SE(b0) = ŝ√[(1/n) + (avgX)²/Sxx]
= 1.35√[(1/10) + 8²/28] = 1.35√2.39 = 1.35(1.54) = 2.09
Standard Error of b1 = SE(b1) = ŝ/√Sxx = 1.35/√28 = 1.35/5.29 = 0.256
The model estimated by OLS:
Ŷ = 3.6 + 0.75X
(2.09) (0.256) standard errors
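A short sketch verifying these standard errors from the summary quantities above:

```python
import math

n, Sxx, Syy, Sxy = 10, 28.0, 30.4, 21.0
b1, avgX = 0.75, 8.0

RSS = Syy - b1 * Sxy                              # 14.65
SER = math.sqrt(RSS / (n - 2))                    # ≈ 1.35
SE_b0 = SER * math.sqrt(1 / n + avgX ** 2 / Sxx)  # ≈ 2.09
SE_b1 = SER / math.sqrt(Sxx)                      # ≈ 0.256
print(RSS, SER, SE_b0, SE_b1)
```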
Testing the statistical significance of β0
H0: β0 = 0 vs. H1: β0 > 0
t0 = (b0 – 0)/SE(b0) = (3.6 – 0)/2.09 = 1.72
t0 has a Student's t distribution with n – 2 = 8 d.f.
tcritical (tcr) = 1.8595 (for α = 5%)
Since t0 < tcr --> cannot reject H0 at the 5% significance level.
If H1: β0 ≠ 0, then tcr = 2.3060 (for α/2 = 2.5%).
The p-value is the smallest α at which H0 can be rejected (here, p-value = 0.123, found in Excel).
The general rule is to ignore the constant term's lack of significance.
Testing the statistical significance of β1
H0: β1 = 0 vs. H1: β1 > 0
t1 = (b1 – 0)/SE(b1) = (0.75 – 0)/0.256 = 2.93
t1 has a Student's t distribution with n – 2 = 8 d.f.
tcritical (tcr) = 1.8595 (for α = 5%)
Since t1 > tcr --> reject H0 at the 5% level of significance.
If H1: β1 ≠ 0, then tcr = 2.3060 (for α/2 = 2.5%).
p-value (Excel) = 0.019
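Both t-tests can be reproduced with scipy.stats; a minimal sketch (values match the slides up to rounding):

```python
from scipy import stats

df = 8  # n - 2
b0, SE_b0 = 3.6, 2.09
b1, SE_b1 = 0.75, 0.256

t0 = b0 / SE_b0  # ≈ 1.72
t1 = b1 / SE_b1  # ≈ 2.93

t_cr_one = stats.t.ppf(0.95, df)   # 1.8595 (one-sided, alpha = 5%)
t_cr_two = stats.t.ppf(0.975, df)  # 2.3060 (two-sided, alpha/2 = 2.5%)
p0 = 2 * stats.t.sf(abs(t0), df)   # ≈ 0.123 (two-sided, as Excel reports)
p1 = 2 * stats.t.sf(abs(t1), df)   # ≈ 0.019
print(t0, t1, t_cr_one, t_cr_two, p0, p1)
```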
95% Confidence Interval for β0
95% = 1 – α, where α is the significance level; in this case, α = 5%.
Lower limit of the confidence interval = b0 – tcrSE(b0) = 3.6 – 2.3060(2.09) = –1.22
Upper limit of the confidence interval = b0 + tcrSE(b0) = 3.6 + 2.3060(2.09) = 8.42
95% Confidence Interval for β1
Lower limit of the confidence interval = b1 – tcrSE(b1) = 0.75 – 2.3060(0.256) = 0.160
Upper limit of the confidence interval = b1 + tcrSE(b1) = 0.75 + 2.3060(0.256) = 1.34
Goodness-of-fit of the model
The coefficient of determination R²:
R² = b1(Sxy/Syy) = 0.75(21/30.4) = 0.518, or
R² = t1²/(t1² + n – 2) = 2.93²/(2.93² + 8) = 0.518, or
R² = ESS/TSS = 1 – RSS/TSS, where TSS = ESS + RSS (defined below)
TSS = ESS + RSS, where:
• TSS = Total Sum of Squares = Σ(Y – avgY)²
• ESS = Explained Sum of Squares = Σ(Ŷ – avgY)²
• RSS = Residual Sum of Squares = Σ(Y – Ŷ)² = Σû²
• û = Y – Ŷ is the residual
• ESS = b1Sxy = 0.75(21) = 15.75, or ESS = R²Syy = 0.518(30.4) = 15.75
• RSS = (1 – R²)Syy = (1 – 0.518)(30.4) = 14.65
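The decomposition and the R² formulas can be checked numerically; a minimal sketch:

```python
import numpy as np

X = np.array([10, 7, 10, 5, 8, 8, 6, 7, 9, 10], dtype=float)
Y = np.array([11, 10, 12, 6, 10, 7, 9, 10, 11, 10], dtype=float)

Y_hat = 3.6 + 0.75 * X  # fitted values from the OLS line

TSS = np.sum((Y - Y.mean()) ** 2)      # 30.4
ESS = np.sum((Y_hat - Y.mean()) ** 2)  # 15.75
RSS = np.sum((Y - Y_hat) ** 2)         # 14.65
print(TSS, ESS + RSS)                  # TSS = ESS + RSS
print(ESS / TSS, 1 - RSS / TSS)        # R² ≈ 0.518 either way
```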
Test of the overall significance of the regression (when the number of slope parameters is 1)
H0: β1 = 0 vs. H1: β1 ≠ 0
F = R²/[(1 – R²)/(n – 2)] = 0.518/[(1 – 0.518)/(10 – 2)] = 8.60
F has an F-distribution with d.f. = 1 and n – 2.
We find Fcr = 5.32.
Since F = 8.60 > 5.32 = Fcr, reject H0 at the 5% level.
Note: p-value (Excel) = 0.0189
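The F-test can be verified with scipy.stats; a minimal sketch:

```python
from scipy import stats

n, R2 = 10, 0.518
F = R2 / ((1 - R2) / (n - 2))       # ≈ 8.60
F_cr = stats.f.ppf(0.95, 1, n - 2)  # ≈ 5.32
p = stats.f.sf(F, 1, n - 2)         # ≈ 0.019
print(F, F_cr, p)
```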
Prediction with the Simple Regression Model
Ŷ = 3.6 + 0.75X, R² = 0.518
(2.09) (0.256) standard errors
[0.123] [0.019] p-values
The 95% confidence interval of the forecast:
LHS = Ŷ0 – tcrŜ
RHS = Ŷ0 + tcrŜ
where Ŝ = ŝ√[1 + (1/n) + (X0 – avgX)²/Sxx]
Let's compute the 95% confidence interval of the forecast at X0 = 6, where Ŷ0 = 3.6 + 0.75(6) = 8.1:
Ŝ = ŝ√[1 + (1/n) + (X0 – avgX)²/Sxx] = 1.35√[1 + 0.1 + (6 – 8)²/28] = 1.51
LHS = Ŷ0 – tcrŜ = 8.1 – 2.3060(1.51) = 4.62
RHS = Ŷ0 + tcrŜ = 8.1 + 2.3060(1.51) = 11.58
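A short sketch reproducing this forecast interval:

```python
import math
from scipy import stats

n, Sxx, avgX = 10, 28.0, 8.0
b0, b1, SER = 3.6, 0.75, 1.35
X0 = 6.0

Y0_hat = b0 + b1 * X0                                        # 8.1
S_hat = SER * math.sqrt(1 + 1 / n + (X0 - avgX) ** 2 / Sxx)  # ≈ 1.51
t_cr = stats.t.ppf(0.975, n - 2)                             # 2.3060
print(Y0_hat - t_cr * S_hat, Y0_hat + t_cr * S_hat)          # ≈ 4.62, 11.58
```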