Econometrics Lecture Notes
Econometrics can also be defined as the art and science of using economic theory and
statistical techniques to analyse economic data.
Discipline Interrelationship
Overview
𝑃𝑡 = 𝛼0 + 𝛼1 𝑀𝑡 + 𝜇𝑡
Covariance
δ_XY = E[(X − μ_X)(Y − μ_Y)]
Correlation
r_XY = δ_XY / (δ_X δ_Y)
This is a scaled covariance of the two variables. The correlation measure is scale free because the numerator and the denominator are measured in the same units, which cancel. The correlation measure lies between minus one and one: −1 ≤ r_XY ≤ 1.
r_XY = 0 indicates no linear relationship.
Like the covariance, the correlation captures the direction (sign) of the association and, in addition, its strength; neither measure captures causation.
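These two measures can be illustrated with a short sketch; the data below are hypothetical, not from the notes:

```python
import numpy as np

# Illustrative (hypothetical) data for two related variables
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([3.0, 5.0, 4.0, 9.0, 11.0])

# Covariance: E[(X - mu_X)(Y - mu_Y)], in units of X times units of Y
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))

# Correlation: the covariance scaled by the two standard deviations,
# so the units cancel and the result lies between -1 and 1
r_xy = cov_xy / (x.std() * y.std())

print(cov_xy)  # about 8.0
print(r_xy)    # about 0.92, a strong positive linear association
```

Note that rescaling x (say, into different units) changes the covariance but leaves the correlation unchanged.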
Regression Measure
Types of Econometrics
Econometrics may be divided into two branches: theoretical econometrics and empirical econometrics.
Theoretical econometrics must then spell out the assumptions of the techniques, their
properties, and what happens when one or more of the assumptions are violated.
Methodology of Econometrics
1. Statement of the economic theory or hypothesis, e.g. too much money chasing too few goods is the cause of inflation.
2. Specification of a mathematical model for the theory
𝑃𝑡 = 𝛼0 + 𝛼1 𝑀𝑡
The model is based on the assumptions of the theory, which usually does not
specify the functional form of the equation.
3. Specification of the econometric model; 𝑃𝑡 = 𝛼0 + 𝛼1 𝑀𝑡 + 𝜇𝑡
𝜇𝑡 = error term, residual, stochastic term.
4. Data Collection
5. Estimation of the parameters of the equation: choose 𝛼0 and 𝛼1 so as to make
the residuals 𝜇𝑡 as small as possible (in practice, by minimising the sum of squared residuals).
6. Hypothesis Testing – evaluating whether the economic theory is consistent with the
given data, validating or refuting the theory.
7. Forecasting and prediction
8. Using the model for control and policy analysis.
Concept of Linearity
The assumption of linearity enables the use of the ordinary least squares method. A
mathematical model of the form 𝑌 = 𝑓(𝑋); 𝑌 = 𝛼 + 𝛽𝑋 connotes linearity in the
coefficients/parameters and in the variables.
i. 𝑌 = 𝛼 + 𝛽𝑋 2 + 𝜇
This equation can also be solved using the ordinary least squares. As long as
the coefficients remain linear the assumption of linearity still holds.
iii. Intrinsically linear regression model – appears non-linear in the parameters but can
be transformed into a model linear in the parameters, e.g. 𝑌 = 𝛽0 𝐿^𝛽1 𝐾^𝛽2 𝑒^𝜇. This
is non-linear in the parameters, but after taking logarithms it becomes
ln 𝑌 = ln 𝛽0 + 𝛽1 ln 𝐿 + 𝛽2 ln 𝐾 + 𝜇
This makes it linear in the parameters but log-linear in the variables. By contrast,
𝑌 = 𝛽0 𝐿^𝛽1 𝐾^𝛽2 + 𝜇, with an additive error, cannot be linearised by taking
logarithms; it is intrinsically non-linear in the parameters.
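The log-transformation can be checked numerically. The sketch below generates hypothetical Cobb-Douglas data (the parameter values and noise level are assumptions for illustration) and recovers the parameters by ordinary least squares on the logged model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical Cobb-Douglas data: Y = b0 * L^b1 * K^b2 * exp(mu)
n = 500
L = rng.uniform(1, 10, n)
K = rng.uniform(1, 10, n)
b0, b1, b2 = 2.0, 0.6, 0.3
Y = b0 * L**b1 * K**b2 * np.exp(rng.normal(0, 0.05, n))

# Taking logs yields a model linear in the parameters:
# ln Y = ln b0 + b1 ln L + b2 ln K + mu
X = np.column_stack([np.ones(n), np.log(L), np.log(K)])
coef, *_ = np.linalg.lstsq(X, np.log(Y), rcond=None)

ln_b0_hat, b1_hat, b2_hat = coef
print(np.exp(ln_b0_hat), b1_hat, b2_hat)  # close to 2.0, 0.6, 0.3
```

The same approach would fail for the additive-error version, since log(b0·L^b1·K^b2 + mu) does not separate into a sum of terms.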
Types of Data
Time Series Data – A time series is a set of observations on the values that a variable
takes at different times. Such data may be collected at regular time intervals, such as
daily (e.g., stock prices, weather reports), weekly (e.g., money supply figures), monthly
[e.g., the unemployment rate, the Consumer Price Index (CPI)], quarterly (e.g., GDP),
annually (e.g., government budgets), quinquennially, that is, every 5 years (e.g., the
census of manufactures), or decennially (e.g., the census of population).
Cross-Section Data – Cross-section data are data on one or more variables collected at
the same point in time, such as the census of population.
Pooled Data – Pooled, or combined, data contain elements of both time series and cross-
section data.
Panel, Longitudinal, or Micropanel Data This is a special type of pooled data in which
the same cross-sectional unit (say, a family or a firm) is surveyed over time.
A panel data (or longitudinal data) set consists of a time series for each cross-sectional
member in the data set. As an example, suppose we have wage, education, and
employment history for a set of individuals followed over a ten-year period. Or we
might collect information, such as investment and financial data, about the same set of
firms over a five-year time period. Panel data can also be collected on geographical
units. For example, we can collect data for the same set of counties in the United States
on immigration flows, tax rates, wage rates, government expenditures, and so on, for the
years 1980, 1985, and 1990.
Review of Probability
The probability of an outcome is the proportion of the time that the outcome occurs in
the long run. E.g. if the probability of your computer not crashing while you are writing
a dissertation is 80%, then over the course of writing many dissertations you will
complete 80% without a crash.
Event – an event is a subset of the sample space, i.e. it is a set of one or more
outcomes. The event that my computer will crash no more than once is the set consisting of
two outcomes: “no crashes” and “one crash”.
Random Variables
A continuous random variable is one for which the set of possible values is some range
or interval of real numbers. It is a variable that takes on a continuum of possible values.
A simple linear regression is also known as a bivariate linear regression model. It can be
specified as follows:
y = f ( x)
a) A deterministic or mathematical specification
f(x) = 𝛼 + 𝛽x
y = 𝛼 + 𝛽x
e.g. y = 10 + 0.75x
x    0     10     20     30     40
y    10    17.5   25     32.5   40
Stochastic Specification
y = 𝛼 + 𝛽x + 𝜀
With this specification the values of y for any given values of x cannot be determined
exactly. The error term has a known probability distribution.
y = 10 + 0.75x + 𝜀
where 𝜀 takes one of two values,
𝜀 = +50
𝜀 = −50
each with probability 0.5. The values of y given x will be as follows:
x              0      10     20     30     40
y (𝜀 = −50)   −40    −32.5  −25    −17.5  −10
y (𝜀 = +50)    60     67.5   75     82.5   90
The simple linear regression model is assumed to be linear; 𝛼 and 𝛽 are unknown
regression coefficients or regression parameters.
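The two possible realisations in the table above can be reproduced with a short simulation; the use of rng.choice below is an illustrative assumption about how the two equally likely error values might be drawn:

```python
import numpy as np

rng = np.random.default_rng(1)

# Deterministic part of the illustrative model y = 10 + 0.75x + e,
# where the error e is +50 or -50, each with probability 0.5
x = np.array([0.0, 10.0, 20.0, 30.0, 40.0])

y_low = 10 + 0.75 * x - 50    # realisations when e = -50
y_high = 10 + 0.75 * x + 50   # realisations when e = +50

print(y_low)   # -40, -32.5, -25, -17.5, -10
print(y_high)  #  60,  67.5,  75,  82.5,  90

# A random draw of y given x: y cannot be determined exactly,
# only its distribution is known
e = rng.choice([-50.0, 50.0], size=x.size)
y = 10 + 0.75 * x + e
```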
Relevance of Error Term
- Theoretical inadequacy
o Theory itself is inconclusive
o Theory does not specify the full set of variables that impact the dependent variable
- Data limitation or inadequacy
- Distinction between core and peripheral explanatory variables – attaching
significance to variables.
- Inherently random human behaviour.
- Poor proxies/measurement errors
- Abstraction and parsimony – Occam’s razor, i.e. the desire to keep the model as
simple as possible (too much generalization of the model)
- Functional misspecification – the effects of using a wrong functional form are
captured by the random error.
1. Unbiasedness
An estimator 𝜃̂ of 𝜃 is unbiased if 𝐸(𝜃̂) = 𝜃; if 𝐸(𝜃̂) ≠ 𝜃 the estimator is biased.
[Figure: sampling distributions 𝑓(𝜃̂) of an unbiased and a biased estimator around the true parameter 𝜃.]
2. Efficiency
In some econometric problems we can have a large number of estimators to
choose from. The unbiased estimator whose sampling distribution has the
smallest variance is considered the most desirable of these unbiased estimators.
It is then called the best unbiased estimator or the efficient estimator. NB. It is
mathematically difficult to determine which estimator has the smallest variance.
Econometricians add the further restriction that the estimator has to be a linear
function of the observations on the dependent variable. The best estimator then
becomes the best linear unbiased estimator (BLUE).
3. Minimum Mean Square Criterion
The BLUE criterion allows unbiasedness to play an extremely strong role in
determining the choice of a good estimator, since only unbiased estimators are
considered. An estimator is a minimum MSE estimator if it has the smallest mean
square error, defined as the expected value of the squared difference of the
estimator around the true population parameter 𝜃:
MSE(𝜃̂) = 𝐸(𝜃̂ − 𝜃)²
The MSE is equal to the variance of the estimator plus the square of its bias:
MSE(𝜃̂) = var(𝜃̂) + Bias²(𝜃̂)
Proof:
MSE(𝜃̂) = 𝐸(𝜃̂ − 𝜃)²
= 𝐸[(𝜃̂ − 𝐸(𝜃̂)) + (𝐸(𝜃̂) − 𝜃)]²
= 𝐸(𝜃̂ − 𝐸(𝜃̂))² + 𝐸(𝐸(𝜃̂) − 𝜃)² + 2𝐸[(𝜃̂ − 𝐸(𝜃̂))(𝐸(𝜃̂) − 𝜃)]
But
𝐸(𝜃̂ − 𝐸(𝜃̂))² = var(𝜃̂) ; 𝐸(𝐸(𝜃̂) − 𝜃)² = Bias²(𝜃̂)
And
2𝐸[(𝜃̂ − 𝐸(𝜃̂))(𝐸(𝜃̂) − 𝜃)] = 0, because (𝐸(𝜃̂) − 𝜃) is a constant and
𝐸(𝜃̂ − 𝐸(𝜃̂)) = 𝐸(𝜃̂) − 𝐸(𝜃̂) = 0
Therefore,
MSE(𝜃̂) = var(𝜃̂) + Bias²(𝜃̂)
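The decomposition can be verified by simulation. The estimator below (a sample mean shifted by 0.5) is hypothetical, chosen only to create a visible bias:

```python
import numpy as np

rng = np.random.default_rng(2)

# Monte Carlo check of MSE(theta_hat) = var(theta_hat) + Bias^2(theta_hat),
# using a deliberately biased estimator of a population mean theta = 5:
# theta_hat = sample mean + 0.5 (hypothetical, chosen to create bias)
theta = 5.0
reps, n = 100_000, 20
samples = rng.normal(theta, 2.0, size=(reps, n))
theta_hat = samples.mean(axis=1) + 0.5

mse = np.mean((theta_hat - theta) ** 2)
var = theta_hat.var()
bias_sq = (theta_hat.mean() - theta) ** 2

print(mse, var + bias_sq)  # the two numbers agree
```

The agreement is exact up to floating-point error, since the decomposition is an algebraic identity, not an approximation.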
Asymptotic Unbiasedness
An estimator 𝜃* is asymptotically unbiased if
𝐸(𝜃*) → 𝜃 as 𝑛 → ∞
i.e. as the sample size increases, the expectation of the estimator tends to the true
parameter (equivalently, the bias tends to zero).
Consistency
An estimator is said to be consistent if
𝑝𝑙𝑖𝑚(𝜃 ∗ ) = 𝜃
An estimator is consistent if the sampling distribution collapses on the value of
the parameter being estimated. A consistent estimator has the characteristic that
both the variance and bias of the estimator should tend to zero as the sample size
increases.
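A minimal simulation sketch of this collapsing behaviour, using the sample mean of a hypothetical normal population:

```python
import numpy as np

rng = np.random.default_rng(3)

# Sketch of consistency: the sampling distribution of the sample mean
# collapses on the true value as the sample size grows (hypothetical mean 5)
theta = 5.0
variances = {}
for n in (10, 100, 1000):
    means = rng.normal(theta, 2.0, size=(2000, n)).mean(axis=1)
    variances[n] = means.var()  # shrinks roughly like 4/n
    print(n, round(means.mean() - theta, 3), round(variances[n], 4))
```

The sample mean here is unbiased at every n, so consistency is driven entirely by the variance shrinking toward zero.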
Assumptions of the Classical Linear Regression Model
o We assume that the mean of the error term is zero, i.e. 𝐸(𝜀𝑖) = 0.
o Var(𝜀𝑖) = 𝐸(𝜀𝑖 − 𝐸(𝜀𝑖))² = 𝐸(𝜀𝑖²) = 𝛿² for all 𝑖. We assume that the
error terms have the same variance, which is 𝛿². If the variances of the
error terms are the same, we say we have homoskedasticity, and
heteroskedasticity if the variances are different.
o The covariance between any two error terms is zero: 𝐶𝑜𝑣(𝜀𝑖, 𝜀𝑗) =
𝐸[(𝜀𝑖 − 𝐸(𝜀𝑖))(𝜀𝑗 − 𝐸(𝜀𝑗))] = 𝐸(𝜀𝑖𝜀𝑗) = 0 for all 𝑖 ≠ 𝑗 (assumption of no
autocorrelation).
o Each error term is normally distributed with 𝐸(𝜀𝑖) = 0 and 𝑉𝑎𝑟(𝜀𝑖) = 𝛿².
The implication is that there is a larger probability of obtaining small
disturbances than large disturbances: we are more likely to obtain points
close to the population regression function than points far away from it.
NB: These assumptions are usually violated when dealing with econometric data.
Regression analysis was developed in physical laboratories using controlled
experiments; real economic data do not behave like the data of the physical sciences,
so violations occur. However, the classical linear regression model provides a good
starting point for dealing with the different aspects of regression analysis.
𝑦 = 𝛼 + 𝛽𝑥 + 𝜖 … … … … … … . (1.0)𝑃𝑅𝐹
𝑦̂ = 𝛼̂ + 𝛽̂ 𝑥 … … … … … … … . . (1.1) 𝑆𝑅𝐹
𝜖 = 𝑦 − 𝑦̂ … … … … … … (1.3)
𝑦 = 𝑦̂ + 𝜖 … … … … … … … … … . . (1.4)
𝜖 = 𝑦 − (𝛼̂ + 𝛽̂𝑥) ⇒ 𝜖 = 𝑦 − 𝛼̂ − 𝛽̂𝑥 … … … … … (1.5)
∑𝜖² = ∑(𝑦 − 𝛼̂ − 𝛽̂𝑥)² … … … … … (1.6)
In order to derive the OLS estimators we partially differentiate equation 1.6
with respect to the coefficients.
∂∑𝜖²/∂𝛼̂ = −2∑(𝑦 − 𝛼̂ − 𝛽̂𝑥) = 0
Dividing both sides by −2:
∑(𝑦 − 𝛼̂ − 𝛽̂𝑥) = 0
∑𝑦 − ∑𝛼̂ − 𝛽̂∑𝑥 = 0
⟹ ∑𝑦 = ∑𝛼̂ + 𝛽̂∑𝑥
⟹ ∑𝑦 = 𝑛𝛼̂ + 𝛽̂∑𝑥 … … … … … … … … … (1.7)
Differentiate with respect to 𝛽̂:
∂∑𝜖²/∂𝛽̂ = −2∑(𝑦 − 𝛼̂ − 𝛽̂𝑥)𝑥 = 0
Dividing both sides by −2:
∑(𝑦 − 𝛼̂ − 𝛽̂𝑥)𝑥 = 0
⇒ ∑𝑋𝑌 − 𝛼̂∑𝑋 − 𝛽̂∑𝑋² = 0
∑𝑋𝑌 = 𝛼̂∑𝑋 + 𝛽̂∑𝑋² … … … … … … … … … … … (1.8)
Solving for 𝛼̂:
∑𝑦 = 𝑛𝛼̂ + 𝛽̂∑𝑥
𝑛𝛼̂ = ∑𝑦 − 𝛽̂∑𝑥
𝛼̂ = ∑𝑌/𝑛 − 𝛽̂∑𝑋/𝑛
⇒ 𝛼̂ = 𝑌̅ − 𝛽̂𝑋̅ … … … … … … … … … … … … … (1.9)
Solving for 𝛽̂:
∑𝑋𝑌 = 𝛼̂∑𝑋 + 𝛽̂∑𝑋² ⇒ ∑𝑋𝑌 − 𝛼̂∑𝑋 = 𝛽̂∑𝑋²
∑𝑋𝑌 − (𝑌̅ − 𝛽̂𝑋̅)∑𝑋 = 𝛽̂∑𝑋²
∑𝑋𝑌 − (∑𝑌/𝑛 − 𝛽̂∑𝑋/𝑛)∑𝑋 = 𝛽̂∑𝑋²
Multiplying through by 𝑛:
𝑛∑𝑋𝑌 − (∑𝑋∑𝑌 − 𝛽̂(∑𝑋)²) = 𝑛𝛽̂∑𝑋²
𝑛∑𝑋𝑌 − ∑𝑋∑𝑌 = 𝑛𝛽̂∑𝑋² − 𝛽̂(∑𝑋)²
𝑛∑𝑋𝑌 − ∑𝑋∑𝑌 = 𝛽̂(𝑛∑𝑋² − (∑𝑋)²)
𝛽̂ = (𝑛∑𝑋𝑌 − ∑𝑋∑𝑌)/(𝑛∑𝑋² − (∑𝑋)²) … … … … … … … … … … (1.10)
In deviation form, 𝑥 = 𝑋 − 𝑋̅ ; 𝑦 = 𝑌 − 𝑌̅
∑𝑋𝑌 = 𝛼̂∑𝑋 + 𝛽̂∑𝑋²
becomes
∑𝑥𝑦 = 𝛼̂∑𝑥 + 𝛽̂∑𝑥²
Since ∑𝑥 = 0,
∑𝑥𝑦 = 𝛼̂ · 0 + 𝛽̂∑𝑥² ⇒ ∑𝑥𝑦 = 𝛽̂∑𝑥²
𝛽̂ = ∑𝑥𝑦/∑𝑥² … … … … … … … … … … … … … … (1.11)
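Equations 1.9 and 1.11 translate directly into code. A minimal sketch, using the earlier illustrative values generated from y = 10 + 0.75x with no error term:

```python
import numpy as np

def ols(X, Y):
    """OLS estimates for y = a + b*x, via equations 1.9 and 1.11."""
    x = X - X.mean()   # deviation form: x = X - X_bar
    y = Y - Y.mean()   # y = Y - Y_bar
    b_hat = (x * y).sum() / (x ** 2).sum()  # beta_hat = sum(xy) / sum(x^2)
    a_hat = Y.mean() - b_hat * X.mean()     # alpha_hat = Y_bar - beta_hat * X_bar
    return a_hat, b_hat

# With no error term the estimates recover the coefficients exactly
X = np.array([0.0, 10.0, 20.0, 30.0, 40.0])
Y = 10 + 0.75 * X
a_hat, b_hat = ols(X, Y)
print(a_hat, b_hat)  # alpha_hat = 10.0, beta_hat = 0.75
```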
Example:
Coefficient of Determination
This measures the proportion of the variation in the dependent variable that can be
attributed to the variation in the independent variable. The coefficient of determination
is given by R2 .
[Figure: scatter of Y on X with the fitted line 𝑌̂ = 𝛼̂ + 𝛽̂𝑋, showing the total variation (TSS) split into an explained part (𝑌̂ − 𝑌̅) and an unexplained/residual part (𝑌 − 𝑌̂ = 𝑒).]
Where TSS = total sum of squares, ESS = explained sum of squares and RSS = residual
sum of squares:
TSS = ∑(𝑌 − 𝑌̅)² ; ESS = ∑(𝑌̂ − 𝑌̅)² ; RSS = ∑(𝑌 − 𝑌̂)²
Thus,
(𝑌 − 𝑌̅) = (𝑌̂ − 𝑌̅) + (𝑌 − 𝑌̂)
Summing and squaring everything (the cross-product term vanishes) gives
∑(𝑌 − 𝑌̅)² = ∑(𝑌̂ − 𝑌̅)² + ∑(𝑌 − 𝑌̂)²
Recall that 𝑦 = 𝑌 − 𝑌̅, so ∑(𝑌 − 𝑌̅)² = ∑𝑦². Therefore,
∑𝑦² = ∑(𝑌̂ − 𝑌̅)² + ∑(𝑌 − 𝑌̂)²
Dividing throughout by ∑𝑦², and writing 𝑒 = 𝑌 − 𝑌̂, gives
∑𝑦²/∑𝑦² = ∑(𝑌̂ − 𝑌̅)²/∑𝑦² + ∑𝑒²/∑𝑦²
1 = 𝑅² + ∑𝑒²/∑𝑦²
𝑅² = 1 − ∑𝑒²/∑𝑦²
Equivalently,
𝑅² = 𝛽̂²∑𝑥²/∑𝑦² = 𝛽̂²S_x²/S_y² = 𝛽̂∑𝑥𝑦/∑𝑦²
Price (X)    2   7   5   1   4   8   2   8
Output (Y)  15  41  32   9  28  43  17  40
a) Estimate the bivariate linear regression model with price as the independent
variable.
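A sketch of part (a), applying equations 1.9 and 1.11 (and the last form of R²) to the data above:

```python
import numpy as np

# Price/output data from the exercise
X = np.array([2, 7, 5, 1, 4, 8, 2, 8], dtype=float)       # price
Y = np.array([15, 41, 32, 9, 28, 43, 17, 40], dtype=float)  # output

x = X - X.mean()   # deviations
y = Y - Y.mean()

beta_hat = (x * y).sum() / (x ** 2).sum()    # equation 1.11
alpha_hat = Y.mean() - beta_hat * X.mean()   # equation 1.9
r_squared = beta_hat * (x * y).sum() / (y ** 2).sum()

print(round(alpha_hat, 3), round(beta_hat, 3), round(r_squared, 3))
# approximately 6.987, 4.570, 0.969
```

So the fitted line is roughly Ŷ = 6.99 + 4.57X, and about 97% of the variation in output is attributed to variation in price.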
Correlation Coefficient
r = (𝑛∑𝑋𝑌 − ∑𝑋∑𝑌) / √[(𝑛∑𝑋² − (∑𝑋)²)(𝑛∑𝑌² − (∑𝑌)²)]
In deviation form it is calculated as follows:
r = ∑𝑥𝑦 / √(∑𝑥² · ∑𝑦²)
Covariance between 𝛼̂ and 𝛽̂
Cov(𝛼̂, 𝛽̂) = 𝐸[(𝛼̂ − 𝐸(𝛼̂))(𝛽̂ − 𝐸(𝛽̂))]
Recall 𝛼̂ = 𝑌̅ − 𝛽̂𝑋̅, so
𝐸(𝛼̂) = 𝐸(𝑌̅ − 𝛽̂𝑋̅) = 𝑌̅ − 𝑋̅𝐸(𝛽̂)
𝐸(𝛼̂) = 𝑌̅ − 𝑋̅𝛽
Therefore 𝛼̂ − 𝐸(𝛼̂) = (𝑌̅ − 𝛽̂𝑋̅) − (𝑌̅ − 𝑋̅𝛽) = −𝑋̅(𝛽̂ − 𝛽), and
Cov(𝛼̂, 𝛽̂) = 𝐸[−𝑋̅(𝛽̂ − 𝛽)(𝛽̂ − 𝛽)] = −𝑋̅𝐸(𝛽̂ − 𝛽)² = −𝑋̅ var(𝛽̂) = −𝑋̅𝛿²/∑𝑥²
The variance of 𝛽̂ is always positive, so the sign of the covariance depends on the sign
of 𝑋̅. If 𝑋̅ is positive then the covariance will be negative.
Hypothesis Testing
Refer to your Biometry lessons
a) Using the 5% level of significance, test the hypothesis that price influences output.
b) Construct the 95% confidence interval for the two coefficients
c) Calculate the covariance between the two coefficients and comment
d) Calculate the correlation coefficient and comment
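A sketch of how parts (a)-(d) might be computed for the price/output data. The critical value t(0.025, 6) = 2.447 is taken from standard t-tables, and the covariance uses the −𝑋̅𝛿²/∑𝑥² formula from the notes:

```python
import numpy as np

X = np.array([2, 7, 5, 1, 4, 8, 2, 8], dtype=float)
Y = np.array([15, 41, 32, 9, 28, 43, 17, 40], dtype=float)
n = X.size
x, y = X - X.mean(), Y - Y.mean()

beta_hat = (x * y).sum() / (x ** 2).sum()
alpha_hat = Y.mean() - beta_hat * X.mean()

e = Y - (alpha_hat + beta_hat * X)       # residuals
sigma2_hat = (e ** 2).sum() / (n - 2)    # residual variance estimate

se_beta = np.sqrt(sigma2_hat / (x ** 2).sum())
t_stat = beta_hat / se_beta              # test of H0: beta = 0

t_crit = 2.447                           # t(0.025, df = n - 2 = 6)
ci_beta = (beta_hat - t_crit * se_beta, beta_hat + t_crit * se_beta)

cov_ab = -X.mean() * sigma2_hat / (x ** 2).sum()  # -X_bar * sigma^2 / sum(x^2)
r = (x * y).sum() / np.sqrt((x ** 2).sum() * (y ** 2).sum())

print(round(t_stat, 2))   # about 13.6 > 2.447, so price is significant at 5%
print(ci_beta)            # 95% confidence interval for beta
print(round(cov_ab, 3))   # negative, as expected since X_bar > 0
print(round(r, 3))        # about 0.984, a strong positive correlation
```

The interval for 𝛼̂ follows the same pattern with se(𝛼̂) = √(𝛿̂²∑𝑋²/(𝑛∑𝑥²)).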