Introduction to Econometrics (MBA 525) - FEB 2024
1
Chapter 1: Introduction
This chapter discusses
Definition and scope of econometrics
Need, objectives and goal of econometrics
Economic vs. econometric models
Methodology of econometrics
Desirable properties of econometric models
Data structures in econometric analysis
Causality and the notion of ceteris paribus
2
The course Introduction to Econometrics
provides a comprehensive introduction to the
art and science of econometrics.
It deals with how economic theory, statistical and
mathematical methods are combined in the
analysis of business and economic data, with
the purpose of giving empirical content to
theories and then verifying or refuting them.
3
1.1 Definition and scope of econometrics
Data analysis in economics, finance, marketing,
management and other disciplines is increasingly
becoming quantitative.
This involves the estimation of parameters or
functions, the quantification of qualitative
information, and the formulation and testing of hypotheses.
Developing the quantitative relationships among
various economic variables is important to better
understand the relationships, and to provide
better guidance for economic policy making.
4
What is econometrics? Literally, econometrics
means “economic measurement”, but its scope
is much broader.
Derived from the Greek terms 'Oikonomia',
which means economy, and 'Metron', which
means measure.
"Econometrics is the science which integrates
economic theory, economic statistics, and
mathematical economics to investigate the
empirical support of the general schematic law
established by economic theory."
5
Econometrics is a special type of economic
analysis and research in which the general
economic theories, formulated in mathematical
terms, are combined with empirical
measurements of economic phenomena.
Econometrics is defined as the quantitative
analysis of actual economic phenomena.
Econometrics is the systematic study of
economic phenomena using observed data.
6
Econometrics is the study of the application of
statistical methods to the analysis of economic
phenomena.
Econometrics is the combination of economic
theory, mathematics and statistics, but it is
completely different from each one of these
three branches
Econometrics is a social science in which the
tools of economic theory, mathematics and
statistical inference are applied to the analysis
of economic phenomena.
7
Econometrics may be considered as the
integration of economics, mathematics and
statistics for the purpose of providing
numerical values for the parameters of
economic relationships.
Econometric methods are statistical methods
specifically adapted to the peculiarities of
economic phenomena.
The most important characteristic of
econometric relationships is that they contain
a random element.
8
However, such a random element is not
considered by economic theory and
mathematical economics, which postulate
exact relationships between the various
economic magnitudes.
Econometrics is the science of testing economic
theories.
Econometrics is the set of tools used for
forecasting the future values of economic
variables.
9
Econometrics is the process of fitting
mathematical economic models to real world
data.
Econometrics is the science and art of using
historical data to make numerical or
quantitative analysis for policy
recommendations in government and business
Econometrics is the science and art of using
economic theory and statistical techniques to
analyze economic data.
10
1.2. Need, objectives and goal of econometrics
A. The Need for Econometrics
Econometrics is fundamental for economic
measurement.
However, its importance extends far beyond the
discipline of economics.
Econometrics has three major uses:
1. Describing economic reality
The simplest use of econometrics is description.
We can use econometrics to quantify economic
activity because econometrics allows us to estimate
numbers and put them in equations that
previously contained only abstract symbols.
11
2. Testing hypotheses about economic theory
The second and perhaps the most common use
of econometrics is hypothesis testing, the
evaluation of alternative theories with
quantitative evidence
12
3. Forecasting future economic activity
The third and most difficult use of
econometrics is to forecast or predict what is
likely to happen in the future based on what
has happened in the past.
13
B. The goals of econometrics
Three main goals of econometrics are often
identified:
1. Analysis (i.e., testing economic theory);
2. Policy making (i.e., obtaining numerical
estimates of the coefficients of economic
relationships for policy simulations); and
3. Forecasting (i.e., using the numerical estimates
of the coefficients in order to forecast the
future values of economic magnitudes).
14
1.3 Economic vs. Econometric Models
Economic models: Any economic theory is an
abstraction from the real world, which is far too
complex to be described in full detail.
15
The sensible procedure, therefore, is to pick
out the important factors and relationships
relevant to our problem and to focus our
attention on these alone.
Such a deliberately simplified analytical
framework is called an economic model.
It is an organized set of relationships that
describes the functioning of an economic
entity under a set of simplifying
assumptions.
16
All economic reasoning is ultimately based on
models.
Economic models consist of the following
three basic structural elements:
1. A set of variables;
2. A list of fundamental relationships; and
3. A number of strategic coefficients.
17
Econometric models: As their most important
characteristic, econometric relationships
contain a random element which is ignored by
mathematical economic models which
postulate exact relationships between
economic variables.
Example: Economic theory postulates that the
demand for a commodity depends on its
price, on the prices of other related
commodities, on consumers’ income and on
tastes.
18
This is an exact relationship which can be
written mathematically as:
Q = b0 + b1P + b2P0 + b3Y + b4T
19
In our example, the demand function
studied with the tools of econometrics
would be of the stochastic form:
Q = b0 + b1P + b2P0 + b3Y + b4T + εi
20
Causes of the error
21
1.4. Methodology of econometrics
22
The elements or anatomy of the set up that
constitute an economic analysis thus
involves:
Economic Theory
Mathematical Model of Theory
Econometric Model of Theory
Data
Estimation of Econometric Model
Hypothesis Testing
Forecasting or Prediction
Using the model for control or policy
purposes
23
Fig: Methodologies of econometrics
24
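To make the steps listed above concrete, here is a minimal Python sketch (using numpy and statsmodels) that walks through estimation, hypothesis testing and a simple forecast for a two-variable model. The data are simulated and all variable names and numbers are illustrative assumptions, not part of the course material.

```python
import numpy as np
import statsmodels.api as sm

# Simulated data standing in for observed expenditure (Y) and income (X)
rng = np.random.default_rng(0)
X = rng.uniform(50, 150, size=100)            # hypothetical income values
Y = 120 + 0.6 * X + rng.normal(0, 10, 100)    # assumed relationship plus a random error

# Econometric model of theory: Y = a + b*X + e, estimated by OLS
results = sm.OLS(Y, sm.add_constant(X)).fit()
print(results.params)    # estimated intercept and slope
print(results.tvalues)   # hypothesis testing: t statistics
print(results.pvalues)

# Forecasting: predicted Y at a new income level of 200
print(results.predict([[1.0, 200.0]]))
```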
1.5. Desirable properties of Econometric
Models
Theoretical plausibility
Explanatory ability
Accuracy of the estimates of the parameters
Forecasting ability
Simplicity
25
1.6. Data structures in econometric analysis
The success of any econometric analysis ultimately
depends on the availability of the appropriate data.
It is therefore essential that we spend some time discussing
the nature, sources, and limitations of the data that one
may encounter in empirical analysis.
Sources and Types of Data
In econometrics, data come from two sources: experiments
or non-experimental observations.
Experimental data come from experiments designed to
evaluate a treatment or policy, or to investigate a causal effect.
Non-experimental data are data obtained by observing
actual behavior outside an experimental setting.
26
It is also known as observational data
Observational data are collected using surveys
such as personal interview or telephone interview
or any other methods of collecting primary data.
Observational data pose major challenges to
econometric attempts to estimate causal effects.
Whether data are experimental or observational,
data sets come in three main types: Time series,
cross-sectional and pooled data.
Data can be available for empirical analysis in the
form of time series, cross-section, pooled and
panel data
27
Time series data: These are data collected over
periods of time. Data which can take different
values in different periods of time are normally
referred to as time series data.
Cross-sectional data: Data collected at a point of
time from different places. Data collected at a
single time are known as cross-sectional data. A
cross-sectional data set consists of a sample of
individuals, households, firms, cities, countries,
regions or any other type of unit at a specific
point in time.
28
Pooled data: Data collected over periods of
time from different places. It is the
combination of both time series and cross-
sectional data.
Panel data: It is also known as longitudinal
data. It is a time series data collected from
the same sample over periods of time.
29
1.7. Causality and the notion of ceteris paribus
Simply establishing a relationship between
variables is rarely sufficient
Effects are required to be considered causal
If we’ve truly controlled for enough other
variables, then the estimated ceteris paribus
effect can often be considered to be causal
Otherwise, it can be difficult to establish
causality.
30
The concept of ceteris paribus, that is, holding
all other factors constant, is at the center of
establishing a causal relationship.
Simply finding that two variables are
correlated is rarely enough to conclude that a
change in one variable causes a change in
another.
The goal of most empirical studies in
economics and other social sciences is to
determine whether a change in one variable,
say x, causes a change in the other variable,
say y.
31
For example, does having another year of
education cause an increase in monthly salary?
Does reducing class size cause an
improvement in student performance?
Because economic variables are properly
interpreted as random variables, we should
use ideas from probability to formalize the
sense in which a change in x causes a change
in y.
32
Example: Returns to Education
A model of human capital investment implies getting
more education should lead to higher income/earnings
In the simplest case, this implies an equation like
Earnings = β0 + β1·Education + ε
33
Chapter 2: Simple Linear Regression Model
This chapter discusses
Introduction to two-variable linear regression
37
Error term
Consider the above model: Y = 0.6X + 120.
This functional relationship is deterministic or exact,
that is, given income we can determine the exact
expenditure of a household.
But in reality this rarely happens: different
households with the same income are not expected
to spend equal amounts due to habit persistence,
geographical and time variation, etc.
Thus, we should express the regression model as:
Yi = α + βXi + εi
where εi is the random error term (also called the
disturbance term).
38
General reasons for the error term
Omitted variables: a model is a simplification of
reality.
It is not always possible to include all relevant
variables in a functional form.
For instance, we may construct a model relating
demand and price of a commodity.
But demand is influenced not only by own price:
income of consumers, price of substitutes and several
other variables also influence it.
The omission of these variables from the model
introduces an error.
Measurement error: Inaccuracy in the collection and
measurement of sample data.
Sampling error: Consider a model relating
consumption (Y) with income (X) of households.
Poor households constitute the sample.
Our α and β estimation may not be as good as that
from a balanced sample group.
The size of the error εi is not fixed; it is non-
deterministic or stochastic (probabilistic) in nature.
This implies that Yi is also probabilistic in nature.
Thus, the probability distribution of Yi and its
characteristics are determined by the values of Xi and
by the probability distribution of εi.
40
Thus, a full specification of a regression model should
include a specification of the probability distribution
of the disturbance (error) term. This information is
given by what we call basic assumptions or
assumptions of the classical linear regression model
(CLRM).
Consider the model:
Yi = α + βXi + εi,  i = 1, 2, …, n
Here the subscript i refers to the i-th observation. In
the CLRM, Yi and Xi are observable while εi is not. If i
refers to some point or period of time, then we speak
of time series data. On the other hand, if i refers to
the i-th individual, object, geographical region, etc.,
then we speak of cross-sectional data.
41
2.2. Assumptions of the CLRM
1. The true model is Yi = α + βXi + εi, where α is the
intercept, β is the slope parameter, and εi is the
error term (stochastic or disturbance term).
2. The error terms have zero mean: E(εi) = 0. This is
often called the zero conditional mean assumption.
3. Homoscedasticity (the error terms have constant
variance): Var(εi) = E(εi²) = σ².
4. No error autocorrelation (the error terms εi are
statistically independent of each other): Cov(εi, εj) =
E(εiεj) = 0 for all i ≠ j.
5. Xi are deterministic (non-stochastic): Xi and εj are
independent for all i and j.
6. Normality: the εi are normally distributed with mean zero
and variance σ² for all i (written as εi ~ N(0, σ²)).
Let us examine the meaning of these assumptions:
Assumption (1) states that the relationship between Yi
and Xi is linear, and that the deterministic component
(α + βXi) and the stochastic component (εi) are
additive.
The model is linear in parameters and εi is a random
real number.
Assumption (2) tells us that the mean of the Yi is:
E(Yi) = α + βXi
This simply means that the mean value of Yi is non-
stochastic.
Assumption (3) tells us that every disturbance has the
same variance σ² whose value is unknown, that is,
regardless of whether the Xi are large or small, the
dispersion of the disturbances is the same.
43
For example, the variation in consumption level of
low income households is the same as that of high
income households.
Assumption (4) states that the disturbances are
uncorrelated. For example, the fact that output is
higher than expected today should not lead to a higher
(or lower) than expected output tomorrow.
Assumption (5) states that Xi are not random
variables, and that the probability distribution of εi is
in no way affected by the Xi.
44
We need assumption (6) for parameter estimation
purposes and also to make inferences on the basis of
the normal (t and F) distribution.
Specifying the model and stating its underlying
assumptions are the first stage of any econometric
application.
The next step is the estimation of the numerical values
of the parameters of economic relationships.
The parameters of the simple linear regression model
can be estimated by various methods.
45
2.3. The ordinary least squares (OLS)
method of estimation
Three of the most commonly used methods
are:
1. Ordinary Least Square (OLS) method
2. Maximum Likelihood (MLM) Method
3. Method of Moments (MM) method
But, here we will deal with the OLS and
the MLM methods of estimation.
46
2.3. The ordinary least squares (OLS) method of
estimation
In the regression model Yi = α + βXi + εi, the values of
the parameters α and β are not known. When they
are estimated from a sample of size n, we obtain the
sample regression line given by:
Ŷi = α̂ + β̂Xi, where β̂ = Σxiyi / Σxi² and α̂ = Ȳ − β̂X̄
(with xi = Xi − X̄ and yi = Yi − Ȳ).
48
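A minimal numpy sketch of these estimators on made-up (Xi, Yi) data; the numbers are invented purely to show how β̂ = Σxiyi/Σxi² and α̂ = Ȳ − β̂X̄ are computed in deviation form.

```python
import numpy as np

# Hypothetical sample of X (e.g., income) and Y (e.g., expenditure)
X = np.array([10., 20., 30., 40., 50.])
Y = np.array([25., 31., 44., 50., 61.])

x = X - X.mean()    # deviations from the mean
y = Y - Y.mean()

beta_hat = np.sum(x * y) / np.sum(x ** 2)    # slope: sum(xi*yi) / sum(xi^2)
alpha_hat = Y.mean() - beta_hat * X.mean()   # intercept: Ybar - beta_hat * Xbar
print(alpha_hat, beta_hat)
```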
Proving the theorem
Here we will prove that β̂ is the BLUE of β.
a) To show that β̂ is a linear estimator of β
The OLS estimator of β can be expressed as
β̂ = Σxiyi / Σxi² = ΣkiYi, where ki = xi / Σxi²,
so β̂ is a linear function of the observations Yi.
b) To show that β̂ is an unbiased estimator of β
Substituting Yi = α + βXi + εi into β̂ = ΣkiYi gives β̂ = β + Σkiεi, so that
E(β̂) = β + ΣkiE(εi), since E(β) = β (β is a constant).
Since Xi is non-stochastic (assumption 5) and
E(εi) = 0 (assumption 2), we have
E(β̂) = β.
Hence, β̂ is an unbiased estimator of β.
51
c) To show that 𝛽 has the smallest variance out
of all linear unbiased estimators of β
Note:
1. The OLS estimators α̂ and β̂ are calculated from a
specific sample of observations of the dependent
and independent variables.
If we consider a different sample of observations, we
would obtain different estimates; the variance of β̂
describes this sampling variability.
Since β̂ = β + Σkiεi, we have Var(β̂) = E(β̂ − β)² = E(Σkiεi)²   (*).
Expanding (*), this is simply the sum of the squared terms (ki²εi²)
plus the sum of the cross products (kikjεiεj for i ≠ j).
From equation (*), we have
Var(β̂) = Σki²E(εi²) + Σ(i≠j) kikjE(εiεj)   (**)
54
55
Note that (**) follows from assumptions (3)
and (4), that is, Var(εi) = E(εi²) = σ² for all i
and Cov(εi, εj) = E(εiεj) = 0 for all i ≠ j.
Hence, Var(β̂) = σ²Σki² = σ² / Σxi²
We have seen above (in proof (a)) that the
OLS estimator of β can be expressed as β̂ = ΣkiYi.
Let β* be another linear unbiased estimator
of β given by β* = ΣciYi, where ci = ki + di and
the di are arbitrary constants (not all zero).
Unbiasedness of β* requires Σci = 0 and ΣciXi = 1, which in
turn requires Σdi = 0 and ΣdiXi = 0, so that Σkidi = 0.
Then Var(β*) = σ²Σci² = σ²Σki² + σ²Σdi² = Var(β̂) + σ²Σdi² ≥ Var(β̂).
To summarize,
1. β̂ is a linear estimator of β.
2. β̂ is an unbiased estimator of β.
3. β̂ has the smallest variance compared to
any other linear unbiased estimator of β.
Hence, we conclude that β̂ is the BLUE of β.
60
2.5. Statistical inference in simple linear regression
model
A. Estimation of standard error
To make statistical inferences about the true
(population) regression coefficient β, we make use of
the estimator β̂ and its variance Var(β̂).
We have already seen that:
Var(β̂) = σ² / Σxi², where xi = Xi − X̄.
Since this variance depends on the unknown
parameter σ², we have to estimate it.
As shown above, an unbiased estimator of σ² is given
by:
σ̂² = Σε̂i² / (n − 2), where the ε̂i are the OLS residuals.
61
B. Test of model adequacy
Is the estimated equation a useful one?
To answer this, an objective measure of some
sort is desirable.
The total variation in the dependent variable
Y is given by:
Total variation in Y = Σ(Yi − Ȳ)²
Our goal is to partition this variation into
two: one that accounts for variation due to
the regression equation (explained portion)
and another that is associated with the
unexplained portion of the model.
We can think of each observation as being
made up of an explained part and an unexplained part:
yi = ŷi + ûi
We then define the following:
Total sum of squares: TSS = Σ(yi − ȳ)²
Explained or regression sum of squares: RSS = Σ(ŷi − ȳ)² = β̂²Σxi²
Error (residual) sum of squares: ESS = Σûi²
63
TSS = RSS + ESS
64
In other words, the total sum of squares (TSS) is
decomposed into regression (explained) sum of
squares (RSS) and error (residual or unexplained)
sum of squares (ESS)
The total sum of squares (TSS) is a measure of
dispersion of the observed values of Y about their
mean.
The regression (explained) sum of squares (RSS)
measures the amount of the total variability in the
observed values of Y that is accounted for by the
linear relationship between the observed values of
X and Y.
65
The error (residual or unexplained) sum of squares
(ESS) is a measure of the dispersion of the
observed values of Y about the regression line.
If a regression equation does a good job of
describing the relationship between two variables,
the explained sum of squares should constitute a
large proportion of the total sum of squares.
Thus, it would be of interest to determine the
magnitude of this proportion by computing the
ratio of the explained sum of squares to the total
sum of squares.
66
This proportion is called the sample coefficient of
determination, R². That is, the coefficient of determination (R²) is:
R² = RSS/TSS = 1 − (ESS/TSS)
1) The proportion of total variation in the dependent
variable (Y) that is explained by changes in the
independent variable (X), or by the regression line, is equal
to R² × 100%.
2) The proportion of total variation in the dependent variable
(Y) that is due to factors other than X (for example, due to
excluded variables, chance, etc.) is equal to (1 − R²) × 100%.
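As a quick numerical check of the decomposition TSS = RSS + ESS and of R², the short sketch below reuses the simple OLS formulas on the same made-up data as before; all values are illustrative assumptions.

```python
import numpy as np

X = np.array([10., 20., 30., 40., 50.])
Y = np.array([25., 31., 44., 50., 61.])

x, y = X - X.mean(), Y - Y.mean()
beta = np.sum(x * y) / np.sum(x ** 2)
alpha = Y.mean() - beta * X.mean()
Y_hat = alpha + beta * X
resid = Y - Y_hat

TSS = np.sum((Y - Y.mean()) ** 2)      # total sum of squares
RSS = np.sum((Y_hat - Y.mean()) ** 2)  # explained (regression) sum of squares
ESS = np.sum(resid ** 2)               # residual (error) sum of squares

print(TSS, RSS + ESS)                  # equal up to rounding
print("R-squared:", RSS / TSS)
```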
67
Test for the coefficient of determination (R2)
The largest value that R2 can assume is 1 (in which
case all observations fall on the regression line), and
the smallest it can assume is zero.
A low value of R² is an indication that
X is a poor explanatory variable, in the sense that
it explains only a small part of the variation in Y.
70
To test for the significance of R², we compare the
variance ratio F = RSS / (ESS/(n − 2)) with the critical value from the F
distribution with 1 and (n − 2) degrees of freedom in the
numerator and denominator, respectively, for a given
significance level α.
Decision: If the calculated variance ratio exceeds the
tabulated value, that is, if Fcal > Fα(1, n − 2), we then
conclude that R² is significant (or that the linear
regression model is adequate).
The F test is designed to test the significance of all
variables or a set of variables in a regression model.
In the two-variable model, however, it is used to test the
explanatory power of a single variable (X), and at the
same time, is equivalent to the test of significance of R2
71
Illustrative Example 1: SLR Empirics
Consider the following data on the percentage
rate of change in electricity consumption
(millions KWH) (Y) and the rate of change in
the price of electricity (Birr/KWH) (X) for the
years 1979 – 1994.
72
Year X Y Year X Y
1979 -0.13 17.93 1987 2.57 52.17
1980 0.29 14.56 1988 0.89 39.66
1981 -0.12 32.22 1989 1.80 21.80
1982 0.42 2.20 1990 7.86 -49.51
1983 0.08 54.26 1991 6.59 -25.55
1984 0.80 58.61 1992 -0.37 6.43
1985 0.24 15.13 1993 0.16 15.27
1986 -1.09 39.25 1994 0.50 60.40
Summary statistics
Note: xi = Xi − X̄ and yi = Yi − Ȳ
n = 16; X̄ = 1.280625; Ȳ = 23.42688; Σxi² = 92.20109; Σyi² = 13228.7; Σxiyi = −779.235
73
Based on the above information,
a) Compute the value of the regression
coefficients
b) Estimate the regression equation
c) Test whether the estimated regression
equation is adequate
d) Test whether the change in price of electricity
significantly affects its consumption.
74
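A sketch of how parts (a) to (d) could be worked out from the summary statistics alone. Here Σxiyi is taken as −779.235, consistent with the negative price-consumption pattern in the table; treat the numbers as reconstructed assumptions rather than an authoritative answer key.

```python
import math
from scipy import stats

n = 16
X_bar, Y_bar = 1.280625, 23.42688
Sxx, Syy, Sxy = 92.20109, 13228.7, -779.235   # sum of xi^2, yi^2, xi*yi (deviations)

# (a)-(b) regression coefficients and fitted equation
beta = Sxy / Sxx
alpha = Y_bar - beta * X_bar
print(f"Y_hat = {alpha:.2f} + ({beta:.2f})*X")

# (c) model adequacy: R-squared and the F test with (1, n-2) degrees of freedom
RSS = beta ** 2 * Sxx
ESS = Syy - RSS
R2 = RSS / Syy
F = RSS / (ESS / (n - 2))
F_crit = stats.f.ppf(0.95, 1, n - 2)
print(R2, F, F_crit, F > F_crit)

# (d) t test of H0: beta = 0 against a two-sided alternative
sigma2_hat = ESS / (n - 2)
se_beta = math.sqrt(sigma2_hat / Sxx)
t = beta / se_beta
t_crit = stats.t.ppf(0.975, n - 2)
print(t, t_crit, abs(t) > t_crit)
```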
Chapter 3
Multiple Linear Regression Models
This chapter discusses
Introduction to k-variables linear regression
Assumptions
75
3.1. Introduction
So far we have seen the basic statistical tools
and procedures for analyzing relationships
between two variables.
But in practice, economic models generally
contain one dependent variable and two or
more independent variables.
Such models are called multiple linear
regression models
76
Example 1
In demand studies we study the relationship
between the demand for a good (Y) and price
of the good (X2), prices of substitute goods
(X3) and the consumer’s income (X4 ). Here,
Y is the dependent variable and X2, X3 and
X4 are the explanatory (independent)
variables. The relationship is estimated by a
multiple linear regression equation (model)
of the form:
Ŷ = β̂1 + β̂2X2 + β̂3X3 + β̂4X4
77
Example 2
In a study of the amount of output (product),
we are interested to establish a relationship
between output (Q) and labour input (L) &
capital input (K). The equations are often
estimated in log-linear form as:
ln Q = β̂1 + β̂2 ln L + β̂3 ln K
78
Example 3
In a study of the determinants of the number
of children born per woman (Y), the possible
explanatory variables include years of
schooling of the woman (X2 ), woman’s (or
husband’s) earning at marriage (X3), age of
woman at marriage (X4) and survival
probability of children at age five (X5).
The relationship can thus be expressed as:
Ŷ = β̂1 + β̂2X2 + β̂3X3 + β̂4X4 + β̂5X5
82
The only additional assumption here is that
there is no multicollinearity, meaning that
there is no linear dependence between the
regressor variables X2, X3, ….XK
Under the above assumptions, ordinary least
squares (OLS) yields best linear unbiased
estimators (BLUE) of β2, β3, …. βK
83
3.3. Estimation of parameters and SEs
Consider the three-variable model Yi = β1 + β2X2i + β3X3i + εi,
written in deviation form as yi = β2x2i + β3x3i + ei. The OLS estimators are:
β̂2 = [Σx2iyi · Σx3i² − Σx3iyi · Σx2ix3i] / [Σx2i² · Σx3i² − (Σx2ix3i)²]
β̂3 = [Σx3iyi · Σx2i² − Σx2iyi · Σx2ix3i] / [Σx2i² · Σx3i² − (Σx2ix3i)²]
and β̂1 = Ȳ − β̂2X̄2 − β̂3X̄3.
84
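A small numpy sketch of these deviation-form formulas for a three-variable model, using invented data and cross-checking the result against the matrix solution; the coefficients and sample are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
X2 = rng.normal(10, 2, 50)
X3 = rng.normal(5, 1, 50)
Y = 4 + 1.5 * X2 - 2.0 * X3 + rng.normal(0, 1, 50)   # invented "true" relationship

x2, x3, y = X2 - X2.mean(), X3 - X3.mean(), Y - Y.mean()
S22, S33, S23 = np.sum(x2**2), np.sum(x3**2), np.sum(x2*x3)
S2y, S3y = np.sum(x2*y), np.sum(x3*y)

den = S22 * S33 - S23 ** 2
b2 = (S2y * S33 - S3y * S23) / den
b3 = (S3y * S22 - S2y * S23) / den
b1 = Y.mean() - b2 * X2.mean() - b3 * X3.mean()
print(b1, b2, b3)

# Cross-check with the matrix formula beta = (X'X)^-1 X'Y
X = np.column_stack([np.ones_like(Y), X2, X3])
print(np.linalg.solve(X.T @ X, X.T @ Y))
```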
Variance of the MLR estimators
Now we know that the sampling distribution of
our estimate is centered around the true
parameter
Want to think about how spread out this
distribution is
Much easier to think about this variance under
an additional assumption, so
Assume Var(u|x1, x2,…, xk) = s2
(Homoskedasticity)
Let x stand for (x1, x2,…xk)
Assuming that Var(u|x) = s2 also implies that
Var(y| x) = s2 85
4. The coefficient of determination (R2)
test of model adequacy
How do we think about how well our
sample regression line fits our sample data?
Can compute the fraction of the total sum
of squares (SST) that is explained by the
model, call this the R-squared of regression
R2 = RSS/TSS = 1 – ESS/TSS
86
More about R-squared
R2 can never decrease when another
independent variable is added to a
regression, and usually will increase
Because R2 will usually increase with the
number of independent variables, it is not a
good way to compare models
87
Too Many or Too Few Variables
90
Normal sampling distributions
Under the CLM assumptions, conditional on
the sample values of the independent variables,
b̂j ~ Normal(bj, Var(b̂j)), so that
(b̂j − bj) / sd(b̂j) ~ Normal(0, 1)
92
Knowing the sampling distribution for the
standardized estimator allows us to carry
out hypothesis tests
Start with a null hypothesis
For example, H0: bj=0
If we accept the null, then we accept that xj has no
effect on y, controlling for the other x's
If we reject the null, we conclude that xj does
affect y, controlling for the other x's
93
To perform our test we first need to form
"the" t statistic for b̂j:
t = b̂j / se(b̂j)
We will then use our t statistic along with
a rejection rule to determine whether to
accept the null hypothesis, H0
94
t -test: One-sided alternatives
Besides our null, H0, we need an alternative
hypothesis, H1, and a significance level
H1 may be one-sided, or two-sided
H1: bj > 0 and H1: bj < 0 are one-sided
H1: bj ≠ 0 is a two-sided alternative
If we want to have only a 5% probability of
rejecting H0 if it is really true, then we say
our significance level is 5%
95
Having picked a significance level, a, we
look up the (1 – a)th percentile in a t
distribution with n – k – 1 df and call this c,
the critical value
We can reject the null hypothesis if the t
statistic is greater than the critical value
If the t statistic is less than the critical value
then we fail to reject the null
96
Model: yi = b0 + b1xi1 + … + bkxik + ui
Fig.: one-sided rejection region: "fail to reject" (area 1 − a) to the left of the critical value c, "reject" (area a) in the upper tail beyond c
97
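A hedged sketch of this decision rule with scipy: given an estimated coefficient, its standard error and the degrees of freedom (all numbers invented), compare the t statistic with the critical value c for a one-sided test at α = 5%.

```python
from scipy import stats

b_hat, se_b = 0.82, 0.31         # hypothetical estimate and standard error
n, k = 60, 3                     # sample size and number of regressors (excluding constant)
df = n - k - 1

t_stat = b_hat / se_b            # t statistic for H0: bj = 0
alpha = 0.05
c = stats.t.ppf(1 - alpha, df)   # (1 - alpha)th percentile of t with n-k-1 df

# Reject H0 in favour of H1: bj > 0 if the t statistic exceeds the critical value
print(t_stat, c, "reject H0" if t_stat > c else "fail to reject H0")
```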
One-sided vs two-sided
Because the t distribution is symmetric,
testing H1: bj < 0 is straightforward.
The critical value is just the negative of
before
We can reject the null if the t statistic < −c;
if the t statistic > −c, then we fail to
reject the null
For a two-sided test, we set the critical
value based on a/2 and reject H0: bj = 0 in favour of H1: bj ≠ 0 if
the absolute value of the t statistic > c
98
Two-sided alternatives
Model: yi = b0 + b1xi1 + … + bkxik + ui
Fig.: two-sided rejection regions: "reject" (area a/2) in each tail beyond −c and c, "fail to reject" (area 1 − a) between −c and c
99
Testing hypotheses
A more general form of the t statistic
recognizes that we may want to test
something like H0: bj = aj
In this case, the appropriate t statistic is
t = (b̂j − aj) / se(b̂j), where
aj = 0 for the standard test
100
Computing p-values for t tests
An alternative to the classical approach is
to ask, “what is the smallest significance
level at which the null would be rejected?”
So, compute the t statistic, and then look up
what percentile it is in the appropriate t
distribution – this is the p-value
p-value is the probability we would observe
the t statistic we did, if the null were true
101
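For instance, the p-value for a given t statistic can be computed as below; the t statistic and degrees of freedom are illustrative assumptions.

```python
from scipy import stats

t_stat, df = 2.1, 45                            # hypothetical t statistic and degrees of freedom
p_two_sided = 2 * stats.t.sf(abs(t_stat), df)   # P(|T| > |t|) under H0
p_one_sided = stats.t.sf(t_stat, df)            # upper-tail p-value for H1: bj > 0
print(p_two_sided, p_one_sided)
```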
Illustration 2: Multiple Linear Regression Empirics
Consider the following data of a country on per
capita food consumption (Y), price of food (X2
) and per capita income (X3 ) for the years
1927-1941. Retail price of food and per capita
disposable income are deflated by the
Consumer Price Index.
102
Year Y X2 X3 Year Y X2 X3
1927 88.9 91.7 57.7 1935 85.4 88.1 52.1
1928 88.9 92.0 59.3 1936 88.5 88.0 58.0
1929 89.1 93.1 62.0 1937 88.4 88.4 59.8
1930 88.7 90.9 56.3 1938 88.6 83.5 55.9
106
Computing p-values and t tests with
statistical packages
Most computer packages will compute the
p-value for you, assuming a two-sided test
If you really want a one-sided alternative,
just divide the two-sided p-value by 2
Stata provides the t statistic, p-value, and
95% confidence interval for H0: bj = 0 for
you, in columns labeled “t”, “P > |t|” and
“[95% Conf. Interval]”, respectively
107
Given multiple regression stata output for income as dependent
variable and temperature, altitude, cities, wage, education,
ownership, and location as explanatory variables and _cons is a
constant term. Based on this, answer the questions that follow.
The following table is generated using the command..
regress income temperature altitude cities wage education
ownership location
income Coef. Std. Err. t P>|t| [95% Conf. Interval]
108
Questions
Which of the explanatory variables do
significantly affect the income level at 1%
significance level?
Which of the explanatory variables do not
significantly affect the income level at 1%
significance level?
Which of the explanatory variables significantly
negatively affect the income level at 1%
significance level?
Identify a variable which is not significant at 5%,
but remains significant at 10% level.
Identify variables which are insignificant.
109
3.6. Matrix forms of multiple regression
We can use OLS forms to analyze a system of
equations using matrices
For any given points (X1, Y1), (X2, Y2), …, (Xn, Yn),
the OLS regression line can be given as:
Y = b0 + b1X + ε
For each observation:
Y1 = b0 + b1X1 + ε1
Y2 = b0 + b1X2 + ε2
…
Yn = b0 + b1Xn + εn
110
Now, let us set a matrix equation using the above as:
Y = (Y1, Y2, …, Yn)′ is the n×1 vector of observations on the dependent variable;
X is the n×2 matrix whose i-th row is (1, Xi);
b = (b0, b1)′ is the vector of coefficients; and
ε = (ε1, ε2, …, εn)′ is the vector of disturbances,
so that Y = Xb + ε.
111
In using OLS, we are minimizing the ESS:
ESS = ε1² + ε2² + … + εn²
In matrix form, this means
ESS = ε′ε
Since ε = Y − Xβ,
ESS = (Y − Xβ)′(Y − Xβ)
Using the apostrophe for the transpose, we have
ESS = (Y′ − β′X′)(Y − Xβ)
ESS = Y′Y − Y′Xβ − β′X′Y + β′X′Xβ
Setting ∂ESS/∂β = 0:
−X′Y − X′Y + 2X′Xβ = 0
−2X′Y + 2X′Xβ = 0
X′Xβ = X′Y
(X′X)⁻¹X′Xβ = (X′X)⁻¹X′Y
β̂ = (X′X)⁻¹X′Y
113
Illustration: Determining the OLS regression line using matrices
Y = (124, 95, 71, 45, 18)′ and X is the 5×2 matrix whose first column is
a column of ones and whose second column is (49, 69, 89, 99, 109)′,
so that Y = Xb + ε.
115
Now using Y = Xb + ε, we need to find b = (b0, b1)′ using
b̂ = (X′X)⁻¹(X′Y)
X′X = [[5, 415], [415, 36765]]   (the entries are n, ΣXi and ΣXi²)
(X′X)⁻¹ = (1/11600) · [[36765, −415], [−415, 5]]
Recall: if A = [[a, b], [c, d]], then A⁻¹ = (1/(ad − bc)) · [[d, −b], [−c, a]]
116
Next we need to get X′Y:
X′Y = (ΣYi, ΣXiYi)′ = (353, 25367)′
b̂ = (b̂0, b̂1)′ = (X′X)⁻¹X′Y = (1/11600) · [[36765, −415], [−415, 5]] · (353, 25367)′ ≈ (211, −1.7)′
117
Now, we can compute the individual residuals, ε̂i = yi − ŷi, and the total SSE:
ε̂ = (−3.7, 1.3, 11.3, 2.3, −7.7)′;
SSE = (−3.7)² + (1.3)² + (11.3)² + (2.3)² + (−7.7)² = 207.65
Note:
The error sum of squares (ESS), or sum of squares of errors (SSE), is about 208.
At the price of ETB 54, Ŷ = 211 − 1.7 × 54 = 211 − 91.8 = 119.2
Therefore, according to the model, if the price is ETB 54, we expect that
the quantity demanded is about 119 units.
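The same computation can be reproduced with numpy; the sketch below follows the slide's five price-quantity observations and the formula b̂ = (X′X)⁻¹X′Y, so only the exact rounding differs from the hand calculation.

```python
import numpy as np

Y = np.array([124., 95., 71., 45., 18.])
X = np.column_stack([np.ones(5), [49., 69., 89., 99., 109.]])

XtX = X.T @ X                    # [[5, 415], [415, 36765]]
XtY = X.T @ Y                    # [353, 25367]
b = np.linalg.solve(XtX, XtY)    # b0 ~ 211.3, b1 ~ -1.695
print(b)

resid = Y - X @ b
print(resid, np.sum(resid ** 2)) # SSE ~ 205 here; the slide's 207.65 uses the rounded b = (211, -1.7)
print(b[0] + b[1] * 54)          # predicted quantity at a price of 54
```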
118
Chapter 4: Estimation problems under
violations of the assumptions of OLS
4.1. Multicollinearity
In the construction of an econometric model, it
may happen that two or more variables giving rise
to the same piece of information are included,
That is, we may have redundant information or
unnecessarily included related variables
This is what we call a multicollinearity (MC)
problem.
119
The dependent variable Y is of size nx1.
The explanatory variables are also of size nX1.
Y=Xβ+𝜀, in general terms
Perfect MC exists if two or more explanatory variables
are perfectly correlated, that is, if a relationship of the form
b2X2 + b3X3 + … + bnXn = 0
(with the b's not all zero) exists between the explanatory variables.
One consequence of perfect MC is non-identifiability
of the regression coefficient vector β .
This means that one cannot distinguish between two
different models: Y = Xβ + ε and Y = Xβ* + ε (with β ≠ β*).
These two models are said to be observationally
equivalent.
120
Consider
Model 1: Y = b2X2 + b3X3, and
Model 2: Y = b3X3.
Then, if there is perfect MC between the regressors of the form
b2X2 + b3X3 = 0 with b2 = −1, that is, X2 = b3X3,
substituting X2 into Model 1 makes it, like Model 2, a function of X3 alone.
Therefore, Model 1 and Model 2 are observationally equivalent:
the data cannot distinguish between them.
Another problem is that under perfect MC, we
can not estimate the regression coefficients
For instance, consider
Yi = b1 + b2X2i + b3X3i + … + bkXki + εi,
or Yi = b1 + b2X2i + b3X3i + εi for k = 3.
Suppose b2 = 1 and b3 = −5.
Then, under perfect MC,
b2X2i + b3X3i = 0, which means
X2 = 5X3
122
Consider parametric estimation under MLR:
b̂2 = [Σx2iyi · Σx3i² − Σx3iyi · Σx2ix3i] / [Σx2i² · Σx3i² − (Σx2ix3i)²]
Substituting x2i = 5x3i (perfect MC):
b̂2 = [5Σx3iyi · Σx3i² − Σx3iyi · 5Σx3i²] / [25Σx3i² · Σx3i² − (5Σx3i²)²] = 0/0
which is indeterminate: under perfect MC the regression
coefficients cannot be estimated.
125
Thus, under a high degree of MC, the
standard errors will be inflated and the test
statistic will be a very small number.
This often leads to incorrectly accepting
(not rejecting) the null hypothesis when in
fact the parameter is significantly different
from zero!
However, these two extreme cases (no MC and perfect MC)
rarely exist in practice; of particular interest are the cases
in between, that is, a moderate to high degree of MC.
126
Such kind of MC is so common in
macroeconomic time series data (such as GNP,
money supply, income, etc) since economic
variables tend to move together over time.
Consequences of MC
Under a high degree of MC, the standard errors
will be inflated and the test statistic will be a
very small number.
This often leads to incorrectly accepting (not
rejecting) the null hypothesis when in fact the
parameter is significantly different from zero!
127
Major implications of a high degree of MC
1. OLS coefficient estimates are still unbiased.
2. OLS coefficient estimates will have large variances (or
the variances will be inflated).
3. There is a high probability of accepting the null
hypothesis of zero coefficient (using the t-test) when
in fact the coefficient is significantly different from
zero.
4. The regression model may do well, that is, R-squared
may be quite high.
5. The OLS estimates and their standard errors may be
quite sensitive to small changes in the data.
128
Methods of detection of MC
Multicollinearity almost always exists in most
applications.
So the question is not whether it is present or
not; it is a question of degree!
MC is not a statistical problem; it is a data
(sample) problem.
Therefore, we do not “test for MC’’; but
measure its degree in any particular sample
(using some rules of thumb).
129
The speed with which variances and covariances increase
can be seen with the variance-inflating factor,
VIF = 1 / (1 − r²), where r is the correlation coefficient
between two regressors.
130
Some of the other methods of detecting MC are:
1. High R² but few (or no) significant t-ratios.
2. High pair-wise correlations among the regressors.
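As a rule-of-thumb check, the VIF mentioned above can be computed by regressing each regressor on the others. A minimal sketch on simulated, deliberately collinear data (all names and numbers are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
x2 = rng.normal(size=200)
x3 = 0.95 * x2 + rng.normal(scale=0.3, size=200)   # x3 built to be highly correlated with x2

# VIF for x3: regress x3 on x2 (plus a constant) and use 1 / (1 - R^2)
X = np.column_stack([np.ones(200), x2])
coef, *_ = np.linalg.lstsq(X, x3, rcond=None)
resid = x3 - X @ coef
r2 = 1 - resid @ resid / np.sum((x3 - x3.mean()) ** 2)
vif = 1.0 / (1.0 - r2)
print(r2, vif)   # a VIF well above 10 is a common warning sign of severe MC
```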
131
Remedial measures
To circumvent the problem of MC, some of the
possibilities are:
1. Dropping a variable. This may result in an incorrect
specification of the model (called specification bias).
132
2. Transformation of variables
By transforming the variable, it could be
possible to reduce the effect of
multicollinearity.
3. Increasing the sample size
By increasing the sample, high covariances among
estimated parameters resulting from multicollinearity
in an equation can be reduced, because these
covariances are inversely proportional to sample size.
133
4.2. Autocorrelation
Autocorrelation exists when two or more error
terms are serially correlated.
Non-autocorrelation or absence of serial
correlation assumption tells us that the error
term at time t is not correlated with the error
term at any other point of time.
This means that when observations are made
over time, the effect of the disturbance occurring
at one period does not carry-over into another
period.
134
In case of cross-sectional data such as those on
income and expenditure of different
households, the assumption of non-
autocorrelation is plausible since the
expenditure behaviour of one household does
not affect the expenditure behaviour of any
other household in general.
The assumption of non-autocorrelation is more
frequently violated in case of relations
estimated from time series data.
135
For instance, in a study of the relationship
between output and inputs of a firm or industry
from monthly observations, non-
autocorrelation of the disturbance implies that
the effect of machine breakdown is strictly
temporary in the sense that only the current
month’s output is affected.
But in practice, the effect of a machine
breakdown in one month may affect current
month’s output as well as the output of
subsequent months.
136
In a study of the relationship between demand and price of
electricity from monthly observations, the effect of price
change in a certain month will affect the consumption
behaviour of households (firms) in subsequent months (that is,
the effect will be felt for months to come).
Thus, the assumption of non-autocorrelation does not seem
plausible here.
In general, there are a lot of conditions under which the errors
are autocorrelated (AC).
In such a case, we have cov(εt, εt+1) = E(εt εt+1) ≠ 0.
In order to see the consequences of AC, we have to specify the
nature (mathematical form) of the AC.
Usually we assume that the errors (disturbances) follow the
first-order autoregressive scheme (abbreviated as AR(1)).
137
The error process in AR(1) is
εt = ρ εt−1 + ut
139
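To illustrate, the sketch below simulates AR(1) disturbances and computes the Durbin-Watson statistic d = Σ(et − et−1)² / Σet²; values of d well below 2 point to positive autocorrelation. The series, ρ and sample size are simulated assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
T, rho = 200, 0.7
u = rng.normal(size=T)

eps = np.zeros(T)
for t in range(1, T):
    eps[t] = rho * eps[t - 1] + u[t]   # AR(1) scheme: e_t = rho * e_{t-1} + u_t

# Durbin-Watson statistic: close to 2 means no first-order autocorrelation,
# well below 2 suggests positive autocorrelation (d is roughly 2*(1 - rho))
dw = np.sum(np.diff(eps) ** 2) / np.sum(eps ** 2)
print(dw)
```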
4.3. Heteroskedasticity
Recall the assumption of homoskedasticity
implies that conditional on the explanatory
variables, the variance of the unobserved error, u,
was constant.
If this is not true, that is if the variance of u is
different for different values of the x’s, then the
errors are heteroskedastic
Example: in estimating returns to education, ability is
unobservable, and we might think that the variance in
unobserved ability differs by educational attainment.
140
Example of Heteroskedasticity
Fig.: the conditional density f(y|x) becomes more spread out as x increases (x1, x2, x3), while the conditional mean follows the line E(y|x) = b0 + b1x
141
Thus, under heteroskedasticity,
Var(εi) = E(εi²) = kiσ² (varying with i) instead of Var(εi) = E(εi²) = σ²,
and the variance of the OLS slope estimator becomes
Var(β̂HET) = Σxi²·Var(εi) / (Σxi²)².
142
Thus, under heteroscedasticity, the OLS
estimators of the regression coefficients are no
longer BLUE and are not efficient.
Generally, under error heteroscedasticity we have
the following:
1. The OLS estimators of the regression coefficients
are still unbiased and consistent.
2. The estimated variances of the OLS estimators are
biased and the conventionally calculated confidence
intervals and test of significance are invalid.
143
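To illustrate point 2, the sketch below simulates a regression whose error variance grows with x and compares the conventional OLS variance of the slope with a simple heteroskedasticity-consistent (White HC0-type) estimate. The data are simulated and the robust formula shown is only one common variant, not the slides' own method.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
X = rng.uniform(1, 10, n)
e = rng.normal(scale=0.5 * X, size=n)   # error standard deviation increases with X
Y = 2.0 + 0.8 * X + e

x = X - X.mean()
beta = np.sum(x * (Y - Y.mean())) / np.sum(x ** 2)
alpha = Y.mean() - beta * X.mean()
resid = Y - alpha - beta * X

Sxx = np.sum(x ** 2)
var_conv = (resid @ resid / (n - 2)) / Sxx              # conventional formula, assumes constant variance
var_hc0 = np.sum((x ** 2) * (resid ** 2)) / Sxx ** 2    # White (HC0): sum(xi^2 * e_i^2) / (sum xi^2)^2
print(var_conv ** 0.5, var_hc0 ** 0.5)                  # the conventional SE is misleading here
```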
Consequences of Heteroskedasticity
145
Assumptions of MLE
1. The form of the distribution of the parent population
of Y's is assumed known. In particular we assume
that the distribution of Yi is normal.
2. The sample is random, and each ui is independent of
any other value uj (or, equivalently, Yi is independent
of Yj).
3. The random sampling always yields the single most
probable result: any sample is representative of the
underlying population.
4. This is a strong assumption, especially for small
samples
146
This probability may be computed from the
frequency function of the variable X if we
know its parameters, that is, if we know the
mean, the variance or other constants
which define the distribution.
The probability of observing any given
value (within a range) may be evaluated
given that we know the mean and variance
of the population.
147
The maximum likelihood method chooses among
all possible estimates of the parameters those
values which make the probability of obtaining
the observed sample as large as possible
The function which defines the joint (total)
probability of any sample being observed is called
the likelihood function of the variable X.
The general expression of the likelihood function
is
L(X1, X2, …, Xn; θ) = f(X1; θ) · f(X2; θ) · … · f(Xn; θ),
where θ denotes the parameters of the distribution.
The total probability of obtaining all the values in the sample is the
product of the individual probabilities given that each observation
is independent of the others
149
Since log L is a monotonic function of L, the values of the
parameters that maximise log L will also maximise L.
Thus we maximise the logarithmic expression of the likelihood
function by setting its partial derivatives with respect to
the parameters equal to zero.
150
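A hedged sketch of this idea for the two-variable linear model with normal errors: numerically maximising log L over (α, β, σ) with scipy reproduces, for α and β, the OLS estimates. The data are simulated and the parameterisation (optimising over log σ) is just one convenient choice.

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(5)
X = rng.uniform(0, 10, 100)
Y = 1.0 + 2.0 * X + rng.normal(scale=1.5, size=100)

def neg_log_likelihood(params):
    a, b, log_sigma = params
    sigma = np.exp(log_sigma)      # keep sigma positive
    # log L = sum of log normal densities of the residuals
    return -np.sum(stats.norm.logpdf(Y - a - b * X, scale=sigma))

res = optimize.minimize(neg_log_likelihood, x0=[0.0, 0.0, 0.0])
a_hat, b_hat, log_s = res.x
print(a_hat, b_hat, np.exp(log_s))
```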
5.2. Simultaneous Equation Models (SEM)
Consider
y1 = a1y2 + b1z1 + u1
y2 = a2y1 + b2z2 + u2
151
Simultaneity
Simultaneity is a specific type of
endogeneity problem in which the
explanatory variable is jointly determined
with the dependent variable
As with other types of endogeneity, IV
estimation can solve the problem
Some special issues to consider with
simultaneous equations models (SEM)
152
Instrumental Variables & 2SLS
153
Why Use Instrumental Variables?
Instrumental Variables (IV) estimation is
used when your model has endogenous x’s
That is, whenever Cov(x,u) ≠ 0
Thus, IV can be used to address the
problem of omitted variable bias
Additionally, IV can be used to solve the
classic errors-in-variables problem
154
What Is an Instrumental Variable?
In order for a variable, z, to serve as a valid
instrument for x, the following must be true
The instrument must be exogenous
That is, Cov(z,u) = 0
The instrument must be correlated with
the endogenous variable x
That is, Cov(z,x) ≠ 0
155
Two Stage Least Squares (2SLS)
It’s possible to have multiple instruments
Consider our original structural model, and
let y2 = p0 + p1z1 + p2z2 + p3z3 + v2
Here we’re assuming that both z2 and z3
are valid instruments – they do not appear
in the structural model and are
uncorrelated with the structural error term,
u1
156
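A minimal sketch of the two stages with numpy on simulated data (z1 an exogenous regressor, z2 and z3 instruments for the endogenous y2). It is meant only to show the mechanics; the second-stage standard errors would need the usual 2SLS correction, which is omitted here.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1000
z1, z2, z3 = rng.normal(size=(3, n))
u1 = rng.normal(size=n)
y2 = 0.5 * z1 + 1.0 * z2 - 0.8 * z3 + 0.9 * u1 + rng.normal(size=n)  # endogenous: correlated with u1
y1 = 2.0 + 1.5 * y2 + 0.7 * z1 + u1

def ols(X, y):
    return np.linalg.solve(X.T @ X, X.T @ y)

# First stage: regress y2 on all exogenous variables and instruments, keep fitted values
Z = np.column_stack([np.ones(n), z1, z2, z3])
y2_hat = Z @ ols(Z, y2)

# Second stage: replace y2 by its fitted values in the structural equation
X2 = np.column_stack([np.ones(n), y2_hat, z1])
print(ols(X2, y1))   # close to (2.0, 1.5, 0.7); plain OLS of y1 on y2 would be biased
```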
Chapter 6
Limited Dependent Variable Models
In regression analysis, the dependent variable, Y,
is frequently not only a quantitative continuous
variable (e.g. income, output, prices, costs,
height, temperature).
It can also be qualitative or limited (e.g. dummy,
ordinal and truncated variables).
For instance, consider sex, race, color, religion,
nationality, geographical region, political
upheavals, and party affiliation as variables.
157
There are many examples of this type of
models.
For instance, if we want to examine
determinants of using mobile banking
Yi = 1 for mobile banking users, and Yi = 0 for mobile banking non-users.
This means that for all observations
(customers) i of a bank, we give the value 0 to
those who do not use mobile banking, and 1
to those who use mobile banking services.
158
Dummy variables can also be used in
regression analysis just as quantitative
variables, as either dependent or
independent variables.
For instance, we can denote the dummy
explanatory variables by the symbol D rather
than by the usual symbol X to emphasize that
we are dealing with a qualitative variable.
As a matter of fact, a regression model may
contain only dummy explanatory variables.
159
Consider the following example of such a
model:
160
Therefore the values obtained, b1 and b2 ,
enable us to estimate the probabilities
In using dummy variable models, we are
dealing with qualitative variables.
These types of model tend to be associated
with the cross-sectional econometrics rather
than time series.
161
162
6.2. Data
When examining the dummy dependent
variables, we need to ensure there are
sufficient numbers of 0s and 1s.
For instance, to assess mobile banking users,
we need a sample of both: users who have
mobile banking services and non-users who
have no mobile banking services.
It is easier to find data for both categories of
customers, users and non-users.
Three basic models: linear probability, Logit
and Probit models are mostly used to analyze
such data.
163
6.3. Linear Probability Model (LPM)
It is among discrete choice models or
dichotomous choice models.
In this case the dependent variable takes only
two values: 0 and 1.
There are several methods to analyze
regression models where the dependent
variable is 0 or 1.
The simplest method is to use the least
squares method.
164
Example: Linear probability model application
Consider a denial of a mortgage request and ratio of
debt payments to income (P/I ratio) in a data set as
depicted below:
165
In this case the model is called linear
probability model (LPM).
LPM uses OLS for estimation, and the
coefficients and t-statistics etc are then
interpreted in the usual way.
This produces the usual linear regression line,
which is fitted through the two sets of
observations
166
Features of the LPM
The dependent variable has two values, the
value 1 has a probability of p and the value 0
has a probability of (1-p)
This is known as the Bernoulli probability
distribution.
In this case the expected value of a random
variable following a Bernoulli distribution is
the probability that the variable equals 1
Since the probability of p must lie between 0
and 1, then the expected value of the
dependent variable must also lie between 0
and 1.
167
The error term is not normally distributed,
it also follows the Bernoulli distribution
The variance of the error term is
heteroskedastic.
The variance for the Bernoulli distribution is
p(1-p), where p is the probability of a
success.
The value of the R-squared statistic is
limited, given the distribution of the LPMs.
168
As another case, consider a model of bond
ratings (b) of a firm, estimated using LPM,
with interest payments (r ) and profit (p) as
explanatory variables, as given below:
b̂i = 2.79 + 0.76pi + 0.12ri
where bi = 1 for an AA bond rating
and bi = 0 for a BB bond rating.
169
The coefficients are interpreted as in the
usual OLS models, i.e. a 1% rise in profits,
gives a 0.76% increase in the probability of
a bond getting the AA rating.
The R-squared statistic is low, but this is
probably due to the LPM approach, so we
would usually ignore it.
The t-statistics are interpreted in the usual
way.
170
Problems with LPM
Possibly the most problematic aspect of the
LPM is the non-fulfilment of the requirement
that the estimated value of the dependent
variable y lies between 0 and 1.
One way around the problem is to assume
that all values below 0 and above 1 are
actually 0 or 1 respectively
Another problem with the LPM is that it is a
linear model and assumes that the probability
of the dependent variable equalling 1 is
linearly related to the explanatory variable.
171
For example, suppose we have a model where the
dependent variable takes the value of 1 if a
mortgage is granted to a bank customer and 0
otherwise, regressed on the customer's income.
The probability of being granted a mortgage
will rise steadily at low income levels, but
change hardly at all at high income levels.
An alternative and much better remedy to the
problem is to use an alternative technique such
as the Logit or Probit models.
172
6.4. Logit Model
The main way around the problems mentioned
earlier is to use a different distribution to the
Bernoulli distribution, where the relationship
between x and p is non-linear and the p is
always between 0 and 1.
This requires the use of ‘S’ shaped distribution
curves, which resemble the cumulative
distribution function (CDF) of a random
variable.
The CDFs used to represent a discrete variable
are the logistic (Logit model) and normal
(Probit model).
173
The problem with the linear probability model is
that it models the probability of Y = 1 as being
linear:
Yi = β0 + β1Xi + ui
pi = E(y = 1|xi) = β0 + β1xi
176
The cumulative logistic distributive function
can then be written as:
pi = 1 / (1 + e^(−zi))
where zi = β0 + β1xi
177
There is a problem with non-linearity
in the previous expression, but this can
be solved by creating the odds ratio:
1 − pi = 1 / (1 + e^(zi))
pi / (1 − pi) = (1 + e^(zi)) / (1 + e^(−zi)) = e^(zi)
Li = ln(pi / (1 − pi)) = zi = β0 + β1xi
178
In the previous slide L is the log of the odds
ratio and is linear in the parameters.
The odds ratio can be interpreted as the
probability of something happening to the
probability it won’t happen.
For the mortgage case, the odds ratio of
getting a mortgage is the probability of
getting a mortgage to the probability of not
getting mortgage.
If p is 0.8, the odds are 4 to 1, which means
the probability of getting a mortgage relative to not
getting it is 4:1.
179
Features of the Logit model
Although L is linear in the parameters, the
probabilities are non-linear.
The Logit model can be used in multiple regression
tests.
If L is positive, as the value of the explanatory
variables increase, the odds that the dependent
variable equals 1 increases.
The slope coefficient measures the change in the
log-odds ratio for a unit change in the explanatory
variable.
Logit and Probit models are usually estimated
using Maximum Likelihood techniques.
180
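For instance, a Logit model can be fitted by maximum likelihood with statsmodels as in the sketch below; the mortgage-style data, coefficients and sample size are simulated assumptions used only for illustration.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
income = rng.uniform(0, 3, 400)
p_true = 1 / (1 + np.exp(-(-1.0 + 1.2 * income)))   # assumed true probability of approval
approved = rng.binomial(1, p_true)                  # observed 0/1 outcome

X = sm.add_constant(income)
logit_res = sm.Logit(approved, X).fit()
print(logit_res.params)           # estimated beta0, beta1 on the log-odds scale
print(logit_res.predict(X)[:5])   # fitted probabilities for the first observations
```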
The R-squared statistic is not suitable for
measuring the goodness-of-fit in discrete
dependent variable models, instead we
compute the count R-squared statistic.
If we assume any probability greater than
0.5 counts as a 1 and any probability less
than 0.5 counts as a 0, then we count the
number of correct predictions.
This is defined as the count R-squared as follows:
Count R² = (number of correct predictions) / (total number of observations)
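Given fitted probabilities and actual outcomes, the count R² can be computed as in this short sketch; the two arrays are invented for illustration.

```python
import numpy as np

actual = np.array([1, 0, 1, 1, 0, 0, 1, 0])                  # observed 0/1 outcomes
p_hat = np.array([0.8, 0.3, 0.6, 0.4, 0.2, 0.7, 0.9, 0.1])   # fitted probabilities

predicted = (p_hat > 0.5).astype(int)      # classify: probability above 0.5 counts as 1
count_r2 = np.mean(predicted == actual)    # share of correct predictions
print(count_r2)
```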
181
The Logit model can be interpreted in a
similar way to the LPM
For instance, consider the previous model
where the dependent variable is granting of
a mortgage (1) or not (0).
The explanatory variable is the income of
customers, denoted y.
The coefficient on y suggests that a 1%
increase in income produces a 0.32%
rise in the log of the odds of getting a
mortgage.
This is difficult to interpret, so the
coefficient is often ignored, the z-statistic
(same as t-statistic) and sign on the
coefficient is however used for the
interpretation of the results.
We can transform the natural log for
interpretation.
We could also include a specific value for
the income of a customer and then find
the probability of getting a mortgage.
183
Logit Result
If we have a customer with 0.5 units of
income, we can estimate a value for the Logit
of 0.56+0.32*0.5 = 0.72.
We can use this estimated Logit value to find
the estimated probability of getting a
mortgage.
By including it in the formula given earlier for
the Logit Model we get:
pi = 1 / (1 + e^(−0.72)) = 1 / 1.49 ≈ 0.67
Given that this estimated probability is
bigger than 0.5, we assume it is nearer 1,
therefore we predict this customer would
be given a mortgage.
With the Logit model we tend to report the
sign of the variable and its z-statistic which
is the same as the t-statistic in large
samples.
185
6.5. The Probit Model
An alternative approach, associated with
Goldberger (1964), is the Probit model.
The Probit model assumes that there is an
underlying response variable y* defined by
the regression relationship y*i = β0 + β1xi + ui.
Since y*i is unobserved, it is referred to as a
latent variable.
186
The latent variable generates the observed
y’s.
Those who have larger values of the latent
variable are observed as y = 1 and those
who have smaller values are observed as y
=0
We observe the dummy variable y defined
as:
yi = 1 if y*i > 0, and yi = 0 otherwise.
An alternative CDF to that used in the Logit
Model is the normal CDF, when this is used
we refer to it as the Probit Model.
In many respects this is very similar to the
Logit model.
The Probit model has also been interpreted as
a ‘latent variable’ model.
This has implications for how we explain the
dependent variable. i.e. we tend to interpret
it as a desire or ability to achieve something.
188
LPM, Logit and Probit models compared
The coefficient estimates from all three models
are related, but they differ because the Bernoulli,
logistic and normal distribution functions are
scaled differently.
If you multiply the coefficients from a Logit
model by 0.625, they are approximately the
same as the Probit model.
If the coefficients from the LPM are multiplied
by 2.5 (also 1.25 needs to be subtracted from
the constant term) they are approximately the
same as those produced by a Probit model.
189
In general, dummy variables can also be
used as the dependent variable
The LPM is the basic form of this model,
but has a number of important faults.
The Logit model is an important
development on the LPM, overcoming
many of these problems.
The Probit is similar to the Logit model but
assumes a different CDF, i.e., normal
distribution function.
190
Models for ordinal outcomes
The categories of an ordinal variable can be
ranked from low to high, but the distances
between the categories are unknown.
Ordinal outcomes are common in social
sciences.
For example, in a survey research, opinions are
often ranked as strongly agree, agree, neutral,
disagree, and strongly disagree.
Performance can be ranked as very high, high,
medium, low and very low.
191
Models for ordinal outcomes...
Such data appear without any assumption that
the distance from strongly agreeing and
agreeing is the same as the distance from
agree to disagree.
Educational attainments can be ordered as
elementary education, high school diploma,
college diploma, and graduate or professional
degree.
An ordinal dependent variable violates the
assumptions of the logistic regression model,
which can lead to incorrect conclusions.
192
Accordingly, with ordinal outcomes, it is much better to
use models that avoid the assumption that the distances
between categories are equal.
As with the binary regression model, the ordinal
outcome regression models are nonlinear.
The magnitude of the change in the outcome probability
for a given change in one of the independent variables
depends on the levels of all of the independent variables.
A latent variable model
The ordinal regression model is commonly presented as a
latent variable model.
Defining y* as a latent variable ranging from −∞ to ∞,
the structural model is y*i = xiβ + εi.
193
The measured dependent variable of a decision maker is
assumed to be related to the latent variable through the
following threshold criterion:
yi = m if τm−1 ≤ y*i < τm, for categories m = 1, …, J,
where the cut-points satisfy τ0 = −∞ < τ1 < … < τJ = ∞.
Example: A working mother can establish just as warm and
secure of a relationship with her child as a mother who does
not work. [1=Strongly disagree; 2=Disagree; 3=Agree and
4=Strongly agree].
194
Other models with limited dependent variables
Tobit Models
The linear regression model assumes that the values
of all variables are continuous and are observable
(known) for the entire sample.
However, there are situations that the variables may
not be all observed for the entire sample.
There are situations in which the sample is limited by
censoring or truncation.
Censoring occurs when we observe the independent
variables for the entire sample, but for some
observations we have only limited information about
the dependent variable.
195
In certain situations, the dependent variable
is continuous, but its range may be
constrained.
Mostly, this occurs when the dependent
variable is zero for a substantial part of the
population but positive (with many different
outcomes) for the rest of the population.
Examples: Amounts of credit, expenditures on
insurance, expenditures on durable goods,
hours of work on non-farm activities, and the
amount of FDI.
196
Tobit models are particularly suited to model
these types of variables.
The original Tobit model was suggested by James
Tobin (Tobin 1958), who analyzed household
expenditures on durable goods taking into
account their non-negativity.
But only in 1964, Arthur Goldberger referred to
this model as a Tobit model, because of its
similarity to Probit models.
197
The Standard Tobit Model
Suppose that we are interested in explaining the
expenditures on tobacco of households in a given
year.
Let y denote the expenditures on tobacco, while z
denotes all other expenditures.
Total disposable income (or total expenditures) is
denoted by x.
We can think of a simple utility maximization
problem, describing the household’s decision
problem:
198
We account for this by allowing for unobserved
heterogeneity in the utility function and thus for
unobserved heterogeneity in the solution as well. Thus
we write the desired expenditure as y* = β1 + β2x + ε,
where ε corresponds to unobserved heterogeneity.
If there were no restrictions on y and consumers could
spend any amount on tobacco, they would choose to
spend y∗.
The solution to the original, constrained problem will
therefore be given by y = y* if y* > 0, and y = 0 otherwise.
Method of Estimation
If we attempt OLS estimation, we cannot simply use only the
positive observations Yi from the following model:
204
Assumptions of Tobit Model
There are two basic assumptions underlying the Tobit model.
1. The error term is not heteroskedastic.
2. The error term should have a normal distribution.
Basics:
Yt = b0 + b1Yt−1 + b2Yt−2 + … + bkYt−k + εt
Such autoregressive models use past values of Y to predict its
future values, and are popular as a forecasting technique.
208
Time series data has a temporal
ordering, unlike cross-section data.
We, thus, need to alter some of our
assumptions to take into account that we
no longer have a random sample of
individuals
Instead, we have one realization of a
stochastic (i.e. random) process.
209
Examples of time series models
A static model relates contemporaneous
variables: yt = b0 + b1zt + ut
A finite distributed lag (FDL) model allows
one or more variables to affect y with a lag,
for example: yt = a0 + d0zt + d1zt−1 + d2zt−2 + ut
More generally, a finite distributed lag
model of order q will include q lags of z
210
Considering the FDL model yt = a0 + d0zt + d1zt−1 + … + dqzt−q + ut:
We can call d0 the impact propensity; it
reflects the immediate change in y
For a temporary, one-period change in z, y
returns to its original level in period q+1
We can call d0 + d1 + … + dq the long-
run propensity (LRP), which reflects the
long-run change in y after a permanent
change in z.
211
Assumptions for unbiasedness
Still we assume a model that is linear in
parameters: yt = b0 + b1xt1 + . . .+ bkxtk + ut
And we need to make a zero conditional
mean assumption: E(ut|X) = 0, t = 1, 2, …, n
Note that this implies the error term in any
given period is uncorrelated with the
explanatory variables in all time periods.
212
This zero conditional mean
assumption implies the x’s are strictly
exogenous
An alternative assumption, more
parallel to the cross-sectional case, is
E(ut|xt) = 0
This assumption would imply the x’s
are contemporaneously exogenous
But contemporaneous exogeneity will
only be sufficient in large samples
213
Still we need to assume that no x is
constant, and that there is no perfect
collinearity
Note we have skipped the assumption of
a random sample
The key impact of the random sample
assumption is that each ui is independent
Our strict exogeneity assumption takes
care of it in this case
214
Based on these 3 assumptions, when using
time-series data, the OLS estimators are
unbiased
Thus, just as was the case with cross-
section data, under the appropriate
conditions OLS is unbiased
Omitted variable bias can be analyzed in
the same manner as in the cross-section
case
215
Variances of OLS estimators
In addition to the three assumptions above, we add
homoskedasticity, Var(ut|X) = s2 for all t, and no serial
correlation, Corr(ut, us|X) = 0 for all t ≠ s.
216
Under these 5 assumptions, the OLS
variances in the time-series case are the
same as in the cross-section case.
OLS remains BLUE
With the additional assumption of normally
distributed errors, inference proceeds exactly
as in cross-sectional data analysis.
217
Trending time series
Time series data often have a trend
Just because two or more series are
trending together, we can’t assume that
their relationship is causal.
Often, both will be trending because of
other unobserved factors
Even if those factors are unobserved, we
can control for them by directly
controlling for the trend
218
One possibility is a linear trend, which can be
modeled as
yt = a0 + a1t + et, t = 1, 2, …
Another possibility is an exponential trend,
which can be modeled as
log(yt) = a0 + a1t + et, t = 1, 2, …
Another possibility is a quadratic trend, which
can be modeled as
yt = a0 + a1t + a2t^2 + et, t = 1, 2, …
219
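A minimal sketch (simulated series, illustrative parameter values) of fitting the linear and quadratic trends by OLS; the exponential trend is fitted the same way after taking logs:

# Minimal sketch (simulated series): fitting linear and quadratic time trends.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
T = 120
t = np.arange(1, T + 1)
y = 10 + 0.3 * t + rng.normal(scale=2.0, size=T)   # series with a linear trend

lin = sm.OLS(y, sm.add_constant(t)).fit()                            # y_t = a0 + a1 t + e_t
quad = sm.OLS(y, sm.add_constant(np.column_stack([t, t**2]))).fit()  # y_t = a0 + a1 t + a2 t^2 + e_t
# Exponential trend: regress log(y_t) on t (requires y_t > 0)
exp_trend = sm.OLS(np.log(y), sm.add_constant(t)).fit()

print(lin.params, quad.params, exp_trend.params)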
Seasonality
220
Stationarity
Stationarity is an important property that must
hold before we can estimate a time-series model;
without it, it is difficult to use the past to predict the future.
A stochastic process is stationary if for every
collection of time indices 1 ≤ t1 < …< tm the joint
distribution of (xt1, …, xtm) is the same as that of
(xt1+h, … xtm+h) for h ≥ 1
Thus, stationarity implies that the xt’s are
identically distributed and that the nature of any
correlation between adjacent terms is the same
across all periods.
221
Weakly stationary process
222
Covariance stationary process
If a process is non-stationary, we cannot use
its past structure to predict the future
A stochastic process is covariance stationary if
E(xt) is constant, Var(xt) is constant and for any
t, h ≥ 1, Cov(xt, xt+h) depends only on h and not
on t
Thus, this weaker form of stationarity requires
only that the mean and variance are constant
across time, and the covariance just depends
on the distance across time
223
Weakly Dependent Time Series
A stationary time series is weakly
dependent if xt and xt+h are “almost
independent” as h increases
If for a covariance stationary process
Corr(xt, xt+h) → 0 as h → ∞, this
covariance stationary process is said to be
weakly dependent
Weak dependence matters because it lets us still apply the law of large numbers and the central limit theorem
224
Types of the process
(a). Moving average (MA) process
This process only assumes a relation between
periods t and t-1 via the white noise residuals et.
A moving average process of order one [MA(1)]
can be characterized as one where
Yt = et + a1 et-1, t = 1, 2, …
with et being an iid sequence with mean 0 and
variance s2
This is a stationary, weakly dependent sequence:
variables one period apart are correlated, but two
periods apart they are not
225
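A small simulated check of this property (the value a1 = 0.8 is illustrative): the sample autocorrelation of an MA(1) series is clearly non-zero at lag 1 but close to zero at lag 2.

# Minimal sketch: simulate y_t = e_t + a1*e_{t-1} and check its autocorrelations.
import numpy as np

rng = np.random.default_rng(3)
T = 10_000
a1 = 0.8
e = rng.normal(size=T + 1)
y = e[1:] + a1 * e[:-1]

def sample_corr(x, lag):
    # Sample correlation between x_t and x_{t+lag}
    return np.corrcoef(x[:-lag], x[lag:])[0, 1]

print(sample_corr(y, 1))  # roughly a1 / (1 + a1**2) ≈ 0.49
print(sample_corr(y, 2))  # roughly 0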
Autoregressive (AR) process
An autoregressive process of order one
[AR(1)] can be characterized as one where
Yt = r Yt-1 + et, t = 1, 2, …
with et being an iid sequence with mean 0
and variance se2
For this process to be weakly dependent, it
must be the case that |r| < 1
More generally, an autoregressive process of
order p [AR(p)] is
Yt = r1 Yt-1 + r2 Yt-2 + … + rp Yt-p + et
226
Similarly, a moving average process of order q [MA(q)]
can be given as
Yt = et + a1 et-1 + a2 et-2 + … + aq et-q
An AR(p) and an MA(q) process can be
combined into an ARMA(p,q) process:
Yt = r1 Yt-1 + … + rp Yt-p + et + a1 et-1 + … + aq et-q
Using the lag operator:
LYt = Yt-1
L2Yt = L(LYt) = L(Yt-1) = Yt-2
and in general LpYt = Yt-p
227
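Assuming the statsmodels package is available, here is a minimal sketch of simulating an ARMA(1,1) process and fitting it; the coefficient values are illustrative.

# Minimal sketch: simulate an ARMA(1,1) process and fit it with statsmodels.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(4)
T = 500
r1, a1 = 0.6, 0.4          # AR and MA coefficients (|r1| < 1 for weak dependence)
e = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = r1 * y[t - 1] + e[t] + a1 * e[t - 1]

# order = (p, d, q) with d = 0, i.e. an ARMA(1,1) for the level of the series.
res = ARIMA(y, order=(1, 0, 1)).fit()
print(res.params)  # constant, AR(1), MA(1) and error-variance estimates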
Generally, using the lag operator, an ARMA(p,q) process can be written as
(1 - r1L - r2L2 - … - rpLp) Yt = (1 + a1L + a2L2 + … + aqLq) et
228
Assumptions for consistency
Linearity and weak dependence
A weaker zero conditional mean
assumption: E(ut|xt) = 0, for each t
No perfect collinearity
Thus, for asymptotic unbiasedness
(consistency), we can weaken the
exogeneity assumptions somewhat
relative to those for unbiasedness
229
Estimation and Inference for large sample
Weaker assumption of homoskedasticity:
Var (ut|xt) = s2, for each t
Weaker assumption of no serial
correlation: E(utus|xt, xs) = 0 for t ≠ s
With these assumptions, we have
asymptotic normality and the usual
standard errors, t statistics and F statistics
are valid.
230
Forecasting
Once we’ve run a time-series regression
we can use it for forecasting into the future
We can calculate a point forecast and
forecast interval in the same way we got a
prediction and prediction interval with a
cross-section
Rather than using in-sample criteria like
adjusted R2, we often want to use out-of-
sample criteria to judge how good the
forecast is.
231
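A minimal sketch of the out-of-sample idea on simulated AR(1) data (the split point and parameter values are illustrative): estimate on the first part of the sample, forecast the hold-out part one step ahead, and summarize forecast accuracy with the root mean squared forecast error.

# Minimal sketch: out-of-sample evaluation of an AR(1) forecasting model.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
T = 300
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.7 * y[t - 1] + rng.normal()

split = 250                  # estimation sample: t < split; hold-out: t >= split
y_lag = y[:-1]
y_now = y[1:]

X_est = sm.add_constant(y_lag[: split - 1])
res = sm.OLS(y_now[: split - 1], X_est).fit()

# One-step-ahead point forecasts on the hold-out sample
X_out = sm.add_constant(y_lag[split - 1:])
forecasts = res.predict(X_out)
rmsfe = np.sqrt(np.mean((y_now[split - 1:] - forecasts) ** 2))
print(rmsfe)  # root mean squared forecast error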
Summary of objectives and steps in time series analysis
232
Chapter 8: Panel Data Methods
Basics
yit = b0 + b1 xit1 + . . . + bk xitk + uit
240
Example:
To evaluate whether a free school lunch service
improves outcomes of students, an experiment
is undertaken in Latin America. Student exam
(test) scores were collected from Rio and Sao
Paulo schools during the year 2008. Then,
students in Sao Paulo schools were provided
with free lunch services during the period 2009.
In 2010, students' test scores were measured
again in both Rio and Sao Paulo schools. The
average results for both sets of schools before
and after the free lunch service
are given below.
241
Example
Y (exam scores)            Pre (2008)   Post (2010)
Control (Rio)                  30            70
Treated (Sao Paulo)            20            90

Y                              Dpost = 0 (Pre)   Dpost = 1 (Post)
Dtreatment = 0 (Control)       b0                b0 + b1
Dtreatment = 1 (Treated)       b0 + b2           b0 + b1 + b2 + b3

The difference-in-differences estimate here is (90 - 20) - (70 - 30) = 30.
245
When we don’t truly have random
assignment, the regression form becomes
very useful
Additional x’s can be added to the
regression to control for differences across
the treatment and control groups
Such cases are sometimes referred to as a
“natural experiment” especially when a
policy change is being analyzed
246
Estimation in a regression framework is as follows:
Y = b0 + b1 Dpost + b2 DTr + b3 Dpost*DTr + b4 X + u
The conditional expected value of Y given the values of
the explanatory variables X is given (dropping the error term) as:
E(Y|X) = b0 + b1 Dpost + b2 DTr + b3 Dpost*DTr + b4 X
Estimating the above equation with a set of explanatory
variables including socio-economic characteristics is
important to
1. Obtain reliable estimates of the standard errors
2. Reduce bias from potential differences in time trends
3. Increase the precision of the estimates
A sketch of estimating this regression is given below.
247
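A minimal sketch of estimating this difference-in-differences regression by OLS on simulated data (the variable names, true effect sizes, and use of the statsmodels formula interface are illustrative assumptions); the coefficient on the interaction term is the estimate of b3.

# Minimal sketch: difference-in-differences regression with simulated data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
n = 400
df = pd.DataFrame({
    "treated": rng.integers(0, 2, size=n),   # DTr: 1 = treatment group
    "post": rng.integers(0, 2, size=n),      # Dpost: 1 = after the policy
    "x": rng.normal(size=n),                 # an additional control
})
# True effects (illustrative): time effect 2, group effect -1, treatment effect 3
df["y"] = (1 + 2 * df["post"] - 1 * df["treated"]
           + 3 * df["post"] * df["treated"] + 0.5 * df["x"]
           + rng.normal(size=n))

res = smf.ols("y ~ post + treated + post:treated + x", data=df).fit()
print(res.params["post:treated"])  # the difference-in-differences estimate of b3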
Panel data estimation
The main ways of estimating panel data models are
the fixed effects (FE) and random effects (RE) estimators.
However, it is useful to start with first
differencing.
For two-period data, first differencing
is easy; beyond two periods, we can use
multiple-period differencing procedures.
248
First-differences
249
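As an illustration of the idea, a minimal sketch on a simulated two-period panel (variable names are illustrative): differencing across the two periods removes the unobserved individual effect ai, and the slope can then be estimated by OLS on the differenced data.

# Minimal sketch: first-differencing a two-period panel removes the fixed effect a_i.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
N = 500
a = rng.normal(size=N)                 # unobserved individual effect a_i
x1 = a + rng.normal(size=N)            # x correlated with a_i (pooled OLS would be biased)
x2 = a + rng.normal(size=N)
y1 = 1 + 2 * x1 + a + rng.normal(size=N)
y2 = 1 + 2 * x2 + a + rng.normal(size=N)

dy = y2 - y1                           # change in y
dx = x2 - x1                           # change in x: a_i has dropped out
res = sm.OLS(dy, sm.add_constant(dx)).fit()
print(res.params)                      # slope should be close to the true value 2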
Differencing with Multiple Periods
251
First Differences vs Fixed Effects
252
Random Effects
Start with the same basic model with a
composite error,
yit = b0 + b1 xit1 + . . . + bk xitk + ai + uit
Previously we’ve assumed that ai was
correlated with the x’s, but what if it’s not?
OLS would be consistent in that case, but the
composite error will be serially correlated
253
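To contrast the two approaches, here is a minimal sketch on simulated data (names and parameter values are illustrative) of the fixed effects "within" transformation versus pooled OLS when ai is correlated with the x's; when ai is uncorrelated with the x's, a random effects estimator would be appropriate instead.

# Minimal sketch: within (fixed effects) transformation vs pooled OLS when the
# individual effect a_i is correlated with x (simulated panel data).
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(8)
N, T = 300, 4
a = np.repeat(rng.normal(size=N), T)            # individual effect a_i
ids = np.repeat(np.arange(N), T)
x = a + rng.normal(size=N * T)                  # x correlated with a_i
y = 1 + 2 * x + a + rng.normal(size=N * T)
df = pd.DataFrame({"id": ids, "x": x, "y": y})

pooled = sm.OLS(df["y"], sm.add_constant(df["x"])).fit()   # biased here

# Within transformation: subtract individual means, which removes a_i
demeaned = df[["x", "y"]] - df.groupby("id")[["x", "y"]].transform("mean")
fe = sm.OLS(demeaned["y"], demeaned["x"]).fit()            # no constant after demeaning

print(pooled.params.iloc[1], fe.params.iloc[0])            # FE slope should be close to 2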
Fixed Effects or Random?
Choosing data
Using data
Estimating a model
Other problems
Interpreting results
Further issues
255
Choosing a Topic
Start with a general area or set of
questions
Make sure you are interested in the topic
Use online services such as EconLit to
investigate past work on this topic
Narrow down your topic to a specific
question or issue to be investigated
Work through the theoretical issue
256
Choosing Data
Want data that include measures of the
things your theoretical model implies
are important
Investigate what type of data sets have
been used in the past literature
Search for what other data sets are
available
Consider collecting your own data
257
Using the Data
Create variables appropriate for analysis
For example, create dummy variables from
categorical variables, create hourly wages,
etc.
Check the data for missing values, errors,
outliers, etc.
Recode as necessary, be sure to report
what you did
258
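A minimal sketch (hypothetical variable names and values) of the kind of preparation described above: creating dummy variables from a categorical variable, constructing hourly wages, and checking for missing values.

# Minimal sketch of data preparation (variable names and values are hypothetical).
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "region": ["north", "south", "south", "north", "east"],
    "weekly_earnings": [800, 650, np.nan, 900, 720],
    "hours_per_week": [40, 35, 38, 45, 40],
})

# Dummy variables from a categorical variable
df = pd.get_dummies(df, columns=["region"], drop_first=True)

# Construct an hourly wage from earnings and hours worked
df["hourly_wage"] = df["weekly_earnings"] / df["hours_per_week"]

# Check for missing values and outliers; recode or drop, and report what you did
print(df.isna().sum())
df = df.dropna(subset=["hourly_wage"])
print(df["hourly_wage"].describe())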
Estimating a Model
Start with a model that is clearly based in
theory
Test for significance of other variables that
are theoretically less clear
Test for functional form misspecification
Consider reasonable interactions,
quadratics, logs, etc.
259
Estimating a Model (continued)
260
Estimating a Model (continued)
261
Other Problems
263
Interpreting Your Results
265
Don’t worry if you don’t “prove” your theory
With unexpected results, you want to be
careful in thinking through potential biases
But, if you have carefully specified your model
and feel confident you have unbiased
estimates, then that’s just the way things are!
….END…!
266