Econometrics II Slides-1
1 M 34 4000 5
2 F 28 4500 7
3 M 36 5000 4
4 M 25 3000 3
5 F 30 2500 3
… … …. …. ….
B) Time Series
• Time series data consist of observations on one or several variables over
time
• E.g., stock prices, money supply, consumer price index, gross domestic product,
annual homicide rates, automobile sales, etc.
• Time series observations are typically serially correlated
• Ordering of observations conveys important information
• Data frequency: daily, weekly, monthly, quarterly, annually
• Typical features: trends and seasonality
• Typical applications: applied macroeconomics and finance
Time Series data can be given as below
Year GDP in bn Export in bn Import in bn
2001 100 30 50
2002 120 32 60
2003 115 30 85
2004 135 36 70
2005 145 45 80
… … … …
C) Pooled Data
• Two or more cross sections are combined in one data set
• i.e., the data are collected from different cross-sectional units at different periods
of time
• Cross sections are drawn independently of each other
• Pooled cross sections are often used to evaluate policy changes
• Example: To evaluate effect of change in property taxes on house prices
Take random sample of house prices for the year 1993
Take a new random sample of house prices for the year 1995
Then, compare before/after (1993: before reform, 1995: after reform)
Pooled Data are given as below
Observation Year Housing Price Property Tax
1 1993 20,000 1
2 1993 25,000 1
3 1993 35,000 1
4 1995 40,000 0
5 1995 35,000 0
6 1995 45,000 0
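The before/after comparison described above can be sketched in a few lines of Python. This is only an illustration using the six rows of the table; the names `rows` and `mean_price` are hypothetical, and a raw difference in means ignores anything else that changed between 1993 and 1995.

```python
# Rows of the pooled cross-section table: (year, housing price)
rows = [
    (1993, 20_000), (1993, 25_000), (1993, 35_000),
    (1995, 40_000), (1995, 35_000), (1995, 45_000),
]

def mean_price(year):
    """Average housing price among the observations sampled in `year`."""
    prices = [p for y, p in rows if y == year]
    return sum(prices) / len(prices)

before = mean_price(1993)   # sample drawn before the reform
after = mean_price(1995)    # independent sample drawn after the reform
difference = after - before
print(round(before, 2), round(after, 2), round(difference, 2))
```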
D) Panel Data
• The same cross-sectional units are followed over time
• It has a cross-sectional and a time series dimension
• It can be used to account for time-invariant unobservable effects
• It can also be used to model lagged responses
• Example:
City crime statistics; each city is observed in two years
Time-invariant unobserved city characteristics may be modeled
Effect of police on crime rates may exhibit time lag
Panel Data are given as below
Observation Year GDP in bn Export in bn
1 2016 400 100
1 2017 500 120
2 2016 550 200
2 2017 600 250
3 2016 80 30
3 2017 90 35
1.2. Describing Qualitative Information
• We often face qualitative information on variables in econometric analysis
• Qualitative variables are variables that can’t be explained in terms of numbers
• They also are difficult to quantify
• They describe people’s behavior and the presence or absence of certain events and
characteristics
• Qualitative variables often come in the form of binary information
• These binary variables are often called dummy variables and take values of 0 or
1 in regression analysis
• Assigning these arbitrary numbers (0 and 1) is useful for interpretation of
parameter estimates
• Dummy variables can be used as independent or dependent variables
Examples of Qualitative Variables
• Sex; male, female
• Residence; urban, rural
• Poverty status; poor, non-poor
• Race; black, white
• Educational status; illiterate, elementary, high school, diploma and TVET,
degree and above
• Employment; unemployed, self-employed, public employed
• Marital status; married, unmarried, divorced, widowed, separated
• If the Qualitative Variable has two categories, one dummy is required (1
category used as base group)
• If the Qualitative Variable has three categories, two dummies are required
• If the Qualitative Variable has n categories, n-1 dummies are required
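The n-1 rule can be illustrated with a small helper that builds one 0/1 column for every category except the base group. This is a minimal sketch in plain Python; the function name `make_dummies`, the `D_` prefix and the sample values are hypothetical, not part of the course material.

```python
def make_dummies(values, base):
    """Return one 0/1 column per non-base category (n - 1 columns in total)."""
    categories = sorted(set(values))
    if base not in categories:
        raise ValueError("base group must be one of the observed categories")
    return {
        f"D_{cat}": [1 if v == cat else 0 for v in values]
        for cat in categories
        if cat != base
    }

# Hypothetical sample of the residence and employment variables listed above
residence = ["urban", "rural", "urban", "rural"]
employment = ["unemployed", "self-employed", "public employed", "unemployed"]

res_dummies = make_dummies(residence, base="rural")          # 2 categories -> 1 dummy
emp_dummies = make_dummies(employment, base="unemployed")    # 3 categories -> 2 dummies
print(res_dummies)
print(emp_dummies)
```

Including a dummy for the base group as well, together with an intercept, would make the regressors perfectly collinear, which is why one category is always left out.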
1.3. Dummy as Independent Variables
• Dummy variables are binary variables having only two values
• They are used to study behaviors of households, firms, cities, countries, etc.
• They are useful for event study (drought, famine, war, disease), policy analysis,
and program evaluation
• They are used to represent ordinal and/or qualitative information in
regressions
• The coefficients of dummy variables in regressions show differences in
intercepts between groups/ categories
• Differences in slope coefficients are captured through interaction variables
(interaction of dummy variables and other explanatory variables)
• Example 1: Interpret the following model (level-level form)
• 𝑤𝑎𝑔𝑒 = 𝛽0 + 𝛽1 𝐸𝑑𝑢𝑖 + 𝛽2 𝐸𝑥𝑝𝑖 + 𝛽3 𝐴𝑔𝑒𝑖 + 𝛽4 𝐷𝑓𝑒𝑚𝑎𝑙𝑒 + 𝑢𝑖
• 𝑤𝑎𝑔𝑒 = 120 + 0.6 𝐸𝑑𝑢𝑖 + 0.7 𝐸𝑥𝑝𝑖 + 0.25 𝐴𝑔𝑒𝑖 − 13 𝐷𝑓𝑒𝑚𝑎𝑙𝑒
• Interpretations
• 𝛽0 =120: wage is 120 when all explanatory variables are zero.
• 𝛽1 =0.6; as education increases by 1 year, wage increases by 0.6 units keeping
the other variables constant.
• 𝛽2 =0.7; as experience increases by 1 year, wage increases by 0.7 units keeping
the other variables constant.
• 𝛽3 =0.25; as age increases by 1 year, wage increases by 0.25 keeping the other
variables constant.
• 𝜷𝟒 =-13; females’ wage is less than males’ wage by 13 units keeping the
other variables constant.
• Example 2: Interpret the following model (log-level form)
• ln𝑤𝑎𝑔𝑒 = 𝛽0 + 𝛽1 𝐸𝑑𝑢𝑖 + 𝛽2 𝐸𝑥𝑝𝑖 + 𝛽3 𝐴𝑔𝑒𝑖 + 𝛽4 𝐷𝑓𝑒𝑚𝑎𝑙𝑒 + 𝑢𝑖
• 𝑙𝑛𝑤𝑎𝑔𝑒 = 120 + 0.06 𝐸𝑑𝑢𝑖 + 0.07 𝐸𝑥𝑝𝑖 + 0.025 𝐴𝑔𝑒𝑖 − 0.12 𝐷𝑓𝑒𝑚𝑎𝑙𝑒
• Interpretations
• 𝛽0 =120: lnwage is 120 when all explanatory variables are zero.
• 𝛽1 =0.06; as education increases by 1 year, wage increases by 6% keeping the
other variables constant.
• 𝛽2 =0.07; as experience increases by 1 year, wage increases by 7% keeping the
other variables constant.
• 𝛽3 =0.025; as age increases by 1 year, wage increases by 2.5% keeping the other
variables constant.
• 𝜷𝟒 =-0.12; females’ wage is less than males’ wage by about 12% keeping the other
variables constant.
• Example 3: Interpret the following model (log-log form)
• 𝑙𝑛𝐺𝐷𝑃 = 𝛽0 + 𝛽1 ln 𝐿𝑡 + 𝛽2 ln 𝐾𝑡 + 𝛽3 ln 𝑅𝐹𝑡 + 𝛽4 𝐷𝑤𝑎𝑟 + 𝑢𝑡
• 𝑙𝑛𝐺𝐷𝑃 = 406 + 0.30 ln 𝐿𝑡 + 0.35 ln 𝐾𝑡 − 0.24 ln 𝑅𝐹𝑡 − 0.2 𝐷𝑤𝑎𝑟
• Interpretation
• 𝛽0 =406; lnGDP is 406 when all of the explanatory variables are zero
• 𝛽1 =0.30; as labor increases by 1%, GDP increases by 0.30% keeping the other
variables constant.
• 𝛽2 =0.35; as capital increases by 1%, GDP increases by 0.35% keeping the other
variables constant
• 𝛽3 =-0.24; as rainfall variability increases by 1%, GDP decreases by 0.24%
keeping the other variables constant
• 𝜷𝟒 =-0.2; GDP during war periods is lower than GDP during peace periods by
about 20% (exactly 100·(e^−0.2 − 1) ≈ −18.1%), keeping the other variables constant.
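When the dependent variable is in logs, reading 100·β as the percentage difference for a dummy is an approximation that is good for small coefficients. The exact difference is 100·(e^β − 1). The sketch below applies this to the dummy coefficients of Examples 2 and 3; the function name `exact_percent_effect` is hypothetical.

```python
import math

def exact_percent_effect(coef):
    """Exact % difference in the level of y implied by a dummy coefficient
    in a regression whose dependent variable is ln(y)."""
    return 100 * (math.exp(coef) - 1)

female_gap = exact_percent_effect(-0.12)  # Example 2, coefficient on Dfemale
war_gap = exact_percent_effect(-0.2)      # Example 3, coefficient on Dwar
print(round(female_gap, 1), round(war_gap, 1))
```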
1.4. Dummy as Dependent Variable
• We may encounter dummy dependent variables in econometric analysis
• We explain a qualitative event with binary outcome
• Our dependent variable, Y, takes only two values: zero and one
• There are three approaches to estimating regressions with a binary dependent
variable
• They are: Linear Probability Model (LPM), Logit Model and Probit Model
1.4.1. The Linear Probability Model (LPM)
• The simplest model in terms of interpreting parameter estimates
• Easier to estimate using OLS method
• Given the model:
𝑌𝑖 = 𝛽0 + 𝛽1 𝑋1𝑖 + 𝛽2 𝑋2𝑖 + ⋯ + 𝛽𝑘 𝑋𝑘𝑖 + 𝑢𝑖 … (1)
• Where the dependent variable, Y, takes the two values 0 and 1, and the Xs are
explanatory variables
β can’t be interpreted as the change in Y resulting from a change in X, because
Y can only change from 0 to 1 or from 1 to 0
Since 𝐸(𝑢𝑖 | 𝑋) = 0, 𝐸(𝑌𝑖 | 𝑋) = 𝛽0 + 𝛽1 𝑋1𝑖 + 𝛽2 𝑋2𝑖 + ⋯ + 𝛽𝑘 𝑋𝑘𝑖
Since Y takes only the values 0 and 1, the probability of “success” (the probability
that Y equals 1) is its expected value, i.e.
𝑃(𝑌 = 1 | 𝑋) = 𝐸(𝑌𝑖 | 𝑋) = 𝛽0 + 𝛽1 𝑋1𝑖 + 𝛽2 𝑋2𝑖 + ⋯ + 𝛽𝑘 𝑋𝑘𝑖 … (2)
• Since the probabilities sum to 1, the probability that Y equals 0 is given by
𝑃(𝑌 = 0 | 𝑋) = 1 − 𝑃(𝑌 = 1 | 𝑋) … (3)
• The multiple linear regression model with a binary dependent variable is called
Linear Probability Model (LPM)
• The response probability is linear in the parameters, β
• In the LPM, β measures the change in the probability of success when X
changes
• Example: given a model on house ownership
• 𝐻𝑂𝑖 = 𝛽0 + 𝛽1 𝑀𝑖 + 𝛽2 𝑀𝑎𝑟𝑟𝑖 + 𝛽3 𝐷𝑖𝑣𝑜𝑟𝑖 + 𝛽4 𝐹𝑒𝑚𝑖 + 𝑢𝑖
• 𝐻𝑂𝑖 = 0.001 + 0.031 𝑀𝑖 + 0.01 𝑀𝑎𝑟𝑟𝑖 − 0.005 𝐷𝑖𝑣𝑜𝑟𝑖 + 0.015 𝐹𝑒𝑚𝑖
• Interpretation
• 𝛽0 =0.001; the probability of owning a house is 0.001 when all of the
explanatory variables are zero.
• 𝛽1 =0.031; as income increases by 1 unit, the probability of owning a house
increases by 0.031 keeping other variables constant
• 𝛽2 =0.01; married people have higher probability of owning a house than the
unmarried people by 0.01 keeping other variables constant.
• 𝛽3 =-0.005; divorced people have lower probability of owning a house than the
unmarried people by 0.005 keeping other variables constant
• 𝛽4 =0.015; females have higher probability of owning a house than males by
0.015 keeping other variables constant
• Drawbacks of LPM
• Functional form problem (theoretically not intuitive), non-normality of ui,
heteroscedasticity of ui, possibility of estimated probabilities lying outside the 0-1
range, and generally low R²
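The out-of-range drawback can be seen directly with the estimated house-ownership equation above. The sketch below simply plugs values into that fitted line; the household values are hypothetical, and income is in whatever units M is measured in.

```python
def lpm_probability(income, married, divorced, female):
    """Fitted P(HO = 1 | X) from the estimated house-ownership LPM above."""
    return (0.001 + 0.031 * income + 0.01 * married
            - 0.005 * divorced + 0.015 * female)

# Hypothetical households
p_moderate = lpm_probability(income=10, married=1, divorced=0, female=1)
p_high = lpm_probability(income=40, married=0, divorced=0, female=0)
print(round(p_moderate, 3), round(p_high, 3))
```

For the second household the fitted "probability" exceeds 1, which is impossible for a probability. Nothing in the linear form prevents this, and it is one motivation for the logit and probit models.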
1.4.2. The Logit and Probit Models
• Logit and Probit models are used to model binary response variables
• The primary interest of logit/ probit models is the response probability
• They are nonlinear in nature
I) The Logit Model
• Consider a binary variable of house ownership
• Let Y = 1 if the household owns a house and 0 otherwise, and let X be income
• The probability of home ownership can be given by:
• $P_i = E(Y = 1 \mid X_i) = \dfrac{1}{1 + e^{-(\beta_0 + \beta_1 X_i)}}$ … (1)
• Defining $Z = \beta_0 + \beta_1 X_i$, equation 1 can be rewritten as:
• $P_i = \dfrac{1}{1 + e^{-Z}} = \dfrac{e^{Z}}{1 + e^{Z}}$ … (2)
• Equation 2 is the cumulative logistic distribution function
• As X, and hence Z, ranges from −∞ to +∞, Pi ranges from 0 to 1
• At very low levels of income, X, a small increase in income changes the
probability of owning a house only slightly
• Similarly, at very high levels of income, a small change in income changes the
probability of owning a house only slightly
• The probability of owning a house responds strongly only over a middle range
of income
• Thus, the graph of the logistic distribution function is S-shaped
[Figure: S-shaped logistic CDF, with P (from 0 to 1) on the vertical axis and X on the horizontal axis]
• Since $1 - P_i = \dfrac{1}{1 + e^{Z}}$, the odds ratio in favor of owning a house is
$\dfrac{P_i}{1 - P_i} = \dfrac{e^{Z}/(1 + e^{Z})}{1/(1 + e^{Z})} = e^{Z}$ … (4)
• Taking the logarithm of the odds ratio in equation 4, we get
$L = \ln\left(\dfrac{P_i}{1 - P_i}\right) = Z = \beta_0 + \beta_1 X_i$ … (5)
• Where L is the log of the odds ratio and is called the logit.
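The relationship between the logistic CDF, the logit, and the S shape can be checked numerically. The following is a minimal sketch using only Python's standard library; the function names and the slope value `beta1 = 0.5` are hypothetical and are not estimates from these slides. The marginal-effect formula dP/dX = β1·P·(1 − P) follows from differentiating equation 2.

```python
import math

def logistic(z):
    """Cumulative logistic distribution: P = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Log of the odds ratio: L = ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def marginal_effect(z, beta1):
    """dP/dX = beta1 * P * (1 - P), which is largest where P = 0.5."""
    p = logistic(z)
    return beta1 * p * (1.0 - p)

beta1 = 0.5  # hypothetical slope, not an estimate from the slides
for z in (-6.0, 0.0, 6.0):
    print(z, round(logistic(z), 4), round(marginal_effect(z, beta1), 4))
```

The logit undoes the logistic transformation, so `logit(logistic(z))` returns `z` up to rounding error, and the marginal effect is close to zero at both extremes of Z.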
II) The Probit Model
• An alternative way of modelling binary response variables
• The probit model uses the normal cumulative distribution function and is also
called the normit model
• The Logit and Probit models, though mathematically different, are similar in
outcome
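• If a variable X follows a normal distribution with mean μ and variance σ², its
probability density function (PDF) and cumulative distribution function
(CDF) are given by:
• $f(X) = \dfrac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(X-\mu)^2}{2\sigma^2}}$ and $F(X_0) = \displaystyle\int_{-\infty}^{X_0} \dfrac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(X-\mu)^2}{2\sigma^2}}\, dX$, respectively.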
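The normal CDF has no closed form, but it can be evaluated numerically. The sketch below does this through the error function in Python's standard `math` module; the helper names are hypothetical. It also prints the unscaled logistic CDF at the same points for comparison: both curves are S-shaped and pass through 0.5 at zero, and the logistic puts more probability in the tails.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """F(x) for a normal variable, computed from the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def logistic_cdf(z):
    """Cumulative logistic distribution, for comparison with the normal CDF."""
    return 1.0 / (1.0 + math.exp(-z))

for z in (-2.0, 0.0, 2.0):
    print(z, round(normal_cdf(z), 4), round(logistic_cdf(z), 4))
```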
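$X_i = \dfrac{\alpha_0 + \alpha_1\beta_0}{1 - \alpha_1\beta_1} + \dfrac{\alpha_1\beta_2}{1 - \alpha_1\beta_1} Z_{1i} + \dfrac{\alpha_2}{1 - \alpha_1\beta_1} Z_{2i} + \dfrac{\alpha_1}{1 - \alpha_1\beta_1} u_{1i} + \dfrac{1}{1 - \alpha_1\beta_1} u_{2i}$
$X_i = \pi_0 + \pi_1 Z_{1i} + \pi_2 Z_{2i} + \lambda_1 u_{1i} + \lambda_2 u_{2i}$
• Where $\pi_0 = \dfrac{\alpha_0 + \alpha_1\beta_0}{1 - \alpha_1\beta_1}$, $\pi_1 = \dfrac{\alpha_1\beta_2}{1 - \alpha_1\beta_1}$, $\pi_2 = \dfrac{\alpha_2}{1 - \alpha_1\beta_1}$, $\lambda_1 = \dfrac{\alpha_1}{1 - \alpha_1\beta_1}$, and $\lambda_2 = \dfrac{1}{1 - \alpha_1\beta_1}$, with $\alpha_1\beta_1 \neq 1$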
• From the last equation, we can clearly see the correlation between Xi and u1 and this
causes bias and inconsistency in OLS estimation
3.3. Order and Rank Conditions of Identification
a) The Identification Problem
• Estimating simultaneous equation models (SEMs) requires identification
• i.e., to estimate an equation in a SEM, we first have to identify it
• Consider the simple supply and demand equations for equilibrium quantity Q:
𝑄 = 𝛽1 𝑃 + 𝛽2 𝑍1 + 𝑢1 … … … … … … … .1
𝑄 = 𝛼1 𝑃 + 𝑢2 … … … … … … … … … … . . 2
• Where the 1st equation is the supply function while the 2nd is the demand function
• The presence of the exogenous variable Z1 in the supply function helps us identify the
demand function
• i.e. Z1 serves as an Instrumental Variable (IV) for price in the demand function
• Therefore, we can estimate the demand function
• However, there is no exogenous variable in the demand equation and thus we have
no IV for price in the supply function
• The supply function is said to be unidentified and can’t be estimated
b) Conditions for Identification: Order and Rank Condition
• Order condition: The number of exogenous variables excluded from an
equation should be at least as large as the number of right-hand side
endogenous variables
• The order condition is a necessary but not a sufficient condition
• Rank condition: The 1st equation in a two-equation simultaneous equation
model is identified if, and only if, the second equation contains at least one
exogenous variable (with a non-zero coefficient) that is excluded from the
first equation
• The rank condition is the necessary and sufficient condition for identification
• Once an equation is identified, it can be estimated using IV or 2SLS
• Example: consider the following three-equation simultaneous equation model
𝑋1 = 𝛽1 𝑋2 + 𝛽2 𝑋3 + 𝛽3 𝑍1 + 𝑢1
𝑋2 = 𝛼1 𝑋1 + 𝛼2 𝑋3 + 𝛼3 𝑍1 + 𝛼4 𝑍2 + 𝑢2
𝑋3 = 𝛾1 𝑋2 + 𝛾2 𝑍1 + 𝛾3 𝑍2 + 𝛾4 𝑍3 + 𝛾5 𝑍4 + 𝑢3 … (3)
• Where the Xs are endogenous while the Zs are exogenous variables
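• In terms of the order condition, the 1st equation is overidentified because
we need two IVs (for X2 and X3) and have three (Z2, Z3 and Z4)
• The second equation is just identified (two IVs needed, two available: Z3 and Z4),
while the third is unidentified (one IV needed, but no exogenous variable is excluded)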
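The counting behind the order condition can be written out explicitly for the three-equation example. This is a minimal sketch; the dictionary layout and the function name `order_condition` are hypothetical. It checks only the order condition, which is necessary but not sufficient, so the rank condition would still need to be verified.

```python
# Each equation lists the variables that appear on its right-hand side.
system = {
    "eq1": {"endog": ["X2", "X3"], "exog": ["Z1"]},
    "eq2": {"endog": ["X1", "X3"], "exog": ["Z1", "Z2"]},
    "eq3": {"endog": ["X2"], "exog": ["Z1", "Z2", "Z3", "Z4"]},
}

def order_condition(system):
    """Classify each equation by the order condition (necessary only)."""
    all_exog = set()
    for eq in system.values():
        all_exog.update(eq["exog"])
    status = {}
    for name, eq in system.items():
        excluded = len(all_exog - set(eq["exog"]))  # exogenous variables left out
        needed = len(eq["endog"])                   # right-hand side endogenous
        if excluded > needed:
            status[name] = "overidentified"
        elif excluded == needed:
            status[name] = "just identified"
        else:
            status[name] = "unidentified"
    return status

result = order_condition(system)
print(result)
```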
3.4. IV and 2SLS Estimation of the Structural Equations
• Consider the structural model:
𝑋1 = 𝛽1 𝑋2 + 𝛽2 𝑍1 + 𝑢1 … … … 1
𝑋2 = 𝛼1 𝑋1 + 𝛼2 𝑍2 + 𝑢2 … … … . . 2
• Where, the Xs are endogenous and the Zs are exogenous variables
• Suppose we want to estimate the 1st equation
• Substituting the right-hand side of the 1st equation into the 2nd, we have:
𝑋2 = 𝛼1 [𝛽1 𝑋2 + 𝛽2 𝑍1 + 𝑢1 ] + 𝛼2 𝑍2 + 𝑢2
𝑋2 = 𝛼1 𝛽1 𝑋2 + 𝛼1 𝛽2 𝑍1 + 𝛼2 𝑍2 + 𝛼1 𝑢1 + 𝑢2
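$(1 - \alpha_1\beta_1) X_2 = \alpha_1\beta_2 Z_1 + \alpha_2 Z_2 + \alpha_1 u_1 + u_2$
$X_2 = \dfrac{\alpha_1\beta_2}{1 - \alpha_1\beta_1} Z_1 + \dfrac{\alpha_2}{1 - \alpha_1\beta_1} Z_2 + \dfrac{\alpha_1 u_1 + u_2}{1 - \alpha_1\beta_1}$
$X_2 = \pi_1 Z_1 + \pi_2 Z_2 + V_2$ … (3)
• Where $\pi_1 = \dfrac{\alpha_1\beta_2}{1 - \alpha_1\beta_1}$, $\pi_2 = \dfrac{\alpha_2}{1 - \alpha_1\beta_1}$, and $V_2 = \dfrac{\alpha_1 u_1 + u_2}{1 - \alpha_1\beta_1}$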