Time Series and Panel Data Econometrics

The document covers various regression models, including nonlinear regression, qualitative response regression, and panel data models, detailing their estimation methods and applications. It discusses the Taylor Expansion Series for approximating nonlinear functions and provides insights into qualitative response models used for categorical dependent variables. Additionally, it highlights the importance of understanding the nature and estimation techniques of these models in econometrics.

CHAPTER 1 Nonlinear Regression Models

Intrinsically Linear and Intrinsically Nonlinear Regression Models
Estimation of Nonlinear Regression Models
Taylor Expansion Series

CHAPTER 2 Qualitative Response Regression Models

Nature of Qualitative Response Models
Linear Probability Model
Applications of the Linear Probability Model
Logit Model
Estimation of the Logit Model
Grouped Logit Model
Logit Model for Ungrouped or Individual Data
Probit Model, Probit Estimation with Grouped Data
The Probit Model for Ungrouped or Individual Data
Logit and Probit Models, The Tobit Model

CHAPTER 3 Pool and Panel Data

Panel Data Regression Models
Estimation of Pool Data Regression Models
Common Intercept Method
Fixed-effects Model
Least Squares Dummy Variables Approach
Random-effects Model
Generalized Least Squares Approach
Fixed-effects Model vs. Random-effects Model
Hausman Specification Test

CHAPTER 4 Simultaneous Equation Models and Estimation Methods

Simultaneous Equation Models
Nature of Simultaneous Equations
Examples of Simultaneous Equation Models from Economic Theory
Inconsistency of OLS Estimators
Identification Problem
Unidentified, Exactly Identified, and Overidentified
Rules for Identification
Simultaneous Equations Approach to Estimation
Indirect Least Squares (ILS)
Two-Stage Least Squares (2SLS), Instrumental Variables
Seemingly Unrelated Regression (SUR)
Nature of SUR Equations
Method of Generalized Least Squares (GLS)

CHAPTER 5 Time-Series Econometrics

Stationarity, Tests for Stationarity
Transforming Non-stationary Time Series
ARMA and ARIMA Models
Comparison of Forecasts Based on ARIMA and Regression Models
Cointegration and Error Correction Models (ECM)
ARCH and GARCH Models

CHAPTER 1 Nonlinear Regression Models


Nonlinear Regression Models
Nonlinear regression models are types of regression models where the
relationship between the dependent variable and the independent variables
is not a straight line. These models are used when the effect of changes in
independent variables on the dependent variable is nonlinear.

In such models, the regression equation involves nonlinear functions like


exponential, logarithmic, or polynomial terms.

Mathematical Form
Y = f(X, β) + ε

Where:

1. Y : Dependent variable
2. X : Independent variable(s)
3. β : Parameters to be estimated
4. ϵ : Error term

Examples
i) Exponential Model:
Y = β₀e^(β₁X) + ε

ii) Logarithmic Model:
Y = β₀ + β₁ln(X) + ε

iii) Polynomial Model:
Y = β₀ + β₁X + β₂X² + ε

iv) Logistic Model:
Y = β₀ / (1 + e^(−β₁(X − β₂)))

v) Power Model:
Y = β₀X^β₁ + ε

Q1: Intrinsically Linear and Intrinsically Nonlinear Regression Models
Intrinsically Linear Regression Models
A type of model where the relationship between the dependent variable (Y)
and the independent variable(s) (X) is linear in terms of the parameters
(coefficients). These models can be written as:

Y = β₀ + β₁X₁ + β₂X₂ + ... + ε

Example:
Y = β₀ + β₁X + ε

Here, the model is linear because the parameter β₁ is not raised to a power
or multiplied by another parameter.

Intrinsically Nonlinear Regression Models


A type of model where the relationship between the dependent variable (Y)
and the independent variable(s) (X) is nonlinear in terms of the parameters.
These models cannot be written as a straight-line equation and need
advanced estimation techniques like nonlinear least squares.

The parameters are involved in nonlinear functions like exponents,


logarithms, or multiplication of parameters.

Harder to estimate and interpret.

Example:
Y = β₀e^(β₁X) + ε

Here, the parameter β₁ appears inside an exponent, making the model


nonlinear in parameters.

Q2: Estimation of nonlinear Regression models


Estimation of Nonlinear Regression Models
Nonlinear regression models can be estimated through the trial and error method.

Trial and Error Method


The Trial and Error Method is a simple technique used to estimate the
parameters of a nonlinear regression model. In this method, we start by
making an initial guess about the values of the parameters (coefficients).
Then, we calculate the error (difference between the predicted and actual
values) and adjust the parameters to reduce this error. We repeat this
process until the error becomes very small or acceptable.

Steps for the Trial and Error Method:


i) Choose a Model: Start with a nonlinear model. For example, a
model might be Y = α + βX² + γX, where Y is the dependent variable,
and X is the independent variable.

ii) Initial Guess: Guess initial values for the parameters (like α, β, and γ). These guesses can be random or based on prior knowledge.

iii) Estimate Values: Plug the guessed values into the model to
estimate Y.

iv) Calculate the Error: Compare the estimated Y with the actual
observed values of Y. The error is the difference between the
predicted and the actual value.

v) Adjust Parameters: Change the values of the parameters to


reduce the error. This adjustment is done by trial and error until the
error becomes small.

vi) Repeat: Repeat the process until the error is minimal or you reach
an acceptable level of accuracy.

Example:
Let's say we have a simple nonlinear model like:

Y = α + βX² + γX

1. Start with guesses: α = 1, β = 2, and γ = 3.


2. Estimate Y using these values.

3. Compare the estimated Y with actual values.


4. Adjust α, β, and γ and repeat the steps until the error is small.

Pros of Trial & Error Method:


1. Simple and easy to understand.

2. Does not require complex mathematics or algorithms.

Cons of Trial & Error Method:


1. Can be time-consuming.
2. Finding the best set of parameters may be difficult.
3. May not always give the optimal solution.
This method is basic but helps in understanding how nonlinear relationships
can be estimated.
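To make these steps concrete, here is a minimal sketch of the trial-and-error idea in Python, assuming the quadratic model Y = α + βX² + γX from the example above; the data points and the grid of candidate values are illustrative assumptions, not part of the original text.

```python
import numpy as np

# Illustrative data for the model Y = alpha + beta*X**2 + gamma*X (assumed values)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([6.1, 17.2, 34.0, 56.9, 86.1])   # roughly alpha = 1, beta = 3, gamma = 2

def sse(alpha, beta, gamma):
    """Sum of squared errors for a candidate parameter set."""
    predicted = alpha + beta * X**2 + gamma * X
    return np.sum((Y - predicted) ** 2)

# Trial and error: try a coarse grid of guesses and keep the one with the smallest error
best = None
for alpha in np.arange(0, 3, 0.5):
    for beta in np.arange(2, 4, 0.25):
        for gamma in np.arange(1, 3, 0.25):
            err = sse(alpha, beta, gamma)
            if best is None or err < best[0]:
                best = (err, alpha, beta, gamma)

print("smallest error:", best[0], "at alpha, beta, gamma =", best[1:])
```

In practice one would refine the grid around the best point (or switch to nonlinear least squares), which is exactly the "adjust and repeat" step described above.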

Taylor Expansion Series


 The Taylor series is named after the mathematician Brook Taylor, who
introduced it in the early 18th century.

 The Taylor Expansion Series is a way to break a complex function into a sum of simpler
terms, using the function's value and its derivatives at a specific point. It helps
approximate the function around that point, making it easier to understand or calculate.

Mathematically:

f(x) = f(a) + f′(a)(x − a) + (f″(a)/2!)(x − a)² + (f‴(a)/3!)(x − a)³ + …

Where:

f ( a ) : Value of the function at (a).

f ' (a): First derivative of the function at (a).



f ' ' (a): Second derivative of the function at (a).

Convergence:

 The Taylor series converges to the original function f(x) if:

i) f(x) is infinitely differentiable at x = a;

ii) the remainder approaches zero as the number of terms increases.

Limitations:

1. Accuracy: It may not be accurate far from the point a.

2. Convergence: It might not always give a correct result, even with infinite terms.

3. Smoothness: The function needs to be smooth and have derivatives at a.

4. Many Terms: More terms are needed for better accuracy, which can be complex.

5. Non-Analytic Functions: It doesn’t work well for functions with sharp turns or
discontinuities.

Example: Expand 3x³ + 8x² + x − 2 in powers of (x − 3) using the Taylor series.

First find f(a). Since the expansion is in powers of (x − 3), we set a = 3.

f(x) = 3x³ + 8x² + x − 2

Putting x = 3:

f(3) = 3(3)³ + 8(3)² + 3 − 2

f(3) = 3(27) + 8(9) + 1

f(3) = 81 + 72 + 1

f(3) = 154, so f(a) = 154

Taylor Expansion Series:

f(x) = f(a) + f′(a)(x − a) + (f″(a)/2!)(x − a)² + (f‴(a)/3!)(x − a)³

Now find f′(a)

f′(x) = d/dx (3x³ + 8x² + x − 2) = 9x² + 16x + 1

Putting x = 3:

 9(3)² + 16(3) + 1
 81 + 48 + 1
 130
 f′(a) = 130

Now find f″(a)

f″(x) = d/dx (9x² + 16x + 1) = 18x + 16

Putting x = 3:

 18(3) + 16
 54 + 16 = 70
 f″(a) = 70

Now find f‴(x)

f‴(x) = d/dx (18x + 16) = 18

 f‴(a) = 18

Put all values into the Taylor expansion series

f(x) = f(a) + f′(a)(x − a) + (f″(a)/2!)(x − a)² + (f‴(a)/3!)(x − a)³

f(x) = 154 + 130(x − 3) + (70/2)(x − 3)² + (18/6)(x − 3)³

f(x) = 154 + 130(x − 3) + 35(x − 3)² + 3(x − 3)³

Or

3x³ + 8x² + x − 2 = 154 + 130(x − 3) + 35(x − 3)² + 3(x − 3)³
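As a quick check of this expansion, a minimal sketch using sympy (the choice of library is an assumption; any computer algebra system would do) confirms that the coefficients 154, 130, 35, and 3 are correct:

```python
import sympy as sp

x = sp.symbols('x')
f = 3*x**3 + 8*x**2 + x - 2

# Taylor expansion of f about x = 3, keeping terms up to (x - 3)**3
expansion = sp.series(f, x, 3, 4).removeO()

# Difference from the hand-computed expansion should simplify to 0
print(sp.expand(expansion - (154 + 130*(x - 3) + 35*(x - 3)**2 + 3*(x - 3)**3)))
```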

Nonlinear Regression Model Linearization Using the Taylor Expansion Series
1. Start with a Nonlinear Regression Model:

Suppose we have the nonlinear regression model

Y = α e^(βx) + ei

Where

 Y = Dependent Variable
 x = Independent Variable
 α & β are Parameters
 e = Error term

2. Expand with Taylor Series:

Start with initial guess for parameters

Let α = α0 & β = β0

These values are not necessarily equal to the true values, but they should be close enough for the approximation to work.

3. Taylor Expansion Series around initial Values:

We approximate f(x, α, β) by a first-order Taylor expansion around (α0, β0):

f(x, α, β) ≈ f(x, α0, β0) + [∂f/∂α](α0, β0) · (α − α0) + [∂f/∂β](α0, β0) · (β − β0)

 Evaluate f(x, α0, β0) = α0 e^(β0x)

 For α: [∂f/∂α](α0, β0) = e^(β0x)

 For β: [∂f/∂β](α0, β0) = α0 x e^(β0x)

Substitute & Linearize the Model

Y − f(x, α0, β0) ≈ e^(β0x)(α − α0) + α0 x e^(β0x)(β − β0) => Equation (A)

Let

 Y − f(x, α0, β0) = Y*
 α − α0 = ∆α
 β − β0 = ∆β
 e^(β0x) = Jα
 α0 x e^(β0x) = Jβ

Put these in Equation (A)

We get:

Y* = Jα ∆α + Jβ ∆β

The model is now linear in ∆α and ∆β.

Estimate Parameters: We can now apply ordinary least squares (OLS) or other linear regression techniques to estimate ∆α and ∆β from the data and hence update α and β. If the error is not yet at its minimum, we repeat the linearization around the updated values until the error stops improving.

Recalculate the Model: Once we have the final estimates of α and β, we plug them back into the nonlinear equation to get the final estimated model.
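The procedure just described is essentially the Gauss-Newton idea. A minimal Python sketch, assuming the exponential model Y = α e^(βx) + e and some illustrative data (the data and starting values are assumptions, not from the text):

```python
import numpy as np

# Illustrative data roughly generated from Y = 2 * exp(0.5 * x) (assumed)
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5])
Y = np.array([2.05, 2.55, 3.35, 4.20, 5.50, 7.00])

alpha, beta = 1.5, 0.3          # initial guesses alpha0, beta0 (assumed close enough)
for _ in range(10):             # repeat the linearize-and-estimate step
    f0 = alpha * np.exp(beta * x)            # f(x, alpha0, beta0)
    J_alpha = np.exp(beta * x)               # df/dalpha at (alpha0, beta0)
    J_beta = alpha * x * np.exp(beta * x)    # df/dbeta  at (alpha0, beta0)
    Y_star = Y - f0                          # Y* = Y - f(x, alpha0, beta0)
    J = np.column_stack([J_alpha, J_beta])
    # OLS on the linearized model Y* = J_alpha * d_alpha + J_beta * d_beta
    d_alpha, d_beta = np.linalg.lstsq(J, Y_star, rcond=None)[0]
    alpha, beta = alpha + d_alpha, beta + d_beta

print("estimated alpha, beta:", alpha, beta)
```

Each pass of the loop is one round of "linearize around the current guess, estimate the changes by OLS, update", as described above.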

CHAPTER 2 Qualitative Response Regression Models

Q1: The Nature of Qualitative Response Models


Qualitative Response Model

A Qualitative Response Model is used when the dependent variable is not numerical but
categorical. For example, it can represent choices like "yes" or "no," or outcomes like
"employed" or "unemployed." These models help to analyze decisions or situations where the
outcome is not continuous but qualitative.

The Nature of Qualitative Response Models

 Dependent variable: It is categorical (e.g., binary or more than two


categories).
 Purpose: They study relationships between independent variables
(numerical or categorical) and a qualitative dependent variable.
 Types: Common models include:
• Binary choice models (e.g., Probit, Logit).
• Multinomial models (for more than two outcomes).
 Applications: Used in fields like economics, psychology, and
marketing to study decisions, preferences, or classifications.

Mathematically

Y = α + βX + μi

 Y = 1 if the individual has a car
 Y = 0 if the individual has no car

P(Y = 1) = Pi, the probability of having a car

P(Y = 0) = 1 − Pi, the probability of having no car



Q2: The linear probability model

The Linear Probability Model (LPM)


The Linear Probability Model (LPM) is used to explain a binary dependent
variable (0 or 1). It estimates the conditional probability of the dependent
variable Y being 1, given the independent variable X.

The equation for the model is:

Yi = β1 + β2Xi + μi
Here:

 Yi is the binary dependent variable (e.g., 1 if a family owns a house, 0


otherwise).

 Xi is the independent variable (e.g., family income).

 β1 and β2 are coefficients to estimate.

 μi is the error term.

The conditional expectation of Yi, E(Yi | Xi), represents the probability of Yi = 1:

E(Yi | Xi) = β1 + β2Xi = Pi

Here, Pi is the probability of the event occurring (e.g., owning a house).

The dependent variable Yi follows a Bernoulli distribution with:

Mean: Pi = E(Yi | Xi)

Variance: Pi(1 − Pi)

The coefficient β2 indicates how the probability of Yi=1 changes when Xi


increases by one unit.

Challenges in LPM
Non-Normality of Disturbances (μi)
The error term μi follows a Bernoulli distribution, not a normal distribution.

This affects statistical inference in small samples but does not make the OLS
estimates biased.

Heteroscedasticity
The variance of μi depends on Xi, meaning it is not constant:

Var(μi) = Pi(1−Pi)

This makes OLS estimators inefficient (higher variance).

Out-of-Bounds Predictions

E(Yi | Xi) must lie between 0 and 1, but LPM does not guarantee this. Predicted probabilities Ŷᵢ can be less than 0 or greater than 1.

Solutions to LPM Problems


To address heteroscedasticity, Weighted Least Squares (WLS) can be used by
transforming the model:
Yi/√ωi = β1(1/√ωi) + β2(Xi/√ωi) + μi/√ωi

Here, ωi = Pi(1−Pi)

For the bounds issue, models like Logit and Probit are used, which ensure
probabilities stay between 0 and 1.
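A minimal sketch of fitting an LPM by OLS and then re-estimating by WLS with weights based on ωi = P̂i(1 − P̂i), using statsmodels on made-up data (the data and variable names are illustrative assumptions):

```python
import numpy as np
import statsmodels.api as sm

# Illustrative data: Y = 1 if a family owns a house, X = family income in thousands (assumed)
rng = np.random.default_rng(0)
X = rng.uniform(10, 60, size=200)
Y = (0.02 * X + rng.normal(0, 0.3, size=200) > 0.7).astype(int)

X_const = sm.add_constant(X)

# Step 1: ordinary least squares on the binary outcome (the LPM)
ols = sm.OLS(Y, X_const).fit()
p_hat = np.clip(ols.fittedvalues, 0.01, 0.99)      # keep fitted probabilities inside (0, 1)

# Step 2: weighted least squares with weights 1/omega_i, omega_i = p_hat * (1 - p_hat)
omega = p_hat * (1 - p_hat)
wls = sm.WLS(Y, X_const, weights=1 / omega).fit()

print(ols.params, wls.params)
```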

Key Features of the Linear Probability Model (LPM)


 Binary Dependent Variable
The dependent variable Y is binary (0 or 1), representing two possible
outcomes (e.g., success/failure).

 Linear Relationship
The model assumes a linear relationship between the independent
variable X and the probability of Y = 1.

 Conditional Probability

The predicted value of Yi, E(Yi | Xi), represents the conditional probability


of the event occurring.

 Bernoulli Distribution
The dependent variable follows a Bernoulli distribution, with mean Pi
and variance Pi(1−Pi).

 Estimation via OLS


Ordinary Least Squares (OLS) is used to estimate the coefficients β1
and β2.

 Interpretation of Coefficients
The slope coefficient β2 shows the change in probability for a one-unit
change in X.

Advantages of LPM
 Simplicity
The model is easy to understand and apply, requiring only basic
regression techniques.

 Direct Interpretation
The coefficients directly represent the change in probability, making
them easy to interpret.

 Useful for Initial Analysis


LPM provides quick insights into relationships between variables in
binary outcome models.

 Flexible for Small Data Sets


The model can be applied even with smaller sample sizes.

Limitations of LPM
 Predicted Probabilities Out of Bounds
LPM does not guarantee that predicted probabilities will lie between 0
and 1.

 Heteroscedasticity
The variance of the error term is not constant, leading to inefficient
OLS estimates.

 Non-Normality of Errors
The error term follows a Bernoulli distribution, which complicates
statistical inference in small samples.

 Linear Assumption
Probabilities are not necessarily linear in relation to the independent
variables, which can lead to inaccurate results.

 Alternative Models Preferred


Logit and Probit models are better alternatives as they address the
limitations of LPM, such as ensuring probabilities stay within bounds
and handling heteroscedasticity.

Q3: Applications of Linear Probability Model


Applications of Linear Probability Model (LPM)
The Linear Probability Model (LPM) is used in various fields to study and
predict binary outcomes (where the result is either 0 or 1). Here are its key
applications:

 Predicting Binary Outcomes


LPM helps predict the probability of an event, such as:

• Will a student pass an exam? (Pass = 1, Fail = 0)

• Will a customer purchase a product? (Yes = 1, No = 0)

 Policy Evaluation
Governments and organizations use LPM to evaluate the impact of
policies, like:

• The effect of a health program on vaccination rates (Vaccinated


= 1, Not Vaccinated = 0).

• The impact of education subsidies on school attendance


(Attends = 1, Does Not Attend = 0).

 Banking and Finance


Banks use LPM for credit risk analysis, such as:

• The likelihood of a loan default (Default = 1, No Default = 0).

 Market Research
Businesses use LPM to understand customer behavior, for example:

• Will a customer choose a specific brand? (Yes = 1, No = 0).



• The effect of advertising on product choice.

 Employment Studies
LPM is used to analyze labor market issues, such as:

• The probability of being unemployed (Unemployed = 1,


Employed = 0).

• The likelihood of getting a promotion.

 Social and Economic Research


Researchers use LPM to study social problems, like:

• The chances of falling into poverty (Poor = 1, Not Poor = 0).

• The probability of housing ownership (Owns = 1, Rents = 0).

Q4: The logit model

The Logit Model

The Logit Model is a regression model used to estimate the probability of a binary outcome
(e.g., success or failure, yes or no) based on one or more independent variables. It assumes that
the dependent variable is the log of the odds ratio, which is linearly related to the independent
variables.

The general form of the Logit Model is:

Pi = 1 / (1 + e^(−Zi))

Zi = β1 + β2Xi

Where:

 Pi: Probability of the event occurring (e.g., owning a house).


 β1: Intercept term.
 β2: Coefficient of the independent variable X (e.g., income).
 e: Base of the natural logarithm.
 Xi: Independent variable (e.g., income).

Key Features of the Logit Model:

1. Probability Representation:

 In the Logit Model, the probability of an event, Pi, is represented as:


Pi = 1 / (1 + e^(−Zi))

2. Odds Ratio:

 The odds ratio, which measures the odds of an event occurring, is:

Pi/(1−Pi) = e^(Zi)

 If Pi = 0.8, the odds are 0.8/0.2 = 4, i.e., 4:1 in favor of the event happening.

3. Logit Transformation:

 Taking the natural log of the odds ratio gives the logit:

Li = ln(Pi/(1−Pi)) = Zi = β1 + β2Xi
 The logit is linear in X and the parameters β1 and β2.

4. Nonlinear Relationship:

 The probabilities (Pi) are nonlinear in X, unlike the Linear Probability Model (LPM).

Properties of the Logit Model:

1. Boundaries of Probability:

 As Zi moves from −∞ to +∞, Pi remains between 0 and 1.


 The logit (Li) ranges from −∞ to +∞.

2. Multiple Regressors:

 The model can include multiple independent variables if needed.

3. Interpretation of Parameters:

 β2: Shows the change in the log-odds (Li) for a one-unit change in X. For example, it
indicates how the odds of owning a house change with income.
 β1: Represents the log-odds when X = 0.

4. Positive and Negative Logits:

 If L > 0: Higher X increases the odds of the event.


 If L < 0: Higher X decreases the odds of the event.

5. Estimation of Probabilities:

 Using β1 and β2, the probability Pi can be directly calculated.

6. Difference from LPM:

 While LPM assumes a linear relationship between Pi and X, the Logit Model assumes a
linear relationship between the log-odds and X.

Q5: Estimation of the Logit Model

Estimation of the Logit Model:


The Logit Model is estimated by transforming the probability of an event
occurring into the logit (log-odds) form. The logit is defined as:

Li = ln(Pi/(1−Pi)) = Zi = β1 + β2Xi + ui
where:

 Li is the logit (log-odds),


 Pi is the probability of the event occurring,
 Xi is the independent variable,
 β1 and β2 are the model parameters to be estimated, and
 ui is the error term.

Individual-Level Data:

If we have individual-level data, OLS (Ordinary Least Squares) cannot be


used directly because Pi can only take values of 0 or 1, which makes the logit
undefined. Instead, Maximum Likelihood Estimation (MLE) is used to
estimate the model parameters.

Grouped or Replicated Data:

If we have grouped data (e.g., data categorized by income levels), we first


calculate the estimated probability

P̂i = ni/Ni, where:

 ni is the number of events (e.g., homeowners),


 Ni is the total number of observations in the group.
 Using this, we calculate the logit as:

L̂i = ln(P̂i / (1 − P̂i))

 The parameters β1 and β2 can then be estimated using Weighted


Least Squares (WLS) to account for heteroscedasticity (different
error variances).

Heteroscedasticity:

The error term in the Logit Model is heteroscedastic, meaning the variance of
the error term changes with the level of Pi. This requires the use of weights in
the estimation process to correct for this.

Using WLS for Estimation:

To correct for heteroscedasticity, the equation is transformed as:

√ωi · Li = β1√ωi + β2√ωi · Xi + √ωi · μi

Where:

 ωi = Ni · P̂i(1 − P̂i) are the weights.
 This transformation ensures that the error term is homoscedastic
(constant variance).

OLS Estimation:
After transforming the data, we apply OLS (Ordinary Least Squares) to
estimate the parameters β1 and β2

Hypothesis Testing:

Confidence intervals and hypothesis tests are performed to assess the


significance of the estimated parameters, but these conclusions are valid
only if the sample size is large enough.
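For individual-level data, where MLE is required, a minimal sketch with statsmodels (the data and variable names are illustrative assumptions):

```python
import numpy as np
import statsmodels.api as sm

# Illustrative individual-level data: Y = 1 if the family owns a house, X = income (assumed)
rng = np.random.default_rng(1)
income = rng.uniform(10, 80, size=300)
utility = -4 + 0.08 * income + rng.logistic(size=300)
owns_house = (utility > 0).astype(int)

X = sm.add_constant(income)
logit_fit = sm.Logit(owns_house, X).fit()   # maximum likelihood estimation
print(logit_fit.params)                     # beta1 (intercept) and beta2 (income slope)
```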

Q6: The Grouped Logit Model

The Grouped Logit Model:


The grouped logit model is used when we have grouped data (e.g., income
levels grouped by families) and we want to study the relationship between a
variable (like income) and the probability of an outcome (e.g., owning a
house).

Calculation:
The Model work with the log-odds formula:

Li = ln(Pi/(1−Pi))
 Pi: Probability of owning a house in a group.
 (1−Pi): Probability of not owning a house.

We use weighted least squares (WLS) regression to calculate the relationship


between income and the log-odds of owning a house.

Example:
Suppose we want to study how income affects the probability of adopting
solar panels. The data is grouped by income levels (e.g., families earning
$10,000–$20,000, $20,000–$30,000, etc.). For each group, we know:

 The total number of families (Ni).



 The number of families that adopted solar panels (ni)

Income Level ($)   Total Families (Ni)   Adopted Families (ni)   Probability Pi = ni/Ni   Log-Odds Li = ln(Pi/(1−Pi))
10,000–20,000      50                    10                      0.20                     −1.3863
20,000–30,000      60                    18                      0.30                     −0.8472
30,000–40,000      70                    35                      0.50                     0
40,000–50,000      80                    60                      0.75                     1.0986
50,000–60,000      90                    81                      0.90                     2.1972

Step 1: Building the Glogit Model


Log-Odds Formula
The log-odds are calculated using:

Li = ln(Pi/(1−Pi))
For example:

 At income $10,000–$20,000, Pi=0.20, so:

Li = ln(0.20/(1 − 0.20)) = ln(0.25) = −1.3863
Step 2: Regression Equation
The grouped logit regression model is:
L̂i = β1 + β2Xi

Where:

 Li: Log-odds of adopting solar panels.


 Xi: Income level (e.g., $15,000, $25,000).

After performing weighted regression, the equation might look like this:
L̂i = −2.0 + 0.05Xi

Step 3: Interpretation of Results


 Intercept (−2.0)
At very low income levels (X = 0), the log-odds of adopting solar
panels are −2.0.

 Slope (0.05)
For every $1,000 increase in income, the log-odds of adopting solar
panels increase by 0.05.

Step 4: Calculating Odds and Probabilities


Odds Ratio
The odds ratio can be calculated by taking the antilog of the slope:

e^0.05 ≈ 1.051
This means the odds of adopting solar panels increase by 5.1% for every
$1,000 increase in income.

Probability Example
To calculate the probability of adopting solar panels for a specific income
level, use:
Pi = e^(Li) / (1 + e^(Li))

For income = $30,000:

Li = −2.0 + 0.05(30) = −0.5

Pi = e^(−0.5) / (1 + e^(−0.5)) = 0.3775

The probability of adopting solar panels at $30,000 income is 37.75%.

Step 5: Change in Probability


The change in probability for a $1,000 increase in income is:

ΔP=β ⋅ P⋅(1−P)

At $30,000 income:

ΔP=0.05 ⋅0.3775 ⋅ ( 1−0.3775 )=0.0117

This means the probability increases by 1.17% for every $1,000 increase in
income.
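The grouped (WLS) calculation behind these numbers can be reproduced with a short sketch; the regression uses the weights Ni·P̂i(1 − P̂i) described in the previous question (the table values come from the worked example; the use of statsmodels WLS is an assumption):

```python
import numpy as np
import statsmodels.api as sm

# Grouped data from the solar-panel example (income midpoints in thousands of $)
income_mid = np.array([15, 25, 35, 45, 55], dtype=float)
N = np.array([50, 60, 70, 80, 90], dtype=float)     # total families per group
n = np.array([10, 18, 35, 60, 81], dtype=float)     # adopting families per group

p_hat = n / N                                       # estimated probability per group
logit = np.log(p_hat / (1 - p_hat))                 # L_i = ln(P_i / (1 - P_i))
weights = N * p_hat * (1 - p_hat)                   # omega_i = N_i * P_i * (1 - P_i)

X = sm.add_constant(income_mid)
glogit = sm.WLS(logit, X, weights=weights).fit()    # weighted least squares
print(glogit.params)                                # intercept and slope per $1,000 of income
```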

Q7: The Logit Model for Ungrouped or Individual Data

The Logit Model for Ungrouped or Individual Data


The Logit model for ungrouped or individual data is used when the
dependent variable is binary, meaning it can only take two values (e.g., 1 or
0, yes or no, A or not A). This model is particularly useful when predicting
outcomes such as the likelihood of someone getting a certain grade, buying
a product, or getting a job, based on various factors.

The Logit Model is written as:

Li = ln(Pi/(1−Pi)) = β1 + β2(GPA) + β3(TUCE) + β4(PSI) + μ
Where:

 Li is the log of the odds of getting an A grade.


 Pi is the probability of getting an A.
 GPA, TUCE, and PSI are the independent variables, such as grade point
average, test scores, and the teaching method.
 μ is the error term.

Maximum Likelihood Estimation (MLE): Since the dependent variable is


binary, we use the method of Maximum Likelihood Estimation (MLE) to
estimate the model parameters. This method helps to find the best-fitting
model.

Interpretation of Coefficients:

 Each coefficient shows the effect of a variable on the log odds of the
outcome (getting an A).
 For example, if the coefficient for GPA is positive, it means that as GPA
increases, the odds of getting an A increase.
 The coefficients are usually interpreted in terms of odds ratios. For
example, if the coefficient for PSI is 2.3786, the odds of getting an A
are about 10 times higher for students using the new teaching method
(PSI = 1) compared to those who are not (PSI = 0).

Goodness of Fit:

 McFadden R²: This is a measure of how well the model fits the data,
but it is different from the usual R² in linear regression. Higher values
indicate a better fit.
 Count R²: This is the proportion of correct predictions made by the
model. It shows how accurate the model is in classifying students'
grades as A or not A.

Statistical Significance: To check if the coefficients are statistically


significant, we use the Z-statistic instead of the t-statistic (as the logit model
uses a non-linear approach). If the Z-statistic is high and the p-value is low, it
means the coefficient is significant.

Example:
• The coefficient for GPA is 2.8261, meaning a higher GPA increases the
likelihood of getting an A.

• The coefficient for PSI (new teaching method) is 2.3786, meaning students
in the PSI group are about 10 times more likely to get an A compared to
those who are not in the PSI group.

Likelihood Function:
A likelihood function represents the probability of observing the sample
data given a specific set of parameters in a statistical model. It is used to
estimate the parameters that make the observed data most probable. In

other words, the likelihood function helps us to find the values of the model's
parameters that best fit the observed data.

For example, if we have a dataset of observations y1,y2,...,yn from a


random variable Y, and we assume that the data follows a specific probability
distribution (such as normal distribution), the likelihood function L(θ) based
on the data will look like:

L(θ) = P(y1, y2, ..., yn | θ)

Where:

 θ is the parameter(s) of the distribution (e.g., mean μ and standard


deviation σ for a normal distribution).
 P(y1, y2, ..., yn | θ) is the joint probability of observing the data given the parameters θ.

The likelihood function is typically maximized to estimate the parameters of


the model.

Maximum Likelihood Function for the Logit Model:


In the Logit Model, the goal is to model the probability that an event occurs
given a set of explanatory variables. The model assumes the following form
for the probability

P(Y = 1 | X):

P(Y = 1 | X) = 1 / (1 + e^(−(β0 + β1X1 + β2X2 + … + βkXk)))

Where:

 P(Y=1∣X) is the probability that the outcome Y equals 1, given the


explanatory variables X.
 β0,β1,β2,...,βk are the parameters to be estimated.
 X1,X2,...,Xk are the explanatory variables.

The likelihood function for the logit model is the joint probability of observing
all the n data points, assuming that each observation Yi is independent. The
likelihood function is:

L(β | X, Y) = Π(i=1 to n) P(Yi = 1 | Xi)^Yi · [1 − P(Yi = 1 | Xi)]^(1−Yi)

Where:

 Yi is the actual observed outcome (0 or 1) for the i-th observation.


 P(Yi=1 ∣ Xi) is the predicted probability that Yi = 1, based on the values
of Xi and the parameters β.

The log-likelihood function is then the natural logarithm of the likelihood


function:
ln L(β | X, Y) = Σ(i=1 to n) [ Yi · ln P(Yi = 1 | Xi) + (1 − Yi) · ln(1 − P(Yi = 1 | Xi)) ]

The Maximum Likelihood Estimation (MLE) procedure estimates the


values of the parameters β that maximize the log-likelihood function. This is
typically done using numerical optimization methods.
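A minimal sketch of maximizing this log-likelihood numerically with scipy.optimize (the data and starting values are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative binary data with a single regressor (assumed)
rng = np.random.default_rng(2)
x = rng.normal(size=200)
y = (rng.uniform(size=200) < 1 / (1 + np.exp(-(0.5 + 1.5 * x)))).astype(float)
X = np.column_stack([np.ones_like(x), x])            # columns for beta0 and beta1

def neg_log_likelihood(beta):
    """Negative logit log-likelihood: -sum[y*ln(p) + (1-y)*ln(1-p)]."""
    p = 1 / (1 + np.exp(-X @ beta))
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

result = minimize(neg_log_likelihood, x0=np.zeros(2))  # numerical optimization (MLE)
print(result.x)                                        # estimates of beta0 and beta1
```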

Q8: The Probit model, Probit estimation with grouped data


The Probit Model:
The probit model is used when the dependent variable is binary (i.e., it takes
two values, such as 0 or 1). It uses the cumulative normal distribution
function (CDF) to model the probability of the binary outcome. The basic idea
is to assume that there is an unobservable (latent) variable, called the utility
index (denoted as Ii), that influences the binary decision. The observed
outcome depends on whether this utility index crosses a certain threshold.
Function: Φ(Z) = (1/√(2π)) ∫ from −∞ to Z of e^(−t²/2) dt
The Probit Model is expressed as:

P(Y = 1 | X) = Φ(Zi) = Φ(β1 + β2Xi)

Where:

 P(Y=1∣X) is the probability that the event occurs (e.g., a family owns a
house).

 Φ(Ζ) is the cumulative distribution function (CDF) of the normal


distribution.
 β1 and β2 are parameters to be estimated.
 Χi is an explanatory variable (e.g., income).

In the probit model, the latent variable Ii is assumed to follow a normal


distribution, and the decision rule is that a family will own a house (i.e., Y=1)
if the latent utility Ii exceeds a certain threshold. The threshold itself is
assumed to follow a normal distribution, allowing for estimation of both the
parameters and the unobservable index.

Probit Estimation with Grouped Data:


When dealing with grouped data, the probit model can be estimated using
the empirical probabilities for different groups. The idea is to estimate the
unobservable index Ii from the cumulative probability Pi, using the normal
CDF. Once we have these estimated indices, we can proceed to estimate the
parameters β1 and β2.

The grouped data refers to situations where we have data that is aggregated
into categories, such as income ranges, and for each category, we observe
the proportion of people who own a house. The empirical probabilities Pi for
each group can be used to estimate the latent variable Ii.

The steps in estimating a probit model with grouped data are:

 Calculate the empirical probabilities Pi for each group (e.g., the


proportion of families owning a house for different income ranges).
 Use these probabilities to estimate the latent variable Ii using the
normal CDF.
 Estimate the parameters β1 and β2 by fitting the probit model to the
grouped data.

This method requires calculating the inverse of the normal CDF (i.e., using
the standard normal distribution to reverse the transformation) to obtain the
latent variable values. The model is then estimated by standard regression
methods, taking into account the relationship between the explanatory
variables and the latent variable.

The estimated coefficients can then be used to interpret the effects of


explanatory variables (like income) on the probability of the binary outcome
(like owning a house).
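A minimal sketch of these steps, assuming grouped home-ownership data by income range (the numbers and names are illustrative); the inverse of the normal CDF (norm.ppf) recovers the index Ii from each empirical probability:

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

# Illustrative grouped data: income midpoints ($000) and proportion owning a house (assumed)
income_mid = np.array([15.0, 25.0, 35.0, 45.0, 55.0])
p_hat = np.array([0.20, 0.35, 0.55, 0.70, 0.85])     # empirical probabilities per group

# Estimate the latent index I_i by inverting the normal CDF
I_hat = norm.ppf(p_hat)

# Fit I_i = beta1 + beta2 * income by ordinary regression on the grouped data
X = sm.add_constant(income_mid)
probit_grouped = sm.OLS(I_hat, X).fit()
print(probit_grouped.params)
```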

Q9: The Probit Model for Ungrouped or Individual Data

The Probit Model for Ungrouped or Individual Data


The Probit model is a type of regression model used for binary dependent
variables, meaning the outcome variable has two possible outcomes (such as
success/failure or yes/no). It is widely used in econometrics, especially in
time series and panel data analysis, for modeling probabilities.

For ungrouped or individual data, the Probit model can be


explained as follows:

Model Structure:
 In the Probit model, the probability of an event occurring is modeled
using the cumulative distribution function (CDF) of the normal
distribution.
 The model is written as: P(Y =1∣ X )=Φ (Xβ)

Where:

Y is the binary dependent variable (0 or 1),

X is the vector of independent variables (predictors),

β is the vector of coefficients to be estimated,

Φ is the CDF of the standard normal distribution.

Estimation Method:

 Maximum Likelihood Estimation (MLE) is typically used to estimate the


parameters β.
 The likelihood function for a Probit model is:

L(β) = Π(i=1 to n) [ Φ(Xiβ)^Yi · (1 − Φ(Xiβ))^(1−Yi) ]

where n is the number of observations, Yi is the binary outcome for the i-


th observation, and Xi is the vector of independent variables for the i-th
observation.

Interpretation:

 The coefficients β in the Probit model represent the change in the Z-


score (i.e., the inverse of the standard normal CDF) for a one-unit
change in the corresponding predictor variable.
 The marginal effect of a predictor on the probability of Y = 1 is given
by:
∂P(Y = 1 | X) / ∂Xi = φ(Xβ) · βi
where ϕ is the probability density function (PDF) of the standard
normal distribution.

Application in Time Series and Panel Data:

 When applying the Probit model to time series data, you often
model the likelihood of a specific event occurring at each time period,
considering time-dependent variables.
 In panel data, the Probit model can account for individual-specific
effects by using fixed or random effects to capture unobserved
heterogeneity.

Maximum Likelihood Function for the Probit Model

The Probit Model is used when the dependent variable (Y) is binary, taking values like 0 or 1. It
assumes that the probability of Y = 1 is determined by a cumulative normal distribution
function.

Steps to derive the maximum likelihood function:

 Probit Probability Function:


P(Yi=1 ∣ Xi)=Φ( Xiβ)

P(Yi=0 ∣ Xi)=1−Φ (Xiβ )



where Φ is the cumulative distribution function (CDF) of the standard normal


distribution.

 Likelihood Function:
The likelihood function for n observations is:
L(β) = Π(i=1 to n) [ Φ(Xiβ)^Yi · (1 − Φ(Xiβ))^(1−Yi) ]

 Log-Likelihood Function:
Taking the log of the likelihood function to simplify calculations:
ln L(β) = Σ(i=1 to n) [ Yi · ln Φ(Xiβ) + (1 − Yi) · ln(1 − Φ(Xiβ)) ]

 Maximization:
The parameters (β) are estimated by maximizing the log-likelihood function using
numerical optimization techniques (e.g., Newton-Raphson).
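A minimal sketch of probit estimation by MLE and of the marginal-effect formula above, using statsmodels on illustrative data (the data are an assumption):

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

# Illustrative individual-level data generated from a latent-index process (assumed)
rng = np.random.default_rng(3)
x = rng.normal(size=300)
y = (rng.normal(size=300) < 0.4 + 0.9 * x).astype(int)

X = sm.add_constant(x)
probit_fit = sm.Probit(y, X).fit()            # log-likelihood maximized numerically
beta = probit_fit.params

# Marginal effect of x at the sample mean: phi(X*beta) * beta_x
xbar = X.mean(axis=0)
marginal_effect = norm.pdf(xbar @ beta) * beta[1]
print(beta, marginal_effect)
```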

Q10: Logit and Probit Models, The Tobit Model

Logit and Probit Models


Difference Between Logit and Probit Models
Both logit and probit models are used to model binary dependent variables
(e.g., yes/no, success/failure). They differ in the distribution functions they
use. The logit model is based on the logistic distribution, while the probit
model uses the normal distribution. The logit model has slightly fatter tails
than the probit model, meaning the probability approaches 0 or 1 more
slowly in logit than in probit. Logit models are often preferred due to their
simpler mathematical formulation.

Choosing Between Logit and Probit


The results from both models are typically very similar. There is no strong
reason to prefer one over the other, but logit models are commonly chosen
because of their simplicity.

Interpretation of Coefficients
Coefficients in the logit and probit models cannot be directly compared
because the distributions have different variances. To compare coefficients,
you can multiply the probit coefficient by approximately 1.81 to get the logit
coefficient or multiply the logit coefficient by 0.55 to get the probit
coefficient.

LPM (Linear Probability Model) vs. Logit and Probit Models


LPM suffers from issues like heteroscedasticity and predicted probabilities
that fall outside the 0-1 range. Logit and probit models address these issues
and are preferred for modeling binary outcomes.

The Tobit Model


The Tobit model is used when the dependent variable is censored,
meaning some values are only partially observed. For instance, if
you're modeling the amount spent on housing, you may only
observe expenditures for those who bought a house, while those
who didn't purchase a house are censored.
Censoring and Censored Samples
A censored sample occurs when the dependent variable is not fully observed.
For example, if we don't have data on spending for those who did not buy a
house, we have a censored sample, which the Tobit model can handle.

Equation for Tobit Model


The Tobit model is expressed as follows:

 Yi = β1 + β2Xi + μi if the right-hand side is greater than 0
 Yi = 0 otherwise
Here, Yi is the dependent variable (e.g., housing expenditure), Xi are
the explanatory variables (e.g., income), and μi is the error term.

Estimating Tobit Model


Ordinary Least Squares (OLS) is inappropriate for censored data because it
leads to biased estimates. Instead, maximum likelihood estimation (MLE) is
used to estimate the Tobit model. Alternatively, the Heckman two-step
procedure can be used, where the first step estimates the probability of

censoring (using a probit model), and the second step adjusts the regression
for this probability.

Example of Tobit Model


In the case of extramarital affairs, the dependent variable (number of affairs)
is censored because many people report zero affairs. A Tobit model can be
used to estimate the relationship between variables (e.g., age, marital
happiness) and the number of affairs.

Key Takeaways
The Tobit model is useful for censored data, where the dependent variable is
only partially observed. Maximum likelihood estimation is typically used to
estimate the parameters of the Tobit model, providing more accurate results
than OLS in such cases.
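A minimal sketch of Tobit estimation by maximum likelihood for data censored at zero, written with scipy (the data and the hand-rolled likelihood are illustrative assumptions, not a specific library routine):

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

# Illustrative censored data: latent spending, observed only when positive (assumed)
rng = np.random.default_rng(4)
x = rng.uniform(0, 10, size=400)
latent = -4 + 1.2 * x + rng.normal(0, 2, size=400)
y = np.maximum(latent, 0)                       # censored at zero

def neg_log_likelihood(params):
    b0, b1, log_sigma = params
    sigma = np.exp(log_sigma)                   # keep sigma positive
    mu = b0 + b1 * x
    censored = y == 0
    # Censored observations contribute P(latent <= 0); uncensored ones the normal density
    ll_cens = norm.logcdf(-mu[censored] / sigma).sum()
    ll_obs = (norm.logpdf((y[~censored] - mu[~censored]) / sigma).sum()
              - (~censored).sum() * np.log(sigma))
    return -(ll_cens + ll_obs)

result = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0, 0.0]))
print(result.x[:2], np.exp(result.x[2]))        # beta estimates and sigma
```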

CHAPTER 3 Pool and Panel Data


Q1: Why Panel Data Regression Models
Panel Data:
Panel data refers to a dataset that includes observations of multiple cross-
sectional units (e.g., individuals, firms, or countries) over several time
periods.

Why Panel Data Regression Models?


Panel data regression models are used because they combine both cross-
sectional data (data on different entities at a single point in time) and
time-series data (data on the same entity over multiple time periods). This
combination offers several advantages:

 Control for Unobserved Heterogeneity


Panel data allows us to control for unobservable variables (such as
individual characteristics) that might affect the dependent variable.
This reduces the problem of omitted variable bias.

 More Data, More Efficiency


Since panel data includes observations across time and entities, it
increases the size of the dataset. This helps to provide more precise
estimates and improves efficiency in statistical analysis.

 Captures Dynamics Over Time


Panel data models can track changes over time, which helps in
understanding the dynamics of behavior or the effects of policy
changes.

 Reduced Multicollinearity
By observing multiple entities over time, panel data often reduces the
issue of multicollinearity that can arise in cross-sectional or time-series
data alone.

 Flexibility in Model Design


Panel data supports different types of models, such as:

1. Fixed Effects Model (FEM): Focuses on analyzing the


effect of variables within an entity over time.
2. Random Effects Model (REM): Assumes that differences
across entities are random and uncorrelated with the
independent variables.

 Better Policy Analysis


Panel data regression is useful for evaluating the impact of policies
over time while considering individual-specific characteristics.

 Distinction Between Time-Invariant and Time-Variant Factors

Panel data allows distinguishing between factors that change over time
and those that remain constant, improving the understanding of their
respective impacts.

Q2: Estimation of Pool Data Regression Models


Pool Data:

Pooled data is a dataset where observations from different cross-sectional


units (e.g., individuals, firms, or countries) are combined over time but
treated as if they are part of a single dataset without accounting for
individual or time-specific effects.

Estimation of Pool Data Regression Models:


Pooled OLS Estimation:

In Pooled OLS, all the observations (both cross-sectional and time-series) are
combined into one dataset. The regression model is then estimated using the
usual Ordinary Least Squares (OLS) method.

Formula:

Yit = α + βXit + εit

Where

 Yit is the dependent variable for individual i at time t,
 Xit is the independent variable,
 α is the intercept,
 β is the coefficient for the independent variable,
 εit is the error term.

This method assumes that the relationship between the variables is the
same for all individuals and across all time periods.

Fixed Effects Estimation:



Fixed Effects (FE) estimation controls for time-invariant differences between


individuals. It removes the individual-specific effects by subtracting the mean
of each variable for each entity (individual or group).

The model can be estimated by either:

 Within Transformation: Subtracting the individual means from


both the dependent and independent variables, which removes
individual-specific effects.
 Dummy Variable Approach: Including a dummy variable for
each individual (except one to avoid perfect collinearity).
 Formula:

Yit = αi + βXit + εit

Where αi is the individual-specific intercept.

The FE method is appropriate when there are unobserved individual effects


that are correlated with the independent variables.

Random Effects Estimation:

Random Effects (RE) estimation assumes that the individual-specific effects


are random and uncorrelated with the explanatory variables.

The model incorporates both individual-specific effects and time variation in


the error term.

Formula:

Yit = α + βXit + μi + εit

Where:

 μi is the individual-specific random effect, and εit is the usual error term.
 RE estimation is more efficient than FE if the assumption of no
correlation between individual effects and the independent variables
holds.

Choosing Between Fixed and Random Effects:



The Hausman test is used to decide between Fixed and Random Effects.
The null hypothesis of the test suggests that the Random Effects model is
appropriate (i.e., the individual effects are uncorrelated with the regressors).

If the Hausman test shows significant differences between the Fixed and
Random Effects estimates, the Fixed Effects model should be used.
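A minimal sketch of estimating pooled OLS, fixed effects, and random effects on the same panel, using the linearmodels package (the package choice, the entity/time index names, and the data are assumptions):

```python
import numpy as np
import pandas as pd
from linearmodels.panel import PooledOLS, PanelOLS, RandomEffects

# Illustrative balanced panel: 30 firms observed over 10 years (assumed)
rng = np.random.default_rng(5)
firms, years = 30, 10
idx = pd.MultiIndex.from_product([range(firms), range(2010, 2010 + years)],
                                 names=["firm", "year"])
firm_effect = np.repeat(rng.normal(0, 2, firms), years)   # unobserved heterogeneity
x = rng.normal(size=firms * years)
y = 1.0 + 0.5 * x + firm_effect + rng.normal(size=firms * years)
data = pd.DataFrame({"y": y, "x": x}, index=idx)

X = pd.DataFrame({"const": 1.0, "x": data["x"]})
pooled = PooledOLS(data["y"], X).fit()
fixed = PanelOLS(data["y"], X, entity_effects=True, drop_absorbed=True).fit()
random = RandomEffects(data["y"], X).fit()

print(pooled.params, fixed.params, random.params, sep="\n")
```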

Q3: Common Intercept Method

Common Intercept Method


Common Intercept Method assumes that all entities (such as countries,
firms, or individuals) share the same intercept. This means that the baseline
value for the dependent variable is the same across all entities. The model
simplifies the analysis by not accounting for individual differences in
intercepts.

Key Points:
 Same Intercept for All: All entities are assumed to have the
same intercept, denoted by α. This simplifies the model by treating all
entities as if they start from the same baseline value.

 Model Structure: The equation for the common intercept method


is typically written as:

Yit = α + βXit + εit

Where:

Yit is the dependent variable for entity i at time t,

α is the common intercept (same for all entities),

β is the coefficient for the independent variable Xit,

εit is the error term.

 Assumptions: This method assumes that all entities respond similarly to the independent variable, meaning the effect of Xit is the same for each entity, and the only difference arises from the error term εit.

 Use: The common intercept method is used when the entities are
expected to have similar characteristics and behaviors. For example, if
studying the inflation rates of different countries, it can be assumed
that all countries behave similarly in terms of inflation, leading to a
common intercept.

 Limitations:

It ignores differences between entities (heterogeneity).

It might not be suitable for cases where entities have


significantly different starting points or structures.

Q4: Fixed-effects Model

Fixed-effects Model
The fixed-effects model is used in panel data analysis when you are
interested in analyzing the impact of variables that vary over time but might
have individual differences that do not change across time (e.g., individual
characteristics of entities like people, firms, or countries).

Key Features:
 Individual-Specific Effects: It assumes that each entity (e.g.,
individual, firm, or country) has its own unique characteristics that
might affect the dependent variable but do not vary over time.

 Controlling for Unobserved Heterogeneity: The fixed-effects


model controls for unobserved factors that are constant over time but
vary across entities. This helps in reducing bias caused by omitted
variables that are constant over time but vary across individuals.

 Model Structure: The basic fixed-effects model with LSDV can be


written as:
Yit = αi + βXit + εit

Where:

Yit is the dependent variable for entity i at time t,

αi is the entity-specific fixed effect,

Xit represents the independent variables,

β is the coefficient vector,

εit is the error term.

 Within-Group Estimation: The fixed-effects model estimates the


relationship between the independent and dependent variables by
focusing on the variation within each entity, rather than between
entities.

 No Assumption of Homogeneity: Unlike random effects, the fixed-


effects model does not assume that the individual effects (αi)
are random. Instead, it treats them as fixed and estimates them for
each individual entity.

 Advantages of Fixed-Effects Model:

It eliminates bias from unobserved factors that do not change over


time.

It allows for a more accurate estimation of the relationships in the data


by controlling for individual heterogeneity.

 Disadvantages:

It cannot estimate the effects of variables that do not vary over time
within entities.

It can lead to a loss of degrees of freedom when there are many


individual effects.

Q5: Least Squares Dummy Variables Approach

Least Squares Dummy Variables (LSDV) Approach


The Least Squares Dummy Variables (LSDV) approach is a method used to
estimate panel data models by including individual-specific dummy variables
in the regression model. It is used when there are unobserved individual
effects that may affect the dependent variable.

Key Concepts:

Panel Data: Data that includes multiple observations on the same


units (e.g., individuals, firms, countries) over time.

Dummy Variables: Binary variables (0 or 1) representing specific


characteristics or categories of the units (e.g., individual firms,
countries).

How LSDV Works:


In panel data, there are two main types of effects:

Fixed Effects: These are unobserved characteristics that are


constant for each individual but vary across individuals. The LSDV
approach controls for these by including dummy variables for each
individual.

Random Effects: These are assumed to be uncorrelated with the


independent variables, unlike fixed effects.

Steps to Apply LSDV:


Model Specification: The basic panel data model with fixed
effects can be written as:

Yit = α + βXit + γi + εit

Where:

Yit is the dependent variable for individual i at time t,

Xit are the independent variables for individual i at time t,

α is the overall intercept,

β is the vector of coefficients for the independent variables,

γi represents the individual-specific fixed effect (dummy variable for each individual),

εit is the error term.

Including Dummies: To implement the LSDV approach, we introduce


dummy variables γ i for each individual. These dummy variables capture the
individual-specific intercepts.

Estimation: The coefficients are estimated by running an Ordinary Least


Squares (OLS) regression on the transformed model. The inclusion of
individual dummies helps control for the unobserved heterogeneity that
could bias the results.

Interpretation: The estimated coefficients will show the effect of the


independent variables on the dependent variable, while accounting for the
individual-specific characteristics.

Advantages of LSDV:

Controls for unobserved heterogeneity by including individual-specific


effects.

Provides consistent estimates for fixed effects models.

Disadvantages:

The model can become very large if there are many individuals, as it requires
a dummy variable for each individual.

The approach may suffer from the "dummy variable trap" (perfect
multicollinearity) if an intercept is included in the model.
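A minimal sketch of the LSDV approach using statsmodels, where entity dummies are generated with pandas (the data and names are illustrative assumptions; one dummy is dropped to avoid the dummy variable trap mentioned above):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Illustrative panel: 5 firms, 8 years, one regressor (assumed)
rng = np.random.default_rng(6)
df = pd.DataFrame({
    "firm": np.repeat(["A", "B", "C", "D", "E"], 8),
    "x": rng.normal(size=40),
})
firm_effects = {"A": 0.0, "B": 1.0, "C": -0.5, "D": 2.0, "E": 0.7}
df["y"] = 1.0 + 0.5 * df["x"] + df["firm"].map(firm_effects) + rng.normal(0, 0.3, 40)

# LSDV: include a dummy for each firm except one (drop_first avoids perfect collinearity)
dummies = pd.get_dummies(df["firm"], prefix="firm", drop_first=True).astype(float)
X = sm.add_constant(pd.concat([df[["x"]], dummies], axis=1))
lsdv = sm.OLS(df["y"], X).fit()
print(lsdv.params)   # slope on x plus firm-specific intercept shifts
```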

Q6: Random-effects Model


Random-effects Model
The Random-effects Model is a type of model used when data includes
multiple entities (like countries, companies, or individuals) observed over
multiple time periods. The model assumes that individual-specific effects
(the unobserved factors affecting each entity) are random and not correlated
with the independent variables in the model.

Assumptions:

The individual-specific effects are random and uncorrelated with


the explanatory variables.

There is variability across individuals, but it is assumed that this


variability is random and not due to a fixed characteristic.

Model Structure: The random-effects model can be expressed as:

Yit = α + βXit + μi + εit

Where:

Yit is the dependent variable for individual i at time t.

Xit represents the independent variables.

μi is the individual-specific random effect.

εit is the idiosyncratic error term.

Advantages:

It is more efficient than the fixed-effects model when the


assumption of random effects is true.

It allows for the inclusion of time-invariant variables (unlike fixed


effects).

It is less computationally expensive than the fixed-effects model.

Disadvantages:

If the assumption that individual-specific effects are uncorrelated


with the explanatory variables is violated, it can lead to biased
estimates.

It is not suitable when individual effects are correlated with the


independent variables.

Choosing Between Random-effects and Fixed-effects:

The Hausman Test is often used to determine which model to use. If the
test indicates that the individual effects are correlated with the independent
variables, the fixed-effects model is preferred. Otherwise, the random-effects
model is a better choice.

The random-effects model is commonly used when researchers believe the


unobserved heterogeneity across individuals is not correlated with the
predictors.

Q7: Generalized Least Squares Approach

Generalized Least Squares Approach


The Generalized Least Squares (GLS) is a method used in econometrics to
estimate the parameters of a model when the assumption of
homoscedasticity (constant variance of errors) is violated. In simpler terms,
GLS is used when the errors in the model have different variances or are
correlated, which can lead to inefficient and biased estimates if Ordinary
Least Squares (OLS) is used.

Here’s how GLS works:



Identifying Heteroscedasticity or Autocorrelation:

If there is heteroscedasticity (when error terms have unequal variances) or


autocorrelation (when error terms are correlated across observations), GLS is
preferred over OLS.

Transformation of the Model:

GLS transforms the original model to make the errors homoscedastic (with
constant variance) and uncorrelated by multiplying the model by a weighting
matrix. This matrix is typically the inverse of the error covariance matrix.

Estimation:

After transformation, GLS applies the OLS method to the modified model to
get efficient and unbiased estimates.

Steps in GLS:
Model Specification:

The linear regression model is generally given as:

y = Xβ + μ

where y is the dependent variable, X is the matrix of explanatory variables, β


is the vector of coefficients, and μ is the error term.

Assumption of Error Structure:

Assume that the errors μ have a specific structure of covariance, denoted by


Σ, which could be heteroscedastic or correlated across observations.

Transforming the Model:

Multiply both sides of the model by a matrix W (such that W = Σ^(−1/2), the inverse of the square root of Σ) to transform the model into one with uncorrelated, homoscedastic errors.

Wy = WXβ + Wμ

OLS on Transformed Model:



Now, we apply OLS to the transformed model:

β̂_GLS = (X′W′WX)^(−1) X′W′Wy

Interpretation:

The resulting ^β GLS is the efficient estimator of the coefficients, as it takes into
account the heteroscedasticity or autocorrelation in the error terms.
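A minimal numpy sketch of the GLS transformation and estimator above, assuming a known heteroscedastic error covariance Σ (diagonal here for simplicity; the data are illustrative):

```python
import numpy as np

# Illustrative data with heteroscedastic errors (assumed)
rng = np.random.default_rng(7)
n = 100
x = rng.uniform(1, 10, n)
X = np.column_stack([np.ones(n), x])
sigma2 = 0.5 * x**2                      # error variance grows with x
y = X @ np.array([2.0, 1.5]) + rng.normal(0, np.sqrt(sigma2))

# Weighting matrix W = Sigma^(-1/2); with a diagonal Sigma this is 1/sqrt(sigma_i^2)
W = np.diag(1 / np.sqrt(sigma2))
Wy, WX = W @ y, W @ X

# beta_GLS = (X'W'WX)^(-1) X'W'Wy, i.e. OLS on the transformed model
beta_gls = np.linalg.solve(WX.T @ WX, WX.T @ Wy)
print(beta_gls)
```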

Advantages of GLS:

More efficient than OLS when the error terms are heteroscedastic or
autocorrelated.

Provides unbiased estimates under correct specification of the error


structure.

Disadvantages of GLS:

Requires knowledge or estimation of the error covariance matrix Σ,


which can be difficult or computationally expensive in some cases.

If the error structure is misspecified, GLS may not perform better


than OLS.

Q8: Fixed-effects model Vs. Random-effects model

Fixed-effects Model vs. Random-effects Model


Both the fixed-effects model and the random-effects model are used to
analyze panel data (data that involves multiple entities over time). They help
account for individual differences across entities (like countries, companies,
etc.).

Fixed-effects model:

 Purpose: It assumes that individual-specific characteristics


(unobserved factors) are constant over time but differ across entities.
These individual effects are treated as fixed.
 Assumption: The individual effects are correlated with the
independent variables (predictors).
 How it works: It removes the impact of time-invariant
characteristics by focusing only on the within-entity variations.
 Example: If studying the impact of education on income, fixed
effects would control for individual-specific factors (like innate ability or
family background) that do not change over time.
 Use case: It’s used when we believe that there are unique,
unobserved factors for each entity that could affect the results.

Random-effects model:

 Purpose: It assumes that individual-specific characteristics are


random and uncorrelated with the independent variables. These
effects are treated as part of the error term.
 Assumption: The individual effects are uncorrelated with the
independent variables.
 How it works: The model assumes that the individual-specific
effects are drawn from a distribution and treats them as random
variables.
 Example: If studying the effect of education on income across
various countries, random effects might assume that country-specific
factors (like culture or government policies) are random and not
related to education or income.
 Use case: It’s used when we believe that the individual effects are
not correlated with the independent variables.

Key differences:

Correlation with independent variables: Fixed-effects


assume correlation, while random-effects assume no correlation.

Control for unobserved heterogeneity: Fixed-effects control


for individual heterogeneity by removing it, while random-effects
model assumes it’s random.

Efficiency: The random-effects model is more efficient (produces


less biased estimates) when the assumption of no correlation holds,
but fixed-effects are more reliable if the correlation assumption is
violated.

Q9: Hausman Specification Test

Hausman Specification Test


The Hausman Specification Test is used to check for endogeneity
(simultaneity problem) between variables in econometric models.
Specifically, it tests whether an explanatory variable in a model is correlated
with the error term, which would make it endogenous.

Set up the Model:

Demand function:

Q_t^d = α_0 + α_1 P_t + α_2 I_t + α_3 R_t + μ_{1t}

Supply function:

Q_t^s = β_0 + β_1 P_t + μ_{2t}

Where:

P = Price

Q = Quantity

I = Income

R = Wealth

μ = Error terms

Assume Exogeneity and Endogeneity:

Income I and wealth R are assumed to be exogenous.

Price P and quantity Q are endogenous, meaning that they may be correlated with the error term.

Regress the Supply and Demand Functions:

The goal is to detect whether there is a simultaneity problem, meaning that the endogenous variable P_t is correlated with the error term μ_{2t} in the supply equation.

Reduced-Form Equations:

Regress the price P and quantity Q on the exogenous variables:

P_t = γ_0 + γ_1 I_t + γ_2 R_t + v_t
Q_t = δ_3 + δ_4 I_t + δ_5 R_t + w_t

Where v t and w t are the reduced-form errors.

Obtain Estimates of Price P̂_t and Residuals v̂_t:

Estimate the price P_t using Ordinary Least Squares (OLS). The estimated P̂_t and the residuals v̂_t will be used in the next step.

New Regression with Estimated Price and Residuals:

Regress the quantity Q_t on P̂_t and v̂_t:

Q_t = β_0 + β_1 P̂_t + β_2 v̂_t + μ_{2t}

Test the Significance of the Coefficient of v̂_t:

Perform a t-test on the coefficient of v̂_t:

 Null hypothesis (H₀): There is no simultaneity; v̂_t and μ_{2t} are not correlated, and the coefficient of v̂_t should be zero.

 Alternative hypothesis (H₁): There is simultaneity, and the coefficient of v̂_t should be significantly different from zero.
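The two-step procedure above translates directly into code. The following is a minimal sketch with statsmodels on simulated data (the series P, Q, I, R are generated here for illustration, with price deliberately made endogenous).

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(2)
T = 200
I_t = rng.normal(10, 2, T)            # income (exogenous)
R_t = rng.normal(5, 1, T)             # wealth (exogenous)
u2 = rng.normal(size=T)               # structural error of the supply equation
P_t = 2 + 0.8 * I_t + 0.3 * R_t + 0.5 * u2 + rng.normal(size=T)  # endogenous price
Q_t = 1 + 1.5 * P_t + u2                                          # supply equation

# Step 1: reduced-form regression of P on the exogenous variables
Z = sm.add_constant(pd.DataFrame({"I": I_t, "R": R_t}))
first = sm.OLS(P_t, Z).fit()
P_hat, v_hat = first.fittedvalues, first.resid

# Step 2: regress Q on P_hat and v_hat; a significant coefficient on v_hat
# signals simultaneity (P is correlated with the structural error).
X = sm.add_constant(pd.DataFrame({"P_hat": P_hat, "v_hat": v_hat}))
second = sm.OLS(Q_t, X).fit()
print(second.tvalues["v_hat"], second.pvalues["v_hat"])
```

A small p-value on v̂_t leads to rejecting H₀, i.e., evidence of simultaneity.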

CHAPTER 4 Simultaneous Equation Models and Estimation Methods

Q1: Simultaneous Equation Models

Simultaneous Equation Models


Simultaneous equation models are used to analyze systems where multiple
equations are interdependent. In these models, one variable can be both a
dependent and independent variable in different equations. This happens
because the variables influence each other simultaneously.

Key Features of Simultaneous Equation Models:

Endogeneity: Variables are endogenous (determined within the system) and affect each other directly or indirectly.

Structural Equations: The system is made up of structural equations representing economic relationships.

Reduced Form: These equations can be transformed into reduced-form equations, where endogenous variables are expressed only in terms of exogenous variables.

Identification Problem: Determining whether a specific equation in the system can be estimated uniquely (identification is crucial).

Estimation Methods: Standard regression methods like OLS cannot be used due to endogeneity. Specialized techniques like Two-Stage Least Squares (2SLS) and Three-Stage Least Squares (3SLS) are used.

Example of a Simultaneous Equation Model:


Demand Equation:
Q_d = α_0 + α_1 P + α_2 Y + μ_1

Where:

 Qd : Quantity demanded
 P: Price
 Y: Income
 μ1: Error term

Supply Equation:
Q_s = β_0 + β_1 P + μ_2

Where:

 Qs : Quantity supplied
 P: Price
 μ2: Error term

Here, P (price) and Q (quantity) are endogenous variables, as they are determined within the system.

Estimation Methods for Simultaneous Equation Models


Simultaneous equation models require special estimation methods because
of endogeneity. Ordinary Least Squares (OLS) cannot provide unbiased
estimates in such cases. Below are the common estimation methods:

1. Indirect Least Squares (ILS)

Purpose: Used when the system of equations is exactly identified (only one
unique solution exists for each equation).

Process:

 Solve the structural equations to express endogenous variables as functions of exogenous variables (reduced form).
 Estimate the reduced-form parameters using OLS.
 Use these estimates to compute structural parameters.

Limitation: Cannot be applied if the system is overidentified.



2. Two-Stage Least Squares (2SLS)

Purpose: Most common method for estimating structural equations in overidentified systems.

Process:

First Stage: Regress each endogenous explanatory variable on all exogenous variables (instruments) to obtain predicted values.

Second Stage: Replace the endogenous variables in the structural equation with their predicted values and estimate the equation using OLS.

Advantages: Reduces bias caused by endogeneity.

Example: Estimating demand and supply models.

3. Three-Stage Least Squares (3SLS)

Purpose: Extends 2SLS by considering correlations between error terms in different equations.

Process:

 Combine all equations in the system and estimate them simultaneously.
 Use Generalized Least Squares (GLS) to account for correlations in the error terms.

Advantages: Provides more efficient estimates when error terms are correlated.

Limitation: Computationally intensive and requires strong assumptions.

4. Instrumental Variables (IV) Estimation

Purpose: Addresses endogeneity by using instruments (variables that are correlated with endogenous regressors but uncorrelated with the error term).

Process:

 Select valid instruments.
 Replace endogenous variables with their instrument-predicted values.

Advantages: Reduces bias caused by simultaneous causality.

Limitation: Selecting valid instruments is challenging.

Q2: Nature of Simultaneous Equations

Nature of Simultaneous Equations


Simultaneous equations are a set of equations where two or more dependent
variables are determined at the same time. These equations are commonly
used in economics to model situations where variables influence each other,
creating interdependence. Unlike single-equation models, where the
relationship between variables is one-directional, simultaneous equations
capture the two-way or multiple-way relationships.

Key Characteristics:
Interdependence:
The dependent variable in one equation may appear as an explanatory
(independent) variable in another equation. For example, in a supply and
demand model:

 Quantity demanded depends on price (demand equation).


 Price depends on quantity supplied (supply equation).

Endogeneity:
Endogeneity arises because the dependent variable is influenced by other
variables within the system, which are also determined by the model. This
makes the usual estimation techniques, like Ordinary Least Squares (OLS),
biased and inconsistent.

System of Equations:
A simultaneous system consists of multiple equations, each representing a
specific relationship. These equations are solved together since the variables
are interrelated. For example:

 Demand: Q_d = a − bP
 Supply: Q_s = c + dP

Identification Problem:
In simultaneous systems, not all equations can be estimated directly
because the variables are jointly determined. To estimate a particular
equation, it must be identified—either exactly identified or overidentified—
based on restrictions (e.g., exclusion of certain variables or structural
assumptions).

Equilibrium Relationships:
These systems often model equilibrium in markets, where the demand and
supply equations interact to determine price and quantity.

Example: Supply and Demand Model

 Demand Equation: Q_d = 100 − 5P + 2Y
 Supply Equation: Q_s = −20 + 10P

Here, Q_d and Q_s are interdependent because price P is determined by both supply and demand.
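As a quick worked illustration (assuming, purely for the example, an income level of Y = 50), setting Q_d = Q_s gives:

100 − 5P + 2(50) = −20 + 10P  ⇒  220 = 15P  ⇒  P ≈ 14.67 and Q = −20 + 10(14.67) ≈ 126.7

so price and quantity are determined jointly by the two equations rather than by either one alone.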

Q3: Examples of simultaneous equation models from economic theory

Examples of Simultaneous Equation Models from Economic Theory:
Simultaneous equation models (SEMs) are used when multiple
equations describe relationships between variables, and these equations are
interdependent. Here are some examples of simultaneous equation models
from economic theory:

1. Supply and Demand Model:

In this model, the quantity supplied and quantity demanded are determined
simultaneously by the price level. The two equations could be:

 Demand Equation: Q_d = α − βP
 Supply Equation: Q_s = γ + δP

Here, P is the price, and Qd and Qs are the quantity demanded and
supplied, respectively. The market equilibrium occurs when Qd =
Qs, and the price is determined simultaneously by both demand and
supply.

2. IS-LM Model:

This model represents the equilibrium in the goods market and the money
market. The IS curve shows the relationship between interest rates and
output in the goods market, and the LM curve shows the relationship
between interest rates and output in the money market.

 IS Equation: Y = C(Y − T) + I(r) + G
 LM Equation: M/P = L(Y, r)

Here, Y is output (income), r is the interest rate, C is consumption, T is taxes, I is investment, G is government spending, and M/P is the real money supply.

3. Phillips Curve:

The Phillips curve expresses the inverse relationship between inflation and
unemployment, often represented as:

 Inflation Equation: π_t = π_t^e − β(U_t − U_n)
 Unemployment Equation: U_t = U_n − γ(π_t − π_t^e)

Where π_t is inflation, π_t^e is expected inflation, U_t is the unemployment rate, and U_n is the natural rate of unemployment.

4. Investment and Savings Model:

This model explores the interaction between savings and investment in an economy.

 Investment Equation: I = I_0 − iY
 Saving Equation: S = S_0 + sY

Here, I is investment, Y is national income, i is the interest rate, S is savings, and s is the savings rate.

5. Money Demand and Money Supply Model:

The money demand equation reflects how the demand for money depends
on income and interest rates, while the money supply is determined by
central banks.

 Money Demand Equation: M_d = kY − hi
 Money Supply Equation: M_s = M_0

Here, M_d is the demand for money, M_s is the supply of money, Y is income, and i is the interest rate.

Q4: Inconsistency of OLS Estimators

Inconsistency of OLS Estimators


The concept of Inconsistency of OLS Estimators arises when we apply
the Ordinary Least Squares (OLS) method to estimate parameters in a
system of simultaneous equations, and one or more of the explanatory
variables are correlated with the disturbance term. This violates a key
assumption of OLS, which is that the explanatory variables must not be
correlated with the error term.

To explain the inconsistency of OLS estimators in a system of simultaneous equations, let's use the example of a Keynesian model of income determination, consisting of the consumption function and the income identity:

C_t = β_0 + β_1 Y_t + μ_t
Y_t = C_t + I_t

Where:

 C_t is consumption,
 Y_t is income,
 μ_t is the error term (disturbance),
 I_t is investment, an exogenous variable,
 β_0 and β_1 are parameters to be estimated.

Show that Y_t and μ_t are correlated.

To do this, substitute the consumption function into the income identity and solve for Y_t:

Y_t = β_0/(1 − β_1) + I_t/(1 − β_1) + μ_t/(1 − β_1)

Then, subtract the expected value of Y_t from Y_t:

Y_t − E(Y_t) = μ_t/(1 − β_1)

The disturbance term μ_t is not independent of income Y_t; thus:

cov(Y_t, μ_t) = σ²/(1 − β_1)

Since the covariance is non-zero, it shows that Y t and μt are correlated. This
violates the assumption of OLS that the error term is uncorrelated with the
explanatory variables.

Show that β̂_1 is inconsistent.

The OLS estimator for β_1 is given by:

β̂_1 = Σ(C_t − C̄)(Y_t − Ȳ) / Σ(Y_t − Ȳ)²

Substitute C_t from the consumption function:

β̂_1 = Σ(β_0 + β_1 Y_t + μ_t)Y_t / ΣY_t²

Now, take the expectation of β̂_1:

E(β̂_1) = β_1 + E(Y_t μ_t)/E(Y_t²)

Since Y_t and μ_t are correlated, the second term does not equal zero. Therefore, β̂_1 is biased.

The probability limit of β̂_1:

As the sample size n increases, the sample covariance between Y_t and μ_t will tend towards the true covariance σ²/(1 − β_1). The sample variance of Y_t will approach its population variance σ²_Y.

Thus, the probability limit (plim) of β̂_1 is:

plim(β̂_1) = β_1 + σ²/[(1 − β_1)σ²_Y]

Since the second term is positive, β̂_1 will always overestimate β_1, and the bias will not disappear even as the sample size increases. Therefore, β̂_1 is an inconsistent estimator.
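A small simulation makes this visible. The sketch below (plain numpy, with illustrative parameter values of my own choosing) repeatedly generates data from the Keynesian model and shows that the OLS slope stays above the true β_1 = 0.6 even with a large sample.

```python
import numpy as np

rng = np.random.default_rng(3)
beta0, beta1 = 10.0, 0.6
n, reps = 5000, 200
estimates = []
for _ in range(reps):
    I_t = rng.normal(20, 3, n)                    # exogenous investment
    u_t = rng.normal(0, 2, n)                     # disturbance of the consumption function
    Y_t = (beta0 + I_t + u_t) / (1 - beta1)       # reduced form for income
    C_t = beta0 + beta1 * Y_t + u_t               # consumption function
    b1 = np.cov(C_t, Y_t)[0, 1] / np.var(Y_t, ddof=1)   # OLS slope
    estimates.append(b1)
print(np.mean(estimates))   # noticeably above 0.6, in line with the plim formula
```

With these values the plim formula predicts a bias of roughly σ²/[(1 − β_1)σ²_Y] ≈ 0.12, and the simulated mean sits close to 0.72 rather than 0.6.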
Q5: Identification Problem

Identification Problem
The Identification Problem refers to the challenge in determining whether
numerical estimates of parameters in a structural equation can be derived
from the reduced-form coefficients. In simpler terms, it asks whether we can
figure out the underlying causes (structural parameters) from the observed
data (reduced-form equations).

Identification: If the parameters in a structural equation can be uniquely estimated from the reduced-form coefficients, the equation is said to be identified.
Unidentified (Underidentified): If the parameters cannot be
uniquely estimated, then the equation is unidentified (or underidentified).
This means there is not enough information in the data to estimate the
parameters.

Exactly Identified: If unique numerical values for all parameters of the equation can be determined, the equation is exactly identified. This happens when there are just enough equations for the number of parameters.

Overidentified: If more than one numerical value can be obtained for some parameters, then the equation is overidentified. This means there are more equations than necessary to estimate the parameters.

Why does the Identification Problem arise?

The identification problem arises because a given set of data might be consistent with multiple structural equations. This creates ambiguity, as the reduced-form equation could correspond to several different models or hypotheses. Essentially, different models could explain the same data, making it difficult to pinpoint which model is the correct one.

Examples: Suppose we have data for an economic system, but


there are multiple structural models that could explain this data. If
we cannot uniquely determine the parameters of the model, we
face the identification problem. In such cases, we need additional
information or restrictions to make the model identifiable.

Q6: Unidentified, exactly identified, and overidentified

Unidentified, Exactly Identified, and Overidentified


1. Unidentified Model or Under Identified Model
An unidentified model occurs when there are more unknown parameters to
be estimated than there are independent equations to estimate them. As a
result, there is no unique solution to the model, meaning that it's impossible
to uniquely determine the values of all the parameters.

Mathematical Explanation:
Let's assume we have a system of equations with k unknown parameters
(denoted by θ₁, θ₂, ..., θₖ) and n independent equations. The model is
unidentified if the number of equations n is less than the number of unknown
parameters k (i.e., n < k).

For example, consider the following demand and supply system:

 Demand Equation:
Q_d = α_0 + α_1 P_t + μ_{1t}

 Supply Equation:
Q_s = β_0 + β_1 P_t + μ_{2t}

Here, we have two equations with four unknowns: α0, α1, β0, and β1.
We cannot uniquely solve for the parameters because we only have two
equations.

Thus, with only two equations and four unknowns, the model is unidentified.

Characteristics of an Unidentified Model:

 Not enough equations to estimate the parameters.


 There are infinite combinations of parameters that could fit the
data.
 Identification issues arise when there is collinearity (e.g., shared
variables across equations like price and quantity).

2. Exactly Identified Model


An exactly identified model occurs when the number of independent
equations is equal to the number of unknown parameters. In this case, there
is a unique solution for the parameters, and the model can be estimated with
certainty.

Mathematical Explanation:
For a model to be exactly identified, the number of unknown parameters k
must be equal to the number of independent equations n. Mathematically,
this condition can be expressed as:
n=k

Consider the following example:

 Demand Equation with an Additional Exogenous Variable (Income, I):
Q_d = α_0 + α_1 P_t + α_2 I_t + μ_{1t}
 Supply Equation:
Q_s = β_0 + β_1 P_t + μ_{2t}

In this case, we have:

 k = 5 parameters: α0, α1, α2, β0, β1

 n = 5 independent equations (from the demand and supply system).

Therefore, the model is exactly identified because we have exactly the same
number of equations as unknowns.

Characteristics of an Exactly Identified Model:

 Equations equal the number of parameters to be estimated.


 A unique solution is possible, and parameters can be estimated.
 If there is no extra information (such as additional exogenous
variables or instruments), you cannot estimate the model if it is
not identified.

3. Overidentified Model
An overidentified model occurs when there are more independent equations
than unknown parameters. This allows for more data to test the validity of
the model and gives extra information to help estimate the parameters.
Overidentification provides extra equations that help improve the reliability
of parameter estimates.

Mathematical Explanation:
For a model to be overidentified, the number of independent equations n
must be greater than the number of unknown parameters k. Mathematically,
this condition can be expressed as:
n> k

For example, consider the following demand and supply system:

 Demand Equation:
Q_d = α_0 + α_1 P_t + α_2 I_t + μ_{1t}
 Supply Equation with Lagged Price:
Q_s = β_0 + β_1 P_t + β_2 P_{t−1} + μ_{2t}

Here, we have:

 k = 6 parameters: α0, α1, α2, β0, β1, β2

 n = 6 independent equations (from the demand and supply system).

Therefore, the model is overidentified because we have more equations than


parameters to estimate.

Characteristics of an Overidentified Model:

 More equations than unknowns, leading to extra information.



 The model can be estimated, and additional testing (such as Hansen's J-test) can help validate the instruments or restrictions.
 The extra equations help to check for consistency and improve the
reliability of parameter estimates.

Q7: Rules for Identification

Rules for Identification


Identification is essential to ensure that the parameters of a model can be
uniquely estimated. Two main conditions for identifying equations in such
systems are the Order Condition and the Rank Condition. Let's break them
down in simple terms:

1. Order Condition of Identifiability


This is a necessary but not sufficient condition for identification. It provides a
straightforward way to check if an equation can potentially be identified by
excluding certain variables.

 M: Number of endogenous variables in the system.


 m: Number of endogenous variables in a given equation.
 K: Number of predetermined (exogenous) variables in the model.
 k: Number of predetermined variables in a given equation.

Order Condition:

An equation can be identified if it excludes at least M − 1 variables (both endogenous and predetermined) from the model.

If the equation excludes exactly M - 1 variables, it is just identified.

If the equation excludes more than M - 1 variables, it is overidentified.

Alternative Formula: The equation can also be identified if the number of predetermined variables excluded from it is greater than or equal to m − 1, where m is the number of endogenous variables in the equation.

 For exact identification: K − k = m − 1

 For underidentification: K − k < m − 1

 For overidentification: K − k > m − 1

Example:
For a simple system with two equations (like the demand and supply
functions), the order condition helps us check whether each equation can be
estimated. If an equation excludes enough variables (at least M - 1), it can be
identified.
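One way to apply the counting rule, using the two-equation demand and supply system in which income I appears only in the demand function: M = 2 (P and Q are endogenous) and K = 1 (I is the only predetermined variable). For the supply equation, m = 2 and k = 0, so K − k = 1 = m − 1 and the supply equation is exactly identified; for the demand equation, m = 2 and k = 1, so K − k = 0 < m − 1 and the demand equation is underidentified by this condition.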

2. Rank Condition of Identifiability


This is a necessary and sufficient condition for identification. The rank
condition helps to confirm that an equation can be uniquely identified after
applying the order condition.

Rank Condition:

For an equation to be identified, at least one non-zero determinant of order (M − 1) × (M − 1) should be possible to construct from the coefficients of the variables excluded from the equation but included in other equations of the model.

How to Apply:

 Write the system in tabular form, where you track the coefficients of
the endogenous and predetermined variables.
 Remove the coefficients of the row for the equation under
consideration.
 Remove the columns of the variables that are in that equation.
 Form matrices with the remaining coefficients and check if at least one
non-zero determinant can be found.
 If the determinant is non-zero, the equation is identified. If all
determinants are zero, the equation is unidentified.

Q8: Simultaneous Equations Approach to Estimation

Simultaneous Equations Approach to Estimation


Simultaneous equations models (SEMs) involve multiple equations that are
interdependent, meaning the dependent variables in one equation may
appear as independent variables in other equations. Estimating the
parameters of these models requires specialized methods because
traditional techniques like Ordinary Least Squares (OLS) cannot account for
the interrelationships between the equations.

Estimation Methods:
There are two main types of estimation methods for simultaneous equations:
least squares and maximum likelihood. Each of these categories has specific
approaches:

1. Least Squares Method:

2SLS (Two-Stage Least Squares): In this method, endogenous variables are replaced with predicted values in the first stage, and in the second stage, OLS is applied to estimate parameters. While 2SLS provides consistent estimates, it is not asymptotically efficient because it does not account for the error covariances across equations.

3SLS (Three-Stage Least Squares): This method is an extension of 2SLS. It accounts for the correlation between errors across equations by estimating the covariance matrix in the third stage, improving efficiency compared to 2SLS.

IT3SLS: This is an iterative version of 3SLS where the covariance matrix is updated until the estimates converge.

2. Maximum Likelihood Method:

LIML (Limited Information Maximum Likelihood): LIML provides consistent estimates and is viewed as a maximum likelihood estimator. It is more computationally intensive than 2SLS but can be more efficient when dealing with small samples.

FIML (Full Information Maximum Likelihood): This method maximizes the likelihood function based on the entire system of equations rather than just a single equation, improving efficiency in large samples. However, FIML is computationally more expensive than other methods.

3. Instrumental Variables (IV):

In the IV approach, endogenous variables are replaced by instruments (predetermined variables) that are uncorrelated with the residuals but correlated with the endogenous variable. For example, the 2SLS method substitutes the endogenous variable with its predicted value, obtained from the instrumental variables.

4. K-Class Estimators:

K-class estimators, which include OLS, 2SLS, LIML, and MELO (Minimum Expected Loss Estimator), are flexible methods for estimating parameters in simultaneous equations. The parameter k controls the balance between OLS and 2SLS, and the optimal value of k minimizes the risk of estimation errors.

MELO is a Bayesian K-class estimator that combines OLS and 2SLS estimates to minimize expected loss. It has finite second moments and thus avoids infinite risk, making it a more reliable estimator under certain conditions.

5. SUR (Seemingly Unrelated Regressions):

SUR is used when there is contemporaneous correlation between the error terms of the equations. It estimates the error covariance matrix and uses this information to improve the efficiency of parameter estimates.

Q9: Indirect Least Squares

Indirect Least Squares (ILS)

To solve a question based on Indirect Least Squares (ILS), we follow the steps outlined below. Here is a simplified process for solving an ILS problem:

Step 1: Obtain the Reduced-Form Equations


Given a system of structural equations, the first step is to express them in
their reduced form. This involves isolating the endogenous variables in each
equation. For example, if we have a demand and supply system as follows:

 Demand function: Q_t = α_0 + α_1 P_t + α_2 X_t + μ_{1t}
 Supply function: Q_t = β_0 + β_1 P_t + μ_{2t}

We obtain the reduced-form equations:

P_t = γ_0 + γ_1 X_t + w_t
Q_t = δ_2 + δ_3 X_t + v_t

Where γ_0, γ_1, δ_2, δ_3 are the reduced-form coefficients, and w_t, v_t are the error terms.

Step 2: Apply OLS to the Reduced-Form Equations


Next, apply Ordinary Least Squares (OLS) regression to each reduced-form
equation separately. This involves regressing the dependent variable (like Pt
or Qt) on the independent variables (like Xt) to obtain the estimates for the
reduced-form coefficients.

For example, the OLS estimates would be:

 γ̂_1 = ΣX_t P_t / ΣX_t²
 γ̂_0 = P̄ − γ̂_1 X̄
 δ̂_3 = ΣX_t Q_t / ΣX_t²
 δ̂_2 = Q̄ − δ̂_3 X̄

Step 3: Derive Structural Coefficients from Reduced-Form Estimates

Once the reduced-form coefficients are estimated, you can derive the structural coefficients (like α_0, α_1, β_0, β_1) using the relationships between the reduced-form and structural coefficients.

For example, for the supply function, the structural coefficients can be
estimated as:

β̂_0 = δ̂_2 − β̂_1 γ̂_0

β̂_1 = δ̂_3 / γ̂_1

Now, using the numerical estimates from the reduced-form equations, we can calculate the ILS estimates of the structural coefficients.

Example Problem:
Consider the following:

 Demand equation: Q_t = α_0 + α_1 P_t + α_2 X_t + μ_{1t}
 Supply equation: Q_t = β_0 + β_1 P_t + μ_{2t}

From the reduced-form equations, assume we get the following OLS estimates:

 P̂_t = 90.9601 + 0.0007 X_t
 Q̂_t = 59.7618 + 0.0020 X_t

From the ILS method:

β̂_0 = δ̂_2 − β̂_1 γ̂_0

β̂_1 = δ̂_3 / γ̂_1

Using the values from the reduced-form estimates:

 β̂_0 = −183.7043
 β̂_1 = 2.6766

These are the ILS estimates of the structural coefficients.
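A minimal Python sketch of the same three steps, using statsmodels OLS on simulated series (the generated price, quantity, and X values are placeholders, not the data behind the numbers above):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
T = 100
X_t = rng.normal(1000, 200, T)                    # exogenous variable
P_t = 90 + 0.0007 * X_t + rng.normal(size=T)      # price (reduced form)
Q_t = -180 + 2.7 * P_t + rng.normal(size=T)       # supply function

Z = sm.add_constant(X_t)
rf_P = sm.OLS(P_t, Z).fit()                       # P_t = g0 + g1*X_t
rf_Q = sm.OLS(Q_t, Z).fit()                       # Q_t = d2 + d3*X_t
g0, g1 = rf_P.params
d2, d3 = rf_Q.params

beta1_ils = d3 / g1                               # structural slope of supply
beta0_ils = d2 - beta1_ils * g0                   # structural intercept
print(beta0_ils, beta1_ils)
```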



Q10: Two-Stage Least-Squares, Instrumental Variables

Two-Stage Least-Squares, Instrumental Variables


Two-Stage Least Squares (2SLS) is used to estimate the relationships between variables when
there is a problem of endogeneity in the model. Endogeneity arises when an explanatory variable
is correlated with the error term, leading to biased and inconsistent estimates in ordinary least
squares (OLS). 2SLS is widely used in Time Series and Panel Data when instrumental variables
are available to deal with the endogeneity problem.

• First Stage
Replace the endogenous variable with its predicted values by regressing it on
the instrumental variables and other exogenous variables. The instrumental
variables must be:

 Highly correlated with the endogenous variable.


 Uncorrelated with the error term.

For example, if we have an equation:

Y = β_0 + β_1 X + μ

where X is endogenous, we first regress X on the instruments Z:

X = α_0 + α_1 Z + v

The predicted values (X̂) are saved.

• Second Stage
Use the predicted values of X (X̂) from the first stage in place of the original endogenous variable in the main equation. Estimate the model:

Y = β_0 + β_1 X̂ + μ

This gives consistent estimates of β_1.
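The two stages can be hand-rolled in a few lines. This is a minimal sketch with statsmodels on simulated data; note that a dedicated IV routine would also compute the correct 2SLS standard errors, which this shortcut does not.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 500
Z = rng.normal(size=n)                       # instrument
u = rng.normal(size=n)                       # structural error
X = 0.9 * Z + 0.8 * u + rng.normal(size=n)   # endogenous regressor
Y = 1.0 + 2.0 * X + u

# First stage: regress X on the instrument, keep fitted values
X_hat = sm.OLS(X, sm.add_constant(Z)).fit().fittedvalues

# Second stage: replace X with X_hat in the structural equation
second = sm.OLS(Y, sm.add_constant(X_hat)).fit()
ols = sm.OLS(Y, sm.add_constant(X)).fit()
print("OLS slope:", ols.params[1], "2SLS slope:", second.params[1])
```

The plain OLS slope is pulled away from 2 by the endogeneity, while the two-stage estimate stays close to it.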

Assumptions of 2SLS

 Instrument Relevance: Instruments must be strongly correlated with the endogenous variable.
 Instrument Exogeneity: Instruments must not be correlated with
the error term.

 Model is Linear: The relationship is linear in parameters.

Advantages of 2SLS

Solves the problem of endogeneity.

Provides consistent parameter estimates.

Easy to implement in econometric software.

Limitations of 2SLS

Weak instruments lead to imprecise estimates.

Requires valid and relevant instruments, which can be hard to find.

Results can be sensitive to the choice of instruments.

Instrumental Variables (IV)


Instrumental Variables (IV) are used to deal with endogeneity. IVs are
external variables that are correlated with the endogenous explanatory
variables but uncorrelated with the error term.

Properties of a Good Instrument

Relevance: The instrument should be highly correlated with the endogenous variable.

Exogeneity: The instrument should not be correlated with the error term.

For example:
In studying the effect of education (X) on income (Y), years of schooling
might be endogenous. Distance to the nearest school (Z) can be used as an
instrument.

Steps in Using IV

 Identify the endogenous variable in the model.


 Select valid instruments satisfying relevance and exogeneity.
 Use the IV to estimate the model (often through 2SLS).

Application in Time Series and Panel Data


IV and 2SLS methods are used in dynamic panel models, where lagged
dependent variables or external instruments help deal with simultaneity bias
and unobserved heterogeneity.

Advantages of IV

 Handles endogeneity effectively.


 Produces consistent parameter estimates.
 Helps in causal inference.

Disadvantages of IV

 Finding valid instruments is challenging.


 Weak instruments can lead to biased results.
 Requires robust testing for instrument validity.

Q11: Seemingly unrelated regression (SUR)

Seemingly unrelated regression (SUR)


Seemingly Unrelated Regression (SUR) is a method used when we have
multiple regression equations that are related through their error terms, but
the equations themselves may have different dependent variables.

Key Features of SUR:


Multiple Equations: SUR involves a system of equations, where
each equation represents a different regression, but they share a
common structure in terms of the error terms.

Error Terms Correlation: The key feature is that the error terms
across the different equations are assumed to be correlated. Even
though the equations are separate, the errors are linked in some way,
which is why the system is called "seemingly unrelated."

Efficient Estimation: Unlike estimating each equation separately (which would ignore the correlation between errors), SUR provides a way to estimate all equations jointly, taking the correlation of errors into account. This leads to more efficient parameter estimates.

Application: SUR is useful when we suspect that there are


relationships between different dependent variables that can't be
captured in a single equation. For example, if you have several
economic models with related error terms, using SUR will help to
improve the accuracy of your estimates.

Mathematical Representation:
Consider a system of m equations:
y_i = X_i β_i + ϵ_i,  for i = 1, 2, ..., m

Where:

 y_i is the dependent variable for the i-th equation.
 X_i is the matrix of independent variables.
 β i is the vector of coefficients.
 ϵ i is the error term.

The errors across the equations, ϵ i are assumed to have a variance-


covariance structure that is not diagonal, i.e., there are correlations between
the errors of different equations.

Estimation Process:
Ordinary Least Squares (OLS): Each equation can be
estimated using OLS independently.

Seemingly Unrelated Estimation: Once we have OLS


estimates, SUR uses the correlation between the error terms of the
different equations to re-estimate the parameters in a way that
improves efficiency.

Generalized Least Squares (GLS): A more efficient estimation method, GLS, is often used in SUR to take advantage of the correlation between the errors.

Advantages of SUR:
 Improved Efficiency: By accounting for the correlation between
the error terms, SUR provides more efficient and reliable estimates
than estimating each equation separately.

 Flexibility: It can be used in systems where the dependent variables


are related and the errors are correlated.

Limitations:
Assumption of Error Correlation: SUR assumes that error terms
across equations are correlated, but if this assumption is wrong, the results
may be misleading.

Complexity: The method is more complex to implement than estimating each equation individually.
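To show the mechanics, here is a minimal two-equation feasible-GLS sketch of the SUR idea in plain numpy on simulated data (in practice a packaged implementation would normally be used); the coefficient values and error correlation are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
# Errors correlated across the two equations
errs = rng.multivariate_normal([0, 0], [[1.0, 0.7], [0.7, 1.0]], size=n)
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = np.column_stack([np.ones(n), rng.normal(size=n)])
y1 = X1 @ np.array([1.0, 2.0]) + errs[:, 0]
y2 = X2 @ np.array([0.5, -1.0]) + errs[:, 1]

# Step 1: equation-by-equation OLS to estimate the error covariance Sigma
b1 = np.linalg.lstsq(X1, y1, rcond=None)[0]
b2 = np.linalg.lstsq(X2, y2, rcond=None)[0]
E = np.column_stack([y1 - X1 @ b1, y2 - X2 @ b2])
Sigma = E.T @ E / n

# Step 2: GLS on the stacked system with covariance kron(Sigma, I_n)
X = np.block([[X1, np.zeros_like(X2)], [np.zeros_like(X1), X2]])
y = np.concatenate([y1, y2])
Omega_inv = np.kron(np.linalg.inv(Sigma), np.eye(n))
beta_sur = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)
print(beta_sur)   # first two entries: equation 1, last two: equation 2
```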

Q12: Nature of SUR Equations

Nature of SUR Equations


The nature of Seemingly Unrelated Regression (SUR) equations is defined by
the following key features:

1. Multiple Equations:

System of Equations: SUR involves a system of two or more regression


equations, each with its own dependent variable. Each equation can have
different independent variables, but they are related because their error
terms are correlated.

For example, in a system with two equations, you could have:

y_1 = X_1 β_1 + ϵ_1
y_2 = X_2 β_2 + ϵ_2

Where y_1 and y_2 are dependent variables, and X_1 and X_2 are the corresponding sets of independent variables. The error terms ϵ_1 and ϵ_2 are assumed to be correlated.

2. Error Term Correlation:

The key feature of SUR is that the error terms across the equations are
correlated. This implies that the errors from one equation provide information
about the errors in another equation. Despite the equations appearing
unrelated (seemingly), they are interconnected because of the correlation in
the residuals.

Mathematically, the covariance matrix of the error terms is assumed to be


non-diagonal, indicating correlation between them. This is different from
standard ordinary least squares (OLS), where the error terms are assumed to
be uncorrelated.

3. Dependent Variables and Independent Variables:

Each equation in the SUR system has its own dependent variable and a
potentially different set of independent variables. However, the equations
are related through the correlation of their error terms.

The dependent variables in each equation can be related in terms of


underlying economic theory, but each equation is treated separately in terms
of explanatory variables.

4. Efficiency in Estimation:

Seemingly Unrelated Estimation is used to jointly estimate the parameters of


all equations, using the correlations between the error terms to improve
efficiency. In simple terms, it takes advantage of the fact that the equations
share some common information (through the error correlation) and
produces more accurate estimates than estimating each equation separately.

5. Generalized Least Squares (GLS):

In the context of SUR, Generalized Least Squares (GLS) is typically employed for estimation, instead of Ordinary Least Squares (OLS). GLS adjusts for the correlation in the error terms across equations, leading to more efficient parameter estimates.

6. Flexibility of Model Structure:

The structure of the SUR system allows for heterogeneity in the equations.
This means each equation can have different sets of explanatory variables
and different coefficients, which provides flexibility when modeling systems
with multiple relationships.

7. Practical Applications:

SUR is widely used in cases where multiple equations are likely to have
correlated errors, such as in macroeconomic modeling, market demand and
supply models, or any situation where different related phenomena are being
studied simultaneously.

Q13: Method of Generalized Least Squares

Method of Generalized Least Squares


The Generalized Least Squares (GLS) method is an extension of the Ordinary
Least Squares (OLS) method that adjusts for heteroscedasticity (non-
constant variance) in the data. Here's a detailed explanation of GLS:

Why OLS is Not Best, Although Unbiased


In OLS, every observation is treated equally when estimating the parameters
of the regression model. This works well when the error terms have constant
variance (homoscedasticity). However, if the error terms have different
variances (heteroscedasticity), OLS does not make the best use of the
available information. Specifically:

 OLS assigns equal weight to all observations, irrespective of the variance of the error terms.
 In situations where some observations come from populations with higher variability (larger error terms), the estimation might not be as reliable.

For example, if you have data on employee compensation across different employment classes, the variability in compensation might differ between these classes. Ideally, you want to give less weight to observations with higher variability and more weight to those with smaller variability. OLS doesn't do this, but GLS does.

Process of Generalized Least Squares:


Model Setup:

The model is:

Y_i = β_1 + β_2 X_i + μ_i

where Y_i is the dependent variable (e.g., compensation), X_i is the independent variable (e.g., employment size), and μ_i is the error term.

Transformation of the Model:

To address heteroscedasticity, we transform the model by dividing both sides by σ_i, where σ_i² is the variance of the error term for each observation:

Y_i/σ_i = β_1(X_0i/σ_i) + β_2(X_i/σ_i) + μ_i/σ_i

where X_0i = 1 is the intercept regressor. This transformation gives a new model with a homoscedastic error term, as the variance of the transformed error μ_i* = μ_i/σ_i becomes constant.

Why Transform the Model?

The transformation ensures that the error term in the new model, μ_i*, has constant variance. This makes the assumptions of the classical linear regression model hold, allowing us to apply OLS to the transformed model:

Y_i* = β_1* X_0i* + β_2* X_i* + μ_i*

where Y_i*, X_i*, μ_i* are the transformed variables and β_1*, β_2* are the parameters of the transformed model.

Applying OLS to the Transformed Model:

Since the error term in the transformed model is homoscedastic, applying OLS to this model yields Best Linear Unbiased Estimators (BLUE) for the parameters β_1* and β_2*.

GLS Estimation Procedure:

To obtain the GLS estimators, we minimize the sum of squared residuals for
the transformed model:

Σ(Y_i* − β̂_1* X_0i* − β̂_2* X_i*)²

This process leads to the GLS estimators for the parameters.

Difference Between OLS and GLS

 OLS minimizes the sum of squared residuals without considering the variability of each observation.
 GLS minimizes a weighted sum of squared residuals, where the weight for each observation is the inverse of its error variance (1/σ_i²). This means observations with smaller error variance are given more weight in the estimation.

Thus, GLS adjusts for heteroscedasticity by assigning weights based on the


variance of the error terms, leading to more efficient and reliable estimators.

Weighted Least Squares (WLS)

GLS with a specific form of weights (where the weights are w_i = 1/σ_i²) is often referred to as Weighted Least Squares (WLS).

WLS is a special case of GLS, where the weights are chosen to account for heteroscedasticity.
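A minimal sketch of WLS as this weighted special case, using statsmodels on simulated data in which the weights 1/σ_i² are assumed known (the variance function chosen here is purely illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 120
x = rng.uniform(1, 10, n)
sigma2 = 0.5 * x**2                              # error variance rises with x
y = 3.0 + 1.5 * x + rng.normal(scale=np.sqrt(sigma2))

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()
wls = sm.WLS(y, X, weights=1.0 / sigma2).fit()   # weight = 1/sigma_i^2
print(ols.params, wls.params)                    # both unbiased; WLS more efficient
```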

Key Points of GLS:

 GLS allows you to handle heteroscedasticity by transforming the variables in such a way that the errors become homoscedastic.
 It provides more reliable estimators when the error variances differ across observations.
 The method is efficient, producing BLUE estimators, the best linear unbiased estimators under heteroscedasticity.

CHAPTER 5 Time-Series Econometrics

Q1: Stationarity, Tests for Stationarity

Stationarity
Stationarity means that the statistical properties (mean, variance, and
autocovariance) of the series do not change over time. This is crucial for
accurate forecasting and modeling in econometrics.

Types of Stationarity
 Weak or Covariance Stationarity
A time series is weakly stationary if:

The mean is constant: E(Y_t) = μ

The variance is constant: Var(Y_t) = E(Y_t − μ)² = σ²

The autocovariance depends only on the lag k, not on time t:

γ_k = E[(Y_t − μ)(Y_{t+k} − μ)]

 Strict Stationarity
A series is strictly stationary if the entire probability distribution
remains unchanged over time (all moments are invariant). For normal
distributions, weak stationarity implies strict stationarity.

Tests for Stationarity


1. Unit Root Test
The unit root test is used to determine whether a time series is non-
stationary by checking if it has a unit root. A unit root indicates that the
series has a stochastic trend, meaning shocks to the series have a lasting
effect.

Formula:
Y_t = ρY_{t−1} + ε_t

Null Hypothesis (H₀): The series has a unit root (ρ = 1), meaning it is
non-stationary.

Alternative Hypothesis (H₁): The series does not have a unit root,
meaning it is stationary.

The test is crucial because most econometric models require the series
to be stationary for valid results. If the series has a unit root,
differencing or transformation is usually required.

2. Dickey-Fuller (DF) Test


The Dickey-Fuller test is the simplest test for checking stationarity by testing
for a unit root. It is based on estimating the following equation:
ΔY_t = α + βY_{t−1} + ε_t

Here:

Δ Y t: First difference of the series Y t −Y t −1 .

α : Constant term (optional).

β Y t−1: Lagged value of the series.

ε t : Error term.

If β = 0, the series has a unit root and is non-stationary. The test focuses on
whether β is significantly different from zero.

Limitations:

It assumes that the residuals ε t are uncorrelated, which may not hold in
many cases.

3. Augmented Dickey-Fuller (ADF) Test


The Augmented Dickey-Fuller test is an extension of the Dickey-Fuller test
that addresses autocorrelation in the residuals by adding lagged differences
of the dependent variable. The test equation is:
ΔY_t = α + βY_{t−1} + Σ_{i=1}^{p} γ_i ΔY_{t−i} + ε_t

Here:

Σ_{i=1}^{p} γ_i ΔY_{t−i}: Lagged differences added to control for autocorrelation.

The ADF test checks the null hypothesis that β = 0 (unit root exists) against the alternative that β < 0 (stationarity).

The inclusion of lagged terms improves reliability, especially when the series exhibits serial correlation. However, selecting the appropriate number of lags (p) is critical and can impact results.

4. Phillips-Perron (PP) Test


The Phillips-Perron test is another test for a unit root, similar to the ADF test
but differing in how it handles autocorrelation and heteroskedasticity. It
adjusts the test statistics using a non-parametric method instead of adding
lagged differences.

The test uses the same equation as the Dickey-Fuller test:


ΔY_t = α + βY_{t−1} + ε_t

The PP test is particularly useful when the residuals exhibit heteroskedasticity or serial correlation. It modifies the test statistic to account for these issues.

Null Hypothesis (H₀): The series has a unit root (non-stationary).


Alternative Hypothesis (H₁): The series is stationary.

Advantages:

 Does not require specifying the number of lags.
 More robust to heteroskedasticity in the error term compared to the ADF test.

5. KPSS Test (Kwiatkowski-Phillips-Schmidt-Shin Test)


The KPSS test is based on the hypothesis that a time series is stationary
around a deterministic trend or a constant. The null hypothesis of the KPSS
test assumes that the series is stationary, while the alternative hypothesis
suggests the series is non-stationary.

Hypotheses:

Null Hypothesis (H₀): The time series is stationary (stationary around a deterministic trend or constant).

Alternative Hypothesis (H₁): The time series is non-stationary (it has a unit root).

KPSS Test Formula:


The formula for the KPSS test statistic is based on the residuals of the
regression of the time series on its trend or constant. The test statistic is
calculated as follows:

 Regress the time series data on a constant or a trend (depending on the test).
 Calculate the residuals from this regression.
 Compute the partial sum of the residuals.
 Use these partial sums to compute the test statistic.

The test statistic is given by:

KPSS statistic = (1/T²) Σ_{t=1}^{T} S_t² / σ̂²

Where:

 S_t = Σ_{i=1}^{t} ê_i is the partial sum of the residuals,
 ê_t = residual at time t,
 σ̂² = estimated (long-run) variance of the residuals.

T-Statistic (Test Statistic Calculation):


The KPSS test statistic is computed and compared with critical values from
the KPSS distribution table. If the test statistic exceeds the critical value at a
certain significance level (usually 5%), the null hypothesis is rejected,
suggesting the series is non-stationary.

Interpretation:

 If the test statistic is greater than the critical value: Reject the null
hypothesis. This suggests that the series is non-stationary and likely
has a unit root.
 If the test statistic is less than the critical value: Fail to reject the null
hypothesis. This indicates that the series is stationary (or at least does
not have a unit root).

Steps for KPSS Test:

 Choose whether to test for stationarity around a constant (level stationarity) or a trend (trend stationarity).
 Estimate the KPSS statistic based on the residuals.
 Compare the calculated statistic with the critical value at the desired
significance level.
 Make a decision based on the comparison: reject or fail to reject the
null hypothesis.

The KPSS test is often used alongside other tests, such as the ADF test, to
confirm the results and provide a more robust analysis of stationarity in time
series data.

Advantages:

 Complements unit root tests by providing a reverse hypothesis.


 Useful for confirming results obtained from ADF and PP tests.
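As a minimal illustration of running the ADF and KPSS tests together, here is a short statsmodels sketch on a simulated random walk (which has a unit root); the series and the chosen options are illustrative.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller, kpss

rng = np.random.default_rng(8)
y = np.cumsum(rng.normal(size=300))        # random walk: non-stationary

adf_stat, adf_p, *_ = adfuller(y, autolag="AIC")
kpss_stat, kpss_p, *_ = kpss(y, regression="c", nlags="auto")

print(f"ADF p-value:  {adf_p:.3f}   (large -> cannot reject unit root)")
print(f"KPSS p-value: {kpss_p:.3f}   (small -> reject stationarity)")
```

Because the two tests have opposite null hypotheses, agreement between them (ADF fails to reject, KPSS rejects) gives stronger evidence of non-stationarity.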

Q2: Transforming Non-stationary Time Series



Non-stationary
A non-stationary time series is a series whose statistical properties, such as
mean, variance, and autocorrelation, change over time, meaning its behavior is
not constant or predictable in the long term. Non-stationarity can arise due
to trends, cycles, or structural breaks in the data. Non-stationary data may
not be useful for modeling or forecasting without transformation.

Transforming Non-stationary Time Series


In Time Series and Panel Data Econometrics, non-stationary time series can
cause problems in analysis. To make them useful for modeling, we often
need to transform them into stationary series. There are two main types of
transformations: Difference-Stationary Processes and Trend-Stationary
Processes. Here's a detailed note on both:

1. Difference-Stationary Processes (DSP)


A time series is difference-stationary if it becomes stationary after
differencing (i.e., subtracting the previous observation from the current one).
This transformation is used when the time series shows a random walk or
unit root behavior.

Steps to Transform Using DSP:


 First Differencing: Subtract the value at time t−1 from the value at time t:

Δy_t = y_t − y_{t−1}

If the series shows a random walk, differencing once often makes it stationary.

 Further Differencing: In some cases, more than one differencing step is required. For example, if the first differencing doesn't lead to stationarity, you can try second differencing:

Δ²y_t = Δy_t − Δy_{t−1}

Characteristics:

 A series with a unit root (i.e., a random walk) is difference-stationary.



 The differenced series will not have a deterministic trend but may have
a stochastic trend.
 The process involves taking the difference between consecutive values
until stationarity is achieved.

2. Trend-Stationary Processes (TSP)


A time series is trend-stationary if it has a deterministic trend (e.g., a linear
or quadratic trend) and becomes stationary once the trend is removed.

Steps to Transform Using TSP:

 Detrending: The idea here is to remove the deterministic trend from the series. This can be done by:
 Subtracting the trend: Fit a trend (e.g., a linear or quadratic
regression model) and subtract it from the original series.
 Regression Method: Perform regression of the series on time (or
other variables) to estimate the trend and then subtract this trend from
the series.
 Seasonal Adjustment: In some cases, removing seasonal effects
is also required, especially if the trend is influenced by seasonal
variations.

Characteristics:

 A series with a deterministic trend (e.g., a line) is trend-stationary.


 After removing the trend, the residual series is stationary and does not
exhibit a unit root.
 Trend-stationary series may exhibit cyclical or seasonal movements,
but once the trend is removed, these patterns are not an issue.
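A minimal sketch of both transformations in pandas/numpy, on two simulated series chosen to match the two cases (a random walk and a linear-trend series); the series and trend parameters are assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(9)
t = np.arange(300)

walk = pd.Series(np.cumsum(rng.normal(size=300)))      # difference-stationary series
d_walk = walk.diff().dropna()                          # Δy_t = y_t − y_{t−1}

trendy = pd.Series(0.05 * t + rng.normal(size=300))    # trend-stationary series
b1, b0 = np.polyfit(t, trendy, 1)                      # fit y = b0 + b1*t
detrended = trendy - (b0 + b1 * t)                     # remove the fitted trend

print(d_walk.std(), detrended.std())                   # both transformed series are stationary
```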

Q3: ARMA and ARIMA Models

ARMA and ARIMA Models


AR (Autoregressive) Model
The AR model describes a time series where the current value depends on its
previous values. The AR process is expressed as:
(Y_t − δ) = α_1(Y_{t−1} − δ) + μ_t

Where:

 Y_t is the value at time t,
 δ is the mean of the series,
 α is the coefficient for the lagged term,
 μ_t is the random error term (white noise).

Process: In the AR(1) process, the value at time t depends on the value at t−1, plus a random shock. A higher-order AR model (AR(p)) includes more lags of the series:

(Y_t − δ) = α_1(Y_{t−1} − δ) + α_2(Y_{t−2} − δ) + ⋯ + α_p(Y_{t−p} − δ) + μ_t

The order of the model (p) refers to how many past values of the
series are included.

Purpose: AR models are useful when the current value of the series is
closely related to its past values.

MA (Moving Average) Model


The MA model focuses on the error terms (shocks) in the time series. The
current value depends on past error terms. The MA model is expressed as:

Y_t = μ + β_0 u_t + β_1 u_{t−1}

Where:

 μ is a constant,
 ut are error terms (white noise),
 β0,β1 are coefficients.

Process: In the MA(1) process, the value at time t is a weighted average of the current and previous error terms. A higher-order MA model (MA(q)) includes more past error terms:

Y_t = μ + β_0 u_t + β_1 u_{t−1} + ⋯ + β_q u_{t−q}

The order of the model (q) refers to how many past error terms are used.

Purpose: MA models are suitable when the current value is influenced by


recent shocks or random disturbances.

ARMA (Autoregressive Moving Average) Model


The ARMA model combines both autoregressive and moving average
components. It models the time series with both past values of the series
and past error terms. The ARMA model is written as:

Y_t = θ + α_1 Y_{t−1} + β_0 u_t + β_1 u_{t−1}

Where:

 θ is a constant,
 α 1 is the coefficient for the autoregressive part,
 β o, β 1 are the coefficients for the moving average part,
 ut are the error terms.

Process: The ARMA(p, q) process has p autoregressive terms and q moving average terms. This allows the model to capture both the dependence on past values and the impact of past shocks:

Y_t = θ + α_1 Y_{t−1} + ⋯ + α_p Y_{t−p} + β_0 u_t + β_1 u_{t−1} + ⋯ + β_q u_{t−q}

Purpose: ARMA models are used when a time series exhibits both
autoregressive and moving average properties.

ARIMA (Autoregressive Integrated Moving Average) Model


ARIMA models are used when the time series is non-stationary (i.e., its mean,
variance, and covariance change over time). The series must first be
differenced (d times) to make it stationary before applying the ARMA model.
The ARIMA model is written as:

Y_t = θ + α_1 Y_{t−1} + ⋯ + α_p Y_{t−p} + β_0 u_t + ⋯ + β_q u_{t−q}

The key difference is the integration part (I), which refers to differencing the series to make it stationary.

Process: An ARIMA(p, d, q) process has:

 p autoregressive terms,
 d differencing steps (to make the series stationary),
 q moving average terms.

If the series is non-stationary and needs differencing (e.g., ARIMA(2, 1, 2)), then we difference the series once to make it stationary before applying the ARMA(2, 2) model.

Purpose: ARIMA models are used for forecasting non-stationary time series
after differencing them to achieve stationarity.
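A minimal forecasting sketch with statsmodels' ARIMA on a simulated random walk with drift; the order (1, 1, 1) is an arbitrary illustrative choice, not a recommendation for real data.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(10)
y = np.cumsum(0.2 + rng.normal(size=300))   # non-stationary: random walk with drift

model = ARIMA(y, order=(1, 1, 1))           # p=1 AR term, d=1 difference, q=1 MA term
result = model.fit()
print(result.summary().tables[1])           # estimated AR and MA coefficients
print(result.forecast(steps=5))             # 5-step-ahead forecasts
```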

Q4: Comparison of forecasts based on ARIMA and regression models

Comparison of Forecasts Based on ARIMA and Regression Models
Both ARIMA (Autoregressive Integrated Moving Average) and regression
models are commonly used for forecasting time series data. However, the
two approaches have different strengths and are suited for different types of
data. Here is a comparison of these two forecasting techniques:

1. Model Structure
ARIMA Model:

 ARIMA models are designed specifically for time series forecasting. They account for the temporal dependencies in the data (autoregressive component), past errors (moving average component), and any necessary differencing to make the series stationary (integration component).
 It is a purely time-series model and doesn't require external explanatory variables. The model relies only on the history of the time series itself to make predictions.

Regression Model:

 Regression models, especially linear regression, use explanatory


variables (predictors) to forecast the dependent variable (response).
These models assume a relationship between the dependent variable
and one or more independent variables.
 In time series forecasting, regression models can be extended to
include lagged values of the dependent variable or other time-related
variables. This is often referred to as "autoregressive regression" when
past values of the dependent variable are used as independent
variables.

2. Stationarity
ARIMA Model:

 ARIMA models require the time series to be stationary (constant mean,


variance, and covariance over time). If the series is non-stationary,
differencing is applied until the series becomes stationary. This is a key
feature of ARIMA models.
 ARIMA handles trends and seasonality through differencing and by
incorporating autoregressive and moving average terms.

Regression Model:

 Regression models do not require the time series to be stationary, but


the underlying assumption is that the relationship between variables
remains consistent over time.
 If the time series has trends or seasonality, these must be included as
independent variables in the regression model (e.g., including time as
a predictor, or using seasonal dummy variables).

3. Data Requirements
ARIMA Model:

 ARIMA models use only the historical data of the time series itself,
making them useful when you have a single series to forecast, and no
additional explanatory variables are available.
 It is not necessary to include any external information or predictors,
though external factors can be included in an ARIMAX model (an
extension of ARIMA).

Regression Model:

 Regression models require external predictor variables, which may or


may not be time-related. These predictors could be other time series or
even unrelated external factors (e.g., economic indicators, policy
changes).
 A regression model can work with multiple predictors (multivariate
regression) to improve the forecast.

4. Handling Trends and Seasonality


ARIMA Model:

 ARIMA models handle trends and seasonality in a more systematic way


by differencing the data and incorporating seasonal components (using
SARIMA for seasonal effects).
 The differencing in ARIMA removes trends, and the autoregressive and
moving average parts model the residual fluctuations.

Regression Model:

 Regression models require trends and seasonality to be explicitly


modeled as additional predictors. For example, time can be included as
an independent variable to capture the overall trend, or seasonal
dummy variables can be added to account for seasonal fluctuations.
 The model can become more complex when there are multiple
seasonal cycles or other time-related patterns to account for.

5. Forecasting Approach
ARIMA Model:

 The ARIMA model forecasts future values based solely on past values
and past error terms. The forecast is entirely data-driven, relying on
the structure of the time series.

 ARIMA is good for series with no external explanatory variables and


when you have sufficient historical data to identify the patterns
(autoregressive and moving average processes).

Regression Model:

 A regression model forecasts based on the relationship between the


dependent variable and the independent variables. If the model
includes time-related predictors (such as the lagged values of the
dependent variable), it can be similar to an autoregressive model.
 The regression approach allows you to incorporate external factors that
could influence the dependent variable, which can improve forecasting
accuracy when those factors are well understood and measurable.

6. Complexity and Flexibility


ARIMA Model:

 ARIMA models are relatively simple but can become complex if


the time series has higher-order autoregressive or moving
average terms or if it requires seasonal adjustments.
 ARIMA's flexibility lies in its ability to differencing the series and
fit the best model for a given time series data.

Regression Model:

 Regression models can become more complex as more external


variables are included. The complexity grows with the number of
predictors, especially when dealing with multivariate regression.
 The model structure is flexible, allowing the inclusion of multiple
predictors, but it requires a deep understanding of the
relationships between the predictors and the dependent variable.

7. Forecasting Accuracy
ARIMA Model:

 ARIMA is particularly useful when there are no significant


external predictors, and the time series has clear patterns of
autocorrelation.

 However, ARIMA may not perform well if the series is heavily


influenced by external factors that are not captured by the past
values and error terms.

Regression Model:

• Regression models can perform better if external predictors have a significant influence on the dependent variable. For example, when economic indicators, weather, or other variables affect the series, regression models can incorporate these effects to improve forecasting accuracy.
• However, regression models can suffer from multicollinearity or overfitting if too many predictors are included without proper regularization.
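
A simple hold-out comparison is one way to check which approach forecasts better for a given series. The sketch below (illustrative only, with simulated data and hypothetical lag orders) computes out-of-sample RMSE for an ARIMA fit and a regression fit.

```python
# Illustrative only: out-of-sample accuracy of ARIMA vs. OLS with a hold-out split.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(2)
n, h = 100, 10                                   # hypothetical sample size and horizon
x = rng.normal(size=n).cumsum()
y = pd.Series(0.8 * x + rng.normal(size=n))

y_train, y_test = y[:-h], y[-h:]
x_train, x_test = x[:-h], x[-h:]

arima_pred = ARIMA(y_train, order=(1, 1, 1)).fit().forecast(steps=h)
ols_fit = sm.OLS(y_train, sm.add_constant(x_train)).fit()
ols_pred = ols_fit.predict(sm.add_constant(x_test))

rmse = lambda e: float(np.sqrt(np.mean(np.square(e))))
print("ARIMA RMSE:", rmse(y_test.to_numpy() - np.asarray(arima_pred)))
print("OLS RMSE:", rmse(y_test.to_numpy() - np.asarray(ols_pred)))
```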

Q5: Cointegration and Error Correction Models (ECM)

Cointegration and Error Correction Models (ECM)

Cointegration:
Cointegration refers to the situation when two or more non-stationary time
series variables are linked by a long-term equilibrium relationship. Even
though individual variables may follow random walks and be non-stationary,
their linear combination can be stationary. This stationary relationship
between the variables indicates that they share a common stochastic trend,
and hence, they are cointegrated.

In econometrics, cointegration is an important concept because it suggests that despite short-term fluctuations or disequilibrium, the variables tend to move together in the long run. This idea is fundamental when modeling relationships between economic variables that are non-stationary.
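
As an illustration (not from the text), an Engle-Granger style check for cointegration can be sketched in Python (statsmodels): regress one series on the other and test the residuals for stationarity. The simulated series and settings below are hypothetical.

```python
# Illustrative only: two non-stationary series sharing a common stochastic trend.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller, coint

rng = np.random.default_rng(3)
n = 200
common_trend = rng.normal(size=n).cumsum()        # shared stochastic trend
x = pd.Series(common_trend + rng.normal(size=n))
y = pd.Series(2.0 + 0.7 * common_trend + rng.normal(size=n))

# Step 1: estimate the long-run (cointegrating) relationship
long_run = sm.OLS(y, sm.add_constant(x)).fit()
residuals = long_run.resid

# Step 2: test the residuals for stationarity (ADF), or use coint() directly
print("ADF p-value on residuals:", adfuller(residuals)[1])
print("Engle-Granger coint p-value:", coint(y, x)[1])
```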

Error Correction Model (ECM)


The Error Correction Model (ECM) is used to correct for disequilibrium between two cointegrated variables by adjusting the short-term deviations back to the long-term equilibrium. This model is particularly useful when studying time series that are cointegrated, as it helps in analyzing both the long-run and short-run dynamics between the variables.

The ECM was introduced by Sargan and later popularized by Engle and
Granger. According to the Granger Representation Theorem, if two variables
are cointegrated, there exists a dynamic relationship that can be expressed
as an ECM.

Basic Concept of ECM


The general form of the ECM is as follows:

$\Delta Y_t = \alpha_0 + \alpha_1 \Delta X_t + \alpha_2 u_{t-1} + \epsilon_t$

Where:

• $\Delta Y_t$ and $\Delta X_t$ are the first differences of the dependent and independent variables, respectively.
• $u_{t-1}$ is the lagged error term (residual) from the cointegration equation.
• $\epsilon_t$ is a white noise error term.
• $\alpha_2$ is the coefficient of the lagged error term, also known as the "error correction term."

Interpretation of the Error Correction Term


The key component of the ECM is the error correction term $u_{t-1}$, which captures the disequilibrium from the previous period. The coefficient $\alpha_2$ indicates the speed of adjustment to the equilibrium:

1. If $\alpha_2$ is negative and the variable $Y_t$ has deviated above the equilibrium (i.e., $u_{t-1}$ is positive), the model corrects the deviation by reducing $Y_t$ in the next period. This restores the equilibrium.
2. If $u_{t-1}$ is negative, $Y_t$ will increase in the next period to correct the disequilibrium.

Thus, the ECM adjusts short-term fluctuations back towards the long-term equilibrium, ensuring that the variables do not diverge indefinitely.

Cointegration and Error Correction in Practice


In practice, the ECM is applied to cointegrated variables to model both short-term and long-term relationships. For example, consider the relationship between LPCE (log of per capita consumption) and LDPI (log of per capita income), which are assumed to be cointegrated. The cointegration equation is:

$u_t = LPCE_t - \beta_1 - \beta_2 LDPI_t - \beta_3 t$

Where $u_t$ is the error term from the cointegration equation, in which LPCE depends on LDPI and a time trend.

The error correction model can then be written as:

$\Delta LPCE_t = \alpha_0 + \alpha_1 \Delta LDPI_t + \alpha_2 u_{t-1} + \epsilon_t$

Here, $\alpha_2$ is the error correction coefficient. If $\alpha_2$ is negative and statistically significant, it indicates that the model is adjusting to correct any disequilibrium from the previous period.
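
A minimal two-step (Engle-Granger) estimation sketch is shown below; the simulated lpce and ldpi series are hypothetical stand-ins for the variables discussed above, and the code is only meant to illustrate the mechanics.

```python
# Illustrative only: step 1 estimates the long-run relation, step 2 the ECM.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 200
trend = np.arange(n, dtype=float)
ldpi = pd.Series(rng.normal(size=n).cumsum() + 0.01 * trend, name="ldpi")
lpce = pd.Series(1.0 + 0.8 * ldpi + rng.normal(scale=0.3, size=n), name="lpce")

# Step 1: long-run (cointegrating) regression with a trend; save the residuals u_t
X1 = sm.add_constant(pd.concat([ldpi, pd.Series(trend, name="t")], axis=1))
u = sm.OLS(lpce, X1).fit().resid

# Step 2: short-run ECM in first differences with the lagged residual
df = pd.concat([lpce.diff().rename("d_lpce"),
                ldpi.diff().rename("d_ldpi"),
                u.shift(1).rename("u_lag")], axis=1).dropna()
ecm = sm.OLS(df["d_lpce"], sm.add_constant(df[["d_ldpi", "u_lag"]])).fit()
print(ecm.params)   # a negative, significant u_lag coefficient signals error correction
```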

Challenges and Limitations


While cointegration and ECM are powerful tools for modeling relationships
between cointegrated variables, there are challenges:

• Small sample performance: The performance of cointegration and ECM tests can be unreliable with small sample sizes.

• Critical values: The critical values for many of these tests are not well-established for a wide range of models.

• Model assumptions: The validity of the ECM depends on assumptions such as linearity and stationarity, and deviations from these assumptions may lead to inaccurate results.

• Data issues: The presence of outliers, structural breaks, or other data problems can affect the performance of the ECM.

Q6: ARCH and GARCH Models


ARCH and GARCH Models

1. ARCH Model (Autoregressive Conditional Heteroscedasticity)


The ARCH model was introduced by Robert Engle in 1982 to address issues of
changing volatility over time in financial markets and economic data.
Traditional time series models assumed constant variance
(homoscedasticity), but in reality, financial data often exhibits periods of high
and low volatility, leading to heteroscedasticity (variance changing over
time).

Basic Concept: The ARCH model assumes that the variance of the error
term at any given time depends on the past values of the error term. In
simple terms, the model suggests that future volatility is influenced by past
shocks (error terms).

Mathematical Representation: The ARCH model can be written as:

$y_t = \mu + \epsilon_t$

Where $y_t$ is the observed value (for example, returns), $\mu$ is the mean (often assumed to be zero), and $\epsilon_t$ is the error term at time $t$.

The error term $\epsilon_t$ is modeled as:

$\epsilon_t = \sigma_t z_t$

Where:

• $\sigma_t$ is the conditional standard deviation (volatility) at time $t$.
• $z_t$ is a white noise error term with zero mean and unit variance.

The conditional variance $\sigma_t^2$ (which represents volatility) is assumed to depend on the past squared error terms:

$\sigma_t^2 = \alpha_0 + \alpha_1 \epsilon_{t-1}^2 + \alpha_2 \epsilon_{t-2}^2 + \cdots + \alpha_q \epsilon_{t-q}^2$

Here:

• $\alpha_0$ is a constant (usually assumed to be positive).
• $\alpha_1, \alpha_2, \ldots, \alpha_q$ are parameters that measure the contribution of past squared error terms (shocks).
• $q$ represents the number of lagged error terms considered in the model.
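
To make the variance recursion concrete, the sketch below simulates an ARCH(1) process and then fits it with the third-party Python `arch` package; the parameter values ($\alpha_0 = 0.1$, $\alpha_1 = 0.6$) are hypothetical.

```python
# Illustrative only: simulate an ARCH(1) process from the variance equation
# above, then estimate it by maximum likelihood with the `arch` package.
import numpy as np
from arch import arch_model

rng = np.random.default_rng(5)
n, alpha0, alpha1 = 1000, 0.1, 0.6            # hypothetical parameter values
eps = np.zeros(n)
sigma2 = np.zeros(n)
sigma2[0] = alpha0 / (1 - alpha1)             # start at the unconditional variance

for t in range(1, n):
    sigma2[t] = alpha0 + alpha1 * eps[t - 1] ** 2       # conditional variance recursion
    eps[t] = np.sqrt(sigma2[t]) * rng.standard_normal()

res = arch_model(eps, mean="Zero", vol="ARCH", p=1).fit(disp="off")
print(res.params)                             # estimates of omega (alpha_0) and alpha[1]
```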

Key Features of the ARCH Model:

1. Time-varying volatility: The model allows the volatility of the error terms to change over time.
2. Dependence on past errors: The volatility at any point in time is determined by the previous errors (shocks).
3. Financial applications: The ARCH model is often used in financial markets to model risk, particularly in stock returns, foreign exchange rates, and commodity prices.

Limitations:

• Parameter explosion: As the number of lags increases, the number of parameters grows, which can make the model computationally expensive and difficult to estimate.

• Short memory: The ARCH model relies heavily on the most recent shocks, which may not capture the full dynamics of volatility in some cases.

2. GARCH Model (Generalized Autoregressive Conditional Heteroscedasticity)

The GARCH model was introduced by Tim Bollerslev in 1986 as an extension
of the ARCH model. The key improvement in GARCH is that it incorporates
both past squared error terms and past values of the conditional variance
(volatility), which makes it more efficient and effective at capturing volatility
clustering (a common phenomenon in financial markets where high volatility
is followed by high volatility and low volatility is followed by low volatility).

Basic Concept: The GARCH model assumes that current volatility is influenced by both past error terms and past volatility, providing a more accurate description of the persistence of volatility over time.

Mathematical Representation: The GARCH(p, q) model can be written as:

$\epsilon_t = \sigma_t z_t$

Where $\sigma_t$ is the conditional standard deviation (volatility) at time $t$, and $z_t$ is a white noise error term with zero mean and unit variance.

The conditional variance $\sigma_t^2$ in the GARCH model is modeled as:

$\sigma_t^2 = \alpha_0 + \sum_{i=1}^{q} \alpha_i \epsilon_{t-i}^2 + \sum_{j=1}^{p} \beta_j \sigma_{t-j}^2$

Where:

• $\alpha_0$ is a constant term (positive).
• $\alpha_i$ are coefficients for past squared error terms (i.e., past shocks).
• $\beta_j$ are coefficients for past conditional variances (volatilities).
• $p$ is the number of lags for past volatilities (conditional variances).
• $q$ is the number of lags for past squared errors (shocks).

In this model, both past shocks and past volatility contribute to the
current volatility. The GARCH model thus allows volatility to exhibit
persistence, meaning that once high volatility is observed, it tends to
continue over time.
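
A compact GARCH(1,1) estimation sketch using the `arch` package is given below; the simulated return series is a stand-in for real data, and the coefficient labels follow that package's naming convention.

```python
# Illustrative only: GARCH(1,1) fit on a simulated return series.
import numpy as np
import pandas as pd
from arch import arch_model

rng = np.random.default_rng(6)
returns = pd.Series(rng.standard_normal(1000))          # stand-in for real returns

model = arch_model(returns, mean="Constant", vol="GARCH", p=1, q=1)
res = model.fit(disp="off")                             # estimated by maximum likelihood
print(res.summary())

# Persistence: alpha[1] + beta[1] close to 1 signals long-lasting volatility clustering
print("persistence:", res.params["alpha[1]"] + res.params["beta[1]"])
```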

Key Features of the GARCH Model:

• Incorporates both past shocks and volatilities: This is the key difference from ARCH, making it a more flexible and powerful model.

• Persistence of volatility: The GARCH model is able to capture long-lasting volatility clustering in financial data.

• More efficient: Because GARCH models include past volatility in their modeling, they are more efficient and require fewer parameters than a high-order ARCH model.

Limitations:

• Stationarity assumption: The GARCH model assumes that the volatility process is stationary over time, which may not always be realistic in the long run.

• Model complexity: While the GARCH model is more efficient than ARCH, it can still become quite complex when there are many lags (high p and q).

• Not suitable for all data types: GARCH models are generally more appropriate for financial data that exhibits volatility clustering but may not perform well with other types of time series data.

Variants of GARCH Models


There are several variants of the basic GARCH model that address specific
aspects of financial data and modeling challenges:

• EGARCH (Exponential GARCH): Introduced by Nelson (1991), the EGARCH model allows for modeling of asymmetric effects in volatility (i.e., bad news may have a different impact on volatility compared to good news).

$\ln(\sigma_t^2) = \alpha_0 + \sum_{i=1}^{q} \alpha_i \frac{\epsilon_{t-i}}{\sigma_{t-i}} + \sum_{j=1}^{p} \beta_j \ln(\sigma_{t-j}^2)$

Because the standardized shock $\epsilon_{t-i}/\sigma_{t-i}$ enters with its sign, this model allows volatility to respond asymmetrically to positive and negative shocks.

• GJR-GARCH (Glosten, Jagannathan, and Runkle GARCH): Similar to EGARCH, it models the asymmetry in volatility due to positive or negative shocks, but in a different form.

• TGARCH (Threshold GARCH): This variant also models asymmetry by introducing a threshold parameter to account for different effects of positive and negative shocks on volatility.
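
As a brief illustration (not from the text), the `arch` package can estimate asymmetric variants: EGARCH via vol="EGARCH", and a GJR-type specification via the asymmetry order o; the return series below is simulated and purely hypothetical.

```python
# Illustrative only: asymmetric GARCH variants with the `arch` package.
import numpy as np
import pandas as pd
from arch import arch_model

rng = np.random.default_rng(7)
returns = pd.Series(rng.standard_normal(1000))          # hypothetical return series

egarch_res = arch_model(returns, vol="EGARCH", p=1, o=1, q=1).fit(disp="off")
gjr_res = arch_model(returns, vol="GARCH", p=1, o=1, q=1).fit(disp="off")   # GJR-type

# The o-term (gamma) coefficients capture the asymmetric response to negative shocks
print(egarch_res.params)
print(gjr_res.params)
```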

Estimation and Applications


Estimation: The parameters of both ARCH and GARCH models are typically estimated using Maximum Likelihood Estimation (MLE), which finds the parameter values that maximize the likelihood of observing the given data.

MLE is preferred because, under a correctly specified model, it yields consistent and asymptotically efficient estimates of the model parameters.

Applications:

• Financial Markets: The most common application of ARCH and GARCH models is in financial econometrics, where they are used to model and forecast volatility in asset returns (stocks, bonds, exchange rates, etc.). These models help in risk management, asset pricing, and portfolio optimization.

• Risk Management: Volatility forecasting is crucial for determining the potential risk associated with assets and making informed decisions about hedging or trading strategies.

• Option Pricing: The GARCH model can be used to estimate the volatility needed for pricing options and other derivative instruments using models like the Black-Scholes model.
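
A short volatility-forecasting sketch ties these applications together: fit a GARCH(1,1) by MLE and produce a multi-step variance forecast that could feed a risk measure or an option-pricing input; the data below are simulated and purely illustrative.

```python
# Illustrative only: multi-step volatility forecast from a fitted GARCH(1,1).
import numpy as np
import pandas as pd
from arch import arch_model

rng = np.random.default_rng(8)
returns = pd.Series(rng.standard_normal(1000))          # hypothetical daily returns

res = arch_model(returns, vol="GARCH", p=1, q=1).fit(disp="off")

forecast = res.forecast(horizon=5)                      # 5-step-ahead variance forecasts
predicted_vol = np.sqrt(forecast.variance.iloc[-1])     # convert variances to volatilities
print(predicted_vol)
```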
