Time Series and Panel Data Econometrics
Contents
Logit Model
Fixed-effects model
Random-effects model
Identification problem
Mathematical Form
Y = f(X, β) + ε
Where:
1. Y : Dependent variable
2. X : Independent variable(s)
3. β : Parameters to be estimated
4. ϵ : Error term
Examples
i) Exponential Model:
Y = β₀e^(β₁X) + ε
v) Power Model:
Y = β₀X^(β₁) + ε
Example:
Y = β₀ + β₁X + ε
Here, the model is linear because the parameter β₁ is not raised to a power
or multiplied by another parameter.
Example:
Y = β₀e^(β₁X) + ε
Here, the model is nonlinear in the parameters because β₁ appears in an exponent.
ii) Initial Guess: Guess initial values for the parameters (like
α (alpha), β (beta), and γ (gamma)). These guesses can be random or
based on prior knowledge.
iii) Estimate Values: Plug the guessed values into the model to
estimate Y.
iv) Calculate the Error: Compare the estimated Y with the actual
observed values of Y. The error is the difference between the
predicted and the actual value.
v) Adjust the Guesses: Update the parameter values in the direction that
reduces the error.
vi) Repeat: Repeat the process until the error is minimal or you reach
an acceptable level of accuracy.
Example:
Let's say we have a simple nonlinear model like:
Y = α + βX² + γX
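As an illustration of these steps, here is a minimal Python sketch using SciPy's curve_fit, which performs the guess/estimate/error/repeat loop internally. The data and starting values are invented for illustration (and note this particular model happens to be linear in the parameters, though the same routine handles genuinely nonlinear ones):

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, alpha, beta, gamma):
    # Y = alpha + beta*X^2 + gamma*X
    return alpha + beta * x**2 + gamma * x

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = model(x, 2.0, 0.5, -1.0) + rng.normal(0, 1, size=x.size)

# p0 is the initial guess (step ii); curve_fit then iterates the
# estimate / error / adjust / repeat steps internally
params, cov = curve_fit(model, x, y, p0=[1.0, 1.0, 1.0])
print("alpha, beta, gamma:", params)
```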
The Taylor Expansion Series is a way to break a complex function into a sum of simpler
terms, using the function's value and its derivatives at a specific point. It helps
approximate the function around that point, making it easier to understand or calculate.
Mathematically:
f(x) = f(a) + f′(a)(x − a) + (f″(a)/2!)(x − a)² + (f‴(a)/3!)(x − a)³ + …
Where:
a is the point around which the function is expanded, and f′(a), f″(a), f‴(a), … are the derivatives of f evaluated at that point.
Convergence:
Near the expansion point, adding more terms generally brings the approximation closer to the true function.
Limitations:
2. Convergence: It might not always give a correct result, even with infinite terms.
4. Many Terms: More terms are needed for better accuracy, which can be complex.
5. Non-Analytic Functions: It doesn’t work well for functions with sharp turns or
discontinuities.
Example: expand f(x) = 3x³ + 8x² + x − 2 around the point where x − 3 = 0, so a = 3.

First find f(a):
f(3) = 3(3)³ + 8(3)² + 3 − 2 = 81 + 72 + 3 − 2 = 154

Now find f′(a):
f′(x) = d/dx (3x³ + 8x² + x − 2) = 9x² + 16x + 1
f′(3) = 81 + 48 + 1 = 130

f″(x) = d/dx (9x² + 16x + 1) = 18x + 16
f″(3) = 18(3) + 16 = 54 + 16 = 70

f‴(x) = d/dx (18x + 16) = 18

So the expansion is:
f(x) = 154 + 130(x − 3) + (70/2!)(x − 3)² + (18/3!)(x − 3)³
     = 154 + 130(x − 3) + 35(x − 3)² + 3(x − 3)³
Now consider the nonlinear regression model:
Y = αe^(βx) + eᵢ
Where
Y = Dependent Variable
x = Independent Variable
α & β are Parameters
e = Error term
Let α₀ and β₀ be initial values for α and β.
These values are not necessarily equal to the true values, but they are close enough for the approximation.
We approximate f by a first-order Taylor expansion around (α₀, β₀):

f(x, α, β) ≈ f(x, α₀, β₀) + (∂f/∂α)|(α₀, β₀) · (α − α₀) + (∂f/∂β)|(α₀, β₀) · (β − β₀)   … (Equation A)
The partial derivatives evaluated at (α₀, β₀) are:
For α => (∂f/∂α)|(α₀, β₀) = e^(β₀x)
For β => (∂f/∂β)|(α₀, β₀) = α₀xe^(β₀x)
Let
Y* = Y − f(x, α₀, β₀)
∆α = α − α₀
∆β = β − β₀
Jα = e^(β₀x)
Jβ = α₀xe^(β₀x)
Put these in Equation A. We get:
Y* = Jα∆α + Jβ∆β
which is linear in ∆α and ∆β.
Estimate Parameters: We can now apply ordinary least squares (OLS) or another linear
regression technique to estimate ∆α and ∆β, and hence update α and β. If the error is
not yet minimal, the procedure is repeated with the updated values until it converges.
Recalculate the Model: Once you have the estimates of 𝛼 & 𝛽, plug them back into the
nonlinear equation to get the final estimated model.
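A rough Python sketch of this linearization loop for Y = αe^(βx), with simulated data and illustrative starting values (not any prescribed dataset); the loop repeats the Equation A step until the updates ∆α, ∆β become negligible:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 2, 40)
y = 1.5 * np.exp(0.8 * x) + rng.normal(0, 0.1, size=x.size)

alpha, beta = 1.0, 0.5                     # initial guesses (alpha_0, beta_0)
for _ in range(20):
    f = alpha * np.exp(beta * x)           # f(x, alpha_0, beta_0)
    J_alpha = np.exp(beta * x)             # df/d alpha
    J_beta = alpha * x * np.exp(beta * x)  # df/d beta
    y_star = y - f                         # Y* = Y - f(x, alpha_0, beta_0)
    J = np.column_stack([J_alpha, J_beta])
    # OLS on the linearized model Y* = J_alpha*d_alpha + J_beta*d_beta
    delta, *_ = np.linalg.lstsq(J, y_star, rcond=None)
    alpha, beta = alpha + delta[0], beta + delta[1]
    if np.max(np.abs(delta)) < 1e-8:       # stop when the update is negligible
        break

print("alpha =", round(alpha, 4), "beta =", round(beta, 4))
```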
A Qualitative Response Model is used when the dependent variable is not numerical but
categorical. For example, it can represent choices like "yes" or "no," or outcomes like
"employed" or "unemployed." These models help to analyze decisions or situations where the
outcome is not continuous but qualitative.
Mathematically:
Y = α + βX + μᵢ
Y = 1, having a car
Y = 0, having no car
Yᵢ = β₁ + β₂Xᵢ + μᵢ
Here:
Mean: E(Yᵢ | Xᵢ) = Pᵢ
Variance: Pᵢ(1 − Pᵢ)
Challenges in LPM
Non-Normality of Disturbances (μi)
The error term μi follows a Bernoulli distribution, not a normal distribution.
This affects statistical inference in small samples but does not make the OLS
estimates biased.
Heteroscedasticity
The variance of μi depends on Xi, meaning it is not constant:
Var(μi) = Pi(1−Pi)
Out-of-Bounds Predictions
E(Yᵢ | Xᵢ) must lie between 0 and 1, but LPM does not guarantee this. Predicted
probabilities Ŷᵢ can be less than 0 or greater than 1.
A common remedy for the heteroscedasticity is weighted least squares with
weights ωᵢ = Pᵢ(1 − Pᵢ). For the bounds issue, models like Logit and Probit
are used, which ensure probabilities stay between 0 and 1.
Linear Relationship
The model assumes a linear relationship between the independent
variable X and the probability of Y = 1.
Conditional Probability
The conditional expectation E(Yᵢ | Xᵢ) is interpreted as the conditional
probability that Yᵢ = 1 given Xᵢ.
Bernoulli Distribution
The dependent variable follows a Bernoulli distribution, with mean Pi
and variance Pi(1−Pi).
Interpretation of Coefficients
The slope coefficient β2 shows the change in probability for a one-unit
change in X.
Advantages of LPM
Simplicity
The model is easy to understand and apply, requiring only basic
regression techniques.
Direct Interpretation
The coefficients directly represent the change in probability, making
them easy to interpret.
Limitations of LPM
Predicted Probabilities Out of Bounds
LPM does not guarantee that predicted probabilities will lie between 0
and 1.
Heteroscedasticity
The variance of the error term is not constant, leading to inefficient
OLS estimates.
Non-Normality of Errors
The error term follows a Bernoulli distribution, which complicates
statistical inference in small samples.
Linear Assumption
Probabilities are not necessarily linear in relation to the independent
variables, which can lead to inaccurate results.
Policy Evaluation
Governments and organizations use LPM to evaluate the impact of policies.
Market Research
Businesses use LPM to understand customer behavior.
Employment Studies
LPM is used to analyze labor market issues.
The Logit Model is a regression model used to estimate the probability of a binary outcome
(e.g., success or failure, yes or no) based on one or more independent variables. It models
the log of the odds of the outcome as a linear function of the independent variables.
Pᵢ = 1 / (1 + e^(−Zᵢ))
Zᵢ = β₁ + β₂Xᵢ
Where:
1. Probability Representation:
2. Odds Ratio:
The odds ratio, which measures the odds of an event occurring, is:
Pᵢ / (1 − Pᵢ) = e^(Zᵢ)
3. Logit Transformation:
Taking the natural log of the odds ratio gives the logit:
Lᵢ = ln(Pᵢ / (1 − Pᵢ)) = Zᵢ = β₁ + β₂Xᵢ
The logit is linear in X and in the parameters β₁ and β₂.
4. Nonlinear Relationship:
The probabilities (Pi) are nonlinear in X, unlike the Linear Probability Model (LPM).
1. Boundaries of Probability:
2. Multiple Regressors:
3. Interpretation of Parameters:
β2: Shows the change in the log-odds (Li) for a one-unit change in X. For example, it
indicates how the odds of owning a house change with income.
β1: Represents the log-odds when X = 0.
5. Estimation of Probabilities:
While LPM assumes a linear relationship between Pᵢ and X, the Logit Model assumes a
linear relationship between the log-odds and X:
Lᵢ = ln(Pᵢ / (1 − Pᵢ)) = Zᵢ = β₁ + β₂Xᵢ + uᵢ
where:
Individual-Level Data:
At the individual level, Yᵢ is either 0 or 1, so Pᵢ is 0 or 1 and the logit
ln(Pᵢ / (1 − Pᵢ)) is undefined; maximum likelihood estimation is used instead.
With grouped data, the empirical probability for group i is
P̂ᵢ = nᵢ / Nᵢ
(the number of "successes" nᵢ out of the Nᵢ observations in the group), and the
estimated logit is
L̂ᵢ = ln(P̂ᵢ / (1 − P̂ᵢ))
Heteroscedasticity:
The error term in the Logit Model is heteroscedastic, meaning the variance of
the error term changes with the level of Pi. This requires the use of weights in
the estimation process to correct for this.
Where:
the weights are wᵢ = NᵢP̂ᵢ(1 − P̂ᵢ), with Nᵢ the number of observations in group i.
OLS Estimation:
After transforming the data, we apply OLS (Ordinary Least Squares) to
estimate the parameters β₁ and β₂.
Hypothesis Testing:
Calculation:
The model works with the log-odds formula:
Lᵢ = ln(Pᵢ / (1 − Pᵢ))
Pi: Probability of owning a house in a group.
(1−Pi): Probability of not owning a house.
Example:
Suppose we want to study how income affects the probability of adopting
solar panels. The data is grouped by income levels (e.g., families earning
$10,000–$20,000, $20,000–$30,000, etc.). For each group, we know:
Lᵢ = ln(Pᵢ / (1 − Pᵢ))
For example, if Pᵢ = 0.20:
Lᵢ = ln(0.20 / (1 − 0.20)) = ln(0.25) = −1.3863
Step 2: Regression Equation
The grouped logit regression model is:
L̂ᵢ = β₁ + β₂Xᵢ
Where:
After performing weighted regression, the equation might look like this:
L̂ᵢ = −2.0 + 0.05Xᵢ
Slope (0.05)
For every $1,000 increase in income, the log-odds of adopting solar
panels increase by 0.05.
e^(0.05) ≈ 1.051
This means the odds of adopting solar panels increase by 5.1% for every
$1,000 increase in income.
Probability Example
To calculate the probability of adopting solar panels for a specific income
level, use:
Pᵢ = e^(Lᵢ) / (1 + e^(Lᵢ))
For an income of $30,000 (X = 30):
Lᵢ = −2.0 + 0.05(30) = −0.5
Pᵢ = e^(−0.5) / (1 + e^(−0.5)) = 0.3775
The marginal effect of X on the probability is:
ΔP = β₂ · P · (1 − P)
At $30,000 income:
ΔP = 0.05 × 0.3775 × (1 − 0.3775) ≈ 0.0117
This means the probability increases by about 1.17 percentage points for every
$1,000 increase in income.
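A small Python check of the solar-panel numbers above (the coefficients −2.0 and 0.05 are taken from the worked example):

```python
import math

b1, b2, x = -2.0, 0.05, 30
L = b1 + b2 * x                       # log-odds: -0.5
P = math.exp(L) / (1 + math.exp(L))   # probability: about 0.3775
dP = b2 * P * (1 - P)                 # marginal effect: about 0.0117

print(f"L = {L:.2f}, P = {P:.4f}, dP = {dP:.4f}")
```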
Lᵢ = ln(Pᵢ / (1 − Pᵢ)) = β₁ + β₂(GPA) + β₃(TUCE) + β₄(PSI) + μᵢ
Where:
Interpretation of Coefficients:
Each coefficient shows the effect of a variable on the log odds of the
outcome (getting an A).
For example, if the coefficient for GPA is positive, it means that as GPA
increases, the odds of getting an A increase.
The coefficients are usually interpreted in terms of odds ratios. For
example, if the coefficient for PSI is 2.3786, then e^(2.3786) ≈ 10.8, so
the odds of getting an A are about 10 times higher for students using the
new teaching method (PSI = 1) compared to those who are not (PSI = 0).
Goodness of Fit:
McFadden R²: This is a measure of how well the model fits the data,
but it is different from the usual R² in linear regression. Higher values
indicate a better fit.
Count R²: This is the proportion of correct predictions made by the
model. It shows how accurate the model is in classifying students'
grades as A or not A.
Example:
• The coefficient for GPA is 2.8261, meaning a higher GPA increases the
likelihood of getting an A.
• The coefficient for PSI (new teaching method) is 2.3786, meaning students
in the PSI group are about 10 times more likely to get an A compared to
those who are not in the PSI group.
Likelihood Function:
A likelihood function represents the probability of observing the sample
data given a specific set of parameters in a statistical model. It is used to
estimate the parameters that make the observed data most probable. In
other words, the likelihood function helps us to find the values of the model's
parameters that best fit the observed data.
Where:
P(Y = 1 | X) is the probability that the event occurs, modeled with the logistic function:
P(Y = 1 | X) = 1 / (1 + e^(−(β₀ + β₁X₁ + β₂X₂ + … + βₖXₖ)))
Where:
The likelihood function for the logit model is the joint probability of observing
all the n data points, assuming that each observation Yi is independent. The
likelihood function is:
L(β | X, Y) = Πᵢ₌₁ⁿ [P(Yᵢ = 1 | Xᵢ)]^(Yᵢ) [1 − P(Yᵢ = 1 | Xᵢ)]^(1−Yᵢ)
Where:
P(Y=1∣X) is the probability that the event occurs (e.g., a family owns a
house).
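A minimal Python sketch of maximizing this likelihood numerically (in log form, as is done in practice); the data are simulated and all names are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(200), rng.normal(size=200)])  # intercept + one regressor
true_beta = np.array([-0.5, 1.0])
p = 1 / (1 + np.exp(-X @ true_beta))
y = rng.binomial(1, p)

def neg_log_likelihood(beta):
    p = 1 / (1 + np.exp(-X @ beta))          # P(Y=1 | X)
    # ln L = sum[ y*ln(p) + (1-y)*ln(1-p) ]; minimize the negative
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

res = minimize(neg_log_likelihood, x0=np.zeros(2))
print("MLE estimates:", res.x)
```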
The grouped data refers to situations where we have data that is aggregated
into categories, such as income ranges, and for each category, we observe
the proportion of people who own a house. The empirical probabilities Pᵢ for
each group can be used to estimate the latent variable Iᵢ.
This method requires calculating the inverse of the normal CDF (i.e., using
the standard normal distribution to reverse the transformation) to obtain the
latent variable values. The model is then estimated by standard regression
methods, taking into account the relationship between the explanatory
variables and the latent variable.
Model Structure:
In the Probit model, the probability of an event occurring is modeled
using the cumulative distribution function (CDF) of the normal
distribution.
The model is written as: P(Y = 1 | X) = Φ(Xβ)
Where:
Estimation Method:
L(β) = Πᵢ₌₁ⁿ [Φ(Xᵢβ)]^(Yᵢ) [1 − Φ(Xᵢβ)]^(1−Yᵢ)
Interpretation:
When applying the Probit model to time series data, you often
model the likelihood of a specific event occurring at each time period,
considering time-dependent variables.
In panel data, the Probit model can account for individual-specific
effects by using fixed or random effects to capture unobserved
heterogeneity.
The Probit Model is used when the dependent variable (Y) is binary, taking values like 0 or 1. It
assumes that the probability of Y = 1 is determined by a cumulative normal distribution
function.
Likelihood Function:
The likelihood function for n observations is:
L(β) = Πᵢ₌₁ⁿ [Φ(Xᵢβ)]^(Yᵢ) [1 − Φ(Xᵢβ)]^(1−Yᵢ)
Log-Likelihood Function:
Taking the log of the likelihood function to simplify calculations:
ln L(β) = Σᵢ₌₁ⁿ [Yᵢ ln Φ(Xᵢβ) + (1 − Yᵢ) ln(1 − Φ(Xᵢβ))]
Maximization:
The parameters (β) are estimated by maximizing the log-likelihood function using
numerical optimization techniques (e.g., Newton-Raphson).
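A brief sketch using statsmodels' Probit, which maximizes this log-likelihood with Newton-type iterations; the data are simulated for illustration:

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(3)
x = rng.normal(size=300)
X = sm.add_constant(x)
y = rng.binomial(1, norm.cdf(-0.3 + 0.9 * x))   # P(Y=1|X) = Phi(Xb)

res = sm.Probit(y, X).fit(disp=0)   # Newton-type maximization of ln L
print(res.params)                   # estimates of beta
```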
Interpretation of Coefficients
Coefficients in the logit and probit models cannot be directly compared
because the distributions have different variances. To compare coefficients,
you can multiply the probit coefficient by approximately 1.81 to get the logit
coefficient or multiply the logit coefficient by 0.55 to get the probit
coefficient.
In Heckman's two-step procedure, the first step estimates the probability of
censoring (using a probit model), and the second step adjusts the regression
for this probability.
Key Takeaways
The Tobit model is useful for censored data, where the dependent variable is
only partially observed. Maximum likelihood estimation is typically used to
estimate the parameters of the Tobit model, providing more accurate results
than OLS in such cases.
Reduced Multicollinearity
By observing multiple entities over time, panel data often reduces the
issue of multicollinearity that can arise in cross-sectional or time-series
data alone.
Panel data allows distinguishing between factors that change over time
and those that remain constant, improving the understanding of their
respective impacts.
In Pooled OLS, all the observations (both cross-sectional and time-series) are
combined into one dataset. The regression model is then estimated using the
usual Ordinary Least Squares (OLS) method.
Formula:
Yᵢₜ = α + βXᵢₜ + εᵢₜ
Where
This method assumes that the relationship between the variables is the
same for all individuals and across all time periods.
Formula:
Yᵢₜ = α + βXᵢₜ + μᵢ + εᵢₜ
Where:
The Hausman test is used to decide between Fixed and Random Effects.
The null hypothesis of the test suggests that the Random Effects model is
appropriate (i.e., the individual effects are uncorrelated with the regressors).
If the Hausman test shows significant differences between the Fixed and
Random Effects estimates, the Fixed Effects model should be used.
Key Points:
Same Intercept for All: All entities are assumed to have the
same intercept, denoted by α. This simplifies the model by treating all
entities as if they start from the same baseline value.
Yᵢₜ = α + βXᵢₜ + εᵢₜ
Where:
the intercept α and the slope β are assumed to be the
same for each entity, and the only difference arises from the error term
εᵢₜ.
Use: The common intercept method is used when the entities are
expected to have similar characteristics and behaviors. For example, if
studying the inflation rates of different countries, it can be assumed
that all countries behave similarly in terms of inflation, leading to a
common intercept.
Limitations:
If entities differ systematically (for example, through unobserved individual
characteristics), forcing a common intercept biases the estimates.
Fixed-effects Model
The fixed-effects model is used in panel data analysis when you are
interested in analyzing the impact of variables that vary over time but might
have individual differences that do not change across time (e.g., individual
characteristics of entities like people, firms, or countries).
Key Features:
Individual-Specific Effects: It assumes that each entity (e.g.,
individual, firm, or country) has its own unique characteristics that
might affect the dependent variable but do not vary over time.
The model can be written as:
Yᵢₜ = αᵢ + βXᵢₜ + εᵢₜ
Where αᵢ is the individual-specific intercept for entity i.
Disadvantages:
It cannot estimate the effects of variables that do not vary over time
within entities.
Key Concepts:
Yᵢₜ = α + βXᵢₜ + γᵢ + εᵢₜ
Where:
Advantages of LSDV:
Disadvantages:
The model can become very large if there are many individuals, as it requires
a dummy variable for each individual.
The approach may suffer from the "dummy variable trap" (perfect
multicollinearity) if an intercept is included in the model.
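A minimal LSDV sketch in Python: entity dummies are added through the formula interface, with one level dropped so the intercept avoids the dummy variable trap noted above. The panel is simulated and the names are illustrative:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n_entities, n_periods = 5, 10
entity = np.repeat(np.arange(n_entities), n_periods)
x = rng.normal(size=n_entities * n_periods)
gamma = rng.normal(size=n_entities)               # individual effects gamma_i
y = 1.0 + 2.0 * x + gamma[entity] + rng.normal(size=x.size)
df = pd.DataFrame({"entity": entity, "x": x, "y": y})

# C(entity) creates one dummy per entity, dropping one level so the
# intercept does not cause perfect multicollinearity
res = smf.ols("y ~ x + C(entity)", data=df).fit()
print(res.params["x"])    # estimate of beta, close to 2.0
```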
Assumptions:
Yᵢₜ = α + βXᵢₜ + μᵢ + εᵢₜ
Where:
Advantages:
Disadvantages:
The Hausman Test is often used to determine which model to use. If the
test indicates that the individual effects are correlated with the independent
variables, the fixed-effects model is preferred. Otherwise, the random-effects
model is a better choice.
GLS transforms the original model to make the errors homoscedastic (with
constant variance) and uncorrelated by multiplying the model by a weighting
matrix. This matrix is typically the inverse of the error covariance matrix.
Estimation:
After transformation, GLS applies the OLS method to the modified model to
get efficient and unbiased estimates.
Steps in GLS:
Model Specification:
y= Xβ+ μ
Multiply both sides of the model by a matrix W (such that W = Σ^(−1/2), the inverse
of the square root of Σ) to transform the model into one with uncorrelated,
homoscedastic errors:
Wy = WXβ + Wμ
Now, we apply OLS to the transformed model: β̂_GLS = (X′W′WX)^(−1) X′W′Wy
Interpretation:
The resulting β̂_GLS is the efficient estimator of the coefficients, as it takes into
account the heteroscedasticity or autocorrelation in the error terms.
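A bare-bones numpy sketch of this transformation, assuming the error variances are known (a simplifying assumption; in practice Σ must be estimated):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
sigma2 = np.exp(X[:, 1])              # known (assumed) error variances
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n) * np.sqrt(sigma2)

W = np.diag(1.0 / np.sqrt(sigma2))    # W = Sigma^(-1/2) for diagonal Sigma
Xw, yw = W @ X, W @ y                 # transformed model: Wy = WXb + Wu
beta_gls, *_ = np.linalg.lstsq(Xw, yw, rcond=None)  # OLS on transformed data
print(beta_gls)
```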
Advantages of GLS:
More efficient than OLS when the error terms are heteroscedastic or
autocorrelated.
Disadvantages of GLS:
Fixed-effects model: treats the individual-specific effects as parameters to be
estimated, which may be correlated with the regressors.
Random-effects model: treats the individual-specific effects as random variables
assumed to be uncorrelated with the regressors.
Key differences: the fixed-effects estimator remains consistent when the effects
are correlated with the regressors but cannot estimate time-invariant variables;
the random-effects estimator is more efficient, but only if the no-correlation
assumption holds.
Demand function:
Qdₜ = α₀ + α₁Pₜ + α₂Iₜ + α₃Rₜ + μ₁ₜ
Supply function:
Qsₜ = β₀ + β₁Pₜ + μ₂ₜ
Where:
P = Price
Q = Quantity
I = Income
R = Wealth
μ = Error terms
Reduced-Form Equations:
Pₜ = γ₀ + γ₁Iₜ + γ₂Rₜ + vₜ
Qₜ = δ₃ + δ₄Iₜ + δ₅Rₜ + wₜ
The estimated P̂ₜ and the residuals v̂ₜ will be used in the next step:
Qₜ = β₀ + β₁P̂ₜ + β₂v̂ₜ + μ₂ₜ
Where:
Qd : Quantity demanded
P: Price
Y: Income
μ1: Error term
Supply Equation:
Qs = β₀ + β₁P + μ₂
Where:
Qs : Quantity supplied
P: Price
μ2: Error term
Purpose: Used when the system of equations is exactly identified (only one
unique solution exists for each equation).
Process:
Key Characteristics:
Interdependence:
The dependent variable in one equation may appear as an explanatory
(independent) variable in another equation. For example, in a supply and
demand model:
Endogeneity:
Endogeneity arises because the dependent variable is influenced by other
variables within the system, which are also determined by the model. This
makes the usual estimation techniques, like Ordinary Least Squares (OLS),
biased and inconsistent.
System of Equations:
A simultaneous system consists of multiple equations, each representing a
specific relationship. These equations are solved together since the variables
are interrelated. For example:
Demand: Qd = a − bP
Supply: Qs = c + dP
Identification Problem:
In simultaneous systems, not all equations can be estimated directly
because the variables are jointly determined. To estimate a particular
equation, it must be identified—either exactly identified or overidentified—
based on restrictions (e.g., exclusion of certain variables or structural
assumptions).
Equilibrium Relationships:
These systems often model equilibrium in markets, where the demand and
supply equations interact to determine price and quantity.
In this model, the quantity supplied and quantity demanded are determined
simultaneously by the price level. The two equations could be:
Demand: Qd = α₀ + α₁P + μ₁
Supply: Qs = β₀ + β₁P + μ₂
Here, P is the price, and Qd and Qs are the quantity demanded and
supplied, respectively. The market equilibrium occurs when Qd =
Qs, and the price is determined simultaneously by both demand and
supply.
2. IS-LM Model:
This model represents the equilibrium in the goods market and the money
market. The IS curve shows the relationship between interest rates and
output in the goods market, and the LM curve shows the relationship
between interest rates and output in the money market.
IS Equation: Y =C (Y −T )+ I (r )+ G
LM Equation: M / P=L(Y , r )
3. Phillips Curve:
The Phillips curve expresses the inverse relationship between inflation and
unemployment, often represented as:
Inflation Equation: πₜ = πₜᵉ − β(Uₜ − Uₙ)
Unemployment Equation: Uₜ = Uₙ − γ(πₜ − πₜᵉ)
The money demand equation reflects how the demand for money depends
on income and interest rates, while the money supply is determined by
central banks.
Consider the simple Keynesian model:
Consumption function: Cₜ = β₀ + β₁Yₜ + μₜ
Income identity: Yₜ = Cₜ + Iₜ
Where:
Yₜ is income,
Cₜ is consumption,
Iₜ is an exogenous variable (investment),
Since the covariance is non-zero, it shows that Y t and μt are correlated. This
violates the assumption of OLS that the error term is uncorrelated with the
explanatory variables.
β̂₁ = Σ(Cₜ − C̄)(Yₜ − Ȳ) / Σ(Yₜ − Ȳ)²
Substituting Cₜ = β₀ + β₁Yₜ + μₜ:
β̂₁ = Σ(β₀ + β₁Yₜ + μₜ)Yₜ / ΣYₜ²
Taking expectations:
E(β̂₁) = β₁ + E(Yₜμₜ) / E(Yₜ²)
Since Yₜ and μₜ are correlated, the second term does not equal zero.
Therefore, β̂₁ is biased.
Since the second term is positive, β̂₁ will always overestimate β₁, and the
bias will not disappear even as the sample size increases. Therefore, β̂₁ is an
inconsistent estimator.
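A quick simulation illustrating this result: OLS applied to the consumption function overestimates β₁ because Yₜ and μₜ are correlated (all parameter values here are invented):

```python
import numpy as np

rng = np.random.default_rng(6)
b0, b1 = 10.0, 0.8
I = rng.uniform(50, 100, size=10_000)       # exogenous investment
mu = rng.normal(0, 5, size=I.size)
Y = (b0 + I + mu) / (1 - b1)                # reduced form: income depends on mu
C = b0 + b1 * Y + mu                        # consumption function

# OLS slope: sum (C - Cbar)(Y - Ybar) / sum (Y - Ybar)^2
b1_hat = np.sum((C - C.mean()) * (Y - Y.mean())) / np.sum((Y - Y.mean())**2)
print("true beta_1 = 0.8, OLS estimate =", round(b1_hat, 3))  # biased upward
```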
Q5: Identification Problem
Identification Problem
The Identification Problem refers to the challenge in determining whether
numerical estimates of parameters in a structural equation can be derived
from the reduced-form coefficients. In simpler terms, it asks whether we can
figure out the underlying causes (structural parameters) from the observed
data (reduced-form equations).
Mathematical Explanation:
Let's assume we have a system of equations with k unknown parameters
(denoted by θ₁, θ₂, ..., θₖ) and n independent equations. The model is
unidentified if the number of equations n is less than the number of unknown
parameters k (i.e., n < k).
Demand Equation:
Qd = α₀ + α₁Pₜ + μ₁ₜ
Supply Equation:
Qs = β₀ + β₁Pₜ + μ₂ₜ
Here, we have two equations with four unknowns: α0, α1, β0, and β1.
We cannot uniquely solve for the parameters because we only have two
equations.
Thus, with only two equations and four unknowns, the model is unidentified.
Mathematical Explanation:
For a model to be exactly identified, the number of unknown parameters k
must be equal to the number of independent equations n. Mathematically,
this condition can be expressed as:
n=k
For example, if the demand equation includes income (Qd = α₀ + α₁Pₜ + α₂Iₜ + μ₁ₜ)
while the supply equation remains Qs = β₀ + β₁Pₜ + μ₂ₜ, the supply equation can be
recovered uniquely from the reduced-form coefficients. The model is exactly
identified because we have exactly the same number of equations as unknowns.
3. Overidentified Model
An overidentified model occurs when there are more independent equations
than unknown parameters. This allows for more data to test the validity of
the model and gives extra information to help estimate the parameters.
Overidentification provides extra equations that help improve the reliability
of parameter estimates.
Mathematical Explanation:
For a model to be overidentified, the number of independent equations n
must be greater than the number of unknown parameters k. Mathematically,
this condition can be expressed as:
n> k
Demand Equation:
Qd = α₀ + α₁Pₜ + α₂Iₜ + μ₁ₜ
Supply Equation with Lagged Price:
Qs = β₀ + β₁Pₜ + β₂Pₜ₋₁ + μ₂ₜ
Here, we have:
Order Condition:
In a model of M simultaneous equations, an equation is identified only if it
excludes at least M − 1 of the variables (endogenous and predetermined)
appearing in the model.
Example:
For a simple system with two equations (like the demand and supply
functions), the order condition helps us check whether each equation can be
estimated. If an equation excludes enough variables (at least M - 1), it can be
identified.
Rank Condition:
In a model of M equations, an equation is identified if and only if at least one
non-zero determinant of order (M − 1) × (M − 1) can be constructed from the
coefficients of the variables excluded from that equation but included in the
other equations.
How to Apply:
Write the system in tabular form, where you track the coefficients of
the endogenous and predetermined variables.
Remove the coefficients of the row for the equation under
consideration.
Remove the columns of the variables that are in that equation.
Form matrices with the remaining coefficients and check if at least one
non-zero determinant can be found.
If the determinant is non-zero, the equation is identified. If all
determinants are zero, the equation is unidentified.
Estimation Methods:
There are two main types of estimation methods for simultaneous equations:
least squares and maximum likelihood. Each of these categories has specific
approaches:
4. K-Class Estimators:
Pₜ = γ₀ + γ₁Xₜ + wₜ
Qₜ = δ₂ + δ₃Xₜ + vₜ
The reduced-form coefficients are estimated by OLS:
γ̂₁ = ΣXₜPₜ / ΣXₜ²
γ̂₀ = P̄ − γ̂₁X̄
δ̂₃ = ΣXₜQₜ / ΣXₜ²
δ̂₂ = Q̄ − δ̂₃X̄
Once the reduced-form coefficients are estimated, you can derive the
structural coefficients (like α0,α1,β0,β1) using the relationships between the
reduced-form and structural coefficients.
For example, for the supply function, the structural coefficients can be
estimated as:
β̂₀ = δ̂₂ − β̂₁γ̂₀
β̂₁ = δ̂₃ / γ̂₁
Example Problem:
Consider the following:
P̂ₜ = 90.9601 + 0.0007Xₜ
Q̂ₜ = 59.7618 + 0.0020Xₜ
Using
β̂₀ = δ̂₂ − β̂₁γ̂₀
β̂₁ = δ̂₃ / γ̂₁
we get
β̂₀ = −183.7043
β̂₁ = 2.6766
• First Stage
Replace the endogenous variable with its predicted values by regressing it on
the instrumental variables and other exogenous variables. The instrumental
variables must be:
• Second Stage
Use the predicted values X̂ from the first stage in place of the original
endogenous variable in the main equation. Estimate the model:
Y = β₀ + β₁X̂ + μ
This gives consistent estimates of β₁.
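A hand-rolled sketch of the two stages in Python, with one endogenous regressor and one instrument; the data are simulated. Note that the standard errors printed by this manual second stage are not the correct 2SLS standard errors, which is why dedicated 2SLS routines are preferred in practice:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 1_000
z = rng.normal(size=n)                       # instrument
u = rng.normal(size=n)                       # structural error
x = 0.8 * z + 0.5 * u + rng.normal(size=n)   # endogenous: correlated with u
y = 1.0 + 2.0 * x + u

# First stage: regress X on Z, keep the fitted values
x_hat = sm.OLS(x, sm.add_constant(z)).fit().fittedvalues
# Second stage: regress Y on the fitted X
res = sm.OLS(y, sm.add_constant(x_hat)).fit()
print(res.params)    # slope should be close to 2.0
```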
Assumptions of 2SLS
Advantages of 2SLS
Limitations of 2SLS
For example:
In studying the effect of education (X) on income (Y), years of schooling
might be endogenous. Distance to the nearest school (Z) can be used as an
instrument.
Steps in Using IV
Advantages of IV
Disadvantages of IV
Error Terms Correlation: The key feature is that the error terms
across the different equations are assumed to be correlated. Even
though the equations are separate, the errors are linked in some way,
which is why the system is called "seemingly unrelated."
Mathematical Representation:
Consider a system of m equations:
yᵢ = Xᵢβᵢ + εᵢ,  i = 1, 2, ..., m
Where:
Estimation Process:
Ordinary Least Squares (OLS): Each equation can be
estimated using OLS independently.
Advantages of SUR:
Improved Efficiency: By accounting for the correlation between
the error terms, SUR provides more efficient and reliable estimates
than estimating each equation separately.
Limitations:
Assumption of Error Correlation: SUR assumes that error terms
across equations are correlated, but if this assumption is wrong, the results
may be misleading.
1. Multiple Equations:
y₁ = X₁β₁ + ε₁
y₂ = X₂β₂ + ε₂
The key feature of SUR is that the error terms across the equations are
correlated. This implies that the errors from one equation provide information
about the errors in another equation. Despite the equations appearing
unrelated (seemingly), they are interconnected because of the correlation in
the residuals.
Each equation in the SUR system has its own dependent variable and a
potentially different set of independent variables. However, the equations
are related through the correlation of their error terms.
4. Efficiency in Estimation:
The structure of the SUR system allows for heterogeneity in the equations.
This means each equation can have different sets of explanatory variables
and different coefficients, which provides flexibility when modeling systems
with multiple relationships.
7. Practical Applications:
SUR is widely used in cases where multiple equations are likely to have
correlated errors, such as in macroeconomic modeling, market demand and
supply models, or any situation where different related phenomena are being
studied simultaneously.
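A compact numpy sketch of SUR estimation by feasible GLS for two equations, with simulated data whose errors are correlated across equations:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 500
# Errors correlated across the two equations
errs = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=n)
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = np.column_stack([np.ones(n), rng.normal(size=n)])
y1 = X1 @ np.array([1.0, 2.0]) + errs[:, 0]
y2 = X2 @ np.array([-1.0, 0.5]) + errs[:, 1]

# Step 1: equation-by-equation OLS residuals
e1 = y1 - X1 @ np.linalg.lstsq(X1, y1, rcond=None)[0]
e2 = y2 - X2 @ np.linalg.lstsq(X2, y2, rcond=None)[0]
S = np.cov(np.vstack([e1, e2]))              # estimated error covariance

# Step 2: stack the system and apply GLS with Omega = S kron I_n
X = np.block([[X1, np.zeros_like(X2)], [np.zeros_like(X1), X2]])
y = np.concatenate([y1, y2])
Omega_inv = np.kron(np.linalg.inv(S), np.eye(n))
beta = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)
print(beta)   # [b1_0, b1_1, b2_0, b2_1]
```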
Yᵢ = β₁ + β₂Xᵢ + μᵢ
Dividing through by σᵢ:
Yᵢ/σᵢ = β₁(X₀ᵢ/σᵢ) + β₂(Xᵢ/σᵢ) + μᵢ/σᵢ
Yᵢ* = β₁*X₀ᵢ* + β₂*Xᵢ* + μᵢ*
where Yᵢ*, Xᵢ*, μᵢ* are the transformed variables, and β₁*, β₂* are the parameters of the
transformed model.
To obtain the GLS estimators, we minimize the sum of squared residuals of
the transformed model:
Σ(Yᵢ* − β̂₁*X₀ᵢ* − β̂₂*Xᵢ*)²
GLS with this specific form of weights, wᵢ = 1/σᵢ², is
often referred to as Weighted Least Squares (WLS).
The method is efficient, producing BLUE estimators, which are the best
possible unbiased estimators under heteroscedasticity.
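For reference, statsmodels' WLS implements exactly this weighting; a small sketch with assumed-known variances:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
x = rng.uniform(1, 10, size=200)
sigma2 = x**2                                  # assumed known error variances
y = 1.0 + 2.0 * x + rng.normal(size=x.size) * np.sqrt(sigma2)

# weights w_i = 1 / sigma_i^2, as in the transformation above
res = sm.WLS(y, sm.add_constant(x), weights=1 / sigma2).fit()
print(res.params)
```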
Stationarity
Stationarity means that the statistical properties (mean, variance, and
autocovariance) of the series do not change over time. This is crucial for
accurate forecasting and modeling in econometrics.
Types of Stationarity
Weak or Covariance Stationarity
A time series is weakly stationary if:
The mean is constant: E(Yₜ) = μ
The variance is constant: Var(Yₜ) = E(Yₜ − μ)² = σ²
The covariance depends only on the lag, not on time: Cov(Yₜ, Yₜ₊ₖ) = γₖ
Strict Stationarity
A series is strictly stationary if the entire probability distribution
remains unchanged over time (all moments are invariant). For normal
distributions, weak stationarity implies strict stationarity.
Formula:
Y t =ρ Y t−1 + ε t
Null Hypothesis (H₀): The series has a unit root (ρ = 1), meaning it is
non-stationary.
Alternative Hypothesis (H₁): The series does not have a unit root,
meaning it is stationary.
The test is crucial because most econometric models require the series
to be stationary for valid results. If the series has a unit root,
differencing or transformation is usually required.
Subtracting Yₜ₋₁ from both sides gives ΔYₜ = βYₜ₋₁ + εₜ, where β = ρ − 1.
Here:
εₜ: Error term.
If β = 0, the series has a unit root and is non-stationary. The test focuses on
whether β is significantly different from zero.
Limitations:
It assumes that the residuals ε t are uncorrelated, which may not hold in
many cases.
Here:
The ADF test checks the null hypothesis that β = 0 (a unit root exists)
against the alternative that β < 0 (stationarity).
The inclusion of lagged terms improves reliability, especially when the series
exhibits serial correlation. However, selecting the appropriate number of lags
(p) is critical and can impact results.
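A short illustration with statsmodels' adfuller: a simulated random walk should fail to reject the unit-root null, while its first difference should reject it:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(10)
y = np.cumsum(rng.normal(size=500))     # random walk: non-stationary

for series, label in [(y, "levels"), (np.diff(y), "first difference")]:
    stat, pvalue, *_ = adfuller(series)
    print(f"{label}: ADF statistic = {stat:.2f}, p-value = {pvalue:.3f}")
```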
Advantages:
Hypotheses:
Null Hypothesis (H₀): the series is stationary (around a level or a deterministic trend).
Alternative Hypothesis (H₁): the series has a unit root.
The test statistic is built from the partial sums of the residuals, where:
êₜ = residual at time t,
σ̂² = estimated variance of the residuals.
Interpretation:
If the test statistic is greater than the critical value: Reject the null
hypothesis. This suggests that the series is non-stationary and likely
has a unit root.
If the test statistic is less than the critical value: Fail to reject the null
hypothesis. This indicates that the series is stationary (or at least does
not have a unit root).
The KPSS test is often used alongside other tests, such as the ADF test, to
confirm the results and provide a more robust analysis of stationarity in time
series data.
Advantages:
Non-stationary
A non-stationary time series is a series whose statistical properties, such as
mean, variance, and autocorrelation, change over time, meaning its behavior is
not constant or predictable in the long term. Non-stationarity can arise due
to trends, cycles, or structural breaks in the data. Non-stationary data may
not be useful for modeling or forecasting without transformation.
First difference: Δyₜ = yₜ − yₜ₋₁
Second difference: Δ²yₜ = Δyₜ − Δyₜ₋₁
Characteristics:
The differenced series will not have a deterministic trend but may have
a stochastic trend.
The process involves taking the difference between consecutive values
until stationarity is achieved.
Characteristics:
(Yₜ − δ) = α₁(Yₜ₋₁ − δ) + μₜ
Where:
The order of the model (p) refers to how many past values of the
series are included.
Purpose: AR models are useful when the current value of the series is
closely related to its past values.
Yₜ = μ + β₀uₜ + β₁uₜ₋₁
Where:
μ is a constant,
ut are error terms (white noise),
β0,β1 are coefficients.
The order of the model (q) refers to how many past error terms
are used.
Yₜ = θ + α₁Yₜ₋₁ + β₀uₜ + β₁uₜ₋₁
Where:
θ is a constant,
α 1 is the coefficient for the autoregressive part,
β o, β 1 are the coefficients for the moving average part,
ut are the error terms.
Purpose: ARMA models are used when a time series exhibits both
autoregressive and moving average properties.
An ARIMA(p, d, q) model combines:
p autoregressive terms,
d differencing steps (to make the series stationary),
q moving average terms.
Purpose: ARIMA models are used for forecasting non-stationary time series
after differencing them to achieve stationarity.
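A minimal sketch of fitting an ARIMA(1, 1, 1) with statsmodels and producing a short forecast; the series is simulated so that its first difference is AR(1):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(11)
dy = np.zeros(300)
for t in range(1, 300):
    dy[t] = 0.5 * dy[t - 1] + rng.normal()   # AR(1) in first differences
y = np.cumsum(dy)                            # integrate once: an I(1) series

res = ARIMA(y, order=(1, 1, 1)).fit()
print(res.params)             # AR, MA, and variance estimates
print(res.forecast(steps=5))  # five-step-ahead forecast
```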
1. Model Structure
ARIMA Model:
Regression Model:
2. Stationarity
ARIMA Model:
Regression Model:
3. Data Requirements
ARIMA Model:
ARIMA models use only the historical data of the time series itself,
making them useful when you have a single series to forecast, and no
additional explanatory variables are available.
It is not necessary to include any external information or predictors,
though external factors can be included in an ARIMAX model (an
extension of ARIMA).
Regression Model:
5. Forecasting Approach
ARIMA Model:
The ARIMA model forecasts future values based solely on past values
and past error terms. The forecast is entirely data-driven, relying on
the structure of the time series.
Regression Model:
7. Forecasting Accuracy
ARIMA Model:
Regression Model:
Cointegration:
Cointegration refers to the situation when two or more non-stationary time
series variables are linked by a long-term equilibrium relationship. Even
though individual variables may follow random walks and be non-stationary,
their linear combination can be stationary. This stationary relationship
between the variables indicates that they share a common stochastic trend,
and hence, they are cointegrated.
The Error Correction Model (ECM) is important when studying time series that are
cointegrated, as it helps in analyzing both the long-run and short-run dynamics
between the variables.
The ECM was introduced by Sargan and later popularized by Engle and
Granger. According to the Granger Representation Theorem, if two variables
are cointegrated, there exists a dynamic relationship that can be expressed
as an ECM.
A typical ECM takes the form:
ΔYₜ = α₀ + α₁ΔXₜ + α₂ûₜ₋₁ + εₜ
Where:
ûₜ₋₁ is the lagged equilibrium error from the cointegrating regression, and α₂
(expected to be negative) measures the speed of adjustment back toward the
long-run equilibrium.
Thus, the ECM ensures that the model adjusts the short-term fluctuations
back towards the long-term equilibrium, ensuring that the variables do not
diverge indefinitely.
For the consumption example, the ECM can be written as
ΔLPCEₜ = α₀ + α₁ΔLDPIₜ + α₂uₜ₋₁ + εₜ, where uₜ is the error term from the
cointegration equation, and the model assumes that LPCE depends on LDPI and
the error term.
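A sketch of the two-step Engle-Granger procedure in Python, on simulated series that share a stochastic trend by construction:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import coint

rng = np.random.default_rng(12)
x = np.cumsum(rng.normal(size=500))          # random walk
y = 2.0 + 0.5 * x + rng.normal(size=500)     # cointegrated with x

stat, pvalue, _ = coint(y, x)                # Engle-Granger cointegration test
print("Engle-Granger p-value:", round(pvalue, 4))

# Step 1: long-run (cointegrating) regression, keep the residuals u_hat
u_hat = sm.OLS(y, sm.add_constant(x)).fit().resid
# Step 2: ECM with the lagged equilibrium error
dy, dx, u_lag = np.diff(y), np.diff(x), u_hat[:-1]
ecm = sm.OLS(dy, sm.add_constant(np.column_stack([dx, u_lag]))).fit()
print(ecm.params)   # the last coefficient is the adjustment speed (negative)
```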
Critical values: The critical values for many of these tests are not
well-established for a wide range of models.
Basic Concept: The ARCH model assumes that the variance of the error
term at any given time depends on the past values of the error term. In
simple terms, the model suggests that future volatility is influenced by past
shocks (error terms).
yₜ = μ + εₜ
εₜ = σₜzₜ
Where:
z t is a white noise error term with zero mean and unit variance.
σₜ² = α₀ + α₁εₜ₋₁² + α₂εₜ₋₂² + ⋯ + α_qεₜ₋q²
Here:
Limitations:
εₜ = σₜzₜ
σₜ² = α₀ + Σᵢ₌₁^q αᵢεₜ₋ᵢ² + Σⱼ₌₁^p βⱼσₜ₋ⱼ²
Where:
αᵢ are coefficients for past squared error terms (i.e., past shocks).
In this model, both past shocks and past volatility contribute to the
current volatility. The GARCH model thus allows volatility to exhibit
persistence, meaning that once high volatility is observed, it tends to
continue over time.
Limitations:
Not suitable for all data types: GARCH models are generally
more appropriate for financial data that exhibits volatility
clustering but may not perform well with other types of time
series data.
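A minimal GARCH(1, 1) sketch using the third-party arch package (an assumption: it is not part of statsmodels and must be installed separately); the return series is simulated from a known GARCH process:

```python
import numpy as np
from arch import arch_model

rng = np.random.default_rng(13)
# Simulate GARCH(1,1): sigma2_t = 0.1 + 0.1*e2_{t-1} + 0.8*sigma2_{t-1}
n = 1000
e, sigma2 = np.zeros(n), np.full(n, 1.0)
for t in range(1, n):
    sigma2[t] = 0.1 + 0.1 * e[t - 1]**2 + 0.8 * sigma2[t - 1]
    e[t] = np.sqrt(sigma2[t]) * rng.normal()

res = arch_model(e, vol="GARCH", p=1, q=1).fit(disp="off")
print(res.params)   # mu, omega, alpha[1], beta[1]
```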
The exponential GARCH (EGARCH) variant models the log of the variance:
ln(σₜ²) = α₀ + Σᵢ₌₁^q αᵢ(εₜ₋ᵢ/σₜ₋ᵢ) + Σⱼ₌₁^p βⱼ ln(σₜ₋ⱼ²)
Applications:
ARCH and GARCH models are widely used to model volatility in financial time
series, for example in risk management, option pricing, and forecasting the
variance of asset returns.