
-----continued from ch-3

Lag Length Selection Criteria

Lag length selection is one of the most serious problems in autoregressive modeling; hence, correct lag length determination is vital for performing VAR, VECM, and the Johansen cointegration test. Various econometric testing procedures, such as unit root tests, causality tests, cointegration tests, and linearity tests, involve the determination of an autoregressive lag length.

Lag length selection criteria are statistical methods used to determine the appropriate number of lags to include in time series models, such as autoregressive integrated moving average (ARIMA) models. The goal of lag length selection is to determine how many past observations (lags) should be included in the model to optimize its performance.

Considerations for Lag Selection

 Data Characteristics: The nature of the data (e.g., stationarity, seasonality) can influence which criterion is most suitable.
 Model Complexity: More complex models may fit the data better but can lead to overfitting. Criteria that penalize complexity (like BIC) can help mitigate this risk.
 Domain Knowledge: Understanding the underlying processes can guide the selection of appropriate lags, making statistical criteria more meaningful.
 Multiple Criteria: It is often helpful to evaluate models using several criteria to gain a comprehensive view of model performance.

Several lag length selection criteria have been developed for econometric analysis. These include the Akaike Information Criterion (AIC), the Schwarz Information Criterion (SIC), the Hannan-Quinn Criterion (HQC), the Final Prediction Error (FPE), and the Consistent AIC (CAIC).

1. Akaike Information Criterion (AIC)


 Purpose: Balances model fit and complexity.
 Interpretation: Lower AIC values indicate a better model.
 Formula: AIC = 2k − 2ln(L)

 k: number of parameters
 L: maximized value of the likelihood function

2. Bayesian Information Criterion (BIC) or Schwarz Criterion (SIC)

♠ Purpose: Similar to AIC but applies a heavier penalty for complexity.
♠ Interpretation: Lower BIC values indicate better models.
♠ Formula: SIC = k ln(n) − 2ln(L)

 n: sample size
 k: number of parameters
 L: maximized value of the likelihood function

3. Hannan-Quinn Information Criterion (HQIC)

Purpose: A compromise between AIC and BIC.
Interpretation: Lower HQIC values indicate better models.
Formula: HQC = 2k ln(ln(n)) − 2ln(L)

Notice: choosing the right number of lags is crucial for model accuracy and interpretability. It often involves balancing complexity against fit, and criteria like AIC and BIC are useful for making informed decisions. Each criterion can yield different results, so it is wise to consider multiple criteria when selecting lags (Brooks, 2004).
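
As an illustration (not part of the original notes), the sketch below shows how these criteria are compared in practice with Python's statsmodels package; the simulated bivariate series is an assumption made for the example.

```python
# A minimal sketch: compare lag length criteria for a VAR using statsmodels.
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(42)
n = 200
# Simulate a bivariate system with second-order dynamics, so the
# criteria have a true lag length to recover.
y = np.zeros((n, 2))
for t in range(2, n):
    y[t] = 0.5 * y[t - 1] + 0.2 * y[t - 2] + rng.normal(size=2)

data = pd.DataFrame(y, columns=["y1", "y2"])
selection = VAR(data).select_order(maxlags=8)  # evaluates AIC, BIC, HQIC, FPE
print(selection.summary())                     # table of criteria by lag
print(selection.selected_orders)               # lag chosen by each criterion
```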

Example Scenario

1. Suppose we want to determine the optimal lag length for an autoregressive model. We fit models with different lag lengths (from 0 to 3) and obtain the following results:

Lag Length   Log-Likelihood   No. of Parameters   AIC      BIC      HQC
0            -150.00          1                   302.00   304.61   303.05
1            -140.00          2                   284.00   289.21   286.11
2            -135.00          3                   276.00   283.82   279.16
3            -130.00          4                   268.00   278.42   272.22

Calculating Criteria

 AIC = 2k − 2L
 BIC = k ln(n) − 2L
 HQC = 2k ln(ln(n)) − 2L

Where:
 L = log-likelihood
 n = sample size (let n = 100)
 k = number of parameters

Therefore,

1. AIC for Lag 1:

 AIC = 2(2) − 2(−140) = 284

2. BIC for Lag 1:

 BIC = 2 ln(100) − 2(−140) ≈ 2(4.605) + 280 ≈ 289.21

3. HQC for Lag 1:

 HQC = 2(2) ln(ln(100)) − 2(−140) = 4 ln(4.605) + 280 ≈ 4(1.527) + 280 ≈ 286.11

Selecting the Best Lag Length

From the table, we see:

 AIC is minimized at lag length 3 (AIC = 268.00).
 BIC is also minimized at lag length 3 (BIC ≈ 278.42).
 HQC is minimized at lag length 3 (HQC ≈ 272.22).

Since all three criteria agree, a lag length of 3 is selected.

2. Suppose we have a different time series model with the following results for lag lengths from 0 to 4 and n = 100.

Lag Length   Log-Likelihood   Number of Parameters
0            -200.00          1
1            -180.00          2
2            -175.00          3
3            -170.00          4
4            -168.00          5
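
As a worked sketch (not part of the original notes), the criteria for this exercise can be computed directly from the formulas above; n = 100 is carried over from the first example, and the parameter count k = lag + 1 follows the pattern of the tables.

```python
# Compute AIC, BIC, and HQC from the log-likelihoods in the exercise above.
import numpy as np

n = 100
loglik = {0: -200.0, 1: -180.0, 2: -175.0, 3: -170.0, 4: -168.0}

for lag, L in loglik.items():
    k = lag + 1                              # number of parameters, per the table
    aic = 2 * k - 2 * L
    bic = k * np.log(n) - 2 * L
    hqc = 2 * k * np.log(np.log(n)) - 2 * L
    print(f"lag {lag}: AIC={aic:.2f}  BIC={bic:.2f}  HQC={hqc:.2f}")
```

Running this shows that BIC selects lag 3 while AIC and HQC select lag 4, illustrating that the criteria need not agree.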

Error Correction Mechanism

The Error Correction Mechanism (ECM) is a concept primarily used in time series analysis and
is often associated with models that involve co-integrated time series data. ECM helps adjust
short-term deviations from a long-term equilibrium relationship.

ECM is widely used in economic modeling, forecasting, and understanding relationships between economic variables over time (the cointegration process).

Cointegration refers to the situation in which two or more non-stationary time series can be combined to produce a stationary series. This indicates a long-term equilibrium relationship among the series.

If co-integration exists, it implies that even though the individual series may wander, they do so
around a stable relationship, which is captured by the ECM.

Cointegration Test

Cointegration tests are statistical methods used to determine whether a set of non-stationary time
series shares a long-term equilibrium relationship. Common tests include:

1. Engle-Granger Test: A two-step procedure where you first estimate a long-run
relationship and then test the residuals for stationarity.
2. Johansen Test: A multivariate approach that allows testing for multiple cointegration
relationships among several time series. It provides a more comprehensive analysis than
the Engle-Granger method.
3. Phillips-Ouliaris Test: Another test for cointegration that can be used as an alternative to the Engle-Granger method.
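
As a hedged sketch (not part of the original notes), the Engle-Granger test is available in Python's statsmodels; the simulated cointegrated pair below is an assumption made for the example.

```python
# A minimal sketch of the Engle-Granger cointegration test via statsmodels.
# The two simulated random walks share a common stochastic trend, so the
# test should reject the null hypothesis of no cointegration.
import numpy as np
from statsmodels.tsa.stattools import coint

rng = np.random.default_rng(1)
trend = np.cumsum(rng.normal(size=500))        # common I(1) component
x = trend + rng.normal(scale=0.5, size=500)
y = 2.0 * trend + rng.normal(scale=0.5, size=500)

# Step 1: estimate the long-run relation by OLS; step 2: ADF test on residuals.
t_stat, p_value, crit_values = coint(y, x)
print(p_value)                                 # small p-value -> cointegration
```

The Johansen test is likewise available in statsmodels as coint_johansen in statsmodels.tsa.vector_ar.vecm.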

Chapter 4
Introduction to Panel Data Model Analysis
1.1. Definition and Structure of Panel Data
Panel data refers to a collection of quantities obtained across multiple individuals, assembled over (usually evenly spaced) intervals of time and ordered chronologically. A panel contains observations on various cross-sections across time. Panel data can be described as:
longitudinal data
a combination of cross-section and time-series data
data with both space and time dimensions
In panel data, unlike in cross-section data, the same cross-sectional unit (e.g., a family, a firm, or a state) is surveyed over time.
Panel data analysis is a statistical method used in several sciences, including econometrics, to examine two-dimensional panel data (cross-section and time-series data). The data are usually collected over time and across the same individual units, such as firms, states, and countries, after which regression is conducted on these two dimensions.
A panel dataset is a cross-sectional time-series dataset that provides repeated measurements of a certain number of variables over a period of time on observed units such as individuals, households, firms, cities, or states.
A cross-sectional dataset consists of observations on a certain number of variables at a single point in time, whereas a time-series dataset consists of observations on one or several variables over a number of periods.

Table 4: Panel Data Set

1.2. Types of Panel Data


The types of panel data are described below (N = number of cross-sectional units; T = number of time periods).

Balanced Panel: the number of observations for each entity over the entire period is equal.
Unbalanced Panel: the number of observations is not the same for all subjects; this usually results from missing observations due to attrition and selection bias.
Short Panel / Micro-Panel (short and wide panel data, T < N): the number of entities observed exceeds the number of time periods (N > T).
Long Panel / Macro-Panel (long and narrow panel data, T > N): the number of time periods exceeds the number of entities.
Dynamic Panel: the panel includes the lagged value of the dependent variable (Yi,t−1) as a regressor, and applies the Generalized Method of Moments (GMM) estimation.
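
As an illustration (not part of the original notes), the minimal sketch below builds a small balanced panel in Python with pandas and checks the balance condition; the firm/year variables are assumptions made for the example.

```python
# Build a tiny balanced panel and verify that every entity has the same T.
import pandas as pd

panel = pd.DataFrame({
    "firm":  ["A", "A", "A", "B", "B", "B"],   # cross-sectional units (N = 2)
    "year":  [2020, 2021, 2022] * 2,           # time periods (T = 3)
    "sales": [10.0, 12.0, 13.5, 8.0, 8.5, 9.1],
}).set_index(["firm", "year"])                 # (entity, time) MultiIndex

counts = panel.groupby(level="firm").size()    # observations per entity
print("balanced:", counts.nunique() == 1)      # True -> balanced panel
```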

1.3. Advantages of Panel Data Analysis

 Accounts for heterogeneity explicitly by allowing for individual-specific effects.
 Provides more informative data: more variability, less collinearity, more degrees of freedom, and more efficiency.
 Well suited for studying dynamic change (e.g., job turnover, labor mobility, and unemployment).
 Best for measuring effects that cannot be studied with pure cross-section or time-series data.
 Reduces the bias that may exist in aggregate data (panels can cover thousands of units).
1.4. Problems of Panel Data Analysis

 Endogeneity: arises when there is correlation between the independent variables and the error term in the regression model. This violates the assumption of exogeneity and can lead to biased and inconsistent estimates. Techniques such as instrumental variable estimation, fixed effects models, or random effects models can be employed to address endogeneity.
 Heterogeneity: panel data often exhibit heterogeneity across individuals or entities. Failing to account for this heterogeneity can lead to biased results. Fixed effects or random effects models, or more advanced techniques like the Mundlak approach, can be used to address it.
 Panel Attrition: occurs when some individuals drop out of the study over time. This can introduce selection bias and affect the representativeness of the panel. Appropriate methods such as attrition analysis, sample selection models, or the Heckman correction can be used to handle it. Attrition and related measurement errors may arise because of faulty responses due to unclear questions, memory errors, deliberate distortion of responses (e.g., prestige bias), inappropriate informants, misrecording of responses, and interviewer effects.
 Autocorrelation: serial correlation occurs when the errors in the regression model are correlated over time. Ignoring autocorrelation can lead to inefficient and inconsistent estimates. Techniques like panel-corrected standard errors, generalized least squares (GLS), and autoregressive distributed lag (ARDL) models can help address it.
1.5. ESTIMATION OF PANEL DATA REGRESSION MODELS
A. Fixed Effects Model (FEM)
This model is also called the Least Squares Dummy Variable (LSDV), covariance, or covariate model. The fixed effects model is widely used to control for omitted variables that are constant over time but vary across units; this is called unobserved heterogeneity, or fixed effects. It enables us to eliminate time-invariant unobserved errors that are specific to each observation.
 There are two widely used methods for estimating fixed effects models (both are illustrated in the sketch below):
 The least squares dummy variable (LSDV) estimator, for a small number of entities.
 The time-demeaning (within) estimator, which transforms the original variables into deviations from the group means of each variable, for large samples.
Consider the following panel data model specification:
Yit = β0i + β1X1it + β2X2it + … + βkXkit + uit
i = 1, 2, 3, …, N and t = 1, 2, …, T
Xkit = kth explanatory variable and uit = error term
Note: FEM uses dummy variables to capture individual effects, which reduces the degrees of freedom.
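
As a hedged illustration (not part of the original notes), the sketch below simulates a small panel and recovers the same slope with both fixed-effects estimators; the data-generating process and variable names are assumptions made for the example.

```python
# LSDV vs. time-demeaning (within) estimation of a fixed effects model.
# Both estimators should return the same slope estimate (true value 1.5).
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
N, T = 50, 10                          # 50 entities, 10 periods
entity = np.repeat(np.arange(N), T)
alpha = rng.normal(size=N)[entity]     # entity-specific fixed effects
x = rng.normal(size=N * T) + alpha     # regressor correlated with the effects
y = 1.5 * x + alpha + rng.normal(size=N * T)
df = pd.DataFrame({"entity": entity, "y": y, "x": x})

# (1) LSDV: OLS with one dummy per entity (practical only for small N).
X = pd.get_dummies(df["entity"], dtype=float).join(df[["x"]])
lsdv = sm.OLS(df["y"], X).fit()

# (2) Within estimator: demean y and x by entity, then OLS without intercept.
demeaned = df[["y", "x"]] - df.groupby("entity")[["y", "x"]].transform("mean")
within = sm.OLS(demeaned["y"], demeaned["x"]).fit()

print(lsdv.params["x"], within.params["x"])  # identical slopes, close to 1.5
```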
B. Random Effect Model (REM) or Error Component Model (ECM)
REM assumes that the intercept of an individual unit is randomly drawn from a much larger population with a constant mean value (β̄0). The individual intercept, which captures the individual's random effect, is then expressed as a deviation from this constant mean value. Unlike FEM, ECM is economical in degrees of freedom, since it does not estimate N cross-sectional intercepts; it only estimates the mean value of the intercept and its variance. ECM is appropriate in situations where the (random) intercept of each cross-sectional unit is uncorrelated with the regressors. The Random Effects Model treats β0i as a random variable with mean β̄0, rather than as fixed, as assumed by FEM.

Yit = β0i + β1X1it + β2X2it + uit   (FEM)

But in ECM, the intercept for an individual observation is given by

β0i = β̄0 + εi,   i = 1, 2, …, N

where εi is a random error term with a mean value of zero and variance σε².

This implies that the observations in the sample are a drawing from a much larger universe, so that they have a common mean value for the intercept (β̄0), and the individual differences in the intercept values of each observation are absorbed into the error term εi. The original equation therefore becomes:

Yit = β̄0 + εi + β1X1it + β2X2it + uit
    = β̄0 + β1X1it + β2X2it + wit,   where wit = εi + uit

The composite error term wit consists of two components: εi, the cross-section or individual-specific error component, and uit, the combined time-series and cross-section error component. The error components model derives its name from the fact that the composite error term wit consists of these two error components.
The assumptions of ECM:
 εi ∼ N(0, σε²)
 uit ∼ N(0, σu²)
 E(εi uit) = 0
 E(εi εj) = 0 (i ≠ j)
 E(uit uis) = E(uit ujt) = E(uit ujs) = 0 (i ≠ j; t ≠ s)

Note: the individual error components are not correlated with each other and are not autocorrelated across either the cross-section or the time-series units. In FEM, each cross-sectional unit has its own (fixed) intercept value. In ECM, on the other hand, the intercept β̄0 represents the mean value of all the (cross-sectional) intercepts, and the error component εi represents the (random) deviation of the individual intercept from this mean value. However, εi is not directly observable; it is what is known as an unobservable, or latent, variable.

Therefore,

E(wit) = 0,   var(wit) = σε² + σu²

If σε² = 0, there is no variation among individuals in the population, so we can simply pool all the (cross-sectional and time-series) observations and run a pooled regression.

The error term wit is assumed to be homoscedastic. Yet wit and wis (t ≠ s) are correlated; that is, the error terms of a given cross-sectional unit at two different points in time are correlated:

corr(wit, wis) = σε² / (σε² + σu²)

For example, if σε² = 4 and σu² = 1, then corr(wit, wis) = 4/(4 + 1) = 0.8 for all t ≠ s. This correlation structure is the same for all cross-sectional units. Thus, OLS estimation yields inefficient estimators, and it is advisable to use the GLS (Generalized Least Squares) estimation technique.
Summary:
 If N is large and T is small, and if the assumptions of ECM hold, ECM is preferred.
 If the individual error component εi and the regressors are correlated, FEM is preferred.
 If T is large and N is small, there is little difference in the values of the parameters estimated by FEM and ECM, so the choice is based on computational convenience; FEM may be preferable.
 Formally, the Hausman Test:
The null hypothesis underlying the Hausman test is that the FEM and ECM estimators do not differ substantially. The test statistic developed by Hausman has an asymptotic χ² distribution. If the null hypothesis is rejected, the conclusion is that ECM is not appropriate and FEM should be used. A related F test asks whether time fixed effects are needed:

H0: ECM is appropriate          H0: no time fixed effects are needed
H1: FEM is appropriate          H1: time fixed effects are needed

 Decision rule: if F-calculated is less than F-tabulated, we do not reject H0. In that case we fail to reject the null that all the year coefficients are jointly equal to zero, so no time fixed effects are needed (⟾ ECM is preferred). A sketch of the Hausman computation is given below.
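
As a hedged sketch (not part of the original notes), a Hausman-type statistic can be computed by comparing the two estimators directly. The code assumes the third-party linearmodels package and a DataFrame df carrying an (entity, time) MultiIndex; the helper name hausman is hypothetical.

```python
# A Hausman-type test comparing FEM (fixed effects) and REM (random effects)
# slope estimates; H0: REM (ECM) is appropriate.
import numpy as np
from scipy import stats
from linearmodels.panel import PanelOLS, RandomEffects

def hausman(df, dep, exog):
    """Return (chi-square statistic, p-value) for H0: REM is appropriate."""
    fe = PanelOLS(df[dep], df[exog], entity_effects=True).fit()
    re = RandomEffects(df[dep], df[exog]).fit()
    diff = fe.params[exog] - re.params[exog]               # common slopes
    v_diff = fe.cov.loc[exog, exog] - re.cov.loc[exog, exog]
    stat = float(diff.values @ np.linalg.inv(v_diff.values) @ diff.values)
    p_value = stats.chi2.sf(stat, df=len(exog))            # asymptotic chi-square
    return stat, p_value                                   # reject H0 -> prefer FEM
```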
