Panel Data Assignment
Submitted to:
Sir Mudassir Uddin
TABLE OF CONTENTS
OBJECT
1. Introduction to Panel Data
   1.1 What is Panel Data
   1.2 Reasons for Using Panel Data
2. Importance of Fixed and Random Effects
   2.1 The Fixed Effects Model
   2.2 The Random Effects Model
   2.3 Assessing the Appropriateness of Fixed Effects and Random Effects Estimation
   2.4 When to Use Fixed Effects, Random Effects or Pooled OLS Model
3. Panel Data Analysis Using STATA 9.1
   3.1 Panel Data
   3.2 Preference of STATA over Other Packages
   3.3 Panel Data in STATA 9.1
       a. Input the Data in STATA
       b. Declare the Cross-Section and Time Variables
       c. Carry out the Descriptive Statistics
       d. Outline the Main Variables in the Basic Sample Statistics
       e. Examine the Relationship between the Dependent and Independent Variables
       f. Conclusion
4. Panel Data Models
   4.1 Different Panel Data Models
   4.2 Fixed Effect and Random Effect Models in STATA
   4.3 Fixed Effect Models
       a. One-way Fixed Effect Models: Group Effects
          i. The Pooled OLS Regression Model
          ii. Least Squares Dummy Variable Models
          iii. Testing Fixed Group Effects (F-test)
       b. One-way Fixed Effect Models: Fixed Time Effect
          i. The Pooled OLS Regression Model
          ii. Least Squares Dummy Variable Models
          iii. Testing Fixed Time Effects (F-test)
   4.4 Random Effect Models
       a. One-way Random Group Effect Model
       b. One-way Random Time Effect Model
       c. Testing Random Effect Models
          i. Testing Random Group Effects
          ii. Testing Random Time Effect
5. Hausman Test
6. Conclusion
OBJECT
1. Describe what panel data is and the reasons for using it in this format.
2. Assess the importance of fixed and random effects.
3. Examine the Hausman test, which determines if fixed or random effects should be used.
4. Evaluate some panel data models.
1. INTRODUCTION TO PANEL DATA
1.1 WHAT IS PANEL DATA
A data set containing observations on multiple phenomena observed over multiple time periods is called panel data. Panel data follow the same individuals (or other units) and observe them over a period of time. Alternatively, the second dimension of the data may be some entity other than time. Whereas time-series and cross-sectional data are both one-dimensional, panel data sets are two-dimensional. A longitudinal, or panel, data set follows a given sample of individuals over time, and thus provides multiple observations on each individual in the sample. Panel data have become widely available in both developed and developing countries. In developing countries, where there may not be a long tradition of statistical collection, it is of special importance to obtain original survey data to answer many significant questions.
1.2 REASONS FOR USING PANEL DATA
Panel data sets for economic research possess several major advantages over conventional cross-sectional or time-series data sets. Some of the many reasons for using panel data include:
More accurate inference of model parameters: Panel data usually contain more sample variability than cross-sectional data (which may be viewed as a panel with T = 1) or time-series data (a panel with N = 1), hence improving the efficiency of econometric estimates.
Reduced collinearity: Panel data usually give the researcher a large number of data points, increasing the degrees of freedom and reducing the collinearity among explanatory variables.
Heterogeneity: Since panel data relate to individuals, firms, states, countries, etc., over time, there is bound to be heterogeneity in these units. Panel data estimation techniques can take such heterogeneity explicitly into account by allowing for individual-specific variables.
Capturing complex behavior: Panel data have a greater capacity for capturing the complexity of human behavior than a single cross-section or time series. This includes constructing and testing more complicated behavioral hypotheses.
Minimized bias: By making data available for several thousand units, panel data can minimize the bias that might result from aggregating individuals or firms into broad aggregates.
Controlling the impact of omitted variables: Because panel data contain information on both the intertemporal dynamics and the individuality of the entities, they may allow one to control for the effects of missing or unobserved variables.
Uncovering dynamic relationships: With panel data, we can rely on inter-individual differences to reduce the collinearity between current and lagged variables, and so estimate unrestricted time-adjustment patterns.
Complicated behavioral models: Panel data enable us to study more complicated behavioral models. For example, phenomena such as economies of scale and technological change can be better handled by panel data than by pure cross-section or pure time-series data.
Generating more accurate predictions: Panel data can produce more accurate predictions for individual outcomes by pooling the data, rather than by predicting each individual's outcome from that individual's data alone. If individual behaviors are similar conditional on certain variables, panel data make it possible to learn an individual's behavior by observing the behavior of others.
Simplifying computation and statistical inference: Panel data involve at least two dimensions, a cross-sectional dimension and a time-series dimension. Under normal circumstances one would expect the computation of panel data estimators, and inference based on them, to be more complicated than for cross-sectional or time-series data. However, in certain cases, the availability of panel data actually simplifies computation and inference.
Analysis of nonstationary time series: If panel data are available and observations among cross-sectional units are independent, one can invoke the central limit theorem across cross-sectional units to show that the limiting distributions of many estimators remain asymptotically normal.
Measurement error: Measurement errors can lead to under-identification of an econometric model. The availability of multiple observations for a given individual or at a given time may allow a researcher to apply different transformations that induce different and deducible changes in the estimators, and hence to identify an otherwise unidentified model.
In short, panel data can enrich empirical analysis in ways that may not be possible if we use only cross-section or time-series data.
2. IMPORTANCE OF FIXED AND RANDOM EFFECTS
2.1 THE FIXED EFFECTS MODEL
The fixed effects model has constant slopes but intercepts that differ according to the cross-sectional (group) unit, for example, the country. Although there are no significant temporal effects in this type of model, there are significant differences among countries. The intercept is cross-section (group) specific and, in this case, differs from country to country; it may or may not differ over time. These models are called fixed effects models. The particular countries are designated with n - 1 dummy variables, one fewer than the number of countries n, to avoid perfect collinearity with the common intercept. For this reason the same model is sometimes called the Least Squares Dummy Variable (LSDV) model.
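For reference, the fixed effects model and its LSDV form can be written in symbols. The sketch below uses notation introduced here for illustration only (y_it, x_it, α_i, μ_j, D_ji, ε_it); it is not taken from the original sources. The first line gives each country its own intercept. The second line shows the equivalent dummy-variable form, with country n as the dropped base:

```latex
\begin{aligned}
y_{it} &= \alpha_i + \beta x_{it} + \varepsilon_{it},
  \qquad i = 1,\dots,n,\quad t = 1,\dots,T \\
y_{it} &= \alpha + \sum_{j=1}^{n-1} \mu_j D_{ji} + \beta x_{it} + \varepsilon_{it},
  \qquad D_{ji} = \mathbf{1}\{i = j\}
\end{aligned}
```

Here the base country's intercept is α, and each μ_j measures how country j's intercept differs from it, so α_j = α + μ_j.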
2.2 THE RANDOM EFFECTS MODEL
The random effects model is a regression with a random constant term. One way to handle our ignorance about the unit-specific intercepts is to treat the intercept as a random variable, equal to a mean value plus a random error. This cross-section-specific error term v_i measures the deviation of each cross-sectional unit's intercept from the common constant. If it is to be modeled this way, it must be uncorrelated with the regressors. The time-series cross-sectional regression model is then one with an intercept that is a random effect.
Under these circumstances, the random error v_i captures heterogeneity specific to a cross-sectional unit and is constant over time. By contrast, the random error e_it is specific to a particular observation. For v_i to be properly specified, it must be orthogonal to (uncorrelated with) the regressors and the idiosyncratic error e_it. Because of the separate cross-sectional error term, these models are sometimes called one-way random effects models. The unit effects are treated as part of the error rather than estimated with dummy variables. This gives the random effects model the distinct advantage of allowing time-invariant variables to be included among the regressors.
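The one-way random effects model can be written compactly as follows. The sketch uses the v_i and e_it of the text, plus y_it, x_it, α, β, σ_v² and σ_e², which are our own assumed notation:

```latex
\begin{aligned}
y_{it} &= \alpha + \beta x_{it} + v_i + e_{it}, \\
& v_i \sim (0, \sigma_v^2), \qquad e_{it} \sim (0, \sigma_e^2), \\
& \operatorname{Cov}(v_i, x_{it}) = 0, \qquad \operatorname{Cov}(v_i, e_{it}) = 0 .
\end{aligned}
```

Every observation on unit i shares the same v_i. The composite error v_i + e_it is therefore correlated across any two periods of the same unit, with correlation σ_v²/(σ_v² + σ_e²). This is why the model is estimated by (feasible) GLS rather than plain OLS.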
2.3 ASSESSING THE APPROPRIATENESS OF FIXED EFFECTS AND RANDOM EFFECTS ESTIMATION
2.4 WHEN TO USE FIXED EFFECTS, RANDOM EFFECTS OR POOLED OLS MODEL
The following flowchart efficiently summarizes when to use fixed or random effects:
3. PANEL DATA ANALYSIS USING STATA 9.1
3.1 PANEL DATA
Panel (or longitudinal) data are both cross-sectional and time-series: there are multiple entities, each of which has repeated measurements at different time periods. Panel data may have group effects, time effects, or both, which are analyzed by fixed effect and random effect models. A panel data set contains n entities or subjects (e.g., firms or states), each of which includes T observations measured at time periods 1 through T. Thus, the total number of observations in a balanced panel is nT. Ideally, panel data are measured at regular time intervals (e.g., year, quarter, or month); otherwise, they should be analyzed with caution. A short panel has many entities but few time periods (small T), while a long panel has many time periods (large T) but few entities.
3.2 PREFERENCE OF STATA OVER OTHER PACKAGES
We use the statistical package STATA, which offers a rich variety of panel-data procedures. It fits fixed and random effects models, handles balanced or unbalanced panels, and supports one-way and two-way fixed and random effects models. It also provides both Hausman and Sargan specification tests. STATA offers Arellano, Bond and Bover's estimator for dynamic panel models, and can handle groupwise heteroskedasticity in the random effects model. STATA mainly provides the following estimation commands:
Command     Description
xtreg       Fixed-, between- and random-effects, and population-averaged linear models
xtregar     Fixed- and random-effects linear models with an AR(1) disturbance
xtgls       Panel-data models using GLS
xtpcse      OLS or Prais-Winsten models with panel-corrected standard errors
xtrchh      Hildreth-Houck random coefficients models
xtivreg     Instrumental variables and two-stage least squares for panel-data models
xtabond     Arellano-Bond linear, dynamic panel data estimator
xtabond2    Arellano-Bond system dynamic panel data estimator
xttobit     Random-effects tobit models
xtintreg    Random-effects interval data regression models
xtlogit     Fixed-effects, random-effects, & population-averaged logit models
xtprobit    Random-effects and population-averaged probit models
xtcloglog   Random-effects and population-averaged cloglog models
xtpoisson   Fixed-effects, random-effects, & population-averaged Poisson models
xtnbreg     Fixed-effects, random-effects, & population-averaged negative binomial models
3.3 PANEL DATA IN STATA 9.1
We are interested in testing how physicians' emigration rate to the UK (dependent variable Y) relates to physicians' emigration rate to Canada (independent variable X) for the Middle Eastern countries of Egypt, Iran and Turkey over the years 1991 to 2004. These countries have approximately the same population per 1000 individuals, and thus make a suitable data set for comparison. Consider the following dataset:
YEAR   EGYPT_UK  EGYPT_CAN  IRAN_UK  IRAN_CAN  TUR_UK  TUR_CAN
1991   0.0035    0.0043     0.0004   0.0044    0.0003  0.0011
1992   0.0030    0.0035     0.0005   0.0044    0.0002  0.0011
1993   0.0024    0.0030     0.0004   0.0043    0.0002  0.0011
1994   0.0014    0.0022     0.0001   0.0028    0.0001  0.0009
1995   0.0019    0.0021     0.0002   0.0020    0.0001  0.0009
1996   0.0009    0.0019     0.0001   0.0018    0.0002  0.0013
1997   0.0009    0.0023     0.0001   0.0016    0.0001  0.0007
1998   0.0005    0.0019     0.0001   0.0013    0.0000  0.0007
1999   0.0007    0.0016     0.0002   0.0014    0.0001  0.0005
2000   0.0008    0.0015     0.0004   0.0014    0.0000  0.0005
2001   0.0008    0.0013     0.0004   0.0014    0.0001  0.0004
2002   0.0007    0.0013     0.0006   0.0013    0.0002  0.0004
2003   0.0007    0.0013     0.0006   0.0013    0.0002  0.0004
2004   0.0008    0.0012     0.0011   0.0014    0.0002  0.0004
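As a cross-check outside STATA, the Python sketch below enters the table in the long layout (one row per country-year) that the xt commands expect. It then confirms n = 3, T = 14 and nT = 42, and that the panel is balanced. The dictionary names and the Egypt = 1, Iran = 2, Turkey = 3 coding of country are assumptions, not taken from the original data file.

```python
# A minimal sketch (assumed names and coding): the dataset above in long format.
YEARS = list(range(1991, 2005))
UK = {
    "Egypt":  [0.0035, 0.0030, 0.0024, 0.0014, 0.0019, 0.0009, 0.0009,
               0.0005, 0.0007, 0.0008, 0.0008, 0.0007, 0.0007, 0.0008],
    "Iran":   [0.0004, 0.0005, 0.0004, 0.0001, 0.0002, 0.0001, 0.0001,
               0.0001, 0.0002, 0.0004, 0.0004, 0.0006, 0.0006, 0.0011],
    "Turkey": [0.0003, 0.0002, 0.0002, 0.0001, 0.0001, 0.0002, 0.0001,
               0.0000, 0.0001, 0.0000, 0.0001, 0.0002, 0.0002, 0.0002],
}
CAN = {
    "Egypt":  [0.0043, 0.0035, 0.0030, 0.0022, 0.0021, 0.0019, 0.0023,
               0.0019, 0.0016, 0.0015, 0.0013, 0.0013, 0.0013, 0.0012],
    "Iran":   [0.0044, 0.0044, 0.0043, 0.0028, 0.0020, 0.0018, 0.0016,
               0.0013, 0.0014, 0.0014, 0.0014, 0.0013, 0.0013, 0.0014],
    "Turkey": [0.0011, 0.0011, 0.0011, 0.0009, 0.0009, 0.0013, 0.0007,
               0.0007, 0.0005, 0.0005, 0.0004, 0.0004, 0.0004, 0.0004],
}

# Long format: (country_id, year, uk, canada), one row per country-year.
rows = []
for cid, country in enumerate(["Egypt", "Iran", "Turkey"], start=1):
    for t, year in enumerate(YEARS):
        rows.append((cid, year, UK[country][t], CAN[country][t]))

n = len({r[0] for r in rows})   # number of entities
T = len({r[1] for r in rows})   # number of time periods
# Balanced panel: every entity is observed in every period.
balanced = all(sum(1 for r in rows if r[0] == i) == T for i in range(1, n + 1))
print(n, T, len(rows), balanced)  # 3 14 42 True
```

This matches the xtdes summary below: 3 countries, each observed in all 14 years.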
a. Input the Data in STATA
Command:
b. Declare the Cross-Section and Time Variables
Command:
Statistics > Time Series > Setup and Utilities > Declare Data to be Time Series
Time variable: year
Panel ID variable: country
Display format for the time variable: Yearly
c. Carry out the Descriptive Statistics
Command:
xtdes
Output:
 country:  1, 2, ..., 3                              n =          3
    year:  1991, 1992, ..., 2004                     T =         14
           Delta(year) = 1; (2004-1991)+1 = 14
           (country*year uniquely identifies each observation)

Distribution of T_i:   min      5%     25%     50%     75%     95%     max
                        14      14      14      14      14      14      14

     Freq.  Percent    Cum. |  Pattern
 ---------------------------+----------------
        3    100.00  100.00 |  11111111111111
 ---------------------------+----------------
        3    100.00         |  XXXXXXXXXXXXXX
d. Outline the Main Variables in the Basic Sample Statistics
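Command:
xtsum uk canada
Output:
Variable         |      Mean   Std. Dev.       Min        Max |    Observations
-----------------+--------------------------------------------+----------------
uk       overall |                                            |     N =      42
         between |                                            |     n =       3
         within  |                                            |     T =      14
                 |                                            |
canada   overall |   .001681     .001115     .0004      .0044 |     N =      42
         between |               .000814   .000743      .0022 |     n =       3
         within  |               .000889   .000781    .003881 |     T =      14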
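The three rows that xtsum reports per variable can be reproduced with a short Python sketch:

- overall statistics use all nT observations;
- between statistics use the n country means;
- within statistics use deviations from each country's mean, with the grand mean added back.

This is our own reimplementation, assuming a balanced panel, and the function and variable names are hypothetical. For canada it returns the mean, the three standard deviations, and the within minimum and maximum shown above.

```python
import math

UK = {
    "Egypt":  [0.0035, 0.0030, 0.0024, 0.0014, 0.0019, 0.0009, 0.0009,
               0.0005, 0.0007, 0.0008, 0.0008, 0.0007, 0.0007, 0.0008],
    "Iran":   [0.0004, 0.0005, 0.0004, 0.0001, 0.0002, 0.0001, 0.0001,
               0.0001, 0.0002, 0.0004, 0.0004, 0.0006, 0.0006, 0.0011],
    "Turkey": [0.0003, 0.0002, 0.0002, 0.0001, 0.0001, 0.0002, 0.0001,
               0.0000, 0.0001, 0.0000, 0.0001, 0.0002, 0.0002, 0.0002],
}
CAN = {
    "Egypt":  [0.0043, 0.0035, 0.0030, 0.0022, 0.0021, 0.0019, 0.0023,
               0.0019, 0.0016, 0.0015, 0.0013, 0.0013, 0.0013, 0.0012],
    "Iran":   [0.0044, 0.0044, 0.0043, 0.0028, 0.0020, 0.0018, 0.0016,
               0.0013, 0.0014, 0.0014, 0.0014, 0.0013, 0.0013, 0.0014],
    "Turkey": [0.0011, 0.0011, 0.0011, 0.0009, 0.0009, 0.0013, 0.0007,
               0.0007, 0.0005, 0.0005, 0.0004, 0.0004, 0.0004, 0.0004],
}

def xtsum(panel):
    """Overall/between/within summary of a balanced panel, in the style of xtsum.

    panel maps each entity to its list of T observations. Returns
    (mean, overall_sd, between_sd, within_sd, within_min, within_max).
    """
    values = [v for series in panel.values() for v in series]
    N = len(values)
    grand = sum(values) / N
    means = {k: sum(s) / len(s) for k, s in panel.items()}
    overall_sd = math.sqrt(sum((v - grand) ** 2 for v in values) / (N - 1))
    # Between: spread of the entity means (balanced panel, so their mean is the grand mean).
    between_sd = math.sqrt(
        sum((m - grand) ** 2 for m in means.values()) / (len(means) - 1))
    # Within: deviations from each entity's own mean, re-centred on the grand mean.
    within = [v - means[k] + grand for k, s in panel.items() for v in s]
    within_sd = math.sqrt(sum((w - grand) ** 2 for w in within) / (N - 1))
    return grand, overall_sd, between_sd, within_sd, min(within), max(within)

canada = xtsum(CAN)
uk = xtsum(UK)  # the corresponding statistics for the uk variable
print(" ".join("%.6f" % v for v in canada))
# 0.001681 0.001115 0.000814 0.000889 0.000781 0.003881
```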
Assignment on Panel Data ACT - 672
e. Examine the Relationship between the Dependent and Independent Variables
Command:
xtline uk canada
Output:
[Figure: xtline plots of UK and CANADA emigration rates against YEAR (1990 to 2005), one panel per country; graphs by Country]
f. Conclusion
A relationship may exist between the two variables (physicians' emigration rate to the UK and physicians' emigration rate to Canada). Therefore, we continue with an in-depth panel data analysis: testing for fixed and random effects, applying the Hausman test, and testing for autocorrelation and heteroskedasticity.
4. PANEL DATA MODELS
4.1 DIFFERENT PANEL DATA MODELS
Panel data models examine fixed and/or random effects of entity (individual or subject) or time. The core difference between fixed and random effect models lies in the role of the dummy variables. If the dummies are considered part of the intercept, the model is a fixed effect model; in a random effect model, the dummies act as part of the error term. Fixed effects are tested by the (incremental) F test, while random effects are examined by the Lagrange Multiplier (LM) test (Breusch and Pagan 1980). If the null hypothesis is not rejected, the pooled OLS regression is favored. If one cross-sectional or time-series variable is considered (e.g., country, firm, or race), the model is called a one-way fixed or random effect model. Two-way effect models have two sets of dummy variables, one for the group variable and one for the time variable (e.g., state and year).
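The one-way and two-way models described above can be written in symbols. We use μ_i for group effects and λ_t for time effects; this notation is ours, not the original's:

```latex
\begin{aligned}
\text{one-way group:}\quad & y_{it} = \alpha + \mu_i + \beta x_{it} + \varepsilon_{it} \\
\text{one-way time:}\quad  & y_{it} = \alpha + \lambda_t + \beta x_{it} + \varepsilon_{it} \\
\text{two-way:}\quad       & y_{it} = \alpha + \mu_i + \lambda_t + \beta x_{it} + \varepsilon_{it}
\end{aligned}
```

In a fixed effect model, μ_i and λ_t are parameters estimated with dummy variables (one category dropped). In a random effect model, they are random components of the error with variances σ_μ² and σ_λ². The F test examines whether all the included μ_i (or λ_t) are zero, and the LM test examines whether σ_μ² (or σ_λ²) is zero. If neither null hypothesis is rejected, pooled OLS is favored.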
4.2 FIXED EFFECT AND RANDOM EFFECT MODELS IN STATA
We use the xtreg command to estimate the following four basic linear panel data models:
Fixed effects model (Fixed-effect)
Random effects model (Random-effect)
Between effects model (Between-effect)
Population-averaged model (Population-average)
Command:
xtreg depvar [indepvars] [if] [in], model_type [level(#)]
where the level(#) option sets the confidence level, in percent, for the reported confidence intervals (the default is 95).
The model_type option takes one of the following values:
model_type   Model
be           between-effects estimator
fe           fixed-effects estimator
re           GLS random-effects estimator
pa           GEE population-averaged estimator
mle          maximum-likelihood random-effects estimator
4.3 FIXED EFFECT MODELS
There are several strategies for estimating fixed effect models. The least squares dummy variable (LSDV) model uses dummy variables, whereas the within effect model does not. These two strategies produce identical estimates for the coefficients of the non-dummy independent variables. The between effect model, by contrast, fits the model using group and/or time means of the dependent and independent variables, without dummies.
a. One-way Fixed Effect Models: Group Effects
A one-way fixed group model examines group differences in intercepts. The LSDV for this fixed model needs to create as many dummy variables as the number of entities or subjects. When many dummies are needed, the within effect model is useful since it transforms variables using group means to avoid dummies. The between effect model uses group means of variables.
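The within transformation can be sketched in a few lines of Python. Subtract each country's mean from its observations, regress the demeaned mgrt_uk on the demeaned mgrt_can without dummies, and recover each country's intercept from its means. This is an illustrative reimplementation with hypothetical names, not STATA's code. On this data it gives the slope and the residual sum of squares (9.5559e-06) quoted for LSDV1 below.

```python
UK = {
    "Egypt":  [0.0035, 0.0030, 0.0024, 0.0014, 0.0019, 0.0009, 0.0009,
               0.0005, 0.0007, 0.0008, 0.0008, 0.0007, 0.0007, 0.0008],
    "Iran":   [0.0004, 0.0005, 0.0004, 0.0001, 0.0002, 0.0001, 0.0001,
               0.0001, 0.0002, 0.0004, 0.0004, 0.0006, 0.0006, 0.0011],
    "Turkey": [0.0003, 0.0002, 0.0002, 0.0001, 0.0001, 0.0002, 0.0001,
               0.0000, 0.0001, 0.0000, 0.0001, 0.0002, 0.0002, 0.0002],
}
CAN = {
    "Egypt":  [0.0043, 0.0035, 0.0030, 0.0022, 0.0021, 0.0019, 0.0023,
               0.0019, 0.0016, 0.0015, 0.0013, 0.0013, 0.0013, 0.0012],
    "Iran":   [0.0044, 0.0044, 0.0043, 0.0028, 0.0020, 0.0018, 0.0016,
               0.0013, 0.0014, 0.0014, 0.0014, 0.0013, 0.0013, 0.0014],
    "Turkey": [0.0011, 0.0011, 0.0011, 0.0009, 0.0009, 0.0013, 0.0007,
               0.0007, 0.0005, 0.0005, 0.0004, 0.0004, 0.0004, 0.0004],
}

def within_estimator(xp, yp):
    """One-way fixed group effects via the within (group-demeaning) transformation.

    xp, yp map each group to its list of observations. Returns the common
    slope, the residual sum of squares and the implied group intercepts.
    """
    xd, yd = [], []
    for g in xp:
        mx = sum(xp[g]) / len(xp[g])
        my = sum(yp[g]) / len(yp[g])
        xd += [v - mx for v in xp[g]]   # x_it minus the group mean of x
        yd += [v - my for v in yp[g]]   # y_it minus the group mean of y
    slope = sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)
    sse = sum((b - slope * a) ** 2 for a, b in zip(xd, yd))
    intercepts = {g: sum(yp[g]) / len(yp[g]) - slope * sum(xp[g]) / len(xp[g])
                  for g in xp}
    return slope, sse, intercepts

slope, sse, intercepts = within_estimator(CAN, UK)
print(round(slope, 4), "%.4e" % sse)  # 0.3333 9.5559e-06
```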
i. The Pooled OLS Regression Model
First, fit the pooled regression model without any dummy variables.
Command:
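      Source |       SS       df       MS
-------------+------------------------------
       Model |  7.5293e-06     1  7.5293e-06
    Residual |  .000017287    40  4.3217e-07
-------------+------------------------------
       Total |  .000024816    41  6.0527e-07

          uk |      Coef.   Std. Err.       t
-------------+----------------------------------
    mgrt_can |   .3843646                 4.17
       _cons |  -.0000223                -0.12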
Comments:
The regression equation is mgrt_uk = -.0000223 + .3843646 * mgrt_can. Although the slope is statistically significant (F = 17.42, p = 0.0002), the model explains only about 30% of the variation in mgrt_uk (R2 = 0.3034), so it does not fit the data well. We may, however, suspect that a fixed group effect produces different intercepts across groups.
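The pooled regression is simple enough to check directly. The Python sketch below stacks the 42 observations and fits mgrt_uk on mgrt_can with an intercept. It is an illustrative reimplementation with hypothetical names, and it reproduces the equation, R2 and F statistic quoted above.

```python
UK = {
    "Egypt":  [0.0035, 0.0030, 0.0024, 0.0014, 0.0019, 0.0009, 0.0009,
               0.0005, 0.0007, 0.0008, 0.0008, 0.0007, 0.0007, 0.0008],
    "Iran":   [0.0004, 0.0005, 0.0004, 0.0001, 0.0002, 0.0001, 0.0001,
               0.0001, 0.0002, 0.0004, 0.0004, 0.0006, 0.0006, 0.0011],
    "Turkey": [0.0003, 0.0002, 0.0002, 0.0001, 0.0001, 0.0002, 0.0001,
               0.0000, 0.0001, 0.0000, 0.0001, 0.0002, 0.0002, 0.0002],
}
CAN = {
    "Egypt":  [0.0043, 0.0035, 0.0030, 0.0022, 0.0021, 0.0019, 0.0023,
               0.0019, 0.0016, 0.0015, 0.0013, 0.0013, 0.0013, 0.0012],
    "Iran":   [0.0044, 0.0044, 0.0043, 0.0028, 0.0020, 0.0018, 0.0016,
               0.0013, 0.0014, 0.0014, 0.0014, 0.0013, 0.0013, 0.0014],
    "Turkey": [0.0011, 0.0011, 0.0011, 0.0009, 0.0009, 0.0013, 0.0007,
               0.0007, 0.0005, 0.0005, 0.0004, 0.0004, 0.0004, 0.0004],
}

def ols(x, y):
    """Simple OLS of y on x with an intercept; returns slope, intercept, R^2, F, SSE."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)          # total sum of squares
    slope = sxy / sxx
    intercept = my - slope * mx
    ssr = slope * sxy                            # model (explained) sum of squares
    sse = syy - ssr                              # residual sum of squares
    r2 = ssr / syy
    f = ssr / (sse / (n - 2))                    # F(1, n - 2)
    return slope, intercept, r2, f, sse

# Stack the three countries into one pooled sample (no dummies).
x = [v for c in ("Egypt", "Iran", "Turkey") for v in CAN[c]]
y = [v for c in ("Egypt", "Iran", "Turkey") for v in UK[c]]
slope, intercept, r2, f, sse = ols(x, y)
print(round(slope, 4), round(r2, 4), round(f, 2))  # 0.3844 0.3034 17.42
```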
ii. Least Squares Dummy Variable Models
Least Squares Dummy Variable Regression Model (LSDV1) drops a dummy variable to get the model identified. LSDV1 produces correct ANOVA information, goodness of fit, parameter estimates, and standard errors. As a consequence, this approach is commonly used in practice.
Command:
Output:
Number of obs =     42
F(  3,    38) =  20.23
Prob > F      = 0.0000
R-squared     = 0.6149
Adj R-squared = 0.5845
Root MSE      =  .0005
             |  Std. Err.       t    P>|t|
-------------+----------------------------
          g1 |   .0002241     3.40   0.002
          g2 |   .0002289    -1.12   0.268
    mgrt_can |   .0880796     3.78   0.001
       _cons |   .0001491    -0.70   0.487
Comments:
LSDV1 fits the data better than the pooled OLS. SSE decreases from .000017287 to 9.5559e-06, and R2 increases substantially from 0.3034 to 0.6149. Because of the two dummies included, this model loses two degrees of freedom (the residual df falls from 40 to 38).
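To see what LSDV1 does mechanically, the sketch below builds the design matrix with an intercept, mgrt_can and two country dummies, then solves the normal equations by Gauss-Jordan elimination. It assumes g1 = Egypt and g2 = Iran, with Turkey as the dropped group. That coding is our inference, since it is not shown in the output, but under it the standard errors above give t values consistent with the fitted coefficients. As section 4.3 notes, the slope on mgrt_can is the same one the within transformation yields, and R2 matches the 0.6149 reported above.

```python
UK = {
    "Egypt":  [0.0035, 0.0030, 0.0024, 0.0014, 0.0019, 0.0009, 0.0009,
               0.0005, 0.0007, 0.0008, 0.0008, 0.0007, 0.0007, 0.0008],
    "Iran":   [0.0004, 0.0005, 0.0004, 0.0001, 0.0002, 0.0001, 0.0001,
               0.0001, 0.0002, 0.0004, 0.0004, 0.0006, 0.0006, 0.0011],
    "Turkey": [0.0003, 0.0002, 0.0002, 0.0001, 0.0001, 0.0002, 0.0001,
               0.0000, 0.0001, 0.0000, 0.0001, 0.0002, 0.0002, 0.0002],
}
CAN = {
    "Egypt":  [0.0043, 0.0035, 0.0030, 0.0022, 0.0021, 0.0019, 0.0023,
               0.0019, 0.0016, 0.0015, 0.0013, 0.0013, 0.0013, 0.0012],
    "Iran":   [0.0044, 0.0044, 0.0043, 0.0028, 0.0020, 0.0018, 0.0016,
               0.0013, 0.0014, 0.0014, 0.0014, 0.0013, 0.0013, 0.0014],
    "Turkey": [0.0011, 0.0011, 0.0011, 0.0009, 0.0009, 0.0013, 0.0007,
               0.0007, 0.0005, 0.0005, 0.0004, 0.0004, 0.0004, 0.0004],
}

def solve(a, b):
    """Solve the square system a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

# Design rows: [g1 (Egypt), g2 (Iran), mgrt_can, constant]; Turkey is the base group.
X, y = [], []
for country in ("Egypt", "Iran", "Turkey"):
    g1 = 1.0 if country == "Egypt" else 0.0
    g2 = 1.0 if country == "Iran" else 0.0
    for xv, yv in zip(CAN[country], UK[country]):
        X.append([g1, g2, xv, 1.0])
        y.append(yv)

k = len(X[0])
xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
xty = [sum(r[i] * yv for r, yv in zip(X, y)) for i in range(k)]
coef = solve(xtx, xty)  # [g1, g2, mgrt_can, _cons]

fitted = [sum(c * v for c, v in zip(coef, r)) for r in X]
sse = sum((yv - fv) ** 2 for yv, fv in zip(y, fitted))
ybar = sum(y) / len(y)
r2 = 1 - sse / sum((yv - ybar) ** 2 for yv in y)
print(round(coef[2], 4), round(r2, 4))  # 0.3333 0.6149
```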
iii. Testing Fixed Group Effects (F-test)
In the LSDV1 regression of mgrt_uk on the group dummies and mgrt_can, the null hypothesis is that the parameters of all included group dummies are zero (every group except the dropped one). This hypothesis is tested by an F-test based on the loss of goodness-of-fit. In the following formula, the robust model is LSDV1 (or the within effect model) and the efficient model is the pooled regression.
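The F statistic is computed as

$$F_{cal} = \frac{(R^2_{LSDV} - R^2_{pooled})/(n - 1)}{(1 - R^2_{LSDV})/(nT - n - k)} = \frac{(0.6149 - 0.3034)/(3 - 1)}{(1 - 0.6149)/(42 - 3 - 1)} \approx 15.37$$

where n = 3 groups, nT = 42 observations and k = 1 regressor. The 5% critical value is F(0.95; 2, 38) ≈ 3.24. Since Fcal > Ftab, we reject the null hypothesis and conclude that the fixed group effect model is better than the pooled OLS model.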
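The arithmetic can be checked in Python. When the numerator has 2 degrees of freedom, the F distribution has a closed-form upper tail, P(F > f) = (1 + 2f/d)^(-d/2), where d is the denominator degrees of freedom. The 5% critical value therefore needs no tables. This shortcut is valid only when the numerator df is 2, and the function names are our own.

```python
def incremental_f(r2_robust, r2_efficient, df_num, df_den):
    """Incremental F test comparing a robust (LSDV) and an efficient (pooled) model via R^2."""
    return ((r2_robust - r2_efficient) / df_num) / ((1 - r2_robust) / df_den)

def f2_critical(df_den, alpha=0.05):
    """Upper-alpha critical value of F(2, df_den); closed form valid only for numerator df = 2."""
    return df_den / 2 * (alpha ** (-2 / df_den) - 1)

def f2_pvalue(f, df_den):
    """P(F > f) for F(2, df_den), using the same closed form."""
    return (1 + 2 * f / df_den) ** (-df_den / 2)

n, T, k = 3, 14, 1                 # groups, periods, regressors (excluding constant)
df_den = n * T - n - k             # 42 - 3 - 1 = 38
f_group = incremental_f(0.6149, 0.3034, n - 1, df_den)
crit = f2_critical(df_den)
print(round(f_group, 2), round(crit, 2), f2_pvalue(f_group, df_den) < 0.001)  # 15.37 3.24 True
```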
b. One-way Fixed Effect Models: Fixed Time Effect
A fixed time effect model investigates how time affects the intercept using time dummy variables. The logic and method are the same as those of the fixed group effect model.
i. The Pooled OLS Regression Model
As in the previous section, we fit a pooled regression model without any dummy variables and obtain the same results.
ii. Least Squares Dummy Variable Models
Command:
Output:
      Source |   df
-------------+------
       Model |   14
    Residual |   27
-------------+------
       Total |   41

Number of obs = 42
F( 14, 27)    = 1.06
Prob > F      = 0.4313
R-squared     = 0.3546
Adj R-squared = 0.0200
Root MSE      = .00077

     mgrt_uk |  Std. Err.      t    P>|t|    [95% Conf. Interval]
-------------+-----------------------------------------------------
          t1 |  .0007142   -0.23   0.817   -.0016321    .0012989
          t2 |  .0006962   -0.33   0.742   -.0016599    .0011972
          t3 |  .0006839   -0.57   0.575   -.0017915    .0010151
          t4 |  .0006452   -0.83   0.413   -.0018601    .0007876
          t5 |  .0006367   -0.35   0.731   -.0015279    .0010848
          t6 |  .0006367   -0.87   0.391   -.0018613    .0007515
          t7 |  .0006339   -0.85   0.404   -.0018378    .0007633
          t8 |  .0006304   -0.98   0.338   -.0019083    .0006789
          t9 |  .0006293   -0.68   0.500   -.0017217    .0008609
         t10 |  .0006292   -0.56   0.582   -.0016419    .0009399
         t11 |  .0006289   -0.44   0.660   -.0015697    .0010109
         t12 |  .0006288   -0.32   0.753   -.0014903    .0010903
         t13 |  .0006288   -0.32   0.753   -.0014903    .0010903
    mgrt_can |  .1494048    2.56   0.016    .0757848    .6888916
       _cons |  .0004691    0.68   0.504   -.0006448    .0012802
iii. Testing Fixed Time Effects (F-test)
Again, the null hypothesis is that the parameters of all included time dummies are zero (every year except the dropped one). With 13 time dummies and 42 - 14 - 1 = 27 residual degrees of freedom, the F statistic is

$$F_{cal} = \frac{(0.3546 - 0.3034)/13}{(1 - 0.3546)/27} \approx 0.16$$

The 5% critical value of F(13, 27) is about 2.1. Since Fcal < Ftab, we fail to reject the null hypothesis and conclude that there is no fixed time effect.
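The time-effect version can be checked the same way. Demeaning by year instead of by country gives the slope of the time-dummy LSDV model. The resulting R2 then feeds the incremental F statistic with 13 and 27 degrees of freedom. The following is a minimal Python sketch under those assumptions; all names are hypothetical.

```python
UK = {
    "Egypt":  [0.0035, 0.0030, 0.0024, 0.0014, 0.0019, 0.0009, 0.0009,
               0.0005, 0.0007, 0.0008, 0.0008, 0.0007, 0.0007, 0.0008],
    "Iran":   [0.0004, 0.0005, 0.0004, 0.0001, 0.0002, 0.0001, 0.0001,
               0.0001, 0.0002, 0.0004, 0.0004, 0.0006, 0.0006, 0.0011],
    "Turkey": [0.0003, 0.0002, 0.0002, 0.0001, 0.0001, 0.0002, 0.0001,
               0.0000, 0.0001, 0.0000, 0.0001, 0.0002, 0.0002, 0.0002],
}
CAN = {
    "Egypt":  [0.0043, 0.0035, 0.0030, 0.0022, 0.0021, 0.0019, 0.0023,
               0.0019, 0.0016, 0.0015, 0.0013, 0.0013, 0.0013, 0.0012],
    "Iran":   [0.0044, 0.0044, 0.0043, 0.0028, 0.0020, 0.0018, 0.0016,
               0.0013, 0.0014, 0.0014, 0.0014, 0.0013, 0.0013, 0.0014],
    "Turkey": [0.0011, 0.0011, 0.0011, 0.0009, 0.0009, 0.0013, 0.0007,
               0.0007, 0.0005, 0.0005, 0.0004, 0.0004, 0.0004, 0.0004],
}
COUNTRIES = ("Egypt", "Iran", "Turkey")
T = 14

xd, yd, y_all = [], [], []
for t in range(T):
    xs = [CAN[c][t] for c in COUNTRIES]
    ys = [UK[c][t] for c in COUNTRIES]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    xd += [v - mx for v in xs]      # deviations from the year mean of mgrt_can
    yd += [v - my for v in ys]      # deviations from the year mean of mgrt_uk
    y_all += ys

slope = sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)
sse = sum((b - slope * a) ** 2 for a, b in zip(xd, yd))
ybar = sum(y_all) / len(y_all)
r2_time = 1 - sse / sum((v - ybar) ** 2 for v in y_all)

# Incremental F: T - 1 = 13 restrictions, 42 - 14 - 1 = 27 residual df.
f_time = ((r2_time - 0.3034) / (T - 1)) / ((1 - r2_time) / (len(y_all) - T - 1))
print(round(slope, 4), round(r2_time, 4), round(f_time, 2))  # 0.3823 0.3546 0.16
```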
4.4 RANDOM EFFECT MODELS
A random effect model examines how group and/or time affect error variances. This model is appropriate when the n individuals were drawn randomly from a large population. Here we focus on feasible generalized least squares (FGLS). In this approach the variance components are estimated first and then used to transform the model before it is fitted.
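The sketch below shows how the variance components behind FGLS can be estimated. It uses what we believe is the Swamy-Arora approach that STATA's xtreg, re applies by default:

- σ_e² comes from the within regression;
- σ_u² comes from the regression on the group means;
- θ = 1 - sqrt(σ_e²/(Tσ_u² + σ_e²)) is the quasi-demeaning weight.

The code is our own illustration with hypothetical names. Its standard deviations agree with the e and u rows of the xttest0 output for random group effects. The θ it prints is not reported elsewhere in this document.

```python
import math

UK = {
    "Egypt":  [0.0035, 0.0030, 0.0024, 0.0014, 0.0019, 0.0009, 0.0009,
               0.0005, 0.0007, 0.0008, 0.0008, 0.0007, 0.0007, 0.0008],
    "Iran":   [0.0004, 0.0005, 0.0004, 0.0001, 0.0002, 0.0001, 0.0001,
               0.0001, 0.0002, 0.0004, 0.0004, 0.0006, 0.0006, 0.0011],
    "Turkey": [0.0003, 0.0002, 0.0002, 0.0001, 0.0001, 0.0002, 0.0001,
               0.0000, 0.0001, 0.0000, 0.0001, 0.0002, 0.0002, 0.0002],
}
CAN = {
    "Egypt":  [0.0043, 0.0035, 0.0030, 0.0022, 0.0021, 0.0019, 0.0023,
               0.0019, 0.0016, 0.0015, 0.0013, 0.0013, 0.0013, 0.0012],
    "Iran":   [0.0044, 0.0044, 0.0043, 0.0028, 0.0020, 0.0018, 0.0016,
               0.0013, 0.0014, 0.0014, 0.0014, 0.0013, 0.0013, 0.0014],
    "Turkey": [0.0011, 0.0011, 0.0011, 0.0009, 0.0009, 0.0013, 0.0007,
               0.0007, 0.0005, 0.0005, 0.0004, 0.0004, 0.0004, 0.0004],
}
G = ("Egypt", "Iran", "Turkey")
n, T, K = 3, 14, 2   # groups, periods, coefficients including the constant

def slope_and_sse(x, y):
    """OLS of y on x with an intercept; returns (slope, residual sum of squares)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sxx, syy - sxy ** 2 / sxx

# Within regression on group-demeaned data gives the idiosyncratic variance sigma_e^2.
xw = [v - sum(CAN[g]) / T for g in G for v in CAN[g]]
yw = [v - sum(UK[g]) / T for g in G for v in UK[g]]
_, sse_w = slope_and_sse(xw, yw)
sigma2_e = sse_w / (n * T - n - K + 1)

# Between regression on the n group means gives the group-effect variance sigma_u^2.
xb = [sum(CAN[g]) / T for g in G]
yb = [sum(UK[g]) / T for g in G]
_, sse_b = slope_and_sse(xb, yb)
sigma2_u = max(sse_b / (n - K) - sigma2_e / T, 0.0)  # truncated at zero if negative

# Quasi-demeaning weight for the FGLS transformation y_it - theta * mean_i(y).
theta = 1 - math.sqrt(sigma2_e / (T * sigma2_u + sigma2_e))
print("%.7f %.7f %.3f" % (math.sqrt(sigma2_e), math.sqrt(sigma2_u), theta))
# 0.0005015 0.0007195 0.817
```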
a. One-way Random Group Effect Model
In STATA, the .xtreg command has the re option to fit the one-way random effect model. This produces the FGLS estimates.
Command:
iis g
xtreg mgrt_uk mgrt_can, re theta
The theta option reports the estimated theta.
Output:
Random-effects GLS regression              Number of obs      =        42
Group variable (i): g                      Number of groups   =         3

R-sq:  within  = 0.2737                    Obs per group: min =        14
       between = 0.3568                                   avg =      14.0
       overall = 0.3034                                   max =        14

Random effects u_i ~ Gaussian              Wald chi2(1)       =     15.19
b. One-way Random Time Effect Model
To estimate a one-way random time effect model in Stata, we swap the roles of the group and time variables using the .tsset command.
Command:
tsset year g
Output:
panel variable: year, 1991 to 2004
 time variable: g, 1 to 3
Command:
Output:
Random-effects GLS regression              Number of obs      =        42
Group variable (i): year                   Number of groups   =        14

R-sq:  within  = 0.1952                    Obs per group: min =         3
       between = 0.7414                                   avg =       3.0
       overall = 0.3034                                   max =         3

Random effects u_i ~ Gaussian              Wald chi2(1)       =     17.42
corr(u_i, X)       = 0 (assumed)           Prob > chi2        =    0.0000
theta              = 0
c. Testing Random Effect Models
The Breusch-Pagan Lagrange multiplier (LM) test is designed to test for random effects. The null hypothesis of the one-way random effect model is that the individual-specific (or time-specific) error variance is zero, H0: Var(u) = 0. If the null hypothesis is not rejected, the pooled regression model is appropriate.
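The sketch below assumes the classic Breusch-Pagan (1980) form of the statistic for a balanced panel, computed from pooled OLS residuals e_it:

LM = nT/(2(T-1)) [Σ_i(Σ_t e_it)² / ΣΣ e_it² - 1]²

It evaluates the statistic twice, once with the effect indexed by country and once by year. On this data the two values agree with the chi2(1) statistics in the xttest0 outputs that follow. The chi-squared(1) p-value is computed with math.erfc, and the names are hypothetical.

```python
import math

UK = {
    "Egypt":  [0.0035, 0.0030, 0.0024, 0.0014, 0.0019, 0.0009, 0.0009,
               0.0005, 0.0007, 0.0008, 0.0008, 0.0007, 0.0007, 0.0008],
    "Iran":   [0.0004, 0.0005, 0.0004, 0.0001, 0.0002, 0.0001, 0.0001,
               0.0001, 0.0002, 0.0004, 0.0004, 0.0006, 0.0006, 0.0011],
    "Turkey": [0.0003, 0.0002, 0.0002, 0.0001, 0.0001, 0.0002, 0.0001,
               0.0000, 0.0001, 0.0000, 0.0001, 0.0002, 0.0002, 0.0002],
}
CAN = {
    "Egypt":  [0.0043, 0.0035, 0.0030, 0.0022, 0.0021, 0.0019, 0.0023,
               0.0019, 0.0016, 0.0015, 0.0013, 0.0013, 0.0013, 0.0012],
    "Iran":   [0.0044, 0.0044, 0.0043, 0.0028, 0.0020, 0.0018, 0.0016,
               0.0013, 0.0014, 0.0014, 0.0014, 0.0013, 0.0013, 0.0014],
    "Turkey": [0.0011, 0.0011, 0.0011, 0.0009, 0.0009, 0.0013, 0.0007,
               0.0007, 0.0005, 0.0005, 0.0004, 0.0004, 0.0004, 0.0004],
}
G = ("Egypt", "Iran", "Turkey")
T = 14

# Pooled OLS of uk on canada; residuals are kept per country.
x = [v for g in G for v in CAN[g]]
y = [v for g in G for v in UK[g]]
mx, my = sum(x) / len(x), sum(y) / len(y)
b = sum((xv - mx) * (yv - my) for xv, yv in zip(x, y)) / sum((xv - mx) ** 2 for xv in x)
a0 = my - b * mx
resid = {g: [yv - a0 - b * xv for xv, yv in zip(CAN[g], UK[g])] for g in G}

def bp_lm(blocks):
    """Breusch-Pagan LM statistic for Var(u) = 0 in a balanced one-way model.

    blocks: list of residual lists, one per level of the effect being tested.
    """
    n, t = len(blocks), len(blocks[0])
    ssr = sum(e * e for blk in blocks for e in blk)
    ratio = sum(sum(blk) ** 2 for blk in blocks) / ssr
    return n * t / (2 * (t - 1)) * (ratio - 1) ** 2

def chi2_1_pvalue(stat):
    """Upper-tail probability of a chi-squared variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))

lm_group = bp_lm([resid[g] for g in G])                        # u indexed by country
lm_time = bp_lm([[resid[g][t] for g in G] for t in range(T)])  # u indexed by year
print(round(lm_group, 2), round(lm_time, 2), round(chi2_1_pvalue(lm_time), 4))
```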
i. Testing Random Group Effects
In Stata, run the .xttest0 command right after estimating the one-way random group effect model.
Command:
quietly xtreg mgrt_uk mgrt_can, re i(g)
xttest0
Output:
Breusch and Pagan Lagrangian multiplier test for random effects:

        mgrt_uk[g,t] = Xb + u[g] + e[g,t]

        Estimated results:
                         |       Var     sd = sqrt(Var)
                ---------+-----------------------------
                 mgrt_uk |   6.05e-07       .000778
                       e |   2.51e-07       .0005015
                       u |   5.18e-07       .0007195

        Test:   Var(u) = 0
                              chi2(1) =    43.56
                          Prob > chi2 =     0.0000
Comments:
Since the null hypothesis is rejected (chi2(1) = 43.56, Prob > chi2 = 0.0000), we conclude that the pooled regression is inappropriate; the test indicates the presence of random group effects.
ii. Testing Random Time Effect
The null hypothesis of the one-way random time effect model is that the variance component for time is zero, H0: Var(u) = 0. In Stata, run the .xttest0 command right after estimating the one-way random time effect model.
Command:
tsset year g
Output:
panel variable: year, 1991 to 2004
 time variable: g, 1 to 3
Command:
quietly xtreg mgrt_uk mgrt_can, re i(year)
xttest0
Output:
Breusch and Pagan Lagrangian multiplier test for random effects:

        mgrt_uk[year,t] = Xb + u[year] + e[year,t]

        Estimated results:
                         |       Var     sd = sqrt(Var)
                ---------+-----------------------------
                 mgrt_uk |   6.05e-07       .000778
                       e |   5.93e-07       .0007702
                       u |          0             0

        Test:   Var(u) = 0
                              chi2(1) =     6.38
                          Prob > chi2 =     0.0116
Comments:
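Since Prob > chi2 = 0.0116 is greater than 0.01, the null hypothesis is not rejected at the 1% level of significance, although it would be rejected at the 5% level. At the 1% level we therefore conclude that the variance component for time is zero, which is consistent with the estimated Var(u) = 0 in the output.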
5. HAUSMAN TEST
5.1
How do we compare a fixed effect model with its random effect counterpart? From a purely practical point of view, the fixed effects model needs one intercept per unit, so it uses up many degrees of freedom, especially in panels with a large number of cross-sections. In such a case a random effects model would seem more appropriate. On the other hand, the fixed effects model has a unique advantage: it does not require the assumption that the individual effects are uncorrelated with the other explanatory variables. In the random effects model, this assumption is necessary.
5.2 HAUSMAN TEST
Whether the individual effects u_i are correlated with the other explanatory variables can therefore serve as the basis for choosing between the fixed effects and random effects models, and the Hausman test provides such a test statistic. The Hausman specification test examines whether the individual effects are uncorrelated with the other regressors in the model. The steps involved in the Hausman test are as follows:
a. Estimate the fixed-effect model
Command:
tsset g year
Output:
panel variable: g, 1 to 3
 time variable: year, 1991 to 2004
Command:
quietly xtreg mgrt_uk mgrt_can, fe
estimates store fixed_group
b. Estimate the random-effect model
Command:
quietly xtreg mgrt_uk mgrt_can, re
c. Run the Hausman test
Command:
hausman fixed_group
                 b = consistent under Ho and Ha; obtained from xtreg
                 B = inconsistent under Ha, efficient under Ho; obtained from xtreg

    Test:  Ho:  difference in coefficients not systematic

                  chi2(1) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =        0.02
                Prob>chi2 =      0.8839
Comments:
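The test does not reject the null hypothesis (Prob>chi2 = 0.8839): the differences between the fixed effect and random effect coefficients are not systematic. The random effect estimator is then both consistent and efficient, so we conclude that the random effect model is more suitable.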
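With a single slope coefficient, the matrix formula in the output reduces to H = (b - B)² / (V_b - V_B), which is compared with a chi-squared(1) distribution. The Python sketch below implements that scalar case. The inputs in the usage line are hypothetical numbers, not estimates from this report, because the coefficient table of the random effect model is not shown.

```python
import math

def hausman_scalar(b, B, var_b, var_B):
    """Hausman statistic for one coefficient, H = (b - B)^2 / (V_b - V_B), with its chi2(1) p-value.

    b, var_b: consistent (fixed effects) estimate and its variance;
    B, var_B: efficient (random effects) estimate and its variance.
    """
    dv = var_b - var_B
    if dv <= 0:
        raise ValueError("V_b - V_B must be positive for this scalar form")
    h = (b - B) ** 2 / dv
    return h, math.erfc(math.sqrt(h / 2))   # upper tail of chi-squared(1)

# Hypothetical inputs for illustration only.
h, p = hausman_scalar(b=1.0, B=0.9, var_b=0.02, var_B=0.01)
print(round(h, 4), round(p, 4))  # 1.0 0.3173
```

A statistic of exactly 0.02 would give p = erfc(0.1) ≈ 0.89. This is consistent with the reported Prob>chi2 = 0.8839, given that the displayed statistic is rounded.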
6. CONCLUSION
6.1 STATISTICAL PACKAGE
Panel data analysis can be carried out in several statistical packages, including SAS, LIMDEP, SPSS and Stata. We chose Stata for this report because it makes panel data easy to manipulate and is user-friendly.
6.2 DATA
A panel data set needs to be arranged in the long format to be used in STATA. If the number of groups (subjects) or time periods is extremely large, panel data models may be less useful, because the null hypothesis of the F test (that all group or time intercepts are equal) becomes too strong. In that case we may consider categorizing subjects to reduce the number of groups. If data are severely unbalanced, read the output with caution and consider dropping subjects with many missing data points. This document assumes that the data are balanced and have no missing values.
6.3
a. The Ordinary Least Squares regression model for the pooled data did not fit the data well, with a low R2 value (0.3034). This implies that there may be a fixed group effect producing different intercepts across groups.
b.
Fixed effect models are estimated by the least squares dummy variable (LSDV) regression and the within effect model. LSDV has three approaches to avoid perfect multicollinearity: LSDV1 drops a dummy, LSDV2 suppresses the intercept, and LSDV3 includes all dummies and imposes restrictions instead. LSDV1 is commonly used since it produces correct statistics, which is why we applied only LSDV1 to this data. LSDV1 fits the data better than the pooled OLS, with a much higher R2 value (0.6149 versus 0.3034).
c.
d.
Fixed effects are tested by the F-test and random effects by the Breusch-Pagan Lagrange multiplier test.
i. Testing Fixed Group Effects
The result is significant; therefore we reject the null hypothesis and conclude that the fixed group effect model is better than the pooled OLS model.
ii. Testing Fixed Time Effects
The result is not significant; therefore we fail to reject the null hypothesis and conclude that there is no fixed time effect.
iii. Testing Random Group Effects
The result is significant; therefore the pooled regression is inappropriate and random group effects are present.
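iv. Testing Random Time Effect
The result is not significant at the 1% level; therefore we fail to reject the null hypothesis and conclude that the variance component for time is zero.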
v. Hausman Test
The Hausman specification test compares a fixed effect model with a random effect model. The null hypothesis is that the individual effects are uncorrelated with the regressors; if it is rejected, the fixed effect model is preferred. In this case the null hypothesis is not rejected, which supports the random effects model for this data.