Tutorial Week 6: Stata Data Analysis Examples on Panel data regression models: Pooled model
A researcher is interested in examining the effect of public spending on education on
educational outcomes for the years 1992 through 1998. The data set is the district-level
analogue of the school-level data used by Papke (2005).
- The response variable of interest in this question is:
math4_it, the percentage of fourth graders in local district i receiving a passing score on
a standardized math test in year t.
- The key explanatory variable is:
rexpp_it, the real expenditures per pupil in district i in year t.
- Other independent variables are
real expenditures per pupil in the district.
enrol_it, the total enrolment rate in district i in year t.
lunch_it, the percentage of students in district i in year t eligible for the school lunch
program. (So lunch_it is a proxy for the district-wide poverty rate.)
- The model of interest is
= + . . . + + ( ) + + ( ) +
+ +
The data are in the file MATHPNL.dta. Open the file in Stata and answer the
following questions:
a. How many cross-section units and how many time series periods are in the data set?
Is the panel data balanced?
b. Estimate a pooled regression which includes an intercept along with the year dummies
to allow the unobserved district effect a_i to have a nonzero expected value.
c. Is the sign of the lunch coefficient what you expected? Interpret the magnitude of the
coefficient.
d. What are the estimated effects of the spending variables? Is the sign of the lunch
coefficient what you expected? Interpret the magnitude of the coefficient.
e. Does the district poverty rate have a large impact on test pass rates? Explain.
f. Use first differencing to estimate the model in part (b). The simplest approach is to
allow an intercept in the first-differenced equation and to include dummy variables for
the years 1994 through 1998. Interpret the coefficient on the spending variable.
g. Now, add one lag of the spending variable to the model and reestimate using first
differencing. Note that you lose another year of data, so you are only using changes
starting in 1994. Discuss the coefficients and significance on the current and lagged
spending variables.
h. Obtain heteroskedasticity-robust standard errors for the first-differenced regression in
part (g). How do these standard errors compare with those from part (g) for the
spending variables?
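
The estimation steps in parts (a), (b), (f), (g) and (h) could be sketched in Stata roughly as follows. This is a minimal outline and not a model answer. It assumes that MATHPNL.dta sits in the working directory and contains a district identifier named distid, a variable year, and the variables math4, rexpp, enrol and lunch. The names lrexpp_s and lenrol_s are hypothetical, chosen so that they do not clash with any log variables already stored in the file. The regressor list is also an assumption and should be adjusted to match the model stated above.

```stata
* Sketch only: variable names and the regressor list are assumptions.
use MATHPNL.dta, clear

* (a) Declare the panel structure, then inspect the number of districts,
*     the number of years, and whether every district is observed each year.
xtset distid year
xtdescribe

* Log transforms of spending and enrolment (hypothetical names).
generate lrexpp_s = ln(rexpp)
generate lenrol_s = ln(enrol)

* (b) Pooled OLS with an intercept and a full set of year dummies.
regress math4 i.year lrexpp_s lenrol_s lunch

* (f) First-differenced equation with an intercept and dummies for
*     1994 through 1998 (1993 is the base year of the differenced sample).
regress D.math4 ib1993.year D.lrexpp_s D.lenrol_s D.lunch

* (g) Add one lag of the differenced spending variable; the estimation
*     sample now starts with the 1994 change, so 1994 is the base year.
regress D.math4 ib1994.year D.lrexpp_s LD.lrexpp_s D.lenrol_s D.lunch

* (h) Same regression with heteroskedasticity-robust standard errors.
regress D.math4 ib1994.year D.lrexpp_s LD.lrexpp_s D.lenrol_s D.lunch, vce(robust)
```

The D. and LD. operators only work after xtset, because Stata needs the panel and time variables to form differences within each district. Comparing the standard errors on D.lrexpp_s and LD.lrexpp_s across the last two regressions addresses part (h).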