Notes 14
Specification searches
Experiments vs. non-experiments
o If we can do random controlled experiments, then we don’t need to worry about
omitted variables bias because the regressor of interest (treatment effect) is
random and uncorrelated with everything that might be omitted.
Controlled experiments are becoming more common in economics
Development projects may assign villages randomly to treatment or control
groups
Policies can sometimes be applied randomly to treatment and control
groups
Is it ethical to withhold “treatment” if we know that it is likely to
be beneficial?
Of course, experimental economics has long put experimental subjects
into controlled settings randomly.
o Most often, we must use the “fallen fruit” of “natural experiments” or
observational data
Examples:
State policy differences such as the seat-belt law regressions we
looked at earlier in the semester
Cross-country growth regressions in which countries differ in
variables such as initial per-capita income that are supposed to
affect growth
In these cases, we must worry about selection and omitted-variable bias
Can we control for the other variables that are correlated with
selection into the “treatment group” (or with the regressor of
interest)?
If not, our results are biased
Idealized econometric project
o Theory tells us exactly which variables should be in the regression as controls
o All regressors are measured accurately
o We know about any endogeneity issues and can deal with them using
instrumental variables
o We know the appropriate structure of the error term
o In this case, we need only do one regression to complete the project
o None of these conditions is ever fully realized
That’s why we have tests for the various regression pathologies
That’s (one reason) why we have tests for significance of regressors
That’s why we look at our residuals for clues
That’s why we usually try linear and log-based models
That’s why we have to experiment with different lag lengths
In real research, we must deal with what Leamer calls “misspecification error” which,
like sampling error, generally causes our results to be imprecise
o Consider the regressions that you ran with the 254,654 Census observations on
further fertility of mothers with two children.
How much sampling error is there when N = 254,654? If all of our
assumptions are correct, then the sampling error of our estimates shrinks in
proportion to the square root of N, and √254,654 ≈ 505, so the standard errors
with this sample are roughly 1/500 of the standard deviation of a single observation!
What are the likely relative magnitudes of sample error and
misspecification error in this exercise?
o Null hypotheses and maintained hypotheses
In any statistical test, we make lots of assumptions
Some of the assumptions are “givens” such as functional form, structure
of the error term, IID (or other assumed nature) of the sample, etc.
These are the “maintained hypotheses” that are assumed to be true
in the test.
We usually assume that we have made no misspecification errors
as a maintained hypothesis.
Some of the assumptions are tested.
These are the null hypothesis.
We are not sure that these are true; in fact, we usually expect to
disprove the null hypothesis.
o What does a hypothesis test do?
It measures the likelihood that a test statistic as extreme as the one observed
would occur if both the null hypothesis and the maintained hypotheses are true.
However, we interpret evidence against this joint set of assumptions as
invalidating the null hypothesis, not the maintained hypothesis.
In fact, what we have found is evidence that the world is not as the null
and maintained hypotheses assume it is.
This could be due to the null hypothesis being false with the
maintained hypotheses true (which is what we always assume)
Or it could be that the maintained hypothesis (or one part of it) is
false and the null hypothesis is true (which is the essence of an
invalid test: we have made incorrect assumptions underlying the
test)
Or both could be false (an invalid test that gives the right answer)
By separating the assumptions into null and maintained classes, we
artificially define which ones we are going to blame for any failure of the
data to conform to the collective set of assumptions.
If we do this wrong, then we obviously can draw incorrect conclusions.
Leamer on functional form
o With a high-enough order polynomial, we can exactly fit the data!
Leamer is a proponent of “Bayesian” econometrics, which we may study next week
if there is interest.
o In a Bayesian model, one specifies a "prior" distribution for the parameter before
beginning the analysis
o Then the evidence from the data is combined with the prior to calculate a
“posterior” distribution
o Criticized because you can get nearly any posterior distribution by varying your
prior.
o Shouldn’t your results reflect the evidence from the data and not your opinions?
Leamer’s point exactly!
Conventionally reported results reflect your opinion as much as a Bayesian
posterior, but you haven’t reported how your opinion conditioned your
results.
How do we solve this problem? Page 38:
Econometricians are still too optimistic about how much they know.
Robust standard errors are not a panacea because we still have inefficient
estimators.
Sensitivity analysis is crucial: show the mapping from assumptions to conclusions!
Experiments may be problematic in small samples if we don’t observe and control
for all the confounding variables:
“Interactive confounders” are variables that affect the effect of our variable of interest
on the dependent variable (needing interaction terms). Leaving these out is
problematic even in truly randomized experiments.
Do “data-generating processes” really exist? Are they stable? Are people rational
enough to yield predictable econometric relationships?
Modern computers and software have made the actual computations of econometrics
trivially easy, but the “thinking” part is just as hard as ever.
This is not going to change! Your generation of econometricians will face ever
greater temptation to “push the button” and get results without thinking about the
correct underlying assumptions, then publish them if they “look nice.”
Publication bias
If you look at econometric papers published in journals, most null hypotheses are
rejected.
o The papers published that accept the central null hypothesis tend to fail to reject
hypotheses that are widely believed to be false.
Are all economic hypotheses false? De Long and Lang build a simple model to test this:
o Size of test: $\alpha = 0.05 = \Pr(\text{reject} \mid H_0 \text{ true})$.
o Power of test: $q = \Pr(\text{reject} \mid H_0 \text{ false}) = 1 - \Pr(\text{accept} \mid H_0 \text{ false})$.
o Suppose that the true proportion of true null hypotheses is $\lambda$.
Let a be a test statistic and let f(a) be its marginal significance level (p value).
Under the null hypothesis, $f(a) \sim U(0,1)$, so $\Pr[f(a) \ge p] = 1 - p$.
Under the alternative hypothesis, f(a) follows some unknown distribution G, so that
$\Pr[f(a) \ge p] = 1 - G(p)$. We assume $1 - G(p) \le 1 - p$: p values tend to be smaller when the null is false.
The share of test statistics with p values of at least p (= the share of nulls not rejected
at significance level p) should be
$\Pr[f(a) \ge p] = \lambda(1-p) + (1-\lambda)[1-G(p)] = \lambda(1-p) - \lambda[1-G(p)] + [1-G(p)]$,
so that
$\lambda = \dfrac{\Pr[f(a) \ge p] - [1-G(p)]}{(1-p) - [1-G(p)]} \le \dfrac{\Pr[f(a) \ge p]}{1-p}$.
This gives an upper bound for $\lambda$.
For example, if $\lambda = 1/2$, then $\Pr[f(a) \ge 0.8] \ge 0.50 \times 0.20 = 0.10$: at least 10% of actual p values should be in
the range (0.80, 1.00).
These are point estimates. They can reject the null hypothesis that $\lambda = 1/3$ against the
alternative that $\lambda < 1/3$. Thus, they are quite confident, from the evidence of the
literature, that $\lambda < 1/3$.
Why? They think the most likely explanation is publication bias.
o How much of a problem is this? Are the actual expected values and (especially)
standard errors of the distributions that different?
o We can use simulation to determine the properties of our standard test statistics
under the null hypothesis when the assumptions we usually make fail to hold.
Simulation methods
o Monte Carlo analysis is the simulation of the behavior of estimators under
controlled conditions that may deviate from the standard assumptions under
which they are derived.
o Bootstrap methods apply simulation to a specific sample of data, re-running a
regression many times with either parametric or non-parametric error terms to
estimate the standard deviation of the test statistic under H0 (rather than using the
conventional standard error as an estimate).
Generating data for simulations
o Can use actual variables (as Lovell did in his second data-mining experiment
with macro variables) or can generate them “randomly”
o Error terms are always generated randomly.
o Random-number generators
No computer-generated sequence of numbers is truly random.
The way these generators work is to begin with a “seed,” then generate
new numbers in the sequence based on calculations such as remainders of
division by large prime numbers.
Same seed implies same sequence of numbers, so if you want to
control the process (especially during debugging) you can get the
same sequence again.
Default seed is usually taken from the seconds of the computer
clock or something like that: will not be the same on repeated
execution.
In Stata: runiform (or uniform) draws a random number from
(0, 1). rnormal(mean, std) draws from the normal distribution
with mean and standard deviation given.
Can generate normal variate as invnorm(runiform())
o We generate a random set of e* values and use them to compute y* under the null
hypothesis about $\beta$, given the values of x, which may be set to sample values,
generated randomly, or something else (see the sketch below).
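For concreteness, here is a minimal Stata sketch of this data-generating step; the sample size, the seed, the parameter values under the null (intercept 1, slope 0), and all variable names are assumptions for illustration.

* Generate x and e*, then build y* under the null hypothesis (here H0: beta1 = 0)
clear
set obs 100
set seed 20240101              // fixed seed so the sequence can be reproduced
gen x = rnormal(0, 1)          // x could instead be set to actual sample values
gen estar = rnormal(0, 1)      // e* drawn from the assumed error distribution
gen ystar = 1 + 0*x + estar    // y* = beta0 + beta1*x + e* with beta1 = 0 under H0
reg ystar x                    // one replication's estimate and test of H0: beta1 = 0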
Implementing Monte Carlo
o Create repeated samples (how many? 1000? 10000? 100000?) of e* and y*.
o For each sample, calculate the test statistic of interest: $\hat{\beta}$, $se(\hat{\beta})$, $t_{\hat{\beta}}$, or anything
else.
o Accumulate the estimates in a new data set.
o Examine the properties of the estimates:
Mean to assess bias
Standard deviation to compare to estimated standard error
Quantiles to assess critical values or estimate p values for your estimates
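If the accumulated estimates are stored as variables, say b and se as in the simulate example below, these checks are one-liners in Stata; the variable names here are assumptions.

summarize b                          // is the mean of b close to the true beta? (bias check)
summarize se                         // compare the mean reported SE with the std. dev. of b above
centile b, centile(2.5 5 95 97.5)    // empirical critical values / quantiles for p values
histogram b                          // eyeball the simulated sampling distribution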
Bootstrap standard errors
o If the assumptions of OLS are not valid for your sample, you can estimate the
standard errors of your OLS estimates by using a bootstrap technique
o Use your actual x variables, sample size, etc.
o Generate a sample of e* error terms
Can use a normal distribution based on the SEE as estimate of standard
deviation
Can use "re-sampling," assigning randomly drawn $\hat{u}_i$ values to observation j.
o Calculate sample of y* values.
o Run regression of y* on actual x values
o Save the estimated coefficients $\hat{\beta}_k$ for the kth replication.
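A rough sketch of such a residual-resampling bootstrap in Stata follows; the variable names (y, x, yhat, ehat, estar, ystar) and the program name bsreg are made up for illustration, and the normal-error variant would replace the resampling line with a draw from rnormal() scaled by the SEE.

* Fit the model once and save fitted values and residuals
regress y x
predict double yhat, xb
predict double ehat, residuals

capture program drop bsreg
program define bsreg
    * assign a randomly drawn residual uhat_i to each observation j
    gen double estar = ehat[floor(runiform()*_N) + 1]
    gen double ystar = yhat + estar       // y* from the fitted model plus resampled errors
    regress ystar x
    drop estar ystar
end

simulate b=_b[x], reps(1000): bsreg
summarize b     // the std. dev. of b is the bootstrap estimate of the standard error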
* Fragment of the replication program (apparently the spurious program used with simulate below)
* Build x as a random walk: x_t = x_(t-1) + a_t
replace x = l.x + a if id > 1
* Run regression of y on x
reg y x
end
o Show single replication
o What can be retrieved? ereturn list shows available results
o Command to invoke simulation:
simulate b=_b[x] se=_se[x] r2=e(r2) , reps(1000) : spurious
o Creates data set with 1000 observations with variables b, se, r2
o Can now use summarize, centile, and histogram to look at behavior of estimates.
o Contrast dspurious with spurious to see effect of regression on integrated
variables.
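For reference, a complete program along these lines might look like the following sketch; the setup commands, the sample size, and the shock names a and e are assumptions based on the fragment above.

capture program drop spurious
program define spurious
    drop _all
    set obs 100
    gen id = _n
    tsset id                    // so the lag operator l. works
    gen a = rnormal()           // shocks driving x
    gen e = rnormal()           // shocks driving y, independent of a
    gen x = a
    gen y = e
    * Build x and y as independent random walks
    replace x = l.x + a if id > 1
    replace y = l.y + e if id > 1
    * Run regression of y on x: any apparent relationship is spurious
    reg y x
end

A dspurious variant would presumably regress the first differences (d.y on d.x) instead, which removes the spurious-regression problem.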
o Not missing at random (NMAR): Probability that the observation/variable is
missing depends on the true value of that variable.
Methods of dealing with missing data
o Complete-case analysis
This is the default: Stata will simply delete any observations for which
one or more variables in the model are missing.
We lose information by doing this.
Example: Suppose that we are missing one observation on x out of ten
and that the coefficients based on the other nine observations are y = 5 +
10x. The missing observation has a y value of 25. By omitting this
observation, we are implicitly assuming that the x value is 2, so that it
will not have a residual and not add to the regression. If the univariate
distribution of x in the rest of the sample is such that a value of 2 seems
highly unlikely, then we are almost surely missing important information
about the relationship by ignoring this observation.
Complete-case analysis does not lead to bias if missingness does not
depend on y. (This is the standard sample-selection problem that we have
dealt with before.)
o Available-case analysis
Regression coefficients and standard errors depend only on the sample
variances and covariances of the variables.
Even if y is missing for an observation, if x1 and x2 are available, we can
use that observation to contribute to the estimate of the variances of the x
variables and to their covariance.
This seems to use additional information, but it has other problems and is
rarely used.
Because it uses different groups of observations for different moments,
there is no guarantee that the implied X′X matrix has an inverse, so it
may even be impossible to calculate the OLS estimates.
o Dummy-variable methods
Suppose $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$, where
X1 is complete but X2 has some missing data.
Let $M_i = 1$ if $X_{2i}$ is missing and $M_i = 0$ otherwise.
Let $X_{2i}^0 = X_{2i}$ if $M_i = 0$ and $X_{2i}^0 = 0$ if $M_i = 1$.
The regression $Y_i = \beta_0 + \delta_0 M_i + \beta_1 X_{1i} + \beta_2 X_{2i}^0 + u_i$ is biased for $\beta_1$.
$\beta_1$ picks up the effect of the unobserved variation in X2.
$Y_i = \beta_0 + \delta_0 M_i + \beta_1 X_{1i} + \delta_1 X_{1i} M_i + \beta_2 X_{2i}^0 + u_i$ is unbiased, but it is difficult to
implement unless the pattern of missingness is "block-style."
o Imputation methods
If there is an irregular pattern in which several variables have missing
observations scattered through the sample (and the same observations do
not tend to be missing for all variables), then we have some information
about the observations for which a particular variable is missing based on
the observed values of other variables.
Imputation methods use the values of the other variables (and the pattern
of covariance between the observed and missing variables for the part of
the sample for which both are observed) to impute estimates of the
missing values.
Unconditional imputation replaces missing values by the means of the
variables.
This leads to bias in the coefficients because the other variables
that are correlated with the missing one have to carry “extra
weight” in predicting y for those observations in which the
missing X is set to its mean.
Conditional imputation based on other X variables
Use the complete cases to estimate $X_{2i} = \delta_0 + \delta_1 X_{1i} + v_i$.
o Could use an LDV model if appropriate.
Calculate single imputed values for the missing observations as
$\hat{X}_{2i} = \hat{\delta}_0 + \hat{\delta}_1 X_{1i}$.
Multiple imputation: use the complete cases to estimate $X_{2i} = \delta_0 + \delta_1 X_{1i} + \delta_2 Y_i + v_i$.
Note that we can (and must) include y here when we are using
random draws from the distribution rather than expected values.
Can use LDV methods if the missing variable is a dummy,
ordered, censored, etc.
Calculate m random imputed samples using $X_{2i}^{(j)} = \hat{\delta}_0 + \hat{\delta}_1 X_{1i} + \hat{\delta}_2 Y_i + v_i^{(j)}$,
where $v_i^{(j)}$ is a random draw from the estimated distribution of v, usually
normal with zero mean and variance equal to the estimated variance of v
based on the residuals.
For each sample j, run the regression using imputed values,
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$, and get the estimates $\hat{\beta}_{ij}$ and the squared
standard errors $\widehat{\text{var}}(\hat{\beta}_{ij})$.
Combine the results of the m regressions as follows:
$\hat{\beta}_i = \dfrac{1}{m}\sum_{j=1}^{m} \hat{\beta}_{ij}$
$\widehat{\text{var}}(\hat{\beta}_i) = \dfrac{1}{m}\sum_{j=1}^{m} \widehat{\text{var}}(\hat{\beta}_{ij}) + \dfrac{1}{m-1}\sum_{j=1}^{m} \left(\hat{\beta}_{ij} - \hat{\beta}_i\right)^2$
The parameter estimate is just the mean of the estimates for the m
imputed samples.
The variance is the mean of the estimated variances in the m
samples, plus the estimated variance of the parameter estimate
across the samples.
o This last term corrects the standard error for the
imputation process, adding variance to account for the
fact that the m imputations do not all lead to the same
answer.
o Because a highly uncertain imputation process is likely to
lead to wide variation in ˆ ij across samples, this
correction to the variance will be high when the
imputation process is imprecise.
Stata 11+ has an implementation of MI models with a “dashboard” to
control imputation regressions (which can be OLS, probit, tobit, ordered
probit, etc.) and the combined regression using the multiple imputations.
MICE works with MCAR or MAR data (see the sketch below).
o Can also use ML methods to estimate missing-data models (not going to talk
about)
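As a rough illustration of the Stata mi commands mentioned above, a minimal sequence might look like the following sketch; the variable names (y, x1, x2), the number of imputations, and the seed are assumptions.

* X2 has missing values; impute it from X1 and Y, then combine the regressions
mi set wide
mi register imputed x2
mi register regular y x1
mi impute regress x2 = x1 y, add(20) rseed(12345)   // create m = 20 imputed data sets
mi estimate: regress y x1 x2                        // run on each and combine by the rules above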
Models with varying parameters
We have talked a lot about Assumption #0: The same model applies to all observations.
o What if this is false and the model changes from one set of observations (either
over time or cross-sectionally) to another?
o We can model this by allowing some parameters of the model to vary across
observations.
o We have considerable experience with simple, deterministic forms of varying
parameters:
Dummy variables allow the constant term to differ for the set of
observations for which the dummy is turned on.
Interaction terms allow the effect of one variable to depend on the
magnitude of another (where one or both may be dummies).
Splitting samples at recognized breakpoints is another strategy.
o We now consider models in which the variation in the parameters is at least
partially random, especially over time.
Stationary random parameter models
o $Y_t = \alpha + \beta_t X_t + u_t$, with $\beta_t = \beta + \gamma Z_t + v_t$.
o Substituting yields $Y_t = \alpha + \beta X_t + \gamma X_t Z_t + w_t$, where $w_t = u_t + v_t X_t$.
o Our usual assumptions are that u and v are classical error terms that are
uncorrelated with one another. In that case, $\text{var}(w_t) = \sigma_u^2 + \sigma_v^2 X_t^2$ and w is not
serially correlated unless u or v is.
o This model is heteroskedastic with variance proportional to $1 + \lambda X_t^2$, where
$\lambda = \dfrac{\sigma_v^2}{\sigma_u^2}$.
o How to estimate?
Could use OLS with robust standard errors (did not exist when Maddala
wrote his book).
Maddala suggests ML with
$\ln L = K - \dfrac{n}{2}\ln \sigma_u^2 - \dfrac{1}{2}\sum_{t=1}^{n} \ln\left(1 + \lambda X_t^2\right) - \dfrac{1}{2\sigma_u^2}\sum_{t=1}^{n} \dfrac{\left(Y_t - \alpha - \beta X_t - \gamma X_t Z_t\right)^2}{1 + \lambda X_t^2}$,
with K an irrelevant constant.
Can do this with a two-step procedure:
For given $\lambda$, the $\alpha$, $\beta$, and $\gamma$ that maximize L are the WLS estimators
calculated by applying OLS to
$\dfrac{Y_t}{\sqrt{1+\lambda X_t^2}} = \alpha \dfrac{1}{\sqrt{1+\lambda X_t^2}} + \beta \dfrac{X_t}{\sqrt{1+\lambda X_t^2}} + \gamma \dfrac{X_t Z_t}{\sqrt{1+\lambda X_t^2}} + \text{error}$.
Search over $\lambda$ to find the value that yields the highest L, with $\alpha(\lambda)$, $\beta(\lambda)$,
and $\gamma(\lambda)$ calculated by WLS/OLS (see the sketch below).
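A rough Stata sketch of this grid search, not from the notes: it assumes variables y, x, and z are in memory, searches lambda over 0 to 5 in steps of 0.1, and evaluates the concentrated log likelihood at the WLS estimates for each lambda.

gen xz = x*z
quietly count
local n = r(N)
local bestlnL = .
local bestlam = .
forvalues i = 0/50 {
    local lam = `i'/10                        // grid: lambda = 0, 0.1, ..., 5.0
    gen double h = 1 + `lam'*x^2              // var(w_t) proportional to 1 + lambda*x^2
    quietly regress y x xz [aweight = 1/h]    // WLS estimates of alpha, beta, gamma given lambda
    predict double res, residuals
    gen double q = res^2/h
    quietly summarize q
    local sig2u = r(sum)/`n'                  // ML estimate of sigma_u^2 given lambda
    gen double lnh = ln(h)
    quietly summarize lnh
    local lnL = -0.5*`n'*ln(`sig2u') - 0.5*r(sum)   // ln L up to an irrelevant constant
    if `bestlnL' == . | `lnL' > `bestlnL' {
        local bestlnL = `lnL'
        local bestlam = `lam'
    }
    drop h res q lnh
}
display "lambda maximizing lnL: " `bestlam'
gen double h = 1 + `bestlam'*x^2
regress y x xz [aweight = 1/h]                // WLS at the best lambda: alpha, beta, gamma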
Switching regressions: two (or more) regimes with different parameters
o We considered the simple case of this with the Quandt likelihood-ratio (QLR)
test when we talked about nonstationarity due to breaks in S&W’s Chapter 14.
The QLR test statistic is the maximum of the Chow-test F statistic
considered over possible breakpoints within the middle 70% (or so) of the
sample.
S&W’s Table 14.6 gives the critical values for the QLR test statistic,
which does not follow a standard parametric distribution.
o More interesting case is where model can switch back and forth depending on
values of other variables.
Example: is the economic response to oil-price increases different from the
response to oil-price decreases? One set of parameters when $\Delta P_O$ is positive and a
different set when it is negative.
This is simple case because there are no unknown parameters in
the switching rule.
More interesting case is where the switching rule involves
unknown parameters.
Suppose that the parameters are in regime 1 ($Y_t = \alpha_1 + \beta_1 X_t + u_t$) when
$\gamma_1 Z_{1t} + \ldots + \gamma_k Z_{kt} \le c$ and in regime 2 ($Y_t = \alpha_2 + \beta_2 X_t + u_t$) when this index is $> c$.
Error term may also differ between regimes.
Can estimate by ML, which is kind of like a regression (to determine the
$\alpha$ and $\beta$ parameters) combined with a probit (to determine the $\gamma$ parameters
governing which regime applies to each observation).
o Another model of interest is the single-breakpoint model constraining the
function to be continuous over time.
Example: fitting a trend line to the log of a variable and allowing the
trend growth rate to change at some date without allowing the function
to jump at that date.
Let $n_0$ be the breakpoint in the sample, so that $Y_t = \alpha_1 + \beta_1 X_t + u_t$ for $1 \le t \le n_0$
and $Y_t = \alpha_2 + \beta_2 X_t + u_t$ for $n_0 < t \le N$.
Both regression lines must go through the same point at $n_0$, so we must impose
the restriction $\alpha_1 + \beta_1 X_{n_0} = \alpha_2 + \beta_2 X_{n_0}$ on the estimation. This is a simple
linear restriction that can be imposed in OLS by the usual means (or, equivalently,
with a linear spline, as in the sketch below).
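A rough Stata sketch of the spline approach; the breakpoint (observation 50) and the variable name lny are assumptions.

gen t = _n                 // time trend
mkspline t1 50 t2 = t      // linear spline in t with a knot at t = 50 (the assumed n0)
regress lny t1 t2          // trend slopes before and after n0; the fitted line cannot jump at n0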
Adaptive regression: constant term is a random walk.
o This model was developed before the theory of integrated processes was well
understood.
o The model that they propose has issues with an integrated error term (and
dependent variable) that are better handled with differencing and (sometimes)
cointegration methods.
o Can look at the more interesting model where slope is a random walk as well.
Cannot estimate all T of the $\alpha_t$ parameters.
Can estimate one of them: the suggestion is to estimate the last one (or the
one after the last).
For the varying constant term, $\alpha_t = \alpha_{t-1} + v_t$. Writing $\alpha_t = \alpha_T - \sum_{i=t+1}^{T} v_i$,
$Y_t = \alpha_t + \beta X_t + u_t = \alpha_T + \beta X_t + \left(u_t - \sum_{i=t+1}^{T} v_i\right)$.
We lose a lot of degrees of freedom in this model. In the limiting case of
all coefficients varying deterministically across units, we are just doing
separate time-series regressions for each unit.
o If the varying parameter is a slope coefficient and variation is random, then we
have a variant of the random effects model in which the variance of the “unit-
specific error component” for each unit depends on the values of x for that unit.
When to use varying-parameter models?
o Can almost always justify it.
o What do we really gain from modeling the variation in the coefficients rather
than putting in the error term?
If variation is systematic, then we have a better understanding of how the
effect of x on y depends on Z. This is the essence of interaction terms and
we know that they can be very useful.
If the variation is random, then we may not gain much, although the
adaptive regression model is appealing; and if there are large variations in
x, then we might want to take account of a randomly varying coefficient
on x.
The hazard rate is defined as the probability that the spell ends now conditional on the
fact that it has lasted this long:
o $\lambda(t) = \lim_{\Delta t \to 0} \dfrac{\Pr(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \lim_{\Delta t \to 0} \dfrac{F(t + \Delta t) - F(t)}{\Delta t \, S(t)} = \dfrac{f(t)}{S(t)} = \dfrac{f(t)}{1 - F(t)}$.
Note the similarity to the inverse Mills ratio.
o $\lambda(t) = -\dfrac{d \ln S(t)}{dt}$ because $f(t) = -S'(t)$.
The integrated hazard function is $\Lambda(t) = \int_0^t \lambda(s)\, ds$.
o $S(t) = e^{-\Lambda(t)}$
o $\Lambda(t) = -\ln S(t)$
All of these functions can (obviously) be derived from one another, so f, F, S, $\lambda$, and $\Lambda$
are all equivalent ways to characterize the hazard behavior of the model as a function of
current duration t.
Modeling the hazard rate:
o Constant hazard rate:
$\lambda(t) = \lambda$,
$\ln S(t) = k - \lambda t$,
$S(t) = K e^{-\lambda t} = e^{-\lambda t}$ because $S(0) = 1$.
With a constant hazard rate, $E(t) = 1/\lambda$, so the MLE of $\lambda$ is $1/\bar{t}$.
o Positive or negative duration dependence
Greene’s T25.8 and F25.2 show several common choices for non-
constant functions
Weibull is a common one because depending on the parameter p it can be
increasing or decreasing with t.
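In Stata, parametric duration models of this kind are fit with the st commands. A rough sketch follows; the variable names (spell for duration, ended for the failure indicator, x for a covariate) are assumptions.

stset spell, failure(ended)           // declare the duration data (ended = 0 for censored spells)
streg x, distribution(exponential)    // constant hazard rate
streg x, distribution(weibull)        // hazard rises (p > 1) or falls (p < 1) with duration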
o In case of hazard models, we use the analog of a frequency distribution:
What share of spells that lasted two weeks ended in the third week?
What share of spells that lasted three weeks ended in the fourth week?
Etc.
o Plot these as a function of duration to get empirical hazard function
o Advantages: no distributional assumption, can model unusual shapes
o Disadvantages: does not invoke smoothness assumptions that may be
appropriate, difficult to model effects of other variables
Quantile regression
We get so used to the basic idea of traditional regression analysis that we sometimes
forget important details about what we are doing.
o Standard regression estimates the conditional mean of y as a function of x.
o What about other properties of the conditional distribution of y?
o We sometimes talk about the estimated conditional standard deviation (SEE),
but rarely about any other attributes of the distribution.
If y follows a normal distribution, then we can calculate the whole
distribution from the mean and standard deviation.
If y is not normal, then we generally don’t know all the details of the
distribution.
o There may be much more useful information embodied in the conditional
distribution than just the mean.
o Consider Figure 1 from Koenker & Hallock:
It provides the quartiles, range, median, and arithmetic and geometric means of
CEO compensation for each decile of firm size.
What would regression give us?
Equivalent of a line connecting the means (either arithmetic or
geometric if we used a log function)
This is an example of the kind of expanded view of the conditional
distribution that we can get from quantile regression, which looks at how
the quantiles of the distribution of the dependent variable depend on the
regressor.
Moments and quantiles as minimization problems:
o The unconditional mean is the value of $\mu$ that minimizes $\sum_{i=1}^{n} (y_i - \mu)^2$.
o The unconditional median is the value of m that minimizes $\sum_{i=1}^{n} |y_i - m|$.
o The unconditional $\tau$th quantile is the value of $q_\tau$ that minimizes $\sum_{i=1}^{n} \rho_\tau(y_i - q_\tau)$, where
$\rho_\tau$ is the "tilted absolute value" function $\rho_\tau(x) = \tau x$ if $x \ge 0$ and $\rho_\tau(x) = (\tau - 1)x$ if $x < 0$.
Generalizing to the conditional regression situation:
o In standard parametric regression, we let the mean $\mu$ depend on x.
o In quantile regression, we let the $\tau$th quantile be a function of x:
$\min_{\beta_\tau} \sum_{i=1}^{n} \rho_\tau\left(y_i - \xi_\tau(x_i, \beta_\tau)\right)$, which for the linear case is $\min_{\beta_\tau} \sum_{i=1}^{n} \rho_\tau\left(y_i - x_i'\beta_\tau\right)$.
Food expenditure as a function of income:
OLS regression gives us the dashed line.
The conditional median of the distribution of food expenditure, as a linear function of
income, is the bold line.
The other lines are the 0.05, 0.1, 0.25, 0.75, 0.9, and 0.95 quantiles of the distribution as
linear functions of income.
o Under standard OLS regression, the distribution of food expenditure conditional
on income would be assumed to be normal with mean given by the dashed line
and constant variance given by SEE². (Regressing in log terms would allow the
variance to be proportional to x.)
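In Stata, these conditional-quantile lines can be estimated with qreg and sqreg; a rough sketch, with foodexp and income as assumed variable names:

qreg foodexp income, quantile(0.5)     // conditional median (LAD) regression
qreg foodexp income, quantile(0.1)     // a lower quantile of the conditional distribution
qreg foodexp income, quantile(0.9)     // an upper quantile
sqreg foodexp income, quantiles(0.05 0.1 0.25 0.5 0.75 0.9 0.95) reps(200)   // several at once, bootstrapped SEs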
o Note effect of college graduates: Much less likely to have a very small baby
(strong effect at low quantiles) but not much more likely to have a very large
baby (little effect at upper quantiles)
Enrollment depends on both aid at X (in the equation) and aid offered
elsewhere (in the error term), which leads to correlation between aid and the error.
The effect of aid is likely biased downward because of this.
o Could we randomize? Would any school be willing to increase aid for a random
selection of students to see which ones come? (Perhaps not)
The idea of RD is that we sometimes have arbitrary, discrete thresholds where people
who are nearly identical but on opposite sides of the threshold are treated differently.
o This allows us to estimate a “treatment effect” by comparing the nearly identical
people on the two sides of the line; we can think of these as natural experiments
o Examples:
Cutoff birthdays for school attendance: September 2 babies are a year
older when they start school than August 31 babies: does this affect their
outcomes?
Laws sometimes have arbitrary cutoff points: If unemployment insurance
lasts 26 weeks, is the likelihood of an unemployed worker taking a job
higher in the 27th week than the 26th week?
van der Klaauw: College X has arbitrary thresholds for awarding discrete
levels of aid: are students just above the threshold (who get more aid)
more likely to attend than nearly identical students just below the
threshold?
Manacorda et al.: Welfare program in Uruguay that was designed by
economists to be based on “predicted” income to avoid mis-reporting and
fluctuations in annual income. Households immediately on both sides of
the line are very similar, but one gets the transfer and the other doesn't. Are
households getting the transfer more likely to support the government?
Manacorda result:
Classic RD design is illustrated by the van der Klaauw paper's Figure 1:
o The treatment here depends sharply on the value of the fully observable selection
variable S: people with S above the threshold $\bar{S}$ are in the treatment group and
people below it are in the control group.
o The gap in the relationship at $\bar{S}$ is the effect of crossing the threshold, which
could be an unbiased measure of the effect of the treatment variable.
o Econometrically, we estimate the relationships on both sides (which may or
may not have the same slope) and then estimate the treatment effect as
$\lim_{S \downarrow \bar{S}} E(y \mid S) - \lim_{S \uparrow \bar{S}} E(y \mid S)$.
o We can estimate this in the simplest case as van der Klaauw's equation (8):
$y_i = \alpha T_i + k(S_i) + \varepsilon_i$, where T is a treatment dummy and k(S) is the general
relationship between y and S ignoring the treatment (which could be a linear or
nonlinear function, shown as linear here). A rough Stata sketch of this sharp-design
regression appears at the end of this subsection.
o If the relationship between T and S is not sharp, then there may be some people
close to S who are put into the “wrong” category. This is the “fuzzy” RD design
illustrated by the selection criteria in Figure 2 of the paper:
o In this case, we have to use the "predicted T" rather than the actual T, and our
identification of $\alpha$ becomes the ratio of the jump in the outcome to the jump in the
treatment probability at the threshold:
$\alpha = \dfrac{\lim_{S \downarrow \bar{S}} E(y \mid S) - \lim_{S \uparrow \bar{S}} E(y \mid S)}{\lim_{S \downarrow \bar{S}} E(T \mid S) - \lim_{S \uparrow \bar{S}} E(T \mid S)}$.
o Aid offers depend on the thresholds, but also on need for “filers” who applied for
need-based aid:
o Clearly the thresholds are important determinants of the amount of aid offered,
especially for “non-filers,” so this suggests that RD at the thresholds might be a
useful way to identify the effect of aid on enrollment.
o For filers, estimated relationship between S and enrollment probability is shown
in Figure 7:
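Returning to the sharp design of equation (8), a rough Stata sketch of how one might estimate it; the threshold value, the bandwidth, and the variable names (y, S) are assumptions for illustration.

local cutoff = 0                       // threshold value of the selection variable S
local h = 5                            // bandwidth: keep only observations near the cutoff
gen treat = (S >= `cutoff')            // treatment dummy T
gen Sc = S - `cutoff'                  // selection variable centered at the cutoff
gen Sc_treat = Sc*treat                // allow k(S) to have a different slope above the cutoff
regress y treat Sc Sc_treat if abs(Sc) <= `h'    // coefficient on treat estimates the jump (alpha)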
Section 15 Empirical Research Projects
Starting point: Question and data
Starting point always must be “What question am I trying to answer?”
o For thesis: something you can be interested in for a whole year
o Something that can be answered
Second consideration: “What data are available to help me find the answer?”
o Macro data
o Micro data from existing surveys
o Collecting your own data from surveys
Methods
Once you have the question and the data, you can carefully consider what method you
should use
Nature of dependent variable: continuous, limited?
o Might need to consider LDV models
What explanatory variables can you measure (and what is omitted)?
Are there endogeneity concerns?
o If yes, are appropriate instruments available to allow IV estimation?
Are there other concerns about the error term?
o Heteroskedasticity?
o Autocorrelation?
Are your data time series, cross section, pooled, or panel?
o Appropriate models for each, including stationarity concerns
What is the appropriate specification?
o Functional form
o Scaling and/or differencing to make the variables comparable
Writing the paper
Introduction
o What is the question?
o How do you go about answering it?
o What do you conclude?
Theory section
o What does economic theory tell us about the question?
o What variables should be in the regression?
o What considerations does theory suggest about functional form (e.g., CRTS)?
Literature review
o May come before theory section
o Who else has explored this question and what did they find?
Methods and data section
o What estimation methods and tests are you proposing to use?
Why are these methods appropriate?
o What data do you have (and not have)?
What issues of measurement might be important?
Results section
o Regression tables with basic description of results
o Text must read as a narrative, referring to tables but not relying on them to tell
the story.
Analysis/interpretation/discussion section
o What do the results mean?
o Are there simulated experiments using your model that would help the reader
understand your results?
o How strong are the results?
o Issues of internal and external validity: is it safe to draw conclusions based on
your results?
Conclusion
o What do you conclude from your analysis?
o What additional work remains to be done in future research?