Notes 14
Specification searches
Experiments vs. non-experiments
o If we can do random controlled experiments, then we don’t need to worry about
omitted variables bias because the regressor of interest (treatment effect) is
random and uncorrelated with everything that might be omitted.
Controlled experiments are becoming more common in economics
Development projects may assign villages randomly to treatment or control
groups
Policies can sometimes be applied randomly to treatment and control
groups
Is it ethical to withhold “treatment” if we know that it is likely to
be beneficial?
Of course, experimental economics has long put experimental subjects
into controlled settings randomly.
o Most often, we must use the “fallen fruit” of “natural experiments” or
observational data
Examples:
State policy differences such as the seat-belt law regressions we
looked at earlier in the semester
Cross-country growth regressions in which countries differ in
variables such as initial per-capita income that are supposed to
affect growth
In these cases, we must worry about selection and omitted-variable bias
Can we control for the other variables that are correlated with
selection into the “treatment group” (or with the regressor of
interest)?
If not, our results are biased
Idealized econometric project
o Theory tells us exactly which variables should be in the regression as controls
o All regressors are measured accurately
o We know about any endogeneity issues and can deal with them using
instrumental variables
o We know the appropriate structure of the error term
o In this case, we need only do one regression to complete the project
o None of these conditions is ever fully realized
That’s why we have tests for the various regression pathologies
That’s (one reason) why we have tests for significance of regressors
That’s why we look at our residuals for clues
That’s why we usually try linear and log-based models
That’s why we have to experiment with different lag lengths
In real research, we must deal with what Leamer calls “misspecification error” which,
like sampling error, generally causes our results to be imprecise
o Consider the regressions that you ran with the 254,654 Census observations on
further fertility of mothers with two children.
How much sampling error is there when N = 254,654? If all of our
assumptions are correct, then the sampling error of our estimates shrinks in
proportion to the square root of N, and √254,654 ≈ 505, so the standard errors
with this sample are roughly 1/500 of the standard deviation of a single observation!
What are the likely relative magnitudes of sample error and
misspecification error in this exercise?
o Null hypotheses and maintained hypotheses
In any statistical test, we make lots of assumptions
Some of the assumptions are “givens” such as functional form, structure
of the error term, IID (or other assumed nature) of the sample, etc.
These are the “maintained hypotheses” that are assumed to be true
in the test.
We usually assume that we have made no misspecification errors
as a maintained hypothesis.
Some of the assumptions are tested.
These are the null hypothesis.
We are not sure that these are true; in fact, we usually expect to
disprove the null hypothesis.
o What does a hypothesis test do?
It measures the likelihood that a test statistic as extreme as the one observed
would occur if both the null hypothesis and the maintained hypotheses are true.
However, we interpret evidence against this joint set of assumptions as
invalidating the null hypothesis, not the maintained hypothesis.
In fact, what we have found is evidence that the world is not as the null
and maintained hypotheses assume it is.
This could be due to the null hypothesis being false with the
maintained hypotheses true (which is what we always assume)
Or it could be that the maintained hypothesis (or one part of it) is
false and the null hypothesis is true (which is the essence of an
invalid test: we have made incorrect assumptions underlying the
test)
Or both could be false (an invalid test that gives the right answer)
By separating the assumptions into null and maintained classes, we
artificially define which ones we are going to blame for any failure of the
data to conform to the collective set of assumptions.
If we do this wrong, then we obviously can draw incorrect conclusions.
Leamer on functional form
o With a high-enough order polynomial, we can exactly fit the data!
Leamer is a proponent of “Bayesian” econometrics, which we may study next week
if there is interest.
o In a Bayesian model, one specifies a "prior" distribution for the parameter before
beginning the analysis
o Then the evidence from the data is combined with the prior to calculate a
“posterior” distribution
o Criticized because you can get nearly any posterior distribution by varying your
prior.
o Shouldn’t your results reflect the evidence from the data and not your opinions?
Leamer’s point exactly!
Conventionally reported results reflect your opinion as much as a Bayesian
posterior, but you haven’t reported how your opinion conditioned your
results.
How do we solve this problem? Page 38:
Econometricians are still too optimistic about how much they know.
Robust standard errors are not a panacea because we still have inefficient
estimators.
Sensitivity analysis is crucial: show the mapping from assumptions to conclusions!
Experiments may be problematic in small samples if we don’t observe and control
for all the confounding variables:
“Interactive confounders” are variables that affect the effect of our variable of interest
on the dependent variable (needing interaction terms). Leaving these out is
problematic even in truly randomized experiments.
Do “data-generating processes” really exist? Are they stable? Are people rational
enough to yield predictable econometric relationships?
Modern computers and software have made the actual computations of econometrics
trivially easy, but the “thinking” part is just as hard as ever.
This is not going to change! Your generation of econometricians will face ever
greater temptation to “push the button” and get results without thinking about the
correct underlying assumptions, then publish them if they “look nice.”
Publication bias
If you look at econometric papers published in journals, most null hypotheses are
rejected.
o The papers published that accept the central null hypothesis tend to fail to reject
hypotheses that are widely believed to be false.
Are all economic hypotheses false? De Long and Lang build a simple model to test this:
o Size of test: $\alpha = 0.05 = \Pr(\text{reject} \mid H_0 \text{ true})$.
o Power of test: $q = \Pr(\text{reject} \mid H_0 \text{ false}) = 1 - \Pr(\text{accept} \mid H_0 \text{ false})$.
o Suppose that the true proportion of true null hypotheses is $\lambda$.
Let a be a test statistic and let f(a) be its marginal significance level (p value).
Under the null hypothesis, $f(a) \sim U(0,1)$, so $\Pr[f(a) \ge p] = 1 - p$.
Under the alternative hypothesis, f(a) follows some unknown distribution G, so that
$\Pr[f(a) \ge p] = 1 - G(p)$. We assume $1 - G(p) \le 1 - p$: p values tend to be smaller when the null is false.
The share of test statistics with p values of at least p (= the share of nulls not rejected
at significance level p) should be
$\Pr[f(a) \ge p] = \lambda(1-p) + (1-\lambda)[1-G(p)] = \lambda(1-p) - \lambda[1-G(p)] + [1-G(p)]$,
so that
$\lambda = \dfrac{\Pr[f(a) \ge p] - [1-G(p)]}{(1-p) - [1-G(p)]} \le \dfrac{\Pr[f(a) \ge p]}{1-p}$.
This gives an upper bound for $\lambda$.
For example, if $\lambda = 1/2$, then $\Pr[f(a) \ge 0.8] \ge 0.50 \times 0.20 = 0.10$: at least 10% of actual p values should be in
the range (0.80, 1.00).
These are point estimates. They can reject the null hypothesis that $\lambda = 1/3$ against the
alternative that $\lambda < 1/3$. Thus, they are quite confident, from the evidence of the
literature, that $\lambda < 1/3$.
Why? They think the most likely explanation is publication bias.
o How much of a problem is this? Are the actual expected values and (especially)
standard errors of the distributions that different?
o We can use simulation to determine the properties of our standard test statistics
under the null hypothesis when the assumptions we usually make fail to hold.
Simulation methods
o Monte Carlo analysis is the simulation of the behavior of estimators under
controlled conditions that may deviate from the standard assumptions under
which they are derived.
o Bootstrap methods apply simulation to a specific sample of data, re-running a
regression many times with either parametric or non-parametric error terms to
estimate the standard deviation of the test statistic under H0 (rather than using the
conventional standard error as an estimate).
Generating data for simulations
o Can use actual variables (as Lovell did in his second data-mining experiment
with macro variables) or can generate them “randomly”
o Error terms are always generated randomly.
o Random-number generators
No computer-generated sequence of numbers is truly random.
The way these generators work is to begin with a “seed,” then generate
new numbers in the sequence based on calculations such as remainders of
division by large prime numbers.
Same seed implies same sequence of numbers, so if you want to
control the process (especially during debugging) you can get the
same sequence again.
Default seed is usually taken from the seconds of the computer
clock or something like that: will not be the same on repeated
execution.
In Stata: runiform (or uniform) draws a random number from
(0, 1). rnormal(mean, std) draws from the normal distribution
with mean and standard deviation given.
Can generate normal variate as invnorm(runiform())
o We generate a random set of e* values and use them to compute y* under the null
hypothesis about $\beta$, given the values of x, which may be set to sample values,
generated randomly, or something else (see the sketch below).
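For concreteness, here is a minimal Stata sketch of this data-generating step; the sample size, the seed, the parameter values under the null (intercept 1, slope 0), and all variable names are assumptions for illustration.

* Generate x and e*, then build y* under the null hypothesis (here H0: beta1 = 0)
clear
set obs 100
set seed 20240101              // fixed seed so the sequence can be reproduced
gen x = rnormal(0, 1)          // x could instead be set to actual sample values
gen estar = rnormal(0, 1)      // e* drawn from the assumed error distribution
gen ystar = 1 + 0*x + estar    // y* = beta0 + beta1*x + e* with beta1 = 0 under H0
reg ystar x                    // one replication's estimate and test of H0: beta1 = 0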
Implementing Monte Carlo
o Create repeated samples (how many? 1000? 10000? 100000?) of e* and y*.
o For each sample, calculate the test statistic of interest: $\hat{\beta}$, $se(\hat{\beta})$, $t_{\hat{\beta}}$, or anything
else.
o Accumulate the estimates in a new data set.
o Examine the properties of the estimates:
Mean to assess bias
Standard deviation to compare to estimated standard error
Quantiles to assess critical values or estimate p values for your estimates
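If the accumulated estimates are stored as variables, say b and se as in the simulate example below, these checks are one-liners in Stata; the variable names here are assumptions.

summarize b                          // is the mean of b close to the true beta? (bias check)
summarize se                         // compare the mean reported SE with the std. dev. of b above
centile b, centile(2.5 5 95 97.5)    // empirical critical values / quantiles for p values
histogram b                          // eyeball the simulated sampling distribution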
Bootstrap standard errors
o If the assumptions of OLS are not valid for your sample, you can estimate the
standard errors of your OLS estimates by using a bootstrap technique
o Use your actual x variables, sample size, etc.
o Generate a sample of e* error terms
Can use a normal distribution based on the SEE as estimate of standard
deviation
Can use "re-sampling," assigning randomly drawn $\hat{u}_i$ values to observation j.
o Calculate sample of y* values.
o Run regression of y* on actual x values
o Save the estimated coefficients $\hat{\beta}_k$ for the kth replication.
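A rough sketch of such a residual-resampling bootstrap in Stata follows; the variable names (y, x, yhat, ehat, estar, ystar) and the program name bsreg are made up for illustration, and the normal-error variant would replace the resampling line with a draw from rnormal() scaled by the SEE.

* Fit the model once and save fitted values and residuals
regress y x
predict double yhat, xb
predict double ehat, residuals

capture program drop bsreg
program define bsreg
    * assign a randomly drawn residual uhat_i to each observation j
    gen double estar = ehat[floor(runiform()*_N) + 1]
    gen double ystar = yhat + estar       // y* from the fitted model plus resampled errors
    regress ystar x
    drop estar ystar
end

simulate b=_b[x], reps(1000): bsreg
summarize b     // the std. dev. of b is the bootstrap estimate of the standard error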
* Fragment of the replication program (apparently the spurious program used with simulate below)
* Build x as a random walk: x_t = x_(t-1) + a_t
replace x = l.x + a if id > 1
* Run regression of y on x
reg y x
end
o Show single replication
o What can be retrieved? ereturn list shows available results
o Command to invoke simulation:
simulate b=_b[x] se=_se[x] r2=e(r2) , reps(1000) : spurious
o Creates data set with 1000 observations with variables b, se, r2
o Can now use summarize, centile, and histogram to look at behavior of estimates.
o Contrast dspurious with spurious to see effect of regression on integrated
variables.
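For reference, a complete program along these lines might look like the following sketch; the setup commands, the sample size, and the shock names a and e are assumptions based on the fragment above.

capture program drop spurious
program define spurious
    drop _all
    set obs 100
    gen id = _n
    tsset id                    // so the lag operator l. works
    gen a = rnormal()           // shocks driving x
    gen e = rnormal()           // shocks driving y, independent of a
    gen x = a
    gen y = e
    * Build x and y as independent random walks
    replace x = l.x + a if id > 1
    replace y = l.y + e if id > 1
    * Run regression of y on x: any apparent relationship is spurious
    reg y x
end

A dspurious variant would presumably regress the first differences (d.y on d.x) instead, which removes the spurious-regression problem.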
o Not missing at random (NMAR): Probability that the observation/variable is
missing depends on the true value of that variable.
Methods of dealing with missing data
o Complete-case analysis
This is the default: Stata will simply delete any observations for which
one or more variables in the model are missing.
We lose information by doing this.
Example: Suppose that we are missing one observation on x out of ten
and that the coefficients based on the other nine observations are y = 5 +
10x. The missing observation has a y value of 25. By omitting this
observation, we are implicitly assuming that the x value is 2, so that it
will not have a residual and not add to the regression. If the univariate
distribution of x in the rest of the sample is such that a value of 2 seems
highly unlikely, then we are almost surely missing important information
about the relationship by ignoring this observation.
Complete-case analysis does not lead to bias if missingness does not
depend on y. (This is the standard sample-selection problem that we have
dealt with before.)
o Available-case analysis
Regression coefficients and standard errors depend only on the sample
variances and covariances of the variables.
Even if y is missing for an observation, if x1 and x2 are available, we can
use that observation to contribute to the estimate of the variances of the x
variables and to their covariance.
This seems to use additional information, but it has other problems and is
rarely used.
Because it uses different groups of observations for different moments,
there is no guarantee that the implied X′X matrix has an inverse, so it
may even be impossible to calculate the OLS estimates.
o Dummy-variable methods
Suppose $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$, where
X1 is complete but X2 has some missing data.
Let $M_i = 1$ if $X_{2i}$ is missing and $M_i = 0$ otherwise.
Let $X_{2i}^0 = X_{2i}$ if $M_i = 0$ and $X_{2i}^0 = 0$ if $M_i = 1$.
The regression $Y_i = \beta_0 + \delta_0 M_i + \beta_1 X_{1i} + \beta_2 X_{2i}^0 + u_i$ is biased for $\beta_1$.
$\beta_1$ picks up the effect of the unobserved variation in X2.
$Y_i = \beta_0 + \delta_0 M_i + \beta_1 X_{1i} + \delta_1 X_{1i} M_i + \beta_2 X_{2i}^0 + u_i$ is unbiased, but it is difficult to
implement unless the pattern of missingness is "block-style."
o Imputation methods
If there is an irregular pattern in which several variables have missing
observations scattered through the sample (and the same observations do
not tend to be missing for all variables), then we have some information
about the observations for which a particular variable is missing based on
the observed values of other variables.
Imputation methods use the values of the other variables (and the pattern
of covariance between the observed and missing variables for the part of
the sample for which both are observed) to impute estimates of the
missing values.
Unconditional imputation replaces missing values by the means of the
variables.
This leads to bias in the coefficients because the other variables
that are correlated with the missing one have to carry “extra
weight” in predicting y for those observations in which the
missing X is set to its mean.
Conditional imputation based on other X variables
Use the complete cases to estimate $X_{2i} = \delta_0 + \delta_1 X_{1i} + v_i$.
o Could use an LDV model if appropriate.
Calculate single imputed values for the missing observations as
$\hat{X}_{2i} = \hat{\delta}_0 + \hat{\delta}_1 X_{1i}$.
Multiple imputation: use the complete cases to estimate $X_{2i} = \delta_0 + \delta_1 X_{1i} + \delta_2 Y_i + v_i$.
Note that we can (and must) include y here when we are using
random draws from the distribution rather than expected values.
Can use LDV methods if the missing variable is a dummy,
ordered, censored, etc.
Calculate m random imputed samples using $X_{2i}^{(j)} = \hat{\delta}_0 + \hat{\delta}_1 X_{1i} + \hat{\delta}_2 Y_i + v_i^{(j)}$,
where $v_i^{(j)}$ is a random draw from the estimated distribution of v, usually
normal with zero mean and variance equal to the estimated variance of v
based on the residuals.
For each sample j, run the regression using imputed values,
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$, and get the estimates $\hat{\beta}_{ij}$ and the squared
standard errors $\widehat{\text{var}}(\hat{\beta}_{ij})$.
Combine the results of the m regressions as follows:
$\hat{\beta}_i = \dfrac{1}{m}\sum_{j=1}^{m} \hat{\beta}_{ij}$
$\widehat{\text{var}}(\hat{\beta}_i) = \dfrac{1}{m}\sum_{j=1}^{m} \widehat{\text{var}}(\hat{\beta}_{ij}) + \dfrac{1}{m-1}\sum_{j=1}^{m} \left(\hat{\beta}_{ij} - \hat{\beta}_i\right)^2$
The parameter estimate is just the mean of the estimates for the m
imputed samples.
The variance is the mean of the estimated variances in the m
samples, plus the estimated variance of the parameter estimate
across the samples.
o This last term corrects the standard error for the
imputation process, adding variance to account for the
fact that the m imputations do not all lead to the same
answer.
o Because a highly uncertain imputation process is likely to
lead to wide variation in ˆ ij across samples, this
correction to the variance will be high when the
imputation process is imprecise.
Stata 11+ has an implementation of MI models with a “dashboard” to
control imputation regressions (which can be OLS, probit, tobit, ordered
probit, etc.) and the combined regression using the multiple imputations.
MICE works with MCAR or MAR data (see the sketch below).
o Can also use ML methods to estimate missing-data models (not going to talk
about)
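As a rough illustration of the Stata mi commands mentioned above, a minimal sequence might look like the following sketch; the variable names (y, x1, x2), the number of imputations, and the seed are assumptions.

* X2 has missing values; impute it from X1 and Y, then combine the regressions
mi set wide
mi register imputed x2
mi register regular y x1
mi impute regress x2 = x1 y, add(20) rseed(12345)   // create m = 20 imputed data sets
mi estimate: regress y x1 x2                        // run on each and combine by the rules above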
Models with varying parameters
We have talked a lot about Assumption #0: The same model applies to all observations.
o What if this is false and the model changes from one set of observations (either
over time or cross-sectionally) to another?
o We can model this by allowing some parameters of the model to vary across
observations.
o We have considerable experience with simple, deterministic forms of varying
parameters:
Dummy variables allow the constant term to differ for the set of
observations for which the dummy is turned on.
Interaction terms allow the effect of one variable to depend on the
magnitude of another (where one or both may be dummies).
Splitting samples at recognized breakpoints is another strategy.
o We now consider models in which the variation in the parameters is at least
partially random, especially over time.
Stationary random parameter models
o $Y_t = \alpha + \beta_t X_t + u_t$, with $\beta_t = \beta + \gamma Z_t + v_t$.
o Substituting yields $Y_t = \alpha + \beta X_t + \gamma X_t Z_t + w_t$, where $w_t = u_t + v_t X_t$.
o Our usual assumptions are that u and v are classical error terms that are
uncorrelated with one another. In that case, $\text{var}(w_t) = \sigma_u^2 + \sigma_v^2 X_t^2$ and w is not
serially correlated unless u or v is.
o This model is heteroskedastic with variance proportional to $1 + \lambda X_t^2$, where
$\lambda = \dfrac{\sigma_v^2}{\sigma_u^2}$.
o How to estimate?
Could use OLS with robust standard errors (did not exist when Maddala
wrote his book).
Maddala suggests ML with
$\ln L = K - \dfrac{n}{2}\ln \sigma_u^2 - \dfrac{1}{2}\sum_{t=1}^{n} \ln\left(1 + \lambda X_t^2\right) - \dfrac{1}{2\sigma_u^2}\sum_{t=1}^{n} \dfrac{\left(Y_t - \alpha - \beta X_t - \gamma X_t Z_t\right)^2}{1 + \lambda X_t^2}$,
with K an irrelevant constant.
Can do this with a two-step procedure:
For given $\lambda$, the $\alpha$, $\beta$, and $\gamma$ that maximize L are the WLS estimators
calculated by applying OLS to
$\dfrac{Y_t}{\sqrt{1+\lambda X_t^2}} = \alpha \dfrac{1}{\sqrt{1+\lambda X_t^2}} + \beta \dfrac{X_t}{\sqrt{1+\lambda X_t^2}} + \gamma \dfrac{X_t Z_t}{\sqrt{1+\lambda X_t^2}} + \text{error}$.
Search over $\lambda$ to find the value that yields the highest L, with $\alpha(\lambda)$, $\beta(\lambda)$,
and $\gamma(\lambda)$ calculated by WLS/OLS (see the sketch below).
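A rough Stata sketch of this grid search, not from the notes: it assumes variables y, x, and z are in memory, searches lambda over 0 to 5 in steps of 0.1, and evaluates the concentrated log likelihood at the WLS estimates for each lambda.

gen xz = x*z
quietly count
local n = r(N)
local bestlnL = .
local bestlam = .
forvalues i = 0/50 {
    local lam = `i'/10                        // grid: lambda = 0, 0.1, ..., 5.0
    gen double h = 1 + `lam'*x^2              // var(w_t) proportional to 1 + lambda*x^2
    quietly regress y x xz [aweight = 1/h]    // WLS estimates of alpha, beta, gamma given lambda
    predict double res, residuals
    gen double q = res^2/h
    quietly summarize q
    local sig2u = r(sum)/`n'                  // ML estimate of sigma_u^2 given lambda
    gen double lnh = ln(h)
    quietly summarize lnh
    local lnL = -0.5*`n'*ln(`sig2u') - 0.5*r(sum)   // ln L up to an irrelevant constant
    if `bestlnL' == . | `lnL' > `bestlnL' {
        local bestlnL = `lnL'
        local bestlam = `lam'
    }
    drop h res q lnh
}
display "lambda maximizing lnL: " `bestlam'
gen double h = 1 + `bestlam'*x^2
regress y x xz [aweight = 1/h]                // WLS at the best lambda: alpha, beta, gamma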
Switching regressions: two (or more) regimes with different parameters
o We considered the simple case of this with the Quandt likelihood-ratio (QLR)
test when we talked about nonstationarity due to breaks in S&W’s Chapter 14.
The QLR test statistic is the maximum of the Chow-test F statistic
considered over possible breakpoints within the middle 70% (or so) of the
sample.
S&W’s Table 14.6 gives the critical values for the QLR test statistic,
which does not follow a standard parametric distribution.
o More interesting case is where model can switch back and forth depending on
values of other variables.
Example: is the economic response to oil-price increases different from the
response to oil-price decreases? One set of parameters when $\Delta P_O$ is positive and a
different set when it is negative.
This is simple case because there are no unknown parameters in
the switching rule.
More interesting case is where the switching rule involves
unknown parameters.
Suppose that the parameters are in regime 1 ($Y_t = \alpha_1 + \beta_1 X_t + u_t$) when
$\gamma_1 Z_{1t} + \ldots + \gamma_k Z_{kt} \le c$ and in regime 2 ($Y_t = \alpha_2 + \beta_2 X_t + u_t$) when this index is $> c$.
Error term may also differ between regimes.
Can estimate by ML, which is kind of like a regression (to determine the
$\alpha$ and $\beta$ parameters) combined with a probit (to determine the $\gamma$ parameters
governing which regime applies to each observation).
o Another model of interest is the single-breakpoint model constraining the
function to be continuous over time.
Example: fitting a trend line to the log of a variable and allowing the
trend growth rate to change at some date without allowing the function
to jump at that date.
Let $n_0$ be the breakpoint in the sample, so that $Y_t = \alpha_1 + \beta_1 X_t + u_t$ for $1 \le t \le n_0$
and $Y_t = \alpha_2 + \beta_2 X_t + u_t$ for $n_0 < t \le N$.
Both regression lines must go through the same point at $n_0$, so we must impose
the restriction $\alpha_1 + \beta_1 X_{n_0} = \alpha_2 + \beta_2 X_{n_0}$ on the estimation. This is a simple
linear restriction that can be imposed in OLS by the usual means (or, equivalently,
with a linear spline, as in the sketch below).
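A rough Stata sketch of the spline approach; the breakpoint (observation 50) and the variable name lny are assumptions.

gen t = _n                 // time trend
mkspline t1 50 t2 = t      // linear spline in t with a knot at t = 50 (the assumed n0)
regress lny t1 t2          // trend slopes before and after n0; the fitted line cannot jump at n0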
Adaptive regression: constant term is a random walk.
o This model was developed before the theory of integrated processes was well
understood.
o The model that they propose has issues with an integrated error term (and
dependent variable) that are better handled with differencing and (sometimes)
cointegration methods.
o Can look at the more interesting model where slope is a random walk as well.
Cannot estimate all T of the $\alpha_t$ parameters.
Can estimate one of them: the suggestion is to estimate the last one (or the
one after the last).
For the varying constant term, $\alpha_t = \alpha_{t-1} + v_t$. Writing $\alpha_t = \alpha_T - \sum_{i=t+1}^{T} v_i$,
$Y_t = \alpha_t + \beta X_t + u_t = \alpha_T + \beta X_t + \left(u_t - \sum_{i=t+1}^{T} v_i\right)$.
We lose a lot of degrees of freedom in this model. In the limiting case of
all coefficients varying deterministically across units, we are just doing
separate time-series regressions for each unit.
o If the varying parameter is a slope coefficient and variation is random, then we
have a variant of the random effects model in which the variance of the “unit-
specific error component” for each unit depends on the values of x for that unit.
When to use varying-parameter models?
o Can almost always justify it.
o What do we really gain from modeling the variation in the coefficients rather
than putting in the error term?
If variation is systematic, then we have a better understanding of how the
effect of x on y depends on Z. This is the essence of interaction terms and
we know that they can be very useful.
If the variation is random, then we may not gain much, although the
adaptive regression model is appealing; and if there are large variations in
x, then we might want to take account of a randomly varying coefficient
on x.
The hazard rate is defined as the probability that the spell ends now conditional on the
fact that it has lasted this long:
o $\lambda(t) = \lim_{\Delta t \to 0} \dfrac{\Pr(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \lim_{\Delta t \to 0} \dfrac{F(t + \Delta t) - F(t)}{\Delta t \, S(t)} = \dfrac{f(t)}{S(t)} = \dfrac{f(t)}{1 - F(t)}$.
Note the similarity to the inverse Mills ratio.
o $\lambda(t) = -\dfrac{d \ln S(t)}{dt}$ because $f(t) = -S'(t)$.
The integrated hazard function is $\Lambda(t) = \int_0^t \lambda(s)\, ds$.
o $S(t) = e^{-\Lambda(t)}$
o $\Lambda(t) = -\ln S(t)$
All of these functions can (obviously) be derived from one another, so f, F, S, $\lambda$, and $\Lambda$
are all equivalent ways to characterize the hazard behavior of the model as a function of
current duration t.
Modeling the hazard rate:
o Constant hazard rate:
$\lambda(t) = \lambda$,
$\ln S(t) = k - \lambda t$,
$S(t) = K e^{-\lambda t} = e^{-\lambda t}$ because $S(0) = 1$.
With a constant hazard rate, $E(t) = 1/\lambda$, so the MLE of $\lambda$ is $1/\bar{t}$.
o Positive or negative duration dependence
Greene’s T25.8 and F25.2 show several common choices for non-
constant functions
Weibull is a common one because depending on the parameter p it can be
increasing or decreasing with t.
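In Stata, parametric duration models of this kind are fit with the st commands. A rough sketch follows; the variable names (spell for duration, ended for the failure indicator, x for a covariate) are assumptions.

stset spell, failure(ended)           // declare the duration data (ended = 0 for censored spells)
streg x, distribution(exponential)    // constant hazard rate
streg x, distribution(weibull)        // hazard rises (p > 1) or falls (p < 1) with duration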
o In case of hazard models, we use the analog of a frequency distribution:
What share of spells that lasted two weeks ended in the third week?
What share of spells that lasted three weeks ended in the fourth week?
Etc.
o Plot these as a function of duration to get empirical hazard function
o Advantages: no distributional assumption, can model unusual shapes
o Disadvantages: does not invoke smoothness assumptions that may be
appropriate, difficult to model effects of other variables
Quantile regression
We get so used to the basic idea of traditional regression analysis that we sometimes
forget important details about what we are doing.
o Standard regression estimates the conditional mean of y as a function of x.
o What about other properties of the conditional distribution of y?
o We sometimes talk about the estimated conditional standard deviation (SEE),
but rarely about any other attributes of the distribution.
If y follows a normal distribution, then we can calculate the whole
distribution from the mean and standard deviation.
If y is not normal, then we generally don’t know all the details of the
distribution.
o There may be much more useful information embodied in the conditional
distribution than just the mean.
o Consider Figure 1 from Koenker & Hallock:
It provides the quartiles, range, median, and arithmetic and geometric means of
CEO compensation for each decile of firm size.
What would regression give us?
Equivalent of a line connecting the means (either arithmetic or
geometric if we used a log function)
This is an example of the kind of expanded view of the conditional
distribution that we can get from quantile regression, which looks at how
the quantiles of the distribution of the dependent variable depend on the
regressor.
Moments and quantiles as minimization problems:
o The unconditional mean is the value of $\mu$ that minimizes $\sum_{i=1}^{n} (y_i - \mu)^2$.
o The unconditional median is the value of m that minimizes $\sum_{i=1}^{n} |y_i - m|$.
o The unconditional $\tau$th quantile is the value of $q_\tau$ that minimizes $\sum_{i=1}^{n} \rho_\tau(y_i - q_\tau)$, where
$\rho_\tau$ is the "tilted absolute value" function $\rho_\tau(x) = \tau x$ if $x \ge 0$ and $\rho_\tau(x) = (\tau - 1)x$ if $x < 0$.
Generalizing to the conditional regression situation:
o In standard parametric regression, we let the mean $\mu$ depend on x.
o In quantile regression, we let the $\tau$th quantile be a function of x:
$\min_{\beta_\tau} \sum_{i=1}^{n} \rho_\tau\left(y_i - \xi_\tau(x_i, \beta_\tau)\right)$, which for the linear case is $\min_{\beta_\tau} \sum_{i=1}^{n} \rho_\tau\left(y_i - x_i'\beta_\tau\right)$.
Food expenditure as a function of income:
OLS regression gives us the dashed line.
The conditional median of the distribution of food expenditure, as a linear function of
income, is the bold line.
The other lines are the 0.05, 0.1, 0.25, 0.75, 0.9, and 0.95 quantiles of the distribution as
linear functions of income.
o Under standard OLS regression, the distribution of food expenditure conditional
on income would be assumed to be normal with mean given by the dashed line
and constant variance given by SEE². (Regressing in log terms would allow the
variance to be proportional to x.)
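In Stata, these conditional-quantile lines can be estimated with qreg and sqreg; a rough sketch, with foodexp and income as assumed variable names:

qreg foodexp income, quantile(0.5)     // conditional median (LAD) regression
qreg foodexp income, quantile(0.1)     // a lower quantile of the conditional distribution
qreg foodexp income, quantile(0.9)     // an upper quantile
sqreg foodexp income, quantiles(0.05 0.1 0.25 0.5 0.75 0.9 0.95) reps(200)   // several at once, bootstrapped SEs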
o Note effect of college graduates: Much less likely to have a very small baby
(strong effect at low quantiles) but not much more likely to have a very large
baby (little effect at upper quantiles)
Enrollment depends on both aid at X (in the equation) and aid offered
elsewhere (in the error term), which leads to correlation between aid and the error.
The effect of aid is likely biased downward because of this.
o Could we randomize? Would any school be willing to increase aid for a random
selection of students to see which ones come? (Perhaps not)
The idea of RD is that we sometimes have arbitrary, discrete thresholds where people
who are nearly identical but on opposite sides of the threshold are treated differently.
o This allows us to estimate a “treatment effect” by comparing the nearly identical
people on the two sides of the line; we can think of these as natural experiments
o Examples:
Cutoff birthdays for school attendance: September 2 babies are a year
older when they start school than August 31 babies: does this affect their
outcomes?
Laws sometimes have arbitrary cutoff points: If unemployment insurance
lasts 26 weeks, is the likelihood of an unemployed worker taking a job
higher in the 27th week than the 26th week?
van der Klaauw: College X has arbitrary thresholds for awarding discrete
levels of aid: are students just above the threshold (who get more aid)
more likely to attend than nearly identical students just below the
threshold?
Manacorda et al.: Welfare program in Uruguay that was designed by
economists to be based on “predicted” income to avoid mis-reporting and
fluctuations in annual income. Households immediately on both sides of
the line are very similar, but one gets the transfer and the other doesn't. Are
households getting the transfer more likely to support the government?
Manacorda result:
Classic RD design is illustrated by the van der Klaauw paper's Figure 1:
o The treatment here depends sharply on the value of the fully observable selection
variable S: people with S above the threshold $\bar{S}$ are in the treatment group and
people below it are in the control group.
o The gap in the relationship at $\bar{S}$ is the effect of crossing the threshold, which
could be an unbiased measure of the effect of the treatment variable.
o Econometrically, we estimate the relationships on both sides (which may or
may not have the same slope) and then estimate the treatment effect as
$\lim_{S \downarrow \bar{S}} E(y \mid S) - \lim_{S \uparrow \bar{S}} E(y \mid S)$.
o We can estimate this in the simplest case as van der Klaauw's equation (8):
$y_i = \alpha T_i + k(S_i) + \varepsilon_i$, where T is a treatment dummy and k(S) is the general
relationship between y and S ignoring the treatment (which could be a linear or
nonlinear function, shown as linear here). A rough Stata sketch of this sharp-design
regression appears at the end of this subsection.
o If the relationship between T and S is not sharp, then there may be some people
close to S who are put into the “wrong” category. This is the “fuzzy” RD design
illustrated by the selection criteria in Figure 2 of the paper:
o In this case, we have to use the "predicted T" rather than the actual T, and our
identification of $\alpha$ becomes the ratio of the jump in the outcome to the jump in the
treatment probability at the threshold:
$\alpha = \dfrac{\lim_{S \downarrow \bar{S}} E(y \mid S) - \lim_{S \uparrow \bar{S}} E(y \mid S)}{\lim_{S \downarrow \bar{S}} E(T \mid S) - \lim_{S \uparrow \bar{S}} E(T \mid S)}$.
o Aid offers depend on the thresholds, but also on need for “filers” who applied for
need-based aid:
o Clearly the thresholds are important determinants of the amount of aid offered,
especially for “non-filers,” so this suggests that RD at the thresholds might be a
useful way to identify the effect of aid on enrollment.
o For filers, estimated relationship between S and enrollment probability is shown
in Figure 7:
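Returning to the sharp design of equation (8), a rough Stata sketch of how one might estimate it; the threshold value, the bandwidth, and the variable names (y, S) are assumptions for illustration.

local cutoff = 0                       // threshold value of the selection variable S
local h = 5                            // bandwidth: keep only observations near the cutoff
gen treat = (S >= `cutoff')            // treatment dummy T
gen Sc = S - `cutoff'                  // selection variable centered at the cutoff
gen Sc_treat = Sc*treat                // allow k(S) to have a different slope above the cutoff
regress y treat Sc Sc_treat if abs(Sc) <= `h'    // coefficient on treat estimates the jump (alpha)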
Section 15 Empirical Research Projects
Starting point: Question and data
Starting point always must be “What question am I trying to answer?”
o For thesis: something you can be interested in for a whole year
o Something that can be answered
Second consideration: “What data are available to help me find the answer?”
o Macro data
o Micro data from existing surveys
o Collecting your own data from surveys
Methods
Once you have the question and the data, you can carefully consider what method you
should use
Nature of dependent variable: continuous, limited?
o Might need to consider LDV models
What explanatory variables can you measure (and what is omitted)?
Are there endogeneity concerns?
o If yes, are appropriate instruments available to allow IV estimation?
Are there other concerns about the error term?
o Heteroskedasticity?
o Autocorrelation?
Are your data time series, cross section, pooled, or panel?
o Appropriate models for each, including stationarity concerns
What is the appropriate specification?
o Functional form
o Scaling and/or differencing to make the variables comparable
Writing the paper
Introduction
o What is the question?
o How do you go about answering it?
o What do you conclude?
Theory section
o What does economic theory tell us about the question?
o What variables should be in the regression?
o What considerations does theory suggest about functional form (e.g., CRTS)?
Literature review
o May come before theory section
o Who else has explored this question and what did they find?
Methods and data section
o What estimation methods and tests are you proposing to use?
Why are these methods appropriate?
o What data do you have (and not have)?
What issues of measurement might be important?
Results section
o Regression tables with basic description of results
o Text must read as a narrative, referring to tables but not relying on them to tell
the story.
Analysis/interpretation/discussion section
o What do the results mean?
o Are there simulated experiments using your model that would help the reader
understand your results?
o How strong are the results?
o Issues of internal and external validity: is it safe to draw conclusions based on
your results?
Conclusion
o What do you conclude from your analysis?
o What additional work remains to be done in future research?