Handbook of Econometrics, Volume 3
Chapter 25: Economic Data Issues
ZVI GRILICHES
Harvard University
1See Kuznets (1971) and Morgenstern (1950) for earlier expressions of similar opinions. Morgenstern's Cassandra-like voice is still very much worth listening to on this range of topics.
Until quite recently, econometricians were not to be found inside the various
statistical agencies, and especially not in the sections that were responsible for
data collection. Thus, there grew up a separation of roles and responsibility.
“They” collect the data and “they” are responsible for all of their imperfections.
“We” try to do the best with what we get, to find the grain of relevant
information in all the chaff. Because of this, we lead a somewhat remote existence
from the underlying facts we are trying to explain. We did not observe them
directly; we did not design the measurement instruments; and, often we know
little about what is really going on (e.g. when we estimate a production function
for the cement industry from Census data without ever having been inside a
cement plant). In this we differ quite a bit from other sciences (including
observational ones rather than experimental) such as archeology, astrophysics,
biology, or even psychology where the “facts” tend to be recorded by the
professionals themselves, or by others who have been trained by and are super-
vised by those who will be doing the final data analysis. Economic data tend to be
collected (or, often more correctly, "reported") by firms and persons who are not
professional observers and who do not have any stake in the correctness and
precision of the observations they report. While economists have increased their
use of surveys in recent years and even designed and commissioned a few special
purpose ones of their own, in general, the data collection and thus the responsibil-
ity for the quality of the collected material is still largely delegated to census
bureaus, survey research centers, and similar institutions, and is divorced from the
direct supervision and responsibility of the analyzing team.
It is only relatively recently, with the initiation of the negative income tax
experiments and various longitudinal surveys intended to follow up the effects of
different governmental programs, that econometric professionals have actually
become involved in the primary data collection process. Once attempted, the job
turned out to be much more difficult than originally thought, and it taught us
some humility.2 Even with relatively large budgets, it was not easy to figure out
how to ask the right question and to collect relevant answers. In part this is
because the world is much more complicated than even some of our more
elaborate models allow for, and partly also because economists tend to formulate
their theories in non-testable terms, using variables for which it is hard to find
empirical counterparts. For example, even with a large budget, it is difficult to
think of the right series of questions, answers to which would yield an unequiv-
ocal number of the level for “human capital” or “permanent income” of an
individual. Thinking about such “alibi-removing” questions should make us a bit
more humble, restrain our continuing attacks on the various official data produc-
ing agencies, and push us towards formulating theories with more regard to what
is observable and what kind of data may be available.
Even allowing for such reservations there has been much progress over the
years as a result of the enormous increase in the quantity of data available to us,
in our ability to manipulate them, and in our understanding of their limitations.
Especially noteworthy have been the development of various longitudinal micro-
data sets (such as the Michigan PSID tapes, the Ohio State NLS surveys, the
Wisconsin high school class follow-up study, and others),3 the computerization of
the more standard data bases and their easier accessibility at the micro, individual
response level (I have in mind here such developments as the Public Use Samples
from the U.S. Population Census and the Current Population Surveys).4 Unfor-
tunately, much more progress has been made with labor force and income type
data, where the samples are large, than in the availability of firm and other
market transaction data. While significant progress has been made in the collec-
tion of financial data and security prices, as exemplified in the development of the
CRSP and Compustat data bases which have had a tremendous impact on the
field of finance, we are still in our infancy as far as our ability to interrogate and
get reasonable answers about other aspects of firm behavior is concerned. Most of
the available microdata at the firm level are based on legally required responses to
questions from various regulatory agencies who do not have our interests exactly
in mind.
We do have, however, now a number of extensive longitudinal microdata sets
which have opened a host of new possibilities for analysis and also raised a whole
range of new issues and concerns. After a decade or more of studies that try to
use such data, the results have been somewhat disappointing. We, as econometri-
cians, have learned a great deal from these efforts and developed whole new
subfields of expertise, such as sample selection bias and panel data analysis. We
know much more about these kinds of data and their limitations but it is not clear
that we know much more or more precisely about the roots and modes of
economic behavior that underlie them.
The encounters between econometricians and data are frustrating and ulti-
mately unsatisfactory both because econometricians want too much from the data
and hence tend to be disappointed by the answers, and because the data are
incomplete and imperfect. In part it is our fault, the appetite grows with eating.
As we get larger samples, we keep adding variables and expanding our models,
until on the margin, we come back to the same insignificance levels.
There are at least three interrelated and overlapping causes of our difficulties:
(1) the theory (model) is incomplete or incorrect; (2) the units are wrong, either at
too high a level of aggregation or with no way of allowing for the heterogeneity of
responses; and, (3) the data are inaccurate on their own terms, incorrect relative
3See Borus (1982) for a recent survey of longitudinal data sets.
4This survey is, perforce, centered on U.S. data and experience, which is what I am most familiar
with. The overall developments, however, have followed similar patterns in most other countries.
to what they purport to measure. The average applied study has to struggle with
all three possibilities.
At the macro level and even in the usual industry level study, it is common to
assume away the underlying heterogeneity of the individual actors and analyze
the data within the framework of the “representative” firm or “average” individ-
ual, ignoring the aggregation difficulties associated with such concepts. In analyz-
ing microdata, it is much more difficult to evade this issue and hence much
attention is paid to various individual “effects” and “heterogeneity” issues. This
is where the promise of longitudinal data lies: their ability to control and allow
for additive individual effects. On the other hand, as is the case in most other
aspects of economics, there is no such thing as a free lunch: going down to the
individual level exacerbates both some of the left out variables problems and
the importance of errors in measurement. Variables such as age, land quality, or
the occupational structure of an enterprise, are much less variable in the aggre-
gate. Ignoring them at the micro level can be quite costly, however. Similarly,
measurement errors which tend to cancel out when averaged over thousands or
even millions of respondents, loom much larger when the individual is the unit of
analysis.
It is possible, of course, to take an alternative view: that there are no data
problems, only model problems, in econometrics. For any set of data there is the
“right” model. Much of econometrics is devoted to procedures which try to assess
whether a particular model is “right” in this sense and to criteria for deciding
when a particular model fits and is “correct enough” (see Chapter 5, Hendry,
1983 and the literature cited there). Theorists and model builders often proceed,
however, on the assumption that ideal data will be available and define variables
which are unlikely to be observable, at least not in their pure form. Nor do they
specify in adequate detail the connection between the actual numbers and their
theoretical counterparts. Hence, when a contradiction arises it is then possible to
argue “so much worse for the facts.” In practice one cannot expect theories to be
specified to the last detail nor the data to be perfect or of the same quality in
different contexts. Thus any serious data analysis has to consider at least two data
generation components: the economic behavior model describing the stimulus-
response behavior of the economic actors and the measurement model, describing
how and when this behavior was recorded and summarized. While it is usual to
focus our attention on the former, a complete analysis must consider them both.
In this chapter, I discuss a number of issues which arise in the encounter
between the econometrician and economic data. Since they permeate much of
econometrics, there is quite a bit of overlap with some of the other chapters in the
Handbook. The emphasis here, however, is more on the problems that are posed
by the various aspects of economic data than on the specific technological
solutions to them.
After a brief review of the major classes of economic data and the problems
that are associated with using and interpreting them, I shall focus on issues that
are associated with using erroneous or partially missing data, discuss several
empirical examples, and close with a few final remarks.
The level of fabrication dimension refers to the “closeness” of the data to the
actual phenomenon being measured. Even though they may be subject to various
biases and errors, one may still think of reports of hours worked during last week
by a particular individual in a survey or the closing price of a specific common
stock on the New York Stock Exchange on December 31 as primary observations.
These are the basic units of information about the behavior of economic actors
and the information available to them (though individuals are also affected by the
macro information that they receive). They are the units in which most of our
microtheories are denominated. Most of our data are not of this sort, however.
They have usually already undergone several levels of processing or fabrication.
For example, the official estimate of total corn production in the State of Iowa in
a particular year is not the result of direct measurement but the outcome of a
rather complicated process of blending sample information on physical yields,
reports on grain shipments to and from elevators, benchmark census data from
previous years, and a variety of informal Bayes-like smoothing procedures to
yield the final official “estimate” for the state as a whole. The final results, in this
case, are probably quite satisfactory for the uses they are put to, but the
procedure for creating them is rarely described in full detail and is unlikely to be
replicable. This is even more true at the aggregated level of national income
accounts and other similar data bases, where the link between the original
primary observations and the final aggregate numbers is quite tenuous and often
mysterious.
I do not want to imply that the aggregate numbers are in some sense worse
than the primary ones. Often they are better. Errors may be reduced by aggrega-
tion and the informal and formal smoothing procedures may be based on correct
prior information and result in a more reliable final result. What needs to be
remembered is that the final published results can be affected by the properties of
the data generating mechanism, by the procedures used to collect and process the
data. For example, some of the time series properties of the major published
economic series may be the consequence of the smoothing techniques used in
their construction rather than a reflection of the underlying economic reality.
(This was brought forcibly home to me many years ago while collecting
unpublished data on the diffusion of hybrid corn at the USDA, when I came
across a circular instructing the state agricultural statisticians: "When in
doubt, use a growth curve.") Some series may fluctuate because of fluctuations in
the data generating institutions themselves. For example, the total number of
patents granted by the U.S. Patent Office in a particular year depends rather
strongly on the total number of patent examiners available to do the job. For
budgetary and other reasons, their number has gone through several cycles,
inducing concomitant cycles in the actual number of patents granted. This last
example brings up the point that while particular numbers may be indeed correct
as far as they go, they do not really mean what we thought they did.
Such considerations lead one to consider the rather amorphous notion of data
“quality.” Ultimately, quality cannot be defined independently of the intended
use of the particular data set. In practice, however, data are used for multiple
purposes and thus it makes some sense to indicate some general notions of data
quality. Earlier I listed extent, reliability, and validity as the three major dimen-
sions along which one may judge the quality of different data sets. Extent is a
synonym for richness: How many variables are present, what interesting ques-
tions had been asked, how many years and how many firms or individuals were
covered? Reliability is actually a technical term in psychometrics, reflecting the
notion of replicability and measuring the relative amount of random measure-
ment error in the data by the correlation coefficient between replicated or related
measurement of the same phenomenon. Note that a measurement may be highly
reliable in the sense that it is a very good measure of whatever it measures, but
still be the wrong measure for our particular purposes.
This brings us to the notion of validity which can be subdivided in turn into
representativeness and relevance. I shall come back to the issue of how repre-
sentative is a body of data when we discuss issues of missing and incomplete data.
It will suffice to note here that it contains the technical notion of coverage: Did all
units in the relevant universe have the same (or alternatively, different but known
and adjusted for) probability of being selected into the sample that underlies this
particular data set? Coverage and relevance are related concepts which shade over
into issues that arise from the use of “proxy” variables in econometrics. The
validity and relevance questions relate less to the issue of whether a particular
measure is a good (unbiased) estimate of the associated population parameter and
more to whether it actually corresponds to the conceptual variable of interest.
Thus one may have a good measure of current prices which are still a rather poor
indicator of the currently expected future price and relatively extensive and well
measured IQ test scores which may still be a poor measure of the kind of
“ability” that is rewarded in the labor market.
My father would never eat “cutlets” (minced meat patties) in the old
country. He would not eat them in restaurants because he didn’t know
what they were made of and he wouldn’t eat them at home because he
did.
AN OLD FAMILY STORY
I will be able to touch on only a few of the many serious practical and conceptual
problems that arise when one tries to use the various economic data sets. Many of
these issues have been discussed at length in the national income and growth
measurement literature but are not usually brought up in standard econometrics
courses or included in their curriculum. Among the many official and semi-official
data base reviews one should mention especially the Creamer GNP Improvement
report (U.S. Department of Commerce, 1979), the Rees committee report on
productivity measurement (National Academy of Sciences, 1979), the Stigler
committee (National Bureau of Economic Research, 1961) and the Ruggles
(Council on Wage and Price Stability, 1977) reports on price statistics, the
Gordon (President’s Committee to Appraise Employment Statistics, 1962), and
the Levitan (National Committee on Employment and Unemployment Statistics,
1979) committee reports on the measurement of employment and unemployment,
and the many continuous and illuminating discussions reported in the proceed-
ings volumes of the Conference on Research in Income and Wealth, especially in
volumes 19, 20, 22, 25, 34, 38, 45, 47, and 48 (National Bureau of Economic
Research, 1957–1983). All these references deal almost exclusively with U.S.
data, where the debates and reviews have been more extensive and public, but are
also relevant for similar data elsewhere.
At the national income accounts level there are serious definitional problems
about the borders of economic activity (e.g. home production and the investment
value of children) and the distinction between final and intermediate consumption
activity (e.g. what fraction of education and health expenditures can be thought
of as final rather than intermediate "goods" or "bads"). There are also difficult
measurement problems associated with the existence of the underground economy
and poor coverage of some of the major service sectors. The major serious
problem from the econometric point of view probably occurs in the measurement
of “real” output, GNP or industry output in “constant prices,” and the associated
growth measures. Since most of the output measures are derived by dividing
(“deflating”) current value totals by some price index, the quality of these
measures is intimately connected to the quality of the available price data.
Because of this, it is impossible to treat errors of measurement at the aggregate
level as being independent across price and “quantity” measures.
The available price data, even when they are a good indicator of what they
purport to measure, may still be inadequate for the task of deflation. For
productivity comparisons and for production function estimation the observed
prices are supposed to reflect the relevant marginal costs and revenues in a, at
least temporary, competitive equilibrium. But this is unlikely to be the case in
sectors where output or prices are controlled, regulated, subsidized, and sold
under various multi-part tariffs. Because the price data are usually based on the
pricing of a few selected items in particular markets, they may not correspond
well to the average realized price for the industry as a whole during a particular
time period, both because “easily priced” items may not be representative of the
average price movements in the industry as a whole and because many transac-
tions are made with a lag, based on long term contracts. There are also problems
associated with getting accurate transactions prices (Kruskal and Telser, 1960 and
Stigler and Kindahl, 1970) but the major difficulty arises from getting compar-
able prices over time, from the continued change in the available set of commod-
ities, the “quality change” problem.
“Quality change” is actually a special version of the more general comparabil-
ity problem, the possibility that similarly named items are not really similar,
either across time or individuals. In many cases the source of similarly sounding
items is quite different: Employment data may be collected from plants (establish-
ments), companies, or households. In each case the answer to the same question
may have a different meaning. Unemployment data may be reported by a
teenager directly or by his mother, whose views about it may both differ and be
wrong. The wording of the question defining unemployment may have changed
over time and so should also the interpretation of the reported statistic. The
context in which a question is asked, its position within a series of questions on a
survey, and the willingness to answer some of the questions may all be changing
over time making it difficult to maintain the assumption that the reported
numbers in fact relate to the same underlying phenomenon over time or across
individuals and cultures.
The common notion of quality change relates to the fact that many commod-
ities are changing over time and that often it is impossible to construct ap-
propriate pricing comparisons because the same varieties are not available at
different times and in different places. Conceptually one might be able to get
around this problem by assuming that the many different varieties of a commod-
ity differ only along a smaller number of relevant dimensions (characteristics,
specifications), estimate the price-characteristics relationship econometrically and
use the resulting estimates to impute a price to the missing model or variety in the
relevant comparison period. This approach, pioneered by Waugh (1928) and
Court (1936) and revived by Griliches (1961) has become known as the “hedonic”
approach to price measurement. The data requirements for the application of this
type of an approach are quite severe and there are very few official price indexes
which incorporate it into their construction procedures. Actually, it has been used
much more widely in labor economics and in the analyses of real estate values
than in the construction of price deflator indexes. See Griliches (1971), Gordon
(1983), Rosen (1974), and Triplett (1975) for expositions, discussions, and
examples of this approach to price measurement.
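To make the mechanics concrete, here is a minimal sketch of the hedonic imputation idea in Python, on synthetic data; the characteristics, coefficients, and the "missing model" are all illustrative assumptions of this sketch, not any agency's actual procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
# two hypothetical characteristics of a commodity (say, speed and memory)
speed = rng.uniform(1.0, 10.0, n)
memory = rng.uniform(1.0, 64.0, n)
# assumed log-linear price-characteristics relationship plus noise
log_price = (2.0 + 0.30 * np.log(speed) + 0.20 * np.log(memory)
             + rng.normal(0.0, 0.10, n))

# estimate the price-characteristics relationship by OLS
X = np.column_stack([np.ones(n), np.log(speed), np.log(memory)])
coef, *_ = np.linalg.lstsq(X, log_price, rcond=None)

# impute a price for a variety that is absent from the comparison period
missing_model = np.array([1.0, np.log(12.0), np.log(32.0)])
print("imputed price:", np.exp(missing_model @ coef))
```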
While the emergence of this approach has sensitized both the producers and the
consumers of price data to this problem and contributed to significant improve-
ments in data collection and processing procedures over time, it is fair to note
that much still remains to be done. In the U.S. GNP deflation procedures, the
price of computers has been kept constant since the early 1960s for lack of an
agreement of what to do about it, resulting in a significant underestimate in the
growth of real GNP during the last two decades. Similarly, for lack of a more
appropriate price index, aircraft purchases had been deflated by an equally
weighted index of gasoline engine, metal door, and telephone equipment prices
until the early 1970s, at which point a switch was made to a price index based on
data from the CAB on purchase prices for "identical" models, missing thereby
the major gains that occurred from the introduction of the jet engine, and the
various improvements in operating efficiency over time.5 One could go on adding
to this gallery of horror stories but the main point to be made here is not that a
particular price index is biased in one or another direction. Rather, the point is
that one cannot take a particular published price index series and interpret it as
measuring adequately the underlying notion of a price change for a well specified,
unchanging, commodity or service being transacted under identical conditions
and terms in different time periods. The particular time series may indeed be
quite a good measure of it, or at least better than the available alternatives, but
each case requires a serious examination whether the actual procedures used to
generate the series do lead to a variable that is close enough to the concept
envisioned by the model to be estimated or by the theory under test. If not, one
needs to append to the model an equation connecting the available measured
variable to the desired but not actually observed correct version of this variable.
The issues discussed above affect also the construction and use of various
“capital” measures in production function studies and productivity growth
analyses. Besides the usual aggregation issues connected with the “existence” of
an unambiguous capital concept (see Diewert, 1980 and Fisher, 1969 on this) the
available measures suffer from potential quality change problems, since they are
usually based on some cumulated function of past investment expenditures
deflated by some combination of available price indexes. In addition, they are
also based on rather arbitrary assumptions about the pattern of survival of
machines over time and the time pattern of deterioration in the flow of their
services. The available information on the reasonableness of such assumptions is
very sparse, ancient, and flimsy. In some contexts it is possible to estimate the
appropriate pattern from the data rather than impose them a priori. I shall
present an example of this type of approach below.
Similar issues arise also in the measurement of labor inputs and associated
variables at both the macro and micro levels. At the macro level the questions
revolve about the appropriate weighting to be given to different types of labor:
young-old, male-female, black-white, educated vs. uneducated, and so forth.
The direct answer here as elsewhere is that they should be weighted by their
appropriate marginal prices but whether the observed prices actually reflect
correctly the underlying differences in their respective marginal productivities is
one of the more hotly debated topics in labor economics. (See Griliches, 1970 on
the education distinction and Medoff and Abraham, 1980 on the age distinction.)
5For a recent review and reconstruction of the price indexes for durable producer goods see
Gordon's (1985) forthcoming monograph.
Connected to this is also the difficulty of getting relevant labor prices. Most of the
usual data sources report or are based on data on average annual, weekly, or
hourly earnings which do not represent adequately either the marginal cost of a
particular labor hour to the employer or the marginal return to a worker from the
additional hour of work. Both are affected by the existence of overtime premia,
fringe benefits, training costs, and transportation costs. Only recently has an
employment cost index been developed in the United States. (See Triplett, 1983
on this range of issues.) From an individual worker’s point of view the existence
of non-proportional tax schedules introduces another source of discrepancy
between the observed wage rates and the unobserved marginal after tax net
returns from working (see Hausman, 1982, for a more detailed discussion).
While the conceptual discrepancy between the desired concepts and the avail-
able measures dominates at the macro level the more mundane topics of errors of
measurement and missing and incomplete data come to the fore at the micro,
individual survey level. This topic is the subject of the next section.
While many of the macro series may be also subject to errors, the errors in them
rarely fit into the framework of the classical errors-in-variables model (EVM) as it
has been developed in econometrics (see Chapter 23 for a detailed exposition).
They are more likely to be systematic and correlated over time.6 Micro data are
subject to at least three types of discrepancies, “errors,” and fit this framework
much better:
(a) Transcription, transmission, or recording error, where a correct response is
recorded incorrectly either because of clerical error (number transposition, skip-
ping a line or a column) or because the observer misunderstood or misheard the
original response.
(b) Response or sampling error, where the correct underlying value could be
ascertained by a more extensive sampling, but the actual observed value is not
equal to the desired underlying population parameter. For example, an IQ test is
based on a sample of responses to a selected number of questions. In principle,
the mean of a large number of tests over a wide range of questions would
6For an "error analysis" of national income account data based on the discrepancies between
preliminary and "final" estimates see Cole (1969), Young (1974), and Haitovsky (1972). For an earlier,
more detailed evaluation based on subjective estimates of the differential quality of the various
"ingredients" (series) of such accounts see Kuznets (1954, chapter 12).
converge to some mean level of “ability” associated with the range of subjects
being tested. Similarly, the simple permanent income hypothesis would assert that
reported income in any particular year is a random draw from a potential
population of such incomes whose mean is “permanent income.” This is the case
where the observed variable is a direct but fallible indicator of the underlying
relevant "unobservable," "latent factor" or variable (see Chapter 23 and Griliches,
1974, for more discussion of such concepts).
(c) When one is lacking a direct measure of the desired concept and a “proxy”
variable is used instead. For example, consider a model which requires a measure
of permanent income and a sample which has no income measures at all but does
have data on the estimated market value of the family residence. This housing
value may be related to the underlying permanent income concept, but not clearly
so. First, it may not be in the same units, second it may be affected by other
variables also, such as house prices and family size, and third there may be
“random” discrepancies related to unmeasured locational factors and events that
occurred at purchase time. While these kinds of “indicator” variables do not fit
strictly into the classical EVM framework, their variances, for example, need not
exceed the variance of the true “unobservable,” they can be fitted into this
framework and treated with the same methods.
There are two classes of cases which do not really fit this framework: Occasion-
ally one encounters large transcription and recording errors. Also, sometimes the
data may be contaminated by a small number of cases arising from a very
different behavioral model and/or stochastic process. Sometimes, these can be
caught and dealt with by relatively simple data editing procedures. If this kind of
problem is suspected, it is best to turn to the use of some version of the “robust
estimation” methods discussed in Chapter 11. Here we will be dealing with the
more common general errors-in-measurement problem, one that is likely to affect
a large fraction of our observations.
The other case that does not fit our framework is where the true concept, the
unobservable is distributed randomly relative to the measure we have. For
example, it is clear that the “number of years of school completed” (S) is an
erroneous measure of true “education” (E), but it is more likely that the
discrepancy between the two concepts is independent of S rather than E. I.e. the
“error” of ignoring differences in the quality of schooling may be independent of
the measured years of schooling but is clearly a component of the true measure of
E. The problem here is a left-out relevant variable (quality) and not measurement
error in the variable as is (years of school). Similarly, if we use the forecast of
some model, based on past data, to predict the expectations of economic actors,
we clearly commit an error, but this error is independent of the forecast level (if
this forecast is optimal and the actors have had access to the same information).
This type of “error” does not induce a bias in the estimated coefficients and can
be incorporated into the standard disturbance framework (see Berkson, 1950).
The classical EVM framework postulates a true relationship

y = α + βz + e,  (4.1)

the absence of direct observations on z, and the availability of a fallible measure
of it,

x = z + ε,  (4.2)

where ε is a purely random i.i.d. measurement error, with Eε = 0 and no
correlation with either z or y. This is quite a restrictive set of assumptions,
especially the assumption of the errors not being correlated with anything else in
the model including their own past values. But it turns out to be very useful in
many contexts and not too far off for a variety of micro data sets. I will discuss
the evidence for the existence of such errors further on, when we turn to consider
briefly various proposed solutions to the estimation problem in such models, but
the required assumptions are not more difficult than those made in the standard
linear regression model which requires that the “disturbance” e, the model
discrepancy, be uncorrelated with all the included explanatory variables.
It may be worthwhile, at this point, to summarize the main conclusions from
the EVM for the standard OLS estimates in contexts where one has ignored the
presence of such errors. Estimating
y = a + bx + u,  (4.3)
where the true model is the one given above yields −βλ as the asymptotic bias of
the OLS b, where λ = σ_ε²/σ_x² is a measure of the relative amount of measurement
error in the observed x series. The basic conclusion is that the OLS slope estimate
is biased towards zero, while the constant term is biased away from zero. Since in
this model one can treat y and x symmetrically, it can be shown (Schultz, 1938,
Frisch, 1934, Klepper and Leamer, 1983) that in the "other regression," the
regression of x on y, the slope coefficient is also biased towards zero, implying a
"bracketing" theorem:

plim b_{y·x} = β(1 − λ) ≤ β ≤ 1/plim b_{x·y}.  (4.4)
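Both results are easy to verify by simulation. The following sketch (synthetic data; β, λ and the noise levels are arbitrary choices of the illustration) reproduces the attenuation in (4.3) and the bracketing in (4.4):

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta = 200_000, 1.0
z = rng.normal(0.0, 1.0, n)            # true variable, variance 1
x = z + rng.normal(0.0, 0.5, n)        # fallible measure: lambda = .25/1.25 = .2
y = 0.5 + beta * z + rng.normal(0.0, 1.0, n)

b_yx = np.cov(y, x)[0, 1] / np.var(x)  # direct regression: ~ beta*(1-lambda) = .8
b_xy = np.cov(y, x)[0, 1] / np.var(y)  # "other regression" slope, also attenuated
print(b_yx, 1.0 / b_xy)                # ~0.8 and ~2.0: they bracket beta = 1
```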
These results generalize also to the multivariate case. In the case of two independent
variables (x₁ and x₂), where only one (x₁) is subject to error, the coefficient
of the other variable (the one not subject to errors of measurement) is also biased
(unless the two variables are uncorrelated). That is, if the true model is

y = α + βz + γx₂ + e,   x₁ = z + ε,  (4.5)

then, with the x's measured in standardized units and ρ their correlation,

plim(b_{1·2} − β) = −βλ/(1 − ρ²),  (4.6)

plim(b_{2·1} − γ) = βλρ/(1 − ρ²)  (4.7)
                  = −ρ[bias b_{1·2}].

That is, the bias in the coefficient of the erroneous variable is "transmitted" to the
other coefficients, with an opposite sign (provided, as is often the case, that
ρ > 0); see Griliches and Ringstad (1971, Appendix C) and Fisher (1980) for the
derivation of this and related formulae.
If more than one independent variable is subject to error, the formulae become
more complicated, but the basic pattern persists. If both z₁ and z₂ are unobserved
and x₁ = z₁ + ε₁, x₂ = z₂ + ε₂, where the ε's are independent (of each
other) errors of measurement, and we have normalized the variables so that
σ²_{x1} = σ²_{x2} = 1, then

plim(b_{1·2} − β₁) = −β₁λ₁/(1 − ρ²) + β₂λ₂ρ/(1 − ρ²),  (4.8)

with a similar symmetric formula for plim b_{2·1}. Thus, in the multivariate case, the
bias is increased by the factor 1/(1 − ρ²), the reduction in the independent
variance of the true signal due to its intercorrelation with the other variable(s),
and attenuated by the fact that the particular variable compensates somewhat for
the downward bias in the other coefficients caused by the errors in the other
variables. Overall, there is still a bias towards zero. For example, in this case the
sum of the estimated coefficients is always biased towards zero:

plim[(b₁ + b₂) − (β₁ + β₂)] = −(β₁λ₁ + β₂λ₂)/(1 + ρ).  (4.9)
Similar formulae apply, for example, to the model

y = α + βz + γz* + e,  (4.10)

where the coefficient of the erroneous measure of z again has

plim β̂ = β(1 − λ).  (4.11)
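The formulae (4.8)–(4.9) can likewise be checked numerically. A sketch under the normalization used above (both observed variances set to one; λ₁ = λ₂ = 0.2, ρ = 0.5, β₁ = β₂ = 1 are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n, b1, b2, lam, rho = 500_000, 1.0, 1.0, 0.2, 0.5
# true signals with variance 1 - lambda so that observed variances equal one
cov_z = np.array([[1 - lam, rho], [rho, 1 - lam]])
z = rng.multivariate_normal([0.0, 0.0], cov_z, n)
x = z + rng.normal(0.0, np.sqrt(lam), (n, 2))     # independent errors
y = z @ np.array([b1, b2]) + rng.normal(0.0, 1.0, n)

X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, y, rcond=None)[0][1:]
# (4.8): each slope ~ 1 - .2/.75 + .2*.5/.75 = 0.867
# (4.9): the sum   ~ 2 - (.2 + .2)/(1 + .5)   = 1.733
print(b, b.sum())
```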
Procedures for estimation with known λ's are outlined in Chapter 23. Occasionally
we have access to "replicated" data, when the same question is asked on
different occasions or from different observers, allowing us to estimate the
variance of the "true" variable from the covariance between the different measures
of the same concept. This type of approach has been used in economics
by Bowles (1972) and Borus and Nestel (1973) in adjusting estimates of parental
background by comparing the reports of different family members about the same
concept, and by Freeman (1984) on a union membership variable, based on a
comparison of worker and employer reports. Combined with a modelling ap-
proach it has been pursued vigorously and successfully in sociology in the works
of Bielby, Hauser, and Featherman (1977), Massagli and Hauser (1983) and
Mare and Mason (1980). While there are difficulties with assuming a similar error
variance on different occasions or for different observers, such assumptions can be
relaxed within the framework of a larger model. This is indeed the most
promising approach, one that brings in additional independent evidence about
the actual magnitude of such errors.
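A sketch of how such replicated reports can be used, with made-up numbers: the covariance between two independent reports of the same concept estimates the variance of the true variable, giving the reliability ratio directly, and the second report can also serve as an instrument for the first.

```python
import numpy as np

rng = np.random.default_rng(3)
n, beta = 200_000, 1.0
z = rng.normal(0.0, 1.0, n)            # true concept (e.g. parental income)
x1 = z + rng.normal(0.0, 0.6, n)       # report by one family member
x2 = z + rng.normal(0.0, 0.6, n)       # independent report by another
y = beta * z + rng.normal(0.0, 1.0, n)

reliability = np.cov(x1, x2)[0, 1] / np.var(x1)    # ~ 1/1.36 = 0.74
b_ols = np.cov(y, x1)[0, 1] / np.var(x1)           # attenuated: ~ beta * 0.74
b_iv = np.cov(y, x2)[0, 1] / np.cov(x1, x2)[0, 1]  # x2 as instrument: ~ beta
print(reliability, b_ols, b_iv)
```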
Almost all other approaches can be thought of as finding a reasonable set of
instrumental variables for the problem, variables that are likely to be correlated
with the true underlying z, but not with either the measurement error E or the
equation error (disturbance) e. One of the earlier and simpler applications of this
approach was made by Griliches and Mason (1972) in estimating an earnings
function and worrying about errors in their ability measure (AFQT test scores).
In a “true” equation of the form
y = α + βs + γa + δx + e,  (4.12)
7Grouping methods that do not use an "outside" grouping criterion but are based on grouping on x
alone (or using its ranks as instruments) are not in general consistent and need not reduce the
EV-induced bias. (See Pakes, 1982.)
where s is schooling, a an unobserved ability variable, and x a set of background
variables, ability was taken to be measured by a fallible test score t, leading to the
system

a = xπ_a + g,
t = a + ε,  (4.13)
s = xπ_s + γ₁a + v,
y = βs + γ₂a + e,

and, after substitution, the system was estimated in its reduced form (e.g.
t = xπ_a + g + ε),
imposing the non-linear parameter restrictions across the equations and retrieving
additional information about them from the variance-covariance matrix of the
residuals, given the no-correlation assumption about the ε's, g's, v's, and e's. It
is possible, for example, to retrieve an estimate of β + γ₂/γ₁ from the
variance-covariance matrix and pool it with the estimates derived from the
reduced form slope coefficients. In larger, more over-identified models, there are
more binding restrictions connecting the variance-covariance matrix of the
residuals with the slope parameter estimates. Chamberlain and Griliches (1975)
used an expanded version of this type of model with sibling data, assuming that
the unobserved ability variable has a variance-components structure. Aasness
(1983) uses a similar framework and consumer expenditures survey data to
estimate Engel functions and the unobserved distribution of total consumption.
All of these models rely on two key assumptions: (1) the original model
y = α + βz + e is correct for all dimensions of the data, i.e. the β parameter is
stable; and (2) the unobserved errors are uncorrelated in some well specified,
known dimension. In cross-sectional data it is common to assume that the z's (the
"true" values) and the ε's (the measurement errors) are based on mutually
independent draws from a particular population. It is not possible to maintain
this assumption when one moves to time series data or to panel data (which are a
cross-section of time series), at least as far as the z’s are concerned. Identification
must hinge then on known differences in the covariance generating functions of
the z's and the ε's. The simplest case is when the ε's can be taken as white (i.e.
uncorrelated over time) while the z's are not. Then lagged x's can be used as
valid instruments to identify β. For example, the "contrast" estimator suggested
by Karni and Weissman (1974), which combines the differentially biased levels
estimator (plim b = β − βλ) and first-difference estimator [plim b_d = β − βλ/(1 − ρ),
where ρ is the first-order serial correlation of x] to derive consistent estimators for
β and λ, can be shown, for stationary x and y, to be equivalent (asymptotically) to
the use of lagged x's as instruments.
While it may be difficult to maintain the hypothesis that errors of measurement
are entirely white, there are many different interesting cases which still allow the
identification of β. Such is the case if the errors can be thought of as a
combination of a "permanent" error or misperception of or by individuals and a
random, independent-over-time error component. The first part can be encompassed
in the usual "correlated" or "fixed" effects framework, with the "within"
measurement errors being white after all. Identification can be had then from
contrasting the consequences of differencing over differing lengths of time.
Different ways of differencing all sweep out the individual effects (real or erroneous)
and leave us with the following kinds of bias formulae:

plim b_{jΔ} = β(1 − 2σ_v²/s²_{jΔ}),  (4.15)

where σ_v² is the variance of the independent-over-time component of the ε's, 1Δ
denotes the transformation x₂ − x₁ while 2Δ indicates differences taken two
periods apart: x₃ − x₁, and so forth, and the s²'s are the respective variances of
such differences in x. (4.15) can be solved to yield

β̂ = (w_{2Δ} − w_{1Δ})/(s²_{2Δ} − s²_{1Δ}),  (4.16)

where w_{jΔ} is the covariance of j-period differences in y and x. This, in turn, can
be shown to be equivalent to using past and future x's as instruments for the first
differences.8
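A simulation sketch of this "differencing length" identification, assuming white within-measurement error and a serially correlated true x (all parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n, beta = 100_000, 1.0
alpha = rng.normal(0.0, 1.0, (n, 1))                      # individual effect
z = alpha + rng.normal(0.0, 1.0, (n, 3)).cumsum(axis=1)   # random-walk true x
x = z + rng.normal(0.0, 0.5, (n, 3))                      # white measurement error
y = beta * z + alpha + rng.normal(0.0, 1.0, (n, 3))

d1y, d1x = y[:, 1] - y[:, 0], x[:, 1] - x[:, 0]   # one-period differences
d2y, d2x = y[:, 2] - y[:, 0], x[:, 2] - x[:, 0]   # two-period differences
w1, w2 = np.cov(d1y, d1x)[0, 1], np.cov(d2y, d2x)[0, 1]
s1, s2 = np.var(d1x), np.var(d2x)
print(w1 / s1, w2 / s2)          # ~0.67 and ~0.80: both biased towards zero
print((w2 - w1) / (s2 - s1))     # (4.16): ~ beta, the 2*sigma_v^2 term cancels
```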
More generally, if one were willing to assume that the true z’s are non-sta-
tionary, which is not unreasonable for many evolving economic series, but the
measurement errors, the ε's, are stationary, then it is possible to use panel data to
identify the parameters of interest even when the measurement errors are corre-
8See Griliches and Hausman (1984) for details, generalizations, and an empirical example.
lated over time.9 Consider, for example, the simplest case of T = 2. The probability
limits of the variances and covariances between the y's and the x's are given by:

var x₁ = s₁₁ + σ²,   var x₂ = s₂₂ + σ²,   cov x₁x₂ = s₁₂ + ρσ²,
cov y₁x₁ = βs₁₁,   cov y₂x₂ = βs₂₂,   cov y₁x₂ = cov y₂x₁ = βs₁₂,  (4.17)

where s_th stands for the variances and covariances of the true z's, σ² is the
variance of the ε's, and ρ is their first order correlation coefficient. It is obvious
that if the z's are non-stationary (s₁₁ ≠ s₂₂), then ratios such as

(cov y₁x₁ − cov y₂x₂)/(var x₁ − var x₂)

yield consistent estimates of β. In longer panels this approach can be extended to
accommodate additional error correlations and the superimposition of "correlated
effects" by using its first differences analogue.
Even if the z’s were stationary, it is always possible to handle the correlated
errors case provided the correlation is known. This rarely is the case, but
occasionally a problem can be put into this framework. For example, capital
measures are often subject to measurement error but these errors cannot be taken
as uncorrelated over time, since they are cumulated over time by the construction
of such measures. But if one were willing to assume that the errors occur
randomly in the measurement of investment and they are uncorrelated over time,
and the weighting scheme (the depreciation rate) used in the construction of the
capital stock measure is known, then the correlation between the errors in the
stock levels is also known.
For example, if one is interested in estimating the rate of return to some capital
concept, where the true equation relates profits to the true capital stock,

π_t = α + γK*_t + e_t,  (4.18)

and the measured stock is built up from investment reports, K_t = Σ_τ (1 − δ)^τ I_{t−τ},
with I_t = I*_t + ε_t containing random errors that are uncorrelated over time, then
quasi-differencing gives

π_t − (1 − δ)π_{t−1} = δα + γI_t + [e_t − (1 − δ)e_{t−1} − γε_t],

which is now in standard EVM form, and one can use lagged values of I as instruments.
Hausman and Watson (1983) use a similar approach to estimate the seasonality in
the unemployment series by taking advantage of the known correlation in the
measurement errors introduced by the particular structure of the sample design in
their data.
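A sketch of the capital-stock argument above on simulated data (the depreciation rate, the serial correlation of investment, and the error variances are all assumptions of the illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
n, t, gam, delta = 50_000, 8, 0.15, 0.10
i_true = np.zeros((n, t))
i_true[:, 0] = 1.0 + rng.normal(0.0, 0.3, n)
for s in range(1, t):                  # serially correlated true investment
    i_true[:, s] = 0.3 + 0.7 * i_true[:, s - 1] + rng.normal(0.0, 0.3, n)
i_obs = i_true + rng.normal(0.0, 0.2, (n, t))      # white reporting errors

k = np.zeros((n, t))                   # true capital, geometric depreciation
for s in range(1, t):
    k[:, s] = (1.0 - delta) * k[:, s - 1] + i_true[:, s]
p = 0.5 + gam * k + rng.normal(0.0, 0.3, (n, t))   # profits equation (4.18)

# quasi-difference: p_t - (1-delta) p_{t-1} = const + gam * I_t + error
qd = (p[:, 2:] - (1.0 - delta) * p[:, 1:-1]).ravel()
i_t, i_lag = i_obs[:, 2:].ravel(), i_obs[:, 1:-1].ravel()
b_ols = np.cov(qd, i_t)[0, 1] / np.var(i_t)                # attenuated
b_iv = np.cov(qd, i_lag)[0, 1] / np.cov(i_t, i_lag)[0, 1]  # ~ gam
print(b_ols, b_iv)
```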
One needs to reiterate, that in these kinds of models (as is also true for the rest
of econometrics) the consistency of the final estimates depends both on the
correctness of the assumed economic model and the correctness of the assump-
tions about the error structure.10 We tend to focus here on the latter, but the
former is probably more important. For example, in Friedman’s (1957) classical
permanent income consumption function model, the estimated elasticity of con-
sumption with respect to income is a direct estimate of one minus the error ratio
(the ratio of the variance of transitory income to the variance of measured
income). But this conclusion is conditional on having assumed that the true
elasticity of consumption with respect to permanent income is unity. If that is
wrong, the first conclusion does not follow. Similarly in the profit-capital stock
example above, we can do something because we have assumed that the true
depreciation is both known and geometric. All our conclusions about the amount
of error in the investment series are conditional on the correctness of these
assumptions.
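The Friedman point is easily illustrated by simulation: under the maintained assumption of a unit elasticity with respect to permanent income, the OLS slope on measured income recovers one minus the transitory variance share, but the same number is uninformative about the error ratio once that assumption is dropped. A sketch with made-up variances:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200_000
log_yp = rng.normal(10.0, 0.5, n)            # log permanent income
log_y = log_yp + rng.normal(0.0, 0.25, n)    # measured = permanent + transitory
log_c = 0.1 + 1.0 * log_yp + rng.normal(0.0, 0.2, n)  # unit true elasticity

lam = 0.25**2 / (0.5**2 + 0.25**2)           # transitory variance share = 0.2
b = np.cov(log_c, log_y)[0, 1] / np.var(log_y)
print(b, 1.0 - lam)                          # both ~ 0.8
```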
Relative to our desires, data can be and usually are incomplete in many different
ways. Statisticians tend to distinguish between three types of “missingness”:
undercoverage, unit non-response, and item non-response (NAS, 1983). Under-
coverage relates to sample design and the possibility that a certain fraction of the
10The usual assumption of normality of such measurement and response errors may not be tenable
in many actual situations. See Ferber (1966) and Hamilton (1981) for empirical evidence on this point.
relevant population was excluded from the sample by design or accident. Unit
non-response relates to the refusal of a unit or individual to respond to a
questionnaire or interview or the inability of the interviewers to find it. Item
non-response is the term associated with the more standard notion of missing
data: questions unanswered, items not filled in, in a context of a larger survey or
data collection effort. This term is usually applied to the situation where the
responses are missing for only some fraction of the sample. If an item is missing
entirely, then we are in the more familiar omitted variables case to which I shall
return in the next section.
In this section I will concentrate on the case of partially missing data for some
of the variables of interest. This problem has a long history in statistics and
somewhat more limited history in econometrics. In statistics, most of the discus-
sion has dealt with the randomly missing, or in newer terminology, ignorable case
(see Rubin, 1976, and Little, 1982) where, roughly speaking, the desired parame-
ters can be estimated consistently from the complete data subsets and “missing
data” methods focus on using the rest of the available data to improve the
efficiency of such estimates.
The major problem in econometrics is not just missing data but the possibility
(or more accurately, probability) that they are missing for a variety of self-selec-
tion reasons. Such “behavioral missing” implies not only a loss of efficiency but
also the possibility of serious bias in the estimated coefficients of models that do
not take this into account. The recent revival of interest in econometrics in limited
dependent variables models, sample-selection, and sample self-selection problems
has provided both the theory and computational techniques for attacking this
problem. Since this range of topics is taken up in Chapter 28, I will only allude to
some of these issues as we go along. It is worth noting, however, that this area has
been pioneered by econometricians (especially Amemiya and Heckman) with
statisticians only recently beginning to follow in their footsteps (e.g. Little, 1983).
The main emphasis here will be on the no-self-selection ignorable case. It is of
some interest, because these kinds of methods are widely used, and because it
deals with the question of how one combines scraps of evidence and what one can
learn from them. Consider a simple example where the true equation of interest is
y = βx + γz + e,  (5.1)
where e is a random term satisfying the usual OLS assumptions and the constant
has been suppressed for notational ease. β and γ could be vectors and x and z
could be matrices, but I will think of them at first as scalars and vectors
respectively. For some fraction λ [= n₂/(n₁ + n₂)] of our sample we are missing
observations (responses) on x. Let us rearrange the data and call the complete
data sample A and the incomplete sample B. Assume that it is possible to
describe the probability that x is observed by a response ("presence") relationship
of the form

d = 1 if g(x, z, m; θ) + ε ≥ 0,
d = 0 if g(x, z, m; θ) + ε < 0,  (5.2)

where m is a set of additional variables that may affect the probability of
response, and to relate x to z by the auxiliary equation

x = δz + u,  (5.3)

where E(u) = 0 and E(ue) = 0. Note that as far as this equation is concerned, the
missing data problem is one of missing the dependent variable for sub-sample B.
If the probability of being present in the sample were related to the size of u, we
would be facing the kind of behavioral, self-selected missingness discussed above
rather than the ignorable case considered here.
11This section borrows heavily from Griliches, Hall and Hausman (1978).
How one estimates β, γ, and δ depends on what one is willing to assume about
the world that generated such data. There are two kinds of assumptions possible:
The first is a “regression” approach, which assumes that the parameters which are
constant across different subsamples are the slope coefficients β, γ, and δ but
does not impose the restriction that σ_e² and σ_u² are the same across all the various
subsamples. There can be heteroscedasticity across samples as long as it is
independent of the parameters of interest. The second approach, the maximum
likelihood approach, would assume that conditional on z, y and x are distributed
normally and the missing data are a random sample from such a distribution.
This implies that σ²_{e,a} = σ²_{e,b} and σ²_{u,a} = σ²_{u,b}.
The first approach starts by recognizing that under the general assumptions of
the model, sample A yields consistent estimates of β, γ, and δ with an associated
variance-covariance matrix. Then a "first order" procedure, i.e., one that estimates
the missing x's by x̂ = δ̂ₐz alone and does not iterate, is equivalent to the following:
estimate β̂ₐ, γ̂ₐ, and δ̂ₐ from sample A, rewrite the y equation as

y − β̂ₐx̂ = γz + [e + βu + ε*],  (5.5)

where ε* involves terms which are due to the discrepancy between the estimated β̂ₐ
and δ̂ₐ and their true population values, and then just estimate γ from this
"completed" sample by OLS.
It is clear that this procedure results in no gain in the efficiency of β̂, since β̂ₐ is
based solely on sample A. It is also clear that the resulting estimate of γ could be
improved somewhat using GLS instead of OLS.12
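A sketch of this first-order procedure on simulated data (the sample sizes and parameter values are loosely patterned on the wage/IQ/schooling example discussed below, but otherwise arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
n1, n2 = 1500, 700                     # complete (A) and incomplete (B) samples
n = n1 + n2
z = rng.normal(12.0, 2.0, n)           # schooling
x = 0.8 * z + rng.normal(0.0, 10.0, n) # IQ-like variable, x = delta*z + u
y = 0.005 * x + 0.06 * z + rng.normal(0.0, 0.4, n)

A = slice(0, n1)                       # x is treated as unobserved beyond n1
XA = np.column_stack([np.ones(n1), x[A], z[A]])
const_a, beta_a, gamma_a = np.linalg.lstsq(XA, y[A], rcond=None)[0]
ZA = np.column_stack([np.ones(n1), z[A]])
d0_a, delta_a = np.linalg.lstsq(ZA, x[A], rcond=None)[0]

x_hat = x.copy()
x_hat[n1:] = d0_a + delta_a * z[n1:]   # impute the missing x's from z alone
# re-estimate gamma from the pooled "completed" sample, holding beta_a fixed
Zall = np.column_stack([np.ones(n), z])
gamma_pooled = np.linalg.lstsq(Zall, y - beta_a * x_hat, rcond=None)[0][1]
print(gamma_a, gamma_pooled)           # the pooled estimate is somewhat tighter
```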
How much of a gain is there in estimating γ this way? Let the size of sample A
be N₁ and of B be N₂. The maximum (unattainable) gain in efficiency would be
proportional to (N₁ + N₂)/N₁ (when σ_u² = 0). Ignoring the contribution of the ε*'s,
which is unimportant in large samples, the asymptotic variance of γ̂ from the
pooled sample is

Var(γ̂_{a+b}) = [N₁σ² + N₂(σ² + β²σ_u²)]/[(N₁ + N₂)²σ_z̃²],

and hence

Eff(γ̂_{a+b}) = Var(γ̂_a)/Var(γ̂_{a+b}) = 1/[(1 − λ)(1 + λβ²σ_u²/σ²)],  (5.6)

where σ² = σ_e², σ_z̃² is the variance of z, and λ = N₂/(N₁ + N₂). Hence efficiency will
be improved as long as β²σ_u²/σ² < 1/(1 − λ), i.e. as long as the unpredictable part
of x (unpredictable from z) is not too important relative to σ², the overall noise
level in the y equation.13
Let us look at a few illustrative calculations. In the work to be discussed below,
y will be the logarithm of the wage rate, x is IQ, and z is schooling. IQ scores are
missing for about one-third of the sample, hence λ = 1/3. But the "importance" of
IQ in explaining wage rates is relatively small. Its independent contribution
(β²σ_u²) is small relative to the large unexplained variance in y. Typical numbers
are β = 0.005, σ_u = 12, and σ = 0.4, implying

Eff(γ̂_{a+b}) = 1/[(1 − 1/3)(1 + (1/3)·0.0225)] ≈ 1.49,

which is about equal to the 1.5 one would have gotten ignoring the terms in the
brackets. Is this a big gain in efficiency? First, the efficiency (squared) metric may
be wrong. A more relevant question is by how much can the standard error of γ̂
be reduced by incorporating sample B into the analysis. By about 18 percent
(1/√1.49 ≈ 0.82) for these numbers. Is this much? That depends on how large the
standard error of γ̂ was to start out with. In Griliches, Hall and Hausman (1978)
a sample consisting of about 1,500 individuals with complete information yielded
an estimate of γ̂_a = 0.0641 with a standard error of 0.0052. Processing another
700 plus observations could reduce this standard error to 0.0043, an impressive
but rather pointless exercise, since nothing of substance depends on knowing γ
within 0.001.
If IQ (or some other missing variable) were more important, the gain would be
even smaller. For example, if the independent contribution of x to y were on the
order of σ², then with one-third missing, Eff(γ̂_{a+b}) ≈ 9/8, and the standard deviation
of γ̂ would be reduced by only 5.7 percent. There would be no gain at all if
the missing variable were one and a half times as important as the disturbance [or
more generally if β²σ_u²/σ² > 1/(1 − λ)].
13Thus, remark 2 of Gourieroux and Monfort (1981, p. 583) is in error. The first-order method is
not always more efficient. But an "appropriately weighted first-order method," GLS, will be more
efficient. See Nijman and Palm (1985).
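These calculations amount to evaluating (5.6); a compact restatement, reproducing the numbers in the text:

```python
def eff(k, lam):
    """Relative efficiency of the pooled gamma estimate, eq. (5.6);
    k = beta^2 * var(u) / sigma^2, lam = share of the sample missing x."""
    return 1.0 / ((1.0 - lam) * (1.0 + lam * k))

k_iq = (0.005 * 12.0) ** 2 / 0.4 ** 2       # the IQ example: k ~ 0.0225
e = eff(k_iq, 1 / 3)
print(e, 1.0 - e ** -0.5)                   # ~1.49; s.e. falls by ~18 percent
e = eff(1.0, 1 / 3)                         # x as important as the disturbance
print(e, 1.0 - e ** -0.5)                   # 9/8; s.e. falls by only ~5.7 percent
print(eff(1.5, 1 / 3))                      # = 1: no gain at all
```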
The efficiency of such estimates can be improved a bit more by allowing for the
implied heteroscedasticity in these estimates and by iterating further across the
samples. This is seen most clearly by noting that sample B yields an estimate of
π = γ + βδ with an estimated standard error σ̂_π. This information can be blended
optimally with the sample A estimates of β, γ, and δ and their covariance matrix,
using non-linear techniques; maximum likelihood is one way of doing this.
If additional variables, which could be used to predict x but which do not
appear on their own accord in the y equation, were available, then there is also a
possibility of improving the efficiency of the estimated β̂ and not just of γ̂. Again,
unless these variables are very good predictors of x and unless the amount of
complete data available is relatively small, the gains in efficiency from such
methods are unlikely to be impressive. (See Griliches, Hall and Hausman, 1978,
and Haitovsky, 1968, for some illustrative calculations.)
The maximum likelihood approaches differ from the “first-order” ones by
using also the dependent variable y to “predict” the missing x’s, and by
imposing restrictions on equality of the relevant variances across the samples. The
latter assumption is not usually made or required by the first order methods, but
follows from the underlying likelihood assumption that conditional on z, x and y
are jointly normally distributed (or follow some other known distribution), and that the
missing values are missing at random. In the simple case where only one variable
is missing (or several variables are missing at exactly the same places), the joint
likelihood connecting y and x to z, which is based on the two equations
y = βx + γz + e,
x = δz + v,  (5.7)
with Ee² = σ², Ev² = η², Eev = 0, can be rewritten in terms of the marginal
distribution function of y given z and the conditional distribution function of x
given y and z, with corresponding equations:

y = cz + u,
x = dy + fz + w,  (5.8)

and Eu² = g², Ew² = h², Ewu = 0. Given the normality assumption, this is just
another way of rewriting the same model, with the new parameters related to the
old ones by

c = γ + βδ,   g² = σ² + β²η²,   d = βη²/g²,   f = δ − dc,   h² = σ²η²/g².  (5.9)

In this simple case the likelihood factors and one can estimate c and g² from the
whole sample and d, f, and h² from the complete-data portion, and solve back
uniquely for the original parameters β, γ, δ, σ², and η². In this way all of the
information available in the data is used and computation is simple, since the two
regressions (y on z in the whole sample and x on y and z in the complete data
portion) can be computed separately. Note that while x is implicitly "estimated"
for the missing portion, no actual "predicted" values of x are either computed or
used in this framework.14

Table 1
Earnings equations for NLS sisters: Various missing data estimators.ᵃ
Total sample: N = 520. The rows report, in turn, OLS on the complete-data subsample, OLS with
predicted IQ in the missing portion, GLS, and maximum likelihood; entries are coefficients with
standard errors in parentheses. OLS with predicted IQ, for example, yields 0.0423 (0.00916) for
schooling and 0.00433 (0.00148) for IQ.
ᵃData Source: The National Longitudinal Survey of Young Women (see Center for Human
Resource Research, 1979).
Table 1 illustrates the results of such computations when estimating a wage
equation for a sample of young women from the National Longitudinal Survey,
30 percent of which were missing IQ data. The first row of the table gives
14Marini et al. (1980) describe such computations in the context of more than one set of variables
missing in a nested pattern.
estimates computed solely from the complete data subsample. The second one
uses the schooling variable to estimate the missing IQ values in the incomplete
portion of the data and then re-computes the OLS estimates. The third row uses
GLS, reweighting the incomplete portion of the data to allow for the increased
imprecision due to the estimation of the missing IQ values. The last row reports
the maximum likelihood estimates. All the estimates are very close to each other.
Pooling the samples and “estimating” the missing IQ values increases the efficiency
of the estimated schooling coefficient by 29 percent. Going to maximum likeli-
hood adds another percentage point. While these gains are impressive, substan-
tively not much more is learned from expanding the sample except that no special
sample selectivity problem is caused by ignoring the missing data subset. The χ²
test for pooling yields the insignificant value of 0.8. That the samples are roughly
similar can also be seen from computing the biased schooling coefficient (ignoring
IQ) in both subsamples: it is equal to 0.057 (0.010) in the complete data subset
and 0.054 in the incomplete one.
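A sketch of the factored computation in (5.7)–(5.9) on simulated data of the same general shape (the back-solving uses the mapping in (5.9) as reconstructed above; all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(8)
n1, n2 = 1500, 700
n = n1 + n2
z = rng.normal(12.0, 2.0, n)
x = 0.8 * z + rng.normal(0.0, 10.0, n)              # x = delta*z + v
y = 0.005 * x + 0.06 * z + rng.normal(0.0, 0.4, n)  # y = beta*x + gamma*z + e

# marginal regression of y on z, whole sample: c and g^2
Z = np.column_stack([np.ones(n), z])
c = np.linalg.lstsq(Z, y, rcond=None)[0]
g2 = np.var(y - Z @ c)
# conditional regression of x on (y, z), complete-data portion only: d, f, h^2
W = np.column_stack([np.ones(n1), y[:n1], z[:n1]])
w = np.linalg.lstsq(W, x[:n1], rcond=None)[0]
d, f = w[1], w[2]
h2 = np.var(x[:n1] - W @ w)

eta2 = h2 + d ** 2 * g2                 # solve (5.9) back: var(v)
beta = d * g2 / eta2
delta = f + d * c[1]
gamma = c[1] - beta * delta
print(beta, gamma, delta)               # ~ 0.005, 0.06, 0.8
```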
The maximum likelihood computations get more complicated when the likeli-
hood does not factor as neatly as it does in the simple “nested” missing case. This
happens in at least two important common cases: (1) If the model is overidentified,
then there are binding constraints between the L(y|z, θ₁) and L(x|y, z, θ₂)
pieces of the overall likelihood function. For example, if we have an extra
exogenous variable which can help predict x but does not appear on its own in
the "structural" y equation, then there is a constraining relationship between the
θ₁ and θ₂ parameters and maximum likelihood estimation will require iterating
between the two. This is also the case for multi-equation systems where, say, x is
itself structurally endogenous because it is measured with error. (2) If the pattern
of "missingness" is not nested, that is, if observations on some variables are missing in a
number of different patterns which cannot be arranged in a set of nested blocks,
then one cannot factor the likelihood function conveniently and one must
approach the problem of estimating it directly.
There are two related computational approaches to this problem. The first is
the EM algorithm (Dempster et al., 1977). This is a general approach to
maximum likelihood estimation in which the problem is divided into an iterative
two-step procedure. In the E-step (expectation), the missing values are replaced
by their expected values conditional on the observed data and the current parameter
values of the model (in this case starting with all the available variances and
covariances); in the M-step (maximization), maximum likelihood estimates of the
model parameters are computed using the "completed" data set from the previous
step. The new parameters are then used to solve again for the missing values,
which are then used in turn to reestimate the model, and this process is continued
until convergence is achieved. While this procedure is easy to program, its
convergence can be slow, and there are no easily available standard error
estimates for the final results (though Beale and Little, 1975, indicate how they
might be derived).
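For concreteness, the following is a minimal EM sketch for the single-missing-regressor model of eq. (5.7) (Python; the function name, starting values, and the assumption of a single always-observed regressor z with no constants are all illustrative, not the authors' code):

import numpy as np

def em_missing_x(y, x, z, n_iter=500):
    miss = np.isnan(x)
    # Start by imputing missing x's from the regression of x on z in complete cases.
    d0 = (x[~miss] @ z[~miss]) / (z[~miss] @ z[~miss])
    ex = np.where(miss, d0 * z, x)   # E(x | data) under current parameters
    vx = np.zeros_like(ex)           # Var(x | data): zero where x is observed
    for _ in range(n_iter):
        # M-step: ML estimates from the "completed" sufficient statistics.
        Sxx, Sxz, Szz = (ex**2 + vx).sum(), ex @ z, z @ z
        beta, gamma = np.linalg.solve([[Sxx, Sxz], [Sxz, Szz]], [ex @ y, z @ y])
        sigma2 = np.mean((y - beta*ex - gamma*z)**2 + beta**2 * vx)
        delta = Sxz / Szz
        eta2 = np.mean((ex - delta*z)**2 + vx)
        # E-step: posterior mean and variance of each missing x.
        prec = 1.0/eta2 + beta**2/sigma2
        m = (delta*z/eta2 + beta*(y - gamma*z)/sigma2) / prec
        ex = np.where(miss, m, x)
        vx = np.where(miss, 1.0/prec, 0.0)
    return beta, gamma, delta, sigma2, eta2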
The second approach works directly with the sample moment matrix S of all the
observed variables. If Σ(θ) denotes the covariance matrix implied by the model,
then under normality maximizing the likelihood is equivalent to minimizing the
fitting function

F(θ) = log|Σ(θ)| + tr[SΣ⁻¹(θ)]   (5.10)

with respect to θ. If θ is exactly identified, the estimates are unique and can be
solved directly from the definition of Σ and the assumption that S is a consistent
estimator of it. If θ is over-identified, then the maximum likelihood procedure
"fits" the model Σ(θ) to the data S as best as possible. If the observed variables
are multivariate normal this estimator is the Full Information Maximum Likeli-
hood estimator for this model. Even if the data are not multivariate normal but
follow some other distribution with E(S|θ) = Σ(θ), this is a pseudo- or quasi-
maximum likelihood estimator yielding a consistent θ̂.15 The correctness of the
computed standard errors will depend, however, on the validity of the normality
assumption. Robust standard errors for this model can be computed using the
approach of White (1980).
There is no conceptual difficulty in generalizing this to a multiple sample
situation where the resulting Σᵢ(θᵢ) may depend on somewhat different parame-
ters. As long as these matrices can be taken as arising independently, their
respective contributions to the likelihood function can be added up, and as long
as the θᵢ's have parameters in common, there is a return from estimating them
jointly. This can be done either utilizing the multiple samples feature of LISREL-V
(see Allison, 1981, and Joreskog and Sorbom, 1981) or by extending the
MOMENTS program (Hall, 1979) to the connected-multiple matrices case. The
estimation procedure combines these different matrices and their associated pieces
of the likelihood function, and then iterates across them until a maximum is
found. (See Bound, Griliches and Hall, 1984, for more exposition and examples.)
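In skeleton form, the pooled criterion is just the sum of the samples' normal-theory fitting functions. The sketch below is illustrative only (a generic optimizer stands in for the specialized LISREL and MOMENTS routines, and sigma_fns is a hypothetical list of functions mapping the common parameter vector into each sample's implied covariance matrix):

import numpy as np
from scipy.optimize import minimize

def pooled_fit(theta, samples, sigma_fns):
    # samples: list of (S_i, n_i) pairs; sigma_fns: theta -> Sigma_i(theta).
    total = 0.0
    for (S, n), sigma_fn in zip(samples, sigma_fns):
        Sigma = sigma_fn(theta)
        _, logdet = np.linalg.slogdet(Sigma)
        total += n * (logdet + np.trace(S @ np.linalg.inv(Sigma)))
    return total

# result = minimize(pooled_fit, theta0, args=(samples, sigma_fns))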
Consider, for example, a simple sibling model in which a test score t, schooling s,
and the logarithm of the wage rate y depend on an unobserved "ability" factor
a = f + g and on a second unobserved factor h = w + u, where f and w are family
components shared by both siblings and g and u are individual-specific components:

t = a + e_t = (f + g) + e_t,
s = δa + h + e_s = δ(f + g) + (w + u) + e_s,   (5.11)
y = βa + λ(s − e_s) + e_y = π(f + g) + γ(w + u) + e_y,

where π = β + λδ, γ = λ, and τ and ρ are the ratios of the variances of the family
components to the total variances of the a and h factors respectively.
Given these assumptions, the expected value of the variance-covariance
matrix of all the observed variables across both members of a sib-pair
(t₁, s₁, y₁, t₂, s₂, y₂) is given, element by element, by

Var(t) = a² + σ_t²,   Cov(t, s) = δa²,   Cov(t, y) = πa²,
Var(s) = δ²a² + h² + σ_s²,   Cov(s, y) = δπa² + γh²,
Var(y) = π²a² + γ²h² + σ_y²,
Cov(t₁, t₂) = τa²,   Cov(t₁, s₂) = δτa²,   Cov(t₁, y₂) = πτa²,   (5.13)
Cov(s₁, s₂) = δ²τa² + ρh²,   Cov(s₁, y₂) = δπτa² + ργh²,
Cov(y₁, y₂) = π²τa² + ργ²h²,

where only the 12 distinct terms of the overall 6×6 matrix are listed, since the
others are derivable by symmetry and by the assumption that all the relevant
variances (conditional on a set of exogenous variables) are the same across sibs.
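As an illustration, the model-implied covariance matrix (5.13) can be coded as one such Σ(θ) function; this is a hypothetical sigma_fn in the sense of the sketch above, with parameter names following eqs. (5.11)-(5.13):

import numpy as np

def sib_sigma(theta):
    delta, pi, gam, tau, rho, a2, h2, st2, ss2, sy2 = theta
    # Within-sib covariance block for (t, s, y).
    W = np.array([
        [a2 + st2,  delta*a2,                pi*a2],
        [delta*a2,  delta**2*a2 + h2 + ss2,  delta*pi*a2 + gam*h2],
        [pi*a2,     delta*pi*a2 + gam*h2,    pi**2*a2 + gam**2*h2 + sy2]])
    # Cross-sib block: only the family components f and w are shared.
    B = np.array([
        [tau*a2,        delta*tau*a2,                 pi*tau*a2],
        [delta*tau*a2,  delta**2*tau*a2 + rho*h2,     delta*pi*tau*a2 + rho*gam*h2],
        [pi*tau*a2,     delta*pi*tau*a2 + rho*gam*h2, pi**2*tau*a2 + rho*gam**2*h2]])
    return np.block([[W, B], [B, W]])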
With its 10 unknown parameters (δ, π, γ, τ, ρ, a², h², σ_t², σ_s², σ_y²) this model
would be under-identified without sibling data. This type of model was estimated
by Bound, Griliches and Hall
(1984) using sibling data from the National Longitudinal Surveys of Young Men
and Young Women.16 They had to face, however, a very serious missing data
problem since much of the data, especially test scores, were missing for one or
both of the siblings. Data were complete for only 164 brothers pairs and 151
sister pairs but additional information subject to various patterns of “missing-
ness” was available for 315 more male and 306 female siblings pairs and 2852 and
3398 unrelated male and female respondents respectively. Their final estimates
were based on pooling the information from 15 different matrices for each sex
and were used to test the hypothesis that the unobserved factors are the same for
both males and females in the sense that their loadings (coefficients) are similar in
the male and female versions of the model and that the implied correlation
between the male and female family components of these factors was close to
unity. The latter test utilized the cross-sex cross-sib covariances arising from the
brother-sister pairs (N = 774) in these panels.
Such pooling of data reduced the estimated standard errors of the major
coefficients of interest by about 20 to 40 percent without changing the results
significantly from those found solely in their “complete data” subsample. Their
major substantive conclusion was that taking out the mean differences in wages
between young males and females, one could not detect significant differences in
the impact of the unobservables or in their patterns between the male and female
portions of their samples. As far as the IQ-Schooling part of the model is
concerned, families and the market appeared to be treating brothers and sisters
identically.
A class of similar problems occurs in the time series context: missing data at
some regular time intervals, the “construction” of quarterly data from annual
data and data on related time series, and other “interpolation” type issues. Most
of these can be tackled using adaptations of the methods described above, except
for the fact that there is usually more information available on the missing values
and it makes sense to adapt these methods to the structure of the specific
problem. A major reference in this area is Chow and Lin (1971). More recent
references are Harvey and Pierse (1982) and Palm and Nijman (1984).
6. Missing variables

"Ask not what you can do to the data but rather what the data can do
for you."
Every econometric study is incomplete. The stated model usually lists only the
“major” variables of interest and even then it is unlikely to have good measures
for all of the variables on the already foreshortened list. There are several ways in
16The cited paper uses a more detailed 4-equation model based on an additional "early" wage rate.
which econometricians have tried to cope with these facts of life: (1) Assume that
the left-out components are random, minor, and independent of all the included
exogenous variables. This throws the problem into the “disturbance” and leaves it
there, except for possible considerations of heteroscedasticity, variance-compo-
nents, and similar adjustments, which impinge only on the efficiency of the usual
estimates and not on their consistency. In many contexts it is difficult, however, to
maintain the fiction that the left-out-variables are unrelated to the included ones.
One is pushed then into either (2) a specification sensitivity analysis, where the
direction and magnitude of possible biases are explored using prior information,
scraps of evidence, and the standard left-out-variable bias formulae (Griliches,
1957, and Chapter 5), or (3) an attempt to transform the data so as to minimize the
impact of such biases.
In this section, I will concentrate on this third way of coping which has used
the increasingly available panel data sets to try to get around some of these
problems. Consider, then, the standard panel data set-up:

y_it = β′x_it + z_i + e_it,   i = 1,…,N;  t = 1,…,T,

where the β's are assumed to be constant both across individuals (β_i = β) and
over time (β_t = β). Both of these assumptions are in principle testable, but are
rarely questioned in practice. Unless there is some kind of stability in β, unless
there is some interest in its central moments, it is not clear why one would engage
in estimation at all.
Since the longitudinal dimension of such data is usually quite short (2-10 years),
it makes little sense to allow β to change over time, unless one has a reasonably
clear idea and a parsimonious parameterization of how such changes happen.
(The fact that the β's are just coefficients of a first order linear approximation to
a more complicated functional relationship, and hence should change as the level
of the x's changes, can be allowed for by expanding the list of x's to contain higher
order terms.)
The assumption that β_i = β, that all individuals respond alike (up to the
additive terms, the z_i, which can differ across individuals), is one of the more
Y(t)Zjl=Zj, (6.4)
I have only two cautionary comments on this topic: As is true in many other
contexts, and as was noted earlier, solving one problem may aggravate another. If
there are two sources of bias, e.g. both "fixed" effects and errors in variables,
then

plim(β̂ − β) = b_αx − βλ_x,   (6.7)

where b_αx is the auxiliary regression coefficient in the projection of the α_i's on the
x's, while λ_x = σ_m²/σ_x² is the error variance ratio in x. Going "within", on the
other hand, would eliminate the first term but would leave us with a pure, and
typically larger, errors-in-variables bias, since the within transformation removes
much of the signal variance in x without reducing the measurement error variance.
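A small simulation makes the trade-off concrete (all numbers are invented for illustration): with an individual effect correlated with x and with measurement error in x, the levels estimator is biased upward while the within estimator, though free of the first bias, suffers a larger attenuation bias:

import numpy as np

rng = np.random.default_rng(0)
N, T, beta = 20000, 5, 1.0
alpha = rng.normal(size=(N, 1))                      # correlated individual effect
w = rng.normal(size=(N, 1))                          # persistent component of true x
x_true = 0.5*alpha + w + 0.5*rng.normal(size=(N, T))
x_obs = x_true + 0.5*rng.normal(size=(N, T))         # measurement error
y = beta*x_true + alpha + rng.normal(size=(N, T))

b_levels = (x_obs * y).sum() / (x_obs**2).sum()      # levels estimator, no intercept
xw = x_obs - x_obs.mean(axis=1, keepdims=True)       # within transformation
yw = y - y.mean(axis=1, keepdims=True)
b_within = (xw * yw).sum() / (xw**2).sum()
print(b_levels, b_within)   # levels biased above 1, within attenuated toward 0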
Consider next the problem of estimating distributed lag models from such panels:17

y_it = β₀x_it + β₁x_{i,t−1} + β₂x_{i,t−2} + ⋯ + e_it,

where the constancy of the β's is imposed across individuals and across time. The
empirical problem is how one estimates, say, 9 β's if one only has four to five
17The following discussion borrows heavily from Pakes and Griliches (1984).
years of history on the y's and x's. In general this is impossible. If the length of the
lag structure exceeds the available data, then the data cannot be informative
about the unseen tail of the lag distribution without the imposition of stronger
a priori restrictions. There are at least two ways of doing this: (a) We can assume
something strong about the β's, for example, that they decline geometrically
after a few free terms: β_{τ+1} = λβ_τ. This leads us back to the geometric lag
case which we know more or less how to handle.18 (b) We can assume something
about the unseen x's: that they were constant in the past (in which case we are
back to the fixed effects with a changing coefficient case), or that they follow some
simple low order autoregressive process (in which case their influence on the
included x's dies out after a few terms).
Before proceeding along these lines, it is useful to recall the notion of the
Π-matrix, introduced in Chapter 22, which summarizes all the (linear) informa-
tion contained in the standard time series-cross section panel model. This
approach, due to Chamberlain (1982), starts with the set of unconstrained
multivariate regressions, relating each year's y_it to all of the available x's, past,
present, and future. Consider, for example, the case where data on y are available
for only three years (T = 3) and on x's for four. Then the Π matrix consists of
the coefficients in the following set of regressions:

y_i1 = π₁₃x_i3 + π₁₂x_i2 + π₁₁x_i1 + π₁₀x_i0 + v_i1,
y_i2 = π₂₃x_i3 + π₂₂x_i2 + π₂₁x_i1 + π₂₀x_i0 + v_i2,
y_i3 = π₃₃x_i3 + π₃₂x_i2 + π₃₁x_i1 + π₃₀x_i0 + v_i3,
where we have ignored constants to simplify matters. Now all that we know from
our sample about the relationship of the y's to the x's is summarized in these π's
(or equivalently in the overall correlation matrix between all the y's and the x's),
and any model that we shall want to fit will impose a set of constraints on it.19
A series of increasingly complex possible worlds can be written as:

a. y_it = β₀x_it + β₁x_{i,t−1} + e_it,
b. y_it = β₀x_it + β₁x_{i,t−1} + α_i + e_it,
c. y_it = β₀x_it + β₁(x_{i,t−1} + λx_{i,t−2} + λ²x_{i,t−3} + ⋯) + e_it,
d. y_it = β₀x_it + β₁(x_{i,t−1} + λx_{i,t−2} + λ²x_{i,t−3} + ⋯) + α_i + e_it,   (6.13)
e. y_it = β₀x_it + β₁x_{i,t−1} + β₂x_{i,t−2} + β₃x_{i,t−3} + β₄x_{i,t−4} + ⋯ + e_it,
f. y_it = β₀x_it + β₁x_{i,t−1} + β₂x_{i,t−2} + β₃x_{i,t−3} + β₄x_{i,t−4} + ⋯ + α_i + e_it,
18See Anderson and Hsiao (1982) and Bhargava and Sargan (1983).
19There may be, of course, additional useful information in the separate correlation matrices
between all of the y's and all the x's respectively.
going from the simple one lag, no fixed effects case (a) to the arbitrary lag
structure with the one factor correlated effects structure (f). For each of these
cases we can derive the expected value of Π. It is obvious that (a) implies

         ⎡ 0     0     β₀    β₁ ⎤
Π(a) =   ⎢ 0     β₀    β₁    0  ⎥
         ⎣ β₀    β₁    0     0  ⎦

(with columns ordered x₃, x₂, x₁, x₀). For the b case, which adds the fixed effects,
we need to define the wide sense least squares projection (E*) of the unseen
effects (α_i) on all the available x's,

E*(α_i | x_i3, x_i2, x_i1, x_i0) = δ₃x_i3 + δ₂x_i2 + δ₁x_i1 + δ₀x_i0.

Then

         ⎡ δ₃       δ₂       δ₁+β₀    δ₀+β₁ ⎤
Π(b) =   ⎢ δ₃       δ₂+β₀    δ₁+β₁    δ₀    ⎥
         ⎣ δ₃+β₀    δ₂+β₁    δ₁       δ₀    ⎦ .
To write down the Π matrix for c, the geometric lag case, we rewrite (6.11) so
that the influence of the unseen, presample x's is collected in a "truncation
remainder," which is in turn projected on the observed x's with coefficients
m₃, m₂, m₁, m₀; the first row of the implied Π matrix is then
(m₃, m₂, m₁+β₀, m₀+β₁). In this case we have seven unknown parameters to
estimate (4 m's, 2 β's, and λ) from the 12 unconstrained Π coefficients.20
Adding fixed effects on top of this, as in d, adds another four coefficients to be
estimated and strains identification to its limit. This may be feasible with larger T
but the data are unlikely to distinguish well between fixed effects and slowly
changing initial effects, especially in short panels.
Perhaps a more interesting version is represented by (6.13e), where we are
unwilling to assume an explicit form for the lag distribution since that happens to
be exactly the question we wish to investigate, but are willing instead to assume
something restrictive about the behavior of the x’s in the unseen past; specifically
that they follow an autoregressive process of low order. In the example sketched
out, we never see x₋₁, x₋₂ and x₋₃, and hence cannot identify β₄ (or even β₃),
but may be able to learn something about β₀, β₁, and β₂. If the x's follow a first
order autoregressive process, then it can be shown (see Pakes and Griliches, 1984)
that in the projection of x₋τ on all the observed x's,

E*(x₋τ | x₃, x₂, x₁, x₀) = c₃x₃ + c₂x₂ + c₁x₁ + c₀x₀,

only the last coefficient, c₀, is non-zero, since the partial correlation of x₋τ with all
the subsequent x's is zero, given its correlation with x₀. If the x's had followed a
higher order autoregression, say third order, then the last three coefficients would
be non-zero. In the first order case, then, only β₀, β₁ and β₂ are identified from
the data in the resulting Π matrix. Estimation proceeds
by leaving the last column of Π free and constraining the rest of it to yield the
parameters of interest.21 If we had assumed that the x's are AR(2), we would be
able to identify only the first two β's, and would have to leave the last two
columns of Π free.
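This projection property is easy to verify by simulation (parameter values arbitrary): regressing a presample x on the four observed x's recovers a coefficient of ρ on the earliest observed x and approximately zero elsewhere:

import numpy as np

rng = np.random.default_rng(1)
N, rho = 200000, 0.7
x = rng.normal(size=(N, 6))
for t in range(1, 6):                     # build stationary AR(1) columns
    x[:, t] = rho*x[:, t-1] + np.sqrt(1 - rho**2)*x[:, t]
x_pre = x[:, 1]                           # a presample value, x_{-1} say
X_obs = x[:, 2:6]                         # the observed x_0, ..., x_3
coefs = np.linalg.lstsq(X_obs, x_pre, rcond=None)[0]
print(np.round(coefs, 3))                 # approximately (rho, 0, 0, 0)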
20An alternative approach would take advantage of the geometric nature of the lag structure, and
use lagged values of the dependent variable to solve out the unobserved z's. Using the lagged
dependent variables formulation would introduce both an errors-in-variables problem (since y_{t−1}
proxies for z subject to the e_{t−1} error) and a potential simultaneity problem due to their correlation
with the α's (even if the α's are not correlated with the x's). Instruments are available, however, in
the form of past y's and future x's, and such a system is estimable along the lines outlined by
Bhargava and Sargan (1983).
21This is not fully efficient. If we really believe that the x's follow a low order Markov process with
stable coefficients over time (which is not necessary for the above), then the equations for x can be
appended to this model and all the parameters estimated jointly, constraining this column of Π also.
where one of the m's has been normalized to unity. The first three β's should be
identified in this model, but in practice it may be rather hard to distinguish
between all these parameters unless T is significantly larger than 3, the underlying
samples are large, and the x's are not too collinear.
Following Chamberlain, the basic procedure in this type of model is first to
estimate the unconstrained version of the Π matrix, derive its correct
variance-covariance matrix, allowing for the heteroscedasticity introduced by our
having thrust those parts of the α_i and z_i which are uncorrelated with the x's into
the random term (using the formulae in Chamberlain, 1982, or White, 1980), and
then impose and test the constraints implied by the specific version deemed
relevant.
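In outline, the two steps might look as follows (a Python sketch under simplifying assumptions: balanced data, the same regressor matrix in every equation, and no constants; the function names are made up, and pi_model stands for whichever constrained parameterization, such as (6.13c), is being imposed):

import numpy as np
from scipy.optimize import minimize

def unconstrained_pi(Y, X):
    # Y: N x T matrix of y's; X: N x K matrix of all available x's.
    XtXinv = np.linalg.inv(X.T @ X)
    Pi = (XtXinv @ X.T @ Y).T                   # T x K unconstrained coefficients
    U = Y - X @ Pi.T
    # White-type robust covariance of the row-stacked elements of Pi.
    meat = sum(np.kron(np.outer(u, u), np.outer(xi, xi)) for u, xi in zip(U, X))
    bread = np.kron(np.eye(Y.shape[1]), XtXinv)
    return Pi, bread @ meat @ bread

def minimum_distance(Pi, V, pi_model, theta0):
    # Impose the constraints Pi = pi(theta) by minimum distance.
    pi_hat, Vinv = Pi.ravel(), np.linalg.inv(V)
    def obj(th):
        d = pi_hat - pi_model(th).ravel()
        return d @ Vinv @ d
    return minimize(obj, theta0)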
Note that it is quite likely (in the context of larger T) that the test will reject all
the constraints at conventional significance levels. This indicates that the underly-
ing hypothesis of stability over time of the relevant coefficients may not really
hold. Nevertheless, one may still use this framework to compare among several
more constrained versions of the model to see whether the data indicate, for
example, that “if you believe in a distributed lag model with fixed coefficients,
then two terms are better than one.”
Some of these ideas are illustrated in the following empirical example which
considers the ubiquitous question of “capital.” What is the appropriate way to
define it and measure it? This is, of course, an old and much discussed question to
which the theoretical answer is that in general it cannot be done in a satisfactory
fashion (Fisher, 1969) and that in practice it depends very much on the purpose at
hand (Griliches, 1963). There is no intention of reopening the whole debate here
(see the various papers collected in Usher 1980 for a review of the recent state of
this topic); the focus is rather on the much narrower question of what is the
appropriate functional form for the depreciation or deterioration function used in
the construction of conventional capital stock measures. Almost all of the data
used empirically are constructed on the basis of conventional “length of life”
assumptions developed for accounting and tax purposes and based on very little
direct evidence on the pattern of capital services over time. These accounting
estimates are then taken to imply rather sharp declines in the service flows of
capital over time using either the straight line or double declining balance
depreciation formulae. Whatever independent evidence there is on this topic
comes largely from used assets markets and is heavily contaminated by the effects
of obsolescence due to technical improvements in newer assets.
Pakes and Griliches (1984) present some direct empirical evidence on this
question. In particular they asked: What is the time pattern of the contribution of
past investments to current profitability? What is the shape of the “deterioration
of services with age function” (rather than the “decline in present value”
patterns)? All versions of capital stock measures can be thought of as weighted
sums of past investments:
K_t = Σ_τ w_τ I_{t−τ},   (6.18)
and expected gross profits are taken to be proportional to the services of this
stock, so that

Π_t = Σ_τ w_τ I_{t−τ} + e_t,   (6.19)

where e_t is the ex post discrepancy between expected and actual profits, assumed
to be uncorrelated with the ex ante optimally chosen I's. Given a series on Π_t
and I_t, in principle one could estimate all the w parameters, except for the
problem that one rarely has a long enough series to estimate them individually,
especially in the presence of rather high multicollinearity in the I's. Pakes and
Griliches used panel data on U.S. firms to get around this problem, which greatly
increases the available degrees of freedom. But even then, the available panel data
are rather short in the time dimension (at least relative to the expected length of
life of manufacturing capital) and hence some of the methods described above
have to be used.
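For reference, building such weighted sums is straightforward; the sketch below constructs eq. (6.18) stocks under straight line and double declining balance weight patterns (the service life and the investment series are invented for illustration):

import numpy as np

def capital_stock(invest, weights):
    # K_t = sum over tau of w_tau * I_{t-tau}; invest is ordered oldest first.
    K = np.zeros(len(invest))
    for t in range(len(invest)):
        for tau in range(min(len(weights), t + 1)):
            K[t] += weights[tau] * invest[t - tau]
    return K

life = 10
w_straight = 1 - np.arange(life)/life                # straight line decline
w_declining = (1 - 2/life) ** np.arange(life)        # double declining balance
I = np.ones(15)
print(capital_stock(I, w_straight)[-1], capital_stock(I, w_declining)[-1])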
They used data on the gross profits of 258 U.S. manufacturing firms for the
nine years 1964-72 and their gross investment (deflated) for 11 years, 1961-71.
Profits were deflated by an overall index of the average gross rate of return
(1972 = 100) taken from Feldstein and Summers (1977), and all the observations
were weighted inversely to the sum of investment over the whole 1961-71 period
to adjust roughly for the great heteroscedasticity in this sample. Model (6.13f) of
the previous section was used. That is, they tried to estimate as many uncon-
strained w terms as possible, asking whether these coefficients in fact decline as
rapidly as is assumed by the standard depreciation formulae. To identify the
model, it was assumed that in the unobserved past the I's followed a low order
autoregressive process.
22For a methodologically related study see Hall, Griliches and Hausman (1983), which tried to figure
out whether there is a significant "tail" to the lag structure of patents as a function of past R&D
expenditures.
Table 2
The relationship of profits to past investment expenditures for U.S. manufacturing firms:
Parameter estimates allowing for heterogeneity.*
Ω̂ = estimated covariance matrix of the disturbances from the system of profit equations (across
years). For the free Π matrix: trace Ω̂ = 253.6.
*The dependent variable is gross operating income deflated by the implicit GNP deflator and an
index of the overall rate of return in manufacturing (1972 = 1.0). The w_τ refer to the coefficients of
gross investment expenditures in period t − τ, deflated by the implicit GNP producer durable
investment deflator. k_n and k_g are deflated Compustat measures of net and gross capital at the
beginning of the year. k_g,61 refers to undeflated gross capital in 1961 as reported by Compustat. All
variables are divided by the square root of the firm's mean investment expenditures over the 1961-71
period. Dummy variables for the nine time periods are included in all equations. N = 258 and T = 9.
The overall fit, measured by 1 − (trace Ω̂/1208.4), where 1208.4 = Σ_t s²_yt and s²_yt is the sample
variance of y_t, is 0.72 for the model in Column 2 as against 0.79 for the free Π matrix.
From: Pakes and Griliches (1984).
7. Final remarks
Over 30 years ago Morgenstern (1950) asked whether economic data were
accurate enough for the purposes that economists and econometricians were using
them for. He raised serious doubts about the quality of many economic series and
implicitly about the basis for the whole econometrics enterprise. Years have
passed and there has been very little coherent response to his criticisms.
There are basically four responses to his criticism and each has some merit: (1)
The data are not that bad. (2) The data are lousy but it does not matter. (3) The
data are bad but we have learned how to live with them and adjust for their
foibles. (4) That is all there is. It is the only game in town and we have to make
the best of it.
There clearly has been great progress both in the quality and quantity of the
available economic data. In the U.S. much of the agricultural statistical data
collection has shifted from judgment surveys to probability based survey sam-
pling. The commodity coverage in the various official price indexes has been
greatly expanded and much more attention is being paid to quality change and
other comparability issues. Decades of criticisms and scrutiny of official statistics
have borne some fruit. Also, some of the aggregate statistics now have much more
extensive micro-data underpinnings. It is now routine, in the U.S., to collect large
periodic surveys of labor force activity and related topics and to release the basic
micro-data for detailed analysis with relatively short lags. But both the improve-
ments in and the expansion of our data bases have not really disposed of the
questions raised by Morgenstern. As new data appear, as new data collection
methods are developed, the question of accuracy persists. While the quality of some
of the "central" data has improved, it is easy to replicate some of Morgenstern's
horror stories even today. For example, in 1982 the U.S. trade deficit with Canada
was either $12.8 or $7.9 billion depending on whether this number came from
U.S. or Canadian publications. It is also clear that the national income statistics
for some of the LDC’s are more political than economic documents (Vernon,
1983).23
Morgenstern did not distinguish adequately between levels and rates of change.
Many large discrepancies represent definitional differences and studies that are
mostly interested in the movements in such series may be able to evade much of
this problem. The tradition in econometrics of allowing for “constants” in most
relationships, and of not over-interpreting them, allows implicitly for permanent
23See also Prakash (1974) for a collection of confidence shattering comparisons of measures of
industrial growth and trade for various developing countries based on different sources.
“errors” in the levels of the various series. It is also the case that in much of
economic analysis one is after relatively crude first order effects and these may be
rather insensitive even to significant inaccuracies in the data. While this may be
an adequate response with respect to much of the standard, especially macroeconomic,
analysis, it seems inadequate when we contemplate some of the more
recent elaborate non-linear multi-equational models being estimated at the fron-
tier of the subject. They are much more likely to be sensitive to errors and
inconsistencies in the data.
In the recent decade there has been a revival of interest in “error” models in
econometrics, though the progress in sociology on this topic seems more impres-
sive. Recent studies using micro-data from labor force surveys, negative-tax
experiments and similar data sources exhibit much more sensitivity to measure-
ment error and sample selectivity problems. Even in the macro area there has
been some progress (see de Leeuw and McKelvey, 1983) and the “rational
expectations” wave has made researchers more aware of the discrepancy between
observed data and the underlying forces that are presumably affecting behavior.
All of this has yet to make a major dent in econometric textbooks and
econometric teaching but there are signs that change is coming.24 It is more
visible in the areas of discrete variable analysis and sample selectivity issues, (e.g.
note the publication of the Maddala (1983) and Manski-McFadden (1981)
monographs) than in the errors of measurement area per se, but the increased
attention that is devoted to data provenance in these contexts is likely to spill over
into a more general data “aware” attitude.
One of the reasons why Morgenstern's accusations were brushed off was that
they came from “outside” and did not seem sensitive to the real difficulties of
data collection and data generation. In most contexts the data are imperfect not
by design but because that is all there is. Empirical economists have over
generations adopted the attitude that having bad data is better than having no
data at all, that their task is to learn as much as is possible about how the world
works from the unquestionably lousy data at hand. While it is useful to alert users
to their various imperfections and pitfalls, the available economic statistics are
our main window on economic behavior. In spite of the scratches and the
persistent fogging, we cannot stop peering through it and trying to understand
24Theil (1978) devotes five pages out of 425 to this range of problems. Chow (1983) devotes only six
pages out of 400 to this topic directly, but does return to it implicitly in the discussion of rational
expectations models. Dhrymes (1974) does not mention it explicitly at all, though some of it is implicit
in his discussion of factor analysis. Dhrymes (1978) does devote about 25 pages out of 500 to this
topic. Maddala (1977) and Malinvaud (1980) devote separate chapters to the EVM, though in both
cases these chapters represent a detour from the rest of the book. The most extensive textbook
treatment of the EVM and related topics appears in a chapter by Judge et al. (1980). The only book
that has some explicit discussion of economic data is Intriligator (1978). Except for the sample
selection literature there is rarely any discussion of the processes that generate economic data and the
resultant implications for econometric practice.
what is happening to us and to our environment, nor should we. The problematic
quality of economic data presents a continuing challenge to econometricians. It
should not cause us to despair, but we should not forget it either.
In this somewhat disjointed survey, I discussed first some of the long standing
problems that arise in the encounter between the practicing econometrician and
the data available to him. I then turned to the consideration of three data related
topics in econometrics: errors of measurement, missing observations and incom-
plete data sets, and missing variables. The last topic overlapped somewhat with
the chapter on panel analysis (Chapter 22), since the availability of longitudinal
microdata has helped by providing us with one way of controlling for missing but
relatively constant information on individuals and firms. It is difficult, however, to
shake off the impression that here also, the progress of econometric theory and
computing ability is outracing the increased availability of data and our under-
standing and ability to model economic behavior in increasing detail. While we
tend to look at the newly available data as adding degrees of freedom grist to our
computer mills, the increased detail often raises more questions than it answers.
Particularly striking is the great variety of responses and differences in behavior
across firms and individuals. Specifying additional distributions of unseen param-
eters rarely adds substance to the analysis. What is needed is a better understand-
ing of the behavior of individuals, better theories and more and different
variables. Unfortunately, standard economic theory deals with “representative”
individuals and “big” questions and does not provide much help in explaining the
production or hiring behavior of a particular plant at a particular time, at least
not with the help of the available variables. Given that our theories, while
couched in micro-language, are not truly micro-oriented, perhaps we should not
be asking such questions. Then what are we doing with microdata? We should be
using the newly available data sets to help us find out what is actually going on in
the economy and in the sectors that we are analyzing without trying to force our
puny models on them.25 The real challenge is to try to stay open, to learn from
the data, but also, at the same time, not drown in the individual detail. We have
to keep looking for the forest among all these trees.
References
Aasness, J. (1983) "Engel Functions, Distribution of Consumption and Errors in Variables". Paper
presented at the European Meeting of the Econometric Society in Pisa, Oslo: Institute of Econom-
ics.
Aigner, D. J. (1973) “Regression with a Binary Independent Variable Subject to Errors of Observa-
tion", Journal of Econometrics, 17, 49-59.
25An important issue not discussed in this chapter is the testing of models which is a way of staying
open and allowing the data to reject our stories about them. There is a wide range of possible tests that
models can and should be subjected to. See, e.g. Chapters 5, 13, 14, 15, 18, 19, and 33 and Hausman
(1978) and Hendry (1983).
Allison, P. D. (1981) “Maximal Likelihood Estimation in Linear Models When Data Are Missing”,
Sociological Methodology.
Anderson, T. W. and C. Hsiao (1982) "Formulation and Estimation of Dynamic Models Using Panel
Data”, Journal of Econometrics, 18(l), 47-82.
Beale, E. M. L. and R. J. A. Little (1975) “Missing Values in Multivariate Analysis”, Journal of the
Royal Statistical Society, Ser. B., 37, 129-146.
Berkson, J. (1950) "Are There Two Regressions?", Journal of the American Statistical Association, 45,
164-180.
Bhargava, A. and D. Sargan (1983) “Estimating Dynamic Random Effects Models from Panel Data
Covering Short Time Periods", Econometrica, 51(6), 1635-1660.
Bielby, W. T., R. M. Hauser and D. L. Featherman (1977) "Response Errors of Non-Black Males in
Models of the Stratification Process”, in: Aigner and Goldberger, eds., Latent Variables in Socioeco-
nomic Models. Amsterdam: North-Holland Publishing Company, 227-251.
Borus, M. E. (1982) “An Inventory of Longitudinal Data Sets of Interest to Economists”, Review of
Public Data Use, lO(l-2), 113-126.
Borus, M. E. and G. Nestel (1973) “Response Bias in Reports of Father’s Education and Socioeco-
nomic Status”, Journal of the American Statistical Association, 68(344), 816-820.
Bound, J., Z. Griliches and B. H. Hall (1984) “Brothers and Sisters in the Family and Labor Market”.
NBER Working Paper No. 1476. Forthcoming in International Economic Review.
Bowles, S. (1972) “Schooling and Inequality from Generation to Generation”, Journal of Political
Economy, Part II, 80(3), S219-S251.
Center for Human Resource Research (1979) The National Longitudinal Survey Handbook. Columbus:
Ohio State University.
Chamberlain, Gary (1977) “An Instrumental Variable Interpretation of Identification in Variance
Components and MIMIC Models”, Chapter 7, in: P. Taubman, ed., Kinometrics. Amsterdam:
North-Holland Publishing Company, 235-254.
Chamberlain, Gary (1980) "Analysis of Covariance with Qualitative Data", Review of Economic
Studies, 47(1), 225-238.
Chamberlain, Gary (1982) “Multivariate Regression Models for Panel Data”, Journal of Econometrics,
18(l), 5-46.
Chamberlain, G. and Z. Griliches (1975) “Unobservables with a Variance-Components Structure:
Ability, Schooling and the Economic Success of Brothers”, International Economic Review, 16(2),
422-449.
Chamberlain, Gary (1977) “More on Brothers”, in: P. Taubman, ed., Kinometrics: Determinants of
Socioeconomic Success Within and Between Families. New York: North-Holland Publishing Com-
pany, 97-124.
Chow, G. C. (1983) Econometrics. New York: McGraw Hill.
Chow, G. C. and A. Lin (1971) “Best Linear Unbiased Interpolation, Distribution and Extrapolation
of Time Series by Related Series”, Review of Economics and Statistics, 53(4), 372-375.
Cole, R. (1969) Error in Provisional Estimates of Gross National Product. Studies in Business Cycles
#21, New York: NBER.
Council on Wage and Price Stability (1977) The Wholesale Price Index: Review and Evaluation.
Washington: Executive Office of the President.
Court, A. T. (1939) “Hedonic Price Indexes with Automotive Examples”, in: The D.ynamics of
Automobile Demand. New York: General Motors Corporation, 99-117.
de Leeuw, F. and M. J. McKelvey (1983) “A ‘True’ Time Series and Its Indicators”, Journal of the
American Statistical Association, 78(381), 37-46.
Dempster, A. P., N. M. Laird and D. B. Rubin (1977) “Maximum Likelihood from Incomplete Data
via the EM Algorithm”, Journal of the Royal Statistical Society, Ser. B, 39(l), l-38.
Dhrymes, P. J. (1974) Econometrics. New York: Springer-Verlag.
Dhrymes, P. J. (1978) Introductory Econometrics. New York: Springer-Verlag.
Diewert, W. E. (1980) “Aggregation Problems in the Measurement of Capital”; in: D. Usher, ed., The
Measurement of Capital, Studies in Income and Wealth. University of Chicago Press for NBER, 45,
433-538.
Eicker, F. (1967) “Limit Theorems for Regressions with Unequal and Dependent Errors”, in:
Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability. Berkeley:
University of California, Vol. 1.
Feldstein, M. and L. Summers (1977) “Is the Rate of Profit Falling?“, Brookings Papers on Economic
Activity, 211-227.
Ferber, R. (1966) “The Reliability of Consumer Surveys of Financial Holdings: Demand Deposits”,
Journal of the American Statistical Association, 61(313), 91-103.
Fisher, F. M. (1969) “The Existence of Aggregate Production Functions”, Econometrica, 37(4),
553-577.
Fisher, F. M. (1980) “The Effect of Sample Specification Error on the Coefficients of ‘Unaffected’
Variables” in: L. R. Klein, M. Nerlove and S. C. Tsiang, eds., Quantitative Economics and
Development. New York: Academic Press, 157-163.
Freeman, R. B. (1984) “Longitudinal Analyses of the Effects of Trade Unions”, Journal of Labor
Economics, 2(l), l-26.
Friedman, M. (1957) A Theory of the Consumption Function. NBER General Series 63, Princeton:
Princeton University Press.
Frisch, R. (1934) Statistical Confluence Analysis by Means of Complete Regression Systems. Oslo:
University Economics Institute, Publication No. 5.
Gordon, R. J. (1982) “Energy Efficiency, User-Cost Change, and the Measurement of Durable Goods
Prices", in: M. Foss, ed., NBER Studies in Income and Wealth, The U.S. National Income and
Product Accounts. Chicago: University of Chicago Press, 47, 205-268.
Gordon, R. J. (1985) The Measurement of Durable Goods Prices, unpublished manuscript.
Gourieroux, C. and A. Monfort (1981) “On the Problem of Missing Data in Linear Models”, Review
of Economic Studies, XLVIII(4), 579-586.
Griliches, Z. (1957) “Specification Bias in Estimates of Production Functions”, Journal of Farm
Economics, 39(l), 8-20.
Griliches, Z. (1961) “Hedonic Price Indexes for Automobiles: An Econometric Analysis of Quality
Change”, in: The Price Statistics of the Federal Government, NBER, 173-196.
Griliches, Z. (1963) "Capital Stock in Investment Functions: Some Problems of Concept and
Measurement", in: C. Christ et al., eds., Measurement in Economics: Studies in Mathematical
Economics and Econometrics in Memory of Yehuda Grunfeld. Stanford: Stanford University Press,
115-137.
Griliches, Z. (1970) “Notes on the Role of Education in Production Functions and Growth
Accounting”, in: W. L. Hansen, ed., Education, Income and Human Capital. NBER Studies in
Income and Wealth. 35, 71-127.
Griliches, Z. (1971) Price Indexes and Quality Change. Cambridge: Harvard University Press.
Griliches, Z. (1974) “Errors in Variables and Other Unobservables”, Econometrica, 42(6), 971-998.
Griliches, Z. (1977) “Estimating the Returns to Schooling: Some Econometric Problems”,
Econometrica, 45(l), l-22.
Griliches, Z. (1979) “Sibling Models and Data in Economics: Beginnings of a Survey”, Journal of
Political Economy, Part 2, 87(5), S37-S64.
Griliches, Z., B. H. Hall and J. A. Hausman (1978) “Missing Data and Self-Selection in Large
Panels”, Annales de L’INSEE, 30-31, 138-176.
Griliches, Z. and J. A. Hausman (1984) “Errors-in-Variables in Panel Data”, NBER Technical Paper
No. 37, forthcoming in Journal of Econometrics.
Griliches, Z. and J. Mairesse (1984) “Productivity and R&D at the Firm Level”, in: Z. Griliches, ed.,
R&D, Patents and Productivity. NBER, Chicago: University of Chicago Press, 339-374.
Griliches, Z. and W. M. Mason (1972) “Education, Income and Ability”, Journal of Political
Economy, Part II, 80(3), S74-S103.
Griliches, Z. and V. Ringstad (1970) “Error in the Variables Bias in Non-Linear Contexts”,
Econometrica, 38(2), 368-370.
Griliches, Z. and V. Ringstad (1971) Economies of Scale and the Form of the Production Function. Amsterdam:
North-Holland.
Haitovsky, Y. (1968) “Estimation of Regression Equations When a Block of Observations is Missing”,
ASA, Proceedings of the Business and Economic Statistics Section, 454-461.
Haitovsky, Y. (1972) "On Errors of Measurement in Regression Analysis in Economics", Interna-
tional Statistical Review, 40(1), 23-35.
Hall, B. H. (1979) Moments: The Moment Matrix Processor User Manual. Stanford, California.
Hall, B. H., Z. Griliches and J. A. Hausman (1983) "Patents and R&D: Is There a Lag Structure?".
NBER Working Paper No. 1227.
Hamilton, L. C. (1981) “Self Reports of Academic Performance: Response Errors Are Not Well
Behaved”, Sociological Methods and Research, 10(2), 165-185.
Harvey, A. C. and R. G. Pierse (1982) “Estimating Missing Observations in Economic Time Series”.
London: London School of Economics Econometrics Programme Discussion Paper No. A33.
Hauser, R. M. and A. S. Goldberger (1971) “The Treatment of Unobservable Variables in Path
Analysis”, in: H. L. Costner, ed., Sociological Methodology 1971. San Francisco: Jossey-Bass,
81-117.
Hausman, J. A. (1978) “Specification Tests in Econometrics”, Econometrica, 46(6), 1251-1271.
Hausman, J. A. (1982) “The Econometrics of Non Linear Budget Constraints”, Fisher-Schultz
Lecture given at the Dublin Meetings of the Econometric Society, Econometrica, forthcoming.
Hausman, J. A., B. H. Hall and Z. Griliches (1984) “Econometric Models for Count Data with
Application to the Patents- R&D Relationship”, Econometrica, 52(4), 909-938.
Hausman, J. A. and W. E. Taylor (1981) “Panel Data and Unobservable Individual Effects”,
Econometrica, 49(6), 1377-1398.
Hausman, J. A. and M. Watson (1983) “Seasonal Adjustment with Measurement Error Present”.
National Bureau of Economic Research Working Paper No. 1133.
Hausman, J. A. and D. Wise, eds. (1985) Social Experimentation. NBER, Chicago: University of
Chicago Press, forthcoming.
Hendry, D. F. (1983) “Econometric Modelling: The ‘Consumption Function’ in Retrospect”, Scottish
Journal of Political Economy, 30, 193-220.
Intriligator, M. D. (1978) Econometric Models, Techniques and Applications. Englewood Cliffs:
Prentice-Hall.
Joreskog, K. and D. Sorbom (1981) LISREL V: Analysis of Linear Structural Relationships by
Maximum Likelihood and Least Squares Methods. Chicago: National Educational Resources.
Judge, G. G., W. R. Griffiths, R. C. Hill and T. C. Lee (1980) The Theory and Practice of Econometrics.
New York: Wiley.
Karni, E. and I. Weissman (1974) "A Consistent Estimator of the Slope in a Regression Model with
Errors in the Variables”, Journal of the American Statistical Association, 69(345), 211-213.
Klepper, S. and E. E. Leamer (1983) "Consistent Sets of Estimates for Regressions with Errors in All
Variables”, Econometrica, 52(l), 163-184.
Kruskal, W. H. and L. G. Telser (1960) “Food Prices and The Bureau of Labor Statistics”, Journal of
Business, 33(3), 258-285.
Kuznets, S. (1954) National Income and Its Composition, 1919-1938. New York: NBER.
Kuznets, S. (1971) “Data for Quantitative Economic Analysis: Problems of Supply and Demand”.
Lecture delivered at the Federation of Swedish Industries. Stockholm: Kungl Boktryckeriet P. A.
Norsted and Soner.
Little, R. J. A. (1979) “Maximum Likelihood Inference for Multiple Regressions with Missing Values:
A Simulation Study”, Journal of the Royal Statistical Society, Ser. B. 41(l), 76-87.
Little, R. J. A. (1983) “Superpopulation Models for Non-Response”, in: Madow, Olkin and Rubin,
eds., National Academy of Sciences, Incomplete Data in Sample Surveys. New York: Academic
Press, Part VI, II, 337-413.
Little, R. J. A. (1982) “Models for Non-Reponse in Sample Surveys”, Journal of the American
Statistical Association, 77(378), 237-250.
MaCurdy, T. E. (1982) "The Use of Time Series Processes to Model the Error Structure of Earnings in
Longitudinal Data Analysis”, Journal of Econometrics, 18(l), 83-114.
Maddala, G. S. (1971) “The Use of Variance Components Models in Pooling Cross Section and Time
Series Data”, Econometrica, 39(2), 341-358.
Maddala, G. S. (1977) Econometrics. New York: McGraw Hill.
Maddala, G. S. (1983) Limited-Dependent and Qualitative Variables in Econometrics. Cambridge:
Cambridge University Press.
Malinvaud, E. (1980) Statistical Methods of Econometrics. 3rd revised ed., Amsterdam: North-Holland.
Manski, C. F. and D. McFadden, eds. (1981) Structural Analysis of Discrete Data with Econometric
Applications. Cambridge: MIT Press.
Mare, R. D. and W. M. Mason (1980) “Children’s Report of Parental Socioeconomic Status: A
Multiple Group Measurement Model", Sociological Methods and Research, 9, 178-198.
Marini, M. M., A. R. Olsen and D. B. Rubin (1980) "Maximum-Likelihood Estimation in Panel
Studies with Missing Data", Sociological Methodology 1980, 9, 315-357.
Massagli, M. P. and R. M. Hauser (1983) “Response Variability in Self- and Proxy Reports of
Paternal and Filial Socioeconomic Characteristics”, American Journal of Sociology, 89(2), 420-431.
Medoff, J. and K. Abraham (1980) "Experience, Performance, and Earnings", Quarterly Journal of
Economics, XCV(4), 703-736.
Morgenstern, O. (1950) On the Accuracy of Economic Observations. Princeton: Princeton University
Press, 2nd edition, 1963.
Mundlak, Y. (1978) “On the Pooling of Time Series and Cross Section Data”, Econometrica, 46(l),
69-85.
Mundlak, Y. (1980) “Cross Country Comparisons of Agricultural Productivity”. Unpublished
manuscript.
National Academy of Sciences (1979) Measurement and Interpretation of Productivity. Washington,
D.C.
National Academy of Sciences (1983) in: Madow, Olkin and Rubin, eds., Incomplete Data in Sample
Surveys. New York: Academic Press, Vol. 1-3.
National Bureau of Economic Research (1961) The Price Statistics of the Federal Government, Report
of the Price Statistics Review Committee, New York: General Series, No. 73.
National Bureau of Economic Research (1957a) Studies in Income and Wealth, Problems of Capital
Formation: Concepts, Measurement, and Controlling Factors. New York: Arno Press, Vol. 19.
National Bureau of Economic Research (1957b) Studies in Income and Wealth, Problems in Interna-
tional Comparisons of Economic Accounts. New York: Arno Press, Vol. 20.
National Bureau of Economic Research (1958) Studies in Income and Wealth, A Critique of the United
States Income and Product Accounts. New York: Arno Press, Vol. 22.
National Bureau of Economic Research (1961) Studies in Income and Wealth, Output, Input and
Productivity Measurement. New York: NBER, Vol. 25.
National Bureau of Economic Research (1969) Studies in Income and Wealth, V. R. Fuchs, ed.,
Production and Productivity in the Service Industries. New York: Columbia University Press, Vol. 34.
National Bureau of Economic Research (1973) Studies in Income and Wealth, M. Moss, ed., The
Measurement of Economic and Social Performance. New York: Columbia University Press, Vol. 38.
National Bureau of Economic Research (1983a) Studies in Income and Wealth, M. Foss, ed., The U.S.
National Income and Product Accounts. Chicago: University of Chicago Press, Vol. 47.
National Bureau of Economic Research (1983b) Studies in Income and Wealth, J. Triplett, ed., The
Measurement of Labor Cost. Chicago: University of Chicago Press, Vol. 48.
National Commission on Employment and Unemployment Statistics (1979) Counting the Labor Force.
Washington: Government Printing Office.
Nijman, Th. E. and F. C. Palm (1985) “Consistent Estimation of a Regression Model with
Incompletely Observed Exogenous Variable”, Netherlands Central Bureau of Statistics, Unpublished
paper.
Pakes, A. (1982) “On the Asymptotic Bias of Wald-Type Estimators of a Straight Line When Both
Variables Are Subject to Error”, International Economic Review, 23(2), 491-497.
Pakes, A. (1983) “On Group Effects and Errors in Variables in Aggregation”, Review of Economics
and Statistics, LXV(l), 168-172.
Pakes, A. and Z. Griliches (1984) “Estimating Distributed Lags in Short Panels with An Application
to the Specification of Depreciation Patterns and Capital Stock Constructs”, Review of Economic
Studies, LI(2), 243-262.
Palm, F. C. and Th. E. Nijman (1984) “Missing Observations in the Dynamic Regression Model”,
Econometrica, November, 52(6), 1415-1436.
Prakash, V. (1974) "Statistical Indicators of Industrial Development: A Critique of the Basic Data".
International Bank for Reconstruction and Development, DES Working Paper No. 189.
President’s Committee to Appraise Employment and Unemployment Statistics (1962) Measuring
Employment and Unemployment. Washington: Government Printing Office.
Rosen, S. (1974) “Hedonic Prices and Implicit Markets: Product Differentiation in Pure Competition”,
Journal of Political Economy, 82(l), 34-55.
Rubin, D. B. (1976) “Inference and Missing Data”, Biometriha, 63(3), 581-592.
Ruggles, N. D. (1964) Review of O. Morgenstern, On the Accuracy of Economic Observations, 2nd
edition, American Economic Review, LIV(4, part l), 445-447.
Schultz, H. (1938) The Theory and Measurement of Demand. Chicago: University of Chicago Press.
Stewart, M. B. (1983) “The Estimation of Union Wage Differentials from Panel Data: The Problems
of Not-So-Fixed Effects”. Cambridge: National Bureau of Economic Research Conference on the
Economics of Trade Unions, unpublished.
Stigler, G. J. and J. K. Kindahl (1970) The Behaviour of Industrial Prices, National Bureau of
Economic Research, New York: Columbia University Press.
Theil, H. (1978) Introduction to Econometrics. Englewood Cliffs: Prentice-Hall.
Triplett, J. E. (1975) “The Measurement of Inflation: A Survey of Research on the Accuracy of Price
Indexes”, in: P. H. Earl, ed., Analysis of Inflation. Lexington: Lexington Books, Chapter 2, 19-82.
Triplett, J. E. (1983) "An Essay on Labor Cost", in: National Bureau of Economic Research, Studies
in Income and Wealth, The Measurement of Labor Cost. Chicago: University of Chicago Press, 49,
l-60.
U.S. Department of Commerce (1979) Gross National Product Improvement Report. Washington:
Government Printing Office.
Usher, D., ed. (1980) The Measurement of Capital, National Bureau of Economic Research: Studies in
Income and Wealth. Chicago: University of Chicago Press, Vol. 45.
Van Praag, B. (1983) “The Population-Sample Decomposition in Minimum Distance Estimation”.
Unpublished paper presented at the Harvard-MIT Econometrics seminar.
Vernon, R. (1983) “The Politics of Comparative National Statistics”. Cambridge, Massachusetts,
unpublished.
Waugh, F. V. (1928) "Quality Factors Influencing Vegetable Prices", Journal of Farm Economics, 10,
185-196.
White, H. (1980) "Using Least Squares to Approximate Unknown Regression Functions", Interna-
tional Economic Review, 21(1), 149-170.
Young, A. H. (1974) “Reliability of the Quarterly National Income and Product Accounts in the
United States, 1947-1971”, Review of Income and Wealth, 20(l), l-39.
Chapter 26
FUNCTIONAL FORMS IN ECONOMETRIC MODEL BUILDING
LAWRENCE J. LAU*
Stanford University
Contents
1. Introduction 1516
2. Criteria for the selection of functional forms 1520
2.1. Theoretical consistency 1520
2.2. Domain of applicability 1527
2.3. Flexibility 1539
2.4. Computational facility 1545
2.5. Factual conformity 1546
3. Compatibility of the criteria for the selection of functional forms 1547
3.1. Incompatibility of a global domain of applicability and flexibility 1548
3.2. Incompatibility of computational facility and factual conformity 1551
3.3. Incompatibility of a global domain of applicability, flexibility and
computational facility 1552
4. Concluding remarks 1558
Appendix 1 1559
References 1564
*The author wishes to thank Kenneth Arrow, Erwin Diewert, Zvi Griliches, Dale Jorgenson and
members of the Econometrics Seminar at the Department of Economics, Stanford University, for
helpful comments and discussions. Financial support for this research under grant SOC77-11105 from
the National Science Foundation is gratefully acknowledged. Responsibility for errors remains with
the author.
1. Introduction

The economic relationships that we consider may be written in the generic form

y = f(X; α) + ε,

where y is the observed value of the dependent variable, X is the observed value
of the vector of independent variables, α is a finite vector of unknown constant
parameters and ε is a stochastic disturbance term. The deterministic part,
f(X; α), is supposed to be a known function. The functional form problem that
we consider is the ex ante choice of the algebraic form of the function f(X; α)
prior to the actual estimation. We ask: What considerations are relevant in the
selection of one algebraic functional form over another, using only a priori
information not specific to the particular data set?
This problem of ex ante choice of functional forms is to be carefully dis-
tinguished from that of ex post choice, that is, the selection of one functional
form from among several that have been estimated from the same actual data set
on the bases of the estimated results and/or post-sample predictive tests. The
ex post choice problem belongs properly to the realm of specification analysis and
hypothesis testing, including the testing of nonnested hypotheses.
We do not consider here the choice of functional forms in quanta1 choice
analysis as the topic has been brilliantly covered by McFadden (1984) elsewhere
in this Handbook. In our discussion of functional forms, we draw our examples
largely from the empirical analyses of production and consumer demand because
the restrictions implied by the respective theories on functional forms are richer.
But the principles that we use are applicable more generally.
Historically, the first algebraic functional forms were chosen because of their
ease of estimation. Almost always the functional form chosen is linear in parame-
ters, after a transformation of the dependent variable if necessary. Thus, one
specializes from the general form above to

y = Σᵢ fᵢ(Xᵢ)αᵢ,   (1.1)

or

g(y) = Σᵢ fᵢ(Xᵢ)αᵢ,   (1.2)

or

(1.3)
An example of eq. (1.1) is the widely used linear functional form in which
fᵢ(Xᵢ) = Xᵢ. An example of eq. (1.2) is the double-logarithmic functional form in
which g(y) = ln y and fᵢ(Xᵢ) = ln Xᵢ. It has the constant-elasticity property with
the advantage that the parameters are independent of the units of measurement.
In addition, functional forms of the type in eqs. (1.1) and (1.2) may be
interpreted as first-order approximations to any arbitrary function in a neighbor-
hood of some X = X₀, and that is one reason why they have such wide currency.
However, linear functions, while they may approximate whatever underlying
function reasonably well for small changes in the independent variables, fre-
quently do not work very well for many other purposes. For example, as a
production function, the linear form implies perfect substitution among the different
inputs and constant marginal products. It cannot represent the phenomenon of di-
minishing marginal returns. Moreover, the perfect substitution property of the
linear production function has the unacceptable implication that almost always
only a single input will be employed and that an ever so slight change in the relative
prices of inputs will cause a complete shift from one input to another.
Another linear-in-parameters functional form that has been used is that of the
Leontief or fixed-coefficients production function in its derived demand functions
representation:

Xᵢ = aᵢY,   i = 1,…,m,

where Xᵢ is the quantity of the ith input and Y is the quantity of
output. However, this production function implies zero substitution among the
different inputs. No matter what the relative prices of inputs may be, the relative
proportions of the inputs remain the same. This is obviously not a good
functional form to use if one is interested in the study of substitution possibilities
among inputs.
The first widely used production function that allows substitution is the
Cobb-Douglas (1928) production function, which may be regarded as a special
case of eq. (1.2):

Y = AK^α L^(1−α),   (1.4)

where K and L are the quantities of capital and labor respectively. Eq. (1.4)
reduces to the form of eq. (1.3) by taking natural logarithms of both sides. The
Cobb-Douglas production function became the principal work horse of empirical
analyses of production until the early 1960s and is still widely used today.
The next advance in functional forms for production functions came when
Arrow, Chenery, Minhas and Solow (1961) introduced the Constant-Elasticity-
of-Substitution (C.E.S.) production function:

Y = γ[(1−δ)K^(−ρ) + δL^(−ρ)]^(−1/ρ),   (1.5)

where γ, δ and ρ are parameters. This function is not itself linear in parameters.
However, it gives rise to average productivity relations which are linear in
parameters after a monotonic (logarithmic) transformation:

ln(Y/K) = α + σ ln(r/p),   ln(Y/L) = β + σ ln(w/p),   (1.6)
where p, r, and w are the prices of output, capital and labor respectively, and α, β
and σ are parameters. The C.E.S. production function was discovered, again
through a process of induction, when the estimated σ from eq. (1.6) turned out to
be different from one, as one would have expected if the production function were
actually of the Cobb-Douglas form.
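A quick numerical illustration of the last point (Python, with arbitrary parameter values): the C.E.S. form (1.5) converges to the Cobb-Douglas form (1.4), with α = 1 − δ, as ρ → 0:

import numpy as np

def ces(K, L, gamma=1.0, delta=0.4, rho=0.5):
    return gamma * ((1 - delta)*K**(-rho) + delta*L**(-rho)) ** (-1.0/rho)

def cobb_douglas(K, L, A=1.0, alpha=0.6):
    return A * K**alpha * L**(1 - alpha)

K, L = 2.0, 3.0
for rho in (1.0, 0.1, 0.01, 0.001):
    print(rho, ces(K, L, rho=rho))       # approaches the Cobb-Douglas value
print(cobb_douglas(K, L))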
Unfortunately, although the C.E.S. production function is more general than
the Cobb-Douglas production function (which is itself a limiting case of the
C.E.S. production function), and is perfectly adequate in the two-input case, its
generalizations to the three or more-input case impose unreasonably severe
restrictions on the substitution possibilities. [See, for example, Uzawa (1962) and
McFadden (1963)]. In the meantime, interest in gross output technologies dis-
tinguishing such additional inputs as energy and raw materials continued to grow.
Almost simultaneously, advances in computing technology lifted any con-
straint on the number of parameters that could reasonably be estimated. This led
to the growth of the so-called "flexible" functional forms, including the gener-
alized Leontief functional form introduced by Diewert (1971) and the tran-
scendental logarithmic functional form introduced by Christensen, Jorgenson and
Lau (1973). These functional forms share the common characteristics of linearity-
in-parameters and the ability to provide second-order approximations to any
arbitrary function. In essence they allow, in addition to the usual linear terms, as
in eqs. (1.1) and (1.2), quadratic and interaction terms in the independent
variables.
Here we study the problem of the ex ante choice of functional form when the
true functional form is unknown. (Obviously, if the true functional form is
known, we should use it.) We shall approach this problem by considering the
relevant criteria for the selection of functional forms.
1520 L. J. ZAU
What are some of the criteria that can be used to guide the ex ante selection of an
algebraic functional form for a particular economic relationship? Neither eco-
nomic theory nor available empirical knowledge provide, in general, a sufficiently
complete specification of the economic functional relationship so as to determine
its precise algebraic form. Consequently the econometrician has wide latitude in
deciding which one of many possible algebraic functional forms to use in building
an econometric model. Through practice over the years, however, a set of criteria
has evolved and developed. These criteria can be broadly classified into five
categories:
(1) Theoretical consistency;
(2) Domain of applicability;
(3) Flexibility;
(4) Computational facility; and
(5) Factual conformity.
We shall discuss each of these criteria in turn.
2. I. Theoretical consistency
Theoretical consistency means that the algebraic functional form chosen must be
capable of possessing all of the theoretical properties required of that particular
economic relationship for an appropriate choice of parameters. For example, a
cost function of a cost-minimizing firm must be homogeneous of degree one,
nondecreasing and concave in the prices of inputs, and nondecreasing in the
quantity of output. Thus, any algebraic functional form selected to represent a
cost function must be capable of possessing these properties for an appropriate
choice of the parameters at least in a neighborhood of the prices of inputs and
quantity of output of interest. For another example, a complete system of demand
functions of a utility-maximizing consumer must be summable,’ homogeneous of
degree zero in the prices of commodities and income or total expenditure and
have a Jacobian matrix which gives rise to a negative semidefinite and symmetric
Slutsky substitution matrix. Thus, any algebraic functional form selected to
represent a complete system of consumer demand functions must be capable of
possessing these properties for an appropriate choice of the parameters at least in
a neighborhood of the prices of commodities and income of interest.
Obviously, not all functional forms can meet these theoretical requirements, not
even in a small neighborhood of the values of the independent variables of
‘Summability means that the sum of expenditures on all commodities must be equal to income or
total expenditure.
Ch. 26: Functional Forms in Econometric Model Building 1521
interest. However, a sufficiently large number of functional forms will satisfy the
test of theoretical consistency, at least locally, that other criteria must be used to
select one from among them. Moreover, many functional forms, while they may
satisfy the theoretical consistency requirement, are in fact readily seen to be
rather poor choices. For example, the cost function
c(P>y)=y
[
c”iPi
i=l I 3
where piis the price of the ith input and Y is the quantity of output and (Y~> 0,
i=l >*.., m, satisfies all the theoretical
requirements of a cost function. It is
homogeneous of degree one, nondecreasing and concave in the prices of inputs
and nondecreasing in the quantity of output. However, it is not regarded as a
good functional form in general because it allows no substitution among the
inputs. The cost-minimizing demand functions corresponding to this cost function
are given by Hotelling (1932)-Shephard (1953) Lemma as:
Thus all inputs are employed in fixed proportions. While zero substitution or
equivalently fixed proportions may be true for certain industries and processes, it
is not an assumption that should be imposed a priori. Rather, the data should be
allowed to indicate whether there is substitution among the inputs, which brings
up the question of “flexibility” of a functional form to be considered below.
Yet sometimes considerations of theoretical consistency alone, even locally, can
rule out many functional forms otherwise considered acceptable. This is demon-
strated by way of the following two examples, one taken from the empirical
analysis of producer behavior and one from consumer behavior.
First, we consider the system of derived demand functions of a cost-minimiz-
ing, price and output-taking firm with the constant-elasticity property:
where Xi is the quantity demanded of the ith input, pj is the price of the jth
input, and Y is the quantity of output. The elasticities of demand with respect to
1522 L. J. Lau
own and cross prices and the quantity of output are all constants:
z =&, i=l,...,m.
Functional forms with constant elasticities as parameters are often selected over
other functional forms with a similar degree of ease of estimation because the
values of the parameters are then independent of the units of measurement of the
variables. It can be readily verified that in the absence of further restrictions on
the values of the parameters /Iij’s and &r’s, such a system of derived input
demand functions is flexible, that is, it is capable of attaining any given value of
X (necessarily positive), h’X’/ap and aX/aY at any specified positive values of
p=I_i and Y=Y through a suitable choice of the parameters &,‘s and /Iiv’s.
However, if it were required, in addition, that the system of derived demand
functions in eq. (2.1) be consistent with cost-minimizing behavior on the part of
the producer, at least in a neighborhood of the prices of input and the quantity of
output, then certain restrictions must be satisfied by the parameters pij’s and
piv’s. Specifically, the function:
l,i=j
where 6,, =
0, otherwise
must have all the properties of a cost function and its partial derivatives with
respect to pi:
A cost function is homogeneous of degree one in the prices of inputs and the
first-order partial derivative of a cost function with respect to the price of an
input is therefore homogeneous of degree zero, implying:
if j, i, j=l >*.*,m,
which implies:
ax,=a.
aPj api J’
i # j, i, j=l,..., m. (2.6)
parameters:
a, - Pjiea/ = 0.2
P,,e
We note that for this case, (pli + 1) > 0 and ( pjj + 1) > 0, implying that the
own-price elasticities of the i th and j th inputs must be greater than minus unity
(or less than unity in absolute value)-a significant restriction. We further note
that if & # 0 for some k, k # i, j, Pjk z 0 for the same k. But if Pik # 0 and
Pjk f 0 by eq. (2.7), Pki f 0 and piXi/p,X, = Pki/&, a constant, and hence the
relative expenditures of all three inputs, i, j and k, are constants. Moreover, the
proportionality of expenditures implies that pii + 1 - Pki = 0 for all k such that
Pik # 0, k # i. Hence all &‘s, k # i, must have the same sign-positive, in this
case. All Pki’s, k # i, must have the same positive sign and magnitude. And
PiY= bjY=PkY'
By considering all the i’s it can be shown that the inputs are separable into n,
n I m, mutually exclusive and jointly exhaustive groups such that
(1) Cross-price elasticities are zero between any two commodities belonging to
different groups;
(2) Relative expenditures are constant within each group.
where pJ is the vector of prices of the jth group of inputs and each C,( ) has the
form:
*This restriction results from setting the prices of all inputs and the quantity of output to unities.
Ch. 26: Functional Form in Econometric Model Building 1525
where
Third, pij < 0 and pij < 0, in which case the relative expenditures on the two
inputs are again constants independent of the prices of inputs and quantity of
output, implying the same restrictions on the parameters as those in eq. (2.8).
However, as derived earlier, all &‘s that are nonzero must have the same
sign-negative, in this case. But then cy= i& cannot be zero as required by zero
degree homogeneity. We conclude that a cost function of the form in eq. (2.9) is
the only possibility, with rather restrictive implications.
From this example we can see that the requirement of theoretical consistency,
even locally, may impose very strong restrictions on an otherwise quite flexible
functional form.
Second, we consider the complete system of demand functions of a utility-max-
imizing, budget-constrained consumer with the constant-elasticity property: 3
z=flij, J
i, j-1 ,..., m,
3Such a system was employed by Schultz (1938), Wold with Jureen (1953) and Stone (1953).
1526 L. J. Lau
ipixj=
i-l
2 exp(a,+
C
i=l j=l
(Pijf6,j)lnP,+Bi~lnM)
=M (2.11)
identically. It will be shown that (local) summability alone, through eq. (2.11),
imposes strong restrictions on the parameters pi,‘s and pjM’s.
By dividing both sides by M, eq. (2.11) can be transformed into:
i=l
=
exp
l 0~~+ 2
j=l
( pij + aij)ln pj + ( PiM - 1)ln M
I
=l. (2.12)
m I m \
iFl(Pik+Sik)2exP{ai+ C (P;j+Gij)ln~,+(~i,-l)lnM}=o~
j=l
k=l,..., m. (2.13)
But
Thus, in order for the left-hand side of eq. (2.13) to be zero, one must have:
(Pi/c + 6ik) = OY
i, k =l,..., m.
We conclude that (local) summability alone implies that the system of consumer
demand functions must take the form:
which is no longer flexible.4 For this system, the own-price elasticity is minus
unity, the cross-price elasticities are zeroes, and the income elasticity is unity for
the demand function of each and every commodity.
We conclude that theoretical consistency, even if applied only locally, can
indeed impose strong restrictions on the admissible range of the values of the
parameters of an algebraic functional form. It is essential in any empirical
application to verify that the algebraic functional form remains reasonably
flexible even under all the restrictions imposed by the theory. We shall return to
the concept of “flexibility” in Section 2.3 below.
4This result is well known. The proof here follows Jorgenson and Lau (1977) which contains a more
general result.
1528 L. J. Lau
51t is possible, and sometimes advisable, to take the applicable domain to be a compact convex
subset of the set of all nonnegative prices.
61t is possible, and sometimes advisable, to take the applicable domain to be a compact convex
subset of the set of all nonnegative prices and incomes.
Ch. 26: Functional Forms in Econometric Model Building 1529
These two examples share an interesting property - for given a, if the algebraic
functional form is locally valid, it is globally valid. This property, however, does
not always hold. We shall consider two examples of unit cost functions- the
generalized Leontief unit cost function introduced by Diewert (1971) and the
transcendental logarithmic unit cost function introduced by Christensen,
Jorgenson and Lau (1973).
The generalized Leontief unit cost function for a single-output, two-input
technology takes the form:
c(F,,P,)lo;
VC(F1, P,) 2 0; (2.17)
v2C(PI,p2)negative semidefinite.
We note that a change in the units of measurement of the inputs leaves the values
of the cost function and the expenditures unchanged. Without loss of generality,
the price per unit of any input can be set equal to unity at any specified set of
positive prices by a suitable change in the units of measurement. The parameters
of the cost function, of course, must be appropriately resealed. We therefore
assume that the appropriate resealing of the parameters have been done and take
(pi, p2) to be (1,l). By direct computation:
vC(l,J) =
I“o+h
-
a2
1+ $a,
a1
)
a1
V2C(Ll)=
i-
ds, _(y
4 -1 4
4l
.
It is clear that by choosing (Y~ to be positive and sufficiently large all three
conditions in eq. (2.17) can be strictly satisfied at (l,l). We conclude that for local
theoretical consistency (pi positive and sufficiently large is sufficient. (Actually (Y~
nonnegative is necessary.)
1530 L. J. Llu
We shall now show that (pi positive and sufficiently large alone is not sufficient
for global theoretical consistency. Global theoretical consistency requires that
1 20; (2.19)
1
-l/2 -l/2
iP1 P2
negative semidefinite;
-1 l/2 -3,‘2 ’
4P1 P2
(2.20)
are necessary and sufficient for global theoretical consistency of the generalized
Leontief unit cost function.
The transcendental logarithmic unit cost function for a single-output, two-input
technology takes the form:
P
+ +ln pi. (2.22)
C(l,l) = eao 2 0,
vC(1,l) =
I eao(l
- q)1ea”(y1 20, (2.23)
1
%(l-%)-Pll
negative semidefinite,
-(l-+1+&1 ’
Ch. 26: Functional Forms in Econometric Model Building 1531
eao is always greater than zero. 12 (pi 2 0 is necessary and sufficient for vC(l,l)
to be nonnegative. (~r((~r- l)+ j3i1 I 0 is necessary and sufficient for v2C(1, 1) to
be negative semidefinite. The set of necessary and sufficient restrictions on the
parameters for local theoretical consistency at (1,l) is therefore:
12cyr20; “r(ol,-l)+&rIo. (2.24)
We shall now show that the conditions in eq. (2.24) are not sufficient for global
theoretical consistency. Global theoretical consistency requires that
P
+*lnpi 20 (2.25)
i
vC(P,,P,)'=C
1
a1 + Pllln p1
PI
- &11n P2
a2c c
x (~-,)-&&P~+P~&P~
P2 I-,. (2.26)
-=-(~l+Plllnpl-Pllln ~~)((~~-1+Plllnpl-Plllnp2)
aPf P:
+Plllo, (2.27)
“&t-l) 1o
,
P:
are necessary and sufficient for global theoretical consistency of the transcenden-
tal logarithmic unit cost function.
We shall show later that under the necessary and sufficient restrictions for
global theoretical consistency on their parameters both the generalized Leontief
unit cost function and the transcendental logarithmic unit cost function lose their
flexibility.
Having established that functional forms such as the generalized Leontief unit
cost function and the transcendental logarithmic unit cost function can be
globally valid only under relatively stringent restrictions on the parameters, but
that they can be locally valid under relatively less stringent restrictions we turn
our attention to a second question, namely, characterizing the domain of theoreti-
cal consistency for a functional form when it fails to be global.
As our first example, we consider again the generalized Leontief unit cost
function. We note that (pi 2 0 is a necessary condition for local theoretical
consistency. Given ai 2 0, eq. (2.20) is identically satisfied. The set of prices of
inputs over which the generalized Leontief unit cost function is theoretically
consistent must satisfy:
(2.31)
Eq. (2.31) thus defines the domain of theoretical consistency of the generalized
Leontief unit cost function. If (1,l) were required to be in this domain then the
additional restrictions of:
(2.32)
(2.33)
where + 2 (1- a)a 2 pii > 0. If pi1 < 0, it can be shown that the domain of
theoretical consistency is given by:
Our analysis shows that both the generalized Leontief and the translog unit
cost functions cannot be globally theoretically consistent for all choices of
parameters. However, even when global theoretical consistency fails, there is still
a set of prices of inputs over which theoretical consistency holds and this set may
well be large enough for all practical purposes. The question which arises here is
that given neither functional form is guaranteed to be globally theoretically
consistent, is there any objective criterion for choosing one over the other?
One approach that may provide a basis for comparison is the following: We
can imagine each functional form to be attempting to mimic the values of C, VC
and v2C at some arbitrarily chosen set of prices of inputs, say, without loss of
generality, (1,l). Once the values of C, VC and v2C are given, the unknown
parameters of each functional form is determined. We can now investigate,
holding C, VC and v2C constant, the domain of theoretical consistency of each
functional form. If the domain of theoretical consistency of one functional form
always contains the domain of theoretical consistency of the other, no matter
what the values of C, VC and v2C are, we say that the first functional form
dominates the second functional form in terms of extrapolative domain of
applicability. In general, however, there may not be dominance and one func-
tional form may have a larger domain of theoretical consistency for some values
of C, vC and v2C and a smaller domain for other values.
We shall apply this approach to a comparison of the generalized Leontief and
transcendental logarithmic unit cost functions in the single-output, two-input
case.
‘See Lau and Schaible (1984) for a derivation. See also Caves and Christensen (1980).
1534 L. J. Lau
C&l) =1,9
1
(2.35)
vC(l,l) = k2
[ l-k, ’
and
c(1,1)=1=cw,+a,+a2,
which imply:
a,=4k3,
a,=k,-2k3, (2.36)
a2 = (1- k,)-2k,.
It can be verified that (Ye+ (pi + a2 is indeed equal to unity. Thus, the generalized
9C(1, 1) may be set equal to any positive constant by an appropriate resealing of all the parameters.
We choose C(l, 1) = 1 for the sake of convenience.
Ch. 26: Funciional Forms in Econometric Model Building 1535
For the translog unit cost function, the rules of interpolation are:
which imply:
ao=o,
q=kz, (2.38)
&I=-k3+k2(l-k2).
lnC(p,,p,)=k,lnp,+(l-k,)lnp,
+ [k,(l+)-k31 (hp >2
1
2
- [k&-k,)-k,bv4w,
+ [k,(l-kd-k31 (lnp2)2
(2.39)
2
I I[]
a1
ao T Pl
l/2
2 0,
a1 l/2
y a2 P2
l/2
or
Ii 1
k, - 2k, 2k3
(2.40)
2k3
(l- k,)-2k, ;;I2 “’
1536 L‘.J.Lau
1.
EL, (1- k,)-2k, *
(2.41)
P2 - 2k,
(2.42)
Finally if k, -2k, < 0 and (1- k2) = 2k, < 0, then the domain of theoretical
consistency is given by:
( k2y;k3)2k~2
[(l-$-y*. (2.43)
For the translog unit cost function, the domain of theoretical consistency is
defined by eqs. (2.33) and (2.34). If pii = - k, + k,(l- k2) = 0, the domain of
theoretical consistency is the whole of the positive orthant of R* (and may be
uniquely extended to the whole of the nonnegative orthant of R*). If &i = - k,
+ k,(l - k2) > 0, then the domain of theoretical consistency is given by:
exp((f+\/i-[k,(l-k,)-k,] -k,)/[k,(l-k,)-k,])>E
If pii = - k, + k,(l - k2) < 0, then the domain of theoretical consistency is given
by:
exp{-k,/[k,(l-k2)-k,]} ~~~exp{(l-k2)/[k,(l-k2)-k;]}.
(2.45)
With these formulas we can compare the domains of theoretical consistency for
different values of k, and k, such that 12 k, 2 0 and k, 2 0. First, suppose
k, = 0, then k, -2k, 2 0 and (1- k,)-2k, 2 0 and the domain of theoretical
consistency for the generalized Leontief unit cost function is the whole of the
Ch. 26: Functional Forms in Econometric Model Building 1537
nonnegative orthant of R2. k, = 0 i mplies that pi1 = k,(l- k2) 2 0. Thus, the
domain of theoretical consistency for the translog unit cost function is given by:
which is clearly smaller than the whole of the nonnegative orthant of R*. We note
that the maximum and minimum values of k,(l - k,) over the interval [0, l] is +
and 0 respectively. Given k, = 0, if k,(l- k,) = 0, pII = 0, which implies that
the domain of theoretical consistency is the whole of the nonnegative orthant of
R2. If k,(l - k2) = $, pII = a, and the domain of theoretical consistency reduces
to a single ray through the origin defined by pi = p2. If k,(l- k2) = $, (k2 = )),
the domain of theoretical consistency is given by:
e312=4.48kfi21.
P2
Overall, we can say that the domain of theoretical consistency of the translog unit
cost function is not satisfactory for k, = 0.
Next suppose k, = k,(l - k,) (which implies that k, I a), then either
k,-2k,=k,-2k2+2k;
= k2(2k2 - 1) < 0,
or
or
neither functional form dominates the other. The cases of k, = 0 and k, = k,(l -
k2) correspond approximately to the Leontief and Cobb-Douglas production
functions respectively.
How do the two functional forms compare at some intermediate values of k,
and k,? Observe that the value of the elasticity of substitution at (1,l) is given by:
C(LW,,(L1)
a(lJ) = C,(l,l)C*(l,l) ’
= b’[k,(l- &)I.
If we let k, = ), (l- k2) = $, then a(l,l) = a is achieved at k, = i. At these
values of k, and k,, the domain of theoretical consistency of the generalized
Leontief unit cost function is still the whole of the nonnegative orthant of R*. At
these values of k, and k,, pII = -i + 6 = & > 0. The domain of theoretical
consistency of the translog unit cost function is given by:
56,233 2 e 2 0.0012,
We see that although it is short of the whole of the nonnegative orthant of R*, for
all practical purposes, the domain is large enough. Similarly ~(1, 1) = 3 is achieved
at k, = &. At these values of k, and k,, the domain of theoretical consistency of
the generalized Leontief unit cost function is given by:
(221&b-o,
4 P2
or p2 cannot be more than 6: times greater than pl. The domain of theoretical
consistency of the translog unit cost function is given by:
e6 = 403.4 2 2 2 0.000006.
We see that ignoring extremely small relative prices, the domain of theoretical
consistency of the translog unit cost function is much larger than that of the
generalized Leontief unit cost function.
The comparison of the domains of theoretical consistency of different func-
tional forms for given values of k, and k, is a worthwhile enterprise and should
be systematically extended to other functional forms and to the three or more-
input cases. The lack of space does not permit an exhaustive analysis here. It
suffices to note that the extrapolative domain of applicability does not often
provide a clearcut criterion for the choice of functional forms in the absence of
Ch. 26: Functional Forms in Econometric Model Building 1539
2.3. Flexibility
c(P,y)=y
[
FaiPi
i=l1
7 ai > 0, i=l ,-.*> m.
The inputs are always employed in fixed proportions, whatever the values of (Y
may be. Moreover, own and cross-price elasticities of all inputs are always zero!
Thus, although the cost function satisfies the criterion of theoretical consistency,
it cannot be considered “flexible” because it is incapable of approximating any
theoretically consistent cost function satisfactorily through an appropriate choice
of the parameters. to If we are interested in estimating the price elasticities of the
derived demand for say labor or energy, we would not employ the linear cost
function as an algebraic functional form because the price elasticities of demands
that can be derived from such a cost function are by a priori assumption always
zeroes.
“There is of course, the question of what satisfactory approximation means, which is addressed
below. ’
1540 L. J. Luu
DeJinition
An algebraic functional form for a unit cost function C( p; a) is said to be flexible
if at any given set of nonnegative (positive) prices of inputs the parameters of the
cost function, (Y, can be chosen so that the derived unit-output input demand
functions and their own and cross-price elasticities are capable of assuming
arbitrary values at the given set of prices of inputs subject only to the require-
ments of theoretical consistency.”
c( p,; a) = c,
vc( j; a) = x, (2.46)
v2c( p; a) = 9,
is given by:
where without loss of generality pij = /3,,,Vi, j. The elements of the gradient and
Hessian matrix of the generalized Leontief unit cost function are given by:
l3C
-&-, = p;; + 12 jzic p.‘J.pYpy,
’
i=l >..-, m; (2.48)
I
a2c
~ = ~pijp,~~~zp,-l/z. i # j, i, j=l 3.e.3 m;
aPidPj
a?
-=-a,~,Bijp,~3/2p:/2, i=l,..., m. (2.50)
ad
Second, eq. (2.48) can always be solved by an appropriate choice of the p;;‘s,
pi, 2 0, whatever the value of
+ C,/3iJp;1/2pj/2, i = 1,. . _, m.
J+’
a2c
apfPi= - C *PjY
I i+i aPiaPi
1542 L. J. Lau
so that
i = 1,. . . , m,
C &jp;3/2p)/220, i=l,..., m,
j+i
in order for the Hessian matrix to be negative semidefinite. We conclude that the
generalized Leontief unit cost function is flexible.
Another example of a flexible algebraic functional form for a unit cost function
is the transcendental logarithmic cost function. The translog unit cost function is
given by:
InC(p)=C,+Cailnpi+~CCB,,lnP,lnPj, (2.51)
i ’ J
where ci~, =l; cjpij = 0,Vi and without loss of generality fiij = /3ji,Vi,.j. The
elements of the gradient and Hessian matrix of the translog unit cost function are
given by:
ac
-=-- c ahc
ap, pi alnpi'
m; (2.52)
i # j,
(2.53)
i, j=l ,a.., m,
Pi ac
= ffi + C/3ijlnpj, i=l,...,m,
C aPi j
0 ... 0
p2
0
*** 0
P, 1
or
L
0
v2Qp)
0
0
0 0 --* pm Pl
0
0 0
P2
0 ... Pf?l
- ww’ - diag[ w 1, (2.55)
where wi= alnC/alnp,, i=l,..., m, and diag[w] is a diagonal matrix with wi’s
on the diagonal. Every term on the right-hand side of eq. (2.55) is either known or
specified. Thus, /3 can be chosen, subject to cipij = O,Vj, to satisfy any negative
semidefinite matrix specified for v2C( p). We conclude that the translog unit cost
function is flexible.
Similarly, we can give a working definition of “flexibility” for an algebraic
functional form for a complete system of consumer demand functions as follows:
Dejinition
More formally, let F*( p*, M*; a) be a vector-valued algebraic functional form
for a complete system of consumers demand functions expressed in natural
1544 L. J. Lou
Then flexibility implies and is implied by the existence of a solution a( p*, a*; F*,
aF*‘/ap*, aF*/aM*) to the following set of equations:
(2.56)
for every positive value of j*, a* and F* and symmetric negative semidefinite
value of the corresponding Slutsky substitution matrix which depends on p*, M*,
aF*‘/ap* and JF*/aM*.
We note that an equivalent definition may be phrased in terms of the natural
derivatives of the demand functions with respect to the prices of commodities and
income rather than the logarithmic derivatives or elasticities.
An example of a flexible algebraic functional form for a complete system of
consumer demand functions is the transcendental logarithmic demand system
introduced by Christensen, Jorgenson and Lau (1975). The transcendental loga-
rithmic demand system is given by:
(Y,+ ~/3,j(lnpj-lnM)
p,x,= .i
i=l ,-.., m, (2.57)
M -l+ x/?j,(lnpj-lnM) ’
where pij = pji, i, j = 1,. . . , m and xi&, = /3’M, j = 1,. . . , m. It may be verified
that this complete system of demand functions can attain at any prespecified
positive values of p = _is and M = a and given positive value of X and negative
semidefinite value of the Slutsky substitution matrix S such that S’p = 0, where a
typical element of S is given by:
aXi aXi
sij= F + XjzI i,j=l 7.*., m,
functional form often prescribes the value, or at least the range of values, of the
critical parameters. In general, the degree of flexibility required depends on the
application. For most applications involving producer or consumer behavior,
the flexibility required is that the own and cross-price derivatives (or equivalently
the elasticities) of demand for inputs or commodities be free to attain any set of
theoretically consistent values. For other applications, the desired degree of
flexibility may be greater or less. Sometimes a knowledge of the sign and/or
magnitude of a third-order derivative may be necessary. For example, in the
analysis of behavior under uncertainty, the third derivative of the utility function
of the decision maker plays a critical role in the comparative statics. In the
empirical analysis of such situations, the algebraic functional form should be
chosen so that it is “third-order” flexible, that is, it permits the data to inform
about the sign and/or magnitude of the third derivative of the utility function (or
equivalently, the second-order derivative of the demand function). In other words,
we need to know not only the elasticity of demand, but also the rate of change of
the elasticity of demand.
-F(PAo
xi = &/l i =l,...,m,
aM(Pdw ’
Y=Min(
f(X), E-,
where X is the vector of all other inputs, f(X) is a function of X and M is the
quantity of raw material input; or
Y= f(X)M.
The fact that not all Engel curves (of different commodities) are linear suggests
that the use of the Gorman (1953) condition for the analysis of aggregate
consumer demand can be justified only as an approximation.14
In the choice of algebraic functional forms, one should avoid, insofar as
possible, the selection of one which has implications that are at variance with
established facts.
A natural question that arises is: Are there algebraic functional forms that satisfy
all five categories of criteria that we have laid down in Section 2? In other words,
does there exist an algebraic functional form that is globally theoretically con-
“‘The Gorman condition on the utility function justifies the existence of aggregate demand
functions as functions of aggregate income and is widely applied in empirical analyses. See for
example Blackorby, Boyce and Russell (1978).
1548 L. J. Lau
Consider the generalized Leontief unit cost function for a single-output, two-
input technology:
which, as shown in Section 2.2, is theoretically consistent over the whole nonnega-
tive orthant of prices of inputs if and only if a0 2 0; (pi 2 0 and a2 2 0. We shall
show that under these parametric restrictions, the unit cost function is not
flexible, that is, the parameters cannot be chosen such that it can attain arbitrary
but theoretically consistent values of C, VC and v2C at an arbitrary set of prices
of inputs.
Without loss of generality let the set of prices be (1, l), and let the arbitrarily
chosen values of C, VC and v2C at (1,l) be
C(1,l) = k, 2 0,
(3.1)
$(l,l)=a,++Y,=k,, (3.2)
1
$(l,l) =- $01~
= - k,.
1
The reader can verify that satisfaction of eq. (3.2) is equivalent to the satisfaction
of eq. (3.1). It is easy to see that CY~can always be chosen to be 4k, and hence
2 0. However,
cannot hold with (~a 2 0 if 2k, 2 k,. Thus, flexibility fails if the generalized
Leontief unit cost function is required to be theoretically consistent globally. We
note that 2k, 2 k, implies that
-- Pl _=-
ax, aW/aP: k, 1
4 aP1 p1 ac/aP, ‘CT,
q,=k,-2k,rO,
q = 4k, 2 0,
cw,=k,-k,-2k,20,
CC,, k, k,
’= -C,C, = g (k, - k2)
k; 1
=-
k; (l-k;)’
ah x1 k:
alnp, = G’
ah x2 k:
PC
dInPI (l-k;) ’
alnx, -kz
---=
alnP2 (1-k;) ’
Ch. 26: Functional Forms in Econometric Model Building 1551
Figure 1
In Section 2.5 we pointed out the known fact that some commodities, notably
food, have income elasticities less than unity. Thus, any algebraic functional form
for a complete system of consumer demand functions that has the property of
unitary income elasticity for every commodity must be at variance with the facts
and should not be used. This rules out all complete systems of consumer demand
1552 L. J. Lm
(3.3)
(3.4)
(3.5)
“Linearity in parameters as used here requires that the restrictions on the parameters, if any, are
linear also. Thus, the Linear Expenditure System introduced by Stone (1954) is not a linear-in-parame-
ters functional form.
‘“See, for example, Jorgenson and Lau (1977) and (1979) and Lau (1977).
Ch. 26: Functional Forms in Econometric Model Building 1553
were q = p2/p1. We note that eqs. (3.3) and (3.4) together imply that C*(q) > 0.
Lemma 1
Let a normalized unit cost function have the linear-in-parameters and parsimoni-
ous form:
c(q)=fo(q)~o+fl(q)~l+f*(q)~2~17 (3.6)
where the fi(q)‘s are a set of linearly independent twice continuously differentia-
ble functions of q. In addition, suppose that the functional form is flexible, that
is, for every 4 > 0 and every k 2 0, there exists a set of parameters (Y,,,CX~
and (Y*
such that:
i=O
- 2 fi”(ij)a, =k,.
i=o
W(ij)a= k. (3.7)
where
By hypothesis, for all 4 > 0, and for all k 2 0, there is a solution (Ysatisfying
W(ij)ci= k.
“This functional form is parsimonious because it has the minimum number of independent
parameters required for flexibility.
1554 L. J. Luu
By Gale’s (1960) Theorem of the Alternative this implies that there must not be a
solution y to the equations
Suppose W(q) is singular for some 4, then there exists j f 0 such that
w( q)‘j = 0.
W(4)” = k*, k* 2 0,
We note that if the functions fo(q), fi(q) and f2(q) are linearly dependent,
then W(q) is always singular. It is clear that the functional form in eq. (3.6) is
parsimonious in the number of parameters since the number of independent
unknown parameters is equal to the number of components of k that need to be
matched.
Lemma 2
Proof
Lemma 3
Let a class of normalized unit cost functions have the linear-in-parameters and
parsimonious form:
WI; 4 =fo(cr)~o+f1(4)“1+f*(q)a21
where the fi(q)‘s are a set of linearly independent twice continuously differenti-
able functions of 4. In addition, suppose that the functional form is flexible, that
is, for every 4 > 0 and every k 2 0, there exists a set of parameters CQ, (pi and CQ
such that:
2 (L(~)-#i’(4))ai=ko,
i=O
or equivalently
W(ij)a= k.
IV+2 0, vq20.
W(+x= k.
“1 am grateful to Kenneth Arrow for correcting an error in the original formulation of Lemma 3.
1556 LJ.L.UU
(Y= W(q)-%.
Suppose the theorem is false, then there exists W(q) such that:
W(q)cu=W(q)W(q)-‘k>O,Vq>O,q>Oand k20.
4qd = ~bdw~)-l~
which is nonnegative. Then
67) = 4% 4Mq)?
and hence
fed = NcL~)P~(~).
19A permutation matrix is a square matrix which can be put into the form of an identity matrix by a
suitable reordering of the rows (or columns) if necessary.
Ch. 26: Functional Forms in Econometric Model Building 1557
2oAs an example, consider classical Newtonian mechanics and relativistic mechanics. The latter
reduces to the former at low velocities. However, an extrapolation of Newtonian mechanics to
high-velocity situations would be wrong!
1558 L. J. Lau
4. Concluding remarks
The most important conclusion that can be drawn from our analysis here is that
in general it is not possible to satisfy all five categories of criteria simultaneously.
Some trade-offs have to be made. It is however not recommended that one
compromises on local theoretical consistency - any algebraic functional form must
be capable of satisfying the theoretical consistency restrictions at least in a
neighborhood of the values of the independent variables of interest. It is also not
recommended, except as a last resort, to give up computational facility, as the
burden of and probability of failure in the estimation of nonlinear-in-parameters
models is at least one order of magnitude higher than linear-in-parameters models
and in many instances the statistical theory is less well developed. It is also not
advisable to sacrifice flexibility-inflexibility restricts the sensitivity of the param-
eter estimates to the data and limits a priori what the data are allowed to tell the
econometrician. Unless there is strong a priori information on the true functional
form, flexibility should be maintained as much as possible.
This leaves the domain of applicability as the only area where compromises
may be made. As argued in Section 3.3, most practical applications can be
accommodated even if the functional form is not globally theoretically consistent
so long as it is theoretically consistent within a sufficiently large but nevertheless
compact subset of the space of independent variables. For example, any extrapo-
lative domain of theoretical consistency which allows the relative price of inputs
to vary by factor of one million is plenty large enough. Moreover, by making a
compromise on the extrapolative domain of applicability one can also simulta-
neously reduce the domain over which the functional form has to be flexible.
Further, one can also make compromises with regard to the interpolative domain
of the functional form, that is, to limit the set of possible values of the derivatives
of the function that the functional form has to fit. For example, one may specify
that a functional form for a unit cost function C( p; a(k)) be theoretically
consistent for all prices in a compact subset of positive prices and for all values of
k in a compact subset of possible values of its first and second derivatives. This
last possibility holds the most promise.
With regard to specific applications, one can say that as far as the empirical
analysis of production is concerned, the surest way to obtain a theoretically
consistent representation of the technology is to make use of one of the dual
concepts such as the profit function, the cost function or the revenue function.
There, as we have learned, one has to be prepared to make compromises with
regard to the domain of applicability. The impossibility theorem in Section 3.3
applies not only to unit cost functions but to other similar concepts such as profit
and revenue functions as well.
As far as the empirical analysis of consumer demand is concerned, the surest
way to obtain a theoretically consistent and flexible complete system of demand
Ch. 26: Functional Forms in Econometric Model Building 1559
Appendix 1
Lemma 3
Proof
Sufficiency follows from the fact that the inverse of a permutation matrix is its
transpose, which is also a permutation matrix. The proof of necessity is by
induction on the order of the matrix n. First, we verify the necessity of the lemma
“A permutation matrix is a square matrix which can be put into the form of an identity matrix by a
suitable reordering of the rows (or columns) if necessary.
1560 L. J. Luu
for n = 2. The elements of A and A-l, both nonnegative, must satisfy the
following equations:
where A,, and B,, are scalars. The elements of A and A-’ must satisfy the
following equations:
First, suppose A,, # 0, then by eq. (A.6) b,, = 0 which implies that B,, f 0 and
B, is nonsingular (otherwise A-’ is singular). B,, Z 0 implies by eq. (A.7) u,r = 0.
B,, is nonsingular implies by eq. (A.6) a,, = 0. By eq. (A.@ B, = Ai ‘. By eq.
Ch. 26: Functional Forms in Econometric Model Building 1561
(A.5) B,, = A<l. Thus the matrices A and A-’ have the following forms:
A= [“d’
l-1, A-'= [“e’
A;1].
But A,, and Ai1 are both nonnegative, implying, by the lemma that
A, = D,P,.
We conclude that
qnbnl = 1.
alnB,, = 0.
We note, first of all, that eq. (A.8) implies that anlbln must be a diagonal
matrix. A typical element of anlbln is a,l,ib,,,j. In order for this to be identically
zero for i # j, all i, j, it is necessary and sufficient that a,, and b,, be nonzero in
only one element which is common to both a,, and b,,. Let this element be the
kth element of a,, (and b,,). Moreover, since anlbln is then a diagonal matrix
with the k th element on the diagonal nonzero, I,, - anlbln is also a diagonal
matrix. However, it must have a rank equal to A,B, and hence less than or equal
to n - 1. We conclude that the nonzero diagonal element of anlbln must be equal
to unity. The product A,B, is then equal to an identity matrix with the kth
element on the diagonal replaced by a zero. The ranks of A, and B, must be
equal to (n - 1). If either of them were less than (n - l), then the matrix A (or
A _ ‘) would be singular.
1562 L. J. Lau
A,,&,, = 0,
al,,4 = 0,
a,, can have only one nonzero element. Moreover, because alnbnl = 1, the same
element in a,, and b,, must be nonzero. Thus, the matrix A has the form:
0
A=
anl,k
0
where the Ith column of A, is a column of zeros. Similarly, A-’ has the form:
0 . . .0 b ln,k 0 . . .0
argument, that the k th column of B, is identically zero. Thus, the matrices A and
A-’ have the following forms.
I_
0 A:-,,,-, o A,*-k,,-,
0 0
0
&-I : 0
A,B,, = 0 ... 0 . ..O
0 : In-k
I
Let AZ be the matrix formed by deleting the k th row and Ith column of A, and
B,* be the matrix formed by deleting the Ith row and kth column of B,, it can be
shown that the resulting product of the two square matrices A,* and B,* is:
A*B*=
n n I n _ 19
1564 L. J. L.uu
A; = Dn-lPn-,,
al”,/
41
0
D kpl,k-1
A=
a nl,k
D kk
D n-1.77-1
-0 o... 1 ... 0
0 0
0 Pi%,,-l : Pi+-1 , n-l
b
(A.lO)
1 0e.e 0 ... 0
0 0
: p,*_k,,-l ’ pi+-k / n-l
where the Dir’s are the elements of the positive diagonal matrix D,_, and P,T’s
are conformable partitions of the permutation matrix P,*_ 1. It can be verified that
the second matrix of the product in eq. (A.lO) is a permutation matrix. Q.E.D.
References
Arrow, K. J., H. B. Chenery, B. S. Minhas and R. M. Solow (1961) “Capital-Labor Substitution and
Economic Efficiency”, Review of Economics and Statistics, 43, 225-250.
Barten, A. P. (1967) “Evidence on the Slutsky Conditions for Demand Equations”, Review’ of
Economics and Statistics, 49, 77-84.
Barten, A. P. (1977) “The Systems of Consumer Demand Functions Approach: A Review”, in: M. D.
Intriligator, ed., Frontiers of Quantitative Economics. IIIA, Amsterdam: North-Holland, 23-58.
Berndt, E. R., M. N. Darrough and W. E. Diewert (1977) “Flexible Functional Forms and
Expenditure Distributions: An Application to Canadian Consumer Demand Functions”, Interna-
tional Economic Review, 18, 651-676.
Blackorby, C., R. Boyce and R. R. Russell (1978) “Estimation of Demand Systems Generated by the
Gorman Polar Form: A Generalization of the S-branch Utility Tree”, Econometricu, 46, 345-364.
Ch. 26: Functional Forms in Econometric Model Building 1565
Lau, L. J., W. L. Lin and P. A. Yotopoulos (1978) “The Linear Logarithmic Expenditure System: An
Application to Consumption-Leisure Choice”, Econometrica, 46, 843-868.
Lau, L. J. and S. Schaible (1984) “A Note on the Domain of Monotonicity and Concavity of the
Transcendental Logarithmic Unit Cost Function”, Department of Economics, Stanford: Stanford
University, mimeographed.
Lau, L. J. and B. A. Van Zummeren (1980) “The Choice of Functional Forms when Prior Information
is Diffused”. Paper presented at the Fourth World of Congress of the Econometric Society,
Aix-en-Provence, France, August 28-September 2, 1980.
McFadden, D. L. (1963) “Further Results on C.E.S. Production Functions”, Review of Economic
Studies, 30, 73-83.
McFadden. D. L. (1964) “Existence Conditions for Theil-Type Preferences”, Department of Econom-
ics, Berkeley: University of California, mimeographed. __
McFadden, D. L. (1978) “Cost, Revenue, and Profit Functions”, in: M. A. Fuss and D. L. McFadden,
eds., Production Economics: A Dual Approach to Theoty and Applications. Amsterdam: North-Hol-
land, 1, 3-109.
McFadden, D. L. (1984) “Econometric Analysis of Qualitative Response Models”, in: Z. Griliches
and M. D. Intriliaator. eds.. Handbook of Econometrics. Amsterdam: North-Holland, Vol. 2.
Muellbauer, J. S. (i975) “Aggregation, Income Distribution, and Consumer Demand”, Review of
Economic Studies, 42, 525-543.
Muellbauer. J. S. (1976) “Communitv Preferences and the Renresentative Consumer”, Econometrica,
44979-999. . ’
Nerlove, M. (1963) “Returns to Scale in Electricity Supply”, in: C. F. Christ, et al., eds., Measurement
in Economics: Studies in Mathematical Economics and Econometrics in Memory of Yehuda Grunfeld.
Stanford: Stanford University Press, Vol. I.
Pollak, R. A. and T. J. Wales (1978) “Estimation of Complete Demand Systems from Household
Budget Data: The Linear and Quadratic Expenditure Systems”, American Economic Reuiew, 68,
348-359.
Pollak, R. A. and T. J. Wales (1980) “Comparison of the Quadratic Expenditure System and Translog
Demand Systems with Alternative Specifications of Demographic Effects”, Economefrica, 48,
595-612.
Roy, R. (1943) De I’utihte. Paris: Hermann.
Schultz, H. (1938) The Theoty and Measurement of Demand. Chicago: University of Chicago Press.
Shephard, R. W. (1953) Cost and Production Functions. Princeton: Princeton University Press.
Shephard, R. W. (1970) Theory of Cost and Production Functions. Princeton: Princeton University
Press.
Stone, J. R. N. (1953) The Measurement of Consumer’s Expenditure and Behuvior in the United
Kingdom, 1820- 1938. Cambridge: Cambridge University Press Vol. 1.
Stone, J. R. N. (1954) “Linear Expenditure Systems and Demand Analysis: An Application to the
Pattern of British Demand”, Economic Journal, 64, 511-527.
Theil, H. (1967) Economics and Information Theory. Amsterdam: North-Holland.
Uzawa. H. (1962) “Production Functions with Constant Elasticities of Substitution”. Review of
Economic Studies, 29, 291-299.
Wold, H. with L. Jureen (1953) Demand Analysis. New York: Wiley.
Chapter 27
PHOEBUS J. DHRYMES
Columbia University
Contents
0. Introduction 1568
1. Logit and probit 1568
1 .l Generalities 1568
1.2. Why a general linear model (GLM) formulation is inappropriate 1570
1.3. A utility maximization motivation 1572
1.4. Maximum likelihood estimation 1575
1.5. Goodness of fit 1579
2. Truncated dependent variables 1585
2.1. Generalities 1585
2.2. Why simple OLS procedures fail 1586
2.3. Estimation of parameters by ML methods 1589
2.4. An initial consistent estimator 1590
2.5. Limiting properties and distribution of the ML estimator 1595
2.6. Goodness of tit 1603
3. Sample selectivity 1604
3.1. Generalities 1604
3.2. Inconsistency of least squares procedures 1606
3.3. The LF and ML estimation 1610
3.4. An initial consistent estimator 1613
3.5. Limiting distribution of the ML estimator 1619
3.6. A test for selectivity bias 1625
References 1626
0. Introduction
1.1. Generalities
Let
w= (w,,w*,...,w,),
be a vector of characteristics relative to the alternatives corresponding to the
events 6 and 2; finally, let
rt.=(rtl,..., r ,m) ,
where
e,:t=1,2 ,..., T,
y,=l,
we must have
(2)
while for observations in which
y,=o,
we must have
Et = - x,,p. (3)
Thus, the error term can only assume two possible values, and we are immediately
led to consider an issue that is important to the proper conceptualization of such
models, viz., that what we need is not a linear model “explaining” the choices
Ch. 27: Limited Dependent Variables 1571
&,=1-Q,
P,l= KY, =a (4
and
with probability
Pr*=~bt=o)=l-P,1. (5)
What we really should be asking is: what determines the probability that the t th
economic agent chooses in accordance with event 8, and eq. (1) should be viewed
as a clumsy way of going about it. We see that putting
Hence, prima facie, least squares techniques are not appropriate, even if the
formulations in (1) made intuitive sense.
We shall see that similar situations arise in other LDV contexts in which the
absurdity of least squares procedures is not as evident as it is here.
1572 P. J. Dhrymes
where
The justification for the parameter vector 0 being subscripted is that, since w is
constant across alternatives, 8 must vary. While this may seem unnatural to the
reader it is actually much more convenient, as the following development will
make clear.
If the individual chooses in accordance with 2, then
u, = U( W, r; e,)+ + 01)
E2-E11U(W,r;e1)-U(W,r;e2), 03)
which makes it abundantly clear that we can speak unambiguously only about the
probabilities of choice. To “predict” choice we need an additional “rule” - such
as, for example,
Alternative 1 is chosen when the probability attaching to event 8 is 0.5 or
higher.
If the functions u( .) in (13) are linear, then the t th individual will choose
Alternative 1 if
where
then we have a basis for estimating the parametric structure of our model. Before
we examine estimation issues, however, let us consider some possible distribution
for the errors, i.e. the random variables stl, E,~.
Thus, suppose
and the E,.‘s are independent identically distributed (i.i.d.). We easily find that
Hence
1574 P. J. Dhtymes
where
and F(p) is the c.d.f.’ of the unit normal. Notice that in this context it is not
possible to identify separately /I and u * by observing solely the choices individu-
als make; we can only identify /3/a.
For reasons that we need not examine here, analysis based on the assumption
that errors in (10) and (11) are normally distributed is called Probit Analysis.
We shall now examine another specification that is common in applied re-
search, which is based on the logistic distribution. Thus, let q be an exponentially
distributed random variable so that its density is
u=ln(q)-‘=-lnq. 08)
The Jacobian of this transformation is
J( r + q) = e-O.
If the &ti, i = 1,2 of (14) are mutually independent with density as in (19), then the
joint density is
(20)
Put
Ul+U2=&*,
Since
v1=E2-Et,
exp-(v,+2v,)exp-(e-“Z+e-“1-“z)dv,.
l+e-‘l=t, s = tee”2
to obtain
e-“1
g( vr) = 5 /mSeCsdS =
0 (1 +e-uq2 .
The event 8 may correspond to entering the labor force or going to college in the
examples considered earlier.
where
x,.= (V-J,
w is the s-element row vector describing the relevant attributes of the alternatives
and rr, is the m-element row vector describing the relevant socioeconomic
characteristics of the t th individual.
We recall that a likelihood function may be viewed in two ways: for purposes
of estimation we take the sample as given (here the Yt’s and x,.‘s) and regard it as
a function of the unknown parameters (here the vector j3) with respect to which it
is to be maximized; for purposes of deriving the limiting distribution of estima-
tors it is appropriate to think of it as a function of the dependent variable(s) - and
hence as one that encompasses the probabilistic structure imposed on the model.
This dual view of the likelihood function (LF) will become evident below.
The LF is easily determined to be
and as such it does not contain any random variables1 - even symbolically! Thus,
it is rather easy for a beginning scholar to become confused as to how, solving
aL
-=
ap
0
9
will yield an estimator, say 8, with any probabilistic properties. At least the
analogous situation in the GLM
y=xp+u,
s = (X’X)_‘X’y,
f(xJ)
- - fkB’a) ]Xf=
~=,~l[y~F(x,.s) (1 yf) 1
f
0. (25)
it is important to ensure that solving (25) does, indeed, yield a maximum in the
form of (26) and not merely a local stationary point - at least asymptotically.
The assumptions under which the properties of the ML estimator may be
established are partly motivated by the reason stated above. These assumptions
are
Assumption A.l.1.
The explanatory variables are uniformly bounded, i.e. x,, EH*, for all t, where H,
is a closed bounded subset of R s + m, i.e. the (s + m)-dimensional Euclidean space.
Assumption A.1.2.
The (admissible) parameter space is, similarly, a closed bounded subset of R,+,,,,
say, P* such that P* 3 N(/3O), where N( /3’) is an open neighborhood of the true
parameter point PO.
Remark I
Assumption (A.l.l.) is rather innocuous and merely states that the socioeconomic
variables of interest are bounded. Assumption (A.1.2.) is similarly innocuous. The
technical import of these assumptions is to ensure that, at least asymptotically,
the maximum maximorum of (24) is properly located by the calculus methods of
(25) and to also ensure that the equations in (25) are well defined by precluding a
singularity due to
F(x,./3) = 0 or l-F(x,.P)=O.
Moreover, these assumptions also play a role in the argument demonstrating the
consistency of the ML estimator.
To the above we add another condition, well known in the context of the
general linear model (GLM).
Assumption A. 1.3.
5et
lim X’X
rank(X)=s+m, -=M>O.
T-cc T
With the aid of these assumptions we can easily demonstrate (the proof will not
be given here) the validity of the following
Theorem I
Given assumption A.l.l. through A.1.3. the log likelihood function, L of (24) is
concave in p, whether -F( .) is the unit normal or the logistic c.d.f..
Remark 2
On the other hand as the sample size tends to infinity then with probability one
the condition above is satisfied.
1580 P. J. Dhtymes
plim +T = 0. (30)
T+CO
Now,
1 a3L
@*)=G,,,
TpttT ap,ap,ap,
1 a3L
TpFz~3/= apiapjap,
=Op
aL
i
J7;@ (PI -
--
o
-[ f-&(P”)]h(B-a”)_
Thus,
Hence
as against
HI : pie.
Under Ho
and
2[L(B)-Tlnil- x?+~,
1582 P. J. Dhrymes
is a test statistic for testing the null hypothesis in (32). On the other hand, this is
not a useful basis for defining an R2 statistic, for it implicitly juxtaposes the
economically motivated model that defines the probability of choice as a function
of
and the model based on the principle of insuficient reason which states that the
probability to be assigned to choice corresponding to the event 6’ and that
corresponding to its complement C?are both $. It would be far more meaningful
to consider the null hypothesis to be
i.e. to follow for a nonzero constant term, much as we do in the case of the GLM.
The null hypothesis as above would correspond to assigning a probability to
choice corresponding to event 8 by
J=F(flo) or B, = F-‘(J),
where
QP) = SUPW).
HO
wm- &(“.)(a-PO>
UB)] - -(S-p”)’
L(B”)(P
+(a-p”)‘
~;;g -PO). (33)
In fact, (33) represents a transform of the likelihood ratio (LR) and as such it is a
LR test statistic. We shall now show that in the case where
Ho: P&=0,
Ch. 2 7: Limited Dependent Variables 1585
x,.P,
R”+$ (42)
1. R2 E [O,l)
ii. the larger the contribution of the bona fide variables to the maximum of the
LF the closer is R2 to 1
. ..
ul. R2 stands in a one-to-one relation to the &i-square statistic for testing the
hypothesis that the coefficients of the bona fide variables are zero. In fact,
under HO
-X(&R2 - x:+,,-~.
Finally, we should also stress that R2 as in (42) does not have the interpretation
as the square of the correlation coefficient between “predicted” and “actual”
observations.
2.1. Generalities
households report zero expenditures on consumer durables. This was, in fact, the
situation faced by Tobin (1958) and he chose to model household expenditure on
consumer durables as
The same model was later studied by Amemiya (1973). We shall examine below
the inference and distributional problem posed by the manner in which the
model’s dependent variable is truncated.
(A.2.1.) The {u,: t=l,2 ,... } is a sequence of i.i.d. random variables with
The first question that occurs is why not use the entire sample to estimate /3?
Thus, defining
we may write
y=xfi+u,
and estimate /3 by
u, = - x*.p, t=T,+,,...,T,+T,,
This may appear quite reasonable at first, even though it is also apparent that we
are ignoring some (perhaps considerable) information. Deeper probing, however,
will disclose a much more serious problem. After all, ignoring some sample
elements would affect only the degrees of freedom and the t- and F-statistics
alone. If we already have a large sample, throwing out even a substantial part of it
will not affect matters much. But now it is in order to ask: What is the process by
which some dependent variables are assigned the value zero? A look at (43)
convinces us that it is a random process governed by the behavior of the error
process and the characteristics relevant to the economic agent, x,.. Conversely,
the manner in which the sample on the basis of which we shall estimate fi is
selected is governed by some aspects of the error process. In particular we note
that for us to observe a positive y,, according to
Thus, for the positive observations we should be dealing with the truncated
distribution function of the error process. But, what is the mean of the truncated
distribution? We have, if f( .) is the density and F( .) the c.d.f. of U,
1
E( U,]U, > - x,.P) = O” 5f(Odk
l-F(-x,.P) / -x,.B
f co>?
and, in addition, we also find
I - F( - x,./3> = F(x,.P).
Moreover, if we denote by +(. ), G(e) the iV(0, 1) density and c.d.f., respectively,
and by
y=- X*.P
t u ’
(47)
then
(49)
Var(u,)=a2(1-v,$,--1C/:). (50)
We are operating with the model in (43), subject to (A.2.1.) through (A.2.4.) and
the convention that the first Ti observations correspond to positive dependent
variables, while the remaining T2, (Tl + T2 = T), correspond to zero observations.
Define
c, =1 if yr > 0,
= 0 otherwise, (51)
L= 5 ((1-c,)lnB(v,)-c,[+ln(2n)+flno2+-$(~~-x~.~)”]).
t=l
(52)
8L
-; i {(l-~~)~-,(y~-~~.pi)x,.=o, (53)
ap=
r=1
aL
-=_- 1
i {c~[l-~(~~-x~.8,‘1-(1-c~)~) =O,
au2 2e2 t=i
and these equations have to be solved in order to obtain the ML estimator. It is,
first, interesting to examine how the conditions in (53) differ from the equations
to be satisfied by simple OLS estimators applied to the positive component of the
sample. By simple rearrangement we obtain, using the convention alluded to
above,
(55)
where
+(v,> 445)
SW = qv,> Y c-5) = @(_vt). (56)
Since these expressions occur very frequently, we shall often employ the abbrevia-
15W P. J. Dhrymes
ted notation
#, = ~(V,>~ #T=IC,(-VA.
Thus, if in ‘some sense
is negligible, the ML estimator, say j?, could yield results that are quite similar,
from an applications point of view, to those obtained through the simple OLS
estimator, say fi, as applied to the positive component of the sample. From (54) it
is evident that if z$.. of (57) is small then
aL
-&=o, Y= (P’,a=)‘>
so located is the ML estimator it is necessary to show either that the equation
above has only one root-which is difficult-or that we begin the iteration with an
initial consistent estimator.
Bearing in mind the development in the preceding section we can rewrite the
model describing the positive component of the sample as
Yr=Xr_P+a~t+Ut=u(v,+~,)+u,, (58)
such that
and such that they are independent of the explanatory variables x,.
The model in (58) cannot be estimated by simple means owing to the fact that $\psi_t$ is not directly observable; thus, we are forced into nonstandard procedures. We shall present below a modification and simplification of a consistent estimator due to Amemiya (1973). First we note that, confining our attention to the positive component of the sample,

$$y_t^2 = \sigma^2(\nu_t + \psi_t)^2 + \varepsilon_t^2 + 2\sigma(\nu_t + \psi_t)\varepsilon_t. \tag{60}$$

Hence, taking expectations and using (58) and (59),

$$E(y_t^2) = x_{t\cdot}\beta\,E(y_t) + \sigma^2,$$

so that we may write

$$y_t^2 = y_t\,x_{t\cdot}\beta + \sigma^2 + w_t,$$

where $w_t$ has mean zero. The problem, of course, is that $y_t$ is correlated with $w_t$ and hence simple regression will not produce a consistent estimator for $\beta$ and $\sigma^2$.
However, we can employ an instrumental variables (I.V.) estimator.³

³It is here that the procedure differs from that suggested by Amemiya (1973). He defines

$$\hat y_t = x_{t\cdot}(X_1'X_1)^{-1}X_1'y,$$

while we define

$$\hat y_t = x_{t\cdot}\hat\beta,$$

where $\hat\beta$ is the OLS estimator from the positive component of the sample.

The instrument matrix is

$$X_* = (D_1X_1, e), \tag{65}$$

where $D_1 = \operatorname{diag}(\hat y_1, \ldots, \hat y_{T_1})$, $X_1$ contains the observations $x_{t\cdot}$ of the positive component of the sample, and $e$ is a $T_1$-element column of ones. Clearly

$$X_*'X_* = \begin{pmatrix}X_1'D_1'D_1X_1 & X_1'D_1'e\\ e'D_1X_1 & e'e\end{pmatrix}.$$
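A minimal sketch of this I.V. idea follows: regress $y_t^2$ on $(y_t x_{t\cdot}, 1)$ using $(\hat y_t x_{t\cdot}, 1)$ as instruments. The simulated data and the first-stage choice (OLS fitted values from the positive subsample) are assumptions for illustration, not the text's own computation.

```python
# Sketch of the IV estimator behind (60)-(65) on simulated truncated data.
import numpy as np

rng = np.random.default_rng(2)
T, beta0, sigma0 = 2000, np.array([1.0, 0.8]), 1.0
X = np.column_stack([np.ones(T), rng.normal(size=T)])
y_star = X @ beta0 + rng.normal(0.0, sigma0, T)
keep = y_star > 0                      # positive component of the sample
y, X1 = y_star[keep], X[keep]

yhat = X1 @ np.linalg.lstsq(X1, y, rcond=None)[0]   # first-stage fitted values
ones = np.ones(keep.sum())
Z = np.column_stack([yhat[:, None] * X1, ones])     # instruments (yhat*x, 1)
W = np.column_stack([y[:, None] * X1, ones])        # regressors  (y*x, 1)
gamma = np.linalg.solve(Z.T @ W, Z.T @ y**2)        # IV estimate of (beta', sigma^2)'
print(gamma[:-1], gamma[-1])           # consistent estimates of beta, sigma^2
```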
Now, the elements of $(1/T_1)X_*'X_*$ and of the corresponding cross-product terms are of the form

$$\lim_{T_1\to\infty}\frac{1}{T_1}\sum_{t=1}^{T_1}\hat y_t\,x_{t\cdot}'x_{t\cdot},$$

which converges to a matrix with finite elements. Further and similar calculations will show the same for the remaining second moments. Define, further, with $\alpha = \beta/\sigma$ (so that $\nu_t = x_{t\cdot}\alpha$),

$$S_{T_1}^2 = \sum_{t=1}^{T_1}\operatorname{Var}(x_{t\cdot}\alpha\,\varepsilon_t), \qquad S_{T_1}^* = T_1^{-1/2}S_{T_1}.$$
But then it is evident that Liapounov's condition is satisfied, i.e. with $K$ a uniform bound on $E|x_{t\cdot}\alpha\,\varepsilon_t|^{2+\delta}$,

$$\lim_{T_1\to\infty}\frac{\sum_{t=1}^{T_1}E|x_{t\cdot}\alpha\,\varepsilon_t|^{2+\delta}}{T_1^{1+\delta/2}S_{T_1}^{*\,2+\delta}} \le \lim_{T_1\to\infty}\frac{T_1K}{T_1^{1+\delta/2}S_{T_1}^{*\,2+\delta}} = \lim_{T_1\to\infty}\frac{K}{T_1^{\delta/2}S_{T_1}^{*\,2+\delta}} = 0.$$

Consequently,

$$\frac{1}{\sqrt{T_1}}\,X_*'w \;\xrightarrow{d}\; N(0, H),$$

where

$$H = \lim_{T_1\to\infty}\frac{1}{T_1}\begin{pmatrix}\sum_{t=1}^{T_1}(x_{t\cdot}\alpha)^2x_{t\cdot}'x_{t\cdot}\operatorname{Var}(\varepsilon_t) & \sum_{t=1}^{T_1}(x_{t\cdot}\alpha)x_{t\cdot}'\operatorname{Var}(\varepsilon_t)\\[2pt] \sum_{t=1}^{T_1}(x_{t\cdot}\alpha)x_{t\cdot}\operatorname{Var}(\varepsilon_t) & \sum_{t=1}^{T_1}\operatorname{Var}(\varepsilon_t)\end{pmatrix}, \tag{73}$$

$$Q = \underset{\text{a.c.}}{\lim}\,\frac{X_*'X_*}{T_1}. \tag{74}$$
Moreover, since these results hold jointly, we have:

Lemma 1

Consider the model in (43) subject to assumptions (A.2.1.) through (A.2.4.); further consider the I.V. estimator of the parameter vector $\gamma = (\beta', \sigma^2)'$ in the relation $y_t^2 = y_t\,x_{t\cdot}\beta + \sigma^2 + w_t$, with $X_*$ as the matrix of instruments. Then the I.V. estimator, say $\tilde\gamma$, is strongly consistent and $\sqrt{T_1}(\tilde\gamma - \gamma^0)$ has a well-defined limiting normal distribution.

Returning now to eqs. (53) or (54) and (55) we observe that since the initial estimator, say $\tilde\gamma$, is strongly consistent, at each step of the iterative procedure we get a (strongly) consistent estimator. Hence, at convergence, the estimator so determined, say $\hat\gamma$, is guaranteed to be (strongly) consistent.
The perceptive reader may ask: Why did we not use the apparatus of Section 1.d. instead of going through the intermediate step of obtaining the initial consistent estimator? The answer is, essentially, that Theorem 1 (of Section 1.d.) does not hold in the current context. To see that, recall the (log) LF of our problem and write it as

$$L_T(\gamma) = \frac{1}{T}\sum_{t=1}^{T}\left\{(1-c_t)\ln\Phi(-\nu_t) - c_t\left[\tfrac{1}{2}\ln(2\pi) + \tfrac{1}{2}\ln\sigma^2 + \frac{1}{2\sigma^2}(y_t - x_{t\cdot}\beta)^2\right]\right\}. \tag{75}$$
Proof

The argument turns on the sequence $\{\xi_t\colon t = 1, 2, \ldots\}$ of summands of the score.
Remark 3

The device of beginning the iterative process for solving (76) with a consistent estimator ensures that for sufficiently large $T$ we will be locating the estimator, say $\hat\gamma_T$, satisfying

$$L_T(\hat\gamma_T) = \sup_{\gamma}L_T(\gamma).$$

Lemma 2 can be shown to imply that

$$\operatorname*{plim}_{T\to\infty}\hat\gamma_T = \gamma^0.$$

On the other hand, it is not possible to show routinely that $\hat\gamma_T \xrightarrow{\text{a.c.}} \gamma^0$. Essentially, the problem is the term corresponding to $\sigma^2$, which contains expressions like

$$\sum_t c_t\,(y_t - x_{t\cdot}\beta)^2,$$
which cannot be (absolutely) bounded. This does not prevent us from showing convergence a.c. of $\hat\gamma_T$ to $\gamma^0$. By the iterative process we have shown that $\hat\gamma_T$ converges to $\gamma^0$ at least in probability. Convergence a.c. is shown easily once we obtain the limiting distribution of $\hat\gamma_T$, a task to which we now turn.
Thus, as before, consider the expansion

$$0 = \frac{\partial L_T}{\partial\gamma}(\hat\gamma_T) = \frac{\partial L_T}{\partial\gamma}(\gamma^0) + \frac{\partial^2 L_T}{\partial\gamma\,\partial\gamma'}(\bar\gamma_T)\,(\hat\gamma_T - \gamma^0), \tag{78}$$

where $\bar\gamma_T$ lies between $\hat\gamma_T$ and $\gamma^0$. We already have an explicit expression in eq. (53) for the derivative $\partial L_T/\partial\gamma$. So let us obtain the Hessian of the LF. Writing $\nu_t^0$, $\psi_t^{*0}$, etc., for the quantities evaluated at the true parameter point, we find

$$\xi_{1t} = (1-c_t)\frac{\phi(\nu_t^0)}{\Phi(-\nu_t^0)} - c_t\,\frac{y_t - x_{t\cdot}\beta^0}{\sigma^0}, \qquad \xi_{2t} = c_t\left[1 - \left(\frac{y_t - x_{t\cdot}\beta^0}{\sigma^0}\right)^2\right] - (1-c_t)\,\nu_t^0\psi_t^{*0}, \tag{80}$$

and

$$\xi_{11t} = (1-c_t)\,\psi_t^{*0}(\psi_t^{*0} - \nu_t^0) + c_t,$$

$$\xi_{12t} = \xi_{21t} = (1-c_t)\,\psi_t^{*0}\bigl(1 + \nu_t^0\psi_t^{*0} - \nu_t^{0\,2}\bigr) - 2c_t\,\frac{y_t - x_{t\cdot}\beta^0}{\sigma^0}, \tag{81}$$

$$\xi_{22t} = (1-c_t)\,\nu_t^0\psi_t^{*0}\bigl(3 + \nu_t^0\psi_t^{*0} - \nu_t^{0\,2}\bigr) + c_t\left[4\left(\frac{y_t - x_{t\cdot}\beta^0}{\sigma^0}\right)^2 - 2\right],$$
where, evidently,

$$\frac{\partial L_T}{\partial\gamma}(\gamma^0) = -\frac{1}{T}\sum_{t=1}^{T}\begin{pmatrix}\xi_{1t}\,x_{t\cdot}'/\sigma^0\\[2pt]\xi_{2t}/(2\sigma^{0\,2})\end{pmatrix}, \tag{82}$$

and the Hessian of the LF is, apart from powers of $\sigma^0$, composed of the blocks $\sum_t\xi_{11t}x_{t\cdot}'x_{t\cdot}$, $\sum_t\xi_{12t}x_{t\cdot}'$ and $\sum_t\xi_{22t}$, plus a matrix $\Omega_{1T}$ all of whose elements are zero except the last diagonal element. It may then be shown that

$$\sqrt{T}\,(\hat\gamma_T - \gamma^0) \;\to\; N\bigl(0, \Omega^{-1}\bigr),$$

where
$\Omega$ is the limit of the normalized (expected) Hessian. To establish this we note that the difference between the Hessian evaluated at $\bar\gamma_T$ and at $\gamma^0$ converges in probability to the null matrix, element by element. But the elements of the normalized Hessian evaluated at $\gamma^0$ are seen to be sums of independent random variables with finite means and bounded variances; hence, they obey a Kolmogorov criterion and thus converge a.c. to their means,

$$E(\xi_{11t}) = \omega_{11t}, \qquad E(\xi_{12t}) = \omega_{12t}, \qquad E(\xi_{22t}) = \omega_{22t}.$$

Hence the normalized Hessian converges a.c. to its expectation and, moreover, the limiting distribution above follows.
2.6. Goodness of fit

In the context of the truncated dependent variable model the question arises as to what we would want to mean by a "goodness of fit" statistic. As analyzed in the section on discrete choice models, the usual $R^2$, in the context of the GLM, serves a multiplicity of purposes; when we complicate the process in which we operate it is not always possible to define a single statistic that would be meaningful in all contexts.
Since the model is

$$y_t = x_{t\cdot}\beta + u_t \;\text{ if } u_t > -x_{t\cdot}\beta, \qquad y_t = 0 \;\text{ otherwise},$$

the fitted model may "describe well" the first statement but poorly the second, or vice versa. A useful statistic for the former would be the square of the simple correlation coefficient between predicted and actual $y_t$. Thus, e.g., suppose we follow our earlier convention about the numbering of observations; then for the positive component of the sample we put

$$\hat y_t = x_{t\cdot}\hat\beta + \hat\sigma\hat\psi_t, \qquad \hat\psi_t = \frac{\phi(\hat\nu_t)}{\Phi(\hat\nu_t)}, \tag{89}$$

and use the squared simple correlation between $\hat y_t$ and $y_t$, $t = 1, 2, \ldots, T_1$; the denominator $d$ of this correlation coefficient satisfies

$$d > 0. \tag{93}$$
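A minimal sketch of this fit measure follows. The names beta_hat, sigma_hat, X1, y are assumed to hold the ML estimates and the positive subsample from the earlier sketches; they are not defined in the original text.

```python
# Squared simple correlation between actual y_t and the conditional-mean
# prediction of eq. (89), for the positive component of the sample.
import numpy as np
from scipy.stats import norm

def fit_r2(y, X1, beta_hat, sigma_hat):
    nu = X1 @ beta_hat / sigma_hat
    psi = norm.pdf(nu) / norm.cdf(nu)
    y_pred = X1 @ beta_hat + sigma_hat * psi   # eq. (89)
    return np.corrcoef(y, y_pred)[0, 1] ** 2   # squared simple correlation
```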
3. Sample selectivity

3.1. Generalities

This is another important class of problems that relate specifically to the issue of how observations on a given economic phenomenon are generated. More particularly, we hypothesize that whether a certain variable, say $y_{t1}^*$, is observed or not depends on another variable, say $y_{t2}^*$. Thus, the observability of $y_{t1}^*$ depends on the probability structure of the stochastic process that generates $y_{t1}^*$, as well as on that of the stochastic process that governs the behavior of $y_{t2}^*$. The variable $y_{t2}^*$ may be inherently unobservable although we assert that we know the variables that enter its "systematic part."
To be precise, consider the model

$$y_{t1}^* = x_{t1\cdot}^*\beta_{\cdot 1}^* + u_{t1}^*, \qquad y_{t2}^* = x_{t2\cdot}^*\beta_{\cdot 2}^* + u_{t2}^*, \tag{94}$$

where $x_{t1\cdot}^*$, $x_{t2\cdot}^*$ are $r_1$-, $r_2$-element row vectors of observable "exogenous" variables and

$$u_t^{*\prime} = (u_{t1}^*, u_{t2}^*), \qquad t = 1, 2, \ldots, \tag{95}$$

is the (basic) error process; the canonical (unstarred) quantities $\beta_{\cdot 1}$, $\beta_{\cdot 2}$, $u_{t1}$, $u_{t2}$ are defined from the starred ones accordingly, with the understanding that if $x_{t1\cdot}^*$ and $x_{t2\cdot}^*$ have elements in common, say $z_{t\cdot}$, then the corresponding coefficients, say $\beta_{\cdot 11}^*$ and $\beta_{\cdot 12}^*$ of $z_{t\cdot}$ in $x_{t1\cdot}^*$ and $x_{t2\cdot}^*$ respectively, are combined in the appropriate elements of $\beta_{\cdot 1}$ and $\beta_{\cdot 2}$. Hence, the model in (94) can be stated in the canonical form

$$y_{t1} = x_{t1\cdot}\beta_{\cdot 1} + u_{t1}, \qquad y_{t2} = x_{t2\cdot}\beta_{\cdot 2} + u_{t2}, \tag{97}$$

and subject to the condition that $y_{t1}$ is observable (observed) if and only if

$$y_{t2} \ge 0.$$
Although we assume that the standard conditions of the typical GLM hold, nothing in the subsequent discussion suggests a correlation between $x_{t1\cdot}$ and $u_{t1}$; hence, if any problem should arise it ought to be related to the probability structure of the sequence

$$\{u_{t1}\colon t = 1, 2, \ldots\} \tag{98}$$

insofar as it is associated with observable $y_{t1}$, a problem to which we now turn. We note that the conditions hypothesized by the model imply that (potential) realizations of the process in (98) are conditioned on⁴

$$u_{t2} \ge -x_{t2\cdot}\beta_{\cdot 2}. \tag{99}$$

Or, perhaps more precisely, we should state that (implicit) realizations of the process in (98) associated with observable realizations

$$\{y_{t1}\colon t = 1, 2, \ldots\}$$

are conditional on (99). Therefore, in dealing with the error terms of (potential) samples the marginal distribution properties of (98) are not relevant; what are relevant are its conditional properties, as conditioned by (99).
We have

Lemma 3

The density of $u_{t1}$ conditional on the event in (99) is

$$f(u_{t1} \mid u_{t2} \ge -x_{t2\cdot}\beta_{\cdot 2}) = \frac{\Phi(\pi_{t1})}{\Phi(\nu_{t2})}\,\frac{1}{\sqrt{2\pi\sigma_{11}}}\exp\left(-\frac{u_{t1}^2}{2\sigma_{11}}\right), \tag{100}$$

where

$$\nu_{t2} = \frac{x_{t2\cdot}\beta_{\cdot 2}}{\sigma_{22}^{1/2}}, \qquad \pi_{t1} = \frac{1}{(1-\rho_{12}^2)^{1/2}}\left(\nu_{t2} + \frac{\rho_{12}}{\sigma_{11}^{1/2}}\,u_{t1}\right), \qquad \rho_{12}^2 = \frac{\sigma_{12}^2}{\sigma_{11}\sigma_{22}}, \qquad a^2 = 1 - \rho_{12}^2. \tag{101}$$
Proof

The first assertion is quite evidently valid since by the standard assumptions of the GLM we assert that $(x_{t1\cdot}, x_{t2\cdot})$ and $u_t' = (u_{t1}, u_{t2})$ are mutually independent and that

$$\{u_t'\colon t = 1, 2, \ldots\}$$

is a sequence of i.i.d. random vectors. Conditioning on

$$u_{t2} \ge -x_{t2\cdot}\beta_{\cdot 2},$$

and making the usual change of variable in the bivariate normal density, we find

$$f(u_{t1} \mid u_{t2} \ge -x_{t2\cdot}\beta_{\cdot 2}) = \frac{\Phi(\pi_{t1})}{\Phi(\nu_{t2})}\,\frac{1}{\sqrt{2\pi\sigma_{11}}}\exp\left(-\frac{u_{t1}^2}{2\sigma_{11}}\right).$$

To verify that this is, indeed, a density function we note that it is everywhere nonnegative and

$$\int_{-\infty}^{\infty}f(\xi \mid u_{t2} \ge -x_{t2\cdot}\beta_{\cdot 2})\,d\xi = \frac{1}{\Phi(\nu_{t2})}\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi\sigma_{11}}}\exp\left(-\frac{\xi^2}{2\sigma_{11}}\right)\Phi\bigl(\pi_{t1}(\xi)\bigr)\,d\xi = 1.$$
Lemma 4

The moments of the conditional distribution in (100) exist for all orders and may be expressed (eqs. (102) and (103)) in terms of factorial coefficients of the form $[2(k-r)]!/[2^{k-r}(k-r)!]$ and the quantities

$$\psi(\nu_{t2}) = \frac{\phi(\nu_{t2})}{\Phi(\nu_{t2})}, \qquad I_{0,t} = 1, \qquad I_{1,t} = \sigma_{11}^{1/2}\rho_{12}\,\psi(\nu_{t2}). \tag{104}$$
Remark 5

It is evident, from the preceding discussion, that the moments of the error process corresponding to observable $y_{t1}$ are uniformly bounded in $\beta_{\cdot 1}$, $\beta_{\cdot 2}$, $\sigma_{11}$, $\sigma_{12}$, $\sigma_{22}$, $x_{t1\cdot}$, and $x_{t2\cdot}$, provided the parameter space is compact and the elements of $x_{t1\cdot}$, $x_{t2\cdot}$ are bounded.
Remark 6

It is also evident from the preceding that for (potential) observations from the model

$$E(y_{t1} \mid u_{t2} \ge -x_{t2\cdot}\beta_{\cdot 2}) = x_{t1\cdot}\beta_{\cdot 1} + \frac{\sigma_{12}}{\sigma_{22}^{1/2}}\,\psi(\nu_{t2}). \tag{105}$$
We are now in a position to answer the question, raised earlier, whether OLS methods applied to the first equation in (97) will yield at least consistent estimators. In this connection we observe that the error terms of observations on the first equation of (97) obey

$$E(u_{t1} \mid u_{t2} \ge -x_{t2\cdot}\beta_{\cdot 2}) = \frac{\sigma_{12}}{\sigma_{22}^{1/2}}\,\psi(\nu_{t2}) \ne 0,$$

so that, in general, OLS estimators will be biased and inconsistent. We shall assume that in our sample we have entities for which $y_{t1}$ is observed and entities for which it is not observed; if $y_{t1}$ is not observable, then we know that

$$u_{t2} < -x_{t2\cdot}\beta_{\cdot 2}.$$
Evidently, the probability of observing $y_{t1}$ is $\Phi(\nu_{t2})$ and, given that $y_{t1}$ is observed, the probability that it will assume a value in some interval $A$ is

$$\int_A f\bigl(y_{t1} - x_{t1\cdot}\beta_{\cdot 1} \mid u_{t2} \ge -x_{t2\cdot}\beta_{\cdot 2}\bigr)\,dy_{t1}.$$

Hence, the unconditional probability that $y_{t1}$ will assume a value in the interval $A$ is $\Phi(\nu_{t2})$ times the integral above. Define

$$c_t = 1 \;\text{ if } y_{t1}\text{ is observed}, \qquad c_t = 0 \;\text{ otherwise}.$$

The likelihood function is then

$$L^* = \prod_{t=1}^{T}\bigl[\Phi(\nu_{t2})\,f\bigl(y_{t1} - x_{t1\cdot}\beta_{\cdot 1} \mid u_{t2} \ge -x_{t2\cdot}\beta_{\cdot 2}\bigr)\bigr]^{c_t}\bigl[\Phi(-\nu_{t2})\bigr]^{1-c_t}. \tag{106}$$
Thus, e.g., if for a given sample we have no observations on $y_{t1}$, the LF becomes

$$L^* = \prod_{t=1}^{T}\Phi(-\nu_{t2}).$$

Finally, if the sample contains entities for which $y_{t1}$ is observed as well as entities for which it is not observed, then we have the situation in (106). We shall examine the estimation problems posed by (106) in its general form.
Remark 7

It is evident that we can parametrize the problem in terms of $\beta_{\cdot 1}$, $\beta_{\cdot 2}$, $\sigma_{11}$, $\sigma_{22}$, $\sigma_{12}$; it is further evident that $\beta_{\cdot 2}$ and $\sigma_{22}$ appear only in the form $\beta_{\cdot 2}/\sigma_{22}^{1/2}$; hence, $\sigma_{22}$ cannot be, separately, identified. We shall, thus, adopt the convention

$$\sigma_{22} = 1. \tag{107}$$

A consequence of (107) is that (105) reduces to

$$E(y_{t1} \mid u_{t2} \ge -x_{t2\cdot}\beta_{\cdot 2}) = x_{t1\cdot}\beta_{\cdot 1} + \sigma_{12}\,\psi(\nu_{t2}). \tag{108}$$
The log likelihood function corresponding to (106) is then

$$L = \sum_{t=1}^{T}\left\{(1-c_t)\ln\Phi(-\nu_{t2}) + c_t\left[\ln\Phi(\pi_{t1}) - \tfrac{1}{2}\ln(2\pi) - \tfrac{1}{2}\ln\sigma_{11} - \frac{1}{2\sigma_{11}}(y_{t1} - x_{t1\cdot}\beta_{\cdot 1})^2\right]\right\}, \tag{109}$$

where $\pi_{t1}$ is evaluated at $u_{t1} = y_{t1} - x_{t1\cdot}\beta_{\cdot 1}$.
Remark 8

The first-order conditions are obtained by differentiating (109) with respect to the unknown parameters ((110), (111)). Putting the derivatives together, we may write the normal equations compactly as

$$g(\gamma) = 0.$$
Evidently, this is a highly nonlinear set of relationships which can be solved only by iteration, from an initial consistent estimator, say $\tilde\gamma$.

To obtain an initial consistent estimator we look at the sample solely from the point of view of whether information is available on $y_{t1}$, i.e. whether $y_{t1}$ is observed with respect to the economic entity in question. It is clear that this, at best, will identify only $\beta_{\cdot 2}$, since absent any information on $y_{t1}$ we cannot possibly hope to estimate $\beta_{\cdot 1}$. Having estimated $\beta_{\cdot 2}$ by this procedure we proceed to construct the variable

$$\hat\psi_{t2} = \frac{\phi(x_{t2\cdot}\tilde\beta_{\cdot 2})}{\Phi(x_{t2\cdot}\tilde\beta_{\cdot 2})}.$$

Then, turning our attention to that part of the sample which contains observations on $y_{t1}$, we simply regress $y_{t1}$ on $(x_{t1\cdot}, \hat\psi_{t2})$. In this fashion we obtain estimators of

$$\delta = (\beta_{\cdot 1}', \sigma_{12})', \tag{117}$$

as well as of $\sigma_{11}$.
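A minimal sketch of this two-step initial estimator follows, on simulated data with $\sigma_{22} = 1$. All data, parameter values, and the convenience choice $x_{t1\cdot} = x_{t2\cdot}$ are assumptions for illustration.

```python
# Two-step initial estimator: probit for beta_.2, then regress observed
# y_t1 on (x_t1., psi_hat) to get (beta_.1', sigma_12)'.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(3)
T = 4000
X2 = np.column_stack([np.ones(T), rng.normal(size=T)])
X1 = X2.copy()                               # x_t1. = x_t2. for simplicity
b1, b2, s12, s11 = np.array([1.0, 0.5]), np.array([0.3, 1.0]), 0.6, 1.0
u = rng.multivariate_normal([0, 0], [[s11, s12], [s12, 1.0]], T)
y2 = X2 @ b2 + u[:, 1]
y1 = np.where(y2 >= 0, X1 @ b1 + u[:, 0], np.nan)  # y_t1 observed iff y_t2 >= 0
c = (y2 >= 0).astype(float)

# Step 1: probit log likelihood (cf. eq. (118)) for beta_.2
nll = lambda b: -np.sum(c * norm.logcdf(X2 @ b) + (1 - c) * norm.logcdf(-X2 @ b))
b2_hat = minimize(nll, np.zeros(2), method="BFGS").x

# Step 2: regression on the observed subsample with psi_hat as a regressor
psi_hat = norm.pdf(X2 @ b2_hat) / norm.cdf(X2 @ b2_hat)
obs = c == 1
R = np.column_stack([X1[obs], psi_hat[obs]])
delta = np.linalg.lstsq(R, y1[obs], rcond=None)[0]  # (beta_.1', sigma_12)'
print(delta)
```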
Examining the sample from the point of view first set forth at the beginning of this section, we have the log likelihood function

$$L_p = \sum_{t=1}^{T}\bigl[c_t\ln\Phi(x_{t2\cdot}\beta_{\cdot 2}) + (1-c_t)\ln\Phi(-x_{t2\cdot}\beta_{\cdot 2})\bigr], \tag{118}$$

which is to be maximized with respect to the unknown vector $\beta_{\cdot 2}$. In Section 1.d. we noted that $L_p$ is strictly concave with respect to $\beta_{\cdot 2}$; moreover, the matrix of its second derivatives is negative definite.
It is our contention that the estimator in (125) is consistent for $\beta_{\cdot 1}$ and $\sigma_{12}$; moreover, that it naturally implies a consistent estimator for $\sigma_{11}$, thus yielding the initial consistent estimator, say $\tilde\delta = (\tilde\beta_{\cdot 1}', \tilde\sigma_{12})'$. We shall show that

$$\sqrt{T}\,(\tilde\delta - \delta^0) \;\to\; N(0, F), \tag{130}$$

for suitable matrix $F$, thus showing that $\tilde\delta$ converges to $\delta^0$ with probability one (almost surely).
In order that we may accomplish this task it is imperative that we specify more precisely the conditions under which we are to consider the model⁵ in (94), as expressed in (97). We have:

(A.3.1.) The basic error process $\{u_t^{*\prime} = (u_{t1}^*, u_{t2}^*)\colon t = 1, 2, \ldots\}$ is one of i.i.d. normal random vectors with mean zero.

(A.3.3.) The exogenous variables are nonstochastic and are bounded.

(A.3.4.)

$$\lim_{T\to\infty}\frac{1}{T}X_*'X_* = P, \qquad P > 0.$$

⁵As pointed out earlier, it may be more natural to state conditions in terms of the basic variables $x_{t1\cdot}^*$, $x_{t2\cdot}^*$, $u_{t1}^*$ and $u_{t2}^*$; doing so, however, will disrupt the continuity of our discussion; for this reason we state conditions in terms of $x_{t1\cdot}$, $x_{t2\cdot}$, $u_{t1}$ and $u_{t2}$.
Remark 9

It is a consequence of the assumptions above that, for any $x_{t2\cdot}$ and admissible $\beta_{\cdot 2}$,⁶ there exists $k$ such that $|x_{t2\cdot}\beta_{\cdot 2}| \le k < \infty$. Consequently,

$$\psi(\nu_{t2}) = \frac{\phi(x_{t2\cdot}\beta_{\cdot 2})}{\Phi(x_{t2\cdot}\beta_{\cdot 2})}, \qquad \psi^*(\nu_{t2}) = \frac{\phi(x_{t2\cdot}\beta_{\cdot 2})}{\Phi(-x_{t2\cdot}\beta_{\cdot 2})}$$

are uniformly bounded.
⁶We remind the reader that in the canonical representation of (97), the vector $x_{t1\cdot}$ is a subvector of $x_{t2\cdot}$; hence the boundedness assumptions on $x_{t2\cdot}$ imply similar boundedness conditions on $x_{t1\cdot}$. Incidentally, note that $\beta_{\cdot 1}$ is not necessarily a subvector of $\beta_{\cdot 2}$, since the latter contains differences of the starred coefficients of the variables common to the two equations, while the former contains the starred coefficients of the first equation alone.

Lemma 5

Under (A.3.1.) through (A.3.4.), the difference between the Hessian of the probit log LF of (118), evaluated at $\beta_{\cdot 2} = \beta_{\cdot 2}^0$, and its expectation converges in probability to zero.

Proof

We examine

$$S_T = \frac{\partial^2 L_p}{\partial\beta_{\cdot 2}\,\partial\beta_{\cdot 2}'}\bigg|_{\beta_{\cdot 2}=\beta_{\cdot 2}^0} - E\left[\frac{\partial^2 L_p}{\partial\beta_{\cdot 2}\,\partial\beta_{\cdot 2}'}\bigg|_{\beta_{\cdot 2}=\beta_{\cdot 2}^0}\right], \tag{133}$$

whose elements are sums of terms of the form

$$\frac{1}{T}\sum_{t=1}^{T}\omega_t\,x_{t2\cdot}'x_{t2\cdot},$$

with uniformly bounded weights $\omega_t$; hence

$$\operatorname*{plim}_{T\to\infty}S_T = 0,$$

which implies:
Corollary 4

The limiting distribution of the left member of (130) is obtainable from the preceding results: under assumptions (A.3.1) through (A.3.4) the initial (consistent) estimator of this section has the limiting distribution

$$\sqrt{T}\,(\tilde\delta - \delta^0) \;\to\; N(0, F),$$

where $F$ is the covariance matrix implied by those results.

Corollary 5

The initial estimator above is strongly consistent.

Proof

$\tilde\delta$ converges to $\delta^0$ a.c.

Evidently, the parameter $\sigma_{11}$ can be estimated (at least consistently) by

$$\tilde\sigma_{11} = \frac{1}{T_1}\sum_{t=1}^{T_1}\bigl[\tilde u_t^2 + \tilde\sigma_{12}^2\,\hat\nu_{t2}\hat\psi_{t2} + \tilde\sigma_{12}^2\,\hat\psi_{t2}^2\bigr], \qquad \tilde u_t = y_{t1} - x_{t1\cdot}\tilde\beta_{\cdot 1} - \tilde\sigma_{12}\hat\psi_{t2}.$$

We have thus obtained the initial consistent estimator, say $\tilde\gamma$, and have shown that it converges to the true parameter point, say $\gamma^0$, with probability one (a.c.).
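Continuing the earlier sketch, the consistent estimator of $\sigma_{11}$ adds back the truncation correction, as in the displayed formula. The function names and inputs are illustrative assumptions.

```python
# sigma_11 estimator with the truncation correction of the display above.
import numpy as np
from scipy.stats import norm

def sigma11_hat(y1_obs, X1_obs, X2_obs, b1_hat, s12_hat, b2_hat):
    nu = X2_obs @ b2_hat
    psi = norm.pdf(nu) / norm.cdf(nu)
    resid = y1_obs - X1_obs @ b1_hat - s12_hat * psi
    return np.mean(resid**2 + s12_hat**2 * nu * psi + s12_hat**2 * psi**2)
```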
We now investigate the properties of the ML estimator, say $\hat\gamma$, obtained by solving $g(\gamma) = 0$ by iteration from the initial consistent estimator. For this purpose we require the elements of the Hessian of the log LF; as given in (139), these involve the quantities

$$\rho_{12}, \qquad \frac{\phi(\pi_t)}{\Phi(\pi_t)}, \qquad \frac{\phi^2(\pi_t)}{\Phi^2(\pi_t)}, \qquad \psi^*(\nu_{t2})\bigl[\psi^*(\nu_{t2}) - \nu_{t2}\bigr], \qquad \frac{y_{t1} - x_{t1\cdot}\beta_{\cdot 1}}{\sigma_{11}^{1/2}},$$

and powers of $\sigma_{11}^{1/2}$; we denote its typical elements by $\xi_{11t}, \xi_{12t}, \ldots, \xi_{22t}$, and their mean-zero components by starred symbols. (139)
Remark 10

The starred symbols all correspond to components of the Hessian of the log LF having mean zero. Hence, such components can be ignored both in determining the limiting distribution of the ML estimator and in its numerical derivation, given a sample. We can, then, represent the Hessian of the log of the LF as

$$H_T = A_T + \Omega_T^*,$$

where $\Omega_T^*$ contains only zeros or elements having mean zero. It is also relatively straightforward to verify that $E(A_T)$ has a well-defined limit, where the elements of $A_T$, $\xi_{\cdot t}$, and $C_{2t}$ have been evaluated at the true parameter point $\gamma^0$.
To determine the limiting distribution of the ML estimator (i.e. the converging iterate beginning with an initial consistent estimator) we need:

Lemma 6

$$\frac{1}{\sqrt{T}}\,\frac{\partial L}{\partial\gamma}(\gamma^0) \;\to\; N(0, C_*).$$

Proof

The sequence of summands of $\partial L/\partial\gamma(\gamma^0)$ is one of independent vectors with mean zero and bounded second moments; the conclusion follows from the central limit theorem. (Q.E.D.)

Again for the sake of brevity of exposition we shall only state the result without proof:

Lemma 7

$\dfrac{1}{T}\,\dfrac{\partial^2 L}{\partial\gamma\,\partial\gamma'}(\gamma)$ converges a.c. to its expectation, uniformly in $\gamma \in H$.
Corollary 7

$$\frac{1}{T}\,\frac{\partial^2 L}{\partial\gamma\,\partial\gamma'}(\bar\gamma_T) \;\xrightarrow{\text{a.c.}}\; \lim_{T\to\infty}\frac{1}{T}\,E\left[\frac{\partial^2 L}{\partial\gamma\,\partial\gamma'}(\gamma^0)\right].$$

Proof

Immediate from Lemma 7 and the strong consistency of the iterate.

Under the hypothesis $H_0\colon \sigma_{12} = 0$ (absence of selectivity bias), the log LF becomes

$$L = \sum_{t=1}^{T}\left\{(1-c_t)\ln\Phi(-\nu_{t2}) + c_t\ln\Phi(\nu_{t2}) - c_t\left[\tfrac{1}{2}\ln(2\pi) + \tfrac{1}{2}\ln\sigma_{11} + \frac{1}{2\sigma_{11}}(y_{t1} - x_{t1\cdot}\beta_{\cdot 1})^2\right]\right\}. \tag{142}$$

We note that (142) is separable in the parameters $(\beta_{\cdot 1}', \sigma_{11})'$ and $\beta_{\cdot 2}$. Indeed, the ML estimator of $\beta_{\cdot 2}$ is the "probit" estimator, $\tilde\beta_{\cdot 2}$, obtained in connection with eq. (118) in Section 3.d.; the ML estimator of $(\beta_{\cdot 1}', \sigma_{11})'$ is the usual one obtained by least squares, except that $\sigma_{11}$ is estimated with bias, as all maximum likelihood procedures imply in the normal case. Denote the estimator of $\gamma$ obtained under $H_0$ by $\bar\gamma$. Denote by $\hat\gamma$ the ML estimator whose limiting distribution was obtained in the preceding section.
Thus, in the context of the model of this section, a test for the absence of selectivity bias can be carried out by the likelihood ratio (LR) principle. The test statistic is

$$-2\lambda \sim \chi_1^2, \tag{143}$$

where

$$\lambda = \sup_{H_0}L(\gamma) - \sup_{\gamma}L(\gamma).$$
References
Bartlett, M. S. (1935) "Contingency Table Interactions", Supplement to the Journal of the Royal
Statistical Society, 2, 248-252.
Berkson, J. (1949) “Application of the Logistic Function to Bioassay”, Journal of the American
Statistical Association, 39, 357-365.
Berkson, J. (1951) "Why I Prefer Logits to Probits", Biometrics, 7, 327-339.
Berkson, J. (1953) "A Statistically Precise and Relatively Simple Method of Estimating the Bio-Assay
with Quantal Response, Based on the Logistic Function", Journal of the American Statistical
Association, 48, 565-599.
Berkson, J. (1955) "Estimate of the Integrated Normal Curve by Minimum Normit Chi-Square with
Particular Reference to Bio-Assay", Journal of the American Statistical Association, 50, 529-549.
Berkson, J. (1955) "Maximum Likelihood and Minimum Chi-Square Estimations of the Logistic
Function", Journal of the American Statistical Association, 50, 130-161.
Bishop, Y. M. M., S. Fienberg and P. Holland (1975) Discrete Multivariate Analysis. Cambridge: MIT Press.
Block, H. and J. Marschak (1960) “Random Orderings and Stochastic Theories of Response”, in: I.
Olkin, ed., Contributions to Probability and Statistics. Stanford: Stanford University Press.
Bock, R. D. (1968) “Estimating Multinomial Response Relations”, in: Contributions to Statistics and
Probability: Essays in Memory of S. N. Roy. Chapel Hill: University of North Carolina Press.
Bock, R. D. (1968) The Measurement and Prediction of Judgment and Choice. San Francisco:
Holden-Day.
Boskin, M. (1974) “A Conditional Logit Model of Occupational Choice”, Journal of Political
Economy, 82, 389-398.
Boskin, M. (1975) “A Markov Model of Turnover in Aid to Families with Dependent Children”,
Journal of Human Resources, 10, 467-481.
Chambers, E. A. and D. R. Cox (1967) “Discrimination between Alternative Binary Response
Models”, Biometrika, 54, 573-578.
Cosslett, S. (1980) “Efficient Estimators of Discrete Choice Models”, in: C. Manski and D.
McFadden, eds., Structural Analysis of Discrete Data. Cambridge: MIT Press.
Cox, D. (1970) Analysis of Binary Data. London: Methuen.
Cox, D. (1972) “The Analysis of Multivariate Binary Data”, Applied Statistics, 21, 113-120.
Cox, D. (1958) “The Regression Analysis of Binary Sequences”, Journal of the Royal Statistical
Society, Series B, 20, 215-242.
Cox, D. (1966) “Some Procedures Connected with the Logistic Response Curve”, in: F. David, ed.,
Research Papers in Statistics. New York: Wiley.
Cox, D. and E. Snell (1968) “A General Definition of Residuals”, Journal of the Royal Statistical
Society, Series B, 30, 248-265.
Cragg, J. G. (1971) “Some Statistical Models for Limited Dependent Variables with Application to the
Demand for Durable Goods”, Econometrica, 39, 829-844.
Cragg, J. and R. Uhler (1970) “The Demand for Automobiles”, Canadian Journal of Economics, 3,
386-406.
Cripps, T. F. and R. J. Tarling (1974) “An Analysis of the Duration of Male Unemployment in Great
Britain 1932-1973”, The Economic Journal, 84, 289-316.
Daganzo, C. (1980) Multinomial Probit. New York: Academic Press.
Dagenais, M. G. (1975) “Application of a Threshold Regression Model to Household Purchases of
Automobiles”, The Review of Economics and Statistics, 57, 275-285.
Debreu, G. (1960) "Review of R. D. Luce, Individual Choice Behavior", American Economic Review,
50, 186-188.
Dhrymes, P. J. (1970) Econometrics: Statistical Foundations and Applications. New York: Harper &
Row; reprinted 1974, New York: Springer-Verlag.
Dhrymes, P. J. (1978a) Introductory Econometrics. New York: Springer-Verlag.
Dhrymes, P. J. (1978b) Mathematics for Econometrics. New York: Springer-Verlag.
Domencich, T. and D. McFadden (1975) Urban Travel Demand: A Behavioral Analysis. Amsterdam:
North-Holland.
Efron, B. (1975) “The Efficiency of Logistic Regression Compared to Normal Discriminant Analysis”,
Journal of the American Statistical Association, 70, 892-898.
Fair, R. C. and D. M. Jaffee (1972) "Methods of Estimation for Markets in Disequilibrium",
Econometrica, 40, 497-514.
Hausman, J. A. and D. A. Wise (1980) “Stratification on Endogenous Variables and Estimation: The
Gary Experiment", in: C. Manski and D. McFadden, eds., Structural Analysis of Discrete Data.
Cambridge: MIT Press.
Heckman, J. (1974) “Shadow Prices, Market Wages, and Labor Supply”, Econometrica, 42, 679-694.
Heckman, J. (1976) “Simultaneous Equations Model with Continuous and Discrete Endogenous
Variables and Structural Shifts", in: S. M. Goldfeld and R. E. Quandt, eds., Studies in Non-Linear
Estimation. Cambridge: Ballinger.
Heckman, J. (1976) “The Common Structure of Statistical Models of Truncation, Sample Selection
and Limited Dependent Variables and a Simple Estimator for Such Models", Annals of Economic
and Social Measurement, 5, 475-492.
Heckman, J. (1978) “Dummy Exogenous Variables in a Simultaneous Equation System”, Econometrica,
46, 931-959.
Heckman, J. (1978) “Simple Statistical Models for Discrete Panel Data Developed and Applied to
Test the Hypothesis of True State Dependence Against the Hypothesis of Spurious State Depen-
dence", Annales de l'INSEE, 30-31, 227-270.
Heckman, J. (1979) “Sample Selection Bias as a Specification Error”, Econometrica, 47, 153-163.
Heckman, J. (1980) “Statistical Models for the Analysis of Discrete Panel Data”, in: C. Manski and
D. McFadden, eds., Structural Analysis of Discrete Data. Cambridge: MIT Press.
Heckman, J. (1980) “The Incidental Parameters Problem and the Problem of Initial Conditions in
Estimating a Discrete Stochastic Process and Some Monte Carlo Evidence on Their Practical
Importance”, in: C. Manski and D. McFadden, eds., Structural Analysis of Discrete Data. Cam-
bridge: MIT Press.
Heckman, J. and R. Willis (1975) “Estimation of a Stochastic Model of Reproduction: An Economet-
ric Approach", in: N. Terleckyj, ed., Household Production and Consumption. New York: National
Bureau of Economic Research.
Heckman, J. and R. Willis (1977) “A Beta Logistic Model for the Analysis of Sequential Labor Force
Participation of Married Women”, Journal of Political Economy, 85, 27-58.
Joreskog, K. and A. S. Goldberger (1975) “Estimation of a Model with Multiple Indicators and
Multiple Causes of a Single Latent Variable Model", Journal of the American Statistical Association,
70, 631-639.
Kiefer, N. (1978) “Discrete Parameter Variation: Efficient Estimation of a Switching Regression
Model”, Econometrica, 46, 427-434.
Kiefer, N. (1979) “On the Value of Sample Separation Information”, Econometrica, 47, 997-1003.
Kiefer, N. and G. Neumann (1979) “An Empirical Job Search Model with a Test of the Constant
Reservation Wage Hypothesis”, Journal of Political Economy, 87, 89-107.
Kohn, M., C. Manski and D. Mundel (1976), “An Empirical Investigation of Factors Influencing
College Going Behavior”, Annals of Economic and Social Measurement, 5, 391-419.
Ladd, G. (1966) “Linear Probability Functions and Discriminant Functions”, Econometrica, 34,
873-885.
Lee, L. F. (1978) “Unionism and Wage Rates: A Simultaneous Equation Model with Qualitative and
Limited Dependent Variables”, International Economic Review, 19, 415-433.
Lee, L. F. (1979) “Identification and Estimation in Binary Choice Models with Limited (Censored)
Dependent Variables", Econometrica, 47, 977-996.
Lee, L. F. (1980) "Simultaneous Equations Models with Discrete and Censored Variables", in: C.
Manski and D. McFadden, eds., Structural Analysis of Discrete Data. Cambridge: MIT Press.
Lee, L. F. and R. P. Trost (1978) “Estimation of Some Limited Dependent Variable Models with
Applications to Housing Demand”, Journal of Econometrics, 8, 357-382.
Lerman, S. and C. Manski (1980) “On the Use of Simulated Frequencies to Approximate Choice
Probabilities”, in: C. Manski and D. McFadden, eds., Structural Analysis of Discrete Data.
Cambridge: MIT Press.
Li, M. (1977) “A Logit Model of Home Ownership”, Econometrica, 45, 1081-1097.
Little, R. E. (1968) “A Note on Estimation for Quantal Response Data”, Biometrika, 55, 578-579.
Luce, R. D. (1959) Individual Choice Behavior: A Theoretical Analysis. New York: Wiley.
Luce, R. D. (1977) "The Choice Axiom After Twenty Years", Journal of Mathematical Psychology, 15,
215-233.
Luce, R. D. and P. Suppes (1965) "Preference, Utility, and Subjective Probability", in: R. Luce, R.
Bush and E. Galanter, eds., Handbook of Mathematical Psychology III. New York: Wiley.
Maddala, G. S. (1977) "Self-Selectivity Problem in Econometric Models", in: P. Krishnaiah, ed.,
Applications of Statistics. Amsterdam: North-Holland.
Maddala, G. S. (1977) “Identification and Estimation Problems in Limited Dependent Variable
Models”, in: A. S. Blinder and P. Friedman, eds., Natural Resources, Uncertainty and General
Equilibrium Systems: Essays in Memory of Rafael Lusky. New York: Academic Press.
Maddala, G. S. (1978) "Selectivity Problems in Longitudinal Data", Annales de l'INSEE, 30-31,
423-450.
Maddala, G. S. and L. F. Lee (1976) “Recursive Models with Qualitative Endogenous Variables”,
Annals of Economic and Social Measurement, 5.
Maddala, G. and F. Nelson (1974) “Maximum Likelihood Methods for Markets in Disequilibrium”,
Econometrica, 42, 1013-1030.
Maddala, G. S. and R. Trost (1978) “Estimation of Some Limited Dependent Variable Models with
Application to Housing Demand", Journal of Econometrics, 8, 357-382.
Maddala, G. S. and R. Trost (1980) "Asymptotic Covariance Matrices of Two-Stage Probit and
Two-Stage Tobit Methods for Simultaneous Equations Models with Selectivity", Econometrica, 48,
491-503.
Manski, C. (1975) “Maximum Score Estimation of the Stochastic Utility Model of Choice”, Journal of
Econometrics, 3, 205-228.
Manski, C. (1977) “The Structure of Random Utility Models”, Theory and Decision, 8, 229-254.
Manski, C. and S. Lerman (1977) “The Estimation of Choice Probabilities from Choice-Based
Samples", Econometrica, 45, 1977-1988.
Manski, C. and D. McFadden (1980) “Alternative Estimates and Sample Designs for Discrete Choice
Analysis”, in: C. Manski and D. McFadden, eds., Structural Analysis of Discrete Data. Cambridge:
MIT Press.
Marschak, J. "Binary-Choice Constraints and Random Utility Indicators", in: K. Arrow, S. Karlin and
P. Suppes, eds., Mathematical Methods in the Social Sciences. Stanford: Stanford University Press.
McFadden, D. “Conditional Logit Analysis of Qualitative Choice Behavior”, in: P. Zarembka, ed.,
Frontiers in Econometrics. New York: Academic Press.
McFadden, D. (1976) “A Comment on Discriminant Analysis ‘Versus’ Logit Analysis”, Annals of
Economics and Social Measurement, 5, 511-523.
McFadden, D. (1976) “Quantal Choice Analysis: A Survey”, Annals of Economic and Social
Measurement, 5, 363-390.
McFadden, D. (1976) “The Revealed Preferences of a Public Bureaucracy”, Bell Journal, 7, 55-72.
Miller, L. and R. Radner (1970) “Demand and Supply in U.S. Higher Education”, American
Economic Review, 60, 326-334.
Moore, D. H. (1973) “Evaluation of Five Discrimination Procedures for Binary Variables”, Journal of
American Statistical Association, 68, 399-404.
Nelson, F. (1977) “Censored Regression Models with Unobserved Stochastic Censoring Thresholds”,
Journal of Econometrics, 6, 309-327.
Nelson, F. S. and L. Olsen (1978) “Specification and Estimation of a Simultaneous Equation Model
with Limited Dependent Variables”, International Economic Review, 19, 695-710.
Nerlove, M. (1978) “Econometric Analysis of Longitudinal Data: Approaches, Problems and Pro-
spects”, Annales de I’lnsee, 30-31, 7-22.
Nerlove, M. and J. Press (1973) “Univariable and Multivariable Log-Linear and Logistic Models”,
RAND Report No. R-1306-EDA/NIH.
Oliveira, J. T. de (1958) "Extremal Distributions", Revista da Faculdade de Ciências, Lisboa, Série A,
7, 215-227.
Olsen, R. J. (1978) “Comment on ‘The Effect of Unions on Earnings and Earnings on Unions: A
Mixed Logit Approach'", International Economic Review, 259-261.
Plackett, R. L. (1974) The Analysis of Categorical Data. London: Charles Griffin.
Poirier, D. J. (1976) “The Determinants of Home Buying in the New Jersey Graduated Work
Incentive Experiment”, in: H. W. Watts and A. Rees, eds., Impact of Experimental Payments on
Expenditure, Health and Social Behavior, and Studies on the Quality of the Evidence. New York:
Academic Press.
Poirier, D. J. (1980) "A Switching Simultaneous Equation Model of Physician Behavior in Ontario",
in: D. McFadden and C. Manski, eds., Structural Analysis of Discrete Data: With Econometric
Applications. Cambridge: MIT Press.
Pollakowski, H. (1980) Residential Location and Urban Housing Markets. Lexington: Heath.
Quandt, R. (1956) “Probabilistic Theory of Consumer Behavior”, Quarterly Journal of Economics, 70,
501-536.
Quandt, R. (1970) The Demand for Travel. London: Heath.
Quandt, R. (1972) “A New Approach to Estimating Switching Regressions”, Journal of the American
Statistical Association, 67, 306-310.
Quandt, R. (1978) "Tests of the Equilibrium vs. Disequilibrium Hypothesis", International Economic
Review, 19, 435-452.
Quandt, R. and W. Baumol (1966) "The Demand for Abstract Travel Modes: Theory and Measure-
ment”, Journal of Regional Science, 6, 13-26.
Quandt, R. E. and J. B. Ramsey (1978) “Estimating Mixtures of Normal Distributions and Switching
Regressions”, Journal of the American Statistical Association, 71, 730-752.
Quigley, J. M. (1976) “Housing Demand in the Short-Run: An Analysis of Polytomous Choice”,
Explorations in Economic Research, 3, 76-102.
Radner, R. and L. Miller (1975) Demand and Supply in U.S. Higher Education. New York:
McGraw-Hill.
Sattath, S. and A. Tversky (1977) “Additive Similarity Trees”, Psychometrika, 42, 319-345.
Shakotko, R. A. and M. Grossman (1981) "Physical Disability and Post-Secondary Educational
Choices”, in: V. R. Fuchs, ed., Economic Aspects of Health. National Bureau of Economic Research,
Chicago: University of Chicago Press.
Sickles, R. C. and P. Schmidt (1978) “Simultaneous Equation Models with Truncated Dependent
Variables: A Simultaneous Tobit Model”, Journal of Economics and Business, 31, 11-21.
Theil, H. (1969) “A Multinomial Extension of the Linear Logit Model”, International Economic
Review, 10, 251-259.
Theil, H. (1970) "On the Estimation of Relationships Involving Qualitative Variables", American
Journal of Sociology, 76, 103-154.
Thurstone, L. (1927) “A Law of Comparative Judgement”, Psychological Review, 34, 273-286.
Tobin, J. (1958) “Estimation of Relationships for Limited Dependent Variables”, Econometrica, 26,
24-36.
Tversky, A. (1972) “Choice by Elimination”, Journal of Mathematical Psychology. 9, 341-367.
Tversky, A. (1972) “Elimination by Aspects: A Theory of Choice”, Psychological Review, 79,281-299.
Walker, S. and D. Duncan (1967) “Estimation of the Probability of an Event as a Function of Several
Independent Variables”, Biometrika, 54, 167-179.
Westin, R. (1974) “Predictions from Binary Choice Models”, Journal of Econometrics, 2, 1-16.
Westin, R. B. and D. W. Gillen (1978) “Parking Location and Transit Demand: A Case Study of
Endogenous Attributes in Disaggregate Mode Choice Functions”, Journal of Econometrics, 8,
75-101.
Willis, R. and S. Rosen (1979) “Education and Self-Selection”, Journal of Political Economy, 87,
507-536.
Yellott, J. (1977) "The Relationship Between Luce's Choice Axiom, Thurstone's Theory of Compara-
tive Judgment, and the Double Exponential Distribution", Journal of Mathematical Psychology, 15,
109-144.
Zellner, A. and T. Lee (1965) “Joint Estimation of Relationships Involving Discrete Random
Variables”, Econometrica, 33, 382-394.
Chapter 28

DISEQUILIBRIUM, SELF-SELECTION, AND SWITCHING MODELS

G. S. MADDALA

University of Florida

Contents
1. Introduction 1634
2. Estimation of the switching regression model:
Sample separation known 1637
3. Estimation of the switching regression model:
Sample separation unknown 1640
4. Estimation of the switching regression model with imperfect
sample separation information 1646
5. Switching simultaneous systems 1649
6. Disequilibrium models: Different formulations of price adjustment 1652
6.1. The meaning of the price adjustment equation 1653
6.2. Modifications in the specification of the demand and supply functions 1656
6.3. The validity of the "Min" condition 1660
7. Some other problems of specification in disequilibrium models 1662
7.1. Problems of serial correlation 1663
7.2. Tests for distributional assumptions 1664
7.3. Tests for disequilibrium 1664
7.4. Models with inventories 1667
8. Multimarket disequilibrium models 1668
9. Models with self-selection 1672
10. Multiple criteria for selectivity 1676
11. Concluding remarks 1680
References 1682
*This chapter was first prepared in 1979. Since then Quandt (1982) has presented a survey of
disequilibrium models and Maddala (1983a) has treated self-selection and disequilibrium models in
two chapters of that book. The present paper is an updated and condensed version of the 1979 paper. If
any papers are not cited, it is just through oversight rather than any judgment on their importance.
Financial support from the NSF is gratefully acknowledged.
1. Introduction
The title of this chapter stems from the fact that there is an underlying similarity
between econometric models involving disequilibrium and econometric models
involving self-selection, the similarity being that both of them can be considered
switching structural systems. We will first consider the switching regression model
and show how the simplest models involving disequilibrium and self-selection fit
in this framework. We will then discuss switching simultaneous equation models,
disequilibrium models and self-selection models.
A few words on the history of these models might be in order at the outset.
Disequilibrium models have a long history. In fact all the “partial adjustment”
models are disequilibrium models.’ However, the disequilibrium models consid-
ered here are different in the sense that they add the extra element of ‘quantity
rationing’. The differences will be made clear later (in Section 6). As for
self-selection models, one can quote an early study by Roy (1951) who considers
an example of two occupations, hunting and fishing, in which individuals self-select
based on their comparative advantage. This example and models of self-selection
are discussed later (in Section 9). Finally, as for switching models, almost all the
models with discrete parameter changes fall in this category and thus they have a
long history. The models considered here are of course different in the sense that
we consider also “endogenous” switching. We will first start with some examples
of switching regression models. Switching simultaneous equations models are
considered later (in Section 5).
Suppose the observations on a dependent variable Y can be classified into two
regimes and are generated by different probability laws in the two regimes. Define
$$y_1 = X\beta_1 + u_1, \tag{1.1}$$

$$y_2 = X\beta_2 + u_2, \tag{1.2}$$

and

$$y = y_1 \quad\text{iff}\quad Z\alpha \ge u, \tag{1.3}$$

$$y = y_2 \quad\text{iff}\quad Z\alpha < u, \tag{1.4}$$

where $(u_1, u_2, u)$ are jointly normal with mean zero and covariance matrix

$$\Sigma = \begin{pmatrix}\sigma_1^2 & \sigma_{12} & \sigma_{1u}\\ \sigma_{12} & \sigma_2^2 & \sigma_{2u}\\ \sigma_{1u} & \sigma_{2u} & 1\end{pmatrix}.$$

¹The disequilibrium model in continuous time analyzed by Bergstrom and Wymer (1976) is also a partial adjustment model except that it is formulated in continuous time.
We have set $\operatorname{var}(u) = 1$ because, by the nature of the conditions (1.3) and (1.4), $\alpha$ is estimable only up to a scale factor.

The model given by eqs. (1.1) to (1.4) is called a switching regression model. If $\sigma_{1u} = \sigma_{2u} = 0$ then we have a model with exogenous switching. If $\sigma_{1u}$ or $\sigma_{2u}$ is non-zero, we have a model with endogenous switching. This distinction between switching regression models with exogenous and endogenous switching has been discussed at length in Maddala and Nelson (1975).
We will also distinguish between two types of switching regression models.
Model A: Sample separation known.
Model B: Sample separation unknown.
In the former class we know whether each observed y is generated by (1.1) or
(1.2). In the latter class we do not have this information. Further, in the models
with known sample separation we can consider two categories of models:
Model A-l: y observed in both regimes.
Model A-2: y observed in only one of the two regimes.
We will discuss the estimation of these types of models in the next section. But first, we will give some examples for the three different types of models.
Example 1: Disequilibrium market model
Fair and Jaffee (1972) consider a model of the housing market. There is a demand
function and a supply function but demand is not always equal to supply. (As to
why this happens is an important question which we will discuss in a later
section.) The specification of the model is:
Demand function: $D = X\beta_1 + u_1$
Supply function: $S = X\beta_2 + u_2$

The quantity transacted, $Q$, is given by

$$Q = X\beta_1 + u_1 \;\text{ if } D \le S, \qquad Q = X\beta_2 + u_2 \;\text{ if } D > S.$$
Figure 1

where $\sigma^2 = \operatorname{Var}(u_1 - u_2) = \sigma_1^2 + \sigma_2^2 - 2\sigma_{12}$. Thus the model is the same as the switching regression model in eqs. (1.1) to (1.4) with $Z = X$, $\alpha = (\beta_2 - \beta_1)/\sigma$ and $u = (u_1 - u_2)/\sigma$. If sample separation is somehow known, i.e. we know which observations correspond to excess demand and which correspond to excess supply, then we have Model A-1. If sample separation is not known, we have Model B.
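A minimal simulation sketch of this transaction mechanism may help fix ideas; all parameter values below are illustrative assumptions.

```python
# Example 1's mechanism: only the minimum of demand and supply is transacted.
import numpy as np

rng = np.random.default_rng(4)
T = 1000
X = np.column_stack([np.ones(T), rng.normal(size=T)])
D = X @ np.array([2.0, -1.0]) + rng.normal(size=T)   # demand
S = X @ np.array([1.0, 1.5]) + rng.normal(size=T)    # supply
Q = np.minimum(D, S)                                 # quantity transacted
regime = D <= S            # True: observation lies on the demand function
print(regime.mean())       # fraction of excess-supply periods in the sample
```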
Example 2: Model with self-selection
Consider the labor supply model considered by Gronau (1974) and Lewis (1974).
The wages offered W, to an individual, and the reservation wages W, (the wages
at which the individual is willing to work) are given by the following equations:
$$W_o = X\beta_1 + u_1, \qquad W_r = X\beta_2 + u_2.$$

The individual works, and the observed wage $W = W_o$, if $W_o \ge W_r$. If $W_o < W_r$, the individual does not work and the observed wages are $W = 0$. This is an example of Model A-2. The dependent variable is observed in only one of the two regimes. The observed distribution of wages is a truncated distribution: it is the distribution of wage offers truncated by the "self-selection" of individuals, each individual choosing to be 'in the sample' of working individuals or not, by comparing his (or her) wage offer with his (or her) reservation wage.
$y_1$ = Desired borrowings,
$y_2$ = Threshold level below which banks will not use the discount window.

The structure of this model is somewhat different from that given in examples 2 and 3, because we observe $y_1$ all the time. We do not observe $y_2$ but we know for each observation whether $y_1 \le y_2$ (the bank borrows in the Federal funds market) or $y_1 > y_2$ (the bank borrows from the discount window).
Some other examples of the type of switching regression model considered here
are the unions and wages model by Lee (1978), the housing demand model by Lee
and Trost (1978), and the education and self-selection model of Willis and Rosen
(1979).
Returning to the model given by eqs. (1.1) to (1.4), we note that the likelihood
function is given by (dropping the t subscripts on U, X, Z, y and I)
(2.1)
where
and the bivariate normal density of $(u_1, u)$ has been factored into the marginal density $g_1(u_1)$ and the conditional density $f_1(u \mid u_1)$, with a similar factorization of the bivariate normal density of $(u_2, u)$. Note that $\sigma_{12}$ does not occur at all in the likelihood function and thus is not estimable in this model. Only $\sigma_{1u}$ and $\sigma_{2u}$ are estimable. In the special case $u = (u_1 - u_2)/\sigma$ where $\sigma^2 = \operatorname{Var}(u_1 - u_2)$, as in the examples in the previous section, it can be easily verified that from the consistent estimates of $\sigma_1^2$, $\sigma_2^2$, $\sigma_{1u}$ and $\sigma_{2u}$ we can get a consistent estimate of $\sigma_{12}$.
The maximum likelihood estimates can be obtained by an iterative solution of the likelihood equations using the Newton-Raphson method or the Berndt et al. (1974) method. The latter involves obtaining only the first derivatives of the likelihood function and has better convergence properties. In Lee and Trost (1978) it is shown that the log-likelihood function for this model is uniformly bounded from above. The maximum likelihood estimates of this model can be shown to be consistent and asymptotically efficient following the lines of proof that Amemiya (1973) gave for the Tobit model. To start the iterative solution of the likelihood equations, one should use preliminary consistent estimates of the parameters, which can be obtained by using a two-stage estimation method which is described in Lee and Trost (1978)² and will not be reproduced here.
There are some variations of this switching regression model that are of
considerable interest. The first is the case of the labor supply model where y is
observed in only one of the two regimes (Model A-2). The model is given by the
following relationships:
J’ = Yl if Y, 2 y2
= 0 otherwise.
Hence the likelihood function for this model can be written as:
where
²This procedure, first used by Heckman (1976) for the labor supply model, was extended to a wide class of models by Lee (1976).

$\Phi(\cdot)$ is the distribution function of the standard normal and $f$ is the joint density of $(u_{1t}, u_{2t})$. Since $y$ is observed only in one of the two regimes, we need to impose
some identifiability restrictions on the parameters of the model. These restrictions
are:
(a) There should be at least one explanatory variable in (1.1) not included in (1.2),

or

(b) $\operatorname{Cov}(u_1, u_2) = 0$.
These conditions were first derived in Nelson (1975) and since then have been
re-derived by others.
The second variation of the switching regression model that has found wide application is where the criterion function determining the switching also involves $y_1$ and $y_2$, i.e. eqs. (1.3) and (1.4) are replaced by

$$y = y_1 \quad\text{iff } I^* > 0, \qquad y = y_2 \quad\text{iff } I^* \le 0,$$

where

$$I^* = \gamma_1 y_1 + \gamma_2 y_2 + Z\alpha - u. \tag{2.3}$$
Examples of this model are the unions and wages model by Lee (1978) and the
education and self-selection model by Willis and Rosen (1979). In both cases, the
choice function (2.3) determining the switching involves the income differential $(y_1 - y_2)$. Thus $\gamma_2 = -\gamma_1$. Interest centers on the sign and significance of the coefficient of $(y_1 - y_2)$.
The estimation of this model proceeds as before. We first write the criterion
function in its reduced form and estimate the parameters by the probit method.
Note that, for normalization purposes, instead of imposing the condition $\operatorname{Var}(u) = 1$, it is more convenient to impose the condition that the variance of the residual $u^*$ in the reduced form for (2.3) is unity, i.e.

$$\operatorname{Var}(u^*) = \operatorname{Var}(\gamma_1 u_1 + \gamma_2 u_2 - u) = 1. \tag{2.4}$$

This means that $\operatorname{Var}(u) = \sigma_u^2$ is a parameter to be estimated. But, in the switching regression model, the parameters that are estimable are: $\beta_1$, $\beta_2$, $\sigma_1^2$, $\sigma_2^2$, $\sigma_{1u^*}$, and $\sigma_{2u^*}$, where $\sigma_{2u^*} = \operatorname{Cov}(u_2, u^*)$ and $\sigma_{1u^*} = \operatorname{Cov}(u_1, u^*)$. The estimates of $\sigma_{1u^*}$ and $\sigma_{2u^*}$ together with the normalization eq. (2.4) give us only 3 equations from which we still have to estimate four parameters $\sigma_{12}$, $\sigma_{1u}$, $\sigma_{2u}$, and $\sigma_u^2$. Thus, in this model we have to impose the condition that one of the covariances $\sigma_{12}$, $\sigma_{1u}$, $\sigma_{2u}$ is zero. The most natural assumption is $\sigma_{12} = 0$.
As for the estimation of the parameters in the choice function (2.3), again we have to impose some conditions on the explanatory variables in $y_1$ and $y_2$. After obtaining estimates of the parameters $\beta_1$ and $\beta_2$, we get the estimated values $\hat y_1$ and $\hat y_2$ of $y_1$ and $y_2$ respectively and estimate the parameters in (2.3) by the probit method using these estimated values of $y_1$ and $y_2$. The condition for the estimability of the parameters in (2.3) is clearly that there be no perfect multicollinearity between $\hat y_1$, $\hat y_2$ and $Z$.
This procedure, called the "two-stage probit method", gives consistent estimates of the parameters of the choice function. Note that since $(y_1 - \hat y_1)$ and $(y_2 - \hat y_2)$ are heteroscedastic, the residuals in this two-stage probit method are heteroscedastic. But this heteroscedasticity exists only in small samples and the residuals are homoscedastic asymptotically, thus preserving the consistency properties of the two-stage probit estimates. For a proof of this proposition and the derivation of the asymptotic covariance matrix of the two-stage probit estimates, see Lee (1979).
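A minimal sketch of the two-stage probit method follows. The names d (regime indicator), yhat1, yhat2, and Z are assumed to be available from the first stage; they are illustrative, not from the text.

```python
# Second stage of the "two-stage probit method": probit of the regime
# indicator on the first-stage fitted values and Z, as in eq. (2.3).
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def two_stage_probit(d, yhat1, yhat2, Z):
    R = np.column_stack([yhat1, yhat2, Z])   # regressors of the choice eq.
    nll = lambda g: -np.sum(d * norm.logcdf(R @ g) +
                            (1 - d) * norm.logcdf(-(R @ g)))
    return minimize(nll, np.zeros(R.shape[1]), method="BFGS").x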
3. Estimation of the switching regression model: Sample separation unknown

$$Q_t = \operatorname{Min}(D_t, S_t).$$
Let $f(u_1, u_2)$ be the joint density of $(u_1, u_2)$ and $g(D, S)$ the joint density of $D$ and $S$ derived from it. If observation $t$ is on the demand function, we know that $D_t < S_t$; the conditional densities of $Q_t$ in the two regimes are

$$h(Q_t \mid Q_t = D_t) = \int_{Q_t}^{\infty}g(Q_t, S_t)\,dS_t\big/\lambda_t, \qquad h(Q_t \mid Q_t = S_t) = \int_{Q_t}^{\infty}g(D_t, Q_t)\,dD_t\big/(1-\lambda_t). \tag{3.3}$$

Hence

$$h(Q_t) = \lambda_t\,h(Q_t \mid Q_t = D_t) + (1-\lambda_t)\,h(Q_t \mid Q_t = S_t) = \int_{Q_t}^{\infty}g(Q_t, S_t)\,dS_t + \int_{Q_t}^{\infty}g(D_t, Q_t)\,dD_t, \tag{3.4}$$

$$L = \prod_{t}h(Q_t). \tag{3.5}$$
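For intuition, the density (3.4) can be evaluated in closed form when $(D_t, S_t)$ are normal. The sketch below assumes, purely to keep the expression short, that $D_t$ and $S_t$ are independent; this independence is an assumption of the sketch, not of the text.

```python
# Density h(Q) of eq. (3.4) for independent normal D and S, and the
# log likelihood (3.5).
import numpy as np
from scipy.stats import norm

def h_Q(Q, mu_d, mu_s, sd_d, sd_s):
    # integral of g(Q, S) over S > Q  +  integral of g(D, Q) over D > Q
    term_demand = norm.pdf(Q, mu_d, sd_d) * norm.sf(Q, mu_s, sd_s)
    term_supply = norm.pdf(Q, mu_s, sd_s) * norm.sf(Q, mu_d, sd_d)
    return term_demand + term_supply

def log_lik(Q, mu_d, mu_s, sd_d, sd_s):
    return np.sum(np.log(h_Q(Q, mu_d, mu_s, sd_d, sd_s)))  # eq. (3.5)
```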
As will be shown later, the likelihood function for this model is unbounded for
certain parameter values.
Once the parameters in the model have been estimated, we can estimate the
probability that each observation is on the demand function or the supply
function. Maddala and Nelson (1974) suggest estimating the expressions $\lambda_t$ in
(3.1). These were the probabilities calculated in Sealy (1979) and Portes and
Winter (1980). Kiefer (1980a) and Gersovitz (1980) suggest calculating:
where $h(Q_t)$ is defined in (3.4). Lee (1983b) treats the classification of sample
observations to periods of excess demand or excess supply as a problem in
where $f_1$ and $f_2$ are the density functions of $u_1$ and $u_2$ respectively. Thus, the distribution of $y$ is a mixture of two normal distributions. Given $n$ observations $y_i$, we can write the likelihood function as:

$$L = \prod_{i=1}^{n}\bigl[\lambda f_1(y_i) + (1-\lambda)f_2(y_i)\bigr].$$

As noted earlier, the likelihood function for this model becomes unbounded for certain parameter values. However, Kiefer (1978) has shown that a root of the
likelihood equations is consistent.³ A method based on the moment generating function, due to Quandt and Ramsey (1978), starts from

$$E(e^{\theta y}) = \lambda\exp\left(x_i'\beta_1\theta + \frac{\theta^2\sigma_1^2}{2}\right) + (1-\lambda)\exp\left(x_i'\beta_2\theta + \frac{\theta^2\sigma_2^2}{2}\right). \tag{3.8}$$

One chooses a set of values $\theta_1, \ldots, \theta_J$, estimates $E(e^{\theta_j y})$ by

$$\frac{1}{n}\sum_{i=1}^{n}e^{\theta_j y_i},$$

and minimizes

$$\sum_{i=1}^{n}\sum_{j=1}^{J}\bigl[z_j(y_i) - G(\theta_j, x_i, \gamma)\bigr]^2, \tag{3.9}$$

where

$$z_j(y_i) = \exp(\theta_j y_i),$$

and $G(\theta_j, x_i, \gamma)$ is the value of the expression on the right hand side of (3.8) for $\theta = \theta_j$ and the $i$th observation. The normal equations obtained by minimizing (3.9) with respect to $\gamma$ are the same as those obtained by minimizing

(3.10)
³Hartley and Mallela (1977) prove the strong consistency of the maximum likelihood estimator, but on the assumption that $\sigma_1$ and $\sigma_2$ are bounded away from zero. Amemiya and Sen (1977) show that even if the likelihood function is unbounded, a consistent estimator of the true parameter value in this model corresponds to a local maximum of the likelihood function rather than a global maximum.
4. Estimation of the switching regression model with imperfect sample separation information

The discussion in the previous two sections is based on two polar cases: sample separation completely known or unknown. In actual practice there may be many cases where information about sample separation is imperfect rather than perfect or completely unavailable. Lee and Porter (1984) consider this case. They consider the model:
for $t = 1, 2, \ldots, T$. There is a dichotomous indicator $W_t$ for each $t$ which provides sample separation information. We define a latent dichotomous variable $I_t$, where $I_t = 1$ if the observation is generated by regime 1 and $I_t = 0$ otherwise, and

$$p_{11} = \operatorname{Prob}(W_t = 1 \mid I_t = 1), \qquad p_{01} = \operatorname{Prob}(W_t = 1 \mid I_t = 0),$$

where

$$p_{11} + p_{10} = 1 \quad\text{and}\quad p_{01} + p_{00} = 1.$$

Let

$$\operatorname{Prob}(W_t = 1) = p.$$

Then

$$p = \lambda p_{11} + (1-\lambda)p_{01},$$

where

$$\lambda = \operatorname{prob}(I_t = 1).$$
If we assume $\varepsilon_{1t}$ and $\varepsilon_{2t}$ to be normally distributed as $N(0, \sigma_1^2)$ and $N(0, \sigma_2^2)$ respectively, and define

$$f_i = \frac{1}{(2\pi)^{1/2}\sigma_i}\exp\left[-\frac{1}{2\sigma_i^2}(y_t - x_{it}\beta_i)^2\right] \quad\text{for } i = 1, 2,$$

the joint density of $(y_t, W_t)$ is

$$f(y_t, W_t) = \bigl[f_1\lambda p_{11} + f_2(1-\lambda)p_{01}\bigr]^{W_t}\cdot\bigl[f_1\lambda(1-p_{11}) + f_2(1-\lambda)(1-p_{01})\bigr]^{1-W_t}, \tag{4.3}$$

and the likelihood function is

$$L = \prod_{t=1}^{T}f(y_t, W_t). \tag{4.4}$$
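A minimal sketch of evaluating the density (4.3) for one observation follows; all argument names are illustrative assumptions.

```python
# Lee-Porter density (4.3) for one observation with a noisy regime
# indicator W_t; x1b1, x2b2 denote x_1t*beta_1 and x_2t*beta_2.
from scipy.stats import norm

def f_yW(y, x1b1, x2b2, s1, s2, lam, p11, p01, W):
    f1 = norm.pdf(y, x1b1, s1)
    f2 = norm.pdf(y, x2b2, s2)
    if W == 1:
        return f1 * lam * p11 + f2 * (1 - lam) * p01
    return f1 * lam * (1 - p11) + f2 * (1 - lam) * (1 - p01)
```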
If $p_{11} = p_{01}$, then the joint density $f(y_t, W_t)$ can be factored as:

$$f(y_t, W_t) = \bigl[\lambda f_1 + (1-\lambda)f_2\bigr]\cdot p^{W_t}(1-p)^{1-W_t},$$

and hence the indicators $W_t$ do not contain any information on the sample separation. One can test the hypothesis $p_{11} = p_{01}$ in any actual empirical case, as shown by Lee and Porter. Also, if $p_{11} = 1$ and $p_{01} = 0$, the indicator $W_t$ provides perfect sample separation.
Thus, both the cases considered earlier, sample separation known and sample separation unknown, are particular cases of the model considered here.

Lee and Porter also show that if $p_{11} \ne p_{01}$, then there is a gain in efficiency from using the indicator $W_t$. They also show that the problem of unbounded likelihood functions encountered in switching regression models with unknown sample separation exists in this case of imperfect sample separation as well. As for ML estimation, they suggest a suitable modification of the EM algorithm suggested by Hartley (1977, 1978) and Kiefer (1980b) for the switching regression model with unknown sample separation.
The paper by Lee and Porter is concerned with a switching regression model with exogenous switching, but it can be readily extended to a switching regression model with endogenous switching. For instance, in the simple disequilibrium market model

$$D_t = X_{1t}\beta_1 + \varepsilon_{1t}, \qquad S_t = X_{2t}\beta_2 + \varepsilon_{2t}, \qquad Q_t = \operatorname{Min}(D_t, S_t),$$

we have

$$f(Q_t, W_t) = \bigl[p_{11}G_{1t} + p_{01}G_{2t}\bigr]^{W_t}\bigl[(1-p_{11})G_{1t} + (1-p_{01})G_{2t}\bigr]^{1-W_t},$$

where

$$G_{1t} = \int_{Q_t}^{\infty}g(Q_t, S)\,dS, \qquad G_{2t} = \int_{Q_t}^{\infty}g(D, Q_t)\,dD,$$

and $g(D, S)$ is the joint density of $D$ and $S$. The marginal density $h(Q_t)$ of $Q_t$ is given by eq. (3.4). As before, if $p_{11} = p_{01} = p$ then the joint density $f(Q_t, W_t)$ can be written as

$$f(Q_t, W_t) = h(Q_t)\,p^{W_t}(1-p)^{1-W_t}.$$
One can use the sign of $\Delta P_t$ for $W_t$. The procedure would then be an extension of the 'directional method' of Fair and Jaffee (1972) in the sense that the sign of $\Delta P_t$ is taken to be a noisy indicator rather than a precise indicator as in Fair and Jaffee. Further discussion of the estimation of disequilibrium models with noisy indicators can be found in Maddala (1984).
5. Switching simultaneous systems

Unless $(1-\gamma_1\gamma_2)$ and $(1-\gamma_1'\gamma_2')$ are of the same sign, there will be an inconsistency in the conditions $Y_1 \le c$ and $Y_1 > c$ from the two reduced forms. Such conditions for logical consistency have been pointed out by Amemiya (1974), Maddala and Lee (1976) and Heckman (1978). They need to be imposed in switching simultaneous systems where the switch depends on some of the endogenous variables.

Gourieroux et al. (1980b) have derived some general conditions which they call "coherency conditions" and illustrate them with a number of examples. These conditions are derived from a theorem by Samelson et al. (1958) which gives a necessary and sufficient condition for a linear space to be partitioned in cones. We will not go into these conditions in detail here. In the case of the switching simultaneous system considered here, the condition they derive is that the determinants of the matrices giving the mapping from the endogenous variables $(Y_1, Y_2, \ldots, Y_k)$ to the residuals $(u_1, u_2, \ldots, u_k)$ are of the same sign in the different regimes. The two determinants under consideration are $(1-\gamma_1\gamma_2)$ and $(1-\gamma_1'\gamma_2')$. The condition for logical consistency of the model is that they are of the same sign, or $(1-\gamma_1\gamma_2)(1-\gamma_1'\gamma_2') > 0$. A question arises about what to do with these conditions. One can impose them and then estimate the model. Alterna-
these conditions. One can impose them and then estimate the model. Alterna-
tively, since the condition is algebraic, if it cannot be given an economic
interpretation, it is important to check the basic structure of the model. An
illustration of this is the dummy endogenous variable model in Heckman (1976a).
The model discusses the problem of estimation of the effect of fair employment
laws on the wages of blacks relative to whites, when the passage of the law is
endogenous. The model as formulated by Heckman is a switching simultaneous
equations model for XC& we have to impose a condition for “logical con-
sistency”. However, the condition does not have any meaningful economic
interpretation and as pointed out in Maddala and Trost (1981) a careful
examination of the arguments reveals that there are two sentiments, not one as
assumed by Heckman, that lead to the passa,ge of the law, and when the model is
reformulated, there is no condition for logical consistency that needs to be
imposed.
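The coherency condition itself is a one-line check; the following trivial sketch (with illustrative coefficient values) makes the same-sign requirement concrete.

```python
# Coherency check for a two-regime switching simultaneous system:
# the two regime determinants must share a sign.
def coherent(g1, g2, g1p, g2p):
    return (1 - g1 * g2) * (1 - g1p * g2p) > 0

print(coherent(0.3, 0.5, -0.2, 0.4))   # True: same sign, model is coherent
```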
The simultaneous equations models with truncated dependent variables consid-
ered by Amemiya (1974) are also switching simultaneous equations models which
require conditions for logical consistency. Again, one needs to examine whether
these conditions need to be imposed exogenously or whether a more logical
formulation of the problem leads to a model where these conditions are automati-
cally satisfied. For instance, Waldman (1981) gives an example of time allocation
of young men to school and work where the model is formulated in terms of
underlying behavioural relations and the conditions derived by Amemiya follow
naturally from economic theory. On the other hand, these conditions have to be
imposed exogenously (and are difficult to give an economic interpretation) if the
model is formulated in a mechanical fashion where time allocated to work was
modelled as a linear function of school time and exogenous variables and time
allocated to school was modelled as a linear function of work time and exogenous
variables.
$D$, $Y_1$, $Y_2$ are the endogenous variables and $X_2$ and $X_3$ are sets of exogenous variables. Note that the exogenous variables in the demand for durables equation and the demand for debt equation are the same.
The model is a switching simultaneous equations model with endogenous
switching. We can write the model as follows:
If we get the reduced forms for $Y_1$ and $Y_2$ in the two regimes and simplify the expression $Y_1 - Y_2$, we find that:

$$(Y_1 - Y_2)\text{ in Regime 2} = s\,\{(Y_1 - Y_2)\text{ in Regime 1}\}.$$

Thus, the condition for the logical consistency of this model is that $(1-\alpha_1\alpha_2)$ and $(1-\alpha_1'\alpha_2')$ are of the same sign, a condition that can also be derived by using the theorems in Gourieroux et al. (1980b).
The interesting thing to note is that the simultaneous equation system in
Regime 1 is under-identified. However, if the system of equations in Regime 2 is
identified, the fact that we can get consistent estimates of the parameters in the
demand equation for durables from Regime 2, enables us to get consistent
estimates of the parameters in the $Y_1$ equation. Thus the parameters in the
simultaneous equations system in Regime 1 are identified. One can construct a
formal and rigorous proof but this will not be attempted here. Avery (1982) found
that he could not estimate the parameters of the structural equation for $Y_1$, but
this is possibly due to the estimation methods used.
In summary, switching simultaneous equations models often involve the im-
position of constraints on parameters so as to avoid some internal inconsistencies
in the model. But it is also very often the case that such logical inconsistencies
arise when the formulation of the model is mechanical. In many cases, it has been
found that a re-examination and a more careful formulation leads to an alterna-
tive model where such constraints need not be imposed.
There are also some switching simultaneous equations models where a variable
is endogenous in one regime and exogenous in another and, unlike the cases
considered by Richard (1980) and Davidson (1978), the switching is endogenous.
An example is the disequilibrium model in Maddala (1983b).
and the eqs. (6.1) and (6.2). To interpret the "price adjustment" eq. (6.2) we have to ask the basic question of why disequilibrium exists. One interpretation is that prices are fixed by someone. The model is thus a fix-price model. The disequilibrium exists because price is fixed at a level different from the market equilibrating level (as is often the case in centrally planned economies). In this case the
⁴The directional method makes sense only for the estimation of the reduced form equations for $D_t$ and $S_t$ in a model with a price adjustment equation. There are cases where this is needed. The likelihood function for the estimation of the parameters in this model is derived in Maddala and Nelson (1974). When $\Delta P < 0$ we have $D = Q$ and $S > Q$, and when $\Delta P > 0$ we have $S = Q$ and $D > Q$. Note that the expression given in Fair and Kelejian (1974) as the likelihood function for this model is not correct, though it gives consistent estimates of the parameters.
price adjustment eq. (6.2) can be interpreted as the rule by which the price-fixing authority is changing the price. However, there is the problem that the price-fixing authority does not know $D_t$ and $S_t$, since they are determined only after $P_t$ is fixed. Thus, eq. (6.2) cannot make any sense in the fix-price model. Laffont and Garcia (1977) suggested a modification of the price adjustment equation, which is:

$$\Delta P_{t+1} = \gamma(D_t - S_t). \tag{6.2'}$$

In this case the price-fixing authority uses information on the past period's demand and supply to adjust prices upwards or downwards. In this case the price-fixing rule is an operational one, but one is still left wondering why the price-fixing authority follows such a dumb rule as (6.2'). A more reasonable thing to do is to fix the price at a level that equates expected demand and supply. One such rule is to determine price by equating the components of (6.3) and (6.4) after ignoring the stochastic disturbance terms. This gives

$$P_t = \frac{X_{1t}\beta_1 - X_{2t}\beta_2}{\alpha_2 - \alpha_1}. \tag{6.5}$$

This is the procedure suggested by Green and Laffont (1981) under the name of "anticipatory pricing".
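A trivial sketch of the anticipatory-pricing rule (6.5) follows. It assumes demand and supply of the form $D = X_1\beta_1 + \alpha_1 P + u_1$ and $S = X_2\beta_2 + \alpha_2 P + u_2$; those functional forms, like the argument names, are illustrative assumptions consistent with (6.5).

```python
# Anticipatory pricing: set P so that expected demand equals expected supply.
def anticipatory_price(x1b1, x2b2, a1, a2):
    # equate X1.b1 + a1*P = X2.b2 + a2*P and solve for P, as in (6.5)
    return (x1b1 - x2b2) / (a2 - a1)

print(anticipatory_price(2.0, 1.0, -0.5, 0.8))   # illustrative values
```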
As mentioned earlier, the meaning of the price adjustment equation depends on
the source of disequilibrium. An alternative to the fix-price model as an explana-
tion of disequilibrium is the pa&l adjustment model (see Bowden, 1978 a, b). The
source of disequilibrium in this formulation is stickiness of prices (due to some
institutional constraints or other factors). Let P,* be the market equilibrating
price. However, prices do not adjust fully to the market equilibrating level and we
specify the “partial adjustment” model:
p* - p*-1z&P,*-P,). (6.7)
If P_t < P_t^* there will be excess demand and if P_t > P_t^* there will be excess supply.
Hence, if ΔP_t < 0 we have a situation of excess supply.
Note that in this case it is ΔP_t (not ΔP_{t+1} as in the Laffont-Garcia case) that
gives the sample separation. But the interpretation is not that prices rise in response
to excess demand (as implicitly argued by Fair and Jaffee) but that there is excess
demand (or excess supply) because prices do not fully adjust to the equilibrating
values.5
Equation (6.7) can also be written as

ΔP_t = [λ_1/(1 − λ_1)](P_t^* − P_t)   if P_t^* > P_t,
ΔP_t = [λ_2/(1 − λ_2)](P_t^* − P_t)   if P_t^* < P_t,    (6.10)

allowing the speed of adjustment to differ between the two regimes.
Note first that the conditions P_t^* > P_{t−1}, P_t > P_{t−1}, P_t^* > P_t and D_t > S_t are all
equivalent. Also, assuming that excess demand is proportional to P_t^* − P_t, we can
write eqs. (6.10) as

ΔP_t = γ_1(D_t − S_t)   if D_t > S_t,
ΔP_t = γ_2(D_t − S_t)   if D_t < S_t.
5 The formulation in terms of partial adjustment towards P_t^* was suggested by Bowden (1978a)
though he does not use the interpretation of the Fair-Jaffee equation given here. Bowden (1978b)
discusses this approach in greater detail under the title: "The PAMEQ Specification".
There is still one disturbing feature about the partial adjustment eq. (6.6) that
Bowden adopts and under which we have given a justification for the Fair and
Jaffee directional and quantitative methods. This is that ΔP_t unambiguously gives
us an idea about whether there is excess demand or excess supply. As mentioned
earlier this does not make intuitive sense. On closer examination one sees that the
problem is with eq. (6.6), in particular the assumption that λ lies between 0 and
1. This is indeed a very strong assumption and implies that prices are sluggish but
never overshoot P_t^*, the equilibrium price. There is, however, no
a priori reason why this should happen.6 Once we drop the assumption that λ
should lie between 0 and 1, it is no longer true that we can use ΔP_t to classify
observations as belonging to excess demand or excess supply. As noted earlier the
assumption 0 < λ < 1 implies that the conditions P_t^* > P_{t−1}, P_t > P_{t−1}, P_t^* > P_t
and D_t > S_t are all equivalent. With λ > 1, this no longer holds good.
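
A small simulation makes the point concrete (illustrative code, not from the chapter; the price paths and parameter values are invented):

```python
# Illustrative simulation (not from the chapter): when the adjustment
# coefficient lam in eq. (6.6) exceeds 1, prices overshoot P*_t and the sign
# of dP_t no longer identifies the excess-demand regime.
import numpy as np

rng = np.random.default_rng(1)

def classification_rate(lam, T=10_000):
    Pstar = np.cumsum(rng.normal(size=T))      # market-equilibrating price path
    P = np.empty(T)
    P[0] = Pstar[0]
    for t in range(1, T):
        P[t] = P[t-1] + lam * (Pstar[t] - P[t-1])   # eq. (6.6)
    dP = np.diff(P)
    excess_demand = Pstar[1:] > P[1:]
    # fraction of periods where "dP > 0" agrees with "excess demand"
    return np.mean((dP > 0) == excess_demand)

print(classification_rate(0.5))   # close to 1: dP_t > 0 <=> excess demand
print(classification_rate(1.5))   # close to 0: the classification is reversed
```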
In summary, we considered two models of disequilibrium: the fix-price model
and the partial adjustment model. In the fix-price model, the price adjustment eq.
(6.2) is non-operational. The modification (6.2′) suggested by Laffont and Garcia
is an operational rule but really does not make much sense. A more reasonable
formula for a price-setting rule is the anticipatory pricing rule (6.5). But this
implies that a price-adjustment equation like (6.2) or (6.2′) is not valid.
In the case of the partial adjustment model one can derive an equation of the
form (6.2) though its meaning is different from the one given by Fair and Jaffee
and many others using this price adjustment equation. The meaning is not that
prices adjust in response to excess demand or supply but that excess demand and
supply exist because prices do not adjust to the market equilibrating level.
However, as discussed earlier, eq. (6.2) can be derived from the partial adjustment
model (6.6) only under a restrictive set of assumptions.
The preceding arguments hold good when eq. (6.2) is made stochastic with the
addition of a disturbance term. In this case there is not much use for the
price-adjustment equation. The main use of eq. (6.2) is that it gives a sample
separation, and estimation with sample separation known is much simpler than
estimation with sample separation unknown. If one is anyhow going to estimate a
model with sample separation unknown, then one can as well eliminate eq. (6.2).
For fix-price models, one substitutes the anticipatory price eq. (6.5) and for
partial adjustment models one uses eq. (6.6) directly.
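
To see what estimation with sample separation unknown involves, here is a minimal numerical sketch of the likelihood for Q_t = Min(D_t, S_t) under assumed independent normal disturbances (all names and functional forms are illustrative, not the chapter's):

```python
# A minimal numerical sketch (assumed functional forms, not the chapter's code)
# of the likelihood for Q_t = Min(D_t, S_t) with *unknown* sample separation:
# D_t = X1_t b1 + u1_t, S_t = X2_t b2 + u2_t, independent normal disturbances.
import numpy as np
from scipy.stats import norm

def neg_loglik(params, Q, X1, X2):
    k1, k2 = X1.shape[1], X2.shape[1]
    b1, b2 = params[:k1], params[k1:k1 + k2]
    s1, s2 = np.exp(params[-2]), np.exp(params[-1])   # positive std. deviations
    mD, mS = X1 @ b1, X2 @ b2
    # density of the min: f(Q) = f_D(Q) P(S > Q) + f_S(Q) P(D > Q)
    f = (norm.pdf(Q, mD, s1) * norm.sf(Q, mS, s2)
         + norm.pdf(Q, mS, s2) * norm.sf(Q, mD, s1))
    return -np.log(np.clip(f, 1e-300, None)).sum()
```

The function can be handed to a numerical optimizer (e.g. scipy.optimize.minimize); with sample separation known, each observation would contribute only its regime-specific term, which is what makes that case so much simpler.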
6 Since no economic model has been specified, there is no reason to make any alternative assumption
either.
We will now discuss alternative specifications of the demand and supply
functions.
The probability that there would be rationing should affect the demand and
supply functions. There are two ways of taking account of this. One procedure
suggested by Eaton and Quandt (1983) is to introduce the probability of rationing
as an explanatory variable in the demand and/or supply functions (6.3) and (6.4).
A re-specification of eq. (6.3) that they consider is

D_t = β_1'X_{1t} + α_1P_t + α_3π_t + u_{1t},    (6.3′)

where π_t is the probability of rationing. A second re-specification is to define the
supply function (6.4) in terms of the expected price:

S_t = β_2'X_{2t} + α_2P_t^e + u_{2t},    (6.4′)

where P_t^e is the expected price, i.e. the price the suppliers expect to prevail in
period t, the expectation being formed at time t − 1 (we will assume a one period
lag between production decisions and supply). Regarding the expected price P_t^e, if
we use some naive extrapolative or the adaptive expectations formulae, then the
estimation proceeds as in earlier models with no price expectations, with minor
modifications. For instance, with the adaptive expectations formula, one would
express P_t^e as a geometrically declining weighted average of past prices.
7 Though the analysis is similar, the computations are more complex because of the presence of π_t in
the demand function.
If, on the other hand, expectations are assumed to be rational, we write

P_t^e = E(P_t | I_{t−1}),    (6.12)

where P_t^e is the expected price and I_{t−1} represents the information set the
economic agents are assumed to have.
Equation (6.12) implies that we can write

P_t = P_t^e + u_t,

where u_t is uncorrelated with all the variables in the information set I_{t−1}. If the
information set I_{t−1} includes the exogenous variables X_{1t} and X_{2t}, i.e. if these
exogenous variables are known at time t − 1, then we can use eq. (6.12) to
substitute P_t^e = P_t − u_t in eq. (6.4′). We can re-define a residual u′_{2t} = u_{2t} − α_2u_t,
and u′_{2t} has the same properties as u_{2t}. Thus the estimation of the model simplifies
to the case considered by Fair and Jaffee.
If, on the other hand, X_{1t} and X_{2t} are not known at time (t − 1) we cannot
treat u_t the same way as we treat u_{2t} since u_t can be correlated with X_{1t} and X_{2t}.
In this case we proceed as follows.
From eqs. (6.2), (6.3), and (6.4′) we have

P_t − P_{t−1} = γ(β_1'X_{1t} + α_1P_t + u_{1t} − β_2'X_{2t} − α_2P_t^e − u_{2t}),

or

(1 − γα_1)P_t + γα_2P_t^e = P_{t−1} + γ(β_1'X_{1t} − β_2'X_{2t}) + γ(u_{1t} − u_{2t}),

or, taking expectations conditional on I_{t−1} and solving for P_t^e,

P_t^e = h_1P_{t−1} + h_2(β_1'X_{1t}^e − β_2'X_{2t}^e),    (6.13)

where

h_1 = [1 + γ(α_2 − α_1)]^{−1},

h_2 = γ[1 + γ(α_2 − α_1)]^{−1},
and X_{1t}^e and X_{2t}^e are the expected values of X_{1t} and X_{2t}. (Note that this equation
is valid even if the price adjustment eq. (6.2) is made stochastic.)
To obtain X_{1t}^e and X_{2t}^e we have to make some assumptions about how these
exogenous variables are generated. A common assumption is that they follow
vector autoregressive processes. Let us for the sake of simplicity of notation
assume a first order autoregressive process.
Then X_{1t}^e and X_{2t}^e can be computed from the first order autoregressions. The
demand and supply functions can also be extended to include spill-over terms
from the previous period:

D_t = β_1'X_{1t} + α_1P_t + δ_1(D_{t−1} − Q_{t−1}) + u_{1t},

and

S_t = β_2'X_{2t} + α_2P_t + δ_2(S_{t−1} − Q_{t−1}) + u_{2t},

with δ_1 > 0, δ_2 > 0, and δ_1δ_2 < 1. [See Orsi (1982) for this last condition.]
At time (t − 1), Q_{t−1} is equal to D_{t−1} or S_{t−1}. Thus, one of these is not
observed. However, if the price adjustment eq. (6.2) is not stochastic, one has a
four-way regime classification depending on excess demand or excess supply at
time periods (t − 1) and t. Thus, the method of estimation suggested by Amemiya
(1974a) for the Fair and Jaffee model can be extended to this case. Such extension
is done in Laffont and Monfort (1979). Orsi (1982) applied this model to the
Italian labor market but the estimates of the spill-over coefficients were not
significantly different from zero. This method is further extended by Chanda
(1984) to the case where the supply function depends on expected prices and
expectations are formed rationally.
As mentioned in the introduction, the main element that distinguishes the recent
econometric literature on disequilibrium models from the earlier literature is the
'Min' condition (6.1). This condition has been criticized on the grounds that:
(a) Though it can be justified at the micro-level, it cannot be valid at the
aggregate level where it has been very often used.
(b) It introduces unnecessary computational problems which can be avoided by
replacing it with
Q = Min[E(D), E(S)] + ε.    (6.1′)
(c) In some disequilibrium models, the appropriate condition for the transacted
quantity is

Q_t = 0   if D_t ≠ S_t.    (6.1″)
The term ‘trading model’ arose by analogy with commodity trading where trading
stops when prices hit a floor or a ceiling (where there is excess demand or excess
supply respectively). However, in commodity trading, a sequence of trades takes
place and all we have at the end of the day is the total volume of trading and the
opening, high, low and closing prices.8 Thus, commodity trading models do not
necessarily fall under the category of ‘trading’ models defined here. On the other
hand models that involve ‘rationing’ at the aggregate level might fall into the class
of ‘trading’ models defined here at the micro-level. Consider, for instance, the
loan demand problem with interest rate ceilings. At the aggregate level there
would be an excess demand at the ceiling rate and there would be rationing. The
question is how rationing is carried out. One can argue that for each individual
there is a demand schedule giving the loan amounts L the individual would want
to borrow at different rates of interest R. Similarly, the bank would also have a
supply schedule giving the amounts L it would be willing to lend at different rates
of interest R. If the rate of interest at which these two schedules intersect is R_i^* ≤ R̄,
the ceiling rate, then a transaction takes place. Otherwise no transaction takes
place. This assumption is perhaps more appropriate for mortgage loans than for
consumer loans. In this situation Q is not Min(D, S). In fact Q = 0 if D ≠ S. The
model would be formulated as:

Loan Demand:  L_i = α_1R_i + β_1'X_{1i} + u_{1i},
Loan Supply:  L_i = α_2R_i + β_2'X_{2i} + u_{2i},   if R_i^* ≤ R̄,

L_i = 0 otherwise.

R_i^* is the rate of interest that equilibrates demand and supply. If the assumption
is that the individual borrows what is offered at the ceiling rate R̄, an assumption
more appropriate for consumer loans, we have

L_i = α_2R̄ + β_2'X_{2i} + u_{2i}   if R_i^* > R̄.
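
A toy simulation of this loan model may help fix ideas (parameter values and names are invented for illustration):

```python
# Toy simulation of the loan model with a rate ceiling Rbar (invented
# parameter values; the "mortgage" variant where no transaction occurs
# unless the schedules intersect below the ceiling).
import numpy as np

rng = np.random.default_rng(2)
n = 8
a1, a2 = -2.0, 1.5            # demand slope (down) and supply slope (up) in R
b1 = rng.normal(5, 1, n)      # demand shifters: beta1'X_1i + u_1i
b2 = rng.normal(1, 1, n)      # supply shifters: beta2'X_2i + u_2i
Rbar = 1.2                    # ceiling rate

Rstar = (b1 - b2) / (a2 - a1)                        # rate equating the schedules
L = np.where(Rstar <= Rbar, a1 * Rstar + b1, 0.0)    # L_i = 0 otherwise
for r, amt in zip(Rstar, L):
    print(f"R* = {r:5.2f}  ->  L = {amt:5.2f}")
```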
8 Actually, in studies on commodity trading, the total number of contracts is treated as Q_t, and the
closing price for the day as P_t. The closing price is perhaps closer to an equilibrium price than the
opening, low and high prices. But it cannot be treated as an equilibrium price. There is the question of
what we mean by 'equilibrium' price in a situation where a number of trades take place in a day. One
can interpret it as the price that would have prevailed if there were a Walrasian auctioneer and a
single trade took place for a day. If this is the case, then the closing price would be an equilibrium
price only if a day is a long enough period for prices at the different trades to converge to some
equilibrium. These problems need further work. See Monroe (1981).
The important situations where this sort of disequilibrium model arises are those
where there are exogenous controls on the movement of prices. There are essentially
three major sources of disequilibrium that one can distinguish.
(1) Fixed prices
(2) Imperfect adjustment of prices
(3) Controlled prices
We have till now discussed the case of fixed prices and imperfect adjustment to
the market equilibrating price. The case of controlled prices is different from the
case of fixed prices. The disequilibrium model considered earlier in example 1,
Section 1 is one with fixed prices. With fixed prices, the market is almost always
in disequilibrium. With controlled prices, the market is sometimes in equilibrium
and sometimes in disequilibrium.9
Estimation of disequilibrium models with controlled prices is discussed in
Maddala (1983a, pp. 327-34 and 1983b) and details need not be presented here.
Gourieroux and Monfort (1980) consider endogenously controlled prices and
Quandt (1984) discusses switching between equilibrium and disequilibrium.
In summary, not all situations of disequilibrium involve the ‘Min’ condition
(6.1). In those formulations where there is some form of rationing, the alternative
condition (6.1′), that has been suggested on grounds of computational simplicity,
is not a desirable one to use and is difficult to justify conceptually.
What particular form the ‘Min’ condition takes depends on how the rationing
is carried out and whether we are analyzing micro or macro data. The discussion
of the loan problem earlier shows how the estimation method used depends on the way
customers are rationed. This analysis applies at the micro level. For analysis with
macro data Goldfeld and Quandt (1983) discuss alternative decision criteria by
which the Federal Home Loan Bank Board (FHLBB) rations its advances to
savings and loan institutions. The paper based on earlier work by Goldfeld, Jaffee
and Quandt (1980) discusses how different targets and loss functions lead to
different forms of the ‘Min’ condition and thus call for different estimation
methods. This approach of deriving the appropriate rationing condition from
explicit loss functions is the appropriate thing to do, rather than writing down the
demand and supply functions (6.3) and (6.4), and saying that since there is
disequilibrium (for some unknown and unspecified reasons) we use the ‘Min’
condition (6.1).
9 MacKinnon (1978) discusses this problem but the likelihood functions he presents are incorrect.
The correct analysis of this model is presented in Maddala (1983b).
y_t = X_tβ + d_t − u_{1t},
u_{1t} = ρ_1u_{1,t−1} + e_{1t}.
There have been many tests suggested for the “disequilibrium hypothesis”, i.e. to
test whether the data have been generated by an equilibrium model or a
disequilibrium model. Quandt (1978) discusses several tests and says that there
does not exist a uniformly best procedure for testing the hypothesis that a market
is in equilibrium against the alternative that it is not.
A good starting point for "all" tests for disequilibrium is to ask the basic
question of what the disequilibrium is due to. In the case of the partial adjustment
model given by eq. (6.7) the disequilibrium is clearly due to imperfect adjustment
of prices. In this case the proper test for the equilibrium vs. disequilibrium
hypothesis is to test whether λ = 1. See Ito and Ueda (1981). This leads to a test
that 1/γ = 0 in the Fair and Jaffee quantitative model, since γ is proportional to
1/(1 − λ). This is the procedure Fair and Jaffee suggest. However, if the meaning of
the price adjustment equation is that prices adjust in response to either excess
demand or excess supply, then as argued in Section 6, the price adjustment
equation should have ΔP_{t+1}, not ΔP_t, and also it is not clear how one can test for
the equilibrium hypothesis in this case. The intuitive reason is that now the price
adjustment equation does not give any information about the source of the
disequilibrium.
Quandt (1978) argues that there are two classes of disequilibrium models, which
are:
(a) Models where it is known for which observations D_t < S_t and for which
D_t > S_t, i.e. the sample separation is known, and
(b) models where the sample separation is not known.
He argues that if the price adjustment equation is made stochastic by adding a
disturbance term u_{3t}, i.e.

ΔP_t = γ(D_t − S_t) + u_{3t},    (7.3)

and

σ_3² = Var(u_{3t}) ≠ 0,

then, as γ → ∞, we get the likelihood function for the equilibrium model (Q_t = D_t = S_t) and
thus the hypothesis is "nested"; but that if σ_3² = 0, the likelihood function for the
disequilibrium model does not tend to the likelihood function for the equilibrium
model even if γ → ∞ and thus the hypothesis is not nested. The latter conclusion,
however, is counter-intuitive, and if we consider the correct likelihood function for
this model derived in Amemiya (1974a) and take the limits as γ → ∞, we get
the likelihood function for the equilibrium model.
Hwang (1980) considers the reduced forms derived
from the equilibrium and disequilibrium model. The difference between the two
models is that π_1, π_2, π_3 are stable over time in the equilibrium model and varying
over time in the disequilibrium model. Hwang, therefore, proposes to use stability
tests available in the literature for testing the hypothesis of equilibrium. In the
case of the equilibrium model P_t is endogenous. Eq. (7.5) is derived from the
conditional distribution of Q, given P, and hence can be estimated by ordinary
least squares. The only problem with the test suggested by Hwang is that
parameter instability can arise from a large number of sources and if the null
hypothesis is rejected, we do not know what alternative model to consider.
In summary, it is always desirable to base a test for disequilibrium on a
discussion of the source for disequilibrium.
The issues of how to formulate the desired inventory holding and how to
formulate inventory behaviour in the presence of disequilibrium are problems
that need further study.
The analysis in the preceding sections on single market disequilibrium models has
been extended to multimarket disequilibrium models by Gourieroux et al. (1980)
and Ito (1980). Quandt (1978) first considered a two-market disequilibrium model
of the following form (the exogenous variables are omitted):

D_{1t} = α_1Q_{2t} + u_{1t},
S_{1t} = β_1Q_{2t} + u_{2t},
D_{2t} = α_2Q_{1t} + u_{3t},
S_{2t} = β_2Q_{1t} + u_{4t},    (8.1)

Q_{1t} = Min(D_{1t}, S_{1t}),
Q_{2t} = Min(D_{2t}, S_{2t}).    (8.2)
Quandt did not consider the logical consistency of the model. This is considered
in Amemiya (1977) and Gourieroux et al. (1980a).
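
The logical consistency question is whether, for given disturbances, the system (8.1)-(8.2) determines the quantities uniquely. A small illustrative solver (assumed parameter values; not from the chapter) iterates the 'Min' conditions to a fixed point:

```python
# Illustrative check (assumed parameter values, not from the chapter): for
# given disturbances, iterate the 'Min' conditions (8.2) of Quandt's model
# (8.1) to a fixed point and report the resulting regime.
import numpy as np

def solve(a1, b1, a2, b2, u, tol=1e-12, max_iter=1000):
    u1, u2, u3, u4 = u
    Q1 = Q2 = 0.0
    for _ in range(max_iter):
        D1, S1 = a1 * Q2 + u1, b1 * Q2 + u2
        D2, S2 = a2 * Q1 + u3, b2 * Q1 + u4
        Q1n, Q2n = min(D1, S1), min(D2, S2)
        if abs(Q1n - Q1) + abs(Q2n - Q2) < tol:
            break
        Q1, Q2 = Q1n, Q2n
    return (Q1, Q2), (D1 >= S1, D2 >= S2)   # quantities and regime indicators

print(solve(0.3, 0.2, 0.1, 0.4, (1.0, 0.8, 0.5, 0.9)))
```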
Consider the regimes:

R_1: D_1 ≥ S_1, D_2 ≥ S_2,
R_2: D_1 > S_1, D_2 < S_2,
R_3: D_1 < S_1, D_2 < S_2,

and define the matrices A_1, A_2, A_3 for regimes R_1, R_2, R_3 respectively that give
the mapping from (D_1, S_1, D_2, S_2) to (u_1, u_2, u_3, u_4). In regime R_1 we have
Q_1 = S_1 and Q_2 = S_2, so that

A_1 = | 1    0    0   −α_1 |
      | 0    1    0   −β_1 |
      | 0   −α_2  1    0   |
      | 0   −β_2  0    1   |,

in regime R_2 (Q_1 = S_1, Q_2 = D_2)

A_2 = | 1    0   −α_1  0 |
      | 0    1   −β_1  0 |
      | 0   −α_2  1    0 |
      | 0   −β_2  0    1 |,

and in regime R_3 (Q_1 = D_1, Q_2 = D_2)

A_3 = | 1    0   −α_1  0 |
      | 0    1   −β_1  0 |
      |−α_2  0    1    0 |
      |−β_2  0    0    1 |.
C = Min(C^d, C^s),
L = Min(L^d, L^s).    (8.4)

The demands and supplies actually presented in each market are called "effective"
demands and supplies and these are determined by the exogenous variables and
the endogenous quantity constraints (8.4). By contrast, the "notional" demands
and supplies refer to the unconstrained values. Denote these by C̄^d, C̄^s, L̄^d, L̄^s.
The different models of multi-market disequilibrium differ in the way ‘effective’
demands and “spill-over effects” are defined. Gourieroux et al. (1980a) define the
Model I

C^d = C̄^d                   if L = L^s ≤ L^d,
    = C̄^d + α_1(L − L̄^s)    if L = L^d < L^s,    (8.5)

C^s = C̄^s                   if L = L^d ≥ L^s,
    = C̄^s + α_2(L − L̄^d)    if L = L^s < L^d,    (8.6)

L^d = L̄^d                   if C = C^s ≤ C^d,
    = L̄^d + β_1(C − C̄^s)    if C = C^d < C^s,    (8.7)

L^s = L̄^s                   if C = C^d ≤ C^s,
    = L̄^s + β_2(C − C̄^d)    if C = C^s < C^d.    (8.8)
This specification is based on Malinvaud (1977) and assumes that agents on the
short-side of the market present their notional demand as their effective demand
in the other market. For instance eq. (8.5) says that if households are able to sell
all the labor they want to, then their effective demand for goods is the same as
their ‘notional’ demand. On the other hand, if they cannot sell all the labor they
want to, there is a "spill-over effect" but note that this is proportional to L − L̄^s,
not L − L^s. (I.e. it is proportional to the difference between actual labor sold and
the 'notional' supply of labor.)
The model considered by Ito (1980) is as follows:
Model II

C^s = C̄^s + α_2(L − L̄^d),    (8.6′)
L^s = L̄^s + β_2(C − C̄^d).    (8.8′)

Model III

L^s = L̄^s + β_2(C − C̄^d).    (8.8″)
Portes compares the reduced forms for these three models and argues that
econometrically, there is little to choose between the alternative definitions of
effective demand.
The conditions for logical consistency (or coherency) are the same in all these
models, viz. 0 ≤ α_iβ_j ≤ 1 for i, j = 1, 2. Both Gourieroux et al. (1980a) and Ito
(1980) derive these conditions, suggest price and wage adjustment equations
similar to those considered in Section 6, and discuss the maximum likelihood
estimation of their models. Ito also discusses two-stage estimation similar to that
proposed by Amemiya for the Fair and Jaffee model, and derives sufficient
conditions for the uniqueness of a quantity-constrained equilibrium in his model.
We cannot go into the details of all these derivations here. The details involve
more algebra than any new conceptual problems in estimation. In particular,
the problems mentioned in Section 6 about the different price adjustment
equations apply here as well.
Laffont (1983) surveys the empirical work on multi-market disequilibrium
models. Quandt (1982, pp. 39-54) also has a discussion of the multi-market
disequilibrium models.
The applications of multi-market disequilibrium models all seem to be in the
macro area. However, here the problems of aggregation are very important and it
is not true that the whole economy switches from a regime of excess demand to
one of excess supply or vice versa. Only some segments might behave that way.
The implications of aggregation for econometric estimation have been studied in
some simple models by Malinvaud (1982).
The problems of spillovers also tend to arise more at a micro-level rather than a
macro-level. For instance, consider two commodities which are substitutes in
consumption (say natural gas and coal) one of which has price controls. We can
define the demand and supply functions in the two markets (omitting the
exogenous variables) as follows:
Q_1 = Min(D_1, S_1),
D_2 = γ_1P_1 + δ_1P_2 + λ(D_1 − S_1) + v_1,
S_2 = γ_2P_2 + v_2,
Q_2 = D_2 = S_2, i.e. the second market is always in equilibrium.

If P_1^* ≤ P̄_1, where P_1^* is the market equilibrating price in the first market and
P̄_1 the controlled price, we have the usual simultaneous equations model with the two
quantities and two prices as the endogenous variables. If P_1^* > P̄_1 then there is excess
demand in the first market and a spill-over of this into the second market. This
model is still in a “partial equilibrium” framework but would have interesting
empirical applications. It is at least one step forward from the single-market
disequilibrium model which does not say what happens to the unsatisfied demand
or supply.
W_o = Xβ_1 + u_1,
W_r = Xβ_2 + u_2.    (9.1)

The wage W_o is observed only if the individual works, i.e. only if W_o ≥ W_r. Hence

E(u_1 | W_o ≥ W_r) = −(σ_{1u}/σ)·φ(Z)/Φ(Z),

where u = u_2 − u_1, σ² = Var(u), σ_{1u} = Cov(u_1, u), Z = X(β_1 − β_2)/σ, and φ and Φ
denote the standard normal density and distribution functions. Thus we can write

W_o = Xβ_1 − (σ_{1u}/σ)·φ(Z)/Φ(Z) + V,    (9.2)

where E(V) = 0.
A test for selectivity bias is a test for σ_{1u} = 0. Heckman (1976) suggested a
two-stage estimation method for such models. First get consistent estimates of
the parameters in Z by the probit method applied to the dichotomous variable
(in the labor force or not). Then estimate eq. (9.2) by OLS using the estimated
values Ẑ for Z.
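
A minimal sketch of this two-stage procedure on simulated data (all variable names and parameter values are assumptions for illustration):

```python
# A minimal sketch of the two-stage procedure on simulated data (all variable
# names and parameter values are assumptions for illustration).
import numpy as np
from numpy.linalg import lstsq
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta1, beta2 = np.array([1.0, 0.5]), np.array([0.5, 0.2])
u = rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 1.0]], size=n)
Wo = X @ beta1 + u[:, 0]            # offered (market) wage
Wr = X @ beta2 + u[:, 1]            # reservation wage
works = (Wo >= Wr).astype(float)    # wage observed only for workers

# Stage 1: probit of the work decision on X, giving estimates of Z.
def negloglik(g):
    p = norm.cdf(X @ g).clip(1e-10, 1 - 1e-10)
    return -(works * np.log(p) + (1 - works) * np.log(1 - p)).sum()

Zhat = X @ minimize(negloglik, np.zeros(2), method="BFGS").x

# Stage 2: OLS of observed wages on X and the inverse Mills ratio phi/Phi.
mills = norm.pdf(Zhat) / norm.cdf(Zhat)
sel = works == 1
coef, *_ = lstsq(np.column_stack([X[sel], mills[sel]]), Wo[sel], rcond=None)
print("beta1 estimate:", coef[:2], " selectivity coefficient:", coef[2])
```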
The self-selectivity problem has since been analyzed in different contexts by
several people. Lee (1978a) has applied it to the problem of unions and wages. Lee
and Trost (1978) have applied it to the problem of housing demand with choices
of owning and renting. Willis and Rosen (1979) have applied the model to the
problem of education and self-selection. These are all switching regression mod-
els. Griliches et al. (1979) and Kenny et al. (1979) consider models with both
selectivity and simultaneity. These models are switching simultaneous equations
models. As for methods of estimation, both two-stage and maximum likelihood
methods have been used. For two-stage methods, the paper by Lee et al. (1980)
gives the asymptotic covariance matrices when the selectivity criterion is of the
probit and tobit types.
In the literature of self-selectivity a major concern has been with testing for
selectivity bias. These are tests for σ_{1u} = Cov(u, u_1) = 0 and σ_{2u} = Cov(u, u_2) = 0.
However, a more important issue is the sign and magnitude of these covariances
and often not much attention is devoted to this. In actual practice we ought to have
σ_{2u} − σ_{1u} > 0 but σ_{1u} and σ_{2u} can have any signs.10 It is also important to
estimate the mean values of the dependent variables for the alternate choice. For
instance, in the case of college education and income, we should estimate the
mean income of college graduates had they chosen not to go to college, and
the mean income of non-college graduates had they chosen to go to college. In the
example of hunting and fishing we should compute the mean income of hunters
had they chosen to be fishermen and the mean income of fishermen had they
chosen to be hunters. Such computations throw light on the effects of self-selec-
tion and also reveal deficiencies in the model which simple tests for the existence
of selectivity bias do not. See Bjorklund and Moffitt (1983) for such calculations.
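
For concreteness, a sketch of one such calculation in the two-regime normal switching model (the notation is assumed here, not taken verbatim from the chapter): with Y_1 = Xβ_1 + u_1 the outcome in regime 1 and I = 0 indicating that regime 2 was chosen,

\[
E(Y_1 \mid I = 0) = X\beta_1 + \frac{\sigma_{1u}}{\sigma}\,
\frac{\phi(Z)}{1 - \Phi(Z)},
\]

where Z is the standardized selection index; the counterfactual mean differs from Xβ_1 exactly through the covariance term σ_{1u}, which is why its sign and magnitude matter beyond a zero/non-zero test.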
In the literature on labor supply, there has been considerable discussion of
“individual heterogeneity”, i.e. the observed self-selection is due to individual
characteristics not captured by the observed variables (some women want to work
no matter what and some women want to sit at home no matter what). Obviously,
these individual specific effects can only be analyzed if we have panel data. This
problem has been analyzed by Heckman and Chamberlain, but since these
problems will be discussed in the chapters on labor supply models by Heckman
10 This is pointed out in Lee (1978b). Trost (1981) illustrates this with an empirical example on
returns to college education.
and analysis of cross-section and time-series data by Chamberlain they will not be
elaborated here.
One of the more important applications of the procedures for the correction of
selectivity bias is in the evaluation of programs.
In evaluating the effects of several social programs, one has to consider the
selection and truncation that can occur at different levels. We can depict the
situation by a decision tree as follows.
[Decision tree omitted: the root node is the Total Sample, which is split at each successive selection level into participants and non-participants.]
One further problem is that of truncated samples. Very often we do not have
data on all the individuals -participants and non-participants. If the data consists
of only participants in a program and we know nevertheless that there is
self-selection and we have data on the variables determining the participation
decision function, then we can still correct for selectivity bias. The methodology
for this problem is discussed in the next section. The important thing to note is
that though, theoretically, truncation does not change the identifiability of the
parameters, there is, nevertheless, a loss of information.
There is a vast amount of literature on program evaluation. Some important
references are: Goldberger (1972) and Barnow, Cain and Goldberger (1980).
These papers and the selectivity problem in program evaluation have been
surveyed in Maddala (1983a, pp. 260-267).
One other problem is that of correcting for selectivity bias when the explana-
tory variables are measured with error. An example of this occurs in problems of
measuring wage discrimination, particularly a comparison between the Federal
and non-Federal sectors. A typical regression equation considered is one of
regressing earnings on productivity and a dummy variable depicting race or sex or
ethnic group. Since productivity cannot be measured, some proxies are used.
When such equations are estimated, say for the Federal (or non-Federal) sectors,
one has to take account of individual choices to belong to one or the other sector.
To avoid the selection bias we have to model not only the determinants of wage
offers but also the process of self-selection by which individuals got into that
sector. An analysis of this problem is in Abowd, Abowd and Killingsworth
(1983).
Finally, there is the important problem that most of the literature on selectivity
bias adjustment is based on the assumption of normality. Consider the simple two
equation model to analyze the selectivity problem.
Y = Xβ + u,
I* = Zγ − ε,

with Y observed only if I* ≥ 0. The normality assumption on (u, ε) can be relaxed
by first transforming the distributions of u and ε to normality, using the methods
of Bock and Jones (1968). This approach permits the analysis of selection bias with any
distributional assumptions. Details can be found in the papers by Lee, and a
summary in Maddala (1983a, pp. 272-275).
There are several practical instances where selectivity could be due to several
sources rather than just one as considered in the examples in the previous Section.
Griliches et al. (1979) cite several problems with the NLS young men data set that
could lead to selectivity bias. Prominent among these are attrition and (other)
missing data problems. In such cases we would need to formulate the model as
switching regression or switching simultaneous equations models where the switch
depends on more than one criterion function.
During recent years there have been many applications involving multiple
criteria of selectivity. Abowd and Farber (1982) consider a model with two
decisions: the decision of individuals to join a queue for union jobs and the
decision of employers to draw from the queue. Poirier (1980) discusses a model
where the two decisions are those of the employee to continue with the sponsoring
agency after training and the decisions of the employer to make a job offer after
training. Fishe et al. (1981) consider a two-decision model: whether to go to
college or not and whether to join the labor force or not. Ham (1982) examines
the labor supply problem by classifying individuals into four categories according
to their unemployment and under-employment status. Catsiapis and Robinson
(1982) study the demand for higher education and the receipt of student aid
grants. Tunali (1983) studies migration decisions involving single and multiple
moves. Danzon and Lillard (1982) analyze a sequential process of settlement of
malpractice suits. Venti and Wise (1982) estimate a model combining student
preferences for colleges and the decision of the university to admit the student.
All these problems can be classified into different categories depending on
whether the decision rules are joint or sequential. This distinction, however, is not
made clear in the literature and the studies all use the multivariate normal
distribution to specify the joint probabilities.
With a two decision model, the specification is as follows:

I_1^* = Z_1γ_1 − ε_1,    (10.3)
I_2^* = Z_2γ_2 − ε_2.    (10.4)

We also have to consider whether the choices are completely observed or they
are partially observed. Define the indicator variables

I_1 = 1 iff I_1^* ≥ 0 (i.e. ε_1 ≤ Z_1γ_1),   I_1 = 0 otherwise,
I_2 = 1 iff I_2^* ≥ 0 (i.e. ε_2 ≤ Z_2γ_2),   I_2 = 0 otherwise.

With partial observability only the product I = I_1I_2 is observed. Then the ML
estimates of γ_1, γ_2 and ρ are obtained by maximizing the likelihood function

L = ∏ [F(Z_1γ_1, Z_2γ_2)]^I [1 − F(Z_1γ_1, Z_2γ_2)]^{1−I},    (10.5)

where F is the joint distribution of (ε_1, ε_2) with correlation ρ.
With the assumption of bivariate normality of ε_1 and ε_2, this involves the use of
bivariate probit analysis.
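
A compact sketch of maximizing (10.5) under bivariate normality (the function names and the tanh reparameterization of ρ are illustrative assumptions):

```python
# A compact sketch of the partial-observability likelihood (10.5) under
# bivariate normality; only I = I1*I2 is observed.
import numpy as np
from scipy.stats import multivariate_normal

def neg_loglik(params, I, Z1, Z2):
    k1 = Z1.shape[1]
    g1, g2 = params[:k1], params[k1:-1]
    rho = np.tanh(params[-1])                     # keep |rho| < 1
    pts = np.column_stack([Z1 @ g1, Z2 @ g2])
    p = multivariate_normal.cdf(pts, mean=[0.0, 0.0],
                                cov=[[1.0, rho], [rho, 1.0]])
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(I * np.log(p) + (1 - I) * np.log(1 - p)).sum()
```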
In the sequential decision model with partial observability, if we assume that
the function (10.4) is defined only on the subpopulation I_1 = 1, then since the
distribution of ε_2 that is assumed is conditioned on ε_1 ≤ Z_1γ_1, the likelihood
function to be maximized would be

L = ∏ [F_1(Z_1γ_1)F_2(Z_2γ_2)]^I [1 − F_1(Z_1γ_1)F_2(Z_2γ_2)]^{1−I},    (10.6)

where F_1 is the marginal distribution of ε_1 and F_2 the distribution of ε_2 on the
subpopulation.
Again, the parameters γ_1 and γ_2 are estimable only if there is at least one
non-overlapping variable in either one of Z_1 and Z_2 (otherwise we would not
know which estimates refer to γ_1 and which refer to γ_2). In their example on job
queues and union status of workers, Abowd and Farber (1982) obtain their
parameter estimates using the likelihood function (10.6). One can, perhaps, argue
that even in the sequential model, the appropriate likelihood function is still
(10.5) and not (10.6). It is possible that there are persons who do not join the
queue (I_1 = 0) but to whom employers would want to give a union job. The
reason we do not observe these individuals in union jobs is because they had
decided not to join the queue. But we do not also observe in the union jobs all
those with I_2 = 0. Thus, we can argue that I_2^* exists and is, in principle, defined
even for the observations with I_1 = 0. If the purpose of the analysis is to examine
what factors influence the employers' choice of employees for union jobs, then
possibly the parameter estimates should be obtained from (10.5). The difference
between the two models is in the definition of the distribution of ε_2. In the case of
(10.5), the distribution of ε_2 is defined over the whole population. In the case of
(10.6), it is defined over the subpopulation I_1 = 1. The latter allows us to make
only conditional inferences.11 The former allows us to make both conditional and
marginal inferences. To make marginal inferences, we need estimates of γ_2. To
make conditional inferences we consider the conditional distribution f(ε_2 | ε_1 ≤
Z_1γ_1), which involves γ_1, γ_2, and ρ.
Yet another type of partial observability arises in the case of truncated samples.
An example is that of measuring discrimination in loan markets. Let I_1^* refer to
the decision of an individual on whether or not to apply for a loan, and let I_2^*
refer to the decision of the bank on whether or not to grant the loan.
Rarely do we have data on the individuals for whom I_1 = 0. Thus what we have is
a truncated sample. We can, of course, specify the distribution of I_2^* only for the
subset of observations with I_1 = 1 and estimate the parameters γ_2 by, say, the probit
ML method and examine the significance of the coefficients of race, sex, age, etc. to
see whether there is discrimination by any of these variables. This does not,
however, allow for self-selection at the application stage, say for some individuals
not applying because they feel they will be discriminated against. For this purpose
we define I_2^* over the whole population and analyze the model from the truncated
sample. The argument is that, in principle, I_2^* exists even for the non-applicants.
The parameters γ_1, γ_2 and ρ can be estimated by maximizing the likelihood
function

L = ∏_{I_1=1} [F(Z_1γ_1, Z_2γ_2)/F_1(Z_1γ_1)]^{I_2} [1 − F(Z_1γ_1, Z_2γ_2)/F_1(Z_1γ_1)]^{1−I_2}.
In this model the parameters γ_1, γ_2 and ρ are, in principle, estimable even if Z_1
and Z_2 are the same variables. In practice, however, the estimates are not likely to
be very good. Muthén and Jöreskog (1981) report the results of some Monte-Carlo
experiments on this. Bloom et al. (1981) report that attempts at estimating this
model did not produce good estimates. However, the paper by Bloom and
Killingsworth (1981) shows that correction for selection bias can be done even
with truncated samples. Wales and Woodland (1980) also present some encourag-
ing Monte-Carlo evidence. Since the situation of truncated samples is of frequent
occurrence (see Bloom and Killingsworth for a number of examples) more
evidence on this issue will hopefully accumulate in a few years.
The specification of the distributions of ε_1 and ε_2 in (10.3) and (10.4) depends
on whether we are considering a joint decision model or a sequential decision
model. For problems with sequential decisions, the situation can be described
diagrammatically by a decision tree of the kind sketched earlier.
There are also problems of aggregation. The Fair and Jaffee example on the housing market
as well as the different models of “credit rationing” are all based on aggregate
data and there is much to be desired in the detailed specification of these models.
Perhaps the most interesting applications of the disequilibrium models are in the
area of regulated industries. After all, it is regulation that produces disequi-
librium in these markets. Estimation of some disequilibrium models with micro-
data sets for regulated industries and estimation of the effects of regulation would
make the disequilibrium literature more intellectually appealing than it has been.
There are also some issues that need to be investigated regarding the appropriate
formulation of the demand and supply functions under disequilibrium. The
expectation of disequilibrium can itself be expected to change the demand and
supply functions. Thus, one needs to incorporate expectations into the modelling
of disequilibrium.
The literature on self-selection, by contrast to the disequilibrium literature, has
several interesting empirical applications. However, even here a lot of work
remains to be done. The case of selectivity being based on several criteria rather
than one has been mentioned in Section 10. Here one needs a clear distinction to
be made between joint decision and sequential decision models. Another problem
is that of correcting for selectivity bias when the explanatory variables are
measured with error. Almost all the usual problems in the single equation
regression model need to be analyzed in the presence of the selection (self-selec-
tion) problem.
References
Abowd, A. M., J. M. Abowd and M. R. Killingsworth (1983) “Race, Spanish Origin and Earnings
Differentials Among Men: The Demise of Two Stylized Facts”. Discussion Paper #83-11, Econom-
ics Research Center/NORC, University of Chicago.
Abowd, J. M. and H. S. Farber (1982) “Job Queues and Union Status of Workers”, Industrial and
Labor Relations Review, 35(4), 354-361.
Amemiya, T. (1973) “Regression Analysis When the Dependent Variable is Truncated Normal”,
Econometrica, 41(6), 997-1016.
Amemiya, T. (1974a) "A Note on a Fair and Jaffee Model", Econometrica, 42(4), 759-762.
Amemiya, T. (1974b) “Multivariate Regression and Simultaneous Equations Models When the
Dependent Variables are Truncated Normal”, Econometrica, 42(6), 999-1012.
Amemiya, T. (1977) “The Solvability of a Two-Market Disequilibrium Model”. Working Paper 82,
IMSSS, Stanford University, August 1977.
Amemiya, T. and G. Sen (1977) "The Consistency of the Maximum Likelihood Estimator in a
Disequilibrium Model”, Technical Report No. 238, IMSSS, Stanford University.
Avery, R. B. (1982) “Estimation of Credit Constraints by Switching Regressions”, in: C. Manski and
D. McFadden, eds., Structural Analysis of Discrete Data: With Econometric Applications. MIT Press.
Barnow, B. S., G. G. Cain and A. S. Goldberger (1980) “Issues in the Analysis of Selectivity Bias”, in:
E. W. Stromsdorfer and G. Farkas, eds., Evaluation Studies - Review Annual, 5, 43-59.
Batchelor, R. A. (1977) “A Variable-Parameter Model of Exporting Behaviour”, Review of Economic
Studies, 44(1), 43-58.
Bergstrom, A. R. and C. R. Wymer (1976) "A Model for Disequilibrium Neoclassical Growth and its
Application to the United Kingdom", in: A. R. Bergstrom, ed., Statistical Inference in Continuous
Time Economic Models. Amsterdam: North-Holland Publishing Co.
Berndt, E. R., B. H. Hall, R. E. Hall and J. A. Hausman (1974) "Estimation and Inference in
Non-Linear Structural Models", Annals of Economic and Social Measurement, 3(4), 653-665.
Bjorklund, A. and R. Moffitt (1983) “The Estimation of Wage Gains and Welfare Gains From
Self-Selection Models”. Manuscript, Institute for Research on Poverty, University of Wisconsin.
Bloom, D. E. and M. R. Killingsworth (1981) “Correcting for Selection Bias in Truncated Samples:
Theory, With an Application to the Analysis of Sex Salary Differentials in Academe”. Paper
presented at the Econometric Society Meetings, Washington, D.C., Dec. 1981.
Bloom, D. E., B. J. Preiss and J. Trussell (1981) "Mortgage Lending Discrimination and the Decision
to Apply: A Methodological Note". Manuscript, Carnegie Mellon University.
Bock, R. D. and L. V. Jones (1968) The Measurement and Prediction of Judgement and Choice. San
Francisco: Holden-Day.
Bouissou, M. B., J. J. Laffont and Q. H. Vuong (1983) “Disequilibrium Econometrics on Micro Data”.
Paper presented at the European Meeting of the Econometric Society, Pisa, Italy.
Bowden, R. J. (1978a) "Specification, Estimation and Inference for Models of Markets in Disequi-
librium", International Economic Review, 19(3), 711-726.
Bowden, R. J. (1978b) The Econometrics of Disequilibrium. Amsterdam: North Holland Publishing Co.
Catsiapis, G. and C. Robinson (1982) “Sample Selection Bias With Multiple Selection Rules”, Journal
of Econometrics, 18, 351-368.
Chambers, R. G., R. E. Just, L. J. Moffitt and A. Schmitz (1978) “International Markets in
Disequilibrium: A Case Study of Beef”. Berkeley: California Agricultural Experiment Station.
Chanda, A. K. (1984) Econometrics of Disequilibrium and Rational Expectations. Ph.D. Dissertation,
University of Florida.
Charemza, W. and R. E. Quandt (1982) “Models and Estimation of Disequilibrium for Centrally
Planned Economies”, Review of Economic Studies, 49, 109-116.
Cosslett, S. R. (1984) “Distribution-Free Estimation of a Model with Sample Selectivity”. Discussion
Paper, Center for Econometrics and Decision Sciences, University of Florida.
Cosslett, S. R. and Long-Fei Lee (1983) “Serial Correlation in Latent Discrete Variable Models”.
Discussion Paper, University of Florida, forthcoming in Journal of Econometrics.
Dagenais, M. G. (1980) “Specification and Estimation of a Dynamic Disequilibrium Model”,
Economics Letters, 5, 323-328.
Danzon, P. M. and L. A. Lillard (1982) The Resolution of Medical Malpractice Claims: Modeling the
Bargaining Process. Report #R-2792-ICJ, California: Rand Corporation.
Davidson, J. (1978) “FIML Estimation of Models with Several Regimes”. Manuscript, London School
of Economics, October 1978.
Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977) “Maximum Likelihood from Incomplete Data
via the EM Algorithm", Journal of the Royal Statistical Society, Series B, 39, 1-38 with discussion.
Dubin, J. and D. McFadden (1984) “An Econometric Analysis of Residential Electrical Appliance
Holdings and Consumption”, Econometrica, 52(2), 345-62.
Eaton, J. and R. E. Quandt (1983) “A Model of Rationing and Labor Supply: Theory and
Estimation”, Econometrica, 50, 221-234.
Fair, R. C. and D. M. Jaffee (1972) “Methods of Estimation for Markets in Disequilibrium”,
Econometrica, 40, 497-514.
Fair, R. C. and H. H. Kelejian (1974) “Methods of Estimation for Markets in Disequilibrium: A
Further Study", Econometrica, 42(1), 177-190.
Fishe, R. P. H., R. P. Trost and P. Lurie (1981) “Labor Force Earnings and College Choice of Young
Women: An Examination of Selectivity Bias and Comparative Advantage”, Economics of Education
Review, 1, 169-191.
Frisch, R. (1949) "Prolegomena to a Pressure Analysis of Economic Phenomena", Metroeconomica, 1,
135-160.
Gersovitz, M. (1980) “Classification Probabilities for the Disequilibrium Model”, Journal of Econo-
metrics, 41, 239-246.
Goldberger, A. S. (1972) “Selection Bias in Evaluating Treatment Effects: Some Formal Illustrations”.
Discussion Paper # 123-72, Institute for Research on Poverty, University of Wisconsin.
Goldberger, A. S. (1981) “Linear Regression After Selection”, Journal of Econometrics, 15, 357-66.
1684 G. S. Maddala
Goldberger, A. S. (1980) “Abnormal Selection Bias”. Workshop Series #8006, SSRI, University of
Wisconsin.
Goldfeld, S. M., D. W. Jaffee and R. E. Quandt (1980) "A Model of FHLBB Advances: Rationing or
Market Clearing?", Review of Economics and Statistics, 62, 339-347.
Goldfeld, S. M. and R. E. Quandt (1975) “Estimation in a Disequilibrium Model and the Value of
Information”, Journal of Econometrics, 3(3), 325-348.
Goldfeld, S. M. and R. E. Quandt (1978) “Some Properties of the Simple Disequilibrium Model with
Covariance”, Economics Letters, 1, 343-346.
Goldfeld, S. M. and R. E. Quandt (1983) “The Econometrics of Rationing Models”. Paper presented
at the European Meetings of the Econometric Society, Pisa, Italy.
Gourieroux, C., J. J. Laffont and A. Monfort (1980a) "Disequilibrium Econometrics in Simultaneous
Equations Systems", Econometrica, 48(1), 75-96.
Gourieroux, C., J. J. Laffont and A. Monfort (1980b) "Coherency Conditions in Simultaneous Linear
Equations Models with Endogenous Switching Regimes", Econometrica, 48(3), 675-695.
Gourieroux, C. and A. Monfort (1980) “Estimation Methods for Markets with Controlled Prices”.
Working Paper # 8012, INSEE, Paris, October 1980.
Green, J. and J. J. Laffont (1981) “Disequilibrium Dynamics with Inventories and Anticipatory Price
Setting", European Economic Review, 16(1), 199-223.
Griliches, Z., B. H. Hall and J. A. Hausman (1978) "Missing Data and Self-Selection in Large
Panels", Annales de L'INSEE, 30-31, The Econometrics of Panel Data, 137-176.
Gronau, R. (1974) “Wage Comparisons: A Selectivity Bias”, Journal of Political Economy, 82(6),
1119-1143.
Ham, J. C. (1982) “Estimation of a Labor Supply Model with Censoring Due to Unemployment and
Underemployment”, Review of Economic Studies, 49, 335-354.
Hartley, M. J. (1977) “On the Estimation of a General Switching Regression Model via Maximum
Likelihood Methods”. Discussion Paper #415, Department of Economics, State University of New
York at Buffalo.
Hartley, M. J. (1979) “Comment”, Journal of the American Statistical Association, 73(364), 738-741.
Hartley, M. J. and P. Mallela (1977) “The Asymptotic Properties of a Maximum Likelihood Estimator
for a Model of Markets in Disequilibrium”, Econometrica, 45(5), 1205-1220.
Hausman, J. A. (1978) “Specification Tests in Econometrics”, Econometrica, 46(6), 1251-1272.
Hay, J. (1980) “Selectivity Bias in a Simultaneous Logit-OLS Model: Physician Specialty Choice and
Specialty Income”. Manuscript, University of Connecticut Health Center.
Heckman, J. J. (1974) “Shadow Prices, Market Wages and Labor Supply”, Econometrica, 42(4),
679-694.
Heckman, J. J. (1976a) "Simultaneous Equations Models with Continuous and Discrete Endogenous
Variables and Structural Shifts", in: Goldfeld and Quandt, eds., Studies in Nonlinear Estimation.
Cambridge: Ballinger Publishing.
Heckman, J. J. (1976b) “The Common Structure of Statistical Models of Truncation, Sample
Selection, and Limited Dependent Variables, and a Simple Estimator for Such Models”, Annals of
Economic and Social Measurement, 5(4), 475-492.
Heckman, J. J. (1978) “Dummy Endogenous Variables in a Simultaneous Equations System”,
Econometrica, 46(6), 931-959.
Heckman, J. J. (1979) "Sample Selection Bias as a Specification Error", Econometrica, 47(1), 153-161.
Heckman, J. and B. Singer (1984) “A Method for Minimizing the Impact of Distributional Assump-
tions in Econometric Models for Duration Data”, Econometrica, 52(2), 271-320.
Hendry, D. F. and A. Spanos (1980) “Disequilibrium and Latent Variables”. Manuscript, London
School of Economics.
Hildebrand, F. B. (1956) Introduction to Numerical Analysis. New York: McGraw-Hill.
Hwang, H. (1980) “A Test of a Disequilibrium Model”, Journal of Econometrics, 12, 319-333.
Ito, T. (1980) "Methods of Estimation for Multi-Market Disequilibrium Models", Econometrica,
48(1), 97-125.
Ito, T. and K. Ueda (1981) “Tests of the Equilibrium Hypothesis in Disequilibrium Econometrics: An
International Comparison of Credit Rationing”, International Economic Review, 22(3), 691-708.
Johnson, N. L. and S. Kotz (1972) Distributions in Statistics: Continuous Multivariate Distributions.
New York: Wiley.
Johnson, P. D. and J. C. Taylor (1977) "Modelling Monetary Disequilibrium", in: M. G. Porter, ed.,
The Australian Monetary System in the 1970's. Australia: Monash University.
Kenny, L. W., L. F. Lee, G. S. Maddala and R. P. Trost (1979) "Returns to College Education: An
Investigation of Self-Selection Bias Based on the Project Talent Data", International Economic
Review, 20(3), 751-765.
Kiefer, N. (1978) “Discrete Parameter Variation: Efficient Estimation of a Switching Regression
Model”, Econometrica, 46(2), 427-434.
Kiefer, N. (1979) “On the Value of Sample Separation Information”, Econometrica, 47(4), 997-1003.
Kiefer, N. (1980a) "A Note on Regime Classification in Disequilibrium Models", Review of Economic
Studies, 47(1), 637-639.
Kiefer, N. (1980b) “A Note on Switching Regression and Logistic Discrimination”, Econometrica, 48,
637-639.
King, M. (1980) “An Econometric Model of Tenure Choice and Housing as a Joint Decision”,
Journal of Public Economics, 14(2), 137-159.
Kooiman, T. and T. Kloek (1979) “Aggregation and Micro-Markets in Disequilibrium: Theory and
Application to the Dutch Labor Market: 1948-1975”. Working Paper, Rotterdam: Econometric
Institute, April 1979.
Laffont, J. J. (1983) “Fix-Price Models: A Survey of Recent Empirical Work”. Discussion Paper
# 8305, University of Toulouse.
Laffont, J. J. and R. Garcia (1977) “Disequilibrium Econometrics for Business Loans”, Econometrica,
45(5), 1187-1204.
Laffont, J. J. and A. Monfort (1979) "Disequilibrium Econometrics in Dynamic Models", Journal of
Econometrics, 11, 353-361.
Lee, L. F. (1976) Estimation of Limited Dependent Variable Models by Two-Stage Methods. Ph.D.
Dissertation, University of Rochester.
Lee, L. F. (1978a) “Unionism and Wage Rates: A Simultaneous Equations Model with Qualitative
and Limited Dependent Variables”, International Economic Review, 19(2), 415-433.
Lee, L. F. (1978b) “Comparative Advantage in Individuals and Self-Selection”. Manuscript, Univer-
sity of Minnesota.
Lee, L. F. (1979) “Identification and Estimation in Binary Choice Models with Limited (Censored)
Dependent Variables”, Econometrica, 47(4), 977-996.
Lee, L. F. (1982a) "Some Approaches to the Correction of Selectivity Bias", Review of Economic
Studies, 49, 355-372.
Lee, L. F. (1982b) “Test for Normality in the Econometric Disequilibrium Markets Model”, Journal
of Econometrics, 19, 109-123.
Lee, L. F. (1983a) “Generalized Econometric Models with Selectivity”, Econometrica, 51(2), 507-512.
Lee, L. F. (1983b) “Regime Classification in the Disequilibrium Market Models”. Discussion Paper
# 93, Center for Econometrics and Decision Sciences, University of Florida.
Lee, L. F. (1984) “Sequential Discrete Choice Econometric Models With Selectivity”. Discussion
Paper, University of Minnesota.
Lee, L. F. and R. P. Trost (1978) “Estimation of Some Limited Dependent Variable Models with
Application to Housing Demand”, Journal of Econometrics, 8, 357-382.
Lee, L. F., G. S. Maddala and R. P. Trost (1980) “Asymptotic Covariance Matrices of Two-Stage
Probit and Two-Stage Tobit Methods for Simultaneous Equations Models with Selectivity”,
Econometrica, 48(2), 491-503.
Lee, L. F. and G. S. Maddala (1983a) “The Common Structure of Tests for Selectivity Bias, Serial
Correlation, Heteroscedasticity and Normality in the Tobit Model”. Manuscript, Center for
Econometrics and Decision Sciences, University of Florida. Forthcoming in the International
Economic Review.
Lee, L. F. and G. S. Maddala (1983b) “Sequential Selection Rules and Selectivity in Discrete Choice
Econometric Models”. Manuscript, Center for Econometrics and Decision Sciences, University of
Florida.
Lee, L. F. and R. H. Porter (1984) “Switching Regression Models with Imperfect Sample Separation
Information: With an Application on Cartel Stability”, Econometrica, 52(2), 391-418.
Lewis, H. G. (1974) “Comments on Selectivity Biases in Wage Comparisons”, Journal of Political
Economy, 82(6), 1145-1155.
Equations Model with Limited Dependent Variables”, International Economic Review, 22(3),
731-730.
Wales, T. J. and A. D. Woodland (1980) "Sample Selectivity and the Estimation of Labor Supply
Functions", International Economic Review, 21, 437-468.
Wallis, K. F. (1980) “Econometric Implications of the Rational Expectations Hypothesis”,
Econometrica, 48(1), 49-72.
Willis, R. J. and S. Rosen (1979) “Education and Self-Selection”, Journal of Political Economy, Part 2,
87(5), 507-526.
Wu, De-Min (1973) "Alternative Tests of Independence Between Stochastic Regressors and Dis-
turbances", Econometrica, 41(3), 733-750.
Chapter 29

ECONOMETRIC ANALYSIS OF LONGITUDINAL DATA*

JAMES J. HECKMAN and BURTON SINGER

Contents
0. Introduction 1690
1. Single spell models 1691
1.1. Statistical preliminaries 1691
1.2. Examples of duration models produced by economic theory 1695
1.3. Conventional reduced form models 1704
1.4. Identification and estimation strategies 1710
1.5. Sampling plans and initial conditions problems 1727
1.6. New issues that arise in formulating and estimating choice theoretic duration models 1744
2. Multiple spell models 1748
2.1. A unified framework 1748
2.2. General duration models for the analysis of event history data 1753
3. Summary 1759
References 1761
*This research was supported by NSF Grant SES-8107963 and NIH Grant NIH-1-R01-HD16846-01
to the Economics Research Center, NORC, 6030 S. Ellis, Chicago, Illinois 60637. We thank Takeshi
Amemiya and Aaron Han for helpful comments.
0. Introduction
In analyzing discrete choices made over time, two arguments favor the use of
continuous time models. (1) In most economic models there is no natural time
unit within which agents make their decisions and take their actions. Often it is
more natural and analytically convenient to characterize the agent’s decision and
action processes as operating in continuous time. (2) Even if there were natural
decision periods, there is no reason to suspect that they correspond to the annual
or quarterly data that are typically available to empirical analysts, or that the
discrete periods are synchronized across individuals. Inference about an underly-
ing stochastic process that is based on interval or point sampled data may be very
misleading especially if one falsely assumes that the process being investigated
operates in discrete time. Conventional discrete choice models such as logit and
probit when defined for one time interval are of a different functional form when
applied to another time unit, if they are defined at all. Continuous time models
are invariant to the time unit used to record the available data. A common set of
parameters can be used to generate probabilities of events occurring in intervals
of different length. For these reasons the use of continuous time duration models
is becoming widespread in economics.
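
As a toy illustration of this invariance (the exit rate 0.3 is invented), a single continuous-time parameter generates event probabilities for observation windows of any length:

```python
# Toy illustration: one continuous-time exit rate yields event probabilities
# for observation windows of any length (the number 0.3 is invented).
import numpy as np

lam = 0.3                                # constant exit rate per unit of time
for dt in (0.25, 1.0, 4.0):              # quarterly, annual, four-year windows
    print(dt, 1 - np.exp(-lam * dt))     # P(event within a window of length dt)
```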
This paper considers the formulation and estimation of continuous time
econometric duration models. Research on this topic is relatively new and much
of the available literature has borrowed freely and often uncritically from
reliability theory and biostatistics. As a result, most papers in econometric
duration analysis present statistical models only loosely motivated by economic
theory and assume access to experimental data that are ideal in comparison to the
data actually available to social scientists.
This paper is in two parts. Part I -which is by far the largest-considers single
spell duration models which are the building blocks for the more elaborate
multiple spell models considered in Part II. Many issues that arise in multiple
spell models are more easily discussed in a single spell setting and in fact many of
the available duration data sets only record single spells.
Our discussion of single spell duration models is in six sections. In Section 1.1
we present some useful definitions and statistical concepts. In Section 1.2 we
present a short catalogue of continuous time duration models that arise from
choice theoretic economic models. In Section 1.3 we consider conventional
methods for introducing observed and unobserved variables into reduced form
versions of duration models. We discuss the sensitivity of estimates obtained from
single spell duration models to inherently ad hoc methods for controlling for
observed and unobserved variables.
There are now a variety of excellent textbooks on duration analysis that discuss
the formulation of duration models so that a lengthy introduction to standard
survival models is unnecessary.’ In an effort to make this chapter self-contained,
however, this section sets out the essential ideas that we need from this literature
in the rest of the chapter.
‘See especially, Kalbfleisch and Prentice (1980), Lawless (1982) and Cox and Oakes (1984).
Knowledge of G determines h.
Conversely, knowledge of h determines G because by integration of (1.1.1)

∫_0^x h(u) du = −ln(1 − G(u)) |_0^x = −ln(1 − G(x)),

so that

G(x) = 1 − exp(−∫_0^x h(u) du).    (1.1.2)
For the rest of this paper we assume that the distribution of T is absolutely
continuous, and we associate T with spell duration.2 In this case it is also natural
to interpret h(t) as an exit rate or escape rate from the state because it is the limit
(as Δ → 0) of the probability that a spell terminates in the interval (t, t + Δ) given
that the spell has lasted t periods, i.e.

h(t) = lim_{Δ→0} [G(t + Δ) − G(t)] / [Δ(1 − G(t))]
     = g(t) / (1 − G(t)).    (1.1.4)
Equation (1.1.4) constitutes an alternative definition of the hazard that links the
models discussed in Part I to the more general multistate models discussed in Part
II.
*For a treatment of duration distributions that are not absolutely continuous see, e.g. Lawless
(1982).
S(t) = P(T > t) = 1 − G(t) = exp(−∫_0^t h(u) du).    (1.1.5)

It is conventionally assumed that ∫_0^∞ h(u) du = ∞, or equivalently that

S(∞) = 0.

Duration dependence is said to exist when

dh(t)/dt ≠ 0.
The only density with no duration dependence almost everywhere is the exponen-
tial distribution. For in this case h(t) = h, a constant, and hence from (1.1.2), T is
an exponential random variable. Obviously if G is exponential, h(t) = h.
If dh(t)/dt > 0 at t = t_0, there is said to be positive duration dependence at t_0.
If dh(t)/dt < 0 at t = t_0, there is said to be negative duration dependence at t_0. In
job search models of unemployment, positive duration dependence arises in the
case of a “declining reservation wage” (see, e.g. Lippman and McCall, 1976). In
this case the exit rate from unemployment is monotonically increasing in t. In job
turnover models negative duration dependence (at least asymptotically) is associ-
ated with worker-firm matching models (see, e.g. Jovanovic, 1979).
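
A standard textbook example (not specific to this chapter) is the Weibull hazard, for which the sign of the duration dependence is governed by a single parameter:

\[
h(t) = \lambda\alpha t^{\alpha-1}, \qquad
\frac{dh(t)}{dt} = \lambda\alpha(\alpha-1)t^{\alpha-2},
\]

so there is positive duration dependence everywhere if α > 1, negative duration dependence if α < 1, and the exponential (no duration dependence) case if α = 1.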
h(t | x(t), θ(t)) = lim_{Δ→0} Pr(t < T ≤ t + Δ | T > t, x(t), θ(t)) / Δ.    (1.1.7)

The dating on regressor vector x(t) is an innocuous convention. x(t) may include
functions of the entire past or future or the entire paths of some variables, e.g.

G(t | x, θ) = 1 − exp(−∫_0^t h(u | x(u), θ(u)) du),    (1.1.8)
One specification of conditional hazard (1.1.7) that has received much attention
in the literature is the proportional hazard specification [see Cox (1972)],

ln h(t | x(t), θ(t)) = ψ(t) + x(t)β + θ(t),

which postulates that the log of the conditional hazard is linear in functions of t,
x and θ.
The one period version of this model is the workhorse of labor economics.
Consumers at age a are assumed to possess a concave twice differentiable one
period utility function defined over goods (X(a)) and leisure (L(a)). Denote this
utility function by U( X(a), L(a)). Define leisure hours so that 0 I L(a) I 1. The
consumer is free to choose his hours of work at parametric wage W(a). There are
no fixed costs of work, and for convenience taxes are ignored. At each age the
consumer receives unearned income Y(a). There is no saving or borrowing.
Decisions are assumed to be made under perfect certainty.
The consumer works at age a if the marginal rate of substitution between
goods and leisure evaluated at the no work position (also known as the non-
market wage)

$$M(Y(a)) = \frac{U_2(Y(a),1)}{U_1(Y(a),1)}, \tag{1.2.1}$$
is less than the market wage W(a). For if this is so, his utility is higher in the
market than at home. The subscripts on U denote partial derivatives with respect
to the appropriate argument. It is convenient to define an index function Z(a), written as

$$Z(a) = W(a) - M(Y(a)).$$

If Z(a) ≥ 0, the consumer works at age a, and we record this event by setting d(a) = 1. If Z(a) < 0, d(a) = 0.
In a discrete time model, a spell of employment begins at a₁ and ends at a₂ + 1, provided that Z(a₁ − 1) < 0, Z(a₁ + j) ≥ 0 for j = 0, …, a₂ − a₁, and Z(a₂ + 1) < 0. Reversing the direction of the inequalities generates a characterization of a nonwork spell that begins at a₁ and ends at a₂.
To complete the econometric specification, an error term ε(a) is introduced. Under an assumption of perfect certainty, the error term arises from variables observed by the consumer but not observed by the econometrician. In the current context, ε(a) can be interpreted as a shifter of household technology and tastes. For each person successive values of ε(a) may be correlated, but it is assumed that ε(a) is independent of Y(a) and W(a). We define the index function inclusive of ε(a) as Z*(a) (eq. (1.2.2)).
The probability that an employed person does not leave the employed state in any period is

$$1 - PF(\psi), \tag{1.2.3}$$

where P is the per-period probability that a fresh value of ε arrives and F(ψ) is the probability that a fresh draw induces an exit from the state.
The probability that a spell is longer than t₁ is the sum over j of the products of the probability of receiving j innovations in t₁ and the probability that the person does not leave the employed state on each of the j occasions, (1 − F(ψ))^j. Thus

$$P(T_1 > t_1) = \sum_{j=0}^{t_1}\binom{t_1}{j}P^j(1-P)^{t_1-j}(1-F(\psi))^j = (1-PF(\psi))^{t_1}. \tag{1.2.4}$$
$$P(T_1 = t_1) = P(T_1 > t_1 - 1) - P(T_1 > t_1) = (1-PF(\psi))^{t_1-1}\,PF(\psi). \tag{1.2.5}$$
In conventional models of discrete choice over time [see, e.g. Heckman (1981a)], P is implicitly set to one. Thus in these models it is assumed that the consumer receives a new draw of ε each period. The model just presented generalizes these models to allow for the possibility that ε may remain constant over several periods of time. Such a generalization creates an identification problem, because from a single employment or nonemployment spell it is only possible to estimate PF(ψ) or P(1 − F(ψ)) respectively. This implies that any single spell model of the duration of employment or nonemployment is consistent with the model of eq. (1.2.2) with P = 1, or with another model in which (1.2.2) does not characterize behavior but in which the economic variables determine the arrival time of new values of ε. However, access to both employment and nonemployment spells solves this problem, because P = PF(ψ) + P(1 − F(ψ)), and hence F(ψ) and P are separately identified.
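This identification argument can be illustrated with a small simulation. The sketch below is ours, not the authors'; the parameter values are hypothetical and the geometric durations follow eq. (1.2.5).

```python
# Monte Carlo illustration of the identification argument (our sketch; the
# true values P and F below are hypothetical).  Per period, a fresh epsilon
# arrives with probability P; a fresh draw ends an employment spell with
# probability F(psi) and a nonemployment spell with probability 1 - F(psi),
# so durations are geometric as in eq. (1.2.5).
import numpy as np

rng = np.random.default_rng(0)
P, F = 0.6, 0.3
n = 200_000

emp = rng.geometric(P * F, size=n)          # employment spells: exit rate P*F
non = rng.geometric(P * (1 - F), size=n)    # nonemployment spells: P*(1-F)

pf_hat = 1.0 / emp.mean()                   # only P*F is estimable from one spell type
pq_hat = 1.0 / non.mean()                   # only P*(1-F) from the other

P_hat = pf_hat + pq_hat                     # P = P*F(psi) + P*(1-F(psi))
F_hat = pf_hat / P_hat
print(f"P_hat = {P_hat:.3f} (true {P}); F_hat = {F_hat:.3f} (true {F})")
```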
The preceding model assumes that there are natural periods of time within
which innovations in ε may occur. For certain organized markets there may be
well-defined trading intervals, but for the consumer’s problem considered here no
such natural time periods exist. This suggests the following continuous time
reformulation.
In place of the Bernoulli assumption for the arrival of fresh values of ε, suppose
instead that a Poisson process governs the arrival of shocks. As is well known [see,
e.g. Feller (1970)] the Poisson distribution is the limit of a Bernoulli trial process
in which the probability of success P_η in each interval η = Δ/n goes to zero in such a way that lim_{n→∞} nP_η = λ ≠ 0. Thus in the reformulated continuous time
model it is assumed that an infinitely large number of very low probability
Bernoulli trials occur within a specified interval of time.
For a time homogeneous environment the probability of receiving j offers in time period t₁ is

$$\frac{e^{-\lambda t_1}(\lambda t_1)^j}{j!}.$$

Thus for the continuous time model the probability that a person who begins employment at a = a₁ will stay in the employed state at least t₁ periods is, by reasoning analogous to that used to derive (1.2.6),

$$\Pr(T_1 > t_1) = \sum_{j=0}^{\infty}\exp(-\lambda t_1)\frac{(\lambda t_1)^j}{j!}\,(1-F(\psi))^j = \exp(-\lambda F(\psi)\,t_1). \tag{1.2.8}$$
A more direct way to derive (1.2.8) notes that from the definition of a Poisson process, the probability of receiving a new value of ε in interval (a, a + Δ) is

$$p = \lambda\Delta + o(\Delta),$$

where lim_{Δ→0}(o(Δ)/Δ) = 0, and the probability of exiting the employment state conditional on an arrival of ε is F(ψ). Hence the exit rate or hazard rate from the
employment state is

$$h = \lim_{\Delta\to 0}\frac{\lambda\Delta\,F(\psi)+o(\Delta)}{\Delta} = \lambda F(\psi).$$
Using (1.1.4) relating the hazard function and the survivor function we conclude
that
$$\Pr(T_1 > t_1\mid \lambda) = \exp(-\lambda F(\psi)\,t_1).$$
Analogous to the identification result already presented for the discrete time model, it is impossible using single spell employment or nonemployment data to separate λ from F(ψ) or 1 − F(ψ) respectively. However, access to data on both employment and nonemployment spells makes it possible to identify both λ and F(ψ).
The assumption of time homogeneity of the environment is made only to simplify the argument. Suppose that nonmarket time arrives via a nonhomogeneous Poisson process, so that the probability of receiving one nonmarket draw in interval (a, a + Δ) is

$$\lambda(a)\Delta + o(\Delta).$$

Assuming that W and Y remain constant, the hazard rate for exit from employment at age a for a spell that begins at a₁ is λ(a)F(ψ).⁴
⁴As first noted by Lundberg (1903), it is possible to transform this model to a time homogeneous Poisson model if we redefine duration time to be

$$\tau^*(t_1,a_1) = \int_{a_1}^{a_1+t_1}\lambda(u)\,du.$$
Allowing for time inhomogeneity in Y(a) and W(a) raises a messy, but not especially deep, problem. It is possible that the values of these variables would change at a point in time in between the arrivals of ε values, and that such changes would result in a reversal of the sign of Z*(a), so that the consumer would cease working at points in time when ε did not change. Conditioning on the paths of Y(a) and W(a) formally eliminates the problem.
By similar reasoning,

$$P(T_1 > t_1\mid a_1) = \exp\left(-F(\psi)\int_{a_1}^{a_1+t_1}\lambda(u)\,du\right).$$
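A short simulation illustrates the time transformation in footnote 4. The rate function λ(u), the value of F(ψ), and the start date below are all illustrative assumptions, not values from the text.

```python
# Sketch of the Lundberg time change of footnote 4 (our illustration; the
# rate lambda(u) = 1 + 0.5*sin(u), the exit probability F_psi, and the start
# date a1 are made-up values).  Exit times with hazard lambda(a)*F(psi)
# become exponential with parameter F(psi) in the operational time tau*.
import numpy as np

rng = np.random.default_rng(1)
F_psi, a1 = 0.4, 0.0
Lam = lambda a: a - 0.5 * np.cos(a)          # integral of lambda(u) = 1 + 0.5*sin(u)

def draw_duration():
    # Invert F_psi * (Lam(a1 + t) - Lam(a1)) = e, with e a unit exponential draw.
    e = rng.exponential()
    lo, hi = 0.0, 1.0
    while F_psi * (Lam(a1 + hi) - Lam(a1)) < e:
        hi *= 2.0                            # bracket the solution
    for _ in range(60):                      # bisection
        mid = (lo + hi) / 2.0
        if F_psi * (Lam(a1 + mid) - Lam(a1)) < e:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

t1 = np.array([draw_duration() for _ in range(20_000)])
tau_star = Lam(a1 + t1) - Lam(a1)            # redefined (operational) duration
print(f"mean tau* = {tau_star.mean():.3f}; theoretical 1/F(psi) = {1/F_psi:.3f}")
```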
This model is well exposited in Lippman and McCall (1976). The environment is
assumed to be time homogeneous. Agents are assumed to be income maximizers.
If an instantaneous cost c is incurred, job offers arrive from a Poisson process with parameter λ independent of the level of c (c > 0). The probability of receiving a wage offer in time interval Δt is λΔt + o(Δt).⁵ Thus the probability of two or more job offers in interval Δt is negligible.⁶
Successive wage offers are independent realizations from a known absolutely
continuous wage distribution F(W) with finite mean that is assumed to be
common to all agents. Once refused, wage offers are no longer available. Jobs last
forever, there is no on the job search, and workers live forever. The instantaneous
rate of interest is r (> 0).
V is the value of search. Using Bellman’s optimality principle for dynamic
programming [see, e.g. Ross (1970)], V may be decomposed into three compo-
nents plus a negligible component [of order o(A t)].
$$V = \frac{-c\,\Delta t}{1+r\,\Delta t} + \frac{1-\lambda\,\Delta t}{1+r\,\Delta t}\,V + \frac{\lambda\,\Delta t}{1+r\,\Delta t}\,E\!\left[\max\!\left(\frac{W}{r},\,V\right)\right] + o(\Delta t)\quad\text{if } V>0;$$
$$V = 0 \quad\text{otherwise.} \tag{1.2.12}$$
The first term on the right of (1.2.12) is the discounted cost of search in interval
At. The second term is the probability of not receiving an offer (1 - X At) times
the discounted value of search at the end of interval At. The third term is the
probability of receiving a wage offer, (A At), times the discounted value of the
expectation [computed with respect to F(W)] of the maximum of the two options
confronting the agent who receives a wage offer: to take the offer (with present
value w/r) or to continue searching (with present value I’). Note that eq. (1.2.12)
is defined only for V> 0. If I/= 0, we may define the agent as out of the labor
force [see Lippman and McCall (1976)]. As a consequence of the time homogene-
ity of the environment, once out the agent is always out. Sufficient to ensure the
existence of an optimal reservation wage policy in this model is E( ]Wl) < cc
[Robbins (1970)].
Collecting terms in (1.2.12) and passing to the limit, we reach the familiar formula [Lippman and McCall (1976)]

$$rV = -c + \frac{\lambda}{r}\int_{rV}^{\infty}(w - rV)\,dF(w). \tag{1.2.13}$$
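Because (1.2.13) is an implicit equation in rV, in practice it must be solved numerically (a point taken up again in Section 1.6). The sketch below is a minimal illustration: the lognormal offer distribution and every parameter value are assumptions for exposition, not values from the text.

```python
# Minimal sketch of solving the implicit reservation wage equation (1.2.13):
# r*V = -c + (lambda/r) * E[max(W - r*V, 0)].  The lognormal offer
# distribution and all parameter values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
c, lam, r = 0.5, 1.0, 0.05
W = rng.lognormal(mean=0.0, sigma=0.5, size=200_000)   # draws from F(W)

def excess(x):
    # g(x) = -c + (lambda/r)*E[max(W - x, 0)] - x is strictly decreasing in x,
    # so the fixed point x = r*V is unique when g(0) > 0 (i.e. when V > 0).
    return -c + (lam / r) * np.maximum(W - x, 0.0).mean() - x

lo, hi = 0.0, 1e3
for _ in range(80):                          # bisection on the fixed point
    mid = (lo + hi) / 2.0
    lo, hi = (mid, hi) if excess(mid) > 0 else (lo, mid)

rV = (lo + hi) / 2.0
print(f"rV = {rV:.3f}; acceptance probability 1 - F(rV) = {(W > rV).mean():.3f}")
```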
In the stationary environment, the probability of receiving an offer in interval Δ is

$$p = \lambda\Delta + o(\Delta), \tag{1.2.14}$$

and the probability that an offer is accepted is 1 − F(rV), so that the hazard rate for leaving unemployment is λ(1 − F(rV)).
For discussion of the economic content of this model, see, e.g. Lippman and
McCall (1976) or Flinn and Heckman (1982a).
Accepted wages are truncated random variables with rV as the lower point of truncation. The density of accepted wages is

$$f(w\mid W \ge rV) = \frac{f(w)}{1-F(rV)}, \qquad w \ge rV,$$

where f is the density of F. Thus the one spell search model has the same statistical structure for accepted wages as other models of self selection in labor economics [Lewis (1974), Heckman (1974), and the references in Amemiya (1984)].
From the assumption that wages are distributed independently of wage arrival times, the joint density of duration times (t_u) and accepted wages (w) is the product of the density of each random variable.

Now suppose the environment is time inhomogeneous, so that the probability of receiving a wage offer in interval (τ, τ + Δ) is

$$p(\tau) = \lambda(\tau)\Delta + o(\Delta). \tag{1.2.19}$$

The probability that it is accepted is 1 − F(rV(τ)).⁷ Thus the hazard rate at time τ for exit from an unemployment spell is λ(τ)(1 − F(rV(τ))), and the probability that a spell beginning at τ₁ lasts at least t_u periods is⁸

$$\exp\left(-\int_{\tau_1}^{\tau_1+t_u}\lambda(z)\left(1-F(rV(z))\right)dz\right). \tag{1.2.21}$$
⁷For time inhomogeneity induced solely by the finiteness of life, the reservation wage property characterizes an optimal policy (see, e.g. DeGroot, 1970).
⁸Note that in this model it is trivial to introduce time varying forcing variables, because by assumption the agent cannot accept a job in between arrivals of job offers. Compare with the discussion in footnote 4.
As in the marketing literature (see, e.g. Hauser and Wisniewski, 1982a, b, and its nonstationary extension in Singer, 1982), we imagine consumer choice as a sequential affair. An individual goes to a grocery store at randomly selected times. Let h(τ) be the hazard function associated with the density generating the probability of the event that the consumer goes to the store at time τ. We assume that the probability of two or more visits to the store in interval Δ is o(Δ). Conditional on arrival at the store, he may purchase one of J items. Denote the purchase probability by P_j(τ). Choices made at different times are assumed to be independent, and they are also independent of arrival times. Then the probability that the consumer purchases good j at time τ is P_j(τ)h(τ), so that the probability that the next purchase is item j at a time t = τ + τ₁ or later is
The P_j may be specified using one of the many discrete choice models discussed in Amemiya's survey (1981). For the McFadden random utility model with Weibull errors (1974), the P_j are multinomial logit. For the Domencich-McFadden (1975) random coefficients preference model with normal coefficients, the P_j are specified by multivariate probit.
In the dynamic McFadden model few new issues of estimation and specifica-
tion arise that have not already been discussed above or in Amemiya’s survey
article (1984). For concreteness, we consider the most elementary version of this
model.
Following McFadden (1974), the utility associated with each of J possible choices at time τ is written as

$$U_j(\tau) = v(\tau, x_j(\tau)) + \varepsilon(\tau, x_j(\tau)), \qquad j = 1,\ldots,J.$$

With v linear in parameters and the ε's independent Weibull (extreme value) errors, the choice probabilities are multinomial logit:

$$P_j(\tau) = \frac{\exp(x_j(\tau)\beta(\tau))}{\sum_{l=1}^{J}\exp(x_l(\tau)\beta(\tau))}.$$
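A minimal simulation of this elementary dynamic McFadden model is sketched below; the visit rate, the number of items, and the index values x_jβ are hypothetical choices of ours.

```python
# Sketch of the elementary dynamic McFadden model: Poisson store visits at
# rate h, independent multinomial logit choices at each visit.  The rate,
# the number of items J, and the index values are hypothetical.
import numpy as np

rng = np.random.default_rng(3)
h, J, T = 2.0, 3, 1000.0                     # visit hazard, items, time horizon
xb = np.array([0.0, 0.5, 1.0])               # x_j * beta for each item

P = np.exp(xb) / np.exp(xb).sum()            # logit purchase probabilities P_j
gaps = rng.exponential(1.0 / h, size=int(3 * h * T))
visits = np.cumsum(gaps)
visits = visits[visits < T]                  # Poisson process of store visits
choices = rng.choice(J, size=visits.size, p=P)

# By independent thinning, purchases of item j form a Poisson process with
# rate h*P_j, so the wait to the next purchase of j is exponential(h*P_j).
waits = [np.diff(visits[choices == j]).mean() for j in range(J)]
print("P_j:", P.round(3))
print("mean wait:", np.round(waits, 2), "theory:", np.round(1 / (h * P), 2))
```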
The most direct approach to estimating the economic duration models presented
in Section 1.2 is to specify functional forms for the economic parameters and their
dependence on observed and unobserved variables. This approach is both costly
and controversial. It is controversial because economic theory usually does not
produce these functional forms: at best it specifies potential lists of regressor
variables some portion of which may be unobserved in any data set. Moreover in
many areas of research such as in the study of unemployment durations, there is
no widespread agreement in the research community about the correct theory.
The approach is costly because it requires nonlinear optimization of criterion
functions that often can be determined only as implicit functions. We discuss this
point further in Section 1.6.
Because of these considerations and because of a widespread belief that it is
useful to get a “feel for the data” before more elaborate statistical models are fit,
reduced form approaches are common in the duration analysis literature. Such an
approach to the data is inherently ad hoc because the true functional form of the
duration model is unknown. At issue is the robustness of the qualitative in-
ferences obtained from these models with regard to alternative ad hoc specifica-
tions. In this section of the paper we review conventional approaches and reveal
their lack of robustness. Section 1.4 presents our response to this lack of
robustness.
The problem of nonrobustness arises solely because regressors and unobserv-
ables are introduced into the duration model. If unobservables were ignored and
the available data were sufficiently rich, it would be possible to estimate a
duration model by a nonparametric Kaplan-Meier procedure [see, e.g. Lawless
(1982) or Kalbfleisch and Prentice (1980)]. Such a general nonparametric ap-
proach is unlikely to prove successful in econometrics because (a) the available
samples are small especially after cross classification by regressor variables and
(b) empirical modesty leads most analysts to admit that some determinants of any
duration decision may be omitted from the data sets at their disposal.
Failure to control for unobserved components leads to a well known bias
toward negative duration dependence. This is the content of the following
proposition:
Proposition 1
Uncontrolled unobservables bias estimated hazards towards negative duration
dependence. 0
The proof is a straightforward application of the Cauchy-Schwarz inequality. Let h(t|x,θ) be the hazard conditional on x and θ, and let h(t|x) be the hazard conditional only on x. These hazards are associated respectively with the conditional distributions G(t|x,θ) and G(t|x).
From the definition,

$$h(t\mid x) = \frac{\int_\theta h(t\mid x,\theta)\,(1-G(t\mid x,\theta))\,d\mu(\theta)}{\int_\theta (1-G(t\mid x,\theta))\,d\mu(\theta)}.$$

Thus⁹

$$\frac{\partial h(t\mid x)}{\partial t} = \frac{\int_\theta (1-G(t\mid x,\theta))\,\frac{\partial h(t\mid x,\theta)}{\partial t}\,d\mu(\theta)}{\int_\theta (1-G(t\mid x,\theta))\,d\mu(\theta)} - \left[\frac{\int_\theta h^2(t\mid x,\theta)(1-G(t\mid x,\theta))\,d\mu(\theta)}{\int_\theta (1-G(t\mid x,\theta))\,d\mu(\theta)} - h^2(t\mid x)\right]. \tag{1.3.1}$$

The first term on the right-hand side is the average derivative of the conditional hazard among survivors; the bracketed second term is the variance of the conditional hazard among survivors, which is nonnegative by the Cauchy-Schwarz inequality, so the observed hazard always exhibits more negative duration dependence than the average structural hazard.
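A two-point numerical example, with made-up hazards and mixing proportions, illustrates Proposition 1 directly.

```python
# Two-point numerical example of Proposition 1 (made-up numbers): both types
# have CONSTANT hazards, yet the observed hazard declines in t because the
# surviving population is increasingly composed of low-theta individuals.
import numpy as np

theta = np.array([0.5, 2.0])                 # type-specific constant hazards
p = np.array([0.5, 0.5])                     # mixing distribution mu(theta)
t = np.linspace(0.0, 4.0, 9)

S = (p * np.exp(-np.outer(t, theta))).sum(axis=1)            # observed survivor
g = (p * theta * np.exp(-np.outer(t, theta))).sum(axis=1)    # observed density
h_obs = g / S                                # observed hazard h(t|x) of (1.3.1)

print(np.round(h_obs, 3))                    # declines from 1.25 toward 0.5
```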
$$v(t) = \begin{cases} 0, & t < L, \\ 1, & t \ge L. \end{cases} \tag{1.3.3}$$
$$= \int_\theta \left[h(t\mid x(t),\theta)\right]^{1-v(t)} S(t\mid x(t),\theta)\,d\mu(\theta). \tag{1.3.4}$$
¹⁰Heckman and Singer (1982) present some examples. They are not hard to generate for anyone with access to tables of integral transforms.
such variables because introducing them into the analysis raises computational problems. Except for special time paths of the variables, the term

$$\int_0^t h(u\mid x(u),\theta)\,du,$$

which appears in survivor function (1.1.8), does not have a closed form expression. To evaluate it requires numerical integration.
To circumvent this difficulty, one of two expedients is often adopted (see, e.g. Lundberg, 1981, Cox and Lewis, 1966): (i) replacing time trended variables with their within spell average, or (ii) replacing time trended variables with their value at the start of the spell, x(0).
Expedient (i) has the undesirable effect of building spurious dependence between duration time t and the manufactured regressor variable. To see this most clearly, suppose that x is a scalar and x(u) = a + bu. Then clearly

$$\bar{x}(t) = a + \frac{b}{2}t,$$

and t and x̄(t) are linearly dependent. Expedient (ii) ignores the time inhomogeneity in the environment.¹¹
To illustrate the potential danger from adopting these expedients consider the
numbers presented in Table 1. These record Weibull hazards ((1.3.2) with γ₂ = 0 and λ₁ = 0) estimated on data for employment to nonemployment transitions
using the CTM program described by Hotz (1983). In these calculations, unob-
servables are ignored. A job turnover model estimated using expedient (i) indi-
cates weak negative duration dependence (column one row two) and a strong
negative effect of high national unemployment rates on the rate of exiting jobs.
The same model estimated using expedient (ii) now indicates (see column two)
strong negative duration dependence and a strong positive effect of high national unemployment rates on the rate of exiting jobs.
¹¹Moreover, in the multistate models with heterogeneity that are presented in Part II of this paper, treating x(0) as exogenous is incorrect because the value of x(0) at the start of the current spell depends on the lengths and outcomes of preceding spells. See the discussion in Section 2.2. This problem is also discussed in Flinn and Heckman (1982b, p. 62).
Table 1
Weibull model - Employment to nonemployment transitions
(absolute value of normal statistics in parentheses)a
These estimates are based on the hazard specification

$$\ln h(t\mid x,\theta) = x(t)\beta + \left(\frac{t^{\lambda_1}-1}{\lambda_1}\right)\gamma_1 + \theta.$$

If

$$x(t) = \frac{t^{\lambda_1}-1}{\lambda_1},$$

β and γ₁ are not separately identified. Thus failure to control for time varying regressor variables may mislead, but introducing such variables may create an identification problem.

Table 2
Sensitivity to misspecification of the mixing distribution μ(θ)ᵃ'ᵇ
Next we consider the consequence of misspecifying the distribution of unobservables. Table 2 records estimates of a Weibull duration model with three different specifications for μ(θ), as indicated in the column headings. The estimates and inference vary greatly depending on the functional form selected for the mixing distribution. Trussell and Richards (1983) report similar results and exhibit similar sensitivity to the choice of the functional form of the conditional hazard h(t|x,θ) for a fixed μ(θ).
(A) What features, if any, of h(t|x,θ) and/or μ(θ) can be identified from the "raw data", i.e. G(t|x)?
(B) Under what conditions are h(t|x,θ) and μ(θ) identified? i.e. how much a priori information has to be imposed on the model before these functions are identified?
(C) What empirical strategies exist for estimating h(t|x,θ) and/or μ(θ) nonparametrically, and what is their performance?
This section presents criteria that can be used to test the null hypothesis of no
structural duration dependence and that can be used to assess the degree of model
complexity that is required to adequately model the duration data at hand. The
criteria to be set forth here can be viewed in two ways: As identification theorems
and as empirical procedures to use with data.
We consider the following problem: G(t|x) is estimated. We would like to infer properties of G(t|x,θ) without adopting any parametric specification for μ(θ) or h(t|x,θ). We ignore any initial conditions problems. We further assume that x(t) is time invariant.¹²
As a consequence of Proposition 1 proved in the preceding section, if G(t|x) exhibits positive duration dependence for some intervals of t values, h(t|x,θ) must exhibit positive duration dependence for some interval of θ values in those intervals of t. As noted in Section 1.3, this is so because the effect of scalar heterogeneity is to make the observed conditional duration distribution exhibit more negative duration dependence (more precisely, never less negative duration dependence) than does the structural hazard h(t|x,θ).
In order to test whether or not an empirical G(t|x) exhibits positive duration dependence, it is possible to use the total time on test statistic (Barlow et al. 1972, p. 267). This statistic is briefly described here. For each set of x values, constituting a sample of I_x durations, order the first k durations starting with the smallest:

$$D_{1:I_x} \le D_{2:I_x} \le \cdots \le D_{k:I_x}.$$
¹²If x(t) is not time invariant, additional identification problems arise. In particular, nonparametric estimation of G(t|x(t)) becomes much more difficult.
V_k is called the cumulative total time on test statistic. If the observations are from a distribution with an increasing hazard rate, V_k tends to be large. Intuitively, if G(t|x) is a distribution that exhibits positive duration dependence, D_{2:I_x} stochastically dominates D_{1:I_x}, D_{3:I_x} stochastically dominates D_{2:I_x}, and so forth. Critical values for testing the null hypothesis of no duration dependence have been presented by Barlow and associates (1972, p. 269). This test can be modified to deal with censored data (Barlow et al. 1972, p. 302). The test is valuable because it enables the econometrician to test for positive duration dependence without imposing any arbitrary parametric structure on the data.
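A sketch of the statistic follows. The exact critical values are tabled in Barlow et al. (1972); the normal approximation used below (under the null, the statistic is a sum of n − 1 iid uniforms) is our stand-in for those tables, and the samples are simulated.

```python
# Sketch of the cumulative total time on test (TTT) statistic for detecting
# duration dependence without parametric assumptions.
import numpy as np

def cumulative_ttt(durations):
    x = np.sort(np.asarray(durations, dtype=float))
    n = x.size
    gaps = np.diff(np.concatenate([[0.0], x]))       # spacings of order stats
    ttt = np.cumsum((n - np.arange(n)) * gaps)       # total time on test
    return ttt[:-1].sum() / ttt[-1]                  # cumulative TTT statistic V

rng = np.random.default_rng(4)
n = 200
v_null = cumulative_ttt(rng.exponential(1.0, n))     # constant hazard
v_ifr = cumulative_ttt(rng.weibull(2.0, n))          # increasing hazard

# Under the null (exponentiality), V is a sum of n-1 iid U(0,1) variables.
z = lambda v: (v - (n - 1) / 2.0) / np.sqrt((n - 1) / 12.0)
print(f"z(exponential) = {z(v_null):.2f}; z(increasing hazard) = {z(v_ifr):.2f}")
```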
Negative duration dependence is more frequently observed in economic data.
That this should be so is obvious from eq. (1.3.1) in the proof of Proposition 1.
Even when the structural hazard has a positive derivative ∂h(t|x,θ)/∂t > 0, it
often occurs that the second term on the right-hand side of (1.3.1) outweighs the
first term. It is widely believed that it is impossible to distinguish structural
negative duration dependence from a pure heterogeneity explanation of observed
negative duration dependence when the analyst has access only to single spell
data. To investigate duration distributions exhibiting negative duration depen-
dence, it is helpful to distinguish two families of distributions.
Let 𝒢₁ = {G: −ln[1−G(t|x)] is concave in t, holding x fixed}. Membership in this class can be determined from the total time on test statistic. If G is log concave, the D_{i:I_x} defined earlier are stochastically increasing in i for fixed I_x and x. Ordering the observations from the largest to the smallest and changing the subscripts appropriately, we can use V_k to test for log concavity.

Next let 𝒢₂ = {G: G(t|x) = ∫(1 − exp(−tφ(x)η(θ)))dμ(θ) for some probability measure μ on [0,∞]}. It is often erroneously suggested that 𝒢₁ = 𝒢₂, i.e. that negative duration dependence by a homogeneous population (G ∈ 𝒢₁) cannot be distinguished from a pure heterogeneity explanation (G ∈ 𝒢₂).

In fact, by virtue of Bernstein's theorem (see, e.g. Feller, 1971, pp. 439-440), if G ∈ 𝒢₂ its survivor function is completely monotone, i.e. its derivatives alternate in sign:

$$(-1)^j\,\frac{\partial^j}{\partial t^j}\left(1-G(t\mid x)\right) \ge 0, \qquad j = 0,1,2,\ldots$$

[see Heckman and Singer (1982) and Lancaster and Nickell (1980)].
(1.4.3)
where

$$\Delta^0 S(t\mid x) = S(t\mid x), \qquad \Delta^1 S(t\mid x) = S(t\mid x) - S(t+1\mid x), \qquad \ldots \tag{1.4.6}$$
and, in what follows, set dν(t) = v(t)dt and g(t|θ) = β(θ)[exp tθ]v(t). Then the density g(t) = ∫β(θ)exp(tθ)v(t)m(θ)dθ governs the observable durations of spells, g(t|θ) is a member of the exponential family, and k(t|θ) = β(θ)exp(tθ) is totally positive (Karlin, 1968). The essential point in isolating this class of duration densities is that knowledge of the number and character of the modes of g/v implies that the density, m, of the mixing measure must have at least as many modes. In particular, if g/v is unimodal, m cannot be monotonic; it must have at least one mode. More generally, if c is an arbitrary positive level and (g(t)/v(t)) − c changes sign k times as t increases from 0 to +∞, then m(θ) − c must change sign at least k times as θ traverses the parameter set Θ from left to right (Karlin, 1968, p. 21).
The importance of this variation-diminishing character of the transformation ∫k(t|θ)m(θ)dθ for modeling purposes is that if we assess the modality of g using, for example, the method of Hartigan and Hartigan (1985), then because v is given a priori, we know the modality of g/v, which in turn implies restrictions on m in fitting mixing densities to data. In terms of a strategy of fitting finite mixtures, a bimodal g/v suggests fitting a measure with support at, say, five
points to the data, but subject to the constraints that p₁ < p₂, p₂ > p₃, p₃ < p₄, and p₄ > p₅, as shown in Figure 1.
Subsequent specification of a mixing density m(θ) to describe the same data could proceed by fitting spline polynomials with knots at θ₁,…,θ₅ to the estimated discrete mixing distribution.
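The complete-monotonicity implication of membership in 𝒢₂ can be checked directly with the difference operators of (1.4.6). The sketch below uses illustrative survivor functions of our own choosing: a two-point mixture of exponentials passes the check, while a Weibull with increasing hazard, which cannot be a mixture of exponentials, fails it.

```python
# Checking the discrete analogue of complete monotonicity with the Delta
# operators of eq. (1.4.6): for S(t|x) in the mixture class G_2, every
# iterated difference Delta^j S must be nonnegative.  Both survivor
# functions and their parameters below are illustrative assumptions.
import numpy as np

t = np.arange(0, 20)
S_mix = 0.5 * np.exp(-0.3 * t) + 0.5 * np.exp(-1.5 * t)   # two-point mixture
S_wbl = np.exp(-(t / 4.0) ** 2.0)                         # increasing hazard

def differences_nonnegative(S, max_order=6):
    d = S.astype(float).copy()
    for _ in range(max_order):
        d = d[:-1] - d[1:]      # Delta^j S(t) = Delta^{j-1}S(t) - Delta^{j-1}S(t+1)
        if np.any(d < -1e-12):
            return False
    return True

print(differences_nonnegative(S_mix), differences_nonnegative(S_wbl))  # True False
```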
1.4.3. Identifiability

Define

$$Z(t) = \int_0^t \psi(u)\,du.$$
Then for the proportional hazard model (1.4.7) we have the following proposition due to Elbers and Ridder (1982).

Proposition 2

If (i) E(θ) = 1, (ii) Z(t) defined on [0,∞) can be written as the integral of a nonnegative integrable function ψ(t) defined on [0,∞), Z(t) = ∫₀ᵗψ(u)du, and (iii) the set S, x ∈ S, is an open set in ℝᵏ and the function φ is defined on S and is nonnegative, differentiable and nonconstant on S, then Z, φ, and μ(θ) are identified. □
$$1-G_1(t\mid x) = \mathcal{L}_{\mu_1}(\omega') = \mathcal{L}_{\mu_0}(\omega) = 1-G_0(t\mid x),$$

so a model with a different (α, β) pair and a mixing distribution with E(θ) = ∞ explains the data as well as the original model (α = α₀, β = β₀ and μ = μ₀ with E(θ) < ∞).
The requirement that E(θ) < ∞ is overly strong. Heckman and Singer (1984a) establish identifiability when E(θ) = ∞ by restricting the tail behavior of the admissible mixing distribution. Their results are recorded in the following proposition.
Proposition 3

If

(i)

$$\mu(\theta) \sim (\ln\theta)^{\gamma}\,\theta^{\varepsilon-1}\,e^{-c\theta}\,L(\theta) \tag{1.4.8}$$

as θ → ∞, where c > 0, 0 < ε < 1 and γ ≥ 0, and where L(θ) is slowly varying in the sense of Karamata.¹⁴ ε is assumed known.

(ii) Z ∈ 𝒵 = {Z(t), t ≥ 0: Z(t) is a nonnegative increasing function with Z(0) = 0, and ∃ c > 0 and t⁺, not depending on the function Z(t), such that Z(t⁺) = c, where c is a known constant}.

(iii) φ ∈ Φ = {φ(x), x ∈ S: φ is nonconstant on S, ∃ at least one coordinate xᵢ defined on (−∞,∞) such that φ(0,0,…,xᵢ,0,…) traverses (0,∞) as xᵢ traverses (−∞,∞), 0 ∈ S, and φ(0) = 1}.

Then Z, φ, and μ are identified. □ For proof, see Heckman and Singer (1984a).
Condition (i) is weaker than the Elbers and Ridder condition (i): θ need not possess moments of any order, nor need the distribution function μ have a density. However, in order to satisfy (i) the tails of the true distribution are assumed to die off at a fast enough rate, and the rate is assumed known. The condition that Z(t⁺) = c for some c > 0 and t⁺ > 0 for all admissible Z plays an important role. This condition is satisfied, for example, by a Weibull integrated hazard, since for all α, Z(1) = 1. The strengthened condition (ii) substitutes for the weakened (i) in our analysis. Condition (iii) has identical content in both analyses.
The essential idea in both is that φ varies continuously over an interval. In the
¹⁴Heckman and Singer (1984a) also present conditions for μ(θ) that are not absolutely continuous. For a discussion of slowly varying functions see Feller (1971, p. 275).
for all t ≥ 0 must hold for at least two distinct pairs (α₀, μ₀), (α₁, μ₁). We then derive contradictions. We demonstrate under certain stated conditions that these identities cannot hold unless α₀ = α₁. Then μ is identified by the uniqueness theorem for Laplace transforms.
¹⁵As previously noted, in their Appendix Elbers and Ridder (1982) generalize their proofs to a case in which all of the regressors are discrete valued. However, a regressor is required in order to secure identification.
$$Z'(t) = \exp\left(\gamma\,\frac{t^{\lambda}-1}{\lambda}\right).$$
For this class of hazard models there is an interesting tradeoff between the interval of admissible λ and the number of bounded moments that is assumed to restrict the admissible μ(θ). More precisely, the following propositions are proved in our joint work.
Proposition 4

For the true value of λ, λ₀, defined so that λ₀ ≤ 0, if E(θ) < ∞ for all admissible μ, and for all bounded γ, then the triple (γ₀, λ₀, μ₀) is uniquely identified. □ [For proof, see Heckman and Singer (1984a).]
Proposition 5

For the true value of λ, λ₀, such that 0 < λ₀ < 1, if all admissible μ are restricted to have a common finite mean that is assumed to be known a priori (E(θ) = m₀) and a bounded (but not necessarily common) second moment E(θ²) < ∞, and all admissible γ are bounded, then the triple (γ₀, λ₀, μ₀) is uniquely identified. □ (For proof see Heckman and Singer, 1984a.)
Proposition 6

For the true value of λ, λ₀, restricted so that 0 < λ₀ < j, j a positive integer, if all admissible μ are restricted to have a common finite mean that is assumed to be known a priori (E(θ) = m₀) and a bounded (but not necessarily common) (j+1)st moment (E(θ^{j+1}) < ∞), and all admissible γ are bounded, then the triple (γ₀, λ₀, μ₀) is uniquely identified. □ (For proof see Heckman and Singer, 1984a.)
Proposition 7

For hazard model (1.4.9), the triple (λ₀, α₀, μ₀) is identified provided that the admissible μ are restricted to have a common finite mean E(θ) = m₀ < ∞. □ (For proof, see Heckman and Singer, 1984a.)
An interesting and more direct strategy of proof of identifiability which works
for some of the hazard model specifications given above is due to Arnold and
Brockett (1983). To illustrate their argument, consider the Weibull hazard
$$h(t\mid\theta) = \theta\,\alpha\,t^{\alpha-1},$$

and mixing distributions restricted to those having a finite mean. Then a routine calculation shows that α may be calculated directly in terms of the observed survivor function. The mixing distribution is then identified using the uniqueness theorem for Laplace transforms. Their proof of identifiability is constructive, in that it also provides a direct procedure for estimation of μ(θ) and α that is distinct from the procedure discussed below.
Provided that one adopts a parametric position on h(t|θ), these propositions show that it is possible to completely dispense with regressors. Another way to interpret these results is to note that since for each value of x we may estimate Z and μ(θ), it is not necessary to adopt proportional hazards specification (1.4.7) in order to secure model identification. All that is required is a conditional (on x) proportional hazards specification. Z and μ may be arbitrary functions of x.
Although we have no theorems yet to report, it is obvious that it should be possible to reverse the roles of μ(θ) and h(t|θ): i.e. if μ(θ) is parameterized, it should be possible to specify conditions under which h(t|θ) is identified nonparametrically.
The identification results reported here are quite limited in scope. First, as previously noted in Section 1.3, the restriction that the regressors are time invariant is crucial. If the regressors contain a common (to all observations) time trended variable, φ can be identified from ψ only if strong functional form assumptions are maintained so that ln ψ and ln φ are linearly independent. Since one cannot control the external environment, it is always possible to produce a ψ function that fails this linear independence test. Moreover, even when x(t) follows a separate path for each person, so that there is independent variation between ln ψ(t) and ln φ(t), at least for some observations, a different line of proof is required than has been produced in the literature.
Second, and more important, the proportional hazard model is not derived
from an economic model. It is a statistically convenient model. As is implicit from
the models presented in Section 1.2 and as will be made explicit in Section 1.6
duration models motivated by economic theory cannot in general be cast into a
proportional hazards mold. Accordingly, the identification criteria discussed in
this section are of limited use in estimating explicitly formulated economic
models. In general, the hazard functions produced by economic theory are not
separable as is assumed in (1.4.7).
Research is underway on identifiability conditions for nonseparable hazards.
As a prototype we present the following identification theorem for a specific
nonseparable hazard.
Proposition 8

The nonseparable model with (i) Z_x(t) = t(βx)^{2+θ}, (ii) density w(x|θ) = (θ+β)exp[−(θ+β)x] and (iii) ∫θ dμ(θ) < ∞ is identified. □ For proof, see Heckman and Singer (1983).

Note that not only is the hazard nonseparable in x and θ, but the density of x depends on θ, so that x is not weakly exogenous with respect to θ.
Before concluding this discussion of identification, it is important to note that
the concept of identifiability employed in this and other papers is the requirement
that the mapping from a space of (conditional hazards) × (a restricted class of probability distributions) to (a class of joint frequency functions for durations and covariates) be one to one and onto. This formulation of identifiability is
standard. In this literature there is no requirement of a metric on the spaces or of
completeness. Such requirements are essential if consistency of an estimator is
desired. In this connection, Kiefer and Wolfowitz (1956) propose a definition of
identifiability in a metric space whereby the above-mentioned mapping is 1: 1 on
the completion (with respect to a given metric) of the original spaces. Without
some care in defining the original space, undesirable distributions can appear in
the completions.
As an example, consider a Weibull hazard model with conditional survivor function, given an observed k-dimensional covariate x, defined as

$$S(t\mid x) = \int_0^\infty \exp\left(-t^{\alpha}\exp(x\beta)\,v\right)dF_0(v),$$

where (α, β, F₀) ranges over the Cartesian product of the parameter space of allowed α and β values and the probability distributions on [0,+∞) satisfying ∫₀^∞ v dF₀(v) = 1. The completion contains distributions F₁ on [0,+∞) satisfying ∫₀^∞ u dF₁(u) = ∞. Now observe that if S(t|x) has a representation as defined above for some α ∈ (0,1) and F₀ with mean 1, then it is also a completely monotone function of t. Thus we also have the representation

$$S(t\mid x) = \int_0^\infty \left[\exp\left(-t\exp(x\beta_0)\,u\right)\right]dF_1(u),$$

but now F₁ must have an infinite mean. This implies that (α₀, β₀, F₀) and (1, β₀, F₁) generate the same survivor function. Hence the model is not identifiable on the completion of a space where probability distributions are restricted to have a finite mean.
This difficulty can be eliminated by further restricting F₀ to belong to a uniformly integrable family of distribution functions. Then all elements in the completion with respect to the Kiefer-Wolfowitz and a variety of other metrics will also have a finite mean, and identifiability is again ensured. The comparable requirement for the case when E(V) = ∞ is that (1.4.8) converges uniformly to its limit.
The a priori restriction of identifiability considerations to complete metric
spaces is not only central to establishing consistency of estimation methods but
also provides a link between the concept of identifiability as it has developed in
econometrics and notions of identifiability which are directly linked to con-
sistency as in the engineering literature on control theory.
$$t^* = \varphi(x)\int_0^t \psi(u)\,du = \varphi(x)\,Z(t).$$

For any fixed value of the parameters determining φ(x) and Z(t) in (1.4.7), t* conditional on θ is an exponential random variable, i.e.

$$\Pr(T^* > t^*\mid\theta) = \exp(-\theta\,t^*).$$
For this model, the following propositions can be established for the Nonparametric Maximum Likelihood Estimator (NPMLE).

Proposition 9

Assuming that no points of support {θᵢ} come from the boundary of Θ, the NPMLE is unique. □ (See Heckman and Singer, 1984b.)
Proposition 11

For uncensored data, θ̂_max = 1/t*_min and θ̂_min = 1/t*_max, where "^" denotes the NPMLE estimate, and t*_max and t*_min are, respectively, the sample maximum and
¹⁶In computing the estimator it is necessary to impose all of the identifiability conditions in order to secure consistent estimators. For example, in a Weibull model with E(θ) < ∞, it is important to impose this requirement in securing estimates. As our example in the preceding subsection indicated, there are other models with E(θ) = ∞ that will explain the data equally well. In large samples, this condition is imposed, for example, by restricting the estimated μ(θ) to have a finite mean. Similarly, if identification is secured by tail condition (1.4.8), this must be imposed in selecting a unique estimator. See also the discussion at the end of Section 1.4.3.
minimum values for t*. For censored data, θ̂_min = 0 and θ̂_max = 1/t*_min. □ (See Heckman and Singer, 1984b.)
These propositions show that the NPMLE for μ(θ) in the proportional hazard model is in general unique, and the estimated points of support lie in a region with known bounds (given t*). In computing estimates one can confine attention to this region. Further characterization of the NPMLE is given in Heckman and Singer (1984b).

It is important to note that all of these results are for a given t* = Z(t)φ(x). The computational strategy we use fixes the parameters determining Z(t) and φ(x) and estimates μ(θ). For each estimate of μ(θ) so achieved, Z(t) and φ(x) are estimated by traditional parametric maximum likelihood methods. Then fresh t* are generated and a new μ(θ) is estimated, until convergence occurs. There is no assurance that this procedure converges to a global optimum.
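The inner step of this strategy can be sketched as follows. The code below is our toy reconstruction, not the authors' CTM program: it assumes t* is exponential with rate θ conditional on θ (as above) and fits a discrete μ(θ) with a fixed number of support points by EM; all numerical values are made up.

```python
# Toy EM reconstruction of the inner step: given t* = Z(t)*phi(x), which is
# exponential with rate theta conditional on theta, estimate a discrete
# mixing distribution mu(theta).  All numbers are illustrative.
import numpy as np

rng = np.random.default_rng(5)
true_theta, true_p = np.array([0.5, 2.0]), np.array([0.6, 0.4])
types = rng.choice(2, size=5000, p=true_p)
tstar = rng.exponential(1.0 / true_theta[types])     # simulated t*

theta = np.array([0.2, 1.0])                 # initial support points
p = np.array([0.5, 0.5])                     # initial masses
for _ in range(500):
    f = p * theta * np.exp(-np.outer(tstar, theta))  # p_k * f(t*_i | theta_k)
    post = f / f.sum(axis=1, keepdims=True)          # E-step posteriors
    p = post.mean(axis=0)                            # M-step: masses
    theta = post.sum(axis=0) / (post * tstar[:, None]).sum(axis=0)  # rates

# Propositions 9 and 11: the support estimates lie in [1/max t*, 1/min t*].
print("support:", theta.round(2), "masses:", p.round(2))
```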
In a series of Monte Carlo runs reported in Heckman and Singer (1984b) the
following results emerge.
(i) The NPMLE recovers the parameters governing Z(t) and φ(x) rather well.
(ii) The NPMLE does not produce reliable estimates of the underlying mixing
distribution.
(iii) The estimated c.d.f. for duration times G(tlx) produced via the NPMLE
predicts the sample c.d.f. of durations quite well even in fresh samples of
data with different distributions for the x variables.
A typical run is reported in Table 3. The structural parameters (α₁, α₂) are estimated rather well. The mixing distribution is poorly estimated, but the within sample agreement between the estimated c.d.f. of T and the observed c.d.f. is good. Table 4 records the results of perturbing the model by changing the mean of the regressors from 0 to 10. There is still close agreement between the estimated model (with parameters estimated on a sample where X ~ N(0,1)) and the observed durations (where X ~ N(10,1)).
The NPMLE can be used to check the plausibility of any particular parametric
specification of the distribution of unobserved variables. If the estimated parame-
ters of a structural model achieved from a parametric specification of the
distribution of unobservables are not “too far” from the estimates of the same
parameters achieved from the NPMLE, the econometrician would have much
more confidence in adopting a particular specification of the mixing distribution.
Development of a formal test statistic to determine how far is “too far” is a topic
for the future. However, because of the consistency of the nonparametric maxi-
mum likelihood estimator a test based on the difference between the parameters
of Z(t) and cp(x) estimated via the NPMLE and the same parameters estimated
under a particular assumption about the functional form of the mixing distribu-
tion would be consistent.
Table 3
Results from a typical estimation
ᵃThe numbers reported below the estimates are standard errors from the estimated information matrix for (α, β, θ) given t*. As noted in the text, these have no rigorous justification.
Table 4
Predictions on a fresh sample, X ~ N(10,1)
(The model used to fit the parameters is X ~ N(0,1).)
The fact that we produce a good estimator of the structural parameters while producing a poor estimator for μ suggests that it might be possible to protect against the consequences of misspecification of the mixing distribution by fitting duration models with mixing distributions from parametric families, such as members of the Pearson system, with more than the usual two parameters. Thus the failure of the NPMLE to estimate more than four or five points of increase for μ can be cast in a somewhat more positive light. A finite mixture model with five points of increase is, after all, a nine (independent) parameter model for the mixing distribution. Imposing a false, but very flexible, mixing distribution may not cause much bias in estimates of the structural coefficients. Moreover, for small I, computational costs are lower for the NPMLE than they are for traditional parametric maximum likelihood estimators of μ(θ). The computational costs of precise evaluation of μ(θ) over "small enough" intervals of θ are avoided by estimating a finite mixtures model.
We conclude this section by noting that the Arnold and Brockett (1983) estimator for α discussed in Section 1.4.3 circumvents the need to estimate dμ(θ), and so in this regard is more attractive than the estimator discussed in this subsection. Exploiting the fact that t* is independent of x, it is possible to extend their estimator to accommodate models with regressors. (The independence conditions provide orthogonality restrictions from which it is possible to identify β.) However, it is not obvious how to extend their estimator to deal with censored data. Our estimator can be used without modification on censored data.
There are few duration data sets for which the start date of the sample coincides
with the origin date of all sampled spells. Quite commonly the available data are
random samples of interrupted spells or else are spells that begin after the start
date of the sample. For interrupted spells one of the following duration times may
be observed: (1) time in the state up to the sampling date (T_b), (2) time in the state after the sampling date (T_a), or (3) total time in a completed spell observed at the origin of the sample (T_c = T_a + T_b). Durations of spells that begin after the
origin date of the sample are denoted T_d.
In this section we derive the density of each of these durations for time
homogeneous and time inhomogeneous environments and for models with and
without observed and unobserved explanatory variables. The main message of
this section is that in general the distributions of each of the random variables T_a, T_b, T_c and T_d differ from the population duration distribution G(t). Estimators
based on the wrong duration distribution in general produce invalid estimates of
the parameters of G(t) and will lead to incorrect inference about the population
duration distribution.
We first consider the analytically tractable case of a single spell duration model
without regressors and unobservables in a time homogeneous environment.¹⁷ To simplify notation we assume that the sample at our disposal begins at calendar time 0. Looking backward, a spell of length t_b interrupted at 0 began t_b periods ago. Looking forward, the spell lasts t_a periods after the sampling date. The completed spell is t_c = t_a + t_b in length. We ignore right censoring and assume that the underlying distribution is nondefective. (These assumptions are relaxed in Subsection 1.5.2 below.)
Let k(−t_b) be the intake rate; i.e. t_b periods before the sample begins, k(−t_b) is the proportion of the population that enters the state of interest at time τ = −t_b. The time homogeneity assumption implies that k(−t_b) = k, a constant, for all t_b.
Let g(t) = h(t)exp(−∫₀ᵗh(u)du) be the density of completed durations in the population. The associated survivor function is

$$S(t) = 1-G(t) = \exp\left(-\int_0^t h(u)\,du\right).$$

The proportion of the population with a spell in progress at the sampling date is

$$P_0 = \int_0^\infty k(-t_b)\,(1-G(t_b))\,dt_b = \int_0^\infty k(-t_b)\exp\left(-\int_0^{t_b}h(u)\,du\right)dt_b.$$
Thus the density of an interrupted spell of length t_b is the ratio of the proportion surviving from those who entered t_b periods ago to the total stock:

$$f(t_b) = \frac{k(-t_b)(1-G(t_b))}{P_0} = \frac{k(-t_b)\exp\left(-\int_0^{t_b}h(u)\,du\right)}{P_0}. \tag{1.5.2}$$
¹⁷See Cox (1962), Cox and Lewis (1966), Sheps and Menken (1973), Salant (1977) and Baker and Trivedi (1982) for useful presentations of time homogeneous models.
The density of sampled interrupted spells is not the same as the population
density of completed spells.
The density of sampled completed spells is obtained by the following straightforward argument. In the population, the conditional density of t_c given 0 < t_b < t_c is

$$g(t_c\mid t_b) = \frac{g(t_c)}{1-G(t_b)} = h(t_c)\exp\left(-\int_{t_b}^{t_c}h(u)\,du\right), \qquad t_c > t_b. \tag{1.5.4}$$
(1.5.5)
so
The density of the forward time t_a can be derived from (1.5.4): substitute for t_c using t_c = t_a + t_b, and integrate out t_b using density (1.5.3). Thus
The following results are well known about the distributions of the random variables T_a, T_b and T_c:

(i) If g(t) is exponential with parameter θ (i.e. g(t) = θexp(−tθ)), then so are f(t_a) and f(t_b). The proof is immediate.
(ii) E(T_b) = (m/2)(1+(σ²/m²)),¹⁸ where m = E(T) and σ² = E(T−m)² = ∫₀^∞(t−m)²g(t)dt.
(iii) E(T_a) = (m/2)(1+(σ²/m²)) (since T_a and T_b have the same density).
(iv) E(T_c) = m(1+(σ²/m²)),¹⁹ so E(T_c) = 2E(T_a) = 2E(T_b), and E(T_c) > m unless σ² = 0.
(v) If (−ln(1−G(t))/t) is increasing in t, σ²/m² ≤ 1. (This condition is implied if h(t) = g(t)/(1−G(t)) is increasing in t, i.e. h′(t) > 0.) In this case, E(T_a) = E(T_b) ≤ m. (See Barlow and Proschan, 1975 for proof.)
(vi) If (−ln(1−G(t))/t) is decreasing in t, σ²/m² ≥ 1. (This condition is implied if h′(t) < 0.) In this case E(T_a) = E(T_b) ≥ m. (See Barlow and Proschan, 1975 for proof.)
Result (i) restates the classical result (see Feller, 1970) that if the population distribution of durations is exponential, so are the sample distributions of T_a and T_b. Result (iii) coupled with result (v) indicates that if the population distribution of durations exhibits positive duration dependence, the mean of interrupted spells (T_b) falls short of the population mean duration. Result (iii) coupled with (vi) reverses this ordering for duration distributions with negative duration dependence. Result (iv) indicates that sampled completed spells have a mean in excess of the population mean unless σ² = 0 (hence the term "length biased sampling"), and that completed spells have a mean twice that of interrupted (T_b) or partially completed forward spells (T_a).
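Results (ii)-(iv) are easy to verify by simulation. In the sketch below, spells begin at a constant rate over a long window and we sample those in progress at time 0; the Weibull population and all constants are illustrative choices of ours.

```python
# Simulation of stock sampling: spells start at a constant rate over a long
# window; we keep those in progress at time 0.  The Weibull population
# (negative duration dependence) and all constants are illustrative.
import numpy as np

rng = np.random.default_rng(6)
n, A = 2_000_000, 200.0
start = -rng.uniform(0.0, A, size=n)         # constant intake rate on (-A, 0)
dur = rng.weibull(0.5, size=n)               # population durations, m = 2
m, cv2 = dur.mean(), dur.var() / dur.mean() ** 2

stock = start + dur > 0                      # spell in progress at time 0
tb = -start[stock]                           # interrupted (backward) time T_b
ta = (start + dur)[stock]                    # forward time T_a
tc = dur[stock]                              # completed length T_c

print(f"E(Tb) = {tb.mean():.2f}; (m/2)(1+s2/m2) = {m / 2 * (1 + cv2):.2f}")
print(f"E(Ta) = {ta.mean():.2f}  (same density as T_b)")
print(f"E(Tc) = {tc.mean():.2f}; m(1+s2/m2) = {m * (1 + cv2):.2f}; m = {m:.2f}")
```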
We next present the distribution of T_d, the duration time for spells that begin after the origin date of the sample. Let 𝒯 denote the time a spell begins. The density of 𝒯 is k(τ). Assuming that 𝒯 and T_d are independent, the joint probability that a spell begins at 𝒯 = τ and lasts less than t_d periods is

$$\Pr\{\mathcal{T} = \tau \text{ and } T_d < t_d\} = k(\tau)\,G(t_d).$$
The distributions of T_a, T_b and T_c are of a different functional form than the distribution of T. The only exception is the case in which T is an exponential random variable with parameter λ; in this case T_a and T_b are also exponential with parameter λ. The distribution of T_d has the same functional form as the distribution of T.
Thus in a typical longitudinal sample in which data are available for the
completed portions of durations of spells in progress (T_b) and on durations initiated after the origin date of the sample (T_d), two different distributions are
required to analyze the data.
It is common to "solve" the left censoring problem by assuming that G(t) is exponential. The bias that results from invoking this assumption when it is false can be severe. As an example, suppose that the population distribution of t is Weibull, so

$$1-G(t) = \exp(-t^{\alpha}\varphi).$$

Suppose that the sample data are on the completed portions of interrupted spells and that there is no right censoring, so that, using formula (1.5.6),

$$f(t_b) = \frac{\exp(-t_b^{\alpha}\varphi)}{\varphi^{-1/\alpha}\,\Gamma\!\left(\frac{1}{\alpha}+1\right)}.$$
If it is falsely assumed that g*(t) = λe^{−λt}, the maximum likelihood estimator of λ satisfies

$$\operatorname{plim}\hat{\lambda} = \varphi^{1/\alpha}\,\frac{\Gamma(1/\alpha)}{\Gamma(2/\alpha)}.$$

For α = 2,

$$\operatorname{plim}\hat{\lambda} = (\varphi\pi)^{1/2}.$$

If instead the data are length biased completed spells (T_c) from an exponential population with parameter λ, the analogous calculation gives plim λ̂ = λ/2.
Suppose now that the data are length biased completed spells generated by an exponential population model with parameter λ (so that sampled durations have density λ²t exp(−λt)), and the parameters α and φ are estimated by maximum likelihood. The maximum likelihood estimators satisfy
$$\frac{1}{\hat{\varphi}} = \frac{\sum_{i=1}^{I} t_i^{\hat{\alpha}}}{I},$$

so

$$\frac{1}{\hat{\alpha}} + \frac{\sum_{i=1}^{I}\ln t_i}{I} = \frac{\sum_{i=1}^{I} t_i^{\hat{\alpha}}\,\ln t_i}{\sum_{i=1}^{I} t_i^{\hat{\alpha}}}. \tag{1.5.9}$$
Using

$$\int_0^\infty t^{\beta-1}(\ln t)\exp(-t\lambda)\,dt = \lambda^{-\beta}\left[\Gamma'(\beta) - (\ln\lambda)\,\Gamma(\beta)\right],$$

and the fact that in large samples plim α̂ = α*, where α* is the value of α that solves the population version of (1.5.9), α* is the solution to

$$\frac{1}{\alpha^*} + E(\ln t) = \frac{E(t^{\alpha^*}\ln t)}{E(t^{\alpha^*})},$$

which for the length biased exponential data reduces to

$$\frac{1}{\alpha^*} + \frac{\Gamma'(2)}{\Gamma(2)} = \frac{\Gamma'(\alpha^*+2)}{\Gamma(\alpha^*+2)}. \tag{1.5.11}$$
Since Γ(2) = 1, it is clear that α* = 1 is never a solution of this equation. In fact, since the left hand side is monotone decreasing in α* and the right hand side is monotone increasing in α*, and since at α* = 1 the left hand side exceeds the right hand side, the value of α* that solves (1.5.11) exceeds unity. Thus if a Weibull model is fit by maximum likelihood to length biased completed spells generated by an exponential population model, in large samples positive duration dependence will always be found, i.e. α* > 1.
It can also be shown that

$$\operatorname{plim}\hat{\varphi} = \frac{\lambda^{\alpha^*}}{\Gamma(\alpha^*+2)}.$$
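The limiting result can be checked by simulation: length biased completed spells from a unit exponential population have density λ²te^{−λt}, i.e. gamma(2, 1/λ), and solving the first order condition (1.5.9) then delivers α̂ above one. The sketch below is illustrative; the sample size and seed are arbitrary.

```python
# Checking that a Weibull fit to length biased exponential spells yields
# alpha* > 1.  Sampled completed spells from a unit exponential population
# are gamma(2, 1); the solver implements the first order condition (1.5.9).
import numpy as np

rng = np.random.default_rng(7)
t = rng.gamma(shape=2.0, scale=1.0, size=100_000)   # length biased spells

lnt = np.log(t)
def foc(a):
    # 1/a + mean(ln t) - sum(t^a ln t)/sum(t^a): decreasing in a, so bisect.
    ta = t**a
    return 1.0 / a + lnt.mean() - (ta * lnt).sum() / ta.sum()

lo, hi = 0.05, 20.0
for _ in range(80):
    mid = (lo + hi) / 2.0
    lo, hi = (mid, hi) if foc(mid) > 0 else (lo, mid)

print(f"alpha_hat = {(lo + hi) / 2:.3f}  (exceeds unity, as shown above)")
```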
1.5.2. The densities of T_a, T_b, T_c and T_d in time inhomogeneous environments for models with observed and unobserved explanatory variables
We define k(τ|x(τ),θ) to be the intake rate into a given state at calendar time τ. We assume that θ is a scalar heterogeneity component and x(τ) is a vector of explanatory variables. It is convenient and correct to think of k(τ|x(τ),θ) as the density associated with the random variable 𝒯 for a person with characteristics (x(τ),θ). We continue the useful convention that spells are sampled at τ = 0. The densities of T_a, T_b, T_c and T_d are derived for two cases: (a) conditional on a sample path {x(u)}_{−∞}^{∞}, and (b) marginally on the sample path {x(u)}_{−∞}^{∞} (i.e. integrating it out). We denote the distribution of {x(u)}_{−∞}^{∞} as D(x), with associated density dD(x).
Note that, unlike the case in the models analyzed in Section 1.5.1, this integral may exist even if the underlying distribution is defective, provided that the k(·) factor damps the survivor function. We require P₀ < ∞.

The proportion of people in the state with sample path {x(u)}_{−∞}^{∞} whose spells are exactly of length t_b is the set of survivors from a spell that initiated at τ = −t_b. Integrating over θ and the x paths, and dividing by the total stock P₀, the marginal density of T_b is

$$f(t_b) = \frac{\displaystyle\int_x\int_\theta k(-t_b\mid x(-t_b),\theta)\exp\left(-\int_0^{t_b}h(u\mid x(u-t_b),\theta)\,du\right)d\mu(\theta)\,dD(x)}{P_0}. \tag{1.5.14}$$
Note that we use a function space integral to integrate out {x(u)}_{−∞}^{∞}. [See Kac (1959) for a discussion of such integrals.] Note further that one obtains an incorrect expression for the marginal density of T_b if one integrates (1.5.13) against the population density of x, dD(x). The error in this procedure is that the appropriate density of x against which (1.5.13) should be integrated is the density of x conditional on the event that an observation is in the sample at τ = 0. By Bayes' theorem, for proper distributions for T_b, this density is

$$f(x\mid T_b > 0) = dD(x)\,\frac{P_0(x)}{P_0},$$

where P₀(x) is the proportion of people with sample path x who are in the state at τ = 0; this is not in general the same as the density dD(x).
The derivation of the density of T_c, the completed length of a spell sampled at τ = 0, is equally straightforward. For simplicity we ignore right censoring problems: we assume that the sampling frame is of sufficient length that no spells are censored, and further assume that the underlying duration distribution is not defective. (But see the remarks at the conclusion of this section.)

Conditional on {x(u)}_{−∞}^{∞} and θ, the probability density that a spell begins at τ and is of completed length t_c is

$$k(\tau\mid x(\tau),\theta)\,h(t_c\mid x(\tau+t_c),\theta)\exp\left(-\int_0^{t_c}h(u\mid x(\tau+u),\theta)\,du\right).$$

Integrating over θ and over the start dates consistent with being sampled at τ = 0 (i.e. those with t_c > −τ), we obtain

$$f(t_c\mid \{x(u)\}_{-\infty}^{\infty}) = \frac{\displaystyle\int_{-t_c}^{0}\int_\theta k(\tau\mid x(\tau),\theta)\,h(t_c\mid x(\tau+t_c),\theta)\exp\left(-\int_0^{t_c}h(u\mid x(\tau+u),\theta)\,du\right)d\mu(\theta)\,d\tau}{P_0(x)}, \tag{1.5.15}$$

and, integrating out the x paths,

$$f(t_c) = \frac{\displaystyle\int_x\int_{-t_c}^{0}\int_\theta k(\tau\mid x(\tau),\theta)\,h(t_c\mid x(\tau+t_c),\theta)\exp\left(-\int_0^{t_c}h(u\mid x(\tau+u),\theta)\,du\right)d\mu(\theta)\,dD(x)\,d\tau}{P_0}. \tag{1.5.16}$$

By parallel arguments, the densities of the forward time T_a are

$$f(t_a\mid \{x(u)\}_{-\infty}^{\infty}) = \frac{\displaystyle\int_{-\infty}^{0}\int_\theta k(\tau\mid x(\tau),\theta)\,h(t_a-\tau\mid x(t_a),\theta)\exp\left(-\int_0^{t_a-\tau}h(u\mid x(u+\tau),\theta)\,du\right)d\mu(\theta)\,d\tau}{P_0(x)}, \tag{1.5.17}$$

$$f(t_a) = \frac{\displaystyle\int_x\int_{-\infty}^{0}\int_\theta k(\tau\mid x(\tau),\theta)\,h(t_a-\tau\mid x(t_a),\theta)\exp\left(-\int_0^{t_a-\tau}h(u\mid x(u+\tau),\theta)\,du\right)d\mu(\theta)\,dD(x)\,d\tau}{P_0}. \tag{1.5.18}$$
Of special interest is the case k(τ|x,θ) = k(x), in which the intake rate does not depend on unobservables and is constant for all τ given x, and in which x is time invariant. The conditional density of T_b then specializes to

$$f(t_b\mid x) = \frac{\displaystyle\int_\theta \exp\left(-\int_0^{t_b}h(u\mid x,\theta)\,du\right)d\mu(\theta)}{m(x)}, \tag{1.5.13'}$$

where

$$m(x) = \int_0^\infty\!\int_\theta \exp\left(-\int_0^{\tau}h(u\mid x,\theta)\,du\right)d\mu(\theta)\,d\tau.$$
This density is very similar to (1.5.3). Under the same restrictions on k and x, (1.5.15) and (1.5.17) specialize respectively to

(1.5.15')

which is to be compared to (1.5.6). For this special case, all of the results (i)-(vi) stated in Section 1.5.1 go through, with obvious redefinition of the densities to account for observed and unobserved variables.
It is only for the special case of k(τ|x,θ) = k(x) with time invariant regressors that the densities of T_a, T_b and T_c do not depend on the parameters of k. Only if h(u|x(u+τ),θ) = h(u|x(u+τ)), so that unobservables do not enter the model (or, equivalently, the distribution of θ is degenerate), does k cancel in the expression. In that case the numerator factors into two components, one of which is the denominator of the density. k also disappears if it is a time invariant constant that is functionally independent of θ.²⁰
At issue is the plausibility of alternative specifications of k. Although nothing can be said about this matter in a general way, for a variety of economic models it is plausible that k depends on θ, τ and x(τ), and that the x are not time invariant.
invariant. For example, in a study of unemployment spells over the business
cycle, the onset of a spell of unemployment is the result of prior job termination
or entry into the workforce. So k is the density of the length of a spell resulting
from a prior economic decision. The same unobservables that determine unem-
ployment are likely to determine such spells as well. In addition, it is odd to
assume a time invariant general economic and person specific environment in an
analysis of unemployment spells: Aggregate economic conditions change, and
person specific variables like age, health, education and wage rates change over
time. Similar arguments can be made on behalf of a more general specification of
k for most economic models.
²⁰We note that one "short cut" procedure frequently used does not avoid these problems. The argument correctly notes that conditional on θ and the start date of the spell, the duration density is

$$h(t_c\mid x(\tau+t_c),\theta)\exp\left(-\int_0^{t_c}h(u\mid x(\tau+u),\theta)\,du\right). \qquad (*)$$

This expression obviously does not depend on k. The argument runs astray by integrating this expression against dμ(θ) to get a marginal (with respect to θ) density. The correct density of θ is not dμ(θ); it depends on k by virtue of the fact that sample θ are generated by the selection mechanism that an observation must be in the sample at τ = 0. Precisely the same issue arises with regard to the distribution of x in passing from (1.5.13) to (1.5.14). However, density (*) can be made the basis of a simpler estimation procedure in a multiple spell setting, as we note below in Section 2.2.
The initial conditions problem for the general model has two distinct compo-
nents.
(i) The functional form of k(τ|x(τ),θ) is not in general known. This includes as a special case the possibility that for some unknown τ* < 0, k(τ|x(τ),θ) = 0 for τ < τ*. In addition, the value of τ* may vary among individuals, so that if it is unknown it must be treated as another unobservable.
(ii) If x(τ) is not time invariant, its value may not be known for τ < 0, so that even if the functional form of k is known, the correct conditional duration densities cannot be constructed.
In the second case the presample segment of the x path must be integrated out, yielding

$$f(t_c\mid\{x(u)\}_{u\ge 0}) = \frac{\displaystyle\int_{-t_c}^{0}\int_\theta\int_{\{x(\tau):\,\tau<0\}} k(\tau\mid x(\tau),\theta)\,h(t_c\mid x(\tau+t_c),\theta)\exp\left(-\int_0^{t_c}h(u\mid x(\tau+u),\theta)\,du\right)dD(x)\,d\mu(\theta)\,d\tau}{P_0}, \tag{1.5.19}$$

$$f(t_a,\{x(u)\}_{u\ge 0}) = \frac{\displaystyle\int_{-\infty}^{0}\int_\theta\int_{\{x(\tau):\,\tau<0\}} k(\tau\mid x(\tau),\theta)\,h(t_a-\tau\mid x(t_a),\theta)\exp\left(-\int_0^{t_a-\tau}h(u\mid x(u+\tau),\theta)\,du\right)dD(x)\,d\mu(\theta)\,d\tau}{P_0}. \tag{1.5.20}$$
It is this density, and not dD(x), that is estimated using within sample data on x.²¹ This insight suggests two further points. (1) By direct analogy with results already rigorously established in the choice based sampling literature (see, e.g. Manski and Lerman, 1977; Manski and McFadden, 1981; and Cosslett, 1981), more efficient estimates of the parameters of h(t|x,θ) and μ(θ) can be secured using the joint densities of T_a and x, since the density of within sample data depends on the structural parameters of the model as a consequence of the sample selection rule. (2) Access to other sources of data on the x will be essential in order to "integrate out" presample x via formulae like (1.5.19).
A partial avenue of escape from the initial conditions problem exploits T,, i.e.
durations for spells initiated after the origin date of the sample. The density of Td
²¹Precisely the same phenomenon appears in the choice based sampling literature (see, e.g. Manski and Lerman, 1977, Manski and McFadden, 1981 and Cosslett, 1981). In fact the suggestion of integrating out the missing data is analogous to the suggestions offered in Section 1.7 of the Manski and McFadden paper.
conditional on the within sample path of x is

$$f(t_d\mid\{x(u)\}_{0}^{\infty}) = \frac{\displaystyle\int_0^{\infty}\int_\theta k(\tau\mid x(\tau),\theta)\,h(t_d\mid x(\tau+t_d),\theta)\exp\left(-\int_0^{t_d}h(u\mid x(\tau+u),\theta)\,du\right)d\mu(\theta)\,d\tau}{\displaystyle\int_0^{\infty}\int_\theta k(\tau\mid x(\tau),\theta)\,d\mu(\theta)\,d\tau}. \tag{1.5.21}$$

For a sampling frame of finite length τ*, the density of completed spells that begin and end within the frame is

$$f(t_d\mid\{x(u)\}_{0}^{\tau^*}) = \frac{\displaystyle\int_0^{\tau^*-t_d}\int_\theta k(\tau\mid x(\tau),\theta)\,h(t_d\mid x(\tau+t_d),\theta)\exp\left(-\int_0^{t_d}h(u\mid x(\tau+u),\theta)\,du\right)d\mu(\theta)\,d\tau}{\displaystyle\int_0^{\tau^*}\int_0^{\tau^*-\tau}\int_\theta k(\tau\mid x(\tau),\theta)\,h(t_d\mid x(\tau+t_d),\theta)\exp\left(-\int_0^{t_d}h(u\mid x(\tau+u),\theta)\,du\right)d\mu(\theta)\,dt_d\,d\tau}. \tag{1.5.22}$$
The denominator is the joint probability of the events 0 < 𝒯 < τ* − T_d and 0 < T_d < τ*, which must occur if we are to observe a completed spell that begins during the sampling frame 0 < 𝒯 < τ*. As τ* → ∞, this expression is equivalent to the density in (1.5.21).
The density of right censored spells that begin after the start date of the sample is simply the joint probability of the events 0 < 𝒯 < τ* and T_d > τ* − 𝒯, i.e.

$$P\left(0 < \mathcal{T} < \tau^* \text{ and } T_d > \tau^*-\mathcal{T}\,\middle|\,\{x(u)\}_0^{\tau^*}\right) = \int_0^{\tau^*}\!\int_{\tau^*-\tau}^{\infty}\int_\theta k(\tau\mid x(\tau),\theta)\,h(t_d\mid x(\tau+t_d),\theta)\exp\left(-\int_0^{t_d}h(u\mid x(\tau+u),\theta)\,du\right)d\mu(\theta)\,dt_d\,d\tau.$$
The denominator of this expression is the joint probability of the events −𝒯 < T_c < τ* − 𝒯 and 𝒯 ≤ 0. For spells sampled at τ = 0 for which we observe presample values of the duration and post-sample right censored durations, it must be the case that (a) 𝒯 < 0 and (b) T_c ≥ τ* − 𝒯, so the density for such spells is
The derivation of the density for T_a in the presence of a finite length sample frame is straightforward and for the sake of brevity is deleted. It is noted in Sheps and Menken (1973) (for models without regressors) and Flinn and Heckman (1982b) (for models with regressors) that failure to account for the sampling frame produces the wrong densities, and inference based on such densities may be seriously misleading.
1.6. New issues that arise in formulating and estimating choice theoretic duration
models
In this section we briefly consider new issues that arise in the estimation of choice
theoretic duration models. For specificity, we focus on the model of search
unemployment in a time homogeneous environment that is presented in Section
1.2.2. Our analysis of this model serves as a prototype for a broad class of
microeconomic duration models produced from optimizing theory.
We make the following points about this model assuming that the analyst has
access to longitudinal data on I independent spells of unemployment.
1.6.1. Point A
1.6.2. Point B
1.6.3. Point C
1.6.4. Point D
As discussed in Flinn and Heckman (1982a), some equilibrium search models place restrictions on the functional form of F.
(i) Estimates secured from these models are very sensitive to the choice of these functional forms. Model identification is difficult to check and is very functional form dependent.
(ii) In order to impose the restrictions produced by economic theory to secure estimates, it is necessary to solve nonlinear eq. (1.2.13). Of special importance is the requirement that V > 0. If this restriction is not satisfied, the model cannot explain the data: if V < 0, an unemployed individual will not search. Closed form solutions exist only for special cases and in general numerical algorithms must be developed to impose or test these restrictions (a schematic numerical sketch follows this list). Such numerical analysis procedures are costly even for a simple one spell search model and for models with more economic content often become computationally intractable. (One exception is a dynamic McFadden model with no restrictions between the choice and interarrival time distributions.)
(iii) Because of restrictions like (1.2.13), proportional hazard specifications (1.1.10)
are rarely produced by economic models.
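To illustrate the kind of numerical work point (ii) refers to, the sketch below solves a generic stationary search model's reservation wage equation by root finding. Equation (1.2.13) is not reproduced here; the exponential offer distribution, the function reservation_wage, and every parameter value are assumptions introduced for this example only:

```python
# Hedged sketch: solve w* = b + (lam/r) * E[max(W - w*, 0)] for the
# reservation wage w* = rV, with W ~ Exponential(mean mu) (an assumption).
import math
from scipy.optimize import brentq

def reservation_wage(b, lam, r, mu):
    # for w >= 0, E[max(W - w, 0)] = mu * exp(-w / mu) under exponential offers
    g = lambda w: b + (lam / r) * mu * math.exp(-w / mu) - w
    if g(0.0) <= 0.0:
        return None                  # V <= 0: an unemployed individual would not search
    hi = b + (lam / r) * mu + 1.0    # g(hi) < 0, so the root is bracketed in [0, hi]
    return brentq(g, 0.0, hi)

w_star = reservation_wage(b=0.2, lam=1.2, r=0.05, mu=1.0)
print(w_star)                        # accepted wages must satisfy W >= w*
```

Imposing or testing (1.2.13)-type restrictions inside a likelihood means re-solving an equation of this sort at every trial parameter value, which is the computational cost the text refers to.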
1.6.5. Point E
In the search model without unobserved variables, the restriction that W > rV is an essential piece of identifying information. In a model with unobservable θ introduced in c, r, λ or F, rV = rV(θ) as a consequence of functional restriction (1.2.13). In this model, the restriction that W ≥ rV is replaced with an implicit equation restriction on the support of θ; i.e. for an observation with accepted wage W and reservation wage rV(θ), the admissible support set for θ is {θ: W ≥ rV(θ)}.
25 Kiefer and Neumann (1981) fail to impose this requirement in their discrete time structural search model, so their proposed estimator is inconsistent. See Flinn and Heckman (1982c).
The single spell duration models discussed in Part I are the principal building
blocks for the richer, more behaviorally interesting models presented in this part
of the paper. Sequences of birth intervals, work histories involving movements
among employment states, the successive issuing of patents to firms and individ-
ual criminal victimization histories are examples of multiple spell processes which
require a more elaborate statistical framework than the one presented in Part I.
In this part of the paper we confine our attention to new issues that arise in the
analysis of multiple spell data. Issues such as the sensitivity of empirical estimates
to ad hoc specifications of mixing distributions and initial conditions problems
which also arise in multiple spell models are not discussed except in cases where access to multiple spell data aids in their resolution.
This part of the paper is in two sections. In Section 2.1 we present a unified
statistical framework within which a rich variety of discrete state continuous time
processes can be formulated and analyzed. We indicate by example how special-
izations of this framework yield a variety of models, some of which already
appear in the literature. We do not present a complete analysis of multiple spell
processes including their estimation and testing on data generated by various
sampling processes because at the time of this writing too little is known about
this topic.
Section 2.2 considers in somewhat greater detail a class of multiple spell
duration models that have been developed for the analysis of event history data.
In this Section we also consider some alternative approaches to initial conditions
problems and some alternative approaches to controlling for unobserved variables
that are possible if the analyst has access to multiple spell data.
2.1. A unified framework
{1, ..., C} as the value assumed by Y at the jth transition time. Y(τ) or R(j) is generated by the following sequence.
(i) An individual begins his evolution in a state Y(0) = R(0) = r(0) and waits there for a random length of time T₁ governed by a conditional survivor function

$$P(T_1 > t_1 \mid x, r(0)) = \exp\!\left(-\int_0^{t_1} h(u \mid x(u), r(0))\,\mathrm{d}u\right).$$
As before, h(u|x(u), r(0)) is a calendar time (or age) dependent function and we now make explicit the origin state of the process.
(ii) At time 𝒯(1) = τ(1), the individual moves to a new state R(1) = r(1) governed by a conditional probability law P(R(1) = r(1) | x(τ(1)), r(0)), which may also be age dependent.
(iii) The individual waits in state R(1) for a random length of time T₂ governed by the conditional survivor function

$$P(T_2 > t_2 \mid \{x(u)\}, \mathcal{T}(1) = \tau(1), R(1) = r(1), R(0) = r(0)).$$

Proceeding in this fashion generates a realization

R(0) = r(0), 𝒯(1) = τ(1), R(1) = r(1), 𝒯(2) = τ(2), R(2) = r(2), ...,

where R(k), k = 0, 1, 2, ... is a discrete time stochastic process governed by the conditional probabilities
where

$$h(u \mid x(u + \tau(j-1)), t_{j-1})$$

is the hazard for the jth spell, allowed to depend on the serial order j of the spell and on the length t_{j−1} of the preceding spell. This is a model with both occurrence dependence and lagged duration dependence, where the latter is defined as dependence on lengths of preceding spells. A final specification writes

$$h(u \mid x(u + \tau(j-1)), t_{j-1}) = h(x(u + \tau(j-1))),$$

so that the exit rate depends only on current regressor values.
where ‖m_{ij}‖ = M, and the elements of {λ_i} are positive constants. Then Y(τ) is a time homogeneous Markov chain with constant intensity matrix

$$Q = \Lambda(M - I),$$

where

$$\Lambda = \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_C \end{pmatrix},$$
where, more generally, M(τ, u) is a two parameter family of time (τ) and duration (u) dependent stochastic matrices with each element a function of τ and u and with

$$m_{ii} = 0.$$
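As a concrete illustration of the time homogeneous special case, the sketch below (with purely illustrative rates and jump probabilities) builds Q = Λ(M − I) and simulates Y(τ) by drawing exponential holding times and jumps from the embedded chain M:

```python
# Hedged sketch of the time homogeneous case: exponential holding times with
# rates lambda_i and an embedded jump chain M (m_ii = 0) give the constant
# intensity matrix Q = Lambda (M - I). All numbers are illustrative.
import numpy as np

rng = np.random.default_rng(0)
lam = np.array([0.5, 1.0, 2.0])                 # exit rates by state
M = np.array([[0.0, 0.7, 0.3],                  # embedded jump chain, m_ii = 0
              [0.4, 0.0, 0.6],
              [0.5, 0.5, 0.0]])

Q = np.diag(lam) @ (M - np.eye(3))              # intensity matrix Q = Lambda(M - I)
print(Q.sum(axis=1))                            # rows sum to zero, as required

def simulate(y0, horizon):
    """Simulate {Y(tau)} as (time, state) jump pairs up to `horizon`."""
    t, y, path = 0.0, y0, [(0.0, y0)]
    while True:
        t += rng.exponential(1.0 / lam[y])      # exponential waiting time in state y
        if t > horizon:
            return path
        y = rng.choice(3, p=M[y])               # next state drawn from row y of M
        path.append((t, y))

print(simulate(0, horizon=5.0))
```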
We further define
These two estimators are consistent, asymptotically normal, and efficient, and are independent of each other as the number of persons sampled becomes large. There is no efficiency gain from joint estimation. The same results carry over if Π and P(T_k > t_k | t_{k−1}, r_{k−1}, 𝒯(k−1)) are parameterized (e.g. elements of Π as a logit, P(T_k > t_k | ·) as a general duration model) provided, for example, that the regressors are bounded iid random variables. The two component procedure is efficient. However, if there are parameter restrictions connecting Π and the conditional survivor functions, the two component estimation procedure produces consistent but inefficient estimates.
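The separation the text describes can be made concrete under strong simplifying assumptions (no regressors, exponential within-state durations, and a made-up event-history sample): each component then has its own closed form maximum likelihood estimator, and nothing links the two.

```python
# Hedged sketch of the two component procedure under assumed exponential
# durations and no regressors; the records below are invented for illustration.
import numpy as np

# event-history records: (origin state, destination state, spell duration)
spells = np.array([(0, 1, 0.7), (0, 2, 1.9), (1, 0, 0.4),
                   (1, 2, 0.8), (0, 1, 1.1), (2, 0, 2.5)],
                  dtype=[("r0", int), ("r1", int), ("t", float)])

n_states = 3
Pi = np.zeros((n_states, n_states))
rate = np.zeros(n_states)
for i in range(n_states):
    sub = spells[spells["r0"] == i]
    counts = np.bincount(sub["r1"], minlength=n_states)
    Pi[i] = counts / counts.sum()            # MLE of row i of the jump chain Pi
    rate[i] = 1.0 / sub["t"].mean()          # exponential-hazard MLE per origin state

print(Pi)
print(rate)
```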
2.2. General duration models for the analysis of event history data
In this section we present a multistate duration model for event history data, i.e.
data that give information on times at which people change state and on their
transitions. We leave for another occasion the analysis of multistate models
designed for data collected by other sampling plans. This is a major area of
current research.
An equivalent way to derive the densities of duration times and transitions for
the multistate processes described in Section 2.1 that facilitates the derivation of
the likelihoods presented below is based on the exit rate concept introduced in
Part 1. An individual event history is assumed to evolve according to the following
steps.
(i) At time τ = 0, an individual is in state r_(0) = (i), i = 1, ..., C. Given occupancy of state i, there are N_i ≤ C − 1 possible destinations.28 The limit (as Δt → 0) of the probability that a person who starts in i at calendar time τ = 0 leaves the state in interval (t₁, t₁ + Δt) given regressor path {x(u)}₀^{t₁} and unobservable θ is the conditional hazard or escape rate

$$\lim_{\Delta t \to 0} \frac{P(t_1 < T_1 \le t_1 + \Delta t,\; R(1) = j \mid r_{(0)} = (i), \mathcal{T}(0) = 0, x(t_1), \theta, T_1 \ge t_1)}{\Delta t} = h(t_1, j \mid r_{(0)} = (i), \mathcal{T}(0) = 0, x(t_1), \theta), \tag{2.2.1}$$

and summing over the N_i possible destinations,

$$\sum_{j=1}^{N_i} h(t_1, j \mid r_{(0)} = (i), \mathcal{T}(0) = 0, x(t_1), \theta) = h(t_1 \mid r_{(0)} = (i), \mathcal{T}(0) = 0, x(t_1), \theta). \tag{2.2.2}$$
The conditional survivor function is

$$P(T_1 > t_1 \mid r_{(0)} = (i), \mathcal{T}(0) = 0, \{x(u)\}_0^{t_1}, \theta) = \exp\!\left(-\int_0^{t_1} h(u \mid r_{(0)} = (i), \mathcal{T}(0) = 0, x(u), \theta)\,\mathrm{d}u\right),$$

so the density of T₁ is

$$f(t_1 \mid r_{(0)} = (i), \mathcal{T}(0) = 0, \{x(u)\}_0^{t_1}, \theta) = h(t_1 \mid r_{(0)} = (i), \mathcal{T}(0) = 0, x(t_1), \theta)\, P(T_1 \ge t_1 \mid r_{(0)} = (i), \mathcal{T}(0) = 0, \{x(u)\}_0^{t_1}, \theta).$$

The density of the joint event R(1) = j and T₁ = t₁ is

$$f(t_1, j \mid r_{(0)} = (i), \mathcal{T}(0) = 0, \{x(u)\}_0^{t_1}, \theta) = h(t_1, j \mid r_{(0)} = (i), \mathcal{T}(0) = 0, x(t_1), \theta)\, P(T_1 \ge t_1 \mid r_{(0)} = (i), \mathcal{T}(0) = 0, \{x(u)\}_0^{t_1}, \theta),$$

with

$$\sum_{j=1}^{N_i} f(t_1, j \mid r_{(0)} = (i), \mathcal{T}(0) = 0, \{x(u)\}_0^{t_1}, \theta) = f(t_1 \mid r_{(0)} = (i), \mathcal{T}(0) = 0, \{x(u)\}_0^{t_1}, \theta).$$

The conditional probability that the exit is to state j, j = 1, ..., N_i, is

$$\frac{h(t_1, j \mid r_{(0)} = (i), \mathcal{T}(0) = 0, x(t_1), \theta)}{h(t_1 \mid r_{(0)} = (i), \mathcal{T}(0) = 0, x(t_1), \theta)}. \tag{2.2.4}$$
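A small sketch of these definitions under an assumed parametric form: destination-specific Weibull exit rates play the role of h(t₁, j) in (2.2.1), their sum gives the escape rate of (2.2.2), and the ratio gives the transition probability (2.2.4). The functional form and parameter values are illustrative assumptions, not the chapter's specification:

```python
# Hedged sketch: destination-specific Weibull exit rates stand in for the
# h(t, j) of (2.2.1); their sum is the escape rate of (2.2.2). All parameter
# values are illustrative assumptions.
import numpy as np

alphas = np.array([1.2, 0.8])        # shape parameters, destinations j = 1, 2
lams = np.array([0.5, 0.3])          # scale parameters

def h_j(t):
    """Destination-specific exit rates h(t, j | r_(0) = (i), x, theta)."""
    return alphas * lams * t ** (alphas - 1.0)

def h(t):
    """Total escape rate h(t) = sum_j h(t, j), cf. (2.2.2)."""
    return h_j(t).sum()

def survivor(t, grid=4000):
    """P(T_1 > t) = exp(-integral of h from 0 to t), evaluated numerically."""
    u = np.linspace(1e-9, t, grid)
    return np.exp(-np.trapz([h(v) for v in u], u))

t1 = 2.0
print(h_j(t1) * survivor(t1))        # joint densities of (T_1 = t_1, R(1) = j)
print(h_j(t1) / h(t1))               # transition probabilities as in (2.2.4)
```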
As noted in Section 1.5, it is unlikely that the origin date of the sample coincides with the start date of the event history. Define the probability density for the random variables describing the events that a person is in state R(0) = r(0) at time 𝒯(0) = 0 with a spell of length t_c (measured after the start of the sample) that ends with an exit to state R(1) = r(1), given {x(u)}₀^{τ(1)} and θ. The derivation of this density in terms of the intake density k appears in Section 1.5 (see the derivation of the density of T_c). The only new point to notice is that the h in Section 1.5 should be replaced with the appropriate h as defined in (2.2.2). The joint density of (r(0), t_b, r(1)), the completed spell density sampled at 𝒯(0) = 0 terminating in state r(1), is defined analogously, and for such spells the density is written in the same way.
In a multiple spell model setting in which it is plausible that the process has been in operation prior to the origin date of the sample, the intake rate k introduced in Section 1.5 is the density of the random variable τ describing the event "entered the state r(0) at time τ ≤ 0 and did not leave the state until after the origin date of the sample." The expression for k in terms of exit rate (2.2.2) depends on (i) presample values of x and (ii) the date at which the process began. Thus in principle, given (i) and (ii), it is possible to determine the functional form of k. In this context it is plausible that k depends on θ.
The joint likelihood for r(0), t_{0l} (l = a, c), r(1), t₂, ..., r(k), t_{k+1}, conditional on θ and {x(u)}_{−∞}^{τ(k)+t_{k+1}}, for a right censored (k+1)st spell is

(2.2.6)

Equation (2.2.6) makes explicit that the date of onset of spell m + 1, 𝒯(m + 1), depends on the durations of the preceding spells. Accordingly, in a model in which the exit rates (2.2.2) depend on θ, the distribution of time varying x variables (including the date of onset of the spell) sampled at the start of each spell depends on θ. Such variables are not (weakly) exogenous or ancillary in duration regression equations, and least squares estimators of models that include such variables are, in general, inconsistent. (See Flinn and Heckman, 1982b.) Provided that in the population x is distributed independently of θ, time varying variables create no econometric problem for maximum likelihood estimators based on density (2.2.6), which accounts for the entire history of the process. However, a maximum likelihood estimator based on a density of the last n < k + 1 spells that conditions on 𝒯(k + 1 − n) or {x(u)}_{−∞}^{τ(k+1−n)}, assuming they are independent of θ, is inconsistent.
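The structure of a likelihood in the spirit of (2.2.6) can be sketched under strong assumptions: exponential hazards within spells, a person effect θ entering multiplicatively, and a two-point mixing distribution standing in for μ(θ). Everything below, including the support points and weights, is hypothetical:

```python
# Hedged sketch of a person-level likelihood in the spirit of (2.2.6):
# exponential spell hazards lam_m * theta, theta from a two-point mixture.
import numpy as np

theta_pts = np.array([0.5, 2.0])         # assumed support points of the mixture
theta_wts = np.array([0.6, 0.4])         # assumed weights

def person_likelihood(durations, censored_last, lam):
    """durations: spell lengths t_1..t_{k+1}; lam: baseline rate per spell."""
    like = 0.0
    for th, w in zip(theta_pts, theta_wts):
        contrib = 1.0
        for m, t in enumerate(durations):
            rate = lam[m] * th
            if censored_last and m == len(durations) - 1:
                contrib *= np.exp(-rate * t)           # survivor of censored spell
            else:
                contrib *= rate * np.exp(-rate * t)    # completed-spell density
        like += w * contrib                            # integrate out theta
    return like

print(person_likelihood([0.9, 1.4, 2.0], censored_last=True, lam=[1.0, 0.8, 0.6]))
```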
(2.2.7)
29The conditional likelihood cannot be used to analyze single spell data. Estimating θ as a person specific parameter would explain each single spell observation perfectly and no structural parameters of the model would be identified.
(i) It allows for a flexible Box-Cox hazard for (2.2.2) with scalar heterogeneity, where β, γ₁, γ₂, λ₁, λ₂ and c are permitted to depend on the origin state, the destination state and the serial order of the spell. Lagged durations may be included among the x. Using maximum likelihood procedures it is possible to estimate all of these parameters except for one normalization of c. (A schematic version of such a hazard is sketched after the footnotes below.)
(ii) It allows for general time varying variables and right censoring. The regres-
sors may include lagged durations.30
(iii) μ(θ) can be specified as either normal, log normal or gamma, or the NPMLE procedure discussed in Section 1.4.1 can be used.31
(iv) It solves the left censoring or initial conditions problem by assuming that the
functional form of the initial duration distribution for each origin state is
different from that of the other spells.32
30The random effect maximum likelihood estimator based on (2.2.6) can be shown to be consistent in the presence of θ with lagged durations included in x.
31The NPMLE procedure of Heckman and Singer (1984b) can be shown to be consistent for
multiple spell data.
32 This procedure is identical to the procedure discussed in Section 1.5.2, using spells that originate
after the origin of the sample.
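A schematic version of a Box-Cox hazard of the general kind item (i) describes follows; the exact CTM parameterization should be taken from Hotz (1983) and Flinn and Heckman (1983), so the formula used here, ln h(t|x, θ) = x′β + γ₁(t^{λ₁} − 1)/λ₁ + γ₂(t^{λ₂} − 1)/λ₂ + cθ, and all parameter values are assumptions made for illustration:

```python
# Hedged sketch of a flexible Box-Cox hazard with scalar heterogeneity;
# the parameterization and values are illustrative, not the CTM defaults.
import numpy as np

def box_cox(t, lam):
    """(t**lam - 1)/lam, with the log limit as lam -> 0."""
    return np.log(t) if abs(lam) < 1e-12 else (t ** lam - 1.0) / lam

def hazard(t, x, beta, g1, l1, g2, l2, c, theta):
    """ln h is linear in x, two Box-Cox duration terms, and c*theta."""
    return np.exp(x @ beta + g1 * box_cox(t, l1) + g2 * box_cox(t, l2) + c * theta)

x = np.array([1.0, 0.3])                      # constant plus one covariate
beta = np.array([-1.0, 0.5])                  # illustrative coefficients
print(hazard(2.0, x, beta, g1=0.4, l1=1.0, g2=0.0, l2=0.0, c=1.0, theta=0.2))
```

Setting γ₂ = 0 and λ₁ = 1 collapses the sketch to a Gompertz-type hazard, which is the sense in which such a family nests several standard duration models.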
For more details on the CTM program see Hotz (1983). For further details on
the CTM likelihood function and its derivatives, see Flinn and Heckman (1983).33
For examples of structural multispell duration models see Coleman (1983) and
Flinn and Heckman (1982a).
3. Summary
This paper considers the formulation and estimation of continuous time social
science duration models. The focus is on new issues that arise in applying
statistical models developed in biostatistics to analyze economic data and for-
mulate economic models. Both single spell and multiple spell models are dis-
cussed. In addition, we present a general time inhomogeneous multiple spell
model which contains a variety of useful models as special cases.
Four distinctive features of social science duration analysis are emphasized:
(1) Because of the limited size of samples available in economics and because of
an abundance of candidate observed explanatory variables and plausible
omitted explanatory variables, standard nonparametric procedures used in
biostatistics are of limited value in econometric duration analysis. It is
necessary to control for observed and unobserved explanatory variables to
avoid biasing inference about underlying duration distributions. Controlling
for such variables raises many new problems not discussed in the available
literature.
(2) The environments in which economic agents operate are not the time homoge-
neous laboratory environments assumed in biostatistics and reliability theory.
Ad hoc methods for controlling for time inhomogeneity produce badly biased
estimates.
(3) Because the data available to economists are not obtained from the controlled
experimental settings available to biologists, doing econometric duration
analysis requires accounting for the effect of sampling plans on the distri-
butions of sampled spells.
(4) Econometric duration models that incorporate the restrictions produced by
economic theory only rarely can be represented by the models used by
biostatisticians. The estimation of structural econometric duration models
raises new statistical and computational issues.
33In Flinn and Heckman (1983) the likelihood is derived using a "competing risks" framework. [See, e.g. Kalbfleisch and Prentice (1980) for a discussion of competing risks models.] This framework is in fact inessential to their approach. A more direct approach starts with hazards (2.2.1) and (2.2.2) that are not based on "latent failure times." This direct approach, given hazard specification (1.3.2), produces exactly the same estimating equations as are given in their paper.
the usual object of econometric interest (Point (3)). Inference based on mis-
specified duration distributions is in general biased. New formulae for the
densities of commonly used duration measures are produced for duration models
with unobservables in time inhomogeneous environments. We show how access to
spells that begin after the origin date of a sample aids in solving econometric
problems created by the sampling schemes that are used to generate economic
duration data.
We also discuss new issues that arise in estimating duration models explicitly
derived from economic theory (Point (4)). For a prototypical search unemploy-
ment model we discuss and resolve new identification problems that arise in
attempting to recover structural economic parameters. We also consider non-
standard statistical problems that arise in estimating structural models that are
not treated in the literature. Imposing or testing the restrictions implied by
economic theory requires duration models that do not appear in the received
literature and often requires numerical solution of implicit equations derived from
optimizing theory.
References
Amemiya, T. (1981) "Qualitative Response Models: A Survey", Journal of Economic Literature, 19,
1483-1536.
Amemiya, T. (1984) "Tobit Models: A Survey", Journal of Econometrics, 24, 1-63.
Andersen, E. B. (1973) Conditional Inference and Models for Measuring. Copenhagen: Mentalhygiej-
nisk Forlag.
Andersen, E. B. (1980) Discrete Statistical Models with Social Science Applications. Amsterdam: North-Holland.
Arnold, B. and P. Brockett (1983) "Identifiability for Dependent Multiple Decrement/Competing Risks Models", Scandinavian Actuarial Journal, 10, 117-127.
Baker, G. and P. Trivedi (1982) "Methods for Estimating the Duration of Periods of Unemployment". Australian National University Working Paper.
Barlow, R. E. and F. Proschan (1975) Statistical Theory of Reliability and Life Testing. New York:
Holt, Rinehart and Winston.
Barlow, R. E., D. J. Bartholomew, J. M. Bremner and H. D. Brunk (1972) Statistical Inference Under
Order Restrictions. London: Wiley.
Billingsley, P. (1961) Statistical Inference for Markov Processes. Chicago: University of Chicago Press.
Braun, H. and J. Hoem (1979) “Modelling Cohabitational Birth Intervals in the Current Danish
Population: A Progress Report”. Copenhagen University, Laboratory of Actuarial Mathematics,
working paper no. 24.
Burdett, K. and D. Mortensen (1978) “Labor Supply under Uncertainty”, in: R. Ehrenberg, ed.,
Research in Labor Economics. London: JAI Press, 2, 109-157.
Chamberlain, G. (1985) “Heterogeneity, Duration Dependence and Omitted Variable Bias”, in: J.
Heckman and B. Singer, eds., Longitudinal Analysis of Labor Market Data. New York: Cambridge
University Press.
Chamberlain, G. (1980) "Comment on Lancaster and Nickell", Journal of the Royal Statistical Society, Series A, 160.
Coleman, T. (1983) “A Dynamic Model of Labor Supply under Uncertainty”. U. of Chicago,
presented at 1983 Summer Meetings of the Econometric Society, Evanston, Ill., unpublished
manuscript.
Cosslett, S. (1981) "Efficient Estimation of Discrete Choice Models", in: C. Manski and D. McFadden, eds., Structural Analysis of Discrete Data with Econometric Applications. Cambridge: MIT Press, 41-112.
Cox, D. R. (1962) Renewal Theory. London: Methuen.
Cox, D. R. (1972) “Regression Models and Lifetables”, Journal of the Royal Statistical Society, Series
B, 34, 187-220.
Cox, D. R. and D. Hinkley (1974) Theoretical Statistics. London: Chapman and Hall.
Cox, D. R. and P. A. W. Lewis (1966) The Statistical Analysis of a Series of Events. London: Chapman and Hall.
Cox, D. R. and D. Oakes (1984) Analysis of Survival Data. London: Chapman and Hall.
DeGroot, M. (1970) Optimal Statistical Decisions. New York: McGraw-Hill.
Domencich, T. and D. McFadden (1975) Urban Travel Demand. Amsterdam: North-Holland.
Elbers, C. and G. Ridder (1982) “True and Spurious Duration Dependence: The Identifiability of the
Proportional Hazard Model”, Review of Economic Studies, 49, 403-410.
Feller, W. (1970) An Introduction to Probability Theory and Its Applications. New York: Wiley, Vol. I,
third edition.
Feller, W. (1971) An Introduction to Probability Theory and Its Applications. New York: Wiley, Vol. II.
Flinn, C. and J. Heckman (1982a) “New Methods for Analyzing Structural Models of Labor Force
Dynamics", Journal of Econometrics, 18, 115-168.
Flinn, C. and J. Heckman (1982b) “Models for the Analysis of Labor Force Dynamics”, in: R.
Basmann and G. Rhodes, eds., Advances in Econometrics, 1, 35-95.
Flinn, C. and J. Heckman (1982c) "Comment on 'Individual Effects in a Nonlinear Model: Explicit Treatment of Heterogeneity in the Empirical Job Search Literature'", unpublished manuscript, University of Chicago.
Flinn, C. and J. Heckman (1983) “The Likelihood Function for the Multistate-Multiepisode Model in
‘Models for the Analysis of Labor Force Dynamics”‘, in: R. Basmann and G. Rhodes, eds.,
Advances in Econometrics. Greenwich: JAI Press, 3.
Hartigan, J. and P. Hartigan (1985) "The Dip Test of Unimodality", The Annals of Statistics, 13(1),
70-84.
Hauser, J. R. and K. Wisniewski (1982a) “Dynamic Analysis of Consumer Response to Marketing
Strategies", Management Science, 28, 455-486.
Hauser, J. R. and K. Wisniewski (1982b) “Application, Predictive Test and Strategy Implications for a
Dynamic Model of Consumer Response”, Marketing Science, 1, 143-179.
Heckman, J. (1981a) “Statistical Models for Discrete Panel Data”, in: C. Manski and D. McFadden
eds., The Structural Analysis of Discrete Data. Cambridge: MIT Press.
Heckman, J. (1981b) “The Incidental Parameters Problem and the Problem of Initial Conditions in
Estimating a Discrete Time-Discrete Data Stochastic Process”, in: C. Manski and D. McFadden,
eds., Structural Analysis of Discrete Data with Econometric Applications. Cambridge: MIT Press,
179-197.
Heckman, J. (1974) “Shadow Prices, Market Wages and Labor Supply”, Econometrica, 42(4),
679-694.
Heckman, J. and G. Borjas (1980) “Does Unemployment Cause Future Unemployment? Definitions,
Questions and Answers from a Continuous Time Model of Heterogeneity and State Dependence”,
Economica, 47, 247-283.
Heckman, J. and B. Singer (1982) “The Identification Problem in Econometric Models for Duration
Data”, in: W. Hildenbrand, ed., Advances in Econometrics. Proceedings of World Meetings of the
Econometric Society, 1980. Cambridge: Cambridge University Press.
Heckman, J. and B. Singer (1983) “The Identifiability of Nonproportional Hazard Models”. Univer-
sity of Chicago, unpublished manuscript.
Heckman, J. and B. Singer (1984a) “The Identifiability of the Proportional Hazard Model”, Review of
Economic Studies, 51(2), 231-243.
Heckman, J. and B. Singer (1984b) “A Method for Minimizing the Impact of Distributional
Assumptions in Econometric Models for Duration Data", Econometrica, 52(2), 271-320.
Hoem, J. (1972) "Inhomogeneous Semi-Markov Processes, Select Actuarial Tables and Duration
Dependence in Demography”, in: T. Greville, ed., Population Dynamics. New York: Academic
Press, 251-296.
Hotz, J. (1983) "Continuous Time Models (CTM): A Manual". GSIA, Pittsburgh: Carnegie-Mellon University.
Jovanovic, B. (1979) “Job Matching and the Theory of Turnover”, Journal of Political Economy,
October, 87, 972-990.
Kac, M. (1959) Probability and Related Topics in the Physical Sciences. New York: Wiley.
Kalbfleisch, J. and R. Prentice (1980) The Statistical Analysis of Failure Time Data. New York: Wiley.
Kiefer, N. and G. Neumann (1981) "Individual Effects in a Nonlinear Model", Econometrica, 49(4), 965-980.
Lancaster, T. and S. Nickell (1980) "The Analysis of Reemployment Probabilities for the Unem-
ployed”, Journal of the Royal Statistical Society, Series A, 143, 141-165.
Lawless, J. F. (1982) Statistical Models and Methods for Lifetime Data. New York: Wiley.
Lewis, H. G. (1974) “Comments on Selectivity Biases in Wage Comparisons”, Journal of Political
Economy, November, 82(6), 1145-1156.
Lindsay, B. (1983a) "The Geometry of Mixture Likelihoods, Part I", Annals of Statistics, 11, 86-94.
Lindsay, B. (1983b) "The Geometry of Mixture Likelihoods, Part II", Annals of Statistics, 11(3),
783-792.
Lippman, S. and J. McCall (1976) “The Economics of Job Search: A Survey”, Economic Inquiry,
September, 14, 113-126.
Lundberg, F. (1903) "I. Approximerad Framstallning af Sannolikhetsfunktionen. II. Aterforsakring af Kollektivrisker". Uppsala: Almqvist and Wiksell.
Lundberg, S. (1981) “The Added Worker: A Reappraisal”. NBER Working Paper no. 706, Cam-
bridge, Mass.
Manski, C. and D. McFadden (1981) “Alternative Estimators and Sample Designs for Discrete Choice
Analysis”, in: C. Manski and D. McFadden, Structural Analysis of Discrete Data with Econometric
Applications. Cambridge: MIT Press, 2-50.
Manski, C. and S. Lerman (1977) “The Estimation of Choice Probabilities from Choice Based
Samples”, Econometrica, 45, 1977-1988.
McFadden, D. (1974) “Conditional Logit Analysis of Qualitative Choice Behavior”, in: P. Zarembka,
ed., Frontiers in Econometrics. New York: Academic Press.
Moore, E. and R. Pyke (1968) “Estimation of the Transition Distributions of a Markov Renewal
Process”, Annals of the Institute of Statistical Mathematics. Tokyo, 20(3), 411-424.
Neyman, J. and E. Scott (1948) “Consistent Estimates Based on Partially Consistent Observations”,
Econometrica, 16, 1-32.
Robb, R. (1984) “Two Essays on the Identification of Economic Models”. University of Chicago,
May, unpublished manuscript.
Robbins, H. (1970) "Optimal Stopping", American Mathematical Monthly, 77, 333-343.
Ross, S. M. (1970) Applied Probability Models with Optimization Applications. San Francisco: Holden-
Day.
Rudin, W. (1974) Real and Complex Analysis. New York: McGraw-Hill.
Salant, S. (1977) “Search Theory and Duration Data: A Theory of Sorts”, Quarterly Journal of
Economics, February, 91, 39-57.
Sheps, M. and J. Menken (1973) Mathematical Models of Conception and Birth. Chicago: University of
Chicago Press.
Shohat, J. and J. Tamarkin (1943) The Problem of Moments. New York: American Mathematical Society.
Singer, B. (1982) “Aspects of Nonstationarity”, Journal of Econometrics, 18(l), 169-190.
Trussell, J. and T. Richards (1985) "Correcting for Unobserved Heterogeneity in Hazard Models: An Application of the Heckman-Singer Procedure", in: N. Tuma, ed., Sociological Methodology. San Francisco: Jossey-Bass.
Ch. 30: Demand Analysis (A. Deaton)
0. Introduction
The empirical analysis of consumer behavior has always held a central position in
econometrics and many of what are now standard techniques were developed in
response to practical problems in interpreting demand data. An equally central
position in economic analysis is held by the theory of consumer behavior which
has provided a structure and language for model formulation and data analysis.
Demand analysis is thus in the rare position in econometrics of possessing long
interrelated pedigrees on both theoretical and empirical sides. And although the
construction of models which are both theoretically and empirically satisfactory is
never straightforward, no one who reads the modern literature on labor supply,
on discrete choice, on asset demands, on transport, on housing, on the consump-
tion function, on taxation or on social choice, can doubt the current vigor and
power of utility analysis as a tool of applied economic reasoning. There have been
enormous advances towards integration since the days when utility theory was
taught as a central element in microeconomic courses but then left unused by
applied economists and econometricians.
Narrowly defined, demand analysis is a small subset of the areas listed above,
referring largely to the study of commodity demands by consumers, most usually
based on aggregate data but occasionally, and more so recently, on cross-sections
or even panels of households. In this chapter, I shall attempt to take a somewhat
broader view and discuss, if only briefly, the links between conventional demand
analysis and such topics as labor supply, the consumption function, rationing,
index numbers, equivalence scales and consumer surplus. Some of the most
impressive recent econometric applications of utility theory are in the areas of
labor supply and discrete choice, and these are covered in other chapters. Even so,
a very considerable menu is left for the current meal. Inevitably, the choice of
material is my own, is partial (in both senses), and does not pretend to be a
complete survey of recent developments. Nor have I attempted to separate the
economic from the statistical aspects of the subject. The strength of consumer
demand analysis has been its close articulation of theory and evidence and the
theoretical advances which have been important (particularly those concerned
with duality) have been so precisely because they have permitted a more intimate
contact between the theory and the interpretation of the evidence. It is not
possible to study applied demand analysis without keeping statistics and economic theory simultaneously in view.
The layout of the chapter is as follows. Section 1 is concerned with utility and
the specification of demand functions and attempts to review the theory from the
objection to the assumption that preferences are convex, i.e. that for q^A ≽ q^B and for 0 ≤ λ ≤ 1, λq^A + (1 − λ)q^B ≽ q^B. This translates immediately into quasi-concavity of the utility function u(q), i.e. for q^A, q^B and 0 ≤ λ ≤ 1,

$$u(\lambda q^A + (1 - \lambda)q^B) \ge \min\{u(q^A), u(q^B)\}.$$
Figure 1. Indifference curves illustrating quasi-concavity, differentiability and essential goods.
set-valued demand functions. Empirically, flats are important because they represent
perfect substitutes; for example, between S and T on B, the precise combination
of q1 and q2 makes no difference and this situation is likely to be relevant, say,
for two varieties of the same good. Non-differentiabilities occur at the kink points
on the curves B and C. With a linear budget constraint, kinks imply that for
relative prices within a certain range, two or more goods are bought in fixed
proportions. Once again, this may be practically important and fixed relationships
between complementary goods are often a convenient and sensible modelling
strategy. The n-dimensional analogue of the utility function corresponding to C is
the fixed coefficient or Leontief utility function
$$u(q) = \min_k (q_k / a_k), \tag{2}$$

for positive parameters a₁, ..., a_n. Finally, curve A illustrates the situation where
q₂ is essential but q₁ is not. As q₂ tends to zero, its marginal value relative to that of q₁ tends to infinity along any given indifference curve. Many commonly used utility functions impose this condition, which implies that q₂ is always purchased in positive amounts. But for many goods, the behavior with respect to q₁ is a better guide; if the relative price of q₁ is high enough, the consumer on indifference curve A buys none of q₁.
Data on individual households always show that, even for quite broad commodity groups, many households do not buy all goods. It is therefore necessary to have models that can deal with this fact.
$$\frac{\partial u(q)}{\partial q_i} = \lambda p_i, \tag{3}$$

which, under the given assumptions, solve for the demand functions. Consider, for example, the Stone-Geary utility function

$$u = \prod_i (q_i - \gamma_i)^{\beta_i}, \tag{5}$$
for parameters γ and β, the first-order conditions of which are readily solved to give the demand functions

$$p_i q_i = p_i \gamma_i + \beta_i \left( x - \sum_k p_k \gamma_k \right). \tag{6}$$
In practice, the first-order conditions are rarely analytically soluble even for quite simple formulations (e.g. Houthakker's (1960) "direct addilog" u = Σ_k α_k q_k^{β_k}), nor is it at all straightforward to pass back from given demand functions to a closed form expression for the utility function underlying them, should it indeed exist.
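The linear expenditure system (6) is one of the few cases where the passage from preferences to demands is fully explicit, and it is simple to evaluate; the sketch below (with made-up parameter values) computes the expenditures p_i q_i and confirms that they add up to x:

```python
# Hedged sketch of the linear expenditure system (6): committed expenditure
# p_i*gamma_i plus a share beta_i of supernumerary outlay x - p.gamma.
import numpy as np

def les_expenditures(p, x, gamma, beta):
    """Stone-Geary demands from (5)-(6); beta must sum to one."""
    supernumerary = x - p @ gamma
    return p * gamma + beta * supernumerary

p = np.array([1.0, 2.0, 0.5])
gamma = np.array([2.0, 1.0, 4.0])             # committed quantities (illustrative)
beta = np.array([0.2, 0.5, 0.3])              # marginal budget shares (illustrative)

e = les_expenditures(p, x=20.0, gamma=gamma, beta=beta)
print(e, e.sum())                             # expenditures add up to x = 20
```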
The generic properties of demands are frequently derived from (3) by total differentiation and matrix inversion to express dq as a function of dx and dp, the so-called "fundamental matrix equation" of consumer demand analysis, see Barten (1966) originally and its frequent later exposition by Theil, e.g. (1975b, pp. 14ff), also Phlips (1974, 1983, p. 47), Brown and Deaton (1972, pp. 1160-2).
However, such an analysis requires that u(q) be twice-differentiable, and it is
usually assumed in addition that utility has been monotonically transformed so
that the Hessian is non-singular and negative definite. Neither of these last
assumptions follows in any natural way from reasonable axioms; note in particular that it is not always possible to transform a quasi-concave function by means of a monotone increasing function into a concave one, see Kannai (1977), Afriat
(1980). Hence, the methodology of working through first-order conditions in-
volves an expansive and complex web of restrictive and unnatural assumptions,
many of which preclude consideration of phenomena requiring analysis. Even in
the hands of experts, e.g. the survey by Barten and Bohm (1980) the analytical
apparatus becomes very complex. At the same time, the difficulty of solving the
conditions in general prevents a close connection between preferences and
demand, between the a priori and the empirical.
There are many different ways of representing preferences and great convenience
can be obtained by picking that which is most appropriate for the problem at
hand. For the purposes of generating empirically useable models in which
quantities are a function of prices and total expenditure, dual representations are
typically most convenient. In this context, duality refers to a switch of variables,
from quantities to prices, and to the respecification of preferences in terms of the
latter. Define the cost function, sometimes expenditure function, as the minimum outlay needed to reach utility u at prices p; at the consumer's optimum, total expenditure is exactly this minimum cost, so that

$$c(u, p) = x. \tag{8}$$

The price derivatives of the cost function are the compensated demand functions,

$$\frac{\partial c(u, p)}{\partial p_i} = h_i(u, p) = q_i. \tag{9}$$

Solving (8) for u gives

$$u = \psi(x, p), \tag{10}$$

which is known as the indirect utility function. (The original function u(q) is the direct utility function and the two are linked by the identity ψ(x, p) = u{g(x, p)} for utility maximizing demands g(x, p).) Substituting (10) into (9) yields

$$q_i = h_i(u, p) = h_i\{\psi(x, p), p\} = g_i(x, p), \tag{11}$$

the Marshallian demand functions. Conversely, substituting (8) into (9) gives the partial differential equations
$$\frac{\partial c(u, p)}{\partial p_i} = g_i\{c(u, p), p\}, \tag{12}$$
which may be solved for c(u, p) provided the mathematical integrability condi-
tions are satisfied. These turn out to be equivalent to Slutsky symmetry, so that
demand functions displaying symmetry always imply some cost function, see, for
example, Hurwicz and Uzawa (1971) for further details. If the Slutsky matrix is
also negative semi-definite (together with symmetry, the ‘economic’ integrability
condition), the cost function will be appropriately concave which it must be to
represent preferences. This possibility, of moving relatively easily between prefer-
ences and demands, is of vital importance if empirical knowledge is to be linked
to economic theory.
An alternative and almost equally straightforward procedure is to start from the indirect utility function ψ(x, p). This must be zero degree homogeneous in x and p and quasi-convex in p, and Shephard's Lemma takes the form

$$q_i = g_i(x, p) = -\frac{\partial \psi(x, p)/\partial p_i}{\partial \psi(x, p)/\partial x}. \tag{13}$$
Hence,

$$w_i = \alpha_i + \beta_i \ln x, \tag{15}$$

for parameters α and β, generally functions of prices, and this form was supported in later comparative tests by Leser (1963). From (14) the budget shares are the logarithmic derivatives of the cost function, so that (15) corresponds to differential equations of the form

$$\frac{\partial \ln c(u, p)}{\partial \ln p_i} = \alpha_i(p) + \beta_i(p) \ln c(u, p), \tag{16}$$
which solves to give

$$\ln c(u, p) = u \ln b(p) + (1 - u) \ln a(p), \tag{17}$$

where α_i(p) = (a_i ln b − b_i ln a)/(ln b − ln a) and β_i(p) = (b_i − a_i)/(ln b − ln a) for a_i = ∂ln a/∂ln p_i and b_i = ∂ln b/∂ln p_i. The form (17) gives the cost function as a utility-weighted geometric mean of the linear homogeneous functions a(p) and b(p) representing the cost functions of the very poor (u = 0) and the very rich (u = 1) respectively. Such preferences have been called the PIGLOG class by Muellbauer (1975b), (1976a), (1976b). A full system of demand equations within the Working-Leser class can be generated by suitable choice of the functions b(p) and a(p). For example, if

$$\ln a(p) = \alpha_0 + \sum_k \alpha_k \ln p_k + \tfrac{1}{2} \sum_k \sum_m \gamma^*_{km} \ln p_k \ln p_m, \qquad \ln b(p) = \ln a(p) + \beta_0 \prod_k p_k^{\beta_k}, \tag{18}$$
we reach the "almost ideal demand system" (AIDS) of Deaton and Muellbauer (1980b), viz.

$$w_i = \alpha_i + \sum_j \gamma_{ij} \ln p_j + \beta_i \ln(x/P), \tag{19}$$

where ln P is the price index defined by ln a(p) in (18) and γ_ij = ½(γ*_ij + γ*_ji). A variation on the same theme is to replace the geometric mean (17) by a mean of order ε, which generates budget shares of the form

$$w_i = \alpha_i + \beta_i x^{-\varepsilon}. \tag{21}$$
This is Muellbauer's PIGL class; equation (21), in an equivalent Box-Cox form, has recently appeared in the literature as the "generalized Working model", see
Tran van Hoa, Ironmonger, and Manning (1983) and Tran van Hoa (1983).
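A direct transcription of the AIDS shares (19), using the exact translog price index ln P built from (18), is straightforward; the parameter values below are invented but satisfy adding-up, homogeneity and symmetry by construction:

```python
# Hedged sketch of the AIDS budget shares (19) with the ln P of (18);
# all parameter values are illustrative assumptions.
import numpy as np

alpha0 = 1.0
alpha = np.array([0.4, 0.35, 0.25])               # sums to one (adding up)
beta = np.array([0.1, -0.05, -0.05])              # sums to zero
gamma = np.array([[ 0.05, -0.03, -0.02],          # symmetric, rows sum to zero
                  [-0.03,  0.06, -0.03],
                  [-0.02, -0.03,  0.05]])

def aids_shares(p, x):
    lp = np.log(p)
    lnP = alpha0 + alpha @ lp + 0.5 * lp @ gamma @ lp   # translog price index
    return alpha + gamma @ lp + beta * (np.log(x) - lnP)

w = aids_shares(p=np.array([1.2, 0.9, 1.1]), x=25.0)
print(w, w.sum())                                 # shares sum to one
```

With these restrictions the shares sum to one at any price vector and are homogeneous of degree zero in prices and total expenditure together.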
I shall return to these and similar models below, but for the moment note how
the construction of these models allows empirical knowledge of demands to be
built into the specification of preferences. This works at a less formal level too.
For example, prior information may relate to the shape of indifference curves, say
that two goods are poor substitutes or very good substitutes as the case may be.
This translates directly into curvature properties of the cost function; ‘kinks’ in
quantity space turn into ‘flats’ in price space and vice versa so that the specifica-
tion can be set accordingly. For further details, see the elegant diagrams in
McFadden (1978).
The duality approach also provides a simple demonstration of the generic
properties of demand functions which have played such a large part in the testing
of consumer rationality, see Section 2 below. The budget constraint implies
immediately that the demand functions add-up (trivially) and that they are
zero-degree homogeneous in prices and total expenditure together (since the
budget constraint is unaffected by proportional changes in p and x). Shephard’s
Lemma (9) together with the mild regularity conditions required for Young's Theorem implies that

$$\frac{\partial h_i(u, p)}{\partial p_j} = \frac{\partial^2 c(u, p)}{\partial p_i \partial p_j} = \frac{\partial h_j(u, p)}{\partial p_i}, \tag{22}$$

so that, if s_ij, the Slutsky substitution term, is ∂h_i/∂p_j, the matrix of such terms, S, is symmetric. Furthermore, since c(u, p) is a concave function of p, S must be negative semi-definite. (Note that the homogeneity of c(u, p) implies that p lies in the nullspace of S.) Of course, S is not directly observed, but it can be evaluated using (12); differentiating with respect to p_j gives the Slutsky equation
$$s_{ij} = \frac{\partial g_i}{\partial p_j} + q_j \frac{\partial g_i}{\partial x}. \tag{23}$$
Hence, to the extent that ∂g_i/∂p_j and ∂g_i/∂x can be estimated econometrically, symmetry and negative semi-definiteness can be checked. I shall come to practical attempts to do so in the next section.
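One such check can be scripted directly: differentiate a Marshallian demand system numerically (here the linear expenditure system of (6), with hypothetical parameters), assemble (23), and inspect symmetry and the eigenvalues:

```python
# Hedged sketch: build the Slutsky matrix (23) by finite differences for the
# LES demands; parameter values are hypothetical.
import numpy as np

gamma = np.array([2.0, 1.0, 4.0])
beta = np.array([0.2, 0.5, 0.3])

def g(p, x):                                   # LES quantity demands
    return gamma + beta * (x - p @ gamma) / p

def slutsky(p, x, eps=1e-6):
    n = len(p)
    S = np.empty((n, n))
    q = g(p, x)
    dgdx = (g(p, x + eps) - g(p, x - eps)) / (2 * eps)
    for j in range(n):
        dp = np.zeros(n); dp[j] = eps
        dgdpj = (g(p + dp, x) - g(p - dp, x)) / (2 * eps)
        S[:, j] = dgdpj + q[j] * dgdx          # Slutsky equation (23)
    return S

S = slutsky(np.array([1.0, 2.0, 0.5]), 20.0)
print(np.allclose(S, S.T, atol=1e-5))          # symmetry
print(np.linalg.eigvalsh(0.5 * (S + S.T)))     # eigenvalues <= 0, one near zero
```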
$$\frac{p_i q_i}{x} = \frac{\partial u / \partial \ln q_i}{\sum_k \partial u / \partial \ln q_k}, \tag{24}$$
which is the dual analogue of (14), though now determination goes from the quantities q to the normalized prices p/x. Alternatively, define the distance function d(u, q), dual to the cost function, by the implicit relation

$$u(q / d(u, q)) = u. \tag{25}$$

The distance function has properties analogous to the cost function and, in particular, its derivatives

$$\frac{\partial d(u, q)}{\partial q_i} = a_i(u, q) = \frac{p_i}{x} \tag{26}$$
are the inverse compensated demand functions relating an indifference curve u
and a quantity ray q to the price to income ratios at the intersection of q and u.
See McFadden (1978), Deaton (1979) or Deaton and Muellbauer (1980a, Chapter
2.7) for fuller discussions.
Compensated and uncompensated inverse demand functions can be used in
exactly the same way as direct demand functions and are appropriate for the
analysis of situations when quantities are predetermined and prices adjust to clear
the market. Hybrid situations can also be analysed with some prices fixed and
some quantities fixed; again see McFadden (1978) for discussion of “restricted”
preference representation functions. Note one final point, however. The Hessian
matrix of the distance function d(u, q) is the Antonelli matrix A with elements

$$a_{ij} = \frac{\partial^2 d}{\partial q_i \partial q_j} = a_{ji} = \frac{\partial a_i(u, q)}{\partial q_j}, \tag{27}$$
which can be used to define q-substitutes and q-complements just as the Slutsky
matrix defines p-substitutes and p-complements, see Hicks (1956) for the original
discussion and derivations. Unsurprisingly, the Antonelli and Slutsky matrices are intimately related and, given the close parallel between duality and matrix inversion,
$$S^* = S^* A S^*. \tag{29}$$

Similarly,

$$A = A S^* A, \tag{30}$$

and

$$A = (xS + qq')^{-1} - x^{-2} pp', \tag{31}$$
with primes denoting transposition, see Deaton (1981a). The Antonelli matrix has
important applications in measuring quantity index numbers, see, e.g. Diewert
(1981, 1983) and in optimal tax theory, see Deaton (1981a). Formula (31) allows
its calculation from an estimate of the Slutsky matrix.
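Taking the reading of (31) above at face value, the calculation is a few lines; the sketch below uses the closed form Slutsky matrix of the linear expenditure system (hypothetical parameters again) and verifies that the resulting A is symmetric with q in its null space, as the homogeneity of the distance function requires:

```python
# Hedged check of (31): A = (xS + qq')^{-1} - pp'/x^2, using the closed-form
# LES Slutsky matrix; parameters are hypothetical.
import numpy as np

p = np.array([1.0, 2.0, 0.5]); x = 20.0
gamma = np.array([2.0, 1.0, 4.0]); beta = np.array([0.2, 0.5, 0.3])

s = x - p @ gamma                                   # supernumerary expenditure
q = gamma + beta * s / p                            # LES quantity demands
S = s * (np.outer(beta / p, beta / p) - np.diag(beta / p**2))   # LES Slutsky matrix

A = np.linalg.inv(x * S + np.outer(q, q)) - np.outer(p, p) / x**2   # eq. (31)
print(np.allclose(A, A.T))         # Antonelli matrix is symmetric
print(np.round(A @ q, 12))         # q lies in the null space of A
```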
This brief review of the theory is sufficient to permit discussion of a good deal
of the empirical work in the literature. Logically, questions of aggregation and
separability ought to be treated first, but since they are not required for an
understanding of what follows, I shall postpone their discussion to Section 4.
the class

$$p_{it} q_{it} = f_i(x_t, p_t, b) + u_{it}, \tag{32}$$

for commodity i on observation t, parameter vector b, and error u_it. For the linear expenditure system the function takes the form

$$f_i(x_t, p_t, b) = p_{it} \gamma_i + \beta_i \left( x_t - \sum_k p_{kt} \gamma_k \right). \tag{33}$$
2.1. Simultaneity
where the γ_iτ and β_iτ parameters are now specific to periods (needs vary over the life-cycle), W is the current present discounted value of present and future income and current financial assets, and p*_{τk} is the current discounted price of good k in future period τ (p*_{tk} = p_{tk}, since t is the present). As with any such
can be solved for x, by summing the left-hand side over i and the result, i.e. the
consumption function, used to substitute for W. Hence (34) implies the familiar
$$\mathrm{cov}(x_t, u_{it}) = \sum_k \sigma_{ik} - \beta_i \sum_k \sum_m \sigma_{km}, \tag{36}$$

where σ_ij is the (assumed constant) covariance between u_it and u_jt, i.e. E(u_it u_js) = σ_ij δ_ts, where δ_ts is the Kronecker delta. Clearly, the covariance in (36) is zero if Σ_k σ_ik / Σ_k Σ_m σ_km = β_i. One specialized theory which produces exactly this relationship is Theil's (1971b, 1974, 1975a, 1975b, pp. 56-90, 1979) "rational random behaviour", under which the variance-covariance matrix of the errors u_t is rendered proportional to the Slutsky matrix by consumers' trading off the costs of
exact maximization against the utility losses of not doing so. If this model is
correct, there is no simultaneity bias, see Deaton (1975a, pp. 161-8) and Theil
(1976, pp. 4-6, 80-82) for applications. However, most econometricians would
tend to view the error terms as reflecting, at least in part, those elements not
allowed for by the theory, i.e. misspecifications, omitted variables and the like.
Even so, it is not implausible that (36) should be close to zero since the
requirement is that error covariances between each category and total expenditure
should be proportional to the marginal propensity to spend for that good. This is
a type of “error separability” whereby omitted variables influence demands in
much the same way as does total outlay.
In general, simultaneity will exist and the issue deserves to be taken seriously; it
is likely to be particularly important in cross-section work, where occasional large
purchases affect both sides of the Engel curve. Ignoring it may also bias the other
tests discussed below, see Attfield (1985).
The second problem arises from the fact that with x_t defined as the sum of expenditures, expenditures automatically add up to total expenditure identically, i.e. without error. Hence, provided f_i in (32) is properly chosen, we must have

(41)
where u_(n) is the (n − 1)-vector of u_it excluding element n. Barten defines a new non-singular matrix V by

where ι is the normalized vector of units, i.e. ι_i = 1/n, and 0 < κ < ∞. Then (41) may be shown to be equal to

for serially uncorrelated errors ε_t. If R is the diagonal matrix of ρ_i's, (44) implies that

$$\Omega = R \Omega R + \Sigma, \tag{45}$$

where Σ is the covariance matrix of the ε_t, consistent with
autoregressive structures, as, for example, in Guilkey and Schmidt (1973) and
Anderson and Blundell (1982). But provided autocorrelation is handled in a way
that respects the singularity (as it should be), so that the omitted equation is not
implicitly treated differently from the others, then it will always be correct to
estimate by dropping one equation since all the relevant information is contained
in the other (n - 1).
2.3. Estimation
(46)

where ω^{kl} is the (k, l)th element of Ω⁻¹. These equations also define the linear or non-linear GLS estimator. Since Ω is usually unknown, it can be replaced by its maximum likelihood estimator

$$\hat{\Omega} = T^{-1} \sum_t \hat{u}_t \hat{u}_t'. \tag{48}$$

If ω̂_ij replaces ω_ij in (47), and (47) and (48) are solved simultaneously, β̂ and Ω̂ are the full-information maximum likelihood estimators (FIML). Alternatively, some consistent estimator of β can be used in place of β̂ in (48) and the resulting Ω̂ used in (47); the resulting estimates of β will be asymptotically equivalent to FIML. Zellner's (1962) seemingly unrelated regression technique falls in this class,
see also Gallant (1975) and the survey by Srivastava and Dwivedi (1979) for
variants. Consistency of estimation of β in (47) is unaffected by the choice of Ω; the MLE's of β and Ω are asymptotically independent, as calculation of the information matrix will show. All this is standard enough, except possibly for computation, but standard algorithms such as those of Marquardt (1963), scoring, Berndt, Hall, Hall and Hausman (1974), Newton-Raphson and Gauss-Newton all work well for these models, see Quandt (1984) in this Handbook for a survey. Note also Byron's (1982) technique for estimating very large symmetric systems.
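For the linear case the whole procedure can be sketched compactly: start from OLS, form Ω̂ as in (48), and iterate the GLS step to convergence. The data below are simulated, and since the example imposes no cross-equation restrictions and uses a common X, the iteration simply reproduces OLS, the closed-form special case discussed later in this section:

```python
# Hedged sketch of iterated feasible GLS / SURE for a linear system with one
# equation dropped; simulated data, illustrative dimensions.
import numpy as np

rng = np.random.default_rng(1)
T, k, n1 = 40, 3, 2                       # T obs, k regressors, n-1 equations
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
B_true = rng.normal(size=(k, n1))
Y = X @ B_true + rng.normal(size=(T, n1)) @ np.array([[0.5, 0.2], [0.0, 0.4]])

B = np.linalg.lstsq(X, Y, rcond=None)[0]  # OLS start, consistent
for _ in range(20):
    U = Y - X @ B
    Omega = U.T @ U / T                   # error covariance estimate, as in (48)
    W = np.linalg.inv(Omega)
    XtX = np.kron(W, X.T @ X)             # stacked GLS normal equations
    Xty = (X.T @ Y @ W).T.reshape(-1)     # right-hand side, stacked by equation
    B = np.linalg.solve(XtX, Xty).reshape(n1, k).T

print(np.allclose(B, np.linalg.lstsq(X, Y, rcond=None)[0]))   # True: GLS = OLS here
```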
Nevertheless, there are a number of problems, particularly concerned with the estimation of the covariance matrix Ω, and these may be severe enough to make the foregoing estimators undesirable, or even infeasible. Taking feasibility first, note that the estimated covariance matrix Ω̂ given by (48) is the mean of T matrices each of rank 1, so that its rank cannot be greater than T. In consequence, systems for which (n − 1) > T cannot be estimated by FIML or SURE if the inverse of the estimated Ω̂ is required. Even this underestimates the problem. In
the linear case (e.g. the Rotterdam system considered below) the demand system
becomes the classical multivariate regression model
$$Y = XB + U, \tag{49}$$

with Y a (T × (n − 1)) matrix, X a (T × k) matrix, B (k × (n − 1)) and U (T × (n − 1)). (The nth equation has been dropped.) The estimated variance-covariance matrix from (48) is then

$$\hat{\Omega} = T^{-1} Y'\left(I - X(X'X)^{-1}X'\right)Y.$$

Now the idempotent matrix in brackets has rank (T − k), so that the inverse will not exist if n − 1 > T − k. Since X is likely to contain at least n + 2 variables
(prices, the budget and a constant), an eight commodity system would require at
least 19 observations. Non-linearities and cross-section restrictions can improve
matters, but they need not. Consider the following problem, first pointed out to
me by Teun Kloek. The AIDS system (19) illustrates most simply, though the
problem is clearly a general one. Combine the two parts of (19) into a single set of
equations,
This gives a total of (2 + n)(n − 1) parameters ((n − 1) α's, (n − 1) β's, and n(n − 1) γ's), or (n + 2) per equation as in the previous example. But now, each equation has 2 + (n − 1)n parameters since all the γ's always appear. In consequence, if the constant, ln x, ln p, and the cross-terms are linearly independent in the sample, and if T < 2 + (n − 1)n, it is possible to choose parameters such that the calculated residuals for any one (arbitrarily chosen) equation will be exactly zero for all sample points. For these parameters, one row and one column of the estimated Ω̂ will also be zero, its determinant will be zero, and the log likelihood (41) or (43) will be infinite. Hence full information MLE's do not exist. In such a case, at least 56 observations would be necessary to estimate an 8 commodity disaggregation.
All these cases are variants of the familiar “ undersized sample” problem in FIML
estimation of simultaneous equation systems and they set upper limits to the
amount of commodity disaggregation that can be countenanced on any given
time-series data.
Given a singular variance-covariance matrix, for whatever reason, the log likelihood (41), which contains the term −T/2 log det Ω, will be infinitely large and FIML estimates do not exist. Nor, in general, can (47) be used to calculate GLS or SURE estimators if a singular estimate of Ω is employed. However, there are a number of important special cases in which (47) has solutions that can be evaluated even when Ω̂ is singular (though it is less than clear what is the status of these estimators). For example, in the classical multivariate regression model (49) the solution to (47) is the OLS matrix estimator B̂ = (X'X)⁻¹X'Y, which does not involve Ω, see e.g. Goldberger (1964, pp. 207-12). Imposing identical within equation restrictions on (49), e.g. homogeneity, produces another (restricted) classical model with the same property. With cross-equation restrictions of the form Rβ = r, e.g. symmetry, for the stacked vector β, the solution to (47) is

which, though involving Ω, can still be calculated with Ω singular provided the matrix in square brackets is non-singular. I have not been able to find the general conditions on (47) that allow solutions of this form, nor is it clear that it is important to do so. General non-linear systems will not be estimable on undersized samples, and except in the cases given where closed-form solutions exist, attempts to solve (47) and (48) numerically will obviously fail.
The important issue, of course, is the small sample performance of estimators based on near-singular or singular estimates of Ω. In most time series applications with more than a very few commodities, Ω̂ is likely to be a poor estimator of Ω, and the introduction of very poor estimates of Ω into the procedure for parameter estimation is likely to give rise to extremely inefficient estimates of the latter. Paradoxically, the search for (asymptotic) efficiency is likely to lead, in this case,
(53)

where ω^{kl} is the (k, l)th element of Ω⁻¹, so that {A(Ω)}⁻¹ is the conventionally used (asymptotic) variance-covariance matrix of the FIML estimates β̂ from (47). Define also B(β, Ω) by

(54)

(55)
It is perhaps not surprising that authors who finally surmounted the obstacles in
the way of estimating systems of demand equations should have professed
themselves satisfied with their hard won results. Mountaineers are not known for
criticising the view from the summit. And certainly, models such as the linear
expenditure system, or which embody comparably strong assumptions, yield very
high R² statistics for expenditures or quantities with t-values that are usually
closer to 10 than to unity. Although there are an almost infinite number of studies
using the linear expenditure system from which to illustrate, almost certainly the
most comprehensive is that by Lluch, Powell and Williams (1977) who fit the
model (or a variant) to data from 17 developed and developing countries using an
eightfold disaggregation of commodities. Of the 134 R² statistics reported (for 2 countries 2 of the groups were combined), 40 are greater than 0.99, 104 are greater than 0.95, and only 14 are below 0.90 (Table 3.9, p. 49). The parameter estimates nearly all "look sensible" and conform to theoretical restrictions, i.e. marginal
propensities to consume are positive yielding, in the case of the linear expenditure
system, a symmetric negative semi-definite Slutsky matrix. However, as is almost
invariably the case with the linear expenditure system, the estimated residuals
display substantial positive autocorrelation. Table 3.10 in Lluch, Powell and
Williams displays Durbin-Watson statistics for all countries and commodities: of
the 134 ratios, 60 are less than 1.0 and only 15 are greater than 2.0. Very similar
results were found in my own, Deaton (1975a), application of the linear expendi-
ture system to disaggregated expenditures in post-war Britain. Such results
suggest that the explanatory power of the model reflects merely the common
upward time trends in individual and total expenditures. The estimated β parameters in (33), the marginal propensities to consume, will nevertheless be
sensible, since the model can hardly fail to reflect the way in which individual
expenditures evolve relative to their sum over the sample as a whole. Obtaining
sensible estimates of marginal propensities to spend on time-series data is not an
onerous task. Nevertheless, the model singularly fails to account for variations
around trend, the high R² statistics could be similarly obtained by replacing total expenditure by virtually any trending variable, and the t-values are likely to be grossly overestimated in the presence of the very severe autocorrelation, see, e.g.
Malinvaud (1970, pp. 521-2) and Granger and Newbold (1974). In such cir-
cumstances, the model is almost certainly a very poor approximation to whatever
process actually generated the data and should be abandoned in favor of more
appropriate alternatives. It makes little sense to “treat” the autocorrelation by
transforming the residuals by a Cochrane-Orcutt type technique, either based on
(44) with a common parameter, or using a full vector autoregressive specification.
[See Hendry (1980) for some of the consequences of trying to do so in similar
situations.]
In spite of its clear misspecifications, there may nevertheless be cases where the
linear expenditure system or a similar model may be the best that can be done.
Because of its very few parameters, (2n - 1) for an n commodity system, it can be
estimated in situations (such as the LDC’s in Lluch, Powell and Williams book)
where data are scarce and less parsimonious models cannot be used. In such
situations, it will at the least give a theoretically consistent interpretation of the
data, albeit one that is probably wrong. But in the absence of alternatives, this
may be better than nothing. Even so, it is important that such applications be
seen for what they are, i.e. untested theory with “sensible” parameters, and not as
fully-tested data-consistent models.
The immediately obvious problem with the linear expenditure system is that it has
too few parameters to give it a reasonable chance of fitting the data. Referring
back to (33) and dividing through by p_i, it can be seen that the γ_i parameters are
essentially intercepts and that, apart from them, there is only one free parameter
per equation. Essentially, the linear expenditure system does little more than fit
bivariate regressions between individual expenditures and their total. Of course,
the prices also enter the model but all own- and cross-price effects must also be
allowed for within the two parameters per equation, one of which is an intercept.
Clearly then, in interpreting the results from such a model, for example, total
expenditure elasticities, own and cross-price elasticities, substitution matrices, and
so on, there is no way to sort out which numbers are determined by measurement
and which by assumption. Certainly, econometric analysis requires the applica-
tion of prior reasoning and theorizing. But it is not helped if the separate
influences of measurement and assumption cannot be practically distinguished.
Such difficulties can be avoided by the use of what are known as “flexible
functional forms,” Diewert (1971). The basic idea is that the choice of functional
form should be such as to allow at least one free parameter for the measurement
of each effect of interest. For example, the basic linear regression with intercept is
a flexible functional form. Even if the true data generation process is not linear,
the linear model without parameter restrictions can offer a first-order Taylor
approximation around at least one point. For a system of (n - 1) independent
demand functions, (n - 1) intercepts are required, (n - 1) parameters for the total
expenditure effects and n(n − 1) for the effects of the n prices. Barnett (1983b) offers a useful discussion of how Diewert's definition relates to the standard mathematical notions of approximation.
Flexible functional form techniques can be applied either to demand functions
or to preferences. For the former, take the differential of (9) around some
so that writing dq_i = q_i d ln q_i and multiplying (56) by p_i/x, the approximation becomes
where

$$a_i = p_i h_{iu}/x, \qquad c_{ij} = p_i s_{ij} p_j / x. \tag{59}$$
Eq. (58), with a_i, b_i and c_ij parametrized, is the Rotterdam system of Barten (1966), (1967), (1969) and Theil (1965), (1975b), (1976). It clearly offers a local first-order approximation to the underlying relationship between q, x and p.
There is, of course, no guarantee that a function h_i(u, p) exists which has a_i, b_i and c_ij constant. Indeed, if it did, Young's theorem gives h_{iuj} = h_{iju} which, from (59), is easily seen to hold only if c_ij = −(δ_ij b_i − b_i b_j). If imposed, this restriction would remove the system's ability to act as a flexible functional form. (In fact, the restriction implies unitary total expenditure and own-price elasticities.) Contrary to assertions by Phlips (1974, 1983), Yoshihara (1969), Jorgenson and Lau (1976) and others, this only implies that it is not sensible to impose the restriction; it does not affect the usefulness of (58) for approximation and study of the true demands via the approximation, see also Barten (1977) and Barnett (1979b).
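Constructing the variables of the Rotterdam system (58) from raw data is mechanical, which is part of the system's appeal; the sketch below (simulated data, hypothetical dimensions) builds the dependent variables w̄_i Δln q_i and the Divisia volume index, and runs the implied linear regressions:

```python
# Hedged sketch of the Rotterdam data construction: average budget shares
# times log-quantity changes on the left, Divisia volume index and log price
# changes on the right. The data are simulated for illustration only.
import numpy as np

rng = np.random.default_rng(2)
T, n = 30, 3
lnp = np.cumsum(rng.normal(0, 0.02, size=(T, n)), axis=0)   # log prices
lnq = np.cumsum(rng.normal(0, 0.03, size=(T, n)), axis=0)   # log quantities
x = np.exp(lnp + lnq).sum(axis=1)                           # total expenditure
w = np.exp(lnp + lnq) / x[:, None]                          # budget shares

wbar = 0.5 * (w[1:] + w[:-1])                # average shares between periods
dlq = np.diff(lnq, axis=0)
dlp = np.diff(lnp, axis=0)
lhs = wbar * dlq                             # dependent variables w_i dln q_i
divisia = (wbar * dlq).sum(axis=1)           # Divisia index sum_k wbar_k dln q_k

Z = np.column_stack([divisia, dlp])          # regressors: index and price changes
b_and_c = np.linalg.lstsq(Z, lhs, rcond=None)[0]   # row 0: b_i; rows 1..n: c_ij
print(b_and_c)
```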
Flexible functional forms can also be constructed by approximating preferences rather than demands. By Shephard's Lemma, an order of approximation in prices (or quantities), but not in utility, is lost by passing from preferences to demands, so that in order to guarantee a first-order linear approximation in the latter, second-order approximation must be guaranteed in preferences. Beyond
that, one can freely choose to approximate the direct utility function, the indirect
utility function, the cost-function or the distance function provided only that the
appropriate quasi-concavity, quasi-convexity, concavity and homogeneity restric-
tions are observed. The best known of these approximations is the trunslog,
Sargan (1971) Christensen, Jorgenson and Lau (1975) and many subsequent
applications. See in particular Jorgenson, Lau and Stoker (1982) for a comprehen-
sive treatment. The indirect translog gives a quadratic approximation to the
indirect utility function ψ*(r) for normalized prices, and then uses (14) to derive the system of share equations. The forms are

ln ψ*(r) = α_0 + Σ_k α_k ln r_k + ½ Σ_i Σ_j β*_ij ln r_i ln r_j,   (60)

w_i = (α_i + Σ_j β_ij ln r_j) / (Σ_k α_k + Σ_k Σ_j β_kj ln r_j),   (61)

where β_ij = ½(β*_ij + β*_ji). In estimating (61), some normalization is required, e.g. that Σ_k α_k = 1. The direct translog approximates the direct utility function as a quadratic in the logarithms of q and yields an equation of the same form as (61) with w_i on the left-hand side but with q_i replacing r_i on the right. Hence, while (61)
views the budget share as being determined by quantity adjustment to exogenous
price to outlay ratios, the direct translog views the share as adapting by prices
adjusting to exogenous quantities. Each could be appropriate under its own
assumptions, although presumably not on the same set of data. Yet another
flexible functional form with close affinities to the translog is the second-order
approximation to the cost function offered by the AIDS, eqs. (17), (18) and (19)
above. Although the translog considerably predates the AIDS, the latter is a good
deal simpler to estimate, at least if the price index In P can be adequately
approximated by some fixed pre-selected index.
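A sketch of the "linearly approximated" estimation strategy, in which ln P is replaced by the Stone index Σ_k w_k ln p_k, might run as follows; the names are illustrative and the sketch ignores the simultaneity between shares and the index.

    import numpy as np

    def la_aids_ols(w, p, x):
        # w, p: (T, n) budget shares and prices; x: (T,) outlays.
        lnp = np.log(p)
        stone = (w * lnp).sum(axis=1)             # Stone index: ln P* = sum_k w_k ln p_k
        X = np.column_stack([np.ones(len(x)), lnp, np.log(x) - stone])
        coef, *_ = np.linalg.lstsq(X, w, rcond=None)
        alpha, gamma, beta = coef[0], coef[1:-1].T, coef[-1]
        return alpha, gamma, beta     # w_i = alpha_i + sum_j gamma_ij ln p_j + beta_i ln(x/P*)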
The AIDS and translog models yield demand functions that are first-order
flexible subject to the theory, i.e. they automatically possess symmetric substitu-
tion matrices, are homogeneous, and add up. However, trivial cases apart, the
AIDS cost function will not be globally concave nor the translog indirect utility
function globally convex, though they can be so over a restricted range of r (see
below). The functional forms for both systems are such that, by relaxing certain
restrictions, they can be made first-order flexible without theoretical restrictions,
as is the Rotterdam system. For example, in the AIDS, eq. (19), the restrictions γ_ij = γ_ji and Σ_j γ_ij = 0 can be relaxed while, in the indirect translog, eq. (61), β_ij = β_ji can be relaxed and ln x included as a separate variable without necessarily assuming that its coefficient equals −Σ_j β_ij. Now, if the theory is correct,
and the flexible functional form is an adequate representation of it over the data,
the restrictions should be satisfied, or at least not significantly violated. Similarly,
for the Rotterdam system, if the underlying theory is correct, it might be expected
that its approximation by (58) would estimate derivatives conforming to the
theoretical restrictions. From (59), homogeneity requires Σ_j c_ij = 0 and symmetry c_ij = c_ji. Negative semi-definiteness of the Slutsky matrix can also be imposed
(globally for the Rotterdam model and at a point for the other models) following
the work of Lau (1978) and Barten and Geyskens (1975).
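One way to read the Lau/Barten-Geyskens device is as a reparametrization: negative semi-definiteness holds by construction if the matrix of substitution coefficients is written as minus the product of a lower-triangular matrix with its own transpose. A schematic sketch, with illustrative names:

    import numpy as np

    def negative_semidefinite_C(theta, n):
        # Build C = -L L' from n(n+1)/2 free parameters, so that any
        # estimate of C is negative semi-definite by construction.
        L = np.zeros((n, n))
        L[np.tril_indices(n)] = theta
        return -L @ L.T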
The AIDS, translog, and Rotterdam models far from exhaust the possibilities
and many other flexible functional forms have been proposed. Quadratic logarith-
mic approximations can be made to distance and cost functions as well as to
utility functions. The direct quadratic utility function u = (q - a)‘A(q - a) is
clearly flexible, though it suffers from other problems such as the existence of
"bliss" points, see Goldberger (1967). Diewert (1973b) suggested that ψ*(r) be
approximated by a "Generalized Leontief" model,

ψ*(r) = [δ_0 + Σ_i δ_i r_i^(1/2) + Σ_i Σ_j γ_ij r_i^(1/2) r_j^(1/2)]^(−1).   (62)

This has the nice property that it is globally quasi-convex if δ_i ≥ 0 and γ_ij ≥ 0 for all i, j; it also generalizes Leontief since, with δ_0 = δ_i = 0 and γ_ij = 0 for i ≠ j, ψ*(r) is the indirect utility function corresponding to the Leontief preferences (2).
Berndt and Khaled (1979) have, in the production context, proposed a further generalization of (62) in which the exponent ½ is replaced by a parameter, the "generalized Box-Cox" system.
There is now a considerable body of literature on testing the symmetry and
homogeneity restrictions using the Rotterdam model, the translog, or these other
approximations, see, e.g. Barten (1967), (1969), Byron (1970a), (1970b), Lluch (1971), Parks (1969), Deaton (1974a), (1978), Deaton and Muellbauer (1980b), Theil (1971a), (1975b), Christensen, Jorgenson and Lau (1975), Christensen and Manser (1977), Berndt, Darrough and Diewert (1977), Jorgenson and Lau (1976), and Conrad and Jorgenson (1979). Although there is some variation in results
through different data sets, different approximating functions, different estimation
and testing strategies, and different commodity disaggregations, there is a good
deal of accumulated evidence rejecting the restrictions. The evidence is strongest
for homogeneity, with less (or perhaps no) evidence against symmetry over and
above the restrictions embodied in homogeneity. Clearly, for any one model, it is
impossible to separate failure of the model from failure of the underlying theory,
but the results have now been replicated frequently using many different func-
tional forms, so that it seems implausible that an inappropriate specification is at
the root of the difficulty. There are many possible substantive reasons why the
theory as presented might fail, and I shall discuss several of them in subsequent
sections. However, there are a number of arguments questioning this sort of
procedure for testing. One is a statistical issue, and questions have been raised
about the appropriateness of standard statistical tests in this context; I deal with
these matters in the next subsection. The other arguments concern the nature of
flexible functional forms themselves.
Empirical work by Wales (1977), Thursby and Lovell (1978), Griffin (1978), Berndt and Khaled (1979), and Guilkey and Lovell (1980) casts doubt on the
ability of flexible functional forms both to mimic the properties of actual
preferences and technologies, and to behave “regularly” at points in price-outlay
space other than the point of local approximation (i.e. to generate non-negative,
downward sloping demands). Caves and Christensen (1980) investigated theoreti-
cally the global properties of the (indirect) translog and the generalized Leontief
forms. For a number of two and three commodity homothetic and non-homo-
thetic systems, they set the parameters of the two systems to give the same pattern
of budget shares and substitution elasticities at a point in price space, and then
mapped out the region for which the models remained regular. Note that
regularity is a mild requirement; it is a minimal condition and does not by itself
suggest that the system is a good approximation to true preferences or behavior.
It is not possible here to reproduce Caves and Christensen’s diagrams, nor do the
authors give any easily reproducible summary statistics. Nevertheless, although
both systems can do well (e.g. when substitutability is low so that preferences are
close to Leontief, the GL is close to globally regular, and similarly for the translog
when preferences are close to Cobb-Douglas), there are also many cases where
the regular regions are worryingly small. Of course, these results apply only to the
translog and the GL systems, but I see no reason to suppose that similar problems
would not occur for the other flexible functional forms discussed above.
These results raise questions as to whether Taylor series approximations, upon
which most of these functional forms are based, are the best type of approxima-
tions to work with, and there has been a good deal of recent activity in exploring
alternatives. Barnett (1983a) has suggested that Laurent series expansions are a
useful avenue to explore. The Laurent expansion of a function f(x) around the
point x_0 takes the form

f(x) = Σ_{n=−∞}^{∞} c_n (x − x_0)^n,   (63)

where v_i = r_i^(1/2) and ζ_i = r_i^(−1/2). The resulting demand system has too many parameters to be estimated in most applications, and has more than it needs to be
where D is the sum over i of the bracketed expression. Barnett calls this the
miniflex Laurent model. The squared terms guarantee non-negativity, but are
likely to cause problems with multiple optima in estimation. Barnett and Lee (1983) present results comparable to those of Caves and Christensen's which
suggest that the miniflex Laurent has a substantially larger regular region than
either translog or GL models.
A more radical approach has been pioneered by Gallant, who has shown how to approximate indirect utility functions using Fourier series; see Gallant (1981) and Gallant and Golub (1983). Interestingly, Gallant replicates the
Christensen, Jorgenson and Lau (1975) rejection of the symmetry restriction,
suggesting that their rejection is not caused by the approximation problems of the
translog. Fourier approximations are superior to Taylor approximations in a
number of ways, not least in their ability to keep their approximating qualities in
the face of the separability restrictions discussed in Section 4 below. However,
they are also heavily parametrized and superior approximation may be being
purchased at the expense of low precision of estimation of key quantities. Finally,
many econometricians are likely to be troubled by the sinusoidal behavior of
fitted demands when projected outside the region of approximation. There is
something to be said for using approximating functions that are themselves
plausible for preferences and demands.
The whole area of flexible functional forms is one that has seen enormous
expansion in the last five years and perhaps the best results are still to come. In
particular, other bases for spanning function space are likely to be actively
explored, see, e.g. Barnett and Jonas (1983).
2.6. Testing homogeneity and symmetry

The principles involved are most simply discussed within a single model and for
convenience I shall use the Rotterdam system written in the form, i = 1,…,(n − 1),

w_i dln q_it = α_i + b_i dln x̄_t + Σ_j γ_ij dln p_jt + u_it,   (66)
where dln x̄_t is an abbreviated form of the term in (58) and, in practice, the differentials would be replaced by finite approximations, see Theil (1975b, Chapter 2) for details. I shall omit the nth equation as a matter of course, so that Ω stands for the (n − 1) × (n − 1) variance-covariance matrix of the u_t's. The u_t vectors are assumed to be identically and independently distributed as N(0, Ω). I shall discuss the testing of two restrictions: homogeneity, Σ_j γ_ij = 0, and symmetry, γ_ij = γ_ji.
Equation (66) is in the classical multivariate regression form (49), so equation by equation OLS yields the SURE and FIML estimates. Let β̂ be the stacked vector of OLS estimates and Ω̂ the unrestricted estimate of the variance-covariance matrix (50). If the matrix of unrestricted residuals Y − XB̂ is denoted by Ê, (50) takes the form

Ω̂ = T^(−1) Ê′Ê.

Homogeneity can be imposed equation by equation: use Σ_j γ_ij = 0 to replace the price changes dln p_jt by relative changes dln(p_jt/p_nt)
and re-estimate. Once again OLS is SURE is FIML and the restriction can be
tested equation by equation using standard text-book F-tests. These are exact
tests and no problems of asymptotic approximation arise. For examples, see
Deaton and Muellbauer’s (1980b) rejections of homogeneity using AIDS. If an
overall test is desired, a Hotelling T² test can be constructed for the system as a whole, see Anderson (1958, pp. 207-10) and Laitinen (1978). Laitinen also documents the divergence between Hotelling's T² and its limiting χ² distribution when the sample size is small relative to the number of goods, see also Evans and Savin (1982). In consequence, homogeneity should always be tested using exact F or T² statistics and never using asymptotic test statistics such as uncorrected
Wald, likelihood ratio, or Lagrange multiplier tests. However, my reading of the
literature is that the rejection of homogeneity in practice tends to be confirmed
using exact tests and is not a statistical illusion based on the use of inappropriate
asymptotics.
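Since homogeneity is a within-equation linear restriction, the exact test is the standard comparison of restricted and unrestricted residual sums of squares; a minimal sketch for one Rotterdam equation, with illustrative names, is:

    import numpy as np
    from scipy import stats

    def homogeneity_F_test(y, dlxbar, dlp):
        # y: (T,) values of w_i dln q_i; dlxbar: (T,); dlp: (T, n).
        T, n = dlp.shape
        X_u = np.column_stack([np.ones(T), dlxbar, dlp])
        # impose sum_j c_ij = 0 by measuring prices relative to the n-th
        X_r = np.column_stack([np.ones(T), dlxbar, dlp[:, :-1] - dlp[:, [-1]]])
        rss = lambda X: ((y - X @ np.linalg.lstsq(X, y, rcond=None)[0]) ** 2).sum()
        F = (rss(X_r) - rss(X_u)) / (rss(X_u) / (T - n - 2))
        return F, stats.f.sf(F, 1, T - n - 2)    # statistic and exact p-value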
Testing symmetry poses much more severe problems since the presence of the
cross-equation restrictions makes estimation more difficult, separates SUR from
FIML estimators and precludes exact tests. Almost certainly the simplest testing
procedure is to use a Wald test based on the unrestricted (or homogeneous)
estimates. Define R as the ½n(n − 1) × (n − 1)(n + 2) matrix representing the symmetry restrictions, so that symmetry requires Rβ = 0; the Wald test statistic is then

W₁ = β̂′R′[R{Ω̂ ⊗ (X′X)^(−1)}R′]^(−1)Rβ̂.   (70)
Ω̃ = T^(−1) Ẽ′Ẽ,   (71)

where Ẽ is the matrix of restricted residuals.
The new estimate of Ω can be substituted into (52) and iterations continued to convergence, yielding the FIML estimators of β and Ω. Assume that this process has been carried out and that (at the risk of some notational confusion) β̃ and Ω̃ are the final estimates. A likelihood ratio test can then be computed according to

W₂ = T ln{det Ω̃/det Ω̂},   (72)

and, since

W₁ ≥ W₂,   (73)

we know that rejections using W₂ will be less frequent, but it was still found that, with n large relative to (T − k), both W₁ and W₂ grossly over-rejected.
These problems for testing symmetry are basically the same as those discussed
for estimation in (2.3) above; typical time series are not long enough to give
reliable estimates of the variance-covariance matrix, particularly for large sys-
tems. For estimation, and for the testing of within equation restrictions, the
difficulties can be circumvented. But for testing cross-equation restrictions, such
as symmetry, the problem remains. For the present, it is probably best to suspend
judgment on the existing tests of symmetry (positive or negative) and to await
theoretical or empirical developments in the relevant test statistics. [See Byron
and Rosalsky (1984) for a suggested ad hoc size correction that appears to work
well in at least some situations.]
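For reference, the Wald statistic (70), as reconstructed above, is mechanical to compute once the restriction matrix has been built; the sketch below (illustrative names; the construction of R is left to the reader) also makes plain why a poorly estimated Ω̂ contaminates the test.

    import numpy as np

    def wald_statistic(beta_hat, omega_hat, XtX_inv, R):
        # beta_hat: stacked OLS estimates; omega_hat: (n-1, n-1) residual
        # covariance; XtX_inv: (X'X)^{-1}; R: restriction matrix, R beta = 0.
        V = np.kron(omega_hat, XtX_inv)        # covariance of the stacked estimates
        Rb = R @ beta_hat
        return Rb @ np.linalg.solve(R @ V @ R.T, Rb)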
2.7. Non-parametric demand analysis

All the techniques of demand analysis so far discussed share a common approach
of attempting to fit demand functions to the observed data and then enquiring as
to the compatibility of these fitted functions with utility theory. If unlimited
experimentation were a real possibility in economics, demand functions could be
accurately determined. As it is, however, what is observed is a finite collection of
pairs of quantity and price vectors. It is thus natural to argue that the basic
question is whether or not these observed pairs are consistent with any preference
ordering whatever, bypassing the need to specify particular demands or prefer-
ences. It may well be true that a given set of data is perfectly consistent with
utility maximization and yet be very poorly approximated by AIDS, the translog,
the Rotterdam system or any other functional form which the limited imagination
of econometricians is capable of inventing.
Non-parametric demand analysis takes a direct approach by searching over the
price-quantity vectors in the data for evidence of inconsistent choices. If no such inconsistencies exist, a utility function exists and algorithms are available for constructing it (or at least one out of the many possible). The origins of this type of analysis go back to
Samuelson’s (1938) introduction of revealed preference analysis. However, the
recent important work on developing test criteria is due to Hanoch and Rothschild
(1972) and especially to Afriat (1967), (1973), (1976), (1977) and (1981). Unfor-
tunately, some of Afriat’s best work has remained unpublished and the published
work has often been difficult for many economists to understand and assimilate.
However, as the techniques involved have become more widespread in economics,
other workers have taken up the topic, see the interpretative essays by Diewert
(1973a) and Diewert and Parkan (1978) -the latter contains actual test results-and
also the recent important work by Varian (1982, 1983).
Afriat proposes that a finite set of data be described as cyclically consistent if, for any "cycle" a, b, c,…, r, a of indices, the inequalities p^a·q^a ≥ p^a·q^b, p^b·q^b ≥ p^b·q^c,…, p^r·q^r ≥ p^r·q^a can hold only as equalities; cyclical consistency is equivalent to the data being consistent with utility maximization. For two periods, the condition requires that the Laspeyres price index be no less than the Paasche index,

p¹·q⁰/p⁰·q⁰ ≥ p¹·q¹/p⁰·q¹.   (74)
For many periods simultaneously, Afriat (1981) shows that the Laspeyres index
between any two periods i and j, say, should be no less than the chain-linked
Paasche index obtained by moving from i to j in any number of steps. Given that
no one using any parametric form has ever suggested that all total expenditure
elasticities are unity, it comes as something of a surprise that the Afriat condition
appears to be acceptable for an 111 commodity disaggregation of post-war U.S.
data, see Manser and McDonald (1984).
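Checking a data set for consistency is computationally trivial; the sketch below (illustrative, following the spirit of Varian's (1982) GARP rather than Afriat's original formulation) computes the revealed preference relation, its transitive closure by Warshall's algorithm, and any violating pairs.

    import numpy as np

    def garp_violations(p, q):
        # p, q: (T, n) observed prices and quantities.
        cost = p @ q.T                       # cost[s, t] = p_s . q_t
        own = np.diag(cost).copy()
        R = own[:, None] >= cost             # direct revealed preference
        T = len(own)
        for k in range(T):                   # Warshall transitive closure
            R |= R[:, [k]] & R[[k], :]
        strict = own[:, None] > cost         # strictly cheaper comparisons
        return [(s, t) for s in range(T) for t in range(T)
                if R[s, t] and strict[t, s]] # empty list: data satisfy GARP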
Clearly, more work needs to be done on reconciling parametric and non-para-
metric approaches. The non-parametric methodology has not yet been success-
fully applied to cross-section data because it provides no obvious way of dealing
with non-price determinants of demand. There are also difficulties in allowing for
“disturbance terms” so that failures of, e.g. GARP, can be deemed significant or
insignificant, but see the recent attempts by Varian (1984) and by Epstein and
Yatchew (1985).
of largely unexploited data, although the pace of work has recently been increasing, see, for example, the survey paper on India by Bhattacharya (1978), the work on Latin America by Musgrove (1978), Howe and Musgrove (1977), on Korea by Lluch, Powell and Williams (1977, Chapter 5) and on Sri Lanka by Deaton (1981c).
In this section, I deal with four issues. The first is the specification and choice
of functional form for Engel curves. The second is the specification of how
expenditures vary with household size and composition. Third, I discuss a group
of econometric issues arising particularly in the analysis of micro data with
particular reference to the treatment of zero expenditures, including a brief
assessment of the Tobit procedure. Finally, I give an example of demand analysis
with a non-linear budget constraint.
This is very much a traditional topic to which relatively little has been added
recently. Perhaps the classic treatment is that of Prais and Houthakker (1955)
who provide a list of functional forms, the comparison of which has occupied
many manhours on many data sets throughout the world. The Prais-Houthakker
methodology is unashamedly pragmatic, choosing functional forms on grounds of
fit, with an attempt to classify particular forms as typically suitable for particular
types of goods, see also Törnqvist (1941), Aitchison and Brown (1954-5), and the
survey by Brown and Deaton (1972) for similar attempts. Much of this work is
not very edifying by modern standards. The functional forms are rarely chosen
with any theoretical model in mind, indeed all but one of Prais and Houthakker’s
Engel curves are incapable of satisfying the adding-up requirement, while, on the
econometric side, satisfactory methods for comparing different (non-nested) func-
tional forms are very much in their infancy. Even the apparently straightforward
comparison between a double-log and a linear specification leads to considerable
difficulties, see the simple statistic proposed by Sargan (1964) and the theoreti-
cally more satisfactory (but extremely complicated) solution in Aneuryn-Evans
and Deaton (1980).
More recent work on Engel curves has reflected the concern in the rest of the
literature with the theoretical plausibility of the specification. Perhaps the most
general results are those obtained in a paper by Gorman (1981), see also Russell
(1983) for alternative proofs. Gorman considers Engel curves of the general form

p_i q_i = Σ_{r∈R} a_ir(p) φ_r(x),   (75)

where R is some finite set and the φ_r(·) are a series of functions. If such equations are
to be theory consistent, there must exist a cost function c(u, p) such that

p_i ∂c(u, p)/∂p_i = Σ_{r∈R} a_ir(p) φ_r{c(u, p)}.   (76)

Gorman shows that for these partial differential equations to have a solution, (a) the rank of the matrix formed from the coefficients a_ir(p) can be no larger than 3 and (b) the functions φ_r(·) must take specific restricted forms. There are three generic forms for (75), two of which are reproduced below.
where S is a finite set of elements r, S₋ its negative elements and S₊ its positive elements. A third form allows combinations of trigonometrical functions of x capable of approximating a quite general function of x. However, note that the γ_r, μ_r and θ_r functions in (77) and (78) are not indexed on the commodity subscript i; otherwise the rank condition on a_ir could not hold.
Equations (77) and (78) provide a rich source of Engel curve specifications and contain as special cases a number of important forms. From (77), with m = 1, the form proposed by Working and Leser and discussed above, see (15), is obtained. In econometric specifications, a_i(p) adds to unity and b_i(p) to zero, as will their estimates if OLS is applied to each equation separately. The log quadratic form

w_i = a_i(p) + b_i(p) ln x + c_i(p)(ln x)²,   (79)
was applied in Deaton (1981c) to Sri Lankan micro household data for the food share where the quadratic term was highly significant and a very satisfactory fit was obtained (an R² of 0.502 on more than 3,000 observations). Note that, while
for a single commodity, higher powers of ln x could be added, doing so in a
complete system would require cross-equation restrictions since, according to
(77), the ratios of coefficients on powers beyond unity should be the same for all
commodities. Testing such restrictions (and Wald tests offer a very simple
method-see Section 4(a) below) provides yet another possible way of testing the
theory.
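Fitting (79) for a single share requires nothing more than least squares on ln x and its square; a minimal sketch with illustrative names, assuming more observations than parameters:

    import numpy as np

    def log_quadratic_engel(w, x):
        # w: (N,) budget shares; x: (N,) total outlays.
        lx = np.log(x)
        X = np.column_stack([np.ones_like(lx), lx, lx ** 2])
        coef, rss, *_ = np.linalg.lstsq(X, w, rcond=None)
        r2 = 1.0 - rss[0] / ((w - w.mean()) ** 2).sum()
        return coef, r2                      # (a, b, c) and the R-squared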
Equation (78) together with S = {−1, 1, 2,…, r,…} gives general polynomial Engel curves. Because of the rank condition, the quadratic with S = {−1, 1} is as general as the theory permits,

p_i q_i = b*_i(p) + a_i(p)x + d*_i(p)x²,   (80)

where b*_i(p) = b_i(p)μ(p) and d*_i(p) = d_i(p)θ(p). This is the "quadratic expenditure system" independently derived by Howe, Pollak and Wales (1979), Pollak and Wales (1978) and (1980). The cost function underlying (80) may be shown to be

c(u, p) = α(p) − β(p)/{u + γ(p)},   (81)
where the links between the a_i, b*_i and d*_i on the one hand and the α, β and γ on the other are left to the interested reader. (With ln c(u, p) on the left-hand side, (81) also generates the form (79).) This specification, like (79), is also of
considerable interest for time-series analysis since, in most such data, the range of
variation in x is much larger than that in relative prices and it is to be expected
that a higher order of approximation in x than in p would be appropriate.
Indeed, evidence of failure of linearity in time-series has been found in several
studies, e.g. Carlevaro (1976). Nevertheless, in Howe, Pollak and Wales’ (1979)
study using U.S. data from 1929-1975 for four categories of expenditure, tests
against the restricted version represented by the linear expenditure system yielded
largely insignificant results. On grouped British cross-section data pooled for two
separate years and employing a threefold categorization of expenditures, Pollak
and Wales (1978) obtain χ² values of 8.2 (without demographics) and 17.7
(with demographics) in likelihood ratio tests against the linear expenditure
system. These tests have 3 degrees of freedom and are notionally significant at the
5% level (the 5% critical value of a χ²₃ variate is 7.8) but the study is based on
only 32 observations and involves estimation of a 3 X 3 unknown covariance
matrix. Hence, given the discussion in Section 2.6 above, a sceptic could reasona-
bly remain unconvinced of the importance of the quadratic terms for this
particular data set.
Another source of functional forms for Engel curves is the study of conditions
under which it is possible to aggregate over consumers and I shall discuss the
topic in Section 5 below.
Household size and composition have been the object of most attention and I shall concentrate the discussion around them,
but other household characteristics can often be dealt with in the same way, (e.g.
race, geographical region, religion, occupation, pattern of durable good owner-
ship, and so on). If the vector of these characteristics is a, and superscripts denote
individual households, the general model becomes
q_i^h = g_i(x^h, p, a^h),   (82)
with gi taken as common and, in many studies, with p assumed to be the same
across the sample and suppressed as an argument in the function.
The simplest methodology is to estimate a suitable linearization of (82) and one
question which has been extensively investigated in this way is whether there are
economies of scale to household size in the consumption of some or all goods. A
typical approach is to estimate
ln q_i^h = α_i + β_i ln x^h + γ_i ln n^h + u_i^h,   (83)

for household size n^h,
which, if needs can be linked to, say, age through the γ functions, would yield an
applicable specification with strong restrictions on behavior. However, such
models are somewhat artificial in that they ignore the ‘public’ or shared goods in
family consumption, though suitable modifications can be made. They also lack
empirical sharpness in that the consumption vectors of individual family members
are rarely observed. The exception is in the case of family labor supply, see
Chapter 32 of this volume.
Rather more progress has been made in the specification of needs under the
assumption that the family acts as a homogeneous unit. The simplest possibility is
that, for a given welfare level, costs are affected multiplicatively by some index
depending on characteristics and welfare, i.e.
c(u^h, p; a^h) = m(a^h, u^h) c(u^h, p),   (86)

with implied budget shares

w_i^h = ω_i(u^h, p),   (87)

which are independent of a^h. Hence, if households face the same prices, those with the same consumption patterns w_i have the same u^h, so that by comparing their outlays the ratio of their costs is obtained. By (86), this ratio is the equivalence scale m(a^h, u^h). This procedure derives directly from Engel's (1895) pioneering
work, see Prais and Houthakker (1955). In practice, a single good, food, is usually
used although there is no reason why the model cannot be applied more generally
under suitable specification of the m and c functions in (86), see e.g. Muellbauer
(1977). For examples of the usual practice, see Jackson (1968), Orshansky (1965), Seneca and Taussig (1971) and Deaton (1981c).
Although the Engel model is simple to apply, it has the long recognised
disadvantage of neglecting any commodity specific dimension to needs. Common
observation suggests that changes in demographic composition cause substitution
of one good for another as well as the income effects modelled by (86) and (87).
In a paper of central importance to the area, Barten (1964) suggested that
household utility be written

u^h = u{q_1^h/m_1(a^h), q_2^h/m_2(a^h),…, q_n^h/m_n(a^h)},   (88)

so that, using Pollak and Wales' (1981) later terminology, the demographic variables generate indices which "scale" commodity consumption levels. The Barten model is clearly equivalent to writing the cost function in the form

c^h(u^h, p; a^h) = c(u^h, p*),   (90)

where

p*_i = p_i m_i(a^h),   (91)
for a cost function c(u, p) for the reference household. Hence, if g_i(x, p) are the Marshallian demands for the household, household h's demands are given by

q_i^h = m_i(a^h) g_i(x^h, p*),   (92)

so that the effects of demographic change operate exactly like price changes,

∂ln q_i/∂a_j = Σ_k (δ_ik + e_ik) ∂ln m_k/∂a_j,   (93)

for uncompensated price elasticities e_ik. The alternative procedure of Prais and Houthakker (1955) scales quantities and outlays directly,

q_i^h/m_i(a^h) = f_i{x^h/m_0(a^h)},   (94)

where m_i(a^h) is the specific commodity scale, and m_0(a^h) is some general scale.
In contrast to (93), we now have the relationship

∂ln q_i/∂a_j = ∂ln m_i/∂a_j − e_i ∂ln m_0/∂a_j,   (95)

for total expenditure elasticities e_i, so that the substitution effects embodied in (93) are no longer present. Indeed, if
x^h/m_0(a^h) is interpreted as a welfare indicator (which is natural in the context), (94) can only be made consistent with (88) and (89) if indifference curves are
Leontief, ruling out all substitution in response to relative price change, see
Muellbauer (1980) for details, and Pollak and Wales (1981) for a different
interpretation.
On a single cross-section, neither the Barten model nor the Prais-Houthakker model is likely to be identifiable. That there were difficulties with the
Prais-Houthakker formulation has been recognized for some time, see Forsyth
(1960) and Cramer (1969) and a formal demonstration is given in Muellbauer
(1980). In the Barten model, (93) may be rewritten in matrix notation as

F = (I + E)M,   (96)

where F is the matrix of observable responses ∂ln q_i/∂a_j, E the matrix of price elasticities and M the matrix of scale responses ∂ln m_k/∂a_j. By Cournot aggregation, w′(I + E) = 0, so that (I + E) is singular and M cannot be fully recovered from estimates of F. In the Prais-Houthakker model, (95) becomes

F = M − em₀′,   (97)

for the row vector m₀′ of responses ∂ln m_0/∂a_j, which, if m_0 is taken to be the budget-share weighted average of the commodity scales so that m₀′ = w′M, yields

F = (I − ew′)M.   (98)

Once again (I − ew′) is singular, and the identification problem recurs. Here price information is likely to be of less help since, with Leontief preferences, prices have only income effects. Even so, it is not difficult to construct Prais-Houthakker models which are identified given sufficient variation in prices.
Since Prais and Houthakker, the model has nevertheless been used on a number
of occasions, e.g. by Singh (1972), (1973), Singh and Nagar (1973) and
McClements (1977) and it is unclear how identification was obtained in these
studies. The use of a double logarithmic formulation for f_i helps; as is well known, such a function cannot add up even locally, see Willig (1976), Varian (1978), and
Deaton and Muellbauer (1980a, pp 19-20) so that the singularity arguments
given above cannot be used. Nevertheless, it seems unwise to rely upon a clear
misspecification to identify the parameters of the model. Coondoo (1975) has
proposed using an assumed independence of m_0 from x as an identifying restric-
tion; this is ingenious but, unfortunately, turns out to be inconsistent with
the model. There are a number of other possible means of identification, see
Muellbauer (1980), but essentially the only practical method is the obvious one of
assuming a priori a value for one of the m_i's. By this means, the model can be
estimated and its results compared with those of the Barten model. Some results
for British data are given in Muellbauer (1977), (1980) and are summarized in
Deaton and Muellbauer (1980a, pp 202-5). In brief, these suggest that each
model is rather extreme, the Prais-Houthakker with its complete lack of substitu-
tion and the Barten with its synchronous equivalence of demographic and price
substitution effects. If both models are normalized to have the same food scale,
the Prais-Houthakker model also tends to generate the higher scales for other
goods since, unless the income effects are very large, virtually all variations with
composition must be ascribed directly to the m_i's. The Barten scales are more
plausible but evidence suggests that price effects and demographic effects are not
linked as simply as is suggested by (93).
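The identification problem can be illustrated numerically; under the reconstruction of (96) and (98) above, both linking matrices are singular by construction, as the following sketch (with made-up shares and elasticities) verifies.

    import numpy as np

    w = np.array([0.3, 0.5, 0.2])            # made-up budget shares (sum to 1)
    e = np.array([0.8, 1.0, 1.3])            # outlay elasticities with w.e = 1
    # a price-elasticity matrix satisfying Cournot aggregation, w'E = -w':
    E = -np.outer(e, w) - 0.1 * (np.eye(3) - np.outer(np.ones(3), w))
    print(np.linalg.matrix_rank(np.eye(3) + E))               # 2: Barten, singular
    print(np.linalg.matrix_rank(np.eye(3) - np.outer(e, w)))  # 2: Prais-Houthakker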
Gorman (1976) has proposed an extension to (90) which appears appropriate in the light of this evidence. In addition to the Barten substitution responses he adds fixed costs of children, γ_i(a^h) say; hence (90) becomes

c^h(u^h, p; a^h) = p·γ(a^h) + c(u^h, p*),   (99)

with (91) retained as before. Clearly, (99) generates demands of the form

q_i^h = γ_i(a^h) + m_i(a^h) g_i{x^h − p·γ(a^h), p*}.   (100)
Pollak and Wales (1981) call the addition of fixed costs “demographic translating”
as opposed to “demographic scaling” of the Barten model; the Gorman model
(99) thus combines translating and scaling. In their paper, Pollak and Wales test
various specifications of translating and scaling. Their results are not decisive but
tend to support scaling, with little additional explanatory power from translating
once scaling has been allowed for. Note, however, that the translating term in (99)
might itself form the starting point for the modelling, just as did the multiplicative
term in the Engel model. If the scaling terms in (99) are dropped, so that p replaces p*, and if it is recognized that the child cost term p·γ(a^h) is likely to be zero for certain "adult" goods, then, for i an adult good, we have
for i, j = 1, 2 and for p_1α_2 ≠ p_2α_1. It is not difficult to design more complex (and
more realistic) models along similar lines. For a single commodity, many of these
models can be made formally equivalent to the Tobit, Tobin (1958), model

y_i* = x_i′β + u_i,
y_i = y_i*   if y_i* ≥ 0,
y_i = 0      otherwise.   (104)
An alternative model attributes zero purchases not to censoring but to infrequency of purchase, with expenditure recorded only with some probability π_i,

y_i = x_i′β + u_i   with probability π_i,
y_i = 0            with probability 1 − π_i.   (105)

Hence, if p(u_i) is the p.d.f. of u_i, the likelihood for the model is

L = Π_{y_i=0}(1 − π_i) Π_{y_i>0} π_i p(y_i − x_i′β).   (106a)

This can be maximized directly to estimate β and σ given some low parameter specification for π_i. But note in particular that, for π_i = π for all i and u_i taken as i.i.d. N(0, σ²), the likelihood is, for n₀ the number of zero y_i's,

L = (1 − π)^{n₀} π^{T−n₀} Π_{y_i>0} σ^{−1}φ{(y_i − x_i′β)/σ}.   (106b)

Hence OLS on the positive y_i's alone is consistent and fully efficient for β and σ, and the MLE of π is simply the ratio of the number of positive y_i's to the sample size, so that, in this case, all parameters are easily estimated. If this is the true model, Tobit will not generally be consistent. However, note that (105) allows y_i to be negative (although this may be very improbable) and ideally the Tobit and the binary model should be combined. A not very successful attempt to do this is reported in Deaton and Irish (1984). See also Kay, Keen and Morris (1984) for discussion of the related problem of measuring total expenditure when there are many zeroes.
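For concreteness, the Tobit log-likelihood of (104) is easy to code and maximize numerically; a minimal sketch (illustrative, with σ parametrized in logs to keep it positive):

    import numpy as np
    from scipy import stats, optimize

    def tobit_negloglik(theta, y, X):
        # theta = (beta, log sigma); y censored at zero from below.
        beta, sigma = theta[:-1], np.exp(theta[-1])
        xb = X @ beta
        pos = y > 0
        ll = stats.norm.logpdf(y[pos], xb[pos], sigma).sum()   # uncensored part
        ll += stats.norm.logcdf(-xb[~pos] / sigma).sum()       # P(y* < 0) for the zeros
        return -ll

    # e.g.: theta_hat = optimize.minimize(tobit_negloglik, theta0, args=(y, X)).x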
In my view, the problem of dealing appropriately with zero expenditures is
currently one of the most pressing in applied demand analysis. We do not have a
theoretically satisfactory and empirically implementable method for modelling
zeroes for more than a few commodities at once. Yet all household surveys show
large fractions of households reporting zero purchases for some goods. Since
household surveys typically contain several thousand observations, it is im-
portant that procedures be developed that are also computationally inexpensive.
There are also a number of other problems which are particularly acute in
cross-section analysis and are not specific to the Tobit specification. Heteroscedasticity tends to be endemic in work with micro data and, in my own practical
experience, is extremely difficult to remove. The test statistics proposed by
Breusch and Pagan (1979) and by White (1980) are easily applied, and White has
proposed an estimator for the variance-covariance matrix which is consistent
under heteroscedasticity and does not require any specification of its exact form.
Since an adequate specification seems difficult in practice, and since in micro
studies efficiency is rarely a serious problem, White’s procedure is an extremely
valuable one and should be applied routinely in large cross-section regressions.
Note, however, that with Tobit-like models, untreated heteroscedasticity generates
inconsistency in the parameter estimates, see Chapter 27, thus presenting a much
more serious problem. The heteroscedasticity introduced by grouping has become
less important as grouped data has given way to the analysis of the original micro
observations, but see Haitovsky (1973) for a full discussion.
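White's estimator itself is almost a one-liner; the sketch below (illustrative names) computes the "sandwich" covariance matrix from the OLS residuals.

    import numpy as np

    def white_covariance(X, e):
        # X: (N, k) regressors; e: (N,) OLS residuals.
        bread = np.linalg.inv(X.T @ X)
        meat = (X * (e ** 2)[:, None]).T @ X   # X' diag(e^2) X
        return bread @ meat @ bread            # consistent under heteroscedasticity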
Finally, there are a number of largely unresolved questions about the way in
which survey design should be taken into account (if at all) in econometric
analysis. One topic is whether or not to use inverse probability weights in
regression analysis, see e.g. DuMouchel and Duncan (1983) for a recent discus-
sion. The other concerns the possible implications for regression analysis of
Godambe's (1955), (1966) theorem on the non-existence of uniformly minimum
variance or maximum likelihood estimators for means in finite populations, see
Cassel, Sarndal and Wretman (1977) for a relatively cool discussion.
[Figure 2. The budget constraint with a fair-price quota Z for "sugar" against all other goods.]
markets. Housing is the obvious example, but here I illustrate with a simple case
based on Deaton (1984). In many developing countries, the government operates
so-called “fair price” shops in which certain commodities, e.g. sugar or rice, are
made available in limited quantities at subsidized prices. Typically, consumers
can buy more than the fair price allocation in the free market at a price p₁, with p₁ > p₀, the fair price shop price. Figure 2 illustrates for "sugar" versus a numeraire
good with unit price. Z is the amount available in the fair price shop and the
budget constraint assumes that resale of surplus at free market prices is impossi-
ble.
There are two interrelated issues here for empirical modelling. At the micro
level, using cross-section data, we need to know how to use utility theory to
generate Engel curves. At the macro-level, it is important to know how the two
prices p₀ and p₁ and the quantity Z affect total demand. As usual, we begin with
the indirect utility function, though the form of this can be dictated by prior
beliefs about demands (e.g. there has been heavy use of the indirect utility
function associated with a linear demand function for a single good- for the
derivation, see Deaton and Muellbauer (1980a, p. 96) (1981) and Hausman
(1980)). Maximum utility along AD is u0 = #(x, p, po) with associated demand,
by Roy's identity, of s₀ = g(x, p, p₀). Now, by standard revealed preference, if s₀ < Z, s₀ is optimal since BC is obtainable by a consumer restricted to being within AD. Similarly, maximum utility along EC is u₁ = ψ(x + (p₁ − p₀)Z, p, p₁) with s₁ = g(x + (p₁ − p₀)Z, p, p₁). Again, if s₁ > Z, then s₁ is optimal. The remaining case is s₀ > Z and s₁ < Z (both of which are infeasible), so that sugar demand is exactly Z (at the kink B). Hence, for individual h with expenditure x^h
and quota Zh, the demand functions are given by
s^h = g^h(x^h, p, p₀)   if g^h(x^h, p, p₀) ≤ Z^h,   (108)

s^h = Z^h   if g^h(x^h + (p₁ − p₀)Z^h, p, p₁) ≤ Z^h ≤ g^h(x^h, p, p₀),   (109)
∂s̄/∂Z = ∫ {(p₁ − p₀) ∂g/∂x − 1} dF(x),   (112)

where the integration runs over the outlays of those buying in the free market.
Since, at the extensive margin, consumers buy nothing in the free market, only the
intensive margin is of importance. Note that all of these estimations and
calculations take a particularly simple form if the Marshallian demand functions
are assumed to be linear, so that, even in this non-standard situation, linearity can
still greatly simplify.
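The regime logic of (108)-(109) translates directly into a small function; the sketch below is illustrative (g and its arguments are stand-ins for whatever Marshallian demand is assumed), and linearity of g makes each branch a linear expression in x.

    def quota_demand(g, x, p, p0, p1, Z):
        # g(x, p, price): unrestricted Marshallian demand for the rationed good.
        s0 = g(x, p, p0)                     # notional demand at the fair price
        if s0 <= Z:
            return s0                        # within the quota: regime (108)
        s1 = g(x + (p1 - p0) * Z, p, p1)     # demand given the implicit transfer
        if s1 >= Z:
            return s1                        # buys in the free market as well
        return Z                             # otherwise stuck at the kink: (109)

    # e.g. a linear demand: g = lambda x, p, price: 2.0 + 0.05 * x - 0.4 * price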
The foregoing is a very straightforward example but it illustrates the flavor of
the analysis. In practice, non-linear budget constraints may have several kink
points and the budget set may be non-convex. While such things can be dealt
with, e.g. see King (1980) or Hausman and Wise (1980) for housing, and Reece
and Zieschang (1984) for charitable giving, the formulation of the likelihood
becomes increasingly complex and the computations correspondingly more burdensome.
4. Separability
4.1. Weak separability
Weak separability is the central concept for much of the analysis. Let q_A be some subvector of the commodity vector q, so that q = (q_A, q_Ā) without loss of generality. q_A is then said to be (weakly) separable if the direct utility function takes the form

u = v{u_A(q_A), q_Ā},   (113)

where u_A(q_A) is the subutility (or felicity) function associated with q_A. This equation is equivalent to the existence of a preference ordering over q_A alone; choices over the q_A bundles are consistent independent of the vector q_Ā. More symmetrically, preferences as a whole are said to be separable if q can be partitioned into (q_A, q_B,…, q_N) such that

u = v{u_A(q_A), u_B(q_B),…, u_N(q_N)}.   (114)
Weak separability of q_A guarantees the existence of subgroup demand functions

q_A = g_A(x_A, p_A),   (115)

where x_A ≡ p_A·q_A, while (114) has the same implication for all groups. Hence, if
preferences in a life-cycle model are weakly separable over time periods, commod-
ity demand functions conditional on x and p for each time period are guaranteed
to exist. Similarly, if goods are separable from leisure, commodity demand
functions of the usual type can be justified.
Tests of these forms of separability can be based on the restrictions on the substitution matrix implied by (114). If i and j are two goods in distinct groups, i ∈ G, j ∈ H, G ≠ H, then the condition

s_ij = μ_GH (∂q_i/∂x)(∂q_j/∂x),   (116)

for some quantity μ_GH (independent of i and j), is both necessary and sufficient
for (114) to hold. If a general enough model of substitution can be estimated,
(116) can be used to test separability, and Byron (1968), Jorgenson and Lau (1975) and Pudney (1981b) have used essentially this technique to find separability patterns between goods within a single period. Barnett (1979a) has tested the
important separability restriction between goods and leisure using time series
American data and decisively rejects it. If widely repeated, this result would
suggest considerable misspecification in the traditional studies. It is also possible
to use a single cross-section to test separability between goods and leisure.
Consider the following cost function proposed by Muellbauer (1981b).
where w is the wage and d(p), b(p) and a(p) are functions of p, homogeneous of degrees 1, 0 and 1 respectively. Shephard's Lemma gives immediately

(118)
for transfer income μ, hours worked h and parameters α, β, γ all constant in a single cross-section. It may be shown that (117) satisfies (114) for leisure vis-à-vis goods if and only if b(p) is a constant, which for (118) implies that β_i/γ_i be independent of i, i = 1,…, n. This can be tested by first estimating (118) as a system by OLS equation by equation and then computing the Wald test for the (n − 1) restrictions, i = 1,…,(n − 1),

β_i γ_n − β_n γ_i = 0.   (119)
This does not involve estimating the restricted nonlinear model. My own results
on British data, Deaton (1981b), suggest relatively little conflict with separability; however, earlier work by Atkinson and Stern (1981) on the same data but using
an ingenious adaptation of Becker’s (1965) time allocation model, suggests the
opposite. Blundell and Walker (1982), using a variant of (117) reject the hypothe-
sis that wife’s leisure is separable from goods. Separability between different time
periods is much more difficult to test since it is virtually impossible to provide
general unrestricted estimates of the substitution responses between individual
commodities across different time periods.
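The Wald computation for restrictions such as (119), as reconstructed above, uses nothing beyond the unrestricted estimates, their covariance matrix, and the delta method; a schematic sketch, with r and its Jacobian J supplied by the user:

    import numpy as np

    def wald_test(theta_hat, V, r, J):
        # r(theta): vector of restrictions, zero under the null,
        # e.g. r_i = beta_i * gamma_n - beta_n * gamma_i;
        # J(theta): Jacobian of r; V: covariance of theta_hat.
        rv = r(theta_hat)
        Jm = J(theta_hat)
        return rv @ np.linalg.solve(Jm @ V @ Jm.T, rv)   # asymptotically chi-squared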
Subgroup demand functions are only a part of what the applied econometrician
needs from separability. Just as important is the question of whether it is possible
to justify demand functions for commodity composites in terms of total expendi-
ture and composite price indices. The Hicks (1936) composite commodity theo-
rem allows this, but only at the price of assuming that there are no relative price
changes within subgroups. Since there is no way of guaranteeing this, nor often
even of checking it, more general conditions are clearly desirable. In fact, the
separable structure (114) may be sufficient in many circumstances. Write u_A, u_B, etc. for the values of the felicity functions and c_A(u_A, p_A) etc. for the subgroup cost functions corresponding to the u_A(q_A) functions. Then the problem of choosing the group expenditure levels x_A, x_B,… can be written as

max u = v(u_A, u_B,…, u_N)   s.t.   Σ_R c_R(u_R, p_R) = x.   (120)

Write

c_R(u_R, p_R) = c_R(u_R, p̄_R)·c_R(u_R, p_R)/c_R(u_R, p̄_R),   (121)

for some fixed prices p̄_R. For such a fixed vector, c_R(u_R, p̄_R) is a welfare indicator or quantity index, while the ratio c_R(u_R, p_R)/c_R(u_R, p̄_R) is a true (sub) cost-of-living price index comparing p_R and p̄_R using u_R as reference, see Pollak (1975). Finally, since u_R = ψ_R{c_R(u_R, p_R), p_R}, (120) may be written

max u = v[ψ_A{c_A(u_A, p_A), p_A}, ψ_B{·},…]

s.t.   Σ_R c_R(u_R, p̄_R)·c_R(u_R, p_R)/c_R(u_R, p̄_R) = x.   (122)
The subgroup cost functions must then take the Gorman generalized polar form

c_G(u_G, p_G) = a_G(p_G) + F_G(u_G) b_G(p_G),   (123)

for suitable functions F_G, b_G and a_G, the first monotone increasing, the latter two linearly homogeneous, and the utility function (114) or (120) must be additive in the individual felicity functions. Additivity is restrictive even between groups, and will be further discussed below, but (123) permits fairly general forms of Engel curves, e.g. the Working form, AIDS, PIGL and the translog (61) if Σ_k Σ_j β_kj = 0.
See Blackorby, Boyce and Russell (1978) for an empirical application, and
Anderson (1979) for an attempt to study the improvement over standard practice
of actually computing the Gorman indices. In spite of this analysis, there seems to
be a widespread belief in the profession that homothetic weak separability is
necessary for the empirical implementation of two-stage budgeting (which is itself
almost the only sensible way to deal with very large systems) - see the somewhat
bizarre exchanges in the 1983 issue of the Journal of Business and Economic
Statistics. In my view, homothetic separability is likely to be the least attractive of
the alternatives given here; it is rarely sensible to maintain without testing that
subgroup demands have unit group expenditure elasticities. In many cases, prices
will be sufficiently collinear for the problem (122) to give an acceptably accurate
representation. And if not, additivity between broad groups together with the very
flexible Gorman generalized polar form should provide an excellent alternative.
Even failing these possibilities, there are other types of separability with useful
empirical properties, see Blackorby, Primont and Russell (1978) and Deaton and Muellbauer (1980a, Chapter 5).
One final issue related to separability is worth noting. As pointed out by
Blackorby, Primont and Russell (1977), flexible functional forms do not in general retain their flexibility when separability is imposed upon them.

Strong separability restricts (114) to the case where the overall function is additive, i.e. for some monotone increasing f,

u = f{u_A(q_A) + u_B(q_B) + ··· + u_N(q_N)},   (124)

in which case (116) holds with a common μ_GH = μ for all pairs of groups, so that, for i and j in different groups,

s_ij = μ (∂q_i/∂x)(∂q_j/∂x).   (125)
The budget constraint (or homogeneity) can be used to complete this for all i and j; in elasticity terms, the relationship is, Frisch (1959), Houthakker (1960),

e_ij = −e_i w_j(1 + e_j/φ) + δ_ij e_i/φ,   (126)

for some scalar φ, (uncompensated) cross-price elasticities e_ij, and total expenditure elasticities e_i. This formula shows immediately the strengths and weaknesses of additivity. Apart from the data w_i, knowledge of the (n − 1) independent e_i's together with the quantity φ (obtainable from knowledge of one single price elasticity) is sufficient to determine the whole (n × n) array of price elasticities.
Additivity can therefore be used to estimate price elasticities on data with little or no relative price variation. The structure of additive preferences is conveniently summarized by the profit function

π(p, r) = max_q {r u(q) − p·q},   (127)

for concave u(q), so that the consumer sells utility (to him or herself) at a price r (= the reciprocal of the marginal utility of money) using inputs q at prices p.
Now if u(q) has the explicitly additive form Σ_R u_R(q_R), so will π(p, r), i.e.

π(p, r) = Σ_R π_R(p_R, r).   (128)

Now π(p, r) also has the derivative property q = −∇_p π(p, r), so that, for i belonging to group R,

q_i = −∂π_R(p_R, r)/∂p_Ri,   (129)
which depends only on within group prices and the single price of utility r which
is common to all groups and provides the link between them. In the intertemporal
context, r is the price of lifetime utility, which is constant under certainty or
follows (approximately) a random walk under uncertainty, while p_R is the vector of within-period prices.
5. Aggregation

Clearly, on micro or panel data, aggregation is not an issue, and as the use of such
data increases, the aggregation problem will recede in importance. However,
much demand analysis is carried out on macroeconomic aggregate or per capita
data, and it is an open question as to whether this makes sense or not. The topic
is a large one and I present only the briefest discussion here, see Deaton and
Muellbauer (1980a, Chapter 6) for further discussion and references. At the most
general level, average aggregate demand q̄_i is given by

q̄_i = H^(−1) Σ_h g_i^h(x^h, p) ≡ G_i(x¹, x²,…, x^H, p),   (130)

for the H outlays x^h of household h. The function G_i can be given virtually any properties whatever depending on the configuration of individual preferences. If, however, the outlay distribution were fixed in money terms, x^h = k^h x̄ for constants k^h, (130) obviously gives

q̄_i = G_i(x̄, p),   (131)

a macro demand function in average outlay x̄ alone. For aggregation to hold for variable outlay distributions, Engel curves must be linear in outlay and parallel across households, as generated by cost functions of the form

c^h(u^h, p) = a^h(p) + u^h b(p),   (132)
a specification known as the "Gorman polar form". Suitable choice of the a^h(p) and b(p) functions permits (132) to be a flexible functional form, Diewert
(1980a), but the uniformity across households implied by the need for all Engel
curves to be parallel seems implausible. However, it should be noted that a single
cross-section is insufficient to disprove the condition since, in principle, and
without the use of panel data, variation in the ah(p) functions due to non-outlay
factors cannot be distinguished from the direct effects of variations in xh. A
somewhat weaker form of the aggregation condition, emphasized by Theil (1954)
(1975 Chapter 4) is that the marginal propensities to consume be distributed
independently of the xh, see also Shapiro (1976) and Shapiro and Braithwait
(1979). Note finally that if aggregation is to be possible for all possible income
distributions, including those for which some people have zero income, then the
parallel linear Engel curves must pass through the origin so that a^h(p) in (132) is
zero and preferences are identical and homothetic.
If, however, the casual evidence against any form of linear Engel curves is
taken seriously, exact aggregation requires the abandonment of (131) at least in
principle. One set of possibilities has been pursued by Muellbauer (1975b), (1976a), (1976b), who examines conditions under which the aggregate budget share
(133)

g^h(x^h, p, a^h) = g(x^h, p, a^h) + k^h(p),   (134)
Lau's exact aggregation theorem states that the existence of macro demands of the form

q̄_i = G_i{f_1(x, a), f_2(x, a),…, f_m(x, a), p},   (135)

with f_k(x, a) non-constant symmetric functions of the H-vectors x and a, implies that individual demands take the additively separable form

g_i^h(x^h, p, a^h) = Σ_{k=1}^m c_ik(p) ψ_k(x^h, a^h).   (136)
Gorman's (1981) theorem, see 3(a) above, tells us what form the ψ_k functions can take, while Lau's theorem makes Gorman's results the more useful and important.
Lau’s theorem provides a useful compromise between conventional aggregation as
represented by (131) on the one hand and complete agnosticism on the other.
Distributional effects on demand are permitted, but in a limited way. Gorman’s
results tell us that to get these benefits, polynomial specifications are necessary
which either link quantities to outlays or shares to the logarithms of outlays. The
latter seem to work better in practice and are therefore recommended for use.
Finally, mention must be made of the important recent work of Stoker who, in
a series of papers, particularly (1982), (1984), has forged new links between the
statistical and economic theories of aggregation. This work goes well beyond
demand analysis per se but has implications for the subject. Stoker (1982) shows
that the estimated parameters from cross-section regressions will estimate the
corresponding macro-effects not only under the Gorman perfect aggregation
conditions, but also if the independent variables are jointly distributed within the
exponential family of distributions. In the context of demand analysis, the
marginal propensity to consume from a cross-section regression would con-
sistently estimate the impact of a change in mean income on mean consumption
either with linear Engel curves or with non-linear Engel curves and income
distributed according to some exponential family distribution. Since one of the
reasons we are interested in aggregation is to be able to move from micro to
macro in this way, these results open up new possibilities. Stoker (1984) also
carries out the process in reverse and derives completeness (or identification)
conditions on the distribution of exogenous variables that allow recovery of micro
behavior from macro relationships.
Much of the work reported in this section, by Muellbauer, Lau and Stoker, can
be regarded as developing the appropriate techniques of allowing for the impacts
of distribution on aggregate demand functions. That such effects could be
potentially important has been known for a long time, see de Wolff (1941) for an
early contribution. What still seems to be lacking so far is empirical evidence that
such effects are actually important.
see eq. (115) above. Clearly, rationing makes no difference to (137) except that z replaces q⁰, so that testing for the existence of the quantity restrictions can be carried out by testing for the endogeneity of q⁰ using a Wu (1973) or Hausman (1978) test with p⁰ as the necessary vector of exogenous instruments not appearing in (137). Without separability matters are more complicated and, in addition to the variables in (137), the demand for q¹ depends on z, so that, without quantity restrictions,

q¹ = g^F(x, p⁰, p¹),   (138)

while under rationing

q¹ = g^R(x − p⁰·z, p¹, z).   (139)
Efficient estimation and testing requires that the relationship between g^F and g^R
be fully understood. Once again, the cost function provides the answer. If c(u, p⁰, p¹) is the unrestricted cost function, i.e. that which generates (138), the restricted cost function c*(u, p⁰, p¹, z) is defined by

c*(u, p⁰, p¹, z) = p⁰·z + γ(u, p¹, z),   (140)

where γ does not depend upon p⁰. Define the "virtual prices", Rothbarth (1941), as a function ξ⁰(u, p¹, z) by the relation

z = ∂c(u, p⁰, p¹)/∂p⁰ evaluated at p⁰ = ξ⁰,   (141)

so that

c(u, ξ⁰, p¹) = ξ⁰·z + γ(u, p¹, z)   (142)

is an identity in u, p¹ and z with ξ⁰ = ξ⁰(u, p¹, z). Hence, combining (140) and (142),

c*(u, p⁰, p¹, z) = c(u, ξ⁰, p¹) + (p⁰ − ξ⁰)·z.   (143)
With ξ⁰ determined by (141), this equation is the bridge between restricted and unrestricted cost functions and, since (138) derives from differentiating c(u, p⁰, p¹) and (139) from differentiating c*(u, p⁰, p¹, z), it also gives full knowledge of the
relationship between g^F and g^R. This can be put to good theoretical use, to prove
all the standard rationing results and a good deal more besides.
For empirical purposes, the ability to derive g^R from g^F allows the construction of a "matched pair" of demand functions, matched in the sense of deriving from the same preferences, and representing both free and constrained behavior.
A first attempt, applied to housing expenditure in the U.K., and using the
Muellbauer cost function (117) is given in Deaton (1981b). In that study I also
found that allowing for quantity restrictions using a restricted cost function
related to that for the AIDS, removed much of the conflict with homogeneity on
post-war British data. Deaton and Muellbauer (1981) have also derived the matched functional forms g^F and g^R for commodity demands for the case where
there is quantity rationing in the labor market and where unrestricted labor
supply equations take the linear functional forms frequently assumed in the labor
supply literature.
7. Other topics
In a review of even this length, only a minute fraction of demand analysis can be
covered. However, rather than omit them altogether, I devote this last section to an acknowledgement of the existence of three areas closely linked to the preceding analysis (and which many would argue are central): intertemporal demand analysis, the analysis of quality, and the use of demand analysis in welfare economics.
Commodity choices over a lifetime can perhaps be modelled using the utility
function
A large proportion of the results and formulae of welfare economics, from cost
benefit analysis to optimal tax theory, depend for their implementation on the
results of empirical demand analysis, particularly on estimates of substitution
responses. Since the coherence of welfare theory depends on the validity of the
standard model of behavior, the usefulness of applied demand work in this
context depends crucially on the eventual solution of the problems with homo-
geneity (possibly symmetry) and global regularity discussed in Section 2 above.
But even without such difficulties, the relationship between the econometric
estimates and their welfare application is not always clearly appreciated. In
CV = c(u⁰, p¹) − c(u⁰, p⁰),   (147)

EV = c(u¹, p¹) − c(u¹, p⁰),   (148)
so that both measure the money costs of a welfare-affecting price change from p⁰ to p¹, CV using u⁰ as reference (compensation returns the consumer to the original welfare level) and EV using u¹ (it is equivalent to the change to u¹). Base
and current reference true cost-of-living index numbers are defined analogously
using ratios instead of differences, hence

P_B = c(u⁰, p¹)/c(u⁰, p⁰),   (149)

P_C = c(u¹, p¹)/c(u¹, p⁰),   (150)

are the base and current true indices. Note that CV, EV and the two price indices
depend in no way on how utility is measured; they depend only on the indiffer-
ence curve indexed by u, which could equally well be replaced by φ(u) for any monotone increasing φ. Even so, the cost function is not observed directly and a
procedure must be prescribed for constructing it from the (in principle) observ-
able Marshallian demand functions. If the functional forms for these are known,
and if homogeneity, symmetry and negativity are satisfied, the cost function can
be obtained by solving the partial differential equations (12), often analytically,
see e.g. Hausman (1981). Unobserved constants of integration affect only the
measurability of u so that complete knowledge of the Marshallian demands is
equivalent to complete knowledge of consumer surplus and the index numbers. If
analytical integration is impossible or difficult, numerical integration is straight-
forward (provided homogeneity and symmetry hold) and algorithms exist in the
literature, see e.g. Samuelson (1948) and in much more detail, Vartia (1983). If the
integrability conditions fail, consumer behavior is not according to the theory and
it is not sensible to try to calculate the welfare indices in the first place, nor is it
possible to do so. Geometrically, calculating CV or EV is simply a matter of
integrating the area under a Hicksian demand curve; there is no valid theoretical
or practical reason for ever integrating under a Marshallian demand curve. The
very considerable literature discussing the practical difficulties of doing so (the
path-dependence of the integral, for example) provides a remarkable example of
the elaboration of secondary nonsense which can occur once a large primary
category error has been accepted; the emperor with no clothes, although quite
unaware of his total nakedness, is continuously distressed by his inability to tie
his shoelaces. A much more real problem is the assumption that the functional
forms of the Marshallian demands are known, so that working with a specific
model inevitably understates the margin of ignorance about consumer surplus or
index numbers. The tools of non-parametric demand analysis, as discussed in
Section 2.7, can, however, be brought to bear to give bounding relationships on
the cost function and hence on the welfare measures themselves, see Varian
(1982b).
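In the one-price case the numerical integration reduces to a single ordinary differential equation: by Shephard's Lemma and duality, the cost of the base utility level satisfies dc/dp = g(p, c) along the price path, with c equal to outlay at the initial price. A sketch under these assumptions, with illustrative names:

    from scipy.integrate import solve_ivp

    def compensating_variation(g, x, p0, p1):
        # g(p, x): Marshallian demand for the good whose price changes.
        path = solve_ivp(lambda p, c: [g(p, c[0])], (p0, p1), [x], rtol=1e-8)
        return path.y[0, -1] - x        # CV = c(u0, p1) - c(u0, p0)

    # e.g. a linear demand: g = lambda p, x: 10.0 - 0.8 * p + 0.05 * x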
The construction of empirical scales is similar to the construction of price
indices although there are a few special difficulties. For household characteristics
a^h, the equivalence scale M(a^h, a′; u, p) is defined by
References
Afriat, S. N. (1967) “The Construction of Utility Functions From Expenditure Data”, International
Economic Review, 8, 67-77.
Afriat, S. N. (1973) “On a System of Inequalities in Demand Analysis: An Extension of the Classical
Method”, International Economic Review, 14, 460-472.
Afriat, S. N. (1976) The Combinatorial Theory of Demand. London: Input-Output Co.
Afriat, S. N. (1977) The Price Index. Cambridge University Press.
Afriat, S. N. (1980) Demand Functions and the Slutsky Matrix. Princeton: Princeton University Press.
Afriat, S. N. (1981) “On the Constructability of Consistent Price Indices Between Several Periods
Simultaneously”, in: A. S. Deaton, ed., Essays in the Theory and Measurement of Consumer
Behaviour in Honour of Sir Richard Stone. Cambridge: Cambridge University Press.
Aitchison, J. and J. A. C. Brown (1954-5) "A Synthesis of Engel Curve Theory", Review of Economic
Studies, 22, 35-46.
Akerlof, G. (1970) "The Market for Lemons", Quarterly Journal of Economics, 84, 488-500.
Attfield, C. L. F. (1985) "Homogeneity and Endogeneity in Systems of Demand Equations", Journal
of Econometrics, 27, 191-209.
Anderson, G. J. and R. W. Blundell (1982) "Estimation and Hypothesis Testing in Dynamic Singular
Equation Systems", Econometrica, 50, 1559-1571.
Anderson, R. W. (1979) “Perfect Price Aggregation and Empirical Demand Analysis”, Econometrica,
47, 1209-1230.
Anderson, T. W. (1958) An Introduction to Multivariate Statistical Analysis. New York: John Wiley.
Aneuryn-Evans, G. B. and A. S. Deaton (1980) "Testing Linear versus Logarithmic Regressions",
Review of Economic Studies, 47, 275-291.
Antonelli, G. B. (1886) Sulla Teoria Matematica della Economia Politica. Pisa: nella Tipografia del
Folchetto. Republished as "On the Mathematical Theory of Political Economy", in: J. S. Chipman,
L. Hurwicz, M. K. Richter and H. F. Sonnenschein, eds., Preferences, Utility, and Demand. New
York: Harcourt Brace Jovanovich, 1971.
Armstrong, J. S. (1967) “Derivation of Theory by Means of Factor Analysis or Tom Swift and his
Electric Factor Analysis Machine”, American Statistician, 21(5), 17-21.
Ashenfelter, O. (1980) "Unemployment as Disequilibrium in a Model of Aggregate Labor Supply",
Econometrica, 48, 541-564.
Atkinson, A. B. and N. Stern (1981) “On Labour Supply and Commodity Demands”, in: A. S.
Deaton, ed., Essays in the Theory and Measurement of Consumer Behaviour. New York: Cambridge
University Press.
Barnett, W. A. (1979a) “The Joint Allocation of Leisure and Goods Expenditure”, Econometrica, 47,
539-563.
Barnett, W. A. (1979b) "Theoretical Foundations for the Rotterdam Model", Review of Economic
Studies, 46, 109-130.
Barnett, W. A. (1983a) "New Indices of Money Supply and the Flexible Laurent Demand System",
Journal of Business and Economic Statistics, 1, 7-23.
Barnett, W. A. (1983b) "Definitions of 'Second Order Approximation' and 'Flexible Functional
Form'", Economics Letters, 12, 31-35.
Barnett, W. A. and A. Jonas (1983) "The Müntz-Szatz Demand System: An Application of a Globally
Well-Behaved Series Expansion”, Economics Letters, 11, 331-342.
Barnett, W. A. and Y. W. Lee (1985) "The Regional Properties of the Miniflex Laurent, Generalized
Leontief, and Translog Flexible Functional Forms", Econometrica, forthcoming.
Barten, A. P. (1964) “Family Composition, Prices and Expenditure Patterns”, in: P. E. Hart, G. Mills
and J. K. Whitaker, eds., Economic Analysis for National Economic Planning. London: Butterworth.
Barten, A. P. (1966) Theorie en empirie van een volledig stelsel van vraagvergelijkingen. Doctoral
dissertation, Rotterdam.
Barten, A. P. (1967) “Evidence on the Slutsky Conditions for Demand Equations”, Review of
Economics and Statistics, 49, 77-84.
Barten, A. P. (1969) “Maximum Likelihood Estimation of a Complete System of Demand Equations”,
European Economic Review, 1, 7-73.
Barten, A. P. (1977) “The Systems of Consumer Demand Functions Approach: A Review”,
Econometrica, 45, 23-51.
Barten, A. P. and V. Böhm (1980) "Consumer Theory", in: K. J. Arrow and M. D. Intriligator, eds.,
Handbook of Mathematical Economics. Amsterdam: North-Holland.
Barten, A. P. and E. Geyskens (1975) “The Negativity Condition in Consumer Demand”, European
Economic Review, 6, 221-260.
Becker, G. S. (1965) “A Theory of the Allocation of Time”, Economic Journal, 75, 493-517.
Becker, G. S. (1976) The Economic Approach to Human Behaviour. Chicago: University of Chicago
Press.
Bera, A. K., R. P. Byron and C. M. Jarque (1981) “Further Evidence on Asymptotic Tests for
Homogeneity and Symmetry in Large Demand Systems”, Economics Letters, 8, 101-105.
Berndt, E. R., M. N. Darrough and W. E. Diewert (1977) "Flexible Functional Forms and
Expenditure Distributions: An Application to Canadian Consumer Demand Functions", Interna-
tional Economic Review, 18, 651-675.
Berndt, E. R., B. H. Hall, R. E. Hall and J. A. Hausman (1974) "Estimation and Inference in
Non-Linear Structural Models”, Annals of Economic and Social Measurement, 3, 653-665.
Berndt, E. R. and M. S. Khaled (1979) "Parametric Productivity Measurement and the Choice Among
Flexible Functional Forms", Journal of Political Economy, 87, 1220-1245.
Berndt, E. R. and N. E. Savin (1975) "Estimation and Hypothesis Testing in Singular Equation
Systems With Autoregressive Disturbances”, Econometrica, 43, 937-957.
Berndt, E. R. and N. E. Savin (1977) "Conflict Among Criteria For Testing Hypotheses in the
Multivariate Linear Regression Model”, Econometrica, 45, 1263-1277.
Bewley, T. (1977) “The Permanent Income Hypothesis: A Theoretical Formulation”, Journal of
Economic Theory, 16, 252-292.
Bhattacharya, N. (1978) "Studies on Consumer Behaviour in India", in: A Survey of Research in
Economics, Vol. 7, Econometrics, Indian Council of Social Science Research: New Delhi, Allied
Publishers.
Blackorby, C., R. Boyce and R. R. Russell (1978) “Estimation of Demand Systems Generated by the
Gorman Polar Form; A Generalization of the S-branch Utility Tree”, Econometrica, 46, 345-363.
Blackorby, C., D. Primont and R. R. Russell (1977) “On Testing Separability Restrictions With
Flexible Functional Forms”, Journal of Econometrics, 5, 195-209.
Blackorby, C., D. Primont and R. R. Russell (1978) Duality, Separability and Functional Structure.
New York: American Elsevier.
Blundell, R. W. and I. Walker (1982) “Modelling the Joint Determination of Household Labour
Supplies and Commodity Demands”, Economic Journal, 92, 351-364.
Breusch, T. S. and A. R. Pagan (1979) “A Simple Test for Heteroscedasticity and Random Coefficient
Variation”, Econometrica, 47, 1287-1294.
Brown, J. A. C. and A. S. Deaton (1972) “Models of Consumer Behaviour: A Survey”, Economic
Journal, 82, 1145-1236.
Browning, M. J., A. Deaton and M. Irish (1985) “A Profitable Approach to Labor Supply and
Commodity Demands Over the Life-Cycle", Econometrica, forthcoming.
Burstein, M. L. (1961) “Measurement of the Quality Change in Consumer Durables”, Manchester
School, 29, 267-279.
Byron, R. P. (1968) “Methods for Estimating Demand Equations Using Prior Information: A Series
of Experiments With Australian Data", Australian Economic Papers, 7, 227-248.
Byron, R. P. (1970a) “A Simple Method for Estimating Demand Systems Under Separable Utility
Assumptions”, Review of Economic Studies, 37, 261-274.
Byron, R. P. (1970b) “The Restricted Aitken Estimation of Sets of Demand Relations”, Econometrica,
38, 816-830.
Byron, R. P. (1982) “A Note on the Estimation of Symmetric Systems”, Econometrica, 50, 1573-1575.
Byron, R. P. and M. Rosalsky (1984) “Symmetry and Homogeneity Tests in Demand Analysis: A Size
Correction Which Works". University of Florida at Gainesville, mimeo.
Carlevaro, F. (1976) “A Generalization of the Linear Expenditure System”, in: L. Solari and J.-N. du
Pasquier, eds., Private and Enlarged Consumption. North-Holland for ASEPELT, 73-92.
Cassel, C.-M., C.-E. Särndal and J. H. Wretman (1977) Foundations of Inference in Survey Sampling.
New York: Wiley.
Caves, D. W. and L. R. Christensen (1980) “Global Properties of Flexible Functional Forms”,
American Economic Review, 70, 422-432.
Chipman, J. S. (1974) “Homothetic Preferences and Aggregation”, Journal of Economic Theory, 8,
26-38.
Chow, G. (1957) Demand for Automobiles in the U.S.: A Study in Consumer Durables. Amsterdam:
North-Holland.
Chow, G. (1960) “Statistical Demand Functions for Automobiles and Their Use for Forecasting”, in:
A. C. Harberger, ed., The Demand for Durable Goods. Chicago: University of Chicago Press.
Christensen, L. R., D. W. Jorgenson and L. J. Lau (1975) “Transcendental Logarithmic Utility
Functions”, American Economic Review, 65, 367-283.
Christensen, L. R. and M. E. Manser (1977) “Estimating U.S. Consumer Preferences for Meat With a
Flexible Utility Function”, Journal of Econometrics, 5, 37-53.
Conrad, K. and D. W. Jorgenson (1979) “Testing the Integrability of Consumer Demand Functions”,
European Economic Review, 12, 149-169.
Coondoo, D. (1975) "Effects of Household Composition on Consumption Pattern: A Note",
Arthaniti, 17.
Court, A. T. (1939) “Hedonic Price Indexes with Automotive Examples”, in: The Dynamics of
Automobile Demand. New York: General Motors.
Cowling, K. and J. Cubbin (1971) "Price, Quality, and Advertising Competition", Economica, 38,
378-394.
Cowling, K. and J. Cubbin (1972) “Hedonic Price Indexes for U.K. Cars”, Economic Journal, 82,
963-978.
Cramer, J. S. (1969) Empirical Economics. Amsterdam: North-Holland.
Cubbin, J. (1975) “Quality Change and Pricing Behaviour in the U.K. Car Industry 1956-1968”,
Economica, 42, 43-58.
Deaton, A. S. (1974a) “The Analysis of Consumer Demand in the United Kingdom, 1900-1970”,
Econometrica, 42, 341-367.
Deaton, A. S. (1974b) “A Reconsideration of the Empirical Implications of Additive Preferences”,
Economic Journal, 84, 338-348.
Deaton, A. S. (1975a) Models and Projections of Demand in Post- War Britain. London: Chapman &
Hall.
Deaton, A. S. (1975b) “The Measurement of Income and Price Elasticities”, European Economic
Review, 6, 261-274.
Deaton, A. S. (1975c) The Structure of Demand 1920-1970, The Fontana Economic History of Europe.
Collins: Fontana, 6(2).
Deaton, A. S. (1976) “A Simple Non-Additive Model of Demand”, in: L. Solari and J.-N. du
Pasquier, eds., Private and Enlarged Consumption. North-Holland for ASEPELT, 56-72.
Deaton, A. S. (1978) “Specification and Testing in Applied Demand Analysis”, Economic Journal, 88,
524-536.
Deaton, A. S. (1979) “The Distance Function and Consumer Behaviour with Applications to Index
Number and Optimal Taxation”, Review of Economic Studies, 46, 391-405.
Deaton, A. S. (1981a) “Optimal Taxes and the Structure of Preferences”, Econometrica, 49,1245-1268.
Deaton, A. S. (1981b) “Theoretical and Empirical Approaches to Consumer Demand Under Ration-
ing”, in: A. S. Deaton, ed., Essays in the Theory and Measurement of Consumer Behaviour. New
York: Cambridge University Press.
Deaton, A. S. (1981c) "Three Essays on a Sri Lankan Household Survey". Living Standards
Measurement Study W.P. No. 11, Washington: The World Bank.
Deaton, A. S. (1982) "Model Selection Procedures, or Does the Consumption Function Exist?", in:
G. Chow and P. Corsi, eds., Evaluating the Reliability of Macroeconomic Models. New York: Wiley.
Deaton, A. S. (1984) “Household Surveys as a Data Base for the Analysis of Optimality and
Disequilibrium”, Sankhya: The Indian Journal of Statistics, 46, Series B, forthcoming.
Deaton, A. S. and M. Irish (1984) "A Statistical Model for Zero Expenditures in Household Budgets",
Journal of Public Economics, 23, 59-80.
Deaton, A. S. and J. Muellbauer (1980a) Economics and Consumer Behavior. New York: Cambridge
University Press.
Deaton, A. S. and J. Muellbauer (1980b) "An Almost Ideal Demand System", American Economic
Review, 70, 312-326.
Deaton, A. S. and J. Muellbauer (1981) “Functional Forms for Labour Supply and Commodity
Demands with and without Quantity Constraints”, Econometrica, 49, 1521-1532.
Deaton, A. S. and J. Muellbauer (1986) “Measuring Child Costs in Poor Countries”, Journal of
Political Economy, forthcoming.
Dhrymes, P. J. (1971) “Price and Quality Changes in Consumer Capital Goods: An Empirical Study”,
in: Z. Griliches, ed., Price Indexes and Quality Change: Studies in New Methods of Measurement.
Cambridge: Harvard University Press.
Diewert, W. E. (1971) “An Application of the Shephard Duality Theorem: A Generalized Leontief
Production Function”, Journal of Political Economy, 79, 481-507.
Diewert, W. E. (1973a) “Afriat and Revealed Preference Theory”, Review of Economic Studies, 40,
419-426.
Diewert, W. E. (1973b) “Functional Forms for Profit and Transformation Functions”, Journal of
Economic Theory, 6, 284-316.
Diewert, W. E. (1974a) “Applications of Duality Theory”, Chapt. 3 in: M. D. Intriligator and D. A.
Kendrick, eds., Frontiers of Quantitative Economics, American Elsevier: North-Holland, Vol. II.
Diewert, W. E. (1974b) “Intertemporal Consumer Theory and the Demand for Durables”,
Econometrica, 42, 497-516.
Diewert, W. E. (1980a) “Symmetry Conditions for Market Demand Functions”, Review of Economic
Studies, 47, 595-601.
Diewert, W. E. (1980b) "Duality Approaches to Microeconomic Theory", in: K. J. Arrow and M. D.
Intriligator, eds., Handbook of Mathematical Economics. North-Holland.
Diewert, W. E. (1981) “The Economic Theory of Index Numbers: A Survey”, in: A. S. Deaton, ed.,
Essays in the Theory and Measurement of Consumer Behaviour in Honour of Sir Richard Stone.
Cambridge: Cambridge University Press.
Diewert, W. E. (1983) “The Theory of the Cost of Living Index and the Measurement of Welfare
Change”. University of British Columbia, mimeo.
Diewert, W. E. and C. Parkan (1978) "Tests for Consistency of Consumer Data and Nonparametric
Index Numbers”. University of British Columbia: Working Paper 78-27, mimeo.
DuMouchel, W. H. and G. J. Duncan (1983) “Using Sample Survey Weights in Multiple Regression
Analyses of Stratified Samples", Journal of the American Statistical Association, 78, 535-543.
Eisenberg, E. (1961) “Aggregation of Utility Functions”, Management Science, 7, 337-350.
Engel, E. (1895) "Die Lebenskosten Belgischer Arbeiterfamilien früher und jetzt", International
Statistical Institute Bulletin, 9, 1-74.
Epstein, L. and A. Yatchew (1985) "Non-parametric Hypothesis Testing Procedures and Applica-
tions to Demand Analysis”, University of Toronto, mimeo.
Evans, G. B. A. and N. E. Savin (1982) “Conflict Among the Criteria Revisited; the W, LR and LM
Tests”, Econometrica, 50, 737-748.
Federenko, N. P. and N. J. Rimashevskaya (1981) “The Analysis of Consumption and Demand in the
USSR”, in: A. S. Deaton, ed., Essays in the Theory and Measurement of Consumer Behauiour. New
York: Cambridge University Press.
Fiebig, D. G. and H. Theil (1983) "The Two Perils of Symmetry Constrained Estimation of Demand
Systems”, Economics Letters, 13, 105-111.
Fisher, F. M. and K. Shell (1971) “Taste and Quality Change in the Pure Theory of the True Cost of
Living Index”, in: Z. Griliches, ed., Price Indexes and Quality Changes: Studies in New Methods of
Meusurement. Cambridge: Harvard University Press.
Forsyth, F. G. (1960) “The Relationship Between Family Size and Family Expenditure”, Journal of
the Royal Statistical Society, Series A, 123, 367-397.
Freixas, X. and A. Mas-Colell (1983) "Engel Curves Leading to the Weak Axiom in the Aggregate".
Harvard University, mimeo.
Frisch, R. (1932) New Methods of Measuring Marginal Utility. Tübingen: J.C.B. Mohr.
Frisch, R. (1959) “A Complete Scheme for Computing All Direct and Cross Demand Elasticities in a
Model with Many Sectors", Econometrica, 27, 177-196.
Gallant, R. A. (1975) “Seemingly Unrelated Non-Linear Regressions”, Journal of Econometrics, 3,
35-50.
Gallant, R. A. (1981) “On the Bias in Flexible Functional Forms and an Essentially Unbiased Form:
The Fourier Functional Form", Journal of Econometrics, 15, 211-245.
Gallant, R. A. and G. H. Golub (1983) “Imposing Curvature Restrictions on Flexible Functional
Forms”. North Carolina State Umversity and Stanford University, mimeo.
Godambe, V. P. (1955) “A Unified Theory of Sampling From Finite Populations”, Journal of the
Royal Statistical Society, Series B, 17, 268-278.
Godambe, V. P. (1966) “A New Approach to Sampling from Finite Populations: Sufficiency and
Linear Estimation”, Journal of the Royal Statistical Society, Series B, 28, 310-319.
Goldberger, A. S. (1964) Econometric Theory. New York: Wiley.
Goldberger, A. S. (1967) “Functional Form and Utility: A Review of Consumer Demand Theory”.
Social Systems Research Institute, University of Wisconsin, mimeo.
Gorman, W. M. (1953) "Community Preference Fields", Econometrica, 21, 63-80.
Gorman, W. M. (1956, 1980) "A Possible Procedure for Analysing Quality Differentials in the Egg
Market", Review of Economic Studies, 47, 843-856.
Gorman, W. M. (1959) “Separable Utility and Aggregation”, Econometrica, 27, 469-481.
Hurwicz, L. and H. Uzawa (1971) “On the Integrability of Demand Functions”, in: J. S. Chipman,
L. Hurwicz, M. K. Richter and H. F. Sonnenschein, eds., Preferences, Utility, and Demand. New
York: Harcourt, Brace, Jovanovich, 114-148.
Iyengar, N. S., L. R. Jain and T. N. Srinivasan (1968) "Economies of Scale in Household Consump-
tion: A Case Study”, Indian Economic Journal, Econometric Annual, 15, 465-477.
Jackson, C. (1968) "Revised Equivalence Scales for Estimating Equivalent Incomes for Budget Costs
by Family Type”, BLS Bulletin, U.S. Dept. of Labor, 1570-1572.
Jerison, M. (1984) “Aggregation and Pairwise Aggregation of Demand When the Distribution of
Income is Fixed”, Journal of Economic Theory, forthcoming.
Jorgenson, D. W. and L. J. Lau (1975) "The Structure of Consumer Preferences", Annals of Economic
and Social Measurement, 4, 49-101.
Jorgenson, D. W. and L. J. Lau (1976) “Statistical Tests of the Theory of Consumer Behaviour”, in:
H. Albach, E. Helmstädter and R. Henn, eds., Quantitative Wirtschaftsforschung. Tübingen: J.C.B.
Mohr.
Jorgenson, D. W., L. J. Lau and T. Stoker (1982) “The Transcendental Logarithmic Model of
Aggregate Consumer Behavior”, Advances in Econometrics, 1, JAI Press.
Kannai, Y. (1977) "Concavifiability and Constructions of Concave Utility Functions", Journal of
Mathematical Economics, 4, 1-56.
Kay, J. A., M. J. Keen and C. N. Morris (1984) “Consumption, Income, and the Interpretation of
Household Expenditure Data", Journal of Public Economics, 23, 169-181.
King, M. A. (1980) "An Econometric Model of Tenure Choice and Demand for Housing as a Joint
Decision", Journal of Public Economics, 14, 137-159.
Klein, L. R. and H. Rubin (1947-48) “A Constant Utility Index of the Cost of Living”, Review of
Economic Studies, 15, 84-87.
Kuznets, S. (1962) “Quantitative Aspects of the Economic Growth of Nations: VII The Share and
Structure of Consumption", Economic Development and Cultural Change, 10, 1-92.
Kuznets, S. (1966) Modern Economic Growth. New Haven: Yale University Press.
Laitinen, K. (1978) “Why is Demand Homogeneity so Often Rejected?“, Economics Letters, 1,
187-191.
Lancaster, K. J. (1966) “A New Approach to Consumer Theory”, Journal of Political Economy, 74,
132-157.
Lau, L. J. (1978) “Testing and Imposing Monotonicity, Convexity, and Quasi-Concavity”, in: M. Fuss
and D. McFadden, eds., Production Economics: A Dual Approach to Theory and Applications.
Amsterdam: North-Holland.
Lau, L. J. (1982) “A Note on the Fundamental Theorem of Exact Aggregation”, Economics Letters, 9,
119-126.
Lee, L. F. and M. M. Pitt (1983) “Specification and Estimation of Demand Systems with Limited
Dependent Variables”. University of Minnesota, mimeo.
Leser, C. E. V. (1963) "Forms of Engel Functions", Econometrica, 31, 694-703.
Lluch, C. (1971) “Consumer Demand Functions, Spain, 1958-64”, European Economic Review, 2,
227-302.
Lluch, C. (1973) “The Extended Linear Expenditure System”, European Economic Review, 4, 21-32.
Lluch, C., A. A. Powell and R. A. Williams (1977) Patterns in Household Demand and Saving. Oxford:
Oxford University Press for the World Bank.
Lluch, C. and R. A. Williams (1974) “Consumer Demand Systems and Aggregate Consumption in the
U.S.A.: An Application of the Extended Linear Expenditure System", Canadian Journal of
Economics, 8, 49-66.
MaCurdy, T. E. (1981) “An Empirical Model of Labor Supply in a Life-Cycle Setting”, Journal of
Political Economy, 89, 1059-1085.
Malinvaud, E. (1970) Statistical Methods of Econometrics. Amsterdam: North-Holland.
Manser, M. E. and R. J. McDonald (1984) “An Analysis of the Substitution Bias in Measuring
Inflation”, Bureau of Labor Statistics, mimeo.
Marquardt, D. W. (1963) "An Algorithm for Least-Squares Estimation of Non-Linear Parameters",
Journal of the Society for Industrial and Applied Mathematics, 11, 431-441.
Mayo, S. K. (1978) “Theory and Estimation in the Economics of Housing Demand”, Journal of
Urban Economics, 14, 137-159.
McClements, L. D. (1977) “Equivalence Scales for Children”, Journal of Public Economics, 8,
191-210.
McFadden, D. (1978) “Costs, Revenue, and Profit Functions”, in: M. Fuss and D. McFadden, eds.,
Production Economics: A Dual Approach to Theory and Applications. Amsterdam: North-Holland.
McGuire, T. W., J. W. Farley, R. E. Lucas and R. L. Winston (1968) “Estimation and Inference for
Linear Models in which Subsets of the Dependent Variable are Constrained”, Journal of the
American Statistical Association, 63, 1201-1213.
Meisner, J. F. (1979) "The Sad Fate of the Asymptotic Slutsky Symmetry Test for Large Systems",
Economics Letters, 2, 231-233.
Muellbauer, J. (1974) "Household Composition, Engel Curves and Welfare Comparisons Between
Households: A Duality Approach", European Economic Review, 5, 103-122.
Muellbauer, J. (1975a) "The Cost of Living and Taste and Quality Change", Journal of Economic
Theory, 10, 269-283.
Muellbauer, J. (1975b) "Aggregation, Income Distribution and Consumer Demand", Review of
Economic Studies, 42, 525-543.
Muellbauer, J. (1976a) “Community Preferences and the Representative Consumer”, Econometrica,
44, 979-999.
Muellbauer, J. (1976b) “Economics and the Representative Consumer”, in: L. Solari and J-N. du
Pasquier, eds., Private and Enlarged Consumption. Amsterdam: North-Holland for ASEPELT,
29-53.
Muellbauer, J. (1976c) "Can We Base Welfare Comparisons Across Households on Behaviour?".
London: Birkbeck College, mimeo.
Muellbauer, J. (1977) “Testing the Barten Model of Household Composition Effects and the Cost of
Children”, Economic Journal, 87, 460-487.
Muellbauer, J. (1980) “The Estimation of the Prais-Houthakker Model of Equivalence Scales”,
Econometrica, 48, 153-176.
Muellbauer, J. (1981a) “Testing Neoclassical Models of the Demand for Consumer Durables”, in:
A. S. Deaton, ed., Essays in the Theory and Measurement of Consumer Behaviour. New York:
Cambridge University Press.
Muellbauer, J. (1981b) “Linear Aggregation in Neoclassical Labour Supply”, Review of Economic
Studies, 48, 21-36.
Musgrove, P. (1978) Consumer Behavior in Latin America: Income and Spending of Families in Ten
Andean Cities. Washington: Brookings.
Neary, J. P. and K. W. S. Roberts (1980) “The Theory of Household Behaviour Under Rationing”,
European Economic Review, 13, 25-42.
Nicholson, J. L. (1949) "Variations in Working Class Family Expenditure", Journal of the Royal
Statistical Society, Series A, 112, 359-411.
Ohta, M. and Z. Griliches (1976) "Automobile Prices Revisited: Extensions of the Hedonic Hypothe-
sis", in: N. Terleckyj, ed., Household Production and Consumption. New York: National Bureau of
Economic Research.
Orshansky, M. (1965) “Counting the Poor: Another Look at the Poverty Profile”, Social Security
Bulletin, 28, 3-29.
Parks, R. W. (1969) “Systems of Demand Equations: An Empirical Comparison of Alternative
Functional Forms”, Econometrica, 37, 629-650.
Pearce, I. F. (1964) A Contribution to Demand Analysis. Oxford University Press.
Phlips, L. (1972) “A Dynamic Version of the Linear Expenditure Model”, Review of Economics and
Statistics, 54, 450-458.
Phlips, L. (1974) Applied Consumption Analysis. Amsterdam and Oxford: North-Holland, second
edition 1983.
Pigou, A. C. (1910) “A Method of Determining the Numerical Value of Elasticities of Demand”,
Economic Journal, 20, 636-640.
Pollak, R. A. (1975) “Subindexes in the Cost-of-Living Index”, International Economic Review, 16,
135-150.
Pollak, R. A. and T. J. Wales (1978) “Estimation of Complete Demand Systems from Household
Budget Data”, American Economic Review, 68, 348-359.
Pollak, R. A. and T. J. Wales (1979) “Welfare Comparisons and Equivalence Scales”, American
Economic Review, Papers and Proceedings, 69, 216-221.
Pollak, R. A. and T. J. Wales (1980) “Comparison of the Quadratic Expenditure System and Translog
Demand Systems with Alternative Specifications of Demographic Effects", Econometrica, 48,
595-612.
Pollak, R. A. and T. J. Wales (1981) "Demographic Variables in Demand Analysis", Econometrica,
49, 1533-1551.
Powell, A. A. (1969) “Aitken Estimators as a Tool in Allocating Predetermined Aggregates”, Journal
of the American Statistical Association, 64, 913-922.
Prais, S. J. (1959) “A Comment”, Econometrica, 27, 127-129.
Prais, S. J. and H. S. Houthakker (1955) The Analysis of Family Budgets. Cambridge: Cambridge
University Press, second edition 1971.
Pudney, S. E. (1980) “Disaggregated Demand Analysis: The Estimation of a Class of Non-Linear
Demand Systems”, Review of Economic Studies, 47, 875-892.
Pudney, S. E. (1981a) “Instrumental Variable Estimation of a Characteristics Model of Demand”,
Review of Economic Studies, 48, 417-433.
Pudney, S. E. (1981b) “An Empirical Method of Approximating the Separable Structure of Consumer
Preferences”, Review of Economic Studies, 48, 561-577.
Quandt, R. E. (1983) "Computational Problems and Methods", Chapter 12 in: Handbook of
Econometrics, Vol. 1.
Reece, W. S. and K. D. Zieschang (1985) “Consistent Estimation of the Impact of Tax Deductibility
on the Level of Charitable Contributions”, Econometrica, forthcoming.
Rothbarth, E. (1941) “The Measurement of Change in Real Income Under Conditions of Rationing”,
Review of Economic Studies, 8, 100-107.
Rothbarth, E. (1943) “Note on a Method of Determining Equivalent Income for Families of Different
Composition”, Appendix 4 in: C. Madge, ed., War-Time Pattern of Saving and Spending. Occasional
paper No. 4., London: National Institute of Economic and Social Research.
Roy, R. (1942) De l'Utilité, Contribution à la Théorie des Choix. Paris: Hermann.
Russell, T. (1983) "On a Theorem of Gorman", Economics Letters, 11, 223-224.
Samuelson, P. A. (1938) “A Note on the Pure Theory of Consumer Behaviour”, Economica, 5, 61-71.
Samuelson, P. A. (1947) Foundations of Economic Analysis. Cambridge: Harvard University Press.
Samuelson, P. A. (1947-48) "Some Implications of Linearity", Review of Economic Studies, 15, 88-90.
Samuelson, P. A. (1948) “Consumption Theory in Terms of Revealed Preference”, Economica, 15,
243-253.
Samuelson, P. A. (1956) “Social Indifference Curves”, Quarterly Journal of Economics, 70, l-22.
Sargan, J. D. (1964) "Wages and Prices in the United Kingdom", in: P. E. Hart, G. Mills and J. K.
Whitaker, eds., Econometric Analysis for National Economic Planning. London: Butterworths.
Sargan, J. D. (1971) “Production Functions”, Part V in: P. R. G. Layard, J. D. Sargan, M. E. Ager
and D. J. Jones, eds., Qualified Manpower and Economic Performance. London: Penguin Press.
Seneca, J. J. and M. K. Taussig (1971) “Family Equivalence Scales and Personal Income Tax
Exemptions for Children”, Review of Economics and Statistics, 53, 253-262.
Shapiro, P. (1977) "Aggregation and the Existence of a Social Utility Function", Review of Economic
Studies, 46, 653-665.
Shapiro, P. and S. Braithwait (1979) “Empirical Tests for the Existence of Group Utility Functions”,
Review of Economic Studies, 46, 653-665.
Shephard, R. (1953) Cost and Production Functions. Princeton: Princeton University Press.
Simmons, P. (1980) “Evidence on the Impact of Income Distribution on Consumer Demand in the
U.K. 1955-68”, Review of Economic Studies, 47, 893-906.
Singh, B. (1972) “On the Determination of Economies of Scale in Household Consumption”,
International Economic Review, 13, 257-270.
Singh, B. (1973) “The Effect of Household Composition on its Consumption Pattern”, Sankhya,
Series B, 35, 207-226.
Singh, B. and A. L. Nagar (1973) "Determination of Consumer Unit Scales", Econometrica, 41,
347-355.
Spinnewyn, F. (1979a) “Rational Habit Formation”, European Economic Review, 15, 91-109.
Spinnewyn, F. (1979b) “The Cost of Consumption and Wealth in a Model with Habit Formation”,
Economics Letters, 2, 145-148.
Srivastava, V. K. and T. D. Dwivedi (1979) “Estimation of Seemingly Unrelated Regression
Equations: A Brief Survey”, Journal of Econometrics, 10, 15-32.
Stoker, T. (1982) “The Use of Cross-Section Data to Characterize Macro Functions”, Journal of the
American Statistical Association, 77, 369-380.
Stoker, T. (1985) “Completeness, Distribution Restrictions and the Form of Aggregate Functions”,
Econometrica, forthcoming.
Stone, J. R. N. (1954) “Linear Expenditure Systems and Demand Analysis: An Application to the
Pattern of British Demand”, Economic Journal, 64, 511-527.
Stone, J. R. N. (1956) Quantity and Price Indexes in National Accounts. Paris: OEEC.
Stone, R. and D. A. Rowe (1957) “The Market Demand for Durable Goods”, Econometrica, 25,
423-443.
Stone, R. and D. A. Rowe (1958) “Dynamic Demand Functions: Some Econometric Results”,
Economic Journal, 68, 256-270.
Summers, R. (1959) “A Note on Least Squares Bias in Household Expenditure Analysis”,
Econometrica, 27, 121-126.
Sydenstricker, E. and W. I. King (1921) “The Measurement of the Relative Economic Status of
Families”, Quarterly Publication of the American Statistical Association. 17, 842-857.
Szakolczai, G. (1980) "Limits to Redistribution: The Hungarian Experience", in: D. A. Collard,
R. Lecomber and M. Slater, eds., Income Distribution, the Limits to Redistribution. Bristol:
Scientechnica.
Theil, H. (1954) Linear Aggregation of Economic Relations. Amsterdam: North-Holland.
Theil, H. (1965) "The Information Approach to Demand Analysis", Econometrica, 33, 67-87.
Theil, H. (1971a) Principles of Econometrics. Amsterdam: North-Holland.
Theil, H. (1971b) “An Economic Theory of the Second Moments of Disturbances of Behavioural
Equations”, American Economic Review, 61, 190-194.
Theil, H. (1974) "A Theory of Rational Random Behavior", Journal of the American Statistical
Association, 69, 310-314.
Theil, H. (1975a) “The Theory of Rational Random Behavior and its Application to Demand
Analysis”, European Economic Review, 6, 217-226.
Theil, H. (1975b) Theory and Measurement of Consumer Demand. North-Holland, Vol. I.
Theil, H. (1976) Theory and Measurement of Consumer Demand. North-Holland, Vol. II.
Theil, H. (1979) The System-Wide Approach to Microeconomics. Chicago: University of Chicago Press.
Theil, H. and K. Laitinen (1981) “The Independence Transformation: A Review and Some Further
Explorations”, in: A. S. Deaton, ed., Essays in the’Theory and Measurement of Consumer Behaoiour.
New York: Cambridge University Press.
Theil, H. and M. Rosalsky (1984) “More on Symmetry-Constrained Estimation”. University of
Florida at Gainesville, mimeo.
Theil, H. and F. E. Suhm (1981) International Consumption Comparisons: A System-Wide Approach.
Amsterdam: North-Holland.
Thursby, J. and C. A. Knox Lovell (1978) "An Investigation of the Kmenta Approximation to the
CES Function”, International Economic Review, 19, 363-377.
Tobin, J. (1952) “A Survey of the Theory of Rationing”, Econometrica, 20, 512-553.
Tobin, J. (1958) “Estimation of Relationships for Limited Dependent Variables”, Econometrica, 26,
24-36.
Tobin, J. and H. S. Houthakker (1951) “The Effects of Rationing on Demand Elasticities”, Review of
Economic Studies, 18, 140-153.
Törnqvist, L. (1941) "Review", Ekonomisk Tidskrift, 43, 216-225.
Varian, H. R. (1978) "A Note on Locally Constant Income Elasticities", Economics Letters, 1, 5-9.
Varian, H. R. (1982) "The Nonparametric Approach to Demand Analysis", Econometrica, 50,
945-973.
Varian, H. R. (1983) "Nonparametric Tests of Consumer Behavior", Review of Economic Studies, 50,
99-110.
Varian, H. R. (1984) “Nonparametric Analysis of Optimizing Behavior with Measurement Error”.
University of Michigan, mimeo.
Vartia, Y. 0. (1983) “Efficient Methods of Measuring Welfare Change and Compensated Income in
Terms of Market Demand Functions”, Econometrica, 51, 79-98.
Wales, T. J. (1977) “On the Flexibility of Flexible Functional Forms: An Empirical Approach”,
Journal of Econometrics, 5, 183-193.
Wales, T. J. and A. D. Woodland (1983) "Estimation of Consumer Demand Systems with Binding
Non-Negativity Constraints”, Journal of Econometrics, 21, 263-285.
White, H. (1980) “A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for
Heteroskedasticity”, Econometrica, 48, 817-838.
Willig, R. (1976) “Integrability Implications for Locally Constant Demand Elasticities”, Journal of
Economic Theory, 12, 391-401.
de Wolff, P. (1941) “Income Elasticity of Demand, a Micro-Economic and a Macro-Economic
Interpretation”, Economic Journal, 51, 104-145.
Woodland, A. (1979) “Stochastic Specification and the Estimation of Share Equations”, Journal of
Econometrics, 10, 361-383.
Working, H. (1943) “Statistical Laws of Family Expenditure”, Journal of the American Statistical
Association, 38, 43-56.
Wu, D-M. (1973) “Alternative Tests of Independence Between Stochastic Regressors and Dis-
turbances”, Econometrica, 41, 733-750.
Yoshihara, K. (1969) “Demand Functions: An Application to the Japanese Expenditure Pattern”,
Econometrica, 37, 257-274.
Zellner, A. (1962) “An Efficient Method of Estimating Seemingly Unrelated Regressions and Tests for
Aggregation Bias”, Journal of the American Statistical Association, 57, 348-368.
Chapter 31
ECONOMETRIC METHODS FOR MODELING PRODUCER BEHAVIOR
DALE W. JORGENSON
Harvard University
Contents
1. Introduction 1842
1.1. Production theory 1842
1.2. Parametric form 1844
1.3. Statistical method 1845
1.4. Overview of the paper 1847
2. Price functions 1848
2.1. Duality 1849
2.2. Substitution and technical change 1851
2.3. Parametrization 1855
2.4. Integrability 1857
3. Statistical methods 1860
3.1. Stochastic specification 1860
3.2. Autocorrelation 1862
3.3. Identification and estimation 1865
4. Applications of price functions 1871
4.1. Substitution 1872
4.2. Technical change 1876
4.3. Two stage allocation 1882
5. Cost functions 1884
5.1. Duality 1884
5.2. Substitution and economies of scale 1886
5.3. Parametrization and integrability 1889
5.4. Stochastic specification 1891
6. Applications of cost functions 1893
6.1. Economies of scale 1893
6.2. Multiple outputs 1897
7. Conclusion 1900
7.1. General equilibrium modeling 1900
7.2. Panel data 1902
7.3. Dynamic models of production 1904
References 1905
1. Introduction
optimization. The principal analytical tool employed for this purpose is the
implicit function theorem.1
Unfortunately, the characterization of demands and supplies as implicit func-
tions of relative prices is inconvenient for econometric applications. In specifying
an econometric model of producer behavior the demands and supplies must be
expressed as explicit functions. These functions can be parametrized by treating
measures of substitution, technical change, and economies of scale as unknown
parameters to be estimated on the basis of empirical data.
The traditional approach to modeling producer behavior begins with the
assumption that the production function is additive and homogeneous. Under
these restrictions demand and supply functions can be derived explicitly from the
production function and the necessary conditions for producer equilibrium.
However, this approach has the disadvantage of imposing constraints on patterns
of production - thereby frustrating the objective of determining these patterns
empirically.
The traditional approach was originated by Cobb and Douglas (1928) and was
employed in empirical research by Douglas and his associates for almost two
decades.2 The limitations of this approach were made strikingly apparent by
Arrow, Chenery, Minhas, and Solow (1961, henceforward ACMS), who pointed
out that the Cobb-Douglas production function imposes a priori restrictions on
patterns of substitution among inputs. In particular, elasticities of substitution
among all inputs must be equal to unity.
The constant elasticity of substitution (CES) production function introduced by
ACMS adds flexibility to the traditional approach by treating the elasticity of
substitution as an unknown parameter.3 However, the CES production function
retains the assumptions of additivity and homogeneity and imposes very stringent
limitations on patterns of substitution. McFadden (1963) and Uzawa (1962) have
shown, essentially, that elasticities of substitution among all inputs must be the
same.
The dual formulation of production theory has made it possible to overcome
the limitations of the traditional approach to econometric modeling. This formu-
lation was introducted by Hotelling (1932) and later revived and extended by
Samuelson (1954, 1960)4 and Shephard (1953, 1970).5 The key features of the
1This approach to production theory is employed by Carlson (1939), Frisch (1965), and Schneider
(1934). The English edition of Frisch’s book is a translation from the ninth edition of his lectures,
published in Norwegian in 1962; the first edition of these lectures dates back to 1926.
2These studies are summarized by Douglas (1948). See also: Douglas (1967, 1976). Early economet-
ric studies of producer behavior, including those based on the Cobb-Douglas production function,
have been surveyed by Heady and Dillon (1961) and Walters (1963). Samuelson (1979) discusses the
impact of Douglas’s research.
3Econometric studies based on the CES production function have been surveyed by Griliches
(1967), Jorgenson (1974), Kennedy and Thirlwall (1972), Nadiri (1970), and Nerlove (1967).
4Hotelling (1932) and Samuelson (1954) develop the dual formulation of production theory on the
basis of the Legendre transformation. This approach is employed by Jorgenson and Lau (1974a,
1974b) and Lau (1976, 1978a).
5Shephard utilizes distance functions to characterize the duality between cost and production
functions. This approach is employed by Diewert (1974a, 1982), Hanoch (1978), McFadden (1978),
and Uzawa (1964).
6Surveys of duality in the theory of production are presented by Diewert (1982) and Samuelson
(1983).
7This approach to the selection of parametric forms is discussed by Diewert (1974a), Fuss,
McFadden, and Mundlak (1978), and Lau (1974).
If the bias is positive, changes in technology increase demand for the input and are said to use the input; if the bias is negative,
changes in technology decrease demand for the input and are said to save input.
If technical change neither uses nor saves an input, the change is neutral in the
sense of Hicks.
By treating measures of substitution and technical change as fixed parameters
the system of demand and supply functions can be generated by integration.
Provided that the resulting functions are themselves integrable, the underlying
price or cost function can be obtained by a second integration. As we have
already pointed out, Hicks’s elasticity of substitution is unsatisfactory for this
purpose, since it leads to arbitrary restrictions on patterns of producer behavior.
The introduction of a new measure of substitution, the share elasticity, by
Christensen, Jorgenson, and Lau (1971, 1973) and Samuelson (1973) has made it
possible to overcome the limitations of parametric forms based on constant
elasticities of substitution.8 Share elasticities, like biases of technical change, can
be defined in terms of shares of inputs in the value of output. The share elasticity
of a given input is the response of the share of that input to a proportional change
in the price of an input.
By taking share elasticities and biases of technical change as fixed parameters,
demand functions for inputs with constant share elasticities and constant biases
of technical change can be obtained by integration. The shares of each input in
the value of output can be taken to be linear functions of the logarithms of input
prices and of the level of technology. The share elasticities and biases of technical
change can be estimated as unknown parameters of these functions.
The constant share elasticity (CSE) form of input demand functions can be
integrated a second time to obtain the underlying price or cost function. For
example, the logarithm of the price of output can be expressed as a quadratic
function of the logarithms of the input prices and the level of technology. The
price of output can be expressed as a transcendental or, more specifically, an
exponential function of the logarithms of the input prices.9 Accordingly,
Christensen, Jorgenson, and Lau refer to this parametric form as the translog
price function.10
the same set of independent variables-for example, relative prices and the level
of technology. The variables may enter these functions in a nonlinear manner, as
in the translog demand functions proposed by Christensen, Jorgenson, and Lau.
The functions may also be nonlinear in the parameters. Finally, the parameters
may be subject to nonlinear constraints arising from the theory of production.
The selection of a statistical method for estimation of systems of demand and
supply functions depends on the character of the data set. For cross section data
on individual producing units, the prices that determine demands and supplies
can be treated as exogenous variables. The unknown parameters can be estimated
by means of nonlinear multivariate regression techniques. Methods of estimation
appropriate for this purpose were introduced by Jennrich (1969) and Malinvaud
(1970, 1980).11
For time series data on aggregates such as industry groups, the prices that
determine demands and supplies can be treated as endogenous variables. The
unknown parameters of an econometric model of producer behavior can be
estimated by techniques appropriate for systems of nonlinear simultaneous equa-
tions. One possible approach is to apply the method of full information maximum
likelihood. However, this approach has proved to be impractical, since it requires
the likelihood function for the full econometric model, not only for the model of
producer behavior.
Jorgenson and Laffont (1974) have developed limited information methods for
estimating the systems of nonlinear simultaneous equations that arise in modeling
producer behavior. Amemiya (1974) proposed to estimate a single nonlinear
structural equation by the method of nonlinear two stage least squares. The first
step in this procedure is to linearize the equation and to apply the method of two
stage least squares to the linearized equation. Using the resulting estimates of the
coefficients of the structural equation, a second linearization can be obtained and
the process can be repeated.
Jorgenson and Laffont extended Amemiya’s approach to a system of nonlinear
simultaneous equations by introducing the method of nonlinear three stage least
squares. This method requires an estimate of the covariance matrix of the
disturbances of the system of equations as well as an estimate of the coefficients
of the equations. The procedure is initiated by linearizing the system and applying
the method of three stage least squares to the linearized system. This process can
be repeated, using a second linearization.12
It is essential to emphasize the role of constraints on the parameters of
econometric models implied by the theory of production. These constraints may
take the form of linear or nonlinear restrictions on the parameters of a single
11Methods for estimation of nonlinear multivariate regression models are summarized by Malinvaud
(1980).
12Nonlinear two and three stage least squares methods are also discussed by Amemiya (1977),
Gallant (1977), and Gallant and Jorgenson (1979).
This paper begins with the simplest form of the econometric methodology for
modeling producer behavior. This methodology is based on production under
constant returns to scale. The dual representation of the production function is a
price function, giving the price of output as a function of the prices of inputs and
the level of technology. An econometric model of producer behavior is generated
by differentiating the price function with respect to the prices and the level of
technology.
We present the dual formulation of the theory of producer behavior under
constant returns to scale in Section 2. We parameterize this model by taking
measures of substitution and technical change to be constant parameters. We
then derive the constraints on these parameters implied by the theory of produc-
tion. In Section 3 we present statistical methods for estimating this model of
producer behavior under linear and nonlinear restrictions. Finally, we illustrate
the application of this model by studies of data on individual industries in Sec-
tion 4.
In Section 5 we consider the extension of econometric modeling of producer
behavior to nonconstant returns to scale. In regulated industries the price of
output is set by regulatory authority. Given the demand for output as a function
of the regulated price, the level of output can be taken as exogenous to the
producing unit. Necessary conditions for producer equilibrium can be derived
from cost minimization. The minimum value of total cost can be expressed as a
function of the level of output and the prices of all inputs. This cost function
provides a dual representation of the production function.
2. Price functions
The purpose of this section is to present the simplest form of the econometric
methodology for modeling producer behavior. We base this methodology on a
production function with constant returns to scale. Producer equilibrium implies
the existence of a price function, giving the price of output as a function of the
prices of inputs and the level of technology. The price function is dual to the
production function and provides an alternative and equivalent description of
technology.
An econometric model of producer behavior takes the form of a system of
simultaneous equations, determining the distributive shares of the inputs and the
rate of technical change. Measures of substitution and technical change give the
responses of the distributive shares and the rate of technical change to changes in
prices and the level of technology. To generate an econometric model of producer
behavior we treat these measures as unknown parameters to be estimated.
The economic theory of production implies restrictions on the parameters of an
econometric model of producer behavior. These restrictions take the form of
linear and nonlinear constraints on the parameters. Statistical methods employed
in modeling producer behavior involve the estimation of systems of nonlinear
2.1. Duality
The value shares of the inputs, say $u_j$, are defined by
$$u_j = \frac{p_j x_j}{qy}, \qquad (j = 1, 2, \ldots, J), \eqno(2.1)$$
where $p_j$ and $x_j$ are the price and quantity of the j-th input, q is the price of output, and y is the quantity of output.
Under competitive markets for output and all inputs the necessary conditions for
producer equilibrium are given by equalities between the share of each input in
the value of output and the elasticity of output with respect to that input:
$$u = S(x, t), \eqno(2.2)$$
where $S(x, t) = \partial \ln F/\partial \ln x$ is the vector of elasticities of output with respect to the inputs and $y = F(x, t)$ is the production function.
15Time series and cross section differences in technology have been incorporated into a model
of substitution and technical change in U.S. agriculture by Binswanger (1974a, 1974b, 1978c).
Binswanger’s study is summarized in Section 4.2 below.
Under constant returns to scale the elasticities and the value shares for all
inputs sum to unity:
$$i'u = i'\,\frac{\partial \ln y}{\partial \ln x} = 1,$$
where i is a vector of ones. The value of output is equal to the sum of the values
of the inputs.
Finally, we can define the rate of technical change, say $u_t$, as the rate of growth
of the quantity of output, holding all inputs constant:
$$u_t = \frac{\partial \ln y}{\partial t}(x, t). \eqno(2.3)$$
It is important to note that this definition does not impose any restriction on
patterns of substitution among inputs.
Given the identity between the value of output and the value of all inputs and
given equalities between the value share of each input and the elasticity of output
with respect to that input, we can express the price of output as a function, say Q,
of the prices of all inputs and the level of technology:
q=Q(~,t). (2.4)
We refer to this as the price function for the producing unit.
The price function Q is dual to the production function F and provides an
alternative and equivalent description of the technology of the producing unit.16
We can formalize this description in terms of the following properties of the price
function:
Given differentiability of the price function, we can express the value shares of
all inputs as elasticities of the price function with respect to the input prices:
$$u = \frac{\partial \ln q}{\partial \ln p}(p, t). \eqno(2.5)$$
“The dual formulation of production theory under constant returns to scale is due to Samuelson
(1954).
Further, we can express the negative of the rate of technical change as the rate of
growth of the price of output, holding the prices of all inputs constant:
$$-u_t = \frac{\partial \ln q}{\partial t}(p, t). \eqno(2.6)$$
Since the price function Q is homogeneous of degree one in the input prices,
the value shares and the rate of technical change are homogeneous of degree zero
and the value shares sum to unity:
$$i'u = i'\,\frac{\partial \ln q}{\partial \ln p} = 1.$$
Since the price function is increasing in the input prices, the value shares must be
nonnegative:
$$u \ge 0.$$
We have represented the value shares of all inputs and the rate of technical
change as functions of the input prices and the level of technology. We can
introduce measures of substitution and technical change to characterize these
functions in detail. For this purpose we differentiate the logarithm of the price
function twice with respect to the logarithms of input prices to obtain measures of
substitution:
$$U_{pp} = \frac{\partial^2 \ln q}{\partial \ln p\, \partial \ln p'}(p, t). \eqno(2.7)$$
These share elasticities give the response of the value shares of all inputs to proportional changes in
the input prices. If a share elasticity is positive, the corresponding value share
increases with the input price. If a share elasticity is negative, the value share
decreases with the input price. Finally, if a share elasticity is zero, the value
share is independent of the price.17
Second, we can differentiate the logarithm of the price function twice with
respect to the logarithms of input prices and the level of technology to obtain
measures of technical change:
$$u_{pt} = \frac{\partial^2 \ln q}{\partial \ln p\, \partial t}(p, t), \eqno(2.8)$$
$$u_{tt} = \frac{\partial^2 \ln q}{\partial t^2}(p, t). \eqno(2.9)$$
We refer to the vector (2.8) as the biases of technical change and to the scalar (2.9) as the deceleration of technical change.
17The share elasticity was introduced by Christensen, Jorgenson, and Lau (1971, 1973) and
Samuelson (1973).
18This definition of the bias of technical change is due to Hicks (1963). Alternative definitions of
biases of technical change are compared by Binswanger (1978b).
Concavity of the price function in the input prices can be characterized in terms of the matrix of share elasticities, since
$$\frac{1}{q}\,NHN = U_{pp} + uu' - V,$$
where H is the Hessian of the price function, the price of output q is positive, and the matrices N and V are diagonal matrices with the input prices and the value shares, respectively, along the main diagonal; concavity of the price function is therefore equivalent to negative semidefiniteness of the matrix $U_{pp} + uu' - V$.

A price function is separable in a group of input prices if it can be written in the form
$$q = Q(P(p_1, p_2, \ldots, p_K), p_{K+1}, \ldots, p_J, t), \eqno(2.10)$$
where the function P is independent of the J - K input prices $\{p_{K+1}, p_{K+2}, \ldots, p_J\}$
and the level of technology t.20 We say that the price function is homothetically
separable if the function P in (2.10) is homogeneous of degree one.21 Separability
of the price function implies homothetic separability.22
Under homothetic separability the function P can be interpreted as a price index for the corresponding input aggregate:
$$P = P(p_1, p_2, \ldots, p_K). \eqno(2.11)$$
The total cost of the K inputs included in the price index P, say c, is the sum
of expenditures on all K inputs:
$$c = \sum_{k=1}^{K} p_k x_k.$$
We can define the quantity index G for this aggregate as the ratio of total cost to
the price index P:
$$G = \frac{c}{P}. \eqno(2.12)$$
The product of the price and quantity indexes for the aggregate is equal to the
cost of the K inputs.24
We can analyze the implications of homothetic separability by introducing
price and quantity indexes of aggregate input and defining the value share of
aggregate input in terms of these indexes. An aggregate input can be treated in
precisely the same way as any other input, so that price and quantity indexes can
be used to reduce the dimensionality of the space of input prices and quantities.
The price index generates a second stage of the model, by treating the price of
each aggregate as a function of the prices of the inputs making up the aggregate.25
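As a small numerical illustration of these definitions, the sketch below constructs a linearly homogeneous price index for a two-input aggregate and recovers the corresponding quantity index from (2.12). The Cobb-Douglas form of the sub-index and all numbers are assumptions made for the example.

```python
# Sketch: price and quantity indexes for an aggregate input, as in (2.12).
import numpy as np

w = np.array([0.25, 0.75])          # sub-index weights, summing to one (assumed)

def price_index(p):
    """P(p) = prod_k p_k**w_k: homogeneous of degree one in p."""
    return np.prod(p ** w)

p = np.array([2.0, 1.5])            # prices of the K aggregated inputs
x = np.array([10.0, 4.0])           # corresponding quantities
c = p @ x                           # total cost of the K inputs
P = price_index(p)
G = c / P                           # quantity index, eq. (2.12)
print(P, G, np.isclose(P * G, c))   # the product of the indexes is total cost
```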
2.3. Parametrization
In the theory of producer behavior the dependent variables are value shares of all
inputs and the rate of technical change. The independent variables are prices of
inputs and the level of technology. The purpose of an econometric model of
producer behavior is to characterize the value shares and the rate of technical
change as functions of the input prices and the level of technology.
To generate an econometric model of producer behavior a natural approach is
to treat the measures of substitution and technical change as unknown parameters
to be estimated. For this purpose we introduce the parameters
$$B_{pp} = U_{pp}, \qquad \beta_{pt} = u_{pt}, \qquad \beta_{tt} = u_{tt}. \eqno(2.13)$$
Treating these measures as constants, the value shares and the rate of technical change take the form
$$u = \alpha_p + B_{pp} \ln p + \beta_{pt}\, t, \qquad u_t = \alpha_t + \beta_{pt}' \ln p + \beta_{tt}\, t, \eqno(2.14)$$
where $\alpha_p$ and $\alpha_t$ are constants of integration. A second integration yields the price function
$$\ln q = \alpha_0 + \alpha_p' \ln p + \alpha_t\, t + \tfrac{1}{2} \ln p' B_{pp} \ln p + \ln p' \beta_{pt}\, t + \tfrac{1}{2} \beta_{tt}\, t^2, \eqno(2.15)$$
where $\alpha_0$ is a constant of integration. Normalizing the price of
output so that it is equal to unity where t is zero, we can set this parameter equal
to zero. This represents a choice of scale for measuring the quantity and price of
output.
For the price function (2.15) the price of output is a transcendental or, more
specifically, an exponential function of the logarithms of the input prices. We
refer to this form as the transcendental logarithmic price function or, more simply,
the translog price function, indicating the role of the variables. We can also
characterize this price function as the constant share elasticity or CSE price
function, indicating the role of the fixed parameters. In this representation the
scalars $\alpha_t$ and $\beta_{tt}$, the vectors $\alpha_p$ and $\beta_{pt}$, and the matrix $B_{pp}$ are constant parameters
that reflect the underlying technology. Differences in levels of technology among
time periods for a given producing unit or among producing units at a given point
of time are represented by differences in the level of technology t.
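The following sketch evaluates a translog (CSE) price function and its value shares for a two-input technology. The parameter values are illustrative assumptions chosen to satisfy the homogeneity, product exhaustion, and symmetry restrictions derived in Section 2.4 below; they are not estimates.

```python
# Sketch: a translog (CSE) price function and its value shares.
# Parameters are assumed to satisfy (2.17)-(2.19): alpha_p sums to one,
# B_pp is symmetric with rows/columns summing to zero (and negative
# semidefinite), and beta_pt sums to zero.
import numpy as np

alpha0, alpha_t, beta_tt = 0.0, -0.01, 0.001
alpha_p = np.array([0.4, 0.6])
B_pp = np.array([[-0.1,  0.1],
                 [ 0.1, -0.1]])
beta_pt = np.array([0.02, -0.02])

def log_price(lnp, t):
    """ln q from the translog price function (2.15)."""
    return (alpha0 + alpha_p @ lnp + alpha_t * t
            + 0.5 * lnp @ B_pp @ lnp + (lnp @ beta_pt) * t
            + 0.5 * beta_tt * t**2)

def value_shares(lnp, t):
    """u = alpha_p + B_pp ln p + beta_pt t, as in (2.14)."""
    return alpha_p + B_pp @ lnp + beta_pt * t

lnp = np.log(np.array([1.5, 0.8]))
u = value_shares(lnp, 10.0)
print(log_price(lnp, 10.0), u, u.sum())   # shares sum to one by construction
```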
For the translog price function the negative of the average rates of technical
change at any two levels of technology, say t and t - 1, can be expressed as the
difference between successive logarithms of the price of output, less a weighted
average of the differences between successive logarithms of the input prices with
weights given by the average value shares:
$$-\bar{u}_t = \ln q(t) - \ln q(t-1) - \bar{u}'\left[\ln p(t) - \ln p(t-1)\right], \eqno(2.16)$$
where
$$\bar{u}_t = \tfrac{1}{2}\left[u_t(t) + u_t(t-1)\right], \qquad \bar{u} = \tfrac{1}{2}\left[u(t) + u(t-1)\right].$$
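Since (2.16) is a simple index number computation, it can be carried out directly from two periods of data; in the sketch below all prices and value shares are hypothetical numbers assumed for illustration.

```python
# Sketch: the average rate of technical change from (2.16).
import numpy as np

q = np.array([1.00, 1.03])                 # output price, periods t-1 and t
p = np.array([[1.0, 1.0], [1.1, 0.95]])    # input prices, rows = periods
u = np.array([[0.4, 0.6], [0.38, 0.62]])   # value shares, rows = periods

ubar = 0.5 * (u[0] + u[1])                 # average value shares
minus_ut = np.log(q[1]) - np.log(q[0]) - ubar @ (np.log(p[1]) - np.log(p[0]))
print(-minus_ut)   # average rate of technical change (negative here: the
                   # output price rose faster than share-weighted input prices)
```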
“Arrow, Chenery, Minhas, and Solow (1961) have derived the CES production function as an exact
representation of a model of producer behavior with a constant elasticity of substitution.
2.4. Integrability
2.4.1. Homogeneity
The value shares and the rate of technical change are homogeneous of degree zero
in the input prices.
We first represent the value shares and the rate of technical change as a sys-
tem of eqs. (2.14). Homogeneity of the price function implies that the
parameters $B_{pp}$ and $\beta_{pt}$ in this system must satisfy the restrictions:
$$B_{pp}\, i = 0, \qquad \beta_{pt}'\, i = 0, \eqno(2.17)$$
where i is a vector of ones. For J inputs there are J+l restrictions implied by
homogeneity.
28An alternative approach to the generation of the translog parametric form for the production
function by means of the Taylor’s series was originated by Kmenta (1967). Kmenta employs a Taylor’s
series expansion in terms of the parameters of the CES production function. This approach imposes
the same restrictions on patterns of production as those implied by the constancy of the elasticity of
substitution. The Kmenta approximation is employed by Griliches and Ringstad (1971) and Sargan
(1971), among others, in estimating the elasticity of substitution.
2.4.2. Product exhaustion

The value shares sum to unity, so that the parameters must satisfy the restrictions:
$$\alpha_p'\, i = 1, \qquad B_{pp}'\, i = 0, \qquad \beta_{pt}'\, i = 0. \eqno(2.18)$$
2.4.3. Symmetry
The matrix of share elasticities, biases of technical change, and the deceleration of
technical change must be symmetric.
A necessary and sufficient condition for symmetry is that the matrix of
parameters must satisfy the restrictions:
$$\begin{bmatrix} B_{pp} & \beta_{pt} \\ \beta_{pt}' & \beta_{tt} \end{bmatrix} = \begin{bmatrix} B_{pp} & \beta_{pt} \\ \beta_{pt}' & \beta_{tt} \end{bmatrix}'. \eqno(2.19)$$
2.4.4. Nonnegativity
The value shares must be nonnegative:
$$\frac{\partial \ln q}{\partial \ln p} \ge 0.$$
For the translog price function the conditions for monotonicity take the form:
$$\frac{\partial \ln q}{\partial \ln p} = \alpha_p + B_{pp} \ln p + \beta_{pt}\, t \ge 0. \eqno(2.20)$$
Since the translog price function is quadratic in the logarithms of the input prices,
we can always choose prices so that the monotonicity of the price function is
violated, so that nonnegativity of the value shares cannot be imposed globally.
2.4.5. Monotonicity
Since the matrix $uu' - V$ is nonpositive definite, concavity can be imposed globally by requiring the matrix of constant share elasticities to be nonpositive definite. This can be achieved through the Cholesky factorization
$$B_{pp} = TDT',$$
where T is a unit lower triangular matrix and D is a diagonal matrix whose elements, the Cholesky values, are constrained to be nonpositive.
The matrix of constant share elasticities $B_{pp}$ must also satisfy restrictions implied by
symmetry and product exhaustion. These restrictions imply corresponding constraints on the elements of T and D.
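A minimal sketch of this device follows. The matrix of constant share elasticities is built from an assumed unit lower triangular matrix T and nonpositive Cholesky values, so that it is negative semidefinite by construction; the parameter values are illustrative, and the further constraints implied by symmetry and product exhaustion are noted in the comments but not imposed.

```python
# Sketch: imposing concavity through the factorization B_pp = T D T'.
import numpy as np

def cse_matrix(lam, delta):
    """Return B_pp = T D T' for a three-input model.
    T is unit lower triangular; D = diag(-delta**2) has nonpositive
    elements (the Cholesky values), so B_pp is negative semidefinite
    by construction. In estimation, homogeneity and product exhaustion
    would impose further constraints on T and D (not shown here)."""
    T = np.array([[1.0,    0.0,    0.0],
                  [lam[0], 1.0,    0.0],
                  [lam[1], lam[2], 1.0]])
    D = np.diag(-delta**2)
    return T @ D @ T.T

B_pp = cse_matrix(np.array([0.5, -0.2, 0.3]), np.array([0.1, 0.05, 0.0]))
print(np.linalg.eigvalsh(B_pp))    # all eigenvalues <= 0
```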
29This approach to global concavity was originated by Jorgenson and Fraumeni (1981). Caves and
Christensen (1980) have compared regions where concavity obtains for alternative parametric forms
3. Statistical methods
Our model of producer behavior is generated from a translog price function for
each producing unit. To formulate an econometric model of production and
technical change we add a stochastic component to the equations for the value
shares and the rate of technical change. We associate this component with
unobservable random disturbances at the level of the producing unit. The
producer maximizes profits for given input prices, but the value shares of inputs
are subject to a random disturbance.
The random disturbances in an econometric model of producer behavior may
result from errors in implementation of production plans, random elements in the
technology not reflected in the model of producer behavior, or errors of measure-
ment in the value shares. We assume that each of the equations for the value
shares and the rate of technical change has two additive components. The first is a
nonrandom function of the input prices and the level of technology; the second is
an unobservable random disturbance that is functionally independent of these
variables.31
3.1. Stochastic specification
30The Cholesky factorization was first proposed for imposing local concavity restrictions by Lau
(1978b).
31Different stochastic specifications are compared by Appelbaum (1978), Burgess (1975), and Geary
and McDonnell (1980). The implications of alternative stochastic specifications are discussed in detail
by Fuss, McFadden, and Mundlak (1978).
where $\varepsilon_t$ is the vector of unobservable random disturbances for the value shares of the $t$th time period and $\varepsilon_t^t$ is the corresponding disturbance for the rate of technical change. Since the value shares for all inputs sum to unity in each time period, the random disturbances corresponding to the $J$ value shares sum to zero in each time period:

$i'\varepsilon_t = 0, \quad (t = 1, 2, \ldots, T).$
We also assume that the disturbances have a covariance matrix that is the same
for all observations; since the random disturbances corresponding to the J value
shares sum to zero, this matrix is nonnegative definite with rank at most equal to
J. We assume that the covariance matrix of the random disturbances correspond-
ing to the value shares and the rate of technical change, say $\Sigma$, has rank $J$, where:

$V\begin{pmatrix} \varepsilon_t \\ \varepsilon_t^t \end{pmatrix} = \Sigma, \quad (t = 1, 2, \ldots, T).$   (3.4)
3.2. Autocorrelation
The rate of technical change $v_t$ is not directly observable; we assume that the equation for the translog price index of the rate of technical change can be written with an averaged disturbance:

$\bar{\varepsilon}_t^t = \tfrac{1}{2}[\varepsilon_t^t + \varepsilon_{t-1}^t], \quad (t = 1, 2, \ldots, T).$

The covariance matrix of the stacked averaged disturbances then takes the form of a Laurent matrix:

$V(\bar{\varepsilon}) = \Sigma \otimes \Omega,$   (3.8)

where:

$\Omega = \frac{1}{4}\begin{pmatrix} 2 & 1 & 0 & \cdots & 0 \\ 1 & 2 & 1 & \cdots & 0 \\ 0 & 1 & 2 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & 2 \end{pmatrix}.$   (3.9)
Since the matrix $\Omega$ in (3.9) is known, the equations for the average rate of technical change and the average value shares can be transformed to eliminate autocorrelation. The matrix $\Omega$ is positive definite, so that there is a matrix $P$ such that:

$P\Omega P' = I,$
$P'P = \Omega^{-1}.$

To construct the matrix $P$ we first invert the matrix $\Omega$ to obtain the inverse matrix $\Omega^{-1}$, a positive definite matrix. We then calculate the Cholesky factorization:

$\Omega^{-1} = TDT',$
where $T$ is a unit lower triangular matrix and $D$ is a diagonal matrix with positive elements along the main diagonal. Finally, we can write the matrix $P$ in the form:

$P = D^{1/2}T',$

where $D^{1/2}$ is a diagonal matrix with elements along the main diagonal equal to the square roots of the corresponding elements of $D$.
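Under these definitions the transformation can be constructed in a few lines. The following sketch is our own helper, using the tridiagonal $\Omega$ of eq. (3.9); it exploits the fact that if $\Omega^{-1} = LL'$ is the Cholesky factorization, then $L = TD^{1/2}$, so that $P = L'$.

```python
import numpy as np

def autocorrelation_transform(n):
    """Laurent matrix Omega of eq. (3.9) for n averaged observations
    and a matrix P with P Omega P' = I, built from the Cholesky
    factorization Omega^{-1} = L L' (so that P = L' = D^{1/2} T')."""
    Omega = 0.25 * (2 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1))
    L = np.linalg.cholesky(np.linalg.inv(Omega))
    P = L.T
    assert np.allclose(P @ Omega @ P.T, np.eye(n))
    return Omega, P
```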
We can transform the equations for the average rates of technical change by the matrix $P = D^{1/2}T'$ to obtain equations with uncorrelated random disturbances:

$D^{1/2}T'\begin{pmatrix} -\bar{v}_2 \\ -\bar{v}_3 \\ \vdots \\ -\bar{v}_T \end{pmatrix} = D^{1/2}T'\begin{pmatrix} 1 & \overline{\ln p}(2)' & 2 - \frac{1}{2} \\ 1 & \overline{\ln p}(3)' & 3 - \frac{1}{2} \\ \vdots & \vdots & \vdots \\ 1 & \overline{\ln p}(T)' & T - \frac{1}{2} \end{pmatrix}\begin{pmatrix} \alpha_t \\ \beta_{pt} \\ \beta_{tt} \end{pmatrix} + D^{1/2}T'\begin{pmatrix} \bar{\varepsilon}_2^t \\ \bar{\varepsilon}_3^t \\ \vdots \\ \bar{\varepsilon}_T^t \end{pmatrix},$   (3.10)

where $\overline{\ln p}(t) = \tfrac{1}{2}[\ln p(t) + \ln p(t-1)]$, since:

$(I \otimes D^{1/2}T')(\Sigma \otimes \Omega)(I \otimes TD^{1/2}) = \Sigma \otimes I.$   (3.11)
for the remaining average value share, using the product exhaustion restrictions on these parameters. The complete model involves $\frac{1}{2}J(J+3)$ unknown parameters. A total of $\frac{1}{2}(J^2 + J + 4)$ additional parameters can be estimated as functions of these parameters, using the homogeneity, product exhaustion, and symmetry restrictions.32
where V’is the number of instruments. A necessary and sufficient rank condition
is given below; this amounts to the nonlinear analogue of the absence of
multicollinearity.
Our objective is to estimate the unknown parameters $\alpha_p$, $B_{pp}$, $\beta_{pt}$ subject to
the restrictions implied by homogeneity, product exhaustion, symmetry, and
monotonicity. By dropping the equation for one of the value shares, we can
eliminate the restrictions implied by summability. These restrictions can be used
in estimating the parameters that occur in the equation that has been dropped.
We impose the restrictions implied by homogeneity and symmetry as equalities.
The restrictions implied by monotonicity take the form of inequalities.
We can write the model of production and technical change in (3.5) and (3.6) in
the form:
$u = f(\gamma) + \varepsilon.$   (3.14)

By the assumptions in Section 3.1 above the random vector $\varepsilon$ has mean zero and a constant covariance matrix.
Linearizing the model around initial parameter values gives:

$u = f(\gamma_0) + \frac{\partial f}{\partial \gamma}(\gamma_0)\,\Delta\gamma + e,$   (3.16)
where $\gamma_0$ is the initial value of the vector of unknown parameters $\gamma$ and

$\Delta\gamma = \gamma_1 - \gamma_0,$

where $\gamma_1$ is the revised value of this vector. The fitted residuals $e$ depend on the initial and revised values.
To revise the initial values we apply Zellner and Theil's (1962) three stage least squares method to the linearized model, obtaining:

$\Delta\gamma = \left(\frac{\partial f}{\partial \gamma}(\gamma_0)'\left[\hat{\Sigma}^{-1} \otimes Z(Z'Z)^{-1}Z'\right]\frac{\partial f}{\partial \gamma}(\gamma_0)\right)^{-1}\frac{\partial f}{\partial \gamma}(\gamma_0)'\left[\hat{\Sigma}^{-1} \otimes Z(Z'Z)^{-1}Z'\right][u - f(\gamma_0)].$   (3.17)
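The update in (3.17) is a single Gauss-Newton step of the nonlinear three stage least squares estimator. A schematic sketch follows; all names are ours, with u the stacked observations, Z the matrix of instruments, and Sigma_inv an estimate of $\Sigma^{-1}$.

```python
import numpy as np

def nl3sls_step(u, f, jac, gamma0, Z, Sigma_inv):
    """One linearized three stage least squares update, eq. (3.17)."""
    Pz = Z @ np.linalg.solve(Z.T @ Z, Z.T)    # projection Z (Z'Z)^{-1} Z'
    W = np.kron(Sigma_inv, Pz)                # weighting matrix
    F = jac(gamma0)                           # stacked Jacobian of f
    r = u - f(gamma0)                         # current residuals
    d_gamma = np.linalg.solve(F.T @ W @ F, F.T @ W @ r)
    return gamma0 + d_gamma
```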
The final step in estimation of the model of production and technical change is
to minimize the criterion function (3.15) subject to the restrictions implied by
monotonicity of the distributive shares. We have eliminated the restrictions that
take the form of equalities. Monotonicity of the distributive shares implies
inequality restrictions on the parameters of the Cholesky factorization of the
matrix of constant share elasticities $B_{pp}$. The diagonal elements of the matrix $D$ in this factorization must be nonpositive.
We can represent the inequality constraints on the matrix of share elasticities $B_{pp}$ in the form:
$\phi(\gamma) \leq 0,$   (3.18)

where the elements of $\phi(\gamma)$ are the diagonal elements of the matrix $D$ in the Cholesky factorization. We form the Lagrangian:

$L = S(\gamma) + \lambda'\phi(\gamma),$   (3.19)

with first-order conditions:

$\frac{\partial L}{\partial \gamma} = \frac{\partial S}{\partial \gamma} + \frac{\partial \phi'}{\partial \gamma}\lambda = 0,$   (3.20)

and the complementary slackness condition:

$\lambda'\phi = 0, \quad \lambda \geq 0.$   (3.21)

The constrained update combines the change in the values of the parameters $\Delta\gamma$ in (3.17) with the vector $\lambda^*$ that solves the linear complementarity problem:

$\lambda^{*\prime}\left[\frac{\partial \phi}{\partial \gamma}(\gamma_0)\,\Delta\gamma + \phi(\gamma_0)\right] = 0, \quad \frac{\partial \phi}{\partial \gamma}(\gamma_0)\,\Delta\gamma + \phi(\gamma_0) \leq 0, \quad \lambda^* \geq 0.$
35The method of nonlinear three stage least squares introduced by Jorgenson and Laffont (1974) was extended to nonlinear inequality constrained estimation by Jorgenson, Lau, and Stoker (1982), esp. pp. 196-204.
The rank condition necessary and sufficient for identifiability of the vector of unknown parameters $\gamma$ is the nonsingularity of the following matrix in the neighborhood of the true parameter vector:

$\frac{\partial f}{\partial \gamma}(\gamma)'\left[\hat{\Sigma}^{-1} \otimes Z(Z'Z)^{-1}Z'\right]\frac{\partial f}{\partial \gamma}(\gamma).$   (3.26)

The order condition (3.12) given above is necessary for the nonsingularity of this matrix.
Finally, we can consider the problem of testing equality restrictions on the vector of unknown parameters $\gamma$. For example, suppose that the maintained hypothesis is that there are $r = \frac{1}{2}J(J+3)$ elements in this vector after solving out the homogeneity, product exhaustion, and symmetry restrictions. Additional equality restrictions can be expressed in the form:

$\gamma = g(\delta),$   (3.27)

where $\delta$ is a vector of $s$ unknown parameters. We can test the hypothesis:

H: $\gamma = g(\delta)$,

against the alternative:

A: $\gamma \neq g(\delta)$.
Test statistics appropriate for this purpose have been analyzed by Gallant and
Jorgenson (1979) and Gallant and Holly (1980).36
A statistic for testing equality restrictions in the form (3.27) can be constructed
by analogy with the likelihood ratio principle. First, we can evaluate the criterion
function (3.15) at the minimizing value $\hat{\gamma}$, obtaining:

$S(\hat{\gamma}) = [u - f(\hat{\gamma})]'\left[\hat{\Sigma}^{-1} \otimes Z(Z'Z)^{-1}Z'\right][u - f(\hat{\gamma})].$

Second, we can replace the vector of unknown parameters $\gamma$ by the function $g(\delta)$ in (3.27) and minimize with respect to $\delta$, obtaining:

$S(\hat{\delta}) = \{u - f[g(\hat{\delta})]\}'\left[\hat{\Sigma}^{-1} \otimes Z(Z'Z)^{-1}Z'\right]\{u - f[g(\hat{\delta})]\};$
36A nonstatistical approach to testing the theory of production has been presented by Afriat (1972),
Diewert and Parkan (1983), Hanoch and Rothschild (1972), and Varian (1984).
$T(\hat{\gamma}, \hat{\delta}) = S(\hat{\delta}) - S(\hat{\gamma}).$   (3.28)
Gallant and Jorgenson (1979) show that this statistic is distributed asymptotically as chi-squared with $r - s$ degrees of freedom. Wherever the right-hand side variables can be treated as exogenous, this statistic reduces to the likelihood ratio statistic for nonlinear multivariate regression models proposed by Malinvaud (1980). The resulting statistic is distributed asymptotically as chi-squared.37
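Computed values of the two criterion functions then yield the test directly. A sketch under the stated asymptotics, with a function name of our own:

```python
from scipy import stats

def criterion_difference_test(S_restricted, S_unrestricted, r, s):
    """Test statistic of eq. (3.28), asymptotically chi-squared with
    r - s degrees of freedom under the null hypothesis."""
    T_stat = S_restricted - S_unrestricted
    return T_stat, stats.chi2.sf(T_stat, df=r - s)
```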
37Statistics for testing linear inequality restrictions in linear multivariate regression models have been developed by Gourieroux, Holly, and Monfort (1982); statistics for testing nonlinear inequality restrictions in nonlinear multivariate regression models are given by Gourieroux, Holly, and Monfort (1980).
4.1. Substitution

A model of this type was first implemented by Berndt and Wood (1975) for U.S. total manufacturing. This sector combines the manufacturing and petroleum refining sectors of the Berndt-Jorgenson model. Berndt and
Wood generate this model by expressing the price of aggregate input as a function
of the prices of capital, labor, energy, and materials inputs into total manufactur-
ing. They find that capital and energy inputs are complements, while all other
pairs of inputs are substitutes.
By comparison with the results of Berndt and Wood, Hudson and Jorgenson
(1978) have classified patterns of substitution and complementarity among inputs
for the four nonenergy sectors of the Berndt-Jorgenson model. For agriculture,
nonfuel mining and construction, capital and energy are complements and all
other pairs of inputs are substitutes. For manufacturing, excluding petroleum
refining, energy is complementary with capital and materials, while other pairs of
inputs are substitutes. For transportation energy is complementary with capital
and labor while other pairs of inputs are substitutes. Finally, for communications,
trade and services, energy and materials are complements and all other pairs of
inputs are substitutes.
Berndt and Wood have considered further simplification of the Berndt-
Jorgenson model of producer behavior by imposing separability restrictions on
patterns of substitution among capital, labor, energy, and materials inputs.38 This
would reduce the number of input prices at the first stage of the model through
the introduction of additional input aggregates. For this purpose additional stages
in the allocation of the value of sectoral output among inputs would be required.
Berndt and Wood consider all possible pairs of capital, labor, energy, and
materials inputs, but find that only the input aggregate consisting of capital and
energy is consistent with the empirical evidence.39
Berndt and Morrison (1979) have disaggregated the Berndt-Wood data on
labor input between blue collar and white collar labor and have studied the
substitution among the two types of labor and capital, energy, and materials
inputs for U.S. total manufacturing, using a translog price function. Anderson
(1981) has reanalyzed the Berndt-Wood data set, testing alternative specifications
of the model of substitution among inputs. Gallant (1981) has fitted an alternative
model of substitution among inputs to these data, based on the Fourier functional
form for the price function. Elbadawi, Gallant, and Souza (1983) have employed
this approach in estimating price elasticities of demand for inputs, using the
Berndt-Wood data as a basis for Monte Carlo simulations of the performance of
alternative functional forms.
Cameron and Schwartz (1979), Denny, May, and Pinto (1978), Fuss (1977a),
and McRae (1981) have constructed econometric models of substitution among
capital, labor, energy, and materials inputs based on translog functional forms for
total manufacturing in Canada. Technical change is assumed to be neutral, as in
the study of U.S. total manufacturing by Berndt and Wood (1975), but noncon-
stant returns to scale are permitted. McRae and Webster (1982) have compared
models of substitution among inputs in Canadian manufacturing, estimated from
data for different time periods.
Friede (1979) has analyzed substitution among capital, labor, energy, and
materials inputs for total manufacturing in the Federal Republic of Germany. He
assumes that technical change is neutral and utilizes a translog price function. He
has disaggregated the results to the level of fourteen industrial groups, covering
the whole of the West German economy. He has separated materials inputs into
two groups-manufacturing and transportation services as one group and other
nonenergy inputs as a second group. Ozatalay, Grubaugh, and Long (1979) have
modeled substitution among capital, labor, energy and materials inputs, on the
basis of a translog price function. They use time series data for total manufactur-
ing for the period 1963-74 in seven countries-Canada, Japan, the Netherlands,
Norway, Sweden, the U.S., and West Germany.
Longva and Olsen (1983) have analyzed substitution among capital, labor,
energy, and materials inputs for total manufacturing in Norway. They assume
that technical change is neutral and utilize a generalized Leontief price function.
They have disaggregated the results to the level of nineteen industry groups.
These groups do not include the whole of the Norwegian economy; eight
additional industries are included in a complete multi-sectoral model of
production for Norway. Dargay (1983) has constructed econometric models of
substitution among capital, labor, energy, and materials inputs based on translog
functional forms for total manufacturing in Sweden. She assumes that technical
change is neutral, but permits nonconstant returns to scale. She has disaggregated
the results to the level of twelve industry groups within Swedish manufacturing.
Although the breakdown of inputs among capital, labor, energy, and materials
has come to predominate in econometric models of production at the industry
level, Humphrey and Wolkowitz (1976) have grouped energy and materials inputs
into a single aggregate input in a study of substitution among inputs in several
U.S. manufacturing industries that utilizes translog price functions. Friedlaender
and Spady (1980) have disaggregated transportation services between trucking
and rail service and have grouped other inputs into capital, labor and materials
inputs. Their study is based on cross section data for ninety-six three-digit
industries in the United States for 1972 and employs a translog functional form
with fixed inputs.
Parks (1971) has employed a breakdown of intermediate inputs among agricul-
tural materials, imported materials and commercial services, and transportation
40The advantages and disadvantages of summarizing data from process analysis models by means of econometric models have been discussed by Maddala and Roberts (1980, 1981) and Griffin (1980, 1981c).
dummy variables. Their results differ substantially from those of Berndt and
Jorgenson and Berndt and Wood. These differences have led to an extensive
discussion among Berndt and Wood (1979, 1981), Griffin (1981a, 1981b), and
Kang and Brown (1981), attempting to reconcile the alternative approaches.
Substitution among capital, labor, and energy inputs requires a price function
that is homothetically separable in the prices of these inputs. An alternative
specification is that the price function is homothetically separable in the prices of
capital, labor, and natural resource inputs. This specification has been utilized by
Humphrey and Moroney (1975), Moroney and Toevs (1977, 1979), and Moroney
and Trapani (1981a, 1981b) in studies of substitution among these inputs for
individual manufacturing industries in the U.S. based on translog price functions.
A third alternative specification is that the price function is separable in the
prices of capital and labor inputs. Berndt and Christensen (1973b, 1974) have
used translog price functions employing this specification in studies of sub-
stitution among individual types of capital and labor inputs for U.S. total manu-
facturing. Berndt and Christensen (1973b) have divided capital input between
structures and equipment inputs and have tested the separability of the two types
of capital input from labor input. Berndt and Christensen (1974) have divided
labor input between blue collar and white collar inputs and have tested the
separability of the two types of labor input from capital input. Hamermesh and
Grant (1979) have surveyed the literature on econometric modeling of substitu-
tion among different types of labor input.
Woodland (1975) has analyzed substitution among structures, equipment and
labor inputs for Canadian manufacturing, using generalized Leontief price func-
tions. Woodland (1978) has presented an alternative approach to testing sep-
arability and has applied it in modeling substitution among two types of capital
input and two types of labor input for U.S. total manufacturing, using the
translog parametric form. Field and Berndt (1981) and Berndt and Wood (1979,
1981) have surveyed econometric models of substitution among inputs. They
focus on substitution among capital, labor, energy and materials inputs at the
level of individual industries.
In models of induced technical change the biases of technical change are endogenous and depend on relative prices. As Samuelson (1965) has pointed out, models of induced technical change require intertemporal optimization since technical change at any point of time affects future production possibilities.41
In the Jorgenson-Fraumeni model of producer behavior myopic decision rules
can be derived by treating the price of capital input as a rental price of capital
services.42 The rate of technical change at any point of time is a function of
relative prices, but does not affect future production possibilities. This greatly
simplifies the modeling of producer behavior and facilitates the implementation
of the econometric model. Given myopic decision rules for producers in each
industrial sector, all of the implications of the economic theory of production can
be described in terms of the properties of the sectoral price functions given in
Section 2.1.43
The Jorgenson-Fraumeni model of producer behavior consists of a system of
equations giving the shares of capital, labor, energy, and materials inputs in the
value of output and the rate of technical change as functions of relative prices and
time. To formulate an econometric model a stochastic component is added to
these equations. Since the rate of technical change is not directly observable, we
consider a form of the model with autocorrelated disturbances; the data are
transformed to eliminate the autocorrelation. The prices are treated as endoge-
nous variables and the unknown parameters are estimated by the method of
nonlinear three stage least squares presented in Section 3.3.
The endogenous variables in the Jorgenson-Fraumeni model include value
shares of sectoral inputs for four commodity groups and the sectoral rate of
technical change. Four equations can be estimated for each industry, correspond-
ing to three of the value shares and the rate of technical change. As unknown
parameters there are three elements of the vector $\{\alpha_p\}$, the scalar $\{\alpha_t\}$, six share
elasticities in the matrix $\{B_{pp}\}$, which is constrained to be symmetric, three biases
of technical change in the vector $\{\beta_{pt}\}$, and the scalar $\{\beta_{tt}\}$, so that there is a
total of fourteen unknown parameters for each industry. Jorgenson and Fraumeni
estimate these parameters from time series data for the period 1958-1974 for each
industry, subject to the inequality restrictions implied by monotonicity of the
sectoral input value shares.44
The estimated share elasticities with respect to price $\{B_{pp}\}$ describe the
implications of patterns of substitution for the distribution of the value of output
among capital, labor, energy, and materials inputs. Positive share elasticities
41A review of the literature on induced technical change is given by Binswanger (1978a).
42The model of capital as a factor of production was originated by Walras (1954). This model has been discussed by Diewert (1980) and by Jorgenson (1973a, 1980).
43Myopic decision rules are derived by Jorgenson (1973b).
44 Data on energy and materials are based on annual interindustry transactions tables for the United
States compiled by Jack Faucett Associates (1977). Data on labor and capital are based on estimates
by Fraumeni and Jorgenson (1980).
imply that the corresponding value shares increase with an increase in price;
negative share elasticities imply that the value shares decrease with price; zero
share elasticities correspond to value shares that are independent of price. The
concavity constraints on the sectoral price functions contribute substantially to
the precision of the estimates, but require that the share of each input be
nonincreasing in the price of the input itself.
The empirical findings on patterns of substitution reveal some striking similari-
ties among industries.45 The elasticities of the shares of capital with respect to the
price of labor are nonnegative for thirty-three of the thirty-five industries, so that
the shares of capital are nondecreasing in the price of labor for these thirty-three
sectors. Similarly, elasticities of the share of capital with respect to the price of
energy are nonnegative for thirty-four industries and elasticities with respect to
the price of materials are nonnegative for all thirty-five industries. The share
elasticities of labor with respect to the prices of energy and materials are
nonnegative for nineteen and for all thirty-five industries, respectively. Finally,
the share elasticities of energy with respect to the price of materials are nonnega-
tive for thirty of the thirty-five industries.
We continue the interpretation of the empirical results with estimated biases of
technical change with respect to price $\{\beta_{pt}\}$. These parameters can be interpreted
as changes in the sectoral value shares (2.14) with respect to time, holding prices
constant. This component of change in the value shares can be attributed to
changes in technology rather than to substitution among inputs. For example, if
the bias of technical change with respect to the price of capital input is positive,
we say that technical change is capital-using; if the bias is negative, we say that
technical change is capital-saving.
Considering the rate of technical change (2.14), the biases of technical change
$\{\beta_{pt}\}$ can be interpreted in an alternative and equivalent way. These parameters
are changes in the negative of the rate of technical change with respect to changes
in prices. As substitution among inputs takes place in response to price changes,
the rate of technical change is altered. For example, if the bias of technical change
with respect to capital input is positive, an increase in the price of capital input
decreases the rate of technical change; if the bias is negative, an increase in the
price of capital input increases the rate of technical change.
A classification of industries by patterns of the biases of technical change is
given in Table 1. The pattern that occurs with greatest frequency is capital-using,
labor-using, energy-using, and materials-saving technical change. This pattern
occurs for nineteen of the thirty-five industries for which biases are fitted.
Technical change is capital-using for twenty-five of the thirty-five industries,
labor-using for thirty-one industries, energy-using for twenty-nine industries, and
materials-using for only two industries.
45Parameter estimates are given by Jorgenson and Fraumeni (1983), pp. 255-264.
Table 1
Classification of industries by biases of technical change
Increases in energy prices since 1973 have had the effect of reducing sectoral rates of technical
change, slowing the aggregate rate of technical change, and diminishing the rate
of growth for the U.S. economy as a whole.46
While the empirical results suggest a considerable degree of similarity across
industries, it is necessary to emphasize that the Jorgenson-Fraumeni model of
producer behavior requires important simplifying assumptions. First, conditions
for producer equilibrium under perfect competition are employed for all in-
dustries. Second, constant returns to scale at the industry level are assumed.
Finally, a description of technology that leads to myopic decision rules is
employed. These assumptions must be justified primarily by their usefulness in
implementing production models that are uniform for all thirty-five industrial
sectors of the U.S. economy.
Binswanger (1974a, 1974b, 1978c) has analyzed substitution and technical
change for U.S. agriculture, using cross sections of data for individual states for
1949, 1954, 1959, and 1964. Binswanger was the first to estimate biases of
technical change based on the translog price function. He permits technology to
differ among time periods and among groups of states within the United States.
He divides capital inputs between land and machinery and divides intermediate
inputs between fertilizer and other purchased inputs. He considers substitution
among these four inputs and labor input.
Binswanger employs time series data on U.S. agriculture as a whole for the
period 1912-1964 to estimate biases of technical change on an annual basis.
Brown and Christensen (1981) have analyzed time series data on U.S. agriculture
for the period 1947-1974. They divide labor services between hired labor and
self-employed labor and capital input between land and all other assets (machinery, structures, and inventories). Other purchased inputs are treated as a single
aggregate. They model substitution and technical change with fixed inputs, using
a translog functional form.
Berndt and Khaled (1979) have augmented the Berndt-Wood data set for U.S.
manufacturing to include data on output. They estimate biases of technical
change and permit nonconstant returns to scale. They employ a Box-Cox
transformation of data on input prices, generating a functional form that includes
the translog, generalized Leontief, and quadratic as special cases. The Box-Cox
transformation is also employed by Appelbaum (1979a) and by Caves,
Christensen, and Tretheway (1980). Denny (1974) has proposed a closely related
approach to parametrization based on mean value functions.
Kopp and Diewert (1982) have employed a translog parametric form to study
technical and allocative efficiency. For this purpose they have analyzed data on
U.S. total manufacturing for the period 1947-71 compiled by Berndt and Wood
46The implications of patterns of biases of technical change are discussed in more detail by
Jorgenson (1981).
(1975) and augmented by Berndt and Khaled (1979). Technical change is not
required to be neutral and nonconstant returns to scale are permitted. They have
interpreted the resulting model of producer behavior as a representation of
average practice. They have then re-scaled the parameters to obtain a “frontier”
representing best practice and have employed the results to obtain measures of
technical and allocative efficiency for each year in the sample.47
Wills (1979) has modeled substitution and technical change for the U.S. steel
industry, using a translog price function. Norsworthy and Harper (1981) have
extended and augmented the Berndt-Wood data set for total manufacturing and
have modeled substitution and technical change, using a translog price function.
Woodward (1983) has reanalyzed these data and has derived estimates of rates of
factor augmentation for capital, labor, energy, and materials inputs, using a
translog price function.
Jorgenson (1984b) has modeled substitution and technical change for thirty-five
industries of the United States for the period 1958-1979, dividing energy inputs
between electricity and nonelectrical energy inputs. He employs translog price
functions with capital, labor, two kinds of energy, and materials inputs and finds
that technical change is electricity-using and nonelectrical energy-using for most
U.S. industries. Nakamura (1984) has developed a similar model for twelve
sectors covering the whole of the economy for the Federal Republic of Germany
for the period 1960-1974. He has disaggregated intermediate inputs among
energy, materials, and services.
We have already discussed the work of Kopp and Smith on substitution among
inputs, based on data generated by process models of the U.S. steel industry.
Kopp and Smith (1981c, 1982) have also analyzed the performance of different
measures of technical change, also using data generated by these models. They
show that measures of biased technical change based on the methodology
developed by Binswanger can be explained by the proportion of investment in
specific technologies.
Econometric models of substitution among inputs at the level of individual
industries have incorporated intermediate inputs-broken down between energy
and materials inputs-along with capital and labor inputs. However, models of
substitution and technical change have also been constructed at the level of the
economy as a whole. Output can be divided between consumption and investment
goods, as in the original study of the translog price function by Christensen,
Jorgenson, and Lau (1971, 1973) and input can be divided between capital and
labor services.
Hall (1973) has considered nonjointness of production of investment and
consumption goods outputs for the United States. Kohli (1981, 1983) has also
47A survey of the literature on frontier representations of technology is given by Forsund, Lovell,
and Schmidt (1980).
studied nonjointness in production for the United States. Burgess (1974) has
added imports as an input to inputs of capital and labor services. Denny and
Pinto (1978) developed a model with this same breakdown of inputs for Canada.
Conrad and Jorgenson (1977, 1978) have considered nonjointness of production
and alternative models of technical change for the Federal Republic of Germany.
Aggregation over inputs has proved to be a very important means for simplifying
the description of technology in modeling producer behavior. The price of output
can be represented as a function of a smaller number of input prices by
introducing price indexes for input aggregates. These price indexes can be used to
generate a second stage of the model by treating the price of each aggregate as a
function of the prices of the inputs making up the aggregate. We can parametrize
each stage of the model separately.
The Berndt-Jorgenson (1973) model of producer behavior is based on two
stage allocation of the value of output of each sector. In the first stage the value of
sectoral output is allocated among capital, labor, energy, and materials inputs,
where materials include inputs of nonenergy commodities and competitive im-
ports. In the second stage the value of energy expenditure is allocated among
expenditures on individual types of energy and the value of materials expenditure
is allocated among expenditures on competitive imports and nonenergy commod-
ities.
The first stage of the econometric model is generated from a price function for
each sector. The price of sectoral output is a function of the prices of capital and
labor inputs and the prices of inputs of energy and materials. The second stage of
the model is generated from price indexes for energy and materials inputs. The
price of energy is a function of the prices of five types of energy inputs, while the
price of materials is a function of the prices of four types of nonenergy inputs and
the price of competitive imports.
The Berndt-Jorgenson model of producer behavior consists of three systems of
equations. The first system gives the shares of capital, labor, energy and materials
inputs in the value of output, the second system gives the shares of energy inputs
in the value of energy input, and the third system gives the shares of nonenergy
inputs and competitive imports in the value of materials inputs. To formulate an
econometric model stochastic components are added to these systems of equa-
tions. The rate of technical change is taken to be exogenous; all prices-including
the prices of energy and materials inputs for each sector-are treated as endoge-
nous variables. Estimates of the unknown parameters of all three systems of
equations are based on the nonlinear three stage least squares estimator.
The Berndt-Jorgenson model illustrates the use of two stage allocation to
simplify the description of producer behavior. By imposing the assumption that
Ch. 31: Econometric Methods for Modeling Producer Behavior 1883
the price of aggregate input is separable in the prices of individual energy and
materials inputs, the price function that generates the first stage of the model can
be expressed in terms of four input prices rather than twelve. However, simplification of the first stage of the model requires the introduction of a second stage,
consisting of price functions for energy and materials inputs. Each of these price
functions can be expressed in terms of five prices of individual inputs.
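A second-stage price index of this kind is itself a translog function of the underlying prices. A minimal sketch follows; the function name and the numbers are illustrative assumptions of ours, and the homogeneity conditions on a and B are those given in Section 2.

```python
import numpy as np

def translog_price_index(ln_p, a, B):
    """Second-stage translog price index for an input aggregate:
    ln P = a' ln p + 0.5 ln p' B ln p, with a summing to one and
    B i = 0 so the index is homogeneous of degree one in prices."""
    return a @ ln_p + 0.5 * ln_p @ B @ ln_p

# e.g. an energy aggregate over five fuel prices; ln P_E then enters
# the first stage in place of the five individual energy prices
ln_PE = translog_price_index(np.log([1.2, 0.9, 1.0, 1.1, 1.05]),
                             np.full(5, 0.2), np.zeros((5, 5)))
```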
Fuss (1977a) has constructed a two stage model of Canadian total manufactur-
ing using translog functional forms. He treats substitution among coal, liquid
petroleum gas, fuel oil, natural gas, electricity, and gasoline as a second stage of
the model. Friede (1979) has developed two stage models based on translog price
functions for fourteen industries of the Federal Republic of Germany. In these
models the second stage consists of three separate models-one for substitution
among individual types of energy and two for substitution among individual
types of nonenergy inputs. Dargay (1983) has constructed a two stage model of
twelve Swedish manufacturing industries utilizing a translog functional form. She
has analyzed substitution among electricity, oil, and solid fuels inputs at the
second stage of the model.
Nakamura (1984) has constructed three stage models for twelve industries of
the Federal Republic of Germany, using translog price functions. The first stage
encompasses substitution and technical change among capital, labor, energy,
materials, and services inputs. The second stage consists of three models - a model
for substitution among individual types of energy, a model for substitution among
individual types of materials, and a model for substitution among individual
types of services. The third stage consists of models for substitution between
domestically produced input and the corresponding imported input of each type.
Pindyck (1979a, 1979b) has constructed a two stage model of total manufactur-
ing for ten industrialized countries- Canada, France, Italy, Japan, the Nether-
lands, Norway, Sweden, the U.K., the U.S., and West Germany-using a translog
price function. He employs annual data for the period 1959-1973 in estimating a
model for substitution among four energy inputs -coal, oil, natural gas, and
electricity. He uses annual data for the period 1963-73 in estimating a model for
substitution among capital, labor, and energy inputs. Magnus (1979) and Magnus
and Woodland (1984) have constructed a two stage model for total manufacturing
in the Netherlands along the same lines. Similarly, Ehud and Melnik (1981) have
developed a two stage model for the Israeli economy.
Halvorsen (1977) and Halvorsen and Ford (1979) have constructed a two stage
model for substitution among capital, labor, and energy inputs for nineteen
two-digit U.S. manufacturing industries on the basis of translog price functions.
For this purpose they employ cross section data for individual states in 1971. The
second stage of the model provides a disaggregation of energy input among inputs
of coal, oil, natural gas, and electricity. Halvorsen (1978) has analyzed substitu-
tion among different types of energy on the basis of cross section data for 1958,
1962, and 1971.
5. Cost functions
5.1. Duality
Utilizing the notation of Section 2, we can define total cost, say $c$, as the sum of expenditures on all inputs:

$c = \sum_{j=1}^{J} p_j x_j.$

The cost shares of all inputs are defined by:

$v_j = \frac{p_j x_j}{c}, \quad (j = 1, 2, \ldots, J).$
With output fixed from the point of view of the producing unit and competitive markets for all inputs, the necessary conditions for producer equilibrium are given by equalities between the shares of each input in total cost and the ratio of the elasticity of output with respect to that input and the sum of all such elasticities:

$v = \frac{\partial \ln y / \partial \ln x}{i'\,\partial \ln y / \partial \ln x}.$   (5.1)
Given the definition of total cost and the necessary conditions for producer
equilibrium, we can express total cost, say c, as a function of the prices of all
inputs and the level of output:
$c = C(p, y).$   (5.2)
We refer to this as the cost function. The cost function C is dual to the production
function F and provides an alternative and equivalent description of the technol-
ogy of the producing unit.48
We can formalize the theory of production in terms of the following properties
of the cost function:
1. Positivity. The cost function is positive for positive input prices and a positive level of output.
2. Homogeneity. The cost function is homogeneous of degree one in the input prices.
3. Monotonicity. The cost function is increasing in the input prices and in the level of output.
4. Concavity. The cost function is concave in the input prices.
Given differentiability of the cost function, we can express the cost shares of all
inputs as elasticities of the cost function with respect to the input prices:
$v = \frac{\partial \ln c}{\partial \ln p}(p, y).$   (5.3)
Further, we can define an index of returns to scale as the elasticity of the cost function with respect to the level of output:

$v_y = \frac{\partial \ln c}{\partial \ln y}(p, y).$   (5.4)

48Duality between cost and production functions is due to Shephard (1953, 1970).
Following Frisch (1965), we can refer to this elasticity as the cost flexibility.
The cost flexibility $v_y$ is the reciprocal of the degree of returns to scale, defined as the elasticity of output with respect to a proportional increase in all inputs:

$v_y = \frac{1}{i'\,\partial \ln y / \partial \ln x}.$   (5.5)

Since the cost function is homogeneous of degree one in the input prices, the cost shares sum to unity:

$i'v = i'\frac{\partial \ln c}{\partial \ln p} = 1.$
Since the cost function is increasing in the input prices, the cost shares must be nonnegative and not all zero:

$v \geq 0.$

The cost function is also increasing in the level of output, so that the cost flexibility is positive:

$v_y > 0.$
We have represented the cost shares of all inputs and the cost flexibility as
functions of the input prices and the level of output. We can characterize these
functions in terms of measures of substitution and economies of scale. We obtain
share elasticities by differentiating the logarithm of the cost function twice with
respect to the logarithms of input prices:
$v_{pp}(p, y) = \frac{\partial^2 \ln c}{\partial \ln p\,\partial \ln p'}(p, y).$   (5.6)
These measures of substitution give the response of the cost shares of all inputs to
proportional changes in the input prices.
Second, we can differentiate the logarithm of the cost function twice with respect to the logarithms of the input prices and the level of output to obtain:

$v_{py}(p, y) = \frac{\partial^2 \ln c}{\partial \ln p\,\partial \ln y}(p, y).$   (5.7)
We refer to these measures as biases of scale. The vector of biases of scale $v_{py}$ can
be employed to derive the implications of economies of scale for the relative
distribution of total cost among inputs. If a scale bias is positive, the cost share of
the corresponding input increases with a change in the level of output. If a scale
bias is negative, the cost share decreases with a change in output. Finally, if a
scale bias is zero, the cost share is independent of output.
Alternatively, the vector of biases of scale $v_{py}$ can be employed to derive the
implications of changes in input prices for the cost flexibility. If the scale bias is
positive, the cost flexibility increases with the input price. If the scale bias
is negative, the cost flexibility decreases with the input price. Finally, if the bias is
zero, the cost flexibility is independent of the input price.
To complete the description of economies of scale we can differentiate the logarithm of the cost function twice with respect to the level of output:

$v_{yy}(p, y) = \frac{\partial^2 \ln c}{(\partial \ln y)^2}(p, y).$   (5.8)
Total cost $c$ is positive and the diagonal matrices $N$ and $V$ are defined in terms of the input prices $p$ and the cost shares $v$, as in Section 2. Two inputs are substitutes if the corresponding element of the matrix $v_{pp} + vv' - V$ is positive, complements if the element is negative, and independent if the element is zero.
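This sign test is straightforward to apply to fitted values; a sketch with names of our own:

```python
import numpy as np

def substitution_signs(v_pp, v):
    """Sign of each element of v_pp + v v' - V: positive off-diagonal
    elements indicate substitutes, negative ones complements."""
    return np.sign(v_pp + np.outer(v, v) - np.diag(v))
```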
In Section 2.2 above we have introduced price and quantity indexes of
aggregate input implied by homothetic separability of the price function. We can
analyze the implications of homothetic separability of the cost function by
introducing price and quantity indexes of aggregate input and defining the cost
share of aggregate input in terms of these indexes. An aggregate input can be
treated in precisely the same way as any other input, so that price and quantity
indexes can be used to reduce the dimensionality of the space of input prices and
quantities.
We say that the cost function $C$ is homothetic if and only if the cost function is separable in the prices of all $J$ inputs $\{p_1, p_2, \ldots, p_J\}$, so that:

$c = C[P(p), y],$   (5.9)

where the function $P$ is homogeneous of degree one and independent of the level of output $y$. The cost function is homothetic if and only if the production function is homothetic, where:

$y = F[G(x)],$   (5.10)

and the function $G$ is homogeneous of degree one.
Under homotheticity the cost flexibility depends on the level of output alone:

$v_y = v_y(y).$   (5.12)
If the cost flexibility is also independent of the level of output, the cost function is
homogeneous in the level of output and the production function is homogeneous
in the quantity index of aggregate input G. The degree of homogeneity of the
production function is the degree of returns to scale and is equal to the reciprocal
of the cost flexibility. Under constant returns to scale the degree of returns to
scale and the cost flexibility are equal to unity.
49The concept of homotheticity was introduced by Shephard (1953). Shephard shows that ho-
motheticity of the cost function is equivalent to homotheticity of the production function.
The cost shares and the cost flexibility take the form:

$v = \alpha_p + B_{pp}\ln p + \beta_{py}\ln y,$
$v_y = \alpha_y + \beta_{py}'\ln p + \beta_{yy}\ln y.$   (5.14)

These equations can be generated from the translog total cost function:

$\ln c = \alpha_0 + \ln p'\alpha_p + \alpha_y \ln y + \tfrac{1}{2}\ln p' B_{pp}\ln p + \ln p'\beta_{py}\ln y + \tfrac{1}{2}\beta_{yy}(\ln y)^2.$   (5.15)
5.3.1. Homogeneity

The cost shares and the cost flexibility are homogeneous of degree zero in the input prices. Homogeneity of degree zero of the cost shares and the cost flexibility implies that the parameters $B_{pp}$, $\beta_{py}$ must satisfy the restrictions:

$B_{pp} i = 0,$
$\beta_{py}' i = 0.$   (5.16)

5.3.2. Product exhaustion

The cost shares sum to unity. Product exhaustion implies that the parameters must satisfy the restrictions:

$\alpha_p' i = 1,$
$B_{pp}' i = 0,$
$\beta_{py}' i = 0.$   (5.17)
5.3.3. Symmetry
The matrix of share elasticities, biases of scale, and the derivative of the cost flexibility with respect to the logarithm of output must be symmetric. A necessary and sufficient condition for symmetry is that the matrix of parameters must satisfy the restrictions:

$\begin{pmatrix} B_{pp} & \beta_{py} \\ \beta_{py}' & \beta_{yy} \end{pmatrix} = \begin{pmatrix} B_{pp} & \beta_{py} \\ \beta_{py}' & \beta_{yy} \end{pmatrix}'.$   (5.18)
5.3.4. Nonnegativity

5.3.5. Monotonicity

To formulate an econometric model we add stochastic components to the equations for the cost shares and the cost function:

$v^k = \alpha_p + B_{pp}\ln p^k + \beta_{py}\ln y^k + \varepsilon^k, \quad (k = 1, 2, \ldots, K),$   (5.19)

$\ln c^k = \alpha_0 + \ln p^{k\prime}\alpha_p + \alpha_y \ln y^k + \tfrac{1}{2}\ln p^{k\prime} B_{pp}\ln p^k + \ln p^{k\prime}\beta_{py}\ln y^k + \tfrac{1}{2}\beta_{yy}(\ln y^k)^2 + \varepsilon_c^k, \quad (k = 1, 2, \ldots, K),$   (5.20)
where $\varepsilon^k$ is the vector of unobservable random disturbances for the cost shares of the $k$th producing unit and $\varepsilon_c^k$ is the corresponding disturbance for the cost function $(k = 1, 2, \ldots, K)$. Since the cost shares for all inputs sum to unity for each
producing unit, the random disturbances corresponding to the J cost shares sum
to zero for each unit:
$i'\varepsilon^k = 0, \quad (k = 1, 2, \ldots, K).$   (5.21)
We also assume that the disturbances have a covariance matrix that is the same for all producing units and has rank $J$, where:

$V\begin{pmatrix} \varepsilon^k \\ \varepsilon_c^k \end{pmatrix} = \Sigma, \quad (k = 1, 2, \ldots, K).$

The covariance matrix of the disturbances for all producing units then takes the form:

$V\begin{pmatrix} \varepsilon^1 \\ \varepsilon_c^1 \\ \vdots \\ \varepsilon^K \\ \varepsilon_c^K \end{pmatrix} = \Sigma \otimes I.$   (5.22)

Under homotheticity the parameters must satisfy the restriction:

$\beta_{py} = 0,$   (5.23)

so that
the vector of biases of scale is equal to zero. Under homogeneity the cost
flexibility is independent of output, so that:
$\beta_{yy} = 0;$
the derivative of the flexibility with respect to the logarithm of output is zero.
Finally, under constant returns to scale, the cost flexibility is equal to unity; given
the restrictions implied by homogeneity, constant returns requires:
$\alpha_y = 1.$   (5.24)
5oA model of a regulated firm based on cost minimization was introduced by Nerlove (1963).
Surveys of the literature on the Averch-Johnson model have been given by Bailey (1973) and Baumol
and Klevorick (1970).
51Christensen and Greene have assembled data on cross sections of individual firms for 1955 and 1970. The quantity of output is measured in billions of kilowatt hours (kwh). The quantity of fuel input is measured in British thermal units (Btu). Fuel prices per million Btu are averaged by weighting the price of each fuel by the corresponding share in total consumption. The price of labor input is measured as the ratio of total salaries and wages and employee pensions and benefits to the number of full-time employees plus half the number of part-time employees. The price of capital input is estimated as the sum of interest and depreciation.
Table 2
Cost function for U.S. electric power industry (parameter estimates, 1955 and 1970; t-ratios in parentheses).a

        1955                  1970
   8.412  (31.52)        7.14   (32.45)
   0.386  (6.22)         0.587  (20.87)
   0.094  (0.94)         0.208  (2.95)
   0.348  (4.21)        -0.151  (-1.85)
   0.558  (8.57)         0.943  (14.64)
   0.059  (5.76)         0.049  (12.94)
  -0.008  (-1.79)       -0.003  (-1.23)
  -0.016  (-10.10)      -0.018  (-8.25)
   0.024  (5.14)         0.021  (6.64)
   0.175  (5.51)         0.118  (6.17)
   0.038  (2.03)         0.081  (5.00)
   0.176  (6.83)         0.178  (10.79)
  -0.018  (-1.01)       -0.011  (-0.749)
  -0.159  (-6.05)       -0.107  (-7.48)
  -0.020  (-2.08)       -0.070  (-6.30)
The Christensen-Greene data base has been extended by Greene (1983) to incorporate cross sections of
individual electric utilities for 1955, 1960, 1965, 1970, and 1975. By including
both the logarithm of output and time as an index of technology in the translog
total cost function (5.15), Greene is able to characterize economies of scale and
technical change simultaneously.
Stevenson (1980) has employed a translog total cost function incorporating
output and time to analyze cross sections of electric utilities for 1964 and 1972.
Gollop and Roberts (1981) have used a similar approach to study annual data on
eleven electric utilities in the United States for the period 1958-1975. They use
the results to decompose the growth of total cost among economies of scale,
technical change, and growth in input prices. Griffin (1977b) has modeled
substitution among different types of fuel in steam electricity generation using
four cross sections of twenty OECD countries. Halvorsen (1978) has analyzed
substitution among different fuel types, using cross section data for the United
States in 1972.
Cowing, Reifschneider, and Stevenson (1983) have employed a translog total
cost function similar to that of Christensen and Greene to analyze data for
eighty-one electric utilities for the period 1964-1975. For this purpose they have
grouped the data into four cross sections, each consisting of three-year totals for
all firms. If disturbances in the equations for the cost shares (5.19) are associated
with errors in optimization, costs must increase relative to the minimum level
given by the cost function (5.15). Accordingly, Cowing, Reifschneider and Steven-
son employ a disturbance for the cost function that is constrained to be positive.52
An alternative to the Christensen-Greene model for electric utilities has been
developed by Fuss (1977b, 1978). In Fuss’s model the cost function is permitted
to differ ex ante, before a plant is constructed, and ex post, after the plant is in
place.53 Fuss employs a generalized Leontief cost function with four input
prices- structures, equipment, fuel, and labor. He models substitution among
inputs and economies of scale for seventy-nine steam generation plants for the
period 1948-61.
We have observed that a model of the behavior of a regulated firm based on
cost minimization must be carefully distinguished from the model originated by
Averch and Johnson (1962). In addition to allowing a given rate of return,
regulatory authorities may permit electric utilities to adjust the regulated price of
output for changes in the cost of specific inputs. In the electric power industry a
Brown, Caves, and Christensen (1979) have introduced a model for joint produc-
tion of freight and transportation services in the railroad industry based on the
translog cost function (5.15).54 A cost flexibility (5.4) can be defined for each
output. Scale biases and derivatives of the cost flexibilities with respect to each
54A review of the literature on regulation with joint production is given by Bailey and Friedlaender
(1982).
7. Conclusion
The purpose of this concluding section is to suggest possible directions for future
research on econometric modeling of producer behavior. We first discuss the
application of econometric models of production in general equilibrium analysis.
The primary focus of empirical research has been on the characterization of
technology for individual producing units. Application of the results typically
involves models for both demand and supply for each commodity. The ultimate
objective of econometric modeling of production is to construct general equi-
librium models encompassing demand and supplies for a wide range of products
and factors of production.
A second direction for future research on producer behavior is to exploit
statistical techniques appropriate for panel data. Panel data sets consist of
observations on several producing units at many points of time. Empirical
research on patterns of substitution and technical change has been based on time
series observations on a single producing unit or on cross section observations on
different units at a given point of time. Research on economies of scale has been
based primarily on cross section observations.
Our exposition of econometric methods has emphasized areas of research where
the methodology has crystallized. An important area for future research is the
implementation of dynamic models of technology. These models are based on
substitution possibilities among outputs and inputs at different points of time. A
number of promising avenues for investigation have been suggested in the
literature on the theory of production. We conclude the paper with a brief review
of possible approaches to the dynamic modeling of producer behavior.
Characteristics specific to individual producing units and time periods are incorporated through one-zero dummy variables that enter the cost function.
One set of dummy variables corresponds to the individual producing units. A
second set of dummy variables corresponds to the time periods.
Although airlines provide both freight and passenger service, the revenues for
passenger service greatly predominate in the total, so that output is defined as an
aggregate of five categories of transportation services. Inputs are broken down
into three categories-labor, fuel, and capital and materials. The number of points
served by an airline is included in the cost functions as a measure of the size of
the network. Average stage length and average load factor are included as
additional characteristics of output specific to the airline.
Caves, Christensen, and Tretheway introduce a distinction between economies
of scale and economies of density. Economies of scale are defined in terms of the
sum of the elasticities of total cost with respect to output and points served,
holding input prices and other characteristics of output constant. Economies of
density are defined in terms of the elasticity of total cost with respect to output,
holding points served, input prices, and other characteristics of output constant.
Caves, Christensen, and Tretheway find constant returns to scale and increasing
returns to density in airline service.
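These definitions reduce to simple functions of the estimated cost elasticities. A sketch, with a function name of our own, where e_output and e_points are the elasticities of total cost with respect to output and points served:

```python
def returns_to_density_and_scale(e_output, e_points):
    """Returns to density: 1 / e_output; returns to scale:
    1 / (e_output + e_points), holding the remaining arguments of
    the cost function fixed as in the text."""
    return 1.0 / e_output, 1.0 / (e_output + e_points)
```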
The model of panel data employed by Caves, Christensen, and Tretheway in analyzing air transportation service is based on "fixed effects". The characteristics
of output specific to a producing unit can be estimated by employing one-zero
dummy variables for each producing unit. An alternative approach based on
"random effects" of output characteristics is utilized by Caves, Christensen, Tretheway, and Windle (1984) in modeling rail transportation service. They
consider a panel data set for forty-three Class I railroads in the United States for
the period 1951-1975.
Caves, Christensen, Tretheway, and Windle employ a generalized translog cost
function in modeling the joint production of freight and passenger transportation
services by rail. They treat the effects of characteristics of output specific to each
railroad as a random variable. They estimate the resulting model by panel data
techniques originated by Mundlak (1963,1978). The number of route miles served
by a railroad is included in the cost function as a measure of the size of the
network. Length of haul for freight and length of trip for passengers are included
as additional characteristics of output.
Economies of density in the production of rail transportation services are
defined in terms of the elasticity of total cost with respect to output, holding route
miles, input prices, firm-specific effects, and other characteristics of output fixed.
Economies of scale are defined holding only input prices and other characteristics
of output fixed. The impact of changes in outputs, route miles, and firm specific
effects can be estimated by panel data techniques. Economies of density and scale
can be estimated from a single cross section by omitting firm-specific dummy
variables.
References
Arrow, K. J., H. B. Chenery, B. S. Minhas and R. M. Solow (1961) "Capital-Labor Substitution and Economic Efficiency", Review of Economics and Statistics, August, 43(3), 225-250.
Atkinson, S. E. and R. Halvorsen (1980) “A Test of Relative and Absolute Price Efficiency in
Regulated Utilities", Review of Economics and Statistics, February, 62(1), 81-88.
Averch, H. and L. L. Johnson (1962) "Behavior of the Firm Under Regulatory Constraint", American Economic Review, December, 52(5), 1052-1069.
Bailey, E. E. (1973) Economic Theory of Regulatory Constraint. Lexington: Lexington Books.
Bailey, E. E. and A. F. Friedlaender (1982) "Market Structure and Multiproduct Industries", Journal of Economic Literature, September, 20(3), 1024-1048.
Baumol, W. J. and A. K. Klevorick (1970) "Input Choices and Rate-of-Return Regulation: An Overview of the Discussion", Bell Journal of Economics and Management Science, Autumn, 1(2), 162-190.
Belsley, D. A. (1974) "Estimation of Systems of Simultaneous Equations and Computational Applications of GREMLIN", Annals of Social and Economic Measurement, October, 3(4), 551-614.
Belsley, D. A. (1979) "On the Computational Competitiveness of Full-Information Maximum-Likelihood and Three-Stage Least-Squares in the Estimation of Nonlinear, Simultaneous-Equations Models", Journal of Econometrics, February, 9(3), 315-342.
Berndt, E. R. and L. R. Christensen (1973a) "The Internal Structure of Functional Relationships: Separability, Substitution, and Aggregation", Review of Economic Studies, July, 40(3), 403-410.
Berndt, E. R. and L. R. Christensen (1973b) "The Translog Function and the Substitution of Equipment, Structures, and Labor in U.S. Manufacturing, 1929-1968", Journal of Econometrics, March, 1(1), 81-114.
Berndt, E. R. and L. R. Christensen (1974) "Testing for the Existence of a Consistent Aggregate Index of Labor Inputs", American Economic Review, June, 64(3), 391-404.
Berndt, E. R. and B. C. Field, eds. (1981) Modeling and Measuring Natural Resource Substitution. Cambridge: M.I.T. Press.
Berndt, E. R., B. H. Hall, R. E. Hall and J. A. Hausman (1974) “Estimation and Inference in
Nonlinear Structural Models”, Annals of Social and Economic Measurement, October, 3(4), 653-665.
Berndt, E. R. and D. W. Jorgenson (1973) "Production Structure", in: D. W. Jorgenson and H. S. Houthakker, eds., U.S. Energy Resources and Economic Growth. Washington: Energy Policy Project.
Berndt, E. R. and M. Khaled (1979) "Parametric Productivity Measurement and Choice Among Flexible Functional Forms", Journal of Political Economy, December, 87(6), 1220-1245.
Berndt, E. R. and C. J. Morrison (1979) “Income Redistribution and Employment Effects of Rising
Energy Prices”, Resources and Energy, October, 2(2), 131-150.
Berndt, E. R., C. J. Morrison and G. C. Watkins (1981) "Dynamic Models of Energy Demand: An Assessment and Comparison", in: E. R. Berndt and B. C. Field, eds., 259-289.
Berndt, E. R. and D. O. Wood (1975) "Technology, Prices, and the Derived Demand for Energy", Review of Economics and Statistics, August, 57(3), 376-384.
Berndt, E. R. and D. O. Wood (1979) "Engineering and Econometric Interpretations of Energy-Capital Complementarity", American Economic Review, June, 69(3), 342-354.
Berndt, E. R. and D. O. Wood (1981) "Engineering and Econometric Interpretations of Energy-Capital Complementarity: Reply and Further Results", American Economic Review, December, 71(5), 1105-1110.
Binswanger. H. P. (1974a) “A Cost-Function Approach to the Measurement of Elasticities of Factor
Demand and Elasticities of Substitution”, American Journnl of Agricultural Economics, May, 56(2),
377-386.
Binswangcr, H. P. (1974b) “The Mcasurcmcnt of Technical Change Biases with Many Factors of
Production,” Americun Economic Review, December, 64(5), 964-976.
Binswangcr, H. P. (1978a) “Induced Technical Change: Evolution of Thought”, in: H. P. Binswangcr
and V. W. Ruttan, eds., 13-43.
Binswanger, H. P. (1978b) “Issues in Modeling Induced Technical Change”, in: H. P. Binswangcr and
V. W. Ruttan, cds., 128-163.
Binswanger, H. P. (1978~) “Measured Biases of Technical Change: The United States”, in: H. P.
Binswangcr and V. W. Ruttan, cds., 215-242.
Binswangcr, H. P. and V. W. Ruttan, eds. (1978) Induced Innovcrtion. Baltimore: Johns Hopkins
University Press.
C‘h. .?I: Econometric Method.7 for Modeling Producer Rehuvior 1907
Blackorby, C., D. Primont and R. R. Russell (1977) “On Testing Separability Restrictions with
Flexible Functional Forms”, Journal of Econometrics, March, 5(2), 195-209.
Blackorby, C., D. Primont and R. R. Russell (1978) Duality, Separability, and Functionul Structure.
Amsterdam: North-Holland.
Blackorby, C. and R. R. Russell (1976) “Functional Structure and the Allen Partial Elasticities of
Substitution: An Application of Duality Theory”, Reutew of Economic Studies, 43(2), 134, 2855292.
Braeutigan, R. R., A. F. Daughety and M. A. Turnquist (1982) “The Estimation of a Hybrid Cost
Function for a Railroad Firm”, Review of Economics and Statistics, August. 64(3), 394-404.
Brown, M., ed. (1967) The Theory and Empiricttl Analysis of Production. New York: Columbia
University Press.
Brown, R. S.. D. W. Caves and L. R. Christensen (1979) “Modeling the Structure of Cost and
Production for Multiproduct Firms”, Southern Economic Journal, July, 46(3), 256273.
Brown, R. S. and L. R. Christensen (1981) “Estimating Elasticities of Substitution in a Model of
Partial Static Equilibrium: An Application to U.S. Agriculture, 1947 to 1974”, in: E. R. Berndt and
B. C. Field, eds., 209-229.
Burgess, D. F. (1974) “A Cost Minimization Approach to Import Demand Equations”, Review of
Economics und Stcrtistits, May, 56(2), 224-234.
Burgess, D. F. (1975) “Duality Theory and Pitfalls in the Specification of Technology”, Journnl of
Econometrics, May, 3(2), 105-121.
Cameron, T. A. and S. L. Schwartz (1979) “Sectoral Energy Demand in Canadian Manufacturing
Industries”. Energy Economics, April, l(2), 112-118.
C’arlson, S. (1939) A Study on the Pure Theory of Production. London: King.
Caves, D. W. and L. R. Christensen (1980) “Global Properties of Flexible Functional Forms”,
Americctn Economic Review, June, 70(3), 422-432.
Caves, D. W., L. R. Christensen and J. A. Swanson (1980) “Productivity in U.S. Railroads,
1951-1974”. Bell Journal of Economics, Spring 1980, 11(l), 166-181.
Caves, D. W., L. R. Christensen and J. A. Swanson (1981) “Productivity Growth, Scale Economies
and Capacity Utilization in U.S. Railroads, 1955-1974”. Amertcan Economic Review, December,
71(5), 994-1002.
Caves. D. W.. L. R. Christensen and M. W. Trethaway (1980) “Flcxiblc Cost Functions for
Multiproduct Firms”, Review of Economics und Statistics, August, 62(3), 477-481.
Caves, D. W., L. R. Christensen and M. W. Trethaway (1984) “Economics of Density Versus
Economics of Scale: Why Trunk and Local Airline Costs Differ”, Rand Journul of Economics,
Winter. 15(4), 471-489.
Caves. D. W.. L. R. Christensen, M. W. Trethaway and R. Windle (1984) “Network Effects and the
Measurement of Returns to Scale and Density for U.S. Railroads”, in: A. F. Daughety, ed.,
Anu!vtical Studies in Trunsport Economics, forthcoming.
C’hiang, S. J. W. and A. F. Friedlaender (1985) “Trucking Technology and Marked Structure”, Review
of Economit~s und Statistics, May, 67(2), 250-258.
Christ, C., ct. al. (1963) Mwsurement in Economics. Stanford: Stanford University Press.
Christensen, L. R., D. Cummings and P. E. Schocch (1983) “Econometric Estimation of Scale
Economies in Telecommunications”, in: L. Courville, A. de Fontcnay and R. Dobell, eds., 27-53.
Christensen, L. R. and W. H. Greene (1976) “Economies of Scale in U.S. Electric Power Generation”,
Jourtutl of Politicctl Economy, August, X4(4), 655-676.
Christensen. L. R. and D. W. Jorgenson (1970) “U.S. Real Product and Real Factor Input,
lY2Y-1967”. Review of Income and Wealth, March, 16(l), 19-50.
Christensen, L. R., D. W. Jorgenson and L. J. Lau (1971) “Conjugate Duality and the Transcendental
Logarithmic Production Function”, Econometricu, July, 39(3), 255-256.
Christensen, L. R., D. W. Jorgenson and L. J. Lau (1973) “Transcendental Logarithmic Production
Frontiers”, Retjiew of Economics and Statisttcs, Februarv. 55(l). 28-45.
Cobb, C. W. and P. H. Douglas (192X) “A Theory of+Production”, American Economic Revrew,
March, 18(2), 139-165.
Conrad, K. and D. W. Jorgenson (1977) “Tests of a Model of Production for the Federal Republic of
Germany, 1950-1973”, European Economic Review, October, 10(l), 51-75.
Conrad, K. and D. W. Jorgenson (1978) “The Structure of Technology: Nonjointness and Commodity
Augmentation, Fcdcral Republic of Germany, lY50-lY73”, Empirical Economics, 3(2), 91-113.
1908 D. W. Jorgenson
Courvillc, L., A. de Fontenay and R. Dobell, eds. (1983) Economic An&is of Telecommunications.
Amsterdam: North-Holland.
Cowing, T. G. (1978) “The Effectiveness of Rate-of-Return Regulation: An Empirical Test Using
Protit Functions”, in: M. Fuss and D. McFadden, eds.. 2, 215-246.
Cowing, T. G. and V. K. Smith (1978) “The Estimation of a Production Technology: A Survey of
Econometric Analyses of Steam Electric Generation”, Land Economics, May, 54(2), 158-16X.
Cowing, T. G. and R. E. Stevenson, eds. (1981), Productioitv Measurement in Regulated ,I Industries. New
York: Academic Press.
Cowing, T. G., D. Rcifschncider and R. E. Stevenson, “A Comparison of Alternative Frontier Cost
Function Specifications”, in: A. Doaramaci, ed.. 63-92.
Dargay. J. (1983) “The Demand for Energy in Swedish Manufacturing,” in B.-C. Ysander, ed., Energy
in Swedish Manu/acturing. Stockholm: Industrial Institute for Economic and Social Research,
57-128.
Denny, M. (1974) “The Relationship Between Functional Forms for the Production System”,
Canadian Journal of Economics, February, 7(l), 21-31.
Denny, M. and M. Fuss (1977) “The Use of.Approximation Analysis to Test for Separability and the
Existence of Consistent Aggregates”, American Economic Review, June, 67(3), 404-418.
Denny, M., M. Fuss, C. Everson and L. Waverman (1981) “Estimating the Effects of Technological
Innovation in Telecommunications: The Production Structure of Bell Canada”, Canadian Journal of
Economics, February, 14(l), 24-43.
Denny, M., M. Fuss and L. Waverman (1981) “The Substitution Possibilities for Energy: Evidence
from U.S. and Canadian Manufacturing Industries”, in: E. R. Bemdt and B. C. Field, eds.,
230-258.
Denny, M. and J. D. May (1978) “Homotheticity and Real Value-Added in Canadian Manufacturing”,
in: M. Fuss and D. McFadden, eds., 2, 53-70.
Denny, M., J. D. May and C. Pinto (1978) “The Demand for Energy in Canadian Manufacturing:
Prologue to an Energy Policy”, Canadian Journal of Economics, May, 11(2), 300-313.
Denny. M. and C. Pinto, “An Aggregate Model with Multi-Product Technologies”, in: M. Fuss and
D. McFadden, eds., 2, 249-268.
Diewert, W. E. (1971) “An Application of the Shephard Duality Theorem, A Generalized Leontief
Production Function”, Journal of Political Economy, May/June, 79(3), 481-507.
Diewert, W. E. (1973) “Functional Forms for Profit and Transformation Functions”, Journal of
Economic Theory, June, 6(3), 284-316.
Diewert. W. E. (1974a) “Applications of Duality Theory”, in: M. D. Intrilligator and D. A. Kendrick,
eds., 106-171.
Diewert, W. E. (1974b) “Functional Forms for Revenue and Factor Requirement Functions”,
fntemutional Economic Review, February, 15(l), 119-130.
Diewert, W. E. (1976) “Exact and Superlative Index Numbers”, Journal of Econometrics, May, 4(2),
115-14s.
Diewert. W. E. (1980) “Aggregation Problems in the Measurement of Capital”, in: D. Usher, ed., The
Measurement of Capital. Chicago: University of Chicago Press, 433-528.
Diewert, W. E. (1982) “Duality Approaches to Microeconomic Theory”, in: K. J. Arrow and M. D.
Intrilligator, eds., Handhook of Mathematical Economics, 2, 535-591.
Diewert, W. E. and C. Parkan (1983) “Linear Programming Tests of Regularity Conditions for
Production Functions”, in: W. Eichhom, R. Henn, K. Neumann and R. W. Shephard, eds.,
131-158.
Dogramaci, A., ed. (1983) Developments in Econometric Ana!yses of Productivity. Boston:
Kluwer-Nijhoff.
Douglas, P. W. (1948) “Are There Laws of Production?“, American Economic Review, March, 38(l),
1-41.
Douglas, P. W. (1967) “Comments on the Cobb-Douglas Production Function”, in: M. Brown, ed.,
15-22.
Douglas, P. W. (1976) “The Cobb-Douglas Production Function Once Again: Its History, Its Testing,
and Some Empirical Values,” October, 84(5), 903-916.
Ehud, R. I. and A. Melnik (1981) “The Substitution of Capital, Labor and Energy in the Israeli
Economy”, Resources and Energy, November, 3(3), 247-258.
Ch. 31: Econometric Methods for Modeling Producer Behavior 1909
Eichhorn, W., R. Henn, K. Neumann and R. Wm. Shephard, eds. (1983) Quantitative Studies on
Production and Prices. Wurzburg: Physica-Verlag.
Elbadawi, I.. A. R. Gallant and G. Souza (1983) “An Elasticity Can Be Estimated Consistently
Without a Priori Knowledge of Functional Form”, Econometrica, November, 51(6), 1731-1752.
Epstein, L. G. and A. Yatchew (1985) “The Empirical Determination ol Technology and Expecta-
tions: A Simplified Procedure”, Journal of Econometrics, February, 27(2), 2355258.
Evans. D. S. and J. J. Heckman (1983) “Multi-Product Cost Function Estimates and Natural
Monopoly Tests for the Bell System”, in: D. S. Evans, ed., Breaking up Bell. Amsterdam:
North-Holland, 253-282.
Evans, D. S. and J. J. Heckman (1984) “A Test for Subadditivity of the Cost Function with an
Application to the Bell System”, American Economic Review, Scptcmbcr, 74(4), 615-623.
Faucett, Jack and Associates (1977) Development of 35Order Input-Output Tables, 1958-1974.
Washington: Federal Emergency Management Agency.
Field, B. C. and E. R. Bemdt (1981) “An Introductiory Review of Research on the Economics of
Natural Resource Substitution”, in: E. R. Bemdt and B. C. Field, eds., l-14.
Field, B. C. and C. Grebenstein (1980) “Substituting for Energy in U.S. Manufacturing”, Review of
Economtcs und Statistics, May, 62(2), 207-212.
Forsund, F. R. and L. Hjalmarsson (1979) “Frontier Production Functions and Technical Progress: A
Study of General Milk Processing Swedish Dairy Plants”, Econometrica, July, 47(4), 883-901.
Forsund, F. R. and L. Hjalmarsson (1983) “Technical Progress and Structural Change in the Swedish
Cement Industry 195S-1979”, Econometrica, September, 51(5), 1449-1467.
Forsund, F. R. and E. S. Jansen (1983) “Technical Progress and Structural Change in the Norwegian
Primary Aluminum Industry”, Scundinavian Journal of Economics, 85(2), 113-126.
Forsund, F. R., C. A. K. Love11 and P. Schmidt (1980) “A Survey of Frontier Production Functions
and of Their Relationship to Efficiency Measurement”, Journul of Econometrrcs, May, 13(l), 5-25.
Fraumeni. B. M. and D. W. Jorgenson (1980) “The Role of Capital in U.S. Economic Growth,
194X-1976”, in: G. von Furstenberg, ed., 9-250.
Frcngcr, P. (1978) “Factor Substitution in the Interindustry Model and the Use of Inconsistent
Aggregation”, in: M. Fuss and D. McFadden, eds., 2, 269-310.
Friede, G. (1979) Investigution of Producer Behavior in the Federal Republic of Germany Using the
Translog Price Function. Cambridge: Oelgeschlager, Gunn and Hain.
Friedlaender, A. F. and R. H. Spady (1980) “A Derived Demand Function for Freight Transporta-
tion”, Review of Econonucs and Statistics, August, 62(3), 432-441.
Friedlaender, A. F. and R. H. Spady (1981) Freight Transport Regulation. Cambridge: M.I.T. Press.
Friedlaender, A. F., R. H. Spady and S. J. W. Chiang (1981) “Regulation and the Structure of
Technology in the Trucking Industry”, in: T. G. Cowing and R. E. Stevenson, eds., 77-106.
Frisch, R. (1965) Theoty of Production. Chicago: Rand McNally.
Fullerton, D., Y. K. Henderson and J. B. Shoven, “A Comparison of Methodologies in Empirical
General Equilibrium Models of Taxation”, in: H. E. Scarf and J. B. Shoven, eds., 367-410.
Fuss, M. (1977a) “The Demand for Energy in Canadian Manufacturing: An Example of the
Estimation of Production Structures with Many Inputs”, Journal <$ Econometrics, January, 5(l),
89-116.
Fuss, M. (1977b) “The Structure of Technology Over Time: A Model for Testing the Putty-Clay
Hypothesis”, Econometrica, November, 45(8), 1797-1821.
Fuss, M. (197X) “Factor Substitution in Electricity Generation: A Test of the Putty-Clay Hypothesis”,
in: M. Fuss and D. McFadden, eds., 2, 187-214.
Fuss, M. (1983) “A Survey of Recent Results in the Analysis of Production Conditions in Telecom-
munications”, in: L. Courville, A. de Fontenay and R. Dobell, eds., 3-26.
Fuss, M. and D. McFadden, eds. (1978) Production Economics. Amsterdam, North-Holland, 2 Vols.
Fuss, M., D. McFadden and Y. Mundlak (1978) “A Survey of Functional Forms in the Economic
Analysis of Production”, in: M. Fuss and D. McFadden, eds., 1, 219-268.
Fuss, M. and L. Waverman (1981) “Regulation and the Multiproduct Firm: The Case of Telecom-
munications in Canada”, in: G. Fromm, ed., Studies in Public Regulation. Cambridge: M.I.T. Press,
277-313.
Gallant, A. R. (1977) “Three-Stage Least Squares Estimation for a System of Simultaneous, Nonlin-
ear, Implicit Equations”, Journal of Econometrics, January, 5(l), 71-88.
1910 D. W. Jorgenson
Gallant, A. R. (1981) “On the Bias in Flexible Functional Forms and an Essentially Unbiased Form”,
Journal of Econometrics, February, 15(2), 211-246.
Gallant, A. R. and A. Holly (1980) “Statistical Inference in an Implicit, Nonlinear, Simultaneous
Equations Model in the Context of Maximum Likelihood Estimation”, Econometrica, April, 48(3),
6977720.
Gallant, A. R. and D. W. Jorgenson (1979) “Statistical Inference for a System of Simultaneous,
Nonlinear, Implicit Equations in the Context of Instrumental Variable Estimation”, Journul of
E~~onometrics, October/December, 11(2/3), 275-302.
Geary, P. T. and E. J. McDonnell (1980) “Implications of the Specification of Technologies: Further
Evidence”, Journul of Econometrics, October, 14(2), 247-255.
Gollop. F. M. and S. M. Karlson (1978) “The Impact of the Fuel Adjustment Mechanism on
Economic Efficiency”, Review of Economics and Stutistics, November, 60(4), 574-584.
Gallop, F. M. and M. J. Roberts (1981) “The Sources of Economic Growth in the U.S. Electric Power
Industry”, in: T. G. Cowing and R. E. Stevenson, eds., 107-145.
Gollop. F. M. and M. J. Roberts (1983) “Environmental Regulations and Productivity Growth: The
Case of Fossil-Fueled Electric Power Generation”. Journal of Poliiicul Economy, August, 91(4).
6544674.
German, W. M. (1959) “Separable Utility and Aggregation”, Econometrica, July, 27(3), 469-481.
Gourieroux, C., A. Holly and A. Monfort (1980) “Kuhn-Tucker, Likelihood Ratio and Wald Tests
for Nonlinear Models with Constraints on the Parameters”. Harvard University, Harvard Institute
for Economic Research, Discussion Paper No. 770, June.
Gourieroux, C.. A. Holly and A. Monfort (1982) “Likelihood Ratio Test, Wald Test, and Kuhn-Tucker
Test in Linear Models with Inequality Constraints on the Regression Parameters”, Econometricu,
January, 50(l), 63-80.
Greene. W. H. (1980) “Maximum Likelihood Estimation of Econometric Frontier Functions”,
Journal of Econumetrics, May, 13(l), 27-56.
Greene. W. H. (1983) “Simultaneous Estimation of Factor Substitution, Economies of Scale, Produc-
tivity, and Non-Neutral Technical Change”, in: A. Dogramaci, ed., 121-144.
Grifftn, J. M. (1977a) “The Econometrics of Joint Production: Another Approach”, Review of
Economics and Sfutistics, November, 59(4), 389-397.
Griffin, J. M. (1977b) “Interfuel Substitution Possibilities: A Translog Application to Pooled Data”,
Internationul Economic Review, October, 18(3), 755-770.
Grit%, J. M. (1977~) “Long-Run Production Modeling with Pseudo Data: Electric Power Generation”,
Bell Journal of Economics, Spring 1977, 8(l), 112-127.
Griffin, J. M. (1978) “Joint Production Technology: The Case of Petrochemicals”, Econometrica,
March, 46(l), 379-396.
Griffin, J. M. (1979) “Statistical Cost Analysis Revisited”, Quarterly Journul of Economics, February,
93(l). 107-129.
Griffin, J. M. (1980) “Alternative Functional Forms and Errors of Pseudo Data Estimation: A Reply”,
Review of Economics und Statistics, May, 62(2), 327-328.
Griffin, J. M. (1981a) “The Energy-Capital Complementarity Controversy: A Progress Report on
Reconciliation Attempts”, in: E. R. Bemdt and B. C. Field, eds., 70-80.
Griffin, J. M. (1981b) “Engineering and Econometric Interpretations of Energy-Capital Complemen-
tarity: Comment”, American Economic Review, December, 71(5), 1100-1104.
Griflin, J. M. (1981~) “Statistical Cost Analysis Revisited: Reply”, Quarterly Journul of Economics,
February, 96(l), 183-187.
GriRin, J. M. and P. R. Gregory (1976) “An Intercountry Translog Model of Energy Substitution
Responses”, Americun Economic Review, December, 66(5), 845-857.
Griliches, Z. (1967) “Production Functions in Manufacturing: Some Empirical Results”, in: M.
Brown, ed., 275-322.
Griliches, Z. and V. Ringstad (1971) Economies of Scale and the Form of the Production Function.
Amsterdam: North-Holland.
Hall, R. E. (1973) “The Specification of Technology with Several Kinds of Output”, Journal of
Political Economy, July/August, 81(4), 878-892.
Halvorsen, R. (1977) “Energy Substitution in U.S. Manufacturing”, Review of Economics and
Statistics. November, 59(4), 381-388.
Ch. .?I: Econometric Methods for Modeling Producer Behaoior 1911
Iialvorsen, R. (1978) Econometric Studies of U.S. Energy Demand. Lexington: Lexington Books.
Halvorsen, R. and J. Ford, “Substitution Among Energy, Capital and Labor Inputs in U.S.
Manufacturing”, in: R. S. Pindyck, ed., Advances in the Economics af Energy and Resources.
Greenwich: JAI Press. 1. 51-75 _.
Hamermesh, D. S. and~J. Grant (1979) “Econometric Stud& of Labor-Labor Substitution and Their
Implications for Policy”, Journal of IIuman Resources, Fall. 14(4). 518-542.
Han&h, G. (1978) “Symmetric Duality and Polar Production’ Functions”, in: M. Fuss and D.
McFadden, eds., 1, 111-132.
Hanoch, G. and M. Rothschild ( 1972) “Testing the Assumptions of Production Theory: A Nonpara-
metric Approach”, Journal of Political Economy, March/April, 80(2), 256-275.
Hansen, L. P. and T. J. Sargent (1980) “Formulating and Estimating Dynamic Linear Rational
Expectations Models”, Journal of Economic D.ynamics and Control, February, 2(l), l-46.
Hansen, L. P. and T. J. Sargent (1981) “Linear Rational Expectations Models for Dynamically
Interrelated Variables”, in: R. E. Lucas and T. J. Sargent. eds.. Rational Exnectations and
Econometric Practice. Minneapolis: University of Minnesot~Prcss, 1, 127-156. ’
Harmatuck, Donald J. (1979) “A Policy-Sensitive Railway Cost Function”, Logi.stic.v and Trunsporta-
tion Review, April, 15(2), 277-315.
Harmatuck, Donald J. (1981) “A Multiproduct Cost Function for the Trucking Industry”, Journal o/
Transportation Economics and Polky, May, 15(2), 135-153.
Heady, E. 0. and J. L. Dillon (1961) Agricultural Production Functions. Ames: Iowa State University
Press.
Hicks, J. R. (1946) Value and Cupitul. 2nd ed. (1st ed. 1939) Oxford: Oxford University Press.
Hicks, J. R. (1963) The Theory of Wages. 2nd ed. (1st ed. 1932), London: Macmillan,
Hildenbrand, W. (1981) “Short-Run Production Functions Based on Microdata”, Econometrtca,
September, 4Y(5), 1095-1125.
Hotelling. H. S. (1932) “Edgeworth’s Taxation Paradox and the Nature of Demand and Supply
Functions”, Journal of Politicul Economy, October, 40(5), 517-616.
Houthakker, H. S. (1955-1956) “The Pareto Distribution and the Cobb-Douglas Production Func-
tion in Activity Analysis”, Review of Economic Studies, 23(l), 60, 27-31.
Hudson, E A. and D. W. Jorgenson (1974) “U.S. Energy Policy and Economic Growth, 1975-2000”,
Bell .Journul of Econ0mic.s und Munugement Science, Autumn, 5(2), 461-514.
Hudson, E. A. and D. W. Jorgenson (197X) “The Economic Impact of Policies to Rcducc U.S. Energy
Growth,” Resources and Energy. November. l(3). 205-230.
Humphrey, D. B. and J. R. Moroney (1975) “Substitution Among Capital, Labor, and Natural
Resource Products in American Manufacturing”, Journal of Political Econom_v, February, 83(l),
57-82.
Humphrey, D. B. and B. Wolkowitz (1976) “Substituting Intermediates for Capital and Labor with
Alternative Functional Forms: An Aggregate Study”, Applied Economics, March, X(l), 59-68.
Intriligator. M. D. and D. A. Kendrick, eds. (1974) Frontiers in Qunntitative ,!konomit:r. Amsterdam:
North-Holland, Vol. 2.
JaraaDiaz, S. and C. Winston (1981) “Multiproduct Transportation Cost Functions: Scale and Scope
in Railway Operations”, in: N. Blattner, ed., Eighth European Associution for Research m Industrtal
Economics, Basle: University of Basle, 1, 437-469.
Jcnnrich, R. I. (1969) “Asymptotic Properties of Nonlinear Least Squares Estimations”, Annuls of
Mathematical Statistits, April, 40(2), 633-643.
Johansen. L. (1972) Production Functions. Amsterdam: North-Holland.
Johansen, L. (1974) A Multi-Sectoral Study of Economic Growth. 2nd ed. (1st ed. 1960) Amsterdam,
North-Holland.
Jorgenson. D. W:. (1973a) “The Economic Theory of Replacement and Depreciation”, in: W.
Sellckaerts, ed., Econometrrcs and Economic Theoty. New York: Macmillan, 1X9-221.
Jorgenson, D. W. (1973b) “Technology and Decision Rules in the Theory of Investment Behavior”,
Quarter!y Journal of Economics, November 1973, 87(4), 523-543.
Jorgenson, D. W. (1974) “Investment and Production: A Review”, in: M. D. Intriligator and D. A.
Kendrick. eds., 341-366.
Jorgenson. D. W. (19X0) “Accounting for Capital”, in: G. von Furstenberg, ed., 251-319.
Jorgenson, D. W. (1981) “Energy Prices and Productivity Growth”, Scundinuviun Journal of Econonz-
1912 D. W. Jorgenson
the Second Berkeley Symposium on Mathematical Statistics and Probability. Berkeley: University of
California Press, 481-492.
Lau, L. J. (1969) “Duality and the Structure of Utility Functions”, Journal of Economic Theon;,
December, l(4), 374-396.
Lau, L. J. (1974) “Applications of Duality Theory: Comments”, in: M. D. Intriligator and D. A.
Kendrick, eds., 176-199.
Lau, L. J. (1976) “A Characterization of the Normalized Restricted Protit Function”, Journal of
Economic Theory, February, 12(l), 131-163.
Lau, L. J. (197Xa) “Applications . of Profit Functions”, in: M. Fuss and D. McFadden. eds.. 1.
133-216.
Lau, I.. J. (1978b) “Testing and Imposing Monotonicity, Convexity and Quasi-Convexity Constraints”,
in: M. Fuss and D. McFadden, eds., 1. 409-453.
Lau. L. J. (1986) “Functional Forms in Econometric Model Building”, this Handhook, Vol. 3.
Leontief, W. W. (1947a) “Introduction to a Theory of the Internal Structure of Functional Relation-
ships”, Econometrico, October, 15(4), 361-373.
Leontief, W. W. (1947b) “A Note on the Interrelation of Subsets of Independent Variables of a
Continuous Function with Continuous First Derivatives”, Bulletin of the American Mathematical
Socrety, April, 53(4), 343-350.
Leontief, W. W. (1951) The Structure of the American Economy, 1919-1939. 2nd cd. (1st ed. 1941)
New York: Oxford University Press.
Leontief. W. W., ed. (1953) Studies in the Structure of the American Economy. New York: Oxford
University Press.
Liew, C. K. (1976) “A Two-Stage Least-Squares Estimator with Inequality Restrictions on the
Parameters”, Review of Economics and Statistics, May, 58(2), 234-238.
Longva, S. and 0. Olsen (1983) “Producer Behaviour in the MSG Model”, in: 0. Bterkholt, S.
Longva, 0. Olsen and S. Strom, eds., Ana!vsis of Supply and Demand of Electricit_v in the Norwegian
I:‘conomy. Oslo: Central Statistical Bureau, 52-83.
Lucas, R.-E. (1967) “Adjustment Costs and the Theory of Supply”, Journal of Political Econom,y,
August, Pt. 1, 75(4), 321-334.
Maddala, Cr. S. and R. B. Roberts (1980) “Alternative Functional Forms and Errors of Pseudo Data
Estimation”, Review of Economics und Statistics, May, 62(2), 323-326.
Maddala. G. S. and R. B. Roberts (1981) “Statistical Cost Analysis Revisited: Comment”, Quartet+
Journal of Economrcs, February, 96(l), 177-182.
Magnus, J. R. (1979) “Substitution Between Energy and Non-Energy Inputs in the Netherlands,
1950-1976”. International Economic Review, June, 20(2), 465-484.
Magnus, J. R. and A. D. Woodland (1980) “Interfuel Substitution and Separability in Dutch
Manufacturing: A Multivariate Error Components Approach”, London School of Economics,
November.
Malinvaud, E. (1970) “The Consistency of Non-Linear Regressions”, Annals of Mathematical Statts-
tics, June, 41(3), 456-469.
Malinvaud, E. (1980) Statistical Methods of Econometricx 3rd ed. (1st ed. 1966) trans. A. Silvey.
Amsterdam: North-Holland.
McFadden, D. (1963) “Further Results on CES Production Functions”, Review of Economic Studie.s,
June, 30(2), X3, 73-83.
McFadden, 1). (1978) “Cost, Revenue, and Profit Functions”, in: M. Fuss and D. McFadden, eds., 1,
l-110.
McRae. R. N. (1981) “Regional Demand for Energy by Canadian Manufacturing Industries”,
Internatronal Journal o/Energy S_ystems, January, l(l), 38-48.
McRae, R. N. and A. R. Webster (1982) “The Robustness of a Translog Model to Describe Regional
Energy Demand by Canadian Manufacturing Industries”, Resources and Energy, March, 4(l), l-25.
Meese. R. (19X0) “Dynamic Factor Demand Schedules for Labor and Capital Under Rational
Expectations”, Journal of Econometrics, September, 14(l), 141-15X.
Moroney, J. R. and A. Toevs (1977) “Factor Costs and Factor Use: An Analysis of Labor, Capital,
and Natural Resources”, Southern Economic Journal, October, 44(2), 222-239.
Moroney. J. R. and A. Toevs (1979) “Input Prices, Substitution, and Product Inflation”, in: R. S.
Pindyck, ed., Advances tn the Economics of Energy and Resources. Greenwich: JAI Press, 1, 27-50.
1914 D. W. Jorgenson
Moroncy, J. R. and J. M. Trapani (1981a) “Alternative Models of Substitution and Technical Change
in Natural Resource Intensive Industries”, in: E. R. Berndt and B. C. Field, eds., 48-69.
Moroney. J. R. and J. M. Trapam (1981b) “Factor Demand and Substitution in Mineral-Intensive
Industries”, Bell Journul of Economics. Spring. 12(l). 212-285.
Morrison. C. J. and E. R. Berndt (1981) “Short-run Labor Productivity in a Dynamic Model”, Journal
of Econometrics, August, 16(3), 339-366.
Mundlak, Y. (1963) “Estimation of Production and Behavioral Functions from a Combination of
Cross-Section and Time Series Data”, in: C. Christ, et al., 138-166.
Mundlak. Y. (1978) “On the Pocling of Time Series and Cross Section Data”, Econometn’cu, January,
46(l), 60-X6.
Nadiri. M. I. (1970) “Some Approaches to the Theory and Measurcmcnt of Total Factor Productivity:
A Survey”, Journul of Economic Lueruture, December, 8(4), 1137-1178.
Nadiri, M. I. and M. Schankcrman (1981) “The Structure of Production, Technological Change, and
the Rate of Growth of Total Factor Productivity in the U.S. Bell System”, in: T. G. Cowing and
R. E. Stevenson, eds., 219-248.
Nakamura, S. (1984) An Inter-Industry Translog Model of Prices and Technical Change for the We.st
C;ermun Economy. Berlin: Springer-Verlag.
Nerlove, M. (1963) “Returns to Scale in Electricity Supply”, in: C. Christ, et al., 167-200.
Nerlove, M. (1967) “ Recent Empirical Studies of the CES and Related Production Functions”, in: M.
Brown, cd., 55-122.
Norsworthy, J. R. and M. J. Harper (1981) “Dynamic Models of Energy Substitution in U.S.
Manufacturing”, in: E. R. Bemdt and B. C. Field, eds., 177-208.
Ozatalay, S., S. S. Grumbaugh and T. V. Long III, “Energy Substitution and National Energy Policy”,
Americun Economic Review, May, 69(2), 369-371.
Parks, R. W. (1971) “Responsiveness of Factor Utilization in Swedish Manufacturing, 1870-1950”,
Rerliew of Economics and Stntisrics, May, 53(2), 129-139.
Peterson, H. C. (1975) “An Empirical Test of Regulatory Effects”, Bell Journal of Economics, Spring,
6(l), 111-126.
Pindyck. R. S. (lY79a) “Interfuel Substitution and Industrial Demand for Energy”, Reuiew of
Gonomic.v und Srarisric.s, May, 61(2), 169-179.
Pindyck, R. S. (1979b) The Structure of World Energy Demand. Cambridge: M.I.T. Press.
Pindyck, R. S. and J. J. Rotemberg (1983a) “Dynamic Factor Demands and the Effects of Energy_.
Price Shocks”, Americun Economic Review, December, 73(5), 1066-1079.
Pindvck. R. S. and J. J. Rotembere (1983b) “Dvnamic Factor Demands Under Rational Exoectations”.
S&dinaoian Journul of Econo&, 85(i), 223-239.
Quandt, R. E. (1983) “Computational Problems and Methods”, this Handbook, 1, 701-764.
Russell, C. S. and W. J. Vaughan (1976) Steel Production. Baltimore: Johns Hopkins University Press.
Russell, R. R. (1975) “Functional Separability and Partial Elasticities of Substitution”, Reuiew of
Economic Studies, January, 42(l), 129, 79-86.
Samuelson, P. A. (1951) “Abstract of a Theorem Concerning Substitutability in Open Leonticf
Models”, in: T. C. Koopmans, ed., Activity Anulysis of Production and Allocution. Wiley: New York,
142-146.
Samuelson, P. A. (1953-1954) “Prices of Factors and Goods in General Equilibrium”, Review of
Economic Studies, 21(l), 54, l-20.
Samuelson, P. A. (1960) “Structure of a Minimum Equilibrium System”, in: R. W. Pfouts, cd., ~.~sstlys
in Economics und Econometrrcs. Chapel Hill: University of North Carolina Press, l-33.
Samuelson, P. A. (1965) “A Theory of Induced Innovation Along Kennedy-Weizsacker Lines”,
Revrew of Economics and Stufistics, November, 47(4), 343-356.
Samuelson, P. A. (1973) “Relative Shares and Elasticities Simplified: Comment”, American Economic
Review, Septcmbcr, 63(4), 770-771.
Samuclson, P. A. (1974) “Complementarity-An Essay on the 40th Anniversary of the Hicks-Allen
Revolution in Demand Theory”, Journul of Economic Literature, December, 12(4), 1255-1289.
Samuelson, P. A. (1979) “Paul Douglas’s Measurement of Production Functions and Marginal
Productivities”, Journal of Political Economy, October, Part 1, X7(5), 923-939.
Samuelson, P. A. (1983) Foundations of Economic Analysis. 2nd ed. (1st ed. 1947), Cambridge:
Harvard University Press.
Sargan, J. D. (1971) “Production Functions”, in: R. Layard, cd., Qualijied Manpower and Economic
Ch. <I: EconomrtrrcMethods for Modeling Producer Behavior 1915
LABOR ECONOMETRICS*
JAMES J. HECKMAN
University of Chicago
THOMAS E. MACURDY
Contents
0. Introduction 1918
1. The index function model 1920
1.1. Introduction 1920
1.2. Some definitions and basic ideas 1921
1.3. Sampling plans 1926
2. Estimation 1929
2.1. Regression function characterizations 1930
2.2. Dummy endogenous variable models 1945
3. Applications of the index function model 1952
3.1. Models with the reservation wage property 1952
3.2. Prototypical dummy endogenous variable models 1959
3.3. Hours of work and labor supply 1963
4. Summary 1971
Appendix: The principal assumption 1972
References 1974
*Heckman’s research on this project was supported by National Science Foundation Grant No.
SES-8107963 and NIH Grants R01-HD16846 and R01-HD19226. MaCurdy's research on this project
was supported by National Science Foundation Grant No. SES-8308664 and a grant from the Alfred
P. Sloan Foundation. This paper has benefited greatly from comments generously given by Ricardo
Barros, Mark Gritz, Joe Hotz, and Frank Howland.
0. Introduction
In the past twenty years, the field of labor economics has been enriched by two
developments: (a) the evolution of formal neoclassical models of the labor market
and (b) the infusion of a variety of sources of microdata. This essay outlines the
econometric framework developed by labor economists who have built theoreti-
cally motivated models to explain the new data.
The study of female labor supply stimulated early research in labor economet-
rics. In any microdata study of female labor supply, two facts are readily
apparent: that many women do not work, and that wages are often not available
for nonworking women. To account for the first fact in a theoretically coherent
framework, it is necessary to model corner solutions (choices at the extensive
margin) along with conventional interior solutions (choices at the intensive
margin) and to develop an econometrics sufficiently rich to account for both types
of choices by agents. Although there were precedents for the required type of
econometric model in work in consumer theory by Tobin (1958) and his students
[e.g. Rosett (1959)], it is fair to say that labor economists have substantially
improved the original Tobin framework and have extended it in various im-
portant ways to accommodate a variety of models and types of data. To account
for the second fact that wages are missing in a nonrandom fashion for nonwork-
ing women, it is necessary to develop models for censored random variables. The
research on censored regression models developed in labor economics had no
precedent in econometrics and was largely neglected by statisticians (See the essay
by Griliches in this volume).
The econometric framework developed for the analysis of female labor supply
underlies more recent models of job search [Yoon (1981), Kiefer and Neumann
(1979), Flinn and Heckman (1982)], occupational choice [Roy (1951), Tinbergen
(1951), Siow (1984), Willis and Rosen (1979), Heckman and Sedlacek (1984)], job
turnover [Mincer and Jovanovic (1981), Borjas and Rosen (1981), Flinn (1984)],
migration [Robinson and Tomes (1982)], unionism [Lee (1978), Strauss and
Schmidt (1976), Robinson and Tomes (1984)] and training evaluation [Heckman
and Robb (1985)].
All of the recent models presented in labor econometrics are special cases of an
index function model. The origins of this model can be traced to Karl Pearson’s
(1901) work on the mathematical theory of evolution. See D. J. Kevles (1985, p.
31) for one discussion of Pearson’s work. In Pearson’s framework, discrete and
censored random variables are the manifestations of underlying continuous
random variables subject to various sampling schemes. Discrete random variables
are indicators of whether or not certain latent continuous variables lie above or
below given thresholds. Censored random variables are direct observations on the
underlying random variables given that certain selection criteria are met. Assum-
ing that the underlying continuous random variables are normally distributed
leads to the theory of biserial and tetrachoric correlation. [See Kendall and Stuart
(1967, Vol. II) for a review of this theory.] Later work in mathematical psy-
chology by Thurstone (1927) and Bock and Jones (1968) utilized the index
function framework to produce mathematical models of choice among discrete
alternatives and stimulated a considerable body of ongoing research in economics
[See McFadden’s paper in Volume II for a survey of this work and Lord and
Novick (1968) for an excellent discussion of index function models used in
psychometrics].
The index function model cast in terms of underlying continuous latent
variables provides the empirical counterpart of many theoretical models in labor
economics. For example, it is both natural and analytically convenient to for-
mulate labor supply or job search models in terms of unobserved reservation
wages which can often be plausibly modeled as continuous random variables.
When reservation wages exceed market wages, people do not work. If the
opposite occurs, people work and wages are observed. A variety of models that
are special cases of the reservation wage framework will be presented below in
Section 3.
The great virtue of research in labor econometrics is that the problems and the
solutions in the field are the outgrowth of research on well-posed economic
problems. In this area, the economic problems lead and the proposed statistical
solutions follow in response to specific theoretical and empirical challenges. This
imparts a vitality and originality to the field that is not found in many other
branches of econometrics.
One format for presenting recent developments in labor econometrics is to
chart the history of the subject, starting with the earliest models, and leading up
to more recent developments. This is the strategy we have pursued in previous
joint work [Heckman and MaCurdy (1981); Heckman, Killingsworth and
MaCurdy (1981)]. The disadvantage of such a format is that basic statistical ideas
become intertwined with specific economic models, and general econometric
points are sometimes difficult to extract.
This paper follows another format. We first state the basic statistical and
econometric principles. We then apply them in a series of worked examples. This
format has obvious pedagogical advantages. At the same time, it artificially
separates economic problems from econometric theory and does not convey the
flow of research problems that stimulated the econometric models.
This paper is in three parts: Part 1 presents a general introduction to the index
function framework; Part 2 presents methods for estimating index function
models; and Part 3 makes the discussion concrete by presenting a series of models
in labor economics that are special cases of the index function framework.
1. The index function model
1.1. Introduction
The critical assumption at the heart of index function models is that unobserved
or partially observed continuous random variables generate observed discrete,
censored, and truncated random variables. The goal of econometric analysis
conducted for these models is to recover the parameters of the distributions of the
underlying continuous random variables.
The notion that continuous latent variables generate observed discrete, censored
and truncated random variables is natural in many contexts. For example, in the
discrete choice literature surveyed by McFadden (1985), the difference between
the utility of one option and the utility of another is often naturally interpreted as
a continuous random variable, especially if, as is sometimes plausible, utility
depends on continuously distributed characteristics. When the difference of
utilities exceeds a threshold (zero in this example), the first option is selected. The
underlying utilities of choices are never directly observed.
As another example, many models in labor economics are characterized by a
“reservation wage” property. Unemployed persons continue to search until their
reservation wage (a latent variable) is less than the offered wage. The difference
between reservation wages and offered wages is a continuous random variable if
some of the characteristics generating reservation wages are continuous random
variables. The decision to stop searching is characterized by a continuous latent
variable falling below a threshold (zero). Observed wages are censored random
variables with the censoring rule characterized by a continuous random variable
(the difference between reservation wages and market wages) crossing a threshold.
Further examples of index functions generated by economic models are presented
in Section 3.
From the vantage point of context-free statistics, using continuous latent
variables to generate discrete, censored or truncated random variables introduces
unnecessary complications into the statistical analysis. Despite its ancient heri-
tage, the index function approach is no longer widely used or advocated in the
modern statistics literature. [See, e.g. Bishop, Fienberg and Holland (1975) or
Haberman (1978), Volumes I and II.]¹ Given their lack of interest in behavioral
models, many statisticians prefer direct parameterizations of discrete data and
censored data models that typically possess no behavioral interpretation. Some
statisticians have argued that econometric models that incorporate behavioral
¹Such models are still widely used in the psychometric literature. See Lord and Novick (1968) or
Bock and Jones (1968).
theory are needlessly complicated. For this reason labor economics has been the
locus of recent research activity on index function models.
1.2. Some definitions and basic ideas

We begin with the most elementary index function model. This model ignores the
existence of Y and focuses on discrete variables whose outcomes register the
occurrence of various states of the world. Let Θ₁ be a nontrivial subset of Θ.² Although we do not directly observe Z, we know

$$\delta_1 = \begin{cases} 1, & \text{if } Z \in \Theta_1, \\ 0, & \text{otherwise.} \end{cases} \tag{1.2.1}$$
The discrete choice models surveyed by McFadden (1985) can be cast in this
framework. Let Z be a J × 1 vector of utilities, Z = (V(1), ..., V(J))′. The event
that option i is selected is the event that V(i) is maximal in the set {V(j)}, j = 1, ..., J. In
the space of the distribution of utilities, the event that V(i) is maximal corresponds to the region

$$\Theta_i = \{\, Z \mid V(j) - V(i) \le 0,\; j = 1, \ldots, J \,\},$$

and

$$\Pr(\delta_i = 1) = \Pr(Z \in \Theta_i).$$

²These sets, and all partitions of them considered in this paper, are assumed to be Borel sets.
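As a quick numerical illustration of this mapping from latent utilities to observed indicators, the following sketch simulates the regions Θᵢ directly; the sample size, the mean utilities, and the Gumbel errors are illustrative assumptions, not part of the original model:

```python
import numpy as np

rng = np.random.default_rng(0)
n, J = 100_000, 3

# Latent utilities Z = (V(1), ..., V(J)); the Gumbel errors here are only an
# illustrative distributional choice (they yield multinomial logit probabilities).
mean_utility = np.array([0.0, 0.5, 1.0])
Z = mean_utility + rng.gumbel(size=(n, J))

# delta_i = 1 exactly when V(i) is maximal, i.e. when Z falls in the region Theta_i.
choice = Z.argmax(axis=1)
for i in range(J):
    print(f"Pr(delta_{i + 1} = 1) is approximately {(choice == i).mean():.3f}")
```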
Introducing exogenous variables (X) into this model raises only minor conceptual issues.³ The distribution of Z can be defined conditional on X, and the
regions of definition of δᵢ can also be allowed to depend on X (so Θᵢ = Θᵢ(X)).
The conditional probability that δᵢ = 1 given X is

$$\Pr(\delta_i = 1 \mid X) = \Pr(Z \in \Theta_i(X) \mid X).$$

³Exogenous random variables are always observed and have a marginal density that shares no
parameters in common with the conditional distribution of the endogenous variables given the
exogenous variables.
where the Oi’s are subsets of 0, and I, ( 4 I) is the number of states in which Y
is observed. In the remaining states (I - Zi in number), Y is not observed. We
define an indicator variable ai by
1 if(Y,Z)Ea,
6; = i=l ,..., Z.
i 0 if(Y,Z)eQ,
Adopting the convention

$$Y_i^* = \delta_i Y, \tag{1.2.7}$$

the density of Yᵢ* given δᵢ = 1 is

$$g_i(y_i^*) = \frac{\int_{\Omega_i \mid y_i^*} f(y_i^*, z)\, dz}{\Pr(\delta_i = 1)}, \qquad \text{for } y_i^* \in \Psi_i, \quad i = 1, \ldots, I_1, \tag{1.2.8}$$

with

$$\Pr(\delta_i = 1) = \int_{\Omega_i} f(y, z)\, dz\, dy,$$

where the notation $\int_{\Omega_i \mid y}$ and $\int_{\Omega_i}$ denotes integration over the set Ωᵢ given y and over all of Ωᵢ, respectively, i.e.

$$\int_{\Omega_i \mid y} f(y, z)\, dz = \int_{\{z\,:\,(y, z) \in \Omega_i\}} f(y, z)\, dz \qquad \text{and} \qquad \int_{\Omega_i} f(y, z)\, dz\, dy = \iint_{(y, z) \in \Omega_i} f(y, z)\, dz\, dy.$$
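A Monte Carlo check of a conditional mean implied by densities of this form, for one simple selection set; the bivariate normal f(y, z) and the set Ω₁ = {(y, z) : z ≥ 0} are illustrative assumptions:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n, rho, c = 1_000_000, 0.6, 0.0

# (Y, Z) bivariate standard normal with correlation rho: an illustrative f(y, z).
z = rng.standard_normal(n)
y = rho * z + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)

selected = z >= c                      # (y, z) in Omega_1 = {z >= c}
mc_mean = y[selected].mean()           # sample analogue of E(Y | delta = 1)
closed_form = rho * norm.pdf(c) / norm.sf(c)
print(mc_mean, closed_form)            # the two numbers should nearly agree
```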
The function gᵢ(·) is the conditional density of Y given that selection rule (1.2.6)
is satisfied. As a consequence of convention (1.2.7), the distribution of Yᵢ* when
δᵢ = 0 has point mass at Yᵢ* = 0 (i.e. Pr(Yᵢ* = 0 | δᵢ = 0) = 1).
The joint density of Y* and the δᵢ's is

$$h(y^*, \delta_1, \ldots, \delta_I) = \prod_{i=1}^{I_1} \bigl[g_i(y^*)\Pr(\delta_i = 1)\bigr]^{\delta_i} \prod_{i=I_1+1}^{I} \bigl[g_i(y^*)\Pr(\delta_i = 1)\bigr]^{\delta_i}. \tag{1.2.11}$$

Define the aggregated indicators $\tilde{\delta}_1 = \sum_{i=1}^{I_1} \delta_i$ (equal to one when Y is observed) and $\tilde{\delta}_2 = \sum_{i=I_1+1}^{I} \delta_i$. The density of Y* conditional on δ̃₁ = 1 is

$$k(y^* \mid \tilde{\delta}_1 = 1) = E\bigl(h(y^* \mid \delta_1, \ldots, \delta_I) \mid \tilde{\delta}_1 = 1\bigr) = \sum_{i=1}^{I_1} h(y^* \mid \delta_i = 1)\Pr(\delta_i = 1 \mid \tilde{\delta}_1 = 1) = \sum_{i=1}^{I_1} g_i(y^*)\Pr(\delta_i = 1)/\Pr(\tilde{\delta}_1 = 1),$$

and, analogously,

$$k(y^* \mid \tilde{\delta}_2 = 1) = \sum_{i=I_1+1}^{I} g_i(y^*)\Pr(\delta_i = 1)/\Pr(\tilde{\delta}_2 = 1).^5$$

⁵These derivations use the fact that the sets Ωᵢ are mutually exclusive, so $\Pr(\tilde{\delta}_1 = 1) = \sum_{i=1}^{I_1} \Pr(\delta_i = 1)$ and $E(\delta_i \mid \tilde{\delta}_1 = 1) = \Pr(\delta_i = 1 \mid \tilde{\delta}_1 = 1) = \Pr(\delta_i = 1)/\Pr(\tilde{\delta}_1 = 1)$, with completely analogous results holding for δ̃₂.
The density of Y* conditional on the δ̃ᵢ's is given by

$$h(y^* \mid \tilde{\delta}_1, \tilde{\delta}_2) = \left[\sum_{i=1}^{I_1} g_i(y^*)\Pr(\delta_i = 1)/\Pr(\tilde{\delta}_1 = 1)\right]^{\tilde{\delta}_1} \left[\sum_{i=I_1+1}^{I} g_i(y^*)\Pr(\delta_i = 1)/\Pr(\tilde{\delta}_2 = 1)\right]^{\tilde{\delta}_2}, \tag{1.2.12}$$

where Y* has point mass at zero when δ̃₂ = 1 (i.e. Pr(Y* = 0 | δ̃₂ = 1) = 1).
Multiplying the conditional density (1.2.12) by the probability of the events δ̃₁,
δ̃₂ generates the joint density for Y* and the δ̃ᵢ's:

$$h(y^*, \tilde{\delta}_1, \tilde{\delta}_2) = \left[\sum_{i=1}^{I_1} g_i(y^*)\Pr(\delta_i = 1)\right]^{\tilde{\delta}_1} \left[\sum_{i=I_1+1}^{I} g_i(y^*)\Pr(\delta_i = 1)\right]^{\tilde{\delta}_2}. \tag{1.2.13}$$
Densities of the form (1.2.8)-(1.2.13) appear repeatedly in the models for the
analysis of labor supply presented in Section 3.3.

All the densities in the preceding analysis can be modified to depend on
exogenous variables X, as can the support of the selection region (i.e. Ωᵢ = Ωᵢ(X)).
Writing f(y, z | X) to denote the appropriate conditional density, only obvious
notational modifications are required to introduce such variables.
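For instance, a tobit-type censored regression is one familiar special case of likelihoods built from such densities; the sketch below assumes normal errors and the rule that Y is observed only when positive, neither of which is asserted by the text here:

```python
import numpy as np
from scipy.stats import norm

def censored_loglik(theta, y_star, d, X):
    """Tobit-type log-likelihood: d = 1 when y_star is observed, d = 0 when
    the observation is censored at zero (illustrative normal special case)."""
    beta, sigma = theta[:-1], np.exp(theta[-1])   # log-sigma keeps sigma > 0
    xb = X @ beta
    ll_observed = norm.logpdf(y_star, loc=xb, scale=sigma)  # density term when observed
    ll_censored = norm.logcdf(-xb / sigma)                  # Pr(delta = 0 | X) term
    return np.sum(np.where(d == 1, ll_observed, ll_censored))
```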
1.3. Sampling plans
A variety of different sampling plans are used to collect the data available to
labor economists. The econometric implications of data collected from such
sampling plans have received a great deal of attention in the discrete choice and
labor econometrics literatures. In this subsection we define the concepts of simple
random samples, truncated random samples, censored random samples, stratified
random samples, and choice based samples. To this end we let h(X) denote the
population density of the exogenous variables X, so that the joint density of
(Y, δ, X) is

$$f(y, \delta \mid X)\, h(X), \tag{1.3.1}$$

with c.d.f.

$$F(y, \delta, X). \tag{1.3.2}$$

In a truncated or censored random sample, observations are obtained only when

$$(Y, \delta, X) \in A_1 \subset A. \tag{1.3.3}$$
⁶It is possible to estimate this conditional distribution using the subsample generated by the
requirement that (Y, δ, X) ∈ A₁ for certain specific functional form assumptions for F. Such forms for
F are termed "recoverable" in the literature. See Heckman and Singer (1986) for further discussion of
this issue of recoverability.
In the special case in which the subset A₁ only restricts the support of X
(exogenous truncated and censored samples), the econometric analysis can pro-
ceed conditional on X. In light of the assumed exogeneity of X, the only possible
econometric problem is a loss in efficiency of proposed estimators.

Truncated and censored samples are special cases of the more general notion of
a stratified sample. In place of the special sampling rule (1.3.3), in a general
stratified sample, the rule for selecting independent observations is such that even
in an infinite sample the probability that (Y, δ, X) ∈ Aᵢ ⊂ A does not equal the
population probability that (Y, δ, X) ∈ Aᵢ, where ∪ᵢAᵢ = A, and Aᵢ and Aⱼ
are disjoint for all i ≠ j. It is helpful to further distinguish between exogenously
stratified and endogenously stratified samples.
In an exogenously stratified sample, selection occurs solely on the X in the
sense that the sample distribution of X does not converge to the population
distribution of X even as the sample size is increased. This may occur because
data are systematically missing for X in certain regions of the support, or more
generally because some subsets of the support of X are oversampled. However,
conditional on X, the sample distribution of (Y, δ | X) converges to the population
distribution. By virtue of the assumed exogeneity of X, such a sampling scheme
creates no special econometric problems.
In an endogenously stratified sample, selection occurs on (Y, δ) (and also
possibly on the X), and the sampling rule is such that the sample distribution of
(Y, δ) does not converge to the population distribution F(Y, δ) (conditional or
unconditional on X). This can occur because data are missing for certain values
of Y or δ (or both), or because some subsets of the support of these random
variables are oversampled. The special case of an endogenously stratified sample
in which, conditional on (Y, δ), the population density of X characterizes the
data, i.e.

$$h(X \mid Y, \delta) = \frac{f(Y, \delta \mid X)\, h(X)}{f(Y, \delta)}, \tag{1.3.4}$$

corresponds to the choice-based samples studied in the discrete choice literature.⁷

⁷Strictly speaking, the choice-based sampling literature focuses on a model in which Y is integrated
out of the model so that δ and X are the relevant random variables.
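The distinction between censored and truncated samples is easy to see in a simulation; all parameter values below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
X = rng.standard_normal(n)
Y = 1.0 + X + rng.standard_normal(n)
Z = 0.5 + X + rng.standard_normal(n)       # latent selection index
d = (Z >= 0).astype(int)

# Censored sample: (Y*, delta, X) recorded for every unit; Y* = 0 when delta = 0.
Y_star = np.where(d == 1, Y, 0.0)

# Truncated sample: a row survives only if it falls in A1 = {delta = 1}.
Y_trunc, X_trunc = Y[d == 1], X[d == 1]

print("population mean of Y: ", Y.mean())
print("truncated-sample mean:", Y_trunc.mean())   # selection shifts the mean up
```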
2. Estimation
⁸We note, however, that it is possible to construct examples of stratified sample selection rules that
cannot be cast in this format. For example, selection rules that weight various strata in different
(nonzero) proportions than the population proportions cannot be cast in the form of selection rule
(1.2.6).
recent work in labor econometrics. The normality assumption has come under
attack in the recent literature because when implications of it have been subject to
empirical test they have often been rejected.
It is essential to separate conceptual ideas that are valid for any index function
model from results special to the normal model. Most of the conceptual frame-
work underlying the normal index model is valid in a general nonnormal setting.
In this section we focus on general ideas and refer the reader to specific papers in
the literature where relevant details of normal models are presented.
For two reasons we do not discuss estimation of index function models by the
method of maximum likelihood. First, once the appropriate densities are derived,
there is little to say about the method beyond what already appears in the
literature. [See Amemiya (1985).] We devote attention to the derivation of the
appropriate densities in Section 3. Second, it is our experience that the conditions
required to secure identification of an index function model are more easily
understood when stated in a regression or method of moments framework.
Discussions of identifiability that appeal to the nonsingularity of an information
matrix have no intuitive appeal and often degenerate into empty tautologies. For
these reasons we focus attention on regression and method of moments proce-
dures.
2.1. Regression function characterizations

A special case of the index function framework set out in Section 1 writes Y and
Z as scalar random variables which are assumed to be linear functions of a
common set of exogenous variables X and unobservables U and V, respectively:⁹

⁹By exogenous variables we mean that X is observed and is distributed independently of (U, V) and
that the parameters of the distribution of X are not functions of the parameters (β, γ) or the
parameters of the distribution of (U, V).
$$Y = X\beta + U, \tag{2.1.1}$$

$$Z = X\gamma + V, \tag{2.1.2}$$

$$\delta = \begin{cases} 1, & \text{if } Z \in \Theta_1; \\ 0, & \text{otherwise.} \end{cases}$$

The regression of Y on X in the selected sample is

$$E(Y \mid \delta = 1, X) = X\beta + M, \tag{2.1.3}$$

where

$$M = M(X\gamma, \psi) = E(U \mid \delta = 1, X),$$

and the regression of the censored variable Y* = δY on X is

$$E(Y^* \mid X) = E(Y^* \mid \delta = 0, X)\Pr(\delta = 0 \mid X) + E(Y^* \mid \delta = 1, X)\Pr(\delta = 1 \mid X) = (X\beta + M)\Pr(\delta = 1 \mid X). \tag{2.1.4}$$
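The practical content of (2.1.3) is that least squares on the selected sample omits the regressor M. A sketch with jointly normal (U, V), where M reduces to a multiple of the inverse Mills ratio; the normality and all parameter values are illustrative assumptions:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
n, beta, gamma, rho = 200_000, 1.0, 1.0, 0.8

V = rng.standard_normal(n)
U = rho * V + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
X = rng.standard_normal(n)
Y = X * beta + U
d = X * gamma + V >= 0                    # delta = 1 iff Z = X*gamma + V >= 0

ols_slope = np.polyfit(X[d], Y[d], 1)[0]  # biased: E(Y | delta=1, X) = X*beta + M
M = rho * norm.pdf(X * gamma) / norm.cdf(X * gamma)   # M under joint normality
print("OLS slope on selected sample:", ols_slope, " true beta:", beta)
print("mean residual after subtracting M:", (Y[d] - X[d] * beta - M[d]).mean())
```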
1932 J. J. Heckmm und T. E. MaCurdy
The density of U conditional on selection is

$$f(U \mid Z \in \Theta_1, X) = \frac{\int_{\Theta_1} f_{uv}(U, z - X\gamma)\, dz}{P_1}, \tag{2.1.5}$$

where

$$P_1 = \int_{\Theta_1} f_v(z - X\gamma)\, dz, \tag{2.1.6}$$

so that

$$M = \frac{\int_{-\infty}^{\infty} u \int_{\Theta_1} f_{uv}(u, z - X\gamma)\, dz\, du}{P_1}, \tag{2.1.7}$$
so X₁ may proxy M.¹⁰

The essential feature of both examples is that in samples selected so that δ = 1,
X is no longer exogenous with respect to the disturbance term U* (= δU).

¹⁰It is not the case that M_{X_1} = ∂M/∂X_1, although the approximation may be very close. See
Byron and Bera (1983).
¹¹Many statisticians implicitly adopt the extreme view that nonworkers come from a different
population than workers and that there is no commonality of decision processes and/or parameter
values in the two populations. In some contexts (e.g. in a single cross section) these two views are
empirically indistinguishable. See the discussion of recoverability in Heckman and Singer (1986).
For the case in which δ = 1 if and only if V ≥ −Xγ, the mean of V in the selected sample is

$$E(V \mid \delta = 1, X) = \frac{\int_{\Theta_1} (z - X\gamma)\, f_v(z - X\gamma)\, dz}{P_1}, \tag{2.1.10}$$

which specializes to

$$E(V \mid \delta = 1, X) = E(V \mid V \ge -X\gamma, X) = \frac{\int_{-X\gamma}^{\infty} v\, f_v(v)\, dv}{P_1}, \tag{2.1.12}$$

and

$$P_1 = \Pr(\delta = 1 \mid X) = \int_{-X\gamma}^{\infty} f_v(v)\, dv = 1 - F_v(-X\gamma). \tag{2.1.13}$$
Truncated means for alternative distributions of the error are as follows:

| Distribution (mean, variance, sign of skewness)ᵉ | Density $f_v(u)$ᵃ | Truncated mean $E(U \mid U \ge -X\gamma)$ᵃ |
|---|---|---|
| Normal (0, 1, 0) | $\phi(u)$ | $\phi(X\gamma)/\Phi(X\gamma)$ |
| Student's tᵇ (0, n/(n−2), 0) | $\frac{\Gamma((n+1)/2)}{\sqrt{n\pi}\,\Gamma(n/2)}\bigl(1 + \frac{u^2}{n}\bigr)^{-(n+1)/2}$ | $\frac{n + (X\gamma)^2}{n-1}\cdot\frac{f_v(X\gamma)}{1 - F_v(-X\gamma)}$ |
| Chi-squareᶜ (0, 2n, +) | $[2^{n/2}\Gamma(n/2)]^{-1}(u+n)^{(n/2)-1}e^{-(u+n)/2}$, for $-n \le u < \infty$ | $\frac{n\Gamma(n/2) - 2G\bigl(\frac{n}{2}+1, \frac{n-X\gamma}{2}\bigr)}{\Gamma(n/2)\,(1 - F_v(-X\gamma))} - n$, for $X\gamma \le n$ |
| Logistic (0, π²/3, 0) | $\frac{e^u}{(1+e^u)^2}$ | $[X\gamma F_v(-X\gamma) - \ln F_v(X\gamma)]/F_v(X\gamma)$ |
| Uniform (0, 1, 0) | $1/\sqrt{12}$, for $|u| \le \sqrt{3}$ | $(\sqrt{3} - X\gamma)/2$, for $|X\gamma| \le \sqrt{3}$ |
| Log-normalᵈ (0, e²−e, +) | $(2\pi)^{-1/2}(e^{1/2}+u)^{-1}e^{-[\ln(e^{1/2}+u)]^2/2}$, for $u \ge -e^{1/2}$ | $e^{1/2}\Bigl[\frac{\Phi(1 - \ln(e^{1/2} - X\gamma))}{\Phi(-\ln(e^{1/2} - X\gamma))} - 1\Bigr]$, for $X\gamma \le e^{1/2}$ |

ᵃThe function $F_v(a) = \int_{-\infty}^{a} f_v(u)\, du$ in these formulae is the cumulative distribution function.
ᵇThe parameter n denotes degrees of freedom. For Student's t, it is assumed that n > 2. The function $\Gamma(a) = \int_0^{\infty} y^{a-1}e^{-y}\, dy$ is the gamma function.
ᶜThe function $G(a, b) = \int_0^{b} y^{a-1}e^{-y}\, dy$ is the incomplete gamma function.
ᵈThe function $\Phi(\cdot)$ represents the standardized normal cumulative distribution function.
ᵉSkewness is defined as mean minus the median.
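Each row of the table can be verified numerically; the sketch below checks the logistic and uniform rows by Monte Carlo (the threshold value is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(5)
n, xg = 2_000_000, 0.5                    # xg stands in for X*gamma (illustrative)

# Logistic row: E(U | U >= -xg) = [xg*F(-xg) - ln F(xg)] / F(xg)
F = lambda u: 1.0 / (1.0 + np.exp(-u))    # logistic c.d.f.
u = rng.logistic(size=n)
print(u[u >= -xg].mean(), (xg * F(-xg) - np.log(F(xg))) / F(xg))

# Uniform row: U ~ Uniform[-sqrt(3), sqrt(3)], so E(U | U >= -xg) = (sqrt(3) - xg)/2
s3 = np.sqrt(3.0)
u = rng.uniform(-s3, s3, size=n)
print(u[u >= -xg].mean(), (s3 - xg) / 2.0)
```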
with

$$M = E(U \mid V \ge -X\gamma, X) = \frac{\int_{-X\gamma}^{\infty} E(U \mid V = v)\, f_v(v)\, dv}{1 - F_v(-X\gamma)}. \tag{2.1.15}$$

This expression does not have a simple analytical solution except in very special
cases. Lee (1982) invokes the assumption that V is a standard normal random
variable, in which case λ(V) = 1 (since μ₀₃ = μ₀₄ − 3 = 0) and the conditional
mean is given by (2.1.17),
where φ(·) and Φ(·) are, respectively, the density function and the cumulative
distribution function associated with a standard normal distribution, and τ₁, τ₂,
and τ₃ are parameters.¹²
¹²The requirement that V is normally distributed is not as restrictive as it may first appear. In
particular, suppose that the distribution of V, F_v(·), is not normal. Defining J(·) as the transformation Φ⁻¹ ∘ F_v, the random variable J(V) is normally distributed with mean zero and a variance equal
to one. Define a new unobserved dependent variable Z₁ by the equation

$$Z_1 = -J(-X\gamma) + J(V). \tag{*}$$

Since J(·) is monotonic, the events Z₁ ≥ 0 and Z ≥ 0 are equivalent. All the analysis in the text
continues to apply if eq. (*) is substituted in place of eq. (2.1.2) and the quantities Xγ and V are
replaced everywhere by −J(−Xγ) and J(V), respectively. Notice that expression (2.1.17) for M
obtained by replacing Xγ by −J(−Xγ) does not arise by making a change of variables from V to
J(V) in performing the integration appearing in (2.1.15). Thus, (2.1.17) does not arise from a
Gram-Charlier expansion of the bivariate density for U and nonnormal V; instead, it is derived from
a Gram-Charlier expansion applied to the bivariate density of U and normal J(V).
An alternative approach specifies the conditional expectation of U given V directly as a finite series

$$E(U \mid V) = \sum_{k=1}^{K} \tau_k\, g_k(V), \tag{2.1.18}$$

where the g_k(·)'s are known functions. The functional form implied for the
selection term is

$$M = E(U \mid V \ge -X\gamma, X) = \sum_{k=1}^{K} \tau_k E\bigl(g_k(V) \mid V \ge -X\gamma, X\bigr) = \sum_{k=1}^{K} \tau_k m_k(X). \tag{2.1.19}$$

Specifying a particular functional form for the g_k's and the marginal distribution
for V produces an entire class of sample selection corrections that includes Lee's
procedure as a special case.
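A sketch of how the terms m_k(X) in (2.1.19) can be computed once g_k and the distribution of V are chosen; the power basis g_k(V) = V^k and the standard normal V are illustrative assumptions:

```python
import numpy as np
from scipy import integrate
from scipy.stats import norm

def m_k(g_k, xg):
    """m_k(X) = E(g_k(V) | V >= -X*gamma) under standard normal V (illustrative)."""
    numerator, _ = integrate.quad(lambda v: g_k(v) * norm.pdf(v), -xg, np.inf)
    return numerator / norm.sf(-xg)

# Illustrative basis functions g_k(V) = V**k for k = 1, 2, 3.
for k in (1, 2, 3):
    print(k, m_k(lambda v, k=k: v**k, xg=0.7))
```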
Cosslett (1984) presents a more robust procedure that can be cast in the format
of eq. (2.1.19). With his methods it is possible to consistently estimate the
distribution of V, the functions m_k, the parameters τ_k, and K, the number of
terms in the expansion. In independent work, Gallant and Nychka (1984) present
a more robust procedure for correcting models for sample selection bias assuming
that the joint density of (U, V) is twice continuously differentiable. Their analysis
does not require specifications like (2.1.8), (2.1.14) or (2.1.18) or prior specifica-
tion of the distribution of V.
Among many possible generalizations of the preceding analysis, one of the most
empirically fruitful considers the situation in which the dependent variable Y is
generated by a different linear equation for each state of the world. This model
includes the "switching regression" model of Quandt (1958, 1972). The occur-
rence of a particular state of the world results from Z falling into one of the
mutually exclusive and exhaustive subsets of Θ, Θᵢ, i = 0, ..., I. The event Z ∈ Θᵢ
signals the occurrence of the ith state of the world. We also suppose that Y is
observed in states i = 1, ..., I and is not observed in state i = 0. In state i > 0, the
equation for Y is

$$Y = X\beta_i + U_i, \tag{2.1.20}$$
where the U,‘s are error terms with E(U,) = 0. Define U = (U,, . . . , U,), and let
f,,,(lJ, V) be the joint density of U and the disturbance V of the equation
determining Z. The value of the discrete dependent variable
if Z E Oj,
(2.1.21)
otherwise,
records whether or not state i occurs. In this notation the equation determining
the censored version of Y may be written as
with
J -mlB,Uifu,u(U,,Z-XXY)dZdUi
(”
M;-E(U,IZEOi,X)= 2 (2.1.24)
pi
where f,,,( . , -) denotes the joint density of U, and I’, and P, = Prob( Z E Oil X) is
the probability that state i occurs.
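A two-state version of this switching model in simulation form, showing that the state-specific error means Mᵢ of (2.1.24) are nonzero when the Uᵢ are correlated with V; all parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100_000
X = rng.standard_normal(n)
beta1, beta2, gamma = 1.0, -0.5, 1.0

V = rng.standard_normal(n)
U1 = 0.6 * V + 0.8 * rng.standard_normal(n)    # correlated with the selection error
U2 = -0.4 * V + 0.9 * rng.standard_normal(n)
Z = X * gamma + V

state1 = Z >= 0                                # Theta_1 = {z >= 0}, Theta_2 otherwise
Y = np.where(state1, X * beta1 + U1, X * beta2 + U2)

# Sample analogues of M_1 and M_2: within-state means of the structural errors.
print("M_1 estimate:", (Y - X * beta1)[state1].mean())
print("M_2 estimate:", (Y - X * beta2)[~state1].mean())
```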
Paralleling the analysis of Section 2.1.2, one can develop explicit specifications
for each selection bias correction term Mᵢ by using formulae such as (2.1.9),
(2.1.14) or (2.1.18). With the convention that Y* = 0 when δ₀ = 1, the regression
functions (2.1.23) can be combined into a single relation.

In the second case considered here, not all states of the world are observed by
the econometrician. It often happens that it is known if Y is observed, and the
where Pᵢ = Prob(Z ∈ Θᵢ | X).¹³ Relation (2.1.26) is the regression of Y on X for
the case in which Y is observed but the particular state occupied by an
observation is not observed.

Using (2.1.22), and recalling that Y* = Y(1 − δ₀) is a censored random variable,
the regression of Y* on X is given by eq. (2.1.27).

If Y is observed for all states of the world, then Y* = Y, δ₀ = 0, and (2.1.26) and
(2.1.27) are identical because the set Θ₀ is the null set, so that P₀ = 0 and
Σᵢ₌₁ᴵ Pᵢ = 1.
Extensions of the basic framework presented above provide a rich structure for
analyzing a wide variety of problems in labor econometrics. We briefly consider
three useful generalizations.

The first relaxes the linearity assumption maintained in the specification of the
equations determining the dependent variables Y and Z. In eqs. (2.1.1) and (2.1.2),
substitute h_Y(X, β) for Xβ and h_Z(X, γ) for Xγ, where h_Y(·, ·) and
h_Z(·, ·) are known nonlinear functions of exogenous variables and parameters.
Modifying the preceding analysis and formulae to accommodate this change in
specification only requires replacing the quantities Xβ and Xγ everywhere by the
functions h_Y and h_Z. A completely analogous modification of the multi-state
model introduces nonlinear specifications for the conditional expectation of Y in
the various states.

¹³In order to obtain (2.1.26) we use the fact that the Θᵢ's are nonintersecting sets, so that
Prob(Z ∈ ∪ᵢ₌₁ᴵ Θᵢ | X) = Σᵢ₌₁ᴵ Pᵢ.
A second generalization extends the preceding framework of Sections 2.1.1-2.1.3
by interpreting Y, Z and the errors U and V as vectors. This extension enables the
analyst to consider a multiplicity of behavioral functions as well as a broad range
of sampling rules. No conceptual problems are raised by this generalization, but
severe computational problems must be faced. Now the sets Θᵢ are multidimensional.
Tallis (1961) derives the conditional means relevant for the linear multivariate
normal model, but it remains a challenge to find other multivariate
specifications that yield tractable analytical results. Moreover, work on estimating
the multivariate normal model has just begun [e.g. see Catsiapsis and Robinson
(1982)]. A current area of research is the development of computationally
tractable specifications for the means of the disturbance vector U conditional on
the occurrence of alternative states of the world.
A third generalization allows the sample selection rule to depend directly on
realized values of Y. For this case, the sets Θᵢ are replaced by the sets Θ̃ᵢ, where
(Y, Z) ∈ Θ̃ᵢ designates the occupation of state i. The integrals in the preceding
formulae are now defined over the Θ̃ᵢ. In place of the expression for the selection
term M in (2.1.7), use the more general formula

M = E(U | (Y, Z) ∈ Θ̃ᵢ, X) = (1/Pᵢ) ∫∫_{Θ̃ᵢ} (y − Xβ) f_{U,V}(y − Xβ, z − Xγ) dz dy,

where

Pᵢ = ∫∫_{Θ̃ᵢ} f_{U,V}(y − Xβ, z − Xγ) dz dy.

The methods available for this two-state model can be directly generalized to more complicated
models.
For the two-state model, expression (2.1.3) implies that the regression equation
for Y conditional on X and δ = 1 is given by

Y = Xβ + M + e,

which, writing the selection correction as M = mτ, becomes

Y = Xβ + mτ + e. (2.1.29)

The implied regression equation for the censored dependent variable Y* = δY is

Y* = (Xβ + mτ)(1 − F_V(−Xγ; ψ)) + ε, (2.1.30)

where ε is a disturbance with E(ε|X) = 0 and we now make explicit the
dependence of F_V on ψ.
The appropriate procedure for estimating the parameters of regression eqs.
(2.1.29) and (2.1.30) depends on the sampling plan that generates the available
data. It is important to distinguish between two types of samples discussed in
Section 1: truncated samples, which include data on Y and X only for observations
for which the value of the dependent variable Y is actually known (i.e.
where Z ≥ 0 for the model under consideration here), and censored samples,
which include data on Y* and X from a simple random sample of δ, X and Y*.

For a truncated sample, nonlinear least squares applied to regression eq.
(2.1.29) can be used to estimate the coefficients β and τ and the parameters γ
and ψ which enter this equation through the function m. More specifically,
defining the function g and the parameter vector θ as g(X, θ) = Xβ + m(Xγ, ψ)τ
and θ′ = (β′, τ′, γ′, ψ′), eq. (2.1.29) can be written as

Y = g(X, θ) + e. (2.1.31)

Define

H = Σ_{n=1}^{N} (∂gₙ/∂θ)(∂gₙ/∂θ)′|_θ̂ and R = Σ_{n=1}^{N} (∂gₙ/∂θ)(∂gₙ/∂θ)′|_θ̂ êₙ², (2.1.32)

where N is the size of the truncated sample, ∂gₙ/∂θ|_θ̂ denotes the gradient
vector of g for the nth observation evaluated at θ̂, and êₙ symbolizes the least
squares residual for observation n. Thus the asymptotic variance-covariance matrix of
the nonlinear least squares estimator θ̂ may be estimated by

V̂(θ̂) = H⁻¹RH⁻¹. (2.1.33)
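A minimal sketch of (2.1.31)-(2.1.33) follows, using an assumed illustrative regression function g in place of the chapter's m-based specification; the data generating process and all numbers are assumptions of the example.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)

# Nonlinear least squares with the heteroscedasticity-robust covariance
# (2.1.32)-(2.1.33), for an assumed illustrative regression function g.
N = 2000
X = rng.normal(size=N)
theta_true = np.array([1.0, 0.5])

def g(X, th):
    # illustrative g(X, theta), standing in for X*beta + m(X*gamma, psi)*tau
    return th[0] * X + np.exp(th[1] * X)

Y = g(X, theta_true) + rng.normal(scale=0.5 + np.abs(X), size=N)  # heteroscedastic e

fit = least_squares(lambda th: Y - g(X, th), x0=np.zeros(2))
th_hat = fit.x
e_hat = Y - g(X, th_hat)

# Gradient of g for each observation, evaluated at the estimates
grad = np.column_stack([X, X * np.exp(th_hat[1] * X)])

H = grad.T @ grad                              # first matrix in (2.1.32)
R = (grad * (e_hat**2)[:, None]).T @ grad      # second matrix in (2.1.32)
V = np.linalg.inv(H) @ R @ np.linalg.inv(H)    # sandwich formula (2.1.33)
print(th_hat, np.sqrt(np.diag(V)))             # estimates and robust std errors
```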
For censored samples, two regression methods are available for estimating the
parameters β, τ, γ, and ψ. First, one can apply the nonlinear least squares
procedure just described to estimate regression eq. (2.1.30). In particular, reinterpreting
the function g as g(X, θ) = [Xβ + m(Xγ, ψ)τ](1 − F_V(−Xγ; ψ)), it is
straightforward to write eq. (2.1.30) in the form of an equation analogous to
(2.1.31) with Y* and ε replacing Y and e. Since the disturbance ε has a zero
mean conditional on X and is distributed independently across the observations
making up the censored sample, under standard regularity conditions nonlinear
least squares applied to this equation yields a consistent estimator θ̂ with a
large-sample normal distribution. To account for potential heteroscedasticity,
compute the asymptotic variance-covariance matrix of θ̂ using the formula in
(2.1.33) with the matrices H and R calculated by summing over the N*
observations of the censored sample.
A second type of regression procedure can be implemented on censored
samples. A two-step procedure can be applied to estimate the equation for Y
given by (2.1.29). In the first step, obtain consistent estimates of the parameters γ
and ψ from a discrete choice analysis which estimates the parameters of P₁. From
these estimates it is possible to consistently estimate m (or the variables in the
vector m). More specifically, define θ₂′ = (γ′, ψ′) as a parameter vector which
uniquely determines m as a function of X. The log likelihood function for the
independently distributed discrete variables δₙ, given Xₙ, n = 1,…,N*, is

Σ_{n=1}^{N*} [δₙ ln(1 − F_V(−Xₙγ; ψ)) + (1 − δₙ) ln(F_V(−Xₙγ; ψ))]. (2.1.34)
Under general conditions [see Amemiya (1985) for one statement of these
conditions], maximum likelihood estimators of γ and ψ are consistent, and with
the maximum likelihood estimates θ̂₂ one can construct m̂ₙ = m(Xₙγ̂, ψ̂) for each
observation. In step two of the proposed estimation procedure, replace the
unobserved variable m in regression eq. (2.1.29) by its constructed counterpart m̂
and apply linear least squares to the resulting equation using only data from the
subsample in which Y and X are observed. Provided that the model is identified,
the second step produces estimators for the parameters θ₁′ = (β′, τ′) that are both
consistent and asymptotically normally distributed.
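The following sketch illustrates the two-step procedure under joint normality, in which case m is the inverse Mills ratio; the simulated data, the normality assumption, and the use of the statsmodels probit routine are all assumptions of the example, not of the chapter.

```python
import numpy as np
from scipy.stats import norm
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Simulate a censored sample: Y observed only when delta = 1.
N = 5000
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta, gamma = np.array([1.0, 2.0]), np.array([0.5, 1.0])
V = rng.normal(size=N)
U = 0.6 * V + rng.normal(scale=0.8, size=N)
d = (X @ gamma + V >= 0).astype(float)    # delta: state indicator
Y = X @ beta + U

# Step 1: probit for gamma (estimates the parameters of P1)
probit = sm.Probit(d, X).fit(disp=0)
idx = X @ probit.params
m_hat = norm.pdf(idx) / norm.cdf(idx)     # constructed m for each observation

# Step 2: linear least squares on the selected subsample, with m_hat added
sel = d == 1
W = np.column_stack([X[sel], m_hat[sel]])
theta1 = np.linalg.lstsq(W, Y[sel], rcond=None)[0]
print(theta1)   # (beta_hat, tau_hat); tau estimates Cov(U, V) = 0.6 here
```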
When calculating the appropriate large-sample covariance matrix for the least
squares estimator θ̂₁, one must account for the fact that in general the disturbances
of the regression equation are heteroscedastic and that the variables m̂
are estimated quantities. A consistent estimator for the covariance matrix which
accounts for both of these features is given by¹⁴

C = Q₁⁻¹(Q₂ + Q₃V̂(θ̂₂)Q₃′)Q₁⁻¹, (2.1.35)

where V̂(θ̂₂) is the estimated covariance matrix of the first-step maximum likelihood
estimator and

Q₁ = Σ_{n=1}^{N} wₙ′wₙ, Q₂ = Σ_{n=1}^{N} wₙ′wₙêₙ², and Q₃ = Σ_{n=1}^{N} wₙ′ ∂eₙ/∂θ₂′|, (2.1.36)

where the row vector wₙ = (Xₙ, m̂ₙ) denotes the regressors for the nth observation,
the variable êₙ symbolizes the least-squares residual, and the row vector
∂eₙ/∂θ₂′| is the gradient of the function eₙ = Yₙ − Xₙβ − mₙτ with respect to γ
and ψ evaluated at the maximum likelihood estimates γ̂ and ψ̂ and at the least
squares estimates β̂ and τ̂, i.e.

∂eₙ/∂θ₂′| = −τ̂′ ∂m̂ₙ′/∂θ₂′. (2.1.37)

¹⁴To derive the expression for the matrix C given by (2.1.35) we use the following result. Let
Lₙ = L(θ₂, Xₙ) denote the nth observation on the gradient of the likelihood function (2.1.34) with
respect to θ₂, with this gradient viewed as a function of the data and the true value of θ₂; and let w₀ₙ
and e₀ₙ be wₙ and eₙ evaluated at the true parameter values. Then E(w₀ₙ′e₀ₙLₙ′ | δₙ = 1, Xₙ) =
w₀ₙ′E(e₀ₙ | δₙ = 1, Xₙ)Lₙ′(δₙ = 1, Xₙ) = 0.
In large samples, then, the two-step estimator is approximately normally distributed:

θ̂₁ ∼ N(θ₁, C). (2.1.38)
Two versions of the dummy endogenous variable model are commonly confused
in the literature: fixed coefficient models and random coefficient models. These
specifications should be carefully distinguished because different assumptions are
required to consistently estimate the parameters of these two distinct models. The
fixed coefficient model requires fewer assumptions.

In the fixed coefficient model

Y = Xβ + δα + U, (2.2.1)

Z = Xγ + V, (2.2.2)

where

δ = 1 if Z ≥ 0, δ = 0 otherwise,
U and V are mean zero random disturbances, and X is exogenous with respect to
U. Simultaneous equation bias is present in (2.2.1) when U is correlated with δ.

In the random coefficient model the effect of δ on Y (holding U fixed) varies in
the population. In place of (2.2.1) we write

Y = Xβ + δ(α + ε) + U, (2.2.3)

where ε is a mean zero error term.¹⁵ Equation (2.2.2) is unchanged except that now V
may be correlated with ε as well as U. The response to δ = 1 differs in the
population, with successively sampled observations assumed to be random draws
from a common distribution for (U, ε, V). In this model X is assumed to be
exogenous with respect to (U, ε). Regrouping terms, specification (2.2.3) may be
rewritten as

Y = Xβ + δα + (U + δε). (2.2.4)

Writing (2.2.3) state by state,

Y = δ(α + Xβ + U + ε) + (1 − δ)(Xβ + U), (2.2.5)

this equation is of the form of multi-state eq. (2.1.22). The equivalence of (2.2.5)
and (2.1.22) follows directly from specializing the multi-state framework so that:
(i) δ₀ ≡ 0 (so that there is no censoring and Y = Y*); (ii) I = 2 (which along with
(i) implies that there are two states); (iii) δ = 1 indicates the occurrence of state 1
and the events δ₁ = 1 and δ₂ = 0 (with 1 − δ = 1 indicating the realization of state
2); and (iv) Xβ₂ = Xβ, U₂ = U, Xβ₁ = Xβ + α, and U₁ = U + ε. In this notation eq.
(2.2.3) may be written as

Y = Xβ₂ + δX(β₁ − β₂) + U₂ + (U₁ − U₂)δ. (2.2.6)

¹⁵Individuals may or may not know their own value of ε. "Randomness" as used here refers to the
econometrician's ignorance of ε.

One empirically fruitful generalization of this model relaxes (iv) by letting both
slope and intercept coefficients differ in the two regimes. Equation (2.2.6) with
condition (iv) modified so that β₁ and β₂ are freely specified can also be used to
represent this generalization.
Fixed coefficient specification (2.2.1) specializes the random coefficient model
further by setting ε ≡ 0, so that U₁ − U₂ = 0 in (2.2.6). In the fixed coefficient model,
U₁ = U₂, so that the unobservables in the state specific eqs. (2.1.20) are identical in
each state. Examples of economic models which produce this specification are
given below in Section 3.2.
The random coefficient and the fixed coefficient models are sometimes confused
in the literature. For example, recent research on the union effects on wage rates
has been unclear about the distinction [e.g. see Freeman (1984)]. Many of the
cross section estimates of the union impact on wage rates have been produced
from the random coefficient model [e.g. see Lee (1978)], whereas most of the recent
longitudinal estimates are based on a fixed coefficient model, or a model that can
be transformed into that format [e.g. see Chamberlain (1982)]. Estimates from
these two data sources are not directly comparable because they are based on
different model specifications.¹⁶
Before we consider methods for estimating both models, we mention one aspect
of model formulation that has led to considerable confusion in the recent
literature. Consider an extension of equation system (2.2.1)-(2.2.2) in which
dummy variables appear on the right-hand side of each equation:

Y = Xβ + α₁δ₂ + U, (2.2.7a)

Z = Xγ + α₂δ₁ + V, (2.2.7b)

where

δ₁ = 1 if Y ≥ 0, δ₁ = 0 otherwise,

and

δ₂ = 1 if Z ≥ 0, δ₂ = 0 otherwise.

Logical consistency of this model requires that

α₁α₂ = 0. (2.2.8)
[See Heckman (1978) or Schmidt (1981).] This assumption, termed the "principal
assumption" in the literature, rules out contradictions such as the possibility that
Y ≥ 0 but δ₁ = 0, or other such contradictions between the signs of the elements
of (Y, Z) and the values assumed by the elements of (δ₁, δ₂).

¹⁶For further discussion of this point, see Heckman and Robb (1985).
The principal assumption is a logical requirement that any well-formulated
behavioral model must satisfy. An apparent source of confusion on this point
arises from interpreting (2.2.7) as well-specified behavioral relationships. In the
absence of a precise specification determining the behavioral content of (2.2.7) it
is incomplete. The principal assumption forces the analyst to estimate a well-
specified behavioral and statistical model. This point is developed in the context
of a closely related model in an appendix to this paper.
E(δ|X) = 1 − F_V(−Xγ). (2.2.10)

Given knowledge of the functional form of F_V, one can estimate (2.2.9) by
nonlinear least squares. The standard errors for this procedure are given by
(2.1.32) and (2.1.33), where the residual in these formulae is defined as
êₙ = Yₙ − Xₙβ̂ − α̂(1 − F_V(−Xₙγ̂)).
One benefit of this direct estimation procedure is that the estimator is consistent
even if δ is measured with error, because measurements on δ are never
directly used in the estimation procedure. Notice that the procedure requires
specification of the distribution of V (or at least its estimation). Specification of
the distribution of U or of the joint distribution of U and V is not required.
2.2.2.3. Invoking a distributional assumption about U. The coefficients of (2.2.1)
can be identified if some assumptions are made about the distribution of U. No
assumption need be made about the distribution of V or its stochastic dependence
with U. It is not required to precisely specify the discrete choice eq. (2.2.2) or to use
the nonlinearities or exclusion restrictions involving exogenous variables which are
utilized in the two estimation strategies just presented. No exogenous variables
need appear in either equation.

If U is normal, α and β are identified given standard rank conditions even if
no regressor appears in the index function equation determining the dummy
variable (2.2.2). Heckman and Robb (1985) establish that if E(U³) = E(U⁵) = 0,
which is implied by, but weaker than, assuming symmetry or normality of U, α
and β are identified even if no regressor appears in the index function (2.2.2). It is
thus possible to estimate (2.2.1) without a regressor in the index function equation
determining δ, and without making any assumption about the marginal distribution
of V, provided that stronger assumptions are maintained about the marginal
distribution of U.
In order to see how identification is secured in this case, consider a simplified
version of (2.2.1) with only an intercept and dummy variable δ:

Y = β₀ + δα + U. (2.2.11)
Under the assumption that E(U³) = E(U⁵) = 0, α may be estimated from the
sample moment conditions

(1/N) Σ_{n=1}^{N} [(Yₙ − Ȳ) − α̂(δₙ − δ̄)]³ = 0, (2.2.12a)

and

(1/N) Σ_{n=1}^{N} [(Yₙ − Ȳ) − α̂(δₙ − δ̄)]⁵ = 0, (2.2.12b)

where Ȳ and δ̄ are the sample means of Y and δ respectively. There is only one
consistent root that satisfies both equations. The inconsistent roots of (2.2.12a) do
not converge to the inconsistent roots of (2.2.12b). Choosing a value of α̂ to
minimize a suitably weighted sum of squared discrepancies from (2.2.12a) and
(2.2.12b) (or choosing any other metric) solves the small sample problem that for
any finite N (2.2.12a) and (2.2.12b) cannot be simultaneously satisfied. For proof
of these assertions and discussion of alternative moment conditions on U to
secure identification of the fixed coefficient model, see Heckman and Robb
(1985).
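As a numerical illustration, the sketch below simulates the simplified model (2.2.11) with a symmetric U (so that E(U³) = E(U⁵) = 0 holds) and recovers α by minimizing a sum of squared discrepancies from (2.2.12a) and (2.2.12b); the data generating process is an assumption of the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate (2.2.11) with delta endogenous (correlated with U through V)
# but U symmetric, so E(U^3) = E(U^5) = 0.
N, beta0, alpha = 50_000, 1.0, 2.0
V = rng.normal(size=N)
delta = (0.5 + V >= 0).astype(float)      # no regressor in the index function
U = 0.8 * V + rng.normal(size=N)          # symmetric U, correlated with delta
Y = beta0 + delta * alpha + U

Yc, dc = Y - Y.mean(), delta - delta.mean()

def crit(a):
    e = Yc - a * dc
    return np.mean(e**3) ** 2 + np.mean(e**5) ** 2   # squared discrepancies

grid = np.linspace(-5.0, 10.0, 3001)                 # crude global search
a_hat = grid[np.argmin([crit(a) for a in grid])]
print(a_hat)   # close to alpha = 2 in large samples
```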
Many of the robust consistent estimators for the fixed coefficient model are
inconsistent when applied to estimate α in the random coefficient model.¹⁷ The
reason this is so is that in general the composite error term of (2.2.4) does not
possess a zero conditional (on X) or unconditional mean. More precisely,
E(δε|X) ≠ 0 and E(δε) ≠ 0 even though E(U|X) = 0 and E(U) = 0.¹⁸ The
instrumental variable estimator of Section 2.2.2.1 is inconsistent because
E(U + δε|X) ≠ 0, and so X and functions of X are not valid instruments. The nonlinear
least squares estimator of Section 2.2.2.2 that conditions on X is also in general
inconsistent. Instead of (2.2.9), the conditional expectation of Y given X for eq.
(2.2.4) is

E(Y|X) = Xβ + α(1 − F_V(−Xγ)) + E(δε|X). (2.2.13)

¹⁷In certain problems the coefficient of interest is α + E(ε|δ = 1). Reparameterizing (2.2.4) to make
this rather than α the parameter of econometric interest effectively converts the random coefficient
model back into a fixed coefficient model when no regressors appear in index function (2.2.2).

¹⁸However, some of the models presented in Section 3.2 have a zero unconditional mean for δε.
This can occur when ε is unknown at the time an agent makes decisions about δ.
Inconsistency of the nonlinear least squares estimator arises because the unobserved
omitted term E(δε|X) is correlated with the regressors in eq. (2.2.9).
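A short simulation makes the point concrete: when V is correlated with ε, the composite error U + δε has a nonzero mean, so instruments built from X fail; all numbers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200_000

# Random coefficient model (2.2.3)-(2.2.4): Y = X*b + d*(alpha + eps) + U,
# with d = 1[X*g + V >= 0] and V correlated with eps.
X = rng.normal(size=N)
eps = rng.normal(size=N)              # person-specific gain from d = 1
V = 0.9 * eps + rng.normal(size=N)    # selection on the gain
d = (0.5 * X + V >= 0).astype(float)

# E(d*eps) = E(eps | d = 1) * P(d = 1) > 0 here, so the composite error
# U + d*eps does not have mean zero and X, f(X) are invalid instruments.
print((d * eps).mean())               # noticeably positive
```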
In the two-regime notation introduced above, the model may be written as

Y = δ(Xβ₁ + U₁) + (1 − δ)(Xβ₂ + U₂). (2.2.14)

Relation (2.1.25) for the multi-state model of Section 2.1 implies that the
regression equation for Y on δ and X is

Y = δ(Xβ₁ + M₁) + (1 − δ)(Xβ₂ + M₂) + e, (2.2.15)

where each selection correction term may be written as

M₁ = m₁τ₁ and M₂ = m₂τ₂, (2.2.16)

and where the functional forms of the elements of the row vectors m₁ and m₂ depend
on the particular specification chosen from Section 2.1.2.¹⁹ Substituting (2.2.16)
into (2.2.15), the regression equation for Y becomes

Y = X₁*β₁ + X₂*β₂ + m₁*τ₁ + m₂*τ₂ + e, (2.2.17)

where

X₁* = δX, X₂* = (1 − δ)X, m₁* = δm₁, and m₂* = (1 − δ)m₂. (2.2.18)

¹⁹Inspection of eq. (2.2.2) and the process generating δ reveals that the events δ = 1 and δ = 0
correspond to the conditions V ≥ −Xγ and V < −Xγ; consequently, the functions M₁ and M₂
have forms completely analogous to the selection correction M whose specification is the topic of
Section 2.1.2.
This section applies the index function framework to specific problems in labor
economics. These applications give economic content to the statistical framework
presented above and demonstrate that a wide range of behavioral models can be
represented as index function models.

Three prototypical models are considered. We first present models with a
"reservation wage" property. In a variety of models for the analysis of unemployment,
job turnover and labor force participation, an agent's decision process can
be characterized by the rule "stay in the current state until an offered wage
exceeds a reservation wage." The second prototype we consider is a dummy
endogenous variable model that has been used to estimate the impact of schooling,
training, occupational choice, migration, unionism and job turnover on
wages. The third model we discuss is one for labor force participation and hours
of work in the presence of taxes and fixed costs of work.
Many models possess a reservation wage property, including models for the
analysis of unemployment spells [e.g. Kiefer and Neumann (1979), Yoon (1981,
1984), Flinn and Heckman (1982)], for labor force participation episodes [e.g.
Heckman and Willis (1977), Heckman (1981), Heckman and MaCurdy (1980),
Killingsworth (1983)], for job histories [e.g. Johnson (1978), Jovanovic (1979),
Miller (1984), Flinn (1984)] and for fertility and labor supply [Moffitt (1984), Hotz
and Miller (1984)]. Agents continue in a state until an opportunity arises (e.g. an
offered wage) that exceeds the reservation wage for leaving the state currently
occupied. The index function framework has been used to formulate and estimate
such models.
where U_L(·) and U_C(·) denote partial derivatives. The market wage W(t) is
assumed to be known to the agent, but it is observed by the econometrician only if
the agent works.

In terms of the index function apparatus presented in Section 1, if Z(t) ≥ 0 the
agent works, δ(t) = 1, and the wage rate W(t) is observed. Thus the observed wage
is a censored random variable.

Setting λ(t) = exp{X(t)β₂ + ε(t)}, where ε(t) is a mean zero disturbance, the
reservation wage is

ln W_r(t) = X(t)β₂ + ln(γ/α) + (1 − α) ln R(t) + ε(t).

Market wage rates are written as

ln W(t) = X(t)β₁ + U(t).

Define an index function for this example as Z(t) = ln W(t) − ln W_r(t), so that

Z(t) = X(t)(β₁ − β₂) − ln(γ/α) − (1 − α) ln R(t) + V(t),

where V(t) = U(t) − ε(t). The wage equation and its censored counterpart are

Y(t) = ln W(t) = X(t)β₁ + U(t),

Y*(t) = Y(t)δ(t) = δ(t)X(t)β₁ + δ(t)U(t),

and the probability of working is

Pr(δ(t) = 1 | X(t), R(t)) = 1 − G([ln(γ/α) + (1 − α) ln R(t) − X(t)(β₁ − β₂)]/σ_V),

where G is the distribution function of V(t)/σ_V.
The index function model provides the framework required to give econometric
content to the conventional model of search unemployment. As in the labor force
participation example just presented, agents continue on in a state of search
unemployment until they receive an offered wage that exceeds their reservation
wage. Accepted wages are thus censored random variables. The only novelty in
the application of the index function to the unemployment problem is that a
different economic theory is used to produce the reservation wage.²⁰

In the most elementary version of the search model, agents are income
maximizers. An unemployed agent's decision problem is very simple. If cost c is
incurred in a period, the agent receives a job offer, but the wage that comes with
the offer is unknown before the offer arrives. This uncertainty is fundamental to
the problem. Successive wage offers are assumed to be independent realizations from
a known absolutely continuous wage distribution F(W) with E|W| < ∞. Assuming
a positive real interest rate r, no search on the job, and jobs that last forever
(so there is no quitting from jobs), Lippman and McCall (1976) show that the
value of search at time t, V(t), is implicitly determined by the functional equation

V(t) = −c + (1/(1 + r)) E max{W/r, V(t + 1)}. (3.1.4)
The reservation wage is W_r = rV, which by substitution satisfies

W_r/r = −c + (1/(1 + r)) E max{W/r, W_r/r}. (3.1.5)

This function clearly depends on c, r and the parameters of the wage offer
distribution. Conventional exclusion restrictions of the sort invoked in the labor
force participation example presented in the previous section cannot be invoked
for this model.

Solving (3.1.5) for W_r = rV and inserting the function so obtained into eqs.
(3.1.1) and (3.1.2) produces a statistical model that is identical to the deterministic
labor force participation model.
Except for special cases for F, closed form expressions for W_r are not
available.²¹ Consequently, structural estimation of these models requires numerical
evaluation of implicit functions (like V(t) in (3.1.4)) as input to the evaluation of
sample likelihoods. To date, these computational problems have inhibited wide-scale
use of structural models derived from dynamic optimizing theory and have
caused many analysts to adopt simplifying approximations.²²

²⁰The reservation wage property characterizes other models as well. See Lippman and McCall
(1976).

²¹See Yoon (1981) for an approximate closed form expression of W_r.
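To illustrate the numerical problem, the sketch below computes a reservation wage by fixed-point iteration for one standard stationary formulation of the income-maximizing search problem; the timing conventions, the lognormal offer distribution and the parameter values are assumptions of the example, not the chapter's specification.

```python
import numpy as np
from scipy.stats import lognorm

# One stationary formulation (timing conventions are assumptions): paying
# cost c yields one offer next period; an accepted wage w is earned forever,
# with per-period discount factor beta = 1/(1+r).
c, r = 0.5, 0.05
beta = 1.0 / (1.0 + r)
F = lognorm(s=0.4, scale=1.0)          # assumed wage offer distribution

# Discretize the offer distribution for the expectation.
w = np.linspace(F.ppf(1e-6), F.ppf(1 - 1e-6), 20_000)
p = np.diff(F.cdf(np.concatenate(([F.ppf(1e-7)], w))))
p /= p.sum()

V = 0.0
for _ in range(10_000):                # fixed-point iteration on the value of search
    V_new = -c + beta * np.sum(np.maximum(w / (1 - beta), V) * p)
    if abs(V_new - V) < 1e-12:
        break
    V = V_new

W_r = (1 - beta) * V                   # reservation wage: indifference w/(1-beta) = V
print(W_r)
```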
The density of accepted wages is

g(w*) = f(w*)/(1 − F(W_r)), w* ≥ W_r. (3.1.6)

The probability that a spell of search lasts exactly j periods is

Pr(D = j) = [F(W_r)]^{j−1}(1 − F(W_r)), j = 1, 2,…. (3.1.7)

The joint density of durations and accepted wages is the product of (3.1.6) and
(3.1.7), or

h(w*, j) = [F(W_r)]^{j−1} f(w*), (3.1.8a)

where

w* ≥ W_r. (3.1.8b)
²²Coleman (1984) presents indirect reduced form estimation procedures which offer a low cost
alternative to costly direct maximum likelihood procedures. Flinn and Heckman (1982), Miller (1985),
Wolpin (1984), and Rust (1984) discuss explicit solutions to such dynamic problems. Kiefer and
Neumann (1979), Yoon (1981, 1984), and Hotz and Miller (1984) present approximate solutions.

²³A potential source of such restrictions makes r and c known functions of exogenous variables.

²⁴Kiefer and Neumann (1979) achieve identification in this manner.

To account for population variation in reservation wages, suppose that W_r depends
on a random variable η, with

0 ≤ W_r(η) ≤ w*; (3.1.9)

i.e. η is now restricted to produce a nonnegative reservation wage that is less than
(or equal to) the offered accepted wage. Modifying density (3.1.8a) to reflect this
dependence and letting ψ(η) be the density of η leads to

h(w*, j) = ∫ 1[0 ≤ W_r(η) ≤ w*] [F(W_r(η))]^{j−1} f(w*) ψ(η) dη. (3.1.10)
The index function model can also be used to provide a precise econometric
framework for models of on-the-job learning and job turnover developed by
Johnson (1978), Jovanovic (1979), Flinn (1984) and Miller (1985). In this class of
models, agents learn about their true productivity on a job by working at the job.
We consider the most elementary version of these models and assume that
workers are paid their realized marginal product, but that this product is due, in
part, to random factors beyond the control of the agent. Agents learn about their
true productivity by a standard Bayesian learning process. They have beliefs
about the value of their alternatives elsewhere. Ex ante all jobs look alike in the
simplest model and have value V₀.

The value of a job which currently pays wage W(t) in the tth period on the job
is V(W(t)). An agent's decision at the end of period t, given W(t), is to decide
whether to stay on the job the next period or to go on to pursue an alternative
opportunity. In this formulation, assuming no cost of mobility and a positive real
interest rate r, the relevant comparison is between V₀ and E_t V(W(t + 1)),²⁵
where the expectation is taken with respect to the distribution induced by the
information available in period t, which may include the entire history of wage
payments on the job. If V₀ > E_t V(W(t + 1)) the agent changes jobs. Otherwise,
he continues on the job for one more period.²⁶

This setup can be represented by an index function model. Wages are observed
at a job in period t + 1 if E_t(V(W(t + 1))) ≥ V₀.

²⁵See Flinn and Heckman (1982) for further discussion of this point.

²⁶For further discussion of identification in this model, see Flinn and Heckman (1982).
In this subsection we consider some examples of well posed economic models that
can be cast in terms of the dummy endogenous variable framework presented in
Section 2.2. We consider fixed and random coefficient versions of these models for
both certain and uncertain environments. We focus only on the simplest models
in order to convey essential ideas.²⁷

Suppose earnings W(t) are given by

W(t) = X(t)β + δα + U(t), t > k,
W(t) = X(t)β + U(t),      t ≤ k. (3.2.1)
In writing this equation, we suppose that all individuals have access to training
at only one period in their life (period k) and that anyone can participate in
training if he or she chooses to do so. However, once the opportunity to train has
passed, it never reoccurs. Training takes one period to complete.²⁸

²⁷Miller (1984) provides a discussion and an example of estimation of this class of models.

Income maximizing agents are assumed to discount all earnings streams by a
common discount factor 1/(1 + r). From (3.2.1), training raises earnings by an
amount α per period. While taking training, the individual receives subsidy S,
which may be negative (e.g. tuition payments). Income in period k is foregone
for trainees. To simplify the algebra we assume that people live forever.

As of period k, the present value of earnings for an individual who receives
training is

PV(1) = S + Σ_{j=1}^{∞} [X(k + j)β + α + U(k + j)]/(1 + r)^j,

while for an individual who does not it is

PV(0) = W(k) + Σ_{j=1}^{∞} [X(k + j)β + U(k + j)]/(1 + r)^j.

The present value maximizing enrollment rule has a person enroll in the program
if PV(1) > PV(0). Letting Z be the index function for enrollment,

Z = PV(1) − PV(0) = S − W(k) + α/r, (3.2.2)
and

δ = 1 if S − W(k) + α/r > 0, δ = 0 otherwise. (3.2.3)
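In code, the enrollment rule (3.2.2)-(3.2.3) is a one-line comparison; the numbers below are purely illustrative.

```python
# Enrollment rule (3.2.2)-(3.2.3) with illustrative values.
S, W_k, alpha, r = 2.0, 20.0, 1.5, 0.10
Z = S - W_k + alpha / r        # PV(1) - PV(0) = 2 - 20 + 15 = -3
delta = 1 if Z > 0 else 0
print(Z, delta)                # Z < 0, so this person does not enroll
```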
Because W(k) is not observed for trainees, it is convenient to substitute for W(k)
in (3.2.2) using (3.2.1). In addition, some components of subsidy S may not be
observed by the econometrician. Suppose

S = Qφ + η, (3.2.4)

where Q is a vector of observed variables and η is an unobserved mean zero
disturbance. Substituting (3.2.4) and (3.2.1) into (3.2.3), we have

δ = 1 if Qφ + η − X(k)β + α/r − U(k) > 0, δ = 0 otherwise. (3.2.5)

²⁸The assumption that enrollment decisions are made solely on the basis of an individual's choice
process is clearly an abstraction. More plausibly, the training decision is the joint outcome of decisions
taken by the prospective trainee, the training agency and other agents. See Heckman and Robb (1985)
for a discussion of more general models.

If the effect of training varies among individuals, the earnings equation for t > k
becomes

W(t) = X(t)β + δ(α + ε) + U(t) = X(t)β + δα + U(t) + δε, (3.2.6)
using the notation of eq. (2.2.3). This model captures the notion of a variable
effect of training (or unionism, or migration, or occupational choice, etc.) on
earnings.
If agents know ε when they make their decisions about δ, the following
modification of (3.2.5) characterizes the decision process:

δ = 1 if Qφ + η − X(k)β + α/r − U(k) + ε/r > 0, δ = 0 otherwise. (3.2.7)
The fact that ε appears in the disturbance terms in (3.2.6) and (3.2.7) creates
another source of covariance between δ and the error term in the earnings
equation that is not present in the fixed coefficient dummy endogenous variable
model.

The random coefficient model captures the key idea underlying the model of
self selection introduced by Roy (1951) that has been revived and extended in
recent work by Lee (1978) and Willis and Rosen (1979). In Roy's model, it is
solely population variation in X(k), ε, and U(k) that determines δ (so η = 0 and
Qφ = 0 in (3.2.7)).³⁰
As noted in Section 2, the fixed coefficient and random coefficient dummy
endogenous variable models are frequently confused in the literature. In the
context of studies of the union impact on wages, Robinson and Tomes (1984)
find that a sample selection bias correction (or M-function) estimator of α and an
instrumental variable estimator produce virtually the same estimate of the coefficient.
As noted in Section 2.2.3, the instrumental variable estimator is inconsistent
for the random coefficient model while the sample selection bias estimator is not.
Both are consistent for α in the fixed coefficient model. The fact that the same
estimate is obtained from the two different procedures indicates that a fixed
coefficient model of unionism describes their data. (It is straightforward to
develop a statistical test that discriminates between these two models based
on this principle.)
When agents do not know W(k) at the time they decide on training, the
enrollment rule is based on expectations formed in period k − 1:³¹

δ = 1 if E_{k−1}[S − W(k) + α/r] > 0, δ = 0 otherwise. (3.2.8)

³⁰For further discussion of this model and its applications, see Heckman and Sedlacek (1985).

³¹In the more general case in which future earnings are not known, the optimal forecasting rule for
W(k) depends on the time series process generating U(t). For an extensive discussion of more general
decision processes under uncertainty see Heckman and Robb (1985). An uncertainty model provides
yet another rationalization for the results reported in Robinson and Tomes (1984).
The index function framework has found wide application in the recent empirical
literature on labor supply. Because this work is surveyed elsewhere [Heckman and
MaCurdy (1981) and Moffitt and Kehrer (1981)], our discussion of this topic is
not comprehensive. We briefly review how recent models of labor supply dealing
with labor force participation, fixed costs of work, and taxes can be fit within the
general index function framework.
We initially consider a simple model of hours of work and labor force participation
that ignores fixed costs and taxes. Let W be the wage rate facing a consumer,
let C be a Hicks composite commodity of goods, and let L be a Hicks composite
commodity of nonmarket time. The consumer's strictly quasi-concave preference
function, defined over C, L and a taste shifter v, may be solved for the marginal
rate of substitution function MRS, where H is hours of work and C = R + MRS·H.
In equilibrium the wage equals MRS. The reservation wage is MRS(R, 0, v). The
consumer works if

W > MRS(R, 0, v); (3.3.2)

otherwise, he does not. If condition (3.3.2) is satisfied, the labor supply function is
determined by solving the equation MRS(R, H, v) = W for H to obtain

H = H(R, W, v). (3.3.3)

Consider a population of consumers who all face wage W and receive unearned
income R but who have different v's. The density k(v|W) is the conditional
density of "tastes for work" over the population with a given value of W. Letting
Γ_W denote the subset of the support of v which satisfies MRS(R, 0, v) < W for a
given W, the fraction of the population that works is

Pr(δ = 1|W, R) = ∫_{Γ_W} k(v|W) dv. (3.3.4)
Suppose the marginal rate of substitution function is loglinear:

ln MRS(R, H, v) = α₀ + α₁R + α₂X₂ + α₃H + v, (3.3.7)

where v is a mean zero, normally distributed error term. Market wage rates are
written as

ln W = β₀ + β₁X₁ + η, (3.3.8)

where η is a normally distributed error term with zero mean. Equating (3.3.7) and
(3.3.8) for equilibrium hours of work for those observations satisfying ln W >
ln MRS(R, 0, v), one obtains

H = (1/α₃)[ln W − ln MRS(R, 0, v)]
  = (1/α₃)(β₀ − α₀ + β₁X₁ − α₁R − α₂X₂) + (1/α₃)(η − v). (3.3.9)
In terms of the conceptual apparatus of Sections 1 and 2, one can interpret this
labor supply model as a two-state model. State 0 corresponds to the state in which
the consumer does not work, which we signify by setting the indicator variable
δ = 0. When δ = 1, a consumer works and state 1 occurs. Two index functions
characterize the model, where Y′ = (Y₁, Y₂) is a two element vector with
Y₁ = H and Y₂ = ln W. The consumer works (δ = 1) when (Y₁, Y₂) ∈ Ω₁, where
Ω₁ = {(Y₁, Y₂) | Y₁ > 0, −∞ ≤ Y₂ ≤ ∞} is a subset of the support of (Y₁, Y₂). Note
that the exogenous variables X include X₁, X₂ and R. The joint distribution of the
errors v and η induces a joint distribution f(y₁, y₂|X) for Y via eqs. (3.3.8) and
(3.3.9). Letting Y* = δY denote the observed value of Y, Y₁* = H* represents a
consumer's actual hours of work and Y₂* equals ln W when the consumer works
and equals zero otherwise.
By analogy with eq. (1.2.8), the joint density of hours and wages conditional on
X and working is given by

g(y*|δ = 1, X) = f(y₁*, y₂*|X) / ∫∫_{Ω₁} f(y₁, y₂|X) dy₁ dy₂. (3.3.10)
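Under the normality assumptions of (3.3.7)-(3.3.9), the participation probability and the mean of hours among workers have simple closed forms; the sketch below evaluates them for assumed parameter values (all numbers are illustrative).

```python
import numpy as np
from scipy.stats import norm

# Hypothetical parameter values for the model (3.3.7)-(3.3.9); all numbers
# are illustrative assumptions, not estimates from the chapter.
a0, a1, a2, a3 = 0.2, 0.01, 0.3, 0.05   # ln MRS coefficients
b0, b1 = 1.0, 0.5                        # ln W coefficients
s_v, s_eta = 0.6, 0.4                    # std devs of v and eta (independent here)

X1, X2, R = 1.0, 0.5, 10.0
mu = b0 - a0 + b1 * X1 - a1 * R - a2 * X2   # mean of ln W - ln MRS(R, 0, v)
s = np.hypot(s_eta, s_v)                     # std dev of eta - v

# Participation probability: Pr(ln W > ln MRS(R, 0, v)), as in (3.3.4)
p_work = 1.0 - norm.cdf(-mu / s)

# Expected hours among workers, from H = (mu + eta - v)/a3 in (3.3.9) and
# the truncated-normal mean E(eta - v | eta - v > -mu):
lam = norm.pdf(-mu / s) / (1.0 - norm.cdf(-mu / s))  # inverse Mills ratio
EH_workers = (mu + s * lam) / a3
print(p_work, EH_workers)
```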
3.3.2. A general model of labor supply with fixed costs and taxes

In this section we extend the simple model presented above to incorporate fixed
costs of work (such as commuting costs) and regressive taxes. We present a
general methodology for analyzing cases in which marginal comparisons do not fully
characterize labor supply behavior. We synthesize the suggestions of Burtless and
Hausman (1978), Hausman (1980), Wales and Woodland (1979), and Cogan
(1981).

Fixed costs of work or regressive taxes produce a nonconvex budget constraint.
Figure 1 depicts the case considered here.³² This figure represents a situation in
which a consumer must pay a fixed money cost equal to F in order to work. R₁ is
his nonlabor income if he does not work.

³²Generalization to more than two branches involves no new principle. Constraint sets like R₂SN
are alleged to be common in negative income tax experiments and in certain social programs.
[Figure 1. The budget constraint with fixed costs of work and regressive taxes: hours worked on the horizontal axis, consumption on the vertical axis, with states 1, 2 and 3 marked.]
A marginal tax rate t_A applies to the branch R₂S, defined up to H̄ hours, and a lower
marginal rate t_B applies to branch SN.
Assuming that no one would ever choose to work T or more hours, a consumer
facing this budget set may choose to be in one of three possible states of the
world: the no work position at kink point R₁ (which we define as state 1), or an
interior equilibrium on either segment R₂S or segment SN (defined as states 2
and 3, respectively).³³ A consumer in state 1 receives initial after-tax income R₁.
In state 2, a consumer receives unearned income R₂ and works at an after-tax
wage rate equal to W₂ = W(1 − t_A), where W is the gross wage. A consumer in
state 3 earns after-tax wage rate W₃ = W(1 − t_B) and can be viewed as receiving
the equivalent of R₃ as unearned income. Initially we assume that W is exogenous
and known for each consumer.

In the analysis of kinked, nonconvex budget constraints, a local comparison
between the reservation wage and the market wage does not adequately characterize
the work-no work decision as it did in the model of Section 3.3.1. Due to
the nonconvexity of the constraint set, existence of an interior solution on a
branch does not imply that equilibrium will occur on that branch. Thus in Figure
1, point B associated with indifference curve U₁ is a possible interior equilibrium
on branch R₂S that is clearly not the global optimum.

³³The kink at S is not treated as a state of the world because preferences are assumed to be twice
differentiable and quasiconcave.
Denote the consumer's indirect utility function by V(R, W, v), and let labor
supply be generated by

H = V_W / V_R = H(R, W, v).

While the arguments of the functions U(·), V(·), and H(·) may differ across
consumers, the functional forms are assumed to be the same for each consumer.

If a consumer is at an interior equilibrium on either segment R₂S or SN, then
the equilibrium is defined by a tangency of an indifference curve and the budget
constraint. Since this tangency indicates a point of maximum attainable utility,
the indifference curve at this point represents a level of utility given by V(Rᵢ, Wᵢ, v),
where Rᵢ and Wᵢ are, respectively, the after-tax unearned income and wage rate
associated with segment i. Thus, hours of work for an interior equilibrium are
given by V_W/V_R evaluated at Rᵢ and Wᵢ. For this candidate equilibrium to be
admissible, the implied hours of work must lie between the two endpoints of the
interval (i.e. equilibrium must occur on the budget segment). A consumer does
not work if utility at kink R₁, U(R₁, T, v), is greater than both V(R₂, W₂, v) and
V(R₃, W₃, v), provided that these latter utility values represent admissible solutions
located on the budget constraint.
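The global comparison just described is easy to mechanize: compute the candidate interior solutions on each segment, discard inadmissible ones, and compare utilities including the kink. The sketch below does this for an assumed log-linear utility; the functional form and all numbers are assumptions of the example.

```python
import numpy as np

W, F, tA, tB, T, Hbar = 10.0, 15.0, 0.4, 0.2, 100.0, 40.0
R1 = 50.0                      # income at the no-work kink
R2 = R1 - F                    # virtual nonlabor income on branch R2S
W2 = W * (1 - tA)
W3 = W * (1 - tB)
R3 = R2 + (W2 - W3) * Hbar     # virtual income that extends branch SN to H = 0

def utility(C, L, v=0.0):
    # assumed log-linear preferences U(C, L, v)
    return np.log(C) + (1.5 + v) * np.log(L)

def interior_hours(R, w, v=0.0):
    # Tangency MRS = w for the utility above, with C = R + w*H and L = T - H
    b = 1.5 + v
    return (w * T - b * R) / (w * (1 + b))

candidates = [(0.0, utility(R1, T))]                 # state 1: the kink
H2 = interior_hours(R2, W2)
if 0 < H2 <= Hbar:                                   # admissible on R2S?
    candidates.append((H2, utility(R2 + W2 * H2, T - H2)))
H3 = interior_hours(R3, W3)
if Hbar < H3 < T:                                    # admissible on SN?
    candidates.append((H3, utility(R3 + W3 * H3, T - H3)))

H_star = max(candidates, key=lambda c: c[1])[0]      # global optimum
print(H_star)
```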
More specifically, define the labor supply functions H₍₁₎, H₍₂₎, and H₍₃₎ as
H₍₁₎ ≡ 0 and

H₍₂₎ = H(R₂, W₂, v) and H₍₃₎ = H(R₃, W₃, v).

We assume that U(·) is chosen so that V(·) > 0 for all C, L, and v. A consumer
occupies state i when v lies in the corresponding subset Γᵢ of the support of v.
The sets Γ₁, Γ₂, and Γ₃ do not intersect, and their union is the relevant subspace
of the support of v. These sets are thus mutually exclusive.³⁴ The functions H₍ᵢ₎
determine the hours of work for individuals for whom v ∈ Γᵢ.

Choosing a specification for the preference function and a distribution G(v) for
"tastes" in the population produces a complete statistical characterization
of labor supply behavior. The probability that a consumer is in state i is

Pr(v ∈ Γᵢ) = ∫_{Γᵢ} dG(v). (3.3.19)
We have thus far assumed: (i) that data on potential wage rates are available
for all individuals, including nonworkers, and (ii) that wage rates are exogenous
variables. Relaxing these assumptions does not raise any major conceptual
problems and makes the analysis relevant to a wider array of empirical situations.
Suppose that market wage rates are described by the function

W = W(X, η). (3.3.22)

The sets

Γ̃ᵢ = {(v, η) | V₍ᵢ₎ ≥ V₍ⱼ₎ for all j}, (3.3.23)

where V₍ᵢ₎ denotes the utility attained in state i
(recall that equality holds on a set of measure zero), replace the characterization of
the sets Γᵢ for known wages given by (3.3.16)-(3.3.18). A consumer for whom
(v, η) ∈ Γ̃ᵢ occupies state i. The probability of such an event is

Pr((v, η) ∈ Γ̃ᵢ) = ∫∫_{Γ̃ᵢ} dG(v, η), (3.3.25)

where G(v, η) is the joint distribution of v and η.

³⁴Certain values for v may be excluded if they imply such phenomena as negative values of U or V
or nonconvex preferences. In this case we use the conditional density of v excluding those values.
Thus far we have assumed that hours of work and wages are not measured with
error. The modifications required in the preceding analysis to accommodate
measurement error are presented in Heckman and MaCurdy (1981).

To illustrate the required modifications when measurement error is present,
suppose that we express the model in terms of v and η and that errors in the
variables plague the available data on hours of work. When H > 0, suppose that
measured hours, which we denote by H⁺, are related to true hours by the
equation H⁺ = H + e, where e is a measurement error distributed independently
of the explanatory variables X. When such errors in variables are present, data on
hours of work (i.e. H⁺ when H > 0 and H when H = 0) do not allocate working
individuals to the correct branch of the budget constraint. Consequently, the
state of the world a consumer occupies can no longer be directly observed.

This model translates into a three index function model of the sort described in
Section 1.2. Two index functions, Y′ = (Y₁, Y₂) = (H, W), are observed in some
states, and one index function, Z = v, is never directly observed. Given an
assumption about the joint distribution of the random errors v, η, and e, a
transformation from these errors to the variables Y, W, and H⁺ using eq. (3.3.13)
and the relation H⁺ = H(R, W, v) + e produces a joint density function f(Y, Z).
There are three states of the world in this model (so I = 3 in the notation of
Section 1.2). The ith state occurs when δᵢ = 1, which arises if (Y, Z) ∈ Ωᵢ.
Y is observed in the work states 2 and 3, but not when δ₁ = 1. Thus, adopting the
convention of Section 1, the observed version of Y is given by Y* = (δ₂ + δ₃)Y.
In this notation, the appropriate density functions for this model are given by
formulae (1.2.12) and (1.2.13) with δ = δ₂ + δ₃ and δ₀ = δ₁.
4. Summary

This paper presents and extends the index function model of Karl Pearson (1901)
that underlies all recent models in labor econometrics. In this framework,
censored, truncated and discrete random variables are interpreted as the manifestations
of various sampling schemes for underlying index function models. A
unified derivation of the densities and regression representations for index function
models is presented.
Z₁ = Xγ₁ + α₁δ₂ + V₁, (A.1a)

Z₂ = Xγ₂ + α₂δ₁ + V₂, (A.1b)

with

Z₁ ≥ 0 iff δ₁ = 1, Z₁ < 0 iff δ₁ = 0,
Z₂ ≥ 0 iff δ₂ = 1, Z₂ < 0 iff δ₂ = 0,

and the condition

α₁α₂ = 0. (A.2)

Utilities of the four possible outcomes are written as

U(δ₁, δ₂) = η₁δ₁ + η₂δ₂ + η₃δ₁δ₂ + ε₁δ₁ + ε₂δ₂ + ε₃δ₁δ₂, (A.3)

where (η₁, η₂, η₃) is a vector of parameters and (ε₁, ε₂, ε₃) is a vector of mean zero
continuous unobserved random variables.

The outcome δ₁ = 1 of the choice process arises if either U(1,1) or U(1,0) is
maximal in the choice set (i.e. max(U(1,1), U(1,0)) ≥ max(U(0,1), U(0,0))). For
a separable model with no interactions (η₃ = 0 and ε₃ = 0), this condition can be
stated as η₁ + ε₁ ≥ 0. Setting η₁ = Xγ₁, α₁ = 0 and ε₁ = V₁ produces eq. (A.1a).
Condition (A.2) is satisfied. By a parallel argument for δ₂, (A.1b) is produced.
Condition (A.2) is satisfied because both α₁ = 0 and α₂ = 0.

For a general nonseparable choice problem (η₃ ≠ 0 or ε₃ ≠ 0 or both), equation
system (A.1) still represents the choice process, but once more α₁ = α₂ = 0. For
example, suppose that ε₃ = 0. In this case the probability that δ₁ = 1 can still be
represented by eq. (A.1) with α₁ = 0, but the distribution of (V₁, V₂) is of a
different functional form than is the distribution of (ε₁, ε₂).
In this example there is genuine interaction in the utility of outcomes, and eqs.
(A.1) still characterize the choice process. The model satisfies condition (A.2).
Even if α₁ = α₂ = 0, there is genuine simultaneity in choice.

Unconditional representation (A.1) (with α₁ ≠ 0 or α₂ ≠ 0) sometimes characterizes
a choice process of interest and sometimes does not. Often the partitions of
the support of (V₁, V₂) required to define δ₁ and δ₂ are not rectangular, and so the
unconditional representation of the choice process with α₁ ≠ 0 or α₂ ≠ 0 is not
appropriate; but any well-posed simultaneous choice process can be represented
by equation system (A.1).
An apparent source of confusion arises from interpreting (A.1) as a well-specified
behavioral relationship. Thus it might be assumed that the utility of
agent 1 depends on the actions of agent 2, and vice versa. In the absence of any
behavioral mechanism for determining the precise nature of the interaction
between the two actors (such as (A.3)), the model is incomplete. Assuming that player
1 is dominant (so α₂ = 0) is one way to supply the missing behavioral relationship.
(Dominance here means that player 1 temporally has the first move.) Another
way to complete the model is to postulate a dynamic sequence so that current
utilities depend on previous outcomes (so α₁ = α₂ = 0; see Heckman (1981)).
Bjorn and Vuong (1984) complete the model by suggesting a game theoretic
relationship between the players. In all of these completions of the model, (A.2) is
satisfied.
References

Abowd, J. and H. Farber (1982) "Job Queues and the Union Status of Workers", Industrial and Labor Relations Review, 35, 354-367.
Amemiya, T. (1985) Advanced Econometrics. Harvard University Press, forthcoming.
Bishop, Y., S. Fienberg and P. Holland (1975) Discrete Multivariate Analysis. Cambridge: MIT Press.
Bock, R. D. and L. Jones (1968) The Measurement and Prediction of Judgment and Choice. San Francisco: Holden-Day.
Borjas, G. and S. Rosen (1981) "Income Prospects and Job Mobility of Younger Men", in: R. Ehrenberg, ed., Research in Labor Economics. London: JAI Press, 3.
Burtless, G. and J. Hausman (1978) "The Effect of Taxation on Labor Supply: Evaluating the Gary Negative Income Tax Experiment", Journal of Political Economy, 86(6), 1103-1131.
Byron, R. and A. K. Bera (1983) "Least Squares Approximations to Unknown Regression Functions: A Comment", International Economic Review, 24(1), 255-260.
Cain, G. and H. Watts, eds. (1973) Income Maintenance and Labor Supply. Chicago: Markham.
Catsiapsis, B. and C. Robinson (1982) "Sample Selection Bias with Multiple Selection Rules: An Application to Student Aid Grants", Journal of Econometrics, 18, 351-368.
Chamberlain, G. (1982) "Multivariate Regression Models for Panel Data", Journal of Econometrics, 18, 5-46.
Cogan, J. (1981) "Fixed Costs and Labor Supply", Econometrica, 49(4), 945-963.
Coleman, T. (1981) "Dynamic Models of Labor Supply". University of Chicago, unpublished manuscript.
Coleman, T. (1984) "Two Essays on the Labor Market". University of California, unpublished Ph.D. dissertation.
Cosslett, S. (1984) "Distribution-Free Estimator of Regression Model with Sample Selectivity". University of Florida, unpublished manuscript.
Eicker, F. (1963) "Asymptotic Normality and Consistency of the Least Squares Estimators for Families of Linear Regressions", Annals of Mathematical Statistics, 34, 446-456.
Eicker, F. (1967) "Limit Theorems for Regressions with Unequal and Dependent Errors", in: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability. Berkeley: University of California Press, 1, 59-82.
Flinn, C. (1984) "Behavioral Models of Wage Growth and Job Change Over the Life Cycle". University of Chicago, unpublished Ph.D. dissertation.
Flinn, C. and J. Heckman (1982) "New Methods for Analyzing Structural Models of Labor Force Dynamics", Journal of Econometrics, 18, 115-168.
Freeman, R. (1984) "Longitudinal Analysis of the Effects of Trade Unions", Journal of Labor Economics, 2, 1-26.
Gallant, R. and D. Nychka (1984) "Consistent Estimation of the Censored Regression Model". North Carolina State University, unpublished manuscript.
Goldberger, A. (1983) "Abnormal Selection Bias", in: S. Karlin, T. Amemiya and L. Goodman, eds., Studies in Econometrics, Time Series and Multivariate Statistics. New York: Academic Press, 67-84.
Griliches, Z. (1986) "Economic Data Issues", in this volume.
Haberman, S. (1978) Analysis of Qualitative Data. New York: Academic Press, Vols. I and II.
Hausman, J. (1980) "The Effects of Wages, Taxes, and Fixed Costs on Women's Labor Force Participation", Journal of Public Economics, 14, 161-194.
Heckman, J. (1974) "Shadow Prices, Market Wages and Labor Supply", Econometrica, 42(4), 679-694.
Heckman, J. (1976a) "Simultaneous Equations Models with Continuous and Discrete Endogenous Variables and Structural Shifts", in: S. Goldfeld and R. Quandt, eds., Studies in Nonlinear Estimation. Cambridge: Ballinger.
Heckman, J. (1976b) "The Common Structure of Statistical Models of Truncation, Sample Selection and Limited Dependent Variables and a Simple Estimator for Such Models", Annals of Economic and Social Measurement, Fall, 5(4), 475-492.
Heckman, J. (1978) "Dummy Endogenous Variables in a Simultaneous Equations System", Econometrica, 46, 931-961.
Heckman, J. (1979) "Sample Selection Bias as a Specification Error", Econometrica, 47, 153-162.
Heckman, J. (1981) "Statistical Models for Discrete Panel Data", in: C. Manski and D. McFadden, eds., Structural Analysis of Discrete Data with Economic Applications. Cambridge: MIT Press.
Heckman, J., M. Killingsworth and T. MaCurdy (1981) "Empirical Evidence on Static Labour Supply Models: A Survey of Recent Developments", in: Z. Hornstein, J. Grice and A. Webb, eds., The Economics of the Labour Market. London: Her Majesty's Stationery Office, 75-122.
Heckman, J. and T. MaCurdy (1980) "A Life Cycle Model of Female Labor Supply", Review of Economic Studies, 47, 47-74.
Heckman, J. and T. MaCurdy (1981) "New Methods for Estimating Labor Supply Functions: A Survey", in: R. Ehrenberg, ed., Research in Labor Economics. London: JAI Press, 4.
Heckman, J. and R. Robb (1985) "Alternative Methods for Evaluating the Impact of Training on Earnings", in: J. Heckman and B. Singer, eds., Longitudinal Analysis of Labor Market Data. Cambridge: Cambridge University Press.
Heckman, J. and G. Sedlacek (1985) "Heterogeneity, Aggregation and Market Wage Functions: An Empirical Model of Self Selection in the Labor Market", Journal of Political Economy, 93, December.
Heckman, J. and B. Singer (1986) "Econometric Analysis of Longitudinal Data", in this volume.
Heckman, J. and R. Willis (1977) "A Beta Logistic Model for Analysis of Sequential Labor Force Participation by Married Women", Journal of Political Economy, 85, 27-58.
Hotz, J. and R. Miller (1984) "A Dynamic Model of Fertility and Labor Supply". Carnegie-Mellon University, unpublished manuscript.
Johnson, W. (1978) "A Theory of Job Shopping", Quarterly Journal of Economics.
Jovanovic, B. (1979) "Firm Specific Capital and Turnover", Journal of Political Economy, December, 87(6), 1246-1260.
Kagan, A., Y. Linnik and C. R. Rao (1973) Some Characterization Theorems in Mathematical Statistics. New York: Wiley.
Kendall, M. and A. Stuart (1967) The Advanced Theory of Statistics. London: Griffin, II.
Tinbergen, J. (1951) "Some Remarks on the Distribution of Labour Incomes", International Economic Papers, 1, 195-207.
Tobin, J. (1958) "Estimation of Relationships for Limited Dependent Variables", Econometrica, 26, 24-36.
Wales, T. J. and A. D. Woodland (1979) "Labour Supply and Progressive Taxes", Review of Economic Studies, 46, 83-95.
White, H. (1981) "Consequences and Detection of Misspecified Nonlinear Regression Models", Journal of the American Statistical Association, 76, 419-433.
Willis, R. and S. Rosen (1979) "Education and Self Selection", Journal of Political Economy, 87, S7-S36.
Wolpin, K. (1984) "An Estimable Dynamic Stochastic Model of Fertility and Child Mortality", Journal of Political Economy, 92, August.
Yoon, B. (1981) "A Model of Unemployment Duration with Variable Search Intensity", Review of Economics and Statistics, November, 63(4), 599-609.
Yoon, B. (1984) "A Nonstationary Hazard Model of Unemployment Duration". New York: SUNY, Department of Economics, unpublished manuscript.
Zellner, A., J. Kmenta and J. Dreze (1966) "Specification and Estimation of Cobb-Douglas Production Function Models", Econometrica, 34, 784-795.
Chapter 33

EVALUATING THE PREDICTIVE ACCURACY OF MODELS

RAY C. FAIR

Contents
1. Introduction 1980
2. Numerical solution of nonlinear models 1981
3. Evaluation of ex ante forecasts 1984
4. Evaluation of ex post forecasts 1986
5. An alternative method for evaluating predictive accuracy 1988
6. Conclusion 1993
References 1994
1. Introduction

Methods for evaluating the predictive accuracy of econometric models are discussed
in this chapter.¹ Since most models used in practice are nonlinear, the
nonlinear case will be considered from the beginning. The model is written as:

fᵢ(y_t, x_t, αᵢ) = u_it, (i = 1,…,n), (t = 1,…,T), (1)

¹For a good recent text on forecasting techniques for time series, see Granger and Newbold (1977).
the estimation technique used. Given P and given the normality assumption, an
estimate of the distribution of the coefficient estimates is N(α̂, P), where α̂ is the
K × 1 vector of the coefficient estimates.

Let u_t* denote a particular draw of the m error terms for period t from the
N(0, Σ̂) distribution, where Σ̂ is the estimated covariance matrix of the error terms,
and let α* denote a particular draw of the K coefficients
from the N(α̂, P) distribution. Given u_t* for each period t of the simulation and
given α*, one can solve the model. This is merely a deterministic simulation for
the given values of the error terms and coefficients. Call this simulation a "trial".
Another trial can be made by drawing a new set of values of u_t* for each period t
and a new set of values of α*. This can be done as many times as desired. From
each trial one obtains a prediction of each endogenous variable for each period.
Let ŷʲᵢₜₖ denote the value on the jth trial of the k-period-ahead prediction of
variable i from a simulation beginning in period t.² For J trials, the estimate of
the expected value of the variable, denoted ỹᵢₜₖ, is:

ỹᵢₜₖ = (1/J) Σ_{j=1}^{J} ŷʲᵢₜₖ. (2)
²Note that t denotes the first period of the simulation, so that ŷᵢₜₖ is the prediction for period
t + k − 1.
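The trial scheme can be sketched in a few lines for an assumed one-equation model standing in for (1); the AR(1) specification and all parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stochastic-simulation "trials": draw coefficients and error terms, solve
# the model deterministically, and average the predictions over trials.
alpha_hat = np.array([0.8])       # point estimate of the coefficient vector
P = np.array([[0.01]])            # estimated covariance matrix of the estimates
sigma_u = 0.5                     # estimated std dev of the structural error
y0, T, J = 1.0, 8, 2000           # initial condition, horizon, number of trials

preds = np.empty((J, T))
for j in range(J):
    a_star = rng.multivariate_normal(alpha_hat, P)   # one draw of the coefficients
    y = y0
    for k in range(T):
        u_star = rng.normal(0.0, sigma_u)            # one draw of the period-k error
        y = a_star[0] * y + u_star                   # deterministic solve, given draws
        preds[j, k] = y

y_tilde = preds.mean(axis=0)      # estimate of the expected k-period-ahead values
print(y_tilde)
```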
³Although much of the discussion in the literature is couched in terms of constant-term adjustments,
Intriligator (1978, p. 516) prefers to interpret the adjustments as the user's estimates of the future
values of the error terms.

Such subjective adjustment, along with another type of mechanical adjustment procedure, is used for some of
the results in Haitovsky, Treyz, and Su (1974). See also Green, Liebenberg, and
Hirsch (1972) for other examples.
3. Evaluation of ex ante forecasts

The three most common measures of predictive accuracy are root mean squared
error (RMSE), mean absolute error (MAE), and Theil's inequality coefficient⁴
(U). Let ŷᵢₜ be the forecast of variable i for period t, and let yᵢₜ be the actual
value. Assume that observations on ŷᵢₜ and yᵢₜ are available for t = 1,…,T. Then
the measures for this variable are:

RMSE = [(1/T) Σ_{t=1}^{T} (ŷᵢₜ − yᵢₜ)²]^{1/2}, (3)

MAE = (1/T) Σ_{t=1}^{T} |ŷᵢₜ − yᵢₜ|, (4)

U = [Σ_{t=1}^{T} (Δŷᵢₜ − Δyᵢₜ)²]^{1/2} / [Σ_{t=1}^{T} (Δyᵢₜ)²]^{1/2}, (5)

where Δ in (5) denotes either absolute or percentage change. All three measures
are zero if the forecasts are perfect. The MAE measure penalizes large errors less
than does the RMSE measure. The value of U is one for a no-change forecast
(Δŷᵢₜ = 0). A value of U greater than one means that the forecast is less accurate
than the simple forecast of no change.
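The three measures are straightforward to compute; in the sketch below the forecast change in U is measured from the actual lagged value, so that a no-change forecast yields U = 1, and the series are illustrative.

```python
import numpy as np

def rmse(f, a):
    return float(np.sqrt(np.mean((f - a) ** 2)))

def mae(f, a):
    return float(np.mean(np.abs(f - a)))

def theil_u(f, a):
    # Forecast change measured from the actual lagged value, so that a
    # "no change" forecast (f_t = a_{t-1}) yields U = 1.
    df = f[1:] - a[:-1]
    da = a[1:] - a[:-1]
    return float(np.sqrt(np.sum((df - da) ** 2) / np.sum(da ** 2)))

actual = np.array([100.0, 102.0, 101.0, 104.0, 107.0])
forecast = np.array([100.0, 101.5, 102.0, 103.0, 106.0])
print(rmse(forecast, actual), mae(forecast, actual), theil_u(forecast, actual))
```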
An important practical problem that arises in evaluating ex ante forecasting
accuracy is the problem of data revisions. Given that the data for many variables
are revised a number of times before becoming "final", it is not clear whether the
forecast values should be compared to the first-released values, to the final values,
or to some set in between. There is no obvious answer to this problem. If the
revision for a particular variable is a benchmark revision, where the level of
the variable is revised beginning at least a few periods before the start of the
prediction period, then a common procedure is to adjust the forecast value by
adding the forecasted change (Δŷᵢₜ), which is based on the old data, to the new
lagged value (yᵢ,ₜ₋₁) and then comparing the adjusted forecast value to the new
data. If, say, the revision took the form of adding a constant amount ȳᵢ to each of
the old values of yᵢₜ, then this procedure merely adds the same ȳᵢ to each of the
forecasted values of yᵢₜ. This procedure is often followed even if the revisions are
not all benchmark revisions, on the implicit assumption that they are more like
benchmark revisions than other kinds. Following this procedure also means that
if forecast changes are being evaluated, as in the U measure, then no adjustments
are needed.
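The benchmark-revision adjustment amounts to carrying the old forecasted changes forward from the revised lagged value, as in this small sketch (series values are illustrative):

```python
import numpy as np

y_old = np.array([50.0, 51.0, 52.5])   # last old-vintage actuals
f_old = np.array([53.0, 54.2])         # forecasts made on the old data
y_new_lag = 54.0                       # revised value of the last actual

d_f = np.diff(np.concatenate(([y_old[-1]], f_old)))   # forecasted changes
f_adj = y_new_lag + np.cumsum(d_f)                    # adjusted forecast levels
print(f_adj)
```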
There are a number of studies that have examined ex ante forecasting accuracy
using one or more of the above measures. Some of the more recent studies are
McNees (1973, 1974, 1975, 1976) and Zarnowitz (1979). It is usually the case that
forecasts from both model builders and nonmodel builders are examined and
compared. A common “base” set of forecasts to use for comparison purposes is
the set from the ASA/NBER Business Outlook Survey. A general conclusion
from these studies is that there is no obvious “winner” among the various
forecasters [see, for example, Zarnowitz (1979, pp. 23, 30)]. The relative perfor-
mance of the forecasters varies considerably across variables and length ahead of
the forecast, and the differences among the forecasters for a given variable and
length ahead are generally small. This means that there is as yet little evidence that
the forecasts from model builders are more accurate than, say, the forecasts from
the ASA/NBER Survey.
Ex ante forecasting comparisons are unfortunately of little interest from the
point of view of examining the predictive accuracy of models. There are two
reasons for this. The first is that the ex ante forecasts are based on guessed rather
than actual values of the exogenous variables. Given only the actual and forecast
values of the endogenous variables, there is no way of separating a given error
into that part due to bad guesses and that part due to other factors. A model
should not necessarily be penalized for bad exogenous-variable guesses from its
users. More will be said about this in Section 5. The second, and more important,
reason is that almost all the forecasts examined in these studies are generated
from subjectively adjusted models (i.e. subjective add factors are used). It is thus
the accuracy of the forecasting performance of the model builders, rather than of
the models, that is being examined.
Before concluding this section it is of interest to consider two further points
regarding the subjective adjustment of models. First, there is some indirect
evidence that the use of add factors is quite important in practice. The studies of
Evans, Haitovsky, and Treyz (1972) and Haitovsky and Treyz (1972) analyzing
the Wharton and OBE models found that the ex ante forecasts from the model
builders were more accurate than the ex post forecasts from the models, even
when the same add factors that were used for the ex ante forecasts were used for
the ex post forecasts. In other words, the use of actual rather than guessed values
of the exogenous variables decreased the accuracy of the forecasts. This general
conclusion can also be drawn from the results for the BEA model in Table 3 in
Hirsch, Grimm, and Narasimham (1974). This conclusion is consistent with the
view that the add factors are (in a loose sense) more important than the model in
determining the ex ante forecasts: what one would otherwise consider to be an
improvement for the model, namely the use of more accurate exogenous-variable
values, worsens the forecasting accuracy.
Second, there is some evidence that the accuracy of non-subjectively adjusted
ex ante forecasts is improved by the use of actual rather than guessed values of
the exogenous variables. During the period 1970111-197311, I made ex ante
forecasts using a short-run forecasting model [Fair (1971)]. No add factors were
used for these forecasts. The accuracy of these forecasts is examined in Fair
(1974) and the results indicate that the accuracy of the forecasts is generally
improved when actual rather than guessed values of the exogenous variables are
used.
It is finally of interest to note, although nothing really follows from this, that
the (non-subjectively adjusted) ex ante forecasts from my forecasting model were
on average less accurate than the subjectively adjusted forecasts [McNees (1973)],
whereas the ex post forecasts (i.e. the forecasts based on the actual values of the
exogenous variables) were on average about as accurate as the
subjectively adjusted forecasts [Fair (1974)].
The measures in (3)-(5) have also been widely used to evaluate the accuracy of
ex post forecasts. One of the more well known comparisons of ex post forecasting
accuracy is described in Fromm and Klein (1976) where eleven models are
analyzed. The standard procedure for ex post comparisons is to compute ex post
forecasts over a common simulation period, calculate for each model and variable
an error measure, and compare the values of the error measure across models. If
the forecasts are outside-sample, there is usually some attempt to have the ends
of the estimation periods for the models be approximately the same. It is
generally the case that forecasting accuracy deteriorates the further away the
forecast period is from the estimation period, and this is the reason for wanting to
make the estimation periods as similar as possible for different models.
The use of the RMSE measure, or one of the other measures, to evaluate
ex post forecasts is straightforward, and there is little more to be said about this.
Sometimes the accuracy of a given model is compared to the accuracy of a
“naive” model, where the naive model can range from the simple assumption of
no change in each variable to an autoregressive moving average (ARIMA) process
for each variable. (The comparison with the no-change model is, of course,
implicit in the use of the U measure.) Nelson (1972), for example, compared ex post
forecasts from the FRB-MIT-PENN model with forecasts from ARIMA models by
regressing actual values on composite predictions formed from both sets of
forecasts; if the model's forecasts contain all of the information in the ARIMA
forecasts, the coefficient on the ARIMA
predicted values should be zero. Nelson found that in general the estimates of this
coefficient were significantly different from zero. This test, while interesting,
cannot be used to compare models that differ in the number and types of
variables that are taken to be exogenous. In order to test the hypothesis of
efficient information use, the information set used by one model must be
contained in the set used by the other model, and this is in general not true for
models that differ in their exogenous variables.
where $\tilde{y}_{itk}$ is determined by (2). If an estimate of the uncertainty from the error
terms only is desired, then the trials consist only of draws from the distribution of
the error terms.5
There are two polar assumptions that can be made about the uncertainty of the
exogenous variables. One is, of course, that there is no exogenous-variable
uncertainty. The other is that the exogenous-variable forecasts are in some way as
uncertain as the endogenous-variable forecasts. Under this second assumption
one could, for example, estimate an autoregressive equation for each exogenous
variable and add these equations to the model. This expanded model, which
would have no exogenous variables, could then be used for the stochastic-simulation
estimates of the variances.

5Note that it is implicitly assumed here that the variances of the forecast errors exist. For some
estimation techniques this is not always the case. If in a given application the variances do not exist,
then one should estimate other measures of dispersion of the distribution, such as the interquartile
range or the mean absolute deviation.

While the first assumption is clearly likely to
underestimate exogenous-variable uncertainty in most applications, the second
assumption is likely to overestimate it. This is particularly true for fiscal-policy
variables in macroeconomic models, where government-budget data are usually
quite useful for purposes of forecasting up to at least about eight quarters ahead.
The best approximation is thus likely to lie somewhere in between these two
assumptions.
The assumption that was made for the results in Fair (1980) was in between the
two polar assumptions. The procedure that was followed was to estimate an
eighth-order autoregressive equation for each exogenous variable (including a
constant and time in the equation) and then to take the estimated standard error
from this regression as the estimate of the degree of uncertainty attached to
forecasting the change in this variable for each period. This procedure ignores the
uncertainty of the coefficient estimates in the autoregressive equations, which is
one of the reasons it is not as extreme as the second polar assumption. In an
earlier stochastic-simulation study by Haitovsky and Wallace (1972), third-order
autoregressive equations were estimated for the exogenous variables, and these
equations were then added to the model. This procedure is consistent with the
second polar assumption above except that for purposes of the stochastic
simulations Haitovsky and Wallace took the variances of the error terms to be
one-half of the estimated variances. They defend this procedure (pp. 267-268) on
the grounds that the uncertainty from the exogenous-variable forecasts is likely to
be less than is reflected in the autoregressive equations.
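To make the autoregressive procedure concrete, the following sketch (Python with numpy; the series, lag order, and parameter values are illustrative stand-ins, not Fair's data or code) fits an eighth-order autoregression with a constant and a linear time trend to an exogenous variable and uses the equation's residual standard error as the estimate of the uncertainty attached to forecasting that variable.

import numpy as np

def exog_uncertainty(x, p=8):
    # Fit x_t on a constant, a time trend, and p lags of x; return the
    # residual standard error as the one-period-ahead uncertainty estimate.
    T = len(x)
    rows = [np.r_[1.0, t, x[t - p:t][::-1]] for t in range(p, T)]
    X, y = np.array(rows), x[p:]
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return np.sqrt(resid @ resid / (len(y) - X.shape[1]))

# example: a persistent hypothetical exogenous series
rng = np.random.default_rng(0)
z = np.zeros(120)
for t in range(1, 120):
    z[t] = 0.9 * z[t - 1] + rng.normal(scale=0.5)
print(exog_uncertainty(z))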
Another possible procedure that could be used for the exogenous variables
would be to gather from various forecasting services data on their ex ante
forecasting errors of the exogenous variables (exogenous to you, not necessarily to
the forecasting service). From these errors for various periods one could estimate
a standard error for each exogenous variable and then use these errors for the
stochastic-simulation draws.
For purposes of describing the present method, all that needs to be assumed is
that some procedure is available for estimating exogenous-variable uncertainty. If
equations for the exogenous variables are not added to the model, but instead
some in-between procedure is followed, then each stochastic-simulation trial
consists of draws of error terms, coefficients, and exogenous-variable errors. If
equations are added, then each trial consists of draws of error terms and
coefficients from both the structural equations and the exogenous-variable equa-
tions. In either case, let $\check{\sigma}^2_{itk}$ denote the stochastic-simulation estimate of the
variance of the forecast error that takes into account exogenous-variable uncertainty.
$\check{\sigma}^2_{itk}$ differs from $\tilde{\sigma}^2_{itk}$ in (6) in that the trials for $\check{\sigma}^2_{itk}$ include draws of
exogenous-variable errors.
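The structure of a single stochastic-simulation trial with all three kinds of draws can be sketched as follows; the one-equation "model" and every parameter value here are hypothetical stand-ins for an estimated structural model.

import numpy as np

rng = np.random.default_rng(1)

# hypothetical estimated equation: y_t = a*y_{t-1} + b*x_t + u_t
a_hat, b_hat = 0.7, 0.5
V = np.array([[0.002, 0.0], [0.0, 0.004]])  # coefficient covariance (hypothetical)
sig_u, sig_x = 0.8, 0.3                     # structural and exogenous error s.d.'s

def trial(y0, x_path, rng):
    # one trial: draw coefficients once, then error terms and
    # exogenous-variable errors period by period, simulating dynamically
    a, b = rng.multivariate_normal([a_hat, b_hat], V)
    y, path = y0, []
    for x in x_path:
        x_draw = x + rng.normal(scale=sig_x)
        y = a * y + b * x_draw + rng.normal(scale=sig_u)
        path.append(y)
    return np.array(path)

x_base = np.full(8, 1.0)   # base path for the exogenous variable
sims = np.array([trial(0.0, x_base, rng) for _ in range(2000)])
print(sims.var(axis=0))    # forecast-error variance estimates, k = 1,...,8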
Estimating the uncertainty from the possible misspecification of the model is
the most difficult and costly part of the method. It requires successive reestima-
tion and stochastic simulation of the model. It is based on a comparison of
If it is assumed that $\tilde{y}_{itk}$ exactly equals the true expected value, $\bar{y}_{itk}$, then $\hat{\varepsilon}_{itk}$ in
(7) is a sample draw from a distribution with a known mean of zero and variance
$\sigma^2_{itk}$. The square of this error, $\hat{\varepsilon}^2_{itk}$, is thus under this assumption an unbiased
estimate of $\sigma^2_{itk}$. One thus has two estimates of $\sigma^2_{itk}$, one computed from the mean
forecast error and one computed by stochastic simulation. Let $d_{itk}$ denote the
difference between these two estimates:

$$d_{itk} = \hat{\varepsilon}^2_{itk} - \tilde{\sigma}^2_{itk}. \qquad (8)$$

If it is further assumed that $\tilde{\sigma}^2_{itk}$ exactly equals the true value, then $d_{itk}$ is the
difference between the estimated variance based on the mean forecast error and
the true variance. Therefore, under the two assumptions of no error in the
stochastic-simulation estimates, the expected value of $d_{itk}$ is zero.
The assumption of no stochastic-simulation error, i.e. $\tilde{y}_{itk} = \bar{y}_{itk}$ and $\tilde{\sigma}^2_{itk} = \sigma^2_{itk}$,
is obviously only approximately correct at best. Even with an infinite number of
draws the assumption would not be correct because the draws are from estimated
rather than known distributions. It does seem, however, that the error introduced
by this assumption is likely to be small relative to the error introduced by the fact
that some assumption must be made about the mean of the distribution of $d_{itk}$.
Because of this, nothing more will be said about stochastic-simulation error. The
emphasis instead is on the possible assumptions about the mean of the distribution
of $d_{itk}$, given the assumption of no stochastic-simulation error.
The procedure just described uses a given estimation period and a given
forecast period. Assume for sake of an example that one has data from period 1
through 100. The model can then be estimated through, say, period 70, with the
forecast period beginning with period 71. Stochastic simulation for the forecast
period will yield for each $i$ and $k$ a value of $d_{i,71,k}$ in (8). The model can then be
reestimated through period 71, with the forecast period now beginning with
period 72. Stochastic simulation for this forecast period will yield for each i and k
a value of $d_{i,72,k}$ in (8). This process can be repeated through the estimation period
ending with period 99. For the one-period-ahead forecast ($k = 1$) the procedure
will yield for each variable $i$ 30 values of $d_{it1}$ ($t = 71,\ldots,100$); for the two-
period-ahead forecast ($k = 2$) it will yield 29 values of $d_{it2}$ ($t = 72,\ldots,100$); and
so on. If the assumption of no simulation error holds for all $t$, then the expected
value of $d_{itk}$ is zero for all $t$.
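The bookkeeping of this rolling procedure is easy to state in code. The sketch below stands in a univariate AR(1) for the model and omits coefficient draws, so the simulation variance is just the estimated error variance; it is intended only to show the successive reestimation and the computation of the $d$ values in (8) for $k = 1$.

import numpy as np

rng = np.random.default_rng(2)
y = np.zeros(100)
for t in range(1, 100):
    y[t] = 0.8 * y[t - 1] + rng.normal()

def fit_ar1(y):
    x, z = y[:-1], y[1:]
    a = (x @ z) / (x @ x)
    return a, np.mean((z - a * x) ** 2)   # slope and error variance

d = []
for T0 in range(70, 99):                  # forecast origins
    a, s2 = fit_ar1(y[: T0 + 1])          # reestimate through period T0
    eps = y[T0 + 1] - a * y[T0]           # outside-sample forecast error
    d.append(eps ** 2 - s2)               # eq. (8): squared error minus sim. variance
print(np.mean(d))                         # estimated mean of the d distribution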
The discussion so far is based on the assumption that the model is correctly
specified. Misspecification has two effects on $d_{itk}$ in (8). First, if the model is
misspecified, the estimated covariance matrices that are used for the stochastic
simulation will not in general be unbiased estimates of the true covariance
matrices. The estimated variances computed by means of stochastic simulation
will thus in general be biased. Second, the estimated variances computed from the
forecast errors will in general be biased estimates of the true variances. Since
misspecification affects both estimates, the effect on $d_{itk}$ is ambiguous. It is
possible for misspecification to affect the two estimates in the same way and thus
leave the expected value of the difference between them equal to zero. In general,
however, this does not seem likely, and so in general one would not expect the
expected value of $d_{itk}$ to be zero for a misspecified model. The expected value
may be negative rather than positive for a misspecified model, although in general
it seems more likely that it will be positive. Because of the possibility of data
mining, misspecification seems likely to have a larger positive effect on the
outside-sample forecast errors than on the (within-sample) estimated covariance
matrices.
An examination of how the $d_{itk}$ values change over time (for a given $i$ and $k$)
may reveal information about the strengths and weaknesses of the model that one
would otherwise not have. This information may then be useful in future work on
the model. The individual values may thus be of interest in their own right, aside
from their possible use in estimating total predictive uncertainty.
For the total uncertainty estimates some assumption has to be made about how
misspecification affects the expected value of $d_{itk}$. For the results in Fair (1980)
it was assumed that the expected value of $d_{itk}$ is constant across time: for a given
$i$ and $k$, misspecification was assumed to affect the mean of the distribution of
$d_{itk}$ in the same way for all $t$. Other assumptions are, of course, possible.
One could, for example, assume that the mean of the distribution is a function of
other variables. (A simple assumption in this respect is that the mean follows a
linear time trend.) Given this assumption, the mean can then be estimated from a
regression of $d_{itk}$ on the variables. For the assumption of a constant mean, this
regression is merely a regression on a constant (i.e. the estimated constant term is
merely the mean of the $d_{itk}$ values).6 The predicted value from this regression for
period $t$, denoted $\bar{d}_{itk}$, is the estimated mean for period $t$.
6For the results in Fair (1980) a slightly different assumption than that of a constant mean was
made for variables with trends. For these variables it was assumed that the mean of $d_{itk}$ is
proportional to $\tilde{\sigma}^2_{itk}$, i.e. that the mean of $d_{itk}/\tilde{\sigma}^2_{itk}$ is constant across time.
An estimate of the total variance of the forecast error, denoted $\hat{\sigma}^2_{itk}$, is the sum
of $\check{\sigma}^2_{itk}$ (the stochastic-simulation estimate of the variance due to the error terms,
coefficient estimates, and exogenous variables) and $\bar{d}_{itk}$:

$$\hat{\sigma}^2_{itk} = \check{\sigma}^2_{itk} + \bar{d}_{itk}. \qquad (9)$$

Since the procedure in arriving at $\hat{\sigma}^2_{itk}$ takes into account the four main sources of
uncertainty of a forecast, the values of $\hat{\sigma}^2_{itk}$ can be compared across models for a
given $i$, $k$, and $t$. If, for example, one model has consistently smaller values of $\hat{\sigma}^2_{itk}$
than another, this would be fairly strong evidence for concluding that it is a more
accurate model, i.e. a better approximation to the true structure.
This completes the outline of the method. It may be useful to review the main
steps involved in computing $\hat{\sigma}^2_{itk}$ in (9). Assume that data are available for periods
1 through $T$ and that one is interested in estimating the uncertainty of an
eight-period-ahead forecast that begins in period $T+1$ (i.e. in computing $\hat{\sigma}^2_{itk}$ for
$t = T+1$ and $k = 1,\ldots,8$). Given a base set of values for the exogenous variables
for periods $T+1$ through $T+8$, one can compute $\check{\sigma}^2_{itk}$ for $t = T+1$ and $k = 1,\ldots,8$
by means of stochastic simulation. Each trial consists of one eight-period dynamic
simulation and requires draws of the error terms, coefficients, and exogenous-vari-
able errors. These draws are based on the estimate of the model through period $T$.
This is the relatively inexpensive part of the method. The expensive part consists of
the successive reestimation and stochastic simulation of the model that are needed
in computing the $d_{itk}$ values. In the above example, the model would be
estimated 30 times and stochastically simulated 30 times in computing the $d_{itk}$
values. After these values are computed for, say, periods $T-r$ through $T$, then
$\bar{d}_{itk}$ can be computed for $t = T+1$ and $k = 1,\ldots,8$ using whatever assumption has
been made about the distribution of $d_{itk}$. This allows $\hat{\sigma}^2_{itk}$ in (9) to be computed
for $t = T+1$ and $k = 1,\ldots,8$.
In the successive reestimation of the model, the first period of the estimation
period may or may not be increased by one each time. The criterion one
should use in deciding this is to pick the procedure under which the chosen
assumption about the distribution of $d_{itk}$ seems likely to be the best
approximation to the truth. It is also possible to take the distance between the last
period of the estimation period and the first period of the forecast period to be
other than one, as was done above.
It is important to note that the above estimate of the mean of the $d_{itk}$
distribution is not in general efficient, because the error term in the $d_{itk}$ regression
is in general heteroskedastic. Even under the null hypothesis of no misspecifica-
tion, the variance of the $d_{itk}$ distribution is not constant across time. It is true,
however, that $\hat{\varepsilon}_{itk}/(\tilde{\sigma}^2_{itk}+\bar{d}_{itk})^{1/2}$ has unit variance under the null hypothesis,
and so it may not be a bad approximation to assume that $d_{itk}/(\tilde{\sigma}^2_{itk}+\bar{d}_{itk})$ has a
constant variance across time. This then suggests the following iterative proce-
dure: 1) for each $i$ and $k$, calculate $\bar{d}_{itk}$ from the $d_{itk}$ regression, as discussed
above; 2) divide each observation in the $d_{itk}$ regression by $\tilde{\sigma}^2_{itk}+\bar{d}_{itk}$, run another
regression, and calculate $\bar{d}_{itk}$ from this regression; 3) repeat step 2) until the
successive estimates of $\bar{d}_{itk}$ are within some prescribed tolerance level. Litterman
(1980) has carried out this procedure for a number of models for the case in
which the only explanatory variable in the $d_{itk}$ regression is the constant term (i.e.
for the case in which the null hypothesis is that the mean of the $d_{itk}$ distribution
is constant across time).
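A minimal implementation of this iteration, assuming the constant-mean null (so each regression is just a weighted mean) and treating the standard deviation of $d_{itk}$ as proportional to $\tilde{\sigma}^2_{itk}+\bar{d}_{itk}$, is sketched below on simulated data.

import numpy as np

def iterate_mean(d, sim_var, tol=1e-8, max_iter=100):
    dbar = d.mean()                          # step 1: unweighted regression on a constant
    for _ in range(max_iter):
        w = 1.0 / (sim_var + dbar) ** 2      # step 2: reweight by (sim. var + mean)
        new = (w * d).sum() / w.sum()
        if abs(new - dbar) < tol:            # step 3: iterate to convergence
            return new
        dbar = new
    return dbar

rng = np.random.default_rng(3)
sim_var = rng.uniform(0.5, 2.0, size=30)
d = 0.4 + rng.normal(size=30) * (sim_var + 0.4)   # mean 0.4, heteroskedastic
print(iterate_mean(d, sim_var))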
If one is willing to assume that $d_{itk}$ is normally distributed, which is at best
only an approximation, then Litterman (1979) has shown that the above iterative
procedure produces maximum likelihood estimates. He has used this assumption
in Litterman (1980) to test the hypothesis (using a likelihood ratio test) that the
mean of the $d_{itk}$ distribution is the same in the first and second halves of the
sample period. The hypothesis was rejected at the 5 percent level in only 3 of 24
tests. These results thus suggest that the assumption of a constant mean of the
$d_{itk}$ distribution may not be a bad approximation in many cases. This conclusion
was also reached for the results in Fair (1982), where plots of $d_{itk}$ values were
examined across time (for a given $i$ and $k$). There was little evidence from these
plots that the mean was changing over time.
The mean of the $d_{itk}$ distribution can be interpreted as a measure of the
average unexplained forecast-error variance (i.e. that part not explained by $\check{\sigma}^2_{itk}$)
rather than as a measure of misspecification. Using this interpretation, Litterman
(1980) has examined whether the use of the estimated means of the $d_{itk}$ distribu-
tions leads to more accurate estimates of the forecast-error variances. The results
of his tests, which are based on the normality assumption, show that substantially
more accurate estimates are obtained using the estimated means. Litterman's
overall results are thus quite encouraging regarding the potential usefulness of the
method discussed in this section.
Aside from Litterman’s use of the method to compare various versions of Sims’
(1980) model, I have used the method to compare my model [Fair (1976)],
Sargent’s (1976) model, Sims’ model, and an eighth-order autoregressive model.
The results of this comparison are presented in Fair (1979).
6. Conclusion
It should be clear from this chapter that the comparison of the predictive
accuracy of alternative models is not a straightforward exercise. The difficulty of
evaluating alternative models is undoubtedly one of the main reasons there is
currently so little agreement about which model best approximates the true
structure of the economy. If it were easy to decide whether one model is more
accurate than another, there would probably be by now a generally agreed upon
model of, for example, the U.S. economy. With further work on methods like the
one described in Section 5, however, it may be possible in the not-too-distant
future to begin a more systematic comparison of models. Perhaps in ten or twenty
years' time the use of these methods will have considerably narrowed the current
range of disagreements.
References

Bianchi, C., G. Calzolari and P. Corsi (1976) "Divergences in the Results of Stochastic and
Deterministic Simulation of an Italian Non Linear Econometric Model", in: L. Dekker, ed.,
Simulation of Systems. Amsterdam: North-Holland Publishing Co.
Calzolari, G. and P. Corsi (1977) "Stochastic Simulation as a Validation Tool for Econometric
Models". Paper presented at IIASA Seminar, Laxenburg, Vienna, September 13-15.
Cooper, J. P. (1974) Development of the Monetary Sector, Prediction and Policy Analysis in the
FRB-MIT-Penn Model. Lexington: D. C. Heath & Co.
Cooper, J. P. and S. Fischer (1972) "Stochastic Simulation of Monetary Rules in Two Macroecono-
metric Models", Journal of the American Statistical Association, 67, 750-760.
Cooper, J. P. and S. Fischer (1974) "Monetary and Fiscal Policy in the Fully Stochastic St. Louis
Econometric Model", Journal of Money, Credit and Banking, 6, 1-22.
Evans, Michael K., Yoel Haitovsky and George I. Treyz, assisted by Vincent Su (1972) "An Analysis
of the Forecasting Properties of U.S. Econometric Models", in: B. G. Hickman, ed., Econometric
Models of Cyclical Behavior. New York: Columbia University Press, 949-1139.
Evans, M. K., L. R. Klein and M. Saito (1972) "Short-Run Prediction and Long-Run Simulation of
the Wharton Model", in: B. G. Hickman, ed., Econometric Models of Cyclical Behavior. New York:
Columbia University Press, 139-185.
Fair, Ray C. (1971) A Short-Run Forecasting Model of the United States Economy. Lexington: D. C.
Heath & Co.
Fair, Ray C. (1974) "An Evaluation of a Short-Run Forecasting Model", International Economic
Review, 15, 285-303.
Fair, Ray C. (1976) A Model of Macroeconomic Activity. Volume II: The Empirical Model. Cambridge:
Ballinger Publishing Co.
Fair, Ray C. (1979) "An Analysis of the Accuracy of Four Macroeconometric Models", Journal of
Political Economy, 87, 701-718.
Fair, Ray C. (1980) "Estimating the Expected Predictive Accuracy of Econometric Models", Interna-
tional Economic Review, 21, 355-378.
Fair, Ray C. (1982) "The Effects of Misspecification on Predictive Accuracy", in: G. C. Chow and P.
Corsi, eds., Evaluating the Reliability of Macro-economic Models. New York: John Wiley & Sons,
193-213.
Fromm, Gary and Lawrence R. Klein (1976) "The NBER/NSF Model Comparison Seminar: An
Analysis of Results", Annals of Economic and Social Measurement, Winter, 5, 1-28.
Fromm, Gary, L. R. Klein and G. R. Schink (1972) "Short- and Long-Term Simulations with the
Brookings Model", in: B. G. Hickman, ed., Econometric Models of Cyclical Behavior. New York:
Columbia University Press, 201-292.
Garbade, K. D. (1975) Discretionary Control of Aggregate Economic Activity. Lexington: D. C. Heath
& Co.
Granger, C. W. J. and Paul Newbold (1977) Forecasting Economic Time Series. New York: Academic
Press.
Green, G. R., M. Liebenberg and A. A. Hirsch (1972) "Short- and Long-Term Simulations with the
OBE Econometric Model", in: B. G. Hickman, ed., Econometric Models of Cyclical Behavior. New
York: Columbia University Press, 25-123.
Haitovsky, Yoel and George Treyz (1972) "Forecasts with Quarterly Macroeconometric Models:
Equation Adjustments, and Benchmark Predictions: The U.S. Experience", The Review of Economics
and Statistics, 54, 317-325.
Chapter 34

STABILIZATION POLICY IN MACROECONOMIC FLUCTUATIONS

JOHN B. TAYLOR*

Stanford University

Contents
1. Introduction 1998
2. Solution concepts and techniques 1998
2.1. Scalar models 1999
2.2. Bivariate models 2016
2.3. The use of operators, generating functions, and z-transforms 2031
2.4. Higher order representations and factorization techniques 2033
2.5. Rational expectations solutions as boundary value problems 2037
3. Econometric evaluation of policy rules 2038
3.1. Policy evaluation for a univariate model 2039
3.2. The Lucas critique and the Cowles Commission critique 2040
3.3. Game-theoretic approaches 2041
4. Statistical inference 2041
4.1. Full information estimation 2041
4.2. Identification 2043
4.3. Hypothesis testing 2044
4.4. Limited information estimation methods 2044
5. General linear models 2045
5.1. A general first-order vector model 2045
5.2. Higher order vector models 2047
6. Techniques for nonlinear models 2048
6.1. Multiple shooting method 2049
6.2. Extended path method 2049
6.3. Nonlinear saddle path manifold method 2050
7. Concluding remarks 2051
References 2052
*Grants from the National Science Foundation and the Guggenheim Foundation are gratefully
acknowledged. I am also grateful to Olivier Blanchard, Gregory Chow, Avinash Dixit, George Evans,
Zvi Griliches, Sandy Grossman, Ben McCallum, David Papell, Larry Reed, Philip Reny, and Ken
West for helpful discussions and comments on an earlier draft.
1. Introduction
The sine qua non of a rational expectations model is the appearance of forecasts
of events based on information available before the events take place. Many
different techniques have been developed to solve such models. Some of these
techniques are designed for large models with very general structures. Others are
designed to be used in full information estimation where a premium is placed on
computing reduced form parameters in terms of structural parameters as quickly
and efficiently as possible. Others are short-cut methods designed to exploit
special features of a particular model. Still others are designed for exposition
where a premium is placed on analytic tractability and intuitive appeal. Graphical
methods fall in this last category.
2. Solution concepts and techniques

In this section, I examine the basic solution concept and explain how to obtain
the solutions of some typical linear rational expectations models. For expositional
purposes I feel the method of undetermined coefficients is most useful. This
method is used in time series analysis to convert stochastic difference equations
into deterministic difference equations in the coefficients of the infinite moving
average representation. [See Anderson (1971, p. 236) or Harvey (1981, p. 38)]. The
difference equations in the coefficients have exactly the same form as a determin-
istic version of the original model, so that the method can make use of techniques
available to solve deterministic difference equations. This method was used by
Muth (1961) in his original exposition of the rational expectations assumption. It
provides a general unified treatment of most stochastic rational expectations
models without requiring knowledge of any advanced techniques, and it clearly
reveals the nature of the assumptions necessary for existence and uniqueness of
solutions. It also allows for different viewpoint dates for expectations, and
provides an easy way to distinguish between the effects of anticipated versus
unanticipated policy shifts. The method gives the solution in terms of an infinite
moving average representation which is also convenient for comparing a model’s
properties with the data as represented in estimated infinite moving average
representations. An example of such a comparison appears in Taylor (1980b). An
infinite moving average representation, however, is not useful for maximum
likelihood estimation for which a finite ARMA model is needed. Although it is
usually easy to convert an infinite moving average model into a finite ARMA
model, there are computationally more advantageous ways to compute the
ARMA model directly as we will describe below.
2.1. Scalar models

Consider first the scalar model

$$y_t = \alpha E_t y_{t+1} + \delta u_t, \qquad (2.1)$$

where $\alpha$ and $\delta$ are parameters and $E_t$ is the conditional expectation based on all
information through period $t$. The variable $u_t$ is an exogenous shift variable or
"shock" to the equation. It is assumed to follow a general linear process with the
representation

$$u_t = \sum_{i=0}^{\infty}\theta_i\varepsilon_{t-i}, \qquad (2.2)$$

where $\varepsilon_t$ is a serially uncorrelated shock with zero mean. As an example, consider
the Cagan money demand equation

$$m_t - p_t = -\beta(E_t p_{t+1} - p_t), \qquad (2.3)$$

with $\beta > 0$, where $p_t$ is the log of the price level and $m_t$ is the log of the money
supply. In other words, the demand for real money balances depends
negatively on the expected rate of inflation, as approximated by the expected first
difference of the log of the price level. Eq. (2.3) can be written in the form of eq.
(2.1) by setting $\alpha = \beta/(1+\beta)$ and $\delta = 1/(1+\beta)$, and by letting $y_t = p_t$ and
$u_t = m_t$. In this example the variable $u_t$ represents shifts in the supply of money,
as generated by the process (2.2). Alternatively, we could add an error term to
the right-hand side of eq. (2.3), to represent shifts in the demand for money. Eq.
(2.3) was originally introduced in the seminal work of Cagan (1956), but with
adaptive, rather than rational, expectations. The more recent rational expectations
version has been used by many researchers including Sargent and Wallace (1973).
The stochastic process for the shock variable $u_t$ is assumed in eq. (2.2) to have a
general form. This form includes any stationary ARMA process [see Harvey
(1981, p. 27), for example]. For empirical applications this generality is necessary
because both policy variables and shocks to equations frequently have com-
plicated time series properties. In many policy applications (where $u_t$ in (2.2) is a
policy variable), one is interested in "thought experiments" in which the policy
policy variable), one is interested in “thought experiments” in which the policy
variable is shifted in a special way and the response of the endogenous variables is
examined. In standard econometric model methodology, such thought experi-
ments require one to calculate policy multipliers [see Chow (1983), p. 147, for
example]. In forward-looking rational expectations models, the multipliers depend
not only on whether the shift in the policy variable is temporary or permanent,
but also on whether it is anticipated or unanticipated. Eq. (2.2) can be given a
special form to characterize these different thought experiments, as the following
examples indicate.
Temporary versus permanent shocks. The shock $u_t$ is purely temporary when
$\theta_0 = 1$ and $\theta_i = 0$ for $i > 0$. Then any shock $u_t$ is expected to disappear in the
period immediately after it has occurred; that is, $E_t u_{t+i} = 0$ for $i > 0$ at every
realization of $u_t$. At the other extreme the shock $u_t$ is permanent when $\theta_i = 1$ for
all $i \ge 0$. Then any shock $u_t$ is expected to remain forever; that is, $E_t u_{t+i} = u_t$ for
$i > 0$ at every realization of $u_t$. In this permanent case the $u_t$ process can be
written as $u_t = u_{t-1} + \varepsilon_t$. (Although $u_t$ is not a stationary process in this case, the
solution can still be used for thought experiments, or transformed into a sta-
tionary series by first-differencing.)

By setting $\theta_i = \rho^i$, a range of intermediate persistence assumptions can be
modeled as $\rho$ varies from 0 to 1. For $0 < \rho < 1$ the shock $u_t$ is assumed to phase
out geometrically. In this case the $u_t$ process is simply $u_t = \rho u_{t-1} + \varepsilon_t$, a first-
order autoregressive model. When $\rho = 0$, the disturbances are purely temporary.
When $\rho = 1$, they are permanent.
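The equivalence between the parameterization $\theta_i = \rho^i$ and the first-order autoregression can be checked numerically, as in this short sketch (illustrative values).

import numpy as np

rho, n = 0.6, 12
theta = rho ** np.arange(n)            # MA coefficients theta_i = rho**i

rng = np.random.default_rng(4)
eps = rng.normal(size=n)
u_rec = np.zeros(n)                    # u_t = rho*u_{t-1} + eps_t
for t in range(n):
    u_rec[t] = (rho * u_rec[t - 1] if t else 0.0) + eps[t]

u_ma = np.array([theta[:t + 1][::-1] @ eps[:t + 1] for t in range(n)])
print(np.allclose(u_rec, u_ma))        # True: the two forms coincide
print(theta[:5])                       # expected phase-out path E_t u_{t+i}/u_t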
Anticipated versus unanticipated shocks. In policy applications it is also im-
portant to distinguish between anticipated and unanticipated shocks. Time delays
between the realization of the shock and its incorporation in the current informa-
tion set can be introduced for this purpose by setting $\theta_i = 0$ for values of $i$ up to
the length of time of anticipation. For example, in the case of a purely temporary
shock, we can set $\theta_0 = 0$, $\theta_1 = 1$, $\theta_i = 0$ for $i > 1$, so that $u_t = \varepsilon_{t-1}$. This would
characterize a temporary shock which is anticipated one period in advance. In
other words the expectation of $u_{t+1}$ at time $t$ is equal to $u_{t+1}$ because $\varepsilon_t = u_{t+1}$ is
in the information set at time $t$. More generally a temporary shock anticipated $k$
periods in advance would be represented by $u_t = \varepsilon_{t-k}$.

A permanent shock which is anticipated $k$ periods in advance would be
modeled by setting $\theta_i = 0$ for $i = 0,1,\ldots,k-1$ and $\theta_i = 1$ for $i = k, k+1,\ldots$.
Table 1. Summary of alternative policies and their effects. Interpretation: the
policy is anticipated $k$ periods in advance ($k = 0$ means unanticipated) and is
phased out at geometric rate $\rho$, $0 \le \rho \le 1$; $\rho = 0$ means purely temporary
(N.B. $\rho^0 = 1$ when $\rho = 0$), and $\rho = 1$ means permanent.
In order to find a solution for $y_t$ (that is, a stochastic process for $y_t$ which satisfies
the model (2.1) and (2.2)), we begin by representing $y_t$ in the unrestricted infinite
moving average form

$$y_t = \sum_{i=0}^{\infty}\gamma_i\varepsilon_{t-i}. \qquad (2.4)$$
Finding a solution for $y_t$ then requires determining values for the undetermined
coefficients $\gamma_i$ such that eq. (2.1) and (2.2) are satisfied. Current and past $\varepsilon_t$
represent the entire history of the perturbations to the model. Eq. (2.4) simply
states that $y_t$ is a general function of all possible events that may potentially
influence $y_t$. The linear form is used in (2.4) because the model (2.2) is linear.
Note that the solution for $y_t$ in eq. (2.4) can easily be used to calculate the effect
of a one time unit shock to $\varepsilon_t$. The dynamic impact of such a shock is simply
$\partial y_{t+i}/\partial\varepsilon_t = \gamma_i$.
To find the unknown coefficients, the most direct procedure is to substitute for
$y_t$ and $E_t y_{t+1}$ in (2.1) using (2.4) and solve for the $\gamma_i$ in terms of $\alpha$, $\delta$, and $\theta_i$. The
conditional expectation $E_t y_{t+1}$ is obtained by leading (2.4) by one period and
taking expectations, making use of the equalities $E_t\varepsilon_{t+i} = 0$ for $i > 0$ and
$E_t\varepsilon_{t+i} = \varepsilon_{t+i}$ for $i \le 0$. The first
equality follows from the assumption that $\varepsilon_t$ has a zero unconditional mean and
is uncorrelated; the second follows from the fact that $\varepsilon_{t+i}$ for $i < 0$ is in the
conditioning set at time $t$. The conditional expectation is

$$E_t y_{t+1} = \sum_{i=0}^{\infty}\gamma_{i+1}\varepsilon_{t-i}. \qquad (2.5)$$

Substituting (2.2), (2.4), and (2.5) into (2.1) gives

$$\sum_{i=0}^{\infty}\gamma_i\varepsilon_{t-i} = \alpha\sum_{i=0}^{\infty}\gamma_{i+1}\varepsilon_{t-i} + \delta\sum_{i=0}^{\infty}\theta_i\varepsilon_{t-i}. \qquad (2.6)$$

Equating the coefficients of $\varepsilon_t, \varepsilon_{t-1}, \varepsilon_{t-2},\ldots$ on both sides of the equality (2.6)
results in the set of equations

$$\gamma_i = \alpha\gamma_{i+1} + \delta\theta_i, \qquad i = 0,1,2,\ldots. \qquad (2.7)$$

The first equation in (2.7) for $i = 0$ equates the coefficients of $\varepsilon_t$ on both sides of
(2.6); the second equation similarly equates the coefficients of $\varepsilon_{t-1}$, and so on.
Note that (2.7) is a deterministic difference equation in the $\gamma_i$ coefficients with
$\theta_i$ as a forcing variable. This deterministic difference equation has the same
structure as the stochastic difference eq. (2.1). It can be thought of as a
deterministic perfect foresight model of the "variable" $\gamma_i$. Hence, the problem of
solving a stochastic difference equation with conditional expectations of future
variables has been converted into a problem of solving a deterministic difference
equation.
Consider first the most elementary case where $u_t = \varepsilon_t$; that is, $\theta_i = 0$ for $i \ge 1$.
This is the case of unanticipated shocks which are temporary. Then eq. (2.7) can
be written

$$\gamma_0 = \alpha\gamma_1 + \delta, \qquad (2.8)$$

$$\gamma_{i+1} = \frac{1}{\alpha}\gamma_i, \qquad i = 1,2,\ldots. \qquad (2.9)$$
From eq. (2.9) all the $\gamma_i$ for $i > 1$ can be obtained once we have $\gamma_1$. However, eq.
(2.8) gives only one equation in the two unknowns $\gamma_0$ and $\gamma_1$. Hence without
further information we cannot determine the $\gamma_i$ coefficients uniquely. The number
of unknowns is one greater than the number of equations. This indeterminacy is
what leads to non-uniqueness in rational expectations models and has been
studied by many researchers, including Blanchard (1979), Flood and Garber
(1980), McCallum (1983), Gourieroux, Laffont, and Monfort (1982), Taylor
(1977), and Whiteman (1983).
If $|\alpha| \le 1$ then the requirement that $y_t$ be a stationary process will be sufficient
to yield a unique solution. (The case where $|\alpha| > 1$ is considered below in Section
2.1.4.) To see this suppose that $\gamma_1 \ne 0$. Since eq. (2.9) is an unstable difference
equation, the $\gamma_i$ coefficients will explode as $i$ gets large. But then $y_t$ would not be
a stationary stochastic process. The only value of $\gamma_1$ that will prevent the $\gamma_i$ from
exploding is $\gamma_1 = 0$. From (2.9) this in turn implies that $\gamma_i = 0$ for all $i > 1$. From
eq. (2.8) we then have that $\gamma_0 = \delta$. Hence, the unique stationary solution is simply
$y_t = \delta\varepsilon_t$. In this case, the impact of a unit shock, $\partial y_{t+s}/\partial\varepsilon_t$, is equal to $\delta$ for $s = 0$
and is equal to 0 for $s \ge 1$. This simple impact effect is illustrated in Figure 1a.
(The more interesting charts in Figures 1b, 1c, and 1d will be described below.)
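The role of the stationarity requirement can be seen numerically: in the sketch below (illustrative values $\alpha = 0.5$, $\delta = 1$), any nonzero $\gamma_1$ makes the coefficients from (2.9) grow like $(1/\alpha)^i$, while $\gamma_1 = 0$ yields the unique stationary solution with $\gamma_0 = \delta$.

import numpy as np

alpha, delta = 0.5, 1.0

def gamma_path(gamma1, n=12):
    g = [alpha * gamma1 + delta, gamma1]   # gamma_0 from (2.8), then gamma_1
    for _ in range(n - 2):
        g.append(g[-1] / alpha)            # eq. (2.9): unstable recursion
    return np.array(g)

print(gamma_path(0.0))    # delta, 0, 0, ...: the stationary solution
print(gamma_path(0.01))   # explodes like (1/alpha)**i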
Example

In the case of the Cagan money demand equation this means that the price level is
$p_t = (1+\beta)^{-1}m_t$. Because $\beta > 0$, a temporary unanticipated increase in the
money supply increases the price level by less than the increase in money. This is
due to the fact that the price level is expected to decrease to its normal value
(zero) next period, thereby generating an expected deflation. The expected defla-
tion increases the demand for money so that real balances must increase. Hence,
the price $p_t$ rises by less than $m_t$. This is illustrated in Figure 2a.
For the more general case of unanticipated shifts in $u_t$ that are expected to
phase out gradually we set $\theta_i = \rho^i$, where $\rho < 1$. Eq. (2.7) then becomes

$$\gamma_{i+1} = \frac{1}{\alpha}\gamma_i - \frac{\delta}{\alpha}\rho^i, \qquad i = 0,1,2,3,\ldots. \qquad (2.10)$$

Figure 1. Effect on $y_t$ of (a) an unanticipated temporary shock, (b) an unantic-
ipated shock which is phased out gradually, (c) an anticipated temporary shock,
and (d) an anticipated shock which is phased out gradually; each panel plots
$\partial y_{t+s}/\partial\varepsilon_t$ against $s$.
The solution to (2.10) is the sum of the homogeneous solution and the particular
solution, $\gamma_i = \gamma_i^{(H)} + \gamma_i^{(P)}$. [See Baumol (1970), for example, for a description of
this solution technique for deterministic difference equations.] The homogeneous
part is

$$\gamma_i^{(H)} = c\left(\frac{1}{\alpha}\right)^i \qquad (2.11)$$

for an arbitrary constant $c$; as in the purely temporary case, stationarity requires
that $c = 0$.

Figure 2. (a) Price level effect of an unanticipated unit increase in $m_t$ which lasts
for one period. (b) Price level effect of an unanticipated increase in $m_t$ which is
phased out gradually. (c) Price level effect of an anticipated unit increase in
$m_{t+k}$ which lasts for one period; the increase is anticipated $k$ periods in advance.
(d) Price level effect of an anticipated unit increase in $m_{t+k}$ which is phased out
gradually; the increase is anticipated $k$ periods in advance.
To find the particular solution we substitute $\gamma_i^{(P)} = hb^i$ into (2.10) and solve for
the unknown coefficients $h$ and $b$. This gives:

$$b = \rho, \qquad h = \delta(1-\alpha\rho)^{-1}. \qquad (2.12)$$

Because the homogeneous solution is identically equal to zero, the sum of the
homogeneous and the particular solutions is simply

$$\gamma_i = \frac{\delta\rho^i}{1-\alpha\rho}, \qquad i = 0,1,2,\ldots. \qquad (2.13)$$
Equivalently, in terms of the shock variable,

$$y_t = \frac{\delta}{1-\alpha\rho}\,u_t. \qquad (2.14)$$

The variable $y_t$ is proportional to the shock $u_t$ at all $t$. The effect of a unit shock
$\varepsilon_t$ is shown in Figure 1b. Note that $y_t$ follows the same type of first-order
stochastic process that $u_t$ does; that is,

$$y_t = \rho y_{t-1} + \frac{\delta}{1-\alpha\rho}\,\varepsilon_t. \qquad (2.15)$$
Example

For the money demand example, eq. (2.14) implies that

$$p_t = \frac{1/(1+\beta)}{1-\dfrac{\beta}{1+\beta}\,\rho}\,m_t = \frac{1}{1+\beta(1-\rho)}\,m_t. \qquad (2.16)$$

As long as $\rho < 1$ the increase in the price level will be less than the increase in the
money supply. The dynamic impact on $p_t$ of a unit shock to the money supply is
shown in Figure 2b. The price level increases by less than the increase in the
money supply because of the expected deflation that occurs as the price level
gradually returns to its equilibrium value of 0. The expected deflation causes an
increase in the demand for real money balances which is satisfied by having the
price level rise less than the money supply. For the special case that $\rho = 1$, a
permanent increase in the money supply, the price level moves proportionately to
money as in the simple quantity theory. In that case there is no change in the
expected rate of inflation since the price level remains at its new level.
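The solution (2.13) and the money demand example (2.16) can be reproduced in a few lines; the parameter values are illustrative.

import numpy as np

beta, rho = 4.0, 0.6
alpha, delta = beta / (1 + beta), 1 / (1 + beta)   # Cagan mapping into (2.1)

s = np.arange(8)
gamma = delta * rho ** s / (1 - alpha * rho)       # eq. (2.13): dp_{t+s}/d eps_t
print(gamma[0], 1 / (1 + beta * (1 - rho)))        # impact effect; both equal (2.16)
print(gamma)                                       # geometric phase-out (Figure 2b)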
If $|\alpha| > 1$, then simply requiring that $y_t$ be a stationary process will not yield a
unique solution. In this case eq. (2.9) is stable, and any value of $\gamma_1$ will give a
stationary time series. There is a continuum of solutions, and it is necessary to
place additional restrictions on the model if one wants to obtain a unique solution
for the $\gamma_i$. There does not seem to be any completely satisfactory approach to take
in this case.
One possibility raised by Taylor (1977) is to require that the process for $y_t$ have
a minimum variance. Consider the case where $u_t$ is uncorrelated. The variance of
$y_t$ is given by

$$\operatorname{Var} y_t = \gamma_0^2 + (\gamma_0-\delta)^2(\alpha^2-1)^{-1}. \qquad (2.17)$$

More generally, the full set of solutions can be written as

$$y_t = \sum_{i=0}^{\infty}\gamma_i\varepsilon_{t-i} = (\alpha\gamma_1+\delta)\varepsilon_t + \gamma_1\varepsilon_{t-1} + (\gamma_1/\alpha)\varepsilon_{t-2} + (\gamma_1/\alpha^2)\varepsilon_{t-3} + \cdots. \qquad (2.18)$$

Lagging (2.18) by one time period, multiplying by $\alpha^{-1}$, and subtracting from
(2.18) gives

$$y_t = \alpha^{-1}y_{t-1} + (\alpha\gamma_1+\delta)\varepsilon_t - (\delta/\alpha)\varepsilon_{t-1}, \qquad (2.19)$$

which is an ARMA(1,1) model with a free parameter $\gamma_1$. Clearly if $\gamma_1 = 0$ then this
more general solution reduces to the solution discussed above. But, rather than
imposing this condition, Chow (1983) has suggested that the parameter $\gamma_1$ be
estimated, and has developed an appropriate econometric technique. Evans and
Honkapohja (1984) use a similar procedure for representing ARMA models in
terms of a free parameter.
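The variance expression (2.17) can be verified numerically by summing the squared moving average coefficients implied by a given free parameter $\gamma_1$ (illustrative values).

import numpy as np

alpha, delta, gamma1 = 2.0, 1.0, 0.3    # |alpha| > 1, so gamma_1 is free

gamma0 = alpha * gamma1 + delta                       # (2.8)
coeffs = gamma1 / alpha ** np.arange(200)             # gamma_i, i >= 1
direct = gamma0 ** 2 + (coeffs ** 2).sum()            # Var y_t by summation
formula = gamma0 ** 2 + (gamma0 - delta) ** 2 / (alpha ** 2 - 1)   # eq. (2.17)
print(direct, formula)                                # agree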
Are there any economic examples where $|\alpha| > 1$? In the case of the Cagan
money demand equation, $\alpha = \beta/(1+\beta)$, which is always less than 1 since $\beta$ is a
positive parameter. One economic example where $\alpha > 1$ is a flexible-price macro-
economic model with money in the production function. Such a model consists of
three equations: a money demand equation of the Cagan form augmented with
real output; an "IS" equation (2.21) indicating that real output $z_t$ is negatively
related to the real rate of interest $i_t - (E_t p_{t+1} - p_t)$; and an equation (2.22) in
which $z_t$ is positively related to real money balances $m_t - p_t$. Here $z_t$ is real
output, $i_t$ is the nominal interest rate, and the other variables are as defined in the
earlier discussion of the Cagan model. The difference between this model and the
Cagan model (in eq. (2.3)) is that output is a positive function of real money
balances. The model can be written in the form of eq. (2.1) with

$$\alpha = \frac{\beta}{1+\beta-d(a+\beta c-1)}, \qquad (2.23)$$

where $a$, $c$, and $d$ are the slope parameters of the three equations. Eq. (2.23) is
equal to the value of $\alpha$ in the Cagan model when $d = 0$. In the more general case
where $d > 0$ and money is a factor in the production function, the parameter $\alpha$
can be greater than one. This example was explored in Taylor (1977). Another
economic example, which arises in an overlapping generations model of money,
was investigated by Blanchard (1979).
Although there are examples of non-uniqueness such as these in the literature,
most theoretical and empirical applications in economics have the property that
there is a unique stationary solution. However, some researchers, such as
Gourieroux, Laffont, and Monfort (1982), have even questioned the appeal to
stationarity. Sargent and Wallace (1973) have suggested that the stability require-
ment effectively rules out speculative bubbles. But there are examples in history
where speculative bubbles have occurred and some analysts feel they are quite
common. There have been attempts to model speculative bubbles as movements
of y, along a self-fulfilling nonstationary (explosive) path. Blanchard and Watson
(1982) have developed a model of speculative bubbles in which there is a positive
probability that the bubble will burst. Flood and Garber (1980) have examined
whether the periods toward the end of the eastern European hyperinflations in the
1920s could be described as self-fulfilling speculative bubbles. To date, however,
the vast majority of rational expectations research has assumed that there is a
unique stationary solution. For the rest of this chapter we assume that $|\alpha| < 1$, or
the equivalent in higher order models, and we assume that the solution is
stationary.
Consider now the case where the shock is anticipated $k$ periods in advance and is
purely temporary. That is, $u_t = \varepsilon_{t-k}$, so that $\theta_k = 1$ and $\theta_i = 0$ for $i \ne k$. The
difference equations (2.7) in the unknown parameters then imply that $\gamma_i = \alpha\gamma_{i+1}$
for $i \ne k$ and

$$\gamma_k = \alpha\gamma_{k+1} + \delta. \qquad (2.25)$$

Stationarity requires $\gamma_i = 0$ for $i > k$, so that $\gamma_k = \delta$ and $\gamma_i = \delta\alpha^{k-i}$ for $i \le k$.
The solution is thus

$$y_t = \delta\left[\alpha^k\varepsilon_t + \alpha^{k-1}\varepsilon_{t-1} + \cdots + \alpha\varepsilon_{t-(k-1)} + \varepsilon_{t-k}\right]. \qquad (2.27)$$

Substituting $\alpha = \beta/(1+\beta)$, $\delta = 1/(1+\beta)$, and $\varepsilon_t = u_{t+k} = m_{t+k}$ into (2.27) gives
the corresponding expression for the price level.

For the case where the shock is anticipated $k$ periods in advance and is then
expected to phase out geometrically, $\theta_i = \rho^{i-k}$ for $i \ge k$, the coefficients for $i \ge k$
satisfy

$$\gamma_{i+1} = \frac{1}{\alpha}\gamma_i - \frac{\delta}{\alpha}\rho^{i-k}, \qquad i = k, k+1,\ldots. \qquad (2.30)$$

Note that eq. (2.30) is identical to eq. (2.10) except that the initial condition starts
at $k$ rather than 0. The homogeneous part of (2.30) is again explosive and must be
suppressed, and the particular solution has

$$h = \delta(1-\alpha\rho)^{-1}, \qquad b = \rho. \qquad (2.32)$$

After the immediate impact of the announcement, $y_t$ will grow smoothly until it
equals $\delta(1-\alpha\rho)^{-1}$ at the time that $u_t$ increases. The effect then phases out
geometrically. This pattern is illustrated in Figure 1d.
Example

For the money demand model, the effect on the price level $p_t$ is shown in Figure
2d. As before, the anticipation of an increase in the money supply causes the price
level to jump. The price level then increases gradually until the increase in money
actually occurs. During the period before the actual increase in money, the level
of real balances is below equilibrium because of the expected inflation. The initial
increase becomes larger as the phase-out parameter $\rho$ gets larger. For the
permanent case where $\rho = 1$ the price level eventually increases by the same
amount that the money supply increases.
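The full anticipated, phased-out path of Figure 1d, an explosive segment up to period $k$ followed by a geometric phase-out, can be computed as in this sketch (illustrative Cagan-style parameter values).

import numpy as np

def anticipated_path(alpha, delta, rho, k, n=12):
    # peak delta/(1 - alpha*rho) is reached at i = k; before that the path
    # rises along the root 1/alpha, and afterwards decays at rate rho
    peak = delta / (1 - alpha * rho)
    i = np.arange(n)
    return np.where(i <= k, peak * alpha ** (k - i), peak * rho ** (i - k))

beta, rho, k = 4.0, 0.5, 3
alpha, delta = beta / (1 + beta), 1 / (1 + beta)
print(anticipated_path(alpha, delta, rho, k))   # jump, smooth rise, phase-out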
The above solution procedure can be generalized to handle the case where (2.2) is
an autoregressive moving average (ARMA) model. We consider only unantic-
ipated shocks where there is no time delay. Suppose the error process is
an ARMA($p$, $q$) model. The coefficients in the linear process for $u_t$ in the form
of (2.2) can be derived from

$$\theta_j = \psi_j + \sum_{i=1}^{\min(j,p)}\phi_i\theta_{j-i}, \qquad j = 0,1,2,\ldots,q,$$

$$\theta_j = \sum_{i=1}^{\min(j,p)}\phi_i\theta_{j-i}, \qquad j = q+1, q+2,\ldots,$$

where the $\phi_i$ are the autoregressive parameters and the $\psi_j$ the moving average
parameters of the $u_t$ process (with $\psi_0 = 1$). The initial conditions $(\gamma_{M-1},\ldots,\gamma_{M-p})$
for (2.37), as well as the remaining $\gamma_i$, are obtained from

$$\gamma_{i+1} = \frac{1}{\alpha}\gamma_i - \frac{\delta}{\alpha}\theta_i, \qquad i = 0,1,\ldots,M-1. \qquad (2.38)$$

Comparing the form of (2.37) and (2.38) with (2.36) indicates that the $\gamma_i$
coefficients can be interpreted as the infinite moving average representation of an
ARMA($p$, $M-1$) model. That is, the solution for $y_t$ is an ARMA($p$, $M-1$)
model with an autoregressive part equal to the autoregressive part of the $u_t$
process defined in eq. (2.35). This result is found in Gourieroux, Laffont, and
Monfort (1982). The methods of Hansen and Sargent (1980) and Taylor (1980a)
can be used to compute this ARMA representation directly. For example, when
$p = 3$ and $q = 1$ the $\theta_i$ coefficients are

$$\theta_0 = 1, \quad \theta_1 = \psi_1 + \phi_1\theta_0, \quad \theta_2 = \phi_1\theta_1 + \phi_2\theta_0, \quad
\theta_i = \phi_1\theta_{i-1} + \phi_2\theta_{i-2} + \phi_3\theta_{i-3}, \quad i = 3,4,\ldots. \qquad (2.39)$$
The same approach applies when expectations with earlier viewpoint dates appear
in the model. Consider, for example, a version of (2.1) in which expectations
computed at time $t-1$ also appear:

$$y_t = \alpha_1 E_t y_{t+1} + \alpha_2 E_{t-1} y_t + \alpha_3 E_{t-1} y_{t+1} + \delta u_t. \qquad (2.42)$$

Substituting for $y_t$ and the expectations of $y_t$ from (2.4) into (2.42) results in a set of
equations for the $\gamma$ coefficients much like the equations that we studied above.
Suppose $u_t = \rho u_{t-1} + \varepsilon_t$. Then the equations for $\gamma$ are

$$\gamma_0 = \alpha_1\gamma_1 + \delta,$$

$$\gamma_{i+1} = \frac{1-\alpha_2}{\alpha_1+\alpha_3}\gamma_i - \frac{\delta\rho^i}{\alpha_1+\alpha_3}, \qquad i = 1,2,\ldots. \qquad (2.43)$$

Hence, we can use the same procedures for solving this set of difference equa-
tions. The solution is

$$\gamma_0 = \alpha_1 b\rho + \delta, \qquad \gamma_i = b\rho^i, \quad i = 1,2,\ldots,$$

where $b = \delta/(1-\alpha_2-\rho\alpha_1-\rho\alpha_3)$. Note that this reduces to (2.13) when $\alpha_2 = \alpha_3 = 0$.
The solution of the difference eq. (2.7) that underlies this technique has an
intuitive graphical interpretation which corresponds to the phase diagram method
used to solve continuous time models with rational expectations. [See Calvo
(1980) or Dixit (1980), for example.] Eq. (2.7) can be written

$$\gamma_{i+1} - \gamma_i = \left(\frac{1}{\alpha}-1\right)\gamma_i - \frac{\delta}{\alpha}\theta_i, \qquad i = 0,1,\ldots. \qquad (2.44)$$

The set of values for which $\gamma_i$ is not changing is given by setting the right-hand
side of (2.44) to zero. These values of $(\gamma_i, \theta_i)$ are plotted in Figure 3. In the case
where $\theta_i = \rho^i$, for $0 < \rho < 1$, there is a difference equation representation for $\theta_i$ of
the form

$$\theta_{i+1} - \theta_i = (\rho-1)\theta_i. \qquad (2.45)$$

Figure 3. Illustration of the rational expectations solution and the saddle path.
Along the saddle path the motion is towards the origin at geometric rate $\rho$; that
is, $\theta_i = \rho\theta_{i-1}$.
Recall from the solution (2.13) that along the stationary path $\gamma_i = (\delta/(1-\alpha\rho))\theta_i$.
This linear equation is shown as the straight line with the arrows in Figure 3. This
line balances off the unstable vertical forces and uses the stable horizontal forces
to bring $\gamma_i$ back to the values $\gamma_i = 0$ and $\theta_i = 0$ as $i \to \infty$. For this reason it is
called a saddle point and corresponds to the notion of a saddle path in differential
equation models [see Birkhoff and Rota (1962), for example].
Figure 3 is special in the sense that one of the zero-change lines is perfectly
vertical. This is due to the fact that the shock variable $u_t$ is exogenous to $y_t$. If we
interpret (2.1) and (2.2) as a two variable system with $y_t$ and $u_t$ as the
two variables, then the system is recursive in that $u_t$ affects $y_t$ in the current
period and there are no effects of past $y_t$ on $u_t$. In Section 2.2 we consider a more
general two variable system in which $u_t$ is endogenous.

In using Figure 3 for thought experiments about the effect of one time shocks,
recall that $\gamma_i$ is $\partial y_{t+i}/\partial\varepsilon_t$ and $\theta_i$ is $\partial u_{t+i}/\partial\varepsilon_t$. The vertical axis thereby gives the
paths of the endogenous variable $y_t$ corresponding to a shock $\varepsilon_t$ to the policy eq.
(2.2). The horizontal axis gives the path of the policy variable. The points in
Figure 3 can therefore be viewed as displacements of $y_t$ and $u_t$ from their steady
state values in response to a one-time unit shock.
The arrows in Figure 3 show that the saddle path line must have a slope greater
than zero and a slope less than that of the zero-change line for $\gamma$. That is, the
saddle path line must lie in the shaded region of Figure 3. Only in this region is
the direction of motion toward the origin. The geometric technique to determine
whether the saddle path is upward or downward sloping is frequently used in
practice to obtain the sign of an impact effect of policy. [See Calvo (1980), for
example.]
In Figure 4 the same diagram is used to determine the qualitative movement of
$y_t$ in response to a shock to $u_t$ which is anticipated $k$ periods in advance and
which is expected to then phase out geometrically. This is the case considered

Figure 4. Illustration of the effect of an anticipated shock to $u_t$ which is then
expected to be phased out gradually at geometric rate $\rho$. The shock is antic-
ipated $k$ periods in advance. This thought experiment corresponds to the chart
in Figure 1(d).
above in Section 2.1.5. The endogenous variable y initially jumps at time 0 when
the future increase in u becomes known; it then moves along an explosive path
through period k when u increases by 1 unit. From time k on the motion is along
the saddle path as y and u approach their steady state values of zero.
A similar analysis applies to changes in the growth rate of the money supply.
Suppose that the growth rate $g_t$ of money follows

$$g_t - g_{t-1} = \rho(g_{t-1} - g_{t-2}) + \varepsilon_{t-k}.$$

Thus, the change in the growth rate is anticipated $k$ periods in advance. The
new growth rate is phased in at a geometric rate $\rho$. By solving the model for the
particular solution corresponding to this equation, one can solve for the price
level and the inflation rate. In this case, the inflation rate is nonstationary, but the
change in the inflation rate is stationary.
2.2. Bivariate models

Consider now the bivariate model

$$y_{1t} = \alpha_1 E_t y_{1t+1} + \beta_{10}y_{2t} + \beta_{11}y_{2t-1} + \delta_1 u_t,$$
$$y_{2t} = \alpha_2 E_t y_{1t+1} + \beta_{20}y_{1t} + \beta_{21}y_{2t-1} + \delta_2 u_t, \qquad (2.46)$$
where $u_t$ is a shock variable of the form (2.2). Model (2.46) is a special bivariate
model in that there are no lagged values of $y_{1t}$ and no lead values of $y_{2t}$. This
asymmetry is meant to convey the continuous time idea that one variable, $y_{1t}$, is a
"jump" variable, unaffected by its past, while $y_{2t}$ is a more slowly adjusting
variable that is influenced by its past values. Of course in discrete time all
variables tend to jump from one period to the next so that the terminology is not
exact. Nevertheless, the distinction is important in practice. Most commonly, $y_{1t}$
would be a price and $y_{2t}$ a stock which cannot change without large costs in the
short run.
We assume in (2.46) that there is only one shock $u_t$. This is for notational
convenience. The generalization to a bivariate shock $(u_{1t}, u_{2t})$, where $u_{1t}$ appears
in the first equation and $u_{2t}$ in the second equation, is straightforward, as should
be clear below.
Because (2.46) has this special form it can be reduced to a first-order
2-dimensional vector process:

$$\begin{pmatrix} E_t y_{1t+1} \\ y_{2t} \end{pmatrix} = A\begin{pmatrix} y_{1t} \\ y_{2t-1} \end{pmatrix} + d\,u_t, \qquad (2.47)$$

where the matrix $A$ and the vector $d$ depend on the parameters of (2.46).
This particular way to construct a first-order process follows that of Blanchard
and Kahn (1980). A generalization to the case of viewpoint dates earlier than time
$t$ is fairly straightforward. If $y_{1t-1}$ or $E_t y_{2t+1}$ also appeared in (2.46) then a
first-order model would have to be more than 2-dimensional.
There are many interesting examples of this simple bivariate model. Five of these
are summarized below.
Example 1: Exchange rate overshooting

$$m_t - p_t = -\alpha(E_t e_{t+1} - e_t),$$
$$p_t - p_{t-1} = \beta(e_t - p_t),$$

where $e_t$ is the log of the exchange rate, and $p_t$ and $m_t$ are as defined in the
Cagan model. The first equation is simply the demand for money as a function of
the nominal interest rate. In a small open economy with perfect capital mobility
the nominal interest rate is equal to the world interest rate (assumed fixed)
plus the expected rate of depreciation $E_t e_{t+1} - e_t$. The second equation describes
the slow adjustment of prices in response to the excess demand for goods. Excess
demand is assumed to be a negative function of the relative price of home goods.
Here prices adjust slowly and the exchange rate is a jump variable. This model is
of the form (2.46) with $y_{1t} = e_t$, $y_{2t} = p_t$, $\alpha_1 = 1$, $\beta_{10} = -1/\alpha$, $\beta_{11} = 0$, $\delta_1 = 1/\alpha$,
$\alpha_2 = 0$, $\beta_{20} = \beta/(1+\beta)$, $\beta_{21} = 1/(1+\beta)$, $\delta_2 = 0$.
Example 2: Open economy portfolio balance model

$$e_t + f_t = \alpha(E_t e_{t+1} - e_t) + u_t,$$
$$f_t - f_{t-1} = \beta e_t.$$

The first equation represents the demand for foreign assets $f_t$ (in logs), evaluated
in domestic currency, as a function of the expected rate of depreciation. Here $u_t$
is a shock. The second equation is the "current account" (the proportional change
in the stock of foreign assets) as a function of the exchange rate. Prices are
assumed to be fixed and out of the picture. This model reduces to (2.46) with
$y_{1t} = e_t$, $y_{2t} = f_t$, $\alpha_1 = \alpha/(1+\alpha)$, $\beta_{10} = -1/(1+\alpha)$, $\beta_{11} = 0$, $\delta_1 = 1/(1+\alpha)$,
$\alpha_2 = 0$, $\beta_{20} = \beta$, $\beta_{21} = 1$, $\delta_2 = 0$.
Example 3: Money and capital

Fischer (1979) developed the following type of model of money and capital:

$$y_t = \gamma k_{t-1},$$
$$r_t = -(1-\gamma)k_{t-1},$$

together with a pair of portfolio demand equations for capital and real money
balances as functions of the rates of return on these two assets. The first two
equations describe output $y_t$ and the marginal efficiency of capital $r_t$ as a
function of the stock of capital at the end of period $t-1$. Lucas (1976) considered
a very similar model. Substituting the first two equations into the third and
fourth we get model (2.46) with $y_{1t} = p_t$ and $y_{2t} = k_t$.
Example 5: Labor demand

Consider a firm that chooses employment $n_t$ to maximize expected discounted
profits

$$E_t\sum_{j=0}^{\infty}\rho^j\Big[p_{t+j}y_{t+j} - \tfrac{1}{2}(n_{t+j}-n_{t+j-1})^2 - w_{t+j}n_{t+j}\Big],$$

subject to the linear production function $y_t = \gamma n_t$. The random variables $p_t$ and
$w_t$ are the price of output and the wage, respectively. The first order conditions of
this maximization problem take the same form as the models considered above.
This model is essentially the same as that in Example (4), where $u_t = w_t - \gamma p_t$.
Equation (2.47) is a vector version of the univariate eq. (2.1). The technique for
finding a solution to (2.47) is directly analogous with the univariate case.
The solution can be represented as

$$y_{1t} = \sum_{i=0}^{\infty}\gamma_{1i}\varepsilon_{t-i}, \qquad y_{2t} = \sum_{i=0}^{\infty}\gamma_{2i}\varepsilon_{t-i}. \qquad (2.48)$$
where the definitions of the matrices $B$ and $C$ and the vectors $z_t$ and $\delta$ in (2.49)
should be clear, and where $A = C^{-1}B$ and $d = -C^{-1}\delta$. Let $\gamma_i = (\gamma_{1i}, \gamma_{2,i-1})'$,
$i = 0,1,2,\ldots$, and set $\gamma_{2,-1} = 0$. Substitution of (2.2) and (2.48) into (2.50) gives

$$\gamma_{i+1} = A\gamma_i + d\theta_i, \qquad i = 0,1,2,\ldots. \qquad (2.51)$$
Eq. (2.51) is analogous to eq. (2.7). For $i = 0$ we have three unknown elements of
the vectors $\gamma_0 = (\gamma_{10}, 0)'$ and $\gamma_1 = (\gamma_{11}, \gamma_{20})'$. The 3 unknowns are $\gamma_{10}$,
$\gamma_{11}$, and $\gamma_{20}$. However, there are only two equations (at $i = 0$) in (2.51) that can be
used to solve for these three parameters. Much as in the scalar case, considering
$i = 1$ gives two more equations, but it also gives two more unknowns $(\gamma_{12}, \gamma_{21})$; the
same is true for $i = 2$ and so on. To determine the solution for the $y_t$ process we
therefore need another equation. As in the scalar case this third equation comes
by imposing stationarity on the processes for $y_{1t}$ and $y_{2t}$, or equivalently in this
context by preventing either element of $\gamma_i$ from exploding. For uniqueness we will
require that one root of $A$ be greater than one in modulus, and one root be less
than one in modulus. The additional equation thus comes from choosing $\gamma_1 =
(\gamma_{11}, \gamma_{20})'$ so that $\gamma_i$ does not explode as $i \to \infty$. This condition implies a unique
linear relationship between $\gamma_{11}$ and $\gamma_{20}$. This relationship is the extra equation. It
is the analogue of setting the scalar $\gamma_1 = 0$ in model (2.1).
To see this, we decompose the matrix $A$ into $H^{-1}\Lambda H$, where $\Lambda$ is a diagonal
matrix with $\lambda_1$ and $\lambda_2$ on the diagonal and $H$ is the matrix whose rows are the
characteristic vectors of $A$. Assume that the roots are distinct and that $|\lambda_1| > 1$
and $|\lambda_2| < 1$. Let $\mu_i = (\mu_{1i}, \mu_{2i})' = H\gamma_i$. Then the homogeneous part of (2.51) is

$$\mu_{i+1} = \Lambda\mu_i, \qquad (2.52)$$

so that

$$\mu_{1i} = \lambda_1^i\mu_{10}, \qquad \mu_{2i} = \lambda_2^i\mu_{20}. \qquad (2.53)$$

Preventing $\gamma_i$ from exploding thus requires $\mu_{10} = 0$, or

$$h_{11}\gamma_{11} + h_{12}\gamma_{20} = 0, \qquad (2.54)$$

where $(h_{11}, h_{12})$ is the first row of $H$ and is the characteristic vector of $A$
corresponding to the unstable root $\lambda_1$. Eq. (2.54) is the extra equation. When
combined with (2.51) at $i = 0$ we have 3 linear equations that can be solved for
$\gamma_{10}$, $\gamma_{11}$, and $\gamma_{20}$. From these we can use (2.51), or equivalently (2.53), to obtain the
remaining $\gamma_i$ for $i > 1$. In particular, $\mu_{1i} = 0$ implies that

$$\gamma_{1i} = -\frac{h_{12}}{h_{11}}\gamma_{2,i-1}, \qquad i = 1,2,\ldots. \qquad (2.55)$$

Given the initial values $\gamma_{2i}$ we compute the remaining coefficients from (2.55) and
(2.56).
For an unanticipated temporary shock ($\theta_0 = 1$ and $\theta_i = 0$ for $i > 0$), the forcing
term in (2.51) appears only at $i = 0$, and the difference equation described by (2.51)
for $i > 0$ is homogeneous. Hence the solution given by (2.55), (2.56), and (2.57) is
the complete solution.
For the more general case where $\theta_i = \rho^i$, eq. (2.57) still holds but the difference
equation in (2.51) for $i \ge 1$ has a nonhomogeneous part. The particular solution to
the nonhomogeneous part is of the form $\gamma_i^{(P)} = g\rho^i$, where $g$ is a $2\times 1$ vector.
Substituting this form into (2.51) for $i \ge 1$ and equating coefficients we obtain the
particular solution

$$\gamma_i^{(P)} = (\rho I - A)^{-1}d\,\rho^i. \qquad (2.58)$$

Since eq. (2.55) is the requirement for stability of the homogeneous solution, the
complete solution can be obtained by substituting $\gamma_{11}^{(H)} = \gamma_{11} - \gamma_{11}^{(P)}$ and $\gamma_{20}^{(H)} =
\gamma_{20} - \gamma_{20}^{(P)}$ into (2.54) to obtain

$$h_{11}\big(\gamma_{11} - \gamma_{11}^{(P)}\big) + h_{12}\big(\gamma_{20} - \gamma_{20}^{(P)}\big) = 0. \qquad (2.59)$$

Eq. (2.59) can be combined with (2.57) to obtain $\gamma_{10}$, $\gamma_{11}$, and $\gamma_{20}$. The remaining
coefficients are obtained by adding the appropriate elements of the particular solu-
tions (2.58) to the homogeneous solutions of (2.56) and (2.57).
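The eigenvalue computations of this subsection can be carried out with standard linear algebra routines. The sketch below does so for the exchange rate model of Example 1 with $\alpha = \beta = 1$ (the numerical example used below), recovering the roots 1.707 and 0.293, the saddle-path coefficient $-h_{12}/h_{11} = -0.414$, and the long-run particular solution for a permanent shock.

import numpy as np

alpha, beta = 1.0, 1.0
A = np.array([[1 + beta / (alpha * (1 + beta)), 1 / (alpha * (1 + beta))],
              [beta / (1 + beta), 1 / (1 + beta)]])
d = np.array([-1 / alpha, 0.0])

lam, V = np.linalg.eig(A)
order = np.argsort(-np.abs(lam))           # unstable root first
lam = lam[order]
H = np.linalg.inv(V[:, order])             # rows of H are left eigenvectors of A
h11, h12 = H[0]
print(lam)                                 # [1.707, 0.293]
print(-h12 / h11)                          # -0.414: saddle-path relation (2.55)

g = np.linalg.solve(np.eye(2) - A, d)      # particular solution (2.58) with rho = 1
print(g)                                   # [1, 1]: long-run effects on e and p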
For the case where the shock is anticipated $k$ periods in advance but is purely
temporary ($\theta_i = 0$ for $i < k$, $\theta_k = 1$, $\theta_i = 0$ for $i = k+1,\ldots$), we break up the
difference eq. (2.51) into the equations for $i = 0,\ldots,k-1$ (2.60), the equations at
$i = k$ (2.61), and the stability condition applied from $i = k+1$ onward,

$$\gamma_{1,k+1} = -\frac{h_{12}}{h_{11}}\gamma_{2k}. \qquad (2.63)$$

Once $\gamma_{2k}$ and $\gamma_{1,k+1}$ have been determined, the $\gamma$ values for $i > k$ can be
computed as above in eqs. (2.55) and (2.56). That is,

$$\gamma_{1,i+1} = -\frac{h_{12}}{h_{11}}\gamma_{2i}, \qquad i = k,\ldots.$$

To determine $\gamma_{2k}$ and $\gamma_{1,k+1}$ we solve eq. (2.63) jointly with the $2(k+1)$ equations
in (2.60) and (2.61) for the $2(k+1)+1$ unknowns $\gamma_{10},\ldots,\gamma_{1,k+1}$ and $\gamma_{20},\ldots,\gamma_{2k}$.
(Note how this reduces to the result obtained for the unanticipated case above
when $k = 0$.) A convenient way to solve these equations is to first solve the three
equations consisting of (2.63), the equation obtained by "forecasting" $\gamma_i$ out $k$
periods, and eq. (2.61) for $\gamma_{2k}$, $\gamma_{1,k+1}$, and $\gamma_{10}$. Then the remaining coefficients
can be obtained from the difference equations in (2.60), starting with the calculated
value for $\gamma_{10}$.
The case where θ_i = 0 for i = 0, …, k − 1 and θ_i = ρ^{i−k} for i = k, k+1, … can be
solved by adding the particular solution to the nonhomogeneous equation
in place of (2.62) and solving for the remaining coefficients using eqs. (2.60) and
(2.61) as above. The particular solution of (2.67) is

γ_i^{(P)} = (ρI − A)^{-1}dρ^{i−k},   i = k, k+1, …   (2.68)
For the exchange rate model, (2.51) takes the form given in eqs. (2.69) and (2.70),
with the vector d = (−1/α, 0)′. Suppose that α = 1 and β = 1. Then the characteristic
roots of A are

λ_1 = 1 + (1/2)^{1/2} ≈ 1.71,   λ_2 = 1 − (1/2)^{1/2} ≈ 0.29.

The characteristic vector associated with the unstable root is obtained from
(h_{11}, h_{12})A = λ_1(h_{11}, h_{12}); this gives −h_{12}/h_{11} = −0.414, so that according to eq. (2.56) the coefficients of
the (homogeneous) solution must satisfy

γ_{1,i+1}^{(H)} = −0.414γ_{2i}^{(H)}.   (2.73)
The particular solution is given by the vector (ρI − A)^{-1}dρ^{i−k} as in eq. (2.68).
That is,

γ_{1i}^{(P)} = (0.5 − ρ)ρ^{i−k}/[(1.5 − ρ)(0.5 − ρ) − 0.25],   i = k, k+1, k+2, …,   (2.75)
where k is the number of periods in advance that the shock to the money supply
is anticipated (k = 0 for unanticipated shocks).
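The saddle-point objects just quoted are easy to check numerically. The sketch below (Python with numpy) computes the roots, the saddle-path slope, and the permanent-shock particular solution; the matrix A and vector d are assumptions, namely the α = β = 1 case inferred from the numbers in the text (roots of about 1.71 and 0.29, slope −0.414, d = (−1/α, 0)′), not a transcription of eqs. (2.69)-(2.70).

    import numpy as np

    # Assumed alpha = beta = 1 case of the exchange-rate example; the entries
    # are inferred from the figures quoted in the text, not from the original
    # equations (2.69)-(2.70).
    A = np.array([[1.5, 0.5],
                  [0.5, 0.5]])
    d = np.array([-1.0, 0.0])

    w, V = np.linalg.eig(A.T)        # left characteristic vectors of A
    i1 = int(np.argmax(np.abs(w)))   # position of the unstable root
    print(w[i1])                     # about 1.707 (= lambda_1)
    h11, h12 = V[0, i1], V[1, i1]
    print(-h12 / h11)                # about -0.414, the saddle-path slope in (2.73)

    # Particular solution (rho*I - A)^{-1} d for a permanent shock (rho = 1, k = 0):
    print(np.linalg.solve(1.0 * np.eye(2) - A, d))   # [1. 1.], the long-run effect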
In Tables 2, 3, and 4 and in Figures 5, 6, and 7, respectively, the effects of
temporary unanticipated money shocks (k = 0, ρ = 0), permanent unanticipated
money shocks (k = 0, ρ = 1), and permanent money shocks anticipated 3 periods
in advance (k = 3, ρ = 1) are shown.
Table 2
Effect of an unanticipated temporary increase in money on the exchange rate and
the price level (k = 0, ρ = 0).

Table 3
Effect of an unanticipated permanent increase in money on the exchange rate and
the price level (k = 0, ρ = 1).

Table 4
Effect of a permanent increase in money anticipated 3 periods in advance on the exchange rate
and the price level (k = 3, ρ = 1).

                                         i = 0    1      2      3      4      5      6
Effect on the exchange rate γ_{1i}:       0.28   0.43   0.71   1.21   1.06   1.02   1.00
  particular solution γ_{1i}^{(P)}:         -      -      -      -    1.00   1.00   1.00
  homogeneous solution γ_{1i}^{(H)}:        -      -      -      -    0.06   0.02   0.01
Effect on the price level γ_{2i}:         0.14   0.28   0.50   0.85   0.96   0.99   1.00
  particular solution γ_{2i}^{(P)}:         -      -      -    1.00   1.00   1.00   1.00
  homogeneous solution γ_{2i}^{(H)}:        -      -      -   -0.15  -0.04  -0.01  -0.00
Figure 5. Temporary unanticipated increase in money. (γ_{1i} and γ_{2i} plotted against i; see Table 2.)
Figure 6. Permanent unanticipated increase in money. (γ_{1i} and γ_{2i} plotted against i.)

Figure 7. Permanent increase in money anticipated 3 periods in advance. (γ_{1i} and γ_{2i} plotted against i.)
If the increase in the money supply is anticipated in advance, then the price
level rises and the exchange rate depreciates at the announcement date. Subse-
quently, the price level and e continue to rise. The exchange rate reaches its
lowest value (e reaches its highest value) in the period in which the money increase
actually occurs, and then appreciates back to its new long-run value of 1 (Table 4 and Figure 7). Note that
p and e are on explosive paths from period 0 until period 3.
Subtracting γ_{1i} and γ_{2,i−1} from the first and second equations of (2.51), respectively, results in

Δγ_{1,i+1} = (a_{11} − 1)γ_{1i} + a_{12}γ_{2,i−1},
Δγ_{2i} = a_{21}γ_{1i} + (a_{22} − 1)γ_{2,i−1},   (2.77)

where the a_{jk} are the elements of A. According to (2.77) there are two linear relationships between γ_{1i} and γ_{2,i−1}
consistent with no change in the coefficients: Δγ_{1,i+1} = 0 and Δγ_{2i} = 0. For
example, in the exchange rate model in eq. (2.69), the equations in (2.77) become

Δγ_{1,i+1} = [β/(α(1+β))]γ_{1i} + [1/(α(1+β))]γ_{2,i−1},
Δγ_{2i} = [β/(1+β)]γ_{1i} − [β/(1+β)]γ_{2,i−1}.   (2.78)
Figure 8. Geometric interpretation of the solution in the bivariate model. The darker line is the
saddle point path along which the impact coefficients converge to the equilibrium value of (0,0).
Figure 9. Solution values for the case of temporary unanticipated shocks (k = 0, ρ = 0). The
numbered points are the values of i. See also Table 2 and Figure 5.
The two no-change loci are

γ_{1i} = −(1/β)γ_{2,i−1}   (Δγ_{1,i+1} = 0),
γ_{1i} = γ_{2,i−1}   (Δγ_{2i} = 0),   (2.79)
and are plotted in Figure 8. The arrows in Figure 8 show the directions of motion
according to eq. (2.78) when the no-change relationships in (2.79) are not
satisfied. It is clear from these arrows that if the γ coefficients are to converge to
their equilibrium value (0,0) they must move along the “saddle point” path
shown by the darker line in Figure 8. Points off this line will lead to ever-increasing
values of the γ coefficients. The linear combination of γ_{1i} and γ_{2,i−1} along this
saddle point path is given by the characteristic vector associated with the unstable
root λ_1, as given in general by eq. (2.55) and for this example in eq. (2.73). Note
how Figure 8 immediately shows that the saddle point path is downward sloping.

In Figure 9 the solution values for the impacts on the exchange rate and the price
level are shown for the case of a temporary shock as considered in Table 2 and
Figure 5. In Figures 10 and 11, the solution values are shown for the case where
the increase in money is permanent. The permanent increase shifts the reference
point from (0,0) to (1,1). The point (1,1) is simply the value of the particular
solution in this case. Figure 10 is the case where the permanent increase is
unanticipated; Figure 11 is the anticipated case.

Figure 10. Solution values for a permanent unanticipated increase in the money supply. The open
circles give the (γ_{1i}, γ_{2,i−1}) pairs starting with i = 0.

Figure 11. Solution values for an anticipated permanent increase in the money supply. The open
circles give the (γ_{1i}, γ_{2,i−1}) pairs starting with i = 0.
Note that these diagrams do not give the impact on the exchange rate and the
price level in the same period; they are one period out of synchronization. Hence,
the points do not correspond to a scatter diagram of the effects of a change in
money on the exchange rate and on the price level. It is a relatively simple matter
to deduce a scatter diagram as shown by the open circles in Figures 10 and 11.
As the previous Sections have shown, the problem of solving rational expectations
models is equivalent to solving nonhomogeneous deterministic difference equa-
tions. The homogeneous solution is obtained simply by requiring that the stochas-
tic process for the endogenous variables be stationary. Once this is accomplished,
most of the work comes in obtaining the particular solution to the nonhomoge-
neous part. Lag or lead operators, operator polynomials, and the power series
associated with these polynomials (i.e. generating functions or z-transformations)
have frequently been found useful in solving the nonhomogeneous part of
difference equations [see Baumol (1970) for economic examples]. These methods
have also been useful in rational expectations analysis. Futia (1981) and
Whiteman (1983) have exploited the algebra of z-transforms in solving a wide
range of linear rational expectations models.
To illustrate the use of operators, let F^s x_t = x_{t+s} be the forward lead operator.
Then the scalar equation in the impact coefficients that we considered in eq. (2.7)
can be written

(1 − αF)γ_i = δθ_i.   (2.80)

Consider the case where θ_i = ρ^i and solve for γ_i by operating on both sides by the
inverse of the polynomial (1 − αF). We then have

γ_i = (1 − αF)^{-1}δρ^i = δρ^i/(1 − αρ),   i = 0, 1, 2, …,   (2.81)

where the last equality follows from the algebra of operator polynomials [see for
example Baumol (1970)]. The result is identical to what we found in Section 2.1
using the method of undetermined coefficients to obtain the particular solution.
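As a quick numerical check of (2.81), the sketch below verifies (under illustrative parameter values) that γ_i = δρ^i/(1 − αρ) satisfies the operator equation (2.80), that is, γ_i − αγ_{i+1} = δρ^i.

    # Illustrative values; any |alpha| < 1 and |rho| < 1 will do.
    alpha, delta, rho = 0.8, 1.0, 0.5
    gamma = lambda i: delta * rho**i / (1 - alpha * rho)
    for i in range(10):
        # gamma_i - alpha*gamma_{i+1} should equal delta*rho**i, as in (2.80)
        assert abs(gamma(i) - alpha * gamma(i + 1) - delta * rho**i) < 1e-12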
The procedure easily generalizes to the bivariate case and yields the particular
solution shown in eq. (2.58). It also generalizes to handle other time series
specifications of θ_i.
The operator notation used in (2.80) is standard in difference equation analysis.
In some applications of rational expectations models, a non-standard operator
has been used directly on the basic model (2.1). To see this redefine the operator
F as FE_ty_t = E_ty_{t+1}. That is, F moves the date on the variable but the viewpoint
date in the expectation is held constant. Then eq. (2.1) can be written (note that
E_ty_t = y_t):

(1 − αF)E_ty_t = δu_t.   (2.82)

Operating on both sides by the inverse of (1 − αF) gives

E_ty_t = δ(1 − αF)^{-1}u_t
       = δ(1 + αF + (αF)^2 + ⋯)u_t
       = δ(u_t + αE_tu_{t+1} + α^2E_tu_{t+2} + ⋯)
       = δ(u_t + αρu_t + (αρ)^2u_t + ⋯)
       = δu_t/(1 − αρ),   (2.83)
where we again assume that u_t = ρu_{t−1} + ε_t. Eq. (2.83) gives the same answer
that the previous methods did (again note that E_ty_t = y_t). As Sargent (1979, p.
337) has discussed, the use of this type of operator on conditional expectations
can lead to confusion or mistakes if it is interpreted as a typical lag operator that
shifts all time indexes, including the viewpoint dates. The use of operators on
conventional difference equations like (2.6) is much more straightforward, and
perhaps it is best to think of the algebra in (2.82) and (2.83) in terms of (2.80) and
(2.81).
Whiteman’s (1983) use of the generating functions associated with the operator
polynomials can be illustrated by writing the power series corresponding to eqs.
(2.2) and (2.4):

γ(z) = Σ_{i=0}^∞ γ_i z^i,
θ(z) = Σ_{i=0}^∞ θ_i z^i.   (2.84)
These are the z-transforms [see Dhrymes (1971) for a short introduction to
z-transforms and their use in econometrics]. Equating the coefficients of ε_{t−i} in
eq. (2.6) is thus the same as equating the coefficients of powers of z. That is, (2.6)
means that

γ(z) = (1 − α^{-1}z)^{-1}(γ_0 − δα^{-1}zθ(z)).   (2.85)

As in Section 2.1, eq. (2.85) has a free parameter γ_0, which must be determined
before γ(z) can be evaluated. For y_t to be a stationary process, it is necessary that
γ(z) be a convergent power series (or equivalently an analytic function) for
|z| ≤ 1. The term (1 − α^{-1}z)^{-1} on the right-hand side of (2.85) is divergent if
α^{-1} > 1. Hence, the second term in parentheses must have a factor to “cancel out”
this divergent series. For the case of serially uncorrelated shocks, θ(z) is a
constant θ_0 = 1, so that it is obvious that γ_0 = δ will cancel out the divergent
series. We then have γ(z) = δ, which corresponds with the results in Section 2.1.
Whiteman (1983) shows that in general γ(z) will be convergent when |α| < 1 if
γ_0 = δθ(α). For the unanticipated autoregressive shocks this implies that γ(z) =
δ(1 − ρα)^{-1}(1 − ρz)^{-1}, which is the z-transform of the solution we obtained earlier.
When |α| > 1 there is no natural way to determine γ_0, so we are left with
non-uniqueness as in Section 2.1.
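Whiteman's cancellation argument can also be verified symbolically. The sketch below (using sympy) forms γ(z) from (2.85) with θ(z) = 1/(1 − ρz) and the condition γ_0 = δθ(α), and simplifies the result.

    import sympy as sp

    z, alpha, delta, rho = sp.symbols('z alpha delta rho')
    theta = 1 / (1 - rho * z)              # z-transform of theta_i = rho**i
    gamma0 = delta * theta.subs(z, alpha)  # the condition gamma_0 = delta*theta(alpha)
    gamma = (gamma0 - delta * z / alpha * theta) / (1 - z / alpha)
    # Simplifies to delta/((1 - alpha*rho)*(1 - rho*z)): the divergent factor cancels.
    print(sp.simplify(gamma))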
We noted in Section 2.2 that a first-order bivariate model with one lead variable
could be interpreted as a second-order scalar model with a lead and a lag. That is,

y_t = α_1E_ty_{t+1} + α_2y_{t−1} + δu_t   (2.86)

can be written as a bivariate model and solved using the saddle point stability
method. An alternative approach followed by Sargent (1979), Hansen and
Sargent (1980) and Taylor (1980a) is to work with (2.86) directly. That the two
approaches give the same result can be shown formally.

Substitute for y_t, y_{t−1}, and E_ty_{t+1} in eq. (2.86) using (2.4) to obtain the
equations

γ_1 = (1/α_1)γ_0 − (δ/α_1)θ_0,   (2.87)
γ_{i+1} = (1/α_1)γ_i − (α_2/α_1)γ_{i−1} − (δ/α_1)θ_i,   i = 1, 2, …   (2.88)
As above, we need one more equation to solve for all the γ coefficients. Consider
first the homogeneous part of (2.88). Its characteristic polynomial is

z² − (1/α_1)z + α_2/α_1,   (2.89)

with roots

λ_{1,2} = [1 ± (1 − 4α_1α_2)^{1/2}]/(2α_1),   (2.90)

where λ_1 and λ_2 are the roots of (2.89). The solution to the homogeneous part is
γ_i^{(H)} = k_1λ_1^i + k_2λ_2^i. As we discussed above, in many economic applications one
root, say λ_1, will be larger than 1 in modulus and the other will be smaller than 1
in modulus. Thus, the desired solution to the homogeneous part is achieved by
setting k_1 = 0, so that γ_i^{(H)} = k_2λ_2^i, where k_2 equals the initial condition γ_0^{(H)}.
Equivalently, we can interpret the setting of k_1 = 0 as reducing the characteristic
polynomial (2.89) to (z − λ_2). Thus, the γ coefficients satisfy

γ_i = λ_2γ_{i−1}.   (2.91)

Applying instead the saddle point condition (2.55) to the equivalent bivariate
representation of the model in (2.92), the coefficients satisfy

γ_i = −(h_{12}/h_{11})γ_{i−1} = (1/α_1 − λ_1)γ_{i−1}.   (2.93)
For the two methods to be equivalent, we need to show that (2.91) and (2.93)
are equivalent, or that λ_2 = 1/α_1 − λ_1. This follows immediately from the fact
that the sum of the roots (λ_1 + λ_2) of a second-order polynomial equals the
negative of the coefficient of the linear term in the polynomial: λ_1 + λ_2 = 1/α_1.
For the case where θ_i = ρ^i, we need to compare the particular solutions as well.
For the second-order scalar model we guess the form γ_i^{(P)} = ab^i. Substituting this
into (2.88) we find that b = ρ and a = δ(1 − α_1ρ − α_2ρ^{-1})^{-1}. To see that this
gives the same value for the particular solution that emerges from the matrix
formulation in eq. (2.58), note that

(ρI − A)^{-1}dρ^i = [ρ² − (1/α_1)ρ + α_2/α_1]^{-1}(−(δ/α_1)ρ, −δ/α_1)′ρ^i.   (2.94)

Eq. (2.94) gives the particular solution for the vector (γ_i^{(P)}, γ_{i−1}^{(P)})′, which corresponds
to the vector γ_i^{(P)} in eq. (2.58). Hence,

γ_i^{(P)} = −ρα_1^{-1}δρ^i/(ρ² − ρα_1^{-1} + α_2α_1^{-1}) = δρ^i/(1 − α_1ρ − α_2ρ^{-1}),

which is the particular solution obtained from the second-order scalar representation.
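A one-line numerical check of this equivalence, under illustrative parameter values, confirms that γ_i^{(P)} = δρ^i/(1 − α_1ρ − α_2ρ^{-1}) satisfies the second-order equation (2.88) with θ_i = ρ^i:

    alpha1, alpha2, delta, rho = 0.6, 0.2, 1.0, 0.5   # illustrative values
    a = delta / (1 - alpha1 * rho - alpha2 / rho)
    g = lambda i: a * rho**i                          # candidate particular solution
    for i in range(1, 10):
        # gamma_i = alpha1*gamma_{i+1} + alpha2*gamma_{i-1} + delta*rho**i
        assert abs(g(i) - alpha1 * g(i + 1) - alpha2 * g(i - 1) - delta * rho**i) < 1e-12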
Rather than obtaining the solution of the homogeneous system by factoring the
characteristic equation, one can equivalently factor the polynomial in the time
shift operators. Because the operator polynomials also provide a convenient way
to obtain the nonhomogeneous solution (as was illustrated in Section 2.3), this
approach essentially combines the homogeneous solution and the nonhomogeneous
solution in a notationally and computationally convenient way.

Write (2.88) as

[L^{-1} − 1/α_1 + (α_2/α_1)L]γ_i = −(δ/α_1)θ_i.   (2.95)

Let H(L) = L^{-1} − 1/α_1 + (α_2/α_1)L be the polynomial on the left-hand side of
(2.95). Since λ_1 + λ_2 = 1/α_1 and λ_1λ_2 = α_2/α_1, it factors as
H(L) = L^{-1}(1 − λ_1L)(1 − λ_2L).

The particular solution also can be written using the operator notation:

γ_i^{(P)} = δα_1^{-1}ρ^i/[λ_1(1 − λ_1^{-1}ρ)(1 − λ_2ρ^{-1})].   (2.98)
The first term on the right-hand side of (2.99) equals zero. Therefore the complete
solution is given by

γ_i = λ_2γ_{i−1} + δα_1^{-1}ρ^i/[λ_1(1 − λ_1^{-1}ρ)].   (2.100)
In general, the procedure is thus to: (1) factor the operator polynomial into two polynomials, one involving
positive powers of L (lags) and the other involving negative powers of L (leads),
and (2) operate on both sides of (2.95) by the inverse of the polynomial involving
negative powers of L.

It is clear from (2.100) that the γ_i weights are such that the solution for y_t can be
represented as a first-order autoregressive process with a serially correlated error:

y_t = λ_2y_{t−1} + v_t,   (2.101)

where v_t = Σ_{i=0}^∞(γ_i − λ_2γ_{i−1})ε_{t−i}, with γ_{−1} = 0.
In the papers by Sargent (1979), Taylor (1980a) and Hansen and Sargent
(1980) the difference equation in (2.95) was written with y_i = E_ty_{t+i} and θ_i = E_tu_{t+i}, a
form which can be obtained by taking conditional expectations in eq. (2.86). In
other words, rather than working with the moving average coefficients they worked
directly with the conditional expectations. As discussed in Section 2.3 this
requires the use of a non-standard lag operator.
It is useful to note that the problem of solving rational expectations models can
be thought of as a boundary value problem where final conditions as well as
initial conditions are given. To see this consider the homogeneous equation

γ_{i+1} = (1/α)γ_i,   i = 0, 1, …   (2.102)

The stationarity conditions place a restriction on the “final” value lim_{j→∞}γ_j = 0
rather than on the “initial” value γ_0. As an approximation we want γ_j = 0 for
large j. A traditional method to solve boundary value problems is “shooting”:
one guesses a value for γ_0, and then uses (2.102) to project (shoot) a value of γ_j
for some large j. If the resulting γ_j ≠ 0 (or if γ_j is further from 0 than some
tolerance range) then a new value (chosen in some systematic fashion) of γ_0 is
tried, until one gets γ_j sufficiently close to zero. It is obvious in this case that
γ_0 = 0, so it would be impractical to use such a method. But in nonlinear models
the approach can be quite useful, as we discuss in Section 6.
This approach obviously generalizes to higher order systems; for example, the
homogeneous part of (2.88) is

γ_{i+1} = (1/α_1)γ_i − (α_2/α_1)γ_{i−1},   (2.103)

with γ_{−1} = 0 as one initial condition and γ_j = 0 for some large j as the one
“final” condition. This is a two point boundary problem which can be solved in
the same way as (2.102).
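The sketch below illustrates shooting on this two point boundary problem, using the homogeneous equation (2.103) with illustrative values of α_1 and α_2. The initial condition is normalized to γ_{−1} = 1 rather than 0, so that the bisection has something nontrivial to find; the value it converges to is the saddle-path coefficient γ_0 = λ_2γ_{−1} of eq. (2.91).

    alpha1, alpha2 = 0.6, 0.2      # illustrative values with one root outside the unit circle
    J = 40                         # "large j" at which gamma_j should be near zero

    def shoot(gamma0, gamma_m1=1.0):
        prev, cur = gamma_m1, gamma0
        for _ in range(J):
            prev, cur = cur, cur / alpha1 - (alpha2 / alpha1) * prev
        return cur                 # the value of gamma_J "shot" from the guess gamma_0

    lo, hi = -1.0, 1.0             # bracket for the guess of gamma_0
    for _ in range(60):            # systematic revision of the guess: bisection
        mid = 0.5 * (lo + hi)
        if shoot(lo) * shoot(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    print(0.5 * (lo + hi))         # about 0.2324, the stable root lambda_2 of (2.89)

In this linear case the answer is known analytically, but exactly the same logic carries over to the nonlinear two point boundary problems of Section 6.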
In the rational expectations framework, policy evaluation centers on
the way the policymakers respond to events - that is, on changes in their policy rules.
For this we can make use of the stochastic equilibrium solutions examined in Section
2. We illustrate this below.
Consider the following policy problem, which is based on model (2.1). Suppose
that an econometric policy advisor knows that the demand for money is given by

m_t − p_t = −β(E_tp_{t+1} − p_t) + u_t,   (3.1)

where m_t is the money supply and p_t is the price level.
Here there are two shocks to the system, the supply of money m_t and the demand
for money u_t. Suppose that u_t = ρu_{t−1} + ε_t, and that in the past the money
supply was fixed: m_t = 0; suppose that under this fixed money policy, prices were
thought to be too volatile. The policy advisor is asked by the Central Bank for
advice on how m_t can be used in the future to reduce the fluctuations in the price
level. Note that the policy advisor is not asked just what to do today or tomorrow,
but what to do for the indefinite future. Advice thus should be given as a
contingency rule rather than as a fixed path for the money supply.

Using the solution technique of Section 2, the behavior of p_t during the past is

p_t = ρp_{t−1} − ε_t/[1 + β(1 − ρ)].   (3.2)

Suppose the advisor assumes that expectations will continue to be formed in this
way, so that E_tp_{t+1} = ρp_t whatever the policy. Substituting into (3.1) gives

m_t − p_t = −β(ρp_t − p_t) + u_t,   (3.3)

or, solving for the price level,

p_t = (m_t − u_t)/[1 + β(1 − ρ)].   (3.4)

Considering a feedback policy rule of the form m_t = gu_{t−1}, eq. (3.4) implies

var p_t = σ_ε²[g² + 1 − 2gρ]/{[1 + β(1 − ρ)]²(1 − ρ²)}.   (3.5)

If there were no cost to varying the money supply, then eq. (3.5) indicates that the
best choice for g to minimize fluctuations in p_t is g = ρ.
But we know that (3.5) is incorrect if g ≠ 0. The error was to assume that
E_tp_{t+1} = ρp_t regardless of the choice of policy. This is the expectations error that
rational expectations was designed to avoid. The correct approach would have
been to substitute m_t = gu_{t−1} directly into (3.1) and calculate the stochastic
equilibrium for p_t. This results in

p_t = −[(1 + β(1 − g))/((1 + β)(1 + β(1 − ρ)))]u_t + [g/(1 + β)]u_{t−1}.   (3.6)

Note how the parameters of (3.6) depend on the parameters of the policy rule.
The variance of p_t is

var p_t = (c_1² + c_2² + 2ρc_1c_2)σ_ε²/(1 − ρ²),   (3.7)

where c_1 and c_2 are the coefficients of u_t and u_{t−1} in (3.6).
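The difference between the incorrect calculation (3.5) and the rational expectations calculation (3.6)-(3.7) is easy to see numerically. The sketch below, with illustrative values of β, ρ and σ_ε², minimizes both variance expressions over g; the coefficients c_1 and c_2 are those of the equilibrium (3.6) as reconstructed above.

    import numpy as np

    beta, rho, sigma2 = 1.0, 0.5, 1.0      # illustrative parameter values
    sigma2_u = sigma2 / (1 - rho**2)       # variance of the AR(1) demand shock

    def var_naive(g):                      # eq. (3.5): expectations held fixed
        return sigma2 * (g**2 + 1 - 2 * g * rho) / ((1 + beta * (1 - rho))**2 * (1 - rho**2))

    def var_re(g):                         # eqs. (3.6)-(3.7): expectations adjust with g
        c1 = -(1 + beta * (1 - g)) / ((1 + beta) * (1 + beta * (1 - rho)))
        c2 = g / (1 + beta)
        return (c1**2 + c2**2 + 2 * rho * c1 * c2) * sigma2_u

    grid = np.linspace(-1.0, 2.0, 3001)
    print(grid[np.argmin(var_naive(grid))])  # 0.5 = rho, the advisor's (wrong) optimum
    print(grid[np.argmin(var_re(grid))])     # a different g: the rule shifts expectations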
A reduced form like (3.2), estimated during the old policy regime, therefore cannot be relied on
for many policy evaluation questions. Rather one should model structural relationships.
The parameters of the reduced form are, of course, functions of the
structural parameters in the standard Cowles Commission setup. The discussion
by Marschak (1953), for example, is remarkably similar to the more recent
rational expectations critiques; Marschak did not consider expectations variables,
and in this sense the rational expectations critique is a new extension. But earlier
analyses like Marschak's are an effort to explain why structural modeling is
necessary, and thus have much in common with more recent research.
In the policy evaluation procedure discussed above, the government acts like a
dominant player with respect to the private sector. The government sets g and the
private sector takes g as given. The government then maximizes its social welfare
function across different values of g. One can imagine alternatively a game
theoretic setup in which the government and the private sector each are maximiz-
ing utility. Chow (1983), Kydland (1975), Lucas and Sargent (1981), and Epple,
Hansen, and Roberds (1983) have considered this alternative approach. It is
possible to specify the game theoretic model as a choice of parameters of decision
rules in the steady state or as a formal non-steady state dynamic optimization
problem with initial conditions partly determining the outcome. Alternative
solution concepts including Nash equilibria have been examined.
The game-theoretic approach naturally leads to the important time incon-
sistency problem raised by Kydland and Prescott (1977) and Calvo (1978). Once
the government announces its policy, it will be optimal to change it in the future.
The consistent solution in which everyone expects the government to change is
generally suboptimal. Focussing on rules as in Section 3.1 effectively eliminates
the time inconsistency issue. But even then, there can be temptation to change the
rule.
4. Statistical inference
The statistical inference issues that arise in rational expectations models can be
illustrated in a model like that of Section 2:

y_t = αE_ty_{t+1} + δx_t + u_t,   (4.1)
x_t = ε_t + θ_1ε_{t−1} + ⋯ + θ_qε_{t−q},   (4.2)

where ε_t is serially uncorrelated and we assume that Cov(u_t, ε_s) = 0 for all t and s.
To obtain the full information maximum likelihood estimate of the structural
system (4.1) and (4.2) we need to reduce (4.1) to a form which does not involve
expectations variables. This can be done by solving the model using one of the
techniques described in Section 2. Using the method of undetermined coefficients,
for example, the solution for y_t is

y_t = γ_0ε_t + γ_1ε_{t−1} + ⋯ + γ_qε_{t−q} + u_t,   (4.3)

where the reduced form coefficients are related to the structural parameters by

γ_i = δ(θ_i + αθ_{i+1} + ⋯ + α^{q−i}θ_q),   i = 0, 1, …, q,   with θ_0 = 1.   (4.4)

Eqs. (4.2) and (4.3) together form a two dimensional vector model:

y_t = γ(L)ε_t + u_t,
x_t = θ(L)ε_t,   (4.5)

where γ(L) = γ_0 + γ_1L + ⋯ + γ_qL^q and θ(L) = 1 + θ_1L + ⋯ + θ_qL^q.
Full information estimation of this type has been examined by Mishkin
(1983), Taylor (1979, 1980a), and Wickens (1982), among others. As in this example, the basic
approach is to find a constrained reduced form and maximize the likelihood
function subject to the constraints. Hansen and Sargent (1980, 1981) have
emphasized these cross-equation constraints in their expositions of rational expectations
estimation methods. In Muth (1981), Wickens (1982) and Taylor (1979)
multivariate models were examined in which expectations are dated at t − 1 rather
than t, and E_{t−1}y_t appears in (4.1) rather than E_ty_{t+1}. More general multivariate
models with leads and lags are examined in the other papers.
For full information estimation, it is also important that the relationship
between the structural parameters and the reduced form parameters can be easily
evaluated. In this example the mapping from the structural parameters to the
reduced form parameters is easy to evaluate. In more complex models the
mapping does not have a closed form, usually because the roots of high-order
polynomials must be evaluated.
4.2. Identification
There has been relatively little formal work on identification in rational expecta-
tions models. As in conventional econometric models, identification involves the
properties of the mapping from the structural parameters to the reduced form
parameters. The model is identified if the structural parameters can be uniquely
obtained from the reduced form parameters. Over-identification and under-iden-
tification are similarly defined as in conventional econometric models. In rational
expectations models the mapping from reduced form to structural parameters is
much more complicated than in conventional models and hence it has been
difficult to derive a simple set of conditions which have much generality. The
conditions can usually be derived in particular applications as we can illustrate
using the previous example.
When q = 0, there is one reduced form parameter γ_0, which can be estimated
from (4.2) and (4.3), recalling that Cov(u_t, ε_t) = 0, and two structural parameters
δ and α in eq. (4.4). Hence, the model is not identified. In this case, δ = γ_0 is
identified from the regression of y_t on the exogenous x_t, but α is not identified.
When q = 1, there are three reduced form parameters γ_0, γ_1 and θ_1, which can be
estimated from (4.2) and (4.3), and three structural parameters δ, α, and θ_1. (θ_1 is
both a structural and reduced form parameter since x_t is exogenous.) Hence, the
model is exactly identified according to a simple order condition. More generally,
there are q + 2 structural parameters (δ, α, θ_1, …, θ_q) and 2q + 1 reduced form
parameters (γ_0, γ_1, …, γ_q, θ_1, …, θ_q) in this model. According to the order condi-
tions, therefore, the model is overidentified if q > 1.
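For the exactly identified case q = 1, the mapping in (4.4) can be inverted symbolically; the sketch below (using sympy, with γ_0 = δ(1 + αθ_1) and γ_1 = δθ_1 as reconstructed in (4.4)) recovers δ and α from the reduced form parameters.

    import sympy as sp

    delta, alpha, gamma0, gamma1, theta1 = sp.symbols('delta alpha gamma0 gamma1 theta1')
    sol = sp.solve([sp.Eq(gamma0, delta * (1 + alpha * theta1)),
                    sp.Eq(gamma1, delta * theta1)],
                   [delta, alpha], dict=True)[0]
    print(sol[delta])   # gamma1/theta1
    print(sol[alpha])   # (gamma0*theta1 - gamma1)/(gamma1*theta1)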
Treatments of identification in more general models focus on the properties of
the cross-equation restrictions in more complex versions of eq. (4.4). Wallis (1980)
gives conditions for identification for a class of rational expectations models.
Three different types of “limited information” estimates have been used for
rational expectations models. These can be described using the model in (4.1) and
(4.2). One method investigated by Wallis estimates (4.2) separately in order to
obtain the parameters θ_1, …, θ_q. These estimates then are taken as given (as
known parameters) in estimating (4.3). Clearly this estimator is less efficient than
the full information estimator, but in more complex problems the procedure saves
considerable time and effort. This method has been suggested by Wallis (1980)
and has been used by Papell (1984) and others in applied work.
A second method proposed by Chow (1983) and investigated by Chow and
Reny (1984) was mentioned earlier in our discussion of nonuniqueness. This
method does not impose the saddle point stability constraints on the model. It
leads to an easier computational problem than does imposing the saddle point
constraints. If the investigator does not have any reason to impose this constraint,
then this could prove quite practical.
A third procedure is to estimate eq. (4.1) as a single equation using instrumental
variables. Much work has been done in this area in recent years, and because
of the computational costs of full information methods it has been used frequently in
applied research. Consider again the problem of estimating eq. (4.1). Let e_{t+1} =
E_ty_{t+1} − y_{t+1} be the forecast error for the prediction of y_{t+1}. Substitute E_ty_{t+1} into
(4.1) to get

y_t = αy_{t+1} + δx_t + u_t + αe_{t+1}.   (4.6)

By finding instrumental variables for y_{t+1} that are uncorrelated with u_t and e_{t+1},
one can estimate (4.6) using the method of instrumental variables. In fact this
estimate would simply be the two stage least squares estimate with y_{t+1} treated as
if it were a right-hand side endogenous variable in a conventional simultaneous
equation model. Lagged values of x_t could serve as instruments here. This
estimate was first proposed by McCallum (1976).
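The sketch below illustrates the instrumental variables idea on simulated data from the q = 1 version of the model, with y_{t+1} treated as an endogenous regressor and (x_t, x_{t−1}) as instruments. The data generating process uses the solution (4.3)-(4.4) as reconstructed above; all parameter values are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    T, alpha, delta, theta1 = 5000, 0.6, 1.0, 0.7   # illustrative values
    eps = rng.standard_normal(T + 1)
    u = rng.standard_normal(T + 1)
    x = eps.copy()
    x[1:] += theta1 * eps[:-1]                      # x_t = eps_t + theta1*eps_{t-1}
    # Solution (4.3)-(4.4) with q = 1: gamma_0 = delta*(1 + alpha*theta1), gamma_1 = delta*theta1
    y = delta * (1 + alpha * theta1) * eps + delta * theta1 * np.r_[0.0, eps[:-1]] + u

    Y = y[1:-1]                                     # y_t
    X = np.column_stack([y[2:], x[1:-1]])           # regressors (y_{t+1}, x_t)
    Z = np.column_stack([x[1:-1], x[:-2]])          # instruments (x_t, x_{t-1})
    b = np.linalg.solve(Z.T @ X, Z.T @ Y)           # exactly identified IV estimate
    print(b)                                        # approximately (alpha, delta) = (0.6, 1.0)

With more instruments than regressors this becomes two stage least squares, and efficiency comparisons lead to the generalized method of moments estimators of Hansen (1982) mentioned below.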
Several extensions of McCallum's method have been proposed to deal with
serial correlation problems, including Cumby, Huizinga and Obstfeld (1983),
McCallum (1979), Hayashi and Sims (1983), Hansen (1982), and Hansen and
Singleton (1982). A useful comparison of the efficiency of these estimators is
found in Cumby, Huizinga and Obstfeld (1983).
A higher-order model of the form (5.1) can be put in the first-order form (5.2)
by stacking y_t, y_{t−1}, …, y_{t−q} into the vector z_t, much as in eq. (2.50). (It is
necessary that A_0 be nonsingular to write (5.1) as (5.2).) Anderson and Moore
(1984) have developed an algorithm that reduces equations with a singular A_0
into an equivalent form with a nonsingular leading matrix coefficient, and have
applied it to an econometric model of the U.S. money market. (Alternatively,
Preston and Pagan (1982, pp. 297-304) have suggested that a “shuffle” algorithm
described by Luenberger (1977) be used for this purpose.) In eq. (5.2) let z_t be an
n-dimensional vector and let u_t be an m-dimensional vector of stochastic
disturbances. The matrix A is n × n and the matrix D is n × m.
We describe the solution for the case of unanticipated temporary shocks:
u_t = ε_t, where ε_t is a serially uncorrelated vector with a zero mean. Alternative
assumptions about u_t can be handled by the methods discussed in Section 2.2.
The solution for z_t can be written in the general form

z_t = Σ_{i=0}^∞ Γ_iε_{t−i},   (5.3)

where the n × m coefficient matrices satisfy

Γ_1 = AΓ_0 + D,   Γ_{i+1} = AΓ_i,   i = 1, 2, …

Note that these matrix difference equations hold for each column of Γ_i separately;
that is,

γ_1 = Aγ_0 + d,   (5.4)
γ_{i+1} = Aγ_i,   i = 1, 2, …,   (5.5)

where γ_i denotes a column of Γ_i and d the corresponding column of D.
To get a unique solution in the general case, we therefore need (2n − k) − n = n
− k additional equations. These additional equations can be obtained by requiring
that the solution for y_t be stationary, or equivalently in this context that the γ_i
do not explode. If there are exactly n − k distinct roots of A which are greater
than one in modulus, then the saddle point manifold will give exactly the number
of additional equations necessary for a solution. The solution will be unique. If
there are fewer than n − k such roots then we have the same nonuniqueness problem
discussed in Section 2.

Suppose this root condition for uniqueness is satisfied. Let the n − k roots of A
that are greater than one in modulus be λ_1, …, λ_{n−k}. Diagonalize A as HAH^{-1} =
Λ, where

Λ = diag(Λ_1, Λ_2),

and Λ_1 is a diagonal matrix with all the unstable roots on the diagonal. The γ
vectors are partitioned accordingly, and the rows (H_{11}, H_{12}) of H are the characteristic
vectors associated with the unstable roots. Thus, for stability we require

H_{11}γ_{1i} + H_{12}γ_{2i} = 0.

These n − k equations define the saddle point manifold and are the additional
n − k equations needed for a solution. Having solved for γ_1 and the unknown
elements of γ_0, we then obtain the remaining γ_i coefficients from the difference
equation (5.5).
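A minimal numerical expression of this root-counting condition, for an illustrative 2 × 2 system with n = 2 and k = 1 predetermined variable:

    import numpy as np

    A = np.array([[1.5, 0.5],      # illustrative matrix with one unstable root
                  [0.5, 0.5]])
    n, k = 2, 1
    unstable = int(np.sum(np.abs(np.linalg.eigvals(A)) > 1.0))
    print(unstable == n - k)       # True: exactly enough saddle-point equations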
As yet there has been relatively little research with nonlinear rational expectations
models. The research that does exist has been concerned more with solution and
policy evaluation than with estimation. Fair and Taylor (1983) have
investigated a full-information estimation method for a nonlinear model based
on a solution procedure described below. However, this method is extremely
expensive to use given current computer technology. Hansen and Singleton (1982)
have developed and applied a limited-information estimator for nonlinear models.
There are a number of alternative solution procedures for nonlinear models
that have been investigated in the literature. They generally focus on deterministic
models, but can be used for stochastic analysis by stochastic simulation tech-
niques.
Three methods are reviewed here: (1) a “multiple shooting” method, adapted
for rational expectations models from two-point boundary problems in the
differential equation literature by Lipton, Poterba, Sachs, and Summers (1982);
(2) an “extended path” method based on an iterative Gauss-Seidel algorithm
examined by Fair and Taylor (1983); and (3) a nonlinear stable manifold method
examined by Bona and Grossman (1983). This is an area where there is likely to
be much research in the future.
A general nonlinear rational expectations model can be written

f(y_t, y_{t−1}, …, y_{t−p}, E_ty_{t+1}, …, E_ty_{t+J}, x_t) = u_t,   (6.1)

where x_t is a vector of exogenous variables and u_t is a vector of disturbances.
In some applications the expectations in (6.1) have been conditioned on information
through period t − 1 rather than through period t. For continuity with the rest of this paper, we
continue to assume that the information is through period t, but the methods can
easily be adjusted for different viewpoint dates. We also distinguish between
exogenous variables and disturbances, because some of the nonlinear algorithms
can be based on known future values of x_t rather than on forecasts of these from
a model like (2.2).
This approach has been examined by Fair and Taylor (1983) and used to solve
large-scale nonlinear models. Briefly it works as follows. Guess values for the
E_ty_{t+j} in eq. (6.1) for j = 1, …, J. Use these values to solve the model to obtain a
new path for y_{t+j}. Replace the initial guess with the new solution and repeat the
process until the path y_{t+j}, j = 1, …, J, converges, or changes by less than some
tolerance range. Finally, extend the path from J to J + 1 and repeat the previous
sequence of iterations. If the values of y_{t+j} on this extended path are within the
tolerance range of those obtained with path length J, then stop; otherwise extend the path one
more period to J + 2, and so on. Since the model is nonlinear, the Gauss-Seidel
method is used to solve (6.1) for each iteration given a guess for y_{t+j}. There are no
general proofs available to show that this method works for an arbitrary nonlinear
model. When applied to the linear model in Section 2.1 with |α| < 1, the
method is shown to converge in Fair and Taylor (1983). When |α| > 1, the
iterations diverge. A convergence proof for the general linear model is not yet
available, but many experiments have indicated that convergence is achieved
under the usual saddle path assumptions. This method is expensive but is fairly
easy to use. An empirical application of the method to a modified version of the
Fair model is found in Fair and Taylor (1983), and to a system with time-varying
parameters in Taylor (1983). Carlozzi and Taylor (1984) have used the method to
calculate stochastic equilibria. This method also appears to work well.
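A sketch of the extended path iteration on the scalar model of Section 2.1 (y_t = αE_ty_{t+1} + δu_t), with an anticipated path u_{t+j} = ρ^j and illustrative parameter values, shows the mechanics; for |α| < 1 the iterations converge to the closed form (2.81).

    import numpy as np

    alpha, delta, rho = 0.8, 1.0, 0.5   # illustrative values with |alpha| < 1

    def solve_path(J):
        u = rho ** np.arange(J + 1)     # anticipated disturbance path
        y = np.zeros(J + 1)             # initial guess for the expected path
        for _ in range(10000):
            y_new = alpha * np.r_[y[1:], 0.0] + delta * u   # one pass over the path
            if np.max(np.abs(y_new - y)) < 1e-10:
                break
            y = y_new
        return y

    y = solve_path(J=60)                # in practice J is extended until y stops changing
    print(y[0], delta / (1 - alpha * rho))   # both about 1.6667, cf. eq. (2.81)

Here the model is linear, so each pass over the path is trivial; in the nonlinear case each pass itself requires a Gauss-Seidel solution of (6.1).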
In Section 2.4 we noted that the solution of the second-order linear difference
eq. (2.88) is achieved by placing the solution on the stable path associated with
the saddle point. For nonlinear models one can use the same approach after
linearizing the system. The saddle point manifold is then linear. Such a linearization,
however, can only yield a local approximation.
Bona and Grossman (1983) have experimented with a method that computes a
nonlinear saddle-point path. Consider a deterministic univariate second-order
version of (6.1):

f(y_{t+1}, y_t, y_{t−1}) = 0,   (6.2)

where we have one initial condition y_0, and look for a solution of the form

y_t = g(y_{t−1}).   (6.3)

Note that eq. (6.2) is a nonlinear version of the homogeneous part of eq. (2.88)
and eq. (6.3) is a nonlinear version of the saddle path dynamics (2.91).

Bona and Grossman (1983) compute g(·) by a series of successive approximations.
If eq. (6.3) is to hold for all values of the argument of g, then

f(g(g(x)), g(x), x) = 0   (6.4)

must hold for every value of x (at least within the range of interest). In the
application considered by Bona and Grossman (1983) there is a natural way to
write (6.4) as

g(x) = φ(g(g(x)), x),   (6.5)

which suggests the sequence of successive approximations:

g_{n+1}(x) = φ(g_n(g_n(x)), x),   n = 0, 1, 2, …   (6.6)

The initial function g_0(x) can be chosen to equal the linear stable manifold
associated with the linear approximation of f(·) at x.
Since this sequence of successive approximations must be made at every x,
there are two alternative ways to proceed. One can make the calculations
recursively for each point y_t of interest; that is, obtain a function g for x = y_0, a
new function for x = y_1, and so on. Alternatively, one could evaluate g over a grid
spanning the entire range of possible values of x, and form a “meta function” g which is
piecewise linear and formed by linear interpolation for the values of x between the
grid points. Bona and Grossman (1983) use the first procedure to numerically
solve a macroeconomic model of the form (6.2).
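The sketch below implements the successive approximations (6.6) on a grid, in the spirit of the "meta function" variant, for the linear special case α_1y_{t+1} − y_t + α_2y_{t−1} = 0, where (6.5) reduces to g(x) = α_1g(g(x)) + α_2x. The limit should then be the linear stable manifold g(x) = λ_2x. All parameter values are illustrative.

    import numpy as np

    alpha1, alpha2 = 0.6, 0.2               # illustrative saddle-point parameters
    grid = np.linspace(-2.0, 2.0, 401)      # grid over the range of interest
    g = np.zeros_like(grid)                 # g_0(x) = 0 as the starting function

    for _ in range(200):
        ggx = np.interp(g, grid, g)         # g_n(g_n(x)) by linear interpolation
        g_new = alpha1 * ggx + alpha2 * grid
        if np.max(np.abs(g_new - g)) < 1e-12:
            break
        g = g_new

    print(g[-1] / grid[-1])                 # about 0.2324 = lambda_2, the stable root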
It is helpful to note that when applied to linear models the method reduces to a
type of undetermined coefficients method used by Lucas (1975) and McCallum
(1983) to solve rational expectations models (a different method of undetermined
coefficients than that applied to the linear process (2.4) in Section 2 above). To see
this, substitute a linear function y_t = gy_{t−1} into the deterministic difference
equation already considered in eq. (2.88),

y_{t+1} − (1/α_1)y_t + (α_2/α_1)y_{t−1} = 0.   (6.7)

The resulting equation is

[g² − (1/α_1)g + α_2/α_1]y_{t−1} = 0.   (6.8)

Setting the term in brackets equal to zero yields the characteristic polynomial
of (6.7), which appears in eq. (2.89). Under the usual assumption that one root is
inside and one root is outside the unit circle, a unique stable value of g is found
and is equal to the stable root λ_2 of (2.89).
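In this linear case the undetermined coefficients step is just a quadratic root selection, as the following sketch (with illustrative parameters) makes explicit:

    import numpy as np

    alpha1, alpha2 = 0.6, 0.2                                 # illustrative values
    roots = np.roots([1.0, -1.0 / alpha1, alpha2 / alpha1])   # polynomial in (6.8)/(2.89)
    g = roots[np.abs(roots) < 1.0][0]                         # keep the root inside the unit circle
    print(g)                                                  # about 0.2324 = lambda_2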
7. Concluding remarks
As its title suggests, the aim of this chapter has been to review and tie together in
an expository way the extensive volume of recent research on econometric
techniques for macroeconomic policy evaluation. The table of contents gives a
good summary of the subjects that I have chosen to review. In conclusion it is
perhaps useful to point out in what ways the title is either overly inclusive or not
inclusive enough relative to the subjects actually reviewed.
References
Anderson, Gary and George Moore (1984) “An Efficient Procedure for Solving Linear Perfect
Foresight Models”. Board of Governors of the Federal Reserve Board, unpublished manuscript.
Anderson, T. W. (1971) The Statistical Analysis of Time Series. New York: Wiley.
Baumol, W. J. (1970) Economic Dynamics: An Introduction, 3rd ed. New York: Macmillan.
Birkhoff, Garrett and G. C. Rota (1962) Ordinary Differential Equations, 2nd ed. Waltham: Blaisdell.
Blanchard, Olivier J. (1979) "Backward and Forward Solutions for Economies with Rational Expectations", American Economic Review, 69, 114-118.
Blanchard, Olivier J. (1982) "Identification in Dynamic Linear Models with Rational Expectations". Technical Paper No. 24, National Bureau of Economic Research.
Blanchard, Olivier and Charles Kahn (1980) "The Solution of Linear Difference Models under Rational Expectations", Econometrica, 48, 1305-1311.
Blanchard, Olivier and Mark Watson (1982) "Rational Expectations, Bubbles and Financial Markets", in: P. Wachtel, ed., Crises in the Economic and Financial Structure. Lexington: Lexington Books.
Bona, Jerry and Sanford Grossman (1983) "Price and Interest Rate Dynamics in a Transactions Based Model of Money Demand". University of Chicago, unpublished paper.
Buiter, Willem H. and Marcus Miller (1983) "Real Exchange Rate Overshooting and the Output Cost of Bringing Down Inflation: Some Further Results", in: J. A. Frenkel, ed., Exchange Rates and International Macroeconomics. Chicago: University of Chicago Press for National Bureau of Economic Research.
Cagan, Phillip (1956) "The Monetary Dynamics of Hyperinflation", in: M. Friedman, ed., Studies in the Quantity Theory of Money. Chicago: University of Chicago Press.
Calvo, Guillermo (1978) "On the Time Consistency of Optimal Policy in a Monetary Economy", Econometrica, 46, 1411-1428.
Calvo, Guillermo (1980) "Tax-Financed Government Spending in a Neoclassical Model with Sticky Wages and Rational Expectations", Journal of Economic Dynamics and Control, 2, 61-78.
Carlozzi, Nicholas and John B. Taylor (1984) "International Capital Mobility and the Coordination of Monetary Rules", in: J. Bhandari, ed., Exchange Rate Management under Uncertainty. MIT Press, forthcoming.
Chow, G. C. (1983) Econometrics. New York: McGraw Hill.
Chow, Gregory and Philip J. Reny (1984) “On Two Methods for Solving and Estimating Linear
Simultaneous Equations with Rational Expectations”. Princeton University, unpublished paper.
Christiano, Lawrence J. (1984) "Can Automatic Stabilizers be Destabilizing: An Old Question Revisited", Carnegie-Rochester Conference Series on Public Policy, 20, 147-206.
Cumby, Robert E., John Huizinga and Maurice Obstfeld (1983) “Two-Step Two-Stage Least Squares
Estimation in Models with Rational Expectations”, Journal of Econometrics, 21, 333-355.
Dagli, C. Ates and John B. Taylor (1985) “Estimation and Solution of Linear Rational Expectations
Models Using a Polynomial Matrix Factorization”, Journal of Economic Dynamics and Control,
forthcoming.
Dhrymes, Phoebus J. (1971) Distributed Lags: Problems of Estimation and Formulation. San Francisco: Holden-Day.
Dixit, Avinash (1980) "A Solution Technique for Rational Expectations Models with Applications to Exchange Rate and Interest Rate Determination". Princeton University, unpublished paper.
Dornbusch, Rudiger (1976) "Expectations and Exchange Rate Dynamics", Journal of Political Economy, 84, 1161-1176.
Epple, Dennis, Lars P. Hansen and William Roberds (1983) "Linear Quadratic Games of Resource
Depletion”, in: Thomas J. Sargent, ed., Energy, Foresight, and Strategy. Washington: Resources for
the Future.
Evans, George and Seppo Honkapohja (1984) “A Complete Characterization of ARMA Solutions to
Linear Rational Expectations Models”. Technical Report No. 439, Institute for Mathematical
Studies in the Social Sciences, Stanford University.
Fair, Ray (1986) “Evaluating the Predictive Accuracy of Models”, in: Z. Griliches and M. Intriligator,
eds., Handbook of Econometrics. Amsterdam: North-Holland, Vol. III.
Fair, Ray and John B. Taylor (1983) “Solution and Maximum Likelihood Estimation of Dynamic
Nonlinear Rational Expectations Models”, Econometrica, 51, 1169-1185.
Fischer, Stanley (1979) “Anticipations and the Nonneutrality of Money”, Journal of Political
Economy, 87, 225-252.
Flood, R. P. and P. M. Garber (1980) “Market Fundamentals versus Price-Level Bubbles: The First
Tests”, Journal of Political Economy, 88, 745-770.
Futia, Carl A. (1981) “Rational Expectations in Stationary Linear Models”, Econometrica, 49,
171-192.
Geweke, John (1984) “Inference and Causality in Economic Time Series Models”, in: Z. Griliches and
M. Intriligator, eds., Handbook of Econometrics. Amsterdam: North-Holland, Vol. II.
Gourieroux, C., J. J. Laffont and A. Monfort (1982) "Rational Expectations in Linear Models: Analysis of Solutions", Econometrica, 50, 409-425.
Hansen, Lars P. (1982) "Large Sample Properties of Generalized Method of Moments Estimators", Econometrica, 50, 1029-1054.
Hansen, Lars P. and Thomas J. Sargent (1980) "Formulating and Estimating Dynamic Linear Rational Expectations Models", Journal of Economic Dynamics and Control, 2, 7-46.
Hansen, Lars P. and Thomas J. Sargent (1981) "Linear Rational Expectations Models for Dynamically Interrelated Variables", in: R. E. Lucas and T. J. Sargent, eds., Rational Expectations and Econometric Practice. Minneapolis: University of Minnesota Press.
Hansen, L. P. and K. Singleton (1982) “Generalized Instrumental Variables Estimation of Nonlinear
Rational Expectations Models”, Econometrica, 50, 1269-1286.
Harvey, Andrew C. (1981) Time Series Models. New York: Halsted Press.
2054 J. R. Tuylor
Hayashi, Fumio and Christopher Sims (1983) “Nearly Efficient Estimation of Time Series Models
with Predetermined, but not Exogenous, Instruments”, Econometrica, 51, 783-798.
Kendrick, David (1981) "Control Theory with Applications to Economics", in: K. Arrow and M. Intriligator, eds., Handbook of Mathematical Economics. Amsterdam: North-Holland, Vol. I.
Kouri, Pentti J. K. (1976) “The Exchange Rate and the Balance of Payments in the Short Run and in
the Long Run: A Monetary Approach”, Scandinavian Journal of Economics, 78, 280-304.
Kydland, Finn E. (1975) “Noncooperative and Dominant Player Solutions in Discrete Dynamic
Games”, International Economic Review, 16, 321-335.
Kydland, Finn and Edward C. Prescott (1977) “Rules Rather Than Discretion: The Inconsistency of
Optimal Plans", Journal of Political Economy, 85, 473-491.
Kydland, Finn and Edward C. Prescott (1982) “Time to Build and Aggregate Fluctuations”,
Econometrica, 50, 1345-1370.
Lipton, David, James Poterba, Jeffrey Sachs and Lawrence Summers (1982) “Multiple Shooting in
Rational Expectations Models", Econometrica, 50, 1329-1333.
Lucas, Robert E. Jr. (1975) "An Equilibrium Model of the Business Cycle", Journal of Political Economy, 83, 1113-1144.
Lucas, Robert E. Jr. (1976) "Econometric Policy Evaluation: A Critique", in: K. Brunner and A. H. Meltzer, eds., Carnegie-Rochester Conference Series on Public Policy. Amsterdam: North-Holland, 19-46.
Lucas, Robert E. Jr. and Thomas J. Sargent (1981) "Introduction", in their Rational Expectations and Econometric Practice. Minneapolis: University of Minnesota Press.
Luenberger, David G. (1977) "Dynamic Equations in Descriptor Form", IEEE Transactions on Automatic Control, AC-22, 312-321.
Marschak, Jacob (1953) "Economic Measurements for Policy and Prediction", in: W. C. Hood and T. C. Koopmans, eds., Studies in Econometric Method. Cowles Commission Monograph 14, New Haven: Yale University Press.
McCallum, Bennett T. (1976) “Rational Expectations and the Natural Rate Hypothesis: Some
Consistent Estimates”, Econometrica, 44, 43-52.
McCallum, Bennett T. (1979) "Topics Concerning the Formulation, Estimation, and Use of Macroeconometric Models with Rational Expectations", American Statistical Association, Proceedings of the Business and Economic Statistics Section, 65-72.
McCallum, Bennett T. (1983) “On Non-Uniqueness in Rational Expectations: An Attempt at
Perspective”, Journal of Monetary Economics, 11, 139-168.
Mishkin, Frederic S. (1983) A Rational Expectations Approach to Macroeconometrics: Testing Policy Ineffectiveness and Efficient-Markets Models. Chicago: University of Chicago Press.
Muth, John F. (1961) "Rational Expectations and the Theory of Price Movements", Econometrica, 29, 315-335.
Muth, John F. (1981) "Estimation of Economic Relationships Containing Latent Expectations Variables", reprinted in: R. E. Lucas and T. J. Sargent, eds., Rational Expectations and Econometric Practice. Minneapolis: University of Minnesota Press.
Papell, David (1984) "Anticipated and Unanticipated Disturbances: The Dynamics of the Exchange Rate and the Current Account", Journal of International Money and Finance, forthcoming.
Preston, A. J. and A. R. Pagan (1982) The Theory of Economic Policy. Cambridge: Cambridge
University Press.
Rehm, Dawn (1982) Staggered Contracts, Capital Flows, and Macroeconomic Stability in the Open Economy. Ph.D. Dissertation, Columbia University.
Rodriguez, Carlos A. (1980) "The Role of Trade Flows in Exchange Rate Determination: A Rational Expectations Approach", Journal of Political Economy, 88, 1148-1158.
Sargent, Thomas J. (1979) Macroeconomic Theory. New York: Academic Press.
Sargent, Thomas J. and Neil Wallace (1973) "Rational Expectations and the Dynamics of Hyperinflation", International Economic Review, 14, 328-350.
Sargent, Thomas J. and Neil Wallace (1975) "'Rational' Expectations, the Optimal Monetary Instrument, and the Optimal Money Supply Rule", Journal of Political Economy, 83, 241-254.
Summers, Lawrence H. (1981) "Taxation and Corporate Investment: A q-Theory Approach", Brookings Papers on Economic Activity, 1, 67-127.
Taylor, John B. (1977) “Conditions for Unique Solutions in Stochastic Macroeconomic Models with
Rational Expectations”, Econometrica, 45, 1377-1385.
Ch. 34: Stobilizution Policy in Macroeconomic Fluctuations 2055
Taylor, John B. (1979) “Estimation and Control of a Macroeconomic Model with Rational Expecta-
tions”, Econometrica, 47, 1267-1286.
Taylor, John B. (1980a) “Aggregate Dynamics and Staggered Contracts”, Journal of Political
Economy, 88, l-23.
Taylor, John B. (1980b) "Output and Price Stability: An International Comparison", Journal of Economic Dynamics and Control, 2, 109-132.
Taylor, John B. (1982) "The Swedish Investment Fund System as a Stabilization Policy Rule", Brookings Papers on Economic Activity, 1, 57-99.
Taylor, John B. (1983) “Union Wage Settlements During a Disinflation”, American Economic Review,
73, 981-993.
Wallis, Kenneth F. (1980) "Econometric Implications of the Rational Expectations Hypothesis", Econometrica, 48, 49-73.
Whiteman, Charles H. (1983) Linear Rational Expectations Models: A User's Guide. Minneapolis: University of Minnesota Press.
Wickens, M. (1982) “The Efficient Estimation of Econometric Models with Rational Expectations”,
Review of Economic Studies, 49, 55-68.
Wilson, Charles (1979) “Anticipated Shocks and Exchange Rate Dynamics”, Journal of Political
Economy, 87, 639-647.
Chapter 35

ECONOMIC POLICY FORMATION

L. R. KLEIN

University of Pennsylvania

Contents
accounts, has the most variable velocity. Between these extremes it appears that
the further the aggregate is from control, the less variable is its associated velocity.
This is more a problem for the implementation of monetary policy than for the
construction of models.
But a problem for both policy formation and modeling is the recent introduc-
tion of new monetary instruments and technical changes in the operation of credit
markets. Electronic banking, the use of credit cards, the issuance of more
sophisticated securities to the average citizen are all innovations that befuddle the
monetary authorities and the econometrician. Authorities find that new instru-
ments are practically outside their control for protracted periods of time, espe-
cially when they are first introduced. They upset traditional patterns of seasonal
variation and generally enlarge the bands of uncertainty that are associated with
policy measures. They are problematic for econometricians because they establish
new modes of behavior and have little observational experience on which to base
sample estimates.
Side by side with monetary policy goes the conduct of fiscal policy. For many
years - during and after the Great Depression - fiscal policy was central as far as
macro policy was concerned. It was only when interest rates got significantly
above depression floor levels that monetary policy was actively used and shown to
be fairly powerful.
Fiscal policy is usually, but not necessarily, less flexible than monetary policy
because both the legislative and executive branches of government must approve
major changes in public revenues and expenditures. In a parliamentary system, a
government cannot survive unless its fiscal policy is approved by parliament, but
this very process frequently delays effective policy implementation. In a legislative
system of the American type, a lack of agreement may not bring down a
government, but it may seriously delay the implementation of policy. On the
other hand, central banking authorities can intervene in the functioning of
financial markets on a moment’s notice.
On the side of fiscal policy, there are two major kinds of instruments, public
spending and taxing. Although taxing is less flexible than monetary management,
it is considerably more flexible than are many kinds of expenditure policy. In
connection with expenditures, it is useful to distinguish between purchases of
goods or services and transfer payments. The latter are often as flexible as many
kinds of taxation instruments.
It is generally safer to focus on tax instruments and pay somewhat less
attention to expenditure policy. Tax changes have the flexibility of being made
retroactive when desirable. This can be done with some expenditures, but not all.
Tax changes can be made effective right after enactment. Expenditure changes,
for goods or services, especially if they are increases, can be long in the complete
making. Appropriate projects must be designed, approved, and executed. Often it
is difficult to find or construct appropriate large projects.
Ch. 35: Economic Policy Formation 2061
Tax policy can be spread among several alternatives such as personal direct
taxes (either income or expenditure), business income taxes, or indirect taxes. At
present, much interest attaches to indirect taxes because of their ease of collec-
tion, if increases are being contemplated, or because of their immediate effect on
price indexes, if decreases are in order. Those taxes that are levied by local, as
opposed to national governments, are difficult to include in national economic
analysis because of their diversity of form, status, and amount.
Some tax policies are general, affecting most people or most sectors of the
economy all at once. But specific, in contrast to general, taxes are important for
the implementation of structural policies. An expenditure tax focuses on stimulat-
ing personal savings. Special depreciation allowances or investment tax credits
aim at stimulating private fixed capital formation. Special allowances for R&D,
scientific research, or capital gains are advocated as important for helping the
process of entrepreneurial innovation in high technology or venture capital lines.
These structural policies are frequently cited in present discussions of industrial
policy.
A favorite proposal for strictly anti-inflationary policy is the linkage of tax
changes, either as rewards (cuts) or penalties (increases), to compliance by
businesses and households with prescribed wage/price guidelines. Few have ever
been successfully applied on a broad continuing scale, but this approach, known
as incomes policies, social contracts, or TIPS (tax based incomes policies), is
widely discussed in the scholarly literature.
These monetary and fiscal policies are the conventional macro instruments of
overall policies. They are important and powerful; they must be included in any
government’s policy spectrum, but are they adequate to deal with the challenge of
contemporary problems? Do they deal effectively with such problems as: youth
unemployment and other structural problems in labor markets; energy pricing
and supply; protection of the environment and of public health and safety; the
regulation of enterprise; the stability of agricultural supplies and food prices; and
the preservation of an open trading system?

Structural policies, as distinct from macro policies, seem to be called for in order
to deal effectively with these specific issues.
If these are the kinds of problems that economic policy makers face, it is
worthwhile considering the kinds of policy decisions with instruments that have
to be used in order to address these issues appropriately, and consider the kind of
economic model that would be useful in this connection.
For dealing with youth unemployment and related structural problems in labor
markets, the relevant policies are minimum wage legislation, skill training grants,
and provision of vocational education. These are typical things that ought to be
done to reduce youth unemployment. These policy actions require legislative
support with either executive or legislative initiative.
In the case of energy policy, the requisite actions are concerned with pricing of
fuels, rules for fuel allocation, controls on imports, protection of the terrain
against excessive exploitation. These are specific structural issues and will be
scarcely touched by macro policies. These energy issues also affect the environ-
ment, but there are additional considerations that arise from non-energy sources.
Tax and other punitive measures must be implemented in order to protect the
environment, but, at the same time, monitor the economic costs involved. The
same is true for policies to protect public health and safety. These structural
policies need to be implemented but not without due regard to costs that have
serious inflationary consequences. The whole area of public regulation of enter-
prise is under scrutiny at the present time, not only for the advantages that might
be rendered, but also for the fostering of competition, raising incentives, and
containing cost elements. It is not a standard procedure to consider the associated
inflationary content of regulatory policy.
Ever since the large harvest failures of the first half of the 1970s (1972 and
1975, especially) economists have become aware of the fact that special attention
must be paid to agriculture in order to insure a basic flow of supplies and
moderation in world price movements. Appropriate policies involve acreage
limitations (or expansions), crop subsidies, export licenses, import quotas, and
similar specific measures. They all have bearing on general inflation problems
through the medium of food prices, as components of consumer price indexes,
and of imports on trade balances.
Overall trade policy is mainly guided by the high minded principle of fostering
of conditions for the achievement of multilateral free trade. This is a macro
concept, on average, and has had recent manifestation in the implementation of
the “Tokyo Round” of tariff reductions, together with pleas for moderation of
non-tariff barriers to trade. Nevertheless, there are many specific breaches of the
principle, and specific protectionist policies are again a matter of concern. Trade
policy, whether it is liberal or protectionist, will actually be implemented through
a set of structural measures. It might mean aggressive marketing in search of
export sales, provision of credit facilities, improved port/storage facilities, and a
whole group of related policy actions that will, in the eyes of each country by
itself, help to preserve or improve its net export position.
We see then that economic policy properly understood in the context of
economic problems of the day goes far beyond the macro setting of tax rates,
overall expenditure levels, or establishing growth rates for some monetary aggre-
gates. It is a complex network of specific measures, decrees, regulations (or their
Ch. 35: Economic Policy Formation 2063
absence), and recommendations coming from all branches of the public sector. In
many cases they require government coordination. Bureaus, offices, departments,
ministries, head of state, and an untold number of public bodies participate in
this process. It does not look at all like the simple target-instrument approach of
macroeconomics, yet macroeconometric modeling, if pursued at the appropriate
level of detail, does have much to contribute. That will be the subject of sections
of this paper that follow.
The preceding section has just described the issues and actors in a very summary
outline. Let us now examine some of the underlying doctrine. The translation of
economic theory into policy is as old as our subject, but the modern formalism is
conveniently dated from the Keynesian Revolution. Clear distinction should be
made between Keynesian theory and Keynesian policy, but as far as macro policy
is concerned, it derives from Keynesian theory.
The principal thrust of Keynesian theory was that savings-investment balance
at full employment would be achieved through adjustment of the aggregative
activity level of the economy. It was interpreted, at an early stage, in a framework
of interest-inelastic investment and interest-elastic demand for cash. This particu-
lar view and setting gave a secondary role to monetary policy. Direct effects on
the spending or activity stream were most readily achieved through fiscal policy,
either adding or subtracting directly from the flow of activity through public
spending or affecting it indirectly through changes in taxation. Thinking therefore
centered around the achievement of balance in the economy, at full employment,
by the appropriate choice of fiscal measures. In a formal sense, let us consider the
simple model
C=f(Y-T) consumption function
T=tY tax function
Z = g(AY) investment function
Y=C+Z+G output definition
where
G = public expenditures
A = time difference operator
Y= total output (or income, or activity level).
Fiscal policy means the choice of an appropriate value of t (tax rate), or level G
(expenditure), or mixture of both in order to achieve a target level of Y. This
could also be a dynamic policy, by searching for achievement of a target path of
Y through time. To complement dynamic policy it is important to work with a
dynamic model.
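A minimal numerical illustration of this target-instrument calculation, using linear specializations of the consumption function (C = c_0 + c_1(Y − T)) and treating investment as fixed; these functional forms and parameter values are assumptions for illustration only.

    # Linear specialization (assumed): C = c0 + c1*(Y - T), T = t*Y, I fixed.
    # Then Y = C + I + G implies Y = (c0 + I + G)/(1 - c1*(1 - t)).
    c0, c1, I = 50.0, 0.8, 100.0

    def output(t, G):
        return (c0 + I + G) / (1 - c1 * (1 - t))

    # Choosing the expenditure level G that attains a target Y at tax rate t:
    t, Y_target = 0.25, 1000.0
    G = Y_target * (1 - c1 * (1 - t)) - c0 - I
    print(G, output(t, G))      # 250.0 and 1000.0: the instrument hits the target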
The monetarist alternative builds policy on the crude quantity equation

Mv = Y,

or, recognizing that there are several monetary aggregates M_i, each with its own velocity,

M_i v_i = Y.
2See the various concepts in the contribution by Benjamin Friedman, op. cit.
A search for a desired subscript i may attach great importance to the correspond-
ing stability of V_i. It is my experience, for example, that in the United States, V_2 is
more stable than V_1.
More sophisticated concepts would be
Mv = Σ_{i=0..∞} w_i Y_{-i},

or

M = α (Σ_{i=0..∞} w_i Y_{-i})^β,

or

M = f(Σ_{i=0..∞} w_i P_{-i}, Σ_{i=0..∞} w_i X_{-i}).
The first says that M is proportional to long run Y or a distributed lag in Y. The
second says that M is proportional to a power of long run Y or merely that a
stable relationship exists between long run Y and M. Finally, the third says that
M is a function of long run price as well as long run real income (X). In these
relationships no attention is paid to subscripts for M, because the theory would
be similar (not identical) for any M_i, and proponents of monetarist policy simply
argue that a stable relationship should be found by the authorities for some M_i
concept, and that they should stick to it.
The distributed lag relationships in P_{-i} and X_{-i} are evidently significant
generalizations of the crude quantity theory, but in a more general view, the
principal thing that monetarists need for policy implementation of their theory is
a stable demand function for money. If this stable function depends also on
interest rates (in lag distributions), the theory can be only partial, and analysis
then falls back on the kind of mainstream general macroeconometric model used
in applications that are widely criticized by strict monetarists.3
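The operational content of the monetarist position is thus an empirical stability question. A rough way to examine it, sketched below with synthetic data (the series and parameter values are invented for illustration), is to fit the long run relation ln M = α + β ln Y on subsamples and compare the estimates; a genuinely stable demand function for money should deliver similar coefficients across periods.

# Sketch: checking the stability of a demand-for-money relation
# ln M = alpha + beta * ln Y over split subsamples.
# The data here are synthetic; with real data one would use actual
# money-stock and income series.
import numpy as np

rng = np.random.default_rng(0)
T = 60
ln_Y = np.linspace(6.0, 7.5, T) + rng.normal(0, 0.02, T)   # trending income
ln_M = 0.5 + 0.9 * ln_Y + rng.normal(0, 0.03, T)           # stable by construction

def ols(y, x):
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta   # (intercept, slope)

first = ols(ln_M[: T // 2], ln_Y[: T // 2])
second = ols(ln_M[T // 2 :], ln_Y[T // 2 :])
print("first half :", first)
print("second half:", second)
# Large shifts in the slope across subsamples would argue against
# a policy rule pinned to this M concept.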
The policy implications of the strict monetarist approach are clear and are,
indeed, put forward as arguments for minimal policy intervention. The propo-
nents are generally against activist fiscal policy except possibly for purposes of
indexing when price movements get out of hand. According to the basic monetarist
3The lack of applicability of the monetarist type relationship, even generalized dynamically, to the
United Kingdom is forcefully demonstrated by D. F. Hendry and N. R. Ericsson, "Assertion without
Empirical Basis: An Econometric Appraisal of Friedman and Schwartz's 'Monetary Trends in the
United Kingdom'," Monetary Trends in the United Kingdom, Bank of England Panel of Academic
Consultants, Panel Paper No. 22 (October 1983), 45-101.
4The Wharton Quarterly Model, regularly used for short run business cycle analysis, had 1,000
equations in 1980, and the medium term Wharton Annual Model had 1,595 equations, exclusive of
input-output relationships. The world system of Project LINK has more than 15,000 equations at the
present time, and is still growing.
In each of these sectors, there are several subsectors, some by type of product,
some by type of end use, some by age-sex-race, some by country of origin or
destination, some by credit market instrument, and some by level of government.
The production sector may have a complete input-output system embedded in
the model. Systems like these should not be classified as either Keynesian or
monetarist. They are truly eclectic and are better viewed as approximations to the
true but unknown Walrasian structure of the economy. These approximations are
not unique. The whole process of model building is in a state of flux because at
any time when one generational system is being used, another, better approxima-
tion to reality is being prepared. The outline of the equation structure for a
system combining input-output relations with a macro model of income de-
termination and final demand, is given in the appendix.
The next section will deal with the concrete policy making process through the
medium of large scale models actually in use. They do not govern the policy
process on an automatic basis, but they play a definite role. This is what this
presentation is attempting to show.
There is, however, a new school of thought, arguing that economic policy will
not get far in actual application because the smart population will counter public
officials' policies, thus nullifying their effects. On occasion, this school of thought,
called the rational expectations school, indicates that the use of
macroeconometric models to guide policy is vacuous, but on closer examination
its argument is seen to be directed at any activist policy, whether through the
model medium or not.
The argument of the rational expectations school, briefly put, is that economic
agents (households, firms, and institutions) have the same information about
economic performance as the public authorities, and any action by the latter, on
the basis of their information, has already been anticipated and will simply lead to
reaction by economic agents that will nullify the policy initiatives of the
authorities. On occasion, it has been assumed that the hypothetical parameters of
economic models are functions of policy variables and will change in a particular
way when policy variables are changed.5
5R. E. Lucas, "Econometric Policy Evaluation: A Critique," The Phillips Curve and Labor Markets,
eds. K. Brunner and A. H. Meltzer (Amsterdam: North-Holland, 1976), 19-46.
C = α + β(Y - T),
β = β(G).
This argument seems to me to be highly contrived. It is true that a generalization
of the typical model from fixed to variable parameters appears to be very
promising, but there is little evidence that the generalization should make the
coefficients depend in such a special way on exogenous instrument variables.
The thought that economic models should be written in terms of the agent’s
perceptions of variables on the basis of their interpretation of history is sound.
The earliest model building attempts proceeded from this premise and introduced
lag distributions and various proxies to relate strategic parameter values to
information at the disposal of both economic agents and public authorities, but
they did not make the blind intellectual jump to the conclusion that perceptions
of the public at large and authorities are the same. It is well known that the
public, at any time, holds widely dispersed views about anticipations for
the economy. Many do not have sophisticated perceptions and do not share the
perceptions of public authorities. Many do not have the qualifications or facilities
to make detailed analysis of latest information or history of the economy.
Econometric models are based on theories and estimates of the way people do
behave, not on the way they ought to behave under the conditions of some
hypothesized decision making rules. In this respect, many models currently in use
contain data and variables on expressed expectations, i.e. those expected values
that can be ascertained from sample surveys. In an interesting paper dealing with
business price expectations, de Leeuw and McKelvey find that statistical evidence
on expected prices contradicts the hypothesis of rationality, as one might expect.6
The rise of the rational expectations school is associated with an assertion that
the mainstream model, probably meaning the Keynesian model, has failed during
the 1970s. It principally failed because of its inability to cope with a situation in
which there are rising rates of inflation and rising rates of unemployment. In
standard analysis the two ought to be inversely related, but recently they have
been positively related. Charging that macroeconomic models have failed in this
situation, Lucas and Sargent, exponents of the school of rational expectations,
seek an equilibrium business cycle model consisting of optimizing behavior by
6F. de Leeuw and M. McKelvey, "Price Expectations by Business Firms," Brookings Papers on
Economic Activity (1981), 299-314. The findings in this article have been extended, and they now
report that there is evidence in support of long run lack of bias in price expectations, a necessary but
not sufficient condition for rationality of price expectations. See "Price Expectations of Business
Firms: Bias in the Short and Long Run," American Economic Review, 74 (March 1984), 99-110.
economic agents and the clearing of markets.7 Many, if not most, macroecono-
metric models are constructed piece-by-piece along these lines and have been for
the past 30 or more years. Rather than reject a whole body of analysis or demand
wholly new modelling approaches, it may be more fruitful to look more carefully
at the eclectic model that has, in fact, been in use for some time. If such models
have appropriate allowance for supply side disturbances, they can do quite well in
interpreting the events of the 1970s and even anticipated them in many instances.8
Rather than move in the direction of the school of rational expectations, I suggest
that we turn from the oversimplified model and the highly aggregative policy
instruments to the eclectic system that has large supply side content, together with
conventional demand side analysis and examine structural as well as macro
policies.
In the 1960s aggregative policies of Keynesian demand management worked
very well. The 1964 tax cut in the United States was a textbook example and
refutes the claim of the rational expectations school that parametric shifts will
nullify policy action. It also refutes the idea that we know so little about the
response pattern of the economy that we should refrain from activist policies.
Both the Wharton and Brookings Models were used for simulations of the 1964
tax cut.9 A typical policy simulation with the Wharton Model is shown in the
accompanying table.
This is a typical policy simulation with an econometric model, solving the
system dynamically, with and without a policy implementation. The results in the
above table estimate that the policy added about $10 billion (1958 $) to real GNP
and sacrificed about $7 billion in tax revenues. Actually, by 1965, the expansion
of the (income) tax base brought revenues back to their pre-tax cut position.
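The mechanics of such a comparison can be sketched in a few lines. The toy model and parameter values below are illustrative assumptions, not the Wharton Model, but the procedure is the one described: solve the same dynamic system twice, once at the baseline tax rate and once with the cut, and difference the paths.

# Sketch of a with/without policy simulation: solve a small dynamic
# model under a baseline tax rate and under a tax cut, then compare.
# The model and parameters are illustrative, not the Wharton Model.

def simulate(t_rate, periods=8, Y_prev=600.0):
    a, b, G, Z0, g = 50.0, 0.75, 90.0, 60.0, 0.2
    path = []
    for _ in range(periods):
        Z = Z0 + g * Y_prev                      # accelerator-style investment
        Y = (a + Z + G) / (1.0 - b * (1.0 - t_rate))
        path.append(Y)
        Y_prev = Y
    return path

baseline = simulate(t_rate=0.22)
policy = simulate(t_rate=0.19)                   # the "tax cut" run
for k, (yb, yp) in enumerate(zip(baseline, policy), start=1):
    d_out = yp - yb                              # output gained from the cut
    d_rev = 0.19 * yp - 0.22 * yb                # revenue change: lower rate, larger base
    print(f"period {k}: dY = {d_out:+.1f}, d(tax revenue) = {d_rev:+.1f}")

The printed deviations are the analogue of the table entries: the policy path minus the baseline path, period by period, with the revenue change reflecting both the lower rate and the enlarged tax base.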
The Employment Act of 1946 in the United States was the legislation
giving rise to the establishment of the Council of Economic Advisers. Similar
commitments of other governments in the era following World War II and
reconstruction led to the formulation of aggregative policies of demand manage-
7Robert E. Lucas and Thomas J. Sargent, "After Keynesian Macroeconomics," After the Phillips
Curve: Persistence of High Inflation and High Unemployment (Boston: Federal Reserve Bank of
Boston, 1978), 49-72.
8L. R. Klein, "The Longevity of Economic Theory," Quantitative Wirtschaftsforschung, ed. by H.
Albach et al. (Tübingen: J. C. B. Mohr (Paul Siebeck), 1977), 411-19; "Supply Side Constraints in
Demand Oriented Systems: An Interpretation of the Oil Crisis," Zeitschrift für Nationalökonomie, 34
(1974), 45-56; "Five-year Experience of Linking National Econometric Models and of Forecasting
International Trade," Quantitative Studies of International Economic Relations, H. Glejser, ed.
(Amsterdam: North-Holland, 1976), 1-24.
9L. R. Klein, "Econometric Analysis of the Tax Cut of 1964," The Brookings Model: Some Further
Results, ed. by J. Duesenberry et al. (Amsterdam: North-Holland, 1969).
Table 1
Comparative simulations of the tax cut of 1964 (The Wharton Model).
ment on a broad international scale. New legislation in the United States, under
the name of the Humphrey-Hawkins Bill, established ambitious targets for
unemployment and inflation during the early part of the 1980s. The bill, however,
states frankly that aggregative policy alone will not be able to accomplish the
objectives. Structural policies will be needed, and to formulate those, with
meaning, it will be necessary to draw upon the theory of a more extensive model,
namely, the Keynes-Leontief model.
The Wharton Annual Model is of the Keynes-Leontief type. It combines a
model of income generation and final demand determination with a complete
input-output system of 65 sectors and a great deal of demographic detail. It is
described in general terms in the preceding section and laid out in equation form
in the appendix. To show how some structural policies for medium term analysis
work out in this system, I have prepared a table with a baseline projection for the
1980s together with an alternative simulation in which the investment tax credit
has been increased (doubled through 1982 and raised by one-third thereafter), in order
to stimulate capital formation, general personal income taxes have been reduced
by about 6%, and a tax has been placed on gasoline (50¢ per gallon).10 To offset
the gasoline tax on consumers, sales taxes have been cut back, with some grants in
aid to state and local governments increased to offset the revenue loss of the sales
taxes.
These policies mix aggregative fiscal measures with some structural measures to
get at the Nation’s energy problem. Also, tax changes have been directed
specifically at investment in order to improve the growth of productivity and hold
down inflation for the medium term. It is an interesting policy scenario because it
simultaneously includes both stimulative and restrictive measures. Also, it aims to
steer the economy in a particular direction towards energy conservation and
inducement of productivity.
As the figures in Table 2 show, the policy simulation produces results that
induce more real output, at a lower price level. Lower unemployment accompa-
10The investment tax credit provides tax relief to business, figured as a percentage of an equipment
purchase, if capital formation is undertaken. The percentage has varied, but is now about 10 percent.
Table 2
Estimated policy projections of the Wharton Annual Model 1980-89
(Deviation of policy simulation from baseline)
Selected economic indicators
1980 1981 1982 1983 1984 1985 1986 1987 1988 1989
nies the higher output, and the improvement in productivity contributes to the
lower price index. The lowering of indirect taxes offsets the inflationary impact of
higher gasoline taxes.
A cutback in energy use, as a result of the higher gasoline tax, results in a lower
BTU/GNP ratio. This holds back energy imports and makes the trade balance
slightly better in the policy alternative case.
A contributing factor to the productivity increase is the higher rate of capital
formation in the policy alternative. There are no surprises in this example. The
results come out as one would guess on the basis of a priori analysis, but the main
contribution of the econometric approach is to try to quantify the outcome and
provide a basis for net assessment of both the positive and negative sides of the
policy. Also, the differences from the base-line case are not very large. Economet-
ric models generally project moderate gains. To some extent, they underestimate
change in a systematic way, but they also suggest that the present inflationary
situation is deep seated and will not be markedly cured all at once by the range of
policies that is being considered.
11J. Tinbergen, On the Theory of Economic Policy (Amsterdam: North-Holland, 1952).
12The Wharton Quarterly Model (1980) has 432 stochastic equations, 568 identities, and 401
exogenous variables. The Wharton Annual Model (1980) had 647 stochastic equations, 948 identities,
and 626 exogenous variables. Exclusive of identities (and input-output relations), these each have
approximate balance between endogenous and exogenous variables.
tively. The many dimensions of weather and climate that are so important for
determining agricultural output are the clearest examples of non-controllable
exogenous variables -with or without cloud seeding.
The econometric model within which these concepts are being considered
classifies the variables as

y_1, y_2, ..., y_{n_1};  y_{n_1+1}, ..., y_n,    n_1 + n_2 = n,
z_1, z_2, ..., z_{m_1};  z_{m_1+1}, ..., z_m,    m_1 + m_2 = m,

where the first n_1 endogenous variables y are the targets and the first m_1
exogenous variables z are the instruments, denoted w. In the simplest case of one
target and one instrument, the solved system gives

w = g(y, z),

with z held at its given values. For a particular target value of y (y*), we can find
the appropriate instrument value w = w* from the solution of

w* = g(y*, z).

If the function were simply proportional, we can write the answer in closed
form as

y = αw,

w* = (1/α)y*.
For any desired value of y we can thus find the appropriate action that the
authorities must take by making w = w *. This will enable us to hit the target
exactly. The only exception to this remark would be that a legitimate target y*
required an unattainable or inadmissible w*. Apart from such inadmissible
solutions, we say that for this case the straightforward rule is to interchange the
roles of exogenous and endogenous variables and re-solve the system, that is to say,
treat the m_1 (= n_1) instruments as though they were unknown endogenous variables
and the n_1 (= m_1) targets as though they were known exogenous variables. Then
solve the system for all the endogenous as functions of the exogenous variables so
classified.
It is obvious and easy to interchange the roles of endogenous and exogenous
variables by inverting the single equation and solving for the latter, given the
target value of the former. In a large complicated system, linear or not, it is easy to
indicate how this may be done or even to write closed form linear expressions for
doing it in linear systems, but it is not easy to implement in most large scale
models.
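In the linear case the interchange can be written out explicitly. The sketch below uses a made-up two-equation model (not from the text): normally (C, Y) are solved given (G, Z), but to hit a target Y = Y* the pair (C, G) is treated as unknown and the same linear system is re-solved.

# Sketch: interchange endogenous and exogenous variables in a linear model.
# Model (illustrative):   C = a + b(1-t)Y        consumption
#                         Y = C + Z + G          output identity
# Normally (C, Y) are endogenous given (G, Z).  To hit a target Y = Y*,
# treat (C, G) as the unknowns instead and re-solve the linear system.
import numpy as np

a, b, t, Z = 50.0, 0.75, 0.20, 100.0
Y_star = 650.0

# Unknown vector u = (C, G).  Equations rewritten with Y fixed at Y*:
#   1*C + 0*G = a + b(1-t)*Y*
#   1*C + 1*G = Y* - Z
A = np.array([[1.0, 0.0],
              [1.0, 1.0]])
rhs = np.array([a + b * (1 - t) * Y_star,
                Y_star - Z])
C_sol, G_star = np.linalg.solve(A, rhs)
print(f"C = {C_sol:.1f}, required G* = {G_star:.1f}")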
The solution for the desired instruments w* in terms of the targets y* and of z is
The relevant values come from the first n, rows of this solution.
This solution is not always easy to evaluate in practice. Whether the system is
linear or nonlinear, the usual technique employed in most econometric centers is
to solve the equations by iterative steps in what is known as the Gauss-Seidel
algorithm. An efficient working of this algorithm in large dynamic systems
designed for standard calculations of simulation, forecasting, multiplier analysis
and similar operations requires definite rules of ordering, normalizing, and
choosing step sizes.13 It is awkward and tedious to re-do that whole procedure for
a transformed system in which some variables have been interchanged, unless
they are standardized.
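For concreteness, a minimal sketch of the Gauss-Seidel algorithm on the same illustrative model follows: each equation is normalized on one left-hand variable, and the equations are swept in order, with the newest values used immediately, until successive sweeps agree.

# Sketch of the Gauss-Seidel algorithm on a small model (illustrative).
# Each equation is normalized on a single left-hand variable, and the
# system is swept repeatedly, always using the newest available values.

a, b, t, Z, G = 50.0, 0.75, 0.20, 100.0, 110.0

C, Y = 0.0, 0.0          # starting values
for sweep in range(200):
    C_new = a + b * (1 - t) * Y      # consumption equation, normalized on C
    Y_new = C_new + Z + G            # output identity, normalized on Y
    done = abs(C_new - C) + abs(Y_new - Y) < 1e-8   # convergence test
    C, Y = C_new, Y_new
    if done:
        break
print(f"converged after {sweep} sweeps: Y = {Y:.2f}, C = {C:.2f}")

Here the normalization and ordering are harmless; in large systems, as the text notes, they must be chosen with care or the sweeps may converge slowly or not at all.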
It is simpler and more direct to solve the problem by searching (systematically)
for instruments that bring the n_1 values of y as "close" as possible to their targets
y*. There are many ways of doing this, but one would be to find the minimum
value of
L = Σ_{i=1..n_1} u_i (y_i - y_i*)²,

subject to

F̂ = ê,

where F̂ = the estimated value of F for θ = θ̂, and ê = assigned values of the error vector.
In the theory of optimal economic policy, L is called a loss function and is
arbitrarily made a quadratic in this example. Other loss functions could equally
well be chosen. The ui are weights in the loss function and should be positive.
If there is an admissible solution and if n_1 = m_1, the optimal value of the loss
function should become zero.
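In practice the search can be delegated to any numerical minimizer. The sketch below is illustrative: a general-purpose routine from scipy stands in for the purpose-built search procedures of the modeling centers, two instruments (t, G) are chosen to hit two targets (output and the budget balance), and, since n_1 = m_1, the loss is driven essentially to zero.

# Sketch: choosing instruments by minimizing a quadratic loss function,
# L = sum_i u_i * (y_i - y_i*)^2, subject to the model.  Illustrative
# model and targets; scipy.optimize.minimize does the systematic search.
import numpy as np
from scipy.optimize import minimize

a, b, Z = 50.0, 0.75, 100.0
u = np.array([1.0, 1.0])                  # loss-function weights (positive)
y_star = np.array([650.0, -20.0])         # targets: output Y*, deficit D* (a surplus of 20)

def model(instruments):
    t, G = instruments
    Y = (a + Z + G) / (1.0 - b * (1.0 - t))   # solved model
    D = G - t * Y                              # budget deficit
    return np.array([Y, D])

def loss(instruments):
    dev = model(instruments) - y_star
    return float(u @ dev**2)

res = minimize(loss, x0=np.array([0.25, 100.0]), method="Nelder-Mead")
t_opt, G_opt = res.x
print(f"t* = {t_opt:.3f}, G* = {G_opt:.1f}, loss = {res.fun:.2e}")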
13L, R. Klein, A Textbook of Econometrics, (New York: Prentice-Hall, 1974). p. 239.
A more interesting optimization problem arises if n_1 > m_1, i.e. if there are more
targets than instruments. In this case, the optimization procedure will not, in
general, bring one all the way to target values, but only to a "minimum distance"
from the target. If m_1 > n_1, it would be possible, in principle, to assign arbitrary
values to m_1 - n_1 (superfluous) instruments and solve for the remaining n_1
instruments as functions of the n_1 target values of y. Thus, the problem of excess
instruments can be reduced to the special problem of equal numbers of
instruments and targets.
It should be noted that the structural model is a dynamic system, and it is
unlikely that a static loss function would be appropriate. In general, economic
policy makers have targeted paths for y. A whole stream of y-values is generally
targeted over a policy planning horizon. In addition, the loss function could
be generalized in other dimensions, too. There will usually be a loss associated
with instrumentation. Policy makers find it painful to make activist decisions
about running the economy, especially in the industrial democracies; therefore, L
should be made to depend on w - w * as well as on y - y *. In the quadratic case,
covariation among the terms y_i - y_i* might also be considered, but this may well be
beyond the comprehension of the typical policy maker.
A better statement of the optimal policy problem will then be
L = Σ_{t=1..T} { Σ_{i=1..n_1} u_i (y_{it} - y_{it}*)² + Σ_{i=1..m_1} v_i (w_{it} - w_{it}*)² } = min.

The v_i are weights associated with instrumentation losses. If future values are to
be discounted, it may be desirable to vary u_i and v_i with t. A simple way would be
to write
many in retrospect, assessing what policy should have been.14 A noteworthy series
of experimental policies dealt with attempts to alleviate the stagflation of
the late 1960s and the 1970s in the United States; in other words, could a
combination of fiscal and monetary policies have been chosen that would have led
to full (or fuller) employment without (so much) inflation over the period
1967-75?
The answers, from optimal control theory applications among many models,
suggest that better levels of employment and production could have been achieved
with very little additional inflationary pressures but that it would not have been
feasible to bring down inflation significantly at the same time. Some degree of
stagflation appears to have been inevitable, given the prevailing exogenous
framework.
Such retrospective applications are interesting and useful, but they leave one a
great distance from the application of such sophisticated measures to the positive
formulation of economic policy. There are differences between the actual and
optimal paths, but if tolerance intervals of error for econometric forecasts were
properly evaluated, it is not likely that the two solutions would be significantly
apart over the whole simulation path. Where the two solutions are actually far apart,
it is often necessary to use extremely wide ranges of policy choice, wider and more
frequently changing than would be politically acceptable.
Two types of errors must be considered for evaluation of tolerance intervals:

var(θ̂) and var(e).

The correct parameter values are not known; they must be estimated from small
statistical samples and have fairly sizable errors. Also, there is behavioral error,
arising from the fact that models cannot completely describe the economy.
Appropriate evaluation of such errors does not invalidate the use of models for
some kinds of applications, but the errors do preclude "fine tuning".
A more serious problem is that the optimum problem is evaluated for a fixed
system of constraints; i.e. subject to
14A. Hirsch, S. Hymans, and H. Shapiro, "Econometric Review of Alternative Fiscal and Monetary
Policy, 1971-75," Review of Economics and Statistics, LX (August 1978), 334-45;
L. R. Klein and V. Su, "Recent Economic Fluctuations and Stabilization Policies: An Optimal
Control Approach," Quantitative Economics and Development, eds. L. R. Klein, M. Nerlove, and
S. C. Tsiang (New York: Academic Press, 1980);
M. B. Zarrop, S. Holly, B. Rustem, J. H. Westcott, and M. O'Connell, "Control of the LBS
Econometric Model Via a Control Model," Optimal Control for Econometric Models, ed. by S. Holly
et al. (London: Macmillan, 1979), 23-64.
it has been found that highly favorable simulations can be constructed that
simultaneously come close to full employment and low inflation targets. These
simulation solutions were found with the same (Wharton) model that resisted full
target approach using the methods of optimal control. The wage and profits
(price) equations of the model had to be re-specified to admit
Δln w = Δln(X/hL),
Δln(PR/K) = Δln(X/hL),

where

PR = corporate profits,
K = stock of corporate capital.

Equations for wages and prices, estimated over the sample period, had to be
removed in favor of the insertion of these.15
A creative policy search with simulation exercises was able to get the economy
to performance points that could not be reached with feasible applications of
optimal control methods. This will not always be the case, but will frequently be
so. Most contemporary problems cannot be fully solved by simple manipulation
of a few macro instruments, and the formalism of optimal control theory has very
limited use in practice. Simulation search for “good” policies, realistically for-
mulated in terms of parameter values that policy makers actually influence is
likely to remain as the dominant way that econometric models are used in the
policy process.
That is not to say that optimal control theory is useless. It shows a great deal
about model structure and instrument efficiency. By varying weights in the loss
function and then minimizing, this method can show how sensitive the uses of
policy instruments are. Also, some general propositions can be developed. The
more uncertainty is attached to model specification and estimation, the less
should be the amplitude of variation of instrument settings. Thus, William
Brainard has shown, in purely theoretical analysis of the optimum problem, that
Table 3
Growth assumptions and budget deficits in fiscal policy planning, USA, February 1984a
aSource: Baseline Budget Projections for Fiscal Years 1985-1989, Congressional Budget Office,
Washington, D.C., February 1984; Testimony of Rudolph G. Penner, Committee on Appropria-
tions, U.S. Senate, February 22, 1984.
16W. Brainard, "Uncertainty and the Effectiveness of Policy," American Economic Review, LVII
(May 1967), 411-25. See also L. Johansen, "Targets and Instruments Under Uncertainty," Institute of
Economics, Oslo, 1972. Brainard's results do not, in all theoretical cases, lead to the conclusion that
instrument variability be reduced as uncertainty is increased, but that is the result for the usual case.
17See I. and F. Adelman, "The Dynamic Properties of the Klein-Goldberger Model," Econometrica,
27 (October 1959), 596-625. See also Econometric Models of Cyclical Behavior, ed. B. G. Hickman
(New York: Columbia University Press, 1972).
are not known with great precision, it is argued that it is better not to introduce
them at all. An appropriate standard error of estimate is probably no larger than
±1.0 year; therefore, they ought to be introduced with an estimated degree of
certainty.
The Congressional Budget Office in the United States has a fairly steady
expansion path for its baseline case, but introduces a cycle downturn for 1986, in
a low growth alternative case, between 4 and 5 years after the last downturn. It
would seem more appropriate to consider this as a baseline case, with the steady
growth projection an upper limit for a more favorable budget projection.
A series of randomly disturbed simulations of an estimated model generates R
solution paths

y^(1), y^(2), ..., y^(R),

given the exogenous inputs z_1, ..., z_H over the solution horizon and initial
conditions.
The R stochastic projections will, on average, have cycles with random timing
and amplitude. They will produce R budget deficit estimates. The mean and
variance of these estimates can be used to construct an interval that includes a
given fraction of cases, which can be used to generate a high, low, and average
case for budget deficit values. The stochastic replications need not allow only for
drawings of e_t^(i); they can also be used to estimate distributions of parameter
estimates for F.18 This is an expensive and time consuming way to generate policy
intervals, but it is a sound way to proceed in the face of uncertainty for
momentous macro problems.
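A bare-bones version of the procedure looks as follows. The model, shock variance, and deficit rule are invented for illustration; the steps, however, follow the text: draw R shock sequences, simulate R solution paths, and take the mean and percentile band of the implied deficits.

# Sketch: R randomly disturbed simulations of a small dynamic model,
# used to put an interval around a budget-deficit projection.
# Model, shock variance, and fiscal rule are illustrative.
import numpy as np

rng = np.random.default_rng(42)
R, H = 500, 5                      # replications, horizon in years
a, b, t, G, Z0, g, sigma = 50.0, 0.75, 0.20, 110.0, 60.0, 0.2, 8.0

deficits = np.empty((R, H))
for r in range(R):
    Y_prev = 650.0
    for h in range(H):
        shock = rng.normal(0.0, sigma)          # drawing of e_t^(r)
        Z = Z0 + g * Y_prev
        Y = (a + Z + G + shock) / (1.0 - b * (1.0 - t))
        deficits[r, h] = G - t * Y              # deficit rule
        Y_prev = Y

mean = deficits.mean(axis=0)
lo, hi = np.percentile(deficits, [5, 95], axis=0)
for h in range(H):
    print(f"year {h+1}: deficit {mean[h]:+.1f}  (90% band {lo[h]:+.1f} to {hi[h]:+.1f})")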
It is evident from the table that provision for a business cycle, no matter how
uncertain its timing may be, is quite important. The higher and steadier growth
assumptions of the American administration produce, by far, the lowest fiscal
deficits in budgetary planning. A slight lowering of the steady path (by only 0.5
percentage points, 1986-89) produces much larger deficits, and if a business cycle
correction is built into the calculations, the rise in the deficit is very big. In the
cyclical case, we have practically a doubling of the deficit in five years, while in
the cycle-free case the rise is no more than about 50 percent in the same time
period.
Also, optimal control theory can be used to good advantage in the choice of
exogenous inputs for long range simulations. Suppose that values for the key
exogenous variables must be set over a long horizon.
By optimizing about a balanced growth path for the endogenous variables, with
respect to choice of key exogenous variables, we may be able to indicate sensible
choices of these latter variables for a baseline path, about which to examine
alternatives. These and other analytical uses will draw heavily on optimal control
theory, but it is unlikely that such theory will figure importantly in the positive
setting of economic policy.
The role of the baseline (balanced growth) solution for policy making in the
medium or long term is to establish a reference point about which policy induced
deviations can be estimated. The baseline solution is not, strictly speaking, a
forecast, but it is a policy reference set of points. Many policy problems are long
term. Energy availability, other natural resource supplies, social insurance reform,
and international debt settlement are typical long term problems that use econo-
metric policy analysis at the present time.
At the present time, the theory of economic policy serves as a background for
development of policy but not for its actual implementation. There is too much
uncertainty about the choice of loss function and about the constraint system to
rely on this approach to policy formation in any mechanistic way.19 Instead,
economic policy is likely to be formulated, in part at least, through comparison of
alternative simulations of econometric models.
In the typical formulation of policy, the following steps are taken:
19See, in this respect, the conclusions of the Committee on Policy Optimisation (headed by
R. J. Ball), Report (London: HMSO, 1978).
20Stephen McNees, "The Forecasting Record for the 1970s," New England Economic Review
(September/October 1979), 33-53;
Vincent Su, "An Error Analysis of Econometric and Noneconometric Forecasts," American Economic
Review, 68 (May 1978), 360-72.
Some of the problems for which the LINK model has been used are: exchange
rate policy, agricultural policy associated with grain failures, oil pricing policy,
coordinated fiscal policies, coordinated monetary policies.
When the LINK system was first constructed, the Bretton Woods system of
fixed exchange rates was still in force. It was appropriate to make exchange rates
exogenous in such an environment. At the present time exchange rate equations
have been added in order to estimate currency rates endogenously. An interesting
application of optimal control theory can be used for exchange rate estimation
and especially for developing the concept of equilibrium exchange rates. Such
equilibrium rates give meaning to the concept of the degree of over- or under-
valuation of rates, which may be significant for determining official intervention
in the foreign exchange market.
In a system of multiple models, for given exchange rates there is a solution,
model by model, for

X_i = export volume, M_i = import volume, (PX)_i = export price, (PM)_i = import price.

These are all endogenous variables in a multi-model world system. The equi-
librium exchange rate problem is to set targets for each trade balance at levels
that countries could tolerate, at either positive or negative values, for protracted
periods of time; or zero balance could also be imposed. The problem is then
transformed according to Tinbergen's approach, and assumed values are given to
the trade balance, as though they are exogenous, while solutions are obtained for
the exchange rates.
The exchange rates are usually denominated in terms of local currency units per
U.S. dollar. For the United States, the trade balance is determined as a residual
by virtue of the accounting restraints.
Σ_i (PX)_i X_i = Σ_i (PM)_i M_i,
and the exchange rate in terms of U.S. dollars is, by definition, 1.0.
As noted earlier, this problem, although straightforward from a conceptual
point of view, is difficult to carry out in practice, especially for a system as large
as LINK. The minimum distance version of the problem is

Σ_i { [(PX)_i X_i - (PM)_i M_i] - [(PX)_i X_i - (PM)_i M_i]* }² = min = 0,
with the entire LINK system functioning as a set of constraints. The minimization
is done with respect to the values of the exchange rates (instruments). With
modern computer technology, hardware, and software, this is a feasible problem.
Its importance for policy is to give some operational content to the concept of
equilibrium exchange rate values.
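A toy version of the calculation, with a single country standing in for the LINK system, might look as follows; the trade responses and elasticities are invented, but the structure is the one just described: choose the exchange rate to minimize the squared gap between the simulated and target trade balances.

# Sketch: equilibrium exchange rates as a minimum-distance problem.
# One "country" stands in for the LINK system; the trade responses
# below are invented elasticities, not LINK equations.
import numpy as np
from scipy.optimize import minimize

B_target = -5.0          # tolerated trade balance (a deficit of 5)

def trade_balance(rate):
    # Exports rise and imports fall as the currency depreciates
    # (rate = local currency units per U.S. dollar).
    exports = 80.0 * rate**0.8
    imports = 95.0 * rate**-0.5
    return exports - imports

def distance(x):
    return (trade_balance(x[0]) - B_target) ** 2

res = minimize(distance, x0=np.array([1.0]), method="Nelder-Mead")
r_star = res.x[0]
print(f"equilibrium rate r* = {r_star:.3f}, balance = {trade_balance(r_star):+.2f}")
# The U.S. balance would follow as a residual from the world accounting identity.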
Optimal control algorithms built for project LINK to handle the multi-model
optimization problem have been successfully implemented to calculate Ronald
McKinnon’s proposals for exchange rate stabilization through monetary policy.21
As a result of attempts by major countries to stop inflation, stringent monetary
measures were introduced during October, 1979, and again during March, 1980.
American interest rates ascended rapidly, reaching a rate of some 20% for short
term money. One country after another quickly followed suit, primarily to protect
foreign capital holdings and to prevent capital from flowing out in search of high
yields. An internationally coordinated policy to reduce rates was considered in
LINK simulations. Such international coordination would diminish the possibil-
ity of the existence of destabilizing capital flows across borders. Policy variables
(or near substitutes) were introduced in each of the major country models. The
resulting simulations were compared with a baseline case. Some world results are
shown, in the aggregate, in Table 4.
The results in Table 4 are purely aggregative. There is no implication that all
participants in a coordinated policy program benefit. The net beneficial results are
obtained by summing gains and losses. Some countries might not gain, individu-
ally, in a coordinated framework, but on balance they would probably gain if
coordination were frequently used for a variety of policies and if the whole world
economy were stabilized as a result of coordinated implementation of policy.
Coordinated policy changes toward easier credit conditions help growth in the
industrial countries. They help to reduce inflation in the short run by directly
lowering interest costs. Higher inflation rates caused by enhanced levels of activity are restrained
Table 4
Effects of coordinated monetary policy, LINK system world aggregates
(Deviation of policy simulation from baseline)
22L. R. Klein, P. Beaumont, and V. Su, "Coordination of International Fiscal Policies and
Exchange Rate Revaluations," Modelling the International Transmission Mechanism, ed. J. Sawyer
(Amsterdam: North-Holland, 1979), 143-59;
H. Georgiadis, L. R. Klein, and V. Su, "International Coordination of Economic Policies," Greek
Economic Review, 1 (August 1979), 27-47;
L. R. Klein, R. Simes, and P. Voisin, "Coordinated Monetary Policy and the World Economy,"
Prévision et Analyse économique, 2 (October 1981), 75-104.
23A new and promising approach is to make international policy coordination a dynamic game. See
Gilles Oudiz and Jeffrey Sachs, "Macroeconomic Policy Coordination among the Industrial Countries,"
Brookings Papers on Economic Activity (1:1984), 1-64.
5. Prospects
has been carried quite far, possibly as far as it can in terms of methodological
development. There will always be new cases to consider, but the techniques are
not likely to be significantly improved upon. To some extent, formal methods of
optimal control can be further developed towards applicability. But significant
new directions can be taken through the development of more supply side content
in models to deal with the plethora of structural policy issues that now confront
economies of the world. This situation is likely to develop further along supply
side lines. The bringing into play of joint Leontief-Keynes models with fully
articulated input-output systems, demographic detail, resource constraints, and
environmental conditions is likely to be important for the development of more
specific policy decisions requiring the use of more micro detail from models. This
is likely to be the next wave of policy applications, focusing on energy policy,
environmental policy, food policy, and other specific issues. It is clear that
econometric methods are going to play a major role in this phase of development.
Appendix

The first five sectors listed on p. 2067 are the components of final demand as they
are laid out in the simple versions of the Keynesian macro model, extending the
cases cited earlier by the explicit introduction of inventory investment and foreign
trade. When the Keynesian system is extended to cover price and wage formation,
then the production function, labor requirements, labor supply and income
determination must also be included. These, together, make up the main compo-
nents of national income. Interest income and monetary relationships to generate
interest rates must also be included. This outlines, in brief form, the standard
macro components of the mainstream econometric model. The interindustry
relationships making up the input-output system round out the total model.
The flow of goods, in a numeraire unit, from sector i to sector j is denoted as
X_ij, and the corresponding input-output coefficient as

a_ij = X_ij / X_j.

The balance equation for sector i is

X_i = Σ_{j=1..n} a_ij X_j + F_i,

where F_i is final demand, and the total number of sectors is n. In matrix notation
this becomes

(I - A)X = F.
The components of final demand satisfy

F_C + F_I + F_G + F_E - F_M = F,

and each component is allocated across sectors by the coefficients

a_iC = F_iC / F_C;  a_iI = F_iI / F_I;  a_iG = F_iG / F_G;  a_iE = F_iE / F_E;  a_iM = F_iM / F_M.

Collecting the component totals in the vector

S = (F_C, F_I, F_G, F_E, -F_M)',

we can write

F = CS,

or

(I - A)X = CS,
X = (I - A)^{-1} CS.
X = BY,

where B is the diagonal matrix

B = diag( 1/(1 - Σ_{i=1..n} a_i1), ..., 1/(1 - Σ_{i=1..n} a_in) ).
We observe also that the sum of the Y_j gives the GNP total, too. Substituting,

Y = B^{-1}(I - A)^{-1} CS.

Valuing GNP from the income side and from the final demand side,

P_Y' Y = P_S' S,

P_Y' B^{-1}(I - A)^{-1} C = P_S'.

From

X = BY,

we also have

(I - A') P_X = B^{-1} P_Y,

P_Y = B(I - A') P_X.
The ratio Y_j/X_j (value added to gross output of sector j) can be written as

Y_j/X_j = 1 - Σ_{i=1..n} a_ij;

these ratios are the reciprocals of the diagonal elements of B. In matrix notation we have

P_X = B^{-1} P_Y + A' P_X,
P_Y = B(I - A') P_X.
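As a numerical check on this algebra, the block below works the quantity and price sides through for an invented two-sector, two-component example: X = (I - A)^{-1}CS on the quantity side, and P_Y = B(I - A')P_X on the price side.

# Sketch: quantity and price solutions of the input-output block,
# X = (I - A)^{-1} C S  and  P_Y = B (I - A') P_X,
# for an invented 2-sector, 2-component example.
import numpy as np

A = np.array([[0.10, 0.30],     # a_ij = X_ij / X_j
              [0.20, 0.15]])
C = np.array([[0.6, 0.4],       # bridge from final demand components
              [0.4, 0.6]])      # to sectors (columns sum to 1)
S = np.array([100.0, 50.0])     # final demand component totals

I = np.eye(2)
F = C @ S                       # sector final demands
X = np.linalg.solve(I - A, F)   # gross outputs

# B is diagonal with elements 1 / (1 - sum_i a_ij), so that X = B Y.
B = np.diag(1.0 / (1.0 - A.sum(axis=0)))
Y = np.linalg.solve(B, X)       # sector value added; its sum is GNP
print("X =", X.round(2), " Y =", Y.round(2), " GNP =", Y.sum().round(2))

P_X = np.array([1.0, 1.0])      # sector gross-output prices
P_Y = B @ (I - A.T) @ P_X       # implied value-added deflators
print("P_Y =", P_Y.round(3))

With all gross-output prices set to unity, the implied value-added deflators also come out at unity, and the sum of the Y_j equals total final demand, as the identities require.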
σ_j = elasticity of substitution.
24R. S. Preston, "The Wharton Long Term Model: Input-Output Within the Context of a Macro
Forecasting Model," Econometric Model Performance, ed. by L. R. Klein and E. Burmeister (Phila-
delphia: University of Pennsylvania Press, 1976), 271-87. In a new generation of this system, the
sector production functions are nested CES functions, with separate treatment for energy and
non-energy components of X_ij.
This system has an implicit restriction that the elasticity of substitution between
pairs of intermediate inputs is invariant for each sector, across input pairs. This
assumption is presently being generalized as indicated in the preceding footnote.
In other models, besides the Wharton Model, different production function
specifications are being used for this kind of work, e.g. translog specifications.
The demand side coefficients of final expenditure have not yet been estimated
in terms of complete systems, but they could be determined as specifications of
complete expenditure systems.25
All these equations are stochastic and dynamic, often with adaptive adjustment
relations.
25See Theodore Gamaletsos, Forecasting Sectoral Final Demand by a Dynamic Expenditure System
(Athens: Center of Planning and Economic Research, 1980), for a generalization of this expenditure
system.