
Chapter 25

ECONOMIC DATA ISSUES


ZVI GRILICHES*

Harvard University

Contents

1. Introduction: Data and econometricians - the uneasy alliance 1466


2. Economic data: An overview 1470
3. Data and their discontents 1472
4. Random measurement errors and the classic EVM 1476
5. Missing observations and incomplete data 1485
6. Missing variables and incomplete models 1495
7. Final remarks 1507
References 1509

*I am indebted to the National Science Foundation (SOC78-04279 and PRA81-08635) for their
support of my work on this range of topics, to John Bound, Bronwyn Hall, J. A. Hausman, and Ariel
Pakes for research collaboration and many discussions, and to O. Ashenfelter, E. Berndt, F. M. Fisher,
R. M. Hauser, M. Intriligator, S. Kuznets, J. Medoff, and R. Vernon for comments on an earlier draft.

Handbook of Econometrics, Volume III, Edited by Z. Griliches and M.D. Intriligator
© Elsevier Science Publishers BV, 1986

1. Introduction: Data and econometricians - the uneasy alliance

Then the officers of the children of Israel came and cried


unto Pharaoh, saying, Wherefore dealest thou thus with thy servants?
There is no straw given unto thy servants, and they say
to us, Make brick: and behold thy servants are beaten; but the fault
is in thine own people.
But he said, Ye are idle, ye are idle: Therefore ye say,
Let us go and do sacrifice to the Lord.
Go therefore now, and work; for there shall no straw be
given you, yet shall ye deliver the tale of bricks.
Exodus 5:15-18

Econometricians have an ambivalent attitude towards economic data. At one


level, the “data” are the world that we want to explain, the basic facts that
economists purport to elucidate. At the other level, they are the source of all our
trouble. Their imperfection makes our job difficult and often impossible. Many a
question remains unresolved because of “multicollinearity” or other sins of the
data. We tend to forget that these imperfections are what gives us our legitimacy
in the first place. If the data were perfect, collected from well designed randomized
experiments, there would hardly be room for a separate field of econometrics.
Given that it is the “badness” of the data that provides us with our living,
perhaps it is not all that surprising that we have shown little interest in improving
it, in getting involved in the grubby task of designing and collecting original data
sets of our own. Most of our work is on “found” data, data that have been
collected by somebody else, often for quite different purposes.
Economic data collection started primarily as a byproduct of other governmen-
tal activities: tax and customs collections. Early on, interest was expressed in
prices and levels of production of major commodities. Besides tax records,
population counts, and price surveys, the earliest large scale data collection efforts
were various Censuses, family expenditure surveys, and farm cost and production
surveys. By the middle 1940s the overall economic data pattern was set: govern-
ments were collecting various quantity and price series on a continuous basis,
with the primary purpose of producing aggregate level indicators such as price
indexes and national income accounts series, supplemented by periodic surveys of
population numbers and production and expenditure patterns to be used prim-
arily in updating the various aggregate series. Little microdata was published or
accessible, except in some specific sub-areas, such as agricultural economics.
A pattern was also set in the way the data were collected and by whom they
were analyzed.¹ With a few notable exceptions, such as France and Norway, and

¹See Kuznets (1971) and Morgenstern (1950) for earlier expressions of similar opinions. Morgen-
stern’s Cassandra-like voice is still very much worth listening to on this range of topics.

until quite recently, econometricians were not to be found inside the various
statistical agencies, and especially not in the sections that were responsible for
data collection. Thus, there grew up a separation of roles and responsibility.
“They” collect the data and “they” are responsible for all of their imperfections.
“We” try to do the best with what we get, to find the grain of relevant
information in all the chaff. Because of this, we lead a somewhat remote existence
from the underlying facts we are trying to explain. We did not observe them
directly; we did not design the measurement instruments; and, often we know
little about what is really going on (e.g. when we estimate a production function
for the cement industry from Census data without ever having been inside a
cement plant). In this we differ quite a bit from other sciences (including
observational ones rather than experimental) such as archeology, astrophysics,
biology, or even psychology where the “facts” tend to be recorded by the
professionals themselves, or by others who have been trained by and are super-
vised by those who will be doing the final data analysis. Economic data tend to be
collected (or often more correctly “reported”) by firms and persons who are not
professional observers and who do not have any stake in the correctness and
precision of the observations they report. While economists have increased their
use of surveys in recent years and even designed and commissioned a few special
purpose ones of their own, in general, the data collection and thus the responsibil-
ity for the quality of the collected material is still largely delegated to census
bureaus, survey research centers, and similar institutions, and is divorced from the
direct supervision and responsibility of the analyzing team.
It is only relatively recently, with the initiation of the negative income tax
experiments and various longitudinal surveys intended to follow up the effects of
different governmental programs, that econometric professionals actually became
involved in the primary data collection process. Once attempted, the job
turned out to be much more difficult than was thought originally, and taught us
some humility.² Even with relatively large budgets, it was not easy to figure out
how to ask the right question and to collect relevant answers. In part this is
because the world is much more complicated than even some of our more
elaborate models allow for, and partly also because economists tend to formulate
their theories in non-testable terms, using variables for which it is hard to find
empirical counterparts. For example, even with a large budget, it is difficult to
think of the right series of questions, answers to which would yield an unequivocal
number for the level of “human capital” or “permanent income” of an
individual. Thinking about such “alibi-removing” questions should make us a bit
more humble, restrain our continuing attacks on the various official data produc-
ing agencies, and push us towards formulating theories with more regard to what
is observable and what kind of data may be available.

²See Hausman and Wise (1985).



Even allowing for such reservations there has been much progress over the
years as a result of the enormous increase in the quantity of data available to us,
in our ability to manipulate them, and in our understanding of their limitations.
Especially noteworthy have been the development of various longitudinal micro-
data sets (such as the Michigan PSID tapes, the Ohio State NLS surveys, the
Wisconsin high school class follow-up study, and others),³ and the computerization of
the more standard data bases and their easier accessibility at the micro, individual
response level (I have in mind here such developments as the Public Use Samples
from the U.S. Population Census and the Current Population Surveys).⁴ Unfor-
tunately, much more progress has been made with labor force and income type
data, where the samples are large, than in the availability of firm and other
market transaction data. While significant progress has been made in the collec-
tion of financial data and security prices, as exemplified in the development of the
CRSP and Compustat data bases which have had a tremendous impact on the
field of finance, we are still in our infancy as far as our ability to interrogate and
get reasonable answers about other aspects of firm behavior is concerned. Most of
the available microdata at the firm level are based on legally required responses to
questions from various regulatory agencies who do not have our interests exactly
in mind.
We do now have, however, a number of extensive longitudinal microdata sets
which have opened a host of new possibilities for analysis and also raised a whole
range of new issues and concerns. After a decade or more of studies that try to
use such data, the results have been somewhat disappointing. We, as econometri-
cians, have learned a great deal from these efforts and developed whole new
subfields of expertise, such as sample selection bias and panel data analysis. We
know much more about these kinds of data and their limitations but it is not clear
that we know much more or more precisely about the roots and modes of
economic behavior that underlie them.
The encounters between econometricians and data are frustrating and ulti-
mately unsatisfactory both because econometricians want too much from the data
and hence tend to be disappointed by the answers, and because the data are
incomplete and imperfect. In part it is our own fault: the appetite grows with eating.
As we get larger samples, we keep adding variables and expanding our models,
until, on the margin, we come back to the same insignificance levels.
There are at least three interrelated and overlapping causes of our difficulties:
(1) the theory (model) is incomplete or incorrect; (2) the units are wrong, either at
too high a level of aggregation or with no way of allowing for the heterogeneity of
responses; and, (3) the data are inaccurate on their own terms, incorrect relative
³See Borus (1982) for a recent survey of longitudinal data sets.
⁴This survey is, perforce, centered on U.S. data and experience, which is what I am most familiar
with. The overall developments, however, have followed similar patterns in most other countries.

to what they purport to measure. The average applied study has to struggle with
all three possibilities.
At the macro level and even in the usual industry level study, it is common to
assume away the underlying heterogeneity of the individual actors and analyze
the data within the framework of the “representative” firm or “average” individ-
ual, ignoring the aggregation difficulties associated with such concepts. In analyz-
ing microdata, it is much more difficult to evade this issue and hence much
attention is paid to various individual “effects” and “heterogeneity” issues. This
is where the promise of longitudinal data lies - in their ability to control and allow
for additive individual effects. On the other hand, as is the case in most other
aspects of economics, there is no such thing as a free lunch: going down to the
individual level exacerbates both some of the left out variables problems and
the importance of errors in measurement. Variables such as age, land quality, or
the occupational structure of an enterprise, are much less variable in the aggre-
gate. Ignoring them at the micro level can be quite costly, however. Similarly,
measurement errors which tend to cancel out when averaged over thousands or
even millions of respondents, loom much larger when the individual is the unit of
analysis.
It is possible, of course, to take an alternative view: that there are no data
problems, only model problems, in econometrics. For any set of data there is the
“right” model. Much of econometrics is devoted to procedures which try to assess
whether a particular model is “right” in this sense and to criteria for deciding
when a particular model fits and is “correct enough” (see Chapter 5, Hendry,
1983 and the literature cited there). Theorists and model builders often proceed,
however, on the assumption that ideal data will be available and define variables
which are unlikely to be observable, at least not in their pure form. Nor do they
specify in adequate detail the connection between the actual numbers and their
theoretical counterparts. Hence, when a contradiction arises it is then possible to
argue “so much worse for the facts.” In practice one cannot expect theories to be
specified to the last detail nor the data to be perfect or of the same quality in
different contexts. Thus any serious data analysis has to consider at least two data
generation components: the economic behavior model describing the stimulus-
response behavior of the economic actors and the measurement model, describing
how and when this behavior was recorded and summarized. While it is usual to
focus our attention on the former, a complete analysis must consider them both.
In this chapter, I discuss a number of issues which arise in the encounter
between the econometrician and economic data. Since they permeate much of
econometrics, there is quite a bit of overlap with some of the other chapters in the
Handbook. The emphasis here, however, is more on the problems that are posed
by the various aspects of economic data than on the specific technological
solutions to them.

After a brief review of the major classes of economic data and the problems
that are associated with using and interpreting them, I shall focus on issues that
are associated with using erroneous or partially missing data, discuss several
empirical examples, and close with a few final remarks.

2. Economic data: An overview

Data: fr. Latin, plural of datum - given.


Observation: fr. Latin observare - to guard, watch.

It is possible to classify economic data along several different dimensions: (a)
Substantive: Prices, Quantities, Commodity Statistics, Population Statistics,
Banking Statistics, etc.; (b) Objective versus Subjective: prices versus expectations
about them, actual wages versus self-reported opinions about well-being; (c) Type
and periodicity: time series versus cross-sections; monthly, quarterly, or annual;
(d) Level of aggregation: individuals, families, or firms (micro), and districts,
states, industries, sectors, or whole countries (macro); (e) Level of fabrication:
primary, secondary, or tertiary; (f) Quality: extent, reliability, and validity.
As noted earlier, the bulk of economic data is collected and produced by
various governmental bodies, often as a by-product of their other activities.
Roughly speaking, there are two major types of economic data: aggregate time
series on prices and quantities at the commodity, industry, or country level, and
periodic surveys with much more individual detail. In recent years, as various
data bases became computerized, economic analysts have gained access to the
underlying microdata, especially where the governmental reports are based on
periodic survey results. This has led to a great flowering of econometric work on
various microdata sets including longitudinal panels.
The level of aggregation dimension and the micro-macro dichotomy are not
exactly the same. In fact, much of the “micro” data is already aggregated. The
typical U.S. firm is often an amalgam of several enterprises and some of the larger
ones may exceed in size some of the smaller countries or states. Similarly,
consumer surveys often report family expenditure or income data which have
been aggregated over a number of individual family members. Annual income
and total consumption numbers are also the result of aggregation over more
detailed time periods, such as months or weeks, and over a more detailed
commodity and sources of income classification. The issues that arise from the
mismatch between the level of aggregation at which the theoretical model is
defined and expected to be valid and the level of aggregation of the available data
have not really received the attention they deserve (see Chapters 20 and 30 for
more discussion and some specific examples).

The level of fabrication dimension refers to the “closeness” of the data to the
actual phenomenon being measured. Even though they may be subject to various
biases and errors, one may still think of reports of hours worked during last week
by a particular individual in a survey or the closing price of a specific common
stock on the New York Stock Exchange on December 31 as primary observations.
These are the basic units of information about the behavior of economic actors
and the information available to them (though individuals are also affected by the
macro information that they receive). They are the units in which most of our
microtheories are denominated. Most of our data are not of this sort, however.
They have usually already undergone several levels of processing or fabrication.
For example, the official estimate of total corn production in the State of Iowa in
a particular year is not the result of direct measurement but the outcome of a
rather complicated process of blending sample information on physical yields,
reports on grain shipments to and from elevators, benchmark census data from
previous years, and a variety of informal Bayes-like smoothing procedures to
yield the final official “estimate” for the state as a whole. The final results, in this
case, are probably quite satisfactory for the uses they are put to, but the
procedure for creating them is rarely described in full detail and is unlikely to be
replicable. This is even more true at the aggregated level of national income
accounts and other similar data bases, where the link between the original
primary observations and the final aggregate numbers is quite tenuous and often
mysterious.
I do not want to imply that the aggregate numbers are in some sense worse
than the primary ones. Often they are better. Errors may be reduced by aggrega-
tion and the informal and formal smoothing procedures may be based on correct
prior information and result in a more reliable final result. What needs to be
remembered is that the final published results can be affected by the properties of
the data generating mechanism, by the procedures used to collect and process the
data. For example, some of the time series properties of the major published
economic series may be the consequence of the smoothing techniques used in
their construction rather than a reflection of the underlying economic reality.
(This was brought forcibly home to me many years ago while collecting
unpublished data on the diffusion of hybrid corn at the USDA when I came
across a circular instructing the state agricultural statisticians: “When in
doubt - use a growth curve.”) Some series may fluctuate because of fluctuations in
the data generating institutions themselves. For example, the total number of
patents granted by the U.S. Patent Office in a particular year depends rather
strongly on the total number of patent examiners available to do the job. For
budgetary and other reasons, their number has gone through several cycles,
inducing concomitant cycles in the actual number of patents granted. This last
example brings up the point that while particular numbers may be indeed correct
as far as they go, they do not really mean what we thought they did.

Such considerations lead one to consider the rather amorphous notion of data
“quality.” Ultimately, quality cannot be defined independently of the intended
use of the particular data set. In practice, however, data are used for multiple
purposes and thus it makes some sense to indicate some general notions of data
quality. Earlier I listed extent, reliability, and validity as the three major dimen-
sions along which one may judge the quality of different data sets. Extent is a
synonym for richness: How many variables are present, what interesting ques-
tions had been asked, how many years and how many firms or individuals were
covered? Reliability is actually a technical term in psychometrics, reflecting the
notion of replicability and measuring the relative amount of random measure-
ment error in the data by the correlation coefficient between replicated or related
measurement of the same phenomenon. Note that a measurement may be highly
reliable in the sense that it is a very good measure of whatever it measures, but
still be the wrong measure for our particular purposes.
This brings us to the notion of validity which can be subdivided in turn into
representativeness and relevance. I shall come back to the issue of how repre-
sentative is a body of data when we discuss issues of missing and incomplete data.
It will suffice to note here that it contains the technical notion of coverage: Did all
units in the relevant universe have the same (or alternatively, different but known
and adjusted for) probability of being selected into the sample that underlies this
particular data set? Coverage and relevance are related concepts which shade over
into issues that arise from the use of “proxy” variables in econometrics. The
validity and relevance questions relate less to the issue of whether a particular
measure is a good (unbiased) estimate of the associated population parameter and
more to whether it actually corresponds to the conceptual variable of interest.
Thus one may have a good measure of current prices which is still a rather poor
indicator of the currently expected future price and relatively extensive and well
measured IQ test scores which may still be a poor measure of the kind of
“ability” that is rewarded in the labor market.

3. Data and their discontents

My father would never eat “cutlets” (minced meat patties) in the old
country. He would not eat them in restaurants because he didn’t know
what they were made of and he wouldn’t eat them at home because he
did.
AN OLD FAMILY STORY

I will be able to touch on only a few of the many serious practical and conceptual
problems that arise when one tries to use the various economic data sets. Many of
these issues have been discussed at length in the national income and growth
measurement literature but are not usually brought up in standard econometrics

courses or included in their curriculum. Among the many official and semi-official
data base reviews one should mention especially the Creamer GNP Improvement
report (U.S. Department of Commerce, 1979), the Rees committee report on
productivity measurement (National Academy of Sciences, 1979), the Stigler
committee (National Bureau of Economic Research, 1961) and the Ruggles
(Council on Wage and Price Stability, 1977) reports on price statistics, the
Gordon (President’s Committee to Appraise Employment Statistics, 1962), and
the Levitan (National Committee on Employment and Unemployment Statistics,
1979) committee reports on the measurement of employment and unemployment,
and the many continuous and illuminating discussions reported in the proceed-
ings volumes of the Conference on Research in Income and Wealth, especially in
volumes 19, 20, 22, 25, 34, 38, 45, 47, and 48 (National Bureau of Economic
Research, 1957-1983). All these references deal almost exclusively with U.S.
data, where the debates and reviews have been more extensive and public, but are
also relevant for similar data elsewhere.
At the national income accounts level there are serious definitional problems
about the borders of economic activity (e.g. home production and the investment
value of children) and the distinction between final and intermediate consumption
activity (e.g. what fraction of education and health expenditures can be thought
of as final rather than intermediate “goods” or “bads”). There are also difficult
measurement problems associated with the existence of the underground economy
and poor coverage of some of the major service sectors. The most serious
problem from the econometric point of view probably occurs in the measurement
of “real” output, GNP or industry output in “constant prices,” and the associated
growth measures. Since most of the output measures are derived by dividing
(“deflating”) current value totals by some price index, the quality of these
measures is intimately connected to the quality of the available price data.
Because of this, it is impossible to treat errors of measurement at the aggregate
level as being independent across price and “quantity” measures.
The available price data, even when they are a good indicator of what they
purport to measure, may still be inadequate for the task of deflation. For
productivity comparisons and for production function estimation the observed
prices are supposed to reflect the relevant marginal costs and revenues in an, at
least temporary, competitive equilibrium. But this is unlikely to be the case in
sectors where output or prices are controlled, regulated, subsidized, and sold
under various multi-part tariffs. Because the price data are usually based on the
pricing of a few selected items in particular markets, they may not correspond
well to the average realized price for the industry as a whole during a particular
time period, both because “easily priced” items may not be representative of the
average price movements in the industry as a whole and because many transac-
tions are made with a lag, based on long term contracts. There are also problems
associated with getting accurate transactions prices (Kruskal and Telser, 1960 and
Stigler and Kindahl, 1970) but the major difficulty arises from getting compar-
able prices over time, from the continued change in the available set of commod-
ities, the “quality change” problem.
“Quality change” is actually a special version of the more general comparabil-
ity problem, the possibility that similarly named items are not really similar,
either across time or individuals. In many cases the source of similarly sounding
items is quite different: Employment data may be collected from plants (establish-
ments), companies, or households. In each case the answer to the same question
may have a different meaning. Unemployment data may be reported by a
teenager directly or by his mother, whose views about it may both differ and be
wrong. The wording of the question defining unemployment may have changed
over time and so should the interpretation of the reported statistic. The
context in which a question is asked, its position within a series of questions on a
survey, and the willingness to answer some of the questions may all be changing
over time making it difficult to maintain the assumption that the reported
numbers in fact relate to the same underlying phenomenon over time or across
individuals and cultures.
The common notion of quality change relates to the fact that many commod-
ities are changing over time and that often it is impossible to construct ap-
propriate pricing comparisons because the same varieties are not available at
different times and in different places. Conceptually one might be able to get
around this problem by assuming that the many different varieties of a commod-
ity differ only along a smaller number of relevant dimensions (characteristics,
specifications), estimate the price-characteristics relationship econometrically and
use the resulting estimates to impute a price to the missing model or variety in the
relevant comparison period. This approach, pioneered by Waugh (1928) and
Court (1939), and revived by Griliches (1961), has become known as the “hedonic”
approach to price measurement. The data requirements for the application of this
type of an approach are quite severe and there are very few official price indexes
which incorporate it into their construction procedures. Actually, it has been used
much more widely in labor economics and in the analyses of real estate values
than in the construction of price deflator indexes. See Griliches (1971), Gordon
(1983), Rosen (1974) and Triplett (1975) for expositions, discussions, and exam-
ples of this approach to price measurement.
While the emergence of this approach has sensitized both the producers and the
consumers of price data to this problem and contributed to significant improve-
ments in data collection and processing procedures over time, it is fair to note
that much still remains to be done. In the U.S. GNP deflation procedures, the
price of computers has been kept constant since the early 1960s for lack of an
agreement on what to do about it, resulting in a significant underestimate of the
growth of real GNP during the last two decades. Similarly, for lack of a more
appropriate price index, aircraft purchases had been deflated by an equally

weighted index of gasoline engine, metal door, and telephone equipment prices
until the early 1970s, at which point a switch was made to a price index based on
data from the CAB on purchase prices for “identical” models, missing thereby
the major gains that occurred from the introduction of the jet engine, and the
various improvements in operating efficiency over time.5 One could go on adding
to this gallery of horror stories but the main point to be made here is not that a
particular price index is biased in one or another direction. Rather, the point is
that one cannot take a particular published price index series and interpret it as
measuring adequately the underlying notion of a price change for a well specified,
unchanging, commodity or service being transacted under identical conditions
and terms in different time periods. The particular time series may indeed be
quite a good measure of it, or at least better than the available alternatives, but
each case requires a serious examination of whether the actual procedures used to
generate the series do lead to a variable that is close enough to the concept
envisioned by the model to be estimated or by the theory under test. If not, one
needs to append to the model an equation connecting the available measured
variable to the desired but not actually observed correct version of this variable.
The issues discussed above affect also the construction and use of various
“capital” measures in production function studies and productivity growth
analyses. Besides the usual aggregation issues connected with the “existence” of
an unambiguous capital concept (see Diewert, 1980 and Fisher, 1969 on this) the
available measures suffer from potential quality change problems, since they are
usually based on some cumulated function of past investment expenditures
deflated by some combination of available price indexes. In addition, they are
also based on rather arbitrary assumptions about the pattern of survival of
machines over time and the time pattern of deterioration in the flow of their
services. The available information on the reasonableness of such assumptions is
very sparse, ancient, and flimsy. In some contexts it is possible to estimate the
appropriate patterns from the data rather than impose them a priori. I shall
present an example of this type of approach below.
Similar issues arise also in the measurement of labor inputs and associated
variables at both the macro and micro levels. At the macro level the questions
revolve about the appropriate weighting to be given to different types of labor:
young-old, male-female, black-white, educated vs. uneducated, and so forth.
The direct answer here as elsewhere is that they should be weighted by their
appropriate marginal prices but whether the observed prices actually reflect
correctly the underlying differences in their respective marginal productivities is
one of the more hotly debated topics in labor economics. (See Griliches, 1970 on
the education distinction and Medoff and Abraham, 1980 on the age distinction.)

⁵For a recent review and reconstruction of the price indexes for durable producer goods see
Gordon’s (1985) forthcoming monograph.

Connected to this is also the difficulty of getting relevant labor prices. Most of the
usual data sources report or are based on data on average annual, weekly, or
hourly earnings which do not represent adequately either the marginal cost of a
particular labor hour to the employer or the marginal return to a worker from the
additional hour of work. Both are affected by the existence of overtime premia,
fringe benefits, training costs, and transportation costs. Only recently has an
employment cost index been developed in the United States. (See Triplett, 1983
on this range of issues.) From an individual worker’s point of view the existence
of non-proportional tax schedules introduces another source of discrepancy
between the observed wage rates and the unobserved marginal after tax net
returns from working (see Hausman, 1982, for a more detailed discussion).
While the conceptual discrepancy between the desired concepts and the avail-
able measures dominates at the macro level, the more mundane topics of errors of
measurement and missing and incomplete data come to the fore at the micro,
individual survey level. This topic is the subject of the next section.

4. Random measurement errors and the classic EVM

To disavow an error is to invent retroactively.


Goethe

While many of the macro series may be also subject to errors, the errors in them
rarely fit into the framework of the classical errors-in-variables model (EVM) as it
has been developed in econometrics (see Chapter 23 for a detailed exposition).
They are more likely to be systematic and correlated over time.⁶ Micro data are
subject to at least three types of discrepancies, “errors,” and fit this framework
much better:
(a) Transcription, transmission, or recording error, where a correct response is
recorded incorrectly either because of clerical error (number transposition, skip-
ping a line or a column) or because the observer misunderstood or misheard the
original response.
(b) Response or sampling error, where the correct underlying value could be
ascertained by a more extensive sampling, but the actual observed value is not
equal to the desired underlying population parameter. For example, an IQ test is
based on a sample of responses to a selected number of questions. In principle,
the mean of a large number of tests over a wide range of questions would

⁶For an “error analysis” of national income account data based on the discrepancies between
preliminary and “final” estimates see Cole (1969), Young (1974), and Haitovsky (1972). For an earlier
more detailed evaluation based on subjective estimates of the differential quality of the various
“ingredients” (series) of such accounts see Kuznets (1954, chapter 12).

converge to some mean level of “ability” associated with the range of subjects
being tested. Similarly, the simple permanent income hypothesis would assert that
reported income in any particular year is a random draw from a potential
population of such incomes whose mean is “permanent income.” This is the case
where the observed variable is a direct but fallible indicator of the underlying
relevant “unobservable,” “latent factor” or variable (see Chapter 23 and Griliches,
1974, for more discussion of such concepts).
(c) When one is lacking a direct measure of the desired concept and a “proxy”
variable is used instead. For example, consider a model which requires a measure
of permanent income and a sample which has no income measures at all but does
have data on the estimated market value of the family residence. This housing
value may be related to the underlying permanent income concept, but not clearly
so. First, it may not be in the same units, second it may be affected by other
variables also, such as house prices and family size, and third there may be
“random” discrepancies related to unmeasured locational factors and events that
occurred at purchase time. While these kinds of “indicator” variables do not fit
strictly into the classical EVM framework, their variances, for example, need not
exceed the variance of the true “unobservable,” they can be fitted into this
framework and treated with the same methods.
There are two classes of cases which do not really fit this framework: Occasion-
ally one encounters large transcription and recording errors. Also, sometimes the
data may be contaminated by a small number of cases arising from a very
different behavioral model and/or stochastic process. Sometimes, these can be
caught and dealt with by relatively simple data editing procedures. If this kind of
problem is suspected, it is best to turn to the use of some version of the “robust
estimation” methods discussed in Chapter 11. Here we will be dealing with the
more common general errors-in-measurement problem, one that is likely to affect
a large fraction of our observations.
The other case that does not fit our framework is where the true concept, the
unobservable is distributed randomly relative to the measure we have. For
example, it is clear that the “number of years of school completed” (S) is an
erroneous measure of true “education” (E), but it is more likely that the
discrepancy between the two concepts is independent of S rather than E. I.e. the
“error” of ignoring differences in the quality of schooling may be independent of
the measured years of schooling but is clearly a component of the true measure of
E. The problem here is a left-out relevant variable (quality) and not measurement
error in the variable as is (years of school). Similarly, if we use the forecast of
some model, based on past data, to predict the expectations of economic actors,
we clearly commit an error, but this error is independent of the forecast level (if
this forecast is optimal and the actors have had access to the same information).
This type of “error” does not induce a bias in the estimated coefficients and can
be incorporated into the standard disturbance framework (see Berkson, 1950).
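The distinction matters in practice. As a numerical illustration (a minimal simulation with made-up parameter values, not from the chapter), the following sketch contrasts a classical error, which is independent of the true z and attenuates the OLS slope, with a Berkson-type discrepancy, which is independent of the observed measure and leaves the slope unbiased:

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta = 100_000, 1.0

# Classical EVM: x = z + eps, error independent of the TRUE value z.
z = rng.normal(0, 1, n)
y = beta * z + rng.normal(0, 1, n)
x = z + rng.normal(0, 1, n)                    # lambda = 1/2 here
b_classical = np.cov(y, x)[0, 1] / np.var(x)   # ~ beta*(1 - lambda) = 0.5

# Berkson case: the discrepancy is independent of the OBSERVED measure.
x2 = rng.normal(0, 1, n)
z2 = x2 + rng.normal(0, 1, n)                  # truth scatters around the measure
y2 = beta * z2 + rng.normal(0, 1, n)
b_berkson = np.cov(y2, x2)[0, 1] / np.var(x2)  # ~ beta = 1.0, no bias

print(b_classical, b_berkson)
```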

The standard EVM assumes the existence of a true relationship

$$ y = \alpha + \beta z + e, \tag{4.1} $$

the absence of direct observations on z, and the availability of a fallible measure of it

$$ x = z + \varepsilon, \tag{4.2} $$

where ε is a purely random i.i.d. measurement error, with Eε = 0, and no correlation with either z or y. This is quite a restrictive set of assumptions, especially the assumption of the errors not being correlated with anything else in the model, including their own past values. But it turns out to be very useful in many contexts and not too far off for a variety of micro data sets. I will discuss the evidence for the existence of such errors further on, when we turn to consider briefly various proposed solutions to the estimation problem in such models, but the required assumptions are not more difficult than those made in the standard linear regression model, which requires that the “disturbance” e, the model discrepancy, be uncorrelated with all the included explanatory variables.
It may be worthwhile, at this point, to summarize the main conclusions from the EVM for the standard OLS estimates in contexts where one has ignored the presence of such errors. Estimating

$$ y = a + bx + u, \tag{4.3} $$

where the true model is the one given above, yields −βλ as the asymptotic bias of the OLS b, where λ = σ_ε²/σ_x² is a measure of the relative amount of measurement error in the observed x series. The basic conclusion is that the OLS slope estimate is biased towards zero, while the constant term is biased away from zero. Since, in this model, one can treat y and x symmetrically, it can be shown (Schultz, 1938, Frisch, 1934, Klepper and Leamer, 1983) that in the “other regression,” the regression of x on y, the slope coefficient is also biased towards zero, implying a “bracketing” theorem

$$ \operatorname{plim} b_{yx} < \beta < 1/\operatorname{plim} b_{xy}. \tag{4.4} $$
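A minimal simulation (illustrative parameter values only) confirms both the attenuation result and the bracketing theorem (4.4): the direct regression understates β while the inverse of the reverse regression overstates it:

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta = 200_000, 2.0
z = rng.normal(0, 1, n)
x = z + rng.normal(0, 0.5, n)              # sigma_eps^2 = 0.25, lambda = 0.2
y = 1.0 + beta * z + rng.normal(0, 1, n)

b_yx = np.cov(y, x)[0, 1] / np.var(x)      # ~ beta*(1 - lambda) = 1.6
b_xy = np.cov(y, x)[0, 1] / np.var(y)      # "other regression" of x on y
print(b_yx, 1 / b_xy)                      # 1.6 < beta = 2 < 2.5
```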

These results generalize also to the multivariate case. In the case of two independent variables (x₁ and x₂), where only one (x₁) is subject to error, the coefficient of the other variable (the one not subject to errors of measurement) is also biased (unless the two variables are uncorrelated). That is, if the true model is

$$ y = \alpha + \beta_1 z_1 + \beta_2 x_2 + e, \qquad x_1 = z_1 + \varepsilon, \tag{4.5} $$

then

$$ \operatorname{plim}\bigl(b_{y1\cdot 2} - \beta_1\bigr) = -\beta_1\lambda/\bigl(1-\rho^2\bigr), \tag{4.6} $$

where ρ is the correlation between the two observed variables x₁ and x₂, and if we scale the variables so that σ²_{x₁} = σ²_{x₂} = 1, then

$$ \operatorname{plim}\bigl(b_{y2\cdot 1} - \beta_2\bigr) = \beta_1\lambda\rho/\bigl(1-\rho^2\bigr) = -\rho\,\bigl[\text{bias } b_{y1\cdot 2}\bigr]. \tag{4.7} $$

That is, the bias in the coefficient of the erroneous variable is “transmitted” to the other coefficients, with an opposite sign (provided, as is often the case, that ρ > 0); see Griliches and Ringstad (1971, Appendix C) and Fisher (1980) for the derivation of this and related formulae.
If more than one independent variable is subject to error, the formulae become more complicated, but the basic pattern persists. If both z₁ and z₂ are unobserved and x₁ = z₁ + ε₁, x₂ = z₂ + ε₂, where the ε's are independent (of each other) errors of measurement, and we have normalized the variables so that σ²_{x₁} = σ²_{x₂} = 1, then

$$ \operatorname{plim}\bigl(b_{y1\cdot 2} - \beta_1\bigr) = -\beta_1\lambda_1/\bigl(1-\rho^2\bigr) + \beta_2\lambda_2\rho/\bigl(1-\rho^2\bigr), \tag{4.8} $$

with a similar symmetric formula for plim b_{y2·1}. Thus, in the multivariate case, the bias is increased by the factor 1/(1−ρ²), the reduction in the independent variance of the true signal due to its intercorrelation with the other variable(s), and attenuated by the fact that the particular variable compensates somewhat for the downward bias in the other coefficients caused by the errors in the other variables. Overall, there is still a bias towards zero. For example, in this case the sum of the estimated coefficients is always biased towards zero:

$$ \operatorname{plim}\bigl[\bigl(b_{y1\cdot 2} + b_{y2\cdot 1}\bigr) - \bigl(\beta_1+\beta_2\bigr)\bigr] = -\bigl[\beta_1\lambda_1+\beta_2\lambda_2\bigr]/(1+\rho). \tag{4.9} $$

It is a declining function of ρ, for ρ > 0, which is reasonable if we remember that ρ is defined as the intercorrelation between the observed x's. The higher it is, the smaller must be the role of independent measurement errors in these variables.
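The transmission result in (4.6)-(4.7) is easy to verify numerically. In the sketch below (simulated data with arbitrary parameter values, β₁ = β₂ = 1, only x₁ measured with error), the coefficient of the erroneous x₁ falls below its true value while the cleanly measured x₂ picks up part of the lost effect:

```python
import numpy as np

rng = np.random.default_rng(2)
n, b1, b2, rho = 200_000, 1.0, 1.0, 0.6
z1, x2 = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], n).T
x1 = z1 + rng.normal(0, np.sqrt(1 / 3), n)   # error share of var(x1) = 0.25
y = b1 * z1 + b2 * x2 + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x1, x2])
coef = np.linalg.lstsq(X, y, rcond=None)[0]
print(coef[1], coef[2])   # ~ 0.66 and ~ 1.21: bias transmitted, opposite sign
```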
The impact of errors in variables on the estimated coefficients can be magnified by some transformations. For example, consider a quadratic equation in the unobserved true z:

$$ y = \alpha + \beta z + \gamma z^2 + e, \tag{4.10} $$

with the observed x = z + ε substituted instead. If both z and ε are normally distributed, it can be shown (Griliches and Ringstad, 1970) that

$$ \operatorname{plim}\hat{b} = \beta(1-\lambda), \tag{4.11} $$

while

$$ \operatorname{plim}\hat{c} = \gamma(1-\lambda)^2, $$

where b̂ and ĉ are the estimated OLS coefficients in the y = a + bx + cx² + u equation. That is, higher order terms of the equation are even more affected by errors in measurement than lower order ones.
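A small simulation (made-up values, λ = 0.3, with z and ε normal as the result requires) shows the squared term being hit twice as hard, as in (4.11):

```python
import numpy as np

rng = np.random.default_rng(3)
n, beta, gamma, lam = 400_000, 1.0, 0.5, 0.3
z = rng.normal(0, 1, n)
x = z + rng.normal(0, np.sqrt(lam / (1 - lam)), n)  # sigma_eps^2/sigma_x^2 = lam
y = beta * z + gamma * z**2 + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x, x**2])
_, b, c = np.linalg.lstsq(X, y, rcond=None)[0]
print(b)   # ~ beta*(1 - lam)    = 0.70
print(c)   # ~ gamma*(1 - lam)^2 = 0.245
```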
The impact of errors in the levels of the variables may be reduced by aggregation and aggravated by differencing. For example, in the simple model y = α + βz + e, x = z + ε, the asymptotic bias in the OLS b_{yx} is equal to −βλ, while the bias of the first differenced estimator [y_t − y_{t−1} = b(x_t − x_{t−1}) + v_t] is equal to −βλ/(1−ρ), where ρ now stands for the first order serial correlation of the x's, and can be much higher than in levels (for ρ > 0 and not too small). Similarly, computing “within” estimates in panel data, or differencing across brothers or twins in micro data, can result in the elimination of much of the relevant variance in the observed x's, and a great magnification of the noise to signal ratio in such variables. (See Griliches, 1979, for additional exposition and examples.)
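The magnification from differencing can also be checked directly. In the simulated two-period panel below (illustrative numbers: λ = 0.2 in levels, true z highly serially correlated so that the observed x's have ρ ≈ 0.64), the level estimate is mildly attenuated while the first-difference estimate loses more than half of β:

```python
import numpy as np

rng = np.random.default_rng(4)
n, beta, rho_z = 200_000, 1.0, 0.8
z1 = rng.normal(0, 1, n)
z2 = rho_z * z1 + rng.normal(0, np.sqrt(1 - rho_z**2), n)
x1 = z1 + rng.normal(0, 0.5, n)               # white measurement error
x2 = z2 + rng.normal(0, 0.5, n)
y1 = beta * z1 + rng.normal(0, 1, n)
y2 = beta * z2 + rng.normal(0, 1, n)

lam = 0.25 / 1.25                             # error share in levels = 0.2
rho_x = np.corrcoef(x1, x2)[0, 1]             # ~ 0.8/1.25 = 0.64
b_level = np.cov(y1, x1)[0, 1] / np.var(x1)   # ~ beta*(1 - lam) = 0.80
b_diff = (np.cov(y2 - y1, x2 - x1)[0, 1]
          / np.var(x2 - x1))                  # ~ beta*(1 - lam/(1-rho_x)) = 0.44
print(b_level, b_diff)
```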
In some cases, errors in different variables cannot be assumed to be indepen-
dent of each other. To the extent that the form of the dependence is known, one
can derive similar formulae for these more complicated cases. The simplest and
commonest example occurs when a variable is divided by another erroneous
variable. For example, “wage rates” are often computed as the ratio of payroll to
total man hours. To the extent that hours are measured with a multiplicative
error, so will the resulting wage rates be (but with the opposite sign). In such
contexts, the biases of (say) the estimated wage coefficient in a log-linear labor
demand function will be towards −1 rather than zero.
The story is similar, though the algebra gets a bit more complicated, if the z’s
are categorical or zero-one variables. In this case the errors arise from misclas-
sification and the variance of the erroneously observed x need not be higher than
the variance of the true z. Bias formulae for such cases are presented in Aigner
(1973) and Freeman (1984).
How does one deal with errors of measurement? As is well known, the standard
EVM is not identified without the introduction of additional information, either
in the form of additional data (replication and/or instrumental variables) or
additional assumptions.

Procedures for estimation with known λ's are outlined in Chapter 23. Occa-
sionally we have access to “replicated” data, when the same question is asked on
different occasions or from different observers, allowing us to estimate the
variance of the “true” variable from the covariance between the different mea-
sures of the same concept. This type of an approach has been used in economics
by Bowles (1972) and Borus and Nestel (1973) in adjusting estimates of parental
background by comparing the reports of different family members about the same
concept, and by Freeman (1984) on a union membership variable, based on a
comparison of worker and employer reports. Combined with a modelling ap-
proach it has been pursued vigorously and successfully in sociology in the works
of Bielby, Hauser, and Featherman (1977), Massagli and Hauser (1983) and
Mare and Mason (1980). While there are difficulties with assuming a similar error
variance on different occasions or for different observers, such assumptions can be
relaxed within the framework of a larger model. This is indeed the most
promising approach, one that brings in additional independent evidence about
the actual magnitude of such errors.
Almost all other approaches can be thought of as finding a reasonable set of
instrumental variables for the problem, variables that are likely to be correlated
with the true underlying z, but not with either the measurement error E or the
equation error (disturbance) e. One of the earlier and simpler applications of this
approach was made by Griliches and Mason (1972) in estimating an earnings
function and worrying about errors in their ability measure (AFQT test scores).
In a “true” equation of the form

$$ y = \alpha + \beta s + \gamma a + \delta x + e, \tag{4.12} $$

where y = log wages, s = schooling, a = ability, and x = other variables, they substituted an observed test score t for the unobserved ability variable and assumed that it was measured with random error: t = a + ε. They then used a set of background variables (parental status, regions of origin) as instrumental variables, the crucial assumption being that these background variables did not belong in this equation on their own accord. Chamberlain and Griliches (1975 and 1977) used “purged” information from the siblings of the respondents as instruments to identify their models (see also Chamberlain, 1971).
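A stylized version of this strategy is sketched below on simulated data (all names and parameter values are hypothetical, not the Griliches-Mason estimates): the fallible test score t is instrumented by a background variable that affects ability but is assumed excluded from the wage equation, and the just-identified IV estimator recovers the schooling and ability coefficients that OLS distorts:

```python
import numpy as np

rng = np.random.default_rng(5)
n, beta, gamma = 100_000, 0.06, 0.10          # returns to schooling, ability
bkg = rng.normal(0, 1, n)                     # parental background (hypothetical)
a = 0.7 * bkg + rng.normal(0, 1, n)           # unobserved ability
s = 0.5 * bkg + 0.5 * a + rng.normal(0, 1, n) # schooling
t = a + rng.normal(0, 1, n)                   # fallible test score, t = a + eps
y = beta * s + gamma * a + rng.normal(0, 0.3, n)  # log wage

# OLS of y on (s, t): the t coefficient is attenuated, s picks up the slack.
X = np.column_stack([np.ones(n), s, t])
print(np.linalg.lstsq(X, y, rcond=None)[0])

# IV: instrument t with background, assumed absent from the wage equation.
Z = np.column_stack([np.ones(n), s, bkg])
print(np.linalg.solve(Z.T @ X, Z.T @ y))      # ~ (0, beta, gamma)
```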
Various “grouping” methods of estimation, which use city averages (Friedman,
1957), industry averages (Pakes, 1983), or size class averages (Griliches and
Ringstad, 1971) to “cancel out” the errors, can all be interpreted as using the
classification framework as a set of instrumental dummy variables which are
assumed to be correlated with differences in the underlying true values and
uncorrelated with the random measurement errors or the transitory fluctuations.⁷

⁷Grouping methods that do not use an “outside” grouping criterion but are based on grouping on x
alone (or using its ranks as instruments) are not in general consistent and need not reduce the
EV-induced bias. (See Pakes, 1982.)
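As an illustration of this point (simulated data, arbitrary numbers), the sketch below groups on an outside criterion that shifts the true values but is independent of the measurement error; the slope through the group means then removes the attenuation that plagues the individual-level OLS estimate:

```python
import numpy as np

rng = np.random.default_rng(6)
n, beta = 200_000, 1.0
g = rng.integers(0, 2, n)                 # outside grouping criterion (e.g. size class)
z = 1.5 * g + rng.normal(0, 1, n)         # true values differ across groups
x = z + rng.normal(0, 1, n)               # fallible individual-level measure
y = beta * z + rng.normal(0, 1, n)

b_ols = np.cov(y, x)[0, 1] / np.var(x)    # attenuated: ~ 0.61 here
b_grp = ((y[g == 1].mean() - y[g == 0].mean())
         / (x[g == 1].mean() - x[g == 0].mean()))   # ~ beta: errors average out
print(b_ols, b_grp)
```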

The more complete MIMIC type models (Multiple Indicators-Multiple Causes models, see Hauser and Goldberger, 1971) are basically full information versions of the instrumental variables approaches, with an attempt to gain efficiency by specifying the complete system in greater detail and estimating it jointly. In the Griliches-Mason example, such a model would consist of the following set of equations:

$$ \begin{aligned} a &= x\delta_1 + g,\\ t &= a + \varepsilon,\\ s &= x\delta_2 + \gamma_1 a + u,\\ y &= \beta s + \gamma_2 a + e, \end{aligned} \tag{4.13} $$

where a is an unobserved “ability” factor, and the “unique” disturbances g, ε, u, and e are all assumed to be mutually uncorrelated. With enough distinct x's and δ₁ ≠ δ₂, this model is estimable either by instrumental variable methods or maximum likelihood methods. The maximum likelihood versions are equivalent to estimating the associated reduced form system:

$$ \begin{aligned} t &= x\delta_1 + g + \varepsilon,\\ s &= x\bigl(\delta_2 + \gamma_1\delta_1\bigr) + \gamma_1 g + u,\\ y &= x\bigl[\beta\delta_2 + (\gamma_1\beta + \gamma_2)\delta_1\bigr] + (\gamma_1\beta+\gamma_2)g + \beta u + e, \end{aligned} \tag{4.14} $$

imposing the non-linear parameter restrictions across the equations and retrieving additional information about them from the variance-covariance matrix of the residuals, given the no-correlation assumption about the ε's, g's, u's, and e's. It is possible, for example, to retrieve an estimate of β + γ₂/γ₁ from the variance-covariance matrix and pool it with the estimates derived from the reduced form slope coefficients. In larger, more over-identified models, there are more binding restrictions connecting the variance-covariance matrix of the residuals with the slope parameter estimates. Chamberlain and Griliches (1975) used an expanded version of this type of model with sibling data, assuming that the unobserved ability variable has a variance-components structure. Aasness (1983) uses a similar framework and consumer expenditure survey data to estimate Engel functions and the unobserved distribution of total consumption.
All of these models rely on two key assumptions: (1) the original model y = α + βz + e is correct for all dimensions of the data, i.e. the β parameter is stable; and (2) the unobserved errors are uncorrelated in some well specified, known dimension. In cross-sectional data it is common to assume that the z's (the “true” values) and the ε's (the measurement errors) are based on mutually independent draws from a particular population. It is not possible to maintain this assumption when one moves to time series data or to panel data (which are a cross-section of time series), at least as far as the z's are concerned. Identification must hinge then on known differences in the covariance generating functions of the z's and the ε's. The simplest case is when the ε's can be taken as white (i.e. uncorrelated over time) while the z's are not. Then lagged x's can be used as valid instruments to identify β. For example, the “contrast” estimator suggested by Karni and Weisman (1974), which combines the differentially biased level (plim b = β − βλ) and first difference estimators [plim b_d = β − βλ/(1−ρ)] to derive consistent estimators for β and λ, can be shown, for stationary x and y, to be equivalent (asymptotically) to the use of lagged x's as instruments.
While it may be difficult to maintain the hypothesis that errors of measurement are entirely white, there are many different interesting cases which still allow the identification of β. Such is the case if the errors can be thought of as a combination of a “permanent” error or misperception of or by individuals and a random, independent over time, error component. The first part can be encompassed in the usual “correlated” or “fixed” effects framework, with the “within” measurement errors being white after all. Identification can be had then from contrasting the consequences of differencing over differing lengths of time. Different ways of differencing all sweep out the individual effects (real or errors) and leave us with the following kinds of bias formulae:

$$ \operatorname{plim} b_{j\Delta} = \beta\bigl(1 - 2\sigma_v^2/s^2_{j\Delta}\bigr), \tag{4.15} $$

where σ_v² is the variance of the independent over time component of the ε's, 1Δ denotes the transformation x₂ − x₁ while 2Δ indicates differences taken two periods apart, x₃ − x₁, and so forth, and the s²'s are the respective variances of such differences in x. (4.15) can be solved to yield:

$$ \hat\beta = \frac{w_{2\Delta} - w_{1\Delta}}{s^2_{2\Delta} - s^2_{1\Delta}} \qquad\text{and}\qquad \hat\sigma_v^2 = \bigl(\hat\beta - b_{1\Delta}\bigr)s^2_{1\Delta}/2\hat\beta, \tag{4.16} $$

where w_{jΔ} is the covariance of j period differences in y and x. This, in turn, can be shown to be equivalent to using past and future x's as instruments for the first differences.⁸

⁸See Griliches and Hausman (1984) for details, generalizations, and an empirical example.
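A direct implementation of the contrast in (4.15)-(4.16) on a simulated three-period panel (parameter values invented for the illustration; a fixed individual effect plus a white within-person error in x) recovers both β and the within error variance:

```python
import numpy as np

rng = np.random.default_rng(7)
n, beta, rho = 200_000, 1.0, 0.9
z1 = rng.normal(0, 1, n)                       # true z follows an AR(1)
z2 = rho * z1 + rng.normal(0, np.sqrt(1 - rho**2), n)
z3 = rho * z2 + rng.normal(0, np.sqrt(1 - rho**2), n)
mu = rng.normal(0, 1, n)                       # "permanent" measurement error
alp = rng.normal(0, 1, n)                      # real individual effect in y
x = [z + mu + rng.normal(0, 0.6, n) for z in (z1, z2, z3)]   # white part: var 0.36
y = [beta * z + alp + rng.normal(0, 1, n) for z in (z1, z2, z3)]

d1y, d1x = y[1] - y[0], x[1] - x[0]            # one-period differences
d2y, d2x = y[2] - y[0], x[2] - x[0]            # two-period differences
w1, w2 = np.cov(d1y, d1x)[0, 1], np.cov(d2y, d2x)[0, 1]
s1, s2 = np.var(d1x), np.var(d2x)
beta_hat = (w2 - w1) / (s2 - s1)               # (4.16): effects and errors cancel
sig2_hat = (beta_hat - w1 / s1) * s1 / (2 * beta_hat)   # ~ 0.36
print(beta_hat, sig2_hat)
```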
More generally, if one were willing to assume that the true z's are non-stationary, which is not unreasonable for many evolving economic series, but the measurement errors, the ε's, are stationary, then it is possible to use panel data to identify the parameters of interest even when the measurement errors are correlated over time.⁹ Consider, for example, the simplest case of T = 2. The probability limit of the variance-covariance matrix between the y's and the x's is given by:

$$ \operatorname{plim}\widehat{\operatorname{cov}}(y, x) = \beta\begin{pmatrix} s_{11} & s_{12}\\ s_{12} & s_{22} \end{pmatrix}, \qquad \operatorname{plim}\widehat{\operatorname{var}}(x) = \begin{pmatrix} s_{11}+\sigma^2 & s_{12}+\rho\sigma^2\\ s_{12}+\rho\sigma^2 & s_{22}+\sigma^2 \end{pmatrix}, \tag{4.17} $$

where s_{th} stands for the variances and covariances of the true z's, σ² is the variance of the ε's, and ρ is their first order correlation coefficient. It is obvious that if the z's are non-stationary (s₁₁ ≠ s₂₂), then ratios such as (cov y₁x₁ − cov y₂x₂)/(var x₁ − var x₂), in which the stationary error terms cancel, yield consistent estimates of β. In longer panels this approach can be extended to accommodate additional error correlations and the superimposition of “correlated effects” by using its first differences analogue.

⁹I am indebted to A. Pakes for this point.
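The same moment arithmetic can be checked by simulation (again with invented numbers): below, the true z takes a random walk step so its variance grows between the two periods, while the measurement errors are stationary and serially correlated; the contrast of the diagonal moments still returns β:

```python
import numpy as np

rng = np.random.default_rng(8)
n, beta, sig, rho_e = 200_000, 1.0, 0.6, 0.5
z1 = rng.normal(0, 1, n)
z2 = z1 + rng.normal(0, 1, n)                 # non-stationary: var doubles
e1 = rng.normal(0, sig, n)                    # stationary, correlated errors
e2 = rho_e * e1 + rng.normal(0, sig * np.sqrt(1 - rho_e**2), n)
x1, x2 = z1 + e1, z2 + e2
y1 = beta * z1 + rng.normal(0, 1, n)
y2 = beta * z2 + rng.normal(0, 1, n)

num = np.cov(y1, x1)[0, 1] - np.cov(y2, x2)[0, 1]   # beta*(s11 - s22)
den = np.var(x1) - np.var(x2)                        # s11 - s22: sigma^2 cancels
print(num / den)                                     # ~ beta despite rho_e != 0
```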
Even if the z’s were stationary, it is always possible to handle the correlated
errors case provided the correlation is known. This rarely is the case, but
occasionally a problem can be put into this framework. For example, capital
measures are often subject to measurement error but these errors cannot be taken
as uncorrelated over time, since they are cumulated over time by the construction
of such measures. But if one were willing to assume that the errors occur
randomly in the measurement of investment and they are uncorrelated over time,
and the weighting scheme (the depreciation rate) used in the construction of the
capital stock measure is known, then the correlation between the errors in the
stock levels is also known.
For example, if one is interested in estimating the rate of return to some capital
concept, where the true equation is

rt=a+rK,*+e,, (4.18)

v is a measure of profits and K * is defined as a geometrically weighted average


of past true investments It*:

K,*=Zt*+XK;C_1=Zt*+XZtT1+A2Zt~2+ ..-, (4.19)

but we do not observe I,* or Kt* only

z, = z,* -t E,, (4.20)

91 am indebted to A. Pakes for this point.


Ch. 25: Economic Data Issues 1485

where E, is an i.i.d. error of measurement and the observed K, = Zxir,_i is


constructed from the erroneous I series, then if h is taken as known, which is
implicit in most studies that use such capital measures, instead of running
versions of (4.18) involving K, and dealing with correlated measurement errors
we can estimate

q-Ax?r,_,= a(1 - X)+ ‘I, + u, - xu,_r - t-s,, (4.21)

which is now in standard EVM form, and use lagged values of I as instruments.
Hausman and Watson (1983) use a similar approach to estimate the seasonality in
the unemployment series by taking advantage of the known correlation in the
measurement errors introduced by the particular structure of the sample design in
their data.
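A small simulation sketch of this quasi-differencing device may be useful (the AR(1) investment process, the parameter values, and the choice of I_{t-1} as the instrument are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, lam, r = 20_000, 8, 0.7, 0.15

# true investment is AR(1); true capital follows the geometric formula (4.19)
Istar = np.empty((N, T))
Istar[:, 0] = rng.normal(10.0, 2.0, N)
for t in range(1, T):
    Istar[:, t] = 4.0 + 0.6 * Istar[:, t - 1] + rng.normal(0, 2.0, N)
Kstar = np.empty((N, T))
Kstar[:, 0] = Istar[:, 0] / (1 - lam)            # a rough steady-state start
for t in range(1, T):
    Kstar[:, t] = Istar[:, t] + lam * Kstar[:, t - 1]

pi = 2.0 + r * Kstar + rng.normal(0, 1.0, (N, T))    # eq. (4.18)
I = Istar + rng.normal(0, 1.5, (N, T))               # eq. (4.20): observed investment

# eq. (4.21): quasi-difference profits and instrument I_t with I_{t-1}, which is
# uncorrelated with eps_t and with the MA(1) disturbance e_t - lam * e_{t-1}
q = (pi[:, 2:] - lam * pi[:, 1:-1]).ravel()
xt = I[:, 2:].ravel()
zt = I[:, 1:-1].ravel()
cx = np.cov(xt, q)
r_ols = cx[0, 1] / cx[0, 0]
r_iv = np.cov(zt, q)[0, 1] / np.cov(zt, xt)[0, 1]
print(r_ols, r_iv)    # OLS attenuated; IV close to r = 0.15
```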
One needs to reiterate that in these kinds of models (as is also true for the rest of econometrics) the consistency of the final estimates depends both on the correctness of the assumed economic model and the correctness of the assumptions about the error structure.¹⁰ We tend to focus here on the latter, but the former is probably more important. For example, in Friedman's (1957) classical permanent income consumption function model, the estimated elasticity of consumption with respect to income is a direct estimate of one minus the error ratio (the ratio of the variance of transitory income to the variance of measured income). But this conclusion is conditional on having assumed that the true elasticity of consumption with respect to permanent income is unity. If that is wrong, the first conclusion does not follow. Similarly, in the profit-capital stock example above, we can do something because we have assumed that the true depreciation is both known and geometric. All our conclusions about the amount of error in the investment series are conditional on the correctness of these assumptions.

5. Missing observations and incomplete data

This could but have happened once,


And we missed it, lost it forever.
Browning

Relative to our desires data can be and usually are incomplete in many different
ways. Statisticians tend to distinguish between three types of “missingness”:
undercoverage, unit non-response, and item non-response (NAS, 1983). Under-
coverage relates to sample design and the possibility that a certain fraction of the

¹⁰The usual assumption of normality of such measurement and response errors may not be tenable in many actual situations. See Ferber (1966) and Hamilton (1981) for empirical evidence on this point.

relevant population was excluded from the sample by design or accident. Unit
non-response relates to the refusal of a unit or individual to respond to a
questionnaire or interview or the inability of the interviewers to find it. Item
non-response is the term associated with the more standard notion of missing
data: questions unanswered, items not filled in, in a context of a larger survey or
data collection effort. This term is usually applied to the situation where the
responses are missing for only some fraction of the sample. If an item is missing
entirely, then we are in the more familiar omitted variables case to which I shall
return in the next section.
In this section I will concentrate on the case of partially missing data for some
of the variables of interest. This problem has a long history in statistics and
somewhat more limited history in econometrics. In statistics, most of the discus-
sion has dealt with the randomly missing or, in newer terminology, ignorable case
(see Rubin, 1976, and Little, 1982) where, roughly speaking, the desired parame-
ters can be estimated consistently from the complete data subsets and “missing
data” methods focus on using the rest of the available data to improve the
efficiency of such estimates.
The major problem in econometrics is not just missing data but the possibility
(or more accurately, probability) that they are missing for a variety of self-selec-
tion reasons. Such “behavioral missing” implies not only a loss of efficiency but
also the possibility of serious bias in the estimated coefficients of models that do
not take this into account. The recent revival of interest in econometrics in limited
dependent variables models, sample-selection, and sample self-selection problems
has provided both the theory and computational techniques for attacking this
problem. Since this range of topics is taken up in Chapter 28, I will only allude to
some of these issues as we go along. It is worth noting, however, that this area has
been pioneered by econometricians (especially Amemiya and Heckman) with
statisticians only recently beginning to follow in their footsteps (e.g. Little, 1983).
The main emphasis here will be on the no-self-selection ignorable case. It is of
some interest, because these kinds of methods are widely used, and because it
deals with the question of how one combines scraps of evidence and what one can
learn from them. Consider a simple example where the true equation of interest is

y = βx + γz + e, (5.1)

where e is a random term satisfying the usual OLS assumptions and the constant
has been suppressed for notational ease. β and γ could be vectors and x and z could be matrices, but I will think of them at first as scalars and vectors respectively. For some fraction λ = n₂/(n₁ + n₂) of our sample we are missing
observations (responses) on x. Let us rearrange the data and call the complete
data sample A and the incomplete sample B. Assume that it is possible to

describe the data generating mechanism by the following model:

d = 1 if g(x, z, m; θ) + ε ≥ 0,
d = 0 if g(x, z, m; θ) + ε < 0, (5.2)

where d = 1 implies that the observation is in set A, it is complete; d = 0 implies that x is missing; m is another variable(s) determining the response or sampling mechanism; θ is a set of parameters; and ε is a random variable, distributed independently of x, z, and m. The incomplete data problem is ignorable if (1) ε (and m) are distributed independently of e and (2) there are no connections or restrictions between the parameters θ and β and γ. If these conditions hold, then one can estimate β and γ from the complete data subset A and ignore B. Even if θ and β and γ are connected, if ε and e are independent, β and γ can be estimated consistently in A, but now some information is lost by ignoring the data generating process. (See Rubin, 1976 and Little, 1982 for more rigorous versions of such statements.)
Note that this notion of ignorability of the data generating mechanism is more general than the simpler notion of randomly missing x's. It does not require that the missing x's be similar to the observed ones. Given the assumptions of the model (a constant β irrespective of the level of x), the x's can be missing "non-randomly," as long as the conditional expectation of y given x does not depend on which x's are missing. For example, there is nothing especially wrong if all "high" x's are missing, provided e and x are independent over the whole range of the data.
Even though with these assumptions β and γ can be estimated consistently in the A subsample, there is still some more information about them in sample B. The following questions arise then: (1) How much additional information is there in sample B and about which parameters? (2) How should the missing values of x be estimated (if at all)? What other information can be used to improve these estimates?¹¹

¹¹This section borrows heavily from Griliches, Hall and Hausman (1978).

Options include using only z, using z and y, or using z and m, where m is an additional variable, related to x but not appearing itself in the y equation.
To discuss this, it is helpful to specify an "auxiliary" equation for x:

x = δz + φm + u, (5.3)

where E(u) = 0 and E(ue) = 0. Note that as far as this equation is concerned, the missing data problem is one of missing the dependent variable for sub-sample B. If the probability of being present in the sample were related to the size of u, we

would be in the non-ignorable case as far as the estimation of δ and φ are concerned. Assume this is not the case, and let us consider at first only the simplest case of φ = 0, with no additional m variables present.
One way of rewriting the model is then

y_a = βx_a + γz_a + e_a,
y_b = (γ + βδ)z_b + (βu_b + e_b). (5.4)
How one estimates β, γ, and δ depends on what one is willing to assume about the world that generated such data. There are two kinds of assumptions possible: The first is a "regression" approach, which assumes that the parameters which are constant across different subsamples are the slope coefficients β, γ, and δ but does not impose the restriction that σ_e² and σ_u² are the same across all the various subsamples. There can be heteroscedasticity across samples as long as it is independent of the parameters of interest. The second approach, the maximum likelihood approach, would assume that conditional on z, y and x are distributed normally and the missing data are a random sample from such a distribution. This implies that σ²_{ea} = σ²_{eb} and σ²_{ua} = σ²_{ub}.
The first approach starts by recognizing that under the general assumptions of the model sample A yields consistent estimates of β, γ, and δ with variance-covariance matrix Σ. Then a "first order" procedure, i.e., one that estimates the missing x_i's by x̂_i = δ̂z_i alone and does not iterate, is equivalent to the following: Estimate β̂_a, γ̂_a, δ̂_a from sample A, rewrite the y equation as

y - β̂_a x̂ = γz + ε,  with x̂ = x in sample A and x̂ = δ̂_a z in sample B, (5.5)

where ε involves terms which are due to the discrepancy between the estimated β̂ and δ̂ and their true population values. Then just estimate γ from this "completed" sample by OLS.
It is clear that this procedure results in no gain in the efficiency of β, since β̂_a is based solely on sample A. It is also clear that the resulting estimate of γ could be improved somewhat by using GLS instead of OLS.¹²
How much of a gain is there in estimating γ this way? Let the size of sample A be N₁ and of B be N₂. The maximum (unattainable) gain in efficiency would be proportional to (N₁ + N₂)/N₁ (when σ_u² = 0). Ignoring the contribution of the ε's, which is unimportant in large samples, the asymptotic variance of γ̂ from the

¹²See Gourieroux and Monfort (1981).



sample as a whole would be

Var(γ̂_{a+b}) = [N₁σ² + N₂(σ² + β²σ_u²)]/[(N₁ + N₂)²σ_z²]

and (5.6)

Eff(γ̂_{a+b}) = Var(γ̂_{a+b})/Var(γ̂_a) = (1 - λ)[1 + λβ²σ_u²/σ²],

where σ² = σ_e², σ_z² is the variance of z, and λ = N₂/(N₁ + N₂). Hence efficiency will be improved as long as β²σ_u²/σ² < 1/(1 - λ), i.e. the unpredictable part of x (unpredictable from z) is not too important relative to σ², the overall noise level in the y equation.¹³
Let us look at a few illustrative calculations. In the work to be discussed below, y will be the logarithm of the wage rate, x is IQ, and z is schooling. IQ scores are missing for about one-third of the sample, hence λ = 1/3. But the "importance" of IQ in explaining wage rates is relatively small. Its independent contribution (β²σ_u²) is small relative to the large unexplained variance in y. Typical numbers are β = 0.005, σ_u = 12, and σ = 0.4, implying

Eff(γ̂_{a+b}) = (2/3)[1 + (1/3)(0.06/0.4)²] = 0.672,

which is about equal to the 2/3 one would have gotten ignoring the term in the brackets. Is this a big gain in efficiency? First, the efficiency (squared) metric may be wrong. A more relevant question is by how much can the standard error of γ̂ be reduced by incorporating sample B into the analysis. By about 18 percent (√0.672 ≈ 0.82) for these numbers. Is this much? That depends on how large the standard error of γ̂ was to start out with. In Griliches, Hall and Hausman (1978) a sample consisting of about 1,500 individuals with complete information yielded an estimate of γ̂_a = 0.0641 with a standard error of 0.0052. Processing another 700 plus observations could reduce this standard error to 0.0043, an impressive but rather pointless exercise, since nothing of substance depends on knowing γ within 0.001.
If IQ (or some other missing variable) were more important, the gain would be even smaller. For example, if the independent contribution of x to y were on the order of σ², then with one-third missing, Eff(γ̂_{a+b}) ≈ 8/9, and the standard deviation of γ̂ would be reduced by only 5.7 percent. There would be no gain at all if the missing variable were one and a half times as important as the disturbance [or more generally if β²σ_u²/σ² ≥ 1/(1 - λ)].

¹³Thus, remark 2 of Gourieroux and Monfort (1981, p. 583) is in error. The first-order method is not always more efficient. But an "appropriately weighted first-order method," GLS, will be more efficient. See Nijman and Palm (1985).

The efficiency of such estimates can be improved a bit more by allowing for the implied heteroscedasticity in these estimates and by iterating further across the samples. This is seen most clearly by noting that sample B yields an estimate of π = γ + βδ with an estimated standard error σ̂_π. This information can be blended optimally with the sample A estimates of β, γ, δ, and Σ using non-linear techniques, and maximum likelihood is one way of doing this.
If additional variables, which could be used to predict x but which do not appear on their own accord in the y equation, were available, then there is also a possibility of improving the efficiency of the estimated β and not just of γ. Again, unless these variables are very good predictors of x and unless the amount of complete data available is relatively small, the gains in efficiency from such methods are unlikely to be impressive. (See Griliches, Hall and Hausman, 1978, and Haitovsky, 1968, for some illustrative calculations.)
The maximum likelihood approaches differ from the "first-order" ones by using also the dependent variable y to "predict" the missing x's, and by imposing restrictions on the equality of the relevant variances across the samples. The latter assumption is not usually made or required by the first order methods, but follows from the underlying likelihood assumption that conditional on z, x and y are jointly normally (or some other known distributions) distributed, and that the missing values are missing at random. In the simple case where only one variable is missing (or several variables are missing at exactly the same places), the joint likelihood connecting y and x to z, which is based on the two equations

y = βx + γz + e,
x = δz + v, (5.7)

with Ee² = σ², Ev² = η², Eev = 0, can be rewritten in terms of the marginal distribution function of y given z, and the conditional distribution function of x given y and z, with corresponding equations:

y = cz + u,
x = dy + fz + w, (5.8)

and Eu² = g², Ew² = h², Ewu = 0. Given the normality assumption, this is just another way of rewriting the same model, with the new parameters related to the old ones by

c = γ + βδ,  g² = β²η² + σ²,
d = βη²/(β²η² + σ²),  f = δ - cd,  h² = η²σ²/g². (5.9)

In this simple case the likelihood factors and one can estimate c and g² from the

Table 1
Earnings equations for NLS sisters: Various missing data estimators.ᵃ

                                   Y equation                     T equation
Estimation method             S            T           σ²           S          η²
----------------------------------------------------------------------------------
OLS on complete data       0.0434       0.00433      0.1217       3.211      152.58
sample (N = 366)          (0.0109)     (0.00148)                 (0.398)

Total sample (N = 520):
OLS with predicted IQ      0.0423       0.00433      0.1186         -           -
in missing portion*       (0.00916)    (0.00148)

GLS with predicted IQ*     0.0432       0.00433         -           -           -
                          (0.00915)    (0.00148)

Maximum likelihood         0.0427       0.00421      0.1177       3.205      152.48
                          (0.00912)    (0.00144)                 (0.346)

ᵃY = log of wage rate, S = years of schooling completed, T = IQ-type test score. Data source: The National Longitudinal Survey of Young Women (see Center for Human Resource Research, 1979).
*The standard errors are computed using the Gourieroux-Monfort (1982) formulae. All variables have been conditioned on age, region, race, and year dummy variables. The conditional moment matrices are:

          Complete data (N = 366)            Incomplete (N = 154)
LW     0.13488                            0.12388
IQ     1.2936     187.71                     -
SC     0.19749    11.0703    3.4476       0.23472    4.3408

whole sample, d, f, and h² from the complete data portion, and solve back uniquely for the original parameters β, γ, δ, σ², and η². In this way all of the information available in the data is used and computation is simple, since the two regressions (y on z in the whole sample and x on y and z in the complete data portion) can be computed separately. Note that while x is implicitly "estimated" for the missing portion, no actual "predicted" values of x are either computed or used in this framework.¹⁴
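The solve-back step is elementary; a minimal sketch with made-up parameter values, including a round-trip check of eq. (5.9):

```python
import numpy as np

def recover_structural(c, g2, d, f, h2):
    """Invert eq. (5.9): (c, g2) come from the y-on-z regression in the whole
    sample, (d, f, h2) from the x-on-(y, z) regression in the complete portion."""
    A, B = d * g2, h2 * g2          # A = beta*eta^2, B = eta^2*sigma^2
    eta2 = (A**2 + B) / g2
    sigma2 = B / eta2
    beta = A / eta2
    delta = f + c * d
    gamma = c - beta * delta
    return beta, gamma, delta, sigma2, eta2

# round-trip check with made-up parameter values
beta, gamma, delta, sigma2, eta2 = 0.5, 0.3, 0.8, 1.0, 2.0
c = gamma + beta * delta
g2 = beta**2 * eta2 + sigma2
d = beta * eta2 / g2
f = delta - c * d
h2 = eta2 * sigma2 / g2
print(recover_structural(c, g2, d, f, h2))   # -> (0.5, 0.3, 0.8, 1.0, 2.0)
```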
Table 1 illustrates the results of such computations when estimating a wage
equation for a sample of young women from the National Longitudinal Survey,
30 percent of which were missing IQ data. The first row of the table gives

¹⁴Marini et al. (1980) describe such computations in the context of more than one set of variables missing in a nested pattern.

estimates computed solely from the complete data subsample. The second one
uses the schooling variable to estimate the missing IQ values in the incomplete
portion of the data and then re-computes the OLS estimates. The third row uses
GLS, reweighting the incomplete portion of the data to allow for the increased
imprecision due to the estimation of the missing IQ values. The last row reports
the maximum likelihood estimates. All the estimates are very close to each other.
Pooling the samples and “estimating” the missing IQ values increases the efficiency
of the estimated schooling coefficient by 29 percent. Going to maximum likeli-
hood adds another percentage point. While these gains are impressive, substan-
tively not much more is learned from expanding the sample except that no special
sample selectivity problem is caused by ignoring the missing data subset. The χ² test for pooling yields the insignificant value of 0.8. That the samples are roughly similar can also be seen from computing the biased schooling coefficient (ignoring IQ) in both matrices: it is equal to 0.057 (0.010) in the complete data subset and 0.054 in the incomplete one.
The maximum likelihood computations get more complicated when the likelihood does not factor as neatly as it does in the simple "nested" missing case. This happens in at least two important common cases: (1) If the model is overidentified, then there are binding constraints between the L(y|z, θ₁) and L(x|y, z, θ₂) pieces of the overall likelihood function. For example, if we have an extra exogenous variable which can help predict x but does not appear on its own in the "structural" y equation, then there is a constraining relationship between the θ₁ and θ₂ parameters and maximum likelihood estimation will require iterating between the two. This is also the case for multi-equation systems where, say, x is itself structurally endogenous because it is measured with error. (2) If the pattern of "missingness" is not nested, if observations on some variables are missing in a number of different patterns which cannot be arranged in a set of nested blocks, then one cannot factor the likelihood function conveniently and one must approach the problem of estimating it directly.
There are two related computational approaches to this problem. The first is the EM algorithm (Dempster et al., 1977). This is a general approach to maximum likelihood estimation where the problem is divided into an iterative two-step procedure: an E-step (estimation), in which the missing values are estimated on the basis of the current parameter values of the model (in this case starting with all the available variances and covariances), and an M-step (maximization), in which maximum likelihood estimates of the model parameters are computed using the "completed" data set from the previous step. The new parameters are then used to solve again for the missing values, which are then used in turn to reestimate the model, and this process is continued until convergence is achieved. While this procedure is easy to program, its convergence can be slow, and there are no easily available standard error estimates for the final results (though Beale and Little, 1975, indicate how they might be derived).
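A stripped-down sketch of such an EM iteration, for a trivariate normal (y, x, z) with a single pattern of ignorably missing x's, may make the two steps concrete (the data generating numbers are illustrative, and a full implementation would also monitor convergence and compute standard errors):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000
z = rng.normal(0, 1, n)
x = 0.8 * z + rng.normal(0, 1.0, n)
y = 0.5 * x + 0.3 * z + rng.normal(0, 0.6, n)
miss = rng.random(n) < 0.3                    # x ignorably missing for ~30%

D = np.column_stack([y, x, z])                # variable order: y = 0, x = 1, z = 2
D[miss, 1] = np.nan                           # hide the missing x's
obs = [0, 2]                                  # y and z are always observed

# starting values: complete-case mean and covariance matrix
mu = D[~miss].mean(axis=0)
S = np.cov(D[~miss].T)

for _ in range(100):
    # E-step: conditional mean and variance of x given (y, z) at current (mu, S)
    w = np.linalg.solve(S[np.ix_(obs, obs)], S[1, obs])
    cvar = S[1, 1] - S[1, obs] @ w
    Dc = D.copy()
    Dc[miss, 1] = mu[1] + (D[np.ix_(miss, obs)] - mu[obs]) @ w
    # M-step: moments of the "completed" data, adding the conditional variance
    # of the filled-in x's to the x-x moment (the expected sufficient statistic)
    mu = Dc.mean(axis=0)
    C = Dc - mu
    S = C.T @ C / n
    S[1, 1] += miss.mean() * cvar

# the implied regression of y on (x, z) from the fitted covariance matrix
bg = np.linalg.solve(S[np.ix_([1, 2], [1, 2])], S[[1, 2], 0])
print(bg)   # approximately (0.5, 0.3)
```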

An alternative approach, which may be more attractive to model oriented econometricians and sociologists, given the assumption of ignorability of the process by which the data are missing, is to focus directly on pooling the available information from different portions of the sample which, under the assumptions of the model, are independent of each other. That is, the data are summarized by their relevant variance-covariance matrices (and means, if they are constrained by the model) and the model is expressed in terms of constraints on the elements of such matrices. What is done next is to "fit" the model to the observed matrices. This approach is based on the idea that for multivariate normally distributed random variables the observed moment matrix is a sufficient statistic. Many models can be written in the form Σ(θ), where Σ is the true population covariance matrix associated with the assumed multivariate normal distribution and θ is a vector of parameters of interest. Denote the observed covariance matrix as S. Maximizing the likelihood function of the data with respect to the model parameters comes down to maximizing

log L(θ) = -(N/2){log|Σ(θ)| + tr[Σ⁻¹(θ)S]} + constant (5.10)

with respect to θ. If θ is exactly identified, the estimates are unique and can be solved directly from the definition of Σ and the assumption that S is a consistent estimator of it. If θ is over-identified, then the maximum likelihood procedure "fits" the model Σ(θ) to the data S as best as possible. If the observed variables are multivariate normal, this estimator is the Full Information Maximum Likelihood estimator for this model. Even if the data are not multivariate normal but follow some other distribution with E(S|θ) = Σ(θ), this is a pseudo- or quasi-maximum likelihood estimator yielding a consistent θ̂.¹⁵ The correctness of the computed standard errors will depend, however, on the validity of the normality assumption. Robust standard errors for this model can be computed using the approach of White.
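As an illustration, the following sketch fits a small overidentified Σ(θ), a one-factor errors-in-variables structure with two indicators of the true z, by minimizing the fitting function implied by (5.10); the model and all numbers are illustrative assumptions rather than any of the models discussed here:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n = 10_000
zt = rng.normal(0, 1.0, n)                    # the true, unobserved z
y  = 0.7 * zt + rng.normal(0, 0.5, n)
x1 = zt + rng.normal(0, 0.6, n)               # two error-ridden indicators of z
x2 = zt + rng.normal(0, 0.8, n)
S = np.cov(np.column_stack([y, x1, x2]).T)    # observed moment matrix

def sigma(th):
    """Implied covariance matrix: theta = (beta, phi, se2, s1, s2)."""
    b, phi, se2, s1, s2 = th
    return np.array([[b * b * phi + se2, b * phi,  b * phi],
                     [b * phi,           phi + s1, phi],
                     [b * phi,           phi,      phi + s2]])

def fit_fn(th):
    Sg = sigma(th)
    sign, logdet = np.linalg.slogdet(Sg)
    if sign <= 0:
        return 1e10                            # stay in the positive definite region
    # the fitting function implied by (5.10), dropping constants and sign
    return logdet + np.trace(np.linalg.solve(Sg, S))

res = minimize(fit_fn, x0=np.full(5, 0.5), method="Nelder-Mead",
               options={"maxiter": 20_000, "fatol": 1e-12, "xatol": 1e-10})
print(res.x)   # approximately (0.7, 1.0, 0.25, 0.36, 0.64)
```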
There is no conceptual difficulty in generalizing this to a multiple sample situation where the resulting Σ_j(θ_j) may depend on somewhat different parameters. As long as these matrices can be taken as arising independently, their respective contributions to the likelihood function can be added up, and as long as the θ_j's have parameters in common, there is a return from estimating them jointly. This can be done either utilizing the multiple samples feature of LISREL-V (see Allison, 1981, and Joreskog and Sorbom, 1981) or by extending the MOMENTS program (Hall, 1979) to the connected-multiple matrices case. The estimation procedure combines these different matrices and their associated pieces of the likelihood function, and then iterates across them until a maximum is found. (See Bound, Griliches and Hall, 1984, for more exposition and examples.)

¹⁵See Van Praag (1983).

I will outline this type of approach in a somewhat more complex, multi-equation context: the estimation of earnings functions from sibling data, while allowing for an unobserved ability measure and errors of measurement in the variable of interest, schooling. (See Griliches, 1974 and 1979 for an exposition of such models.) The simplest version of such a model can be written as follows:

t = a + e_t = (f + g) + e_t,
s = δa + h + e_s = δ(f + g) + (w + u) + e_s, (5.11)
y = βa + γ(s - e_s) + e_y = π(f + g) + γ(w + u) + e_y,

where t is a reported IQ-type test score, s is the recorded years of school completed, and y = ln wage rate is the logarithm of the wage rate on the current or last job; a = (f + g) is an unobserved "ability" factor, with f being its "family" component; h = (w + u) is the individual opportunity factor (above and beyond a, and hence assumed to be orthogonal to it), with w, "wealth," as its family component. The e's are all random, uncorrelated, and untransmitted measurement errors. That is,

E(e_t e_s) = E(e_t e_y) = E(e_s e_y) = 0,  with variances σ_t², σ_s², and σ_y²,

and π = β + γδ. In addition, it is convenient to define

Var a = a²,  Var h = h²,
τ = Var f/a²,  ρ = Var w/h², (5.12)

where τ and ρ are the ratios of the variances of the family components to the total variances of the a and h factors respectively.
Given these assumptions, the expected values of the variance-covariance matrix of all the observed variables across both members of a sib-pair are given by

        t₁           s₁                  y₁                   t₂      s₂                y₂
 t₁   a² + σ_t²     δa²                πa²                  τa²     δτa²              πτa²
 s₁              δ²a² + h² + σ_s²   δπa² + γh²                   δ²τa² + ρh²      δπτa² + ργh²
 y₁                                π²a² + γ²h² + σ_y²                             π²τa² + ργ²h²
                                                                                      (5.13)

where only the 12 distinct terms of the overall 6 × 6 matrix are shown, since the others are derivable by symmetry and by the assumption that all the relevant variances (conditional on a set of exogenous variables) are the same across sibs.
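For concreteness, a short sketch that assembles the Σ(θ) implied by (5.11)-(5.13) for a sib pair; all parameter values are purely illustrative:

```python
import numpy as np

def sib_cov(a2, h2, tau, rho, delta, gamma, beta, s_t2, s_s2, s_y2):
    """Implied covariance matrix of (t1, s1, y1, t2, s2, y2) under (5.11)-(5.13)."""
    pi = beta + gamma * delta
    own = np.array(
        [[a2 + s_t2,  delta * a2,                   pi * a2],
         [delta * a2, delta**2 * a2 + h2 + s_s2,    delta * pi * a2 + gamma * h2],
         [pi * a2,    delta * pi * a2 + gamma * h2, pi**2 * a2 + gamma**2 * h2 + s_y2]])
    cross = np.array(
        [[tau * a2,         delta * tau * a2,                         pi * tau * a2],
         [delta * tau * a2, delta**2 * tau * a2 + rho * h2,           delta * pi * tau * a2 + rho * gamma * h2],
         [pi * tau * a2,    delta * pi * tau * a2 + rho * gamma * h2, pi**2 * tau * a2 + rho * gamma**2 * h2]])
    return np.block([[own, cross], [cross, own]])   # the cross block is symmetric here

# purely illustrative parameter values
Sigma = sib_cov(a2=150.0, h2=2.0, tau=0.5, rho=0.4, delta=0.2,
                gamma=0.06, beta=0.005, s_t2=40.0, s_s2=1.0, s_y2=0.12)
print(Sigma.shape, np.linalg.eigvalsh(Sigma).min() > 0)   # (6, 6), positive definite
```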
With 10 unknown parameters this model would be under-identified without

sibling data. This type of model was estimated by Bound, Griliches and Hall (1984) using sibling data from the National Longitudinal Surveys of Young Men and Young Women.¹⁶ They had to face, however, a very serious missing data problem, since much of the data, especially test scores, were missing for one or both of the siblings. Data were complete for only 164 brother pairs and 151 sister pairs, but additional information subject to various patterns of "missingness" was available for 315 more male and 306 female sibling pairs and 2852 and 3398 unrelated male and female respondents respectively. Their final estimates were based on pooling the information from 15 different matrices for each sex and were used to test the hypothesis that the unobserved factors are the same for both males and females, in the sense that their loadings (coefficients) are similar in the male and female versions of the model and that the implied correlation between the male and female family components of these factors was close to unity. The latter test utilized the cross-sex cross-sib covariances arising from the brother-sister pairs (N = 774) in these panels.
Such pooling of data reduced the estimated standard errors of the major
coefficients of interest by about 20 to 40 percent without changing the results
significantly from those found solely in their “complete data” subsample. Their
major substantive conclusion was that taking out the mean differences in wages
between young males and females, one could not detect significant differences in
the impact of the unobservables or in their patterns between the male and female
portions of their samples. As far as the IQ-Schooling part of the model is
concerned, families and the market appeared to be treating brothers and sisters
identically.
A class of similar problems occurs in the time series context: missing data at
some regular time intervals, the “construction” of quarterly data from annual
data and data on related time series, and other “interpolation” type issues. Most
of these can be tackled using adaptations of the methods described above, except
for the fact that there is usually more information available on the missing values
and it makes sense to adapt these methods to the structure of the specific
problem. A major reference in this area is Chow and Lin (1971). More recent
references are Harvey and Pierse (1982) and Palm and Nijman (1984).

6. Missing variables and incomplete models

“Ask not what you can do to the data but rather what the data can do
for you.”

Every econometric study is incomplete. The stated model usually lists only the
“major” variables of interest and even then it is unlikely to have good measures
for all of the variables on the already foreshortened list. There are several ways in

¹⁶The cited paper uses a more detailed 4-equation model based on an additional "early" wage rate.

which econometricians have tried to cope with these facts of life: (1) Assume that
the left-out components are random, minor, and independent of all the included
exogenous variables. This throws the problem into the “disturbance” and leaves it
there, except for possible considerations of heteroscedasticity, variance-compo-
nents, and similar adjustments, which impinge only on the efficiency of the usual
estimates and not on their consistency. In many contexts it is difficult, however, to maintain the fiction that the left-out variables are unrelated to the included ones. One is pushed then into either, (2), a specification sensitivity analysis, where the direction and magnitude of possible biases are explored using prior information, scraps of evidence, and the standard left-out-variable bias formulae (Griliches 1957 and Chapter 5), or (3) one tries to transform the data so as to minimize the impact of such biases.
In this section, I will concentrate on this third way of coping which has used
the increasingly available panel data sets to try to get around some of these
problems. Consider, then, the standard panel data set-up:

y_it = α + β(i, t)x_it + γ(i, t)z_it + e_it, (6.1)

where y_it and x_it are the observed dependent and "independent" variables respectively, β is the set of parameters of interest, z_it represents various possible misspecifications of the model in the form of left-out variables, and e_it are the usual random shocks assumed to be well behaved and independently distributed
(at this level of generality almost all possible deviations from this can be
accommodated by redefining the z ‘s). Two basic assumptions are made very early
on in this type of model. The first one, that the relationship is linear, is already
implicit in the way I have written (6.1). The second one is that the major
parameters of interest, the p’s, are both stable over time and constant across
individuals. I.e.,

P(i> t> =P. (6.2)

Both of these assumptions are in principle testable, but are rarely questioned in
practice. Unless there is some kind of stability in β, unless there is some interest
in its central moments, it is not clear why one would engage in estimation at all.
Since the longitudinal dimension of such data is usually quite short (2-10 years),
it makes little sense to allow /3 to change over time, unless one has a reasonably
clear idea and a parsimonious parameterization of how such changes happen.
(The fact that the p’s are just coefficients of a first order linear approximation to
a more complicated functional relationship and hence should change as the level
of X’S changes can be allowed for by expanding the list of x’s to contain higher
order terms.)
The assumption that β_i = β, that all individuals respond alike (up to the additive terms, the z_i, which can differ across individuals), is one of the more

bothersome ones. If longer time series were available, it would be possible to estimate separate β_i's for each individual or firm. But that is not the world we find ourselves in at the moment. Right now there are basically three outs from this corner: (1) Assume that all differences in the β_i's are random and uncorrelated with everything else. Then we are in the random coefficients world (Chapter 21) and, except for issues of heteroscedasticity, the problem goes away; (2) Specify a model for the differences in β_i, making them depend on additional observed variables, either own individual ones or higher-order macro ones (cf. Mundlak 1980). This results in defining a number of additional "interaction" variables with the x set. Unless there is strong prior information on how they differ, this introduces an additional dimension to the "specification search" (in Leamer's terminology) and is not very promising; (3) Ignore it, which is what I shall proceed to do for the moment, focusing instead on the heterogeneity which is implicit in the potential existence of the z_i's, the ignored or unavailable variables in the model.
Even if (6.1) is simplified to

y_it = α + βx_it + γ_t z_it + e_it, (6.3)

β is not identified from the data in the absence of direct observations on z. Somehow, assumptions have to be made about the source of the z's and their distributional properties before it is possible to derive consistent estimators of β. There are (at least) three categories of assumptions that can be made about such z's which lead to different estimation approaches in this context: (a) The z's are random and independent of the x's. This is the easy but not too likely case. The z's can then be collapsed into the e_i's, with only the heteroscedasticity issue remaining for the "random effects" model to solve. (b) The z's are correlated with the x's but are constant over time and have also constant effects on the y's. I.e.,

γ(t)z_it = z_i, (6.4)

where we have normalized γ = 1. This is the standard "fixed" or "correlated"


effects model (see Maddala 1971, and Mundlak 1978) which has been extensively
analyzed in the recent literature. This is the case for which the panel structure of
the data provides a perfect solution. Letting each individual have its own mean
level and expressing all the data as deviations from own means eliminates the z’s
and leads to the use of “within” estimators.

y_it - ȳ_i. = β(x_it - x̄_i.) + (e_it - ē_i.), (6.5)

where ȳ_i. = (1/T)Σ_{t=1}^{T} y_it, etc., and yields consistent estimates of β.



I have only two cautionary comments on this topic: As is true in many other
contexts, and as was noted earlier, solving one problem may aggravate another. If
there are two reasons for the z_it, e.g. both "fixed" effects and errors in variables, then

z_it = α_i - βε_it, (6.6)

where α_i is the fixed individual effect and ε_it is the random, uncorrelated-over-time error of measurement in x_it. In this type of model α_i causes an upward bias in the estimated β from pooled samples, while ε_it results in a negative one. Going "within" not only eliminates α_i but also increases the second type of bias through the reduction of the signal to noise ratio. This is seen easiest in the simplest panel model, where T = 2 and within is equivalent to first differencing. Undifferenced, an OLS estimate of β would yield

plim(β̂_OLS - β) = b_αx - βλ_x, (6.7)

where b_αx is the auxiliary regression coefficient in the projection of the α_i's on the x's, while λ_x = σ_ε²/σ_x² is the error variance ratio in x. Going "within," on the other hand, would eliminate the first term and leave us with

plim(β̂_W - β) = -βλ_W = -βλ_x/(1 - ρ), (6.8)

where ρ is the first order serial correlation coefficient of the x's. A plausible example might have β = 1, b_αx = 0.2, λ_x = 0.1, and hence β̂_OLS ≈ 1 + 0.2 - 0.1 = 1.1. Now, as might not be unreasonable, if ρ = 0.67, then λ_W = 0.3 and β̂_W = 0.7, which is more biased than was the case with the original β̂_OLS.
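This trade-off is easily reproduced by simulation; a sketch with numbers close to the illustration above (the AR(1) structure for the true x and the construction of the correlated effect are my assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
N, beta, rho = 200_000, 1.0, 0.67

x1s = rng.normal(0, 1, N)                                   # true x, period 1
x2s = rho * x1s + rng.normal(0, np.sqrt(1 - rho**2), N)     # true x, period 2
alpha = 0.22 * (x1s + x2s) / 2 + rng.normal(0, 0.5, N)      # effect correlated with x
sig_eps = np.sqrt(1 / 9)                                    # lambda_x about 0.1
x1 = x1s + rng.normal(0, sig_eps, N)
x2 = x2s + rng.normal(0, sig_eps, N)
y1 = beta * x1s + alpha + rng.normal(0, 1, N)
y2 = beta * x2s + alpha + rng.normal(0, 1, N)

# pooled levels OLS: alpha pushes the estimate up, the error pulls it down (6.7)
cl = np.cov(np.concatenate([x1, x2]), np.concatenate([y1, y2]))
b_levels = cl[0, 1] / cl[0, 0]

# first differences (the "within" estimator for T = 2): alpha is gone, but the
# measurement error bias is magnified by roughly 1/(1 - rho), as in (6.8)
cd = np.cov(x2 - x1, y2 - y1)
b_within = cd[0, 1] / cd[0, 0]
print(b_levels, b_within)    # above beta in levels, well below beta within
```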
This is not an idle comment. Much of the recent work on production function
estimation using panel data (e.g. see Griliches-Mairesse, 1984) starts out worry-
ing about fixed effects and simultaneity bias, goes within, and winds up with
rather unsatisfactory results (implausibly low coefficients). Similarly, the rather
dramatic reductions in the schooling coefficient in earnings equations achieved by
analyzing “within” family data for MZ twins is also quite likely the result of
originally rather minor errors of measurement in the schooling variable (see
Griliches, 1979 for more detail).
The other comment has to do with the unavailability of the "within" solution if the equation is intrinsically non-linear, since, for example, the mean of e^{x+ε} is not equal to e^{x̄+ε̄}. This creates problems for models in which the dependent variables are outcomes of various non-linear probability processes. In special cases, it is possible to get around this problem by conditioning arguments.
Chamberlain (1980) discusses the logit case while Hausman, Hall and Griliches
(1984) show how conditioning on the sum of outcomes over the period as a whole

converts a Poisson problem into a conditional multinomial logit problem and allows an equivalent "within" unit analysis.

allows an equivalent “within’ unit analysis.
(c) Non-constant effects. The general case here is one of a left out variable(s)
and nothing much can be done about it unless more explicit assumptions are
made about how the unseen variables behave and/or what their effects are.
Solutions are available for special cases, cases that make restrictive enough
assumptions on the y(t)z,, terms and their correlations with the included x
variables (see Hausman and Taylor, 1981).
For example, it is not too difficult to work out the relevant algebra for

γ(t)z_it = γ_t z_i (6.9)

or

γ(t)z_it = -βε_it, (6.10)

where ε_it is an i.i.d. measurement error in x. The first version, eq. (6.9), is one of a "fixed" common effect with a changing influence over time. Such models have been considered by Stewart (1983) in the estimation of earnings functions, by Pakes and Griliches (1984) for the estimation of geometric lag structures in panel data where the unseen truncation remainders decay exponentially over time, and by Anderson and Hsiao (1982) in the context of the estimation of dynamic equations with unobserved initial conditions. The second model, eq. (6.10), is the pure EVM in the panel data context and was discussed in Section 4. It is estimable by using lagged x's as instruments, provided the "true" x's are correlated over time, or by grouping methods if independent (of the errors) information is available which allows one to group the data into groups which differ in the underlying "true" x's (Pakes, 1983). Identification may become problematic when the EVM is superimposed on the standard fixed effects model. Estimation is still possible, in principle, by first differencing to get rid of the α_i's, the fixed effects, and then using past and future x's as instruments. (See Griliches and Hausman, 1984.)
Some of these issues can be illustrated by considering the problem of trying to estimate the form of a lag structure from a relatively short panel.¹⁷ Let us define a flexible distributed lag equation

y_it = β₀x_it + β₁x_{i,t-1} + β₂x_{i,t-2} + ... + e_it, (6.11)

where the constancy of the β's is imposed across individuals and across time. The empirical problem is how does one estimate, say, 9 β's if one only has four to five

¹⁷The following discussion borrows heavily from Pakes and Griliches (1984).

years history on the y's and x's. In general this is impossible. If the length of the lag structure exceeds the available data, then the data cannot be informative about the unseen tail of the lag distribution without the imposition of stronger a priori restrictions. There are at least two ways of doing this: (a) We can assume something strong about the β's, for example, that they decline geometrically after a few free terms, that β_{τ+1} = λβ_τ. This leads us back to the geometric lag case, which we know more or less how to handle.¹⁸ (b) We can assume something about the unseen x's: that they were constant in the past (in which case we are back to the fixed effects with a changing coefficient case), or that they follow some simple low order autoregressive process (in which case their influence on the included x's dies out after a few terms).
Before proceeding along these lines, it is useful to recall the notion of the Π-matrix, introduced in Chapter 22, which summarizes all the (linear) information contained in the standard time series-cross section panel model. This approach, due to Chamberlain (1982), starts with the set of unconstrained multivariate regressions, relating each year's y_it to all of the available x's, past, present, and future. Consider, for example, the case where data on y are available for only three years (T = 3) and on x's for four. Then the Π matrix consists of the coefficients in the following set of regressions:

y₁ᵢ = π₁₃x₃ᵢ + π₁₂x₂ᵢ + π₁₁x₁ᵢ + π₁₀x₀ᵢ + u₁ᵢ,
y₂ᵢ = π₂₃x₃ᵢ + π₂₂x₂ᵢ + π₂₁x₁ᵢ + π₂₀x₀ᵢ + u₂ᵢ, (6.12)
y₃ᵢ = π₃₃x₃ᵢ + π₃₂x₂ᵢ + π₃₁x₁ᵢ + π₃₀x₀ᵢ + u₃ᵢ,

where we have ignored constants to simplify matters. Now all that we know from
our sample about the relationship of the y's to the x's is summarized in these π's (or equivalently in the overall correlation matrix between all the y's and the x's), and any model that we shall want to fit will impose a set of constraints on it.¹⁹
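Estimating the unconstrained Π matrix is just a set of multivariate least squares regressions of each y_t on all the available x's; a minimal sketch, assuming a simple one-lag model with no individual effects:

```python
import numpy as np

rng = np.random.default_rng(5)
N = 5_000
X = rng.normal(0, 1, (N, 4))                 # columns are x0, x1, x2, x3
b0, b1 = 1.0, 0.5                            # a one-lag model: y_t = b0*x_t + b1*x_{t-1}
Y = np.column_stack([b0 * X[:, t] + b1 * X[:, t - 1] + rng.normal(0, 1, N)
                     for t in (1, 2, 3)])    # y1, y2, y3

# the unconstrained Pi matrix: multivariate OLS of each y_t on all four x's
Pi = np.linalg.lstsq(X, Y, rcond=None)[0].T  # 3 x 4, rows are y1, y2, y3
print(np.round(Pi, 2))
# with columns ordered (x0, x1, x2, x3) the rows are approximately
# (b1, b0, 0, 0), (0, b1, b0, 0), (0, 0, b1, b0); eq. (6.12) writes the same
# coefficients with the columns ordered x3, x2, x1, x0
```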
A series of increasingly complex possible worlds can be written as:

a. y_it = β₀x_it + β₁x_{i,t-1} + e_it,
b. y_it = β₀x_it + β₁x_{i,t-1} + α_i + e_it,
c. y_it = β₀x_it + β₁(x_{i,t-1} + λx_{i,t-2} + λ²x_{i,t-3} + ...) + e_it,
d. y_it = β₀x_it + β₁(x_{i,t-1} + λx_{i,t-2} + λ²x_{i,t-3} + ...) + α_i + e_it, (6.13)
e. y_it = β₀x_it + β₁x_{i,t-1} + β₂x_{i,t-2} + β₃x_{i,t-3} + β₄x_{i,t-4} + ... + e_it,
   x_it = ρx_{i,t-1} + ε_it,
f. y_it = β₀x_it + β₁x_{i,t-1} + β₂x_{i,t-2} + β₃x_{i,t-3} + β₄x_{i,t-4} + ... + α_i + e_it,
   x_it = kα_i + ρx_{i,t-1} + ε_it,

¹⁸See Anderson and Hsiao (1982) and Bhargava and Sargan (1983).
¹⁹There may be, of course, additional useful information in the separate correlation matrices between all of the y's and all the x's respectively.

going from the simple one lag, no fixed effects case (a) to the arbitrary lag structure with the one factor correlated effects structure (f). For each of these cases we can derive the expected value of Π. It is obvious that (a) implies

Π(a) = ⎡0    0    β₀   β₁⎤
       ⎢0    β₀   β₁   0 ⎥
       ⎣β₀   β₁   0    0 ⎦

For the (b) case, the same lags plus fixed effects, we need to define the wide sense least squares projection (E*) of the unseen effects (α_i) on all the available x's:

E*(α_i|x₀ᵢ ... x₃ᵢ) = δ₃x₃ᵢ + δ₂x₂ᵢ + δ₁x₁ᵢ + δ₀x₀ᵢ. (6.14)

Then

Π(b) = ⎡δ₃        δ₂        δ₁ + β₀   δ₀ + β₁⎤
       ⎢δ₃        δ₂ + β₀   δ₁ + β₁   δ₀     ⎥
       ⎣δ₃ + β₀   δ₂ + β₁   δ₁        δ₀     ⎦

To write down the Π matrix for c, the geometric lag case, we rewrite (6.11) as

y₁ᵢ = β₀x₁ᵢ + β₁x₀ᵢ + zᵢ + e₁ᵢ,
y₂ᵢ = β₀x₂ᵢ + β₁x₁ᵢ + β₁λx₀ᵢ + λzᵢ + e₂ᵢ, (6.15)
y₃ᵢ = β₀x₃ᵢ + β₁x₂ᵢ + β₁λx₁ᵢ + β₁λ²x₀ᵢ + λ²zᵢ + e₃ᵢ,

and (6.14) as

E*(zᵢ|x) = m′x, (6.16)

which gives us the Π matrix corresponding to the geometric tail case

Π(c) = ⎡m₃           m₂           m₁ + β₀       m₀ + β₁    ⎤
       ⎢λm₃          λm₂ + β₀     λm₁ + β₁      λ(m₀ + β₁) ⎥
       ⎣λ²m₃ + β₀    λ²m₂ + β₁    λ²m₁ + λβ₁    λ²(m₀ + β₁)⎦

This imposes a set of non-linear constraints on the Π matrix, but it is estimable with standard non-linear multivariate regression software (in SAS or TSP). In this

case we have seven unknown parameters to estimate (4 m's, 2 β's, and λ) from the 12 unconstrained Π coefficients.²⁰
Adding fixed effects on top of this, as in d, adds another four coefficients to be estimated and strains identification to its limit. This may be feasible with larger T, but the data are unlikely to distinguish well between fixed effects and slowly changing initial effects, especially in short panels.
Perhaps a more interesting version is represented by (6.13e), where we are unwilling to assume an explicit form for the lag distribution, since that happens to be exactly the question we wish to investigate, but are willing instead to assume something restrictive about the behavior of the x's in the unseen past; specifically, that they follow an autoregressive process of low order. In the example sketched out, we never see x₋₁, x₋₂, and x₋₃, and hence cannot identify β₃ or β₄, but may be able to learn something about β₀, β₁, and β₂. If the x's follow a first order autoregressive process, then it can be shown (see Pakes and Griliches, 1984) that in the projection of x₋τ on all the observed x's,

E*(x₋τ|x₃, x₂, x₁, x₀) = g′x = 0·x₃ᵢ + 0·x₂ᵢ + 0·x₁ᵢ + g_τ x₀ᵢ, (6.17)

only the last coefficient is non-zero, since the partial correlation of x₋τ with all the subsequent x's is zero, given its correlation with x₀. If the x's had followed a higher order autoregression, say third order, then the last three coefficients would be non-zero. In the first order case the Π matrix is

Π(e) = ⎡0    0    β₀   β₁ + β₂g₁ + β₃g₂ + β₄g₃⎤
       ⎢0    β₀   β₁   β₂ + β₃g₁ + β₄g₂      ⎥
       ⎣β₀   β₁   β₂   β₃ + β₄g₁             ⎦

where now only β₀, β₁, and β₂ are identified from the data. Estimation proceeds by leaving the last column of Π free and constraining the rest of it to yield the parameters of interest.²¹ If we had assumed that the x's are AR(2), we would be able to identify only the first two β's, and would have to leave the last two columns of Π free.

²⁰An alternative approach would take advantage of the geometric nature of the lag structure and use lagged values of the dependent variable to solve out the unobserved z_i's. Using the lagged dependent variables formulation would introduce both an errors-in-variables problem (since y_{t-1} proxies for z subject to the e_{t-1} error) and a potential simultaneity problem due to their correlation with the α_i's (even if the α's are not correlated with the x's). Instruments are available, however, in the form of past y's and future x's, and such a system is estimable along the lines outlined by Bhargava and Sargan (1983).
²¹This is not fully efficient. If we really believe that the x's follow a low order Markov process with stable coefficients over time (which is not necessary for the above), then the equations for x can be appended to this model and the g's would be estimated jointly, constraining this column of Π also.

The last case to be considered, (f), represents a mixture of fixed effects and truncated lag distributions. The algebra is somewhat tedious (see Pakes and Griliches, 1984) and leads basically to a mixture of the (c) and (e) cases, where the fixed effects have changing coefficients over time, since their relationship to the correlated truncation remainder is changing over time (the resulting Π(f) expressions, with one of the m coefficients normalized to unity, are given in Pakes and Griliches, 1984). The first three β's should be identified in this model, but in practice it may be rather hard to distinguish between all these parameters unless T is significantly larger than 3, the underlying samples are large, and the x's are not too collinear.
Following Chamberlain, the basic procedure in this type of model is first to estimate the unconstrained version of the Π matrix, derive its correct variance-covariance matrix, allowing for the heteroscedasticity introduced by our having thrust those parts of the α_i and z_i which are uncorrelated with the x's into the random term (using the formulae in Chamberlain 1982, or White 1980), and then impose and test the constraints implied by the specific version deemed relevant.
Note that it is quite likely (in the context of larger T) that the test will reject all the constraints at conventional significance levels. This indicates that the underlying hypothesis of stability over time of the relevant coefficients may not really hold. Nevertheless, one may still use this framework to compare among several more constrained versions of the model to see whether the data indicate, for example, that "if you believe in a distributed lag model with fixed coefficients, then two terms are better than one."
Some of these ideas are illustrated in the following empirical example which
considers the ubiquitous question of “capital.” What is the appropriate way to
define it and measure it? This is, of course, an old and much discussed question to
which the theoretical answer is that in general it cannot be done in a satisfactory
fashion (Fisher, 1969) and that in practice it depends very much on the purpose at
hand (Griliches, 1963). There is no intention of reopening the whole debate here
(see the various papers collected in Usher 1980 for a review of the recent state of
this topic); the focus is rather on the much narrower question of what is the
appropriate functional form for the depreciation or deterioration function used in
the construction of conventional capital stock measures. Almost all of the data
used empirically are constructed on the basis of conventional “length of life”
assumptions developed for accounting and tax purposes and based on very little
direct evidence on the pattern of capital services over time. These accounting
1504 2. Griliches

estimates are then taken to imply rather sharp declines in the service flows of
capital over time using either the straight line or double declining balance
depreciation formulae. Whatever independent evidence there is on this topic
comes largely from used assets markets and is heavily contaminated by the effects
of obsolescence due to technical improvements in newer assets.
Pakes and Griliches (1984) present some direct empirical evidence on this
question. In particular they asked: What is the time pattern of the contribution of
past investments to current profitability? What is the shape of the “deterioration
of services with age" function (rather than the "decline in present value" patterns)? All versions of capital stock measures can be thought of as weighted sums of past investments:

K_t = Σ_τ w_τ I_{t-τ}, (6.18)

with the w_τ differing according to the depreciation schemes used. Since investments are made to yield profits, and assuming that ex ante the expected rate of return comes close to being equalized across different investments and firms, one would expect that

Π_t = α + r Σ_τ w_τ I_{t-τ} + e_t, (6.19)

where e_t is the ex post discrepancy between expected and actual profits, assumed to be uncorrelated with the ex ante optimally chosen I's. Given a series on Π_t and I_t, in principle one could estimate all the w parameters, except for the problem that one rarely has a long enough series to estimate them individually, especially in the presence of rather high multi-collinearity in the I's. Pakes and Griliches used panel data on U.S. firms to get around this problem, which greatly increases the available degrees of freedom. But even then, the available panel data are rather short in the time dimension (at least relative to the expected length of life of manufacturing capital) and hence some of the methods described above have to be used.
They used data on the gross profits of 258 U.S. manufacturing firms for the nine years 1964-72 and their gross investment (deflated) for 11 years, 1961-71. Profits were deflated by an overall index of the average gross rate of return (1972 = 100) taken from Feldstein and Summers (1977), and all the observations were weighted inversely to the sum of investment over the whole 1961-71 period to adjust roughly for the great heteroscedasticity in this sample. Model (6.13f) of the previous section was used. That is, they tried to estimate as many unconstrained w terms as possible, asking whether these coefficients in fact decline as rapidly as is assumed by the standard depreciation formulae. To identify the model, it was assumed that in the unobserved past the I's followed an autoregres-

sive process. Preliminary calculations indicated that it was adequate to assume a third order autoregression for I. Since they also had an accounting measure of capital stock as of the beginning of 1961, it could be used as an additional indicator of the unseen past I's. The possibility that more profitable firms may also invest more was allowed for by including individual firm effects in the model and allowing them to be correlated with the I's and the initial K level. The resulting set of multivariate regressions with non-linear constraints on coefficients and a free covariance matrix was estimated using the LISREL-V program of Joreskog and Sorbom (1981).
Before their results are examined a major reservation should be noted about
this model and the approach used. It assumes a fixed and common lag structure
(deterioration function) across both different time periods and different firms
which is far from being realistic. This does not differ, however, from the common
use of accounting or constructed capital measures to compute and compare “rates
of return” across projects, firms, or industries. The way “capital” measures are
commonly used in industrial organization, production function, finance, and
other studies implicitly assumes that there is a stable relationship between
earnings (gross or net) and past investments; that firms or industries differ only
by a factor of proportionality in the yield on these investments, with the time
shape of these yields being the same across firms and implicit in the assumed
depreciation formula. The intent of the Pakes-Griliches study was to question
only the basic shape of this formula rather than try to unravel the whole tangle at
once.
Their main results are presented in Table 2 and can be summarized quickly.
There is no evidence that the contribution of past investments to current profits
declines rapidly as is implied by the usual straight line or declining balance
depreciation formula. If anything, they rise during the first three years! Introduc-
ing the 1961 stock as an additional indicator improves the estimates of the later
w's and indicates no noticeable decline in the contribution of past investments
during their first seven years. Compared against a single traditional stock measure
(column 3), this model does a significantly better job of explaining the variance of
profits across firms and time. But it does not come close to doing as well as the
estimates that correspond to the free II matrix, implying that such lag structures
may not be stable across time and/or firms. Nevertheless, it is clear that the usual
depreciation schemes which assume that the contribution of past investments
declines rapidly and immediately with age are quite wrong. If anything, there may
be an “appreciation” in the early years as investments are completed, shaken
down, and adjusted to.22

²²For a methodologically related study, see Hall, Griliches and Hausman (1983), which tried to figure out whether there is a significant "tail" to the patents as a function of past R&D expenditures lag structure.

Table 2
The relationship of profits to past investment expenditures for U.S. manufacturing firms: Parameter estimates allowing for heterogeneity.*

Parameter                                          Comparison    3 years          3 years
(standard        Without k^g_61    With k^g_61       model       investment       investment
error)                             (system 10)                   + k^n_{t-4}      + k^g_{t-4}
                      (1)              (2)             (3)           (4)              (5)

w1                   0.067            0.068                         0.073            0.057
                    (0.028)          (0.027)                       (0.022)          (0.021)
w2                   0.115            0.112                         0.104            0.077
                    (0.033)          (0.032)                       (0.022)          (0.022)
w3                   0.224            0.222                         0.141            0.120
                    (0.041)          (0.040)                       (0.024)          (0.024)
w4                   0.172            0.208
                    (0.046)          (0.046)
w5                   0.072            0.198
                    (0.049)          (0.050)
w6                   0.096            0.277
                    (0.062)          (0.057)
w7                  -0.122            0.202
                    (0.094)          (0.076)
w8                  -0.259            0.087
                    (0.133)          (0.103)

Coefficient of:
k^n_t                                               0.095
                                                   (0.012)
k^n_{t-4}                                                           0.103
                                                                   (0.011)
k^g_{t-4}                                                                            0.045
                                                                                    (0.006)

Trace Ω̂/253.6ᵃ      1.18                            2.04            1.35             1.37

ᵃΩ̂ = estimated covariance matrix of the disturbances from the system of profit equations (across years). For the free Π matrix: trace Ω̂ = 253.6.
*The dependent variable is gross operating income deflated by the implicit GNP deflator and an index of the overall rate of return in manufacturing (1972 = 1.0). The w_τ refer to the coefficients of gross investment expenditures in period t - τ, deflated by the implicit GNP producer durables investment deflator. k^n_t and k^g_t are deflated Compustat measures of net and gross capital at the beginning of the year. k^g_61 refers to undeflated gross capital in 1961 as reported by Compustat. All variables are divided by the square root of the firm's mean investment expenditures over the 1961-71 period. Dummy variables for the nine time periods are included in all equations. N = 258 and T = 9. The overall fit, measured by 1 - (trace Ω̂/1208.4), where 1208.4 = Σ s²_{y_t} and s²_{y_t} is the sample variance of y_t, is 0.72 for the model in Column 2 as against 0.79 for the free Π matrix.
From: Pakes and Griliches (1984).

7. Final remarks

The dogs bark but the caravan keeps moving.


A Russian proverb

Over 30 years ago Morgenstern (1950) asked whether economic data were
accurate enough for the purposes that economists and econometricians were using
them for. He raised serious doubts about the quality of many economic series and
implicitly about the basis for the whole econometrics enterprise. Years have
passed and there has been very little coherent response to his criticisms.
There are basically four responses to his criticism and each has some merit: (1)
The data are not that bad. (2) The data are lousy but it does not matter. (3) The
data are bad but we have learned how to live with them and adjust for their
foibles. (4) That is all there is. It is the only game in town and we have to make
the best of it.
There clearly has been great progress both in the quality and quantity of the available economic data. In the U.S. much of the agricultural statistical data collection has shifted from judgment surveys to probability based survey sampling. The commodity coverage in the various official price indexes has been greatly expanded and much more attention is being paid to quality change and other comparability issues. Decades of criticisms and scrutiny of official statistics have borne some fruit. Also, some of the aggregate statistics now have much more extensive micro-data underpinnings. It is now routine, in the U.S., to collect large periodic surveys of labor force activity and related topics and to release the basic micro-data for detailed analysis with relatively short lags. But both the improve-
ments in and the expansion of our data bases have not really disposed of the questions raised by Morgenstern. As new data appear, as new data collection methods are developed, the question of accuracy persists. While the quality of some of the "central" data has improved, it is easy to replicate some of Morgenstern's horror stories even today. For example, in 1982 the U.S. trade deficit with Canada was either $12.8 or $7.9 billion, depending on whether this number came from U.S. or Canadian publications. It is also clear that the national income statistics for some of the LDC's are more political than economic documents (Vernon, 1983).²³
Morgenstem did not distinguish adequately between levels and rates of change.
Many large discrepancies represent definitional differences and studies that are
mostly interested in the movements in such series may be able to evade much of
this problem. The tradition in econometrics of allowing for “constants” in most
relationships and not over-interpreting them allows implicitly for permanent

23See also Prakash (1974) for a collection of confidence-shattering comparisons of measures of
industrial growth and trade for various developing countries based on different sources.

“errors” in the levels of the various series. It is also the case that in much of
economic analysis one is after relatively crude first order effects and these may be
rather insensitive even to significant inaccuracies in the data. While this may be
an adequate response with respect to much of the standard, especially macroeco-
nomic analysis, it seems inadequate when we contemplate some of the more
recent elaborate non-linear multi-equational models being estimated at the fron-
tier of the subject. They are much more likely to be sensitive to errors and
inconsistencies in the data.
In the past decade there has been a revival of interest in "error" models in
econometrics, though the progress in sociology on this topic seems more impres-
sive. Recent studies using micro-data from labor force surveys, negative-tax
experiments and similar data sources exhibit much more sensitivity to measure-
ment error and sample selectivity problems. Even in the macro area there has
been some progress (see de Leeuw and McKelvey, 1983) and the “rational
expectations” wave has made researchers more aware of the discrepancy between
observed data and the underlying forces that are presumably affecting behavior.
All of this has yet to make a major dent on econometric textbooks and
econometric teaching but there are signs that change is coming.24 It is more
visible in the areas of discrete variable analysis and sample selectivity issues, (e.g.
note the publication of the Maddala (1983) and Manski-McFadden (1981)
monographs) than in the errors of measurement area per se, but the increased
attention that is devoted to data provenance in these contexts is likely to spill over
into a more general data “aware” attitude.
One of the reasons why Morgenstern's accusations were brushed off was that
they came from “outside” and did not seem sensitive to the real difficulties of
data collection and data generation. In most contexts the data are imperfect not
by design but because that is all there is. Empirical economists have over
generations adopted the attitude that having bad data is better than having no
data at all, that their task is to learn as much as is possible about how the world
works from the unquestionably lousy data at hand. While it is useful to alert users
to their various imperfections and pitfalls, the available economic statistics are
our main window on economic behavior. In spite of the scratches and the
persistent fogging, we cannot stop peering through it and trying to understand

24Theil (1978) devotes five pages out of 425 to this range of problems. Chow (1983) devotes only six
pages out of 400 to this topic directly, but does return to it implicitly in the discussion of rational
expectations models. Dhrymes (1974) does not mention it explicitly at all, though some of it is implicit
in his discussion of factor analysis. Dhrymes (1978) does devote about 25 pages out of 500 to this
topic. Maddala (1977) and Malinvaud (1980) devote separate chapters to the EVM, though in both
cases these chapters represent a detour from the rest of the book. The most extensive textbook
treatment of the EVM and related topics appears in a chapter by Judge et al. (1980). The only book
that has some explicit discussion of economic data is Intriligator (1978). Except for the sample
selection literature there is rarely any discussion of the processes that generate economic data and the
resultant implications for econometric practice.

what is happening to us and to our environment, nor should we. The problematic
quality of economic data presents a continuing challenge to econometricians. It
should not cause us to despair, but we should not forget it either.
In this somewhat disjointed survey, I discussed first some of the long standing
problems that arise in the encounter between the practicing econometrician and
the data available to him. I then turned to the consideration of three data related
topics in econometrics: errors of measurement, missing observations and incom-
plete data sets, and missing variables. The last topic overlapped somewhat with
the chapter on panel analysis (Chapter 22), since the availability of longitudinal
microdata has helped by providing us with one way of controlling for missing but
relatively constant information on individuals and firms. It is difficult, however, to
shake off the impression that here also, the progress of econometric theory and
computing ability is outracing the increased availability of data and our under-
standing and ability to model economic behavior in increasing detail. While we
tend to look at the newly available data as adding degrees-of-freedom grist to our
computer mills, the increased detail often raises more questions than it answers.
Particularly striking is the great variety of responses and differences in behavior
across firms and individuals. Specifying additional distributions of unseen param-
eters rarely adds substance to the analysis. What is needed is a better understand-
ing of the behavior of individuals, better theories and more and different
variables. Unfortunately, standard economic theory deals with “representative”
individuals and “big” questions and does not provide much help in explaining the
production or hiring behavior of a particular plant at a particular time, at least
not with the help of the available variables. Given that our theories, while
couched in micro-language, are not truly micro-oriented, perhaps we should not
be asking such questions. Then what are we doing with microdata? We should be
using the newly available data sets to help us find out what is actually going on in
the economy and in the sectors that we are analyzing without trying to force our
puny models on them.25 The real challenge is to try to stay open, to learn from
the data, but also, at the same time, not drown in the individual detail. We have
to keep looking for the forest among all these trees.

References

Aasness, J. (1983) "Engel Functions, Distribution of Consumption and Errors in Variables". Paper presented at the European Meeting of the Econometric Society in Pisa. Oslo: Institute of Economics.
Aigner, D. J. (1973) "Regression with a Binary Independent Variable Subject to Errors of Observation", Journal of Econometrics, 1(1), 49-59.

25An important issue not discussed in this chapter is the testing of models which is a way of staying
open and allowing the data to reject our stories about them. There is a wide range of possible tests that
models can and should be subjected to. See, e.g. Chapters 5, 13, 14, 15, 18, 19, and 33 and Hausman
(1978) and Hendry (1983).

Allison, P. D. (1981) "Maximum Likelihood Estimation in Linear Models When Data Are Missing", Sociological Methodology.
Anderson, T. W. and C. Hsiao (1982) "Formulation and Estimation of Dynamic Models Using Panel Data", Journal of Econometrics, 18(1), 47-82.
Beale, E. M. L. and R. J. A. Little (1975) “Missing Values in Multivariate Analysis”, Journal of the
Royal Statistical Society, Ser. B., 37, 129-146.
Berkson, J. (1950) "Are There Two Regressions?", Journal of the American Statistical Association, 45, 164-180.
Bhargava, A. and D. Sargan (1983) "Estimating Dynamic Random Effects Models from Panel Data Covering Short Time Periods", Econometrica, 51(6), 1635-1660.
Bielby, W. T., R. M. Hauser and D. L. Featherman (1977) "Response Errors of Non-Black Males in Models of the Stratification Process", in: Aigner and Goldberger, eds., Latent Variables in Socioeconomic Models. Amsterdam: North-Holland Publishing Company, 227-251.
Borus, M. E. (1982) "An Inventory of Longitudinal Data Sets of Interest to Economists", Review of Public Data Use, 10(1-2), 113-126.
Borus, M. E. and G. Nestel (1973) “Response Bias in Reports of Father’s Education and Socioeco-
nomic Status”, Journal of the American Statistical Association, 68(344), 816-820.
Bound, J., Z. Griliches and B. H. Hall (1984) “Brothers and Sisters in the Family and Labor Market”.
NBER Working Paper No. 1476. Forthcoming in International Economic Review.
Bowles, S. (1972) "Schooling and Inequality from Generation to Generation", Journal of Political Economy, Part II, 80(3), S219-S251.
Center for Human Resource Research (1979) The National Longitudinal Survey Handbook. Columbus:
Ohio State University.
Chamberlain, Gary (1977) "An Instrumental Variable Interpretation of Identification in Variance Components and MIMIC Models", Chapter 7, in: P. Taubman, ed., Kinometrics. Amsterdam: North-Holland Publishing Company, 235-254.
Chamberlain, Gary (1980) "Analysis of Covariance with Qualitative Data", Review of Economic Studies, 47(1), 225-238.
Chamberlain, Gary (1982) "Multivariate Regression Models for Panel Data", Journal of Econometrics, 18(1), 5-46.
Chamberlain, G. and Z. Griliches (1975) "Unobservables with a Variance-Components Structure: Ability, Schooling and the Economic Success of Brothers", International Economic Review, 16(2), 422-449.
Chamberlain, Gary (1977) "More on Brothers", in: P. Taubman, ed., Kinometrics: Determinants of Socioeconomic Success Within and Between Families. New York: North-Holland Publishing Company, 97-124.
Chow, G. C. (1983) Econometrics. New York: McGraw Hill.
Chow, G. C. and A. Lin (1971) “Best Linear Unbiased Interpolation, Distribution and Extrapolation
of Time Series by Related Series”, Review of Economics and Statistics, 53(4), 372-375.
Cole, R. (1969) Error in Provisional Estimates of Gross National Product. Studies in Business Cycles
#21, New York: NBER.
Council on Wage and Price Stability (1977) The Wholesale Price Index: Review and Evaluation.
Washington: Executive Office of the President.
Court, A. T. (1939) "Hedonic Price Indexes with Automotive Examples", in: The Dynamics of Automobile Demand. New York: General Motors Corporation, 99-117.
de Leeuw, F. and M. J. McKelvey (1983) "A 'True' Time Series and Its Indicators", Journal of the American Statistical Association, 78(381), 37-46.
Dempster, A. P., N. M. Laird and D. B. Rubin (1977) "Maximum Likelihood from Incomplete Data via the EM Algorithm", Journal of the Royal Statistical Society, Ser. B, 39(1), 1-38.
Dhrymes, P. J. (1974) Econometrics. New York: Springer-Verlag.
Dhrymes, P. J. (1978) Introductory Econometrics. New York: Springer-Verlag.
Diewert, W. E. (1980) “Aggregation Problems in the Measurement of Capital”; in: D. Usher, ed., The
Measurement of Capital, Studies in Income and Wealth. University of Chicago Press for NBER, 45,
433-538.
Eicker, F. (1967) “Limit Theorems for Regressions with Unequal and Dependent Errors”, in:
Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability. Berkeley:
University of California, Vol. 1.

Feldstein, M. and L. Summers (1977) “Is the Rate of Profit Falling?“, Brookings Papers on Economic
Activity, 211-227.
Ferber, R. (1966) “The Reliability of Consumer Surveys of Financial Holdings: Demand Deposits”,
Journal of the American Statistical Association, 61(313), 91-103.
Fisher, F. M. (1969) “The Existence of Aggregate Production Functions”, Econometrica, 37(4),
553-577.
Fisher, F. M. (1980) "The Effect of Simple Specification Error on the Coefficients of 'Unaffected' Variables", in: L. R. Klein, M. Nerlove and S. C. Tsiang, eds., Quantitative Economics and Development. New York: Academic Press, 157-163.
Freeman, R. B. (1984) "Longitudinal Analyses of the Effects of Trade Unions", Journal of Labor Economics, 2(1), 1-26.
Friedman, M. (1957) A Theory of the Consumption Function. NBER General Series 63, Princeton:
Princeton University Press.
Frisch, R. (1934) Statistical Confluence Analysis by Means of Complete Regression Systems. Oslo:
University Economics Institute, Publication No. 5.
Gordon, R. J. (1982) "Energy Efficiency, User-Cost Change, and the Measurement of Durable Goods Prices", in: M. Foss, ed., NBER Studies in Income and Wealth, The U.S. National Income and Product Accounts. Chicago: University of Chicago Press, 47, 205-268.
Gordon, R. J. (1985) The Measurement of Durable Goods Prices, unpublished manuscript.
Gourieroux, C. and A. Monfort (1981) "On the Problem of Missing Data in Linear Models", Review of Economic Studies, XLVIII(4), 579-586.
Griliches, Z. (1957) "Specification Bias in Estimates of Production Functions", Journal of Farm Economics, 39(1), 8-20.
Griliches, Z. (1961) “Hedonic Price Indexes for Automobiles: An Econometric Analysis of Quality
Change”, in: The Price Statistics of the Federal Government, NBER, 173-196.
Griliches, Z. (1963) "Capital Stock in Investment Functions: Some Problems of Concept and Measurement", in: Christ et al., eds., Measurement in Economics: Studies in Mathematical Economics and Econometrics in Memory of Yehuda Grunfeld. Stanford: Stanford University Press, 115-137.
Griliches, Z. (1970) "Notes on the Role of Education in Production Functions and Growth Accounting", in: W. L. Hansen, ed., Education, Income and Human Capital. NBER Studies in Income and Wealth, 35, 71-127.
Griliches, Z. (1971) Price Indexes and Quality Change. Cambridge: Harvard University Press.
Griliches, Z. (1974) "Errors in Variables and Other Unobservables", Econometrica, 42(6), 971-998.
Griliches, Z. (1977) "Estimating the Returns to Schooling: Some Econometric Problems", Econometrica, 45(1), 1-22.
Griliches, Z. (1979) “Sibling Models and Data in Economics: Beginnings of a Survey”, Journal of
Political Economy, Part 2, 87(5), S37-S64.
Griliches, Z., B. H. Hall and J. A. Hausman (1978) “Missing Data and Self-Selection in Large
Panels”, Annales de L’INSEE, 30-31, 138-176.
Griliches, Z. and J. A. Hausman (1984) “Errors-in-Variables in Panel Data”, NBER Technical Paper
No. 37, forthcoming in Journal of Econometrics.
Griliches, Z. and J. Mairesse (1984) "Productivity and R&D at the Firm Level", in: Z. Griliches, ed., R&D, Patents and Productivity. NBER, Chicago: University of Chicago Press, 339-374.
Griliches, Z. and W. M. Mason (1972) “Education, Income and Ability”, Journal of Political
Economy, Part II, 80(3), S74-S103.
Griliches, Z. and V. Ringstad (1970) “Error in the Variables Bias in Non-Linear Contexts”,
Econometrica, 38(2), 368-370.
Griliches, Z. (1971) Economies of Scale and the Form of the Production Function. Amsterdam:
North-Holland.
Haitovsky, Y. (1968) “Estimation of Regression Equations When a Block of Observations is Missing”,
ASA, Proceedings of the Business and Economic Statistics Section, 454-461.
Haitovsky, Y. (1972) "On Errors of Measurement in Regression Analysis in Economics", International Statistical Review, 40(1), 23-35.
Hall, B. H. (1979) Moments: The Moment Matrix Processor User Manual. Stanford, California.
Hall, B. H., Z. Griliches and J. A. Hausman (1983) "Patents and R&D: Is There a Lag Structure?". NBER Working Paper No. 1227.

Hamilton, L. C. (1981) “Self Reports of Academic Performance: Response Errors Are Not Well
Behaved”, Sociological Methods and Research, 10(2), 165-185.
Harvey, A. C. and R. G. Pierse (1982) “Estimating Missing Observations in Economic Time Series”.
London: London School of Economics Econometrics Programme Discussion Paper No. A33.
Hauser, R. M. and A. S. Goldberger (1971) “The Treatment of Unobservable Variables in Path
Analysis”, in: H. L. Costner, ed., Sociological Methodology 1971. San Francisco: Jossey-Bass,
81-117.
Hausman, J. A. (1978) “Specification Tests in Econometrics”, Econometrica, 46(6), 1251-1271.
Hausman, J. A. (1982) “The Econometrics of Non Linear Budget Constraints”, Fisher-Schultz
Lecture given at the Dublin Meetings of the Econometric Society, Econometrica, forthcoming.
Hausman, J. A., B. H. Hall and Z. Griliches (1984) “Econometric Models for Count Data with
Application to the Patents- R&D Relationship”, Econometrica, 52(4), 909-938.
Hausman, J. A. and W. E. Taylor (1981) “Panel Data and Unobservable Individual Effects”,
Econometrica, 49(6), 1377-1398.
Hausman, J. A. and M. Watson (1983) “Seasonal Adjustment with Measurement Error Present”.
National Bureau of Economic Research Working Paper No. 1133.
Hausman, J. A. and D. Wise, eds. (1985) Social Experimentation. NBER, Chicago: University of Chicago Press, forthcoming.
Hendry, D. F. (1983) “Econometric Modelling: The ‘Consumption Function’ in Retrospect”, Scottish
Journal of Political Economy, 30, 193-220.
Intriligator, M. D. (1978) Econometric Models, Techniques and Applications. Englewood Cliffs:
Prentice-Hall.
Joreskog, K. and D. Sorbom (1981) LISREL V, Analysis of Linear Structural Relationships by Maximum Likelihood and Least Squares Method. Chicago: National Educational Resources.
Judge, G. G., W. R. Griffiths, R. C. Hill and T. C. Lee (1980) The Theory and Practice of Econometrics.
New York: Wiley.
Karni, E. and I. Weissman (1974) "A Consistent Estimator of the Slope in a Regression Model with Errors in the Variables", Journal of the American Statistical Association, 69(345), 211-213.
Klepper, S. and E. E. Leamer (1983) "Consistent Sets of Estimates for Regressions with Errors in All Variables", Econometrica, 52(1), 163-184.
Kruskal, W. H. and L. G. Telser (1960) “Food Prices and The Bureau of Labor Statistics”, Journal of
Business, 33(3), 258-285.
Kuznets, S. (1954) National Income and Its Composition, 1919-1938. New York: NBER.
Kuznets, S. (1971) “Data for Quantitative Economic Analysis: Problems of Supply and Demand”.
Lecture delivered at the Federation of Swedish Industries. Stockholm: Kungl Boktryckeriet P. A.
Norsted and Soner.
Little, R. J. A. (1979) "Maximum Likelihood Inference for Multiple Regressions with Missing Values: A Simulation Study", Journal of the Royal Statistical Society, Ser. B, 41(1), 76-87.
Little, R. J. A. (1983) “Superpopulation Models for Non-Response”, in: Madow, Olkin and Rubin,
eds., National Academy of Sciences, Incomplete Data in Sample Surveys. New York: Academic
Press, Part VI, II, 337-413.
Little, R. J. A. (1982) "Models for Non-Response in Sample Surveys", Journal of the American Statistical Association, 77(378), 237-250.
MaCurdy, T. E. (1982) "The Use of Time Series Processes to Model the Error Structure of Earnings in Longitudinal Data Analysis", Journal of Econometrics, 18(1), 83-114.
Maddala, G. S. (1971) "The Use of Variance Components Models in Pooling Cross Section and Time Series Data", Econometrica, 39(2), 341-358.
Maddala, G. S. (1977) Econometrics. New York: McGraw Hill.
Maddala, G. S. (1983) Limited-Dependent and Qualitative Variables in Econometrics. Cambridge: Cambridge University Press.
Malinvaud, E. (1980) Statistical Methods of Econometrics. 3rd revised ed., Amsterdam: North-Holland.
Manski, C. F. and D. McFadden, eds. (1981) Structural Analysis of Discrete Data with Econometric Applications. Cambridge: MIT Press.
Mare, R. D. and W. M. Mason (1980) "Children's Report of Parental Socioeconomic Status: A Multiple Group Measurement Model", Sociological Methods and Research, 9, 178-198.
Marini, M. M., A. R. Olsen and D. B. Rubin (1980) "Maximum-Likelihood Estimation in Panel Studies with Missing Data", Sociological Methodology 1980, 9, 315-357.

Massagli, M. P. and R. M. Hauser (1983) “Response Variability in Self- and Proxy Reports of
Paternal and Filial Socioeconomic Characteristics”, American Journal of Sociology, 89(2), 420-431.
Medoff, J. and K. Abraham (1980) "Experience, Performance, and Earnings", Quarterly Journal of Economics, XCV(4), 703-736.
Morgenstern, O. (1950) On the Accuracy of Economic Observations. Princeton: Princeton University Press, 2nd edition, 1963.
Mundlak, Y. (1978) "On the Pooling of Time Series and Cross Section Data", Econometrica, 46(1), 69-85.
Mundlak, Y. (1980) “Cross Country Comparisons of Agricultural Productivity”. Unpublished
manuscript.
National Academy of Sciences (1979) Measurement and Interpretation of Productivity. Washington,
D.C.
National Academy of Sciences (1983) in: Madow, Olkin and Rubin, eds., Incomplete Data in Sample
Surveys. New York: Academic Press, Vol. 1-3.
National Bureau of Economic Research (1961) The Price Statistics of the Federal Government, Report of the Price Statistics Review Committee. New York: General Series, No. 73.
National Bureau of Economic Research (1957a) Studies in Income and Wealth, Problems of Capital Formation: Concepts, Measurement, and Controlling Factors. New York: Arno Press, Vol. 19.
National Bureau of Economic Research (1957b) Studies in Income and Wealth, Problems in International Comparisons of Economic Accounts. New York: Arno Press, Vol. 20.
National Bureau of Economic Research (1958) Studies in Income and Wealth, A Critique of the United States Income and Product Accounts. New York: Arno Press, Vol. 22.
National Bureau of Economic Research (1961) Studies in Income and Wealth, Output, Input and Productivity Measurement. New York: NBER, Vol. 25.
National Bureau of Economic Research (1969) Studies in Income and Wealth, V. R. Fuchs, ed., Production and Productivity in the Service Industries. New York: Columbia University Press, Vol. 34.
National Bureau of Economic Research (1973) Studies in Income and Wealth, M. Moss, ed., The Measurement of Economic and Social Performance. New York: Columbia University Press, Vol. 38.
National Bureau of Economic Research (1983a) Studies in Income and Wealth, M. Foss, ed., The U.S.
National Income and Product Accounts. Chicago: University of Chicago Press, Vol. 47.
National Bureau of Economic Research (1983b) Studies in Income and Wealth, J. Triplett, ed., The
Measurement of Labor Cost. Chicago: University of Chicago Press, Vol. 48.
National Commission on Employment and Unemployment Statistics (1979) Counting the Labor Force.
Washington: Government Printing Office.
Nijman, Th. E. and F. C. Palm (1985) “Consistent Estimation of a Regression Model with
Incompletely Observed Exogenous Variable”, Netherlands Central Bureau of Statistics, Unpublished
paper.
Pakes, A. (1982) “On the Asymptotic Bias of Wald-Type Estimators of a Straight Line When Both
Variables Are Subject to Error”, International Economic Review, 23(2), 491-497.
Pakes, A. (1983) "On Group Effects and Errors in Variables in Aggregation", Review of Economics and Statistics, LXV(1), 168-172.
Pakes, A. and Z. Griliches (1984) “Estimating Distributed Lags in Short Panels with An Application
to the Specification of Depreciation Patterns and Capital Stock Constructs”, Review of Economic
Studies, LI(2), 243-262.
Palm, F. C. and Th. E. Nijman (1984) “Missing Observations in the Dynamic Regression Model”,
Econometrica, November, 52(6), 1415-1436.
Prakash, V. (1974) "Statistical Indicators of Industrial Development: A Critique of the Basic Data". International Bank for Reconstruction and Development, DES Working Paper No. 189.
President’s Committee to Appraise Employment and Unemployment Statistics (1962) Measuring
Employment and Unemployment. Washington: Government Printing Office.
Rosen, S. (1974) “Hedonic Prices and Implicit Markets: Product Differentiation in Pure Competition”,
Journal of Political Economy, 82(l), 34-55.
Rubin, D. B. (1976) "Inference and Missing Data", Biometrika, 63(3), 581-592.
Ruggles, N. D. (1964) Review of O. Morgenstern, On the Accuracy of Economic Observations, 2nd edition, American Economic Review, LIV(4, part 1), 445-447.
Schultz, H. (1938) The Theory and Measurement of Demand. Chicago: University of Chicago Press.

Stewart, M. B. (1983) “The Estimation of Union Wage Differentials from Panel Data: The Problems
of Not-So-Fixed Effects”. Cambridge: National Bureau of Economic Research Conference on the
Economics of Trade Unions, unpublished.
Stigler, G. J. and J. K. Kindahl (1970) The Behaviour of Industrial Prices, National Bureau of
Economic Research, New York: Columbia University Press.
Theil, H. (1978) Introduction to Econometrics. Englewood Cliffs: Prentice-Hall.
Triplett, J. E. (1975) “The Measurement of Inflation: A Survey of Research on the Accuracy of Price
Indexes”, in: P. H. Earl, ed., Analysis of Inflation. Lexington: Lexington Books, Chapter 2, 19-82.
Triplett, J. E. (1983) "An Essay on Labor Cost", in: National Bureau of Economic Research, Studies
in Income and Wealth, The Measurement of Labor Cost. Chicago: University of Chicago Press, 49,
l-60.
U.S. Department of Commerce (1979) Gross National Product Improvement Report. Washington:
Government Printing Office.
Usher, D., ed. (1980) The Measurement of Capital, National Bureau of Economic Research: Studies in Income and Wealth. Chicago: University of Chicago Press, Vol. 45.
Van Praag, B. (1983) “The Population-Sample Decomposition in Minimum Distance Estimation”.
Unpublished paper presented at the Harvard-MIT Econometrics seminar.
Vernon, R. (1983) “The Politics of Comparative National Statistics”. Cambridge, Massachusetts,
unpublished.
Waugh, F. V. (1928) "Quality Factors Influencing Vegetable Prices", Journal of Farm Economics, 10, 185-196.
White, H. (1980) "Using Least Squares to Approximate Unknown Regression Functions", International Economic Review, 21(1), 149-170.
Young, A. H. (1974) "Reliability of the Quarterly National Income and Product Accounts in the United States, 1947-1971", Review of Income and Wealth, 20(1), 1-39.
Chapter 26

FUNCTIONAL FORMS IN ECONOMETRIC MODEL BUILDING*

LAWRENCE J. LAU

Stanford University

Contents

1. Introduction 1516
2. Criteria for the selection of functional forms 1520
2.1. Theoretical consistency 1520
2.2. Domain of applicability 1527
2.3. Flexibility 1539
2.4. Computational facility 1545
2.5. Factual conformity 1546
3. Compatibility of the criteria for the selection of functional forms 1547
3.1. Incompatibility of a global domain of applicability and flexibility 1548
3.2. Incompatibility of computational facility and factual conformity 1551
3.3. Incompatibility of a global domain of applicability, flexibility and
computational facility 1552
4. Concluding remarks 1558
Appendix 1 1559
References 1564

*The author wishes to thank Kenneth Arrow, Erwin Diewert, Zvi Griliches, Dale Jorgenson and
members of the Econometrics Seminar at the Department of Economics, Stanford University, for
helpful comments and discussions. Financial support for this research under grant SOC77-11105 from
the National Science Foundation is gratefully acknowledged. Responsibility for errors remains with
the author.

Handbook of Econometrics, Volume III, Edited by Z. Griliches and M.D. Intriligator
© Elsevier Science Publishers B.V., 1986

1. Introduction

Econometrics is concerned with the estimation of relationships among observable
(and sometimes even unobservable) variables. Any relationship to be estimated is
almost always assumed to be stochastic. However, the relationship is often
specified in such a way that it can be decomposed into a deterministic and a
stochastic part. The deterministic part is often represented as a known algebraic
function of observable variables and unknown parameters. A typical economic
relationship to be estimated may take the form:

y = f(X; α) + ε,

where y is the observed value of the dependent variable, X is the observed value
of the vector of independent variables, α is a finite vector of unknown constant
parameters and ε is a stochastic disturbance term. The deterministic part,
f(X; α), is supposed to be a known function. The functional form problem that
we consider is the ex ante choice of the algebraic form of the function f(X; α)
prior to the actual estimation. We ask: What considerations are relevant in the
selection of one algebraic functional form over another, using only a priori
information not specific to the particular data set?
This problem of ex ante choice of functional forms is to be carefully dis-
tinguished from that of ex post choice, that is, the selection of one functional
form from among several that have been estimated from the same actual data set
on the bases of the estimated results and/or post-sample predictive tests. The
ex post choice problem belongs properly to the realm of specification analysis and
hypothesis testing, including the testing of nonnested hypotheses.
We do not consider here the choice of functional forms in quantal choice
analysis as the topic has been brilliantly covered by McFadden (1984) elsewhere
in this Handbook. In our discussion of functional forms, we draw our examples
largely from the empirical analyses of production and consumer demand because
the restrictions implied by the respective theories on functional forms are richer.
But the principles that we use are applicable more generally.
Historically, the first algebraic functional forms were chosen because of their
ease of estimation. Almost always a functional form chosen is linear in parame-
ters, after a transformation of the dependent variable if necessary. Thus, one
specializes from

y = f(X; α) + ε

to

g(y) = Σ_i f_i(X) α_i + ε,

where g(·) is a known monotonic transformation of a single variable. Moreover,


it is often desirable to be able to identify the effect of each independent variable
on the dependent variable separately. Thus, one specializes further to:

y = Σ_i f_i(X_i) α_i,

or

g(y) = Σ_i f_i(X_i) α_i,

so that α_i can be interpreted as the effect of a change in X_i (or more precisely


f_i(X_i)). Finally, for ease of computation and interpretation and for aesthetic
reasons, the f_i(·)'s are often chosen to be the same f(·), resulting in:

y = Σ_i f(X_i) α_i,  (1.1)

or

g(y) = Σ_i f(X_i) α_i.  (1.2)

An example of eq. (1.1) is the widely used linear functional form in which
f(X_i) = X_i. An example of eq. (1.2) is the double-logarithmic functional form in
which g(y) = ln y and f(X_i) = ln X_i. It has the constant-elasticity property with
the advantage that the parameters are independent of the units of measurement.
In addition, functional forms of the type in eqs. (1.1) and (1.2) may be
interpreted as first-order approximations to any arbitrary function in a neighbor-
hood of some X = X_0, and that is one reason why they have such wide currency.
However, linear functions, while they may approximate whatever underlying
function reasonably well for small changes in the independent variables, fre-

quently do not work very well for many other purposes. For example, as a
production function, it implies perfect substitution among the different inputs
and constant marginal products. It cannot represent the phenomenon of di-
minishing marginal returns. Moreover, the perfect substitution property of the
linear production function has the unacceptable implication that almost always
only a single input will be employed and an ever so slight change in the relative
prices of inputs will cause a complete shift from one input to another.
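The corner-solution property follows from a one-line cost-minimization argument (a sketch, with α_i > 0 denoting the constant marginal products of the linear production function Y = Σ_i α_i X_i):

min_{X ≥ 0} Σ_{i=1}^m p_i X_i   subject to   Σ_{i=1}^m α_i X_i = Y,

whose solution sets X_k = Y/α_k for an input k attaining min_i p_i/α_i, and X_i = 0 for all other inputs. Whenever the minimizing ratio is unique, only one input is employed, and an arbitrarily small price change that shifts the minimizing index moves the entire input demand from one input to another.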
Another linear-in-parameters functional form that was used is that of the
Leontief or fixed-coefficients production function in its derived demand functions
representation:

X_i = α_i Y,  i = 1, …, m,

where X_i is the quantity of the ith input, i = 1, …, m, and Y is the quantity of
output. However, this production function implies zero substitution among the
different inputs. No matter what the relative prices of inputs may be, the relative
proportions of the inputs remain the same. This is obviously not a good
functional form to use if one is interested in the study of substitution possibilities
among inputs.
The first widely used production function that allows substitution is the
Cobb-Douglas (1928) production function, which may be regarded as a special
case of eq. (1.2):

ln Y = α_0 + Σ_i α_i ln X_i,    Σ_i α_i = 1.  (1.3)

However, it should be noted that the Cobb-Douglas production function was


discovered not from a priori reasoning but through a process of induction from
the empirical data. Cobb and Douglas observed that labor’s share of national
income had been approximately constant over time and independent of the
relative prices of capital and labor. They deduced, under the assumptions of
constant returns to scale, perfect competition in the output and input markets,
and profit maximization by the firms in the economy that the production function
must take the form:

Y = A K^α L^{1−α},  (1.4)

where K and L are the quantities of capital and labor respectively. Eq. (1.4)
reduces to the form of eq. (1.3) by taking natural logarithms of both sides. The
Cobb-Douglas production function became the principal work horse of empirical
analyses of production until the early 1960s and is still widely used today.
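The inductive step can be reconstructed in a few lines. Under perfect competition and profit maximization, w = p ∂Y/∂L, so that labor's observed share is

s = wL/(pY) = (∂Y/∂L)(L/Y) = ∂ln Y/∂ln L.

If s is a constant 1 − α, integration gives ln Y = (1 − α) ln L + φ(K), and constant returns to scale then forces φ(K) = α ln K + ln A, which is eq. (1.4).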

The next advance in functional forms for production functions came when
Arrow, Chenery, Minhas and Solow (1961) introduced the Constant-Elasticity-
of-Substitution (C.E.S.) production function:

Y = γ[(1−δ)K^ρ + δL^ρ]^{1/ρ},  (1.5)

where γ, δ and ρ are parameters. This function is not itself linear in parameters.
However, it gives rise to average productivity relations which are linear in
parameters after a monotonic transformation:

ln(Y/L) = α + σ ln(w/p),   ln(Y/K) = β + σ ln(r/p),  (1.6)

where p, r and w are the prices of output, capital and labor respectively and α, β
and σ are parameters. The C.E.S. production function was discovered, again
through a process of induction, when the estimated σ from eq. (1.6) turned out to
be different from one as one would have expected if the production function were
actually of the Cobb-Douglas form.
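The constant-elasticity property of eq. (1.5) is easily verified. Under cost minimization the relative factor price equals the marginal rate of substitution,

w/r = (∂Y/∂L)/(∂Y/∂K) = [δ/(1 − δ)](K/L)^{1−ρ},

so that ln(K/L) = constant + σ ln(w/r) with σ = 1/(1 − ρ), an elasticity of substitution that is the same at all factor prices and approaches unity (the Cobb-Douglas case) as ρ → 0.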
Unfortunately, although the C.E.S. production function is more general than
the Cobb-Douglas production function (which is itself a limiting case of the
C.E.S. production function), and is perfectly adequate in the two-input case, its
generalizations to the three or more-input case impose unreasonably severe
restrictions on the substitution possibilities. [See, for example, Uzawa (1962) and
McFadden (1963)]. In the meantime, interest in gross output technologies dis-
tinguishing such additional inputs as energy and raw materials continued to grow.
Almost simultaneously advances in the computing technology lifted any con-
straint on the number of parameters that could reasonably be estimated. This led
to the growth of the so-called “flexible” functional forms, including the gener-
alized Leontief functional form introduced by Diewert (1971) and the tran-
scendental logarithmic functional form introduced by Christensen, Jorgenson and
Lau (1973). These functional forms share the common characteristics of linearity-
in-parameters and the ability of providing second-order approximations to any
arbitrary function. In essence they allow, in addition to the usual linear terms, as
in eqs. (1.1) and (1.2), quadratic and interaction terms in the independent
variables.
Here we study the problem of the ex ante choice of functional form when the
true functional form is unknown. (Obviously, if the true functional form is
known, we should use it.) We shall approach this problem by considering the
relevant criteria for the selection of functional forms.

2. Criteria for the selection of functional forms

What are some of the criteria that can be used to guide the ex ante selection of an
algebraic functional form for a particular economic relationship? Neither eco-
nomic theory nor available empirical knowledge provide, in general, a sufficiently
complete specification of the economic functional relationship so as to determine
its precise algebraic form. Consequently the econometrician has wide latitude in
deciding which one of many possible algebraic functional forms to use in building
an econometric model. Through practice over the years, however, a set of criteria
has evolved and developed. These criteria can be broadly classified into five
categories:
(1) Theoretical consistency;
(2) Domain of applicability;
(3) Flexibility;
(4) Computational facility; and
(5) Factual conformity.
We shall discuss each of these criteria in turn.

2.1. Theoretical consistency

Theoretical consistency means that the algebraic functional form chosen must be
capable of possessing all of the theoretical properties required of that particular
economic relationship for an appropriate choice of parameters. For example, a
cost function of a cost-minimizing firm must be homogeneous of degree one,
nondecreasing and concave in the prices of inputs, and nondecreasing in the
quantity of output. Thus, any algebraic functional form selected to represent a
cost function must be capable of possessing these properties for an appropriate
choice of the parameters at least in a neighborhood of the prices of inputs and
quantity of output of interest. For another example, a complete system of demand
functions of a utility-maximizing consumer must be summable,’ homogeneous of
degree zero in the prices of commodities and income or total expenditure and
have a Jacobian matrix which gives rise to a negative semidefinite and symmetric
Slutsky substitution matrix. Thus, any algebraic functional form selected to
represent a complete system of consumer demand functions must be capable of
possessing these properties for an appropriate choice of the parameters at least in
a neighborhood of the prices of commodities and income of interest.
Obviously, not all functional forms can meet these theoretical requirements, not
even in a small neighborhood of the values of the independent variables of

1Summability means that the sum of expenditures on all commodities must be equal to income or
total expenditure.

interest. However, a sufficiently large number of functional forms will satisfy the
test of theoretical consistency, at least locally, that other criteria must be used to
select one from among them. Moreover, many functional forms, while they may
satisfy the theoretical consistency requirement, are in fact readily seen to be
rather poor choices. For example, the cost function

C(p, Y) = Y Σ_{i=1}^m α_i p_i,

where p_i is the price of the ith input and Y is the quantity of output and α_i > 0,
i = 1, …, m, satisfies all the theoretical requirements of a cost function. It is
homogeneous of degree one, nondecreasing and concave in the prices of inputs
and nondecreasing in the quantity of output. However, it is not regarded as a
good functional form in general because it allows no substitution among the
inputs. The cost-minimizing demand functions corresponding to this cost function
are given by the Hotelling (1932)-Shephard (1953) Lemma as:

X_i(p, Y) = ∂C/∂p_i (p, Y) = α_i Y,  i = 1, …, m.

Thus all inputs are employed in fixed proportions. While zero substitution or
equivalently fixed proportions may be true for certain industries and processes, it
is not an assumption that should be imposed a priori. Rather, the data should be
allowed to indicate whether there is substitution among the inputs, which brings
up the question of “flexibility” of a functional form to be considered below.
Yet sometimes considerations of theoretical consistency alone, even locally, can
rule out many functional forms otherwise considered acceptable. This is demon-
strated by way of the following two examples, one taken from the empirical
analysis of producer behavior and one from consumer behavior.
First, we consider the system of derived demand functions of a cost-minimiz-
ing, price and output-taking firm with the constant-elasticity property:

ln X_i = α_i + Σ_{j=1}^m β_ij ln p_j + β_iY ln Y,  i = 1, 2, …, m,  (2.1)

where X_i is the quantity demanded of the ith input, p_j is the price of the jth
input, and Y is the quantity of output. The elasticities of demand with respect to
own and cross prices and the quantity of output are all constants:

∂ln X_i/∂ln p_j = β_ij,  i, j = 1, …, m,

∂ln X_i/∂ln Y = β_iY,  i = 1, …, m.

Functional forms with constant elasticities as parameters are often selected over
other functional forms with a similar degree of ease of estimation because the
values of the parameters are then independent of the units of measurement of the
variables. It can be readily verified that in the absence of further restrictions on
the values of the parameters β_ij's and β_iY's, such a system of derived input
demand functions is flexible, that is, it is capable of attaining any given value of
X (necessarily positive), ∂X/∂p′ and ∂X/∂Y at any specified positive values of
p = p̄ and Y = Ȳ through a suitable choice of the parameters α_i's and β_ij's.
However, if it were required, in addition, that the system of derived demand
functions in eq. (2.1) be consistent with cost-minimizing behavior on the part of
the producer, at least in a neighborhood of the prices of input and the quantity of
output, then certain restrictions must be satisfied by the parameters β_ij's and
β_iY's. Specifically, the function:

C(p, Y) = Σ_{i=1}^m exp{α_i + Σ_{j=1}^m (β_ij + δ_ij) ln p_j + β_iY ln Y},  (2.2)

where δ_ij = 1 if i = j, and 0 otherwise,

must have all the properties of a cost function and its partial derivatives with
respect to p_i:

∂C/∂p_i (p, Y) = (1/p_i) Σ_{k=1}^m (β_ki + δ_ki) exp{α_k + Σ_{j=1}^m (β_kj + δ_kj) ln p_j + β_kY ln Y},
  i = 1, …, m,  (2.3)

must be identically equal to the original system of derived demand functions in


eq. (2.1):

X_i = exp{α_i + Σ_{j=1}^m β_ij ln p_j + β_iY ln Y},  i = 1, …, m.  (2.4)

A cost function is homogeneous of degree one in the prices of inputs and the
first-order partial derivative of a cost function with respect to the price of an
input is therefore homogeneous of degree zero, implying:

Σ_{j=1}^m β_ij = 0,  i = 1, …, m.  (2.5)

A cost function is also concave in the prices of inputs, which implies:

(β_ii/p_i) exp{α_i + Σ_{j=1}^m β_ij ln p_j + β_iY ln Y} ≤ 0,  i = 1, …, m.

We conclude that β_ii ≤ 0. Moreover, since the value of a second-order cross-par-


tial derivative is independent of the order of differentiation wherever it exists,

∂²C/∂p_j∂p_i = ∂²C/∂p_i∂p_j,  i ≠ j, i, j = 1, …, m,

which implies:

∂X_i/∂p_j = ∂X_j/∂p_i,  i ≠ j, i, j = 1, …, m.  (2.6)

Applying eq. (2.6) to eq. (2.4) yields:

β_ij exp{α_i + Σ_{k=1}^m β_ik ln p_k + β_iY ln Y} / p_j
  = β_ji exp{α_j + Σ_{k=1}^m β_jk ln p_k + β_jY ln Y} / p_i,  i ≠ j, i, j = 1, …, m.  (2.7)
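Multiplying both sides of eq. (2.7) by p_i p_j and using eq. (2.4), the condition can be restated in terms of expenditures:

β_ij p_i X_i = β_ji p_j X_j,   i.e.   p_i X_i / p_j X_j = β_ji / β_ij whenever β_ij ≠ 0,

so that any nonzero cross effect forces the relative expenditures on the two inputs to be constant.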
There are three possible cases. First, β_ij = β_ji = 0, in which case each of the two
inputs has a zero cross elasticity with respect to the price of the other input.
Second, β_ij > 0 and β_ji > 0 (they cannot have opposite signs because of the
positivity of the exponential function and the nonnegativity of prices), in which
case the relative expenditures on the two inputs are constants independent of the
prices of inputs and quantity of output, implying the following restrictions on the

parameters:

β_ik − β_jk = 0,  k ≠ i, j; k = 1, …, m;
β_ii + 1 − β_ji = 0;
β_ij − (β_jj + 1) = 0;  (2.8)
β_iY − β_jY = 0;
β_ij e^{α_i} − β_ji e^{α_j} = 0.2

We note that for this case, (β_ii + 1) > 0 and (β_jj + 1) > 0, implying that the
own-price elasticities of the ith and jth inputs must be greater than minus unity
(or less than unity in absolute value), a significant restriction. We further note
that if β_ik ≠ 0 for some k, k ≠ i, j, then β_jk ≠ 0 for the same k. But if β_ik ≠ 0 and
β_jk ≠ 0, by eq. (2.7), β_ki ≠ 0 and p_i X_i/p_k X_k = β_ki/β_ik, a constant, and hence the
relative expenditures of all three inputs, i, j and k, are constants. Moreover, the
proportionality of expenditures implies that β_ii + 1 − β_ki = 0 for all k such that
β_ik ≠ 0, k ≠ i. Hence all β_ik's, k ≠ i, must have the same sign, positive in this
case. All β_ki's, k ≠ i, must have the same positive sign and magnitude. And
β_iY = β_jY = β_kY.
By considering all the i's it can be shown that the inputs are separable into n,
n ≤ m, mutually exclusive and jointly exhaustive groups such that

(1) Cross-price elasticities are zero between any two commodities belonging to
different groups;
(2) Relative expenditures are constant within each group.

Such a system of derived demand functions corresponds to a cost function of


the form:

C(p, Y) = Σ_{j=1}^n C_j(p^j, Y),  (2.9)

where p^j is the vector of prices of the jth group of inputs and each C_j(·) has the
form:

C_j(p^j, Y) = A_j Y^{β_j} Π_i p_i^{α_ji},

2This restriction results from setting the prices of all inputs and the quantity of output to unities.

where the product runs over the inputs i in group j, and

A_j > 0;  α_ji > 0, ∀i;  Σ_i α_ji = 1;  β_j > 0,  j = 1, …, n.
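One can check that this form reproduces both properties. By the Hotelling-Shephard Lemma, for an input i in group j,

X_i = ∂C_j/∂p_i (p^j, Y) = α_ji C_j(p^j, Y)/p_i,

so that within the group p_i X_i / p_k X_k = α_ji / α_jk, a constant, while X_i does not depend on the prices of inputs outside group j, so cross-price elasticities between groups are zero.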

Third, β_ij < 0 and β_ji < 0, in which case the relative expenditures on the two
inputs are again constants independent of the prices of inputs and quantity of
output, implying the same restrictions on the parameters as those in eq. (2.8).
However, as derived earlier, all β_ik's that are nonzero must have the same
sign, negative in this case. But then Σ_{j=1}^m β_ij cannot be zero as required by zero-
degree homogeneity. We conclude that a cost function of the form in eq. (2.9) is
the only possibility, with rather restrictive implications.
From this example we can see that the requirement of theoretical consistency,
even locally, may impose very strong restrictions on an otherwise quite flexible
functional form.
Second, we consider the complete system of demand functions of a utility-max-
imizing, budget-constrained consumer with the constant-elasticity property:3

ln X_i = α_i + Σ_{j=1}^m β_ij ln p_j + β_iM ln M,  i = 1, 2, …, m;  (2.10)

where X_i is the quantity demanded of the ith commodity, p_j is the price of the
jth commodity, and M is income (or equivalently total expenditure). The
elasticities of demand with respect to own and cross prices and to income are all
constants:

∂ln X_i/∂ln p_j = β_ij,  i, j = 1, …, m,

∂ln X_i/∂ln M = β_iM,  i = 1, …, m.

This is also known as the double-logarithmic system of consumer demand


functions. It can be readily verified that in the absence of further restrictions on
the values of the parameters β_ij's and β_iM's, such a system of consumer demand
functions is flexible, that is, it is capable of attaining any given value of X
(necessarily positive), ∂X/∂p′ and ∂X/∂M at any specified positive values of
p = p̄ and M = M̄ through a suitable choice of the parameters β_ij's and β_iM's.
However, if it were required, in addition, that the system of consumer demand
functions in eq. (2.10) be consistent with utility-maximizing behavior on the part
of the consumer, at least in a neighborhood of the prices of commodities and

3Such a system was employed by Schultz (1938), Wold with Juréen (1953) and Stone (1953).

income, it is necessary that the system of consumer demand functions satisfies


summability, that is:

Σ_{i=1}^m p_i X_i = Σ_{i=1}^m exp{α_i + Σ_{j=1}^m (β_ij + δ_ij) ln p_j + β_iM ln M} = M  (2.11)

identically. It will be shown that (local) summability alone, through eq. (2.11),
imposes strong restrictions on the parameters β_ij's and β_iM's.
By dividing both sides by M, eq. (2.11) can be transformed into:

Σ_{i=1}^m exp{α_i + Σ_{j=1}^m (β_ij + δ_ij) ln p_j + (β_iM − 1) ln M} = 1.  (2.12)

Differentiating eq. (2.12) with respect to ln p_k twice, we obtain:

Σ_{i=1}^m (β_ik + δ_ik)² exp{α_i + Σ_{j=1}^m (β_ij + δ_ij) ln p_j + (β_iM − 1) ln M} = 0,
  k = 1, …, m.  (2.13)

But

(β_ik + δ_ik)² ≥ 0,  i, k = 1, …, m,

and

exp{α_i + Σ_{j=1}^m (β_ij + δ_ij) ln p_j + (β_iM − 1) ln M} > 0,  i = 1, …, m.

Thus, in order for the left-hand side of eq. (2.13) to be zero, one must have:

(β_ik + δ_ik) = 0,  i, k = 1, …, m.

Differentiating eq. (2.12) with respect to ln M twice, we obtain:

Σ_{i=1}^m (β_iM − 1)² exp{α_i + Σ_{j=1}^m (β_ij + δ_ij) ln p_j + (β_iM − 1) ln M} = 0,  (2.14)

which by a similar argument implies

(β_iM − 1) = 0,  i = 1, …, m.



We conclude that (local) summability alone implies that the system of consumer
demand functions must take the form:

ln X_i = α_i − ln p_i + ln M,  i = 1, …, m;  Σ_{i=1}^m e^{α_i} = 1,  (2.15)

which is no longer flexible.4 For this system, the own-price elasticity is minus
unity, the cross-price elasticities are zeroes, and the income elasticity is unity for
the demand function of each and every commodity.
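It is straightforward to verify that eq. (2.15) is in fact globally valid. Writing X_i = e^{α_i} M / p_i,

Σ_{i=1}^m p_i X_i = M Σ_{i=1}^m e^{α_i} = M,   X_i(λp, λM) = X_i(p, M),

so summability and zero-degree homogeneity hold at all positive prices and incomes, and the Slutsky terms s_ij = ∂X_i/∂p_j + X_j ∂X_i/∂M form a symmetric matrix which is negative semidefinite by the Cauchy-Schwarz inequality, since the weights e^{α_i} sum to one.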
We conclude that theoretical consistency, even if applied only locally, can
indeed impose strong restrictions on the admissible range of the values of the
parameters of an algebraic functional form. It is essential in any empirical
application to verify that the algebraic functional form remains reasonably
flexible even under all the restrictions imposed by the theory. We shall return to
the concept of “flexibility” in Section 2.3 below.

2.2. Domain of applicability

The domain of applicability of an algebraic functional form can refer to a number


of different concepts. The most common usage of the domain of applicability
refers to the set of values of the independent variables over which the algebraic
functional form satisfies all the requirements for theoretical consistency. For
example, for an algebraic functional form for a unit cost function C(p; α), where
α is a vector of parameters, the domain of applicability of the algebraic functional
form, for given α, consists of the set

{p | p ≥ 0; C(p; α) ≥ 0; ∇C(p; α) ≥ 0; ∇²C(p; α) negative semidefinite}.

For an algebraic functional form for a complete system of consumer demand
functions, X(p, M; α), the domain of applicability, for given α, consists of the set

{p, M | p, M ≥ 0; X(p, M; α) ≥ 0;

X(λp, λM; α) = X(p, M; α); and

the corresponding Slutsky substitution matrix being symmetric

and negative semidefinite}.

4This result is well known. The proof here follows Jorgenson and Lau (1977) which contains a more
general result.

We shall refer to this concept of the domain of applicability as the extrapolative


domain since it is defined on the space of the independent variables with respect to
a given value of the vector of parameters α.
It would be ideal if the extrapolative domain of applicability consists of all
nonnegative (or positive) prices in the case of a unit cost function or of all
nonnegative (or positive) prices and incomes in the case of a complete system of
consumer demand functions for any value of the vector of parameters α.
Unfortunately this is in general not the case.
The first question that needs to be examined is thus: for any algebraic
functional form f(X; α), what is the set of α such that f(X; α) is theoretically
consistent for the whole of the applicable domain? For an algebraic functional
form for a unit cost function, the applicable domain is normally taken to be the
set of all nonnegative (positive) prices of inputs.5 For an algebraic functional
form for a complete system of consumer demand functions, the applicable
domain is normally taken to be the set of all nonnegative (positive) prices of
commodities and incomes.6 If, for given α, the algebraic functional form f(X; α)
is theoretically consistent over the whole of the applicable domain, it is said to be
globally theoretically consistent or globally valid. For many functional forms,
however, it may turn out that there is no such α such that f(X; α) is globally
valid, or that the set of such admissible α's may be quite small relative to the set
of possible α's. Only in very rare circumstances does the set of admissible α's
coincide with the set of possible α's.
We have already encountered two examples in Section 2.1 in which the set of
admissible values of the parameters that satisfy the requirements of theoretical
consistency is a significantly reduced subset of the set of possible values of the
parameters. For the system of constant-elasticity cost-minimizing input demand
functions, the number of independent parameters is reduced from m (inputs) ×
(m + 2) (one α_i, m β_ij's and one β_iY per equation) parameters to at most 2m
parameters by the requirements of local theoretical consistency. It may be verified,
however, that under the stated restrictions on its parameters, the cost function in
eq. (2.9) as well as the system of constant-elasticity input demand functions that
may be derived from it, are globally valid. Similarly, for the complete system of
constant-elasticity consumer demand functions, the number of independent
parameters is reduced from m (commodities) × (m + 2) (one α_i, m β_ij's and one
β_iM per equation) to (m − 1) parameters by the requirements of local summability.
It may be verified, however, that under the stated restrictions on its parameters
(own-price elasticities of −1, cross-price elasticities of 0 and income elasticities of
1), the complete system of constant-elasticity consumer demand functions is
globally valid.

5It is possible, and sometimes advisable, to take the applicable domain to be a compact convex
subset of the set of all nonnegative prices.
6It is possible, and sometimes advisable, to take the applicable domain to be a compact convex
subset of the set of all nonnegative prices and incomes.

These two examples share an interesting property: for given α, if the algebraic
functional form is locally valid, it is globally valid. This property, however, does
not always hold. We shall consider two examples of unit cost functions: the
generalized Leontief unit cost function introduced by Diewert (1971) and the
transcendental logarithmic unit cost function introduced by Christensen,
Jorgenson and Lau (1973).
The generalized Leontief unit cost function for a single-output, two-input
technology takes the form:

C(p_1, p_2) = α_0 p_1 + α_1 p_1^{1/2} p_2^{1/2} + α_2 p_2.  (2.16)

Local theoretical consistency requires that in a neighborhood of some prices
(p̄_1, p̄_2),

C(p̄_1, p̄_2) ≥ 0;

∇C(p̄_1, p̄_2) ≥ 0;  (2.17)

∇²C(p̄_1, p̄_2) negative semidefinite.

We note that a change in the units of measurement of the inputs leaves the values
of the cost function and the expenditures unchanged. Without loss of generality,
the price per unit of any input can be set equal to unity at any specified set of
positive prices by a suitable change in the units of measurement. The parameters
of the cost function, of course, must be appropriately rescaled. We therefore
assume that the appropriate rescaling of the parameters has been done and take
(p̄_1, p̄_2) to be (1,1). By direct computation:

C(1,1) = α_0 + α_1 + α_2,

∇C(1,1) = [α_0 + ½α_1, α_2 + ½α_1]′,

∇²C(1,1) = ¼α_1 [ −1   1 ]
                 [  1  −1 ].

It is clear that by choosing α_1 to be positive and sufficiently large all three
conditions in eq. (2.17) can be strictly satisfied at (1,1). We conclude that for local
theoretical consistency α_1 positive and sufficiently large is sufficient. (Actually α_1
nonnegative is necessary.)
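The concavity part can also be read off directly from the eigenvalues of the Hessian:

∇²C(1,1) = ¼α_1 [ −1   1 ]
                 [  1  −1 ],

which has eigenvalues 0 (along (1,1)′) and −½α_1 (along (1,−1)′), so that negative semidefiniteness at (1,1) holds if and only if α_1 ≥ 0.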

We shall now show that α_1 positive and sufficiently large alone is not sufficient
for global theoretical consistency. Global theoretical consistency requires that

C(p_1, p_2) = α_0 p_1 + α_1 p_1^{1/2} p_2^{1/2} + α_2 p_2 ≥ 0;  (2.18)

∇C(p_1, p_2) = [α_0 + ½α_1 p_1^{−1/2} p_2^{1/2}, α_2 + ½α_1 p_1^{1/2} p_2^{−1/2}]′ ≥ 0;  (2.19)

∇²C(p_1, p_2) = ¼α_1 [ −p_1^{−3/2} p_2^{1/2}    p_1^{−1/2} p_2^{−1/2} ]
                     [  p_1^{−1/2} p_2^{−1/2}   −p_1^{1/2} p_2^{−3/2} ]  negative semidefinite;  (2.20)

for all p_1, p_2 ≥ 0.


First, note that as long as (pi 2 0, negative semidefiniteness of the Hessian
matrix of the unit cost function always holds. Second, if a0 < 0, then for
sufficiently large p1 and sufficiently small p2, vC( pl, p2) will fail to be nonnega-
tive. We conclude that for global monotonicity, a0 2 0 and similarly a2 2 0. If (Ye,
(pi and a2 are all nonnegative, eq. (2.18) will be nonnegative for all nonnegative
prices. We conclude that the restrictions

(Ye2 0; cW,>O; (Y22 0, (2.21)

are necessary and sufficient for global theoretical consistency of the generalized
Leontief unit cost function.
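A numerical example, with purely illustrative parameter values, makes the gap between local and global consistency concrete. Take α_0 = −0.2, α_1 = 1 and α_2 = 0.1. At (1,1), C = 0.9 > 0 and ∇C = (0.3, 0.6)′ ≥ 0, so eq. (2.17) is satisfied. But at (p_1, p_2) = (100, 1),

∂C/∂p_1 = α_0 + ½α_1 p_1^{−1/2} p_2^{1/2} = −0.2 + ½(1/10) = −0.15 < 0,

and monotonicity fails, exactly as the violation of eq. (2.21) (here α_0 < 0) predicts.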
The transcendental logarithmic unit cost function for a single-output, two-input
technology takes the form:

ln C(p_1, p_2) = α_0 + α_1 ln p_1 + (1 − α_1) ln p_2 + (β_11/2)(ln p_1)²
  − β_11 ln p_1 ln p_2 + (β_11/2)(ln p_2)².  (2.22)

Local theoretical consistency at (1,1) requires that:

C(1,1) = e^{α_0} ≥ 0,

∇C(1,1) = [e^{α_0} α_1, e^{α_0}(1 − α_1)]′ ≥ 0,  (2.23)

∇²C(1,1) = e^{α_0} [ α_1(α_1−1)+β_11    α_1(1−α_1)−β_11 ]
                   [ α_1(1−α_1)−β_11    α_1(α_1−1)+β_11 ]  negative semidefinite.

eao is always greater than zero. 12 (pi 2 0 is necessary and sufficient for vC(l,l)
to be nonnegative. (~r((~r- l)+ j3i1 I 0 is necessary and sufficient for v2C(1, 1) to
be negative semidefinite. The set of necessary and sufficient restrictions on the
parameters for local theoretical consistency at (1,l) is therefore:
12cyr20; “r(ol,-l)+&rIo. (2.24)

We shall now show that the conditions in eq. (2.24) are not sufficient for global
theoretical consistency. Global theoretical consistency requires that

C(p_1, p_2) = exp{α_0 + α_1 ln p_1 + (1 − α_1) ln p_2 + (β_11/2)(ln p_1)²
  − β_11 ln p_1 ln p_2 + (β_11/2)(ln p_2)²} ≥ 0;  (2.25)

∇C(p_1, p_2) = [ (C(p_1, p_2)/p_1)(α_1 + β_11 ln p_1 − β_11 ln p_2),
  (C(p_1, p_2)/p_2)(1 − α_1 − β_11 ln p_1 + β_11 ln p_2) ]′ ≥ 0;  (2.26)

∂²C/∂p_1² (p_1, p_2) = (C(p_1, p_2)/p_1²)[(α_1 + β_11 ln p_1 − β_11 ln p_2)
  × (α_1 − 1 + β_11 ln p_1 − β_11 ln p_2) + β_11] ≤ 0,  (2.27)

for all p_1, p_2 > 0.7


Equation (2.27) is necessary and sufficient for the negative semidefiniteness of
∇²C(p_1, p_2) because C(p_1, p_2) is homogeneous of degree one. First, note that
eq. (2.25) is always satisfied because of the positivity of the exponential function.
Second, because the range of ln p_1 (and ln p_2) for positive prices is from minus
infinity to infinity, no matter what the sign of β_11 may be, as long as it is nonzero,
one can make ln p_1 arbitrarily large (positive) or small (negative) by choosing p_1
to be arbitrarily large or small, and thus causing the nonnegativity of ∇C(p_1, p_2)
to fail. Thus, for global monotonicity, β_11 = 0. If 1 ≥ α_1 ≥ 0 and β_11 = 0, eq. (2.27)
reduces to:

(C(p_1, p_2)/p_1²) α_1(α_1 − 1) ≤ 0,

which will always be satisfied. We conclude that the restrictions:

1 ≥ α_1 ≥ 0;  β_11 = 0,  (2.28)

⁷The logarithmic function is not defined at 0.



are necessary and sufficient for global theoretical consistency of the transcenden-
tal logarithmic unit cost function.
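The monotonicity argument can be seen at a glance numerically: the cost share w₁ = ∂ln C/∂ln p₁ = α₁ + β₁₁ ln p₁ - β₁₁ ln p₂ is linear in ln(p₁/p₂), so it leaves the unit interval once the relative price is extreme enough. A minimal sketch (ours; the parameter values are purely illustrative):

    import numpy as np

    def translog_share(p1, p2, alpha1=0.4, beta11=0.05):
        # w1 = dlnC/dlnp1 for the translog (2.22); monotonicity needs 0 <= w1 <= 1
        return alpha1 + beta11*np.log(p1/p2)

    for ratio in [1.0, 1e3, 1e6, 1e9]:
        print(ratio, translog_share(ratio, 1.0))
    # 1.0 0.4 / 1e3 0.745 / 1e6 1.09 / 1e9 1.44: the share exceeds one,
    # so nonnegativity of the demand for input 2 fails unless beta11 = 0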
We shall show later that under the necessary and sufficient restrictions for
global theoretical consistency on their parameters both the generalized Leontief
unit cost function and the transcendental logarithmic unit cost function lose their
flexibility.
Having established that functional forms such as the generalized Leontief unit
cost function and the transcendental logarithmic unit cost function can be
globally valid only under relatively stringent restrictions on the parameters, but
that they can be locally valid under relatively less stringent restrictions, we turn
our attention to a second question, namely, characterizing the domain of theoreti-
cal consistency for a functional form when it fails to be global.
As our first example, we consider again the generalized Leontief unit cost
function. We note that α₁ ≥ 0 is a necessary condition for local theoretical
consistency. Given α₁ ≥ 0, eq. (2.20) is identically satisfied. The set of prices of
inputs over which the generalized Leontief unit cost function is theoretically
consistent must satisfy:

C(p₁, p₂) = α₀p₁ + α₁p₁^{1/2}p₂^{1/2} + α₂p₂ ≥ 0, (2.29)

∇C(p₁, p₂) = [α₀ + ½α₁p₁^{-1/2}p₂^{1/2};  α₂ + ½α₁p₁^{1/2}p₂^{-1/2}] ≥ 0. (2.30)

If eq. (2.30) holds, eq. (2.29) must hold because

C(p₁, p₂) = ∇C(p₁, p₂)′p.

We conclude that the domain of theoretical consistency consists of the set of
prices which satisfy eq. (2.30). Eq. (2.30) can be rewritten as:

[α₀   ½α₁;  ½α₁   α₂]·[p₁^{1/2};  p₂^{1/2}] ≥ 0. (2.31)

Eq. (2.31) thus defines the domain of theoretical consistency of the generalized
Leontief unit cost function. If (1,1) were required to be in this domain, then the
additional restrictions

α₀ + ½α₁ ≥ 0;  α₂ + ½α₁ ≥ 0, (2.32)

must also be satisfied.



Next we consider the transcendental logarithmic unit cost function. We note
that 1 ≥ α₁ ≥ 0 and α₁(α₁-1)+β₁₁ ≤ 0 are necessary conditions for theoretical
consistency if (1,1) were required to be in the domain. If β₁₁ ≠ 0, we have seen
that the translog unit cost function cannot be globally theoretically consistent. We
consider the cases of β₁₁ > 0 and β₁₁ < 0 separately. If β₁₁ > 0, it can be shown⁸
that the domain of theoretical consistency is given by:

exp{(½ + (¼-β₁₁)^{1/2} - α₁)/β₁₁} ≥ p₁/p₂ ≥ exp{(½ - (¼-β₁₁)^{1/2} - α₁)/β₁₁}, (2.33)

where ¼ ≥ α₁(1-α₁) ≥ β₁₁ > 0. If β₁₁ < 0, it can be shown that the domain of
theoretical consistency is given by:

exp{-α₁/β₁₁} ≥ p₁/p₂ ≥ exp{(1-α₁)/β₁₁}. (2.34)

Our analysis shows that both the generalized Leontief and the translog unit
cost functions cannot be globally theoretically consistent for all choices of
parameters. However, even when global theoretical consistency fails, there is still
a set of prices of inputs over which theoretical consistency holds, and this set may
well be large enough for all practical purposes. The question which arises here is:
given that neither functional form is guaranteed to be globally theoretically
consistent, is there any objective criterion for choosing one over the other?
One approach that may provide a basis for comparison is the following: We
can imagine each functional form to be attempting to mimic the values of C, ∇C
and ∇²C at some arbitrarily chosen set of prices of inputs, say, without loss of
generality, (1,1). Once the values of C, ∇C and ∇²C are given, the unknown
parameters of each functional form are determined. We can now investigate,
holding C, ∇C and ∇²C constant, the domain of theoretical consistency of each
functional form. If the domain of theoretical consistency of one functional form
always contains the domain of theoretical consistency of the other, no matter
what the values of C, ∇C and ∇²C are, we say that the first functional form
dominates the second functional form in terms of extrapolative domain of
applicability. In general, however, there may not be dominance, and one func-
tional form may have a larger domain of theoretical consistency for some values
of C, ∇C and ∇²C and a smaller domain for other values.
We shall apply this approach to a comparison of the generalized Leontief and
transcendental logarithmic unit cost functions in the single-output, two-input
case.

⁸See Lau and Schaible (1984) for a derivation. See also Caves and Christensen (1980).

We choose (1,1) to be the point of interpolation. We let

C(1,1) = 1,⁹

∇C(1,1) = [k₂;  1-k₂], (2.35)

∇²C(1,1) = [-k₃   k₃;  k₃   -k₃],

where 1 ≥ k₂ ≥ 0 and k₃ ≥ 0. Eq. (2.35) with k₂ and k₃ ranging through all of
their admissible values represents all the theoretically consistent values that can
possibly be attained by a unit cost function, its gradient and its Hessian matrix at
(1,1).
We need to establish the rules that relate the values of the parameters to the
values of C, ∇C, and ∇²C at (1,1). We shall refer to such rules as the rules of
interpolation. For the generalized Leontief unit cost function, the rules of interpo-
lation are:

C(1,1) = 1 = α₀ + α₁ + α₂,

∇C(1,1) = [k₂;  1-k₂] = [α₀ + ½α₁;  α₂ + ½α₁],

∇²C(1,1) = [-k₃   k₃;  k₃   -k₃] = ¼α₁·[-1   1;  1   -1],

which imply:

α₁ = 4k₃,

α₀ = k₂ - 2k₃, (2.36)

α₂ = (1-k₂) - 2k₃.

It can be verified that α₀ + α₁ + α₂ is indeed equal to unity. Thus, the generalized

⁹C(1,1) may be set equal to any positive constant by an appropriate rescaling of all the parameters.
We choose C(1,1) = 1 for the sake of convenience.

Leontief unit cost function may be rewritten in terms of k₂ and k₃ as:

C(p₁, p₂) = (k₂-2k₃)p₁ + 4k₃p₁^{1/2}p₂^{1/2} + (1-k₂-2k₃)p₂. (2.37)

For the translog unit cost function, the rules of interpolation are:

C(1,1) = 1 = e^{α₀},

∇C(1,1) = [k₂;  1-k₂] = [α₁;  1-α₁],

∇²C(1,1) = [-k₃   k₃;  k₃   -k₃] = [α₁(α₁-1)+β₁₁]·[1   -1;  -1   1],

which imply:

α₀ = 0,
α₁ = k₂, (2.38)
β₁₁ = -k₃ + k₂(1-k₂).

Thus, the translog unit cost function may be rewritten as:

ln C(p₁, p₂) = k₂ ln p₁ + (1-k₂) ln p₂ + [k₂(1-k₂)-k₃](ln p₁)²/2
              - [k₂(1-k₂)-k₃] ln p₁ ln p₂ + [k₂(1-k₂)-k₃](ln p₂)²/2. (2.39)
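As a check on the rules of interpolation, the following sketch (ours, not from the original text; Python) constructs both unit cost functions from a given (k₂, k₃) via eqs. (2.36)-(2.39) and verifies numerically that each reproduces C(1,1) = 1 and ∇C(1,1) = (k₂, 1-k₂):

    import numpy as np

    def gl_cost(p, k2, k3):                     # eq. (2.37)
        p1, p2 = p
        return (k2-2*k3)*p1 + 4*k3*np.sqrt(p1*p2) + (1-k2-2*k3)*p2

    def tl_cost(p, k2, k3):                     # eqs. (2.38)-(2.39)
        p1, p2 = p
        b11 = k2*(1-k2) - k3
        return np.exp(k2*np.log(p1) + (1-k2)*np.log(p2)
                      + 0.5*b11*(np.log(p1) - np.log(p2))**2)

    def num_grad(f, p, h=1e-6):
        return np.array([(f(p+e) - f(p-e))/(2*h)
                         for e in (np.array([h, 0.0]), np.array([0.0, h]))])

    k2, k3 = 0.4, 0.1
    p = np.array([1.0, 1.0])
    for f in (lambda p: gl_cost(p, k2, k3), lambda p: tl_cost(p, k2, k3)):
        print(f(p), num_grad(f, p))             # 1.0 [0.4 0.6] for both forms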

We can now compare the domains of theoretical consistency of the two
functional forms holding k₂ and k₃ constant. For the generalized Leontief unit
cost function, the domain of theoretical consistency is defined by eq. (2.31) as:

[α₀   ½α₁;  ½α₁   α₂]·[p₁^{1/2};  p₂^{1/2}] ≥ 0,

or

[k₂-2k₃   2k₃;  2k₃   (1-k₂)-2k₃]·[p₁^{1/2};  p₂^{1/2}] ≥ 0. (2.40)

If k₂-2k₃ ≥ 0 and (1-k₂)-2k₃ ≥ 0, then the domain of theoretical consistency
is the whole of the nonnegative orthant of R². If k₂-2k₃ ≥ 0 and (1-k₂)-2k₃
< 0, then the domain of theoretical consistency is given by:

p₁/p₂ ≥ [(2k₃-(1-k₂))/(2k₃)]². (2.41)

If k₂-2k₃ < 0 and (1-k₂)-2k₃ ≥ 0, then the domain of theoretical consistency
is given by:

p₂/p₁ ≥ [(2k₃-k₂)/(2k₃)]². (2.42)

Finally, if k₂-2k₃ < 0 and (1-k₂)-2k₃ < 0, then the domain of theoretical
consistency is given by:

[(2k₃-k₂)/(2k₃)]² ≤ p₂/p₁ ≤ [2k₃/(2k₃-(1-k₂))]². (2.43)

For the translog unit cost function, the domain of theoretical consistency is
defined by eqs. (2.33) and (2.34). If β₁₁ = -k₃ + k₂(1-k₂) = 0, the domain of
theoretical consistency is the whole of the positive orthant of R² (and may be
uniquely extended to the whole of the nonnegative orthant of R²). If β₁₁ = -k₃
+ k₂(1-k₂) > 0, then the domain of theoretical consistency is given by:

exp{(½ + (¼-[k₂(1-k₂)-k₃])^{1/2} - k₂)/[k₂(1-k₂)-k₃]} ≥ p₁/p₂

  ≥ exp{(½ - (¼-[k₂(1-k₂)-k₃])^{1/2} - k₂)/[k₂(1-k₂)-k₃]}. (2.44)

If β₁₁ = -k₃ + k₂(1-k₂) < 0, then the domain of theoretical consistency is given
by:

exp{-k₂/[k₂(1-k₂)-k₃]} ≥ p₁/p₂ ≥ exp{(1-k₂)/[k₂(1-k₂)-k₃]}. (2.45)

With these formulas we can compare the domains of theoretical consistency for
different values of k₂ and k₃ such that 1 ≥ k₂ ≥ 0 and k₃ ≥ 0. First, suppose
k₃ = 0; then k₂-2k₃ ≥ 0 and (1-k₂)-2k₃ ≥ 0 and the domain of theoretical
consistency for the generalized Leontief unit cost function is the whole of the
nonnegative orthant of R². k₃ = 0 implies that β₁₁ = k₂(1-k₂) ≥ 0. Thus, the
domain of theoretical consistency for the translog unit cost function is given by:

exp{(½ + (¼-k₂(1-k₂))^{1/2} - k₂)/[k₂(1-k₂)]} ≥ p₁/p₂

  ≥ exp{(½ - (¼-k₂(1-k₂))^{1/2} - k₂)/[k₂(1-k₂)]},

which is clearly smaller than the whole of the nonnegative orthant of R². We note
that the maximum and minimum values of k₂(1-k₂) over the interval [0,1] are ¼
and 0 respectively. Given k₃ = 0, if k₂(1-k₂) = 0, β₁₁ = 0, which implies that
the domain of theoretical consistency is the whole of the nonnegative orthant of
R². If k₂(1-k₂) = ¼, β₁₁ = ¼, and the domain of theoretical consistency reduces
to a single ray through the origin defined by p₁ = p₂. If k₂(1-k₂) = 2/9 (k₂ = 1/3),
the domain of theoretical consistency is given by:

e^{3/2} = 4.48 ≥ p₁/p₂ ≥ 1.

Overall, we can say that the domain of theoretical consistency of the translog unit
cost function is not satisfactory for k₃ = 0.
Next suppose k₃ = k₂(1-k₂) (which implies that k₃ ≤ ¼); then either

k₂-2k₃ = k₂-2k₂+2k₂² = k₂(2k₂-1) < 0,

or

(1-k₂)-2k₃ = (1-k₂)-2k₂(1-k₂) = (1-k₂)(1-2k₂) < 0,

or both expressions vanish (which occurs only when k₂ = ½). If k₂ = ½, k₃ = ¼, and the domain of theoretical consistency of the generalized
Leontief unit cost function remains the whole of the nonnegative orthant of R².
However, if either of the first two cases is true (they cannot both be true), then the
domain of theoretical consistency for the generalized Leontief unit cost function
will be smaller than the whole of the nonnegative orthant of R². k₃ = k₂(1-k₂)
implies that β₁₁ = 0. Thus the domain of theoretical consistency for the translog
unit cost function is the whole of the positive orthant of R². We conclude that
neither functional form dominates the other. The cases of k₃ = 0 and k₃ = k₂(1-k₂)
correspond approximately to the Leontief and Cobb-Douglas production
functions respectively.
How do the two functional forms compare at some intermediate values of k₂
and k₃? Observe that the value of the elasticity of substitution at (1,1) is given by:

σ(1,1) = C(1,1)C₁₂(1,1)/[C₁(1,1)C₂(1,1)] = k₃/[k₂(1-k₂)].

If we let k₂ = 1/3, (1-k₂) = 2/3, then σ(1,1) = 3/4 is achieved at k₃ = 1/6. At these
values of k₂ and k₃, the domain of theoretical consistency of the generalized
Leontief unit cost function is still the whole of the nonnegative orthant of R². At
these values of k₂ and k₃, β₁₁ = -1/6 + 2/9 = 1/18 > 0. The domain of theoretical
consistency of the translog unit cost function is given by:

56,233 ≥ p₁/p₂ ≥ 0.0072.

We see that although it is short of the whole of the nonnegative orthant of R*, for
all practical purposes, the domain is large enough. Similarly ~(1, 1) = 3 is achieved
at k, = &. At these values of k, and k,, the domain of theoretical consistency of
the generalized Leontief unit cost function is given by:

(221&b-o,
4 P2

or p2 cannot be more than 6: times greater than pl. The domain of theoretical
consistency of the translog unit cost function is given by:

e6 = 403.4 2 2 2 0.000006.

We see that ignoring extremely small relative prices, the domain of theoretical
consistency of the translog unit cost function is much larger than that of the
generalized Leontief unit cost function.
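The bounds used in these comparisons follow mechanically from eqs. (2.41)-(2.45). A sketch (ours, not from the original text; Python) reproduces the numbers above, up to rounding:

    import numpy as np

    def gl_domain(k2, k3):
        # bounds on p2/p1 implied by eq. (2.40); None means no bound on that side
        lo = ((2*k3-k2)/(2*k3))**2 if k2-2*k3 < 0 else None           # eq. (2.42)
        hi = (2*k3/(2*k3-(1-k2)))**2 if (1-k2)-2*k3 < 0 else None     # eq. (2.41)
        return lo, hi

    def tl_domain(k2, k3):
        # bounds on p1/p2 from eqs. (2.44) and (2.45)
        b11 = k2*(1-k2) - k3
        if b11 == 0:
            return None, None                   # whole positive orthant
        if b11 > 0:
            s = np.sqrt(0.25 - b11)
            return np.exp((0.5-s-k2)/b11), np.exp((0.5+s-k2)/b11)
        return np.exp((1-k2)/b11), np.exp(-k2/b11)

    print(tl_domain(1/3, 1/6))     # ~ (0.0072, 56219.): beta_11 = 1/18 > 0
    print(gl_domain(1/3, 5/18))    # (0.16, None): p2/p1 >= 4/25
    print(tl_domain(1/3, 5/18))    # ~ (6.1e-06, 403.4): beta_11 = -1/18 < 0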
The comparison of the domains of theoretical consistency of different func-
tional forms for given values of k₂ and k₃ is a worthwhile enterprise and should
be systematically extended to other functional forms and to the three or more-
input cases. The lack of space does not permit an exhaustive analysis here. It
suffices to note that the extrapolative domain of applicability does not often
provide a clearcut criterion for the choice of functional forms in the absence of

a priori information. Of course, if it is known a priori whether the elasticity of


substitution is likely to be closer to zero or one a more appropriate choice can be
made.
However, it is useful to consider a functional form f(X, α) as in turn a
function g(X, k) = f(X, α(k)), where α(k) represents the rules of interpolation.
If one can prespecify the set of X's of interest, over which theoretical consistency
must hold, one can then ask the question: What is the set of k's such that a given
functional form f(X, α(k)) = g(X, k) will have a domain of theoretical con-
sistency (in X) that contains the prespecified set of X's? We can call this set of
k's the "interpolative domain" of the functional form. It characterizes the type of
underlying behavior of the data for which a given functional form may be
expected to perform satisfactorily.

2.3. Flexibility

Flexibility means the ability of the algebraic functional form to approximate


arbitrary but theoretically consistent behavior through an appropriate choice of
the parameters. The concept of flexibility, first introduced by Diewert (1973,
1974), is best illustrated with examples. First, we consider the cost function:

C(p, y) = y·[Σᵢ₌₁ᵐ αᵢpᵢ],  αᵢ > 0,  i = 1,...,m.

The derived demand functions are given by the Hotelling (1932)-Shephard (1953)
Lemma as:

xᵢ = (∂C/∂pᵢ)(p, y) = αᵢy,  i = 1,...,m.

The inputs are always employed in fixed proportions, whatever the values of α
may be. Moreover, own and cross-price elasticities of all inputs are always zero!
Thus, although the cost function satisfies the criterion of theoretical consistency,
it cannot be considered “flexible” because it is incapable of approximating any
theoretically consistent cost function satisfactorily through an appropriate choice
of the parameters.¹⁰ If we are interested in estimating the price elasticities of the
derived demand for say labor or energy, we would not employ the linear cost
function as an algebraic functional form because the price elasticities of demands
that can be derived from such a cost function are by a priori assumption always
zero.

¹⁰There is, of course, the question of what satisfactory approximation means, which is addressed
below.

The degree of flexibility required of an algebraic functional form depends on


the purpose at hand. In the empirical analysis of producer behavior, flexibility is
generally taken to mean that the algebraic functional form used, be it a produc-
tion function, a profit function, or a cost function, must be capable of generating
output supply and input demand functions whose own and cross-price elasticities
can assume arbitrary values subject only to the requirements of theoretical
consistency at any arbitrarily given set of prices through an appropriate choice of
the parameters. We can give a working definition of “flexibility” for an algebraic
functional form for a unit cost function as follows:

Definition

An algebraic functional form for a unit cost function C(p; α) is said to be flexible
if at any given set of nonnegative (positive) prices of inputs the parameters of the
cost function, α, can be chosen so that the derived unit-output input demand
functions and their own and cross-price elasticities are capable of assuming
arbitrary values at the given set of prices of inputs subject only to the require-
ments of theoretical consistency.¹¹

More formally, let C(p; α) be an algebraic functional form for a unit cost
function where α is a vector of unknown parameters. Then flexibility implies and
is implied by the existence of a solution α(p̄; C̄, x̄, S̄) to the following set of
equations:

C(p̄; α) = C̄,

∇C(p̄; α) = x̄, (2.46)

∇²C(p̄; α) = S̄,

for every nonnegative (positive) value of p̄, C̄ and x̄ and negative semidefinite
value of S̄ such that C̄ = p̄′x̄ and S̄p̄ = 0.¹² In other words, for every vector of
prices of inputs p̄, it is possible to choose the vector of parameters α so that at the
given p̄, the values of the unit cost function, its gradient and its Hessian matrix
are equal to prespecified values of C̄, x̄ and S̄ respectively.
An example of a flexible algebraic functional form for a unit cost function is
the generalized Leontief cost function. The generalized Leontief unit cost function

¹¹This definition of flexibility is sometimes referred to as "second-order" flexibility because it
implies that the gradient and the Hessian matrix of the unit cost function with respect to the prices of
inputs are capable of assuming arbitrary nonnegative and negative semidefinite values respectively.
¹²Negative semidefiniteness of S̄ follows from homogeneity of degree one and concavity of the unit
cost function in the prices of inputs.

is given by:

C(p) = Σᵢ Σⱼ βᵢⱼ pᵢ^{1/2}pⱼ^{1/2}, (2.47)

where without loss of generality βᵢⱼ = βⱼᵢ, ∀i, j. The elements of the gradient and
Hessian matrix of the generalized Leontief unit cost function are given by:

∂C/∂pᵢ = βᵢᵢ + Σⱼ≠ᵢ βᵢⱼ pᵢ^{-1/2}pⱼ^{1/2},  i = 1,...,m; (2.48)

∂²C/∂pᵢ∂pⱼ = ½βᵢⱼ pᵢ^{-1/2}pⱼ^{-1/2},  i ≠ j,  i, j = 1,...,m; (2.49)

∂²C/∂pᵢ² = -½ Σⱼ≠ᵢ βᵢⱼ pᵢ^{-3/2}pⱼ^{1/2},  i = 1,...,m. (2.50)

In order to demonstrate the flexibility of the generalized Leontief unit cost
function, we need to show that given the left-hand sides of eqs. (2.47) through
(2.50) and p̄, one can always find a set of parameters β that will solve these
equations exactly. First, observe that eq. (2.47) can always be solved by an
appropriate scaling of the parameters provided that

∂C̄/∂pᵢ = βᵢᵢ + Σⱼ≠ᵢ βᵢⱼ p̄ᵢ^{-1/2}p̄ⱼ^{1/2} ≥ 0,  i = 1,...,m.

Second, eq. (2.48) can always be solved by an appropriate choice of the βᵢᵢ's,
whatever the value of

Σⱼ≠ᵢ βᵢⱼ p̄ᵢ^{-1/2}p̄ⱼ^{1/2},  i = 1,...,m.

Third, eq. (2.49) can always be solved by setting

βᵢⱼ = 2p̄ᵢ^{1/2}p̄ⱼ^{1/2}(∂²C̄/∂pᵢ∂pⱼ),  i ≠ j,  i, j = 1,...,m.

Finally, because of homogeneity of degree zero of ∂C/∂pᵢ,

(∂²C/∂pᵢ²)pᵢ = -Σⱼ≠ᵢ (∂²C/∂pᵢ∂pⱼ)pⱼ,

so that

∂²C/∂pᵢ² = -Σⱼ≠ᵢ (∂²C/∂pᵢ∂pⱼ)(pⱼ/pᵢ),  i = 1,...,m,

which satisfies eq. (2.50) identically. We note that

Σⱼ≠ᵢ βᵢⱼ p̄ᵢ^{-3/2}p̄ⱼ^{1/2} ≥ 0,  i = 1,...,m,

in order for the Hessian matrix to be negative semidefinite. We conclude that the
generalized Leontief unit cost function is flexible.
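The interpolation argument can be illustrated numerically. The sketch below (ours, not from the original text; Python, for m = 3, with an arbitrary target gradient x̄ and a target Hessian S̄ satisfying S̄p̄ = 0) recovers the βᵢⱼ's from eqs. (2.48) and (2.49) and confirms that the implied gradient and Hessian match the targets exactly:

    import numpy as np

    def fit_gl(pbar, xbar, Sbar):
        # off-diagonal betas from eq. (2.49), then diagonal betas from eq. (2.48)
        m = len(pbar)
        B = 2.0*np.sqrt(np.outer(pbar, pbar))*Sbar
        for i in range(m):
            B[i, i] = xbar[i] - sum(B[i, j]*np.sqrt(pbar[j]/pbar[i])
                                    for j in range(m) if j != i)
        return B

    def gl_grad_hess(p, B):
        m = len(p)
        g = np.array([B[i, i] + sum(B[i, j]*np.sqrt(p[j]/p[i])
                                    for j in range(m) if j != i)
                      for i in range(m)])                             # eq. (2.48)
        H = np.array([[0.5*B[i, j]/np.sqrt(p[i]*p[j]) if i != j else
                       -0.5*sum(B[i, k]*np.sqrt(p[k])/p[i]**1.5
                                for k in range(m) if k != i)
                       for j in range(m)] for i in range(m)])         # eqs. (2.49)-(2.50)
        return g, H

    pbar = np.array([1.0, 2.0, 0.5])
    xbar = np.array([0.3, 0.2, 1.0])
    Sbar = np.array([[-0.30,  0.10,   0.20],    # negative semidefinite, Sbar @ pbar = 0
                     [ 0.10, -0.0625, 0.05],
                     [ 0.20,  0.05,  -0.60]])
    g, H = gl_grad_hess(pbar, fit_gl(pbar, xbar, Sbar))
    print(np.allclose(g, xbar), np.allclose(H, Sbar))   # True True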
Another example of a flexible algebraic functional form for a unit cost function
is the transcendental logarithmic cost function. The translog unit cost function is
given by:

ln C(p) = C₀ + Σᵢ αᵢ ln pᵢ + ½ Σᵢ Σⱼ βᵢⱼ ln pᵢ ln pⱼ, (2.51)

where Σᵢαᵢ = 1; Σⱼβᵢⱼ = 0, ∀i, and without loss of generality βᵢⱼ = βⱼᵢ, ∀i, j. The
elements of the gradient and Hessian matrix of the translog unit cost function are
given by:

∂C/∂pᵢ = (C/pᵢ)(∂ ln C/∂ ln pᵢ),  i = 1,...,m; (2.52)

∂²C/∂pᵢ∂pⱼ = (C/pᵢpⱼ)[(∂ ln C/∂ ln pᵢ)(∂ ln C/∂ ln pⱼ) + βᵢⱼ],  i ≠ j,  i, j = 1,...,m; (2.53)

∂²C/∂pᵢ² = (C/pᵢ²)[(∂ ln C/∂ ln pᵢ)² - (∂ ln C/∂ ln pᵢ) + βᵢᵢ],  i = 1,...,m. (2.54)

In order to demonstrate the flexibility of the translog unit cost function, we
need to show that given the left-hand sides of eqs. (2.51) through (2.54) and p̄,
one can always find a set of parameters C₀, α and β that will solve these equations
exactly. First, we observe that eq. (2.51) can always be satisfied by an appropriate
choice of C₀. Eq. (2.52) can be rewritten as

(pᵢ/C)(∂C/∂pᵢ) = αᵢ + Σⱼ βᵢⱼ ln pⱼ,  i = 1,...,m,

which can always be solved by an appropriate choice of the αᵢ's, αᵢ ≥ 0,
i = 1,...,m, and Σᵢαᵢ = 1, subject to Σᵢβᵢⱼ = 0, ∀j. Eqs. (2.53) and (2.54) combined
may be written as:

B = (1/C)·diag[p]·∇²C(p)·diag[p] - ww′ + diag[w], (2.55)

where wᵢ = ∂ ln C/∂ ln pᵢ, i = 1,...,m, B = [βᵢⱼ], and diag[·] denotes a diagonal
matrix with the indicated elements on the diagonal. Every term on the right-hand
side of eq. (2.55) is either known or specified. Thus, β can be chosen, subject to
Σᵢβᵢⱼ = 0, ∀j, to satisfy any negative semidefinite matrix specified for ∇²C(p). We
conclude that the translog unit cost function is flexible.
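Eq. (2.55) is also the practical recipe for interpolating a translog unit cost function. A sketch (ours, not from the original text; Python; the target values are illustrative) recovers the β matrix and confirms that the homogeneity restrictions, symmetry and zero column sums, hold automatically:

    import numpy as np

    def translog_B(pbar, Cbar, w, Hess):
        # eq. (2.55): w is the target share vector dlnC/dlnp, Hess the target Hessian
        P = np.diag(pbar)
        return P @ Hess @ P / Cbar - np.outer(w, w) + np.diag(w)

    pbar = np.array([1.0, 2.0, 0.5])
    w = np.array([0.2, 0.5, 0.3])                   # target shares, summing to one
    Hess = np.array([[-0.30,  0.10,   0.20],        # target Hessian with Hess @ pbar = 0
                     [ 0.10, -0.0625, 0.05],
                     [ 0.20,  0.05,  -0.60]])
    B = translog_B(pbar, 1.0, w, Hess)
    print(np.allclose(B, B.T), np.allclose(B.sum(axis=0), 0.0))   # True True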
Similarly, we can give a working definition of “flexibility” for an algebraic
functional form for a complete system of consumer demand functions as follows:
Definition

An algebraic functional form for a complete system of consumer demand
functions, F(p, M, α), is said to be flexible if at any given set of nonnegative
(positive) prices of commodities and income or total expenditure the parameters,
α, of the complete system of consumer demand functions can be chosen so that
the consumer demand functions and their own and cross-price and income
elasticities are capable of assuming arbitrary values at the given set of prices of
commodities and income subject only to the requirements of theoretical con-
sistency.

More formally, let F*(p*, M*; α) be a vector-valued algebraic functional form
for a complete system of consumer demand functions expressed in natural
logarithmic form, that is:

Fᵢ*(p*, M*; α) = ln Xᵢ,  i = 1,...,m;

pᵢ* = ln pᵢ,  i = 1,...,m;
M* = ln M.

Then flexibility implies and is implied by the existence of a solution α(p̄*, M̄*; F̄*,
∂F̄*/∂p*′, ∂F̄*/∂M*) to the following set of equations:

F*(p̄*, M̄*; α) = F̄*,

(∂F*/∂p*′)(p̄*, M̄*; α) = ∂F̄*/∂p*′, (2.56)

(∂F*/∂M*)(p̄*, M̄*; α) = ∂F̄*/∂M*,

for every positive value of p̄*, M̄* and F̄* and symmetric negative semidefinite
value of the corresponding Slutsky substitution matrix, which depends on p̄*, M̄*,
∂F̄*/∂p*′ and ∂F̄*/∂M*.
We note that an equivalent definition may be phrased in terms of the natural
derivatives of the demand functions with respect to the prices of commodities and
income rather than the logarithmic derivatives or elasticities.
An example of a flexible algebraic functional form for a complete system of
consumer demand functions is the transcendental logarithmic demand system
introduced by Christensen, Jorgenson and Lau (1975). The transcendental loga-
rithmic demand system is given by:

pᵢXᵢ/M = [αᵢ + Σⱼ βᵢⱼ(ln pⱼ - ln M)] / [-1 + Σⱼ βMⱼ(ln pⱼ - ln M)],  i = 1,...,m, (2.57)

where βᵢⱼ = βⱼᵢ, i, j = 1,...,m, and Σᵢβᵢⱼ = βMⱼ, j = 1,...,m. It may be verified
that this complete system of demand functions can attain, at any prespecified
positive values of p = p̄ and M = M̄, any given positive value of X̄ and negative
semidefinite value of the Slutsky substitution matrix S̄ such that S̄p̄ = 0, where a
typical element of S̄ is given by

S̄ᵢⱼ = ∂Xᵢ/∂pⱼ + Xⱼ(∂Xᵢ/∂M),  i, j = 1,...,m,

through a suitable choice of the parameters βᵢⱼ's and αᵢ's.


Flexibility of a functional form is desirable because it allows the data the
opportunity to provide information about critical parameters. An inflexible

functional form often prescribes the value, or at least the range of values, of the
critical parameters. In general, the degree of flexibility required depends on the
application. For most applications involving producer or consumer behavior,
the flexibility required is that the own and cross-price derivatives (or equivalently
the elasticities) of demand for inputs or commodities be free to attain any set of
theoretically consistent values. For other applications, the desired degree of
flexibility may be greater or less. Sometimes a knowledge of the sign and/or
magnitude of a third-order derivative may be necessary. For example, in the
analysis of behavior under uncertainty, the third derivative of the utility function
of the decision maker plays a critical role in the comparative statics. In the
empirical analysis of such situations, the algebraic functional form should be
chosen so that it is “third-order” flexible, that is, it permits the data to inform
about the sign and/or magnitude of the third derivative of the utility function (or
equivalently, the second-order derivative of the demand function). In other words,
we need to know not only the elasticity of demand, but also the rate of change of
the elasticity of demand.

2.4. Computational facility

The computational facility of a functional form implies one or more of the


following properties.
(1) Its unknown parameters are easy to estimate from the data. Usually what
this means is that the functional form is, after a known transformation if
necessary, linear-in-parameters, and if there are restrictions on the parameters
they are linear restrictions. This is called the “Linearity-in-Parameters” property.
(2) The functional form and any functions of interest derived from it are
represented in explicit closed form. For example, it is often not enough that the
production function is linear in parameters. The input demand functions deriva-
ble from it should be representable in explicit closed form and preferably be
linear in parameters as well. This property makes it easy to manipulate and
calculate the values of different quantities of economic interest and their deriva-
tives with respect to the independent variables. This is called the property of
“Explicit Representability”.
Explicit representability of a complete system of demand functions for inputs
or commodities cannot in general be guaranteed if one begins with an arbitrary
production function or utility function. In fact, the only known production
functions that give rise to a system of explicitly representable input demand
functions are those that are homothetic after a translation of the origin if
necessary. Similarly, the only known utility functions that give rise to a complete
system of explicitly representable consumer demand functions are those that are
homothetic after a translation of the origin if necessary. By contrast, if one begins
by specifying a profit or cost function or an indirect utility function, explicit

representability is guaranteed. Given a profit or cost function, the system of input
demand functions is, by the Hotelling-Shephard Lemma, the gradient of the profit
or cost function with respect to the vector of prices of inputs. Given an indirect
utility function, the complete system of consumer demand functions is given by
Roy's (1943) Identity:

Xᵢ = -(∂V/∂pᵢ)(p, M) / (∂V/∂M)(p, M),  i = 1,...,m,

where V(p, M) is the indirect utility function.


(3) If the functional form pertains to a complete system, say, of either
cost-minimizing input demand functions or consumer demand functions, the
different functions in the same system should have the same algebraic form but
different parameters. This is called the property of “Uniformity”.
Uniformity of a functional form is desirable not only for aesthetic reasons but
also because it simplifies considerably the statistical estimation and other related
computations. In essence the same procedure and computer programming can be
applied to all of the different functions in the same complete system if their
algebraic forms are the same.
(4) The number of parameters in the functional form should be the minimum
possible number required to achieve a given desired degree of flexibility. In many
instances the number of observations is quite small and conservation of the
degrees of freedom is an important consideration. In addition, the cost of
computation for a given problem increases approximately at the rate of n², where
n is the number of parameters to be estimated. This is called the property of
"Parsimony".
We may add that both the generalized Leontief and the translog unit cost
functions give rise to a system of cost-minimizing input demand functions that
satisfies all four of the properties here.
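The dual route also makes properties (1) and (2) concrete: starting from a cost function that is linear in parameters, the Hotelling-Shephard Lemma delivers input demand functions in explicit closed form, uniform across inputs. A small symbolic sketch (ours, not from the original text; Python with sympy, for the two-input generalized Leontief case):

    import sympy as sp

    p1, p2, y = sp.symbols('p1 p2 y', positive=True)
    b11, b12, b22 = sp.symbols('b11 b12 b22')

    # two-input generalized Leontief cost function, linear in (b11, b12, b22)
    C = y*(b11*p1 + 2*b12*sp.sqrt(p1*p2) + b22*p2)

    # Hotelling-Shephard Lemma: input demands are the price gradient of C
    x1 = sp.simplify(sp.diff(C, p1))    # mathematically y*(b11 + b12*sqrt(p2/p1))
    x2 = sp.simplify(sp.diff(C, p2))    # mathematically y*(b22 + b12*sqrt(p1/p2))
    print(x1, x2)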

2.5. Factual conformity

Factual conformity implies consistency of the functional form with known


empirical facts. Fortunately or unfortunately (depending on one’s point of view),
there are few known, generally accepted and consistently confirmed facts. Perhaps
the only generally accepted and consistently confirmed known empirical fact is
Engel’s Law, which says that the demand for food, or primary commodities in
general, has an income elasticity of less than unity.¹³ While this fact may seem
innocuous enough, it rules out the use of any homothetic direct or indirect utility

¹³See Houthakker (1957), (1965).



function as the basis for an empirical study of consumer demand because


homotheticity implies that the income elasticity of demand of every commodity is
unity.
Less established but still widely accepted empirical facts include:
(1) the six-tenth factor rule between capital cost and output capacity for certain
chemical and petrochemical processing industries;
(2) the elasticities of substitution between all pairs of inputs in the three or
more-input case are not all identical;
(3) the proportionality of the quantity of raw material input to the quantity of
output (for example, iron ore and steel);
(4) not all Engel curves are linear in income.
Each of these facts has implications on the choice of functional forms. For
example, the six-tenth factor rule is inconsistent with the use of functional forms
for production functions that are homothetic (unless all other inputs also satisfy
the six-tenth factor rule, which is generally not the case). The lack of identity
among the elasticities of substitution between all pairs of inputs suggests that the
Constant-Elasticity-of-Substitution (and hence the Cobb-Douglas) production
function is not an appropriate algebraic functional form. The proportionality of
raw material input to output suggests that the production function must have one
of the two following forms:

Y = min[f(X), M/a],

where X is the vector of all other inputs, f(X) is a function of X, M is the
quantity of raw material input, and a is a constant; or

Y = f(X)·M.
The fact that not all Engel curves (of different commodities) are linear suggests
that the use of the Gorman (1953) condition for the analysis of aggregate
consumer demand can be justified only as an approximation.¹⁴
In the choice of algebraic functional forms, one should avoid, insofar as
possible, the selection of one which has implications that are at variance with
established facts.

3. Compatibility of the criteria for the selection of functional forms

A natural question that arises is: Are there algebraic functional forms that satisfy
all five categories of criteria that we have laid down in Section 2? In other words,
does there exist an algebraic functional form that is globally theoretically con-

¹⁴The Gorman condition on the utility function justifies the existence of aggregate demand
functions as functions of aggregate income and is widely applied in empirical analyses. See, for
example, Blackorby, Boyce and Russell (1978).

sistent (for all theoretically consistent data), flexible, linear-in-parameters, ex-


plicitly representable, uniform (if there is more than one function in the system),
parsimonious in the number of parameters and conforms to known facts?
Obviously, the answer depends on the specific application. In Section 3.1, we give
an example of the incompatibility of a global extrapolative domain of applicabil-
ity and flexibility. In Section 3.2, we give an example of the incompatibility of
computational facility and factual conformity. In Section 3.3, we prove an
impossibility theorem which says that there does not exist an algebraic functional
form for a unit cost function which has a global extrapolative domain of
applicability and satisfies the criteria of flexibility and computational facility.
Thus, in general, one should not expect to find an algebraic functional form
that satisfies all five categories of criteria. For specific applications, especially in
situations in which the relevant theory imposes little or no restriction, it may be
possible that such an algebraic functional form can be found.

3.1. Incompatibility of a global domain of applicability and flexibility

Consider the generalized Leontief unit cost function for a single-output, two-
input technology:

C(p₁, p₂) = α₀p₁ + α₁p₁^{1/2}p₂^{1/2} + α₂p₂,

which, as shown in Section 2.2, is theoretically consistent over the whole nonnega-
tive orthant of prices of inputs if and only if α₀ ≥ 0, α₁ ≥ 0 and α₂ ≥ 0. We shall
show that under these parametric restrictions, the unit cost function is not
flexible, that is, the parameters cannot be chosen such that it can attain arbitrary
but theoretically consistent values of C, ∇C and ∇²C at an arbitrary set of prices
of inputs.
Without loss of generality let the set of prices be (1,1), and let the arbitrarily
chosen values of C, ∇C and ∇²C at (1,1) be

C(1,1) = k₁ ≥ 0,

∇C(1,1) = [k₂;  k₁-k₂] ≥ 0, (3.1)

∇²C(1,1) = [-k₃   k₃;  k₃   -k₃],  k₃ ≥ 0,

where the restrictions on ∇C(1,1) and ∇²C(1,1) reflect homogeneity of degree
one, monotonicity and concavity of the unit cost function in the prices of inputs.
Flexibility requires that for arbitrarily given k₁, k₂, k₃ ≥ 0, with k₁-k₂ ≥ 0, the

parameters α₀, α₁, α₂ ≥ 0 can be found such that

C(1,1) = α₀ + α₁ + α₂ = k₁,

(∂C/∂p₁)(1,1) = α₀ + ½α₁ = k₂, (3.2)

(∂²C/∂p₁²)(1,1) = -¼α₁ = -k₃.

The reader can verify that satisfaction of eq. (3.2) is equivalent to the satisfaction
of eq. (3.1). It is easy to see that α₁ can always be chosen to be 4k₃ and hence
≥ 0. However,

α₀ + ½α₁ = α₀ + 2k₃ = k₂,

cannot hold with α₀ ≥ 0 if 2k₃ > k₂. Thus, flexibility fails if the generalized
Leontief unit cost function is required to be theoretically consistent globally. We
note that 2k₃ > k₂ implies that

-(p₁/x₁)(∂x₁/∂p₁) = -p₁(∂²C/∂p₁²)/(∂C/∂p₁) = k₃/k₂ > ½.

Thus, the generalized Leontief unit cost function, if it were to be required to be
valid for all nonnegative prices of inputs, cannot approximate a technology with
an own-price elasticity of input demand of greater than ½!
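Numerically the failure is immediate; a minimal sketch (ours, not from the original text, with illustrative target values):

    # try to interpolate (k1, k2, k3) = (1, 0.3, 0.2), i.e. a demand elasticity
    # k3/k2 = 2/3 > 1/2, with the rules implied by eq. (3.2)
    k1, k2, k3 = 1.0, 0.3, 0.2
    a1 = 4*k3            # 0.8, from the Hessian condition
    a0 = k2 - 2*k3       # -0.1 < 0: the global restrictions (2.21) fail
    a2 = k1 - k2 - 2*k3  # 0.3
    print(a0, a1, a2)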
This example shows that a global extrapolative domain of applicability may be
incompatible with flexibility.
The first related question is: Given the rules of interpolation embodied in eq.
(3.2), what is the domain of values of k₁, k₂ and k₃ that will allow the
generalized Leontief unit cost function to be globally theoretically consistent? We
note from eq. (3.2) that the parameters may be obtained by interpolation as:

α₀ = k₂ - 2k₃ ≥ 0,

α₁ = 4k₃ ≥ 0,

α₂ = k₁ - k₂ - 2k₃ ≥ 0,

which must all be nonnegative. Moreover, by monotonicity, k₁ - k₂ ≥ 0. The
inequalities are, however, all homogeneous of degree one; we may thus arbitrarily
normalize k₁ to unity. The domain of k₂, k₃ can then be represented by the

following set of inequalities:

k₂* - 2k₃* ≥ 0,
1 - k₂* - 2k₃* ≥ 0,
1 - k₂* ≥ 0,
k₂* ≥ 0;  k₃* ≥ 0.

These inequalities can be illustrated graphically in Figure 1. The interpolative
domain of the generalized Leontief unit cost function, if it were required to be
globally theoretically consistent, consists only of the shaded area. The shaded area
falls far short of the constraint for theoretical consistency, that is, 1 - k₂* ≥ 0,
k₂* ≥ 0 and k₃* ≥ 0. It is clear that if the generalized Leontief unit cost function
were to be required to be globally theoretically consistent, it can be flexible only
for those values of k₂* and k₃* in the shaded area.
The elasticity of substitution at (1,1) may be computed as:

σ = CC₁₂/(C₁C₂) = k₁k₃/[k₂(k₁-k₂)] = k₃*/[k₂*(1-k₂*)].

The minimum value of σ over the admissible domain of k*'s is of course zero.
The maximum value can be shown to occur at k₂* = ½ and k₃* = ¼, that is, σ = 1.
Thus, the generalized Leontief unit cost function, if it were to be globally
theoretically consistent, cannot attain an elasticity of substitution greater than
unity.
The own and cross-price elasticities of the input demand functions are given
by:

(pⱼ/xᵢ)(∂xᵢ/∂pⱼ) = [pⱼ/(∂C/∂pᵢ)]·(∂²C/∂pᵢ∂pⱼ),  i, j = 1, 2.

At (1,1), they are given by:

∂ln x₁/∂ln p₁ = -k₃*/k₂*,

∂ln x₁/∂ln p₂ = k₃*/k₂*,

∂ln x₂/∂ln p₁ = k₃*/(1-k₂*),

∂ln x₂/∂ln p₂ = -k₃*/(1-k₂*).

Figure 1

Referring to Figure 1, the maximum absolute value of ∂ln xᵢ/∂ln pⱼ within the
admissible region is ½, the minimum absolute value is 0.
It should be noted that the incompatibility of a global extrapolative domain of
applicability and flexibility is a common problem and not limited to the gener-
alized Leontief unit cost function. It is also true of the translog unit cost function.
If the translog unit cost function were required to be globally theoretically
consistent, the only value of the elasticity of substitution it can take at (1,1) is unity!
The purpose of Section 3.1 is to show that the two criteria of domain of
applicability and flexibility are often incompatible. In Section 3.3 we shall show
that the two criteria are never compatible for any functional form for a unit cost
function that is linear in parameters and parsimonious.

3.2. Incompatibility of computational facility and factual conformity

In Section 2.5 we pointed out the known fact that some commodities, notably
food, have income elasticities less than unity. Thus, any algebraic functional form
for a complete system of consumer demand functions that has the property of
unitary income elasticity for every commodity must be at variance with the facts
and should not be used. This rules out all complete systems of consumer demand

functions derived from a homothetic functional form for a direct or indirect
utility function.
Unfortunately, all known theoretically consistent (flexible or not) complete
systems of consumer demand functions of three or more commodities that are
linear in parameters,¹⁵ after a known transformation of the dependent variables if
necessary, have the property of unitary income elasticities for all commodities.¹⁶
Thus, in the choice of a functional form for a complete system of consumer
demand functions, the linearity-in-parameters property has to be abandoned.
It is conjectured that linearity-in-parameters implies unitary income elasticities
for all theoretically consistent complete systems of consumer demand functions of
three or more commodities. Such a theorem remains to be proved.

3.3. Incompatibility of a global domain of applicability, flexibility and
computational facility

We now proceed to prove a general impossibility theorem which says that a


linear-in-parameters and parsimonious functional form for a unit cost function
cannot be simultaneously (1) globally theoretically consistent and (2) flexible for
all theoretically consistent data. Thus, it is futile to look for a linear-in-parameters
functional form for a unit cost function that will satisfy all of our criteria. In
Section 3.1 we already demonstrated that a global domain of applicability is
incompatible with flexibility as far as the generalized Leontief unit cost function
is concerned. Here we show that this incompatibility is true of all linear-in-param-
eters and parsimonious unit cost functions.
Our presentation is simplified by considering the normalized cost function
defined as C*(p₂/p₁) = C(1, p₂/p₁) instead of the cost function C(p₁, p₂). The
two functions are of course equivalent. The properties of the normalized cost
function are as follows:

C*(q) - qC*′(q) ≥ 0; (3.3)

C*′(q) ≥ 0; (3.4)

C*″(q) ≤ 0; (3.5)

¹⁵Linearity in parameters as used here requires that the restrictions on the parameters, if any, are
linear also. Thus, the Linear Expenditure System introduced by Stone (1954) is not a linear-in-parame-
ters functional form.
¹⁶See, for example, Jorgenson and Lau (1977) and (1979) and Lau (1977).

where q = p₂/p₁. We note that eqs. (3.3) and (3.4) together imply that C*(q) ≥ 0.

Lemma 1

Let a normalized unit cost function have the linear-in-parameters and parsimoni-
ous form:¹⁷

C*(q) = f₀(q)α₀ + f₁(q)α₁ + f₂(q)α₂, (3.6)

where the fᵢ(q)'s are a set of linearly independent twice continuously differentia-
ble functions of q. In addition, suppose that the functional form is flexible, that
is, for every q̄ > 0 and every k ≥ 0, there exists a set of parameters α₀, α₁ and α₂
such that:

Σᵢ₌₀² [fᵢ(q̄) - q̄fᵢ′(q̄)]αᵢ = k₀,

Σᵢ₌₀² fᵢ′(q̄)αᵢ = k₁,

-Σᵢ₌₀² fᵢ″(q̄)αᵢ = k₂.

Let this system of equations be written as:

W(q̄)α = k, (3.7)

where

W(q) = [f₀(q)-qf₀′(q)   f₁(q)-qf₁′(q)   f₂(q)-qf₂′(q);
        f₀′(q)   f₁′(q)   f₂′(q);
        -f₀″(q)   -f₁″(q)   -f₂″(q)].

Then W(q̄) is nonsingular for all q̄ > 0.


Proof

By hypothesis, for all q̄ > 0 and for all k ≥ 0, there is a solution α satisfying

W(q̄)α = k.

¹⁷This functional form is parsimonious because it has the minimum number of independent
parameters required for flexibility.

By Gale’s (1960) Theorem of the Alternative this implies that there must not be a
solution y to the equations

qzY = 0, k’y=l, 4>0; kz0.

Suppose W(q) is singular for some 4, then there exists j f 0 such that

w( q)‘j = 0.

Since J # 0 there exists k 2 0 such that k’jj Z 0. If k’y < 0, we consider j* = - j,


so that k’j* > 0. By defining k* = k/k’j*, k*‘j* =l. Then W(q)‘j* = 0, k*‘j*
= 1, k* L 0 which, by Gale’s Theorem of the Alternative, implies that

W(4)” = k*, k* 2 0,

does not have a solution contradicting the hypothesis of flexibility. We conclude


that flexibility implies nonsingularity of W( 4) for all q > 0. Q.E.D.
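For a concrete instance of Lemma 1, take the normalized generalized Leontief form C*(q) = α₀ + α₁q^{1/2} + α₂q, so that (f₀, f₁, f₂) = (1, q^{1/2}, q). A symbolic sketch (ours, not from the original text; Python with sympy) confirms that W(q) is then nonsingular for every q > 0:

    import sympy as sp

    q = sp.symbols('q', positive=True)
    f = [sp.Integer(1), sp.sqrt(q), q]          # normalized generalized Leontief basis

    W = sp.Matrix([[fi - q*sp.diff(fi, q) for fi in f],
                   [sp.diff(fi, q) for fi in f],
                   [-sp.diff(fi, q, 2) for fi in f]])
    print(sp.simplify(W.det()))                 # -1/(4*q**(3/2)), nonzero for q > 0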

We note that if the functions f₀(q), f₁(q) and f₂(q) are linearly dependent,
then W(q) is always singular. It is clear that the functional form in eq. (3.6) is
parsimonious in the number of parameters since the number of independent
unknown parameters is equal to the number of components of k that need to be
matched.

Lemma 2

Let A be a real square matrix. Let x be a nonnegative vector of the same
dimension. Then

Ax ≥ 0 for all x ≥ 0,

if and only if A is nonnegative.

Proof

Sufficiency is straightforward. Necessity is proved by contradiction. Suppose there
exists A, not nonnegative, such that Ax ≥ 0 for all x ≥ 0. Let Aᵢⱼ < 0; then let x
be a vector with unity as the jth element and zero otherwise. The ith element of
Ax will therefore be negative, contradicting the hypothesis that Ax ≥ 0. We
conclude that A must be nonnegative. Q.E.D.

Lemma 3

Let A be a real, nonnegative, nonsingular square matrix of finite dimension. Then
A⁻¹ is nonnegative if and only if A = DP where D is a positive diagonal matrix
and P is a permutation matrix.

A proof is contained in Appendix 1.¹⁸


With these three lemmas, we can now proceed to state and prove the main
impossibility theorem.
Theorem

Let a class of normalized unit cost functions have the linear-in-parameters and
parsimonious form:

C*(q; α) = f₀(q)α₀ + f₁(q)α₁ + f₂(q)α₂,

where the fᵢ(q)'s are a set of linearly independent twice continuously differenti-
able functions of q. In addition, suppose that the functional form is flexible, that
is, for every q̄ > 0 and every k ≥ 0, there exists a set of parameters α₀, α₁ and α₂
such that:

Σᵢ₌₀² [fᵢ(q̄) - q̄fᵢ′(q̄)]αᵢ = k₀,

Σᵢ₌₀² fᵢ′(q̄)αᵢ = k₁,

-Σᵢ₌₀² fᵢ″(q̄)αᵢ = k₂,

or equivalently

W(q̄)α = k.

Then C*(q; α) cannot be globally theoretically consistent (for all nonnegative
prices) for all such α's.
Proof

The proof is by contradiction. Global theoretical consistency of C*(q; α) implies:

W(q)α ≥ 0,  ∀q ≥ 0.

By hypothesis, for every q̄ > 0 and k ≥ 0, there exists α such that

W(q̄)α = k.

¹⁸I am grateful to Kenneth Arrow for correcting an error in the original formulation of Lemma 3.

By Lemma 1, W(q̄) is nonsingular and hence

α = W(q̄)⁻¹k.

Suppose the theorem is false; then there exists W(q̄) such that:

W(q)α = W(q)W(q̄)⁻¹k ≥ 0,  ∀q ≥ 0,  q̄ > 0 and k ≥ 0.

By Lemma 2, W(q)W(q̄)⁻¹ must be nonnegative. Let

A(q, q̄) = W(q)W(q̄)⁻¹,

which is nonnegative. Then

W(q) = A(q, q̄)W(q̄). (3.8)

By the symmetry of q and q̄,

W(q̄) = A(q̄, q)W(q),

and hence

W(q) = A(q, q̄)A(q̄, q)W(q),

which implies that

A(q, q̄)A(q̄, q) = I.

Thus, both A(q, q̄) and its inverse are nonnegative. By Lemma 3,

A(q, q̄) = D(q, q̄)P, (3.9)

where D(q, q̄) is a positive diagonal matrix and P is a permutation matrix.¹⁹

¹⁹A permutation matrix is a square matrix which can be put into the form of an identity matrix by a
suitable reordering of the rows (or columns) if necessary.

Substituting eq. (3.9) into eq. (3.8), we obtain:

W(q) = D(q, q̄)PW(q̄).

PW(q̄) is a nonsingular matrix independent of q, so that each element of the ith
row of W(q) is equal to a constant (possibly zero) times Dᵢᵢ(q, q̄), a function of q.
This contradicts the linear independence of the functions f₀(q), f₁(q), and f₂(q).
Q.E.D.

The implication of this theorem is that there can be no linear-in-parameters


and parsimonious functional form for a normalized unit cost function which can
fit arbitrary but theoretically consistent values of a normalized unit cost function
and its first and second derivatives at any preassigned value of the normalized
price and be itself theoretically consistent for all nonnegative normalized prices.
One has to be prepared to give up one or more of the desirable properties of an
algebraic functional form.
Since one is not likely to give up theoretical consistency or flexibility, or even
computational facility, the logical area for a compromise lies in the domain of
applicability. For example, one can be satisfied with an extrapolative domain of a
functional form for a unit cost function that excludes, say, unreasonably high
values of the elasticity of substitution.
The fact is that requiring the extrapolative domain of a functional form to be
global when the data on which the parameters of the functional form are
estimated are local does not make too much sense from a practical point of view.
In the first place, even assuming that the same functional form and the same
parameters hold outside the neighborhood containing the observed data, the
confidence band for the estimated function will become so wide for values of
independent variables far away from the neighborhood containing the observed
data that it will not be very useful at all. Second, values of the parameters and
even the functional form itself may be different for values of independent
variables far away from the neighborhood containing the observed data.²⁰ Unfor-
tunately there is no way of knowing a priori. One can only wait until these
faraway values are actually experienced and observed. Third, reality is always
finite and it is difficult to conceive of any application in which an independent
variable, for example, a price or a quantity of an input, becomes arbitrarily large.
For these reasons, it may be just as well that a global extrapolative domain
cannot be achieved in general. One should settle for a well-prespecified compact
domain of applicability that reflects the actual and potential ranges of data
experiences.
The theorem can be generalized in several dimensions: (1) the number of
independent variables can be increased; (2) the number of parameters can be
increased (but maintained finite); (3) the functional form can be linear-in-parame-
ters after a monotonic transformation.

²⁰As an example, consider classical Newtonian mechanics and relativistic mechanics. The latter
reduces to the former at low velocities. However, an extrapolation of Newtonian mechanics to
high-velocity situations would be wrong!

4. Concluding remarks

The most important conclusion that can be drawn from our analysis here is that
in general it is not possible to satisfy all five categories of criteria simultaneously.
Some trade-offs have to be made. It is, however, not recommended that one
compromise on local theoretical consistency: any algebraic functional form must
be capable of satisfying the theoretical consistency restrictions at least in a
neighborhood of the values of the independent variables of interest. It is also not
recommended, except as a last resort, to give up computational facility, as the
burden of and probability of failure in the estimation of nonlinear-in-parameters
models is at least one order of magnitude higher than for linear-in-parameters
models, and in many instances the statistical theory is less well developed. It is also not
advisable to sacrifice flexibility: inflexibility restricts the sensitivity of the param-
eter estimates to the data and limits a priori what the data are allowed to tell the
econometrician. Unless there is strong a priori information on the true functional
form, flexibility should be maintained as much as possible.
This leaves the domain of applicability as the only area where compromises
may be made. As argued in Section 3.3, most practical applications can be
accommodated even if the functional form is not globally theoretically consistent
so long as it is theoretically consistent within a sufficiently large but nevertheless
compact subset of the space of independent variables. For example, any extrapo-
lative domain of theoretical consistency which allows the relative price of inputs
to vary by factor of one million is plenty large enough. Moreover, by making a
compromise on the extrapolative domain of applicability one can also simulta-
neously reduce the domain over which the functional form has to be flexible.
Further, one can also make compromises with regard to the interpolative domain
of the functional form, that is, to limit the set of possible values of the derivatives
of the function that the functional form has to fit. For example, one may specify
that a functional form for a unit cost function C(p; α(k)) be theoretically
consistent for all prices in a compact subset of positive prices and for all values of
k in a compact subset of possible values of its first and second derivatives. This
last possibility holds the most promise.
With regard to specific applications, one can say that as far as the empirical
analysis of production is concerned, the surest way to obtain a theoretically
consistent representation of the technology is to make use of one of the dual
concepts such as the profit function, the cost function or the revenue function.
There, as we have learned, one has to be prepared to make compromises with
regard to the domain of applicability. The impossibility theorem in Section 3.3
applies not only to unit cost functions but to other similar concepts such as profit
and revenue functions as well.
As far as the empirical analysis of consumer demand is concerned, the surest
way to obtain a theoretically consistent and flexible complete system of demand

functions is to specify a theoretically consistent and flexible nonhomothetic


indirect utility function and derive the system of consumer demand functions by
Roy’s Identity. As long as the indirect utility function is theoretically consistent
and flexible, the resulting complete system of consumer demand functions will
also be theoretically consistent, flexible, and explicitly representable. Unfor-
tunately, linearity-in-parameters of the indirect utility function does not guaran-
tee linearity-in-parameters of the complete systems of consumer demand func-
tions. In fact, the only known linear-in-parameters complete system of consumer
demand functions of three or more commodities are derivable from homothetic
utility functions with the undesirable implication that the income elasticities of
demands of all commodities are unities, an implication that has been repeatedly
contradicted by facts. Thus, one has to give up on the linearity-in-parameters
property in the choice of a functional form for a complete system of consumer
demand functions.
Once linearity-in-parameters is given up, it is not clear what the next best thing
may be. However, here one may be guided by parsimony of parameters (and
restrictions on parameters). The estimation of nonlinear parameters subject to
nonlinear constraints is a considerably more difficult undertaking and the degree
of nonlinearity should be kept at a minimum. A device that frequently works is to
start with a linear-in-parameters complete system and translate its origin so that
the resulting translated system no longer has the property of unitary income
elasticities for all commodities.

Appendix 1

Lemma 3

Let A be a real, nonnegative, nonsingular square matrix of finite dimension. Then
A⁻¹ is nonnegative if and only if A = DP where D is a positive diagonal matrix
and P is a permutation matrix.²¹

Proof

Sufficiency follows from the fact that the inverse of a permutation matrix is its
transpose, which is also a permutation matrix. The proof of necessity is by
induction on the order of the matrix n. First, we verify the necessity of the lemma

²¹A permutation matrix is a square matrix which can be put into the form of an identity matrix by a
suitable reordering of the rows (or columns) if necessary.

for n = 2. The elements of A and A⁻¹, both nonnegative, must satisfy the
following equations:

A₁₁A⁻¹₁₁ + A₁₂A⁻¹₂₁ = 1, (A.1)

A₁₁A⁻¹₁₂ + A₁₂A⁻¹₂₂ = 0, (A.2)

A₂₁A⁻¹₁₁ + A₂₂A⁻¹₂₁ = 0, (A.3)

A₂₁A⁻¹₁₂ + A₂₂A⁻¹₂₂ = 1, (A.4)

where A ≥ 0; A⁻¹ ≥ 0. First suppose A₁₁ ≠ 0. Then by eq. (A.2) A⁻¹₁₂ = 0, which in
turn implies that A⁻¹₁₁ ≠ 0 and A⁻¹₂₂ ≠ 0 (otherwise A⁻¹ is singular). A⁻¹₁₁ ≠ 0
implies by eq. (A.3) that A₂₁ = 0; A⁻¹₂₂ ≠ 0 implies by eq. (A.2) that A₁₂ = 0. Thus
A is a diagonal matrix and nonsingularity implies that A is a positive diagonal
matrix. Next suppose A₁₁ = 0; then A₁₂ ≠ 0 and A₂₁ ≠ 0 (otherwise A is singular)
and by eq. (A.1) A⁻¹₂₁ ≠ 0. A⁻¹₂₁ ≠ 0 implies by eq. (A.3) A₂₂ = 0. Thus, A can be
expressed as

A = [0   A₁₂;  A₂₁   0] = [A₁₂   0;  0   A₂₁]·[0   1;  1   0],

the product of a positive diagonal matrix and a permutation matrix.


Now suppose the lemma is true for all real, nonnegative, nonsingular square
matrices of all orders up to n; we shall show that it is true for order (n+1). Let
the matrices A and its inverse A⁻¹ be partitioned conformably as

A = [A₁₁   a₁ₙ;  aₙ₁   Aₙ],   A⁻¹ = [B₁₁   b₁ₙ;  bₙ₁   Bₙ],

where A₁₁ and B₁₁ are scalars. The elements of A and A⁻¹ must satisfy the
following equations:

A₁₁B₁₁ + a₁ₙbₙ₁ = 1, (A.5)

A₁₁b₁ₙ + a₁ₙBₙ = 0, (A.6)

aₙ₁B₁₁ + Aₙbₙ₁ = 0, (A.7)

aₙ₁b₁ₙ + AₙBₙ = Iₙ. (A.8)

First, suppose A₁₁ ≠ 0; then by eq. (A.6) b₁ₙ = 0, which implies that B₁₁ ≠ 0 and
Bₙ is nonsingular (otherwise A⁻¹ is singular). B₁₁ ≠ 0 implies by eq. (A.7) aₙ₁ = 0.
Bₙ nonsingular implies by eq. (A.6) a₁ₙ = 0. By eq. (A.8) Bₙ = Aₙ⁻¹. By eq.
(A.5) B₁₁ = A₁₁⁻¹. Thus the matrices A and A⁻¹ have the following forms:

A = [A₁₁   0;  0   Aₙ],   A⁻¹ = [A₁₁⁻¹   0;  0   Aₙ⁻¹].

But Aₙ and Aₙ⁻¹ are both nonnegative, implying, by the lemma, that

Aₙ = DₙPₙ.

We conclude that

A = [A₁₁   0;  0   DₙPₙ],

the product of a positive diagonal matrix and a permutation matrix.


Next suppose A₁₁ = 0; then a₁ₙ ≠ 0 and aₙ₁ ≠ 0 (otherwise A is singular), which
in turn imply:
(1) by eq. (A.5),

a₁ₙbₙ₁ = 1.

(2) by eq. (A.6),

a₁ₙBₙ = 0.

(3) by eq. (A.7),

B₁₁ = 0 and Aₙbₙ₁ = 0.

We note, first of all, that eq. (A.8) implies that aₙ₁b₁ₙ must be a diagonal
matrix (since AₙBₙ is nonnegative, the off-diagonal elements of aₙ₁b₁ₙ must
vanish). A typical element of aₙ₁b₁ₙ is aₙ₁,ᵢb₁ₙ,ⱼ. In order for this to be identically
zero for i ≠ j, all i, j, it is necessary and sufficient that aₙ₁ and b₁ₙ be nonzero in
only one element which is common to both aₙ₁ and b₁ₙ. Let this element be the
kth element of aₙ₁ (and b₁ₙ). Moreover, since aₙ₁b₁ₙ is then a diagonal matrix
with the kth element on the diagonal nonzero, Iₙ - aₙ₁b₁ₙ is also a diagonal
matrix. However, it must have a rank equal to that of AₙBₙ and hence less than or
equal to n - 1. We conclude that the nonzero diagonal element of aₙ₁b₁ₙ must be
equal to unity. The product AₙBₙ is then equal to an identity matrix with the kth
element on the diagonal replaced by a zero. The ranks of Aₙ and Bₙ must be
equal to (n-1). If either of them were less than (n-1), then the matrix A (or
A⁻¹) would be singular.

Second, we note that because

Aₙbₙ₁ = 0,

whenever an element of bₙ₁ is nonzero, the corresponding column of Aₙ must be
zero. The rank condition on Aₙ implies that there can only be one such zero
column. Hence bₙ₁ can only have one nonzero element, say, the lth. Similarly,
because

a₁ₙBₙ = 0,

a₁ₙ can have only one nonzero element. Moreover, because a₁ₙbₙ₁ = 1, the same
element in a₁ₙ and bₙ₁ must be nonzero. Thus, the matrix A has the form:

A = [0   a₁ₙ,ₗeₗ′;  aₙ₁,ₖeₖ   Aₙ],

where eₖ and eₗ denote unit vectors and the lth column of Aₙ is a column of zeros.
Similarly, A⁻¹ has the form:

A⁻¹ = [0   b₁ₙ,ₖeₖ′;  bₙ₁,ₗeₗ   Bₙ],

where the lth row of Bₙ is a row of zeros.
Moreover, the product of the kth row of Aₙ and Bₙ must be identically zero by
eq. (A.8). This means that the kth row of Aₙ must be proportional to a₁ₙ (with
the constant of proportionality being possibly zero). But the lth element of the
kth row of Aₙ is zero, whereas the lth element of a₁ₙ is nonzero. We conclude
that the kth row of Aₙ is identically zero. Similarly, the product of Aₙ and the
kth column of Bₙ must also be identically zero. This means, by a similar
k th column of B, must also be identically zero. This means, by a similar

argument, that the kth column of Bₙ is identically zero. Thus, the matrices A and
A⁻¹ have the following forms:

A = [0   a₁ₙ,ₗeₗ′;  aₙ₁,ₖeₖ   Aₙ],   A⁻¹ = [0   b₁ₙ,ₖeₖ′;  bₙ₁,ₗeₗ   Bₙ], (A.9)

where now the kth row and the lth column of Aₙ are identically zero, and the lth
row and the kth column of Bₙ are identically zero.
Further, by direct multiplication,

AₙBₙ = Iₙ - eₖeₖ′,

that is, an identity matrix with the kth diagonal element replaced by a zero.

Let Aₙ* be the matrix formed by deleting the kth row and lth column of Aₙ and
Bₙ* be the matrix formed by deleting the lth row and kth column of Bₙ; it can be
shown that the resulting product of the two square matrices Aₙ* and Bₙ* is:

Aₙ*Bₙ* = Iₙ₋₁,

so that Bₙ* = Aₙ*⁻¹. But Aₙ* is of order n-1. Thus, applying the lemma,

Aₙ* = Dₙ₋₁Pₙ₋₁,

where Dₙ₋₁ is a positive diagonal matrix and Pₙ₋₁ is a permutation matrix.
Substituting this result into eq. (A.9), we obtain a factorization

A = D̃·P̃, (A.10)

where D̃ is the positive diagonal matrix whose diagonal consists of a₁ₙ,ₗ, aₙ₁,ₖ and
the elements Dᵢᵢ of Dₙ₋₁ in the appropriate order, and P̃ is assembled from the
conformable partitions of Pₙ₋₁ together with unit rows in the positions correspond-
ing to the lth and kth elements. It can be verified that the second matrix of the
product in eq. (A.10) is a permutation matrix. Q.E.D.
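Lemma 3 is easy to exercise numerically. The sketch below (ours, not from the original text; Python) checks sufficiency on random matrices of the form DP and exhibits a nonnegative, nonsingular matrix that is not of that form and whose inverse has a negative entry:

    import numpy as np

    def is_dp(A, tol=1e-10):
        # exactly one positive entry in every row and every column, all entries >= 0
        return ((A >= -tol).all()
                and (np.sum(A > tol, axis=0) == 1).all()
                and (np.sum(A > tol, axis=1) == 1).all())

    rng = np.random.default_rng(0)
    for _ in range(1000):
        D = np.diag(rng.uniform(0.5, 2.0, 4))
        P = np.eye(4)[rng.permutation(4)]
        A = D @ P
        assert is_dp(A) and (np.linalg.inv(A) >= -1e-10).all()   # sufficiency

    A = np.array([[1.0, 1.0],
                  [0.0, 1.0]])                  # nonnegative and nonsingular, but not DP
    print(is_dp(A), (np.linalg.inv(A) >= 0).all())               # False False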

References

Arrow, K. J., H. B. Chenery, B. S. Minhas and R. M. Solow (1961) “Capital-Labor Substitution and
Economic Efficiency”, Review of Economics and Statistics, 43, 225-250.
Barten, A. P. (1967) “Evidence on the Slutsky Conditions for Demand Equations”, Review’ of
Economics and Statistics, 49, 77-84.
Barten, A. P. (1977) “The Systems of Consumer Demand Functions Approach: A Review”, in: M. D.
Intriligator, ed., Frontiers of Quantitative Economics. IIIA, Amsterdam: North-Holland, 23-58.
Berndt, E. R., M. N. Darrough and W. E. Diewert (1977) “Flexible Functional Forms and
Expenditure Distributions: An Application to Canadian Consumer Demand Functions”, Interna-
tional Economic Review, 18, 651-676.
Blackorby, C., R. Boyce and R. R. Russell (1978) “Estimation of Demand Systems Generated by the
Gorman Polar Form: A Generalization of the S-branch Utility Tree”, Econometricu, 46, 345-364.
Ch. 26: Functional Forms in Econometric Model Building 1565

Caves, D. W. and L. R. Christensen (1980) “Global Properties of Flexible Functional Forms”,


American Economic Review, IO, 422-432.
Christensen, L. R., D. W. Jorgenson and L. J. Lau (1973) “Transcendental Logarithmic Production
Frontiers”, Review of Economics and Statistics, 55, 28-45.
Christensen, L. R., D. W. Jorgenson and L. J. Lau (1975) “Transendental Logarithmic Utility
Functions”, American Economic Review, 65, 361-383.
Cobb, C. W. and P. C. Douglas (1928) “A Theory of Production”. American Economic Review, 18,
139-165.
Deaton, A. and J. S. Muellbauer (1980a) “An Almost Ideal Demand System”, American Economic
Review, 70, 312-326.
Deaton, A. and J. S. Muellbauer (1980b) Economics and Consumer Behavior. Cambridge: Cambridge
University Press.
Diewert, W. E. (1971) “An Application of the Shephard Duality Theorem, A Generalized Leontief
Production Function”, Journal of Political Economy, 79, 481-507.
Diewert, W. E. (1973) “Functional Forms for Profit and Transformation Functions”, Journal of
Economic Theory, 6, 284-316.
Diewert, W. E. (1974) “Functional Forms for Revenue and Factor Requirement Functions”, Intema-
tional Economic Review, 15, 119-130.
Fuss, M. A., D. L. McFadden and Y. Mundlak (1978) “Functional Forms in Production Theory”, in:
M. A. Fuss and D. L. McFadden, eds., Production Economics: A Dual Approach to Theory and
Applications. Amsterdam: North-Holland, 1, 219-268.
Gale, D., (1960) The Theory of Linear Economic Models. New York: McGraw-Hill.
Gorman, W. M. (1953) “Community Preference Fields”, Econometrica, 21, 63-80.
Gorman, W. M. (1981) “Some Engel Curves”, in: A. S. Deaton, ed., Essays in the Theory and
Measurement of Consumer Behavior: In Honor of Sir Richard Stone. New York: Cambridge
University Press, 7-29.
Griliches, Z. and V. Ringstad (1971) Economies of Scale and the Form of the Production Function.
Amsterdam: North-Holland.
Hanoch, G. (1971) ‘I CRESH Production Functions”, Econometrica, 39, 695-712.
Heady, E. 0. and J. L. Dillon (1961) Agricultural Production Functions. Ames: Iowa State University
Press.
Hotelling, H. S. (1932) “Edgeworth’s Taxation Paradox and the Nature of Demand and Supply
Functions”, Journal of Political Economy, 40, 517-616.
Houthakker, H. S. (1957) “An International Comparison of Household Expenditure Patterns, Com-
memorating the Centenary of Engel’s Law”, Econometrica, 25, 532-551.
Houthakker, H. S. (1960) “Additive Preferences”, Econometrica, 28, 244-257.
Houthakker, H. S. (1965) “New Evidence on Demand Elasticities”, Econometrica, 33, 277-288.
Jorgenson, D. W. and L. J. Lau (1977) “Statistical Tests of the Theory of Consumer Behavior”, in: H.
Albach, E. Helmstadter and R. Hemm, eds., Quantitative Wirtschaftforschung. Tlibingen: J. C. B.
Mohr, 384-394.
Jorgenson, D. W. and L. J. Lau (1979) “The Integrability of Consumer Demand Functions”,
European Economic Review, 12, 115-147.
Jorgenson, D. W., L. J. Lau and T. M. Stoker (1980) “Welfare Comparison Under Exact Aggregation”,
American Economic Review, 70, 268-272.
Jorgenson, D. W., L. J. Lau and T. M. Stoker (1982) “The Transcendental Logarithmic Model of
Aggregate Consumer Behavior”, in: R. L. Basmann and G. F. Rhodes, eds., Advances in Economet-
rics. Greenwhich: JAI Press, Vol. 1.
Klein, L. R. and H. Rubin (1947-1948) “A Constant-Utility Index of the Cost of Living”, Review of
Economic Studies, 15, 84-87.
Lau, L. J. (1977) “Complete Systems of Consumer Demand Functions Through Duality”, in: M. D.
Intriligator, ed., Frontiers of Quantitative Economics. IIIA, Amsterdam: North-Holland, 59-86.
Lau. L. J. (1978) “ADDhXtiODS of Profit Functions”. in: M. A. Fuss and D. L. McFadden. eds.,
,Production Economi& A Dual Approach to Theory and Applications. Amsterdam: North-Holland, 1,
133-216.
Lau, L. J. (1982) “A Note on the Fundamental Theorem of Exact Aggregation”, Economics Letters, 9,
119-126.
1566 L. J. Luu

Lau, L. J., W. L. Lin and P. A. Yotopoulos (1978) “The Linear Logarithmic Expenditure System: An
Application to Consumption-Leisure Choice”, Econometrica, 46, 843-868.
Lau, L. J. and S. Schaible (1984) “A Note on the Domain of Monotonicity and Concavity of the
Transcendental Logarithmic Unit Cost Function”, Department of Economics, Stanford: Stanford
University, mimeographed.
Lau, L. J. and B. A. Van Zummeren (1980) “The Choice of Functional Forms when Prior Information
is Diffused”. Paper presented at the Fourth World of Congress of the Econometric Society,
Aix-en-Provence, France, August 28-September 2, 1980.
McFadden, D. L. (1963) “Further Results on C.E.S. Production Functions”, Review of Economic
Studies, 30, 73-83.
McFadden. D. L. (1964) “Existence Conditions for Theil-Type Preferences”, Department of Econom-
ics, Berkeley: University of California, mimeographed. __
McFadden, D. L. (1978) “Cost, Revenue, and Profit Functions”, in: M. A. Fuss and D. L. McFadden,
eds., Production Economics: A Dual Approach to Theoty and Applications. Amsterdam: North-Hol-
land, 1, 3-109.
McFadden, D. L. (1984) “Econometric Analysis of Qualitative Response Models”, in: Z. Griliches
and M. D. Intriliaator. eds.. Handbook of Econometrics. Amsterdam: North-Holland, Vol. 2.
Muellbauer, J. S. (i975) “Aggregation, Income Distribution, and Consumer Demand”, Review of
Economic Studies, 42, 525-543.
Muellbauer. J. S. (1976) “Communitv Preferences and the Renresentative Consumer”, Econometrica,
44979-999. . ’
Nerlove, M. (1963) “Returns to Scale in Electricity Supply”, in: C. F. Christ, et al., eds., Measurement
in Economics: Studies in Mathematical Economics and Econometrics in Memory of Yehuda Grunfeld.
Stanford: Stanford University Press, Vol. I.
Pollak, R. A. and T. J. Wales (1978) “Estimation of Complete Demand Systems from Household
Budget Data: The Linear and Quadratic Expenditure Systems”, American Economic Reuiew, 68,
348-359.
Pollak, R. A. and T. J. Wales (1980) “Comparison of the Quadratic Expenditure System and Translog
Demand Systems with Alternative Specifications of Demographic Effects”, Economefrica, 48,
595-612.
Roy, R. (1943) De I’utihte. Paris: Hermann.
Schultz, H. (1938) The Theoty and Measurement of Demand. Chicago: University of Chicago Press.
Shephard, R. W. (1953) Cost and Production Functions. Princeton: Princeton University Press.
Shephard, R. W. (1970) Theory of Cost and Production Functions. Princeton: Princeton University
Press.
Stone, J. R. N. (1953) The Measurement of Consumer’s Expenditure and Behuvior in the United
Kingdom, 1820- 1938. Cambridge: Cambridge University Press Vol. 1.
Stone, J. R. N. (1954) “Linear Expenditure Systems and Demand Analysis: An Application to the
Pattern of British Demand”, Economic Journal, 64, 511-527.
Theil, H. (1967) Economics and Information Theory. Amsterdam: North-Holland.
Uzawa. H. (1962) “Production Functions with Constant Elasticities of Substitution”. Review of
Economic Studies, 29, 291-299.
Wold, H. with L. Jureen (1953) Demand Analysis. New York: Wiley.
Chapter 27

LIMITED DEPENDENT VARIABLES

PHOEBUS J. DHRYMES

Columbia University

Contents

0. Introduction 1568
1. Logit and probit 1568
1 .l Generalities 1568
1.2. Why a general linear model (GLM) formulation is inappropriate 1570
1.3. A utility maximization motivation 1572
1.4. Maximum likelihood estimation 1575
1.5. Goodness of fit 1579
2. Truncated dependent variables 1585
2.1. Generalities 1585
2.2. Why simple OLS procedures fail 1586
2.3. Estimation of parameters by ML methods 1589
2.4. An initial consistent estimator 1590
2.5. Limiting properties and distribution of the ML estimator 1595
2.6. Goodness of tit 1603
3. Sample selectivity 1604
3.1. Generalities 1604
3.2. Inconsistency of least squares procedures 1606
3.3. The LF and ML estimation 1610
3.4. An initial consistent estimator 1613
3.5. Limiting distribution of the ML estimator 1619
3.6. A test for selectivity bias 1625
References 1626

Handbook of Econometrics, Volume III, Edited by 2. Griliches and M.D. Intriligutor


c Elsevier Science Publishers BV, 1986
1568 P. J. Dhtymes

0. Introduction

This is intended to be an account of certain salient themes of the Limited


Dependent Variable (LDV) literature. The object will be to acquaint the reader
with the nature of the basic problems and the major results rather than recount
just who did what when. An extended bibliography is given at the end, that
attempts to list as many papers as have come to my attention - even if only by
title.
By LDV we will mean instances of (dependent) variables-i.e. variables to be
explained in terms of some economic model or rationalizing scheme for which (a)
their range is intrinsically a finite discrete set and any attempt to extend it to the
real line (or the appropriate multivariable generalization) not only does not lead
to useful simplification, but befouls any attempt to resolve the issues at hand; (b)
even though their range may be the real (half) line (or the appropriate multivari-
able generalization) their behavior is conditioned on another process(es).
Examples of the first type are models of occupational choice, entry into labor
force, entry into college upon high school graduation, utilization of recreational
facilities, utilization of modes of transport, childbearing, etc.
Examples of the latter are models of housing prices and wages in terms of the
relevant characteristics of the housing unit or the individual-what is commonly
referred to as hedonic price determination. Under this category we will also
consider the case of truncated dependent observations.
In examining these issues we shall make an attempt to provide an economic
rationalization for the model considered, but our main objective will be to show
why common procedures such as least squares fail to give acceptable results; how
one approaches these problems by maximum likelihood procedures and how one
can handle problems of inference-chiefly by determining the limiting distribu-
tions of the relevant estimators. An attempt will be made to handle all problems
in a reasonably uniform manner and by relatively elementary means.

1. Logit and probit

1.1. Generalities

Consider first the problem faced by a youth completing high school; or by a


married female who has attained the desired size of her family. In the instance of
the former the choice to be modelled is going to college or not; in the case of the
latter we need to model the choice of entering the labor force or not.
Ch. 27: Limited Dependent Variables 1569

Suppose that as a result of a properly conducted survey we have observations


on T individuals, concerning their socioeconomic characteristics and the choices
they have made.
In order to free ourselves from dependence on the terminology of a particular
subject when discussing these problems, let us note that, in either case, we are
dealing with binary choice; let us denote this by
Alternative 1 Going to College or Entering Labor Force
Alternative 2 Not Going to College or Not Entering Labor Force
Since the two alternatives are exhaustive we may make alternative 1 correspond to
an abstract event 8 and alternative 2 correspond to its complement 2. In this
context it will be correct to say that what we are interested in is the set of factors
affecting the occurrence or nonoccurrence of 8. What we have at our disposal is
some information about the attributes of these alternatives and the (socioeconomic)
attributes of the individual exercising choice. Of course we also observe the choices
of the individual agent in question. Let

Yt =1 if individual t chooses in accordance with event 8,


= 0 otherwise.

Let

w= (w,,w*,...,w,),
be a vector of characteristics relative to the alternatives corresponding to the
events 6 and 2; finally, let

rt.=(rtl,..., r ,m) ,

be the vector describing the socioeconomic characteristics of the tth individual


economic agent.
We may be tempted to model this phenomenon as

Yt = x,.P + Et, t =1,2 ,..., T,

where

x,.= (w, r,.).


/3 is a vector of unknown constants and

e,:t=1,2 ,..., T,

is a sequence of suitably defined error terms.


1570 P. J. Dhymes

The formulation in (1) and subsequent estimation by least squares procedures


was a common occurrence in the empirical research of the sixties.

1.2. Why a general linear model (GLM) forwylation is inappropriate

Although the temptation to think of LDV problems in a GLM context is


enormous a close examination will show that this is also fraught with considerable
problems. At an intuitive level, we seek to approximate the dependent variable by
a linear function of some other observables; the notion of approximation is based
on ordinary Euclidean distance. That is quite sensible, in the usual GLM context,
since no appreciable violence is done to the essence of the problem by thinking of
the dependent variable as ranging without restriction over the real line-perhaps
after suitably centering it first.
Since the linear function by which we approximate it is similarly unconstrained,
it is not unreasonable to think of Euclidean distance as a suitable measure of
proximity. Given these considerations we proceed to construct a logically con-
sistent framework in which we can optimally apply various inferential procedures.
In the present context, however, it is not clear whether the notion of Euclidean
distance makes a great deal of sense as a proximity measure. Notice that the
dependent variable can only assume two possible values, while no comparable
restrictions are placed on the first component of the right hand side of (1).
Second, note that if we insist on putting this phenomenon in the GLM mold, then
for observations in which

y,=l,

we must have

(2)
while for observations in which

y,=o,

we must have

Et = - x,,p. (3)

Thus, the error term can only assume two possible values, and we are immediately
led to consider an issue that is important to the proper conceptualization of such
models, viz., that what we need is not a linear model “explaining” the choices
Ch. 27: Limited Dependent Variables 1571

individuals make, but rather a model of the probabilities corresponding to the


choices in question. Thus, if we ask ourselves: what is the expectation of E,, we
shall be forced to think of the probabilities attaching to the relations described in
(2) and (3) and thus conclude that

&,=1-Q,

with probability equal to

P,l= KY, =a (4
and

with probability

Pr*=~bt=o)=l-P,1. (5)
What we really should be asking is: what determines the probability that the t th
economic agent chooses in accordance with event 8, and eq. (1) should be viewed
as a clumsy way of going about it. We see that putting

where f( .) is a suitable density function with known parameters, formalizes the


dependence of the probabilities of choice on the observable characteristics of the
individual and/or the alternatives.
To complete the argumentation about why the GLM is inapplicable in the
present context we note further

E(E1) = F(x,.P)(l-x,./3)+ [l- F(x,.P)]( - x,.B) = F(O)-x,.P,


(8)
Var(q) = F(x,.P)[l- F(x,.P)]. (9)

Hence, prima facie, least squares techniques are not appropriate, even if the
formulations in (1) made intuitive sense.
We shall see that similar situations arise in other LDV contexts in which the
absurdity of least squares procedures is not as evident as it is here.
1572 P. J. Dhrymes

Thus, to recapitulate, least squares procedures are inapplicable

i. because we should be interested in estimating the probability of choice;


however, we are using a linear function to predict actual choices, without
ensuring that the procedure will yield “predictions” satisfying the conditions
that probabilities ought to satisfy
ii. on a technical level the conditions on the error term that are compatible with
the desirable properties of least squares estimators in the context of the GLM
are patently false in the present case.

1.3. A utility maximization motivation

As before, consider an individual, t, who is faced with the choice problem as in


the preceding section but who is also hypothesized to behave so as to maximize
his utility in choosing between the two alternatives. In the preceding it is assumed
that the individual’s utility contains a random component. It involves little loss in
relevance to write the utility function as

q=u(wJ,.;~)+EI, t =1,2 ,..., T,

where

u(w, rt.; e) = E(Ulw, q.), Et.=Ut- u(w,r,.; e>.


For the moment we shall dispense with the subscript t referring to the tth
individual.
If the individual chooses according to event 8, his utility is (where now any
subscripts refer to alternatives),

u, = u(w, r; el)+el. (10)

The justification for the parameter vector 0 being subscripted is that, since w is
constant across alternatives, 8 must vary. While this may seem unnatural to the
reader it is actually much more convenient, as the following development will
make clear.
If the individual chooses in accordance with 2, then

u, = U( W, r; e,)+ + 01)

Hence, choice is in accordance with event d if, say,


Ch. 2 7: Limited Dependent Variables 1573

But (12) implies


Alternative 1 is chosen or choice is made in accordance with event 8 if

E2-E11U(W,r;e1)-U(W,r;e2), 03)

which makes it abundantly clear that we can speak unambiguously only about the
probabilities of choice. To “predict” choice we need an additional “rule” - such
as, for example,
Alternative 1 is chosen when the probability attaching to event 8 is 0.5 or
higher.
If the functions u( .) in (13) are linear, then the t th individual will choose
Alternative 1 if

Et2 - &,l 5 x,.P, 04)

where

x,. = (WYq>, p=e,-e,. 0%


Hence, in the notation of the previous section

P(y,=l)=P(q*- et1 I x,.P) = /+?,@)d6=


-m
F,(xt.P), (16)
where now f, is the density function of at2 - e,i.
If

then we have a basis for estimating the parametric structure of our model. Before
we examine estimation issues, however, let us consider some possible distribution
for the errors, i.e. the random variables stl, E,~.
Thus, suppose

Et,. - N(0, -q, 2= [;:: p], x>o,

and the E,.‘s are independent identically distributed (i.i.d.). We easily find that

Et2 - &,l - NO, a2), a2 = lJ*2 - 2ai2 + (Ill.

Hence
1574 P. J. Dhtymes

where

and F(p) is the c.d.f.’ of the unit normal. Notice that in this context it is not
possible to identify separately /I and u * by observing solely the choices individu-
als make; we can only identify /3/a.
For reasons that we need not examine here, analysis based on the assumption
that errors in (10) and (11) are normally distributed is called Probit Analysis.
We shall now examine another specification that is common in applied re-
search, which is based on the logistic distribution. Thus, let q be an exponentially
distributed random variable so that its density is

dd=e-q 4E w47 (17)

and consider the distribution of

u=ln(q)-‘=-lnq. 08)
The Jacobian of this transformation is

J( r + q) = e-O.

Hence, the density of u is

h(u) = exp - uexp -e-” uE(-co,co). (19)

If the &ti, i = 1,2 of (14) are mutually independent with density as in (19), then the
joint density is

(20)

Put

Ul+U2=&*,

U2= El. (21)


The Jacobian of this transformation is 1; hence the joint density of the ui, i = 1,2,
is given by
Ck. 27: Limited Dependent Variables 1575

Since

v1=E2-Et,

the desired density is found as

exp-(v,+2v,)exp-(e-“Z+e-“1-“z)dv,.

To evaluate this put

l+e-‘l=t, s = tee”2

to obtain
e-“1
g( vr) = 5 /mSeCsdS =
0 (1 +e-uq2 .

Hence, in this case the probability of choosing Alternative 1 is given by

P(y, = 0) = 1- F(x,J) = 1 ;e;;.P.

This framework of binary or dichotomous choice easily generalizes to the case of


polytomous choice, without any appreciable complication - see, e.g. Dhrymes
(1978a).

1.4. Maximum likelihood estimation

Although alternative estimation procedures are available we shall examine only


the maximum likelihood (ML) estimator, which appears to be the most ap-
propriate, given the sorts of data typically available to economists.
To recapitulate: we have the problem of estimating the parameters in a
dichotomous choice context, characterized by a density function f( -); we shall
deal with the case where f( .) is the unit normal and the logistic.
As before we define

Y, =L if choice corresponds to event E


= 0 if choice corresponds to event 2
1576 P. J. Dhrymes

The event 8 may correspond to entering the labor force or going to college in the
examples considered earlier.

P(Y, =I) = F(x,.P),

where

x,.= (V-J,
w is the s-element row vector describing the relevant attributes of the alternatives
and rr, is the m-element row vector describing the relevant socioeconomic
characteristics of the t th individual.
We recall that a likelihood function may be viewed in two ways: for purposes
of estimation we take the sample as given (here the Yt’s and x,.‘s) and regard it as
a function of the unknown parameters (here the vector j3) with respect to which it
is to be maximized; for purposes of deriving the limiting distribution of estima-
tors it is appropriate to think of it as a function of the dependent variable(s) - and
hence as one that encompasses the probabilistic structure imposed on the model.
This dual view of the likelihood function (LF) will become evident below.
The LF is easily determined to be

L*= fi P(xJqY’[l- F(X,.pp’. (22)


f=l

As usual, we find it more convenient to operate with its logarithm

lnL*=L= i {Y,lnF(x,.P)+(I-Y,)ln[I-F(x,.P)I}. (23)


r=1

For purposes of estimation, this form is unduly complicated by the presence of


the random variables, Yt’s. Given the sample, we will know that some of the Y,‘s
assume the value one and others assume the value zero. We can certainly
rearrange the observations so that the first TI I T observations correspond to

y,=l, t=1,2 ,..., T,,

while the remaining T, < T correspond to

Yr,+r = 0, t=1,2 ,..., T2,

If we give effect to these statements the log likelihood function becomes

L = 5 lnF(x,J)+ ‘i” ln[l- F(x,.P)], (24)


t=1 t=T,+l
Ch. 27: Limited Dependent Variables 1511

and as such it does not contain any random variables1 - even symbolically! Thus,
it is rather easy for a beginning scholar to become confused as to how, solving

aL
-=
ap
0
9
will yield an estimator, say 8, with any probabilistic properties. At least the
analogous situation in the GLM
y=xp+u,

using the standard notation yields

s = (X’X)_‘X’y,

and y is recognized to be a random variable with a probabilistic structure


induced by our assumption on the structural error vector u.
Thus, we shall consistently avoid the use of the form in (24) and use instead the
form in (23). As is well known, the ML estimator is found by solving

f(xJ)
- - fkB’a) ]Xf=
~=,~l[y~F(x,.s) (1 yf) 1
f
0. (25)

We note that, in general, (25) is a highly nonlinear function of the unknown


parameter j3 and, hence, can only be solved by iteration.
Since by definition a ML estimator, 8, is one obeying

L(b) 2 L(a), foralladmissiblefi, (26)

it is important to ensure that solving (25) does, indeed, yield a maximum in the
form of (26) and not merely a local stationary point - at least asymptotically.
The assumptions under which the properties of the ML estimator may be
established are partly motivated by the reason stated above. These assumptions
are
Assumption A.l.1.

The explanatory variables are uniformly bounded, i.e. x,, EH*, for all t, where H,
is a closed bounded subset of R s + m, i.e. the (s + m)-dimensional Euclidean space.

Assumption A.1.2.

The (admissible) parameter space is, similarly, a closed bounded subset of R,+,,,,
say, P* such that P* 3 N(/3O), where N( /3’) is an open neighborhood of the true
parameter point PO.

‘For any sample, of course, the choice of TI is random.


1578 P. J. Dhtymes

Remark I

Assumption (A.l.l.) is rather innocuous and merely states that the socioeconomic
variables of interest are bounded. Assumption (A.1.2.) is similarly innocuous. The
technical import of these assumptions is to ensure that, at least asymptotically,
the maximum maximorum of (24) is properly located by the calculus methods of
(25) and to also ensure that the equations in (25) are well defined by precluding a
singularity due to

F(x,./3) = 0 or l-F(x,.P)=O.

Moreover, these assumptions also play a role in the argument demonstrating the
consistency of the ML estimator.
To the above we add another condition, well known in the context of the
general linear model (GLM).

Assumption A. 1.3.

5et

x= (XJ t =1,2 ,..., T,

where the elements of x,. are nonstochastic. Then

lim X’X
rank(X)=s+m, -=M>O.
T-cc T

With the aid of these assumptions we can easily demonstrate (the proof will not
be given here) the validity of the following

Theorem I

Given assumption A.l.l. through A.1.3. the log likelihood function, L of (24) is
concave in p, whether -F( .) is the unit normal or the logistic c.d.f..

Remark 2

The practical implication of Theorem 1 is that, at any sample size, if we can


satisfy ourselves that the LF of (24) does not attain its maximum on the boundary
of the parameter space, then a solution to (25) say B, obeys

L(b) 2 L(P) for all admissible p.

On the other hand as the sample size tends to infinity then with probability one
the condition above is satisfied.
1580 P. J. Dhtymes

Unfortunately, in the case of the discrete choice models under consideration we


do not have a statistic that fits all three characterizations above. We can, on the
other hand, define one that essentially performs the first two functions.
In order to demonstrate these facts it will be convenient to represent the
maximized (log) LF more informatively. Assuming that the ML estimator corre-
sponds to an interior point of the admissible parameter space we can write

+ third order terms. (29)

The typical third order term involves

It is our contention that

plim +T = 0. (30)
T+CO

Now,

1 a3L
@*)=G,,,
TpttT ap,ap,ap,

is a well defined, finite quantity, where

But then, (30) is obvious since it can be readily shown that

1 a3L
TpFz~3/= apiapjap,
=Op

and moreover that


Ch. 27: Limited Dependent Variables 1581

are a.c. finite. Hence, for large samples, approximately

On the other hand, expanding aL by Taylor series we find


JP

aL
i
J7;@ (PI -
--
o

-[ f-&(P”)]h(B-a”)_
Thus,

and, consequently, for large samples

Hence

2m3)-m”)I - -(B-P~$j-#“)(B -PO>-x5+,. (31)

Consider now the hypothesis

Ho: p=o, (32)

as against

HI : pie.

Under Ho

L(PO) = 5 { y,lnF(O)+(l- y,)ln[l- F(O)]} = rln(+),


t=1

and

2[L(B)-Tlnil- x?+~,
1582 P. J. Dhrymes

is a test statistic for testing the null hypothesis in (32). On the other hand, this is
not a useful basis for defining an R2 statistic, for it implicitly juxtaposes the
economically motivated model that defines the probability of choice as a function
of

and the model based on the principle of insuficient reason which states that the
probability to be assigned to choice corresponding to the event 6’ and that
corresponding to its complement C?are both $. It would be far more meaningful
to consider the null hypothesis to be

i.e. to follow for a nonzero constant term, much as we do in the case of the GLM.
The null hypothesis as above would correspond to assigning a probability to
choice corresponding to event 8 by

J=F(flo) or B, = F-‘(J),

where

Thus, for some null hypothesis Ho, let

QP) = SUPW).
HO

By an argument analogous to that leading to (31) we conclude that

wm- &(“.)(a-PO>
UB)] - -(S-p”)’

L(B”)(P
+(a-p”)‘
~;;g -PO). (33)

In fact, (33) represents a transform of the likelihood ratio (LR) and as such it is a
LR test statistic. We shall now show that in the case where

Ho: P&=0,
Ch. 2 7: Limited Dependent Variables 1585

In the special case where

i.e. it is the constant term in the expression

x,.P,

so that no bona fide explanatory variables “explain” the probability of choice, we


can define R2 by

R”+$ (42)

The quantity in (42) has the property

1. R2 E [O,l)
ii. the larger the contribution of the bona fide variables to the maximum of the
LF the closer is R2 to 1
. ..
ul. R2 stands in a one-to-one relation to the &i-square statistic for testing the
hypothesis that the coefficients of the bona fide variables are zero. In fact,
under HO

-X(&R2 - x:+,,-~.

It is desirable, in empirical practice, that a statistic like R2 be reported and that a


constant term be routinely included in the specification of the linear functional

Finally, we should also stress that R2 as in (42) does not have the interpretation
as the square of the correlation coefficient between “predicted” and “actual”
observations.

2. Truncated dependent variables

2.1. Generalities

Suppose we have a sample conveying information on consumer expenditures; in


particular, suppose we are interested in studying household expenditures on
consumer durables. In such a sample survey it would be routine that many
1586 P. J. Dhtymes

households report zero expenditures on consumer durables. This was, in fact, the
situation faced by Tobin (1958) and he chose to model household expenditure on
consumer durables as

_Y*= Xt.P + l.4,’ if x,.p + 24,> 0


= 0 (43)
otherwise .

The same model was later studied by Amemiya (1973). We shall examine below
the inference and distributional problem posed by the manner in which the
model’s dependent variable is truncated.

2.2. Why simple OLS procedures fail

Let us append to the model in (43) the standard assumptions that

(A.2.1.) The {u,: t=l,2 ,... } is a sequence of i.i.d. random variables with

u, - NO, e2>, a2 E (0,cCJ).

(A.2.2.) The elements of x,. are bounded for all t, i.e.

lx,il < ki, for all t, i=1,2 7.0.) n,

are linearly independent and

exists as a nonsingular nonstochastic matrix.


(A.2.3.) If the elements of x,. are stochastic, then x,., uI, are mutually indepen-
dent for all t, t’, i.e. the error and data generating processes are mutually
independent.
(A.2.4.) The parameter space, say H c Rn+2, is compact and it contains an open
neighborhood of the true parameter point (PO’, $)I.

The first question that occurs is why not use the entire sample to estimate /3?
Thus, defining

x= (x,. 1, t =I,2 ,..., T,

u= (u442,...,4’, Y (l)= (Yl, Yz,..., YJ, y@‘= (O,...,O)‘,


y = ( yw’, y(V)‘,
Ch. 27: Limited Dependent Variables 1587

we may write

y=xfi+u,
and estimate /3 by

fi = ( XlX) -lx/y. (44


A little reflection will show, however, that this leads to serious and palpable
specification error since in (43) we do not assert that the zero observations are
generated by the same process that generates the positive observations. Indeed, a
little further reflection would convince us that it would be utterly inappropriate to
insist that the same process that generates the zero observations should also
generate the nonzero observations, since for the zero observations we should have
that

u, = - x*.p, t=T,+,,...,T,+T,,

and this would be inconsistent with assumption (A.l.l.).


We next ask, why not confine our sample solely to the nonzero observations,

Y(l) = x,p+ yl),

and thus estimate p by

#d= ( xix,) -lx;y(l).

This may appear quite reasonable at first, even though it is also apparent that we
are ignoring some (perhaps considerable) information. Deeper probing, however,
will disclose a much more serious problem. After all, ignoring some sample
elements would affect only the degrees of freedom and the t- and F-statistics
alone. If we already have a large sample, throwing out even a substantial part of it
will not affect matters much. But now it is in order to ask: What is the process by
which some dependent variables are assigned the value zero? A look at (43)
convinces us that it is a random process governed by the behavior of the error
process and the characteristics relevant to the economic agent, x,.. Conversely,
the manner in which the sample on the basis of which we shall estimate fi is
selected is governed by some aspects of the error process. In particular we note
that for us to observe a positive y,, according to

Y, = x,.P + a,, (45)


1588 P. J. Dhrymes

the error process should satisfy

u, > - x,.p. (46)

Thus, for the positive observations we should be dealing with the truncated
distribution function of the error process. But, what is the mean of the truncated
distribution? We have, if f( .) is the density and F( .) the c.d.f. of U,
1
E( U,]U, > - x,.P) = O” 5f(Odk
l-F(-x,.P) / -x,.B

If f( .) is the iV(0, u*) density the integral can be evaluated as

f co>?
and, in addition, we also find

I - F( - x,./3> = F(x,.P).

Moreover, if we denote by +(. ), G(e) the iV(0, 1) density and c.d.f., respectively,
and by

y=- X*.P
t u ’
(47)

then

E( u,Ju, > + x,./3) = u- &t>


@(v,) = a+t*
Since the mean of the error process in (45) is given by (48) we see that we are
committing a misspecification error by leaving out the “variable” +( v,)/@(v,)
[see Dhrymes (1978a)].
Defining

(49)

we see that {u,: r=l,2,...} is a sequence of independent but non-identically


distributed random variables, since

Var(u,)=a2(1-v,$,--1C/:). (50)

Thus, there is no simple procedure by which we can obtain efficient and/or


consistent estimators by confining ourselves to the positive subsample; conse-
quently, we are forced to revert to the entire sample and employ ML methods.
Ch. 27: Limited Dependent Variables 1589

2.3. Estimation of parameters with ML methods

We are operating with the model in (43), subject to (A.2.1.) through (A.2.4.) and
the convention that the first Ti observations correspond to positive dependent
variables, while the remaining T2, (Tl + T2 = T), correspond to zero observations.
Define

c, =1 if yr > 0,
= 0 otherwise, (51)

and note that the (log) LF can be written as

L= 5 ((1-c,)lnB(v,)-c,[+ln(2n)+flno2+-$(~~-x~.~)”]).
t=l

(52)

Differentiating with respect to y = (fi’, u2)‘, we have

8L
-; i {(l-~~)~-,(y~-~~.pi)x,.=o, (53)
ap=
r=1

aL
-=_- 1
i {c~[l-~(~~-x~.8,‘1-(1-c~)~) =O,
au2 2e2 t=i

and these equations have to be solved in order to obtain the ML estimator. It is,
first, interesting to examine how the conditions in (53) differ from the equations
to be satisfied by simple OLS estimators applied to the positive component of the
sample. By simple rearrangement we obtain, using the convention alluded to
above,

x;x,p = x;y(l) - u i q(- v,)x;., (54)


r=T,+l

(55)

where

+(v,> 445)
SW = qv,> Y c-5) = @(_vt). (56)

Since these expressions occur very frequently, we shall often employ the abbrevia-
15W P. J. Dhrymes

ted notation

#, = ~(V,>~ #T=IC,(-VA.
Thus, if in ‘some sense

is negligible, the ML estimator, say j?, could yield results that are quite similar,
from an applications point of view, to those obtained through the simple OLS
estimator, say fi, as applied to the positive component of the sample. From (54) it
is evident that if z$.. of (57) is small then

is also small. Hence, under these circumstances

which explains the experience occasionally encountered in empirical applications.


The eqs. (53) or (54) and (55) are highly nonlinear and can only be solved by
iterative methods. In order to ensure that the root of

aL
-&=o, Y= (P’,a=)‘>
so located is the ML estimator it is necessary to show either that the equation
above has only one root-which is difficult-or that we begin the iteration with an
initial consistent estimator.

2.4. An initial consistent estimator

Bearing in mind the development in the preceding section we can rewrite the
model describing the positive component of the sample as

Yr=Xr_P+a~t+Ut=u(v,+~,)+u,, (58)

such that

{u,: t=1,2 )... },


Ch. 27: Limited Dependent Variables 1591

is a sequence of mutually independent random variables with

ECU,) = 0, Va&) = e2(I- v,JI, - +:>, (59)

and such that they are independent of the explanatory variables x,.
The model in (58) cannot be estimated by simple means owing to the fact that
4, is not directly observable; thus, we are forced into nonstandard procedures.
We shall present below a modification and simplification of a consistent
estimator due to Amemiya (1973). First we note that, confining our attention to
the positive component of the sample

y:=u2(v,+~t)2+U:+2ut(vt+~r)a. (60)
Hence

E( y:lxt., u, > - x,.p) = u2(v:+v,#,)+rJ2


=x,.PE(ytlx,.,Ur> -x,.P)+u2. (61)
Defining

Et = y,2- E(Ytk., u, ’ - x,.b), (62)


we see that {Ed: t=l,2,...} is a sequence of independent random variables with
mean zero and, furthermore, we can write

w, = yt2= x,.py,+ u*+Et, t =1,2 ,..., T,. (63)

The problem, of course, is that JJ~ is correlated with E, and hence simple
regression will not produce a consistent estimator for p and u2.
However, we can employ an instrumental variables (IV.) estimator3

7 = (x:x*)-‘X&w, w= (Wgv2,...,W~J’, (64)

31t is here that the procedure differs from that suggested by Amemiya (1973). He defines

j, =x,. ( xpJ1x;y’“,

while we define

~,=+.a,

for nontrivial vector a.


1592 P. J. Dhrymes

where

X, = (D,X,,e), (65)

and

jt = x,.a, D~=diag(g,,~~,...,~,,), Dy = (~1, Y,,..., YT,), (66)

for an arbitrary nontrivial vector a.


It is clear that by substitution we find

(67)

We easily establish that

Clearly
2, x =
* *
XP&JX,
Y’X,
XlV
e’e 1.

Now

gx;“, =+ cr, x:.ut,


1 1 t=1

and

{x;..,: t=1,2 )... },

is a sequence of independent random variables with mean

E(x;.u,) = ax;.+,, (68)


and covariance matrix

cov(x;.u,) = a*(1 - v,$br- ~,)x:.x,. = O&.Xt., (69)


Ch. 27: Limited Dependent Variables 1593

where

w, = a2(1- v,J/, - +:>,


is uniformly bounded by assumption (A.2.2) and (A.2.4). Hence, by (A.2.2)

lim f; w&.x,.,
r, + 00 1 t=1

converges to a matrix with finite elements. Further and similar calculations will
show that

converges a.c. to a nonsingular matrix. Thus, we are reduced to examining the


limiting behavior of

(70)

But this is a sequence of independent nonidentically distributed random


variables with mean zero and uniformly bounded (in x,. and j3) moments to any
finite order. Now for any arbitrary (n + 2 x 1) vector (Y*consider

(71)

where

and note that

is well defined where

S$’ = z afVar( et). (72)


r=1
1594 P. J. Dhrynes

Define, further

se2
g,=+,
1

and note that

S; = T;/2ST,.

But then it is evident that Liapounov’s condition is satisfied, i.e. with K a uniform
bound on Ela,.~,[~+’

t=1 Tl K
lim s*2+s
SK lim = lim 0.
TI - cc T+m T1+s/2S;,+a T,-tm T;/2S2i8 =
TI 1

By a theorem of Varadarajan, see Dhrymes (1970), we conclude that

&f:
- E- N(0, H),

where

1
; (x,.a)2x:.xt.Var(Et) 5 (x,.a)x:.Var(&,)
t=1
H= lim 1 ‘=‘r,
(73)
T-cc Tl
5 Var(Et) ’
t=1

Consequently we have shown that

\IT,(? - Y) - ~(0, Q-QQ-I),


where

Q=lim
( zcx*>
a.c. Tl .
(74)

Moreover since
Ch. 27: Limited Dependent Variables 1595

where 3 is an a.c. finite random vector it follows that

which shows that 7 converges a.c. to ya.


We may summarize the development above in

Lemma 1
Consider the model in (43) subject to assumptions (A.2.1.) through (A.2.4.);
further consider the I.V. estimator of the parameter vector y in

w, = (Xt.Y,J)Y + &I, w,= Y,‘,


given by

4 = (X&X*)_lk;w,

where k,, X, and w are as defined in (65) and (66). Then

i. 7 converges to yO almost certainly,


ii. JT,(v - ~a) - N(0, Q-lHQ,-l),

where Q and H are as defined in (74) and (73) respectively.

2.5. Limiting properties and distribution of the ML estimator

Returning now to eqs. (53) or (54) and (55) we observe that since the initial
estimator, say 9, is strongly consistent, at each step of the iterative procedure we
get a (strongly) consistent estimator. Hence, at convergence, the estimator so
determined, say p, is guaranteed to be (strongly) consistent.
The perceptive reader may ask: Why did we not use the apparatus of Section
1.d. instead of going through the intermediate step of obtaining the initial
consistent estimator? The answer is, essentially, that Theorem 1 (of Section 1.d.)
does not hold in the current context. To see that, recall the (log) LF of our
problem and write it as

L,(y)=+ f ((1-c,)ln@(-v,)-c,
t=l

(75)
Ch. 27: Limited Dependent Variables 1597

Proof

Consider the log LF of (75) and in particular its t th term

Er= (l-c,)ln@(-Y,)-c, +l(2n)+ +no2+ -&, - XJ?)” )


I
t =1,2,... . (77)
For any x-realization

{&: t =1,2,...},

is a sequence of independent random variables with uniformly bounded moments


in virtue of assumption (A.2.1) through (A.2.3). Thus, there exists a constant, say
k, such that

V=(Sr) < k> for all t.

Consequently, by Kolmogorov’s criterion, for all admissible y,

{ MY)-@T(Y)I} =<o. Q.E.D.

Remark 3
The device of beginning the iterative process for solving (76) with a consistent
estimator ensures that for sufficiently large T we will be locating the estimator,
say j$, satisfying

G(%-) = suP&-(Y).
Y
Lemma 2, can be shown to imply that

L,(?,) a-(u,Yo). qu,YO)= suPqY,Y").


Y
Moreover, we can also show that

7 = yo.

On the other hand, it is not possible to show routinely that fraz’yo. Essentially,
the problem is the term corresponding to a2 which contains expressions like

c
*bt - a2Q)”.’
1598 P. J. Dhrymes

which cannot be (absolutely) bounded. This does not prevent us from showing
convergence a.c. of pr. to y”. By the iterative process we have shown that qT
converges to y” at least in probability. Convergence a.c. is shown easily once we
obtain the limiting distribution of PT -a task to which we now turn.
Thus, as before, consider the expansion

(78)

where y” is the true parameter point and

1% - YOI 5 IY* - YOI.

We already have an explicit expression in eq. (53) for the derivative dL,/dy. So
let us obtain the Hessian of the LF. We find

We may now define

m> x,.PO
Elr=(l-c*)@(_vp) -cf
i
Yt -
uo

I
’ (80)

t2t=ct[l-(
y~-;J”~]-(l-c,)~!~;)
and

&l,=(l-c,)~r”(lClfo-vP)+CI,
+ VP- vM”>9
L = t21r= Cl- c,)J/tO(l (81)
(2214(y+yO) +(1- Ct)V$#=O(l + v#T” - vP2),
Ch. 27: Limited Dependent Variables 1599

where, evidently,

d+P) x .PO +(vP)


q;O= @(_vp) 9
VP=*> #Y=+p> *

With the help of the notation in (80) and (81) we find

i I[]
~(,0)_!_~ i
t=1
a.& tit
0

521 ’
(8’4

and

where 52, r is a matrix all of whose elements are zero except the last diagonal
element, which is

+ ,$r $-&..

Thus, for every T we have

E(ti*,) =o. (84)

Consequently, we are now ready to prove


Theorem 3
Consider the model of eq. (43) subject to assumption (A.2.1.) through (A.2.4.);
moreover, consider the ML estimator, SIT, obtained by iteration from an initial
consistent estimator as a solution of (76). Then

JT(Pr-YO) - N(O,&-‘),

where
Ch. 27: Limited Dependent Variables 1601

From (79) we also verify that

converges in probability to the null matrix, element by element. But the elements
of

are seen to be sums of independent random variables with finite means and
bounded variances; hence, they obey a Kolmogorov criterion and thus

We easily verify that

E(&lJ = %f? E(~12,)=E(52,,)=~(~~)[l-~~~T"+~P2]

=W 121 = W21r3

EG22t) = W22t.

Hence

and, moreover,

JT( f - yO) - N(O,40 (Q.E.D.)


Ch. 27: Limited Dependent Variables 1603

However, it would be much preferable to estimate C as

with Gi ;, given as in (86) evaluated at PT.

2.6. Goodness of $2

In the context of the truncated dependent variable model the question arises as to
what we would want to mean by a “goodness of fit” statistic.
As analyzed in the Section on discrete choice models the usual R*, in the
context of the GLM, serves a multiplicity of purposes; when we complicate the
process in which we operate it is not always possible to define a single statistic
that would be meaningful in all contexts.
Since the model is

YI= X*.P + u, if x,.p + U, > 0,


= 0 if u,l - x~,@,

the fitted model may “describe well” the first statement but poorly the second or
vice versa. A useful statistic for the former would be the square of the simple
correlation coefficient between predicted and actual Y,. Thus, e.g. suppose we
follow our earlier convention about the numbering of observations; then for the
positive component of the sample we put

3, = x,.B + &, t =1,2 ,..., Tl. 638)

An intuitively appealing statistic is

(89)

where

j=+ $Yt, p+ &. (90)


1 t=1 1 t=l
1604 P. J. Dhtymes

As to how well it discriminates between the zero and positive (dependent


variable) observations we may compute @( - P() for all t; in the perfect dis-
crimination case

@(- fi,J’ @(- 4,), t,=1,2 ,..., T,, t2=Tl+1,...,T. (91)

The relative frequency of the reversal of ranks would be another interesting


statistic, as would the average probability difference, i.e.

We have a “right” to expect as a minimum that

d> 0. (93)

3. Sample selectivity

3. I. Generalities

This is another important class of problems that relate specifically to the issue of
how observations on a given economic phenomenon are generated. More particu-
larly, we hypothesize that whether a certain variable, say yz, is observed or not
depends on another variable, say y12 *. Thus, the observability of y,T depends on the
probability structure of the stochastic process that generates y,;, as well as on that
of the stochastic process that governs the behavior of yzT. The variable y,; may be
inherently unobservable although we assert that we know the variables that enter
its “systematic part.”
To be precise, consider the model

t =1,2 >..., T, (94)

where x2 ., x;C2. are rl, r,-element row vectors of observable “exogenous” variables
Ch. 27: Limited Dependent Variables 1605

which may have elements in common. The vectors

u::=(u;,ur,), t=1,2,...,

form a sequence of i.i.d. random variables with distribution

uy- N(0, ix*), 2*>0.

The variable yz is inherently unobservable, while y,T is observable if and only if

An example of such a model is due to Heckman (1979) where y,T is an observed


wage for the tth worker and y,; is his reservation wage. Evidently, y,; is the
“market valuation” of his skills and other pertinent attributes, represented by the
vector x2 ., while y$ represents, through the vector x; those personal and other
relevant attributes that lead him to seek employment at a certain wage or higher.
Alternatively, in the market for housing y,T would represent the “market
valuation” of a given structure’s worth while yl; would represent the current
owner’s evaluation.
Evidently a worker accepts a wage for employment or a structure changes
hands if and only if

If the covariance matrix, Z*, is diagonal, then there is no correlation between yz


and y;” and hence in view of the assumption regarding the error process

{uf.‘: t=1,2 )... },

we could treat the sample

{(y&x;.): t=1,2 ,..., T},

as one of i.i.d. observations; consequently, we can estimate consistently the


parameter vector 8.: by OLS given the sample, irrespectively of the second
relation in (94).
On the other hand, if the covariance matrix, Z*, is not diagonal, then the
situation is far more complicated, since now there does exist a stochastic link
between ytT and y$. The question then becomes: If we apply OLS to the first
equation in (94) do we suffer more than just the usual loss in efficiency?
1606 P. J. Dhtynes

3.2. Inconsistency of least squares procedures

In the current context, it would be convenient to state the problem in canonical


form before we attempt further analysis. Thus, define
* * *
Y,l = Yll 3 Yt2 = Y,l - Yt2 9 x,1.=x,1., x,2.= (x;.,x;.),

* (95)
P.i=P.:, p.2=*p. Utl=UX, *_ *
ut2 = U,l u,23

i
-a’: i

with the understanding that if x;“l. and x2. have elements in common, say,

x:1.= (r,r, r:,), x,*2. = (Z,i., &>,

then

h-P”;2\

x,2.= ( zt+;P1.,z;.), P.2= P((;l 9 (96)


\ - P.*22 1

where /?.:i, /3.t2 are the coefficients of z,i in x;“l_ and xz respectively, 8; is the
coefficient of zx. and p.;2 is the coefficient of zfrz.
Hence, the model in (94) can be stated in the canonical form

Yt, = xt1.P.1 + u,17


(97)
i Y,, = x,2 4.2 + ut2 )

such that x,~ contains at least as many elements as x,~ .,

(u:.=(url,u,,)‘: t=1,2 )... ),

is a sequence of i.i.d. random variables with distribution

u;. - NO, z>, z>o,

and subject to the condition that ytl is observable (observed) if and only if
Y,, 2 0.

If we applied OLS methods to the first equation in (97) do we obtain, at least,


consistent estimators for its parameters? The answer hinges on whether that
question obeys the standard assumptions of the GLM.
Clearly, and solely in terms of the system in (97)

{ u,i: t =1,2,...}, (98)

is a sequence of i.i.d. random variables and if in (94) we are prepared to assert


Ch. 27: Limited Dependent Variables 1607

that the standard conditions of the typical GLM hold, nothing in the subsequent
discussion suggests a correlation between x,r. and u,~; hence, if any problem
should arise it ought to be related to the probability structure of the sequence in
(98) insofar as it is associated with observable ytI -a problem to which we now
turn. We note that the conditions hypothesized by the model imply that (poten-
tial) realizations of the process in (98) are conditioned on4

u,2 2 -x,24.2. (99)

Or, perhaps more precisely, we should state that (implicit) realizations of the
process in (98) associated with observable realizations

{Y*G t =1,2,...},

are conditional on (99). Therefore, in dealing with the error terms of (potential)
samples the marginal distribution properties of (98) are not relevant; what are
relevant are its conditional properties-as conditioned by (99).
We have

Lemma 3

The distribution of realizations of the process in (98) as conditioned by (99) has


the following properties:
i. The elements { u,r, ut2 } are mutually independent for t f t’.
ii. The density of u,r, given that the corresponding y,, is observable (ob-
served) is

000)

where

1
=- xt2.P.2 7rI =- y,,+---u
P12
YI2 l/2 ' J/2 l/2
r1 )
022 i 011 i
001)
2 =- 42
P12 a=l- pt2,
~11~22

and @( .) is the c.d.f. of a N(O,l).

4Note that in terms of the original variables (99) reads

We shall not use this fact in subsequent discussion, however.


1608 P. J. Dhrynes

Proof

i. is quite evidently valid since by the standard assumptions of the GLM we assert
that (x:.,x,*.) and uF=(u;, u&) are mutually independent and that

{ 24;‘: 1=1,2,...},

is a sequence of i.i.d. random variables.


As for part ii. we begin by noting that since the conditional density of url given
u,~ is given by

and since the restriction in (99) restricts us to the space

u,2 2 -x,2./3.2,

the required density can be found as

Completing the square (in 5) and making the change in variable

s= (5- f31)/(42cxY2,
we find

1 7
f(u,,lu,z~-x,2,~.2)=~1
@( Yt2) \/2?ra,, exp- z”“.

To verify that this is, indeed, a density function we note that it is everywhere
nonnegative and

/
O”
-CC
f
(51142 2 - X,,.P.2M

1 O3~ 1 4
=-
@(Yr2) / _-oo\/j----& _J&P- $3~2 *exp- g-p,.
ii I
Ch. 27: Limited Dependent Variables 1609

Making the transformation

f2 = .1’25* - P,231,

the integral is reduced to

-exp- f<zd{2 =l. Q.E.D.

Lemma 4

The k th moment of realizations of the process in (98) corresponding to observ-


able realizations { ytI: t =1,2, . . . } is given, for k even (k = 2,4,6,. . .), by

(k -2s k _ 1

I,., = udk - Wc-2,, - 011k’2~(k-2)‘2p:2y~21C,(y12) x0 [,,+,)( *)’

[2( y-r)]!
0021
*2+r(&p_,)!’

while for k odd (k = 3,5,7,. . . > it is given by

&--I
, 003)

where

+bt2>
4(%*)
--
- qvt2) 3 I,,, =I, I,,, = d12P,24(V,2). 004)
1610 P. J. Dhtymes

Remark 5
It is evident, from the preceding discussion, that the moments of the error process
corresponding to observable y,, are uniformly bounded in P.r, P. 2, urr, ur2, uz2,
xI1, and x12, -provided the parameter space is compact and the elements of
x,, ., x,~ are bounded.
Remark 6
It is also evident from the preceding that for (potential) observations from the
model

Yt, =x,14.1 + U,l,


we have that

(105)

We are now in a position to answer the question, raised earlier, whether OLS
methods applied to the first equation in (97) will yield at least consistent
estimators. In this connection we observe that the error terms of observations on
the first equation of (97) obey

E(U,llUt2 2 - x,2. P.2) = 11, = 4’Pl24+,2)~

Vadu,llq2 2 - x,2.P.2> = 12t - 1: = ql- ~w212~~2ICI(~,2)

- %P:2+2h2)

= 011 - w:2~h,h2 + +(52)1.

As is well known, the second equation shows the errors to be heteroskedastic -


whence we conclude that OLS estimators cannot be eficient. The first equation
above shows the errors to have a nonzero mean. As shown in Dhrymes (1978a) a
nonconstant (nonzero) mean implies misspecification due to left out variables and
hence inconsistency.
Thus, OLS estimators are inconsistent; hence, we must look to other methods
for obtaining suitable estimators for p_r, uir, etc. On the other hand, if, in (105),
p12 = 0, then OLS estimators would be consistent but inefficient.

3.3. The LF and ML estimation

We shall assume that in our sample we have entities for which y,r is observed and
entities for which it is not observed; if ytl is not observable, then we know that
Ch. 27: Limited Dependent Variables 1611

yt2 -c 0, hence that

u,z-c- xt2.P.2.

Consequently, the probability attached to that event is

Evidently, the probability of observing ytr is @(vt2) and given that ytI is observed
the probability it will assume a value in some internal A is

-+‘d.&
exp- %r

Hence, the unconditional probability that ytI will assume a value in the interval A
is

Define

c, =l if y,, is observed,
= 0 otherwise.

and note that the LF is given by

L*= fi [~(Y12)f(Y~1-x,,.P.llu,22
-Xt2.P.2)IC([~(-Y12)11-c,. (106)
r=l

Thus, e.g. if for a given sample we have no observations on y,, the LF becomes

while, if all sample observations involve observable y,,‘s the LF becomes

Finally, if the sample contains entities for which y,, is observed as well as entities
1612 P. J. Dhymes

for which it is not observed, then we have the situation in (106). We shall examine
the estimation problems posed by (106) in its general form.
Remark 7

It is evident that we can parametrize the problem in terms of /?.i, P. 2, utr, u22,a,,;
it is further evident that j3.2 and a,, appear only in the form (/I.2/u:2/2) - hence,
that a,, cannot be, separately, identified. We shall, thus, adopt the convention

022 =
1. (107)
A consequence of (107) is that (105) reduces to
KG?,., u,2 2 - x,,.P.,> = x,1.P.1+ fJl244d. 008)

The logarithm of the LF is given by

L= g (l-c,)ln@(--v,,)
t=1

+ c, - iln(27r) - +lnu,, 2i
- --$,1-x,1.&)“]
]

+ln@[--$+2+~12( yrl-$$““))]}. (109)

Remark 8

We shall proceed to maximize (109) treating p.i, /3.2 as free parameters. As


pointed out in the discussion following eq. (95) the two vectors will, generally,
have elements in common. While we shall ignore this aspect here, for simplicity of
exposition, we can easily take account of it by considering as the vector of
unknown parameters y whose elements are the distinct elements of /3.1,j3.2 and
IJll, PI20
The first order conditions yield

(110)

(111)
Ch. 27: Limited Dependent Variables 1613

Putting

we see that the ML estimator, say ji, is defined by the condition

g(p) =0.
Evidently, this is a highly nonlinear set of relationships which can be solved
only by iteration, from an initial consistent estimator, say 7.

3.4. An initial consistent estimator

To obtain an initial consistent estimator we look at the sample solely from the
point of view of whether information is available on y,i, i.e. whether ytl is
observed with respect to the economic entity in question. It is clear that this, at
best, will identify only /3.*, since absent any information on y,, we cannot
possibly hope to estimate p.i. Having estimated B. 2 by this procedure we proceed
to construct the variable

t=1,2 ,..., T. 016)

Then, turning our attention to that part of the sample which contains observa-
tions on yti, we simply regress ytl on (x,i., 4,). In this fashion we obtain
estimators of

6= %2)’
(PC12 (117)

as well as of uii.
Examining the sample from the point of view first set forth at the beginning of
this section we have the log likelihood function

L,= f [c,ln~(v,,)+(l-c,)ln~(-v,,)], 018)


t=1

which is to be maximized with respect to the unknown vector /3.2. In Section 1.d.
we noted that L, is strictly concave with respect to p. 2; moreover, the matrix of
Ch. 27: Limited Dependent Variables 1615

It is our contention that the estimator in (125) is consistent for β.1 and σ_12; moreover, that it naturally implies a consistent estimator for σ_11, thus yielding the initial consistent estimator, say

δ̃ = (β̃.1', σ̃_12)',

which we require for obtaining the ML estimator.


Formally, we will establish that

√T (δ̃ - δ^0) ~ N(0, F),   (130)

for a suitable matrix F, thus showing that δ̃ converges to δ^0 with probability one (almost surely).
In order to accomplish this task we must specify more precisely the conditions under which we consider the model⁵ in (94), as expressed in (97). We have:
(A.3.1.) The basic error process

{u_t. : t = 1,2,...},   u_t. = (u_t1, u_t2),

is one of i.i.d. random variables with mean zero and (positive definite) covariance matrix, and is independent of the process generating the exogenous variables

x_t1., x_t2..
(A.3.2.) The admissible parameter space, say H ⊂ R^{n+3}, is closed and bounded and contains an open neighborhood of the true parameter point γ^0.
(A.3.3.) The exogenous variables are nonstochastic and are bounded, i.e.

|x_t2i| < k_i,   i = 0,1,2,...,n,

for all t.⁶

⁵As pointed out earlier, it may be more natural to state conditions in terms of the basic variables x*_t1., x*_t2., u*_t1 and u*_t2; doing so, however, will disrupt the continuity of our discussion; for this reason we state conditions in terms of x_t1., x_t2., u_t1 and u_t2.

(A.3.4.) The matrix

X* = (x_t2.),   t = 1,2,...,T,

is of rank n + 1 and, moreover,

lim_{T→∞} (1/T) X*'X* = P,   P > 0.

Remark 9

It is a consequence of the assumptions above that, for any x12, and admissible
P.2r there exists k such that

- r I xt2 ,P.2 ~2r, O-cr-ck, k<oo,

so that, for example,

44x,,&) WCk) ‘0,


@(x,~.&) <@(k) ~1, (131)
@(xt2.P.~) >@(-+O.

Consequently,

+,dM +,,.P.d
v+J= @(Xt2.&) ’ J/*w=@(- x,,.p.,>’

are both bounded continuous functions of their argument.


To show the validity of (130) we proceed by a sequence of Lemmata.

Lemma 5

The probability limit of the matrix to be inverted in (125) is given by

plim_{T→∞} (1/T) X_1*'X_1* = lim_{T→∞} (1/T) X_1^0'X_1^0 = Q_1,   Q_1 > 0,

where X_1* is the matrix of the second-stage regressors (x_t1., ψ̂_t) and X_1^0 is the same matrix with ψ_t evaluated at the true parameter point β.2^0.

⁶We remind the reader that in the canonical representation of (97), the vector x_t1. is a subvector of x_t2.; hence the boundedness assumptions on x_t2. imply similar boundedness conditions on x_t1.. Incidentally, note that β.1 is not necessarily a subvector of β.2, since the two vectors are built up differently from the parameters of the underlying behavioral relations.
Proof

We examine

S_T = (1/T)[X_1*'X_1* - X_1^0'X_1^0]
    = (1/T) [ 0                  X_1'(ψ̂ - ψ^0)
              (ψ̂ - ψ^0)'X_1     (ψ̂ + ψ^0)'(ψ̂ - ψ^0) ],   (132)

and the problem is reduced to considering the expansion

ψ̂_t = ψ_t^0 + (∂ψ_t/∂β.2)(β̂.2 - β.2^0) + (1/2)(β̂.2 - β.2^0)' s_t* (β̂.2 - β.2^0),   (133)

where ∂ψ_t/∂β.2 is evaluated at β.2 = β.2^0,

s_t* = ∂²ψ(v_t2)/∂β.2 ∂β.2',   evaluated at β.2 = β̄.2,

|β̄.2 - β.2^0| < |β̂.2 - β.2^0|.

It is evident that, when the expansion in (133) is incorporated in (132), quadratic terms in (β̂.2 - β.2^0) will vanish with T. Hence we need be concerned only with terms of the form

(1/T) Σ_{t=1}^T x_t1.' (∂ψ_t^0/∂β.2)(β̂.2 - β.2^0),

or of the form

(1/T) Σ_{t=1}^T ψ_t^0 (∂ψ_t^0/∂β.2)(β̂.2 - β.2^0).

In either case we note that, by assumption (A.3.4.) and Remark 9,

lim_{T→∞} (1/T) Σ_{t=1}^T (dψ_t^0/dv) x_t2.' x_t1.

has bounded elements; similarly for the term involving ψ_t^0. Consequently, in view of (122) and (132), we conclude

plim_{T→∞} S_T = 0,

which implies

plim_{T→∞} (1/T) X_1*'X_1* = lim_{T→∞} (1/T) X_1^0'X_1^0 = Q_1.   (134)

Corollary 4

The limiting distribution of the left member of (130) is obtainable through

√T (δ̃ - δ^0) ~ Q_1^{-1} (1/√T) X_1^0' [u_.1 - σ_12(ψ̂ - ψ^0)],   u_.1 = (u_11, u_21, ..., u_T1)'.

Indeed, by standard argumentation we may establish

Theorem 4

Under assumptions (A.3.1) through (A.3.4) the initial (consistent) estimator of this section has the limiting distribution

√T (δ̃ - δ^0) ~ N(0, F),   F = Q_1^{-1} P Q_1^{-1},

where P is the covariance matrix of the limiting distribution of (1/√T) X_1^0'[u_.1 - σ_12(ψ̂ - ψ^0)], Q_1 is defined in (134) and

Var(u_t1) = ω_11t = σ_11 [1 - ρ_12² v_t2 ψ_t - ρ_12² ψ_t²].

Corollary 5

The initial estimator above is strongly consistent.

Proof

From the theorem above,

√T (δ̃ - δ^0) ~ ζ,

where ζ is an a.c. finite random vector. Thus

δ̃ converges to δ^0 a.c.

Evidently, the parameter σ_11 can be estimated (at least consistently) by

σ̂_11 = (1/T) Σ_t [û_t² + σ̂_12² v̂_t2 ψ̂_t + σ̂_12² ψ̂_t²].

3.5. Limiting distribution of the ML estimator

In the previous section we outlined a procedure for obtaining an initial estimator, say γ̃, constructed from β̃.2, δ̃ and σ̃_11, and have shown that it converges to the true parameter point, say γ^0, with probability one (a.c.).
We now investigate the properties of the ML estimator, say γ̂, obtained by solving

g(γ) = 0.

To this effect define the quantities ξ_.t and A_t of eqs. (138) and (139), built up from c_t, v_t2, ψ_1(v_t2), ψ_2(v_t2), ρ_12 and u_t1/σ_11^{1/2}. In the expressions of (138) and (139) we have replaced, for reasons of notational economy only, (y_t1 - x_t1.β.1)/σ_11^{1/2} by u_t1/σ_11^{1/2}.

Remark 10

The starred symbols, for example ξ*_41t, ξ*_33t, ξ*_44t, all correspond to components of the Hessian of the log LF having mean zero. Hence, such components can be ignored both in determining the limiting distribution of the ML estimator and in its numerical derivation, given a sample. We can, then, represent the Hessian of the log of the LF as

∂²L/∂γ ∂γ' = Σ_{t=1}^T (Ω_t + Ω_t*),

where Ω_t* contains only zeros or elements having mean zero. It is also relatively straightforward to verify that

E(Ω_t) = A_t Cov(ξ_.t) A_t',

where the elements of A_t, ξ_.t and Ω_t have been evaluated at the true parameter point γ^0.
To determine the limiting distribution of the ML estimator (i.e. the converging iterate beginning with an initial consistent estimator) we need

Lemma 6

Let A_t, ξ_.t be as defined in (139) and (138); then,

(1/√T) Σ_{t=1}^T A_t ξ_.t ~ N(0, C*),

where

C* = lim_{T→∞} (1/T) Σ_{t=1}^T A_t Cov(ξ_.t) A_t' = lim_{T→∞} (1/T) Σ_{t=1}^T E(Ω_t).   (141)

Proof

The sequence

{A_t ξ_.t : t = 1,2,...}

is one of independent nonidentically distributed random vectors with mean zero and uniformly bounded moments to any finite order; moreover, the sequence obeys a Liapounov condition. Consequently,

(1/√T) (∂L/∂γ)(γ^0) ~ N(0, C*).   (Q.E.D.)

An explicit representation of Ω_t or C* is omitted here because of its notational complexity. To complete the argument concerning the limiting distribution of the ML estimator we obtain the limit of

(1/T) (∂²L/∂γ ∂γ')(γ),   γ ∈ H.

Again, for the sake of brevity of exposition, we shall only state the result without proof.

Lemma 7

Under assumptions (A.3.1) through (A.3.4),

(1/T) (∂²L/∂γ ∂γ')(γ) converges a.c. to a well-defined matrix limit,

uniformly in γ.

where ζ is a well defined a.c. finite random variable. Hence,

Corollary 7

The matrix in the expansion of (135) obeys

(1/T) (∂²L/∂γ ∂γ')(γ*) → lim_{T→∞} (1/T) E[(∂²L/∂γ ∂γ')(γ^0)]   a.c.

Proof

Lemma 7 and Corollary 6.

3.6. A test for selectivity bias

A test for selectivity bias is formally equivalent to a test of

H₀: ρ_12 = 0,   or   γ = (β.1', β.2', σ_11, 0)',

as against the alternative

H₁: γ unrestricted (except for the obvious conditions σ_11 > 0, ρ_12 ∈ [0,1]).

From the likelihood function in eq. (109), the (log) LF under H₀ becomes

L(γ|H₀) = Σ_{t=1}^T { (1 - c_t) ln Φ(-v_t2) + c_t ln Φ(v_t2)

- (c_t/2) [ ln(2π) + ln σ_11 + (1/σ_11)(y_t1 - x_t1.β.1)² ] }.   (142)

We note that (142) is separable in the parameters (β.1', σ_11)' and β.2. Indeed, the ML estimator of β.2 is the “probit” estimator, β̃.2, obtained in connection with eq. (118) in Section 3.4; the ML estimator of (β.1', σ_11)' is the usual one obtained by least squares, except that σ_11 is estimated with bias - as all maximum likelihood procedures imply in the normal case. Denote the estimator of γ obtained under H₀ by γ̃. Denote by γ̂ the ML estimator whose limiting distribution was obtained in the preceding section.

Thus

λ = L(γ̃) - L(γ̂)   (143)

is the usual likelihood ratio test statistic. It may be shown that

-2λ ~ χ²₁.

We have thus proved

Theorem 6

In the context of the model of this section a test for the absence of selectivity bias can be carried out by the likelihood ratio (LR) principle. The test statistic is

-2λ ~ χ²₁,

where

λ = sup_{H₀} L(γ) - sup_γ L(γ).
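Operationally the test requires only the two maximized log likelihoods; schematically (Python, with L_restricted and L_unrestricted denoting the maximized values of (142) and (109), both assumed computed beforehand):

    from scipy.stats import chi2

    lam = L_restricted - L_unrestricted   # lambda of (143)
    stat = -2.0 * lam                     # asymptotically chi-square with 1 d.f.
    p_value = chi2.sf(stat, df=1)         # small p_value: reject absence of selectivity bias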

References

Aitchison, J. and J. Bennett (1970) “Polychotomous Quantal Response by Maximum Indicant”, Biometrika, 57, 253-262.
Aitchison, J. and S. Silvey (1957) “The Generalization of Probit Analysis to the Case of Multiple
Responses”, Biometrika, 37, 131-140.
Amemiya, T. (1973) “Regression Analysis When the Dependent Variable Is Truncated Normal”, Econometrica, 41, 997-1016.
Amemiya, T. (1974) “Bivariate Probit Analysis: Minimum Chi-Square Methods”, Journal of the American Statistical Association, 69, 940-944.
Amemiya, T. (1974) “Multivariate Regression and Simultaneous Equation Models When the Depen-
dent Variables Are Truncated Normal”, Econometrica, 42, 999-1012.
Amemiya, T. (1974) “A Note on the Fair and Jaffee Model”, Econometrica, 42, 759-762.
Amemiya, T. (1975) “Qualitative Response Models”, Annals of Economic and Social Measurement, 4,
363-372.
Amemiya, T. (1976) “The Maximum Likelihood, the Minimum Chi-Square, and the Non-linear
Weighted Least Squares Estimator in the General Qualitative Response Model”, JASA, 71.
Amemiya, T. (1978) “The Estimation of a Simultaneous Equation Generalized Probit Model”, Econometrica, 46, 1193-1205.
Amemiya, T. (1978) “On a Two-Step Estimation of a Multivariate Logit Model”, Journal of
Econometrics, 8, 13-21.
Amemiya, T. and F. Nold (1975) “A Modified Logit Model”, Review of Economics and Statistics, 57,
255-257.
Anscombe, F. J. (1956) “On Estimating Binomial Response Relations”, Biometrika, 43, 461-464.
Ashford, J. R. and R. R. Sowden (1970) “Multivariate Probit Analysis”, Biometrics, 26, 535-546.
Ashton, W. (1972) The Logit Transformation. New York: Hafner.

Bartlett, M. S. (1935) “Contingency Table Interactions”, Supplement to the Journal of the Royal Statistical Society, 2, 248-252.
Berkson, J. (1949) “Application of the Logistic Function to Bioassay”, Journal of the American
Statistical Association, 39, 357-365.
Berkson, J. (1951) “Why I Prefer Logits to Probits”, Biometrics, 7, 327-339.
Berkson, J. (1953) “A Statistically Precise and Relatively Simple Method of Estimating the Bio-Assay with Quantal Response, Based on the Logistic Function”, Journal of the American Statistical Association, 48, 565-599.
Berkson, J. (1955) “Estimate of the Integrated Normal Curve by Minimum Normit Chi-Square with Particular Reference to Bio-Assay”, Journal of the American Statistical Association, 50, 529-549.
Berkson, J. (1955) “Maximum Likelihood and Minimum Chi-Square Estimations of the Logistic Function”, Journal of the American Statistical Association, 50, 130-161.
Bishop, Y., S. Fienberg and P. Holland (1975) Discrete Multivariate Analysis. Cambridge: MIT Press.
Block, H. and J. Marschak (1960) “Random Orderings and Stochastic Theories of Response”, in: I.
Olkin, ed., Contributions to Probability and Statistics. Stanford: Stanford University Press.
Bock, R. D. (1968) “Estimating Multinomial Response Relations”, in: Contributions to Statistics and
Probability: Essays in Memory of S. N. Roy. Chapel Hill: University of North Carolina Press.
Bock, R. D. (1968) The Measurement and Prediction of Judgment and Choice. San Francisco:
Holden-Day.
Boskin, M. (1974) “A Conditional Logit Model of Occupational Choice”, Journal of Political
Economy, 82, 389-398.
Boskin, M. (1975) “A Markov Model of Turnover in Aid to Families with Dependent Children”,
Journal of Human Resources, 10, 467-481.
Chambers, E. A. and D. R. Cox (1967) “Discrimination between Alternative Binary Response
Models”, Biometrika, 54, 573-578.
Cosslett, S. (1980) “Efficient Estimators of Discrete Choice Models”, in: C. Manski and D.
McFadden, eds., Structural Analysis of Discrete Data. Cambridge: MIT Press.
Cox, D. (1970) Analysis of Binary Data. London: Methuen.
Cox, D. (1972) “The Analysis of Multivariate Binary Data”, Applied Statistics, 21, 113-120.
Cox, D. (1958) “The Regression Analysis of Binary Sequences”, Journal of the Royal Statistical
Society, Series B, 20, 215-242.
Cox, D. (1966) “Some Procedures Connected with the Logistic Response Curve”, in: F. David, ed.,
Research Papers in Statistics. New York: Wiley.
Cox, D. and E. Snell (1968) “A General Definition of Residuals”, Journal of the Royal Statistical
Society, Series B, 30, 248-265.
Cragg, J. G. (1971) “Some Statistical Models for Limited Dependent Variables with Application to the
Demand for Durable Goods”, Econometrica, 39, 829-844.
Cragg, J. and R. Uhler (1970) “The Demand for Automobiles”, Canadian Journal of Economics, 3,
386-406.
Cripps, T. F. and R. J. Tarling (1974) “An Analysis of the Duration of Male Unemployment in Great
Britain 1932-1973”, The Economic Journal, 84, 289-316.
Daganzo, C. (1980) Multinomial Probit. New York: Academic Press.
Dagenais, M. G. (1975) “Application of a Threshold Regression Model to Household Purchases of
Automobiles”, The Review of Economics and Statistics, 57, 275-285.
Debreu, G. (1960) “Review of R. D. Luce, Individual Choice Behavior”, American Economic Review, 50, 186-188.
Dhrymes, P. J. (1970) Econometrics: Statistical Foundations and Applications. Harper & Row, 1974,
New York: Springer-Verlag.
Dhrymes, P. J. (1978a) Introductory Econometrics. New York: Springer-Verlag.
Dhrymes, P. J. (1978b) Mathematics for Econometrics. New York: Springer-Verlag.
Domencich, T. and D. McFadden (1975) Urban Travel Demand: A Behavioral Analysis. Amsterdam:
North-Holland.
Efron, B. (1975) “The Efficiency of Logistic Regression Compared to Normal Discriminant Analysis”,
Journal of the American Statistical Association, 70, 892-898.
Fair, R. C. and D. M. Jaffee (1972) “Methods of Estimation for Markets in Disequilibrium”, Econometrica, 40, 497-514.

Finney, D. (1964) Statistical Method in Bio-Assay. London: Griffin.


Finney, D. (1971) Probit Analysis. New York: Cambridge University Press.
Gart, J. and J. Zweifel (1967) “On the Bias of Various Estimators of the Logit and Its Variance”.
Biometrika, 54, 181-187.
Gillen, D. W. (1977) “Estimation and Specification of the Effects of Parking Costs on Urban
Transport Mode Choice”, Journal of Urban Economics, 4, 186-199.
Goldberger, A. S. (1971) “Econometrics and Psychometrics: A Survey of Communalities”, Psychometrika, 36, 83-107.
Goldberger, A. S. (1973) “Correlations Between Binary Outcomes and Probabilistic Predictions”, Journal of the American Statistical Association, 68, 84.
Goldfeld, S. M. and R. E. Quandt (1972) Nonlinear Methods in Econometrics. Amsterdam: North-Holland.
Goldfeld, S. M. and R. E. Quandt (1973) “The Estimation of Structural Shifts by Switching
Regressions”, Annals of Economic and Social Measurement, 2, 475-485.
Goldfeld, S. M. and R. E. Quandt (1976) “Techniques for Estimating Switching Regressions”, in: S.
Goldfeld and R. Quandt, eds., Studies in Non-Linear Estimation. Cambridge: Ballinger.
Goodman, L. A. and W. H. Kruskal (1954) “Measures of Association for Cross Classifications”, Journal of the American Statistical Association, 49, 732-764.
Goodman, L. A. and W. H. Kruskal (1959) “Measures of Association for Cross Classifications II: Further Discussion and References”, Journal of the American Statistical Association, 54, 123-163.
Goodman, L. A. (1970) “The Multivariate Analysis of Qualitative Data: Interactions Among Multiple
Classifications”, Journal of the American Statistical Association, 65, 226-256.
Goodman, L. A. (1971) “The Analysis of Multidimensional Contingency Tables: Stepwise Procedures and Direct Estimation Methods for Building Models for Multiple Classifications”, Technometrics, 13, 33-61.
Goodman, L. A. (1972) “A Modified Multiple Regression Approach to the Analysis of Dichotomous
Variables”, American Sociological Review, 37, 28-46.
Goodman, L. A. (1972) “A General Model for the Analysis of Surveys”, American Journal of Sociology, 77, 1035-1086.
Goodman, L. A. (1973) “Causal Analysis of Panel Study Data and Other Kinds of Survey Data”, American Journal of Sociology, 78, 1135-1191.
Griliches, Z., B. H. Hall and J. A. Hausman (1978) “Missing Data and Self-Selection in Large Panels”, Annales de l'INSEE, 30-31, 137-176.
Grizzle, J. (1962) “Asymptotic Power of Tests of Linear Hypotheses Using the Probit and Logit
Transformations”, Journal of the American Statistical Association, 57, 877-894.
Grizzle, J. (1971) “Multivariate Logit Analysis”, Biometrics, 27, 1057-1062.
Gronau, R. (1973) “The Effect of Children on the Housewife’s Value of Time”, Journal of Political
Economy, 81, 168-199.
Gronau, R. (1974) “Wage Comparisons: A Selectivity Bias”, Journal of Political Economy, 82, 1119-1143.
Gurland, J., I. Lee and P. Dahm (1960) “Polychotomous Quantal Response in Biological Assay”, Biometrics, 16, 382-398.
Haberman, S. (1974) The Analysis of Frequency Data. Chicago: University of Chicago Press.
Haldane, J. (1955) “The Estimation and Significance of the Logarithm of a Ratio of Frequencies”,
Annals of Human Genetics, 20, 309-311.
Harter, J. and A. Moore (1967) “Maximum Likelihood Estimation, from Censored Samples, of the
Parameters of a Logistic Distribution”, Journal of the American Statistical Association, 62, 615-683.
Hausman, J. (1979) “Individual Discount Rates and the Purchase and Utilization of Energy Using
Durables”, Bell Journal of Economics, 10, 33-54.
Hausman, J. A. and D. A. Wise (1976) “The Evaluation of Results from Truncated Samples: The New Jersey Negative Income Tax Experiment”, Annals of Economic and Social Measurement, 5, 421-445.
Hausman, J. A. and D. A. Wise (1977) “Social Experimentation, Truncated Distributions and
Efficient Estimation”, Econometrica, 45, 319-339.
Hausman, J. A. and D. A. Wise (1978) “A Conditional Probit Model for Qualitative Choice: Discrete
Decisions Recognizing Interdependence and Heterogeneous Preferences”, Econometrica, 46,
403-426.

Hausman, J. A. and D. A. Wise (1980) “Stratification on Endogenous Variables and Estimation: The
Gary Experiment”, in: C. Manski and D. McFadden, eds., Structural Analysis of Discrete Dota.
Cambridge: MIT Press.
Heckman, J. (1974) “Shadow Prices, Market Wages, and Labor Supply”, Econometrica, 42, 679-694.
Heckman, J. (1976) “Simultaneous Equations Model with Continuous and Discrete Endogenous
Variables and Structural Shifts”, in: S. M. Goldfeld and E. M. Quandt, eds., Studies in Non-Linear
Estimation. Cambridge: Ballinger.
Heckman, J. (1976) “The Common Structure of Statistical Models of Truncation, Sample Selection and Limited Dependent Variables and a Simple Estimator for Such Models”, Annals of Economic and Social Measurement, 5, 475-492.
Heckman, J. (1978) “Dummy Exogenous Variables in a Simultaneous Equation System”, Econometrica,
46, 931-959.
Heckman, J. (1978) “Simple Statistical Models for Discrete Panel Data Developed and Applied to Test the Hypothesis of True State Dependence Against the Hypothesis of Spurious State Dependence”, Annales de l'INSEE, 30-31, 227-270.
Heckman, J. (1979) “Sample Selection Bias as a Specification Error”, Econometrica, 47, 153-163.
Heckman, J. (1980) “Statistical Models for the Analysis of Discrete Panel Data”, in: C. Manski and
D. McFadden, eds., Structural Analysis of Discrete Data. Cambridge: MIT Press.
Heckman, J. (1980) “The Incidental Parameters Problem and the Problem of Initial Conditions in
Estimating a Discrete Stochastic Process and Some Monte Carlo Evidence on Their Practical
Importance”, in: C. Manski and D. McFadden, eds., Structural Analysis of Discrete Data. Cam-
bridge: MIT Press.
Heckman, J. and R. Willis (1975) “Estimation of a Stochastic Model of Reproduction: An Econometric Approach”, in: N. Terleckyj, ed., Household Production and Consumption. New York: National Bureau of Economic Research.
Heckman, J. and R. Willis (1977) “A Beta Logistic Model for the Analysis of Sequential Labor Force
Participation of Married Women”, Journal of Political Economy, 85, 27-58.
Joreskog, K. and A. S. Goldberger (1975) “Estimation of a Model with Multiple Indicators and Multiple Causes of a Single Latent Variable”, Journal of the American Statistical Association, 70, 631-639.
Kiefer, N. (1978) “Discrete Parameter Variation: Efficient Estimation of a Switching Regression
Model”, Econometrica, 46, 427-434.
Kiefer, N. (1979) “On the Value of Sample Separation Information”, Econometrica, 47, 997-1003.
Kiefer, N. and G. Neumann (1979) “An Empirical Job Search Model with a Test of the Constant
Reservation Wage Hypothesis”, Journal of Political Economy, 87, 89-107.
Kohn, M., C. Manski and D. Mundel (1976), “An Empirical Investigation of Factors Influencing
College Going Behavior”, Annals of Economic and Social Measurement, 5, 391-419.
Ladd, G. (1966) “Linear Probability Functions and Discriminant Functions”, Econometrica, 34,
873-885.
Lee, L. F. (1978) “Unionism and Wage Rates: A Simultaneous Equation Model with Qualitative and
Limited Dependent Variables”, International Economic Review, 19, 415-433.
Lee, L. F. (1979) “Identification and Estimation in Binary Choice Models with Limited (Censored) Dependent Variables”, Econometrica, 47, 977-996.
Lee, L. F. (1980) “Simultaneous Equations Models with Discrete and Censored Variables”, in: C. Manski and D. McFadden, eds., Structural Analysis of Discrete Data. Cambridge: MIT Press.
Lee, L. F. and R. P. Trost (1978) “Estimation of Some Limited Dependent Variable Models with
Applications to Housing Demand”, Journal of Econometrics, 8, 357-382.
Lerman, S. and C. Manski (1980) “On the Use of Simulated Frequencies to Approximate Choice
Probabilities”, in: C. Manski and D. McFadden, eds., Structural Analysis of Discrete Data.
Cambridge: MIT Press.
Li, M. (1977) “A Logit Model of Home Ownership”, Econometrica, 45, 1081-1097.
Little, R. E. (1968) “A Note on Estimation for Quantal Response Data”, Biometrika, 55, 578-579.
Luce, R. D. (1959) Individual Choice Behavior: A Theoretical Analysis. New York: Wiley.
Luce, R. D. (1977) “The Choice Axiom After Twenty Years”, Journal of Mathematical Psychology, 15, 215-233.
Luce, R. D. and P. Suppes (1965) “Preference, Utility, and Subjective Probability”, in: R. Luce, R. Bush and E. Galanter, eds., Handbook of Mathematical Psychology III. New York: Wiley.
Maddala, G. S. (1977) “Self-Selectivity Problem in Econometric Models”, in: P. Krishnaiah, ed., Applications of Statistics. Amsterdam: North-Holland.
Maddala, G. S. (1977) “Identification and Estimation Problems in Limited Dependent Variable
Models”, in: A. S. Blinder and P. Friedman, eds., Natural Resources, Uncertainty and General
Equilibrium Systems: Essays in Memory of Rafael Lusky. New York: Academic Press.
Maddala, G. S. (1978) “Selectivity Problems in Longitudinal Data”, Annales de l'INSEE, 30-31, 423-450.
Maddala, G. S. and L. F. Lee (1976) “Recursive Models with Qualitative Endogenous Variables”,
Annals of Economic and Social Measurement, 5.
Maddala, G. and F. Nelson (1974) “Maximum Likelihood Methods for Markets in Disequilibrium”,
Econometrica, 42, 1013-1030.
Maddala, G. S. and R. Trost (1978) “Estimation of Some Limited Dependent Variable Models with Application to Housing Demand”, Journal of Econometrics, 8, 357-382.
Maddala, G. S. and R. Trost (1980) “Asymptotic Covariance Matrices of Two-Stage Probit and Two-Stage Tobit Methods for Simultaneous Equations Models with Selectivity”, Econometrica, 48, 491-503.
Manski, C. (1975) “Maximum Score Estimation of the Stochastic Utility Model of Choice”, Journal of
Econometrics, 3, 205-228.
Manski, C. (1977) “The Structure of Random Utility Models”, Theory and Decision, 8, 229-254.
Manski, C. and S. Lerman (1977) “The Estimation of Choice Probabilities from Choice-Based Samples”, Econometrica, 45, 1977-1988.
Manski, C. and D. McFadden (1980) “Alternative Estimates and Sample Designs for Discrete Choice
Analysis”, in: C. Manski and D. McFadden, eds., Structural Analysis of Discrete Data. Cambridge:
MIT Press.
Marschak, J. (1960) “Binary-Choice Constraints and Random Utility Indicators”, in: K. Arrow, S. Karlin and P. Suppes, eds., Mathematical Methods in the Social Sciences. Stanford: Stanford University Press.
McFadden, D. (1974) “Conditional Logit Analysis of Qualitative Choice Behavior”, in: P. Zarembka, ed., Frontiers in Econometrics. New York: Academic Press.
McFadden, D. (1976) “A Comment on Discriminant Analysis ‘Versus’ Logit Analysis”, Annals of
Economics and Social Measurement, 5, 511-523.
McFadden, D. (1976) “Quantal Choice Analysis: A Survey”, Annals of Economic and Social
Measurement, 5, 363-390.
McFadden, D. (1976) “The Revealed Preferences of a Public Bureaucracy”, Bell Journal, 7, 55-72.
Miller, L. and R. Radner (1970) “Demand and Supply in U.S. Higher Education”, American
Economic Review, 60, 326-334.
Moore, D. H. (1973) “Evaluation of Five Discrimination Procedures for Binary Variables”, Journal of the American Statistical Association, 68, 399-404.
Nelson, F. (1977) “Censored Regression Models with Unobserved Stochastic Censoring Thresholds”,
Journal of Econometrics, 6, 309-327.
Nelson, F. S. and L. Olsen (1978) “Specification and Estimation of a Simultaneous Equation Model
with Limited Dependent Variables”, International Economic Review, 19, 695-710.
Nerlove, M. (1978) “Econometric Analysis of Longitudinal Data: Approaches, Problems and Prospects”, Annales de l'INSEE, 30-31, 7-22.
Nerlove, M. and J. Press (1973) “Univariable and Multivariable Log-Linear and Logistic Models”,
RAND Report No. R-1306-EDA/NIH.
Oliveira, J. T. de (1958) “Extremal Distributions”, Revista da Faculdade de Ciencias, Lisboa, Serie A, 7, 215-227.
Olsen, R. J. (1978) “Comment on ‘The Effect of Unions on Earnings and Earnings on Unions: A
Mixed Logit Approach”‘, International Economic Review, 259-261.
Plackett, R. L. (1974) The Analysis of Categorical Data. London: Charles Griffin.
Poirier, D. J. (1976) “The Determinants of Home Buying in the New Jersey Graduated Work
Incentive Experiment”, in: H. W. Watts and A. Rees, eds., Impact of Experimental Payments on
Expenditure, Health and Social Behavior, and Studies on the Quality of the Evidence. New York:
Academic Press.
Poirier, D. J. (1980) “A Switching Simultaneous Equation Model of Physician Behavior in Ontario”, in: D. McFadden and C. Manski, eds., Structural Analysis of Discrete Data: With Econometric Applications. Cambridge: MIT Press.
Pollakowski, H. (1980) Residential Location and Urban Housing Markets. Lexington: Heath.
Quandt, R. (1956) “Probabilistic Theory of Consumer Behavior”, Quarterly Journal of Economics, 70,
501-536.
Quandt, R. (1970) The Demand for Travel. London: Heath.
Quandt, R. (1972) “A New Approach to Estimating Switching Regressions”, Journal of the American Statistical Association, 67, 306-310.
Quandt, R. (1978) “Tests of the Equilibrium vs. Disequilibrium Hypothesis”, International Economic Review, 19, 435-452.
Quandt, R. and W. Baumol (1966) “The Demand for Abstract Travel Modes: Theory and Measurement”, Journal of Regional Science, 6, 13-26.
Quandt, R. E. and J. B. Ramsey (1978) “Estimating Mixtures of Normal Distributions and Switching Regressions”, Journal of the American Statistical Association, 73, 730-752.
Quigley, J. M. (1976) “Housing Demand in the Short-Run: An Analysis of Polytomous Choice”,
Explorations in Economic Research, 3, 76-102.
Radner, R. and L. Miller (1975) Demand and Supply in U.S. Higher Education. New York:
McGraw-Hill.
Sattath, S. and A. Tversky (1977) “Additive Similarity Trees”, Psychometrika, 42, 319-345.
Shakotko, R. A. and M. Grossman (1981) “Physical Disability and Post-Secondary Educational Choices”, in: V. R. Fuchs, ed., Economic Aspects of Health. National Bureau of Economic Research, Chicago: University of Chicago Press.
Sickles, R. C. and P. Schmidt (1978) “Simultaneous Equation Models with Truncated Dependent
Variables: A Simultaneous Tobit Model”, Journal of Economics and Business, 31, 11-21.
Theil, H. (1969) “A Multinomial Extension of the Linear Logit Model”, International Economic
Review, 10, 251-259.
Theil, H. (1970) “On the Estimation of Relationships Involving Qualitative Variables”, American Journal of Sociology, 76, 103-154.
Thurstone, L. (1927) “A Law of Comparative Judgement”, Psychological Review, 34, 273-286.
Tobin, J. (1958) “Estimation of Relationships for Limited Dependent Variables”, Econometrica, 26,
24-36.
Tversky, A. (1972) “Choice by Elimination”, Journal of Mathematical Psychology. 9, 341-367.
Tversky, A. (1972) “Elimination by Aspects: A Theory of Choice”, Psychological Review, 79,281-299.
Walker, S. and D. Duncan (1967) “Estimation of the Probability of an Event as a Function of Several
Independent Variables”, Biometrika, 54, 167-179.
Westin, R. (1974) “Predictions from Binary Choice Models”, Journal of Econometrics, 2, 1-16.
Westin, R. B. and D. W. Gillen (1978) “Parking Location and Transit Demand: A Case Study of
Endogenous Attributes in Disaggregate Mode Choice Functions”, Journal of Econometrics, 8,
75-101.
Willis, R. and S. Rosen (1979) “Education and Self-Selection”, Journal of Political Economy, 87,
507-536.
Yellott, J. (1977) “The Relationship Between Luce's Choice Axiom, Thurstone's Theory of Comparative Judgment, and the Double Exponential Distribution”, Journal of Mathematical Psychology, 15, 109-144.
Zellner, A. and T. Lee (1965) “Joint Estimation of Relationships Involving Discrete Random
Variables”, Econometrica, 33, 382-394.
Chapter 28

DISEQUILIBRIUM, SELF-SELECTION, AND


SWITCHING MODELS*

G. S. MADDALA

University of Florida

Contents

1. Introduction 1634
2. Estimation of the switching regression model:
Sample separation known 1637
3. Estimation of the switching regression model:
Sample separation unknown 1640
4. Estimation of the switching regression model with imperfect
sample separation information 1646
5. Switching simultaneous systems 1649
6. Disequilibrium models: Different formulations of price adjustment 1652
6.1. The meaning of the price adjustment equation 1653
6.2. Modifications in the specification of the demand and supply functions 1656
6.3. The validity of the “Min” condition 1660
7. Some other problems of specification in disequilibrium models 1662
7.1. Problems of serial correlation 1663
7.2. Tests for distributional assumptions 1664
7.3. Tests for disequilibrium 1664
7.4. Models with inventories 1667
8. Multimarket disequilibrium models 1668
9. Models with self-selection 1672
10. Multiple criteria for selectivity 1676
11. Concluding remarks 1680
References 1682

*This chapter was first prepared in 1979. Since then Quandt (1982) has presented a survey of
disequilibrium models and Maddala (1983a) has treated self-selection and disequilibrium models in
two chapters of the book. The present paper is an updated and condensed version of the 1979 paper. If
any papers are not cited, it is just through oversight rather than any judgment on their importance.
Financial support from the NSF is gratefully acknowledged.


1. Introduction

The title of this chapter stems from the fact that there is an underlying similarity
between econometric models involving disequilibrium and econometric models
involving self-selection, the similarity being that both of them can be considered
switching structural systems. We will first consider the switching regression model
and show how the simplest models involving disequilibrium and self-selection fit
in this framework. We will then discuss switching simultaneous equation models,
disequilibrium models and self-selection models.
A few words on the history of these models might be in order at the outset.
Disequilibrium models have a long history. In fact all the “partial adjustment”
models are disequilibrium models.¹ However, the disequilibrium models considered here are different in the sense that they add the extra element of ‘quantity rationing’. The differences will be made clear later (in Section 6). As for self-selection models, one can quote an early study by Roy (1951), who considers an example of two occupations, hunting and fishing, where individuals self-select based on their comparative advantage. This example and models of self-selection
are discussed later (in Section 9). Finally, as for switching models, almost all the
models with discrete parameter changes fall in this category and thus they have a
long history. The models considered here are of course different in the sense that
we consider also “endogenous” switching. We will first start with some examples
of switching regression models. Switching simultaneous equations models are
considered later (in Section 5).
Suppose the observations on a dependent variable Y can be classified into two
regimes and are generated by different probability laws in the two regimes. Define

y₁ = Xβ₁ + u₁,   (1.1)

y₂ = Xβ₂ + u₂,   (1.2)

and

y = y₁ iff Zα - u > 0,   (1.3)

y = y₂ iff Zα - u ≤ 0.   (1.4)

X and Z are (possibly overlapping) sets of explanatory variables. β₁, β₂ and α are sets of parameters to be estimated. u₁, u₂ and u are residuals that are only contemporaneously correlated. We will assume that (u₁, u₂, u) are jointly normally distributed with mean vector 0, and covariance matrix

    Σ = | σ₁²   σ₁₂   σ_1u |
        | σ₁₂   σ₂²   σ_2u |
        | σ_1u  σ_2u   1   |.

¹The disequilibrium model in continuous time analyzed by Bergstrom and Wymer (1976) is also a partial adjustment model except that it is formulated in continuous time.

We have set var(u) = 1 because, by the nature of the conditions (1.3) and (1.4), α is estimable only up to a scale factor.
The model given by eqs. (1.1) to (1.4) is called a switching regression model. If σ_1u = σ_2u = 0 then we have a model with exogenous switching. If σ_1u or σ_2u is non-zero, we have a model with endogenous switching. This distinction between switching regression models with exogenous and endogenous switching has been discussed at length in Maddala and Nelson (1975).
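To fix ideas, the data generating process in (1.1) to (1.4) is easily simulated; the sketch below (Python; all parameter values are illustrative assumptions) draws (u₁, u₂, u) from a joint normal with the covariance matrix above and applies the switch rule:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1000
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    Z = X                                        # Z may overlap with X
    beta1 = np.array([1.0, 2.0]); beta2 = np.array([0.5, -1.0]); alpha = np.array([0.2, 1.0])
    Sigma = np.array([[1.0, 0.3, 0.5],           # covariance of (u1, u2, u), var(u) = 1
                      [0.3, 1.5, -0.4],
                      [0.5, -0.4, 1.0]])
    u1, u2, u = rng.multivariate_normal(np.zeros(3), Sigma, size=n).T
    y1 = X @ beta1 + u1
    y2 = X @ beta2 + u2
    regime1 = Z @ alpha - u > 0                  # condition (1.3)
    y = np.where(regime1, y1, y2)                # endogenous switching: cov(u1,u), cov(u2,u) != 0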
We will also distinguish between two types of switching regression models.
Model A: Sample separation known.
Model B: Sample separation unknown.
In the former class we know whether each observed y is generated by (1.1) or
(1.2). In the latter class we do not have this information. Further, in the models
with known sample separation we can consider two categories of models:
Model A-l: y observed in both regimes.
Model A-2: y observed in only one of the two regimes.
We will discuss the estimation of this type of model in the next section. But first, we will give some examples of the three different types of models.
Example 1: Disequilibrium market model
Fair and Jaffee (1972) consider a model of the housing market. There is a demand
function and a supply function but demand is not always equal to supply. (As to
why this happens is an important question which we will discuss in a later
section.) The specification of the model is:
Demand function: D = Xβ₁ + u₁
Supply function: S = Xβ₂ + u₂
The quantity transacted, Q, is given by

Q = Min(D, S)   (the points on the thick lines in Figure 1).

Thus   Q = Xβ₁ + u₁ if D < S,
       Q = Xβ₂ + u₂ if D > S.

Figure 1

The condition D < S can be written as:

(u₁ - u₂)/σ < X(β₂ - β₁)/σ,

where σ² = Var(u₁ - u₂) = σ₁² + σ₂² - 2σ₁₂. Thus the model is the same as the switching regression model in eqs. (1.1) to (1.4) with Z = X, α = (β₂ - β₁)/σ and u = (u₁ - u₂)/σ. If sample separation is somehow known, i.e. we know which observations correspond to excess demand and which correspond to excess supply, then we have Model A-1. If sample separation is not known, we have Model B.
Example 2: Model with self-selection
Consider the labor supply model considered by Gronau (1974) and Lewis (1974). The wages offered W₀ to an individual, and the reservation wages W_r (the wages at which the individual is willing to work) are given by the following equations:

W₀ = Xβ₁ + u₁,   W_r = Xβ₂ + u₂.

The individual works and the observed wage W = W₀ if W₀ ≥ W_r. If W₀ < W_r, the individual does not work and the observed wages are W = 0. This is an example of Model A-2. The dependent variable is observed in only one of the two regimes. The observed distribution of wages is a truncated distribution - it is the distribution of wage offers truncated by the “self-selection” of individuals - each individual choosing to be ‘in the sample’ of working individuals or not, by comparing his (or her) wage offer with his (or her) reservation wage.

Example 3: Demand for durable goods


This example is similar to the labor-force participation model in Example 2. Let y₁ denote the expenditures the family can afford to make, and y₂ denote the value of the minimum acceptable car to the family (the threshold value). The actual expenditures y will be defined as y = y₁ iff y₁ ≥ y₂, and y = 0 otherwise.
Example 4: Needs vs. reluctance hypothesis
Banks are reluctant to frequent the discount window too often for fear of adverse
sanctions from the Federal Reserve. One can define:

y₁ = Desired borrowings
y₂ = Threshold level below which banks will not use the discount window.

The structure of this model is somewhat different from that given in Examples 2 and 3, because we observe y₁ all the time. We do not observe y₂, but we know for each observation whether y₁ ≤ y₂ (the bank borrows in the Federal funds market) or y₁ > y₂ (the bank borrows from the discount window).
Some other examples of the type of switching regression model considered here
are the unions and wages model by Lee (1978), the housing demand model by Lee
and Trost (1978), and the education and self-selection model of Willis and Rosen
(1979).

2. Estimation of the switching regression model: Sample separation known

Returning to the model given by eqs. (1.1) to (1.4), we note that the likelihood function is given by (dropping the t subscripts on u, X, Z, y and I)

L = ∏ [g₁(y - Xβ₁) ∫_{-∞}^{Zα} f₁(u|u₁) du]^I [g₂(y - Xβ₂) ∫_{Zα}^{∞} f₂(u|u₂) du]^{1-I},   (2.1)

where

I = 1 iff Zα - u > 0,
  = 0 otherwise,

and the bivariate normal density of (u₁, u) has been factored into the marginal density g₁(u₁) and the conditional density f₁(u|u₁), with a similar factorization of the bivariate normal density of (u₂, u). Note that σ₁₂ does not occur at all in the likelihood function and thus is not estimable in this model. Only σ_1u and σ_2u are estimable. In the special case u = (u₁ - u₂)/σ, where σ² = Var(u₁ - u₂) as in the examples in the previous section, it can be easily verified that from the consistent estimates of σ₁², σ₂², σ_1u and σ_2u we can get a consistent estimate of σ₁₂.
The maximum likelihood estimates can be obtained by an iterative solution of the likelihood equations using the Newton-Raphson method or the Berndt et al. (1974) method. The latter involves obtaining only the first derivatives of the likelihood function and has better convergence properties. In Lee and Trost (1978) it is shown that the log-likelihood function for this model is uniformly bounded from above. The maximum likelihood estimates of this model can be shown to be consistent and asymptotically efficient following the lines of proof that Amemiya (1973) gave for the Tobit model. To start the iterative solution of the likelihood equations, one should use preliminary consistent estimates of the parameters, which can be obtained by using a two-stage estimation method which is described in Lee and Trost (1978)² and will not be reproduced here.
There are some variations of this switching regression model that are of considerable interest. The first is the case of the labor supply model where y is observed in only one of the two regimes (Model A-2). The model is given by the following relationships:

y = y₁ if y₁ ≥ y₂,
  = 0 otherwise.

For the group I = 1, we know y₁ = y and y₂ ≤ y.
For the group I = 0, all we know is y₁ < y₂.
Hence the likelihood function for this model can be written as:

L = ∏_{I=1} ∫_{-∞}^{y} f(y - Xβ₁, y₂ - Xβ₂) dy₂ ∏_{I=0} Φ(X(β₂ - β₁)/σ),   (2.2)

where

σ² = Var(u₁ - u₂) = σ₁² + σ₂² - 2σ₁₂,

²This procedure, first used by Heckman (1976) for the labor supply model, was extended to a wide class of models by Lee (1976).

Φ(·) is the distribution function of the standard normal and f is the joint density of (u₁ₜ, u₂ₜ). Since y is observed only in one of the regimes, we need to impose some identifiability restrictions on the parameters of the model. These restrictions are:
(a) There should be at least one explanatory variable in (1.1) not included in (1.2),
or
(b) Cov(u₁, u₂) = 0.
These conditions were first derived in Nelson (1975) and since then have been re-derived by others.
The second variation of the switching regression model that has found wide application is where the criterion function determining the switching also involves y₁ and y₂, i.e. eqs. (1.3) and (1.4) are replaced by

y = y₁ iff I* > 0,
y = y₂ iff I* ≤ 0,

where

I* = γ₁y₁ + γ₂y₂ + Zα - u.   (2.3)

Examples of this model are the unions and wages model by Lee (1978) and the education and self-selection model by Willis and Rosen (1979). In both cases, the choice function (2.3) determining the switching involves the income differential (y₁ - y₂). Thus γ₂ = -γ₁. Interest centers on the sign and significance of the coefficient of (y₁ - y₂).
The estimation of this model proceeds as before. We first write the criterion function in its reduced form and estimate the parameters by the probit method. Note that, for normalization purposes, instead of imposing the condition Var(u) = 1, it is more convenient to impose the condition that the variance of the residual u* in the reduced form for (2.3) is unity,

i.e. Var(u*) = Var(γ₁u₁ + γ₂u₂ - u) = 1.   (2.4)

This means that Var(u) = σ_u² is a parameter to be estimated. But, in the switching regression model, the parameters that are estimable are: β₁, β₂, σ₁², σ₂², σ_1u*, and σ_2u*, where σ_2u* = Cov(u₂, u*) and σ_1u* = Cov(u₁, u*). The estimates of σ_1u* and σ_2u*, together with the normalization eq. (2.4), give us only 3 equations from which we still have to estimate four parameters: σ₁₂, σ_1u, σ_2u, and σ_u². Thus, in this model we have to impose the condition that one of the covariances σ₁₂, σ_1u, σ_2u is zero. The most natural assumption is σ₁₂ = 0.

As for the estimation of the parameters in the choice function (2.3), again we have to impose some conditions on the explanatory variables in y₁ and y₂. After obtaining estimates of the parameters β₁ and β₂, we get the estimated values ŷ₁ and ŷ₂ of y₁ and y₂ respectively, and estimate the parameters in (2.3) by the probit method using these estimated values of y₁ and y₂. The condition for the estimability of the parameters in (2.3) is clearly that there be no perfect multicollinearity between ŷ₁, ŷ₂ and Z.
This procedure, called the “two-stage probit method”, gives consistent estimates of the parameters of the choice function. Note that since (y₁ - ŷ₁) and (y₂ - ŷ₂) are heteroscedastic, the residuals in this two-stage probit method are heteroscedastic. But this heteroscedasticity exists only in small samples and the residuals are homoscedastic asymptotically, thus preserving the consistency properties of the two-stage probit estimates. For a proof of this proposition and the derivation of the asymptotic covariance matrix of the two-stage probit estimates, see Lee (1979).
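Schematically, the second stage is an ordinary probit run on the first-stage fitted values; in Python (y1_hat, y2_hat, Z and the regime indicator I are hypothetical names):

    import numpy as np
    import statsmodels.api as sm

    diff_hat = y1_hat - y2_hat              # estimated income differential (y1 - y2)
    W = np.column_stack([diff_hat, Z])      # must not be perfectly collinear
    gamma1_hat = sm.Probit(I, W).fit().params[0]   # coefficient of (y1 - y2) in (2.3)

The standard errors printed by such a second-stage probit are not the correct asymptotic ones; those require the covariance matrix derived in Lee (1979).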

3. Estimation of the switching regression model: Sample separation unknown

In this case we do not know whether each observation belongs to Regime 1 or Regime 2. The labor supply model clearly does not fall in this category because the sample separation is known automatically. In the disequilibrium market model, where the assumption of unknown sample separation has often been made, what this implies is that, given just the data on quantity transacted and the explanatory variables, we have to estimate the parameters of both the demand and supply functions. Once we estimate these parameters, we can estimate the probability that each observation belongs to the demand or the supply function.
Consider the simplest disequilibrium model with sample separation unknown:

D_t = X_1t β₁ + u_1t   (Demand function),
S_t = X_2t β₂ + u_2t   (Supply function),
Q_t = Min(D_t, S_t).

The probability that observation t belongs to the demand function is:

λ_t = Prob(D_t < S_t)
    = Prob(u_1t - u_2t < X_2t β₂ - X_1t β₁).   (3.1)

Let f(u₁, u₂) be the joint density of (u₁, u₂) and g(D, S) the joint density of D and S derived from it. If observation t is on the demand function, we know that D_t = Q_t and S_t > Q_t. Hence,

h(Q_t | Q_t = D_t) = (1/λ_t) ∫_{Q_t}^{∞} g(Q_t, S_t) dS_t.   (3.2)

The denominator λ_t in (3.2) is the normalizing constant. It is equal to the numerator integrated over Q_t over its entire range. Similarly, if observation t is on the supply function, we know that S_t = Q_t and D_t > Q_t. Hence,

h(Q_t | Q_t = S_t) = (1/(1 - λ_t)) ∫_{Q_t}^{∞} g(D_t, Q_t) dD_t.   (3.3)

The unconditional density of Q_t is:

h(Q_t) = λ_t h(Q_t | Q_t = D_t) + (1 - λ_t) h(Q_t | Q_t = S_t)
       = ∫_{Q_t}^{∞} g(Q_t, S_t) dS_t + ∫_{Q_t}^{∞} g(D_t, Q_t) dD_t.   (3.4)

The likelihood function is:

L = ∏_t h(Q_t).   (3.5)
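When (u_1t, u_2t) is bivariate normal, each integral in (3.4) is a normal density times a conditional normal tail probability, so h(Q_t) is cheap to evaluate. A Python sketch (m_d = X_1t β₁, m_s = X_2t β₂; the function and argument names are ours):

    import numpy as np
    from scipy.stats import norm

    def h_Q(Q, m_d, m_s, s_d, s_s, s_ds):
        # term 1: density of D at Q times Prob(S > Q | D = Q)
        cm_s = m_s + (s_ds / s_d**2) * (Q - m_d)     # E(S | D = Q)
        cs_s = np.sqrt(s_s**2 - (s_ds / s_d)**2)     # sd(S | D = Q)
        t1 = norm.pdf(Q, m_d, s_d) * norm.sf(Q, cm_s, cs_s)
        # term 2: density of S at Q times Prob(D > Q | S = Q)
        cm_d = m_d + (s_ds / s_s**2) * (Q - m_s)
        cs_d = np.sqrt(s_d**2 - (s_ds / s_s)**2)
        t2 = norm.pdf(Q, m_s, s_s) * norm.sf(Q, cm_d, cs_d)
        return t1 + t2                               # the two terms of (3.4)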

As will be shown later, the likelihood function for this model is unbounded for certain parameter values.
Once the parameters in the model have been estimated, we can estimate the probability that each observation is on the demand function or the supply function. Maddala and Nelson (1974) suggest estimating the expressions λ_t in (3.1). These were the probabilities calculated in Sealey (1979) and Portes and Winter (1980). Kiefer (1980a) and Gersovitz (1980) suggest calculating:

P(D_t < S_t | Q_t),   (3.6)

and classifying an observation as belonging to the demand function if this probability is > 0.5 and belonging to the supply function if this probability is < 0.5.
For the model we are considering, we have

Prob(D_t < S_t | Q_t) = ∫_{Q_t}^{∞} g(Q_t, S_t) dS_t / h(Q_t),   (3.7)

where h(Q_t) is defined in (3.4). Lee (1983b) treats the classification of sample observations to periods of excess demand or excess supply as a problem in

discriminant analysis. He shows that the classification rule suggested by Kiefer and Gersovitz is optimal in the sense that it minimizes the total probability of misclassification. Even in a complicated model, these relationships hold good. Note that in a more complicated model (say with stochastic price adjustment equations), to calculate λ_t as in (3.1) or to compute (3.7) we need to derive the marginal distribution of D_t and S_t.
There are two major problems with the models with unknown sample separation, one conceptual and the other statistical. The conceptual problem is that we are asking too much from the data when we do not know which observations are on the demand function and which are on the supply function. The results cannot normally be expected to be very good, though the frequency with which ‘good’ results are reported with this method is indeed surprising. For instance, in Sealey (1979) the standard errors for the disequilibrium model (with sample separation unknown) are in almost all cases lower than the corresponding standard errors for the equilibrium model! Goldfeld and Quandt (1975) analyze the value of sample separation information by Monte-Carlo methods, and Kiefer (1979) analyzes analytically the value of such information by comparing the variances of the parameter estimates in a switching regression model from a joint density of (y, D) and the marginal density of y (where y is a continuous variable and D is a discrete variable). These results show that there is considerable loss of information if sample separation is not known. In view of this, some of the empirical results being reported from the estimation of disequilibrium models with unknown sample separation are surprisingly good. Very often, if we look more closely into the reasons why disequilibrium exists, then we might be able to say something about the sample separation itself. This point will be discussed later in our discussion of disequilibrium models.
The statistical problem is that the likelihood functions for this class of models are usually unbounded unless some restrictions (usually unjustifiable) are imposed on the error variances. As an illustration, consider the model in eqs. (1.1) to (1.4). Define

Prob(y = y₁) = π,
Prob(y = y₂) = 1 - π.

The conditional density of y given y = y₁ is:

f(y | y = y₁) = f₁(y - Xβ₁).

Similarly,

f(y | y = y₂) = f₂(y - Xβ₂).

Hence, the unconditional density of y is:

h(y) = π f₁(y - Xβ₁) + (1 - π) f₂(y - Xβ₂),

where f₁ and f₂ are the density functions of u₁ and u₂ respectively. Thus, the distribution of y is a mixture of two normal distributions. Given n observations yᵢ, we can write the likelihood function as:

L = ∏_{i=1}^{n} (Aᵢ + Bᵢ),

where

Aᵢ = (π/σ₁) φ((yᵢ - Xᵢβ₁)/σ₁)   and   Bᵢ = ((1 - π)/σ₂) φ((yᵢ - Xᵢβ₂)/σ₂),

φ(·) being the standard normal density. Take σ₂ ≠ 0 and consider the behaviour of L as σ₁ → 0. If Xᵢβ₁ = yᵢ, then Aᵢ → ∞ while the remaining Aⱼ all → 0. But B₁,..., Bₙ are finite. Hence L → ∞. Thus, as σ₁ → 0 the likelihood function tends to infinity if Xᵢβ₁ = yᵢ for any value of i. Similarly, if σ₁ ≠ 0, then as σ₂ → 0 the likelihood function tends to infinity if Xᵢβ₂ = yᵢ for any value of i.
In more complicated models, this proof gets more complicated, but the struc-
ture of the proof is the same as in the simple model above. [See Goldfeld and
Quandt (1975) and Quandt (1983, pp. 13-16) for further discussion of the
problem of unbounded likelihood functions in such models.]
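The unboundedness is easy to exhibit numerically: hold the regime-2 parameters fixed, set the regime-1 mean equal to one sample point, and let σ₁ shrink (Python; the numbers are arbitrary illustrations):

    import numpy as np
    from scipy.stats import norm

    y = np.array([0.9, 2.1, 3.2])          # artificial sample
    pi, mu2, s2 = 0.5, 2.0, 1.0            # regime-2 parameters held fixed
    mu1 = y[0]                             # X_i beta_1 set equal to a data point
    for s1 in [1.0, 0.1, 0.01, 0.001]:
        L = np.prod(pi * norm.pdf(y, mu1, s1) + (1 - pi) * norm.pdf(y, mu2, s2))
        print(s1, L)                       # L grows without bound as s1 -> 0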
Another problem in this model, pointed out by Goldfeld and Quandt (1978), is the possibility of convergence to a point where the correlation between the residuals is either +1 or -1. This problem, of course, does not arise if one assumes σ₁₂ = 0 to start with.
The disequilibrium model with unknown sample separation that we have been discussing is a switching regression model with endogenous switching. The case of a switching regression model with exogenous switching and unknown sample separation has been extensively discussed in Quandt and Ramsey (1978) and the discussion that followed their paper.
The model in this case is:

Regime 1: yᵢ = X_1i β₁ + ε_1i with probability λ,
Regime 2: yᵢ = X_2i β₂ + ε_2i with probability (1 - λ),

ε_1i ~ IN(0, σ₁²),   ε_2i ~ IN(0, σ₂²).

As noted earlier, the likelihood function for this model becomes unbounded for certain parameter values. However, Kiefer (1978) has shown that a root of the likelihood equations corresponding to a local maximum is consistent, asymptotically normal and efficient.³
Quandt and Ramsey (1978) suggest an MGF (moment generating function) estimator for this model. Note that the moment generating function of y is:

E(e^{θy}) = λ exp(X_1i'β₁θ + θ²σ₁²/2) + (1 - λ) exp(X_2i'β₂θ + θ²σ₂²/2).   (3.8)

Select a set of θⱼ (j = 1,2,...,k) and replace in eq. (3.8)

E(e^{θⱼy}) by (1/n) Σ_{i=1}^{n} e^{θⱼyᵢ},

exp(θⱼX_1i'β₁) by (1/n) Σ_{i=1}^{n} exp(θⱼX_1i'β₁),

and similarly for the term involving X_2i'β₂.
Quandt and Ramsey's MGF method is to estimate the parameters γ = (λ, β₁, β₂, σ₁², σ₂²) by minimizing

S(γ) = Σ_{i=1}^{n} Σ_{j=1}^{k} [zⱼ(yᵢ) - G(γ, xᵢ, θⱼ)]²,   (3.9)

where

zⱼ(yᵢ) = exp(θⱼyᵢ),

and G(γ, xᵢ, θⱼ) is the value of the expression on the right hand side of (3.8) for θ = θⱼ and the ith observation.
The normal equations obtained by minimizing (3.9) with respect to γ are the same as those obtained by minimizing

Σ_{i=1}^{n} Σ_{j=1}^{k} ε_ij²,   (3.10)

³Hartley and Mallela (1977) prove the strong consistency of the maximum likelihood estimator but on the assumption that σ₁² and σ₂² are bounded away from zero. Amemiya and Sen (1977) show that even if the likelihood function is unbounded, a consistent estimator of the true parameter value in this model corresponds to a local maximum of the likelihood function rather than a global maximum.

where

ε_ij = zⱼ(yᵢ) - G(γ, xᵢ, θⱼ).

The normal equations in both cases are:

Σ_{i=1}^{n} Σ_{j=1}^{k} ε_ij ∂G(γ, xᵢ, θⱼ)/∂γ = 0.
Schmidt (1982) shows that we get more efficient estimates if we minimize a weighted sum of squares rather than the simple sum of squares (3.10), making use of the covariance matrices Ωᵢ of (ε_i1, ε_i2,..., ε_ik) for i = 1,2,...,n.
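A bare-bones version of the unweighted MGF estimator, minimizing (3.10) with a generic optimizer, might look as follows (Python; the choice of thetas, the two-regressor dimensions and the vector start of initial values are all our own assumptions):

    import numpy as np
    from scipy.optimize import minimize

    thetas = np.array([-0.5, -0.25, 0.25, 0.5])    # an arbitrary choice of theta_j

    def G(params, X1, X2, theta):
        # params = (lam, b1 (2), b2 (2), s1sq, s2sq), two regressors per regime
        lam, b1, b2 = params[0], params[1:3], params[3:5]
        s1sq, s2sq = params[5], params[6]
        return (lam * np.exp(theta * (X1 @ b1) + 0.5 * theta**2 * s1sq)
                + (1 - lam) * np.exp(theta * (X2 @ b2) + 0.5 * theta**2 * s2sq))

    def S(params, y, X1, X2):
        # objective (3.10): sum of squared eps_ij over observations and thetas
        return sum(np.sum((np.exp(th * y) - G(params, X1, X2, th))**2) for th in thetas)

    res = minimize(S, start, args=(y, X1, X2), method="Nelder-Mead")  # start: initial values

The constraint 0 < λ < 1 and the positivity of the variances are ignored in this sketch; in practice they would be imposed by reparametrization.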
Two major problems with the MGF estimator are the choice of the number of θ's (the choice of k) and the choice of the particular values of θⱼ for a given choice of k. Schmidt (1982) shows that the asymptotic efficiency of the modified MGF estimator (the estimator corresponding to generalized least squares) is a non-decreasing function of k and conjectures that the lower bound of the asymptotic variance is the asymptotic variance of the ML estimator. Thus, the larger the k the better. As for the choice of the particular values of θⱼ for given k, Kiefer, in his comment on Quandt and Ramsey's paper, notes that the θ's determine the weights given to the moments of the raw data by the MGF estimator. Small θ's imply heavy weight attached to low order moments. He also suggests choosing θ's by minimizing some measure of the size of the asymptotic covariance matrix (say the generalized variance). But this depends on the values of the unknown parameters, though some preliminary estimates can be substituted. Schmidt (1982) presents some Monte-Carlo evidence on this but it is inconclusive.
The discussants of the Quandt and Ramsey paper pointed out that the authors had perhaps exaggerated the problems with the ML method, that they should compare their method with the ML method, and that the MGF estimates could perhaps be used as starting values for the iterative solution of the likelihood equations.
In summary, there are many problems with the estimation of switching models with unknown sample separation, and much more work needs to be done before one can judge either the practical usefulness of the model or the empirical results already obtained in this area. The literature on self-selection deals with switching models with known sample separation, but the literature on disequilibrium models contains several examples of switching models with unknown sample separation [see Sealey (1979), Rosen and Quandt (1979) and Portes and Winter (1980)].
Apart from the computational problems mentioned above, there is also the problem that these studies are all based on the hypothesis of the minimum condition holding in the aggregate, so that the aggregate quantity transacted switches between being on the demand curve and the supply curve. The validity of this assumption could be as much a problem in the interpretation of the empirical results as the estimation problems discussed above. Though the “minimum condition” can be justified at the micro-level, it would no longer be valid at the macro-level. Muellbauer (1978) argues that at the macro-level a more reasonable assumption is that

Q = Σᵢ Min(Dᵢ, Sᵢ),

the sum being taken over the individual micro markets. The problems of aggregation are as important as the problems of estimation with unknown sample separation discussed at length above. The econometric problems posed by aggregation have also been discussed in Batchelor (1977), Kooiman and Kloek (1979), Malinvaud (1982) and Muellbauer and Winter (1980).

4. Estimation of the switching regression model with imperfect sample separation information

The discussion in the previous two sections is based on two polar cases: sample separation completely known or completely unknown. In actual practice there may be many cases where information about sample separation is imperfect rather than perfect or completely unavailable. Lee and Porter (1984) consider this case. They consider the model:

Regime 1: Y_1t = X_1t β₁ + ε_1t,   (4.1)
Regime 2: Y_2t = X_2t β₂ + ε_2t,   (4.2)

for t = 1,2,..., T. There is a dichotomous indicator W_t for each t which provides sample separation information for each t. We define a latent dichotomous variable I_t where

I_t = 1 if the sample observation Y_t = Y_1t,
    = 0 if the sample observation Y_t = Y_2t.

The relation between I_t and W_t can be best described by a transition probability matrix

P = ( p_11   p_10
      p_01   p_00 ),

where

p_ij = Prob(W_t = j | I_t = i)   for i, j = 0,1,

p_11 + p_10 = 1 and p_01 + p_00 = 1.

Let

Prob(W_t = 1) = p.

Then

p = λp_11 + (1 - λ)p_01,

where

λ = Prob(I_t = 1).

If we assume ε_1t and ε_2t to be normally distributed as N(0, σ₁²) and N(0, σ₂²) respectively, and define

fᵢ = (1/((2π)^{1/2} σᵢ)) exp[-(1/(2σᵢ²))(Y_t - X_it βᵢ)²]   for i = 1,2,

then the joint density of Y_t and W_t is

f(Y_t, W_t) = [f₁λp_11 + f₂(1 - λ)p_01]^{W_t} [f₁λ(1 - p_11) + f₂(1 - λ)(1 - p_01)]^{1 - W_t},   (4.3)

and the marginal density of Y_t is

h(Y_t) = f₁λ + f₂(1 - λ).   (4.4)

If p_11 = p_01, then the joint density f(Y_t, W_t) can be factored as:

f(Y_t, W_t) = [f₁λ + f₂(1 - λ)] p^{W_t}(1 - p)^{1 - W_t},

and hence the indicators W_t do not contain any information on the sample separation. One can test the hypothesis p_11 = p_01 in any actual empirical case, as shown by Lee and Porter. Also, if p_11 = 1 and p_01 = 0, the indicator W_t provides

perfect sample separation, and

f(Y_t, W_t) = (f₁λ)^{W_t} [f₂(1 - λ)]^{1 - W_t}.

Thus, both the cases considered earlier - sample separation known and sample separation unknown - are particular cases of the model considered here.
Lee and Porter also show that if p_11 ≠ p_01, then there is a gain in efficiency from using the indicator W_t. They show that the problem of unbounded likelihood functions encountered in switching regression models with unknown sample separation also exists in this case of imperfect sample separation. As for ML estimation, they suggest a suitable modification of the EM algorithm suggested by Hartley (1977, 1978) and Kiefer (1980b) for the switching regression model with unknown sample separation.
The paper by Lee and Porter is concerned with a switching regression model
with exogenous switching, but it can be readily extended to a switching regression
model with endogenous switching. For instance, in the simple disequilibrium
market model

$$D_t = X_{1t}\beta_1 + \varepsilon_{1t},$$
$$S_t = X_{2t}\beta_2 + \varepsilon_{2t},$$
$$Q_t = \text{Min}(D_t, S_t),$$

the joint density of $Q_t$ and $W_t$ can be derived by a procedure analogous to that in
(4.3), and it is

$$f(Q_t, W_t) = \left[p_{11}G_{1t} + p_{01}G_{2t}\right]^{W_t}\left[(1-p_{11})G_{1t} + (1-p_{01})G_{2t}\right]^{1-W_t},$$

where

$$G_{1t} = \int_{Q_t}^{\infty} g(Q_t, S)\,\mathrm{d}S, \qquad G_{2t} = \int_{Q_t}^{\infty} g(D, Q_t)\,\mathrm{d}D,$$

and $g(D, S)$ is the joint density of $D$ and $S$. The marginal density $h(Q_t)$ of $Q_t$ is
given by eq. (3.4). As before, if $p_{11} = p_{01}$ then the joint density $f(Q_t, W_t)$ can
be written as

$$f(Q_t, W_t) = h(Q_t)\,p_{11}^{W_t}(1-p_{11})^{1-W_t}.$$

One can use the sign of $\Delta P_t$ for $W_t$. The procedure would then be an extension of
the 'directional method' of Fair and Jaffee (1972) in the sense that the sign of $\Delta P_t$
is taken to be a noisy indicator rather than a precise indicator as in Fair and
Jaffee. Further discussion of the estimation of disequilibrium models with noisy
indicators can be found in Maddala (1984).

5. Switching simultaneous systems

We now consider generalizations of the model (1.1) to (1.4) to a simultaneous
equation system. Suppose the set of endogenous variables $Y$ is generated by the
following two probability laws:

$$B_1 Y_1 + \Gamma_1 X = U_1, \qquad (5.1)$$
$$B_2 Y_2 + \Gamma_2 X = U_2, \qquad (5.2)$$

and

$$Y = Y_1 \quad \text{iff } Z\alpha - u > 0, \qquad (5.3)$$
$$Y = Y_2 \quad \text{iff } Z\alpha - u \le 0. \qquad (5.4)$$
If $u$ is uncorrelated with $U_1$ and $U_2$, we have switching simultaneous systems with
exogenous switching. Goldfeld and Quandt (1976) consider models of this kind.
Davidson (1978) and Richard (1980) consider switching simultaneous systems
where the number of endogenous variables could be different in the two regimes.
The switching is still exogenous. An example of this type of model mentioned by
Davidson is the estimation of a simultaneous equation model where exchange
rates are fixed part of the time and floating the rest of the time. Thus the exchange
rate is endogenous in one regime and exogenous in the other regime.
If the residual $u$ is correlated with $U_1$ and $U_2$ we have endogenous switching.
The analysis of such models proceeds the same way as in Section 2 and the details,
which merely involve algebra, will not be pursued here. [See Lee (1979) for the
details.] Problems arise, however, when the criterion function in (5.3) and (5.4)
involves some of the endogenous variables in the structural system. In this case we
have to write the criterion function in its reduced form and make sure that the
two reduced form expressions amount to the same condition. As an illustration,
consider the model

$$Y_1 = \gamma_1 Y_2 + \beta_1 X_1 + U_1,$$
$$Y_2 = \gamma_2 Y_1 + \beta_2 X_2 + U_2 \quad \text{if } Y_1 < c,$$
$$Y_2 = \gamma_2' Y_1 + \beta_2' X_2 + U_2 \quad \text{if } Y_1 > c.$$

Unless $(1 - \gamma_1\gamma_2)$ and $(1 - \gamma_1\gamma_2')$ are of the same sign, there will be an inconsistency
in the conditions $Y_1 < c$ and $Y_1 > c$ from the two reduced forms. Such conditions
for logical consistency have been pointed out by Amemiya (1974), Maddala and
Lee (1976) and Heckman (1978). They need to be imposed in switching simulta-
neous systems where the switch depends on some of the endogenous variables.
Gourieroux et al. (1980b) have derived some general conditions which they call
“coherency conditions” and illustrate them with a number of examples. These
conditions are derived from a theorem by Samelson et al. (1958) which gives a
necessary and sufficient condition for a linear space to be partitioned in cones.
We will not go into these conditions in detail here. In the case of the switching
simultaneous system considered here, the condition they derive is that the
determinants of the matrices giving the mapping from the endogenous variables
$(Y_1, Y_2, \ldots, Y_k)$ to the residuals $(u_1, u_2, \ldots, u_k)$ are of the same sign in the
different regimes. The two determinants under consideration are $(1 - \gamma_1\gamma_2)$ and
$(1 - \gamma_1\gamma_2')$. The condition for logical consistency of the model is that they are of
the same sign, or $(1 - \gamma_1\gamma_2)(1 - \gamma_1\gamma_2') > 0$. A question arises about what to do with
these conditions. One can impose them and then estimate the model. Alternatively,
since the condition is algebraic, if it cannot be given an economic
interpretation, it is important to check the basic structure of the model. An
illustration of this is the dummy endogenous variable model in Heckman (1976a).
The model discusses the problem of estimation of the effect of fair employment
laws on the wages of blacks relative to whites, when the passage of the law is
endogenous. The model as formulated by Heckman is a switching simultaneous
equations model for which we have to impose a condition for "logical consistency".
However, the condition does not have any meaningful economic
interpretation and, as pointed out in Maddala and Trost (1981), a careful
examination of the arguments reveals that there are two sentiments, not one as
assumed by Heckman, that lead to the passage of the law, and when the model is
reformulated, there is no condition for logical consistency that needs to be
imposed.
The simultaneous equations models with truncated dependent variables consid-
ered by Amemiya (1974) are also switching simultaneous equations models which
require conditions for logical consistency. Again, one needs to examine whether
these conditions need to be imposed exogenously or whether a more logical
formulation of the problem leads to a model where these conditions are automati-
cally satisfied. For instance, Waldman (1981) gives an example of time allocation
of young men to school and work where the model is formulated in terms of
underlying behavioural relations and the conditions derived by Amemiya follow
naturally from economic theory. On the other hand, these conditions have to be
imposed exogenously (and are difficult to give an economic interpretation) if the
model is formulated in a mechanical fashion where time allocated to work was
modelled as a linear function of school time and exogenous variables and time
allocated to school was modelled as a linear function of work time and exogenous
variables.

The point of this lengthy discussion is that in switching simultaneous equation
models, we often have to impose some conditions for the logical consistency of
the model. If these conditions cannot be given a meaningful economic interpreta-
tion, it is worthwhile checking the original formulation of the model rather than
imposing these conditions exogenously and estimating the parameters in the
model subject to these conditions.
An interesting feature of the switching simultaneous systems is that it is
possible to have underidentified systems in one of the regimes. As an illustration,
consider the following model estimated by Avery (1982):

$$D = \beta_1 X_1 + \alpha_1 Y + u_1 \quad \text{(demand for durables)},$$
$$Y_1 = \beta_2 X_1 + \alpha_2 D + u_2 \quad \text{(demand for debt)},$$
$$Y_2 = \beta_3 X_3 + \alpha_3 D + u_3 \quad \text{(supply of debt)},$$
$$Y = \min(Y_1, Y_2) \quad \text{(actual quantity of debt)}.$$

$D$, $Y_1$, $Y_2$ are the endogenous variables and $X_1$ and $X_3$ are sets of exogenous
variables. Note that the exogenous variables in the demand for durables equation
and the demand for debt equation are the same.
The model is a switching simultaneous equations model with endogenous
switching. We can write the model as follows:

Regime 1 ($Y_1 < Y_2$):
$$D = \beta_1 X_1 + \alpha_1 Y + u_1,$$
$$Y = \beta_2 X_1 + \alpha_2 D + u_2.$$

Regime 2 ($Y_2 < Y_1$):
$$D = \beta_1 X_1 + \alpha_1 Y + u_1,$$
$$Y = \beta_3 X_3 + \alpha_3 D + u_3.$$

If we get the reduced forms for $Y_1$ and $Y_2$ in the two regimes and simplify the
expression $Y_1 - Y_2$, we find that

$$(Y_1 - Y_2) \text{ in Regime 2} = \frac{1 - \alpha_1\alpha_2}{1 - \alpha_1\alpha_3}\left\{(Y_1 - Y_2) \text{ in Regime 1}\right\}.$$

Thus, the condition for the logical consistency of this model is that $(1 - \alpha_1\alpha_2)$
and $(1 - \alpha_1\alpha_3)$ are of the same sign, a condition that can also be derived by using the
theorems in Gourieroux et al. (1980b).
The interesting thing to note is that the simultaneous equation system in
Regime 1 is under-identified. However, if the system of equations in Regime 2 is
identified, the fact that we can get consistent estimates of the parameters in the
demand equation for durables from Regime 2, enables us to get consistent
estimates of the parameters in the $Y_1$ equation. Thus the parameters in the
simultaneous equations system in Regime 1 are identified. One can construct a
formal and rigorous proof but this will not be attempted here. Avery (1982) found
that he could not estimate the parameters of the structural equation for $Y_1$, but
this is possibly due to the estimation methods used.
In summary, switching simultaneous equations models often involve the im-
position of constraints on parameters so as to avoid some internal inconsistencies
in the model. But it is also very often the case that such logical inconsistencies
arise when the formulation of the model is mechanical. In many cases, it has been
found that a re-examination and a more careful formulation leads to an alterna-
tive model where such constraints need not be imposed.
There are also some switching simultaneous equations models where a variable
is endogenous in one regime and exogenous in another and, unlike the cases
considered by Richard (1980) and Davidson (1978), the switching is endogenous.
An example is the disequilibrium model in Maddala (1983b).

6. Disequilibrium models: Different formulations of price adjustment

Econometric estimation of disequilibrium models has a long history. The partial
adjustment models are all disequilibrium models and in fact this is the type of
model that the authors had in mind when they talked of a "disequilibrium model."
Some illustrative examples of this are Rosen and Nadiri (1974) and Jonson and
Taylor (1977).
The recent literature on disequilibrium econometrics considers a different class
of models and has a different structure. These models are more properly called
“rationing models.” This literature started with the paper by Fair and Jaffee
(1972). The basic equation in their models is

$$Q_t = \text{Min}(D_t, S_t), \qquad (6.1)$$

where

$Q_t$ = quantity transacted,
$D_t$ = quantity demanded,
$S_t$ = quantity supplied.

Fair and Jaffee considered two classes of models:

(i) Directional models: In these we infer whether $Q_t$ is equal to $D_t$ or $S_t$ based
on the direction of price movement, i.e.

$D_t > S_t$ and hence $Q_t = S_t$ if $\Delta P_t > 0$,
$D_t < S_t$ and hence $Q_t = D_t$ if $\Delta P_t < 0$,

where $\Delta P_t = P_t - P_{t-1}$; and

(ii) Quantitative models: In these the price change is proportional to excess
demand (or supply), i.e.

$$P_t - P_{t-1} = \gamma(D_t - S_t). \qquad (6.2)$$
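The data-generating process implied by (6.1) and (6.2) can be made concrete with a small simulation; the parameter values below are illustrative assumptions. The sketch also records how often the sign of $\Delta P_t$ identifies the true regime once the price adjustment equation is given a disturbance, which previews the 'noisy indicator' point of Section 4.

```python
import numpy as np

rng = np.random.default_rng(1)
T, gamma = 200, 0.2
b1, a1 = 10.0, -1.0                 # demand intercept and price slope
b2, a2 = 2.0, 1.0                   # supply intercept and price slope
P = np.empty(T + 1); P[0] = 4.0
D, S, Q = np.empty(T), np.empty(T), np.empty(T)

for t in range(T):
    D[t] = b1 + a1 * P[t] + rng.normal(0, 0.5)       # demand
    S[t] = b2 + a2 * P[t] + rng.normal(0, 0.5)       # supply
    Q[t] = min(D[t], S[t])                           # 'min' condition, eq. (6.1)
    # stochastic version of the price adjustment eq. (6.2)
    P[t + 1] = P[t] + gamma * (D[t] - S[t]) + rng.normal(0, 0.1)

# directional rule: dP_t > 0 is taken to signal excess demand (Q_t = S_t)
agree = (np.diff(P) > 0) == (D > S)
print(f"share of periods correctly classified by sign(dP): {agree.mean():.2f}")
```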

The maximum likelihood estimation of the quantitative model is discussed in
Amemiya (1974a). The maximum likelihood estimation of the directional model,
and of models with stochastic sample separation (i.e. where only (6.1) is used or (6.2)
is stochastic), is discussed in Maddala and Nelson (1974).
The directional method is logically inconsistent since the condition that $\Delta P_t$
gives information on sample separation implies that $P_t$ is endogenous, in which
case there are not enough equations to determine the endogenous variables $Q_t$
and $P_t$.4 We will, therefore, discuss only models with the price determination eq.
(6.2) included.
There are three important problems with the specification of this model that
need some discussion. These are:
(i) the meaning of the price adjustment eq. (6.2),
(ii) the modifications in the specification of the demand and supply functions
that need to be made because of the existence of the disequilibrium, and
(iii) the validity of the 'min' condition (6.1).
We will discuss these problems in turn.

6.1. The meaning of the price adjustment equation

The disequilibrium market model usually considered consists of the following
demand and supply functions:

$$D_t = X_{1t}\beta_1 + \alpha_1 P_t + u_{1t}, \qquad (6.3)$$
$$S_t = X_{2t}\beta_2 + \alpha_2 P_t + u_{2t}, \qquad (6.4)$$

and the eqs. (6.1) and (6.2). To interpret the "price adjustment" eq. (6.2) we have
to ask the basic question of why disequilibrium exists. One interpretation is that
prices are fixed by someone. The model is thus a fix-price model. The disequilibrium
exists because price is fixed at a level different from the market equilibrating
level (as is often the case in centrally planned economies). In this case the

4 The directional method makes sense only for the estimation of the reduced form equations for $D_t$
and $S_t$ in a model with a price adjustment equation. There are cases where this is needed. The
likelihood function for the estimation of the parameters in this model is derived in Maddala and
Nelson (1974). It is:

$$L = \prod_{\Delta P_t < 0} \int_{Q_t}^{\infty} g(Q_t, S)\,\mathrm{d}S \cdot \prod_{\Delta P_t > 0} \int_{Q_t}^{\infty} g(D, Q_t)\,\mathrm{d}D,$$

where $g(D, S)$ is the joint density of $D$ and $S$ (from the reduced form equations). When $\Delta P < 0$ we
have $D = Q$ and $S > Q$, and when $\Delta P > 0$ we have $S = Q$ and $D > Q$. Note that the expression given
in Fair and Kelejian (1974) as the likelihood function for this model is not correct, though it gives
consistent estimates of the parameters.
1654 G. S. Maddula

price adjustment eq. (6.2) can be interpreted as the rule by which the price-fixing
authority is changing the price. However, there is the problem that the price-fixing
authority does not know $D_t$ and $S_t$ since they are determined only after $P_t$ is
fixed. Thus, eq. (6.2) cannot make any sense in the fix-price model. Laffont
and Garcia (1977) suggested a modification of the price adjustment equation
which is:

$$P_{t+1} - P_t = \gamma(D_t - S_t). \qquad (6.2')$$

In this case the price fixing authority uses information on the past period’s
demand and supply to adjust prices upwards or downwards. In this case the
price-fixing rule is an operational one but one is still left wondering why the
price-fixing authority follows such a dumb rule as (6.2’). A more reasonable thing
to do is to fix the price at a level that equates expected demand and supply. One
such rule is to determine price by equating the components of (6.3) and (6.4) after
ignoring the stochastic disturbance terms. This gives

$$P_t = \frac{X_{1t}\beta_1 - X_{2t}\beta_2}{\alpha_2 - \alpha_1}. \qquad (6.5)$$

This is the procedure suggested by Green and Laffont (1981) under the name of
“anticipatory pricing”.
As mentioned earlier, the meaning of the price adjustment equation depends on
the source of disequilibrium. An alternative to the fix-price model as an explanation
of disequilibrium is the partial adjustment model (see Bowden, 1978a, b). The
source of disequilibrium in this formulation is stickiness of prices (due to some
institutional constraints or other factors). Let $P_t^*$ be the market equilibrating
price. However, prices do not adjust fully to the market equilibrating level and we
specify the "partial adjustment" model:

$$P_t - P_{t-1} = \lambda(P_t^* - P_{t-1}), \qquad 0 < \lambda < 1,$$
$$P_t - P_{t-1} = \lambda(P_t^* - P_t + P_t - P_{t-1}). \qquad (6.6)$$

Hence

$$P_t - P_{t-1} = \frac{\lambda}{1-\lambda}(P_t^* - P_t). \qquad (6.7)$$

If $P_t < P_t^*$ there will be excess demand and if $P_t > P_t^*$ there will be excess supply.
Hence, if $\Delta P_t < 0$ we have a situation of excess supply.
Note that in this case it is AP, (not AP,+l as in the Laffont-Garcia case) that
gives the sample separation. But the interpretation is not that prices rise in response
to excess demand (as implicitly argued by Fair and Jaffee) but that there is excess

demand (or excess supply) because prices do not fully adjust to the equilibrating
values.5
Equation (6.7) can also be written as

$$P_t - P_{t-1} = \gamma(D_t - S_t), \qquad (6.8)$$


if we assume that the excess demand (0, - S,) is proportional to the difference
(P,* - P,), i.e. the difference between the equilibrating price and the actual price.
The interpretation of the coefficient y in (6.8) is of course different from what
Fair and Jaffee gave to the same equation.
One can also allow for different speeds of upward and downward partial
adjustment. Consider the following formulation:

$$P_t - P_{t-1} = \lambda_1(P_t^* - P_{t-1}) \quad \text{if } P_t^* > P_{t-1},$$
$$P_t - P_{t-1} = \lambda_2(P_t^* - P_{t-1}) \quad \text{if } P_t^* < P_{t-1}. \qquad (6.9)$$

These equations imply

$$P_t - P_{t-1} = \frac{\lambda_1}{1-\lambda_1}(P_t^* - P_t) \quad \text{if } P_t^* > P_t,$$
$$P_t - P_{t-1} = \frac{\lambda_2}{1-\lambda_2}(P_t^* - P_t) \quad \text{if } P_t^* < P_t. \qquad (6.10)$$

Note first that the conditions $P_t^* > P_{t-1}$, $P_t > P_{t-1}$, $P_t^* > P_t$ and $D_t > S_t$ are all
equivalent. Also, assuming that excess demand is proportional to $P_t^* - P_t$, we can
write eqs. (6.10) as

$$\Delta P_t = \gamma_1(D_t - S_t) \quad \text{if } D_t > S_t,$$
$$\Delta P_t = \gamma_2(D_t - S_t) \quad \text{if } D_t < S_t. \qquad (6.11)$$

Again note that we get $\Delta P_t$ and not $\Delta P_{t+1}$ in these equations.


Ito and Ueda (1979) use Bowden's formulation with different speeds of
adjustment as given by (6.9) to estimate the rates of adjustment in interest rates
for business loans in the U.S. and Japan. They prefer this formulation to that of
Fair and Jaffee or Laffont and Garcia because in eq. (6.9), $\lambda_1$ and $\lambda_2$ are pure
numbers which can be compared across countries. The same cannot be said about
the parameters $\gamma_1$ and $\gamma_2$ in eq. (6.11).

5 The formulation in terms of partial adjustment towards $P^*$ was suggested by Bowden (1978a),
though he does not use the interpretation of the Fair-Jaffee equation given here. Bowden (1978b)
discusses this approach in greater detail under the title "The PAMEQ Specification".
1656 G. S. Maddala

There is still one disturbing feature about the partial adjustment eq. (6.6) that
Bowden adopts and under which we have given a justification for the Fair and
Jaffee directional and quantitative methods. This is that $\Delta P_t$ unambiguously gives
us an idea about whether there is excess demand or excess supply. As mentioned
earlier this does not make intuitive sense. On closer examination one sees that the
problem is with eq. (6.6), in particular the assumption that $\lambda$ lies between 0 and
1. This is indeed a very strong assumption and implies that prices are sluggish but
never change to overshoot $P_t^*$, the equilibrium price. There is, however, no
a priori reason why this should happen.6 Once we drop the assumption that $\lambda$
should lie between 0 and 1, it is no longer true that we can use $\Delta P_t$ to classify
observations as belonging to excess demand or excess supply. As noted earlier, the
assumption $0 < \lambda < 1$ implies that the conditions $P_t^* > P_{t-1}$, $P_t > P_{t-1}$, $P_t^* > P_t$
and $D_t > S_t$ are all equivalent. With $\lambda > 1$, this no longer holds good.
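The role of the assumption $0 < \lambda < 1$ can be checked numerically. The sketch below (with an illustrative process for $P_t^*$) generates prices from the partial adjustment rule (6.6) and reports how often the sign of $\Delta P_t$ agrees with the sign of $P_t^* - P_t$: with $0 < \lambda < 1$ the two always agree, while with $\lambda > 1$ (overshooting) the classification breaks down.

```python
import numpy as np

def agreement_rate(lam, T=10_000, seed=6):
    """Share of periods with sign(P_t - P_{t-1}) = sign(P_t^* - P_t) when
    prices follow the partial adjustment rule P_t - P_{t-1} = lam*(P_t^* - P_{t-1})."""
    rng = np.random.default_rng(seed)
    Pstar = rng.normal(0.0, 1.0, T)          # market equilibrating price
    P = np.zeros(T)
    for t in range(1, T):
        P[t] = P[t - 1] + lam * (Pstar[t] - P[t - 1])   # eq. (6.6)
    agree = np.sign(np.diff(P)) == np.sign(Pstar[1:] - P[1:])
    return agree.mean()

print(agreement_rate(0.5))   # 1.0: excess demand/supply correctly classified
print(agreement_rate(1.5))   # far below 1: overshooting reverses the signal
```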
In summary, we considered two models of disequilibrium, the fix-price model
and the partial adjustment model. In the fix-price model, the price adjustment eq.
(6.2) is non-operational. The modification (6.2') suggested by Laffont and Garcia
is an operational rule but really does not make much sense. A more reasonable
formula for a price-setting rule is the anticipatory pricing rule (6.5). But this
implies that a price-adjustment equation like (6.2) or (6.2') is not valid.
In the case of the partial adjustment model one can derive an equation of the
form (6.2) though its meaning is different from the one given by Fair and Jaffee
and many others using this price adjustment equation. The meaning is not that
prices adjust in response to excess demand or supply but that excess demand and
supply exist because prices do not adjust to the market equilibrating level.
However, as discussed earlier, eq. (6.2) can be derived from the partial adjustment
model (6.6) only under a restrictive set of assumptions.
The preceding arguments hold good when eq. (6.2) is made stochastic with the
addition of a disturbance term. In this case there is not much use for the
price-adjustment equation. The main use of eq. (6.2) is that it gives a sample
separation, and estimation with sample separation known is much simpler than
estimation with sample separation unknown. If one is anyhow going to estimate a
model with sample separation unknown, then one can as well eliminate eq. (6.2).
For fix-price models, one substitutes the anticipatory price eq. (6.5) and for
partial adjustment models one uses eq. (6.6) directly.

6.2. Modifications in the specification of the demand and supply functions

The preceding discussion refers to alternative formulations of the price adjustment
equation. One can also question the specification of the other equations as

6 Since no economic model has been specified, there is no reason to make any alternative assumption
either.

well. We will now discuss alternative specifications of the demand and supply
functions.
The probability that there would be rationing should affect the demand and
supply functions. There are two ways of taking account of this. One procedure
suggested by Eaton and Quandt (1983) is to introduce the probability of rationing
as an explanatory variable in the demand and/or supply functions (6.3) and (6.4).
A re-specification of eq. (6.3) they consider is

$$D_t = X_{1t}\beta_1 + \alpha_1 P_t + \gamma_1 \pi_t + u_{1t}, \qquad (6.3')$$

where

$$\pi_t = \text{Prob}(D_t > S_t),$$

and $\gamma_1$ is expected to be negative. Eaton and Quandt show that the solution for $\pi_t$ is
unique.7 In their empirical work they include $(1 - \pi_t)$ as an explanatory variable
in the supply function. They also include a price adjustment equation in their
model.
An alternative procedure to take account of the probability of rationing is to
re-formulate the demand and supply functions in terms of expected prices and
incorporate the probability of disequilibrium as a determining factor in the
formation of expectations. This is the approach followed in Chanda (1984). Since
price expectations anyhow need to be introduced into the model and since
stickiness in price movement or other limitations on price movement are the
sources of disequilibrium, this procedure of incorporating probability of rationing
into price expectations is the logical one and is more meaningful than introducing
the probability of disequilibrium as an explanatory variable, as done by Eaton
and Quandt. The approach adopted by Eaton and Quandt does not say what
disequilibrium is due to, whereas the approach based on price expectations
depends on what the sources of disequilibrium are.
As an illustration of this approach we will re-formulate the supply function by
introducing expected prices. We leave eqs. (6.1), (6.2) and (6.3) as they are and
re-define (6.4) as

$$S_t = X_{2t}\beta_2 + \alpha_2 P_t^e + u_{2t}, \qquad (6.4')$$

where $P_t^e$ is the expected price, i.e. the price the suppliers expect to prevail in
period $t$, the expectation being formed at time $t-1$ (we will assume a one period
lag between production decisions and supply). Regarding the expected price $P_t^e$, if
we use some naive extrapolative or adaptive expectations formulae, then the
estimation proceeds as in earlier models with no price expectations, with minor
modifications. For instance, with the adaptive expectations formula, one would

7 Though the analysis is similar, the computations are more complex because of the presence of $\pi_t$ in
the demand function.
1658 G. S. Maddala

first get the ML estimates conditional on a value $\lambda$ of the weighting parameter
and then choose the value of $\lambda$ for which the likelihood is maximum.
An alternative procedure is to use the rational expectations hypothesis

$$P_t^e = E(P_t \mid I_{t-1}), \qquad (6.12)$$

where $P_t^e$ is the expected price and $I_{t-1}$ represents the information set the
economic agents are assumed to have.
Equation (6.12) implies that we can write

$$P_t = P_t^e + v_t,$$

where $v_t$ is uncorrelated with all the variables in the information set $I_{t-1}$. If the
information set $I_{t-1}$ includes the exogenous variables $X_{1t}$ and $X_{2t}$, i.e. if these
exogenous variables are known at time $t-1$, then we can substitute $P_t^e = P_t - v_t$
in eq. (6.4'). We can re-define a residual $u_{2t}^* = u_{2t} - \alpha_2 v_t$, and $u_{2t}^*$ has the same
properties as $u_{2t}$. Thus the estimation of the model simplifies to the case
considered by Fair and Jaffee.
If, on the other hand, $X_{1t}$ and $X_{2t}$ are not known at time $(t-1)$, we cannot
treat $v_t$ the same way as we treat $u_{2t}$ since $v_t$ can be correlated with $X_{1t}$ and $X_{2t}$.
In this case we proceed as follows.
From eqs. (6.2), (6.3) and (6.4') we have

$$P_t - P_{t-1} = \gamma(D_t - S_t),$$

or

$$P_t - P_{t-1} = \gamma\left[(X_{1t}\beta_1 - X_{2t}\beta_2) + \alpha_1 P_t - \alpha_2 P_t^e + (u_{1t} - u_{2t})\right].$$

Taking expectations of both sides conditional on the information set $I_{t-1}$,

$$P_t^e - P_{t-1} = \gamma\left[(X_{1t}^e\beta_1 - X_{2t}^e\beta_2) + (\alpha_1 - \alpha_2)P_t^e\right],$$

or

$$P_t^e = h_1 P_{t-1} + h_2\left(X_{1t}^e\beta_1 - X_{2t}^e\beta_2\right), \qquad (6.13)$$

where

$$h_1 = \left[1 + \gamma(\alpha_2 - \alpha_1)\right]^{-1}, \qquad h_2 = \gamma\left[1 + \gamma(\alpha_2 - \alpha_1)\right]^{-1},$$

and $X_{1t}^e$ and $X_{2t}^e$ are the expected values of $X_{1t}$ and $X_{2t}$. (Note that this equation
is valid even if the price adjustment eq. (6.2) is made stochastic.)
To obtain $X_{1t}^e$ and $X_{2t}^e$ we have to make some assumptions about how these
exogenous variables are generated. A common assumption is that they follow
vector autoregressive processes. Let us, for the sake of simplicity of notation,
assume a first order autoregressive process:

$$X_{1t} = \phi_{11}X_{1,t-1} + \phi_{12}X_{2,t-1} + \epsilon_{1t},$$
$$X_{2t} = \phi_{21}X_{1,t-1} + \phi_{22}X_{2,t-1} + \epsilon_{2t}. \qquad (6.14)$$

Then

$$X_{1t}^e = \phi_{11}X_{1,t-1} + \phi_{12}X_{2,t-1},$$

and

$$X_{2t}^e = \phi_{21}X_{1,t-1} + \phi_{22}X_{2,t-1}.$$

We substitute these equations in (6.13) and substitute the resulting expression
for $P_t^e$ in eq. (6.4').
The estimation of the model will proceed as with the usual disequilibrium
model. The likelihood function in this model is derived in exactly the same way as
with the Fair and Jaffee model, as derived in Amemiya (1974). The only extra
complication is the existence of cross-equation restrictions as implied by eqs.
(6.14), as discussed in Wallis (1980). The two-stage least squares estimation
suggested in Amemiya (1974) can also be easily adapted to the above model. For
details of this, see Chanda (1984).
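The construction of $P_t^e$ from the VAR forecasts can be sketched as follows; this is a simulation of the data-generating process under the stated assumptions, not an estimator, and all parameter values are illustrative. In estimation the $\phi$'s and the structural parameters would be handled jointly because of the cross-equation restrictions just mentioned.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 200

# exogenous variables generated by the first-order VAR of eq. (6.14)
Phi = np.array([[0.6, 0.2],
                [0.1, 0.7]])
X = np.zeros((T, 2))
for t in range(1, T):
    X[t] = Phi @ X[t - 1] + rng.normal(0, 1, size=2)

# one-step-ahead expectations X_t^e = Phi X_{t-1}
Xe = np.zeros((T, 2))
Xe[1:] = X[:-1] @ Phi.T

# illustrative structural parameters for eqs. (6.2), (6.3) and (6.4')
gamma, a1, a2, b1, b2 = 0.3, -1.0, 1.0, 2.0, 1.5
h1 = 1.0 / (1.0 + gamma * (a2 - a1))    # h1, h2 as defined below eq. (6.13)
h2 = gamma * h1

P = np.zeros(T)
for t in range(1, T):
    # rational expectations price forecast, eq. (6.13)
    Pe = h1 * P[t - 1] + h2 * (Xe[t, 0] * b1 - Xe[t, 1] * b2)
    u1, u2 = rng.normal(0, 0.5, size=2)
    S = X[t, 1] * b2 + a2 * Pe + u2                     # supply, eq. (6.4')
    # demand (6.3) involves the realized P_t, so solve eq. (6.2) for P_t:
    # P_t - P_{t-1} = gamma*(X1t*b1 + a1*P_t + u1 - S)
    P[t] = (P[t - 1] + gamma * (X[t, 0] * b1 + u1 - S)) / (1.0 - gamma * a1)
```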
Yet another modification in the specification of the demand and supply
functions that one needs to make is that of 'spillovers'. The unsatisfied demand
and excess supply from the previous period will spill over to current demand and
supply. The demand and supply functions (6.3) and (6.4) are now reformulated
respectively as:

$$D_t = X_{1t}\beta_1 + \alpha_1 P_t + \delta_1(D_{t-1} - Q_{t-1}) + u_{1t},$$
$$S_t = X_{2t}\beta_2 + \alpha_2 P_t + \delta_2(S_{t-1} - Q_{t-1}) + u_{2t}, \qquad (6.15)$$

with $\delta_1 > 0$, $\delta_2 > 0$, and $\delta_1\delta_2 < 1$. [See Orsi (1982) for this last condition.]
At time $(t-1)$, $Q_{t-1}$ is equal to $D_{t-1}$ or $S_{t-1}$. Thus, one of these is not
observed. However, if the price adjustment eq. (6.2) is not stochastic, one has a
four-way regime classification depending on excess demand or excess supply at
time periods $(t-1)$ and $t$. Thus, the method of estimation suggested by Amemiya
(1974a) for the Fair and Jaffee model can be extended to this case. Such an extension

is done in Laffont and Monfort (1979). Orsi (1982) applied this model to the
Italian labor market but the estimates of the spill-over coefficients were not
significantly different from zero. This method is further extended by Chanda
(1984) to the case where the supply function depends on expected prices and
expectations are formed rationally.

6.3. The validity of the 'min' condition

As mentioned in the introduction, the main element that distinguishes the recent
econometric literature on disequilibrium models from the earlier literature is the
'min' condition (6.1). This condition has been criticized on the grounds that:
(a) Though it can be justified at the micro-level, it cannot be valid at the
aggregate level where it has been very often used.
(b) It introduces unnecessary computational problems which can be avoided by
replacing it with

$$Q = \text{Min}[E(D), E(S)] + \varepsilon. \qquad (6.1')$$

(c) In some disequilibrium models, the appropriate condition for the transacted
quantity is

$$Q = 0 \quad \text{if } D \ne S.$$

Criticism (a), made by Muellbauer (1978), is a valid one. The appropriate
modifications depend on the assumptions made about the aggregation procedure.
These problems have been discussed in Batchelor (1977), Kooiman and Kloek
(1979), Malinvaud (1982) and Muellbauer and Winter (1980). Bouissou, Laffont
and Vuong (1982) suggest using survey data to analyze models of disequilibrium
at the aggregate level.
Regarding criticism (b), Richard (1980b) and Hendry and Spanos (1980) argue
against the use of the 'min' condition as formulated in (6.1). Sneessens (1981,
1983) adopts the condition (6.1'). However, eq. (6.1') is hard to justify as a
behavioural equation. Even the computational advantages are questionable [see
Quandt (1983), pp. 25-26]. The criticism of Hendry and Spanos is also not valid
on closer scrutiny [see Maddala (1983a), pp. 34-35 for details].
Criticism (c) is elaborated in Maddala (1983a, b), where a distinction is made
between "rationing models" and "trading models", the former term applying to
models for which the quantity transacted is determined by the condition (6.1),
and the latter term applying to models where no transaction takes place if $D_t \ne S_t$.
Condition (6.1) is thus replaced by

$$Q_t = 0 \quad \text{if } D_t \ne S_t. \qquad (6.1'')$$

The term ‘trading model’ arose by analogy with commodity trading where trading
stops when prices hit a floor or a ceiling (where there is excess demand or excess
supply respectively). However, in commodity trading, a sequence of trades takes
place and all we have at the end of the day is the total volume of trading and the
opening, high, low and closing prices.8 Thus, commodity trading models do not
necessarily fall under the category of ‘trading’ models defined here. On the other
hand models that involve ‘rationing’ at the aggregate level might fall into the class
of ‘trading’ models defined here at the micro-level. Consider, for instance, the
loan demand problem with interest rate ceilings. At the aggregate level there
would be an excess demand at the ceiling rate and there would be rationing. The
question is how rationing is carried out. One can argue that for each individual
there is a demand schedule giving the loan amounts $L$ the individual would want
to borrow at different rates of interest $R$. Similarly, the bank would also have a
supply schedule giving the amounts $L$ it would be willing to lend at different rates
of interest $R$. If the rate of interest at which these two schedules intersect is $R_i^* \le \bar R$,
the ceiling rate, then a transaction takes place. Otherwise no transaction takes
place. This assumption is perhaps more appropriate for mortgage loans than for
consumer loans. In this situation $Q$ is not $\text{Min}(D, S)$. In fact $Q = 0$ if $D \ne S$. The
model would be formulated as:

Loan demand: $L_i = \alpha_1 R_i + \beta_1' X_{1i} + u_{1i}$
Loan supply: $L_i = \alpha_2 R_i + \beta_2' X_{2i} + u_{2i}$ \quad (both if $R_i^* \le \bar R$),
$L_i = 0$ otherwise.

$R_i^*$ is the rate of interest that equilibrates demand and supply. If the assumption
is that the individual borrows what is offered at the ceiling rate $\bar R$, an assumption
more appropriate for consumer loans, we have

$$L_i = \alpha_2 \bar R + \beta_2' X_{2i} + u_{2i} \quad \text{if } R_i^* > \bar R.$$

In this case of course $Q = \text{Min}(D, S)$, but there is never a case of excess supply.
Further discussion of this problem can be found in Maddala and Trost (1982).
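The difference between the two rationing schemes just described can be illustrated with a small simulation; the slopes, the ceiling, and the variable names are illustrative assumptions, with $x$ standing in for $\beta'X$.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10_000
a1, a2 = -0.8, 1.2                      # demand and supply slopes in R
x1, x2 = rng.normal(0, 1, (2, n))       # stand-ins for beta1'X1i and beta2'X2i
u1, u2 = rng.normal(0, 1, (2, n))
Rbar = 0.5                              # interest rate ceiling

# individual equilibrium rate R* solves a1*R + x1 + u1 = a2*R + x2 + u2
Rstar = (x2 + u2 - x1 - u1) / (a1 - a2)

# 'mortgage-type' rationing: a transaction occurs only if R* <= Rbar, else L = 0
L_mortgage = np.where(Rstar <= Rbar, a1 * Rstar + x1 + u1, 0.0)

# 'consumer-type' rationing: at R* > Rbar the borrower takes what is offered at
# the ceiling, i.e. Q = Min(D, S) with supply the short side
L_consumer = np.where(Rstar <= Rbar, a1 * Rstar + x1 + u1,
                      a2 * Rbar + x2 + u2)

print("share with no transaction (mortgage-type):", (L_mortgage == 0.0).mean())
```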

8 Actually, in studies on commodity trading, the total number of contracts is treated as $Q_t$ and the
closing price for the day as $P_t$. The closing price is perhaps closer to an equilibrium price than the
opening, low and high prices. But it cannot be treated as an equilibrium price. There is the question of
what we mean by 'equilibrium' price in a situation where a number of trades take place in a day. One
can interpret it as the price that would have prevailed if there were a Walrasian auctioneer and a
single trade took place for the day. If this is the case, then the closing price would be an equilibrium
price only if a day is a long enough period for prices at the different trades to converge to some
equilibrium. These problems need further work. See Monroe (1981).

The important situations where this sort of disequilibrium model arises are where
there are exogenous controls on the movement of prices. There are essentially
three major sources of disequilibrium that one can distinguish:
(1) fixed prices,
(2) imperfect adjustment of prices,
(3) controlled prices.
We have till now discussed the cases of fixed prices and imperfect adjustment to
the market equilibrating price. The case of controlled prices is different from the
case of fixed prices. The disequilibrium model considered earlier in example 1,
Section 1 is one with fixed prices. With fixed prices, the market is almost always
in disequilibrium. With controlled prices, the market is sometimes in equilibrium
and sometimes in disequilibrium.9
Estimation of disequilibrium models with controlled prices is discussed in
Maddala (1983a, pp. 327-34 and 1983b) and details need not be presented here.
Gourieroux and Monfort (1980) consider endogenously controlled prices and
Quandt (1984) discusses switching between equilibrium and disequilibrium.
In summary, not all situations of disequilibrium involve the 'min' condition
(6.1). In those formulations where there is some form of rationing, the alternative
condition (6.1'), that has been suggested on grounds of computational simplicity,
is not a desirable one to use and is difficult to justify conceptually.
What particular form the ‘Min’ condition takes depends on how the rationing
is carried out and whether we are analyzing micro or macro data. The discussion
of the loan problem earlier shows how the estimation used depends on the way
customers are rationed. This analysis applies at the micro level. For analysis with
macro data Goldfeld and Quandt (1983) discuss alternative decision criteria by
which the Federal Home Loan Bank Board (FHLBB) rations its advances to
savings and loan institutions. The paper based on earlier work by Goldfeld, Jaffee
and Quandt (1980) discusses how different targets and loss functions lead to
different forms of the ‘Min’ condition and thus call for different estimation
methods. This approach of deriving the appropriate rationing condition from
explicit loss functions is the appropriate thing to do, rather than writing down the
demand and supply functions (6.3), and (6.4), and saying that since their is
disequilibrium (for some unknown and unspecified reasons) we use the ‘Min’
condition (6.1).

7. Some other problems of specification in disequilibrium models

We will now discuss some problems of specification in disequilibrium models
that need further work.

9 MacKinnon (1978) discusses this problem but the likelihood functions he presents are incorrect.
The correct analysis of this model is presented in Maddala (1983b).

7.1. Problems of serial correlation

The econometric estimation of disequilibrium models is almost exclusively based
on the assumption that the error terms are serially independent. If they are
serially correlated, the likelihood functions are intractable since they involve
integrals of a very high dimension. One can, however, derive a test for serial
correlation based on the Lagrange multiplier principle that does not involve the
evaluation of multiple integrals (see Lee and Maddala, 1983a). Quandt (1981)
discusses the estimation of a simple disequilibrium model with autocorrelated
errors, but the likelihood function maximized by him is $L = \prod_t h(Q_t)$, which is not
correct since $Q_t$ and $Q_{t-1}$ are correlated. The only example till now where
estimation is done with autocorrelated errors is the paper by Cosslett and Lee
(1983) who analyze the model

$$Y_t = X_t\beta + \delta I_t + u_t,$$

where the $u_t$ are first-order autocorrelated, $Y_t$ is a continuous indicator and $I_t$ is a
discrete indicator measured with error. The model they consider is thus a
switching regression model with exogenous switching and imperfect sample
separation. Cosslett and Lee derive a test statistic for detecting serial correlation
in such a model and show that the likelihood function can be evaluated by a
recurrence relation, and thus maximum likelihood estimation is computationally
feasible.
For the disequilibrium model with known sample separation, one can just
transform the demand and supply eqs. (6.3) and (6.4). For instance, if the
residuals in the two equations are first-order autocorrelated, we have

$$u_{1t} = \rho_1 u_{1,t-1} + e_{1t},$$
$$u_{2t} = \rho_2 u_{2,t-1} + e_{2t}. \qquad (7.1)$$

Then we have

$$D_t = \rho_1 D_{t-1} + X_{1t}\beta_1 - \rho_1 X_{1,t-1}\beta_1 + e_{1t},$$

and

$$S_t = \rho_2 S_{t-1} + X_{2t}\beta_2 - \rho_2 X_{2,t-1}\beta_2 + e_{2t}. \qquad (7.2)$$

Since sample separation is available, the procedure in Laffont and Monfort
(1979) can be used with the modification that there are nonlinear restrictions on
the parameters in (7.2). The same procedure holds good if, instead of (7.1), we
specify equations where $u_{1t}$ and $u_{2t}$ depend on lagged values of both $u_{1t}$ and
$u_{2t}$.

Thus, serially correlated errors can be handled if the sample separation is
known, and in models with exogenous switching even if the sample separation
is imperfect.
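With known sample separation, the transformation in (7.1)-(7.2) is the familiar quasi-differencing; a minimal sketch for the demand equation follows, taking $\rho_1$ as given (in practice it is estimated subject to the nonlinear restrictions in (7.2)), with illustrative names and values.

```python
import numpy as np

def quasi_difference(y, X, rho):
    """Transform y_t = X_t'b + u_t with u_t = rho*u_{t-1} + e_t into
    y_t - rho*y_{t-1} = (X_t - rho*X_{t-1})'b + e_t, cf. eq. (7.2)."""
    return y[1:] - rho * y[:-1], X[1:] - rho * X[:-1]

# illustrative use: OLS on the transformed demand observations
rng = np.random.default_rng(3)
T, rho1 = 300, 0.6
beta1 = np.array([1.0, 2.0])
X1 = np.column_stack([np.ones(T), rng.normal(size=T)])
u = np.zeros(T)
for t in range(1, T):
    u[t] = rho1 * u[t - 1] + rng.normal(0, 0.5)      # AR(1) errors, eq. (7.1)
D = X1 @ beta1 + u

D_star, X_star = quasi_difference(D, X1, rho1)
b_hat = np.linalg.lstsq(X_star, D_star, rcond=None)[0]
print(b_hat)    # close to beta1
```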

7.2. Tests for distributional assumptions

The econometric estimation of disequilibrium models is entirely based on the
assumption of normality of the disturbances. It would be advisable to devise tests
of the normality assumption and suggest methods of estimation that are either
distribution-free or based on distributions more general than the normal distribution.
Lee (1982b) derives a test for the assumption of normality in the disequilibrium
market model from the Lagrange multiplier principle. The test is based
on some measures of cumulants. He finds that for the data used by Fair and
Jaffee (1972) the normality assumption is strongly rejected. More work, therefore,
needs to be done in devising methods of estimation based on more general
distributions, or deriving some distribution-free estimators [see Cosslett (1984)
and Heckman and Singer (1984) for some work in this direction].
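Lee's LM statistic is tied to the specific disequilibrium likelihood, but the idea of basing a check on cumulants can be illustrated with a generic moment test of Jarque-Bera type on residuals; this is an illustration of the idea, not Lee's statistic.

```python
import numpy as np
from scipy.stats import chi2

def jarque_bera(resid):
    """Moment-based normality check using the third and fourth cumulants."""
    n = len(resid)
    e = resid - resid.mean()
    s2 = np.mean(e**2)
    skew = np.mean(e**3) / s2**1.5
    kurt = np.mean(e**4) / s2**2
    jb = n * (skew**2 / 6 + (kurt - 3)**2 / 24)
    return jb, 1 - chi2.cdf(jb, df=2)

rng = np.random.default_rng(4)
stat, pval = jarque_bera(rng.standard_t(df=4, size=500))  # fat-tailed example
print(f"JB = {stat:.1f}, p-value = {pval:.3f}")           # normality rejected
```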

7.3. Tests for disequilibrium

There have been many tests suggested for the "disequilibrium hypothesis", i.e. to
test whether the data have been generated by an equilibrium model or a
disequilibrium model. Quandt (1978) discusses several tests and says that there
does not exist a uniformly best procedure for testing the hypothesis that a market
is in equilibrium against the alternative that it is not.
A good starting point for "all" tests for disequilibrium is to ask the basic
question of what the disequilibrium is due to. In the case of the partial adjustment
model given by eq. (6.7), the disequilibrium is clearly due to imperfect adjustment
of prices. In this case the proper test for the equilibrium vs. disequilibrium
hypothesis is to test whether $\lambda = 1$; see Ito and Ueda (1981). This leads to a test
that $1/\gamma = 0$ in the Fair and Jaffee quantitative model, since $\gamma$ is proportional to
$\lambda/(1-\lambda)$. This is the procedure Fair and Jaffee suggest. However, if the meaning of
the price adjustment equation is that prices adjust in response to either excess
demand or excess supply, then, as argued in Section 6, the price adjustment
equation should have $\Delta P_{t+1}$, not $\Delta P_t$, and it is also not clear how one can test
the equilibrium hypothesis in this case. The intuitive reason is that now the price
adjustment equation does not give any information about the source of the
disequilibrium.
Quandt (1978) argues that there are two classes of disequilibrium models, which
are:
(a) models where it is known for which observations $D_t < S_t$ and for which
$D_t > S_t$, i.e. the sample separation is known, and

(b) models in which such information is not available.
He says that in case (a) the question of testing for disequilibrium does not arise
at all. It is only in case (b) that it makes sense.
The example of the partial adjustment model (6.7) is a case where we have
sample separation given by $\Delta P_t$. However, it still makes sense to test the
disequilibrium hypothesis, which in this case merely translates into a hypothesis
about the speed of adjustment of prices to levels that equilibrate demand and
supply. Adding a stochastic term $u_{3t}$ to the price adjustment equation does not
change the test. When $\lambda = 1$ this says $P_t = P_t^* + u_{3t}$.
There is considerable discussion in Quandt's paper on the question of nested
vs. non-nested hypotheses. Quandt argues that very often the hypothesis of
equilibrium vs. disequilibrium is non-nested, i.e. the parameter set under the null
hypothesis that the model is an equilibrium model is not a subset of the
parameter set for the disequilibrium model. The problem in these cases may be
that there is no adequate explanation of why disequilibrium exists in the first
place.
Consider, for instance, the disequilibrium model with the demand and supply
functions specified by eqs. (6.3) and (6.4). Quandt argues that if one takes the
limit of the likelihood function for this model with the price adjustment equation

$$P_t - P_{t-1} = \gamma(D_t - S_t) + u_{3t}, \qquad (7.3)$$

as $\gamma \to \infty$, with

$$\sigma_{23} = \text{Cov}(u_2, u_3) = 0, \qquad \sigma_{13} = \text{Cov}(u_1, u_3) = 0, \qquad \sigma_3^2 \ne 0,$$

then we get the likelihood function for the equilibrium model ($Q_t = D_t = S_t$) and
thus the hypothesis is "nested"; but that if $\sigma_3^2 = 0$, the likelihood function for the
disequilibrium model does not tend to the likelihood function for the equilibrium
model even if $\gamma \to \infty$, and thus the hypothesis is not nested. The latter conclusion,
however, is counter-intuitive, and if we consider the correct likelihood function for
this model derived in Amemiya (1974) and take the limit as $\gamma \to \infty$, we get
the likelihood function for the equilibrium model.
1666 G. S. Maddala

Quandt also shows that if the price adjustment equation is changed to

$$\Delta P_{t+1} = \gamma(D_t - S_t) + u_{3t}, \qquad (7.4)$$

then the limit of the likelihood function of the disequilibrium model as $\gamma \to \infty$ is
not the likelihood function for the equilibrium model. This makes intuitive sense
and is also clear when we look at the likelihood functions derived in Section 5. In
this case the hypothesis is non-nested, but the problem is that, as discussed earlier,
this price adjustment equation does not tell us anything about what disequilibrium
is due to. As shown in Section 6, the price adjustment eq. (7.3) follows
from the partial adjustment eq. (6.7) and thus throws light on what disequilibrium
is due to, but the price adjustment eq. (7.4) says nothing about the source of
disequilibrium. If we view the equation as a forecast equation, then the disequilibrium
is due to imperfect forecasts of the market equilibrating price. In this case
it is clear that as $\gamma \to \infty$, we do not get perfect forecasts. What we need to have
for a nested model is a forecasting equation which for some limiting values of
some parameters yields perfect forecasts of the market equilibrating prices.
Consider now the case where we do not have a price adjustment equation and
the model merely consists of a demand equation and a supply equation. Now
clearly the source of the disequilibrium is that $P_t$ is exogenous. Hence the test
boils down to testing whether $P_t$ is exogenous or endogenous. The methods
developed by Wu (1973) and Hausman (1978) would be of use here.
As mentioned earlier, if the disequilibrium is due to partial adjustment of prices,
then a test for disequilibrium is a test for $\lambda = 1$ in eq. (6.7), or a test that $1/\gamma = 0$
in eq. (6.2). The proper way to test this hypothesis is to re-parameterize the
equations in terms of $\eta = 1/\gamma$ before the estimation is done. This re-parameterization
is desirable in all models (models with expectational variables,
spillovers, inventories, etc.) where the price adjustment eq. (6.2) or its
stochastic version is used.
There is one additional problem: the model is unstable for
$\eta < 0$. Thus the null hypothesis $\eta = 0$ lies on the boundary of the set of admissible
values of $\eta$. In this case one can use the upper $2\alpha$ percentage point of the $\chi^2$
distribution in order that the test may have a significance level of $\alpha$ in large
samples. Upcher (1980) developed a Lagrange multiplier or score statistic. The
score test is not affected by the boundary problem and only requires estimation of
the constrained model, i.e. the model under the hypothesis of equilibrium. This
test is therefore computationally much simpler than either the LR or the Wald test,
and in case the null hypothesis of equilibrium is accepted, one can avoid the
burdensome method of estimating the disequilibrium model. Upcher's analysis
shows that the score test statistic is identical for both the stochastic and the
non-stochastic specification of the price-adjustment equation. The advantage of
this result is that it encompasses a broad spectrum of alternatives. But, in case the
null hypothesis of

equilibrium is rejected, a range of alternative specifications for the disequilibrium
model is possible.
However, a major objection to the use of the Lagrange multiplier procedure is that
it ignores the one-sided nature of the alternative and, therefore, is likely to result
in a test with low power compared to the LR or Wald test procedures.
This issue has been recently addressed by Rogers (1983) who has proposed a
test statistic that is asymptotically equivalent under the null hypothesis and a
sequence of local alternatives to the LR and Wald statistics, and which has the
same computational advantage over these statistics as does the Lagrange multi-
plier statistic over the LR and Wald statistics in the case of the usual two-sided
alternatives.
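As a small numerical aside on the boundary problem, the critical value described above, the upper $2\alpha$ point of a $\chi^2(1)$, can be computed directly (illustrative $\alpha$):

```python
from scipy.stats import chi2

alpha = 0.05
# upper 2*alpha percentage point of chi-square(1): the critical value giving
# a one-sided test of size alpha when the parameter lies on the boundary
crit = chi2.ppf(1 - 2 * alpha, df=1)
print(f"critical value at alpha = {alpha}: {crit:.3f}")   # about 2.706
```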
An alternative test for disequilibrium developed by Hwang (1980) relies on the
idea of deriving an equation of the form

$$Q_t = \pi_1 P_t + \pi_2 X_{1t} + \pi_3 X_{2t} + v_t, \qquad (7.5)$$

from both the equilibrium and the disequilibrium model. The difference between the two
models is that $\pi_1, \pi_2, \pi_3$ are stable over time in the equilibrium model and varying
over time in the disequilibrium model. Hwang, therefore, proposes to use stability
tests available in the literature for testing the hypothesis of equilibrium. In the
case of the equilibrium model $P_t$ is endogenous. Eq. (7.5) is derived from the
conditional distribution of $Q_t$ given $P_t$ and hence can be estimated by ordinary
least squares. The only problem with the test suggested by Hwang is that
parameter instability can arise from a large number of sources, and if the null
hypothesis is rejected, we do not know what alternative model to consider.
In summary, it is always desirable to base a test for disequilibrium on a
discussion of the source for disequilibrium.

7.4. Models with inventories

In Section 6 we considered modifications of the demand and supply functions
taking account of spillovers. However, spillovers on the supply side are better
accounted for by introducing inventories explicitly. Dagenais (1980) considers
inventories and spillovers in the demand function and suggests a limited information
method. Chanda (1984) extends this analysis to take into account expected
prices in the supply function.
Green and Laffont (1981) consider inventories in the context of a disequilibrium
model with anticipatory pricing. Laffont (1983) presents a survey of the
theoretical and empirical work on inventories in the context of fixed-price models.

The issues of how to formulate the desired inventory holding and how to
formulate inventory behaviour in the presence of disequilibrium are problems
that need further study.

8. Multimarket disequilibrium models

The analysis in the preceding sections on single market disequilibrium models has
been extended to multimarket disequilibrium models by Gourieroux et al. (1980)
and Ito (1980). Quandt (1978) first considered a two-market disequilibrium model
of the following form (the exogenous variables are omitted):

$$D_{1t} = \alpha_1 Q_{2t} + u_{1t},$$
$$S_{1t} = \beta_1 Q_{2t} + u_{2t},$$
$$D_{2t} = \alpha_2 Q_{1t} + u_{3t},$$
$$S_{2t} = \beta_2 Q_{1t} + u_{4t}, \qquad (8.1)$$
$$Q_{1t} = \text{Min}(D_{1t}, S_{1t}),$$
$$Q_{2t} = \text{Min}(D_{2t}, S_{2t}). \qquad (8.2)$$

Quandt did not consider the logical consistency of the model. This is considered
in Amemiya (1977) and Gourieroux et al. (1980a).
Consider the regimes:

$R_1$: $D_1 \ge S_1$, $D_2 \ge S_2$,
$R_2$: $D_1 \ge S_1$, $D_2 < S_2$,
$R_3$: $D_1 < S_1$, $D_2 < S_2$,
$R_4$: $D_1 < S_1$, $D_2 \ge S_2$. \qquad (8.3)

In regime $R_1$ we have $Q_1 = S_1$, $Q_2 = S_2$, and substituting these in (8.1) we obtain
the mapping from $(D_1, S_1, D_2, S_2)$ to $(u_1, u_2, u_3, u_4)$ given by the matrix

$$A_1 = \begin{pmatrix} 1 & 0 & 0 & -\alpha_1 \\ 0 & 1 & 0 & -\beta_1 \\ 0 & -\alpha_2 & 1 & 0 \\ 0 & -\beta_2 & 0 & 1 \end{pmatrix}.$$

Similarly, we can define the corresponding matrices $A_2$, $A_3$, $A_4$ in regimes
$R_2$, $R_3$, $R_4$ respectively that give the mapping from $(D_1, S_1, D_2, S_2)$ to
$(u_1, u_2, u_3, u_4)$:

$$A_2 = \begin{pmatrix} 1 & 0 & -\alpha_1 & 0 \\ 0 & 1 & -\beta_1 & 0 \\ 0 & -\alpha_2 & 1 & 0 \\ 0 & -\beta_2 & 0 & 1 \end{pmatrix}, \qquad
A_3 = \begin{pmatrix} 1 & 0 & -\alpha_1 & 0 \\ 0 & 1 & -\beta_1 & 0 \\ -\alpha_2 & 0 & 1 & 0 \\ -\beta_2 & 0 & 0 & 1 \end{pmatrix},$$

$$A_4 = \begin{pmatrix} 1 & 0 & 0 & -\alpha_1 \\ 0 & 1 & 0 & -\beta_1 \\ -\alpha_2 & 0 & 1 & 0 \\ -\beta_2 & 0 & 0 & 1 \end{pmatrix}.$$

The logical consistency or 'coherency' conditions derived by Gourieroux et al. are
that the determinants of these four matrices, i.e. $(1 - \beta_1\beta_2)$, $(1 - \alpha_2\beta_1)$, $(1 - \alpha_1\alpha_2)$,
$(1 - \alpha_1\beta_2)$, must all be of the same sign.
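Under the reconstruction of the $A_i$ above, the coherency condition is easy to verify numerically; the $\alpha$ and $\beta$ values here are illustrative.

```python
import numpy as np

def coherent(a1, b1, a2, b2):
    """Check that det A_1..A_4 = (1-b1*b2), (1-a2*b1), (1-a1*a2), (1-a1*b2)
    all share the same sign (the Gourieroux et al. coherency condition)."""
    A1 = np.array([[1, 0, 0, -a1], [0, 1, 0, -b1], [0, -a2, 1, 0], [0, -b2, 0, 1]])
    A2 = np.array([[1, 0, -a1, 0], [0, 1, -b1, 0], [0, -a2, 1, 0], [0, -b2, 0, 1]])
    A3 = np.array([[1, 0, -a1, 0], [0, 1, -b1, 0], [-a2, 0, 1, 0], [-b2, 0, 0, 1]])
    A4 = np.array([[1, 0, 0, -a1], [0, 1, 0, -b1], [-a2, 0, 1, 0], [-b2, 0, 0, 1]])
    dets = [np.linalg.det(A) for A in (A1, A2, A3, A4)]
    return dets, all(d > 0 for d in dets) or all(d < 0 for d in dets)

print(coherent(0.5, 0.3, 0.4, 0.2))   # coherent: all determinants positive
print(coherent(3.0, 0.3, 0.4, 0.2))   # incoherent: determinants differ in sign
```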
The major problem that the multimarket disequilibrium models are supposed
to throw light on (which the model in eqs. (8.1) and (8.2) does not) refers to the
"spill-over effects", the effects of unsatisfied demand or supply in one market on
the demand and supply in other markets. Much of this discussion of spill-over
effects has been in the context of macro-models, where the two markets considered are
the commodity market and the labor market. The commodity is supplied by
producers and consumed by households. Labor is supplied by households and
used by producers. The quantities actually transacted are given by

$$C = \text{Min}(C^d, C^s),$$
$$L = \text{Min}(L^d, L^s). \qquad (8.4)$$

The demands and supplies actually presented in each market are called "effective"
demands and supplies, and these are determined by the exogenous variables and
the endogenous quantity constraints (8.4). By contrast, the "notional" demands
and supplies refer to the unconstrained values. Denote these by $\bar C^d$, $\bar C^s$, $\bar L^d$, $\bar L^s$.
The different models of multi-market disequilibrium differ in the way 'effective'
demands and "spill-over effects" are defined. Gourieroux et al. (1980a) define the
1670 G. S. Maddala

effective demands and 'spill-over effects' as follows:

Model I

$C^d = \bar C^d$ if $L = L^s \le L^d$,
$C^d = \bar C^d + \alpha_1(L - \bar L^s)$ if $L = L^d < L^s$; \qquad (8.5)

$C^s = \bar C^s$ if $L = L^d \le L^s$,
$C^s = \bar C^s + \alpha_2(L - \bar L^d)$ if $L = L^s < L^d$; \qquad (8.6)

$L^d = \bar L^d$ if $C = C^s \le C^d$,
$L^d = \bar L^d + \beta_1(C - \bar C^s)$ if $C = C^d < C^s$; \qquad (8.7)

$L^s = \bar L^s$ if $C = C^d \le C^s$,
$L^s = \bar L^s + \beta_2(C - \bar C^d)$ if $C = C^s < C^d$. \qquad (8.8)

This specification is based on Malinvaud (1977) and assumes that agents on the
short side of the market present their notional demand as their effective demand
in the other market. For instance, eq. (8.5) says that if households are able to sell
all the labor they want to, then their effective demand for goods is the same as
their 'notional' demand. On the other hand, if they cannot sell all the labor they
want to, there is a "spill-over effect", but note that this is proportional to $L - \bar L^s$,
not $L - L^s$ (i.e. it is proportional to the difference between the actual labor sold and
the 'notional' supply of labor).
The model considered by Ito (1980) is as follows:

Model II

$$C^d = \bar C^d + \alpha_1(L - \bar L^s), \qquad (8.5')$$
$$C^s = \bar C^s + \alpha_2(L - \bar L^d), \qquad (8.6')$$
$$L^d = \bar L^d + \beta_1(C - \bar C^s), \qquad (8.7')$$
$$L^s = \bar L^s + \beta_2(C - \bar C^d). \qquad (8.8')$$

An alternative model suggested by Portes (1977) based on work by Benassy is the
following:

Model III

$$C^d = \bar C^d + \alpha_1(L - L^s), \qquad (8.5'')$$
$$C^s = \bar C^s + \alpha_2(L - L^d), \qquad (8.6'')$$
$$L^d = \bar L^d + \beta_1(C - C^s), \qquad (8.7'')$$
$$L^s = \bar L^s + \beta_2(C - C^d). \qquad (8.8'')$$

Portes compares the reduced forms for these three models and argues that,
econometrically, there is little to choose between the alternative definitions of
effective demand.
The conditions for logical consistency (or coherency) are the same in all these
models, viz. $0 < \alpha_i\beta_j < 1$ for $i, j = 1, 2$. Both Gourieroux et al. (1980a) and Ito
(1980) derive these conditions, suggest price and wage adjustment equations
similar to those considered in Section 6, and discuss the maximum likelihood
estimation of their models. Ito also discusses two-stage estimation similar to that
proposed by Amemiya for the Fair and Jaffee model, and derives sufficient
conditions for the uniqueness of a quantity-constrained equilibrium in his model.
We cannot go into the details of all these derivations here. The details involve
more algebra than any new conceptual problems in estimation. In particular,
the problems mentioned in Section 6 about the different price adjustment
equations apply here as well.
Laffont (1983) surveys the empirical work on multi-market disequilibrium
models. Quandt (1982, pp. 39-54) also has a discussion of the multi-market
disequilibrium models.
The applications of multi-market disequilibrium models all seem to be in the
macro area. However, here the problems of aggregation are very important and it
is not true that the whole economy switches from a regime of excess demand to
one of excess supply or vice versa. Only some segments might behave that way.
The implications of aggregation for econometric estimation have been studied in
some simple models by Malinvaud (1982).
The problems of spillovers also tend to arise more at a micro-level rather than a
macro-level. For instance, consider two commodities which are substitutes in
consumption (say natural gas and coal), one of which has price controls. We can
define the demand and supply functions in the two markets (omitting the
exogenous variables) as follows:

$$D_1 = \alpha_1 P_1 + \beta_1 P_2 + u_1,$$
$$S_1 = \alpha_2 P_1 + u_2, \qquad P_1 \le \bar P_1,$$
$$Q_1 = \text{Min}(D_1, S_1),$$
$$D_2 = \gamma_1 P_2 + \delta_1 P_1 + \lambda(D_1 - S_1) + v_1,$$
$$S_2 = \gamma_2 P_2 + v_2,$$
$$Q_2 = D_2 = S_2, \quad \text{i.e. the second market is always in equilibrium},$$

where $\bar P_1$ is the ceiling price in the first market. If the price $P_1^*$ that would clear
the first market satisfies $P_1^* \le \bar P_1$, we have the usual simultaneous equations model
with the two quantities and two prices as the endogenous variables. If $P_1^* > \bar P_1$,
there is excess demand in the first market and a spill-over of this into the second
market. This model is still in a "partial equilibrium" framework but would have
interesting empirical applications. It is at least one step forward from the single-market
disequilibrium model, which does not say what happens to the unsatisfied demand
or supply.

9. Models with self-selection

As mentioned in the introduction, there is an early discussion of the self-selection
problem in Roy (1951), who discussed the case of individuals choosing between
two occupations, hunting and fishing, on the basis of their comparative advantage.
See Maddala (1983a, pp. 257-8) for a discussion of this model.
The econometric discussion of the consequences of self-selectivity started with
the papers by Gronau (1974), Lewis (1974) and Heckman (1974). In this case the
problem is about women choosing to be in the labor force or not. The observed
distribution of wages is a truncated distribution. It is the distribution of wage
offers truncated by reservation wages. The Gronau-Lewis model consisted of two
equations:

$$W_o = X\beta_1 + u_1,$$
$$W_r = X\beta_2 + u_2. \qquad (9.1)$$

We observe $W = W_o$ iff $W_o \ge W_r$; otherwise $W = 0$. We discussed the estimation
of this model in Section 2 and we will not repeat it here. The term 'selectivity
bias' refers to the fact that if we estimate the first equation of (9.1) by OLS based on the
observations for which we have wages $W$, we get inconsistent estimates of the
parameters.
Note that

$$E(u_1 \mid W_o \ge W_r) = -\sigma_{1u}\frac{\phi(Z)}{\Phi(Z)},$$

where

$$Z = \frac{X(\beta_1 - \beta_2)}{\sigma_u}, \qquad u = \frac{u_2 - u_1}{\sigma_u},$$

$$\sigma_u^2 = \text{Var}(u_2 - u_1) \quad \text{and} \quad \sigma_{1u} = \text{Cov}(u_1, u).$$

Hence we can write (9.1) as

$$W = X\beta_1 - \sigma_{1u}\frac{\phi(Z)}{\Phi(Z)} + V, \qquad (9.2)$$

where $E(V) = 0$.

A test for selectivity bias is a test for $\sigma_{1u} = 0$. Heckman (1976) suggested a
two-stage estimation method for such models. First get consistent estimates of
the parameters in $Z$ by the probit method applied to the dichotomous variable
(in the labor force or not). Then estimate eq. (9.2) by OLS using the estimated
values $\hat Z$ for $Z$.
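A minimal sketch of this two-stage procedure on simulated data follows; the data-generating values are illustrative, and the second-step OLS standard errors are left uncorrected for the estimated regressor $\hat Z$ (corrections are given in the papers cited below).

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(5)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
b1, b2 = np.array([1.0, 1.5]), np.array([0.5, 0.5])
u1, u2 = rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 1.0]], size=n).T
Wo, Wr = X @ b1 + u1, X @ b2 + u2      # offered and reservation wages, eq. (9.1)
observed = Wo >= Wr
W = np.where(observed, Wo, 0.0)

# stage 1: probit of the participation dummy; Prob(observed) = Phi(X @ g)
def probit_negll(g):
    q = 2 * observed - 1
    return -np.sum(norm.logcdf(q * (X @ g)))
g_hat = minimize(probit_negll, np.zeros(X.shape[1]), method="BFGS").x

# stage 2: OLS of W on X and the ratio phi(Z)/Phi(Z) at Z = X @ g_hat,
# cf. eq. (9.2); the coefficient on the ratio estimates -sigma_1u
Z = X[observed] @ g_hat
ratio = norm.pdf(Z) / norm.cdf(Z)
X2 = np.column_stack([X[observed], ratio])
coef = np.linalg.lstsq(X2, W[observed], rcond=None)[0]
print("beta1:", coef[:2], "  -sigma_1u:", coef[2])
```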
The self-selectivity problem has since been analyzed in different contexts by
several people. Lee (1978) has applied it to the problem of unions and wages. Lee
and Trost (1978) have applied it to the problem of housing demand with choices
of owning and renting. Willis and Rosen (1979) have applied the model to the
problem of education and self-selection. These are all switching regression models.
Griliches et al. (1979) and Kenny et al. (1979) consider models with both
selectivity and simultaneity. These models are switching simultaneous equations
models. As for methods of estimation, both two-stage and maximum likelihood
methods have been used. For two-stage methods, the paper by Lee et al. (1980)
gives the asymptotic covariance matrices when the selectivity criterion is of the
probit and tobit types.
In the literature on self-selectivity a major concern has been with testing for
selectivity bias. These are tests for $\sigma_{1u} = 0$ and $\sigma_{2u} = \text{Cov}(u, u_2) = 0$. However, a
more important issue is the sign and magnitude of these covariances, and often
not much attention is devoted to this. In actual practice we ought to have
$\sigma_{2u} - \sigma_{1u} > 0$, but $\sigma_{1u}$ and $\sigma_{2u}$ can have any signs.10 It is also important to
estimate the mean values of the dependent variables for the alternate choice. For
instance, in the case of college education and income, we should estimate the
mean income of college graduates had they chosen not to go to college, and
the mean income of non-college graduates had they chosen to go to college. In the
example of hunting and fishing we should compute the mean income of hunters
had they chosen to be fishermen and the mean income of fishermen had they
chosen to be hunters. Such computations throw light on the effects of self-selection
and also reveal deficiencies in the model which simple tests for the existence
of selectivity bias do not. See Björklund and Moffitt (1983) for such calculations.
In the literature on labor supply, there has been considerable discussion of
“individual heterogeneity”, i.e. the observed self-selection is due to individual
characteristics not captured by the observed variables (some women want to work
no matter what and some women want to sit at home no matter what). Obviously,
these individual specific effects can only be analyzed if we have panel data. This
problem has been analyzed by Heckman and Chamberlain, but since these
problems will be discussed in the chapters on labor supply models by Heckman

10 This is pointed out in Lee (1978b). Trost (1981) illustrates this with an empirical example on
returns to college education.

and analysis of cross-section and time-series data by Chamberlain they will not be
elaborated here.
One of the more important applications of the procedures for the correction of
selectivity bias is in the evaluation of programs.
In evaluating the effects of several social programs, one has to consider the
selection and truncation that can occur at different levels. We can depict the
situation by a decision tree as follows.

[Figure 2 shows a decision tree for the evaluation of social experiments: the total
sample splits by the individual decision to participate or not to participate in the
experiment; participants split by the administrator's decision to select or not to
select; those selected are divided into a control group and a treatment group; and
members of each group either drop out or continue.]

Figure 2. A decision tree for the evaluation of social experiments.

In practical situations, one would have to assume randomness at certain levels,
or else the model can get too unwieldy to be of any use. The level at which
selection and truncation bias need to be introduced is a question that
depends on the nature of the problem. Further, in Figure 2 the individual's
decision to participate precedes the administrator's decision to select. This
situation can be reversed or both the decisions could be simultaneous. Another
problem is that caused by the existence of multiple categories such as no
participation, partial or full participation or different types of treatment. These
cases fall in the class of models with polychotomous choice and selectivity. The
selectivity problem with polychotomous choice has been analyzed in Hay (1980),
Dubin and McFadden (1984) and Lee (1981). A summary of these methods can
be found in Maddala (1983a, pp. 275-278). An empirical application illustrating
the approach suggested by Lee is in Trost and Lee (1984).

One further problem is that of truncated samples. Very often we do not have
data on all the individuals - participants and non-participants. If the data consist
only of participants in a program, and we know nevertheless that there is
self-selection and we have data on the variables determining the participation
decision function, then we can still correct for selectivity bias. The methodology
for this problem is discussed in the next section. The important thing to note is
that though, theoretically, truncation does not change the identifiability of the
parameters, there is, nevertheless, a loss of information.
There is a vast amount of literature on program evaluation. Some important
references are: Goldberger (1972) and Barnow, Cain and Goldberger (1980).
These papers and the selectivity problem in program evaluation have been
surveyed in Maddala (1983a, pp. 260-267).
One other problem is that of correcting for selectivity bias when the explana-
tory variables are measured with error. An example of this occurs in problems of
measuring wage discrimination, particularly a comparison between the Federal
and non-Federal sectors. A typical regression equation considered is one of
regressing earnings on productivity and a dummy variable depicting race or sex or
ethnic group. Since productivity cannot be measured, some proxies are used.
When such equations are estimated, say for the Federal (or non-Federal) sectors,
one has to take account of individual choices to belong to one or the other sector.
To avoid the selection bias we have to model not only the determinants of wage
offers but also the process of self-selection by which individuals got into that
sector. An analysis of this problem is in Abowd, Abowd and Killingsworth
(1983).
Finally, there is the important problem that most of the literature on selectivity
bias adjustment is based on the assumption of normality. Consider the simple two
equation model to analyze the selectivity problem:

Y = Xβ + u,
I* = Zγ − ε.

X and Z are exogenous variables. I* is never observed. All we observe is I = 1 if
I* > 0, I = 0 otherwise. Also Y is not observed unless I* > 0.
Olsen (1980) shows that the only assumptions we need to make a correction for
selection bias in the estimation of β are that ε is normal and that the conditional
expectation of u given ε is linear. If u and ε are bivariate normal, this condition
follows automatically. Goldberger (1980) made some calculations with alternative
error distributions and showed that the normal selection bias adjustment is quite
sensitive to departures from normality. Lee (1982a, 1983a) suggests some general
transformations to normality. The transformations suggested by him can be done
using some methods outlined in Hildebrand (1956) and Appendix II.C in Bock

and Jones (1968). This approach permits the analysis of selection bias with any
distributional assumptions. Details can be found in the papers by Lee, and a
summary in Maddala (1983a, pp. 272-275).
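A brief sketch of the transformation idea, stated for the two equation model above (this is a paraphrase, not Lee's exact notation): if ε has a known continuous c.d.f. F, then ε* = Φ⁻¹(F(ε)) is standard normal and the selection event ε < Zγ is equivalent to ε* < J(Zγ) with J = Φ⁻¹ ∘ F, so that if E(u | ε*) is linear the usual normal correction applies with a transformed index:

```latex
% Transformation to normality: the selection correction with index J(Z\gamma).
J(Z\gamma) = \Phi^{-1}\!\bigl(F(Z\gamma)\bigr), \qquad
E(u \mid \varepsilon < Z\gamma)
  = -\,\sigma_{u\varepsilon^{*}}\,
      \frac{\phi\bigl(J(Z\gamma)\bigr)}{\Phi\bigl(J(Z\gamma)\bigr)} .
```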

10. Multiple criteria for selectivity

There are several practical instances where selectivity could be due to several
sources rather than just one as considered in the examples in the previous Section.
Griliches et al. (1979) cite several problems with the NLS young men data set that
could lead to selectivity bias. Prominent among these are attrition and (other)
missing data problems. In such cases we would need to formulate the model as
switching regression or switching simultaneous equations models where the switch
depends on more than one criterion function.
During recent years there have been many applications involving multiple
criteria of selectivity. Abowd and Farber (1982) consider a model with two
decisions: the decision of individuals to join a queue for union jobs and the
decision of employers to draw from the queue. Poirier (1980) discusses a model
where the two decisions are those of the employee to continue with the sponsoring
agency after training and the decisions of the employer to make a job offer after
training. Fishe et al. (1981) consider a two-decision model: whether to go to
college or not and whether to join the labor force or not. Ham (1982) examines
the labor supply problem by classifying individuals into four categories according
to their unemployment and under-employment status. Catsiapis and Robinson
(1982) study the demand for higher education and the receipt of student aid
grants. Tunali (1983) studies migration decisions involving single and multiple
moves. Danzon and Lillard (1982) analyze a sequential process of settlement of
malpractice suits. Venti and Wise (1982) estimate a model combining student
preferences for colleges and the decision of the university to admit the student.
All these problems can be classified into different categories depending on
whether the decision rules are joint or sequential. This distinction, however, is not
made clear in the literature and the studies all use the multivariate normal
distribution to specify the joint probabilities.
With a two decision model, the specification is as follows:

y₁ = X₁β₁ + u₁,    (10.1)

y₂ = X₂β₂ + u₂,    (10.2)

I₁* = Z₁γ₁ − ε₁,    (10.3)

I₂* = Z₂γ₂ − ε₂.    (10.4)



We also have to consider whether the choices are completely observed or only
partially observed. Define the indicator variables

I₁ = 1 iff I₁* > 0,
   = 0 otherwise;
I₂ = 1 iff I₂* > 0,
   = 0 otherwise.

The question is whether we observe I₁ and I₂ separately or only as a single
indicator variable I = I₁I₂. The latter is the case with the example of Abowd and
Farber. Poirier (1980) also considers a bivariate probit model with partial
observability but his model is a joint model-not a sequential model as in the
example of Abowd and Farber. In the example Poirier considers, the employer
must decide whether or not to give a job offer and the applicant must decide
whether or not to seek a job offer. We do not observe these individual decisions.
What we observe is whether the trainee continues to work after training. If either
the employer or the employee makes the decision first, then the model would be a
sequential model.
The example considered by Fishe et al. (1981) is a joint decision model but
both indicators I, and I2 are observed. Similar is the case considered by Ham
(1982) though it is hard to see how unemployment and underemployment could
be considered as two decisions. Workers do not choose to be unemployed and
underemployed. Rather both unemployment and underemployment are conse-
quences of more basic decisions of employers and employees. The example
considered by Catsiapis and Robinson (1982) is a sequential decision, though one
can also present arguments that allow it to be viewed as a joint decision model.
In the joint decision model with partial observability, i.e. where we observe
I = I₁I₂ only and not I₁ and I₂ individually, the parameters γ₁ and γ₂ in eqs.
(10.3) and (10.4) are estimable only if there is at least one non-overlapping
variable in either one of Z₁ and Z₂. Since V(ε₁) = V(ε₂) = 1 by normalization, let
us define Cov(ε₁, ε₂) = ρ. Also write

Prob(I₁* > 0, I₂* > 0) = Prob(ε₁ < Z₁γ₁, ε₂ < Z₂γ₂) = F(Z₁γ₁, Z₂γ₂, ρ).

Then the ML estimates of γ₁, γ₂ and ρ are obtained by maximizing the likelihood
function

L₁ = Π_{I=1} F(Z₁γ₁, Z₂γ₂, ρ) · Π_{I=0} [1 − F(Z₁γ₁, Z₂γ₂, ρ)].    (10.5)



With the assumption of bivariate normality of ε₁ and ε₂, this involves the use of
bivariate probit analysis.
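A transparent (if slow) numerical sketch of this ML problem on simulated data follows; all names and parameter values are illustrative assumptions, and Z₂ contains a non-overlapping regressor as identification requires:

```python
# ML for the partial-observability bivariate probit of eq. (10.5):
# only I = I1*I2 is observed.
import numpy as np
from scipy.stats import multivariate_normal
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 500
Z1 = np.column_stack([np.ones(n), rng.normal(size=n)])
Z2 = np.column_stack([np.ones(n), rng.normal(size=n)])   # non-overlapping variable
g1, g2, rho = np.array([0.3, 1.0]), np.array([0.2, -0.8]), 0.4
e = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n)
I = ((Z1 @ g1 - e[:, 0] > 0) & (Z2 @ g2 - e[:, 1] > 0)).astype(int)

def F(a, b, r):
    # F(a, b, r) = Pr(eps1 < a, eps2 < b), standard bivariate normal, corr. r
    cov = [[1.0, r], [r, 1.0]]
    return np.array([multivariate_normal.cdf([ai, bi], mean=[0.0, 0.0], cov=cov)
                     for ai, bi in zip(a, b)])

def negloglik(theta):
    r = np.tanh(theta[4])                                # keeps |rho| < 1
    p = np.clip(F(Z1 @ theta[:2], Z2 @ theta[2:4], r), 1e-10, 1 - 1e-10)
    return -np.sum(I * np.log(p) + (1 - I) * np.log(1 - p))

res = minimize(negloglik, np.zeros(5), method="Nelder-Mead",
               options={"maxiter": 5000})
print(res.x[:4], np.tanh(res.x[4]))   # rough estimates of g1, g2, rho
```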
In the sequential decision model with partial observability, if we assume that
the function (10.4) is defined only on the subpopulation I₁ = 1, so that the
distribution of ε₂ is specified conditionally on ε₁ < Z₁γ₁, then the likelihood
function to be maximized would be

L₂ = Π_{I=1} [Φ(Z₁γ₁)Φ(Z₂γ₂)] · Π_{I=0} [1 − Φ(Z₁γ₁)Φ(Z₂γ₂)].    (10.6)

Again, the parameters γ₁ and γ₂ are estimable only if there is at least one
non-overlapping variable in either one of Z₁ and Z₂ (otherwise we would not
know which estimates refer to γ₁ and which refer to γ₂). In their example on job
queues and union status of workers, Abowd and Farber (1982) obtain their
parameter estimates using the likelihood function (10.6). One can, perhaps, argue
that even in the sequential model, the appropriate likelihood function is still
(10.5) and not (10.6). It is possible that there are persons who do not join the
queue (I₁ = 0) but to whom employers would want to give a union job. The
reason we do not observe these individuals in union jobs is that they had
decided not to join the queue. But we also do not observe in union jobs all
those with I₂ = 0. Thus, we can argue that I₂* exists and is, in principle, defined
even for the observations with I₁ = 0. If the purpose of the analysis is to examine
what factors influence the employers' choice of employees for union jobs, then
possibly the parameter estimates should be obtained from (10.5). The difference
between the two models is in the definition of the distribution of ε₂. In the case of
(10.5), the distribution of ε₂ is defined over the whole population. In the case of
(10.6), it is defined over the subpopulation I₁ = 1. The latter allows us to make
only conditional inferences.¹¹ The former allows us to make both conditional and
marginal inferences. To make marginal inferences, we need estimates of γ₂. To
make conditional inferences we consider the conditional distribution f(ε₂ | ε₁ <
Z₁γ₁) which involves γ₁, γ₂, and ρ.
Yet another type of partial observability arises in the case of truncated samples.
An example is that of measuring discrimination in loan markets. Let I₁* refer to
the decision of an individual on whether or not to apply for a loan, and let I₂*
refer to the decision of the bank on whether or not to grant the loan. Define

I₁ = 1 if the individual applies for a loan,
   = 0 otherwise;
I₂ = 1 if the applicant is given a loan,
   = 0 otherwise.
¹¹The conditional model does not permit us to allow for the fact that changes in Z₂ also might
affect the probability of being in the queue. Also, the decision of whether or not to join the queue can
be influenced by the perception of the probability of being drawn from the queue.

Rarely do we have data on the individuals for whom I₁ = 0. Thus what we have is
a truncated sample. We can, of course, specify the distribution of I₂* only for the
subset of observations with I₁ = 1, estimate the parameters γ₂ by, say, the probit ML
method, and examine the significance of the coefficients of race, sex, age, etc. to
see whether there is discrimination by any of these variables. This does not,
however, allow for self-selection at the application stage, say for some individuals
not applying because they feel they will be discriminated against. For this purpose
we define I₂* over the whole population and analyze the model from the truncated
sample. The argument is that, in principle, I₂* exists even for the non-applicants.
The parameters γ₁, γ₂ and ρ can be estimated by maximizing the likelihood
function

L₃ = Π_{I₂=1} [F(Z₁γ₁, Z₂γ₂, ρ)/Φ(Z₁γ₁)] · Π_{I₂=0} [(Φ(Z₁γ₁) − F(Z₁γ₁, Z₂γ₂, ρ))/Φ(Z₁γ₁)].    (10.7)
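Each factor in (10.7) is simply a probability conditional on the event that an application was made; for instance,

```latex
\Pr(I_2 = 1 \mid I_1 = 1)
  = \frac{\Pr(\varepsilon_1 < Z_1\gamma_1,\;\varepsilon_2 < Z_2\gamma_2)}
         {\Pr(\varepsilon_1 < Z_1\gamma_1)}
  = \frac{F(Z_1\gamma_1, Z_2\gamma_2, \rho)}{\Phi(Z_1\gamma_1)} .
```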

In this model the parameters γ₁, γ₂ and ρ are, in principle, estimable even if Z₁
and Z₂ are the same variables. In practice, however, the estimates are not likely to
be very good. Muthen and Joreskog (1981) report the results of some Monte-Carlo
experiments on this. Bloom et al. (1981) report that attempts at estimating this
model did not produce good estimates. However, the paper by Bloom and
Killingsworth (1981) shows that correction for selection bias can be done even
with truncated samples. Wales and Woodland (1980) also present some encourag-
ing Monte-Carlo evidence. Since the situation of truncated samples is of frequent
occurrence (see Bloom and Killingsworth for a number of examples) more
evidence on this issue will hopefully accumulate in a few years.
The specification of the distributions of ε₁ and ε₂ in (10.3) and (10.4) depends
on whether we are considering a joint decision model or a sequential decision
model. For problems with sequential decisions, the situation is as follows.
In a sequential decision model, the disturbance ε₂ can be defined only on the
subpopulation for which I₁ = 1. The specification of a joint distribution for
(ε₁, ε₂) over the whole population will not be appropriate in principle and will
introduce unnecessarily complicated functional forms for the conditional prob-
abilities. This point is emphasized in Lee and Maddala (1983b). On the other

hand, if we specify the marginal distribution of ε₁ and the conditional distribution
of ε₂ given I₁ = 1, then there is no way we can allow for the correlations among
the decisions. Lee and Maddala (1983b) and Lee (1984) suggest the following: Let
F₁(ε₁) be the marginal distribution of ε₁ defined on the subpopulation (I₁ = 1),
which is, of course, implied by the marginal distribution of ε₁ on the whole
population. F₂(ε₂) is the marginal distribution of ε₂ defined on the subpopulation
I₁ = 1. Given the marginal distributions F₁(ε₁) and F₂(ε₂) defined on the common
measurable space, there are infinitely many ways of generating joint distributions
with given marginals. Lee (1983a) discusses some computable methods of gener-
ating these distributions. This procedure can be applied to correct for selectivity
bias in sequential decision models with any specifications of the marginal distri-
butions of ε₁ on the whole population and of ε₂ on the subpopulation, while
allowing for correlations in the decisions. See Lee and Maddala (1983b) and Lee
(1984) for details.
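One concrete construction of the kind referred to here is the normal translation method (a sketch in our notation of the approach in Lee (1983a)): map each marginal into a standard normal variate and couple the results through the bivariate normal distribution,

```latex
% B(.,.;rho) is the standard bivariate normal c.d.f.; H has marginals F1, F2.
H(e_1, e_2) = B\!\left(\Phi^{-1}\!\bigl(F_1(e_1)\bigr),\;
                       \Phi^{-1}\!\bigl(F_2(e_2)\bigr);\;\rho\right),
```

so that the marginals remain as specified while ρ indexes the dependence between the two decisions.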

11. Concluding remarks

In the preceding sections we have reviewed the recent literature on disequilibrium
and selectivity models. We will now go through some deficiencies of these models
and examine future areas of research.
The cornerstone of the “disequilibrium” models discussed in this chapter is the
“minimum condition.” One of the most disturbing points in the empirical
applications is that the models have been mechanically applied with no discussion
of what disequilibrium is due to and what the consequences are. In spite of all the
limitations mentioned in Section 3, the model discussed there (with slight varia-
tion) has been the model with the most empirical applications. For instance, Sealy
(1979) used the model to study credit rationing in the commercial loan market.
Portes and Winter (1978) used it to estimate demand for money and savings
functions in centrally planned economies (Czechoslovakia, East Germany,
Hungary and Poland). Portes and Winter (1980) used it to study the demand for
consumption goods in centrally planned economies. Chambers et al. (1978) used
it to study the effects of import quotas on the U.S. beef market.
The reason for the popularity of this model is that it requires us to specify very
little. The authors of the above papers specify the demand and supply functions
as usual, and then say there is “rationing” and disequilibrium because of
regulations. But even if the regulations control prices, it does not imply that prices
are fixed at certain levels continuously which is what the model says. Further,
there is no discussion of how the rationing is carried out and in almost all cases
the data used are macro-data and the implications of aggregation are ignored.
The main application of the methodology discussed in this chapter is to
regulated markets and centrally planned economies, where there are price and

quantity regulations. In Section 6 we discussed the case of controlled prices and
showed how the analysis can be applied to credit markets with interest rate
ceilings (or equivalently, labor markets with minimum-wage laws). The interest
rate ceiling problem has been analyzed in Maddala and Trost (1982). The
minimum wage problem has been analyzed in Meyer and Wise (1983a, b). An
analysis of price supports’ is in Maddala and Shonkwiler (1984). The case of
centrally planned economies has been analyzed by Charemza and Quandt (1982).
Another major criticism of the disequilibrium models appears in two papers by
Richard (1980) and Hendry and Spanos (1980). These criticisms are also elaborated
in the comments by Hendry and Richard on the survey paper by Quandt (1982).
Hendry and Spanos point out that the “minimum condition” was actually
discussed by Frisch (1949), but that he suggested a formulation of “market pressures”
that are generated by the inequality between the unobserved latent variables Dₜ
and Sₜ. These pressures were formulated in the price adjustment equations discussed in
Section 6, but we also saw the serious limitations of this equation in the presence of the
“minimum condition”. Hendry and Spanos suggest dropping the “minimum
condition” (which is the main source of all the headaches in estimation), con-
centrating on the “pressures” and dynamic adjustment processes, and modelling
the observables directly. Though there is some merit in their argument, as
mentioned earlier, the main application of the methodology described in this
chapter is to the analysis of regulated markets and planned economies and the
methods suggested by Hendry and Spanos are not applicable to such problems.
Since the Hendry-Spanos paper is discussed in detail in Maddala (1983a, pp.
343-345) we will not repeat the criticism here.
Finally, mention must be made of the criticism of the switching regression
models with endogenous switching (of which the disequilibrium and selection
models are particular cases) by Poirier and Rudd (1981). These authors argue that
there has been substantial confusion in the econometrics literature over switching
regression models with endogenous switching and that this confusion can cause
serious interpretation problems when the model is employed in empirical work.
Fortunately, however, the arguments presented by these authors are incorrect.
Since their paper has been discussed in detail in Maddala (1983a, pp. 283-287)
we will not repeat the criticism here.
The literature on self-selection contains interesting empirical applications in the
areas of labor supply, unions and wages, education and self-selection, program
evaluation, measuring discrimination and so on. However, the literature on
disequilibrium models lacks interesting empirical applications. Part of the prob-
lem here is that often not much thought is given to the substantive question of
what the sources of disequilibrium are, and also there are few micro data sets to
which the methods have been applied. Almost all applications [Avery (1982),
Maddala and Trost (1982), Meyer and Wise (1983a,b) are perhaps some excep-
tions] are based on aggregate time-series data and there is not enough discussion

of problems of aggregation. The Fair and Jaffee example on the housing market
as well as the different models of “credit rationing” are all based on aggregate
data and there is much to be desired in the detailed specification of these models.
Perhaps the most interesting applications of the disequilibrium models are in the
areas of regulated industries. After all, it is regulation that produces disequi-
librium in these markets. Estimation of some disequilibrium models with micro-
data sets for regulated industries and estimation of the effects of regulation would
make the disequilibrium literature more intellectually appealing than it has been.
There are also some issues that need to be investigated regarding the appropriate
formulation of the demand and supply functions under disequilibrium. The
expectation of disequilibrium can itself be expected to change the demand and
supply functions. Thus, one needs to incorporate expectations into the modelling
of disequilibrium.
The literature on self-selection, by contrast to the disequilibrium literature, has
several interesting empirical applications. However, even here a lot of work
remains to be done. The case of selectivity being based on several criteria rather
than one has been mentioned in Section 10. Here one needs a clear distinction to
be made between joint decision and sequential decision models. Another problem
is that of correcting for selectivity bias when the explanatory variables are
measured with error. Almost all the usual problems in the single equation
regression model need to be analyzed in the presence of the selection (self-selec-
tion) problem.

References

Abowd, A. M., J. M. Abowd and M. R. Killingsworth (1983) “Race, Spanish Origin and Earnings
Differentials Among Men: The Demise of Two Stylized Facts”. Discussion Paper #83-11, Econom-
ics Research Center/NORC, University of Chicago.
Abowd, J. M. and H. S. Farber (1982) “Job Queues and Union Status of Workers”, Industrial and
Labor Relations Review, 35(4), 354-361.
Amemiya, T. (1973) “Regression Analysis When the Dependent Variable is Truncated Normal”,
Econometrica, 41(6), 997-1016.
Amemiya, T. (1974a) “A Note on a Fair and Jaffee Model”, Econometrica, 42(4), 759-762.
Amemiya, T. (1974b) “Multivariate Regression and Simultaneous Equations Models When the
Dependent Variables are Truncated Normal”, Econometrica, 42(6), 999-1012.
Amemiya, T. (1977) “The Solvability of a Two-Market Disequilibrium Model”. Working Paper 82,
IMSSS, Stanford University, August 1977.
Amemiya, T. and G. Sen (1977) “The Consistency of the Maximum Likelihood Estimator in a
Disequilibrium Model”, Technical Report No. 238, IMSSS, Stanford University.
Avery, R. B. (1982) “Estimation of Credit Constraints by Switching Regressions”, in: C. Manski and
D. McFadden, eds., Structural Analysis of Discrete Data: With Econometric Applications. MIT Press.
Barnow, B. S., G. G. Cain and A. S. Goldberger (1980) “Issues in the Analysis of Selectivity Bias”, in:
E. W. Stromsdorder and G. Farkas, eds., Evaluation Studies - Review Annual, 5, 43-59.
Batchelor, R. A. (1977) “A Variable-Parameter Model of Exporting Behaviour”, Review of Economic
Studies, 44(l), 43-58.
Bergstrom, A. R. and C. R. Wymer (1976) “A Model for Disequilibrium Neoclassical Growth and its

Application to the United Kingdom”, in: A. R. Bergstrom, ed., Statistical Inference in Continuous
Time Economic Models. Amsterdam, North-Holland Publishing Co.
Berndt, E. R., B. H. Hall, R. E. Hall and J. A. Hausman (1974) “Estimation and Inference in
Non-Linear Structural Models”, Annals of Economic and Social Measurement, 3(4), 653-665.
Bjorklund, A. and R. Moffitt (1983) “The Estimation of Wage Gains and Welfare Gains From
Self-Selection Models”. Manuscript, Institute for Research on Poverty, University of Wisconsin.
Bloom, D. E. and M. R. Killingsworth (1981) “Correcting for Selection Bias in Truncated Samples:
Theory, With an Application to the Analysis of Sex Salary Differentials in Academe”. Paper
presented at the Econometric Society Meetings, Washington, D.C., Dec. 1981.
Bloom, D. E., B. J. Preiss and J. Trussell (1981) “Mortgage Lending Discrimination and the Decision
to Apply: A Methodological Note”. Manuscript, Carnegie Mellon University.
Bock, R. D. and L. V. Jones (1968) The Measurement and Prediction of Judgement and Choice. San
Francisco: Holden-Day.
Bouissou, M. B., J. J. Laffont and Q. H. Vuong (1983) “Disequilibrium Econometrics on Micro Data”.
Paper presented at the European Meeting of the Econometric Society, Pisa, Italy.
Bowden, R. J. (1978a) “Specification, Estimation and Inference for Models of Markets in Disequi-
librium”, International Economic Review, 19(3), 711-726.
Bowden, R. J. (1978b) The Econometrics of Disequilibrium. Amsterdam: North Holland Publishing Co.
Catsiapis, G. and C. Robinson (1982) “Sample Selection Bias With Multiple Selection Rules”, Journal
of Econometrics, 18, 351-368.
Chambers, R. G., R. E. Just, L. J. Moffitt and A. Schmitz (1978) “International Markets in
Disequilibrium: A Case Study of Beef”. Berkeley: California Agricultural Experiment Station.
Chanda, A. K. (1984) Econometrics of Disequilibrium and Rational Expectations. Ph.D. Dissertation,
University of Florida.
Charemza, W. and R. E. Quandt (1982) “Models and Estimation of Disequilibrium for Centrally
Planned Economies”, Review of Economic Studies, 49, 109-116.
Cosslett, S. R. (1984) “Distribution-Free Estimation of a Model with Sample Selectivity”. Discussion
Paper, Center for Econometrics and Decision Sciences, University of Florida.
Cosslett, S. R. and Long-Fei Lee (1983) “Serial Correlation in Latent Discrete Variable Models”.
Discussion Paper, University of Florida, forthcoming in Journal of Econometrics.
Dagenais, M. G. (1980) “Specification and Estimation of a Dynamic Disequilibrium Model”,
Economics Letters, 5, 323-328.
Danzon, P. M. and L. A. Lillard (1982) The Resolution of Medical Malpractice Claims: Modelling the
Bargaining Process. Report #R-2792-ICJ, California: Rand Corporation.
Davidson, J. (1978) “FIML Estimation of Models with Several Regimes”. Manuscript, London School
of Economics, October 1978.
Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977) “Maximum Likelihood from Incomplete Data
via the EM Algorithm”, Journal of the Royal Statistical Society, Series B, 39, 1-38 with discussion.
Dubin, J. and D. McFadden (1984) “An Econometric Analysis of Residential Electrical Appliance
Holdings and Consumption”, Econometrica, 52(2), 345-362.
Eaton, J. and R. E. Quandt (1983) “A Model of Rationing and Labor Supply: Theory and
Estimation”, Econometrica, 50, 221-234.
Fair, R. C. and D. M. Jaffee (1972) “Methods of Estimation for Markets in Disequilibrium”,
Econometrica, 40, 497-514.
Fair, R. C. and H. H. Kelejian (1974) “Methods of Estimation for Markets in Disequilibrium: A
Further Study”, Econometrica, 42(1), 177-190.
Fishe, R. P. H., R. P. Trost and P. Lurie (1981) “Labor Force Earnings and College Choice of Young
Women: An Examination of Selectivity Bias and Comparative Advantage”, Economics of Education
Review, 1, 169-191.
Frisch, R. (1949) “Prolegomena to a Pressure Analysis of Economic Phenomena”, Metroeconomica, 1,
135-160.
Gersovitz, M. (1980) “Classification Probabilities for the Disequilibrium Model”, Journal of Econo-
metrics, 41, 239-246.
Goldberger, A. S. (1972) “Selection Bias in Evaluating Treatment Effects: Some Formal Illustrations”.
Discussion Paper # 123-72, Institute for Research on Poverty, University of Wisconsin.
Goldberger, A. S. (1981) “Linear Regression After Selection”, Journal of Econometrics, 15, 357-66.

Goldberger, A. S. (1980) “Abnormal Selection Bias”. Workshop Series #8006, SSRI, University of
Wisconsin.
Goldfeld, S. M., D. M. Jaffee and R. E. Quandt (1980) “A Model of FHLBB Advances: Rationing or
Market Clearing?”, Review of Economics and Statistics, 62, 339-347.
Goldfeld, S. M. and R. E. Quandt (1975) “Estimation in a Disequilibrium Model and the Value of
Information”, Journal of Econometrics, 3(3), 325-348.
Goldfeld, S. M. and R. E. Quandt (1978) “Some Properties of the Simple Disequilibrium Model with
Covariance”, Economics Letters, 1, 343-346.
Goldfeld, S. M. and R. E. Quandt (1983) “The Econometrics of Rationing Models”. Paper presented
at the European Meetings of the Econometric Society, Pisa, Italy.
Gourieroux, C., J. J. Laffont and A. Monfort (1980a) “Disequilibrium Econometrics in Simultaneous
Equations Systems”, Econometrica, 48(1), 75-96.
Gourieroux, C., J. J. Laffont and A. Monfort (1980b) “Coherency Conditions in Simultaneous Linear
Equations Models with Endogenous Switching Regimes”, Econometrica, 48(3), 675-695.
Gourieroux, C. and A. Monfort (1980) “Estimation Methods for Markets with Controlled Prices”.
Working Paper # 8012, INSEE, Paris, October 1980.
Green, J. and J. J. Laffont (1981) “Disequilibrium Dynamics with Inventories and Anticipatory Price
Setting”, European Economic Review, 16(1), 199-223.
Griliches, Z., B. H. Hall and J. A. Hausman (1978) “Missing Data and Self-Selection in Large
Panels”, Annales de L'INSEE, 30-31, The Econometrics of Panel Data, 137-176.
Gronau, R. (1974) “Wage Comparisons: A Selectivity Bias”, Journal of Political Economy, 82(6),
1119-1143.
Ham, J. C. (1982) “Estimation of a Labor Supply Model with Censoring Due to Unemployment and
Underemployment”, Review of Economic Studies, 49, 335-354.
Hartley, M. J. (1977) “On the Estimation of a General Switching Regression Model via Maximum
Likelihood Methods”. Discussion Paper #415, Department of Economics, State University of New
York at Buffalo.
Hartley, M. J. (1979) “Comment”, Journal of the American Statistical Association, 73(364), 738-741.
Hartley, M. J. and P. Mallela (1977) “The Asymptotic Properties of a Maximum Likelihood Estimator
for a Model of Markets in Disequilibrium”, Econometrica, 45(5), 1205-1220.
Hausman, J. A. (1978) “Specification Tests in Econometrics”, Econometrica, 46(6), 1251-1272.
Hay, J. (1980) “Selectivity Bias in a Simultaneous Logit-OLS Model: Physician Specialty Choice and
Specialty Income”. Manuscript, University of Connecticut Health Center.
Heckman, J. J. (1974) “Shadow Prices, Market Wages and Labor Supply”, Econometrica, 42(4),
679-694.
Heckman, J. J. (1976a) “Simultaneous Equations Models with Continuous and Discrete Endogenous
Variables and Structural Shifts”, in: Goldfeld and Quandt, eds., Studies in Nonlinear Estimation.
Cambridge: Ballinger Publishing.
Heckman, J. J. (1976b) “The Common Structure of Statistical Models of Truncation, Sample
Selection, and Limited Dependent Variables, and a Simple Estimator for Such Models”, Annals of
Economic and Social Measurement, 5(4), 475-492.
Heckman, J. J. (1978) “Dummy Endogenous Variables in a Simultaneous Equations System”,
Econometrica, 46(6), 931-959.
Heckman, J. J. (1979) “Sample Selection Bias as a Specification Error”, Econometrica, 47(l), 153-161.
Heckman, J. and B. Singer (1984) “A Method for Minimizing the Impact of Distributional Assump-
tions in Econometric Models for Duration Data”, Econometrica, 52(2), 271-320.
Hendry, D. F. and A. Spanos (1980) “Disequilibrium and Latent Variables”. Manuscript, London
School of Economics.
Hildebrand, F. B. (1956) Introduction to Numerical Analysis. New York: McGraw-Hill.
Hwang, H. (1980) “A Test of a Disequilibrium Model”, Journal of Econometrics, 12, 319-333.
Ito, T. (1980) “Methods of Estimation for Multi-Market Disequilibrium Models”, Econometrica,
48(1), 97-125.
Ito, T. and K. Ueda (1981) “Tests of the Equilibrium Hypothesis in Disequilibrium Econometrics: An
International Comparison of Credit Rationing”, International Economic Review, 22(3), 691-708.
Johnson, N. L. and S. Kotz (1972) Distributions in Statistics: Continuous Multivariate Distributions.
Wiley: New York.

Johnson, P. D. and J. C. Taylor (1977) “Modelling Monetary Disequilibrium”, in: M. G. Porter, ed.,
The Australian Monetary System in the 1970’s. Australia: Monash University.
Kenny, L. W., L. F. Lee, G. S. Maddala and R. P. Trost (1979) “Returns to College Education: An
Investigation of Self-Selection Bias Based on the Project Talent Data”, International Economic
Review, 20(3), 751-765.
Kiefer, N. (1978) “Discrete Parameter Variation: Efficient Estimation of a Switching Regression
Model”, Econometrica, 46(2), 427-434.
Kiefer, N. (1979) “On the Value of Sample Separation Information”, Econometrica, 47(4), 997-1003.
Kiefer, N. (1980a) “A Note on Regime Classification in Disequilibrium Models”, Review of Economic
Studies, 47(1), 637-639.
Kiefer, N. (1980b) “A Note on Switching Regression and Logistic Discrimination”, Econometrica, 48,
637-639.
King, M. (1980) “An Econometric Model of Tenure Choice and Housing as a Joint Decision”,
Journal of Public Economics, 14(2), 137-159.
Kooiman, T. and T. Kloek (1979) “Aggregation and Micro-Markets in Disequilibrium: Theory and
Application to the Dutch Labor Market: 1948-1975”. Working Paper, Rotterdam: Econometric
Institute, April 1979.
Laffont, J. J. (1983) “Fix-Price Models: A Survey of Recent Empirical Work”. Discussion Paper
# 8305, University of Toulouse.
Laffont, J. J. and R. Garcia (1977) “Disequilibrium Econometrics for Business Loans”, Econometrica,
45(5), 1187-1204.
Laffont, J. J. and A. Monfort (1979) “Disequilibrium Econometrics in Dynamic Models”, Journal of
Econometrics, 11, 353-361.
Lee, L. F. (1976) Estimation of Limited Dependent Variable Models by Two-Stage Methods. Ph.D.
Dissertation, University of Rochester.
Lee, L. F. (1978a) “Unionism and Wage Rates: A Simultaneous Equations Model with Qualitative
and Limited Dependent Variables”, International Economic Review, 19(2), 415-433.
Lee, L. F. (1978b) “Comparative Advantage in Individuals and Self-Selection”. Manuscript, Univer-
sity of Minnesota.
Lee, L. F. (1979) “Identification and Estimation in Binary Choice Models with Limited (Censored)
Dependent Variables”, Econometrica, 47(4), 977-996.
Lee, L. F. (1982a) “Some Approaches to the Correction of Selectivity Bias”, Review of Economic
Studies, 49, 355-372.
Lee, L. F. (1982b) “Test for Normality in the Econometric Disequilibrium Markets Model”, Journal
of Econometrics, 19, 109-123.
Lee, L. F. (1983a) “Generalized Econometric Models with Selectivity”, Econometrica, 51(2), 507-512.
Lee, L. F. (1983b) “Regime Classification in the Disequilibrium Market Models”. Discussion Paper
# 93, Center for Econometrics and Decision Sciences, University of Florida.
Lee, L. F. (1984) “Sequential Discrete Choice Econometric Models With Selectivity”. Discussion
Paper, University of Minnesota.
Lee, L. F. and R. P. Trost (1978) “Estimation of Some Limited Dependent Variable Models with
Application to Housing Demand”, Journal of Econometrics, 8, 357-382.
Lee, L. F., G. S. Maddala and R. P. Trost (1980) “Asymptotic Covariance Matrices of Two-Stage
Probit and Two-Stage Tobit Methods for Simultaneous Equations Models with Selectivity”,
Econometrica, 48(2), 491-503.
Lee, L. F. and G. S. Maddala (1983a) “The Common Structure of Tests for Selectivity Bias, Serial
Correlation, Heteroscedasticity and Normality in the Tobit Model”. Manuscript, Center for
Econometrics and Decision Sciences, University of Florida. Forthcoming in the International
Economic Review.
Lee, L. F. and G. S. Maddala (1983b) “Sequential Selection Rules and Selectivity in Discrete Choice
Econometric Models”. Manuscript, Center for Econometrics and Decision Sciences, University of
Florida.
Lee, L. F. and R. H. Porter (1984) “Switching Regression Models with Imperfect Sample Separation
Information: With an Application on Cartel Stability”, Econometrica, 52(2), 391-418.
Lewis, H. G. (1974) “Comments on Selectivity Biases in Wage Comparisons”, Journal of Political
Economy, 82(6), 1145-1155.

Mackinnon, J. G. (1978) “Modelling a Market Which is Sometimes in Disequilibrium”. Discussion
Paper #287, Canada: Queens University, April 1978.
Mackinnon, J. F. and N. D. Olewiler (1980) “Disequilibrium Estimation of the Demand for Copper”,
The Bell Journal of Economics, 11, 197-211.
Maddala, G. S. (1977a) “Self-Selectivity Problems in Econometric Models”, in: P. R. Krishnaiah, ed.,
Applications of Statistics. North-Holland Publishing, 351-366.
Maddala, G. S. (1977b) “Identification and Estimation Problems in Limited Dependent Variable
Models”, in: A. S. Blinder and P. Friedman, eds., Natural Resources, Uncertainty and General
Equilibrium Systems: Essays in Memory of Rafael Lusky. New York: Academic Press, 219-239.
Maddala, G. S. (1983a) Limited Dependent and Qualitative Variables in Econometrics. New York:
Cambridge University Press.
Maddala, G. S. (1983b) “Methods of Estimation for Models of Markets with Bounded Price
Variation”, International Economic Review, 24(2), 361-378.
Maddala, G. S. (1984) “Estimation of the Disequilibrium Model with Noisy Indicators”. Manuscript,
University of Florida.
Maddala, G. S. and F. D. Nelson (1974) “Maximum Likelihood Methods for Models of Markets in
Disequilibrium”, Econometrica, 42(6), 1013-1030.
Maddala, G. S. and L. F. Lee (1976) “Recursive Models with Qualitative Endogenous Variables”,
Annals of Economic and Social Measurement, 5(4), 525-545.
Maddala, G. S. and F. D. Nelson (1975) “Switching Regression Models with Exogenous and
Endogenous Switching”, Proceedings of the Business and Economic Statistics Section, American
Statistical Association, 423-426.
Maddala, G. S. and J. S. Shonkwiler (1984) “Estimation of a Disequilibrium Model Under Rational
Expectations and Price Supports: The Case of Corn in the US”. Manuscript, University of Florida.
Maddala, G. S. and R. P. Trost (1981) “Alternative Formulations of the Nerlove-Press Models”,
Journal of Econometrics, 16, 35-49.
Maddala, G. S. and R. P. Trost (1982) “On Measuring Discrimination in Loan Markets”, Housing
Finance Review, 1(1), 245-268.
Malinvaud, E. (1977) The Theory of Unemployment Reconsidered. Oxford: Blackwell.
Malinvaud, E. (1982) “An Econometric Model for Macro-Disequilibrium Analysis”, in:
M. Hazewinkel and A. H. G. Rinnoy Kan, eds., Current Developments in the Interface: Economics,
Econometrics, Mathematics. D. Reidel Publishing Co., 239-258.
Melino, A. (1982) “Testing for Sample Selection Bias”, Review of Economic Studies, 49(1), 151-153.
Meyer, R. H. and D. A. Wise (1983a) “The Effect of Minimum Wage on the Employment and
Earnings of Youth”, Journal of Labor Economics, 1(1), 66-100.
Meyer, R. H. and D. A. Wise (1983b) “Discontinuous Distributions and Missing Persons: The
Minimum Wage and Unemployed Youth”, Econometrica, 51(6), 1677-1698.
Monroe, Margaret A. (1981) A Disequilibrium Econometric Analysis of Interest Rate Futures Markets.
Ph.D. Dissertation, University of Florida.
Muellbauer, J. and D. Winter (1980) “Unemployment, Employment and Exports in British Manufac-
turing: A Non-clearing Markets Approach”, European Economic Review, 13(2), 383-409.
Muthen, B. and K. G. Joreskog (1981) “Selectivity Problems in Quasi-experimental Studies”. Paper
presented at the Conference on “Experimental Research in Social Sciences”. University of Florida,
January 1981.
Nelson, F. D. (1975) Estimation of Economic Relationships with Censored, Truncated and Limited
Dependent Variables. Ph.D. Dissertation, University of Rochester.
Nelson, F. D. (1977) “Censored Regression Models with Unobserved Stochastic Censoring
Thresholds”, Journal of Econometrics, 6, 309-327.
Olsen, R. J. (1980) “A Least Squares Correction for Selectivity Bias”, Econometrica, 48(6), 1815-1820.
Olsen, R. J. (1982) “Distribution Tests for Selectivity Bias and a More Robust Likelihood Estimator”,
International Economic Review, 23(1), 223-240.
Orsi, R. (1982) “On the Dynamic Specification of Disequilibrium Econometrics: An Analysis of
Italian Male and Female Labor Markets”. CORE Discussion Paper # 8228, Louvain, Belgium.
Poirier, D. J. (1980) “Partial Observability in Bivariate Probit Models”, Journal of Econometrics, 12,
209-217.
Poirier, D. J. and P. A. Rudd (1981) “On the Appropriateness of Endogenous Switching”, Journal of

Econometrics, 16(2), 249-256.


Portes, R. D. (1978) “Effective Demand and Spillovers in Empirical Two-Market Disequilibrium
Models”. Discussion Paper #595, Harvard Institute of Economic Research, November 1977.
Portes, R. D. and D. Winter (1978) “The Demand for Money and for Consumption Goods in
Centrally Planned Economies”, The Review of Economics and Statistics, 60(l), 8-18.
Portes, R. D. and D. Winter (1980) “Disequilibrium Estimates for Consumption Goods Markets in
Centrally Planned Economies”, Review of Economic Studies, 47(1), 137-159.
Quandt, R. E. (1978) “Maximum Likelihood Estimation of Disequilibrium Models”, Pioneering
Economics, Italy: Padova.
Quandt, R. E. (1978) “Tests of the Equilibrium vs. Disequilibrium Hypothesis”, International
Economic Review, 19(2), 435-452.
Quandt, R. E. and J. B. Ramsey (1978) “Estimating Mixtures of Normal Distributions and Switching
Regressions”, with discussion, Journal of the American Statistical Association, 73, 730-752.
Quandt, R. E. (1981) “Autocorrelated Errors in Simple Disequilibrium Models”, Economics Letters, 7,
55-61.
Quandt, R. E. (1982) “Econometric Disequilibrium Models”. With comments by D. F. Hendry,
A. Monfort and J. F. Richard, Econometric Reviews, 1(1), 1-63.
Quandt, R. E. (1983) “Bibliography of Quantity Rationing and Disequilibrium Models”. Princeton
University, Dec. 1983, updated every 3-6 months.
Quandt, R. E. (1984) “Switching Between Equilibrium and Disequilibrium”, Review of Economics and
Statistics, forthcoming.
Richard, J. F. (1980a) “Models with Several Regime Changes and Changes in Exogeneity”, Review of
Economic Studies, 47(1), 1-20.
Richard, J. F. (1980b) “C-Type Distributions and Disequilibrium Models”. Paper presented in the
Toulouse Conference on “Economics and Econometrics of Disequilibrium”.
Rogers, A. J. (1983) “Generalized Lagrange Multiplier Tests for Problems of One-Sided Alternatives”.
Manuscript, Princeton University.
Rosen, S. and M. I. Nadiri (1974) “A Disequilibrium Model of Demand for Factors of Production”,
American Economic Review, papers and proceedings, 64(2), 264-270.
Rosen, H. and R. E. Quandt (1978) “Estimation of a Disequilibrium Aggregate Labor Market”,
Review of Economics and Statistics, 60, 371-379.
Roy, A. D. (1951) “Some Thoughts on the Distribution of Earnings”, Oxford Economic Papers, 3,
135-146.
Samelson, H., R. M. Thrall and O. Wesler (1958) “A Partition Theorem for Euclidean n-Space”,
Proceedings of the American Mathematical Society, 9, 805-807.
Schmidt, P. (1982) “An Improved Version of the Quandt-Ramsey MGF Estimator for Mixtures of
Normal Distributions and Switching Regressions”, Econometrica, 50(2), 501-516.
Sealy, C. W., Jr. (1979) “Credit Rationing in the Commercial Loan Market: Estimates of a Structural
Model Under Conditions of Disequilibrium”, Journal of Finance, 34(2), 689-702.
Sneessens, H. (1981) Theory and Estimation of Macroeconomic Rationing Models. New York:
Springer-Verlag, 1981.
Sneessens, H. (1983) “A Macro-Economic Rationing Model of the Belgian Economy”, European
Economic Review, 20, 193-215.
Tishler, A. and I. Zang (1979) “A Switching Regression Model Using Inequality Conditions”,
Journal of Econometrics, 11, 259-274.
Trost, R. P. (1981) “Interpretation of Error Covariances With Non-Random Data: An Empirical
Illustration of Returns to College Education”, Atlantic Economic Journal, 9(3), 85-90.
Trost, R. P. and L. F. Lee (1984) “Technical Training and Earnings: A Polychotomous Choice Model
with Selectivity”, The Review of Economics and Statistics, 66(1), 151-156.
Tunali, I. (1983) “A Common Structure for Models of Double Selection”. Report #8304, Social
Systems Research Institute, University of Wisconsin.
Upcher, M. R. (1980) Theory and Applications of Disequilibrium Econometrics. Ph.D. Dissertation,
Canberra: Australian National University.
Venti, S. F. and D. A. Wise (1982) “Test Scores, Educational Opportunities, and Individual Choice”,
Journal of Public Economics, 18, 35-63.
Waldman, D. M. (1981) “An Economic Interpretation of Parameter Constraints in a Simultaneous

Equations Model with Limited Dependent Variables”, International Economic Review, 22(3),
731-739.
Wales, T. J. and A. D. Woodland (1980) “Sample Selectivity and the Estimation of Labor Supply
Functions”, International Economic Review, 21, 437-468.
Wallis, K. F. (1980) “Econometric Implications of the Rational Expectations Hypothesis”,
Econometrica, 48(1), 49-72.
Willis, R. J. and S. Rosen (1979) “Education and Self-Selection”, Journal of Political Economy, Part 2,
87(5), 507-526.
Wu, De-Min (1973) “Alternative Tests of Independence Between Stochastic Regressors and Dis-
turbances”, Econometrica, 41(3), 733-750.
Chapter 29

ECONOMETRIC ANALYSIS OF LONGITUDINAL DATA*

JAMES J. HECKMAN

University of Chicago and NORC

BURTON SINGER

Yale University and NORC

Contents

0. Introduction 1690
1. Single spell models 1691
1.1. Statistical preliminaries 1691
1.2. Examples of duration models produced by economic theory 1695
1.3. Conventional reduced form models 1704
1.4. Identification and estimation strategies 1710
1.5. Sampling plans and initial conditions problems 1727
1.6. New issues that arise in formulating and estimating choice theoretic duration models 1744
2. Multiple spell models 1748
2.1. A unified framework 1748
2.2. General duration models for the analysis of event history data 1753
3. Summary 1759
References 1761

*This research was supported by NSF Grant SES-8107963 and NIH Grant NIH-1-R01-HD16846-01
to the Economics Research Center, NORC, 6030 S. Ellis, Chicago, Illinois 60637. We thank Takeshi
Amemiya and Aaron Han for helpful comments.


0. Introduction

In analyzing discrete choices made over time, two arguments favor the use of
continuous time models. (1) In most economic models there is no natural time
unit within which agents make their decisions and take their actions. Often it is
more natural and analytically convenient to characterize the agent’s decision and
action processes as operating in continuous time. (2) Even if there were natural
decision periods, there is no reason to suspect that they correspond to the annual
or quarterly data that are typically available to empirical analysts, or that the
discrete periods are synchronized across individuals. Inference about an underly-
ing stochastic process that is based on interval or point sampled data may be very
misleading especially if one falsely assumes that the process being investigated
operates in discrete time. Conventional discrete choice models such as logit and
probit when defined for one time interval are of a different functional form when
applied to another time unit, if they are defined at all. Continuous time models
are invariant to the time unit used to record the available data. A common set of
parameters can be used to generate probabilities of events occurring in intervals
of different length. For these reasons the use of continuous time duration models
is becoming widespread in economics.
This paper considers the formulation and estimation of continuous time
econometric duration models. Research on this topic is relatively new and much
of the available literature has borrowed freely and often uncritically from
reliability theory and biostatistics. As a result, most papers in econometric
duration analysis present statistical models only loosely motivated by economic
theory and assume access to experimental data that are ideal in comparison to the
data actually available to social scientists.
This paper is in two parts. Part I - which is by far the largest - considers single
spell duration models which are the building blocks for the more elaborate
multiple spell models considered in Part II. Many issues that arise in multiple
spell models are more easily discussed in a single spell setting and in fact many of
the available duration data sets only record single spells.
Our discussion of single spell duration models is in six sections. In Section 1.1
we present some useful definitions and statistical concepts. In Section 1.2 we
present a short catalogue of continuous time duration models that arise from
choice theoretic economic models. In Section 1.3 we consider conventional
methods for introducing observed and unobserved variables into reduced form
versions of duration models. We discuss the sensitivity of estimates obtained from
single spell duration models to inherently ad hoc methods for controlling for
observed and unobserved variables.

The extreme sensitivity to ad hoc parameterizations of duration models that is
exhibited in this section leads us to ask the question “what features of duration
models can be identified nonparametrically?” Our answer is the topic of Section
1.4. There we present nonparametric procedures for assessing qualitative features
of conditional duration distributions in the presence of observed and unobserved
variables. We discuss nonparametric identification criteria for a class of duration
models (proportional hazard models) and discuss tradeoffs among criteria re-
quired to secure nonparametric identification. We also discuss these questions for
a more general class of duration models. The final topic considered in this section
is nonparametric estimation of duration models.
In Section 1.5 we discuss the problem of initial conditions. There are few
duration data sets for which the beginning of the sample observation period
coincides with the start of a spell. More commonly, the available data for single
spell models consist of interrupted spells or portions of spells observed after the
sample observation period begins. The problem raised by this sort of sampling
frame and its solution are well known for duration models with no unobservables
in time homogeneous environments. We present these solutions and then discuss
this problem for the more difficult but empirically relevant case of models with
unobservable variables in time inhomogeneous environments. In Section 1.6 we
return to the structural duration models discussed in Section 1.2 and consider new
econometric issues that arise in attempting to recover explicit economic parame-
ters.
Part II on multiple spells is divided into two sections. The first (Section 2.1)
presents a general framework which contains many interesting multiple spell
models as a special case. The second (Section 2.2) presents a multiple spell event
history model and considers conditions under which access to multiple spell data
aids in securing model identification. This paper concludes with a brief summary.

1. Single spell models

1.1. Statistical preliminaries

There are now a variety of excellent textbooks on duration analysis that discuss
the formulation of duration models so that a lengthy introduction to standard
survival models is unnecessary.¹ In an effort to make this chapter self-contained,
however, this section sets out the essential ideas that we need from this literature
in the rest of the chapter.

¹See especially Kalbfleisch and Prentice (1980), Lawless (1982) and Cox and Oakes (1984).

A nonnegative random variable T with absolutely continuous distribution
function G(t) and density g(t) may be uniquely characterized by its hazard
function. The hazard for T is the conditional density of T given T ≥ t ≥ 0, i.e.

h(t) = f(t | T ≥ t) = g(t)/(1 − G(t)).    (1.1.1)

Knowledge of G determines h.
Conversely, knowledge of h determines G because by integration of (1.1.1)

∫₀ᵗ h(u) du = −ln(1 − G(u))|₀ᵗ + c,

G(t) = 1 − exp(−∫₀ᵗ h(u) du);    (1.1.2)

c = 0 since G(0) = 0. The density of T is

g(t) = h(t) exp(−∫₀ᵗ h(u) du).    (1.1.3)

For the rest of this paper we assume that the distribution of T is absolutely
continuous, and we associate T with spell duration.² In this case it is also natural
to interpret h(t) as an exit rate or escape rate from the state because it is the limit
(as Δ → 0) of the probability that a spell terminates in interval (t, t + Δ) given
that the spell has lasted t periods, i.e.

h(t) = lim_{Δ→0} [(G(t + Δ) − G(t))/Δ] / (1 − G(t)) = g(t)/(1 − G(t)).    (1.1.4)

Equation (1.1.4) constitutes an alternative definition of the hazard that links the
models discussed in Part I to the more general multistate models discussed in Part
II.
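These identities are easy to verify numerically. The following sketch does so for a Weibull hazard; the functional form and parameter values are illustrative choices, not taken from the text:

```python
# Check of (1.1.2), (1.1.3) and (1.1.5) for the Weibull hazard
# h(t) = a*b*t**(b-1), whose c.d.f. is G(t) = 1 - exp(-a*t**b).
import numpy as np

a, b = 0.5, 1.5
t = np.linspace(1e-6, 5.0, 2001)
h = a * b * t**(b - 1)                                    # hazard
H = np.concatenate([[0.0],                                # integrated hazard
                    np.cumsum((h[1:] + h[:-1]) / 2 * np.diff(t))])
S = np.exp(-H)                                            # survivor, (1.1.5)
G = 1.0 - S                                               # c.d.f., (1.1.2)
g = h * S                                                 # density, (1.1.3)

assert np.allclose(G, 1.0 - np.exp(-a * t**b), atol=1e-3)
assert np.allclose(g, h * np.exp(-a * t**b), atol=1e-3)
```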

²For a treatment of duration distributions that are not absolutely continuous see, e.g. Lawless
(1982).

The survivor function is the probability that a duration exceeds t. Thus

S(t) = P(T > t) = 1 − G(t) = exp(−∫₀ᵗ h(u) du).    (1.1.5)

In terms of the survivor function we may write the density g(t) as g(t) = h(t)S(t).

Note that there is no requirement that

lim_{t→∞} ∫₀ᵗ h(u) du → ∞,    (1.1.6)

or equivalently that

S(∞) = 0.

If (1.1.6) is satisfied, the duration distribution is termed nondefective. Otherwise,
it is termed defective.
The technical language here creates the possibility of confusion. There is
nothing wrong with defective distributions. In fact they emerge naturally from
many optimizing models. For example, Jovanovic (1979) derives an infinite
horizon worker-firm matching model with a defective job tenure distribution.
Condition (1.1.6) is violated in his model so S(∞) > 0 because some proportion of
workers find that their current match is so successful that they never wish to leave
their jobs.
Duration dependence is said to exist if

dh(t)/dt ≠ 0.

The only density with no duration dependence almost everywhere is the exponen-
tial distribution. For in this case h(t) = h, a constant, and hence from (1.1.2), T is
an exponential random variable. Obviously if G is exponential, h(t) = h.
If dh(t)/dt > 0 at t = t₀, there is said to be positive duration dependence at t₀.
If dh(t)/dt < 0 at t = t₀, there is said to be negative duration dependence at t₀. In
job search models of unemployment, positive duration dependence arises in the
case of a “declining reservation wage” (see, e.g. Lippman and McCall, 1976). In
this case the exit rate from unemployment is monotonically increasing in t. In job
turnover models negative duration dependence (at least asymptotically) is associ-
ated with worker-firm matching models (see, e.g. Jovanovic, 1979).
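For instance (an illustration not in the text), a Weibull hazard displays duration dependence of constant sign:

```latex
h(t) = \alpha\lambda t^{\alpha - 1}, \qquad
\frac{dh(t)}{dt} = \alpha(\alpha - 1)\lambda t^{\alpha - 2},
```

so there is positive duration dependence everywhere if α > 1, negative duration dependence if α < 1, and the exponential (no duration dependence) case if α = 1.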

For many econometric duration models it is natural to analyze conditional
duration distributions where the conditioning is with respect to observed (x(t))
and unobserved (θ(t)) variables. Indeed, by analogy with conventional regression
analysis, much of the attention in many duration analyses focuses on the effect of
regressors (x(t)) on durations.

We define the conditional hazard as

h(t | x(t), θ(t)) = lim_{Δ→0} Pr(t < T < t + Δ | T ≥ t, x(t), θ(t))/Δ.    (1.1.7)

The dating on regressor vector x(t) is an innocuous convention. x(t) may include
functions of the entire past or future or the entire paths of some variables, e.g.
functions of the underlying time dated regressor variables z_i(u).

We make the following assumptions about these conditioning variables.

(A.1) θ(t) is distributed independently of x(t′) for all t, t′. The distribution of
θ is μ(θ). The distribution of x is D(x).

(A.2) There are no functional restrictions connecting the conditional distribu-
tion of T given θ and x and the marginal distributions of θ and x.

Speaking very loosely, x is assumed to be “weakly exogenous” with respect to the
duration process. More precisely x is ancillary for T.³
By analogy with the definitions presented for the raw duration models, we may
integrate (1.1.7) to produce the conditional duration distribution

G(t | x, θ) = 1 − exp(−∫₀ᵗ h(u | x(u), θ(u)) du),    (1.1.8)

the conditional survivor function

S(t | x, θ) = P(T > t | x, θ) = exp(−∫₀ᵗ h(u | x(u), θ(u)) du),    (1.1.9)

³See, e.g. Cox and Hinkley (1974) for a discussion of ancillarity.



and the conditional density

g(t | x, θ) = h(t | x(t), θ(t)) exp(−∫₀ᵗ h(u | x(u), θ(u)) du).

One specification of conditional hazard (1.1.7) that has received much attention
in the literature is the proportional hazard specification [see Cox (1972)]

h(t | x(t), θ(t)) = ψ(t)φ(x(t))η(θ(t)),

which postulates that the log of the conditional hazard is linear in functions of t,
x and θ and that

ψ(t) ≥ 0, φ(x(t)) ≥ 0 and η(θ(t)) ≥ 0 for all t,

where η is a monotonic continuous increasing function of θ(t).
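A familiar parametric special case, given here only as an illustration (it is not part of the text's development), combines a Weibull baseline with a log-linear regressor function and a positive scalar unobservable:

```latex
% psi(t) = alpha t^{alpha-1}, phi(x(t)) = exp(x(t)beta), eta(theta) = theta.
h\bigl(t \mid x(t), \theta\bigr)
  = \alpha t^{\alpha - 1}\,\exp\bigl(x(t)\beta\bigr)\,\theta,
  \qquad \alpha > 0,\ \theta > 0 .
```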

1.2. Examples of duration models produced by economic theory

In this section of the paper, we present three examples of duration models
produced by economic choice models. These examples are (A) a continuous time
labor supply model, (B) a continuous time search unemployment model, and (C) a
continuous time consumer purchase model that generalizes conventional discrete
choice models in a straightforward way.
Examples A and B contain most of the essential ideas. We demonstrate how a
continuous time formulation avoids the need to specify arbitrary decision periods
as is required in conventional discrete time models (see, e.g. Heckman, 1981a).
We also discuss a certain identification problem that arises in single spell models
that is “solved” by assumption in conventional discrete time formulations.

1.2.1. Example A: A dynamic model of labor force participation

The one period version of this model is the workhorse of labor economics. Consumers at age a are assumed to possess a concave twice differentiable one period utility function defined over goods (X(a)) and leisure (L(a)). Denote this utility function by U(X(a), L(a)). Define leisure hours so that 0 ≤ L(a) ≤ 1. The consumer is free to choose his hours of work at parametric wage W(a). There are no fixed costs of work, and for convenience taxes are ignored. At each age the consumer receives unearned income Y(a). There is no saving or borrowing. Decisions are assumed to be made under perfect certainty.
The consumer works at age a if the marginal rate of substitution between goods and leisure evaluated at the no work position (also known as the nonmarket wage)

M(Y(a)) = U₂(Y(a),1)/U₁(Y(a),1), (1.2.1)

is less than the market wage W(a). For if this is so, his utility is higher in the market than at home. The subscripts on U denote partial derivatives with respect to the appropriate argument. It is convenient to define an index function I(a) written as

I(a) = W(a) − M(Y(a)).

If I(a) ≥ 0, the consumer works at age a, and we record this event by setting d(a) = 1. If I(a) < 0, d(a) = 0.
In a discrete time model, a spell of employment begins at a₁ and ends at a₂ + 1 provided that I(a₁ − 1) < 0, I(a₁ + j) ≥ 0, j = 0,...,a₂ − a₁, and I(a₂ + 1) < 0. Reversing the direction of the inequalities generates a characterization of a nonwork spell that begins at a₁ and ends at a₂.
To complete the econometric specification, an error term ε(a) is introduced. Under an assumption of perfect certainty, the error term arises from variables observed by the consumer but not observed by the econometrician. In the current context, ε(a) can be interpreted as a shifter of household technology and tastes. For each person successive values of ε(a) may be correlated, but it is assumed that ε(a) is independent of Y(a) and W(a). We define the index function inclusive of ε(a) as

I*(a) = W(a) − M(Y(a)) + ε(a). (1.2.2)

If I*(a) ≥ 0, the consumer works at age a.
The distribution of I*(a) induces a distribution on employment spells. To demonstrate this point in a simple way we assume that (i) the ε(a) are serially independent, (ii) the environment is time homogeneous so W(a) and Y(a) remain constant over time for the individual, (iii) the probability that a new value of ε is received in an interval is P, and (iv) the arrival times of new values of ε(a) are independent of W, Y, and other arrival times. We denote the c.d.f. of ε by F.
By virtue of the perfect certainty assumption, the individual knows when new values of ε will arrive and what they will be. The econometrician, however, does not have this information at his disposal. He never directly observes ε(a) and only knows that a new value of nonmarket time has arrived if the consumer actually changes state.
The probability that an employed person does not leave the employed state is

1 − PF(ψ), (1.2.3)

where ψ = M(Y) − W. The probability of receiving j new values of ε in interval t_e is

P_j = [t_e!/(j!(t_e − j)!)] P^j (1 − P)^(t_e − j).

The probability that a spell is longer than t_e is the sum over j of the products of the probability of receiving j innovations in t_e (P_j) and the probability that the person does not leave the employed state on each of the j occasions ((1 − F(ψ))^j). Thus

P(T_e > t_e) = Σ_{j=0}^{t_e} [t_e!/(j!(t_e − j)!)] P^j (1 − P)^(t_e − j) (1 − F(ψ))^j
             = (1 − PF(ψ))^(t_e). (1.2.4)

Thus the probability that an employment spell starting at calendar time t₁ = 0 terminates at t_e is

P(T_e = t_e) = P(T_e > t_e − 1) − P(T_e > t_e)
             = (1 − PF(ψ))^(t_e − 1)(PF(ψ)). (1.2.5)

By similar reasoning it can be shown that the probability that a nonemployment spell lasts t_n periods is

P(T_n = t_n) = [1 − P(1 − F(ψ))]^(t_n − 1) P(1 − F(ψ)). (1.2.6)

In conventional models of discrete choice over time [see, e.g. Heckman (1981a)] P is implicitly set to one. Thus in these models it is assumed that the consumer receives a new draw of ε each period. The model just presented generalizes these models to allow for the possibility that ε may remain constant over several periods of time. Such a generalization creates an identification problem because from a single employment or nonemployment spell it is only possible to estimate PF(ψ) or P(1 − F(ψ)) respectively. This implies that any single spell model of the duration of employment or nonemployment is consistent with the model of eq. (1.2.2) with P = 1 or with another model in which (1.2.2) does not characterize behavior but in which the economic variables determine the arrival time of new values of ε. However, access to both employment and nonemployment spells solves this problem because P = PF(ψ) + P(1 − F(ψ)), and hence F(ψ) and P are separately identified.
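The identification argument can be mimicked on simulated data. In the sketch below (ours; the values of P and F(ψ) are arbitrary, and F(ψ) is treated as a single number rather than an estimated c.d.f.), geometric spells are drawn from (1.2.5) and (1.2.6), each spell type reveals only its own per-period exit probability, and P and F(ψ) are recovered from the two together:

    import numpy as np

    rng = np.random.default_rng(0)
    P, F_psi = 0.6, 0.3                    # arrival probability and F(psi): "truth"

    # Geometric spell lengths as in (1.2.5) and (1.2.6)
    t_e = rng.geometric(P * F_psi, size=100_000)         # employment spells
    t_n = rng.geometric(P * (1 - F_psi), size=100_000)   # nonemployment spells

    p_e = 1 / t_e.mean()    # estimates P*F(psi): all a single spell type reveals
    p_n = 1 / t_n.mean()    # estimates P*(1 - F(psi))

    P_hat = p_e + p_n       # P = P*F(psi) + P*(1 - F(psi))
    print(P_hat, p_e / P_hat)   # approx. 0.6 and 0.3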
The preceding model assumes that there are natural periods of time within which innovations in ε may occur. For certain organized markets there may be well-defined trading intervals, but for the consumer's problem considered here no such natural time periods exist. This suggests the following continuous time reformulation.
In place of the Bernoulli assumption for the arrival of fresh values of ε, suppose instead that a Poisson process governs the arrival of shocks. As is well known [see, e.g. Feller (1970)] the Poisson distribution is the limit of a Bernoulli trial process in which the probability of success on each trial, P_n, goes to zero as the number of trials n grows in such a way that lim_{n→∞} nP_n = λ ≠ 0. Thus in the reformulated continuous time model it is assumed that an infinitely large number of very low probability Bernoulli trials occur within a specified interval of time.
For a time homogeneous environment the probability of receiving j offers in time period t_e is

P(j|t_e) = exp(−λt_e)(λt_e)^j / j!. (1.2.7)

Thus for the continuous time model the probability that a person who begins employment at a = a₁ will stay in the employed state at least t_e periods is, by reasoning analogous to that used to derive (1.2.4),

Pr(T_e > t_e) = Σ_{j=0}^{∞} exp(−λt_e)((λt_e)^j / j!)(1 − F(ψ))^j
             = exp(−λF(ψ)t_e), (1.2.8)

so the density of spell lengths is

g(t_e) = λF(ψ) exp(−λF(ψ)t_e).

A more direct way to derive (1.2.8) notes that from the definition of a Poisson process, the probability of receiving a new value of ε in interval (a, a + Δ) is

p = λΔ + o(Δ),

where lim_{Δ→0} o(Δ)/Δ = 0, and the probability of exiting the employment state conditional on an arrival of ε is F(ψ). Hence the exit rate or hazard rate from the employment state is

h_e = lim_{Δ→0} [λΔF(ψ) + o(Δ)]/Δ = λF(ψ).

Using (1.1.4) relating the hazard function and the survivor function we conclude that

Pr(T_e > t_e) = exp(−∫_0^{t_e} h_e(u) du) = exp(−λF(ψ)t_e).

By similar reasoning, the probability that a person starting in the nonemployed state will stay on in that state for at least duration t_n is

Pr(T_n > t_n) = exp(−λ(1 − F(ψ))t_n).

Analogous to the identification result already presented for the discrete time model, it is impossible using single spell employment or nonemployment data to separate λ from F(ψ) or 1 − F(ψ) respectively. However, access to data on both employment and nonemployment spells makes it possible to identify both λ and F(ψ).
The assumption of time homogeneity of the environment is made only to simplify the argument. Suppose that nonmarket time arrives via a nonhomogeneous Poisson process so that the probability of receiving one nonmarket draw in interval (a, a + Δ) is

p(a) = λ(a)Δ + o(Δ). (1.2.9)

Assuming that W and Y remain constant, the hazard rate for exit from employment at time period a for a spell that begins at a₁ is

h_e(a|a₁) = λ(a)F(ψ), (1.2.10)

so that the survivor function for the spell is

P(T_e > t_e|a₁) = exp(−F(ψ)∫_{a₁}^{a₁+t_e} λ(u) du).⁴ (1.2.11)

⁴As first noted by Lundberg (1903), it is possible to transform this model to a time homogeneous Poisson model if we redefine duration time to be

τ*(t_e, a₁) = ∫_{a₁}^{a₁+t_e} λ(u) du.

Allowing for time inhomogeneity in Y(a) and W(a) raises a messy, but not especially deep problem. It is possible that the values of these variables would change at a point in time in between the arrival of ε values and that such changes would result in a reversal of the sign of I*(a) so that the consumer would cease working at points in time when ε did not change. Conditioning on the paths of Y(a) and W(a) formally eliminates the problem.
By similar reasoning,

P(T_n > t_n|a₁) = exp(−(1 − F(ψ))∫_{a₁}^{a₁+t_n} λ(u) du).
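Survivor functions such as (1.2.11) are straightforward to evaluate for any arrival-rate path by one-dimensional numerical integration. A minimal sketch (ours; the declining rate λ(a) and the value of F(ψ) are assumptions made only for illustration):

    import numpy as np
    from scipy.integrate import quad

    lam = lambda a: 1.2 * np.exp(-0.1 * a)    # nonhomogeneous Poisson arrival rate
    F_psi = 0.4                               # prob. a new draw ends the spell

    def survivor(t_e, a1):
        """P(T_e > t_e | a1) = exp(-F(psi) * int_{a1}^{a1+t_e} lambda(u)du)."""
        integral, _ = quad(lam, a1, a1 + t_e)
        return np.exp(-F_psi * integral)

    print(survivor(2.0, a1=5.0))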

1.2.2. Example B: A one state model of search unemployment

This model is well exposited in Lippman and McCall (1976). The environment is assumed to be time homogeneous. Agents are assumed to be income maximizers. If an instantaneous cost c is incurred, job offers arrive from a Poisson process with parameter λ independent of the level of c (c > 0). The probability of receiving a wage offer in time interval Δt is λΔt + o(Δt).⁵ Thus the probability of two or more job offers in interval Δt is negligible.⁶
Successive wage offers are independent realizations from a known absolutely continuous wage distribution F(w) with finite mean that is assumed to be common to all agents. Once refused, wage offers are no longer available. Jobs last forever, there is no on the job search, and workers live forever. The instantaneous rate of interest is r (> 0).
V is the value of search. Using Bellman's optimality principle for dynamic programming [see, e.g. Ross (1970)], V may be decomposed into three components plus a negligible component [of order o(Δt)]:

V = −cΔt/(1 + rΔt) + [(1 − λΔt)/(1 + rΔt)]V + [λΔt/(1 + rΔt)]E max[w/r, V] + o(Δt), for V > 0,
  = 0 otherwise. (1.2.12)

The first term on the right of (1.2.12) is the discounted cost of search in interval Δt. The second term is the probability of not receiving an offer (1 − λΔt) times the discounted value of search at the end of interval Δt. The third term is the probability of receiving a wage offer, (λΔt), times the discounted value of the expectation [computed with respect to F(w)] of the maximum of the two options confronting the agent who receives a wage offer: to take the offer (with present value w/r) or to continue searching (with present value V). Note that eq. (1.2.12) is defined only for V > 0. If V = 0, we may define the agent as out of the labor force [see Lippman and McCall (1976)]. As a consequence of the time homogeneity of the environment, once out the agent is always out. Sufficient to ensure the existence of an optimal reservation wage policy in this model is E(|W|) < ∞ [Robbins (1970)].
Collecting terms in (1.2.12) and passing to the limit, we reach the familiar formula [Lippman and McCall (1976)]

c + rV = (λ/r)∫_{rV}^{∞} (w − rV) dF(w) for V > 0, (1.2.13)

where rV is the reservation wage, which is implicitly determined from (1.2.13). For any offered wage w ≥ rV, the agent accepts the offer. The probability that an offer is unacceptable is F(rV).

⁵o(Δt) is defined as a term such that lim_{Δt→0} o(Δt)/Δt = 0.
⁶For one justification of the Poisson wage arrival assumption, see, e.g. Burdett and Mortensen (1978).
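The reservation wage solving (1.2.13) generally has no closed form, but it is the root of a monotone one-dimensional equation. A minimal sketch (ours; the lognormal offer distribution and the values of c, r and λ are illustrative assumptions); given the computed rV, the acceptance probability 1 − F(rV) delivers the exit rate (1.2.15) directly:

    import numpy as np
    from scipy import stats
    from scipy.integrate import quad
    from scipy.optimize import brentq

    c, r, lam = 0.5, 0.05, 1.0
    F = stats.lognorm(s=0.5, scale=1.0)          # wage offer distribution

    def excess(w_star):
        """c + rV - (lam/r) * int_{rV}^inf (w - rV) dF(w), with rV = w_star."""
        integral, _ = quad(lambda w: (w - w_star) * F.pdf(w), w_star, np.inf)
        return c + w_star - (lam / r) * integral

    w_star = brentq(excess, 1e-6, 50.0)          # reservation wage rV
    print(w_star, F.sf(w_star))                  # rV and acceptance prob. 1 - F(rV)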
To calculate the probability that an unemployment spell T_u exceeds t_u, we may proceed as in the preceding discussion of labor supply models and note that the probability of receiving an offer in time interval (a, a + Δ) is

p = λΔ + o(Δ), (1.2.14)

and further note that the probability that an offer is accepted is (1 − F(rV)) so

h_u = λ(1 − F(rV)), (1.2.15)

and

P(T_u > t_u) = exp(−λ(1 − F(rV))t_u). (1.2.16)

For discussion of the economic content of this model, see, e.g. Lippman and McCall (1976) or Flinn and Heckman (1982a).
Accepted wages are truncated random variables with rV as the lower point of truncation. The density of accepted wages is

g(w|w ≥ rV) = f(w)/(1 − F(rV)), w ≥ rV. (1.2.17)

Thus the one spell search model has the same statistical structure for accepted wages as other models of self selection in labor economics [Lewis (1974), Heckman (1974), and the references in Amemiya (1984)].
From the assumption that wages are distributed independently of wage arrival times, the joint density of duration times t_u and accepted wages (w) is the product of the density of each random variable,

m(t_u, w) = {λ(1 − F(rV)) exp(−λ(1 − F(rV))t_u)} f(w)/(1 − F(rV))
          = (λ exp(−λ(1 − F(rV))t_u)) f(w), w ≥ rV. (1.2.18)

Time homogeneity of the environment is a strong assumption to invoke, especially for the analysis of data on unemployment spells. Even if the external environment were time homogeneous, finiteness of life induces time inhomogeneity in the decision process of the agent. We present a model for a time inhomogeneous environment.
For simplicity we assume that a reservation wage property characterizes the optimal policy, noting that for general time inhomogeneous models it need not.⁷ We denote the reservation wage at time τ as rV(τ).
The probability that an individual receives a wage offer in time period (τ, τ + Δ) is

p(τ) = λ(τ)Δ + o(Δ). (1.2.19)

The probability that it is accepted is (1 − F(rV(τ))). Thus the hazard rate at time τ for exit from an unemployment spell is

h(τ) = λ(τ)(1 − F(rV(τ))), (1.2.20)

so that the probability that a spell that began at τ₁ lasts at least t_u is

P(T_u > t_u|τ₁) = exp(−∫_{τ₁}^{τ₁+t_u} λ(z)(1 − F(rV(z))) dz). (1.2.21)

The associated density is

f(t_u|τ₁) = λ(τ₁ + t_u)(1 − F(rV(τ₁ + t_u))) exp(−∫_{τ₁}^{τ₁+t_u} λ(z)(1 − F(rV(z))) dz).⁸

⁷For time inhomogeneity induced solely by the finiteness of life, the reservation wage property characterizes an optimal policy (see, e.g. De Groot, 1970).
⁸Note that in this model it is trivial to introduce time varying forcing variables because by assumption the agent cannot accept a job in between arrivals of job offers. Compare with the discussion in footnote 4.
1.2.3. Example C: A dynamic McFadden model

As in the marketing literature (see, e.g. Hauser and Wisniewski, 1982a, b, and its nonstationary extension in Singer, 1982), we imagine consumer choice as a sequential affair. An individual goes to a grocery store at randomly selected times. Let h(τ) be the hazard function associated with the density generating the probability of the event that the consumer goes to the store at time τ. We assume that the probability of two or more visits to the store in interval Δ is o(Δ). Conditional on arrival at the store, he may purchase one of J items. Denote the purchase probability by P_j(τ). Choices made at different times are assumed to be independent, and they are also independent of arrival times. Then the probability that the consumer purchases good j at time τ is

h(τ)P_j(τ), (1.2.22)

so that the probability that the next purchase is item j at a time t = τ₁ + t_j or later is

P(t, j|τ₁) = exp(−∫_{τ₁}^{τ₁+t_j} h(u)P_j(u) du). (1.2.23)

The P_j may be specified using one of the many discrete choice models discussed in Amemiya's survey (1981). For the McFadden random utility model with Weibull errors (1974), the P_j are multinomial logit. For the Domencich-McFadden (1975) random coefficients preference model with normal coefficients the P_j are specified by multivariate probit.
In the dynamic McFadden model few new issues of estimation and specification arise that have not already been discussed above or in Amemiya's survey article (1984). For concreteness, we consider the most elementary version of this model.
Following McFadden (1974), the utility associated with each of J possible choices at time τ is written as

u_j(τ) = v(s, x_j(τ)) + ε(s, x_j(τ)), j = 1,...,J,

where s denotes vectors of measured attributes of individuals, x_j(τ) represents vectors of attributes of choices, v is a nonstochastic function and the ε(s, x_j(τ)) are i.i.d. Weibull, i.e.

Pr(ε(s, x_j(τ)) ≤ c) = exp(−exp(−c)).
Then as demonstrated by McFadden (p. 110)

P_j(s, x(τ)) = exp(v(s, x_j(τ))) / Σ_{l=1}^{J} exp(v(s, x_l(τ))).

Adopting a linear specification for v we write

v(s, x_j(τ)) = x_j′(τ)β(s),

so

P_j(s, x(τ)) = exp(x_j′(τ)β(s)) / Σ_{l=1}^{J} exp(x_l′(τ)β(s)).

In a model without unobservable variables, the methods required to estimate this model are conventional. The parameter β(s) can be estimated by standard logit analysis using data on purchases made at purchase times. The estimation of the times between visits to stores can be conducted using the conventional duration models described in Section 1.3. More general forms of Markovian dependence across successive purchases can be incorporated (see Singer, 1982, for further details).
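The probabilistic structure of the model is transparent in simulation: store visits arrive from a Poisson process, and each visit is thinned into a purchase of item j with the logit probability P_j. The sketch below (ours; the attribute matrix, β and the constant visit rate are illustrative assumptions) checks the constant-rate special case of (1.2.23), under which the waiting time to the next purchase of item j is exponential with rate h·P_j:

    import numpy as np

    rng = np.random.default_rng(1)
    h = 2.0                                      # store visit rate (constant here)
    X = np.array([[1.0, 0.2], [0.5, 0.8], [0.0, 1.5]])  # attributes of J = 3 items
    beta = np.array([0.7, -0.3])

    v = X @ beta
    P = np.exp(v) / np.exp(v).sum()              # multinomial logit probabilities

    # Monte Carlo check of (1.2.23) for item j = 0
    waits = []
    for _ in range(20_000):
        t = 0.0
        while True:
            t += rng.exponential(1 / h)          # Poisson visit times
            if rng.random() < P[0]:              # purchase item 0 on this visit?
                break
        waits.append(t)

    print(np.mean(np.array(waits) > 1.0), np.exp(-h * P[0] * 1.0))  # should agree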

1.3. Conventional reduced form models

The most direct approach to estimating the economic duration models presented in Section 1.2 is to specify functional forms for the economic parameters and their dependence on observed and unobserved variables. This approach is both costly and controversial. It is controversial because economic theory usually does not produce these functional forms; at best it specifies potential lists of regressor variables, some portion of which may be unobserved in any data set. Moreover in many areas of research, such as the study of unemployment durations, there is no widespread agreement in the research community about the correct theory. The approach is costly because it requires nonlinear optimization of criterion functions that often can be determined only as implicit functions. We discuss this point further in Section 1.6.
Because of these considerations and because of a widespread belief that it is useful to get a "feel for the data" before more elaborate statistical models are fit, reduced form approaches are common in the duration analysis literature. Such an approach to the data is inherently ad hoc because the true functional form of the duration model is unknown. At issue is the robustness of the qualitative inferences obtained from these models with regard to alternative ad hoc specifications. In this section of the paper we review conventional approaches and reveal their lack of robustness. Section 1.4 presents our response to this lack of robustness.
The problem of nonrobustness arises solely because regressors and unobservables are introduced into the duration model. If unobservables were ignored and the available data were sufficiently rich, it would be possible to estimate a duration model by a nonparametric Kaplan-Meier procedure [see, e.g. Lawless (1982) or Kalbfleisch and Prentice (1980)]. Such a general nonparametric approach is unlikely to prove successful in econometrics because (a) the available samples are small, especially after cross classification by regressor variables, and (b) empirical modesty leads most analysts to admit that some determinants of any duration decision may be omitted from the data sets at their disposal.
Failure to control for unobserved components leads to a well known bias toward negative duration dependence. This is the content of the following proposition:

Proposition 1
Uncontrolled unobservables bias estimated hazards towards negative duration dependence. □
The proof is a straightforward application of the Cauchy-Schwartz theorem. Let h(t|x,θ) be the hazard conditional on x, θ and h(t|x) the hazard conditional only on x. These hazards are associated respectively with conditional distributions G(t|x,θ) and G(t|x).
From the definition,

h(t|x) = ∫h(t|x,θ)(1 − G(t|x,θ)) dμ(θ) / ∫(1 − G(t|x,θ)) dμ(θ).

Thus⁹

∂h(t|x)/∂t = ∫(1 − G(t|x,θ))(∂h(t|x,θ)/∂t) dμ(θ) / ∫(1 − G(t|x,θ)) dμ(θ)
           − [∫h²(t|x,θ)(1 − G(t|x,θ)) dμ(θ) / ∫(1 − G(t|x,θ)) dμ(θ)
           − (∫h(t|x,θ)(1 − G(t|x,θ)) dμ(θ) / ∫(1 − G(t|x,θ)) dμ(θ))²]. (1.3.1)

The second term on the right-hand side is always nonpositive as a consequence of the Cauchy-Schwartz theorem. □

⁹We use the fact that ∂(1 − G(t|x,θ))/∂t = −h(t|x,θ)(1 − G(t|x,θ)).
Intuitively, more mobility prone persons are the first to leave the population
leaving the less mobile behind and hence creating the illusion of stronger negative
duration dependence than actually exists.
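The content of Proposition 1 can be seen in a two-type example (our sketch; the exit rates and population shares are arbitrary). Each type has a constant hazard, so there is no structural duration dependence at all, yet the aggregate hazard computed from the pooled survivor function falls with duration:

    import numpy as np

    p, lam1, lam2 = 0.5, 2.0, 0.5       # half "movers", half "stayers"
    t = np.linspace(0.0, 5.0, 200)

    surv = p * np.exp(-lam1 * t) + (1 - p) * np.exp(-lam2 * t)
    dens = p * lam1 * np.exp(-lam1 * t) + (1 - p) * lam2 * np.exp(-lam2 * t)
    h = dens / surv                     # aggregate hazard h(t|x), as in (1.3.1)

    print(h[0], h[-1])                  # starts at 1.25, falls toward 0.5
    print(np.all(np.diff(h) < 0))       # True: spurious negative dependence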
To ignore unobservables is to bias estimated hazard functions in a known
direction. Ignoring observables has the same effect. So in response to the limited
size of our samples and in recognition of the myriad of plausible explanatory
variables that often do not appear in the available data, it is unwise to ignore
observed or unobserved variables. The problem is how to control for these
variables.
There are many possible conditional hazard functions [see, e.g. Lawless (1982)]. One class of proportional hazard models that nests many previous models as a special case and therefore might be termed "flexible" is the Box-Cox conditional hazard

h(t|x,θ) = exp(x′(t)β + ((t^(λ₁) − 1)/λ₁)γ₁ + ((t^(λ₂) − 1)/λ₂)γ₂ + θ(t)), (1.3.2)

where λ₁ ≠ λ₂, x(t) is a 1 × k vector of regressors and β is a k × 1 vector of parameters, and θ is assumed to be scalar. (See Flinn and Heckman, 1982b.) Exponentiating ensures that the hazard is nonnegative as is required for a conditional density.
Setting γ₂ = 0 and λ₁ = 0 produces a Weibull hazard; setting γ₂ = 0 and λ₁ = 1 produces a Gompertz hazard. Setting γ₁ = γ₂ = 0 produces an exponential model. Conditions under which this model is identified for the case γ₂ = 0 are presented in Section 1.4.
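For concreteness, (1.3.2) and its nested special cases can be coded directly (a sketch; the λ → 0 limit is handled with ln t, and all parameter values in the demonstration are ours):

    import numpy as np

    def box_cox(t, lam):
        """(t**lam - 1)/lam, with the limit ln(t) as lam -> 0."""
        return np.log(t) if lam == 0 else (t ** lam - 1) / lam

    def hazard(t, x, theta, beta, g1, g2, lam1, lam2):
        """Eq. (1.3.2): exp(x(t)'beta + g1*BC(t,lam1) + g2*BC(t,lam2) + theta)."""
        return np.exp(x @ beta + g1 * box_cox(t, lam1)
                      + g2 * box_cox(t, lam2) + theta)

    t, x, theta, beta = 2.0, np.array([1.0, 0.3]), 0.1, np.array([0.2, -0.5])
    print(hazard(t, x, theta, beta, g1=-0.4, g2=0.0, lam1=0.0, lam2=1.0))  # Weibull
    print(hazard(t, x, theta, beta, g1=-0.4, g2=0.0, lam1=1.0, lam2=2.0))  # Gompertz
    print(hazard(t, x, theta, beta, g1=0.0, g2=0.0, lam1=1.0, lam2=2.0))   # exponential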
The conventional approach to single spell econometric duration analysis assumes a specific functional form, known up to a finite set of parameters, for the conditional hazard and a specific functional form, known up to a finite set of parameters, for the distribution of unobservables. θ(t) is assumed to be a time invariant scalar random variable θ. An implicit assumption in most of this literature is that the origin date of the sample is also the start date of the spells being analyzed, so that initial conditions or left censoring problems are ignored. We question this assumption in Section 1.5 below.
The conventional approach does, however, allow for right censored spells assuming independent censoring mechanisms. We consider two such schemes.
Let V(t) be the probability that a spell is censored at duration t or later. If

V(t) = 1, t ≤ L,
V(t) = 0, t > L, (1.3.3)
there is censoring at fixed duration L. This type of censoring is common in many economic data sets. More generally, for continuous censoring times let v(t) be the density associated with V(t). In an independent censoring scheme, the censoring time is assumed to be independent of the survival time, and the censoring distribution is assumed to be functionally independent of the survival distribution, and does not depend on θ.
Let d = 1 if a spell is not right censored and d = 0 if it is. Let t denote an observed spell length. Then the joint frequency of (t, d) conditional on x for the case of absolutely continuous distribution V(t) is

f(t, d|x) = {v(t)^(1−d) V(t)^d} ∫_θ [h(t|x(t),θ)]^d S(t|x(t),θ) dμ(θ). (1.3.4)

By the assumption of functional independence between V(t) and G(t|x), we may ignore the V and v functions in estimating μ(θ) and h(t|x(t),θ) via maximum likelihood.
For the Dirac censoring distribution (1.3.3), the density of observed durations is

f(t, d|x) = [∫_θ h(t|x(t),θ)S(t|x(t),θ) dμ(θ)]^d [∫_θ S(L|x(L),θ) dμ(θ)]^(1−d). (1.3.5)

It is apparent from (1.3.4) or (1.3.5) that without further restrictions, a variety of h(t|x,θ) and μ(θ) pairs will be consistent with any f(t, d|x).¹⁰ Conditions under which a unique pair is determined are presented in Section 1.4. It is also apparent from (1.3.4) or (1.3.5) that given the functional form of either h(t|x,θ) or μ(θ) and the data (f(t, d|x)) it is possible, at least in principle, to appeal to the theory of integral equations and solve for either μ(θ) or h(t|x,θ). Current practice thus overparameterizes the duration model by specifying the functional form of both h(t|x,θ) and μ(θ). In Section 1.4, we discuss methods for estimating μ(θ) nonparametrically given that the functional form of h(t|x,θ) is specified up to a finite number of parameters. In the rest of this section we demonstrate consequences of incorrectly specifying either h(t|x,θ) or μ(θ).
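To fix ideas on how (1.3.4)-(1.3.5) are used, the sketch below (ours) evaluates the log likelihood contribution of a single spell under a Weibull conditional hazard and a two-point mixing distribution with fixed censoring at L; every functional form and parameter value here is an illustrative assumption:

    import numpy as np

    thetas, probs = np.array([0.5, 2.0]), np.array([0.6, 0.4])  # discrete mixing dist.
    alpha, beta, L = 0.8, 0.3, 5.0                              # Weibull; censor at L

    def loglik(t, d, x):
        """Log of (1.3.5) for one spell: d = 1 uncensored, d = 0 censored at L."""
        scale = np.exp(x * beta)
        S = np.exp(-t ** alpha * scale * thetas)        # S(t|x,theta_i)
        h = alpha * t ** (alpha - 1) * scale * thetas   # h(t|x,theta_i)
        contrib = (h ** d) * S                          # h^d * S, as in (1.3.4)
        return np.log(np.dot(probs, contrib))

    print(loglik(t=2.3, d=1, x=1.0))   # completed spell
    print(loglik(t=L, d=0, x=1.0))     # spell censored at L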
First consider the impact of incorrect treatment of time varying regressor variables. Many conventional econometric duration analyses are cavalier about such variables because introducing them into the analysis raises computational problems. Except for special time paths of variables, the term

∫_0^t h(u|x(u),θ(u)) du,

which appears in survivor function (1.1.8), does not have a closed form expression. To evaluate it requires numerical integration.

¹⁰Heckman and Singer (1982) present some examples. They are not hard to generate for anyone with access to tables of integral transforms.
To circumvent this difficulty, one of two expedients is often adopted (see, e.g. Lundberg, 1981, Cox and Lewis, 1966):
(i) replacing time trended variables with their within spell average

x̄(t) = (1/t)∫_0^t x(u) du, t > 0;

(ii) using beginning of spell values x(0).
Expedient (i) has the undesirable effect of building spurious dependence between duration time t and the manufactured regressor variable. To see this most clearly, suppose that x is a scalar and x(u) = a + bu. Then clearly

x̄(t) = a + (b/2)t,

and t and x̄(t) are clearly linearly dependent. Expedient (ii) ignores the time inhomogeneity in the environment.¹¹
To illustrate the potential danger from adopting these expedients consider the numbers presented in Table 1. These record Weibull hazards ((1.3.2) with γ₂ = 0 and λ₁ = 0) estimated on data for employment to nonemployment transitions using the CTM program described by Hotz (1983). In these calculations, unobservables are ignored. A job turnover model estimated using expedient (i) indicates weak negative duration dependence (column one, row two) and a strong negative effect of high national unemployment rates on the rate of exiting jobs. The same model estimated using expedient (ii) now indicates (see column two) strong negative duration dependence and a strong positive effect of high national unemployment rates on the rate of exiting employment.

¹¹Moreover in the multistate models with heterogeneity that are presented in Part II of this paper, treating x(0) as exogenous is incorrect because the value of x(0) at the start of the current spell depends on the lengths and outcomes of preceding spells. See the discussion in Section 2.2. This problem is also discussed in Flinn and Heckman (1982b, p. 62).

Table 1
Weibull model - Employment to nonemployment transitions
(absolute value of normal statistics in parentheses)ᵃ

                            Regressors fixed at   Regressors fixed at   Regressors
                            average value over    value as of start     vary
                            the spell             of spell              freely
                            (expedient i)         (expedient ii)

Intercept                      0.971                -3.743               -3.078
                              (1.535)              (12.074)              (8.670)
ln duration                   -0.137                -0.230               -0.341
                              (1.571)               (2.888)              (3.941)
Married with spouse           -1.093                -0.921               -0.610
present? (=1 if yes;          (2.679)               (2.310)              (1.971)
=0 otherwise)
National unemployment         -1.800                 0.569                0.209
rate                          (6.286)               (3.951)              (1.194)

ᵃSource: See Flinn and Heckman (1982b, p. 69).

Allowing regressors to vary freely reveals that the strong negative duration dependence effect remains, but now the effect of the national unemployment rate on exit rates from employment is weak and statistically insignificant.
These empirical results are typical. Introducing time varying variables into single spell duration models is inherently dangerous, and ad hoc methods for doing so can produce wildly misleading results. More basically, separating the effect of time varying variables from duration dependence is only possible if there is "sufficient" independent variation in x(t). To see this, consider hazard (1.3.2) with γ₂ = 0 and x(t) scalar. Taking logs, we reach
with yz = 0 and x(t) scalar. Taking logs, we reach

PI-1
ln(h(tlx,d))=x(t)P+ 7 -Y1+ w.
i 1 i

If

t”’ -1
x(t) = 7’
1

it is obviously impossible to separately estimate /3 and yl. There is a classical


multicollinearity problem. For a single spell model in a time inhomogeneous
environment with general specifications for duration dependence, the analyst is at
the mercy of the data to avoid such linear dependence problems. Failure to
1710 J. J. Heckman and B. Singer

Table 2
Sensitivity to misspecification of the mixing distribution μ(θ)ᵃ,ᵇ

                    Normal          Log normal      Gamma
                    heterogeneity   heterogeneity   heterogeneity

Intercept           -3.92           -13.2            5.90
                    (2.8)            (4.7)           (3.4)
ln duration         -0.066          -0.708          -0.576
                    (0.15)          (0.17)          (0.17)
Age                  0.0036         -0.106          -0.202
                    (0.048)         (0.03)          (0.06)
Education            0.0679         -0.322          -0.981
                    (0.233)         (0.145)         (0.301)
Tenure on           -0.0512          0.00419        -0.034
previous job        (0.0149)        (0.023)         (0.016)
Unemployment        -0.0172          0.0061         -0.003
benefits            (0.0036)        (0.0051)        (0.004)
Married              0.833           0.159          -0.607
                    (0.362)         (0.30)          (0.496)
Unemployment       -26.12           25.8           -17.9
rate                (9.5)           (10.3)          (11.2)
Ed. × age           -0.00272         0.00621         0.0152
                    (0.0044)        (0.034)         (0.0053)

ᵃSample size is 456.
ᵇStandard errors in parentheses.
Source: See Heckman and Singer (1982) for further discussion of these numbers.

Next we consider the consequence of misspecifying the distribution of unobservables. Table 2 records estimates of a Weibull duration model with three different specifications for μ(θ), as indicated in the column headings. The estimates and inferences vary greatly depending on the functional form selected for the mixing distribution. Trussell and Richards (1983) report similar results and exhibit similar sensitivity to the choice of the functional form of the conditional hazard h(t|x,θ) for a fixed μ(θ).

1.4. Identification and estimation strategies

In our experience the rather vivid examples of the sensitivity of estimates of duration models to changes in specification presented in the previous section of the paper are the rule rather than the exception. This experience leads us to address the following three questions in this section of the paper:

(A) What features, if any, of h(t|x,θ) and/or μ(θ) can be identified from the "raw data", i.e. G(t|x)?
(B) Under what conditions are h(t|x,θ) and μ(θ) identified? i.e. how much a priori information has to be imposed on the model before these functions are identified?
(C) What empirical strategies exist for estimating h(t|x,θ) and/or μ(θ) nonparametrically, and what is their performance?

We assume a time homogeneous environment throughout. Little is known about the procedures proposed below for general time inhomogeneous environments.

1.4.1. Nonparametric procedures to assess the structural hazard h(t|x,θ)

This section presents criteria that can be used to test the null hypothesis of no structural duration dependence and that can be used to assess the degree of model complexity that is required to adequately model the duration data at hand. The criteria to be set forth here can be viewed in two ways: as identification theorems and as empirical procedures to use with data.
We consider the following problem: G(t|x) is estimated. We would like to infer properties of G(t|x,θ) without adopting any parametric specification for μ(θ) or h(t|x,θ). We ignore any initial conditions problems. We further assume that x(t) is time invariant.¹²
As a consequence of Proposition 1 proved in the preceding section, if G(t|x) exhibits positive duration dependence for some intervals of t values, h(t|x,θ) must exhibit positive duration dependence for some interval of θ values in those intervals of t. As noted in Section 1.3, this is so because the effect of scalar heterogeneity is to make the observed conditional duration distribution exhibit more negative duration dependence (more precisely, never less negative duration dependence) than does the structural hazard h(t|x,θ).
In order to test whether or not an empirical G(t|x) exhibits positive duration dependence, it is possible to use the total time on test statistic (Barlow et al. 1972, p. 267). This statistic is briefly described here. For each set of x values, constituting a sample of l_x durations, order the first k durations starting with the smallest:

t₁ ≤ t₂ ≤ ... ≤ t_k.

Let D_{i:l_x} = [l_x − (i − 1)](t_i − t_{i−1}), where t₀ = 0. Define

V_k = Σ_{j=1}^{k−1} (Σ_{i=1}^{j} D_{i:l_x}) / (Σ_{i=1}^{k} D_{i:l_x}).

¹²If x(t) is not time invariant, additional identification problems arise. In particular, nonparametric estimation of G(t|x(t)) becomes much more difficult.
V_k is called the cumulative total time on test statistic. If the observations are from a distribution with an increasing hazard rate, V_k tends to be large. Intuitively, if G(t|x) is a distribution that exhibits positive duration dependence, D_{1:l_x} stochastically dominates D_{2:l_x}, D_{2:l_x} stochastically dominates D_{3:l_x}, and so forth. Critical values for testing the null hypothesis of no duration dependence have been presented by Barlow and associates (1972, p. 269). This test can be modified to deal with censored data (Barlow et al. 1972, p. 302). The test is valuable because it enables the econometrician to test for positive duration dependence without imposing any arbitrary parametric structure on the data.
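Computing the statistic is mechanical. A minimal sketch (ours; it implements the normalized spacings and the cumulative statistic exactly as defined above, for an uncensored homogeneous sample, with Weibull data used only to illustrate the two directions):

    import numpy as np

    def cumulative_ttt(durations, k=None):
        """Normalized spacings D_i and cumulative total time on test V_k."""
        t = np.sort(np.asarray(durations, dtype=float))
        n = len(t)
        k = k or n
        d = np.diff(np.concatenate(([0.0], t[:k])))     # t_i - t_{i-1}
        spacings = (n - np.arange(1, k + 1) + 1) * d    # D_i = (n - i + 1)(t_i - t_{i-1})
        cum = np.cumsum(spacings)
        return cum[:-1].sum() / cum[-1]                 # V_k as defined above

    rng = np.random.default_rng(2)
    print(cumulative_ttt(rng.weibull(2.0, 500)))   # increasing hazard: V_k large
    print(cumulative_ttt(rng.weibull(0.7, 500)))   # decreasing hazard: V_k small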
Negative duration dependence is more frequently observed in economic data. That this should be so is obvious from eq. (1.3.1) in the proof of Proposition 1. Even when the structural hazard has a positive derivative ∂h(t|x,θ)/∂t > 0, it often occurs that the second term on the right-hand side of (1.3.1) outweighs the first term. It is widely believed that it is impossible to distinguish structural negative duration dependence from a pure heterogeneity explanation of observed negative duration dependence when the analyst has access only to single spell data. To investigate duration distributions exhibiting negative duration dependence, it is helpful to distinguish two families of distributions.
Let 𝒢₁ = {G: −ln[1 − G(t|x)] is concave in t holding x fixed}. Membership in this class can be determined from the total time on test statistic. If G is log concave, the D_{i:l_x} defined earlier are stochastically increasing in i for fixed l_x and x. Ordering the observations from the largest to the smallest and changing the subscripts appropriately, we can use V_k to test for log concavity.
Next let 𝒢₂ = {G: G(t|x) = ∫(1 − exp(−tφ(x)η(θ))) dμ(θ) for some probability measure μ on [0,∞]}. It is often erroneously suggested that 𝒢₁ = 𝒢₂, i.e. that negative duration dependence by a homogeneous population (G ∈ 𝒢₁) cannot be distinguished from a pure heterogeneity explanation (G ∈ 𝒢₂).
In fact, by virtue of Bernstein’s theorem (see, e.g. Feller, 1971, p. 439-440) if
G E qz it is completely monotone, i.e.

(-l)$(l-G(t,x))rO for n rl and all t 2 0 (1.4.1)

and if G(tlx) satisfies (1.4.1), G(tjx) E 9f2.


Setting n = 3, (4.1) is violated if (- 1)3[ J3/at3](1 - G(tJx)) < 0, i.e. if for some
t = t,

_ a’Wx> +3h(t,x)%+‘(t,x) >o,


[ at2 I ,= f”

[see Heckman and Singer (1982) and Lancaster and Nickel1 (1980)].
Formal verification of (1.4.1) requires uncensored data sufficiently rich to support repeated numerical differentiation. Note that if the data are right censored at t = t*, we may apply (1.4.1) over the interval 0 < t ≤ t* provided that we define

G*(t|x) = G(t|x)/G(t*|x), 0 < t ≤ t*,

and test whether

(−1)^n (∂^n/∂t^n)(1 − G*(t|x)) ≥ 0 for n ≥ 1 and 0 < t < t*. (1.4.2)

Satisfaction of (1.4.2) for all 0 < t < t* is only a necessary condition. It is sufficient only if t* → ∞.
Chamberlain (1980) has produced an alternative test of the necessary conditions that must be satisfied for a distribution to belong to 𝒢₂ that does not require numerical differentiation of empirical distribution functions and that can be applied to censored data.
The key insight in his test is as follows. For G ∈ 𝒢₂, the probability that T > k is the survivor function

S(k|x) = ∫exp(−kφ(x)η(θ)) dμ(θ). (1.4.3)

By a transformation of variables z = exp(−φ(x)η(θ)), we may transform (1.4.3) for fixed x to

S(k|x) = ∫_0^1 z^k dμ*(z), (1.4.4)

i.e. as the k-th moment of a random variable defined on the unit interval.
From the solution to the classical Hausdorff moment problem (see, e.g. Shohat and Tamarkin, 1943, p. 9) it is known that there exists a μ*(z) that satisfies (1.4.4) if

Δ^k S(l|x) ≥ 0, k, l = 0, 1,...,∞, (1.4.5)

where

Δ⁰S(l|x) = S(l|x),
Δ¹S(l|x) = S(l|x) − S(l + 1|x),
Δ²S(l|x) = S(l|x) − 2S(l + 1|x) + S(l + 2|x),

and so on.

Choosing equispaced intervals (0, 1,...,[t*]), where [t*] is the nearest whole integer less than t*, form the S(l|x) functions, l = 0,...,[t*]. Compute the survivor functions so defined and test a subset of the necessary conditions (1.4.5). The estimated survivor functions are asymptotically normally distributed as the number of independent observations becomes large, and thus the asymptotic distribution of the subset of survivor functions is straightforward to compute. Failure of these necessary conditions implies that (1.4.4) and hence (1.4.3) cannot represent the underlying duration distribution. Thus it is possible to reject G ∈ 𝒢₂ if some subset of conditions (1.4.5) is not satisfied. Note that if the x are i.i.d., the same test procedure can be applied to the full sample based on the unconditional survivor functions S(l), l = 1,...,[t*]. In an important paper Robb (1984) extends this analysis by presenting a larger set of necessary conditions and by producing finite sample test statistics for the strengthened conditions.
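The mechanical core of such a test is a table of finite differences of the empirical survivor function evaluated on the integer grid. The sketch below (ours) checks the sign conditions (1.4.5) pointwise on simulated data from a gamma mixture of exponentials, ignoring the sampling-theory part of the procedure:

    import numpy as np

    def survivor_on_grid(durations, grid):
        t = np.asarray(durations)
        return np.array([(t > g).mean() for g in grid])

    def hausdorff_ok(S, k_max):
        """Check Delta^k S(l|x) >= 0 for k = 0,...,k_max, as in (1.4.5)."""
        diff, ok = S.copy(), True
        for _ in range(k_max + 1):
            ok &= bool(np.all(diff >= 0))
            diff = diff[:-1] - diff[1:]     # next-order difference
        return ok

    rng = np.random.default_rng(3)
    theta = rng.gamma(2.0, 1.0, 200_000)        # unobservables
    t = rng.exponential(1.0 / theta)            # T|theta exponential, so G in G_2
    S = survivor_on_grid(t, grid=np.arange(6))
    print(hausdorff_ok(S, k_max=3))             # True, up to sampling error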
It is important to note that (1.4.5) or (1.4.1) are rejection criteria. There are other models that may satisfy (1.4.1). For example

1 − G(t|x) = exp(−φ(x)t^α) (1.4.6)

for α < 1 is completely monotone. By Bernstein's theorem this distribution has a representation in 𝒢₂.

1.4.2. Nonparametric procedures to assess the mixing distribution

In this subsection we consider some procedures that enable us to assess the modality of the mixing distribution. For expositional simplicity we suppress the dependence on x. Let 𝒢₃ = {G: G(t) = ∫_0^t g(u) du, g(t) = ∫g(t|θ)m(θ) dθ for some probability density m(θ) and g(t|θ) = k(t|θ)u(t), where k(t|θ) is sign regular of order 2 (SR₂)}.
Sign regularity means that if t₁ < t₂ and θ₁ < θ₂, then

ε₂ det | k(t₁|θ₁)  k(t₁|θ₂) |
       | k(t₂|θ₁)  k(t₂|θ₂) |  ≥ 0,

where ε₂ is either +1 or −1. If ε₂ = +1, then k(t|θ) is called totally positive of order 2, abbreviated TP₂. From the point of view of inferring properties about the density of a mixing measure from properties of g, models with SR₂ conditional densities allow us to obtain lower bounds on the number of modes in m(θ) from knowledge of the number of modes in g/u. Models for which k(t|θ) = g(t|θ)/u(t) satisfies SR₂ include all members of the exponential family. In fact, for the exponential family, k(t|θ) is TP₂. Thus an assessment of the modality of an estimated density, using, for example, the procedure of Hartigan and Hartigan (1985), is an important guide to specifying the characteristics of the density of θ.
Sign regular (particularly totally positive) kernels include many examples that are central to model specification in economics. In particular, if dν(t) is any measure on [0,+∞) such that ∫_0^∞ exp(tθ) dν(t) < +∞ for θ ∈ Θ (an ordered set), let

β(θ) = [∫_0^∞ exp(tθ) dν(t)]^(−1),

and, in what follows, set dν(t) = u(t)dt and g(t|θ) = β(θ)[exp(tθ)]u(t). Then the density g(t) = ∫β(θ)exp(tθ)u(t)m(θ) dθ governs the observable durations of spells, g(t|θ) is a member of the exponential family, and k(t|θ) = β(θ)exp(tθ) is TP₂ (Karlin, 1968). The essential point in isolating this class of duration densities is that knowledge of the number and character of the modes of g/u implies that the density, m, of the mixing measure must have at least as many modes. In particular, if g/u is unimodal, m cannot be monotonic; it must have at least one mode. More generally, if c is an arbitrary positive level and (g(t)/u(t)) − c changes sign k times as t increases from 0 to +∞, then m(θ) − c must change sign at least k times as θ traverses the parameter set Θ from left to right (Karlin, 1968, p. 21).
The importance of this variation-diminishing character of the transformation ∫k(t|θ)m(θ) dθ for modeling purposes is that if we assess the modality of g using, for example, the method of Hartigan and Hartigan (1985), then because u is given a priori, we know the modality of g/u, which in turn implies restrictions on m in fitting mixing densities to data. In terms of a strategy of fitting finite mixtures, a bimodal g/u suggests fitting a measure with support at, say, five points to the data, but subject to the constraints that p₁ < p₂, p₂ > p₃, p₃ < p₄, and p₄ > p₅, as shown in Figure 1.

Figure 1

Subsequent specification of a mixing density m(θ) to describe the same data could proceed by fitting spline polynomials with knots at θ₁,...,θ₅ to the estimated discrete mixing distribution.

1.4.3. Identifiability

In the preceding section, nonparametric procedures were proposed to assess qualitative features of conditional hazards and mixing distributions. These procedures aid in model selection but provide only qualitative guidelines to aid in model specification. In this section we consider conditions under which the conditional hazards and mixing distributions are identified. Virtually all that is known about this topic is for proportional hazard models (1.1.10) with scalar time invariant heterogeneity (θ(t) = θ) and time invariant regressors (x(t) = x). Thus identification conditions are presented for the model

h(t|x,θ) = ψ(t)φ(x)θ. (1.4.7)

Before stating identifiability conditions, it is useful to define

Z(t) = ∫_0^t ψ(u) du.
Then for the proportional hazard model (1.4.7) we have the following proposition due to Elbers and Ridder (1982).

Proposition 2
If (i) E(θ) = 1, (ii) Z(t) defined on [0,∞) can be written as the integral of a nonnegative integrable function ψ(t) defined on [0,∞), Z(t) = ∫_0^t ψ(u) du, and (iii) the set S, x ∈ S, is an open set in R^k and the function φ is defined on S and is nonnegative, differentiable and nonconstant on S, then Z, φ, and μ(θ) are identified. □
The important point to note about Proposition 2 is that the identification analysis is completely nonparametric. No restrictions are imposed on ψ, φ or μ(θ) except for those stated in the proposition. Condition (iii) requires the existence of at least one continuous valued regressor variable defined on an interval. In the Appendix to their paper, Elbers and Ridder modify their proof to establish identifiability in models with only discrete valued regressor variables. However, the existence of at least one regressor variable in the model is essential in securing identification. Condition (i) requires the existence of a mean for the distribution of θ. This assumption excludes many plausible fat tailed mixing distributions. Defining η by θ = e^η, condition (i) is not satisfied for distributions of η which do not possess a moment generating function. For example, Pareto distributed η with finite mean, Cauchy distributed η and certain members of the Gamma family fail condition (i).¹³
The requirement that E(θ) < ∞ is essential to the Elbers-Ridder proof. If this condition is not satisfied, and if no further restrictions are placed on μ(θ), the duration model is not identified.
As an example of this point, suppose that the true model is Weibull with Z₀(t) = t^(α₀), φ₀(x) = e^(x′β₀) and μ₀ such that E(θ) < ∞. The survivor function for this model is

S₀(t|x) = 1 − G₀(t|x) = ∫_0^∞ [exp(−t^(α₀) exp(x′β₀)θ)] dμ₀(θ).

Define ω = t^(α₀) exp(x′β₀). Then

S₀(t|x) = ∫_0^∞ [exp(−ωθ)] dμ₀(θ) = L₀(ω),

is the Laplace transform (L₀) of random variable θ.
We have already noted in the discussion surrounding (1.4.6) that by virtue of Bernstein's theorem, if 0 < c < 1,

L₁(ω) = L₀(ω^c) = ∫_0^∞ [exp(−ωθ*)] dμ₁(θ*),

is completely monotone and is the Laplace transform of some random variable θ* where E(θ*) = ∞. Thus

1 − G₁(t|x) = L₁(ω′) = L₀(ω) = 1 − G₀(t|x), with ω′ = ω^(1/c),

so a model with

Z₁(t) = t^(α₀/c), φ₁(x) = exp(x′β₀/c) and μ₁ such that E(θ*) = ∞

explains the data as well as the original model (α = α₀, β = β₀ and μ = μ₀ with E(θ) < ∞).

¹³For f(η) = λ exp(−λη), η ≥ 0 and λ ≤ 1, E(exp η) = ∞.
The requirement that E(θ) < ∞ is overly strong. Heckman and Singer (1984a) establish identifiability when E(θ) = ∞ by restricting the tail behavior of the admissible mixing distribution. Their results are recorded in the following proposition.

Proposition 3
If
(i) The random variable θ is nonnegative with a nondefective distribution μ. For absolutely continuous μ, the density m(θ) is restricted so that

m(θ) ~ c(ln θ)^γ θ^(−(1+ε)) L(θ) (1.4.8)

as θ → ∞, where c > 0, 0 < ε < 1 and γ ≥ 0, and where L(θ) is slowly varying in the sense of Karamata.¹⁴ ε is assumed known.
(ii) Z ∈ 𝒵 = {Z(t), t ≥ 0: Z(t) is a nonnegative increasing function with Z(0) = 0 and ∃ c̄ > 0 and t⁺ not depending on the function Z(t) such that Z(t⁺) = c̄, where c̄ is a known constant}.
(iii) φ ∈ Φ = {φ(x), x ∈ S: φ is nonconstant on S, ∃ at least one coordinate x_i defined on (−∞,∞) such that φ(0,0,...,x_i,0,...) traverses (0,∞) as x_i traverses (−∞,∞), 0 ∈ S, and φ(0) = 1}.
Then Z, φ, and μ are identified. □ For proof, see Heckman and Singer (1984a).
Condition (i) is weaker than the Elbers and Ridder condition (i). θ need not possess moments of any order, nor need the distribution function μ have a density. However, in order to satisfy (i) the tails of the true distribution are assumed to die off at a fast enough rate, and the rate is assumed known. The condition that Z(t⁺) = c̄ for some c̄ > 0 and t⁺ > 0 for all admissible Z plays an important role. This condition is satisfied, for example, by a Weibull integrated hazard, since for all α, Z(1) = 1. The strengthened condition (ii) substitutes for the weakened (i) in our analysis. Condition (iii) has identical content in both analyses. The essential idea in both is that φ varies continuously over an interval.

¹⁴Heckman and Singer (1984a) also present conditions for μ(θ) that are not absolutely continuous. For a discussion of slowly varying functions see Feller (1971, p. 275).
In the absence of a finiteness of first moment assumption, Proposition 3 gives a conditional identification result. Given ε, it is possible to estimate ψ, φ, and μ provided crossover condition (ii) is met.
A key assumption in the Heckman-Singer proof and in the main proof of Elbers and Ridder is the presence in the model of at least one exogenous variable that takes values in an interval of the real line. In duration models with no regressors or with only categorical regressors both proofs of identifiability break down. This is so because both proofs require exogenous variables that trace out the Laplace transform of θ over some interval in order to uniquely identify the functions of interest.¹⁵
The requirement that a model possess at least one regressor is troublesome. It explicitly rules out an interaction detection strategy that cross-classifies the data on the basis of exogenous variables and estimates separate Z(t) and μ(θ) functions for different demographic groups. It rules out interactions between x and θ and between x and Z.
In fact some widely used parametric hazard models can be identified together with the mixing distribution μ(θ) even when no regressors appear in the model. Identification is secured under these conditions by specifying the functional form of the hazard function up to a finite number of unknown parameters and placing some restrictions on the moments of admissible μ distributions.
A general strategy of proof for this case is as follows [for details see Heckman and Singer (1984a)]. Assume that Z_α(t) is a member of a parametric family of nonnegative functions and that the pair (α, μ) is not identified. Assuming that Z_α is differentiable to order j, nonidentifiability implies that the identities

∫_0^∞ exp(−Z_{α₀}(t)θ) dμ₀(θ) = ∫_0^∞ exp(−Z_{α₁}(t)θ) dμ₁(θ)

for all t ≥ 0 must hold for at least two distinct pairs (α₀, μ₀), (α₁, μ₁). We then derive contradictions. We demonstrate under certain stated conditions that these identities cannot hold unless α₀ = α₁. Then μ is identified by the uniqueness theorem for Laplace transforms.

¹⁵As previously noted, in their Appendix Elbers and Ridder (1982) generalize their proofs to a case in which all of the regressors are discrete valued. However, a regressor is required in order to secure identification.
To illustrate this strategy consider identifiability for the class of Box-Cox hazards (see eq. 1.3.2 with γ₂ = 0):

Z′_λ(t) = exp(γ((t^λ − 1)/λ)).

For this class of hazard models there is an interesting tradeoff between the interval of admissible λ and the number of bounded moments that is assumed to restrict the admissible μ(θ). More precisely, the following propositions are proved in our joint work.

Proposition 4
For the true value of λ, λ₀, defined so that λ₀ ≤ 0, if E(θ) < ∞ for all admissible μ, and for all bounded γ, then the triple (γ₀, λ₀, μ₀) is uniquely identified. □ [For proof, see Heckman and Singer (1984a).]

Proposition 5
For the true value of λ, λ₀, such that 0 < λ₀ < 1, if all admissible μ are restricted to have a common finite mean that is assumed to be known a priori (E(θ) = m₁) and a bounded (but not necessarily common) second moment E(θ²) < ∞, and all admissible γ are bounded, then the triple (γ₀, λ₀, μ₀) is uniquely identified. □ (For proof see Heckman and Singer, 1984a.)

Proposition 6
For the true value of λ, λ₀, restricted so that 0 < λ₀ < j, j a positive integer, if all admissible μ are restricted to have a common finite mean that is assumed to be known a priori (E(θ) = m₁) and a bounded (but not necessarily common) (j + 1)st moment (E(θ^(j+1)) < ∞), and all admissible γ are bounded, then the triple (γ₀, λ₀, μ₀) is uniquely identified. □ (For proof see Heckman and Singer, 1984a.)

It is interesting that each integer increase in the value of λ₀ > 0 requires an integer increase in the highest moment that must be assumed finite for all admissible μ.
The general strategy of specifying a flexible functional form for the hazard and placing moment restrictions on the admissible μ works in other models besides the Box-Cox class of hazards. For example, consider a nonmonotonic log logistic model used by Trussell and Richards (1985):

Z′(t) = λα(λt)^(α−1) / (1 + (λt)^α), λ > 0, α > 0. (1.4.9)
Proposition 7
For hazard model (1.4.9), the triple (λ₀, α₀, μ₀) is identified provided that the admissible μ are restricted to have a common finite mean E(θ) = m₁ < ∞. □ (For proof, see Heckman and Singer, 1984a.)
An interesting and more direct strategy of proof of identifiability which works for some of the hazard model specifications given above is due to Arnold and Brockett (1983). To illustrate their argument, consider the Weibull hazard

h(t|θ) = αt^(α−1)θ,

and mixing distributions restricted to those having a finite mean. Then a routine calculation shows that α may be calculated directly in terms of the observed survivor function via the recipe

α = lim_{t→0} ln(−ln S(t)) / ln t.

The mixing distribution is then identified using the uniqueness theorem for Laplace transforms. Their proof of identifiability is constructive in that it also provides a direct procedure for estimation of μ(θ) and α that is distinct from the procedure discussed below.
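A numerical rendering of the recipe (our sketch; the gamma mixing distribution is an arbitrary finite-mean choice, and the limit is taken as a log-log slope between two small t values):

    import numpy as np

    alpha, k, s = 0.7, 2.0, 1.5    # Weibull exponent; gamma mixing: shape k, scale s

    def S(t):
        """S(t) = E[exp(-t**alpha * theta)] for theta ~ Gamma(k, s): closed form."""
        return (1.0 + s * t ** alpha) ** (-k)

    # ln(-ln S(t)) ~ alpha*ln t + const near t = 0, so a log-log slope recovers alpha.
    t1, t2 = 1e-6, 1e-4
    slope = (np.log(-np.log(S(t2))) - np.log(-np.log(S(t1)))) \
            / (np.log(t2) - np.log(t1))
    print(slope)                   # approx. 0.7 = alpha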
Provided that one adopts a parametric position on h(t|θ), these propositions show that it is possible to completely dispense with regressors. Another way to interpret these results is to note that since for each value of x we may estimate Z and μ(θ), it is not necessary to adopt proportional hazards specification (1.4.7) in order to secure model identification. All that is required is a conditional (on x) proportional hazards specification. Z and μ may be arbitrary functions of x. Although we have no theorems yet to report, it is obvious that it should be possible to reverse the roles of μ(θ) and h(t|θ): i.e. if μ(θ) is parameterized, it should be possible to specify conditions under which h(t|θ) is identified nonparametrically.
The identification results reported here are quite limited in scope. First, as previously noted in Section 1.3, the restriction that the regressors are time invariant is crucial. If the regressors contain a common (to all observations) time trended variable, φ can be identified from ψ only if strong functional form assumptions are maintained so that ln ψ and ln φ are linearly independent. Since one cannot control the external environment, it is always possible to produce a ψ function that fails this linear independence test. Moreover, even when x(t) follows a separate path for each person, so that there is independent variation between ln ψ(t) and ln φ(x(t)), at least for some observations, a different line of proof is required than has been produced in the literature.
Second, and more important, the proportional hazard model is not derived from an economic model. It is a statistically convenient model. As is implicit from the models presented in Section 1.2, and as will be made explicit in Section 1.6, duration models motivated by economic theory cannot in general be cast into a proportional hazards mold. Accordingly, the identification criteria discussed in this section are of limited use in estimating explicitly formulated economic models. In general, the hazard functions produced by economic theory are not separable as is assumed in (1.4.7).
Research is underway on identifiability conditions for nonseparable hazards. As a prototype we present the following identification theorem for a specific nonseparable hazard.

Proposition 8
The nonseparable model with (i) Z_θ(t) = t(αx)^(2+θ), (ii) density w(x|θ) = (θ + β)exp(−(θ + β)x) and (iii) ∫θ dμ(θ) < ∞ is identified. □ For proof, see Heckman and Singer (1983).
Note that not only is the hazard nonseparable in x and θ but the density of x depends on θ, so that x is not weakly exogenous with respect to θ.
Before concluding this discussion of identification, it is important to note that the concept of identifiability employed in this and other papers is the requirement that the mapping from a space of (conditional hazards) × (a restricted class of probability distributions) to (a class of joint frequency functions for durations and covariates) be one to one and onto. This formulation of identifiability is standard. In this literature there is no requirement of a metric on the spaces or of completeness. Such requirements are essential if consistency of an estimator is desired. In this connection, Kiefer and Wolfowitz (1956) propose a definition of identifiability in a metric space whereby the above-mentioned mapping is 1:1 on the completion (with respect to a given metric) of the original spaces. Without some care in defining the original space, undesirable distributions can appear in the completions.
As an example, consider a Weibull hazard model with conditional survivor function given an observed k-dimensional covariate x defined as

S(t|x) = ∫_0^∞ [exp(−t^α(exp(x′β))u)] dF_θ(u),

where α ∈ (0,1], β ∈ a compact subset of k-dimensional Euclidean space, and F_θ is restricted to be a probability distribution on [0,+∞) with ∫_0^∞ u dF_θ(u) = 1. As a specialization of Elbers and Ridder's (1982) general proof, α₀, β₀ and F₀ are identified. Now
consider the completion, with respect to the Kiefer-Wolfowitz (1956) metric, of the Cartesian product of the parameter space of allowed α and β values and the probability distributions on [0,+∞) satisfying ∫_0^∞ u dF_θ(u) = 1. The completion contains distributions F₁ on [0,+∞) satisfying ∫_0^∞ u dF₁(u) = ∞. Now observe that if S(t|x) has a representation as defined above for some α ∈ (0,1) and F_θ with mean 1, then it is also a completely monotone function of t. Thus we also have the representation

S(t|x) = ∫_0^∞ [exp(−t(exp(x′β₀))u)] dF₁(u),

but now F₁ must have an infinite mean. This implies that (α₀, β₀, F₀) and (1, β₀, F₁) generate the same survivor function. Hence the model is not identifiable on the completion of a space where probability distributions are restricted to have a finite mean.
This difficulty can be eliminated by further restricting F_θ to belong to a uniformly integrable family of distribution functions. Then all elements in the completion with respect to the Kiefer-Wolfowitz and a variety of other metrics will also have a finite mean and identifiability is again ensured. The comparable requirement for the case when E_μ(θ) = ∞ is that (1.4.8) converges uniformly to its limit.
The a priori restriction of identifiability considerations to complete metric spaces is not only central to establishing consistency of estimation methods but also provides a link between the concept of identifiability as it has developed in econometrics and notions of identifiability which are directly linked to consistency, as in the engineering literature on control theory.

1.4.4. Nonparametric estimation

Securing identifiability of a nonparametric model is only the first step toward


estimating the model. At the time of this writing, no nonparametric estimator has
been devised that consistently estimates the general proportional hazard model
(1.4.7).
In Heckman and Singer (1984b) we consider consistent estimation of the
proportional hazard model when ψ(t) and φ(x) are specified up to a finite
number of parameters but μ(θ) is unrestricted except that it must have either a
finite mean and belong to a uniformly integrable family or satisfy tail condition
(1.4.8) with uniform convergence. We verify sufficiency conditions due to Kiefer
and Wolfowitz (1956) which, when satisfied, guarantee the existence of a con-
sistent nonparametric maximum likelihood estimator. We analyze a Weibull
model for censored and uncensored data and demonstrate how to verify the
sufficiency conditions for more general models. The analysis only ensures the
existence of a consistent estimator. The asymptotic distribution of the estimator is
unknown.

Drawing on results due to Laird (1978) and Lindsey (1983a,b), we characterize
the computational form of the nonparametric maximum likelihood estimator.¹⁶
To state these results most succinctly, we define

t* = φ(x)∫₀^t ψ(u)du = φ(x)Z(t).

For any fixed value of the parameters determining φ(x) and Z(t) in (1.4.7), t*
conditional on θ is an exponential random variable, i.e.

f(t*|θ) = θexp(−t*θ),  θ ≥ 0.    (1.4.10)

For this model, the following propositions can be established for the Nonpara-
metric Maximum Likelihood Estimator (NPMLE).

Proposition 9

Let I* be the number of distinct t* values in a sample of I (≥ I*) observa-
tions. Then the NPMLE of μ(θ) is a finite mixture with at most I* points of
increase, i.e. for censored and uncensored data (with d = 1 for uncensored
observations)

f(t*) = Σ_{i=1}^{I*} θᵢᵈ exp(−t*θᵢ)Pᵢ,

where Pᵢ ≥ 0, Σ_{i=1}^{I*} Pᵢ = 1. □

Thus the NPMLE is a finite mixture, but in contrast to the usual finite mixture
model, I* is estimated along with the Pᵢ and θᵢ. Other properties of the NPMLE
are as follows.
Proposition 10

Assuming that no points of support {θᵢ} come from the boundary of Θ, the
NPMLE is unique. □ (See Heckman and Singer, 1984b.)

Proposition 11

For uncensored data, θ̂_min = 1/t*_max and θ̂_max = 1/t*_min, where "^" denotes the
NPMLE estimate, and t*_max and t*_min are, respectively, the sample maximum and
¹⁶In computing the estimator it is necessary to impose all of the identifiability conditions in order to
secure consistent estimators. For example, in a Weibull model with E(θ) < ∞, it is important to
impose this requirement in securing estimates. As our example in the preceding subsection indicated,
there are other models with E(θ) = ∞ that will explain the data equally well. In large samples, this
condition is imposed, for example, by picking estimates of μ(θ) such that ∫[1 − μ̂(θ)]dθ < ∞, i.e. so
that the estimated mean of θ is finite. Similarly, if identification is secured by tail condition (1.4.8), this
must be imposed in selecting a unique estimator. See also the discussion at the end of Section 1.4.3.
minimum values of t*. For censored data, θ̂_min = 0 and θ̂_max = 1/t*_min. □ (See
Heckman and Singer, 1984b.)

These propositions show that the NPMLE for μ(θ) in the proportional hazard
model is in general unique, and the estimated points of support lie in a region with
known bounds (given t*). In computing estimates one can confine attention to
this region. Further characterization of the NPMLE is given in Heckman and
Singer (1984b).
It is important to note that all of these results are for a given t* = Z(t)φ(x).
The computational strategy we use fixes the parameters determining Z(t) and
φ(x) and estimates μ(θ). For each estimate of μ(θ) so achieved, Z(t) and φ(x)
are estimated by traditional parametric maximum likelihood methods. Then fresh
t* are generated and a new μ(θ) is estimated until convergence occurs. There is
no assurance that this procedure converges to a global optimum.
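To fix ideas, the following sketch (our own illustration; the grid-based EM inner
step and all tuning constants are assumptions, not the Heckman-Singer (1984b)
implementation) carries out the μ(θ) step for a given t*: it approximates the
NPMLE of the mixing distribution by estimating the weights of a finite mixture
of exponentials over a grid whose bounds are given by Proposition 11.

import numpy as np

def npmle_mixing(t_star, n_grid=200, n_iter=500):
    # Proposition 11: for uncensored data the support of the NPMLE lies in
    # [1/max t*, 1/min t*]
    theta = np.linspace(1.0 / t_star.max(), 1.0 / t_star.min(), n_grid)
    p = np.full(n_grid, 1.0 / n_grid)
    dens = theta[None, :] * np.exp(-np.outer(t_star, theta))  # f(t*_j | theta_i)
    for _ in range(n_iter):
        post = dens * p                          # E-step: posterior over support points
        post /= post.sum(axis=1, keepdims=True)
        p = post.mean(axis=0)                    # M-step: update mixture weights
    keep = p > 1e-6
    return theta[keep], p[keep]                  # points of increase and their weights

rng = np.random.default_rng(0)
theta_true = rng.gamma(2.0, 0.5, size=500)       # unobservables
t_star = rng.exponential(1.0 / theta_true)       # t* | theta ~ exponential(theta), eq. (1.4.10)
support, weights = npmle_mixing(t_star)
print(len(support), support.min(), support.max())

Consistent with the Monte Carlo evidence discussed below, the estimated mixing
distribution typically concentrates on only a handful of support points.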
In a series of Monte Carlo runs reported in Heckman and Singer (1984b) the
following results emerge.

(i) The NPMLE recovers the parameters governing Z(t) and φ(x) rather well.
(ii) The NPMLE does not produce reliable estimates of the underlying mixing
distribution.
(iii) The estimated c.d.f. for duration times Ĝ(t|x) produced via the NPMLE
predicts the sample c.d.f. of durations quite well, even in fresh samples of
data with different distributions for the x variables.

A typical run is reported in Table 3. The structural parameters (α₁, α₂) are
estimated rather well. The mixing distribution is poorly estimated, but the within
sample agreement between the estimated c.d.f. of T and the observed c.d.f. is
good. Table 4 records the results of perturbing the model by changing the mean of
the regressors from 0 to 10. There is still close agreement between the estimated
model (with parameters estimated on a sample where X ~ N(0,1)) and the
observed durations (where X ~ N(10,1)).
The NPMLE can be used to check the plausibility of any particular parametric
specification of the distribution of unobserved variables. If the estimated parame-
ters of a structural model achieved from a parametric specification of the
distribution of unobservables are not “too far” from the estimates of the same
parameters achieved from the NPMLE, the econometrician would have much
more confidence in adopting a particular specification of the mixing distribution.
Development of a formal test statistic to determine how far is “too far” is a topic
for the future. However, because of the consistency of the nonparametric maxi-
mum likelihood estimator a test based on the difference between the parameters
of Z(t) and cp(x) estimated via the NPMLE and the same parameters estimated
under a particular assumption about the functional form of the mixing distribu-
tion would be consistent.

Table 3
Results from a typical estimation

dμ(θ) = [exp(λθ)exp(−e^θ/β)dθ]/Γ(1/2), with λ = 1/2, β = 1

True model         α₁ = 1     α₂ = 1
Estimated model    0.9852     0.9846
                   (0.0738)*  (0.1022)*

where Z(t) = t^{α₁} and φ(x) = exp(α₂x)
Sample size I = 500
Log likelihood −1886.47

Estimated mixing distribution

Estimated θᵢ   Estimated Pᵢ   Estimated c.d.f.   True c.d.f.   Observed c.d.f.
−12.9031       0.008109       0.008109           0.001780      0.0020
 −7.0938       0.06524        0.07335            0.03250       0.0400
 −4.0107       0.1887         0.2621             0.1510        0.1620
 −1.7898       0.3681         0.6302             0.4366        0.4280
 −0.0338       0.3698         1.000              0.8356        0.8320

Estimated cumulative distribution of duration vs. actual (Ĝ(t) vs. G(t))

Value of t   Estimated c.d.f.   Observed c.d.f.
 0.25        0.1237             0.102
 0.50        0.2005             0.186
 1.00        0.3005             0.296
 3.00        0.4830             0.484
 5.00        0.5661             0.556
10.00        0.6675             0.660
20.00        0.7512             0.754
40.00        0.8169             0.818
99.00        0.8800             0.880

*The numbers reported below the estimates are standard errors from the estimated information
matrix for (α, P, θ) given I*. As noted in the text these have no rigorous justification.

Table 4
Predictions on a fresh sample, X ~ N(10,1)
(The model used to fit the parameters is X ~ N(0,1).)

Estimated cumulative distribution of duration vs. actual (Ĝ(t) vs. G(t))

Value of t   Estimated c.d.f.   Observed c.d.f.
(×10⁻⁵)
   1.0       0.1118             0.1000
   4.0       0.2799             0.2800
   8.0       0.3924             0.3920
  10.0       0.4300             0.4360
  25.0       0.5802             0.5740
 100.0       0.7607             0.7640
 300.0       0.8543             0.8620
5000.0       0.9615             0.9660

The fact that we produce a good estimator of the structural parameters while
producing a poor estimator for μ suggests that it might be possible to protect
against the consequences of misspecification of the mixing distribution by fitting
duration models with mixing distributions from parametric families, such as
members of the Pearson system, with more than the usual two parameters. Thus
the failure of the NPMLE to estimate more than four or five points of increase for
μ can be cast in a somewhat more positive light. A finite mixture model with five
points of increase is, after all, a nine (independent) parameter model for the
mixing distribution. Imposing a false, but very flexible, mixing distribution may
not cause much bias in estimates of the structural coefficients. Moreover, for small
I*, computational costs are lower for the NPMLE than they are for traditional
parametric maximum likelihood estimators of μ(θ). The computational costs of
precise evaluation of μ(θ) over "small enough" intervals of θ are avoided by
estimating a finite mixture model.
We conclude this section by noting that the Arnold and Brockett (1983) estimator
for α discussed in Section 1.4.3 circumvents the need to estimate dμ(θ), and so in
this regard is more attractive than the estimator discussed in this subsection.
Exploiting the fact that t* is independent of x, it is possible to extend their
estimator to accommodate models with regressors. (The independence conditions
provide orthogonality restrictions from which it is possible to identify β.) How-
ever, it is not obvious how to extend their estimator to deal with censored data.
Our estimator can be used without modification on censored data.

1.5. Sampling plans and initial conditions problems

There are few duration data sets for which the start date of the sample coincides
with the origin date of all sampled spells. Quite commonly the available data are
random samples of interrupted spells, or else are spells that begin after the start
date of the sample. For interrupted spells one of the following duration times may
be observed: (1) time in the state up to the sampling date (T_b), (2) time in the
state after the sampling date (T_a), or (3) total time in a completed spell observed
at the origin of the sample (T_c = T_a + T_b). Durations of spells that begin after the
origin date of the sample are denoted T_d.

In this section we derive the density of each of these durations for time
homogeneous and time inhomogeneous environments, and for models with and
without observed and unobserved explanatory variables. The main message of
this section is that in general the distributions of each of the random variables T_a,
T_b, T_c and T_d differ from the population duration distribution G(t). Estimators
based on the wrong duration distribution in general produce invalid estimates of
the parameters of G(t) and will lead to incorrect inference about the population
duration distribution.

1.5.1. Time homogeneous environments and models without observed and
unobserved explanatory variables¹⁷

We first consider the analytically tractable case of a single spell duration model
without regressors and unobservables in a time homogeneous environment. To
simplify notation we assume that the sample at our disposal begins at calendar
time 0. Looking backward, a spell of length t_b interrupted at 0 began t_b periods
ago. Looking forward, the spell lasts t_a periods after the sampling date. The
completed spell is t_c = t_a + t_b in length. We ignore right censoring and assume
that the underlying distribution is nondefective. (These assumptions are relaxed in
Subsection 1.5.2 below.)

Let k(−t_b) be the intake rate; i.e. t_b periods before the sample begins, k(−t_b)
is the proportion of the population that enters the state of interest at time
τ = −t_b. The time homogeneity assumption implies that

k(−t_b) = k,  ∀t_b.    (1.5.1)

Let g(t) = h(t)exp(−∫₀^t h(u)du) be the density of completed durations in the
population. The associated survivor function is

S(t) = 1 − G(t) = exp(−∫₀^t h(u)du).

The proportion of the population experiencing a spell at calendar time τ = 0, P₀,
is obtained by integrating over the survivors from each cohort, i.e.

P₀ = ∫₀^∞ k(−t_b)(1 − G(t_b))dt_b = ∫₀^∞ k(−t_b)exp(−∫₀^{t_b} h(u)du)dt_b.

Thus the density of an interrupted spell of length t_b is the ratio of the proportion
surviving from those who entered t_b periods ago to the total stock:

f(t_b) = k(−t_b)(1 − G(t_b))/P₀ = k(−t_b)exp(−∫₀^{t_b} h(u)du)/P₀.    (1.5.2)

Assuming m = ∫₀^∞ xg(x)dx < ∞ (and so ruling out defective distributions) and

¹⁷See Cox (1962), Cox and Lewis (1966), Sheps and Menken (1973), Salant (1977) and Baker and
Trivedi (1982) for useful presentations of time homogeneous models.

integrating the denominator of the preceding expression by parts, we reach the
familiar expression [see, e.g. Cox and Lewis (1966)]

f(t_b) = (1 − G(t_b))/m = S(t_b)/m = exp(−∫₀^{t_b} h(u)du)/m.    (1.5.3)

The density of sampled interrupted spells is not the same as the population
density of completed spells.
The density of sampled completed spells is obtained by the following straight-
forward argument. In the population, the conditional density of t_c given 0 < t_b < t_c
is

g(t_c|t_b) = g(t_c)/(1 − G(t_b)) = h(t_c)exp(−∫_{t_b}^{t_c} h(u)du),  t_c > t_b.    (1.5.4)

Using (1.5.3), the marginal density of t_c in the sample is

f(t_c) = ∫₀^{t_c} g(t_c|t_b)f(t_b)dt_b,    (1.5.5)

so

f(t_c) = t_c g(t_c)/m.

The density of the forward time t_a can be derived from (1.5.4). Substitute for t_c
using t_c = t_a + t_b and integrate out t_b using density (1.5.3). Thus

f(t_a) = ∫₀^∞ g(t_a + t_b|t_b)f(t_b)dt_b = ∫₀^∞ [g(t_a + t_b)/m]dt_b
      = (1/m)∫_{t_a}^∞ g(z)dz = (1 − G(t_a))/m
      = S(t_a)/m = exp(−∫₀^{t_a} h(u)du)/m.    (1.5.6)

So in a time homogeneous environment the functional form of f(t_a) is identical
to f(t_b).

The following results are well known about the distributions of the random
variables T_a, T_b and T_c.

(i) If g(t) is exponential with parameter θ (i.e. g(t) = θexp(−tθ)), then so are
f(t_a) and f(t_b). The proof is immediate.
(ii) E(T_a) = (m/2)(1 + (σ²/m²)),¹⁸
where σ² = E(T − m)² = ∫₀^∞ (t − m)²g(t)dt.
(iii) E(T_b) = (m/2)(1 + (σ²/m²))
(since T_a and T_b have the same density).
(iv) E(T_c) = m(1 + (σ²/m²)),
so E(T_c) = 2E(T_a) = 2E(T_b),
and E(T_c) > m unless σ² = 0.
(v) If (−ln(1 − G(t))/t) is increasing in t, σ²/m² < 1.
(This condition is implied if h(t) = g(t)/(1 − G(t)) is increasing in t, i.e.
h′(t) > 0.) In this case, E(T_a) = E(T_b) < m. (See Barlow and Proschan, 1975,
for proof.)
(vi) If (−ln(1 − G(t))/t) is decreasing in t, σ²/m² > 1.
(This condition is implied if h′(t) < 0.) In this case E(T_a) = E(T_b) > m. (See
Barlow and Proschan, 1975, for proof.)

Result (i) restates the classical result (see Feller, 1970) that if the population
distribution of durations is exponential, so are the sample distributions of T_a and
T_b. Result (iii) coupled with result (v) indicates that if the population distribution
of durations exhibits positive duration dependence, the mean of interrupted spells
(T_b) falls short of the population mean duration. Result (iii) coupled with (vi)
reverses this ordering for duration distributions with negative duration dependence.
Result (iv) indicates that sampled completed spells have a mean in excess of the
population mean unless σ² = 0 (hence the term "length biased sampling"), and
that completed spells have a mean twice that of interrupted (T_b) or partially
completed forward spells (T_a).

¹⁸Integrating by parts, assuming that E(T²) = ∫₀^∞ t²g(t)dt < ∞, we obtain
E(T_b) = E(T²)/2m = (m/2)(1 + σ²/m²).
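A small simulation (our own illustration; the Weibull population is an arbitrary
assumption) makes results (ii)-(iv) concrete: completed spells sampled at a point
in time are drawn with probability proportional to their length, as in (1.5.5), and
the sampling date falls uniformly within the sampled spell.

import numpy as np

rng = np.random.default_rng(1)
t = rng.weibull(2.0, size=200_000)                # population durations with mean m
m, s2 = t.mean(), t.var()

t_c = rng.choice(t, size=200_000, p=t / t.sum())  # length-biased draws: f(t_c) = t g(t)/m
u = rng.uniform(size=t_c.size)
t_b, t_a = u * t_c, (1.0 - u) * t_c               # interrupted and forward portions

print(t_a.mean(), t_b.mean(), (m / 2) * (1 + s2 / m**2))  # results (ii) and (iii)
print(t_c.mean(), m * (1 + s2 / m**2))                    # result (iv): twice the others

Because the Weibull with shape 2 has an increasing hazard, the simulated means
of T_a and T_b also fall below m, as result (v) predicts.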
We next present the distribution of T_d, the duration time for spells that begin
after the origin date of the sample. Let 𝒯 denote the time a spell begins. The
density of 𝒯 is k(τ). Assuming that 𝒯 and T_d are independent, the joint
probability that a spell begins at 𝒯 = τ and lasts less than t_d periods is

Pr{𝒯 = τ and T_d < t_d} = k(τ)G(t_d).

Thus the density of T_d in a time homogeneous environment is

f(t_d) = g(t_d).    (1.5.7)

The distributions of T_a, T_b and T_c are of a different functional form than the
distribution of T. The only exception is the case in which T is an exponential
random variable with parameter λ; in this case T_a and T_b are also exponential
with parameter λ. The distribution of T_d has the same functional form as the
distribution of T.

Thus in a typical longitudinal sample in which data are available for the
completed portions of durations of spells in progress (T_b) and on durations
initiated after the origin date of the sample (T_d), two different distributions are
required to analyze the data.
It is common to "solve" the left censoring problem by assuming that G(t) is
exponential. The bias that results from invoking this assumption when it is false
can be severe. As an example, suppose that the population distribution of t is
Weibull, so

g(t) = αφt^{α−1}exp(−φt^α),  φ > 0, α > 0.    (1.5.8)

Suppose that the sample data are on the completed portions of interrupted spells
and that there is no right censoring, so that, using formula (1.5.3),

f(t_b) = exp(−t^αφ) / [Γ((1/α)+1)/φ^{1/α}].
If it is falsely assumed that g*(t) = λe^{−λt}, the maximum likelihood estimator of
λ for a random sample of durations is λ̂ = I/Σᵢtᵢ, which has probability limit

plim λ̂ = 2φ^{1/α}Γ((1/α)+1)/Γ((2/α)+1).

For α = 2,

plim λ̂ = (φπ)^{1/2}.

As another example, suppose the sample being analyzed consists of complete
spells sampled at time zero (i.e. T_c) generated by an underlying population
exponential density

g(t) = λexp(−tλ).

Then from (1.5.5)

f(t_c) = λ²t_c exp(−λt_c).

If it is falsely assumed that g(t) characterizes the duration data and λ is
estimated by maximum likelihood,

plim λ̂ = λ/2.

This is an immediate consequence of results (i) and (iv) previously stated.
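A quick Monte Carlo check of this probability limit (our own sketch, with λ = 1
assumed): the length-biased exponential density λ²t exp(−λt) is a gamma density
with shape 2 and scale 1/λ.

import numpy as np

rng = np.random.default_rng(2)
t_c = rng.gamma(shape=2.0, scale=1.0, size=100_000)  # f(t_c) of eq. (1.5.5) with lambda = 1
print(1.0 / t_c.mean())                              # exponential MLE, approx 0.5 = lambda/2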


Continuing this example, suppose instead that a Weibull model is falsely
assumed, i.e.

g*(t) = αt^{α−1}φ exp(−t^αφ),

and the parameters α and φ are estimated by maximum likelihood. The maxi-
mum likelihood estimator solves the following equations:

1/φ̂ = (Σ_{i=1}^I tᵢ^{α̂})/I,

1/α̂ + (Σ_{i=1}^I ln tᵢ)/I = φ̂(Σ_{i=1}^I tᵢ^{α̂} ln tᵢ)/I,

so

1/α̂ + (Σ_{i=1}^I ln tᵢ)/I = (Σ_{i=1}^I tᵢ^{α̂} ln tᵢ)/(Σ_{i=1}^I tᵢ^{α̂}).    (1.5.9)

Using the easily verified result that

∫₀^∞ t^{P−1}(ln t)exp(−tλ)dt = λ^{−P}[Γ′(P)/Γ(P) − ln λ]Γ(P),

and the fact that in large samples plim α̂ = α*, where α* is the value that solves
(1.5.9), α* is the solution to

1/α* + E(ln t) = E(t^{α*} ln t)/E(t^{α*}),

and we obtain the equation

1/α* + [Γ′(P)/Γ(P)|_{P=2} − ln λ] = [Γ′(P)/Γ(P)|_{P=α*+2} − ln λ].    (1.5.10)

Using the fact that

Γ′(P+1)/Γ(P+1) = 1/P + Γ′(P)/Γ(P),

and collecting terms, we may rewrite (1.5.10) as

1/[α*(α*+1)] + ∂lnΓ(P)/∂P|_{P=2} = ∂lnΓ(P)/∂P|_{P=α*+1}.    (1.5.11)

Since Γ(2) = 1, it is clear that α* = 1 is never a solution of this equation. In fact,
since the left hand side is monotone decreasing in α* and the right hand side is
monotone increasing in α*, and since at α* = 1 the left hand side exceeds the
right hand side, the value of α* that solves (1.5.11) exceeds unity. Thus if a
Weibull model is fit by maximum likelihood to length biased completed spells
generated by an exponential population model, in large samples positive duration
dependence will always be found, i.e. α* > 1.
It can also be shown that

plim φ̂ = λ^{α*}/Γ(α*+2).

If the Weibull is fit to data on T_a and T_b generated from an exponential
population, α* = 1.
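Equation (1.5.11) is easily solved numerically; the following sketch (our own
check, with scipy's digamma standing in for ∂lnΓ(P)/∂P) confirms that the
spurious duration dependence is substantial.

from scipy.optimize import brentq
from scipy.special import digamma

def eq_1_5_11(a):
    # 1/[a(a+1)] + psi(2) - psi(a+1), whose root is alpha*
    return 1.0 / (a * (a + 1.0)) + digamma(2.0) - digamma(a + 1.0)

alpha_star = brentq(eq_1_5_11, 1.0 + 1e-9, 10.0)
print(alpha_star)  # approx 1.48, well above 1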
These examples dramatically illustrate the importance of recognizing the im-
pact of the sampling plan on the distribution of observed durations. As a general
proposition, only the distribution of T_d (the length of spells initiated after the
origin date of the sample) is invariant to the sampling plan. As a short cut, one
can obtain inefficient but consistent estimates of G(t) by confining an empirical
analysis to such spells.

However, in the presence of unobserved variables this strategy will in general
produce inconsistent parameter estimates. We turn next to consider initial condi-
tions problems in models with observed and unobserved explanatory variables.

1.5.2. The densities of T_a, T_b, T_c and T_d in time inhomogeneous environments for
models with observed and unobserved explanatory variables

We define k(τ|x(τ), θ) to be the intake rate into a given state at calendar time τ.
We assume that θ is a scalar heterogeneity component and x(τ) is a vector of
explanatory variables. It is convenient and correct to think of k(τ|x(τ), θ) as the
density associated with the random variable 𝒯 for a person with characteristics
(x(τ), θ). We continue the useful convention that spells are sampled at τ = 0.
The densities of T_a, T_b, T_c and T_d are derived for two cases: (a) conditional on a
sample path {x(u)}_{−∞}^0 and (b) marginally on the sample path {x(u)}_{−∞}^0 (i.e.
integrating it out). We denote the distribution of {x(u)}_{−∞}^0 as D(x) with
associated density dD(x).

The derivation of the density of T_b conditional on {x(u)}_{−∞}^0 is as follows. The
proportion of the population in the state at time τ = 0 is obtained by integrating
over the survivors of each cohort of entrants. Thus

P₀(x) = ∫₀^∞ ∫_θ k(−t_b|x(−t_b), θ)exp(−∫₀^{t_b} h(u|x(u−t_b), θ)du)dμ(θ)dt_b.    (1.5.12)

Note that, unlike the case in the models analyzed in Section 1.5.1, this integral
may exist even if the underlying distribution is defective, provided that the k(·)
factor damps the survivor function. We require

lim sup_{τ→−∞} |τ|^{1+ε}k(τ)S(−τ) = 0  for ε > 0.

The proportion of people in the state with sample path {x(u)}_{−∞}^0 whose spells
are exactly of length t_b is the set of survivors from a spell that initiated at
τ = −t_b. Thus the density of T_b conditional on {x(u)}_{−∞}^0 is

f(t_b|{x(u)}_{−∞}^0) = ∫_θ k(−t_b|x(−t_b), θ)exp(−∫₀^{t_b} h(u|x(u−t_b), θ)du)dμ(θ) / P₀(x).    (1.5.13)

The marginal density of T_b (integrating out x) is obtained by an analogous
argument: divide the marginal flow rate as of time τ = −t_b (the integrated flow
rate) by the marginal (integrated) proportion of the population in the state at
τ = 0. Thus defining

P₀ = ∫_X P₀(x)dD(x),

where X is the domain of integration for x, we write

f(t_b) = ∫_X ∫_θ k(−t_b|x(−t_b), θ)
        × exp(−∫₀^{t_b} h(u|x(u−t_b), θ)du)dμ(θ)dD(x) / P₀.    (1.5.14)

Note that we use a function space integral to integrate out {x(u)}_{−∞}^0. [See Kac
(1959) for a discussion of such integrals.] Note further that one obtains an
incorrect expression for the marginal density of T_b if one integrates (1.5.13) against
the population density of x (dD(x)). The error in this procedure is that the
appropriate density of x against which (1.5.13) should be integrated is a density
of x conditional on the event that an observation is in the sample at τ = 0. By
Bayes' theorem, for proper distributions for T_b, this density is

f(x|T_b > 0) = dD(x)P₀(x)/P₀,

which is not in general the same as the density dD(x).

The derivation of the density of T_c, the completed length of a spell sampled at
τ = 0, is equally straightforward. For simplicity we ignore right censoring prob-
lems: we assume that the sampling frame is of sufficient length that no spells are
censored, and further assume that the underlying duration distribution is not
defective. (But see the remarks at the conclusion of this section.)

Conditional on {x(u)}_{−∞}^0 and θ, the probability that the spell began at τ is
k(τ|x(τ), θ). The conditional density of a completed spell of length t_c that begins
at τ is

h(t_c|x(τ+t_c), θ)exp(−∫₀^{t_c} h(u|x(τ+u), θ)du).

For any fixed τ ≤ 0, t_c by definition exceeds −τ. Conditional on x, the
probability that T_c exceeds −𝒯 (i.e. that the spell is sampled) is P₀(x). Thus,
integrating out τ, respecting the fact that t_c > −τ,

f(t_c|{x(u)}_{−∞}^0) = ∫_{−t_c}^0 ∫_θ k(τ|x(τ), θ)h(t_c|x(τ+t_c), θ)
        × exp(−∫₀^{t_c} h(u|x(τ+u), θ)du)dμ(θ)dτ / P₀(x).    (1.5.15)

The marginal density of T_c is

f(t_c) = ∫_X ∫_{−t_c}^0 ∫_θ k(τ|x(τ), θ)h(t_c|x(τ+t_c), θ)
        × exp(−∫₀^{t_c} h(u|x(τ+u), θ)du)dμ(θ)dτ dD(x) / P₀.    (1.5.16)

Ignoring right censoring, the derivation of the density of T_a proceeds by
recognizing that T_a conditional on 𝒯 = τ ≤ 0 is the right tail portion of the
random variable −𝒯 + T_a, the duration of a completed spell that begins at 𝒯 = τ.
The probability that the spell is sampled is P₀(x). Thus the conditional density of
T_a = t_a given {x(u)}_{−∞}^0 is obtained by integrating out τ and correctly condition-
ing on the event that the spell is sampled, i.e.

f(t_a|{x(u)}_{−∞}^0) = ∫_{−∞}^0 ∫_θ k(τ|x(τ), θ)h(t_a−τ|x(t_a), θ)
        × exp(−∫₀^{t_a−τ} h(u|x(u+τ), θ)du)dμ(θ)dτ / P₀(x),    (1.5.17)

and the corresponding marginal density is

f(t_a) = ∫_X ∫_{−∞}^0 ∫_θ k(τ|x(τ), θ)h(t_a−τ|x(t_a), θ)
        × exp(−∫₀^{t_a−τ} h(u|x(u+τ), θ)du)dμ(θ)dτ dD(x) / P₀.    (1.5.18)

Of special interest is the case k(τ|x, θ) = k(x), in which the intake rate does not
depend on unobservables and is constant for all τ given x, and in which x is time
invariant. Then (1.5.13) specializes to

f(t_b|x) = ∫_θ exp(−∫₀^{t_b} h(u|x, θ)du)dμ(θ) / m(x),    (1.5.13′)

where

m(x) = ∫₀^∞ ∫_θ exp(−∫₀^t h(u|x, θ)du)dμ(θ)dt.

This density is very similar to (1.5.3). Under the same restrictions on k and x,
(1.5.15) and (1.5.17) specialize respectively to

f(t_c|x) = ∫_θ t_c h(t_c|x, θ)exp(−∫₀^{t_c} h(u|x, θ)du)dμ(θ) / m(x),    (1.5.15′)

which is to be compared to (1.5.5), and

f(t_a|x) = ∫_θ exp(−∫₀^{t_a} h(u|x, θ)du)dμ(θ) / m(x),    (1.5.17′)

which is to be compared to (1.5.6). For this special case all of the results (i)-(vi)
stated in Section 1.5.1 go through, with obvious redefinition of the densities to
account for observed and unobserved variables.

It is only for the special case of k(τ|x, θ) = k(x) with time invariant regressors
that the densities of T_a, T_b and T_c do not depend on the parameters of k.

In order to estimate the parameters of h(t|x, θ) from data on T_a, T_b or T_c
gathered in a time inhomogeneous environment for a model with unobservables,
knowledge of k is required. As long as θ appears in the conditional hazard and k
depends on θ or τ, or if x is not time invariant, k must be specified along with
μ(θ) and h(t|x, θ).
The common expedient for "solving" the initial conditions problem for the
density of T_a (assuming that G(t|x, θ) is exponential) does not avoid the depen-
dence of the density of T_a on k, even if k does not depend on θ, as long as it
depends on τ or on x(τ) where x(τ) is not time invariant. Thus in the exponential
case in which h(u|x(u+τ), θ) = h(x(u+τ), θ), we may write (1.5.17) for the
case k = k(τ|x(τ)) as

f(t_a|{x(u)}_{−∞}^0) = ∫_{−∞}^0 ∫_θ k(τ|x(τ))h(x(t_a), θ)
        × exp(−∫₀^{t_a−τ} h(x(u+τ), θ)du)dμ(θ)dτ / P₀(x).

Only if h(x(u+τ), θ) = h(x(u+τ)), so that unobservables do not enter the
model (or equivalently the distribution of θ is degenerate), does k cancel in
the expression. In that case the numerator factors into two components, one of
which is the denominator of the density. "k" also disappears if it is a time
invariant constant that is functionally independent of θ.²⁰
At issue is the plausibility of alternative specifications of k. Although nothing
can be said about this matter in a general way, for a variety of economic models
it is plausible that k depends on θ, τ and x(τ), and that the x are not time
invariant. For example, in a study of unemployment spells over the business
cycle, the onset of a spell of unemployment is the result of prior job termination
or entry into the workforce. So k is the density of the length of a spell resulting
from a prior economic decision. The same unobservables that determine unem-
ployment are likely to determine such spells as well. In addition, it is odd to
assume a time invariant general economic and person specific environment in an
analysis of unemployment spells: aggregate economic conditions change, and
person specific variables like age, health, education and wage rates change over
time. Similar arguments can be made on behalf of a more general specification of
k for most economic models.

²⁰We note that one "short cut" procedure frequently used does not avoid these problems. The
argument correctly notes that, conditional on θ, the elapsed duration t_b, and the start date of the sample,

f(t_a|t_b, θ, {x(u)}) = h(t_a+t_b|x(t_a), θ)exp(−∫_{t_b}^{t_b+t_a} h(u|x(u−t_b), θ)du).    (*)

This expression obviously does not depend on k. The argument runs astray by integrating this
expression against dμ(θ) to get a marginal (with respect to θ) density. The correct density of θ is not
dμ(θ) and depends on k, by virtue of the fact that sample θ are generated by the selection mechanism
that an observation must be in the sample at τ = 0. Precisely the same issue arises with regard to the
distribution of x in passing from (1.5.13) to (1.5.14). However, density (*) can be made the basis of a
simpler estimation procedure in a multiple spell setting, as we note below in Section 2.2.

The initial conditions problem for the general model has two distinct compo-
nents.

(i) The functional form of k(τ|x(τ), θ) is not in general known. This includes as
a special case the possibility that for some unknown τ* < 0, k(τ|x(τ), θ) = 0
for τ < τ*. In addition, the value of τ* may vary among individuals, so that if
it is unknown it must be treated as another unobservable.
(ii) If x(τ) is not time invariant, its value may not be known for τ < 0, so that
even if the functional form of k is known, the correct conditional duration
densities cannot be constructed.

These problems exacerbate the problem of securing model identification. As-
sumptions made about the functional form of k and the presample values of x(τ)
inject a further source of arbitrariness into single spell model specification. Even
if x(τ) is known for τ ≤ 0, k, μ and h cannot all be identified nonparametrically.
However, various special cases of it can be solved. For example, suppose that the
functional form of k is known up to some finite number of parameters, but
presample values of x(r) are not, If the distribution of these presample values is
known or can be estimated, one method of solution to the initial conditions
problem is to define duration distributions conditional on past sample values of
X(T) but marginal on presample values, i.e. to integrate out presample x( 7) from
the model using the distribution of their values._This suggests using (15.14) rather
than (15.13) for the density of Tb. In place of either (15.15) or (1.5.16) for the
density of T,, this approach suggests using

0
/ -1, /Ie (x(7):7<0)
k(7lX(7),e)h(t,lX(t,+dJ>
Xexp ( - ~(z+(r+u)J)du)dD(x)d/@)d~
f(~,lw4~r;> = 9
PO

(1.5.19)

with a similar modification in the density of T,.


This procedure requires either that the distribution of presample x(r) be
known or else that it be estimated along with the other functions in the model.
The latter suggestion complicates the identification problem one further fold. The
former suggestion requires either access to another sample from which it is
possible to estimate the distribution of presample values of x or else that it be
possible to use within sample data on x to estimate the distribution of the
Ch. 29: Economeiric Anulysis of Longitudinul Data 1141

presample data, as would be possible, for example, if presample and within


sample data distributions differed only by a finite order polynomial time trend.
Recall, however, that the distribution of x within the sample is not the
distribution of x in the population, D(x). This is a consequence of the impact of
the sample selection rule on the joint distribution of x and T.²¹ The distribution
of the x within sample depends on the distribution of θ, the parameters of
h(t|x, θ), and the presample distribution of x. Thus, for example, the joint density
of T_a and x for τ ≥ 0 is

f(t_a, {x(τ): τ ≥ 0}) = ∫_{−∞}^0 ∫_θ ∫_{{x(τ): τ<0}} k(τ|x(τ), θ)h(t_a−τ|x(t_a), θ)
        × exp(−∫₀^{t_a−τ} h(u|x(u+τ), θ)du)dD(x)dμ(θ)dτ / P₀,    (1.5.20)

and the density of within sample x(τ) is obtained by integrating t_a out of
(1.5.20). It is this density, and not dD(x), that is estimated using within sample
data on x.
This insight suggests two further points. (1) By direct analogy with results
already rigorously established in the choice based sampling literature (see, e.g.
Manski and Lerman, 1977; Manski and McFadden, 1981; and Cosslett, 1981),
more efficient estimates of the parameters of h(t|x, θ) and μ(θ) can be secured
using the joint densities of T_a and x, since the density of within sample data
depends on the structural parameters of the model as a consequence of the sample
selection rule. (2) Access to other sources of data on the x will be essential in
order to "integrate out" presample x via formulae like (1.5.19).
A partial avenue of escape from the initial conditions problem exploits T_d, i.e.
durations for spells initiated after the origin date of the sample. The density of T_d

²¹Precisely the same phenomenon appears in the choice based sampling literature (see, e.g. Manski
and Lerman, 1977; Manski and McFadden, 1981; and Cosslett, 1981). In fact the suggestion of
integrating out the missing data is analogous to the suggestions offered in Section 1.7 of the Manski
and McFadden paper.

conditional on {x(u)}_{τ_d}^{τ_d+t_d}, where τ_d > 0 is the start date of the spell, is

f(t_d|{x(u)}_{τ_d}^{τ_d+t_d}) = ∫₀^∞ ∫_θ k(τ|x(τ), θ)h(t_d|x(τ+t_d), θ)
        × exp(−∫₀^{t_d} h(u|x(τ+u), θ)du)dμ(θ)dτ
        / ∫₀^∞ ∫_θ k(τ|x(τ), θ)dμ(θ)dτ.    (1.5.21)

The denominator is the probability that 𝒯 ≥ 0. Only if k does not depend on θ
will the density of T_d not depend on the parameters of k. More efficient inference
is based on the joint density of 𝒯 and T_d:

f(t_d, τ|{x(u)}₀^{τ+t_d}) = ∫_θ k(τ|x(τ), θ)h(t_d|x(τ+t_d), θ)
        × exp(−∫₀^{t_d} h(u|x(τ+u), θ)du)dμ(θ)
        / ∫₀^∞ ∫_θ k(τ|x(τ), θ)dμ(θ)dτ.    (1.5.22)

Inference based on (1.5.21) or (1.5.22) requires fewer a priori assumptions than
are required to use data on T_a, T_b, or T_c. Unless x is specified to depend on
lagged values of explanatory variables, presample values of x are not required.
Since the start dates of spells are known, it is now in principle possible to estimate
k nonparametrically. Thus in samples with spells that originate after the origin
date of the sample, inference is more robust.

As previously noted, the densities of the durations T_a, T_b, T_c and T_d are in
general different. However, they depend on a common set of parameters. In
samples with spells that originate after the start date of the sample, these cross
density restrictions aid in solving the initial conditions problem, because the
parameters estimated from the relatively more informative density of T_d can be
exploited to estimate parameters from the other types of duration densities.
Before concluding this section, it is important to recall that we have abstracted
from the problems raised by a finite length sampling frame and the problems of
right censoring. If the sampling frame is such that 0 < 𝒯 < τ*, for example, the
formulae for the durations T_a, T_c and T_d presented above must be modified to
account for this data generation process.

For example, the density of measured completed spells that begin after the start
date of the sample incorporates the facts that 0 ≤ 𝒯 ≤ τ* and T_d ≤ τ* − 𝒯, i.e.
that the onset of the spell occurs after τ = 0 and that all completed spells must be
of length τ* − 𝒯 or less. Thus in place of (1.5.21) we write (recalling that τ_d is the
start date of the spell)

f(t_d|{x(u)}₀^{τ_d+t_d}, T_d ≤ τ* − 𝒯, 𝒯 ≥ 0)
  = ∫₀^{τ*−t_d} ∫_θ k(τ|x(τ), θ)h(t_d|x(τ+t_d), θ)exp(−∫₀^{t_d} h(u|x(τ+u), θ)du)dμ(θ)dτ
    / P(0 < 𝒯 < τ* − T_d, 0 < T_d < τ*).

The denominator is the joint probability of the events 0 < 𝒯 < τ* − T_d and
0 < T_d < τ*, which must occur if we are to observe a completed spell that begins
during the sampling frame 0 < 𝒯 < τ*. As τ* → ∞, this expression is equivalent
to the density in (1.5.21).
The density of right censored spells that begin after the start date of the sample
is simply the joint probability of the events 0 < 𝒯 < τ* and T_d > τ* − 𝒯, i.e.

P(0 < 𝒯 < τ* and T_d > τ* − 𝒯|{x(u)}₀^{τ*})
  = ∫₀^{τ*} ∫_θ k(τ|x(τ), θ)exp(−∫₀^{τ*−τ} h(u|x(τ+u), θ)du)dμ(θ)dτ.

The modifications required in the other formulae presented in this subsection to
account for the finiteness of the sampling plan are equally straightforward. For
spells sampled at τ = 0 for which we observe presample values of the duration
and post sample completed durations (T_c), it must be the case that (a) 𝒯 < 0 and
(b) τ* − 𝒯 ≥ T_c ≥ −𝒯, where τ* > 0 is the length of the sampling plan. Thus in
place of (1.5.15) we write

f(t_c|{x(u)}_{−∞}^0, −𝒯 ≤ T_c ≤ τ* − 𝒯, 𝒯 ≤ 0)
  = ∫_{−t_c}^{min(0, τ*−t_c)} ∫_θ k(τ|x(τ), θ)h(t_c|x(τ+t_c), θ)
      × exp(−∫₀^{t_c} h(u|x(τ+u), θ)du)dμ(θ)dτ
    / ∫_{−∞}^0 ∫_{−τ}^{τ*−τ} ∫_θ k(τ|x(τ), θ)h(t_c|x(τ+t_c), θ)
      × exp(−∫₀^{t_c} h(u|x(τ+u), θ)du)dμ(θ)dt_c dτ.

The denominator of this expression is the joint probability of the events that
−𝒯 < T_c < τ* − 𝒯 and 𝒯 ≤ 0. For spells sampled at τ = 0 for which we observe
presample values of the duration and post-sample right censored durations, it must
be the case that (a) 𝒯 < 0 and (b) T_c ≥ τ* − 𝒯, so the density for such spells is

P(𝒯 ≤ 0 and T_c ≥ τ* − 𝒯|{x(u)}_{−∞}^{τ*})
  = ∫_{−∞}^0 ∫_θ k(τ|x(τ), θ)exp(−∫₀^{τ*−τ} h(u|x(τ+u), θ)du)dμ(θ)dτ.

The derivation of the density for T_a in the presence of a finite length sample
frame is straightforward and for the sake of brevity is deleted. It is noted in
Sheps and Menken (1973) (for models without regressors) and Flinn and Heckman
(1982b) (for models with regressors) that failure to account for the sampling
frame produces the wrong densities, and inference based on such densities may be
seriously misleading.

1.6. New issues that arise in formulating and estimating choice theoretic duration
models

In this section we briefly consider new issues that arise in the estimation of choice
theoretic duration models. For specificity, we focus on the model of search
unemployment in a time homogeneous environment that is presented in Section
1.2.2. Our analysis of this model serves as a prototype for a broad class of
microeconomic duration models produced from optimizing theory.
We make the following points about this model, assuming that the analyst has
access to longitudinal data on I independent spells of unemployment.

(A) Without data on accepted wages, the model of eqs. (1.2.10)-(1.2.21) is
hopelessly underidentified, even if there are no regressors or unobservables in
the model.
(B) Even with data on accepted wages, the model is not identified unless the
distribution of wage offers satisfies a recoverability condition to be defined
below.
(C) For models without unobserved variables, the asymptotic theory required to
analyze the properties of the maximum likelihood estimator of the model is
nonstandard.
(D) Allowing for individuals to differ in observed and unobserved variables
injects an element of arbitrariness into model specification, creates new
identification and computational problems, and virtually guarantees that the
hazard is not of the proportional hazards functional form.
(E) A new feature of duration models with unobservables produced by optimiz-
ing theory is that the support of θ now depends on parameters of the model.

We consider each of the points in turn.

1.6.1. Point A

From a random sample of durations of unemployment spells in a model without
observed or unobserved explanatory variables, it is possible to estimate h_u (in eq.
(1.2.15)) via maximum likelihood or Kaplan-Meier procedures (see, e.g.
Kalbfleisch and Prentice, 1980, pp. 10-16). It is obviously not possible using such
data alone to separate λ from (1 − F(rV)), much less to estimate the reservation
wage rV.

1.6.2. Point B

Given access to data on accepted wage offers it is possible to estimate the
reservation wage rV. A consistent estimator of rV is the minimum of the accepted
wages observed in the sample:

r̂V = min{Wᵢ}ᵢ₌₁^I.    (1.6.1)

For proof see Flinn and Heckman (1982a).
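To see how this estimator behaves, consider the following small simulation (our
own sketch; the lognormal offer distribution and the value rV = 1 are assumptions
chosen purely for illustration). Because r̂V is an extreme order statistic, its error
shrinks at rate 1/I rather than the usual 1/√I.

import numpy as np

rng = np.random.default_rng(3)
rV = 1.0
for I in (100, 1_000, 10_000):
    offers = rng.lognormal(mean=0.5, sigma=1.0, size=50 * I)
    accepted = offers[offers >= rV][:I]   # accepted wages satisfy W >= rV
    print(I, accepted.min() - rV)         # bias falls roughly in proportion to 1/I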


Access to accepted wages does not secure identification of F. Only the
truncated wage offer distribution can be estimated:

F(w|w > rV) = [F(w) − F(rV)]/[1 − F(rV)],  w > rV.

To recover an untruncated distribution from a truncated distribution with a
known point of truncation requires further conditions. If F is normal, such
recovery is possible. If it is Pareto, it is not.²² A sufficient condition that ensures
recoverability is that F(w) be real analytic over the support of W, so that by an
analytic continuation argument F(w) can be continued outside of the region of
truncation.²³ In the Pareto example, the support of W is unknown.

²²Thus if f(w) = φw^β, c₁ ≤ w < ∞, β < −2, where φ = −(β+1)/c₁^{β+1}, then
f(w|w > rV) = −(β+1)w^β/(rV)^{β+1}, so φ (or c₁) does not appear in the conditional distribution.
²³For a good discussion of real analytic functions, see Rudin (1974). If a function is real analytic,
knowledge of the function over an interval is sufficient to determine the function over its entire
domain of definition.

If the recoverability condition is not satisfied, it is not possible to determine F
even if rV can be consistently estimated. Hence it is not possible to decompose h_u
in (1.2.15) into its constituent components.

If the recoverability condition is satisfied, it is possible to consistently estimate
F, λ and rV. From (1.2.13), it is possible to estimate a linear relationship between
r and c. The model is identified only by restricting r or c in some fashion. The
most commonly used assumption fixes r at a prespecified value.

1.6.3. Point C

Using density (1.2.18) in a maximum likelihood procedure creates a non-standard
statistical problem: the range of the random variable W depends on a parameter of
the model (W ≥ rV). For a model without observed or unobserved explanatory
variables, the maximum likelihood estimator of rV is in fact the order statistic
estimator (1.6.1). The likelihood based on (1.2.18) is monotonically increasing in
rV, so that imposing the restriction that W ≥ rV is essential in securing maximum
likelihood estimates of the model. Assuming that the density of W is such that
f(rV) ≠ 0, the consistent maximum likelihood estimator of the remaining parame-
ters of the model can be obtained by inserting r̂V in place of rV everywhere in
(1.2.18), and the sampling distribution of this estimator is the same whether rV
is known a priori or estimated. For a proof, see Flinn and Heckman (1982a). In a
model with observed explanatory variables but without unobserved explanatory
variables, a similar phenomenon occurs. However, at the time of this writing, a
rigorous asymptotic distribution theory is available only for models with discrete
valued regressor variables which assume a finite number of values.

1.6.4. Point D

Introducing observed and unobserved explanatory variables into a structural
duration model raises the same sort of issues about ad hoc model specifications
already discussed in the analysis of reduced form models in Section 1.3. However,
there is the additional complication that structural restriction (1.2.13), produced
by economic theory, must be satisfied. One is not free to arbitrarily specify the
parameters of the model.

It is plausible that c, r, λ and F in (1.2.13) all depend on observed and
unobserved explanatory variables. Introducing such variables into the economet-
ric search model raises three new problems.

(i) Economic theory provides no guidance on the functional form of the c, r, λ
and F functions (other than the restriction given by (1.2.13)).²⁴ Estimates

²⁴As discussed in Flinn and Heckman (1982a), some equilibrium search models place restrictions on
the functional form of F.

secured from these models are very sensitive to the choice of these functional
forms. Model identification is difficult to check and is very functional form
dependent.
(ii) In order to impose the restrictions produced by economic theory to secure
estimates, it is necessary to solve nonlinear eq. (1.2.13). Of special impor-
tance is the requirement that V > 0. If this restriction is not satisfied, the
model cannot explain the data. If V < 0, an unemployed individual will not
search. Closed form solutions exist only for special cases, and in general
numerical algorithms must be developed to impose or test these restrictions.
Such numerical analysis procedures are costly even for a simple one spell
search model, and for models with more economic content often become
computationally intractable. (One exception is a dynamic McFadden model
with no restrictions between the choice and interarrival time distributions.)
(iii) Because of restrictions like (1.2.13), proportional hazard specifications (1.1.10)
are rarely produced by economic models.

1.6.5. Point E

In the search model without unobserved variables, the restriction that W ≥ rV is
an essential piece of identifying information. In a model with unobservable θ
introduced in c, r, λ or F, rV = rV(θ) as a consequence of functional restriction
(1.2.13). In this model, the restriction that W ≥ rV is replaced with an implicit
equation restriction on the support of θ; i.e. for an observation with accepted
wage W and reservation wage rV(θ), the admissible support set for θ is

{θ: 0 ≤ rV(θ) ≤ W}.

This set is not necessarily connected.
The left hand side of the inequality states the requirement that must be satisfied
if search is undertaken (rV > 0 for r > 0). The right hand side of the inequality
states the requirement that accepted wages must exceed reservation wages. Unless
this restriction is imposed on the support of θ, the structural search model is not
identified. (See Flinn and Heckman, 1982a.)²⁵
Thus in a duration model produced from economic theory, not only is the
conditional hazard h(t|x(t), θ) unlikely to be of the proportional hazard func-
tional form, but the support of θ will depend on parameters of the model. The
mixing distribution representations presented in Section 1.4.4 above are unlikely
to characterize structural duration models. Accordingly, the nonparametric iden-
tification and estimation strategies presented in Section 1.4 require modification
before they can be applied to explicit economic models.

²⁵Kiefer and Neumann (1981) fail to impose this requirement in their discrete time structural search
model, so their proposed estimator is inconsistent. See Flinn and Heckman (1982c).

2. Multiple spell models

The single spell duration models discussed in Part I are the principal building
blocks for the richer, more behaviorally interesting models presented in this part
of the paper. Sequences of birth intervals, work histories involving movements
among employment states, the successive issuing of patents to firms and individ-
ual criminal victimization histories are examples of multiple spell processes which
require a more elaborate statistical framework than the one presented in Part I.
In this part of the paper we confine our attention to new issues that arise in the
analysis of multiple spell data. Issues such as the sensitivity of empirical estimates
to ad hoc specifications of mixing distributions, and initial conditions problems,
which also arise in multiple spell models, are not discussed except in cases where
access to multiple spell data aids in their resolution.
This part of the paper is in two sections. In Section 2.1 we present a unified
statistical framework within which a rich variety of discrete state continuous time
processes can be formulated and analyzed. We indicate by example how special-
izations of this framework yield a variety of models, some of which already
appear in the literature. We do not present a complete analysis of multiple spell
processes including their estimation and testing on data generated by various
sampling processes because at the time of this writing too little is known about
this topic.
Section 2.2 considers in somewhat greater detail a class of multiple spell
duration models that have been developed for the analysis of event history data.
In this Section we also consider some alternative approaches to initial conditions
problems and some alternative approaches to controlling for unobserved variables
that are possible if the analyst has access to multiple spell data.

2.1. A unified framework

2.1.1. A general construction

To focus on main issues, in this section we ignore models with unobserved


variables. We retain the convention that the sample at our disposal starts at
calendar time τ = 0.
Let {Y(τ), τ ≥ 0}, Y ∈ I where I = {1,...,C}, C < ∞, be a finite state
continuous time stochastic process. We define random variable R(j), j ∈
{1,...,∞}, as the value assumed by Y at the jth transition time. Y(τ) or R(j) is
generated by the following sequence.

(i) An individual begins his evolution in a state Y(0) = R(0) = r(0) and waits
there for a random length of time T₁ governed by a conditional survivor function

P(T₁ > t₁|x, r(0)) = exp(−∫₀^{t₁} h(u|x(u), r(0))du).

As before, h(u|x(u), r(0)) is a calendar time (or age) dependent function, and we
now make explicit the origin state of the process.
(ii) At time 𝒯(1) = τ(1), the individual moves to a new state R(1) = r(1)
governed by a conditional probability law

P(R(1) = r(1)|t₁, r(0)),

which may also be age dependent.
(iii) The individual waits in state R(1) for a random length of time T₂ governed
by

P(T₂ > t₂|τ(1), r(1), r(0)) = exp(−∫₀^{t₂} h(u|x(u+τ(1)), r(1), r(0))du).

Note that one coordinate of x(u) may be u + τ(1), and that 𝒯(2) − 𝒯(1) = T₂.
At the transition time 𝒯(2) = τ(2) he switches to a new state R(2) = r(2), where
the transition probability

P(R(2) = r(2)|t₁, t₂, τ(2), r(1), r(0))

may be calendar time dependent.


Continuing this sequence of waiting times and moves to new states gives rise to
a sequence of random variables

R(0) = r(0), 𝒯(1) = τ(1), R(1) = r(1), 𝒯(2) = τ(2), R(2) = r(2),...

and suggests the definitions

Y(τ) = R(k)  for 𝒯(k) ≤ τ < 𝒯(k+1),

where R(k), k = 0,1,2,... is a discrete time stochastic process governed by the
conditional probabilities

P(R(k) = r(k)|t_k, r_{k−1}),

where

t_k = (t₁,...,t_k) and r_{k−1} = (r(0),...,r(k−1)).

T_k = 𝒯(k) − 𝒯(k−1) is governed by the conditional survivor function

P(T_k > t_k|t_{k−1}, r_{k−1}, τ(k−1)) = exp(−∫₀^{t_k} h(u|x(u+τ(k−1)), r_{k−1})du).
2.1.2. Specializations of interest

We now present a variety of special cases to emphasize the diversity of models


encompassed by the preceding construction.
2.1.2.1. Repeated events of the same kind. This is a one state process, e.g. births
in a fertility history. R(·) is a degenerate process and attention focuses on the
sequence of waiting times T₁, T₂,....

One example of such a process writes

P(T_k > t_k|t_{k−1}) = exp(−∫₀^{t_k} h_k(u|x(u+τ(k−1)))du).

The hazard for the kth interval depends on the number of previous spells. This
special form of dependence is referred to as occurrence dependence. In a study of
fertility, k − 1 corresponds to birth parity for a woman at risk. Heckman and
Borjas (1980) consider such models for the analysis of unemployment. (A small
simulation of occurrence dependence appears at the end of this subsection.)

Another variant writes the hazard of a current spell as a function of the mean
duration of previous spells, i.e. for spell j > 1

h(u|x(u+τ(j−1)), t_{j−1}) = h(u|x(u+τ(j−1)), Σ_{i=1}^{j−1} tᵢ/(j−1)).

[See, e.g. Braun and Hoem (1979).]

Yet another version of the general model writes for the jth spell

h_j(u|x(u+τ(j−1)), t_{j−1}).

This is a model with both occurrence dependence and lagged duration depen-
dence, where the latter is defined as dependence on lengths of preceding spells.

A final specification writes

h(u|x(u+τ(j−1)), t_{j−1}) = h(x(u+τ(j−1))).

For spell j this is a model for independent, non-identically distributed durations,
and Y(τ) is a nonstationary renewal process.
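The following minimal simulation (our own illustration, with the assumed
specification h_k(u) = 0.5k) shows the signature of occurrence dependence: the
exit rate rises with the spell count, so later waiting times are stochastically
shorter.

import numpy as np

rng = np.random.default_rng(6)
waits = np.array([rng.exponential(1.0 / (0.5 * k), size=10_000)
                  for k in range(1, 6)])      # spell k has constant hazard 0.5 k
print(waits.mean(axis=1))                     # mean duration falls with k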
2.1.2.2. Multistate processes. Let

P(R(k) = r(k)|t_k, r_{k−1}) = m_{r(k−1), r(k)},

where

‖m_{ij}‖ = M

is a finite stochastic matrix, and let

P(T_k > t_k|t_{k−1}, r_{k−1}) = exp(−λ_{r(k−1)}t_k),

where the elements of {λᵢ} are positive constants. Then Y(τ) is a time homoge-
neous Markov chain with constant intensity matrix

Q = Λ(M − I),

where

Λ = diag(λ₁,...,λ_C),

and C is the number of states in the chain.²⁶


In the dynamic McFadden model for a stationary environment presented in
Section 1.2.3, M has the special structure m_{ij} = m_{lj} = P_j for all i and l; i.e. the
origin state is irrelevant in determining the destination state. This restricted model
can be tested against a more general specification.²⁷
A time inhomogeneous semi-Markov process emerges as a special case of the
general model if we let

P(R(k) = r(k)|t_k, r_{k−1}, τ(k−1)) = m_{r(k−1), r(k)}(τ(k), t_k),

where

‖m_{ij}(τ, u)‖ = M(τ, u)

is a two parameter family of time (τ) and duration (u) dependent stochastic
matrices, with each element a function of τ and u, and

m_{ii}(τ, u) = 0.

We further define

P(T_k > t_k|t_{k−1}, r_{k−1}, τ(k−1)) = exp(−∫₀^{t_k} h(u|τ(k−1)+u, r(k−1))du).

²⁶Note that without further restrictions on the elements of M, it is not possible to separate λᵢ from
(m_{ii} − 1), so that one might as well normalize m_{ii} = 0.
²⁷Note that in the McFadden model it is not necessary to normalize m_{ii} = 0 to identify λᵢ, because
of the cross row restrictions on M.

With this restricted form of dependence, Y(τ) is a time inhomogeneous semi-
Markov process. (Hoem, 1972, provides a nice expository discussion of such
processes.)
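For the time homogeneous special case above, the construction of Q = Λ(M − I)
and the simulation of a path of Y(τ) are immediate; the three-state rates and
transition matrix below are illustrative assumptions only.

import numpy as np

lam = np.array([0.5, 1.0, 2.0])                # exit rates lambda_i
M = np.array([[0.0, 0.7, 0.3],                 # embedded chain with m_ii = 0
              [0.4, 0.0, 0.6],
              [0.5, 0.5, 0.0]])
Q = np.diag(lam) @ (M - np.eye(3))             # intensity matrix; rows sum to zero

rng = np.random.default_rng(4)
state, tau, path = 0, 0.0, []
while tau < 10.0:
    tau += rng.exponential(1.0 / lam[state])   # waiting time T_k in the current state
    state = rng.choice(3, p=M[state])          # destination drawn from a row of M
    path.append((tau, state))
print(Q)
print(path[:3])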
Moore and Pyke (1968) consider the problem of estimating a time inhomoge-
neous semi-Markov model without observed or unobserved explanatory variables.
The natural estimator for a model without restrictions connecting the parameters
of P(R(k) = r(k)|t_k, r_{k−1}, τ(k−1)) and P(T_k > t_k|t_{k−1}, r_{k−1}, τ(k−1)) breaks
the estimation into two components.
(i) Estimate M(τ, u) by using data on transitions from i to j for observations with
transitions having identical (calendar time τ, duration u) pairs. A special case
of this procedure for a model with no duration dependence in a time
homogeneous environment pools i to j transitions for all spells to estimate
the components of M (see also Billingsley, 1961). Another special case, for a
model with duration dependence in a time homogeneous environment, pools i
to j transitions for all spells of a given duration.
(ii) Estimate P(T_k > t_k|t_{k−1}, r_{k−1}, τ(k−1)) using standard survival methods (as
described in Section 1.3 or in Lawless (1982)) on times between transitions.
(A sketch of this two-component procedure for the simplest time homogeneous
case follows below.)
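A minimal sketch of the two components in the time homogeneous, no-duration-
dependence case (the simulated transition records are an assumption for
illustration): M is estimated from pooled transition counts, and each λᵢ by
exponential maximum likelihood on times between transitions.

import numpy as np

rng = np.random.default_rng(5)
C = 3
# (origin i, destination j, duration) records pooled across persons and spells
origins = rng.integers(0, C, 5000)
dests = (origins + rng.integers(1, C, 5000)) % C   # ensures j != i
durs = rng.exponential(1.0, size=5000)

counts = np.zeros((C, C))
time_in_state = np.zeros(C)
for i, j, t in zip(origins, dests, durs):
    counts[i, j] += 1
    time_in_state[i] += t

M_hat = counts / counts.sum(axis=1, keepdims=True)  # component (i)
lam_hat = counts.sum(axis=1) / time_in_state        # component (ii)
print(M_hat)
print(lam_hat)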

These two estimators are consistent, asymptotically normal, and efficient, and
are independent of each other as the number of persons sampled becomes large.
There is no efficiency gain from joint estimation. The same results carry over if M
and P(T_k > t_k|t_{k−1}, r_{k−1}, τ(k−1)) are parameterized (e.g. elements of M as a
logit, P(T_k > t_k|·) as a general duration model) provided, for example, the
regressors are bounded iid random variables. The two component procedure is
efficient. However, if there are parameter restrictions connecting M and the
conditional survivor functions, the two component estimation procedure produces
inefficient estimators. If M and the conditional survivor functions depend on a
common unobservable, a joint estimation procedure is required to secure a
consistent random effect estimator.

2.2. General duration models for the analysis of event history data

In this section we present a multistate duration model for event history data, i.e.
data that give information on the times at which people change state and on their
transitions. We leave for another occasion the analysis of multistate models
designed for data collected by other sampling plans. This is a major area of
current research.

An equivalent way to derive the densities of duration times and transitions for
the multistate processes described in Section 2.1, one that facilitates the derivation
of the likelihoods presented below, is based on the exit rate concept introduced in
Part I. An individual event history is assumed to evolve according to the following
steps.
(i) At time τ = 0, an individual is in state r₍₀₎ = (i), i = 1,...,C. Given oc-
cupancy of state i, there are Nᵢ ≤ C − 1 possible destinations.²⁸ The limit (as
Δt → 0) of the probability that a person who starts in i at calendar time τ = 0
leaves the state in interval (t₁, t₁+Δt), given regressor path {x(u)}₀^{t₁+Δt} and
unobservable θ, is the conditional hazard or escape rate

lim_{Δt→0} P(t₁ < T₁ ≤ t₁+Δt|r₍₀₎ = (i), 𝒯(0) = 0, x(t₁), θ, T₁ ≥ t₁)/Δt
  = h(t₁|r₍₀₎ = (i), 𝒯(0) = 0, x(t₁), θ).    (2.2.1)

This limit is assumed to exist.

The limit (as Δt → 0) of the probability that a person starting in r₍₀₎ = (i) at
time τ(0) leaves to go to j ≠ i, j ∈ Nᵢ, in interval (t₁, t₁+Δt), given regressor path
{x(u)}₀^{t₁+Δt} and θ, is

lim_{Δt→0} P(t₁ < T₁ ≤ t₁+Δt, R(1) = j|r₍₀₎ = (i), 𝒯(0) = 0, x(t₁), θ, T₁ ≥ t₁)/Δt
  = h(t₁, j|r₍₀₎ = (i), 𝒯(0) = 0, x(t₁), θ).    (2.2.2)

²⁸If some transitions are prohibited, then Nᵢ < C − 1.



From the laws of conditional probability,

Σ_{j=1}^{Nᵢ} h(t₁, j|r₍₀₎ = (i), 𝒯(0) = 0, x(t₁), θ) = h(t₁|r₍₀₎ = (i), 𝒯(0) = 0, x(t₁), θ).

(ii) The probability that a person starting in state i at calendar time τ = 0
survives to T₁ = t₁ is (from the definition of the survivor function in (1.8) and
from hazard (2.2.1))

P(T₁ > t₁|r₍₀₎ = (i), 𝒯(0) = 0, {x(u)}₀^{t₁}, θ)
  = exp(−∫₀^{t₁} h(u|r₍₀₎ = (i), 𝒯(0) = 0, x(u), θ)du).

Thus the density of T₁ is

f(t₁|r₍₀₎ = (i), 𝒯(0) = 0, {x(u)}₀^{t₁}, θ)
  = h(t₁|r₍₀₎ = (i), 𝒯(0) = 0, x(t₁), θ)
  × P(T₁ ≥ t₁|r₍₀₎ = (i), 𝒯(0) = 0, {x(u)}₀^{t₁}, θ).

The density of the joint event R(1) = j and T₁ = t₁ is

f(t₁, j|r₍₀₎ = (i), 𝒯(0) = 0, {x(u)}₀^{t₁}, θ)
  = h(t₁, j|r₍₀₎ = (i), 𝒯(0) = 0, x(t₁), θ)
  × P(T₁ ≥ t₁|r₍₀₎ = (i), 𝒯(0) = 0, {x(u)}₀^{t₁}, θ).

This density is sometimes called a subdensity. Note that

Σ_{j=1}^{Nᵢ} f(t₁, j|r₍₀₎ = (i), 𝒯(0) = 0, {x(u)}₀^{t₁}, θ)
  = f(t₁|r₍₀₎ = (i), 𝒯(0) = 0, {x(u)}₀^{t₁}, θ).

Proceeding in this fashion, one can define densities corresponding to each
duration in the individual's event history. Thus, for an individual who starts in
state r₍ₘ₎ after his mth transition, the subdensity for T_{m+1} = t_{m+1} and R(m+1)
= j, j = 1,...,N_{r(m)}, is

f(t_{m+1}, j|r₍ₘ₎, 𝒯(m) = τ(m), {x(u)}₀^{τ(m+1)}, θ),

where

τ(m+1) = Σ_{n=1}^{m+1} t_n.    (2.2.3)

As in Part I we assume an independent censoring mechanism. The most
commonly encountered form of such a mechanism is upper limit truncation on
the final spell. As noted in Part I, in forming the likelihood we can ignore the
censoring densities.

The conditional density of completed spells T₁,...,T_k and right censored spell
T_{k+1}, given {x(u)}₀^{τ(k)+t_{k+1}}, assuming that 𝒯(0) = 0 is the exogenous start date of
the event history (and so corresponds to the origin date of the sample), is, allowing
for more general forms of dependence,

[∏_{m=0}^{k−1} f(t_{m+1}, r(m+1)|r₍ₘ₎, 𝒯(m) = τ(m), {x(u)}₀^{τ(m+1)}, θ)]
  × P(T_{k+1} > t_{k+1}|r₍ₖ₎, 𝒯(k) = τ(k), {x(u)}₀^{τ(k)+t_{k+1}}, θ).    (2.2.4)

As noted in Section 1.5, it is unlikely that the origin date of the sample coincides with the start date of the event history. Let φ denote the probability density for the random variables describing the events that a person is in state R(0) = r(0) at time τ(0) = 0 with a spell of length t_{1a} (measured after the start of the sample) that ends with an exit to state R(1) = r(1), given {x(u)}_0^{t_{1a}} and θ. The derivation of this density in terms of the intake density k appears in Section 1.5 (see the derivation of the density of T_a). The only new point to notice is that the h in Section 1.5 should be replaced with the appropriate h as defined in (2.2.2). The joint density of (r(0), t_{1c}, r(1)), the completed spell density sampled at τ(0) = 0 terminating in state r(1), is defined analogously.

In a multiple spell model setting in which it is plausible that the process has been in operation prior to the origin date of the sample, intake rate k introduced in Section 1.5 is the density of the random variable describing the event "entered the state r(0) at time τ ≤ 0 and did not leave the state until τ > 0." The expression for k in terms of exit rate (2.2.2) depends on (i) presample values of x and (ii) the date at which the process began. Thus in principle, given (i) and (ii), it is possible to determine the functional form of k. In this context it is plausible that k depends on θ.
The joint likelihood for r(0), t_{1ℓ} (ℓ = a, c), r(1), t₂,...,r(k), t_{k+1} conditional on θ and {x(u)}_{-∞}^{τ(k)+t_{k+1}} for a right censored (k+1)st spell is written

$$g(r(0), t_{1\ell}, r(1), t_2,\ldots,t_k, r(k), t_{k+1} \mid \{x(u)\}_{-\infty}^{\tau(k)+t_{k+1}},\ \theta). \tag{2.2.5}$$

The marginal likelihood obtained by integrating out θ is

$$\int g(r(0), t_{1\ell}, r(1), t_2,\ldots,t_k, r(k), t_{k+1} \mid \{x(u)\}_{-\infty}^{\tau(k)+t_{k+1}},\ \theta)\,d\mu(\theta). \tag{2.2.6}$$

Equation (2.2.6) makes explicit that the date of onset of spell m+1, τ(m+1), depends on the durations of the preceding spells. Accordingly, in a model in which the exit rates (2.2.2) depend on θ, the distribution of time varying x variables (including date of onset of the spell) sampled at the start of each spell depends on θ. Such variables are not (weakly) exogenous or ancillary in duration regression equations, and least squares estimators of models that include such variables are, in general, inconsistent. (See Flinn and Heckman, 1982b.) Provided that in the population x is distributed independently of θ, time varying variables create no econometric problem for maximum likelihood estimators based on density (2.2.6), which accounts for the entire history of the process. However, a maximum likelihood estimator based on a density of the last n < k+1 spells that conditions on τ(k+1−n) or {x(u)}_{-∞}^{τ(k+1−n)}, assuming they are independent of θ, is inconsistent.

Using (2.2.5) and conditioning on T_{1ℓ} = t_{1ℓ} produces conditional likelihood

(2.2.7)

For three reasons, inference based on conditional likelihood (2.2.7) appears to be attractive (see Heckman, 1981b). (1) With this likelihood it is not necessary to specify or estimate the distribution μ(θ). It thus appears possible to avoid one element of arbitrariness in model specification. (2) With this likelihood we avoid the initial conditions problem because φ and {x(u)}_{-∞}^{τ(0)} do not appear in density (2.2.7). (3) Treating θ as a parameter allows for arbitrary dependence between θ and x. These three reasons demonstrate the potential gains that arise from having multiple spell data.²⁹
However, for general duration distributions, inference based on (2.2.7) fit on panel data produces inconsistent estimators. This is so because the conditional likelihood function depends on person specific component θ. Estimating θ as a parameter for each person along with the other parameters of the model produces inconsistent estimators of all parameters if k < ∞ in the available panel data, because the likelihood equations are not in general separable (see Neyman and Scott, 1948). In most panel data sets, k is likely to be small.

No Monte Carlo study of the performance of the inconsistent estimator has been performed. By analogy with the limited Monte Carlo evidence reported in Heckman (1981b) for a fixed effect discrete choice model, if x does not contain lagged values of the dependent variable the inconsistency is likely to be negligible even if the likelihood is fit on short panels. The inconsistency issue may be a matter of only theoretical concern.
Chamberlain (1984), drawing on results due to Andersen (1973, 1980), presents a class of multiple spell duration models for which it is possible to find sufficient or ancillary statistics for θ. Estimation within this class of models avoids the inconsistency problem that arises in likelihoods based on (2.2.7). The class of exponential family distributions for which the Andersen-Chamberlain procedures are valid is very special and does not provide arbitrarily close approximations to a general duration density. Most economically motivated duration models are not likely to be members of the exponential family. With these procedures it is not

²⁹The conditional likelihood cannot be used to analyze single spell data. Estimating θ as a person specific parameter would explain each single spell observation perfectly and no structural parameters of the model would be identified.

possible to estimate duration dependence parameters. These procedures avoid the need to specify or estimate μ(θ) and solve the problem of initial conditions by making very strong and nonrobust assumptions about the functional form of the conditional hazard h(t | x, θ).
The random effect maximum likelihood estimator based on density (2.2.6) is
the estimator that is likely to see the greatest use in multispell models that control
for unobservables. Flinn and Heckman (1982b) and Hotz (1983) have developed a
general computational algorithm called CTM for a likelihood based on (2.2.6)
that has the following features.

(i) It allows for a flexible Box-Cox hazard for (2.2.2) with scalar heterogeneity (see the sketch after this list):

$$h(t \mid x, \theta) = \exp\left(x(t)\beta + \gamma_1\,\frac{t^{\lambda_1}-1}{\lambda_1} + \gamma_2\,\frac{t^{\lambda_2}-1}{\lambda_2} + c\theta\right), \qquad \lambda_1 < \lambda_2, \tag{1.3.2}'$$

where β, γ₁, γ₂, λ₁, λ₂ and c are permitted to depend on the origin state, the destination state and the serial order of the spell. Lagged durations may be included among the x. Using maximum likelihood procedures it is possible to estimate all of these parameters except for one normalization of c.
(ii) It allows for general time varying variables and right censoring. The regres-
sors may include lagged durations.30
(iii) μ(θ) can be specified as either normal, log normal or gamma, or the NPMLE procedure discussed in Section 1.4.1 can be used.³¹
(iv) It solves the left censoring or initial conditions problem by assuming that the
functional form of the initial duration distribution for each origin state is
different from that of the other spells.32
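As a concrete illustration of the Box-Cox hazard (1.3.2)', here is a minimal sketch; it is not the CTM code itself, and all names and parameter values are hypothetical. The transform (t^λ − 1)/λ tends to ln t as λ → 0, so the specification nests Weibull-like and Gompertz-like duration dependence.

```python
import numpy as np

def box_cox(t, lam, eps=1e-8):
    # (t**lam - 1)/lam, with the limiting value log(t) as lam -> 0.
    return np.log(t) if abs(lam) < eps else (t ** lam - 1.0) / lam

def hazard(t, x_t, beta, gam1, gam2, lam1, lam2, c, theta):
    # h(t | x, theta) = exp(x(t)'beta + gam1*B(t;lam1) + gam2*B(t;lam2) + c*theta),
    # lam1 < lam2, as in (1.3.2)'. c is identified only up to one normalization.
    index = x_t @ beta + gam1 * box_cox(t, lam1) + gam2 * box_cox(t, lam2) + c * theta
    return np.exp(index)

# Illustrative call: one regressor, Weibull-like first term (lam1 -> 0).
print(hazard(t=1.5, x_t=np.array([1.0]), beta=np.array([-0.2]),
             gam1=0.4, gam2=0.1, lam1=0.0, lam2=1.0, c=1.0, theta=0.0))
```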

The burden of computing likelihoods based on (2.2.6) is lessened by the following recursive estimation strategy. (1) Integrate out T₂,...,T_{k+1} from (2.2.6) and estimate the parameters of the reduced likelihood. (2) Then integrate out T₃,...,T_{k+1} from (2.2.6) and estimate the parameters of the reduced likelihood, fixing the parameters estimated from stage one. (3) Proceed in this fashion until all parameters are estimated. One Newton step from these parameter values produces efficient maximum likelihood estimators.
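When μ(θ) is specified as normal, as one of the CTM options above allows, the integration over θ in (2.2.6) is routinely approximated by Gauss-Hermite quadrature. The sketch below evaluates a single-spell marginal density under an exponential hazard with multiplicative heterogeneity; it is a stylized illustration, not the CTM algorithm, and all names and values are assumptions.

```python
import numpy as np

def spell_density_given_theta(t, theta, base=0.5):
    # Exponential hazard h = base*exp(theta); density f(t|theta) = h*exp(-h*t).
    h = base * np.exp(theta)
    return h * np.exp(-h * t)

def marginal_density(t, sigma=1.0, n_nodes=20):
    # Integrate f(t|theta) against a N(0, sigma^2) mixing distribution mu(theta)
    # by Gauss-Hermite quadrature (change of variables theta = sqrt(2)*sigma*z).
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    theta = np.sqrt(2.0) * sigma * nodes
    return (weights @ spell_density_given_theta(t, theta)) / np.sqrt(np.pi)

print(marginal_density(1.0))
```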

³⁰The random effect maximum likelihood estimator based on (2.2.6) can be shown to be consistent in the presence of θ with lagged durations included among the x.
³¹The NPMLE procedure of Heckman and Singer (1984b) can be shown to be consistent for
multiple spell data.
³²This procedure is identical to the procedure discussed in Section 1.5.2, using spells that originate
after the origin of the sample.

For more details on the CTM program see Hotz (1983). For further details on
the CTM likelihood function and its derivatives, see Flinn and Heckman (1983).33
For examples of structural multispell duration models see Coleman (1983) and
Flinn and Heckman (1982a).

3. Summary

This paper considers the formulation and estimation of continuous time social
science duration models. The focus is on new issues that arise in applying
statistical models developed in biostatistics to analyze economic data and for-
mulate economic models. Both single spell and multiple spell models are dis-
cussed. In addition, we present a general time inhomogeneous multiple spell
model which contains a variety of useful models as special cases.
Four distinctive features of social science duration analysis are emphasized:

(1) Because of the limited size of samples available in economics and because of
an abundance of candidate observed explanatory variables and plausible
omitted explanatory variables, standard nonparametric procedures used in
biostatistics are of limited value in econometric duration analysis. It is
necessary to control for observed and unobserved explanatory variables to
avoid biasing inference about underlying duration distributions. Controlling
for such variables raises many new problems not discussed in the available
literature.
(2) The environments in which economic agents operate are not the time homoge-
neous laboratory environments assumed in biostatistics and reliability theory.
Ad hoc methods for controlling for time inhomogeneity produce badly biased
estimates.
(3) Because the data available to economists are not obtained from the controlled
experimental settings available to biologists, doing econometric duration
analysis requires accounting for the effect of sampling plans on the distri-
butions of sampled spells.
(4) Econometric duration models that incorporate the restrictions produced by
economic theory only rarely can be represented by the models used by
biostatisticians. The estimation of structural econometric duration models
raises new statistical and computational issues.

³³In Flinn and Heckman (1983) the likelihood is derived using a "competing risks" framework. [See, e.g. Kalbfleisch and Prentice (1980) for a discussion of competing risks models.] This framework is in fact inessential to their approach. A more direct approach starts with hazards (2.2.1) and (2.2.2) that are not based on "latent failure times." This direct approach, given hazard specification (1.3.2)', produces exactly the same estimating equations as are given in their paper.

Because of (1) it is necessary to parameterize econometric duration models to


control for both observed and unobserved explanatory variables. Economic
theory only provides qualitative guidance on the matter of selecting a functional
form for a conditional hazard, and it offers no guidance at all on the matter of
choosing a distribution of unobservables. This is unfortunate because empirical
estimates obtained from econometric duration models are very sensitive to
assumptions made about the functional forms of these model ingredients.
In response to this sensitivity we present criteria for inferring qualitative
properties of conditional hazards and distributions of unobservables from raw
duration data sampled in time homogeneous environments; i.e. from uncondi-
tional duration distributions. No parametric structure need be assumed to imple-
ment these procedures.
We also note that current econometric practice overparameterizes duration
models. Given a functional form for a conditional hazard determined up to a
finite number of parameters, it is possible to consistently estimate the distribution
of unobservables nonparametrically. We report on the performance of such an
estimator and show that it helps to solve the sensitivity problem.
We demonstrate that in principle it is possible to identify both the conditional
hazard and the distribution of unobservables without assuming parametric func-
tional forms for either. Tradeoffs in assumptions required to secure such model
identification are discussed. Although under certain conditions a fully nonpara-
metric model can be identified, the development of a consistent fully nonparamet-
ric estimator remains to be done.
We also discuss conditions under which access to multiple spell data aids in
solving the sensitivity problem. A superficially attractive conditional likelihood
approach produces inconsistent estimators, but the practical significance of this
inconsistency is not yet known. Conditional inference schemes for eliminating
unobservables from multiple spell duration models that are based on sufficient or
ancillary statistics require unacceptably strong assumptions about the functional
forms of conditional hazards and so are not robust. Contrary to recent claims,
they offer no general solution to the model sensitivity problem.
The problem of controlling for time inhomogeneous environments (Point (2))
remains to be solved. Failure to control for time inhomogeneity produces serious
biases in estimated duration models. Controlling for time inhomogeneity creates a
potential identification problem.
For single spell data it is impossible to separate the effect of duration
dependence from the effect of time inhomogeneity by a fully nonparametric
procedure. Although it is intuitively obvious that access to multiple spell data aids
in the solution of this identification problem, the development of precise condi-
tions under which this is possible is a topic left for future research.
We demonstrate how sampling schemes distort the functional forms of sample
duration distributions away from the population duration distributions that are

the usual object of econometric interest (Point (3)). Inference based on mis-
specified duration distributions is in general biased. New formulae for the
densities of commonly used duration measures are produced for duration models
with unobservables in time inhomogeneous environments. We show how access to
spells that begin after the origin date of a sample aids in solving econometric
problems created by the sampling schemes that are used to generate economic
duration data.
We also discuss new issues that arise in estimating duration models explicitly
derived from economic theory (Point (4)). For a prototypical search unemploy-
ment model we discuss and resolve new identification problems that arise in
attempting to recover structural economic parameters. We also consider non-
standard statistical problems that arise in estimating structural models that are
not treated in the literature. Imposing or testing the restrictions implied by
economic theory requires duration models that do not appear in the received
literature and often requires numerical solution of implicit equations derived from
optimizing theory.

References

Amemiya, T. (1981) "Qualitative Response Models: A Survey", Journal of Economic Literature, 19, 1483-1536.
Amemiya, T. (1984) "Tobit Models: A Survey", Journal of Econometrics, 24, 1-63.
Andersen, E. B. (1973) Conditional Inference and Models for Measuring. Copenhagen: Mentalhygiejnisk Forlag.
Andersen, E. B. (1980) Discrete Statistical Models with Social Science Applications. Amsterdam: North-Holland.
Arnold, Barry and P. Brockett (1983) "Identifiability for Dependent Multiple Decrement/Competing Risks Models", Scandinavian Actuarial Journal, 10, 117-127.
Baker, G. and P. Trivedi (1982) "Methods for Estimating the Duration of Periods of Unemployment". Australian National University Working Paper.
Australian National University Working Paper.
Barlow, R. E. and F. Proschan (1975) Statistical Theory of Reliability and Life Testing. New York:
Holt, Rinehart and Winston.
Barlow, R. E., D. J. Bartholomew, J. M. Bremner and H. D. Brunk (1972) Statistical Inference Under
Order Restrictions. London: Wiley.
Billingsley, P. (1961) Statistical Inference for Markov Processes. Chicago: University of Chicago Press.
Braun, H. and J. Hoem (1979) “Modelling Cohabitational Birth Intervals in the Current Danish
Population: A Progress Report”. Copenhagen University, Laboratory of Actuarial Mathematics,
working paper no. 24.
Burdett, K. and D. Mortensen (1978) “Labor Supply under Uncertainty”, in: R. Ehrenberg, ed.,
Research in Labor Economics. London: JAI Press, 2, 109-157.
Chamberlain, G. (1985) "Heterogeneity, Duration Dependence and Omitted Variable Bias", in: J. Heckman and B. Singer, eds., Longitudinal Analysis of Labor Market Data. New York: Cambridge University Press.
Chamberlain, G. (1980) "Comment on Lancaster and Nickell", Journal of the Royal Statistical Society, Series A, 160.
Coleman, T. (1983) "A Dynamic Model of Labor Supply under Uncertainty". University of Chicago, presented at the 1983 Summer Meetings of the Econometric Society, Evanston, Ill., unpublished manuscript.

Cosslett, S. (1981) "Efficient Estimation of Discrete Choice Models", in: C. Manski and D. McFadden, eds., Structural Analysis of Discrete Data with Econometric Applications. Cambridge: MIT Press, 41-112.
Cox, D. R. (1962) Renewal Theory. London: Methuen.
Cox, D. R. (1972) “Regression Models and Lifetables”, Journal of the Royal Statistical Society, Series
B, 34, 187-220.
Cox, D. R. and D. Hinkley (1974) Theoretical Statistics. London: Chapman and Hall.
Cox, D. R. and P. A. W. Lewis (1966) The Statistical Analysis of u Series of Events. London: Chapman
and Hall.
Cox, D. R. and D. Oakes (1984) Analysis of Survival Data. London: Chapman and Hall.
DeGroot, M. (1970) Optimal Statistical Decisions. New York: McGraw-Hill.
Domencich, T. and D. McFadden (1975) Urban Travel Demand. Amsterdam: North-Holland.
Elbers, C. and G. Ridder (1982) “True and Spurious Duration Dependence: The Identifiability of the
Proportional Hazard Model”, Review of Economic Studies, 49, 403-410.
Feller, W. (1970) An Introduction to Probability Theory and Its Applications. New York: Wiley, Vol. I,
third edition.
Feller, W. (1971) An Introduction to Probability Theory and Its Applications. New York: Wiley, Vol. II.
Flinn, C. and J. Heckman (1982a) "New Methods for Analyzing Structural Models of Labor Force Dynamics", Journal of Econometrics, 18, 115-168.
Flinn, C. and J. Heckman (1982b) "Models for the Analysis of Labor Force Dynamics", in: R. Basmann and G. Rhodes, eds., Advances in Econometrics, 1, 35-95.
Flinn, C. and J. Heckman (1982c) "Comment on 'Individual Effects in a Nonlinear Model: Explicit Treatment of Heterogeneity in the Empirical Job Search Literature'", unpublished manuscript, University of Chicago.
Flinn, C. and J. Heckman (1983) "The Likelihood Function for the Multistate-Multiepisode Model in 'Models for the Analysis of Labor Force Dynamics'", in: R. Basmann and G. Rhodes, eds., Advances in Econometrics. Greenwich: JAI Press, 3.
Hartigan, J. and P. Hartigan (1985) "The Dip Test of Unimodality", The Annals of Statistics, 13(1), 70-84.
Hauser, J. R. and K. Wisniewski (1982a) "Dynamic Analysis of Consumer Response to Marketing Strategies", Management Science, 28, 455-486.
Hauser, J. R. and K. Wisniewski (1982b) “Application, Predictive Test and Strategy Implications for a
Dynamic Model of Consumer Response”, Marketing Science, 1, 143-179.
Heckman, J. (1981a) “Statistical Models for Discrete Panel Data”, in: C. Manski and D. McFadden
eds., The Structural Analysis of Discrete Data. Cambridge: MIT Press.
Heckman, J. (1981b) "The Incidental Parameters Problem and the Problem of Initial Conditions in Estimating a Discrete Time-Discrete Data Stochastic Process", in: C. Manski and D. McFadden, eds., Structural Analysis of Discrete Data with Econometric Applications. Cambridge: MIT Press, 179-197.
Heckman, J. (1974) “Shadow Prices, Market Wages and Labor Supply”, Econometrica, 42(4),
679-694.
Heckman, J. and G. Borjas (1980) “Does Unemployment Cause Future Unemployment? Definitions,
Questions and Answers from a Continuous Time Model of Heterogeneity and State Dependence”,
Economica, 47, 247-283.
Heckman, J. and B. Singer (1982) “The Identification Problem in Econometric Models for Duration
Data”, in: W. Hildenbrand, ed., Advances in Econometrics. Proceedings of World Meetings of the
Econometric Society, 1980. Cambridge: Cambridge University Press.
Heckman, J. and B. Singer (1983) “The Identifiability of Nonproportional Hazard Models”. Univer-
sity of Chicago, unpublished manuscript.
Heckman, J. and B. Singer (1984a) “The Identifiability of the Proportional Hazard Model”, Review of
Economic Studies, 51(2), 231-243.
Heckman, J. and B. Singer (1984b) "A Method for Minimizing the Impact of Distributional Assumptions in Econometric Models for Duration Data", Econometrica, 52(2), 271-320.
Hoem, J. (1972) "Inhomogeneous Semi-Markov Processes, Select Actuarial Tables and Duration Dependence in Demography", in: T. Greville, ed., Population Dynamics. New York: Academic Press, 251-296.

Hotz, J. (1983) "Continuous Time Models (CTM): A Manual". GSIA, Pittsburgh: Carnegie-Mellon University.
Jovanovic, B. (1979) "Job Matching and the Theory of Turnover", Journal of Political Economy, October, 87, 972-990.
Kac, M. (1959) Probability and Related Topics in Physical Sciences. New York: Wiley.
Kalbfleisch, J. and R. Prentice (1980) The Statistical Analysis of Failure Time Data. New York: Wiley.
Kiefer, N. and G. Neumann (1981) "Individual Effects in a Nonlinear Model", Econometrica, 49(4), 965-980.
Lancaster, T. and S. Nickell (1980) "The Analysis of Reemployment Probabilities for the Unemployed", Journal of the Royal Statistical Society, Series A, 143, 141-165.
Lawless, J. F. (1982) Statistical Models and Methods for Lifetime Data. New York: Wiley.
Lewis, H. G. (1974) “Comments on Selectivity Biases in Wage Comparisons”, Journal of Political
Economy, November, 82(6), 1145-1156.
Lindsay, B. (1983a) "The Geometry of Mixture Likelihoods, Part I", Annals of Statistics, 11, 86-94.
Lindsay, B. (1983b) "The Geometry of Mixture Likelihoods, Part II", Annals of Statistics, 11(3), 783-792.
Lippman, S. and J. McCall (1976) “The Economics of Job Search: A Survey”, Economic Inquiry,
September, 14, 113-126.
Lundberg, F. (1903) "I. Approximerad Framställning af Sannolikhetsfunktionen. II. Återförsäkring af Kollektivrisker". Uppsala: Almquist und Wicksell.
Lundberg, S. (1981) “The Added Worker: A Reappraisal”. NBER Working Paper no. 706, Cam-
bridge, Mass.
Manski, C. and D. McFadden (1981) "Alternative Estimators and Sample Designs for Discrete Choice Analysis", in: C. Manski and D. McFadden, eds., Structural Analysis of Discrete Data with Econometric Applications. Cambridge: MIT Press, 2-50.
Manski, C. and S. Lerman (1977) “The Estimation of Choice Probabilities from Choice Based
Samples”, Econometrica, 45, 1977-1988.
McFadden, D. (1974) “Conditional Logit Analysis of Qualitative Choice Behavior”, in: P. Zarembka,
ed., Frontiers in Econometrics. New York: Academic Press.
Moore, E. and R. Pyke (1968) “Estimation of the Transition Distributions of a Markov Renewal
Process”, Annals of the Institute of Statistical Mathematics. Tokyo, 20(3), 411-424.
Neyman, J. and E. Scott (1948) "Consistent Estimates Based on Partially Consistent Observations", Econometrica, 16, 1-32.
Robb, R. (1984) “Two Essays on the Identification of Economic Models”. University of Chicago,
May, unpublished manuscript.
Robbins, H. (1970) "Optimal Stopping", American Mathematical Monthly, 77, 333-343.
Ross, S. M. (1970) Applied Probability Models with Optimization Applications. San Francisco: Holden-
Day.
Rudin, W. (1974) Real and Complex Analysis. New York: McGraw-Hill.
Salant, S. (1977) “Search Theory and Duration Data: A Theory of Sorts”, Quarterly Journal of
Economics, February, 91, 39-57.
Sheps, M. and J. Menken (1973) Mathematical Models of Conception and Birth. Chicago: University of
Chicago Press.
Shohat, J. and J. Tamarkin (1943) The Problem of Moments. New York: American Mathematical Society.
Singer, B. (1982) “Aspects of Nonstationarity”, Journal of Econometrics, 18(l), 169-190.
Trussell, J. and T. Richards (1985) "Correcting for Unobserved Heterogeneity in Hazard Models: An Application of the Heckman-Singer Procedure", in: N. Tuma, ed., Sociological Methodology. San Francisco: Jossey-Bass.
Chapter 30

DEMAND ANALYSIS

ANGUS DEATON

0. Introduction

The empirical analysis of consumer behavior has always held a central position in
econometrics and many of what are now standard techniques were developed in
response to practical problems in interpreting demand data. An equally central
position in economic analysis is held by the theory of consumer behavior which
has provided a structure and language for model formulation and data analysis.
Demand analysis is thus in the rare position in econometrics of possessing long
interrelated pedigrees on both theoretical and empirical sides. And although the
construction of models which are both theoretically and empirically satisfactory is
never straightforward, no one who reads the modem literature on labor supply,
on discrete choice, on asset demands, on transport, on housing, on the consump-
tion function, on taxation or on social choice, can doubt the current vigor and
power of utility analysis as a tool of applied economic reasoning. There have been
enormous advances towards integration since the days when utility theory was
taught as a central element in microeconomic courses but then left unused by
applied economists and econometricians.
Narrowly defined, demand analysis is a small subset of the areas listed above,
referring largely to the study of commodity demands by consumers, most usually
based on aggregate data but occasionally, and more so recently, on cross-sections
or even panels of households. In this chapter, I shall attempt to take a somewhat
broader view and discuss, if only briefly, the links between conventional demand
analysis and such topics as labor supply, the consumption function, rationing,
index numbers, equivalence scales and consumer surplus. Some of the most
impressive recent econometric applications of utility theory are in the areas of
labor supply and discrete choice, and these are covered in other chapters. Even so,
a very considerable menu is left for the current meal. Inevitably, the choice of
material is my own, is partial (in both senses), and does not pretend to be a
complete survey of recent developments. Nor have I attempted to separate the
economic from the statistical aspects of the subject. The strength of consumer
demand analysis has been its close articulation of theory and evidence and the
theoretical advances which have been important (particularly those concerned
with duality) have been so precisely because they have permitted a more intimate
contact between the theory and the interpretation of the evidence. It is not
possible to study applied demand analysis without keeping statistics and economic theory simultaneously in view.
The layout of the chapter is as follows. Section 1 is concerned with utility and
the specification of demand functions and attempts to review the theory from the

point of view of applied econometrics. Duality aspects are particularly emphasized. Section 2 covers what I shall call ‘naive’ demand analysis, the estima-
tion and testing, largely on aggregate time series data, of ‘complete’ systems of
demand equations linking quantities demanded to total expenditure and prices.
The label “naive” implies simplicity neither in theory nor in econometric tech-
nique. Instead, the adjective refers to the belief that, by itself, the simple, static,
neoclassical model of the individual consumer could (or should) yield an adequate
description of aggregate time-series data. Section 3 is concerned with microeco-
nomic or cross-section analysis including the estimation of Engel curves, the
treatment of demographic variables, and the particular econometric problems
which arise in such contexts. There is also a brief discussion of the econometric
issues that arise when consumers face non-linear budget constraints. Sections 4
and 5 discuss two theoretical topics of considerable empirical importance, sep-
arability and aggregation. The former provides the analysis underpinning econo-
metric analysis of subsystems on the one hand and of aggregates, or supersystems,
on the other. The latter provides what justification there is for grouping over
different consumers. Econometric analysis of demand under conditions of ration-
ing or quantity constraints is discussed in Section 6. Section 7 provides a brief
overview of three important topics which, for reasons of space, cannot be covered
in depth, namely, intertemporal demand analysis, including the analysis of the
consumption function and of durable goods, the choice over qualities, and the
links between demand analysis and welfare economics, particularly as concerns
the measurement of consumer surplus, cost-of-living index numbers and the costs
of children. Many other topics are inevitably omitted or dealt with less fully than
is desirable; some of these are covered in earlier surveys by Goldberger (1967),
Brown and Deaton (1972) and Barten (1977).

1. Utility and the specification of demand

1.1. Assumptions for empirical analysis

As is conventional, I begin with the specification of preferences. The relationship


“is at least as good as”, written $\succeq$, is assumed to be reflexive, complete, transitive and continuous. If so, it may be represented by a utility function, u(q) say, defined over commodity vector q with the property that the statement $q^A \succeq q^B$ for vectors $q^A$ and $q^B$ is equivalent to the statement $u(q^A) \ge u(q^B)$. Clearly, for most purposes, it is more convenient to work with a utility function than with a preference ordering. There seem few prior empirical grounds for objecting to reflexivity, completeness, transitivity or continuity, nor indeed to the assumption that u(q) is monotone increasing in q. Again, for empirical work, there is little

objection to the assumption that preferences are convex, i.e. that for $q^A \succeq q^B$ and for $0 \le \lambda \le 1$, $\lambda q^A + (1-\lambda)q^B \succeq q^B$. This translates immediately into quasi-concavity of the utility function u(q), i.e. for $q^A$, $q^B$, $0 \le \lambda \le 1$,

$$u(q^A) \ge u(q^B) \text{ implies } u(\lambda q^A + (1-\lambda)q^B) \ge u(q^B). \tag{1}$$

Henceforth, I shall assume that the consumer acts so as to maximise the


monotone, continuous and quasi-concave utility function u(q).
It is common, in preparation for empirical work, to assume, in addition to the
above properties, that the utility function is strictly quasi-concave (so that for
0 < λ < 1 the second inequality in (1) is strict), differentiable, and that all goods
are essential, i.e. that in all circumstances all goods are bought. All these
assumptions are convenient in particular situations. But they are all restrictive
and all rule out phenomena that are likely to be important in some empirical
situations. Figure 1 illustrates in two dimensions. All of the illustrated indiffer-
ence curves are associated with quasi-concave utility functions, but only A is
either differentiable or strictly quasi-concave. The flat segments on B and C
would be ruled out by strict quasi-concavity; hence, strictness ensures single-val-

Figure 1. Indifference curves illustrating quasi-concavity, differentiability and essential goods.

ued demand functions. Empirically, flats are important because they represent
perfect substitutes; for example, between S and T on B, the precise combination
of q1 and q2 makes no difference and this situation is likely to be relevant, say,
for two varieties of the same good. Non-differentiabilities occur at the kink points
on the curves B and C. With a linear budget constraint, kinks imply that for
relative prices within a certain range, two or more goods are bought in fixed
proportions. Once again, this may be practically important and fixed relationships
between complementary goods are often a convenient and sensible modelling
strategy. The n-dimensional analogue of the utility function corresponding to C is
the fixed coefficient or Leontief utility function

$$u(q) = \min\{q_1/a_1,\ldots,q_n/a_n\}, \tag{2}$$

for positive parameters $a_1,\ldots,a_n$. Finally, curve A illustrates the situation where $q_2$ is essential but $q_1$ is not. As $q_2$ tends to zero, its marginal value relative to that of $q_1$ tends to infinity along any given indifference curve. Many commonly used utility functions impose this condition, which implies that $q_2$ is always purchased in positive amounts. But for many goods, the behavior with respect to $q_1$ is a better guide; if $p_1/p_2$ exceeds the marginal rate of substitution at $q_1 = 0$, the consumer on indifference curve A buys none of $q_1$. Data on individual households always show that, even for quite broad commodity groups, many households do not buy all goods. It is therefore necessary to have models that can deal with this fact.

1.2. Lagrangians and matrix methods

If u(q) is strictly quasi-concave and differentiable, the maximization of utility


subject to the budget constraint can be handled by Lagrangian techniques.
Writing the constraint $p \cdot q = x$ for price vector p and total expenditure x, the first-order conditions are

$$\frac{\partial u(q)}{\partial q_i} = \lambda p_i, \tag{3}$$

which, under the given assumptions, solve for the demand functions

$$q_i = g_i(x, p). \tag{4}$$

For example, the linear expenditure system has utility function

$$u = \prod_i (q_i - \gamma_i)^{\beta_i}, \tag{5}$$

for parameters γ and β, the first-order conditions of which are readily solved to
give the demand functions

$$p_i q_i = p_i\gamma_i + \beta_i(x - p\cdot\gamma). \tag{6}$$
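The linear expenditure system (6) is simple enough to compute directly. A minimal sketch with invented parameter values, which also verifies adding-up (expenditures sum to x when the β_i sum to one):

```python
import numpy as np

def les_expenditures(x, p, gamma, beta):
    # p_i*q_i = p_i*gamma_i + beta_i*(x - p.gamma), eq. (6); beta must sum to 1.
    supernumerary = x - p @ gamma
    return p * gamma + beta * supernumerary

p = np.array([1.0, 2.0, 0.5])
gamma = np.array([2.0, 1.0, 4.0])     # subsistence quantities (illustrative)
beta = np.array([0.5, 0.3, 0.2])      # marginal budget shares, sum to 1
e = les_expenditures(x=20.0, p=p, gamma=gamma, beta=beta)
print(e, e.sum())                      # adding-up: the sum equals x = 20
```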

In practice, the first-order conditions are rarely analytically soluble even for quite
simple formulations (e.g. Houthakker’s (1960) “direct addilog” $u = \sum_k \alpha_k q_k^{\beta_k}$), nor
is it at all straightforward to pass back from given demand functions to a closed
form expression for the utility function underlying them, should it indeed exist.
The generic properties of demands are frequently derived from (3) by total
differentiation and matrix inversion to express dq as a function of dx and dp, the
so-called “fundamental matrix equation” of consumer demand analysis, see
Barten (1966) originally and its frequent later exposition by Theil, e.g. (1975b, pp. 14ff), also Phlips (1974, 1983, p. 47), Brown and Deaton (1972, pp. 1160-2).
However, such an analysis requires that u(q) be twice-differentiable, and it is
usually assumed in addition that utility has been monotonically transformed so
that the Hessian is non-singular and negative definite. Neither of these last
assumptions follows in any natural way from reasonable axioms; note in particu-
lar that it is not always possible to transform a quasi-concave function by means of a monotone increasing function into a concave one, see Kannai (1977), Afriat
(1980). Hence, the methodology of working through first-order conditions in-
volves an expansive and complex web of restrictive and unnatural assumptions,
many of which preclude consideration of phenomena requiring analysis. Even in
the hands of experts, e.g. the survey by Barten and Bohm (1980) the analytical
apparatus becomes very complex. At the same time, the difficulty of solving the
conditions in general prevents a close connection between preferences and
demand, between the a priori and the empirical.

1.3. Duality, cost functions and demands

There are many different ways of representing preferences and great convenience
can be obtained by picking that which is most appropriate for the problem at
hand. For the purposes of generating empirically useable models in which
quantities are a function of prices and total expenditure, dual representations are
typically most convenient. In this context, duality refers to a switch of variables,
from quantities to prices, and to the respecification of preferences in terms of the
latter. Define the cost function, sometimes expenditure function, by

$$c(u, p) = \min_q\{\,p\cdot q\;;\;u(q) \ge u\,\}. \tag{7}$$



If x is the total budget to be allocated, then x will be the cheapest way of


reaching whatever u can be reached at p and x, so that

$$c(u, p) = x. \tag{8}$$

The function c(u, p) can be shown to be continuous in both its arguments,


monotone increasing in u and monotone non-decreasing in p. It is linearly
homogeneous and concave in prices, and first and second differentiable almost everywhere. It is strictly quasi-concave if u(q) is differentiable and everywhere
differentiable if u(q) is strictly quasi-concave. For proofs and further discussions
see McFadden (1978), Diewert (1974a), (1980b) or, less rigorously, Deaton and
Muellbauer (1980a, Chapter 2).
The empirical importance of the cost function lies in two features. The first is
the ‘derivative property’, often known as Shephard’s Lemma, Shephard (1953). By
this, whenever the derivative exists

$$\frac{\partial c(u, p)}{\partial p_i} = h_i(u, p) = q_i. \tag{9}$$

The functions $h_i(u, p)$ are known as Hicksian demands, in contrast to the


Marshallian demands $g_i(x, p)$. The second feature is the Shephard-Uzawa duality theorem [again see McFadden (1978) or Diewert (1974a), (1980b)] which,
given convex preferences, allows a constructive recovery of the utility function
from the cost function. Hence, all the information in u(q) which is relevant to
behavior and empirical analysis is encoded in the function c(u, p). Or put
another way, any function c(u, p) with the correct properties can serve as an
alternative to u(q) as a basis for empirical analysis. The direct utility function
need never be explicitly evaluated or derived; if the cost function is correctly
specified, corresponding preferences always exist. The following procedure is thus
suggested in empirical work. Starting from some linearly homogeneous concave
cost function c(u, p), derive the Hicksian demand functions hi(u, p) by differ-
entiation. These can be converted into Marshallian demands by substituting for u
from the inverted form of (8); this is written

$$u = \psi(x, p), \tag{10}$$

and is known as the indirect utility function. (The original function u(q) is the direct utility function and the two are linked by the identity $\psi(x, p) = u\{g(x, p)\}$ for utility maximizing demands g(x, p).) Substituting (10) into (9) yields

$$q_i = h_i(u, p) = h_i\{\psi(x, p), p\} = g_i(x, p), \tag{11}$$

which can then be estimated. Of course, the demands corresponding to the


original cost function may not fit the data or may have other undesirable
properties for the purpose at hand. To build this back into preferences, we must
be able to go from $g_i(x, p)$ back to c(u, p). But, from Shephard’s Lemma, $q_i = g_i(x, p)$ may be rewritten as

$$\frac{\partial c(u, p)}{\partial p_i} = g_i\{c(u, p), p\}, \tag{12}$$

which may be solved for c(u, p) provided the mathematical integrability condi-
tions are satisfied. These turn out to be equivalent to Slutsky symmetry, so that
demand functions displaying symmetry always imply some cost function, see, for
example, Hurwicz and Uzawa (1971) for further details. If the Slutsky matrix is
also negative semi-definite (together with symmetry, the ‘economic’ integrability
condition), the cost function will be appropriately concave which it must be to
represent preferences. This possibility, of moving relatively easily between prefer-
ences and demands, is of vital importance if empirical knowledge is to be linked
to economic theory.
An alternative and almost equally straightforward procedure is to start from
the indirect utility function ψ(x, p). This must be zero degree homogeneous in x and p and quasi-convex in p, and Shephard’s Lemma takes the form

$$q_i = g_i(x, p) = -\,\frac{\partial\psi(x, p)/\partial p_i}{\partial\psi(x, p)/\partial x}, \tag{13}$$

a formula known as Roy’s identity, Roy (1942). This is sometimes done in “normalized” form. Clearly, $\psi(x, p) = \psi(1, p/x) = \psi^*(r)$ where $r = p/x$ is the vector of normalized prices. Hence, using $\psi^*$ instead of ψ, Roy’s identity can be written in the convenient form

$$w_i = \frac{p_i q_i}{x} = \frac{\partial\psi^*/\partial\log r_i}{\sum_k \partial\psi^*/\partial\log r_k} = \frac{\partial\log c(u, p)}{\partial\log p_i}, \tag{14}$$

where the last equality follows from rewriting (9).
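Roy’s identity in the share form (14) can be verified numerically for any indirect utility function. The sketch below is illustrative only: it uses the Cobb-Douglas case ψ*(r) = −Σ_k β_k ln r_k (an assumption chosen for transparency), for which the budget shares should come out as the constants β_i.

```python
import numpy as np

def psi_star(log_r, beta):
    # Cobb-Douglas indirect utility in normalized prices r = p/x (illustrative).
    return -(beta @ log_r)

def shares_by_roy(log_r, beta, h=1e-6):
    # w_i = (dpsi*/dlog r_i) / sum_k (dpsi*/dlog r_k), eq. (14), by central differences.
    grad = np.empty_like(log_r)
    for i in range(log_r.size):
        bump = np.zeros_like(log_r)
        bump[i] = h
        grad[i] = (psi_star(log_r + bump, beta) - psi_star(log_r - bump, beta)) / (2 * h)
    return grad / grad.sum()

beta = np.array([0.4, 0.35, 0.25])
print(shares_by_roy(np.log(np.array([1.0, 2.0, 0.5]) / 30.0), beta))  # ~ beta
```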


One of the earliest and best practical examples of the use of these techniques is
Samuelson’s (1947-8) derivation of the utility function (5) from the specification
of the linear expenditure system suggested earlier by Klein and Rubin (1947-8).
A more recent example is provided by the following. In 1943, Holbrook Working
suggested that a useful form of Engel curve was given by expressing the budget
share of good i, wi, as a linear function of the logarithm of total expenditure.

Hence,

$$w_i = \alpha_i + \beta_i \ln x, \tag{15}$$

for parameters α and β, generally functions of prices, and this form was supported in later comparative tests by Leser (1963). From (14) the budget shares are the logarithmic derivatives of the cost function, so that (15) corresponds to differential equations of the form

$$\frac{\partial\ln c(u, p)}{\partial\ln p_i} = \alpha_i(p) + \beta_i(p)\ln c(u, p), \tag{16}$$

which give a solution of the general form

$$\ln c(u, p) = u\ln b(p) + (1-u)\ln a(p), \tag{17}$$

where $\alpha_i(p) = (a_i\ln b - b_i\ln a)/(\ln b - \ln a)$ and $\beta_i(p) = (b_i - a_i)/(\ln b - \ln a)$ for $a_i = \partial\ln a/\partial\ln p_i$ and $b_i = \partial\ln b/\partial\ln p_i$. The form (17) gives the cost function as a utility-weighted geometric mean of the linearly homogeneous functions a(p) and b(p) representing the cost functions of the very poor (u = 0) and the very rich (u = 1) respectively. Such preferences have been called the PIGLOG class by Muellbauer (1975b), (1976a), (1976b). A full system of demand equations within the Working-Leser class can be generated by suitable choice of the functions b(p) and a(p). For example, if

$$\ln a(p) = \alpha_0 + \sum_k \alpha_k\ln p_k + \tfrac{1}{2}\sum_k\sum_m \gamma^*_{km}\ln p_k\ln p_m,$$
$$\ln b(p) = \ln a(p) + \beta_0\prod_k p_k^{\beta_k}, \tag{18}$$

we reach the “almost ideal demand system” (AIDS) of Deaton and Muellbauer (1980b), viz.

$$w_i = \alpha_i + \sum_j \gamma_{ij}\ln p_j + \beta_i\ln(x/P), \tag{19}$$

where $\ln P = \alpha_0 + \sum_k \alpha_k\ln p_k + \tfrac{1}{2}\sum_k\sum_m \gamma_{km}\ln p_k\ln p_m$ and $\gamma_{ij} = \tfrac{1}{2}(\gamma^*_{ij} + \gamma^*_{ji})$.
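Because (19) is linear in ln p and ln(x/P) given the price index, AIDS budget shares are cheap to compute. A minimal sketch with invented parameter values satisfying the adding-up restrictions (Σα_i = 1, Σβ_i = 0, rows of γ summing to zero):

```python
import numpy as np

def aids_shares(x, p, a0, alpha, beta, gamma):
    # w_i = alpha_i + sum_j gamma_ij*ln p_j + beta_i*ln(x/P), eq. (19), where
    # ln P = a0 + alpha'ln p + 0.5*ln p' gamma ln p; gamma is the symmetrized matrix.
    lp = np.log(p)
    lnP = a0 + alpha @ lp + 0.5 * lp @ gamma @ lp
    return alpha + gamma @ lp + beta * (np.log(x) - lnP)

alpha = np.array([0.4, 0.35, 0.25])          # sum to 1
beta = np.array([0.05, -0.02, -0.03])        # sum to 0
gamma = np.array([[ 0.02, -0.01, -0.01],
                  [-0.01,  0.02, -0.01],
                  [-0.01, -0.01,  0.02]])    # symmetric, rows sum to 0
w = aids_shares(x=50.0, p=np.array([1.0, 1.5, 0.8]), a0=0.0,
                alpha=alpha, beta=beta, gamma=gamma)
print(w, w.sum())                            # shares sum to one
```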
A variation on the same theme is to replace the geometric mean (17) by a mean of order ε

$$c(u, p) = \{u\,b(p)^\varepsilon + (1-u)\,a(p)^\varepsilon\}^{1/\varepsilon}, \tag{20}$$



with Engel curves

$$w_i = \alpha_i + \beta_i x^{-\varepsilon}. \tag{21}$$

This is Muellbauer’s PIGL class; equation (21), in an equivalent Box-Cox form, has recently appeared in the literature as the “generalized Working model”, see Tran Van Hoa, Ironmonger and Manning (1983) and Tran Van Hoa (1983).
I shall return to these and similar models below, but for the moment note how
the construction of these models allows empirical knowledge of demands to be
built into the specification of preferences. This works at a less formal level too.
For example, prior information may relate to the shape of indifference curves, say
that two goods are poor substitutes or very good substitutes as the case may be.
This translates directly into curvature properties of the cost function; ‘kinks’ in
quantity space turn into ‘flats’ in price space and vice versa so that the specifica-
tion can be set accordingly. For further details, see the elegant diagrams in
McFadden (1978).
The duality approach also provides a simple demonstration of the generic
properties of demand functions which have played such a large part in the testing
of consumer rationality, see Section 2 below. The budget constraint implies
immediately that the demand functions add-up (trivially) and that they are
zero-degree homogeneous in prices and total expenditure together (since the
budget constraint is unaffected by proportional changes in p and x). Shephard’s
Lemma (9) together with the mild regularity conditions required for Young’s
Theorem implies that

$$\frac{\partial h_i}{\partial p_j} = \frac{\partial^2 c}{\partial p_j\,\partial p_i} = \frac{\partial^2 c}{\partial p_i\,\partial p_j} = \frac{\partial h_j}{\partial p_i}, \tag{22}$$

so that, if $s_{ij}$, the Slutsky substitution term, is $\partial h_i/\partial p_j$, the matrix of such terms, S, is symmetric. Furthermore, since c(u, p) is a concave function of p, S must be negative semi-definite. (Note that the homogeneity of c(u, p) implies that p lies in the nullspace of S.) Of course, S is not directly observed, but it can be evaluated using (12); differentiating with respect to $p_j$ gives the Slutsky equation

$$s_{ij} = \frac{\partial g_i}{\partial p_j} + q_j\,\frac{\partial g_i}{\partial x}. \tag{23}$$

Hence to the extent that $\partial g_i/\partial p_j$ and $\partial g_i/\partial x$ can be estimated econometrically, symmetry and negative semi-definiteness can be checked. I shall come to practical
attempts to do so in the next section.
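Such a check requires nothing more than derivatives of the estimated demand functions. The sketch below is illustrative: it reuses the linear expenditure system of (6) with invented parameters, evaluates (23) by finite differences, and inspects symmetry and the eigenvalues of the symmetrized matrix.

```python
import numpy as np

def les_q(x, p, gamma=np.array([2.0, 1.0, 4.0]), beta=np.array([0.5, 0.3, 0.2])):
    # Marshallian LES quantities: q_i = gamma_i + beta_i*(x - p.gamma)/p_i.
    return gamma + beta * (x - p @ gamma) / p

def slutsky(x, p, g=les_q, h=1e-5):
    # s_ij = dg_i/dp_j + q_j * dg_i/dx, eq. (23), by central differences.
    n = p.size
    q = g(x, p)
    dgdx = (g(x + h, p) - g(x - h, p)) / (2 * h)
    S = np.empty((n, n))
    for j in range(n):
        dp = np.zeros(n); dp[j] = h
        S[:, j] = (g(x, p + dp) - g(x, p - dp)) / (2 * h) + q[j] * dgdx
    return S

S = slutsky(20.0, np.array([1.0, 2.0, 0.5]))
print(np.max(np.abs(S - S.T)))               # ~0: symmetry
print(np.linalg.eigvalsh((S + S.T) / 2))     # nonpositive: negative semi-definite
```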

1.4. Inverse demand functions

In practical applications, it is occasionally necessary to estimate prices as a


function of quantities rather than the other way round. An approach to specifica-
tion exists for this case which is precisely analogous to that suggested above.
From the direct utility function and the first-order conditions (3), apply the budget constraint $p \cdot q = x$ to give

$$\frac{p_i q_i}{x} = \frac{\partial u/\partial\ln q_i}{\sum_k \partial u/\partial\ln q_k}, \tag{24}$$

which is the dual analogue of (14), though now determination goes from the quantities q to the normalized prices p/x. Alternatively, define the distance function d(u, q), dual to the cost function, by

$$d(u, q) = \min_p\{\,p\cdot q\;;\;\psi(1, p) \le u\,\}. \tag{25}$$

The distance function has properties analogous to the cost function and, in particular,

$$\frac{\partial d(u, q)}{\partial q_i} = a_i(u, q) = \frac{p_i}{x}, \tag{26}$$
are the inverse compensated demand functions relating an indifference curve u
and a quantity ray q to the price to income ratios at the intersection of q and u.
See McFadden (1978), Deaton (1979) or Deaton and Muellbauer (1980a, Chapter
2.7) for fuller discussions.
Compensated and uncompensated inverse demand functions can be used in
exactly the same way as direct demand functions and are appropriate for the
analysis of situations when quantities are predetermined and prices adjust to clear
the market. Hybrid situations can also be analysed with some prices fixed and
some quantities fixed; again see McFadden (1978) for discussion of “restricted”
preference representation functions. Note one final point, however. The Hessian
matrix of the distance function d(u, q) is the Antonelli matrix A with elements

$$a_{ij} = \frac{\partial^2 d}{\partial q_i\,\partial q_j} = a_{ji} = \frac{\partial a_i(u, q)}{\partial q_j}, \tag{27}$$
which can be used to define q-substitutes and q-complements just as the Slutsky
matrix defines p-substitutes and p-complements, see Hicks (1956) for the original
discussion and derivations. Unsurprisingly the Antonelli and Slutsky matrices are
intimately related and, given the close parallel between duality and matrix inversion,

it is appropriate that they should be generalized inverses of one another. For example, using ∇ to denote the vector of price or quantity partial derivatives, (9) and (26) combine to yield

$$q = \nabla c\{u, \nabla d(u, q)\}. \tag{28}$$


Hence, differentiating with respect to p/x and repeatedly applying the chain rule,
we obtain at once

$$S^* = S^*AS^*. \tag{29}$$

Similarly,

$$A = AS^*A, \tag{30}$$

where $S^* = xS$. Note that the homogeneity restrictions imply $Aq = S^*p = 0$ which


together with (29) and (30) complete the characterization as generalized inverses.
These relationships also allow passage from one type of demand function to
another so that the Slutsky matrix can be calculated from estimates of indirect
demand functions while the Antonelli matrix may be calculated from the usual
demands. The explicit formula for the latter is easily shown to be

$$A = (xS + qq')^{-1} - x^{-2}pp', \tag{31}$$

with primes denoting transposition, see Deaton (1981a). The Antonelli matrix has
important applications in measuring quantity index numbers, see, e.g. Diewert
(1981, 1983) and in optimal tax theory, see Deaton (1981a). Formula (31) allows
its calculation from an estimate of the Slutsky matrix.
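Applying (31) is a one-line matrix computation. The sketch below is illustrative: it builds a synthetic Slutsky matrix that is symmetric, negative semi-definite and annihilates p (any econometric estimate with these properties would do), applies (31), and verifies the generalized inverse relations (29) and (30) and the homogeneity restriction Aq = 0.

```python
import numpy as np

def antonelli_from_slutsky(S, p, q, x):
    # A = (xS + qq')^{-1} - pp'/x^2, eq. (31), primes denoting transposition.
    return np.linalg.inv(x * S + np.outer(q, q)) - np.outer(p, p) / x ** 2

# Synthetic Slutsky matrix: minus the projection onto the orthogonal complement
# of p, so S is symmetric, negative semi-definite, and Sp = 0 (homogeneity).
p = np.array([1.0, 2.0, 0.5])
S = -(np.eye(3) - np.outer(p, p) / (p @ p))
q = np.array([9.0, 3.1, 9.6])          # chosen so that p.q = x
x = p @ q

A = antonelli_from_slutsky(S, p, q, x)
Sstar = x * S
print(np.max(np.abs(Sstar - Sstar @ A @ Sstar)))   # ~0: eq. (29)
print(np.max(np.abs(A - A @ Sstar @ A)))           # ~0: eq. (30)
print(A @ q)                                        # ~0: Aq = 0
```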
This brief review of the theory is sufficient to permit discussion of a good deal
of the empirical work in the literature. Logically, questions of aggregation and
separability ought to be treated first, but since they are not required for an
understanding of what follows, I shall postpone their discussion to Section 4.

2. Naive demand analysis

Following Stone’s first empirical application of the linear expenditure system in


1954, a good deal of attention was given in the subsequent literature to the
problems involved in estimating complete, and generally nonlinear, systems of
demand equations. Although the issues are now reasonably well understood, they
deserve brief review. I shall use the linear expenditure system as representative of

the class

$$p_{it}q_{it} = f_i(p_t, x_t; b) + u_{it}, \tag{32}$$

for commodity i on observation t, parameter vector b, and error $u_{it}$. For the linear expenditure system the function takes the form

$$f_i(p_t, x_t; b) = \gamma_i p_{it} + \beta_i\left(x_t - p_t\cdot\gamma\right). \tag{33}$$

2.1. Simultaneity

The first problem of application is to give a sensible interpretation to the quantity $x_t$. In loose discussion of the theory $x_t$ is taken as “income” and is assumed to be imposed on the consumer from outside. But, if $q_t$ is the vector of commodity purchases in period t, then (a) only exceptionally is any real consumer given a predetermined and inflexible limit for total commodity expenditure and (b) the only thing which expenditures add up to is total expenditure defined as the sum of expenditures. Clearly then, $x_t$ is in general jointly endogenous with the expenditures and ought to be treated as such, a point argued, for example, by Summers (1959), Cramer (1969) and more recently by Lluch (1973), Lluch and Williams (1974). The most straightforward solution is to instrument $x_t$ and there are no shortages of theories of the consumption function to suggest exogenous variables. However, in the spirit of demand analysis this can be formalized rather neatly using any intertemporally separable utility function. For example, loosely following Lluch, an intertemporal or extended linear expenditure system can be proposed of the form

$$p_{it}q_{it} = p_{it}\gamma_{it} + \beta_{it}\left(W - \sum_{\tau\ge t}\sum_k p^*_{\tau k}\gamma_{\tau k}\right), \tag{34}$$

where the $\gamma_{i\tau}$ and $\beta_{i\tau}$ parameters are now specific to periods (needs vary over the life-cycle), W is the current present discounted value of present and future income and current financial assets, and $p^*_{\tau k}$ is the current discounted price of good k in future period τ ($p^*_{tk} = p_{tk}$ since t is the present). As with any such system based on intertemporally separable preferences, see Section 4 below, (34) can be solved for $x_t$ by summing the left-hand side over i and the result, i.e. the consumption function, used to substitute for W. Hence (34) implies the familiar

static linear expenditure system, i.e.

$$p_{it}q_{it} = p_{it}\gamma_{it} + \frac{\beta_{it}}{B_t}\left(x_t - \sum_k p_{kt}\gamma_{kt}\right) + u_{it} - \frac{\beta_{it}}{B_t}u_t, \tag{35}$$

where $u_{it}$ is the error attached to (34), $u_t = \sum_i u_{it}$, $B_t = \sum_i \beta_{it}$, and it is assumed, as is reasonable, that $B_t \neq 0$. This


not only relates the parameters in the static version (33) to their intertemporal
counterparts, but it also gives valuable information about the structure of the
error term in (32). Given this, the bias introduced by ignoring the simultaneity
between $x_t$ and $p_{it}q_{it}$ can be studied. For the usual reasons, it will be small if the
equations fit well, as Prais (1959) argued in his reply to Summers (1959). But there
is a rather more interesting possibility. It is easily shown, on the basis of (35), that

$$\mathrm{cov}(x_t, u_{it}) = \sum_k \sigma_{ik} - \frac{\beta_i}{B}\sum_k\sum_m \sigma_{km}, \tag{36}$$

where $\sigma_{ij}$ is the (assumed constant) covariance between $u_{it}$ and $u_{jt}$, i.e.

$$\mathrm{cov}(u_{it}, u_{js}) = \delta_{ts}\sigma_{ij}, \tag{37}$$

where $\delta_{ts}$ is the Kronecker delta. Clearly, the covariance in (36) is zero if $\sum_k \sigma_{ik}/\sum_k\sum_m \sigma_{km} = \beta_i/B$. One specialized theory which produces exactly this rela-
tionship is Theil’s (1971b, 1974,1975a, 1975b, pp. 56-90,1979) “rational random
behaviour” under which the variance-covariance matrix of the errors $u_{it}$ is rendered proportional to the Slutsky matrix by consumers’ trading-off the costs of
exact maximization against the utility losses of not doing so. If this model is
correct, there is no simultaneity bias, see Deaton (1975a, pp. 161-8) and Theil
(1976, pp. 4-6, 80-82) for applications. However, most econometricians would
tend to view the error terms as reflecting, at least in part, those elements not
allowed for by the theory, i.e. misspecifications, omitted variables and the like.
Even so, it is not implausible that (36) should be close to zero since the
requirement is that error covariances between each category and total expenditure
should be proportional to the marginal propensity to spend for that good. This is
a type of “error separability” whereby omitted variables influence demands in
much the same way as does total outlay.
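The proportionality condition is easy to verify mechanically. In the sketch below (values invented), the covariance matrix σ_ij ∝ β_iβ_j is an extreme form of such error separability, and the covariance (36) vanishes identically.

```python
import numpy as np

beta = np.array([0.5, 0.3, 0.2])       # marginal budget shares; B = beta.sum()
sigma = 0.04 * np.outer(beta, beta)    # sigma_ij ~ beta_i*beta_j: error separability

B = beta.sum()
cov_x_u = sigma.sum(axis=1) - (beta / B) * sigma.sum()
print(cov_x_u)                          # zeros: no simultaneity bias, eq. (36)
```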
In general, simultaneity will exist and the issue deserves to be taken seriously; it
is likely to be particularly important in cross-section work, where occasional large
purchases affect both sides of the Engel curve. Ignoring it may also bias the other
tests discussed below, see Attfield (1985).

2.2. Singularity of the variance-covariance matrix

The second problem arises from the fact that with $x_t$ defined as the sum of expenditures, expenditures automatically add-up to total expenditure identically, i.e. without error. Hence, provided $f_i$ in (32) is properly chosen, we must have

$$\sum_i p_{it}q_{it} = x_t; \qquad \sum_i f_i(p_t, x_t; b) = x_t; \qquad \sum_i u_{it} = 0. \tag{38}$$

Writing Ω as the n × n contemporaneous variance-covariance matrix of the $u_{it}$'s with typical element $\omega_{ij}$, i.e.

$$E(u_{it}u_{js}) = \delta_{ts}\omega_{ij}, \tag{39}$$

then the last part of (38) clearly implies

$$\sum_i \omega_{ij} = \sum_j \omega_{ij} = 0, \tag{40}$$

so that the variance-covariance matrix is singular. If (32) is stacked in the usual way as an nT observation regression, its covariance matrix is $\Omega \otimes I$ which cannot have rank higher than (n−1)T. Hence, the usual generalized least squares estimator or its non-linear analogue is not defined since it would require the non-existent inverse $\Omega^{-1} \otimes I$.
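The singularity in (40) is mechanical and appears immediately in simulated data. A minimal sketch (illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 200, 4
# Simulate disturbances that satisfy adding-up: each row sums to zero.
e = rng.normal(size=(T, n))
u = e - e.mean(axis=1, keepdims=True)   # impose sum_i u_it = 0

omega = u.T @ u / T
print(omega.sum(axis=0))                # ~0: rows/columns of Omega sum to zero
print(np.linalg.matrix_rank(omega))     # n - 1: Omega is singular
```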
This non-existence is, however, a superficial problem. For a set of equations
such as (32) satisfying (38), one equation is essentially redundant and all of its
parameters can be inferred from knowledge of those in the other equations.
Hence, attempting to estimate all the parameters in all equations is equivalent to
including some parameters more than once and leads to exactly the same
problems as would arise if, for example, some independent variables were
included more than once on the right hand side of an ordinary single-variable
regression. The solution is obviously to drop one of the equations and estimate
the resulting (n−1) equations by GLS, Zellner's (1962) seemingly unrelated regressions estimator (SURE), or similar technique. Papers by McGuire, Farley, Lucas and Winston (1968) and by Powell (1969) show that the estimates are
invariant to the particular equation which is selected for omission. Barten (1969)
also considered the maximum-likelihood estimation of such systems when the errors follow the multivariate normal assumption. If $\Omega_n$ is the variance-covariance matrix of the system (32) excluding the nth equation, a sample of T observations has a log-likelihood conditional on normality of

$$\ln L = -\frac{T}{2}\left\{(n-1)\ln 2\pi + \ln\det\Omega_n\right\} - \frac{1}{2}\sum_{t=1}^{T} u'_{(n)t}\,\Omega_n^{-1}u_{(n)t}, \tag{41}$$

where $u_{(n)}$ is the (n−1)-vector of $u_{it}$ excluding element n. Barten defines a new non-singular matrix V by

$$V = \Omega + \kappa\,\iota\iota', \tag{42}$$

where ι is the normalized vector of units, i.e. $\iota_i = 1/n$, and $0 < \kappa < \infty$. Then (41) may be shown to be equal to

$$\ln L = \frac{T}{2}\left\{\ln\kappa + \ln n - (n-1)\ln 2\pi - \ln\det V\right\} - \frac{1}{2}\sum_{t=1}^{T} u_t'V^{-1}u_t. \tag{43}$$

This formulation establishes that the likelihood is independent of the equation


deleted (and incidentally of κ since (41) does not depend on it) and also returns
the original symmetry to the problem. However, in practice, the technique of
dropping one equation is usually to be preferred since it reduces the dimension of
the parameter vector to be estimated which tends to make computation easier.
Note two further issues associated with singularity. First, if the system to be
estimated is a “subsystem” of commodities that does not exhaust the budget, the
variance covariance matrix of the residuals need not, and usually will not be
singular. In consequence, SURE or FIML (see below) can be carried out directly
on the subsystem. However, it is still necessary to assume a non-diagonal
variance-covariance matrix; overall singularity precludes all goods from having
orthogonal errors and there is usually no good reason to implicitly confine all the
off-diagonal covariances to the omitted goods. Second, there are additional
complications if the residuals are assumed to be serially correlated. For example,
in (32), it might be tempting to write

$$u_{it} = \rho_i u_{i,t-1} + \varepsilon_{it}, \tag{44}$$

for serially uncorrelated errors $\varepsilon_{it}$. If R is the diagonal matrix of $\rho_i$'s, (44) implies that

$$\Omega = R\Omega R + \Sigma, \tag{45}$$

where Σ is the contemporaneous variance-covariance matrix of the ε's. Since $\Omega\iota = \Sigma\iota = 0$, we must have $\Omega\rho = 0$ (ρ the vector of the $\rho_i$'s), which, since ι spans the null space of Ω, implies that $\rho \propto \iota$, i.e. that all the $\rho_i$'s are the same, a result first established by Berndt and Savin (1975). Note that this does not mean that (44) with $\rho_i = \rho$ for
all i is a sensible specification for autocorrelation in singular systems. It would
seem better to allow for autocorrelation at an earlier stage in the modeling, for
example by letting $u_{it}$ be autocorrelated in (34) and following through the
consequences for the compound errors in (35). In general, this will imply vector

autoregressive structures, as, for example, in Guilkey and Schmidt (1973) and
Anderson and Blundell (1982). But provided autocorrelation is handled in a way
that respects the singularity (as it should be), so that the omitted equation is not
implicitly treated differently from the others, then it will always be correct to
estimate by dropping one equation since all the relevant information is contained
in the other (n - 1).
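
The step behind the Berndt-Savin result is worth making explicit; a short worked version of the argument (my rendering, with $\rho$ the vector of the $\rho_i$'s, so that $R\iota \propto \rho$): multiplying (45) by $\iota$ and using the singularity of both covariance matrices,

$$0 = \Omega\iota = R\,\Omega R\,\iota + \Sigma\iota = R\,\Omega\,(R\iota) \;\Longrightarrow\; \Omega\,(R\iota) = 0,$$

after premultiplying by $R^{-1}$ (R is diagonal and nonsingular). Since the null space of $\Omega$ is spanned by $\iota$, it follows that $R\iota \propto \iota$, i.e. that all the $\rho_i$'s coincide.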

2.3. Estimation

For estimation purposes, rewrite (32) in the form

$$y_{it} = f_i(x_t, \beta) + u_{it}, \qquad(46)$$

with $t = 1,\ldots,T$ indexing observations and $i = 1,\ldots,(n-1)$ indexing goods. I
shall discuss only the case where the $u_{it}$ are independently and identically distributed
as multivariate normal with zero mean and nonsingular covariance matrix $\Omega$. [For
other specifications, see, e.g. Woodland (1979).] Since $\Omega$ is not indexed on t,
homoskedasticity is being assumed; this is always more likely to hold if the $y_{it}$'s
are the budget shares of the goods, not quantities or expenditures. Using budget
shares as dependent variables also ensures that the $R^2$ statistics mean something.
Predicting better than $w_{it} = \alpha_i$ is an achievement (albeit a mild one), while with
quantities or expenditures, $R^2$'s tend to be extremely high no matter how poor the
model.
Given the variance-covariance matrix $\Omega$, typical element $\omega_{ij}$, the MLEs of $\beta$,
$\hat\beta$ say, satisfy the first-order conditions, for all i,

$$\sum_{t=1}^{T}\sum_k\sum_l \omega^{kl}\{y_{tk} - f_k(x_t,\hat\beta)\}\frac{\partial f_l(x_t,\hat\beta)}{\partial\beta_i} = 0, \qquad(47)$$

where $\omega^{kl}$ is the (k, l)th element of $\Omega^{-1}$. These equations also define the linear or
non-linear GLS estimator. Since $\Omega$ is usually unknown, it can be replaced by its
maximum likelihood estimator,

$$\hat\Omega = T^{-1}\sum_{t=1}^{T}\{y_t - f(x_t,\hat\beta)\}\{y_t - f(x_t,\hat\beta)\}'. \qquad(48)$$

If $\hat\omega_{ij}$ replaces $\omega_{ij}$ in (47) and (47) and (48) are solved simultaneously, $\hat\beta$ and $\hat\Omega$
are the full-information maximum likelihood estimators (FIML). Alternatively,
some consistent estimator of $\beta$ can be used in place of $\hat\beta$ in (48) and the resulting
$\hat\Omega$ used in (47); the resulting estimates of $\beta$ will be asymptotically equivalent to
FIML. Zellner's (1962) seemingly unrelated regression technique falls in this class,

see also Gallant (1975) and the survey by Srivastava and Dwivedi (1979) for
variants. Consistency of the estimation of $\beta$ in (47) is unaffected by the choice of $\Omega$;
the MLEs of $\beta$ and $\Omega$ are asymptotically independent, as calculation of the
information matrix will show. All this is standard enough, except possibly for
computation, but standard algorithms such as those of Marquardt
(1963), scoring, Berndt, Hall, Hall and Hausman (1974), Newton-Raphson, and
Gauss-Newton all work well for these models; see Quandt (1984) in this Handbook
for a survey. Note also Byron's (1982) technique for estimating very large
symmetric systems.
Nevertheless, there are a number of problems, particularly concerned with the
estimation of the covariance matrix $\Omega$, and these may be severe enough to make
the foregoing estimators undesirable, or even infeasible. Taking feasibility first,
note that the estimated covariance matrix $\hat\Omega$ given by (48) is the mean of T
matrices each of rank 1, so that its rank cannot be greater than T. In consequence,
systems for which $(n-1) > T$ cannot be estimated by FIML or SURE if the
inverse of the estimated $\hat\Omega$ is required. Even this underestimates the problem. In
the linear case (e.g. the Rotterdam system considered below) the demand system
becomes the classical multivariate regression model

$$Y = XB + U, \qquad(49)$$

with Y a $(T\times(n-1))$ matrix, X a $(T\times k)$ matrix, B $(k\times(n-1))$ and U $(T\times(n-1))$.
(The nth equation has been dropped.) The estimated variance-covariance
matrix from (48) is then

$$\hat\Omega = T^{-1}\,Y'\big\{I - X(X'X)^{-1}X'\big\}Y. \qquad(50)$$

Now the idempotent matrix in brackets has rank $(T-k)$, so that the inverse will
not exist if $n-1 > T-k$. Since X is likely to contain at least n + 2 variables
(prices, the budget and a constant), an eight commodity system would require at
least 19 observations. Non-linearities and cross-section restrictions can improve
matters, but they need not. Consider the following problem, first pointed out to
me by Teun Kloek. The AIDS system (19) illustrates most simply, though the
problem is clearly a general one. Combine the two parts of (19) into a single set of
equations,

$$w_{it} = (\alpha_i - \beta_i\alpha_0) + \beta_i\ln x_t + \sum_j(\gamma_{ij} - \beta_i\alpha_j)\ln p_{jt} - \tfrac{1}{2}\beta_i\sum_k\sum_m\gamma_{km}\ln p_{kt}\ln p_{mt} + u_{it}. \qquad(51)$$

Not counting $\alpha_0$, which is unidentified, the system (without restrictions) has a


total of $(2+n)(n-1)$ parameters - $(n-1)$ $\alpha$'s, $(n-1)$ $\beta$'s, and $n(n-1)$ $\gamma$'s - or
(n + 2) per equation as in the previous example. But now, each equation has
$2+(n-1)n$ parameters since all the $\gamma$'s always appear. In consequence, if the
constant, $\ln x$, $\ln p$, and the cross-terms are linearly independent in the sample,
and if $T < 2+(n-1)n$, it is possible to choose parameters such that the calcu-
lated residuals for any one (arbitrarily chosen) equation will be exactly zero for all
sample points. For these parameters, one row and one column of the estimated $\hat\Omega$
will also be zero, its determinant will be zero, and the log likelihood (41) or (43)
will be infinite. Hence full information MLEs do not exist. In such a case, at least
56 observations would be necessary to estimate an 8 commodity disaggregation.
All these cases are variants of the familiar "undersized sample" problem in FIML
estimation of simultaneous equation systems and they set upper limits to the
amount of commodity disaggregation that can be countenanced on any given
time-series data.
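
The mechanics are easily checked numerically; the following sketch (sizes are illustrative assumptions only) shows that the estimator (50) is singular by construction whenever $n - 1 > T - k$:

```python
import numpy as np

rng = np.random.default_rng(1)
T, k, m = 15, 10, 8                     # m = n - 1 equations; T - k = 5 < m: "undersized"

X = rng.normal(size=(T, k))
Y = rng.normal(size=(T, m))             # any data at all: the singularity is mechanical

M = np.eye(T) - X @ np.linalg.solve(X.T @ X, X.T)   # idempotent, rank T - k
Omega_hat = Y.T @ M @ Y / T             # the estimator (50)

print(np.linalg.matrix_rank(Omega_hat))             # 5 = T - k < m: no inverse exists
```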
Given a singular variance-covariance matrix, for whatever reason, the log
likelihood (41), which contains the term $-\tfrac{T}{2}\ln\det\Omega$, will be infinitely large
and FIML estimates do not exist. Nor, in general, can (47) be used to calculate
GLS or SURE estimators if a singular estimate of $\Omega$ is employed. However, there
are a number of important special cases in which (47) has solutions that can be
evaluated even when $\hat\Omega$ is singular (though it is less than clear what is the status of
these estimators). For example, in the classical multivariate regression model (49)
the solution to (47) is the OLS matrix estimator $\hat B = (X'X)^{-1}X'Y$, which does not
involve $\Omega$, see e.g. Goldberger (1964, pp. 207-12). Imposing identical within-
equation restrictions on (49), e.g. homogeneity, produces another (restricted)
classical model with the same property. With cross-equation restrictions of the
form $R\beta = r$, e.g. symmetry, for stacked $\beta$, $\hat\beta$, the solution to (47) is

$$\beta^* = \hat\beta - \{\Omega\otimes(X'X)^{-1}\}R'\big[R\{\Omega\otimes(X'X)^{-1}\}R'\big]^{-1}(R\hat\beta - r), \qquad(52)$$

which, though involving $\Omega$, can still be calculated with $\Omega$ singular provided the
matrix in square brackets is non-singular. I have not been able to find the general
conditions on (47) that allow solutions of this form, nor is it clear that it is
important to do so. General non-linear systems will not be estimable on under-
sized samples, and except in the cases given where closed-form solutions exist,
attempts to solve (47) and (48) numerically will obviously fail.
The important issue, of course, is the small-sample performance of estimators
based on near-singular or singular estimates of $\Omega$. In most time-series applications
with more than a very few commodities, $\hat\Omega$ is likely to be a poor estimator of $\Omega$,
and the introduction of very poor estimates of $\Omega$ into the procedure for parame-
ter estimation is likely to give rise to extremely inefficient estimates of the latter.
Paradoxically, the search for (asymptotic) efficiency is likely to lead, in this case,

to much greater (small-sample) inefficiency than is actually obtainable. Indeed it
may well be that estimation techniques which do not depend on estimating $\Omega$ will
give better estimates in such situations. One possibility is the minimization of the
trace of the matrix on the right-hand side of (48) rather than its determinant as
required by FIML. This is equivalent to (non-linear) least squares applied to the
sum of the residual sums of squares over each equation and can be shown to be
ML if (the true) $\Omega = \sigma^2(I - \iota\iota')$ for some $\sigma^2$, see Deaton (1975a, p. 39). There is
some general evidence that such methods can dominate SURE and FIML in
small samples, see again Srivastava and Dwivedi (1979). Fiebig and Theil (1983)
and Theil and Rosalsky (1984) have carried out Monte Carlo simulations of
symmetry-constrained linear systems, i.e. with estimators of the form (52). The
system used has 8 commodities, 15 observations and 9 explanatory variables so
that their estimate of $\hat\Omega$ from (50), based on the unconstrained regressions, is
singular. Fiebig and Theil find that replacing $\Omega$ by $\hat\Omega$ yielded "estimates with
greatly reduced efficiency and standard errors which considerably underestimate
the true variability of these estimates". A number of alternative specifications for
$\Omega$ were examined, and Theil and Rosalsky found good performance in terms of MSE
for Deaton's (1975a) specification $\Omega = \sigma^2(\bar U - uu')$, where u is the sample mean of
the vector of budget shares and $\bar U$ is the diagonal matrix of the u's. Their results also
give useful information on procedures for evaluating standard errors. Define the
matrix $A(\Omega)$, with typical element

$$a_{ij} = \sum_{t=1}^{T}\sum_k\sum_l \omega^{kl}\,\frac{\partial f_k(x_t,\beta)}{\partial\beta_i}\frac{\partial f_l(x_t,\beta)}{\partial\beta_j}, \qquad(53)$$

where $\omega^{kl}$ is the (k, l)th element of $\Omega^{-1}$, so that $\{A(\hat\Omega)\}^{-1}$ is the conventionally
used (asymptotic) variance-covariance matrix of the FIML estimates $\hat\beta$ from (47).
Define also $B(\bar\Omega,\Omega)$, with typical element

$$b_{ij} = \sum_{t=1}^{T}\sum_k\sum_l\sum_m\sum_p \bar\omega^{kl}\,\Omega_{km}\,\bar\omega^{mp}\,\frac{\partial f_l(x_t,\beta)}{\partial\beta_i}\frac{\partial f_p(x_t,\beta)}{\partial\beta_j}, \qquad(54)$$

where $\bar\omega^{kl}$ is the (k, l)th element of $\bar\Omega^{-1}$.

Hence, if $\beta^*$ is estimated from (47) using some assumed variance-covariance
matrix $\bar\Omega$, say (as in the experiments reported above), then the variance-covari-
ance matrix $V^*$ of $\beta^*$ is given by

$$V^* = \{A(\bar\Omega)\}^{-1}B(\bar\Omega,\Omega)\{A(\bar\Omega)\}^{-1}. \qquad(55)$$

Fiebig and Theil's experiments suggest good performance if $\Omega$ in $B(\bar\Omega,\Omega)$ is
replaced by $\hat\Omega$ from (48).
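
Computationally, (53)-(55) assemble into a standard sandwich. The sketch below (mine; the Jacobians $\partial f/\partial\beta$ are replaced by random placeholder arrays, so nothing here is an estimate of anything) shows the assembly and checks that the sandwich collapses to $\{A(\Omega)\}^{-1}$ when the assumed matrix is correct:

```python
import numpy as np

rng = np.random.default_rng(2)
T, m, p = 30, 3, 4                       # m equations, p parameters

G = rng.normal(size=(T, m, p))           # G[t] stands in for the Jacobian of f(x_t, b)
Omega = np.cov(rng.normal(size=(200, m)), rowvar=False)   # the "true" error covariance
Abar = np.eye(m)                         # an assumed covariance, deliberately wrong
Ainv = np.linalg.inv(Abar)

A = sum(G[t].T @ Ainv @ G[t] for t in range(T))                 # as in (53), with Abar used
B = sum(G[t].T @ Ainv @ Omega @ Ainv @ G[t] for t in range(T))  # as in (54)
V_star = np.linalg.inv(A) @ B @ np.linalg.inv(A)                # the sandwich (55)

# With the correct matrix, B reduces to A and the sandwich to A^{-1}:
Oinv = np.linalg.inv(Omega)
A0 = sum(G[t].T @ Oinv @ G[t] for t in range(T))
B0 = sum(G[t].T @ Oinv @ Omega @ Oinv @ G[t] for t in range(T))
print(np.allclose(np.linalg.inv(A0) @ B0 @ np.linalg.inv(A0), np.linalg.inv(A0)))  # True
```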

2.4. Interpretation of results

It is perhaps not surprising that authors who finally surmounted the obstacles in
the way of estimating systems of demand equations should have professed
themselves satisfied with their hard won results. Mountaineers are not known for
criticising the view from the summit. And certainly, models such as the linear
expenditure system, or which embody comparably strong assumptions, yield very
high $R^2$ statistics for expenditures or quantities, with t-values that are usually
closer to 10 than to unity. Although there are an almost infinite number of studies
using the linear expenditure system from which to illustrate, almost certainly the
most comprehensive is that by Lluch, Powell and Williams (1977), who fit the
model (or a variant) to data from 17 developed and developing countries using an
eightfold disaggregation of commodities. Of the 134 $R^2$ statistics reported (for 2
countries 2 of the groups were combined), 40 are greater than 0.99, 104 are greater
than 0.95, and only 14 are below 0.90 (Table 3.9, p. 49). The parameter estimates
nearly all "look sensible" and conform to theoretical restrictions, i.e. marginal
propensities to consume are positive, yielding, in the case of the linear expenditure
system, a symmetric negative semi-definite Slutsky matrix. However, as is almost
invariably the case with the linear expenditure system, the estimated residuals
display substantial positive autocorrelation. Table 3.10 in Lluch, Powell and
Williams displays Durbin-Watson statistics for all countries and commodities: of
the 134 ratios, 60 are less than 1.0 and only 15 are greater than 2.0. Very similar
results were found in my own, Deaton (1975a), application of the linear expendi-
ture system to disaggregated expenditures in post-war Britain. Such results
suggest that the explanatory power of the model reflects merely the common
upward time trends in individual and total expenditures. The estimated $\beta$
parameters in (33), the marginal propensities to consume, will nevertheless be
sensible, since the model can hardly fail to reflect the way in which individual
expenditures evolve relative to their sum over the sample as a whole. Obtaining
sensible estimates of marginal propensities to spend on time-series data is not an
onerous task. Nevertheless, the model singularly fails to account for variations
around trend, the high $R^2$ statistics could be similarly obtained by replacing total
expenditure by virtually any trending variable, and the t-values are likely to be
grossly overestimated in the presence of the very severe autocorrelation, see, e.g.
Malinvaud (1970, pp. 521-2) and Granger and Newbold (1974). In such cir-
cumstances, the model is almost certainly a very poor approximation to whatever
process actually generated the data and should be abandoned in favor of more
appropriate alternatives. It makes little sense to “treat” the autocorrelation by
transforming the residuals by a Cochrane-Orcutt type technique, either based on
(44) with a common parameter, or using a full vector autoregressive specification.
[See Hendry (1980) for some of the consequences of trying to do so in similar
situations.]

In spite of its clear misspecifications, there may nevertheless be cases where the
linear expenditure system or a similar model may be the best that can be done.
Because of its very few parameters, (2n − 1) for an n commodity system, it can be
estimated in situations (such as the LDCs in Lluch, Powell and Williams' book)
where data are scarce and less parsimonious models cannot be used. In such
situations, it will at the least give a theoretically consistent interpretation of the
data, albeit one that is probably wrong. But in the absence of alternatives, this
may be better than nothing. Even so, it is important that such applications be
seen for what they are, i.e. untested theory with “sensible” parameters, and not as
fully-tested data-consistent models.

2.5. Flexible functional forms

The immediately obvious problem with the linear expenditure system is that it has
too few parameters to give it a reasonable chance of fitting the data. Referring
back to (33) and dividing through by $p_i$, it can be seen that the $\gamma_i$ parameters are
essentially intercepts and that, apart from them, there is only one free parameter
per equation. Essentially, the linear expenditure system does little more than fit
bivariate regressions between individual expenditures and their total. Of course,
the prices also enter the model but all own- and cross-price effects must also be
allowed for within the two parameters per equation, one of which is an intercept.
Clearly then, in interpreting the results from such a model, for example, total
expenditure elasticities, own and cross-price elasticities, substitution matrices, and
so on, there is no way to sort out which numbers are determined by measurement
and which by assumption. Certainly, econometric analysis requires the applica-
tion of prior reasoning and theorizing. But it is not helped if the separate
influences of measurement and assumption cannot be practically distinguished.
Such difficulties can be avoided by the use of what are known as “flexible
functional forms,” Diewert (1971). The basic idea is that the choice of functional
form should be such as to allow at least one free parameter for the measurement
of each effect of interest. For example, the basic linear regression with intercept is
a flexible functional form. Even if the true data generation process is not linear,
the linear model without parameter restrictions can offer a first-order Taylor
approximation around at least one point. For a system of (n - 1) independent
demand functions, (n - 1) intercepts are required, (n - 1) parameters for the total
expenditure effects and n(n − 1) for the effects of the n prices. Barnett (1983b)
offers a useful discussion of how Diewert’s definition relates to the standard
mathematical notions of approximation.
Flexible functional form techniques can be applied either to demand functions
or to preferences. For the former, take the differential of (9) around some

convenient point, i.e.

$$\mathrm{d}q_i = h_{i0} + h_{iu}\,\mathrm{d}u + \sum_j s_{ij}\,\mathrm{d}p_j. \qquad(56)$$

But from (10) and (14),

$$\mathrm{d}\ln u = \Big\{\mathrm{d}\ln x - \sum_k w_k\,\mathrm{d}\ln p_k\Big\}\,(\partial\ln c/\partial\ln u)^{-1}, \qquad(57)$$

so that writing $\mathrm{d}q_i = q_i\,\mathrm{d}\ln q_i$ and multiplying (56) by $p_i/x$, the approximation
becomes

$$w_i\,\mathrm{d}\ln q_i = a_i + b_i\Big(\mathrm{d}\ln x - \sum_k w_k\,\mathrm{d}\ln p_k\Big) + \sum_j c_{ij}\,\mathrm{d}\ln p_j, \qquad(58)$$

where

$$a_i = p_ih_{i0}/x, \qquad b_i = p_ih_{iu}u\,(\partial\ln c/\partial\ln u)^{-1}/x, \qquad c_{ij} = p_is_{ij}p_j/x. \qquad(59)$$

Eq. (58), with $a_i$, $b_i$ and $c_{ij}$ parametrized, is the Rotterdam system of Barten
(1966), (1967), (1969) and Theil (1965), (1975b), (1976). It clearly offers a local
first-order approximation to the underlying relationship between q, x
and p.
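
In applications the differentials in (58) are replaced by finite log-changes; a hedged sketch of the usual construction (array names and shapes are mine, and the two-period average share follows the spirit of Theil's finite-change approximation):

```python
import numpy as np

def rotterdam_variables(P, Q):
    """Build finite-change versions of the variables in (58) from T x n arrays
    of prices P and quantities Q (illustrative layout, not from the text)."""
    X = (P * Q).sum(axis=1)                  # total outlay x_t
    W = (P * Q) / X[:, None]                 # budget shares w_it
    wbar = 0.5 * (W[1:] + W[:-1])            # two-period average shares
    dlq = np.diff(np.log(Q), axis=0)         # d ln q
    dlp = np.diff(np.log(P), axis=0)         # d ln p
    dlx = np.diff(np.log(X))                 # d ln x
    dlxbar = dlx - (wbar * dlp).sum(axis=1)  # d ln x - sum_k w_k d ln p_k
    y = wbar * dlq                           # left-hand side w_i d ln q_i
    return y, dlxbar, dlp

# Regressing each column of y on dlxbar and the columns of dlp then
# estimates the b_i and c_ij of (58).
```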
There is, of course, no guarantee that a function $h_i(u, p)$ exists which has $a_i$, $b_i$
and $c_{ij}$ constant. Indeed, if it did, Young's theorem gives $h_{iuj} = h_{iju}$, which, from
(59), is easily seen to hold only if $c_{ij} = -(\delta_{ij}b_i - b_ib_j)$. If imposed, this restriction
would remove the system's ability to act as a flexible functional form. (In fact, the
restriction implies unitary total expenditure and own-price elasticities.) Contrary
to assertions by Phlips (1974, 1983), Yoshihara (1969), Jorgenson and Lau (1976)
and others, this only implies that it is not sensible to impose the restriction; it
does not affect the usefulness of (58) for approximation and study of the true
demands via the approximation, see also Barten (1977) and Barnett (1979b).
Flexible functional forms can also be constructed by approximating preferences
rather than demands. By Shephard’s Lemma, an order of approximation in prices
(or quantities) - but not in utility- is lost by passing from preferences to de-
mands, so that in order to guarantee a first-order linear approximation in the
latter, second-order approximation must be guaranteed in preferences. Beyond

that, one can freely choose to approximate the direct utility function, the indirect
utility function, the cost function or the distance function provided only that the
appropriate quasi-concavity, quasi-convexity, concavity and homogeneity restric-
tions are observed. The best known of these approximations is the translog,
Sargan (1971), Christensen, Jorgenson and Lau (1975) and many subsequent
applications. See in particular Jorgenson, Lau and Stoker (1982) for a comprehen-
sive treatment. The indirect translog gives a quadratic approximation to the
indirect utility function $\psi^*(r)$ for normalized prices, and then uses (14) to derive the
system of share equations. The forms are

$$\psi^*(r) = \alpha_0 + \sum_k\alpha_k\ln r_k + \tfrac{1}{2}\sum_i\sum_j\beta^*_{ij}\ln r_i\ln r_j, \qquad(60)$$

$$w_i = \frac{\alpha_i + \sum_j\beta_{ij}\ln r_j}{\sum_k\alpha_k + \sum_k\sum_j\beta_{kj}\ln r_j}, \qquad(61)$$

where $\beta_{ij} = \tfrac{1}{2}(\beta^*_{ij} + \beta^*_{ji})$. In estimating (61) some normalization is required, e.g.
that $\sum_k\alpha_k = 1$. The direct translog approximates the direct utility function as a
quadratic in the vector q and it yields an equation of the same form as (61) with
$w_i$ on the left-hand side but with $q_i$ replacing $r_i$ on the right. Hence, while (61)
views the budget share as being determined by quantity adjustment to exogenous
price-to-outlay ratios, the direct translog views the share as adapting by prices
adjusting to exogenous quantities. Each could be appropriate under its own
assumptions, although presumably not on the same set of data. Yet another
flexible functional form with close affinities to the translog is the second-order
approximation to the cost function offered by the AIDS, eqs. (17), (18) and (19)
above. Although the translog considerably predates the AIDS, the latter is a good
deal simpler to estimate, at least if the price index $\ln P$ can be adequately
approximated by some fixed pre-selected index.
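
The fixed index most often used is Stone's, $\ln P^* = \sum_k w_k\ln p_k$, which makes every share equation linear in its parameters. A minimal sketch of this linearized estimation (my code; variable names are illustrative, and in practice the shares in the index are often lagged or averaged to blunt simultaneity):

```python
import numpy as np

def linearized_aids(W, lnp, lnx):
    """Equation-by-equation OLS for AIDS share equations of the form
    w_i = alpha_i + sum_j gamma_ij ln p_j + beta_i ln(x / P*),
    with Stone's index ln P* = sum_k w_k ln p_k in place of ln P.
    W: T x n shares, lnp: T x n log prices, lnx: length-T log outlay. Sketch only."""
    lnPstar = (W * lnp).sum(axis=1)
    Z = np.column_stack([np.ones(len(lnx)), lnp, lnx - lnPstar])
    coef, *_ = np.linalg.lstsq(Z, W, rcond=None)   # one column of coefficients per good
    alpha, gamma, beta = coef[0], coef[1:-1].T, coef[-1]
    return alpha, gamma, beta   # adding-up holds by construction across the n equations
```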
The AIDS and translog models yield demand functions that are first-order
flexible subject to the theory, i.e. they automatically possess symmetric substitu-
tion matrices, are homogeneous, and add up. However, trivial cases apart, the
AIDS cost function will not be globally concave nor the translog indirect utility
function globally convex, though they can be so over a restricted range of r (see
below). The functional forms for both systems are such that, by relaxing certain
restrictions, they can be made first-order flexible without theoretical restrictions,
as is the Rotterdam system. For example, in the AIDS, eq. (19), the restrictions
$\gamma_{ij} = \gamma_{ji}$ and $\sum_j\gamma_{ij} = 0$ can be relaxed while, in the indirect translog, eq. (61),
$\beta_{ij} = \beta_{ji}$ can be relaxed and $\ln x$ included as a separate variable without neces-
sarily assuming that its coefficient equals $-\sum_j\beta_{ij}$. Now, if the theory is correct,
and the flexible functional form is an adequate representation of it over the data,
the restrictions should be satisfied, or at least not significantly violated. Similarly,
for the Rotterdam system, if the underlying theory is correct, it might be expected
that its approximation by (58) would estimate derivatives conforming to the
theoretical restrictions. From (59), homogeneity requires $\sum_j c_{ij} = 0$ and symmetry
$c_{ij} = c_{ji}$. Negative semi-definiteness of the Slutsky matrix can also be imposed
(globally for the Rotterdam model and at a point for the other models) following
the work of Lau (1978) and Barten and Geyskens (1975).
The AIDS, translog, and Rotterdam models far from exhaust the possibilities
and many other flexible functional forms have been proposed. Quadratic logarith-
mic approximations can be made to distance and cost functions as well as to
utility functions. The direct quadratic utility function $u = (q-a)'A(q-a)$ is
clearly flexible, though it suffers from other problems such as the existence of
"bliss" points, see Goldberger (1967). Diewert (1973b) suggested that $\psi^*(r)$ be
approximated by a "Generalized Leontief" model

$$\psi^*(r) = \Big\{\delta_0 + 2\sum_i\delta_ir_i^{1/2} + \sum_i\sum_j\gamma_{ij}r_i^{1/2}r_j^{1/2}\Big\}^{-1}. \qquad(62)$$

This has the nice property that it is globally quasi-convex if $\delta_i \geq 0$ and $\gamma_{ij} \geq 0$ for
all i, j; it also generalizes Leontief since with $\delta_0 = \delta_i = 0$ and $\gamma_{ij} = 0$ for $i \neq j$,
$\psi^*(r)$ is the indirect utility function corresponding to the Leontief preferences (2).
Berndt and Khaled (1979) have, in the production context, proposed a further
generalization of (62) where the $\tfrac{1}{2}$ is replaced by a parameter, the "generalized
Box-Cox" system.
There is now a considerable body of literature on testing the symmetry and
homogeneity restrictions using the Rotterdam model, the translog, or these other
approximations, see, e.g. Barten (1967), (1969), Byron (1970a), (1970b), Lluch
(1971), Parks (1969), Deaton (1974a), (1978), Deaton and Muellbauer (1980b),
Theil (1971a), (1975b), Christensen, Jorgenson and Lau (1975), Christensen and
Manser (1977), Berndt, Darrough and Diewert (1977), Jorgenson and Lau (1976),
and Conrad and Jorgenson (1979). Although there is some variation in results
through different data sets, different approximating functions, different estimation
and testing strategies, and different commodity disaggregations, there is a good
deal of accumulated evidence rejecting the restrictions. The evidence is strongest
for homogeneity, with less (or perhaps no) evidence against symmetry over and
above the restrictions embodied in homogeneity. Clearly, for any one model, it is
impossible to separate failure of the model from failure of the underlying theory,
but the results have now been replicated frequently using many different func-
tional forms, so that it seems implausible that an inappropriate specification is at
the root of the difficulty. There are many possible substantive reasons why the
theory as presented might fail, and I shall discuss several of them in subsequent
sections. However, there are a number of arguments questioning this sort of

procedure for testing. One is a statistical issue, and questions have been raised
about the appropriateness of standard statistical tests in this context; I deal with
these matters in the next subsection. The other arguments concern the nature of
flexible functional forms themselves.
Empirical work by Wales (1977), Thursby and Lovell (1978), Griffin (1978),
Berndt and Khaled (1979), and Guilkey and Lovell (1980) casts doubt on the
ability of flexible functional forms both to mimic the properties of actual
preferences and technologies, and to behave “regularly” at points in price-outlay
space other than the point of local approximation (i.e. to generate non-negative,
downward sloping demands). Caves and Christensen (1980) investigated theoreti-
cally the global properties of the (indirect) translog and the generalized Leontief
forms. For a number of two and three commodity homothetic and non-homo-
thetic systems, they set the parameters of the two systems to give the same pattern
of budget shares and substitution elasticities at a point in price space, and then
mapped out the region for which the models remained regular. Note that
regularity is a mild requirement; it is a minimal condition and does not by itself
suggest that the system is a good approximation to true preferences or behavior.
It is not possible here to reproduce Caves and Christensen’s diagrams, nor do the
authors give any easily reproducible summary statistics. Nevertheless, although
both systems can do well (e.g. when substitutability is low so that preferences are
close to Leontief, the GL is close to globally regular, and similarly for the translog
when preferences are close to Cobb-Douglas), there are also many cases where
the regular regions are worryingly small. Of course, these results apply only to the
translog and the GL systems, but I see no reason to suppose that similar problems
would not occur for the other flexible functional forms discussed above.
These results raise questions as to whether Taylor series approximations, upon
which most of these functional forms are based, are the best type of approxima-
tions to work with, and there has been a good deal of recent activity in exploring
alternatives. Barnett (1983a) has suggested that Laurent series expansions are a
useful avenue to explore. The Laurent expansion of a function f(x) around the
point $x_0$ takes the form

$$f(x) = \sum_{n=-\infty}^{\infty} c_n(x - x_0)^n, \qquad(63)$$

and Barnett has suggested generalizing the GL form (62) to

$$\{\psi^*(r)\}^{-1} = a_0 + 2a'v + v'Av - 2b'\xi - \xi'B\xi, \qquad(64)$$

where $v_i = r_i^{1/2}$ and $\xi_i = r_i^{-1/2}$. The resulting demand system has too many
parameters to be estimated in most applications, and has more than it needs to be

a second-order flexible functional form. To overcome this, Barnett suggests
setting b = 0, the diagonal elements of B to zero, and forcing the off-diagonal
elements of both A and B to be non-negative (the Laurent model (64), like the
GL model (62), is globally regular if all the parameters are non-negative). The
resulting budget share equations are

$$w_i = \Big(a_iv_i + a_{ii}v_i^2 + \sum_{j\neq i}a_{ij}^2v_iv_j + \sum_{j\neq i}b_{ij}^2\xi_i\xi_j\Big)\Big/D, \qquad(65)$$

where D is the sum over i of the bracketed expression. Barnett calls this the
miniflex Laurent model. The squared terms guarantee non-negativity, but are
likely to cause problems with multiple optima in estimation. Barnett and Lee
(1983) present results comparable to those of Caves and Christensen’s which
suggest that the miniflex Laurent has a substantially larger regular region than
either translog or GL models.
A more radical approach has been pioneered by Gallant, see Gallant (1981),
and Gallant and Golub (1983), who has shown how to approximate indirect
utility functions using Fourier series. Interestingly, Gallant replicates the
Christensen, Jorgenson and Lau (1975) rejection of the symmetry restriction,
suggesting that their rejection is not caused by the approximation problems of the
translog. Fourier approximations are superior to Taylor approximations in a
number of ways, not least in their ability to keep their approximating qualities in
the face of the separability restrictions discussed in Section 4 below. However,
they are also heavily parametrized and superior approximation may be being
purchased at the expense of low precision of estimation of key quantities. Finally,
many econometricians are likely to be troubled by the sinusoidal behavior of
fitted demands when projected outside the region of approximation. There is
something to be said for using approximating functions that are themselves
plausible for preferences and demands.
The whole area of flexible functional forms is one that has seen enormous
expansion in the last five years and perhaps the best results are still to come. In
particular, other bases for spanning function space are likely to be actively
explored, see, e.g. Barnett and Jones (1983).

2.6. Statistical testing procedures

The principles involved are most simply discussed within a single model and for
convenience I shall use the Rotterdam system written in the form, $i = 1,\ldots,(n-1)$,

$$w_{it}\,\mathrm{d}\ln q_{it} = a_i + b_i\,\mathrm{d}\ln\bar x_t + \sum_j\gamma_{ij}\,\mathrm{d}\ln p_{jt} + u_{it}, \qquad(66)$$

where $\mathrm{d}\ln\bar x_t$ is an abbreviated form of the corresponding term in (58) and, in practice, the
differentials would be replaced by finite approximations, see Theil (1975b, Chapter
2) for details. I shall omit the nth equation as a matter of course so that $\Omega$ stands
for the $(n-1)\times(n-1)$ variance-covariance matrix of the u's.
The $u_t$ vectors are assumed to be identically and independently distributed as
$N(0,\Omega)$. I shall discuss the testing of two restrictions: homogeneity, $\sum_j\gamma_{ij} = 0$, and
symmetry, $\gamma_{ij} = \gamma_{ji}$.
Equation (66) is in the classical multivariate regression form (49) so equation-
by-equation OLS yields SURE and FIML estimates. Let $\hat\beta$ be the stacked vector
of OLS estimates and $\hat\Omega$ the unrestricted estimate of the variance-covariance
matrix (50). If the matrix of unrestricted residuals $Y - X\hat B$ is denoted by $\hat E$, (50)
takes the form

$$\hat\Omega = T^{-1}\hat E'\hat E. \qquad(67)$$

Testing homogeneity is relatively straightforward since the restrictions are within-
equation restrictions. A simple way to proceed is to substitute $\gamma_{in} = -\sum_{j=1}^{n-1}\gamma_{ij}$
into (66) to obtain the restricted model

$$w_{it}\,\mathrm{d}\ln q_{it} = a_i + b_i\,\mathrm{d}\ln\bar x_t + \sum_{j=1}^{n-1}\gamma_{ij}\,(\mathrm{d}\ln p_{jt} - \mathrm{d}\ln p_{nt}) + u_{it}, \qquad(68)$$

and re-estimate. Once again OLS is SURE is FIML and the restriction can be
tested equation by equation using standard text-book F-tests. These are exact
tests and no problems of asymptotic approximation arise. For examples, see
Deaton and Muellbauer's (1980b) rejections of homogeneity using the AIDS. If an
overall test is desired, a Hotelling $T^2$ test can be constructed for the system as a
whole, see Anderson (1958, pp. 207-10) and Laitinen (1978). Laitinen also
documents the divergence between Hotelling's $T^2$ and its limiting $\chi^2$ distribution
when the sample size is small relative to the number of goods, see also Evans and
Savin (1982). In consequence, homogeneity should always be tested using exact F
or $T^2$ statistics and never using asymptotic test statistics such as uncorrected
Wald, likelihood ratio, or Lagrange multiplier tests. However, my reading of the
literature is that the rejection of homogeneity in practice tends to be confirmed
using exact tests and is not a statistical illusion based on the use of inappropriate
asymptotics.
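
Because OLS is FIML here, each equation's homogeneity restriction can be examined with a textbook F-test; a sketch (mine) comparing the unrestricted form (66) with the restricted form (68):

```python
import numpy as np
from scipy import stats

def homogeneity_F(y, X_unres, X_res):
    """Exact single-equation F-test of homogeneity. X_unres holds the regressors
    of (66), X_res those of (68) (price changes differenced against good n);
    y is that equation's dependent variable. Purely illustrative."""
    def rss(X):
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        e = y - X @ b
        return e @ e, X.shape[1]
    rss_u, k_u = rss(X_unres)
    rss_r, k_r = rss(X_res)
    q = k_u - k_r                               # one restriction per equation
    F = ((rss_r - rss_u) / q) / (rss_u / (len(y) - k_u))
    return F, 1.0 - stats.f.cdf(F, q, len(y) - k_u)
```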
Testing symmetry poses much more severe problems since the presence of the
cross-equation restrictions makes estimation more difficult, separates SURE from
FIML estimators and precludes exact tests. Almost certainly the simplest testing
procedure is to use a Wald test based on the unrestricted (or homogeneous)
estimates. Define R as the $\frac{1}{2}n(n-1)\times(n-1)(n+2)$ matrix representing the

symmetry (and homogeneity) restrictions on $\beta$, so that

$$(R\beta)' = (\gamma_{12}-\gamma_{21},\ \gamma_{13}-\gamma_{31},\ \ldots,\ \gamma_{(n-1)n}-\gamma_{n(n-1)}). \qquad(69)$$

Then, under the null hypothesis of homogeneity and symmetry combined,

$$W_W = \hat\beta'R'\big[R\{\hat\Omega\otimes(X'X)^{-1}\}R'\big]^{-1}R\hat\beta, \qquad(70)$$

is the Wald test statistic, which is asymptotically distributed as $\chi^2_{\frac{1}{2}n(n-1)}$. Apart


from the calculation of $W_W$ itself, computation requires no more than OLS
estimation. Alternatively, the symmetry-constrained estimator $\tilde\beta$ given by (52)
with r = 0 can be calculated. From this, restricted residuals $\tilde E$ can be derived,
and a new (restricted) estimate of $\Omega$, $\tilde\Omega$, i.e.

$$\tilde\Omega = T^{-1}\tilde E'\tilde E. \qquad(71)$$

The new estimate of $\Omega$ can be substituted into (52) and iterations continued to
convergence, yielding the FIML estimators of $\beta$ and $\Omega$. Assume that this process
has been carried out and that (at the risk of some notational confusion) $\tilde\beta$ and $\tilde\Omega$
are the final estimates. A likelihood ratio test can then be computed according to

$$W_L = T\ln\{\det\tilde\Omega/\det\hat\Omega\}, \qquad(72)$$

and $W_L$ is also asymptotically distributed as $\chi^2_{\frac{1}{2}n(n-1)}$. Finally, there is the
Lagrange multiplier, or score, test, which is derived by replacing $\hat\Omega$ in (70) by $\tilde\Omega$,
so that

$$W_M = \hat\beta'R'\big[R\{\tilde\Omega\otimes(X'X)^{-1}\}R'\big]^{-1}R\hat\beta, \qquad(73)$$

with again the same limiting distribution.


From the general results of Berndt and Savin (1977) it is known that $W_W \geq W_L \geq W_M$;
these are mechanical inequalities that always hold, no matter what the
configuration of data, parameters, and sample size. In finite samples, with
inaccurate and inefficient estimates of $\Omega$, the asymptotic theory may be a poor
approximation and the difference between the three statistics may be very large.
In my own experience I have encountered a case with 8 commodities and 23
observations where $W_W$ was more than a hundred times greater than $W_M$. Meisner
(1979) reports experiments with the Rotterdam system in which the null hypothe-
sis was correct. With a system of 14 equations and 31 observations, $W_W$ rejected
symmetry at 5% 96 times out of 100 and at 1% 91 times out of 100. For 11
equations the corresponding figures were 50 and 37. Bera, Byron and Jarque
(1981) carried out similar experiments for $W_L$ and $W_M$. From the inequalities, we

know that rejections will be less frequent, but it was still found that, with n large
relative to (T − k), both $W_L$ and $W_M$ grossly over-rejected.
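
The mechanical ordering is easy to reproduce; the following compact simulation (mine: two equations, one symmetry-type cross-equation restriction, all numbers illustrative) computes the three statistics, obtaining the restricted FIML estimates by iterating (52) and the covariance update to convergence:

```python
import numpy as np

rng = np.random.default_rng(3)
T = 25
X = np.column_stack([np.ones(T), rng.normal(size=T), rng.normal(size=T)])
B_true = np.array([[0.5, 0.2], [0.3, 0.1], [0.1, 0.3]])   # restriction B[2,0] = B[1,1] true
Y = X @ B_true + rng.normal(size=(T, 2)) @ np.array([[0.05, 0.02], [0.0, 0.04]])

XtX = X.T @ X
Bhat = np.linalg.solve(XtX, X.T @ Y)
Om_hat = (Y - X @ Bhat).T @ (Y - X @ Bhat) / T            # unrestricted MLE of the covariance
beta_hat = Bhat.T.reshape(-1)                             # stacked: eq. 1 coefs, then eq. 2
R = np.zeros((1, 6)); R[0, 2], R[0, 4] = 1.0, -1.0        # the cross-equation restriction

def wald_form(Om):
    C = R @ np.kron(Om, np.linalg.inv(XtX)) @ R.T
    r_ = R @ beta_hat
    return float(r_ @ np.linalg.solve(C, r_))

Om = Om_hat.copy()                                        # iterate (52)/(48) to restricted FIML
for _ in range(500):
    V = np.kron(Om, np.linalg.inv(XtX))
    beta_r = beta_hat - V @ R.T @ np.linalg.solve(R @ V @ R.T, R @ beta_hat)
    Br = beta_r.reshape(2, 3).T
    Om_new = (Y - X @ Br).T @ (Y - X @ Br) / T
    if np.allclose(Om_new, Om, rtol=0, atol=1e-14):
        break
    Om = Om_new

W_W = wald_form(Om_hat)                                      # Wald, as in (70)
W_L = T * np.log(np.linalg.det(Om) / np.linalg.det(Om_hat))  # likelihood ratio, as in (72)
W_M = wald_form(Om)                                          # Lagrange multiplier, as in (73)
print(W_W >= W_L >= W_M)                                     # True: the mechanical ordering
```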
These problems for testing symmetry are basically the same as those discussed
for estimation in (2.3) above; typical time series are not long enough to give
reliable estimates of the variance-covariance matrix, particularly for large sys-
tems. For estimation, and for the testing of within equation restrictions, the
difficulties can be circumvented. But for testing cross-equation restrictions, such
as symmetry, the problem remains. For the present, it is probably best to suspend
judgment on the existing tests of symmetry (positive or negative) and to await
theoretical or empirical developments in the relevant test statistics. [See Byron
and Rosalsky (1984) for a suggested ad hoc size correction that appears to work
well in at least some situations.]

2.7. Non-parametric tests

All the techniques of demand analysis so far discussed share a common approach
of attempting to fit demand functions to the observed data and then enquiring as
to the compatibility of these fitted functions with utility theory. If unlimited
experimentation were a real possibility in economics, demand functions could be
accurately determined. As it is, however, what is observed is a finite collection of
pairs of quantity and price vectors. It is thus natural to argue that the basic
question is whether or not these observed pairs are consistent with any preference
ordering whatever, bypassing the need to specify particular demands or prefer-
ences. It may well be true that a given set of data is perfectly consistent with
utility maximization and yet be very poorly approximated by AIDS, the translog,
the Rotterdam system or any other functional form which the limited imagination
of econometricians is capable of inventing.
Non-parametric demand analysis takes a direct approach by searching over the
price-quantity vectors in the data for evidence of inconsistent choices. If no such
inconsistencies exist, a utility function exists and algorithms exist for constructing
it (or at least one out of the many possible). The origins of this type of analysis go back to
Samuelson’s (1938) introduction of revealed preference analysis. However, the
recent important work on developing test criteria is due to Hanoch and Rothschild
(1972) and especially to Afriat (1967), (1973), (1976), (1977) and (1981). Unfor-
tunately, some of Afriat’s best work has remained unpublished and the published
work has often been difficult for many economists to understand and assimilate.
However, as the techniques involved have become more widespread in economics,
other workers have taken up the topic, see the interpretative essays by Diewert
(1973a) and Diewert and Parkan (1978) -the latter contains actual test results-and
also the recent important work by Varian (1982, 1983).
Afriat proposes that a finite set of data be described as cyclically consistent if,
for any "cycle" $a, b, c,\ldots,r,a$ of indices, $p^a\cdot q^a \geq p^a\cdot q^b$, $p^b\cdot q^b \geq p^b\cdot q^c$,
$\ldots, p^r\cdot q^r \geq p^r\cdot q^a$, then it must be true that $p^a\cdot q^a = p^a\cdot q^b$,
$p^b\cdot q^b = p^b\cdot q^c,\ldots, p^r\cdot q^r = p^r\cdot q^a$. He then shows that cyclical consistency is necessary and sufficient for the
finite set of points to be consistent with the existence of a continuous, non-sati-
ated, concave and monotonic utility function. Afriat also provides a constructive
method of evaluating such a utility function. Varian (1982) shows that cyclical
consistency is equivalent to a “generalized axiom of revealed preference” (GARP)
that is formulated as follows. Varian defines $q^i$ as strictly directly revealed
preferred to q, written $q^iP^0q$, if $p^i\cdot q^i > p^i\cdot q$, i.e. $q^i$ was bought at $p^i$ even though q
cost less. Secondly, $q^i$ is revealed preferred to q, written $q^iRq$, if $p^i\cdot q^i \geq p^i\cdot q^j$,
$p^j\cdot q^j \geq p^j\cdot q^k,\ldots, p^m\cdot q^m \geq p^m\cdot q$, for some sequence of observations $(q^j, q^k,\ldots,q^m)$,
i.e. $q^i$ is indirectly or directly (weakly) revealed preferred to q. GARP then states
that $q^iRq^j$ implies not $q^jP^0q^i$, and all the nice consequences follow. Varian has
also supplied an efficient and easily used algorithm for checking GARP, and his
methods have been widely applied. Perhaps not surprisingly, the results show few
conflicts with the theory, since on aggregate time series data, most quantities
consumed increase over time so that contradictions with revealed preference
theory are not possible; each new bundle was unobtainable at the prices and
incomes of all previous periods.
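
The logic of the GARP check is short enough to sketch (my code, not Varian's: build the direct revealed-preference relation, take its transitive closure by Warshall's algorithm, and look for strict reversals):

```python
import numpy as np

def garp_violations(P, Q):
    """Return the pairs (i, j) with q^i R q^j and q^j P0 q^i, i.e. GARP failures.
    P and Q are T x n arrays of observed prices and quantities. Sketch only."""
    cost = P @ Q.T                        # cost[i, j] = p^i . q^j
    own = np.diag(cost)
    R0 = own[:, None] >= cost             # q^i directly (weakly) revealed preferred to q^j
    P0 = own[:, None] > cost              # strict direct revealed preference
    R = R0.copy()
    for k in range(len(own)):             # Warshall transitive closure
        R |= R[:, [k]] & R[[k], :]
    return [(i, j) for i in range(len(own)) for j in range(len(own))
            if R[i, j] and P0[j, i]]

# A two-observation violation: q^0 is (weakly) revealed preferred to q^1,
# yet q^1 is strictly directly revealed preferred to q^0.
P = np.array([[1.0, 1.0], [1.0, 2.0]])
Q = np.array([[3.0, 0.0], [0.0, 3.0]])
print(garp_violations(P, Q))              # [(0, 1)]
```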
Since these methods actually allow the construction of a well-behaved utility
function that accounts exactly for most aggregate time-series data, the rejections
of the theory based on parametric models (and on semi-parametric models like
Gallant’s Fourier system) must result from rejection of functional form and not
from rejection of the theory per se. Of course, one could regard the non-paramet-
ric utility function as being a very profligately parametrized parametric utility
function, so that if the object of research is to find a reasonably parsimonious
theory-consistent formulation, the non-parametric results are not very helpful.
Afriat’s and Varian’s work, in particular see Afriat (1981) and Varian (1983),
also allows testing of restricted forms of preferences corresponding to the various
kinds of separability discussed in Section 4. Varian has also shown how to handle
goods that are rationed or not freely chosen, as in Section 6 below. Perhaps most
interesting are the tests for homotheticity, a condition that requires the utility
function to be a monotone increasing transform of a linearly homogeneous
function and which implies that all total expenditure elasticities are unity. Afriat
(1977) showed that for two periods, 0 and 1, the necessary and sufficient
condition for consistency with a homothetic utility function is that the Laspeyres
price index be no less than the Paasche price index, i.e. that

$$\frac{p^1\cdot q^0}{p^0\cdot q^0} \;\geq\; \frac{p^1\cdot q^1}{p^0\cdot q^1}. \qquad(74)$$

For many periods simultaneously, Afriat (1981) shows that the Laspeyres index
between any two periods i and j, say, should be no less than the chain-linked
Paasche index obtained by moving from i to j in any number of steps. Given that

no one using any parametric form has ever suggested that all total expenditure
elasticities are unity, it comes as something of a surprise that the Afriat condition
appears to be acceptable for a 111-commodity disaggregation of post-war U.S.
data, see Manser and McDonald (1984).
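
For two periods, (74) is a one-line check; a hedged rendering (my code, with p0, q0, p1, q1 the price and quantity vectors of periods 0 and 1):

```python
import numpy as np

def homothetic_consistent(p0, q0, p1, q1):
    """Afriat's two-period condition (74): the data are consistent with some
    homothetic utility function iff the Laspeyres price index is no less
    than the Paasche price index. Sketch only."""
    laspeyres = (p1 @ q0) / (p0 @ q0)
    paasche = (p1 @ q1) / (p0 @ q1)
    return laspeyres >= paasche
```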
Clearly, more work needs to be done on reconciling parametric and non-para-
metric approaches. The non-parametric methodology has not yet been success-
fully applied to cross-section data because it provides no obvious way of dealing
with non-price determinants of demand. There are also difficulties in allowing for
“disturbance terms” so that failures of, e.g. GARP, can be deemed significant or
insignificant, but see the recent attempts by Varian (1984) and by Epstein and
Yatchew (1985).

3. Cross-section demand analysis

Although the estimation of complete sets of demand functions on time-series data


has certainly been the dominant concern in demand analysis in recent years, a
much older literature is concerned with the analysis of “family budgets” using
sample-survey data on cross-sections of households. Until after the Second World
War, such data were almost the only sources of information on consumer
behavior. In the last few years, interest in the topic has once again become intense
as more and more such data sets are being released in their individual microeco-
nomic form, and as computing power and econometric technique develop to deal
with them. In the United Kingdom, a regular Family Expenditure Survey with a
sample size of 7000 households has been carried out annually since 1954 and the
more recent tapes are now available to researchers. The United States has been
somewhat less forward in the area and until recently, has conducted a Consumer
Expenditure Survey only once every decade. However, a large rotating panel
survey has recently been begun by the B.L.S. which promises one of the richest
sets of data on consumer behavior ever available and it should help resolve many
of the long-standing puzzles over differences between cross-section and time-series
results. For example, most very long-run time-series data sets which are available
show a rough constancy of the food share, see Kuznets (1962), (1966), Deaton
(1975c). Conversion to farm-gate prices, so as to exclude the increasing compo-
nent of transport and distribution costs and built-in services, gives a food share
which declines, but does so at a rate which is insignificant in comparison to its
rate of decline with income in cross-sections [for a survey of cross-section results,
see Houthakker (1957)]. Similar problems exist with other categories of expendi-
ture as well as with the relationship between total expenditure and income.
There are also excellent cross-section data for many less developed countries, in
particular from the National Sample Survey in India, but also for many other
South-East Asian countries and for Latin America. These contain a great wealth

of largely unexploited data, although the pace of work has recently been increas-
ing, see, for example, the survey paper on India by Bhattacharya (1978), the
work on Latin America by Musgrove (1978), Howe and Musgrove (1977), on
Korea by Lluch, Powell and Williams (1977, Chapter 5) and on Sri Lanka by
Deaton (1981c).
In this section, I deal with four issues. The first is the specification and choice
of functional form for Engel curves. The second is the specification of how
expenditures vary with household size and composition. Third, I discuss a group
of econometric issues arising particularly in the analysis of micro data with
particular reference to the treatment of zero expenditures, including a brief
assessment of the Tobit procedure. Finally, I give an example of demand analysis
with a non-linear budget constraint.

3.1. Forms of Engel curves

This is very much a traditional topic to which relatively little has been added
recently. Perhaps the classic treatment is that of Prais and Houthakker (1955)
who provide a list of functional forms, the comparison of which has occupied
many manhours on many data sets throughout the world. The Prais-Houthakker
methodology is unashamedly pragmatic, choosing functional forms on grounds of
fit, with an attempt to classify particular forms as typically suitable for particular
types of goods, see also Törnqvist (1941), Aitchison and Brown (1954-5), and the
survey by Brown and Deaton (1972) for similar attempts. Much of this work is
not very edifying by modern standards. The functional forms are rarely chosen
with any theoretical model in mind, indeed all but one of Prais and Houthakker’s
Engel curves are incapable of satisfying the adding-up requirement, while, on the
econometric side, satisfactory methods for comparing different (non-nested) func-
tional forms are very much in their infancy. Even the apparently straightforward
comparison between a double-log and a linear specification leads to considerable
difficulties, see the simple statistic proposed by Sargan (1964) and the theoreti-
cally more satisfactory (but extremely complicated) solution in Aneuryn-Evans
and Deaton (1980).
More recent work on Engel curves has reflected the concern in the rest of the
literature with the theoretical plausibility of the specification. Perhaps the most
general results are those obtained in a paper by Gorman (1981), see also Russell
(1983) for alternative proofs. Gorman considers Engel curves of the general form

$$w_i = \sum_{r\in R} a_{ir}(p)\,\phi_r(\ln x), \qquad(75)$$

where R is some finite set and the $\phi_r(\ )$ are a series of functions. If such equations are

to be theory consistent, there must exist a cost function c(u, p) such that

$$\frac{\partial\ln c(u,p)}{\partial\ln p_i} = \sum_{r\in R} a_{ir}(p)\,\phi_r\{\ln c(u,p)\}. \qquad(76)$$

Gorman shows that for these partial differential equations to have a solution, (a)
the rank of the matrix formed from the coefficients $a_{ir}(p)$ can be no larger than 3
and (b), the functions $\phi_r(\ )$ must take specific restricted forms. There are three
generic forms for (75), two of which are reproduced below

$$w_i = a_i(p) + b_i(p)\ln x + d_i(p)\sum_{m=1}^{M}\gamma_m(p)(\ln x)^m, \qquad(77)$$

$$w_i = a_i(p) + b_i(p)\sum_{\sigma_m\in S_-}\mu_m(p)\,x^{\sigma_m} + d_i(p)\sum_{\sigma_m\in S_+}\theta_m(p)\,x^{\sigma_m}, \qquad(78)$$

where S is a finite set of elements $\sigma_m$, $S_-$ its negative elements and $S_+$ its positive
elements. A third form allows combinations of trigonometrical functions of x
capable of approximating a quite general function of x. However, note that the
$\gamma_m$, $\mu_m$ and $\theta_m$ functions in (77) and (78) are not indexed on the commodity
subscript i, otherwise the rank condition on $a_{ir}$ could not hold.
Equations (77) and (78) provide a rich source of Engel curve specifications and
contain as special cases a number of important forms. From (77), with m = 1, the
form proposed by Working and Leser and discussed above, see (15), is obtained.
In econometric specifications, $a_i(p)$ adds to unity and $b_i(p)$ to zero, as will their
estimates if OLS is applied to each equation separately. The log quadratic form

$$w_i = \alpha_i + \beta_i\ln x + \delta_i(\ln x)^2, \qquad(79)$$

was applied in Deaton (1981c) to Sri Lankan micro household data for the food
share, where the quadratic term was highly significant and a very satisfactory fit
was obtained (an $R^2$ of 0.502 on more than 3,000 observations). Note that, while
for a single commodity, higher powers of $\ln x$ could be added, doing so in a
complete system would require cross-equation restrictions since, according to
(77), the ratios of coefficients on powers beyond unity should be the same for all
commodities. Testing such restrictions (and Wald tests offer a very simple
method - see Section 4(a) below) provides yet another possible way of testing the
theory.
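
Fitting (79) to a single share is itself a one-line regression; a sketch (mine; w and x would be household-level observations of the food share and total outlay):

```python
import numpy as np

def log_quadratic_engel(w, x):
    """OLS fit of w = alpha + beta ln x + delta (ln x)^2 as in (79), returning
    the coefficients and the R^2. Purely illustrative."""
    lnx = np.log(x)
    Z = np.column_stack([np.ones_like(lnx), lnx, lnx ** 2])
    coef, *_ = np.linalg.lstsq(Z, w, rcond=None)
    resid = w - Z @ coef
    tss = (w - w.mean()) @ (w - w.mean())
    return coef, 1.0 - (resid @ resid) / tss
```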
Equation (78) together with $S = \{-1, 1, 2,\ldots,r,\ldots\}$ gives general polynomial
Engel curves. Because of the rank condition, the quadratic with $S = \{-1, 1\}$ is as
general as any, i.e.

$$p_iq_i = b_i^*(p) + a_i(p)x + d_i^*(p)x^2, \qquad(80)$$

where $b_i^*(p) = b_i(p)\mu(p)$ and $d_i^*(p) = d_i(p)\theta(p)$. This is the "quadratic
expenditure system" independently derived by Howe, Pollak and Wales (1979),
Pollak and Wales (1978) and (1980). The cost function underlying (80) may be
shown to be

$$c(u, p) = \beta(p) - \frac{\alpha(p)}{u + \gamma(p)}, \qquad(81)$$

where the links between the $a_i$, $b_i^*$ and $d_i^*$ on the one hand and the $\alpha$, $\beta$ and $\gamma$
on the other are left to the interested reader. (With $\ln c(u, p)$ on the left-hand
side, (81) also generates the form (79).) This specification, like (79), is also of
considerable interest for time-series analysis since, in most such data, the range of
variation in x is much larger than that in relative prices and it is to be expected
that a higher order of approximation in x than in p would be appropriate.
Indeed, evidence of failure of linearity in time-series has been found in several
studies, e.g. Carlevaro (1976). Nevertheless, in Howe, Pollak and Wales’ (1979)
study using U.S. data from 1929-1975 for four categories of expenditure, tests
against the restricted version represented by the linear expenditure system yielded
largely insignificant results. On grouped British cross-section data pooled for two
separate years and employing a threefold categorization of expenditures, Pollak
and Wales (1978) obtain $\chi^2$ values of 8.2 (without demographics) and 17.7
(with demographics) in likelihood ratio tests against the linear expenditure
system. These tests have 3 degrees of freedom and are notionally significant at the
5% level (the 5% critical value of a $\chi^2_3$ variate is 7.8), but the study is based on
only 32 observations and involves estimation of a 3 × 3 unknown covariance
matrix. Hence, given the discussion in Section 2.6 above, a sceptic could reasona-
bly remain unconvinced of the importance of the quadratic terms for this
particular data set.
Another source of functional forms for Engel curves is the study of conditions
under which it is possible to aggregate over consumers and I shall discuss the
topic in Section 5 below.

3.2. Modelling demographic effects

In cross-section studies, households typically vary in much more than total


expenditure; age and sex composition varies from household to household, as do
the numbers and ages of children. These demographic characteristics have been

the object of most attention and I shall concentrate the discussion around them,
but other household characteristics can often be dealt with in the same way, (e.g.
race, geographical region, religion, occupation, pattern of durable good owner-
ship, and so on). If the vector of these characteristics is a, and superscripts denote
individual households, the general model becomes

$$q_i^h = g_i(x^h, p, a^h), \qquad(82)$$

with $g_i$ taken as common and, in many studies, with p assumed to be the same
across the sample and suppressed as an argument in the function.
The simplest methodology is to estimate a suitable linearization of (82) and one
question which has been extensively investigated in this way is whether there are
economies of scale to household size in the consumption of some or all goods. A
typical approach is to estimate

$$\ln q_i^h = \alpha_i + \beta_i\ln x^h + \gamma_i\ln n^h + u_i^h, \qquad(83)$$

where $n^h$ is the (unweighted) number of individuals in the household. Tests are
then conducted for whether $(\gamma_i + \beta_i - 1)$ is negative (economies of scale), zero (no
economies or diseconomies) or positive (diseconomies of scale), since this magni-
tude determines whether, at a given level of per capita outlay, quantity per head
decreases, remains constant, or increases. For example, Iyengar, Jain and
Srinivasan (1968), using (83) on data from the 17th round of the Indian N.S.S.,
found economies of scale for cereals and for fuel and light, with roughly constant
returns for milk and milk products and for clothing.
A more sophisticated approach attempts to relate the effects of characteristics
on demand to their role in preferences, so that the theory of consumer behavior
can be used to suggest functional forms for (82) just as it is used to specify
relationships in terms of prices and outlay alone. Such models can be used for
welfare analysis as well as for the interpretation of demand; I deal with the latter
here leaving the welfare applications to Section 7 below. A fairly full account of
the various models is contained in Deaton and Muellbauer (1980a, Chapter 8) so
that the following is intended to serve as only a brief summary.
Fully satisfactory models of household behavior have to deal both with the
specification of needs or preferences at the individual level and with the question
of how the competing and complementary needs of different individuals are
reconciled within the overall budget constraint. The second question is akin to the
usual question of social choice, and Samuelson (1956) suggested that family utility
$U^h$ might be written as

$$U^h = V\big[u^1(q^1),\ldots,u^{n^h}(q^{n^h})\big], \qquad(84)$$



for the $n^h$ individuals in household h. Such a form allows decentralized budgeting


over members subject to central (parental) control over members’ budgets.
Presumably the problems normally inherent in making interpersonal comparisons
of welfare are not severe within a family since, typically, such allocations seem to
be made in a satisfactory manner. Building on this idea, Muellbauer (1976~) has
suggested that utility is equalised within the family (e.g. for a maximin social
welfare function), so that if yr(u, p) is the cost function for individual r, the
family cost function is given by

which, if needs can be linked to, say, age through the y functions, would yield an
applicable specification with strong restrictions on behavior. However, such
models are somewhat artificial in that they ignore the ‘public’ or shared goods in
family consumption, though suitable modifications can be made. They also lack
empirical sharpness in that the consumption vectors of individual family members
are rarely observed. The exception is in the case of family labor supply, see
Chapter 32 of this volume.
Rather more progress has been made in the specification of needs under the
assumption that the family acts as a homogeneous unit. The simplest possibility is
that, for a given welfare level, costs are affected multiplicatively by some index
depending on characteristics and welfare, i.e.

$$c^h(u^h, p, a^h) = m(a^h, u^h)\,c(u^h, p), \qquad(86)$$

where $c(u^h, p)$ is the cost function for some reference household type, e.g. one
with a single adult. The index $m(a^h, u^h)$ can then be thought of as the number of
adult equivalents generated by $a^h$ at the welfare level $u^h$. Taking logarithms and
differentiating (86) with respect to $\ln p_i$ gives

$$w_i^h = \frac{\partial\ln c(u^h, p)}{\partial\ln p_i}, \qquad(87)$$

which is independent of $a^h$. Hence, if households face the same prices, those with
the same consumption patterns $w_i$ have the same $u^h$, so that by comparing their
outlays the ratio of their costs is obtained. By (86), this ratio is the equivalence
scale $m(a^h, u^h)$. This procedure derives directly from Engel's (1895) pioneering
work, see Prais and Houthakker (1955). In practice, a single good, food, is usually
used, although there is no reason why the model cannot be applied more generally
under suitable specification of the m and c functions in (86), see e.g. Muellbauer
(1977). For examples of the usual practice, see Jackson (1968), Orshansky (1965),
Seneca and Taussig (1971) and Deaton (1981c).
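
Operationally, the Engel method amounts to fitting a food-share curve for each household type and comparing the outlays at which the curves reach a common share; a minimal sketch (mine, using a Working-Leser form for each type; everything below is an illustrative assumption):

```python
import numpy as np

def engel_scale(x_ref, w_ref, x_h, w_h):
    """Fit w = a + b ln x for a reference type and for type h, then return the
    ratio of outlays at which both reach the same (typical) food share; by (86)
    this ratio estimates the equivalence scale m. Sketch only."""
    def fit(x, w):
        Z = np.column_stack([np.ones(len(x)), np.log(x)])
        return np.linalg.lstsq(Z, w, rcond=None)[0]
    (a_r, b_r), (a_h, b_h) = fit(x_ref, w_ref), fit(x_h, w_h)
    w0 = float(np.mean(w_ref))               # evaluate at a typical share level
    return np.exp((w0 - a_h) / b_h) / np.exp((w0 - a_r) / b_r)
```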
Although the Engel model is simple to apply, it has the long-recognised
disadvantage of neglecting any commodity-specific dimension to needs. Common
observation suggests that changes in demographic composition cause substitution
of one good for another as well as the income effects modelled by (86) and (87).
In a paper of central importance to the area, Barten (1964) suggested that
household utility be written

$$U^h = u(q^*), \qquad(88)$$

$$q_i^* = q_i/m_i(a^h), \qquad(89)$$

so that, using Pollak and Wales' (1981) later terminology, the demographic
variables generate indices which "scale" commodity consumption levels. The
Barten model is clearly equivalent to writing the cost function in the form

$$c^h(u^h, p, a^h) = c(u^h, p^*), \qquad(90)$$

$$p_i^* = p_im_i(a^h), \qquad(91)$$

for a cost function $c(u, p)$ for the reference household. Hence, if $g_i(x, p)$ are the
Marshallian demands for the reference household, household h's demands are given by

$$q_i^h = m_i(a^h)\,g_i(x^h, p^*). \qquad(92)$$

Differentiation with respect to $a_j$ gives

$$\frac{\partial\ln q_i}{\partial a_j} = \frac{\partial\ln m_i}{\partial a_j} + \sum_k e_{ik}\,\frac{\partial\ln m_k}{\partial a_j}, \qquad(93)$$

where $e_{ik}$ is the cross-price elasticity between i and k. Hence, a change in
demographic composition has a direct effect through the change in needs (on $m_i$)
and an indirect effect through the induced change in the "effective" price
structure. It is this recognition of the quasi-price substitution effects of demo-
graphic change, that "a penny bun costs threepence when you have a wife and
child", that is the crucial contribution of the Barten model. The specification itself
may well neglect other important aspects of the problem, but this central insight
is of undeniable importance.
The main competition to the Barten specification comes from the model
originally due to Sydenstricker and King (1921) but rediscovered and popularized
by Prais and Houthakker (1955). This begins from the empirical specification,
apparently akin to (89),

$$q_i^h = m_i(a^h)\,f_i\{x^h/m_0(a^h)\}, \qquad(94)$$

where $m_i(a^h)$ is the specific commodity scale, and $m_0(a^h)$ is some general scale.
In contrast to (93), we now have the relationship

$$\frac{\partial\ln q_i}{\partial a_j} = \frac{\partial\ln m_i}{\partial a_j} - e_i\,\frac{\partial\ln m_0}{\partial a_j}, \qquad(95)$$

where $e_i$ is the total expenditure elasticity of good i,
so that the substitution effects embodied in (93) are no longer present. Indeed, if
$x^h/m_0(a^h)$ is interpreted as a welfare indicator (which is natural in the context),
(94) can only be made consistent with (88) and (89) if indifference curves are
Leontief, ruling out all substitution in response to relative price change, see
Muellbauer (1980) for details, and Pollak and Wales (1981) for a different
interpretation.
On a single cross-section, neither the Barten model nor the Prais-Houthakker
model are likely to be identifiable. That there were difficulties with the
Prais-Houthakker formulation has been recognized for some time, see Forsyth
(1960) and Cramer (1969) and a formal demonstration is given in Muellbauer
(1980). In the Barten model, (93) may be rewritten in matrix notation as

F=(I+E)M, (96)

and we seek to identify M from observable information on F. In the most


favorable case, E may be assumed to be known (and suitable assumptions may
make this practical even on a cross-section, see Section 4.2 below). The problem
lies in the budget constraint, p·q = x, which implies w'(I + E) = 0, so that the
matrix (I + E) has at most rank n − 1. Hence, for any given F and E, both of
which are observable, there exist an infinite number of M matrices satisfying (96).
In practice, with a specific functional form, neither F nor E may be constant over
households so that the information matrix of the system could conceivably not be
singular. However, such identification, based on choice of functional form and the
existence of high nonlinearities, is inherently controversial. A much better
solution is the use of several cross-sections between which there is price variation and,
in such a case, several quite general functional forms are fully identified. For the
Prais-Houthakker model, (95) may be written as

F = M − em', (97)

where m = ∂ln m_0/∂a. From the budget constraint, w'F = 0 so that, since
w'e = 1, m' = w'M, which yields

F = (I − ew')M. (98)

Once again (I - ew’) is singular, and the identification problem recurs. Here price
information is likely to be of less help since, with Leontief preferences, prices
have only income effects. Even so, it is not difficult to construct Prais-Houthakker
models which are identified given sufficient variation in prices.
Since Prais and Houthakker, the model has nevertheless been used on a number
of occasions, e.g. by Singh (1972), (1973), Singh and Nagar (1973) and
McClements (1977), and it is unclear how identification was obtained in these
studies. The use of a double logarithmic formulation for f_i helps; as is well-known,
such a function cannot add up even locally, see Willig (1976), Varian (1978), and
Deaton and Muellbauer (1980a, pp. 19-20), so that the singularity arguments
given above cannot be used. Nevertheless, it seems unwise to rely upon a clear
misspecification to identify the parameters of the model. Coondoo (1975) has
proposed using an assumed independence of m_i from x as an identifying restriction;
this is ingenious but, unfortunately, turns out to be inconsistent with
the model. There are a number of other possible means of identification, see
Muellbauer (1980), but essentially the only practical method is the obvious one of
assuming a priori a value for one of the m_i's. By this means, the model can be
estimated and its results compared with those of the Barten model. Some results
for British data are given in Muellbauer (1977), (1980) and are summarized in
Deaton and Muellbauer (1980a, pp. 202-5). In brief, these suggest that each
model is rather extreme, the Prais-Houthakker with its complete lack of substitu-
tion and the Barten with its synchronous equivalence of demographic and price
substitution effects. If both models are normalized to have the same food scale,
the Prais-Houthakker model also tends to generate the higher scales for other
goods since, unless the income effects are very large, virtually all variations with
composition must be ascribed directly to the m_i's. The Barten scales are more
plausible but evidence suggests that price effects and demographic effects are not
linked as simply as is suggested by (93).
Gorman (1976) has proposed an extension to (90) which appears appropriate in
the light of this evidence. In addition to the Barten substitution responses he adds
fixed costs of children γ_i(a^h) say; hence (90) becomes

c^h(u^h, p, a^h) = p·γ(a^h) + c(u^h, p*), (99)

with (91) retained as before. Clearly, (99) generates demands of the form

q_i^h = γ_i(a^h) + m_i(a^h) g_i(x^h − p·γ(a^h), p*). (100)

Pollak and Wales (1981) call the addition of fixed costs "demographic translating"
as opposed to “demographic scaling” of the Barten model; the Gorman model
(99) thus combines translating and scaling. In their paper, Pollak and Wales test
various specifications of translating and scaling. Their results are not decisive but
tend to support scaling, with little additional explanatory power from translating
once scaling has been allowed for. Note, however, that the translating term in (99)
might itself form the starting point for the modelling, just as did the multiplicative
term in the Engel model. If the scaling terms in (99) are dropped, so that p
replaces p*, and if it is recognized that the child cost term p·γ(a^h) is likely to be
zero for certain "adult" goods, then for i an adult good, we have

q_i^h = h_i(u^h, p), (101)

independent of a^h. For all such goods, additional children exert only income
effects, a proposition that can be straightforwardly tested by comparing the ratios
of child to income derivatives across goods, while families with the same outlay
on adult goods can be identified as having the same welfare level. This is the
model first proposed by Rothbarth (1943) and later implemented by Henderson
(1949-50a), (1949-50b) and Nicholson (1949), see also Cramer (1969). Deaton
and Muellbauer (1983) have recently tried to reestablish it as a simply imple-
mented model that is superior to the Engel formulation for applications where
computational complexity is a problem.
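The adult-goods test is simple enough to sketch in a few lines of code; the following is my own minimal illustration (the data and all names are invented), regressing each adult good on outlay and the number of children and comparing the ratios of the two coefficients:

    import numpy as np

    # Invented household data: outlay x, number of children, two "adult" goods
    rng = np.random.default_rng(0)
    H = 500
    x = rng.lognormal(7.0, 0.5, H)
    kids = rng.integers(0, 4, H)
    # Under the Rothbarth model children act on adult goods only through income,
    # so the ratio (child derivative)/(income derivative) is common across goods.
    y1 = 0.10*x - 30.0*kids + rng.normal(0, 20, H)
    y2 = 0.05*x - 15.0*kids + rng.normal(0, 20, H)

    X = np.column_stack([np.ones(H), x, kids])
    for j, y in enumerate([y1, y2], 1):
        b = np.linalg.lstsq(X, y, rcond=None)[0]
        print(f"good {j}: child/income derivative ratio = {b[2]/b[1]:.1f}")

Equality of the printed ratios across adult goods is the testable implication; with real data the comparison would of course be made with a formal standard error.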

3.3. Zero expenditures and other problems

In microeconomic data on consumers' expenditure, it is frequently the case that


some units do not purchase some of the commodities, alcohol and tobacco being
the standard examples. This is of course entirely consistent with the theory of
consumer behavior; for example, two goods (varieties) may be very close to being
perfect substitutes so that (sub)utility for the two might be

u = α_1 q_1 + α_2 q_2, (102)

so that, if outlay is x, the demand functions are

q_i = x/p_i   if p_i/p_j < α_i/α_j,
    = 0       otherwise, (103)

for i, j = 1, 2 and for p_1 α_2 ≠ p_2 α_1. It is not difficult to design more complex (and
more realistic) models along similar lines. For a single commodity, many of these
models can be made formally equivalent to the Tobit, Tobin (1958), model

y_i* = x_i'β + u_i,
y_i = y_i*   if y_i* ≥ 0,
    = 0      otherwise, (104)

and the estimation of this is well-understood.
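For concreteness, a minimal sketch of Tobit estimation by maximum likelihood (my illustration, with invented data; scipy is assumed available):

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import norm

    def tobit_negloglik(theta, y, X):
        # theta = (beta, log sigma); censoring at zero as in (104)
        beta, sigma = theta[:-1], np.exp(theta[-1])
        xb = X @ beta
        pos = y > 0
        ll = norm.logpdf((y[pos] - xb[pos])/sigma).sum() - pos.sum()*np.log(sigma)
        ll += norm.logcdf(-xb[~pos]/sigma).sum()   # P(y* < 0) for the zeros
        return -ll

    rng = np.random.default_rng(1)
    n = 1000
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = np.maximum(X @ np.array([0.5, 1.0]) + rng.normal(size=n), 0.0)

    res = minimize(tobit_negloglik, np.zeros(3), args=(y, X), method="BFGS")
    print(res.x[:2], np.exp(res.x[2]))   # beta and sigma estimates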


However, there are a number of extremely difficult problems in applying the
Tobit model to the analysis of consumer behavior. First, there is typically more
than one good and whenever the demand for one commodity switches regime (i.e.
becomes positive having been zero, or vice versa), there are, in general, regime
changes in all the other demands, if only to satisfy the budget constraint. In fact,
the situation is a good deal more complex since, as will be discussed in Section 6
below, non-purchase is formally equivalent to a zero ration and the imposition of
such rations changes the functional form for other commodities in such a way as
to generate both income and substitution effects. With n goods in the budget,
and assuming at least one good purchased, there are 2^n − 1 possible regimes, each
with its own particular set of functional forms for the non-zero demands. Wales
and Woodland (1983) have shown how, in principle, such a problem can be
tackled and have estimated such a system for three goods using a
quadratic (direct) utility function. Even with these simplifying assumptions, the
estimation is close to the limits of feasibility. Lee and Pitt (1983) have demon-
strated that a dual approach is as complicated. An alternative approach may be
possible if only a small number (one or two) commodities actually take on zero
values in the sample. This is to condition on non-zero values, omitting all
observations where a zero occurs, and to allow specifically for the resulting
sample selection bias in the manner suggested, for example, by Heckman (1979).
This technique has been used by Blundell and Walker (1982) to estimate a system
of commodity demands simultaneously with an hours worked equation for
secondary workers.
The second problem is that it is by no means obvious that the Tobit specifica-
tion is correct, even for a single commodity. In sample surveys, zeros frequently
occur simply because the item was not bought over a relatively short enumeration
period (usually one or two weeks, and frequently less in developing countries).
Hence, an alternative to (104) might be

y_i* = x_i'β + u_i,
y_i = y_i*/π_i   with probability π_i,
y_i = 0          with probability (1 − π_i). (105)
Hence, if p(u_i) is the p.d.f. of u_i, the likelihood for the model is

L = Π_{y_i=0} (1 − π_i) Π_{y_i>0} π_i² p(π_i y_i − x_i'β). (106a)

This can be maximized directly to estimate β and the π_i given some low-parameter
specification for π_i. But note in particular that for π_i = π for all i and u_i taken as
i.i.d. N(0, σ²) the likelihood is, for n_0 the number of zero y_i's,

L = (1 − π)^{n_0} π^{2(n−n_0)} Π_{y_i>0} σ^{−1}φ{(π y_i − x_i'β)/σ}, (106b)

with φ the standard normal density. Hence OLS on the positive y_i's alone is
consistent and fully efficient for β/π and σ/π. The MLE of π is simply the ratio
of the number of positive y_i's to the sample size, so that, in this case, all
parameters are easily estimated. If this is the
true model, Tobit will not generally be consistent. However, note that (105) allows
y_i to be negative (although this may be very improbable) and ideally the Tobit
and the binary model should be combined. A not very successful attempt to do
this is reported in Deaton and Irish (1984). See also Kay, Keen and Morris (1984)
for discussion of the related problem of measuring total expenditure when there
are many zeroes.
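In the special case π_i = π just described, estimation really does reduce to counting and least squares; a minimal sketch with invented data:

    import numpy as np

    rng = np.random.default_rng(2)
    n, pi_true = 2000, 0.6
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    ystar = X @ np.array([2.0, 1.5]) + rng.normal(size=n)   # latent consumption
    buy = rng.random(n) < pi_true                           # purchase observed?
    y = np.where(buy, ystar/pi_true, 0.0)                   # recorded expenditure

    pos = y > 0
    pi_hat = pos.mean()                      # MLE of pi: share of positive y's
    b = np.linalg.lstsq(X[pos], y[pos], rcond=None)[0]      # estimates beta/pi
    print(pi_hat, pi_hat*b)                  # recover beta as pi_hat * (beta/pi)

(As the text notes, the latent y* can in principle be negative, a possibility the sketch, like (105), simply tolerates.)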
In my view, the problem of dealing appropriately with zero expenditures is
currently one of the most pressing in applied demand analysis. We do not have a
theoretically satisfactory and empirically implementable method for modelling
zeroes for more than a few commodities at once. Yet all household surveys show
large fractions of households reporting zero purchases for some goods. Since
household surveys typically contain several thousand observations, it is im-
portant that procedures be developed that are also computationally inexpensive.
There are also a number of other problems which are particularly acute in
cross-section analysis and are not specific to the Tobit specification. Heteroscedasticity
tends to be endemic in work with micro data and, in my own practical
experience, is extremely difficult to remove. The test statistics proposed by
Breusch and Pagan (1979) and by White (1980) are easily applied, and White has
proposed an estimator for the variance-covariance matrix which is consistent
under heteroscedasticity and does not require any specification of its exact form.
Since an adequate specification seems difficult in practice, and since in micro
studies efficiency is rarely a serious problem, White’s procedure is an extremely
valuable one and should be applied routinely in large cross-section regressions.
Note, however, that with Tobit-like models, untreated heteroscedasticity generates
inconsistency in the parameter estimates, see Chapter 27, thus presenting a much
more serious problem. The heteroscedasticity introduced by grouping has become

less important as grouped data has given way to the analysis of the original micro
observations, but see Haitovsky (1973) for a full discussion.
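White's covariance matrix is easily computed directly; a minimal sketch (my illustration, numpy only):

    import numpy as np

    def ols_white(y, X):
        # OLS with White's (1980) heteroscedasticity-consistent covariance
        XtX_inv = np.linalg.inv(X.T @ X)
        b = XtX_inv @ X.T @ y
        e = y - X @ b                              # residuals
        meat = X.T @ (X * (e**2)[:, None])         # sum of e_i^2 x_i x_i'
        V = XtX_inv @ meat @ XtX_inv               # the "sandwich"
        return b, np.sqrt(np.diag(V))

    rng = np.random.default_rng(3)
    n = 1000
    x = rng.uniform(1, 5, n)
    X = np.column_stack([np.ones(n), x])
    y = 1.0 + 0.5*x + rng.normal(0, 0.3*x)         # variance rising with x
    print(ols_white(y, X))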
Finally, there are a number of largely unresolved questions about the way in
which survey design should be taken into account (if at all) in econometric
analysis. One topic is whether or not to use inverse probability weights in
regression analysis, see e.g. DuMouchel and Duncan (1983) for a recent discus-
sion. The other concerns the possible implications for regression analysis of
Godambe’s (1955) (1966) theorem on the non-existence of uniformly minimum
variance or maximum likelihood estimators for means in finite populations, see
Cassel, Sarndal and Wretman (1977) for a relatively cool discussion.

3.4. Non-linear budget constraints

Consumer behavior with non-linear budget constraints has been extensively


discussed in the labor supply literature, where tax systems typically imply a
non-linear relationship between hours worked and income received, see Chapter
32 in this Handbook and especially Hausman (1985). I have little to add to
Hausman’s excellent treatment, but would nevertheless wish to emphasize the
potential for these techniques in demand analysis, particularly in “special”

markets. Housing is the obvious example, but here I illustrate with a simple case
based on Deaton (1984). In many developing countries, the government operates
so-called "fair price" shops in which certain commodities, e.g. sugar or rice, are
made available in limited quantities at subsidized prices. Typically, consumers
can buy more than the fair price allocation in the free market at a price p_1, with
p_1 > p_0, the fair price shop's price. Figure 2 illustrates for "sugar" versus a numeraire
good with unit price. Z is the amount available in the fair price shop and the
budget constraint assumes that resale of surplus at free market prices is impossible.

Figure 2. Budget constraint for a fair price shop (sugar, with allocation Z, against other goods).
There are two interrelated issues here for empirical modelling. At the micro
level, using cross-section data, we need to know how to use utility theory to
generate Engel curves. At the macro-level, it is important to know how the two
prices p_0 and p_1 and the quantity Z affect total demand. As usual, we begin with
the indirect utility function, though the form of this can be dictated by prior
beliefs about demands (e.g. there has been heavy use of the indirect utility
function associated with a linear demand function for a single good; for the
derivation, see Deaton and Muellbauer (1980a, p. 96) (1981) and Hausman
(1980)). Maximum utility along AD is u_0 = ψ(x, p, p_0) with associated demand,
by Roy's identity, of s_0 = g(x, p, p_0). Now, by standard revealed preference, if
s_0 < Z, s_0 is optimal since BC is obtainable by a consumer restricted to being
within AD. Similarly, maximum utility along EC is u_1 = ψ(x + (p_1 − p_0)Z, p, p_1)
with s_1 = g(x + (p_1 − p_0)Z, p, p_1). Again, if s_1 > Z, then s_1 is optimal. The
remaining case is s_0 > Z and s_1 < Z (both of which are infeasible), so that sugar
demand is exactly Z (at the kink B). Hence, for individual h with expenditure x^h
and quota Z^h, the demand functions are given by

s^h = g^h(x^h, p, p_0)   if g^h(x^h, p, p_0) < Z^h, (107)

s^h = g^h(x^h + (p_1 − p_0)Z^h, p, p_1)   if g^h(x^h + (p_1 − p_0)Z^h, p, p_1) > Z^h, (108)

s^h = Z^h   if g^h(x^h + (p_1 − p_0)Z^h, p, p_1) ≤ Z^h ≤ g^h(x^h, p, p_0). (109)

Figure 3 gives the resulting Engel curve. Estimation on cross-section data is


straightforward by an extension of the Tobit method; the demand functions g^h
are endowed with taste variation in the form of a normally distributed random
term, and a likelihood with three "branches" corresponding to s^h < Z^h, s^h = Z^h,
and s^h > Z^h is constructed. The middle branch corresponds to the zero censoring
for Tobit; the outer two are analogous to the non-censored observations in Tobit.
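A minimal sketch of that three-branch likelihood, assuming (purely for illustration) linear demands with an additive normal taste shock; all names are hypothetical and the function would be minimized with, e.g., scipy.optimize.minimize:

    import numpy as np
    from scipy.stats import norm

    def negloglik(theta, s, x, Z, p0, p1):
        a, b, c, sigma = theta[0], theta[1], theta[2], np.exp(theta[3])
        m0 = a + b*x + c*p0                      # mean demand, quota branch (107)
        m1 = a + b*(x + (p1 - p0)*Z) + c*p1      # mean demand, free-market branch (108)
        lo, hi = s < Z, s > Z
        at = ~(lo | hi)                          # households exactly at the kink (109)
        ll = np.empty_like(s)
        ll[lo] = norm.logpdf(s[lo], m0[lo], sigma)
        ll[hi] = norm.logpdf(s[hi], m1[hi], sigma)
        ll[at] = np.log(norm.cdf((Z[at] - m1[at])/sigma)
                        - norm.cdf((Z[at] - m0[at])/sigma) + 1e-300)
        return -ll.sum()

The two density branches are the analogues of the uncensored Tobit contributions; the kink branch is the probability that the taste shock leaves the household between the two regimes.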
The aggregate free-market demand for sugar can also be analysed using the
model. To simplify, assume that households differ only in outlay, x^h. Define x_r
by g{x_r + (p_1 − p_0)Z, p, p_1} = Z, so that consumers with x > x_r enter the free
market.

Figure 3. Engel curve with a non-linear budget constraint.

Hence per capita free market demand is

s̄ = ∫ max[g{x + (p_1 − p_0)Z, p, p_1} − Z, 0] dF(x), (110)

which, from the definition of x_r, is simply

s̄ = ∫_{x_r}^∞ [g{x + (p_1 − p_0)Z, p, p_1} − Z] dF(x), (111)

so that

∂s̄/∂Z = ∫_{x_r}^∞ {(p_1 − p_0) ∂g/∂x − 1} dF(x). (112)

Since, at the extensive margin, consumers buy nothing in the free market, only the
intensive margin is of importance. Note that all of these estimations and
calculations take a particularly simple form if the Marshallian demand functions
are assumed to be linear, so that, even in this non-standard situation, linearity can
still greatly simplify.
The foregoing is a very straightforward example but it illustrates the flavor of
the analysis. In practice, non-linear budget constraints may have several kink
points and the budget set may be non-convex. While such things can be dealt
with, e.g. see King (1980) or Hausman and Wise (1980) for housing, and Reece
and Zieschang (1984) for charitable giving, the formulation of the likelihood
becomes increasingly complex and the computations correspondingly more

burdensome. While virtually all likelihood functions can be maximized in princi-


ple, doing so for real applied examples with several thousand observations can be
prohibitively expensive.

4. Separability

In the conventional demand analysis discussed so far, a number of important


assumptions have not been justified. First, demand within each period is analysed
conditional on total expenditure and prices for that period alone, with no
mention of the broader determinants of behavior, wealth, income, other prices
and so on. Second, considerations of labor supply were completely ignored.
Third, no attention was given to questions of consumption and saving or to the
problems arising for goods which are sufficiently durable to last for more than
one period. Fourth, the practical analysis has used, not the elementary goods of
the theory, but rather aggregates such as food, clothing, etc., each with some
associated price index. Separability of one sort or another is behind each of these
assumptions and this section gives the basic results required for applied analysis.
No attempt is made to give proofs; for more detailed discussion the reader may
consult Blackorby, Primont and Russell (1978), Deaton and Muellbauer (1980a,
Chapter 5), or the original creator of much of the material given here, Gorman
(1959) (1968), as well as many unpublished notes.

4.1. Weak separability

Weak separability is the central concept for much of the analysis. Let q_A be some
subvector of the commodity vector q so that q = (q_A, q_Ā) without loss of
generality. q_A is then said to be (weakly) separable if the direct utility function
takes the form

u = f(u_A(q_A), q_Ā), (113)

u_A(q_A) is the subutility (or felicity) function associated with q_A. This equation is
equivalent to the existence of a preference ordering over q_A alone; choices over
the q_A bundles are consistent independent of the vector q_Ā. More symmetrically,
preferences as a whole are said to be separable if q can be partitioned into
(q_A, q_B,..., q_N) such that

u = u(u_A(q_A), u_B(q_B),..., u_N(q_N)). (114)

Since u is increasing in the subutility levels, it is immediately obvious that


maximization of overall u implies maximization of the subutilities subject to
whatever is optimally spent on the groups. Hence, (113) implies the existence of
subgroup demands

q_i = g_i^A(x_A, p^A), i ∈ A, (115)

where x_A = p^A·q^A, while (114) has the same implication for all groups. Hence, if
preferences in a life-cycle model are weakly separable over time periods, commodity
demand functions conditional on x and p for each time period are guaranteed
to exist. Similarly, if goods are separable from leisure, commodity demand
functions of the usual type can be justified.
Tests of these forms of separability can be based on the restrictions on the
substitution matrix implied by (115). If i and j are two goods in distinct groups,
i ∈ G, j ∈ H, G ≠ H, then the condition

s_ij = μ_GH (∂q_i/∂x)(∂q_j/∂x), (116)

for some quantity μ_GH (independent of i and j), is both necessary and sufficient
for (114) to hold. If a general enough model of substitution can be estimated,
(116) can be used to test separability, and Byron (1968), Jorgenson and Lau
(1975) and Pudney (1981b) have used essentially this technique to find separability
patterns between goods within a single period. Barnett (1979a) has tested the
important separability restriction between goods and leisure using time series
American data and decisively rejects it. If widely repeated, this result would
suggest considerable misspecification in the traditional studies. It is also possible
to use a single cross-section to test separability between goods and leisure.
Consider the following cost function proposed by Muellbauer (1981b),

c(u, w, p) = d(p) + b(p)w + {a(p)}^{1−β} w^β u, (117)

where w is the wage and d(p), b(p) and a(p) are functions of p, homogeneous of
degrees 1, 0 and 1 respectively. Shephard's Lemma gives immediately

(118)

for transfer income μ, hours worked h and parameters α, β, γ all constant in a
single cross-section. It may be shown that (117) satisfies (114) for leisure vis-à-vis
goods if and only if b(p) is a constant, which for (118) implies that β_i/γ_i be
independent of i, i = 1,..., n. This can be tested by first estimating (118) as a
system by OLS equation by equation and then computing the Wald test for the
(n − 1) restrictions, i = 1,..., (n − 1),

β_i γ_n − γ_i β_n = 0. (119)
This does not involve estimating the restricted nonlinear model. My own results
on British data, Deaton (1981b), suggest relatively little conflict with separability;
however, earlier work by Atkinson and Stern (1981) on the same data but using
an ingenious adaptation of Becker's (1965) time allocation model, suggests the
opposite. Blundell and Walker (1982), using a variant of (117), reject the hypothesis
that wife's leisure is separable from goods. Separability between different time
periods is much more difficult to test since it is virtually impossible to provide
general unrestricted estimates of the substitution responses between individual
commodities across different time periods.
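Computing the Wald statistic for (119) is mechanical; a minimal sketch (my illustration; it presumes stacked estimates of the β's and γ's with covariance matrix V from the equation-by-equation OLS step):

    import numpy as np
    from scipy.stats import chi2

    def wald_119(beta, gamma, V):
        # Test beta_i*gamma_n - gamma_i*beta_n = 0, i = 1,...,n-1, by the delta method.
        # Parameter order assumed: (beta_1..beta_n, gamma_1..gamma_n).
        n = len(beta)
        r = beta[:-1]*gamma[-1] - gamma[:-1]*beta[-1]
        R = np.zeros((n - 1, 2*n))               # Jacobian of the restrictions
        for i in range(n - 1):
            R[i, i] = gamma[-1]                  # d r_i / d beta_i
            R[i, n - 1] = -gamma[i]              # d r_i / d beta_n
            R[i, n + i] = -beta[-1]              # d r_i / d gamma_i
            R[i, 2*n - 1] = beta[i]              # d r_i / d gamma_n
        W = r @ np.linalg.solve(R @ V @ R.T, r)
        return W, chi2.sf(W, n - 1)              # statistic and p-value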
Subgroup demand functions are only a part of what the applied econometrician
needs from separability. Just as important is the question of whether it is possible
to justify demand functions for commodity composites in terms of total expendi-
ture and composite price indices. The Hicks (1936) composite commodity theo-
rem allows this, but only at the price of assuming that there are no relative price
changes within subgroups. Since there is no way of guaranteeing this, nor often
even of checking it, more general conditions are clearly desirable. In fact, the
separable structure (114) may be sufficient in many circumstances. Write u_A, u_B,
etc. for the values of the felicity functions and c_A(u_A, p^A) etc. for the subgroup
cost functions corresponding to the u_A(q^A) functions. Then the problem of
choosing the group expenditure levels x_A, x_B,... can be written as

max u = u(u_A, u_B,..., u_N), (120)

s.t. x = Σ_R c_R(u_R, p^R).

Write

c_R(u_R, p^R) = c_R(u_R, p̄^R)·[c_R(u_R, p^R)/c_R(u_R, p̄^R)], (121)

for some fixed prices p̄^R. For such a fixed vector, c_R(u_R, p̄^R) is a welfare
indicator or quantity index, while the ratio c_R(u_R, p^R)/c_R(u_R, p̄^R) is a true (sub)
cost-of-living price index comparing p^R and p̄^R using u_R as reference, see Pollak
(1975). Finally, since u_R = ψ_R(c_R(u_R, p̄^R), p̄^R), (120) may be written

max u = u{ψ_A(c_A(u_A, p̄^A), p̄^A), ψ_B(c_B(u_B, p̄^B), p̄^B),...}, (122)

s.t. Σ_R c_R(u_R, p̄^R)·[c_R(u_R, p^R)/c_R(u_R, p̄^R)] = x,
which is a standard utility maximization problem in which the constant price
utility levels c_R(u_R, p̄^R) are the quantities and the indices c_R(u_R, p^R)/c_R(u_R, p̄^R)
are the prices. Of course, neither of these quantities is directly observable and the
foregoing analysis is useful only to the extent that c_R(u_R, p̄^R) is adequately
approximated by the constant price composite p̄^R·q^R and the price index by the
implicit price deflator p^R·q^R/p̄^R·q^R. The approximations will be exact under the
conditions of the composite commodity theorem, but may be very good in many
practical situations where prices are highly but not perfectly collinear. If so, the
technique has the additional advantage of justifying the price and quantity indices
typically available in the national accounts statistics. An ideal solution not relying
on approximations requires quantity indices depending only on quantities and
price indices depending only on prices. Given weak separability, this is only
possible if either each subcost function is of the form c_G(u_G, p^G) = θ_G(u_G)b_G(p^G),
so that the subgroup demands (115) display unit elasticity for all goods with
respect to group outlay, or each indirect felicity function takes the "Gorman
generalized polar form"

u_G = F_G[x_G/b_G(p^G)] + a_G(p^G), (123)

for suitable functions F_G, b_G and a_G, the first monotone increasing, the latter two
linearly homogeneous, and the utility function (114) or (120) must be additive in
the individual felicity functions. Additivity is restrictive even between groups, and
will be further discussed below, but (123) permits fairly general forms of Engel
curves, e.g. the Working form, AIDS, PIGL and the translog (61) if Σ_i Σ_j β_ij = 0.
See Blackorby, Boyce and Russell (1978) for an empirical application, and
Anderson (1979) for an attempt to study the improvement over standard practice
of actually computing the Gorman indices. In spite of this analysis, there seems to
be a widespread belief in the profession that homothetic weak separability is
necessary for the empirical implementation of two-stage budgeting (which is itself
almost the only sensible way to deal with very large systems) - see the somewhat
bizarre exchanges in the 1983 issue of the Journal of Business and Economic
Statistics. In my view, homothetic separability is likely to be the least attractive of
the alternatives given here; it is rarely sensible to maintain without testing that
subgroup demands have unit group expenditure elasticities. In many cases, prices
will be sufficiently collinear for the problem (122) to give an acceptably accurate
representation. And if not, additivity between broad groups together with the very
flexible Gorman generalized polar form should provide an excellent alternative.
Even failing these possibilities, there are other types of separability with useful
empirical properties, see Blackorby, Primont and Russell (1978) and Deaton and
Muellbauer (1980a, Chapter 5).
One final issue related to separability is worth noting. As pointed out by
Blackorby, Primont and Russell (1977), flexible functional forms do not in

general remain flexible under the global imposition of separability restrictions.


Hence, a specific functional form which offers a local second-order approximation
to an arbitrary utility function may not be able to similarly approximate, say, an
arbitrary additive utility function once its parameters are restricted to render it
globally additive. For example, Blackorby et al. show that weak separability of
the translog implies either strong separability or homothetic separability so that
the translog cannot model non-homothetic weak separability. The possibility of
imposing and testing restrictions locally (say, at the sample mean) remains, but
this is less attractive since it is difficult to discriminate between properties of the
data generation process and the approximating properties of the functional form.

4.2. Strong separability and additivity

Strong separability restricts (114) to the case where the overall function is
additive, i.e. for some monotone increasing f,

u = f(Σ_R u_R(q^R)). (124)

If each of the groups q^R contains a single good, preferences are said to be


additive, or that wants are independent. I deal with this case for simplicity since
all the additional features over weak separability occur between groups rather
than within them. The central feature of additivity is that any combination of
goods forms a separable set from any other, so that (116) must hold without the
G, H labels on μ_GH, i.e. for some μ and for all i, j in different groups (i ≠ j
under additivity)

s_ij = μ (∂q_i/∂x)(∂q_j/∂x). (125)

The budget constraint (or homogeneity) can be used to complete this for all i and
j; in elasticity terms, the relationship is, Frisch (1959), Houthakker (1960),

e_ij = φδ_ij e_i − e_i w_j(1 + φe_j), (126)

for some scalar φ, Kronecker delta δ_ij, (uncompensated) cross-price elasticity e_ij,
and total expenditure elasticity e_i. This formula shows immediately the strengths
and weaknesses of additivity. Apart from the data w_i, knowledge of the (n − 1)
independent e_i's together with the quantity φ (obtainable from knowledge of one
single price elasticity) is sufficient to determine the whole (n × n) array of price
elasticities.
Additivity can therefore be used to estimate price elasticities on data with little or

no relative price variation, e.g. on cross-sections, on short-time series, or in


centrally planned economies where relative prices are only infrequently altered.
This was first realised by Pigou (1910) and the idea has a distinguished history in
the subject, see Frisch (1932) (1959) and the enormous literature on the (additive)
linear expenditure system [for Eastern European experience, see Szakolczai (1980)
and Fedorenko and Rimashevskaya (1981)]. Conversely, however, there is very
little reason to suppose that (126) is empirically valid. Note, in particular, that for
w_i small relative to e_i (as is usually the case), e_ii ≈ φe_i (as Pigou pointed out), and
there seems no grounds for such a proportionality relationship to be generally
valid. Indeed such tests as have been carried out, Barten (1969), Deaton (1974b)
(1975a) (1975b), Theil (1975b), suggest that additivity is generally not true, even
for broad categories of goods. Nevertheless, the assumption continues to be
widely used, for example in the interesting cross-country work of Theil and Suhm
(1982), no doubt because of its economy of parametrization (= high level of
restrictiveness). There is also a substantial industry in collecting estimates of the
parameter φ under the (entirely baseless) supposition that it measures the inverse
of the elasticity of the marginal utility of money.
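To see how much (126) delivers from how little, a minimal numerical sketch (the shares, expenditure elasticities and φ below are invented):

    import numpy as np

    w = np.array([0.3, 0.2, 0.4, 0.1])       # budget shares
    e = np.array([0.5, 0.8, 1.2, 2.1])       # total expenditure elasticities
    assert abs(w @ e - 1.0) < 1e-9           # Engel aggregation
    phi = -0.5

    # Equation (126): e_ij = phi*e_i*delta_ij - e_i*w_j*(1 + phi*e_j)
    E = phi*np.diag(e) - np.outer(e, w*(1 + phi*e))
    print(np.round(E, 3))
    print(np.round(w @ E + w, 12))           # Cournot aggregation: should be zero

The whole 4 × 4 matrix of uncompensated price elasticities follows from three free expenditure elasticities and the single scalar φ; this economy is precisely what makes the assumption both attractive and suspect.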
Few of the practical objections to additivity apply to its use in an intertemporal
context and it is standard practice to specify life-time preferences by (124) where
the R’s refer to time periods, an example being Lluch’s (1973) intertemporal
linear expenditure system (ELES), although this is also additive within periods.
One elegant way of exploiting additivity is again due to Gorman (1976) and
utilizes the concept of a "consumer profit function". Define π(p, r) by

π(p, r) = max_q {−p·q + ru; u = u(q)}, (127)

for concave u(q), so that the consumer sells utility (to him or herself) at a price r
(= the reciprocal of the marginal utility of money) using inputs q at prices p.
Now if u(q) has the explicitly additive form Σ_R u_R(q^R), so will π(p, r), i.e.

π(p, r) = Σ_R π_R(p^R, r). (128)

Now π(p, r) also has the derivative property q = −∇_p π(p, r), so that for i
belonging to group R,

q_i = −∂π_R(r, p^R)/∂p_Ri, (129)

which depends only on within group prices and the single price of utility r which
is common to all groups and provides the link between them. In the intertemporal
context, r is the price of lifetime utility, which is constant under certainty or
follows (approximately) a random walk under uncertainty, while p^R is within-period
prices. Hence, as realized by MaCurdy and utilized in Heckman (1978),


Heckman and MaCurdy (1980), and MaCurdy (1981), eq. (129) can be imple-
mented on panel data by treating r as a fixed effect so that only data on current
magnitudes are required. Since these are typically the only data available, the
technique is of considerable importance. See Browning, Deaton and Irish (1984)
for further discussion of profit functions and additivity and for an application to
British data (in which the simple life-cycle model of the simultaneous determination
of consumption and labor supply has some difficulty in dealing with the
evidence).
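As an illustration of how (129) works, suppose (purely as an example) that each period's felicity is isoelastic, u_t(q_t) = A_t q_t^ρ/ρ with ρ < 1. The first-order condition p_t = r A_t q_t^{ρ−1} from (127) gives the Frisch demand

ln q_t = [1/(ρ − 1)](ln p_t − ln A_t) − [1/(ρ − 1)] ln r,

so that on panel data the unobservable ln r enters each household's equation as an additive fixed effect, removable by within-household differencing, and only current prices and quantities are needed for estimation.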
Another important use of separability in general and of additivity in particular
is as a vehicle for the structuring and interpretation of preference patterns. For
example, in the “characteristics” model of consumer behaviour pioneered by
Gorman (1956, 1980), Stone (1956) and Lancaster (1966), and recently estimated
by Pudney (1981a), it is a transformation of the goods which generates utility, and
it may be quite plausible to assume that preferences are separable or even additive
in the transformed characteristics (food, shelter, mate, etc.) rather than in the
market goods which have no direct role in satisfying wants. One possibility,
extensively explored by Theil and his co-workers, e.g. Theil (1976) and Theil and
Laitinen (1981) for a review, is that preferences are additive over characteristics
given by a linear transform of the market goods. Theil and Laitinen use the
Rotterdam model and, by a technique closely related to factor analysis, rotate the
axes in goods space to obtain the “preference independence transform”. Applied
to the demand for beef, pork and chicken in the U.S., the model yields the
transformed goods "inexpensive meat", "beef/pork contrast" and "antichicken",
Theil (1976, p. 287). These characteristics may indeed reflect real aspects of
preference structures in the U.S., but as is often the case with factor analytical
techniques (see e.g. Armstrong (1967) for an amusing cautionary tale) there is
room for some (largely unresolvable) scepticism about the validity and value of
any specific interpretations.

5. Aggregation over consumers

Clearly, on micro or panel data, aggregation is not an issue, and as the use of such
data increases, the aggregation problem will recede in importance. However,
much demand analysis is carried out on macroeconomic aggregate or per capita
data, and it is an open question as to whether this makes sense or not. The topic
is a large one and I present only the briefest discussion here, see Deaton and
Muellbauer (1980a, Chapter 6) for further discussion and references. At the most
general level, average aggregate demand q̄_i is given by

q̄_i = G_i(x_1, x_2,..., x_h,..., x_H, p), (130)


for the H outlays x_h of household h. The function G_i can be given virtually any
properties whatever depending on the configuration of individual preferences. If,
however, the outlay distribution were fixed in money terms, x_h = k_h x̄ for
constants k_h, (130) obviously gives

q̄_i = G_i*(x̄, p), (131)

although without restrictions on preferences, see e.g. Eisenberg (1961), Pearce


(1964), Chipman (1974), and Jerison (1984), there is no reason to suppose that the
G_i* functions possess any of the usual properties of Marshallian demands. Of
course, if the utility (real outlay) distribution is fixed, Hicksian demands aggre-
gate in the same way as (130) and (131) and there exist macro demand functions
with all the usual properties. There is very little relevant empirical evidence on the
movement over time of either the outlay or the utility distribution, but see
Simmons (1980) for some conjectures for the U.K.
If the distribution of outlay is not to be restricted in any way, formulae such as
(131) can only arise if mean preserving changes in the x-distribution have no
effect on aggregate demand, i.e. if all individuals have identical marginal propensities
to spend on each of the goods. This condition, of parallel linear Engel
curves, dates back to Antonelli (1886), but is usually (justly) credited to Gorman
(1953) (1961). As he showed, utility maximizing consumers have parallel linear
Engel curves if and only if the individual cost functions have the form

c^h(u^h, p) = a^h(p) + u^h b(p), (132)

a specification known as the “Gorman polar form”. Suitable choice of the ah(p)
and b(p) functions permits (132) to be a flexible functional form, Diewert
(1980a), but the uniformity across households implied by the need for all Engel
curves to be parallel seems implausible. However, it should be noted that a single
cross-section is insufficient to disprove the condition since, in principle, and
without the use of panel data, variation in the ah(p) functions due to non-outlay
factors cannot be distinguished from the direct effects of variations in xh. A
somewhat weaker form of the aggregation condition, emphasized by Theil (1954)
(1975 Chapter 4) is that the marginal propensities to consume be distributed
independently of the xh, see also Shapiro (1976) and Shapiro and Braithwait
(1979). Note finally that if aggregation is to be possible for all possible income
distributions, including those for which some people have zero income, then the
parallel linear Engel curves must pass through the origin so that ah(p) in (132) is
zero and preferences are identical and homothetic.
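The force of the Gorman condition is easily seen numerically; in the sketch below (my illustration), every household has the same marginal propensity, so aggregate demand depends on the outlay distribution only through its mean:

    import numpy as np

    rng = np.random.default_rng(4)
    H = 10000
    a = rng.uniform(0, 5, H)     # household-specific intercepts a^h(p)
    b = 0.3                      # common marginal propensity: parallel Engel curves

    x1 = rng.lognormal(3.0, 0.8, H)          # one outlay distribution
    x2 = rng.exponential(1.0, H)
    x2 *= x1.mean()/x2.mean()                # a very different one, same mean

    print((a + b*x1).mean(), (a + b*x2).mean())   # identical aggregate demands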
If, however, the casual evidence against any form of linear Engel curves is
taken seriously, exact aggregation requires the abandonment of (131), at least in
principle. One set of possibilities has been pursued by Muellbauer (1975b)
(1976a) (1976b) who examines conditions under which the aggregate budget share
of each good can be expressed as a function of prices and a single indicator of x,
not necessarily the mean. If, in addition, this indicator is made independent of
prices, the cost functions must take the form

c(u, p) = {(1 − u)a(p)^α + u b(p)^α}^{1/α}, (133)

called by Muellbauer, "price-independent generalised linearity" (PIGL). With
α = 1, PIGL is essentially the Gorman polar form and the Engel curves are linear;
otherwise, α controls the curvature of the Engel curves with, for example, the
AIDS and Working-Leser forms as special cases when α = 0. The macro relationships
corresponding to (133) render q̄_i a function of both x̄ and of the mean of
order (1 − α) of the outlay distribution. Hence, if α = −1, the Engel curves are
quadratic and the average aggregate demands depend upon the mean and
variance of x. This opens up two new possibilities. On the one hand, the
presumed (or estimated) curvature of the Engel curves can be used to formulate
the appropriate index of dispersion for inclusion in the aggregate demands, see
e.g. the papers by Berndt, Darrough and Diewert (1977) and by Simmons (1980)
both of which use forms of (133). On the other hand, if the income and hence outlay
distribution changes very little over time, such models allow the dispersion terms
to be absorbed into the function and justify the use of (131) interpreted as a
conventional Marshallian demand function, see e.g. Deaton and Muellbauer
(1980b). This position seems defensible in the light of the many studies which,
using one technique or another, have failed to find any strong influence of the
income distribution on consumer behaviour.
Recent theoretical work on aggregation has suggested that the generalized
linearity and price independent generalised linearity forms of preference have a
more fundamental role to play in aggregation than solving the problem posed by
Muellbauer. Jerison (1984) has shown that the generalized linearity conditions are
important for aggregation with fixed income distribution, while Freixas and
Mas-Colell (1983) have proved the necessity of PIGL for the weak axiom of
revealed preference to hold in aggregate if the income distribution is unrestricted.
(Note that Hildenbrand's (1983) proof that WARP holds on aggregate data
requires that the density of the income distribution be monotone declining and
have support (0, ∞), so that modal income is zero!)
In a more empirical vein, Lau (1982) has considered a more general form of
aggregation than that required by (131). Lau considers individual demand func-
tions of the form g^h(x^h, p, a^h) for budget x^h, prices p and attributes (e.g.
demographics) a^h. His first requirement is that Σ_h g^h(x^h, p, a^h) be symmetric in
the H x^h's and a^h's, i.e. be invariant to who has what x and what a. This alone
is sufficient to restrict demands to the form

g^h(x^h, p, a^h) = g(x^h, p, a^h) + k^h(p), (134)
i.e. to be identical up to the addition of a function of prices alone. Lau then
derives the conditions under which aggregate demands are a function of not the
H x's and a's, but of a smaller set of m indices, m < H. Lau shows that

Σ_h g^h(x^h, p, a^h) = G{p, f_1(x, a),..., f_m(x, a)}, (135)

with f_i(x, a) non-constant symmetric functions of the H-vectors x and a, implies
that

g(x^h, p, a^h) = Σ_{k=1}^m h_k(p) ψ_k(x^h, a^h). (136)

Gorman's (1981) theorem, see 3(a) above, tells us what form the ψ_k functions can
take, while Lau's theorem makes Gorman's results the more useful and important.
Lau’s theorem provides a useful compromise between conventional aggregation as
represented by (131) on the one hand and complete agnosticism on the other.
Distributional effects on demand are permitted, but in a limited way. Gorman’s
results tell us that to get these benefits, polynomial specifications are necessary
which either link quantities to outlays or shares to the logarithms of outlays. The
latter seem to work better in practice and are therefore recommended for use.
Finally, mention must be made of the important recent work of Stoker who, in
a series of papers, particularly (1982) (1984), has forged new links between the
statistical and economic theories of aggregation. This work goes well beyond
demand analysis per se but has implications for the subject. Stoker (1982) shows
that the estimated parameters from cross-section regressions will estimate the
corresponding macro-effects not only under the Gorman perfect aggregation
conditions, but also if the independent variables are jointly distributed within the
exponential family of distributions. In the context of demand analysis, the
marginal propensity to consume from a cross-section regression would con-
sistently estimate the impact of a change in mean income on mean consumption
either with linear Engel curves or with non-linear Engel curves and income
distributed according to some exponential family distribution. Since one of the
reasons we are interested in aggregation is to be able to move from micro to
macro in this way, these results open up new possibilities. Stoker (1984) also
carries out the process in reverse and derives completeness (or identification)
conditions on the distribution of exogenous variables that allow recovery of micro
behavior from macro relationships.
Much of the work reported in this section, by Muellbauer, Lau and Stoker, can
be regarded as developing the appropriate techniques of allowing for the impacts
of distribution on aggregate demand functions. That such effects could be
potentially important has been known for a long time, see de Wolff (1941) for an
early contribution. What still seems to be lacking so far is empirical evidence that
such effects are actually important.

6. Behavior under quantity constraints

The existence and consequences of quantity constraints on purchases has recently


been given much attention in the literature and the question of whether (or how)
the labor market clears remains of central importance for much of economic
analysis, see Ashenfelter (1980) for a good discussion in which rationing is taken
seriously. If empirical studies of consumer behavior are to contribute to this
discussion, they must be able to model the effects of quantity rationing on
purchases in other markets and be able to test whether or not quantity constraints
exist. Perhaps the most famous work on the theory of quantity constraints traces
back to Samuelson’s (1947) Foundations and the enunciation of the Le Chatelier
principle by which substitution possibilities in all markets are reduced by the
imposition of quantity restrictions in any. These effects were further studied in the
later papers of Tobin and Houthakker (1951) and surveyed in Tobin (1952). All
the results obtained are essentially local, giving the effects on derivatives or
elasticities of the imposition of or changes in quantity restrictions. Applied work,
however, requires theory which generates functional forms and, for this, global
relationships between rationed and unrationed demands are required. In the
presentation here, I follow the work of Neary and Roberts (1980) and Deaton
(1981b).
The commodity vector q is partitioned into (q^0, q^1), where q^0 may or may not
be constrained to take on values z. These may be outside impositions or they may
essentially be "chosen" by the consumer. An example of the latter is when a
consumer decides not to participate in the labor force; since hours cannot be
negative, the commodity demand functions conditional on non-participation are
those which arise from a quantity restriction of zero hours worked. The simplest
case arises if q^1 forms a separable group, so that without quantity restrictions on
q^0, it is possible to write

q_i^1 = g_i^1(x − p^0·q^0, p^1), (137)

see eq. (115) above. Clearly, rationing makes no difference to (137) except that z
replaces q^0, so that testing for the existence of the quantity restrictions can be
carried out by testing for the endogeneity of q^0 using a Wu (1973) or Hausman
(1978) test with p^0 as the necessary vector of exogenous instruments not
appearing in (137). Without separability matters are more complicated and, in
addition to the variables in (137), the demand for q^1 depends on z, so that
without quantity restrictions

q_i^1 = g_i^F(x, p^0, p^1), (138)

while, under rationing,

q_i^1 = g_i^R(x − p^0·z, p^1, z). (139)


Efficient estimation and testing requires that the relationship between g^F and g^R
be fully understood. Once again, the cost function provides the answer. If
c(u, p^0, p^1) is the unrestricted cost function, i.e. that which generates (138), the
restricted cost function c*(u, p^0, p^1, z) is defined by

c*(u, p^0, p^1, z) = min{p^0·q^0 + p^1·q^1; u(q^0, q^1) = u, q^0 = z}
                  = p^0·z + γ(u, p^1, z), (140)

where γ does not depend upon p^0. Define the "virtual prices" p̃^0, Rothbarth
(1941), as a function ζ^0(u, p^1, z) by the relation

z = ∂c(u, p̃^0, p^1)/∂p̃^0, (141)

so that p̃^0 is the vector of prices which at u and p^1 would cause z to be freely
chosen. At these prices, restricted and unrestricted costs must be identical, i.e.

c(u, p̃^0, p^1) = p̃^0·z + γ(u, p^1, z), (142)

is an identity in u, p^1 and z with p̃^0 = ζ^0(u, p^1, z). Hence, combining (140) and
(142),

c*(u, p^0, p^1, z) = (p^0 − p̃^0)·z + c(u, p̃^0, p^1). (143)

With p̃^0 determined by (141), this equation is the bridge between restricted and
unrestricted cost functions and, since (138) derives from differentiating c(u, p^0, p^1)
and (139) from differentiating c*(u, p^0, p^1, z), it also gives full knowledge of the
relationship between g^F and g^R. This can be put to good theoretical use, to prove
all the standard rationing results and a good deal more besides.
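A concrete matched pair may help fix ideas; the example is mine and is not one of the applications cited. For the linear expenditure system, with unrestricted demands

p_i q_i = p_i γ_i + β_i (x − Σ_k p_k γ_k),

imposing the ration q_0 = z and reallocating the remaining budget over the free goods gives the restricted demands

p_i q_i = p_i γ_i + [β_i/(1 − β_0)](x − p_0 z − Σ_{k≠0} p_k γ_k), i ≠ 0,

so that g^R differs from g^F both in the effective budget and in the rescaled marginal shares β_i/(1 − β_0).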
For empirical purposes, the ability to derive g R from g F allows the construc-
tion of a “matched pair” of demand functions, matched in the sense of deriving
from the same preferences, and representing both free and constrained behavior.
A first attempt, applied to housing expenditure in the U.K., and using the
Muellbauer cost function (117) is given in Deaton (1981b). In that study I also
found that allowing for quantity restrictions using a restricted cost function
related to that for the AIDS, removed much of the conflict with homogeneity on
post-war British data. Deaton and Muellbauer (1981) have also derived the
matched functional forms g^F and g^R for commodity demands for the case where
there is quantity rationing in the labor market and where unrestricted labor
supply equations take the linear functional forms frequently assumed in the labor
supply literature.

7. Other topics

In a review of even this length, only a minute fraction of demand analysis can be
covered. However, rather than omit them altogether, I devote this last section to
an acknowledgement of the existence of three areas closely linked to the preceding
analysis (and which many would argue are central), intertemporal demand
analysis, the analysis of quality, and the use of demand analysis in welfare
economics.

7.1. Intertemporal demand analysis

Commodity choices over a lifetime can perhaps be modelled using the utility
function

u = v{q^1, q^2,..., q^τ,..., q^L, B/π_L}, (144)

where the q^τ represent vectors of commodity demands for period τ, B is bequests
at death which occurs with certainty at the end of period L, and π_L is some
appropriate price index to be applied to B. Utility is maximized subject to the
appropriate constraint, i.e.

Σ_τ p̃^τ·q^τ + π̃_L(B/π_L) = W, (145)

where ˜ denotes discounting and W is the discounted present value at 0 of
present and future financial assets and either full income, if labor supply is
included, or labor income, if labor supply is taken as fixed.
Clearly (144) and (145) are together formally identical to the usual model so that
the whole apparatus of cost functions, duality, functional forms and so on can be
brought into play. However, the problem is nearly always given more structure by
assuming (144) to be additively separable between periods so that demand
analysis proper applies to the more disaggregated stage of two stage budgeting,
while the allocation to broad groups (i.e. of expenditure between the periods)
becomes the province of the consumption function, or more strictly, the life-cycle
model. The apparatus of Section 4.2 can be brought into play to yield the now
standard life-cycle results, see Browning, Deaton and Irish (1985), Hall (1981),
Bewley (1977). Even a very short review of this consumption function literature
would double the length of this chapter.
The presence of durable goods can also be allowed for by entering stocks at
various dates into the intertemporal model (144). Under the assumption of perfect
capital markets, constant proportional physical depreciation, and no divergence
between buying and selling prices, these stocks can be priced at "user cost"
defined by

p_t* = p_t − p_{t+1}(1 − δ)/(1 + r_{t+1}), (146)

where p_t is the price of the good at time t, δ is the rate of physical depreciation
and r_t is the interest rate, see Diewert (1974b) or Deaton and Muellbauer (1980a,
Chapter 13) for full discussions of this model. If user cost pricing is followed
(although note the expectational element in p_{t+1}), durable goods can be treated
like any other good with p_t*S_t (for stock S_t) as a dependent variable in a demand
system, and x_t (including p_t*S_t, not the purchase of durables) and all prices and
user costs as independent variables. The model is a very useful benchmark, but its
assumptions are more than usually unrealistic and it is not surprising that it
appears to be rejected in favour of alternative specifications, see Muellbauer
(1981a). However, no fully satisfactory alternative formulation exists, and the
literature contains a large number of quite distinct approaches. In many of these,
commodity demands are modelled conditional on the stocks which, in turn,
evolve with purchases, so that dynamic formulations are created in which long-run
and short-run responses are distinct. The stock-adjustment models of Stone and
Rowe (1957) (1958) and Chow (1957) (1960) are of this form, as is the very similar
“state” adjustment model of Houthakker and Taylor (1966) who extend the
formulation to all goods while extending the concept of stocks to include “stocks”
of habits (since in these models, stocks are substituted out, it makes little
difference what name is attached to them). There are also more sophisticated
models in which utility functions are defined over instantaneous purchases and
stocks, e.g. Phlips’ (1972) “dynamic” linear expenditure system, and further
refinements in which intertemporal functions are used to model the effects of
current purchases on future welfare via their effects on future stocks, Phlips (1974,
1983 Part II). These models are extremely complicated to estimate and it is not
clear that they capture any essential features not contained in the stock-adjust-
ment model, on the one hand, and the user cost model on the other, see in
particular the results of Spinnewyn (1979a) (1979b). It remains for future work to
tackle the very considerable task of constructing models which can deal, in
manageable form, with the problems posed by the existence of informational
asymmetries [lemons, Akerlof (1970)], borrowing constraints, indivisibilities, tech-
nological diffusion, and so on.
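As a numerical illustration of (146), with invented figures: if p_t = 100, the expected price next period is p_{t+1} = 95, δ = 0.1 and r_{t+1} = 0.05, then p_t* = 100 − (0.9 × 95)/1.05 ≈ 18.6, so that the period-t services of a durable costing 100 are priced at under a fifth of its purchase price.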

7.2. Choice of qualities

The characteristics model of consumer behavior is a natural way of analysing


choice of qualities and, indeed, Gorman’s (1956, 1980) classic paper is concerned

with quality differentials in the Iowa egg market. By specifying a technology


linking quality with market goods, the model naturally leads to the characteriza-
tion of shadow prices for qualities and these have played a central role in the
“new household economics”, see in particular, Becker (1976). A related but more
direct method of dealing with quality was pioneered in the work of Fisher and
Shell (1971), see also Muellbauer (1975a) and Gorman (1976) for reformulations
and extensions. The model is formally identical to the Barten model of household
composition discussed in Section 3 above with the m’s now interpreted as quality
parameters “augmenting” the quantities in consumption. Under either formula-
tion, competition between goods manufacturers will, under appropriate assump-
tions, induce a direct relationship between the price of each good (or variety) and
an index of its quality attributes. These relationships are estimated by means of
“hedonic” regressions in which (usually the logarithm of) price is regressed on
physical attributes across different market goods, see e.g. Burstein (1961) and
Dhrymes (1971) for studies of refrigerator prices, and Ohta and Griliches (1976),
Cowling and Cubbin (1971) (1972), Cubbin (1975) and Deaton and Muellbauer
(1980a p. 263-5) for results on car prices. These techniques date back to Griliches
(1961) and ultimately to Court (1939). Choice among discrete varieties involves
many closely related techniques, see Chapter 24 of this handbook.
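A minimal hedonic regression is only a few lines; the sketch below is my own illustration with invented attribute data:

    import numpy as np

    rng = np.random.default_rng(5)
    n = 200
    weight = rng.uniform(0.8, 2.0, n)        # tonnes
    power = rng.uniform(40, 150, n)          # horsepower
    logp = 8.0 + 0.4*weight + 0.004*power + rng.normal(0, 0.05, n)

    X = np.column_stack([np.ones(n), weight, power])
    b = np.linalg.lstsq(X, logp, rcond=None)[0]
    print(dict(zip(["const", "weight", "power"], np.round(b, 4))))
    # With log price on the left, coefficients are (approximately) proportional
    # price premia per unit of each attribute.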
Empirical studies of consumer demand for housing are a major area where
quality differences are of great importance. However, until recently, much of the
housing literature has consisted of two types of study, one regressing quantities of
housing services against income and some index of housing prices, either individ-
ual or by locality, while the other follows the hedonic approach, regressing prices
on the quantities of various attributes, e.g. number of rooms, size, presence of and
type of heating, distance from transport, shops and so on. Serious attempts are
currently being made to integrate these two approaches and this is a lively field
with excellent data, immediate policy implications, and some first-rate work being
done. Lack of space prevents my discussing it in detail; for a survey and further
references see Mayo (1978).

7.3. Demand analysis and welfare economics

A large proportion of the results and formulae of welfare economics, from cost
benefit analysis to optimal tax theory, depend for their implementation on the
results of empirical demand analysis, particularly on estimates of substitution
responses. Since the coherence of welfare theory depends on the validity of the
standard model of behavior, the usefulness of applied demand work in this
context depends crucially on the eventual solution of the problems with homo-
geneity (possibly symmetry) and global regularity discussed in Section 2 above.
But even without such difficulties, the relationship between the econometric
estimates and their welfare application is not always clearly appreciated. In
consequence, I review briefly here the estimation of three welfare measures,
namely consumer surplus, cost-of-living indices, and equivalence scales.
I argued in Section 1 that it was convenient to regard the cost function as the
centrepiece of applied demand analysis. It is even more convenient to do so in
welfare analysis. Taking consumer surplus first, the compensating variation (CV)
and equivalent variation (EV) are defined by, respectively,

CV = c(u^0, p^1) − c(u^0, p^0), (147)
EV = c(u^1, p^1) − c(u^1, p^0), (148)

so that both measure the money costs of a welfare affecting price change from p^0
to p^1, CV using u^0 as reference (compensation returns the consumer to the
original welfare level) and EV using u^1 (it is equivalent to the change to u^1). Base
and current reference true cost-of-living index numbers are defined analogously
using ratios instead of differences, hence

P(p^1, p^0; u^0) = c(u^0, p^1)/c(u^0, p^0), (149)

P(p^1, p^0; u^1) = c(u^1, p^1)/c(u^1, p^0), (150)

are the base and current true indices. Note that CV, EV and the two price indices
depend in no way on how utility is measured; they depend only on the indifference
curve indexed by u, which could equally well be replaced by φ(u) for any
monotone increasing φ. Even so, the cost function is not observed directly and a
procedure must be prescribed for constructing it from the (in principle) observ-
able Marshallian demand functions. If the functional forms for these are known,
and if homogeneity, symmetry and negativity are satisfied, the cost function can
be obtained by solving the partial differential equations (12) often analytically,
see e.g. Hausman (1981). Unobserved constants of integration affect only the
measurability of u so that complete knowledge of the Marshallian demands is
equivalent to complete knowledge of consumer surplus and the index numbers. If
analytical integration is impossible or difficult, numerical integration is straight-
forward (provided homogeneity and symmetry hold) and algorithms exist in the
literature, see e.g. Samuelson (1948) and in much more detail, Vartia (1983). If the
integrability conditions fail, consumer behavior is not according to the theory and
it is not sensible to try to calculate the welfare indices in the first place, nor is it
possible to do so. Geometrically, calculating CV or EV is simply a matter of
integrating the area under a Hicksian demand curve; there is no valid theoretical
or practical reason for ever integrating under a Marshallian demand curve. The
very considerable literature discussing the practical difficulties of doing so (the
path-dependence of the integral, for example) provides a remarkable example of
the elaboration of secondary nonsense which can occur once a large primary
category error has been accepted; the emperor with no clothes, although quite
unaware of his total nakedness, is continuously distressed by his inability to tie

his shoelaces. A much more real problem is the assumption that the functional
forms of the Marshallian demands are known, so that working with a specific
model inevitably understates the margin of ignorance about consumer surplus or
index numbers. The tools of non-parametric demand analysis, as discussed in
Section 2.7, can, however, be brought to bear to give bounding relationships on
the cost function and hence on the welfare measures themselves, see Varian
(1982b).
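To fix ideas, the mechanics can be sketched in a few lines of code. The fragment below is purely illustrative (the constant-elasticity Marshallian demand and all parameter values are assumptions of mine, not results from this chapter): it computes CV for a single price change by numerically integrating the income-compensation equation dc/dp = q(p, c) from p^0 to p^1, in the spirit of the algorithms referenced above.

```python
# Minimal sketch, assuming a hypothetical one-good demand function:
# integrate dc/dp = q(p, c) from p0 to p1 so that c(p1) = c(u0, p1),
# and CV = c(u0, p1) - c(u0, p0) as in eq. (147).
from scipy.integrate import solve_ivp

A, alpha, beta = 1.0, -0.7, 0.8   # illustrative demand parameters

def marshallian_demand(p, x):
    """Quantity demanded at price p and total outlay x."""
    return A * p**alpha * x**beta

def compensated_cost(p0, p1, x0):
    """Cost at prices p1 of the utility attained at (p0, x0)."""
    sol = solve_ivp(lambda p, c: [marshallian_demand(p, c[0])],
                    (p0, p1), [x0], rtol=1e-10, atol=1e-12)
    return sol.y[0, -1]

p0, p1, x0 = 1.0, 1.5, 100.0
cv = compensated_cost(p0, p1, x0) - x0
print(f"compensating variation: {cv:.4f}")
```

With many prices, the same one-dimensional integration can be taken along any path from p^0 to p^1; the answer is path-independent precisely because it is compensated, not Marshallian, demands that are being integrated.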
The construction of empirical scales is similar to the construction of price
indices, although there are a few special difficulties. For household characteristics
a^h, the equivalence scale M(a^h, a^0; u, p) is defined by

M(a^h, a^0; u, p) = c(u, p, a^h)/c(u, p, a^0),   (151)


for reference household characteristics a^0 and suitably chosen reference welfare
level u and price vector p. Models such as those discussed in Section 3.2 yield
estimates of the parameters of c(u, p, a) so that scales can be evaluated. How-
ever, the situation is not quite the same as for the price indices (149) and (150).
For these, c(u, p) only is required and this is identified by the functional forms
for its tangents h_i(u, p) = g_i{c(u, p), p}. But for c(u, p, a), we observe only the
p-tangents together with their derivatives with respect to a, i.e. ∂q_i/∂a_j, the
demographic effects on demand, and this information is insufficient to identify
the function. In particular, as emphasized by Pollak and Wales (1979), the cost
functions c(φ(u, a), p, a) and c(u, p, a) have identical behavioral consequences
if ∂φ/∂u > 0 while giving quite different equivalence scales. Since c(u, p, a) is
formally identical to the restricted cost function discussed in Section 6 above, its
derivatives with respect to a can be interpreted as shadow prices [differentiate eq.
(143)]. These could conceivably be measured from “economic” studies of fertility,
in which case the equivalence scale would be fully identified just as are the price
indices from c(u, p). Failing such evidence, it is necessary to be very explicit
about exactly what prior information is being used to identify the scales. In
Deaton and Muellbauer (1981), the identification issue is discussed in detail and it
is shown that the same empirical evidence yields systematically different scales for
different models, e.g. those of Engel, Barten and Rothbarth discussed in Section 3.2. It is
also argued that plausible identification assumptions can be made, so that
demand analysis may, after all, have something to say about the economic costs
of children.
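To see how a parametric cost function delivers (151) in practice, consider a deliberately special case, assumed here only for illustration: Barten-type commodity scales m_i(a) in a Cobb-Douglas cost function. Then

$$
c(u, p, a) = u \prod_i \{ m_i(a)\, p_i \}^{\alpha_i}, \qquad \sum_i \alpha_i = 1,
$$

so that

$$
M(a^h, a^0; u, p) = \prod_i \left\{ \frac{m_i(a^h)}{m_i(a^0)} \right\}^{\alpha_i},
$$

a weighted geometric mean of the commodity scales that happens to be independent of u and p. In general neither independence holds and, as the identification discussion above makes clear, the dependence on u cannot be recovered from demand behavior alone.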

References

Afriat, S. N. (1967) “The Construction of Utility Functions From Expenditure Data”, International
Economic Review, 8, 67-77.
Afriat, S. N. (1973) “On a System of Inequalities in Demand Analysis: An Extension of the Classical
Method”, International Economic Review, 14, 460-472.
Afriat, S. N. (1976) The Combinatorial Theory of Demand. London: Input-Output Co.
Afriat, S. N. (1977) The Price Index. Cambridge University Press.

Afriat, S. N. (1980) Demand Functions and the Slutsky Matrix. Princeton: Princeton University Press.
Afriat, S. N. (1981) “On the Constructability of Consistent Price Indices Between Several Periods
Simultaneously”, in: A. S. Deaton, ed., Essays in the Theory and Measurement of Consumer
Behaviour in Honour of Sir Richard Stone. Cambridge: Cambridge University Press.
Aitchison, J. and J. A. C. Brown (1954-5) "A Synthesis of Engel Curve Theory", Review of Economic
Studies, 22, 35-46.
Akerlof, G. (1970) "The Market for Lemons", Quarterly Journal of Economics, 84, 488-500.
Attfield, C. L. F. (1985) "Homogeneity and Endogeneity in Systems of Demand Equations", Journal
of Econometrics, 27, 191-209.
Anderson, G. J. and R. W. Blundell (1982) "Estimation and Hypothesis Testing in Dynamic Singular
Equation Systems", Econometrica, 50, 1559-1571.
Anderson, R. W. (1979) “Perfect Price Aggregation and Empirical Demand Analysis”, Econometrica,
47, 1209-30.
Anderson, T. W. (1958) An Introduction to Multivariate Statistical Analysis. New York: John Wiley.
Aneuryn-Evans, G. B. and A. S. Deaton (1980) "Testing Linear versus Logarithmic Regressions",
Review of Economic Studies, 47, 275-91.
Antonelli, G. B. (1886) Sulla Teoria Matematica della Economia Politica. Pisa: nella Tipografia del
Folchetto. Republished as "On the Mathematical Theory of Political Economy", in: J. S. Chipman,
L. Hurwicz, M. K. Richter and H. F. Sonnenschein, eds., Preferences, Utility, and Demand. New
York: Harcourt Brace Jovanovich, 1971.
Armstrong, J. S. (1967) “Derivation of Theory by Means of Factor Analysis or Tom Swift and his
Electric Factor Analysis Machine”, American Statistician, 21(5), 17-21.
Ashenfelter, 0. (1980) “Unemployment as Disequilibrium in a Model of Aggregate Labor Supply”,
Econometrica, 48, 541-564.
Atkinson, A. B. and N. Stern (1981) "On Labour Supply and Commodity Demands", in: A. S.
Deaton, ed., Essays in the Theory and Measurement of Consumer Behaviour. New York: Cambridge
University Press.
Barnett, W. A. (1979a) “The Joint Allocation of Leisure and Goods Expenditure”, Econometrica, 47,
539-563.
Barnett, W. A. (1979b) "Theoretical Foundations for the Rotterdam Model", Review of Economic
Studies, 46, 109-130.
Barnett, W. A. (1983a) "New Indices of Money Supply and the Flexible Laurent Demand System",
Journal of Business and Economic Statistics, 1, 1-23.
Barnett, W. A. (1983b) "Definitions of 'Second Order Approximation' and 'Flexible Functional
Form'", Economics Letters, 12, 31-35.
Barnett, W. A. and A. Jonas (1983) "The Muntz-Szatz Demand System: An Application of a Globally
Well-Behaved Series Expansion", Economics Letters, 11, 331-342.
Barnett, W. A. and Y. W. Lee (1985) "The Global Properties of the Miniflex Laurent, Generalized
Leontief, and Translog Flexible Functional Forms", Econometrica, forthcoming.
Barten, A. P. (1964) “Family Composition, Prices and Expenditure Patterns”, in: P. E. Hart, G. Mills
and J. K. Whitaker, eds., Econometric Analysis for National Economic Planning. London: Butterworths.
Barten, A. P. (1966) Theorie en empirie van een volledig stelsel van vraagvergelijkrngen. Doctoral
dissertation, Rotterdam.
Barten, A. P. (1967) “Evidence on the Slutsky Conditions for Demand Equations”, Review of
Economics and Statistics, 49, 77-84.
Barten, A. P. (1969) "Maximum Likelihood Estimation of a Complete System of Demand Equations",
European Economic Review, 1, 7-73.
Barten, A. P. (1977) "The Systems of Consumer Demand Functions Approach: A Review",
Econometrica, 45, 23-51.
Barten, A. P. and V. Bohm (1980) "Consumer Theory", in: K. J. Arrow and M. D. Intriligator, eds.,
Handbook of Mathematical Economics. Amsterdam: North-Holland.
Barten, A. P. and E. Geyskens (1975) “The Negativity Condition in Consumer Demand”, European
Economic Review, 6, 221-260.
Becker, G. S. (1965) “A Theory of the Allocation of Time”, Economic Journal, 75, 493-517.
Becker, G. S. (1976) The Economic Approach to Human Behaviour. Chicago: University of Chicago
Press.

Bera, A. K., R. P. Byron and C. M. Jarque (1981) “Further Evidence on Asymptotic Tests for
Homogeneity and Symmetry in Large Demand Systems”, Economics Letters, 8, 101-105.
Berndt, E. R., M. N. Darrough and W. E. Diewert (1977) "Flexible Functional Forms and
Expenditure Distributions: An Application to Canadian Consumer Demand Functions", Interna-
tional Economic Review, 18, 651-675.
Berndt, E. R., B. H. Hall, R. E. Hall and J. A. Hausman (1974) "Estimation and Inference in
Non-Linear Structural Models", Annals of Economic and Social Measurement, 3, 653-665.
Berndt, E. R. and M. S. Khaled (1979) "Parametric Productivity Measurement and the Choice Among
Flexible Functional Forms", Journal of Political Economy, 87, 1220-1245.
Berndt, E. R. and N. E. Savin (1975) "Estimation and Hypothesis Testing in Singular Equation
Systems With Autoregressive Disturbances", Econometrica, 43, 937-957.
Berndt, E. R. and N. E. Savin (1977) "Conflict Among Criteria For Testing Hypotheses in the
Multivariate Linear Regression Model", Econometrica, 45, 1263-1277.
Bewley, T. (1977) “The Permanent Income Hypothesis: A Theoretical Formulation”, Journal of
Economic Theory, 16, 252-292.
Bhattacharya, N. (1978) "Studies on Consumer Behaviour in India", in: A Survey of Research in
Economics, Vol. 7, Econometrics, Indian Council of Social Science Research: New Delhi, Allied
Publishers.
Blackorby, C., R. Boyce and R. R. Russell (1978) “Estimation of Demand Systems Generated by the
Gorman Polar Form; A Generalization of the S-branch Utility Tree”, Econometrica, 46, 345-363.
Blackorby, C., D. Primont and R. R. Russell (1977) “On Testing Separability Restrictions With
Flexible Functional Forms”, Journal of Econometrics, 5, 195-209.
Blackorby, C., D. Primont and R. R. Russell (1978) Duality, Separability and Functional Structure.
New York: American Elsevier.
Blundell, R. W. and I. Walker (1982) “Modelling the Joint Determination of Household Labour
Supplies and Commodity Demands”, Economic Journal, 92, 351-364.
Breusch, T. S. and A. R. Pagan (1979) “A Simple Test for Heteroscedasticity and Random Coefficient
Variation”, Econometrica, 47, 1287-1294.
Brown, J. A. C. and A. S. Deaton (1972) “Models of Consumer Behaviour: A Survey”, Economic
Journal, 82, 1145-1236.
Browning, M. J., A. Deaton and M. Irish (1985) “A Profitable Approach to Labor Supply and
Commodity Demands Over the Life-Cycle", Econometrica, forthcoming.
Burstein, M. L. (1961) “Measurement of the Quality Change in Consumer Durables”, Manchester
School, 29, 267-279.
Byron, R. P. (1968) “Methods for Estimating Demand Equations Using Prior Information: A Series
of Experiments With Australian Data", Australian Economic Papers, 7, 227-248.
Byron, R. P. (1970a) “A Simple Method for Estimating Demand Systems Under Separable Utility
Assumptions”, Review of Economic Studies, 37, 261-274.
Byron, R. P. (1970b) “The Restricted Aitken Estimation of Sets of Demand Relations”, Econometrica,
38, 816-830.
Byron, R. P. (1982) “A Note on the Estimation of Symmetric Systems”, Econometrica, 50, 1573-1575.
Byron, R. P. and M. Rosalsky (1984) “Symmetry and Homogeneity Tests in Demand Analysis: A Size
Correction Which Works". University of Florida at Gainesville, mimeo.
Carlevaro, F. (1976) “A Generalization of the Linear Expenditure System”, in: L. Solari and J.-N. du
Pasquier, eds., Private and Enlarged Consumption. North-Holland for ASEPELT, 73-92.
Cassel, C. M., C.-E. Särndal and J. H. Wretman (1977) Foundations of Inference in Survey Sampling.
New York: Wiley.
Caves, D. W. and L. R. Christensen (1980) “Global Properties of Flexible Functional Forms”,
American Economic Review, 70, 422-432.
Chipman, J. S. (1974) “Homothetic Preferences and Aggregation”, Journal of Economic Theory, 8,
26-38.
Chow, G. (1957) Demand for Automobiles in the U.S.: A Study in Consumer Durables. Amsterdam:
North-Holland.
Chow, G. (1960) “Statistical Demand Functions for Automobiles and Their Use for Forecasting”, in:
A. C. Harberger, ed., The Demand for Durable Goods. Chicago: University of Chicago Press.
Christensen, L. R., D. W. Jorgenson and L. J. Lau (1975) “Transcendental Logarithmic Utility
Functions”, American Economic Review, 65, 367-283.

Christensen, L. R. and M. E. Manser (1977) “Estimating U.S. Consumer Preferences for Meat With a
Flexible Utility Function”, Journal of Econometrics, 5, 37-53.
Conrad, K. and D. W. Jorgenson (1979) “Testing the Integrability of Consumer Demand Functions”,
European Economic Review, 12, 149-169.
Coondoo, D. (1975) "Effects of Household Composition on Consumption Pattern: A Note",
Arthaniti, 17.
Court, A. T. (1939) “Hedonic Price Indexes with Automotive Examples”, in: The Dynamics of
Automobile Demand. New York: General Motors.
Cowling, K. and J. Cubbin (1971) "Price, Quality, and Advertising Competition", Economica, 38,
378-394.
Cowling, K. and J. Cubbin (1972) “Hedonic Price Indexes for U.K. Cars”, Economic Journal, 82,
963-978.
Cramer, J. S. (1969) Empirical Economics. Amsterdam: North-Holland.
Cubbin, J. (1975) “Quality Change and Pricing Behaviour in the U.K. Car Industry 1956-1968”,
Economica, 42, 43-58.
Deaton, A. S. (1974a) “The Analysis of Consumer Demand in the United Kingdom, 1900-1970”,
Econometrica, 42, 341-367.
Deaton, A. S. (1974b) “A Reconsideration of the Empirical Implications of Additive Preferences”,
Economic Journal, 84, 338-348.
Deaton, A. S. (1975a) Models and Projections of Demand in Post- War Britain. London: Chapman &
Hall.
Deaton, A. S. (1975b) “The Measurement of Income and Price Elasticities”, European Economic
Review, 6, 261-274.
Deaton, A. S. (1975c) The Structure of Demand 1920-1970, The Fontana Economic History of Europe.
Collins: Fontana, 6(2).
Deaton, A. S. (1976) “A Simple Non-Additive Model of Demand”, in: L. Solari and J.-N. du
Pasquier, eds., Private and Enlarged Consumption. North-Holland for ASEPELT, 56-72.
Deaton, A. S. (1978) “Specification and Testing in Applied Demand Analysis”, Economic Journal, 88,
524-536.
Deaton, A. S. (1979) "The Distance Function and Consumer Behaviour with Applications to Index
Numbers and Optimal Taxation", Review of Economic Studies, 46, 391-405.
Deaton, A. S. (1981a) "Optimal Taxes and the Structure of Preferences", Econometrica, 49, 1245-1268.
Deaton, A. S. (1981b) “Theoretical and Empirical Approaches to Consumer Demand Under Ration-
ing”, in: A. S. Deaton, ed., Essays in the Theory and Measurement of Consumer Behaviour. New
York: Cambridge University Press.
Deaton, A. S. (1981c) "Three Essays on a Sri Lankan Household Survey". Living Standards
Measurement Study W.P. No. 11, Washington: The World Bank.
Deaton, A. S. (1982) "Model Selection Procedures, or Does the Consumption Function Exist?", in:
G. Chow and P. Corsi, eds., Evaluating the Reliability of Macroeconomic Models. New York: Wiley.
Deaton, A. S. (1984) “Household Surveys as a Data Base for the Analysis of Optimality and
Disequilibrium”, Sankhya: The Indian Journal of Statistics, 46, Series B, forthcoming.
Deaton, A. S. and M. Irish (1984) "A Statistical Model for Zero Expenditures in Household Budgets",
Journal of Public Economics, 23, 59-80.
Deaton, A. S. and J. Muellbauer (1980a) Economics and Consumer Behavior. New York: Cambridge
University Press.
Deaton, A. S. and J. Muellbauer (1980b) "An Almost Ideal Demand System", American Economic
Review, 70, 312-326.
Deaton, A. S. and J. Muellbauer (1981) “Functional Forms for Labour Supply and Commodity
Demands with and without Quantity Constraints”, Econometrica, 49, 1521-1532.
Deaton, A. S. and J. Muellbauer (1986) “Measuring Child Costs in Poor Countries”, Journal of
Political Economy, forthcoming.
Dhrymes, P. J. (1971) “Price and Quality Changes in Consumer Capital Goods: An Empirical Study”,
in: Z. Griliches, ed., Price Indexes and Quality Change: Studies in New Methods of Measurement.
Cambridge: Harvard University Press.
Diewert, W. E. (1971) “An Application of the Shephard Duality Theorem: A Generalized Leontief
Production Function”, Journal of Political Economy, 79, 481-507.
Diewert, W. E. (1973a) “Afriat and Revealed Preference Theory”, Review of Economic Studies, 40,
419-426.

Diewert, W. E. (1973b) “Functional Forms for Profit and Transformation Functions”, Journal of
Economic Theory, 6, 284-316.
Diewert, W. E. (1974a) “Applications of Duality Theory”, Chapt. 3 in: M. D. Intriligator and D. A.
Kendrick, eds., Frontiers of Quantitative Economics, American Elsevier: North-Holland, Vol. II.
Diewert, W. E. (1974b) “Intertemporal Consumer Theory and the Demand for Durables”,
Econometrica, 42, 497-516.
Diewert, W. E. (1980a) “Symmetry Conditions for Market Demand Functions”, Review of Economic
Studies, 47, 595-601.
Diewert, W. E. (1980b) "Duality Approaches to Microeconomic Theory", in: K. J. Arrow and M. D.
Intriligator, eds., Handbook of Mathematical Economics. North-Holland.
Diewert, W. E. (1981) “The Economic Theory of Index Numbers: A Survey”, in: A. S. Deaton, ed.,
Essays in the Theory and Measurement of Consumer Behaviour in Honour of Sir Richard Stone.
Cambridge: Cambridge University Press.
Diewert, W. E. (1983) “The Theory of the Cost of Living Index and the Measurement of Welfare
Change”. University of British Columbia, mimeo.
Diewert, W. E. and C. Parkan (1978) "Tests for Consistency of Consumer Data and Nonparametric
Index Numbers”. University of British Columbia: Working Paper 78-27, mimeo.
DuMouchel, W. H. and G. J. Duncan (1983) "Using Sample Survey Weights in Multiple Regression
Analyses of Stratified Samples", Journal of the American Statistical Association, 78, 535-543.
Eisenberg, E. (1961) “Aggregation of Utility Functions”, Management Science, 7, 337-350.
Engel, E. (1895) "Die Lebenskosten Belgischer Arbeiterfamilien früher und jetzt", International
Statistical Institute Bulletin, 9, 1-74.
Epstein, L. and A. Yatchew (1985). “Non-parametric Hypothesis Testing Procedures and Applica-
tions to Demand Analysis”, University of Toronto, mimeo.
Evans, G. B. A. and N. E. Savin (1982) “Conflict Among the Criteria Revisited; the W, LR and LM
Tests”, Econometrica, 50, 737-748.
Federenko, N. P. and N. J. Rimashevskaya (1981) “The Analysis of Consumption and Demand in the
USSR”, in: A. S. Deaton, ed., Essays in the Theory and Measurement of Consumer Behauiour. New
York: Cambridge University Press.
Fiebig, D. G. and H. Theil (1983) "The Two Perils of Symmetry Constrained Estimation of Demand
Systems”, Economics Letters, 13, 105-111.
Fisher, F. M. and K. Shell (1971) “Taste and Quality Change in the Pure Theory of the True Cost of
Living Index”, in: Z. Griliches, ed., Price Indexes and Quality Changes: Studies in New Methods of
Meusurement. Cambridge: Harvard University Press.
Forsyth, F. G. (1960) “The Relationship Between Family Size and Family Expenditure”, Journal of
the Royal Statistical Society, Series A, 123, 367-397.
Freixas, X. and A. Mas-Colell (1983) "Engel Curves Leading to the Weak Axiom in the Aggregate".
Harvard University, mimeo.
Frisch, R. (1932) New Methods of Measuring Marginal Utility. Tübingen: J.C.B. Mohr.
Frisch, R. (1959) "A Complete Scheme for Computing All Direct and Cross Demand Elasticities in a
Model with Many Sectors", Econometrica, 27, 177-196.
Gallant, R. A. (1975) “Seemingly Unrelated Non-Linear Regressions”, Journal of Econometrics, 3,
35-50.
Gallant, R. A. (1981) “On the Bias in Flexible Functional Forms and an Essentially Unbiased Form:
The Fourier Functional Form", Journal of Econometrics, 15, 211-245.
Gallant, R. A. and G. H. Golub (1983) “Imposing Curvature Restrictions on Flexible Functional
Forms”. North Carolina State Umversity and Stanford University, mimeo.
Godambe, V. P. (1955) “A Unified Theory of Sampling From Finite Populations”, Journal of the
Royal Statistical Society, Series B, 17, 268-278.
Godambe, V. P. (1966) “A New Approach to Sampling from Finite Populations: Sufficiency and
Linear Estimation”, Journal of the Royal Statistical Society, Series B, 28, 310-319.
Goldberger, A. S. (1964) Econometric Theory. New York: Wiley.
Goldberger, A. S. (1967) “Functional Form and Utility: A Review of Consumer Demand Theory”.
Social Systems Research Institute, University of Wisconsin, mimeo.
Gorman, W. M. (1953) "Community Preference Fields", Econometrica, 21, 63-80.
Gorman, W. M. (1956, 1980) "A Possible Procedure for Analysing Quality Differentials in the Egg
Market", Review of Economic Studies, 47, 843-856.
Gorman, W. M. (1959) “Separable Utility and Aggregation”, Econometrica, 27, 469-481.

Gorman, W. M. (1961) "On a Class of Preference Fields", Metroeconomica, 13, 53-56.
Gorman, W. M. (1968) "The Structure of Utility Functions", Review of Economic Studies, 35, 369-390.
Gorman, W. M. (1970) “Quasi Separable Preferences, Costs and Technologies”. University of North
Carolina, Chapel Hill, mimeo.
Gorman, W. M. (1976) “Tricks with Utility Functions”, in: M. Artis and R. Nobay, eds., Essays in
Economic Analysis. Cambridge: Cambridge University Press.
Gorman, W. M. (1981) "Some Engel Curves", in: A. S. Deaton, ed., Essays in the Theory and
Measurement of Consumer Behaviour. New York: Cambridge University Press.
Granger, C. W. J. and P. Newbold (1974) "Spurious Regressions in Econometrics", Journal of
Econometrics, 2, 111-120.
Griffin, J. M. (1978) "Joint Production Technology: The Case of Petro-Chemicals", Econometrica, 46,
379-396.
Griliches, Z. (1961) “Hedonic Price Indexes for Automobiles: An Econometric Analysis of Quality
Change”, in: Z. Griliches, ed., Price Indexes und Quality Change: Studies in New Methods of
Measurement. Cambridge: Harvard University Press, 1971.
Guilkey, D. K. and C. A. Knox Lovell (1980) "On the Flexibility of the Translog Approximation",
International Economic Review, 21, 137-147.
Guilkey, D. K. and P. Schmidt (1973) “Estimation of Seemingly Unrelated Regressions with Vector
Auto-Regressive Errors”, Journal of the American Statistical Association, 68, 642-647.
Haitovsky, Y. (1973) Regression Estimation from Grouped Observations. New York: Hafner.
Hall, R. E. (1978) “Stochastic Implications of the Life-Cycle Permanent Income Hypothesis: Theory
and Evidence”, Journal of Political Economy, 86, 971-987.
Hanoch, G. and M. R. Rothschild (1972) "Testing the Assumptions of Production Theory: A
Nonparametric Approach", Journal of Political Economy, 80, 256-275.
Hausman, J. A. (1978) “Specification Tests in Econometrics”, Econometrica, 46, 1251-1271.
Hausman, J. A. (1980) “The Effect of Wages, Taxes, and Fixed Costs on Women’s Labor Force
Participation”, Journal of Public Economics, 14, 161-194.
Hausman, J. A. (1981) “Exact Consumer’s Surplus and Deadweight Loss”, American Economic
Review, 71, 662-676.
Hausman, J. A. (1985) "The Econometrics of Non-Linear Budget Sets", Econometrica, forthcoming.
Hausman, J. A. and D. A. Wise (1980) "Discontinuous Budget Constraints and Estimation: The
Demand for Housing", Review of Economic Studies, 47, 75-96.
Heckman, J. J. (1978) “A Partial Survey of Recent Research on the Labor Supply of Women”,
American Economic Review, Papers and Proceedings, 68, 200-207.
Heckman, J. J. (1979) “Sample Selection Bias as a Specification Error”, Econometrica, 47, 153-161.
Heckman, J. J. and T. MaCurdy (1980) “A Life-Cycle Model of Female Labor Supply”, Review of
Economic Studies, 47, 47-74.
Henderson, A. M. (1949-1950a) "The Costs of Children", Population Studies, Parts I-III, 3, 130-150,
4, 267-298.
Henderson, A. M. (1949-1950b) “The Cost of a Family”, Review of Economic Studies, 17, 127-148.
Hendry, D. F. (1980) “Econometrics: Alchemy or Science”, Economica, 47, 387-406.
Hicks, J. R. (1939) Value and Capital. Oxford: Oxford University Press.
Hicks, J. R. (1956) A Revision of Demand Theory. Oxford: Oxford University Press.
Hildenbrand, W. (1983) “On the Law of Demand”, Econometrica, 51, 997-1019.
Hoa, Tran van (1983) “The Integrability of Generalized Working Models”, Economics Letters, 13,
101-104.
Hoa, Tran van, D. S. Ironmonger and I. Manning (1983) “Energy Consumption in Australia:
Evidence from a Generalized Working Model", Economics Letters, 12, 383-389.
Houthakker, H. S. (1957) “An International Comparison of Household Expenditure Patterns Com-
memorating the Centenary of Engel’s Law”, Econometrica, 25, 532-551.
Houthakker, H. S. (1960) “Additive Preferences”, Econometrica, 28, 224-256.
Houthakker, H. S. and L. D. Taylor (1966) Consumer Demand in the United States, 1929-70, Analysis
and Projections. Cambridge: Harvard University Press, second edition 1970.
Howe, H. and P. Musgrove (1977) “An Analysis of ECIEL Household Budget Data for Bogota,
Caracas, Guayaquil and Lima", in: C. Lluch, A. A. Powell and R. Williams, eds., Patterns in
Household Demand and Saving. Oxford: Oxford University Press for the World Bank.
Howe, H., R. A. Pollak and T. J. Wales (1979) “Theory and Time Series Estimation of the Quadratic
Expenditure System”, Econometrica, 47, 1231-1247.

Hurwicz, L. and H. Uzawa (1971) "On the Integrability of Demand Functions", in: J. S. Chipman,
L. Hurwicz, M. K. Richter and H. F. Sonnenschein, eds., Preferences, Utility, and Demand. New
York: Harcourt Brace Jovanovich, 114-148.
Iyengar, N. S., L. R. Jain and T. N. Srinivasan (1968) "Economies of Scale in Household Consump-
tion: A Case Study", Indian Economic Journal, Econometric Annual, 15, 465-477.
Jackson, C. (1968) "Revised Equivalence Scales for Estimating Equivalent Incomes for Budget Costs
by Family Type”, BLS Bulletin, U.S. Dept. of Labor, 1570-1572.
Jerison, M. (1984) “Aggregation and Pairwise Aggregation of Demand When the Distribution of
Income is Fixed”, Journal of Economic Theory, forthcoming.
Jorgenson, D. W. and L. J. Lau (1975) "The Structure of Consumer Preferences", Annals of Economic
and Social Measurement, 4, 49-101.
Jorgenson, D. W. and L. J. Lau (1976) “Statistical Tests of the Theory of Consumer Behaviour”, in:
H. Albach, E. Helmstädter and R. Henn, eds., Quantitative Wirtschaftsforschung. Tübingen: J.C.B.
Mohr.
Jorgenson, D. W., L. J. Lau and T. Stoker (1982) “The Transcendental Logarithmic Model of
Aggregate Consumer Behavior”, Advances in Econometrics, 1, JAI Press.
Kannai, Y. (1977) “Concavifiability and Constructions of Concave Utility Functions”, Journal of
Mathematical Economics, 4, 1-56.
Kay, J. A., M. J. Keen and C. N. Morris (1984) “Consumption, Income, and the Interpretation of
Household Expenditure Data", Journal of Public Economics, 23, 169-181.
King, M. A. (1980) "An Econometric Model of Tenure Choice and Demand for Housing as a Joint
Decision", Journal of Public Economics, 14, 137-159.
Klein, L. R. and H. Rubin (1947-48) “A Constant Utility Index of the Cost of Living”, Review of
Economic Studies, 15, 84-87.
Kuznets, S. (1962) “Quantitative Aspects of the Economic Growth of Nations: VII The Share and
Structure of Consumption”, Economic Development and Cultural Change, 10, l-92.
Kuznets, S. (1966) Modern Economic Growth. New Haven: Yale University Press.
Laitinen, K. (1978) “Why is Demand Homogeneity so Often Rejected?“, Economics Letters, 1,
187-191.
Lancaster, K. J. (1966) “A New Approach to Consumer Theory”, Journal of Political Economy, 74,
132-157.
Lau, L. J. (1978) “Testing and Imposing Monotonicity, Convexity, and Quasi-Concavity”, in: M. Fuss
and D. McFadden, eds., Production Economics: A Dual Approach to Theory and Applications.
Amsterdam: North-Holland.
Lau, L. J. (1982) “A Note on the Fundamental Theorem of Exact Aggregation”, Economics Letters, 9,
119-126.
Lee, L. F. and M. M. Pitt (1983) “Specification and Estimation of Demand Systems with Limited
Dependent Variables”. University of Minnesota, mimeo.
Leser, C. E. V. (1963) "Forms of Engel Functions", Econometrica, 31, 694-703.
Lluch, C. (1971) "Consumer Demand Functions, Spain, 1958-64", European Economic Review, 2,
277-302.
Lluch, C. (1973) “The Extended Linear Expenditure System”, European Economic Review, 4, 21-32.
Lluch, C., A. A. Powell and R. A. Williams (1977) Patterns in Household Demand and Saving. Oxford:
Oxford University Press for the World Bank.
Lluch, C. and R. A. Williams (1974) “Consumer Demand Systems and Aggregate Consumption in the
U.S.A.: An Application of the Extended Linear Expenditure System", Canadian Journal of
Economics, 8, 49-66.
MaCurdy, T. E. (1981) “An Empirical Model of Labor Supply in a Life-Cycle Setting”, Journal of
Political Economy, 89, 1059-1085.
Malinvaud, E. (1970) Statistical Methods of Econometrics. Amsterdam: North-Holland.
Manser, M. E. and R. J. McDonald (1984) “An Analysis of the Substitution Bias in Measuring
Inflation”, Bureau of Labor Statistics, mimeo.
Marquardt, D. W. (1963) "An Algorithm for Least-Squares Estimation of Non-Linear Parameters",
Journal of the Society of Industrial and Applied Mathematics, 11, 431-441.
Mayo, S. K. (1978) “Theory and Estimation in the Economics of Housing Demand”, Journal of
Urban Economics, 14, 137-159.
McClements, L. D. (1977) “Equivalence Scales for Children”, Journal of Public Economics, 8,
191-210.

McFadden, D. (1978) “Costs, Revenue, and Profit Functions”, in: M. Fuss and D. McFadden, eds.,
Production Economics: A Dual Approach to Theory and Applications. Amsterdam: North-Holland.
McGuire, T. W., J. W. Farley, R. E. Lucas and R. L. Winston (1968) “Estimation and Inference for
Linear Models in which Subsets of the Dependent Variable are Constrained”, Journal of the
American Statistical Association, 63, 1201-1213.
Meisner, J. F. (1979) "The Sad Fate of the Asymptotic Slutsky Symmetry Test for Large Systems",
Economics Letters, 2, 231-233.
Muellbauer, J. (1974) "Household Composition, Engel Curves and Welfare Comparisons Between
Households: A Duality Approach", European Economic Review, 5, 103-122.
Muellbauer, J. (1975a) "The Cost of Living and Taste and Quality Change", Journal of Economic
Theory, 10, 269-283.
Muellbauer, J. (1975b) "Aggregation, Income Distribution and Consumer Demand", Review of
Economic Studies, 42, 525-543.
Muellbauer, J. (1976a) “Community Preferences and the Representative Consumer”, Econometrica,
44, 979-999.
Muellbauer, J. (1976b) “Economics and the Representative Consumer”, in: L. Solari and J-N. du
Pasquier, eds., Private and Enlarged Consumption. Amsterdam: North-Holland for ASEPELT,
29-53.
Muellbauer, J. (1976c) "Can We Base Welfare Comparisons Across Households on Behaviour?".
London: Birkbeck College, mimeo.
Muellbauer, J. (1977) “Testing the Barten Model of Household Composition Effects and the Cost of
Children”, Economic Journal, 87, 460-487.
Muellbauer, J. (1980) “The Estimation of the Prais-Houthakker Model of Equivalence Scales”,
Econometrica, 48, 153-176.
Muellbauer, J. (1981a) “Testing Neoclassical Models of the Demand for Consumer Durables”, in:
A. S. Deaton, ed., Essays in the Theory and Measurement of Consumer Behaviour. New York:
Cambridge University Press.
Muellbauer, J. (1981b) “Linear Aggregation in Neoclassical Labour Supply”, Review of Economic
Studies, 48, 21-36.
Musgrove, P. (1978) Consumer Behavior in Latin America: Income and Spending of Families in Ten
Andean Cities. Washington: Brookings.
Neary, J. P. and K. W. S. Roberts (1980) “The Theory of Household Behaviour Under Rationing”,
European Economic Review, 13, 25-42.
Nicholson, J. L. (1949) "Variations in Working Class Family Expenditure", Journal of the Royal
Statistical Society, Series A, 112, 359-411.
Ohta, M. and Z. Griliches (1976) "Automobile Prices Revisited: Extensions of the Hedonic Hypothe-
sis", in: N. Terleckyj, ed., Household Production and Consumption. New York: National Bureau of
Economic Research.
Orshansky, M. (1965) “Counting the Poor: Another Look at the Poverty Profile”, Social Security
Bulletin, 28, 3-29.
Parks, R. W. (1969) “Systems of Demand Equations: An Empirical Comparison of Alternative
Functional Forms”, Econometrica, 37, 629-650.
Pearce, I. F. (1964) A Contribution to Demand Analysis. Oxford: Oxford University Press.
Phlips, L. (1972) “A Dynamic Version of the Linear Expenditure Model”, Review of Economics and
Statistics, 54, 450-458.
Phlips, L. (1974) Applied Consumption Analysis. Amsterdam and Oxford: North-Holland, second
edition 1983.
Pigou, A. C. (1910) “A Method of Determining the Numerical Value of Elasticities of Demand”,
Economic Journal, 20, 636-640.
Pollak, R. A. (1975) “Subindexes in the Cost-of-Living Index”, International Economic Review, 16,
135-150.
Pollak, R. A. and T. J. Wales (1978) “Estimation of Complete Demand Systems from Household
Budget Data”, American Economic Review, 68, 348-359.
Pollak, R. A. and T. J. Wales (1979) “Welfare Comparisons and Equivalence Scales”, American
Economic Review, Papers and Proceedings, 69, 216-221.
Pollak, R. A. and T. J. Wales (1980) “Comparison of the Quadratic Expenditure System and Translog
Demand Systems with Alternative Specifications of Demographic Effects", Econometrica, 48,
595-612.

Pollak, R. A. and T. J. Wales (1981) "Demographic Variables in Demand Analysis", Econometrica,
49, 1533-1551.
Powell, A. A. (1969) “Aitken Estimators as a Tool in Allocating Predetermined Aggregates”, Journal
of the American Statistical Association, 64, 913-922.
Prais, S. J. (1959) “A Comment”, Econometrica, 27, 127-129.
Prais, S. J. and H. S. Houthakker (1955) The Analysis of Family Budgets. Cambridge: Cambridge
University Press, second edition 1971.
Pudney, S. E. (1980) “Disaggregated Demand Analysis: The Estimation of a Class of Non-Linear
Demand Systems”, Review of Economic Studies, 47, 875-892.
Pudney, S. E. (1981a) “Instrumental Variable Estimation of a Characteristics Model of Demand”,
Review of Economic Studies, 48,417-433.
Pudney, S. E. (1981b) “An Empirical Method of Approximating the Separable Structure of Consumer
Preferences”, Review of Economic Studies, 48, 561-577.
Quandt, R. E. (1983) "Computational Problems and Methods", in: Handbook of Econometrics, Vol. 1,
Chapter 12. Amsterdam: North-Holland.
Reece, W. S. and K. D. Zieschang (1985) “Consistent Estimation of the Impact of Tax Deductibility
on the Level of Charitable Contributions”, Econometrica, forthcoming.
Rothbarth, E. (1941) “The Measurement of Change in Real Income Under Conditions of Rationing”,
Review of Economic Studies, 8, 100-107.
Rothbarth, E. (1943) “Note on a Method of Determining Equivalent Income for Families of Different
Composition”, Appendix 4 in: C. Madge, ed., War-Time Pattern of Saving and Spending. Occasional
paper No. 4., London: National Institute of Economic and Social Research.
Roy, R. (1942) De l'Utilité, Contribution à la Théorie des Choix. Paris: Hermann.
Russell, T. (1983) "On a Theorem of Gorman", Economics Letters, 11, 223-224.
Samuelson, P. A. (1938) “A Note on the Pure Theory of Consumer Behaviour”, Economica, 5, 61-71.
Samuelson, P. A. (1947) Foundations of Economic Analysis. Cambridge: Harvard University Press.
Samuelson, P. A. (1947-48) "Some Implications of Linearity", Review of Economic Studies, 15, 88-90.
Samuelson, P. A. (1948) “Consumption Theory in Terms of Revealed Preference”, Economica, 15,
243-253.
Samuelson, P. A. (1956) “Social Indifference Curves”, Quarterly Journal of Economics, 70, l-22.
Sargan, J. D. (1964) "Wages and Prices in the United Kingdom", in: P. E. Hart, G. Mills and J. K.
Whitaker, eds., Econometric Analysis for National Economic Planning. London: Butterworths.
Sargan, J. D. (1971) “Production Functions”, Part V in: P. R. G. Layard, J. D. Sargan, M. E. Ager
and D. J. Jones, eds., Qualified Manpower and Economic Performance. London: Penguin Press.
Seneca, J. J. and M. K. Taussig (1971) “Family Equivalence Scales and Personal Income Tax
Exemptions for Children”, Review of Economics and Statistics, 53, 253-262.
Shapiro, P. (1977) "Aggregation and the Existence of a Social Utility Function", Review of Economic
Studies, 46, 653-665.
Shapiro, P. and S. Braithwait (1979) “Empirical Tests for the Existence of Group Utility Functions”,
Review of Economic Studies, 46, 653-665.
Shephard, R. (1953) Cost and Production Functions. Princeton: Princeton University Press.
Simmons, P. (1980) “Evidence on the Impact of Income Distribution on Consumer Demand in the
U.K. 1955-68”, Review of Economic Studies, 47, 893-906.
Singh, B. (1972) “On the Determination of Economies of Scale in Household Consumption”,
International Economic Review, 13, 257-270.
Singh, B. (1973) “The Effect of Household Composition on its Consumption Pattern”, Sankhya,
Series B, 35, 207-226.
Singh, B. and A. L. Nagar (1973) "Determination of Consumer Unit Scales", Econometrica, 41,
347-355.
Spinnewyn, F. (1979a) “Rational Habit Formation”, European Economic Review, 15, 91-109.
Spinnewyn, F. (1979b) “The Cost of Consumption and Wealth in a Model with Habit Formation”,
Economics Letters, 2, 145-148.
Srivastava, V. K. and T. D. Dwivedi (1979) “Estimation of Seemingly Unrelated Regression
Equations: A Brief Survey”, Journal of Econometrics, 10, 15-32.
Stoker, T. (1982) “The Use of Cross-Section Data to Characterize Macro Functions”, Journal of the
American Statistical Association, 77, 369-380.

Stoker, T. (1985) “Completeness, Distribution Restrictions and the Form of Aggregate Functions”,
Econometrica, forthcoming.
Stone, J. R. N. (1954) “Linear Expenditure Systems and Demand Analysis: An Application to the
Pattern of British Demand”, Economic Journal, 64, 511-527.
Stone, J. R. N. (1956) Quantity and Price Indexes in National Accounts. Paris: OEEC.
Stone, R. and D. A. Rowe (1957) “The Market Demand for Durable Goods”, Econometrica, 25,
423-443.
Stone, R. and D. A. Rowe (1958) “Dynamic Demand Functions: Some Econometric Results”,
Economic Journal, 68, 256-70.
Summers, R. (1959) “A Note on Least Squares Bias in Household Expenditure Analysis”,
Econometrica, 27, 121-126.
Sydenstricker, E. and W. I. King (1921) “The Measurement of the Relative Economic Status of
Families", Quarterly Publication of the American Statistical Association, 17, 842-857.
Szakolnai, G. (1980) “Limits to Redistribution: The Hungarian Experience”, in: D. A. Collard,
R. Lecomber and M. Slater, eds., Income Distribution, the Limits to Redistribution. Bristol:
Scientechnica.
Theil, H. (1954) Linear Aggregation of Economic Relations. Amsterdam: North-Holland.
Theil, H. (1965) "The Information Approach to Demand Analysis", Econometrica, 33, 67-87.
Theil, H. (1971a) Principles of Econometrics. Amsterdam: North-Holland.
Theil, H. (1971b) “An Economic Theory of the Second Moments of Disturbances of Behavioural
Equations”, American Economic Review, 61, 190-194.
Theil, H. (1974) "A Theory of Rational Random Behavior", Journal of the American Statistical
Association, 69, 310-314.
Theil, H. (1975a) “The Theory of Rational Random Behavior and its Application to Demand
Analysis”, European Economic Review, 6, 217-226.
Theil, H. (1975b) Theory and Measurement of Consumer Demand. North-Holland, Vol. I.
Theil, H. (1976) Theory and Measurement of Consumer Demand. North-Holland, Vol. II.
Theil, H. (1979) The System- Wide Approach to Microeconomics. Chicago: University of Chicago Press.
Theil, H. and K. Laitinen (1981) “The Independence Transformation: A Review and Some Further
Explorations”, in: A. S. Deaton, ed., Essays in the’Theory and Measurement of Consumer Behaoiour.
New York: Cambridge University Press.
Theil, H. and M. Rosalsky (1984) “More on Symmetry-Constrained Estimation”. University of
Florida at Gainesville, mimeo.
Theil, H. and F. E. Suhm (1981) International Consumption Comparisons: A System-Wide Approach.
Amsterdam: North-Holland.
Thursby, J. and C. A. Knox Lovell (1978) "An Investigation of the Kmenta Approximation to the
CES Function”, International Economic Review, 19, 363-377.
Tobin, J. (1952) “A Survey of the Theory of Rationing”, Econometrica, 20, 512-553.
Tobin, J. (1958) “Estimation of Relationships for Limited Dependent Variables”, Econometrica, 26,
24-36.
Tobin, J. and H. S. Houthakker (1951) “The Effects of Rationing on Demand Elasticities”, Review of
Economic Studies, 18, 140-153.
Törnqvist, L. (1941) "Review", Ekonomisk Tidskrift, 43, 216-225.
Varian, H. R. (1978) "A Note on Locally Constant Income Elasticities", Economics Letters, 1, 5-9.
Varian, H. R. (1982) "The Nonparametric Approach to Demand Analysis", Econometrica, 50,
945-973.
Varian, H. R. (1983) "Nonparametric Tests of Consumer Behavior", Review of Economic Studies, 50,
99-110.
Varian, H. R. (1984) “Nonparametric Analysis of Optimizing Behavior with Measurement Error”.
University of Michigan, mimeo.
Vartia, Y. 0. (1983) “Efficient Methods of Measuring Welfare Change and Compensated Income in
Terms of Market Demand Functions”, Econometrica, 51, 79-98.
Wales, T. J. (1977) “On the Flexibility of Flexible Functional Forms: An Empirical Approach”,
Journal of Econometrics, 5, 183-193.
Wales, T. J. and A. D. Woodland (1983) "Estimation of Consumer Demand Systems with Binding
Non-Negativity Constraints”, Journal of Econometrics, 21, 263-285.

White, H. (1980) “A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for
Heteroskedasticity”, Econometrica, 48, 817-838.
Willig, R. (1976) “Integrability Implications for Locally Constant Demand Elasticities”, Journal of
Economic Theory, 12, 391-401.
de Wolff, P. (1941) “Income Elasticity of Demand, a Micro-Economic and a Macro-Economic
Interpretation”, Economic Journal, 51, 104-145.
Woodland, A. (1979) “Stochastic Specification and the Estimation of Share Equations”, Journal of
Econometrics, 10, 361-383.
Working, H. (1943) “Statistical Laws of Family Expenditure”, Journal of the American Statistical
Association, 38, 43-56.
Wu, D-M. (1973) “Alternative Tests of Independence Between Stochastic Regressors and Dis-
turbances”, Econometrica, 41, 733-750.
Yoshihara, K. (1969) “Demand Functions: An Application to the Japanese Expenditure Pattern”,
Econometrica, 37, 257-274.
Zellner, A. (1962) “An Efficient Method of Estimating Seemingly Unrelated Regressions and Tests for
Aggregation Bias”, Journal of the American Statistical Association, 57, 348-368.
Chapter 31

ECONOMETRIC METHODS FOR MODELING PRODUCER BEHAVIOR

DALE W. JORGENSON

Harvard University

Contents

1. Introduction 1842
1.1. Production theory 1842
1.2. Parametric form 1844
1.3. Statistical method 1845
1.4. Overview of the paper 1847
2. Price functions 1848
2.1. Duality 1849
2.2. Substitution and technical change 1851
2.3. Parametrization 1855
2.4. Integrability 1857
3. Statistical methods 1860
3.1. Stochastic specification 1860
3.2. Autocorrelation 1862
3.3. Identification and estimation 1865
4. Applications of price functions 1871
4.1. Substitution 1872
4.2. Technical change 1876
4.3. Two stage allocation 1882
5. Cost functions 1884
5.1. Duality 1884
5.2. Substitution and economies of scale 1886
5.3. Parametrization and integrability 1889
5.4. Stochastic specification 1891
6. Applications of cost functions 1893
6.1. Economies of scale 1893
6.2. Multiple outputs 1897
7. Conclusion 1900
7.1. General equilibrium modeling 1900
7.2. Panel data 1902
7.3. Dynamic models of production 1904
References 1905

Handbook of Econometrics, Volume III, Edited by Z. Griliches and M.D. Intriligator
© Elsevier Science Publishers B.V., 1986

1. Introduction

The purpose of this chapter is to provide an exposition of econometric methods
for modeling producer behavior. The objective of econometric modeling is to
determine the nature of substitution among inputs, the character of differences in
technology, and the role of economies of scale. The principal contribution of
recent advances in methodology has been to exploit the potential of economic
theory in achieving this objective.
Important innovations in specifying econometric models have arisen from the
dual formulation of the theory of production. The chief advantage of this
formulation is in generating demands and supplies as explicit functions of relative
prices. By using duality in production theory, these functions can be specified
without imposing arbitrary restrictions on patterns of production.
The econometric modeling of producer behavior requires parametric forms for
demand and supply functions. Patterns of production can be represented in terms
of unknown parameters that specify the responses of demands and supplies to
changes in prices, technology, and scale. New measures of substitution, technical
change, and economies of scale have provided greater flexibility in the empirical
determination of production patterns.
Econometric models of producer behavior take the form of systems of demand
and supply functions. All the dependent variables in these functions depend on
the same set of independent variables. However, the variables and the parameters
may enter the functions in a nonlinear manner. Efficient estimation of these
parameters has necessitated the development of statistical methods for systems of
nonlinear simultaneous equations.
The new methodology for modeling producer behavior has generated a rapidly
expanding body of empirical work. We illustrate the application of this methodol-
ogy by summarizing empirical studies of substitution, technical change, and
economies of scale. In this introductory section we first review recent method-
ological developments and then provide a brief overview of the paper.

1.1. Production theory

The economic theory of production, as presented in such classic treatises as
Hicks's Value and Capital (1946) and Samuelson's Foundations of Economic
Analysis (1983), is based on the maximization of profit, subject to a production
function. The objective of this theory is to characterize demand and supply
functions, using only the restrictions on producer behavior that arise from

optimization. The principal analytical tool employed for this purpose is the
implicit function theorem.1
Unfortunately, the characterization of demands and supplies as implicit func-
tions of relative prices is inconvenient for econometric applications. In specifying
an econometric model of producer behavior the demands and supplies must be
expressed as explicit functions. These functions can be parametrized by treating
measures of substitution, technical change, and economies of scale as unknown
parameters to be estimated on the basis of empirical data.
The traditional approach to modeling producer behavior begins with the
assumption that the production function is additive and homogeneous. Under
these restrictions demand and supply functions can be derived explicitly from the
production function and the necessary conditions for producer equilibrium.
However, this approach has the disadvantage of imposing constraints on patterns
of production - thereby frustrating the objective of determining these patterns
empirically.
The traditional approach was originated by Cobb and Douglas (1928) and was
employed in empirical research by Douglas and his associates for almost two
decades.2 The limitations of this approach were made strikingly apparent by
Arrow, Chenery, Minhas, and Solow (1961, henceforward ACMS), who pointed
out that the Cobb-Douglas production function imposes a priori restrictions on
patterns of substitution among inputs. In particular, elasticities of substitution
among all inputs must be equal to unity.
The constant elasticity of substitution (CES) production function introduced by
ACMS adds flexibility to the traditional approach by treating the elasticity of
substitution as an unknown parameter.3 However, the CES production function
retains the assumptions of additivity and homogeneity and imposes very stringent
limitations on patterns of substitution. McFadden (1963) and Uzawa (1962) have
shown, essentially, that elasticities of substitution among all inputs must be the
same.
The dual formulation of production theory has made it possible to overcome
the limitations of the traditional approach to econometric modeling. This formu-
lation was introduced by Hotelling (1932) and later revived and extended by
Samuelson (1954, 1960)4 and Shephard (1953, 1970).5 The key features of the

1This approach to production theory is employed by Carlson (1939), Frisch (1965), and Schneider
(1934). The English edition of Frisch's book is a translation from the ninth edition of his lectures,
published in Norwegian in 1962; the first edition of these lectures dates back to 1926.
2These studies are summarized by Douglas (1948). See also: Douglas (1967, 1976). Early economet-
ric studies of producer behavior, including those based on the Cobb-Douglas production function,
have been surveyed by Heady and Dillon (1961) and Walters (1963). Samuelson (1979) discusses the
impact of Douglas's research.
3Econometric studies based on the CES production function have been surveyed by Griliches
(1967), Jorgenson (1974), Kennedy and Thirlwall (1972), Nadiri (1970), and Nerlove (1967).

dual formulation are, first, to characterize the production function by means of a
dual representation such as a price or cost function and, second, to generate
explicit demand and supply functions as derivatives of the price or cost function.6
The dual formulation of production theory embodies the same implications of
optimizing behavior as the theory presented by Hicks (1946) and Samuelson
(1983). However, the dual formulation has a crucial advantage in the development
of econometric methodology: Demands and supplies can be generated as explicit
functions of relative prices without imposing the arbitrary constraints on produc-
tion patterns required in the traditional methodology. In addition, the implica-
tions of production theory can be incorporated more readily into an econometric
model.

1.2. Parametric form

Patterns of producer behavior can be described most usefully in terms of the
behavior of the derivatives of demand and supply functions.7 For example,
measures of substitution can be specified in terms of the response of demand
patterns to changes in input prices. Similarly, measures of technical change can be
specified in terms of the response of these patterns to changes in technology. The
classic formulation of production theory at this level of specificity can be found in
Hicks’s Theory of Wages (1963).
Hicks (1963) introduced the elasticity of substitution as a measure of substitu-
tability. The elasticity of substitution is the proportional change in the ratio of
two inputs with respect to a proportional change in their relative price. Two
inputs have a high degree of substitutability if this measure exceeds unity and a
low degree of substitutability if the measure is less than unity. The unitary
elasticity of substitution employed in the Cobb-Douglas production function is a
borderline case between high and low degrees of substitutability.
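In the two-input case, the definition just given can be written as

$$
\sigma = \frac{d \ln (x_1/x_2)}{d \ln (p_2/p_1)},
$$

with output held constant, so that $\sigma > 1$ and $\sigma < 1$ correspond to high and low degrees of substitutability, while $\sigma = 1$ is the Cobb-Douglas borderline case.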
Similarly, Hicks introduced the bias of technical change as a measure of the
impact of changes in technology on patterns of demand for inputs. The bias of
technical change is the response of the share of an input in the value of output to
a change in the level of technology. If the bias is positive, changes in technology

4Hotelling (1932) and Samuelson (1954) develop the dual formulation of production theory on the
basis of the Legendre transformation. This approach is employed by Jorgenson and Lau (1974a,
1974b) and Lau (1976, 1978a).
5Shephard utilizes distance functions to characterize the duality between cost and production
functions. This approach is employed by Diewert (1974a, 1982), Hanoch (1978), McFadden (1978),
and Uzawa (1964).
6Surveys of duality in the theory of production are presented by Diewert (1982) and Samuelson
(1983).
7This approach to the selection of parametric forms is discussed by Diewert (1974a), Fuss,
McFadden, and Mundlak (1978), and Lau (1974).

increase demand for the input and are said to use the input; if the bias is negative,
changes in technology decrease demand for the input and are said to save the input.
If technical change neither uses nor saves an input, the change is neutral in the
sense of Hicks.
By treating measures of substitution and technical change as fixed parameters
the system of demand and supply functions can be generated by integration.
Provided that the resulting functions are themselves integrable, the underlying
price or cost function can be obtained by a second integration. As we have
already pointed out, Hicks’s elasticity of substitution is unsatisfactory for this
purpose, since it leads to arbitrary restrictions on patterns of producer behavior.
The introduction of a new measure of substitution, the share elasticity, by
Christensen, Jorgenson, and Lau (1971, 1973) and Samuelson (1973) has made it
possible to overcome the limitations of parametric forms based on constant
elasticities of substitution.8 Share elasticities, like biases of technical change, can
be defined in terms of shares of inputs in the value of output. The share elasticity
of a given input is the response of the share of that input to a proportional change
in the price of an input.
By taking share elasticities and biases of technical change as fixed parameters,
demand functions for inputs with constant share elasticities and constant biases
of technical change can be obtained by integration. The shares of each input in
the value of output can be taken to be linear functions of the logarithms of input
prices and of the level of technology. The share elasticities and biases of technical
change can be estimated as unknown parameters of these functions.
The constant share elasticity (CSE) form of input demand functions can be
integrated a second time to obtain the underlying price or cost function. For
example, the logarithm of the price of output can be expressed as a quadratic
function of the logarithms of the input prices and the level of technology. The
price of output can be expressed as a transcendental or, more specifically, an
exponential function of the logarithms of the input prices.9 Accordingly,
Christensen, Jorgenson, and Lau refer to this parametric form as the translog
price function.10
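In one standard notation, writing q for the price of output, p_i for the prices of inputs, and t for the level of technology, the translog price function and the value shares v_i it implies take the form

$$
\ln q = \alpha_0 + \sum_i \alpha_i \ln p_i + \alpha_t t
+ \tfrac{1}{2} \sum_i \sum_j \beta_{ij} \ln p_i \ln p_j
+ \sum_i \beta_{it} \ln p_i \, t + \tfrac{1}{2} \beta_{tt} t^2,
$$

$$
v_i = \frac{\partial \ln q}{\partial \ln p_i}
= \alpha_i + \sum_j \beta_{ij} \ln p_j + \beta_{it} t,
$$

so that the constant share elasticities $\beta_{ij}$ and the constant biases of technical change $\beta_{it}$ appear directly as coefficients of equations that are linear in the logarithms of the input prices and the level of technology.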

1.3. Statistical method

Econometric models of producer behavior take the form of systems of demand
and supply functions. All the dependent variables in these functions depend on

8A more detailed discussion of this measure is presented in Section 2.2 below.
9An alternative approach, originated by Diewert (1971, 1973, 1974b), employs the square roots of
the input prices rather than the logarithms and results in the "generalized Leontief" parametric form.
10Surveys of parametric forms employed in econometric modeling of producer behavior are
presented by Fuss, McFadden, and Mundlak (1978) and Lau (1986).

the same set of independent variables, for example, relative prices and the level
of technology. The variables may enter these functions in a nonlinear manner, as
in the translog demand functions proposed by Christensen, Jorgenson, and Lau.
The functions may also be nonlinear in the parameters. Finally, the parameters
may be subject to nonlinear constraints arising from the theory of production.
The selection of a statistical method for estimation of systems of demand and
supply functions depends on the character of the data set. For cross section data
on individual producing units, the prices that determine demands and supplies
can be treated as exogenous variables. The unknown parameters can be estimated
by means of nonlinear multivariate regression techniques. Methods of estimation
appropriate for this purpose were introduced by Jennrich (1969) and Malinvaud
(1970, 1980).11
For time series data on aggregates such as industry groups, the prices that
determine demands and supplies can be treated as endogenous variables. The
unknown parameters of an econometric model of producer behavior can be
estimated by techniques appropriate for systems of nonlinear simultaneous equa-
tions. One possible approach is to apply the method of full information maximum
likelihood. However, this approach has proved to be impractical, since it requires
the likelihood function for the full econometric model, not only for the model of
producer behavior.
Jorgenson and Laffont (1974) have developed limited information methods for
estimating the systems of nonlinear simultaneous equations that arise in modeling
producer behavior. Amemiya (1974) proposed to estimate a single nonlinear
structural equation by the method of nonlinear two stage least squares. The first
step in this procedure is to linearize the equation and to apply the method of two
stage least squares to the linearized equation. Using the resulting estimates of the
coefficients of the structural equation, a second linearization can be obtained and
the process can be repeated.
Jorgenson and Laffont extended Amemiya’s approach to a system of nonlinear
simultaneous equations by introducing the method of nonlinear three stage least
squares. This method requires an estimate of the covariance matrix of the
disturbances of the system of equations as well as an estimate of the coefficients
of the equations. The procedure is initiated by linearizing the system and applying
the method of three stage least squares to the linearized system. This process can
be repeated, using a second linearization.12
It is essential to emphasize the role of constraints on the parameters of
econometric models implied by the theory of production. These constraints may
take the form of linear or nonlinear restrictions on the parameters of a single

11Methods for estimation of nonlinear multivariate regression models are summarized by Malinvaud
(1980).
12Nonlinear two and three stage least squares methods are also discussed by Amemiya (1977),
Gallant (1977), and Gallant and Jorgenson (1979).

equation or may involve restrictions on parameters that occur in several equa-
tions. An added complexity arises from the fact that the restrictions may take the
form of equalities or inequalities. Estimation under inequality restrictions requires
nonlinear programming techniques.13
The constraints that arise from the theory of production can be used to provide
tests of the validity of the theory. Similarly, constraints that arise from simplifica-
tion of the patterns of production can be tested statistically. Methods for
statistical inference in multivariate nonlinear regression models were introduced
by Jennrich (1969) and Malinvaud (1970, 1980). Methods for inference in systems
of nonlinear simultaneous equations were developed by Gallant and Jorgenson
(1979) and Gallant and Holly (1980).14

1.4. Overview of the paper

This paper begins with the simplest form of the econometric methodology for
modeling producer behavior. This methodology is based on production under
constant returns to scale. The dual representation of the production function is a
price function, giving the price of output as a function of the prices of inputs and
the level of technology. An econometric model of producer behavior is generated
by differentiating the price function with respect to the prices and the level of
technology.
We present the dual formulation of the theory of producer behavior under
constant returns to scale in Section 2. We parameterize this model by taking
measures of substitution and technical change to be constant parameters. We
then derive the constraints on these parameters implied by the theory of produc-
tion. In Section 3 we present statistical methods for estimating this model of
producer behavior under linear and nonlinear restrictions. Finally, we illustrate
the application of this model by studies of data on individual industries in Sec-
tion 4.
In Section 5 we consider the extension of econometric modeling of producer
behavior to nonconstant returns to scale. In regulated industries the price of
output is set by regulatory authority. Given the demand for output as a function
of the regulated price, the level of output can be taken as exogenous to the
producing unit. Necessary conditions for producer equilibrium can be derived
from cost minimization. The minimum value of total cost can be expressed as a
function of the level of output and the prices of all inputs. This cost function
provides a dual representation of the production function.

13Constrained estimation is discussed in more detail in Section 3.3 below.


14Surveys of methods for estimation of nonlinear multivariate regressions and systems of nonlinear
simultaneous equations are given by Amemiya (1983) and Malinvaud (1980), especially Chs. 9 and 20.
Computational techniques are surveyed by Quandt (1983).

The dual formulation of the theory of producer behavior under nonconstant
returns to scale parallels the theory under constant returns. However, the level of
output replaces the level of technology as an exogenous determinant of produc-
tion patterns. An econometric model can be parametrized by taking measures of
substitution and economies of scale to be constant parameters. In Section 6 we
illustrate this approach by means of studies of data on individual firms in
regulated industries.
In Section 7 we conclude the paper by outlining frontiers for future research.
Current empirical research has focused on the development of more elaborate and
more detailed data sets. We consider, in particular, the modeling of consistent
time series of inter-industry transactions tables and the application of the results
to general equilibrium analysis of the impact of economic policy. We also discuss
the analysis of panel data sets, that is, time series of cross sections of observations
on individual producing units.
Current methodological research has focused on dynamic modeling of produc-
tion. At least two promising approaches to this problem have been proposed;
both employ optimal control models of producer behavior. The first is based on
static expectations with all future prices taken to be equal to current prices. The
second approach is based on stochastic optimization under rational expectations,
utilizing information about expectations of future prices contained in current
production patterns.

2. Price functions

The purpose of this section is to present the simplest form of the econometric
methodology for modeling producer behavior. We base this methodology on a
production function with constant returns to scale. Producer equilibrium implies
the existence of a price function, giving the price of output as a function of the
prices of inputs and the level of technology. The price function is dual to the
production function and provides an alternative and equivalent description of
technology.
An econometric model of producer behavior takes the form of a system of
simultaneous equations, determining the distributive shares of the inputs and the
rate of technical change. Measures of substitution and technical change give the
responses of the distributive shares and the rate of technical change to changes in
prices and the level of technology. To generate an econometric model of producer
behavior we treat these measures as unknown parameters to be estimated.
The economic theory of production implies restrictions on the parameters of an
econometric model of producer behavior. These restrictions take the form of
linear and nonlinear constraints on the parameters. Statistical methods employed
in modeling producer behavior involve the estimation of systems of nonlinear

simultaneous equations with parameters subject to constraints. These constraints
give rise to tests of the theory of production and tests of restrictions on patterns
of substitution and technical change.

2.1. Duality

In order to present the theory of production we first require some notation. We
denote the quantity of output by y and the quantities of the J inputs by
x_j (j = 1, 2...J). Similarly, we denote the price of output by q and the prices of
the J inputs by p_j (j = 1, 2...J). We find it convenient to employ vector notation
for the input quantities and prices:

x = (x_1, x_2...x_J) - vector of input quantities.
p = (p_1, p_2...p_J) - vector of input prices.

We assume that the technology can be represented by a production function, say
F, where:

$$y = F(x, t), \qquad (2.1)$$

and t is an index of the level of technology. In the analysis of time series data for
a single producing unit the level of technology can be represented by time. In the
analysis of cross section data for different producing units the level of technology
can be represented by one-zero dummy variables corresponding to the different
units.15
We can define the shares of inputs in the value of output by:

$$u_j = \frac{p_j x_j}{q y}, \qquad (j = 1, 2...J).$$

Under competitive markets for output and all inputs the necessary conditions for
producer equilibrium are given by equalities between the share of each input in
the value of output and the elasticity of output with respect to that input:

$$u = \frac{\partial \ln y}{\partial \ln x}(x, t), \qquad (2.2)$$

where

u = (u_1, u_2...u_J) - vector of value shares.
ln x = (ln x_1, ln x_2...ln x_J) - vector of logarithms of input quantities.

15Time series and cross section differences in technology have been incorporated into a model
of substitution and technical change in U.S. agriculture by Binswanger (1974a, 1974b, 1978c).
Binswanger's study is summarized in Section 4.2 below.

Under constant returns to scale the elasticities and the value shares for all
inputs sum to unity:

$$i'u = i'\,\frac{\partial \ln y}{\partial \ln x} = 1,$$

where i is a vector of ones. The value of output is equal to the sum of the values
of the inputs.

Finally, we can define the rate of technical change, say v_t, as the rate of growth
of the quantity of output, holding all inputs constant:

$$v_t = \frac{\partial \ln y}{\partial t}(x, t). \qquad (2.3)$$

It is important to note that this definition does not impose any restriction on
patterns of substitution among inputs.
Given the identity between the value of output and the value of all inputs and
given equalities between the value share of each input and the elasticity of output
with respect to that input, we can express the price of output as a function, say Q,
of the prices of all inputs and the level of technology:

$$q = Q(p, t). \qquad (2.4)$$
We refer to this as the price function for the producing unit.
The price function Q is dual to the production function F and provides an
alternative and equivalent description of the technology of the producing unit.16
We can formalize this description in terms of the following properties of the price
function:

1. Positivity. The price function is positive for positive input prices.
2. Homogeneity. The price function is homogeneous of degree one in the
input prices.
3. Monotonicity. The price function is increasing in the input prices.
4. Concavity. The price function is concave in the input prices.

Given differentiability of the price function, we can express the value shares of
all inputs as elasticities of the price function with respect to the input prices:

$$u = \frac{\partial \ln q}{\partial \ln p}(p, t), \qquad (2.5)$$

16The dual formulation of production theory under constant returns to scale is due to Samuelson
(1954).

where:

ln p = (ln p_1, ln p_2...ln p_J) - vector of logarithms of input prices.

Further, we can express the negative of the rate of technical change as the rate of
growth of the price of output, holding the prices of all inputs constant:

$$-v_t = \frac{\partial \ln q}{\partial t}(p, t). \qquad (2.6)$$

Since the price function Q is homogeneous of degree one in the input prices,
the value shares and the rate of technical change are homogeneous of degree zero
and the value shares sum to unity:

$$i'u = i'\,\frac{\partial \ln q}{\partial \ln p} = 1.$$

Since the price function is increasing in the input prices the value shares must be
nonnegative,

$$u \geqq 0.$$

Since the value shares sum to unity, we can write:

$$u \geq 0,$$

where u ≥ 0 implies u ≧ 0 and u ≠ 0.

2.2. Substitution and technical change

We have represented the value shares of all inputs and the rate of technical
change as functions of the input prices and the level of technology. We can
introduce measures of substitution and technical change to characterize these
functions in detail. For this purpose we differentiate the logarithm of the price
function twice with respect to the logarithms of input prices to obtain measures of
substitution:

$$U_{pp} = \frac{\partial^2 \ln q}{\partial \ln p^2}(p, t) = \frac{\partial u}{\partial \ln p}(p, t). \qquad (2.7)$$

We refer to the measures of substitution (2.7) as share elasticities, since they
give the response of the value shares of all inputs to proportional changes in
the input prices. If a share elasticity is positive, the corresponding value share
increases with the input price. If a share elasticity is negative, the value share
decreases with the input price. Finally, if a share elasticity is zero, the value
share is independent of the price.17
Second, we can differentiate the logarithm of the price function twice with
respect to the logarithms of input prices and the level of technology to obtain
measures of technical change:

$$u_{pt} = \frac{\partial^2 \ln q}{\partial \ln p\,\partial t}(p, t) = \frac{\partial u}{\partial t}(p, t). \qquad (2.8)$$

We refer to these measures as biases of technical change. If a bias of technical
change is positive, the corresponding value share increases with a change in the
level of technology and we say that technical change is input-using. If a bias of
technical change is negative, the value share decreases with a change in technol-
ogy and technical change is input-sauing. Finally, if a bias is zero, the value share
is independent of technology; in this case we say that technical change is
neutral.18
Alternatively, the vector of biases of technical change u_pt can be employed to
derive the implications of changes in input prices for the rate of technical change.
If a bias of technical change is positive, the rate of technical change decreases
with the input price. If a bias is negative, the rate of technical change increases
with the input price. Finally, if a bias is zero so that technical change is neutral,
the rate of technical change is independent of the price.
To complete the description of technical change we can differentiate the
logarithm of the price function twice with respect to the level of technology:

$$u_{tt} = \frac{\partial^2 \ln q}{\partial t^2}(p, t). \qquad (2.9)$$

We refer to this measure as the deceleration of technical change, since it is the
negative of the rate of change of the rate of technical change. If the deceleration is
positive, negative, or zero, the rate of technical change is decreasing, increasing, or
independent of the level of technology.
The matrix of second-order logarithmic derivatives of the logarithm of the price
function Q must be symmetric. This matrix includes the matrix of share elastici-
ties U_pp, the vector of biases of technical change u_pt, and the deceleration of
technical change u_tt. Concavity of the price function in the input prices implies

17The share elasticity was introduced by Christensen, Jorgenson, and Lau (1971, 1973) and
Samuelson (1973).
18This definition of the bias of technical change is due to Hicks (1963). Alternative definitions of
biases of technical change are compared by Binswanger (1978b).

that the matrix of second-order derivatives, say H, is nonpositive definite, so that
the matrix U_pp + uu' - V is nonpositive definite, where:

$$\frac{1}{q}\,NHN = U_{pp} + uu' - V;$$

the price of output q is positive and the matrices N and V are diagonal:

$$N = \mathrm{diag}(p_1, p_2...p_J), \qquad V = \mathrm{diag}(u_1, u_2...u_J).$$

We can define substitution and complementarity of inputs in terms of the
matrix of share elasticities U_pp and the vector of value shares u. We say that two
inputs are substitutes if the corresponding element of the matrix U_pp + uu' - V is
positive. Similarly, we say that two inputs are complements if the corresponding
element of this matrix is negative. If the element of this matrix corresponding to
the two inputs is zero, we say that the inputs are independent. The definition of
substitution and complementarity is symmetric in the two inputs, reflecting the
symmetry of the matrix U_pp + uu' - V. If there are only two inputs, nonpositive
definiteness of this matrix implies that the inputs cannot be complements.19
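For a small numerical check of these definitions, the following sketch (numpy assumed; the share-elasticity matrix and value shares are purely hypothetical) forms the matrix U_pp + uu' - V and reads the classification of each pair of inputs off the signs of its off-diagonal elements:

```python
import numpy as np

u = np.array([0.3, 0.5, 0.2])                   # hypothetical value shares
U_pp = np.array([[ 0.10, -0.06, -0.04],         # hypothetical share elasticities;
                 [-0.06,  0.08, -0.02],         # symmetric, rows sum to zero
                 [-0.04, -0.02,  0.06]])

M = U_pp + np.outer(u, u) - np.diag(u)          # the matrix U_pp + uu' - V
print(np.sign(M))                               # off-diagonal: + substitutes, - complements
print(np.all(np.linalg.eigvalsh(M) <= 1e-12))   # concavity check: nonpositive definite
```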
We next consider restrictions on patterns of substitution and technical change
implied by separability of the price function Q. The most important applications
of separability are associated with aggregation over inputs. Under separability the
price of output can be represented as a function of the prices of a smaller number
of inputs by introducing price indexes for input aggregates. By treating the price
of each aggregate as a function of the prices of the inputs making up the
aggregate, we can generate a second stage of the model.
We say that the price function Q is separable in the K input prices
{p_1, p_2...p_K} if and only if the price function can be represented in the form:

$$q = Q[P(p_1, p_2...p_K), p_{K+1}, p_{K+2}...p_J, t], \qquad (2.10)$$

where the function P is independent of the J - K input prices {p_{K+1}, p_{K+2}...p_J}
and the level of technology t.20 We say that the price function is homothetically
separable if the function P in (2.10) is homogeneous of degree one.21 Separability
of the price function implies homothetic separability.22

19Alternative definitions of substitution and complementarity are discussed by Samuelson (1974).


20The concept of separability is due to Leontief (1947a, 1947b) and Sono (1961).
21The concept of homothetic separability was introduced by Shephard (1953, 1970).
22A proof of this proposition is given by Lau (1969, 1978a).

The price function Q is homothetically separable in the K input prices
{p_1, p_2...p_K} if and only if the production function F is homothetically
separable in the K input quantities {x_1, x_2...x_K}:

$$y = F[G(x_1, x_2...x_K), x_{K+1}, x_{K+2}...x_J, t], \qquad (2.11)$$

where the function G is homogeneous of degree one and independent of the J - K
quantities {x_{K+1}, x_{K+2}...x_J} and the level of technology t.23
We can interpret the function P in the definition of separability of the price
function as a price index; similarly, we can interpret the function G as a quantity
index. The price index is dual to the quantity index and has properties analogous
to those of the price function:

1. Positivity. The price index is positive for positive input prices.


2. Homogeneity. The price index is homogeneous of degree one in the input
prices.
3. Monotonicity. The price index is increasing in the input prices.
4. Concavity. The price index is concave in the input prices.

The total cost of the K inputs included in the price index P, say c, is the sum
of expenditures on all K inputs:

$$c = \sum_{k=1}^{K} p_k x_k.$$

We can define the quantity index G for this aggregate as the ratio of total cost to
the price index P:

$$G = \frac{c}{P}. \qquad (2.12)$$

The product of the price and quantity indexes for the aggregate is equal to the
cost of the K inputs.24
We can analyze the implications of homothetic separability by introducing
price and quantity indexes of aggregate input and defining the value share of
aggregate input in terms of these indexes. An aggregate input can be treated in
precisely the same way as any other input, so that price and quantity indexes can
be used to reduce the dimensionality of the space of input prices and quantities.
The price index generates a second stage of the model, by treating the price of
each aggregate as a function of the prices of the inputs making up the aggregate.25
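To make the two-stage allocation concrete, a common empirical implementation approximates the price index P of an aggregate by the discrete translog (Törnqvist) index; the sketch below (numpy assumed; data names hypothetical) forms the index from component prices and within-aggregate value shares, and the quantity index then follows from (2.12):

```python
import numpy as np

def tornqvist_price_index(p, w):
    """Discrete translog (Tornqvist) price index for an input aggregate.

    p : (T, K) component prices; w : (T, K) value shares within the aggregate.
    Returns P normalized to one in the first period.
    """
    w_bar = 0.5 * (w[1:] + w[:-1])                      # average value shares
    dlp = np.diff(np.log(p), axis=0)                    # log price changes
    ln_P = np.concatenate([[0.0], np.cumsum((w_bar * dlp).sum(axis=1))])
    return np.exp(ln_P)

# Quantity index of the aggregate, as in (2.12): G = c / P, where
# c = (p * x).sum(axis=1) is total expenditure on the K components.
```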

23A proof of this proposition is given by Lau (1978a).


24This characterization of price and quantity indexes was originated by Shephard (1953, 1970).
25Gorman (1959) has analyzed the relationship between aggregation over commodities and two
stage allocation. A presentation of the theory of two stage allocation and references to the literature
are given by Blackorby, Primont, and Russell (1978).

2.3. Parametrization

In the theory of producer behavior the dependent variables are value shares of all
inputs and the rate of technical change. The independent variables are prices of
inputs and the level of technology. The purpose of an econometric model of
producer behavior is to characterize the value shares and the rate of technical
change as functions of the input prices and the level of technology.
To generate an econometric model of producer behavior a natural approach is
to treat the measures of substitution and technical change as unknown parameters
to be estimated. For this purpose we introduce the parameters:

$$B_{pp} = U_{pp}, \qquad \beta_{pt} = u_{pt}, \qquad \beta_{tt} = u_{tt}, \qquad (2.13)$$

where B_pp is a matrix of constant share elasticities, β_pt is a vector of constant
biases of technical change, and β_tt is a constant deceleration of technical
change.26
We can regard the matrix of share elasticities, the vector of biases of technical
change, and the deceleration of technical change as a system of second-order
partial differential equations. We can integrate this system to obtain a system of
first-order partial differential equations:

$$u = \alpha_p + B_{pp} \ln p + \beta_{pt} \cdot t,$$
$$-v_t = \alpha_t + \beta_{pt}' \ln p + \beta_{tt} \cdot t, \qquad (2.14)$$

where the parameters α_p, α_t are constants of integration.


To provide an interpretation of the parameters α_p, α_t, we first normalize the
input prices. We can set the prices equal to unity where the level of technology t
is equal to zero. This represents a choice of origin for measuring the level of
technology and a choice of scale for measuring the quantities and prices of inputs.
The vector of parameters α_p is the vector of value shares and the parameter α_t is
the negative of the rate of technical change where the level of technology t is zero.
Similarly, we can integrate the system of first-order partial differential eqs.
(2.14) to obtain the price function:

$$\ln q = \alpha_0 + \alpha_p' \ln p + \alpha_t \cdot t + \tfrac{1}{2} \ln p'\,B_{pp} \ln p + \ln p'\,\beta_{pt} \cdot t + \tfrac{1}{2} \beta_{tt} \cdot t^2, \qquad (2.15)$$

where the parameter α_0 is a constant of integration. Normalizing the price of

26Share elasticities were introduced as constant parameters of an econometric model of producer
behavior by Christensen, Jorgenson, and Lau (1971, 1973). Constant share elasticities, biases, and
deceleration of technical change are employed by Jorgenson and Fraumeni (1981) and Jorgenson
(1983, 1984b). Binswanger (1974a, 1974b, 1978c) uses a different definition of biases of technical
change in parametrizing an econometric model with constant share elasticities.

output so that it is equal to unity where t is zero, we can set this parameter equal
to zero. This represents a choice of scale for measuring the quantity and price of
output.
For the price function (2.15) the price of output is a transcendental or, more
specifically, an exponential function of the logarithms of the input prices. We
refer to this form as the transcendental logarithmic price function or, more simply,
the translog price function, indicating the role of the variables. We can also
characterize this price function as the constant share elasticity or CSE price
function, indicating the role of the fixed parameters. In this representation the
scalars α_t, β_tt, the vectors α_p, β_pt, and the matrix B_pp are constant parameters
that reflect the underlying technology. Differences in levels of technology among
time periods for a given producing unit or among producing units at a given point
of time are represented by differences in the level of technology t.
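As a concrete illustration, the value shares implied by the translog price function can be evaluated directly from the parameters in (2.14) and (2.15). The sketch below (numpy assumed) uses purely hypothetical parameter values for J = 3 inputs that satisfy the restrictions of Section 2.4; it is illustrative only, not an estimated model:

```python
import numpy as np

# Hypothetical translog parameters for J = 3 inputs (illustrative values only).
alpha_0, alpha_t, beta_tt = 0.0, -0.01, 0.001
alpha_p = np.array([0.3, 0.5, 0.2])             # value shares at ln p = 0, t = 0
B_pp = np.array([[ 0.10, -0.06, -0.04],         # constant share elasticities;
                 [-0.06,  0.08, -0.02],         # symmetric, rows sum to zero
                 [-0.04, -0.02,  0.06]])
beta_pt = np.array([0.002, -0.003, 0.001])      # biases; elements sum to zero

def translog(ln_p, t):
    """Return ln q from (2.15) and the value shares u from (2.14)."""
    ln_q = (alpha_0 + alpha_p @ ln_p + alpha_t * t
            + 0.5 * ln_p @ B_pp @ ln_p + (ln_p @ beta_pt) * t
            + 0.5 * beta_tt * t**2)
    u = alpha_p + B_pp @ ln_p + beta_pt * t
    return ln_q, u

ln_q, u = translog(np.log([1.2, 0.9, 1.1]), t=5)
print(ln_q, u, u.sum())                         # shares sum to one by product exhaustion
```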
For the translog price function the negative of the average rates of technical
change at any two levels of technology, say t and t - 1, can be expressed as the
difference between successive logarithms of the price of output, less a weighted
average of the differences between successive logarithms of the input prices with
weights given by the average value shares:

$$-\bar{v}_t = \ln q(t) - \ln q(t-1) - \bar{u}'[\ln p(t) - \ln p(t-1)]. \qquad (2.16)$$

In the expression (2.16) v̄_t is the average rate of technical change,

$$\bar{v}_t = \tfrac{1}{2}[v_t + v_{t-1}],$$

and the vector of average value shares ū is given by:

$$\bar{u} = \tfrac{1}{2}[u(t) + u(t-1)].$$

We refer to the expression (2.16), introduced by Christensen and Jorgenson
(1970), as the translog rate of technical change.
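In empirical work the translog rate of technical change (2.16) is computed directly from observed prices and value shares; a minimal sketch (numpy assumed; data names hypothetical):

```python
import numpy as np

def translog_rate(q, p, u):
    """Average rates of technical change from (2.16).

    q : (T,) output prices; p : (T, J) input prices; u : (T, J) value shares.
    Returns the T - 1 averages v_bar implied by the translog price function.
    """
    dlq = np.diff(np.log(q))                    # ln q(t) - ln q(t-1)
    dlp = np.diff(np.log(p), axis=0)            # ln p(t) - ln p(t-1)
    u_bar = 0.5 * (u[1:] + u[:-1])              # average value shares
    return -(dlq - (u_bar * dlp).sum(axis=1))
```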
We have derived the translog price function as an exact representation of a
model of producer behavior with constant share elasticities and constant biases
and deceleration of technical change.27 An alternative approach to the translog
price function, based on a Taylor’s series approximation to an arbitrary price
function, was originated by Christensen, Jorgenson, and Lau (1971, 1973).
Diewert (1976, 1980) has shown that the translog rate of technical change (2.16) is
exact for the translog price function and the converse.
Diewert (1971, 1973, 1974b) introduced the Taylor’s series approach for
parametrizing models of producer behavior based on the dual formulation of the

27Arrow, Chenery, Minhas, and Solow (1961) have derived the CES production function as an exact
representation of a model of producer behavior with a constant elasticity of substitution.

theory of production. He utilized this approach to generate the "generalized
Leontief" parametric form, based on square root rather than logarithmic transfor-
mations of prices. Earlier, Heady and Dillon (1961) had employed Taylor’s series
approximations to generate parametric forms for the production function, using
both square root and logarithmic transformations of the quantities of inputs.
The limitations of Taylor’s series approximations have been emphasized by
Gallant (1981) and Elbadawi, Gallant, and Souza (1983). Taylor’s series provide
only a local approximation to an arbitrary price or production function. The
behavior of the error of approximation must be specified in formulating an
econometric model of producer behavior. To remedy these deficiencies Gallant
(1981) has introduced global approximations based on Fourier series.28

2.4. Integrability

The next step in generating our econometric model of producer behavior is to
incorporate the implications of the economic theory of production. These
implications take the form of restrictions on the system of eqs. (2.14), consisting
of value shares of all inputs u and the rate of technical change v_t. These
restrictions are required to obtain a price function Q with the properties we have
listed above. Under these restrictions we say that the system of equations is
integrable. A complete set of conditions for integrability is the following:

2.4.1. Homogeneity

The value shares and the rate of technical change are homogeneous of degree zero
in the input prices.
We first represent the value shares and the rate of technical change as a sys-
tem of eqs. (2.14). Homogeneity of the price function implies that the
parameters B_pp, β_pt in this system must satisfy the restrictions:

$$B_{pp}\,i = 0,$$
$$\beta_{pt}'\,i = 0, \qquad (2.17)$$

where i is a vector of ones. For J inputs there are J+l restrictions implied by
homogeneity.

28An alternative approach to the generation of the translog parametric form for the production
function by means of the Taylor’s series was originated by Kmenta (1967). Kmenta employs a Taylor’s
series expansion in terms of the parameters of the CES production function. This approach imposes
the same restrictions on patterns of production as those implied by the constancy of the elasticity of
substitution. The Kmenta approximation is employed by Griliches and Ringstad (1971) and Sargan
(1971), among others, in estimating the elasticity of substitution.

2.4.2. Product exhaustion

The sum of the value shares is equal to unity.

Product exhaustion implies that the value of the J inputs is equal to the value
of the product. Product exhaustion implies that the parameters α_p, B_pp,
β_pt must satisfy the restrictions:

$$\alpha_p'\,i = 1,$$
$$B_{pp}'\,i = 0,$$
$$\beta_{pt}'\,i = 0. \qquad (2.18)$$

For J inputs there are J + 2 restrictions implied by product exhaustion.

2.4.3. Symmetry

The matrix of share elasticities, biases of technical change, and the deceleration of
technical change must be symmetric.
A necessary and sufficient condition for symmetry is that the matrix of
parameters must satisfy the restrictions:

$$\begin{bmatrix} B_{pp} & \beta_{pt} \\ \beta_{pt}' & \beta_{tt} \end{bmatrix} = \begin{bmatrix} B_{pp} & \beta_{pt} \\ \beta_{pt}' & \beta_{tt} \end{bmatrix}'. \qquad (2.19)$$

For J inputs the total number of symmetry restrictions is ½J(J + 1).

2.4.4. Nonnegativity

The value shares must be nonnegative. Nonnegativity is implied by monotonicity
of the price function:

$$\frac{\partial \ln q}{\partial \ln p} \geqq 0.$$

For the translog price function the conditions for monotonicity take the form:

$$\frac{\partial \ln q}{\partial \ln p} = \alpha_p + B_{pp} \ln p + \beta_{pt} \cdot t \geqq 0. \qquad (2.20)$$

Since the translog price function is quadratic in the logarithms of the input prices,
we can always choose prices so that the monotonicity of the price function is

violated. Accordingly, we cannot impose restrictions on the parameters that
would imply nonnegativity of the value shares for all prices and levels of
technology. Instead, we consider restrictions that imply monotonicity of the value
shares wherever they are nonnegative.

2.4.5. Monotonicity

The matrix of share elasticities must be nonpositive definite.

Concavity of the price function implies that the matrix B_pp + uu' - V is
nonpositive definite. Without violating the product exhaustion and nonnegativity
restrictions we can set the matrix uu' - V equal to zero. For example, we can
choose one of the value shares equal to unity and all the others equal to zero. A
necessary condition for the matrix B_pp + uu' - V to be nonpositive definite is that
the matrix of constant share elasticities B_pp must be nonpositive definite. This
condition is also sufficient, since the matrix uu' - V is nonpositive definite and the
sum of two nonpositive definite matrices is nonpositive definite.29
We can impose concavity on the translog price functions by representing the
matrix of constant share elasticities B_pp in terms of its Cholesky factorization:

$$B_{pp} = TDT',$$

where T is a unit lower triangular matrix and D is a diagonal matrix. For J
inputs we can write the matrix B_pp in terms of its Cholesky factorization as
follows:

$$T = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ \lambda_{21} & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ \lambda_{J1} & \lambda_{J2} & \cdots & 1 \end{bmatrix}, \qquad D = \begin{bmatrix} \delta_1 & 0 & \cdots & 0 \\ 0 & \delta_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \delta_J \end{bmatrix}.$$

The matrix of constant share elasticities B_pp must satisfy restrictions implied by
symmetry and product exhaustion. These restrictions imply that the parameters of

29This approach to global concavity was originated by Jorgenson and Fraumeni (1981). Caves and
Christensen (1980) have compared regions where concavity obtains for alternative parametric forms.

the Cholesky factorization must satisfy the following conditions:

$$1 + \lambda_{21} + \lambda_{31} + \cdots + \lambda_{J1} = 0,$$
$$1 + \lambda_{32} + \lambda_{42} + \cdots + \lambda_{J2} = 0,$$
$$\vdots$$
$$1 + \lambda_{J,J-1} = 0,$$
$$\delta_J = 0.$$

Under these conditions there is a one-to-one transformation between the elements
of the matrix of share elasticities B_pp and the parameters of the Cholesky
factorization T, D. The matrix of share elasticities is nonpositive definite if and
only if the diagonal elements {δ_1, δ_2...δ_{J-1}} of the matrix D are nonpositive.30
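The sketch below (numpy assumed; J = 3 and all parameter values hypothetical) illustrates the parametrization: Cholesky parameters satisfying the conditions above are mapped into a matrix B_pp = TDT' that satisfies the product exhaustion and concavity restrictions by construction:

```python
import numpy as np

def cholesky_B(l21, l31, l32, d1, d2):
    """Build B_pp = T D T' for J = 3 from Cholesky parameters.

    The product exhaustion conditions require 1 + l21 + l31 = 0 and
    1 + l32 = 0; concavity requires d1 <= 0 and d2 <= 0; delta_3 = 0.
    """
    T = np.array([[1.0, 0.0, 0.0],
                  [l21, 1.0, 0.0],
                  [l31, l32, 1.0]])
    D = np.diag([d1, d2, 0.0])                  # delta_J = 0
    return T @ D @ T.T

# Impose the restrictions, then verify homogeneity and concavity.
l21 = 0.4
B = cholesky_B(l21, -1.0 - l21, -1.0, d1=-0.10, d2=-0.05)
print(B @ np.ones(3))                           # zero vector: B_pp i = 0
print(np.all(np.linalg.eigvalsh(B) <= 1e-12))   # nonpositive definite
```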

3. Statistical methods

Our model of producer behavior is generated from a translog price function for
each producing unit. To formulate an econometric model of production and
technical change we add a stochastic component to the equations for the value
shares and the rate of technical change. We associate this component with
unobservable random disturbances at the level of the producing unit. The
producer maximizes profits for given input prices, but the value shares of inputs
are subject to a random disturbance.
The random disturbances in an econometric model of producer behavior may
result from errors in implementation of production plans, random elements in the
technology not reflected in the model of producer behavior, or errors of measure-
ment in the value shares. We assume that each of the equations for the value
shares and the rate of technical change has two additive components. The first is a
nonrandom function of the input prices and the level of technology; the second is
an unobservable random disturbance that is functionally independent of these
variables.31

3.1. Stochastic specification

To represent an econometric model of production and technical change we
require some additional notation. We consider observations on the relative
distribution of the value of output among all inputs and the rate of technical

30The Cholesky factorization was first proposed for imposing local concavity restrictions by Lau
(1978b).
31Different stochastic specifications are compared by Appelbaum (1978), Burgess (1975), and Geary
and McDonnell (1980). The implications of alternative stochastic specifications are discussed in detail
by Fuss, McFadden, and Mundlak (1978).

change. We index the observations by levels of technology (t = 1, 2...T). We
employ a level of technology indexed by time as an illustration throughout the
following discussion. The vector of value shares in the t-th time period is denoted
u_t (t = 1, 2...T). Similarly, the rate of technical change in the t-th time period is
denoted v_t. The vector of input prices in the t-th time period is denoted
p_t (t = 1, 2...T). Similarly, the vector of logarithms of input prices is denoted
ln p_t (t = 1, 2...T).
We obtain an econometric model of production and technical change corre-
sponding to the translog price function by adding random disturbances to the
equations for the value shares and the rate of technical change:

$$u_t = \alpha_p + B_{pp} \ln p_t + \beta_{pt} \cdot t + \varepsilon_t,$$
$$-v_t = \alpha_t + \beta_{pt}' \ln p_t + \beta_{tt} \cdot t + \varepsilon_t^v, \qquad (t = 1, 2...T), \qquad (3.1)$$

where ε_t is the vector of unobservable random disturbances for the value shares of
the t-th time period and ε_t^v is the corresponding disturbance for the rate of
technical change. Since the value shares for all inputs sum to unity in each time
period, the random disturbances corresponding to the J value shares sum to zero
in each time period:

$$i'\varepsilon_t = 0, \qquad (t = 1, 2...T), \qquad (3.2)$$

so that these disturbances are not distributed independently.


We assume that the unobservable random disturbances for all J + 1 equations
have expected value equal to zero for all observations:

$$E\begin{bmatrix} \varepsilon_t \\ \varepsilon_t^v \end{bmatrix} = 0, \qquad (t = 1, 2...T). \qquad (3.3)$$

We also assume that the disturbances have a covariance matrix that is the same
for all observations; since the random disturbances corresponding to the J value
shares sum to zero, this matrix is nonnegative definite with rank at most equal to
J. We assume that the covariance matrix of the random disturbances correspond-
ing to the value shares and the rate of technical change, say Σ, has rank J, where:

$$V\begin{bmatrix} \varepsilon_t \\ \varepsilon_t^v \end{bmatrix} = \Sigma, \qquad (t = 1, 2...T).$$

Finally, we assume that the random disturbances corresponding to distinct
observations in the same or distinct equations are uncorrelated. Under this
assumption the covariance matrix of random disturbances for all observations has
the Kronecker product form:

$$V\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_T \end{bmatrix} = \Sigma \otimes I. \qquad (3.4)$$

3.2. Autocorrelation

The rate of technical change v_t is not directly observable; we assume that the
equation for the translog rate of technical change can be written:

$$-\bar{v}_t = \alpha_t + \beta_{pt}'\,\overline{\ln p}_t + \beta_{tt} \cdot \bar{t} + \bar{\varepsilon}_t^v, \qquad (t = 1, 2...T), \qquad (3.5)$$

where ε̄_t^v is the average disturbance in the two periods:

$$\bar{\varepsilon}_t^v = \tfrac{1}{2}[\varepsilon_t^v + \varepsilon_{t-1}^v], \qquad (t = 1, 2...T).$$

Similarly, ln p̄_t is a vector of averages of the logarithms of the input prices and t̄
is the average of time as an index of technology in the two periods.
Using our new notation, the equations for the value shares of all inputs can be
written:

$$\bar{u}_t = \alpha_p + B_{pp}\,\overline{\ln p}_t + \beta_{pt} \cdot \bar{t} + \bar{\varepsilon}_t, \qquad (t = 1, 2...T), \qquad (3.6)$$

where ε̄_t is a vector of averages of the disturbances in the two periods. As before,
the average value shares sum to unity, so that the average disturbances for the
equations corresponding to value shares sum to zero:

$$i'\bar{\varepsilon}_t = 0, \qquad (t = 1, 2...T). \qquad (3.7)$$

The covariance matrix of the average disturbances corresponding to the equa-
tion for the rate of technical change for all observations is proportional to a
Laurent matrix:

$$V\begin{bmatrix} \bar{\varepsilon}_2^v \\ \bar{\varepsilon}_3^v \\ \vdots \\ \bar{\varepsilon}_T^v \end{bmatrix} = \sigma^2 \Omega, \qquad (3.8)$$

where:

$$\Omega = \frac{1}{4}\begin{bmatrix} 2 & 1 & 0 & \cdots & 0 \\ 1 & 2 & 1 & \cdots & 0 \\ 0 & 1 & 2 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & 2 \end{bmatrix}.$$

The covariance matrix of the average disturbance corresponding to the equa-
tion for each value share is proportional to the same Laurent matrix. The
covariance matrix of the average disturbances for all observations has the
Kronecker product form:

$$V(\bar{\varepsilon}) = \Sigma \otimes \Omega. \qquad (3.9)$$

Since the matrix Ω in (3.9) is known, the equations for the average rate of
technical change and the average value shares can be transformed to eliminate
autocorrelation. The matrix Ω is positive definite, so that there is a matrix P such
that:

$$P\Omega P' = I,$$
$$P'P = \Omega^{-1}.$$

To construct the matrix P we first invert the matrix Ω to obtain the inverse
matrix Ω^{-1}, a positive definite matrix. We then calculate the Cholesky factoriza-
tion of the inverse matrix Ω^{-1}:

$$\Omega^{-1} = TDT',$$

where T is a unit lower triangular matrix and D is a diagonal matrix with positive
elements along the main diagonal. Finally, we can write the matrix P in the form:

$$P = D^{1/2}\,T',$$

where D^{1/2} is a diagonal matrix with elements along the main diagonal equal to
the square roots of the corresponding elements of D.
We can transform the equations for the average rates of technical change by the
matrix P = D^{1/2}T' to obtain equations with uncorrelated random disturbances:

$$D^{1/2}T'\begin{bmatrix} -\bar{v}_2 \\ -\bar{v}_3 \\ \vdots \\ -\bar{v}_T \end{bmatrix} = D^{1/2}T'\begin{bmatrix} 1 & \overline{\ln p}_2' & 2 - \tfrac{1}{2} \\ 1 & \overline{\ln p}_3' & 3 - \tfrac{1}{2} \\ \vdots & \vdots & \vdots \\ 1 & \overline{\ln p}_T' & T - \tfrac{1}{2} \end{bmatrix}\begin{bmatrix} \alpha_t \\ \beta_{pt} \\ \beta_{tt} \end{bmatrix} + D^{1/2}T'\begin{bmatrix} \bar{\varepsilon}_2^v \\ \bar{\varepsilon}_3^v \\ \vdots \\ \bar{\varepsilon}_T^v \end{bmatrix}, \qquad (3.10)$$

since:

$$V\bigl(D^{1/2}T'\,\bar{\varepsilon}^v\bigr) = \sigma^2\,D^{1/2}T'\,\Omega\,TD^{1/2} = \sigma^2 I.$$

The transformation P = D^{1/2}T' is applied to data on the average rates of
technical change v̄_t and data on the average values of the variables that appear on
the right hand side of the corresponding equation.

We can apply the transformation P = D^{1/2}T' to the equations for average
value shares to obtain equations with uncorrelated disturbances. As before, the
transformation is also applied to data on the average values of variables that
appear on the right hand side of the corresponding equations. The covariance
matrix of the transformed disturbances from the equations for the average value
shares and the equation for the average rates of technical change has the
Kronecker product form:

$$\bigl(I \otimes D^{1/2}T'\bigr)\bigl(\Sigma \otimes \Omega\bigr)\bigl(I \otimes TD^{1/2}\bigr) = \Sigma \otimes I. \qquad (3.11)$$
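A minimal numerical sketch of this transformation (numpy assumed; the number of averaged observations is hypothetical):

```python
import numpy as np

T_obs = 24                                      # number of averaged observations
# Laurent matrix for the averaged disturbances: (1/4) * tridiag(1, 2, 1).
Omega = 0.25 * (2 * np.eye(T_obs)
                + np.eye(T_obs, k=1) + np.eye(T_obs, k=-1))

# Cholesky factorization of Omega^{-1}: the lower triangular factor L equals
# T D^{1/2}, so P = L' = D^{1/2} T' and P Omega P' = I.
L = np.linalg.cholesky(np.linalg.inv(Omega))
P = L.T

assert np.allclose(P @ Omega @ P.T, np.eye(T_obs))

# P is then applied to the dependent variable and the regressors of each
# averaged equation, e.g. y_star = P @ y_bar and X_star = P @ X_bar.
```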

To estimate the unknown parameters of the translog price function we combine
the first J - 1 equations for the average value shares with the equation for the
average rate of technical change to obtain a complete econometric model of
production and technical change. We can estimate the parameters of the equation

for the remaining average value share, using the product exhaustion restrictions
on these parameters. The complete model involves ½J(J + 3) unknown parame-
ters. A total of ½(J² + 4J + 5) additional parameters can be estimated as func-
tions of these parameters, using the homogeneity, product exhaustion, and
symmetry restrictions.32

3.3. Identification and estimation

We next discuss the estimation of the econometric model of production and
technical change given in (3.5) and (3.6). The assumption that the input prices
and the level of technology are exogenous variables implies that the model
becomes a nonlinear multivariate regression model with additive errors, so that
nonlinear regression techniques can be employed. This specification is appropriate
for cross section data and individual producing units. For aggregate time series
data the existence of supply functions for all inputs makes it essential to treat the
prices as endogenous. Under this assumption the model becomes a system of
nonlinear simultaneous equations.
To estimate the complete model of production and technical change by the
method of full information maximum likelihood it would be necessary to specify
the full econometric model, not merely the model of producer behavior. Accord-
ingly, to estimate the model of production in (3.5) and (3.6) we consider limited
information techniques. For nonlinear multivariate regression models we can
employ the method of maximum likelihood proposed by Malinvaud (1980).33 For
systems of nonlinear simultaneous equations we outline the estimation of the
model by the nonlinear three stage least squares (NL3SLS) method originated by
Jorgenson and Laffont (1974). Wherever the right hand side variables can be
treated as exogenous, this method reduces to limited information maximum
likelihood for nonlinear multivariate regression models.
Application of NL3SLS to our model of production and technical change
would be straightforward, except for the fact that the covariance matrix of the
disturbances is singular. We obtain NL3SLS estimators of the complete system by
dropping one equation and estimating the resulting system of J equations by
NL3SLS. The parameter estimates are invariant to the choice of the equation
omitted in the model.
The NL3SLS estimator can be employed to estimate all parameters of the
model of production and technical change, provided that these parameters are

32This approach to estimation is presented by Jorgenson and Fraumeni (1981).


33Maximum likelihood estimation by means of the “seemingly unrelated regressions” model
analyzed by Zellner (1962) would not be appropriate here, since the symmetry constraints we have
described in Section 2.4 cannot be written in the bilinear form considered by Zellner.

identified. The necessary order condition for identification is that:

$$\tfrac{1}{2}J(J + 3) \leqq (J - 1)\min(V, T - 1), \qquad (3.12)$$

where V is the number of instruments. A necessary and sufficient rank condition
is given below; this amounts to the nonlinear analogue of the absence of
multicollinearity.
Our objective is to estimate the unknown parameters α_p, B_pp, β_pt, subject to
the restrictions implied by homogeneity, product exhaustion, symmetry, and
monotonicity. By dropping the equation for one of the value shares, we can
eliminate the restrictions implied by summability. These restrictions can be used
in estimating the parameters that occur in the equation that has been dropped.
We impose the restrictions implied by homogeneity and symmetry as equalities.
The restrictions implied by monotonicity take the form of inequalities.
We can write the model of production and technical change in (3.5) and (3.6) in
the form:

$$u_j = f_j(\gamma) + \varepsilon_j, \qquad (j = 1, 2...J), \qquad (3.13)$$
where u_j (j = 1, 2...J - 1) is the vector of observations on the distributive share of
the j-th input for all time periods, transformed to eliminate autocorrelation, u_J is
the corresponding vector of observations on the rates of technical change; the
vector γ includes the parameters α_p, α_t, B_pp, β_pt, β_tt; f_j(γ) (j = 1, 2...J) is a
vector of nonlinear functions of these parameters; finally, ε_j (j = 1, 2...J) is
the vector of disturbances in the j-th equation, transformed to eliminate autocor-
relation.
We can stack the equations in (3.13), obtaining:

$$u = f(\gamma) + \varepsilon, \qquad (3.14)$$

where:

$$u = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_J \end{bmatrix}, \qquad f(\gamma) = \begin{bmatrix} f_1(\gamma) \\ f_2(\gamma) \\ \vdots \\ f_J(\gamma) \end{bmatrix}, \qquad \varepsilon = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_J \end{bmatrix}.$$
By the assumptions in Section 3.1 above the random vector ε has mean zero and
covariance matrix Σ_ε ⊗ I, where Σ_ε is obtained from the covariance matrix Σ in (3.11) by
striking the row and column corresponding to the omitted equation.
The nonlinear three stage least squares (NL3SLS) estimator for the model of
production and technical change is obtained by minimizing the weighted sum of
squared residuals:

$$S(\gamma) = [u - f(\gamma)]'\,\bigl[\Sigma_\varepsilon^{-1} \otimes Z(Z'Z)^{-1}Z'\bigr]\,[u - f(\gamma)], \qquad (3.15)$$

with respect to the vector of unknown parameters γ, where Z is the matrix of
T - 1 observations on the V instrumental variables. Provided that the parameters
are identified, we can apply the Gauss-Newton method to minimize (3.15). First,
we linearize the model (3.14), obtaining:

$$u = f(\gamma_0) + \frac{\partial f}{\partial \gamma}(\gamma_0)\,\Delta\gamma + v, \qquad (3.16)$$

where γ_0 is the initial value of the vector of unknown parameters γ and

$$\Delta\gamma = \gamma_1 - \gamma_0,$$

where γ_1 is the revised value of this vector. The fitted residuals v depend on the
initial and revised values.
To revise the initial values we apply Zellner and Theil’s (1962) three stage least
squares method to the linearized model, obtaining:

$$\Delta\gamma = \left[\frac{\partial f}{\partial \gamma}(\gamma_0)'\bigl(\Sigma_\varepsilon^{-1} \otimes Z(Z'Z)^{-1}Z'\bigr)\frac{\partial f}{\partial \gamma}(\gamma_0)\right]^{-1}\frac{\partial f}{\partial \gamma}(\gamma_0)'\bigl(\Sigma_\varepsilon^{-1} \otimes Z(Z'Z)^{-1}Z'\bigr)[u - f(\gamma_0)]. \qquad (3.17)$$

If S(γ_1) < S(γ_0), a further iteration is performed by replacing γ_0 by γ_1 in (3.16)
and (3.17), resulting in a further revised value, say γ_2, and so on. If this condition
is not satisfied, we divide the revision Δγ by two and evaluate the criterion S(γ)
again; we continue reducing the revision Δγ until the criterion improves or
the convergence criterion max_j |Δγ_j/γ_j| is less than some prespecified limit. If the
criterion improves, we continue with further iterations. If not, we stop the
iterative process and employ the current value of the vector of unknown parame-
ters as our NL3SLS estimator.34

34Computational techniques for constrained and unconstrained estimation of nonlinear multivariate
regression models are discussed by Malinvaud (1980). Techniques for computation of unconstrained
estimators for systems of nonlinear simultaneous equations are discussed by Berndt, Hall, Hall, and
Hausman (1974) and Belsley (1974, 1979).
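A schematic rendering of the Gauss-Newton iteration (3.16)-(3.17) with the step-halving rule just described might look as follows (numpy assumed; f and df stand for the stacked model f(γ) and its Jacobian ∂f/∂γ, both hypothetical user-supplied functions; this is a sketch, not the original code):

```python
import numpy as np

def nl3sls(f, df, u, Z, Sigma_inv, gamma0, tol=1e-8, max_iter=100):
    """Nonlinear three stage least squares by Gauss-Newton iteration.

    Minimizes S(gamma) = r' W r with r = u - f(gamma) and
    W = Sigma_inv kron Z (Z'Z)^{-1} Z', as in (3.15).
    """
    Pz = Z @ np.linalg.solve(Z.T @ Z, Z.T)      # projection on the instruments
    W = np.kron(Sigma_inv, Pz)
    S = lambda g: (u - f(g)) @ W @ (u - f(g))
    gamma = gamma0
    for _ in range(max_iter):
        J = df(gamma)                           # stacked Jacobian at gamma
        r = u - f(gamma)
        step = np.linalg.solve(J.T @ W @ J, J.T @ W @ r)   # the revision (3.17)
        while S(gamma + step) >= S(gamma):      # halve the step until S improves
            step = step / 2
            if np.max(np.abs(step) / (np.abs(gamma) + 1e-12)) < tol:
                return gamma
        gamma = gamma + step
    return gamma
```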

The final step in estimation of the model of production and technical change is
to minimize the criterion function (3.15) subject to the restrictions implied by
monotonicity of the distributive shares. We have eliminated the restrictions that
take the form of equalities. Monotonicity of the distributive shares implies
inequality restrictions on the parameters of the Cholesky factorization of the
matrix of constant share elasticities B_pp. The diagonal elements of the matrix D
in this factorization must be nonpositive.
We can represent the inequality constraints on the matrix of share elasticities
B_pp in the form:

$$\phi_j(\gamma) \leqq 0, \qquad (j = 1, 2...J - 1), \qquad (3.18)$$

where J - 1 is the number of restrictions. We obtain the inequality constrained
nonlinear three stage least squares estimator for the model by minimizing the
criterion function subject to the constraints (3.18). This estimator corresponds to
the saddlepoint of the Lagrangian function:

$$L = S(\gamma) + \lambda'\phi, \qquad (3.19)$$

where λ is a vector of J - 1 Lagrange multipliers and φ is a vector of J - 1
constraints.
The Kuhn-Tucker (1951) conditions for a saddlepoint of the Lagrangian (3.19)
are the first-order conditions:

$$\frac{\partial L}{\partial \gamma} = \frac{\partial S}{\partial \gamma} + \frac{\partial \phi'}{\partial \gamma}\,\lambda = 0, \qquad (3.20)$$

and the complementary slackness condition:

$$\lambda'\phi = 0, \qquad \lambda \geqq 0. \qquad (3.21)$$

To find a saddlepoint of the Lagrangian (3.19) we begin by linearizing the
model of production and technical change (3.14) as in (3.16). Second, we linearize
the constraints as:

$$\phi(\gamma) = \frac{\partial \phi}{\partial \gamma}(\gamma_0)\,\Delta\gamma + \phi(\gamma_0), \qquad (3.22)$$

where γ_0 is a vector of initial values of the unknown parameters. We apply Liew's
(1976) inequality constrained three stage least squares method to the linearized
model, obtaining:

$$\Delta\gamma^* = \Delta\gamma - \left[\frac{\partial f}{\partial \gamma}(\gamma_0)'\bigl(\Sigma_\varepsilon^{-1} \otimes Z(Z'Z)^{-1}Z'\bigr)\frac{\partial f}{\partial \gamma}(\gamma_0)\right]^{-1}\frac{\partial \phi}{\partial \gamma}(\gamma_0)'\,\lambda^*, \qquad (3.23)$$

where Δγ is the change in the values of the parameters (3.17) and λ* is the
solution of the linear complementarity problem:

$$\lambda^{*\prime}\Bigl[\frac{\partial \phi}{\partial \gamma}(\gamma_0)\,\Delta\gamma^* + \phi(\gamma_0)\Bigr] = 0,$$

where:

$$\frac{\partial \phi}{\partial \gamma}(\gamma_0)\,\Delta\gamma^* + \phi(\gamma_0) \leqq 0, \qquad \lambda^* \geqq 0.$$

Given an initial value of the unknown parameters γ_0 that satisfies the J - 1
constraints (3.18), if S(γ_1) < S(γ_0) and γ_1 satisfies the constraints, the iterative
process continues by linearizing the model (3.14) as in (3.16) and the constraints
(3.18) as in (3.22) at the revised value of the vector of unknown parameters
γ_1 = γ_0 + Δγ. If not, we shrink Δγ as before, continuing until an improvement is
found subject to the constraints or max_j |Δγ_j/γ_j| is less than a convergence
criterion.
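Liew's linear complementarity method can be replicated in spirit with a modern general-purpose optimizer; as a hedged alternative sketch (scipy assumed; S, phi, and gamma0 are the criterion, constraint function, and starting values of the preceding sketches, all hypothetical):

```python
from scipy.optimize import minimize

def constrained_nl3sls(S, phi, gamma0):
    """Minimize S(gamma) subject to phi_j(gamma) <= 0, as in (3.18).

    SLSQP expects inequality constraints as g(gamma) >= 0, so phi is negated.
    """
    result = minimize(S, gamma0, method="SLSQP",
                      constraints=[{"type": "ineq", "fun": lambda g: -phi(g)}])
    return result.x
```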
The nonlinear three stage least squares estimator obtained by minimizing the
criterion function (3.15) is a consistent estimator of the vector of unknown
parameters γ. A consistent estimator of the covariance matrix Σ_ε, with typical
element σ_jk, is given by:

$$\hat{\sigma}_{jk} = \frac{1}{T - 1}\,[u_j - f_j(\hat{\gamma})]'[u_k - f_k(\hat{\gamma})], \qquad (j, k = 1, 2...J). \qquad (3.24)$$

Under suitable regularity conditions the estimator γ̂ is asymptotically normal
with covariance matrix:

$$V(\hat{\gamma}) = \left[\frac{\partial f}{\partial \gamma}(\gamma)'\bigl(\Sigma_\varepsilon^{-1} \otimes Z(Z'Z)^{-1}Z'\bigr)\frac{\partial f}{\partial \gamma}(\gamma)\right]^{-1}. \qquad (3.25)$$

We obtain a consistent estimator of this matrix by inserting the consistent
estimators γ̂ and Σ̂_ε in place of the parameters γ and Σ_ε. The nonlinear three
stage least squares estimator is efficient in the class of instrumental variables
estimators using Z as the matrix of instrumental variables.35

35The method of nonlinear three stage least squares introduced by Jorgenson and Laffont (1974)
was extended to nonlinear inequality constrained estimation by Jorgenson, Lau, and Stoker (1982),
esp. pp. 196-204.

The rank condition necessary and sufficient for identifiability of the vector of
unknown parameters γ is the nonsingularity of the following matrix in the
neighborhood of the true parameter vector:

$$\frac{\partial f}{\partial \gamma}(\gamma)'\bigl(\Sigma_\varepsilon^{-1} \otimes Z(Z'Z)^{-1}Z'\bigr)\frac{\partial f}{\partial \gamma}(\gamma). \qquad (3.26)$$

The order condition (3.12) given above is necessary for the nonsingularity of this
matrix.
Finally, we can consider the problem of testing equality restrictions on the
vector of unknown parameters γ. For example, suppose that the maintained
hypothesis is that there are r = ½J(J + 3) elements in this vector after solving out
the homogeneity, product exhaustion, and symmetry restrictions. Additional
equality restrictions can be expressed in the form:

$$\gamma = g(\delta), \qquad (3.27)$$

where δ is a vector of unknown parameters with s elements, s < r. We can test
the hypothesis:

$$H: \gamma = g(\delta),$$

against the alternative:

$$A: \gamma \neq g(\delta).$$

Test statistics appropriate for this purpose have been analyzed by Gallant and
Jorgenson (1979) and Gallant and Holly (1980).36
A statistic for testing equality restrictions in the form (3.27) can be constructed
by analogy with the likelihood ratio principle. First, we can evaluate the criterion
function (3.15) at the minimizing value γ̂, obtaining:

$$S(\hat{\gamma}) = [u - f(\hat{\gamma})]'\bigl[\Sigma_\varepsilon^{-1} \otimes Z(Z'Z)^{-1}Z'\bigr][u - f(\hat{\gamma})].$$

Second, we can replace the vector of unknown parameters γ by the function g(δ)
in (3.27):

$$S(\delta) = \{u - f[g(\delta)]\}'\bigl[\Sigma_\varepsilon^{-1} \otimes Z(Z'Z)^{-1}Z'\bigr]\{u - f[g(\delta)]\};$$

36A nonstatistical approach to testing the theory of production has been presented by Afriat (1972),
Diewert and Parkan (1983), Hanoch and Rothschild (1972), and Varian (1984).

minimizing the criterion function with respect to δ, we obtain the minimizing
value δ̂, the constrained estimator of γ, g(δ̂), and the constrained value of the
criterion itself S(δ̂).
The appropriate test statistic, say T(γ̂, δ̂), is equal to the difference between the
constrained and unconstrained values of the criterion function:

$$T(\hat{\gamma}, \hat{\delta}) = S(\hat{\delta}) - S(\hat{\gamma}). \qquad (3.28)$$

Gallant and Jorgenson (1979) show that this statistic is distributed asymptotically
as chi-squared with r - s degrees of freedom. Wherever the right hand side
variables can be treated as exogenous, this statistic reduces to the likelihood ratio
statistic for nonlinear multivariate regression models proposed by Malinvaud
(1980). The resulting statistic is distributed asymptotically as chi-squared.37
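Once the constrained and unconstrained criterion values are available, computing the test is mechanical; a minimal sketch (scipy assumed; names hypothetical):

```python
from scipy.stats import chi2

def lr_type_test(S_constrained, S_unconstrained, r, s):
    """Test statistic (3.28) and its asymptotic chi-squared p-value."""
    T_stat = S_constrained - S_unconstrained
    return T_stat, chi2.sf(T_stat, df=r - s)   # r - s degrees of freedom
```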

4. Applications of price functions

We first illustrate the econometric modeling of substitution among inputs in
Section 4.1 by presenting an econometric model for nine industrial sectors of the
U.S. economy implemented by Berndt and Jorgenson (1973). The Berndt-
Jorgenson model is based on a price function for each sector, giving the price of
output as a function of the prices of capital and labor inputs and the prices of
inputs of energy and materials. Technical change is assumed to be neutral, so that
all biases of technical change are set equal to zero.
In Section 4.2 we illustrate the econometric modeling of both substitution and
technical change. We present an econometric model of producer behavior that has
been implemented for thirty-five industrial sectors of the U.S. economy by
Jorgenson and Fraumeni (1981). In this model the rate of technical change and
the distributive shares of productive inputs are determined simultaneously as
functions of relative prices. Although the rate of technical change is endogenous,
this model must be carefully distinguished from models of induced technical
change.
Aggregation over inputs has proved to be an extremely important technique for
simplifying the description of technology for empirical implementation. The
corresponding restrictions can be used to generate a two stage model of producer
behavior. Each stage can be parametrized separately; alternatively, the validity of
alternative simplifications can be assessed by testing the restrictions. In Section
4.3 we conclude with illustrations of aggregation over inputs in studies by Berndt
and Jorgenson (1973) and Berndt and Wood (1975).

37Statistics for testing linear inequality restrictions in linear multivariate regression models have
been developed by Gourieroux, Holly, and Monfort (1982); statistics for testing nonlinear inequality
restrictions in nonlinear multivariate regression models are given by Gourieroux, Holly, and Monfort
(1980).

4.1. Substitution

In the Berndt-Jorgenson (1973) model, production is divided among nine sectors
of the U.S. economy:
1. Agriculture, nonfuel mining, and construction.
2. Manufacturing, excluding petroleum refining.
3. Transportation.
4. Communications, trade, and services.
5. Coal mining.
6. Crude petroleum and natural gas.
7. Petroleum refining.
8. Electric utilities.
9. Gas utilities.
The nine producing sectors of the U.S. economy included in the
Berndt-Jorgenson model can be divided among five sectors that produce energy
commodities-coal, crude petroleum and natural gas, refined petroleum, electri-
city, and natural gas as a product of gas utilities-and four sectors that produce
nonenergy commodities - agriculture, manufacturing, transportation, and com-
munications. For each sector output is defined as the total domestic supply of the
corresponding commodity group, so that the input into the sector includes
competitive imports of the commodity, inputs of energy, and inputs of nonenergy
commodities.
The Berndt-Jorgenson model of producer behavior includes a system of
equations for each of the nine producing sectors giving the shares of capital,
labor, energy and materials inputs in the value of output as functions of the prices
of the four inputs. To formulate an econometric model stochastic components are
added to this system of equations. The rate of technical change is taken to be
exogenous, so that the adjustment for autocorrelation described in Section 3.2 is
not required. However, all prices are treated as endogenous variables; estimates of
the unknown parameters of the econometric model are based on the nonlinear
three stage least squares estimator presented in Section 3.3.
The endogenous variables in the Berndt-Jorgenson model of producer behavior
include value shares of capital, labor, energy, and materials inputs for each sector.
Three equations can be estimated for each sector, corresponding to three of the
value shares, as in (2.14). The unknown parameters include three elements of the
vector {α_p} and six share elasticities in the matrix {B_pp}, which is constrained to
be symmetric, so that there is a total of nine unknown parameters. Berndt and
Jorgenson estimate these parameters from time series data for the period
1947-1971 for each industry; the estimates are presented by Hudson and
Jorgenson (1974).
As a further illustration of modeling of substitution among inputs, we consider
an econometric model of the total manufacturing sector of the U.S. economy

implemented by Berndt and Wood (1975). This sector combines the manufactur-
ing and petroleum refining sectors of the Berndt-Jorgenson model. Berndt and
Wood generate this model by expressing the price of aggregate input as a function
of the prices of capital, labor, energy, and materials inputs into total manufactur-
ing. They find that capital and energy inputs are complements, while all other
pairs of inputs are substitutes.
By comparison with the results of Berndt and Wood, Hudson and Jorgenson
(1978) have classified patterns of substitution and complementarity among inputs
for the four nonenergy sectors of the Berndt-Jorgenson model. For agriculture,
nonfuel mining and construction, capital and energy are complements and all
other pairs of inputs are substitutes. For manufacturing, excluding petroleum
refining, energy is complementary with capital and materials, while other pairs of
inputs are substitutes. For transportation energy is complementary with capital
and labor while other pairs of inputs are substitutes. Finally, for communications,
trade and services, energy and materials are complements and all other pairs of
inputs are substitutes.
Berndt and Wood have considered further simplification of the Berndt-
Jorgenson model of producer behavior by imposing separability restrictions on
patterns of substitution among capital, labor, energy, and materials inputs.38 This
would reduce the number of input prices at the first stage of the model through
the introduction of additional input aggregates. For this purpose additional stages
in the allocation of the value of sectoral output among inputs would be required.
Berndt and Wood consider all possible pairs of capital, labor, energy, and
materials inputs, but find that only the input aggregate consisting of capital and
energy is consistent with the empirical evidence.39
Berndt and Morrison (1979) have disaggregated the Berndt-Wood data on
labor input between blue collar and white collar labor and have studied the
substitution among the two types of labor and capital, energy, and materials
inputs for U.S. total manufacturing, using a translog price function. Anderson
(1981) has reanalyzed the Berndt-Wood data set, testing alternative specifications
of the model of substitution among inputs. Gallant (1981) has fitted an alternative
model of substitution among inputs to these data, based on the Fourier functional
form for the price function. Elbadawi, Gallant, and Souza (1983) have employed
this approach in estimating price elasticities of demand for inputs, using the
Berndt-Wood data as a basis for Monte Carlo simulations of the performance of
alternative functional forms.

38Restrictions on patterns of substitution implied by homothetic separability have been discussed by
Berndt and Christensen (1973a), Jorgenson and Lau (1975), Russell (1975), and Blackorby and Russell
(1976).
39The methodology for testing separability restrictions was originated by Jorgenson and Lau (1975).
This methodology has been discussed by Blackorby, Primont and Russell (1977) and by Denny and
Fuss (1977). An alternative approach has been developed by Woodland (1978).

Cameron and Schwartz (1979), Denny, May, and Pinto (1978), Fuss (1977a),
and McRae (1981) have constructed econometric models of substitution among
capital, labor, energy, and materials inputs based on translog functional forms for
total manufacturing in Canada. Technical change is assumed to be neutral, as in
the study of U.S. total manufacturing by Berndt and Wood (1975), but noncon-
stant returns to scale are permitted. McRae and Webster (1982) have compared
models of substitution among inputs in Canadian manufacturing, estimated from
data for different time periods.
Friede (1979) has analyzed substitution among capital, labor, energy, and
materials inputs for total manufacturing in the Federal Republic of Germany. He
assumes that technical change is neutral and utilizes a translog price function. He
has disaggregated the results to the level of fourteen industrial groups, covering
the whole of the West German economy. He has separated materials inputs into
two groups-manufacturing and transportation services as one group and other
nonenergy inputs as a second group. Ozatalay, Grubaugh, and Long (1979) have
modeled substitution among capital, labor, energy and materials inputs, on the
basis of a translog price function. They use time series data for total manufactur-
ing for the period 1963-74 in seven countries-Canada, Japan, the Netherlands,
Norway, Sweden, the U.S., and West Germany.
Longva and Olsen (1983) have analyzed substitution among capital, labor,
energy, and materials inputs for total manufacturing in Norway. They assume
that technical change is neutral and utilize a generalized Leontief price function.
They have disaggregated the results to the level of nineteen industry groups.
These groups do not include the whole of the Norwegian economy; eight
additional industries are included in a complete multi-sectoral model of
production for Norway. Dargay (1983) has constructed econometric models of
substitution among capital, labor, energy, and materials inputs based on translog
functional forms for total manufacturing in Sweden. She assumes that technical
change is neutral, but permits nonconstant returns to scale. She has disaggregated
the results to the level of twelve industry groups within Swedish manufacturing.
Although the breakdown of inputs among capital, labor, energy, and materials
has come to predominate in econometric models of production at the industry
level, Humphrey and Wolkowitz (1976) have grouped energy and materials inputs
into a single aggregate input in a study of substitution among inputs in several
U.S. manufacturing industries that utilizes translog price functions. Friedlaender
and Spady (1980) have disaggregated transportation services between trucking
and rail service and have grouped other inputs into capital, labor and materials
inputs. Their study is based on cross section data for ninety-six three-digit
industries in the United States for 1972 and employs a translog functional form
with fixed inputs.
Parks (1971) has employed a breakdown of intermediate inputs among agricul-
tural materials, imported materials and commercial services, and transportation

services in a study of Swedish manufacturing based on the generalized Leontief functional form. Denny and May (1978) have disaggregated labor input between white collar and blue collar labor, capital input between equipment and struc-
tures, and have grouped all other inputs into a single aggregate input for
Canadian total manufacturing, using a translog functional form. Frenger (1978)
has analyzed substitution among capital, labor, and materials inputs for three
industries in Norway, breaking down intermediate inputs in a different way for
each industry, and utilizing a generalized Leontief functional form.
Griffin (1977a, 1977b, 1977c, 1978) has estimated econometric models of
substitution among inputs for individual industries based on translog functional
forms. For this purpose he has employed data generated by process models of the
U.S. electric power generation, petroleum refining, and petrochemical industries
constructed by Thompson, et al. (1977). Griffin (1979) and Kopp and Smith
(1980a, 1980b, 1981a, 1981b) have analyzed the effects of alternative aggregations
of intermediate inputs on measures of substitution among inputs in the steel
industry. For this purpose they have utilized data generated from a process
analysis model of the U.S. steel industry constructed by Russell and Vaughan
(1976).40
Although we have concentrated attention on substitution among capital, labor,
energy, and materials inputs, there exists a sizable literature on substitution
among capital, labor, and energy inputs alone. In this literature the price function
is assumed to be homothetically separable in the prices of these inputs. This
requires that all possible pairs of the inputs -capital and labor, capital and
energy, and labor and energy - are separable from materials inputs. As we have
observed above, only capital-energy separability is consistent with the results of
Berndt and Wood (1975) for U.S. total manufacturing.
Appelbaum (1979b) has analyzed substitution among capital, labor, and energy
inputs in the petroleum and natural gas industry of the United States, based on
the data of Berndt and Jorgenson. Field and Grebenstein (1980) have analyzed
substitution among physical capital, working capital, labor, and energy for ten
two-digit U.S. manufacturing industries on the basis of translog price functions,
using cross section data for individual states for 1971.
Griffin and Gregory (1976) have modeled substitution among capital, labor,
and energy inputs for total manufacturing in nine major industrialized
countries - Belgium, Denmark, France, Italy, the Netherlands, Norway, the U.K.,
the U.S., and West Germany-using a translog price function. They pool four
cross sections for these countries for the years 1955, 1960, 1965, and 1969,
allowing for differences in technology among countries by means of one-zero

dummy variables. Their results differ substantially from those of Berndt and Jorgenson and Berndt and Wood. These differences have led to an extensive discussion among Berndt and Wood (1979, 1981), Griffin (1981a, 1981b), and Kang and Brown (1981), attempting to reconcile the alternative approaches.

40 The advantages and disadvantages of summarizing data from process analysis models by means of econometric models have been discussed by Maddala and Roberts (1980, 1981) and Griffin (1980, 1981c).
Substitution among capital, labor, and energy inputs requires a price function
that is homothetically separable in the prices of these inputs. An alternative
specification is that the price function is homothetically separable in the prices of
capital, labor, and natural resource inputs. This specification has been utilized by
Humphrey and Moroney (1975), Moroney and Toevs (1977, 1979), and Moroney
and Trapani (1981a, 1981b) in studies of substitution among these inputs for
individual manufacturing industries in the U.S. based on translog price functions.
A third alternative specification is that the price function is separable in the
prices of capital and labor inputs. Berndt and Christensen (1973b, 1974) have
used translog price functions employing this specification in studies of sub-
stitution among individual types of capital and labor inputs for U.S. total manu-
facturing. Berndt and Christensen (1973b) have divided capital input between
structures and equipment inputs and have tested the separability of the two types
of capital input from labor input. Berndt and Christensen (1974) have divided
labor input between blue collar and white collar inputs and have tested the
separability of the two types of labor input from capital input. Hamermesh and
Grant (1979) have surveyed the literature on econometric modeling of substitu-
tion among different types of labor input.
Woodland (1975) has analyzed substitution among structures, equipment and
labor inputs for Canadian manufacturing, using generalized Leontief price func-
tions. Woodland (1978) has presented an alternative approach to testing sep-
arability and has applied it in modeling substitution among two types of capital
input and two types of labor input for U.S. total manufacturing, using the
translog parametric form. Field and Berndt (1981) and Berndt and Wood (1979,
1981) have surveyed econometric models of substitution among inputs. They
focus on substitution among capital, labor, energy and materials inputs at the
level of individual industries.

4.2. Technical change

The Jorgenson-Fraumeni (1981) model is based on a production function characterized by constant returns to scale for each of thirty-five industrial sectors of
the U.S. economy. Output is a function of inputs of primary factors of produc-
tion -capital and labor services -inputs of energy and materials, and time as an
index of the level of technology. While the rate of technical change is endogenous
in this econometric model, the model must be carefully distinguished from models
of induced technical change, such as those analyzed by Hicks (1963), Kennedy (1964), Samuelson (1965), von Weizsäcker (1962), and many others. In those

models the biases of technical change are endogenous and depend on relative
prices. As Samuelson (1965) has pointed out, models of induced technical change
require intertemporal optimization since technical change at any point of time
affects future production possibilities.41
In the Jorgenson-Fraumeni model of producer behavior myopic decision rules
can be derived by treating the price of capital input as a rental price of capital
services.42 The rate of technical change at any point of time is a function of
relative prices, but does not affect future production possibilities. This greatly
simplifies the modeling of producer behavior and facilitates the implementation
of the econometric model. Given myopic decision rules for producers in each
industrial sector, all of the implications of the economic theory of production can
be described in terms of the properties of the sectoral price functions given in
Section 2.1.43
The Jorgenson-Fraumeni model of producer behavior consists of a system of
equations giving the shares of capital, labor, energy, and materials inputs in the
value of output and the rate of technical change as functions of relative prices and
time. To formulate an econometric model a stochastic component is added to
these equations. Since the rate of technical change is not directly observable, we
consider a form of the model with autocorrelated disturbances; the data are
transformed to eliminate the autocorrelation. The prices are treated as endoge-
nous variables and the unknown parameters are estimated by the method of
nonlinear three stage least squares presented in Section 3.3.
The endogenous variables in the Jorgenson-Fraumeni model include value
shares of sectoral inputs for four commodity groups and the sectoral rate of
technical change. Four equations can be estimated for each industry, correspond-
ing to three of the value shares and the rate of technical change. As unknown
parameters there are three elements of the vector $\{\alpha_p\}$, the scalar $\{\alpha_t\}$, six share elasticities in the matrix $\{B_{pp}\}$, which is constrained to be symmetric, three biases of technical change in the vector $\{\beta_{pt}\}$, and the scalar $\{\beta_{tt}\}$, so that there is a total of fourteen unknown parameters for each industry. Jorgenson and Fraumeni
estimate these parameters from time series data for the period 1958-1974 for each
industry, subject to the inequality restrictions implied by monotonicity of the
sectoral input value shares.44
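Stated compactly in the notation above (a sketch consistent with this paragraph, with v the vector of value shares, $v_t$ the rate of technical change, and the sign convention for the biases explained below), the estimating system is

$$v = \alpha_p + B_{pp} \ln p + \beta_{pt}\, t, \qquad -v_t = \alpha_t + \beta_{pt}' \ln p + \beta_{tt}\, t,$$

so the same biases $\beta_{pt}$ appear in the share equations and in the equation for the rate of technical change, and the free parameter count is $3 + 1 + 6 + 3 + 1 = 14$ per industry.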
The estimated share elasticities with respect to price $\{B_{pp}\}$ describe the implications of patterns of substitution for the distribution of the value of output among capital, labor, energy, and materials inputs. Positive share elasticities

imply that the corresponding value shares increase with an increase in price; negative share elasticities imply that the value shares decrease with price; zero share elasticities correspond to value shares that are independent of price. The concavity constraints on the sectoral price functions contribute substantially to the precision of the estimates, but require that the share of each input be nonincreasing in the price of the input itself.

41 A review of the literature on induced technical change is given by Binswanger (1978a).
42 The model of capital as a factor of production was originated by Walras (1954). This model has been discussed by Diewert (1980) and by Jorgenson (1973a, 1980).
43 Myopic decision rules are derived by Jorgenson (1973b).
44 Data on energy and materials are based on annual interindustry transactions tables for the United States compiled by Jack Faucett Associates (1977). Data on labor and capital are based on estimates by Fraumeni and Jorgenson (1980).
The empirical findings on patterns of substitution reveal some striking similari-
ties among industries. 45 The elasticities of the shares of capital with respect to the
price of labor are nonnegative for thirty-three of the thirty-five industries, so that
the shares of capital are nondecreasing in the price of labor for these thirty-three
sectors. Similarly, elasticities of the share of capital with respect to the price of
energy are nonnegative for thirty-four industries and elasticities with respect to
the price of materials are nonnegative for all thirty-five industries. The share
elasticities of labor with respect to the prices of energy and materials are
nonnegative for nineteen and for all thirty-five industries, respectively. Finally,
the share elasticities of energy with respect to the price of materials are nonnega-
tive for thirty of the thirty-five industries.
We continue the interpretation of the empirical results with estimated biases of
technical change with respect to price $\{\beta_{pt}\}$. These parameters can be interpreted
as changes in the sectoral value shares (2.14) with respect to time, holding prices
constant. This component of change in the value shares can be attributed to
changes in technology rather than to substitution among inputs. For example, if
the bias of technical change with respect to the price of capital input is positive,
we say that technical change is capital-using; if the bias is negative, we say that
technical change is capital-saving.
Considering the rate of technical change (2.14), the biases of technical change $\{\beta_{pt}\}$ can be interpreted in an alternative and equivalent way. These parameters
are changes in the negative of the rate of technical change with respect to changes
in prices. As substitution among inputs takes place in response to price changes,
the rate of technical change is altered. For example, if the bias of technical change
with respect to capital input is positive, an increase in the price of capital input
decreases the rate of technical change; if the bias is negative, an increase in the
price of capital input increases the rate of technical change.
A classification of industries by patterns of the biases of technical change is
given in Table 1. The pattern that occurs with greatest frequency is capital-using,
labor-using, energy-using, and materials-saving technical change. This pattern
occurs for nineteen of the thirty-five industries for which biases are fitted.
Technical change is capital-using for twenty-five of the thirty-five industries,
labor-using for thirty-one industries, energy-using for twenty-nine industries, and
materials-using for only two industries.

45 Parameter estimates are given by Jorgenson and Fraumeni (1983), pp. 255-264.

Table 1
Classification of industries by biases of technical change

Pattern of biases                               Industries

Capital using, labor using,                     Agriculture, metal mining, crude petroleum and
energy using, material saving                   natural gas, nonmetallic mining, textiles, apparel,
                                                lumber, furniture, printing, leather, fabricated
                                                metals, electrical machinery, motor vehicles,
                                                instruments, miscellaneous manufacturing,
                                                transportation, trade, finance, insurance and
                                                real estate, services

Capital using, labor using,                     Coal mining, tobacco manufacturers,
energy saving, material saving                  communications, government enterprises

Capital using, labor saving,                    Petroleum refining
energy using, material saving

Capital using, labor saving,                    Construction
energy saving, material using

Capital saving, labor saving,                   Electric utilities
energy using, material saving

Capital saving, labor using,                    Primary metals
energy saving, material saving

Capital saving, labor using,                    Paper, chemicals, rubber, stone, clay and glass,
energy using, material saving                   machinery except electrical, transportation
                                                equipment and ordnance, gas utilities

Capital saving, labor saving,                   Food
energy using, material using

Source: Jorgenson and Fraumeni (1983), p. 264.

The patterns of biases of technical change given in Table 1 have important


implications for the relationship between relative prices and the rate of economic growth. An increase in the price of materials increases the rate of technical change in thirty-three of the thirty-five industries. By contrast, increases in the prices of capital, labor, and energy reduce the rates of technical change in twenty-five, thirty-one, and twenty-nine industries, respectively. The substantial increases in energy prices since 1973 have had the effect of reducing sectoral rates of technical change, slowing the aggregate rate of technical change, and diminishing the rate of growth for the U.S. economy as a whole.46

46 The implications of patterns of biases of technical change are discussed in more detail by Jorgenson (1981).
While the empirical results suggest a considerable degree of similarity across
industries, it is necessary to emphasize that the Jorgenson-Fraumeni model of
producer behavior requires important simplifying assumptions. First, conditions
for producer equilibrium under perfect competition are employed for all in-
dustries. Second, constant returns to scale at the industry level are assumed.
Finally, a description of technology that leads to myopic decision rules is
employed. These assumptions must be justified primarily by their usefulness in
implementing production models that are uniform for all thirty-five industrial
sectors of the U.S. economy.
Binswanger (1974a, 1974b, 1978c) has analyzed substitution and technical
change for U.S. agriculture, using cross sections of data for individual states for
1949, 1954, 1959, and 1964. Binswanger was the first to estimate biases of
technical change based on the translog price function. He permits technology to
differ among time periods and among groups of states within the United States.
He divides capital inputs between land and machinery and divides intermediate
inputs between fertilizer and other purchased inputs. He considers substitution
among these four inputs and labor input.
Binswanger employs time series data on U.S. agriculture as a whole for the
period 1912-1964 to estimate biases of technical change on an annual basis.
Brown and Christensen (1981) have analyzed time series data on U.S. agriculture
for the period 1947-1974. They divide labor services between hired labor and
self-employed labor and capital input between land and all other-machinery,
structures, and inventories. Other purchased inputs are treated as a single
aggregate. They model substitution and technical change with fixed inputs, using
a translog functional form.
Berndt and Khaled (1979) have augmented the Berndt-Wood data set for U.S.
manufacturing to include data on output. They estimate biases of technical
change and permit nonconstant returns to scale. They employ a Box-Cox
transformation of data on input prices, generating a functional form that includes
the translog, generalized Leontief, and quadratic as special cases. The Box-Cox
transformation is also employed by Appelbaum (1979a) and by Caves,
Christensen, and Tretheway (1980). Denny (1974) has proposed a closely related
approach to parametrization based on mean value functions.
Kopp and Diewert (1982) have employed a translog parametric form to study
technical and allocative efficiency. For this purpose they have analyzed data on
U.S. total manufacturing for the period 1947-71 compiled by Berndt and Wood


(1975) and augmented by Berndt and Khaled (1979). Technical change is not
required to be neutral and nonconstant returns to scale are permitted. They have
interpreted the resulting model of producer behavior as a representation of
average practice. They have then re-scaled the parameters to obtain a “frontier”
representing best practice and have employed the results to obtain measures of
technical and allocative efficiency for each year in the sample.47

47 A survey of the literature on frontier representations of technology is given by Forsund, Lovell, and Schmidt (1980).
Wills (1979) has modeled substitution and technical change for the U.S. steel
industry, using a translog price function. Norsworthy and Harper (1981) have
extended and augmented the Berndt-Wood data set for total manufacturing and
have modeled substitution and technical change, using a translog price function.
Woodward (1983) has reanalyzed these data and has derived estimates of rates of
factor augmentation for capital, labor, energy, and materials inputs, using a
translog price function.
Jorgenson (1984b) has modeled substitution and technical change for thirty-five
industries of the United States for the period 1958-1979, dividing energy inputs
between electricity and nonelectrical energy inputs. He employs translog price
functions with capital, labor, two kinds of energy, and materials inputs and finds
that technical change is electricity-using and nonelectrical energy-using for most
U.S. industries. Nakamura (1984) has developed a similar model for twelve
sectors covering the whole of the economy for the Federal Republic of Germany
for the period 1960-1974. He has disaggregated intermediate inputs among
energy, materials, and services.
We have already discussed the work of Kopp and Smith on substitution among
inputs, based on data generated by process models of the U.S. steel industry.
Kopp and Smith (1981c, 1982) have also analyzed the performance of different
measures of technical change, also using data generated by these models. They
show that measures of biased technical change based on the methodology
developed by Binswanger can be explained by the proportion of investment in
specific technologies.
Econometric models of substitution among inputs at the level of individual
industries have incorporated intermediate inputs-broken down between energy
and materials inputs-along with capital and labor inputs. However, models of
substitution and technical change have also been constructed at the level of the
economy as a whole. Output can be divided between consumption and investment
goods, as in the original study of the translog price function by Christensen,
Jorgenson, and Lau (1971, 1973) and input can be divided between capital and
labor services.
Hall (1973) has considered nonjointness of production of investment and
consumption goods outputs for the United States. Kohli (1981, 1983) has also


studied nonjointness in production for the United States. Burgess (1974) has
added imports as an input to inputs of capital and labor services. Denny and
Pinto (1978) developed a model with this same breakdown of inputs for Canada.
Conrad and Jorgenson (1977, 1978) have considered nonjointness of production
and alternative models of technical change for the Federal Republic of Germany.

4.3. Two stage allocation

Aggregation over inputs has proved to be a very important means for simplifying
the description of technology in modeling producer behavior. The price of output
can be represented as a function of a smaller number of input prices by
introducing price indexes for input aggregates. These price indexes can be used to
generate a second stage of the model by treating the price of each aggregate as a
function of the prices of the inputs making up the aggregate. We can parametrize
each stage of the model separately.
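Schematically (our labels, not the chapter's notation), for a sector with capital (K), labor (L), energy (E), and materials (M) inputs the two stages nest as

$$q = P(p_K, p_L, p_E, p_M), \qquad p_E = P_E(p_{E,1}, \ldots, p_{E,5}), \qquad p_M = P_M(p_{M,1}, \ldots, p_{M,5}),$$

so that the first stage price function involves only a few aggregate prices, while each second stage price index allocates the value of its aggregate among the underlying inputs.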
The Berndt-Jorgenson (1973) model of producer behavior is based on two
stage allocation of the value of output of each sector. In the first stage the value of
sectoral output is allocated among capital, labor, energy, and materials inputs,
where materials include inputs of nonenergy commodities and competitive im-
ports. In the second stage the value of energy expenditure is allocated among
expenditures on individual types of energy and the value of materials expenditure
is allocated among expenditures on competitive imports and nonenergy commod-
ities.
The first stage of the econometric model is generated from a price function for
each sector. The price of sectoral output is a function of the prices of capital and
labor inputs and the prices of inputs of energy and materials. The second stage of
the model is generated from price indexes for energy and materials inputs. The
price of energy is a function of the prices of five types of energy inputs, while the
price of materials is a function of the prices of four types of nonenergy inputs and
the price of competitive imports.
The Berndt-Jorgenson model of producer behavior consists of three systems of
equations. The first system gives the shares of capital, labor, energy and materials
inputs in the value of output, the second system gives the shares of energy inputs
in the value of energy input, and the third system gives the shares of nonenergy
inputs and competitive imports in the value of materials inputs. To formulate an
econometric model stochastic components are added to these systems of equa-
tions. The rate of technical change is taken to be exogenous; all prices-including
the prices of energy and materials inputs for each sector-are treated as endoge-
nous variables. Estimates of the unknown parameters of all three systems of
equations are based on the nonlinear three stage least squares estimator.
The Berndt-Jorgenson model illustrates the use of two stage allocation to
simplify the description of producer behavior. By imposing the assumption that

the price of aggregate input is separable in the prices of individual energy and
materials inputs, the price function that generates the first stage of the model can
be expressed in terms of four input prices rather than twelve. However, simplifica-
tion of the first stage of the model requires the introduction of a second stage,
consisting of price functions for energy and materials inputs. Each of these price
functions can be expressed in terms of five prices of individual inputs.
Fuss (1977a) has constructed a two stage model of Canadian total manufactur-
ing using translog functional forms. He treats substitution among coal, liquid
petroleum gas, fuel oil, natural gas, electricity, and gasoline as a second stage of
the model. Friede (1979) has developed two stage models based on translog price
functions for fourteen industries of the Federal Republic of Germany. In these
models the second stage consists of three separate models-one for substitution
among individual types of energy and two for substitution among individual
types of nonenergy inputs. Dargay (1983) has constructed a two stage model of
twelve Swedish manufacturing industries utilizing a translog functional form. She
has analyzed substitution among electricity, oil, and solid fuels inputs at the
second stage of the model.
Nakamura (1984) has constructed three stage models for twelve industries of
the Federal Republic of Germany, using translog price functions. The first stage
encompasses substitution and technical change among capital, labor, energy,
materials, and services inputs. The second stage consists of three models - a model
for substitution among individual types of energy, a model for substitution among
individual types of materials, and a model for substitution among individual
types of services. The third stage consists of models for substitution between
domestically produced input and the corresponding imported input of each type.
Pindyck (1979a, 1979b) has constructed a two stage model of total manufactur-
ing for ten industrialized countries- Canada, France, Italy, Japan, the Nether-
lands, Norway, Sweden, the U.K., the U.S., and West Germany-using a translog
price function. He employs annual data for the period 1959-1973 in estimating a
model for substitution among four energy inputs -coal, oil, natural gas, and
electricity. He uses annual data for the period 1963-73 in estimating a model for
substitution among capital, labor, and energy inputs. Magnus (1979) and Magnus
and Woodland (1984) have constructed a two stage model for total manufacturing
in the Netherlands along the same lines. Similarly, Ehud and Melnik (1981) have
developed a two stage model for the Israeli economy.
Halvorsen (1977) and Halvorsen and Ford (1979) have constructed a two stage
model for substitution among capital, labor, and energy inputs for nineteen
two-digit U.S. manufacturing industries on the basis of translog price functions.
For this purpose they employ cross section data for individual states in 1971. The
second stage of the model provides a disaggregation of energy input among inputs
of coal, oil, natural gas, and electricity. Halvorsen (1978) has analyzed substitu-
tion among different types of energy on the basis of cross section data for 1958,
1962, and 1971.

5. Cost functions

In Section 2 we have considered producer behavior under constant returns to scale. The production function (2.1) is homogeneous of degree one, so that a
proportional change in all inputs results in a change in output in the same
proportion. Necessary conditions for producer equilibrium (2.2) are that the value
share of each input is equal to the elasticity of output with respect to that input.
Under constant returns to scale the value shares and the elasticities sum to unity.
In this Section we consider producer behavior under increasing returns to scale.
Under increasing returns and competitive markets for output and all inputs,
producer equilibrium is not defined by profit maximization, since no maximum of
profit exists. However, in regulated industries the price of output is set by
regulatory authority. Given demand for output as a function of the regulated
price, the level of output is exogenous to the producing unit.
With output fixed from the point of view of the producer, necessary conditions
for equilibrium can be derived from cost minimization. Where total cost is
defined as the sum of expenditures on all inputs, the minimum value of cost can
be expressed as a function of the level of output and the prices of all inputs. We
refer to this function as the cost function. We have described the theory of
production under constant returns to scale in terms of properties of the price
function (2.4); similarly, we can describe the theory under increasing returns in
terms of properties of the cost function.

5.1. Duality

Utilizing the notation of Section 2, we can define total cost, say c, as the sum of
expenditures on all inputs:

$$c = \sum_{j=1}^{J} p_j x_j.$$

We next define the shares of inputs in total cost by:

$$u_j = \frac{p_j x_j}{c}, \qquad (j = 1,2 \ldots J).$$

With output fixed from the point of view of the producing unit and competitive
markets for all inputs, the necessary conditions for producer equilibrium are given
by equalities between the shares of each input in total cost and the ratio of the
elasticity of output with respect to that input and the sum of all such elasticities:

$$u = \frac{\partial \ln y}{\partial \ln x} \Bigg/ \left( i' \frac{\partial \ln y}{\partial \ln x} \right), \qquad (5.1)$$

where i is a vector of ones and:

$u = (u_1, u_2 \ldots u_J)$ - vector of cost shares.

Given the definition of total cost and the necessary conditions for producer
equilibrium, we can express total cost, say c, as a function of the prices of all
inputs and the level of output:

$$c = C(p, y). \qquad (5.2)$$
We refer to this as the cost function. The cost function C is dual to the production
function F and provides an alternative and equivalent description of the technol-
ogy of the producing unit.48

48 Duality between cost and production functions is due to Shephard (1953, 1970).
We can formalize the theory of production in terms of the following properties
of the cost function:
1. Positivity. The cost function is positive for positive input prices and a
positive level of output.
2. Homogeneity. The cost function is homogeneous of degree one in the input
prices.
3. Monotonicity. The cost function is increasing in the input prices and in the
level of output.
4. Concavity. The cost function is concave in the input prices.
Given differentiability of the cost function, we can express the cost shares of all
inputs as elasticities of the cost function with respect to the input prices:

$$u = \frac{\partial \ln C}{\partial \ln p}(p, y). \qquad (5.3)$$

Further, we can define an index of returns to scale as the elasticity of the cost
function with respect to the level of output:

$$u_y = \frac{\partial \ln C}{\partial \ln y}(p, y). \qquad (5.4)$$

Following Frisch (1965), we can refer to this elasticity as the cost flexibility.
The cost flexibility $u_y$ is the reciprocal of the degree of returns to scale, defined as the elasticity of output with respect to a proportional increase in all inputs:

$$u_y = \frac{1}{i' \dfrac{\partial \ln y}{\partial \ln x}}. \qquad (5.5)$$

If output increases more than in proportion to the increase in inputs, cost increases less than in proportion to the increase in output.
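As a numerical illustration (our arithmetic, not the chapter's): if a proportional increase in all inputs raises output with elasticity $i'(\partial \ln y / \partial \ln x) = 1.25$, then by (5.5) the cost flexibility is $u_y = 1/1.25 = 0.8$, so a ten percent increase in output raises total cost by roughly eight percent.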
Since the cost function C is homogeneous of degree one in the input prices, the
cost shares and the cost flexibility are homogeneous of degree zero and the cost
shares sum to unity:

i’u=i’alnc=l,
ifI In p

Since the cost function is increasing in the input prices, the cost shares must be
nonnegative and not all zero:

$$u \geq 0.$$

The cost function is also increasing in the level of output, so that the cost
flexibility is positive:

$$u_y > 0.$$

5.2. Substitution and economies of scale

We have represented the cost shares of all inputs and the cost flexibility as
functions of the input prices and the level of output. We can characterize these
functions in terms of measures of substitution and economies of scale. We obtain
share elasticities by differentiating the logarithm of the cost function twice with
respect to the logarithms of input prices:

$$U_{pp} = \frac{\partial^2 \ln C}{\partial \ln p \, \partial \ln p'}(p, y). \qquad (5.6)$$

These measures of substitution give the response of the cost shares of all inputs to
proportional changes in the input prices.
Second, we can differentiate the logarithm of the cost function twice with
respect to the logarithms of the input prices and the level of output to obtain

measures of economies of scale:

$$u_{py} = \frac{\partial^2 \ln C}{\partial \ln p \, \partial \ln y}(p, y). \qquad (5.7)$$

We refer to these measures as biases of scale. The vector of biases of scale $u_{py}$ can
be employed to derive the implications of economies of scale for the relative
distribution of total cost among inputs. If a scale bias is positive, the cost share of
the corresponding input increases with a change in the level of output. If a scale
bias is negative, the cost share decreases with a change in output. Finally, if a
scale bias is zero, the cost share is independent of output.
Alternatively, the vector of biases of scale $u_{py}$ can be employed to derive the
implications of changes in input prices for the cost flexibility. If the scale bias is
positive, the cost flexibility increases with the input price. If the scale bias
is negative, the cost flexibility decreases with the input price. Finally, if the bias is
zero, the cost flexibility is independent of the input price.
To complete the description of economies of scale we can differentiate the
logarithm of the cost function twice with respect to the level of output:

$$u_{yy} = \frac{\partial^2 \ln C}{(\partial \ln y)^2}(p, y). \qquad (5.8)$$

If this measure is positive, negative, or zero, the cost flexibility is increasing, decreasing, or independent of the level of output.
The matrix of second-order logarithmic derivatives of the logarithm of the cost function C must be symmetric. This matrix includes the matrix of share elasticities $U_{pp}$, the vector of biases of scale $u_{py}$, and the derivative of the cost flexibility with respect to the logarithm of output $u_{yy}$. Concavity of the cost function in the input prices implies that the matrix of second-order derivatives, say H, is nonpositive definite, so that the matrix $U_{pp} + uu' - V$ is nonpositive definite, where:

$$\frac{1}{c} N H N = U_{pp} + uu' - V.$$

Total cost c is positive and the diagonal matrices N and V are defined in terms of the input prices p and the cost shares u, as in Section 2. Two inputs are substitutes if the corresponding element of the matrix $U_{pp} + uu' - V$ is positive, complements if the element is negative, and independent if the element is zero.
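The sign classification and the concavity check are mechanical enough to script. The following minimal Python sketch (illustrative numbers only, not estimates from any model in this chapter) forms the matrix $U_{pp} + uu' - V$, classifies each input pair, and verifies nonpositive definiteness from the eigenvalues:

```python
import numpy as np

# Hypothetical share elasticities (symmetric, zero row sums) and cost
# shares for three inputs; the values are purely illustrative.
U_pp = np.array([[-0.10,  0.06,  0.04],
                 [ 0.06, -0.09,  0.03],
                 [ 0.04,  0.03, -0.07]])
u = np.array([0.3, 0.5, 0.2])                # cost shares, sum to unity

delta = U_pp + np.outer(u, u) - np.diag(u)   # U_pp + uu' - V

names = ["capital", "labor", "energy"]
for i in range(len(u)):
    for j in range(i + 1, len(u)):
        kind = ("substitutes" if delta[i, j] > 0
                else "complements" if delta[i, j] < 0 else "independent")
        print(names[i], "and", names[j], "are", kind)

# Concavity of the cost function requires delta to be nonpositive definite.
print("nonpositive definite:", bool(np.all(np.linalg.eigvalsh(delta) <= 1e-12)))
```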
In Section 2.2 above we have introduced price and quantity indexes of
aggregate input implied by homothetic separability of the price function. We can
analyze the implications of homothetic separability of the cost function by

introducing price and quantity indexes of aggregate input and defining the cost
share of aggregate input in terms of these indexes. An aggregate input can be
treated in precisely the same way as any other input, so that price and quantity
indexes can be used to reduce the dimensionality of the space of input prices and
quantities.
We say that the cost function C is homothetic if and only if the cost function is separable in the prices of all J inputs $\{p_1, p_2 \ldots p_J\}$, so that:

$$c = C\bigl(P(p_1, p_2 \ldots p_J), y\bigr), \qquad (5.9)$$

where the function P is homogeneous of degree one and independent of the level of output y. The cost function is homothetic if and only if the production function is homothetic, where:

$$y = F\bigl(G(x_1, x_2 \ldots x_J)\bigr), \qquad (5.10)$$

where the function G is homogeneous of degree one.49


Since the cost function is homogeneous of degree one in the input prices, it is
homogeneous of degree one in the function P, which can be interpreted as the
price index for a single aggregate input; the function G is the corresponding
quantity index. Furthermore, the cost function can be represented as the product
of the price index of aggregate input P and a function, say H, of the level of
output:

$$c = P(p_1, p_2 \ldots p_J) \cdot H(y). \qquad (5.11)$$

Under homotheticity, the cost flexibility $u_y$ is independent of the input prices:

$$u_y = u_y(y). \qquad (5.12)$$

If the cost flexibility is also independent of the level of output, the cost function is
homogeneous in the level of output and the production function is homogeneous
in the quantity index of aggregate input G. The degree of homogeneity of the
production function is the degree of returns to scale and is equal to the reciprocal
of the cost flexibility. Under constant returns to scale the degree of returns to
scale and the cost flexibility are equal to unity.

49The concept of homotheticity was introduced by Shephard (1953). Shephard shows that ho-
motheticity of the cost function is equivalent to homotheticity of the production function.

5.3. Parametrization and integrability

In Section 2.3 we have generated an econometric model of producer behavior by treating the measures of substitution and technical change as unknown parameters to be estimated. In this Section we generate an econometric model of cost and production by introducing the parameters:

$$B_{pp} = U_{pp}, \qquad \beta_{py} = u_{py}, \qquad \beta_{yy} = u_{yy}, \qquad (5.13)$$

where $B_{pp}$ is a matrix of constant share elasticities, $\beta_{py}$ is a vector of constant biases of scale, and $\beta_{yy}$ is a constant derivative of the cost flexibility with respect to the logarithm of output.
We can regard the matrix of share elasticities, the vector of biases of scale, and the derivative of the cost flexibility with respect to the logarithm of output as a system of second-order partial differential equations. We can integrate this system to obtain a system of first-order partial differential equations:

$$u = \alpha_p + B_{pp} \ln p + \beta_{py} \ln y,$$
$$u_y = \alpha_y + \beta_{py}' \ln p + \beta_{yy} \ln y, \qquad (5.14)$$

where the parameters $\alpha_p$ and $\alpha_y$ are constants of integration. Choosing scales for measuring the quantities and prices of output and the inputs, we can consider values of input prices and level of output equal to unity. At these values the vector of parameters $\alpha_p$ is equal to the vector of cost shares and the parameter $\alpha_y$ is equal to the cost flexibility.
We can integrate the system of first-order partial differential eqs. (5.14) to obtain the cost function:

$$\ln c = \alpha_0 + \alpha_p' \ln p + \alpha_y \ln y + \tfrac{1}{2} \ln p' B_{pp} \ln p + \ln p' \beta_{py} \ln y + \tfrac{1}{2} \beta_{yy} (\ln y)^2, \qquad (5.15)$$

where the parameter $\alpha_0$ is a constant of integration. This parameter is equal to the logarithm of total cost where the input prices and level of output are equal to unity. We can refer to this form as the translog cost function, indicating the role of the variables, or the constant share elasticity (CSE) cost function, indicating the role of the parameters.
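To make the parametrization concrete, here is a minimal Python sketch (all parameter values hypothetical, chosen only to satisfy the integrability restrictions listed below) that evaluates the translog cost function (5.15) and recovers the implied cost shares $u = \alpha_p + B_{pp} \ln p + \beta_{py} \ln y$:

```python
import numpy as np

# Hypothetical translog parameters for three inputs: alpha_p sums to one,
# B_pp is symmetric with zero row sums, and beta_py sums to zero, as the
# cost exhaustion and homogeneity restrictions below require.
alpha_0, alpha_y, beta_yy = 1.0, 0.8, 0.05
alpha_p = np.array([0.3, 0.5, 0.2])
B_pp = np.array([[-0.10,  0.06,  0.04],
                 [ 0.06, -0.09,  0.03],
                 [ 0.04,  0.03, -0.07]])
beta_py = np.array([0.02, -0.03, 0.01])

def translog_cost(p, y):
    """Return ln c from (5.15) and the implied vector of cost shares."""
    lp, ly = np.log(p), np.log(y)
    ln_c = (alpha_0 + alpha_p @ lp + alpha_y * ly
            + 0.5 * lp @ B_pp @ lp + (beta_py @ lp) * ly
            + 0.5 * beta_yy * ly ** 2)
    shares = alpha_p + B_pp @ lp + beta_py * ly   # u = d ln c / d ln p
    return ln_c, shares

ln_c, shares = translog_cost(np.array([1.2, 0.9, 1.1]), 2.0)
print(ln_c, shares, shares.sum())   # shares sum to unity by construction
```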
To incorporate the implications of the economic theory of production we
consider restrictions on the system of eqs. (5.14) required to obtain a cost
function with properties listed above. A complete set of conditions for integrabil-

ity is the following:

5.3.1. Homogeneity

The cost shares and the cost flexibility are homogeneous of degree zero in the
input prices.
Homogeneity of degree zero of the cost shares and the cost flexibility implies that the parameters $B_{pp}$ and $\beta_{py}$ must satisfy the restrictions:

$$B_{pp} i = 0,$$
$$\beta_{py}' i = 0. \qquad (5.16)$$

where i is a vector of ones. For J inputs there are J + 1 restrictions implied by homogeneity.

5.3.2. Cost exhaustion

The sum of the cost shares is equal to unity.


Cost exhaustion implies that the value of the J inputs is equal to total cost. Cost exhaustion implies that the parameters $\alpha_p$, $B_{pp}$, $\beta_{py}$ must satisfy the restrictions:

$$\alpha_p' i = 1,$$
$$B_{pp}' i = 0, \qquad (5.17)$$
$$\beta_{py}' i = 0.$$

For J inputs there are J + 2 restrictions implied by cost exhaustion.

5.3.3. Symmetry

The matrix of share elasticities, biases of scale, and the derivative of the cost flexibility with respect to the logarithm of output must be symmetric.
A necessary and sufficient condition for symmetry is that the matrix of parameters must satisfy the restrictions:

$$\begin{bmatrix} B_{pp} & \beta_{py} \\ \beta_{py}' & \beta_{yy} \end{bmatrix} = \begin{bmatrix} B_{pp} & \beta_{py} \\ \beta_{py}' & \beta_{yy} \end{bmatrix}'. \qquad (5.18)$$

For J inputs the total number of symmetry restrictions is $\tfrac{1}{2}J(J + 1)$.
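As a worked count (our arithmetic, applying the formulas just stated): for J = 4 inputs there are $\tfrac{1}{2} \cdot 4 \cdot 5 = 10$ symmetry restrictions, $J + 1 = 5$ homogeneity restrictions, and $J + 2 = 6$ cost exhaustion restrictions.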



5.3.4. Nonnegativity

The cost shares and the cost flexibility must be nonnegative.


Since the translog cost function is quadratic in the logarithms of the input
prices and the level of output, we cannot impose restrictions on the parameters
that imply nonnegativity of the cost shares and the cost flexibility. Instead, we
consider restrictions on the parameters that imply monotonicity of the cost shares
wherever they are nonnegative.

5.3.5. Monotonicity

The matrix of share elasticities $B_{pp} + uu' - V$ is nonpositive definite.


The conditions on the parameters assuring concavity of the cost function
wherever the cost shares are nonnegative are precisely analogous to the conditions
given in Section 2.4 for concavity of the price function wherever the value shares
are nonnegative. These conditions can be expressed in terms of the Cholesky
factorization of the matrix of constant share elasticities $B_{pp}$.
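A minimal Python sketch of this kind of check (using scipy's LDL factorization as a stand-in for the chapter's Cholesky formulation; the matrix is hypothetical): $B_{pp}$ is nonpositive definite when the diagonal factor D in $B_{pp} = LDL'$ has no positive elements.

```python
import numpy as np
from scipy.linalg import ldl

# Hypothetical constant share elasticities: symmetric, zero row sums.
B_pp = np.array([[-0.10,  0.06,  0.04],
                 [ 0.06, -0.09,  0.03],
                 [ 0.04,  0.03, -0.07]])

# Factor B_pp = L D L' with unit lower-triangular L.  For a (semi)definite
# matrix D comes out diagonal, and nonpositive definiteness is equivalent
# to every diagonal element of D being nonpositive.
L, D, perm = ldl(B_pp)
print(np.diag(D))
print("nonpositive definite:", bool(np.all(np.diag(D) <= 1e-12)))
```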

5.4. Stochastic specification

To formulate an econometric model of cost and production we add a stochastic component to the equations for the cost shares and the cost function itself. To represent the econometric model we require some additional notation. Where there are K producing units we index the observations by producing unit (k = 1,2...K). The vector of cost shares for the kth unit is denoted $u_k$ and total cost of the unit is $c_k$ (k = 1,2...K). The vector of input prices faced by the kth unit is denoted $p_k$ and the vector of logarithms of input prices is $\ln p_k$ (k = 1,2...K). Finally, the level of output of the kth unit is denoted $y_k$ (k = 1,2...K).
We obtain an econometric model of cost and production corresponding to the translog cost function by adding random disturbances to the equations for the cost shares and the cost function:

$$u_k = \alpha_p + B_{pp} \ln p_k + \beta_{py} \ln y_k + \varepsilon_k, \qquad (5.19)$$

$$\ln c_k = \alpha_0 + \alpha_p' \ln p_k + \alpha_y \ln y_k + \tfrac{1}{2} \ln p_k' B_{pp} \ln p_k + \ln p_k' \beta_{py} \ln y_k + \tfrac{1}{2} \beta_{yy} (\ln y_k)^2 + \varepsilon_k^c, \qquad (k = 1,2 \ldots K),$$

where $\varepsilon_k$ is the vector of unobservable random disturbances for the cost shares of the kth producing unit and $\varepsilon_k^c$ is the corresponding disturbance for the cost function (k = 1,2...K). Since the cost shares for all inputs sum to unity for each

producing unit, the random disturbances corresponding to the J cost shares sum
to zero for each unit:

i’ek = 0 (k =1,2...K), (5.20)

so that these disturbances are not distributed independently.


We assume that the unobservable random disturbances for all J + 1 equations
have expected value equal to zero for all observations:

$$E \begin{bmatrix} \varepsilon_k \\ \varepsilon_k^c \end{bmatrix} = 0, \qquad (k = 1,2 \ldots K). \qquad (5.21)$$

We also assume that the disturbances have a covariance matrix that is the same
for all producing units and has rank J, where:

$$V \begin{bmatrix} \varepsilon_k \\ \varepsilon_k^c \end{bmatrix} = \Sigma, \qquad (k = 1,2 \ldots K).$$

Finally, we assume that random disturbances corresponding to distinct observations are uncorrelated, so that the covariance matrix of random disturbances for all observations has the Kronecker product form:

$$V \begin{bmatrix} \varepsilon_1 \\ \varepsilon_1^c \\ \varepsilon_2 \\ \varepsilon_2^c \\ \vdots \end{bmatrix} = \Sigma \otimes I. \qquad (5.22)$$
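A small simulation sketch (synthetic disturbances, not the chapter's data) makes the rank statement and the Kronecker structure in (5.22) concrete: because the share disturbances sum to zero, Σ is singular with rank J rather than J + 1.

```python
import numpy as np

rng = np.random.default_rng(0)
J, K = 3, 5                         # three cost shares, five producing units

# Share disturbances built to sum to zero across equations, plus an
# independent disturbance for the cost function: J + 1 equations in all.
e_free = rng.normal(size=(10_000, J - 1))
e_shares = np.column_stack([e_free, -e_free.sum(axis=1)])
e_cost = rng.normal(size=(10_000, 1))
eps = np.column_stack([e_shares, e_cost])

Sigma = np.cov(eps, rowvar=False)
print(np.linalg.matrix_rank(Sigma))         # J = 3, not J + 1 = 4

# Covariance of the stacked disturbances for all K units, as in (5.22).
V = np.kron(Sigma, np.eye(K))
print(V.shape)                              # ((J + 1) * K, (J + 1) * K)
```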

We can test the validity of restrictions on economies of scale by expressing


them in terms of the parameters of an econometric model of cost and production.
Under homotheticity the cost flexibility is independent of the input prices. A
necessary and sufficient condition for homotheticity is given by:

$$\beta_{py} = 0; \qquad (5.23)$$

the vector of biases of scale is equal to zero. Under homogeneity the cost
flexibility is independent of output, so that:
$$\beta_{yy} = 0;$$

the derivative of the flexibility with respect to the logarithm of output is zero.
Finally, under constant returns to scale, the cost flexibility is equal to unity; given
the restrictions implied by homogeneity, constant returns requires:

$$\alpha_y = 1. \qquad (5.24)$$

6. Applications of cost functions

To illustrate the econometric modeling of economies of scale in Section 6.1, we present an econometric model that has been implemented for the electric power
industry in the United States by Christensen and Greene (1976). This model is
based on cost functions for cross sections of individual electric utilities in 1955
and 1970. Total cost of steam generation is a function of the level of output and
the prices of capital, labor, and fuel inputs. Steam generation accounts for more
than ninety percent of total power generation for each of the firms in the
Christensen-Greene sample.
A key feature of the electric power industry in the United States is that
individual firms are subject to price regulation. The regulatory authority sets the
price for electric power. Electric utilities are required to supply the electric power
that is demanded at the regulated price. This model must be carefully dis-
tinguished from the model of a regulated firm proposed by Averch and Johnson
(1962).50 In the Averch-Johnson model firms are subject to an upper limit on the
rate of return rather than price regulation. Firms minimize costs under rate of
return regulation only if the regulatory constraint is not binding.

50 A model of a regulated firm based on cost minimization was introduced by Nerlove (1963). Surveys of the literature on the Averch-Johnson model have been given by Bailey (1973) and Baumol and Klevorick (1970).
The literature on econometric modeling of scale economies in U.S. transporta-
tion and communications industries parallels the literature on the U.S. electric
power industry. Transportation and communications firms, like electric utilities,
are subject to price regulation and are required to supply all the services that are
demanded at the regulated price. However, the modeling of transportation and
communications services is complicated by joint production of several outputs.
We review econometric models with multiple outputs in Section 6.2.

6.1. Economies of scale

The Christensen-Greene model of the electric power industry consists of a system of equations giving the shares of all inputs in total cost and total cost itself as


functions of relative prices and the level of output. To formulate an econometric


model Christensen and Greene add a stochastic component to these equations.
They treat the prices and levels of output as exogenous variables and estimate the
unknown parameters by the method of maximum likelihood for nonlinear multi-
variate regression models.
The endogenous variables in the Christensen-Greene model are the cost shares
of capital, labor, and fuel inputs and total cost. Christensen and Greene estimate
three equations for each cross section, corresponding to two of the cost shares and
the cost function. As unknown parameters they estimate two elements of the vector $\alpha_p$, the two scalars $\alpha_0$ and $\alpha_y$, three elements of the matrix of share elasticities $B_{pp}$, two biases of scale in the vector $\beta_{py}$, and the scalar $\beta_{yy}$. They
estimate a total of ten unknown parameters for each of two cross sections of
electric utilities for the years 1955 and 1970.51 Estimates of the remaining
parameters of the model are calculated by using the cost exhaustion, homogene-
ity, and symmetry restrictions. They report that the monotonicity and concavity
restrictions are met at every observation in both cross section data sets.
The hypothesis of constant returns to scale can be tested by first considering
the hypothesis that the cost function is homothetic; under this hypothesis the cost
flexibility is independent of the input prices. Given homotheticity the additional
hypothesis that the cost function is homogeneous can be tested; under this
hypothesis the cost flexibility is independent of output as well as prices.
These hypotheses can be nested, so that the test of homogeneity is conditional on
the test of homotheticity. Likelihood ratio statistics for these hypotheses are
distributed, asymptotically, as chi-squared.
We present the results of Christensen and Greene for 1955 and 1970 in Table 2.
Test statistics for the hypotheses of homotheticity and homogeneity for both cross
section data sets and critical values for chi-squared are also presented in Table 2.
Homotheticity can be rejected, so that both homotheticity and homogeneity are
inconsistent with the evidence; homogeneity, given homotheticity, is also rejected.
If all other parameters involving the level of output were set equal to zero, the
parameter $\alpha_y$ would be the reciprocal of the degree of returns to scale. For both
1955 and 1970 data sets this parameter is significantly different from unity.
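The likelihood ratio comparisons reported in Table 2 are easy to verify mechanically; in the short Python check below, the degrees of freedom are inferred from the reported critical values, which match the one percent chi-squared points for two and three restrictions.

```python
from scipy.stats import chi2

# Test statistics as reported in Table 2 (Christensen and Greene, 1976).
tests = {("1955", "homotheticity"):  78.22,
         ("1955", "homogeneity"):   102.27,
         ("1970", "homotheticity"):  57.91,
         ("1970", "homogeneity"):   157.46}

# 1% critical values: 9.21 and 11.35 match chi-squared with 2 and 3 df.
crit = {"homotheticity": chi2.ppf(0.99, 2),   # ~9.21
        "homogeneity":   chi2.ppf(0.99, 3)}   # ~11.34

for (year, hypothesis), stat in tests.items():
    verdict = "rejected" if stat > crit[hypothesis] else "not rejected"
    print(year, hypothesis, verdict)
```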
Christensen and Greene employ the fitted cost functions presented in Table 2
to characterize scale economies for individual firms in each of the two cross
sections. For both years the cost functions are U-shaped with a minimum point
occurring at very large levels of output. In 1955 118 of the 124 firms have

significant economies of scale; only six firms have no significant economies or diseconomies of scale, but these firms produce 25.9 percent of the output of the sample. In 1970 ninety-seven of the 114 firms have significant economies of scale, sixteen have none, and one has significant scale diseconomies.

51 Christensen and Greene have assembled data on cross sections of individual firms for 1955 and 1970. The quantity of output is measured in billions of kilowatt hours (kwh). The quantity of fuel input is measured by British thermal units (Btu). Fuel prices per million Btu are averaged by weighting the price of each fuel by the corresponding share in total consumption. The price of labor input is measured as the ratio of total salaries and wages and employee pensions and benefits to the number of full-time employees plus half the number of part-time employees. The price of capital input is estimated as the sum of interest and depreciation.

Table 2
Cost function for U.S. electric power industry (parameter estimates, 1955 and 1970; t-ratios in parentheses).a

          1955                  1970

          8.412 (31.52)         7.14 (32.45)
          0.386 (6.22)          0.587 (20.87)
          0.094 (0.94)          0.208 (2.95)
          0.348 (4.21)         -0.151 (-1.85)
          0.558 (8.57)          0.943 (14.64)
          0.059 (5.76)          0.049 (12.94)
         -0.008 (-1.79)         0.003 (-1.23)
         -0.016 (-10.10)       -0.018 (-8.25)
          0.024 (5.14)          0.021 (6.64)
          0.175 (5.51)          0.118 (6.17)
          0.038 (2.03)          0.081 (5.00)
          0.176 (6.83)          0.178 (10.79)
         -0.018 (-1.01)        -0.011 (-0.749)
         -0.159 (-6.05)        -0.107 (-7.48)
         -0.020 (-2.08)        -0.070 (-6.30)

Test statistics for restrictions on economies of scale.b

Statistic             Homotheticity    Homogeneity
1955                  78.22            102.27
1970                  57.91            157.46
Critical value (1%)   9.21             11.35

a Source: Christensen and Greene (1976, Table 4, p. 666).
b Source: Christensen and Greene (1976, Table 5, p. 666).
Econometric modeling of economies of scale in the U.S. electric power industry
has generated a very extensive literature. The results through 1978 have been
surveyed by Cowing and Smith (1978). More recently, the Christensen-Greene

data base has been extended by Greene (1983) to incorporate cross sections of
individual electric utilities for 1955, 1960, 1965, 1970, and 1975. By including
both the logarithm of output and time as an index of technology in the translog
total cost function (5.15), Greene is able to characterize economies of scale and
technical change simultaneously.
Stevenson (1980) has employed a translog total cost function incorporating
output and time to analyze cross sections of electric utilities for 1964 and 1972.
Gollop and Roberts (1981) have used a similar approach to study annual data on
eleven electric utilities in the United States for the period 1958-1975. They use
the results to decompose the growth of total cost among economies of scale,
technical change, and growth in input prices. Griffin (1977b) has modeled
substitution among different types of fuel in steam electricity generation using
four cross sections of twenty OECD countries. Halvorsen (1978) has analyzed
substitution among different fuel types, using cross section data for the United
States in 1972.
Cowing, Reifschneider, and Stevenson (1983) have employed a translog total
cost function similar to that of Christensen and Greene to analyze data for
eighty-one electric utilities for the period 1964-1975. For this purpose they have
grouped the data into four cross sections, each consisting of three-year totals for
all firms. If disturbances in the equations for the cost shares (5.19) are associated
with errors in optimization, costs must increase relative to the minimum level
given by the cost function (5.15). Accordingly, Cowing, Reifschneider and Steven-
son employ a disturbance for the cost function that is constrained to be positive.52
An alternative to the Christensen-Greene model for electric utilities has been
developed by Fuss (1977b, 1978). In Fuss’s model the cost function is permitted
to differ ex ante, before a plant is constructed, and ex post, after the plant is in
place.53 Fuss employs a generalized Leontief cost function with four input prices - structures, equipment, fuel, and labor. He models substitution among inputs and economies of scale for seventy-nine steam generation plants for the period 1948-61.

52 Statistical methods for models of production with disturbances constrained to be positive or negative are discussed by Aigner, Amemiya and Poirier (1976) and Greene (1980).
53 A model of production with differences between ex ante and ex post substitution possibilities was introduced by Houthakker (1956). This model has been further developed by Johansen (1972) and Sato (1975) and has been discussed by Hildenbrand (1981) and Koopmans (1977). Recent applications are given by Forsund and Hjalmarsson (1979, 1983), and Forsund and Jansen (1983).
We have observed that a model of the behavior of a regulated firm based on
cost minimization must be carefully distinguished from the model originated by
Averch and Johnson (1962). In addition to allowing a given rate of return,
regulatory authorities may permit electric utilities to adjust the regulated price of
output for changes in the cost of specific inputs. In the electric power industry a


common form of adjustment is to permit utilities to change prices with changes in


fuel costs.
Peterson (1975) has employed a translog cost function for the electric utility
industry to test the Averch-Johnson hypothesis. For this purpose he introduces
three measures of the effectiveness of regulation into the cost function: a one-zero
dummy variable distinguishing between states with and without a regulatory
commission, a similar variable differentiating between alternative methods for
evaluation of public utility property for rate making purposes, and a variable
representing differences between the rate of return allowed by the regulatory
authority and the cost of capital. He analyzes annual observations on fifty-six
steam generating plants for the period 1966 to 1968.
Cowing (1978) has employed a quadratic parametric form to test the
Averch-Johnson hypothesis for regulated firms. He introduces both the cost of
capital and the rate of return allowed by the regulatory authority as determinants
of input demands. Cowing analyzes data on 114 steam generation plants con-
structed during each of three time periods - 1947-50, 1955-59, and 1960-65.
Gollop and Karlson (1978) have employed a translog cost function that incorpo-
rates a measure of the effectiveness of regulatory adjustments for changes in fuel
costs. This measure is the ratio of costs that may be recovered under the fuel cost
adjustment mechanism to all fuel costs. Gollop and Karlson analyze data for
cross sections of individual electric utilities for the years 1970, 1971, and 1972.
Atkinson and Halvorsen (1980) have employed a translog parametric form to
test the effects of both rate of return regulation and fuel cost adjustment
mechanisms. For this purpose they have analyzed cross section data for electric
utilities in 1973. Gollop and Roberts (1983) have studied the effectiveness of
regulations on sulfur dioxide emissions in the electric utility industry. They
employ a translog cost function that depends on a measure of regulatory
effectiveness. This measure is based on the legally mandated reduction in emis-
sions and on the enforcement of emission standards. Gollop and Roberts analyze
cross sections of fifty-six electric utilities for each of the years 1973-1979 and
employ the results to study the impact of environmental regulation on productiv-
ity growth.

6.2. Multiple outputs

Brown, Caves, and Christensen (1979) have introduced a model for joint produc-
tion of freight and passenger transportation services in the railroad industry based on the
translog cost function (5.15).54 A cost flexibility (5.4) can be defined for each
output. Scale biases and derivatives of the cost flexibilities with respect to each
output can be taken to be constant parameters. The resulting cost function
depends on logarithms of input prices and logarithms of the quantities of each
output. Caves, Christensen, and Tretheway (1980) have extended this approach
by introducing Box-Cox transformations of the quantities of the outputs in place
of logarithmic transformations. This generalized translog cost function permits
complete specialization in the production of a single output.

54A review of the literature on regulation with joint production is given by Bailey and Friedlaender
(1982).

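Concretely, in the generalized translog form each output quantity \(y_m\) enters through a Box-Cox transformation rather than a logarithm,

\[
y_m^{(\lambda)} = \frac{y_m^{\lambda} - 1}{\lambda},
\]

which approaches \(\ln y_m\) as \(\lambda \to 0\) but, unlike the logarithm, remains finite at \(y_m = 0\); observations on firms producing none of some output can therefore be retained.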
The generalized translog cost function has been applied to cross sections of
Class I railroads in the United States for 1955, 1963, and 1974 by Caves,
Christensen, and Swanson (1980). They consider five categories of inputs: labor,
way and structures, equipment, fuel, and materials. For freight transportation
services they take ton-miles and average length of freight haul as measures of
output. Passenger services are measured by passenger-miles and average length of
passenger trip. They employ the results to measure productivity growth in the
U.S. railroad industry for the period 1951-74. Caves, Christensen, and Swanson
(1981) have employed data for cross sections of Class I railroads in the United
States to fit a variable cost function, treating way and structures as a fixed input
and combining equipment and materials into a single variable input. They have
employed the results in measuring productivity growth for the period 1951-74.
Friedlaender and Spady (1981) and Harmatuck (1979) have utilized a translog
total cost function to analyze cross section data on Class I railroads in the United
States. Jara-Diaz and Winston (1981) have employed a quadratic cost function to
analyze data on Class III railroads, with measures of output disaggregated to the
level of individual point-to-point shipments. Braeutigam, Daughety, and Turnquist
(1982) have used a translog variable cost function to analyze monthly data for
nine years for a single railroad. Speed of shipment and quality of service are
included in the cost function as measures of the characteristics of output.
The U.S. trucking industry, like the U.S. railroad industry, is subject to price
regulation. Spady and Friedlaender (1978) have employed a translog cost function
to analyze data on a cross section of 168 trucking firms in 1972. They have
disaggregated inputs into four categories-labor, fuel, capital, and purchased
transportation. Freight transportation services are measured in ton-miles. To take
into account the heterogeneity of freight transportation services, five additional
characteristics of output are included in the cost function - average shipment size,
average length of haul, percentage of less than truckload traffic, insurance costs,
and average load per truck.
Friedlaender, Spady, and Chiang (1981) have employed the approach of Spady
and Friedlaender (1978) to analyze cross sections of 154, 161, and 47 trucking
firms in 1972. Inputs are disaggregated in the same four categories, while an
additional characteristic of output is included, namely, terminal density, defined
as ton-miles per terminal. Separate models are estimated for each of the three
samples. Friedlaender and Spady (1981) have employed the results in analyzing
the impact of changes in regulatory policy. Harmatuck (1981) has employed a
translog cost function to analyze a cross section of 100 trucking firms in 1977. He
has included data on the number and size of truck load and less-than-truckload
shipments and average length of haul as measures of output. He disaggregates
input among five activities-line haul, pickup and delivery, billing and collecting,
platform handling, and all other.
Finally, Chiang and Friedlaender (1985) have disaggregated the output of
trucking firms into four categories-less than truckload hauls of under 250 miles,
between 250-500 miles, and over 500 miles, and truck load traffic-all measured
in ton miles. Inputs are disaggregated among five categories-labor, fuel, revenue
equipment, “other” capital, and purchased transportation. Characteristics of
output similar to those included in earlier studies by Chiang, Friedlaender, and
Spady are incorporated into the cost function, together with measures of the
network configuration of each firm. They have employed this model to analyze a
cross section of 105 trucking firms for 1976.
The U.S. air transportation industry, like the U.S. railroad and trucking
industries, is subject to price regulation. Caves, Christensen, and Tretheway
(1984) have employed a translog cost function to analyze a panel data set for all
U.S. trunk and local service airlines for the period 1970-81. Winston (1985) has
provided a survey of econometric models of producer behavior in the transporta-
tion industries, including railroads, trucking, and airlines.
In the United States the communications industries, like the transportation
industries, are largely privately owned but subject to price regulation. Nadiri and
Schankerman (1981) have employed a translog cost function to analyze time
series data for 1947-76 on the U.S. Bell System. They include the operating
telephone companies and Long Lines, but exclude the manufacturing activities of
Western Electric and the research and development activities of Bell Laboratories.
Output is an aggregate of four service categories; inputs of capital, labor, and
materials are distinguished. A time trend is included in the cost function as an
index of technology; the stock of research and development is included as a
separate measure of the level of technology.
Christensen, Cummings, and Schoech (1983) have employed alternative specifi-
cations of the translog cost functions to analyze time series data for the U.S. Bell
System for 1947-1977. They employ a distributed lag of research and develop-
ment expenditures by the Bell System to represent the level of technology. As
alternative representations they consider the proportion of telephones with access
to direct distance dialing, the percentage of telephones connected to central offices
with modern switching facilities, and a more comprehensive measure of research
and development. They also consider specifications with capital input held fixed
and with experienced labor and management held fixed. Evans and Heckman
(1983, 1984) have provided an alternative analysis of the same data set. They have
studied economies of scope in the joint production of telecommunications services.
Bell Canada is the largest telecommunications firm in Canada. Fuss and
Waverman (1981) have employed a translog cost function to analyze time series
data on Bell Canada for the period 1952-1975. Three outputs are distinguished:
message toll service, other toll service, and local and miscellaneous service.
Capital, labor, and materials are treated as separate categories of input. The level
of technology is represented by a time trend. Denny, Fuss, Everson, and
Waverman (1981) have analyzed time series data for the period 1952-1976. The
percentage of telephones with access to direct dialing and the percentage of
telephones connected to central offices with modern switching facilities are
incorporated into the cost function as measures of the level of technology. Kiss,
Karabadjian, and Lefebvre (1983) have compared alternative specifications of
output and the level of technology. Fuss (1983) has provided a survey of
econometric modeling of telecommunications services.

7. Conclusion

The purpose of this concluding section is to suggest possible directions for future
research on econometric modeling of producer behavior. We first discuss the
application of econometric models of production in general equilibrium analysis.
The primary focus of empirical research has been on the characterization of
technology for individual producing units. Application of the results typically
involves models for both demand and supply for each commodity. The ultimate
objective of econometric modeling of production is to construct general equi-
librium models encompassing demands and supplies for a wide range of products
and factors of production.
A second direction for future research on producer behavior is to exploit
statistical techniques appropriate for panel data. Panel data sets consist of
observations on several producing units at many points of time. Empirical
research on patterns of substitution and technical change has been based on time
series observations on a single producing unit or on cross section observations on
different units at a given point of time. Research on economies of scale has been
based primarily on cross section observations.
Our exposition of econometric methods has emphasized areas of research where
the methodology has crystallized. An important area for future research is the
implementation of dynamic models of technology. These models are based on
substitution possibilities among outputs and inputs at different points of time. A
number of promising avenues for investigation have been suggested in the
literature on the theory of production. We conclude the paper with a brief review
of possible approaches to the dynamic modeling of producer behavior.
7.1. General equilibrium modeling

At the outset of our discussion it is essential to recognize that the predominant
tradition in general equilibrium modeling does not employ econometric methods.
This tradition originated with the seminal work of Leontief (1951) beginning with
the implementation of the static input-output model. Leontief (1953) gave a
further impetus to the development of general equilibrium modeling by introduc-
ing a dynamic input-output model. Empirical work associated with input-output
analysis is based on estimating the unknown parameters of a general equilibrium
model from a single interindustry transactions table.
The usefulness of the “fixed coefficients” assumption that underlies input-out-
put analysis is hardly subject to dispute. By linearizing technology it is possible to
solve at one stroke the two fundamental problems that arise in the practical
implementation of general equilibrium models. First, the resulting general equi-
librium model can be solved as a system of linear equations with constant
coefficients. Second, the unknown parameters describing technology can be
estimated from a single data point.
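The point can be made with a two-sector illustration (the numbers are invented): the coefficient matrix A is read off a single transactions table, and equilibrium gross outputs then solve a linear system.

```python
# Two-sector illustration of the fixed-coefficients logic; numbers are invented.
import numpy as np

A = np.array([[0.2, 0.3],    # a_ij: input of good i per unit of output of sector j,
              [0.1, 0.4]])   # obtainable from one interindustry transactions table
d = np.array([100.0, 50.0])  # final demand

x = np.linalg.solve(np.eye(2) - A, d)  # gross outputs solving x = A x + d
print(x)
```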
The first successful implementation of a general equilibrium model without the
fixed coefficients assumption of input-output analysis is due to Johansen (1974).
Johansen retained the fixed coefficients assumption in modeling demands for
intermediate goods, but employed linear logarithmic or Cobb-Douglas produc-
tion functions in modeling the substitution between capital and labor services and
technical change. Linear logarithmic production functions imply that relative
shares of inputs in the value of output are fixed, so that the unknown parameters
characterizing substitution between capital and labor inputs can be estimated
from a single data point.
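For example, in a stylized two-factor version of this specification,

\[
\ln y = \ln A + \alpha \ln K + (1 - \alpha) \ln L,
\]

cost minimization implies that \(\alpha\) equals the value share of capital, \(\alpha = p_K K / (p_K K + p_L L)\), so that a single observation on the shares determines the substitution parameters.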
In modeling producer behavior Johansen employed econometric methods only
in estimating constant rates of technical change. The essential features of
Johansen’s approach have been preserved in the general equilibrium models
surveyed by Fullerton, Henderson, and Shoven (1984). The unknown parameters
describing technology in these models are determined by “calibration” to a single
data point. Data from a single interindustry transactions table are supplemented
by a small number of parameters estimated econometrically. The obvious disad-
vantage of this approach is that arbitrary constraints on patterns of production
are required in order to make calibration possible.
An alternative approach to modeling producer behavior for general equilibrium
models is through complete systems of demand functions for inputs in each
industrial sector. Each system gives quantities demanded as functions of prices
and output. This approach to general equilibrium modeling of producer behavior
was originated by Berndt and Jorgenson (1973). As in the descriptions of
technology by Leontief and Johansen, production is characterized by constant
returns to scale in each sector. As a consequence, commodity prices can be
expressed as functions of factor prices, using the nonsubstitution theorem of
Samuelson (1951). This greatly facilitates the solution of the econometric general
equilibrium model constructed by Hudson and Jorgenson (1974) by permitting a
substantial reduction in dimensionality of the space of prices to be determined by
the model.
The implementation of econometric models of producer behavior for general
equilibrium analysis is very demanding in terms of data requirements. First, these
models require the construction of a consistent time series of interindustry
transactions tables. By comparison, the noneconometric approaches of Leontief
and Johansen require only a single interindustry transactions table. Second, the
implementation of systems of input demand functions requires methods for
the estimation of parameters in systems of nonlinear simultaneous equations.
Finally, the restrictions implied by the economic theory of producer behavior
require estimation under both equality and inequality constraints.
Jorgenson and Fraumeni (1981) have constructed an econometric model of
producer behavior for thirty-five industrial sectors of the U.S. economy. The next
research objective is to disaggregate the demands for energy and materials by
constructing a hierarchy of models for allocation within the energy and materials
aggregates. A second research objective is to incorporate the production models
for all thirty-five industrial sectors into an econometric general equilibrium model
of production for the U.S. economy along the lines suggested by Jorgenson (1983,
1984a). A general equilibrium model will make it possible to analyze the implica-
tions of sectoral patterns of substitution and technical change for the behavior of
the U.S. economy as a whole.

7.2. Panel data

The approach to modeling economies of scale originated by Christensen and
Greene (1976) is based on the underlying assumption that individual producing
units at the same point of time have the same technology. Separate models of
production are fitted for each time period, implying that the same producing unit
has a different technology at different points of time. A more symmetrical
treatment of observations at different points of time is suggested by the model of
substitution and technical change in U.S. agriculture developed by Binswanger
(1974a, 1974b, 1978c). In this model technology is permitted to differ among time
periods and among producing units.
Caves, Christensen, and Tretheway (1984) have employed a translog cost
function to analyze a panel data set for all U.S. trunk and local service airlines for
the period 1970-81. Individual airlines are observed in some or all years during
the period. Differences in technology among years and among producing units are
incorporated through one-zero dummy variables that enter the cost function.
One set of dummy variables corresponds to the individual producing units. A
second set of dummy variables corresponds to the time periods.
Although airlines provide both freight and passenger service, the revenues for
passenger service greatly predominate in the total, so that output is defined as an
aggregate of five categories of transportation services. Inputs are broken down
into three categories-labor, fuel, and capital and materials. The number of points
served by an airline is included in the cost functions as a measure of the size of
the network. Average stage length and average load factor are included as
additional characteristics of output specific to the airline.
Caves, Christensen, and Tretheway introduce a distinction between economies
of scale and economies of density. Economies of scale are defined in terms of the
sum of the elasticities of total cost with respect to output and points served,
holding input prices and other characteristics of output constant. Economies of
density are defined in terms of the elasticity of total cost with respect to output,
holding points served, input prices, and other characteristics of output constant.
Caves, Christensen, and Tretheway find constant returns to scale and increasing
returns to density in airline service.
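In terms of cost elasticities these definitions can be summarized (writing P for points served) as

\[
\mathrm{RTD} = \left( \frac{\partial \ln C}{\partial \ln y} \right)^{-1}, \qquad
\mathrm{RTS} = \left( \frac{\partial \ln C}{\partial \ln y} + \frac{\partial \ln C}{\partial \ln P} \right)^{-1},
\]

with returns to density or scale increasing, constant, or decreasing as the corresponding expression exceeds, equals, or falls short of unity.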
The model of panel data employed by Caves, Christensen, and Tretheway in
analyzing air transportation service is based on “fixed effects”. The characteristics
of output specific to a producing unit can be estimated by employing one-zero
dummy variables for each producing unit. An alternative approach based on
“random effects” of output characteristics is utilized by Caves, Christensen,
Tretheway, and Windle (1984) in modeling rail transportation service. They
consider a panel data set for forty-three Class I railroads in the United States for
the period 1951-1975.
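As a schematic illustration of the fixed-effects device, firm and time dummies can be entered directly into a pooled regression; the sketch below uses simulated data and invented variable names, not the authors' specification.

```python
# Schematic fixed-effects regression with firm and year dummies on a simulated
# panel; variable names are invented for the illustration.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
firm = np.repeat(np.arange(10), 12)   # 10 firms observed for 12 years
year = np.tile(np.arange(12), 10)
lny = rng.normal(size=120)            # log output
lnp = rng.normal(size=120)            # log input price
lnc = 0.8 * lny + 0.3 * lnp + 0.1 * firm + 0.02 * year + rng.normal(0, 0.1, 120)
df = pd.DataFrame(dict(lnc=lnc, lny=lny, lnp=lnp, firm=firm, year=year))

fe = smf.ols("lnc ~ lny + lnp + C(firm) + C(year)", data=df).fit()
print(fe.params[["lny", "lnp"]])
```

Treating the firm effects instead as random variables leads to the Mundlak-type estimators discussed below.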
Caves, Christensen, Tretheway, and Windle employ a generalized translog cost
function in modeling the joint production of freight and passenger transportation
services by rail. They treat the effects of characteristics of output specific to each
railroad as a random variable. They estimate the resulting model by panel data
techniques originated by Mundlak (1963,1978). The number of route miles served
by a railroad is included in the cost function as a measure of the size of the
network. Length of haul for freight and length of trip for passengers are included
as additional characteristics of output.
Economies of density in the production of rail transportation services are
defined in terms of the elasticity of total cost with respect to output, holding route
miles, input prices, firm-specific effects, and other characteristics of output fixed.
Economies of scale are defined holding only input prices and other characteristics
of output fixed. The impact of changes in outputs, route miles, and firm specific
effects can be estimated by panel data techniques. Economies of density and scale
can be estimated from a single cross section by omitting firm-specific dummy
variables.
Panel data techniques require the construction of a consistent time series of
observations on individual producing units. By comparison, the cross section
methods developed by Christensen and Greene require only a cross section of
observations for a single time period. The next research objective in characterizing
economies of scale and economies of density is to develop panel data sets for
regulated industries-electricity generation, transportation, and communica-
tions-and to apply panel data techniques in the analysis of economies of scale
and economies of density.

7.3. Dynamic models of production

The simplest intertemporal model of production is based on capital as a factor of
production. A less restrictive model generates costs of adjustment from changes in
the level of capital input through investment. As the level of investment increases,
the amount of marketable output that can be produced from given levels of all
inputs is reduced. Marketable output and investment can be treated as outputs
that are jointly produced from capital and other inputs. Models of production
based on costs of adjustment have been analyzed, for example, by Lucas (1967)
and Uzawa (1969).
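A representative formulation (one of several in this literature) makes investment an argument of the technology:

\[
y = F(K, L, I), \qquad \frac{\partial F}{\partial I} < 0, \qquad \dot{K} = I - \delta K,
\]

so that higher investment reduces marketable output from given inputs; the negative derivative with respect to I is the internal cost of adjustment, and the producer maximizes the present value of profits subject to the capital accumulation constraint.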
Optimal production planning with costs of adjustment requires the use of
optimal control techniques. The optimal production plan at each point of time
depends on the initial level of capital input, so that capital is a “quasi-fixed”
input. Obviously, labor and other inputs can also be treated as quasi-fixed in
models of production based on costs of adjustment. The optimal production plan
at each point of time depends on the initial levels of all quasi-fixed inputs.
The optimal production plan with costs of adjustment depends on all future
prices of outputs and inputs of the production process. Unlike the prices of
outputs and inputs at each point of time employed in the production studies we
have reviewed, future prices cannot be observed on the basis of market transac-
tions. To simplify the incorporation of future prices into econometric models of
production, a possible approach is to treat these prices as if they were known with
certainty. A further simplification is to take all future prices to be equal to current
prices, so that expectations are “static”.
Dynamic models of production based on static expectations have been em-
ployed by Denny, Fuss, and Waverman (1981), Epstein and Denny (1983), and
Morrison and Berndt (1980). Denny, Fuss, and Waverman have constructed
models of substitution among capital, labor, energy, and materials inputs for
two-digit industries in Canada and the United States. Epstein and Denny have
analyzed substitution among these same inputs for total manufacturing in the
United States. Morrison and Berndt have utilized a similar data set with labor
input divided between blue collar and white collar labor. Berndt, Morrison, and
Watkins (1981) have surveyed dynamic models of production.
The obvious objection to dynamic models of production based on static
expectations is that current prices change from period to period, but expectations
are based on unchanging future prices. An alternative approach is to base the
dynamic optimization on forecasts of future prices. Since these forecasts are
subject to random errors, it is natural to require that the optimization process
take into account the uncertainty that accompanies forecasts of future prices. Two
alternative approaches to optimization under uncertainty have been proposed.
We first consider the approach to optimization under uncertainty based on
certainty equivalence. Provided that the objective function for producers is
quadratic and constraints are linear, optimization under uncertainty can be
replaced by a corresponding optimization problem under certainty. This gives rise
to linear demand functions for inputs with prices replaced by their certainty
equivalents. This approach has been developed in considerable detail by Hansen
and Sargent (1980, 1981) and has been employed in modeling producer behavior
by Epstein and Yatchew (1985), Meese (1980) and Sargent (1978).
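Schematically, the resulting input demands are linear in initial conditions and in conditional expectations of future prices, for example

\[
x_t = \Pi_0 + \Pi_1 x_{t-1} + \sum_{j=0}^{\infty} \Pi_{2j} \, E_t \, p_{t+j},
\]

so that only the first moments of the forecast distribution enter; this is the sense in which future prices are replaced by their certainty equivalents.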
An alternative approach to optimization under uncertainty is to employ the
information about expectations of future prices contained in current input levels.
This approach has the advantage that it is not limited to quadratic objective
functions and linear constraints. Pindyck and Rotemberg (1983a) have utilized
this approach in analyzing the Berndt-Wood (1975) data set for U.S. manufactur-
ing, treating capital and labor input as quasi-fixed. They employ a translog
variable cost function to represent technology, adding costs of adjustment that
are quadratic in the current and lagged values of the quasi-fixed inputs. Pindyck
and Rotemberg (1983b) have employed a similar approach to the analysis of
production with two kinds of capital input and two types of labor input.
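The device underlying this approach is standard in rational expectations econometrics: if optimization implies first-order conditions of the form \(E_t[h(z_{t+1}, \theta)] = 0\), replacing the conditional expectation by the realized value introduces a forecast error that is orthogonal to variables known at time t, so that the orthogonality conditions

\[
E[\, h(z_{t+1}, \theta) \, w_t \,] = 0
\]

for instruments \(w_t\) dated t can be used to estimate \(\theta\) by nonlinear instrumental variables.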

References

Afriat, S. (1972) “Efficiency Estimates of Production Functions”, International Economic Review,
October, 13(3), 568-598.
Aigner, D. J., T. Amemiya and D. J. Poirier (1976) “On the Estimation of Production Frontiers:
Maximum Likelihood Estimation of the Parameters of a Discontinuous Density Function”, Interna-
tional Economic Review, June, 17(2), 311-396.
Amemiya, T. (1974) “The Nonlinear Two-Stage Least Squares Estimator”, Journal of Econometrics,
July, 2(2), 105-110.
Amemiya, T. (1977) “The Maximum Likelihood Estimator and the Nonlinear Three-Stage Least
Squares Estimator in the General Nonlinear Simultaneous Equation Model”, Econometrica, May,
45(4), 955-968.
Amemiya, T. (1983) “Nonlinear Regression Models”, this Handbook, 1, 333-389.
Anderson, R. G. (1981) “On the Specification of Conditional Factor Demand Functions in Recent
Studies of U.S. Manufacturing”, in: E. R. Berndt and B. C. Field, eds., 119-144.
Appelbaum, E. (1978) “Testing Neoclassical Production Theory”, Journal of Econometrics, February,
7(1), 87-102.
Appelbaum, E. (1979a) “On the Choice of Functional Forms”, International Economic Review, June,
20(2), 449-458.
Appelbaum, E. (1979b) “Testing Price Taking Behavior”, Journal of Econometrics, February, 9(3),
283-294.
Arrow, K. J., H. B. Chenery, B. S. Minhas and R. M. Solow (1961) “Capital-Labor Substitution and
Economic Efficiency”, Review of Economics and Statistics, August, 43(3), 225-250.
Atkinson, S. E. and R. Halvorsen (1980) “A Test of Relative and Absolute Price Efficiency in
Regulated Utilities”, Review of Economics and Statistics, February, 62(1), 81-88.
Averch, H. and L. L. Johnson (1962) “Behavior of the Firm Under Regulatory Constraint”, American
Economic Review, December, 52(5), 1052-1069.
Bailey, E. E. (1973) Economic Theory of Regulatory Constraint. Lexington: Lexington Books.
Bailey, E. E. and A. F. Friedlaender (1982) “Market Structure and Multiproduct Industries”, Journal
of Economic Literature, September, 20(3), 1024-1048.
Baumol, W. J. and A. K. Klevorick (1970) “Input Choices and Rate-of-Return Regulation: An
Overview of the Discussion”, Bell Journal of Economics and Management Science, Autumn, 1(2),
162-190.
Belsley, D. A. (1974) “Estimation of Systems of Simultaneous Equations and Computational Applica-
tions of GREMLIN”, Annals of Social and Economic Measurement, October, 3(4), 551-614.
Belsley, D. A. (1979) “On The Computational Competitiveness of Full-Information Maximum-Likeli-
hood and Three-Stage Least-Squares in the Estimation of Nonlinear, Simultaneous-Equations
Models”, Journal of Econometrics, February, 9(3), 315-342.
Berndt, E. R. and L. R. Christensen (1973a) “The Internal Structure of Functional Relationships:
Separability, Substitution, and Aggregation”, Review of Economic Studies, July, 40(3), 123, 403-410.
Berndt, E. R. and L. R. Christensen (1973b) “The Translog Function and the Substitution of
Equipment, Structures, and Labor in U.S. Manufacturing, 1929-1968”, Journal of Econometrics,
March, 1(1), 81-114.
Berndt, E. R. and L. R. Christensen (1974) “Testing for the Existence of a Consistent Aggregate Index
of Labor Inputs”, American Economic Review, June, 64(3), 391-404.
Berndt, E. R. and B. C. Field, eds. (1981) Modeling and Measuring Natural Resource Substitution.
Cambridge: M.I.T. Press.
Berndt, E. R., B. H. Hall, R. E. Hall and J. A. Hausman (1974) “Estimation and Inference in
Nonlinear Structural Models”, Annals of Social and Economic Measurement, October, 3(4), 653-665.
Berndt, E. R. and D. W. Jorgenson (1973) “Production Structure”, in: D. W. Jorgenson and H. S.
Houthakker, eds., U.S. Energy Resources and Economic Growth. Washington: Energy Policy Project.
Berndt, E. R. and M. Khaled (1979) “Parametric Productivity Measurement and Choice Among
Flexible Functional Forms”, Journal of Political Economy, December, 87(6), 1220-1245.
Berndt, E. R. and C. J. Morrison (1979) “Income Redistribution and Employment Effects of Rising
Energy Prices”, Resources and Energy, October, 2(2), 131-150.
Berndt, E. R., C. J. Morrison and G. C. Watkins (1981) “Dynamic Models of Energy Demand: An
Assessment and Comparison”, in: E. R. Berndt and B. C. Field, eds., 259-289.
Berndt, E. R. and D. O. Wood (1975) “Technology, Prices, and the Derived Demand for Energy”,
Review of Economics and Statistics, August, 57(3), 376-384.
Berndt, E. R. and D. O. Wood (1979) “Engineering and Econometric Interpretations of Energy-Capital
Complementarity”, American Economic Review, June, 69(3), 342-354.
Berndt, E. R. and D. O. Wood (1981) “Engineering and Econometric Interpretations of Energy-Capital
Complementarity: Reply and Further Results”, American Economic Review, December, 71(5),
1105-1110.
Binswanger, H. P. (1974a) “A Cost-Function Approach to the Measurement of Elasticities of Factor
Demand and Elasticities of Substitution”, American Journal of Agricultural Economics, May, 56(2),
377-386.
Binswanger, H. P. (1974b) “The Measurement of Technical Change Biases with Many Factors of
Production”, American Economic Review, December, 64(5), 964-976.
Binswanger, H. P. (1978a) “Induced Technical Change: Evolution of Thought”, in: H. P. Binswanger
and V. W. Ruttan, eds., 13-43.
Binswanger, H. P. (1978b) “Issues in Modeling Induced Technical Change”, in: H. P. Binswanger and
V. W. Ruttan, eds., 128-163.
Binswanger, H. P. (1978c) “Measured Biases of Technical Change: The United States”, in: H. P.
Binswanger and V. W. Ruttan, eds., 215-242.
Binswanger, H. P. and V. W. Ruttan, eds. (1978) Induced Innovation. Baltimore: Johns Hopkins
University Press.
Blackorby, C., D. Primont and R. R. Russell (1977) “On Testing Separability Restrictions with
Flexible Functional Forms”, Journal of Econometrics, March, 5(2), 195-209.
Blackorby, C., D. Primont and R. R. Russell (1978) Duality, Separability, and Functional Structure.
Amsterdam: North-Holland.
Blackorby, C. and R. R. Russell (1976) “Functional Structure and the Allen Partial Elasticities of
Substitution: An Application of Duality Theory”, Review of Economic Studies, 43(2), 134, 285-292.
Braeutigam, R. R., A. F. Daughety and M. A. Turnquist (1982) “The Estimation of a Hybrid Cost
Function for a Railroad Firm”, Review of Economics and Statistics, August, 64(3), 394-404.
Brown, M., ed. (1967) The Theory and Empiricttl Analysis of Production. New York: Columbia
University Press.
Brown, R. S., D. W. Caves and L. R. Christensen (1979) “Modeling the Structure of Cost and
Production for Multiproduct Firms”, Southern Economic Journal, July, 46(3), 256-273.
Brown, R. S. and L. R. Christensen (1981) “Estimating Elasticities of Substitution in a Model of
Partial Static Equilibrium: An Application to U.S. Agriculture, 1947 to 1974”, in: E. R. Berndt and
B. C. Field, eds., 209-229.
Burgess, D. F. (1974) “A Cost Minimization Approach to Import Demand Equations”, Review of
Economics and Statistics, May, 56(2), 224-234.
Burgess, D. F. (1975) “Duality Theory and Pitfalls in the Specification of Technology”, Journal of
Econometrics, May, 3(2), 105-121.
Cameron, T. A. and S. L. Schwartz (1979) “Sectoral Energy Demand in Canadian Manufacturing
Industries”, Energy Economics, April, 1(2), 112-118.
Carlson, S. (1939) A Study on the Pure Theory of Production. London: King.
Caves, D. W. and L. R. Christensen (1980) “Global Properties of Flexible Functional Forms”,
American Economic Review, June, 70(3), 422-432.
Caves, D. W., L. R. Christensen and J. A. Swanson (1980) “Productivity in U.S. Railroads,
1951-1974”, Bell Journal of Economics, Spring, 11(1), 166-181.
Caves, D. W., L. R. Christensen and J. A. Swanson (1981) “Productivity Growth, Scale Economies
and Capacity Utilization in U.S. Railroads, 1955-1974”, American Economic Review, December,
71(5), 994-1002.
Caves, D. W., L. R. Christensen and M. W. Tretheway (1980) “Flexible Cost Functions for
Multiproduct Firms”, Review of Economics and Statistics, August, 62(3), 477-481.
Caves, D. W., L. R. Christensen and M. W. Tretheway (1984) “Economies of Density Versus
Economies of Scale: Why Trunk and Local Airline Costs Differ”, Rand Journal of Economics,
Winter, 15(4), 471-489.
Caves, D. W., L. R. Christensen, M. W. Tretheway and R. Windle (1984) “Network Effects and the
Measurement of Returns to Scale and Density for U.S. Railroads”, in: A. F. Daughety, ed.,
Analytical Studies in Transport Economics, forthcoming.
Chiang, S. J. W. and A. F. Friedlaender (1985) “Trucking Technology and Market Structure”, Review
of Economics and Statistics, May, 67(2), 250-258.
Christ, C., et al. (1963) Measurement in Economics. Stanford: Stanford University Press.
Christensen, L. R., D. Cummings and P. E. Schoech (1983) “Econometric Estimation of Scale
Economies in Telecommunications”, in: L. Courville, A. de Fontenay and R. Dobell, eds., 27-53.
Christensen, L. R. and W. H. Greene (1976) “Economies of Scale in U.S. Electric Power Generation”,
Journal of Political Economy, August, 84(4), 655-676.
Christensen, L. R. and D. W. Jorgenson (1970) “U.S. Real Product and Real Factor Input,
1929-1967”, Review of Income and Wealth, March, 16(1), 19-50.
Christensen, L. R., D. W. Jorgenson and L. J. Lau (1971) “Conjugate Duality and the Transcendental
Logarithmic Production Function”, Econometrica, July, 39(3), 255-256.
Christensen, L. R., D. W. Jorgenson and L. J. Lau (1973) “Transcendental Logarithmic Production
Frontiers”, Review of Economics and Statistics, February, 55(1), 28-45.
Cobb, C. W. and P. H. Douglas (1928) “A Theory of Production”, American Economic Review,
March, 18(2), 139-165.
Conrad, K. and D. W. Jorgenson (1977) “Tests of a Model of Production for the Federal Republic of
Germany, 1950-1973”, European Economic Review, October, 10(1), 51-75.
Conrad, K. and D. W. Jorgenson (1978) “The Structure of Technology: Nonjointness and Commodity
Augmentation, Federal Republic of Germany, 1950-1973”, Empirical Economics, 3(2), 91-113.
Courville, L., A. de Fontenay and R. Dobell, eds. (1983) Economic Analysis of Telecommunications.
Amsterdam: North-Holland.
Cowing, T. G. (1978) “The Effectiveness of Rate-of-Return Regulation: An Empirical Test Using
Profit Functions”, in: M. Fuss and D. McFadden, eds., 2, 215-246.
Cowing, T. G. and V. K. Smith (1978) “The Estimation of a Production Technology: A Survey of
Econometric Analyses of Steam Electric Generation”, Land Economics, May, 54(2), 158-168.
Cowing, T. G. and R. E. Stevenson, eds. (1981) Productivity Measurement in Regulated Industries. New
York: Academic Press.
Cowing, T. G., D. Reifschneider and R. E. Stevenson (1983) “A Comparison of Alternative Frontier Cost
Function Specifications”, in: A. Dogramaci, ed., 63-92.
Dargay, J. (1983) “The Demand for Energy in Swedish Manufacturing”, in: B.-C. Ysander, ed., Energy
in Swedish Manufacturing. Stockholm: Industrial Institute for Economic and Social Research,
57-128.
Denny, M. (1974) “The Relationship Between Functional Forms for the Production System”,
Canadian Journal of Economics, February, 7(1), 21-31.
Denny, M. and M. Fuss (1977) “The Use of Approximation Analysis to Test for Separability and the
Existence of Consistent Aggregates”, American Economic Review, June, 67(3), 404-418.
Denny, M., M. Fuss, C. Everson and L. Waverman (1981) “Estimating the Effects of Technological
Innovation in Telecommunications: The Production Structure of Bell Canada”, Canadian Journal of
Economics, February, 14(1), 24-43.
Denny, M., M. Fuss and L. Waverman (1981) “The Substitution Possibilities for Energy: Evidence
from U.S. and Canadian Manufacturing Industries”, in: E. R. Berndt and B. C. Field, eds.,
230-258.
Denny, M. and J. D. May (1978) “Homotheticity and Real Value-Added in Canadian Manufacturing”,
in: M. Fuss and D. McFadden, eds., 2, 53-70.
Denny, M., J. D. May and C. Pinto (1978) “The Demand for Energy in Canadian Manufacturing:
Prologue to an Energy Policy”, Canadian Journal of Economics, May, 11(2), 300-313.
Denny, M. and C. Pinto (1978) “An Aggregate Model with Multi-Product Technologies”, in: M. Fuss and
D. McFadden, eds., 2, 249-268.
Diewert, W. E. (1971) “An Application of the Shephard Duality Theorem, A Generalized Leontief
Production Function”, Journal of Political Economy, May/June, 79(3), 481-507.
Diewert, W. E. (1973) “Functional Forms for Profit and Transformation Functions”, Journal of
Economic Theory, June, 6(3), 284-316.
Diewert, W. E. (1974a) “Applications of Duality Theory”, in: M. D. Intriligator and D. A. Kendrick,
eds., 106-171.
Diewert, W. E. (1974b) “Functional Forms for Revenue and Factor Requirement Functions”,
International Economic Review, February, 15(1), 119-130.
Diewert, W. E. (1976) “Exact and Superlative Index Numbers”, Journal of Econometrics, May, 4(2),
115-145.
Diewert, W. E. (1980) “Aggregation Problems in the Measurement of Capital”, in: D. Usher, ed., The
Measurement of Capital. Chicago: University of Chicago Press, 433-528.
Diewert, W. E. (1982) “Duality Approaches to Microeconomic Theory”, in: K. J. Arrow and M. D.
Intriligator, eds., Handbook of Mathematical Economics, 2, 535-591.
Diewert, W. E. and C. Parkan (1983) “Linear Programming Tests of Regularity Conditions for
Production Functions”, in: W. Eichhorn, R. Henn, K. Neumann and R. W. Shephard, eds.,
131-158.
Dogramaci, A., ed. (1983) Developments in Econometric Analyses of Productivity. Boston:
Kluwer-Nijhoff.
Douglas, P. H. (1948) “Are There Laws of Production?”, American Economic Review, March, 38(1),
1-41.
Douglas, P. H. (1967) “Comments on the Cobb-Douglas Production Function”, in: M. Brown, ed.,
15-22.
Douglas, P. H. (1976) “The Cobb-Douglas Production Function Once Again: Its History, Its Testing,
and Some Empirical Values”, Journal of Political Economy, October, 84(5), 903-916.
Ehud, R. I. and A. Melnik (1981) “The Substitution of Capital, Labor and Energy in the Israeli
Economy”, Resources and Energy, November, 3(3), 247-258.
Eichhorn, W., R. Henn, K. Neumann and R. Wm. Shephard, eds. (1983) Quantitative Studies on
Production and Prices. Wurzburg: Physica-Verlag.
Elbadawi, I., A. R. Gallant and G. Souza (1983) “An Elasticity Can Be Estimated Consistently
Without a Priori Knowledge of Functional Form”, Econometrica, November, 51(6), 1731-1752.
Epstein, L. G. and A. Yatchew (1985) “The Empirical Determination of Technology and Expecta-
tions: A Simplified Procedure”, Journal of Econometrics, February, 27(2), 235-258.
Evans, D. S. and J. J. Heckman (1983) “Multi-Product Cost Function Estimates and Natural
Monopoly Tests for the Bell System”, in: D. S. Evans, ed., Breaking up Bell. Amsterdam:
North-Holland, 253-282.
Evans, D. S. and J. J. Heckman (1984) “A Test for Subadditivity of the Cost Function with an
Application to the Bell System”, American Economic Review, September, 74(4), 615-623.
Faucett, Jack and Associates (1977) Development of 35-Order Input-Output Tables, 1958-1974.
Washington: Federal Emergency Management Agency.
Field, B. C. and E. R. Berndt (1981) “An Introductory Review of Research on the Economics of
Natural Resource Substitution”, in: E. R. Berndt and B. C. Field, eds., 1-14.
Field, B. C. and C. Grebenstein (1980) “Substituting for Energy in U.S. Manufacturing”, Review of
Economics and Statistics, May, 62(2), 207-212.
Forsund, F. R. and L. Hjalmarsson (1979) “Frontier Production Functions and Technical Progress: A
Study of General Milk Processing in Swedish Dairy Plants”, Econometrica, July, 47(4), 883-901.
Forsund, F. R. and L. Hjalmarsson (1983) “Technical Progress and Structural Change in the Swedish
Cement Industry 1955-1979”, Econometrica, September, 51(5), 1449-1467.
Forsund, F. R. and E. S. Jansen (1983) “Technical Progress and Structural Change in the Norwegian
Primary Aluminum Industry”, Scandinavian Journal of Economics, 85(2), 113-126.
Forsund, F. R., C. A. K. Lovell and P. Schmidt (1980) “A Survey of Frontier Production Functions
and of Their Relationship to Efficiency Measurement”, Journal of Econometrics, May, 13(1), 5-25.
Fraumeni, B. M. and D. W. Jorgenson (1980) “The Role of Capital in U.S. Economic Growth,
1948-1976”, in: G. von Furstenberg, ed., 9-250.
Frenger, P. (1978) “Factor Substitution in the Interindustry Model and the Use of Inconsistent
Aggregation”, in: M. Fuss and D. McFadden, eds., 2, 269-310.
Friede, G. (1979) Investigation of Producer Behavior in the Federal Republic of Germany Using the
Translog Price Function. Cambridge: Oelgeschlager, Gunn and Hain.
Friedlaender, A. F. and R. H. Spady (1980) “A Derived Demand Function for Freight Transporta-
tion”, Review of Economics and Statistics, August, 62(3), 432-441.
Friedlaender, A. F. and R. H. Spady (1981) Freight Transport Regulation. Cambridge: M.I.T. Press.
Friedlaender, A. F., R. H. Spady and S. J. W. Chiang (1981) “Regulation and the Structure of
Technology in the Trucking Industry”, in: T. G. Cowing and R. E. Stevenson, eds., 77-106.
Frisch, R. (1965) Theory of Production. Chicago: Rand McNally.
Fullerton, D., Y. K. Henderson and J. B. Shoven (1984) “A Comparison of Methodologies in Empirical
General Equilibrium Models of Taxation”, in: H. E. Scarf and J. B. Shoven, eds., 367-410.
Fuss, M. (1977a) “The Demand for Energy in Canadian Manufacturing: An Example of the
Estimation of Production Structures with Many Inputs”, Journal of Econometrics, January, 5(1),
89-116.
Fuss, M. (1977b) “The Structure of Technology Over Time: A Model for Testing the Putty-Clay
Hypothesis”, Econometrica, November, 45(8), 1797-1821.
Fuss, M. (1978) “Factor Substitution in Electricity Generation: A Test of the Putty-Clay Hypothesis”,
in: M. Fuss and D. McFadden, eds., 2, 187-214.
Fuss, M. (1983) “A Survey of Recent Results in the Analysis of Production Conditions in Telecom-
munications”, in: L. Courville, A. de Fontenay and R. Dobell, eds., 3-26.
Fuss, M. and D. McFadden, eds. (1978) Production Economics. Amsterdam: North-Holland, 2 Vols.
Fuss, M., D. McFadden and Y. Mundlak (1978) “A Survey of Functional Forms in the Economic
Analysis of Production”, in: M. Fuss and D. McFadden, eds., 1, 219-268.
Fuss, M. and L. Waverman (1981) “Regulation and the Multiproduct Firm: The Case of Telecom-
munications in Canada”, in: G. Fromm, ed., Studies in Public Regulation. Cambridge: M.I.T. Press,
277-313.
Gallant, A. R. (1977) “Three-Stage Least Squares Estimation for a System of Simultaneous, Nonlin-
ear, Implicit Equations”, Journal of Econometrics, January, 5(l), 71-88.
Gallant, A. R. (1981) “On the Bias in Flexible Functional Forms and an Essentially Unbiased Form”,
Journal of Econometrics, February, 15(2), 211-246.
Gallant, A. R. and A. Holly (1980) “Statistical Inference in an Implicit, Nonlinear, Simultaneous
Equations Model in the Context of Maximum Likelihood Estimation”, Econometrica, April, 48(3),
697-720.
Gallant, A. R. and D. W. Jorgenson (1979) “Statistical Inference for a System of Simultaneous,
Nonlinear, Implicit Equations in the Context of Instrumental Variable Estimation”, Journal of
Econometrics, October/December, 11(2/3), 275-302.
Geary, P. T. and E. J. McDonnell (1980) “Implications of the Specification of Technologies: Further
Evidence”, Journal of Econometrics, October, 14(2), 247-255.
Gollop, F. M. and S. M. Karlson (1978) “The Impact of the Fuel Adjustment Mechanism on
Economic Efficiency”, Review of Economics and Statistics, November, 60(4), 574-584.
Gollop, F. M. and M. J. Roberts (1981) “The Sources of Economic Growth in the U.S. Electric Power
Industry”, in: T. G. Cowing and R. E. Stevenson, eds., 107-145.
Gollop, F. M. and M. J. Roberts (1983) “Environmental Regulations and Productivity Growth: The
Case of Fossil-Fueled Electric Power Generation”, Journal of Political Economy, August, 91(4),
654-674.
Gorman, W. M. (1959) “Separable Utility and Aggregation”, Econometrica, July, 27(3), 469-481.
Gourieroux, C., A. Holly and A. Monfort (1980) “Kuhn-Tucker, Likelihood Ratio and Wald Tests
for Nonlinear Models with Constraints on the Parameters”. Harvard University, Harvard Institute
for Economic Research, Discussion Paper No. 770, June.
Gourieroux, C., A. Holly and A. Monfort (1982) “Likelihood Ratio Test, Wald Test, and Kuhn-Tucker
Test in Linear Models with Inequality Constraints on the Regression Parameters”, Econometrica,
January, 50(1), 63-80.
Greene, W. H. (1980) “Maximum Likelihood Estimation of Econometric Frontier Functions”,
Journal of Econometrics, May, 13(1), 27-56.
Greene, W. H. (1983) “Simultaneous Estimation of Factor Substitution, Economies of Scale, Produc-
tivity, and Non-Neutral Technical Change”, in: A. Dogramaci, ed., 121-144.
Griffin, J. M. (1977a) “The Econometrics of Joint Production: Another Approach”, Review of
Economics and Statistics, November, 59(4), 389-397.
Griffin, J. M. (1977b) “Interfuel Substitution Possibilities: A Translog Application to Pooled Data”,
International Economic Review, October, 18(3), 755-770.
Griffin, J. M. (1977c) “Long-Run Production Modeling with Pseudo Data: Electric Power Generation”,
Bell Journal of Economics, Spring, 8(1), 112-127.
Griffin, J. M. (1978) “Joint Production Technology: The Case of Petrochemicals”, Econometrica,
March, 46(1), 379-396.
Griffin, J. M. (1979) “Statistical Cost Analysis Revisited”, Quarterly Journal of Economics, February,
93(1), 107-129.
Griffin, J. M. (1980) “Alternative Functional Forms and Errors of Pseudo Data Estimation: A Reply”,
Review of Economics and Statistics, May, 62(2), 327-328.
Griffin, J. M. (1981a) “The Energy-Capital Complementarity Controversy: A Progress Report on
Reconciliation Attempts”, in: E. R. Berndt and B. C. Field, eds., 70-80.
Griffin, J. M. (1981b) “Engineering and Econometric Interpretations of Energy-Capital Complemen-
tarity: Comment”, American Economic Review, December, 71(5), 1100-1104.
Griffin, J. M. (1981c) “Statistical Cost Analysis Revisited: Reply”, Quarterly Journal of Economics,
February, 96(1), 183-187.
Griffin, J. M. and P. R. Gregory (1976) “An Intercountry Translog Model of Energy Substitution
Responses”, American Economic Review, December, 66(5), 845-857.
Griliches, Z. (1967) “Production Functions in Manufacturing: Some Empirical Results”, in: M.
Brown, ed., 275-322.
Griliches, Z. and V. Ringstad (1971) Economies of Scale and the Form of the Production Function.
Amsterdam: North-Holland.
Hall, R. E. (1973) “The Specification of Technology with Several Kinds of Output”, Journal of
Political Economy, July/August, 81(4), 878-892.
Halvorsen, R. (1977) “Energy Substitution in U.S. Manufacturing”, Review of Economics and
Statistics, November, 59(4), 381-388.
Halvorsen, R. (1978) Econometric Studies of U.S. Energy Demand. Lexington: Lexington Books.
Halvorsen, R. and J. Ford, “Substitution Among Energy, Capital and Labor Inputs in U.S.
Manufacturing”, in: R. S. Pindyck, ed., Advances in the Economics of Energy and Resources.
Greenwich: JAI Press, 1, 51-75.
Hamermesh, D. S. and J. Grant (1979) “Econometric Studies of Labor-Labor Substitution and Their
Implications for Policy”, Journal of Human Resources, Fall, 14(4), 518-542.
Hanoch, G. (1978) “Symmetric Duality and Polar Production Functions”, in: M. Fuss and D.
McFadden, eds., 1, 111-132.
Hanoch, G. and M. Rothschild (1972) “Testing the Assumptions of Production Theory: A Nonpara-
metric Approach”, Journal of Political Economy, March/April, 80(2), 256-275.
Hansen, L. P. and T. J. Sargent (1980) “Formulating and Estimating Dynamic Linear Rational
Expectations Models”, Journal of Economic Dynamics and Control, February, 2(1), 1-46.
Hansen, L. P. and T. J. Sargent (1981) “Linear Rational Expectations Models for Dynamically
Interrelated Variables”, in: R. E. Lucas and T. J. Sargent, eds., Rational Expectations and
Econometric Practice. Minneapolis: University of Minnesota Press, 1, 127-156.
Harmatuck, Donald J. (1979) “A Policy-Sensitive Railway Cost Function”, Logistics and Transporta-
tion Review, April, 15(2), 277-315.
Harmatuck, Donald J. (1981) “A Multiproduct Cost Function for the Trucking Industry”, Journal of
Transport Economics and Policy, May, 15(2), 135-153.
Heady, E. O. and J. L. Dillon (1961) Agricultural Production Functions. Ames: Iowa State University
Press.
Hicks, J. R. (1946) Value and Capital. 2nd ed. (1st ed. 1939) Oxford: Oxford University Press.
Hicks, J. R. (1963) The Theory of Wages. 2nd ed. (1st ed. 1932), London: Macmillan.
Hildenbrand, W. (1981) “Short-Run Production Functions Based on Microdata”, Econometrica,
September, 49(5), 1095-1125.
Hotelling, H. S. (1932) “Edgeworth’s Taxation Paradox and the Nature of Demand and Supply
Functions”, Journal of Political Economy, October, 40(5), 517-616.
Houthakker, H. S. (1955-1956) “The Pareto Distribution and the Cobb-Douglas Production Func-
tion in Activity Analysis”, Review of Economic Studies, 23(1), 60, 27-31.
Hudson, E. A. and D. W. Jorgenson (1974) “U.S. Energy Policy and Economic Growth, 1975-2000”,
Bell Journal of Economics and Management Science, Autumn, 5(2), 461-514.
Hudson, E. A. and D. W. Jorgenson (1978) “The Economic Impact of Policies to Reduce U.S. Energy
Growth”, Resources and Energy, November, 1(3), 205-230.
Humphrey, D. B. and J. R. Moroney (1975) “Substitution Among Capital, Labor, and Natural
Resource Products in American Manufacturing”, Journal of Political Economy, February, 83(1),
57-82.
Humphrey, D. B. and B. Wolkowitz (1976) “Substituting Intermediates for Capital and Labor with
Alternative Functional Forms: An Aggregate Study”, Applied Economics, March, 8(1), 59-68.
Intriligator, M. D. and D. A. Kendrick, eds. (1974) Frontiers in Quantitative Economics. Amsterdam:
North-Holland, Vol. 2.
Jara-Diaz, S. and C. Winston (1981) “Multiproduct Transportation Cost Functions: Scale and Scope
in Railway Operations”, in: N. Blattner, ed., Eighth European Association for Research in Industrial
Economics, Basle: University of Basle, 1, 437-469.
Jennrich, R. I. (1969) “Asymptotic Properties of Nonlinear Least Squares Estimators”, Annals of
Mathematical Statistics, April, 40(2), 633-643.
Johansen. L. (1972) Production Functions. Amsterdam: North-Holland.
Johansen, L. (1974) A Multi-Sectoral Study of Economic Growth. 2nd ed. (1st ed. 1960) Amsterdam,
North-Holland.
Jorgenson, D. W. (1973a) “The Economic Theory of Replacement and Depreciation”, in: W.
Sellekaerts, ed., Econometrics and Economic Theory. New York: Macmillan, 189-221.
Jorgenson, D. W. (1973b) “Technology and Decision Rules in the Theory of Investment Behavior”,
Quarterly Journal of Economics, November, 87(4), 523-543.
Jorgenson, D. W. (1974) “Investment and Production: A Review”, in: M. D. Intriligator and D. A.
Kendrick, eds., 341-366.
Jorgenson, D. W. (1980) “Accounting for Capital”, in: G. von Furstenberg, ed., 251-319.
Jorgenson, D. W. (1981) “Energy Prices and Productivity Growth”, Scandinavian Journal of Econom-
ics, 83(2), 165-179.
Jorgenson, D. W. (1983) “Modeling Production for General Equilibrium Analysis”, Scandinavian
Journal of Economics, 85(2), 101-112.
Jorgenson, D. W. (1984a) “Econometric Methods for Applied General Equilibrium Analysis”, in:
H. E. Scarf and J. B. Shoven, eds., 139-203.
Jorgenson, D. W. (1984b) “The Role of Energy in Productivity Growth”, in: J. W. Kendrick, ed.,
International Comparisons of Productivity and Causes of the Slowdown. Cambridge: Ballinger,
219-323.
Jorgenson, D. W. and B. M. Fraumeni (1981) “Relative Prices and Technical Change”, in: E. R.
Berndt and B. C. Field, eds., 17-47; revised and reprinted in: W. Eichhorn, R. Henn, K. Neumann
and R. W. Shephard, eds., 241-269.
Jorgenson, D. W. and J.-J. Laffont (1974) “Efficient Estimation of Non-Linear Simultaneous Equa-
tions with Additive Disturbances”, Annals of Social and Economic Measurement, October, 3(4),
615-640.
Jorgenson, D. W. and L. J. Lau (1974a) “Duality and Differentiability in Production”, Journal of
Economic Theory, September, 9(1), 23-42.
Jorgenson, D. W. and L. J. Lau (1974b) “The Duality of Technology and Economic Behavior”,
Review of Economic Studies, April, 41(2), 126, 181-200.
Jorgenson, D. W. and L. J. Lau (1975) “The Structure of Consumer Preferences”, Annals of Social and
Economic Measurement, January, 4(1), 49-101.
Jorgenson, D. W., L. J. Lau and T. M. Stoker (1982) “The Transcendental Logarithmic Model of
Aggregate Consumer Behavior”, in: R. L. Basmann and G. Rhodes, eds., Advances in Econometrics.
Greenwich: JAI Press, 1, 97-238.
Kang, H. and G. M. Brown (1981) “Partial and Full Elasticities of Substitution and the Energy-Capital
Complementarity Controversy”, in: E. R. Berndt and B. C. Field, eds., 81-90.
Kennedy, C. (1964) “Induced Bias in Innovation and the Theory of Distribution”, Economic Journal,
September, 74(298), 541-547.
Kennedy, C. and A. P. Thirlwall (1972) “Technical Progress: A Survey”, Economic Journal, March,
82(325), 11-72.
Kiss, F., S. Karabadjian and B. J. Lefebvre (1983) “Economies of Scale and Scope in Bell Canada”,
in: L. Courville, A. de Fontenay and R. Dobell, eds., 55-82.
Kmenta, J. (1967) “On Estimation of the CES Production Function”, International Economic Review,
June, 8(2), 180-189.
Kohli, U. R. (1981) “Nonjointness and Factor Intensity in U.S. Production”, International Economic
Review, February, 22(1), 3-18.
Kohli, U. R. (1983) “Non-joint Technologies”, Review of Economic Studies, January, 50(1), 160,
209-219.
Kopp, R. J. and W. E. Diewert (1982) “The Decomposition of Frontier Cost Function Deviations into
Measures of Technical and Allocative Efficiency”, Journal of Econometrics, August, 19(2/3),
319-332.
Kopp, R. J. and V. K. Smith (1980a) “Input Substitution, Aggregation, and Engineering Descriptions
of Production Activities”, Economics Letters, 5(4), 289-296.
Kopp, R. J. and V. K. Smith (1980b) “Measuring Factor Substitution with Neoclassical Models: An
Experimental Evaluation”, Bell Journal of Economics, Autumn, 11(2), 631-655.
Kopp, R. J. and V. K. Smith (1981a) “Measuring the Prospects of Resource Substitution Under Input
and Technology Aggregation”, in: E. R. Berndt and B. C. Field, eds., 145-174.
Kopp, R. J. and V. K. Smith (1981b) “Productivity Measurement and Environmental Regulation: An
Engineering-Econometric Analysis”, in: T. G. Cowing and R. E. Stevenson, eds., 249-283.
Kopp, R. J. and V. K. Smith (1981c) “Neoclassical Modeling of Nonneutral Technological Change:
An Experimental Appraisal”, Scandinavian Journal of Economics, 85(2), 127-146.
Kopp, R. J. and V. K. Smith (1982) “Neoclassical Measurement of Ex Ante Resource Substitution:
An Experimental Evaluation”, in: J. R. Moroney, ed., Advances in the Economics of Energy and
Resources. Greenwich: JAI Press, 4, 183-198.
Koopmans, T. C. (1977) “Examples of Production Relations Based on Microdata”, in: G. C.
Harcourt, ed., The Microeconomic Foundations of Macroeconomics. London: Macmillan, 144-171.
Kuhn, H. W. and A. W. Tucker (1951) “Nonlinear Programming”, in: J. Neyman, ed., Proceedings of
the Second Berkeley Symposium on Mathematical Statistics and Probability. Berkeley: University of
California Press, 481-492.
Lau, L. J. (1969) “Duality and the Structure of Utility Functions”, Journal of Economic Theon;,
December, l(4), 374-396.
Lau, L. J. (1974) “Applications of Duality Theory: Comments”, in: M. D. Intriligator and D. A.
Kendrick, eds., 176-199.
Lau, L. J. (1976) “A Characterization of the Normalized Restricted Protit Function”, Journal of
Economic Theory, February, 12(l), 131-163.
Lau, L. J. (197Xa) “Applications . of Profit Functions”, in: M. Fuss and D. McFadden. eds.. 1.
133-216.
Lau, I.. J. (1978b) “Testing and Imposing Monotonicity, Convexity and Quasi-Convexity Constraints”,
in: M. Fuss and D. McFadden, eds., 1. 409-453.
Lau. L. J. (1986) “Functional Forms in Econometric Model Building”, this Handhook, Vol. 3.
Leontief, W. W. (1947a) “Introduction to a Theory of the Internal Structure of Functional Relation-
ships”, Econometrico, October, 15(4), 361-373.
Leontief, W. W. (1947b) “A Note on the Interrelation of Subsets of Independent Variables of a
Continuous Function with Continuous First Derivatives”, Bulletin of the American Mathematical
Socrety, April, 53(4), 343-350.
Leontief, W. W. (1951) The Structure of the American Economy, 1919-1939. 2nd cd. (1st ed. 1941)
New York: Oxford University Press.
Leontief. W. W., ed. (1953) Studies in the Structure of the American Economy. New York: Oxford
University Press.
Liew, C. K. (1976) “A Two-Stage Least-Squares Estimator with Inequality Restrictions on the
Parameters”, Review of Economics and Statistics, May, 58(2), 234-238.
Longva, S. and 0. Olsen (1983) “Producer Behaviour in the MSG Model”, in: 0. Bterkholt, S.
Longva, 0. Olsen and S. Strom, eds., Ana!vsis of Supply and Demand of Electricit_v in the Norwegian
I:‘conomy. Oslo: Central Statistical Bureau, 52-83.
Lucas, R.-E. (1967) “Adjustment Costs and the Theory of Supply”, Journal of Political Econom,y,
August, Pt. 1, 75(4), 321-334.
Maddala, Cr. S. and R. B. Roberts (1980) “Alternative Functional Forms and Errors of Pseudo Data
Estimation”, Review of Economics und Statistics, May, 62(2), 323-326.
Maddala. G. S. and R. B. Roberts (1981) “Statistical Cost Analysis Revisited: Comment”, Quartet+
Journal of Economrcs, February, 96(l), 177-182.
Magnus, J. R. (1979) “Substitution Between Energy and Non-Energy Inputs in the Netherlands,
1950-1976”. International Economic Review, June, 20(2), 465-484.
Magnus, J. R. and A. D. Woodland (1980) “Interfuel Substitution and Separability in Dutch
Manufacturing: A Multivariate Error Components Approach”, London School of Economics,
November.
Malinvaud, E. (1970) “The Consistency of Non-Linear Regressions”, Annals of Mathematical Statts-
tics, June, 41(3), 456-469.
Malinvaud, E. (1980) Statistical Methods of Econometricx 3rd ed. (1st ed. 1966) trans. A. Silvey.
Amsterdam: North-Holland.
McFadden, D. (1963) “Further Results on CES Production Functions”, Review of Economic Studie.s,
June, 30(2), X3, 73-83.
McFadden, 1). (1978) “Cost, Revenue, and Profit Functions”, in: M. Fuss and D. McFadden, eds., 1,
l-110.
McRae. R. N. (1981) “Regional Demand for Energy by Canadian Manufacturing Industries”,
Internatronal Journal o/Energy S_ystems, January, l(l), 38-48.
McRae, R. N. and A. R. Webster (1982) “The Robustness of a Translog Model to Describe Regional
Energy Demand by Canadian Manufacturing Industries”, Resources and Energy, March, 4(l), l-25.
Meese. R. (19X0) “Dynamic Factor Demand Schedules for Labor and Capital Under Rational
Expectations”, Journal of Econometrics, September, 14(l), 141-15X.
Moroney, J. R. and A. Toevs (1977) “Factor Costs and Factor Use: An Analysis of Labor, Capital,
and Natural Resources”, Southern Economic Journal, October, 44(2), 222-239.
Moroney. J. R. and A. Toevs (1979) “Input Prices, Substitution, and Product Inflation”, in: R. S.
Pindyck, ed., Advances tn the Economics of Energy and Resources. Greenwich: JAI Press, 1, 27-50.
1914 D. W. Jorgenson

Moroncy, J. R. and J. M. Trapani (1981a) “Alternative Models of Substitution and Technical Change
in Natural Resource Intensive Industries”, in: E. R. Berndt and B. C. Field, eds., 48-69.
Moroney. J. R. and J. M. Trapam (1981b) “Factor Demand and Substitution in Mineral-Intensive
Industries”, Bell Journul of Economics. Spring. 12(l). 212-285.
Morrison. C. J. and E. R. Berndt (1981) “Short-run Labor Productivity in a Dynamic Model”, Journal
of Econometrics, August, 16(3), 339-366.
Mundlak, Y. (1963) “Estimation of Production and Behavioral Functions from a Combination of
Cross-Section and Time Series Data”, in: C. Christ, et al., 138-166.
Mundlak. Y. (1978) “On the Pocling of Time Series and Cross Section Data”, Econometn’cu, January,
46(l), 60-X6.
Nadiri. M. I. (1970) “Some Approaches to the Theory and Measurcmcnt of Total Factor Productivity:
A Survey”, Journul of Economic Lueruture, December, 8(4), 1137-1178.
Nadiri, M. I. and M. Schankcrman (1981) “The Structure of Production, Technological Change, and
the Rate of Growth of Total Factor Productivity in the U.S. Bell System”, in: T. G. Cowing and
R. E. Stevenson, eds., 219-248.
Nakamura, S. (1984) An Inter-Industry Translog Model of Prices and Technical Change for the We.st
C;ermun Economy. Berlin: Springer-Verlag.
Nerlove, M. (1963) “Returns to Scale in Electricity Supply”, in: C. Christ, et al., 167-200.
Nerlove, M. (1967) “ Recent Empirical Studies of the CES and Related Production Functions”, in: M.
Brown, cd., 55-122.
Norsworthy, J. R. and M. J. Harper (1981) “Dynamic Models of Energy Substitution in U.S.
Manufacturing”, in: E. R. Bemdt and B. C. Field, eds., 177-208.
Ozatalay, S., S. S. Grumbaugh and T. V. Long III, “Energy Substitution and National Energy Policy”,
Americun Economic Review, May, 69(2), 369-371.
Parks, R. W. (1971) “Responsiveness of Factor Utilization in Swedish Manufacturing, 1870-1950”,
Rerliew of Economics and Stntisrics, May, 53(2), 129-139.
Peterson, H. C. (1975) “An Empirical Test of Regulatory Effects”, Bell Journal of Economics, Spring,
6(l), 111-126.
Pindyck. R. S. (lY79a) “Interfuel Substitution and Industrial Demand for Energy”, Reuiew of
Gonomic.v und Srarisric.s, May, 61(2), 169-179.
Pindyck, R. S. (1979b) The Structure of World Energy Demand. Cambridge: M.I.T. Press.
Pindyck, R. S. and J. J. Rotemberg (1983a) “Dynamic Factor Demands and the Effects of Energy_.
Price Shocks”, Americun Economic Review, December, 73(5), 1066-1079.
Pindvck. R. S. and J. J. Rotembere (1983b) “Dvnamic Factor Demands Under Rational Exoectations”.
S&dinaoian Journul of Econo&, 85(i), 223-239.
Quandt, R. E. (1983) “Computational Problems and Methods”, this Handbook, 1, 701-764.
Russell, C. S. and W. J. Vaughan (1976) Steel Production. Baltimore: Johns Hopkins University Press.
Russell, R. R. (1975) “Functional Separability and Partial Elasticities of Substitution”, Reuiew of
Economic Studies, January, 42(l), 129, 79-86.
Samuelson, P. A. (1951) “Abstract of a Theorem Concerning Substitutability in Open Leonticf
Models”, in: T. C. Koopmans, ed., Activity Anulysis of Production and Allocution. Wiley: New York,
142-146.
Samuelson, P. A. (1953-1954) “Prices of Factors and Goods in General Equilibrium”, Review of
Economic Studies, 21(l), 54, l-20.
Samuelson, P. A. (1960) “Structure of a Minimum Equilibrium System”, in: R. W. Pfouts, cd., ~.~sstlys
in Economics und Econometrrcs. Chapel Hill: University of North Carolina Press, l-33.
Samuelson, P. A. (1965) “A Theory of Induced Innovation Along Kennedy-Weizsacker Lines”,
Revrew of Economics and Stufistics, November, 47(4), 343-356.
Samuelson, P. A. (1973) “Relative Shares and Elasticities Simplified: Comment”, American Economic
Review, Septcmbcr, 63(4), 770-771.
Samuclson, P. A. (1974) “Complementarity-An Essay on the 40th Anniversary of the Hicks-Allen
Revolution in Demand Theory”, Journul of Economic Literature, December, 12(4), 1255-1289.
Samuelson, P. A. (1979) “Paul Douglas’s Measurement of Production Functions and Marginal
Productivities”, Journal of Political Economy, October, Part 1, X7(5), 923-939.
Samuelson, P. A. (1983) Foundations of Economic Analysis. 2nd ed. (1st ed. 1947), Cambridge:
Harvard University Press.
Sargan, J. D. (1971) “Production Functions”, in: R. Layard, cd., Qualijied Manpower and Economic
Ch. <I: EconomrtrrcMethods for Modeling Producer Behavior 1915

Prrformunce. London: Allan Lane. 145-204.


Sargent. T. J. (1978) “Estimation of Dynamic Labor Demand Schedules Under Rational Expectations”,
Journal of Polirical Econom_y, December, 86(6), 1009-1045.
Sato, K. (1975) Produclion Functions und Aggregation. Amsterdam: North-Holland.
Scarf, H. E. and J. B. Shaven, eds. (1984) Applied General Equilibrium Ana+sr.s. Cambridge:
Cambridge University Press.
Schneider, E. (1934) Theorie der Produktion. Wien: Springer.
Shephard. R. W. (1953) Cost and Production Functions. Princeton: Princeton University Press.
Shephard, R. W. (1970) Theory of Cost and Production Functions. Princeton: Princeton University
Press.
Sono, M. (1961) “The Effect of Price Changes on the Demand and Supply of Separable Goods”,
International Economic Review, September, 2(3), 239-271.
Spady, R. H. and A. F. Friedlaender (1978) “Hedonic Cost Functions for the Regulated Trucking
Industry”. Bell Journal of Economics, Spring, 9(l), 159-179.
Stevenson, R. E. (1980) “Measuring Technological Bias”, American Economic Review, March, 70(l),
162-173.
Thompson, R. G., et al. (1977) Environment and Energy in Petroleum Rejming, Electric Power, and
Chemical Industries. Houston: Gulf Publishing.
Uzawa, H. (1962) “Production Functions with Constant Elasticity of Substitution”, Review of
Economic Studies, October, 29(4), 81, 291-299.
Uzawa, H. (1964) “Duality Principles in the Theory of Cost and Production”, International Economic
Review, May, 5(2), 216-220.
Uzawa, H. (1969) “Time Preference and the Penrose Effect in a Two-Class Model of Economic
Growth”, Journal of Political Economy, July/August, Pt. 2, 77(4), 628-652.
Varian, H. (1984) “The Nonparametric Approach to Production Analysis”, Econometricu, May, 52(2),
579-598.
von Furstenberg, G., ed. (1980) Capital, Eficiency, and Growth. Cambridge: Ballinger.
von Weizsacker, C. C. (1962) “A New Technical Progress Function”. Massachusetts Institute of
Technology, Department of Economics.
Walras, L. (1954) Elements of Pure Economics. trans. W. JatIe, Homewood: Irwin.
Walters, A. A. (1963) “Production and Cost Functions: An Econometric Survey”, Econometrica,
January-April, 31(l), 1-66.
Wills, J. (1979) “Technical Change in the U.S. Primary Metals Industry”, Journul of Econometrics,
April, 10(l), 85-98.
Winston, C. (1985) “Conceptual Developments in the Economics of Transportation: An Interpretive
Survey”, Journal of Economic Literature, March, 23(l), 57-94.
Woodland, A. D. (1975) “Substitution of Structures, Equipment, and Labor in Canadian Production”,
Internarional Economic Reuiew, February, 16(l), 171-187.
Woodland, A. D. (1978) “On Testing Weak Separability”, Journal of Econometrics, December, 8(3),
383-398.
Woodward. G. T. (1983) “A Factor Augmenting Approach for Studying Capital Measurement,
Obsolescence, and the Recent Productivity Slowdown”, in: A. Dogramaci, ed., 93-120.
Zellner, A. (1962) “An Efficient Method of Estimating Seemingly Unrelated Regressions and Tests for
Aggregation Bias”, Journnl of the American Statisiical Association, June, 58(2), 348-368.
Zellner, A. and H. Theil (1962) “Three-Stage Least Squares: Simultaneous Estimation of Simulta-
neous Equations”, Econometrica, January, 30(l), 54-78.
Chapter 32

LABOR ECONOMETRICS*
JAMES J. HECKMAN

University of Chicago

THOMAS E. MACURDY

Contents

0. Introduction 1918
1. The index function model 1920
1.1. Introduction 1920
1.2. Some definitions and basic ideas 1921
1.3. Sampling plans 1926
2. Estimation 1929
2.1. Regression function characterizations 1930
2.2. Dummy endogenous variable models 1945
3. Applications of the index function model 1952
3.1. Models with the reservation wage property 1952
3.2. Prototypical dummy endogenous variable models 1959
3.3. Hours of work and labor supply 1963
4. Summary 1971
Appendix: The principal assumption 1972
References 1974

*Heckman’s research on this project was supported by National Science Foundation Grant No.
SES-8107963 and NIH Grants ROl-HD16846 and ROl-HD19226. MaCurdy’s research on this project
was supported by National Science Foundation Grant No. SES-8308664 and a grant from the Alfred
P. Sloan Foundation. This paper has benefited greatly from comments generously given by Ricardo
Barros, Mark Gritz, Joe Hotz, and Frank Howland.

Handbook of Econometrics, Volume III, Edited by Z. Griliches and M.D. Intriligator


© Elsevier Science Publishers BV, 1986

0. Introduction

In the past twenty years, the field of labor economics has been enriched by two
developments: (a) the evolution of formal neoclassical models of the labor market
and (b) the infusion of a variety of sources of microdata. This essay outlines the
econometric framework developed by labor economists who have built theoreti-
cally motivated models to explain the new data.
The study of female labor supply stimulated early research in labor economet-
rics. In any microdata study of female labor supply, two facts are readily
apparent: that many women do not work, and that wages are often not available
for nonworking women. To account for the first fact in a theoretically coherent
framework, it is necessary to model corner solutions (choices at the extensive
margin) along with conventional interior solutions (choices at the intensive
margin) and to develop an econometrics sufficiently rich to account for both types
of choices by agents. Although there were precedents for the required type of
econometric model in work in consumer theory by Tobin (1958) and his students
[e.g. Rosett (1959)], it is fair to say that labor economists have substantially
improved the original Tobin framework and have extended it in various im-
portant ways to accommodate a variety of models and types of data. To account
for the second fact that wages are missing in a nonrandom fashion for nonwork-
ing women, it is necessary to develop models for censored random variables. The
research on censored regression models developed in labor economics had no
precedent in econometrics and was largely neglected by statisticians (see the essay
by Griliches in this volume).
The econometric framework developed for the analysis of female labor supply
underlies more recent models of job search [Yoon (1981), Kiefer and Neumann
(1979), Flinn and Heckman (1982)], occupational choice [Roy (1951), Tinbergen
(1951), Siow (1984), Willis and Rosen (1979), Heckman and Sedlacek (1984)], job
turnover [Mincer and Jovanovic (1981), Borjas and Rosen (1981), Flinn (1984)],
migration [Robinson and Tomes (1982)], unionism [Lee (1978), Strauss and
Schmidt (1976), Robinson and Tomes (1984)] and training evaluation [Heckman
and Robb (1985)].
All of the recent models presented in labor econometrics are special cases of an
index function model. The origins of this model can be traced to Karl Pearson’s
(1901) work on the mathematical theory of evolution. See D. J. Kevles (1985, p.
31) for one discussion of Pearson’s work. In Pearson’s framework, discrete and
censored random variables are the manifestations of underlying continuous
random variables subject to various sampling schemes. Discrete random variables
are indicators of whether or not certain latent continuous variables lie above or

below given thresholds. Censored random variables are direct observations on the
underlying random variables given that certain selection criteria are met. Assum-
ing that the underlying continuous random variables are normally distributed
leads to the theory of biserial and tetrachoric correlation. [See Kendall and Stuart
(1967, Vol. II), for a review of this theory.] Later work in mathematical psy-
chology by Thurstone (1927) and Bock and Jones (1968) utilized the index
function framework to produce mathematical models of choice among discrete
alternatives and stimulated a considerable body of ongoing research in economics
[See McFadden’s paper in Volume II for a survey of this work and Lord and
Novick (1968) for an excellent discussion of index function models used in
psychometrics].
The index function model cast in terms of underlying continuous latent
variables provides the empirical counterpart of many theoretical models in labor
economics. For example, it is both natural and analytically convenient to for-
mulate labor supply or job search models in terms of unobserved reservation
wages which can often be plausibly modeled as continuous random variables.
When reservation wages exceed market wages, people do not work. If the
opposite occurs, people work and wages are observed. A variety of models that
are special cases of the reservation wage framework will be presented below in
Section 3.
The great virtue of research in labor econometrics is that the problems and the
solutions in the field are the outgrowth of research on well-posed economic
problems. In this area, the economic problems lead and the proposed statistical
solutions follow in response to specific theoretical and empirical challenges. This
imparts a vitality and originality to the field that is not found in many other
branches of econometrics.
One format for presenting recent developments in labor econometrics is to
chart the history of the subject, starting with the earliest models, and leading up
to more recent developments. This is the strategy we have pursued in previous
joint work [Heckman and MaCurdy (1981); Heckman, Killingsworth and
MaCurdy (1981)]. The disadvantage of such a format is that basic statistical ideas
become intertwined with specific economic models, and general econometric
points are sometimes difficult to extract.
This paper follows another format. We first state the basic statistical and
econometric principles. We then apply them in a series of worked examples. This
format has obvious pedagogical advantages. At the same time, it artificially
separates economic problems from econometric theory and does not convey the
flow of research problems that stimulated the econometric models.
This paper is in three parts: Part 1 presents a general introduction to the index
function framework; Part 2 presents methods for estimating index function
models; and Part 3 makes the discussion concrete by presenting a series of models
in labor economics that are special cases of the index function framework.

1. The index function model

1.1. Introduction

The critical assumption at the heart of index function models is that unobserved
or partially observed continuous random variables generate observed discrete,
censored, and truncated random variables. The goal of econometric analysis
conducted for these models is to recover the parameters of the distributions of the
underlying continuous random variables.
The notion that continuous latent variables generate observed discrete, censored
and truncated random variables is natural in many contexts. For example, in the
discrete choice literature surveyed by McFadden (1985), the difference between
the utility of one option and the utility of another is often naturally interpreted as
a continuous random variable, especially if, as is sometimes plausible, utility
depends on continuously distributed characteristics. When the difference of
utilities exceeds a threshold (zero in this example), the first option is selected. The
underlying utilities of choices are never directly observed.
As another example, many models in labor economics are characterized by a
“reservation wage” property. Unemployed persons continue to search until their
reservation wage - a latent variable - is less than the offered wage. The difference
between reservation wages and offered wages is a continuous random variable if
some of the characteristics generating reservation wages are continuous random
variables. The decision to stop searching is characterized by a continuous latent
variable falling below a threshold (zero). Observed wages are censored random
variables with the censoring rule characterized by a continuous random variable
(the difference between reservation wages and market wages) crossing a threshold.
Further examples of index functions generated by economic models are presented
in Section 3.
From the vantage point of context-free statistics, using continuous latent
variables to generate discrete, censored or truncated random variables introduces
unnecessary complications into the statistical analysis. Despite its ancient heri-
tage, the index function approach is no longer widely used or advocated in the
modern statistics literature. [See, e.g. Bishop, Fienberg and Holland (1975) or
Haberman (1978), Volumes I and II.]¹ Given their disinterest in behavioral
models, many statisticians prefer direct parameterizations of discrete data and
censored data models that typically possess no behavioral interpretation. Some
statisticians have argued that econometric models that incorporate behavioral

¹Such models are still widely used in the psychometric literature. See Lord and Novick (1968) or
Bock and Jones (1968).

theory are needlessly complicated. For this reason labor economics has been the
locus of recent research activity on index function models.

1.2. Some definitions and basic ideas

Index functions are defined as continuously distributed random variables. It is
helpful to distinguish two types of index functions: those corresponding to
continuous random variables that are not directly observed in a given context (Z)
and those corresponding to continuous random variables that are partially
observed (Y) in a sense to be made precise below. In the subsequent discussion,
the set Ω represents the support (or the domain of definition) of (Y, Z); the set Θ
denotes the support of Z, and Ψ is the support of Y; Ω is the Cartesian product
of Ψ and Θ.²

1.2.1. Quantal response models

We begin with the most elementary index function model. This model ignores the
existence of Y and focuses on discrete variables whose outcomes register the
occurrence of various states of the world. Let Θ_i be a nontrivial subset of Θ.
Although we do not directly observe Z, we know if

    Z ∈ Θ_i.

If this event occurs, we denote it by setting an indicator function δ_i equal to one.
More formally,

    δ_i = 1 if Z ∈ Θ_i,
    δ_i = 0 otherwise.                                          (1.2.1)

When δ_i = 1, state i occurs. The distribution of Z induces a distribution on the δ_i
because

    Pr(δ_i = 1) = Pr(Z ∈ Θ_i).                                  (1.2.2)

The discrete choice models surveyed by McFadden (1985) can be cast in this
framework. Let Z be a J × 1 vector of utilities, Z = (V(1), …, V(J))′. The event
that option i is selected is the event that V(i) is maximal in the set {V(j)}_{j=1}^{J}. In
the space of the distribution of utilities, the event that V(i) is maximal corre-

²Ω, Θ and Ψ and all partitions of these sets considered in this paper are assumed to be Borel sets.

sponds to the subspace of Z defined by the inequalities

    V(j) − V(i) ≤ 0,   j = 1, …, J.

Then in this notation

    Θ_i = {Z | V(j) − V(i) ≤ 0, j = 1, …, J},

and

    Pr(δ_i = 1) = Pr(Z ∈ Θ_i).

Introducing exogenous variables (X) into this model raises only minor concep-
tual issues.³ The distribution of Z can be defined conditional on X, and the
regions of definition of δ_i can also be allowed to depend on X (so Θ_i = Θ_i(X)).
The conditional probability that δ_i = 1 given X is

    Pr(δ_i = 1 | X) = Pr(Z ∈ Θ_i | X).                          (1.2.3)
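To make the construction concrete, the following minimal Monte Carlo sketch
(ours - the chapter contains no code; the utility levels, sample size, and i.i.d.
extreme-value errors are illustrative assumptions) builds the indicators δ_i from
latent utilities and checks that their sample frequencies approximate Pr(Z ∈ Θ_i).
With i.i.d. extreme-value errors these region probabilities have the familiar
multinomial logit closed form, which serves as a benchmark.

    import numpy as np

    rng = np.random.default_rng(0)
    J, n = 3, 200_000
    mu = np.array([0.5, 0.0, -0.3])           # systematic utilities (hypothetical)
    Z = mu + rng.gumbel(size=(n, J))          # latent utilities V(1), ..., V(J)
    # delta_i = 1 exactly when V(i) is maximal, i.e. when Z falls in Theta_i
    delta = (Z.argmax(axis=1)[:, None] == np.arange(J)).astype(int)
    print(delta.mean(axis=0))                 # Monte Carlo estimates of Pr(delta_i = 1)
    print(np.exp(mu) / np.exp(mu).sum())      # multinomial logit benchmark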

1.2.2. Models involving endogenous discrete and continuous random variables

We now consider a selection mechanism which records observations on Y only if
(Y, Z) lies in some subspace of Ω. More formally, we define the observed value of
Y as Y* with

    Y* = Y   if (Y, Z) ∈ Ω_1,                                   (1.2.4)

where Ω_1 is a subspace of Ω. We establish the convention that

    Y* = 0   if (Y, Z) ∉ Ω_1.                                   (1.2.5)

This convention is innocuous because the probability that Y = 0 is zero as a
consequence of the assumption that Y is an absolutely continuous random
variable.
A special case of this selection mechanism produces truncated random vari-
ables. Y* is a truncated random variable if the event (Y, Z) ∈ Ω_1 implies that Y
must lie in a strict subset of its support Ψ. Thus Y* is observed only in certain
ranges of values of Y. For example, negative income tax experiments sample only
low income persons. Letting Y be income, Y* is only observed in data from such
experiments if Y is below the cut-off point for inclusion of observations into the
experiments.

³Exogenous random variables are always observed and have a marginal density that shares no
parameters in common with the conditional distribution of the endogenous variables given the
exogenous variables.

Observed values of Y produced by the general selection mechanism (1.2.4)
without restrictions on the range of Y* are censored random variables. As an
example of a censored random variable, consider the analysis of Cain and Watts
(1973). Let Y be hours of work, Z_1 be wage rates, and Z_2 denote unearned
income, where the Z_i are assumed to be unobserved in this context. Negative
income tax experiments observe Y only for low income people (i.e. people for
whom Z_1·Y + Z_2 is sufficiently low). While sampled hours of work, Y*, may
take on all values assumed by Y, the density of Y* may differ greatly from the
density of Y.
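The censoring convention and its truncated counterpart are easy to visualize in a
small simulation (ours; the jointly normal (Y, Z) design and the selection region
Z ≥ 0 are illustrative assumptions, not the chapter's): the censored variable Y*
places point mass at zero where Y is unobserved, while the truncated subsample
simply discards those draws, and in both cases the observed density differs from
the density of Y.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 100_000
    # correlated latent pair; Y is recorded only when Z >= 0
    y, z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=n).T
    observed = z >= 0
    y_star = np.where(observed, y, 0.0)   # censored variable (zero flags censoring)
    y_trunc = y[observed]                 # truncated variable
    print(y.mean(), y_trunc.mean())       # selection shifts the observed mean upward

With correlation 0.8, the truncated mean is ρφ(0)/Φ(0) ≈ 0.64 rather than 0,
exactly the kind of discrepancy the selection corrections of Section 2 are designed
to remove.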
A useful extension of the selection mechanism presented in eq. (1.2.4) is a
multi-state model which defines observed values of Y for various states of the
world indexed by i, i = 1, …, I. For state i we define the observed value of Y as

    Y_i* = Y   if (Y, Z) ∈ Ω_i,   i = 1, …, I_1,                (1.2.6)

where the Ω_i's are subsets of Ω, and I_1 (≤ I) is the number of states in which Y
is observed. In the remaining states (I − I_1 in number), Y is not observed. We
define an indicator variable δ_i by

    δ_i = 1   if (Y, Z) ∈ Ω_i,
    δ_i = 0   if (Y, Z) ∉ Ω_i,      i = 1, …, I.

To avoid uninteresting complications, it is assumed that ∪_{i=1}^{I} Ω_i = Ω, and that
the sets Ω_i and Ω_j are disjoint for i ≠ j. Without any loss of generality, we may
set

    Y_i* = 0   if δ_i = 0.                                      (1.2.7)

The variable Y* = Σ_{i=1}^{I_1} Y_i* equals Y if it is observed (i.e. if δ_i = 1 for some
i = 1, …, I_1), and Y* = 0 if any of the states i = I_1+1, …, I occur. In other words,
Y is observed when Σ_{i=1}^{I_1} δ_i = 1.
To obtain specifications of various density functions that are useful in the
econometrics of labor supply, rationing and state contingent demand theory, let
f(y, z) be the joint density of (Y, Z). Denote the conditional support of Z when
(Y, Z) ∈ Ω_i as Θ_{i|y}, which is defined so that, for any fixed Y = y ∈ Ψ_i, the set of
admissible Y values in Ω_i, the event Z ∈ Θ_{i|y} necessarily implies δ_i = 1; the set
Θ_{i|y} in general depends on Y = y. In this notation, the density of Y_i* conditional
on δ_i = 1 is

    g_i(y_i*) = [∫_{Θ_{i|y}} f(y_i*, z) dz] / Pr(δ_i = 1)   for y_i* ∈ Ψ_i,   i = 1, …, I_1,   (1.2.8)

with

    Pr(δ_i = 1) = Pr((Y, Z) ∈ Ω_i) = ∫_{Ω_i} f(y, z) dy dz,

where the notation ∫_{Θ_{i|y}} and ∫_{Ω_i} denotes integration over the sets Θ_{i|y} (for
given y) and Ω_i, respectively.
The function g_i(·) is the conditional density of Y given that selection rule (1.2.6)
is satisfied. As a consequence of convention (1.2.7), the distribution of Y_i* when
δ_i = 0 has point mass at y_i* = 0 (i.e. Pr(Y_i* = 0 | δ_i = 0) = 1).
The joint density of Y_i* and δ_i is

    g_i(y_i*, δ_i) = [g_i(y_i*) Pr(δ_i = 1)]^{δ_i} [J(y_i*) Pr(δ_i = 0)]^{1−δ_i},   i = 1, …, I_1,   (1.2.9)

where J(y_i*) = 1 if y_i* = 0 and J(y_i*) = 0 otherwise, where Pr(δ_i = 0) = 1 − Pr(δ_i = 1),
and where we adopt the convention that zero raised to the zero-th power
equals one (i.e. when δ_i = 0 and y_i* ∉ Ψ_i, so g_i(y_i*) = 0, then
[g_i(y_i*) Pr(δ_i = 1)]⁰ = 1).⁴
From (1.2.9) the conditional density of Y* given that state i = 1, …, I_1 occurs
is g_i(y*). Y* is defined to be degenerate at zero if one of the other states
i = I_1+1, …, I occurs. A compact expression for the conditional density of Y* is

    h(y* | δ_1, …, δ_I) = Π_{i=1}^{I_1} [g_i(y*)]^{δ_i},   if Σ_{i=1}^{I_1} δ_i = 1,   (1.2.10)
⁴We use the term "density" in the sense of the product measure d[y + K_0(y)] × d[K_0(z) + K_1(z)]
on R¹ × [0,1], where dy is Lebesgue measure on R¹ and K_a(·) is the probability distribution that
assigns the point a in R¹ unit mass.

with Y* = 0 with probability one when δ_i = 1 for some value of i = I_1+1, …, I.
The joint density of Y* and δ_1, …, δ_I is the product of the conditional density of
Y* (1.2.10) and the joint probability of δ_1, …, δ_I; i.e.

    h(y*, δ_1, …, δ_I) = Π_{i=1}^{I_1} [g_i(y*) Pr(δ_i = 1)]^{δ_i}
                         · Π_{i=I_1+1}^{I} [J(y*) Pr(δ_i = 1)]^{δ_i}.   (1.2.11)
In some problems the particular state of the world in which an observation
occurs is unknown (i.e. the δ_i's are not separately observed); it is only known that
one of a subset of states has occurred. Given information on Y*, one can
determine whether or not one of the first I_1 states has occurred - since Y* ≠ 0
indicates δ_i = 1 for some i ≤ I_1 and Y* = 0 indicates δ_i = 1 for some i > I_1 - but it
may not be possible to determine the particular i for which δ_i = 1.
For example, suppose that when Y* ≠ 0, one only knows that either δ̄_1 =
Σ_{i=1}^{l_1} δ_i = 1 or δ̄_2 = Σ_{i=l_1+1}^{I_1} δ_i = 1. Suppose further that when Y* = 0, it is only
known that δ̄_3 = Σ_{i=I_1+1}^{I} δ_i = 1. The densities (1.2.10) and (1.2.11) cannot directly
be used as a basis for inference in this situation. (Unless, of course, l_1 = 1, I_1 = 2,
and I = 3.)
The densities appropriate for analyzing data on Y* and the δ̄_i's are obtained
by conditioning on the available knowledge about states. The desired densities are
derived by computing the expected value of (1.2.10) to eliminate the individual
δ_i's that are not observed. In particular, the marginal density of Y* given δ̄_1 = 1 is
given by the law of iterated expectations as

    h(y* | δ̄_1 = 1) = E(h(y* | δ_1, …, δ_I) | δ̄_1 = 1)
                    = Σ_{i=1}^{l_1} h(y* | δ_i = 1) Pr(δ_i = 1 | δ̄_1 = 1)
                    = Σ_{i=1}^{l_1} g_i(y*) Pr(δ_i = 1)/Pr(δ̄_1 = 1).

Analogously, the density of Y* given δ̄_2 = 1 is

    h(y* | δ̄_2 = 1) = Σ_{i=l_1+1}^{I_1} g_i(y*) Pr(δ_i = 1)/Pr(δ̄_2 = 1).⁵

When δ̄_3 = 1, Y* is degenerate at zero. Thus the density of Y* conditional on the
⁵These derivations use the fact that the sets Ω_i are mutually exclusive so Pr(δ̄_1 = 1) = Σ_{i=1}^{l_1} Pr(δ_i = 1)
and E(δ_i | δ̄_1 = 1) = Pr(δ_i = 1 | δ̄_1 = 1) = Pr(δ_i = 1)/Pr(δ̄_1 = 1), with completely analogous results hold-
ing for δ̄_2 and δ̄_3.

δ̄_i's is given by

    h(y* | δ̄_1, δ̄_2, δ̄_3) = [Σ_{i=1}^{l_1} g_i(y*) Pr(δ_i = 1)/Pr(δ̄_1 = 1)]^{δ̄_1}
                          · [Σ_{i=l_1+1}^{I_1} g_i(y*) Pr(δ_i = 1)/Pr(δ̄_2 = 1)]^{δ̄_2},   (1.2.12)

where Y* has point mass at zero when δ̄_3 = 1 (i.e. Pr(Y* = 0 | δ̄_3 = 1) = 1).
Multiplying the conditional density (1.2.12) by the probability of the events δ̄_1,
δ̄_2, δ̄_3 generates the joint density for Y* and the δ̄_i's:

    h(y*, δ̄_1, δ̄_2, δ̄_3) = [Σ_{i=1}^{l_1} g_i(y*) Pr(δ_i = 1)]^{δ̄_1}
                         · [Σ_{i=l_1+1}^{I_1} g_i(y*) Pr(δ_i = 1)]^{δ̄_2}
                         · [J(y*) Pr(δ̄_3 = 1)]^{δ̄_3}.   (1.2.13)

Densities of the form (1.2.8)-(1.2.13) appear repeatedly in the models for the
analysis of labor supply presented in Section 3.3.
All the densities in the preceding analysis can be modified to depend on
exogenous variables X, as can the support of the selection region (i.e. Ω_i = Ω_i(X)).
Writing f(y, z | X) to denote the appropriate conditional density, only obvious
notational modifications are required to introduce such variables.

1.3. Sampling plans

A variety of different sampling plans are used to collect the data available to
labor economists. The econometric implications of data collected from such
sampling plans have received a great deal of attention in the discrete choice and
labor econometrics literatures. In this subsection we define the concepts of simple
random samples, truncated random samples, censored random samples, stratified
random samples, and choice based samples. To this end we let h(X) denote the
population density of the exogenous variables X, so that the joint density of
(Y, δ, X) is

    f(y, δ, X) = f(y, δ | X) h(X),                              (1.3.1)

with c.d.f.

    F(y, δ, X).                                                 (1.3.2)

From the definition of exogeneity, the marginal density of X contains no
parameters in common with the conditional density of (Y, δ) given X.
In the cases considered here, the underlying population is assumed to be
infinite and generated by probability density (1.3.1) and c.d.f. (1.3.2). If the
sampling is such that it produces a simple random sample, successive observations
must (a) be independent and (b) each observation must be a realization from a
common density (1.3.1). In this textbook case, the sample likelihood is the
product of terms of the form (1.3.1) with realized values of (Y, δ, X) substituted
in place of the random variables.
Next suppose that from a simple random sample, observations on (Y, δ, X) are
retained only if these random variables lie in some open subset of the support of
(Y, δ, X). More precisely, suppose that observations on (Y, δ, X) are retained only
if

    (Y, δ, X) ∈ A_1 ⊂ A,                                        (1.3.3)

where A is the support of the random variables (Y, δ, X).


In the classical statistical literature [see, e.g. Kendall and Stuart, Vol. II (1967)]
no regressors are assumed to appear in the model. In this case, a sample is defined
to be censored if the number of observations not in A_1 is recorded (so δ is known
for all observations). If this information is not retained, the sample is truncated.
When regressors are present, there are several ways to extend these definitions,
allowing either δ or X to be recorded when (Y, δ, X) ∉ A_1. In this paper we adopt
the following conventions. If information on (δ, X) for all (Y, δ, X) ∉ A_1 is
retained (but Y is not known), we call the sample censored. If information on
(δ, X) is not retained for (Y, δ, X) ∉ A_1, the sample is truncated. Note that in
these definitions A_1 can consist of disconnected subsets of A.
One operational difference between censored and truncated samples is that for
censored samples it is possible to consistently estimate the population probability
that (Y, δ, X) ∈ A_1, whereas for truncated samples these probabilities cannot be
consistently estimated as sample sizes become large. In neither sample is it
possible to directly estimate the conditional distribution of (Y, δ, X) given
(Y, δ, X) ∉ A_1 using an empirical c.d.f. for this subsample.⁶

⁶It is possible to estimate this conditional distribution using the subsample generated by the
requirement that (Y, δ, X) ∈ A_1 for certain specific functional form assumptions for F. Such forms for
F are termed "recoverable" in the literature. See Heckman and Singer (1986) for further discussion of
this issue of recoverability.

In the special case in which the subset A_1 only restricts the support of X
(exogenous truncated and censored samples), the econometric analysis can pro-
ceed conditional on X. In light of the assumed exogeneity of X, the only possible
econometric problem is a loss in efficiency of proposed estimators.
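The operational difference between censored and truncated samples is easily
demonstrated. In the sketch below (ours; the linear design and coefficients are
arbitrary illustrative assumptions, and A_1 is taken to be the event δ = 1), the
censored sample records (δ, X) for every draw, so the selection probability is
estimable, whereas the truncated sample discards the δ = 0 rows and, with them,
any information about how many observations were lost.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 50_000
    x = rng.normal(size=n)
    d = (0.5 + x + rng.normal(size=n) >= 0).astype(int)   # selection indicator
    y = 1.0 + 0.5 * x + rng.normal(size=n)                # recorded only when d = 1

    # censored sample: Y missing when d = 0, but (d, X) recorded for everyone
    print(d.mean())            # consistent estimate of Pr((Y, d, X) in A_1)

    # truncated sample: rows with d = 0 discarded entirely
    y_t, x_t = y[d == 1], x[d == 1]
    # (y_t, x_t) alone cannot reveal how many observations were dropped, so the
    # population probability of selection is not estimable from this subsample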
Truncated and censored samples are special cases of the more general notion of
a stratified sample. In place of the special sampling rule (1.3.3), in a general
stratified sample, the rule for selecting independent observations is such that even
in an infinite sample the probability that (Y, δ, X) ∈ A_i ⊂ A does not equal the
population probability that (Y, δ, X) ∈ A_i, where ∪_{i=1}^{I} A_i = A, and A_i and A_j
are disjoint for all i ≠ j. It is helpful to further distinguish between exogenously
stratified and endogenously stratified samples.
In an exogenously stratified sample, selection occurs solely on the X in the
sense that the sample distribution of X does not converge to the population
distribution of X even as the sample size is increased. This may occur because
data are systematically missing for X in certain regions of the support, or more
generally because some subsets of the support of X are oversampled. However,
conditional on X, the sample distribution of (Y, δ | X) converges to the population
distribution. By virtue of the assumed exogeneity of X, such a sampling scheme
creates no special econometric problems.
In an endogenously stratified sample, selection occurs on (Y, δ) (and also
possibly on the X), and the sampling rule is such that the sample distribution of
(Y, δ) does not converge to the population distribution F(Y, δ) (conditional or
unconditional on X). This can occur because data are missing for certain values
of Y or δ (or both), or because some subsets of the support of these random
variables are oversampled. The special case of an endogenously stratified sample
in which, conditional on (Y, δ), the population density of X characterizes the
data, i.e.

    h(X | Y, δ) = f(Y, δ, X)/f(Y, δ),                           (1.3.4)

is termed choice based sampling in the literature. [See McFadden in Volume II or
the excellent survey article by Manski and McFadden (1981).⁷] In a general
endogenously stratified sample, (1.3.4) need not characterize the density of the
data produced by an infinite repetition of the sampling rule. Moreover, in both
choice based and more general endogenously stratified samples, the sample
distribution of X depends on the parameters of the conditional distribution of
(Y, δ) given X so, as a consequence of the sampling rules, X is no longer
⁷Strictly speaking, the choice based sampling literature focuses on a model in which Y is integrated
out of the model so that δ and X are the relevant random variables.

exogenous in such samples, and its distribution is informative on the structural


parameters of the model.
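A small numerical sketch (ours; the probit-style design and equal-sized strata on
δ are illustrative assumptions) makes the point. Drawing fixed numbers of
observations from each δ stratum reproduces the population density of X within
strata, as in (1.3.4), yet both the sample frequency of δ and the sample
distribution of X differ from their population counterparts.

    import numpy as np

    rng = np.random.default_rng(3)
    N = 1_000_000
    x = rng.normal(size=N)
    d = (0.2 + x + rng.normal(size=N) >= 0).astype(int)
    print(d.mean(), x.mean())        # population share of d = 1 and mean of X

    # choice based sample: draw 5000 observations from each stratum of d
    i1 = rng.choice(np.flatnonzero(d == 1), size=5_000, replace=False)
    i0 = rng.choice(np.flatnonzero(d == 0), size=5_000, replace=False)
    s = np.concatenate([i1, i0])
    print(d[s].mean(), x[s].mean())  # d-share is 0.5 by design; X is shifted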
Truncated and censored samples are special cases of a general stratified sample.
A truncated sample is produced from a general stratified sample for which the
sampling weight for the event (Y, δ, X) ∉ A_1 is identically zero. In a censored
sample, the sampling weight for the event (Y, δ, X) ∉ A_1 is the same as the
population probability of the event.
Note that in a truncated sample, observed Y may or may not be a truncated
random variable. For example, if A_1 only restricts δ, and δ does not restrict the
support of Y, observed Y is a censored random variable. On the other hand, if A_1
restricts the support of Y, observed Y is a truncated random variable. Similarly in
a censored sample, Y may or may not be censored. For example, if A_1 is defined
only by a restriction on values that δ can assume, and δ does not restrict the
support of Y, observed Y is censored. If A_1 is defined by a restriction on the
support of Y, observed Y is truncated even though the sample is censored. An
unfortunate and sometimes confusing nomenclature thus appears in the literature.
The concepts of censored and truncated random variables are to be carefully
distinguished from the concepts of censored and truncated random samples.
Truncated and censored sample selection rule (1.3.3) is essentially identical to
the selection rule (1.2.6) (augmented to include X in the manner suggested at the
end of subsection 1.2). Thus the econometric analysis of models generated by
rules such as (1.2.6) can be applied without modification to the analysis of models
estimated on truncated and censored samples. The same can be said of the
econometric analysis of models fit on all stratified samples for which the sampling
rule can be expressed as some restriction on the support of (Y, Z, 6, X). In the
recent research in labor econometrics, all of the sample selection rules considered
can be written in this form, and an analysis based on samples generated by
(augmented) versions of (1.2.6) captures the essence of the recent literature.⁸

2. Estimation

The conventional approach to estimating the parameters of index function models


postulates specific functional forms for f(y, z) or f(y, z | X) and estimates the
parameters of these densities by the method of maximum likelihood or by the
method of moments. Pearson (1901) invoked a normality assumption in his
original work on index function models and this assumption is still often used in

⁸We note, however, that it is possible to construct examples of stratified sample selection rules that
cannot be cast in this format. For example, selection rules that weight various strata in different
(nonzero) proportions than the population proportions cannot be cast in the form of selection rule
(1.2.6).

recent work in labor econometrics. The normality assumption has come under
attack in the recent literature because, when its implications have been subjected
to empirical test, they have often been rejected.
It is essential to separate conceptual ideas that are valid for any index function
model from results special to the normal model. Most of the conceptual frame-
work underlying the normal index model is valid in a general nonnormal setting.
In this section we focus on general ideas and refer the reader to specific papers in
the literature where relevant details of normal models are presented.
For two reasons we do not discuss estimation of index function models by the
method of maximum likelihood. First, once the appropriate densities are derived,
there is little to say about the method beyond what already appears in the
literature. [See Amemiya (1985).] We devote attention to the derivation of the
appropriate densities in Section 3. Second, it is our experience that the conditions
required to secure identification of an index function model are more easily
understood when stated in a regression or method of moments framework.
Discussions of identifiability that appeal to the nonsingularity of an information
matrix have no intuitive appeal and often degenerate into empty tautologies. For
these reasons we focus attention on regression and method of moments proce-
dures.

2.1. Regression function characterizations

We begin by presenting a regression function characterization of the econometric


problems encountered in the analysis of data collected from truncated, censored
and stratified samples and models with truncated and censored random variables.
We start with a simple two equation linear regression specification for the
underlying index functions and derive the conditional expectations of the ob-
served counterparts of the index variables. More elaborate models are then
developed. We next present several procedures for estimating the parameters of
the regression specifications.

2.1.1. A prototypical regression specification

A special case of the index function framework set out in Section 1 writes Y and
Z as scalar random variables which are assumed to be linear functions of a
common set of exogenous variables X and unobservables U and V, respectively.⁹

⁹By exogenous variables we mean that X is observed and is distributed independently of (U, V) and
that the parameters of the distribution of X are not functions of the parameters (β, γ) or the
parameters of the distribution of (U, V).

    Y = Xβ + U,                                                 (2.1.1)
    Z = Xγ + V,                                                 (2.1.2)

where (β, γ) is a pair of suitably dimensioned parameter vectors, and Y is
observed only if Z ∈ Θ_1, a proper subset of the support of Z. For expositional
convenience we initially assume that the sample selection rule depends only on
the value of Z and not directly on Y. In terms of the notation of Section 1, we
begin by considering a case in which Y is observed if (Y, Z) ∈ Ω_1, where Ω_1 is a
subset of the support of (Y, Z) defined by Ω_1 = {(Y, Z) | −∞ ≤ Y ≤ ∞, Z ∈ Θ_1}.
For the moment, we also restrict attention to a two-state model. State 1 occurs if
Z ∈ Θ_1 and state 0 is observed if Z ∉ Θ_1. We later generalize the analysis to
consider inclusion rules that depend explicitly on Y and we also consider
multi-state models.
The joint density of (U, V), denoted by f(u, v), depends on parameters ψ and
may depend on the exogenous variables X. Since elements of β, γ, and ψ may be
zero, there is no loss of generality in assuming that a common X vector enters
(2.1.1), (2.1.2) and the density of (U, V).
As in Section 1, we define the indicator function

    δ = 1 if Z ∈ Θ_1,
    δ = 0 otherwise.

In a censored regression model in which Y is observed only if δ = 1, we define
Y* = Y if δ = 1 and use the convention that Y* = 0 if δ = 0. In shorthand
notation, Y* = δY.

The conditional expectation of Y given δ = 1 and X is

    E(Y | δ = 1, X) = Xβ + M,                                   (2.1.3)

where

    M = M(Xγ, ψ) = E(U | δ = 1, X)

is the conditional expectation of U given X and the event Z ∈ Θ_1. If the disturbance U
is independent of V, M = 0. If the disturbances are not independent, M is in
general a nontrivial function of X and the parameters of the model (γ, ψ).
Note that since Y* = δY, by the law of iterated expectations

    E(Y* | X) = E(Y* | δ = 0, X) Pr(δ = 0 | X) + E(Y* | δ = 1, X) Pr(δ = 1 | X)
              = (Xβ + M) Pr(δ = 1 | X).                         (2.1.4)

Applying the analysis of Section 1, the conditional distribution of U given X
and Z ∈ Θ_1 is

    f(u | Z ∈ Θ_1, X) = [∫_{Θ_1} f(u, z − Xγ) dz] / P_1,         (2.1.5)

where P_1 = Pr(Z ∈ Θ_1 | X) is the probability that δ = 1 given X. P_1 is defined as

    P_1 = ∫_{Θ_1} f_v(z − Xγ) dz,                                (2.1.6)

where f_v(·) denotes the marginal density of V. Hence,

    M = [∫_{−∞}^{∞} ∫_{Θ_1} u f(u, z − Xγ) dz du] / P_1.        (2.1.7)

A regression of Y on X using a sample of observations restricted to have δ = 1
omits the term M from the regression function (2.1.3), and familiar specification
bias error arguments apply.
For example, consider a variable X_j that appears in both equations (so the jth
coefficients of β and γ are nonzero). A regression of Y on X fit on samples
restricted to satisfy δ = 1 that does not include M as a regressor produces
coefficients that do not converge to β. Letting "^" denote the OLS coefficient,

    plim β̂_j = β_j + L_{MX_j},

where L_{MX_j} is the probability limit of the coefficient of X_j in a projection of M on
X.¹⁰ Note that if a variable X_k that does not appear in (2.1.1) is introduced into a
least squares equation that omits M, the least squares coefficient converges to

    plim β̂_k = L_{MX_k},

so X_k may proxy M.
The essential feature of both examples is that in samples selected so that δ = 1,
X is no longer exogenous with respect to the disturbance term U* (= δU),

¹⁰It is not the case that L_{MX_j} = ∂M/∂X_j, although the approximation may be very close. See
Byron and Bera (1983).

although it is defined to be exogenous with respect to U. The distribution of U *


depends on X (see the expression for M below (2.1.3)). As X is varied, the mean
of the distribution of U * is changed. Estimated regression coefficients combine
the desired ceteris paribus effect of X on Y (holding U * fixed) with the effect of
changes in X on the mean of U *.
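The specification bias is straightforward to reproduce by simulation. In the sketch
below (ours; the bivariate normal errors, coefficient values, and the selection rule
Z ≥ 0 are illustrative assumptions), least squares on the δ = 1 subsample recovers
a slope well below the true β because the omitted term M(Xγ) decreases in X in
this design.

    import numpy as np

    rng = np.random.default_rng(4)
    n = 200_000
    x = rng.normal(size=n)
    u, v = rng.multivariate_normal([0, 0], [[1.0, 0.7], [0.7, 1.0]], size=n).T
    y = 1.0 + 2.0 * x + u              # outcome equation: beta = (1, 2)
    sel = (0.5 + 1.0 * x + v) >= 0     # selection rule: delta = 1 when Z >= 0

    X1 = np.column_stack([np.ones(sel.sum()), x[sel]])
    b_ols = np.linalg.lstsq(X1, y[sel], rcond=None)[0]
    print(b_ols)   # slope converges to 2 + L_{MX}, which is below 2 here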
Characterizing a sample as a subsample from a larger random sample gener-
ated by having Z ∈ Θ_1 encompasses two distinct ideas that are sometimes
confused in the literature. The first idea is that of self-selection. For example, in a
simple model of labor supply an individual chooses either to work or not to work.
An index function Z representing the difference between the utility of working
and of not working can be used to characterize this decision. From an initial
random sample, a sample of workers is not random since Z ≥ 0 for each worker.
The second idea is a more general concept - that of sample selection - which
includes the first idea as a special case. From a simple random sample, some rule
is used to generate the sample used in an empirical analysis. These rules may or
may not be the consequences of choices made by the individuals being studied.
Econometric solutions to the general sample selection bias problem and the
self-selection bias problem are identical. Both the early work on female labor
supply and the later analysis of “experimental data” generated from stratified
samples sought to eliminate the effects of sample selection bias on estimated
structural labor supply and earnings functions.
It has been our experience that many statisticians and some econometricians
find these ideas quite alien. From the context-free view of mathematical statistics,
it seems odd to define a sample of workers as a selected sample if the object of
the empirical analysis is to estimate hours of work equations. "After all," the
argument is sometimes made, “nonworkers give us no information about the
determinants of working hours.”
This view ignores the fact that meaningful behavioral theories postulate a
common decision process used by all agents (e.g. utility maximization). In
neoclassical labor supply theory all agents are assumed to possess preference
orderings over goods and leisure. Some agents choose not to work, but non-
workers still possess well-defined preference functions. Equations like (2.1.1) are
defined for all agents in the population and it is the estimation of the parameters
of the population distribution of preferences that is the goal of structural econo-
metric analysis. Estimating functions on samples selected on the basis of choices
biases the estimates of the parameters of the distribution of population prefer-
ences unless explicit account is taken of the sample selection rule in the estima-
tion procedure.¹¹

¹¹Many statisticians implicitly adopt the extreme view that nonworkers come from a different
population than workers and that there is no commonality of decision processes and/or parameter
values in the two populations. In some contexts (e.g. in a single cross section) these two views are
empirically indistinguishable. See the discussion of recoverability in Heckman and Singer (1986).

2.1.2. Specification for selection corrections

In order to make the preceding theory empirically operational it is necessary to


know M (up to a vector of estimable parameters). One way to acquire this
information is to postulate a specific functional form for it directly. Doing so
makes clear that conventional regression corrections for sample selection bias
depend critically on assumptions about the correct functional form of the
underlying regression eq. (2.1.1) and the functional form of M.
The second and more commonly utilized approach used to generate M pos-
tulates specific functional forms for the density of (U, V) and derives the
conditional expectation of U given δ and X. Since in practice this density is
usually unknown, it is not obvious that this route for selecting M is any less ad
hoc than the first.
One commonly utilized assumption postulates a linear regression relationship
for the conditional expectation of U given V:

    E(U | V, X) = τV,                                           (2.1.8)

where τ is a regression coefficient. For example, (2.1.8) is generated if U and V
are bivariate normal random variables and X is exogenous with respect to U and
V. Many other joint densities for (U, V) also yield linear representation (2.1.8).
[See Kagan, Linnik and Rao (1973).]
Equation (2.1.8) implies that the selection term M can be written as

    M = E(U | δ = 1, X) = τE(V | δ = 1, X).                     (2.1.9)

Knowledge of the marginal distribution of V determines the functional form of
the selection bias term.
Letting f_v(v) denote the marginal density of V, it follows from the analysis of
Section 1 that

    E(V | δ = 1, X) = [∫_{Γ_1} v f_v(v) dv] / P_1,               (2.1.10)

where the set Γ_1 = {V: V + Xγ ∈ Θ_1}, and

    P_1 = Prob(Z ∈ Θ_1 | X) = Prob(V ∈ Γ_1 | X) = ∫_{Γ_1} f_v(v) dv.   (2.1.11)

One commonly used specification of Θ_1 writes Θ_1 = {Z: Z ≥ 0}, so Γ_1 = {V:



V ≥ −Xγ}. In this case (2.1.10) and (2.1.11) become

    E(V | δ = 1, X) = E(V | V ≥ −Xγ, X) = [∫_{−Xγ}^{∞} v f_v(v) dv] / P_1,   (2.1.12)

and

    P_1 = Prob(δ = 1 | X) = ∫_{−Xγ}^{∞} f_v(v) dv = 1 − F_v(−Xγ),   (2.1.13)

respectively, where F_v(·) is the cumulative distribution function of V. Since Z is
not directly observed, it is permissible to arbitrarily normalize the variance of the
disturbance of the selection rule equation because division by a positive constant
does not change the probability content of the inequality that defines Γ_1. Thus,
E(U | δ = 1, X) is the same if one replaces f_v(v) with f_v(σv)/σ and reinterprets Γ_1
as {σV: V + (Xγ*)/σ ∈ Θ_1} using any σ > 0, where γ* = σγ. The normalization
for E(V²) that we adopt depends on the particular distribution under consider-
ation.
Numerous choices for f_v(v) have been advanced in the literature, yielding a
wide variety of functional forms for (2.1.12). Table 1 presents various specifica-
tions of f_v(v) and the implied specifications for E(V | δ = 1, X) = E(V | V ≥
−Xγ, X) proposed in work by Heckman (1976b, 1979), Goldberger (1983), Lee
(1982), and Olson (1980). Substituting the formulae for the truncated means
presented in the third column of the table into relation (2.1.9) produces an array
of useful expressions for the sample selection term M. All of the functions
appearing in these formulae - including the gamma, the incomplete gamma, and
the distribution functions - are available on most computers.
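The truncated-mean formulae are easy to verify numerically. The sketch below
(ours) checks the Laplace row of Table 1 against Monte Carlo estimates of
E(V | V ≥ −Xγ) at two illustrative values of Xγ.

    import numpy as np

    rng = np.random.default_rng(5)
    v = rng.laplace(size=2_000_000)    # standard Laplace: mean 0, variance 2
    for xg in (-0.5, 0.8):
        mc = v[v >= -xg].mean()        # Monte Carlo truncated mean
        exact = 1 - xg if xg <= 0 else (1 + xg) / (2 * np.exp(xg) - 1)  # Table 1
        print(xg, round(mc, 4), round(exact, 4))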
Inserting any of these expressions for M into eqs. (2.1.3) or (2.1.4) yields an
explicit specification for the regression relation associated with Y (or Y*) given
the selection rule generating the data. In order to generate (2.1.4) one requires a
formula for the probability that δ = 1 given X to complete the specification for
E(Y*). Formula (2.1.13) gives the specification of this probability in terms of the
cumulative distribution function of V.
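For the bivariate normal case these ingredients deliver the familiar two-step
procedure of Heckman (1976b, 1979): estimate γ from the binary data, form the
truncated mean φ(Xγ)/Φ(Xγ) at the estimated γ using the normal row of Table 1,
and add it as a regressor in the selected subsample. The sketch below is a minimal
illustration under assumed unit-variance bivariate normal errors; the simulated
design, starting values, and the use of a generic optimizer are our own illustrative
choices rather than the chapter's.

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import norm

    rng = np.random.default_rng(6)
    n = 50_000
    x = rng.normal(size=n)
    u, v = rng.multivariate_normal([0, 0], [[1.0, 0.7], [0.7, 1.0]], size=n).T
    d = (0.5 + 1.0 * x + v >= 0).astype(float)   # delta = 1 when Z >= 0
    y = 1.0 + 2.0 * x + u                        # observed only when d = 1
    W = np.column_stack([np.ones(n), x])

    def neg_loglik(g):                           # step 1: probit likelihood for gamma
        p = np.clip(norm.cdf(W @ g), 1e-12, 1 - 1e-12)
        return -(d * np.log(p) + (1 - d) * np.log(1 - p)).sum()

    g_hat = minimize(neg_loglik, np.zeros(2), method="BFGS").x
    lam = norm.pdf(W @ g_hat) / norm.cdf(W @ g_hat)  # E(V | delta = 1, X), Table 1
    sel = d == 1
    X2 = np.column_stack([W[sel], lam[sel]])         # step 2: add M = tau * lambda
    b = np.linalg.lstsq(X2, y[sel], rcond=None)[0]
    print(b)   # roughly (1, 2, 0.7); the lambda coefficient estimates tau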
In place of the linear conditional expectation (2.1.8), Lee (1982) suggests a
more general nonlinear conditional expectation of U given V. Drawing on
well-known results in the statistics literature, Lee suggests application of Edge-
worth-type expansions. For the bivariate Gram-Charlier series expansion, the
conditional expectation of U given V and exogenous X is

    E(U | V, X) = ρV + B(V)/A(V),                               (2.1.14)
Table 1
Truncated mean formulae for selected zero-mean distributions.

Distribution                  Density                               Truncated mean^a
(Mean, Variance,              f_v(v)                                E(v | v ≥ −Xγ)
Sign of Skewness^e)

Normal                        (2π)^{−1/2} e^{−v²/2}                 φ(Xγ)/Φ(Xγ)^d
(0, 1, 0)

Student's t^b                 Γ((n+1)/2)/[Γ(n/2)(nπ)^{1/2}]         [n + (Xγ)²] f_v(−Xγ) /
(0, n/(n−2), 0)                 · (1 + v²/n)^{−(n+1)/2}               [(n−1)(1 − F_v(−Xγ))]

Chi-square^c                  [2^{n/2} Γ(n/2)]^{−1} (v+n)^{n/2−1}   2[Γ(n/2+1) − G(n/2+1, (n−Xγ)/2)] /
(0, 2n, +)                      · e^{−(v+n)/2}  for −n ≤ v < ∞        [Γ(n/2)(1 − F_v(−Xγ))] − n,
                                                                      for Xγ ≤ n

Logistic                      e^{v}/(1 + e^{v})²                    [−ln F_v(−Xγ) − Xγ F_v(Xγ)] / F_v(Xγ)
(0, π²/3, 0)

Laplace                       (1/2) e^{−|v|}                        1 − Xγ  for Xγ ≤ 0;
(0, 2, 0)                                                           (1 + Xγ)/(2e^{Xγ} − 1)  for Xγ > 0

Uniform                       1/√12  for |v| ≤ √3                   (√3 − Xγ)/2  for |Xγ| ≤ √3
(0, 1, 0)

Log-normal^d                  (2π)^{−1/2} (e^{1/2} + v)^{−1}        e^{1/2}[Φ(1 − ln(e^{1/2} − Xγ)) /
(0, e² − e, +)                  · e^{−[ln(e^{1/2}+v)]²/2}             Φ(−ln(e^{1/2} − Xγ)) − 1],
                                for v ≥ −e^{1/2}                      for Xγ ≤ e^{1/2}

^a The function F_v(a) = ∫_{−∞}^{a} f_v(v) dv in these formulae is the cumulative distribution function.
^b The parameter n denotes degrees of freedom. For Student's t, it is assumed that n > 2. The function
Γ(a) = ∫_0^∞ y^{a−1} e^{−y} dy is the gamma function.
^c The function G(a, b) = ∫_0^b y^{a−1} e^{−y} dy is the incomplete gamma function.
^d The function Φ(·) represents the standardized normal cumulative distribution function.
^e Skewness is defined as mean minus the median.

with

    A(V) = 1 + μ_{03} A_3(V)/6 + (μ_{04} − 3) A_4(V)/24,
    B(V) = μ_{12} A_2(V)/2 + (μ_{13} − 3ρ) A_3(V)/6,

where ρ is the correlation coefficient of U and V, μ_{ij} = E(U^i V^j) are cross
moments of U and V, and the functions A_2(V) = V² − 1, A_3(V) = V³ − 3V, and
A_4(V) = V⁴ − 6V² + 3 are Hermite polynomials. Assuming the event V ≥ −Xγ
determines whether or not Y is observed, the selection term is

    M = E(U | V ≥ −Xγ, X) = [∫_{−Xγ}^{∞} (ρv + B(v)/A(v)) f_v(v) dv] / [1 − F_v(−Xγ)].   (2.1.15)

This expression does not have a simple analytical solution except in very special
cases. Lee (1982) invokes the assumption that V is a standard normal random
variable, in which case A(V) = 1 (since μ_{03} = μ_{04} − 3 = 0) and the conditional
mean is

    E(U | V) = Vρ + (V² − 1)(μ_{12}/2) + (V³ − 3V)(μ_{13} − 3ρ)/6.   (2.1.16)

For this specification, (2.1.15) reduces to

    M = [τ_1 + τ_2 Xγ + τ_3 ((Xγ)² − 1)] φ(Xγ)/Φ(Xγ),           (2.1.17)

where φ(·) and Φ(·) are, respectively, the density function and the cumulative
distribution functions associated with a standard normal distribution, and τ_1, τ_2,
and τ_3 are parameters.¹²

¹²The requirement that V is normally distributed is not as restrictive as it may first appear. In
particular, suppose that the distribution of V, F_v(·), is not normal. Defining J(·) as the transforma-
tion Φ^{−1} ∘ F_v, the random variable J(V) is normally distributed with mean zero and a variance equal
to one. Define a new unobserved dependent variable Z_n by the equation

    Z_n = −J(−Xγ) + J(V).                                       (*)

Since J(·) is monotonic, the events Z_n ≥ 0 and Z ≥ 0 are equivalent. All the analysis in the text
continues to apply if eq. (*) is substituted in place of eq. (2.1.2) and the quantities Xγ and V are
replaced everywhere by −J(−Xγ) and J(V), respectively. Notice that expression (2.1.17) for M
obtained by replacing Xγ by −J(−Xγ) does not arise by making a change of variables from V to
J(V) in performing the integration appearing in (2.1.15). Thus, (2.1.17) does not arise from a
Gram-Charlier expansion of the bivariate density for U and nonnormal V; instead, it is derived from
a Gram-Charlier expansion applied to the bivariate density of U and normal J(V).
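The transformation argument in footnote 12 can be checked numerically. A brief
sketch (ours, using a logistic F_v as the illustrative nonnormal distribution)
confirms that J = Φ^{−1} ∘ F_v yields a standard normal variate and leaves the
selection event unchanged.

    import numpy as np
    from scipy.stats import norm, logistic

    rng = np.random.default_rng(7)
    v = rng.logistic(size=200_000)            # nonnormal selection disturbance
    J = lambda t: norm.ppf(logistic.cdf(t))   # J = Phi^{-1} composed with F_v
    xg = 0.4
    print(np.mean(v >= -xg), np.mean(J(v) >= J(-xg)))   # identical events
    print(J(v).mean(), J(v).std())                      # approximately N(0, 1)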
1938 .I. J. Heckmun and T. E. MaCurdy

An obvious generalization of (2.1.8) or (2.1.16) assumes that

    E(U | V, X) = Σ_{k=1}^{K} τ_k g_k(V),                        (2.1.18)

where the g_k(·)'s are known functions. The functional form implied for the
selection term is

    M = E(U | V ≥ −Xγ, X) = Σ_{k=1}^{K} τ_k E(g_k(V) | V ≥ −Xγ, X)
                          = Σ_{k=1}^{K} τ_k m_k(X).              (2.1.19)

Specifying a particular functional form for the g_k's and the marginal distribution
for V produces an entire class of sample selection corrections that includes Lee's
procedure as a special case.
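Under standard normal V the truncated expectations of the Hermite terms have
closed forms, so the m_k(X) regressors in (2.1.19) can be built directly from a
first-stage index. The helper below is a sketch of ours; the function name and the
choice of three Hermite terms mirroring (2.1.17) are illustrative.

    import numpy as np
    from scipy.stats import norm

    def selection_terms(index):
        # m_k(X) for g_k = Hermite polynomials under standard normal V:
        # E(V | V >= -index) = lam; E(V^2 - 1 | .) = -index * lam;
        # E(V^3 - 3V | .) = (index^2 - 1) * lam
        lam = norm.pdf(index) / norm.cdf(index)   # inverse Mills ratio
        return np.column_stack([lam, -index * lam, (index ** 2 - 1) * lam])

    # usage: append selection_terms(W @ g_hat) for the selected observations
    # to the second-step regressors in place of the single lambda term
    print(selection_terms(np.linspace(-1.0, 1.0, 3)))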
Cosslett (1984) presents a more robust procedure that can be cast in the format
of eq. (2.1.19). With his methods it is possible to consistently estimate the
distribution of V, the functions m_k, the parameters τ_k, and K, the number of
terms in the expansion. In independent work, Gallant and Nychka (1984) present
a more robust procedure for correcting models for sample selection bias assuming
that the joint density of (U, V) is twice continuously differentiable. Their analysis
does not require specifications like (2.1.8), (2.1.14) or (2.1.18) or prior specifica-
tion of the distribution of V.

2.1.3. Multi-state generalizations

Among many possible generalizations of the preceding analysis, one of the most empirically fruitful considers the situation in which the dependent variable Y is generated by a different linear equation for each state of the world. This model includes the "switching regression" model of Quandt (1958, 1972). The occurrence of a particular state of the world results from Z falling into one of the mutually exclusive and exhaustive subsets of Θ, Θ_i, i = 0,…,I. The event Z ∈ Θ_i signals the occurrence of the ith state of the world. We also suppose that Y is observed in states i = 1,…,I and is not observed in state i = 0. In state i > 0, the equation for Y is

Y = Xβ_i + U_i,    (2.1.20)

where the U,‘s are error terms with E(U,) = 0. Define U = (U,, . . . , U,), and let
f,,,(lJ, V) be the joint density of U and the disturbance V of the equation
determining Z. The value of the discrete dependent variable

if Z E Oj,
(2.1.21)
otherwise,

records whether or not state i occurs. In this notation the equation determining
the censored version of Y may be written as

y*= i: 6,(xp,+uJ, (2.1.22)

where we continue to adopt the convention that Y* = 0 when Y is not observed (i.e. when Z ∈ Θ₀).
It is useful to distinguish two cases of this model. In the first case all states of the world are observed by the analyst, so that the values of the δ_i's are known by the econometrician for all i. In the second case not all of the δ_i's are known by the econometrician. The analysis of the first case closely parallels the analysis presented for the simple two-state model.
For the first case, the regression function for observed Y given δ_i = 1, X, and i ≠ 0, is

E(Y | δ_i = 1, X) = Xβ_i + M_i,    (2.1.23)

with

M_i ≡ E(U_i | Z ∈ Θ_i, X) = (1/P_i) ∫_{−∞}^{∞} ∫_{Θ_i} U_i f_{U_i,V}(U_i, z − Xγ) dz dU_i,    (2.1.24)

where f_{U_i,V}(·,·) denotes the joint density of U_i and V, and P_i = Prob(Z ∈ Θ_i | X) is
the probability that state i occurs.
Paralleling the analysis of Section 2.1.2, one can develop explicit specifications for each selection bias correction term M_i by using formulae such as (2.1.9), (2.1.14) or (2.1.18). With the convention that Y* = 0 when δ₀ = 1, the regression functions (2.1.23) can be combined into a single relation

E(Y* | δ₁, δ₂,…,δ_I, X) = Σ_{i=1}^{I} δ_i(Xβ_i + M_i).    (2.1.25)

In the second case considered here not all states of the world are observed by the econometrician. It often happens that it is known if Y is observed, and the value of Y is known if it is observed, but it is not known which of a number of possible states has occurred. In such a case, one might observe whether δ₀ = 1 or δ₀ = 0 (i.e. whether Σ_{i=1}^{I} δ_i = 0 or Σ_{i=1}^{I} δ_i = 1), but not individual values of the δ_i's for i = 1,…,I. Examples of such situations are given in our discussion of labor supply presented in Section 3.3.
To determine the appropriate regression equation for Y in this second case, it is necessary to compute the expected value of Y given by (2.1.22) conditional on δ₀ = 0 and X. This expectation is

E(Y | δ₀ = 0, X) = Σ_{i=1}^{I} (Xβ_i + M_i) P_i/(1 − P₀),    (2.1.26)

where P_i = Prob(Z ∈ Θ_i | X).¹³ Relation (2.1.26) is the regression of Y on X for the case in which Y is observed but the particular state occupied by an observation is not observed.
Using (2.1.22), and recalling that Y* = Y(1 − δ₀) is a censored random variable, the regression of Y* on X is

E(Y* | X) = Σ_{i=1}^{I} (Xβ_i + M_i) P_i.    (2.1.27)

If Y is observed for all states of the world, then Y* = Y, δ₀ = 0, and (2.1.26) and (2.1.27) are identical because the set Θ₀ is the null set so that P₀ = 0 and Σ_{i=1}^{I} P_i = 1.
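As a small numeric illustration of the relation between (2.1.26) and (2.1.27), the sketch below forms both conditional means from state-specific levels Xβ_i + M_i and state probabilities P_i; all numbers are made up for illustration.

```python
import numpy as np

# Illustrative state-specific regression levels X*beta_i + M_i for i = 1, 2, 3,
# and state probabilities P_0, ..., P_3 (state 0 is the unobserved state).
levels = np.array([1.5, 2.0, 3.0])
P = np.array([0.1, 0.3, 0.4, 0.2])                  # sums to one

E_Y_given_observed = np.sum(levels * P[1:]) / (1.0 - P[0])   # eq. (2.1.26)
E_Y_star = np.sum(levels * P[1:])                            # eq. (2.1.27)
print(E_Y_given_observed, E_Y_star)                 # identical only if P[0] = 0
```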

2.1.4. Generalization of the regression framework

Extensions of the basic framework presented above provide a rich structure for
analyzing a wide variety of problems in labor econometrics. We briefly consider
three useful generalizations.
The first relaxes the linearity assumption maintained in the specification of the
equations determining the dependent variables Y and Z. In eqs. (2.1.1) and (2.1.2)
substitute h_Y(X, β) for Xβ and h_Z(X, γ) for Xγ, where h_Y(·,·) and h_Z(·,·) are known nonlinear functions of exogenous variables and parameters. Modifying the preceding analysis and formulae to accommodate this change in specification only requires replacing the quantities Xβ and Xγ everywhere by the functions h_Y and h_Z. A completely analogous modification of the multi-state model introduces nonlinear specifications for the conditional expectation of Y in the various states.

¹³In order to obtain (2.1.26) we use the fact that the Θ_i's are nonintersecting sets, so that Prob(δ_i = 1 | δ₀ = 0, X) = Prob(δ_i = 1 | Σ_{j=1}^{I} δ_j = 1, X) = P_i/(1 − P₀) for i = 1,…,I.
A second generalization extends the preceding framework of Sections 2.1.1-2.1.3
by interpreting Y, Z and the errors U and V as vectors. This extension enables the
analyst to consider a multiplicity of behavioral functions as well as a broad range
of sampling rules. No conceptual problems are raised by this generalization but
severe computational problems must be faced. Now the sets Θ_i are multidimensional. Tallis (1961) derives the conditional means relevant for the linear multivariate normal model, but it remains a challenge to find other multivariate specifications that yield tractable analytical results. Moreover, work on estimating the multivariate normal model has just begun [e.g. see Catsiapis and Robinson (1982)]. A current area of research is the development of computationally tractable specifications for the means of the disturbance vector U conditional on the occurrence of alternative states of the world.
A third generalization allows the sample selection rule to depend directly on
realized values of Y. For this case, the sets Θ_i are replaced by the sets Ω_i, where (Y, Z) ∈ Ω_i designates the occupation of state i. The integrals in the preceding formulae are now defined over the Ω_i. In place of the expression for the selection term M in (2.1.7), use the more general formula

E(U | (Y, Z) ∈ Ω₁, X) = (1/P₁) ∫∫_{Ω₁} (y − Xβ) f_{U,V}(y − Xβ, z − Xγ) dz dy,

where

P₁ = ∫∫_{Ω₁} f_{U,V}(y − Xβ, z − Xγ) dz dy,

is the probability that δ₁ = 1 given X. This formula specializes to the expression (2.1.7) for M when Ω₁ = {(Y, Z): −∞ ≤ Y ≤ ∞ and Z ∈ Θ₁}, i.e. when Z alone determines whether state 1 occurs.

2.1.5. Methods for estimating the regression specifications

We next consider estimating the regression specifications associated with the


elementary two-state model (2.1.1) and (2.1.2). This simple specification is by far
the most widely used model encountered in the literature. Estimation procedures

available for this two-state model can be directly generalized to more complicated
models.
For the two-state model, expression (2.1.3) implies that the regression equation for Y conditional on X and δ = 1 is given by

Y = Xβ + M + e,

where e = U − E(U | δ = 1, X) is a disturbance with E(e | δ = 1, X) = 0. Choosing specification (2.1.9), (2.1.17), or one based on (2.1.19) for the selection term M leads to

M = mτ with m = m(Xγ, ψ),    (2.1.28)

where ψ are unknown parameters of the density function for V. If, for example, specification (2.1.9) is chosen, m(Xγ, ψ) = E(V | V ≥ −Xγ), which can be any one of the truncated mean formulae presented in Table 1. If, on the other hand, specification (2.1.19) is chosen, τ and m are to be interpreted as vectors with τ′ = (τ₁,…,τ_K) and m = (m₁,…,m_K). The regression equation for Y is

Y = Xβ + mτ + e.    (2.1.29)

The implied regression equation for the censored dependent variable Y* = δY is

Y* = (Xβ + mτ)(1 − F_v(−Xγ; ψ)) + ε,    (2.1.30)

where ε is a disturbance with E(ε | X) = 0 and we now make explicit the dependence of F_v on ψ.
The appropriate procedure for estimating the parameters of regression eqs.
(2.1.29) and (2.1.30) depends on the sampling plan that generates the available
data. It is important to distinguish between two types of samples discussed in Section 1: truncated samples, which include data on Y and X only for observations for which the value of the dependent variable Y is actually known (i.e. where Z ≥ 0 for the model under consideration here), and censored samples, which include data on Y* and X from a simple random sample of δ, X and Y*.
For a truncated sample, nonlinear least squares applied to regression eq. (2.1.29) can be used to estimate the coefficients β and τ and the parameters γ and ψ which enter this equation through the function m. More specifically, defining the function g and the parameter vector θ as g(X, θ) = Xβ + m(Xγ, ψ)τ and θ′ = (β′, τ′, γ′, ψ′), eq. (2.1.29) can be written as

Y = g(X, θ) + e.    (2.1.31)

Since the disturbance e has a zero mean conditional on X and δ = 1 and is


distributed independently across the observations in the truncated sample, under
standard conditions [see Amemiya (1985)] nonlinear least squares estimators of
the parameters of this equation are both consistent and asymptotically normally
distributed.

In general, the disturbance e is heteroscedastic, and the functional form of the heteroscedasticity is unknown unless the joint density f_{U,V} is specified. As a consequence, when calculating the large-sample covariance matrix of θ̂, it is necessary to use methods proposed by Eicker (1963, 1967) and White (1981) to consistently estimate this covariance matrix in the presence of arbitrary heteroscedasticity. The literature demonstrates that the estimator θ̂ is approximately normally distributed in large samples with the true value θ as its mean and a variance-covariance matrix given by H⁻¹RH⁻¹ with

H = Σ_{n=1}^{N} [∂g_n/∂θ][∂g_n/∂θ]′|_θ̂,  R = Σ_{n=1}^{N} ê_n² [∂g_n/∂θ][∂g_n/∂θ]′|_θ̂,    (2.1.32)

where N is the size of the truncated sample, ∂g_n/∂θ|_θ̂ denotes the gradient vector of g for the nth observation evaluated at θ̂, and ê_n symbolizes the least squares residual for observation n. Thus

θ̂ ~ N(θ, H⁻¹RH⁻¹).    (2.1.33)
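The sandwich matrix can be computed directly from the fitted residuals and the gradients of g. The sketch below does so with numerical derivatives for a deliberately toy choice of g; the functional form, the data, and all parameter values are illustrative assumptions, not taken from the text.

```python
import numpy as np
from scipy.optimize import least_squares

# A minimal sketch of the robust covariance H^{-1} R H^{-1} in (2.1.32)-(2.1.33)
# for a toy nonlinear regression function g(X, theta).
def g(theta, X):
    return (X @ theta[:2]) * (1.0 + np.exp(theta[2] * X[:, 1]))**-1

rng = np.random.default_rng(1)
N = 400
X = np.column_stack([np.ones(N), rng.normal(size=N)])
theta0 = np.array([1.0, 0.5, 0.3])
Y = g(theta0, X) + rng.normal(scale=0.5 + 0.5 * np.abs(X[:, 1]), size=N)

fit = least_squares(lambda th: Y - g(th, X), x0=np.zeros(3))
theta_hat, e_hat = fit.x, Y - g(fit.x, X)

# numerical gradient dg_n/dtheta at theta_hat, one row per observation
eps = 1e-6
G = np.column_stack([(g(theta_hat + eps * np.eye(3)[j], X) -
                      g(theta_hat - eps * np.eye(3)[j], X)) / (2 * eps)
                     for j in range(3)])
H = G.T @ G                                  # sum over n of grad * grad'
R = (G * e_hat[:, None]**2).T @ G            # sum of e_n^2 * grad * grad'
cov = np.linalg.inv(H) @ R @ np.linalg.inv(H)
print(np.sqrt(np.diag(cov)))                 # heteroscedasticity-robust s.e.'s
```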

For censored samples, two regression methods are available for estimating the parameters β, τ, γ, and ψ. First, one can apply the nonlinear least squares procedure just described to estimate regression eq. (2.1.30). In particular, reinterpreting the function g as g(X, θ) = [Xβ + m(Xγ, ψ)τ](1 − F_v(−Xγ; ψ)), it is straightforward to write eq. (2.1.30) in the form of an equation analogous to (2.1.31) with Y* and ε replacing Y and e. Since the disturbance ε has a zero mean conditional on X and is distributed independently across the observations making up the censored sample, under standard regularity conditions nonlinear least squares applied to this equation yields a consistent estimator θ̂ with a large-sample normal distribution. To account for potential heteroscedasticity, compute the asymptotic variance-covariance matrix of θ̂ using the formula in (2.1.33) with the matrices H and R calculated by summing over the N* observations of the censored sample.
A second type of regression procedure can be implemented on censored samples. A two-step procedure can be applied to estimate the equation for Y given by (2.1.29). In the first step, obtain consistent estimates of the parameters γ and ψ from a discrete choice analysis which estimates the parameters of Pr(δ = 1 | X). From these estimates it is possible to consistently estimate m (or the variables in the vector m). More specifically, define θ₂′ = (γ′, ψ′) as a parameter vector which uniquely determines m as a function of X. The log likelihood function for the independently distributed discrete variables δ_n given X_n, n = 1,…,N*, is

Σ_{n=1}^{N*} [δ_n ln(1 − F_v(−X_nγ; ψ)) + (1 − δ_n) ln(F_v(−X_nγ; ψ))].    (2.1.34)

Under general conditions [see Amemiya (1985) for one statement of these conditions], maximum likelihood estimators of γ and ψ are consistent, and with maximum likelihood estimates θ̂₂ one can construct m̂_n = m(X_nγ̂, ψ̂) for each observation. In step two of the proposed estimation procedure, replace the unobserved variable m in regression eq. (2.1.29) by its constructed counterpart m̂ and apply linear least squares to the resulting equation using only data from the subsample in which Y and X are observed. Provided that the model is identified, the second step produces estimators for the parameters θ₁′ = (β′, τ′) that are both consistent and asymptotically normally distributed.
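For the special case in which V is standard normal, so that F_v = Φ and m(Xγ) = φ(Xγ)/Φ(Xγ), the two-step procedure just described can be sketched as follows on simulated data. The simulation design and all parameter values are assumptions made purely for illustration.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Minimal two-step sketch for the normal-V special case.
rng = np.random.default_rng(2)
N = 2000
X = np.column_stack([np.ones(N), rng.normal(size=N), rng.normal(size=N)])
U, V = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=N).T
gamma0, beta0 = np.array([0.3, 1.0, -0.5]), np.array([1.0, 2.0, 0.0])
delta = (X @ gamma0 + V >= 0).astype(float)
Y = X @ beta0 + U                       # used below only where delta == 1

# step 1: probit maximum likelihood, eq. (2.1.34) with F_v = Phi
def neg_loglik(g):
    p = np.clip(norm.cdf(X @ g), 1e-10, 1 - 1e-10)
    return -np.sum(delta * np.log(p) + (1 - delta) * np.log(1 - p))

gamma_hat = minimize(neg_loglik, x0=np.zeros(3)).x

# step 2: OLS of Y on [X, m_hat] over the delta == 1 subsample
m_hat = norm.pdf(X @ gamma_hat) / norm.cdf(X @ gamma_hat)
W = np.column_stack([X, m_hat])[delta == 1]
coef = np.linalg.lstsq(W, Y[delta == 1], rcond=None)[0]
print(coef)         # (beta', tau)'; here tau estimates cov(U, V) = 0.6
```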
When calculating the appropriate large-sample covariance matrix for the least squares estimator θ̂₁, one must account for the fact that in general the disturbances of the regression equation are heteroscedastic and that the variables m̂ are estimated quantities. A consistent estimator for the covariance matrix which accounts for both of these features is given by¹⁴

C = Q₁⁻¹Q₂Q₁⁻¹ + Q₁⁻¹Q₃Q₄Q₃′Q₁⁻¹,    (2.1.35)

where Q₄ is the covariance matrix for θ̂₂ estimated by maximum likelihood [minus the inverse of the Hessian matrix of (2.1.34)], and the matrices Q₁, Q₂, and Q₃ are defined by

Q₁ = Σ_{n=1}^{N} w_n w_n′,  Q₂ = Σ_{n=1}^{N} w_n w_n′ ẽ_n²,  and  Q₃ = Σ_{n=1}^{N} w_n [∂ẽ_n/∂θ₂′],    (2.1.36)

where the row vector w_n = (X_n, m̂_n)′ denotes the regressors for the nth observation, the variable ẽ_n symbolizes the least-squares residual, and the row vector ∂ẽ_n/∂θ₂′ is the gradient of the function e_n = Y_n − X_nβ − m_nτ with respect to γ and ψ evaluated at the maximum likelihood estimates γ̂ and ψ̂ and at the least squares estimates β̂ and τ̂, i.e.

∂ẽ_n/∂θ₂′ = −Σ_{k=1}^{K} τ̂_k [∂m_{kn}/∂θ₂′]|_θ̂₂.    (2.1.37)

¹⁴To derive the expression for the matrix C given by (2.1.35) we use the following result. Let L_n = L(θ₂, X_n) denote the nth observation on the gradient of the likelihood function (2.1.34) with respect to θ₂, with this gradient viewed as a function of the data and the true value of θ₂; and let w₀_n and e₀_n be w_n and e_n evaluated at the true parameter values. Then E(w₀_n e₀_n L_n′ | δ_n = 1, X_n) = w₀_n E(e₀_n | δ_n = 1, X_n) L_n′ = 0.

The large-sample distribution for the two-step estimator is thus

θ̂₁ ~ N(θ₁, C).    (2.1.38)

2.2. Dummy endogenous variable models

One specialization of the general model presented in Section 2.1 is of special


importance in labor economics. The multi-state equation system (2.1.20)-(2.1.22)
is at the heart of a variety of models of the impact of unions, training,
occupational choice, schooling, the choice of region of residence and the choice of
industry on wages. These models have attracted considerable attention in the
recent literature.
This section considers certain aspects of model formulation for this class of
models. Simple consistent estimators are presented for an empirically interesting
subclass of these models. These estimators require fewer assumptions than are
required for distribution dependent maximum likelihood methods or for the
sample selection bias corrections (M functions) discussed in Section 2.1.
In order to focus on essential ideas, we consider a two-equation, two-state
model with a single scalar dummy right-hand side variable that can assume two
values. Y is assumed to be observed in both states so that we also abstract from
censoring. Generalization of this model to the vector case is performed in
Heckman (1976a, 1978, Appendix), Schmidt (1981), and Lee (1981).

2.2.1. Specification of a two-equation system

Two versions of the dummy endogenous variable model are commonly confused
in the literature: fixed coefficient models and random coefficient models. These
specifications should be carefully distinguished because different assumptions are
required to consistently estimate the parameters of these two distinct models. The
fixed coefficient model requires fewer assumptions.
In the fixed coefficient model

Y = Xβ + δα + U,    (2.2.1)

Z = Xγ + V,    (2.2.2)

where

δ = 1 if Z ≥ 0, δ = 0 otherwise,

U and V are mean zero random disturbances, and X is exogenous with respect to U. Simultaneous equation bias is present in (2.2.1) when U is correlated with δ.
In the random coefficient model the effect of δ on Y (holding U fixed) varies in the population. In place of (2.2.1) we write

Y = Xβ + δ(α + ε) + U,    (2.2.3)

where ε is a mean zero error term.¹⁵ Equation (2.2.2) is unchanged except now V may be correlated with ε as well as U. The response to δ = 1 differs in the population, with successively sampled observations assumed to be random draws from a common distribution for (U, ε, V). In this model X is assumed to be exogenous with respect to (U, ε). Regrouping terms, specification (2.2.3) may be rewritten as

Y = Xβ + δα + (U + εδ).    (2.2.4)

Unless δ is uncorrelated with ε (which occurs in some interesting economic models - see Section 3.2), the expectation of the composite error term U + εδ in (2.2.4) is nonzero because E(εδ) ≠ 0. This aspect of the random coefficient model makes its econometric analysis fundamentally different from the econometric analysis of the fixed coefficient model. Simultaneous equations bias is present in the random coefficient model if the composite error term in (2.2.4) is correlated with δ.
Both the random coefficient model and the fixed coefficient model are special cases of the multi-state "switching" model presented in Section 2.1.3. Rewriting random coefficient specification (2.2.3) as

Y = δ(α + Xβ + U + ε) + (1 − δ)(Xβ + U),    (2.2.5)

this equation is of the form of multi-state eq. (2.1.22). The equivalence of (2.2.5) and (2.1.22) follows directly from specializing the multi-state framework so that: (i) δ₀ ≡ 0 (so that there is no censoring and Y = Y*); (ii) I = 2 (which along with (i) implies that there are two states); (iii) δ = 1 indicates the occurrence of state 1 and the events δ₁ = 1 and δ₂ = 0 (with 1 − δ = 1 indicating the realization of state 2); and (iv) Xβ₂ = Xβ, U₂ = U, Xβ₁ = Xβ + α, and U₁ = U + ε. In this notation eq. (2.2.3) may be written as

Y = Xβ₂ + δX(β₁ − β₂) + U₂ + (U₁ − U₂)δ.    (2.2.6)

One empirically fruitful generalization of this model relaxes (iv) by letting both slope and intercept coefficients differ in the two regimes. Equation (2.2.6) with condition (iv) modified so that β₁ and β₂ are freely specified can also be used to represent this generalization.
Fixed coefficient specification (2.2.1) specializes the random coefficient model further by setting ε = 0, so U₁ − U₂ = 0 in (2.2.6). In the fixed coefficient model, U₁ = U₂, so that the unobservables in the state specific eqs. (2.1.20) are identical in each state. Examples of economic models which produce this specification are given below in Section 3.2.

¹⁵Individuals may or may not know their own value of ε. "Randomness" as used here refers to the econometrician's ignorance of ε.
The random coefficient and the fixed coefficient models are sometimes confused
in the literature. For example, recent research on the union effects on wage rates
has been unclear about the distinction [e.g. see Freeman (1984)]. Many of the
cross section estimates of the union impact on wage rates have been produced
from the random coefficient model [e.g. see Lee (1978)] whereas most of the recent
longitudinal estimates are based on a fixed coefficient model, or a model that can
be transformed into that format [e.g. see Chamberlain (1982)]. Estimates from
these two data sources are not directly comparable because they are based on
different model specifications.¹⁶
Before we consider methods for estimating both models, we mention one aspect
of model formulation that has led to considerable confusion in the recent
literature. Consider an extension of equation system (2.2.1)-(2.2.2) in which
dummy variables appear on the right-hand side of each equation

Y = Xβ + α₁δ₂ + U,    (2.2.7a)

Z = Xγ + α₂δ₁ + V,    (2.2.7b)

where

δ₁ = 1 if Y ≥ 0, δ₁ = 0 otherwise,

and

δ₂ = 1 if Z ≥ 0, δ₂ = 0 otherwise.

Without imposing further restrictions on the support of the random variables (U, V), this model makes no statistical sense unless

α₁α₂ = 0.    (2.2.8)

[See Heckman (1978) or Schmidt (1981).] This assumption - termed the "principal assumption" in the literature - rules out contradictions such as the possibility that Y ≥ 0 but δ₁ = 0, or other such contradictions between the signs of the elements of (Y, Z) and the values assumed by the elements of (δ₁, δ₂).

¹⁶For further discussion of this point, see Heckman and Robb (1985).
The principal assumption is a logical requirement that any well-formulated
behavioral model must satisfy. An apparent source of confusion on this point
arises from interpreting (2.2.7) as well-specified behavioral relationships. In the
absence of a precise specification determining the behavioral content of (2.2.7) it
is incomplete. The principal assumption forces the analyst to estimate a well-
specified behavioral and statistical model. This point is developed in the context
of a closely related model in an appendix to this paper.

2.2.2. Estimation of the fixed coefficient model

In this subsection we consider methods for consistently estimating the fixed


coefficient dummy endogenous variable model and examine the identifiability
assumptions that must be invoked in order to recover the parameters of this
model. We do not discuss estimation of discrete choice eq. (2.2.2) and we focus
solely on estimating (2.2.1). An attractive feature of some of the estimators
discussed below is that the parameters of (2.2.1) can be identified even when no
regressor appears in (2.2.2) or when the conditions required to define (2.2.2) as a
conventional discrete choice model are not satisfied. It is sometimes possible to
decouple the estimation of these two equations.

2.2.2.1. Instrumental variable estimation. Equation (2.2.1) is a standard linear simultaneous equation with δ as an endogenous variable. A simple method for estimating the parameters of this equation is a conventional instrumental variable procedure. Since E(U|X) = 0, X and functions of X are valid instrumental variables. If there is at least one variable in X with a nonzero γ coefficient in (2.2.2) such that the variable (or some known transformation of it) is linearly independent of X included in (2.2.1), then this variable (or its transformation) can be used as an instrumental variable for δ in the estimation of (2.2.1).
These conditions for identification are very weak. The functional forms of the distributions of U or V need not be specified. The variables X (or more precisely Xγ) need not be distributed independently of V, so that (2.2.2) is not required to be a well-specified discrete choice model.
If (2.2.2) is a well-specified discrete choice model, then the elements of X and a consistent estimator of E(δ|X) = P(δ = 1|X) constitute an optimal choice for the instrumental variables according to well-known results in the analysis of nonlinear two-stage least squares [e.g. see Amemiya (1985, Chapter 8)]. Choosing X and simple polynomials in X as instruments can often achieve comparable asymptotic efficiency. Conventional formulae for the sampling error of instrumental variable estimators fully apply in this context.
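A minimal numeric sketch of this instrumental variable estimator follows, with one variable (here called x2) entering the index equation but excluded from (2.2.1) so that it can serve as an instrument for δ. The data generating process and all values are hypothetical.

```python
import numpy as np

# IV sketch for eq. (2.2.1): Y = X beta + delta alpha + U.
rng = np.random.default_rng(3)
N = 5000
x1, x2 = rng.normal(size=N), rng.normal(size=N)
U = rng.normal(size=N)
V = 0.7 * U + rng.normal(size=N)               # makes delta endogenous
delta = (0.5 + 1.0 * x2 + V >= 0).astype(float)
Y = 1.0 + 0.5 * x1 + 2.0 * delta + U           # true alpha = 2

Xout = np.column_stack([np.ones(N), x1])       # regressors in (2.2.1)
W = np.column_stack([Xout, delta])             # right-hand side, incl. delta
Z = np.column_stack([Xout, x2])                # instruments: X and excluded x2
iv = np.linalg.solve(Z.T @ W, Z.T @ Y)         # (Z'W)^{-1} Z'Y
print(iv)                                      # last element estimates alpha
```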

2.2.2.2. Conditioning on X. The M function regression estimators presented in Section 2.1 are based on the conditional expectation of Y given X and δ. It is often possible to consistently estimate the parameters of (2.2.1) using the conditional expectation of Y given only X.
From the specification of (2.2.1) we have

E(Y|X) = Xβ + αE(δ|X).    (2.2.9)

Notice that (if X is distributed independently of V)

E(δ|X) = 1 − F_v(−Xγ).    (2.2.10)

Given knowledge of the functional form of F_v, one can estimate (2.2.9) by nonlinear least squares. The standard errors for this procedure are given by (2.1.32) and (2.1.33), where g in these formulae is defined by g(X, θ) = Xβ + α(1 − F_v(−Xγ)), so that the residual for observation n is ê_n = Y_n − X_nβ̂ − α̂(1 − F_v(−X_nγ̂)).
One benefit of this direct estimation procedure is that the estimator is consistent even if δ is measured with error, because measurements on δ are never directly used in the estimation procedure. Notice that the procedure requires specification of the distribution of V (or at least its estimation). Specification of the distribution of U or the joint distribution of U and V is not required.
2.2.2.3. Invoking a distributional assumption about U. The coefficients of (2.2.1) can be identified if some assumptions are made about the distribution of U. No assumption need be made about the distribution of V or its stochastic dependence with U. It is not required to precisely specify discrete choice eq. (2.2.2) or to use nonlinearities or exclusion restrictions involving exogenous variables which are utilized in the two estimation strategies just presented. No exogenous variables need appear in either equation.
If U is normal, α and β are identified given standard rank conditions even if no regressor appears in the index function equation determining the dummy variable (2.2.2). Heckman and Robb (1985) establish that if E(U³) = E(U⁵) = 0, which is implied by, but weaker than, assuming symmetry or normality of U, α and β are identified even if no regressor appears in the index function (2.2.2). It is thus possible to estimate (2.2.1) without a regressor in the index function equation determining δ or without making any assumption about the marginal distribution of V, provided that stronger assumptions are maintained about the marginal distribution of U.
In order to see how identification is secured in this case, consider a simplified version of (2.2.1) with only an intercept and dummy variable δ:

Y = β₀ + δα + U.    (2.2.11)

Assume E(U³) = 0 = E(U⁵). With observations indexed by n, the method of moments estimator solves for α̂ from the pair of moment equations that equate sample moments to their population values:

(1/N) Σ_{n=1}^{N} [(Y_n − Ȳ) − α̂(δ_n − δ̄)]³ = 0,    (2.2.12a)

and

(1/N) Σ_{n=1}^{N} [(Y_n − Ȳ) − α̂(δ_n − δ̄)]⁵ = 0,    (2.2.12b)

where Ȳ and δ̄ are sample means of Y and δ respectively. There is only one consistent root that satisfies both equations. The inconsistent roots of (2.2.12a) do not converge to the inconsistent roots of (2.2.12b). Choosing a value of α̂ to minimize a suitably weighted sum of squared discrepancies from (2.2.12a) and (2.2.12b) (or choosing any other metric) solves the small sample problem that for any finite N (2.2.12a) and (2.2.12b) cannot be simultaneously satisfied. For proof of these assertions and discussion of alternative moment conditions on U to secure identification of the fixed coefficient model, see Heckman and Robb (1985).
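The following sketch illustrates the moment conditions (2.2.12a)-(2.2.12b) on simulated data with a symmetric (uniform) U, searching for the α̂ that minimizes an equally weighted sum of the two squared discrepancies. The design, the unit weights, and the search bounds are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# U is symmetric, so E(U^3) = E(U^5) = 0; delta is deliberately correlated
# with U and no regressor drives it.
rng = np.random.default_rng(4)
N = 20000
U = rng.uniform(-1, 1, size=N)
delta = (0.8 * U + 0.3 * rng.normal(size=N) > 0).astype(float)
Y = 1.0 + 1.5 * delta + U                        # true alpha = 1.5

yd, dd = Y - Y.mean(), delta - delta.mean()

def criterion(a):
    r = yd - a * dd
    return np.mean(r**3)**2 + np.mean(r**5)**2   # sum of squared moment gaps

alpha_hat = minimize_scalar(criterion, bounds=(0.0, 3.0), method="bounded").x
print(alpha_hat)                                 # close to 1.5 in large samples
```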

2.2.3. Estimation of the random coefficient model

Many of the robust consistent estimators for the fixed coefficient model are inconsistent when applied to estimate α in the random coefficient model.¹⁷ The reason this is so is that in general the composite error term of (2.2.4) does not possess a zero conditional (on X) or unconditional mean. More precisely, E(εδ|X) ≠ 0 and E(εδ) ≠ 0 even though E(U|X) = 0 and E(U) = 0.¹⁸ The instrumental variable estimator of Section 2.2.2.1 is inconsistent because E(U + δε|X) ≠ 0, and so X and functions of X are not valid instruments. The nonlinear least squares estimator of Section 2.2.2.2 that conditions on X is also in general inconsistent. Instead of (2.2.9), the conditional expectation of Y given X for eq. (2.2.4) is

E(Y|X) = Xβ + αE(δ|X) + E(δε|X).    (2.2.13)

¹⁷In certain problems the coefficient of interest is α + E(ε|δ = 1). Reparameterizing (2.2.4) to make this rather than α the parameter of econometric interest effectively converts the random coefficient model back into a fixed coefficient model when no regressors appear in index function (2.2.2).
¹⁸However, some of the models presented in Section 3.2 have a zero unconditional mean for δε. This can occur when ε is unknown at the time an agent makes decisions about δ.

Inconsistency of the nonlinear least squares estimator arises because the unobserved omitted term E(δε|X) is correlated with the regressors in eq. (2.2.9).

2.2.3.1. Selectivity corrected regression estimators. The analysis of Section 2.1.5 provides two regression methods for estimating the parameters of a random coefficient model. From eq. (2.2.6), a general specification of this model is

Y = δ(Xβ₁ + U₁) + (1 − δ)(Xβ₂ + U₂).    (2.2.14)

Relation (2.1.25) for the multi-state model of Section 2.1 implies that the regression equation for Y on δ and X is

Y = δ(Xβ₁ + M₁) + (1 − δ)(Xβ₂ + M₂) + e,    (2.2.15)

where M₁ = E(U₁|δ = 1, X), M₂ = E(U₂|δ = 0, X), and e = δ(U₁ − M₁) + (1 − δ)(U₂ − M₂). Using selection specification (2.1.28),

M_i = m_iτ_i, where m_i ≡ m_i(Xγ, ψ), i = 1, 2,    (2.2.16)

where the functional forms of the elements of the row vectors m₁ and m₂ depend on the particular specification chosen from Section 2.1.2.¹⁹ Substituting (2.2.16) into (2.2.15), the regression equation for Y becomes

Y = X₁*β₁ + X₂*β₂ + m₁*τ₁ + m₂*τ₂ + e,    (2.2.17)

where

X₁* = δX, X₂* = (1 − δ)X, m₁* = δm₁, and m₂* = (1 − δ)m₂.

Given familiar regularity conditions, the nonlinear least-squares estimator of (2.2.17) is consistent and approximately distributed according to the large-sample normal distribution given by (2.1.33), where the matrices H and R are defined by (2.1.32) with g_n in these formulae given by

g_n = δ_n[X_nβ₁ + m₁(X_nγ, ψ)τ₁] + (1 − δ_n)[X_nβ₂ + m₂(X_nγ, ψ)τ₂].
A second approach adapts the two-step estimation scheme outlined in Section 2.1.5. Using maximum likelihood estimates θ̂₂ of the parameter vector θ₂ = (γ′, ψ′)′, construct estimates m̂_in = m_i(X_nγ̂, ψ̂), i = 1, 2, for each observation.

¹⁹Inspection of eq. (2.2.2) and the process generating δ reveals that the events δ = 1 and δ = 0 correspond to the conditions V ≥ −Xγ and V < −Xγ; and, consequently, the functions M₁ and M₂ have forms completely analogous to the selection correction M whose specification is the topic of Section 2.1.2.

Replacing unobserved m₁ and m₂ in (2.2.17) by their observed counterparts m̂₁ and m̂₂, the application of linear least-squares to the resulting equation yields an estimate θ̂₁ of the parameter vector θ₁ = (β₁′, β₂′, τ₁′, τ₂′)′. Given standard assumptions, the estimator θ̂₁ is consistent and approximately normally distributed in large samples. The covariance matrix C in (2.1.38) in this case is given by (2.1.35), and the matrices Q₁, Q₂ and Q₃ are as defined by (2.1.36) with w_n = (X*_{1n}, X*_{2n}, m̂*_{1n}, m̂*_{2n})′ and where

∂ẽ_n/∂θ₂′ = −δ_n Σ_k τ̂_{1k}[∂m̂_{1kn}/∂θ₂′] − (1 − δ_n) Σ_k τ̂_{2k}[∂m̂_{2kn}/∂θ₂′].    (2.2.18)
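Under joint normality of (V, U₁, U₂), the selection terms specialize to m₁ = φ(Xγ)/Φ(Xγ) in the δ = 1 regime and m₂ = −φ(Xγ)/(1 − Φ(Xγ)) in the δ = 0 regime, and the two-step estimator just described can be sketched as follows on simulated data; all parameter values are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Two-step sketch of the endogenous switching model (2.2.14)-(2.2.17).
rng = np.random.default_rng(5)
N = 4000
X = np.column_stack([np.ones(N), rng.normal(size=N)])
S = np.array([[1.0, 0.5, 0.5], [0.5, 1.0, 0.2], [0.5, 0.2, 1.0]])
V, U1, U2 = rng.multivariate_normal(np.zeros(3), S, size=N).T
gamma0 = np.array([0.2, 1.0])
delta = (X @ gamma0 + V >= 0).astype(float)
Y = np.where(delta == 1, X @ [1.0, 2.0] + U1, X @ [0.5, 1.0] + U2)

def neg_loglik(g):                               # probit first step
    p = np.clip(norm.cdf(X @ g), 1e-10, 1 - 1e-10)
    return -np.sum(delta * np.log(p) + (1 - delta) * np.log(1 - p))

g_hat = minimize(neg_loglik, x0=np.zeros(2)).x
idx = X @ g_hat
m1 = norm.pdf(idx) / norm.cdf(idx)
m2 = -norm.pdf(idx) / (1.0 - norm.cdf(idx))

for d, m in [(1, m1), (0, m2)]:                  # regime-by-regime OLS
    sel = delta == d
    W = np.column_stack([X[sel], m[sel]])
    print(d, np.linalg.lstsq(W, Y[sel], rcond=None)[0])
```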

3. Applications of the index function model

This section applies the index function framework to specific problems in labor
economics. These applications give economic content to the statistical framework
presented above and demonstrate that a wide range of behavioral models can be
represented as index function models.
Three prototypical models are considered. We first present models with a
“reservation wage” property. In a variety of models for the analysis of unemploy-
ment, job turnover and labor force participation, an agent’s decision process can
be characterized by the rule “stay in the current state until an offered wage
exceeds a reservation wage.” The second prototype we consider is a dummy
endogenous variable model that has been used to estimate the impact of school-
ing, training, occupational choice, migration, unionism and job turnover on
wages. The third model we discuss is one for labor force participation and hours
of work in the presence of taxes and fixed costs of work.

3.1. Models with the reservation wage property

Many models possess a reservation wage property, including models for the analysis of unemployment spells [e.g. Kiefer and Neumann (1979), Yoon (1981, 1984), Flinn and Heckman (1982)], for labor force participation episodes [e.g. Heckman and Willis (1977), Heckman (1981), Heckman and MaCurdy (1980), Killingsworth (1983)], for job histories [e.g. Johnson (1978), Jovanovic (1979), Miller (1984), Flinn (1984)] and for fertility and labor supply [Moffitt (1984), Hotz and Miller (1984)]. Agents continue in a state until an opportunity arises (e.g. an offered wage) that exceeds the reservation wage for leaving the state currently

occupied. The index function framework has been used to formulate and estimate
such models.

3.1.1. A model of labor force participation

Agents at age t are assumed to possess a quasiconcave twice differentiable one


period utility function defined over goods (C(t)) and leisure (L(t)). Denote this
utility function by U(C(t), L(t)). We define leisure hours so that 0 < L(t) 5 1.
An agent is assumed to be able to freely choose his hours of work at a parametric
wage W(t). There are no fixed costs of work or taxes. At each age agents receive
unearned income, R(t), assumed to be nonnegative. Furthermore, to simplify the
exposition, we assume that there is no saving or borrowing, and decisions are
taken in an environment of certainty. Labor force participation models without
lending and borrowing constraints have been estimated by Heckman and MaCurdy
(1980) and Moffitt (1984).
In the simple model considered here, an agent does not work if his or her
reservation wage or value of time at home (the marginal rate of substitution
between goods and leisure evaluated at the no work position) exceeds the market
wage W(t). The reservation wage in the absence of savings is

W_R(t) = U_L(R(t), 1)/U_C(R(t), 1),

where U_C(·) and U_L(·) denote partial derivatives. The market wage W(t) is
assumed to be known to the agent but it is observed by the econometrician only if
the agent works.
In terms of the index function apparatus presented in Section 1,

Z(t) = W(t) − W_R(t).    (3.1.1)

If Z(t) ≥ 0 the agent works, δ(t) = 1, and the wage rate W(t) is observed. Thus the observed wage is a censored random variable

Y*(t) = W(t)δ(t).    (3.1.2)

The analysis of Sections 1 and 2 can be directly applied to formulate likelihood


functions for this model and to estimate its parameters.
For comparison with other economic models possessing the reservation wage
property, it is useful to consider the implications of this simple labor force
participation model for the duration of nonemployment. A nonworking spell
begins at t₁ and ends at t₂ provided that Z(t₁ − 1) > 0, Z(t₁ + j) ≤ 0 for j = 0,…,t₂ − t₁, and Z(t₂ + 1) > 0. Reversing their direction, these inequalities also characterize an employment spell that begins at t₁ and ends at t₂. Assuming

that unobservables in the model are distributed independently of each other in different time periods, the (conditional) probability that a spell that begins at t₁ lasts t₂ − t₁ + 1 periods is

[Π_{t=t₁}^{t₂} Pr(Z(t) ≤ 0)]·Pr(Z(t₂ + 1) > 0).    (3.1.3)

Precisely the same sort of specification arises in econometric models of search unemployment.
As a specific example of a deterministic model of labor force participation, assume that

U(C(t), L(t)) = C(t)^α + A(t)L(t)^γ.

Setting A(t) = exp{X(t)β₂ + e(t)}, where e(t) is a mean zero disturbance, the reservation wage is

ln W_R(t) = X(t)β₂ + ln(γ/α) + (1 − α) ln R(t) + e(t).

The equation for log wage rates can be written as

ln W(t) = X(t)β₁ + U(t).

Define an index function for this example as Z(t) = ln W(t) − ln W_R(t), so that

Z(t) = X(t)(β₁ − β₂) − ln(γ/α) − (1 − α) ln R(t) + V(t),

where V(t) = U(t) − e(t). Define another index function Y as

Y(t) = ln W(t) = X(t)β₁ + U(t),

and a censored random variable Y*(t) by

Y*(t) = Y(t)δ(t) = δ(t)X(t)β₁ + δ(t)U(t).

Assuming that (X(t), R(t)) is distributed independently of V(t), and letting σ_v² = Var(V(t)), the conditional probability that δ(t) = 1 given X(t) and R(t) is

Pr(δ(t) = 1 | X(t), R(t)) = 1 − G_v([X(t)(β₂ − β₁) + ln(γ/α) + (1 − α) ln R(t)]/σ_v),

where G_v is the c.d.f. of V(t)/σ_v. If V(t) is distributed independently across all t,


the probability that a spell of employment begins at t = t₁ and ends at t = t₂ conditional on t₁ is

[Π_{t=t₁}^{t₂} Pr(δ(t) = 1 | X(t), R(t))]·Pr(δ(t₂ + 1) = 0 | X(t₂ + 1), R(t₂ + 1)).

Assuming a functional form for G_v, under standard conditions it is possible to use discrete choice methods to consistently estimate (β₂ − β₁)/σ_v (except for the intercept) and (1 − α)/σ_v. Using the M function regression estimators discussed in Section 2.1, under standard conditions it is possible to estimate β₁ consistently. Provided that there is one regressor in X with a nonzero β₁ coefficient and with a zero coefficient in β₂, it is possible to estimate σ_v and α from the discrete choice analysis. Hence it is possible to consistently estimate β₂. These exclusion restrictions provide one method for identifying the parameters of the model. In the context of a one period model of labor supply, such exclusion restrictions are plausible.
In dynamic models of labor supply with savings such exclusion restrictions are
implausible. This is so because the equilibrium reservation wage function de-
termining labor force participation in any period depends on the wages received
in all periods in which agents work. Variables that determine wage rates in
working periods determine reservation wages in all periods. Conventional simulta-
neous equation exclusion restrictions cannot be used to secure identification in
this model. Identifiability can be achieved by exploiting the (nonlinear) restric-
tions produced by economic theory as embodied in particular functional forms.
Precisely the same problem arises in econometric models of search unemploy-
ment, a topic to which we turn next.

3.1.2. A model of search unemployment

The index function model provides the framework required to give econometric
content to the conventional model of search unemployment. As in the labor force
participation example just presented, agents continue on in a state of search
unemployment until they receive an offered wage that exceeds their reservation
wage. Accepted wages are thus censored random variables. The only novelty in
the application of the index function to the unemployment problem is that a
different economic theory is used to produce the reservation wage.
In the most elementary version of the search model, agents are income
maximizers. An unemployed agent’s decision problem is very simple. If cost c is
incurred in a period, the agent receives a job offer but the wage that comes with
the offer is unknown before the offer arrives. This uncertainty is fundamental to
the problem. Successive wage offers are assumed to be independent realizations from a known absolutely continuous wage distribution F(W) with E|W| < ∞. Assum-

ing a positive real interest rate r, no search on the job, and jobs that last forever (so there is no quitting from jobs), Lippman and McCall (1976) show that the value of search at time t, V(t), is implicitly determined by the functional equation

V(t) = max{0; −c + (1 + r)⁻¹ E max[W/r; V(t + 1)]},    (3.1.4)

where the expectation is computed with respect to the distribution of W.


The decision process is quite simple. A searching agent spends c in period t
and faces two options in period t + 1: to accept a job which offers a per period
wage of W with present value W/r, or to continue searching, which option has
value V( t + 1). In period t, W is uncertain. Assuming that the nonmarket
alternative has a fixed nonstochastic value of 0, if V falls below 0, the agent
ceases to search. Lippman and McCall (1976) call the nonsearching state “out of
the labor force”.
Under very general conditions (see Robbins (1970) for one statement of these
conditions), the solution to the agent’s decision making problem has a reservation
wage characterization: search until the value of the option currently in hand
(W/r) exceeds the value of continuing on in the state, V(t + 1). For a time
homogeneous (stationary) environment, the solution to the search problem has a reservation wage characterization.²⁰
Focusing on the time homogeneous case to simplify the exposition, note that
V(t) = V(t + 1) ≡ V and that eq. (3.1.4) implies

rV + (1 + r)c = (1/r)∫_{rV}^{∞} (w − rV) dF(w) for rV ≥ 0.    (3.1.5)

The reservation wage is W_R = rV. This function clearly depends on c, r and the
parameters of the wage offer distribution. Conventional exclusion restrictions of
the sort invoked in the labor force participation example presented in the
previous section cannot be invoked for this model.
Solving (3.1.5) for W_R = rV and inserting the function so obtained into eqs.
(3.1.1) and (3.1.2) produces a statistical model that is identical to the deterministic
labor force participation model.
Except for special cases for F, closed form expressions for W_R are not available.²¹ Consequently, structural estimation of these models requires numeri-
cal evaluation of implicit functions (like V(t) in (3.1.4)) as input to evaluation of
sample likelihoods. To date, these computational problems have inhibited wide

²⁰The reservation wage property characterizes other models as well. See Lippman and McCall (1976).
²¹See Yoon (1981) for an approximate closed form expression of W_R.

scale use of structural models derived from dynamic optimizing theory and have
caused many analysts to adopt simplifying approximations.22
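To illustrate the kind of numerical work involved, the sketch below solves eq. (3.1.5) for W_R = rV by root finding under an assumed lognormal offer distribution; the distribution and the values of r and c are illustrative choices, not taken from the text.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq
from scipy.stats import lognorm

# Solve r*V + (1 + r)*c = (1/r) * integral_{rV}^inf (w - rV) dF(w) for W_R = r*V.
r, c = 0.05, 0.5
F = lognorm(s=0.4, scale=np.exp(0.0))            # ln W ~ N(0, 0.4^2)

def excess(x):
    # (1/r) * E[max(W - x, 0)] by numerical integration
    val, _ = quad(lambda w: (w - x) * F.pdf(w), x, np.inf)
    return val / r

w_res = brentq(lambda x: excess(x) - x - (1 + r) * c, 0.0, 50.0)
print("reservation wage:", w_res)
print("acceptance prob. per offer:", 1 - F.cdf(w_res))
```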
The density of accepted wages is

g(w*) = f(w*)/[1 − F(W_R)],  w* ≥ W_R,    (3.1.6)

which is truncated. Assuming that no serially correlated unobservables generate the wage offer distribution, the probability that an unemployment spell lasts j − 1 periods and terminates in period j is

[F(W_R)]^{j−1}[1 − F(W_R)].    (3.1.7)

The joint density of durations and accepted wages is the product of (3.1.6) and (3.1.7), or

h(w*, j) = [F(W_R)]^{j−1} f(w*),    (3.1.8a)

where

w* ≥ W_R.    (3.1.8b)

In general the distribution of wages, F(w), cannot be identified. While the


truncated distribution G(w*) is identified, F(w) cannot be recovered without
invoking some untestable assumption about F. If offered wages are normally
distributed, F is recoverable. If, on the other hand, offered wages are Pareto
random variables, F is not identified. Conditions under which F can be recovered
from G are presented in Heckman and Singer (1985).
Even if F is recoverable, not all of the parameters of the simple search model
can be identified. From eq. (3.1.5) it should be clear that even if rV and F were
known exactly, an infinity of nonnegative values of r and c solve that equation.
From data on accepted wages and durations it is not possible to estimate both r
and c without further restrictions.²³ One normalization sets r at a known value.²⁴

²²Coleman (1984) presents indirect reduced form estimation procedures which offer a low cost alternative to costly direct maximum likelihood procedures. Flinn and Heckman (1982), Miller (1985), Wolpin (1984), and Rust (1984) discuss explicit solutions to such dynamic problems. Kiefer and Neumann (1979), Yoon (1981, 1984), and Hotz and Miller (1984) present approximate solutions.
²³A potential source of such restrictions makes r and c known functions of exogenous variables.
²⁴Kiefer and Neumann (1979) achieve identification in this manner.

Even if r is fixed, the parameter c can only be identified by exploiting inequality (3.1.8b).²⁵
If a temporally persistent heterogeneity component η is introduced into the model (say due to unobserved components of c or r), the analysis becomes somewhat more difficult. To show this, write W_R as an explicit function of η, W_R = W_R(η). In place of (3.1.8b) there is an implied restriction on the support of η,

0 ≤ W_R(η) ≤ w*,    (3.1.9)

i.e. η is now restricted to produce a nonnegative reservation wage that is less than (or equal to) the offered accepted wage. Modifying density (3.1.8a) to reflect this dependence and letting ψ(η) be the density of η leads to

h(w*, j) = ∫_{{η | 0 ≤ W_R(η) ≤ w*}} [F(W_R(η))]^{j−1} f(w*) ψ(η) dη.    (3.1.10)

Unless restriction (3.1.9) is utilized, the model is not identified.²⁶

3.1.3. Models of job turnover

The index function model can also be used to provide a precise econometric
framework for models of on-the-job learning and job turnover developed by
Johnson (1978), Jovanovic (1979), Flinn (1984) and Miller (1985). In this class of
models, agents learn about their true productivity on a job by working at the job.
We consider the most elementary version of these models and assume that
workers are paid their realized marginal product, but that this product is due, in
part, to random factors beyond the control of the agent. Agents learn about their
true productivity by a standard Bayesian learning process. They have beliefs
about the value of their alternatives elsewhere. Ex ante all jobs look alike in the
simplest model and have value V₀.
The value of a job which currently pays wage W(t) in the tth period on the job is V(W(t)). An agent's decision at the end of period t given W(t) is to decide whether to stay on the job the next period or to go on to pursue an alternative opportunity. In this formulation, assuming no cost of mobility and a positive real interest rate r,

V(W(t)) = W(t) + (1 + r)⁻¹ max{E_t V(W(t + 1)); V₀},    (3.1.11)

²⁵See Flinn and Heckman (1982) for further discussion of this point.
²⁶For further discussion of identification in this model, see Flinn and Heckman (1982).

where the expectation is taken with respect to the distribution induced by the information available in period t, which may include the entire history of wage payments on the job. If V₀ > E_t V(W(t + 1)) the agent changes jobs. Otherwise, he continues on the job for one more period.
This setup can be represented by an index function model. Wages are observed at a job in period t + 1 if E_t(V(W(t + 1))) > V₀.

Z(t) = E_t[V(W(t + 1))] − V₀

is the index function characterizing job turnover behavior. If Z(t) ≥ 0, δ(t) = 1 and the agent stays on the current job. Otherwise, the agent leaves. Wages observed at job duration t are censored random variables Y*(t) = W(t)δ(t). As in the model of search unemployment, computation of sample likelihoods requires numerical evaluation of functional equations like (3.1.11).²⁷
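The following sketch conveys the flavor of such a computation. It replaces the Bayesian learning process with a simple Markov transition matrix over a discrete wage grid (purely a simplifying assumption made here) and iterates eq. (3.1.11) to a fixed point; all numerical values are illustrative.

```python
import numpy as np

# Value iteration on V(w) = w + max(E V(w'), V0) / (1 + r) over a wage grid.
r, V0 = 0.05, 210.0
w = np.linspace(5.0, 15.0, 101)                  # wage grid
# transition: next wage is current wage plus noise, discretized and normalized
Q = np.exp(-0.5 * ((w[None, :] - w[:, None]) / 1.0)**2)
Q /= Q.sum(axis=1, keepdims=True)

V = w / r                                        # starting guess
for _ in range(2000):
    V_new = w + np.maximum(Q @ V, V0) / (1 + r)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

stay = (Q @ V) >= V0                             # region where delta(t) = 1
print("lowest wage at which the worker stays:",
      w[stay].min() if stay.any() else None)
```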

3.2. Prototypical dummy endogenous variable models

In this subsection we consider some examples of well posed economic models that
can be cast in terms of the dummy endogenous variable framework presented in
Section 2.2. We consider fixed and random coefficient versions of these models for
both certain and uncertain environments. We focus only on the simplest models
in order to convey essential ideas.

3.2.1. The impact of training on earnings

Consider a model of the impact of training on earnings in which a trainee’s


decision to enroll is based on a comparison of the present value of earnings with
and without training in an environment of perfect foresight. Our analysis of this
model serves as a prototype for the analysis of the closely related problems of
assessing the impact of schooling, unions, and occupational choice on earnings.
Let the annual earnings of an individual in year t be

W(t) = X(t)β + δα + U(t) for t > k;  W(t) = X(t)β + U(t) for t ≤ k.    (3.2.1)

In writing this equation, we suppose that all individuals have access to training
at only one period in their life (period k) and that anyone can participate in

²⁷Miller (1984) provides a discussion and an example of estimation of this class of models.

training if he or she chooses to do so. However, once the opportunity to train has
passed, it never reoccurs. Training takes one period to complete.28
Income maximizing agents are assumed to discount all earnings streams by a
common discount factor l/(1 + r). From (3.2.1) training raises earnings by an
amount (Y per period. While taking training, the individual receives subsidy S
which may be negative, (e.g. tuition payments). Income in period k is foregone
for trainees. To simplify the algebra we assume that people live forever.
As of period k, the present value of earnings for an individual who does not receive training is

PV(0) = Σ_{j=0}^{∞} [X(k + j)β + U(k + j)]/(1 + r)^j.

The present value of earnings for a trainee is

PV(1) = S + Σ_{j=1}^{∞} [X(k + j)β + α + U(k + j)]/(1 + r)^j.

The present value maximizing enrollment rule has a person enroll in the program if PV(1) > PV(0). Letting Z be the index function for enrollment,

Z = PV(1) − PV(0) = S − W(k) + α/r,    (3.2.2)

and

δ = 1 if S − W(k) + α/r > 0;  δ = 0 otherwise.    (3.2.3)

Because W(k) is not observed for trainees, it is convenient to substitute for W(k) in (3.2.2) using (3.2.1). In addition some components of subsidy S may not be observed by the econometrician. Suppose

S = Qφ + η,    (3.2.4)

where Q is observed by the econometrician and η is not. Collecting terms, we

28The assumption that enrollment decisions are made solely on the basis of an individual’s choice
process is clearly an abstraction. More plausibly, the training decision is the joint outcome of decisions
taken by the prospective trainee, the training agency and other agents. See Heckman and Robb (1985)
for a discussion of more general models.

have

δ = 1 if Qφ + α/r − X(k)β + η − U(k) > 0;  δ = 0 otherwise.    (3.2.5)

In terms of the dummy endogenous variable framework presented in Section 2.2,


(3.2.1) corresponds to eq. (2.2.1), and (3.2.5) corresponds to (2.2.2).
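A small simulation makes the resulting simultaneity concrete: because U(k) enters the enrollment rule (3.2.5) and earnings disturbances are serially dependent, δ is correlated with the post-training earnings disturbance and least squares on the training dummy is biased. Everything in the sketch, including the persistence parameter, is an illustrative assumption.

```python
import numpy as np

# Perfect-foresight enrollment per (3.2.5) feeding earnings per (3.2.1), t > k.
rng = np.random.default_rng(6)
N, alpha, r = 10000, 1.0, 0.10
X = rng.normal(size=N)                            # scalar regressor, beta = 1
Q = rng.normal(size=N)                            # observed subsidy shifter
eta, U_k, U_t = rng.normal(size=(3, N))
delta = (0.5 * Q + alpha / r - X + eta - U_k > 0).astype(float)  # eq. (3.2.5)
W_t = X + alpha * delta + U_t + 0.6 * U_k         # 0.6*U_k: persistent error

ols = np.linalg.lstsq(np.column_stack([np.ones(N), X, delta]),
                      W_t, rcond=None)[0]
print("OLS estimate of alpha:", ols[2])           # biased downward from 1.0
```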
This framework can be modified to represent a variety of different choice
processes. For example, α may represent the union-nonunion wage differential.
The variable S in this case may represent a membership bribe or enrollment fee.
In applying this model to the unionism problem an alternative selection mecha-
nism might be introduced since it is unlikely that income is foregone in any
period or that a person has only one opportunity in his or her lifetime to join a
union. In addition, it is implausible that membership is determined solely by the
prospective trainee’s decision if rents accrue to union membership.29
As another example, this model can be applied to schooling choices. In this application, α is the effect of schooling on earnings and it is likely that schooling takes more than one period. Moreover, a vector δ is more appropriate since agents can choose among a variety of schooling levels.
This framework can also be applied to analyze binary migration decisions,
occupational choice or industrial choice. In such applications, α is the per period
return that accrues to migration, choice of occupation or choice of industry
respectively. As in the schooling application noted above, it is often plausible that
δ is a vector. Furthermore, the content of the latent variable Z changes from
context to context; S should be altered to represent a cost of migration, or a cost
of movement among occupations or industries, and income may or may not be
foregone in a period of transition among states. In each of these applications, the
income maximizing framework can be replaced by a utility maximizing model.

3.2.2. A random coefficient specification

In place of eq. (3.2.1), a random coefficient earnings function is

W(t) = X(t)β + δ(α + ε) + U(t)
     = X(t)β + δα + U(t) + εδ,    (3.2.6)

using the notation of eq. (2.2.3). This model captures the notion of a variable
effect of training (or unionism or migration or occupational choice, etc.) on
earnings.

29See Abowd and Farber (1982) for a discussion of this problem.


1962 J. J. Heckman und T. E. MaCurdy

If agents know ε when they make their decisions about δ, the following modification to (3.2.5) characterizes the decision process:

δ = 1 if Qφ + α/r − X(k)β + η − U(k) + ε/r > 0;  δ = 0 otherwise.    (3.2.7)

The fact that ε appears in the disturbance terms in (3.2.6) and (3.2.7) creates another source of covariance between δ and the error term in the earnings
equation that is not present in the fixed coefficient dummy endogenous variable
model.
The random coefficient model captures the key idea underlying the model of
self selection introduced by Roy (1951) that has been revived and extended in
recent work by Lee (1978) and Willis and Rosen (1979). In Roy’s model, it is
solely population variation in X(k), ε, and U(k) that determines δ (so η = Q = 0 in (3.2.7)).³⁰
As noted in Section 2, the fixed coefficient and random coefficient dummy
endogenous variable models are frequently confused in the literature. In the
context of studies of the union impact on wages, Robinson and Tomes (1984)
find that a sample selection bias correction (or M-function) estimator of α and an
instrumental variable estimator produce virtually the same estimate of the coeffi-
cient. As noted in Section 2.2.3, the instrumental variable estimator is inconsistent
for the random coefficient model while the sample selection bias estimator is not.
Both are consistent for α in the fixed coefficient model. The fact that the same
estimate is obtained from the two different procedures indicates that a fixed
coefficient model of unionism describes their data. (It is straightforward to
develop a statistical test that discriminates between these two models that is
based on this principle.)

3.2.2.1. Introducing uncertainty. In many applications of the dummy endogenous variable model it is unlikely that prospective trainees (union members, migrants, etc.) know all components of future earnings and the costs and benefits of their contemplated action at the time they decide whether or not to take the action. More likely, decisions are made in an environment of uncertainty. Ignoring risk aversion, the natural generalization of decision rules (3.2.3) and (3.2.5) assumes that prospective trainees (union members, migrants, etc.) compare the expectation of PV(0) evaluated at the end of period k − 1 with the expectation of PV(1) evaluated at the same date. This leads to the formulation

δ = 1 if E_{k−1}[S − W(k) + (α + ε)/r] > 0;  δ = 0 otherwise,    (3.2.8)

³⁰For further discussion of this model and its applications see Heckman and Sedlacek (1985).

where E_{k−1} denotes the expectation of the argument in brackets conditional on the information available in period k − 1. ε is a degenerate constant in the fixed coefficient dummy endogenous variable model, but is not degenerate in the general random coefficient specification.
Introducing uncertainty can sometimes simplify the econometrics of a problem. (See Zellner et al. (1966).) In the random coefficient model suppose that agents do not know the value of ε they will obtain when δ = 1. For example, suppose E_{k−1}(ε) = 0. In this case trainees, union members, etc. do not know their idiosyncratic gain to training, union membership, etc., before participating in the activity. The random variable ε does not appear in selection eq. (3.2.8) and is not a source of covariation between δ and the composite disturbance term in (3.2.6). In this case earnings eq. (3.2.6) becomes a more conventional random coefficient model in which the random coefficient is not correlated with its associated variable. (See Heckman and Robb (1985).)
If an agent's best guess of ε is the population mean in eq. (3.2.8), then E(ε|δ = 1) = 0, so E(εδ) = 0 and the error component εδ creates no new econometric problem not already present in the fixed coefficient framework. Consistent estimators for the fixed coefficient model also consistently estimate α and β in this version of the random coefficients model. In many contexts it is implausible that ε is known at the time decisions are taken, so that the more robust fixed coefficient estimators may be applicable to random coefficient models.³¹

3.3. Hours of work and labor supply

The index function framework has found wide application in the recent empirical
literature on labor supply. Because this work is surveyed elsewhere [Heckman and
MaCurdy (1981) and Moffitt and Kehrer (1981)], our discussion of this topic is
not comprehensive. We briefly review how recent models of labor supply dealing
with labor force participation, fixed costs of work, and taxes can be fit within the
general index function framework.

3.3.1. An elementary model of labor supply

We initially consider a simple model of hours of work and labor force participation that ignores fixed costs and taxes. Let W be the wage rate facing a consumer, C is a Hicks' composite commodity of goods and L is a Hicks' composite commodity of nonmarket time. The consumer's strictly quasi-concave preference

³¹In the more general case in which future earnings are not known, the optimal forecasting rule for
W(k) depends on the time series process generating U(t). For an extensive discussion of more general
decision processes under uncertainty see Heckman and Robb (1985). An uncertainty model provides
yet another rationalization for the results reported in Robinson and Tomes (1984).

function is U(C, L, v), where v is a “taste shifter.” For a population of con-


sumers, the density of W and v is written as k(w, v). The maximum amount of
leisure is T. Income in the absence of work is R, and is assumed to be exogenous
with respect to v and any unobservables generating W.
A consumer works only if the best work alternative is better than the best
nonwork alternative (i.e. full leisure). In the simple model, this comparison can be
reduced to a local comparison between the marginal value of leisure at the no
work position (the slope of the consumer’s highest attainable indifference curve at
zero hours of work) and the wage rate.
The marginal rate of substitution (MRS) along an equilibrium interior solution hours of work path is obtained by solving the implicit equation

MRS = U_L(R + MRS·H, T − H, v) / U_C(R + MRS·H, T − H, v),   (3.3.1)

for MRS, where H is hours of work and C = R + MRS·H. In equilibrium the wage equals MRS. The reservation wage is MRS(R, 0, v). The consumer works if

MRS(R, 0, v) < W;   (3.3.2)

otherwise, he does not. If condition (3.3.2) is satisfied, the labor supply function is determined by solving the equation MRS(R, H, v) = W for H to obtain

H = H(R, W, v).   (3.3.3)

Consider a population of consumers who all face wage W and receive unearned income R but who have different v's. The density k(v|W) is the conditional density of "tastes for work" over the population with a given value of W. Letting Γ_W denote the subset of the support of v which satisfies MRS(R, 0, v) < W for a given W, the fraction of the population that works is

P(W, R) = ∫_{Γ_W} k(v|W, R) dv.   (3.3.4)

The mean hours worked for those employed is

E[H | MRS(R, 0, v) < W, W, R] = ∫_{Γ_W} H(R, W, v) k(v|W, R) dv / P(W, R).   (3.3.5)

The mean hours worked in the entire population is

E(H) = ∫ H(R, W, v) k(v|W, R) dv,   (3.3.6)

[remember H(R, W, v) = 0 for v ∉ Γ_W].


The model of Heckman (1974) offers an example of this framework. Write the marginal rate of substitution function given by (3.3.1) in semilog form as

ln MRS(R, H, v) = α_0 + α_1 R + α_2 X_2 + γH + v,   (3.3.7)

where v is a mean zero, normally distributed error term. Market wage rates are written as

ln W = β_0 + β_1 X_1 + η,   (3.3.8)

where η is a normally distributed error term with zero mean. Equating (3.3.7) and (3.3.8) for equilibrium hours of work for those observations satisfying ln W > ln MRS(R, 0, v), one obtains

H = (1/γ)[ln W − ln MRS(R, 0, v)]
  = (1/γ)(β_0 − α_0 + β_1 X_1 − α_1 R − α_2 X_2) + (1/γ)(η − v).   (3.3.9)

In terms of the conceptual apparatus of Sections 1 and 2, one can interpret this labor supply model as a two-state model. State 0 corresponds to the state in which the consumer does not work, which we signify by setting the indicator variable δ = 0. When δ = 1 a consumer works and state 1 occurs. Two index functions characterize the model, where Y' = (Y_1, Y_2) is a two element vector with

Y_1 = H and Y_2 = ln W.

The consumer works (δ = 1) when (Y_1, Y_2) ∈ Ω_1, where Ω_1 = {(Y_1, Y_2) | Y_1 > 0, −∞ ≤ Y_2 ≤ ∞} is a subset of the support of (Y_1, Y_2). Note that the exogenous variables X include X_1, X_2 and R. The joint distribution of the errors v and η induces a joint distribution f(y_1, y_2|X) for Y via eqs. (3.3.8) and (3.3.9). Letting Y* = δY denote the observed value of Y, Y_1* = H* represents a consumer's actual hours of work and Y_2* equals ln W when the consumer works and equals zero otherwise.

By analogy with eq. (1.2.8) the joint density of hours and wages conditional on X and working is given by

g(y*|δ = 1, X) = f(y_1*, y_2*|X) / ∫∫_{Ω_1} f(y_1, y_2|X) dy_1 dy_2.   (3.3.10)

From eq. (1.2.9), the distribution of Y* given X is

g(y*, δ|X) = [f(y_1*, y_2*|X)]^δ [(1 − Pr(δ = 1|X)) J(y_1*, y_2*)]^{1−δ},   (3.3.11)

where Pr(δ = 1|X) denotes the probability that the consumer works given X, i.e.

Pr(δ = 1|X) = ∫∫_{Ω_1} f(y_1, y_2|X) dy_1 dy_2,   (3.3.12)

and where J(Y_1*, Y_2*) = 1 if Y_1* = 0 = Y_2* and = 0 otherwise. When f(·) is a bivariate normal density, the density g(y*, δ|X) is sometimes called a bivariate Tobit model. Provided that one variable in X appears in (3.3.8) that does not appear in (3.3.7), γ can be consistently estimated by maximum likelihood using the bivariate Tobit model.
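
To make the estimation concrete, here is a minimal sketch (not taken from the chapter) of the bivariate Tobit log likelihood implied by eqs. (3.3.7)-(3.3.12), assuming (v, η) are bivariate normal and γ > 0; the variable names, the parameter packing, and the use of scipy are illustrative assumptions:

import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def neg_loglik(theta, X1, X2, R, H, lnW, works):
    # theta packs the wage eq. (3.3.8), the MRS eq. (3.3.7), and the error
    # parameters; lnW entries for nonworkers are placeholders (e.g. zeros)
    # and are never used
    b0, b1, a0, a1, a2, gamma, s_eta, s_v, rho = theta
    mu_w = b0 + b1 * X1                                    # E(ln W | X)
    mu_h = (b0 - a0 + b1 * X1 - a1 * R - a2 * X2) / gamma  # mean of H, eq. (3.3.9)
    sd_h = np.sqrt(s_eta**2 + s_v**2 - 2 * rho * s_eta * s_v) / gamma
    r = (s_eta**2 - rho * s_eta * s_v) / (gamma * sd_h * s_eta)  # corr(H, ln W)
    zh, zw = (H - mu_h) / sd_h, (lnW - mu_w) / s_eta
    # workers: bivariate normal log density of (H, ln W), eq. (3.3.11) with delta = 1
    ll_work = (-np.log(2 * np.pi * sd_h * s_eta * np.sqrt(1 - r**2))
               - (zh**2 - 2 * r * zh * zw + zw**2) / (2 * (1 - r**2)))
    # nonworkers: Pr(H <= 0 | X), the complement of eq. (3.3.12)
    ll_nowork = norm.logcdf(-mu_h / sd_h)
    return -np.sum(np.where(works, ll_work, ll_nowork))

# e.g.: minimize(neg_loglik, theta0, args=(X1, X2, R, H, lnW, works))

The exclusion requirement noted above appears here as the need for at least one variable in X_1 that is absent from (X_2, R).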

3.3.2. A general model of labor supply with fixed costs and taxes

In this section we extend the simple model presented above to incorporate fixed costs of work (such as commuting costs) and regressive taxes. We present a general methodology to analyze cases in which marginal comparisons do not fully characterize labor supply behavior. We synthesize the suggestions of Burtless and Hausman (1978), Hausman (1980), Wales and Woodland (1979), and Cogan (1981).
Fixed costs of work or regressive taxes produce a nonconvex budget constraint. Figure 1 depicts the case considered here.32 This figure represents a situation in which a consumer must pay a fixed money cost equal to F in order to work. R_1 is his nonlabor income if he does not work. A marginal tax rate of t_A

32Generalization to more than two branches involves no new principle. Constraint sets like R_2SN are alleged to be common in negative income tax experiments and in certain social programs.

[Figure 1: consumption plotted against hours worked along the nonconvex budget constraint, with the no-work kink R_1 (state 1), the two branches corresponding to states 2 and 3, and the virtual income R_3 marked.]

Figure 1

applies to the branch R_2S defined up to H̄ hours, and a lower marginal rate t_B applies to branch SN.
Assuming that no one would ever choose to work T or more hours, a consumer facing this budget set may choose to be in one of three possible states of the world: the no work position at kink point R_1 (which we define as state 1), or an interior equilibrium on either segment R_2S or segment SN (defined as states 2 and 3, respectively).33 A consumer in state 1 receives initial after-tax income R_1. In state 2, a consumer receives unearned income R_2 and works at an after-tax wage rate equal to W_A = W(1 − t_A), where W is the gross wage. A consumer in state 3 earns after-tax wage rate W_B = W(1 − t_B) and can be viewed as receiving the equivalent of R_3 as unearned income. Initially we assume that W is exogenous and known for each consumer.
In the analysis of kinked-nonconvex budget constraints, a local comparison between the reservation wage and the market wage does not adequately characterize the work-no work decision as it did in the model of Section 3.3.1. Due to the nonconvexity of the constraint set, existence of an interior solution on a branch does not imply that equilibrium will occur on the branch. Thus in Figure 1, point B associated with the indifference curve through it is a possible interior equilibrium on branch R_2S that is clearly not the global optimum.

33The kink at S is not treated as a state of the world because preferences are assumed to be twice differentiable and quasiconcave.

A general approach for determining the portion of the budget constraint on which a consumer locates is the following. Write the direct preference function as U(C, L, v), where v represents taste shifters. Form the indirect preference function V(R, W, v). Using Roy's identity for interior solutions, the labor supply function may be written as

H = V_W / V_R = H(R, W, v).

While the arguments of the functions U(·), V(·), and H(·) may differ across consumers, the functional forms are assumed to be the same for each consumer.
If a consumer is at an interior equilibrium on either segment R_2S or SN, then the equilibrium is defined by a tangency of an indifference curve and the budget constraint. Since this tangency indicates a point of maximum attainable utility, the indifference curve at this point represents a level of utility given by V(R_i, W_i, v), where R_i and W_i are, respectively, the after-tax unearned income and wage rate associated with segment i. Thus, hours of work for an interior equilibrium are given by V_W/V_R evaluated at R_i and W_i. For this candidate equilibrium to be admissible, the implied hours of work must lie between the two endpoints of the interval (i.e. equilibrium must occur on the budget segment). A consumer does not work if utility at kink R_1, U(R_1, T, v), is greater than both V(R_2, W_A, v) and V(R_3, W_B, v), provided that these latter utility values represent admissible solutions located on the budget constraint.
More specifically, define the labor supply functions H_(1), H_(2), and H_(3) as H_(1) = 0 and

H_(i) = V_W(R_i, W_i, v) / V_R(R_i, W_i, v) = H(R_i, W_i, v),   i = 2, 3;   (3.3.13)

and define the admissible utility levels V_(1), V_(2), and V_(3) as V_(1) = U(R_1, T, v), assumed to be greater than zero, and

V_(2) = V(R_2, W_A, v) if 0 < H_(2) ≤ H̄,
V_(2) = 0 otherwise,   (3.3.14)

and

V_(3) = V(R_3, W_B, v) if H̄ < H_(3) < T,
V_(3) = 0 otherwise.   (3.3.15)

We assume that U(·) is chosen so that V(·) > 0 for all C, L, and v. A consumer whose v lies in the set

Γ_1 = {v | V_(1) ≥ V_(2) and V_(1) ≥ V_(3)}   (3.3.16)

will not work and occupies state 1. If v lies in the set

Γ_2 = {v | V_(2) > V_(1) and V_(2) ≥ V_(3)},   (3.3.17)

a consumer is at an interior solution on segment R_2S and occupies state 2. Finally, a consumer is at equilibrium in state 3 on segment SN if v is an element of the set

Γ_3 = {v | V_(3) > V_(1) and V_(3) > V_(2)}.   (3.3.18)

The sets Γ_1, Γ_2, and Γ_3 do not intersect, and their union is the relevant subspace of the support of v. These sets are thus mutually exclusive.34 The functions H_(i) determine the hours of work for individuals for whom v ∈ Γ_i.
Choosing a specification for the preference function and a distribution for "tastes" in the population, G(v), produces a complete statistical characterization of labor supply behavior. The probability that a consumer is in state i is

Pr(v ∈ Γ_i) = ∫_{Γ_i} dG(v).   (3.3.19)

The expected hours of work of a consumer who is known to be in state i is

E(H_(i) | v ∈ Γ_i) = ∫_{Γ_i} H_(i) dG(v) / Pr(v ∈ Γ_i).   (3.3.20)

The expected hours of work for a randomly chosen individual is

E(H) = Σ_{i=1}^{3} E(H_(i) | v ∈ Γ_i) Pr(v ∈ Γ_i).   (3.3.21)
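
As an illustration of this state-enumeration logic, the following minimal sketch locates the candidate optimum on each branch and takes the global maximum, as in eqs. (3.3.13)-(3.3.18), and approximates (3.3.19)-(3.3.21) by Monte Carlo. The Cobb-Douglas utility and all numerical values are illustrative assumptions, not the chapter's specification:

import numpy as np

T = 16.0  # time endowment, illustrative

def utility(C, L, v):
    # assumed Cobb-Douglas preferences with taste shifter v on leisure
    return np.log(C) + (0.5 + v) * np.log(L)

def best_on_branch(R, w, v, h_lo, h_hi):
    # crude grid search for the candidate optimum on one linear branch;
    # restricting the grid to the segment enforces the admissibility
    # conditions of eqs. (3.3.14)-(3.3.15)
    grid = np.linspace(h_lo, h_hi, 2001)[1:-1]
    u = utility(R + w * grid, T - grid, v)
    j = np.argmax(u)
    return grid[j], u[j]

def labor_supply(v, R1, F, W, tA, tB, Hbar):
    R2 = R1 - F                          # branch intercept net of fixed cost
    WA, WB = W * (1 - tA), W * (1 - tB)  # after-tax wage rates
    R3 = R2 + (WA - WB) * Hbar           # virtual income for the outer branch
    V1 = utility(R1, T, v)               # state 1: no work at kink R1
    H2, V2 = best_on_branch(R2, WA, v, 0.0, Hbar)  # state 2
    H3, V3 = best_on_branch(R3, WB, v, Hbar, T)    # state 3
    state = int(np.argmax([V1, V2, V3]))           # global comparison
    return state + 1, [0.0, H2, H3][state]

rng = np.random.default_rng(0)
hours = [labor_supply(v, R1=50., F=10., W=10., tA=0.3, tB=0.1, Hbar=8.)[1]
         for v in rng.normal(0.0, 0.1, 2000)]
E_H = float(np.mean(hours))  # eq. (3.3.21) approximated by simulation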

We have thus far assumed: (i) that data on potential wage rates are available for all individuals including nonworkers, and (ii) that wage rates are exogenous

34Certain values for v may be excluded if they imply such phenomena as negative values of U or V or nonconvex preferences. In this case we use the conditional density of v excluding those values.

variables. Relaxing these assumptions does not raise any major conceptual problems and makes the analysis relevant to a wider array of empirical situations.
Suppose that market wage rates are described by the function

W = W(X, η),   (3.3.22)

where X includes a consumer's measured characteristics, and η is an error term representing unmeasured characteristics. Substituting W(X, η) for W in the preceding discussion, the extended partitions

Γ_i* = {(v, η) | V_(i) ≥ V_(j) for all j}   (3.3.23)

(recall that equality holds on a set of measure zero) replace the characterization of the sets Γ_i for known wages given by (3.3.16)-(3.3.18). A consumer for whom (v, η) ∈ Γ_i* occupies state i. The probability of such an event is

Pr[(v, η) ∈ Γ_i*] = ∫∫_{Γ_i*} ψ(v, η) dv dη,   (3.3.24)

where ψ(v, η) is the joint density of v and η.


The labor supply functions for each state are changed by substituting W(X, η) for W in constructing the arguments of the functions for states 2 and 3 given by (3.3.13).35 In place of (3.3.21), the expression for expected hours of work becomes

E(H) = Σ_{i=1}^{3} E[H_(i) | (v, η) ∈ Γ_i*] Pr[(v, η) ∈ Γ_i*],   (3.3.25)

where

E[H_(i) | (v, η) ∈ Γ_i*] = ∫∫_{Γ_i*} H_(i) ψ(v, η) dv dη / Pr[(v, η) ∈ Γ_i*].   (3.3.26)

Using the expression for E(H) given by (3.3.25) in a regression analysis permits wages to be endogenous and does not require that wage offer data be available for all observations. The parameters of (3.3.25) or (3.3.26) can be estimated using the nonlinear least-squares procedure described in Section 2.1. To identify all the parameters of the model, the wage equation must also be estimated using data on workers, appropriately adjusting for sample selection bias. An alternative strategy is to jointly estimate hours and wage equations.

35Note that the arguments W_A, W_B, R_2 and R_3 each depend on W.



Thus far we have assumed that hours of work and wages are not measured with error. The needed modifications required in the preceding analysis to accommodate measurement error are presented in Heckman and MaCurdy (1981).
To illustrate the required modifications when measurement error is present, suppose that we express the model in terms of v and η and that errors in the variables plague the available data on hours of work. When H > 0, suppose that measured hours, which we denote by H+, are related to true hours by the equation H+ = H + e, where e is a measurement error distributed independently of the explanatory variables X. When such errors in variables are present, data on hours of work (i.e. H+ when H > 0 and H when H = 0) do not allocate working individuals to the correct branch of the budget constraint. Consequently, the states of the world a consumer occupies can no longer be directly observed.
This model translates into a three index function model of the sort described in Section 1.2. Two index functions, Y' = (Y_1, Y_2) = (H+, W), are observed in some states, and one index function, Z = v, is never directly observed. Given an assumption about the joint distribution of the random errors v, η, and e, a transformation from these errors to the variables Y, W, and H+ using eq. (3.3.13) and the relation H+ = H(R, W, v) + e produces a joint density function f(Y, Z).
There are three states of the world in this model (so I = 3 in the notation of Section 1.2). The ith state occurs when δ_i = 1, which arises if (Y, Z) ∈ Ω_i, where the sets Ω_i partition the support of (Y, Z) according to the portion of the budget constraint the consumer occupies.
Y is observed in the work states 2 and 3, but not when δ_1 = 1. Thus, adopting the convention of Section 1, the observed version of Y is given by Y* = (δ_2 + δ_3)Y. In this notation, the appropriate density functions for this model are given by formulae (1.2.12) and (1.2.13), with the first indicator set to δ_2 + δ_3, the second to 0, and the third to δ_1.

4. Summary

This paper presents and extends the index function model of Karl Pearson (1901)
that underlies all recent models in labor econometrics. In this framework,
censored, truncated and discrete random variables are interpreted as the manifes-
tation of various sampling schemes for underlying index function models. A
unified derivation of the densities and regression representations for index function models is presented. Methods of estimation are discussed with an emphasis on regression and instrumental variable procedures.
We demonstrate how a variety of substantive models in labor economics can be
given an econometric representation within the index function framework. Mod-
els for the analysis of unemployment, labor force participation, job turnover, the
impact of interventions on earnings (and other outcomes) and hours of work are
formulated as special cases of the general index function model. By casting these
diverse models in a common mold we demonstrate the essential commonalities in
the econometric approach required for their formulation and estimation.

Appendix: The principal assumption

This appendix discusses the principal assumption in the context of a more conventional discrete choice model. We write

Z_1 = Xβ_1 + δ_2 α_1 + V_1,   (A.1a)

Z_2 = Xβ_2 + δ_1 α_2 + V_2,   (A.1b)

E(V_1) = E(V_2) = 0,   Var(V_1) = Var(V_2) = 1,

Z_1 ≥ 0 iff δ_1 = 1,
Z_1 < 0 iff δ_1 = 0,
Z_2 ≥ 0 iff δ_2 = 1,
Z_2 < 0 iff δ_2 = 0.

In this model Z_1 and Z_2 are not observed. Unless

α_1 α_2 = 0,   (A.2)

it is possible that Z_1 ≥ 0 but δ_1 = 0 or that Z_2 ≥ 0 but δ_2 = 0.


An argument that is often made against this model is that condition (A.2) rules out "true simultaneity" among outcomes. By analogy with the conventional simultaneous equations literature, replacing δ_2 with Z_2 and δ_1 with Z_1 in eq. (A.1) generates a statistically meaningful model without need to invoke condition (A.2). Appealing to this literature, the principal assumption looks artificial.

To examine this issue more closely, we present a well-specified model of consumer choice in which condition (A.2) naturally emerges. Let X = 1 (so there are no exogenous variables in the model) and write the utility ordering over outcomes as

U(δ_1, δ_2) = (η_1 + ε_1)δ_1 + (η_2 + ε_2)δ_2 + (η_3 + ε_3)δ_1δ_2,   (A.3)

where (η_1, η_2, η_3) is a vector of parameters and (ε_1, ε_2, ε_3) is a vector of mean zero continuous unobserved random variables.
continuous unobserved random variables.
The outcome δ_1 = 1 of the choice process arises if either U(1,1) or U(1,0) is maximal in the choice set (i.e. max(U(1,1), U(1,0)) ≥ max(U(0,1), U(0,0))). For a separable model with no interactions (η_3 = 0 and ε_3 = 0), this condition can be stated as

δ_1 = 1 iff η_1 + ε_1 ≥ 0.

Setting η_1 = β_1, α_1 = 0 and ε_1 = V_1 produces eq. (A.1a). By a parallel argument for δ_2, (A.1b) is produced. Condition (A.2) is satisfied because both α_1 = 0 and α_2 = 0.
For a general nonseparable choice problem (η_3 ≠ 0 or ε_3 ≠ 0 or both), equation system (A.1) still represents the choice process, but once more α_1 = α_2 = 0. For example, suppose that ε_3 = 0. In this case

δ_1 = 1 iff max(U(1,1), U(1,0)) ≥ max(U(0,1), U(0,0)).

For the case η_3 > 0,

δ_1 = 1 iff η_1 + ε_1|_{ε_2 < −(η_2 + η_3)} ≥ 0,   (A.4)

or

η_1 + η_2 + η_3 + (ε_1 + ε_2)|_{−(η_2 + η_3) ≤ ε_2 < −η_2} ≥ 0,

or

η_1 + η_3 + ε_1|_{ε_2 ≥ −η_2} ≥ 0,

where X|_{Y = y} denotes the conditional random variable X given Y = y. The probability that δ_1 = 1 can be represented by eq. (A.1) with α_1 = 0. In this model the distribution of (V_1, V_2) is of a different functional form than is the distribution of (ε_1, ε_2).

In this example there is genuine interaction in the utility of outcomes, and eqs. (A.1) still characterize the choice process. The model satisfies condition (A.2). Even though α_1 = α_2 = 0, there is genuine simultaneity in choice.
Unconditional representation (A.1) (with α_1 ≠ 0 or α_2 ≠ 0) sometimes characterizes a choice process of interest and sometimes does not. Often the partitions of the support of (V_1, V_2) required to define δ_1 and δ_2 are not rectangular, and so the unconditional representation of the choice process with α_1 ≠ 0 or α_2 ≠ 0 is not appropriate, but any well-posed simultaneous choice process can be represented by equation system (A.1).
An apparent source of confusion arises from interpreting (A.1) as a well-specified behavioral relationship. Thus it might be assumed that the utility of agent 1 depends on the actions of agent 2, and vice versa. In the absence of any behavioral mechanism for determining the precise nature of the interaction between the two actors (such as (A.3)), the model is incomplete. Assuming that player 1 is dominant (so α_1 = 0) is one way to supply the missing behavioral relationship. (Dominance here means that player 1 temporally has the first move.) Another way to complete the model is to postulate a dynamic sequence so that current utilities depend on previous outcomes (so α_1 = α_2 = 0; see Heckman (1981)). Bjorn and Vuong (1984) complete the model by suggesting a game theoretic relationship between the players. In all of these completions of the model, (A.2) is satisfied.

References

Abowd, J. and H. Farber (1982) "Job Queues and the Union Status of Workers", Industrial and Labor Relations Review, 35, 354-367.
Amemiya, T. (1985) Advanced Econometrics. Harvard University Press, forthcoming.
Bishop, Y., S. Fienberg and P. Holland (1975) Discrete Multivariate Analysis. Cambridge: MIT Press.
Bock, R. and L. Jones (1968) The Measurement and Prediction of Judgment and Choice. San Francisco: Holden-Day.
Borjas, G. and S. Rosen (1981) "Income Prospects and Job Mobility of Younger Men", in: R. Ehrenberg, ed., Research in Labor Economics. London: JAI Press, 3.
Burtless, G. and J. Hausman (1978) "The Effect of Taxation on Labor Supply: Evaluating the Gary Negative Income Tax Experiment", Journal of Political Economy, 86(6), 1103-1131.
Byron, R. and A. K. Bera (1983) "Least Squares Approximations to Unknown Regression Functions: A Comment", International Economic Review, 24(1), 255-260.
Cain, G. and H. Watts, eds. (1973) Income Maintenance and Labor Supply. Chicago: Markham.
Catsiapsis, B. and C. Robinson (1982) "Sample Selection Bias with Multiple Selection Rules: An Application to Student Aid Grants", Journal of Econometrics, 18, 351-368.
Chamberlain, G. (1982) "Multivariate Regression Models for Panel Data", Journal of Econometrics, 18, 5-46.
Cogan, J. (1981) "Fixed Costs and Labor Supply", Econometrica, 49(4), 945-963.
Coleman, T. (1981) "Dynamic Models of Labor Supply". University of Chicago, unpublished manuscript.
Coleman, T. (1984) "Two Essays on the Labor Market". University of California, unpublished Ph.D. dissertation.
Cosslett, S. (1984) "Distribution-Free Estimator of a Regression Model with Sample Selectivity". University of Florida, unpublished manuscript.

Eicker, F. (1963) "Asymptotic Normality and Consistency of the Least Squares Estimators for Families of Linear Regressions", Annals of Mathematical Statistics, 34, 446-456.
Eicker, F. (1967) "Limit Theorems for Regressions with Unequal and Dependent Errors", in: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability. Berkeley: University of California Press, 1, 59-82.
Flinn, C. (1984) "Behavioral Models of Wage Growth and Job Change Over the Life Cycle". University of Chicago, unpublished Ph.D. dissertation.
Flinn, C. and J. Heckman (1982) "New Methods for Analyzing Structural Models of Labor Force Dynamics", Journal of Econometrics, 18, 115-168.
Freeman, R. (1984) "Longitudinal Analysis of the Effects of Trade Unions", Journal of Labor Economics, 2, 1-26.
Gallant, R. and D. Nychka (1984) "Consistent Estimation of the Censored Regression Model". North Carolina State University, unpublished manuscript.
Goldberger, A. (1983) "Abnormal Selection Bias", in: S. Karlin, T. Amemiya and L. Goodman, eds., Studies in Econometrics, Time Series and Multivariate Statistics. New York: Academic Press, 67-84.
Griliches, Z. (1986) "Economic Data Issues", in this volume.
Haberman, S. (1978) Analysis of Qualitative Data. New York: Academic Press, I and II.
Hausman, J. (1980) "The Effects of Wages, Taxes, and Fixed Costs on Women's Labor Force Participation", Journal of Public Economics, 14, 161-194.
Heckman, J. (1974) "Shadow Prices, Market Wages and Labor Supply", Econometrica, 42(4), 679-694.
Heckman, J. (1976a) "Simultaneous Equations Models with Continuous and Discrete Endogenous Variables and Structural Shifts", in: S. Goldfeld and R. Quandt, eds., Studies in Nonlinear Estimation. Cambridge: Ballinger.
Heckman, J. (1976b) "The Common Structure of Statistical Models of Truncation, Sample Selection and Limited Dependent Variables and a Simple Estimator for Such Models", Annals of Economic and Social Measurement, Fall, 5(4), 475-492.
Heckman, J. (1978) "Dummy Endogenous Variables in a Simultaneous Equations System", Econometrica, 46, 931-961.
Heckman, J. (1979) "Sample Selection Bias as a Specification Error", Econometrica, 47, 153-162.
Heckman, J. (1981) "Statistical Models for Discrete Panel Data", in: C. Manski and D. McFadden, eds., Structural Analysis of Discrete Data with Econometric Applications. Cambridge: MIT Press.
Heckman, J., M. Killingsworth and T. MaCurdy (1981) "Empirical Evidence on Static Labour Supply Models: A Survey of Recent Developments", in: Z. Hornstein, J. Grice and A. Webb, eds., The Economics of the Labour Market. London: Her Majesty's Stationery Office, 75-122.
Heckman, J. and T. MaCurdy (1980) "A Life Cycle Model of Female Labor Supply", Review of Economic Studies, 47, 47-74.
Heckman, J. and T. MaCurdy (1981) "New Methods for Estimating Labor Supply Functions: A Survey", in: R. Ehrenberg, ed., Research in Labor Economics. London: JAI Press, 4.
Heckman, J. and R. Robb (1985) "Alternative Methods for Evaluating the Impact of Training on Earnings", in: J. Heckman and B. Singer, eds., Longitudinal Analysis of Labor Market Data. Cambridge: Cambridge University Press.
Heckman, J. and G. Sedlacek (1985) "Heterogeneity, Aggregation and Market Wage Functions: An Empirical Model of Self Selection in the Labor Market", Journal of Political Economy, 93, December.
Heckman, J. and B. Singer (1986) "Econometric Analysis of Longitudinal Data", in this volume.
Heckman, J. and R. Willis (1977) "A Beta Logistic Model for Analysis of Sequential Labor Force Participation by Married Women", Journal of Political Economy, 85, 27-58.
Hotz, J. and R. Miller (1984) "A Dynamic Model of Fertility and Labor Supply". Carnegie-Mellon University, unpublished manuscript.
Johnson, W. (1978) "A Theory of Job Shopping", Quarterly Journal of Economics.
Jovanovic, B. (1979) "Firm Specific Capital and Turnover", Journal of Political Economy, December, 87(6), 1246-1260.
Kagan, A., Y. Linnik and C. R. Rao (1973) Some Characterization Theorems in Mathematical Statistics. New York: Wiley.
Kendall, M. and A. Stuart (1967) The Advanced Theory of Statistics. London: Griffin, II.

Kevles, D. J. (1985) In the Name of Eugenics. New York: Knopf.
Kiefer, N. and G. Neumann (1979) "An Empirical Job Search Model with a Test of the Constant Reservation Wage Hypothesis", Journal of Political Economy, February, 87(1), 89-108.
Killingsworth, M. (1983) Labour Supply. Cambridge: Cambridge University Press.
Lee, L. F. (1978) "Unionism and Wage Rates: A Simultaneous Equations Model with Qualitative and Limited Dependent Variables", International Economic Review, 19, 415-433.
Lee, L. F. (1981) "Simultaneous Equation Models with Discrete and Censored Variables", in: C. Manski and D. McFadden, eds., Structural Analysis of Discrete Data with Econometric Applications. Cambridge: MIT Press.
Lee, L. F. (1982) "Some Approaches to the Correction of Selectivity Bias", Review of Economic Studies, 49, 355-372.
Lippman, S. and J. McCall (1976) "The Economics of Job Search: A Survey, Part I", Economic Inquiry, 14, 155-189.
Lord, F. and M. Novick (1968) Statistical Theories of Mental Test Scores. Reading: Addison-Wesley Publishing Company.
Manski, C. and D. McFadden (1981) "Alternative Estimates and Sample Designs for Discrete Choice Analysis", in: C. Manski and D. McFadden, eds., Structural Analysis of Discrete Data with Econometric Applications. Cambridge: MIT Press.
McFadden, D. (1985) "Econometric Analysis of Qualitative Response Models", in: Z. Griliches and M. D. Intriligator, eds., Handbook of Econometrics. North-Holland, II.
Miller, R. (1984) "An Estimate of a Job Matching Model", Journal of Political Economy, 92, December.
Mincer, J. and B. Jovanovic (1981) "Labor Mobility and Wages", in: S. Rosen, ed., Studies in Labor Markets. Chicago: University of Chicago Press.
Moffitt, R. (1984) "Profiles of Fertility, Labor Supply and Wages of Married Women: A Complete Life-Cycle Model", Review of Economic Studies, 51, 263-278.
Moffitt, R. and K. Kehrer (1981) "The Effect of Tax and Transfer Programs on Labor Supply: The Evidence from the Income Maintenance Experiments", in: R. Ehrenberg, ed., Research in Labor Economics. London: JAI Press, 4.
Olson, R. (1980) "A Least Squares Correction for Selectivity Bias", Econometrica, 48, 1815-1820.
Pearson, K. (1901) "Mathematical Contributions to the Theory of Evolution", Philosophical Transactions, 195, 1-47.
Quandt, R. (1958) "The Estimation of the Parameters of a Linear Regression System Obeying Two Separate Regimes", Journal of the American Statistical Association, 53, 873-880.
Quandt, R. (1972) "A New Approach to Estimating Switching Regressions", Journal of the American Statistical Association, 67, 306-310.
Robbins, H. (1970) "Optimal Stopping", American Mathematical Monthly, 77, 333-343.
Robinson, C. and N. Tomes (1982) "Self Selection and Interprovincial Migration in Canada", Canadian Journal of Economics, 15(3), 474-502.
Robinson, C. and N. Tomes (1984) "Union Wage Differentials in the Public and Private Sectors: A Simultaneous Equations Specification", Journal of Labor Economics, 2(1), 106-127.
Rossett, R. (1959) "A Statistical Model of Friction in Economics", Econometrica, 27(2), 263-267.
Roy, A. (1951) "Some Thoughts on the Distribution of Earnings", Oxford Economic Papers, 3, 135-146.
Rust, J. (1984) "Maximum Likelihood Estimation of Controlled Discrete Choice Processes". SSRI No. 8407, University of Wisconsin, May 1984.
Schmidt, P. (1981) "Constraints on the Parameters in Simultaneous Tobit and Probit Models", in: C. Manski and D. McFadden, eds., Structural Analysis of Discrete Data with Econometric Applications. Cambridge: MIT Press.
Siow, A. (1984) "Occupational Choice Under Uncertainty", Econometrica, 52(3), 631-646.
Strauss, R. and P. Schmidt (1976) "The Effects of Unions on Earnings and Earnings on Unions: A Mixed Logit Approach", International Economic Review, 17(1), 204-212.
Tallis, G. M. (1961) "The Moment Generating Function of the Truncated Multi-normal Distribution", Journal of the Royal Statistical Society, Series B, 23, 223-229.
Thurstone, L. (1927) "A Law of Comparative Judgment", Psychological Review, 34, 273-286.

Tinbergen, J. (1951) "Some Remarks on the Distribution of Labour Incomes", International Economic Papers, 195-207.
Tobin, J. (1958) "Estimation of Relationships for Limited Dependent Variables", Econometrica, 26, 24-36.
Wales, T. J. and A. D. Woodland (1979) "Labour Supply and Progressive Taxes", Review of Economic Studies, 46, 83-95.
White, H. (1981) "Consequences and Detection of Misspecified Nonlinear Regression Models", Journal of the American Statistical Association, 76, 419-433.
Willis, R. and S. Rosen (1979) "Education and Self Selection", Journal of Political Economy, 87, S7-S36.
Wolpin, K. (1984) "An Estimable Dynamic Stochastic Model of Fertility and Child Mortality", Journal of Political Economy, 92, August.
Yoon, B. (1981) "A Model of Unemployment Duration with Variable Search Intensity", Review of Economics and Statistics, November, 63(4), 599-609.
Yoon, B. (1984) "A Nonstationary Hazard Model of Unemployment Duration". New York: SUNY, Department of Economics, unpublished manuscript.
Zellner, A., J. Kmenta and J. Dreze (1966) "Specification and Estimation of Cobb-Douglas Production Function Models", Econometrica, 34, 784-795.
Chapter 33

EVALUATING THE PREDICTIVE ACCURACY OF MODELS


RAY C. FAIR

Contents

1. Introduction 1980
2. Numerical solution of nonlinear models 1981
3. Evaluation of ex ante forecasts 1984
4. Evaluation of ex post forecasts 1986
5. An alternative method for evaluating predictive accuracy 1988
6. Conclusion 1993
References 1994

Handbook of Econometrics, Volume III, Edited by Z. Griliches and M. D. Intriligator
© Elsevier Science Publishers B.V., 1986

1. Introduction

Methods for evaluating the predictive accuracy of econometric models are dis-
cussed in this chapter. Since most models used in practice are nonlinear, the
nonlinear case will be considered from the beginning. The model is written as:

f_i(y_t, x_t, α_i) = u_it,   (i = 1,...,n), (t = 1,...,T),   (1)

where y_t is an n-dimensional vector of endogenous variables, x_t is a vector of predetermined variables (including lagged endogenous variables), α_i is a vector of unknown coefficients, and u_it is the error term for equation i for period t. The first m equations are assumed to be stochastic, with the remaining u_it (i = m + 1,...,n) identically zero for all t.
The emphasis in this chapter is on methods rather than results. No attempt is
made to review the results of comparing alternative models. This review would be
an enormous undertaking and is beyond the scope of this Handbook. Also, as will
be argued, most of the methods that have been used in the past to compare
models are flawed, and so it is not clear that an extensive review of results based
on these methods is worth anyone’s effort. The numerical solution of nonlinear
models is reviewed in Section 2, including stochastic simulation procedures. This
is background material for the rest of the chapter. The standard methods that
have been used to evaluate ex ante and ex post predictive accuracy are discussed
in Sections 3 and 4, respectively. The main problems with these methods, as will
be discussed, are that they (1) do not account for exogenous variable uncertainty,
(2) do not account for the fact that forecast-error variances vary across time, and
(3) do not treat the possible existence of misspecification in a systematic way.
Section 5 discusses a method that I have recently developed that attempts to
handle these problems, a method based on successive reestimation and stochastic
simulation of the model. Section 6 contains a brief conclusion.
It is important to note that this chapter is not a chapter on forecasting
techniques. It is concerned only with methods for evaluating and comparing
econometric models with respect to their predictive accuracy. The use of these
methods should allow one (in the long run) to decide which model best approxi-
mates the true structure of the economy and how much confidence to place on the
predictions from a given model. The hope is that one will end up with a model
that for a wide range of loss functions produces better forecasts than do other
techniques. At some point along the way one will have to evaluate and compare
other methods of forecasting, but it is probably too early to do this. At any rate,
this issue is beyond the scope of this chapter.1

1For a good recent text on forecasting techniques for time series, see Granger and Newbold (1977).

2. Numerical solution of nonlinear models

The Gauss-Seidel technique is generally used to solve nonlinear models. [See Chapter 14 (Quandt) for a discussion of this technique.] Given a set of estimates
of the coefficients, given values for the predetermined variables, and given values
for the error terms, the technique can be used to solve for the endogenous
variables. Although in general there is no guarantee that the technique will
converge, in practice it has worked quite well.
A “static” simulation is one in which the actual values of the predetermined
variables are used for the solution each period. A “dynamic” simulation is one in
which the predicted values of the endogenous variables from the solutions for
previous periods are used for the values of the lagged endogenous variables for
the solution for the current period. An “ex post” simulation or forecast is one in
which the actual values of the exogenous variables are used. An “ex ante”
simulation or forecast is one in which guessed values of the exogenous variables
are used. A simulation is “outside-sample” if the simulation period is not
included within the estimation period; otherwise the simulation is “within-sam-
ple.” In forecasting situations in which the future is truly unknown, the simula-
tions must be ex ante, outside-sample, and (if the simulation is for more than one
period) dynamic.
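
A minimal sketch of the static/dynamic distinction, with solve_period standing in for one Gauss-Seidel solution of the model for a single period (an assumed interface, not part of the chapter):

def simulate(y_actual, x, solve_period, dynamic=True):
    # y_actual and x are sequences indexed by period
    preds, y_lag = [], y_actual[0]
    for t in range(1, len(x)):
        y_t = solve_period(x[t], y_lag)  # solve given x_t and the lagged value
        preds.append(y_t)
        # dynamic: feed the prediction back; static: use the actual value
        y_lag = y_t if dynamic else y_actual[t]
    return preds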
If one set of values of the error terms is used, the simulation is said to be
“deterministic.” The expected values of most error terms in most models are zero,
and so in most cases the error terms are set to zero for the solution. Although it
is well known [see Howrey and Kelejian (1971)] that for nonlinear models the
solution values of the endogenous variables from deterministic simulations are
not equal to the expected values of the variables, in practice most simulations are
deterministic. It is possible, however, to solve for the expected values of the
endogenous variables by means of “stochastic” simulation, and this procedure
will now be described. As will be seen later in this chapter, stochastic simulation
is useful for purposes other than merely solving for the expected values.
Stochastic simulation requires that an assumption be made about the distributions of the error terms and the coefficient estimates. In practice these distributions are almost always assumed to be normal, although in principle other assumptions can be made. For purposes of the present discussion the normality assumption will be made. In particular, it is assumed that u_t = (u_1t,...,u_mt)' is independently and identically distributed as multivariate N(0, Σ). Given the estimation technique, the coefficient estimates, and the data, one can estimate the covariance matrix of the error terms and the covariance matrix of the coefficient estimates. Denote these two matrices as Σ̂ and V̂, respectively. The dimension of Σ̂ is m × m, and the dimension of V̂ is K × K, where K is the total number of coefficients in the model. Σ̂ can be computed as (1/T)ÛÛ', where Û is the m × T matrix of values of the estimated error terms. The computation of V̂ depends on the estimation technique used. Given V̂ and given the normality assumption, an estimate of the distribution of the coefficient estimates is N(α̂, V̂), where α̂ is the K × 1 vector of the coefficient estimates.
Let u_t* denote a particular draw of the m error terms for period t from the N(0, Σ̂) distribution, and let α* denote a particular draw of the K coefficients from the N(α̂, V̂) distribution. Given u_t* for each period t of the simulation and given α*, one can solve the model. This is merely a deterministic simulation for the given values of the error terms and coefficients. Call this simulation a "trial". Another trial can be made by drawing a new set of values of u_t* for each period t and a new set of values of α*. This can be done as many times as desired. From each trial one obtains a prediction of each endogenous variable for each period. Let ỹ_itk^j denote the value on the jth trial of the k-period-ahead prediction of variable i from a simulation beginning in period t.2 For J trials, the estimate of the expected value of the variable, denoted ȳ_itk, is:

ȳ_itk = (1/J) Σ_{j=1}^{J} ỹ_itk^j.   (2)
In a number of studies stochastic simulation with respect to the error terms only has been performed, which means drawing only from the distribution of the error terms for a given trial. These studies include Nagar (1969); Evans, Klein, and Saito (1972); Fromm, Klein, and Schink (1972); Green, Liebenberg, and Hirsch (1972); Sowey (1973); Cooper and Fischer (1972); Cooper (1974); Garbade (1975); Bianchi, Calzolari, and Corsi (1976); and Calzolari and Corsi (1977). Studies in which stochastic simulation with respect to both the error terms and coefficient estimates has been performed include Cooper and Fischer (1974); Schink (1971), (1974); Haitovsky and Wallace (1972); Muench, Rolnick, Wallace, and Weiler (1974); and Fair (1980).
One important empirical conclusion that can be drawn from stochastic simulation studies to date is that the values computed from deterministic simulations are quite close to the mean predicted values computed from stochastic simulations. In other words, the bias that results from using deterministic simulation to solve nonlinear models appears to be small. This conclusion has been reached by Nagar (1969), Sowey (1973), Cooper (1974), Bianchi, Calzolari, and Corsi (1976), and Calzolari and Corsi (1977) for stochastic simulation with respect to the error terms only and by Fair (1980) for stochastic simulation with respect to both error terms and coefficients.
A standard way of drawing values of α* from the N(α̂, V̂) distribution is to (1) factor numerically (using a subroutine package) V̂ into PP', (2) draw (again using
2Note that t denotes the first period of the simulation, so that ỹ_itk is the prediction for period t + k − 1.

a subroutine package) K values of a standard normal random variable with mean 0 and variance 1, and (3) compute α* as α̂ + Pe, where e is the K × 1 vector of the standard normal draws. Since Eee' = I, then E(α* − α̂)(α* − α̂)' = EPee'P' = PP' = V̂, which is as desired for the distribution of α*. A similar procedure can be used to draw values of u_t* from the N(0, Σ̂) distribution: Σ̂ is factored into PP', and u_t* is computed as Pe, where e is an m × 1 vector of standard normal draws.
An alternative procedure for drawing values of the error terms, due to McCarthy (1972), has also been used in practice. For this procedure one begins with the m × T matrix of estimated error terms, Û. T standard normal random variables are then drawn, and u_t* is computed as T^{−1/2}Ûe, where e is a T × 1 vector of the standard normal draws. It is easy to show that the covariance matrix of u_t* is Σ̂, where, as above, Σ̂ is (1/T)ÛÛ'.
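
Both drawing procedures can be sketched in a few lines, with numpy's Cholesky routine playing the role of the "subroutine package"; the function names and interfaces are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)

def draw_coefficients(alpha_hat, V_hat):
    P = np.linalg.cholesky(V_hat)                # V_hat = PP'
    e = rng.standard_normal(alpha_hat.shape[0])  # K standard normal draws
    return alpha_hat + P @ e                     # alpha* ~ N(alpha_hat, V_hat)

def draw_errors_cholesky(Sigma_hat):
    P = np.linalg.cholesky(Sigma_hat)            # Sigma_hat = PP'
    return P @ rng.standard_normal(Sigma_hat.shape[0])  # u_t* ~ N(0, Sigma_hat)

def draw_errors_mccarthy(U_hat):
    # McCarthy (1972): u_t* = T^(-1/2) U_hat e, with e a T-vector of N(0,1)
    # draws; its covariance matrix is (1/T) U_hat U_hat' = Sigma_hat
    m, T = U_hat.shape
    return U_hat @ rng.standard_normal(T) / np.sqrt(T)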
An alternative procedure is also available for drawing values of the coefficients. Given the estimation period (say, 1 through T) and given Σ̂, one can draw T values of u_t* (t = 1,...,T). One can then add these errors to the model and solve the model over the estimation period (static simulation, using the original values of the coefficient estimates). The predicted values of the endogenous variables from this solution can be taken to be a new data base, from which a new set of coefficients can be estimated. This set can then be taken to be one draw of the coefficients. This procedure is more expensive than drawing from the N(α̂, V̂) distribution, since reestimation is required for each draw, but it has the advantage of not being based on a fixed estimate of the distribution of the coefficient estimates. It is, of course, based on a fixed value of Σ̂ and a fixed set of original coefficient estimates.
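
A sketch of this reestimation-based draw, reusing draw_errors_cholesky from the sketch above; solve_static and estimate stand in for the user's static model solver and estimation routine (assumed interfaces):

def coefficient_draw_by_reestimation(alpha_hat, Sigma_hat, x, solve_static,
                                     estimate, T):
    u_star = [draw_errors_cholesky(Sigma_hat) for _ in range(T)]  # T error draws
    y_pseudo = solve_static(alpha_hat, x, u_star)  # simulate the estimation period
    return estimate(y_pseudo, x)                   # re-estimate: one coefficient draw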
It should finally be noted with respect to the solution of models that in actual
forecasting situations most models are subjectively adjusted before the forecasts
are computed. The adjustments take the form of either using values other than
zero for the future error terms or using values other than the estimated values for
the coefficients. Different values of the same coefficient are sometimes used for
different periods. Adjusting the values of constant terms is equivalent to adjusting
values of the error terms, given that a different value of the constant term can be
used each period.3 Adjustments of this type are sometimes called “add factors”.
With enough add factors it is possible, of course, to have the forecasts from a
model be whatever the user wants, subject to the restriction that the identities
must be satisfied. Most add factors are subjective in that the procedure by which
they were chosen cannot be replicated by others. A few add factors are objective.
For example, the procedure of setting the future values of the error terms equal to
the average of the past two estimated values is an objective one. This procedure,

3Although much of the discussion in the literature is couched in terms of constant-term adjustments, Intriligator (1978, p. 516) prefers to interpret the adjustments as the user's estimates of the future values of the error terms.

along with another type of mechanical adjustment procedure, is used for some of
the results in Haitovsky, Treyz, and Su (1974). See also Green, Liebenberg, and
Hirsch (1972) for other examples.

3. Evaluation of ex ante forecasts

The three most common measures of predictive accuracy are root mean squared error (RMSE), mean absolute error (MAE), and Theil's inequality coefficient4 (U). Let ŷ_it be the forecast of variable i for period t, and let y_it be the actual value. Assume that observations on ŷ_it and y_it are available for t = 1,...,T. Then the measures for this variable are:

RMSE = [(1/T) Σ_{t=1}^{T} (ŷ_it − y_it)²]^{1/2},   (3)

MAE = (1/T) Σ_{t=1}^{T} |ŷ_it − y_it|,   (4)

U = [Σ_{t=1}^{T} (Δŷ_it − Δy_it)²]^{1/2} / [Σ_{t=1}^{T} (Δy_it)²]^{1/2},   (5)

where Δ in (5) denotes either absolute or percentage change. All three measures are zero if the forecasts are perfect. The MAE measure penalizes large errors less than does the RMSE measure. The value of U is one for a no-change forecast (Δŷ_it = 0). A value of U greater than one means that the forecast is less accurate than the simple forecast of no change.
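
Computed in code, measures (3)-(5) look as follows; the sketch assumes the changes in (5) are measured from the actual lagged values:

import numpy as np

def rmse(y_hat, y):
    return np.sqrt(np.mean((y_hat - y) ** 2))  # eq. (3)

def mae(y_hat, y):
    return np.mean(np.abs(y_hat - y))          # eq. (4)

def theil_u(y_hat, y, y_lag):
    d_hat, d = y_hat - y_lag, y - y_lag        # forecasted and actual changes
    return np.sqrt(np.sum((d_hat - d) ** 2) / np.sum(d ** 2))  # eq. (5)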
An important practical problem that arises in evaluating ex ante forecasting
accuracy is the problem of data revisions. Given that the data for many variables
are revised a number of times before becoming “final”, it is not clear whether the
forecast values should be compared to the first-released values, to the final values,
or to some set in between. There is no obvious answer to this problem. If the
revision for a particular variable is a benchmark revision, where the level of
the variable is revised beginning at least a few periods before the start of the
prediction period, then a common procedure is to adjust the forecast value by

4See Theil (1966, p, 28).



adding the forecasted change (Δŷ_it), which is based on the old data, to the new lagged value (y_{i,t−1}) and then comparing the adjusted forecast value to the new data. If, say, the revision took the form of adding a constant amount ȳ_i to each of the old values of y_it, then this procedure merely adds the same ȳ_i to each of the forecasted values of y_it. This procedure is often followed even if the revisions are not all benchmark revisions, on the implicit assumption that they are more like benchmark revisions than other kinds. Following this procedure also means that if forecast changes are being evaluated, as in the U measure, then no adjustments are needed.
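
The adjustment amounts to carrying the forecasted change, computed on the old data, over to the revised level; a one-line sketch:

def adjust_forecast(y_hat_old, y_lag_old, y_lag_new):
    # new lagged value plus the forecasted change from the old data
    return y_lag_new + (y_hat_old - y_lag_old)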
There are a number of studies that have examined ex ante forecasting accuracy
using one or more of the above measures. Some of the more recent studies are
McNees (1973, 1974, 1975, 1976) and Zarnowitz (1979). It is usually the case that
forecasts from both model builders and nonmodel builders are examined and
compared. A common “base” set of forecasts to use for comparison purposes is
the set from the ASA/NBER Business Outlook Survey. A general conclusion
from these studies is that there is no obvious “winner” among the various
forecasters [see, for example, Zarnowitz (1979, pp. 23, 30)]. The relative perfor-
mance of the forecasters varies considerably across variables and length ahead of
the forecast, and the differences among the forecasters for a given variable and
length ahead are generally small. This means that there is yet little evidence that
the forecasts from model builders are more accurate than, say, the forecasts from
the ASA/NBER Survey.
Ex ante forecasting comparisons are unfortunately of little interest from the
point of view of examining the predictive accuracy of models. There are two
reasons for this. The first is that the ex ante forecasts are based on guessed rather
than actual values of the exogenous variables. Given only the actual and forecast
values of the endogenous variables, there is no way of separating a given error
into that part due to bad guesses and that part due to other factors. A model
should not necessarily be penalized for bad exogenous-variable guesses from its
users. More will be said about this in Section 5. The second, and more important,
reason is that almost all the forecasts examined in these studies are generated
from subjectively adjusted models (i.e. subjective add factors are used). It is thus
the accuracy of the forecasting performance of the model builders rather than of
the models that is being examined.
Before concluding this section it is of interest to consider two further points
regarding the subjective adjustment of models. First, there is some indirect
evidence that the use of add factors is quite important in practice. The studies of
Evans, Haitovsky, and Treyz (1972) and Haitovsky and Treyz (1972) analyzing
the Wharton and OBE models found that the ex ante forecasts from the model
builders were more accurate than the ex post forecasts from the models, even
when the same add factors that were used for the ex ante forecasts were used for
the ex post forecasts. In other words, the use of actual rather than guessed values of the exogenous variables decreased the accuracy of the forecasts. This general
conclusion can also be drawn from the results for the BEA model in Table 3 in
Hirsch, Grimm, and Narasimham (1974). This conclusion is consistent with the
view that the add factors are (in a loose sense) more important than the model in
determining the ex ante forecasts: what one would otherwise consider to be an
improvement for the model, namely the use of more accurate exogenous-variable
values, worsens the forecasting accuracy.
Second, there is some evidence that the accuracy of non-subjectively adjusted
ex ante forecasts is improved by the use of actual rather than guessed values of
the exogenous variables. During the period 1970III-1973II, I made ex ante
forecasts using a short-run forecasting model [Fair (1971)]. No add factors were
used for these forecasts. The accuracy of these forecasts is examined in Fair (1974), and the results indicate that the accuracy of the forecasts is generally improved when actual rather than guessed values of the exogenous variables are used.
It is finally of interest to note, although nothing really follows from this, that
the (non-subjectively adjusted) ex ante forecasts from my forecasting model were
on average less accurate than the subjectively adjusted forecasts [McNees (1973)],
whereas the ex post forecasts (i.e. the forecasts based on the actual values of the
exogenous variables) were on average about the same degree of accuracy as the
subjectively adjusted forecasts [Fair (1974)].

4. Evaluation of ex post forecasts

The measures in (3)-(5) have also been widely used to evaluate the accuracy of
ex post forecasts. One of the more well known comparisons of ex post forecasting
accuracy is described in Fromm and Klein (1976) where eleven models are
analyzed. The standard procedure for ex post comparisons is to compute ex post
forecasts over a common simulation period, calculate for each model and variable
an error measure, and compare the values of the error measure across models. If
the forecasts are outside-sample, there is usually some attempt to have the ends
of the estimation periods for the models be approximately the same. It is
generally the case that forecasting accuracy deteriorates the further away the
forecast period is from the estimation period, and this is the reason for wanting to
make the estimation periods as similar as possible for different models.
The use of the RMSE measure, or one of the other measures, to evaluate
ex post forecasts is straightforward, and there is little more to be said about this.
Sometimes the accuracy of a given model is compared to the accuracy of a
“naive” model, where the naive model can range from the simple assumption of
no change in each variable to an autoregressive moving average (ARIMA) process
for each variable. (The comparison with the no-change model is, of course,
Ch. 33: Evaluating the Predictive Accuracy of Models 1987

already implicit in the U measure.) It is sometimes the case that turning-point


observations are examined separately, where by “ turning point” is meant a point
at which the change in a variable switches sign. There is nothing inherent in the
statistical specification of models that would lead one to examine turning points
separately, but there is a strand of the literature in which turning-point accuracy
has been emphasized.
Although the use of the RMSE or similar measure is widespread, there are two
serious problems associated with the general procedure. The first concerns the
exogenous variables. Models differ both in the number and types of variables that
are taken to be exogenous and in the sensitivity of the predicted values of the
endogenous variables to the exogenous-variable values. The procedure does not
take these differences into account. If one model is less “endogenous” than
another (say that prices are taken to be exogenous in one model but not in
another), then it has an unfair advantage in the calculation of the error measures.
The other problem concerns the fact that forecast error variances vary across
time. Forecast error variances vary across time both because of nonlinearities in
the model and because of variation in the exogenous variables. Although RMSEs
are in some loose sense estimates of the averages of the variances across time, no
rigorous statistical interpretation can be placed on them: they are not estimates of
any parameters of the model.
There is another problem associated with within-sample calculations of the
error measures, which is the possible existence of data mining. If in the process of
constructing a model one has, by running many regressions, searched diligently
for the best fitting equation for each variable, there is a danger that the equations
chosen, while providing good fits within the estimation period, are poor ap-
proximations to the true structure. Within-sample error calculations are not likely
to discover this, and so they may give a very misleading impression of the true
accuracy of the model. Outside-sample error calculations should, of course, pick
this up, and this is the reason that more weight is generally placed on outside-
sample results.
Nelson (1972) used an alternative procedure in addition to the RMSE proce-
dure in his ex post evaluation of the FRB-MIT-PENN (FMP) model. For each of
a number of endogenous variables he obtained a series of static predictions using
both the FMP model and an ARIMA model. He then regressed the actual value
of each variable on the two predicted values over the period for which the
predictions were made. Ignoring the fact that the FMP model is nonlinear, the
predictions from the model are conditional expectations based on a given
information set. If the FMP model makes efficient use of this information, then
no further information should be contained in the ARIMA predictions. The
ARIMA model for each variable uses only a subset of the information, namely,
that contained in the past history of the variable. Therefore, if the FMP model
has made efficient use of the information, the coefficient for the ARIMA predicted values should be zero. Nelson found that in general the estimates of this
coefficient were significantly different from zero. This test, while interesting,
cannot be used to compare models that differ in the number and types of
variables that are taken to be exogenous. In order to test the hypothesis of
efficient information use, the information set used by one model must be
contained in the set used by the other model, and this is in general not true for
models that differ in their exogenous variables.
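
Nelson's check reduces to a regression of the actual series on the two sets of predictions; a minimal sketch (illustrative interface, not Nelson's code):

import numpy as np

def encompassing_regression(y, yhat_model, yhat_arima):
    # regress actuals on a constant and both predictions; under efficient
    # information use, the coefficient on the ARIMA prediction should be zero
    X = np.column_stack([np.ones_like(y), yhat_model, yhat_arima])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta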

5. An alternative method for evaluating predictive accuracy

The method discussed in this section takes account of exogenous-variable uncertainty and of the fact that forecast error variances vary across time. It also deals in a systematic way with the question of the possible misspecification of the model. It accounts for the four main sources of uncertainty of a forecast: uncertainty due to (1) the error terms, (2) the coefficient estimates, (3) the exogenous-variable forecasts, and (4) the possible misspecification of the model. The method is discussed in detail in Fair (1980). The following is an outline of its main features.
Estimating the uncertainty from the error terms and coefficients can be done by means of stochastic simulation. Let σ²_itk denote the variance of the forecast error for a k-period-ahead forecast of variable i from a simulation beginning in period t. Given the J trials discussed in Section 2, a stochastic-simulation estimate of σ²_itk (denoted σ̃²_itk) is:

σ̃²_itk = (1/J) Σ_{j=1}^{J} (ỹ_itk^j − ȳ_itk)²,   (6)

where ȳ_itk is determined by (2). If an estimate of the uncertainty from the error terms only is desired, then the trials consist only of draws from the distribution of the error terms.5
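
A sketch of the computations in eqs. (2) and (6); solve_model, draw_coefficients, and draw_errors stand in for the user's model solver and the draw procedures of Section 2 (assumed interfaces):

import numpy as np

def simulation_estimates(J, solve_model, draw_coefficients, draw_errors):
    # each trial draws alpha* and the u_t*, solves the model, and returns
    # the k-period-ahead predictions as an array indexed by k
    trials = np.array([solve_model(draw_coefficients(), draw_errors())
                       for _ in range(J)])
    y_bar = trials.mean(axis=0)                       # eq. (2)
    var_tilde = ((trials - y_bar) ** 2).mean(axis=0)  # eq. (6)
    return y_bar, var_tilde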
There are two polar assumptions that can be made about the uncertainty of the
exogenous variables. One is, of course, that there is no exogenous-variable
uncertainty. The other is that the exogenous-variable forecasts are in some way as
uncertain as the endogenous-variable forecasts. Under this second assumption
one could, for example, estimate an autoregressive equation for each exogenous
variable and add these equations to the model. This expanded model, which
would have no exogenous variables, could then be used for the stochastic-simula-

5Note that it is implicitly assumed here that the variances of the forecast errors exist. For some estimation techniques this is not always the case. If in a given application the variances do not exist, then one should estimate other measures of dispersion of the distribution, such as the interquartile range or mean absolute deviation.

tion estimates of the variances. While the first assumption is clearly likely to
underestimate exogenous-variable uncertainty in most applications, the second
assumption is likely to overestimate it. This is particularly true for fiscal-policy
variables in macroeconomic models, where government-budget data are usually
quite useful for purposes of forecasting up to at least about eight quarters ahead.
The best approximation is thus likely to lie somewhere in between these two
assumptions.
The assumption that was made for the results in Fair (1980) was in between the
two polar assumptions. The procedure that was followed was to estimate an
eighth-order autoregressive equation for each exogenous variable (including a
constant and time in the equation) and then to take the estimated standard error
from this regression as the estimate of the degree of uncertainty attached to
forecasting the change in this variable for each period. This procedure ignores the
uncertainty of the coefficient estimates in the autoregressive equations, which is
one of the reasons it is not as extreme as the second polar assumption. In an
earlier stochastic-simulation study of Haitovsky and Wallace (1972), third-order
autoregressive equations were estimated for the exogenous variables, and these
equations were then added to the model. This procedure is consistent with the
second polar assumption above except that for purposes of the stochastic
simulations Haitovsky and Wallace took the variances of the error terms to be
one-half of the estimated variances. They defend this procedure (pp. 267-268) on
the grounds that the uncertainty from the exogenous-variable forecasts is likely to
be less than is reflected in the autoregressive equations.
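The autoregressive step is easy to reproduce. The following sketch (the function name and the use of ordinary least squares via NumPy are my own choices) fits an eighth-order autoregression with a constant and time trend and returns the regression standard error:

```python
import numpy as np

def ar_standard_error(x, order=8):
    """Fit an autoregression of the given order with a constant and a linear
    time trend; return the regression standard error, used as the measure of
    uncertainty in forecasting the exogenous variable x."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    rows = [np.r_[1.0, t, x[t - order:t][::-1]] for t in range(order, T)]
    X, y = np.array(rows), x[order:]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return np.sqrt(resid @ resid / (len(y) - X.shape[1]))
```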
Another possible procedure that could be used for the exogenous variables
would be to gather from various forecasting services data on their ex ante
forecasting errors of the exogenous variables (exogenous to you, not necessarily to
the forecasting service). From these errors for various periods one could estimate
a standard error for each exogenous variable and then use these errors for the
stochastic-simulation draws.
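In code this amounts to very little; with an invented history of a service's ex ante errors for one exogenous variable:

```python
import numpy as np

# Hypothetical record of a forecasting service's past ex ante errors for one
# exogenous variable; the root mean square error serves as its standard error:
ex_ante_errors = np.array([0.3, -0.5, 0.1, 0.4, -0.2, 0.6])
standard_error = np.sqrt((ex_ante_errors**2).mean())
```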
For purposes of describing the present method, all that needs to be assumed is
that some procedure is available for estimating exogenous-variable uncertainty. If
equations for the exogenous variables are not added to the model, but instead
some in between procedure is followed, then each stochastic-simulation trial
consists of draws of error terms, coefficients, and exogenous-variable errors. If
equations are added, then each trial consists of draws of error terms and
coefficients from both the structural equations and the exogenous-variable equa-
tions. In either case, let $\hat{\sigma}^2_{itk}$ denote the stochastic-simulation estimate of the
variance of the forecast error that takes into account exogenous-variable uncer-
tainty. $\hat{\sigma}^2_{itk}$ differs from $\tilde{\sigma}^2_{itk}$ in (6) in that the trials for $\hat{\sigma}^2_{itk}$ include draws of
exogenous-variable errors.
Estimating the uncertainty from the possible misspecification of the model is
the most difficult and costly part of the method. It requires successive reestima-
tion and stochastic simulation of the model. It is based on a comparison of
estimated variances computed by means of stochastic simulation with estimated
variances computed from outside-sample forecast errors.
Consider for now stochastic simulation with respect to the structural error
terms and coefficients only (no exogenous-variable uncertainty). Assume that the
forecast period begins one period after the end of the estimation period, and call
this period t. As noted above, from this stochastic simulation one obtains an
estimate of the variance of the forecast error, $\tilde{\sigma}^2_{itk}$. One also obtains from this
simulation an estimate of the expected value of the k-period-ahead forecast of
variable i: $\bar{y}_{itk}$ in equation (2). The difference between this estimate and the
actual value, $y_{i,t+k-1}$, is the mean forecast error:

$$\hat{\epsilon}_{itk} = y_{i,t+k-1} - \bar{y}_{itk}. \qquad (7)$$

If it is assumed that $\bar{y}_{itk}$ exactly equals the true expected value, then $\hat{\epsilon}_{itk}$ in
(7) is a sample draw from a distribution with a known mean of zero and variance
$\sigma^2_{itk}$. The square of this error, $\hat{\epsilon}^2_{itk}$, is thus under this assumption an unbiased
estimate of $\sigma^2_{itk}$. One thus has two estimates of $\sigma^2_{itk}$, one computed from the mean
forecast error and one computed by stochastic simulation. Let $d_{itk}$ denote the
difference between these two estimates:

$$d_{itk} = \hat{\epsilon}^2_{itk} - \tilde{\sigma}^2_{itk}. \qquad (8)$$

If it is further assumed that $\tilde{\sigma}^2_{itk}$ exactly equals the true value $\sigma^2_{itk}$, then $d_{itk}$ is the
difference between the estimated variance based on the mean forecast error and
the true variance. Therefore, under the two assumptions of no error in the
stochastic-simulation estimates, the expected value of $d_{itk}$ is zero.
The assumption of no stochastic-simulation error, i.e. that $\bar{y}_{itk}$ equals the true
expected value and that $\tilde{\sigma}^2_{itk}$ equals $\sigma^2_{itk}$,
is obviously only approximately correct at best. Even with an infinite number of
draws the assumption would not be correct because the draws are from estimated
rather than known distributions. It does seem, however, that the error introduced
by this assumption is likely to be small relative to the error introduced by the fact
that some assumption must be made about the mean of the distribution of $d_{itk}$.
Because of this, nothing more will be said about stochastic-simulation error. The
emphasis instead is on the possible assumptions about the mean of the distribu-
tion of $d_{itk}$, given the assumption of no stochastic-simulation error.
The procedure just described uses a given estimation period and a given
forecast period. Assume for the sake of an example that one has data from period 1
through 100. The model can then be estimated through, say, period 70, with the
forecast period beginning with period 71. Stochastic simulation for the forecast
period will yield for each i and k a value of $d_{i,71,k}$ in (8). The model can then be
reestimated through period 71, with the forecast period now beginning with
period 72. Stochastic simulation for this forecast period will yield for each i and k
a value of $d_{i,72,k}$ in (8). This process can be repeated through the estimation period
ending with period 99. For the one-period-ahead forecast (k = 1) the procedure
will yield for each variable i 30 values of $d_{it1}$ (t = 71,...,100); for the two-
period-ahead forecast (k = 2) it will yield 29 values of $d_{it2}$ (t = 72,...,100); and
so on. If the assumption of no simulation error holds for all t, then the expected
value of $d_{itk}$ is zero for all t.
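The bookkeeping of this rolling exercise might look as follows; `estimate` and `stoch_sim` are placeholders for the model's own estimation and stochastic-simulation routines, and the indexing assumes periods are numbered from 1:

```python
import numpy as np

def rolling_d(data, estimate, stoch_sim, first_end=70, last_end=99, K=8):
    """Successive reestimation sketch for one variable.  `estimate(data, end)`
    and `stoch_sim(params, start, K)` stand in for the model's own routines;
    the latter is assumed to return (y_bar, sigma2_tilde), each of length K.
    Periods are numbered from 1, so period p is data[p-1].
    Returns a dict mapping (t, k) -> d_itk of eq. (8)."""
    d = {}
    for end in range(first_end, last_end + 1):
        t = end + 1                               # forecast period begins here
        params = estimate(data, end)              # reestimate through `end`
        y_bar, sig2 = stoch_sim(params, t, K)
        for k in range(1, K + 1):
            if t + k - 1 > len(data):             # actual value not yet observed
                continue
            eps_hat = data[t + k - 2] - y_bar[k - 1]   # eq. (7)
            d[(t, k)] = eps_hat**2 - sig2[k - 1]       # eq. (8)
    return d
```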
The discussion so far is based on the assumption that the model is correctly
specified. Misspecification has two effects on $d_{itk}$ in (8). First, if the model is
misspecified, the estimated covariance matrices that are used for the stochastic
simulation will not in general be unbiased estimates of the true covariance
matrices. The estimated variances computed by means of stochastic simulation
will thus in general be biased. Second, the estimated variances computed from the
forecast errors will in general be biased estimates of the true variances. Since
misspecification affects both estimates, the effect on $d_{itk}$ is ambiguous. It is
possible for misspecification to affect the two estimates in the same way and thus
leave the expected value of the difference between them equal to zero. In general,
however, this does not seem likely, and so in general one would not expect the
expected value of $d_{itk}$ to be zero for a misspecified model. The expected value
may be negative rather than positive for a misspecified model, although in general
it seems more likely that it will be positive. Because of the possibility of data
mining, misspecification seems more likely to have a larger positive effect on the
outside-sample forecast errors than on the (within-sample) estimated covariance
matrices.
An examination of how the $d_{itk}$ values change over time (for a given i and k)
may reveal information about the strengths and weaknesses of the model that one
would otherwise not have. This information may then be useful in future work on
the model. The individual values may thus be of interest in their own right aside
from their possible use in estimating total predictive uncertainty.
For the total uncertainty estimates some assumption has to be made about how
misspecification affects the expected value of $d_{itk}$. For the results in Fair (1980)
it was assumed that the expected value of $d_{itk}$ is constant across time: for a given
i and k, misspecification was assumed to affect the mean of the distribution of
$d_{itk}$ in the same way for all t. Other assumptions are, of course, possible.
One could, for example, assume that the mean of the distribution is a function of
other variables. (A simple assumption in this respect is that the mean follows a
linear time trend.) Given this assumption, the mean can then be estimated from a
regression of $d_{itk}$ on the variables. For the assumption of a constant mean, this
regression is merely a regression on a constant (i.e. the estimated constant term is
merely the mean of the $d_{itk}$ values).6 The predicted value from this regression for
period t, denoted $\bar{d}_{itk}$, is the estimated mean for period t.

6 For the results in Fair (1980) a slightly different assumption than that of a constant mean was
made for variables with trends. For these variables it was assumed that the mean of $d_{itk}$ is
proportional to $\bar{y}^2_{itk}$, i.e. that the mean of $d_{itk}/\bar{y}^2_{itk}$ is constant across time.

An estimate of the total variance of the forecast error, denoted $\hat{\sigma}^{*2}_{itk}$, is the sum
of $\hat{\sigma}^2_{itk}$ - the stochastic-simulation estimate of the variance due to the error terms,
coefficient estimates, and exogenous variables - and $\bar{d}_{itk}$:

$$\hat{\sigma}^{*2}_{itk} = \hat{\sigma}^2_{itk} + \bar{d}_{itk}. \qquad (9)$$

Since the procedure in arriving at $\hat{\sigma}^{*2}_{itk}$ takes into account the four main sources of
uncertainty of a forecast, the values of $\hat{\sigma}^{*2}_{itk}$ can be compared across models for a
given i, k, and t. If, for example, one model has consistently smaller values of $\hat{\sigma}^{*2}_{itk}$
than another, this would be fairly strong evidence for concluding that it is a more
accurate model, i.e. a better approximation to the true structure.
This completes the outline of the method. It may be useful to review the main
steps involved in computing $\hat{\sigma}^{*2}_{itk}$ in (9). Assume that data are available for periods
1 through T and that one is interested in estimating the uncertainty of an
eight-period-ahead forecast that begins in period T+1 (i.e. in computing $\hat{\sigma}^{*2}_{itk}$ for
t = T+1 and k = 1,...,8). Given a base set of values for the exogenous variables
for periods T+1 through T+8, one can compute $\hat{\sigma}^2_{itk}$ for t = T+1 and k = 1,...,8
by means of stochastic simulation. Each trial consists of one eight-period dynamic
simulation and requires draws of the error terms, coefficients, and exogenous-vari-
able errors. These draws are based on the estimate of the model through period T.
This is the relatively inexpensive part of the method. The expensive part consists of
the successive reestimation and stochastic simulation of the model that are needed
in computing the $d_{itk}$ values. In the above example, the model would be
estimated 30 times and stochastically simulated 30 times in computing the $d_{itk}$
values. After these values are computed for, say, periods T - r through T, then
$\bar{d}_{itk}$ can be computed for t = T+1 and k = 1,...,8 using whatever assumption has
been made about the distribution of $d_{itk}$. This allows $\hat{\sigma}^{*2}_{itk}$ in (9) to be computed
for t = T+1 and k = 1,...,8.
In the successive reestimation of the model, the first period of the estimation
period may or may not be increased by one each time. The criterion that one
should use in deciding this is to pick the procedure that seems likely to corre-
spond to the chosen assumption about the distribution of $d_{itk}$ being the best
approximation to the truth. It is also possible to take the distance between the last
period of the estimation period and the first period of the forecast period to be
other than one, as was done above.
It is important to note that the above estimate of the mean of the $d_{itk}$
distribution is not in general efficient because the error term in the $d_{itk}$ regression
is in general heteroscedastic. Even under the null hypothesis of no misspecifica-
tion, the variance of the $d_{itk}$ distribution is not constant across time. It is true,
however, that $\hat{\epsilon}_{itk}/(\tilde{\sigma}^2_{itk} + \bar{d}_{itk})^{1/2}$ has unit variance under the null hypothesis,
and so it may not be a bad approximation to assume that $\hat{\epsilon}^2_{itk}/(\tilde{\sigma}^2_{itk} + \bar{d}_{itk})$ has a
constant variance across time. This then suggests the following iterative proce-
dure. 1) For each i and k, calculate $\bar{d}_{itk}$ from the $d_{itk}$ regression, as discussed
above; 2) divide each observation in the $d_{itk}$ regression by $\tilde{\sigma}^2_{itk} + \bar{d}_{itk}$, run another
regression, and calculate $\bar{d}_{itk}$ from this regression; 3) repeat step 2) until the
successive estimates of $\bar{d}_{itk}$ are within some prescribed tolerance level. Litterman
(1980) has carried out this procedure for a number of models for the case in
which the only explanatory variable in the $d_{itk}$ regression is the constant term (i.e.
for the case in which the null hypothesis is that the mean of the $d_{itk}$ distribution
is constant across time).
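For the constant-mean case the iteration reduces to repeatedly reweighted averaging; a sketch (dividing each observation and the constant regressor by $\tilde{\sigma}^2_{itk} + \bar{d}_{itk}$ is equivalent to weighted least squares with the inverse squares of these terms as weights):

```python
import numpy as np

def iterate_d_bar(d, sigma2_tilde, tol=1e-8, max_iter=100):
    """Iterative constant-mean estimate of d_itk for one i and k; `d` and
    `sigma2_tilde` are arrays indexed by t."""
    d_bar = d.mean()                # step 1: unweighted regression on a constant
    for _ in range(max_iter):
        w = 1.0 / (sigma2_tilde + d_bar) ** 2   # step 2: reweight
        new_d_bar = (w * d).sum() / w.sum()     # weighted mean = WLS on a constant
        if abs(new_d_bar - d_bar) < tol:        # step 3: prescribed tolerance
            return new_d_bar
        d_bar = new_d_bar
    return d_bar
```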
If one is willing to assume that $\hat{\epsilon}_{itk}$ is normally distributed, which is at best
only an approximation, then Litterman (1979) has shown that the above iterative
procedure produces maximum likelihood estimates. He has used this assumption
in Litterman (1980) to test the hypothesis (using a likelihood ratio test) that the
mean of the $d_{itk}$ distribution is the same in the first and second halves of the
sample period. The hypothesis was rejected at the 5 percent level in only 3 of 24
tests. These results thus suggest that the assumption of a constant mean of the
$d_{itk}$ distribution may not be a bad approximation in many cases. This conclusion
was also reached for the results in Fair (1982), where plots of $d_{itk}$ values were
examined across time (for a given i and k). There was little evidence from these
plots that the mean was changing over time.
The mean of the $d_{itk}$ distribution can be interpreted as a measure of the
average unexplained forecast error variance (i.e. that part not explained by $\tilde{\sigma}^2_{itk}$)
rather than as a measure of misspecification. Using this interpretation, Litterman
(1980) has examined whether the use of the estimated means of the $d_{itk}$ distribu-
tions leads to more accurate estimates of the forecast error variances. The results
of his tests, which are based on the normality assumption, show that substantially
more accurate estimates are obtained using the estimated means. Litterman's
method discussed in this section.
Aside from Litterman’s use of the method to compare various versions of Sims’
(1980) model, I have used the method to compare my model [Fair (1976)],
Sargent’s (1976) model, Sims’ model, and an eighth-order autoregressive model.
The results of this comparison are presented in Fair (1979).

6. Conclusion

It should be clear from this chapter that the comparison of the predictive
accuracy of alternative models is not a straightforward exercise. The difficulty of
evaluating alternative models is undoubtedly one of the main reasons there is
currently so little agreement about which model best approximates the true
structure of the economy. If it were easy to decide whether one model is more
accurate than another, there would probably be by now a generally agreed upon
model of, for example, the U.S. economy. With further work on methods like the
one described in Section 5, however, it may be possible in the not-too-distant
future to begin a more systematic comparison of models. Perhaps in ten or twenty
years' time the use of these methods will have considerably narrowed the current
range of disagreements.

References

Bianchi, C., G. Calzolari and P. Corsi (1976) “Divergences in the Results of Stochastic and
Deterministic Simulation of an Italian Non Linear Econometric Model”, in: L. Dekker, ed.,
Simulation of Systems. Amsterdam: North-Holland Publishing Co.
Calzolari, G. and P. Corsi (1977) "Stochastic Simulation as a Validation Tool for Econometric
Models”. Paper presented at IIASA Seminar, Laxenburg, Vienna, September 13-15.
Cooper, J. P. (1974) Development of the Monetary Sector, Prediction and Policy Analysis in the
FRB-MIT-Penn Model. Lexington: D. C. Heath & Co.
Cooper, J. P. and S. Fischer (1972) "Stochastic Simulation of Monetary Rules in Two Macroecono-
metric Models”, Journal of the American Statistical Association, 67, 750-760.
Cooper, J. P. and S. Fischer (1974) "Monetary and Fiscal Policy in the Fully Stochastic St. Louis
Econometric Model", Journal of Money, Credit and Banking, 6, 1-22.
Evans, Michael K., Yoel Haitovsky and George I. Treyz, assisted by Vincent Su (1972) “An Analysis
of the Forecasting Properties of U.S. Econometric Models”, in: B. G. Hickman, ed., Econometric
Models of Cyclical Behavior. New York: Columbia University Press, 949-1139.
Evans, M. K., L. R. Klein and M. Saito (1972) "Short-Run Prediction and Long-Run Simulation of
the Wharton Model", in: B. G. Hickman, ed., Econometric Models of Cyclical Behavior. New York:
Columbia University Press, 139-185.
Fair, Ray C. (1971) A Short-Run Forecastrng Model of the United States Economy. Lexington: D. C.
Heath & Co.
Fair, Ray C. (1974) “An Evaluation of a Short-Run Forecasting Model”, International Economic
Review, 15, 285-303.
Fair, Ray C. (1976) A Model of Macroeconomic Activity. Volume II: The Empirical Model. Cambridge:
Ballinger Publishing Co.
Fair, Ray C. (1979) “An Analysis of the Accuracy of Four Macroeconometric Models”, Journal of
Political Economy, 87, 701-718.
Fair, Ray C. (1980) “Estimating the Expected Predictive Accuracy of Econometric Models,” Interna-
tional Economic Review, 21, 355-378.
Fair, Ray C. (1982) "The Effects of Misspecification on Predictive Accuracy," in: G. C. Chow and P.
Corsi, eds., Evaluating the Reliability of Macro-economic Models. New York: John Wiley & Sons,
193-213.
Fromm, Gary and Lawrence R. Klein (1976) "The NBER/NSF Model Comparison Seminar: An
Analysis of Results", Annals of Economic and Social Measurement, Winter, 5, 1-28.
Fromm, Gary, L. R. Klein and G. R. Schink (1972) "Short- and Long-Term Simulations with the
Brookings Model", in: B. G. Hickman, ed., Econometric Models of Cyclical Behavior. New York:
Columbia University Press, 201-292.
Garbade, K. D. (1975) Discretionary Control of Aggregate Economic Activity. Lexington: D. C. Heath
& Co.
Granger, C. W. J. and Paul Newbold (1977) Forecasting Economic Time Series. New York: Academic
Press.
Green, G. R., M. Liebenberg and A. A. Hirsch (1972) “Short- and Long-Term Simulations with the
OBE Econometric Model", in: B. G. Hickman, ed., Econometric Models of Cyclical Behavior. New
York: Columbia University Press, 25-123.
Haitovsky, Yoel and George Treyz (1972) "Forecasts with Quarterly Macroeconometric Models:
Equation Adjustments, and Benchmark Predictions: The U.S. Experience", The Review of Economics
and Statistics, 54, 317-325.
Haitovsky, Yoel, G. Treyz and V. Su (1974) Forecasts with Quarterly Macroeconometric Models. New
York: National Bureau of Economic Research, Columbia University Press.
Haitovsky, Y. and N. Wallace (1972) “A Study of Discretionary and Non-discretionary Monetary and
Fiscal Policies in the Context of Stochastic Macroeconometric Models”, in: V. Zarnowitz, ed., The
Business Cycle Today. New York: Columbia University Press.
Hirsch, Albert A., Bruce T. Grimm and Gorti V. L. Narasimham (1974) "Some Multiplier and Error
Characteristics of the BEA Quarterly Model", International Economic Review, 15, 616-631.
Howrey, E. P. and H. H. Kelejian (1971) “Simulation versus Analytical Solutions: The Case of
Econometric Models”, in: T. H. Naylor, ed., Computer Simulation Experiments with Models of
Economic Systems. New York: Wiley.
Intriligator, Michael D. (1978) Econometric Models, Techniques, and Applications. Amsterdam:
North-Holland Publishing Co.
Litterman, Robert B. (1979) “Techniques of Forecasting Using Vector Autoregression”. Working
Paper No. 115, Federal Reserve Bank of Minneapolis, November.
Litterman, Robert B. (1980) “Improving the Measurement of Predictive Accuracy”, mimeo.
McCarthy, Michael D. (1972) “Some Notes on the Generation of Pseudo-Structural Errors for Use in
Stochastic Simulation Studies", in: B. G. Hickman, ed., Econometric Models of Cyclical Behavior.
New York: Columbia University Press, 185-191.
McNees, Stephen K. (1973) “The Predictive Accuracy of Econometric Forecasts”, New England
Economic Review, September/October, 3-22.
McNees, Stephen K. (1974) “How Accurate Are Economic Forecasts?“, New England Economic
Review, November/December, 2-19.
McNees, Stephen K. (1975) "An Evaluation of Economic Forecasts", New England Economic Review,
November/December, 3-39.
McNees, Stephen K. (1976) “An Evaluation of Economic Forecasts: Extension and Update”, New
England Economic Review, September/October, 30-44.
Muench, T., A. Rolnick, N. Wallace and W. Weiler (1974) “Tests for Structural Change and
Prediction Intervals for the Reduced Forms of the Two Structural Models of the U.S.: The
FRB-MIT and Michigan Quarterly Models", Annals of Economic and Social Measurement, 3,
491-519.
Nagar, A. L. (1969) "Stochastic Simulation of the Brookings Econometric Model", in: J. S.
Duesenberry, G. Fromm, L. R. Klein, and E. Kuh, eds., The Brookings Model: Some Further Results.
Chicago: Rand McNally & Co.
Nelson, Charles R. (1972) “The Prediction Performance of the FRB-MIT-PENN Model of the U.S.
Economy”, The American Economic Review, 62, 902-917.
Sargent, Thomas J. (1976) “A Classical Macroeconometric Model for the United States”, Journal of
Political Economy, 84, 207-237.
Schink, G. R. (1971) “Small Sample Estimates of the Variance-Covariance Matrix Forecast Error for
Large Econometric Models: The Stochastic Simulation Technique”, Ph.D. Dissertation, University
of Pennsylvania.
Schink, G. R. (1974) "Estimation of Small Sample Forecast Error for Nonlinear Dynamic Models: A
Stochastic Simulation Approach", mimeo.
Sims, Christopher A. (1980) "Macroeconomics and Reality", Econometrica, 48, 1-48.
Sowey, E. R. (1973) "Stochastic Simulation for Macroeconomic Models: Methodology and Interpreta-
tion", in: A. A. Powell and R. W. Williams, eds., Econometric Studies of Macro and Monetary
Relations. Amsterdam: North-Holland Publishing Co.
Theil, Henri (1966) Applied Economic Forecasting. Amsterdam: North-Holland Publishing Co.
Zarnowitz, Victor (1979) "An Analysis of Annual and Multiperiod Quarterly Forecasts of Aggregate
Income, Output, and the Price Level", Journal of Business, 52, 1-33.
Chapter 34

NEW ECONOMETRIC APPROACHES TO STABILIZATION


POLICY IN STOCHASTIC MODELS OF MACROECONOMIC
FLUCTUATIONS
JOHN B. TAYLOR*

Stanford University

Contents

1. Introduction 1998
2. Solution concepts and techniques 1998
2.1. Scalar models 1999
2.2. Bivariate models 2016
2.3. The use of operators, generating functions, and z-transforms 2031
2.4. Higher order representations and factorization techniques 2033
2.5. Rational expectations solutions as boundary value problems 2037
3. Econometric evaluation of policy rules 2038
3.1. Policy evaluation for a univariate model 2039
3.2. The Lucas critique and the Cowles Commission critique 2040
3.3. Game-theoretic approaches 2041
4. Statistical inference 2041
4.1. Full information estimation 2041
4.2. Identification 2043
4.3. Hypothesis testing 2044
4.4. Limited information estimation methods 2044
5. General linear models 2045
5.1. A general first-order vector model 2045
5.2. Higher order vector models 2047
6. Techniques for nonlinear models 2048
6.1. Multiple shooting method 2049
6.2. Extended path method 2049
6.3. Nonlinear saddle path manifold method 2050
7. Concluding remarks 2051
References 2052

*Grants from the National Science Foundation and the Guggenheim Foundation are gratefully
acknowledged. I am also grateful to Olivier Blanchard, Gregory Chow, Avinash Dixit, George Evans,
Zvi Griliches, Sandy Grossman, Ben McCallum, David Papell, Larry Reed, Philip Reny, and Ken
West for helpful discussions and comments on an earlier draft.

Handbook of Econometrics, Volume III, Edited by Z. Griliches and M.D. Intriligator
© Elsevier Science Publishers B.V., 1986

1. Introduction

During the last 15 years econometric techniques for evaluating macroeconomic
policy using dynamic stochastic models in which expectations are consistent, or
rational, have been developed extensively. Designed to solve, control, estimate, or
test such models, these techniques have become essential for theoretical and
applied research in macroeconomics. Many recent macro policy debates have
taken place in the setting of dynamic rational expectations models. At their best
they provide a realistic framework for evaluating policy and empirically testing
assumptions and theories. At their worst, they serve as a benchmark from which
the effect of alternative assumptions can be examined. Both “new Keynesian”
theories with sticky prices and rational expectations, as well as “new Classical”
theories with perfectly flexible prices and rational expectations fall within the
domain of such models. Although the models entail very specific assumptions
about expectation formation and about the stochastic processes generating the
macroeconomic time series, they may serve as an approximation in other cir-
cumstances where the assumptions do not literally hold.
The aim of this chapter is to describe and explain these recently developed
policy evaluation techniques. The focus is on discrete time stochastic models,
though some effort is made to relate the methods to the geometric approach (i.e.
phase diagrams and saddlepoint manifolds) commonly used in theoretical con-
tinuous time models. The exposition centers around a number of specific proto-
type rational expectations models. These models are useful for motivating the
solution methods and are of some practical interest per se. Moreover, the
techniques for analyzing these prototype models can be adapted fairly easily to
more general models. Rational expectations techniques are much like techniques
to solve differential equations: once some of the basic ideas, skills, and tricks are
learned, applying them to more general or higher order models is straightforward
and, as in many differential equations texts, might be left as exercises.
Solution methods for several prototype models are discussed in Section 2. The
effects of anticipated, unanticipated, temporary, or permanent changes in the
policy variables are calculated. The stochastic steady state solution is derived, and
the possibility of non-uniqueness is discussed. Evaluation of policy rules and
estimation techniques oriented toward the prototype models are discussed in
Sections 3 and 4. Techniques for general linear and nonlinear models are
discussed in Sections 5 and 6.

2. Solution concepts and techniques

The sine qua non of a rational expectations model is the appearance of forecasts
of events based on information available before the events take place. Many
different techniques have been developed to solve such models. Some of these
techniques are designed for large models with very general structures. Others are
designed to be used in full information estimation where a premium is placed on
computing reduced form parameters in terms of structural parameters as quickly
and efficiently as possible. Others are short-cut methods designed to exploit
special features of a particular model. Still others are designed for exposition
where a premium is placed on analytic tractability and intuitive appeal. Graphical
methods fall in this last category.
In this section, I examine the basic solution concept and explain how to obtain
the solutions of some typical linear rational expectations models. For expositional
purposes I feel the method of undetermined coefficients is most useful. This
method is used in time series analysis to convert stochastic difference equations
into deterministic difference equations in the coefficients of the infinite moving
average representation. [See Anderson (1971, p. 236) or Harvey (1981, p. 38)]. The
difference equations in the coefficients have exactly the same form as a determin-
istic version of the original model, so that the method can make use of techniques
available to solve deterministic difference equations. This method was used by
Muth (1961) in his original exposition of the rational expectations assumption. It
provides a general unified treatment of most stochastic rational expectations
models without requiring knowledge of any advanced techniques, and it clearly
reveals the nature of the assumptions necessary for existence and uniqueness of
solutions. It also allows for different viewpoint dates for expectations, and
provides an easy way to distinguish between the effects of anticipated versus
unanticipated policy shifts. The method gives the solution in terms of an infinite
moving average representation which is also convenient for comparing a model’s
properties with the data as represented in estimated infinite moving average
representations. An example of such a comparison appears in Taylor (1980b). An
infinite moving average representation, however, is not useful for maximum
likelihood estimation for which a finite ARMA model is needed. Although it is
usually easy to convert an infinite moving average model into a finite ARMA
model, there are computationally more advantageous ways to compute the
ARMA model directly as we will describe below.

2.1. Scalar models

Let $y_t$ be a random variable satisfying the relationship

$$y_t = \alpha E_t y_{t+1} + \delta u_t, \qquad (2.1)$$

where $\alpha$ and $\delta$ are parameters and $E_t$ is the conditional expectation based on all
information through period t. The variable $u_t$ is an exogenous shift variable or
"shock" to the equation. It is assumed to follow a general linear process with the
representation

$$u_t = \sum_{i=0}^{\infty} \theta_i \varepsilon_{t-i}, \qquad (2.2)$$

where $\theta_i$, $i = 0,1,2,\ldots$, is a sequence of parameters, and where $\varepsilon_t$ is a serially
uncorrelated random variable with zero mean. The shift variable could represent
a policy variable or a stochastic error term as in an econometric equation. In the
latter case, $\delta$ would normally be set to 1.
The information upon which the expectation in (2.1) is conditioned includes
past and current observations on $\varepsilon_t$ as well as the values of $\alpha$, $\delta$, and the $\theta_i$. The
presence of the expected value of a future endogenous variable $E_t y_{t+1}$ is
emphasized in this prototype model because the dynamic properties that this
variable gives to the model persist in more complicated models and raise many
important conceptual issues. Solving the model means finding a stochastic process
for the random variable $y_t$ that satisfies eq. (2.1). The forecasts generated by this
sense, expectations are consistent with the model, or equivalently, expectations
are rational.

A macroeconomic example. An important illustration of eq. (2.1) is a classical
full-employment macro model with flexible prices. In such a model the real rate of
interest and real output are unaffected by monetary policy and thus they can be
considered fixed constants. The demand for real money balances - normally a
function of the nominal interest rate and total output - is therefore a function
only of the expected inflation rate. If $p_t$ is the log of the price level and $m_t$ is the
log of the money supply, then the demand for real money can be represented as

$$m_t - p_t = -\beta E_t(p_{t+1} - p_t), \qquad (2.3)$$

with $\beta > 0$. In other words, the demand for real money balances depends
negatively on the expected rate of inflation, as approximated by the expected first
difference of the log of the price level. Eq. (2.3) can be written in the form of eq.
(2.1) by setting $\alpha = \beta/(1+\beta)$ and $\delta = 1/(1+\beta)$, and by letting $y_t = p_t$ and
$u_t = m_t$. In this example the variable $u_t$ represents shifts in the supply of money,
as generated by the process (2.2). Alternatively, we could add an error term $v_t$ to
the right hand side of eq. (2.3), to represent shifts in the demand for money. Eq.
(2.3) was originally introduced in the seminal work by Cagan (1956), but with
adaptive, rather than rational expectations. The more recent rational expectations
version has been used by many researchers including Sargent and Wallace (1973).
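The mapping into the form (2.1) is a one-line rearrangement of (2.3):

$$m_t - p_t = -\beta\bigl(E_t p_{t+1} - p_t\bigr) \;\Longrightarrow\; (1+\beta)p_t = \beta E_t p_{t+1} + m_t \;\Longrightarrow\; p_t = \frac{\beta}{1+\beta}E_t p_{t+1} + \frac{1}{1+\beta}m_t.$$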
2.1.1. Some economic policy interpretations of the shocks

The stochastic process for the shock variable $u_t$ is assumed in eq. (2.2) to have a
general form. This form includes any stationary ARMA process [see Harvey
(1981), p. 27, for example]. For empirical applications this generality is necessary
because both policy variables and shocks to equations frequently have com-
plicated time series properties. In many policy applications (where $u_t$ in (2.2) is a
policy variable), one is interested in “thought experiments” in which the policy
variable is shifted in a special way and the response of the endogenous variables is
examined. In standard econometric model methodology, such thought experi-
ments require one to calculate policy multipliers [see Chow (1983), p. 147, for
example]. In forward-looking rational expectations models, the multipliers depend
not only on whether the shift in the policy variable is temporary or permanent,
but also on whether it is anticipated or unanticipated. Eq. (2.2) can be given a
special form to characterize these different thought experiments, as the following
examples indicate.
Temporary versus permanent shocks. The shock $u_t$ is purely temporary when
$\theta_0 = 1$ and $\theta_i = 0$ for $i > 0$. Then any shock $u_t$ is expected to disappear in the
period immediately after it has occurred; that is, $E_t u_{t+i} = 0$ for $i > 0$ at every
realization of $u_t$. At the other extreme the shock $u_t$ is permanent when $\theta_i = 1$ for
$i > 0$. Then any shock $u_t$ is expected to remain forever; that is, $E_t u_{t+i} = u_t$ for
$i > 0$ at every realization of $u_t$. In this permanent case the $u_t$ process can be
written as $u_t = u_{t-1} + \varepsilon_t$. (Although $u_t$ is not a stationary process in this case, the
solution can still be used for thought experiments, or transformed into a sta-
tionary series by first-differencing.)
By setting $\theta_i = \rho^i$, a range of intermediate persistence assumptions can be
modeled as $\rho$ varies from 0 to 1. For $0 < \rho < 1$ the shock $u_t$ is assumed to phase
out geometrically. In this case the $u_t$ process is simply $u_t = \rho u_{t-1} + \varepsilon_t$, a first
order autoregressive model. When $\rho = 0$, the disturbances are purely temporary.
When $\rho = 1$, they are permanent.
Anticipated versus unanticipated shocks. In policy applications it is also im-
portant to distinguish between anticipated and unanticipated shocks. Time delays
between the realization of the shock and its incorporation in the current informa-
tion set can be introduced for this purpose by setting $\theta_i = 0$ for values of $i$ up to
the length of time of anticipation. For example, in the case of a purely temporary
shock, we can set $\theta_0 = 0$, $\theta_1 = 1$, $\theta_i = 0$ for $i > 1$ so that $u_t = \varepsilon_{t-1}$. This would
characterize a temporary shock which is anticipated one period in advance. In
other words the expectation of $u_{t+1}$ at time t is equal to $u_{t+1}$ because $\varepsilon_t = u_{t+1}$ is
in the information set at time t. More generally a temporary shock anticipated k
periods in advance would be represented by $u_t = \varepsilon_{t-k}$.
A permanent shock which is anticipated k periods in advance would be
modeled by setting $\theta_i = 0$ for $i = 0,1,\ldots,k-1$ and $\theta_i = 1$ for $i = k, k+1,\ldots$.
2002 J. B. Taylor

Table 1
Summary of alternative policies and their effects.

Model: $y_t = \alpha E_t y_{t+1} + \delta u_t$, $|\alpha| < 1$.

Policy: $u_t = \sum_{i=0}^{\infty} \theta_i \varepsilon_{t-i}$; $\theta_i = \dfrac{d u_{t+i}}{d \varepsilon_t}$, $i = 0,1,\ldots$.

Solution Form: $y_t = \sum_{i=0}^{\infty} \gamma_i \varepsilon_{t-i}$; $\gamma_i = \dfrac{d y_{t+i}}{d \varepsilon_t}$, $i = 0,1,\ldots$.

Stochastics: $\varepsilon_t$ is serially uncorrelated with zero mean.

Thought Experiment: One-time unit impulse to $\varepsilon_t$.

Theorem: For every integer $k \ge 0$, if

$$\theta_i = \begin{cases} 0 & \text{for } i < k, \\ \rho^{i-k} & \text{for } i \ge k, \end{cases}$$

then

$$\gamma_i = \begin{cases} \dfrac{\delta\alpha^{k-i}}{1-\alpha\rho} & \text{for } i < k, \\[1ex] \dfrac{\delta\rho^{i-k}}{1-\alpha\rho} & \text{for } i \ge k. \end{cases}$$

Interpretation:
Policy is anticipated k periods in advance; k = 0 means unanticipated.
Policy is phased out at geometric rate $\rho$, $0 \le \rho \le 1$;
$\rho = 0$ means purely temporary (N.B. $\rho^0 = 1$ when $\rho = 0$);
$\rho = 1$ means permanent.

Similarly, a shock which is anticipated k periods in advance and which is then
expected to phase out gradually would be modeled by setting $\theta_i = 0$ for
$i = 0,1,\ldots,k-1$ and $\theta_i = \rho^{i-k}$ for $i = k, k+1,\ldots$, with $0 < \rho < 1$. In this case (2.2)
can be written alternatively as $u_t = \rho u_{t-1} + \varepsilon_{t-k}$, a first-order autoregressive model
with a time delay.
The various categories of shocks and their mathematical representations are
summarized in Table 1. Although in practice we interpret $\varepsilon_t$ in eq. (2.2) as a
continually perturbed random variable, for these thought experiments we examine
the effect of a one-time unit impulse to $\varepsilon_t$. The solution for $y_t$ derived below can
be used to calculate the effects on $y_t$ of such single realizations of $\varepsilon_t$.
2.1.2. Finding the solution

In order to find a solution for $y_t$ (that is, a stochastic process for $y_t$ which satisfies
the model (2.1) and (2.2)), we begin by representing $y_t$ in the unrestricted infinite
moving average form

$$y_t = \sum_{i=0}^{\infty} \gamma_i \varepsilon_{t-i}. \qquad (2.4)$$

Finding a solution for $y_t$ then requires determining values for the undetermined
coefficients $\gamma_i$ such that eqs. (2.1) and (2.2) are satisfied. Current and past $\varepsilon_t$
represent the entire history of the perturbations to the model. Eq. (2.4) simply
states that $y_t$ is a general function of all possible events that may potentially
influence $y_t$. The linear form is used in (2.4) because the model is linear.
Note that the solution for $y_t$ in eq. (2.4) can easily be used to calculate the effect
of a one-time unit shock to $\varepsilon_t$. The dynamic impact of such a shock is simply
$dy_{t+i}/d\varepsilon_t = \gamma_i$.
To find the unknown coefficients, the most direct procedure is to substitute for
$y_t$ and $E_t y_{t+1}$ in (2.1) using (2.4) and solve for the $\gamma_i$ in terms of $\alpha$, $\delta$ and $\theta_i$. The
conditional expectation $E_t y_{t+1}$ is obtained by leading (2.4) by one period and
taking expectations, making use of the equalities $E_t\varepsilon_{t+i} = 0$ for $i > 0$ and
$E_t\varepsilon_{t+i} = \varepsilon_{t+i}$ for $i \le 0$. The first equality follows from the assumption that $\varepsilon_t$
has a zero unconditional mean and is uncorrelated; the second follows from the fact
that $\varepsilon_{t+i}$ for $i \le 0$ is in the conditioning set at time t. The conditional expectation is

$$E_t y_{t+1} = \sum_{i=1}^{\infty} \gamma_i \varepsilon_{t-i+1}. \qquad (2.5)$$

Substituting (2.2), (2.4) and (2.5) into (2.1) results in

$$\sum_{i=0}^{\infty} \gamma_i \varepsilon_{t-i} = \alpha \sum_{i=1}^{\infty} \gamma_i \varepsilon_{t-i+1} + \delta \sum_{i=0}^{\infty} \theta_i \varepsilon_{t-i}. \qquad (2.6)$$

Equating the coefficients of $\varepsilon_t, \varepsilon_{t-1}, \varepsilon_{t-2},\ldots$ on both sides of the equality (2.6)
results in the set of equations

$$\gamma_i = \alpha\gamma_{i+1} + \delta\theta_i, \qquad i = 0,1,2,\ldots. \qquad (2.7)$$

The first equation in (2.7) for $i = 0$ equates the coefficients of $\varepsilon_t$ on both sides of
(2.6); the second equation similarly equates the coefficients of $\varepsilon_{t-1}$, and so on.
Note that (2.7) is a deterministic difference equation in the $\gamma_i$ coefficients with
$\theta_i$ as a forcing variable. This deterministic difference equation has the same
structure as the stochastic difference eq. (2.1). It can be thought of as a
deterministic perfect foresight model of the "variable" $\gamma_i$. Hence, the problem of
solving a stochastic difference equation with conditional expectations of future
variables has been converted into a problem of solving a deterministic difference
equation.

2.1.3. The solution in the case of unanticipated shocks

Consider first the most elementary case where $u_t = \varepsilon_t$. That is, $\theta_i = 0$ for $i \ge 1$.
This is the case of unanticipated shocks which are temporary. Then eq. (2.7) can
be written

$$\gamma_0 = \alpha\gamma_1 + \delta, \qquad (2.8)$$

$$\gamma_{i+1} = \frac{1}{\alpha}\gamma_i, \qquad i = 1,2,\ldots. \qquad (2.9)$$

From eq. (2.9) all the $\gamma_i$ for $i > 1$ can be obtained once we have $\gamma_1$. However, eq.
(2.8) gives only one equation in the two unknowns $\gamma_0$ and $\gamma_1$. Hence without
further information we cannot determine the $\gamma_i$ coefficients uniquely. The number
of unknowns is one greater than the number of equations. This indeterminacy is
what leads to non-uniqueness in rational expectations models and has been
studied by many researchers including Blanchard (1979), Flood and Garber
(1980), McCallum (1983), Gourieroux, Laffont, and Monfort (1982), Taylor
(1977) and Whiteman (1983).
If $|\alpha| < 1$ then the requirement that $y_t$ is a stationary process will be sufficient
to yield a unique solution. (The case where $|\alpha| > 1$ is considered below in Section
2.1.4.) To see this suppose that $\gamma_1 \ne 0$. Since eq. (2.9) is an unstable difference
equation, the $\gamma_i$ coefficients will explode as i gets large. But then $y_t$ would not be
a stationary stochastic process. The only value for $\gamma_1$ that will prevent the $\gamma_i$ from
exploding is $\gamma_1 = 0$. From (2.9) this in turn implies that $\gamma_i = 0$ for all $i > 1$. From
eq. (2.8) we then have that $\gamma_0 = \delta$. Hence, the unique stationary solution is simply
$y_t = \delta\varepsilon_t$. In this case, the impact of a unit shock $dy_{t+s}/d\varepsilon_t$ is equal to $\delta$ for $s = 0$
and is equal to 0 for $s \ge 1$. This simple impact effect is illustrated in Figure 1a.
(The more interesting charts in Figures 1b, 1c, and 1d will be described below.)
Example
In the case of the Cagan money demand equation this means that the price
$p_t = (1+\beta)^{-1} m_t$. Because $\beta > 0$, a temporary unanticipated increase in the
money supply increases the price level by less than the increase in money. This is
due to the fact that the price level is expected to decrease to its normal value
(zero) next period, thereby generating an expected deflation. The expected defla-
tion increases the demand for money so that real balances must increase. Hence,
the price $p_t$ rises by less than $m_t$. This is illustrated in Figure 2a.
For the more general case of unanticipated shifts in $u_t$ that are expected to
phase out gradually we set $\theta_i = \rho^i$, where $\rho < 1$. Eq. (2.7) then becomes

$$\gamma_{i+1} = \frac{1}{\alpha}\gamma_i - \frac{\delta}{\alpha}\rho^i, \qquad i = 0,1,2,3,\ldots. \qquad (2.10)$$

Again, this is a standard deterministic difference equation. In this more general
case, we can obtain the solution $\gamma_i$ by deriving the solution to the homogeneous
part $\gamma_i^{(H)}$ and the particular solution to the non-homogeneous part $\gamma_i^{(P)}$.
[Figure 1] (a) Effect on $y_t$ of an unanticipated unit shift in $u_t$ which is temporary ($u_t = \varepsilon_t$). (b) Effect on $y_t$ of an unanticipated unit shift in $u_t$ which is phased out gradually ($u_t = \rho u_{t-1} + \varepsilon_t$). (c) Effect on $y_t$ of an anticipated unit shift in $u_t$ which is temporary (anticipated at time 0 and to occur at time k) ($u_t = \varepsilon_{t-k}$). (d) Effect on $y_t$ of an anticipated shift in $u_t$ which is phased out gradually (anticipated at time 0 to begin at time k).

The solution to (2.10) is the sum of the homogeneous solution and the particular
solution, $\gamma_i = \gamma_i^{(H)} + \gamma_i^{(P)}$. [See Baumol (1970), for example, for a description of
this solution technique for deterministic difference equations.] The homogeneous
part is

$$\gamma_{i+1}^{(H)} = \frac{1}{\alpha}\gamma_i^{(H)}, \qquad i = 0,1,2,\ldots, \qquad (2.11)$$

with solution $\gamma_{i+1}^{(H)} = (1/\alpha)^{i+1}\gamma_0^{(H)}$. As in the earlier discussion, if $|\alpha| < 1$ then for
stationarity we require that $\gamma_0^{(H)} = 0$. For any other value of $\gamma_0^{(H)}$ the homoge-
neous solution will explode. Stationarity therefore implies that $\gamma_i^{(H)} = 0$ for
$i = 0,1,2,\ldots$.
[Figure 2] (a) Price level effect of an unanticipated unit increase in $m_t$ which lasts for one period. (b) Price level effect of an unanticipated increase in $m_t$ which is phased out gradually. (c) Price level effect of an anticipated unit increase in $m_{t+k}$ which lasts for one period. The increase is anticipated k periods in advance. (d) Price level effect of an anticipated unit increase in $m_{t+k}$ which is phased out gradually. The increase is anticipated k periods in advance.

To find the particular solution we substitute $\gamma_i^{(P)} = hb^i$ into (2.10) and solve for
the unknown coefficients h and b. This gives:

$$b = \rho, \qquad h = \delta(1-\alpha\rho)^{-1}. \qquad (2.12)$$

Because the homogeneous solution is identically equal to zero, the sum of the
homogeneous and the particular solutions is simply

$$\gamma_i = \frac{\delta\rho^i}{1-\alpha\rho}, \qquad i = 0,1,2,\ldots. \qquad (2.13)$$
In terms of the representation for $y_t$ this means that

$$y_t = \frac{\delta}{1-\alpha\rho}\, u_t. \qquad (2.14)$$

The variable $y_t$ is proportional to the shock $u_t$ at all t. The effect of a unit shock
$\varepsilon_t$ is shown in Figure 1b. Note that $y_t$ follows the same type of first order
stochastic process that $u_t$ does; that is,

$$y_t = \rho y_{t-1} + \frac{\delta}{1-\alpha\rho}\,\varepsilon_t. \qquad (2.15)$$
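A quick simulation check of (2.14), with arbitrary parameter values, confirms that it satisfies the model (2.1) exactly:

```python
import numpy as np

# With u_t = rho*u_{t-1} + eps_t, the solution y_t = delta*u_t/(1 - alpha*rho)
# implies E_t y_{t+1} = rho*y_t, so y_t = alpha*E_t y_{t+1} + delta*u_t.
rng = np.random.default_rng(1)
alpha, delta, rho, T = 0.6, 1.0, 0.8, 10_000

eps = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = rho * u[t - 1] + eps[t]

y = delta * u / (1 - alpha * rho)   # eq. (2.14)
E_y_next = rho * y                  # conditional expectation of y_{t+1}
assert np.allclose(y, alpha * E_y_next + delta * u)   # eq. (2.1) holds
```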

Example
For the money demand example, eq. (2.14) implies that

$$p_t = \frac{1/(1+\beta)}{1-\bigl(\beta/(1+\beta)\bigr)\rho}\, m_t = \frac{1}{1+\beta(1-\rho)}\, m_t. \qquad (2.16)$$

As long as $\rho < 1$ the increase in the price level will be less than the increase in the
money supply. The dynamic impact on $p_t$ of a unit shock to the money supply is
shown in Figure 2b. The price level increases by less than the increase in the
money supply because of the expected deflation that occurs as the price level
gradually returns to its equilibrium value of 0. The expected deflation causes an
increase in the demand for real money balances which is satisfied by having the
price level rise less than the money supply. For the special case that $\rho = 1$, a
permanent increase in the money supply, the price level moves proportionately to
money as in the simple quantity theory. In that case there is no change in the
expected rate of inflation since the price level remains at its new level.

2.1.4. A digression on the possibility of non-uniqueness

If $|\alpha| > 1$, then simply requiring that $y_t$ is a stationary process will not yield a
unique solution. In this case eq. (2.9) is stable, and any value of $\gamma_1$ will give a
stationary time series. There is a continuum of solutions and it is necessary to
place additional restrictions on the model if one wants to obtain a unique solution
for the $\gamma_i$. There does not seem to be any completely satisfactory approach to take
in this case.
One possibility raised by Taylor (1977) is to require that the process for $y_t$ have
a minimum variance. Consider the case where $u_t$ is uncorrelated. The variance of
$y_t$ is given by

$$\text{Var}\, y_t = \gamma_0^2 + (\gamma_0-\delta)^2(\alpha^2-1)^{-1}, \qquad (2.17)$$

where the variance of $\varepsilon_t$ is supposed to be 1. The minimum occurs at $\gamma_0 = \delta/\alpha^2$,
from which the remaining $\gamma_i$ can be calculated. Although the minimum variance
condition is a natural extension of the stationarity (finite variance) condition, it is
difficult to give it an economic rationale.
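Since (2.17) is a simple quadratic in $\gamma_0$, the location of the minimum is easy to confirm numerically; a throwaway sketch:

```python
import numpy as np

# Minimize (2.17) over a grid of gamma_0 values; the minimizer should sit
# at delta/alpha**2 (here 1/2.25, approximately 0.444):
alpha, delta = 1.5, 1.0
g0 = np.linspace(-1.0, 1.5, 10_001)
var_y = g0**2 + (g0 - delta)**2 / (alpha**2 - 1)
print(g0[np.argmin(var_y)], delta / alpha**2)
```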
An alternative rule for selecting a solution was proposed by McCallum (1983)
and is called the "minimum state variable technique". This rule chooses a
representation for $y_t$ which involves the smallest number of $\varepsilon_t$ terms; hence, it
would give $y_t = \delta\varepsilon_t$. McCallum (1983) examines this selection rule in several
different applications.
Chow (1983, p. 361) has proposed that the uniqueness issue be resolved
empirically by representing the model in a more general form. To see this,
substitute eq. (2.8) with $\delta = 1$ and eq. (2.9) into eq. (2.4) for an arbitrary $\gamma_1$. That
is, from eq. (2.4) we write

$$y_t = \sum_{i=0}^{\infty} \gamma_i \varepsilon_{t-i}
= (\alpha\gamma_1+1)\varepsilon_t + \gamma_1\varepsilon_{t-1} + (\gamma_1/\alpha)\varepsilon_{t-2} + (\gamma_1/\alpha^2)\varepsilon_{t-3} + \cdots. \qquad (2.18)$$

Lagging (2.18) by one time period, multiplying by $\alpha^{-1}$ and subtracting from
(2.18) gives

$$y_t = \alpha^{-1}y_{t-1} + (\alpha\gamma_1+1)\varepsilon_t - \alpha^{-1}\varepsilon_{t-1}, \qquad (2.19)$$

which is an ARMA(1,1) model with a free parameter $\gamma_1$. Clearly if $\gamma_1 = 0$ then this
more general solution reduces to the solution discussed above. But, rather than
imposing this condition, Chow (1983) has suggested that the parameter $\gamma_1$ be
estimated, and has developed an appropriate econometric technique. Evans and
Honkapohja (1984) use a similar procedure for representing ARMA models in
terms of a free parameter.
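That every member of the family (2.19) is consistent with the model can be verified by simulation; a sketch with an arbitrary $\gamma_1$:

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, gamma1, T = 1.4, 0.3, 10_000   # |alpha| > 1, arbitrary free gamma_1

eps = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = y[t-1] / alpha + (alpha * gamma1 + 1) * eps[t] - eps[t-1] / alpha

# From (2.19), E_t y_{t+1} = y_t/alpha - eps_t/alpha, so
# alpha * E_t y_{t+1} + eps_t = y_t, i.e. the model (2.1) with delta = 1:
E_y_next = y / alpha - eps / alpha
assert np.allclose(y, alpha * E_y_next + eps)
```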
Are there any economic examples where $|\alpha| > 1$? In the case of the Cagan
money demand equation, $\alpha = \beta/(1+\beta)$, which is always less than 1 since $\beta$ is a
positive parameter. One economic example where $\alpha > 1$ is a flexible-price macro-
economic model with money in the production function. To see this consider the
following equations:

$$m_t - p_t = a z_t - \beta i_t, \qquad (2.20)$$

$$z_t = -c\bigl(i_t - E_t(p_{t+1} - p_t)\bigr), \qquad (2.21)$$

$$z_t = d(m_t - p_t), \qquad (2.22)$$

where $z_t$ is real output, $i_t$ is the nominal interest rate, and the other variables are
as defined in the earlier discussion of the Cagan model. The first equation is the
money demand equation. The second equation indicates that real output is
negatively related to the real rate of interest (an "IS" equation). In the third
equation $z_t$ is positively related to real money balances. The difference between
this model and the Cagan model (in eq. (2.3)) is that output is a positive function
of real money balances. The model can be written in the form of eq. (2.1) with

$$\alpha = \frac{\beta}{1+\beta-d(a+\beta c^{-1})}. \qquad (2.23)$$

Eq. (2.23) is equal to the value of $\alpha$ in the Cagan model when $d = 0$. In the more
general case where $d > 0$ and money is a factor in the production function, the
parameter $\alpha$ can be greater than one. This example was explored in Taylor (1977).
Another economic example, which arises in an overlapping generations model of
money was investigated by Blanchard (1979).
Although there are examples of non-uniqueness such as these in the literature,
most theoretical and empirical applications in economics have the property that
there is a unique stationary solution. However, some researchers, such as
Gourieroux, Laffont, and Monfort (1982), have even questioned the appeal to
stationarity. Sargent and Wallace (1973) have suggested that the stability require-
ment effectively rules out speculative bubbles. But there are examples in history
where speculative bubbles have occurred and some analysts feel they are quite
common. There have been attempts to model speculative bubbles as movements
of y, along a self-fulfilling nonstationary (explosive) path. Blanchard and Watson
(1982) have developed a model of speculative bubbles in which there is a positive
probability that the bubble will burst. Flood and Garber (1980) have examined
whether the periods toward the end of the eastern European hyperinflations in the
1920s could be described as self-fulfilling speculative bubbles. To date, however,
the vast majority of rational expectations research has assumed that there is a
unique stationary solution. For the rest of this paper we assume that $|\alpha| < 1$, or
the equivalent in higher order models, and we assume that the solution is
stationary.

2.1.5. Finding the solution in the case of anticipated shocks

Consider now the case where the shock is anticipated k periods in advance and is
purely temporary. That is, $u_t = \varepsilon_{t-k}$ so that $\theta_k = 1$ and $\theta_i = 0$ for $i \ne k$. The
difference equations in the unknown parameters can be written as:

$$\gamma_i = \alpha\gamma_{i+1}, \qquad i = 0,1,2,\ldots,k-1, \qquad (2.24)$$

$$\gamma_k = \alpha\gamma_{k+1} + \delta, \qquad (2.25)$$

$$\gamma_{i+1} = \frac{1}{\alpha}\gamma_i, \qquad i = k+1, k+2,\ldots. \qquad (2.26)$$

The set of equations in (2.26) is identical in form to what we considered earlier
except that the initial condition is at k+1. For stationarity we therefore require
that $\gamma_{k+1} = 0$. This implies from eq. (2.25) that $\gamma_k = \delta$. The remaining coefficients
are obtained by working back using (2.24) starting with $\gamma_k = \delta$. This gives
$\gamma_i = \delta\alpha^{k-i}$, $i = 0,1,2,\ldots,k-1$.
The pattern of the $\gamma_i$ coefficients is shown in Figure 1c. These coefficients give
the impact of $\varepsilon_t$ on $y_{t+s}$, for $s \ge 0$, or equivalently the impact of the news that the
shock $u_t$ will occur k periods later. The size of $\gamma_0$ depends on how far in the
future the shock is anticipated. The farther in advance the shock is known (that is,
the larger is k), the smaller will be the current impact of the news.
Example

For the demand for money example we have

$$p_t = \delta\bigl[\alpha^k\varepsilon_t + \alpha^{k-1}\varepsilon_{t-1} + \cdots + \alpha\varepsilon_{t-k+1} + \varepsilon_{t-k}\bigr]. \qquad (2.27)$$

Substituting $\alpha = \beta/(1+\beta)$, $\delta = 1/(1+\beta)$, and $\varepsilon_t = u_{t+k} = m_{t+k}$ into (2.27) we
get

$$p_t = \frac{1}{1+\beta}\left(\frac{\beta}{1+\beta}\right)^{k} m_{t+k}. \qquad (2.28)$$

Note how this reduces to $p_t = (1+\beta)^{-1}m_t$ in the case of unanticipated shocks
(k = 0), as we calculated earlier. When the temporary increase in the money
supply is anticipated in advance, the price level “jumps” at the date of announce-
ment and then gradually increases until the money supply does increase. This is
illustrated in Figure 2c.
Finally, we consider the case where the shock is anticipated in advance, but is
expected to be permanent or to phase out gradually. Then, suppose that $\theta_i = 0$ for
$i = 0,1,\ldots,k-1$ and $\theta_i = \rho^{i-k}$ for $i \ge k$. Eq. (2.7) becomes

$$\gamma_i = \alpha\gamma_{i+1}, \qquad i = 0,1,2,\ldots,k-1, \qquad (2.29)$$

$$\gamma_{i+1} = \frac{1}{\alpha}\gamma_i - \frac{\delta}{\alpha}\rho^{i-k}, \qquad i = k, k+1,\ldots. \qquad (2.30)$$

Note that eq. (2.30) is identical to eq. (2.10) except that the initial condition starts
at k rather than 0. The homogeneous part of (2.30) is

$$\gamma_{i+1}^{(H)} = \frac{1}{\alpha}\gamma_i^{(H)}, \qquad i = k, k+1,\ldots. \qquad (2.31)$$

In order to prevent the $\gamma_i^{(H)}$ from exploding as i increases it is necessary that
$\gamma_k^{(H)} = 0$. Therefore $\gamma_i^{(H)} = 0$ for $i = k, k+1,\ldots$. The unknown coefficients h and
b of the particular solution $\gamma_i^{(P)} = hb^{i-k}$ are

$$h = \delta(1-\alpha\rho)^{-1}, \qquad b = \rho. \qquad (2.32)$$

Since the homogeneous part is zero we have that

$$\gamma_i = \frac{\delta\rho^{i-k}}{1-\alpha\rho}, \qquad i = k, k+1,\ldots. \qquad (2.33)$$

The remaining coefficients can be obtained by using (2.29) backwards starting
with $\gamma_k = \delta(1-\alpha\rho)^{-1}$. The solution for $y_t$ is

$$y_t = \frac{\delta}{1-\alpha\rho}\bigl(\alpha^k\varepsilon_t + \alpha^{k-1}\varepsilon_{t-1} + \cdots + \alpha\varepsilon_{t-k+1} + \varepsilon_{t-k}
+ \rho\varepsilon_{t-k-1} + \rho^2\varepsilon_{t-k-2} + \cdots\bigr). \qquad (2.34)$$

After the immediate impact of the announcement, $y_t$ will grow smoothly until it
equals $\delta(1-\alpha\rho)^{-1}$ at the time that $u_t$ increases. The effect then phases out
geometrically. This pattern is illustrated in Figure 1d.
Example
For the money demand model, the effect on the price level $p_t$ is shown in Figure
2d. As before the anticipation of an increase in the money supply causes the price
level to jump. The price level then increases gradually until the increase in money
actually occurs. During the period before the actual increase in money, the level
of real balances is below equilibrium because of the expected inflation. The initial
increase becomes larger as the phase-out parameter $\rho$ gets larger. For the
permanent case where $\rho = 1$ the price level eventually increases by the same
amount that the money supply increases.

2.1.6. General ARMA processes for the shocks

The above solution procedure can be generalized to handle the case where (2.2) is
an autoregressive moving average (ARMA) model. We consider only unantic-
ipated shocks where there is no time delay. Suppose the error process is

$$u_t = \sum_{i=1}^{p} \rho_i u_{t-i} + \sum_{j=0}^{q} \psi_j \varepsilon_{t-j}, \qquad (2.35)$$

an ARMA(p, q) model. The coefficients in the linear process for $u_t$ in the form
of (2.2) can be derived from:

$$\theta_j = \psi_j + \sum_{i=1}^{\min(j,p)} \rho_i \theta_{j-i}, \qquad j = 0,1,2,\ldots,q,$$

$$\theta_j = \sum_{i=1}^{\min(j,p)} \rho_i \theta_{j-i}, \qquad j > q, \qquad (2.36)$$

where $\psi_0 = 1$. See Harvey (1981, p. 38), for example.
Starting with $j = M = \max(p, q+1)$ the $\theta_j$ coefficients in (2.36) are determined
by a pth order difference equation. The p initial conditions $(\theta_{M-1},\ldots,\theta_{M-p})$ for
this difference equation are given by the p equations that precede the $\theta_M$
equation in (2.36).
To obtain the $\gamma_i$ coefficients, (2.36) can be substituted into eq. (2.7). As before,
the solution to the homogeneous part is $\gamma_i^{(H)} = 0$ for all i. The particular solution
to the non-homogeneous part will have the same form as (2.36) for $j \ge M$. That
is,

$$\gamma_j = \sum_{i=1}^{p} \rho_i \gamma_{j-i}, \qquad j = M, M+1,\ldots. \qquad (2.37)$$

The initial conditions $(\gamma_{M-1},\ldots,\gamma_{M-p})$ for (2.37), as well as the remaining $\gamma$
values $(\gamma_{M-p-1},\ldots,\gamma_0)$, can then be obtained by substituting the $\theta_i$ for
$i = 0,\ldots,M-1$ into (2.7). That is,

$$\gamma_{i+1} = \frac{1}{\alpha}\gamma_i - \frac{\delta}{\alpha}\theta_i, \qquad i = 0,1,\ldots,M-1. \qquad (2.38)$$

Comparing the form of (2.37) and (2.38) with (2.36) indicates that the $\gamma_i$
coefficients can be interpreted as the infinite moving average representation of an
ARMA(p, M-1) model. That is, the solution for $y_t$ is an ARMA(p, M-1)
model with an autoregressive part equal to the autoregressive part of the $u_t$
process defined in eq. (2.35). This result is found in Gourieroux, Laffont, and
Monfort (1982). The methods of Hansen and Sargent (1980) and Taylor (1980a)
can also be used to compute the ARMA representations directly, as summarized
in Section 2.4 below.
Example: p = 3, q = 1

In this case M = 3 and eq. (2.36) becomes

$$\theta_0 = 1,$$
$$\theta_1 = \psi_1 + \rho_1\theta_0,$$
$$\theta_2 = \rho_1\theta_1 + \rho_2\theta_0,$$
$$\theta_i = \rho_1\theta_{i-1} + \rho_2\theta_{i-2} + \rho_3\theta_{i-3}, \qquad i = 3,4,\ldots. \qquad (2.39)$$

The $\gamma$ coefficients are then given by

$$\gamma_i = \rho_1\gamma_{i-1} + \rho_2\gamma_{i-2} + \rho_3\gamma_{i-3}, \qquad i = 3,4,\ldots, \qquad (2.40)$$

and the initial conditions $\gamma_0$, $\gamma_1$ and $\gamma_2$ are given by solving the three linear
equations

$$\gamma_{i+1} = \frac{1}{\alpha}\gamma_i - \frac{\delta}{\alpha}\theta_i, \quad i = 0,1,2, \quad \text{with } \gamma_3 = \rho_1\gamma_2 + \rho_2\gamma_1 + \rho_3\gamma_0. \qquad (2.41)$$

Eqs. (2.40) and (2.41) imply that $y_t$ is an ARMA(3,2) model.
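A numerical cross-check of (2.36)-(2.38) is straightforward: since $|\alpha| < 1$, the unique stationary solution of (2.7) can equivalently be written as the forward sum $\gamma_i = \delta\sum_{j=0}^{\infty}\alpha^j\theta_{i+j}$, which the sketch below uses (with truncation and illustrative parameter values) to recover the $\gamma_i$ implied by an ARMA(3,1) shock process:

```python
import numpy as np

def arma_to_theta(rho, psi, n):
    """MA(infinity) weights theta_j of the ARMA shock process (2.35),
    computed recursively as in (2.36); rho = (rho_1,...,rho_p) are the
    autoregressive and psi = (psi_1,...,psi_q) the moving average
    parameters, with psi_0 = 1."""
    p, q = len(rho), len(psi)
    theta = np.zeros(n)
    for j in range(n):
        acc = 1.0 if j == 0 else (psi[j - 1] if j <= q else 0.0)
        for i in range(1, min(j, p) + 1):
            acc += rho[i - 1] * theta[j - i]
        theta[j] = acc
    return theta

def theta_to_gamma(theta, alpha, delta):
    """gamma_i = delta * sum_{j>=0} alpha**j * theta[i+j]: the forward
    (stationary) solution of (2.7), truncated at the length of theta."""
    n = len(theta)
    powers = alpha ** np.arange(n)
    return delta * np.array([powers[: n - i] @ theta[i:] for i in range(n)])

# Illustrative ARMA(3,1) shock process, as in the example above:
alpha, delta = 0.6, 1.0
theta = arma_to_theta(rho=[0.5, -0.2, 0.1], psi=[0.4], n=200)
gamma = theta_to_gamma(theta, alpha, delta)
# The gamma_i satisfy gamma_i = alpha*gamma_{i+1} + delta*theta_i, eq. (2.7):
assert np.allclose(gamma[:100], alpha * gamma[1:101] + delta * theta[:100])
```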

2.1.7. Different viewpoint dates

In some applications of rational expectations models the forecasts of future
variables might be made at different points in time. For example, a generalization
of (2.1) is

$$y_t = \alpha_1 E_t y_{t+1} + \alpha_2 E_{t-1} y_{t+1} + \alpha_3 E_{t-1} y_t + \delta u_t. \qquad (2.42)$$

Substituting for $y_t$ and expected $y_t$ from (2.4) into (2.42) results in a set of
equations for the $\gamma$ coefficients much like the equations that we studied above.
Suppose $u_t = \rho u_{t-1} + \varepsilon_t$. Then, the equations for $\gamma$ are

$$\gamma_0 = \alpha_1\gamma_1 + \delta,$$

$$\gamma_{i+1} = \frac{1-\alpha_3}{\alpha_1+\alpha_2}\gamma_i - \frac{\delta}{\alpha_1+\alpha_2}\rho^i, \qquad i = 1,2,\ldots. \qquad (2.43)$$

Hence, we can use the same procedures for solving this set of difference equa-
tions. The solution is

$$\gamma_0 = \alpha_1 b\rho + \delta,$$
$$\gamma_i = b\rho^i, \qquad i = 1,2,\ldots,$$

where $b = \delta/(1-\alpha_3-\rho\alpha_2-\rho\alpha_1)$. Note that this reduces to (2.13) when $\alpha_2 = \alpha_3
= 0$.

2.1.8. Geometric interpretation

The solution of the difference eq. (2.7) that underlies this technique has an
intuitive graphical interpretation which corresponds to the phase diagram method
used to solve continuous time models with rational expectations. [See Calvo
(1980) or Dixit (1980) for example.] Eq. (2.7) can be written

$$\gamma_{i+1} - \gamma_i = \left(\frac{1}{\alpha}-1\right)\gamma_i - \frac{\delta}{\alpha}\theta_i, \qquad i = 0,1,\ldots. \qquad (2.44)$$

The set of values for which $\gamma_i$ is not changing is given by setting the right-hand
side of (2.44) to zero. These values of $(\gamma_i, \theta_i)$ are plotted in Figure 3. In the case
where $\theta_i = \rho^i$, for $0 < \rho < 1$, there is a difference equation representation for $\theta_i$ of
the form

$$\theta_{i+1} - \theta_i = (\rho-1)\theta_i, \qquad (2.45)$$

where $\theta_0 = 1$. The set of points where $\theta$ is not changing is a vertical line at $\theta_i = 0$
in Figure 3. The forces which move $\gamma$ and $\theta$ in different directions are also shown
in Figure 3. Points above (below) the upward sloping line cause $\gamma_i$ to increase
(decrease). Points to the right (left) of the vertical line cause $\theta_i$ to decrease
(increase). In order to prevent the $\gamma_i$ from exploding we found in Section 2.1.3
that it was necessary for $\gamma_i = (\delta/(1-\alpha\rho))\theta_i$. This linear equation is shown as the
straight line with the arrows in Figure 3. This line balances off the unstable
vertical forces and uses the stable horizontal forces to bring $\gamma_i$ back to the values
$\gamma_i = 0$ and $\theta_i = 0$ as $i \to \infty$. For this reason it is called a saddle point and
corresponds to the notion of a saddle path in differential equation models [see
Birkhoff and Rota (1962), for example].

[Figure 3] Illustration of the rational expectations solution and the saddle path. Along the saddle path
the motion is towards the origin at geometric rate $\rho$. That is, $\theta_i = \rho\theta_{i-1}$.
Figure 3 is special in the sense that one of the zero-change lines is perfectly vertical. This is due to the fact that the shock variable u_t is exogenous to y_t. If we interpret (2.1) and (2.2) as a two-variable system with y_t and u_t as the two variables, then the system is recursive in that u_t affects y_t in the current period and there are no effects of past y_t on u_t. In Section 2.2 we consider a more general two-variable system in which u_t is endogenous.

In using Figure 3 for thought experiments about the effect of one-time shocks, recall that γ_i is dy_{t+i}/dε_t and θ_i is du_{t+i}/dε_t. The vertical axis thereby gives the path of the endogenous variable y_t corresponding to a shock ε_t to the policy eq. (2.2). The horizontal axis gives the path of the policy variable. The points in Figure 3 can therefore be viewed as displacements of y_t and u_t from their steady state values in response to a one-time unit shock.

The arrows in Figure 3 show that the saddle path line must have a slope greater than zero and a slope less than that of the zero-change line for γ. That is, the saddle path line must lie in the shaded region of Figure 3. Only in this region is the direction of motion toward the origin. The geometric technique to determine whether the saddle path is upward or downward sloping is frequently used in practice to obtain the sign of an impact effect of policy [see Calvo (1980), for example].

In Figure 4 the same diagram is used to determine the qualitative movement of y_t in response to a shock to u_t which is anticipated k periods in advance and which is expected to then phase out geometrically. This is the case considered above in Section 2.1.5. The endogenous variable y initially jumps at time 0 when the future increase in u becomes known; it then moves along an explosive path through period k when u increases by 1 unit. From time k on, the motion is along the saddle path as y and u approach their steady state values of zero.

Figure 4. Illustration of the effect of an anticipated shock to u_t which is then expected to be phased out gradually at geometric rate ρ. The shock is anticipated k periods in advance. This thought experiment corresponds to the chart in Figure 1(d).

2.1.9. Nonstationary forcing variables

In many economic applications the forcing variables are nonstationary. For example, the money supply is a highly nonstationary series. One typically wants to estimate the effects of changes in the growth rate of the money supply. What happens when the growth rate is reduced gradually? What if the reduction in growth is anticipated? Letting u_t be the log of the money supply m_t, these alternatives can be analyzed by writing the growth rate of money as g_t = m_t − m_{t-1} and assuming that

$$g_t - g_{t-1} = \rho(g_{t-1} - g_{t-2}) + \varepsilon_{t-k}.$$

Thus, the change in the growth rate is anticipated k periods in advance. The new growth rate is phased in at a geometric rate ρ. By solving the model for the particular solution corresponding to this equation, one can solve for the price level and the inflation rate. In this case, the inflation rate is nonstationary, but the change in the inflation rate is stationary.

2.2. Bivariate models

Let y_{1t} and y_{2t} be given by

$$y_{1t} = \alpha_1 E_t y_{1t+1} + \beta_{10} y_{2t} + \beta_{11} y_{2t-1} + \delta_1 u_t,$$
$$y_{2t} = \alpha_2 E_t y_{1t+1} + \beta_{20} y_{1t} + \beta_{21} y_{2t-1} + \delta_2 u_t, \tag{2.46}$$

where u_t is a shock variable of the form (2.2). Model (2.46) is a special bivariate model in that there are no lagged values of y_{1t} and no lead values of y_{2t}. This asymmetry is meant to convey the continuous-time idea that one variable, y_{1t}, is a "jump" variable, unaffected by its past, while y_{2t} is a more slowly adjusting variable that is influenced by its past values. Of course, in discrete time all variables tend to jump from one period to the next, so the terminology is not exact. Nevertheless, the distinction is important in practice. Most commonly, y_{1t} would be a price and y_{2t} a stock which cannot change without large costs in the short run.

We assume in (2.46) that there is only one shock u_t. This is for notational convenience. The generalization to a bivariate shock (u_{1t}, u_{2t}), where u_{1t} appears in the first equation and u_{2t} in the second equation, is straightforward, as should be clear below.

Because (2.46) has this special form it can be reduced to a first-order 2-dimensional vector process

(2.47)

This particular way to construct a first-order process follows that of Blanchard and Kahn (1980). A generalization to the case of viewpoint dates earlier than time t is fairly straightforward. If y_{1t-1} or E_t y_{2t+1} also appeared in (2.46), then a first-order model would have to be more than 2-dimensional.

2.2.1. Some examples

There are many interesting examples of this simple bivariate model. Five of these are summarized below.

Example 1: Exchange rate overshooting

Dornbusch (1976) considered the following type of model of a small open economy [see also Wilson (1979) and Buiter and Miller (1983)]:

$$m_t - p_t = -\alpha(E_t e_{t+1} - e_t),$$
$$p_t - p_{t-1} = \beta(e_t - p_t),$$

where e_t is the log of the exchange rate, and p_t and m_t are as defined in the Cagan model. The first equation is simply the demand for money as a function of the nominal interest rate. In a small open economy with perfect capital mobility the nominal interest rate is equal to the world interest rate (assumed fixed) plus the expected rate of depreciation E_t e_{t+1} − e_t. The second equation describes the slow adjustment of prices in response to the excess demand for goods. Excess demand is assumed to be a negative function of the relative price of home goods. Here prices adjust slowly and the exchange rate is a jump variable. This model is of the form (2.47) with y_{1t} = e_t, y_{2t} = p_t, α_1 = 1, β_10 = −1/α, β_11 = 0, δ_1 = 1/α, α_2 = 0, β_20 = β/(1+β), β_21 = 1/(1+β), δ_2 = 0.
Example 2: Open economy portfolio balance model

Kouri (1976), Rodriguez (1980), and Papell (1984) have considered the following type of rational expectations model, which is based on a portfolio demand for foreign assets rather than on perfect capital mobility:

$$e_t + f_t = \alpha(E_t e_{t+1} - e_t) + u_t,$$
$$f_t - f_{t-1} = \beta e_t.$$

The first equation represents the demand for foreign assets f_t (in logs), evaluated in domestic currency, as a function of the expected rate of depreciation. Here u_t is a shock. The second equation is the "current account" (the proportional change in the stock of foreign assets) as a function of the exchange rate. Prices are assumed to be fixed and out of the picture. This model reduces to (2.47) with y_{1t} = e_t, y_{2t} = f_t, α_1 = α/(1+α), β_10 = −1/(1+α), β_11 = 0, δ_1 = 1/(1+α), α_2 = 0, β_20 = β, β_21 = 1, δ_2 = 0.
Example 3: Money and capital

Fischer (1979) developed the following type of model of money and capital:

$$y_t = \gamma k_{t-1},$$
$$r_t = -(1-\gamma)k_{t-1},$$
$$m_t - p_t = -a_1 E_t r_{t+1} - a_2(E_t p_{t+1} - p_t) + y_t,$$
$$k_t = b_1 E_t r_{t+1} + b_2(E_t p_{t+1} - p_t) + y_t.$$

The first two equations describe output y_t and the marginal efficiency of capital r_t as functions of the stock of capital at the end of period t−1. The third and fourth equations are a pair of portfolio demand equations for capital and real money balances as functions of the rates of return on these two assets. Lucas (1976) considered a very similar model. Substituting the first two equations into the third and fourth we get model (2.47) with y_{1t} = p_t, y_{2t} = k_t, and

$$\alpha_1 = \frac{a_2}{1+a_2}, \quad \beta_{10} = \frac{-a_1(1-\gamma)}{1+a_2}, \quad \beta_{11} = \frac{-\gamma}{1+a_2}, \quad \delta_1 = \frac{1}{1+a_2},$$
$$\alpha_2 = \frac{b_2}{1+b_1(1-\gamma)}, \quad \beta_{20} = \frac{-b_2}{1+b_1(1-\gamma)}, \quad \beta_{21} = \frac{\gamma}{1+b_1(1-\gamma)}, \quad \delta_2 = 0.$$



Example 4: Staggered contracts model

The model y_t = α_1 E_t y_{t+1} + α_2 y_{t-1} + δu_t of a contract wage y_t can occur in a staggered wage setting model as in Taylor (1980a). The future wage appears because workers and firms forecast the wage set by other workers and firms. The lagged wage appears because contracts last two periods. This model can be put in the form of (2.47) by stacking y_t and y_{t-1} into a vector.

Example 5: Optimal control problem

Hansen and Sargent (1980) consider the following optimal control problem. A firm chooses a contingency plan for a single factor of production (labor) n_t to maximize expected profits

$$E_t\sum_{j=0}^{\infty}\beta^j\Big[p_{t+j}y_{t+j} - \tfrac{1}{2}(n_{t+j}-n_{t+j-1})^2 - w_{t+j}n_{t+j}\Big],$$

subject to the linear production function y_t = γn_t. The random variables p_t and w_t are the price of output and the wage, respectively. The first order condition of this maximization problem is

$$n_t = \frac{\beta}{1+\beta}E_t n_{t+1} + \frac{1}{1+\beta}n_{t-1} - \frac{1}{1+\beta}(w_t - \gamma p_t).$$

This model is essentially the same as that in Example 4 where u_t = w_t − γp_t.

2.2.2. Finding the solution

Equation (2.47) is a vector version of the univariate eq. (2.1). The technique for finding a solution to (2.47) is directly analogous to the univariate case. The solution can be represented as

$$y_{1t} = \sum_{i=0}^{\infty}\gamma_{1i}\varepsilon_{t-i}, \qquad y_{2t} = \sum_{i=0}^{\infty}\gamma_{2i}\varepsilon_{t-i}. \tag{2.48}$$

These representations for the endogenous variables are an obvious generalization of eqs. (2.4). Utilizing matrix notation we rewrite (2.47) as

$$Bz_t = CE_t z_{t+1} + \delta u_t, \tag{2.49}$$
$$E_t z_{t+1} = Az_t + du_t, \tag{2.50}$$

where the definitions of the matrices B and C and the vectors z_t and δ in (2.49) should be clear, and where A = C^{-1}B and d = −C^{-1}δ. Let γ_i = (γ_{1i}, γ_{2,i-1})', i = 0, 1, 2, ..., and set γ_{2,-1} = 0. Substitution of (2.2) and (2.48) into (2.50) gives

$$\gamma_{i+1} = A\gamma_i + d\theta_i, \qquad i = 0, 1, 2, \ldots. \tag{2.51}$$

Eq. (2.51) is analogous to eq. (2.7). For i = 0 we have three unknown elements of the vectors γ_0 = (γ_{10}, 0)' and γ_1 = (γ_{11}, γ_{20})'. The 3 unknowns are γ_{10}, γ_{11} and γ_{20}. However, there are only two equations (at i = 0) in (2.51) that can be used to solve for these three parameters. Much as in the scalar case, considering i = 1 gives two more equations, but it also gives two more unknowns (γ_{12}, γ_{21}); the same is true for i = 2 and so on. To determine the solution for the γ_i process we therefore need another equation. As in the scalar case this third equation comes from imposing stationarity on the processes for y_{1t} and y_{2t}, or equivalently in this context from preventing either element of γ_i from exploding. For uniqueness we will require that one root of A be greater than one in modulus, and one root be less than one in modulus. The additional equation thus comes from choosing γ_1 = (γ_{11}, γ_{20})' so that γ_i does not explode as i → ∞. This condition implies a unique linear relationship between γ_{11} and γ_{20}. This relationship is the extra equation. It is the analogue of setting the scalar γ_1 = 0 in model (2.1).

To see this, we decompose the matrix A into H^{-1}ΛH, where Λ is a diagonal matrix with λ_1 and λ_2 on the diagonal. H is the matrix whose rows are the characteristic vectors of A. Assume that the roots are distinct and that |λ_1| > 1 and |λ_2| < 1. Let μ_i = (μ_{1i}, μ_{2i})' = Hγ_i. Then the homogeneous part of (2.51) is

$$\gamma_{i+1} = H^{-1}\Lambda H\gamma_i, \qquad i = 1, 2, \ldots, \tag{2.52}$$

so that μ_{i+1} = Λμ_i, i = 1, 2, ..., or

$$\mu_{1,i+1} = \lambda_1\mu_{1i}, \qquad \mu_{2,i+1} = \lambda_2\mu_{2i}, \qquad i = 1, 2, \ldots. \tag{2.53}$$

For stability of μ_{1i} as i → ∞ we therefore require that μ_{11} = 0, which in turn implies that μ_{1i} = 0 for all i ≥ 1. In other words we want

$$\mu_{11} = h_{11}\gamma_{11} + h_{12}\gamma_{20} = 0, \tag{2.54}$$

where (h_{11}, h_{12}) is the first row of H and is the characteristic vector of A corresponding to the unstable root λ_1. Eq. (2.54) is the extra equation. When combined with (2.51) at i = 0 we have 3 linear equations that can be solved for γ_{10}, γ_{11} and γ_{20}. From these we can use (2.51), or equivalently (2.53), to obtain the remaining γ_i for i > 1. In particular, μ_{1i} = 0 implies that

$$\gamma_{1i} = -\frac{h_{12}}{h_{11}}\gamma_{2,i-1}, \qquad i = 1, 2, \ldots. \tag{2.55}$$

From the second equation in (2.53) we have that

$$h_{21}\gamma_{1,i+1} + h_{22}\gamma_{2i} = \lambda_2\left(h_{21}\gamma_{1i} + h_{22}\gamma_{2,i-1}\right).$$

Substituting for γ_{1,i+1} and γ_{1i} from (2.55), this gives

$$\gamma_{2,i+1} = \lambda_2\gamma_{2i}, \qquad i = 0, 1, 2, \ldots. \tag{2.56}$$

Given the initial value γ_{20} we compute the remaining coefficients from (2.55) and (2.56).

2.2.3. The solution in the case of unanticipated shocks

When the shock u_t is unanticipated and purely temporary, θ_0 = 1 and θ_i = 0 for all i > 0. In this case eq. (2.51) for i = 0 is

$$\gamma_{11} = a_{11}\gamma_{10} + d_1, \qquad \gamma_{20} = a_{21}\gamma_{10} + d_2, \tag{2.57}$$

and the difference equation described by (2.51) for i > 0 is homogeneous. Hence the solution given by (2.55), (2.56), and (2.57) is the complete solution.

For the more general case where θ_i = ρ^i, eq. (2.57) still holds, but the difference equation in (2.51) for i ≥ 1 has a nonhomogeneous part. The particular solution to the nonhomogeneous part is of the form γ_i^(P) = gρ^i, where g is a 2×1 vector. Substituting this form into (2.51) for i ≥ 1 and equating coefficients we obtain the particular solution

$$\gamma_i^{(P)} = (\rho I - A)^{-1}d\rho^i, \qquad i = 1, 2, \ldots. \tag{2.58}$$



Since eq. (2.55) is the requirement for stability of the homogeneous solution, the complete solution can be obtained by substituting γ_{11}^(H) = γ_{11} − γ_{11}^(P) and γ_{20}^(H) = γ_{20} − γ_{20}^(P) into (2.54) to obtain

$$\gamma_{11} - \gamma_{11}^{(P)} = -\frac{h_{12}}{h_{11}}\left(\gamma_{20} - \gamma_{20}^{(P)}\right). \tag{2.59}$$

Eq. (2.59) can be combined with (2.57) to obtain γ_{10}, γ_{11}, and γ_{20}. The remaining coefficients are obtained by adding the appropriate elements of the particular solution (2.58) to the homogeneous solutions of (2.56) and (2.57).

2.2.4. The solution in the case of anticipated shocks

For the case where the shock is anticipated k periods in advance, but is purely temporary (θ_i = 0 for i = 0, ..., k−1, θ_k = 1, θ_i = 0 for i = k+1, ...), we break up the difference eq. (2.51) as:

$$\gamma_{i+1} = A\gamma_i, \qquad i = 0, 1, \ldots, k-1, \tag{2.60}$$
$$\gamma_{k+1} = A\gamma_k + d, \tag{2.61}$$
$$\gamma_{i+1} = A\gamma_i, \qquad i = k+1, k+2, \ldots. \tag{2.62}$$

Looking at the equations in (2.62) it is clear that for stationarity, γ_{k+1} = (γ_{1,k+1}, γ_{2k})' must satisfy the same relationship that the vector γ_1 satisfied in eq. (2.55). That is,

$$\gamma_{1,k+1} = -\frac{h_{12}}{h_{11}}\gamma_{2k}. \tag{2.63}$$

Once γ_{2k} and γ_{1,k+1} have been determined, the γ values for i > k can be computed as above in eqs. (2.55) and (2.56). That is,

$$\gamma_{1,i+1} = -\frac{h_{12}}{h_{11}}\gamma_{2i}, \qquad i = k, \ldots, \tag{2.64}$$
$$\gamma_{2,i+1} = \lambda_2\gamma_{2i}, \qquad i = k, \ldots. \tag{2.65}$$

To determine γ_{2k} and γ_{1,k+1} we solve eq. (2.63) jointly with the 2(k+1) equations in (2.60) and (2.61) for the 2(k+1)+1 unknowns γ_{10}, ..., γ_{1,k+1} and γ_{20}, ..., γ_{2k}. (Note how this reduces to the result obtained for the unanticipated case above when k = 0.) A convenient way to solve these equations is to first solve the three equations consisting of the two equations from

$$\gamma_{k+1} = A^{k+1}\gamma_0 + d \tag{2.66}$$

(obtained by "forecasting" γ_i out k periods) and eq. (2.63) for γ_{2k}, γ_{1,k+1} and γ_{10}. Then the remaining coefficients can be obtained from the difference equations in (2.60), starting with the calculated value for γ_{10}.

The case where θ_i = 0 for i = 1, ..., k−1 and θ_i = ρ^{i-k} for i = k, k+1, ... can be solved by adding the particular solution to the nonhomogeneous equation

$$\gamma_{i+1} = A\gamma_i + d\rho^{i-k}, \qquad i = k, k+1, k+2, \ldots, \tag{2.67}$$

in place of (2.62) and solving for the remaining coefficients using eqs. (2.60) and (2.61) as above. The particular solution of (2.67) is

$$\gamma_i^{(P)} = (\rho I - A)^{-1}d\rho^{i-k}, \qquad i = k, k+1, k+2, \ldots. \tag{2.68}$$

2.2.5. The exchange rate overshooting example

The preceding calculations can be usefully illustrated with Example 1 of Section 2.2.1: the two-variable "overshooting" model in which the exchange rate (y_{1t} = e_t) is the jump variable and the price level (y_{2t} = p_t) is the slowly moving variable. For this model eq. (2.50) is

$$E_t\begin{pmatrix} e_{t+1} \\ p_t \end{pmatrix} = A\begin{pmatrix} e_t \\ p_{t-1} \end{pmatrix} + dm_t, \tag{2.69}$$

where the matrix

$$A = \begin{pmatrix} 1+\dfrac{\beta}{\alpha(1+\beta)} & \dfrac{1}{\alpha(1+\beta)} \\ \dfrac{\beta}{1+\beta} & \dfrac{1}{1+\beta} \end{pmatrix} \tag{2.70}$$

and the vector d = (−1/α, 0)'. Suppose that α = 1 and β = 1. Then the characteristic roots of A are

$$\lambda = 1 \pm 0.707. \tag{2.71}$$

The characteristic vector associated with the unstable root is obtained from

$$(h_{11}, h_{12})A = \lambda_1(h_{11}, h_{12}); \tag{2.72}$$

this gives −h_{12}/h_{11} = −0.414, so that according to eq. (2.55) the coefficients of the (homogeneous) solution must satisfy

$$\gamma_{1,i+1} = -0.414\,\gamma_{2i}, \qquad i = 0, 1, \ldots. \tag{2.73}$$

Using the stable root we have

$$\gamma_{2,i+1} = 0.293\,\gamma_{2i}, \qquad i = 0, 1, \ldots. \tag{2.74}$$

The particular solution is given by the vector (ρI − A)^{-1}dρ^{i-k} as in eq. (2.68). That is,

$$\gamma_{1i}^{(P)} = \frac{(0.5-\rho)\,\rho^{i-k}}{(1.5-\rho)(0.5-\rho)-0.25}, \qquad i = k, k+1, k+2, \ldots, \tag{2.75}$$

$$\gamma_{2i}^{(P)} = \frac{-0.5\,\rho^{i-k}}{(1.5-\rho)(0.5-\rho)-0.25}, \qquad i = k, k+1, k+2, \ldots, \tag{2.76}$$

where k is the number of periods in advance that the shock to the money supply is anticipated (k = 0 for unanticipated shocks).
In Tables 2, 3, and 4 and in Figures 5, 6, and 7, respectively, the effects of temporary unanticipated money shocks (k = 0, ρ = 0), permanent unanticipated money shocks (k = 0, ρ = 1), and permanent money shocks anticipated 3 periods in advance (k = 3, ρ = 1) are shown. In each case the increase in money is by 1 percent.
Table 2
Effect of an unanticipated temporary increase in money on the exchange rate and the price level (k = 0, ρ = 0).

Period after shock: i               0      1      2      3      4
Effect on exchange rate: γ_{1i}     0.59  -0.12  -0.04  -0.01  -0.00
Effect on price level: γ_{2i}       0.29   0.09   0.03   0.01   0.00

Table 3
Effect of an unanticipated permanent increase in money on the exchange rate and the price level (k = 0, ρ = 1).

Period after shock: i                       0      1      2      3      4
Effect on exchange rate: γ_{1i}             1.41   1.12   1.04   1.01   1.00
  particular solution: γ_{1i}^(P)           1      1      1      1      1
  homogeneous solution: γ_{1i}^(H)          0.41   0.12   0.04   0.01   0.00
Effect on price level: γ_{2i}               0.71   0.91   0.97   0.99   1.00
  particular solution: γ_{2i}^(P)           1      1      1      1      1
  homogeneous solution: γ_{2i}^(H)         -0.29  -0.09  -0.03  -0.01  -0.00

Table 4
Effect of a permanent increase in money anticipated 3 periods in advance on the exchange rate and the price level (k = 3, ρ = 1).

Period after shock: i                       0      1      2      3      4      5      6
Effect on exchange rate: γ_{1i}             0.28   0.43   0.71   1.21   1.06   1.02   1.00
  particular solution: γ_{1i}^(P)           -      -      -      -      1.00   1.00   1.00
  homogeneous solution: γ_{1i}^(H)          -      -      -      -      0.06   0.02   0.01
Effect on price level: γ_{2i}               0.14   0.28   0.50   0.85   0.96   0.99   1.00
  particular solution: γ_{2i}^(P)           -      -      -      1.00   1.00   1.00   1.00
  homogeneous solution: γ_{2i}^(H)          -      -      -     -0.15  -0.04  -0.01  -0.00

Figure 5. Temporary unanticipated increase in money. (Two panels: impact on the exchange rate, γ_{1i}, and impact on the price level, γ_{2i}.)

Figure 6. Permanent unanticipated increase in money. (Two panels: impact on the exchange rate, γ_{1i}, and impact on the price level, γ_{2i}.)

A temporary unanticipated increase in money causes the exchange rate to depreciate (e rises) and the price level to increase in the first period. Subsequently, the price level converges monotonically back to equilibrium. In the second period, e falls below its equilibrium value and then gradually rises again back to zero (Table 2 and Figure 5).

A permanent unanticipated increase in money of 1 percent eventually causes the exchange rate to depreciate by 1 percent and the price level to rise by 1 percent. But in the short run e rises above the long-run equilibrium and then gradually falls back to its new long-run value. This is the best illustration of overshooting (Table 3 and Figure 6).
Figure 7. Permanent increase in money, anticipated 3 periods in advance. (Two panels: impact on the exchange rate and impact on the price level.)

If the increase in the money supply is anticipated in advance, then the price level rises and the exchange rate depreciates at the announcement date. Subsequently, the price level and e continue to rise. The exchange rate reaches its lowest value (e reaches its highest value) in the period when the money increase actually takes place, and then appreciates back to its new long-run value of 1 (Table 4 and Figure 7). Note that p and e are on explosive paths from period 0 until period 3.
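The computations behind Table 2 can be reproduced in a few lines. The sketch below (Python/NumPy; it hard-codes α = β = 1 as in the text, so everything else is determined by the model) forms the matrix A in (2.70), extracts the left characteristic vector of the unstable root to get the saddle-path condition (2.55), solves (2.54) and (2.57) for γ_{10}, γ_{11}, γ_{20}, and iterates (2.55)-(2.56) for the unanticipated temporary case k = 0, ρ = 0:

```python
import numpy as np

alpha, beta = 1.0, 1.0
A = np.array([[1 + beta/(alpha*(1+beta)), 1/(alpha*(1+beta))],
              [beta/(1+beta),             1/(1+beta)]])
d = np.array([-1/alpha, 0.0])

# Left characteristic vectors of A are right eigenvectors of A'
eigvals, eigvecs = np.linalg.eig(A.T)
unstable = np.argmax(np.abs(eigvals))
lam2 = eigvals[1 - unstable].real              # stable root, approx 0.293
h11, h12 = eigvecs[:, unstable].real
slope = -h12 / h11                             # approx -0.414, eq. (2.73)

# Solve (2.57) together with gamma_11 = slope * gamma_20 from (2.54)
g10 = (slope*d[1] - d[0]) / (A[0, 0] - slope*A[1, 0])
g11 = A[0, 0]*g10 + d[0]
g20 = A[1, 0]*g10 + d[1]

# Iterate (2.55)-(2.56) to reproduce Table 2
g1, g2 = [g10, g11], [g20]
for i in range(1, 5):
    g2.append(lam2 * g2[-1])                   # gamma_{2i} = lam2 * gamma_{2,i-1}
for i in range(2, 5):
    g1.append(slope * g2[i-1])                 # gamma_{1i} = slope * gamma_{2,i-1}
print(np.round(g1, 2))   # [ 0.59 -0.12 -0.04 -0.01 -0.  ]
print(np.round(g2, 2))   # [ 0.29  0.09  0.03  0.01  0.  ]
```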

2.2.6. Geometric interpretation

The solution of the bivariate model has a helpful geometric interpretation. Writing out eq. (2.51) with θ_i = 0 in scalar form as two difference equations and subtracting γ_{1i} and γ_{2,i-1} from the first and second equation respectively results in

$$\Delta\gamma_{1,i+1} = \gamma_{1,i+1} - \gamma_{1i} = (a_{11}-1)\gamma_{1i} + a_{12}\gamma_{2,i-1},$$
$$\Delta\gamma_{2i} = \gamma_{2i} - \gamma_{2,i-1} = a_{21}\gamma_{1i} + (a_{22}-1)\gamma_{2,i-1}. \tag{2.77}$$

According to (2.77) there are two linear relationships between γ_{1i} and γ_{2,i-1} consistent with no change in the coefficients: Δγ_{1,i+1} = 0 and Δγ_{2i} = 0. For example, in the exchange rate model in eq. (2.69), the equations in (2.77) become

$$\Delta\gamma_{1,i+1} = \frac{\beta}{\alpha(1+\beta)}\gamma_{1i} + \frac{1}{\alpha(1+\beta)}\gamma_{2,i-1},$$
$$\Delta\gamma_{2i} = \frac{\beta}{1+\beta}\gamma_{1i} - \frac{\beta}{1+\beta}\gamma_{2,i-1}. \tag{2.78}$$

Figure 8. Geometric interpretation of the solution in the bivariate model. The darker line is the saddle point path along which the impact coefficients converge to the equilibrium value of (0,0).

Figure 9. Solution values for the case of temporary unanticipated shocks (k = 0, ρ = 0). The numbered points are the values of i. See also Table 2 and Figure 5.

The two no-change lines are

$$\gamma_{1i} = -\frac{1}{\beta}\gamma_{2,i-1} \qquad\text{and}\qquad \gamma_{1i} = \gamma_{2,i-1}, \tag{2.79}$$

and are plotted in Figure 8. The arrows in Figure 8 show the directions of motion according to eq. (2.78) when the no-change relationships in (2.79) are not satisfied. It is clear from these arrows that if the γ coefficients are to converge to their equilibrium value (0,0) they must move along the "saddle point" path shown by the darker line in Figure 8. Points off this line will lead to ever-increasing values of the γ coefficients. The linear combination of γ_{1i} and γ_{2,i-1} along this saddle point path is given by the characteristic vector associated with the unstable root λ_1, as given in general by eq. (2.55) and for this example in eq. (2.73). Note how Figure 8 immediately shows that the saddle point path is downward sloping.

In Figure 9 the solution values for the impacts on the exchange rate and the price level are shown for the case of a temporary shock as considered in Table 2 and Figure 5. In Figures 10 and 11, the solution values are shown for the case where the increase in money is permanent. The permanent increase shifts the reference point from (0,0) to (1,1). The point (1,1) is simply the value of the particular solution in this case. Figure 10 is the case where the permanent increase is unanticipated; Figure 11 is the anticipated case.

Note that these diagrams do not give the impact on the exchange rate and the price level in the same period; they are one period out of synchronization. Hence, the points do not correspond to a scatter diagram of the effects of a change in money on the exchange rate and on the price level. It is a relatively simple matter to deduce such a scatter diagram, as shown by the open circles in Figures 10 and 11.

Figure 10. Solution values for a permanent unanticipated increase in the money supply. The open circles give the (γ_{1i}, γ_{2i}) pairs starting with i = 0.

Figure 11. Solution values for an anticipated permanent increase in the money supply. The open circles give the (γ_{1i}, γ_{2i}) pairs starting with i = 0.

2.3. The use of operators, generating functions, and z-transforms

As the previous sections have shown, the problem of solving rational expectations models is equivalent to solving nonhomogeneous deterministic difference equations. The homogeneous solution is obtained simply by requiring that the stochastic process for the endogenous variables be stationary. Once this is accomplished, most of the work comes in obtaining the particular solution to the nonhomogeneous part. Lag or lead operators, operator polynomials, and the power series associated with these polynomials (i.e., generating functions or z-transforms) have frequently been found useful in solving the nonhomogeneous part of difference equations [see Baumol (1970) for economic examples]. These methods have also been useful in rational expectations analysis. Futia (1981) and Whiteman (1983) have exploited the algebra of z-transforms in solving a wide range of linear rational expectations models.

To illustrate the use of operators, let F^s x_t = x_{t+s} be the forward lead operator. Then the scalar equation in the impact coefficients that we considered in eq. (2.7) can be written

$$(1-\alpha F)\gamma_i = \delta\theta_i, \qquad i = 0, 1, 2, \ldots. \tag{2.80}$$

Consider the case where θ_i = ρ^i and solve for γ_i by operating on both sides by the inverse of the polynomial (1 − αF). We then have

$$\gamma_i = \frac{\delta\rho^i}{1-\alpha F} = \frac{\delta\rho^i}{1-\alpha\rho}, \qquad i = 0, 1, 2, \ldots; \tag{2.81}$$

the last equality follows from the algebra of operator polynomials [see, for example, Baumol (1970)]. The result is identical to what we found in Section 2.1 using the method of undetermined coefficients to obtain the particular solution. The procedure easily generalizes to the bivariate case and yields the particular solution shown in eq. (2.58). It also generalizes to handle other time series specifications of θ_i.
The operator notation used in (2.80) is standard in difference equation analysis. In some applications of rational expectations models, a non-standard operator has been used directly on the basic model (2.1). To see this, redefine the operator F as FE_t y_t = E_t y_{t+1}. That is, F moves the date on the variable, but the viewpoint date in the expectation is held constant. Then eq. (2.1) can be written (note that E_t y_t = y_t):

$$(1-\alpha F)E_t y_t = \delta u_t. \tag{2.82}$$

Formally, we can apply the inverse of (1 − αF) to (2.82) to obtain

$$E_t y_t = \delta(1-\alpha F)^{-1}u_t = \delta\left(1+\alpha F+(\alpha F)^2+\cdots\right)u_t = \delta\left(u_t+\alpha E_t u_{t+1}+\alpha^2 E_t u_{t+2}+\cdots\right) = \delta\left(u_t+\alpha\rho u_t+(\alpha\rho)^2 u_t+\cdots\right) = \frac{\delta u_t}{1-\alpha\rho}, \tag{2.83}$$

where we again assume that u_t = ρu_{t-1} + ε_t. Eq. (2.83) gives the same answer that the previous methods did (again note that E_t y_t = y_t). As Sargent (1979, p. 337) has discussed, the use of this type of operator on conditional expectations can lead to confusion or mistakes if it is interpreted as a typical lag operator that shifts all time indexes, including the viewpoint dates. The use of operators on conventional difference equations like (2.6) is much more straightforward, and perhaps it is best to think of the algebra in (2.82) and (2.83) in terms of (2.80) and (2.81).
Whiteman's (1983) use of the generating functions associated with the operator polynomials can be illustrated by writing the power series corresponding to eqs. (2.2) and (2.4):

$$\gamma(z) = \sum_{i=0}^{\infty}\gamma_i z^i, \qquad \theta(z) = \sum_{i=0}^{\infty}\theta_i z^i.$$

These are the z-transforms [see Dhrymes (1971) for a short introduction to z-transforms and their use in econometrics]. Equating the coefficients of ε_{t-i} in eq. (2.6) is thus the same as equating the coefficients of powers of z. That is, (2.6) means that

$$\gamma(z) = \alpha z^{-1}\left(\gamma(z)-\gamma_0\right)+\delta\theta(z). \tag{2.84}$$

Solving (2.84) for γ(z) we have

$$\gamma(z) = \left(1-\alpha^{-1}z\right)^{-1}\left(\gamma_0-\delta\alpha^{-1}z\theta(z)\right). \tag{2.85}$$

As in Section 2.1, eq. (2.85) has a free parameter γ_0 which must be determined before γ(z) can be evaluated. For y_t to be a stationary process, it is necessary that γ(z) be a convergent power series (or equivalently an analytic function) for |z| ≤ 1. The term (1 − α^{-1}z)^{-1} on the right-hand side of (2.85) is divergent if α^{-1} > 1. Hence, the second term in parentheses must have a factor to "cancel out" this divergent series. For the case of serially uncorrelated shocks, θ(z) is a constant θ_0 = 1, so it is obvious that γ_0 = δ will cancel out the divergent series. We then have γ(z) = δ, which corresponds with the results in Section 2.1. Whiteman (1983) shows that in general γ(z) will be convergent when |α| < 1 if γ_0 = δθ(α). For the unanticipated autoregressive shocks this implies that γ(z) = δ(1−ρα)^{-1}(1−ρz)^{-1}, which is the z-transform of the solution we obtained earlier. When |α| > 1 there is no natural way to determine γ_0, so we are left with non-uniqueness as in Section 2.1.
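This cancellation is easy to verify symbolically. The fragment below (a sketch using SymPy; it is a check under the stated AR(1) assumption, not part of the original argument) confirms that the choice γ_0 = δθ(α) removes the divergent factor in (2.85):

```python
import sympy as sp

z, alpha, rho, delta = sp.symbols('z alpha rho delta')

theta = 1/(1 - rho*z)                      # z-transform of the AR(1) shock weights
gamma0 = delta * theta.subs(z, alpha)      # Whiteman's condition gamma_0 = delta*theta(alpha)

gamma = (gamma0 - delta*z*theta/alpha) / (1 - z/alpha)   # eq. (2.85)
print(sp.simplify(gamma))                  # equivalent to delta/((1-alpha*rho)(1-rho*z))
print(sp.series(gamma, z, 0, 3))           # coefficients delta*rho^i/(1-alpha*rho)
```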

2.4. Higher order representations and factorization techniques

We noted in Section 2.2 that a first-order bivariate model with one lead variable could be interpreted as a second-order scalar model with a lead and a lag. That is,

$$y_t = \alpha_1 E_t y_{t+1} + \alpha_2 y_{t-1} + \delta u_t \tag{2.86}$$

can be written as a bivariate model and solved using the saddle point stability method. An alternative approach, followed by Sargent (1979), Hansen and Sargent (1980) and Taylor (1980a), is to work with (2.86) directly. That the two approaches give the same result can be shown formally.

Substitute for y_t, y_{t-1}, and E_t y_{t+1} in eq. (2.86) using (2.4) to obtain the equations

$$\gamma_0 = \alpha_1\gamma_1 + \delta, \tag{2.87}$$
$$\gamma_{i+1} = \frac{1}{\alpha_1}\gamma_i - \frac{\alpha_2}{\alpha_1}\gamma_{i-1} - \frac{\delta}{\alpha_1}\theta_i, \qquad i = 1, 2, \ldots. \tag{2.88}$$

As above, we need one more equation to solve for all the γ coefficients. Consider first the homogeneous part of (2.88). Its characteristic polynomial is

$$z^2 - \frac{1}{\alpha_1}z + \frac{\alpha_2}{\alpha_1}, \tag{2.89}$$

which can be factored into

$$(z-\lambda_1)(z-\lambda_2), \tag{2.90}$$

where λ_1 and λ_2 are the roots of (2.89). The solution to the homogeneous part is γ_i^(H) = k_1λ_1^i + k_2λ_2^i. As we discussed above, in many economic applications one root, say λ_1, will be larger than 1 in modulus and the other will be smaller than 1 in modulus. Thus, the desired solution to the homogeneous part is achieved by setting k_1 = 0, so that γ_i^(H) = k_2λ_2^i, where k_2 equals the initial condition γ_0^(H). Equivalently, we can interpret the setting of k_1 = 0 as reducing the characteristic polynomial (2.89) to (z − λ_2). Thus, the γ coefficients satisfy

$$\gamma_i = \lambda_2\gamma_{i-1}, \qquad i = 1, 2, \ldots. \tag{2.91}$$

Equivalently, we have "factored out" (z − λ_1) from the characteristic polynomial.

For the case where u_t is uncorrelated, so that θ_i = 0 for i > 0, the difference equation in (2.88) is homogeneous. We can solve for γ_0 by using γ_1 = λ_2γ_0 along with eq. (2.87). This gives γ_i = δ(1−α_1λ_2)^{-1}λ_2^i, i = 0, 1, ....

To see how this result compares with the saddle-point approach, write (2.88) as

$$\begin{pmatrix}\gamma_{i+1}\\ \gamma_i\end{pmatrix} = \begin{pmatrix}\dfrac{1}{\alpha_1} & -\dfrac{\alpha_2}{\alpha_1}\\ 1 & 0\end{pmatrix}\begin{pmatrix}\gamma_i\\ \gamma_{i-1}\end{pmatrix} + \begin{pmatrix}-\dfrac{\delta}{\alpha_1}\\ 0\end{pmatrix}\theta_i. \tag{2.92}$$

The characteristic equation of the matrix A is λ² − (1/α_1)λ + α_2/α_1 = 0. Hence, the roots of A are identical to the roots of the characteristic polynomial associated with the second-order difference eq. (2.88). [This is a well-known result, shown for the general pth order difference equation in Anderson (1971).]

The characteristic vector of the matrix A associated with the unstable root λ_1 is found from the equation (h_{11}, h_{12})A = λ_1(h_{11}, h_{12}). Thus, the saddle point path is given by

$$\gamma_i = -\frac{h_{12}}{h_{11}}\gamma_{i-1} = \left(\frac{1}{\alpha_1}-\lambda_1\right)\gamma_{i-1}. \tag{2.93}$$

For the two methods to be equivalent, we need to show that (2.91) and (2.93) are equivalent, or that λ_2 = 1/α_1 − λ_1. This follows immediately from the fact that the sum of the roots (λ_1 + λ_2) of a second-order polynomial equals minus the coefficient of the linear term in the polynomial: λ_1 + λ_2 = 1/α_1.
For the case where θ_i = ρ^i, we need to compare the particular solutions as well. For the second-order scalar model we guess the form γ_i^(P) = ab^i. Substituting this into (2.88) we find that b = ρ and a = δ(1 − α_1ρ − α_2ρ^{-1})^{-1}. To see that this gives the same value for the particular solution that emerges from the matrix formulation in eq. (2.58), note that

$$(\rho I-A)^{-1}d\rho^i = \frac{1}{\rho^2-\dfrac{1}{\alpha_1}\rho+\dfrac{\alpha_2}{\alpha_1}}\begin{pmatrix}-\dfrac{\delta}{\alpha_1}\rho\\[4pt] -\dfrac{\delta}{\alpha_1}\end{pmatrix}\rho^i. \tag{2.94}$$

Eq. (2.94) gives the particular solution for the vector (γ_i^(P), γ_{i-1}^(P))', which corresponds to the vector γ_i^(P) in eq. (2.58). Hence

$$\gamma_i^{(P)} = \frac{-\rho\alpha_1^{-1}\delta\rho^i}{\rho^2-\rho\alpha_1^{-1}+\alpha_2\alpha_1^{-1}} = \frac{\delta\rho^i}{1-\alpha_1\rho-\alpha_2\rho^{-1}},$$

which is the particular solution obtained from the second-order scalar representation.
tion.
Rather than obtaining the solution of the homogeneous system by factoring the characteristic equation, one can equivalently factor the polynomial in the time shift operators. Because the operator polynomials also provide a convenient way to obtain the nonhomogeneous solution (as was illustrated in Section 2.3), this approach essentially combines the homogeneous solution and the nonhomogeneous solution in a notationally and computationally convenient way. Write (2.88) as

$$\left(L^{-1}-\frac{1}{\alpha_1}+\frac{\alpha_2}{\alpha_1}L\right)\gamma_i = -\frac{\delta}{\alpha_1}\theta_i. \tag{2.95}$$

Let H(L) = L^{-1} − 1/α_1 + (α_2/α_1)L be the polynomial on the left-hand side of (2.95), and let P(z) = z² − (1/α_1)z + α_2/α_1 be the characteristic polynomial in (2.89). The polynomial H(L) can be factored into

$$-\mu(1-\phi L^{-1})(1-\psi L), \tag{2.96}$$

where φ = μ^{-1}, ψ = μ^{-1}α_2α_1^{-1}, and where μ is one of the solutions of P(μ) = 0, that is, one of the roots of P(·). This can be seen by equating the coefficients of H(L) and the polynomial in (2.96). Continuing to assume that only one of the roots of P(·) is greater than one in modulus (say λ_1), we set φ = λ_1^{-1} < 1. Since the product of the roots of P(·) equals α_2α_1^{-1}, we immediately have that ψ = λ_2. Thus, there is a unique factorization of the polynomial with φ and ψ both less than one in modulus.
Because ψ = λ_2, the stable solution to the homogeneous difference equation can be written

$$(1-\psi L)\gamma_i^{(H)} = 0. \tag{2.97}$$

The particular solution can also be written using the operator notation:

$$\gamma_i^{(P)} = \frac{\delta\alpha_1^{-1}\rho^i}{\mu(1-\phi\rho)(1-\psi L)}. \tag{2.98}$$

The complete solution is given by γ_i = γ_i^(H) + γ_i^(P), which implies that

$$(1-\lambda_2 L)\gamma_i = (1-\lambda_2 L)\gamma_i^{(H)} + (1-\lambda_2 L)\gamma_i^{(P)}. \tag{2.99}$$

The first term on the right-hand side of (2.99) equals zero. Therefore the complete solution is given by

$$\gamma_i = \lambda_2\gamma_{i-1} + \frac{\delta\alpha_1^{-1}\rho^i}{\lambda_1\left(1-\rho\lambda_1^{-1}\right)}. \tag{2.100}$$

This solution is equivalent to that derived by adding the particular solution to the homogeneous solution of (2.91).
Note that this procedure for solving (2.95) can be stated quite simply in two steps: (1) factor the lag polynomial into two stable polynomials, one involving positive powers of L (lags) and the other involving negative powers of L (leads), and (2) operate on both sides of (2.95) by the inverse of the polynomial involving negative powers of L.
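These two steps are mechanical enough to sketch in code. The fragment below (Python/NumPy; the values of α_1, α_2, δ, ρ are hypothetical, and the root condition is assumed to hold) factors the characteristic polynomial (2.89), evaluates the complete solution (2.100) recursively starting from γ_{-1} = 0, and checks it against the original equations (2.87)-(2.88):

```python
import numpy as np

alpha1, alpha2, delta, rho = 0.6, 0.3, 1.0, 0.5   # hypothetical values

# Step 1: factor z^2 - z/alpha1 + alpha2/alpha1 into (z - lam1)(z - lam2)
roots = np.real_if_close(np.roots([1.0, -1.0/alpha1, alpha2/alpha1]))
lam1, lam2 = max(roots, key=abs), min(roots, key=abs)
assert abs(lam1) > 1 > abs(lam2)                  # saddle-path root condition

# Step 2: complete solution (2.100), gamma_i = lam2*gamma_{i-1} + src*rho^i,
# with src = delta/(alpha1*(lam1 - rho)); gamma_{-1} = 0 gives gamma_0.
N = 30
src = delta / (alpha1 * (lam1 - rho))
gamma, prev = np.zeros(N), 0.0
for i in range(N):
    gamma[i] = lam2 * prev + src * rho**i
    prev = gamma[i]

# Verify against (2.87) and (2.88)
assert abs(gamma[0] - (alpha1*gamma[1] + delta)) < 1e-10
for i in range(1, N-1):
    resid = gamma[i+1] - (gamma[i]/alpha1 - (alpha2/alpha1)*gamma[i-1]
                          - (delta/alpha1)*rho**i)
    assert abs(resid) < 1e-10
print("factorized solution solves (2.88); gamma[:4] =", gamma[:4].round(4))
```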
It is clear from (2.100) that the γ_i weights are such that the solution for y_t can be represented as a first-order autoregressive process with a serially correlated error:

$$y_t = \lambda_2 y_{t-1} + \delta\alpha_1^{-1}(\lambda_1-\rho)^{-1}u_t, \tag{2.101}$$

where

$$u_t = \rho u_{t-1} + \varepsilon_t.$$

In the papers by Sargent (1979), Taylor (1980a) and Hansen and Sargent (1980), the difference equation in (2.95) was written in terms of y_i = E_t y_{t+i} and θ_i = E_t u_{t+i}, a form which can be obtained by taking conditional expectations in eq. (2.86). In other words, rather than working with the moving average coefficients, they worked directly with the conditional expectations. As discussed in Section 2.3, this requires the use of a non-standard lag operator.

2.5. Rational expectations solutions as boundary value problems

It is useful to note that the problem of solving rational expectations models can be thought of as a boundary value problem where final conditions as well as initial conditions are given. To see this, consider the homogeneous equation

$$\gamma_{i+1} = \frac{1}{\alpha}\gamma_i, \qquad i = 0, 1, \ldots. \tag{2.102}$$

The stationarity conditions place a restriction on the "final" value lim_{j→∞} γ_j = 0 rather than on the "initial" value γ_0. As an approximation we want γ_j = 0 for large j. A traditional method for solving boundary value problems is "shooting": one guesses a value for γ_0 and then uses (2.102) to project (shoot) a value of γ_j for some large j. If the resulting γ_j ≠ 0 (or if γ_j is further from 0 than some tolerance range), then a new value of γ_0 (chosen in some systematic fashion) is tried, until one gets γ_j sufficiently close to zero. It is obvious in this case that γ_0 = 0, so it would be impractical to use such a method here. But in nonlinear models the approach can be quite useful, as we discuss in Section 6.

This approach obviously generalizes to higher order systems; for example, the homogeneous part of (2.88) is

$$\gamma_{i+1} = \frac{1}{\alpha_1}\gamma_i - \frac{\alpha_2}{\alpha_1}\gamma_{i-1}, \qquad i = 0, 1, 2, \ldots, \tag{2.103}$$

with γ_{-1} = 0 as one initial condition and γ_j = 0 for some large j as the one "final" condition. This is a two-point boundary problem which can be solved in the same way as (2.102).
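To make the idea concrete, here is a minimal shooting sketch for the nonhomogeneous counterpart of (2.102), the scalar equation (2.7) with θ_i = ρ^i (the values α = 0.5, ρ = 0.8, δ = 1 are hypothetical). Bisection on γ_0 drives the projected γ_J toward zero and recovers the known solution γ_0 = δ/(1−αρ):

```python
def shoot(gamma0, alpha=0.5, rho=0.8, delta=1.0, J=60):
    """Iterate (2.7) forward from a guessed gamma_0 and return gamma_J."""
    g = gamma0
    for i in range(J):
        g = g/alpha - (delta/alpha) * rho**i
    return g

# gamma_J is increasing in gamma_0 (its coefficient is (1/alpha)^J > 0),
# so bisection on the projected final value works.
lo, hi = 0.0, 10.0
for _ in range(200):
    mid = 0.5*(lo + hi)
    lo, hi = (mid, hi) if shoot(mid) < 0 else (lo, mid)
print(0.5*(lo + hi))   # approx 1.6667 = delta/(1 - alpha*rho)
```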

3. Econometric evaluation of policy rules

Perhaps the main motivation behind the development of rational expectations


models was the desire to improve policy evaluation procedures. Lucas (1976)
argued that the parameters of the models conventionally used for policy evalua-
tion - either through model simulation or formal optimal control -would shift
when policy changed. The main reason for this shift is that expectations mecha-
nisms are adaptive, or backward looking, in conventional models and thereby
unresponsive to those changes in policy that would be expected to change
expectations of future events. Hence, the policy evaluation results using conven-
tional models would be misleading.
The Lucas criticism of conventional policy evaluation has typically been taken as destructive. Yet implicit in the Lucas critique is a constructive way to improve on conventional evaluation techniques by modeling economic phenomena in terms of "structural" parameters; by "structural" one simply means invariant with respect to policy intervention. Whether a parameter is invariant or not is partly a matter of the researcher's judgment, of course, so that any attempt to take the Lucas critique seriously by building structural models is subject to a similar critique that the researcher's assumption about which parameters are structural is wrong. If this is taken to the extreme that no feasible structural modeling is possible, the Lucas critique does indeed become purely destructive and perhaps even stifling.
Hansen and Sargent (1980), Kydland and Prescott (1982), Taylor (1982) and Christiano (1983) have examined policy problems where only the parameters of utility functions or production functions can be considered invariant or structural. Taylor (1979, 1980b) has considered models where the parameters of the wage and price setting functions are invariant or structural.
The thought experiments described in Section 2 whereby multiplier responses
are examined should be part of any policy evaluation technique. But it is
unrealistic to think of policy as consisting of such one-shot changes in the policy
instrument settings. They never occur. Rather, one wants to consider changes in

the way the policymakers respond to events - that is, changes in their policy rules.
For this we can make use of stochastic equilibrium solutions examined in Section
2. We illustrate this below.

3.1. Policy evaluation for a univariate model

Consider the following policy problem, which is based on model (2.1). Suppose that an econometric policy advisor knows that the demand for money is given by

$$m_t - p_t = -\beta(E_t p_{t+1} - p_t) + u_t. \tag{3.1}$$

Here there are two shocks to the system, the supply of money m_t and the demand for money u_t. Suppose that u_t = ρu_{t-1} + ε_t, and that in the past the money supply was fixed: m_t = 0; suppose that under this fixed money policy, prices were thought to be too volatile. The policy advisor is asked by the Central Bank for advice on how m_t can be used in the future to reduce the fluctuations in the price level. Note that the policy advisor is not asked just what to do today or tomorrow, but what to do for the indefinite future. Advice thus should be given as a contingency rule rather than as a fixed path for the money supply.

Using the solution technique of Section 2, the behavior of p_t during the past is

$$p_t = \rho p_{t-1} - \frac{\varepsilon_t}{1+\beta(1-\rho)}. \tag{3.2}$$

Conventional policy evaluation might proceed as follows: first, the econometrician would have estimated ρ in the reduced form relation (3.2) over the sample period. The estimated equation would then serve as a model of expectations to be substituted into (3.1); that is, E_t p_{t+1} = ρp_t would be substituted into

$$m_t - p_t = -\beta(\rho p_t - p_t) + u_t. \tag{3.3}$$

The conventional econometrician's model of the price level would then be

$$p_t = \frac{m_t - u_t}{1+\beta(1-\rho)}. \tag{3.4}$$

Considering a feedback policy rule of the form m_t = gu_{t-1}, eq. (3.4) implies

$$\text{Var}\,p_t = \frac{\sigma_\varepsilon^2}{[1+\beta(1-\rho)]^2(1-\rho^2)}\left[g^2 + 1 - 2g\rho\right]. \tag{3.5}$$

If there were no cost to varying the money supply, then eq. (3.5) indicates that the best choice of g to minimize fluctuations in p_t is g = ρ.

But we know that (3.5) is incorrect if g ≠ 0. The error was to assume that E_t p_{t+1} = ρp_t regardless of the choice of policy. This is the expectations error that rational expectations was designed to avoid. The correct approach would have been to substitute m_t = gu_{t-1} directly into (3.1) and calculate the stochastic equilibrium for p_t. This results in

$$p_t = -\frac{1+\beta(1-g)}{(1+\beta)(1+\beta(1-\rho))}u_t + \frac{g}{1+\beta}u_{t-1}. \tag{3.6}$$

Note how the parameters of (3.6) depend on the parameters of the policy rule. The variance of p_t is

$$\text{Var}\,p_t = \frac{1}{(1+\beta)^2(1-\rho^2)}\left[\frac{(1+\beta(1-g))^2}{(1+\beta(1-\rho))^2} - \frac{2g\rho(1+\beta(1-g))}{1+\beta(1-\rho)} + g^2\right]\sigma_\varepsilon^2. \tag{3.7}$$

The optimal policy is found by minimizing Var p_t with respect to g.
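The minimization is easily done numerically. The sketch below (Python/NumPy; the values β = 1, ρ = 0.8, σ_ε = 1 are hypothetical) evaluates Var p_t from the rational expectations solution (3.6) on a grid of g and compares the minimizing g with the naive choice g = ρ implied by (3.5):

```python
import numpy as np

beta, rho, sigma = 1.0, 0.8, 1.0      # hypothetical parameter values

def var_p(g):
    """Var p_t implied by the rational expectations solution (3.6)."""
    a = -(1 + beta*(1 - g)) / ((1 + beta)*(1 + beta*(1 - rho)))  # coeff. on u_t
    c = g / (1 + beta)                                           # coeff. on u_{t-1}
    # Var(a u_t + c u_{t-1}) with u_t an AR(1) process
    return (a**2 + c**2 + 2*a*c*rho) * sigma**2 / (1 - rho**2)

grid = np.linspace(-2, 2, 40001)
g_star = grid[np.argmin([var_p(g) for g in grid])]
print("optimal g:", round(g_star, 3), " naive g = rho:", rho)
print("Var p at optimum:", round(var_p(g_star), 4),
      " at naive g:", round(var_p(rho), 4))
```

For these values the variance-minimizing g differs noticeably from ρ, which is the point of the example: the naive calculation (3.5) ignores the dependence of expectations on the rule.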


This simple policy problem suggests the following approach to macro policy evaluation: (1) derive a stochastic equilibrium solution which shows how the endogenous variables behave as functions of the parameters of the policy rule; (2) specify a welfare function in terms of the moments of the stochastic equilibrium; and (3) maximize the welfare function across the parameters of the policy rule. In this example the welfare function is simply Var p_t. In more general models there will be several target variables. For example, in Taylor (1979) an optimal policy rule to minimize a weighted average of the variance of real output and the variance of inflation was calculated.

Although eq. (3.1) was not derived explicitly from an individual optimization problem, the same procedure could be used when the model is directly linked to parameters of a utility function. For instance, the model of Example 5 in Section 2.2, in which the parameters depend on a firm's utility function, could be handled in the same way as the model in (3.1).

3.2. The Lucas critique and the Cowles Commission critique

The Lucas critique can be usefully thought of as a dynamic extension of the critique developed by the Cowles Commission researchers in the late 1940s and early 1950s, which gave rise to the enormous literature on simultaneous equations. At that time it was recognized that reduced forms could not be used for many policy evaluation questions. Rather, one should model structural relationships. The parameters of the reduced form are, of course, functions of the structural parameters in the standard Cowles Commission setup. The discussion by Marschak (1953), for example, is remarkably similar to the more recent rational expectations critiques; Marschak did not consider expectations variables, and in this sense the rational expectations critique is a new extension. But earlier analyses like Marschak's are an effort to explain why structural modeling is necessary, and thus have much in common with more recent research.

3.3. Game-theoretic approaches

In the policy evaluation procedure discussed above, the government acts like a dominant player with respect to the private sector. The government sets g and the private sector takes g as given. The government then maximizes its social welfare function across different values of g. One can imagine alternatively a game-theoretic setup in which the government and the private sector are each maximizing utility. Chow (1983), Kydland (1975), Lucas and Sargent (1981), and Epple, Hansen, and Roberds (1983) have considered this alternative approach. It is possible to specify the game-theoretic model as a choice of parameters of decision rules in the steady state, or as a formal non-steady-state dynamic optimization problem with initial conditions partly determining the outcome. Alternative solution concepts, including Nash equilibria, have been examined.

The game-theoretic approach naturally leads to the important time inconsistency problem raised by Kydland and Prescott (1977) and Calvo (1979). Once the government announces its policy, it will be optimal to change it in the future. The consistent solution in which everyone expects the government to change is generally suboptimal. Focussing on rules as in Section 3.1 effectively eliminates the time inconsistency issue. But even then, there can be a temptation to change the rule.

4. Statistical inference

The statistical inference issues that arise in rational expectations models can be
illustrated in a model like that of Section 2.

4.1. Full information estimation

Consider the problem of estimating the parameters of the structural model

$$y_t = \alpha E_t y_{t+1} + \delta x_t + u_t, \tag{4.1}$$

where u_t is a serially uncorrelated random variable. Assume (for example) that x_t has a finite moving average representation:

$$x_t = \varepsilon_t + \theta_1\varepsilon_{t-1} + \cdots + \theta_q\varepsilon_{t-q}, \tag{4.2}$$

where ε_t is serially uncorrelated, and assume that Cov(u_t, ε_s) = 0 for all t and s.
To obtain the full information maximum likelihood estimate of the structural system (4.1) and (4.2), we need to reduce (4.1) to a form which does not involve expectations variables. This can be done by solving the model using one of the techniques described in Section 2. Using the method of undetermined coefficients, for example, the solution for y_t is

$$y_t = \gamma_0\varepsilon_t + \cdots + \gamma_q\varepsilon_{t-q} + u_t, \tag{4.3}$$

where the γ parameters are given by

$$\gamma_i = \delta\sum_{j=0}^{q-i}\alpha^j\theta_{i+j}, \qquad i = 0, \ldots, q, \quad \theta_0 = 1. \tag{4.4}$$

Eqs. (4.2) and (4.3) together form a two-dimensional vector model

(4.5)

Eq. (4.5) is an estimable reduced form system corresponding to the structural form in (4.1) and (4.2).
form in (4.1) and (4.2).
If we assume that (u_t, ε_t) is distributed normally and independently, then the full-information maximum likelihood estimates of (θ_1, ..., θ_q, α, δ) can be obtained using existing methods for estimating multivariate ARMA models; see Chow (1983, Sections 6.7 and 11.6). Note that the coefficients of the ARMA model (4.5) are constrained. There are cross-equation restrictions in that the θ and γ parameters are related to each other by (4.4). In addition, relative to a fully unconstrained ARMA model, the off-diagonal elements of the autoregression are equal to zero.

Full information maximum likelihood methods for linear rational expectations models have been examined by Chow (1983), Muth (1981), Wallis (1980), Hansen and Sargent (1980, 1981), Dagli and Taylor (1985), Mishkin (1983), Taylor (1979, 1980a), and Wickens (1982). As in this example, the basic approach is to find a constrained reduced form and maximize the likelihood function subject to the constraints. Hansen and Sargent (1980, 1981) have emphasized these cross-equation constraints in their expositions of rational expectations estimation methods. In Muth (1981), Wickens (1982) and Taylor (1979), multivariate models were examined in which expectations are dated at t−1 rather than t, and E_{t-1}y_t appears in (4.1) rather than E_t y_{t+1}. More general multivariate models with leads and lags are examined in the other papers.

For full information estimation, it is also important that the relationship between the structural parameters and the reduced form parameters can be easily evaluated. In this example the mapping from the structural parameters to the reduced form parameters is easy to evaluate. In more complex models the mapping does not have a closed form, usually because the roots of high-order polynomials must be evaluated.

4.2. Identification

There has been relatively little formal work on identification in rational expectations models. As in conventional econometric models, identification involves the properties of the mapping from the structural parameters to the reduced form parameters. The model is identified if the structural parameters can be uniquely obtained from the reduced form parameters. Over-identification and under-identification are similarly defined as in conventional econometric models. In rational expectations models the mapping from reduced form to structural parameters is much more complicated than in conventional models, and hence it has been difficult to derive a simple set of conditions with much generality. The conditions can usually be derived in particular applications, as we can illustrate using the previous example.

When q = 0, there is one reduced form parameter γ_0, which can be estimated from (4.2) and (4.3), and two structural parameters δ and α in eq. (4.4). Hence, the model is not identified. In this case, δ = γ_0 is identified from the regression of y_t on the exogenous x_t, but α is not identified. When q = 1, there are three reduced form parameters γ_0, γ_1 and θ_1, which can be estimated from (4.2) and (4.3), and three structural parameters δ, α, and θ_1 (θ_1 is both a structural and a reduced form parameter since x_t is exogenous). Hence, the model is exactly identified according to a simple order condition. More generally, there are q+2 structural parameters (δ, α, θ_1, ..., θ_q) and 2q+1 reduced form parameters (γ_0, γ_1, ..., γ_q, θ_1, ..., θ_q) in this model. According to the order conditions, therefore, the model is overidentified if q > 1.
Treatments of identification in more general models focus on the properties of the cross-equation restrictions in more complex versions of eq. (4.4). Wallis (1980) gives conditions for identification for a class of rational expectations models; the conditions may be checked in particular applications. Blanchard (1982) has derived a simple set of identification restrictions for the case where x_t in (4.2) is autoregressive and has generalized this to higher order multivariate versions of (4.1) and (4.2).

4.3. Hypothesis testing

Tests of the rational expectations assumption have generally been constructed as


a test of the cross-equation constraints. These constraints arise because of the
rational expectations assumption. In the previous example, the null hypothesis
that the cross-equation constraints in (4.5) hold can be tested against the
alternative that (4.5) is a fully unconstrained moving average model by using a
likelihood ratio test. Note, however, that this is a joint test of rational expecta-
tions and the specification of the model. Testing rational expectations against a
specific alternative like adaptive expectations usually leads to non-nested hy-
potheses.
In more general linear models, the same types of cross-equation restrictions
arise, and tests of the model can be performed analogously. However, for large
systems the fully unconstrained ARMA model may be difficult to estimate
because of the large number of parameters.

4.4. Limited information estimation methods

Three different types of "limited information" estimators have been used for rational expectations models. These can be described using the model in (4.1) and (4.2). One method estimates (4.2) separately in order to obtain the parameters θ_1, ..., θ_q. These estimates are then taken as given (as known parameters) in estimating (4.3). Clearly this estimator is less efficient than the full information estimator, but in more complex problems the procedure saves considerable time and effort. This method has been suggested by Wallis (1980) and has been used by Papell (1984) and others in applied work.

A second method, proposed by Chow (1983) and investigated by Chow and Reny (1983), was mentioned earlier in our discussion of nonuniqueness. This method does not impose the saddle point stability constraints on the model. It leads to an easier computational problem than does imposing the saddle point constraints. If the investigator does not have any reason to impose this constraint, then this could prove quite practical.

A third procedure is to estimate eq. (4.1) as a single equation using instrumental variables. Much work has been done in this area in recent years, and because of the computational costs of full information methods it has been used frequently in applied research. Consider again the problem of estimating eq. (4.1). Let e_{t+1} = y_{t+1} − E_t y_{t+1} be the forecast error in predicting y_{t+1}. Substituting E_t y_{t+1} into (4.1) gives

$$y_t = \alpha y_{t+1} + \delta x_t + u_t - \alpha e_{t+1}. \tag{4.6}$$

By finding instruments for y_{t+1} that are uncorrelated with u_t and e_{t+1}, one can estimate (4.6) using the method of instrumental variables. In fact, this estimate would simply be the two stage least squares estimate with y_{t+1} treated as if it were a right-hand side endogenous variable in a conventional simultaneous equation model. Lagged values of x_t could serve as instruments here. This estimator was first proposed by McCallum (1976).
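A small simulation makes the estimator concrete. The sketch below is illustrative only: it assumes an AR(2) exogenous process for x_t (so that lagged x's are relevant instruments and the rank condition holds; with an AR(1) process α and δ would not be separately identified), generates y_t from the closed-form rational expectations solution, and estimates (4.6) with (x_t, x_{t-1}) as instruments for (y_{t+1}, x_t):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, delta = 0.5, 1.0                     # structural parameters (hypothetical)
phi1, phi2 = 0.5, 0.3                       # AR(2) coefficients of x_t
T = 100_000

# Exogenous process x_t = phi1 x_{t-1} + phi2 x_{t-2} + eps_t
x, eps = np.zeros(T), rng.standard_normal(T)
for t in range(2, T):
    x[t] = phi1*x[t-1] + phi2*x[t-2] + eps[t]

# Closed-form RE solution: y_t = delta * e1'(I - alpha*A)^{-1} (x_t, x_{t-1})' + u_t,
# where A is the companion matrix of the AR(2); this solves (4.1) exactly.
A = np.array([[phi1, phi2], [1.0, 0.0]])
c = delta * np.linalg.solve((np.eye(2) - alpha*A).T, np.array([1.0, 0.0]))
u = rng.standard_normal(T)
y = c[0]*x + c[1]*np.roll(x, 1) + u
y[0] = 0.0                                  # discard the wrapped-around lag

# IV/2SLS for y_t = alpha*y_{t+1} + delta*x_t + error (just identified,
# so the 2SLS estimator reduces to simple instrumental variables)
t = np.arange(2, T-1)
X = np.column_stack([y[t+1], x[t]])         # regressors
Z = np.column_stack([x[t], x[t-1]])         # instruments
b = np.linalg.solve(Z.T @ X, Z.T @ y[t])
print(b)                                    # approx [0.5, 1.0]
```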
Several extensions of McCallum's method have been proposed to deal with serial correlation problems, including Cumby, Huizinga and Obstfeld (1983), McCallum (1979), Hayashi and Sims (1983), Hansen (1982), and Hansen and Singleton (1982). A useful comparison of the efficiency of these estimators is found in Cumby, Huizinga and Obstfeld (1983).

5. General linear models

A general linear rational expectations model can be written as

$$B_0y_t + B_1y_{t-1} + \cdots + B_py_{t-p} + A_1E_ty_{t+1} + \cdots + A_qE_ty_{t+q} = Cu_t, \tag{5.1}$$

where y_t is a vector of endogenous variables, u_t is a vector of exogenous variables or shocks, and the A_i, B_i and C are matrices containing parameters.

Two alternative approaches have been taken to solve this type of model. Once it is solved, the policy evaluation and estimation methods discussed above can be applied. One approach is to write the model as a large first-order vector system directly analogous to the 2-dimensional vector model in eq. (2.50). The other approach is to solve (5.1) directly by generalizing the approach taken to the second-order scalar model in eq. (2.86). The first approach is the most straightforward. The disadvantage is that it can easily lead to very large (although sparse) matrices with high-order polynomials to solve to obtain the characteristic roots. This type of generalization is used by Blanchard and Kahn (1980) and Anderson and Moore (1984) to solve deterministic rational expectations models.

5.1. A general first-order vector model

Equation (5.1) can be written as

$$E_tz_{t+1} = Az_t + Du_t, \tag{5.2}$$

by stacking y_t, y_{t-1}, ..., y_{t-p} into the vector z_t, much as in eq. (2.50). (It is necessary that A_q be nonsingular to write (5.1) as (5.2).) Anderson and Moore (1984) have developed an algorithm that reduces equations with a singular A_q into an equivalent form with a nonsingular matrix coefficient of y_{t+q}, and have applied it to an econometric model of the U.S. money market. (Alternatively, Preston and Pagan (1982, pp. 297-304) have suggested that a "shuffle" algorithm described by Luenberger (1977) be used for this purpose.) In eq. (5.2) let z_t be an n-dimensional vector and let u_t be an m-dimensional vector of stochastic disturbances. The matrix A is n×n and the matrix D is n×m.

We describe the solution for the case of unanticipated temporary shocks: u_t = ε_t, where ε_t is a serially uncorrelated vector with zero mean. Alternative assumptions about u_t can be handled by the methods discussed in Section 2.2. The solution for z_t can be written in the general form

$$z_t = \sum_{i=0}^{\infty}\Gamma_i\varepsilon_{t-i}, \tag{5.3}$$

where the Γ_i are n×m matrices of unknown coefficients. Substituting (5.3) into (5.2) we get

$$\Gamma_1 = A\Gamma_0 + D, \qquad \Gamma_{i+1} = A\Gamma_i, \qquad i = 1, 2, \ldots. \tag{5.4}$$

Note that these matrix difference equations hold for each column of Γ_i separately; that is,

$$\gamma_1 = A\gamma_0 + d, \qquad \gamma_{i+1} = A\gamma_i, \qquad i = 1, 2, \ldots, \tag{5.5}$$

where γ_i is any one of the n×1 column vectors of Γ_i and where d is the corresponding column of D. Eq. (5.5) is a deterministic first-order vector difference equation analogous to the stochastic difference equation in (5.2). The solution for the Γ_i is obtained by solving for each of the columns of Γ_i separately using (5.5).

The analogy with the 2-dimensional case is now clear. There are n equations in (5.5). In a given application we will know some of the elements of γ_0, but not all of them. Hence, there will generally be more than n unknowns in (5.5). The number of unknowns is 2n−k, where k is the number of values of γ_0 which we know. For example, in the simple bivariate case of Section 2 where n = 2, we know that the second element of γ_0 equals 0. Thus, k = 1 and there are 3 unknowns and 2 equations.
Ch. 34: Stabilization Policy in Macroeconomic Fluctuations 2047

To get a unique solution in the general case, we therefore need (2n - k)- n = n
- k additional equations. These additional equations can be obtained by requir-
ing that the solution for y, be stationary or equivalently in this context that the yi
do not explode. If there are exactly n - k distinct roots of A which are greater
than one in modulus, then the saddle point manifold will give exactly the number
of additional equations necessary for a solution. The solution will be unique. If
there are less than n - k roots then we have the same nonuniqueness problem
discussed in Section 2.
Suppose this root condition for uniqueness is satisfied. Let the n - k roots of A
that are greater than one in modulus be λ_1, ..., λ_{n-k}. Diagonalize A as
HAH⁻¹ = Λ. Then

Hγ_{i+1} = ΛHγ_i,   i = 1, 2, ...,   (5.6)

where Λ_1 is a diagonal matrix with all the unstable roots on the diagonal. The γ
vectors are partitioned accordingly and the rows (H_11, H_12) of H are the
characteristic vectors associated with the unstable roots. Thus, for stability we require

H_11 γ_1^(1) + H_12 γ_1^(2) = 0.   (5.8)

These n - k equations define the saddle point manifold and are the additional
n - k equations needed for a solution. Having solved for γ_1 and the unknown
elements of γ_0 we then obtain the remaining γ_i coefficients from

γ_i^(1) = -H_11⁻¹ H_12 γ_i^(2),   i = 2, 3, ...,   (5.9)

γ_{i+1}^(2) = Λ_2 γ_i^(2),   i = 1, 2, ....   (5.10)
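
As a concrete numerical illustration of this procedure, the following sketch (Python with NumPy) works through a hypothetical 2 × 2 case in the spirit of the bivariate example: n = 2, the second element of γ_0 is known to be zero (k = 1), and A is chosen to have exactly one root outside the unit circle. The matrix A, the column d, and all numbers are invented for illustration only.

import numpy as np

# Hypothetical example of the Section 5.1 solution for one column of D:
# n = 2, k = 1 (second element of gamma_0 known to be zero), so exactly
# n - k = 1 unstable root is required for uniqueness.
A = np.array([[1.5, 0.2],
              [0.1, 0.8]])
d = np.array([1.0, 0.0])

lam, V = np.linalg.eig(A)          # A = V diag(lam) V^{-1}
H = np.linalg.inv(V)               # rows of H are left characteristic vectors
unstable = np.abs(lam) > 1.0
assert unstable.sum() == 1         # root condition for uniqueness

# Stability (eq. (5.8)) requires the unstable rows of H to annihilate
# gamma_1 = A gamma_0 + d, where gamma_0 = (g, 0)'.  Solve for g:
H_u = H[unstable]
g = -(H_u @ d)[0] / (H_u @ A[:, 0])[0]

gamma = [np.array([g, 0.0])]
gamma.append(A @ gamma[0] + d)     # gamma_1 = A gamma_0 + d (eq. (5.5))
for i in range(4):
    gamma.append(A @ gamma[-1])    # gamma_{i+1} = A gamma_i, i >= 1
print(np.round(np.array(gamma), 4))

With these numbers the impulse coefficients decay geometrically at the stable root, confirming that the saddle point condition has placed the solution on the stable manifold.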

5.2. Higher order vector models

Alternatively the solution of (5.1) can be obtained directly without forming a
large first order system. This method is essentially a generalization of the scalar
method used in Section 2.4. Very briefly, by substituting the general solution of y_t
into (5.1) and examining the equation in the Γ_j coefficients, the solution can be
obtained by factoring the characteristic polynomial associated with these equations.
This approach has been used by Hansen and Sargent (1981) in an optimal
control example where p = q and B_i = λA_i′. In that case, the factorization can be
shown to be unique by an appeal to the factorization theorems for spectral
density matrices. A similar result was used in Taylor (1980a) in the scalar case by
factoring spectral density functions.
In general econometric applications, these special properties on the Ai and Bi
matrices do not hold. Whiteman (1983) has a proof that a unique factorization
exists under conditions analogous to those placed on the roots of the model in
Section 5.1. Dagli and Taylor (1985) have investigated an iterative method to
factor the polynomials in the lag operator in order to obtain a solution. This
factorization method was used by Rehm (1982) to estimate a 7-equation rational
expectations model of the U.S. using full information maximum likelihood.

6. Techniques for nonlinear models

As yet there has been relatively little research with nonlinear rational expectations
models. The research that does exist has been concerned more with solution and
policy evaluation than with estimation. Fair and Taylor (1983) have
investigated a full-information estimation method for a non-linear model based
on a solution procedure described below. However, this method is extremely
expensive to use given current computer technology. Hansen and Singleton (1982)
have developed and applied a limited-information estimator for nonlinear models.
There are a number of alternative solution procedures for nonlinear models
that have been investigated in the literature. They generally focus on deterministic
models, but can be used for stochastic analysis by stochastic simulation tech-
niques.
Three methods are reviewed here: (1) a "multiple shooting" method, adapted
for rational expectations models from two-point boundary value problems in the
differential equation literature by Lipton, Poterba, Sachs, and Summers (1982);
(2) an "extended path" method based on an iterative Gauss-Seidel algorithm
examined by Fair and Taylor (1983); and (3) a nonlinear stable manifold method
examined by Bona and Grossman (1983). This is an area where there is likely to
be much research in the future.
A general nonlinear rational expectations model can be written

f_i(y_t, y_{t-1}, ..., y_{t-q}, E_t y_{t+1}, ..., E_t y_{t+p}, x_t, α_i) = u_{it},   (6.1)

for i = 1, ..., n, where y_t is an n-dimensional vector of endogenous variables at
time t, x_t is a vector of exogenous variables, α_i is a vector of parameters, and u_{it}
is a disturbance term. In some write-ups (e.g. Fair-Taylor) the viewpoint
date on the expectations in (6.1) is based on information through period t - 1
rather than through period t. For continuity with the rest of this paper, we
continue to assume that the information is through period t, but the methods can
easily be adjusted for different viewpoint dates. We also distinguish between
exogenous variables and disturbances, because some of the nonlinear algorithms
can be based on known future values of x, rather than on forecasts of these from
a model like (2.2).

6.1. Multiple shooting method

We described the shooting method to solve linear rational expectations models in
Section 2.5. This approach is quite useful in nonlinear models. The initial
conditions are the values for the lagged dependent variables and the final
conditions are given by the long-run equilibrium of the system. In this case, a
system of nonlinear equations must be solved using an iterative scheme such as
Newton's method. One difficulty with this technique is that (6.1) is explosive when
solved forward so that very small deviations of the endogenous variables from the
solution can lead to very large final values. If this is a problem then the shooting
method can be broken up into a series of shootings (multiple shooting) over
intervals smaller than (0, j). For example three intervals would be (0, j_1), (j_1, j_2)
and (j_2, j) for 0 < j_1 < j_2 < j. In effect the relationship between the final values
and the initial values is broken up into a relationship between intermediate values
of these variables. The intervals can be made arbitrarily small. This approach has
been used by Summers (1981) and others to solve rational expectations models of
investment and in a number of other applications. It seems to work very well.
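
The mechanics can be made concrete with a minimal sketch, given here for a toy saddle-path difference equation. The model, the breakpoints, and all numbers are assumptions made purely for illustration, and convergence of the equation solver from an arbitrary starting guess is not guaranteed in general.

import numpy as np
from scipy.optimize import fsolve

# Toy model (assumed): y_{t+1} = 2*tanh(y_t) - 0.5*y_{t-1}.  Its linearization
# at the equilibrium y* = 0 has roots 1.71 and 0.29, so forward iteration is
# explosive unless y_1 is chosen on the stable path.
y0, breaks = 0.5, [0, 10, 20, 30]      # initial condition, interval endpoints

def shoot(state, steps):
    # Iterate the difference equation forward over one shooting interval.
    y_prev, y_curr = state
    for _ in range(steps):
        y_prev, y_curr = y_curr, 2.0 * np.tanh(y_curr) - 0.5 * y_prev
    return y_prev, y_curr

def residuals(u):
    # Unknowns: y_1 plus the state (y_{s-1}, y_s) at each interior breakpoint.
    states = u[1:].reshape(-1, 2)
    res, state = [], (y0, u[0])
    for m in range(len(breaks) - 1):
        end = shoot(state, breaks[m + 1] - breaks[m])
        if m < len(states):            # continuity at the interior breakpoints
            res += [end[0] - states[m, 0], end[1] - states[m, 1]]
            state = (states[m, 0], states[m, 1])
        else:                          # final condition: long-run equilibrium
            res += [end[1] - 0.0]
    return res

u = fsolve(residuals, np.zeros(1 + 2 * (len(breaks) - 2)))
print("y_1 on the stable path:", round(float(u[0]), 4))

Splitting the horizon at the interior breakpoints keeps each forward iteration short, which is precisely what prevents the explosive root from swamping the Newton-type iterations.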

6.2. Extended path method

This approach has been examined by Fair and Taylor (1983) and used to solve
large-scale nonlinear models. Briefly it works as follows. Guess values for the
E_t y_{t+j} in eq. (6.1) for j = 1, ..., J. Use these values to solve the model to obtain a
new path for y_{t+j}. Replace the initial guess with the new solution and repeat the
process until the path y_{t+j}, j = 1, ..., J, converges, or changes by less than some
tolerance range. Finally, extend the path from J to J + 1 and repeat the previous
sequence of iterations. If the values of y_{t+j} on this extended path are within the
tolerance range of the values for J, then stop; otherwise extend the path one
more period to J + 2 and so on. Since the model is nonlinear, the Gauss-Seidel
method is used to solve (6.1) for each iteration given a guess for y_{t+j}. There are no
general proofs available to show that this method works for an arbitrary nonlinear
model. When applied to the linear model in Section 2.1 with |α| < 1 the
method is shown to converge in Fair and Taylor (1983). When |α| > 1, the
iterations diverge. A convergence proof for the general linear model is not yet
available, but many experiments have indicated that convergence is achieved
under the usual saddle path assumptions. This method is expensive but is fairly
easy to use. An empirical application of the method to a modified version of the
Fair model is found in Fair and Taylor (1983) and to a system with time varying
parameters in Taylor (1983). Carlozzi and Taylor (1984) have used the method to
calculate stochastic equilibria. This method also appears to work well.
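
A compact sketch of the iteration follows, written for the scalar linear model y_t = αE_t y_{t+1} + βy_{t-1} + u_t; this functional form and all parameter values are assumed here only to keep the example small.

import numpy as np

alpha, beta = 0.5, 0.3
y_lag, shock = 1.0, 0.2            # y_{t-1} and a one-period disturbance

def solve_path(J, tol=1e-8, max_iter=1000):
    # Gauss-Seidel sweeps over a guessed path y_t, ..., y_{t+J}, with the
    # expectation beyond the path set to the steady state (zero here).
    path = np.zeros(J + 1)
    for _ in range(max_iter):
        new = path.copy()
        for j in range(J + 1):
            lead = new[j + 1] if j + 1 <= J else 0.0
            lag = y_lag if j == 0 else new[j - 1]
            new[j] = alpha * lead + beta * lag + (shock if j == 0 else 0.0)
        if np.max(np.abs(new - path)) < tol:
            return new
        path = new
    return path

# Extend the path until the period-t solution is insensitive to J.
J, prev = 5, None
while True:
    y_t = solve_path(J)[0]
    if prev is not None and abs(y_t - prev) < 1e-6:
        break
    prev, J = y_t, J + 1
print("converged with J =", J, "and y_t =", round(float(y_t), 6))

In this linear case both the Gauss-Seidel sweeps and the path extensions converge quickly; in the nonlinear applications cited above the same logic is applied equation by equation within each sweep.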

6.3. Nonlinear saddle path manifold method

In Section 2.4 we noted that the solution of the second-order linear difference
eq. (2.88) is achieved by placing the solution on the stable path associated with
the saddle point line. For nonlinear models one can use the same approach after
linearizing the system. The saddle point manifold is then linear. Such a
linearization, however, can only yield a local approximation.
Bona and Grossman (1983) have experimented with a method that computes a
nonlinear saddle-point path. Consider a deterministic univariate second-order
version of (6.1):

f(y_{t+1}, y_t, y_{t-1}) = 0,   t = 1, 2, ....   (6.2)

A solution will be of the form

y_t = g(y_{t-1}),   (6.3)

where we have one initial condition y_0. Note that eq. (6.2) is a nonlinear version
of the homogeneous part of eq. (2.88) and eq. (6.3) is a nonlinear version of the
saddle path dynamics (2.91).
Bona and Grossman (1983) compute g(·) by a series of successive approxima-
tions. If eq. (6.3) is to hold for all values of the argument of g then

f(g(g(x)), g(x), x) = 0   (6.4)

must hold for every value of x (at least within the range of interest). In the
application considered by Bona and Grossman (1983) there is a natural way to
write (6.4) as

g(x) = h(g(g(x)), g(x), x)   (6.5)


for some function h(·). For a given x eq. (6.5) may be solved using successive
approximations:

g_{n+1}(x) = h(g_n(g_n(x)), g_n(x), x),   n = 0, 1, 2, ....   (6.6)
The initial function g_0(x) can be chosen to equal the linear stable manifold
associated with the linear approximation of f(·) at x.
Since this sequence of successive approximations must be made at every x,
there are two alternative ways to proceed. One can make the calculations
recursively for each point y_t of interest; that is, obtain a function g for x = y_0, a
new function for x = y_1, and so on. Alternatively, one could evaluate g over a grid
of the entire range of possible values of x, and form a "meta function" g which is
piecewise linear and formed by linear interpolation for the value of x between the
grid points. Bona and Grossman (1983) use the first procedure to numerically
solve a macroeconomic model of the form (6.2).
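
The grid version of the iteration can be sketched in a few lines. The toy model used here is assumed purely for illustration: f(y_{t+1}, y_t, y_{t-1}) = y_{t+1} - 2 tanh(y_t) + 0.5 y_{t-1} = 0, for which solving f = 0 for y_t gives a natural h(·) as in eq. (6.5).

import numpy as np

# For the assumed model, (6.5) reads g(x) = arctanh((g(g(x)) + 0.5*x)/2).
grid = np.linspace(-0.8, 0.8, 81)
g = 0.293 * grid     # g_0: the linear stable manifold at the equilibrium 0

for n in range(200):
    gg = np.interp(g, grid, g)                  # evaluate g at the points g(x)
    g_new = np.arctanh((gg + 0.5 * grid) / 2.0)
    if np.max(np.abs(g_new - g)) < 1e-12:
        break
    g = g_new

# The "meta function" is piecewise linear between the grid points.
print("iterations:", n, " g(0.5) =", round(float(np.interp(0.5, grid, g)), 5))

Starting from the linear stable manifold, the successive approximations add the nonlinear curvature of the saddle path; only a handful of iterations are needed on this example.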
It is helpful to note that when applied to linear models the method reduces to a
type of undetermined coefficients method used by Lucas (1975) and McCallum
(1983) to solve rational expectations models (a different method of undetermined
coefficients than that applied to the linear process (2.4) in Section 2 above). To see
this, substitute a linear function y_t = gy_{t-1} into

y_{t+1} = (1/α)y_t - (β/α)y_{t-1},   (6.7)

the deterministic difference equation already considered in eq. (2.88). The result-
ing equation is

(g² - (1/α)g + β/α) y_{t-1} = 0.   (6.8)

Setting the term in parentheses equal to zero yields the characteristic polynomial
of (6.7) which appears in eq. (2.89). Under the usual assumption that one root is
inside and one root is outside the unit circle a unique stable value of g is found
and is equal to the stable root λ_1 of (2.89).
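
For instance, with the illustrative parameter values α = 0.5 and β = 0.25 (invented for this example), the quadratic in (6.8) can be solved directly:

import numpy as np

alpha, beta = 0.5, 0.25                 # invented illustrative values
roots = np.roots([1.0, -1.0 / alpha, beta / alpha])   # g^2 - (1/a)g + (b/a)
g = roots[np.abs(roots) < 1.0][0]       # the unique stable value of g
print("roots:", np.round(roots, 4), " stable g =", round(float(g), 4))

One root (about 1.71) lies outside the unit circle and one (about 0.29) inside, so the stable value of g is unique, as asserted.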

7. Concluding remarks

As its title suggests, the aim of this chapter has been to review and tie together in
an expository way the extensive volume of recent research on econometric
techniques for macroeconomic policy evaluation. The table of contents gives a
good summary of the subjects that I have chosen to review. In conclusion it is
perhaps useful to point out in what ways the title is either overly inclusive or not
inclusive enough relative to the subjects actually reviewed.
All of the methods reviewed (estimation, solution, testing, optimization)
involve the rational expectations assumption. In fact the title would somewhat
more accurately identify the methods reviewed if the word "new" were replaced
by “rational expectations”. Some other new econometric techniques not reviewed
here that have macroeconomic policy applications include the multivariate time
series methods (vector auto-regressions, causality, exogeneity) reviewed by Geweke
(1984) in Volume II of the Handbook of Econometrics, the control theory methods
reviewed by Kendrick (1981) in Volume I of the Handbook of Mathematical
Economics, and the prediction methods reviewed by Fair (1986) in this volume.
On the other hand some of the estimation and testing techniques reviewed here
were designed for other applications even though they have proven useful for
policy.
Some of the topics included were touched on only briefly. In particular the
short treatment of limited information estimation techniques, time inconsistency,
and stochastic general equilibrium models with optimizing agents does not do
justice to the large volume of research in these areas.
Most of the research reviewed here is currently very active and the techniques
are still being developed. (Many of the papers in the bibliography were
published between the time I agreed to write the review in 1979 and the period in
1984 when I wrote it.) The development of computationally tractable ways to deal
with large and in particular non-linear models is an important area that needs
more work. But in my view the most useful direction for future research in this
area will be in the applications of the techniques that have already been
developed to practical policy problems.

References
Anderson, Gary and George Moore (1984) "An Efficient Procedure for Solving Linear Perfect
Foresight Models". Board of Governors of the Federal Reserve System, unpublished manuscript.
Anderson, T. W. (1971) The Statistical Analysis of Time Series. New York: Wiley.
Baumol, W. J. (1970) Economic Dynamics: An Introduction, 3rd ed. New York: Macmillan.
Birkhoff, Garrett and G. C. Rota (1962) Ordinary Differential Equations. Waltham: Blaisdell, 2nd
Edition.
Blanchard, Olivier J. (1979) "Backward and Forward Solutions for Economies with Rational Expecta-
tions", American Economic Review, 69, 114-118.
Blanchard, Olivier J. (1982) "Identification in Dynamic Linear Models with Rational Expectations".
Technical Paper No. 24, National Bureau of Economic Research.
Blanchard, Olivier and Charles Kahn (1980) "The Solution of Linear Difference Models under
Rational Expectations", Econometrica, 48, 1305-1311.
Blanchard, Olivier and Mark Watson (1982) "Rational Expectations, Bubbles and Financial Markets",
in: P. Wachtel, ed., Crises in the Economic and Financial Structure. Lexington: Lexington Books.
Bona, Jerry and Sanford Grossman (1983) "Price and Interest Rate Dynamics in a Transactions Based
Model of Money Demand". University of Chicago, unpublished paper.
Buiter, Willem H. and Marcus Miller (1983) "Real Exchange Rate Overshooting and the Output Cost
of Bringing Down Inflation: Some Further Results", in: J. A. Frenkel, ed., Exchange Rates and
International Macroeconomics. Chicago: University of Chicago Press for National Bureau of
Economic Research.
Cagan, Phillip (1956) "The Monetary Dynamics of Hyperinflation", in: M. Friedman, ed., Studies in
the Quantity Theory of Money. Chicago: University of Chicago Press.
Calvo, Guillermo (1978) "On The Time Consistency of Optimal Policy in a Monetary Economy",
Econometrica, 46, 1411-1428.
Calvo, Guillermo (1980) “Tax-Financed Government Spending in a Neoclassical Model with Sticky
Wages and Rational Expectations”, Journal of Economic Dynamics and Control, 2, 61-78.
Carlozzi, Nicholas and John B. Taylor (1984) “International Capital Mobility and the Coordination of
Monetary Rules”, in: J. Bandhari, ed., Exchange Rate Management under Uncertainty. MIT Press,
forthcoming.
Chow, G. C. (1983) Econometrics. New York: McGraw Hill.
Chow, Gregory and Philip J. Reny (1984) “On Two Methods for Solving and Estimating Linear
Simultaneous Equations with Rational Expectations”. Princeton University, unpublished paper.
Christiano, Lawrence J. (1984) "Can Automatic Stabilizers be Destabilizing: An Old Question
Revisited", Carnegie-Rochester Conference Series on Public Policy, 20, 147-206.
Cumby, Robert E., John Huizinga and Maurice Obstfeld (1983) “Two-Step Two-Stage Least Squares
Estimation in Models with Rational Expectations”, Journal of Econometrics, 21, 333-355.
Dagli, C. Ates and John B. Taylor (1985) “Estimation and Solution of Linear Rational Expectations
Models Using a Polynomial Matrix Factorization”, Journal of Economic Dynamics and Control,
forthcoming.
Dhrymes, Phoebus J. (1971) Distributed Lags: Problems of Estimation and Formulation. San Francisco:
Holden-Day.
Dixit, Avinash (1980) “A Solution Technique for Rational Expectations Models with Applications to
Exchange Rate and Interest Rate Determination”. Princeton University, unpublished paper.
Dornbusch, Rudiger (1976) "Expectations and Exchange Rate Dynamics", Journal of Political
Economy, 84, 1161-1176.
Epple, Dennis, Lars P. Hansen and William Roberds (1983) "Linear Quadratic Games of Resource
Depletion”, in: Thomas J. Sargent, ed., Energy, Foresight, and Strategy. Washington: Resources for
the Future.
Evans, George and Seppo Honkapohja (1984) “A Complete Characterization of ARMA Solutions to
Linear Rational Expectations Models”. Technical Report No. 439, Institute for Mathematical
Studies in the Social Sciences, Stanford University.
Fair, Ray (1986) “Evaluating the Predictive Accuracy of Models”, in: Z. Griliches and M. Intriligator,
eds., Handbook of Econometrics. Amsterdam: North-Holland, Vol. III.
Fair, Ray and John B. Taylor (1983) “Solution and Maximum Likelihood Estimation of Dynamic
Nonlinear Rational Expectations Models”, Econometrica, 51, 1169-1185.
Fischer, Stanley (1979) “Anticipations and the Nonneutrality of Money”, Journal of Political
Economy, 87, 225-252.
Flood, R. P. and P. M. Garber (1980) “Market Fundamentals versus Price-Level Bubbles: The First
Tests”, Journal of Political Economy, 88, 745-770.
Futia, Carl A. (1981) “Rational Expectations in Stationary Linear Models”, Econometrica, 49,
171-192.
Geweke, John (1984) “Inference and Causality in Economic Time Series Models”, in: Z. Griliches and
M. Intriligator, eds., Handbook of Econometrics. Amsterdam: North-Holland, Vol. II.
Gourieroux, C., J. J. Laffont and A. Monfort (1982) “Rational Expectations in Linear Models:
Analysis of Solutions", Econometrica, 50, 409-425.
Hansen, Lars P. (1982) "Large Sample Properties of Generalized Method of Moments Estimators",
Econometrica, 50, 1029-1054.
Hansen, Lars P. and Thomas J. Sargent (1980) "Formulating and Estimating Dynamic Linear
Rational Expectations Models", Journal of Economic Dynamics and Control, 2, 7-46.
Hansen, Lars P. and Thomas J. Sargent (1981) "Linear Rational Expectations Models for Dynami-
cally Interrelated Variables", in: R. E. Lucas and T. J. Sargent, eds., Rational Expectations and
Econometric Practice. Minneapolis: University of Minnesota Press.
Hansen, L. P. and K. Singleton (1982) "Generalized Instrumental Variables Estimation of Nonlinear
Rational Expectations Models", Econometrica, 50, 1269-1286.
Harvey, Andrew C. (1981) Time Series Models. New York: Halsted Press.
Hayashi, Fumio and Christopher Sims (1983) “Nearly Efficient Estimation of Time Series Models
with Predetermined, but not Exogenous, Instruments”, Econometrica, 51, 783-798.
Kendrick, David (1981) "Control Theory with Applications to Economics", in: K. Arrow and M.
Intriligator, eds., Handbook of Mathematical Economics. Amsterdam: North-Holland, Vol. I.
Kouri, Pentti J. K. (1976) “The Exchange Rate and the Balance of Payments in the Short Run and in
the Long Run: A Monetary Approach”, Scandinavian Journal of Economics, 78, 280-304.
Kydland, Finn E. (1975) “Noncooperative and Dominant Player Solutions in Discrete Dynamic
Games”, International Economic Review, 16, 321-335.
Kydland, Finn and Edward C. Prescott (1977) "Rules Rather Than Discretion: The Inconsistency of
Optimal Plans", Journal of Political Economy, 85, 473-491.
Kydland, Finn and Edward C. Prescott (1982) "Time to Build and Aggregate Fluctuations",
Econometrica, 50, 1345-1370.
Lipton, David, James Poterba, Jeffrey Sachs and Lawrence Summers (1982) "Multiple Shooting in
Rational Expectations Models", Econometrica, 50, 1329-1333.
Lucas, Robert E. Jr. (1975) "An Equilibrium Model of the Business Cycle", Journal of Political
Economy, 83, 1113-1144.
Lucas, Robert E. Jr. (1976) "Econometric Policy Evaluation: A Critique", in: K. Brunner and A. H.
Meltzer, eds., Carnegie-Rochester Conference Series on Public Policy. Amsterdam: North-Holland,
19-46.
Lucas, Robert E. Jr. and Thomas J. Sargent (1981) "Introduction", to their Rational Expectations and
Econometric Practice. Minneapolis: University of Minnesota Press.
Luenberger, David G. (1977) "Dynamic Equations in Descriptor Form", IEEE Transactions on
Automatic Control, AC-22, 312-321.
Marschak, Jacob (1953) "Economic Measurements for Policy and Prediction", in: W. C. Hood and
T. C. Koopmans, eds., Studies in Econometric Method. Cowles Foundation Monograph 14, New
Haven: Yale University Press.
McCallum, Bennett T. (1976) “Rational Expectations and the Natural Rate Hypothesis: Some
Consistent Estimates”, Econometrica, 44, 43-52.
McCallum, Bennett T. (1979) "Topics Concerning the Formulation, Estimation, and Use of Macro-
econometric Models with Rational Expectations", American Statistical Association, Proceedings of
the Business and Economics Section, 65-72.
McCallum, Bennett T. (1983) “On Non-Uniqueness in Rational Expectations: An Attempt at
Perspective”, Journal of Monetary Economics, 11, 139-168.
Mishkin, Frederic S. (1983) A Rational Expectations Approach to Macroeconometrics: Testing Policy
Ineffectiveness and Efficient-Markets Models. Chicago: University of Chicago Press.
Muth, John F. (1961) "Rational Expectations and The Theory of Price Movements", Econometrica,
29, 315-335.
Muth, John F. (1981) "Estimation of Economic Relationships Containing Latent Expectations
Variables", reprinted in: R. E. Lucas and T. J. Sargent, eds., Rational Expectations and Econometric
Practice. Minneapolis: University of Minnesota Press.
Papell, David (1984) "Anticipated and Unanticipated Disturbances: The Dynamics of The Exchange
Rate and The Current Account", Journal of International Money and Finance, forthcoming.
Preston, A. J. and A. R. Pagan (1982) The Theory of Economic Policy. Cambridge: Cambridge
University Press.
Rehm, Dawn (1982) Staggered Contracts, Capital Flows, and Macroeconomic Stability in The Open
Economy. Ph.D. Dissertation, Columbia University.
Rodriguez, Carlos A. (1980) "The Role of Trade Flows in Exchange Rate Determination: A Rational
Expectations Approach", Journal of Political Economy, 88, 1148-1158.
Sargent, Thomas J. (1979) Macroeconomic Theory. New York: Academic Press.
Sargent, Thomas J. and Neil Wallace (1973) "Rational Expectations and The Dynamics of Hyperinfla-
tion", International Economic Review, 14, 328-350.
Sargent, Thomas J. and Neil Wallace (1975) "'Rational' Expectations, The Optimal Monetary
Instrument, and The Optimal Money Supply Rule", Journal of Political Economy, 83, 241-254.
Summers, Lawrence H. (1981) "Taxation and Corporate Investment: A q-Theory Approach",
Brookings Papers on Economic Activity, 1, 67-127.
Taylor, John B. (1977) “Conditions for Unique Solutions in Stochastic Macroeconomic Models with
Rational Expectations”, Econometrica, 45, 1377-1385.
Taylor, John B. (1979) “Estimation and Control of a Macroeconomic Model with Rational Expecta-
tions”, Econometrica, 47, 1267-1286.
Taylor, John B. (1980a) "Aggregate Dynamics and Staggered Contracts", Journal of Political
Economy, 88, 1-23.
Taylor, John B. (1980b) "Output and Price Stability: An International Comparison", Journal of
Economic Dynamics and Control, 2, 109-132.
Taylor, John B. (1982) "The Swedish Investment Fund System as a Stabilization Policy Rule",
Brookings Papers on Economic Activity, 1, 57-99.
Taylor, John B. (1983) "Union Wage Settlements During a Disinflation", American Economic Review,
73, 981-993.
Wallis, Kenneth F. (1980) "Econometric Implications of The Rational Expectations Hypothesis",
Econometrica, 48, 49-73.
Whiteman, Charles H. (1983) Linear Rational Expectations Models: A User's Guide. Minneapolis:
University of Minnesota Press.
Wickens, M. (1982) “The Efficient Estimation of Econometric Models with Rational Expectations”,
Review of Economic Studies, 49, 55-68.
Wilson, Charles (1979) “Anticipated Shocks and Exchange Rate Dynamics”, Journal of Political
Economy, 87, 639-647.
Chapter 35

ECONOMIC POLICY FORMATION: THEORY AND
IMPLEMENTATION (APPLIED ECONOMETRICS IN
THE PUBLIC SECTOR)

LAWRENCE R. KLEIN

University of Pennsylvania

Contents

1. Some contemporary policy issues 2058
2. Formal political economy 2063
3. Some policy projections 2070
4. The theory of economic policy 2072
5. Prospects 2087
Appendix: An outline of a combined (Keynes-Leontief)
input-output/macro model 2088

Handbook of Econometrics, Volume III, Edited by Z. Griliches and M.D. Intriligator
© Elsevier Science Publishers BV, 1986
1. Some contemporary policy issues

Mainstream economic policy, known basically as demand management, and its
econometric implementation are jointly under debate now. The main criticism
comes from monetarists, who focus on versions of the quantity theory of money,
from advocates of the theory of rational expectations, and more recently from
supply side economists. All these criticisms will be considered in this paper, as
well as the criticisms of public policy makers, who are always looking for
precision in their choice procedures, even when the subject matter is inherently
stochastic and relatively “noisy”.
Demand management is usually identified as Keynesian economic policy, i.e. as
the type that is inspired by the aggregative Keynesian model of effective demand.
Also, the mainstream econometric models are called Keynesian type models; so
the present state of world wide stagflation is frequently attributed to the use of
Keynesian econometric models for the implementation of Keynesian policies.
These are popular and not scientific views. In this presentation the objective
will be to put policy measures in a more general perspective, only some of which
are purely demand management and aggregative. Also, the evolution of econo-
metric models for policy application to many supply-side characteristics will be
stressed. To a certain extent, the orientation will be towards experience derived
from the application of U.S. models to U.S. economic policy, but the issues and
methods to be discussed will be more general.
For purposes of exposition, two types of policy will be examined, (1) overall
macro policies, and (2) specific structural policies. Macro policies refer to tradi-
tional monetary and fiscal policies, principally of central governments, but the
model applications to local government policies are also relevant. As the world
economy becomes more interdependent, more economies are recognizing their
openness; therefore, trade/payments policies are also part of the complement
known as macro policy.
By structural policy I mean policies that are aimed at specific segments of the
economy, specific groups of people, specific production sectors, distributions of
aggregative magnitudes or markets. Economists like to focus on macro policies
because they have overall impacts and leave the distributive market process
unaffected, able to do its seemingly efficient work. Most economists look upon the
free competitive market as an ideal and do not want to make specific policies that
interfere with its smooth working. They may, however, want to intervene with
structural policy in order to preserve or guarantee the working of the idealized
market process.
Macro policies are quite familiar. Monetary policy is carried out by the central
bank and sometimes with a treasury ministry. Also, the legislative branch of
democratic governments influences or shapes monetary policy. Central executive
offices of government also participate in the formation of monetary policy. It is a
many sided policy activity. The principal policy instruments are bank reserves
and discount rates. Reserves may be controlled through open market operations
or the setting of reserve requirements. Policies directed at the instrument levels
have as objectives specified time paths of monetary aggregates or interest rates.
At the present time, there is a great deal of interest in controlling monetary
aggregates through control of reserves, but some countries continue to emphasize
interest rate control through the discount window. On the whole, monetary
authorities tend to emphasize one approach or the other; i.e. they try to control
monetary aggregates along monetarist doctrinal lines or they try to control
interest rates through discount policy, but in the spirit of a generalized approach
to economic policy there is no reason why central monetary authorities cannot
have multiple targets through the medium of multiple instruments. This approach
along the lines of modern control theory will be exemplified below.
Monetary policy is of particular importance because it can be changed on short
notice, with little or no legislative delay. It may be favored as a flexible policy but
is often constrained, in an open economy, by the balance of payments position
and the consequent stability of the exchange value of a country’s currency.
Therefore, we might add a third kind of financial target, namely, an exchange
value target. Flexibility is thus restricted in an open economy. Monetary policies
that may seem appropriate for a given domestic situation may be constrained by a
prevalent international situation.
There are many monetary aggregates extending all the way from the monetary
base, to checking accounts, to savings accounts, to liquid money market instru-
ments, to more general credit instruments. The credit instruments may also be
distinguished between private and public sectors of issuance. The plethora of
monetary aggregates has posed problems, both for the implementation of policy
and for the structure of econometric models used in that connection. The various
aggregates all behave differently with respect to reserves and the monetary base.
The authorities may be able to control these latter concepts quite well, but the
targets of interest all react differently. Furthermore, monetary aggregates are not
necessarily being targeted because of their inherent interest but because they are
thought to be related to nominal income aggregates and the general price level.
The more relevant the monetary aggregate for influencing income and the price
level, the more difficult it is to control it through the instruments that the
authorities can effect.
Benjamin Friedman has found, for the United States, that the most relevant
aggregate in the sense of having a stable velocity coefficient is total credit, but this
is the least controllable.1 The most controllable aggregate, currency plus checking
1Benjamin Friedman, "The Relative Stability of Money and Credit 'Velocities' in the United
States: Evidence and Some Speculations", National Bureau of Economic Research, Working Paper No.
645, March 1981.
accounts, has the most variable velocity. Between these extremes it appears that
the further the aggregate is from control, the less variable is its associated velocity.
This is more a problem for the implementation of monetary policy than for the
construction of models.
But a problem for both policy formation and modeling is the recent introduc-
tion of new monetary instruments and technical changes in the operation of credit
markets. Electronic banking, the use of credit cards, the issuance of more
sophisticated securities to the average citizen are all innovations that befuddle the
monetary authorities and the econometrician. Authorities find that new instru-
ments are practically outside their control for protracted periods of time, espe-
cially when they are first introduced. They upset traditional patterns of seasonal
variation and generally enlarge the bands of uncertainty that are associated with
policy measures. They are problematic for econometricians because they establish
new modes of behavior and have little observational experience on which to base
sample estimates.
Side by side with monetary policy goes the conduct of fiscal policy. For many
years, during and after the Great Depression, fiscal policy was central as far as
macro policy was concerned. It was only when interest rates got significantly
above depression floor levels that monetary policy was actively used and shown to
be fairly powerful.
Fiscal policy is usually, but not necessarily, less flexible than monetary policy
because both the legislative and executive branches of government must approve
major changes in public revenues and expenditures. In a parliamentary system, a
government cannot survive unless its fiscal policy is approved by parliament, but
this very process frequently delays effective policy implementation. In a legislative
system of the American type, a lack of agreement may not bring down a
government, but it may seriously delay the implementation of policy. On the
other hand, central banking authorities can intervene in the functioning of
financial markets on a moment’s notice.
On the side of fiscal policy, there are two major kinds of instruments, public
spending and taxing. Although taxing is less flexible than monetary management,
it is considerably more flexible than are many kinds of expenditure policy. In
connection with expenditures, it is useful to distinguish between purchases of
goods or services and transfer payments. The latter are often as flexible as many
kinds of taxation instruments.
It is generally safer to focus on tax instruments and pay somewhat less
attention to expenditure policy. Tax changes have the flexibility of being made
retroactive when desirable. This can be done with some expenditures, but not all.
Tax changes can be made effective right after enactment. Expenditure changes,
for goods or services, especially if they are increases, can be long in the complete
making. Appropriate projects must be designed, approved, and executed. Often it
is difficult to find or construct appropriate large projects.
Tax policy can be spread among several alternatives such as personal direct
taxes (either income or expenditure), business income taxes, or indirect taxes. At
present, much interest attaches to indirect taxes because of their ease of collec-
tion, if increases are being contemplated, or because of their immediate effect on
price indexes, if decreases are in order. Those taxes that are levied by local, as
opposed to national governments, are difficult to include in national economic
analysis because of their diversity of form, status, and amount.
Some tax policies are general, affecting most people or most sectors of the
economy all at once. But specific taxes, in contrast to general ones, are important for
the implementation of structural policies. An expenditure tax focuses on stimulat-
ing personal savings. Special depreciation allowances or investment tax credits
aim at stimulating private fixed capital formation. Special allowances for R&D,
scientific research, or capital gains are advocated as important for helping the
process of entrepreneurial innovation in high technology or venture capital lines.
These structural policies are frequently cited in present discussions of industrial
policy.
A favorite proposal for strictly anti-inflationary policy is the linkage of tax
changes, either as rewards (cuts) or penalties (increases), to compliance by
businesses and households with prescribed wage/price guidelines. Few have ever
been successfully applied on a broad continuing scale, but this approach, known
as incomes policies, social contracts, or TIPS (tax based incomes policies), is
widely discussed in the scholarly literature.
These monetary and fiscal policies are the conventional macro instruments of
overall policies. They are important and powerful; they must be included in any
government’s policy spectrum, but are they adequate to deal with the challenge of
contemporary problems? Do they deal effectively with such problems as:

-severe unemployment among certain designated demographic groups;
-delivery of energy;
-conservation of energy;
-protection of the environment;
-public health and safety;
-provision of adequate agricultural supply;
-maintenance of a healthy trade balance?

Structural policies, as distinct from macro policies, seem to be called for in order
to deal effectively with these specific issues.
If these are the kinds of problems that economic policy makers face, it is
worthwhile considering the kinds of policy decisions with instruments that have
to be used in order to address these issues appropriately, and consider the kind of
economic model that would be useful in this connection.
For dealing with youth unemployment and related structural problems in labor
markets, the relevant policies are minimum wage legislation, skill training grants,
and provision of vocational education. These are typical things that ought to be
done to reduce youth unemployment. These policy actions require legislative
support with either executive or legislative initiative.
In the case of energy policy, the requisite actions are concerned with pricing of
fuels, rules for fuel allocation, controls on imports, protection of the terrain
against excessive exploitation. These are specific structural issues and will be
scarcely touched by macro policies. These energy issues also affect the environ-
ment, but there are additional considerations that arise from non-energy sources.
Tax and other punitive measures must be implemented in order to protect the
environment, but, at the same time, monitor the economic costs involved. The
same is true for policies to protect public health and safety. These structural
policies need to be implemented but not without due regard to costs that have
serious inflationary consequences. The whole area of public regulation of enter-
prise is under scrutiny at the present time, not only for the advantages that might
be rendered, but also for the fostering of competition, raising incentives, and
containing cost elements. It is not a standard procedure to consider the associated
inflationary content of regulatory policy.
Ever since the large harvest failures of the first half of the 1970s (1972 and
1975, especially) economists have become aware of the fact that special attention
must be paid to agriculture in order to insure a basic flow of supplies and
moderation in world price movements. Appropriate policies involve acreage
limitations (or expansions), crop subsidies, export licenses, import quotas, and
similar specific measures. They all have bearing on general inflation problems
through the medium of food prices, as components of consumer price indexes,
and of imports on trade balances.
Overall trade policy is mainly guided by the high-minded principle of fostering
of conditions for the achievement of multilateral free trade. This is a macro
concept, on average, and has had recent manifestation in the implementation of
the “Tokyo Round” of tariff reductions, together with pleas for moderation of
non-tariff barriers to trade. Nevertheless, there are many specific breaches of the
principle, and specific protectionist policies are again a matter of concern. Trade
policy, whether it is liberal or protectionist, will actually be implemented through
a set of structural measures. It might mean aggressive marketing in search of
export sales, provision of credit facilities, improved port/storage facilities, and a
whole group of related policy actions that will, in the eyes of each country by
itself, help to preserve or improve its net export position.
We see then that economic policy properly understood in the context of
economic problems of the day goes far beyond the macro setting of tax rates,
overall expenditure levels, or establishing growth rates for some monetary aggre-
gates. It is a complex network of specific measures, decrees, regulations (or their
absence), and recommendations coming from all branches of the public sector. In
many cases they require government coordination. Bureaus, offices, departments,
ministries, head of state, and an untold number of public bodies participate in
this process. It does not look at all like the simple target-instrument approach of
macroeconomics, yet macroeconometric modeling, if pursued at the appropriate
level of detail, does have much to contribute. That will be the subject of sections
of this paper that follow.

2. Formal political economy

The preceding section has just described the issues and actors in a very summary
outline. Let us now examine some of the underlying doctrine. The translation of
economic theory into policy is as old as our subject, but the modern formalism is
conveniently dated from the Keynesian Revolution. Clear distinction should be
made between Keynesian theory and Keynesian policy, but as far as macro policy
is concerned, it derives from Keynesian theory.
The principal thrust of Keynesian theory was that savings-investment balance
at full employment would be achieved through adjustment of the aggregative
activity level of the economy. It was interpreted, at an early stage, in a framework
of interest-inelastic investment and interest-elastic demand for cash. This particu-
lar view and setting gave a secondary role to monetary policy. Direct effects on
the spending or activity stream were most readily achieved through fiscal policy,
either adding or subtracting directly from the flow of activity through public
spending or affecting it indirectly through changes in taxation. Thinking therefore
centered around the achievement of balance in the economy, at full employment,
by the appropriate choice of fiscal measures. In a formal sense, let us consider the
simple model
C = f(Y - T)    consumption function
T = tY          tax function
I = g(ΔY)       investment function
Y = C + I + G   output definition

where

G = public expenditures
Δ = time difference operator
Y = total output (or income, or activity level).
Fiscal policy means the choice of an appropriate value of t (tax rate), or level G
(expenditure), or mixture of both in order to achieve a target level of Y. This
could also be a dynamic policy, by searching for achievement of a target path of
Y through time. To complement dynamic policy it is important to work with a
richer dynamic specification of the economic model. Lag distributions of Y - T or
ΔY in the C and I functions would be appropriate. This kind of thinking inspired
the approach to fiscal policy that began in the 1930’s and still prevails today. It
inspired thoughts about “fine tuning” or “steering” an economic system. It is
obviously terribly simplified. It surely contains grains of truth, but what are the
deficiencies?
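
Before turning to the deficiencies, the target-instrument calculation itself can be made concrete. The sketch below (Python) assumes hypothetical linear forms C = c0 + c1(Y - T) and I = i0 + i1·ΔY, with ΔY treated as predetermined; all numbers are invented for illustration.

# A static target-instrument sketch of the simple model above, assuming
# hypothetical linear forms C = c0 + c1*(Y - T) and I = i0 + i1*dY, with
# dY treated as predetermined.  All numbers are invented.
c0, c1, i0, i1 = 20.0, 0.75, 10.0, 0.5
dY, G, Y_target = 5.0, 50.0, 206.0

# Reduced form: Y = c0 + c1*(1 - t)*Y + i0 + i1*dY + G, so the tax rate
# that delivers Y_target is:
t = 1.0 - (Y_target - c0 - i0 - i1 * dY - G) / (c1 * Y_target)
Y = (c0 + i0 + i1 * dY + G) / (1.0 - c1 * (1.0 - t))   # check the solution
print(f"required tax rate t = {t:.3f}, implied Y = {Y:.1f}")

The same calculation run in reverse, solving for G given t, illustrates the expenditure instrument; a dynamic version would iterate period by period along a target path of Y.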
In the first place, there is no explicit treatment of the price level or inflation rate
in this system. Arguments against Keynesian policy pointed out the inflationary
dangers from the outset. These dangers were minimal during the 1930's and did
not become apparent on a widespread basis for about 30 years, after much
successful application of fiscal policy, based on some monetary policy as time
wore on. There is no doubt, however, that explicit analysis of price formation and
great attention to the inflation problem must be guiding principles for policy
formation from this time forward.
Another argument against literal acceptance of this version of crude
Keynesianism is that it deals with unrealistic, simplistic concepts. Fiscal action is
not directed towards “t ” or “G”. Fiscal action deals with complicated al-
lowances, exemptions, bracket rates, capital gains taxation, value added taxation,
expenditures for military hardware, agricultural subsidies, food stamps, aid to
dependent children, and unemployment insurance benefits. These specific policy
instruments have implications for the broad, general concepts represented by “t ”
and “G “, but results can be quite misleading in making a translation from
realistic to such macro theoretical concepts. The system used here for illustration
is so simplified that there is no distinction between direct and indirect taxes or
between personal and business taxes.
The Keynesian model of income determination can be extended to cover the
pricing mechanism, labor input, labor supply, unemployment, wages, and mone-
tary phenomena. There is a difference, however, between monetary analysis and
monetarism. Just as the simple Keynesian model serves as the background for
doctrinaire Keynesian fiscal policy, there is another polar position, namely, the
monetarist model which goes beyond the thought that money matters, to the
extreme that says that only money matters. The monetarist model has its simplest
and crudest exposition in the following equation of exchange

Mv = Y.

For a steady, parametric, value of v (velocity), there is a linear proportional
correspondence between M (nominal money supply) and Y (nominal value of
aggregate production or income). For every different M-concept, say M_i, we
would have2

M_i v_i = Y.

2See the various concepts in the contribution by Benjamin Friedman, op. cit.
Ch. 35: Economic Policy Formation 2065

A search for a desired subscript i may attach great importance to the correspond-
ing stability of v_i. It is my experience, for example, that in the United States, v_2 is
more stable than v_1.
More sophisticated concepts would be

Mv = Σ_{i=0}^∞ w_i Y_{-i},

or

Mv = (Σ_{i=0}^∞ w_i Y_{-i})^α,

or

M = f(Σ_{i=0}^∞ q_i P_{-i}, Σ_{i=0}^∞ w_i X_{-i}).

The first says that M is proportional to long run Y or a distributed lag in Y. The
second says that M is proportional to a power of long run Y or merely that a
stable relationship exists between long run Y and M. Finally, the third says that
M is a function of long run price as well as long run real income (X). In these
relationships no attention is paid to subscripts for M, because the theory would
be similar (not identical) for any M_i, and proponents of monetarist policy simply
argue that a stable relationship should be found for the authorities for some M_i
concept, and that they should stick to it.
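
As a small numerical illustration of the distributed lag concept, with invented geometric weights, an invented Y history, and an invented velocity:

import numpy as np

# Long-run Y as a normalized geometric distributed lag; all numbers invented.
w, v = 0.8, 4.0
Y_hist = np.array([100.0, 104.0, 108.0, 113.0, 118.0])   # oldest ... newest
weights = w ** np.arange(len(Y_hist))[::-1]              # heaviest on newest
long_run_Y = weights @ Y_hist / weights.sum()
M = long_run_Y / v                                       # Mv = long-run Y
print(f"long-run Y = {long_run_Y:.1f}, implied M = {M:.2f}")

The smoothing is the point: the implied M responds to the whole recent history of Y rather than to its latest value alone.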
The distributed lag relationships in P_{-i} and X_{-i} are evidently significant
generalizations of the crude quantity theory, but in a more general view, the
principal thing that monetarists need for policy implementation of their theory is
a stable demand function for money. If this stable function depends also on
interest rates (in lag distributions), the theory can be only partial, and analysis
then falls back on the kind of mainstream general macroeconometric model used
in applications that are widely criticized by strict monetarists.3
The policy implications of the strict monetarist approach are clear and are,
indeed, put forward as arguments for minimal policy intervention. The propo-
nents are generally against activist fiscal policy except possibly for purposes of
indexing when price movements get out of hand. According to the basic monetarist

3The lack of applicability of the monetarist type relationship, even generalized dynamically, to the
United Kingdom is forcefully demonstrated by D. F. Hendry and N. R. Ericsson, "Assertion without
Empirical Basis: An Econometric Appraisal of Friedman and Schwartz' 'Monetary Trends in the
United Kingdom'", Monetary Trends in the United Kingdom, Bank of England Panel of Academic
Consultants, Panel Paper No. 22 (October 1983), 45-101.
relationship, a rule should be established for the growth rate of M according to
the growth rate of Y, preferably the long run concept of Y. A steady growth of
M, according to this rule, obviates the need for frequent intervention and leaves
the economy to follow natural economic forces. This is a macro rule, in the
extreme, and the monetarists would generally look for the competitive market
economy to make all the necessary micro adjustments without personal interven-
tion.
The theory for the steady growth of M and Y also serves as a theory for
inflation policy, for if the competitive economy maintains long run real income
(Σ w_i X_{-i}) at its full capacity level, not in every period but on average over the
cycle, then steady growth of M implies a steady level for long run price
(Σ q_i P_{-i}). The monetarist rule is actually intended as a policy rule for inflation
control.
There are several lines of argument against this seemingly attractive policy for
minimal intervention except at the most aggregative level, letting the free play of
competitive forces do the main work of guiding the economy in detail. In the first
place there is a real problem in defining M_i, as discussed already in the previous
section. Banking and credit technology is rapidly changing. The various M_i
concepts are presently quite fluid, and there is no clear indication as to which M_i
to attempt to control. To choose the most tractable concept is not necessarily
going to lead to the best economic policy.
Not only are the M_i concepts under debate, but the measurement of any one of
them is quite uncertain. Coverage of reporting banks, the sudden resort to new
sources of funds (Euro-currency markets, e.g.), the attempts to live with inflation,
and other disturbing factors have led to very significant measurement errors,
indicated in part at least by wide swings in data revision of various M_i series. If
the monetary authorities do not know M_i with any great precision, how can they
hit target values with the precision that is assumed by monetarists? It was
previously remarked that policy makers do not actually choose values for "t" and
"G". Similarly, they do not choose values for "M_i". They engage in open market
buying and selling of government securities; they fix reserve requirements for
specific deposits or specific classes of banks; they fix the discount rate and they
make a variety of micro decisions about banking practices. In a fractional reserve
system, there is a money multiplier connecting the reserve base that is controlled
by monetary authorities to M_i, but the multiplier concept is undergoing great
structural change at the present time, and authorities do not seem to be able to hit
M_i targets well.
A fundamental problem with either the Keynesian or the monetarist view of
formal political economy is that they are based on simple models, models that
are useful for expository analysis but inadequate to meet the tasks of economic
policy. These simple models do not give a faithful representation of the economy;
they do not explicitly involve the appropriate levels of action; they do not take
account of enough processes in the economy. Imagine running the economy
according to a strict monetarist rule or fine tuning applications of tax policy in
the face of world shortages in energy markets and failing to take appropriate
action simply because there are no energy parameters or energy processes in the
expository system. This, in fact, is what people from the polar camps have said at
various times in the past few years.
What is the appropriate model, if neither the Keynesian nor the monetarist
models are appropriate? An eclectic view is at the base of this presentation. Some
would argue against eclecticism on a priori grounds as being too diffuse, but it
may be that an eclectic view is necessary in order to get an adequate model
approximation to the complicated modern economy. Energy, agriculture, foreign
trade, exchange rates, the spectrum of prices, the spectrum of interest rates,
demography, and many other things must be taken into account simultaneously.
This cannot be done except through the medium of large scale models. These
systems are far different in scope and method from either of the polar cases. They
have fiscal and monetary sectors, but they have many other sectors and many
other policy options too.
As a general principle, I am arguing against the formulation of economic policy
through the medium of small models-anything fewer than 25 simultaneous
equations. Small models are inherently unable to deal with the demands for
economic policy formation. An appropriate large-scale model can, in my opinion,
be used in the policy process. An adequate system is not likely to be in the
neighborhood of 25 equations, however. It is likely to have more than 100
equations, and many in use today have more than 500-1000 equations. The
actual size will depend on the country, its openness, its data system, its variability,
and other factors. The largest systems in regular use have about 5000 equations,
and there is an upper limit set by manageability.4
It is difficult to present such a large system in compact display, but it is
revealing to lay out its sectors:
Consumer demand
Fixed capital formation
Inventory accumulation
Foreign trade
Public spending on goods and services
Production of goods and services
Labor requirements
Price formation
Wage determination

4The Wharton Quarterly Model, regularly used for short run business cycle analysis, had 1000
equations in 1980, and the medium term Wharton Annual Model had 1595 equations, exclusive of
input-output relationships. The world system of Project LINK has more than 15,000 equations at the
present time, and is still growing.
Labor supply and demography
Income formation
Money supply and credit
Interest rate determination
Tax receipts
Transfer payments
Interindustry production flows

In each of these sectors, there are several subsectors, some by type of product,
some by type of end use, some by age-sex-race, some by country of origin or
destination, some by credit market instrument, and some by level of government.
The production sector may have a complete input-output system embedded in
the model. Systems like these should not be classified as either Keynesian or
monetarist. They are truly eclectic and are better viewed as approximations to the
true but unknown Walrasian structure of the economy. These approximations are
not unique. The whole process of model building is in a state of flux because at
any time when one generational system is being used, another, better approxima-
tion to reality is being prepared. The outline of the equation structure for a
system combining input-output relations with a macro model of income de-
termination and final demand, is given in the appendix.
The next section will deal with the concrete policy making process through the
medium of large scale models actually in use. They do not govern the policy
process on an automatic basis, but they play a definite role. This is what this
presentation is attempting to show.
There is, however, a new school of thought, arguing that economic policy will
not get far in actual application because the smart population will counter public
officials’ policies, thus nullifying their effects. On occasion, this school of thought,
called the rational expectations school, indicate that they think that the use of
macroeconometric models to guide policy is vacuous, but on closer examination
their argument is seen to be directed at any activist policy, whether through the
model medium or not.
The argument, briefly put, of the rational expectations school is that economic
agents (households, firms, and institutions) have the same information about
economic performance as the public authorities and any action by the latter, on
the basis of their information, has already been anticipated and will simply lead to
reaction by economic agents that will nullify the policy initiatives of the
authorities. On occasion, it has been assumed that the hypothetical parameters of
economic models are functions of policy variables and will change in a particular
way when policy variables are changed.5

‘R. Lucas, “Econometric Policy Evaluation: A Critique,” The Phillips Curve und Labor Mm-km,
eds., K. Brunner and A. K. Meltzer. (Amsterdam: North-Holland, 1976), 19-46.
Referring to a linear expression of the consumption function in the simple
Keynesian model, they would assume

C = α + β(Y - T),
β = β(t, G).
This argument seems to me to be highly contrived. It is true that a generalization
of the typical model from fixed to variable parameters appears to be very
promising, but there is little evidence that the generalization should make the
coefficients depend in such a special way on exogenous instrument variables.
The thought that economic models should be written in terms of the agent’s
perceptions of variables on the basis of their interpretation of history is sound.
The earliest model building attempts proceeded from this premise and introduced
lag distributions and various proxies to relate strategic parameter values, to
information at the disposal of both economic agents and public authorities, but
they did not make the blind intellectual jump to the conclusion that perceptions
of the public at large and authorities are the same. It is well known that the
public, at any time, holds widely dispersed views about anticipations for
the economy. Many do not have sophisticated perceptions and do not share the
perceptions of public authorities. Many do not have the qualifications or facilities
to make detailed analysis of latest information or history of the economy.
Econometric models are based on theories and estimates of the way people do
behave, not on the way they ought to behave under the conditions of some
hypothesized decision making rules. In this respect, many models currently in use,
contain data and variables on expressed expectations, i.e. those expected values
that can be ascertained from sample surveys. In an interesting paper dealing with
business price expectations, de Leeuw and McKelvey find that statistical evidence
on expected prices contradicts the hypothesis of rationality, as one might expect.6
The rise of the rational expectations school is associated with an assertion that
the mainstream model, probably meaning the Keynesian model, has failed during
the 1970s. It principally failed because of its inability to cope with a situation in
which there are rising rates of inflation and rising rates of unemployment. In
standard analysis the two ought to be inversely related, but recently they have
been positively related. Charging that macroeconomic models have failed in this
situation, Lucas and Sargent, exponents of the school of rational expectations,
seek an equilibrium business cycle model consisting of optimizing behavior by

6 F. de Leeuw and M. McKelvey, “Price Expectations by Business Firms,” Brookings Papers on Economic Activity (1981), 299-314. The findings in this article have been extended, and they now report that there is evidence in support of long run lack of bias in price expectations, a necessary but not sufficient condition for rationality of price expectations. See “Price Expectations of Business Firms: Bias in the Short and Long Run,” American Economic Review, 74 (March 1984), 99-110.

economic agents and the clearing of markets.7 Many, if not most, macroecono-
metric models are constructed piece-by-piece along these lines and have been for
the past 30 or more years. Rather than reject a whole body of analysis or demand
wholly new modelling approaches, it may be more fruitful to look more carefully
at the eclectic model that has, in fact, been in use for some time. If such models
have appropriate allowance for supply side disturbances, they can do quite well in
interpreting the events of the 1970s and even anticipated them in many instances.8

3. Some policy projections

Rather than move in the direction of the school of rational expectations, I suggest
that we turn from the oversimplified model and the highly aggregative policy
instruments to the eclectic system that has large supply side content, together with
conventional demand side analysis and examine structural as well as macro
policies.
In the 1960s aggregative policies of Keynesian demand management worked
very well. The 1964 tax cut in the United States was a textbook example and
refutes the claim of the rational expectations school that parametric shifts will
nullify policy action. It also refutes the idea that we know so little about the
response pattern of the economy that we should refrain from activist policies.
Both the Wharton and Brookings Models were used for simulations of the 1964
tax cut.9 A typical policy simulation with the Wharton Model is shown in the
accompanying table.
This is a typical policy simulation with an econometric model, solving the
system dynamically, with and without a policy implementation. The results in the
above table estimate that the policy added about $10 billion (1958 $) to real GNP
and sacrificed about $7 billion in tax revenues. Actually, by 1965, the expansion
of the (income) tax base brought revenues back to their pre-tax cut position.
The Employment Act of 1946 in the United States was the legislation
giving rise to the establishment of the Council of Economic Advisers. Similar
commitments of other governments in the era following World War II and
reconstruction led to the formulation of aggregative policies of demand manage-

7 Robert E. Lucas and Thomas J. Sargent, “After Keynesian Macroeconomics,” After the Phillips Curve: Persistence of High Inflation and High Unemployment (Boston: Federal Reserve Bank of Boston, 1978), 49-72.
8 L. R. Klein, “The Longevity of Economic Theory,” Quantitative Wirtschaftsforschung, ed. by H. Albach et al. (Tübingen: J. C. B. Mohr (Paul Siebeck), 1977), 411-19; “Supply Side Constraints in Demand Oriented Systems: An Interpretation of the Oil Crisis,” Zeitschrift für Nationalökonomie, 34 (1974), 45-56; “Five-year Experience of Linking National Econometric Models and of Forecasting International Trade,” Quantitative Studies of International Economic Relations, H. Glejser, ed. (Amsterdam: North-Holland, 1976), 1-24.
9 L. R. Klein, “Econometric Analysis of the Tax Cut of 1964,” The Brookings Model: Some Further Results, ed. by J. Duesenberry et al. (Amsterdam: North-Holland, 1969).

Table 1
Comparative simulations of the tax cut of 1964 (The Wharton Model)

         Real GNP (bill 1958 $)                Personal tax and nontax payments (bill of curr. $)
         Actual   Tax cut      No tax cut      Actual   Tax cut      No tax cut
                  simulation   simulation               simulation   simulation

1964.1    569.7    567.0        563.1           60.7     61.3         64.0
1964.2    578.1    575.8        565.4           56.9     57.9         64.5
1964.3    585.0    581.0        569.6           59.1     59.0         65.6
1964.4    587.2    585.0        574.7           60.9     59.9         66.7

ment on a broad international scale. New legislation in the United States, under
the name of the Humphrey-Hawkins Bill, established ambitious targets for
unemployment and inflation during the early part of the 1980s. The bill, however,
states frankly that aggregative policy alone will not be able to accomplish the
objectives. Structural policies will be needed, and to formulate those, with
meaning, it will be necessary to draw upon the theory of a more extensive model,
namely, the Keynes-Leontief model.
The Wharton Annual Model is of the Keynes-Leontief type. It combines a
model of income generation and final demand determination with a complete
input-output system of 65 sectors and a great deal of demographic detail. It is
described in general terms in the preceding section and laid out in equation form
in the appendix. To show how some structural policies for medium term analysis
work out in this system, I have prepared a table with a baseline projection for the
1980s together with an alternative simulation in which the investment tax credit
has been increased (doubled to 1982 and raised by one-third thereafter), in order
to stimulate capital formation, general personal income taxes have been reduced
by about 6% and a tax has been placed on gasoline (50¢ per gallon).10 To offset
the gasoline tax on consumers, sales taxes have been cut back, with some grants in
aid to state and local governments increased to offset the revenue loss of the sales
taxes.
These policies mix aggregative fiscal measures with some structural measures to
get at the Nation’s energy problem. Also, tax changes have been directed
specifically at investment in order to improve the growth of productivity and hold
down inflation for the medium term. It is an interesting policy scenario because it
simultaneously includes both stimulative and restrictive measures. Also, it aims to
steer the economy in a particular direction towards energy conservation and
inducement of productivity.
As the figures in Table 2 show, the policy simulation produces results that
induce more real output, at a lower price level. Lower unemployment

10 The investment tax credit provides tax relief to business, figured as a percentage of an equipment purchase, if capital formation is undertaken. The percentage has varied, but is now about 10 percent.

Table 2
Estimated policy projections of the Wharton Annual Model 1980-89
(Deviation of policy simulation from baseline)
Selected economic indicators

                              1980   1981   1982   1983   1984   1985   1986   1987   1988   1989

GNP (bill $1972)                -1     14     35     44     50     51    131     48     48     46
GNP deflator (index points)   -0.4   -0.7   -1.4   -1.7   -2.1   -2.4   -3.0   -3.6   -4.6   -5.7
Unemployment rate
  (percentage points)          0.0   -0.5   -1.2   -1.6   -1.8   -1.9   -1.7   -1.5   -1.3   -1.1
Productivity change
  (percentage points)         -0.1    0.6    0.5    0.0    0.0   -0.1    0.0    0.1    0.1    0.0
Net exports (bill $)           0.8    6.8    4.7   10.5    6.2    2.2    0.8    0.9   -0.5   -1.6
Federal surplus (bill $)      -2.7    1.1   -0.2   -1.0    4.5    0.1   -2.5   -0.6   -9.2   -2.7
Energy ratio
  (thou BTU/Real GNP)         -0.9   -0.8   -0.6   -0.5   -0.3   -0.3   -0.2   -0.3   -0.2   -0.2
Nonresidential investment
  (bill $1972)                 0.9    4.1    8.4   11.3   13.8   14.8   16.0   16.7   17.2   17.2

accompanies the higher output, and the improvement in productivity contributes to the
lower price index. The lowering of indirect taxes offsets the inflationary impact of
higher gasoline taxes.
A cutback in energy use, as a result of the higher gasoline tax, results in a lower
BTU/GNP ratio. This holds back energy imports and makes the trade balance
slightly better in the policy alternative case.
A contributing factor to the productivity increase is the higher rate of capital
formation in the policy alternative. There are no surprises in this example. The
results come out as one would guess on the basis of a priori analysis, but the main
contribution of the econometric approach is to try to quantify the outcome and
provide a basis for net assessment of both the positive and negative sides of the
policy. Also, the differences from the baseline case are not very large. Economet-
ric models generally project moderate gains. To some extent, they underestimate
change in a systematic way, but they also suggest that the present inflationary
situation is deep seated and will not be markedly cured all at once by the range of
policies that is being considered.

4. The theory of economic policy

The framework introduced by Tinbergen is the most fruitful starting point.11 He
proposed the designation of two kinds of variables, targets and instruments. A

11 J. Tinbergen, On the Theory of Economic Policy (Amsterdam: North-Holland, 1952).

target is an endogenous (dependent) variable in a multivariate-multiequation
representation of the economy. An instrument is an exogenous (independent)
variable that is controlled or influenced by policy making authorities in order to
lead the economy to targets. Not all endogenous variables are targets; not all
exogenous variables are instruments.
In the large eclectic model, with more than 500 endogenous variables, policy
makers cannot possibly comprehend the fine movements in all such magnitudes.
Some systems in use have thousands of endogenous variables. At the national
economy level, top policy makers may want to focus on the following: GDP
growth rate, overall inflation rate, trade balance, exchange rate, unemployment
rate, interest rate. There may be intermediate or intervening targets, too, as in our
energy policy today - to reduce the volume of oil imports. This is partly a goal on
its own, but partly a means of improving the exchange value of the dollar, the
trade balance, and the inflation rate. There may be layers of targets in recursive
fashion, and in this way policy makers can extend the scope of variables
considered as targets, but it is not practical to extend the scope much beyond 10
targets or so. This refers to policy makers at the top. Elsewhere in the economy,
different ministers or executives are looking at a number of more specialized
targets - traffic safety, agricultural yield, size of welfare rolls, number of housing
completions, etc.
The large scale eclectic model has many hundreds or thousands of equations
with an equal number of endogenous variables, but there will also be many
exogenous variables. A crude rule of thumb might be that there are about as
many exogenous as endogenous variables in an econometric model.12 Perhaps we
are too lax in theory building and resign ourselves to accept too many variables in
the exogenous category because we have not undertaken the task of explaining
them. Government spending variables and demographic variables, for
example, are not truly exogenous, yet they are often not explicitly modeled but are left
to be explained by the political scientist and the sociologist. This practice is rapidly
changing. Many variables that were formerly accepted as exogenous are now
being given explicit and careful endogenous explanation in carefully designed
additional equations; nevertheless, there remains a large number of exogenous
variables in the eclectic, large scale model. There are, at least, hundreds.
Only a few of the many exogenous variables are suitable for consideration as
instruments. In the first place, public authorities cannot effectively control very
many at once. Just as coordinated thought processes can comprehend only a few
targets at a time, so can they comprehend only a few instruments at a time.
Moreover, some exogenous variables cannot, in principle, be controlled effec-

12 The Wharton Quarterly Model (1980) has 432 stochastic equations, 568 identities, and 401 exogenous variables. The Wharton Annual Model (1980) had 647 stochastic equations, 948 identities, and 626 exogenous variables. Exclusive of identities (and input-output relations), these each have an approximate balance between endogenous and exogenous variables.

tively. The many dimensions of weather and climate that are so important for
determining agricultural output are the clearest examples of non-controllable
exogenous variables - with or without cloud seeding.
The econometric model within which these concepts are being considered will
be written as

$$F(y_t, y_{t-1}, \ldots, y_{t-p},\; x_t, x_{t-1}, \ldots, x_{t-q},\; w_t, w_{t-1}, \ldots, w_{t-r},\; z_t, z_{t-1}, \ldots, z_{t-s},\; \theta) = e_t, \qquad (1)$$

where

F = column vector of functions;
y = column vector of target (endogenous) variables: y_1, ..., y_{n_1};
x = column vector of non-target (endogenous) variables: x_1, ..., x_{n_2}, with n_1 + n_2 = n;
w = column vector of instrument (exogenous) variables: w_1, ..., w_{m_1};
z = column vector of non-instrument (exogenous) variables: z_1, ..., z_{m_2}, with m_1 + m_2 = m;
θ = column vector of parameters;
e = column vector of errors.
In this system, there are n stochastic equations, with unknown coefficients, in n
endogenous variables and m exogenous variables. A subset of the endogenous
variables will be targets (n_1 ≤ n), and a subset of the exogenous variables will be
instruments (m_1 ≤ m).
The parameters are unknown, but estimated by the statistician from observable
data or a priori information. The estimated values will be denoted by θ̂. Also, for
any application situation, values must be assigned to the random variables e.
Either the assumed mean (E(e) = 0) will be assigned, or values of e will be

generated by some random drawings, or fixed at some a priori non-zero values.
But, given values for e and θ̂, together with initial conditions, econometricians
can generally “solve” this equation system. Such solutions or integrals will be
used in the policy formation process in a key way.
First, let us consider Tinbergen’s special case of equality between the number
of instruments and targets, n_1 = m_1. Look first at the simplest possible case with
one instrument, one target, and one estimated parameter. If the f-function
expresses a single-valued relationship between y and w, we can invert it to give

$$w = g(y, \hat\theta).$$

For a particular target value of y (y*), we can find the appropriate instrument
value w = w* from the solution of

$$w^* = g(y^*, \hat\theta).$$

If the f-function were simply proportional, we can write the answer in closed
form as

$$y = \hat\theta w, \qquad w^* = \frac{1}{\hat\theta}\, y^*.$$

For any desired value of y we can thus find the appropriate action that the
authorities must take by making w = w *. This will enable us to hit the target
exactly. The only exception to this remark would be that a legitimate target y *
required an unattainable or inadmissible w*. Apart from such inadmissible
solutions, we say that for this case the straightforward rule is to interchange the
roles of exogenous and endogenous variable and resolve the system, that is to say,
treat the n, = m, instruments as though they were unknown endogenous variables
and the n, = m, targets as though they were known exogenous variables. Then
solve the system for all the endogenous as functions of the exogenous variables so
classified.
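
Returning to the one-instrument proportional case, a minimal numerical sketch (all numbers invented for illustration) of estimating θ̂ and backing out the required instrument setting is:

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.uniform(1.0, 5.0, 40)            # historical instrument settings
y = 2.5 * w + rng.normal(0.0, 0.2, 40)   # observed outcomes; true slope 2.5

theta_hat = (w @ y) / (w @ w)            # least-squares slope through the origin
y_star = 10.0                            # target value y*
w_star = y_star / theta_hat              # instrument setting that hits the target
print(f"theta_hat = {theta_hat:.3f}, w* = {w_star:.3f}")
```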
It is obvious and easy to interchange the roles of endogenous and exogenous
variables by inverting the single equation and solving for the latter, given the
target value of the former. In a large complicated system, linear or not, it is easy to
indicate how this may be done or even to write closed form linear expressions for
doing it in linear systems, but it is not easy to implement in most large scale
models.

For the linear static case, n_1 = m_1, we can write

$$\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}\begin{pmatrix} y \\ x \end{pmatrix} + \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}\begin{pmatrix} w \\ z \end{pmatrix} = e,$$

where A_{11} is n_1 × n_1; A_{12} is n_1 × n_2; A_{21} is n_2 × n_1; A_{22} is n_2 × n_2;
B_{11} is n_1 × m_1; B_{12} is n_1 × m_2; B_{21} is n_2 × m_1; B_{22} is n_2 × m_2.

The solution for the desired instruments w* in terms of the targets y* and of z is obtained by treating (w, x) as the unknowns and (y*, z) as the knowns:

$$\begin{pmatrix} B_{11} & A_{12} \\ B_{21} & A_{22} \end{pmatrix}\begin{pmatrix} w^* \\ x \end{pmatrix} = e - \begin{pmatrix} A_{11} \\ A_{21} \end{pmatrix} y^* - \begin{pmatrix} B_{12} \\ B_{22} \end{pmatrix} z.$$

The relevant values come from the first n_1 rows of this solution.
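
As a numerical sketch of this interchange (a two-equation system with invented coefficients carrying no economic content), the computation might look as follows:

```python
import numpy as np

# Invented linear system A y + B v = 0, errors at their zero means:
# y = (y1, y2) endogenous with y1 the target; v = (w1, z1) exogenous
# with w1 the instrument.
A = np.array([[1.0, -0.4],
              [-0.3, 1.0]])
B = np.array([[-0.5, -0.2],
              [0.0, -0.8]])
y1_star, z1 = 2.0, 1.0                   # target value y1* and non-instrument input

# Interchange roles: the unknowns are now (w1, y2), the knowns (y1*, z1).
M = np.column_stack([B[:, 0], A[:, 1]])  # coefficients of the new unknowns
rhs = -(A[:, 0] * y1_star + B[:, 1] * z1)
w1_star, y2 = np.linalg.solve(M, rhs)
print(f"required instrument w1* = {w1_star:.2f}, implied y2 = {y2:.2f}")
```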
This solution is not always easy to evaluate in practice. Whether the system is
linear or nonlinear, the usual technique employed in most econometric centers is
to solve the equations by iterative steps in what is known as the Gauss-Seidel
algorithm. An efficient working of this algorithm in large dynamic systems
designed for standard calculations of simulation, forecasting, multiplier analysis
and similar operations requires definite rules of ordering, normalizing, and
choosing step sizes.13 It is awkward and tedious to re-do that whole procedure for
a transformed system in which some variables have been interchanged, unless
they are standardized.
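
For the iterative route, a stylized sketch of the Gauss-Seidel algorithm on a hypothetical three-equation Keynesian model (each equation normalized on one endogenous variable; all parameters invented) is:

```python
# Gauss-Seidel iteration on a hypothetical three-equation model:
# T = tax function, C = consumption function, Y = income identity.
def gauss_seidel(g, i_inv, max_iter=200, tol=1e-8):
    c, y, t = 100.0, 500.0, 50.0        # arbitrary starting values
    for _ in range(max_iter):
        c_old, y_old = c, y
        t = 0.2 * y                     # taxes, normalized on T
        c = 20.0 + 0.8 * (y - t)        # consumption out of disposable income
        y = c + i_inv + g               # income identity, normalized on Y
        if abs(c - c_old) + abs(y - y_old) < tol:
            break                       # successive values have converged
    return y, c, t

y, c, t = gauss_seidel(g=100.0, i_inv=60.0)
print(f"Y = {y:.2f}, C = {c:.2f}, T = {t:.2f}")
```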
It is simpler and more direct to solve the problem by searching (systematically)
for instruments that bring the n_1 values of y as “close” as possible to their targets
y*. There are many ways of doing this, but one would be to find the minimum
value of

$$L = \sum_{i=1}^{n_1} u_i \left( y_i - y_i^* \right)^2$$

subject to

$$\hat F = \hat e,$$

where F̂ is the estimated value of F for θ = θ̂ and ê contains the assigned values of the error vector.
In the theory of optimal economic policy, L is called a loss function and is
arbitrarily made a quadratic in this example. Other loss functions could equally
well be chosen. The u_i are weights in the loss function and should be positive.
If there is an admissible solution and if n_1 = m_1, the optimal value of the loss
function should become zero.
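
A sketch of such a search, with a made-up two-instrument, two-target reduced model and scipy's Nelder-Mead routine standing in for whatever search procedure a modeling center would actually use:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical reduced model: instruments w = (g, tax_rate) determine the
# targets y = (GDP, deficit).
def solve_model(w):
    g, tax_rate = w
    gdp = (80.0 + g) / (1.0 - 0.8 * (1.0 - tax_rate))  # multiplier solution
    deficit = g - tax_rate * gdp
    return np.array([gdp, deficit])

y_star = np.array([520.0, 10.0])   # target values y_i*
u = np.array([1.0, 25.0])          # positive loss weights u_i

def loss(w):
    return float(u @ (solve_model(w) - y_star) ** 2)

res = minimize(loss, x0=np.array([100.0, 0.2]), method="Nelder-Mead")
print("instruments:", res.x.round(3), "loss:", round(res.fun, 8))
# with n1 = m1 = 2 and an admissible solution, the minimized loss is ~0
```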
13 L. R. Klein, A Textbook of Econometrics (New York: Prentice-Hall, 1974), p. 239.

A more interesting optimization problem arises if n_1 > m_1, i.e. if there are more
targets than instruments. In this case, the optimization procedure will not, in
general, bring one all the way to target values, but only to a “minimum distance”
from the target. If m_1 > n_1, it would be possible, in principle, to assign arbitrary
values to m_1 − n_1 (superfluous) instruments and solve for the remaining n_1
instruments as functions of the n_1 target values of y. Thus, the problem of excess
instruments can be reduced to the special problem of equal numbers of instru-
ments and targets.
It should be noted that the structural model is a dynamic system, and it is
unlikely that a static loss function would be appropriate. In general, economic
policy makers have targeted paths for y. A whole stream of y-values are generally
to be targeted over a policy planning horizon. In addition, the loss function could
be generalized in other dimensions, too. There will usually be a loss associated
with instrumentation. Policy makers find it painful to make activist decisions
about running the economy, especially in the industrial democracies; therefore, L
should be made to depend on w - w * as well as on y - y *. In the quadratic case,
covariation among the y_i − y_i* terms might also be considered, but this may well be
beyond the comprehension of the typical policy maker.
A better statement of the optimal policy problem will then be

$$L = \sum_{t=1}^{h} \left\{ \sum_{i=1}^{n_1} u_i \left( y_{it} - y_{it}^* \right)^2 + \sum_{i=1}^{m_1} v_i \left( w_{it} - w_{it}^* \right)^2 \right\} = \min$$

with respect to w_{it}, subject to F̂ = ê, t = 1, 2, ..., h.

The v_i are weights associated with instrumentation losses. If future values are to
be discounted it may be desirable to vary u_i and v_i with t. A simple way would be
to write

$$u_{it} = u_i (1+\rho)^{-t}, \qquad v_{it} = v_i (1+\rho)^{-t},$$

where ρ is the rate of discount.


A particular problem in the application of the dynamic formulation is known
as the end-point problem. Decisions made at time point h (end of the horizon)
may imply awkward paths for the system beyond h because it is a dynamic
system whose near term movements (h + 1, h + 2,. . .) will depend on the (initial)
conditions of the system up to time h. It may be advisable to carry the
optimization exercise beyond h, even though policy focuses on the behavior of
the system only through period h.
Many examples have been worked out for application of this approach to
policy making - few in prospect (as genuine extrapolations into the future) but

many in retrospect, assessing what policy should have been.14 A noteworthy series
of experimental policies dealt with attempts to alleviate the stagflation of
the late 1960s and the 1970s in the United States; in other words, could a
combination of fiscal and monetary policies have been chosen that would have led
to full (or fuller) employment without (so much) inflation over the period
1967-75?
The answers, from optimal control theory applications among many models,
suggest that better levels of employment and production could have been achieved
with very little additional inflationary pressures but that it would not have been
feasible to bring down inflation significantly at the same time. Some degree of
stagflation appears to have been inevitable, given the prevailing exogenous
framework.
Such retrospective applications are interesting and useful, but they leave one a
great distance from the application of such sophisticated measures to the positive
formulation of economic policy. There are differences between the actual and
optimal paths, but if tolerance intervals of error for econometric forecasts were
properly evaluated, it is not likely that the two solutions would be significantly
apart for the whole simulation path. If the two solutions are actually far apart, it
is often required to use extremely wide ranges of policy choice, wider and more
frequently changing than would be politically acceptable.
Two types of errors must be considered for evaluation of tolerance intervals,

$$\operatorname{var}(\hat\theta) \quad \text{and} \quad \operatorname{var}(e).$$

The correct parameter values are not known, they must be estimated from small
statistical samples and have fairly sizable errors. Also, there is behavioral error,
arising from the fact that models cannot completely describe the economy.
Appropriate valuation of such errors does not invalidate the use of models for
some kinds of applications, but the errors do preclude “fine tuning”.
A more serious problem is that the optimum problem is evaluated for a fixed
system of constraints; i.e. subject to F̂ = ê.

14 A. Hirsch, S. Hymans, and H. Shapiro, “Econometric Review of Alternative Fiscal and Monetary Policy, 1971-75,” Review of Economics and Statistics, LX (August 1978), 334-45.
L. R. Klein and V. Su, “Recent Economic Fluctuations and Stabilization Policies: An Optimal Control Approach,” Quantitative Economics and Development, eds. L. R. Klein, M. Nerlove, and S. C. Tsiang (New York: Academic Press, 1980).
M. B. Zarrop, S. Holly, B. Rustem, J. H. Westcott, and M. O’Connell, “Control of the LBS Econometric Model Via a Control Model,” Optimal Control for Econometric Models, ed. by S. Holly et al. (London: Macmillan, 1979), 23-64.

The problem of optimal policy may, in fact, be one of varying constraints,
respecifying F.
It has been found that the problem of coping with stagflation is intractable in
the sense that macro policies cannot bring both unemployment and inflation close
to desired targets simultaneously. On the other hand, there may exist policies that
do so if the constraint system is modified. By introducing a special TIPS policy
that ties both wage rates and profit rates to productivity,

$$X/hL = \text{real output per worker-hour},$$

it has been found that highly favorable simulations can be constructed that
simultaneously come close to full employment and low inflation targets. These
simulation solutions were found with the same (Wharton) model that resisted full
target approach using the methods of optimal control. The wage and profits
(price) equations of the model had to be re-specified to admit

$$\Delta \ln w = \Delta \ln (X/hL),$$
$$\Delta \ln (PR/K) = \Delta \ln (X/hL),$$

where

PR = corporate profits;
K = stock of corporate capital.

Equations for wages and prices, estimated over the sample period, had to be
removed in favor of the insertion of these.15
A creative policy search with simulation exercises was able to get the economy
to performance points that could not be reached with feasible applications of
optimal control methods. This will not always be the case, but will frequently be
so. Most contemporary problems cannot be fully solved by simple manipulation
of a few macro instruments, and the formalism of optimal control theory has very
limited use in practice. Simulation search for “good” policies, realistically for-
mulated in terms of parameter values that policy makers actually influence is
likely to remain as the dominant way that econometric models are used in the
policy process.
That is not to say that optimal control theory is useless. It shows a great deal
about model structure and instrument efficiency. By varying weights in the loss
function and then minimizing, this method can show how sensitive the uses of
policy instruments are. Also, some general propositions can be developed. The
more uncertainty is attached to model specification and estimation, the less
should be the amplitude of variation of instrument settings. Thus, William
Brainard has shown, in purely theoretical analysis of the optimum problem, that

15 L. R. Klein and V. Duggal, “Guidelines in Economic Stabilization: A New Consideration,” Wharton Quarterly, VI (Summer 1971), 20-24.

Table 3
Growth assumptions and budget deficit fiscal policy planning, USA February 1984(a)

                                   1984   1985   1986   1987   1988   1989

Real GNP estimates or assumptions (%)
  Administration                    5.3    4.1    4.0    4.0    4.0    4.0
  Congressional Budget Office
    Baseline                        5.4    4.1    3.5    3.5    3.5    3.5
    Low alternative                 4.9    3.6   -0.9    2.1    3.8    3.1

Estimated deficit, fiscal years ($ billion)
  Administration                    186    192    211    233    241    248
  Congressional Budget Office
    Baseline                        189    197    217    245    212    308
    Low alternative                 196    209    261    329    357    390

(a) Source: Baseline Budget Projections for Fiscal Years 1985-1989, Congressional Budget Office, Washington, D.C., February 1984; Testimony of Rudolph G. Penner, Committee on Appropriations, U.S. Senate, February 22, 1984.

policy makers ought to hold instruments cautiously to a narrow range (intervene
less) if there is great uncertainty.16 This is valuable advice developed from the
analysis of optimal policy.
A particular case of uncertainty concerns the business cycle. The baseline
solution for y_t should reflect whatever cyclical variation is present in the actual
economy if predictions of y_t are at all accurate. For example, the existence of a
cycle in the United States has been well documented by the National Bureau of
Economic Research and has been shown to be evident in the solutions of
macroeconometric models.17
Although the baseline solution of a macro economy extending over 5 to 10
years should reflect a normal cyclical pattern unless some specific inputs are
included that wipe out the cycle, that is not the usual practice in public policy
planning. Policy makers are reluctant to forecast a downturn in their own
planning horizon. The accompanying table illustrates this point in connection
with U.S. budget planning in early 1984. The official baseline path assumes steady
growth of the economy, contrary to historical evidence about the existence and
persistence of a 4-year American cycle. An argument in support of this practice
has been that the exact timing of the cyclical turning points is in doubt. If they

16 W. Brainard, “Uncertainty and the Effectiveness of Policy,” American Economic Review, LVII (May 1967), 411-25. See also L. Johansen, “Targets and Instruments Under Uncertainty,” Institute of Economics, Oslo, 1972. Brainard’s results do not, in all theoretical cases, lead to the conclusion that instrument variability be reduced as uncertainty is increased, but that is the result for the usual case.
17 See I. and F. Adelman, “The Dynamic Properties of the Klein-Goldberger Model,” Econometrica, 27 (October 1959), 596-625. See also Econometric Models of Cyclical Behavior, ed. B. G. Hickman (New York: Columbia University Press, 1972).

are not known with great precision, it is argued that it is better not to introduce
them at all. An appropriate standard error of estimate is probably no larger than
±1.0 year; therefore, they ought to be introduced with an estimated degree of
certainty.
The Congressional Budget Office in the United States has a fairly steady
expansion path for its baseline case, but introduces a cycle downturn for 1986, in
a low growth alternative case, between 4 and 5 years after the last downturn. It
would seem more appropriate to consider this as a baseline case, with the steady
growth projection an upper limit for a more favorable budget projection.
A series of randomly disturbed simulations of an estimated model,

$$\hat F = e_t^{(i)}, \qquad t = 1, 2, \ldots, H; \quad i = 1, 2, \ldots, R,$$

with R replications of random error disturbances, generates solutions of the
estimated equation system F̂. Each replication produces

$$\begin{pmatrix} y_1^{(i)} \\ y_2^{(i)} \\ \vdots \\ y_H^{(i)} \end{pmatrix} \quad \text{given} \quad \begin{pmatrix} z_1 \\ z_2 \\ \vdots \\ z_H \end{pmatrix} \quad \text{and initial conditions.}$$

The R stochastic projections will, on average, have cycles with random timing
and amplitude. They will produce R budget deficit estimates. The mean and
variance of these estimates can be used to construct an interval that includes a
given fraction of cases, which can be used to generate a high, low, and average
case for budget deficit values. The stochastic replications need not allow only for
drawings of e_t^{(i)}; they can also be used to estimate distributions of parameter
estimates for F̂.18 This is an expensive and time consuming way to generate policy
intervals, but it is a sound way to proceed in the face of uncertainty for
momentous macro problems.
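
A sketch of the procedure with a one-equation stand-in model (all magnitudes invented, loosely scaled to the deficit figures of Table 3):

```python
import numpy as np

rng = np.random.default_rng(0)

# R randomly disturbed solutions: GDP growth is drawn with random errors,
# and the deficit widens when GDP falls short of its trend path.
R, H = 500, 5
trend = 3600.0 * 1.03 ** np.arange(1, H + 1)
deficits = np.empty((R, H))
for i in range(R):
    growth = 0.03 + rng.normal(0.0, 0.015, H)     # disturbed growth path
    gdp = 3600.0 * np.cumprod(1.0 + growth)
    deficits[i] = 190.0 + 0.25 * (trend - gdp)    # cyclical deficit response

mean = deficits.mean(axis=0)
lo, hi = np.percentile(deficits, [5, 95], axis=0) # 90% interval over replications
for t in range(H):
    print(f"year {t + 1}: mean {mean[t]:6.1f}  interval [{lo[t]:6.1f}, {hi[t]:6.1f}]")
```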
It is evident from the table that provision for a business cycle, no matter how
uncertain its timing may be, is quite important. The higher and steadier growth
assumptions of the American administration produce, by far, the lowest fiscal
deficits in budgetary planning. A slight lowering of the steady path (by only 0.5
percentage points, 1986-89) produces much larger deficits, and if a business cycle
correction is built into the calculations, the rise in the deficit is very big. In the
cyclical case, we have practically a doubling of the deficit in five years, while in

18 The technique employed in G. Schink, Estimation of Forecast Error in a Dynamic and/or Non-Linear Econometric Model (Ph.D. dissertation, University of Pennsylvania, 1971) can be used for joint variation of parameters and disturbances.

the cycle-free case the rise is no more than about 50 percent in the same time
period.
Also, optimal control theory can be used to good advantage in the choice of
exogenous inputs for long range simulations. Suppose that values for w_t and z_t
are needed for t = T+1, T+2, T+3, ..., T+30, where T+30 is 30 years from now
(in the 21st century). We have little concrete basis for choice of w_{T+30} and z_{T+30}.

By optimizing about a balanced growth path for the endogenous variables, with
respect to choice of key exogenous variables, we may be able to indicate sensible
choices of these latter variables for a baseline path, about which to examine
alternatives. These and other analytical uses will draw heavily on optimal control
theory, but it is unlikely that such theory will figure importantly in the positive
setting of economic policy.
The role of the baseline (balanced growth) solution for policy making in the
medium or long term is to establish a reference point about which policy induced
deviations can be estimated. The baseline solution is not, strictly speaking, a
forecast, but it is a policy reference set of points. Many policy problems are long
term. Energy availability, other natural resource supplies, social insurance reform,
and international debt settlement are typical long term problems that use econo-
metric policy analysis at the present time.
At the present time, the theory of economic policy serves as a background for
development of policy but not for its actual implementation. There is too much
uncertainty about the choice of loss function and about the constraint system to
rely on this approach to policy formation in any mechanistic way.19 Instead,
economic policy is likely to be formulated, in part at least, through comparison of
alternative simulations of econometric models.
In the typical formulation of policy, the following steps are taken:

(i) definition of a problem, usually to determine the effects of external events
and of policy actions;
(ii) execution of model simulations in the form of historical and future projections
that take account of the problem through changes in exogenous variables,
parameter values, or system specification;
(iii) estimation of quantitative effects of policies as differences between simula-
tions with and without the indicated changes;

19 See, in this respect, the conclusions of the Royal Commission (headed by R. J. Ball) Committee on Policy Optimisation, Report (London: HMSO, 1978).

(iv) presentation of results to policy decision makers for consideration in compe-
tition with estimates from many different sources.

Policy is rarely based on econometric information alone, but it is nearly always


based on perusal of relevant econometric estimates together with other assess-
ments of quantitative policy effects. Among econometric models, several will
often be used as checking devices for confirmation or questioning of policy
decisions.
It is important in policy formulation to have a baseline projection. For the
short run, this will be a forecast of up to 3 years’ horizon. For the longer run, it
will be a model projection that is based on plausible assumptions about inputs of
exogenous variables and policy related parameters. For the longer run projec-
tions, the inputs will usually be smooth, but for short run forecasts the inputs will
usually move with perceptions of monthly, quarterly, or annual information
sources in a more irregular or cyclical pattern.
The model forecast or baseline projection serves not only as a reference point
from which to judge policy effects. It also serves as a standard of credibility. That
is to say, past performance of forecast accuracy is important in establishing the
credibility of any model.
Judgmental information, quantitative reduced form extrapolations (without
benefit of a formal model) and estimated models will all be put together for joint
information and discussion. Models are significant parts of this information
source but by no means the whole. In many respects, model results will be used
for confirmation or substantiation of decisions based on more general sources of
information.
Models are most useful when they present alternative simulations of familiar
types of changes that have been considered on repetitive occasions in the past, so
that there is an historical data base on which to build simulation analyses. A new
tax, a new expenditure program, the use of a new monetary instrument, or, in
general, the implementation of a new policy that calls on uses of models that have
not been examined in the past are the most questionable. There may be no
historical data base in such situations from which to judge model performance.
In new situations, external a priori information for parameter values or for
respecification with new (numerical) parameter values is needed. These new
estimates of parameters should be supplied by engineers or scientists for technical
relations, by legal experts for new tax relationships, or by whatever expertise can
be found for other relationships. The resulting simulations with non-sample-based
parameter estimates are simply explorations of alternatives and not forecasts or
projections.
Much attention has been paid, in the United States, recently to changes in laws
for taxing capital gains. There is no suitable sample that is readily available with
many observations at different levels of capital gains taxation. Instead, one would

be well advised to look at other countries’ experience in order to estimate


marginal consequences of changing the tax laws for treatment of capital gains. In
addition, one could investigate state-to-state cross section estimates to see how
capital gains taxes might influence spending behavior. Similar analyses across
countries may also be of some help. Finally, we might try to insert questions into
a field survey on people’s attitudes towards the use of capital gains. These are all
basic approaches and should be investigated simultaneously. There is nothing
straightforward to do in a new situation, but some usable pieces of econometric
information may be obtained and it might help in policy formation. Recently,
claims were made about the great benefits to be derived from the liberalization of
capital gains rates in the United States, but these claims were not backed by
econometric research that could be professionally defended. For the ingenious
econometric researcher, there is much to gain on a tentative basis, but care and
patience are necessary.
In all this analysis, the pure forecast and forecasting ability of models play key
roles. Forecasts are worthwhile in their own right, but they are especially valuable
when examined from the viewpoint of accuracy because users of model results are
going to look at forecast accuracy as means of validating models. It is extremely
important to gain the confidence of model users, and this is most likely to be done
through the establishment of credibility. This comes about through relative
accuracy of the forecast. Can forecasts from models be made at least as accurately
as by other methods and are the forecasts superior at critical points, such as
business cycle turning points?
These questions are partly answered by the accuracy researches of Stephen
McNees and others.20 The answer is that models do no worse than other methods
and tend to do better at cyclical turning points and over larger stretches of time
horizon. The acceptability of model results by those who pay for them in the
commercial market lends greater support to their usefulness and credibility. This
supports their use in the policy process through the familiar technique of
alternative/comparative simulation.
International policy uses provide a new dimension for applications of econo-
metric models. Comprehensive models of the world economy are relatively new;
so it is meaningful to examine their use in the policy process. The world model
that is implemented through Project LINK has been used in a number of
international policy studies, and an interpretation of some leading cases may be
helpful.

20 Stephen McNees, “The Forecasting Record for the 1970s,” New England Economic Review (September/October 1979), 33-53.
Vincent Su, “An Error Analysis of Econometric and Noneconometric Forecasts,” American Economic Review, 68 (May 1978), 360-72.

Some of the problems for which the LINK model has been used are: exchange
rate policy, agricultural policy associated with grain failures, oil pricing policy,
coordinated fiscal policies, coordinated monetary policies.
When the LINK system was first constructed, the Bretton Woods system of
fixed exchange rates was still in force. It was appropriate to make exchange rates
exogenous in such an environment. At the present time exchange rate equations
have been added in order to estimate currency rates endogenously. An interesting
application of optimal control theory can be used for exchange rate estimation
and especially for developing the concept of equilibrium exchange rates. Such
equilibrium rates give meaning to the concept of the degree of over- or under-
valuation of rates, which may be significant for determining official
intervention in the foreign exchange market.
In a system of multiple models, for given exchange rates there is a solution,
model by model, for

$$(PX)_i \cdot X_i - (PM)_i \cdot M_i = \text{trade balance for the } i\text{th country},$$

where

(PX)_i = export price;
X_i = export volume (goods/services);
(PM)_i = import price;
M_i = import volume (goods/services).

These are all endogenous variables in a multi-model world system. The equi-
librium exchange rate problem is to set targets for each trade balance at levels
that countries could tolerate at either positive or negative values for protracted
periods of time - or zero balance could also be imposed. The problem is then
transformed according to Tinbergen’s approach, and assumed values are given to
the trade balance, as though they are exogenous, while solutions are obtained for

$$(EXR)_i = \text{exchange rate of the } i\text{th country}.$$

The exchange rates are usually denominated in terms of local currency units per
U.S. dollar. For the United States, the trade balance is determined as a residual
by virtue of the accounting restraint

$$\sum_i (PX)_i \cdot X_i = \sum_i (PM)_i \cdot M_i,$$

and the exchange rate in terms of U.S. dollars is, by definition, 1.0.
As noted earlier, this problem, although straightforward from a conceptual
point of view, is difficult to carry out in practice, especially for a system as large

and complicated as LINK; therefore, it has to be solved empirically from the
criterion

$$\sum_i \Bigl\{ \bigl[(PX)_i X_i - (PM)_i M_i\bigr] - \bigl[(PX)_i X_i - (PM)_i M_i\bigr]^* \Bigr\}^2 = \min = 0,$$

with the entire LINK system functioning as a set of constraints. The minimization
is done with respect to the values of the exchange rates (instruments). With
modern computer technology, hardware, and software, this is a feasible problem.
Its importance for policy is to give some operational content to the concept of
equilibrium exchange rate values.
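
A two-country miniature of this computation (every elasticity and level invented; in practice the entire LINK system is the constraint set) might look like:

```python
import numpy as np
from scipy.optimize import minimize

targets = np.array([5.0, -5.0])            # tolerated trade balance targets
base_exports = np.array([100.0, 80.0])
base_imports = np.array([95.0, 90.0])

def trade_balance(exr):
    exports = base_exports * exr ** 0.8    # depreciation stimulates exports
    imports = base_imports * exr ** -0.5   # and restrains imports
    return exports - imports

def criterion(log_exr):                    # optimize in logs to keep rates positive
    tb = trade_balance(np.exp(log_exr))
    return float(np.sum((tb - targets) ** 2))

res = minimize(criterion, x0=np.zeros(2), method="Nelder-Mead")
print("equilibrium rates:", np.exp(res.x).round(3), "criterion:", res.fun)
# The U.S. balance then follows residually from the world accounting identity.
```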
Optimal control algorithms built for Project LINK to handle the multi-model
optimization problem have been successfully implemented to calculate Ronald
McKinnon’s proposals for exchange rate stabilization through monetary policy.21
As a result of attempts by major countries to stop inflation, stringent monetary
measures were introduced during October, 1979, and again during March, 1980.
American interest rates ascended rapidly reaching a rate of some 20% for short
term money. One country after another quickly followed suit, primarily to protect
foreign capital holdings and to prevent capital from flowing out in search of high
yields. An internationally coordinated policy to reduce rates was considered in
LINK simulations. Such international coordination would diminish the possibil-
ity of the existence of destabilizing capital flows across borders. Policy variables
(or near substitutes) were introduced in each of the major country models. The
resulting simulations were compared with a baseline case. Some world results are
shown, in the aggregate, in Table 4.
The results in Table 4 are purely aggregative. There is no implication that all
participants in a coordinated policy program benefit. The net beneficial results are
obtained by summing gains and losses. Some countries might not gain, individu-
ally, in a coordinated framework, but on balance they would probably gain if
coordination were frequently used for a variety of policies and if the whole world
economy were stabilized as a result of coordinated implementation of policy.
Coordinated policy changes of easier credit conditions helps growth in the
industrial countries. It helps inflation in the short run by lowering interest cost,
directly. Higher inflation rates caused by enhanced levels of activity are restrained

21 Ronald I. McKinnon, An International Standard for Monetary Stabilization (Washington, D.C.: Institute for International Economics, March 1984).
Peter Pauly and Christian E. Petersen, “An Empirical Evaluation of the McKinnon Proposal,” Issues in International Monetary Policy, Project LINK Conference Proceedings (San Francisco: Federal Reserve Bank, 1985).

Table 4
Effects of coordinated monetary policy, LINK system world aggregates
(Deviation of policy simulation from baseline)

                                       1979   1980   1981   1982   1983   1984

Value of world trade (bill $)            15     53     85    106    125    149
Volume of world trade (bill $, 1970)    4.7   14.4   20.2   22.8   24.7   26.9
OECD (13 LINK countries)
  GDP growth rate (%)                   1.9    1.9    1.0   -0.2   -0.5   -0.4
  Consumer price inflation rate (%)    -0.2   -0.5   -0.4    0.1    0.3    0.3

by the overall improvement in productivity. This latter development comes about


because easier credit terms stimulate capital formation. This, in turn, helps
productivity growth measured as changes in output per worker. A pro-inflationary
influence enters through the attainment of higher levels of capacity utilization, but
it is the function of the models to balance out the pro and counter inflationary
effects.
Policy is not determined at the international level, yet national forums consult
simulations such as this coordinated lowering of interest rates and the frequent
repetition of such econometric calculations can ultimately stimulate policy think-
ing along these lines in several major countries. A number of fiscal and exchange
rate simulations along coordinated international lines have been made over the
past few years.22,23

5. Prospects

Economic policy guidance through the use of econometric models is clearly


practiced on a large scale, over a wide range of countries. Fine tuning through the
use of overall macro policies having to do with fiscal, monetary, and trade matters

22 L. R. Klein, P. Beaumont, and V. Su, “Coordination of International Fiscal Policies and Exchange Rate Revaluations,” Modelling the International Transmission Mechanism, ed. J. Sawyer (Amsterdam: North-Holland, 1979), 143-59.
H. Georgiadis, L. R. Klein, and V. Su, “International Coordination of Economic Policies,” Greek Economic Review, I (August 1979), 27-47.
L. R. Klein, R. Simes, and P. Voisin, “Coordinated Monetary Policy and the World Economy,” Prévision et Analyse économique, 2 (October 1981), 75-104.
23 A new and promising approach is to make international policy coordination a dynamic game. See Gilles Oudiz and Jeffrey Sachs, “Macroeconomic Policy Coordination among the Industrial Countries,” Brookings Papers on Economic Activity (1, 1984), 1-64.

has been carried quite far, possibly as far as it can in terms of methodological
development. There will always be new cases to consider, but the techniques are
not likely to be significantly improved upon. To some extent, formal methods of
optimal control can be further developed towards applicability. But significant
new directions can be taken through the development of more supply side content
in models to deal with the plethora of structural policy issues that now confront
economies of the world. This situation is likely to develop further along supply
side lines. The bringing into play of joint Leontief-Keynes models with fully
articulated input-output systems, demographic detail, resource constraints and
environmental conditions is likely to be important for the development of more
specific policy decisions requiring the use of more micro details from models. This
is likely to be the next wave of policy applications, focusing on energy policy,
environmental policy, food policy, and other specific issues. It is clear that
econometric methods are going to play a major role in this phase of development.

Appendix: An outline of a combined (Keynes-Leontief) input-output/macro model

The first five sectors listed on p. 2067 are the components of final demand as they
are laid out in the simple versions of the Keynesian macro model, extending the
cases cited earlier by the explicit introduction of inventory investment and foreign
trade. When the Keynesian system is extended to cover price and wage formation,
then the production function, labor requirements, labor supply and income
determination must also be included. These, together, make up the main compo-
nents of national income. Interest income and monetary relationships to generate
interest rates must also be included. This outlines, in brief form, the standard
macro components of the mainstream econometric model. The interindustry
relationships making up the input-output system round out the total model.
The flow of goods, in a numeraire unit, from sector i to sector j is denoted as
X_{ij}. Correspondingly, the total gross output of j is X_j. The technical coefficients of
input-output analysis are defined as

$$a_{ij} = X_{ij}/X_j,$$

and the basic identity of input-output analysis becomes

$$X_i = \sum_{j=1}^{n} X_{ij} + F_i = \sum_{j=1}^{n} a_{ij} X_j + F_i,$$

where F_i is final demand, and the total number of sectors is n. In matrix notation
this becomes

$$(I - A)X = F.$$

X is a column vector of gross outputs, and F is a column vector of final demands.
F can be decomposed into

$$F_C + F_I + F_G + F_E - F_M = F,$$

where F_C is total consumer demand, F_I is total investment demand (including
inventory investment), F_G is public spending, F_E is export demand, and F_M is
import demand. The detail of decomposition of F used here is only illustrative.
Many subcategories are used in a large system in applied econometrics.
The elements of F sum to GNP. If we denote each row of F as

$$F_i = F_{iC} + F_{iI} + F_{iG} + F_{iE} - F_{iM}$$

and divide each component by its column total, we get

$$a_{iC} = \frac{F_{iC}}{F_C}; \quad a_{iI} = \frac{F_{iI}}{F_I}; \quad a_{iG} = \frac{F_{iG}}{F_G}; \quad a_{iE} = \frac{F_{iE}}{F_E}; \quad a_{iM} = \frac{F_{iM}}{F_M}.$$

The array of elements of these final demand coefficients makes up a rectangular
matrix, called C. If we denote the column

$$\mathcal{G} = \begin{pmatrix} F_C \\ F_I \\ F_G \\ F_E \\ -F_M \end{pmatrix}$$

by 𝒢 (standing for GNP), we can write

$$F = C\mathcal{G}$$

or

$$(I - A)X = C\mathcal{G}, \qquad X = (I - A)^{-1}C\mathcal{G}.$$

This gives a (row) transformation expressing each sector’s gross output as a



weighted sum of the components of GNP. It shows how a model of 𝒢 (GNP)


values can be transformed into individual sector outputs if we make use of the
matrix of input-output and final demand coefficients.
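
A small numerical illustration of this transformation (a hypothetical 3-sector, 5-component system; none of the coefficients come from the Wharton Model):

```python
import numpy as np

A = np.array([[0.10, 0.20, 0.05],
              [0.15, 0.05, 0.10],
              [0.05, 0.10, 0.20]])          # technical coefficients a_ij
C = np.array([[0.5, 0.2, 0.3, 0.4, 0.3],
              [0.3, 0.5, 0.4, 0.3, 0.4],
              [0.2, 0.3, 0.3, 0.3, 0.3]])   # final demand shares; each column sums to one
G = np.array([650.0, 150.0, 200.0, 120.0, -120.0])  # (F_C, F_I, F_G, F_E, -F_M)

F = C @ G                                   # sector final demands
X = np.linalg.solve(np.eye(3) - A, F)       # gross outputs via the Leontief inverse
print("sector gross outputs:", X.round(1), " GNP:", G.sum())
```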
This transformation is extended from gross output values to values added by
sector. The transformation is

$$X = BY,$$

where B is the diagonal matrix

$$B = \operatorname{diag}\!\left( \frac{1}{1 - \sum_{i=1}^{n} a_{i1}}, \; \ldots, \; \frac{1}{1 - \sum_{i=1}^{n} a_{in}} \right).$$

We observe also that the sum of the elements of Y gives the GNP total, too, and

$$Y = B^{-1}(I - A)^{-1}C\mathcal{G}.$$

This gives the (row) transformation between elements of 𝒢 and elements of Y,
where both column vectors are different decompositions of the GNP, one on the
side of spending and the other on the side of production.
If we construct synthetic price deflators for values added, P_Y, and for final
demand, P_𝒢, we find the relationship

$$P_Y' Y = P_{\mathcal{G}}' \mathcal{G}.$$

This can be transformed into

$$P_Y' B^{-1}(I - A)^{-1} C \mathcal{G} = P_{\mathcal{G}}' \mathcal{G}.$$

By equating corresponding terms in the elements of 𝒢 we have the (column)
transformations

$$P_{\mathcal{G}i} = \sum_{j=1}^{n} h_{ji} P_{Yj}, \qquad i = C, I, G, E, M.$$

A typical element of B^{-1}(I - A)^{-1}C is denoted as h_{ji}.



Industry outputs are weighted sums of final expenditures, and expenditure
deflators are weighted sums of sector output prices. Prices, in this model, are
determined by mark-up relations, over costs, at the industry sector level and
transformed into expenditure deflators. Demand is determined at the final ex-
penditure level and transformed into output levels.
The relation

$$X = BY$$

provides a set of simple transformations to convert from sector gross output
values to sector values added. There is a corresponding price transformation

$$(I - A')P_x = B^{-1}P_Y, \qquad P_Y = B(I - A')P_x.$$

This is derived as follows:

$$P_{xj}X_j = P_{Yj}Y_j + \sum_{i=1}^{n} P_{xi}X_{ij},$$

$$P_{xj}X_j = P_{Yj}Y_j + \sum_{i=1}^{n} P_{xi}a_{ij}X_j,$$

$$P_{xj} = P_{Yj}\frac{Y_j}{X_j} + \sum_{i=1}^{n} P_{xi}a_{ij}.$$

The ratio Y_j/X_j (value added to gross output of sector j) can be written as

$$\frac{Y_j}{X_j} = 1 - \sum_{i=1}^{n} a_{ij};$$

these ratios are the reciprocals of the diagonal elements of B. In matrix notation we have

$$P_x = B^{-1}P_Y + A'P_x,$$

or, more compactly,

$$P_Y = B(I - A')P_x.$$

This system of equations provides the transformation from gross output prices to
value added prices, or vice versa. In the model’s behavioral equations, there is
first determination of P_x; then the above transformation derives P_Y, and from
these we estimate P_𝒢.
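
Continuing the same invented 3-sector example, the price transformations and the duality P_Y′Y = P_𝒢′𝒢 can be traced numerically:

```python
import numpy as np

A = np.array([[0.10, 0.20, 0.05],
              [0.15, 0.05, 0.10],
              [0.05, 0.10, 0.20]])
C = np.array([[0.5, 0.2, 0.3, 0.4, 0.3],
              [0.3, 0.5, 0.4, 0.3, 0.4],
              [0.2, 0.3, 0.3, 0.3, 0.3]])
G = np.array([650.0, 150.0, 200.0, 120.0, -120.0])

B = np.diag(1.0 / (1.0 - A.sum(axis=0)))   # diagonal elements 1/(1 - sum_i a_ij)
H = np.linalg.inv(B) @ np.linalg.inv(np.eye(3) - A) @ C  # elements h_ji

P_x = np.array([1.00, 1.05, 0.95])         # mark-up determined gross output prices
P_y = B @ (np.eye(3) - A.T) @ P_x          # value added prices, P_Y = B(I - A')P_x
P_g = H.T @ P_y                            # expenditure deflators, P_Gi = sum_j h_ji P_Yj

Y = H @ G                                  # values added by sector
print("check P_Y'Y = P_G'G:", round(P_y @ Y, 6), "=", round(P_g @ G, 6))
```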

The integration of input-output analysis with macro models of final demand,
income generation, and monetary relationships is seemingly straightforward and
non-stochastic, to a large extent. This is, however, deceptive because the a_ij from
the inter-industry flow matrix, and the expenditure shares in the final demand
coefficient matrix, are not time-invariant parameters; they are ratios of variables.
This model has been generalized so that production functions are written as

$$X_j = F_j(X_{1j}, X_{2j}, \ldots, X_{nj}; L_j, K_j),$$

and the input-output coefficients

$$a_{ij} = X_{ij}/X_j$$

must be generated from a set of relationships describing behavior of the produc-
ing units of the economy. Similarly, the final demand ratios should be generated
by behavioral relationships in consuming and market trading sectors of the
economy. The main point is that these coefficients should all depend on relative
prices.
The Wharton Model has been estimated for the case in which the functions F_j
are generalized CES functions for the intermediate output flows, while the
original factors L_j and K_j are related to value added production in a
Cobb-Douglas relationship,24 with

σ_j = elasticity of substitution for sector j.

The associated optimization equations for producer behavior equate marginal
products of the intermediate inputs to relative prices.
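A generic form consistent with this description (offered only as an illustrative sketch; the exact Wharton parameterization appears in the Preston reference in footnote 24) is the CES aggregator

$$X_j = \gamma_j \left[ \sum_{i=1}^{n} \delta_{ij} X_{ij}^{-\rho_j} + \delta_{vj} \left( L_j^{\alpha_j} K_j^{1-\alpha_j} \right)^{-\rho_j} \right]^{-1/\rho_j}, \qquad \sigma_j = \frac{1}{1+\rho_j},$$

whose cost-minimizing first-order conditions,

$$\frac{\partial X_j}{\partial X_{ij}} = \frac{P_{xi}}{P_{xj}}, \qquad i = 1, \ldots, n,$$

make the implied coefficients a_ij = X_ij/X_j functions of relative prices with substitution elasticity σ_j.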

24 R. S. Preston, “The Wharton Long Term Model: Input-Output Within the Context of a Macro Forecasting Model,” Econometric Model Performance, ed. by L. R. Klein and E. Burmeister (Philadelphia: University of Pennsylvania Press, 1976), 271-87. In a new generation of this system, the sector production functions are nested CES functions, with separate treatment for energy and non-energy components of X_{ij}.

This system has an implicit restriction that the elasticity of substitution between
pairs of intermediate inputs is invariant for each sector, across input pairs. This
assumption is presently being generalized as indicated in the preceding footnote.
In other models, besides the Wharton Model, different production function
specifications are being used for this kind of work, e.g. translog specifications.
The demand side coefficients of final expenditure have not yet been estimated
in terms of complete systems, but they could be determined as specifications of
complete expenditure systems.25

All these equations are stochastic and dynamic, often with adaptive adjustment
relations.

25 See Theodore Gamaletsos, Forecasting Sectoral Final Demand by a Dynamic Expenditure System (Athens: Center of Planning and Economic Research, 1980), for a generalization of this expenditure system.
