
An Introduction to Bayesian Inference in Econometrics

ARNOLD ZELLNER

Wiley Classics Library Edition Published 1996

A WILEY-INTERSCIENCE PUBLICATION
JOHN WILEY & SONS, INC.
New York · Chichester · Brisbane · Toronto · Singapore

Copyright © 1971 by John Wiley & Sons, Inc. Wiley Classics Library Edition Published 1996. All rights reserved. Published simultaneously in Canada. Reproduction or translation of any part of this work beyond that permitted by Section 107 or 108 of the 1976 United States Copyright Act without the permission of the copyright owner is unlawful. Requests for permission or further information should be addressed to the Permissions Department, John Wiley & Sons, Inc.

Library of Congress Catalogue Card Number: 70-156329
ISBN 0-471-98165-6
ISBN 0-471-16937-4 (Wiley Classics Library Edition)
Printed in the United States of America
10 9 8 7 6 5 4

To Agnes and our sons

Preface

The purpose of this book is to provide readers with an introduction to Bayesian inference in econometrics. An effort has been made to relate the problems of inference in econometrics to the general problems of inference in science and to indicate how the Bayesian approach relates to general problems of scientific inference in Chapter I. In Chapter II some fundamental concepts and operations employed in the Bayesian approach to inference are presented, discussed, and applied in analyses of several simple and important problems. Chapters III through IX are devoted to Bayesian analyses of models often encountered in econometric work, with the main emphasis on estimation. Many comparisons of Bayesian results with sampling theory results are made. In Chapter X I treat the problems of testing and comparing hypotheses, and in Chapter XI I analyze several control problems relating to regression and other processes. In Chapter XII I present some concluding remarks. Appendices A and B provide a résumé of the properties of a number of important univariate and multivariate distributions. In Appendix C univariate and bivariate numerical integration techniques are briefly described.

I have tried to keep the analysis and notation in this book as simple as possible. However, readers are assumed to be familiar with basic concepts and operations of probability theory, differential and integral calculus, and matrix algebra. A background in econometrics and statistics at about the level of A. S. Goldberger's text, Econometric Theory,¹ is desirable for an appreciation of the econometric relevance of the stochastic models considered and for an assessment of comparisons of Bayesian and sampling theory results.

For several years the material in this book has been presented to graduate students in economics and business at the University of Chicago in a course entitled Bayesian Inference in Econometrics. From experience in this course, I found that students not only mastered technical elements of the Bayesian approach but also gained considerable understanding of the properties of both sampling theory and Bayesian approaches to inference and the criteria used to appraise alternative systems of inference. Thus this teaching experience lends substantial support to Lindley's statement that the Bayesian and orthodox "approaches complement each other and together provide a substantially better understanding of statistics than either approach on its own."²

In teaching Bayesian Inference in Econometrics the material in Chapter I can serve as a basis for introducing students to some basic issues in the philosophy and methodology of science. It is desirable for an instructor to relate this material to philosophical and methodological issues in economics and econometrics. The text of the chapter and the questions at its end are designed to encourage students to think about the nature and foundations of science and scientific method so that they may have a better understanding of research in economics and econometrics.

Chapter II provides a résumé of basic concepts and principles of Bayesian analysis along with some simple, but important, applications. Since most of the remaining chapters involve applications of the concepts and principles set forth in Chapter II, it is clearly vital that students master the subject matter of Chapter II. Perhaps the most difficult topics in Chapter II are the role and nature of prior information in analyses of data and the use of probability density functions to represent prior information. These topics require careful and thorough discussion.

Chapters III through IX are in essence technical chapters wherein the principles of Chapter II are applied in analyses of a number of models encountered in economics and econometrics. While the Bayesian principles involved in these analyses are the same, each problem has its peculiar technical features. In mastering technical features, the student is introduced to a number of distributions and operations which are thought to be valuable in a number of different contexts. Also, problems and applications are included which give the analyses relevance to current econometric research. In working some of the problems students will require use of numerical integration computer programs³ which are generally available at computation centers or which can easily be programmed. Experience in using numerical integration programs will be valuable in approaching the analysis of a broad range of applied problems.

Chapter X takes up problems of comparing and testing hypotheses and models. The material in this chapter is in the nature of an introduction and may suggest fruitful areas for further methodological as well as applied work. In Chapter XI some control problems are analyzed, and again there is an opportunity for additional work on methods and on applications. Finally, Chapter XII provides one person's, my own, summary and concluding remarks. Since systems of inference are extremely controversial, it is not to be expected that all will agree with what is presented. Thus, in teaching, Chapter XII can be utilized as a point of departure for developing each student's assessment of the Bayesian approach.

The content of this volume reflects intellectual interaction with a number of individuals. The work of H. Jeffreys has contributed importantly to advancing my understanding of the problems of inference and of elements of the Bayesian approach. In addition, papers and books by G. A. Barnard, G. E. P. Box, I. J. Good, D. V. Lindley, H. Raiffa and R. Schlaifer, and L. J. Savage have had an important influence on me, both with respect to fundamental conceptual problems and technical matters. In regard to current and former colleagues, I have benefited considerably from association with and the writings of G. E. P. Box, A. S. Goldberger, I. Guttman, M. Stone, and G. C. Tiao at the University of Wisconsin, Madison, Wisconsin, during the period 1960-1966. At the University of Chicago J. Drèze, S. J. Press, H. V. Roberts, D. Sharma, and H. Thornber have been stimulating colleagues in connection with the development of the present volume. Among past and current students who have contributed to the current work in the classroom and as research assistants, the following have been particularly helpful: V. K. Chetty, R. V. Cooper, M. S. Geisel, P. M. Laub, C. J. Park, J. F. Richard, N. S. Revankar, U. Sankar, P. A. V. B. Swamy, and H. Thornber.

Much of the research reported in this volume was supported by grants from the National Science Foundation for which I am grateful. Under the initial grant, GS-151, research was carried forward in association with G. C. Tiao. During periods of the first and second renewals of the original grant S. J. Press and H. Thornber participated in the research. Thanks are due to Tiao, Press, and Thornber for their valuable contributions and splendid cooperation in achieving the objectives of the NSF grants. The H. G. B. Alexander Endowment Fund gave salary support to the author since 1966 when he was appointed H. G. B. Alexander Professor of Economics and Statistics in the Graduate School of Business, University of Chicago. Mrs. Shirley Black provided expert assistance in typing and proofing the manuscript for which I express my sincere thanks.

¹ A. S. Goldberger, Econometric Theory. New York: Wiley, 1964.
² D. V. Lindley, Introduction to Probability and Statistics from a Bayesian Viewpoint, Part 2: Inference. Cambridge: Cambridge University Press, 1965, p. 70.
³ See, for example, B. Noble, Numerical Methods, II: Differences, Integration and Differential Equations. New York: Interscience Publishers, Inc., 1964, and Appendix C.

Chicago, Illinois
April 1971
ARNOLD ZELLNER

Contents

I Remarks on Inference in Economics
  1.1 The Unity of Science
  1.2 Deductive Inference
  1.3 Inductive Inference
  1.4 Reductive Inference
  1.5 Jeffreys' Rules for a Theory of Inductive Inference
  1.6 Implications of the Rules
  Questions and Problems

II Principles of Bayesian Analysis with Selected Applications
  2.1 Bayes' Theorem
  2.2 Bayes' Theorem and Several Sets of Data
  2.3 Prior Probability Density Functions
  2.4 Marginal and Conditional Posterior Distributions for Parameters
  2.5 Point Estimates for Parameters
  2.6 Bayesian Intervals and Regions for Parameters
  2.7 Marginal Distribution of the Observations
  2.8 Predictive Probability Density Functions
  2.9 Point Prediction
  2.10 Prediction Regions and Intervals
  2.11 Some Large Sample Properties of Bayesian Posterior Pdf's
  2.12 Application of Principles to Analysis of the Pareto Distribution
  2.13 Application of Principles to Analysis of the Binomial Distribution
  2.14 Reporting the Results of Bayesian Analyses
  Appendix
  Questions and Problems

III The Univariate Normal Linear Regression Model
  3.1 The Simple Univariate Normal Linear Regression Model
    3.1.1 Model and Likelihood Function
    3.1.2 Posterior Pdf's for Parameters with a Diffuse Prior Pdf
    3.1.3 Application to Analysis of the Investment Multiplier
  3.2 The Normal Multiple Regression Model
    3.2.1 Model and Likelihood Function
    3.2.2 Posterior Pdf's for Parameters with a Diffuse Prior Pdf
    3.2.3 Posterior Pdf Based on an Informative Prior Pdf
    3.2.4 Predictive Pdf
    3.2.5 Analysis of Model when X'X is Singular
  Questions and Problems

IV Special Problems in Regression Analysis
  4.1 The Regression Model with Autocorrelated Errors
  4.2 Regressions with Unequal Variances
  4.3 Two Regressions with Some Common Coefficients
  Appendix 1
  Appendix 2
  Questions and Problems

V On Errors in the Variables
  5.1 The Classical EVM: Preliminary Problems
  5.2 Classical EVM: ML Analysis of the Functional Form
  5.3 ML Analysis of Structural Form of the EVM
  5.4 Bayesian Analysis of the Functional Form of the EVM
  5.5 Bayesian Analysis of the Structural Form of the EVM
  5.6 Alternative Assumption about the Incidental Parameters
  Appendix
  Questions and Problems

VI Analysis of Single Equation Nonlinear Models
  6.1 The Box-Cox Analysis of Transformations
  6.2 Constant Elasticity of Substitution (CES) Production Function
  6.3 Generalized Production Functions
  Questions and Problems

VII Time Series Models: Some Selected Examples
  7.1 First Order Normal Autoregressive Process
  7.2 First Order Autoregressive Model with Incomplete Data
  7.3 Analysis of a Second Order Autoregressive Process
  7.4 "Distributed Lag" Models
  7.5 Applications to Consumption Function Estimation
  7.6 Some Generalizations of the Distributed Lag Model
  Appendix
  Questions and Problems

VIII Multivariate Regression Models
  8.1 The Traditional Multivariate Regression Model
  8.2 Predictive Pdf for the Traditional Multivariate Regression Model
  8.3 The Traditional Multivariate Model with Exact Restrictions
  8.4 Traditional Model with an Informative Prior Pdf
  8.5 The "Seemingly Unrelated" Regression Model
  Questions and Problems

IX Simultaneous Equation Econometric Models
  9.1 Fully Recursive Models
  9.2 General Triangular Systems
  9.3 The Concept of Identification in Bayesian Analysis
  9.4 Analysis of Particular Simultaneous Equation Models
  9.5 "Limited Information" Bayesian Analysis
  9.6 Full System Analysis
  9.7 Results of Some Monte Carlo Experiments
    9.7.1 The Model and Its Specifications
    9.7.2 Sampling-Theory Analysis of the Model
    9.7.3 Bayesian Analysis of the Model
    9.7.4 Experimental Results: Point Estimates
    9.7.5 Experimental Results: Confidence Intervals
    9.7.6 Concluding Remarks on the Monte Carlo Experiments
  Questions and Problems

X On Comparing and Testing Hypotheses
  10.1 Posterior Probabilities Associated with Hypotheses
  10.2 Analyzing Hypotheses with Diffuse Prior Pdf's for Parameters
  10.3 Comparing and Testing Hypotheses with Nondiffuse Prior Information
  10.4 Comparing Regression Models
  10.5 Comparing Distributed Lag Models
  Questions and Problems

XI Analysis of Some Control Problems
  11.1 Some Simple One Period Control Problems
  11.2 Single-Period Control of Multiple Regression Processes
  11.3 Control of Multivariate Normal Regression Processes
  11.4 Sensitivity of Control to Form of Loss Function
  11.5 Two-Period Control of the Multiple Regression Model
  11.6 Some Multiperiod Control Problems
  Appendix 1
  Appendix 2
  Questions and Problems

XII Conclusion

Appendix A
Appendix B
Appendix C
Bibliography
Author Index
Subject Index

CHAPTER I

Remarks on Inference in Economics

It is a mistake for a sculptor or a painter to speak or write very often about his job. It releases tension needed for his work. By trying to express his aims with rounded-off logical exactness, he can easily become a theorist whose actual work is only a caged-in exposition of conceptions evolved in terms of logic and words.

Henry Moore¹

What Moore has said about discussing work in art is undoubtedly applicable to discussions of methodology in economics. The importance, however, of knowing what it is we are doing in economic research makes it worthwhile on occasion to reflect on the general foundations of our work.

1.1 THE UNITY OF SCIENCE

The point of view taken herein is that scientific inferences made about economic phenomena are not fundamentally different from inferences made about phenomena in other areas of science. This stress on the unity of science has been elegantly expressed by Karl Pearson in the following words²:

Now this is the peculiarity of scientific method, that when once it has become a habit of mind, that mind converts all facts whatsoever into science. The field of science is unlimited; its material endless, every group of natural phenomena, every phase of social life, every stage of past or present development is material for science. The unity of all science consists alone in its method, not in its material. The man who classifies facts of any kind whatever, who sees their mutual relation and describes their sequences, is applying the scientific method and is a man of science. The facts may belong to the past history of mankind, to the social statistics of our great cities, to the atmosphere of the most distant stars, to the digestive organs of a worm, or to the life of a scarcely visible bacillus. It is not the facts themselves which form science, but the methods by which they are dealt with.

To see that inferences in economic research are not fundamentally different from inferences in other areas of science it is relevant to review the kinds of inferences that are relied on in scientific work.

¹ Henry Moore, "Notes on Sculpture," in B. Ghiselin, ed., The Creative Process. New York: Mentor Books, 1952, p. 73.
² Karl Pearson, The Grammar of Science. London: Everyman, 1938, p. 16.

Aristotle lists three types of inference, namely, deductive, inductive, and reductive (also translated from Greek as "abductive" or "retroductive"). It is important that the nature of these types of inference be set forth clearly to appreciate their role in economic research.

1.2 DEDUCTIVE INFERENCE

On the nature of deductive inference, Reichenbach writes as follows³:

Logical proof is called deduction; the conclusion is obtained by deducing it from other statements, called the premises of the argument. The argument is so constructed that if the premises are true the conclusion must also be true. ... It unwraps, so to speak, the conclusion that was wrapped up in the premises.

There can be no doubt that deductive inference plays an important role in economics. It must be appreciated, however, that deductive inference alone is inadequate to serve as a basis for inference in economics. Primarily, this is the case because, as Jeffreys points out,⁴

Traditional or deductive logic admits only three attitudes to any proposition: definite proof, disproof, or blank ignorance. But no number of previous instances of a rule provide a deductive proof that the rule will hold in a new instance. There is always the formal possibility of an exception.

This will be recognized as a restatement of Hume's views on the impossibility of complete certainty in knowledge; for example, we cannot be completely certain (probability equal to one) on purely deductive or inductive grounds that the sun will rise tomorrow. The fact that an exception to a rule or law is always possible means that deductive logic with its extreme attitudes of definite proof, disproof, or blank ignorance is inadequate to deal with the usual situations faced by researchers, situations in which the researcher requires and produces statements less extreme than those yielded by deductive logic.

Another point made by Jeffreys, which illustrates the inadequacies of deduction to serve as the sole process in research, is the fact that for any given set of data there is usually an infinite number of possible laws that will "explain" the data precisely; for example, suppose we observe the consumption and income of N households and suppose, merely for argument's sake, that the plot of consumption c against income y is exactly linear. We recognize that the same data will also be exactly described by the following infinity of laws:

$$c = \alpha + \beta y + f(y)(y - y_1)(y - y_2)\cdots(y - y_N),$$

where f(y) is any arbitrary function that is not infinite at y = yᵢ, i = 1, 2, ..., N. Further, an infinity of these laws will be contradicted by one additional observation. Now deductive logic alone has nothing to say about which of these laws is the one for the researcher to choose. Some broader principles of choice are required, one of which is the principle of simplicity; that is, given a collection of models which fit the facts equally well, the simplest is chosen. A small numerical sketch of this point follows.

³ Hans Reichenbach, The Rise of Scientific Philosophy. Berkeley: University of California Press, 1958, p. 37.
⁴ Harold Jeffreys, Theory of Probability (3rd ed.). Oxford: Clarendon, 1961, pp. 2-3.
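To make Jeffreys' point concrete, here is a minimal illustration (ours, not the book's; the incomes, the values of α and β, and the choice f(y) ≡ 1 are all assumptions made for the example). Both "laws" reproduce the observed consumption-income pairs exactly, yet they disagree sharply at any new income level:

```python
import numpy as np

# Observed incomes y_1, ..., y_N and exactly linear consumption c = alpha + beta*y.
# All values below are illustrative assumptions, not data from the text.
incomes = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
alpha, beta = 0.5, 0.8
consumption = alpha + beta * incomes

def linear_law(y):
    """The simple law: c = alpha + beta*y."""
    return alpha + beta * y

def augmented_law(y, f=lambda y: 1.0):
    """An alternative law: c = alpha + beta*y + f(y)(y - y_1)...(y - y_N).
    The product term vanishes at every observed income, so the two laws
    cannot be distinguished by the data in hand."""
    return linear_law(y) + f(y) * np.prod(y - incomes)

for y in incomes:                      # identical fits at every observed point
    assert abs(linear_law(y) - augmented_law(y)) < 1e-12

y_new = 6.0                            # one new observation can refute a law
print(linear_law(y_new))               # 5.3
print(augmented_law(y_new))            # 125.3: the product term equals 5! = 120
```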
Abstracting from the obvious problem of defining simplicity, some choose the simplest model because they believe that it will predict best. Others assert that it is worthwhile to consider simple models because, although they may not be of greatest ultimate value, they are valuable, for they make strong statements about phenomena that are readily testable. This facilitates the primary activity of learning from experience. Although no definite conclusions are yet available on these two positions, both involve a prejudice for working with simple models. With respect to the issue of simplicity, the following remarks of W. G. Cochran are relevant and interesting:

About 20 years ago, when asked in a meeting what can be done in observational studies to clarify the step from association to causation, Sir Ronald Fisher replied: "Make your theories elaborate." The reply puzzled me at first, since by Occam's razor the advice usually given is to make theories as simple as is consistent with known data. What Sir Ronald meant, as the subsequent discussion showed, was that when constructing a causal hypothesis one should envisage as many different consequences of its truth as possible, and plan observational studies to discover whether each of these consequences is found to hold.⁵

Thus, although a theory may be simple, it is usually desirable that its implications be far reaching and investigated to evaluate their empirical validity.

In summary, our position is that deductive inference is an important ingredient in scientific inference but that, by itself, it is an inadequate foundation on which to base all inference. This view, of course, conflicts with the view that economics is a purely deductive science. It cannot be denied that there are those in economics and elsewhere who are active in deducing the logical implications of stated assumptions. A prime example of this type of work is Arrow's Social Choice and Individual Values. It must be recognized, however, that work of this type is just part of what is done in economic research. The empirical relevance of such deductive results is a primary issue. To assess this issue the wider process of induction is needed. An apparently opposite view is strongly stated by Popper who writes,⁶ "This appraisal of the hypothesis relies solely upon deductive consequences (predictions) which may be drawn from the hypothesis: There is no need even to mention 'induction.'" [His italics.]

⁵ Quotation from W. G. Cochran, "The Planning of Observational Studies of Human Populations," J. Roy. Statist. Soc., Series A, Part 2, 234-255 (1965).
⁶ Karl R. Popper, The Logic of Scientific Discovery. New York: Science Editions, 1961, p. 315.

The major point to be appreciated in assessing Popper's position is that for him induction is viewed more narrowly than we are viewing it. (Cf. p. 27 ff.) Herein, we follow Jeffreys' point of view, quite at variance with Popper's, that inductive logic is such that deductive logic is encompassed within it. Its statements of proof and disproof are limiting cases of the types of statement yielded by inductive logic. Thus, in our view, inductive logic and deductive logic should not be regarded as mutually exclusive alternatives. In inductive logic deduction plays an important role; however, because inductive logic is broader than deductive logic, the rules for making inductive inferences must be stated and will differ in certain respects from those governing deductive inferences. Further, as regards Popper's statement of the infinite regress argument, namely, that to justify an inductive approach an inductive argument is required which itself needs inductive justification and so on, it seems clear that a deductive approach is open to the same kind of criticism. The best that can be done now, it seems, is to follow Jeffreys by adopting a pragmatic solution, that is, not to prove that induction is valid (since if this could be done deductively, induction would be reduced to deduction, which is impossible), nor to show that induction is valid by empirical generalizations (since in this case the argument would be circular), but to state a priori rules governing inductive logic, accepted independently of experience. Then "induction is the application of the rules to observational data."⁷ Jeffreys further remarks,⁸ "All that can be done is to state a set of hypotheses, as plausible as possible, and see where they lead us." We shall see that these hypotheses or rules for inductive inference encompass many elements of Popper's deductive approach, which is as it should be since herein induction is being viewed as a broader process than deduction, one in fact that includes deductive logic as a special limiting case.

1.3 INDUCTIVE INFERENCE

Jeffreys aptly remarks,⁹

The fundamental problem of scientific progress, and a fundamental one of everyday life, is that of learning from experience. Knowledge obtained in this way is partly merely description of what we have already observed, but part consists of making inferences from past experience to predict future experience. This part may be called generalization or induction. It is the most important part; events that are merely described and have no apparent relation to others may as well be forgotten, and in fact usually are.

Note that for Jeffreys induction is not mere description and inductive generalizations are not economical modes of describing past observations. In fact, he is critical of Mach, who took this latter point of view, because "Mach missed the point that to describe an observation that has not been made yet is not the same thing as to describe one that has been made; consequently he missed the whole problem of induction."¹⁰ Although Jeffreys emphasizes generalization, he is careful not to exclude description as part of the process of learning from experience and of generalization for prediction. In fact, we shall see that unusual facts play an important role in the process of reduction, the third kind of inference. Also, Jeffreys emphasizes¹¹

... that inference from past observations to future ones is not deductive. The observations not yet made may concern events either in the future or simply at places not yet inspected. It is technically called induction. ... There is an element of uncertainty in all inferences of the kind considered.

1.4 REDUCTIVE INFERENCE

This type of inference, also referred to as "abductive" or "retroductive," is the most elusive to define and discuss. Peirce states that induction is the experimental testing of a finished theory; it can never originate any idea whatever.¹² This is a somewhat narrower view of induction than taken by Jeffreys, since Jeffreys includes generalization as part of the inductive process. Jeffreys, however, is vague on the process of generalization. According to Peirce, abduction or reduction suggests that something may be; that is, it involves studying facts and devising theories to explain them. For Peirce and others the link of reduction with the unusual fact is emphasized. Examples in economics are not hard to find. Kuznets' finding that the long-run saving ratio was constant led to reductive activities that resulted in several well-known theoretical explanations.

Although we recognize that unusual and surprising facts often trigger the reductive process to produce new concepts and generalizations, it is still pertinent to probe more deeply into the nature of the process. Here Hadamard's work on discovery in the mathematical field seems particularly relevant. He writes,¹³ "Indeed, it is obvious that invention or discovery, be it in mathematics or anywhere else, takes place by combining ideas." In agreement with Poincaré, Hadamard views the problem of discovery or invention as one of choice among the many possible combinations of ideas. These combinations are formed in both the conscious and unconscious minds.

⁷ Harold Jeffreys, op. cit., p. 8.
⁸ Ibid., p. 8.
⁹ Ibid., p. 1.
¹⁰ Harold Jeffreys, Scientific Inference (2nd ed.). Cambridge: Cambridge University Press, 1957, p. 15.
¹¹ Ibid., p. 13.
¹² See N. R. Hanson, Patterns of Discovery. Cambridge: Cambridge University Press, 1958, p. 85.
¹³ Jacques Hadamard, The Psychology of Invention in the Mathematical Field. New York: Dover, 1945, p. 29.
It is the task of the researcher to avoid choosing useless combinations and to select only those that are useful, usually only a small fraction of the total. In this process choice is "imperatively governed by the sense of scientific beauty."¹⁴ Further, most of this work is done in the unconscious mind. To make this process work, however, the conscious mind plays an important role; it "starts its action and defines, to a greater or lesser extent, the general direction in which that unconscious has to work."¹⁵ It mobilizes ideas initially which lead to a stirring up of ideas previously held and this leads to new combinations. In this respect it is important that the conscious mind not be restricted to narrow lines of thought or held too closely to previous lines of thought. "Thinking aside" leads to a richer variety of ideas from which combinations can be formed. Being in touch with developments in several fields serves the same purpose, and above all, a good deal of hard preparatory work seems to be required for the reductive process to work: "sudden inspirations ... never happen except after some days of voluntary effort which has appeared absolutely fruitless and whence nothing good seems to have come, where the way taken seems totally astray."¹⁶ This, for Hadamard, represents an answer to those who adopt the "chance," "rest," and "forgetting" hypotheses of discovery. The preparatory work period is accompanied and followed by an incubation period and finally by illumination. After that the conscious mind precises the illumination and proceeds to verify it. There are at least two conceptions of this process: (a) a goal being given and how to reach it and (b) discovering a fact and then imagining how it could be useful. "Now, paradoxical as it seems, that second kind of invention is the more general one and becomes more and more so as science advances. Practical application is found by not looking for it, and one can say that the whole progress of civilization rests on that principle."¹⁷

It is clear from the above discussion that many aspects of reduction are not fully understood. Some significant features of the process do emerge, however. First, there is recognition that the process is one of choosing particular combinations of ideas that seem fruitful. In this choice both the unconscious and conscious minds play a role importantly guided by an esthetic sense of scientific beauty. This esthetic sense is indeed a subjective element bearing some relation to the concept of simplicity, albeit not necessarily a one-to-one correspondence. As was stated above in the discussion of the concept of simplicity, there are those who argue that simplicity is to be desired in the choice of combinations of ideas on grounds other than esthetics. Second, in the process of reduction, the conscious mind takes an important part in choosing the general area of investigation and in the period of intensive preparatory work. The preparatory work usually involves observation and experimentation. Observation, that is, a strong interaction with the data of a problem, oftentimes is the key factor in reduction. During this preparatory phase old combinations of ideas are being disturbed, new combinations are formed, and the problem of choice is forced on the investigator. Once a choice is made, the problems of precising and verification must be faced. Since reductive inference is far from being completely understood, fruitful rules governing this kind of inference have not yet been formulated. Ideally, we should like to have useful rules covering both reductive and inductive inference. Lacking them, we review a set of rules just for inductive inference.

¹⁴ Ibid., p. 39.
¹⁵ Ibid., p. 46.
¹⁶ Ibid., p. 45.
¹⁷ Ibid., p. 124.

1.5 JEFFREYS' RULES FOR A THEORY OF INDUCTIVE INFERENCE¹⁸

Remembering that the most important part of induction is generalization from past experience and data to predict still unobserved phenomena, we now review a set of rules put forward by Jeffreys to govern the process of induction.

Rule 1. All hypotheses used must be explicitly stated and the conclusions must follow from the hypotheses.

Rule 2. A theory of induction must be self-consistent; that is, it must not be possible to derive contradictory conclusions from the postulates and any given set of observational data.

Rule 3. Any rule given must be applicable in practice. A definition is useless unless the thing defined can be recognized in terms of the definition when it occurs. The existence of a thing or the estimate of a quantity must not involve an impossible experiment.

Rule 4. A theory of induction must provide explicitly for the possibility that inferences made by it may turn out to be wrong.

Rule 5. A theory of induction must not deny any empirical proposition a priori; any precisely stated empirical proposition must be formally capable of being accepted in the sense of the last rule, given a moderate amount of relevant evidence.

¹⁸ Cf. Jeffreys, Theory of Probability, loc. cit.

Jeffreys regards these five rules as "essential." Rules 1 and 2 impose on inductive logic criteria already required in pure mathematics. We may add that they are usually required in economics as well. The third and fifth rules bring to the fore the distinction between a priori and empirical propositions. Note that the third rule incorporates elements of Bridgman's operationalism and, very importantly, rules out impossible experiments. Finally, Rule 4 makes explicit the distinction between induction and deduction; that is, it imposes on us recognition of the fact that scientific laws may have to be modified or even replaced as new evidence accumulates. However, "we do accept inductive inference in some sense; we have a certain amount of confidence that it will be right in any particular case, though this confidence does not amount to logical certainty."¹⁹

In addition to the five rules stated above, Jeffreys states three more which are "useful guides."

Rule 6. The number of postulates should be reduced to a minimum.

Rule 7. Although we do not regard the human mind as a perfect reasoner, we must accept it as a useful one and the only one available. The theory need not represent actual thought processes in detail but should agree with them in outline.

Rule 8. In view of the greater complexity of induction, we cannot hope to develop it more thoroughly than deduction. We therefore take it as a rule that an objection carries no weight if an analogous objection invalidates part of generally accepted pure mathematics.

Rule 6 is essentially a restatement of Ockham's rule and, as such, is likely to be regarded as acceptable. Rule 7 is indeed important. It means that a theory of induction must agree in outline with thought processes, in particular those that play a role in thinking about and evaluating generalizations or propositions about empirical phenomena. Finally, Rule 8 does not appear objectionable even though it must be recognized that there are disputes about the foundations of pure mathematics.

1.6 IMPLICATIONS OF THE RULES

The eight rules listed in Section 1.5 have important implications for theories of the inductive process. As Jeffreys remarks,²⁰

They rule out ... any definition of probability that attempts to define probability in terms of infinite sets of possible observations, for we cannot in practice make an infinite number of observations. The Venn limit, the hypothetical infinite population of Fisher, and the ensemble of Willard Gibbs are useless to us by Rule 3. ... In fact, no "objective" definition of probability in terms of actual or possible observations, or possible properties of the world, is admissible. For, if we made anything in our fundamental principles depend on observations or on the structure of the world, we should have to say either (1) that the observations we can make, and the structure of the world, are initially unknown; then we cannot know our fundamental principles, and we have no possible starting-point; or (2) that we know a priori something about observations on the structure of the world, and this is illegitimate by Rule 5.

He goes on to explain that "the essence of the present theory is that no probability ... is simply a frequency. The fundamental idea is that of a reasonable degree of belief, which satisfies certain rules of consistency and can in consequence of these rules be formally expressed by numbers. ..."²¹ Thus, in terms of de Finetti's classification of probability theories, Jeffreys' is a subjective theory which attempts to provide consistent procedures for behavior under uncertainty as contrasted with those subjective theories that try to characterize psychological and rational behavior under uncertainty.²²

With probability regarded as representing a degree of reasonable belief rather than a frequency, numerical probabilities can be associated with degrees of confidence that we have in propositions about empirical phenomena, a distinctive feature of the Bayesian approach to inference. As Jeffreys puts it, "... there is a valid primitive idea expressing the degree of confidence that we may reasonably have in a proposition, even though we may not be able to give either a deductive proof or a disproof of it"²³; for example, when considering a particular explanation of an observed event, a researcher may remark that the explanation is "probably true." What he means by the phrase "probably true" is that, based on previous information, studies, and experience, he has a high degree of confidence in the explanation. The Bayesian approach, and Jeffreys' theory in particular, involves a quantification of such phrases as "probably true" or "probably false" by utilizing numerical probabilities to represent degrees of confidence or belief that individuals have in propositions. By using probabilities in this connection, we automatically allow for the possibility that a proposition may not be valid in accord with Rule 4. Also, to the extent that normal thought processes involve associating probabilities with uncertain propositions, we may also state that the formalization of this procedure in the Bayesian approach is in accord with Jeffreys' Rule 7.

Of course, the degree of reasonable belief that we have in a proposition, say a proposition about economic behavior deduced from the permanent income hypothesis, depends on the state of our current information. Therefore, in general, a probability representing a degree of reasonable belief that we have in a proposition is always a conditional probability, conditional on our present state of information. As our information relating to a particular proposition changes, we revise its probability or our belief in it. This process of revising probabilities associated with propositions in the face of new information is the essence of learning from experience.

¹⁹ Ibid., p. 9.
²⁰ Ibid., p. 11.
²¹ Ibid., p. 401.
²² Some classify Jeffreys' theory of probability as "necessary" or even "objective," since he provides procedures which, if adopted, produce identical results for different investigators, given the same data and model. Although this is so, his view of probability as an individual's degree of reasonable belief is a subjective one.
²³ Jeffreys, Theory of Probability, op. cit., p. 15.

We shall see in what follows that the process of revising probabilities representing degrees of belief in propositions to incorporate new information can be made operational and quantitative, in accord with Rule 3, by use of a simple rule of probability theory, namely Bayes' theorem. Schematically the process of revising probabilities, given new data denoted by y, is represented as in Figure 1.1.

[Figure 1.1 The process of revising probabilities, given new data. Boxes (1)-(2): initial information I₀ and prior probability p(H|I₀); boxes (3)-(4): new data y and likelihood function p(y|H); boxes (5)-(6): Bayes' theorem and posterior probability p(H|y, I₀).]

In the upper left-hand boxes, (1) and (2), we indicate that our initial or prior probability associated with a particular proposition H, p(H|I₀), is based on our initial information I₀. This information is generally of various types, usually a combination of information from previous data and studies, theoretical considerations, and casual observation.²⁴ In the lower left-hand boxes, (3) and (4), we show the probability density function (pdf), p(y|H), for the new observation y, given H, the proposition. This pdf is the well-known likelihood function. Then the prior probability p(H|I₀) is combined with the likelihood function p(y|H) by means of Bayes' theorem to yield the posterior probability p(H|y, I₀). The posterior probability p(H|y, I₀) is seen to depend on both the prior information I₀ and the sample information y. In this way we obtain a revision of our initial prior probability p(H|I₀) to reflect the information in our new data y; that is, p(H|I₀) is transformed via Bayes' theorem into p(H|y, I₀).

If we were concerned about a parameter θ, we would use the approach shown in Figure 1.1 but with θ replacing H; that is, in box (2) we would have p(θ|I₀) rather than p(H|I₀), where p(θ|I₀) is a prior pdf for the parameter θ, given our initial information. This prior pdf represents our initial beliefs about the parameter θ based on our initial information I₀. In box (4) we would have p(y|θ), the likelihood function. Then, on combining p(θ|I₀) and p(y|θ) by use of Bayes' theorem, we would obtain the posterior pdf, p(θ|y, I₀), in box (6). This latter pdf incorporates our initial information, as represented in the prior pdf, p(θ|I₀), and our sample information y. The posterior pdf, p(θ|y, I₀), can be employed to make probability statements about θ; for example, to compute the probability that a < θ < b, where a and b are given numbers. This and other uses of posterior pdf's are illustrated in subsequent chapters. At present it is pertinent to emphasize that the posterior pdf represents our beliefs about the parameter θ and incorporates both prior and sample information. As the sample information grows, under very general conditions it will more and more dominate the posterior pdf, which will become more concentrated about the true value of the parameter. In addition, if two individuals have different prior pdf's, perhaps because of their having different initial information, under rather nonrestrictive conditions their posterior pdf's will become similar as they combine additional common data with their respective prior pdf's, for as the data base grows the information it provides will swamp the initial prior information. A small numerical sketch of this updating process is given below.

²⁴ In Chapter 2 we consider prior information in more detail.
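The following short sketch (our illustration; the Bernoulli setup and all numerical values are assumptions, not from the text) traces Figure 1.1 numerically: two investigators with quite different prior probabilities for a proposition H₁ revise them by Bayes' theorem as common data accumulate, and their posterior probabilities converge.

```python
import numpy as np

# Proposition H1: the process generates "successes" with probability 0.7;
# the rival H0 says 0.5. Two investigators hold different priors p(H1 | I0).
rng = np.random.default_rng(1)
data = rng.random(200) < 0.7            # data actually generated under H1

def posterior_h1(prior_h1, y):
    """p(H1 | y, I0) via Bayes' theorem; log-likelihoods avoid underflow."""
    ll1 = np.sum(np.where(y, np.log(0.7), np.log(0.3)))   # log p(y | H1)
    ll0 = y.size * np.log(0.5)                            # log p(y | H0)
    log_posterior_odds = np.log(prior_h1 / (1 - prior_h1)) + ll1 - ll0
    return 1.0 / (1.0 + np.exp(-log_posterior_odds))

for prior in (0.1, 0.9):                # two very different initial beliefs
    revised = [posterior_h1(prior, data[:n]) for n in (0, 10, 50, 200)]
    print([round(p, 3) for p in revised])
# As n grows, both posterior probabilities approach 1: the sample
# information swamps the differing priors.
```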
It is of the utmost importance to realize that the procedure given graphically in Figure 1.1 and described in the text is operational and applicable in the analysis of a wide range of models and problems in econometrics and other areas of science. This is as it should be, since the procedure outlined is central to the inductive process as Jeffreys and others view it. That there is a unified and operational approach to problems of inference in econometrics and other areas of science is a fundamental point that should be appreciated. Whether we analyze, for example, time series, regression, or "simultaneous equation" models, the approach and principles will be the same. This stands in contrast to other approaches to inference that involve special techniques and principles for different problems.²⁵ Since, in the past, most econometricians have employed non-Bayesian techniques in their work, it is useful and interesting to compare Bayesian and non-Bayesian analyses of a range of models and problems. In the following chapters this comparative approach is pursued, since, as Anscombe remarked some years ago about the state of statistics, "A just appreciation of the situation can only be had by studying the orthodox and the Bayesian approaches to a variety of statistical problems, noting what each one does and how well it does it."²⁶

²⁵ B. de Finetti, in a lecture at Frascati in June 1968, used I. J. Good's phrase, "ad hockeries," to describe this aspect of non-Bayesian approaches to inference.
²⁶ F. J. Anscombe, "Bayesian Statistics," Am. Statisn., 15, 21-24 (1961), p. 21. The Reverend Thomas Bayes was an English Presbyterian minister who was born in 1702 and died in 1761. His paper, "An Essay Toward Solving a Problem in the Doctrine of Chances," was published posthumously by Richard Price in Phil. Trans. Roy. Soc. (London), 53 (1763), 370-418, and is reprinted in Biometrika, 45, 293-315 (1958). See also Facsimiles of Two Papers by Bayes (with commentary by W. Edwards Deming). New York: Hafner, 1963.

QUESTIONS AND PROBLEMS

1. What are some examples of simple economic theories or models that have been particularly fruitful?
2. Is it the case that simple theories have been more useful than complicated theories in economics?
3. Are there any problems associated with representing degrees of belief about hypotheses on a unidimensional probability scale?
4. Are economic theories expressed in terms of a few mathematical equations and parameters always simpler than those expressed in terms of a larger number of equations and parameters? (See H. Jeffreys, Theory of Probability (1966 ed.), pp. 47-49, for an interesting discussion of this issue.)
5. In economics what are some instances wherein recognition of an unusual fact or statistical regularity led to the formulation of a new economic theory?
6. What are examples of economic theories that are combinations of ideas and concepts from several different areas of knowledge?
7. With respect to any competing economic theories, for example, the absolute and permanent-income theories of consumer behavior, provide numerical probabilities to represent your degrees of belief in alternative theories. On what kinds of evidence do you base your degrees of belief in competing theories?
8. If, in the process of reduction, a strong interaction with the data of a problem is considered important in formulating a theory, how can it be established that the theory so derived is more than a description of the facts?
9. What accounts for the usually observed fact that different researchers have differing degrees of belief in a particular economic theory or proposition?
10. If different researchers have widely divergent beliefs about the validity of a theory, is this to be construed as evidence against the theory?
11. In what ways can prior beliefs about the nature of economic phenomena condition the design and formulation of a research project, say one to determine the economic effects of public regulation of private utility companies?
12. Can the use of prior knowledge and beliefs in formulating a research project condition and even vitiate the final research results?
13. In the area of economic policy making, what are some examples of policy recommendations that are critically dependent on degree of belief in one or another economic theory?
14. What are some specific cases in which those concerned with economic policy appear to have differing degrees of belief about particular economic theories?

CHAPTER II

Principles of Bayesian Analysis with Selected Applications

In this chapter we review some basic principles and concepts of Bayesian analysis and provide analyses of some relatively simple but important models and problems.

2.1 BAYES' THEOREM
An essential element of the Bayesian approach is Bayes' theorem,¹ also referred to in the literature as the principle of inverse probability. Here we state the theorem for continuous random variables. Let p(y, θ) denote the joint probability density function (pdf)² for a random observation vector y and a parameter vector θ, also considered random. The parameter vector θ may have as its elements coefficients of a model, variances and covariances of disturbance terms, and so on. Then, according to usual operations with pdf's, we have

$$p(y, \theta) = p(y \mid \theta)\,p(\theta) = p(\theta \mid y)\,p(y) \qquad (2.1)$$

and thus

$$p(\theta \mid y) = \frac{p(\theta)\,p(y \mid \theta)}{p(y)}, \qquad (2.2)$$

with p(y) ≠ 0.³ We can write this last expression as follows:

$$p(\theta \mid y) \propto p(\theta)\,p(y \mid \theta) \propto \text{prior pdf} \times \text{likelihood function}, \qquad (2.3)$$

where ∝ denotes proportionality, p(θ|y) is the posterior pdf for the parameter vector θ, given the sample information y, p(θ) is the prior pdf⁴ for the parameter vector θ, and p(y|θ), viewed as a function of θ, is the well-known likelihood function.⁵ Equation 2.3 is a statement of Bayes' theorem, a simple mathematical result in the theory of probability. Note that the joint posterior pdf, p(θ|y), has all the prior and sample information incorporated in it. The prior information enters the posterior pdf via the prior pdf, whereas all the sample information enters via the likelihood function. In this latter connection the "likelihood principle" states that p(y|θ), considered as a function of θ, "... constitutes the entire evidence of the experiment, that is, it tells all that the experiment has to tell."⁶ The posterior pdf is employed in the Bayesian approach to make inferences about parameters. A minimal numerical sketch of (2.3) is given below.

¹ In problems involving "inverse probability" we have given data and from the information in the data try to infer what random process generated them. On the other hand, in problems of "direct probability" we know the random process, including values of its parameters, and from this knowledge make probability statements about outcomes or data produced by the known random process. Problems of statistical estimation are thus seen to be problems in "inverse probability," whereas many gambling problems are problems in "direct probability."
² Here and below we use the symbol p to denote pdf's generally and not one specific pdf. The argument of the function p as well as the context in which it is used will identify the particular pdf being considered.
³ The quantity p(y), the reciprocal of the normalizing constant for the pdf in (2.2), can be written as p(y) = ∫ p(θ) p(y|θ) dθ.
⁴ As noted in Chapter 1, the prior pdf depends on the state of our initial information denoted by I₀. Here, to simplify the notation, we do not show this dependence explicitly; that is, we write p(θ) rather than p(θ|I₀).
⁵ The likelihood function is often written as l(θ|y) to emphasize that it is not a pdf, whereas p(y|θ) is a pdf for the observations given the parameters.
⁶ L. J. Savage, "Subjective Probability and Statistical Practice," in L. J. Savage et al., The Foundations of Statistical Inference. London and New York: Methuen and Wiley, 1962, pp. 9-35, p. 17. Savage presents a discussion of the likelihood principle and provides references to earlier literature.
observations, y'= (y, y.,..., y,), drawn from a normal population with
unknown mean and known variance ' We wish to obtain the posterior pd�
for/. CyO 2. Applying (2.3) to this particular problem, we have (2.4) P(IY,
fro ') oc p()p(Ylt, %'), where p(p]y, 0 ') is the posterior pdf for the parameter
, given the sample information y and the assumed known value %', p() is
the prior pdf for t, and �(y]p, %'), viewed as a function of the unknown
parameter is the likelihood function. The likelihood function is given by or
(2.5) P(Ylm a) = (2o) -n exp 20 . = (y -/)- = (2rr%a)-n/a exp [ 2oa[Vsa+n(l-
l)']], a The quantity �(y), the reciprocal of the normalizing constant for the
pdf in (2.2), can be written as p(y) = J'p(0)�(y10) dO. 4As noted in
Chapter 1, the prior pdf depends on the state of our initial information
denoted by Io. Here, to simplify the notation, we do not show this
dependence explicitly; that is, we write p(0) rather than p(O[1o). The
likelihood function is often written as /(0]y) to emphasize that it is not a
pdf, whereas p(y10) is a pdf for the observations given the parameters. e L.
J. Savage, "Subjective Probability and Statistical Practice," in L. J. Savage
et ai., The Foundations of Statistical Inference. London and New York:
Methuen and Wiley, 1962, pp. 9-35, p. 17. Savage presents a discussion of
the likelihood principle and provides references to earlier literature.
BAYES' THEOREM 15 where v = n - 1, = (l/n) ?_-x yt, the sample mean,
and s a --- (l/v) (y, - )', the sample variance? As regards a prior pdf for , we
assume that our prior information regarding this parameter can be
represented by the following univariate normal pdf: ] 1 exp 2r7,. (v - v,)a ,
(2.6) p() = % where is the prior mean and , is the prior variance, parameters
whose values are assigned by the investigator on the basis of his initial
information. Then, on using Bayes's theorem to combine the likelihood
function in (2.5) and the prior pdf in (2.6), we obtain the following posterior
pdf for : (2.7) P(IY, o ') oc p(tz)p(Ylm o ) from which it is seen that is
normally distributed, a posteriori, with mean (2.8) Ep ct ' + %'/n = (%'/n) -
+ (%)- and variance given by %%o'/n 1 (2.9) Vary) = = ' + %a/n (%'/n) - +
(rr,f') - Note that the posterior mean in (2.8) is a weighted average of the
sample mean g and the prior mean p, with the weights being the reciprocals
of %'/n and %. If we let h0 = (*oa/n) - and h = (%)-, then Ep = (gho + ph)
[(ho + h), where the h's are often referred to as "precision" parameters. Also
we have Var(p)= 1/(h0 + h) from (2.9), and thus the precision 'parameter
associated with the posterior mean is just [Var(p)] - = ho + h, the sum of the
sample and prior precision parameters. To provide some illustrative
numerical results suppose that in Example 2.1 our sample of n = 10
observations is The expression in the exponent in the econd line of (2.5) is
obtained by using the following result: =i (Y )2 = =l [(Y - ) - ( - )] = =i (Y -
)2 + n(- g) with the cross product term .(y- g)- g) disappearing, since Zt. (y-
) = 0.

Observation number      y
 1      0.699
 2      0.320
 3     -0.799
 4     -0.927
 5      0.373
 6     -0.648
 7      1.572
 8     -0.319
 9      2.049
10     -3.077

Sample mean: ȳ = (1/10) Σᵢ₌₁¹⁰ yᵢ = -0.0757

where the yᵢ's are independently drawn from a normal population with unknown mean μ and known variance σ² = σ₀² = 1.00. Assume that our prior information is suitably represented by a normal pdf with prior mean μₐ = -0.0200 and prior variance σₐ² = 2.00. This prior pdf, which is plotted in Figure 2.1, represents our initial beliefs about the unknown parameter μ. On combining this prior pdf with the likelihood function, the posterior pdf is given by the expression in (2.7). For the particular sample shown above, with mean ȳ = -0.0757 and the values of the prior parameters μₐ = -0.02 and σₐ² = 2.00, the mean of the posterior pdf from (2.8) is

$$E(\mu \mid y) = \frac{-0.0757/0.100 - 0.0200/2.00}{1/0.100 + 1/2.00} = -0.0730$$

and its variance from (2.9) is

$$\operatorname{Var}(\mu \mid y) = \frac{1}{1/0.100 + 1/2.00} = 0.0952.$$

For comparison with the prior pdf the posterior pdf is plotted in Figure 2.1. It is seen that combining the information contained in just 10 independent observations with our prior information has resulted in a considerable reduction in our uncertainty about the parameter μ; that is, our prior variance is σₐ² = 2.00, whereas the variance of our posterior pdf is 0.0952. In addition, our posterior mean E(μ|y) = -0.0730 is not very different from ȳ = -0.0757, the sample mean, but is quite a bit larger in absolute value than our prior mean, μₐ = -0.0200. Note, however, that our prior pdf has a substantial variance, σₐ² = 2.00, and thus initially there is substantial probability density in the vicinity of -0.0730; that is, in this case our prior information is somewhat "vague" or "diffuse" in relation to the information in the sample.

[Figure 2.1 Plots of prior and posterior pdf's for μ. Prior pdf: mean μₐ = -0.02, variance σₐ² = 2.00. Posterior pdf: mean E(μ|y) = -0.0730, variance Var(μ|y) = 0.0952. The prior and posterior pdf's are those shown in (2.6) and (2.7), respectively.]
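The posterior mean and variance just reported follow directly from (2.8) and (2.9); a short check (the observations and prior values are those of the illustration above) can be run as a sketch:

```python
import numpy as np

# Example 2.1, numerical illustration: n = 10, known variance σ0² = 1.00,
# prior mean μa = -0.0200, prior variance σa² = 2.00.
y = np.array([0.699, 0.320, -0.799, -0.927, 0.373,
              -0.648, 1.572, -0.319, 2.049, -3.077])
sigma0_sq, mu_a, sigma_a_sq = 1.00, -0.0200, 2.00

h0 = len(y) / sigma0_sq              # sample precision (σ0²/n)⁻¹ = 10.0
ha = 1.0 / sigma_a_sq                # prior precision (σa²)⁻¹ = 0.5
post_mean = (h0 * y.mean() + ha * mu_a) / (h0 + ha)   # eq. (2.8)
post_var = 1.0 / (h0 + ha)                            # eq. (2.9)

print(f"{y.mean():.4f}")    # -0.0757, the sample mean
print(f"{post_mean:.4f}")   # -0.0730, the posterior mean
print(f"{post_var:.4f}")    #  0.0952, the posterior variance
```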
2.2 BAYES' THEOREM AND SEVERAL SETS OF DATA

If initially our prior pdf for a parameter vector θ is p(θ) and we obtain a set of data y₁ with pdf p(y₁|θ), then from (2.3) the posterior pdf is

$$p(\theta \mid y_1) \propto p(\theta)\,p(y_1 \mid \theta). \qquad (2.10)$$

If we now obtain a new set of data, y₂, generated independently of the first set, with pdf p(y₂|θ), we can form the posterior pdf for θ as follows. Use the posterior pdf in (2.10) as the prior pdf in the analysis of the new set of data y₂ to obtain by means of Bayes' theorem

$$p(\theta \mid y_1, y_2) \propto p(\theta \mid y_1)\,p(y_2 \mid \theta), \qquad (2.11)$$

where p(θ|y₁, y₂) is the posterior pdf based on the information in p(θ) and the two samples of data y₁ and y₂. It is interesting to note that, since p(θ|y₁) ∝ p(θ) p(y₁|θ) from (2.10), (2.11) may be written as

$$p(\theta \mid y_1, y_2) \propto p(\theta)\,p(y_1 \mid \theta)\,p(y_2 \mid \theta). \qquad (2.12)$$

In (2.12) p(y₁|θ) p(y₂|θ) is the likelihood function for θ based on the combined samples y₁ and y₂. Therefore it is the case that we obtain the same posterior pdf for θ, whether we proceed sequentially from p(θ) to p(θ|y₁) and then to p(θ|y₁, y₂) or whether we use the likelihood function for the combined samples p(y₁, y₂|θ) in conjunction with the prior pdf p(θ). This general feature of the process of combining information in a prior pdf with information in successive samples can easily be shown to hold for cases involving more than two independent samples of data.
PRIOR PROBABILITY DENSITY FUNCTIONS The prior pdf, denoted
p(0) in (2.3), 8 represents our prior information about parameters of a
model; that is, in the Bayesian approach prior information about parameters
of models is usually represented by an appropriately chosen pdf. In
Example 2.1, for example, prior information about a mean p is represented
in (2.6) by a normal pdf with prior mean t and variance rr '. The prior mean
and variance/ and a ' are assigned values by the investigator in accord with
his prior information about the parameter/. If this normal prior pdf is judged
an adequate representation of the available prior informa- tion, it can be
used, as demonstrated above, to obtain the posterior pdf for /. On the other
hand, if the prior information is not adequately represented by a normal
prior pdf, another prior pdf that does so will be used by the investigator. To
take a specific example, if we have a scalar parameter O, say a proportion,
that by its very nature is limited to the closed interval 0 to 1, it would not be
appropriate to employ a normal prior pdf for O, since a normal pdf does not
limit the range of O to the closed interval 0 to 1. The pdf chosen for O
should be one, say possibly a beta pdf, that can incorporate the available
information on the range of 0. Considerations of this sort point up the
importance of exercising care and thought in choosing a prior pdf to
represent prior information. As regards the nature of prior information, we
recognize that it may be information contained in samples of past data
which have been generated in a reasonably scientific manner and which are
available for further analysis. When a prior pdf represents information of
this kind, we shall term such a prior pdf a "data-based" (DB) prior. In other
cases prior information may arise from introspection, casual observation, or
theoretical considerations; that is, from sources other than currently
available samples of past data of the kind described above. When a prior
pdf represents information of this kind, we refer to it as a "nondata-based"
(NDB) prior. Although in many 8 In general, the pdf p(0) will involve some
prior parameters that we have not shown explicitly in order to simplify the
notation. PRIOR PROBABILITY DENSITY FUNCTIONS 19 situations
prior pdf's represent both DB and NDB information, we think that the
distinction between these two kinds of information is worth making, since
they obviously have somewhat different characteristics. It is extremely
difficult to formulate general precepts regarding the appro- priate uses of
the two kinds of prior information mentioned above, since much depends
on the objectives of analyses; for example, if an individual wishes to
determine how new sample information modifies his own beliefs about
parameters of a model and his initial information is NDB, he will, of
course, use a NDB prior pdf in conjunction with a likelihood' function to
obtain a posterior pdf. Then, on comparing his posterior pdf with his NDB
prior pdf, he can determine how the information in his sample data has
modified his initial NDB beliefs, a fundamental operation in much scientific
work. Again, if an economist is carrying through an analysis of sample data
in order to make a'policy decision, he may indeed incorporate NDB as well
as DB prior information in his analysis to ensure that his final decision will
be based on all his available information, prior and sample. Although the
above uses of NDB prior information are extremely valuable, it must be
noted that one person's NDB prior information can differ from that of
another's. In a research situation this is just another way of stating that
different investigators may have different views, a not unusual state of
affairs; for example, in the early days of Keynesian employment theory
there were some old line quantity theorists who argued that the investment
multi- plier could be negative, zero, or positive. These views conflicted with
those of the Keynesians, who argued, on the basis of theoretical
considerations and casual observation, that the multiplier is strictly positive.
Given a model for observations involving the multiplier and data, it is
possible to compute a posterior pdf for the investment multiplier to
determine what the information in the data has to say about the value of the
multiplier. An analysis of this sort might yield the conclusion that the
probability that the multiplier will be negative is negligibly small. Thus
information in data can be employed to make comparisons of alternative
prior beliefs or hypotheses. Specific tech- niques for making such
comparisons are provided in Chapter 10. In addition, a framework for
making a choice among alternative conflicting beliefs or hypotheses which
utilizes information in sample data is described and applied. It.is possible
that two investigators working with the same model and DB prior
information can arrive at different posterior beliefs if they base their prior
information on different bodies of past data. These investigators can be
brought into agreement by pooling their past samples of data and thereby
providing them with the same DB prior information. Whether prior
information is DB or NDB, it is conceivable that there is little prior
information; for example, there may be no past sample data

20 PRINCIPLES OF BAYESIAN ANALYSIS WITH SELECTED


APPLICATIONS information available. A situation involving NDB prior
information may be one in which an investigator has vague ideas about the
phenomenon under study and in which case we refer to our prior
information as "vague" or "diffuse." If our prior information relates to
parameters of a model and is vague or diffuse, we employ a "diffuse" prior
pdf in the analysis of our data. Various considerations and principles used in
obtaining "diffuse" prior pdf's are discussed in the appendix to this chapter.
To illustr(te the use of a diffuse prior pdf consider the following example. c
) Example 2.2. Consider n independent observatiorl,s.. y'= (y,y.,...,yn)
drawn from a normal population with unknown mean/ and known standard
deviation r = %. Assume that our prior information regarding the value of /
is vague or diffuse. To represent knowing little about the value of/, we
follow Jeffreys (see the appendix to this chapter) by taking (2.13) p(t) oc
constant as our prior pdf. 9 Then the posterior pdf for/, p([y, e = e0) is given
by (2.'14) P(IY, = oc = cc exp - where l(/[y, = e0) is the likelihood function
and fi = _-x y/n, the sample mean. It is seen that the posterior pdf is normal
with mean/2 and variance %2/n. This same result would be obtained in
Example 2.1 if there we spread out the normal prior pdf for (i.e., allowed e -
co). When we have NDB prior information that we wish to incorporate in
an analysis, the problem of choosing a prior pdf to represent the available
prior information must be faced. Ideally, we should like to have a prior pdf
represent our prior information as accurately as possible and yet be
relatively simple so that mathematical operations can be performed
conveniently; for example, in Example 2.1 our prior information about a
mean was assumed to be adequately represented by the normal pdf in (2.6),
which is relatively simple and mathematically convenient. We shall see
from what follows that 0 This prior pdf is improper; that is I_o P(t0 dr* is
not finite. Jeffreys and others make extensive use of improper prior pdf's to
represent "knowing little." Jeffreys remarks in his Theory of Probability
(3rd ed.). Oxford: Clarendon, 196 I, p. 1 ! 9, that use of improper pdf's
poses no difficulty, and in fact Renyi's axioms and his accompanying
definition of conditional probability can be used to state Bayes' theorem
when improper prior pdf's are employed. See D. V. LindIcy, Introduction to
Probability and Statistics from a Bayesian Viewpoint. Part 1. Probability.
Cambridge: Cambridge University Press, 1965, pp. 11, 13, for a brief
discussion of this point. MARGINAL AND CONDITIONAL POSTERIOR
DISTRIBUTIONS FOR PARAMETERS 21 (2.6) is an example of a
"natural conjugate" prior pdf. x� Such prior pdf's are often useful in
representing prior information, relatively simple, and mathe- matically
tractable. We now explain the definition of a natural conjugate prior pdf. Let
p(y[0, n) be the pdf for an n x 1 vector of observations y, where 0 is a
parameter vector. If p(y[0, n) = p(t10, n) p.(y), where t' = (tx, t.,..., t), with tt
= t(y) a function of the observations and p9.(y) does not depend on 0, then
the t's are defined as sufficient statistics. A natural conjugate prior pdf for 0,
say f(01 .), is given by f(0[ .) oc p(t10, n), with the factor of proportion-
ality depending on t and n but not 0. It is seen that f(01 '), defined in this
way, has a functional form precisely the same as that as p(t10, n); however,
the argument off is 0 and its k + 1 parameters are the elements of t and n. To
represent prior information an investigator assigns values to t and n, say to
and no, to obtain f(0[t0, no) oc p(to10, no) as his informative prior pdf. ' As
an example of a natural conjugate prior pdf, consider the data in Example
2.2 with rr -- 1. We have p(yl/, n) = (x/)-" exp [-� Y.-- (y -/)'] =
(V)_,exp{_�[( n _ 1)s o. + n(tz _ fi)o.]}, where = =y/n and (n- 1)s �'= Y2-
-x (Y-/2) 2. Then we can write p(yim n)- px(lmn)po.(y), with pxlp, n)--exp
and p.(y)= (X/) -" exp I-(n- 1)f/2]. Clearly/2 is a sufficient statistic for p,
and the natural conjugate prior pdf for p, f(Pl'), is: f(pl/2o, no) = c exp [-
(no/2)(p - o)�'], with' c = nX/no/2, a normal pdf with prior mean/20 and
prior variance 1/no. To use this prior pdf an investigator should check that it
adequately represents his prior informa- tion and, if it does, supply values
for its parameters rio and no. 2.4 MARGINAL AND CONDITIONAL
POSTERIOR DISTRIBUTIONS FOR PARAMETERS As with usual joint
pdf's, marginal and conditional pdf's can be obtained from a joint posterior
pdf; for example, let 0 be partitioned as 0' = (0' i 0') and suppose that we
want the marginal posterior pdf for 0 which may x0 See H. Raiffa and R.
Schlaifer, Applied Statistical Decision Theory. Boston: Graduate School of
Business Administration, Harvard University, 1961, Chapter 3, for a
detailed discussion of natural conjugate prior pdf's. xx See, for example,
Lindley, Introduction to Probability and Statistics from a Bayesian
Viewpoint. Part 2. Inference. Cambridge: Cambridge University Press,
1965, pp. 46 if. for further discussion of sufficient statistics. xa Note that
when p(y10, n) = px(t10, n) pa(y) and p(0) is a prior pdf for 0, the posterior
pdf, p(0ly, n) is p(0ly, n) oc p(0)p(yl0, n) oc p(0)px(t[0, n). As explained in
Section 2.11, for large n, p(01y, n) is approximately proportional to px(t10,
n) under rather general conditions. Thus in large samples the posterior pdf
assumes the form of px(t10, n), which is also the form of the natural
conjugate prior pdf.

22 PRINCIPLES OF BAYESIAN ANALYSIS WITH SELECTED


APPLICATIONS contain one or several elements of 0. This marginal
posterior pdf, is readily obtained as follows' (2.15a) p(Oly ) =/ p(o, o.ly)
doe 02 (2.15b) = j',, p(01102, y) ;,(021y) do2, 02 where R0 denotes the
region of 0. and p(0x10o., y) is the conditional posterior pdf for 0, given 0.
and the sample information y. Equation 2.15b illustrates the fact that the
marginal posterior pdf for 0x may be viewed as an averaging of conditional
posterior pdf's, p(0]0., y), with the weight function being the marginal
posterior pdf for 0., p(0o. ly). The integration shown in (2.15) provides an
extremely useful way of getting rid of "nuisance" parameters, that is
parameters that are not of special interest. Example 2.3. Assume that we
have n independent observations, y'= (y, yo.,..., Yn), from a normal
population with unknown mean t and unknown standard deviation *. If
our_p_.ri9r information about values of the mean and standard deviation is
vague or diffuse, we can represent this state of our initial information by
taking our prior pdf as In (2.16) we have assumed t and e to be
independently distributed, a priori, with t and log a each uniformly
distributed [see the appendix at the end of this chapter for further discussion
of (2.16)]. Then the joint posterior pdf for t and e is (2.17) p(, plY) ca P(,
e)l(, plY), 1 [vs o. + n(/- ca e -(n+) exp where l(t, plY) ca e- exp [- 1/2e �'
Y2= (Y - t) �'] is the likelihood function, v=n-1, = Y2_-xydn, and vs �'=
y�_(y{-)'. From the form of (2.17) it is clear that the conditional posterior
pdf for t given e and the sample information, that is, p(lla, y), is in the
univariate normal form with conditional posterior mean and variance E(vle,
y) -- and Var(vI e, y) -- e�'/n, respectively. Although these conditional
results are of interest, it is clear that the conditional pdf for t given e and y
depends critically on e whose value is unknown. If we are mainly interested
in v, e is a nuisance parameter MARGINAL AND CONDITIONAL
POSTERIOR DISTRIBUTIONS FOR PARAMETERS 23 and, as stated
above, such a parameter can generally be integrated out of the posterior pdf.
In the present instance we have TM (2.18) o o p(m plY) do 1 [vs o. e -('+ x>
exp oc {rs ' + n(/ - �)'}-(+ x)/.. + n(/ -/)o.1} de From (2.18) it is seen that
the marginal posterior pdf for t is in the form of a univariate Student t pdf
with mean p; that is, the random variable has a Student t pdf with v = n - 1
degrees of freedom. If the parameter e is of interest, we can integrate P(I,
PlY) in (2.17) with respect to t to obtain the marginal posterior pdf for e,
namely, s (2.19) p(ely) = ? p(m ply) ca e -(+> exp --], This posterior pdf for
e is in the "inverted gamma" form (see Appendix A) and will be proper for
v > 0. Further, from properties of (2.19) we have I'(,,/2) s for v > 1 and vS 2
Var(ely ) _- [E(ely)F' for v > 2. The modal value of the posterior pdf in
(2.19) is sx/V/(v + 1). a Note that I .-(n+) exp (--a/2( a) de = 2(n-
a>aP(n/2)/a nta. This result is easily obtained by letting x = a/2( �'. Then
the integral becomes (2 {n - "'/a')I x - a>tae- x dx = 2(n-a>taF(n/2)/a nta,
where I' denotes the gamma function. In using this result in (2.18), a = vs
�' + n(t - ) and the factor 2(n-�'>taF(n/2) is absorbed in the factor of
proportionality. See Appendix A at the end of the book for properties of this
pdf. xe In integrating (2.17) with respect to/, note that, for given a, (2.17) is
in the univariate normal form. Letting z = x/(V- )/c, dz oc dt/(r, and thus
(2.17) becomes 1/a n exp (-vsa/2a ) exp (-za/2) de dz from which (2.19)
follows.

24 PRINCIPLES OF BAYESIAN ANALYSIS WITH SELECTED


APPLICATIONS 2.5 POINT ESTIMATES FOR PARAMETERS From
Section 2.3 above it is seen that the Bayesian approach yields the complete
posterior pdf for the parameter vector 0. If we wish, we may characterize
this distribution in terms of a small number of measures, say measures of
central tendency, dispersion, and skewness, with a measure of central
tendency to serve as a point estimate. The problem of choosing a single
measure of central tendency is a well-known problem in descriptive
statistics. In some circumstances we may have a loss function, say L = L(0,
), where = (y) is a point estimate depending on the given sample
observations y' = (y,..., Yn). Since $ is considered random, L is random.
One generally used principle, which generates point estimates and which is
in accord with the expected utility hypothesis, is to find the value of that
minimizes the mathematical expectation of the loss function; that is (2.20)
min EL(O, 0) = min fR L(0, 0) p(0ly ) dO, which assumes that EL(O, fi) is
finite and that a minimum exists. As an important illustration of (2.20),
consider the case of a quadratic loss function, L = (0 - )'C(O - ), where C is
a known nonstochastic positive definite symmetric matrix. Then the
posterior expectation of the quadratic loss function is 6 (2.21) E� = E(0 -
)'C(0 - ) = r[(0 - r0) - ( - r0)]'C[(0 - r0) - ( - = r(0 - r0)'C(0 - r0) + ( - r0)'C(0
- r0). The first term of this last expression does not involve . The remaining
term (- EO)'C(- EO) is nonstochastic and will be minimized if we take =
E0, given that C is positive definite. Thus for positive definite quadratic loss
functions the mean E0 of the posterior pdf p(01y), if it exists, is an optimal
point estimate. For other loss functions similar analysis can be performed to
yield optimal point estimates. Example 2.4. Consider Example 2.1 when
our loss function is L, fi) = c(p - t) 2, where t; is a point estimate, and c is a
positive constant. Then, taking t = EI = (hog + h,a%)/(ho + h,,), the mean of
the.posterior pdf for t will minimize EL -- cE(t - fi)o.. x6 In the second line
of (2.21) the posterior mean E0 has been subtracted from 0 and added to �
which does not affect the value of EL. In going from the second line of
(2.21) to the third, the cross terms disappear; that is E(0- E0)'C(�- E0)= 0
since E(O - EO) = O. POINT ESTIMATES FOR PARAMETERS 25
Example 2.5. Suppose that our loss function is L = 10 - 01 and the posterior
pdf for 0 is a proper continuous pdf, p(0[y), with a < 0 < b where a and b
are known. Then the point estimate 0 which minimizes expected loss can be
found as follows: I 0 - OI p(01y) dO (0 - 0) p(01y) dO + I; (0 - O) p(0ly) dO
=OP(Oly) - ft o p(oly) dO + fb � Op (Oly)dO-0[,-P(OlY)l, where P(OIy) --
j'o p(Oly ) dO is the cumulative posterior distribution function. Then, on
differentiation 7 with respect to O and setting the derivative equal to zero,
we have dEL dO = (0IY) - 1 + (0ly) -- o or J>(01y) = �. The 0 which
satisfies this necessary condition for a minimum is the median of the
posterior pdf. That this value for 0 produces a minimum of EL can be
established by noting that d'EL/dO ' is strictly positive for 0 = median of the
posterior pdf. Thus for the absolute error function L = [0 - 0[ the median of
the posterior pdf is an optimal point estimate. Next, we review a
relationship between Bayesian and sampling theory approaches to point
estimation. Let � = �(y) be a sampling theory esti- mator. 6 The risk
function associated with the estimator � is given by (2.22) r(0) = fR L(0,
�) p(y10) dy, y where L(0, it) is a loss function, p(y10) is a proper pdf for
y, given 0, and the integral in (2.22) is assumed to converge. As indicated
explicitly in (2.22), the risk function depends on the value of the unknown
parameter vector 0. Since it is impossible to find a which minimizes r(0) for
all possible values of 0, TM 7 It is assumed that the needed derivatives exist
for a < 0 < b. 8 As is well known, the term "estimator" indicates that = (y) is
regarded as a random quantity. x* For example, if we take $ = b, a vector of
constants, this "estimator" will have smaller risk when 0 = b than any other
estimator and thus no single estimator can minimize r(0) for all 0.

26 PRINCIPLES OF BAYESIAN ANALYSIS WITH SELECTED


APPLICATIONS we shall seek the estimator that minimizes average risk
when average risk is defined by (2.23) Er(O) = fR p(O) r(O) dO. o In
(2.23)p(0) is a "weighting function" used to weight the performance of �,
an estimator, in regions of the parameter space. Then our problem is to find
the estimator that minimizes average risk, that is, that solves the following
problem: (2.24) min Er(O) = min fR f p(0) L(0, �) p(y10) dy dO. � � o y
Given that the integrand of (2.24) is non-negative, we can interchange the
order of integration and, using p(0)p(yl 0) --p(y) p(01y), write (2.24) as
(2.25) min Er(0) = min f [f L(0, �)p(01y)dO] p(y) dy. � y o The � that
minimizes the expression in square brackets will minimize expected risk,
provided that Er(O) is finite, and this estimator is, by definition, the Bayes
estimator. '� Therefore, if a specification is made for the seriousness of
estimation errors in the form of a loss function, L(0, $), and for the
weighting of parameter values over which good performance is sought by a
choice of p(0), then on an average risk criterion the Bayesian estimator
gives the best performance in repeated sampling. a 9.0 When the double
integral in (2.25) converges, and thus Er(O) is finite, the solving the
minimization problem in (2.25) will also be a solution to the minimization
problem in (2.20). If the double integral in (2.25) diverges, however, the
minimization problem in (2.25) will have no solution, but still a solution to
the problem in (2.20) often exists. When this is the situation, the solution to
the minimization problem in (2.20) has been called a quasi-Bayesian
estimator. Quasi-Bayesian estimators often arise when improper diffuse
prior pdf's are employed along with usual loss functions, for example,
quadra loss functions. For further discussion of this point see H. Thornber,
"Applications Decision Theory to Econometrics," unpublished doctoral
dissertation, University of Chicago, 1966, and M. Stone, "Generalized
Bayes Decision Functions, Admissibility and the Exponential Family," Ann.
Math. Statist., 38, 818-822 (1967). 2x The relevance of the criterion of
performance in repeated samples is questioned by some. They want an
estimate that is appropriate for the given sample data and thus will solve the
problem in (2.20) which involves no averaging over the sample space Ry.
When the solution to (2.20) is identical to the solution of (2.24), as it often
is, this consideration makes no prac.t. ical difference. On the other hand,
many sampling theorists object to the introduction of the "weighting
function" (prior pdf)p(0) and therefore do not attach much importance to
the minimal average risk property of Bayesian estimators. BAYESIAN
INTERVALS AND REGIONS FOR PARAMETERS 27 2.6 BAYESIAN
INTERVALS AND REGIONS FOR PARAMETERS Given that the
posterior pdfp(01Y) has been obtained, it is generally possible to compute
the probability that the parameter vector 0 lies in a particular subregion, , of
the parameter space as follows' (2.26) Pr(0 e -IY) = fP(0lY) dO. The
probability in (2.26) measures the degree of belief that 0 e given the sample
and prior information. If we fix the probability in (2.26), say at 0.95, it is
generally possible to find a region (or interval) , not necessarily unique,
such that (2.26) holds. In many important problems with unimodal posterior
pdf's, it is possible to obtain a unique region (or interval) by imposing the
conditions that its probability content be/g, say/g -- 0.95, and that the
posterior pdf's values over the region or interval be not less than those
relating to any other region with the same probability content; for example,
for unimodal symmetric posterior pdf's the region or interval with given
probability content/g, which is centered at the modal value of the posterior
pdf is the Bayesian "highest posterior density" region or interval. '' Example
2.6. Consider Example 2.3 in which it was found that the posterior pdf of (t
- i)/s', where s' = s/V' is a Student t pdf with v = n - 1 degrees of freedom.
Thus the probability that t will lie in a particular interval, say + ks', with k
given, can easily be evaluated by using tables of the Student t distribution.
'a Alternatively, k can be determined so that the posterior probability that -
ks' < tz < + ks' is a given value, say/ = 0.90. The interval so obtained, P. +_
ks', is numerically exactly the same as a sampling theory confidence
interval but is given an entirely different interpretation in o.o. See G. E. P.
Box and G. C. Tiao, "Multiparameter Problems from a Bayesian Point of
View," Ann. Math. Statist., 36, 1468-1482 (1965), for further discussion of
"highest posterior density" Bayesian regions. In general, if we seek a
"highest" interval with probability content 18 for a unimodal pdf, p(x), it
can be obtained by solving the following problem: minimize (b- a) subject
to p(x)dx = 18. On differentiating b- a + [ p(x) dx - 18], where ; is a
Lagrange multiplier, partially with respect to a and b and setting these
derivatives equal to zero, yields 1 + p(a) = 0 and 1 + p(b) = 0, and thus a
and b must be such that p(a) = p(b) for these necessary conditions to be
satisfied. Determining a and b such that p(x)dx = 18 with p(a) = p(b) leads
to a shortest interval with probability content 18, and this interval will be a
"highest" interval given that p(x) is unimodal. In the example above in
which z is a standardized normal variable p(z) is unimodal and symmetric
about zero. Thus taking a = -z and b = z satisfies the condition p(a) = p(b).
o.a See, for example, N. V. Smirnov, Tables for the Distribution and Density
Function of t. Distribution. New York: Pergamon, 1961.

28 PRINCIPLES OF BAYESIAN ANALYSIS WITH SELECTED


APPLICATIONS the Bayesian approach. As is well known, the sampling
theorist regards his interval as random and having probability/ = 0.90 of
covering the true value of the parameter. For the Bayesian whose work is
conditional on the sample observations the interval +_ ks' is regarded as
given and his state- ment is that the posterior probability that Ix will lie in
the interval is/ = 0.90. Note that the probability statements being made by
the sampling theorist and the Bayesian are not identical. 2.7 MARGINAL
DISTRIBUTION OF THE OBSERVATIONS In certain instances it is of
interest to obtain the marginal pdf for the observations, denoted by p(y).
This pdf can be obtained as follows: (2.27) P(Y) = fo p(o, y) do p(yl o) p(o)
dO. The second line of (2.27) indicates that the marginal pdf of the
observations is an average of the conditional pdfp(yl0 ) with the prior
pdfp(0) serving as the weighting function. Example 2.7. Let yx be an
observation from a normal distribution with unknown mean Ix and known
standard deviation a = %. Then PREDICTIVE PROBABILITY DENSITY
FUNCTIONS 29 Thus the marginal pdf for y is normal with mean Ix., the
prior mean for Ix, and variance a. ' + ao '. Since Ix., a. ', and ao ' are
assumed known, it is possible to use P(YO to make probability statements
about y, a fact that is often useful before yx is observed. 2.8 PREDICTIVE
PROBABILITY DENSITY FUNCTIONS On many occasions, given our
sample information y, we are interested in making inferences about other
observations that are still unobserved, one part of the problem of prediction.
In the Bayesian approach the pdf for the as yet unobserved observations,
given our sample information, can be obtained and is known as the
predictive pdf; for example, let y represent a vector of as yet unobserved
observations. Write (2.28) p(y, 01y) -- p(y[0, y) p(0ly ) as the joint pdf for
and a parameter vector 0, given the sample information y. On the right of
(2.28) p(y[0, y) is the conditional pdf for y, given 0 and y, whereas p(0[y) is
the conditional pdf for 0 given y, that is, the posterior pdf for 0. To obtain
the predictive pdf, P(YlY), we merely integrate (2.28) with respect to 0; that
is p(YlY) = f p(Y, Oly) dO o (2.29) 1 exp go �' (y _ Ix)o.. -- p([0, y)p(0ly)
dO. p(y[ix, a = ao) = ao o If the prior pdf for Ix is p(ix)= (x/,,)-exp [-(2a,.)-
(ix- Ixf], -oo < Ix < c, where Ix and tr are the prior mean and standard
deviation, respectively, the marginal pdf for y is P(YO = , p(y [ix, a = %)
p(ix) dix 1 .-'o. Ix)o. + (IX _ 71 -- exp ao a. I} aIx' On completing the
square for Ix in the exponent and performing the integra- tion, 2 the result is
exp [- (yx - Ix,)o. ]. 2(a, �' + ,o')/ 1 P(Yx) = x/2r(aff + %o.) 2 A less
tedious way to derive P(YO is to write yx = + e with e, a scalar random
variable normally distributed and independent of , with zero mean and
variance %2. Then the mean of yx is ., the mean of , and the variance of yx
is %2 + .oO.. Since yx is linearly related to and {, it will have a normal pdf.
The second line of (2.29) indicates that the predictive pdf can be viewed as
an average of conditional predictive pdf's, p(]0, y), with the posterior pdf
for 0, p(0ly) serving as the weighting function. Example 2.8. In Example
2.2 we had n independent observations y'= (yx, yo.,..., y) from a normal
population with unknown mean Ix and known standard deviation a = %.
With diffuse prior information about Ix, the posterior pdf [see (2.14)] was
found to be normal with mean , the sample mean, and variance ao�'/n. We
now wish to obtain the predictive pdf for a new observation, say . + x which
has not yet been observed. The two factors in the integrand of the second
line of (2.29) are [' ] P(Y+IIX, ' = 'o, Y) cc exp 2/}f' (Y+x - Ix)' and from
(2.14) p(ix[ = ao, y) oc exp [ n ] 2%. (IX - p,)' , -oo < ix <

30 PRINCIPLES OF BAYESIAN ANALYSIS WITH SELECTED


APPLICATIONS Then from (2.29) (2.30) p(Y = y) = y) exp [ 1 2ao o.
[(y,,+x - /x) �' + n(/x - g)o.]] On completing the square on t in this last
expression as and integrating (2.30) with respect to t, the predictive pdf for
.,,+ x is (2.31) [ n ] P(+xlY) oc exp 2(n + Oaf' (Y+x - )o.. It is seen that ,,+x
is normally distributed with mean E(.+[y) = , the sample mean, and variance
Var(.+xly ) = ao�'(n + 1)/n. The pdf in (2.31) can, of course, be employed
to make probability statements about given y. 2.9 POINT PREDICTION
The predictive pdf, p(yly), can be used to obtain a point prediction; for
example, we can use a measure of central tendency, say the mean or modal
value, as a point prediction, or, if we have a loss function L - L(y, 9), where
9 is a point prediction for , we can seek the vector 9 that minimizes the
mathematical expectation of the loss function; that is (2.32) min L(y, 9)
P(YlY) dy. If a solution to the problem in (2.32) exists, it is an optimal
point prediction in the sense of minimizing expected loss. Analysis similar
to that presented in Section 2.5 on point estimation provides the result that
the mean of the predictive pdf is optimal if our loss function is quadratic;
that is, if L(, 9) = ( - 9)'Q( - 9), with Q a positive definite symmetric matrix,
then taking 9 = E(]Iy) as our point prediction provides minimal expected
loss; for example, in Example 2.8 the mean of the predictive pdf is the
sample mean , and this is an optimal point prediction for fin + , given that
our loss function is of the form L( + , .9 + ) = c(. + - 9,,+ 0 �', c > 0. For
other loss functions .5 That is, (y.+x - )2 + n(t - )2 = y+x + (n + 1) - 2(Yn+x
+ n) + n = (n + 1)[ -- 2n+x + n)/(n + 1)] + n + Y+x = (n + 1)[-- n+x + n)/(n
+ 1)1 z +n(n+x -- )/(n + 1). On substituting this last expression in the
second line of (2.30), the integration with respect to can be done readily to
yield (2.3D. An alternative, simpler derivation of (2.31) is obtained from
noting that Yn+x = + +x with the normal random error n+x independent of ,
given y, with mean zero and known variance ,o . Since both IY and +x are
normal, Yn+ x has a normal pdf with mean EYn+xIY = ElY = F, since Ely
= from (2.14), and Varn+xly) = Var(ly) + Vat en+x = %2/n + o 2 = o2(n +
1)In. SOME LARGE SAMPLE PROPERTIES OF BAYESIAN
POSTERIOR PDF'S 31 similar analysis can be performed to obtain optimal
point predictions, of course under the assumption that a solution to the
problem in (2.32) exists. 2.10 PREDICTION REGIONS AND
INTERVALS Given that we have the predictive pdf, p(yly), we can, for a
given region (or interval) , generally evaluate (2.33) Pr( e ]y) = f p(ly) d,
where is a subspace of R, the space of the elements of y. In (2.33) we have
the probability that the future observation vector y will lie in the region R.
Alternatively, given a stated probability in (2.33), we can seek a region R
such that (2.33) is satisfied. As with regions for parameters in Section 2.6,
this region can be made unique for unimodal pdf's if we require it to be a
"highest predictive density" region; that is, a region with the given proba-
bility content and such that the predictive pdf's values over the region are
not less than those relatingto any other region with the same probability
content. Example 2.9. In Example 2.8 the predictive pdf for .+ in (2.31) is
normal with mean and variance %�'(n + 1)/n. Then z = (.+ - )/o, with o =
,o/(n + 1)In, has a normal pdf with zero mean and unit variance. From
tables of the standardized normal distribution we can find the Pr{a < z < b},
where a and b are given constants. The statement a < z < b is equivalent to
f, + ao < + x < + bo and thus the probability that .+ will satisfy these
inequalities is the same as Pr{a < z < b}. On the other hand, if we are
required to find a and b such that Pr{a < z < b} =/, where/ is given, it is
clear that there are many possible values for a and b such that Pr{a < z < b}
=/g. The requirement that the interval be a "highest" interval leads to a
unique a and b, namely, a = -z and b = z, where the area over the interval -z
to z is just/. 2.11 SOME LARGE SAMPLE PROPERTIES OF BAYESIAN
POSTERIOR PDF'S In this section we discuss briefly some large sample
properties of posterior pdf's. �' First, let us consider the posterior pdf for a
scalar parameter 0' (2.34) p(Oly ) oc p(o)I(Oly) oC p( O)e TM t(ol), a* For
other discussions of this topic see Jeffreys, op. cit., p. 193 if.; Lindley, op.
cit., p. 128 if., and "The Use of Prior Probability Distributions in Statistical
Inference and Decisions," in J. Neyman (Ed.) Proc. Fourth Berkeley Symp.
Math. Statist. Probab. Berkeley: University of California Press, 1, 453-468
(1961); L. LeCam, "On Some

32 PRINCIPLES OF BAYESIAN ANALYSIS WITH SELECTED


APPLICATIONS where p(O) is our prior pdf and/(0]y) denotes the
likelihood function based on n independent sample observations, y' = (yx,
y., ..., Y0. We assume that both p(O) and/(01y ) are nonzero in the
parameter space and have continuous derivatives and that l(01y ) has a
unique maximum at 0 = 0, the maximum likelihood estimate. In general, as
Jeffreys points out, log l(01y ) will be of order n, whereas p(O) does not
depend on n, the sample size. Thus, heuristically, in large-sized samples the
likelihood factor in (2.34) will dominate the posterior pdf. Since under
general conditions the likelihood function assumes a normal shape as n gets
large, with center at the maximum likelihood estimate 0, the posterior pdf
will be normal in large samples with mean equal to the maximum likeli-
hood estimate 0. To put these considerations in more explicit terms, we can
expand both factors of (2.34) around the maximum likelihood estimate 0 as
follows: p(O) = p(O) + (o - O)p'(O) + I(0 - +... (2.35) [ (0- O)p'(O) 3(0-
O)'p"(O) +...] = p(O) 1 + p(O) + p(O) and, with g(O) = log l(0ly ), exp
(g(O)) = exp {g(0) + 3(0 - 0)'g"(t) + (0 - O)ag'"(O) +...) (2.36) cc exp {3(0 -
O)'g"(O)}[ 1 + (0 - O)ag"'(O) +... ], where the fact that g'(O) = 0 (since 0 is
the maximum likelihood estimate) has been employed and where the
expansion e'= 1 + x +... has been utilized. Then, on multiplying (2.35) and
(2.36), we have (2.37) p(0ly) oc e(0-0?g"(0>[1 + (0 - O)p'(O) + 3(0 - O)'p"
(O) + - +...]. The leading term in (2.37), e 'A<�-�?'g"<�>, is in the
normal form, centered at the maximum likelihood estimate with variance '7
Var(01y) '- [-g"()]-= [ d'l0}(0[Y)']- 0 Thus, if we use just the leading term of
(2.37), the approximate large sample posterior pdf for 0 is Asymptotic
Properties of Maximum Likelihood and Related Bayes Estimates," Univ.
Calif. Publ. Statist., 1,277-330 (1953); and "Les Propri6t6s Asymptotiques
des Solutions de Bayes," Publ. Inst. Statist., University of Paris, Vol. 7,
1958, pp. 17-35; R. A. Johnson, "An Asymptotic Expansion for Posterior
Distributions," ,,inn. Math. Statist. 38, 1899-1906 (1967). a7 Note that,
since g(O) has a maximum at 0 = it, g"() < O. SOME LARGE SAMPLE
PROPERTIES OF BAYESIAN POSTERIOR PDF'S 33 (2.38) p(0ly) '-
Since Ig"(O)l is usually of order n, as n gets large the posterior pdf becomes
sharply centered around 0, that is, the variance, becomes smaller as n grows
larger. With respect to the quality of the approximation in (2.38), Jeffreys
points out that 0 - 0 is of order n -'A, and thus in (2.37) the terms (0 - O) p '
( O) /p( O) and (0- O)ag'"(O) are of order n-'A, '8 whereas 3(0- O)'p"
(O)/P(O) is of order n-L Thus the approximation in (2.38) '�involves an
error of order Example 2.10. Assume that we have n independent
observations from a normal population with unknown mean t and known
standard deviation * = ao. It is well known that the sample mean = IL_ ydn
is the maximum likelihood estimate for t. Then, employing (2.38) for any
prior pdf satisfying the assumptions set forth above, the posterior pdf, p(tly,
*'), can be approxi- mated as follows in large samples: Ig")l P(IY, ') -'--- 'V
(n12a02)(l _/)2 '- e- ao where g(t) = log l(p, ly, ao) = -log - n log ao and
g"g) = .. Thus the large sample posterior pdf for t is a normal pdf with mean
g and variance Ig"()l -- oo/n. The above argument generalizes easily to the
case in which we have a vector of parameters, say 0, rather than a scalar
parameter; that is, in large samples the posterior pdf for 0 will be
approximately normal with mean , the maximum likelihood estimate, and
covariance matrix (2.39) [ 0�' 1�g l(01Y)'[ - ' ' g'(/J) is usually of order n
if it is nonzero. 0 It is possible to improve the approximation in (2.38) by
retaining additional terms appearing in the square brackets in (2.37). See,
for example, Lindley, "The Use of Prior Probability Distributions in
Statistical Inference and Decisions," op. cit., p. 457 if.

34 PRINCIPLES OF BAYESIAN ANALYSIS WITH SELECTED


APPLICATIONS In this case, proceeding as above, we can expand the two
factors in the posterior pdf for 0, p(0ly)rEp(0)/(01y ) =p(0)e g(�ly>, where
g(01y)= log l(0 [y). Then, if we retain just the leading term in the
expansion, we have, with c;c denoting "approximately proportional to,"
(2.40) p(0ly ) c;c exp [-�(0 - )'C(O - fi)], which is in the multivariate
normal form with mean fi, the maximum likeli- hood estimate, and
covariance matrix C -x, which is just the matrix in (2.39)? It is indeed
interesting to observe the close agreement of Bayesian results in large
samples with those flowing from the maximum likelihood approach. Of
course, a moot problem is how large a sample size is required for these
large sample approximate results to be reasonably accurate. Fortunately
there is usually no need to rely on large-sample approximate results, since
finite sample posterior pdf's are available, given the elements appearing in
Bayes' theorem. In certain instances, however, in which computational
problems arise in the analysis of complicated posterior pdf's the above
large- sample results are useful. 2.12. APPLICATION OF PRINCIPLES
TO ANALYSIS OF THE PARETO DISTRIBUTION Assume that we have
n independent observations y'= (yx, y2,..., Y,,), each with the Pareto pdf
given by (2.41) p(y,l A, s) y,+i 0 < A < y, < Such a pdf has been frequently
assumed to represent the distribution of incomes above a known value A.
Given that A is known, the only unknown parameter in (2.41) is a. We shall
obtain the posterior pdf for a. From (2.41) the likelihood function is /(sly, =
p(y, IA, or (2.42) /(sly, A) = -- Gn(a +,ii, where G = (YlY." 'Y,O TM is the
geometric mean of the observations. ao Note that since � is assumed to be
a maximizing value, C will be a positive definite matrix'. APPLICATION
OF PRINCIPLES TO ANALYSIS OF THE PARETO DISTRIBUTION 35
As regards prior pdf for s, we assume that our information abou.t the value
of this parameter is diffuse or vague and represent this state of our prior
information by assuming log s uniformly distributed ax which implies 1
(2.43) p(s) rE-, 0 < s < o. On combining this prior pdf with the likelihood
function in (2.42), the posterior pdf for s is S n - ln p(sl,, y) rE (2.44) rE s n
- l e - anct where a = lnG/A. The posterior pdf in (2.44) is seen to be in the
form of a gamma pdf. The normalized posterior pdf is thus (an)' s'-Xe-'"% 0
< s < c, (2.45) P(slA' Y) = I'- which will be proper for n > 0. This posterior
pdf for s represents our knowledge about s based on the information in our
sample y and the prior pdf in (2.43). If we wish, we can easily compute the
posterior probability that cx < s < co., where cx and co. are given numbers.
o�' Also, since the posterior moments of s are given by Es r = (an) -r I'(n +
r)/I'(n), r = 1, 2,..., we have aa 1 1 (2.46) Es = - = a ln(G/A)' which is an
optimal point estimate for a quadratic loss function in the sense of
minimizing posterior expected loss. If we have a new sample of q
independent observations, each with a pdf in the Pareto form (2.41), we can
use the posterior pdf in (2.45) as a prior pdf in the analysis of the new
sample; that is, the likelihood function for the new sample, denoted by y., is
sqAq x (2.47) l(slA, Y,) rE G,q(a + ax This form for the diffuse prior is in
accord with Jeffreys' rule for a parameter that can assume values from 0 to
oo; see the appendix to this chapter. Also, the prior pdf for in (2.43) is a
Jeffreys' invariant prior pdf, since Ilnfl A = I-E(d 2 log I/da)l oc l/a, where
Infa is Fishefts information matrix. ao. The computation of $ p(,IA, y)da
can be done by using a numerical integration program; see Appendix C. aa
It is interesting to note that 1/In(G/A) is the maximum likelihood estimate
for .

36 PRINCIPLES OF BAYESIAN ANALYSIS WITH SELECTED


APPLICATIONS where G. is the geometric average of the q new
observations. On combining (2.44) and (2.47), the posterior pdf, based on
both samples, is p(lA, y, y,) oc (z n + a - 1A(n + a>a ( Gn G .a) (2.48) (G2)
(n + q)a , where G. is the geometric mean of the n + q observations in the
two samples and a. = ln(G./A). It is seen that (2.48) is in the same gamma
form as (2.44) and thus can be easily analyzed. Often in analyses of the
Pareto distribution the available data are not the individual observations y,
y.,..., y but are in the form of frequencies, no, n,..., n,, where nt is the
number of individuals whose y values, say incomes, fall in a particular
income interval, say xt to xt+, where xt+ > xt, xo = A, x,+ = c, and t = 0, 1,
2,..., T- 1, T. From the Pareto pdf in (2.41) the probability that an individual
chosen at random will have a y value such that xt < y < xt + x is Pr{xt < y <
Xt+l} = ; xt+l //Aa t I 1) Jx, y -+ dy = A for t = 0, 1, 2,..., T - 1. For the
interval x, < y < c the Pr{xr < y < o} = A/xrt Then, given N individuals
selected at random, the probability that nt individuals have y values in xt to
xt+x for t = 0, 1, 2,..., T - 1 and n, have y values in the interval x. to c is 3
� t = o \Xt Xt + 1/ where N = Y.[= ont. This is a pdf for the random hiS,
which, when viewed as a function of the unknown parameter , is the
likelihood function. It can be expressed more compactly as (2.49) t=o \xt+]
J where a = lnO/A with O (1-I/=o = (no, nx,.. nr). = xp0 xm and n' ., a This
expression, based on the multinomial distribution, is given in D. J. Aigner
and A. S. Goldberger, "On the Estimation of Pareto's Law," Workshop
Paper 6818, Social Systems Research Institute, University of Wisconsin,
Madison. APPLICATION OF PRINCIPLES TO ANALYSIS OF THE
PARETO DISTRIBUTION 37 Given a prior pdf for , say p(e), it can be
combined with (2.49) to yield the following posterior pdf T-1 [ o]rt t (2.50)
p(aJA, n, N) oc p(z)e -aa H 1 -- ( x, ] . t=0 xt+/ J If little prior information is
available, the prior pdf could be taken as shown in (2.43). If more prior
information about the value of is available, p() can be taken in a form to
represent it. In either case the posterior pdf in (2.0) can be normalized and
analyzed by using numerical inteation techniques 5; for example, given a
loss function L(, &), the value of a which minimizes the posterior
expectation of the loss function can be obtained numerically by evaluating
EL(a, ) for different values of a. Also, posterior intervals can be obtained by
using numerical integration techniques. To provide an application of these
results for grouped data, we employ the following U.S. data for 1961 which
relate to N = 1004 households with incomes of A = $10,000 or more. As
regards prior assumptions about the parameter a in (2.0), we assume that we
know little about this parameter and represent it by taking lo uniformly
distributed which implies p() 1/, 0 < < . With this prior pdf inserted in (2.0)
and using the data numerical integration procedures were employed to
obtain the following normalized posterior pdf for : n, y) = 1 xx/3 t=0 Table
2.1 FREQUENCY DISTRIBUTION OF HOUSEHOLDS WITH
INCOMES OF $10,000 OR OREATER, UNITED STnTES, 1961 Relative
Absolute Income Interval Frequency Frequency Xt (dollars) nd N nt t (104
dollars) 10,000-14,999 0.170319 171 0 1 15,000-24,999 0.221116 222 1 1.5
25,000-49,999 O. 159363 160 2 2.5 50,000-99,999 0.219124 220 3 5.0
100,000-149,999 0.0478088 48 4 10.0 150,000-499,999 O. 137450 138 5
15.0 500,000-- 0.0448207 45 6 50.0 Tota!.q 1.000 N = 1004 a, See
Appendix C. a0 These data are presented in R. Barlow, H. Brazer and J. N.
Morgan, Economic Behavior of the Affluent. Washington, D.C.: Brookings
Institution, 1966, p. 193, Table D-1. These same data have been analyzed
with various sampling theory approaches by Aigner and Goldberger, loc. tit.

38 PRINCIPLES OF BAYESIAN ANALYSIS WITH SELECTED


APPLICATIONS with k being the normalizing constant and a = lnG/A with
G = (1-I[= o x0 m. In Figure 2.2 a plot of this posterior pdf is provided. The
posterior mean and 20 A Posterior mean and variance: 18 / \ �a = 0.6218
16 14 = 12 ' lO o Figure 2.2 Posterior distribution for parameter of Pareto
distribution derived from grouped data. variance, computed by numerical
integration, are Ez -- 0.6218 and Var z = 0.00041. Of course, for a quadratic
loss function the optimal point estimate is the posterior mean? If loss
functions other than the quadratic are deemed more appropriate by an
investigator, he can readily compute estimates which minimize expected
loss for any particular loss function provided that the mathematical
expectation of the loss function and a minimum exist. Further in this
instance in which the sample size is rather large, the form of the posterior
pdf resembles that of a normal pdf. " 2.13 APPLICATION OF
PRINCIPLES TO ANALYSIS OF THE BINOMIAL DISTRIBUTION
Consider the outcomes of n independent events, each of which can occur in
one of two mutually exclusive ways, say A and B; for example, the n events
may be n independent tosses of a two-sided coin. The outcome on each toss
is either a head (A) or a tail (B). Let 0 be the probability, assumed to be the
same for all events, that A will occur and 1 - 0, the probability that B will
occur. Then the probability of observing n A's and n - nx B's in n
independent events is given by the discrete binomial pdf a7 For this set of
data the maximum likelihood estimate is 0.6218, numerically the same as
the posterior mean. APPLICATION OF PRINCIPLES TO ANALYSIS OF
THE BINOMIAL DISTRIBUTION p(nlO, n) = nx (2.51) where 39 n) nx =
nx ! (n - with 0! --- 1. The function in (2.51), viewed as a function of the
unknown parameter 0, is the likelihood function. Suppose that we have
some prior information about 0 and can represent it by the following beta
pdf a8 (2.52) p(O) = kO-X(1 - O) b-, a, b > O, 0<0<_1, where k = F(a +
b)/F(a) F(b) is the normalizing constant and a and b are prior parameters
whose values represent our prior information about 0. In assigning values to
a and b, note that, for the beta pdf, EO = a/(a .4- b) and Var 0 = ab/(a +
b)�'(a + b + 1). Given that a and b have been assigned suitable values to
represent the available prior information about 0, (2.52) can be combined
with the likelihood function in (2.51) to yield the posterior pdf for 0, (2.53)
p(Olnx, n) which is in the beta form (2.52) with parameters a' = nx + a and
b' = n - nx + b. The normalizing constant for (2.53) is F(a' + b')/F(a') F(b'),
whereas the posterior mean is a'/(a' + b') and the posterior modal value is
0=oa = (a' - 1)/(a' + b' - 2). Posterior probabilities, for example Pr(c < 0 <
d), where c and d are given numbers, can be easily evaluated by numerical
integration or by use of available tables of the incomplete beta function. a
Also, as is well known, a random variable with a beta pdf can be
transformed to a variable with an F-distribution. In the present instance, if
we let x = (b'/a')O/(1 - 0), a posteriori x has an F distribution with 2a' and
2b' degrees of freedom? � This is a useful fact x for making posterior
probability state- ments about 0/(1 - 0), the odds relating to the events A
and B. a8 See Appendix A for properties of the beta pdf. ao See, for
example, K. Pearson, Tables of the Incomplete Beta Function, Cambridge:
Cambridge University Press, 1948. 4o In general, if 0 has a beta pdf, p(O)
dO oc 0"-x(1 - 0) - x dO, then the pdf for x -- (b/a)0/(1 -- O)isp(x) oZ xa- X
/(b + ax) "+ oc x a"m- x/J1 q- (2a/2b)x] (a"+a)la, which is in the Fa,,a form;
see Appendix A. In deriving this expression for p(x), note that 0 = x/(x +
b/a), (1 - O) = (b/a)(x + b/a)- and dO oc dx/(x + b/a) a. 4x That is, tables of
the F-distribution can be consulted to obtain probabilities. Alternatively, it
is easy to use numerical integration techniques to compute probabilities; for
example the probability that x will lie in a given interval.

40 PRINCIPLES OF BAYESIAN ANALYSIS WITH SELECTED


APPLICATIONS When our prior information about 0 is diffuse or vague,
there has been considerable discussion in the literature about the form of the
prior pdf to represent knowing little about the value of 0. 4' We shall adopt
the following improper pdf to represent vague knowledge about the value of
0: 1 (2.54) p(O) cc 0(1 - 0)' 0 _< 0 < 1. This prior can be viewed as a
limiting form of the prior in (2.52) as both a and b approach zero.
Alternatively, as Jeffreys and Lindley 4a point out, = 0/(1 - 0) ranges from 0
to c, and if we take v = log uniformly dis- tributed over the entire real line
this implies (2.54) as the prior for 0. On combining the prior pdf in (2.54)
with the likelihood function in (2.51), the following is the posterior pdf for
0: s (2.55) p(Olnx, n) oc 0-x(1 - 0)'*- -x, 0 _< 0 < 1, which is in the beta
form with parameters nx and n - nx. The posterior pdf will be proper for nx
> 0 and n - nx > 0; that is, if our sample information includes at least one
occurrence of the event A and one of the event B. With this condition
satisfied, the posterior mean of 0 is E(O[nx, n) = nx/n, the sample
proportion, and the posterior variance is Var(0ln, n) = nx(n - nO/ n'(n + 1).
Further, given that we have the posterior pdf in (2.55), the pos- terior
probability that 0 will lie in any given interval can be readily computed.
2.14 REPORTING THE RESULTS OF BAYESIAN ANALYSES In
reporting the results of Bayesian analyses involving estimation of
parameters in scientific journals, it is important to provide at least (a) a
detailed discussion of the stochastic model assumed to generate the
observa- tions, (b) a full discussion of prior assumptions about parameter,.
values, (c) the sample information, and (d) information about posterior pdf's
for parameters of interest. With respect to the stochastic model for the
observations, subject matter considerations should be reviewed to justify its
form and stochastic assump- tions. Given that this has been done
satisfactorily, the likelihood function p(yl 0) should be shown explicitly,
where y is an observation vector and 0 is a parameter vector. 42 See, for
example, Jeffreys, Theory of Probability, op. cit., pp. 123-125. 42 Lindley,
op. cit., p. 145. 44 That is, from v = log ,/= log 0/(1 - 0), dv = d log 0 - d log
(1 - 0) = dO/O(1 - O) and thus p(v) dv oc dv implies p(O) dO oc [0(1 - 0)]-
dO. If, instead of (2.54), we had used the Bayes-Laplace uniform prior,
p(O) oc constant, 0 _< 0 < 1, the exponents in (2.55) would each be
changed by one, the equivalent of two sample observations, which will not
be important in moderate-sized samples. PRIOR DISTRIBUTIONS TO
REPRESENT "KNOWING LITTLE" 41 As regards prior assumptions
about 0, that is, a choice of prior pdf for 0, all information used to make
such a choice should be explicitly stated. If data-based prior information is
being employed, this fact should be noted and references provided to the
sources of such prior information. If nondata- based prior information is
employed, it should be carefully examined and explicated. In this way the
reader will understand what information is being added to the sample
information in performing an analysis. Of course, if little prior information
is available or if the investigator wishes to show what results from an
analysis assuming little prior information, he will use a vague or diffuse
prior pdf. With respect to reporting the data employed in an analysis, it is
good procedure to describe in detail how they were obtained and have them
available for any interested party by including them in the report or by
making it known that they can be obtained on request. By having the data
available other parties can perform analyses using whatever prior pdf's they
choose to use. Also, should there be any controversy about the form of the
likelihood function, the data can be employed to explore alternative
formulations. 4 With respect to reporting information about posterior pdf's
for parameters of interest, it is good practice to report the complete
posterior pdf and to provide summary characteristics, say measures of
central tendency and dis- persion. Also posterior intervals (or regions) often
help readers to appreciate what the prior and sample information implies
about the values of parameters. By paying special attention to the above
points, readers will understand how the reporting investigator learned from
the information in his sample7; that is, he will have information regarding
the investigator's initial beliefs about the parameters and model and can
then see how they are altered by the data. This change in beliefs is indeed
an essential part of the process of learning from experience. APPENDIX
PRIOR DISTRIBUTIONS TO REPRESENT "KNOWING LITTLE" As
stated in the body of this chapter, there are situations in which investi-
gators know little, or wish to proceed as if they knew little, about
parameters 46 In some cases p(yl 0) = g(tx, t2 ..... tl O) h(y), where t = t(y),
i = 1, 2,..., k, are functions of the observations called sufficient statistics.
Reporting just the tfs allows others to perform their own analyses, provided
that they accept the form of the likelihood function as being satisfactory. To
investigate alternative likelihood functions the complete sample y is usually
required. 4 See C. Hiidreth, "Bayesian Statisticians and Remote Clients,"
Econometrica, 31, 422-438 (1963), for further consideration of the problem
of reporting.

42 PRINCIPLES OF BAYESIAN ANALYSIS WITH SELECTED


APPLICATIONS of a model. Thus there is a need for explicit rules for
selecting prior distri- butions to represent "knowing little" or ignorance.
Perhaps, surprisingly, filling this need has been a difficult and controversial
aspect of the Bayesian approach to inference. In this appendix we consider
some approaches that have been put forward to deal with this problem.
When a parameter's value is completely unknown, Jeffreys 48 suggests two
rules for choosing a prior distribution, which, according to him, "... cover
the commonest cases." He states that, "If the parameter may have any value
in a finite range, or from -c to +c, its prior probability should be taken as
uniformly distributed. If it arises in such a way that it may conceivably have
any value from 0 to c, the prior probability of its logarithm should be taken
as uniformly distributed." 49 Let us consider application of Jeffreys' first
rule to the case of an unknown parameter, t, say a mean, which can
conceivably assume values from -c to +c. Jeffreys' prescription for
representing ignorance about the value of is to take (1) cede, < < that is, p(t)
cc constant. This rectangular pdf is obviously improper since j'-oo P(t ) dt =
oo. Given that we know -c < t < c to be a certain state- ment, this means that
Jeffreys is using to represent the probability of the certain event, - < t < c,
rather than 1.s� The fact that (1) integrates to c is a virtue from Jeffreys'
point of view since then Pr{a < t < b}/Pr{c < t < d} = 0/0 is indeterminate,
where a, b, c, and d are any finite numbers. sx Since this ratio of
probabilities is indeterminate, we can make no statement about the odds
that t lies in any particular pair of finite intervals. Jeffreys views this
property of (1) as a formal representation of ignorance. 8 Jeffreys, op. cit.,
p. 117. Plackett regards Jeffreys' work as "an authoritative modern'! version
of the Bayes-Laplace procedure" for representing ignorance. See R. L.
Plackett,' "Current Trends in Statistical Inference," J. Roy. Statist. Soc.,
Series A, Part 2, 249- 267, p. 251 (1966). 9 Ibid., p. 117. 5o Jeffreys, ibid.,
p. 21, remarks, ". � � there are cases where we wish to express ignorance
over an infinite range of values of a quantity, and it may be convenient to
express certainty that the quantity lies in that range by oo (rather than 1) ....
" ; 5x It may appear that Pr{a < < b} = 0 is an informative statement'about .
This i statement, however, does not logically imply that is outside the
interval a to b with certainty given that is uniformly distributed, -oo to oo.
Jeffreys (p. 21) provides an, ' example to illustrate this point. If x is a
continuous random variable with a uniform pdf from 0 to I, the fact that
Pr(x = �) = 0 does not logically imply that x :P � with! certainty, since �
is a possible value of x. ' PRIOR DISTRIBUTIONS TO REPRESENT '
KNOWING LITTLE" 43 If, in place of (1), we had taken, say, 1 (2) p() -
M<tx < M, a proper pdf, we would have introduced prior information about
the range of t and thus would not be completely ignorant about the value
of/. With (2) we. have for finite a and b in the closed interval - M to M, Pr{a
< / < b} = (b - a)/2M 0, and thus Pr{a < t <b} (3) Pr{c < / < d} (b- a)/2M b-
a (d- c)/2M = L'-' where c and d are finite and in the closed interval -M to
M. The ratio of probabilities in (3) is determinate in contrast to what
follows from (1). However, if we consider limu_.oo Pr{a < t < b}/lim.u_.oo
Pr{c < t < d}, this ratio of limiting probabilities is in the form 0/0. In this
sense we can regard (2) as an approximation to (1) as M grows large. It may
be asked if we are incorporating information about t by the choice of the
rectangular form of the pdf in (1) or (2). By Jeffreys' line of reasoning the
indeterminancy of the ratios Pr{a < t < b}/Pr{c < t < d} seems adequate to
justify the use of the rectangular pdf. Other improper pdf's, however, have
this property. There appears to be no way of answering this question about
the form of the pdf to express ignorance without introducing a measure of
information. If we agree to measure information in a pdf, for example p(t),
by (4) H = f_ p(/)log p(/)dt, a measure employed by many, including
Shannon?' the proper pdf which minimizes H is the one shown in (2)? Thus
the rectangular pdf is a "minimal information" prior pdf. By letting M get
large we get an approximation to (1). so. See, for example, C. E. Shannon,
"The Mathematical Theory of Communication," Bell System Tech. J. (July-
October 1948), reprinted in C. E. Shannon and W. Weaver, The
Mathematical Theory of Communication. Urbana: University of Illinois
Press, 1949, pp. 3-91. Shannon defines W = -H as the entropy or uncertainty
associated with a pdf, say p(p). a That is, minimize H = [t P()logp()d subject
to J' p()d - 1. Form the Lagrangian expression H + A[[_ p() d - 1 ], where A
is a Lagrange multiplier. Then, 'on varying p(), we have as the condition for
H to be minimized subject to the condition, .?- [1 + logp(p) + A] $p(/) = 0.
Thus p(p) = e -t+x On taking A + 1 = Iog2M, i!11:p(l) = 1/2M is the proper
pdf which minimizes H.

44 PRINCIPLES OF BAYESIAN ANALYSIS WITH SELECTED


APPLICATIONS Example 2.3 in the text of this chapter illustrates that on
combining the improper pdf in (1) with a likelihood function by use of
Bayes' theorem the resulting posterior pdf is proper. The sample
information in the likelihood function has in this case taken us from an
improper, "noninformative" pdf for t to a proper, "informative" posterior
pdf. In this way we have moved from ignorance, as represented by (1), to a
more informed position, as represented by our proper posterior pdf; for
example, a posteriori, Pr{a < t < bly}/Pr{c < t < d[y} is not indeterminate.
The second rule that Jeffreys gives pertains to parameters, which by their
nature, can assume values from 0 to ; for example a standard deviation a.
For such a parameter Jeffreys suggests taking its logarithm uniform; that is,
if we let 0 = log a, the prior pdf for 0 is taken as follows: (5) p(O) dO oc
dO, -o < 0 < . With 0 = log a, note that O's range is -o to oo and thus (5) is
consistent with Jeffreys' first rule. Since dO = de/a, (5) implies (6) p(e) de
oc--, 0 < e < o, as the improper pdf to represent ignorance about e. Jeffreys
presents several interesting observations about (6) and possible alternatives
to it. First, (6) is invariant to transformations of the form 4 = a; that is, d4 =
na -x de and thus d4/4 oc da/e. For Jeffreys this invariance property is
important because, for example, some parameterize a model in terms of the
standard deviation a and others, in terms of the variance e ', or a precision
parameter, h -- 1/e �'. As can easily be checked, if we take de/e as our
prior for e, this logically implies de/e oc de�'/e �' cc dh/h' Thus by
applying Jeffreys' rule to e, 0 < e < , to e �', 0 < e �' < , or to h, 0 < h < o,
provides prior pdf's in the same form and consistent with one another in the
sense that de/e cc de�'/e �' oc dh/h is satisfied. Further, posterior proba-
bility statements, based on the alternative parameters, will be consistent.
Next, Jeffreys points out that (6) has the following properties: (a) ff de/e -- ;
(b) f de/e = ; and (c) fo de/e = . Property (a) indicates that again is being
used to represent certainty. Then (b) and (c) together imply that Pr{0 < e <
a}/Pr{a < e < m} is indeterminate and thus nothing can be said about the
ratio of these two probabilities; that is, the odds pertaining to the
propositions 0 < e < a and a < e < . Again this indeterminacy is regarded as
a formal representation of ignorance. As alternatives to (6), Jeffreys
considers p(e) oc constant and p(e) oc e- with 0 < e < . The first of these
pdf's has the property that the probability that e > c, where c is positive and
finite, is oo, which is certainty on Jeffreys' scale. Thus the probability that 0
< e < c is zero, which implies that we PRIOR DISTRIBUTIONS TO
REPRESENT "KNOWING LITTLE" 45 know something about e.
Therefore Jeffreys considers p(e)cc constant, 0 < e < c, as an unacceptable
representation of ignorance about the value of e. With respect to p(e) oc e -*
Jeffreys notes that a factor k must be introduced in the exponent, since e has
the dimensions of length and the exponent of e must be a pure number.
Also, with a pdf in the form p(e) e -*, Pr{0 < e < c}/Pr{c < e < oo}, for
finite positive c, has a finite deter- minate value that contradicts the premise
that we know nothing about the value of e. Further, if k were unknown, we
should have to introduce a prior pdf for it so that no progress would have
been made. We have noted that 0 = log e is a parameter such that - < 0 <
given that 0 < e < . Then the information measure in (4) will be mini- mized
by taking p(O)cc constant, and this is an information theoretic justification
for taking 0 = log e uniformly distributed, which implies (6)2 4 In general,
if ,/ = f(e), where f is differentiable, f(0) = -, and f(oo) = we could take p(,/)
d,/ oc d,/, in accord with Jeffreys' first rule. This implies p(a) de ocf'(e)de. If
f (a) = log e, we get Jeffreys' prior pdf, shown in (6). Jeffreys wants f(e) to
be such that it involves no new parameters. This con- dition would rule out,
for example, f(e) = e*(l - 1/e). Jeffreys states that f'(a i must be of the form
Ae , where A and n are constants, if we want to express ignorance about e,
given only the knowledge that 0 < e < . He further argues that only by
taking n = -1 will the ratio Pt{0 < e < a}/ Pr{a < e < } be indeterminate.
This result, in addition to the property of "invariance with respect to
powers," is for him a compelling reason to take (6) as the prior pdf for e
representing ignorance. In Example 2.3 in the body of this chapter we have
seen how the improper prior pdf, p(e) oc l/a, combines with a likelihood
function to yield a proper posterior pdf for e. As mentioned above, p(e) oc
1/e implies p(e 2) oC 1/e 2, and thus we use the same form for the pdf to
express ignorance about e �' (or any power of e). Thus, if one investigator
uses the parameter e while another uses 4 = e', given that their priors to
represent ignorance are p(e) cc 1/e and P(4) cc 1/4, respectively, their
posterior probabilities relating to e and 4 will be consistent. it seems that
some are hesitant to employ the improper pdf's recom- mended by Jeffreys.
Rather they introduce "locally uniform" or "gentle" 54 The pdf for which
minimizes H is assumed to be proper. Thus, if vx < < v2, the ! proper pdf
minimizing H is p(a) = (log v2/vi)- X(d,/,). As v2 co and vx -- 0, we get an
approximation to Jeffreys' improper pdf. On this point Jeffreys, op. cit., p.
122, .: comments" (i) an intermediate range contributes most of the values
of the various integrals in any case, and the termini would make a
negligible contribution; (2) in the mathematical definition of an infinite
integral a finite range is always considered in the first place and then
allowed to tend to infinity. Thus the results derived by using infinite
integrals are just equivalent to making vx [in p(*) = (log v/vi)- l(d,/*)] tend
to 0 and v. to infinity."

46 PRINCIPLES OF BAYESIAN ANALYSIS WITH SELECTED


APPLICATIONS prior pdf's for unknown parametersil s These terms are
employed for prior pdf's which are "reasonably flat" or "gentle" over the
range in which the likelihood function assumes appreciable values. Outside
this range it matters little what shape the prior pdf has, since, in deriving th
e posterior pdf, the prior pdf gets multiplied by small likelihood values. The
situation is as shown in Figure 2.3. 1(/. ] y) p(u) A B Figure 2.3 Example of
a "locally uniform" prior pdf, p(). Since the posterior pdf oc prior pdf x
likelihood function, it is clear that the shape of the prior pdf to the left of the
point A and to the right of the point B will have little influence on the shape
of the posterior pdf. Analytically, for a parameter t, the posterior pdfp(tlY)
is given by (7) p(/[y) oc p(/)l(/ly). Let/o be a value of/ located in the region
in which/(/[y) assumes appre- ciable values. In many examples/o can be
taken as the modal value of/(/[y). Then expand p(/) as follows (8) = + -
to)p'(o) + - +" '. If the first-order and higher order terms in this expansion
are negligible in the region in which the likelihood function assumes
appreciable values, as would be the case if p(t) were flat or "gentle" in this
region, we have p(ly) c;c I(IY), where ck denotes "approximately
proportional to." By taking p(/) "locally uniform" and proper it is clear that
Jeffreys' con- dition for complete ignorance is not being satisfied [see the
discussion in connection with (2)]. Also, this procedure for choosing a prior
pdf implies that we know something about the likelihood function, which
may or may not ** See, for example, L. J. Savage, "Bayesian Statistics," in
Decision and Information Processes. New York: Macmillan, 1962. G. E. P.
Box and G. C. Tiao, "A Further Look at Robustness via Bayes Theorem,"
Biometrika, 49, 419-433 (1962). PRIOR DISTRIBUTIONS TO
REPRESENT "KNOWING LITTLE" 47 be the situation in practical cases.
If this information about the range of/ and the likelihood function is
available, and some claim that it usually is, it can be used to good
advantage; for example, if it is known that -M < t < M and that the design
of an experiment is such that the likelihood function can assume
appreciable values for - 2M < t < 2M, it is apparent that use of a prior p(t) dt
oc dt for -c < t < c can lead to inferences at variance with the prior
information - M < t < M. Thus, when an investigator knows something
about t and the experimental design, it is, of course, important that this
information be considered in making inferences about t. When such
information is not available, it usually makes very little practical difference
whether we use a "locally uniform" prior pdf or Jeffreys' improper pdf for
In the discussion above we have encountered several examples of
invariance properties of prior pdf's; for example, it was noted that p(a) de cc
d,/a is invariant with respect to powers of a. Jeffreys has provided a
remarkable generalization of this invariance property. He points out s6 that
if our prior pdf for the parameter vector 0 is taken as (9) p(0) oc Ilnf0l,
where Inf0 is Fisher's information matrix for the parameter vector 0, that is,
(10) Inf0 = - E[ ?' log p(yl0).] 0 O0j ]' where the expectation E is with
respect to the pdf for y, p(y]0), the prior pdf in (9) will be invariant in the
following sense. If an investigator parametrizes his model in terms of the
elements of 1, where 1 = F(0) with F a one-to-one dif- ferentiable
transformation of the elements of 0, and takes his prior pdf for 1 as (11)
P01) oc Ilnfn[ , his posterior probability statements will be consistent with
those of a person using the parameter vector 0 in conjunction with the prior
pdf in (9). A proof of this property follows. Let p(y[0), where 0' = (0, 0o.,...,
0k), be a pdf for the observation vector y. The information matrix for 0,
given in (10), can alternatively be expressed as (12) Inf0 = E( logpO logp.
i,j = 1, 2, k, a0 a0j ]' ''" where we have written p for p(y10) and the
expectation E is taken with respect to y. Let 1 = F(0) be any one-to-one
differentiable transformation s6 Jeffreys, op. cit., p. 179 if.

48 PRINCIPLES OF BAYESIAN ANALYSIS WITH SELECTED


APPLICATIONS of the elements of 0; that is, */1 = f(0), w. = fo.(0),..., w =
f(0). Then we have (13) [Inf0[ ' dO = Ilnfnl u dl. Proof. s? Since a logp/aO
= logp O,j the (i, j)th element of Inf0, Inf0a.j, can be expressed as where =
NS' x 5' (Infn.,.s) , Inf0,t, ,__a= x z-' x O0i in,,,s = E( log p a log p Then
(14a) and (14b) Inf0 = (Info,,,) = J InfnJ' IInfol = IJI IInfnl where J is the
Jacobian matrix, associated with the transformation 1 = F(0), which has the
typical element /0. Noting that [J] dO = dl, we have from (14b) IInf01 �
dO = [Infnl dl, which was to be shown. The importance of this result is that
if investigator A parameterizes a model in terms of 0 and uses ]Info[ dO as
his prior pdf his posterior pdf is p(Oly) dO oc [Inf01�2p(y[O) dO, whereas
if investigator B parameterizes the model in terms of 1 = F(0) his posterior
pdf is P01]Y)dl oc [Infn[v'p(yll)dl. Since (13) is true, B can employ 1 =
F(0) to transform his posterior pdf to relate too and he gets exactly the
posterior pdf that A has obtained. Alternatively, A can use 1 = F(0) to
express his posterior pdf in terms of 1 and, given (13), this posterior for 1
will be precisely B's. Thus, when investigators take their prior pdf's
proportional to the square root of the information matrix, they are led to
consistent posterior pdf's in the sense explained above. To consider
invariance properties more generally, it is useful to note that the Bayesian
"transformation" of a given prior pdf, p(0), (15) p(0ly) oc p(0) p(yl0), 57
This proof, essentially the same as Jeffreys', was presented in a lecture by
M. Stone in 1965. PRIOR DISTRIBUTIONS TO REPRESENT
"KNOWING LITTLE" 49 involves several components, namely, a pdf for
y, p(y10), a parameter space fi, and a sample space S. We shall write (16) ' =
{p(y[0), 0 e fl c R , and y to represent collections of these quantities where
fl denotes an open subset in the k-dimensional Euclidean space R and S, an
open subset in R'*. We assume that includes just pdf's for y, given 0, which
have a continuous 0 derivative for all y e S. Hartigan s8 considers
properties of the Bayesian transformation in (15) for various 's. He
establishes that (15) has the following invariance properties when p(0) oc
]Inf0[ , that is, when the prior pdf is of the form suggested by Jeffreys: I. S-
labeling invariance: If z = G(y) is a differentiable one-to-one trans-
formation which takes the sample space S for y into S*, the sample space
for z, then (17) p(0lz ) cc p(01y ). This property is particularly important,
for example, when the transformation z = G(y) involves a change in the
units of measurement. II. fl-labeling invariance: If 1 = F(0) is a
differentiable one-to-one trans- formation of 0, then there exists p*(yll) =
p(y[0) and (18) P(IIY) dl oc p(0ly ) dO. We have provided a proof of this
property and commented on its importance. III. a-restriction invariance:
Assume that 0 e fl* c fl. Then the posterior pdf based on p*(y[0) with 0 e
fl*, that is, p*(0[y) ocp(0)p*(yl 0) will be proportional to p(0[y) oc p(0)
p(y[0) with 0 This property implies that when we use Jeffreys' prior we get
the same posterior if we work with the likelihood function defined for 0 e
fl* that we would get if we worked with the likelihood function defined for
0 e fl and then restricted the resulting posterior pdfp(0[y) to be zero outside
the'region 0e IV. Sufficiency invariance: If t' = (tx, to.,..., t) is sufficient for
0 andp*(t[0) is the pdf for the sufficient statistics, then p*(0[t) oc p(0[y),
where p*(0lt ) oc p*(0) p*(t10) and p(0[y) is given by (15) when the priors
are of Jeffreys' form. a j2 Hartigan, "Invariant Prior Distributions," Ann.
Math. Statist., 35, 836-845 (1964).

50 PRINCIPLES OF BAYESIAN ANALYSIS WITH SELECTED


APPLICATIONS V. Direct product invariance' If we have two independent
samples, yx and y., with pdf's px(yx[0x) and p.(y.[0.), respectively, 0x e fx
and 0. e fl., and p(y[0) ocpx(y[00p(y.10o.), where 0 e f = f x o., then p(0ly )
oc px(0xlyx) pa(Oalyd, where p(0ly) oc pt(O,) p,(y, 10,) for i = 1, 2, and
p(0ly ) oc p(0) p(yl0) when the prior pdf's px(0x), po.(0o.), and p(0) are
each taken in Jeffreys' form. VI. Repeated product invariance' Suppose that
yx, Ya,..., y, n x 1 independent observation vectors from p(y[0). Then and If
are each p(yl, y2, ..., y,d0) = 1--I P(Y,I O) l--l. m, p*(0lyx, y.,..., y) oc p*(O)
1-I P(Y, I0) � p*(Oly, y.,..., y,,) oc p(Oly 0 I--[ P(Yd�), _-9. where p(0[y 0
oc p(0) p(yx[0), we have "repeated product invariance." Taking p*(O) and
p(O) in Jeffreys' form will provide this property? These six invariance
properties are indeed important properties relatipg to prior pdf's, and the
fact that Jeffreys' prior, p(O) oc [Inf0['A, has them is fortunate. However, in
what sense can Jeffreys' prior be considered as a representation of "knowing
little" or "ignorance"? As in the discussion of the uniform distribution as
representing "knowing little," it appears necessary to consider this problem
in information theoretic terms. To do so, let p(y[O) be the pdf for y given 0.
Then define (19) I(0) = f p(yl0) logp(y10 ) dy as measuring the information
in p(y[0). A priori, the prior average informa- tion will be defined as (20) L
= f p(O) do, where p(0) is a prior pdf, here proper. Then we introduce (21)
G = - f p(0) log p(0) dO = f I(0) p(0) dO - f p(0) log p(0) dO 5o Hartigan,
loc. cit., notes that if p(y[0)= px(yx]0)pa(y2]0), where px and P2 are
different pdf's, requiring the "repeated product invariance" property implies
that other invariance properties, including the fl-labeling invhriance
property, will be violated. PRIOR DISTRIBUTIONS TO REPRESENT
"KNOWING LITTLE" 51 to measure the gain in information; that is, the
prior average information associated with an observation y, denoted i,
minus the information in our prior pdfp(0), as measured by f p(0) logp(0)
dO. We now define a "minimal information" prior pdf to be one that maxi-
mizes G for given p(y[0). Although this is not the only possible definition
of a "minimal information" prior pdf (e.g., other measures of information
could be employed), it is of interest to apply this definition to a few
particular cases to illustrate prior pdf's yielded by it and compare them with
Jeffreys' prior pdf's. Consider first 1 p(ylO) = exp [-�(y - 0)9'1, -o < y <
that is, y is normally distributed with unknown mean 0 and known variance
equal to one. Then = P(Yl o) log p(Yl o) dy log 2,r - �(y - 0) �'1 p(y[ O)
dy : -�(log 2,r + 1), which is independent of 0. Therefore, for proper p(O),
f I,(0) p(O) dO = -�(log 2rr + 1) and G = -�(log 2rr + 1) - f p(O) log p(O)
dO will be maximized if we minimize f p(O)logp(0)dO. As shown above,
the solution to this problem is a rectangular or uniform pdf for 0; that is,
p(O) oc constant. This is also the form of the prior yielded by Jeffreys' rule;
that is, p(0) oc Ilnf01 oc constant. As a second example, consider [ y2] 1
exp - , p(yla) = -oo <y < m. Here we have assumed y normal with known
mean equal to zero and unknown standard deviation . Then I,(.): y:, p(yl,)
log p(y[,) dy = -�(log 2r + 1) - log.,
52 PRINCIPLES OF BAYESIAN ANALYSIS WITH SELECTED
APPLICATIONS and for proper G = -� log (2r + 1) - f log o p(o) do - f
p(o) log p(o) The necessary condition for G to be a maximum subject to J'
p(o) do = 1 is -log o- 1- logp(o)+ h = 0 where h is a Lagrange multiplier.
This implies that 1 (22) p(o) oc-, a result also in accord with p(o) oc [Inf,[
oc 1/o. As a third example, consider 1 exp -i- (y - O) ' , p(y]O, o) = -c < y <
oo, wherein both 0 and o are unknown. Then 6(0, o) -- 7o. p(y10,
o)logp(y10, o)dy = -�(log 2rr + 1) - log o, and for proper p(O, o) G =
-�(log + 2) - f log op(O, o) ao ao - f f p(O, o) logp(0, o) ao ao; G will be
maximized, subject to p(O, o) dO do = 1 for 1 (23) p(O, o) oc -. Thus the
"minimal information" prior pdf is one in which 0 and o are independent,
with 0 and log o uniformly distributed. A prior pdf in the form of (23) is the
one used extensively by Jeffreys, even though s� p(O, o) oc [Inf0,l (24) oc
. Jeffreys explains his departure from the use of the general rule in terms of
a Note that the information matrix is given by 0) Info,, = 2n and thus (24)
follows. PRIOR DISTRIBUTIONS TO REPRESENT "KNOWING
LITTLE" 53 prior judgment that 0 and o are independentY Then he applies
his rule separately to 0 and o to obtain a prior pdf in the form of (23). He
notes that the prior pdf in (23) will be invariant to transformations of the
following kind, ,/= If we consider an asymptotic form of the quantity G in
(21), (25) G. -- f p(0) log x/llnf0l dO - f p(0) log p(0) dO, where n = number
of independent drawings from p(y[0), and seek the prior pdf, p(0), which
maximizes G subject to f p(0)dO = 1, the result is just 6�' (26) p(0) oc
IInf01 , which is in the form of Jeffreys' invariant prior pdf. Thus for the
asymptotic form of G given in (25) Jeffreys' prior is a minimal information
prior. For the nonasymptotic form of G in (21), however, it must be
recognized, as shown above, that Jeffreys' invariant priors are not always
minimal informa- tion prior pdf's since they do not always maximize G. �a
When a Jeffreys' prior pdf is not the prior pdf that maximizes G, it is the
case that use of Jeffreys' prior involves introduction of more prior
information in an analysis than associated with the use of a prior pdf that
maximizes G. When the number of parameters is large, this difference can
be important; for example, in k normal populations with unknown means, 0'
= (0, 0o.,..., 0h), and the common variance, 1/o x+x which, for large k,
contrasts sharply with the minimal information prior p(0, o) oc 1/o. It is
thus important, as Jeffreys himself emphasizes, to examine the form of
vague or diffuse prior pdf's carefully in order to avoid putting some
unwanted prior information into an analysis, a consideration that is
particularly relevant for small sample situations? 4 6x Jeffreys, op. cit., p.
182, recognizes that if we have k normal populations with unknown means
0, 02 ..... 0 and common unknown variance ,a his invariant prior pdf is
IXnfl'A oc 1/o + x. He deems this unsatisfactory since, for example, if we
have n observations from population i, with n = Y= n{, the marginal
posterior pdf for 0x will be in the univariate Student t form with n degrees
of freedom for any k. That there is no loss of degrees of freedom associated
with integrating out the k - 1 means other than 0x is due to the appearance
of a factor 1/, for each mean in Jeffreys' prior and is considered
unreasonable by Jeffreys. 6a This result appears in Lindley, op. cit., p. 467.
oa Prior pdf's that maximize G do not in general have the fl-labeling
invariance property of Jeffreys' prior pdf. However, investigators using
different parameterizations can get compatible results if they adopt the
convention of using minimal information prior pdf's (i.e., priors that
maximize G) for any given parameterization when they know little about
the values of the parameters. A small sample situation can be roughly
defined as one in which the ratio of the number of parameters to the number
of observations is not small.

54 PRINCIPLES OF BAYESIAN ANALYSIS WITH SELECTED


APPLICATIONS QUESTIONS AND PROBLEMS 1. Assume that the
following data represent experimental measures of yields of a new variety
of rice: 10.40, 10.36, 9.16, 10.03, 9.31, 9.75, 8.69, 9.89. If these data are
regarded as having been generated by independent drawings from a normal
distribution, derive the posterior distributions for the population mean yield
and the population standard deviation using a diffuse prior pdf for the mean
Ix and standard deviation , namely, p(ix, ) oc I/a, with -or < Ix < and 0 < < .
What are the first two moments of the marginal posterior pdf's for p and for
? 2. From (2.19), which is the posterior pdf for a standard deviation , derive
the posterior pdf for the variance of = e and expressions for the posterior
mean and variance of . Then show that the posterior pdf for x = vs2/ is the e
pdf with v degrees of freedom shown in (A.35) of Appendix A. 3. Suppose
that another set of experiments, independent of those referred to in Problem
1, produced the following yields for a variety of rice currently in use: 8.47,
7.35, 12.08, 7.83, 8.43, 10.29, 11.34, 8.40. If these data are regarded as
having been generated by independent drawings from a normal distribution
with mean 0 and standard deviation , what is the likelihood function for the
parameters p, 0, , and , given the data of Problems 1 and 3 If the , 0, log ,
and log are assumed to be independently and uniformly distributed a priori,
each with a range of - to , derive the joint marginal posterior pdf for and .
Then, using the analysis in Appendix A, Section 6, show that the posterior p
for = e/e is in the form of an F pdf. Compute and plot this pdf. Also
compare the posterior pdf's for d for p. How does the new variety of rice
compare with the one currently in use ? 4. If price, p, is related to yield per
acre, y, by p = a - aeNy, with ,z and a2 positive parameters and N a given
number of acres under cultivation, sales revenue is SR = Nyp = Nya - a(Ny)
e. If y is assumed to be random with mean p and varian e, obtain an
expression for the mathematical expecta- tion of SR, given N, ,z and ae.
How does ESR depend on p and e ? Use the posterior pdf for p and 2,
obtained in Problem 1, to obtain the posterior mean of ESR. 5. As an
alternative prior pdf for the analysis of Example 2.3 consider the following
natural-conjugate, "normal-gamma" pdf for p and : with a pdf in the
inverted gamma form (see Appendix A, Section 4), and QUESTIONS AND
PROBLEMS 55 a pdf in the univariate normal form, where Ixa, ha, ., and v.
are given param- eters of the prior pdf with h., *a, va > 0. (a) What is the
modal value of the marginal prior pdf for , px(.[.., "a)? What condition on v.
is required for the mean and the variance of this pdf to exist ? (See
Appendix A, Section 4.) (b) What are the mean and variance of the normal
conditional prior pdf for IX, given ,, IX, and h, po.(ix I , Ix, ha)? (c) After
integrating the joint prior pdf for ix and with respect to , show that the
marginal prior pdf for ix is in the form of a univariate Student t pdf. What
condition on v is required for the mean and variance of this pdf to exist ?
(See Appendix A, Section 2.) Given that this condition is satisfied, derive
the prior mean and variance of Ix. (d) Comment on other properties of the
marginal prior pdf's for ix and , particularly how their properties depend on
values of the prior parameters. 6. Combine the natural-conjugate prior pdf
in Problem 5 with the following normal likelihood function: /(ix, *IY) oc
exp 2cr2 (y, - Ix)2 oc- exp [rs + n(ix -/)] , where and Y'= (Yx, Y,...,Y,), v=
n- 1, rs2 = y. (y, _ 12)2, t2 = .. y,/n tO obtain the joint posterior pdf for v and
(a) What are the form and properties of the marginal posterior pdf for a ?
(b) What are the form and properties of the conditional posterior pdf for Ix
given a, the prior parameters' values, and the sample observations y ? (c)
What are the form and properties of the marginal posterior pdf for Ix ? 7. In
Problem 2 the posterior pdf for 4' = ae was obtained. Consider and compare
the following two loss functions: Lx = kx(4' - x) 2 and where the k's are
positive numerical constants and the $'s are point estimates. What values for
x and for 2 minimize expected loss ? 8. Contrast the point estimates for 4'
obtained in Problem 7 with the maximum likelihood and other sampling
theory estimators for 4' = ' in the normal mean problem with likelihood
function, as shown in Problem 6. In particular,

56 PRINCIPLES OF BAYESIAN ANALYSIS WITH SELECTED


APPLICATIONS consider the mean-square error of an estimator of the
following form = cs 2, where c is a constant. Show that E( - 4)', where the
expectation is with respect to the pdf for s 2 with 4 fixed, has a minimal
value for c = d0' + 2), where , = n - 1. In taking the expectation, note that
vs2/4 has a x 2 pdf with v = n - 1 degrees of freedom. Compare the
resulting estimator vs2/(v + 2) with what was obtained as point estimates in
Problem 7. 9. If z is a strictly positive random variable and if y = lnz is
normally distri- buted with mean t and standard deviation ,, then z is said to
have a log- normal distribution. What is the pdf for z ? Show that e" and e"
+ *' are the median and mean, respectively, of the pdf for z. 10. Let y denote
the natural logarithm of the ith individual's annual labor income, z,; that is,
y = lnz, i = 1, 2,..., n. Further assume that the y's are normally and
independently distributed, each with mean t and variance ,2. Then under the
assumption that the prior pdf for t and is P(t, ) oc 1/or, with-m < where 0 is
the median of the log-normal pdf for annual labor income, has a univariate
Student t form and thus 0 has a "log-Student t" posterior pdf. 11. In Problem
10 show that the natural logarithm of the mean, 7, of the log- normal pdf, In
with conditional mean/2 + �or 2 and conditional variance crY/n, where =
Y.I= ydn. Then provide the joint posterior pdf for In and and explain how
this bivariate posterior pdf can be transformed to one for indicate how the
latter bivariate pdf can be normalized and analyzed by employing bivariate
numerical integration techniques. 12. Suppose that a two-sided coin is fairly
tossed and comes down with the head side upward. Given this outcome for
a single toss, what is the likelihood function and the maximum likelihood
estimate of the probability 0 of obtaining a head on a single toss? If our
prior pdffor 0 is uniform, 0 _< 0 _< 1, what is the mean of the posterior pdf
for 0 ? On the other hand, if our prior pdf for 0 is given by (2.54), what is
the posterior mean ? How do you interpret the fact that the maximum
likelihood estimate and the posterior means are numerically quite different ?
13. In Problem 12 plot the posterior pdf's associated with the two diffeient
prior pdf's. What can be said about the precision with which 0 can be
estimated from a sample of size n = 1 ? What are the posterior variances
and the vari- ance of the maximum likelihood estimator ? 14. Assume that
of 10 randomly selected consumers from a homogeneous large population,
four responded that they bought product A and six responded that they did
not. Under the assumption that consumers' choices of products are
independent and that there is a common probability 0 of buying product A
for all members of the population, what is the likelihood function and
maximum likelihood estimate for 0 ? What is the posterior pdf for 0
employing the prior pdf in (2.54)? Find its mean and modal value and
provide a justi- fication for the selection of the posterior mean as a point
estimate of the probability of buying Brand A, given a quadratic loss
function. QUESTIONS AND PROBLEMS 57 IS. In Problem 14 obtain the
posterior pdf for 0 by using an informative prior pdf in the form of a beta
pdf With prior mean equal to 0.5 and prior variance equal to 0.024.
Compare prior and posterior moments. 16. What might have been the
source or sources of the prior information described in Problem 15 ?

CHAPTER III The Univariate Normal Linear Regression Model In this


chapter we first take up the analysis of the simple univariate normal linear
regression model and then turn to the normal linear multiple regression
modelY Throughout we adopt assumptions of normality, independence,
linearity, homoscedasticity, and absence of measurement errors. Certain
departures from these specifying assumptions and their analysis are treated
in subsequent chapters. 3.1 THE SIMPLE UNIVARIATE NORMAL
LINEAR REGRESSION MODEL 3.1.1 Model and Likelihood Function In
the simple univariate normal linear regression model we have one random
variable (hence the term "univariate"), the "dependent" variable, whose
variation is to be explained, at least in part, by the variation of another
variable, the "independent" variable. That part of the variation of the
dependent variable unexplained by variation in the independent variable is
assumed to be produced by an unobserved random "error" or "disturbance"
variable which may be viewed as representing the collective action of a
number of minor factors that produce variation in the dependent variable.
Formally, with the dependent variable, denoted by y, and the independent
variable denoted by x, we have the following relationship: (3.1) Y = fix +
fio. xi + ut, i = 1, 2,..., n, Bayesian analyses of the univariate normal linear
regression model appear in H. Jeffreys, Theory of Probability (3rd rev. ed.).
Oxford: Clarendon, 1966, pp. 147-161; D. V. Lindley, Introduction to
Probability and Statistics from a Bayesian Viewpoint. Part 2. Inference.
Cambridge: Cambridge University Press, 1965, Chapter 8; and H. Raiffa
and R. Schlaifer, Applied Statistical Decision Theory. Boston: Graduate
School of Business Administration, Harvard University, 1961, Chapter 13.
58 THE SIMPLE UNIVARIATE NORMAL LINEAR REGRESSION
MODEL 59 where y = ith observation on the dependent variable, x -- ith
observation on the independent variable, u, = ith unobserved value of the
random disturbance or error variable, and fix and rio. = regression
parameters, namely, the "intercept" and "slope coefficient," respectively.
Note that the relation in (3.1) is linear in fix,/go., and u,, hence the term
"linear" regression.o. Assumption 1. The u, i = 1, 2,..., n, are normally and
independently distributed, each with zero mean and common variance o .
Regarding the independent variable, we make the following assumption:
Assumption 2. The x, i = 1, 2,..., n, are fixed nonstochastic variables.
Alternatively, we can make the following assumption about x: Assumption
3. The x, i = 1, 2,..., n, are random variables distributed independently of the
u, with a pdf not involving the parameters x, o., and . To form the likelihood
function under assumptions 1 and 3, we write the joint pdf for y' = (Yx, Y.,.
�., Y) and x' = (x, xo.,..., x0, namely (3.2) p(y, xlx, a, ,a, 0) = p(yl x, x, a,
,a) g(x10), when 0 denotes the parameters of the marginal pdf for x. Since
by assumption (3) 0 does not involve , a, or a, the likelihood function for x,
a, and, can be formed from the st factor on the right-hand side of (3.2). Note
from (3.1) that for given x, , a, and a a, y will be normally distributed wkh
E(ydx,, , , ) = + =x and Var(y, lx{, ,, , ) = , i = 1, 2,..., n. Further, the y,,
given the x,, fix, fia, and , will be independently distributed. Thus we have
(3.3) p(Ylx, x, a, a) -- exp (y, - x - ax,) a , - 2 with the summation extending
from i = 1 to i = n. Also, (3.3) would have resulted had we adopted
Assumption 2 about the x rather than Assumption 3. e expression in (3.3),
viewed as a function of the parameters fix, fia, and � , is the likelihood
function to be combined with our prior pdf for the pameters. o. The relation
in (3.1) need not be linear in the "underlying" variables; for example, it may
be that y, = log w, where we, is the ith observation on an underlying
variable, or x may represent z �', where z, is an observation on an
underlying variable.

60 THE UNIVARIATE NORMAL LINEAR REGRESSION MODEL THE


SIMPLE UNIVARIATE NORMAL LINEAR REGRESSION MODEL 61
3.1.2 Posterior Pdf's for Parameters with a Diffuse Prior Pdf For our prior
pdf for fx, f. and p we assume that fx,/%. and log p are uniformly and
independently distributed, which implies 1 (3.4) p(fi. f., p) oc - 0<p<o.
Then, on combining (3.3) and (3.4), the joint posterior pdf for/gx, f., and p
is given by (3.5) 1 [, ] P(fi-/., plY, x) oc exp (y,-/gx - f.x,) ' � pn + 1 22 This
joint posterior pdf, which serves as a basis for making inferences about fx,
f2, and p, can be analyzed conveniently by taking note of the following
algebraic identity a' (3.6) Y. (Y, - fix - fax,) a = vs �' + n(fix -/x) a + (fi -
l�.) ' Y. x? + 2(fi -/3(fi. -/.) Y. x,, where , = n - 2, (3.7) (3.8) with y = n-x y
and = n-x y x. To establish (3.6) we write (Y,- fx - f.x) ' = E {(Y,- x - .x,) -
[(fix - x) + (fi. - .)x,]} '. On expanding the rhs, note that the cross-product
term vanishes and thus (3.6) results. On substituting from (3.6) in (3.5), we
have 1 p(fix,/92, plY, x) oc p +x (3.9) 1 [s o. x exp --a + n(fix -/x) ' + (/g.
-/.)' x, ' + 2(fix -/0(fo. - .) xd}' From (3.9) it is immediately seen that the
conditional posterior pdf for fx and f., given p, is in the bivariate normal
form with mean (/x,/.) and a If the expression in (3.6) is substituted in (3.3),
the likelihood function can be expressed in terms of s%/, and/2, which are
sufficient statistics. covariance matrix Of course, since p' is rarely known in
practice, this result is not very useful. To obtain the marginal posterior pdf
for fx and f. we integrate (3.9) with respect to p to obtain (3.10) oc [,s + 2(fi
- 0(fi - ) x,] -, which is seen to be in the form of a bivariate Student t pdf
(see Appendix B). From properties of the bivafiate Student t pdf we have
the following results: (3.11) P(fily, x) Iv+ (x'-m) ] , -<<, (3.12) P(fiIY, x) If
we make the following transformations in (3.11) and (3.12), [ s x?/n I (fi - )
= t and (3.14) s/[ (x,- g)] = the random variable t has the Student t pdf with
degrees of freom. These results enable us to make inferences about x and ,
using tables of the t-distribution. As regards the posterior pdf for a, it can be
obtained by integrming (3.9) with respect to x and . This operation yields 1
[ s (3.15) P(*IY, x) From (3.15) a is distributed in the form of an inverted
gamma function (see Appendix A). Thus we have [ /v F[( - 1)/2]. Var(a) =
s% (Ea). E. = side} r(d2) ' - 2 For the pdf in (3.15) to be proper we must
have v > 0; for the mean to exist we need > 1; and for the variance to exist
we need v > 2. See Appendix A.

62 THE LYNIVARIATE NORMAL LINEAR REGRESSION MODEL


Further, if we transform from to ', the posterior pdf for the variance is (3.16)
p(a�'ly, x) oc [(o.)/o.]-x exp -], 0 < < . Finally, the posterior pdf for the
precision parameter h = 1/ is given by (3.17) p(hly, x) h /a-i exp (_v), 0 < h
< . It is seen from (3.17) that the variable vsah has the X a pdf with v
degrees of freedom. From (3.16) and (3.17) we have, for example, Ea u =
vsU/( v - 2) and Eh = 1/s a. Further properties of these pdf's are discussed in
Appendix A. To make joint posterior inferences about x and flu we shall
show that the following quantity [n(x - x) + ( - ) E x, + 2(fix - x)( - ) x,]
(3.18) = 2s �' is distributed a posteriori as Fo..v. To show this let us write =
where 8' - (fi - ,/o. - .) and 111 (3.19) A ' x, x,o. ' Using this notation, the
posterior pdf for 8 is given from (3.10) by (3.20) p(81y , x)oc (1 + _2 8,A8)
Now, since A is positive definite, we can write A -- K'K, where. K is non-
singular and thus 8'A8 = (KS)'K8 = V'V, where V = K8 is a 2 x 1 vector and
V' = (vx, vo.). Then - hi2 (3.21) p(Vly, x) cc (1 + 2 V'V � Now let v = cos
0, vo. = sin 0. The Jacobian of this transformation is �. Note, too, that V'V
= v �' + v. �' = (cos �' 0 + sin �' 0) = . Thus (3.22) p(q'IY, x) 1 + - , v I
THE SIMPLE UNIVARIATE NORMAL LINEAR REGRESSION
MODEL 63 which is the Fo.,v pdf. s This result can be employed to
construct posterior confidence regions for fix and rio.. In (3.10) we noted
that/x and/%. are distributed in the bivariate Student t form. An important
property of this bivariate pdf is that a single linear combination of variables
so distributed has a pdf in the univariate Student t form. This result is
illustrated below in obtaining the posterior pdf of .1o, defined by (3.23)
E(ylx = Xo) = '1o = tg + t %.xo' It is seen that % is a linear form in/ and fi.
and thus will be distributed in the univariate Student t form with mean 4o =/
+/o. xo; that is .1o-4o ~ (3.24) s[1/n + (Xo - g)o./y. (x, - The result in (3.24)
can be derived by changing variables in (3.10) from/x and/o. to .1o and/o.
as follows: .1o - 4o = - + - - = - The Jacobian of this transformation is 1.
Then, on integrating out ., the marginal posterior pdf for .1o is given by
(3.25) [ n (x' - :g)�' ]- p(wolY, x) ocv + s o. Z (x, - Xo) a (% - 4�)a , -oo <
.1o < o. Then note that Y. (x{ - Xo) a = [(x, - g) - (Xo - g)]a = (x - )a + n(xo
- X') a and thus (3.24) follows. The re. sult in (3.25) provides the complete
posterior pdf for .1o and (3.24) can be utilized to construct posterior
intervals for %. 3.1.3 Application to Analysis of the Investment Multiplier
To illustrate application of the results presented above, we interpret (3.1) as
relating income, the dependent variable, to autonomous investment, the
independent variable. The parameter fia is then termed the investment
multi- plier. If our prior information about fi, fia, and a is vague, we can
employ the diffuse pdf in (3.4) to represent it. Note that this involves
assuming -oo </g. < o; that is, our prior information is not precise enough to
fix even the algebraic sign of the multiplier? Our data, taken from a paper
by In general p(F)oc F(m-2)m/(l + m/qF) (m+q)m is the Fm,q pdf with 0 <
F < o; see Appendix A. In early discussions some argued that the
investment multiplier might be positive, negative, or possibly zero.

64 THE UNIVARIATE NORMAL LINEAR REGRESSION MODEL


Haavelmo,* are given in Table 3.1. From these data, with n = 20, we
compute the sample quantities shown in (3.7) and (3.8)' / = 345, j%. = 3.05,
s ' = 662.8. Table 3.1 HAAVELMO'S DATA a ON INCOME AND
INVESTMENT Year Income b Investment � Income b Investment � ($
per capita, deflated) Year ($ per capita, deflated) 1922 433 39 1932 372 22
1923 483 60 1933 381 17 1924 479 42 1934 419 27 1925 486 52 1935 449
33 1926 494 47 1936 511 48 1927 498 51 1937 520 51 1928 511 45 1938
477 33 1929 534 60 1939 517 46 1930 478 39 1940 548 54 1931 440 41
1941 629 100 ' T. Haavelmo, "Methods of Measuring the Marginal
Propensity to Consume," J. Am. Statist. Assoc., 42, p. 88 (1947). Income is
per capita personal dispos&ble income deflated by the BLS Cost of Living
Index, 1935-1939 = 100. o Haavelmo defines investment to be the per
capita price-deflated difference between personal disposable income and
personal consumption expenditures. He is aware of the fact that this
empirical measure contains certain elements which are not strictly invest-
ment outlays. Further we have 1 =45.35 and (x-)'=285.55. Then, utilizing
the results in (3.13) and (3.14) along with tables of the Student t pdf, 8 we
obtain the posterior pdf's for/x and/. Shown in Figure 3.1. From the bottom
panel of Figure 3.1 we see that the posterior pdf for the multiplier/%. is
centered at/%. = 3.05, the posterior mean. Further, we see 7 T. Haavelmo,
"Methods of Measuring the Marginal Propensity to Consume," J. Am.
Statist. Assoc., 42, 105-122 (1947), reprinted in William C. Hood and T. C.
Koopmans, Studies in Econometric Method. New York: Wiley, 1953, pp.
75-91. 8 See, for example, N. V. Smirnov, Tables for the Distribution and
Density Functions of t-Distribution. New York: Pergamon, 1961. These
tables give values of p(tv), that is, ordinates of the Student t pdf with v
degrees of freedom. In our problem v = 18. To obtain ordinates of the
posterior pdf for f12 we have from (3.14) dry = k dl2 where k = [ (xt-
g?]V:/s. Thus p(tv)dtv = p(tv)k dla, and the ordinates of the posterior pdf for
fla are given by p(tOk, with the values ofp(tv) obtained from the tables.
0.024 0.020 0.016 0.012 0.008 0.004 I THE NORMAL MULTIPLE
REGRESSION MODEL I i I I I I I I I I 285 305 325 345 365 385 405 (a) 1
= 345 65 1.2F111 I I lill I 1.0-- 0.8-- - 0.6 ' 0,4 0.2 1.5 2.0 2.5 3.0 3.5 4.0 4.5
(b) 2 = 3.05 isure 3,1 ostedor pdf's for the itercept (z) and ivestmet
mu]tip]ie () based o aave]mo's model ad dat, ad diffuse prior distributions
fo pmmetes. () ostefior pdf fo imecept. (b) ostefio pdf fo the multiplier. that
thee is a almost elJib]e postefio pobabJ]it that the multip]ie Js Qeative. hus,
although o pfio beliefs did not peclude eative wIues [o the mu]tiplJe, use of
the peset model ad the information i the sample data has esuIted in a
postefio pdf that jdJcates that eafive values fo the multiplie ae improbable.
3.2 THE NORMAL MULTIPLE REGRESSION MODEL 3.2.1 Model and
Likelihood Function With the normal multiple regression model, we assume
that an n x 1 vector of observations y on our dependent variable satisfies
(3.26) y = X[5 + u, where X = an n x k matrix, with rank k, of observations
on k independent variables, [5 = a k x 1 vector of regression coefficients, u
= an n x 1 vector of disturbance or error terms.

66 THE UNIVARIATE NORMAL LINEAR REGRESSION MODEL We


assume that the elements of u are normally and independently distributed,
each with mean zero and common variance e'; that is Eu = 0 and Euu' =
e'In, where I is an n x n unit matrix. With respect to the matrix X, if the
regression equation is assumed to have a nonzero intercept, all elements in
the first column of X will be ones; that is, the first column is t with t' = (1,
l,..., 1). The remaining elements of X may be nonstochastic or stochastic, as
in Section 3.1. If elements of X are stochastic, it is assumed that they are
distributed independently of u with a distribution that does not involve the
parameters 15 and e. Under the above assumptions the joint pdf for the
elements of y, given X, 15, and e, is p(Yl x, 15, e) cc -- exp 1 e n (3.27) 1
oc- exp [rs ' + (15 - )'X'X(15 - )] , e n wherev=n-k, (3.28) and (3.29) = (x,x)-
X,y. s.= (y- xO)'(y- xO) are sufficient statistics. The second line of (3.27)
makes use of the following algebraic identity: (y- X15)'(y- X15)= [y- XO-
X(15- O)]'[y- X- X(15- )] = (y - x)'(y - x) + (15 - )'x'x(15 - ), since the
cross-product terms (15 - )'x'(y - x) = (15 - )'[x'y - x'x(x'x)-x'y] = o. 3.2.2
Posterior Pdf's for Parameters with a Diffuse Prior Pdf As prior pdf in the
analysis of the multiple regression model, we assume that our information
is diffuse or vague and represent it by taking the elements of 15 and log e
independently and uniformly distributed; that is (3.30) p(15, e) oc-, e 0 < e
< 2), for i = 1, 2,..., k. On combining (3.27) and (3.30), the joint posterior
pdf for the parameters 15 and e is (3.31) p(15, ely, X) oc e--ci exp 1 [rs . +
(15 _ )'X'X(15 - )] � THE NORMAL MULTIPLE REGRESSION
MODEL 67 From (3.31) it is seen immediately that the conditional
posterior pdf for 15, given e [i.e., �(151e, y, X)], is a k-dimensional
multivariate normal pdf with mean and covariance matrix (X'X)-e '.
Although this fact is interesting and useful in certain derivations, e ' is
rarely known in practice and thus the conditional covariance matrix (X'X)-e
' cannot be evaluated. To get rid of the troublesome parameter e, we
integrate (3.31) with respect to e to obtain the following marginal posterior
pdf for the elements of 15: (3.32) P(151Y, X) = f; p(15, ely, X) de (s + ( -
)'x'x( - ))-, which is in the form of a multivariate Student t pdL This
posterior pdf serves as a basis for making inferences about . Before turning
to further analysis of it, we note that the marginal posterior pdf for can be
obtained from (3.31) by integrating with respect to the elements of ; that is,
(3.33) 1 ( vs exp -], which is in the form of an inverted gamma pdf and
exactly in the same form as (3.15), except that here v = n - k. By simple
changes of variable the posterior pdf's for ea or h = 1/e a can be obtained
from (3.33) if they are wanted. We now return to the analysis of (3.32), the
marginal posterior pdf for . First, we derive the marginal posterior pdf for a
single element of , say fi. This can be done in two ways, namely, by
integrating (3.31) with respect to fie, fie, � �., fi and then with respect to e
or by integrating (3.32) with respect to fie, fie,..., . We take the second route
here. For convenience rewrite (3.32) as follows: (3.34) p(IY, x) ( + 'H) -,
with ' = ( - )' and H = X'X/s . Now let x = fi - , a scalar, and ' = (fie - fie, fia -
fie,..., fi - fi). Then (3.3) a'Ha = &h + a'Ha + 2Ha, where H has been
partitioned to conform with the partitioning of ; that is I ' i-z \i-I. : with hn, a
scalar, Hx. a 1 x (k - 1) vector, H.x a (k - 1) x 1 vector and H.. a (k - 1) x (k
- 1) matrix. Now complete the square on $. in (3.35)

68 as follows THE UNIVARIATE NORMAL LINEAR REGRESSION


MODEL (3.36) + = - + ( + 8xH22-XHux)'Hu2($2 + xH22-XH2x). Then
substitute from (3.36) in (3.34) to obtain (3.37) x + + + with C = Haa/(v +
xa/hXX), where h = (hxx - HxaHaa-XHax) -, the (1, 1) element of the
inverse of H. Now (3.37) can be integrated with respect to by using the
properties of the multivariate Student t pdf to yield �' (3.38) Thus (3.39)
(hXX)� s(mXX) ~ tv, where m xx is the (1, 1) element of (X'X)-. Since the
choice of which element of is labeled x is open, we have for the ith element
of ' (3.40) 8 - - * (h,) s(m,) t,. This result enables us to make inferences
about , conveniently by consulting the t tables for = n - k degrees of
freedom. Note also that a simple change of variable in (3.38), namely, F =
xa/h = ( - x)a/sam , yields (3.41) P(F) F-(v + F) -(+x)/, 0 < F < , and thus F
= (fix - x)/sm xx has the Fx. pdf. If we are interested in the marginal
posterior pdf of a subset of elements of (or of = - ), we can partition ' = ('')
and write the quadratic form 8'H8 appearing in (3.34) as follows' [H I H/ 9
The integration of ICI[1 + ( + z-xHz)'C( + zH-zHz)]-*� with respect to $2
yields a constant independent of 8x. Since C involves z, it must appear in
(3.38). THE NORMAL MULTIPLE REGRESSION MODEL 69 where the
partitioning of H has been done to conform with that of [i. On completing
the square on [io. in (3.42) we have [i' H[i = [i' H [i - [i' H .Ho.o.- H. [i + +
+ which, when substituted in (3.34), yields (3.43) p([ix, [i.ly, x) oc [,, +
[ix'(Hxx - + ([io. + H.o.-XH.x[iO'Ho..([io. + H..-XHo. x[ix)] -"/'. As an
aside, we see from (3.43) that the conditional pdf for [i., given p([i.I[ix, y,
X), is in the multivariate Student t form with conditional mean -H.o.-XHo.
x[ix. Thus, since [i. = 1S. - o., the conditional mean of 1So., given To obtain
the marginal posterior pdf for fix, we have to integrate (3.43) with respect to
[i. which can be done by using properties of the multivariate Student t pdf.
This operation results in x� (3.44) p([ixJy, X) oc [u + [ix'(Hxx - Hx2H2.-
XH20[Ix] -('-)12, where ka is the number of elements in [io.. Note that n -
ka = n - k + k - ka = v + kx so that the exponent of (3.44) is -(v + kx)/2.
Thus from (3.44) the marginal posterior pdf for [ix is in the multivariate
Student t form with E[ix =E(lax-x)=0, that is Elax = x, v > 1, and E(lx - )
(1S - 0 '= [(v/(v- 2)](H - HaHao.-H.x) -x, v > 2, where it is to be
remembered that the matrices H,t, a, l = 1, 2, are submatrices of H =
X'X/s% Next, using the prior assumptions in (3.30), we turn to the problem
of deriving the posterior pdf of a linear combination of the elements of 1S,
say = 1'[5, where a is a scalar parameter and 1' is a 1 x k vector of fixed
numbers; for example, if the elements of [5 are Cobb-Douglas production
function parameters, we might be interested in c =/x +/%. +... +/, which is
the "returns to scale" parameter. In this application 1'= (1, 1,..., 1). In other
situations other linear combinations of the elements of 1 may be of interest.
To obtain the posterior pdf of a = 1'1S we note that the joint posterior pdf
for [5 and a can be written as (3.45) P([S, *IY, X) = p(�l *, y, X)P(*IY, X),
x0 Write (3.43) as p(fx, 8alY, X)oc {iv + 8x'(Hzx - HxaHaa- XHx)fx]-
'ia[C[- V.,} x {ICl[l + ( + Ha2-XH2zSz)'C($2 + H22-XHaz$1)]-nla}, with C
-- H2a/[v + $x'(Hxx - Hx2H22- xH2x)fx]i By integrating over 89. the
second factor yields a constant independent of $ and the first factor is
proportional to the expression shown in (3.44).

70 THE UNIVARIATE NORMAL LINEAR REGRESSION MODEL with


p([5[er, y, X) in the multivariate normal form. Thus, conditional on er, =
1'[5 will be normally distributed, since it is a linear combination of
normally distributed variables, the elements of [5, with mean '& - 1', and
conditional posterior variance E[(a - &)�'ler , y, X)] = Eli'([5 - )([5 -
O)'ller, y, X] = l'(X'X)-qer ', since the conditional covariance matrix for [5,
given er, is (X'X)-Xer '. Thus the marginal posterior pdf for a can be
obtained by integratingp(a, erIy, X) = P(ler, Y, X)p(erly, x) with respect to
er. Letting c = I'(X'X)-q, we have 1 [ (a - &)'] p(ler, y, x) oc exp -?'7 ]' 1 (
P(erlY, X) oc er--rri exp and P(IY, X) = f; p(ler, Y, X) p(erlY, JO der (3.46)
1 - + c err + o. exp vs ' der o: [,, + - ?'l -(" + ,S'9'� J Thus a has a posterior
pdf in the univariate Student t form; that is, (3.47) a - & sc� tv, with v = n -
k, & = 1' and c = I'(X'X)-q. This fact can be utilized to make inferences
about a. xx 3.2.3 Posterior Pdf Based on an Informative Prior Pdf We next
take up the problem of using the posterior pdf in (3.31) as a prior pdf in the
analysis of a new sample of data.generated by the same regression process.
To distinguish between the two samples subscripts 1 and 2 are employed.
With this notation the posterior pdf in (3.31), which we use as a prior pdf
for analysis of a new sample, is 1 [--a (Y - Xx[5)'(y - X[5)] P([5, erJYx, Xx)
oc exp 1 ern 1 + 1 (3.48) 1 oc exp [vxsx �' + ([5 - [x)'Xx'Xx([5 - x)] , ern I
+1 x x See, Appendix B for the derivation of the joint distribution of several
linear com- binations of variables having a multivariate Student pdf. THE
NORMAL MULTIPLE REGRESSION MODEL 71 with v = n - k, =
(X'XO-X'y and vs �' = (y - X)'(y - X). Viewing (3.48) as a prior pdf, we
see that it factors into a normal part for [5, given er, with mean x and
covariance matrix (Xx'Xx)-xer ' and a marginal pdf for er in the inverted
gamma form with parameters vx and sx�'; that is, from the second line of
(3.48) 1 [-'ffo. ([5- i)tXltXi(P -- P([sler, 1, Sl �') oc 3- exp 1 and 1 ( vsa
P(erl , sx a) oc exp � Thus the prior pdf's parameters are just the quantities
[, s a, X'X, and v. The likelihood function for the second sample (Ya, Xa),
where yo. is an na x 1 vector of observations on the dependent variable in
the second sample and Xa is an na x k matrix with rank k of observations
on the k independent variables in the second sample, is assumed to be given
by 1 [-(Y- )(Y-X)] (3.49) l(p, *lYe, X) m exp 1 x ' � Note that and a for the
second sample are assumed to be the same param- eters as those for the first
sample. On combining the prior pdf in (3.48) with the likelihood function in
(3.49), we obtain the posterior pdf: 1 p(P, *lyx, Y2, Xx, Xa) n 1 + 2 + 1
(3.50) x exp - [(Yx- Xx)'(Yx- Xx) + (y - ) (y - x)] � This expression can be
brought into a more convenient form on completing the square in the,
exponent; that is, (y- X)'(y- X)+ (y- X)'(y- X) = , + ( - )'( - ), where M = Xx'
Xx + Xa' Xa, = M-x(Xx'yx + Xa'ya), s= (y- x)'(y- x')+ (y- x)'(y- x), Thus
(3.50) can be written as (3.5) p(, *lye, y, x, x) m exp [vs a + ( - )'M( - )] �

72 THE UNIVARIATE NORMAL LINEAR REGRESSION MODEL It is


seen that (3.51) is in exactly the same form as (3.31) and thus it can be
analyzed by using exactly the same techniques. Further, if we had pooled
the two samples, based our likelihood function on both samples, and used
the diffuse prior pdf in (3.30), the resulting posterior pdf would be exactly
(3.50), from which (3.51) can be derived. 3.2.4 Predictive Pdf In this
section we derive the predictive pdf for a vector of q future observa- tions,
y' = (y + , y +.,..., y + ), which is assumed to be generated by the multiple
regression process specified at the beginning of this section; that is, with n
sample observations y, given X, and with the diffuse prior assumptions in
(3.30), we wish to derive the pdf for y which is assumed to be generated by
(3.52) = g + fi, where g is a q x k matrix of given values for the independent
variables in the q future periods and fi is a q x 1 vector of future disturbance
terms normally and independently distributed, each with mean zero and
common variance As mentioned in Chapter 2, one way of deriving the
predictive pdf is to write down the joint pdfp(, , a[ X, g, y) and integrate
with respect to and a to obtain the marginal pdf for , which is the predictive
pdf. In the present problem this joint pdf factors as follows' (3.53) p(Y, , 1
X, , Y) = P(Y[, , )P(, lY, X) with p(, lY, x) being the posterior pdf for and a,
shown in (3.31) and (3.54) p(y[O, a, ) m 7 exp - (y - )'(y - � With this
noted, (3.53) is proportional to (3.55) and our problem is to integrate (3.35)
with respect to a and . On integrating with respect to a we obtain (3.56) p(,
[y, X, g) [(y - X)'(y - X) + ( - g)'( - g)]-t +q,/. On completing the square on
we have (y- x)'(y- xo)+ (y- gO)'(y- ) = y'y + y'y + 'M- 2'(X'y + 'y) = y'y + ' -
(y'X + y')M-X(X'y + ') + [- M-x(X'y + 'y)]'M[- M-x(X'y + ')], THE
NORMAL MULTIPLE REGRESSION MODEL 73 where M = J('J( + 'g.
On substituting in (3.56) and integrating with respect to the k elements of ,
we obtain (3.57) p(yJy, X, g) oc [y'y + y'y - (y'X + y'g)M-x(X'y + where v =
n- k. To put (3.57) in a more intelligible form we write the quantity within
brackets on the rhs of (3.57) as (3.58) y'(I- XM-xX')y + y'(I- gM-Xg,)y_
2y,gM-XX,y = y'(I- XM-xX')y_ y,XM-Xg'(i + [ - (I- gM-g')-XgM-xX'y]'(I-
gM- x [- (I- iM-Xg')-xgM-xX'y]. Now note the multiplication' (3.59)
following result which can be verified (I- gM-g') - = I + g(X'X)-t g '. by
direct matrix Using this result, we have (I- gm- g')- gM - (3.60) = [I +
g(X'X)-Xg']gM = = g[1 + (X'X)-Xg'g](X'X + = g(x'x)-'. Substitution of this
result in (3.58) leads to y'(I- XM-XX')y - y'XM-X g'g(X'X)-X'y + (Y-
g0)'(I- ,M-Xg')(Y- g0) = y'{I- X[M - + M-xg'g(X'X)-x]X'}y + (- gO)'(I- gM-
g')(Y- gO) = y'[I- XM-X(X'X + g'g)(X'X)-xX']y + (y- g)'(I- gM-g')(y- gO) =
y'[I- X(X'X)-X'Iy + ( - g)'(I- gM-g')(y - gO), where = (X'X)- X'y. Utilizing
this result, and since y'[I - X(X'X)-xX']y = (y - XO)'(y - XO) = vs a, we can
write (3.57) as (3.61) P(YIY, J/, g) oc [1, + (y - gO)'/-/(Y - gO] -(+)'a, where
H = (1/s')(I - gM-xg'). It is seen from (3.61) that y is distributed in the
multivariate Student t form. Thus we have for the mean of (3.62) Ey: g0 > 1
74 THE UNIVARIATE NORMAL LINEAR REGRESSION MODEL and
for the covariance matrix (3.63) �(y- = H - v>2 v$ 2 v - 2 (I- gM-g') - - 2
N + Also, of course, the properties of the multivariate Student t pdf,
established in connection with (3.32), apply here as well; for example, the
marginal pdf of a single element of , say , will be in the univariate t form:
(3.64) where (i) is the ith row of � and h is the (i, i)th element of the
inverse of H. Last, as iri (3.32) and (3.47), a linear combination of the
elements of will be distributed in the univariate Student t form; that is, let I
be an q x 1 nonstochastic vector with given elements. Then V = 1' will be
distributed in the univariate Student t form: (3.65) (l,H_q) where A
particular linear combination of future observations often encountered in
economic work is the following: (3.66) where r is a given discount rate. For
(3.66) r= [O + r) 0 + r)-:,..., 0 + r)-q. With the result in (3.65) available, the
distribution of the quantity V in (3.66) is known. Further, if we have a
utility function depending on V, say U(V), its expectation can be evaluated:
(3.67) EU(V) = f U(V) p(V[y) dV, since from (3.65) we know the pdf for V,
p(Vly). mputation of (3.67) provides a means of comparing the expected
utility associated with variom xa We could easily modify (3.66) to
incorporate different discount rates for different future periods if that were
thought appropriate; that is V- I +r (1 +ra) a +'" +(1 +r) THE NORMAL
MULTIPLE REGRESSION MODEL 75 V's, provided that each V is a
linear combination of the future observations generated by a normal
regression model. 3.2.5 Analysis of Model when X'X is Singular The
moment matrix X'X will be singular when the n x k matrix X is of rank q
with 0 < q < k. This occurs, for example, when the observations on the
independent variables satisfy an exact linear relation and thus the columns
of X are not linearly independent; that is, for k = 2, X = (xx, x.), if xx and
xo. satisfy an exact linear relation it can easily be shown that ]X'X[ = 0 and
thus X'X is not of rank k = 2. This situation is commonly termed
"multi9ollinearity." Another example in which X cannot be of rank k is
when n < k; that is, when the number of observations is less than the
number (k) of independent variables or coefficients to be estimated. This
problem often arises in connection with analyses of reduced-form equations
associated with large simultaneous equation econometric models. xa Also,
design matrices in the area of experimental design are often not of full rank.
4 When X'X is singular, for whatever reason, it is generally appreciated that
prior information, in some form, must be added to the sample information
in order to estimate all k regression coefficients. Below we analyze the
model with a natural conjugate prior pdf. Then we review a sampling theory
approach that utilizes generalized inverses and give it a Bayesian inter-
pretation. xs In the approach to the analysis of the regression model when is
singular, employed by Raiffa and Schlaifer x� and Ando and Kaufman, x?
it is assumed that we have prior information about 15 and which can be
represented by the following natural conjugate prior pdf: (3.68) P(15, *) =
P(151)P(*), xa See, for example, F. M. Fisher, "Dynamic Structure and
Estimation in Economy- wide Econometric Models," in J. S. Duesenberry
et al., The Brookings Quarterly Econo- metric Model of the United States.
Chicago: Rand-McNally, 1965, pp. 589-635, especially p. 622 if. x Se, for
example, F. A. Graybill, An Introduction to Linear Statistical Models, New
York: McGraw-Hill, 1961. x5 The material in this section appears in S. J.
Press and A. Zellner, "On Generalized Inverses and Prior Information in
Regression Analysis," manuscript, September 1968. x5 H. Raiffa and R.
Schlaifer, Applied Statistical Decision Theory. Boston: Division of
Research, Graduate School of Business Administration, Harvard University,
1961. x, A. Ando and G. M. Kaufman, "Bayesian Analysis of the
Independent Multinormal ProcessNeither Mean Nor Precision Known," J.
Amer. Statistical Association, 60, 347-358 (1965).

76 THE UNIVARIATE NORMAL LINEAR REGRESSION MODEL


where (3.69) IAI 1 )] p([5[.) oc 7 exp [---= ([5 - )'A([5 - and 1 ( VoCo"\
(3.70) p(a) o5: ' ,o + exp -ff/=., vo > 0. In (3.69) we have a normal prior pdf
for [5, given a, with prior mean and prior covariance matrix a"A-x, which is
assumed to be nonsingular. In (3.70) the prior pdf for a is in the inverted
gamma form with prior parameters Vo and Co". The prior parameters , A,
Vo, and Co" must be assigned appropriate values to represent the prior
information that is assumed to be available. The prior pdf's in (3.69) and
(3.70) can easily be combined with the likeli- hood function' in the first line
of (3.27) to yield the posterior pdf for [5 and (18: P([5, 'IY) oc (3.71) 1
a,++Vo+ exp [VoCo" + ([5 - )'A([5 - ) + (y- x[5)'(y- x[5)]} a,,++. exp -5.-
ffi=[n'c" + ([5- )'(A + X'X)([5- where n' = n + Vo, n'c" = VoCo" + y'y + 'A -
'(A + X'X), and (3.72) = (A + X'X)-(A + X'y). On integrating (3.71) with
respect to , the marginal posterior pdf for [5 is (3.73) p([Sly ) oc [n'c" +
which is a proper posterior pdf in the multivariate Student t form with mean
, shown in (3.72). Using (3.73), posterior inferences about all the elements
of .[5 can be made. Thus, given that we have prior information which can
be adequately represented by the prior pdf's in (3.69) and (3.70), the
Bayesian analysis of the model is quite straightforward, even though X'X is
assumed to be singular. We next review a sampling theory approach to the
analysis of the model x8 In going from the first line of (3.71) to the second,
we complete the square on [ as follows: ([ -- )'A( - ) + (y -- X[)'(y - X[) =
'(A + X'X) - 2['(A + X'y) + y'y + 'Alg=([- )'(A + X'X)([ - ) + Y'Y + 'A[- '(A
+ X'X)l,with[= (A + X'X)-(A + X'y). THE NORMAL MULTIPLE
REGRESSION MODEL 77 when X'X is singular and give it a Bayesian
interpretation. It is convenient and illuminating to reparameterize the
model: (3.74) y = XP� + u where �, a k x 1 vector of parameters, is given
by (3.75) and P is a k x k orthogonal matrix x� such that (3.76) P'X'XP= ()
00), where D is a q x q nonsingular diagonal matrix with the nonzero
charac- teristic roots of X'X on the diagonal. Then the "normal equations"
for � are (3.77) P'X'XP� = P'X'y or <,.,,, \P..'X'yl where �' = (x'i�.'),
with �1 and �. q x 1 and (k - q) x 1 vectors, respec- tively, and P1 is a k x
q submatrix of P given by P = (PliP0. From (3.76) P 'X' = 0, and thus the
vector on the rhs of (3.78) has a zero note that . subvector. The complete
solution of the normal equations in (3.78) is given by "� (3.79) =
(P'X'XP)*P'X'y + [I- (P'X'XP)*P'X'XP]z, where (P'X'XP)* denotes a
generalized inverse " (131) of P'X'XP and z is an x Given that P is
orthogonal, on substituting from (3.75) into (3.74) we have y = XPP'[ + u =
X + u, since PP' = I. a0 See, for example, C. R. Rao, Linear Statistical
Inference and Its Applications. New York: Wiley, 1965, p. 26. a M* is a GI
of M if, and only if, MM*M = M. This definition does not make M* unique,
as is well known. For further discussion of GI's, see C. R. Rao, ibid., pp.
24-26, and "A Note on a Generalized Inverse of a Matrix with Applications
to problems in Mathematical Statistics," J. Roy. Statistical Soc., Series B,
24, 152-158 (1962); T. N. E. Greville, "Some Applications of the
Pseudoinverse of a Matrix," SIAM Rev., 2, 15-22 (1960), and "The
Pseudoinverse of a Rectangular or Singular Matrix and Its Application to
the Solution of Systems of Linear Equations," ibid., 1, 38--43 (1959); R.
Penrose, "A Generalized Inverse for Matrices," Proc. Cambridge Phil. Soc.,
51, 406-413 (1955); and E. H. Moore, General Analysis, Part I.
Philadelphia: Memoirs of the American Philosophical Society, Vol. I, 1935.

78 THE UNIVARIATE NORMAL LINEAR REGRESSION MODEL


arbitrary k x 1 vector. On direct substitution of 9, given in (3.79), into the
normal equations it is seen that is a solution �'�' to the normal equations
for any GI of P'X'XP and for any z. To show how the choice of a GI and a
choice of z affect solutions to the normal equations note that for any
selection of the matrices C, E, and F is a GI of P'X'XP. On sub- stituting
from (3.76) and (3.80) into (3.79) we have, with z' = (zx'!zo.'), (3.81a) cg =
9. \ CPx'X'y ] + = z- CDz or (3.81b) 9 = D-Px'X'Y and 9. = z: + C(P'X'y -
Dz) (3.810 = z. + CD(x - zx). We see from (3.81b) that the estimator for yx
is independent of the choice of GI and of z. However, o. in (3.810 obviously
depends on C and thus on the choice of a GI and on the choice of zx and If,
for example, we use the Moore-Penrose GI for P'X'XP, namely, (3.82)
(P'X'XP)* = ()- 00)' or any GI with C = 0 [see (3.80)], the estimator for �
in (3.81a) is unaffected. However, 9o. in (3.81c) becomes (3.83) o. = za,
where z. is arbitrary. To see what this' analysis implies for the estimation of
la'we have la = P� from (3.75), and thus from (3.79) = J'9 = j,
(j,'x'xJ,)*J"x'y + [J' - (3.84) = (x'x)*x'y + [I- 9. That is, with N = P'x'xP, the
lhs of (3.77) is, with y = ', N, -- NN*P'X'y+ N(I- N*N)z = NN*P'X'y + (N -
NN*N)z -- NN*P'X'y -- P'X'y since from the definition of N*, NN*N N,
and NN,P,X,y =.(O D )(5 x E\ [P'X'\ = o I y= = P'X'y, given that Pa'X' = 0
from (3.76). THE NORMAL MULTIPLE REGRESSION MODEL 79 since
P(P'X'XP)*P' = (X'X)*, a generalized inverse of X'X, the singular moment
matrix? Then from (3.81) (3.85) and it is obvious that the estimates and o.
both depend on the choice of the GI and of z. We now wish to use the
Bayesian approach to analyze the model, when X'Xis singular, with a prior
pdf representing the prior information employed in the GI approach
described above. The likelihood function, in terms of the parameters �x,
o., and e, is given by l(�, o., ely) oc- exp - (y - XPy)'(y - XP�) (3.86) oc
exp -3- [a + (�x - 90'D(�x - 90] , where x = D-xPx'XY and a = (y -
XPx90'(Y - XPxgx). It is of funda- mental importance to observe that the
likelihood function does not .depend on �9.. 9'4 In the GI approach based
on a GI in the form of (3.80) with C : 0 we represent the prior information
regarding �x and e by the following improper diffuse pdf: 1 (3.87) P(Yx,
e) oc-, -oo < 7x < oo, i= 1,2,...,q, 0 < e< oo. The remaining prior
information, corresponding to (3.81b), takes the form of the following
linear relations or side conditions connecting �x and (3.88) �o. = zo. +
CD(�x - zx). These relations, for example, may be suggested by economic
theory or other considerations. The matrix C and the vectors zx and zo.
must be assigned values in accord.with the assumed available prior
information. Combination of the prior pdf in (3.87) with the likelihood
function in (3.86) produces the posterior pdf for �x and e: 1 (- [a + (y -
90'D(y - �&)]}, (3.89) P(�x, ely) oc e-r, x exp 1 o.a From the definition
of a GI, P'X'XP(P'X'XP)*P'X'XP = P'X'XP. On premultiply- ing both sides
by P and postmultiplying by P' we have X'XP(P'X'XP)*P'X'X = X'X; that
is, P(P'X'XP)*P' is a GI of X'X. 0.4 This is perhaps more easily seen fromy
= XPy + u -- X(Pi;Po.)()+u= XPiyi + u, since XPa = 0 from (3.76).

80 THE UNIVARIATE NORMAL LINEAR REGRESSION MODEL


which will be proper if n - q > 0. The posterior pdf for �x has mean 'x.
Further, from (3.88) the posterior expectation of �. is (3.90) Eye. = z. +
CD( - z0, which is identical to the point estimate in (3.81b) yielded by the
sampling theory GI approach. Thus the improper prior pdf in (3.87),
combined with the a priori relations in (3.88), yields posterior means for
�x and ya which are identical to estimates provided by the GI approach.
When C = 0, as would be the case in the GI approach if we used the Moore-
Penrose inverse (3.82), the a priori relations in (3.88) reduce to (3.91) 2 =
Z2. In Bayesian terms, if we assign a value to zo., and thus �a, in (3.9I),
this represents "dogmatic" prior information about Ya; that is, the prior pdf
for 2 is degenerate with all its mass at the point za. Further from (3.88) it is
clear that taking C = 0 removes any dependence of yo. on yx from the
analysis. The prior assumption in (3.91) is rather restrictive in many
situations, given that the value of �2 is not known precisely. One way we
can loosen this prior assumption is by taking (3.92) p(y2l) oc -7:3 exp - (yo.
- z0'Q(yo. - z2) to be the prior pdf for the k - q elements of Y2 and
assuming Y2 independent of yx. Given that Q is nonsingular, the mean of
this pdf is Ey2 = za and Vat(y2) = Q-xa. Thus by using (3.92) we can relax
the prior assumption in (3.91). When we combine the prior pdf's in (3.87)
and (3.92)aS.with the likelihood function (3.86), the posterior pdf is (3.93)
�o., ely) On transforming from to 15 this posterior pdf becomes as v.s
Here, as with C = 0, we are assuming yx ant ya independent a priori. 6 For
comparison with (3.73) the following is the prior pdf for , given .*, implied
by the prior assumptions regarding yx and ya in (3.87) and (3.92): p(tla)oc
exp[-(1/2oa)(Pa' - za)'Q(Pa'[ - za)]. Since the prior pdf for y' = (yx'iya') is
im- proper, this prior pdf for is also improper. THE NORMAL MULTIPLE
REGRESSION MODEL 81 (3.94) p(15, 'IY) oc exp - [a + (15 - )'PFP'(15 -
)1 , with = Pet, where 9' = (x'!za') and (3.95) F=() 0Q). Given that n - q > 0,
the posterior pdf in (3.94) is proper, since PFP' = t'xDPx' + P.QPa' is a
nonsingular matrix even though PxDPx' = X'X and p, PaQ a are each
singular. The pdf in (3.94) can be used to make posterior inferences about
15 and a. Last, it is interesting to inquire what happens if we introduce a
diffuse prior pdf for all elements of 15 when X'X is singular. Given that we
assume (3.96) p(15) oc constant, -c < fit < c, i = 1, 2,..., k, the posterior pdf
for 15, given , is just the following improper pdf: 1 x15)] p(151a, y) oc exp
[- (y - X15)'(y - (3.97) ] oc exp - (15 - )'X'X(15 - ) , where fi is any solution
to the normal equations, that is, X'X = X'y. As (385) indicates, is not
unique. Since 15 = P� from (3.75), we can express (3.97) in terms of � as
follows: [' ] (3.98) P(�x, �.1 a, Y) oc exp - (�x - x)'D(�x - ) � This
posterior pdf is the product of a proper normal pdf for �x, with unique
mean C&. and a diffuse posterior pdf for �., which, of course, is identical
to the prior pdf for o., since the likelihood function in (3.86) does not
involve �.. Since the marginal posterior pdf for �x in (3.98) is a proper
normal pdf, the distribution of the q x 1 vector 0x = Hx'�x, where Hx is
any q x q nonsingular matrix, will be proper and therefore inferences can be
made about 0, given a; for example, E(0la, y)= Hx'x and Var(0xla, y)=
Hx'D-XHxa �'. To relate these results to 15 note that X15 = XP� =
XPx�x, since XPa = 0. If R' is any q x n matrix of rank q and R'XPx is a q
x q nonsingular matrix, then R'X15 = R'XPx�x is a q x 1 vector with a
proper normal pdf. Thus, even though X'X is singular and we introduce no
prior information, it is possible to make inferences about q linearly
independent combinations of the elements of 15, R'X15. In sampling theory
terms, the q linear functions of the elements of 15, R'X15, are called
"estimable functions." a See, for example, Graybill, op. cit., p. 227 if.

82 THE UNIVARIATE NORIVIAL LINEAR REGRESSION MODEL


QUESTIONS AND PROBLEMS 1. If the investment variable in the
investment multiplier model in Section 3. is considered random, what
assumptions about it are needed for the anal of Section 3.1.3 to be
appropriate? 2. Using the prior assumptions employed in Section 3.1.3 and
the data in Ta 3.1, derive and plot the posterior pdf for 2, the common
variance of error terms in the investment multiplier model. Show that a
posteft vs2/a a has a x a pdf with v = n - 2 degrees of freedom and use this
fact construct a 957o Bayesian confidence interval for the variance. 3.
Using the data, assumptions, and investment multiplier model of Sect:
3.1.3, derive the predictive pdf for income, given that investment assume
value of 50. Compare the mean and variance of this predictive pdf with tl of
the predictive pdf for investment with a value equal to 100. 4. In Problem 3
compute a predictive interval with probability 0.80 of includ the unobserved
value of income associated with investment equal to 50. i the same for the
value of income associated with investment equal to 1 Compute a predictive
region with probability 0.80 of including the values income associated with
investment equal to 50 and 100. Compare the t intervals with the region and
interpret the comparison. 5. Suppose, in the analysis of the investment
multiplier in Section 3.1.3, that had taken a Keynesian viewpoint by
relating the investment multiplier/%. the marginal propensity to consume a:
/ = 1/(1 - a), with satisfy 0 < < 1 on a priori grounds. (a) What prior
restriction on the range of/2 is implied by the conditl 0<<17 (b) If is
assumed uniformly distributed over the interval zero to one, w; is the
implied pdf for/a ? Comment on its properties. (c) If has a beta pdf with
parameters a and b (see Appendix A, Section what is the implied pdf for/o.
= 1/(1 - a) with 0 < a < 1 ? (d) In the analysis of the investment multiplier
model with the data of Ta 3.1, assume that the prior pdf for the parameters
is given by p(/x,/a, or) g()/a, with -c < / < c, 0 < cr < 0% and g(/2), the prior
1 obtained in (b) of this question. Derive the posterior pdf for /a. univariate
numerical integration procedures (see Appendix C) to normal it. Then
compare the results with the posterior pdf for/a plotted in S tion 3.1.3 to see
how sensitive results are to changes in prior assun tions. 6. Consider the
posterior pdf for parameters of the normal multiple regressi model shown in
(3.31). What is the conditional posterior pdf for 1, given Give its mean
vector and variance-covariance matrix. 7. Suppose that in a normal multiple
regression model y = X[$ + u we p tition X = (Xx ! Xa) and I' = (lx'la') and
write y = Xxlx + Xo.152 + How does the condition Xx'Xa = 0 affect
properties of the conditio: QUESTIONS AND PROBLEMS 83 posterior
pdf for 1' = (lx'[ Io.'), given (y, which was derived in Problem 6? Also, what
does the condition Xx'X'a = 0 imply about the marginal posterior pdf for 15'
= ([$x'ila') shown in (3.32) of the text? In particular, does the condition
Xx'Xa = 0 imply that elements of [$x will be uncorrelated and distributed
independently of those of gla ? 1957 U.S. ANNUAL SURVEY OF
MANUFACTURES DATA FOR THE TRANSPORTATION EQUIPMENT
INDUSTRY State No. of Aggregate Aggregate Aggregate Establish- Value
Added, Capital Service Man-Hours ments, V Flow a K Worked, b L N
(millions of dollars) (millions of man-hours) Alabama 126.148 3.804
31.551 68 California 3201.486 185.446 452.844 1372 Connecticut 690.670
39.712 124.074 154 Florida 56.296 6.547 19.181 292 Georgia 304.531
11.530 45.534 71 Illinois 723.028 58.987 88.391 275 Indiana 992.169
112.884 148.530 260 Iowa 35.796 2.698 8.017 75 Kansas 494.515 10.360
86.189 76 Kentucky 124.948 5.213 12.000 31 Louisiana 73.328 3.763
15.900 115 Maine 29.467 1.967 6.470 81 Maryland 415.262 17.546 69.342
129 Massachusetts 241.530 15.347 39.416 172 Michigan 4079.554 435.105
490.384 568 Missouri 652.085 32.840 84.831 125 New Jersey 667.113
33.292 83.033 247 New York 940.430 72.974 190.094 461 Ohio 1611.899
157.978 259.916 363 Pennsylvania 617.579 34.324 98.152 233 Texas
527.413 22.736 109.728 308 Virginia 174.394 7.173 31.301 85 Washington
636.948 30.807 87.963 179 West Virginia 22.700 1.543 4.063 15 Wisconsin
349.711 22.001 52.818 142 a Net capital stock is defined as "gross book
value on December 31, 1957" minus "accumulated depreciation and
depletion up to December 31, 1956," minus "depre- ciation and depletion
charged in 1957." Capital service flow is defined as deprecia- tion and
depletion charged in 1957 plus 0.06 times the net capital stock plus the sum
of insurance premiums, rental payments, and property taxes paid. b These
figures refer to production workers.

84 THE UNIVARIATE NORMAL LINEAR REGRESSION MODEL 8.


From (3.32) provide an explicit expression for the posterior mean ISx of a
subvector of 1S; that is, I' = (1S'!1.'). If Xx'X. = 0, show that the pos- terior
mean of 13 is I = (Xx'Xx)-xXx'y. (Note that 1 is the mean of the posterior
pdf when the term X.[. is omitted from the regression model, given a diffuse
prior pdf for I and e.) If Xx'X2 -Y: O, compare fix with the posterior mean
for ISx. In particular, show that = + where [' = ([$ :[2'), with [ = (X'X)-X'y,
and P -- (Xx'XO-X'X2. 9. Shown in the table on p. 83 are data relating to
the U.S. Transportation Equipment Industry for 1957. Assume that the data
are generated by a Cobb-Douglas production function; that is, (a) or L, +
flaln + u, (b) In . . where fl = In A, f12, and are parameters with unknown
values; V, L, K, and N are defined in the table; the subscript i denotes values
of the variables for the ith state, i = 1, 2, ..., 25, and u is a random
disturbance term. We assume that the u's are normally and independently
distributed, each with zero mean and common variance es. After reviewing
alternative assumptions about independent variables in regression models,
consider whether In (L/N) and In (K/N) can reasonably, on a priori ounds,
be assumed to satisfy any or all of these alternative assumptions. 10. Under
the assumption that it is appropriate to analyze the data in Problem 9 within
a regression framework, what are the likelihood function and maxi- mum
likelihood estimates for , s, , es, and A ? 11. Derive and compute posterior
pdf's for the parameters of (b) in Problem 9, using the data presented in that
problem and the diffuse prior p p(, ,) m 1/,with0 < < mand-m < < re, i= 1,2,
and3. How do the means of these posterior pdf's compare with the
maximum likelihood estimates of corresponding parameters in Problem 107
12. In Problem 11 derive and plot the marginal posterior pdf for A = e. Do
the mean and higher posterior moments of A exist ? Compare the modal
value of the posterior pdf for A with the maximum likelihood estimate of A.
What can be said about these two quantities in large samples ? 13. In, view
of the economic theory of production functions, comment on the prior
assumptions about the production funion parameters that were introduced in
Problem 11. Determine whether restricting s and a to non-negative has a
great effect on the numerical results in Problem 11. 14. Under the contions
of Problem 11, derive d compute the posterior pdf QUESTIONS AND
PROBLEMS 85 for *2 = f12 + fla, the returns to scale parameter. What is
an 857o Bayesian confidence interval for this parameter ? 15. In connection
with (b) in Problem 9, suppose that we assume constant returns to scale,
that is, *2 = f12 + fla = 1, and that 0 < /%. and 0 < fla. How would you
formulate a prior pdf to reflect this information in the analysis of the data
and (b) in Problem 9 ? 16. Let ,2 -- f12 + fla be a returns to scale parameter
and consider a prior pdf for /and f12, namely, P02,/52) = px(rt)P.(fl.l rt).
Can Px(O, the marginal prior pdf for /, andp.(fl2[rt), the conditional pdf
for/%., given /, both be in form of beta pdf's with 0 _< ,2 -< 2 and 0 _< /%.
< rt ? Provide an example to illustrate your answer.

CHAPTER IV Special Problems in Regression Analysis The topics treated


in this chapter provide examples of how some of the specifying
assumptions of the regression model, considered in Chapter 3, can be
relaxed; for example, we consider the regression model with autocorre-
lated errors. Since this and other departures from our "standard" assump-
tions are often encountered in practice, it is important to be able to deal with
them. Failure to take account of possible departures from the standard
assumptions can, of course, result in incorrect inferences. Thus it is
imperative that users of the regression model evaluate critically the
adequacy of specifying assumptions in applications. 4.1 THE
REGRESSION MODEL WITH AUTOCORRELATED ERRORS x
Initially we analyze a simple regression model with a disturbance term
generated by a first-order autoregressive process; that is, (4.1a) (4. lb) Yt =
lxt + ut, ut = put- + t t = 1,2,...,T. In (4.1a) Yt is the tth observation on the
dependent variable, / is a scalar regression coefficient, xt is the tth
observation on an independent variable, assumed nonstochastic, and ut is
the tth error term. In (4. lb) the first-order autoregressive process, assumed
to generate the error term ut, is presented. It involves a scalar parameter p
and an error term t. It is assumed that the t are normally and independently
distributed with zero means and common variance '. Note that if p = 0 (4.1a,
b) would reduce to a simple regression model satisfying the standard
assumptions of Chapter 3. From (4.1a, b) we obtain (4.1c) Y = PYt- + (xt --
pxt- ) + t, t = 1, 2,..., T. x This section is based mainly on work in the
following paper: A. Zellner and G. C. Tiao, "Bayesian Analysis of the
Regression Model with Autocorrelated Errors," J. Am. Statist. Assoc., 59,
763-778 (1964). 86 THE REGRESSION MODEL WITH
AUTOCORRELATED ERRORS 87 Note that Yo appears in (4.1c), and
thus something must be said about initial conditions before we can proceed
with the analysis of the model. If we assume that the process represented by
(4.1a, b) has been operative for t = 0,-1,-2,...,-To, where To is unknown, we
can write Yo- /Xo = M + o, where M = p(y_ -/x_); M is regarded as a
parameter, since it depends on unobservable and unobserved quantities.
Under these assumptions Yo is normally distributed with mean/Xo + M and
variance '. These assumptions are broad enough to apply to explosive (Ipl >
1) as well as nonexplosive (IPI < 1) schemes and to situations in which the
process commences at any unknown point in the past. On the other hand, it
may be that Yo is fixed and known; for example, if the observations relate
to a price and t = 0 is the last period during which this price was fixed by a
governmental body, it may be appropriate to take Yo as fixed and known.
This situation can also be represented in the frame- work introduced in the
preceding paragraph by assuming that eo has zero variance. Other
assumptions which may be appropriate for other circum- stances are that o
is normal with known variance, o ' or that Yo is distributed independently of
y' = (yx, y.,..., YT) and has a distribution that does not involve any of the
parameters of the model. From what follows it will be seen that any of the
assumptions regarding Yo lead to the same joint posterior pdf for the
parameters/, p, and a. -'"Under the assumptions introduced, the joint pdf for
Yo and y'= (Yx, Y., . . ., Yv) is given by P(Yo, Yl/, P, , M) = P(Yo[/, p, a, M)
P(Y[Yo,/g, p, a, M) 1 { 1 (4.2) oc a-- exp -- (Yo -/gXo - M) ' 1 T which,
viewed as a function of the parameters, is the likelihood function, M, *lYo,
Y), with -oo < / < c, -c < p < 0% -oo < M < c, and > 0. Since -c < < 0% we
are allowing the process in (4. lb) to be explosive or nonexplosive. As
regards prior assumptions, we assume that we have little prior informa- tion
and represent it by assuming that/, , log , and M are uniformly and
independently distributed; that is 1 (4.3) p(/, p, M, a) oc -.

88 SPECIAL PROBLEMS IN REGRESSION ANALYSIS On combining


this prior pdf with the likelihood function, we obtain the following joint
posterior for the parameters' p(/9, p, e, mly ) oc e--e- exp - (Yo -/9Xo - M) '
(4.4) 1 If we are interested in investigating M, the initial level of the process
in (4.1c), it is possible to obtain the posterior pdf for M by integrating (4.4)
over fi, p, and m If interest does not center on M, the influence of this
parameter can be eliminated by integrating (4.4) with respect to M to yield
(4.5) p(/9, p, ely ) oc e---ci+ exp 2?' [Yt - pY-x -/9(xt - px,_ x)]' , which is
the joint posterior pdf for fi, p, and e. In deriving (4.5), Yo was assumed
normal with mean M +/9Xo and variance e '. It is straightforward to verify
that by employing the other assumptions about Yo, discussed above, we
would also obtain (4.5) as our posterior pdf. On integrating (4.5) with
respect to e, we obtain the following bivariate posterior distribution' P(/9,
PlY) oc (4.6) oc {Z -/9x - 0D- with the summations extending from t = 1 to
t = T. The bivariate pdf in (4.6) enables us to make joint inferences about/9
and p; that is bivariate numerical integration procedures (see Appendix C)
can be employed to evaluate the normalizing constant and to compute, for
example, the posterior probability that ax _< /9 _< a. and bx _< p < b.,
where ax, a., bx, and b. are given numbers. Also the contours of the
posterior pdf can be readily com- puted to provide information about the
shape of the bivariate posterior pdf. To obtain the marginal posterior pdf for
p complete the square on/9 in the first line of (4.6) and use properties of the
u ,nivariate Student t pdfto integrate out/9. Similarly, to obtain the marginal
posterior pdf for /9, complete the square on p in the second line of (4.6) and
integrate out p again by using properties of the univariate Student t pdf.
These operations yield P(IY) oc [Z (y-x -/9x-D'] (4.7) x Z (y, - - Y (Y-x -/9-
--t- 5 ' 'f ' THE REGRESSION MODEL WITH AUTOCORRELATED
ERRORS 89 In order for the distribution in (4.8) to be proper, the quantity
(xt - pxt- )' must be positive. This implies that we must assume that for any
p there exists some t such that xt pxt_ , which is not very restrictive? The
posterior pdf's in (4.7) and (4.8) can be analyzed by using univariate
numerical integration procedures. To illustrate the results of these computa-
tions, we have computed these pdf's with data generated from the following
model: y, = 3xt + ut Ut = pUt-z nt' t t= 1,2,..., 15, where the ds, given in
Table 4.1, were drawn from a table of standardized random normal deviates.
The x's are rescaled investment expenditures taken from a paper by
Haavelmo. a The first series of 15 observations was generated with p -- 0.5
and the second set was generated with p = 1.25. We refer to the first set of
y.'s as the "nonexplosive" series and the second set as the "explosive" series.
Although we distinguish these two cases, it is important Table 4.1 Yt Yt t xt
(for p = 0.5) (for p = 1.25) 0 � � � 3.0 9.500 9.500 1 0.699 3.9 12.649
13.024 2 0.320 6.0 18.794 19.975 3 -- 0.799 4.2 12.198 14.270 4 - 0.927 5.2
14.372 16.760 5 0.373 4.7 13.909 15.923 6 --0.648 5.1 14.556 16.931 7
1.572 4.5 14.700 17.111 8 -0.319 6.0 18.281 22.195 9 2.049 3.9 13.890
18.992 10 -3.077 4.1 10.318 18.338 11 -0.136 2.2 5.473 14.012 12 - 0.492
1.7 4.044 13.873 13 -- 1.211 2.7 6.361 17.855 14 - 1.994 3.3 7.036 20.099
15 0.400 4.8 13.368 27.549 Uo = 0.5 9. However, if xt = 1 for all t, the
condition is violated for p = 1. With the xt's all equal to 1, our prior pdf
must assign a zero density to p = 1. From (4.1c) note that with p = 1 and the
xt's = I, does not appear in the model. T. Haavelmo, "Methods of Measuring
the Marginal Propensity to Consume," J. Am. Statist. Assoc., 42, 105-122
(1947).

90 SPECIAL PROBLEMS IN REGRESSION ANALYSIS to realize that


the results given in (4.6), (4.7), and (4.8) are appropriate in the analysis of
both. The marginal distributions of/ and of p for these data are shown in
Figure 4.1. It is seen that the posterior pdf for p, derived from the explosive
series, is much sharper than that relating to the nonexplosive case. 2.0 1.0
1.0 0.5 2.30 2.60 2.90 3.20 3.50 -0.12 0.18 0.48 0.78 1.08 3.0 2.18 2.48
2.78 3.08 3.38 8 7 4 3 2 1 I I I I 0.94 1.04 1.14 1.24 1.34 Figure 4.1
Marginal distributions of/ and p. (a) Nonexplosives series (T = 15). (b)
Explosive series (T = 15). THE REGRESSION MODEL WITH
AUTOCORRELATED ERRORS 91 The posterior pdf's for/ in Figure 4.1
enable us to make inferences about ais parameter which incorporate an
allowance for the departure from adependence postulated in the model. That
allowance be made for such a ,eparture is extremely important because
inferences would be markedly [liferent if we analyzed these data under the
assumption of independence. as shown in Chapter 3 under the assumption
of independence ( = 0), we 'vould have P(/IY) in the univariate Student t
form; that is, s/(y. ~ tv, vith v = T - 1,/ = Y. xtyt/Y. xt ' and vs ' = Y. (Yt -
flxt) 2. For our two sets >f data the posterior pdf's for under the
independence assumption are shown in Figure 4.2 by the curves labeled =
0. These distributions are far :lifferent from those shown in Figure 4.1. To
appreciate the situation fully it is instructive to write the marginal
distribution of/ as (4.9) p(/IY) = f;oo p(tlp, y) p(pIY) dp. The integrand in
(4.9) contains two factors, the conditional posterior pdf for / given p,
p(fS[p, y), and the marginal posterior pdf for p, p(p[y). Thus, as p9inted out
in Chapter 2, the marginal posterior pdf for/ can be regarded as a suitably
weighted average of the conditional pdf's p(/[p, y), with p(ply) serving as
the weight function; that is, the conditional pdf, p(/[ p, y), provides
inferences about/ for an assumed value of p. On the other hand, the
marginal pdf, p(ply) reflects the plausibility of assertions about the value of
p in the light of the data and our original assumptions. Unless the
conditional pdf is insensitive to changes in p, it is clear that an assumption
that p equals some fixed value, say p = 0 (observations independent) or p =
1 (first differences of the observations independent), could lead to a
posterior pdf for far different from that given in (4.7). To analyze this point
further, note that the conditional pdf for , given p, which is easily obtained
from (4.6), is (4.10a) P(fllP, Y) m [(p)]-u{1 + [ 2 J ' where v = T - 1, (4.
lOb) /(p) =.-- E (xt- pXt-z)(Yt- E (x, - ox,- 0 and (4.10c) . (Xt -- pXt_ l) 9'

92 5 SPECIAL PROBLEMS IN REGRESSION ANALYSIS - //' P = -1.0 _


- 2.4 2.6 2.8 3.0 3.2 (a) 2.0 1.5 1.0 0.5 1.2 / \p =1.8 ' ; I =0 / ,,/ i ! . P= 1.0! ,,
' \ ,' / � / \ / / : \ 2.5 3.0 :3.5 4.0 4.5 Figure 4.2 Conditional posterior
distribution of / for various p. (a) Nonexplosive series (T = 15). (b)
Explosive series (T = 15). THE REGRESSION MODEL WITH
AUTOCORRELATED ERRORS 93 From (4.10) (4.11) /S - (P) s(p) " tv;
that is, this quantity has the Student t pdf with v = T - 1 degrees of freedom.
To show how sensitive inferences about/S are to what is assumed about p,
we have computed conditional posterior pdf's for /S for various assumed
values of p which are shown in Figure 4.2. The results indicate that for the
nonexplosive series the center of the conditional pdf is relatively insensitive
to changes in p, whereas the spread of the distribution is quite sensitive to
such changes. On the other hand, both the center and spread in the
explosive series change markedly as p is varied. Thus an inappropriate
assumption about can vitally affect an analysis. This fact underlines the
importance of working with the marginal posterior pdf for/S which
incorporates a proper allowance for the role of p in the model. We now
generalize these methods to apply to the multiple regression model with
errors generated by a first-order autoregressive process. Our model is 4
(4.12a) y = Z[$ + u, (4.12b) u = pu_ + e, or, alternatively, (4.12c) y = py_ +
(X- pX-O + e, where y' = (yx, y.,..., y.) and y_ x' = (Yo, Yx,..., Yr- ) are 1 x
T vectors of observations, u': (u, ua,...,u.) and u_x'= (Uo, ux,..., u.-O are (1
x T) vectors of autocorrelated errors, [5' = (/Sx,/So.,...,/S,) is a (1 x k) vector
of regression coefficients, p is a scalar parameter, ' = (q, �., ..., �e) is a 1 x
T vector of random errors, and (4.13) X= : i , X-x= i : LXvz xT2 ... xvJ
LX(:r-z)z x(:r_z)2 ... xa,- are T k matrices with given elements. As above,
we assume that the elements of e are normally and independently
distributed, each with mean zero and common variance ,'. Further, we make
the same assumptions about initial conditions and prior pdf's for and, as 4
We assume that there is no intercept in the regression. If there is, our prior
assumptions must preclude the value p = 1, for when p = 1 the intercept
term disappears in (4.12c).

94 SPECIAL PROBLEMS IN REGRESSION ANALYSIS introduced.


Last, we assume that the regression coefficients are a priori uniformly and
independently distributed; that is (4.14) p([S) oc const. - oo </ < 0% i = 1,
2,..., k. Under thse assumptions the joint posterior pdf for 15, p, , and M is
given by (4.15) where Xo' = (Xox, Xo., �.., Xo) is the first row of X_x. On
integrating (4.15) over M and , the joint posterior pdf for 15 and p is readily
obtained as (4.16) P(15, PlY) oc {[y -- py_ -- (X - pX_)15]'[y - py- - (X -
pX_)15]} For any fixed value of p it is seen from the first line of (4.16) that
the con- ditional pdf for 15 is (4.17) v=T-k, vsa(p) = [y- py_- (X-
pX_)l(p)l'[y- py-- (X- pX_)O(p)]. The distribution in (4.17) is in the form of
a multivariate Student t distribu- tion. This result is not surprising, since, for
given p, (4.12c) can be regarded as a usual regression model, with the
results from Chapter 3 applicable. We note that in deriving (4.17) it is
implicitly assumed that the matrix H is positive definite for any fixed value
of p. A necessary and sufficient con- dition for this to be so is given in
Appendix 1 to this chapter. For the case k = 1 the condition reduces to that
given in connection with (4.8); that is, for any p there exists some t such
that xt - px_ . In the more general case, k > 1, the condition implies that any
linear combination of the columns of the matrix of independent variables
for periods 0, 1,..., T, must not satisfy an exact first-order autoregressive
scheme. This is not a restrictive condition. THE REGRESSION MODEL
WITH AUTOCORRELATED ERRORS 95 To obtain the marginal
posterior pdf's for 15 and p5 we merely integrate (4.16) with respect to
these parameters. This can be done easily by completing squares and using
properties of the univariate and multivariate Student t pdf's to yield P(151Y)
(4.18) x (y - X15)'(y - X15) - (y_x _ X_ xlS)'(y_x - X_ and p(ly) (4.19)
where v, H, (p), and sa(p) have been defined in connection with (4.17). If
interest centers on the marginal posterior distribution of a single element of
, say x, its posterior can be obtained in principle from (4.18) by integra-
tion. However, this integration, when viewed analytically or numerically,
appears to be quite difficult, particularly when k is large. As an alternative,
we have (4.20) P(x, PlY) = P(PIY) P(x[, P, Y), with p(ply ) given by (4.19)
and p(lp, Y) obtained from (4.17) by integration with respect to the
elements of other than x. From properties of the multiviate t distribution we
have from (4.17) (4.21) - (p) where h xx denotes the (1, 1)th element of H -
x. The result in (4.21) gives us the form of the second factor on the rhs of
(4.20). Then bivariate numerical integration procedures can be employed to
integrate out p and thus to obtain the marginal posterior pdf for x. As an
alternative, we can obtainp(, ly) in a different way by integrating (4.16)
with respect to a, ,..., . To perform this integration we partition ' = (x ['), X =
(x[ ) and X_x = (x_ _0, where x and x_ x denote the first column of X and
X_ , respectively. Then, with (4.22) w = y - vy_ - (x - s The procedures
described have been incorporated in a computer program. See H. Thornber,
"Bayes Addendum to Technical Report 6603 ' Manual for B34T--A
St.,epwise Regression Program'," Graduate School of Business, University
of Chicago, September 1967.

SPECIAL PROBLEMS IN REGRESSION ANALYSIS THE


REGRESSION MODEL WITH AUTOCORRELATED ERRORS 97 96 we
have P(P, , PlY) oc {[w - (, - p,-0]'[ w - (' - P '-01)-/" Integration with
respect to yields (4.23) p(, pJY) oc JJ-{w'[I-(g- p2-0t-( 2- Pg-0'lw) -('-+)/"
where = (�- p�-0'(�- and w is defined in (4.22). The posterior pdf for/x
can be obtained from (4.23) by numerical integration. The advantage of the
form (4.23) is that its use involves inverting a (k - 1) x (k - 1) matrix H'-,
whereas use of (4.20) involves inverting a k x k matrix H. Note further that/
is a -matrix 6 of second degree in p. Thus the inverse can be expressed as a
h-matrix of degree 2(k - 2) in p, divided by a scalar polynomial of degree
2(k - 1) in p. Putting the inverse of in such a form is computationally
convenient, since it will avoid the necessity for inverting a matrix for each
value of p in the integration. The above results and methods can be applied
readily in practice to make inferences about 1 and p. Thus there seems little
reason to develop approxi- mate large-sample techniques. It is interesting,
however, to compare the results of a large-sample approximate procedure
with the results flowing from application of the results above. As shown in
(4.12c), our model is (4.24) y = pY-z + (X- pX-z) + �. Note that pl is a
nonlinear combination of parameters. Let us linearize our model by
expanding pl about maximum likelihood estimates, 7 say fi and and apply
linear theory to the linearized model? On expanding about the maximum
likelihood estimates we obtain (4.25) y-' py_x + XI- X_x[+(p- )+(1- )]+E 6
See R. A. Frazer, W. J. Duncan, and A. R. Collar, Elementary Matrices, for
a dis- cussion of the properties of -matrices. A square -matrix of degree N
takes the form A0),N + AxA N-x +... + AN-x)' + Am where the A, i = 0, 1,
2,..., N, are square matrices whose elements are independeni-of . * Various
ways of computing maximum likelihood estimates of the parameters in
(4.24) are reviewed in Zellner and Tiao, op. cit., pp. 776-778. This appears
to be the Bayesian analogue of the large-sample sampling theory approach
suggested by W. A. Fuller and J. E. Martin, "The Effects of Autocorrelated
Errors on the Statistical Estimation of Distributed Lag Models," J. Farm
Econ., 43, 1961, 71-82. See also C. Hildreth and J. Y. Lu, "Demand
Relations with Autocorrelated Disturbances," Tech. Bull. 276. East Lansing,
Mich.: Michigan State University Agricultural Experiment Station, 1960. or
(4.26) y-- X_fiO-' p(y- - X_O) + (X- fiX-0 + E, which is linear in the
parameters and I. With the uniform prior pdf's with which we have been
working, application of the linear theory of Chapter 3 to (4.26) leads to a
posterior pdf for p and 1 in the multivariate t form. To illus- trate results of
this approach we have applied the linearization procedure to analyze the
data, shown in Table 4.1, generated from our simple nonexplosive model.
Then this sample of 15 observations was augmented to 20, 30, and 40
observations. In Figure 4.3 the resulting approximate posterior pdf's for our
scalar parameter/ are compared with the exact pdf's computed from (4.7). I I
I I I I ' I t 2 T= 15 _ 2.6 2.8 3.0 3.2 3.4 4 I I I I I I 1 I 2.6 2.8 3.0 3.2 3.4 4 2
l,l,IJllll o ,1 , I I ''", I 2.8 3.0 3.2 4 I I I i I I I1 I 2.8 3.0 T= 40 3.2. 3.4 Figure
4.3 Exact (--) and approximate (---) marginal distribution' of/S for several
sample sizes and nonexplosive series.

98 SPECIAL PROBLEMS IN REGRESSION ANALYSIS Although the


modes of the approximate and exact pdf's occur at approxi- mately the same
values, it is seen that there are some rather large differences in the shapes of
these pdf's. For T = 40 the approximate and exact pdf's are just in fair
agreement. These results illustrate quite graphically how careful one must
be in using large-sample approximate procedures. 4.2 REGRESSIONS
WITH UNEQUAL VARIANCES � Here we consider two normal linear
regression equations: (4.27) yx = Xx15 + ux, (4.28) where Yx = Y2 = U 1
U 2 y. = X215 + u2. an nx x 1 vector of observations on a dependent
variable, an n2 x 1 vector of observations on a dependent variable, an nx x k
matrix, with rank k, of observations on k independent variables, an n. x k
matrix, with rank k, of observations on k independent variables, a k x 1
vector of regression coefficients, an n x 1 vector of error terms, an n. x 1
vector of error terms. We assume that the elements of ux and u. are
normally and independently distributed with zero means. The elements of
ux are assumed to have common variance 0.x', and the elements of u. are
assumed to have common variance 0.2 2 0.22. Note that if 0.2 = 0.22 or 0. =
c0.22, with c a known factor, we could use the methods of Chapter 3 to
analyze our data; that is, we could write (4.27) and (4.28) as y = X15 + u,
with y' = (yx'iy2'), X' = (Xx'iX2'), and u' -- (ux'iu2'), which is in the form of
the standard model considered in Chapter 3. In this section we take up the
situation in which 0.x2 0.22. We analyze first the case in which 0.x2 is
known and %2 is unknown and then turn to the case in which both
variances are unknown. The problem posed in this section may be
encountered in practice in the following circumstances. Suppose that the
observations in (4.27) pertain to a particular historical period, say the period
between World Wars I and II, x� and that the observations in (4.28) pertain
to the post-World War II period. We may be willing to assume that the
regression coefficients are the same for the two periods, xz that is, 15 is the
same in (4.27) and (4.28), but that the error o This section includes much of
the material presented in G. C. Tiao and A. Zellner, "Bayes's Theorem and
the Use of Prior Knowledge in Regression Analysis," Biometrika, 51, 1 and
2, 219-230 (1964). x o With so many wars, we eschew use of the term
"interwar." ' Below we show how this assumption can be relaxed.
REGRESSIONS WITH UNEQUAL VARIANCES 99 terms' variances 0.x'
and 0.2 ' are different in the two periods. Alternatively, (4.27) can be
viewed as a regression model for a microunit, say a firm, and (4.28) as a
regression model for a second microunit. Although we may be willing to
assume that both microunits have the same regression coefficient vector 15,
we may wish to assume that the units have disturbance terms that are
independent but have different variances. 'Let us turn to the case in which
0.' is known, one that is not often en- countered in practice but is considered
here to bring out the relation between Bayesian and certain sampling theory
results. The likelihood function is given by 1 exp [ 1 z(15, Y) oc 22..--- 20.x
2 (yx - Xx15)'(yx - (4.29) 1 2, (y- X)'(y- X)], where y' = (Yx'Y') and it is
understood that we are given Xx and Xe. As prior assumptions we assume
that log a and the elements of are uniformly and independently distributed,
which implies 1 - < < , i = 1, 2,..., k, (4.30) p(, a) --, 2 0 < 2 < . O '
combining (4.29) and (4.30) and integrating with respect to ,, the posterior
pdf for is [' ] Y) exp 22x2 (Yx - Xx)'(Yx - (4.31) x [(y. - 215) (Y. X215)] oc
exp [--- 1 (15 - 0'z(15 - + (15 - 12)'z.(15 - + V2522 ' where Zx = Xx'Xx, =
Zx -xXx'yx, Z2 = X.'X., . = Z2-xXe'y2, v. = no.- k, and v2s?'= (Y2-
X..)'(y2- Xaa). It is seen that (4.31) is the product of two factors, the first in
the normal form in and the second in the multivariate Student t form. For
this reason we refer to (4.31) as a "normal-t" pdf. On expanding the second
factor in an asymptotic series (see Appendix 2 to this chapter), we have for
the leading normal term in the expansion p(]a, y) exp [ 1 1 ] (4.32) 2axa (p -
x)'Zx( - x) 2saa ( - a)'Za( - a) exp [-(p - )'A(p - )1,

lOO SPECIAL PROBLEMS IN REGRESSION ANALYSIS with (4.33)


and (4.34) The second line of (4.32) is obtained simply by completing the
square on 15 in the first line of (4.32). It is interesting to note that (4.33) is
the quantity that Theil x. recommends as a sampling theory estimator
incorporating prior stochastic information and for which he provides a large
sample justification. Here the quantity in (4.33) appears as the mean of the
leading normal term in an asymptotic expansion approximating the
posterior pdf for 15. In Appendix 2 to this chapter methods are presented
that enable us to take account of additional terms in the asymp- totic
expansion and thus to get a better approximation to the posterior pdf. We
now take up the case in which both and ao. are unknown, the case most
often encountered in practice. The likelihood function is given by l(15,
(4.35) 1 exp [- {yll {y2r2 1 2,o. (yx - x15)'(y- x15) 1 x, ' - ] 2ao. o. (yo. -
o.15) (yo. X.15) � As prior pdf, we assume that we have diffuse
information about 15, ex, and ao. and represent this by 1 -co < fi < co'
(4.36) p(15, a., ,.) oc 0 < *:t < co, i = 1, 2,. '., k, Here we have assumed that
the elements of 15, log ax, and log . are inde- pendently and uniformly
distributed. The joint posterior pdf' for the parameters is given by (4.37)
p(15, *o.lY) oc + + exp 1 2, ' (yx- x15)'(y- x15) 2.o.o. (yo. - xo.15)'(Yo. -
xo.15) � 2 H. Theil, "On the Use of Incomplete Prior Information in
Regression Analysis," J. Am. Statist. Assoc., 58, 401-414 (1963).
REGRESSIONS WITH UNEQUAL VARIANCES 101 The integrations
over , and . are easily performed to yield the following joint posterior pdf
for the elements of 15' (4.38) where vt = n{ - k, Z = X'X, = Z{-xX'y, and
v& ' = (y - Xd)' x (y - X), with i = 1, 2. We see that (4.38) is the product of
two factors, each in the multivariate Student t form. Thus we call this a
multivariate "double-t" distribution. To analyze it we expand each of the
factors in an asymptotic expansion (see Appendix 2) that yields the
following as the leading normal term: 1 P(151Y) 4c exp -2st--- (15 -
x)'Zx(15 - x) (4.39) & exp [-�(15 - )'D(15 - )1, 1 ] 2so.o. (15 - where &
denotes "approximately proportional to," (4.40) and (4.41) where M = Z/&
�' D = Mx + Mo., = Xx'Xx/sx 2 and Mo. = Z2/so. ' = x2'x2/s2 2. Using
these definitions, (4.41) can be written as (4.42) which is, of course, the
mean of the leading normal term in the asymptotic expansion of the double
t pdf in (4.38). The analysis required to take account of higher order terms
in the asymptotic expansion and thus to get a better approximation to the
posterior pdf is given in Appendix 2. It is also interesting to observe that
(4.42) would emerge in the sampling theory approach as an approximation
to the generalized least squares estimator for the system in (4.27) and (4.28)
if the unknown parameters axe' and ao.o. appearing in this estimator were
replaced by s �' and so.o., respectively; that is, the generalized least
squares estimator is given by (X,y._xX)_XX,y._Xy = 1 Xx'Xx + 1 X.'XO.
Xx'yx + o.yo.)} a2" 2 r22

102 SPECIAL PROBLEMS IN REGRESSION ANALYSIS where X" =


(Xx'iX:'), y' = (y'[yo.'), and Thus, if we set ' = s ' and = s , we obtain an
approximation to the generalized least squares estimator which is usually
given a large sample justification. In the Bayesian approach we see from
(4.37) that the conditional posterior pdf for , given ax and %, is ($1, :, y) <
exp - (y - x$)'(y - x$) 1 , + (y: - ) (y: - x$) < exp [-($ - )'(x'z:x)($ - )], with =
x'x + x:'x. X'y + -- X:'y. � 0.2 9' Thus in the Bayesian approach the
generalized least squares quantity appears as the mean of the conditional
posterior pdf p(lalax, o., y), which is in the multivariate normal form with
covariance matrix (X,_X)_ = 1 Xx'X + X'X � If in this conditional pdf we
set ax:= sx: and e:= s: :, we obtain the approximation to the generalized
least squares quantity as the mean of our conditional pdf. In large samples s
and s: will be close to the true values of ax and :: and thus the use of the
conditional pdf may be' satisfactory. In general, however, it is better to
integrate with respect to and : to obtain the marginal pdf for $ and to base
inferences on it rather than use the conditional pdE To illustrate application
of these techniques we analyze a simple investment model with annual time
series data, 1935-1954, relating to two corporations, General Electric and
Westinghouse. xa In this model price deflated gross investment is assumed
to be a linear function of expected profitability and beginning-of-year real
capital stock. Following Grunfeld, x the value of xa The data are taken from
J. C. G. Boot and G. M. de Witt, "Investment Demand: An Empirical
Contribution to the Aggregation Problem," Intern. Eton. Rev., 1, 3-30
(1960). x 4 y. Grunfeld, "The Determinants of Corporate Investment,"
unpublished PhD thesis, University of Chicago, 1958. REGRESSIONS
WITH UNEQUAL VARIANCES 103 outstanding shares at the beginning
of the year is taken as a measure of a firm's expected profitability, an
assumption that has received critical com- ment but which we use just for
illustrative purposes. The two investment relations are (4.43a) Yx(t) = %
+/xxxx(t) +/%.xx.(t) + ux(t), (4.'43b) yo.(t) = ao. + tx.x(t) + t%xo.o.(t) + u.
(t), where t in parentheses denotes the value of a variable in year t (t = 1, 2,
20), and ''" General Variable Electric Westinghouse Annual real gross
investment Value of shares at beginning of year Real capital at beginning of
year Error term Yx(t) y.(t) xxx(t) xo.x(t) xx.(t) x,.(t) ux(t) uo.(t) The
parameters/9x and/90. in (4.43) are taken to be the same for the two firms in
this illustrative example; however, ax and a., the intercepts, are assumed to
be-"different to allow for possible differences in the investment behavior of
the two firms. Further, ux(t) and u.(t) are assumed to be independently xs
and normally distributed for all t with zero means and variances axe' and o.
�', respectively. If we employ a diffuse prior pdf for the parameters,
namely, (4.44) p(, {3, x, .) oc. 1 , with ' = (x, c.) and 1S' = (/x,/o.), we obtain
the following joint posterior pdf: p(, a, , ".ly) oc (+ )- ( + )- (4.45) x exp (- [
(yx _ at- Xxt$)'(yx- axt- Xxt) + __1 (yo.- a.t- X4a)'(y. a.t X4)]} iF2 2 ,
where n = 20, t' = (1, 1,..., 1), a 1 x n vector, and Xx and .Y. are n x 2
matrices of observations on G.E.'s and W ' ' ' � estlnghouse s independent
variables. Below we show how this assumption can be relaxed.

104 SPECIAL PROBLEMS IN REGRESSION ANALYSIS If interest does


not center on ax and ao., they can be integrated out of (4.45). x6 Then on
integrating over ax and e., we have (4.46) x where vx = v. = 17, x is the
least squares quantity, a 2 x 1 vector, obtained from G.E.'s data, . is the least
squares quantity based on Westinghouse's data, x 1 m = sO. n n and 1 1 (X
M. - s.. - 7: The sample quantities are shown below: General Electric
Westinghouse (0.02655 = ,0.1517 / sx �' = 777.4463 vx -- 17 [4185.1054
299.6748] Mx = [ 299.6748 1535.0640] /0.05289 o. = k0.09241/ sa 2 =
104.3079 v2 = 1.7 [9010.5868 M: = [1871.1079 1871.10791 706.3320]
Further the quantity in (4.42), x6 ' = (0.0373, 0.1446). A plot of the
contours of the joint posterior pdf for/x and/%., given in (4.46), is shown in
Figure 4.4. Also in this figure are lines showing the loci of conditional
modes. We see that the posterior distribution is concentrated rather sharply
in the region 0.0278 < /x < 0.0468 and 0.1216 < /. < 0.1676, with mode at
approximately (0.0373, 0.1446). Further,/x and/o. are nega- tively
correlated and the contours are approximately elliptical. This is because the
joint density function is nearly a bivariate normal distribution due tO the
fact that both vx and v. are fairly large in this example. If interest centers on
only one of the parameters, say/x, we can obtain If interest does center on
az and a, (4.45) can be integrated with respect to [, *, and -9.. Then ax and
aa will be distributed in the form of two independent Student t variables.
Further, the difference ax - aa has the Behrens-Fisher distribution. x That is,
let z = (I- tt'/n)y and W = (I- tt'/n)X; then I = (W(WO-W(z, i= 1,2. xa Here,
because we have integrated out ex and e2, all moments appearing in (4.42)
become moments about sample means. 0.0468 0.0421 0.0373 0.0325
0.0278 REGRESSIONS WITH UNEQUAL VARIANCES 105 Figure 4.4 I
I I I Conditional modes of /2 given 40O 0.1216 0.1331 0.1446 0.1561
0.1676 Contours of the joint posterior distribution of fix and/a. its m?ginal
pdf by methods discussed in Appendix 2 and utilizing an asymp- totic
expansion of (4.46)? This marginal pdf is plotted as the solid line in Figure
4.5. Also shown in Figure 4.5 is an approximate posterior pdf for/, based on
the leading normal term in the asymptotic expansion [see (4.39) and
Appendix 2]. It is seen that the posterior pdf for/x, given by the solid line, is
somewhat flatter at the center and fatter in the tails than the approximating
large-sample normal pdf, represented by the broken curve. A comparison of
the first two moments of these pdf's is shown below: Large Sample Normal
Finite Sample Approximation Approximation Mean 0.0373 0.03726
Variance 9.01445 x 10 -s 9.6158 x 10 -s a. Broken curve in Figure 4.5. b.
Solid curve in Figure 4.5, based on asymptotic expansion of (4.46),
disregarding terms for which i + j > 2 (see Appendix 2). x In this particular
example, with just two elements in [, bivariate numerical integration
techniques can be employed to obtain the marginal pdf for/x. We use the
asymptotic � expansion here because it can be used when [ contains more
than two elements. Also, see footnote 20 below.

106 SPECIAL PROBLEMS IN REGRESSION ANALYSIS 40 -- 30 -- 20


m 10 -- I I I 0.0231 0.0373 0.0515 Figure 4.5 In this figure, the solid curve
represents the posterior distribution of fx and the broken curve represents
the limiting normal approximation. The mean of/x is extremely close to its
normal approximating value. On the other hand, the variance of/1 is about
670 larger than that provided by the approximating normal distribution. We
have concentrated on making inferences about regression coefficients. In
some circumstances we may be interested in making inferences about ex
and 2, given the model shown in (4.27) and (4.28), the likelihood function
shown in (4.35), and prior assumptions shown in (4.36). In the joint
posterior pdfin (4.37) we change variables from , , and 2 to , 1, and = 2/22,
REGRESSIONS WITH UNEQUAL VARIANCES 107 ) < , < c. The
Jacobian of this transformation is J cx: 0.23/0.12 = 0.1 -3/2 thus, in terms of
I, 0., and , (4.37) becomes 2� 4.47) P(, 0.1; ;qy) ec A(% - 2)2 0.'/1 +n2 + 1
exp - 1 20.12 } + A(Y2- 2[)(y2- J2[)] � We now complete the square on 15
in the exponent' (Yl -- Xx)'(yx - Xi) -- (Y2 -- X2)'(y2 - X2) - '(Xx'Xx +
.3Y2'X2) - 2'(Xi'yx + X2'Y2) + Yl'Yl + hY2'Y2 = ([ - Cx-xC2)'Ci( - C1-
1C2) + y'y + Y2'y2- C2'C-C2, with Cx = XltX 1 + AX2tX 2 and.C2 = X'yx
+ IX2'Y2. Then on substituting in (4.47) and integrating with respect to we
have (4.48) P(0.1, ;lY) (n 2 - 2)/2 0.,/, IClI x exp 2tr12 [Y'Y + ;Y2'Y2- C2
C1 C2] , wldch is the bivariate posterior pdf for 0.x and ,. On integrating
(4.48) with respect to 0.x we have the following for the marginal posterior
pdf for p(;qy) oc (4.49) (na _ 2)121Cl l _1 (Yl'Yl + 'Y2'Y2 - C2'C - C2)(' +
% - k)12 [Yl'Yl + Y2'Y2 - (Xx'yl + 2 Y2) x (xdxl + x2'x2)-(Xdy + X2'y2)]
(", This posterior pdf can be analyzed by using univariate numerical
integration techniques. It should be noted that if the regression coefficient
vectors in (4.27) and (4.28) were not identical and we assumed a priori that
all regression coefficients, log 0.1 and log 0.2, were uniformly and
independently distributed, the posterior pdf for , -- 12/0.22 would be in the
form of an F distribution. The expression in (4.49) departs from being in the
F form because it incor- porates the information that the coefficient vectors
in (4.27) and (4.28) are the same. 9.0 Equation 4.47 can also be employed
to obtain the marginal posterior pdf of a single element of [3, say fix.
Analytically integrate (4.47) with respect to ox and the elements of [3 other
than fix. The result is a bivariate pdf for fix and , which can be analyzed
numerically.

108 SPECIAL PROBLEMS IN REGRESSION ANALYSIS TWO


REGRESSIONS WITH SOME COMMON COEFFICIENTS 109 4.3 TWO
REGRESSIONS WITH SOME COMMON COEFFICIENTS 'x Assume in
connection with the system in (4.27) and (4.28) that the coeffi- cient vectors
appearing in these equations are not entirely the same; for example, in the
numerical example of Section 4.2 we allowed the intercept terms to be
different in the two investment relations. In general we may have (4.50) Yx
= WdSx + W.15. + ux, (4.51) yo. = Zx15x + Z.y9. + u9., where Yx and y.
are nx x 1 and n. x 1 vectors of observations on our dependent variables,
(Wxi W.) is an nx x kx matrix, with rank kx of given observations on kx
independent variables, (ZxiZ) is an m. x k. matrix, with rank k. of given
observations on k. independent variables, 15x is an m x 1 vector of
coefficients appearing in both equations, 15. and y. are mx x 1 and m. x 1
vectors, respectively, of regression coefficients, and ux and u. are nx x 1 and
n. x 1 vectors of error terms. Note that kx = m + mx and k. = m + m. and
that Wx has the dimension nx x m and Zx, the dimension n. x m. We
assume that the disturbance terms in ux and u. are normally and
independently distributed each with zero mean and common variance a '.
For convenience we rewrite the system as follows: 0 /15\ yo. Z 0 Z. uo. or
(4.53) y = X15 + u, with y' = (yx'i y.'), 15' = (151'i 152 'i �2'), u' = (ul'iu2'),
and X denoting the partitioned matrix on the rhs of (4.52). It is seen that
(4.53) is in the form of a multiple regression model with nx + n.
observations assumed to satisfy the standard assumptions. Thus with a
diffuse prior on the elements of 15 and log a the posterior pdf for 15 will be
in the multivariate Student t form; that is (4.54) P(151Y) oc (vso. + (15 -
l)'X',Y(15 - with v=nx +no.-rn-mx-rn., =(X'X)-XX'Y and vs'=(y-X)'(y-X).
'x This problem has been analyzed in V. K. Chetty, "On Pooling of Time
Series and Cross-Section Data," Econornetrica, 36, 279-290 (1968). Here
we take a somewhat different approach in the derivation of some of his
results. With (4.54) noted, the problem of obtaining the marginal posterior
pdf for, say, 15x is just the problem of getting the marginal posterior pdf for
a subset of a set of variables distributed in the multivariate Student t form, a
problem considered in Chapter 3. Here we partition 15'= (15x'iy'), '= ([x' ?'),
and Wx' Wx + Zx ZdZx 0 with �' = (15o.'!�2') and ,' = ({.2'!?d). Then the
marginal posterior pdf for 15x is given by (4.55) p(1511y) oc {rs 2 + (lax -
g0'H(151 - g0} with (4.56) H = Mxx - Mxo. Mo.o.-Mo.x = w'wx -
wdw2(w2'w2)-xw2'w + z'zx - zdzo.(z2'z2)-xz2'zx. Note that nx + no. - mx -
mo. = v + m, and thus the exponent of(4.55) can be written as -(v + m)/2.
The mean of (4.55), {x, a subvector of 1 = (X'X)-XX'y, can be solved for
explicitly and is given by (4.57) where H is defined in (4.56), V: = Zx'Zx -
Zx'ZO.(ZO.'ZO.)-xZo.'Zx, x is the least squares quantity obtained from a
least squares fit of Yx on Wx and Wo., and fix is the least squares quantity
obtained from a least squares fit of yo. on Zx and Zo.. From (4.56) and the
definitions of Vx and Vo. we have H = Vx + Vo.. Thus 1 in (4.57) is a
"matrix weighted average" of $x and x. Similar analysis can be per- formed
to obtain the posterior pdf's for The above analysis will be useful in pooling
data from two sources. Note that it is not equivalent to what would be
obtained had we analyzed (4.51) conditional on 15x = x, the least squares
quantity obtained from (4.50). In this case, for example, we obtain a
conditional rather than a marginal posterior pdf for �o.; the marginal pdf
can be obtained from (4.54) as was done for 15 above. Last, should the
elements ofux and uo. in (4.50) and (4.51) have differing variances, say
*xo. and ,o.o., the methods of Section 4.2 would have to be employed in the
analysis of (4.50) and (4.51)? See V. K. Chetty, 1oc. cit., for further details
and an application of these methods.

110 SPECIAL PROBLEMS IN REGRESSION ANALYSIS TWO


REGRESSIONS WITH SOME COMMON COEFFICIENTS 1 1 1
APPENDIX I Here we provide the lemma needed to establish that the
matrix H appearing in (4.17) is positive definite. Lemma. Let X,' be the k x
(T + 1) augmented matrix X,'= [Xo'iX'], where Xo--(Xo,Xo2,...,Xo0 and let
z'=(1, p, p2,...,p') be a 1 x (T + 1) vector. If z and X, are linearly
independent, H is positive definite. Proof. It suffices to show that the matrix
X - pX_ is of rank k. We can write X- pX_ = AX,, where -p 1 -p 1 A= -p 1_
is a T x (T + 1) matrix with all elements not shown being zero. It is easily
seen that A is of rank T and w = z is the only nontrivial solution of the
system of equations Aw = 0. Since X, and z are assumed to be linearly
independent, there exists a (T + 1) x (T- k) matrix C such that B = [zX,]C]
is a (T + 1) x (T + 1) nonsingular matrix. Thus the rank of the product AB
is T, but note that = [OAX, iAC] has only T nonzero columns. Hence the
rank of AX, must be k and the lemma follows? APPENDIX 2 In this
appendix we present asymptotic expansions of the multivariate "normal-t"
pdf in (4.31) and the multivariate "double-t" pdf in (4.38). With respect to
(4.31), the factor in the multivariate Student t form can be expanded as
follows. We can write log + v2/j 2a The foregoing condition is also
necessary; that is, if z and X, are not linearly independent, H will not be
positive definite. where Q. ([ .) .([ O/s?'. Then, on employing ( 0) 02 1 (02]
2 +1 log 1 q--= = v,. 2\v-/ 5\v--/ =Q----a+R, where R is the remainder term,
(1) can be written as' exp (--)exp (_15 [k Q-a+(vo.+ k)R]}. Now expand the
second exponential factor as e"= 1 + x + x2/2! + xa/3! +... to obtain (2) exp
(--} qva -* , t--0 where qo = 1, q = �[Qo? - 2kQo.], q2 = t613Q.4 - 4(3k +
4)Q. a + 12k(k + 2)Q2�'], and so on. Thus (4.31) can be approximated by
exp -� ( - )' Z--! ( - ) + {22 12 t = 0 = exp [-i( - )'A( - )l qY2-', with and A
given in (4.33) and (4.34), respectively. Thus is the mean of the leading
normal term in the asymptotic expansion of the multivariate "normal-t" pdf
as stated in the text. In the case of the multivariate "double-t" pdf in (4.38),
namely ( Qx.]-%+,)l.(1 Q.,,)-%+)l. 1+ + , v / \ with Q: = ({t - [x)'Zx([ -
$:)/sx �', both factors are expanded exactly as described above to yield exp
[-�(Qx + Qo.)] Z p,vx-' Z t=O = exp [-�([ - )'D([ - )] Z l, j=O where and
D are shown in (4.40) and (4.41), respectively, the q[s are as defined above,
and the p{'s are given by Po = 1,p = �(Q2 _ 2kQ). p._,. =

112 SPECIAL PROBLEMS IN REGRESSION ANALYSIS �-[3Q 4 -


4(3/c + 4)Q a + 12/c(/c + and so on. Thus is the mean of the leading normal
term in the asymptotic expansion of the multivariate "double-t" pdf.
Further, in the paper by Tiao and Zellner, loc. cit., methods are described
for taking higher order terms into account in analyzing this pdf; that is,
integration of the above series is accomplished by noting that each term is a
bivariate polynomial in Q: and Q2. Thus integration of each term involves
evaluating mixed moments of the quadratic forms Q and Qo., which is done
by employing the bivariate moment-cumulant inversion formulas given by
Cook. �'4 QUESTIONS AND PROBLEMS 1. Using the model shown in
(4.1) and the prior assumptions shown in (4.3), derive the conditional
predictive pdf for yr is a given value. How do the mean and variance of y,. ,
given p = po, depend on po? Explain how the unconditional predictive pdf
for yr.z can be computed. 2. Suppose that the parameter p, appearing in
(4.1b) is believed to satisfy 0 < p < 1 and to have prior mean and variance
equal to 0.5 and 0.04, respectively, How can this prior information be
represented by a beta pdf? 3. Use the prior pdf in Problem 2 along with
other prior assumptions in Section 4.1 to obtain the joint posterior pdf for
the parameters/S and p in the simple regression model in (4.1). 4. For each
of the two sets of data in Table 4.1 use the result obtained in Problem 3 to
compute a marginal posterior pdf for p by means of a bivariate numerical
integration. Comment on the properties of the resulting posterior !pdf's for
p. 5. In Chapter 3, Table 3.1, Haavelmo's data on income (y) and investment
(x) are presented. Use these data to compute the posterior pdf for p in the
following model: y - -- /s(xt - .) + ut, t "- 1, 2,..., 20, ltt= pltt-! + t, where
and , are sample means for income and investment, respectively. Employ
the assumptions of Section 4.1. What could account for the posterior pdf for
p being centered far from zero ? 6. In Problem 5 compute the marginal
posterior pdf for the investment multi- plier IS and compare it with the
posterior pdf shown in Figure 3.1. � ?. Assume that (4.1a) has an intercept
term,/So; that is, Yt =/so + /sxt + ut. By combining this with (4. lb)we
obtain yt a M. B. Cook, "Bivariate ,t-statistics and Cumulants of their Joint
Sampling Dis. tribution," Biometrika, 38, 179-195 (1951). QUESTIONS
AND PROBLEMS 113 + t. Is there a difficulty in estimating/go from this
equation when p = 1 ? Will this difficulty be present if a priori we restrict
the range of p by 0 < [pl < 1 or use a prior pdf that assigns zero probability
density to the value =17 8. Using the likelihood function in (4.2), evaluate
Fisher's information matrix, a typical element of which is given by -E(& 2
log l/ ), where 0 and denote the ith and jth parameters, respectively, and the
expectation is taken with respect to the pdf for the y's. In performing this
derivation, note that (a) yt - Isxt = pM + Y.J_- o pt- , (b) E(yt - IsxO = pM,
and (c) E(y - Isx) = p�'M�' + '(1 - p2+z)/(1 - :). From an examination of -
E(&: log l/&p ) what can be said about the information regarding p when []
> 1 ? If 0 < ]Pl < 1 and T is large, show that the part of the information
matrix pertaining to p and IS is approximately diagonal. What does this
imply ? 9. For the model in (4.12), given p = o, derive the predictive pdf for
a vector of future observations, say z'= (yr+x, yr+:,...,YT+q), assumed to be
generated by z = W[3 + u. with W a given q x k matrix and u. a q x 1 vector
of future error terms generated by the same process as the elements of u in
(4.12). Use a diffuse prior pdf for the unknown parameters of the model. 10.
Derive the marginal posterior pdf's for , and ,2 in (4.43a,b) using a diffuse
prior pdf for the parameters and assuming that o. 2:. Compare these pdf's
with those obtained using the same prior assumptions but with : = :' = : and
a diffuse prior pdf for . 11. Interpret the pdf's plotted in Figure 4.. 12. In
(4.43a,b), corresponding slope coefficients ISx and/so. are assumed to be
the same in the two relationships. If this assumption is questioned, what can
be computed to provide a check on this point ? 13. For the system in (4.S0)
and (4,1), with a diffuse prior pdf for the parameters, derive the joint
marginal posterior pdf for }he elements of y. and provide its mean and
variance-covariance matrix. 14. Provide an analysis of (4.50) and (4.1)
when the elements of ux have variance 2 and those of u have variance tr
with x - 2 and trx and are assumed to be independently distributed a priori.

CHAPTER V On Errors in the Variables It has been generally recognized


that economic data often contain errors and that the presence of
measurement errors can vita}ly affect the results of analyses. In view of
these generally accepted propositions it is not surprising to observe that
considerable effort has been expended on the development of methods for
analyzing data which are 'contaminated with measurement errors. In this
chapter we consider several models and problems related to measurement
errors. After analysis of several preliminary problems, which illustrate
problems associated with certain basic "errors-in-the-variables" models
(EVM's), we take up the analysis of the classical EVM. This model can be
viewed as a generalization of the simple regression model, considered in
Chapter 3, which takes account of random measurement errors in both the
dependent and independent variables. Two forms of the classical EVM,
namely the functional and the structural, are analyzed by maximum likeli- i
hood and Bayesian techniques. This comparative approach is particularly i
revealing in the present instance because, as will be seen, prior
informationll plays a vital role in both sampling theory and Bayesian
approaches. After analyzing the classical EVM, we consider a form of this
model which! incorporates special assumptions about the systematic parts
of observed variables, namely that they can be represented by the
systematic parts of regression equations. As will be seen, this analysis is
closely related t o "instrumental variable" estimation techniques for the
EVM. Although the analyses of this chapter cover only a subset of those
bearing on EVM's, thi s subset is of considerable importance in econometric
work. 5.1 THE CLASSICAL EVM: PRELIMINARY PROBLEMS Before
turning to the classical EVM, it is instructive and illuminating t o consider
the closely related problem of n means. Let y, y.,..., y be n independent
observations drawn from n normal populations, each with thi same variance
a 2 but with different means; that is, y, (i = 1, 2,..., n)i! assumed to be
randomly drawn from a normal population with mean : an x This problem is
discussed in M. G. Kendall and A. Stuart, The Advanced Theory Statistics,
Vol. II, New York: Hafner, 1961, p. 61. 114 THE CLASSICAL EVM:
PRELIMINARY PROBLEMS 1 15 variance ,2. It is to be noted that we
have n observations and n + 1 unknown parameters, n means, :x, :.,..., :,,
and . Just from this count of observa- tions and parameters we may guess
that it will be difficult to estimate all n + 1 unknown parameters. We wish to
analyze the nature of this difficulty because it arises in the functional form
of the classical EVM, albeit in a slightly more complicated manner. The
likelihood function for the problem of n means is given by (5.1) l(, ,[y) oc-
exp - (y - )'(y - ) where g'= (:, :.,..., :,), a vector of unknown means, and y'=
(y, y.,..., y,), the observation vector. On differentiating the logarithm of the
likelihood function partially with respect to and the e,'s and setting the
derivatives equal to zero in an effort to obtain maximum likelihood (ML)
estimates, we find (5.2a) and (5.2b) 0 . = (Y - )'(y - ) Thus from (5.2a), the
y[s are apparently the ML estimates of the correspond- ing-'s. On inserting
the "ML estimate" = y for g in (5.2b), it appears that the "ML estimate" for
a is 0 ' = 0. As noted by Kendall and Stuart, this is obviously an absurd
resultf The basic defect with the above "ML analysis" is that the likelihood
function in (5.1) does not possess a finite maximum in the admissible region
of the parameter space 0 < a < c and -oo < : < co, i = 1, 2,..., n. This is easily
established by substituting from (5.2a) in (5.1) to obtain /(*IY, g - Y) c 1/a
n, which clearly does not possess a maximum for 0 < < c. Alternatively, if
we substitute from (5.2b) in (5.1), we obtain /(glY, -- 0') oc [(y - g)'(y - g)].,
which again has no finite maximum value in the admissible parameter
space. Thus, although (5.2a) is the ML estimate for g for given finite " and
(5.2b) is the ML estimate for for They state, op. cit., p. 61, that this is an
example in which "... the ML method may become ineffective." a C. M.
Stein, "Inadmissibility of the Usual Estimator for the Mean of a
Multi9ariate Normal Distribution," in J. Neyman, Ed., Proc. Thh'd Berkeley
Symp. Math. Statist. Probab., Vol. 1, Berkeley: University of California
Press, 197-206 (1956), shows that the ML estimator in this case is
inadmissible relative to a quadratic loss function for n > 3. See also W.
James and C. M. Stein, "Estimation with Quadratic Loss," in J. Neyman,
Ed., Proc. Fourth Berkeley Symp. Math. Statist. Probab., Vol. 1, Berkeley:
University of California Press, 361-379 (1961), and C. M. Stein,
"Confidence Sets for the Mean of a Multivariate Normal Distribution," J.
Roy. Statist. Soc., Series B, 24, 265-285 (1962).

116 ON ERRORS IN THE VARIABLES given , (5.2a) and ($.2b) do not


jointly yield ML estimates for and Further, it is not hard to show that the
likelihood function in (5.1) does not approach a limit as a2_+0 and --y;
therefore 0 and y are not ML estimates. 4 It is interesting to explore the
results of a Bayesian analysis of the n means problem. Let us for present
purposes employ the following prior pdf: 1 -co < < , i = 1,2,...,n, (5.3) p(l[, )
oc -, a On combining (5.3) with (5.1), the posterior pdf is (5.4) p(L oly) oc -
r exp ( - y)'( - Y) ' From (5.4) we note that the conditional posterior pdf for
g, given a is a proper multivariate normal pdf with mean vector y and
covariance matrix r�'I,. Further, for given 1[, the conditional posterior pdf
for a is a proper inverted gamma pdf. Thus, as with the ML approach,
conditional inferences can be readily made. However, joint inferences about
g and cannot be made, since (5.4) is an improper pdf; for example, if (5.4)
is integrated with respect to the elements of , the result is p(oIY) cr 1/,, 0 <
o < c, which is improper and precisely in the same form as our diffuse prior
pdf in (5.3). Thus the sample information in this problem gives us no
information about r. Also, on integrating (5.4) with respect to , the result is
p(lg[y)cr.i [([ - y)'(l[ - y)]- ' which is an improper pdf. Therefore there is not
enough sample information to make joint inferences about a and the
elements of [.i On the other hand, with prior information about one or some
of the param-'i eters, for example, given a, inferences can easily be made
about the elements of [. Thus exact prior information, say * = -0 with o
known, enables us to:. make inferences about the elements of . It is
extremely important to appreciate that less precise information about' , less
precise than a = o, also permits us to make inferences about the elements of
1[; for example, if we use the following prior pdf, 1 [ < ;, < (5.5) p(l[, .) cc
/.exp , rrVo + \- 2a ' } 0 < * < c, i= 1,2,...,n Observe that lim l as I[--* y and
then .a---,-0 is infinite.-On the other han lim l as *a--* 0 and then y--* I[
approaches a finite limiting value of zero; no t limo_.o(..)-Xexp[_(2.a)-X(y-
l[)'(y- I[)] = 0. Since the limits are differeni depending on how we approach
the point I[ = y and .a = 0, the function does not exis at this point. Also note
that Oa = 0 is not in the admissible parameter space 0 < THE CLASSICAL
EVM.' PRELIMINARY PROBLEMS where vo and So are prior
parameters, v0, So > 0, the posterior pdf is 1 (_[VoSoO.+(l[_y),(l[_y)], (5.6)
p(, [y) cc a,+o + exp 1 which is a proper pdf. The marginal posterior pdf for
is 117 (5.7) P(IY) oc [VoS0 2 + ( - y)'( - y)]-o+Vo>/., which is in the form
of a multivariate Student-t pdf with mean vector y. Thus, rather than
assuming that = 0, a known value, we are able to incorporate less restrictive
prior information which permits us to make inferences about the elements
of g. However, on integrating (5.6) with respect to the elements of 1[, we
obtain the prior pdf for as the result. Thus the sample information does not
add to our knowledge of r in the present problemfi Let us take up next the
problem of n means in which we have m observa- tions for each mean; that
is, our model for the observations is (5.8) y, = t: + u, i = 1, 2,..., n, where y[
= (y, y{.,..., Yirn), is an m x 1 vector of ones, that is, t' = (1, 1,..., 1), : is the
ith unknown mean, and u, is an rn x 1 error vector. We assume that Eut = 0,
Euu[ = a�'Im for i = 1, 2,..., n, and Eutu/ = O, a null matrix, for i j. Further,
we assume that the elements of the u vec- tors are jointly normally
distributed. To simplify the notation, we can write (5.9a) Ya = . + u or
(5.9b) y = WE + u, where y denotes the vector on the lhs of (5.9a), W, the,
block diagonal matrix on the rhs of (5.9a), g' = (:x, :o.,. :), and u' = (ux', u '
.., � ', 2 , � Un t). Note that in the present problem we have nrn
observations and n q- 1 . unknown parameters, so that for n and m each
larger than 1 we have more observations than unknown parameters in
contrast to the situation in which ?.m = 1, analyzed above. Still, as shown
below, there is a fundamental %replication with the ML approach. Also, as
n increases, (5.7) does not become concentrated about the mean vector y.
118 ON ERRORS IN THE VARIABLES The likelihood function for the
system in (5.9) is 1 [l ] ely) exp (y - - (5.10) cc- exp --ff [(y - 14)'(y - W) + (
- )'W'W(I[ - )] , e nm where = (W' W) - W'y. Since W'W is positive definite,
it is clear that = is the ML estimate. As is easily established, = (t't)-Xt'y = Y,
the ith sample mean; that is y 1 = Yw Further, the ML estimate for e 2 is
(5.11) 02_ 1 (y_ W)'(y- nm which is obtained by differentiating the
logarithm of the likelihood function with respect to e ' and setting the
derivative equal to zero. 6 As can easily be established, = and e ' = 0 ' are
indeed values for and e ' associated with a finite maximum of the likelihood
function. Yet, as Neyman and Scott and Kendall and Stuart point outf there
is still a problem. The ML estimator for e ' has a bias that does not
disappear as n-+ c with m fixed; that is, (5.12) EO 2 nm - n e2 nm Thus, as
n--> with rn fixed, the bias of the ML estimator does not dis- appear. If, for
example, rn = 2, E0 2 = �e 2 for all n. Heuristically, Kendall and Stuart
interpret this situation as the persistence of the small sample bias of the ML
estimator. Note that the number of unknown parameters in- creases as n
increases. Indeed, the ratio of the number of parameters to the number of
observations, (n + 1)/nm approaches 1/rn which can be appre- ciable, � for
the case m = 2. Thus we do not get out of the small sample situation as n
increases in the present problem. 6 From (5.10), log I = const -nm log. - (2-
2) - X(y _ Wig)'(y - Wig). Then (dlog D/de = -nrn/. + (.a)-X(y _ Wig)'(y -
Wig) = 0 yields 2 = (nm)-X(y - Wig)'(y - Wig), which, ,with ig = , yields
(5.11). J. Neyman and E. Scott, "Consistent Estimates Based on Partially
Consistent Observa- tions," Econornetrica, 16, 1-16 (1948). THE
CLASSICAL EVM' PRELIMINARY PROBLEMS 1 19 An ad hoc method
for correcting this defect of the ML method in this "incidental parameter"
problemf as suggested by Kendall and Stuart, would be to make a "degrees-
of-freedom" correction to the estimator for e ' in (5.11). We have nm
observations and estimate nf[s. Therefore the degrees of freedom left to
estimate e 2 is nrn - n = n(rn - 1). If we define 1 (5.13) �2= n(m- 1) (y -
W)'(y- W), then Eo ' = e 2 for all n. Below we shall see that a similar
"incidental param- eter" problem is present in the classical EVM. For the
Bayesian analysis of the model in (5.9) let us employ the prior assumptions
given in (5.3). Then the posterior pdf is p(g, ely ) oc e+---- exp -- (y - W)'(y
- Wg) (5.14) Or; enrn exp ---ff [(y -- W)'(y - W) + ( - )'W'W(g - )] � The
marginal posterior pdf for g is (5.15) P(IY) c [(y - W)'(y - W) + ( - )'W'W(g
- )]-m,., which is in the form of a proper multivariate Student t pdf with
mean vector E = and covariance matrix (W'W)-%'s2/(v ' - 2). 0 It should be
noted that with m fixed as n --> co this covariance matrix does not approach
a null matrix ;' that is, as n --> with m fixed, the marginal posterior pdf for
does not become concentrated about . This Bayesian result is an analogue of
the sampling theory result that the ML estimators for the elements of g have
variances that do not tend to zero as n --> oo with rn fixed. We can obtain
the marginal posterior pdf for e by integrating (5.14) with respect to the n
elements of , an operation that yields (5.16) 1 [ p(ely ) cr exp where v' =
n(m - I) and s ' = (y - W)'(y - W)/v'. Note that the integra- tion with respect
to the elements of leads automatically to a reduction of the exponent ofe in
the denominator of(5.14) from nm + 1 to n(m - 1) + 1, which is analogous
to the degrees-of-freedom correction discussed in con- nection with (5.11).
Also from (5.16) we have E(e'ly) = v's2/(v ' - 2), the posterior mean of e '.
On viewing this quantity as an estimator, it is seen to be a consistent
estimator in contrast to that shown in (5.11). Neyman and Scott have called
the 5's incidental parameters. Here v' = n(m - 1) and v's ' = (y - W)'(y - W).

120 ON ERRORS IN THE VARIABLES Next, we take up the problem of n


means with rn = 2 observations for each unknown mean, one with variance
*' and the other with variance that is, our n x 1 observation vectors, y and y.,
are assumed to be generated as follows: (5.17a) yx -- + Ux, y. = + U., and
Eux = Eu. = 0, Euxux'--' 'I,, Eu2uo.' (5.170) where 1[' = (x, o.,..., ,) a.-I, and
Euxuo.' = 0, an n x n null matrix. We also assume that the elements of ux
and u. are normally distributed. Under these assumptions the likelihood
function is given by l(, ax, ,o. lYx, yo) cr (5.18) Let us attempt to find ML
estimates. By differentiating log 1 with respect to ,x and *2 and setting
these derivatives equal to zero we obtain (5.19) a (yx - l[)'(yx - ) and ao. ' =
(y' - l[)'(y2 - ) ---- n n as the maximizing values of ,o. and ,o. which are seeri
to depend on the unknown vector 1[. On differentiating log l with respect to
the elements of we obtain y/.2 + y2/a?, i = 1, 2, n, - 1/cha + 1/ �' .... ,
(5.20) __ (yx, + 1+ where ; = ,x/eo?, as the maximizing values of the :[s
which depend on the variance ratio ,. It is clear that { is a weighted average
of yx, and yo.{, with the reciprocals of their respective variances as
weights. Now, if is known, the estimators in (5.19) can be computed. On the
other hand, if ; is known, the {'s can be computed from (5.20), and on
inserting for in (5.19) the estimator for a is o �' = (Y - )'(Y - )/n. However,
it is not difficult to show that x� Eax ' = exa;/(1 + ;); that is, axe' has a bias
that does not disappear as n --> or. Further, if we write the likelihood
function in (5.18) as O-We have 8 2 = (l/n) [(Y - ) - ( - )]2. With - = [(Y -
)/2 + (Yai -- 5t)] 0221/(1/0x2 + 1/022)' E&12 --- el2 -- 2(1/0x2 + 1/02'3-1 +
(1/q?' + 1/�22) -1 o - (1/o + lloyd') - = o/(1 + ). THE CLASSICAL EVM'
PRELIMINARY PROBLEMS 121 (5.21) l(, ,x[,, y, y.) c ,-- exp 2. [(y - )'(y
- ) + A(ya - )'(ya - )] , the ML estimator for ,xa, given A, is 1 (5.22) x = 2
[(Yx - )'(Y, - ) + h(y - )'(y - )1, with the elements of given by (5.20). It is
straightforward to show that the expectation of the estimator in (5.22) is Ea
a = �,a. Again the ML estimator has a bias that does not disappear as n--->
co. This difficulty arises because no allowance has been made for the fact
that the n elements of the g vector have been estimated; an ad hoc
adjustment for degrees of freedom could, of course, be made to remove the
bias just discussed. This problem arises also in the EVM as will be seen
below. To return to (5.18), we may ask if it is possible to obtain ML
estimators for all the parameters of the model, the n elements of g and the
two variances ' and ?, given our 2n observations, y and yo.. Intuitively, this
does not appear to be possible, since opposite each unknown mean we have
just two observations, each with its own unknown variance. On substituting
from (5.20) in (5.19), we obtain 1 ' (y _ y), off=n(1 + X) (y- y) and O =n(1
+.X} and us for these two conditions to be satisfied we must have which
can hold only if a = 1. Thus the necessary conditions for a maximum cannot
in general be satisfied; a maximum of the likelihood function does not exist
for ,x a aa a with both ax a and *aa unknown. xx The above difficulty arises
because xa and :a are not identified. This is most easily seen by considering
the distribution of w = yx - ya = ux - ua. The vector w has a zero mean and
covariance matrix waI, where wa= ax + aa a. There are many values of ,xa
and a a that sum to a particular wa. Since the pdf for w is completely
determined.by specifying the quantity wa, it is not possible to identify ,xa
and ,aa without further prior information. If we approach the present
problem from the Bayesian point of view with the following diffuse prior, 1
- < < , i = 1, 2,...,n, (5.23) p(, a, %) m x2 0 < a< , i= 1,2 x Note that if we
substitute from (5.19) into (5.18) the resulting function has no finite
maximum.
122 ON ERRORS IN THE VARIABLES it is not difficult to show that on
combining this prior with the likelihood function in (5.18) the resulting
posterior pdf, (5.24) p(L Y2) c 1 [ O1 n+ ly2n+ 1 exp - 1 2ox ' (yx - 1D'(y x
- 1 2o?' (y9. - l[)'(y2 - ) , is improper. However, the conditional posterior pdf
for g, given ox and and the conditional posterior pdf's for ox and ., given g,
are all propePS'; for example, given and o, the posterior pdf for g, is the
following proper normal pdf: (5.25) p(lax, aa, yx, y0 oc exp [ �2 + ) ] - - ,
where = (yx/ox' + ya/a?')/(1/ax2+ 1/a)) is the posterior conditional mean.
This conditional posterior distribution's covariance matrix rrx'rr.2/ (ox a q-
oo.')I, has elements that do not tend to zero as n increases. To illustrate the
nature of the identification problem associate d with (5.24), we complete the
square on and integrate with respect to th e elements of this vector, an
operation that yields the following result' (5.26) p(o, o2[y, cr -- 1 1 [ ox. (x'
+ o.a) 'v�' exp 1 Y2)]' 2(o 2 + o22) (y - ya)'(y - The factor 1/oxoo. comes
from our diffuse prior, whereas the second factor is just the normal pdf for
yx - ya. From the form of this latter pdf, it is impossible to identify ox and
,a without adding more prior information than we have added with our
diffuse prior pdfi Further, by changing variables in (5.26) from ax and 2 to
,x and = oa/*a a and on integrating with respect to ax the marginal posterior
pdf for ,X is just p(,Xlyx, ya) which is improper. xa Thus, when ox and 2
are both unknown, there are difficulties in making inferences about all
parameters, whether from the ML or Bayesian approach. If, however, the
ratio of unknown variances, say = ox'/o. a, is known, it can be established
that the above difficulties disappear. In fact, for ,X = 1 the problem reduces
to the one analyzed above in which we have two observa- tions per
unknown mean, with their variances equal. Knowing the value of , is prior
information that resolves the identification problem. x2 Also, the
conditional posterior pdf for 1[, given )t = 012/022, is a proper normal pdf.
xa Note that p(x, .) oc l/oxon. implies P(*x, )t) a: (1/x)(1/)t) and thus the
posterior pdf for , is identical to its prior pdL CLASSICAL EVM: ML
ANALYSIS OF THE FUNCTIONAL FORM 123 5.2 CLASSICAL EVM:
ML ANALYSIS OF THE FUNCTIONAL FORM In the 'classical EVM we
have n pairs of observations, (yx, Y2), i = 1, 2, ..., n, which are assumed to
be generated under the following conditions' (5.27) Yu = :{ + uu i = 1, 2,...,
n, Y2 = h + (5.28) with (5.29) *h = rio + fi:, i -- 1, 2,..., n, where rio, fi, the
{'s, and the :'s are unknown parameters. In (5.27) to (5.28) we assume that
(yx{, y.) are distributed independently of (yx, y.), i % j and have a normal
distribution with Eyn = , Ey.{ = % Vat Yn = ox ', Vat y., -- % and Cov(yn,
y.) = 0. This last condition implies that the error in Yu, namely, uu, is
uncorrelated with (and here independent of) the error in t2. TM If we
combine (5.28) and (5.29), the model can be written as (5.30) Yu = e + uu }
i = 1, 2,..., n. (5.31) Ya, = rio + fie, + Written in this form, it is clear that this
model is closely associated with both the simple regression model and the
problem of n-means; that is, if there were.no measurement error in (5.30)
(i.e., yx = t5), the model would be in precisely the form of a simple
regression model. On the other hand, if/go = 0 and fi = 1, the present model
becomes the n-mean problem, with two observa- tions per mean having
unequal variances o = and %=. This fact suggests that a problem concerning
the existence of a maximum of the likelihood function may arise in
connection with the present model just as it did in the n-mean problem to
which reference has just been made. The likelihood function for the
parameters of the system (5.30) to (5.31) is given by l(l, , (5.32) 1 1 exp [---
O1 n O2 n 1 2Ol 2 (Yx - )'(yx - ) 1 2oa..(Ya - riot - fil[)'(ye -/got- fig) , x As
is well known, (5.27) to (5.29) can be viewed as a form of Friedman's
consumption function model if for the ith household in a sample of n
households we let Yx = log of measured income, y, = log of measured
consumption, e = log of "permanent" income, , = log of "permanent"
consumption, ux, = "transitory" income, and u{ = "transitory" consumption;
5, % uu, and u are unobserved quantities. Further, Friedman's theory
suggests that/ -- 1; that is, the elasticity of permanent consumption with
respect to permanent income is one.

124 ON ERRORS IN THE VARIABLES where [' -- (/80,/g), [' = (x, .,..., ,),
Y' = (Y', yo.'), yj' = (yjx, yo.,..., y), j -- 1, 2, and t is a n x 1 column vector,
with each element equal to one; that is, t' = (1, 1,..., 1). On taking the
logarithm of both sides of (5.32) and differentiating with respect to the
elements of , 0.x, and 0.2 we obtain (5.33) log/ 1 (Yx - ) + 1 [_/go. +/g(Y2
t/go)], d-'-- = 0.o. 0..-'. - log I n (5.34) 1 (y _ )'(Yx - ID, (5.35) Olog/ _n + 1
(yo. - 0.2 --' 0'2 0.2 3 As a necessary condition for a maximum, values of
the parameters must exist in the admissible parameter space, which sets
these derivatives equal to zero as well as the derivatives with respect to /go
and/g. On setting (5.33) equal to zero, we have 6 yx/3x' + (o./3?')(y. - ot)/ =
+ � = 1+' where 0 = 0�'/�'/02 �' and = (yo. -/ot)//. Substituting from
(5.36) in (5.34) and (5.35) and setting these derivatiyes equal to zero, we
have the following results' (5.37) --' (1 + 0) ' (yx - )'(Yx - ), 2 (5.38) o. =
(Yx - v)'(yx - ). 0.2 n(1 + 0) 2 These two equations can hold simultaneously
if, and only if, o. = x, For this problem we define the admissible parameSer
space as follows: 0 < **o. < o, i= 1,2; -o < < 0% i= 1,2,...,n; -oo < o, < oo;
ax 2 oo.o.; and o. oo.�'/oxo.. The reason for introducing this last condition
is explained in the text. x Equation 5.36 can be more fully appreciated by
writing (5.30) to (5.31) as Yx = ( + ux and (Y2 - o)/ = + uad. The elements
of (5.36) are then weighted averages of Yx{ and (Y2 - o)/18. For given t8o
and 18(0) this is an n-means problem with two observations per mean and
variances x 2 and .2/2. x7 This result was obtained by D. V. LindIcy,
"Regression Lines and the Linear Func- tional Relationship," J. Roy. Statist.
Sec. (Supplement) 9 (1947), 218-244. His inter- pretation of it, as well as
that of J. Johnston, Econometric Methods. New York: McGraw- Hill, 1963,
p. 152, is somewhat different from ours. CLASSICAL EVM: ML
ANALYSIS OF THE FUNCTIONAL FORM 125 However, in our
definition of the admissible parameter space, we explicitly stated that /g'-
0..o./0.. and thus the quantity o.= &../&o. falls in an inadmissible region of
the parameter space. Since this is so, the necessary first-order conditions for
a maximum of the likelihood function cannot be simultaneously'satisfied;
hence a maximum of the likelihood function does not exist in the
admissible region of the parameter space. xa Since there are basic
difficulties with the analysis of the model in (5.30) to (5.31) when all
parameters are assumed unknown, analyses have often gone forward under
the assumption that 4 = 0.o.'/0.x �' is known exactly? Under this
assumption a unique maximum of the likelihood function exists and thus
ML estimates can be obtained. To do so we write the likelihood function as
(5.39) 1 ( I(1, g, 0.o. ly, oc exp - 1 + (Y2 -/got -/g)'(yo, -/got - On
differentiating the logarithm of the likelihood function L with respect to the
unknown parameters and setting these derivatives equal to zero, we have
(5.40) L = .. 1 (y. _/got -/gg)'t = 0, (5.41) L 1 (y. _/got -/gg)'g = O, 0.2 2
(5.42) OL = 1 [4(Y - ) +/g(Y. - riot- I)1 0, and (5.43) OL 2n+ 1
[4(Yx-)'(Yx-)+(y. /got /gg)'(y. /got /g)] 0. From (5.40) (5.44) xs If we
combine the following diffuse prior pdf, P(18o, 18, , ax, *.) c l/*xo., 0 < **
< 0% i = 1, 2, and -oo < rio, fi, fi < 0% i = 1, 2,..., n, with the likelihood
function in (5.32), it is not difficult to show that the joint posterior pdf is
improper. x* In econometrics this precise knowledge is usually unavailable.
Below we present methods for utilizing less precise information 'about .

126 ON ERRORS IN THE VARIABLES where y. = ydt/n and = 't/n. On


substituting this value for/go in (5.41), (5.45) (y. - - - b). = b) Further from
(5.42) we have (5.46) andO-O (5.47) On substituting from (5.47) in (5.45),
the result is + + (5.48) = m + 2m + m where m = (y, - Y,0'(Y - yt)/n for i,j =
1, 2. This last expression yields (5.49) amxa + (mxx - maa) --mxa = 0 as the
necessary condition on . Then the ML estimator for is a solution of the
quadratic equation (5.49), namely, (5.50) maa - mxx + (maa - mx) a + mla.
= 2mxa Note that the algebraic sign in front of the square root sign is
positive, since this choice leads to a maximum of the likelihood function. ax
With = Yx from (5.) and (5.46), the ML estimator for o, from (5.44), is
whereas the ML estimator for , from (5.46), is (5.s2) = + ao Equation 5.47
is obtained by multiplying both sides of (5.46) on the left by t'[n d on the
right by t and subtracting the resulting expressions from the quantities on
the lhs and rhs of (5.46). ax See, for example, A. Madansky, "The Fitting of
Straight Lines when Both Variabl Are Subject to Error," J. Am. Statist.
Assoc., 54, 173-205 (1959). ML ANALYSIS OF STRUCTURAL FORM
OF EVM 127 Last, from (5.43), we have for the ML estimator for ?': (5.53)
,,9._. 1 '9. 2-' [(y - )'(y - ) + (y' - Jo[- J)'(Y. - Jo,- J)l. It has been pointed out
in the literature 2' that although Jo and j are con- sistent estimators 0?' is
not. In fact, plim = �?. This result is completely analogous to that obtained
in Section 5.1 in which we analyzed the problem of n means with two
observations and the same variance per mean. In the present problem we
know the value of 4, the ratio of the variances and in effect have just one
unknown variance; see the likelihood function in (5.39) which can be
written just in terms of . when 4 is known. The inconsistency of the
estimator 0 ' appears to be due to the fact that the ML method makes no
allowance for the fact that in forming the estimator, the n elements of g
have been estimated. Since the number of elements of g grows with the
sample size, an appreciable fraction of the sample is employed in estimating
the elements of g in small as well as large samples. As Kendall and Stuart
2a put it, the finite sample bias of the ML estimator for . does not disappear
as n grows, since we never leave the small sample situation. They suggest
the following procedure to correct the inconsistency. There are 2n
observations, and in (5.53) we have inserted estimates for n elements of g,
/0, and /g, that is, for n+2 parameters. Then 2n- (n+2)=n- 2 represents the
degrees of freedom remaining for the estimation of . A "corrected"
consistent estimator is o? = .2n,o./(n - 2). 5.3 ML ANALYSIS OF
STRUCTURAL FORM OF EVM The model in (5.30) to (5.31) in which
the vector is assumed to be stochastic is often referred to as the "structural
form" of the EVM. Perhaps the most basic general result for this case is due
to Kiefer and Wolfowitz who proved that if the parameters are identified
then "... under the usual regularity conditions, the ML estimator of a
structural parameter is strongly consistent, when the (infinitely many)
incidental parameters are independently distributed chance variables with a
common unknown distribution func- tion." .4 Among other results, they
establish that if the :[s are independently distributed and each has the same
non-normal distribution the ML estimators o.o. See for example, Kendall
and Stuart, op. cit., p. 386. o.a Op. cit., p. 61 and p. 387. 04 j. Kiefer and J.
Wolfowitz, "Consistency of the Maximum Likelihood Estimator in the
Presence of Infinitely Many Incidental Parameters," 4nn. Math. Statist., 27,
887-906, p. 887 (1956).

128 ON ERRORS IN THE VARIABLES for the structural parameters will


be consistent. The condition of non- normality is required to identify the
structural parameters when nothing is assumed to be known about the
parameters, a result due to Reiersol. 2s To indicate the nature of the
identification problem for the model in (5.30) to (5.31) when the :'s are
assumed to be independent of the y's and normally and independently
distributed, each with mean/z and variance f', we note that under this
assumption, along with the other distributional assumptions introduced in
connection with (5.30) to (5.31), the pairs of variables (Yu, yo.,), i = 1, 2,...,
n, will be independently and identically distributed, each pair with a
bivariate normal distribution. The following are the moments of (y{, yo4)
for i = 1, 2,..., n: (5.54) Eyx = p, (5.55) =/go (5.56) Varyu = *a + ', (5.57)
Varyo., =/g%' + o?, (5.58) Coy(y., Since these five moments completely
determine a bivariate normal distribu- tion and since sample moments are
sufficient statistics in this instance, we can equate sample moments to
population moments in an effort to obtain estimates. There is, however, a
basic difficulty with this approach, namely, that although we have five
relations (5.54) to (5.58) there are six unknown parameters, t,/go,/g, *x',
o..o., and r �'. Thus we cannot obtain estimates of all parameters unless
prior information is available to reduce the number of unknown parameters.
Let us first consider the case in which we know that/go = 0. When this
information is available, we can equate sample moments of the y's to their
respective population moments to obtain estimates as follows. From (5.54)
a = y and from (5.54) to (5.55) = .go./., where y and yo. are sample means.
The estimator/ is in the form of the ratio of two correlated normal random
variables; hence its mean and higher moments do not exist? 6 Using *'s O.
Reiersol, "Identifiability of a Linear Relation Between Variables Which Are
Subject to Error," Econometrica, 18, 375-389 (1950). 26 For the derivation
of the distribution of the ratio of two correlated normal variables see E. C.
Fieller, "The Distribution of the Index in a Normal Bivariate Population,"
Biometrika, 24, 428-440 (1932); R. C. Geary, "The Frequency Distribution
of the Quotient of Two Normal Variates," J. Roy. Statist. Soc., 93, 442-446
(1930); and G. Marsaglia, "Ratios of Normal Variables and Ratios of Sums
of Uniform Variables," J. Am. Statist. Assn., 60, 193-204 (1965). ML
ANALYSIS OF STRUCTURAL FORM OF EVM 129 this estimate for/g,
we have from (5.56) to (5.58) and where m12 2 2,.2 o. 2 --- m22 -- , 12 =
roll -- (Yet - YO(Ya - Y) i,j = 1, 2. m. = , Although the information/go = 0
enables us to obtain estimates of the remaining parameters, it must be noted
that this approach can lead to negative estimates for any or all of the
following variances' ,x, 0o. ' and r; that is, the prior information that these
variances are positive has not been introduced explicitly and thus
meaningless variance estimates can be obtained? Further, from (5.56) and
(5.58) we have Var yx = Cov(yx, y)/ + 0 and from (5.57) and (5.58), Var y =
Cov(yx,y,) + ,. Since ,x > 0 and , > 0, we have (5.59) Var yx{ Cov(yu, y) -
>0 and (5.60) Var y{ -/g Cov(yx, y2,) > 0. From (5.59) to (5.60), with
Cov(yu, yo.) > 0, we have (5.61) Cov(yu, y.) < Var yx Cov(yu, y.,) If
Cov(ylt , Y4) < 0, the inequalities in (5.61) are reversed. Thus the prior
information that o.xo. > 0 and o. > 0, combined with the relations (5.56) to
(5.58), implies that/g falls in one of two finite intervals, given by (5.61) or
(5.61) with the inequalities reversed? Since our estimator for/g is y./yx, it
has a range -c to co and thus can violate the bounds set by (5.61). In
summary, when it is known that/go = 0, point estimates for the remain- ing
parameters can be obtained by equating sample moments to their respective
population moments. However, estimates so obtained may not be a7 Similar
problems occur in "random ffects" models. See,. for example, G. C. Tiao
and W. Y. Tan, "Bayesian Analysis of Random-Effect Models in the
Analysis of Variance. I. Posterior Distribution of Variance Components,"
Biometrika, 52, 37-53 (1965). This condition has been generally recognized
and is stressed in D. V. LindIcy and G. M. EI-Sayyad, "The Bayesian
Estimation of a Linear Functional Relationship," J. Roy. Statist. Sot., Series
B, 30, 190-202 (1968).

ML ANALYSIS OF STRUCTURAL FORM OF EVM 131 130 ON


ERRORS IN THE VARIABLES consistent with basic prior information;
for example, estimates of variances may be negative contrary to the prior
information that variances are non- negative. Sampling theory procedures
for dealing with this "negative variance" problem for the EVM are not yet
availabler '� If, rather than/o, 4, = (.o./ax. is known, the sample moments
rnn, and rnx. can be inserted in (5.56) to (5.58) for their population
counterparts and estimates for/, r ', and (xo., obtained; that is, if we let 0n =
Var yx, 0a. = Vary2 and 02 = Cov(y,y.), then (5.58) yields ,= 0.// and (5.56)
yields -- 0n - r2 = 0n - 0o.//. Equation 5.57 is 0o.o. =/%2 + ax'4,, and on
substituting for r �' and ex �' the result is " p2012 + P�011 - 022) - 4,012
= 0. On replacing the 0j's by their sample counterparts, we have /�2m12 +
fl(4,mn - m22) - 4,mx2 = O, which is in precisely the same form as (5.49).
The ML estimate for/g is then given by (5.50). 30 Estimates for the
remaining parameters are given by q2 = rnx./, rnn o. and .2 = 4'. Also, from
(5.54) = yx and from (5.55) = y2 - Above, we have gone forward under the
assumption that the :'s are normally and independently distributed, each
with mean . and variance r �'. With this assumption, additional prior
information'must be added to identify the parameters. In line with Reiersol's
results, however, if the e[s have a non- normal distribution, the parameters
will be identified. To illustrate one case of non-normality assume (5.62)
p(g) cr const, -co < see < co, i= 1, 2,..., n. In (5.62) we assume that the e,'s
are uniformly and independently distributed. Below we shall see how this
assumption about the e's affects ML estimates. To formulate the likelihood
function in general we consider the joint pdf of Yx, Y2, and (5.63) P(Yx,
y2, glq) = P(Yx, where �' = (�', �0.') denotes the vector of parameters,
�x, the vector of o A "standard" approach would be to maximize the
likelih6od function subject to inequality constraints on the parameters. Of
course, if the estimates obtained by the procedure described in the text
satisfy the inequality constraints, they constitute a solution to the
constrained problem. If they violate the constraints, a nonlinear pro-
gramming problem has to be solved to obtain estimates that satisfy the
inequality constraints. ao Note that taking a positive sign before the square
root in (5.50) gives the same algebraic sign as m.. This ensures that the
estimate for r �', namely, q' -- mx.[l, is positive. parameters in the
conditional pdf for Yx and Y2, given g, and q2, the subvector of q
appearing in the marginal pdf for the elements of g. To obtain the marginal
pdf for yx and y. (5.63) must be integrated with respect to the elements of
1[; that is (5.64) h(yl, Y2lq) = f p(Yl, qx) g(glq) dg. Then h(yl, y2lq),
viewed as a function of the elements of q, is the likelihood function. For the
EVM, with the vector assumed to have a pdf in the form of (5.62), the
likelihood function is obtained by integrating the following expression with
respect to the elements of [: P(Yx, Y2, l[Iq0 oc 1 exp [- O, l n O'2 n 1 exp (-
y ln Y 2 n 1 2,2 (Yx - lD'(yx - 1[) 1 ] 2�2 2 -/got -/g)'(y2 -/ot 1 2.22 [(Yl -
)'(Yx - ) + (y2 - Pot- PD'(y2 - riot- fiD]}' To perform the integration with
respect to the elements of g we can complete the square in the exponent and
use properties of the normal distribution to obtain 3x h(yl, Y21�) (5.65) x
exp[ 2aa(: +/a)(ya-/ot--/y)'(ya-/ot- where q' = (/go,/, *x, 6). On maximizing
log h with respect to ,x,/go, and/, we find the maximizing values for/go and/
to be (5.66) Jo = Y2 - JY. and j = (Yl - ylt)'(y2 - y2t). (Yx - Yxt)'(yl - yxt) It
is seen that/o and are just the simple least squares estimates obtained from
regressing y. on yx. This surprising result 3' is vitally dependent on the ax
Note that (Yx - l[)'(Yx -- I[) + (ya - riot -/)'(ya - riot -ril[) = (+ ri2)l''l[ -
21['[yx + ri(Y2 -- riot)] + qYx'Yx + (Y2 -- riot)'(Y2 -- rio t) = (b + ri2)(l' -
)'0[ -- ) + t t Yx'Yx + (Y -- rio )(y9. -- riot) -- ( + ri9), = ( + ri9)(l[ )'( - ) +
6[(y9. - ' t ri), where [by + ri(y riot)]/( + riot - )(y2 - rio - riyDl/(4' + = - a,.
This result appeared in a preliminary paper by A. Zellner and U. Sankar,
"Errors in the Variables," manuscript, 1967.

132 ON ERRORS IN THE VARIABLES assumption about the pdf for g in


(5.62), an assumption which implies that the elements of g have infinite
variances. With this assumption the measure- ment errors' variances for the
elements of yx are negligible with respect to the variances of the elements
of 1 and/ is a consistent estimator. aa The informa- tion about the spread of
the t's contained in (5.62) is reflected by having the ML estimators take the
form shown in (5.66). 5.4 BAYESIAN ANALYSIS OF THE
FUNCTIONAL FORM OF THE EVM In the Bayesian analysis of the
functional form of the EVM information required to identify the parameters
of (5.30) to (5.31) is introduced by means of a prior pdf for the parameters.
As will be seen, there is no need to assume, for example, that the value of 6
= a0../exo. is known exactly in order to make inferences about parameters
of interest. Further, it should be appreciated that since prior information is
needed to identify unknown parameters no matter how large the sample size
is this prior information will exert an important influence on posterior
inferences. as To provide the Bayesian analogue of the ML results in (5.66),
let us employ the following prior pdf: 1 (5.67) rio, ri, ., with -oo < rio, ri, : <
c, i = 1, 2,..., n, and 0 < at < c, i -- 1, 2. In (5.67) we are assuming a priori
that the 's, 8o, ri, log x, and log 0. are uniformly and independently
distributed. The pdf in (5.67) represents our subjective prior beliefs about
the unknown parameters, whereas (5.62) is usually not given this
interpretation by sampling theorists. On combining the prior pdf in (5.67)
with the likelihood function in (5.32), we have the following posterior pdf:
(5.68) p(β, ξ, σ₁, σ₂|y) ∝ [1/σ₁^{n+1}σ₂^{n+1}] exp{−(1/2σ₁²)(y₁−ξ)′(y₁−ξ) − (1/2σ₂²)(y₂−β₀ι−βξ)′(y₂−β₀ι−βξ)},

where β′ = (β₀, β), y′ = (y₁′, y₂′), and φ = σ₂²/σ₁². Completing the square on ξ in the exponent and integrating with respect to the elements of ξ, we have

(5.69) p(β, σ₁, σ₂|y) ∝ [1/σ₁^{n+1}σ₂(φ+β²)^{n/2}] exp{−[1/2σ₁²(φ+β²)](y₂−β₀ι−βy₁)′(y₂−β₀ι−βy₁)}.

Changing variables to β, σ₁, and φ, note that dσ₂/σ₂ ∝ dφ/φ, and thus

(5.70) p(β, σ₁, φ|y) ∝ [1/σ₁^{n+1}φ(φ+β²)^{n/2}] exp{−[1/2σ₁²(φ+β²)](y₂−β₀ι−βy₁)′(y₂−β₀ι−βy₁)}.

Then, on integrating with respect to σ₁,

(5.71) p(β, φ|y) ∝ (1/φ)[(y₂−β₀ι−βy₁)′(y₂−β₀ι−βy₁)]^{−n/2} ∝ (1/φ)[νs² + (β−β̂)′X′X(β−β̂)]^{−n/2},

where ν = n−2, X = (ι ⋮ y₁), β̂ = (X′X)⁻¹X′y₂, and νs² = (y₂−Xβ̂)′(y₂−Xβ̂). From the form of (5.71) we see that β′ = (β₀, β) has a posterior pdf in the bivariate Student-t form with mean β̂, precisely the least squares quantity obtained from regressing y₂ on y₁.³⁵ As with the ML analysis yielding (5.66), the present result is critically dependent on the assumption about the form of the pdf for ξ in (5.67). In addition, note that the posterior pdf for φ in (5.71) is improper and exactly in the form implied by the prior assumptions about σ₁ and σ₂ in (5.67). Thus, with the prior assumptions in (5.67), no new information is provided about φ from the sample.

Although the assumptions embodied in (5.67) permit us to make posterior inferences about β₀ and β which are appropriate in certain situations, they, of course, are not appropriate for all problems. Since it is often difficult to know what to assume about the n elements of ξ, we shall develop a conditional analysis wherein we analyze the EVM, given that

(5.72) ξ = ξ̂ = [φy₁ + β(y₂−β₀ι)]/(φ+β²).

The quantity on the rhs of (5.72) is exactly in the form of (5.46), the "ML equation" for ξ.³⁶

³⁴ Of course, prior assumptions, say about the ratio of error variances, also affect sampling theory analyses of the EVM, no matter how large the sample size.
³⁵ This result appeared in A. Zellner and U. Sankar, loc. cit.
³⁶ See also the discussion of (5.36) above in which it is pointed out that the elements of ξ̂ are weighted averages of estimates of ξᵢ, i = 1, 2, ..., n.

The joint pdf for y₁ and y₂, given the parameters, is

(5.73) p(y₁, y₂|ξ, β, σ₂, φ) ∝ (φ^{n/2}/σ₂^{2n}) exp{−(1/2σ₂²)[φ Σ(y₁ᵢ−ξᵢ)² + Σ(y₂ᵢ−β₀−βξᵢ)²]},

with the summations taken from i = 1 to i = n. If we use (5.72) to conditionalize (5.73), the result is

(5.74) p(y₁, y₂|ξ = ξ̂, β, σ₂, φ) ∝ (φ^{n/2}/σ₂^{2n}) exp{−[2σ₂²(1+β²/φ)²]⁻¹[(β⁴/φ) Σ(y₁ᵢ−(1/β)(y₂ᵢ−β₀))² + Σ(y₂ᵢ−β₀−βy₁ᵢ)²]}.

Before proceeding to introduce a prior pdf for the parameters of (5.74), it would be instructive to study the properties of (5.74).³⁷

1. Given φ = φ₀, a given value, (5.74) has a unique mode at the ML estimates for β₀, β, and σ₂. This follows, since ξ = ξ̂ is the conditional ML estimate for ξ.
2. For finite β, as φ = σ₂²/σ₁² gets large, the second term in the exponential dominates the likelihood function.³⁸ Under these conditions the modal values for β₀ and β will be close to what is obtained from a least squares regression of y₂ on y₁. The assumption that φ is large implies that the variance of the measurement error in y₁ᵢ is small relative to the variance of u₂ᵢ in y₂ᵢ = β₀ + βξᵢ + u₂ᵢ.
3. For finite β, as φ = σ₂²/σ₁² → 0, the first term in the exponential of (5.74) dominates the likelihood function. Note that this first term involves the sum of squares Σ[y₁ᵢ − (1/β)(y₂ᵢ−β₀)]². Thus when φ is very small ML estimates can be obtained approximately by regressing y₁ᵢ on y₂ᵢ. The ML estimate of β will be close to the reciprocal of the least squares slope coefficient estimate, whereas the negative of the intercept estimate times the estimate for β yields approximately the ML estimate of β₀ when φ is very small.
4. If β = 0, the first term in the exponential is zero and the second can be used to make inferences about β₀.
5. Although the form of (5.74) is of interest in showing how the two sums of squares associated with the regressions of y₁ on y₂ and y₂ on y₁ appear in the likelihood function conditional on ξ = ξ̂, it is the case that (5.74) can be expressed more simply as

(5.75) p(y₁, y₂|ξ = ξ̂, β, σ₂, φ) ∝ (φ^{n/2}/σ₂^{2n}) exp{−[1/2σ₂²(1+β²/φ)] Σ(y₂ᵢ−β₀−βy₁ᵢ)²}.

This is the form of the likelihood function, given ξ = ξ̂, which we shall combine below with prior pdf's for the parameters. Again, it is important to emphasize that (5.75) has a unique mode at the ML estimates for β, β₀, and σ₂, given φ, the location of which is dependent on what is assumed about φ.³⁹

³⁷ Since ξ = ξ̂ is the conditional ML estimate for ξ, the properties of (5.74) are relevant for likelihood analyses of the EVM.
³⁸ We can write the exponential of (5.74) as exp{−[2σ₂²(1+β²/φ)²]⁻¹[(β⁴/φ) Σ(y₁ᵢ−(1/β)(y₂ᵢ−β₀))² + Σ(y₂ᵢ−β₀−βy₁ᵢ)²]}, and as φ → ∞ the first term becomes relatively less important than the second. On the other hand, as φ → 0, the first term becomes relatively more important than the second.
As regards a prior pdf for the unknown parameters of (5.75), namely β′ = (β₀, β), σ₂, and φ, we shall first make the following assumptions:

(5.76) p(β, σ₂, φ) ∝ p₁(β) p₂(φ)/σ₂,

where p₂(φ) is the prior pdf for φ, with still unspecified form. In (5.76) we are assuming that β₀, β, φ, and σ₂ are a priori independently distributed. The ranges of the parameters are 0 < σ₂ < ∞ and −∞ < β₀ < ∞. With respect to β, it is important to realize that its range is finite, a priori, as the discussion of (5.61) above indicates. Thus we shall assume that an investigator knows the algebraic sign of β and assigns an a priori range for this parameter, guided by the analysis leading to (5.61). Over this range, say βL to βU, the prior pdf for β is p₁(β). With respect to σ₂, we are being diffuse on this parameter by taking log σ₂ uniformly distributed.⁴⁰

On combining the prior assumptions embodied in (5.76) with the conditional likelihood function in (5.75), the resulting posterior pdf is

(5.77) p(β, σ₂, φ|y₁, y₂, ξ = ξ̂) ∝ [p₁(β) p₂(φ) φ^{n/2}/σ₂^{2n+1}] exp{−[1/2σ₂²(1+β²/φ)] Σ(y₂ᵢ−β₀−βy₁ᵢ)²}.

We can integrate (5.77) analytically with respect to β₀, −∞ < β₀ < ∞, and σ₂, 0 < σ₂ < ∞, to obtain the marginal posterior pdf for β and φ:

(5.78) p(β, φ|y₁, y₂, ξ = ξ̂) ∝ p₁(β) p₂(φ) φ^{n/2} (1+β²/φ)ⁿ / [m₂₂ − 2βm₁₂ + β²m₁₁]^{(2n−1)/2},

with 0 < φ < ∞ and βL ≤ β ≤ βU, where βL and βU are the a priori bounds placed on β. The denominator of (5.78) has a minimum at β̂₂₁ = m₁₂/m₁₁, the least squares quantity from a regression of y₂ on y₁, and thus, were it not for the factors involving β in the numerator, (5.78) would have a mode at β̂₂₁, provided βL ≤ β̂₂₁ ≤ βU. The factor (1+β²/φ)ⁿ produces a modal value for β larger in absolute value than β̂₂₁.⁴¹ The amount by which the modal value for β is increased absolutely will depend on what is assumed about φ. If the prior pdf for φ favors large values, the modal value for β will be close to β̂₂₁, abstracting from the information about β included in p₁(β).⁴²

To use (5.78) in practice, prior pdf's for β and φ must be assigned. With respect to the prior pdf for β, p₁(β), βL ≤ β ≤ βU, we can assign a beta pdf of the following form⁴³:

(5.79) p(z|a, b) = [1/B(a, b)] z^{a−1}(1−z)^{b−1}, a, b > 0, 0 < z ≤ 1,

where z = (β−βL)/(βU−βL), a and b are prior parameters to be assigned by an investigator, and B(a, b) denotes the beta function with arguments a and b. The pdf in (5.79) is a rather rich one that accommodates the prior information that βL ≤ β ≤ βU. If, for example, a = b = 1, (5.79) gives us a uniform pdf for β. With respect to φ, the prior pdf p₂(φ) might be taken in the following inverted gamma form⁴⁴:

(5.80) p₂(φ|ν₀, s₀) ∝ (1/φ^{ν₀+1}) exp(−ν₀s₀²/2φ²), 0 < φ < ∞, ν₀, s₀ > 0,

where ν₀ and s₀ are prior parameters. Although (5.79) and (5.80) are not the only forms of prior pdf's that can be used for the present problem, they appear to be rich enough to be capable of representing available prior information in a wide range of circumstances. Substituting from (5.79) and (5.80) into (5.78), we have a bivariate posterior pdf which can be analyzed using bivariate numerical integration techniques. Marginal posterior pdf's for β and φ can be computed along with measures characterizing them. In addition, it is possible to compute joint posterior regions for β and φ.

³⁹ We have pointed out that as φ → ∞ the modal value for β will approach β̂₂₁ = m₁₂/m₁₁, and as φ → 0 it will approach β̂₁₂ = m₂₂/m₁₂. Since β̂₁₂ − β̂₂₁ = (m₁₁m₂₂ − m₁₂²)/m₁₁m₁₂, β̂₁₂ is larger in absolute value than β̂₂₁. Thus for m₁₂ > 0, assuming φ to be large will lead to a smaller positive point estimate for β than assuming φ to be small.
⁴⁰ The analysis can be extended to the case in which we use an informative prior pdf for σ₂ in the inverted gamma form; see Appendix A for properties of this pdf.
⁴¹ This can be interpreted heuristically as a correction for the inconsistency of β̂₂₁.
⁴² Note that, given φ, the value of β which maximizes (1+β²/φ)/(m₂₂ − 2βm₁₂ + β²m₁₁) is the ML estimate. Thus the modal value of (5.78) for given φ will be close to the ML estimate if p₁(β) ∝ constant.
⁴³ See Appendix A for a review of the properties of the beta pdf.
⁴⁴ See Appendix A for properties of the inverted gamma pdf.
In order to illustrate application of the above techniques, data have been generated from the model y₁ᵢ = ξᵢ + u₁ᵢ, y₂ᵢ = 2.0 + 1.0ξᵢ + u₂ᵢ, i = 1, 2, ..., 20, under the following conditions. The values of u₁ᵢ and u₂ᵢ were drawn independently from normal distributions with zero means and variances 4 and 1, respectively; that is, φ = σ₂²/σ₁² = ¼. The ξᵢ's were drawn independently from a normal distribution with mean μ = 5 and variance τ² = 16. The 20 pairs of observations are shown in Table 5.1 and plotted in Figure 5.1.

[Figure 5.1 Plot of generated data: y₂ plotted against y₁.]

Table 5.1 GENERATED OBSERVATIONS

 i    y₁ᵢ      y₂ᵢ       i    y₁ᵢ      y₂ᵢ
 1    1.420    3.695    11    1.964    4.987
 2    6.268    6.925    12    1.406    6.647
 3    8.854    8.923    13    0.002    2.873
 4    8.532   14.043    14    3.212    4.015
 5   -5.398   -0.836    15    9.042   10.204
 6   13.776   16.609    16    1.474    1.953
 7    5.278    4.405    17    8.528   10.672
 8    6.298    9.823    18    7.348    9.157
 9    9.886   12.611    19    6.690    8.552
10   11.362   10.174    20    5.796   10.250

ȳ₁ = 5.587   ȳ₂ = 7.784   m₁₁ = 19.332   m₂₂ = 17.945   m₁₂ = 16.925
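The sketch below reproduces the data-generating scheme just described; the seed and draws are my own, so the resulting numbers will not match Table 5.1 exactly.

```python
# Sketch of the generating scheme for Table 5.1 (my own seed): xi_i ~ N(5, 16),
# u1i ~ N(0, 4), u2i ~ N(0, 1), so phi = sigma2^2/sigma1^2 = 1/4, and
# y1i = xi_i + u1i, y2i = 2.0 + 1.0*xi_i + u2i, i = 1, ..., 20.
import numpy as np

rng = np.random.default_rng(5)
n = 20
xi = rng.normal(5.0, 4.0, n)           # standard deviation 4 -> variance 16
y1 = xi + rng.normal(0.0, 2.0, n)      # standard deviation 2 -> variance 4
y2 = 2.0 + 1.0 * xi + rng.normal(0.0, 1.0, n)

m11 = ((y1 - y1.mean()) ** 2).mean()
m22 = ((y2 - y2.mean()) ** 2).mean()
m12 = ((y1 - y1.mean()) * (y2 - y2.mean())).mean()
print(f"y1bar={y1.mean():.3f} y2bar={y2.mean():.3f} "
      f"m11={m11:.3f} m22={m22:.3f} m12={m12:.3f}")
```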

Using the data in Table 5.1, first, conditional posterior pdf's for β, given φ = 0.25 and φ = 1.0, were computed⁴⁵ from (5.78) and are shown in Figure 5.2.

[Figure 5.2 Conditional posterior pdf's for β, given φ = 0.25 and φ = 1.0.]

It is seen that the location and spread of the posterior pdf's are not too sensitive to what is assumed about φ. In this instance, when φ is assumed to be equal to 0.25, the value used to generate the data, the conditional posterior pdf for β has its mode at β = 1.02, close to the value used to generate the data, namely, 1.0. With φ assumed to be equal to 1.0, the modal value of the conditional posterior pdf for β is located at β = 0.96. These results contrast with what is obtained from an analysis of the model under the assumption that σ₁² = 0, that is, no measurement error in y₁. In this case, with a diffuse prior pdf on β₀ and β, the posterior pdf for β is centered at β̂₂₁ = m₁₂/m₁₁ = 0.876, the least squares quantity.⁴⁶

Although the conditional pdf for β, given φ, is of interest, it is often the case that we lack prior information that is precise enough to assign a specific value to φ. However, it may be possible to choose a prior pdf to represent the information available about φ; for example, the inverted gamma pdf, shown in (5.80), could be used for this purpose. To illustrate, let us assign the following values to the parameters of (5.80): ν₀ = 8 and s₀² = 1/16. With these values assigned, the prior pdf for φ has its modal value at φ = 0.236, mean equal to 0.246, and variance equal to 0.0230 (standard deviation = 0.152).⁴⁷ On inserting this prior pdf for p₂(φ) in (5.78) and using the data shown in Table 5.1, we computed the joint posterior pdf for β and φ with a uniform prior for β. Also the marginal posterior pdf's for β and for φ were computed. The results of these computations are shown in Figure 5.3.

[Figure 5.3 Marginal posterior pdf's for β and φ based on generated data and utilizing the inverted gamma prior pdf for φ.]

It is seen from the plots in Figure 5.3 that the marginal posterior pdf's for β and φ are unimodal. As regards the pdf for φ, it appears that the information in the particular sample employed has reduced the modal value from 0.236 in the prior pdf to about 0.18 in the posterior pdf. The posterior pdf for β has a modal value at about 1.03, a value close to that used in generating the data.

⁴⁵ A uniform prior pdf for β with a rather large range was employed.
⁴⁶ A simple regression of y₂ᵢ on y₁ᵢ results in ŷ₂ᵢ = 2.893 + 0.876y₁ᵢ, where the figures in parentheses, (0.674) and (0.0948), are conventional standard errors of the intercept and slope estimates, respectively.
⁴⁷ See Appendix A for algebraic expressions for the modal value, mean, variance, etc., of the inverted gamma pdf.
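As an illustration of the bivariate numerical integration just described, the following sketch evaluates the posterior (5.78), in the form reconstructed above, on a (β, φ) grid. It assumes the sample moments of Table 5.1, the uniform case a = b = 1 of the beta prior (5.79), and the inverted gamma prior (5.80) with ν₀ = 8 and s₀² = 1/16; the grid ranges are my own choices.

```python
# Sketch: grid evaluation and normalization of the bivariate posterior (5.78).
import numpy as np

n, m11, m22, m12 = 20, 19.332, 17.945, 16.925   # moments from Table 5.1
nu0, s0sq = 8, 1.0 / 16.0                       # prior parameters for (5.80)

beta = np.linspace(0.6, 1.6, 401)
phi = np.linspace(0.01, 1.5, 401)
B, P = np.meshgrid(beta, phi, indexing="ij")

log_prior_phi = -(nu0 + 1) * np.log(P) - nu0 * s0sq / (2 * P**2)   # (5.80)
log_like = (n / 2) * np.log(P) + n * np.log1p(B**2 / P) \
         - ((2 * n - 1) / 2) * np.log(m22 - 2 * B * m12 + B**2 * m11)
log_post = log_prior_phi + log_like             # uniform p1(beta) adds nothing
post = np.exp(log_post - log_post.max())

post /= np.trapz(np.trapz(post, phi, axis=1), beta)   # normalize on the grid
marg_beta = np.trapz(post, phi, axis=1)               # marginal pdf for beta
marg_phi = np.trapz(post, beta, axis=0)               # marginal pdf for phi
print("posterior modal beta:", beta[marg_beta.argmax()])
print("posterior modal phi: ", phi[marg_phi.argmax()])
```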
We have gone ahead conditional on ξ = ξ̂, an assumption that obviates the need for a distributional assumption regarding the ξᵢ's. On the other hand, in certain circumstances we may find it appropriate to assume that the n elements of ξ are, a priori, normally and independently distributed with a common mean and variance⁴⁸; that is,

(5.81) p(ξ|μ, τ) ∝ (1/τⁿ) exp{−(1/2τ²)(ξ−μι)′(ξ−μι)},

where μ and τ² denote the common mean and variance, respectively. If we assume further that μ and τ are unknown⁴⁹ and distributed a priori as p(μ, τ) ∝ const, with −∞ < μ < ∞ and 0 < τ < ∞, then the marginal prior pdf for ξ is⁵⁰

(5.82) p₂(ξ) ∝ [(ξ−ξ̄ι)′(ξ−ξ̄ι)]^{−(n−2)/2}, −∞ < ξᵢ < ∞, i = 1, 2, ..., n,

where ξ̄ = ι′ξ/n. Below we shall combine this prior pdf and prior pdf's for other parameters with the likelihood function. The likelihood function is

(5.83) l(β, φ, σ₂, ξ|y₁, y₂) ∝ (φ^{n/2}/σ₂^{2n}) exp{−(1/2σ₂²)[φ(y₁−ξ)′(y₁−ξ) + (y₂−β₀ι−βξ)′(y₂−β₀ι−βξ)]}
∝ (φ^{n/2}/σ₂^{2n}) exp{−(1/2σ₂²)[(φ+β²)(ξ−ξ̂)′(ξ−ξ̂) + φ(y₂−β₀ι−βy₁)′(y₂−β₀ι−βy₁)/(φ+β²)]},

where β′ = (β₀, β), φ = σ₂²/σ₁², and ξ̂ = [φy₁ + β(y₂−β₀ι)]/(φ+β²). The second line of (5.83) is obtained from the first by completing the square on ξ.⁵¹ Note that ξ̂ is the conditional maximum likelihood estimate for ξ used in the conditional Bayesian approach leading to the posterior pdf in (5.77). As prior pdf for the parameters we shall employ

(5.84) p(β, φ, σ₂, ξ) = p₁(β, φ, σ₂) p₂(ξ),

⁴⁸ If, for example, ξᵢ = log Yᵢᴾ, where Yᵢᴾ is the ith individual's permanent income, the prior assumption that the ξᵢ's have the normal pdf in (5.81) is, of course, an income distribution assumption.
⁴⁹ The parameters μ and τ in (5.81) could be assigned values on a priori grounds. The resulting posterior pdf in this case has not yet been analyzed, nor in the case that an informative prior pdf is employed for μ and τ.
⁵⁰ This prior pdf for the elements of ξ is close to one that Stein indicates can be used to generate his estimator for n means when n ≥ 3. See C. M. Stein, "Confidence Sets for the Mean of a Multivariate Normal Distribution," op. cit., p. 281.
⁵¹ For details see the footnote relating to (5.65) above.
where p₂(ξ) is given by (5.82) and p₁(β, φ, σ₂), not yet specified, is discussed below. Formally, the posterior pdf is

(5.85) p(β, φ, σ₂, ξ|y₁, y₂) ∝ p₁(β, φ, σ₂) p₂(ξ) l(β, φ, σ₂, ξ|y₁, y₂),

with the likelihood function as shown in (5.83). On integrating (5.85) with respect to the elements of ξ,⁵² we obtain

(5.86) p(β, φ, σ₂|y₁, y₂) ∝ p₁(β, φ, σ₂) [φ^{n/2}/σ₂^{2n−2}(φ+β²)] exp{−[φ/2σ₂²(φ+β²)](y₂−β₀ι−βy₁)′(y₂−β₀ι−βy₁)} e^{−δ/2} Σ_{α=0}^∞ (δ/2)^α Γ(α+½)/{α! Γ[α+(n−1)/2]},

where δ = n(φ²m₁₁ + β²m₂₂ + 2φβm₁₂)/σ₂²(φ+β²).⁵³

As regards the prior pdf p₁(β, φ, σ₂) in (5.86), we shall first analyze the case in which β₀, β, φ, and σ₂ are assumed to be independently distributed. If our information about β₀ and σ₂ is vague, then the prior pdf can be taken as⁵⁴

(5.87) p₁(β, φ, σ₂) ∝ g₁(β) g₂(φ)/σ₂,

with g₁(β) and g₂(φ) still unspecified prior pdf's for β and φ, respectively. On substituting from (5.87) into (5.86) and integrating with respect to β₀, the resulting posterior pdf is

(5.88) p(β, φ, σ₂|y₁, y₂) ∝ [g₁(β) g₂(φ) φ^{(n−1)/2}/σ₂^{2n−2}(φ+β²)^{1/2}] exp{−[φ/2σ₂²(φ+β²)] Σᵢ[y₂ᵢ−ȳ₂−β(y₁ᵢ−ȳ₁)]²} e^{−δ/2} Σ_{α=0}^∞ (δ/2)^α Γ(α+½)/{α! Γ[α+(n−1)/2]},

⁵² The appendix to this chapter describes this integration in detail.
⁵³ In the following expressions ȳⱼ = Σᵢ₌₁ⁿ yⱼᵢ/n, m₁₁ = Σᵢ₌₁ⁿ (y₁ᵢ−ȳ₁)²/n, m₂₂ = Σᵢ₌₁ⁿ (y₂ᵢ−ȳ₂)²/n, and m₁₂ = Σᵢ₌₁ⁿ (y₁ᵢ−ȳ₁)(y₂ᵢ−ȳ₂)/n. Alternatively, we can pursue the analysis with an informative prior pdf for σ₂ in the inverted gamma form.
⁵⁴ Lindley and El-Sayyad, op. cit., also assume that β, φ, and σ₂ are independently distributed a priori. In an analysis of the present model they provide valuable discussion and obtain approximations to the posterior pdf.

where δ₁ = δσ₂², which has been defined above in connection with (5.86). On integrating (5.88) termwise with respect to σ₂, 0 < σ₂ < ∞, the result is

(5.89) p(β, φ|y₁, y₂) ∝ g₁(β) g₂(φ) φ^{(n−1)/2}(φ+β²)^{−1/2} (1/A^{(2n−3)/2}) Σ_{α=0}^∞ d_α (δ₁/A)^α,

with A = φ Σᵢ[y₂ᵢ−ȳ₂−β(y₁ᵢ−ȳ₁)]²/(φ+β²) + δ₁ and

d_α = Γ(α+n−3/2) Γ(α+½)/{α! Γ[α+(n−1)/2]}.

By straightforward algebra it is the case that A = n(m₂₂ + φm₁₁), and thus (5.89) can be expressed as⁵⁶

(5.90) p(β, φ|y₁, y₂) ∝ g₁(β) g₂(φ) [φ^{(n−1)/2}/(φ+β²)^{1/2}(m₂₂+φm₁₁)^{n−3/2}] Σ_{α=0}^∞ d_α [(φ²m₁₁ + β²m₂₂ + 2φβm₁₂)/(m₂₂+φm₁₁)(φ+β²)]^α.

Given explicit forms for the prior pdf's g₁(β) and g₂(φ), bivariate numerical integration techniques can be employed to obtain the normalizing constant for (5.90), the marginal posterior pdf's for β and φ, and measures pertaining to the joint and marginal posterior pdf's; for example, we might employ a beta prior pdf for β and an inverted gamma or F pdf for φ. Since numerical methods are employed in the analysis of (5.90), the choice of prior pdf's for β and for φ is not very restricted.
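The following sketch evaluates (5.90), as reconstructed above, on a (β, φ) grid, truncating the infinite series at 501 terms (the truncation reported in footnote 57 below). It assumes the Table 5.1 moments, a uniform g₁(β), and the inverted gamma g₂(φ) with ν₀ = 8 and s₀² = 1/16; grid ranges are my own choices.

```python
# Sketch: evaluate the series posterior (5.90) on a grid (501 series terms).
import numpy as np
from scipy.special import gammaln

n, m11, m22, m12 = 20, 19.332, 17.945, 16.925
nu0, s0sq = 8, 1.0 / 16.0
alphas = np.arange(501)
log_d = gammaln(alphas + n - 1.5) + gammaln(alphas + 0.5) \
      - gammaln(alphas + 1) - gammaln(alphas + (n - 1) / 2)

def log_post(beta, phi):
    """Log of (5.90), up to an additive constant."""
    x = (phi**2 * m11 + beta**2 * m22 + 2 * phi * beta * m12) \
        / ((m22 + phi * m11) * (phi + beta**2))      # series argument, < 1
    log_series = np.logaddexp.reduce(log_d + alphas * np.log(x))
    log_g2 = -(nu0 + 1) * np.log(phi) - nu0 * s0sq / (2 * phi**2)
    return (log_g2 + (n - 1) / 2 * np.log(phi) - 0.5 * np.log(phi + beta**2)
            - (n - 1.5) * np.log(m22 + phi * m11) + log_series)

beta_grid = np.linspace(0.6, 1.6, 201)
phi_grid = np.linspace(0.02, 1.0, 201)
lp = np.array([[log_post(b, p) for p in phi_grid] for b in beta_grid])
post = np.exp(lp - lp.max())
marg_beta = post.sum(axis=1)
print("modal beta approx:", beta_grid[marg_beta.argmax()])
```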
The bivariate posterior pdf for β and φ in (5.90) was analyzed by using the generated data presented in Table 5.1. The prior pdf for β, g₁(β), was taken to be uniform over a large range, whereas the prior pdf for φ, g₂(φ), was taken in the inverted gamma form [see (5.80)], with prior parameter values ν₀ = 8 and s₀² = 1/16, the same values employed in the calculations underlying Figure 5.3. In Figure 5.4 the marginal posterior pdf for β is presented.⁵⁷ It is centered close to 1.0. Further, note that it is more spread out than the posterior pdf for β, given ξ = ξ̂, which is shown in Figure 5.3. The integration over the elements of ξ appears to be responsible for the greater spread of the posterior pdf in the present case.

[Figure 5.4 Marginal posterior pdf for β computed from (5.86) and generated data. See text for explanation of the prior information utilized.]

As an alternative to the prior assumptions in (5.87), wherein it is postulated that φ and σ₂ are a priori independent, it may be appropriate in some situations to assume that σ₁ and σ₂ are independently distributed a priori⁵⁸; for example, if we assume that our prior information about these two independent parameters can be represented by inverted gamma pdf's,

(5.91) p(σᵢ|νᵢ, sᵢ) ∝ (1/σᵢ^{νᵢ+1}) exp(−νᵢsᵢ²/2σᵢ²), 0 < σᵢ < ∞, i = 1, 2,

where (νᵢ, sᵢ), i = 1, 2, are prior parameters, then on transforming to φ = σ₂²/σ₁² and σ₂ we have

(5.92) g₃(φ, σ₂) ∝ φ^{(ν₁−2)/2} σ₂^{−(ν₁+ν₂+1)} exp{−(ν₂s₂² + ν₁s₁²φ)/2σ₂²}

⁵⁶ Note that δ₁/A = (φ²m₁₁ + β²m₂₂ + 2φβm₁₂)/(m₂₂+φm₁₁)(φ+β²), for given φ, has a maximum at β = β̂, where β̂ is the ML estimate; that is, for given φ, (d/dβ)(δ₁/A) = (m₂₂+φm₁₁)⁻¹[2(βm₂₂+φm₁₂)/(φ+β²) − 2β(φ²m₁₁+β²m₂₂+2φβm₁₂)/(φ+β²)²]. On setting this derivative equal to zero, the necessary condition for a maximum is β²m₁₂ + β(φm₁₁ − m₂₂) − φm₁₂ = 0, which is identical to (5.49).
⁵⁷ In these calculations the first 501 terms in the series shown in (5.86) were employed.
⁵⁸ This was assumed in Zellner and Sankar, op. cit., and in R. L. Wright, "A Bayesian Analysis of Linear Functional Relations," manuscript, University of Michigan, 1969. If, for example, different measuring instruments were employed to generate the y₁ᵢ's and the y₂ᵢ's, the assumption that σ₁ and σ₂ are independent may be appropriate. In the context of Friedman's permanent income hypothesis, if individuals who chose occupations with a high variance of transitory income also tend to have a high variance of transitory consumption, it is not appropriate to assume that σ₁ and σ₂ are independent a priori. Here it is probably better to assume that φ = σ₂²/σ₁² and σ₂ are independent, as above.

as the joint prior pdf. It is seen that the conditional prior pdf for σ₂, given φ, is in the inverted gamma form. Also, the marginal pdf for (s₁²/s₂²)φ is in the form of a Fisher-Snedecor F pdf with ν₁ and ν₂ degrees of freedom.⁵⁹ Then, if in place of (5.87) we use the prior pdf

(5.93) p₁(β, φ, σ₂) ∝ g₁(β) g₃(φ, σ₂),

with g₃(φ, σ₂) as shown in (5.92) and g₁(β) not yet specified, we can substitute from (5.93) into (5.86) and perform integrations with respect to β₀, −∞ < β₀ < ∞, and σ₂, 0 < σ₂ < ∞, in much the same way as shown above. The resulting posterior pdf for β and φ is

(5.94) p(β, φ|y₁, y₂) ∝ [g₁(β) φ^{(n+ν₁−3)/2}/(φ+β²)^{1/2}] [1/(a₀+a₁φ)^{(2n+ν₀−3)/2}] Σ_{α=0}^∞ d′_α B^α,

with ν₀ = ν₁ + ν₂,

d′_α = Γ[α + n + (ν₀−3)/2] Γ(α+½)/{α! Γ[α+(n−1)/2]},

and B = δ₁/n(a₀+a₁φ), where δ₁ and A have been defined in connection with (5.88) and (5.89). Thus we can write the joint posterior pdf for β and φ as follows:

(5.95) p(β, φ|y₁, y₂) ∝ [g₁(β) φ^{(n+ν₁−3)/2}/(φ+β²)^{1/2}(a₀+a₁φ)^{(2n+ν₀−3)/2}] Σ_{α=0}^∞ d′_α [(φ²m₁₁ + β²m₂₂ + 2φβm₁₂)/(φ+β²)(a₀+a₁φ)]^α,

where a₀ = (nm₂₂ + ν₂s₂²)/n and a₁ = (nm₁₁ + ν₁s₁²)/n.⁶⁰

As mentioned above, for given φ the quantity (φ²m₁₁ + β²m₂₂ + 2φβm₁₂)/(φ+β²) has a maximum at the ML estimate. Thus for given φ, (5.95) will have a conditional modal value for β close to the ML estimate. If, however, we integrate (5.95) with respect to φ, both the prior and sample information regarding φ will play a role in determining the location of the marginal posterior pdf for β.

⁵⁹ See the section in Appendix A dealing with the Fisher-Snedecor F pdf for a proof.
⁶⁰ In (5.95) the factors φ^{(n+ν₁−3)/2}/(a₀+a₁φ)^{(2n+ν₀−3)/2+α}, α = 0, 1, 2, ..., are in the form of F pdf's with ν₁+n−1 and ν₂+n−2+2α degrees of freedom. Note that the prior pdf for φ, obtainable from (5.92), is in the F form with ν₁ and ν₂ degrees of freedom.

5.5 BAYESIAN ANALYSIS OF THE STRUCTURAL FORM OF THE EVM
In the structural form of the EVM the vector ξ is regarded as random in formulating the likelihood function; that is, the pdf for ξ is introduced as part of the model and not as subjective prior information. The analysis shown in (5.63) to (5.64) is relevant for obtaining the marginal pdf for the observations y₁ and y₂, which serves as a basis for the likelihood function. If the elements of ξ are assumed to be normally and independently distributed, each with unknown mean μ and unknown variance τ², then with a prior pdf for these two parameters and other parameters of the model the formal analysis goes forward in exactly the same way as in the preceding section. Similarly, a conditional analysis based on the assumption that ξ = ξ̂ is possible in the present case, and it too will be identical in form to that presented in the preceding section. To repeat, the main difference in analyzing the functional and structural forms of the model lies in the interpretation given to the pdf for the elements of ξ.⁶¹

5.6 ALTERNATIVE ASSUMPTION ABOUT THE INCIDENTAL PARAMETERS⁶²

In this section we analyze the EVM under the assumption that the incidental parameters, the elements of ξ, can be represented by a linear combination of observable independent variables; that is, ξᵢ = π₁x₁ᵢ + π₂x₂ᵢ + ... + πₖxₖᵢ, i = 1, 2, ..., n, where the x's are given observed values of independent variables and the πⱼ's are unknown parameters.⁶³ The key importance of this assumption is that the number of parameters in the model is no longer dependent on the sample size n; that is, the n unknown ξᵢ's are replaced by k unknown π's, and thus complications associated with the incidental parameter problem, some of which have been analyzed above, are circumvented.
⁶¹ Similar considerations apply in general to comparisons of sampling theory random parameter models and to Bayesian analyses of corresponding fixed parameter models. See, for example, P. A. V. B. Swamy, Statistical Inference in Random Coefficient Regression Models, doctoral dissertation, University of Wisconsin, Madison, 1968, p. 75 ff., for further discussion of this point in the context of regression models.
⁶² The material in this section is based on A. Zellner, "Estimation of Regression Relationships Containing Unobservable Independent Variables," Int. Econ. Rev., 11, 441-454 (1970).
⁶³ This assumption has been utilized in many econometric studies which include J. Crockett, "Technical Note," in I. Friend and R. Jones (Eds.), Proc. Conf. Consumption and Saving, Vol. II. Philadelphia: University of Pennsylvania, 1960, 213-222, and M. H. Miller and F. Modigliani, "Some Estimates of the Cost of Capital to the Electric Utility Industry, 1954-1957," Am. Econ. Rev., 56, 333-391 (1966).

Our model then is
(5.96) y₁ = ξ + u₁,
(5.97) y₂ = βξ + u₂,
(5.98) ξ = Xπ₁,

or

(5.99) y₁ = Xπ₁ + u₁,
(5.100) y₂ = Xπ₁β + u₂,

where y₁ and y₂ are n × 1 vectors of observations, X is an n × k matrix, with rank k, of observations on k independent variables, π₁ is a k × 1 vector of parameters, β is a scalar parameter,⁶⁴ and u₁ and u₂ are each n × 1 vectors of error terms. We assume that Eu₁ = Eu₂ = 0, Eu₁u₁′ = σ₁²Iₙ, Eu₂u₂′ = σ₂²Iₙ, and Eu₁u₂′ = 0, an n × n matrix with zero elements; that is, we assume that the error terms have zero means and are homoscedastic and uncorrelated. Later on we shall require a normality assumption for the error terms.

Before turning to a Bayesian analysis of the model in (5.99) to (5.100), it would be instructive to consider sampling theory approaches which have been employed quite widely, particularly a so-called "instrumental variable" approach.⁶⁵ In this approach it is generally recognized that if the elements of the vector π₁ in (5.100) had known values, (5.100) would be in the form of a simple regression model, the analysis of which would be straightforward. Since π₁'s value is rarely known in practice, (5.99) is employed to obtain an estimate of π₁, π̂₁ = (X′X)⁻¹X′y₁, and then y₂ is regressed on ŷ₁ = Xπ̂₁ to obtain the following estimator for β:

(5.101) β̂(∞) = ŷ₁′y₂/ŷ₁′ŷ₁,

where ŷ₁ = Xπ̂₁ = X(X′X)⁻¹X′y₁, and the reason for denoting the estimator β̂(∞) will be made clear in what follows. This estimator, often referred to as a "two-stage least squares" estimator, is thus seen to be one formed conditional on Xπ₁ assumed equal to Xπ̂₁.

⁶⁴ In (5.100) we have suppressed the intercept term. Below we shall take up the analysis of the model with (5.100) expanded to include an intercept term and additional observable independent variables with unknown coefficients.
⁶⁵ See, for example, A. S. Goldberger, Econometric Theory. New York: Wiley, 1964, 284-287; and F. D. Carlson, E. Sobel, and G. S. Watson, "Linear Relationships between Variables Affected by Errors," Biometrics, 22, 252-267 (1966).
If, in (5.99) to (5.100), we define π₂ = π₁β, the system becomes

(5.102) y₁ = Xπ₂/β + u₁,
(5.103) y₂ = Xπ₂ + u₂.

Since the vector Xπ₂ has elements whose values are unknown, (5.103) can be employed to get an estimate, namely, Xπ̂₂, where π̂₂ = (X′X)⁻¹X′y₂, and this can be inserted in (5.102). Then an estimator for β is obtained by regressing y₁ on ŷ₂ = Xπ̂₂ and taking the reciprocal of the slope coefficient estimator; that is,

(5.104) β̂(0) = ŷ₂′ŷ₂/ŷ₂′y₁,

where ŷ₂ = Xπ̂₂. Some remarks regarding the estimators β̂(∞) and β̂(0) follow:

1. Both β̂(∞) and β̂(0) are consistent estimators of β.⁶⁶
2. In small samples the distributions of β̂(∞) and β̂(0) will be different⁶⁷; for example, under a normality assumption for the error terms, β̂(∞) can have a finite mean, whereas in general the mean and higher moments of β̂(0) do not exist.⁶⁸
3. When the mean of β̂(∞) exists, as pointed out in the literature, β̂(∞) is biased toward zero; that is, |Eβ̂(∞)| < |β|. Heuristically, the bias arises because substitution of Xπ̂₁ for Xπ₁ in (5.100) introduces measurement error in the independent variable for finite sample size.
4. In connection with (5.99) and (5.100), the estimator π̂₁ = (X′X)⁻¹X′y₁ uses just the n observations in y₁. Since y₂ contains sample information relating to π₁, it is the case that not all the sample information is being employed in the estimation of π₁ when we use π̂₁ as an estimator.
5. The fact that β̂(∞) and β̂(0) are different estimators means that in any practical case numerical results will depend on which one is used.

⁶⁶ Note β̂(∞) = π̂₁′X′Xπ̂₂/π̂₁′X′Xπ̂₁. Since plim π̂₁ = π₁ and plim π̂₂ = π₁β, plim β̂(∞) = β. Similarly, plim β̂(0) = plim (π̂₂′X′Xπ̂₂/π̂₂′X′Xπ̂₁) = β. It is assumed that lim_{n→∞} X′X/n = Σ̄, a matrix with finite elements.
⁶⁷ See D. H. Richardson, "The Exact Distribution of a Structural Coefficient Estimator," J. Am. Statistical Assn., 63, 1214-1226 (1968); T. Sawa, "The Exact Sampling Distribution of Ordinary Least Squares and Two-Stage Least Squares Estimators," J. Am. Statistical Assn., 64, 923-937 (1969); and the references cited in these works for analysis bearing on the distributions of β̂(∞) and β̂(0).
⁶⁸ From (5.104), for given π̂₂, β̂(0) is in the form of the reciprocal of a normal variable.
As an alternative to the approach leading to β̂(∞) and β̂(0) as estimators for β, consider the following least squares approach. Our sum of squares (SS) to be minimized with respect to π₁ and β is⁶⁹

(5.105) SS = (1/σ₁²)(y₁−Xπ₁)′(y₁−Xπ₁) + (1/σ₂²)(y₂−Xπ₁β)′(y₂−Xπ₁β)
= (1/σ₂²)[φ(y₁−Xπ₁)′(y₁−Xπ₁) + (y₂−Zπ₁)′(y₂−Zπ₁)],

where φ = σ₂²/σ₁² and Z = Xβ. If we complete the square on π₁ in (5.105), we are led to

(5.106) SS = (1/σ₂²)[φy₁′y₁ + y₂′y₂ − v′M⁻¹v + (π₁−M⁻¹v)′M(π₁−M⁻¹v)],

where M = Z′Z + φX′X and v = φX′y₁ + Z′y₂. From the form of (5.106) and the fact that M is positive definite, the conditional minimizing value for π₁ is given by

(5.107) π̃₁ = (Z′Z + φX′X)⁻¹(φX′y₁ + Z′y₂) = (φπ̂₁ + βπ̂₂)/(φ+β²),

where π̂ᵢ = (X′X)⁻¹X′yᵢ, i = 1, 2. Since π̂₂/β estimates π₁, we can write (5.107) as

(5.108) π̃₁ = [φ/(φ+β²)]π̂₁ + [β²/(φ+β²)](π̂₂/β).

Thus π̃₁ is a weighted average of two estimators for π₁, namely, π̂₁ and π̂₂/β. For given β, if φ → ∞, π̃₁ → π̂₁, the value used above to construct the estimator β̂(∞), whereas, if φ → 0, π̃₁ → π̂₂/β, and we use just the information in y₂ to get an estimate of π₁, as in forming the estimator β̂(0) above. In general, π̃₁ in (5.108) utilizes the information in both y₁ and y₂, 2n observations, and is thus a more precise estimator for π₁ than those based on just n observations.⁷⁰ On putting π₁ = π̃₁ = M⁻¹v in (5.106), we have

(5.109) SS₁ = (1/σ₂²)[s₂₂ + φs₁₁ + φ(ŷ₂−βŷ₁)′(ŷ₂−βŷ₁)/(φ+β²)],

⁶⁹ Note that if the elements of u₁ and u₂ are normally distributed the likelihood function is proportional to φ^{n/2}σ₂^{−2n}e^{−SS/2}. Thus the values of the parameters minimizing SS will maximize the likelihood function.
⁷⁰ Since Eπ̂₁ = π₁ and Eπ̂₂ = π₁β, Eπ̃₁ = π₁, given β and φ. Also, E(π̃₁−π₁)(π̃₁−π₁)′ = (1+β²/φ)⁻¹σ₁²(X′X)⁻¹, whereas E(π̂₁−π₁)(π̂₁−π₁)′ = σ₁²(X′X)⁻¹. Thus, when β²/φ is large, the elements of π̃₁ have much smaller variances than those of π̂₁ for given β and φ.
where ŷᵢ = Xπ̂ᵢ and sᵢᵢ = (yᵢ−Xπ̂ᵢ)′(yᵢ−Xπ̂ᵢ) for i = 1, 2. Differentiating (5.109) with respect to β and setting the derivative equal to zero, we obtain

(5.110) β²ŷ₁′ŷ₂ + β(φŷ₁′ŷ₁ − ŷ₂′ŷ₂) − φŷ₁′ŷ₂ = 0

as the necessary condition on β for SS to be minimized. It is to be noted that (5.110) is in exactly the same form as the necessary condition arising in the classical EVM, except that here sample moments are in terms of "computed" values, ŷ₁ and ŷ₂. Solving the quadratic in (5.110), taking the root associated with the minimum of SS [cf. (5.50)], we are led to

(5.111) β̃(φ) = {ŷ₂′ŷ₂ − φŷ₁′ŷ₁ + [(ŷ₂′ŷ₂ − φŷ₁′ŷ₁)² + 4φ(ŷ₁′ŷ₂)²]^{1/2}}/2ŷ₁′ŷ₂

as our estimator for β, given φ. Since we can estimate φ in the present problem, that is,⁷¹

(5.112) φ̂ = s₂₂/s₁₁,

an estimator for β, β̃(φ̂), can be obtained by setting φ = φ̂ in (5.111). Also, φ̂ and β̃(φ̂) can be substituted in (5.108) to give us a computable estimator for π₁.⁷²

From (5.109) we see that if φ/β² → ∞ the minimizing value for β → β̂(∞) = ŷ₂′ŷ₁/ŷ₁′ŷ₁. On the other hand, if φ/β² → 0, the minimizing value for β → β̂(0) = ŷ₂′ŷ₂/ŷ₁′ŷ₂. For any given sample it is not difficult to show that

(5.113) |β̂(0)| ≥ |β̃(φ)| ≥ |β̂(∞)|.

Also, as n → ∞, all three estimators converge to the true value β.⁷³

The system in (5.99) and (5.100) can be expanded to include an intercept term and other observable independent variables:

(5.114) y₁ = Xπ₁ + u₁,
(5.115) y₂ = Xπ₁β + Wθ + u₂,

where W is an n × k₁ matrix, with rank k₁, of observations on k₁ independent variables⁷⁴ and θ is a k₁ × 1 vector of unknown parameters. As above, we consider minimization of

⁷¹ With the error terms normally distributed, φ̂ is a maximum likelihood estimator for φ = σ₂²/σ₁².
⁷² The results of some sampling experiments relating to these estimators have been reported in P. R. Brown, Some Aspects of Valuation in the Railroad Industry, doctoral dissertation, University of Chicago, 1968.
⁷³ In small samples, if φ/β² is very large or very small, β̂(0) and β̂(∞) have distributions centered at values which are quite far apart.
⁷⁴ W could be a submatrix of X; that is, X = (X₁ ⋮ W), where X₁ is of size n × (k − k₁).
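The sketch below computes the three estimators discussed above, β̂(∞) of (5.101), β̂(0) of (5.104), and β̃(φ̂) from (5.111) with φ̂ of (5.112), on data simulated in the spirit of the illustration later in this section; the parameter values and seed are mine, and the printed values exhibit the ordering (5.113).

```python
# Sketch (assumed parameter values): the three estimators for the model
# (5.99)-(5.100) on simulated data.
import numpy as np

rng = np.random.default_rng(3)
n, beta = 30, 0.9
X = np.column_stack([np.ones(n), rng.normal(5.0, 3.0, n)])
pi1 = np.array([5.0, 0.5])
y1 = X @ pi1 + rng.normal(0, 15.0, n)             # sigma1^2 = 225
y2 = (X @ pi1) * beta + rng.normal(0, 15.0, n)    # sigma2^2 = 225

P = X @ np.linalg.solve(X.T @ X, X.T)             # projection on columns of X
y1h, y2h = P @ y1, P @ y2                         # "computed" values X pi_hat
s11, s22 = y1 @ y1 - y1h @ y1h, y2 @ y2 - y2h @ y2h
phi_hat = s22 / s11                               # (5.112)

b_inf = (y1h @ y2) / (y1h @ y1h)                  # (5.101)
b_0 = (y2h @ y2h) / (y2h @ y1)                    # (5.104)
a = y1h @ y2h                                     # y1hat'y2hat in (5.110)
diff = y2h @ y2h - phi_hat * (y1h @ y1h)
b_tilde = (diff + np.sqrt(diff**2 + 4 * phi_hat * a**2)) / (2 * a)   # (5.111)
print(f"beta_hat(inf)={b_inf:.3f}  beta_tilde(phi_hat)={b_tilde:.3f}  "
      f"beta_hat(0)={b_0:.3f}")                   # ordering as in (5.113)
```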

(5.116) SS = (1/σ₂²)[φ(y₁−Xπ₁)′(y₁−Xπ₁) + (y₂−Zπ₁−Wθ)′(y₂−Zπ₁−Wθ)],

with respect to π₁, θ, and β, where Z = Xβ. Differentiating with respect to θ, we obtain as the conditional minimizing value of θ

(5.117) θ̃ = (W′W)⁻¹W′(y₂−Zπ₁),

which, when substituted in (5.116), yields

(5.118) SS₁ = (1/σ₂²)[φ(y₁−Xπ₁)′(y₁−Xπ₁) + (y₂*−Z*π₁)′(y₂*−Z*π₁)],

where y₂* − Z*π₁ = [I − W(W′W)⁻¹W′](y₂−Zπ₁). Since (5.118) is in precisely the form of (5.105), the same steps can be performed to obtain minimizing values for π₁ and β. The resulting estimator for β will depend on φ, and, as above, we can use an estimate of φ to obtain a computable estimate for β. Estimates of β and φ, so obtained, can be used to obtain an estimate for π₁, which can be substituted in (5.117) to obtain an estimate of θ.

In the sampling theory results above we have seen that results depend critically on the size of φ = σ₂²/σ₁².⁷⁵ The least squares approach pursued leads to estimators which depend on φ. Fortunately, in the present model an estimate of φ can be obtained from the data which can be employed to get an approximation to the least squares estimator⁷⁶ for β. In the Bayesian approach it will be seen that the quantity β̃(φ̂) is close to the modal value of the conditional posterior pdf for β when we use a diffuse prior pdf and assume that φ = φ̂.

In the Bayesian analysis of the model in (5.99) and (5.100), in addition to other assumptions made about the error terms, we assume that they are normally distributed. Then the likelihood function is

(5.119) l(β, π₁, φ, σ₂|y₁, y₂) ∝ (φ^{n/2}/σ₂^{2n}) exp{−(1/2σ₂²)[φ(y₁−Xπ₁)′(y₁−Xπ₁) + (y₂−Zπ₁)′(y₂−Zπ₁)]},

where Z = Xβ. As prior pdf we employ

(5.120) p(β, π₁, φ, σ₂) ∝ p₁(φ)p₂(β)/σ₂, 0 < φ, σ₂ < ∞ and −∞ < π₁ᵢ < ∞, i = 1, ..., k,

where p₁(φ) and p₂(β) are still of unspecified form. In (5.120) we are assuming that β, the elements of π₁, φ, and log σ₂ are independently distributed, with the pdf's for the elements of π₁ and log σ₂ uniform.⁷⁷

⁷⁵ More accurately, on β²/φ.
⁷⁶ This is a case in which an optimal (least squares) estimator depends on a nuisance parameter, φ.
Then the posterior pdf for the parameters is given by

(5.121) p(β, π₁, φ, σ₂|y₁, y₂) ∝ [p₁(φ)p₂(β)φ^{n/2}/σ₂^{2n+1}] exp{−(1/2σ₂²)[φ(y₁−Xπ₁)′(y₁−Xπ₁) + (y₂−Zπ₁)′(y₂−Zπ₁)]}.

On completing the square on π₁ in the exponent, we have

(5.122) p(β, π₁, φ, σ₂|y₁, y₂) ∝ [p₁(φ)p₂(β)φ^{n/2}/σ₂^{2n+1}] exp{−(1/2σ₂²)[φy₁′y₁ + y₂′y₂ − v′M⁻¹v + (π₁−M⁻¹v)′M(π₁−M⁻¹v)]},

where v and M have been defined in connection with (5.106). Thus, given β, φ, and σ₂, the conditional posterior pdf for π₁ is normal with

(5.123) E(π₁|β, φ, σ₂, y₁, y₂) = M⁻¹v = (Z′Z + φX′X)⁻¹(φX′y₁ + Z′y₂) = (φπ̂₁ + βπ̂₂)/(φ+β²)

and

(5.124) Var(π₁|β, φ, σ₂, y₁, y₂) = M⁻¹σ₂²,

where π̂ᵢ = (X′X)⁻¹X′yᵢ, i = 1, 2. The expression in (5.123) is seen to be precisely in the form of (5.107), the conditional minimizing value for π₁ in the least squares sampling theory approach. On integrating (5.122) with respect to the elements of π₁, we obtain⁷⁸

(5.125) p(β, φ, σ₂|y₁, y₂) ∝ [p₁(φ)p₂(β)φ^{n/2}/σ₂^{2n−k+1}(β²+φ)^{k/2}] exp{−(1/2σ₂²)[φy₁′y₁ + y₂′y₂ − v′M⁻¹v]}.

⁷⁷ The analysis can be extended to incorporate an informative prior pdf for σ₂ in the inverted gamma form.
⁷⁸ Note |M/σ₂²|^{−1/2} = [σ₂²/(β²+φ)]^{k/2}|X′X|^{−1/2}.

Then, on integrating (5.125) with respect to σ₂, the result is

(5.126) p(β, φ|y₁, y₂) ∝ p₁(φ)p₂(β)φ^{n/2}(β²+φ)^{−k/2} / [φs₁₁ + s₂₂ + φ(ŷ₂−βŷ₁)′(ŷ₂−βŷ₁)/(φ+β²)]^{(2n−k)/2},

where ŷᵢ = Xπ̂ᵢ and sᵢᵢ = (yᵢ−Xπ̂ᵢ)′(yᵢ−Xπ̂ᵢ), i = 1, 2. This, then, is the joint posterior pdf for β and φ. The factor in brackets in the denominator is identical to the quantity in brackets in (5.109). Thus, given φ, β = β̃(φ), as shown in (5.111), will minimize the quantity in brackets in (5.126). Aside from prior factors, for given φ, the posterior pdf in (5.126) will have a modal value at approximately β = β̃(φ). Given explicit forms for the prior pdf's p₁(φ) and p₂(β),⁷⁹ bivariate numerical integration techniques can be employed to evaluate the normalizing constant of (5.126) and to compute joint and marginal posterior pdf's.⁸⁰

To illustrate results obtained by using (5.126) with a diffuse prior⁸¹ for φ and β, we generated data from the following model:

y₁ᵢ = π₁₀ + π₁₁xᵢ + u₁ᵢ,
y₂ᵢ = (π₁₀ + π₁₁xᵢ)β + u₂ᵢ,

with π₁₀ = 5.0, π₁₁ = 0.5, i = 1, 2, ..., 30, and β = 0.9. Further, the u₁ᵢ's and u₂ᵢ's were drawn independently from a normal distribution with zero mean and variance equal to 225; that is, σ₁² = σ₂² = 225. The xᵢ's were drawn independently from a normal distribution with mean and variance equal to 5 and 9, respectively.

Shown in Figure 5.5 are the contours of the joint posterior pdf for β and φ. Given that the error term variances are rather large, the joint posterior pdf is quite spread out. The mode is located at β = 0.88 and φ = 0.56.

⁷⁹ If we are diffuse on φ and σ₂ in the following way, p(φ, σ₂) ∝ 1/φσ₂, this implies p(σ₁, σ₂) ∝ 1/σ₁σ₂, and thus use of the latter diffuse prior will lead to the same posterior pdf as use of the former, given the same prior assumptions about β and π₁ in both cases. Of course, if information is available about φ and β, it can be introduced in (5.126) by choice of appropriate prior pdf's.
⁸⁰ Since the Bayesian analysis of (5.114) and (5.115) proceeds in much the same way as that shown above, we do not present it.
⁸¹ That is, p₁(φ)p₂(β) ∝ 1/φ, 0 < φ < ∞, and −∞ < β < ∞.

[Figure 5.5 Contours of the joint posterior pdf for β and φ. The modal value is 8.55 at β = 0.88 and φ = 0.56.]
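A sketch of the grid evaluation of (5.126) under the diffuse prior of footnote 81 follows; the draws are my own, so the computed mode will differ somewhat from the β = 0.88, φ = 0.56 reported for the book's sample.

```python
# Sketch: evaluate the joint posterior (5.126) with p1(phi)p2(beta) ∝ 1/phi
# on data simulated as in the text's illustration (my own seed).
import numpy as np

rng = np.random.default_rng(4)
n, k, beta_true = 30, 2, 0.9
x = rng.normal(5.0, 3.0, n)
X = np.column_stack([np.ones(n), x])
xi = 5.0 + 0.5 * x
y1 = xi + rng.normal(0, 15.0, n)
y2 = beta_true * xi + rng.normal(0, 15.0, n)

P = X @ np.linalg.solve(X.T @ X, X.T)
y1h, y2h = P @ y1, P @ y2
s11, s22 = y1 @ (y1 - y1h), y2 @ (y2 - y2h)

def log_post(b, p):
    bracket = p * s11 + s22 + p * np.sum((y2h - b * y1h) ** 2) / (p + b**2)
    return (-np.log(p) + (n / 2) * np.log(p) - (k / 2) * np.log(b**2 + p)
            - ((2 * n - k) / 2) * np.log(bracket))

bg = np.linspace(0.5, 1.4, 181)
pg = np.linspace(0.05, 2.0, 196)
lp = np.array([[log_post(b, p) for p in pg] for b in bg])
i, j = np.unravel_index(lp.argmax(), lp.shape)
print(f"joint posterior mode near beta = {bg[i]:.2f}, phi = {pg[j]:.2f}")
```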
The marginal posterior pdf's for φ and β are shown in Figure 5.6. The marginal posterior pdf for φ has a mean equal to 0.694 and a posterior standard deviation equal to 0.279, thus showing quite a bit of dispersion. On the other hand, the posterior pdf for β is rather sharp, with a mean equal to 0.898 and standard deviation equal to 0.0847. Also, the marginal posterior pdf for β appears to be symmetric.⁸²

In concluding this chapter it is worthwhile to point out that just random measurement errors have been considered. In practice it is often the case that both random and systematic measurement errors are present. General methods for treating both kinds of error would be of great value, but unfortunately they remain to be developed.⁸³

⁸² It is interesting to note that from (5.111) β̃(φ̂) = 0.894 for these generated data, a value close to the posterior mean for β. On the other hand, for these data β̂(0) = 1.033 and β̂(∞) = 0.832, values which depart somewhat from the posterior mean of β.
⁸³ See Harold Jeffreys, Theory of Probability (3rd ed.). Oxford: Clarendon, 1966, pp. 300-307, for an analysis of a special problem that involves both random and systematic measurement errors.
[Figure 5.6 Marginal posterior pdf's for φ and β.]

APPENDIX

In this appendix we take up the problem of integrating (5.85) with respect to the n elements of ξ.⁸⁴ The factors of (5.85) involving ξ are shown below:

(1) [(ξ−ξ̄ι)′(ξ−ξ̄ι)]^{−(n−2)/2} exp{−[(φ+β²)/2σ₂²](ξ−ξ̂)′(ξ−ξ̂)},

where ξ̂ has been defined in connection with (5.83). This expression must be integrated with respect to the elements of ξ over −∞ < ξᵢ < ∞, i = 1, 2, ..., n. First, make the change of variables

(2a) zᵢ = [(φ+β²)^{1/2}/σ₂](ξᵢ−ξ̂ᵢ), i = 1, 2, ..., n,

or

(2b) ξᵢ = [σ₂/(φ+β²)^{1/2}](zᵢ+qᵢ), where qᵢ = (φ+β²)^{1/2}ξ̂ᵢ/σ₂.

Then

∏ᵢ₌₁ⁿ dξᵢ = [σ₂²/(φ+β²)]^{n/2} ∏ᵢ₌₁ⁿ dzᵢ,

and (1) can be written in terms of z′ = (z₁, z₂, ..., zₙ) as

(3) [σ₂²/(φ+β²)][(z+q)′M(z+q)]^{−(n−2)/2} exp(−z′z/2),

where q′ = (q₁, q₂, ..., qₙ) and M = Iₙ − ιι′/n, with Iₙ an n × n unit matrix and ι an n × 1 column vector in which all elements are equal to one. On letting w = z + q, (3) becomes

(4) [σ₂²/(φ+β²)][w′Mw]^{−(n−2)/2} exp[−(w−q)′(w−q)/2].

We can view the integration of (4) with respect to the elements of w as the problem of finding the expectation of [Σᵢ₌₁ⁿ (wᵢ−w̄)²]^{−(n−2)/2}, with the wᵢ's having a normal pdf. To get rid of w̄ in the denominator, we use Helmert's transformation c = Bw, which is

(5) c₁ = (w₁ − w₂)/√2,
c₂ = (w₁ + w₂ − 2w₃)/√6,
c₃ = (w₁ + w₂ + w₃ − 3w₄)/√12,
⋮
c_{n−1} = [w₁ + w₂ + ... + w_{n−1} − (n−1)wₙ]/√(n(n−1)),
cₙ = (w₁ + w₂ + ... + wₙ)/√n.

⁸⁴ The approach employed below was presented in A. Zellner and U. Sankar, "On Errors in the Variables," manuscript, 1967, and is used and studied further in R. L. Wright, "A Bayesian Analysis of Linear Functional Relations," manuscript, University of Michigan, Ann Arbor, 1969.

It is known that the matrix B in the Helmert transformation is orthogonal. Thus the Jacobian of the transformation is one. Since the cᵢ's are linear combinations of the wᵢ's, they have a normal pdf with Ec = BEw = Bq. Then c − Ec = B(w − q), or w − q = B⁻¹(c − Ec), and

(6) (w−q)′(w−q) = (c−Ec)′(B⁻¹)′B⁻¹(c−Ec) = (c−Ec)′(c−Ec),

since B is orthogonal. Last, from properties of the Helmert transformation in (5) we have

(7) Σᵢ₌₁ⁿ (wᵢ−w̄)² = Σᵢ₌₁^{n−1} cᵢ².

Using (6) and (7), we can express (4) as

(8) (Σᵢ₌₁^{n−1} cᵢ²)^{−(n−2)/2} exp[−½ Σᵢ₌₁^{n−1} (cᵢ−c̄ᵢ)²] exp[−½(cₙ−c̄ₙ)²],

where c̄ᵢ = Ecᵢ, i = 1, 2, ..., n. On integrating (8) with respect to cₙ, −∞ to +∞, we get a numerical constant. The integration with respect to the remaining cᵢ's is viewed as obtaining the expectation of (Σᵢ₌₁^{n−1} cᵢ²)^{−(n−2)/2}, with the cᵢ's independent and normal, each with its own mean c̄ᵢ. Under these conditions v = Σᵢ₌₁^{n−1} cᵢ² has the following noncentral χ² pdf⁸⁵:

(9) p(v) = [e^{−(δ+v)/2}/2^{p/2}] Σ_{α=0}^∞ v^{p/2+α−1} δ^α / [2^{2α} α! Γ(p/2+α)], 0 < v < ∞,

with p = n − 1. In (9) δ is the noncentrality parameter, given by δ = Σᵢ₌₁^{n−1} c̄ᵢ² = Σᵢ₌₁ⁿ (qᵢ−q̄)², with q as defined in connection with (2b) above. Then, on multiplying (9) by v^{−(n−2)/2}, the integral of interest is proportional to

(10) ∫₀^∞ v^{−(n−2)/2} p(v) dv.

Since term-by-term integration is appropriate in the present instance and we have

(11) ∫₀^∞ v^{α−1/2} e^{−v/2} dv = 2^{α+1/2} Γ(α+½),

the integral in (10) is proportional to

(12a) e^{−δ/2} Σ_{α=0}^∞ (δ/2)^α (2α)! √π / {2^{2α}(α!)² Γ[α+(n−1)/2]},

or to⁸⁶

(12b) e^{−δ/2} Σ_{α=0}^∞ (δ/2)^α Γ(α+½) / {α! Γ[α+(n−1)/2]}.

⁸⁵ See, for example, T. W. Anderson, An Introduction to Multivariate Statistical Analysis. New York: Wiley, 1958, p. 113.
⁸⁶ In moving from (12a) to (12b) the following expressions were used: Γ(α+½) = (2α)!√π/[2^{2α}Γ(α+1)] and Γ(α+1) = α!.
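As a numerical check on the series (12b), note that it is proportional to E[v^{−(n−2)/2}] for v noncentral χ² with p = n−1 degrees of freedom and noncentrality δ. The sketch below, my own verification and not part of the text, compares direct numerical integration against the truncated series; parameter values are arbitrary choices.

```python
# Sketch: check (12b) against direct integration of the noncentral chi-square.
import numpy as np
from scipy.special import gammaln
from scipy.stats import ncx2
from scipy.integrate import quad

n, delta = 8, 3.7
s = (n - 2) / 2                                  # exponent in the expectation
f = lambda v: v**(-s) * ncx2.pdf(v, df=n - 1, nc=delta)
direct = quad(f, 0, 1)[0] + quad(f, 1, np.inf)[0]   # E[v^{-(n-2)/2}]

a = np.arange(200)
log_terms = a * np.log(delta / 2) + gammaln(a + 0.5) \
          - gammaln(a + 1) - gammaln(a + (n - 1) / 2)
series = 2.0**(-s) * np.exp(-delta / 2) * np.exp(log_terms).sum()
print(direct, series)                            # the two values should agree
```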
QUESTIONS AND PROBLEMS

1. Consider the n-mean problem described in Section 5.1 with the likelihood function given in (5.1). Suppose that in place of (5.3) we employ the following prior pdf for the n elements of ξ and σ:

p(ξ, σ) ∝ (1/σ) exp{−(1/2τ₀²)(ξ−μ₀ι)′(ξ−μ₀ι)}, 0 < σ < ∞, −∞ < ξᵢ < ∞, i = 1, 2, ..., n,

where τ₀ and μ₀ are assigned values by an investigator. (a) Provide an interpretation of the above prior pdf. (b) What is the mean of the posterior pdf for ξ, given σ? (c) Comment on properties of the marginal posterior pdf for σ.

2. Consider the analysis of the system in (5.17a, b), with the likelihood function shown in (5.18), using informative prior pdf's for σ₁ and σ₂ in the form of inverted gamma pdf's. (In this analysis express the posterior pdf in terms of ξ, σ₁, and φ, where φ = σ₂²/σ₁².)

3. For the system in (5.30) and (5.31) to be analyzed appropriately as a simple regression model, what assumption about parameter values is sufficient? If this assumption is justified, what is the mean of the posterior pdf for β, using a diffuse prior pdf for β₀, β, and σ₂?

4. In connection with the inequalities shown in (5.61), under what assumptions about the values of the parameters in (5.56) to (5.58) will the following estimators closely bracket the true value of β in large samples?

β̂₂₁ = Σᵢ₌₁ⁿ (y₁ᵢ−ȳ₁)(y₂ᵢ−ȳ₂)/Σᵢ₌₁ⁿ (y₁ᵢ−ȳ₁)² and β̂₁₂ = Σᵢ₌₁ⁿ (y₂ᵢ−ȳ₂)²/Σᵢ₌₁ⁿ (y₁ᵢ−ȳ₁)(y₂ᵢ−ȳ₂).

5. Establish that the root of (5.49), shown in (5.50), is the one that is associated with a maximum of the likelihood function.

6. If, in (5.27), ξᵢ represents the logarithm of permanent income for the ith individual, whereas in (5.28) ηᵢ represents the logarithm of permanent consumption, explain in detail the differences in assumptions about these quantities in the functional and structural forms of the EVM.

7. In developing (5.61) we assumed that u₁ᵢ and u₂ᵢ in (5.30) and (5.31) were uncorrelated. If Eu₁ᵢu₂ᵢ = σ₁₂ ≠ 0, i = 1, 2, ..., n, how are the relations in (5.56) to (5.58) and the inequalities in (5.61) altered?

8. Assume that we have measurements of the same quantity, say income per household, from two different sources; for example, for the ith region y₁ᵢ might be a survey estimate and y₂ᵢ an estimate derived from a second independent survey. Consider these data in connection with the EVM:

(i) y₁ᵢ = ξᵢ + u₁ᵢ,
(ii) y₂ᵢ = ηᵢ + u₂ᵢ,  i = 1, 2, ..., n,
(iii) ηᵢ = β₀ + βξᵢ.

(a) Interpret ξᵢ and ηᵢ as well as u₁ᵢ and u₂ᵢ. (b) If, in (iii), β₀ = 0 and β = 1, in what sense are the two sets of measurements consistent? (c) Given the assumptions made in connection with (5.30) and (5.31), explain how β₀ and β can be estimated, given that σ₁₂ has a known value, say σ₁₂ = 0. (d) Discuss possible sources of prior information about the parameters and indicate how it can be used in estimation.

9. In employing a prior pdf such as that shown in (5.67) in analyzing the EVM, will the influence of the prior pdf be negligible as the sample size increases? Similarly, does the assumption that the ξᵢ's are normally and independently distributed affect Bayesian and maximum likelihood estimation results in just small sample situations?

10. Use the information in Table 5.1 to obtain estimates of the bounds on the slope coefficient shown in (5.61).

11. Consider the system (5.27) to (5.30) as representing Friedman's permanent income model with the logarithms of measured income and consumption and of permanent income and consumption given by y₁ᵢ, y₂ᵢ, ξᵢ, and ηᵢ, respectively. Interpret this model within the context of the functional and structural forms of the EVM.

12. In Problem 11 assume the structural form of the EVM and set forth sufficient conditions for all parameters to be identified. How does the assumption Eu₁ᵢu₂ᵢ = σ₁₂ ≠ 0 affect your analysis?
13. Under the identifying assumptions of Problem 12, derive maximum likelihood estimates for the parameters of the permanent income model.

14. Provide a prior pdf for the parameters of the permanent income model, discussed in Problems 11 and 12 above, and indicate how marginal posterior pdf's for the parameters can be computed.

15. Suppose that we have measurable proxies for permanent consumption and permanent income, denoted C̄ᵢ and Ȳᵢ, respectively, and wish to investigate the proportionality hypothesis; that is, the hypothesis that permanent consumption is proportional to permanent income. Is C̄ᵢ = β₀ + β₁Ȳᵢ + εᵢ, with β₀ ≠ 0 and εᵢ an error term, an economically reasonable alternative model, particularly over the range of low values for Ȳᵢ? As another alternative to the proportionality hypothesis, consider

log(C̄ᵢ/Ȳᵢ) = α₀ + α₁ log Ȳᵢ + vᵢ,

where vᵢ is an error term. What does this last equation imply about C/Y, the average propensity to consume, as Ȳ → ∞ with α₁ < 0?

16. Shown below and on the next page are per capita data, expressed in U.S. 1955 dollars, for Ȳᵢ and S̄ᵢ = Ȳᵢ − C̄ᵢ, proxies for permanent income and permanent savings, relating to 26 countries.⁸⁷ Use these data, along with diffuse prior assumptions, to analyze the two relations put forward in Problem 15 as alternatives to the proportionality hypothesis within a regression framework. In particular, what is the posterior probability that α₁ < 0?

Country           Ȳᵢ       S̄ᵢ      C̄ᵢ = Ȳᵢ − S̄ᵢ
United States    1659.1   123.2   1535.9
Canada           1208.1    84.0   1124.1
New Zealand       928.0    81.2    846.8
Australia         905.0    96.6    808.4
Belgium           877.6    95.9    781.7
France            835.7    47.7    788.0
Luxembourg        801.2   107.3    693.9
Sweden            765.0    58.5    706.5
United Kingdom    737.6    30.7    706.9
Denmark           723.2    65.7    657.5
Netherlands       476.4    45.7    430.7
Ireland           416.5    29.5    387.0
Austria           411.9    41.2    370.7

⁸⁷ These data appear in H. S. Houthakker, "On Some Determinants of Saving in Developed and Under-Developed Countries," Chapter 10, pp. 212-224, p. 212, of E. A. G. Robinson (Ed.), Problems in Economic Development. New York: St. Martin's, 1965. Data for two countries, Panama and Peru, for which S̄ is negative, have been omitted. See Houthakker's paper for the method and weights employed in computing the figures presented in the table.

Country           Ȳᵢ       S̄ᵢ      C̄ᵢ = Ȳᵢ − S̄ᵢ
Malta             316.8    64.8    252.0
Costa Rica        257.8    13.7    244.1
Jamaica           235.8     8.0    227.8
Spain             234.8    10.7    224.1
Japan             199.2    28.8    170.4
Colombia          198.7     8.6    190.1
Ghana             197.7     9.6    188.1
Mauritius         197.0    18.0    179.0
Honduras          164.0    11.4    152.6
Ecuador           134.5     5.2    129.3
Brazil            127.7     5.6    122.1
Rhodesia          115.5     8.8    106.7
Belgian Congo      58.3     2.4     55.9

17. In the relation log[(Cₜ/Yₜ)/(1 − Cₜ/Yₜ)] = α₀ + α₁ log Yₜ + vₜ assume that Cₜ and Yₜ have common measurement errors, perhaps due to common errors in weighting or in the use of exchange rates in the conversion to U.S. dollars; that is, Cₜ = cCₜ*e^{uₜ} and Yₜ = cYₜ*e^{uₜ}, where Cₜ* and Yₜ* are true values of the variables, c is a constant, and uₜ is a random error term with zero mean and variance σᵤ². How does the presence of such measurement errors affect the results obtained in the calculations in Problem 16?

18. Assuming the measurement error structure for Cₜ and Yₜ set forth in Problem 17, compute maximum likelihood estimates for the parameter α₁ in the relation log[Cₜ*/(Yₜ*−Cₜ*)] = α₀ + α₁ log Yₜ* + vₜ, with log Yₜ = log c + log Yₜ* + uₜ, using the data in Problem 16 and assuming various values for λ = σᵥ²/σᵤ².

19. Assuming that λ = σᵥ²/σᵤ² has a given value, perform a Bayesian analysis of the model in Problem 18. Determine how properties of the conditional posterior pdf for α₁, given λ, depend on the value assigned to λ.

20. Explain how Problem 18 can be analyzed from the Bayesian point of view with a prior pdf for λ = σᵥ²/σᵤ² and other parameters. Use the data in Problem 16 to compute posterior pdf's.

21. Consider the system, analogous to (5.99) and (5.100),

y₁ₜ = xₜ′π₁ + u₁ₜ,
y₂ₜ = xₜ′π₁β + u₂ₜ,

but assume that the k × 1 vector xₜ is stochastic, independent of the u's, and with zero mean and k × k pds covariance matrix Exₜxₜ′ = Σ̄. By examining the second moments of y₁ₜ and y₂ₜ, establish that inequality constraints on β, similar to those shown in (5.61), can be derived. Does the pdf for xₜ, t = 1, 2, ..., T, contain information relating to these bounds, hence to β?

22. Prove the result shown in (5.113).

23. In the forecasting area much attention is given to the comparison of forecasts (Fₜ) with actual measured outcomes (Aₜ). Since both Fₜ and Aₜ contain errors, consider the following model:

(i) Fₜ = θₜ + uₜ,
(ii) Aₜ = ηₜ + vₜ,  t = 1, 2, ..., n,
(iii) ηₜ = β₀ + β₁θₜ,

with EFₜ = θₜ and EAₜ = ηₜ. If in (iii) β₀ = 0 and β₁ = 1, in what sense can it be said that forecasts are unbiased?

24. In connection with Problem 23, provide examples wherein it would be reasonable to assume that the uₜ's are probably independently distributed and examples wherein the uₜ's are probably not independently distributed.

25. Consider the analysis of the system in Problem 23 under assumptions of the EVM considered in this chapter.

26. Appraise the following statement: although a regression of Aₜ on Fₜ yields inconsistent estimates of β₀ and β₁ in (iii) of Problem 23, such a regression may be valuable in providing systematic corrections to forecasts.

CHAPTER VI

Analysis of Single Equation Nonlinear Models

In some circumstances economic and/or statistical considerations lead us to the problem of analyzing models that are nonlinear in the parameters. In Section 4.1 we encountered a nonlinear relation in the analysis of the regression model with autocorrelated error terms, specifically (4.1c) and (4.12c). In this chapter we analyze other nonlinear models; for example, the "constant elasticity of substitution" (CES) production function and "generalized production function" (GPF) models. Here nonlinearities develop mainly because of economic considerations; that is, the CES function is a generalization of the Cobb-Douglas (CD) function in the sense that it permits the elasticity of substitution parameter to assume values other than one, its value for the CD function. Similarly, GPF's permit generalization in this respect and also with respect to the behavior of the returns to scale parameter. The economic and statistical importance of taking account of appropriate functional forms of relationships cannot be emphasized too strongly. Use of incorrect functional forms can often lead to serious errors. Thus special attention is given in this chapter to the analysis of several nonlinear forms that appear to be useful in a number of applications.

6.1 THE BOX-COX ANALYSIS OF TRANSFORMATIONS

In the Box-Cox¹ analysis of transformations we encounter relationships nonlinear in one or more parameters; for example, among other transformations, Box and Cox consider the following one for the dependent variable y in a regression model with yα > 0, α = 1, 2, ..., n:

(6.1) (yα^λ − 1)/λ = β₁x₁α + β₂x₂α + ... + βₖxₖα + uα, α = 1, 2, ..., n,

where the β's and λ are unknown parameters, the x's are observations on k independent variables, and uα is the αth error term. They assume that for some unknown value of λ the transformed observations (yα^λ − 1)/λ, α = 1, 2, ..., n, satisfy the standard assumptions of the normal multiple regression model; that is, they are normally² and independently distributed with common (constant) variance σ². Thus, by assumption, for some value of λ, a transformation on the dependent variable is assumed (a) to induce normality, (b) to stabilize the variance, and (c) to induce simplicity of structure in the sense that E(yα^λ − 1)/λ = β₁x₁α + β₂x₂α + ... + βₖxₖα is a simple function of the β's and x's.

Note that the particular power transformation in (6.1) has the following properties. For λ = 1, (yα^λ − 1)/λ = yα − 1 and the model in (6.1) is linear in the yα. For λ = 0, (yα^λ − 1)/λ = log yα and thus the model in (6.1) has log y as the dependent variable. For other values of λ, powers of y appear as the dependent variable. Since λ is an unknown parameter, it will have to be estimated along with the other unknown parameters, the β's and σ, and thus information in the data is used to determine the appropriate transformation for the dependent variable. Below we show how this estimation problem can be solved by using the maximum likelihood (ML) method³ and the Bayesian approach.

¹ G. E. P. Box and D. R. Cox, "An Analysis of Transformations," J. Roy. Statist. Soc., Series B, 26, 211-243 (1964).
ym = X[5 + u, where y{ is an n x 1 vector with typical element (y.X - 1)/,,
X is an n x k matrix, with rank k, of given observations on k independent
variables, and u is an n x 1 vector of error terms, assumed to be normally
and independently distributed, each with zero mean and common variance '.
To write the joint pdf for the y's, given the parameters A, [5, and tr, and X,
we need the Jacobian of the transformation from the n u's to the n y's. Since
u/Oy = y- and = YI=x IOudOyl, we have (6.3) where p = (I-I--x Y)X/, the
geometric mean of the y's. Note that J > 0, since we have assumed y > 0, =
1, 2,..., n. Then we have for the joint 2 Note, as pointed out by J. B.
Ramsey, that the range of (ya; - 1)/)t will not be- oo to oo but just a
subinterval of this range. If the probability that (ya; - 1)/)t will fall in the
excluded interval is small, the normality assumption will not be vitally
affected. a Note from Chapter 2 that in large samples maximum likelihood
estimates are approxi- mate means of the jdint posterior pdf for the
parameters, a pdf that will usually be approximately normal. If a
transformation other than the one shown in (6.1) is employed, the typical
element of yO) will, of course, be different from that employed here. See
Box and Cox, loc. cit., for examples of other transformations.

164 ANALYSIS OF SINGLE EQUATION NONLINEAR MODELS pdf


for the elements of y J [ l (y('-X[3)'(ym-X[)], (6.4) the expression in (6.4)
viewed as a function of the parameters is the likelihood function, l(2`, 13,
�'IY). Letting L = log l, we have n 1 (6.5) L=const+logJ-logoa .... 2or a On
differentiating with respect to 13 and oo. and setting these derivatives equal
to zero, we obtain (2`) = ( X' X)- X'y, ' (6.6) and 1 (ym_ X)'(ym_ X). (6.7)
0u(2`) = , If 2` were known, (6.6) and-(6.7) could be computed and would
be ML estimates. Since 2` is assumed unknown, .we substitute from (6.6)
and (6.7) in (6.5) to obtain the maximized log likelihood function, denoted
L=ax(2`), which is given by Lmax(2`) = const + log J- n : log a(2`) (6.8) =
const log y n -: log a'(2`). We now evaluate (and plot) Lx(2`) for various
values of 2` until we find the value, say i, for which (6.8) attains its
maximal value. This is the ML estimate for 2`. Then (6.6) and (6.7),
evaluated for 2` = i are ML estimates for I$ and ', respectively. Further Box
and Cox note that approximate large-sample confidence intervals can be
constructed by using the result that in large samples 2[L=()- L=x(2`)] is
approximately distributed as x' with one degree of freedom, a fact that
follows from general results regarding the large-sample distribution of log
likelihood ratios2 In addition to applications reported in the Box-Cox paper,
the ML approach described above has been applied in an analysis of the
demand for money function. e This analysis posits that the demand for
money function can be written as * See, for example, M. G. Kendall and A.
Stuart, The Advanced Theory of Statistics, Vol. II. New York: Hafner, 1961,
pp. 230-231. P. Zarembka, "Functional Form in the Demand for Money,"
Social Systems Research Institute Workshop Paper, University of
Wisconsin, Madison, 1966, J. Am. Statist. Assocn., 63, 502-511 (1968).
THE BOX-COX ANALYSIS OF TRANSFORMATIONS 165 (6.9) M'
71=/x+ /u(Ya--1) + /ga(.rax +u, with the subscript denoting the value of a
variable in the ath year: M -- price-deflated currency, demand and time
deposits, Y, = price-deflated measured income, r, = commercial paper
interest rate, u -- disturbance term. The data are annual observations for the
U.S. economy, 1869 to 1963. In (6.9) a power transformation involving the
parameter 2` is applied not only to the dependent variable M but also to the
variables Y and r which are given independent variables. 7 If 2` = 1, the
relation in (6.9) is linear in the variables. If 2, - 0, it is linear in the
logarithms of the variables. As above, the data are employed to estimate 2`
along with the fl's and c, ', the common variance of the u,'s. The u,'s are
assumed to be normally and independently distributed, each with mean zero
and variance oF. For notational convenience we relabel the variables as
follows: (6.10) y(a = Ma - 1, x(a) Ya - 1 ..(a) r x - 1 h o.,:, = h ' a = h '
Further ym, x), and x > denote n x 1 column vectors with typical elements,
as shown in (6.10), and X m (t, x(, , .m is an n x 3 matrix with t an n x 1
column of ones. With this notation introduced, the likelihood function can
be expressed as -- [ 1 (Y()- X()I)'(Ym- Xml)], (6.11) l(la, A, .Jy) m J exp -
here ' = (x, a, a) and J = = - = x m- x. As above, we maximize L = log l with
respect to , h and ,a in a two-step fashion. First, for given A, the
maximizing values for and e given by (6.12) (A) = ( X m' X ()) - x X()'y(),
(6.13) 1 x(")ta(2`)]. = - On substituting these values into log L, we obtain
(6.14) log Lx -- - log (2`) + (2` - 1) log y, ? No attempt has been made to
cope with possible "simultaneous equation" problems in this analysis. Also,
see the Box-Cox paper for a discussion of transformations for the dependent
and independent variables.

166 ANALYSIS OF SINGLE EQUATION NONLINEAR MODELS which


is the maximized log-likelihood function except for a constant. Plots of log
Lmax against for analyses based on data for the over-all period 1869 to
1963 and for the period 1915 to 1963, using two definitions of money,
namely price-deflated currency, demand and time deposits (C + D + T) and
price-deflated currency and demand deposits (C + D), are shown in Figure
6.1. For the period 1869 to 1963 the point estimate for is = 0.19. An -8O
-9O -11o -120 -130 -140 -- -150 -0.90 Figure 6.1 I+ D + T, 1869--1963
-0.45 0 +0.45 0.90 1.35 Values of the log likelihood, given approximate 95o
confidence interval for , which was obtained from Lmax(}) -Lm(h) <
�Xxa(0.05) = 1.92, is 0.19 + 0.10. This and the other results in Figure 6.1
indicate that a "log-log" version of the demand for money, formulated in
terms of the variables mentioned above, may be THE BOX-COX
ANALYSIS OF TRANSFORMATIONS 167 approximately in agreement
with the information in the data. When h = is substituted in (6.12), this
yields 8 /x() = - 1.0551; /() = 1.1124; /a() = -0.0974 (0.2387) (0.0 ! 63)
(0.0160) as the ML estimates for/, f12, and/a; the numbers in parentheses
are large- sample standard errors. Similarly, a ML estimate for ,' can be
obtained by substituting } in (6.13). Analyses within the above framework
which employ an expected income variable in the money-demand function
and which take account of a money-adjustment process are reported in
Zarembka's paper. In addition, in handling the autocorrelation problem, if it
is assumed that the u's are generated by a first-order autoregression process,
u = pu_ + , in which the %'s are assumed to be normally and independently
dis- tributed, each with zero mean and common variance ,', then combining
this assumption with (6.9) yields (6.15) For given values of h nonlinear
least squares techniques can be employed to obtain conditional estimates of
the/'s, p, and ,,, which can be used, along with the associated values of h, to
evaluate the logarithm of the likelihood function to find the values
associated with a maximum as described above. Having considered the ML
analysis of the model in (6.1), which, as already mentioned, can be viewed
as an approximate large-sample Bayesian analysis, we now take up the
Box-Cox Bayesian analysis of the model. They proceed as follows to set
forth a diffuse prior pdf for the parameters of the model, the k elements of
15, c,, and h. Let (6.16) p(15, ,, A) = Px(15, o'lh) be the joint prior pdf, with
px(15, tr[ ), the conditional prior pdf for 15 and given , and p(h), the
marginal prior pdf for ,. They take p.(h) uniform; that is p2(,) oc const. With
respect to p(15, tr[h), they remark that if this condi- tional pdf were
assumed to be independent of " nonsensical results would be obtained."
This is the case, since the general size and range of the transformed
observations ym may depend strongly on ,. Recognizing this, Box and Cox
write the diffuse conditional pdf for 15 and log tr as follows: (6.17) g() d15
d(log a), where the subscript is introduced to emphasize that this is
conditional on 8 The/'s are not invariant with respect to changes in
measurement units, whereas ,, t statistics and elasticities, being pure
numbers, are.

168 ANALYSIS OF SINGLE EQUATION NONLINEAR MODELS given


A and where g(A) shows the dependence of pz(l, rr[A) on A. Using an
approximate consistency argument, 9 Box and Cox take g(A) = J-/, where
Jis the JacobJan shown in (6.3). Thus the final expression for the diffuse
prior pdf which Box and Cox employ is 1 (6.18) p(l, *, A) oc ,j/. On
combining (6.18) with the likelihood function in (6.4), we obtain the
following posterior pdf for the parameters J-'/ [ 1 (ym- Xl)'(ym- XI)]. (6.19)
p([, r, A[y) oc ,+ x..exp -- Note that we can write (6.20) (y(> - Xf)'(y <' -
Xf) = vs'(A) + [f - (A)I'X'X[f - $(A)], wherev=n-k, (6.21) and (6.22) = (x,
x)- - x$(,x)]. On substituting from (6.20) in (6.19), we have (6.23) p(l, To
obtain the marginal posterior pdf for A from (6.23) integrate with respect to
to obtain (6.24) p(l, Aly) oc JW{vs2(A) + [l - (A)]'X'X[[t - (A)]}-m. a Their
argument proceeds as follows. Take an arbitrary reference value for A, say
A and assume provisionally, for fixed A, that the relation between ya> and
y}XO is effectively linear over the range of the observations; that is, (i) y?)
= const + l,y'O. Then choose g(A) in (6.17) so that when the linear relation
between ya) and yXO holds the conditional prior pdf's for [3 and are
consistent with one another for different values of A. From (i), above, we
have (ii) log ax �' = const + log ,xo., and thus, to this order, the prior pdf
for ;o. is independent of A. Further, from (i) we have the fl's in y(;) linearly
related to those in yO,) and dla/d], = Ix. Since there are k/'s, we take g(A)
proportional to 1/lff. Last, to determine l n we note that in passing from A:
to A a small element of volume in the n-dimensional sample space is
multiplied by J(A)/J(A), where J is the Jacobian quantity in (6.3). An
average scale change for a single element of y is the nth root of this ratio.
Since A is just an arbitrary reference value, we have approximately la =
[J(A)] TM. Thus g(A) a: lx - = [J(A)] -m is the final expression for g(A)
which Box and Cox tentatively employ. CONSTANT ELASTICITY OF
SUBSTITUTION (CES) PRODUCTION FUNCTION Integrating with
respect to l yields (6.25) p(Aly ) 169 which is the marginal posterior pdf for
A. This pdf can be analyzed numeric- ally. Note, too, that (6.26) logp(Aly ):
const +v [log J - n log s'(h)] [ ] = const + (h - 1) log y - log s:(h) , and thus,
on comparison with (6.8), it is seen that the modal value of the posterior pdf
is identical to the ML estimate. To obtain the marginal pdf for an element of
1, say 1, (6.24) can be integrated with respect to 1o., ]a,...,/ by using
properties of the multi- variate Student t pdf. Then the result is a bivariate
posterior pdf for A and 1 which can be analyzed numerically. Last, note
from (6.24) that the con- ditional pdf for 1, given A, is in the form of a
multivariate Student t distri- bution. This fact can be employed to study
how sensitive inferences about the /'s are to what is assumed about A. 6.2
CONSTANT ELASTICITY OF SUBSTITUTION (CES) PRODUCTION
FUNCTION In a path-breaking paper � Arrow, Chenery, Minhas, and
Solow analyzed a class of production functions with a constant elasticity of
substitution parameter which we shall denote by $, 0 < $ < v. They show
that in a tWo-input function, when g = 1, the CES production function
becomes the Cobb-Douglas (CD) function, when g = 0, the Leontief fixed
proportion model, and, when $ - o% a production function with perfect
substituta- bility. Although the parameter g has most often been estimated
by using a necessary condition for profit maximization, here we take up the
direct estimation of the nonlinear function with two inputs, as presented by
Thornber, and then go on to consider an alternative approach which can
accommodate more than two inputs and is closely related to the Box-Cox
analysis of transformations considered above. 0 K. Arrow, H. Chenery, B.
Minhas, and R. Solow, "Capital-labor Substitution and Economic
Efficiency," Rev. Econ. Statist., XLIII, 225-250 (1961). n H. Thornher, "The
Elasticity of Substitution: Properties of Alternative Estimators," manuscript,
University of Chicago, 1966. See also V. K. Chetty and U. Sankar,
"Bayesian Estimation of the CES Production Function," Rev. Econ. Studies,
36 (1969).

170 ANALYSIS OF SINGLE EQUATION NONLINEAR MODELS In the


CES function the eth observation on output y is related to the capital and
labor inputs K and L as follows' (6.27) y = y[$K -o + (1 - $)L-q-W'e% tz =
1, 2,..., n, or (6.28) logy = log), + vlog{[$K-" + (1 - $)L-"] -x/�) + u,,
where y, $,' v, and p = -1 + 1/$ are parameters that satisfy 0 < y < , 0 < $ <
1,--oo < v < 0% and -1 < p < oo. Further, u is the (zth dis- turbance term.
We assume that the uds are normally and independently distributed, each
with mean zero and common variance '. Finally, we assume that either the K
and L are nonstochastic or, if stochastic, are distributed independently of the
u with a distribution not involving the parameters y, p, $, v, and t,. Under
the above assumptions the likelihood function is 1 ( 1 [fi'fi+ (0-)'X'X(O-
0)]}, (6.29) l(% v, 8, , [y, K, I.) oc -- exp -- where. 0= log {[SKx- log
([SKr,-" + (1 - 8)L,,-"]-''�). log 1 Yx , Y= : , Llo 3' and a'a = (y- x0)'(y-
Note that the n x 2 matrix X, 0, and fi'fi are functions of the parameters and
p. To generate ML estimates we take the logarithm of (6.29), denoted by L
= log l, and maximize with respect to and O, which leads to (6.30) . = 1 fi,fi
and 0 as the maximizing values. On evaluating L for these values we obtain
Lmax given by (6.31) L = constant - log (fi'fi), where fi'fi, shown above, is a
function of p and 8. By searching over a grid of CONSTANT
ELASTICITY OF SUBSTITUTION (CES) PRODUCTION FUNCTION
171 values for p and 8 x' we can find those values of and 8 that minimize
fi'fi, if they exist, and they will be ML estimates. Then the quantities in
(6.30) can be evaluated for the minimizing values of and $ to provide ML
estimates for c, and 0. Thornber points out that the mean and variance of the
ML estimator for o , the dasticity of substitution parameter, do not exist in
finite samples. However, the mean and variance of its asymptotic normal
dis- tribution, given y, v, 8, and , do exist. The mean is o , whereas the
variance of this asymptotic conditional distribution is Var(o) = (E[O l�g
l($[Y, v, 8, �t, Y)]' -x (6.32) with (6.33) = v25(/) ' l/p)[SK�, -a + (1 -
8)L,,, -a] log [SK,,, -a + (1 - 8)L, -a] ] 2 + [SK-" log K + (1 - 8)L -� log
L],, 8K-O + (1 - 8)L-O where = (1 - o)/d '. This result can be employed to
compute an approxi- mate large-sample standard error. In addition to these
large-sample results, Thornber reports the results of some Monte Carlo
experiments designed to provide estimates of the risk function associated
with alternative estimators for $, including the maximum likelihood
estimator, a linearized maximum likelihood estimator, and two estimators
generated by the criterion that they minimize expected loss, with the
expectation being taken by using the posterior pdf for the parameters. The
loss function he employed in this work is (6.34) L(o + + a loss function that
yields greater relative loss for an underestimate than for an overestimate.
Thornber used the following two prior pdf's in his experiments' First Prior:
px(y, v, c,, 8, o ) c p2(y, v, ,, 8, l) oc Second Prior: (1 4- d')Ue -e (1 + $)e -e
,, x2 Actually reparameterizing in terms of A = 1/(1 + fir) = (1 + p)/(2 + p)
or p = (2A- 1)/(1 - A) is convenient, since 0 _< A < 1, and thus the search
over and A is confined to the unit square, a point made by Thornbet.

172 ANALYSIS OF SINGLE EQUATION NONLINEAR MODELS with 0


< , y, v < o% 0 < $ < 1, and g > 0. The normalized marginal prior pdf's for o
are fx(o ) = (1 + $)2e- and f.(') = ]-o(1 + o)ae-. The first of these, fx(o), has
its mode at $ = 1, whereas the mode associated withfa(d') is at about $ =
2.12. Shown in Figures 6.2 and 6.3 are the results of Thornber's
experiments for - Maximum likelihood o I / � Linearized ML I / . o
MELO-1 0.1 0.25 0.45 0.7 0.95 1.05 1.4 1.8 2.3 2.8 Figure 6.2 Risk
functions for n = I0. Estimated (- g)2/[O + )' ( + risk functions under L(', )=
0.2 0.1 0.1 0,25 0.45 0.7 0,95 1.05 1.4 1.8 2.3 2.8 Figure 6.3 Risk functions
for n = 20. two sample sizes, 10 and 20. The points labeled MELO-1 and
MELO-2 are Bayesian minimum expected loss estimators using the first
prior pdf and the second, respectively. It is seen that use of the minimum
expected loss pro- cedure to generate estimators that incorporate the prior
information has resulted in substantial reduction in risk over almost all of
the parameter space. It is just for low values of the parameter $' that the risk
associated with the maximum likelihood estimators is lower than that
associated with the minimum expected loss estimators CONSTANT
ELASTICITY OF SUBSTITUTION (CES) PRODUCTION FUNCTION
173 In appraising these results it should be emphasized that a frequentist
criterion is being employed, one that not everyone accepts. For many, the
estimate that minimizes expected loss, given the sample information, is
optimal in line with the expected utility hypothesis and no frequentist
argument is required. We now turn to an alternative approach to the analysis
of the CES pro- duction function which illustrates an interesting connection
with the Box- Cox analysis of transformations. Let us initially consider the
deterministic form of the CES function with two inputs, xz and x2, and
constant returns to scale, namely, (6.35) V= = y[axxx=O + (1 - $,)x:]x,o, e =
1, 2,..., n, where V denotes the systematic part of output and g = -p = ($ _
1)/d , where o is the elasticity of substitution parameter. Then, on raising
both sides of (6.35) to the power g and rearranging terms, we have (6.36)
V?>= x( + 7o$[x(z _ x(] + 7(0)(1 + gx(), where V?> -- (V - 1)/g, y{> = (y _
1)/g and -{> (x 1)/g, i = 1, 2. Now assume that the observed output y
satisfies (6.37) Y(> = V? > + u, = 1, 2,..., n, where or (6.38) where .39) z=
1,2,...,n h = g, /x = 7$z, and with , a free parameter. a Note that in (6.38) we
do not write y? = (y, _ 1)/g as the dependent variable, since it does not seem
reasonable to assume that a power transformation with parameter g - (g _
1)/$ will induce normality and stabilize the variance of the disturbance
terms. Rather, we introduce a new parameter = g and use it in the power
transformation on the dependent variable. If g = 0, (6.38) reduces to the CD
form. If -- 1 and g = 1, we obtain a form linear in the variables. Also,
clearly, if h = 0 and ig = 1, we have a semilog relation. It is seen that
introducing the new ilParameter Widens the range of possible functional
forms under i.0nsideration. :'a Equation 6.38 involves power
transformations on the dependent and independent !i:iWariables and is thus
an exam,qe of B .... '"'--'- -' .......... ?::. v .,^ ,,,.u ,.,.,n mscusslon In ectlon 8
or- their paper, iWhat follows below is a presentation of their procedure for
analyzing this case.

174 ANALYSIS OF SINGLE EQUATION NONLINEAR MODELS With


w = y() - x? ), X = (x?) - x(ag> t + gx(ag)), 15' = (fix, rio), and when the u
in (6.38) are assumed to be normally and independently distributed, each
with mean zero and common variance o., the likelihood function is l(15, g, ,
ly) oc exp --= (w - x15)'(w - x15) (6.40) < - exp where J = =x Y-', the
Jacobian of the transformation from the uds to the ' y s, (6.41) = (X'X)-
XX'w, and (6.42) o:= k (w- n It should be emphasized that the quantities
and 0 are functions of A and g. Also and 0" are values associated with a
maximum of the likelihood function for any given A and g. On taking the
logarithm of (6.40) and sub- stituting (6.41) and (6.42) for 15 and c, ,
respectively, we obtain n 8u ' (6.43) Lmx(g, A) -- const +(A- 1) logy - 1og
Now we can use the computer to evaluate Lm,x(g, ,) over a grid of values
for g and A to find the pair associated with a maximum of Lm,x(g, ,), if
such a pair exists. Then these values say and }, and the values of and
associated with and } are ML estimates. Further from (6.39), given that we
have the estimates , ,/, and/, it is easy to obtain estimates of y, $, and &. In
situations in which we have more than two inputs and assume returns to
scale that may not be constant, a similar approach can be employed to
obtain ML estimates. Here in place of (6.34) we have xx,..., x for the k
input variables (we suppress the observation subscript for convenience of
notation): (6.44) V = y[8x + 8x. g +..' + 8_x_ + (1 - $ - &. ..... 8-)xg] ,
where again V is the systematic part of output, g = - p = (o - 1)/o, y and the
$'s are parameters, and v is the returns-to-scale parameter. Raising both
sides of (6.44) to the power g[v and rearranging terms, we get (6.45) = * -
xD + - +'" + - xg) + CONSTANT ELASTICITY OF SUBSTITUTION
(CES) PRODUCTION FUNCTION 175 and, on further rearrangements,
(6.46) V (�'= vx( g) + (x? )- x( )) + fi(x?- x( g)) +... + r"(a) - x( a)) + fi(1 +
gx()). In (6.46) the fi's are defined as follows' (6.47) fi = v87 /", i = 1, 2,..., k
- 1, and fi = 7 (g/). As above, we see no reason why a power transformation
involving the parameters g and v should induce normality and stabilize the
variance. Rather, we assume that the observed output y is related to V as
follows: (6.48) yta) = V() + u, where u is a disturbance term and here 2 =
3g/v, with a free parameter. Then, in matrix terms, the model for the
observations is (6.49) w = X15 + u, where w = ym, 15, = (v, fix,/9,.,..., ), u'
= (ux, u,.,..., u0, and X = (xg% x7 .,(o) x? x? - x?, - * , - ,..., ._ t + gx(>),
where t is an n x 1 column of ones. Just as above we can formulate the
likelihood function, which is (6.50) l(15, , g, v, ,[y) oc -- exp (w - X15)'(w -
X15) , where J is the JacobJan factor, and proceed to maximize it in the
Box-Cox two-step fashion. For any given values of and g the conditional
maximizing values for 15 and cr �' are (6.51) 1 = (X'X)-XX'y �" and
(6.52) = 1 (y(a)_ Xl),(y(a)_ Xi) ' On substituting these quantities in the
logarithm of the likelihood function, we obtain (6.53) L,x(2, g) = const + (2
- 1) x log y - log 0, which is just a function of the parameters 2 and g. By
using the computer (6.53) can be evaluated for a range of , and g values to
find the pair of values that yields a maximum and to study the properties of
the surface. Given that 3 and g are the values associated with the maximum
of (6.53), the I and
176 ANALYSIS OF SINGLE EQUATION NONLINEAR MODELS values
associated with } and are ML estimates for [5 and a '. Since the first element
of [5 is defined as v, we have the ML estimate of v, 0. Then, on refer- ring
back to (6.47), we can determine ML estimates of , and the $'s. As usual
with ML estimation, large-sample standard errors may be obtained from the
inverse of the estimated information matrix. x4 These large-sample ML
results will be useful in circumstances in which we have adequate numbers
of observations showing enough variation to measure the properties of the
highly nonlinear CES function. With small samples of data showing
relatively little variation, it will, of course, be difficult to make precise
inferences with these large-sample techniques. 6.3 GENERALIZED
PRODUCTION FUNCTIONS x6 Generalized production functions (GPF's)
are another broad class of functions which are usually nonlinear in both
parameters and variables. These functions have been introduced to permit
generalization in two directions. We wish to have production functions with
a preassigned elasticity of sub- stitution, say constant, but unknown, or
variable, say some function of the capital labor ratio. In addition to this
requirement on the elasticity of sub- stitution, we want our production
function to have returns-to-scale that vary with the level of output according
to a presssigned function. Zellner and Revankar have provided a method,
briefly described below, of generating production functions that meet both
requirements. Then we take up the problem of estimating the parameters of
a particular GPF. In deterministic terms we consider the following
differential equation: dr (6.54) df = fa'---' with solution (6,55) V = g(f),
where or(V) is the returns-to-scale as a function of output V,f = f(K, L) is in
the form of a neoclassical produ'ction function, and ar is the returns-to-scale
parameter associated with f. The function a(V) is chosen to. ensure that
dV/df > 0 for all f, 0 < f < . Thus (6.55) is a monotonic transformation. off
with the property that the shapes of the isoquants for g(f) will be the 4 See,
for example, M. G. Kendall and A. Stuart, The Advanced Theory of
Statistics, Vol. 2. New York: Hafner, 1961, pp. 511 if, a. This. section draws
on the results presented in A. Zellner and N. S. Kevankar, "Generalized
Production Functions," Social Systems Research Institute Workshop Paper
6607, University of Wisconsin, Madison, 1966, published in the Rev.
Economic Studies, 36, 241-250 (t969). GENERALIZED PRODUCTION
FUNCTIONS 177 same as those for f. Therefore the elasticity of the
substitution parameter, constant or variable, associated with V = g(f) will be
the same 6 as that associated with the function f. As an example, z7
illustrating the analysis of a particular GPF, let us take a(V) in the following
form: (6.56) a(V) = 1 + OV' with a and 0 parameters. If 0 = 0, the returns-
to-scale do not depend on V. On the other hand, if 0 > 0, the returns-to-scale
fall from a (a > 0), as V-> 0 and toward zero as V--> oo. Inserting a(V),
given in (6.56), into (6.54, the resulting differential equation is (6.57) dV V
o df = 7 at(1 + Or)' with solution Ve �v = Cf%, where C is a constant of
integration. If we let f = AL(x-O)K , then we obtain (6.58) Ve ov = >,KO(x-
O>L,O as our GPF with y = CA. Taking the natural logarithms of both sides
of (6.58) and adding a disturbance term, we have (6.59) log Vt + OV =/x
+/%. log K +/a log L + u, where the subscript i denotes the ith observation, i
= 1, 2,..., n, a = (1 - $), and/ = If, in (6.59), we assume that the u[s are
normally and independently distributed, each with mean zero and common
variance a ', the likelihood function is x� (6.60) _ i1 ] l(l$, 0, rldata) oc J
exp -'5 (z0 - Xl)'(z0 - XI) , O.n where z0 is an n x 1 vector, with a typical
element log V, + OVa, [5'= (x,/o., a), X is an n x 3 matrix with a typical row
given by (1, log K, See A. Zellner and N. S. Revankar, loc. cit., for an
explicit proof. x? Other examples are provided in the Zellner-Revankar
paper. x Note that fla and fla are pure numbers, whereas/x and 0 have values
that depend on the units of measurement employed. xo We assume that the
values of K and L are fixed or, if random, distributed indepen- dently of the
disturbance terms with a pdf not involving parameters of (6.59). See Zellner
and Revankar, op. cit., p. 246, and A. Zellner, J. Kmenta, and J. Drze,
"Specification aad Estimation of Cobb-Douglas Production Function
Models," Econometrica, 34, 784-795 (1966), for further discussion of these
assumptions.

178 ANALYSIS OF SINGLE EQUATION NONLINEAR MODELS log


L), and J denotes the JacobJan of the transformation from the n u6's to the n
V's given by (6.59). The explicit expression for J is (6.61) We first indicate
how the Box-Cox approach can be applied in the present instance to obtain
maximum likelihood estimates. We substitute from (6.61) into (6.60)and
then take the logarithm of both sides to obtain n 1 (6.62) L = const- 1ogo
where L denotes the log-likelihood, log l. Maximizing with respect to o2
leads to (6.63) o2= k (zo- x[5)%- x[5) as the conditional maximizing value
for ', given 0 and [5. Substituting o = ' in (6.62) yields n (6.64) Lx = const -
log (z0 - X[5)'(z0 - X[5) + log (1 + OV6). 6=! From the form of (6.64) it is
clear that, for any given O, Lx will be maximized if (z0 - X[5)'(zo - X[5) is
minimized with respect to the elements of [5. The minimizing value for [5,
given 0, is just (6.65) o = (X'X)-XX'zo, and on substituting this value in
(6.64) we have (6.66) where '(6.67) L2 = const - logs0 ' + log (1 + OVa), $0
2 = (z0- x$o)'(zo- x[5o) where v = n - 3 for the present problem. We can
now evaluate the last two terms on the rhs of (6.66) for various values of 0
to find the value associated with a maximum of L.. � Let us denote this
value by 0. This can be sub- 2o Note that this can be accomplished by
regressing z0 on X for selected values of 0 and obtaining so 2. Then the last
two terms on the rhs of (6.65) are evaluated. The con- ditional regressions
of z0 on X are often of interest in that they show how sensitive results are to
what is assumed about the value of 0. GENERALIZED PRODUCTION
FUNCTIONS 179 stituted in (6.66) to obtain the ML estimate for [5,
denoted $. Then, in (6.63), we can take [5 = $0 and z0 = z0 to compute the
ML estimate for c, '. Large- sample standard errors associated with ML
parameter estimates can be obtained from the inverse of the estimated
information matrix. The parameters associated with (6.59) have been
estimated by using the ML approach and 1957 annual cross section
observations for the U.S. transportation equipment industry. 'x In this
application the ML estimate for 0, based on data for n -- 25 states, was
found to be 0.134, with a large-sample standard error of 0.0638. With --
0.134 and a -- 1.49, ' the returns-to- scale function in (6.56) can be
evaluated, given V. Returns-to-scale were found to vary from a high of 1.45
to a low of 0.76 over the range of V observed in the data. To pursue a
Bayesian analysis of the model in (6.59), we require a prior pdf for the
parameters. Given 0, we assume that the prior pdf for/9x,/, , and is given by
�'a (6.68a) with. 0< 0, or < 00, 0 < tz < oo, (6.68b) g(a) oc (6.68c) (6.68d)
and (6.68e) po.(e) oc const, 1 p(o) oc-. In (6.68b) we follow Box and Cox's
argument, presented in connection with (6.16) and (6.17), to obtain a
proportionality factor, g(O), in the conditional .x See Ze!lner and Revankar,
loc. eit., for a fuller discussion of the data (presented in their paper) and the
ML results. 22 Zellner and Revankar, loc. cit., obtained the following
estimates: /x = 3.0129, /.--0.3330, /a = 1.1551, (0.3554) (0.1023) (0.1564)
where figures in parentheses are large-sample standard errors. Since =/. +/,
the ML estimate of is given by a =/. +/a. 2 Since/. +/8 = , we find it
convenient initially to parameterize the prior pdf in terms of/b. and , rather
than/o. and/8. Note that (6.59) can be written as log V + 0 V =/, + 0a log
KdL + log L + u.

180 ANALYSIS OF SINGLE EQUATION NONLINEAR MODELS prior


pdf in (6.68). The prior pdf for fi., given , shown in (6.68e), is a beta pdf
with parameters qx and q., whereas (6.68d) and (6.68e) represent diffuse
prior assumptions about and ,..4 Finally, the marginal prior pdf for 0, say
p4(O), must be specified. Given that numerical integration techniques are to
be employed, p(O) can be assigned any of a variety of forms to represent
the available prior information about 0. In the present application we
assume that our prior information about the parameters is relatively vague.
In (6.68c) we take qx = q. = 1 and take p(O), the marginal prior pdf for 0, to
be uniform. Thus the prior pdf for the parameters to be employed in the
calculations to follow is 1 (6.69) p(/x,/u, , c,, 0) oc j---,, with J given in
(6.61). In the present instance we can transform (6.69) to obtain u5 1 (6.70)
p(fi,/u, fia, a, 0) oc jat,' On combining (6.70) with the likelihood function in
(6.70), we have the following posterior pdf for the parameters (6.71) p(15,.,
0ldata ) oc j( - 3)In Crn+ 1 1 x15)] exp (z0 - X15)'(z0 - oc .-,-i exp --o 2
where v = n - 3 and 1o and so u are shown in (6.65) and (6.66), respectively.
As is apparent from the second line of (6.71), the conditional posterior pdf
for 15, given 0 and ,, is in the multivariate normal form u5 with conditional
mean 1o and covariance matrix (X'X)-xa u. The marginal posterior pdf's for
the parameters can be obtained as follows. 24 Alternatively, the analysis can
be performed with an inverted gamma prior pdf for , and a rather flexible
choice of prior pdf's for a, given that numerical integration techniques are
employed in analyzing the posterior pdf. 25 The transformation from the
variables of (6.69) to those in (6.70) has a JacobJan equal to 1, since 26
Since the economic theory of the problem tells us fi2, fia > 0, this is a
truncated normal pdf. For the present analysis we shall not utilize the prior
information regarding the non-negativity of fi. and fia but shall let these
parameters have ranges from -o to +o. Below it will be seen that for the data
utilized the truncation is not important. If it were, a trivariate numerical
integration would be needed to obtain marginal posterior pdf's.
GENERALIZED PRODUCTION FUNCTIONS 181 If interest centers on
and 0, (6.71) can be integrated with respect to 15 to obtain the bivariate
posterior pdf for 0 and c,' (6.72) p(c,, 0ldata ) occ,-- exp \--1. The pdf in
(6.72) can be analyzed numerically to obtain the marginal pos- terior pdf's
for c, and 0. Alternatively, the marginal posterior pdf for 0 can be obtained
by integrating (6.72) with respect to c, analytically to yield (6.73) p(OIdata
) oc (so.) Univariate numerical integration techniques can be employed to
obtain the normalizing constant and to analyze other features '7 of this
marginal pdf. Since, as noted above, 0 has the dimensions of t. hose of the
reciprocal of the output rate [see (6.59)], it must be recognized that both the
ML estimate of 0 and the pdf in (6.73) will be affected by a change in units
of measurement of output. Just as the ratio of the ML estimate to its
standard error is free of units of measurement, the mean of 0 divided by its
standard deviation, the 5.0 4.0 'o 3.0 2.0 1.0 Figure 6.4 0 0.10 0.20 0.30
0.40 0.50 0.60 o Marginal posterior pdf for 8 computed from (6.73), '7 It is
interesting to observe that the mode of (6.73) occurs at 0 = 5, the ML
estimate; that is Iogp(0[data) -- const + (v/n)(logJ - n/2 log so ') = const +
(v/n)[X[L_x log (1 + OVO - n/2 log so']. The quantity in square brackets is
precisely the same as the last two terms on the rhs of (6.66) and thus the
modal value of the posterior pdf is precisely equal to the ML estimate.

182 2.0-- 1.0 3.0 2.0 1.0 I ANALYSIS OF SINGLE EQUATION


NONLINEAR MODELS illlllilllllllliJlllltilltllJ 0 0.2 0.4 0.6 0.8 1.0 (a). III I
Ill I III Jl I Jill II lll II It I I 0.6 0.8 1.0 1.2 1.4 1.6 (b) Figure 6.5 I I I I I I I I
Marginal posterior pdf's for ,82 and coefficient of variation, is also not
dependent on units of measurement. Alternatively, for a given output rate,
say , the quantity O V is a pure number and its posterior pdf can be obtained
from (6.73) by a simple change of variable from 0 to = VO. The posterior
pdf for i = OV is of interest because, as can be seen from (6.59), it is
precisely the term that reflects a departure from the Cobb-Douglas form of
the production function. To obtain the marginal pdf's for one of the/'s, say/x,
integrate (6.71) with respect to ,/a, and/a analytically. The result is a
bivariate posterior pdf for fix and 0. Then bivariate numerical integration
techniques can be employed to obtain the marginal posterior pdf for/.
Similar operations yield the marginal posterior pdf's for/a and The posterior
pdf for =/o. +/a is obtained by integrating (6.71) with respect to/x and
analytically. The result is the posterior pdf for/a,/a, and QUESTIONS AND
PROBLEMS 183 0. For given 0 this pdf is in the form of a bivariate
Student t pdf. Then make a change of variables to e = o. +/3 and 0s and
integrate out 3, an operation that yields the joint posterior pdf for 0 and .
Bivariate numerical integration techniques can be used to calculate the
marginal posterior pdf for . The operations above have been applied by
using the U.S. Census of Manufactures cross-section data for the
transportation equipment industry presented in the Zellner-Revankar paper.
For each of n = 25 states the data are value added, labor input, and capital
input, all on a per establishment basis. Shown in Figure 6.4 is the posterior
pdf for 0. It is clear that it has most of its density over positive values of 0,
which suggests that the returns- to-scale do vary with the level of output. In
Figure 6.5 the marginal posterior pdf's for o. and Oa are shown. With the
relatively diffuse prior pdf's employed in this analysis, it is seen that the
posterior pdf's have modal values that are close to the ML estimates.
Although this is the case, it should be noted that the posterior pdf's depart
from being normal, which indicates that "large sample" conditions are not
yet encountered for n = 25. �' QUESTIONS AND PROBLEMS 1.
Consider a simple regression model, y = Po + x + u, i = 1, 2,..., n, with , --
E(ydx, Po, P) = Po + ,axe. After providing assumptions sufficient to obtain
a posterior pdf for o and/ derive the posterior pdf of the elasticity (00 of V{
with respect to x,, namely, Ox o + X& 1 + po/PX o for given x g 0. If 0 and
/ have a posterior pdf in the bivariate Student t form, will the posterior mean
of 0 in Problem 1 exist ? If the denominator of 0 has a very small
probability of being nonpositive as n grows, justify the approximation EO,
'- x,p/(po + x,p) for large n, where Po = Y - P and = [P. (y, - y) x (x, - )1[?.
(x, - )2. 3. Let the observation y satisfy y = f(x, ) + e, i = 1, 2,..., n, where
the e's are assumed NID(0, ao) and f(xu ) is a continuous, twice dif-
ferentiable function of an independent variable, x, and a scalar parameter,
28 This suggests that "large-sample" properties of standard errors, based on
the inverse of the estimated information matrix, and of sampling theory
confidence intervals may not be encountered for the present model and data
with n = 25.

184 ANALYSIS OF SINGLE EQUATION NONLINEAR MODELS . If we


expand f(x, ,) in a Taylor's series about , the ML estimate for , retaining just
the linear term, and write Of I + q, y .-f(x. ) + ( - ;0 ,= explain how linear
Bayesian theory can be utilized to analyze this linearized equation.
Comment on the form, mean, and variance of the posterior pdf for ,, given
that a diffuse prior pdf for, and e is employed. 4. In Problem 3 an
approximation to a nonlinear equation was introduced. Comment on the
quality of the approximation as n gets large and the likeli- hood function
becomes sharp. In particular, consider the behavior of the second-order term
in the Taylor's series expansion as n grows large. When n is small, would it
be appropriate to use the approximation for f(x, -) along with an informative
prior pdf for, ? 5. Generalize the considerations in Problems 3 and 4 to the
case in which - represents a vector of parameters rather than a scalar
parameter. 6. Assume that logy = fix + ax + u, i = 1, 2,..., n, where the u's
are NID(0, ) and the x's are given values of an independent variable. Derive
expressions for the mean and median of the pdf for y, given x, x, a, and ,
and explain how to obtain posterior pdf's for these two measures of central
tendency relating to the pdf for y, 7. Assume that we have a discrete random
variable, y, which assumes the value 1 with probability P and the value 0
with probability 1 - P. Then, if, in n independent trials, we obsee nx l's and n
- nx O's, the likelihood function (l) is ven by l=e, (l-V,). Further, suppos ha
P safises P = ] - g-', = 1, 2,..., , wber = and = ar parameters and x s a non-
ngafiv valued gvn obsabe "stimulus" variable. (a) Dscuss properties of h
function, nroducd above, gvng h locus of th P's. (b) Explain how a
computer sarch mhod can be utilized o obtain L sfimas for and =. (c)
Formu]a a prior pdf for = and = and indca how bivafia numfiea mafion
chnqus can used o mak post,flor nferncs. 8. If, n Problem ?, w assume h
following aRrnafiw, logistic functional form for th P's, P = (1 + -"+"P)-l, =
], 2,..., ,, whr z s a gwn obsrwab] vafiab] ha can rang from - o +, (a)
inwsfiga th mahmafiea props of h ogsfic function, particularly th depndnc of
is shap on h algebraic sgn of , (b) prsn a pro- cdur for computing L sfimas
for and , gwn samp nformafion, and (c) n rms of a particular suggested
application, using h abow mod], QUESTIONS AND PROBLEMS 185
formulate a prior pdf for/x and/%. and indicate how a Bayesian analysis can
be performed. 9. Do Problem 8 under the assumption that = -- e- Awu dw,
where yx and 9,0. are parameters with unknown values. (In Part a,
investigate the dependence of the shape of the cumulative normal function,
shown above, on the algebraic sign of 9,0..) 10. Consider the following
"Engel relation": y = c, + q, i = 1, 2,..., n, where y = expenditure, x =
income, and/ are parameters with unknown values, q is an error term, and
the subscript i denotes variables pertaining to the ith household. Assume
that the xfs are given independent variables and the q's are NID(0, o) with
o. having an unknown value. Provide a convenient algorithm for computing
ML estimates of ,/, and 11. If, in Problem 10, we have a prior pdf for ,/, and
, say p(,/, ) oc px(,/)/, where 0 < < co and px(,/) is a prior pdf for and/,
derive the joint poster- ior pdf for and /. Then explain how to compute the
posterior mean and variance of x , given x = Xo, a known value. 12. If, in
Problem 11, we assume that px(s,/) cc const, what are the modal values of
the joint posterior pdf for and/? Comment on the assumption p(, const and
provide an alternative that is more in accord with previous experi- ence
with Engel curve analysis. 13. Let y = [(x* - xO + q with xt* = oz, i = 1,
2,..., n, which implies y = ](sz - x) + q, where z and x are given independent
variables, and/ are parameters with unknown values, and the q's are random
error terms assumed to be NID(0, o.), with o. the common unknown
variance of each q. Derive ML estimates of s and/ and comment on the
sampling properties of the ML estimator for s. In particular, does its mean
exist ? Evaluate Fisher's informa- tion matrix for the parameters s,/, and a
and comment on its properties. 14. Given that s = so, obtain the posterior
pdf for/ in Problem 13, using the following prior pdf: p(fi, ) oc p()/, with 0 <
< co and p(fi) is a proper prior pdf for/. As n grows large, what is the mean
of the posterior pdf for /, given = so and the sample information ? 15.
Explain how the model in Problem 13 can be analyzed using the following
informative prior pdf:p(,/, ) = p(s) po.(/) pa(), with p(s) and po.(]) being beta
pdf',s and Pa() an inverted gamma pdf.

CHAPTER VII Time Series Models: Some Selected Examples Most, if not
all, economic analyses involve time series data. Thus it is important to have
good techniques for the analysis of time series models. In this chapter we
take up the analysis of selected time series models to demonstrate how the
Bayesian approach can be applied in the analysis of time series data. It will
be seen that if the likelihood function is given and if we have a prior pdf the
general principles of Bayesian analysis, presented in Chapter 2, apply with
no special modifications required. This is indeed fortunate, since it means
that our general principles are applicable to time series problems as well as
to others. 7.1 FIRST ORDER NORMAL AUTOREGRESSIVE PROCESS
The model assumed to generate our observations, y' = (yx, y,..., YT) is '
(7.1) y =/g + tgyt- + ut, t = 1, 2,..., T where/x and/g are unknown parameters
and u is a disturbance term. We assume that the ut's are normally and
independently distributed, each with zero mean and common variance ,o..
As regards initial conditions, we shall first go ahead conditional on given
Y0, the observation for t = 02 With these assumptions the likelihood
function is (7.2) l(, rio., *IY, y0) oc- exp Z (Y*- fi - fi2yt )2 T ZO - with the
summation extending from t = 1 to t = T. The distinction often made
between time series and cross-section data does not invalidate the statement
made above, since cross-section data are in fact observations on time series
variables pertaining to individual units in a cross section. Overlooking the
time series nature of cross-section data can at times lead to serious errors in
analyzing data. 2 Here we use the subscript t to denote the tth value of a
variable. a See the discussion of initial conditions presented in connection
with the problem of autocorrelation in regression analysis, presented in
Chapter 4, for other possible assumptions. 186 FIRST ORDER NORMAL
AUTOREGRESSIVE PROCESS 187 As regards a prior pdf for the
parameters, we shall assume that our information is diffuse and represent
this in the usual way, namely, 1 (7.3) p(/,/o., *) oc-, with -oo </ < oo, -oo </
< oo, and 0 < , < oo. Note that we do not restrict/ to be within the interval -
1 to + 1 and thus the analysis applies to both the explosive and
nonexplosive cases. 4 In fact, our posterior pdf for will reflect what the
sample information has to say about whether the process is or is not
explosive. On combining (7.2) and (7.3), the posterior pdf for the
parameters is (7.4) P(15, IY, Yo) oc -F'i exp 22 (Yt - tg - fiYt-)o. , where 15'
= (/,/o.), which is in a form exactly similar to that obtained in our analysis
of the simple normal regression model in Section 3.1. To obtain the
marginal posterior pdf for 15 we integrate (7.4) with respect to, which
yields P(151Y, Yo) oc [Y. (y,-/ -/o. yt_0�'] (7.5) oc [vs + (15 - )'H(15 -
where v = T- 2, (7.6) H= Ey_ ( r (7.7) = Y- Y-,.] \Y. Y- Yd and vs a = (Yt -
fix - aYt-x) a. It is seen from (7.5) that the joint posterior pdf for fix and a is
in the bivariate Student t form with mean given by (7.7), the least squares
quantity. This fact permits us to make inferences about d a quite readily. In
particular, the marginal posterior pdfs for will each be in the form of
univariate Student t pdf. Explicitly, the quantities will each be distributed as
a Student t variable with v = T- 2 degrees of frdom. Further, by integrating
(7.4) with respect to , the marginal pdf for is given by 1 [ s (7.8) P(lY, yo)
wire = r - 2 and = - - Of course, if we have information that the process is
nonexplosive and wish to use it, the prior pdf in (7.3) can be altered to
incorporate this information. See below for an example.

188 TIME SERIES MODELS' SOME SELECTED EXAMPLES As


mentioned above, these results for the normal first order autoregressive
process parallel those for the simple normal regression model. Als0, the
predictive pdf for the next future observation y.. z will be in the univariate
Student t form with mean +/.y., which depends just on given sample values.
If we have additional information available about the autoregressive process
in (7.1) and wish to incorporate it in the analysis, this can be done without
much difficulty; for example, we might assume that the process is stationary
with I/gel < 1. Then the initial observation, Yo, is given by 5 oo co or (7.9)
l=0 1=0 Yo = 1 - f. + fdu_. l=0 From (7.9) Y0 is normally distributed with
mean f/(1 - re) and variance p2/(1 -/g2e). Thus the joint pdf for the T + 1
observations Y0 and y is given by (1--fee) ( 1 [ ( fl f2)e p(y, Yoil$,p) oc p+
exp -- (1-fie a) Yo 1-- ' (7.10) - + Z (Yt- fx- feYt-x)e]} ' where [fie[ < 1, 0 <
p < 0% and -oo < f < co. This pdf, viewed as a function of its parameters, is,
of course, the likelihood function. As regards prior assumptions, we assume
that fx, f2, and log p are independently dis- tributed. With respect to f and
log p we assume that they are uniformly and independently distributed. Our
prior pdf for f2 is designated by P(f2). Thus our joint prior pdf is 6: (7.11)
p(fx, fie, P) oc where -oo < f < o, {/e{ < 1, and 0 < Under the above prior
assumptions the posterior pdf for the parameters is p(la, plY, yo) oc (7.12)
p(fe)(1 - fee) fx e pr+e exp (-b [(1 - fee)(Y� 1-f] 5 As usual, with the
assumption of stationarity, the series "starts up" in the infinite past. If the
series started up at - To, with To finite, it would not be strictly stationary.
However, with modification of (7.9), the model could be analyzed by using
the above methods, if To is known. 6 The analysis can also be carried
forward with informative pdf's for and fix. FIRST ORDER NORMAL
AUTOREGRESSIVE PROCESS 189 On completing the square on f in the
exponent of (7.12), we have p(la, plY, yo) oc (7.13) p(fe)(1 - fee) " pT+2
exp (-b [c(fix - /.)e 1 (T . + +7 where c = (1 + re)/(1 - re) + T, Y.x = Y. [Yt -
f - fe(Yt- - f-0] e, e = Y. [yt - feyt-. - (1 - fe)Yol e, y = Y. ydT, Y- = Zy,-dT,
and fi = [(1 + fe)Yo + (Yt - feYt-O]/c. From (7.13) it is seen that the
conditional mean of f, given fe and p, is/. Also,/1 can be written as follows'
t = hx (y - feyt-O/r + he(1 - fe)Yo h + ha ' where h -- T/et a and ha = (1 +
re)/(1 - fe)p e. Note that, given re, from (7.9) (1 - fe)Yo is an estimate off
and Y. (yt - feYt-0/Tis another estimate of f. The quantity/x is a weighted
average of these two estimates with their respective precisions as weights.
Also, from the form of (7.13) the conditional posterior pdf for f, given fe
and p, is normal with mean/ and variance e'/c. As T grows large pe/c--->-0
and t-->. Y. (y,- feyt-O/T; thus the influence of the quantity (1 -fe)Yo on the
location of the conditional posterior pdf diminishes as T grows large. To
obtain the marginal posterior pdf for f, (7.13) can be integrated with respect
to p and the resulting bivariate posterior pdf for f and fe can be analyzed by
using bivariate numerical integration techniques. Since interes[ often
centers on f2, we now discuss its marginal posterior pdf which can be
obtained by integrating (7.13) with respect to f and p. Inte- gration with
respect to f yields Since h = TIp e, as T gets large this posterior pdf is
approximately pro- portional to (l/p) *+x exp [-(2oa) -x x] in large samples,
7 a form that is free from dependence on information regarding initial
conditions except insofar as Y0 appears in . The pdf in (7.14a) can be
employed to make joint inferences about fe and or marginal posterior
inferences about p. 7 Since the factor p(fi2)(1 - fi22)� does not depend on
T, it will not be important in large samples.

190 TIME SERIES MODELS: SOME SELECTED EXAMPLES Next we


can integrate (7.14a) with respect to, to obtain the following marginal
posterior pdf for (7.14b) oc x '1 1 + (l/T)(1 +/,.)/(1 - ,.) 1 This pdf can be
analyzed numerically. For large samples it will be approxi- mately
proportional to 8 (x)- TM = { [yt - Y -/%.(Yt-x - Y-x)]'}- va which, as
pointed out above, is in the univariate Student t form and does not reflect
either the prior pdf p(/a) or the factors in the second line of (7.14b) arising
from consideration of the pdf for Y0. With respect to the prior pdf for/%.,
p(/.), often the following beta pdf defined over the range -1 to + 1 will be
flexible enough to represent prior information8: p(/%) oc (1 -/.)-x(1 +/.)--x,
where kx and k. are prior parameters. to be assigned by the investigator. If
kx = ka = �, this prior pdf is identical to what is produced by an
approximate application of Jeffreys' invariance theory, namely, p(/ga) oc (1
-/.)-(1 +/%)- = (1 -/?')-, < 1,... (see the appendix to this chapter for details).
Thus, if we wish to go forward by using an approximate Jeffreys' diffuse
prior pdf for this problem, it is p(/x,/., *) oc (1 -/aa)-*-x.x0 As already
explained, as the sample size grows, the influence of the prior pdf on the
properties of the posterior pdf diminishes. Also, as seen from the present
analysis of the stationary first order process, the influence of the initial
conditions, 'that is, the assumptions about Y0, diminish in importance as the
sample size grows. 8 This approximation can be appreciated most easily by
taking the log of both sides of (7.14b) and noting that - T/2 log Ex is the
dominant term as T gets large. o This prior pdf is suggested in H. Thornber,
"Finite Sample Monte Carlo Studies: An Autoregressive Illustration," J.
Am. Statist. Assoc. (September 1967), who studied the above system with
/x = 0. Note that a change of variable z = (1 +/.)/2 yields 1 - z = (1 - /.)/2
and thus p(z) or z. -x(1 - z)x- x with 0 < z < 1, since -1 < t. < 1; p(z) is in
the usual form of a standard beta pdf and will be proper if kx, k. > 0. o See
Thornber, loc. cit., and J. B. Copas, "Monte Carlo Results for Estimation in
a Stable Markov Time Series," J. Roy. Statist. Soc., Series A, No. 1, 110-
116 (1966). These studies show that the sampling properties of Bayesian
estimators compare favorably with alternatives. In particular, in Thornber's
experiments the estimated average risk of the ML estimator was more than
50% higher than that for the Bayesian estimator. See also G. H. Orcutt and
H. S. Winkour, Jr., "First Order Autoregression: Inference, Estimation, and
Prediction," Econometrics, 37, 1-14 (1969), for Monte Carlo experiments
relating to the finite sample properties of certain sampling theory
techniques. FIRST ORDER AUTOREGRESSIVE MODEL WITH
INCOMPLETE DATA 191 The latter result is not surprising, since the
initial condition involves just one observation, Y0, a small fraction of the
sample information when T is even moderately large and [/l < 1. 7.2 FIRST
ORDER AUTOREGRESSIVE MODEL WITH INCOMPLETE DATA xx
Suppose that we are interested in making inferences about the parameters in
the following autoregressive model for quarterly data, (7.15) y(t) = py(t - 1)
+ X(t)l + u(t), t= 1, 2,..., 4T, where t in parentheses denotes the value of a
quantity for the nh quarter, y(t) is a dependent "stock" variable, u(t) is a
disturbance term, X(t) = (x(t), xdt),..., x(t)), a 1 x k vector of observations
on k independent variables, and l'= (/,/.,...,/) and p are unknown coefficients.
We assume that the disturbance terms are normally and independently
distri- buted, each with zero mean and common variance ,'. Although
quarterly observations are available for X(t), we assume that only the
following T + 1 annual observations are available for y(t): y(0), y(4),
y(8),..., y(4T); for example, y(t) might be end of quarter stock of capital or
money. With just T + 1 annual observations on y(t) our problem is to make
inferences about the parameters of (7.15), namely, p, 13, and ,. Denoting a
quarter for which an observation on the dependent variable is available by
t', we find by direct operations that y(t') = p y(t' - 4) + [X(t') + p X(t' - 1) + p
X(t' - 2) + p X(t' - 3)]1 (7.16) +u(t')+pu(t'- 1)+ pu(t'-2)+pau(t'- 3), t' = 4,
8,..., 4T, which is what the model in (7.15) logically implies for the
observations we have on y. If we let (7.17) w(t') = u(t') + p u(t' - 1) + p'u(t' -
2) + pau(t' - 3), t' = 4, 8,..., 4T, it is clear that the w(t') are normally and
independently distributed, each with mean zero and common variance (1 +
p' + p + p)*'. Then, if we assume that y(0) is fixed and known, the
likelihood function, based on the observations we have on the y's, is (7.18)
[(1 + p + ( x exp -2(1 Z [w(t')]a} ' x The material in'this section is drawn
from A. Zellner, "On the Analysis of First Order Autoregressive Models
with Incomplete Data," Intern, Econ. Rev., 7, 72-76 (1966).

192 TIME SERIES MODELS: SOME SELECTED EXAMPLES where y


denotes the T q- 1 observations, y(0), y(4),..., y(4T), the summation in the
exponent is taken over the following values of t', t' -- 4, 8,..., 4T, and w(t'),
given by (7.17), represents (7.19) w(t') = y(t') - p y(t' - 4) - [X(t') + pX(t' - 1)
+ pa X(t' - 2) + p3 X(t'- 3)]15. First, we shall indicate how ML estimates of
the parameters can be obtained. On taking the logarithm of the likelihood
function, differentiating with respect to ,a and setting the derivative equal to
zero, we obtain: 1 [w(t,)].. (7.20) Oa= T(1 + pa + p, + p6) t, On substituting
b ' for o a in the log-likelihood function, the following is the result: T (7.21)
Lax(p, 15) --- const - log t, [w(t')] ', which will be maximized for those
values of p and 15 that minimize Y.v [w(t')] . One method of searching for
these values is to evaluate the residual sum of squares, say s'(p), for
regressions of y(t') - p4(y(t' - 4) on [-X(t') + pX(t'- 1) + p' X(t'- 2) + p3 X(t'-
3)1 for various values of p. If is the value of p for which s'(p) is minimal,
then and the associated given by (7.22) l = H - [X(t') + t3 X(t' - 1) + t3 ' X(t'
- 2) + a X(t' - 3)1' t' where x [y(t')- t3 y(t'- 4)1, (7.23) [X(t') + /9 X(t' - 1) +
/9" X(t' - 2) + t3 a X(t' - 3)]' x [X(t') + t3 X(t' - 1) + /9 = X(t - 2) + t3 a X(t' -
3)1 are ML estimates. On substituting these estimates in (7.20), we obtain a
ML estimate for o a. Coupled with large-sample standard errors, obtained
from the inverse of the estimated information matrix, these results can be
employed to make approximate large-sample inferences. For the finite
sample Bayesian analysis of this problem we employ the likelihood
function in (7.18) along with the following diffuse prior pdf: 1 (7.24) p(p,
15, ,) oc -, with -c < fie < oo, i = 1, 2,..., k, and 0 < * < oo. As regards p, we
can assume either - < p < o or -1 < p < 1, since these assumptions will just
affect the range of numerical integrations in what follows. However, it
FIRST ORDER AUTOREGRESSIVE MODEL WITH INCOMPLETE
DATA 193 should be recognized that assuming IP[ < 1 implies that the
autoregressive process is nonexplosive. With these prior assumptions the
posterior pdf for the parameters is (7.25) P(P, 15, *tY) oc x exp (1 + p' + p +
p), ] [w(t')]} ' where w(t') is given explicitly in (7.19). To obtain the joint
marginal posterior pdf's for p and 15 integrate (7.25) with respect to to
obtain (7.26) p(p, 15[y) oc { [w(t')]'} -. It is convenient to write (7.26) as
follows: (7.27) P(P, 151Y) oc [(z - A15)'(z - A15)l -v'', where z is a T x 1
vector with typical element y(t') - py(t' - 4) and A is a T x k matrix with
typical row X(t') + pX(t' - 1) + p'X(t - 2) + paX(t - 3). Letting (7.28) and
(7.29) we can write (7.27) as (7.30) p(p, 151Y) oc [$(p)]-'t'(v (p) = (A'A)-
A'z sa(p) = (r- - Z(p)l'[z - + [15 - (p)l'Z'z[15 - where u = T- k. From (7.30)
we see immediately that the conditional posterior pdf for 15, given p, is in
the multivariate Student t form with mean given by (7.28). x This fact
makes it easy to assess how sensitive inferences about the elements of 15
are to what is assumed about p. To derive the marginal posterior pdf for p
we integrate (7.30) with respect to 15, using properties of the multivariate
Student t pdf which yields (7.31) P(PlY) which can be analyzed
numerically to make inferences about p. As regards xa If p were taken equal
to , the ML estimate, (7.28) would yield the ML estimate for [t as the mean
of the conditional posterior pdf. Since inferences about [3 may be sensitive
to what is assumed about p, it is better to use the marginal posterior pdf for
[3 to make inferences. In this connection see, for example, the analysis of
the problem of auto- correlation in regression analysis in Chapter 4.

194 TIME SERIES MODELS: SOME SELECTED EXAMPLES the


marginal posterior pdf for an element of 15, say fix, (7.30) can be integra-
ted with respect to/., fia,. �., fie, to yield (7.32) p(p,/%IY) oc + {,, + (g'.,,-
g;.,)LSl- where the V's are submatrices of A'A/sU(p); that is s(P) = L l'%l
where the partitioning has been done to conform with the partitioning of 15
into fix and a vector of remaining elements. Thus Vxl is a scalar, Vx, a 1 x
(k - 1)'row vector, Vx = Vl', and V.. is a (k - 1) x (k - 1) matrix. Numerical
techniques can be employed to analyze the bivariate pdf in (7.32). 7.3
ANALYSIS OF A SECOND ORDER AUTOREGRESSIVE PROCESS la
In this section we show how Bayesian techniques can be employed to make
inferences about the dynamic properties of solutions to stochastic difference
equations which are often encountered in practice. What is presented is an
analysis of a second order linear autoregressive model designed to answer
questions of the following kind. On the basis of the data we have, what is
the posterior probability that the model's solution will be nonexplosive and
oscillatory? Or, what is the posterior probability that the solution will be
oscillatory ? Clearly, such questions resemble those asked by Samuelson in
his well-known paper TM on the multiplier-accelerator interaction and also
those considered by Theil and Boot in their large-sample analysis xs of
Klein's Model I. Our model for the observations is assumed to be (7.33) Yt
= (zlYt-1 + (zYt-9. q- Ut, t = 1, 2,..., T, where Yt is the tth observation on a
random variable, el and . are unknown coefficients, and ut is a disturbance
term. We assume that the ut's are normally and independently distributed,
each with mean zero and common variance x0 This material is based on one
section of A. Zellner, "Bayesian Inference and Simul- taneous Equation
Econometric Models," paper presented to the First World Congress of the
Econometric Society, Rome, 1965. x4 P. A. Samuelson, "Interactions
between the Multiplier Analysis and the Principle of Acceleration," Rev.
Econ. Statist., 21, 75-78 (1939). xs H. Theil and J. C. G. Boot, "The Final
Form of Econometric Equation Systems," Rev. Intern. Statist. Inst., 30, 136-
152 (1962), reprinted in A. Ze!lner (ed.), Readings in Economic Statistics
and Econometrics. Boston: Little, Brown, 1968, pp. 611-630. ANALYSIS
OF A SECOND ORDER AUTOREGRESSIVE PROCESS 195 o a.
Further, assuming that the initial values y_ x and Yo are given, the likeli-
hood function is (7.34) l(al, a., *IY) oc exp -2o (yt - exYt-1 - e.Yt-) t--1
where y' = (y_ l, Yo, Yl,..., y.). As regards prior information about el, , and
,, we assume that little is known about these parameters and represent this
in the usual way 16- (7.3-5) P(al, ., (0 oc 1, where -o < el, . < oo and 0 <, <
oo. Then, using Bayes' theorem, the posterior pdf for the parameters is
(7.36) (.+'--'i 2?' On integrating with respect to , the marginal posterior pdf
for ax and is found to be (7.37) where = (7.38) &= (lt \c/ and p(l, oc + - -
(7.39) H = LY.Y,-ly,- y_. J' with all summations extending from t = 1 to t =
T. It is seen that the posterior pdf for ex and . in (7.37) is in the bivariate
Student t form with mean vector a, the least squares quantity shown in
(7.38). Given that we have the observations y, we can use (7.37) to make
joint inferences about el and and thus about properties of solutions; that is,
just as Samuelson did in his multiplier-accelerator paper (cit. supra), we can
determine regions in the (el, ) plane corresponding to solutions having
certain properties. These regions, relating to the present model, are shown
in Figure 7.1. Since we have the joint posterior pdf, p(ex, e.Jy), we can use )
The analysis presented below can be extended easily to the case of an
informative pdf for z and a.

196 TIME SERIES MODELS.' SOME SELECTED EXAMPLES 14} E-O


Figure 7.1 Regions in the parameter space for which solution has particular
properties. Regions: (1) E - NO: Explosive and nonoscillatory; (2) NE -
NO: nonexplosive and nonoscillatory; (3) NE- O: nonexplosive and
oscillatory; (4) E- O: explosive and oscillatory. bivariate numerical
integration techniques to evaluate its normalizing constant 7 and the volume
over each of the regions. Given that the posterior pdf for and ao. has been
normalized, these volumes are posterior probabilities relating to properties
of the solution. If, for example, the volume of the posterior pdf over the
"oscillatory-nonexplosive" region were com- puted to be 0.85, we would
say that the probability is 0.85 that the solution will be osciIlatory and
nonexplosive. Further, by adding the probability that the solution will be
oscillatory and nonexplosive and the probability that it will be oscillatory
and explosive, we have the probability that the model's solution will be
oscillatory. Similarly, by adding the probability that it will be oscilla- tory
and nonexplosive and the probability that it will be nonoscillatory and
nonexplosive, we have the probability that the solution will be
nonexplosive. Application of this approach, using data generated from
known models, is provided below. We note further that it is possible to
derive the posterior pdf's for quantities whose value determines particular
properties of the solution; for example, from the characteristic equation for
the model x - x - . -- 0, we have the following roots: + /z �' + 4. - V ' + 4.
(7.40) xx= 2 and xo.= 2 ? Alternatively, this normalizing constant is known
from properties of the bivariate Student t pdf. ANALYSIS OF A SECOND
ORDER AUTOREGRESSIVE PROCESS 197 As is well known, the
solution will exhibit oscillations if z a + 4zo. < 0. Thus, on occasion, it will
be of interest to have the posterior pdf for the quantity ea + 4o.. To obtain
this pdf we introduce the following transformation: (7.41) vx = ax and v. =
axa + 4a2, a transformation from the variables ax and aa to vx and v. with a
nonzero Jacobian that does not involve any of the variables. Using the
posterior pdf in (7.37), we have, for the posterior pdf for vx and va, 4 (7.42)
+ 2(v- &0( v'- v' &.)h] , 4 where hu, ho., and haa are elements of H, shown
in (7.39). Bivariate numerical integration techniques can be employed to
integrate (7.42) with respect to vx and to normalize the pdf. The result is the
normalized marginal posterior pdf for va = z �' + 4a, denoted p(valy). This
distribution can be employed to make inferences about the quantity z a +
4ea and thus about whether or not the solution is oscillatory. To illustrate
application of these techniques, we have generated data from the model
shown in (7.33) under the conditions given in Table 7.1. In each run the u[s
were independently drawn from a normal population with mean zero and
variance one. The sample size for each run was T = 20 and the initial values
y_ and Y0 were set at zero in all three runs. Table 7.1 Value of Value of Run
z a Properties of Solution 0.500 -- 0.750 1.250 --0.375 1.600 - 0.550
Oscillatory and nonexplosive Nonoscillatory and nonexplosive
Nonoscillatory and explosive The contours of the posterior pdf p(ex, eo.[y)
for runs A, B, and C are shown in Figure 7.2 along with the mean values for
ex and o., namely, &x and &o., the least squares quantities shown in (7.38).
The computation of volumes above the regions shown in Figure 7.1
produced the results given in Table 7.2. These results square nicely with the
known properties of the solu- tions indicated in Table 7.1. In Run B it
should be noted that, from the true values of ex and ea, exa + 4zo. = r,
which is a small number. With only 20 observations it is difficult to make
precise inferences and this shows up in the results, namely, that the
probability that the solution will be nonoscillatory

198 TIME SERIES MODELS: SOME SELECTED EXAMPLES Run A 2


0.6011 -0.7895 Run B 2 -1 -- 1.2o64 o 2 -- -o.2941 2 Run C a = 1,4551 a 2 -
0.3645 Figure 7.2 Contours of posterior distributions. ANALYSIS OF A
SECOND ORDER AUTOREGRESSIVE PROCESS Table 7.2 199
Probability that the solution is Run Nonos�illatory Os�illatory
Nonos�illatory Os�illatory and and and and Nonexplosive Nonexplosive
Explosive Explosive A 0.000 0.866 0.000 0.134 B 0.593 0.242 0.162 0.003
C 0.025 0.016 0.953 0.006 1.0 0.5 I I I Run A -6.0 -4.0 -2.0 0 1.0 I I Run B
I -1.0 0.0 1.0 2.0 "2 I i i 1.0- 0.5- -0.5 0.5 1.5 I Run C I 2.5 >' I I -6.0 -4.0
-2.0 0. /)2 .oo - - :3 o I -1.0 0.0 1.0 2.0 /)2 -0.5 I I 0.5 1.5 2.5 u 2 Figure 7.3
Posterior distributions for

200 TIME SERIES MODELS'. SOME SELECTED EXAMPLES is 0.593


+ 0.162 = 0.755, whereas the posterior probability that it will be oscillatory
is 0.245, which is fairly substantial. Last, the posterior pdf for vo. = ax a +
4aa was computed for each of the above runs. The results are shown in
Figure 7.3. We see that for Runs .4 and C there is clear-cut sample evidence
that shows whether the solution is or is not oscillatory. As mentioned above,
in the case of run B, vo. = exo. + 4-a =, a small number, and we see that a
non-negligible portion of the pdf is located over negative values. However,
this pdf does reflect accurately the information our sample has bearing on
vo.. In closing this section we emphasize the importance of analyzing the
dynamic properties of models and hope that generalizations of the methods
discussed, appropriate for a range of other models/8 will be forthcoming.
7.4 "DISTRIBUTED LAG" MODELS Since the pioneering work of
Koyck, x9 distributed lag models have been utilized in a number of
econometric studies. a� These models typically incor- porate lagged
effects arising from habit persistence, institutional or tech- nological
constraints, and/or expectations effects which link anticipations with
experience. Further, to conserve on the number of parameters involved in
distributed lag models, it is usually assumed that the coefficients of lagged
variables are not all independent but functionally related. This functional
relation serves to reduce the number of parameters required to represent
lagged responses to one or a few parameters. In what follows we take up the
analysis of distributed lag models. The first model we consider is (7.43) y, =
a )xt- + ut where the subscripts t and t - i denote the values of variables in
the tth and (t - i)th time periods, respectively, Yt is the observed random
"response" variable, ut is an unobserved random disturbance term, xt_ is the
value a given "stimulus" variable in period t - i, and and : are unknowt xo If,
for example, (7.33) were elaborated as follows, Yt = axyt- x + ao. yt- 9. q-
X(t)f5 + ut, t = 1, 2 .... , T, where X(t) is a 1 x k row vector of independent
variables and is a k x 1 vector of coefficients, it would be possible to obtain
the marginal pdf for e and ea and pursue the above analysis. x* L. Koyck,
Distributed Lags and Investment Analysis. Amsterdam: North-Holland,
1954. ao See, for example, Z. Griliches, "Distributed Lags: A Survey,"
Econometrica, 35, 16-49 (1967). DISTRIBUTED LAG" MODELS 201
parameters such that - < a < oo and 0 < : < 1. On subtracting Ay_ from both
sides of (7.43), we obtain (7.44) Yt = hYt-x + axt + ut - hut_ , which is the
form of the model usually considered for analysis. For (7.44) we assume
that the ut's have zero means and a common variance, ,a, and are normally
and independently distributed. Note then that the variance of ut- ,Xut_x is
,a(1 + :a), whereas the first order auto- covariance is given by E(ttt - ttt _ .)
(ttt _ . - ttt- 9.) --- - rra,;. All higher order autocovariances are zero, given
that the ut's are independently distributed� Thus for t = 1, 2,..., T the
covariance matrix for the first order moving average error term in (7.44) is
E(u - Au_0(u - :u-0' = aG, with (7�45) G = 71 + 2t a -: 1 + 2t a 1 + :a all
entries not shown are equal to zero. The joint pdffor y' = (yx, Ya,..., Y,),
given Y0, isaX (7�46) go) oc or T 1 exp -'a (Y - ?Y _x - ax)'G-X(y - Ay_x
- ax)], where y_'= (Yo, Y, Ya,..., Y,-0 and x'= (x, xa,..., x,). Our diffuse prior
pdf for the parameters is 1 (7�47) p(:, a, o) oc -. Then the posterior pdf for
the parameters is given by (7.48) [ 1 -ax)'G-(y ;y_ ax)] p(A,., IY, Yo) oc r�
exp --- (y - :y_ - - � We can simply integrate (7.48) with respect to, to
obtain (7.49) p(?t, a[y, Yo) oc [(y - Ay_ - ax)'G-(y- 2ty_ - ax)] 'ta' which is
the joint posterior pdf for the parameters A and a. Employing ax This
analysis is similar to that in Thornber, "Application of Decision Theory to
Econometrics," 1oc. cit., except that our matrix G differs slightly from the
one he uses.

202 TIME SERIES MODELS: SOME SELECTED EXAMPLES bivariate


numerical integration techniques, the normalizing constant can be evaluated
and the bivariate pdf employed to make joint inferences about and . The
marginal posterior pdfs for and can also be obtained numerically. �.�. As
an elaboration of (7.43) we may entertain the hypothesis that our data are
generated by = + {=0 with a, ;, Yt, and xt-6 as defined above. Here we
assume that the response to current and lagged disturbance terms takes the
same form as that to current and lagged x's and involves the same parameter
;. Subtraction of Yt- from both sides of (7.50) yields (7.51) y, = hyt- + axt +
ut. Now, if we make the standard assumptions about the u[s, as above, and
assume that we have t = 1, 2,..., T observations, then, with Y0 given, the
likelihood function is 1 [ 1 -.x)'(y hy_x - .x)]. (7.52) l(h,e,,yly, y0) cr exp --
(y - hy_x - Using the prior assumptions in (7.47) to form the joint posterior
pdf and integrating with respect to ,, we have (7.53) P(h, '4Y, Y0) oc [(y .- ?
ty_x - ax)'(y - ?ty_x - ex)] -v/a, which would be in the form of a bivariate
Student t pdf were it not for the fact that we have assumed 0 < < 1. With
this restriction (7.53) can be analyzed numerically to make joint inferences
about and and to obtain the marginal posterior pdf's a3 for and for ). On
some occasions we may wish to broaden the model to accommodate the
possibility that the u[s in (7.51) are possibly autocorrelated; for example,
suppose that (7.54) ut = put-x + t, t = 1, 2,..., T, where the t's are normally
and independently distributed, each with mean 9.9. As regards the marginal
posterior pdf for ,, it can alternatively be obtained from (7.49) by
completing the square on a in the denominator and using properties of the
univariat� Student t pdf to integrate with respect to a analytically. The
result is a univariate posterior pdf for , which can be analyzed numerically.
aa Alternatively, the marginal pdf for , can be obtained by completing the
square on in (7.53) and integrating with respect to it by using properties of
the univariate Student i! tpdf. "DISTRIBUTED LAG" MODELS 203 zero
and common variance r ', and p is a parameter of the first order auto-
regressive scheme assumed to generate the u[s. Then, on combining. (7.51)
and (7.54), the result is (7.55) Yt = (, + P)Yt-x - phyt_a + a(xt - PXt-O + ,.
An alternative way of arriving at (7.55) is to assume . that the disturbance ut
- ;u_x in (7.44) satisfies (7.56a) ut- AUt-1 = P(Ut-z -- AUt_.) + e t or
(7.56b) ut = (p + ;)ut-x - p;ut_. + et, a fairly general second order process.
On combining (7.56) with (7.44), we are led to precisely the equation
shown in (7.55). Thus it appears that the assumptions leading to (7.44),
coupled with those in (7.56), are equivalent to those underlying (7.50,
coupled with those in (7.54). To analyze (7.55)s we adopt the following
diffuse prior pdf: (7.57) p(h, a, p, r) oc _1, where -oo < p, a < o, 0 < h < 1,
and 0 < r < oo and r �' is the common variance of the q's. Given these prior
assumptions, the assumptions about the q's in (7.55), and two initial values,
Yo and y_ x, the posterior pdf for the parameters is p(h,,p, rly )oc 1 ( 1 v
(7.58) r-- exp -- ]r t= [Yt -- 'Yt- x -- where y' -- (y_ x, Yo, Yx, �.., Yv).
Integration with respect to r yields - TI2 �(h, , PlY) oc (7.59) - eye- - - Py,-
- (xt - ex_ O] It is seen from the second line of (7.59) that the conditional
pdf for We subtract PYt-x = pAyt_ + Pext-x + PUt-x from (7.51) to obtain
(7.55). The following assumption is utilized in W. A. Fuller and J. E.
Martin, "The Effects of Autocorrelated Errors on the Statistical Estimation
of Distributed Lag Models," J. Farm Econ., , 71-82 (1961), and A. Zellner
and C. J. Park, "Bayesian Analysis of a Class of Distributed Lag Models,"
Econometric Ann., Indian Econ. J., 13, 432-444 ::(1965). ao This analysis is
presented in A. Zellner and C. J. Park, 1oe. eit.

204 TIME SERIES MODELS: SOME SELECTED EXAMPLES given t,


would be in the bivariate Student t form were it not for the fact that we have
assumed that 0 < ; < 1. Below we show how sensitive inferences about ; and
a are to what is assumed about p by analyzing the conditional pdf, p(;, alt,
y), numerically. From the first line of (7.59) we see that we can complete
the square on and integrate with respect to it analytically, an operation that
yields p(A, IY) (Yt-, - Ay,_e - x,_x)el (7.60) x Z (y - y_ - axt) IX (Yt - hyt-x
- axt)(yt-x - hyt-a - axt-x)la} - Z - - 0 ' To determine how sensitive
inferences are to incorrect assumptions about the parameter p we have
computed the conditional posterior pdf's, p(lo, y) and P(A[po, Y) [see the
second line of (7.59)], for selected values of po from the 6 3 2 I I I I I I I T
= 30 T = 20 5O 2O 10 I I 0.75 2.6 2.7 2.8 2.9 3.0 3.1 3.2 3.3 3.4 0,85 0.90 T
= 40 T = 30 fT = 20 I\.k--" r = I 0.80 Figure 7.4 Marginal posterior
distributions for a and ; computed from data generated with p = -0.5.
"DISTRIBUTED LAG" MODELS 205 6 5- 4- 3- 2- 1- I I I T= 20 T= 30
T=15 20- - 15 _T = 30 T = 20-,,,,,,& 10- - 2.6 2.8 3.0 3.2 3.4 0.70 0.80 0.90
Figure 7.5 Marginal posterior distributions for a and A computed from data
generated with p = 1.25. same data (T = 20) used to compute the
distributions in Figures 7.4 and 7.5. These conditional pdfs are shown in
Figure 7.6 for just the data set generated with p = -0.5. The sensitivity, with
respect to location and spread, of these posterior pdfs to what is assumed
about p is striking28 It is vividly seen that an incorrect assumption about p
can vitally affect an analysis. Thus, when it is suspected that p differs from
zero, we recommend that the marginal posterior pdf's for and 2, rather than
the conditional pdf's be used to make inferences. Last, we note that since we
have the joint posterior pdf for and A; P(a,, AlY), it is not difficult in
general to derive the distribution of a function .. of the variables a and 2; for
example, in some problems interest centers on 'the "long-run" quantity ,/=
a/(1 - 2 0. To derive the posterior pdf for ,/ we change variables in p((, AlY)
from and A to ,/and A. The Jacobian of this i: transformation is 1 - A, which
is different from zero for 0 < A < 1 Thus aa The same result applies to the
conditional posterior pdf's based on the second data i..sOt for which the true
p = 1.25. In fact, greater sensitivity is shown in this case.

206 TIME SERIES MODELS: SOME SELECTED EXAMPLES


APPLICATIONS TO CONSUMPTION FUNCTION ESTIMATION 207
Figure 7.6a Conditional posterior distributions for computed from data
generated with p -- -0.5. Figure 7.6b Conditional posterior distributions for
A computed from data generated with p --- -0.5. the marginal posterior pdf
for , say g([y), is given by performing the follow- s[ I I I I ing integration
with respect to ; numerically' (a) = -o.s where c is a normalizing constant
that can be evaluated numerically. Having a g(*/lY), we can use it as a basis
for making inferences about the "long-run" o = 0.0 t t -- parameter ,/= /(1 -
). ,_-o.,/-,x/ / 2 ' 7.5 APPLICATIONS TO CONSUMPTION FUNCTION
ESTIMATION �'� 12 Let (7.62) C = k Yt* + ut � be our consumption
function, where, for the,t, th period, t = 1, 2,..., T, Ct is measured real
consumption, Y** is "normal real income, k is a parameter whose value is
unknown, and ut is an error term. Assume that "normal" income satisfies
(7.63a) Y* - Y*_ = (1 - )(Yt - Y*_ ) or from (7.63a), on successive
substitution for lagged values of Y**, (7.630) Y* = (1 - ,X)(Ye + ,XYe_ +
,XuYt_o. +...+ ,VYe_, +...), where the parameter ; is assumed to have a
value such that 0 < ; < 1. On combining (7.62) and (7.63b), we obtain 3�
(7.64) C = C_ + k(1 - )Y + u - ut-, which is the basic equation analyzed
under the assumption that Yt is an exogenous variable; that is, we abstract
from "simultaneous equation" complications. As regards the disturbance
term in (7.64), ut- hut-x, we entertain the 0.70 0.80 0.90 .0 following
assumptions: Assumption I. ut - ,ut- = qt, t = 1, 2,..., T, with the qt's
normally and independently distributed, each with zero mean and common
variance ,xo.; that is NID(0, ,xa). a0 This section draws on results in A.
Zellner and M. S. Geisel, "Analysis of Distributed Lag Models with
Applications to Consumption Function Estimation," invited paper presented
to the Econometric Society meeting in Amsterdam, September 1968, and to
appear in Econometrica.' a0 Substitute from (7.63b) in (7.62). Then subtract
ACt_ from both sides to obtain (7.64).

208 TIME SERIES MODELS' SOME SELECTED EXAMPLES


Assumption II. The ut's in (7.64) are NID(0, Assumption llI. The ui's in
(7.64) satisfy a first order autoregressive scheme, ut = put-x + 3t, with the
%[s NID(0, aaa). Assumption IV. The error term in (7.64) satisfies ut- ;ut-x
= y(ut-x- ;ut-a) + q, with the q[s NID(0, It should be noted that if the
parameter p in Assumption III were equal to ; III would be
indistinguishable from I. Similarly, if p = 0, III and II would be equivalent
assumptions. Also, if y = 0 in IV, IV would be equivalent to I. We now turn
to the analysis of (7.64) under Assumptions I to IV by using U.S. quarterly
price-deflated, seasonally adjusted data on personal disposable income and
consumption expenditures, 19471-1960IV? Under Assumption I the joint
pdf for the observations is a (7.65) p(C[,k, 0 oc-- exp 2o. [C- )Ct- - k(l -
)OYt] �' , 0'1T where C' = (C, Ca,..., C�). With respect to prior
assumptions about the parameters ;, k, and r, we assume that ) 0<A,k< 1,
(7.66) p(A, k, ) oc 0 < < oo. In (7.66) we assume that the parameters are
independent, with uniform pdf's a3 on ;, k, and log a. Note that the prior
information 0 < ; < 1 and 0 < k < 1 is being employed. On combining (7.65)
and (7.66) and integrating with respect to r, the joint posterior pdf for ; and
k is (7.67) p(,klC) oc [c - c_x - k(1 - )Yt] a , o < ,k < 1. It is interesting to
observe that the conditional posterior pdf for , given k, and the conditional
posterior pdf for k, given , are in the form of truncated univariate Student t
pdf's. Since these pdf's are truncated, analytical deriva- tion of the marginal
posterior pdf's for ; and k is complicated. In view of this fact bivariate
numerical integration techniques were employed to obtain marginal pdf's by
using the U.S. quarterly data, 19471-1960IV, referred to above. Some
features of the posterior pdf's are described in Table 7.3. From ax The data
are given in Z. Griliches, G. S. Maddala, R. Lucas, and N. Wallace, "Notes
on Estimated Aggregate Quarterly Consumption Functions," Econometrica,
30, 491-500, pp. 499-500 (1962). aa Throughout this section we go forward
conditional on given initial observations; for example, Co in (7.65). aa See
below for use of nonuniform prior pdf's for , and k. APPLICATIONS TO
CONSUMPTION FUNCTION ESTIMATION 209 the results it is seen that
the posterior pdf for k is rather sharp, whereas that for ; has a much greater
spread. Also, the results support the belief that ; has a value markedly
different from zero. 3 Table 7.3 POSTERIOR MEASURES ASSOCIATED
WITH THE MARGINAL POSTERIOR PDF'S OF (7.67) BASED ON
ASSUMPTION I Posterior Marginal Posterior Marginal Posterior Measure
Pdf for , Pdf for k Mean 0.759 0.959 Modal Value 0.78 0.95 Variance
0.0074 0.00021 Next we take up the analysis of (7.64) under Assumption II
about the error terms. Under this assumption the joint pdf for the
observations is k, oc [al { (7.68) a . exp - 1 2,aa [C- ,C_z- (1 - ,)kY]'G -t x
[C- ,C_x- (1 - ,)kY]), where C_ ' = (Co, C,..., C,_ 0, Y' = (Y, Y.,. �., Yr),
and G is the band matrix shown in (7.45). As regards prior assumptions
about the parameters, we employ (7.66), with ax replaced by aa. Using
Bayes' theorem to combine (7.66) and (7.68) and integrating with respect to
ra, we have the following as the joint posterior pdf for A and k under
Assumption II' (7.69) lc) oc ([C- IC_- (1 - )QkY]'G-x[C- C_- (1 - )kY]}
with 0 < I, k < 1. This posterior pdf was analyzed numerically by using
bivariate numerical integration procedures a and the quarterly data, 19471-
a, If ,t = 0, from (7.63a) we have Y** = Yt and thus (7.62) becomes C = k
Yt + ut, which is a form of the absolute-income hypothesis. Finding ,t : 0 is
thus important with respect to assessing the empirical validity of the
absolute-income hypothesis. a For each value of , the method given in R. S.
Varga, Matrix Iterative Analysis, Englewood Cliffs, N.J.: Prentice-Hall,
1963, p. 195, was used to compute the denomina- tor of (7.69). In A.
Zellner and M. S. Geisel, 1oc. cit., the following approach was also used to
analyze the model under Assumption II. With *h = Ct - ut, we have, from
(7.64), */t = A*/t- + k(1 - A)Y t = ;t,/o + (I - )t)kZt(A) and then Ct --t*1o +
(1 - ,)kZ,(A) + ut, where Zt(A) = Yt + A Yt_i +...+ - iYx. With the ut's
NID(0, aaa), the likelihood function for ,, k, o, and ,a can be formulated and
combined with a prior pdf for the parameters. Note that this approach
involves use of a parameter, %, associated with initial conditions.

210 TIME SERIES MODELS: SOME SELECTED EXAMPLES 1960IV;


the results are given in Table 7.4. In this instance the marginal pdf's for the
parameters were found to be quite different from those encountered with
Assumption I about the error terms. In fact, the marginal pdf's were found
to be bimodal, a6 and use of the prior information 0 < ;, k < 1 pro- duced
serious truncation of the posterior pdf's. These results point up the fact that
assumptions about error terms can influence the results of an analysis
significantly. In the present instance there is evidence that Assump- tion II,
in combination with the other assumptions embedded in our model, is not
supported by the data? Table 7.4 POSTERIOR MEASURES
ASSOCIATED WITH THE MARGINAL POSTERIOR PDF'S OF (7.69)
BASED ON ASSUMPTION II Posterior Marginal Posterior Marginal
Posterior Measure Pdf for , Pdf for k Mean 0.508 0.948 Modal Value 0.38
and 0.90 0.94 and 1.0 Variances 0.0643 0.0004 In the analysis of (7.64)
under Assumption III it is convenient to note that, with */s = Ct - us, (7.64)
can be expressed as */t = :*/t- + k(1 - :)�s, or - to*/t- = (*/t- - to*/s-2) + k(l
- )(Ys- t o Ys-0, which leads to (7.70a) ,/,(to) = A%/o' + (1 - A)k[Y(to) +
AYs-(to) +'" + A- or (7.70b) where toC_, and Zt(to,)O = Y(to)+ Y-(to)+"'+ -
Y(to)' Then the joint pdf for the observations under Assumption III is
APPLICATIONS TO CONSUMPTION FUNCTION ESTIMATION 211
which, when combined with (7.71), yields the joint posterior pdf for the
parameters. The posterior pdf can be integrated analytically with respect to
%', to, and % to yield the following marginal bivariate posterior pdf for :
and k: [R,Ri-A ) 0 < ,X < 1, (7.73) p(:, klC) oc [(v- R)'(v- R)I {"�'>/2 0 <
k < 1, where v' = (v, vo,..., vu..., v.), with vs = Ct - (1 - :)k Y.:0 :'Y_, R is a T
x 2 matrix, with tth row = -- -- Y4--o Yt-- x], and = (R'R)-R'v. The pdf in
(7.73) was analyzed numerically; the results �6 are given in Table 7.5.
Table 7.5 POSTERIOR MEASURES ASSOCIATED WITH THE
MARGINAL POSTERIOR PDF'S OF (7.73) BASED ON ASSUMPTION
III Posterior Marginal Posterior Marginal Posterior Measure Pdf for , Pdf
for k Mean 0.597 0.878 Modal Value 0.61 0.94 Variance 0.0338 0.0403
Next,' we take up the analysis of (7,64) under Assumption IV about the
error terms. The joint pdf for the observations is given by �9 (7.74) p(ClA,
k, y, ) oc -- exp e'e ' 2 2 ' cs(to) = + (1 - to) + t(P) t pt-x, o(P) = o' Yt(p) = Yt
-- pYt-:, Ct(p) = Ct - where e = C - AC_: - (1 - A)kY - y[C_: - AC_ - (1 -
A)kY_:], = - ' T x I vector. The following prior pdf was employed in the
present case: [G(P)- :%'-(1 - )kZs(, p)]}. (7.71) p(CI:, k, O'3 T t= 1 As prior
pdf we use (7.72) p(A, k, 0<,k<l, 0 < c3 < O0, --OO < */o', to < O0, a6
Zellner and Geisel, 1oc. cit., found that the likelihood function was also
bimodal with a global maximum at , = 0.963 and k = 1.129. This large value
for k is indeed hard to believe and probably arises because Assumption II is
inappropriate. a7 See Chapter 10 in which posterior probabilities associated
with the model under various assumptions about error terms are presented.
a } 0<,k<l, 1 (7.75) p(2, k, y, ) oc 0 < < oo, -oo < y < oo. On combining
(7.74) and (7.75) by means of Bayes' theorem, the posterior pdf for the
parameters can be integrated analytically with respect to , and , to yield the
following bivariate posterior pdf for : and k: {IL - - (1 - (7.76) klC ) oc
{ZL- [G - - (1 - )OkYt ' ' - - - (1 - rd'} as See the Zellner-Geisel paper for
additional discussion and plots of these posterior pdf's. 39 Thais analysis is
similar to that presented for (7.44) combined with (7.56).

212 TIM SERIES MODELS: SOME SELECTED EXAMPLES where 0 <


2`, k < 1 and where The posterior pdf in (7.76) was analyzed numerically
by employing U.S. quarterly data. The results are broadly consistent with
those above which show 2` to be quite different from zero. However, the
mean for 2` in Table 7.6 is somewhat different from those already reported,
thus indicating that results are somewhat sensitive to what is assumed about
the properties of error terms. Table 7.6 POSTERIOR MEASURES
ASSOCIATED WITH THE MARGINAL POSTERIOR PDF'S OF (7.76)
BASED ON ASSUMPTION IV Posterior Marginal Posterior Marginal
Posterior Measure Pdf for Pdf for k Mean 0.769 0.959 Modal Value 0.81
0.95 Variance 0.00937 0.00183 We have worked with relatively diffuse
prior pdf's and thus have permitted our posterior pdf's to reflect mainly the
information in the data. To illustrate how information in the data affects
other than relatively diffuse prior beliefs assume that individuals A and B
have differing prior beliefs. Both agree that 2` and k are a priori
independent but disagree with respect to the value of 2,. Assume that their
prior pdf's are 4� (7.77) PA(2`, k, ,x) oc 2`a4(1 - 2`)8�kS�(1 - k)S} 0 <
2`, k < 1, % 0 < % < for A and (7.78) ps(A, k, ,) A(1 - A)kS�(1 - k)�.} 0
< A, k < 1, for B. Both A and B use the same prior assumptions regarding k
and ,. For k their prior pdf is a beta pdf with mean 0.9 and variance
0.00089, whereas for , they employ the same diffuse pd[ As regards both
use a prior pdf in the beta form but with different prior parameters. A has
chosen the parameters of his prior pdf for A in (7.77) to provide a prior
mean equal to 0.7 and prior variance 0.0041; B has assigned his prior
parameters values such that the prior mean of A is 0.2 and its variance is
0.0146. 4o We shall assume that A and B are working with (7.64) under
Assumption I about the error terms. SOME GENERALIZATIONS OF THE
DISTRIBUTED LAG MODEL 213 When we combine the prior pdf's in
(7.77) and (7.78) with the likelihood function in (7.65), we can see how the
information in our U.S. quarterly data changes the prior beliefs of d and B,
as represented by (7.77) and (7.78). In particular, on multiplying (7.65) and
(7.77) or (7.78) and integrating analy- tically with respect to ,, we arrive at
bivariate posterior pdf's for 2` and k for d and B. These posterior pdf's were
analyzed numerically with the results shown in Table 7.7. Table 7.7
MARGINAL POSTERIOR PDF'S FOR 2` AND k FOR PRIOR PDF'S
FOR A (7.77) AND B (7.78) UNDER ASSUMPTION I a Marginal
Posterior Marginal Posterior Pdf for A Pdf for k Posterior Measure A B A B
Mean 0.704 0.581 0.947 0.941 Modal Value 0.71 0.59 0.95 0.94 Variance
0.0025 0.0066 0.00004 0.00003 a The prior means and variances are EA =
0.7, VaT A = 0.0041 for A and EA = 0.2 and VaT A = 0.0146 for B. Both A
and B have assumed a priori that Ek = 0.9 and VaT k = 0.00089. The
information in Table 7.7 reveals that the sample information has
moved'both A and B toward a higher value for the parameter k as compared
with prior expectations. Also, the sample information has resulted in con-
siderable reduction in the variance of the pdf for k. As regards 2`, the views
of A and B are brought more in agreement by the sample information. A
priori, A gave 2` a mean of 0.7, whereas B assigned a value equal to 0.2.
Their posterior pdf's for 2` have means equal to 0.704 and 0.581,
respectively. Thus the common sample information, when combined with
the prior pdf's of A and B has diminished the difference of
opinion,regarding the value of 2`. 7.6 SOME GENERALIZATIONS OF
THE DISTRIBUTED LAG MODEL Many generalizations of the simple
distributed lag models can be analyzed. Here we take up just two. The first
involves the assumption that the data have been generated as follows:
(7.79a) y = t: 2`'xt_,,! + flo. 2`'xt_,, +... + fl, 2`'xt_,,, + ut t=0 t=O t=O or
(7.79b) y = h'X(t- i)p + t=0

214 TIME SERIES MODELS' SOME SELECTED EXAMPLES where


X(t- i)= (xt-.z, xt_.,...,xt_,), a 1 x k row vector, 15'= (/x,/%.,...,/), a 1 x k
vector of unknown coefficients, and ut is a dis- turbance term. In (7.79a, b)
we assume that yt is influenced by k variables, each with a specific
coefficient but with a common form and parameter for the lag structure. On
subtracting Aye_ x from both sides of (7.79b), we obtain (7.80) We now
assume that the disturbance term ut - hut-x satisfies (7.56), and on
combining (7.56) with (7.80) we have (7.81) Given that we have t = 1, 2,...,
T observations, that the q's satisfy the standard assumptions, that the initial
values Yo and y_ x are taken as given, and that our prior assumptions are
(7.82) i= 1,2,...,k, 0 < where v a is the variance of the [s, the posterior pdf
for the parameters is (7.83) p(h, , , vIY) < exp - [w - (X - eX_x)]'[w - (X -
eX-x)] � Here we have written (7.84) w where. y' = (yx,..., Yv), Y-x' = (Yo,
Y,..., Yv-D, Y-a = (y-x, Yo, Y,..., y-), X= : i : and X-x= : ! i � \X, x X, X,c/
\X,_ x,x x,_ x,9. X,_ x, From the form of (7.83) it is apparent that the
elements of 15 and r can be integrated out analytically. On integration with
respect to r, the following is the result: (7.85) p(h, 15, PlY) oc {[w - (X -
pX_015]'[w - (X - pX_x)15]} -ri�' Then, using properties of the
multivariate Student t pdf, we can integrate 4: with respect to 15, which
gives This assumes that (X - pX_ x)'(X - pX_ x) is positive definite for any
fixed value of See the appendix to Chapter 4 for the condition that ensures
this, a condition on the x's that is not very restrictive. SOME
GENERALIZATIONS OF THE DISTRIBUTED LAG MODEL 215 (7.86)
p(h, PlY) oc [Hl-'{w'[I - (X- where H = (X- pX_x)'(X- pX_x). The bivariate
posterior pdf in (7.86) can be analyzed numerically to evaluate its
normalizing constant, to make joint inferences about p and , and to obtain
marginal posterior pdf's for these parameters. As regards an element of 15,
say/gx, its posterior pdf can be obtained from (7.85) by integration with
respect to /ga,/a,...,/g to yield p(2,,/gx, PLY), a trivariate posterior pdf that
will have to be analyzed numeri- cally to obtain the marginal pdf for The
second generalization of the model to be considered is (7.87) Yt h'xt_, +
Z(t)� + u' except for the addition of the term Z(t)�, this is the model
considered previously. In (7.87) Z(t) is a 1 x k vector of observations on m
given independent variables and � is an m x 1 vector of coefficients.
Subtraction of hy-x from both sides of (7.87)yields (7.88a) Yt - hYt-x = ex,
+ [Z(t) - hZ(t - 1)]y + u,- hu,_x or = W15+u-hu_x, where y' = (Yx,..., Yv),
Y-x' = (Yo, Yx,..., Yv-x), x' = (xx, xa,..., xv), Z is a T x rn matrix of
observations on the given independent variables, W = (xiZ - LZ_ x), and 15'
= ( !�'). If we assume that the u[s are normally and independently
distributed with zero means and common variance and go ahead,
conditional on given Yo, the likelihood function is given by (7.89) [ ' ] v
exp -(y- hyx- W)'G-X(y- hy_x- W) , where Ge �' is the T x T variance-
covariance matrix for u - hu_ x, with G given explicitly in (7.45). With a
diffuse prior pdf, (7.90) 15, .) oc - O<A<I} oo < < oo 0<rr<oo i = 1, 2,..., k,
the joint posterior pdf is given by (7.91) p(h, 15,*ly, yo) oc IG[ - [ 1 (w -
W15)'G-(w - W15)] v+x exp -,-, ,

216 TIME SERIES MODELS: SOME SELECTED EXAMPLES where we


have written w = y - ;y_. Clearly (7.91) is similar to the expres- sion in
(7.48). It can be analyzed as follows: first integrate (7.91) with respect to a,
which yields (7.92) �(A, laJy, Y0) oc JGJ-[(w - WIS)'G-(w - WIS)] Note
that, for given A, the conditional posterior pdf for la is in the multi- variate
Student t form. This fact can be used to determine how sensitive inferences
about la are to what is assumed about A. To derive the marginal posterior
pdf for A we integrate (7.92) with respect to la to obtain 4 (7.93) where, it is
to be remembered, w = y - ;y_ and (7.94) = (W'G-W)-XW'G-w, which is a
function of ;. The univariate pdf in (7.93) can be analyzed numerically.
Finally, if interest centers on an element of la, say fix, the other elements of
la can be integrated out of (7.92) by using properties of the multivariate
Student t pdf to yield a bivariate posterior pdf, �(;, fi]y, Y0), which can be
analyzed numerically. To close this chapter it is important to emphasize that
the general prin- ciples of Bayesian analysis apply in the analysis of time
series models with no special modification required. Of course, it is
imperative that care be taken in formulating time series models to get an
adequate representation of economic phenomena; for example, to
approximate lag structures adequate- ly, to use appropriate assumptions
about initial conditions, and to employ appropriate functional forms. Given
that this representation has been made and that the likelihood function can
be formulated, as stated above, the analysis will proceed along usual
Bayesian lines. Point estimates, intervals, etc., can be derived by using the
principles presented in Chapter 2. APPENDIX DIFFUSE PRIOR PDF FOR
STATIONARY AUTOREGRESSIVE PROCESS In Section 7.1 we
considered the first order stationary autoregressive process given by Yt = fi
+ fiYt- + ut (1) fix ' - 1 - fi + fi2Zut-t' t= 1, 2,..., T, /=0 42 Note that (w -
W[)'G- X(w - W) = w'G- Xw + ' W'G- W - 2' W'G- Xw = w,G-Xw _ ,W'G-
XW + ( - )'W'G-W( - ), where is defined i- (7.94). APPENDIX 217 where
the ut's NID(0, o ) and 0 < I/e[ < 1. It is clear from (1) that 0 = fix/(1 - o.)
gives the level of the yt's. In what follows we reparameterize the model in
terms of fi and a; (2) = fie, 0 = 1 -/. that is (3) yt = (1 -/g)0 +/gyt_x + ut. The
likelihood function is (1 - l(fi, 0, *IY, Y0) oc orT+l (4) exp (- 1 [(1 - fi�')(y
o - 0) 2 + y. {y,- 0 - As will be recalled from Chapter 2, Jeffreys has
suggested that the square root of the determinant of the information matrix
be taken as a diffuse prior pdf, although not without great care and thought;
that is, Jeffreys' diffuse prior is given by (5) with (6) Inf = -E p(/, O, .) oc
IInfl, L OL OL - 0% O0 Off O0 ' _ where E denotes the expectation with
respect to the pdf for the observations and L = log l, the log-likelihood.
After (5) has been evaluated we can trans- form to the implied prior pdf for
fix, fia, tnd , using the relations in (2). To illustrate the operations that are
required to obtain Jeffreys' diffuse prior for this problem we first evaluate
the information matrix. From (4), L = const + � log (1 - fi) - (T + 1) log (7)
1 -2, {(1 - fi�')(y o - 0) ' + [Yt - 0- fi(Yt-x - 0)]�') � Then, with ttxe
quantity in braces in (7) denoted {. }, we have OL T+I+ 1 0"7 = {'}, 0% T+
1 3 = ,,, {'},

218 TIME SERIES MODELS: SOME SELECTED EXAMPLES and OL 1


(_ 2/9(y � _ 0) 2 _ 2 o) - - - o)). On taking expectations with respect to the
y's, we obtain _ 02L (S) = 2(T + 1). E O'L 02L 2 /9 rr 2 ' 'rr=O; and E0t rrl--
In deriving the results in (8), note that E(y,- 0)'= 2/(1- fi2) and E(yt - O)
(yt_x - 0) =/32/(1 -/92) for all values of t. Further aL = 1 (-2(1 -/92)(yo- 0)-
2 00 2o a a'L 1 /32 T(1 002 = - [1.- + - ,8)21, [y,- 0 -/9(Yt-x - 0)](1 -/9)},
and O.L 0/9 00 1( 2g a 4/9(yo - 0) + 2 [y- 0-/9(Yt-x - O) + (1 -/9)(Yt-x -
0)1). Taking expectations of these last two quantities, we obtain these
results' O'L 1 /9 ' T(1 - E 0/3 0-'--- = 0. (9) E- = - [1 - + /9) 2 ] and Last and
aL /9 1( 2. 2 -2/9(yo - 0) 2 - 2 [y,- 0 -/9(yt_ - 0)](yt_ - 0)), 1+/32 1[ ] (1 -
fi2)2 + (Yo- 0) 2-(yt-- 0) 2 ' The expectation of this last expression is _ 02L
1 ( 2/92 . (10) e--fi = 1 -/92 r + 1 -/321 APPENDIX 219 On collecting
results from (8) to (10), we evaluate the information matrix in (6) as . /9 0
2(t + 1) 2/9 0 o o *(1 -/92) 2/9 r + 2/92/(1 - /92) (11) Inf = rr(1 - rio.) 1 -/90.
0 /9 0 0 T(1 -/9)' + 1 -/92 0 From (11) we see that information about 0 is
independent of information about either. or/9. On evaluating the
determinant of the matrix in (11) and retaining just the dominant terms,
those in T D, we have the result 2(1 -/9)'/(1 -/92).4. On taking the square
root of this quantity, we obtain an approximate Jeffreys' diffuse prior,
namely, 1-fi 1 0<1/91<1. (12) p(fi, O, ) oc (1 -/32)� �ri' - The appearance
of the factor 1/ ' in this last expression rather than 1/ appears to result from
reasons similar to those discussed at the end of the appendix to Chapter 2?
If we follow Jeffreys and apply his principle separately for and for the other
parameters, that is, take the square root of the (1, 1) element of (11) to
obtain a diffuse prior for (r and then take the square root of the determinant
of the 2 x 2 information matrix for/3 and 0, retaining just terms in T ', the
result is (13) V(fi, 0, ) oc .(1 -/32) ' wherein the factor 1/ rather than 1/ '
appears. To obtain the prior for fi, fi., and note from (2) that dO = d/9/(1
-/92) and thus (14) P(fix, fi2, rr) oc (r(1 -/922)'A This approximate Jeffreys'
prior involves assuming that/9x, log , and/9. are independent, with the first
two quantities uniformly distributed and the last with a pdf proportional to
(1 -/322) - = (1 -/92)-�(1 +/92) -, a form of the beta pdf with parameters
(�, �). This pdf for/92 has its greatest density at the end points/92 = - 1
and/92 = 1 and a minimum at/92 = 0. .40 Here, if we had more than one 0,
for each additional 0, an additional 1/ factor would appear in the prior pdf.

220 TIME SERIES MODELS: SOME SELECTED EXAMPLES It is


interesting to ascertain the form of the "minimal information prior" (see the
appendix to Chapter 2) for the present problem. The pdf for an observation
is P(Yl, O, ,) = 1 (1-ri�') � Then the information in the data pdf is Iv = f o
P(Y[fi' O, rr) log p(y[, O, rr) dy log (1 - = � - � log 2rr. If we maximize
the prior average information in the data pdf minus the information in a
prior pdf, subject to the condition that the prior pdf be proper, the result is
(1 -/) (15) p(fi, 0, .) oc , 0 _< I/1 < 1. This prior pdf for fi, 0, and, involves
similar properties with respect to 0 and, as that in (13). 44 The form for fi is
somewhat different, however, in that (15) has a unimodal form with a
maximum at/ = 0 and falls to zero at fi = + 1. The factor 1 - fi in (13)
behaves in a similar fashion to (1 - (1 - fi)/(1 - rio.)�, however, has
decidedly different properties. QUESTIONS AND PROBLEMS 1. Assume
that yt is generated by a first order linear autoregressive scheme; that is, yt =
pyt-x + q, t = 1, 2,..., T, where the q's are NID(0, o). Specify a prior pdf for
p and a in the normal-gamma form; that is, p(p, ) = px(pl,)po.(,), with
px(p[,) oc 1/ exp [-(c/2,�')(p - )o.], -oo < p < oo, and po.(a) oc (,Vo+X)-x
exp (-VoSo�'/2,�'), 0 < < oo, where c, , Vo, and So �' are prior
parameters whose values are assigned by an investigator to reflect his prior
beliefs. (a) Given the above prior pdf for p and , what are the form and
properties of the marginal prior pdf for p ? (b) Assuming that yo is given,
derive the marginal posterior pdf's for p and and provide a summary of their
properties. 2. In Problem 1 formulate and plot a prior pdf for p which
reflects the prior beliefs that -1 < p < 1 and that the prior mean and variance
of p are 0.5 and 0L04, respectively. 44 Here we are assuming that the
ranges for a and 0 in (15) are extremely large. QUESTIONS AND
PROBLEMS 221 3. Show how the prior pdf in Problem 2 can be utilized in
an analysis of the linear autoregressive scheme described in Problem 1. 4.
Assume that the linear autoregressive scheme yt ----- PYt-x + gt, t = 1, 2, ...,
T, is known to have "started up" at t = - To, with To having a known value
and Y-to = 10. Derive the pdf of yo under these assumptions and with the
assumption that the q's are NID(0, ,2) for all t. 5. Consider the process yt =
pyt-i + ut, where ut = aut-x + et, p and a are parameters, and ut and t are
random error terms with zero means. Further assume that the t's are NID(0,
'). Note that we can write, yt - ayt- = p(Y t- - ay t- 0.) + t or Yt - pYt- = a(y
t- - pY t- 0.) + t. Explain the statement, "Without additional prior
information about and f, these parameters are not identified." Then provide
several examples of prior information about f and/or a that is sufficient to
identify these parameters. 6. Let yt = Y.["--o wxt-t + t, where w is an
unknown weight parameter for xt-, i = 0, 1, 2,..., m, where m's value is
known, xt-t denotes the value of an independent variable in time period t - i,
and t is a random error term. In the Aimon approach it is assumed that wt
can be approximated by a polynomial in i; for example, if we use a second
degree polynomial, we have wt = yo + yi + y2i 2, i = O, 1, 2,..., m, which
when substituted in the expression for yt above yields tO t=0 1--0 Now
assume that the q's are NID(0, ,0.) and that we are employing a diffuse prior
pdf, p(yo, yx, y0., *) oc l/a, with the y's ranging from -oo to +oo and 0 < , <
oo. Derive the joint posterior pdf for the y's, given sample observa- tions on
Yt, t = 1, 2,..., T and x-(,,-x), x_(_0.), . . ., Xo, xx, . . ., xv. What is the
marginal posterior pdf for yx ? 7. In Problem 6 derive the joint posterior pdf
for Wo = yo, wx = yo + yx + ya and w. = yo + 27x + 470. and explain how
to construct a 95% Bayesian confidence region for Wo and wx. 8. Often in
applying the Aimon technique, see Problem 6, it is assumed that w-x = yo -
y + 70. = 0 and w+x = yo + (m + 1)7x + (m + 1)�'72 = 0. To investigate
these assumptions, use the results of Problem 6 to derive the joint and
marginal posterior pdf's for w-x = yo - y + ),0. and w,+x = yo + (m + 1)yx +
(m + 1)2721 If the posterior probability density is very low in the vicinity
of the point w_ x = 0, what, if anything, does this imply about the
assumption that w_ x = 0 ? 9. Consider the following "multiplier-
accelerator" model: (i) Ct = Yt- + ut (ii) It = ft(Yt_ - Yt-o.) + vt t = 1, 2,...,
T, (iii) Yt = G + It where C = consumption, It = investment, Y = income,
and/ are scalar parameters, and ut and vt are random disturbance terms. By
substituting from

222 TIME SERIES MODELS: SOME SELECTED EXAMPLES (i) and


(ii) in (iii) write down the "final equation" for Ys. After providing
assumptions about the error terms' properties and a prior pdf for the
parameters, explain how to obtain the joint and marginal posterior pdf's for
a and fl. 10. Show how the posterior pdf for a and fl in Problem 9 can be
employed to make probability statements about properties of the solution to
the final equation for Yt obtained in Problem 9. 11. Let net expenditures on
housing in the tth period Es be assumed to satisfy Es = Ht- Hs-x = fl(Hs* -
Hs-x) + q, t -- 1, 2,..., T, where fl is an adjustment parameter, Hs-x is the
given stock of houses at the end of period t - 1, s is a random disturbance
term, and Hs* is the desired stock of houses for the tth period. Since H* is
unobscrvable, assume that H* = ao + axxs, where xt is an observable
independent variable and ao and ax are parameters with unknown values.
On substituting this expression for H* in the equation for Es, we have Es:
tiao + flaxs - flHt- + t : Yo + yxx + y2Hs-x + , t: 1, 2,..., T, where yo = tiao,
yx = flax, and y2 = -fl. Viewing these last three relations as a transformation
from yo, yx, and y2 to ao, ax, and fl, we need the Jacobian of the
transformation to be different from zero for the transformation to be regular
or "one-to-one." Evaluate the JacobJan and show that the required condition
is/' 0. Then, under the assumption that the t's are NID(0, o a) and given the
initial observation Ho, what are the ML estimators for yo, y, and y. ? From
the ML estimators for the y's, obtain ML estimators for ax, and fl and
comment on their sampling properties in large samples. 12. If, in Problem
11, we employed the following diffuse prior pdf, P(yo, yx, y, oc 1/, with -or
< y < oo, i = 0, 1, 2, and 0 < , < 0% derive and comment on the properties
of the implied prior pdf for ao, ax, fl, and ,, using yo = tiao, yx = flax, and y.
= -fl. Then obtain the joint posterior pdf for yo, yx, and y. Show that the
marginal posterior pdf for fl = -y. is in the form of a univariate Student t pdf.
Noting that the marginal posterior pdf's for ao = -- Yo/y. and ax = - Yx/Y.
are in the form of the ratio of correlated Student t variables, show that these
quantities will be distributed as the ratio of correlated normal variables in
large samples. 13. Write the equation for Et in Problem 11 as follows' Et:
fl(ao + alXs - Hs-1) + Explain how ML estimates of fl, ao, and ax can be
obtained by minimizing Y. tv_-x t', using a two-parameter search over
values of ao and ax. Will this procedure yield the same ML estimates
obtained in Problem 11 ? 14. In Problem 13 assume that the prior pdf for fl,
ao, ax, and a is given by p(fl, ao, ax, ) oc Px(fl)/a, 0 < a < oo, and -oo < at <
oo, i = 0, 1. Derive the conditional posterior pdf for ao and ax, given fl and
a. What condition on fl is required for this conditional pdf for ao and al to
be proper? QUESTIONS AND PROBLEMS 223 15. In Problem 11
suppose that we have no data on Et, net expenditures, but just on gross
expenditures, Gt--Es + Rs, where Rt represents replacement expenditures. If
it is assumed that Rs -- 8Ht_x, where 8 is a depreciation parameter, we have
Gs = tiao + fia.x + (8 - )m,-x + t. With 8's value unknown, are the
parameters of this equation identified'? Alternatively, if 8 has a known
value, say 8 = 80, explain how ML and Bayesian estimates of fl, ao, and ax
can be obtained. 16. In Problem 15 formulate an informative prior pdf for
the parameter and show how it can be utilized in deriving marginal
posterior pdf's for the parameters in the equation for G.

THE TRADITIONAL MULTIVARIATE REGRESSION MODEL 225


CHAPTER VIII Multivariate Regression Models In many circumstances in
economics and elsewhere we encounter sets of regression equations. What
is more, it is often the case that the disturbance terms in different equations
are correlated; for example, if we have a set of regression relations
pertaihing to firms in the same industry, it is probably the case that the
disturbance terms in one firm's regression equations are correlated with
those in the equations of other firms. x This may be so because firms in the
same industry generally experience common random shocks. Or, if we have
a set of consumer demand relations, it is the case that disturbance terms in
different demand relations are often correlated. ' It is important that
nonindependence of disturbance terms be taken into account in making
inferences. If this is not done, inferences may be greatly affected. In this
chapter we first analyze the traditional multivariate regression model. After
that, attention is directed to the interpretation and analysis of the "seemingly
unrelated" regression model, which is, in certain respects, somewhat more
general than the traditional model and has been used in quite a few
applications reported in the literature. 8.1 THE TRADITIONAL
MULTIVARIATE REGRESSION MODEL We assume that our
observations Y = (yx, y.,..., Y,), an n x m matrix of observations on m
variables, have been generated by the following model: (8.1) r = XB + U, x
This has actually been observed empirically. See below for an example. 2
See, for example, A. P. Barren, "Consumer Demand Functions under
Conditions of Almost Additive Preferences," Econometrics, 32, 1-38
(1964). * Two papers dealing with the analysis of this model from the
Bayesian point of view are S. Geisser, "Bayesian Estimation in Multivariate
Analysis," Ann. Math. Statist., 36, 150-159 (1965), and G. C. Tiao and A.
Zellner, "On the Bayesian Estimation of Multivariate Regression," J. Roy.
Statist. Sec., Series B, 26 (1964), 277-285. 224 where X is an n x k matrix,
with rank k of given observations on k inde- pendent variables, B = (isx,
ISm,..., ISm) is a k x m matrix of regression parameters, and U = (ux, u,.,...,
am) is an n x m matrix of unobserved random disturbance terms. We
assume that the rows of U are independently distributed, 4 each with an m-
dimensional normal distribution with zero vector mean and positive definite
m x m covariance matrix Z. Under these assumptions the pdf for Y, given
X, B, and Z is (8.2) p(rlX, B,Z) oc Izl-exp [-�tr(r- XB)'(Y - XB)Z-x],
where "tr" denotes the trace operation. Noting that (Y- xB)'(y- xB) = (r-
xt})'(y- xt}) + ( - B)'x'x( - t) (8.3) = s + (B- t})'X'X(B - t}), where (8.4) t =
(x'x)-x'Y, a matrix of least squares quantities, and (8.5) $ = (�- X/})'(_r-
X/}), a matrix proportional to the sample disturbance covariance matrix, we
can write the likelihood function for B and Z as follows: (8.6) I(B,I Y, X) oc
IZl -,' exp [-� tr Sy,-x _ � tr (B - t})'X'X(B - B)Z-x]. We assume that little
is known, a priori, about the parameters, the elements of B, and the m(m +
1)/2 distinct elements of Z. As our diffuse prior pdf, we assume that the
elements of B and those of Z are independently distributed; that is, (8.7)
p(B, Z) = p(B) p(Z). In (8.7), using the invariance theory due to Jeffreys, 5
we take (8.8) p(B) = const and (8.9) p(,) oc ICl -(m+ * This assumption
precludes any auto or serial correlation of disturbance terms. 5 See Harold
Jeffreys, Theory of Probability, Oxford: Clarendon, 1961, p. 179, and the
Appendix at the end of Chapter Two above.

226 MULTIVARIATE REGRESSION MODELS With respect to (8.9), we


note that in the special case in which m = 1, it reduces to 1 (8.10) p(,hl) oc ,
a prior assumption that we have employed many times. Also, it is
interesting to observe that if we denote e""' as the (t,/)th element of the
inverse of Z the Jacobian of the transformation of the m(rn + 1)/2 variables,
(fill, ff12,..., O'mm) to (an, ff12,.. ', o. mm), is I llm+l. (8.11) J = 0(-', a-,
amm) l = Consequently, the prior pdf in (8.9) implies the following prior
pdf on the rn(rn + 1)/2 distinct elements of Z-1: (8.12) a diffuse prior pdf
used by Savage, � who arrived at it through a slightly different argument,
and by others. In addition to Jeffreys' invariance theory's approach leading
to (8.12), Geisser 7 points out that (8.12) would result from taking an
informative prior pdf on Y.- 1 in the Wishart pdf form and allowing the
"degrees of freedom" in the prior pdf to be zero. 8 With respect to the
diffuse prior pdf's in (8.8) and (8.9) Geisser � also comments, "These
unnormed densities or weight functions presumably may be 'justified' by
various rules, e.g., invariance, conjugate families, stable estimation, etc., or
heuristic arguments. Although their utilization here does not necessarily
preclude other contenders which may also be conceived of as displaying a
measure of ignorance, it is our view that no others at present seem to be
either more appropriate or as convenient. The fact that their application
yields in many instances the same regions as those of classical confidence
theory is certainly no detriment to their use, but in fact provides a Bayesian
interpretation for these well established procedures." 8 L. J. Savage, "The
Subjective Basis of Statistical Practice," manuscript, University of
Michigan, 1961. ? S. Geisser, "A Bayes Approach for Combining
Correlated Estimates," J. Am. Statist. Assoc., 60, 602-607, p. 604 (1965). o
That is, if we take an informative prior pdf for E -x, p(-x)dY:-Xoc [y:,- x] -
vt2 x exp {-�tr[(rn- v + 1)A-t]}dY: -x, where A is positive definite, which
is in the Wishart form (see Appendix B), and let the degrees of freedom m -
v + I = 0, then v = rn + 1 and the Wishart form reduces to (3.12). With zero
degrees of freedom we have a "spread out" Wishart pdf which can serve as
a diffuse prior pdf in the sense that it is diffuse enough to be substantially
modified by a small number of observations. o Geisser, "Bayesian
Estimation in Multivariate Analysis," loc. cit. THE TRADITIONAL
MULTIVARIATE REGRESSION MODEL 227 With this said about the
diffuse prior pdf's in (8.8), (8.9), and (8.12), we combine them with the
likelihood function in (8.6) to obtain the following joint posterior pdf for
the parameters' (8.13a) x exp (-� tr [S + (B- B)'X'X(B-/})]Z -x) or (8.13b)
x exp {- � tr[S + (B- t})'X'X(B- From (8.13a) we can write with p(s, Y.I
�, x) = p(sly., �, x) p(Y.I �, x) (8.14) P(BIY" Y' X) oc lY.l-, exp [-( -
),y.-1 � X'X($ - 1)], where IS' = (lax', IS2',..., ISm'), ' = (1', 12',..., m'), and
� denotes Kronecker or direct matrix multiplication and (8.15) x) oc I1-,2
exp (-� tr Z-iS), withv=n-k+rn+ 1. It is seen from (8.14) that the
conditional posterior pdf for B, given Y., is multivariate normal with mean
and covariance matrix x� Y. � (X'X)-1. If interest centers on a particular
equation's ck>efficient vector, say 151, its conditional posterior pdf is (8.16)
1 (, ) p(isll y., Y, X) oc _- exp 211 (IS1 - i)'X'X(IS1 - 1) , which is, of
course, normal with mean vector 1 and covariance matrix (X'X)-ln. The
marginal posterior pdf for Z, given in (8.15), is in what Tiao and Zellner
call the "inverted" Wishart form. They show (op. cit., p. 280) that the
elements ofY. ll, the p x p upper left-hand principal minor matrix of Y., with
p < rn, xo Note that (-t � X'X)- ' = � (X'X)-, which can be verified by
direct matrix multiplication, and that I -x � X'X] A oc I]-et xx That is, x is
defined by a ' 2 l' See Appendix B for the analysis leading to the marginal
pdf for Z shown in (8.17).

228 MULTIVARIATE REGRESSION MODELS also have a posterior


pdfin the inverted Wishart form: (8.17) p(Z I Y, X) oc [Z[-tv-.(m-,)]. exp (-
� tr Z-$), where $ is the upper leR-hand principal minor matrix of fl. In
particar, ifp = 1, the posterior pdf for a is (8.18) p([ Y, X) which is in the
form of an inverted gamma pdf. If we had just one (m = 1) regression
equation, (8.18) would specialize to - 1 ( (8.19) From (8.18) we see that as
m increases the posterior pdf for *n becomes less and less concentrated
about . This is an intuitively pleasing result because, as m increases, a larger
part of the sample information is utilized to estimate *, *a, � �., *. In fact,
the exponent of ,tim in (8.18) differs from that in (8.19) by n-k+2- [n-k+m+
1-2(m- 1)] =m- 1. Thus we may say that "one degree of freedom is lost for
each of the m - 1 elements Further, on specializing (8.17) to the case p = 2,
we can follow the development in Jeffreys TM to obtain the posterior pdf
for the correlation coefficient = */(**), which is (8.20) r, x) (] _ with n' = n -
k - (m - 2), r = d(&) , and The result in (8.20), except for the changes in the
"degrees of freedom," is in the same form as that given by Jeffreys for
sampling from a bivariate normal population. To obtain the marginal
posterior pdf for a particular equation's coefficient vector, say lax, it is
pertinent to observe from (8.16) that the conditional posterior pdf for l,
given Y., depends only on tr. Thus from (8.16) and (8.18) we have for the
marginal posterior pdf for p(lal r, x) j' r, x) y., r, x) (8.22) xa Jeffreys, op.
cit., p. 174. See also Section 3 of Appendix B. p(l r, x) (8.23) THE
TRADITIONAL MULTIVARIATE REGRESSION MODEL 229 which is
in the form of a multivariate Student t pdf. This permits us to make
inferences easily about the elements of [S. If we had just one regression
equation, that is, rn = 1, (8.22) reduces to the result that we had earlier in
Chapter 3. With rn % 1 the only difference is a change in the "degrees of
freedom" due to the inclusion of the rn - 1 parameters *xo.,..., *xm in the
model. A slightly different way of looking at this problem is to consider the
case of rn regression equations with a diagonal covariance matrix
containing unknown variances on the main diagonal. Then, with our prior
pdf taken as p(B, *n, *xa,..., rrmm) OC H=i {Y/-i, the marginal posterior
pdf for lax would be' in the multivariate Student t form, as shown in (8.22),
but with exponent -n/2. In the case of an rn x rn nondiagonal covariance
matrix Y., that is, with correlated observations, there is less information in
the data than with uncorrelated observations, and this fact is reflected in
(8.22) by a reduction in the degrees of freedom relative to the uncorrelated
case. With respect to the joint posterior pdf for all the regression
coefficients B, properties of the Wishart pdf � can be utilized to integrate
(8.13b) with respect to the distinct elements of 2;-; that is 1 IS + (s-
/})'x'x(B- /})1 'exp{-� tr [S + (B-/})'X'X(B-/})]2;- x} I Y.- x 1{- a- )/0'
which yields (8.24) 1 Is + (s- /})1 since the integral in (8.23) is just equal to
the normalizing constant of the Wishart pdf which does not depend on the
parameters B. The joint pdf in (8.24) has been called the "generalized
multivariate Student t" pdf. Some properties of this pdf follow. First, we
have already shown that the marginal posterior pdf for a vector of
regression coefficients, say , is in the form of a multivariate Student t pdf
[see (8.22)]. We now show that if we express the joint pdf of B = (, ..., m) as
x' then each of factors on the rhs of (8.25) can be expressed in the form of a
multivariate Student t pdf. First we derive an expression for p(l$, 1So.,...,
See Appendix B. Below it is understood that we are taking X given. This is
proved in Tiao and Zellner, 1oc. cit.

230 MULTIVARIATE REGRESSION MODELS Im-l[ r)�($m]Sl, I., .. .,


$ra-1, I0. Note that the determinant in (8.24) can be expressed as $+ 0 :s+q
(8.26) IS + 121 = (s + q)': srara + qrara = IX + Cl[$rara + qram -- (S + q)'(X
+ 0)-1(S + q)], where Q = (B - t})'X'X(B - t}), + Q is the (m - 1) x (m - 1)
upper lefthand principal minor matrix of S + Q, s' = (sral, srao.,..., Sin<m-
1), and q' = (qml, qrao.,. �., qra<ra_ 1)). In the second factor in the second
line of (8.26) let us write (8.27) qt = %'Y, with Yt = X(gt - 1,), , l = 1, 2,...,
m, and (8.28) q' = Yra'Y, Cwhere y = (Y1, %,..., Ym-0. Using these
definitions and after some algebraic rearrangement, we have that IS + QI --
IX + Ql[cra + (�ra - [&)'Dra(�ra -- )1, (8.29) where and Dra = I- y(X +
0)-iT ', = Dra-ly(X + Cra = Sram -- We now make use of a theorem due to
TocheP e which says that if A is an m x n matrix and B, an n x m matrix,
then (8.30) (Im- AB) -1 = Im q- A(Ir- BA)-iB. Applying (8.30) and noting
that y'y = C), we obtain Dra- 1 = I + , - 17' , (8.3 la) (8.31b) and (8.31c) Cra
Thus Ca is the (m, m)th element of S- 1. Then the second factor on the rhs
of (8.29) is in terms of [cra + (yra - (8.32) with nra = ra + d$-1s and d = (11
- 1,..., lra-1 - ra-1). xe K. D. Tocher, "Discussion on Mr. Box and Dr.
Wilson's Paper," J. Roy. Statist. Sot., Series B, 13, 39-42 (1951). THE
TRADITIONAL MULTIVARIATE REGRESSION MODEL 231 Using
(8.29) and (8.32), we can write the joint pdf for B, shown in (8.24), as
(8.33) p(S I Y) oc [g + O]-,/�.{cra + (ra - lra)'X'DraX(ra _ lm))-n19.. The
determinant of the matrix X'DraX is IX'DraXI = Ix'xI I- (x'x)-lx'r(x + O)-
l/xI (8.34) We now make use of a theorem 17 which states that if A and B
are two n x n matrices then [I- AB] = II- BA]. This can be generalized to
the case in which A is an rn x n matrix and B is an n x rn matrix. Suppose
that m < n; then Using this result, we can write the second determinant on
the right of (8.34) as Hence I- a(X + O)-lr'xl = I- (X + 0)-lr'xal . (8.35)
IX'OraXI = Ix'xI 1I- ( q- Q)-l/r I --Ix'xI IXl IX q- 01-1. Consequently the
pdf in (8.33) can be written as (8.36) with (8.37) and p(BI �) = p(l, � �.,
lra-ll r) p(lll, � �., ra-1, Y) (8.38) p(ftmlll, . . ., lra_x, Y) oc IX Dra.X' I
[Cra + (pra -- 'lra) X DraX(lra - lra)] -"'' In (8.38) it is seen that the
conditional pdf for Ira can be expressed in terms of a multivariate Student t
pdf, whereas the marginal pdf for (11,..., lra_ 0, shown in (8.37), is of the
same form as the original pdf for B [see (8.24)], except, of course, for the
changes in the dimensions of the matrix X + Q and x? R. Bellman,
Introduction to Matrix Analysis. New York: McGraw-Hill, 1962, p. 95.

232 MULTIVARIATE REGRESSION MODELS the value of the exponent


of the determinant. Then on repeating the same process rn - I times we can
express the joint pdf for B as (8.39) m x IX'OXl[c + (p- ,l)'x'ox(p- Ya)] -(n-
m+a)19', where D,, %, and ca are defined in exactly the same way as e = rn,
given above. The factors in (8.39) correspond precisely to those shown in
(8.25) and clearly the first factor is the marginal pdf for 15x which, of
course, is in agreement with (8.22). Thus the generalized multivariate
Student t pdf is an interesting example in that, even though conditional pdf's
and the marginal pdf's for certain subsets of its variables are in the
multivariate Student t form, the joint pdf fails to be of the same form. As a
second property of the generalized multivariate Student t pdf, we shall
derive the marginal posterior pdf of Bx given in (8.40) X, IBx\ = XBx +
X.B. + U; that is, B is the submatrix of B containing the coefficients of the
variables in X for all equations. Following Geisser, loc. cit., we have for the
quantity in the determinant in (8.24) ( - q ' i _x. (_x_ _ _ _x_ _:_ _x_q i - ) s
+ (s - )'x'x(e- )= s + -l x'x x'xd-z- (8.41) = S + (Bx - x)'F(Bx - + (e- a)'x'x:(e-
a), where we have completed the square on B., F= X.'X.- X[X.(X.'Xo.)-
xXo.'Xx, and a = IL. - (X;X.)-X.'X(lh - t), which do not involve B.. Thus
the joint pdf for B and Bo. is (8.42) p(S, s.l r) IS + (s - )'F(s - &) + (s. -
a)'x.'x.(s. - a)l -,., which, when viewed as a function of B., given Bx, is in
the same form as PREDICTIVE PDF FOR TRADITIONAL
MULTIVARIATE REGRESSION MODEL 233 (8.24). Using properties of
the generalized multivariate Student t pdf, x8 we can perform the
integration with respect to the parameters in Bo. readily to yield (8.43)
where F has been defined above and k. denotes the number of rows in B.. It
is seen that the marginal posterior pdf for Bx is in the generalized multi-
variate Student t form. Thus the results of Tiao and Zellner can be applied
in its analysis to yield marginal pdf's for any column of Bx and also
conditional pdf's. As a last property of the joint pdf for B, given in (8.24),
Geisser points out that the quantity (8.44) = ISl IS + (s- )'x'x(- )1 is
distributed like ,,e._e, that is, as a product of beta variables defined by
Anderson? Thus, as Geisser points out, the posterior pdf for , where B is
random and the other quantities are fixed, is the same as the sampling
distribution of , where B is fixed and $ and /} are the sets of random
variables. Then a posterior region for the elements of B is given by where
,),_ is the th percentage point. 8.2 PREDICTIVE PDF FOR THE
TRADITIONAL MULTIVARIATE REGRESSION MODEL '� Assume,
as in Section 8.1, that we have sample observations generated by a
multivariate normal regression model (8.45) Y = XB + U xo See A. Ando
and G. M. Kaufman, "Bayesian Analysis of the Independent Multi- normal
Process--Neither Mean nor Precision Known," J. Am. Statist. Assoc., 60,
347-358, p. 352 (1965); J. M. Dickey, "Matricvariate Generalizations of the
Multi- variate t Distribution and the Inverted Multivariate t Distribution,"
Ann. Math. Statist., 38, 511-518 0967) and Appendix B to this book. T. W.
Anderson, An Introduction to Multivariate Statistical Analysis, New York:
Wiley, 1958, p. 194 if., discusses which appears as the n/2 power of the
likelihood ratio for testing a hypothesis on a subset of the elements of B. 0
Work on various aspects of this problem appears in S. Geisser, Ioc. cit., A.
Ando and G. M. Kaufman, "Bayesian Analysis of Reduced Form Systems,"
manuscript, MIT, 1.964, and A. Zellner and V. K. Chetty, "Prediction and
Decision Problems in Regression Models from the Bayesian Point of View,"
J. Am. Statist. Assoc., 60, 608-16 (1965).

234 MULTIVARIATE REGRESSION MODELS and wish to derive the


predictive pdf for future observations on the dependent variables, say W, a p
x m matrix, assumed to be generated by the same model generating Y; that
is, (8.46) W = ZB + V, where Z is a p x k matrix of given values for the
independent variables in the next p time periods and V is a p x m matrix of
future normal error terms with independently distributed row vectors having
zero means and each having covariance matrix Y., the same as that assumed
for the rows of U. Then the predictive pdf for W is given by (8.47) where
(8.48) p(WIZ, B, Y.-x) oc IZ-l,2 exp {-� tr[(W-ZB)'(W - ZB)Y.-q}, the
joint pdf for W, given Z, B, and Y.-x, and where p(B, Z-xl Y, X) is the joint
posterior pdf for B and Z -x, which we shall take as shown in (8.13). On
substituting this expression and (8.48) in (8.47), we have for the integrand
(8.49) [Z-I <+-"-"' exp (-� tr AY.- ), with A = (Y - XB)'(Y - XB) + (W-
ZB)'(W- ZB). From the form of (8.49) properties of the Wishart pdf can be
used to integrate with respect to the distinct elements of Z-x, which yields
(8.50) To integrate with respect to the elements of B we complete the square
on B, which gives (8.51) where M = X'X + Z'Z and = M-x(X'Y + Z'W). The
pdfin (8.51) is in a form that we have already encountered [see references in
connection with (8.42)]. Thus the integration with respect to B can be
performed easily and results in the predictive pdf for W, (8.52) p(wI To
simplify this expression we complete the square on W as follows' = Y'(I-
XM-xX')Y + W'(I- ZM-xZ')W (8.53) - Y'XM-xZ'W- W'ZM-XX'Y = Y'(I-
XM-xX ' - XM-xZ'C-XZM-XX')Y + (W- C-xZM-xX'Y)'C(W - C-xZM-
XX'Y), PREDICTIVE PDF FOR TRADITIONAL MULTIVARIATE
REGRESSION MODEL where � -- I - ZM-Z '. Further, we have C - = (I-
ZM-xZ') - = I + Z(X'X)-xZ ', which can be verified by direct
multiplication..x Also C-XZM - = [I + Z(X'X)-xZ,]ZM- (8.54) 235 = Z[I +
(X'X)-Z'ZIM- = Z(X'X)-X(X,X + Z'Z)M - = z(x'x)- Finally, XM-XX ' +
XM-XZ'C-XZM-X,= X[M-X + M-xZ'Z(X'X)-qX , (8.55) = XM-X(X'X +
Z'Z)(X'X)-xX' = x(x'x)-,x', where (8.54) has been used in the first line of
(8.55). Substituting from (8.54) and (8.55) in the third line of (8.53), we
obtain (8.56) r'r + + (w' - zt)' c(w - zi), with/} = (X'X)-XX, y. Noting that
(y_ x&'(�- xtb = s, we can write the predictive pdf in (8.52) as (8.57) It is
thus seen that W, the matrix of future observations, has a pdf in the
generalized multivariate Student t form, the same form found for the
posterior pdf for B, shown in (8.24). Thus the distributional results
established for (8.24) apply as well to (8.57). In particular, the marginal
predictive pdf for any column or row vector of W will be in the multivariate
Student t form. Further, if we partition W as follows, W' = (Wx' Wa'), the
marginal pre- dictive pdf for Wx will be in the generalized multivariate
Student t form. Finally, as pointed out by Geisser, .just as in (8.44) we have
that = 1 IS + (W- is distributed as .. O,,.v,,-. In the special case of a single
regression, m = 1, ax (I- ZM-XZ')[I + Z(X,X)-XZ,] = I -- Z[M -x -- (X'X)-
x + M-xZ,Z(X,X)_X] Z, = I - ZM-X[X,X _ M + Z'Z](X'X)-xZ, =L since
X'X - M + Z'Z = 0, given the definition of M = X'X + Z'Z. aa See T. W.
Anderson, op. cit., pp. 194 if., for a discussion of this distribution. He uses
the letter U where we have used O.

236 MULTIVARIATE REGRESSION MODELS S is a scalar, and S-x(W-


Zt})'(I- ZM-xZ')(W- Z1}) is a quadratic form distributed as [p/(n - k)]Fv._.
When p = 1, that is, W is a 1 x rn row vector, the pdf in (8.57) reduces to a
multivariate Student t form 'a and the quantity I1 - Z(X'X + Z'ZO-Z'](W -
Z1})S-(W - Z1})' is distributed as [rn/(n - k - rn + 1)]Fm,--m+x, where Zx
is the first row of Z and Wx is the first row of W. 8.3 THE TRADITIONAL
MULTIVARIATE MODEL WITH EXACT RESTRICTIONS In some
circumstances we may know that certain coefficients in the B matrix are
zero or that elements of B are constrained by exact linear restric- tions. In
these situations the joint posterior pdf for B, shown in (8.24), can be
conditionalized to incorporate this information and the conditionalized
posterior pdf can be employed to make inferences about the remaining
nonzero coefficients. Several cases are discussed below. If exact zero
restrictions pertain just to the elements of a particular coefficient vector, say
x, the posterior distribution for x in (8.22), which is in the multivariate
Student t form, can be analyzed easily to obtain a conditional pdf that
incorporates the conditioning information; for example, if we partition as ' =
(1,', o') and it is known that o = 0, the posterior pdf P($,IY, $o = 0) can
readily be obtained. Other exact linear restrictions on the elements of $x can
also be imposed by using properties of the multi- variate Student t pdf. On
the other hand, if restrictions relate to several or all coefficient vectors, the
situation is somewhat more complicated. The special case B = C, where Y
= XBx + Xo.B. + U and C is known, is easily handled by using the result in
(8.42); that is, setting B: = Cx yields the conditional posterior pdf for Ba,
given B = C, which is in the generalized Student t form. When zero
restrictions pertain to coefficients of different subsets of variables in
equations of the system, that is, (8.58) y. = X.p: + X. ogl. o + u., a = 1, 2,...,
m, aa This result follows from (8.57) on observing that with I4/x and Zx,
the first rows of W and Z, respectively, we can write the determinant in
(8.57) as [1 + abb'l = 1 + ab'b, with a= 1-Zx(X'X+Zx'Z)-:Zx ' and b= A(Vx-
ZxJ)', an rn x 1 column vector, where A is a nonsingular matrix such that
ASA' =Im or S - = A'A. Then p(WIXx Y, Zx) oc II.+ abb'[ -�n+x-e>ta = (1
+ ab'b) -�+:-e>ta = [1 + a(Wx - Z:)S - x (W - Z:)'] -{n+:-e>ta, which is in
the multivariate Student t form. THE TRADITIONAL MULTIVARIATE
MODEL WITH EXACT RESTRICTIONS 237 with X = (X.: ! X.o) and I.'
= (1.', l$.o'), with I.o = 0, the problem is more difficult, since the
partitioning of X is not the same for all equations. To analyze this case the
joint posterior pdf for B in (8.24) will be expanded and the leading normal
term in the expansion will be conditionalized by using the restrictions gl.o -
- 0, e = 1, 2,..., m. To expand (8.24) let us write it as (8.59) ,(el r) oc 15 +
(e- 1})'m(e- 1})1 where $ = n- S and M = n- X'X. Now let H be a
nonsingular matrix such that HgH' = I and H(B - t})'M(B - t})H' = D, where
D = D(A,) is a diagonal matrix with the characteristic roots, the ;[s, of(B -
1})'M(B - t}) g- on the diagonal. These will be small if n is large and lim,_.
M = , a constant. Also, from HH' -- I, we have H'H = g- . Then I,. + (B-
1})'M(B- 1})1 = IHH'I,.I I + DI-, �. -- I$l-'a exp (- log II + DI) (8.60) = [[-
nta exp [--: log (1 + =l[-'aexp{--;[trD--}trDa+}trD Note that tr D = tr H(B -
)'M(B - )H' = tr H'H(B - )'M(B = tr g-Q, where Q = (B - )'M(B - ). Similar
operations yield tr D a = tr tr D a = tr $-05-Q$- 0 By use of these results
(8.60) becomes P(B] Y) oc exp (-; [ tr g- Q (8.61) 1 2n a tr -Q-XQ + - tr -
Q$-Q$ -. oc exp (-� tr g-Q) exp nn tr -XQ-Q ) 6n . tr -XQ-Q-XQ .... ,

238 MULTIVARIATE REGRESSION MODELS where Q = (B - t})'X'X(B


- ). On expanding the second factor of the second line of (8.61) as e': = 1 +
x + x'/2! +..., we obtain 1 ] (8.62) p(B I Y) ck exp (-� tr $-XQ) 1 + - tr $ -
Q$- +... , with ck denoting "approximately proportional to." The leading
factor of (8.62) is in the multivariate normal form; that is ' (8.63) p(B I Y)
ck exp [-� tr -X(B - )'X'X(B - )] 6c exp [-�(15 - )'g- � X'X(15 - )]. Then
(8.64) P(15'xl Y, 15'o -- 0) 6c exp {-�d'[g"t(X, i X,o)'(Xt i Xt0)ld }, where
15.x denotes the coefficients assumed to be not equal to zero, 15'o, those
equal to zero, 't is the e, lth element of g-x, ,t(X, X,a)'(Xt Xto) is a typical
element of a partitioned matrix, and (8.65) d'= [( - x)',-x0',..., ( - )',-0'1 is ( -
)' conditionalized to reflect the zero restrictions. With this notation
introduced, (8.64) can be expressed as Y, 'o = 0) & exp [-(.x - .x)'(X,x'XtS")
(.x - .) (8.66) - 2.o'(Xo'X)(. - .) + Letting V = (X,'Xt ) and R = (Xo'Xt5 ) and
completing the square in the exponent of (8.66), we have (8.67) p(15.xl Y,
15.o) dc exp - - V-XR'.o)'V(15.x - l.x - Thus this normal approximation to
the conditional posterior pdf has mean 1. + V-R'[.o and covariance matrix V
- = (X,'Xa) -. It is seen that the mean is the least squares quantity .x, plus
another term that includes the vector 'o, the sample estimate of the zero
restrictions. 8.4 TRADITIONAL MODEL WITH AN INFORMATIVE
PRIOR PDF We have gone forward above employing diffuse prior pdf's
and considering cases in which we have exact restrictions on some of the
elements of the B .4 It is interesting to note that the leading normal term in
this expansion is just the conditional posterior pdf for B given g
TRADITIONAL MODEL WITH AN INFORMATIVE PRIOR PDF 239
matrix. In this section we consider the problem of introducing prior
informa- tion about the elements of B by use of an informative prior pdf.
Given that such prior information is reasonably accurate, we, of course,
shall improve the precision of our inferences by using it. In addition, we
shall learn how the sample information modifies our beliefs by comparing
the properties of the prior and posterior pdf's. Our problem is to formulate a
prior pdf that will be useful in representing prior information in a broad
range of circumstances and that will be relatively convenient
mathematically. Clearly, it should be recognized that one class of prior pdf's
will not be appropriate for all situations; however, it is believed that the
class studied below will be useful in many situations. As Rothenberg 's has
noted, if we use a "simple" natural conjugate prior distribution 'e for the
traditional multivariate regression model, this will involve placing
restrictions on the parameters, namely, the variances add covariances of
coefficients appearing in equations of the system. This is due to the fact that
the matrix (X'X) - enters the covariance structure in the following way, Y.
� (X'X) -. Thus, for example, the ratios of variances of corresponding
coefficients in the first and second equations will all be equal if we use a
simple natural conjugate prior pdf. The way to avoid this problem is to use
a general multivariate normal prior pdf for all coef- ficients of the model. '
By so doing we will avoid the problem raised by Rothenberg but we will
have to pay a price; that is, such a general normal prior pdf does not
combine so neatly with the likelihood function as the simple natural
conjugate pdf. However, the price is well worth paying, since the
restrictions involved in using the natural conjugate prior pdf for the
traditional model are not reasonable in most situations. In view of what has
been said above, we introduce the following prior pdf: (8.68) p(15, Y.-x) oc
]-xl-("+>/a exp [-�(15 - )'C-(15 _ )], where , an mk x 1 vector, is the mean
of the prior pdf, assigned by the investigator, and C = (c,) is an mk x mk
matrix, the prior covariance matrix, also assigned by the investigator. As
above, we assume that little is known a priori about the elements of Y and
use the diffuse prior pdf introduced and discussed in Section 8. I. On
combining the prior pdf in (8.68) with the likelihood function in (8.6), * T.
J. Rothenberg, "A Bayesian Analysis of Simultaneous Equation Systems,"
Report 6315, Econometric Institute, Netherlands School of Economics,
Rotterdam, 1963. That is, if we took our prior pdf in the same form as the
likelihood function, we would have a natural conjugate prior pdfi ? From
what is presented below it is the case that the prior pdf that we shall use is
the natural conjugate prior pdf for the "seemingly unrelated" regression
model (see Section 8.5).

240 MULTIVARIATE REGRESSION MODELS we have for. the joint


posterior pdf p(B, Z-Xl Y) cr IZ-x[ <-m-x>/2 exp {-� tr Y-x[S + (B-
)'X'X(B-/})1} (8.69) x exp [-�(15 - )C-X(15 - )], where B = (15x,..., 15,),
15' = (15x',. �., 15,/), and B = (X'X) -xX' Y. We can now integrate (8.69)
with respect to Z -x to obtain p(BI Y) oc I $ + (B - )'X'X(B - i)1 -n/' exp [-
�(15 - )'C-X(15 - )]. (8.70) The posterior pdf .for B is seen to be the
product of a factor in the generalized multivariate Student t form and a
factor in the multivariate normal form. Since (8.70) is rather complicated as
it stands, we shall expand the first factor on the right-hand side of (8.70)just
as we did in Section 8.3. This yields the following as the leading normal
term approximating the posterior pdf: (8.71) p(BI Y) ck exp [-�(15 - )'X -
� X'X(15 - ))exp [-�(15 - )'C-(15 - )1 & exp [-�(15 - b)'F(15 - b)], where
(8.72) b = (C - + - � X'X)-(C-X + - � X'X) and (8.73) F = C - + $- �
X'X. The quantity b in (8.72) is the mean of the leading normal term of the
expansion and is seen to be a "matrix weighted average" of the prior mean
and the least squares quantity whose weights are the inverse of the prior
covariance matrix C and the sample covariance matrix, ($� X'X-X) -,
respectively. The matrix F in (8.73) is the inverse of the covariance matrix
of the leading normal term approximating '8 the posterior pdf for B. 8.5
THE "SEEMINGLY UNRELATED" REGRESSION MODEL The
"seemingly unrelated" regression model '9 is in a certain sense a
generalization of the traditional multivariate regression model in that the 9.8
Further work to take account of additional terms in the expansion along the
lines of Appendix 4.2 will lead to a better approximation of the posterior
pdf. 9.9 For sampling theory analyses of this model, see A. Zellner, "An
Efficient Method of Estimating Seemingly Unrelated Regressions and Tests
for Aggregation Bias," J. Am. Statist. Assoc., 57, 348-368 (1962); A.
Zellner, "Estimators for Seemingly Unrelated Regression Equations: Some
Exact Finite Sample Results," ibid., 58, 977-992 (1963); A. Zellner and D.
S. Huang, "Further Properties of Efficient Estimators for Seem- ingly
Unrelated Regression Equations," Intern. Econ. Re., 3, 300-313 (1962); J.
Kmenta and R. F. Gilbert, "Small Sample Properties of Alternative
Estimates of Seemingly Unrelated Regressions," J. Am. Statist. Assoc., 63,
1180-1200 (1968). THE "SEEMINGLY UNRELATED" REGRESSION
MODEL 241 matrix X appearing in each equation of the traditional model
is permitted to be different in the seemingly unrelated regression model;
that is, the model assumed to generate the observations is (8.74) y. = X. 15.
+ u. , y Xm 15,n Um where y, -- 1, 2,..., m, is an n x 1 vector of
observations on the th dependent variable, Y is an n x k, matrix, with rank
k, of observations on k independent variables appearing in the eth equation
with coefficient vector 15, a k x 1 column vector, and u is an n x 1 vector of
disturbance terms appearing in the eth equation; for example, the subscript
e might refer to the eth firm. Thus in (8.74) we have a regression relation
for each of m firms with n observations on each of the variables and with
each firm having its own independent variables �� and coefficient vector.
For simplicity we rewrite (8.74) as follows: (8.75) y = Z15 + u, where y' =
(yx', y.',..., y,/), 15' = (15x', 15o.',..., 15,/), u' = (ux', u',..., urn'), and Z
denotes the block diagonal matrix on the right-hand side of (8.74). Our
distributional assumptions about the nrn elements of u are the same as those
employed in the analysis of the traditional model, namely that they are
jointly normally distributed with Eu = 0 and (8.76) Euu' = where In is an n
x n unit matrix and Z is a positive definite symmetric m x rn matrix. Then
the likelihood function for the parameters 15 and Y, is given by /(15, Ely )
cr IZ-ln/' exp [-�(y - z15)'z -x � In(y - Z15)] (8.77) IZ-ln,: exp (-� tr
AY,-), where in the second line of, (8.77) we have written [ (y- X15)'(y-
X15) ... (y- X15)'(ym- X, n15,n)] (8.78) = i , [(y Xr15m)'(Yx X150 '" (Yr
Xm15m)'(Y, Xr15r)J an m x rn symmetric matrix. 0 That having the X's
different in (8.74) leads to results differing from those associated with the
traditional model came as a surprise. In fact, the phrase "seemingly
unrelated" was chosen to emphasize this point.

242 MULTIVARIATE REGRESSION MODELS We shall use the same


diffuse prior assumptions about the parameters employed in the analysis of
the traditional multivariate regression model, namely p(la, Y.- 1) = p(15)
p(y.- ) (8.79) ly.-11 On combining (8.77) and (8.79), the joint posterior pdf
for the parameters is (8.80) p(l, Z-lly) c 1,-11 exp [-�(y - Z15)'Z -x � I(y -
Z15)l where A has been defined in (8.78). From (8.80) it is seen that the
conditional posterior pdf for 15, given is in the multivariate normal form
with mean (8.81) E(151Z -1, y) = [Z'(Z -1 � I.)Z]-:Z'(Z - � I.)y and the
conditional covariance matrix (8.82) The conditional mean in (8.81) is
precisely the generalized least squares quantity that Zellner obtained in
studying the system in (8.74) from the sampling theory point of view; that
is, we multiply both sides of (8.75) by a matrix H, Hy = HZ15 + Hu, where
H is a nonsingular matrix such that EHuu'H' = HY. � I.:H' = Inn, which is
possible, since Z � IN is assumed to be a positive definite symmetric
matrix. Thus, from H'H = Z-1 � I.:. Then the transformed system Hy =
HZ15 + Hu satisfies the conditions of the Gauss-Markoff theorem, which
means that the sampling theory estimator (8.83) = (Z'H'HZ)- iZ'H'Hy = �
;.)ZI-*Z'Z -1 � ;. Y, with covariance matrix (8.84) Cov() = [Z'(y-x � I,)Z]
-x, is a minimum variance linear unbiased estimator for 15. Also, with a
normality assumption, as can be seen from the likelihood function in (8.77),
the estima- tor in (8.83) is a maximum likelihood estimator for 15, given Y..
As can readily :.i be established, if the X's are all the same (or proportional)
and/or ' ' diagonal, the quantities in (8.81) and (8.83) reduce algebraically to
vectors.li of single-equation least squares estimates; that is, with respect to
(8.83), i under these conditions, l, = (X,,'X,,)-1X='y,, a = 1, 2,..., m, and
similarly for (8.81). THE "SEEMINGLY UNRELATED" REGRESSION
MODEL 243 As mentioned above, the sampling theory estimator in (8.83)
is identical to the mean of the conditional posterior pdf for 15, given Z or,
equivalently, given Z-x. Note, however, that the sampling theory estimator
in (8.83) depends on the matrix Z which is usually unknown. Zellner's
suggestion that Y, be replaced by a consistent estimate , formed from the
residuals of the equations estimated individually by least squares to yield
the estimator 1 (8.85) b = [z'(2-1 � � is equivalent to what is obtained
from a Bayesian analysis if we go ahead under the assumption that Z = .
With this assumption the conditional mean in (8.81) is precisely the
quantity b in (8.85). In large samples will not differ markedly from Z, and
thus going ahead with the assumption Z = will produce satisfactory results.
In small samples, a= however, it is better to obtain the marginal posterior
pdf for and base inferences on it rather than rely on conditional results. To
obtain the marginal posterior pdf for we can use properties of the Wishart
pdf to integrate (8.80) with respect to the distinct elements of This yields
(8.86) (Yx- &x)'(yx- Xxx) ... (Yx -- &l)'(Ym- Xmm ) (Ym- Xmm)'(yx- xx)
'" (Ym- Xmm)'(Ym- Xmm) as the joint posterior pdf for the elements of .
Although this distribution resembles a generalized multivariate Student t
pdf, it does not appear to be possible to bring it into this form because of the
fact that not all X's are the same. Further work to provide techniques for
analyzing (8.86) is required before it can be used in practical work. Another
way of looking at the seemingly unrelated regression model is to consider it
as a "restricted" traditional multivariate regression' that is, we can write '
(8.87) (yx,..., Yn) - (XXX.' ' . Xn) 152 '.. 0 ... n + (ul,..., ax This estimator
has the same large sample properties as that of the estimator I in (8.83). a
See A. Zellner, "Estimators for Seemingly Unrelated Regression Equations'
Some Exact Finite Sample Results," !oc. cit., for an analysis of the finite
sample properties of the estimator b in (8.85) for a two-equation model.
Kmenta and Gilbert, !oc. cit., provide 'interesting Monte Carlo results, and
N. C. Kakwani, "The Unbiasedness of Zellner's Seemingly Unrelated
Regression Equations Estimators," J. Am. Statist. Assoc. 62, 141-142
(1967), shows that b is unbiased.

244 MULTIVARIATE REGRESSION MODELS which shows the (zero)


restrictions explicitly. In cases in which the matrix (XX.... Xr) has full rank
the approach taken in Section 8.3 to incorporate zero restrictions in the
traditional model can be applied. However, if we work with just the leading
normal term in the expansion, we have seen that this is equivalent to using
the conditional posterior pdfp(151Y. = 2g, y). Thus, to this degree of
accuracy, using the results in (8.81) and (8.82) with Y. - 2g will be
satisfactory in large samples. To illustrate an application of these large-
sample results we consider annual investment data pertaining to 10 large
U.S. corporations from 1935 to 1954. For each of the 10 firms we posit a
regression relation that "explains" its deflated annual gross investment in
terms of two explanatory variables, the Table 8.1 POSTERIOR MEANS
AND STANDARD DEVIATIONS FOR TWO ANALYSES OF TEN
FIRMS' INVESTMENT RELATIONS a Analysis Based on Equations
(8.81) and (8.82) Analysis Based on Individual Firm's Data Alone
Corporation Slope Coefficients b Slope Coefficients b Intercept (1) (2)
Intercept (1) (2) General Electric - 11.2 0.0332 0.124 - 9.96 0.0266 0.152
(21.1) (0.00928) (0.0214) (31.4) (0.0156) (0.0257) Westinghouse 4.11
0.0525 0.0412 - 0.509 0.0529 0.0924 (5.09) (0.00794) (0.0347) (8.02)
(0.0157) (0.0561) U.S. Steel - 18.6 0.170 0.320 -49.2 0.175 0.390 (78.4)
(0.0377) (0.101) (148) (0.0742) (0.142) Diamond Match 2.20 -0.0181
0.365) 0.162 0.00457 0.437 (1.12) (0.0151) (0.0578) (2.07) (0.0272)
(0.0760) Atlantic Refining 26.5 0.131 0.0102 22.7 0.162 0.00310 (6.48)
(0.0473) (0.0187) (6.87) (0.0570) (0.0220) Union Oil -9.67 0.112 0.128
-4.50 0.0875 0.124 (9.01) (0.0456) (0.0155) (11.3) (0.0656) (0.0171)
Goodyear - 2.58 0.0760 0.0641 - 7.72 0.0754 0.0821 (7.59) (0.0202)
(0.0229) (9.36) (0.0340) (0.0280) General Motors -133.0 0.113 0.386 -150
0.119 0.371 (73.2) (0.0167) (0.0312) (106) (0.0258) (0.0371) Chrysler 2.45
0.0672 0.306 - 6.19 0.0780 0.316 (11.5) (0.0166) (0.0271) (13.5) (0.0200)
(0.0288) IBM -5.56 0.131 0.0571 -8.69 0.132 0.0854 (3.56) (0.0167)
(0.0575) (4.54) (0.0312) (0.100) a The data underlying these computations
were taken from J. C. G. Boot and G. M. de Wit, "Investment Demand: An
Empirical Contribution to the Aggregation Problem," Intern. Econ. Rev., 1,
3-30 (1960). ' The slope coefficient (1) is the coefficient of the value of
outstanding shares at th0 beginning of the year while (2) is that of the firm's
beginning of year real capital stock. THE "SEEMINGLY UNRELATED"
REGRESSION MODEL 245 value of its outstanding shares at the
beginning of the year, and its beginning- of-year real capital stock. In
addition, we assume a nonzero intercept term in each relation. We further
assume that the model generating the data is the normal seemingly
unrelated regression model, that is, (8.74) with m = 10. Shown in Table 8.1
are the approximate means of the posterior pdf for the coefficients based on
the conditional posterior pdf p(BlY. = ., Y). Shown below each entry in
parentheses are the conditional standard deviations of the coefficients
computed as square roots of diagonal elements of the matrix in (8.84).
Further, the results of an equation-by-equation analysis of the 10 regression
equations is presented, an analysis that utilized diffuse prior pdf's, as in
Chapter 4, on the multiple regression model. Entries in the table are
elements of = (X'X)-X,'y for each firm's data; the figures in paren- theses
are square roots of diagonal elements of (X,'X,)-xS,, = 1, 2,..., 10. On
comparing the precision with which coefficients have been determined, it is
seen that the analysis using all the observations yields sharper posterior
lxtf's for individual coefficients than did that employing just the individual
firm's data in making inferences about its coefficients2 � To the extent that
the analyses using just part of the data fail to incorporate information in all
the data, information is being lost. To indicate the extent to which the Table
8.2 SAMPLE DISTURBANCE CORRELATION MATRIX General
Electric 1.00 0.74 0.45 0.60 -0.02 0.02 0.43 0.29 -0.07 0.48 Westing- house
-- 1.00 0.64 0.62 0.00 0.14 0.54 0.16 0.12 0.55 U.S. Steel -- 1.00 0.75 0.24
-0.30 0.30 -0.28 0.36 0.39 Diamond Match -- 1.00 0.13 -0.23 0.28 -0.27
0.12 0.41 Atlantic Refining .... 1.00 0.15 - 0.15 - 0.32 0.06 0.22 Union Oil
..... 1.00 0.20 0.53 -0.13 0.14 Goodyear ...... 1.00 0.21 0.07 - 0.18 General
Motors ....... 1.00 -0.28 0.12 Chrysler ........ 1.00 0.21 IBM ......... 1.00 This
has to be qualified in that the analysis using all firms' data is based just on a
::large-sample result.

246 MULTIVARIATE REGRESSION MODELS observations are


correlated the sample disturbance correlation matrix is presented in Table.
8.2. It is seen that some of these correlations are substantial and that there is
some reason to doubt that the observations are independent. QUESTIONS
AND PROBLEMS 1. Given the diffuse prior pdf for the distinct elements
axx, a.., axe. of a 2 x 2 pals covariance matrix, that is, p(Y.) oc IZl-x, with 0
< *xx, *a. and -oo < axe. < oo, derive the implied prior pdf for the
correlation coefficient p = ax./x/,x"- and describe its properties. 2. Given
that Yx, Y., � �., Y, are n vectors of observations, each 2 x 1 and drawn
independently from a bivariate normal population with mean vector W = (x,
.) and 2 x 2 pals covariance matrix Z, derive the bivariate marginal posterior
pdf for x and . by employing a diffuse prior pdf for and the distinct elements
of Z. 3. With the information and assumptions of Problem 2, derive the
posterior pdf for the difference in means, namely x- .. Also obtain the
marginal posterior pdf's for *x and for p, the correlation coefficient. 4.
Suppose in Problem 2 that the yfs are each rn x 1 vectors, ' = (x, a, � �., ,)
and Y, is an rn x rn pals matrix. With diffuse prior assumptions about and
the distinct elements of Z, show that the marginal posterior pdf for is in the
multivariate Student t form and obtain the marginal posterior pdf for x. 5.
For the conditions of Problem 4 explain how to construct a 90% Bayesian
confidence interval for c', where c' = (el, e.,..., era), a 1 x rn vector whose
elements have known values. 6. In Problem 4 obtain the marginal posterior
pdf for the distinct elements of Y, and comment on its properties. From the
posterior pdf for Z derive the posterior pdf for f = A'Y.A, where A is an rn x
rn nonsingular matrix whose elements have known values. 7. Given the
posterior pdf for Z, derived in Problem 6, derive the marginal posterior pdf
of the distinct elements of the mx x mx matrix Zxx, a submatrix of Z shown
below: 11.2 . What are the posterior pdf's for -1 and -1 (5-]11 _ 5119..-1.1)
-1 = y.x implied by the posterior pdf for 52 obtained in Problem 6 ? 9.
Consider the standard multivariate regression model Y = XB + U which
was analyzed in Section 8.3. From the marginal posterior pdf for the
elements of B shown in (8.24) derive the posterior pdf for the elements of O
= CB, where C is a k x k nonsingular matrix whose elements have known
values. QUESTIONS AND PROBLEMS 247 10. Let Y = X:B + Ux and Y.
= X.B + U., where the matrices Y, X, and U are of sizes n x m, n x k, and n
x rn, respectively, where i = 1, 2, and B is a k x m matrix of regression
coefficients. Assume that the 1 x rn rows of Ux and U. are all independently
and normally distributed with zero mean vectors and that the rows of U
have a common pds m x rn covariance matrix 52 and those of U. have
common pds m x m covariance matrix With Xx and X. given, each of rank
k, and using the following diffuse prior pdf, where the elements of B range
from -oo to +oo and ]Y,[ > 0, i = 1, 2, show that the marginal posterior pdf
for B is in the form of a product of two factors, ach in the generalized
Student t form. 11. Provide the leadin normal term in an asymptotic
expansion of the posterior pdf for B obtained in Problem 10. 12. Consider
the following reg?ession system: Yl{ : P11Xl{ '- P19.X94 + /'/1 i = 1, 2,...,
n, y. = x + f..x., + u:) where the ifs are regression coefficients, x, and x4 are
given values of two independent variables, Yx and y., are dependent
variables, and ux, and u4 are disturbance terms. Assume that the n x 2
matrix X = (xxx:) has rank 2 and that the pairs of random variables (ux, u.),
i = 1, 2,..., n, are normally and independently distributed, each with zero
mean vector and common 2 x 2 pds covariance matrix Y,. If the elements of
Z have known values, derive the conditional posterior pdf for the ifs by
using a diffuse prior pdf. 13. Suppose, in Problem 12, we assume that fx: =
.x = 0. Obtain the con- ditional posterior pdf for 11 and fio.., given Z and
fixo. = .x = 0; that is, P(fxx, f..]Z, fx. = fo.x = 0, yx, y:). What are the mean
and the covariance matrix of this conditional posterior pdf? How is the
posterior variance of fhx affected by the assumption that x ' = 07 i xo. = x
xixo4 = 14. Given the results in Problem 13, compare the posterior variance
of xx, given 52 and . = 2x = 0, with the posterior variance of xx, given exx
obtained from an analysis of yx = xxxx + ux with a diffuse prior pdf for 15.
Show that the expression in (8.81) reduces to E(I]Y.-x, y) = (Z'Z)-XZ'y if
(a) Y.-x is a diagonal matrix and/or (b) Xx = Xo. .... = Xm. Interpret the
quantity (Z'Z)-XZ'y. Also consider the form of (8.82) under assumptions (a)
and (b).

CHAPTER IX Simultaneous Equation Econometric Models Since most


economic phenomena involve interactions among several or many
variables, it is important to construct and analyze models that incor- porate
such interactions or feedback effects and are capable of "explaining" the
variation of a set of variables. This is one of the main objectives we have in
mind when we construct a simultaneous equation model to represent, say, a
particular market or national economy. In the former instance the model
will provide an explanation of the variation of price and quantity of the
com- modity or service traded in the market. With respect to models of
national economies, x one objective of model construction is the
explanation of varia- tion in such variables as national income, consumption
expenditures, investment, and the price level. Here, the term "simultaneous
equation econometric model" is taken to mean a stochastic model that
permits an investigator to make probabilistic statements about a set of
random variables, the so-called endogenous variables. ' The models we
shall consider are linear in the parameters and a generalization of the
multivariate regression model in the sense that in multi- variate regression
models we have just one "dependent" variable per equation whereas in
simultaneous equation models we may have more than one dependent
variable appearing in each equation; that is, in the simul- taneous equation
model it is assumed that the data are generated by a model in the following
form*: (9.1) YI' = XB + U, x See M. Nerlove, "A Tabular Survey of Macro-
Econometric Models," Intern. Econ. Rev., 7, 127-173 (1966), for a review
of some salient features of a number of models which have appeared in the
literature. 9. To make such probability statements information about a
model's initial conditions, the form of the pdf for its stochastic variables,
exogenous variables, and parameter values is required. Providing
information about parameter values is an objective of statistical estimation
techniques. a Insofar as possible, the notation in this chapter parallels that
used in discussing multivariate regression models in Chapter 8. 248
SIMULTANEOUS EQUATION ECONOMETRIC MODELS 249 where Y
= (yx, yo.,..., Ym), an n x m matrix of observations on rn "de- pendent" or
endogenous variables whose variation is to be explained by the model, I' is
an m x rn nonsingular matrix of coefficients for the endogenous variables,
X = (xx, x.,..., x) is an n x k matrix of observations on k "predetermined"
variables, B is a k x rn matrix of coefficients for the "predetermined"
variables, and U = (ux, u.,..., Urn) is an n x m matrix of random disturbance
terms. The variables in X, the predetermined variables, may include lagged
values of the endogenous and/or exogenous variables. In the latter category
we include both nonstochastic and stochastic variables whose Variation is
determined outside the model. Stochastic exogenous variables, by
definition, are assumed to be distributed independently of the elements of U
and to have distributions not involving any of the parameters of the model;
that is, I', B, and the elements of the covariance matrix of the elements of U.
Further, in this chapter we assume that the elements of U have zero means
and that the rows of U are normally and independently distributed, each
with m x rn pds covariance matrix Y.. This independence assumption rules
out any form of auto or serial correlation. Note that if the elements of the
matrix I' in (9.1) had elements with known values the model could be
analyzed by using results for multivariate regres- sion models 4 presented in
Chapter 8. One special case that deserves mention is I' = Ir, where Im is an
rn x rn unit matrix. Here the model is in the form of a regression system,
except that X usually includes lagged values of the y's. More accurately,
then, the system with I' = Ir is usually a set of auto- regressive equations.
That I' =Im does not mean that there are no feedback effects in the system;
it means that any feedback effects in the system are lagged. Other special
cases, which we distinguish below, are that in which I' is triangular and Z,
the disturbance covariance matrix, is diagonal, the "fully recursive" case
and that in which I' is triangular and Z is not diagonal, called the
"triangular" case. Others which can be distinguished are combinations of
the I' block diagonal and the disturbance covariance matrix full or block
diagonal. Finally, when I' and the disturbance covariance matrix have no
special form, we refer to the system simply as an "interdependent" model.
We discuss first the analysis of fully recursive and triangular models. After
that we treat the concept of identification in Bayesian terms. Then several
simple models are considered, followed by the presentation of methods for
"limited information," single-equation, and full-system Bayesian analysis.
Last the results of some Monte Carlo experiments which compare the
sampling properties of Bayesian and well known sampling theory
estimators are reported. 4 This assumes that we have gone ahead depending
on given initial conditions if the model had autoregressive features.

250 SIMULTANEOUS EQUATION ECONOMETRIC MODELS 9.1


FULLY RECURSIVE MODELS We assume that our observations Y --
(yx,..., y,) have been generated by the following model: yx = Xd + u Ya =
YxY.x + X4. + u. (9.2) Yo = YxYox + Y.Y. + X[3 + u3 Ym = YxYmx -I-
Y.Ya. +'" + Y,-xY,,,-x + Xa[ta + ua, where y, is an n x 1 vector of
observations on the eth endogenous variable, X, is an n x k, matrix with
rank k, of observations on k, predetermined variables appearing in the eth
equation with coefficient vector [t,, a k, x 1 column vector, u, is an n x 1
vector of disturbance terms, and the y,, are scalar coefficients. 6 Further, we
assume that the elements of the u,, e = 1, 2,..., rn, are normally distributed
with zero means and covariance matrix (9.3) Euu' = D(% 2) � In, where u'
= (ux',..., urn') and D(% ) is a diagonal matrix with ,x', o.',..., c, ' on the main
diagonal. Thus (9.3), along with the assumption of normality, implies that
all elemenf. s of u are independently distributed and that the disturbance
terms in different equations have different variances. With the above
distributional assumptions and noting that the Jacobian of the
transformation from the u's to the y's is 1, we obtain the likelihood function
(9.4) l(8, where Yo denotes given initial conditions, ,'= (,x, ,.,..., %0, Z =
(yx, y.,..., y_ i X), ,' = (yj[j) with %' = (y,x, y,..., y.,_0, and As regards prior
assumptions we employ the following diffuse prior assumptions, p(S, ) =
p(S)p(), with (9.5) p(a) constant and p(.) =1 0t < < t, 5 In (9.2) we have
introduced all 't which could be non-zero, subject to our normaliza- tion,
and yet yield a triangular structure. If some of these coefficients are known
to be zero, a priori, they can be set equal to zero without affecting the
analysis in any funda- mental way. FULLY RECURSIVE MODELS 251
where t denotes a column vector of ones. Under these assumptions the joint
posterior pdf for tJ and. is 1 1 (9.6) p(IJ, '1 �, Yo) cr ,__ . exp [ 2%. (y,-
ZlJ,)'(y,- From the form of (9.6) it is seen immediately that each factor
contains a particular equation's parameters, % and I,, and thus each
equation's parameters are, a posterJori, distributed independently of those of
other equations., The posterior pdf for the th equation's parameters is 1 [
P(S, %1 Y, Yo) c +----- exp - (9.7) cr ,]+---"i exp where (9.8) g. = and fi = y
- Z. Thus the conditional posterior pdf for t5, given %, is in the multivariate
normal form with the mean , given in (9.8). Further, on integrating (9.7)
with respect to 15, we find that the marginal posterior pdf � for % is (9.9)
P(%I Y, Yo) oc *:-a" +x exp where q, is the number of elements in IJ,,
which is in the inverted gamma form. Last, on integrating (9.7) with respect
to %, we have for the marginal posterior pdf for I (9.10) r, �o) oc + - -
which is in the multivariate Student t form with mean g, shown in (9.8). It
should be noted that g, is also the maximum likelihood estimate for . It is
seen that the results for the fully recurslye model, given initial con- ditions,
are completely analogous to Bayesian results for the multiple regression
model. Thus many of the results obtained for the multiple re- gression
model can be carried over to apply to the equations of fully recurfive
models; for example, if, instead of the diffuse prior pdf for the clements of
8,, we had employed an informative prior pdf in the multivariate normal
form, that is, 1 (tJ $.)'R.(8 $,)], (9.11) p($,) oc exp 2,. - _

252 SIMULTANEOUS EQUATION ECONOMETRIC MODELS where Ig,


is the prior mean vector and r,'R- is the prior covariance matrix, the joint
posterior pdf for li, assuming that our prior pdf for % is p(%) m 1/%, is �,
ro, oc + - - [ 1 (tJ )'R(8 - )], (9.12) x exp 2%. - where P! denotes prior
information� It is seen that the posterior pdf in (9.12) is in what Tiao and
Zellner 6 have called the "normal-t" form and the methods they have
presented (see Appendix 4.1) can be used in its analysis. Also, if an
informative prior pdf for % in the inverted gamma form is intro- 'duced
with the normal prior for ti, in (9.11), the joint posterior pdf for li can easily
be shown to be in the normal-t form. Among other results that can be
carried over from regression analysis to apply to fully recursive models is
the analysis of autocorrelation and the Box-Cox analysis of transformations.
Then, too, it is not difficult to obtain the predictive pdf for the y's in period
n + 1. Going further into the future, ' however, is somewhat complicated in
situations in which the system is autoregressive. The simplicity of the fully
recursive model, both with respect to its triangu- lar form and analysis, is
striking. However, it may be that the assumption that the disturbance terms
are contemporaneously uncorrelated is not satisfied in all circumstances. We
now turn to the analysis of triangular systems without the assumption that
the disturbance covariance matrix is diagonal. 9.2 GENERAL
TRIANGULAR SYSTEMS In this section we analyze the system in (9.2)
with the assumption in (9.3) replaced by (9.13) Euu' = Y. � In, where Z is
an rn x rn positive definite symmetric matrix. The assumption in (9.13)
permits nonzero contemporaneous disturbance covariances and differing
variances for disturbance terms of different equations but rules out any kind
of auto- or serial correlation. Except for (9.13), all other distributional
assumptions about the system are the same as those in Section 9.1. 6 G. C.
Tiao and A. Zellner, "Bayes' Theorem and the Use of Prior Knowledge in
Regression Analysis," Biornetrika, $1, 219-230 (1964). THE CONCEPT
OF IDENTIFICATION IN BAYESIAN ANALYSIS 253 Note that we can
write the system in (9.2) as (9.14) y. = Zo. &. + . , m gm m \ Un / where the
Z's and 8's have been defined in connection with (9.4). Equation 9.14 is in
the form of the "seemingly unrelated" regression model. Thus, given initial
conditions, and employing the following diffuse prior pdf � e analysis of
the system in (9.14) goes through in exactly the same fashion as that for the
seemingly unrelated regression model. In particular, the con- ditional
posterior pdf for , given , will be multivariate normal with mean (9.16) E( I
Y, Yo, Z) = [Z'(Z -x I,)Z]-xZ'(Z -x I0Y and conditional covariance matrix
(9.17) V(]L Yo, Z) = [Z'(Z - IOZ] -x, where Z denotes the block diagonal
matrix on the rhs of (9.14) and y, the vtor on the lhs of (9.14). A large-
ample analysis based on these conditional results can be performed by using
a consistent sample estimate of Z. v As regds finite sample results, the joint
posterior pdf for the elements of is wi (9.19) A = '. � L(y. ... - - The
posterior pdt in (9.18) is in exactly the same form as that for the coef-
ficients of the seemingly unrelated regression model. Further work is
required to produce useful techniques for obtaining marginal pdf's
associated with the pdf in (9.18). 9.3 THE CONCEPT OF
IDENTIFICATION IN BAYESIAN ANALYSIS At this point we turn to a
consideration of the concept of identification. At the outset, just as in
sampling theory, it is important to emphasize that the identification problem
is not peculiar to simultaneous equation models 7 In addition, the results
presented in Sections 9.5 and 9.6 can be specialized to apply to triangular
systems.
254 SIMULTANEOUS EQUATION ECONOMETRIC MODELS but
arises in connection with all statistical models; that is, in sampling theory
terms, if the observations y are assumed to be generated by a pdf p(y[0),
where 0 is a parameter vector, we may ask if there is another vector of
parameters, say , such that p(y[0) = p(ylq,). If this is the case, it is clear that
there will be a problem in deciding from the information in any sample
whether the model generating the observations is p(yt$) or P(YI); that is
there will be a problem in identifying the model generating the
observations. In this situation the model is said to be not identified.
Equivalently, there will be a problem in determining whether the parameters
are 0 or ; that is, the parameters are not identified. Usually in the sampling
theory framework exact restrictions are imposed on the parameters of a
model to achieve identification; for example, certain elements of 0 may be
assumed to be zero. If we denote the restricted parameter vector by 0r and
p(yl0r) P(YI) for all not identical to 0, the model p(y100 is identified or,
equivalently.,' the parameter vector 0r is identified. Since prior information
in the Bayesian framework need not necessarily take the form of exact
restrictions but can be introduced flexibly by use of prior pdf's, there is a
need to broaden the concept of identification to allow for the more general
kind of prior information used in Bayesian analysis. It is this problem that
we now consider. Suppose that we have two models, say M₁ with its parameter vector θ₁ and prior pdf p(θ₁|M₁) and M₂ with its parameter vector θ₂ and prior pdf p(θ₂|M₂). If both models, along with their associated prior information, lead to exactly the same marginal pdf for a set of observations,⁸ we shall say, by definition, that M₁ and its associated prior information and M₂ and its associated prior information are observationally equivalent and we are unable to use data to discriminate between them. Alternatively phrased, M₁ with its prior information is not identified in relation to M₂ with its prior information. Explicitly, we have

(9.20) p(y, θ₁|M₁) = p(y|θ₁, M₁)p(θ₁|M₁),

where y is a vector of observations, p(y, θ₁|M₁) is the joint pdf for y and θ₁, given model M₁, p(y|θ₁, M₁) is the pdf for y, given θ₁ and M₁, and p(θ₁|M₁) is the prior pdf for the parameter vector θ₁ associated with M₁. Then the marginal pdf for y is

(9.21) p(y|M₁) = ∫ p(y|θ₁, M₁)p(θ₁|M₁) dθ₁.

Similarly, for model M₂ with its associated parameter vector θ₂,

(9.22) p(y|M₂) = ∫ p(y|θ₂, M₂)p(θ₂|M₂) dθ₂.

⁸ See Chapter 2 for a definition of the marginal pdf for the observations.

Then M₁ and M₂ and their respective prior information are defined as observationally equivalent if, and only if,

(9.23) p(y|M₁) = p(y|M₂).

If the condition in (9.23) holds, we cannot decide whether M₁ and the information in p(θ₁|M₁) or M₂ and the information in p(θ₂|M₂) "explain" the data. Both have exactly the same implications for the distribution of the observations. In the special case of
simultaneous equation models let M₁ be⁹

(9.24) YΓ₁ = XB₁ + U₁,

with disturbance covariance matrix Σ₁, and let M₂ be

(9.25) YΓ₁A = XB₁A + U₁A,

where A is any nonsingular matrix, or

(9.26) YΓ₂ = XB₂ + U₂,

with disturbance covariance matrix Σ₂ and where Γ₂ = Γ₁A, B₂ = B₁A, U₂ = U₁A, and Σ₂ = A'Σ₁A. Under these conditions, which involve no restrictions on coefficients or covariance matrix elements, we have, as is well known,

(9.27) p(Y|θ₁, M₁) ≡ p(Y|θ₂, M₂),

where θ₁ = (Γ₁, B₁, Σ₁) and θ₂ = (Γ₂, B₂, Σ₂). Given the identity in (9.27), let us see what happens when we use "non-informative" prior pdf's for the parameters and assume that our non-informative or diffuse prior pdf's are chosen to be invariant¹⁰ to the class of transformations being considered above, namely, Γ₂ = Γ₁A, B₂ = B₁A, and Σ₂ = A'Σ₁A. In this case our prior pdf for θ₁ will be in exactly the same form as our prior pdf for θ₂.

⁹ The notation and distributional assumptions introduced in Section 9.1 are employed below.
¹⁰ See H. Jeffreys, Theory of Probability (3rd ed.), Oxford: Clarendon, p. 179 ff., and J. Hartigan, "Invariant Prior Distributions," Ann. Math. Statist., 35 (1964), 836-845, and Appendix 2.1, for a discussion of invariance theory and procedures for obtaining invariant noninformative prior pdf's. As an example of invariance, suppose that our prior pdf for Σ₁ is p(Σ₁) ∝ |Σ₁|^{-(m+1)/2}, a pdf used above. Now consider Σ₂ = A'Σ₁A, with A nonsingular, as a transformation of the distinct elements of Σ₁ to those of Σ₂. The Jacobian of this transformation is |A|^{-(m+1)}. Thus p(Σ₂) ∝ |Σ₂|^{-(m+1)/2}. It is seen that our noninformative prior pdf for Σ₂ is in exactly the same form as that for Σ₁, a result that follows from the fact that the particular prior pdf employed, namely, p(Σ₁) ∝ |Σ₁|^{-(m+1)/2}, is an invariant one provided by Jeffreys' invariance theory. If improper prior pdf's are employed, the integrals in (9.28) and (9.29) will not in general converge, indicating that not enough prior information has been introduced to obtain proper marginal pdf's for the observations. Below, we assume a large finite range for the parameters, which makes Jeffreys' prior pdf's proper.
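The identity in (9.27) can be checked numerically. The sketch below is a hypothetical Python illustration (the parameter values and the nonsingular matrix A are invented for the example), using the likelihood in the form given later in (9.83); the two log likelihood values it prints are identical:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 50, 2, 3
X = rng.normal(size=(n, k))
Gamma1 = np.array([[1.0, -0.6], [0.4, 1.0]])      # nonsingular
B1 = rng.normal(size=(k, m))
Sigma1 = np.array([[1.0, 0.3], [0.3, 0.8]])
U = rng.multivariate_normal(np.zeros(m), Sigma1, size=n)
Y = (X @ B1 + U) @ np.linalg.inv(Gamma1)          # data satisfying Y Gamma1 = X B1 + U

def loglik(G, B, S):
    # log of |S|^{-n/2} |G|^n exp(-0.5 tr[(YG - XB)'(YG - XB) S^{-1}]), as in (9.83)
    E = Y @ G - X @ B
    return (-0.5 * n * np.log(np.linalg.det(S))
            + n * np.log(abs(np.linalg.det(G)))
            - 0.5 * np.trace(E.T @ E @ np.linalg.inv(S)))

A = np.array([[2.0, 1.0], [0.5, 1.5]])            # any nonsingular matrix
print(loglik(Gamma1, B1, Sigma1))                 # the two printed values agree,
print(loglik(Gamma1 @ A, B1 @ A, A.T @ Sigma1 @ A))  # illustrating (9.27)
```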
Then

(9.28) p(Y|M₁) = ∫ p(Y|θ₁, M₁)p(θ₁|M₁) dθ₁

and

(9.29) p(Y|M₂) = ∫ p(Y|θ₂, M₂)p(θ₂|M₂) dθ₂

will be identical in view of the relation in (9.27) and our assumptions about the way in which the noninformative or diffuse prior pdf's were obtained. Of course, with informative prior pdf's in general, the pdf's in (9.28) and (9.29) will not be identical, and we would state that M₁ and its associated prior information are not observationally equivalent to M₂ and its associated prior information. Thus, in connection with the general linear simultaneous equation model, YΓ = XB + U, it is the case that prior information must be introduced to identify it. On the question of how strongly the prior information identifies M₁ in relation to M₂, we suggest use of the information theory quantity J, the "divergence,"¹¹ as a possible measure; that is,

(9.30) J = ∫ [p(Y|M₁) - p(Y|M₂)] log [p(Y|M₁)/p(Y|M₂)] dY.
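As a purely illustrative sketch of how (9.30) can be evaluated, the following Python fragment computes J by quadrature for two hypothetical univariate marginal data densities (normal pdf's with invented means and scales); J vanishes exactly when the two marginal pdf's coincide:

```python
import numpy as np

# Two hypothetical marginal data densities p(y|M1), p(y|M2): normal pdf's with
# different means and scales (purely illustrative choices).
y = np.linspace(-12.0, 12.0, 4001)
p1 = np.exp(-0.5 * y ** 2) / np.sqrt(2 * np.pi)                         # N(0, 1)
p2 = np.exp(-0.5 * ((y - 0.5) / 1.2) ** 2) / (1.2 * np.sqrt(2 * np.pi))  # N(0.5, 1.44)

d = y[1] - y[0]
J = ((p1 - p2) * np.log(p1 / p2)).sum() * d      # the divergence (9.30)
print("J =", J)   # J = 0 if and only if the two marginal pdf's coincide
```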
This approach to the identification problem can be extended easily to apply to posterior pdf's for parameters; for example, if the assumptions that produce equality of (9.28) and (9.29) hold, we shall see that the posterior pdf's for θ₁ and θ₂ are indistinguishable. For posterior pdf's we have

(9.31) p(θ₁|Y, M₁) = p(θ₁|M₁)p(Y|θ₁, M₁)/p(Y|M₁)

and

(9.32) p(θ₂|Y, M₂) = p(θ₂|M₂)p(Y|θ₂, M₂)/p(Y|M₂).

From (9.27), the equality of (9.28) and (9.29), and the fact that p(θ₁|M₁) dθ₁ = p(θ₂|M₂) dθ₂, given Jeffreys' invariant prior pdf's, (9.31) and (9.32) are exactly the same with respect to form and distributional parameters. Thus there is no basis for knowing whether the posterior pdf relates to θ₁ or θ₂, and this means that there is an identification problem.

¹¹ See Jeffreys, ibid., pp. 179 ff., and S. Kullback, Information Theory and Statistics, New York: Wiley, 1959, pp. 6 ff., for a discussion of the properties of this measure.

We have discussed the identification problem in general terms applicable to a wide range of statistical models, including simultaneous equation models as a special case. Since the identification problem for simultaneous equation models is often considered in terms of the relations between reduced form parameters and structural parameters, we consider this approach in Bayesian terms by making use of the valuable work of Drèze.¹² The reduced-form system associated with the simultaneous equation model YΓ = XB + U is given by
(9.33) Y = XΠ + V,

where Π = BΓ⁻¹ and V = UΓ⁻¹. Let us denote the reduced-form disturbance covariance matrix by Ω = (Γ⁻¹)'ΣΓ⁻¹ and assume initially that Ω is known. Then, viewing Π and Γ as our unknown parameters, we have for the joint posterior pdf

(9.34) p(Γ, Π|Y) ∝ p(Γ, Π)l(Π|Y),

where p(Γ, Π) is our prior pdf and l(Π|Y) is the likelihood function. From (9.34) it is seen immediately that¹³

(9.35) p(Γ|Π, Y) ∝ p(Γ|Π);

that is, that the conditional posterior pdf for Γ, given Π, is proportional to the conditional prior pdf for Γ, given Π. This result, due to Drèze, shows that since the conditional prior pdf p(Γ|Π) is unaffected by the observations, the conditional posterior pdf p(Γ|Π, Y) can be affected only by prior information. It is the prior pdf for Π, p(Π), that gets modified by the sample information,¹⁴ and, of course, the sample information adds to our knowledge of Π.

In the traditional approach exact restrictions on Γ, B, and Σ are introduced to achieve identification. By limiting ourselves to the case in which exact restrictions are placed on Γ and B, we can distinguish three cases, namely, (a) too few restrictions to get a unique set of values for the elements of Γ and B, given the elements of Π and the restrictions, the case of "underidentification"; (b) the case of just an adequate number of restrictions to achieve unique values for the elements of Γ and B, given Π and the restrictions, the case of "just-identification"; and (c) too many restrictions to solve for the elements of Γ and B uniquely, given Π and the restrictions, the case of "overidentification." In Bayesian terms, following Drèze's exposition, with nonstochastic a priori restrictions achieving just-identification, case (b), and the prior on Π, p(Π) ∝ const, p(Γ|Π) has all its mass concentrated at a single point.¹⁵ If one (or more generally r) of these restrictions is dropped, we have (a), underidentification, in which case p(Γ|Π) would not have its mass concentrated at a single point but rather uniformly spread out over a line, or more generally an r-dimensional hyperplane, and there would be one, or r, dimensions in which the posterior pdf p(Γ, Π|Y) or p(Γ, B|Y) would become uniform. Finally, if we add one (or r) additional a priori restrictions, the prior pdf p(Π) could not be uniform because not all values of Π are free, given the prior exact restrictions. Examples illustrating aspects of these cases are provided below.

¹² J. Drèze, "The Bayesian Approach to Simultaneous Equations Estimation," Research Memorandum No. 67, Technological Institute, Northwestern University, 1962.
¹³ Note that p(Γ, Π|Y) = p(Γ|Π, Y)p(Π|Y) and p(Γ, Π) = p(Γ|Π)p(Π).
¹⁴ If the reduced-form disturbance covariance matrix Ω is unknown and our prior pdf is p(Γ, Π, Ω) = p(Γ, Π)p(Ω), the result mentioned above follows immediately. The assumption p(Ω|Γ, Π) = p(Ω|Π), that is, that a priori Ω is independent of Γ for given Π, can also be shown to lead to this result. Although this will not be true in general, it is true when we have a diffuse prior pdf for Ω.
¹⁵ Note that linear restrictions on Γ and B imply a set of bilinear restrictions on Γ and Π, since Π = BΓ⁻¹. Thus we can carry forward the discussion in terms of Γ and Π rather than in terms of Γ and B.
9.4 ANALYSIS OF PARTICULAR SIMULTANEOUS EQUATION MODELS

The first model to be analyzed is the simple Haavelmo consumption model,¹⁶

(9.36) cₜ = β + αyₜ + uₜ,
(9.37) yₜ = cₜ + zₜ,          t = 1, 2, ..., T,

where cₜ and yₜ are endogenous variables--per capita price-deflated consumption and disposable income, respectively--zₜ is an exogenous variable, "autonomous expenditures," and uₜ is a disturbance term. We assume that the uₜ's have zero means and common variance σ² and are normally and independently distributed. With these assumptions, the likelihood function for the model is¹⁷

(9.38) l(α, β, σ|data) ∝ |1 - α|^T σ^{-T} exp[-(1/2σ²) Σ (cₜ - β - αyₜ)²],

with the summation extending from t = 1 to t = T. As prior assumptions, we shall go forward with the following prior pdf:

(9.39) p(α, β, σ) ∝ 1/σ,

where -∞ < β < ∞, 0 < α < 1, and 0 < σ < ∞. Note that we are asserting a priori that α, β, and log σ are uniformly and independently distributed and are putting in the information that α, the marginal propensity to consume, is restricted to the interval 0 to 1. This is an example of how inequality constraints can be introduced into an analysis.¹⁸

¹⁶ T. Haavelmo, "Methods of Measuring the Marginal Propensity to Consume," Journal of the American Statistical Association, 42, 105-122 (1947), reprinted as Chapter 4 in Wm. C. Hood and T. C. Koopmans, Studies in Econometric Method, New York: Wiley, 1953. This model has been analyzed from the Bayesian point of view by T. J. Rothenberg, "A Bayesian Analysis of Simultaneous Equation Systems," cit. supra, and by V. K. Chetty, "Bayesian Analysis of Some Simultaneous Equation Models and Specification Errors," unpublished doctoral dissertation, University of Wisconsin, Madison, 1966, and "Bayesian Analysis of Haavelmo's Models," Econometrica, 36, 582-602 (1968).
¹⁷ From (9.36) and (9.37) we have cₜ = β + α(cₜ + zₜ) + uₜ. Thus the Jacobian of the transformation from each uₜ to each cₜ is |1 - α|. Since there are T such transformations, the factor |1 - α|^T appears in (9.38).
Combining the likelihood function in (9.38) with the prior pdf in (9.39), we obtain the posterior pdf for the parameters:

(9.40) p(α, β, σ|data) ∝ [(1 - α)^T / σ^{T+1}] exp[-(1/2σ²) Σ (cₜ - β - αyₜ)²].

If interest centers on making inferences about the marginal propensity to consume, α, we can integrate (9.40) with respect to β and σ to obtain the marginal posterior pdf for α, which is¹⁹

(9.41) p(α|data) ∝ (1 - α)^T {Σ [cₜ - c̄ - α(yₜ - ȳ)]²}^{-(T-1)/2}
              ∝ (1 - α)^T {νs² + (α - α̂)² Σ (yₜ - ȳ)²}^{-(T-1)/2},     0 < α < 1,

where c̄ = Σ cₜ/T, ȳ = Σ yₜ/T, ν = T - 2, νs² = Σ [cₜ - c̄ - α̂(yₜ - ȳ)]², and

(9.42) α̂ = Σ (cₜ - c̄)(yₜ - ȳ) / Σ (yₜ - ȳ)²,

the least squares quantity obtained from a regression of cₜ on yₜ. On viewing (9.41), we see that it is the product of one factor in a truncated univariate Student t form centered at the least squares quantity α̂, given in (9.42), if 0 < α̂ < 1, and a second Jacobian factor (1 - α)^T. The factor (1 - α)^T pushes the center of the posterior pdf toward zero and thus may be interpreted roughly in sampling theory terms as compensating for Haavelmo's finding that plim α̂ > α, an interpretation put forward by Rothenberg.
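The marginal posterior pdf (9.41) is simple to evaluate on a grid. The following Python sketch uses simulated data in place of Haavelmo's series (the parameter values and data-generating choices are hypothetical), normalizes the kernel numerically, and computes the posterior mean, which falls below the least squares quantity because of the (1 - α)^T factor:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 13
z = rng.normal(10.0, 2.0, T)            # hypothetical "autonomous expenditures"
u = rng.normal(0.0, 1.0, T)
alpha_true, beta_true = 0.7, 100.0
c = (beta_true + alpha_true * z + u) / (1.0 - alpha_true)  # solve (9.36)-(9.37) for c_t
y = c + z

cbar, ybar = c.mean(), y.mean()
alpha_hat = ((c - cbar) * (y - ybar)).sum() / ((y - ybar) ** 2).sum()  # (9.42)

alpha = np.linspace(0.001, 0.999, 2000)
ssq = (((c - cbar)[None, :] - alpha[:, None] * (y - ybar)[None, :]) ** 2).sum(axis=1)
log_post = T * np.log(1.0 - alpha) - 0.5 * (T - 1) * np.log(ssq)  # kernel of (9.41)
post = np.exp(log_post - log_post.max())
d = alpha[1] - alpha[0]
post /= post.sum() * d                   # normalize numerically

print("least squares alpha_hat:", alpha_hat)
print("posterior mean of alpha:", (alpha * post).sum() * d)
```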
Using Haavelmo's annual data, 1929-1941 (T = 13), Chetty computed the posterior pdf in (9.41), with results shown in Figure 9.1.²⁰ The mean of the posterior pdf for α was calculated to be 0.660, which is somewhat below α̂, the least squares quantity in (9.42), computed by Haavelmo to be 0.732. In addition, since we have the complete posterior pdf for α, posterior probability statements can be made readily. Further, the posterior pdf for the Keynesian "multiplier," 1/(1 - α), can be derived from p(α|data), shown in (9.41).²¹ This posterior pdf for the multiplier will incorporate the prior information that the marginal propensity to consume is restricted to the range zero to one as well as the sample information, and, of course, if additional prior information about α is introduced, it too will be reflected in the posterior pdf for the multiplier.

[Figure 9.1 Marginal posterior distributions of α and β. (a) E(α) = 0.660; E(α - E(α))² = 0.004; E(α - E(α))³ = 0.001; E(α - E(α))⁴ = 0.0002. (b) E(β) = 111.589; E(β - E(β))² = 129.642; E(β - E(β))³ = 71.3034; E(β - E(β))⁴ = 49283.172.]

¹⁸ If we wished, we could introduce other than flat prior pdf's for α and log σ without greatly modifying the following analysis; for example, we might use a beta pdf for α and an inverted gamma pdf for σ. See Chetty, loc. cit., for the results of computations that employ a beta pdf as the prior pdf for α, the marginal propensity to consume.
¹⁹ The integration with respect to β is done by completing the square on β in the exponent and using properties of the univariate normal pdf. Then the integration with respect to σ can be performed by using properties of the inverted gamma pdf.
²⁰ The marginal posterior pdf for β, obtained by a bivariate numerical integration, is also shown in Figure 9.1.
²¹ Using the above assumptions and assuming that the uₜ's are generated by a first order autoregressive model, uₜ = ρuₜ₋₁ + εₜ, Chetty computed conditional posterior pdf's, p(α|ρ, data), and marginal posterior pdf's, p(α|data) and p(ρ|data), using Haavelmo's data. These results indicated that inferences about α were somewhat sensitive to departures of ρ from zero. In fact, the posterior pdf for ρ was highly concentrated about 0.898, its mean.
Just as in current discussions revolving about the Friedman-Meiselman testing of the "income-expenditure" and "quantity-theory" models,²² Haavelmo in his 1947 paper was concerned about the assumption that zₜ in (9.37) is an exogenous variable. To investigate this assumption he formulated two broader models in which the assumption was relaxed. Here we consider an analysis of his third model,²³ whose equations are

(9.43) cₜ = β + αyₜ + u₁ₜ,
(9.44) rₜ = ν + μ(cₜ + xₜ) + u₂ₜ,          t = 1, 2, ..., T.
(9.45) yₜ = cₜ + xₜ - rₜ,

In this model (9.43) is a consumption function exactly the same as that in his first model. Equation 9.44 is a "gross business saving" equation which relates gross business saving, rₜ, to the total "gross disposable income," cₜ + xₜ, of the private sector, in which xₜ is gross investment.²⁴ Finally, Haavelmo considers (9.45) as an accounting identity, assumes that xₜ is exogenous, and introduces disturbance terms u₁ₜ and u₂ₜ. In this system cₜ, yₜ, and rₜ are endogenous variables. Note that by definition we have zₜ = xₜ - rₜ, where zₜ is the variable appearing in (9.37). Thus, if in (9.44) μ = 0 and u₁ₜ and u₂ₜ are independently distributed, then zₜ would be exogenous, as assumed in the first model. Studying how departures from these assumptions affect inferences about parameters of the model is an excellent way of assessing the implications of the assumption that zₜ is exogenous.

If we substitute from (9.45) into (9.43), we obtain

(9.46) cₜ = β + α(cₜ + xₜ - rₜ) + u₁ₜ,
(9.47) rₜ = ν + μ(cₜ + xₜ) + u₂ₜ,          t = 1, 2, ..., T,

as a two equation model in two endogenous variables, cₜ and rₜ, and one exogenous variable, xₜ. We assume that u₁ₜ and u₂ₜ have a bivariate normal pdf with zero mean vector and covariance matrix Σ, a 2 × 2 positive definite symmetric matrix. Further, the disturbance terms are assumed temporally independently distributed.

²² M. Friedman and D. Meiselman, "The Relative Stability of Velocity and the Investment Multiplier," in The Commission on Money and Credit Volume, Stabilization Policies, Englewood Cliffs, N.J.: Prentice-Hall, 1963.
²³ A brief analysis of his second model appears in A. Zellner, "Bayesian Inference and Simultaneous Equation Econometric Models," cit. supra. The analysis of the third model was initiated in the author's lecture notes in 1963 and studied further in V. K. Chetty's work, cit. supra.
²⁴ All variables are price-deflated and on a per capita basis. See Haavelmo's paper for further details of the definitions of the variables and their relation to the national income accounts.

Under these assumptions the likelihood function for the model is given by

(9.48) l(θ, Σ|data) ∝ J^T |Σ⁻¹|^{T/2} exp(-½ tr U'UΣ⁻¹),

where J is the Jacobian of the transformation from the u's to the c's and r's:

(9.49) J = 1 - α(1 - μ),

U = (u₁ ⋮ u₂) is a T × 2 matrix related to the structural parameters and observations by (9.46) and (9.47), and θ' = (α, β, μ, ν) is a vector of structural coefficients. As prior assumptions we assume not only that 0 < α < 1 but also that 0 < μ < 1, since μ is the marginal propensity to save on the part of the private business sector. For present purposes we also assume that

(9.50) p(θ, Σ⁻¹) ∝ |Σ⁻¹|^{-3/2};

that is, the elements of θ are uniformly and independently distributed²⁵ and we have diffuse prior information about the distinct elements of Σ or, equivalently, about those of Σ⁻¹. With these prior assumptions the joint posterior pdf for the parameters is

(9.51) p(θ, Σ⁻¹|data) ∝ J^T |Σ⁻¹|^{(T-3)/2} exp(-½ tr U'UΣ⁻¹).

On integrating with respect to Σ⁻¹, we obtain

(9.52) p(θ|data) ∝ [1 - α(1 - μ)]^T / |U'U|^{T/2}.
Then a further integration with respect to the intercept parameters, β and ν, yields²⁶

(9.53) p(α, μ|data) ∝ [1 - α(1 - μ)]^T / |Û'Û|^{(T-2)/2},          0 < α, μ < 1,

where Û = (û₁ ⋮ û₂) is a T × 2 matrix in which û₁ and û₂ have typical elements given by

(9.54) û₁ₜ = cₜ - c̄ - α[cₜ - c̄ + xₜ - x̄ - (rₜ - r̄)]   and   û₂ₜ = rₜ - r̄ - μ(cₜ - c̄ + xₜ - x̄),

respectively. In the definitions of û₁ₜ and û₂ₜ, c̄, x̄, and r̄ are sample means.

Using Haavelmo's data and bivariate numerical integration techniques, we calculated marginal posterior pdf's for α and μ from (9.53). Both marginal pdf's are unimodal. That for α had a mean of 0.705 and a variance of 0.00137, whereas that for μ had a mean of 0.158 and a variance of 0.00050. Shown in Figure 9.2 are the contours of the joint posterior pdf. The results strongly suggest that the parameter μ has a non-zero value. Also, it should be noted that (9.53) can be employed to study how sensitive inferences about α are to assumptions about μ; that is, if it is assumed that μ = μ₀, where μ₀ is a given value, numerical methods can be employed to analyze (9.53) under this assumption. Further, if it is assumed that the disturbance terms in (9.46) and (9.47) are uncorrelated (σ₁₂ = 0) and if diffuse prior assumptions are introduced for σ₁₁ and σ₂₂, posterior pdf's for α and μ can readily be obtained under the prior assumptions that we have made about other parameters, assumptions that incorporate prior information regarding the ranges of α and μ.

To illustrate some aspects of systems that are "overidentified" in the traditional sense we consider the following simple model:

(9.55) y₁ₜ = γy₂ₜ + u₁ₜ,
(9.56) y₂ₜ = β₁x₁ₜ + β₂x₂ₜ + u₂ₜ,          t = 1, 2, ..., T,

where y₁ₜ and y₂ₜ are endogenous variables, x₁ₜ and x₂ₜ are exogenous variables, γ, β₁, and β₂ are scalar parameters, and u₁ₜ and u₂ₜ are disturbance terms with covariance matrix Σ, a 2 × 2 full matrix. The reduced form system is

(9.57) y₁ₜ = π₁₁x₁ₜ + π₁₂x₂ₜ + v₁ₜ,
(9.58) y₂ₜ = π₂₁x₁ₜ + π₂₂x₂ₜ + v₂ₜ,          t = 1, 2, ..., T,

with

(9.59) π₁₁ = γβ₁,   π₁₂ = γβ₂,   π₂₁ = β₁,   π₂₂ = β₂.

²⁵ The analysis can be carried through with nonuniform prior pdf's for α and μ without much difficulty.
²⁶ β and ν appear just in the denominator of (9.52). Note that |U'U| = u₁'u₁u₂'u₂ - (u₁'u₂)². From (9.46), with w₁ₜ ≡ cₜ - α(cₜ + xₜ - rₜ), we have u₁ₜ = w₁ₜ - w̄₁ - (β - w̄₁), where w̄₁ = Σₜ w₁ₜ/T. Similarly, from (9.47), with w₂ₜ ≡ rₜ - μ(cₜ + xₜ), we have u₂ₜ = w₂ₜ - w̄₂ - (ν - w̄₂), where w̄₂ = Σₜ w₂ₜ/T. Then straightforward algebra yields

(i) u₁'u₁u₂'u₂ - (u₁'u₂)² = m₁₁m₂₂ - m₁₂² + Tm₂₂(β - w̄₁)² + Tm₁₁(ν - w̄₂)² - 2Tm₁₂(β - w̄₁)(ν - w̄₂),

where mᵢⱼ = Σₜ (wᵢₜ - w̄ᵢ)(wⱼₜ - w̄ⱼ), i, j = 1, 2. On substituting from (i) in (9.52), the integration with respect to β and ν may be performed by using properties of the bivariate Student t pdf to yield (9.53), with m₁₁m₂₂ - m₁₂² = |Û'Û|. The posterior pdf given in Chetty, op. cit., p. 593, for this problem is incorrect.
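The bivariate numerical integration of (9.53) described above can be sketched as follows in Python; the series c, x, and r are hypothetical stand-ins for Haavelmo's data, and the grids and seeds are arbitrary choices for the illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 13
c = rng.normal(480.0, 30.0, T)   # hypothetical consumption series
x = rng.normal(100.0, 15.0, T)   # hypothetical gross investment series
r = rng.normal(60.0, 10.0, T)    # hypothetical gross business saving series

alpha = np.linspace(0.01, 0.99, 200)
mu = np.linspace(0.01, 0.99, 200)
A, M = np.meshgrid(alpha, mu, indexing="ij")

# Typical elements (9.54), in deviation-from-mean form, at each grid point.
d1 = c - c.mean()
d2 = (c - c.mean()) + (x - x.mean()) - (r - r.mean())
d3 = r - r.mean()
d4 = (c - c.mean()) + (x - x.mean())
u1 = d1[None, None, :] - A[:, :, None] * d2[None, None, :]
u2 = d3[None, None, :] - M[:, :, None] * d4[None, None, :]
det = (u1 * u1).sum(-1) * (u2 * u2).sum(-1) - ((u1 * u2).sum(-1)) ** 2  # |U'U| hat

log_post = T * np.log(1.0 - A * (1.0 - M)) - 0.5 * (T - 2) * np.log(det)  # (9.53)
post = np.exp(log_post - log_post.max())

da, dm = alpha[1] - alpha[0], mu[1] - mu[0]
p_alpha = post.sum(axis=1) * dm; p_alpha /= p_alpha.sum() * da
p_mu = post.sum(axis=0) * da;    p_mu /= p_mu.sum() * dm
print("posterior mean of alpha:", (alpha * p_alpha).sum() * da)
print("posterior mean of mu:   ", (mu * p_mu).sum() * dm)
```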

It is apparent that the relations in (9.59) imply that γ = π₁₁/π₂₁ = π₁₂/π₂₂ or

(9.60) π₁₁π₂₂ - π₁₂π₂₁ = 0,

which is a condition on the π's. Thus not all four π's are capable of independent variation, given that (9.60) is to be employed as prior information in our analysis. Care must be exercised in the choice of a prior pdf for the π's in this situation. In larger systems restrictions on the structural coefficients usually imply quite a few restrictions on reduced-form coefficients, and these should be taken into account in analyzing such systems. Of course, if our prior pdf is in terms of the structural parameters Γ, B, and Σ, the analysis can go forward, as above, without bringing in reduced-form parameters.

[Figure 9.2 Contours of the joint posterior pdf for α and μ computed from (9.53).]
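A small numerical sketch makes the point concrete: at the true parameter values the reduced-form coefficients satisfy (9.60) exactly, whereas unrestricted least squares estimates of the π's will not, in general. The data-generating values below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(5)
T, gamma, b1, b2 = 100, 2.0, 0.5, -0.8
X = rng.normal(size=(T, 2))                                   # x1, x2
u = rng.multivariate_normal([0, 0], [[1.0, 0.3], [0.3, 1.0]], size=T)
y2 = X @ np.array([b1, b2]) + u[:, 1]
y1 = gamma * y2 + u[:, 0]

Pi_true = np.array([[gamma * b1, b1], [gamma * b2, b2]])      # columns: y1, y2 equations
print("restriction (9.60) at true values:", np.linalg.det(Pi_true))  # exactly 0

Pi_hat = np.linalg.solve(X.T @ X, X.T @ np.column_stack([y1, y2]))   # unrestricted LS
print("restriction (9.60) at LS estimates:", np.linalg.det(Pi_hat))  # nonzero in general
```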
In the next section we present Bayesian analogues to single-equation sampling theory estimation approaches. In this single-equation approach we take account only of the a priori identifying information pertaining to parameters of the equation under consideration. Such analysis will be useful, for example, when we are uncertain about the formulation of some or all of the other structural equations of a model or when we wish to build into our analysis just that part of the identifying prior information relating to the parameters of one equation.

9.5 "LIMITED INFORMATION" BAYESIAN ANALYSIS

Consider a particular structural equation of an m-equation
model, say the first:

(9.61) y₁ = Y₁γ₁ + X₁β₁ + u₁,

where y₁ is an n × 1 vector of observations on the endogenous variable whose coefficient is set equal to one by our normalization, Y₁ is an n × m₁ matrix of observations on m₁ other endogenous variables appearing in the first equation with coefficients not assumed equal to zero, X₁ is an n × k₁ matrix of observations on k₁ predetermined variables appearing in the first equation with coefficients not assumed equal to zero, γ₁ and β₁ are m₁ × 1 and k₁ × 1 coefficient vectors, respectively, and u₁ is an n × 1 vector of serially uncorrelated normal disturbance terms, each with mean zero and variance σ₁₁. We assume that the parameters of (9.61) are identified by virtue of the restrictions imposed.²⁷ The reduced-form equations for (y₁ ⋮ Y₁) are given by

(9.62) (y₁ ⋮ Y₁) = (X₁ ⋮ X₀)[π₁  Π₀₁; π₀  Π₀₀] + (v₁ ⋮ V₁),

where X = (X₁ ⋮ X₀) is an n × k matrix of observations with rank k on the k predetermined variables of the system, with X₁ an n × k₁ matrix of observations on the k₁ predetermined variables appearing in (9.61), and X₀ an n × k₀ matrix of observations on k₀ predetermined variables excluded from (9.61) on a priori grounds. The partitioned reduced-form coefficient matrix in (9.62), written above with rows separated by a semicolon, has elements π₁, a k₁ × 1 vector, π₀, a k₀ × 1 vector, Π₀₁, a k₁ × m₁ matrix, and Π₀₀, a k₀ × m₁ matrix. The n × (m₁ + 1) matrix (v₁ ⋮ V₁) contains reduced-form disturbance terms, where v₁ is an n × 1 vector and V₁ is an n × m₁ matrix. On postmultiplying both sides of (9.62) by (1 ⋮ -γ₁')' and equating the coefficients with those appearing in (9.61), we obtain

(9.63) π₁ - Π₀₁γ₁ = β₁

and

(9.64) π₀ - Π₀₀γ₁ = β₀ = 0,

²⁷ That is, we can write (9.61) as y₁ - Y₁γ₁ - Y₀γ₀ = X₁β₁ + X₀β₀ + u₁ with γ₀ = 0 and β₀ = 0, where Y = (y₁ ⋮ Y₁ ⋮ Y₀), an n × m matrix of observations on all endogenous variables, and X = (X₁ ⋮ X₀), an n × k matrix of observations on all predetermined variables. The identifying restrictions are γ₀ = 0 and β₀ = 0.

where β₀ is assumed a priori to be a zero vector. Using (9.63) and (9.64), we can express the reduced-form equations, given in (9.62), as

(9.65a) y₁ = (X₁Π₀₁ + X₀Π₀₀)γ₁ + X₁β₁ + v₁ = XΠ₀γ₁ + X₁β₁ + v₁

and

(9.65b) Y₁ = XΠ₀ + V₁,

where Π₀' = (Π₀₁' ⋮ Π₀₀'). We have brought the reduced-form system (9.62) into the form of (9.65) to illustrate that for given Π₀, say Π₀ = Π̂₀ = (X'X)⁻¹X'Y₁, (9.65a) is in the form of a multiple regression model. With a diffuse prior pdf for the elements of γ₁ and β₁ and on the common variance of the elements of v₁, assumed to be normally and independently distributed, each with zero mean, the conditional posterior pdf for γ₁ and β₁ is in the multivariate Student t form with mean vector given by

(9.66) δ̂₁ = (γ̂₁' ⋮ β̂₁')' = [Π̂₀'X'XΠ̂₀  Π̂₀'X'X₁; X₁'XΠ̂₀  X₁'X₁]⁻¹ [Π̂₀'X'y₁; X₁'y₁],

which is just the two-stage least squares (2SLS) estimate. It must be appreciated, however, that δ̂₁ is a conditional posterior mean, given Π₀ = Π̂₀, and may be a poor approximation to the unconditional posterior mean of δ₁ in small samples.
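A minimal Python sketch of the conditional posterior mean (9.66), computed as the 2SLS quantity for a hypothetical equation with m₁ = 1 and k₁ = 1 (all data-generating values below are invented for the illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 100, 4
X = rng.normal(size=(n, k))                 # all predetermined variables
X1 = X[:, [0]]                              # the one included in the first equation
Pi0 = np.array([[0.8], [-0.5], [0.3], [1.1]])
V1 = rng.normal(size=(n, 1))
Y1 = X @ Pi0 + V1                           # reduced form for the included endogenous variable
u1 = 0.6 * V1 + rng.normal(size=(n, 1))     # structural disturbance, correlated with V1
y1 = 0.5 * Y1 + 1.0 * X1 + u1               # first equation: gamma1 = 0.5, beta1 = 1.0

Pi0_hat = np.linalg.solve(X.T @ X, X.T @ Y1)
Z = np.hstack([X @ Pi0_hat, X1])            # (X Pi0_hat, X1)
delta_hat = np.linalg.solve(Z.T @ Z, Z.T @ y1)   # (9.66): conditional posterior mean = 2SLS
print(delta_hat.ravel())                    # approximately (0.5, 1.0)
```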
Also, it is interesting to observe that since ȳ₁ ≡ y₁ - v₁ = Xπ, where π' = (π₁' ⋮ π₀'), we can write (9.65a) as

(9.67) ȳ₁ = XΠ₀γ₁ + X₁β₁.

Then on premultiplying both sides of (9.67) by (XΠ₀ ⋮ X₁)' and solving for the coefficient vector, we have

(9.68) δ₁ = [Π₀'X'XΠ₀  Π₀'X'X₁; X₁'XΠ₀  X₁'X₁]⁻¹ [Π₀'X'ȳ₁; X₁'ȳ₁],

an algebraic expression for the elements of δ₁ in terms of reduced-form parameters. Further, if we expand the rhs of (9.68) about Π̂₀ = (X'X)⁻¹X'Y₁ and π̂ = (X'X)⁻¹X'y₁, the means of the posterior pdf's for the reduced-form parameters if we use a diffuse prior pdf in connection with the system in (9.62),²⁸ we obtain the following result for the posterior mean of δ₁, which is assumed to exist:

(9.69) E(δ₁|y₁, Y₁) = δ̂₁ + remainder,

where δ̂₁ is the 2SLS quantity in (9.66). The zeroth-order term in this expansion is precisely the 2SLS estimate. It is not known how well the 2SLS estimate approximates the posterior mean of δ₁' = (γ₁' ⋮ β₁') when our prior information about δ₁ is diffuse; however, it may be conjectured that neglect of the remainder in (9.69) can yield a poor approximation when the sample size is small.

To provide an explicit posterior pdf for the parameters of (9.61) we employ an approach similar in certain respects to that put forward by Drèze.²⁹ Combining (9.61) and that part of (9.62) relating to Y₁, we have

(9.70) (y₁ - Y₁γ₁ ⋮ Y₁) = (X₁ ⋮ X₀)[β₁  Π₀₁; β₀  Π₀₀] + (u₁ ⋮ V₁),

where we have already incorporated the prior information that certain endogenous variables do not appear in (9.61) and will later introduce β₀ = 0. Note that the disturbance matrix in (9.70) is given by

(9.71) (u₁ ⋮ V₁) = (v₁ ⋮ V₁)[1  0'; -γ₁  I],

since u₁ = v₁ - V₁γ₁.³⁰ Under the assumption that the rows of (v₁ ⋮ V₁) are normally and independently distributed, each with zero mean vector and (m₁ + 1) × (m₁ + 1) pds covariance matrix Ω₁, the covariance matrix for each row of (u₁ ⋮ V₁) is Ω* = A'Ω₁A, where A is the triangular matrix on the rhs of (9.71). Then the likelihood function for the system in (9.70) is³¹

(9.72) l(γ₁, β, Π₀, Ω*|y₁, Y₁) ∝ |Ω*|^{-n/2} exp[-½ tr(W - XΠ*)'(W - XΠ*)Ω*⁻¹],

where W = (y₁ - Y₁γ₁ ⋮ Y₁), Π* = (β ⋮ Π₀), and β' = (β₁' ⋮ β₀'), with the prior information β₀ = 0 to be incorporated in our prior pdf. Note that for given γ₁ the likelihood function in (9.72) is in a form encountered earlier in the analysis of the multivariate regression model. The prior pdf we shall employ is given by

(9.73) p(γ₁, β₁, Π₀, Ω*|β₀ = 0) ∝ p₁|Ω*|^{-(m*+1)/2},

²⁸ That is, (9.62) is viewed as a normal multivariate regression system, studied in Chapter 8, and a diffuse prior pdf for the elements of (π, Π₀) and the disturbance covariance matrix in the form of (8.7)-(8.9) is used.
²⁹ J. Drèze, "Limited Information Estimation from a Bayesian Viewpoint," Discussion Paper 6816, University of Louvain, 1968. See also A. Zellner, "Bayesian and Non-Bayesian Analyses of Simultaneous Equation Models," paper presented to the Second World Congress of the Econometric Society, Cambridge, 1970.
³⁰ Note from (9.33) that U = VΓ. With U = (u₁ ⋮ u₂ ⋮ ⋯ ⋮ u_m), V = (v₁ ⋮ V₁ ⋮ V₀), and the first column of Γ given by (1 ⋮ -γ₁' ⋮ 0')', u₁ = v₁ - V₁γ₁, as stated above.
³¹ The Jacobian of the transformation from (u₁ ⋮ V₁) in (9.70) to (y₁ ⋮ Y₁) is equal to one.
where p₁ = p₁(γ₁, β₁|β₀ = 0) and m* = m₁ + 1. In (9.73) we are assuming that (γ₁, β₁), Π₀, and Ω* are a priori independently distributed. The elements of Π₀ are assumed to be uniformly and independently distributed, and for Ω* we use a diffuse prior pdf in a form introduced in Chapter 8. Then the posterior pdf is

(9.74) p(γ₁, β, Π₀, Ω*|D, P) ∝ p₁|Ω*|^{-(n+m*+1)/2} exp[-½ tr(W - XΠ*)'(W - XΠ*)Ω*⁻¹],

where D denotes given data and P prior assumptions. In the analysis of (9.74) we first integrate with respect to the elements of Ω* and then with respect to those of Π₀, a submatrix of Π*. The resulting pdf will then be conditionalized to reflect the identifying prior information β₀ = 0. On integrating (9.74) with respect to the elements of Ω*, we obtain

(9.75) p(γ₁, β, Π₀|D, P) ∝ p₁|A + (Π* - Π̂*)'X'X(Π* - Π̂*)|^{-n/2},

where Π̂* = (X'X)⁻¹X'W and A = (W - XΠ̂*)'(W - XΠ̂*). Since (9.75) has the form of a generalized Student t pdf (see Appendix B), we can use this fact to integrate it with respect to the elements of Π₀, a submatrix of Π*, to obtain

(9.76) p(γ₁, β|D, P) ∝ p₁a₁₁^{(n-m₁-k)/2}[a₁₁ + (β - β̂)'X'X(β - β̂)]^{-(n-m₁)/2}.

In (9.76), β̂ = (X'X)⁻¹X'(y₁ - Y₁γ₁) and a₁₁, the (1, 1) element of A, is given by

(9.77) a₁₁ = (v̂₁ - V̂₁γ₁)'(v̂₁ - V̂₁γ₁),

where (v̂₁ ⋮ V̂₁) = (y₁ ⋮ Y₁) - X(π̂ ⋮ Π̂₀).³² With respect to the quadratic form in β in (9.76), we have, with β₀ = 0 imposed,³³

(β - β̂)'X'X(β - β̂) = (δ₁ - δ̂₁)'Ẑ₁'Ẑ₁(δ₁ - δ̂₁),

where Ẑ₁ = (Ŷ₁ ⋮ X₁), with Ŷ₁ = XΠ̂₀, and δ̂₁ is the 2SLS quantity shown in (9.66). Thus (9.76) can be expressed as

(9.78) p(δ₁|D, P) ∝ p₁a₁₁^{-k/2}[1 + (δ₁ - δ̂₁)'H(δ₁ - δ̂₁)]^{-(n-m₁)/2},

where H = Ẑ₁'Ẑ₁/a₁₁. If in (9.78) we take p₁ = p(γ₁, β₁|β₀ = 0) ∝ const, the posterior pdf for δ₁ would be in the multivariate Student t form centered at the quantity δ̂₁, were it not for the fact that a₁₁ depends on γ₁.³⁴ However, since the conditional posterior pdf for β₁, given γ₁, is in the multivariate Student t form, (9.78) can be integrated with respect to the k₁ elements of β₁ to yield the following marginal posterior pdf for γ₁:

(9.79) p(γ₁|D, P) ∝ a₁₁^{-(k-k₁)/2}[1 + (γ₁ - γ̂₁)'H̄(γ₁ - γ̂₁)]^{-(ν+m₁)/2},

where ν = n - 2m₁ - k₁ and H̄ = [Ŷ₁'Ŷ₁ - Ŷ₁'X₁(X₁'X₁)⁻¹X₁'Ŷ₁]/a₁₁. When, as is often the case in applications, γ₁ has just a small number of elements, say one or two, numerical integration techniques can be employed to evaluate the normalizing constant and to analyze other features of (9.79).
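For a scalar γ₁ the evaluation of (9.79) by numerical integration is straightforward. The sketch below, under the same kind of hypothetical data-generating assumptions as the previous fragment, computes a₁₁(γ₁) from (9.77), centers the t factor at the 2SLS quantity, and normalizes the kernel on a grid:

```python
import numpy as np

rng = np.random.default_rng(4)
n, k, k1, m1 = 60, 4, 1, 1
X = rng.normal(size=(n, k))
X1 = X[:, [0]]
V1 = rng.normal(size=(n, 1))
Y1 = X @ np.array([[0.8], [-0.5], [0.3], [1.1]]) + V1
y1 = 0.5 * Y1 + 1.0 * X1 + (0.6 * V1 + rng.normal(size=(n, 1)))

P = X @ np.linalg.solve(X.T @ X, X.T)            # projection onto col(X)
v1, Vh = y1 - P @ y1, Y1 - P @ Y1                # (v1_hat, V1_hat) residuals
Yh = P @ Y1
Z = np.hstack([Yh, X1])
gam_hat = np.linalg.solve(Z.T @ Z, Z.T @ y1)[0, 0]   # 2SLS gamma_1_hat

Hbar_a = (Yh.T @ Yh - Yh.T @ X1 @ np.linalg.solve(X1.T @ X1, X1.T @ Yh))[0, 0]
nu = n - 2 * m1 - k1

g = np.linspace(gam_hat - 1.5, gam_hat + 1.5, 1001)
a11 = ((v1 - Vh * g[None, :]) ** 2).sum(axis=0)  # a11(gamma_1) as in (9.77)
log_kern = (-(k - k1) / 2.0) * np.log(a11) \
           - ((nu + m1) / 2.0) * np.log(1.0 + (g - gam_hat) ** 2 * Hbar_a / a11)
kern = np.exp(log_kern - log_kern.max())
d = g[1] - g[0]
kern /= kern.sum() * d
print("posterior mean of gamma_1:", (g * kern).sum() * d)
```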
With regard to the posterior pdf for an element of β₁, say βᵢ, (9.78) can be integrated analytically with respect to the elements of β₁ other than βᵢ to provide the joint posterior pdf for γ₁ and βᵢ, which can be analyzed

³² Note that β - β̂, where β₀ = 0, reflects for given β₁ and γ₁ the extent to which the restrictions in (9.63)-(9.64) are not met. Also, it can be shown that with β₀ = 0 the values for γ₁ and β₁ which minimize (β - β̂)'X'X(β - β̂)/a₁₁ are the limited-information maximum likelihood estimates.
³³ This algebraic result is derived as follows:

X(β - β̂) = X₁β₁ + X₀β₀ - X(X'X)⁻¹X'(y₁ - Y₁γ₁) = -(ŷ₁ - Ẑ₁δ₁),

where ŷ₁ = X(X'X)⁻¹X'y₁, Ŷ₁ = X(X'X)⁻¹X'Y₁ = XΠ̂₀, Ẑ₁ = (Ŷ₁ ⋮ X₁), and β₀ = 0 have been used. Then (β - β̂)'X'X(β - β̂) = (ŷ₁ - Ẑ₁δ₁)'(ŷ₁ - Ẑ₁δ₁) and

(ŷ₁ - Ẑ₁δ₁)'(ŷ₁ - Ẑ₁δ₁) = [ŷ₁ - Ẑ₁δ̂₁ - Ẑ₁(δ₁ - δ̂₁)]'[ŷ₁ - Ẑ₁δ̂₁ - Ẑ₁(δ₁ - δ̂₁)] = (ŷ₁ - Ẑ₁δ̂₁)'(ŷ₁ - Ẑ₁δ̂₁) + (δ₁ - δ̂₁)'Ẑ₁'Ẑ₁(δ₁ - δ̂₁),

since Ẑ₁'(ŷ₁ - Ẑ₁δ̂₁) = 0 from (9.66) and ŷ₁ - Ẑ₁δ̂₁ = ŷ₁ - Ŷ₁γ̂₁ - X₁β̂₁ = y₁ - Y₁γ̂₁ - X₁β̂₁ - (v̂₁ - V̂₁γ̂₁) = û₁ - (v̂₁ - V̂₁γ̂₁) = 0, since û₁ = y₁ - Y₁γ̂₁ - X₁β̂₁, the 2SLS structural residual vector, equals v̂₁ - V̂₁γ̂₁.
³⁴ It is not obvious that (9.79) is a proper pdf. If we write it as a₁₁^{(n-m₁-k)/2}/[a₁₁ + (γ₁ - γ̂₁)'H̄ₐ(γ₁ - γ̂₁)]^{(ν+m₁)/2}, where H̄ₐ = a₁₁H̄, and note that a₁₁ is a quadratic form in the elements of γ₁, then for (9.79) to integrate to a constant we need the existence of the (n - m₁ - k)th moment of the elements of γ₁. The (n - m₁ - k)th moment will exist if ν > n - m₁ - k or k - k₁ > m₁. Since k - k₁ is the number of predetermined variables excluded from the first structural equation by the identifying restriction β₀ = 0 and m₁ + 1 is the number of endogenous variables assumed to appear in the first structural equation, the condition k - k₁ > m₁ is formally the same as the usual "order condition" for identifiability. Further, the condition that H̄ₐ be pds can be shown to be formally equivalent to the usual "rank condition" for identifiability.
270 SIMULTANEOUS EQUATION ECONOMETRIC MODELS


conveniently by using numerical integration techniques when the
dimension- ality of � is small. as If the prior pdf�x in (9.78) implies a
multivariate normal prior pdf for the elements of lix, the posterior pdf is in
the no,rmal-t form encountered in Chapter 4, except for the dependence of
axx on �x. The asymptotic expansion technique, explained in Appendix
4.2 along with expansions of negative powers of axx, appears to be an
approach to an approximation of the posterior pdf which is useful; however,
the details of this approach have not yet been fully worked out. 9.6 FULL
SYSTEM ANALYSIS In contrast to the approach presented in Section 9.5
which led to the posterior pdf for the parameters of a single equation, using
just the identifying prior information for those parameters, we now
concentrate attention on the problem of deriving the joint posterior pdf for
the parameters appearing in all structural equations, a joint posterior pdf
that will incorporate the prior identifying information for all parameters of
an m equation system, YP = XB + U. Having the joint posterior pdf for the
unrestricted elements of P and B, the marginal pdf for the parameters of a
single equation can be obtained. Since this full-system marginal posterior
pdf incorporates more prior and sample information than the corresponding
single-equation marginal posterior pdf for an equation's parameters, it will
in general exhibit less dispersion than the corresponding single equation
posterior pdf. a6 If our identifying information is in the form of exact zero
restrictions on elements of P and B in YP = XB q- U and our prior pdf for
the remaining parameters is not dogmatic or degenerate, then in large-sized
samples the likelihood function will approximate the posterior pdf, as
explain6d in general in Chapter 2. Since the likelihood function assumes a
normal form in large samples, the large-sample posterior pdf for the
unrestricted parameters then takes the multivariate normal form which has a
mean vector with elements that are the "full information" maximum
likelihood (FIML) estimates and a5 Another approach to the analysis of
(9.78) is the development of an asymptotic expansion along the lines
explained in Appendix 4.2. In this approach the fact that a depends on �x
must be taken into account. It is useful to write ix - fxYx = - f/x(%-'x)=fix-
Ix(Yx-x) and to write a =lt'fix[l +A], where A= [-2fix'f/x(Yx - {x) + (Yx -
x)'Px''x(Y - x)l/fix'a. Then t/l: t-(/-/l )/a and axx - can be expanded in a
series involving powers of A. The leading normal term in the asymp- totic
series has mean x and covariance matrix (ZPx'2)-0xx, where a6 This will be
true for small and large sample sizes, since the identifying prior informa-
tion exerts an influence on the posterior pdf for the elements of I and B in
both small- and�i large-sample situations. .. FULL SYSTEM ANALYSIS
271 a covariance matrix equal to the inverse of Fisher's information matrix,
evaluated with FIML estimates? Given that the computation of FIML
estimates has been discussed in the literature, we shall not pursue this topic
here. However, it must be emphasized that this approximation to the
posterior pdf is appropriatejust for large-sample sizes and not much is
known about how large a sample must be in relation to a given model for
the normal approximation to be reasonably good. Another interesting large-
sample approximation to the mean of the full- system posterior pdf can be
obtained by writing the relation in (9.67) for each equation of an m-
equation system; that is (9.80a) or (9.80b) y = where in (9.80a), with - 1,
2,..., m, . = X, the systematic part of the reduced form equation for y, = (I?i
X), with the systematic part of the reduced form equations for Y, a matrix of
observations for the endo- genous variables assumed to appear in the ath
equation with nonzero coefficients, and ,Y, the observation matrix for the
predetermined variables assumed to appear in the ath equation, with
nonzero coefficients, and li' = (y', 1'), the coefficient vector for the th
equation. In (9.80b) .' = (.', o.', � .., Ym'), represents the block diagonal
matrix on the rhs of (9.80a), and 15' = (lix', 8o.',..., li,n'). Then, given that H
is a square nonsingular matrix, we have Hg = H8 or li = ('H'H)- X,H,H
(9.81) = � � if H is taken so that H'H = Y,-z � I. Since (9.81) is an
algebraic relation connecting elements of li with elements of H, the
reduced-form coefficient matrix, and of Y., the structural-disturbance
covariance matrix, we can approximate the posterior mean of $, Eli,
assumed to exist, by expanding the rhs of (9.81) around consistent sample
estimates of Y,, ,, and , say :, 2, and 07 See, for example, T. J. Rothenberg
and C. T. Leeriders, "Efficient Estimation of iSirnultaneous Equation
Systems," Econometrica, 32, 57-76 (1964), for an explicit ii0valuation of
the information matrix.

272 SIMULTANEOUS EQUATION ECONOMETRIC MODELS 9,


respectively .os Then the zeroth-order term in the expansion yields the
following approximation: (9.82) E5 __' [2'( -x � I,)2]- 2'(1- � I,)y. It will
be noted that the rhs of (9.82) is in the form of the three-stage least squares
estimate (3SLS). 89 Below we show that the leading normal term in an
asymptotic expansion approximating the posterior pdf is centered at (9.82)
with covariance matrix [2:'( -x � I);2] -x. It must be remembered, however,
that these are large-sample approximations and that little is known about
how good these approximations are for a given model when the sample size
is not large. We now turn to an explicit derivation of the posterior pdf for
the param- eters of the model IT = XB + U. Under the assumption that the
rows of U are normally and independently distributed, each with zero mean
vector and rn x rn pds covariance matrix 5:, the likelihood function is 4�
(9.83) I(P,B, ZlD) oc IZl Irlexp [-�tr(YP - XB)'(YF - XB)Y,-x] where D
denotes given data. Our prior pdf for the parameters is In (9.84) we assume
that the elements of Y are a priori independent of those of I' and B and use a
diffuse prior pdf in a form employed and explained in Chapter 8. The prior
pdf for P and B, px(I', B), is assumed to incorporate identifying prior
information which, as indicated above, must be provided to estimate the
model's parameters. Then the posterior pdf is given by (9.85) x exp [-
�tr(YP- XB)'(IT- XB)Z-q oc p(r, "'21 r'czrl x exp {-� tr[nI"ClI' +(B -
1})'X'X(B- 1})]Y.-}, where P denotes prior assumptions, 1} = (X'X)-xX'YI '
= tiP, and nfl = (Y- XfI)'(Y- XfI)= 12'12. as For example, we could take 9. =
O'O/T, where /9 is a matrix of 2SLS structural residuals and use elements of
fI - (X'X)-xX' Y to obtain 2 and t. Alternatively, FIML estimates of Z, Z,
and y could be employed. 89 S A. Zellner and H. Theil, "Three-Stage Least
Squares: Simultaneous Estimation of Simultaneous Equations,"
Econometrica, 30, 54-78 (1962). o Note that Irl appears in (9.83), since the
Jacobian of the transformation from each row of U to the corresponding
row of Y is Irl. Since there are n rows in U and in Y, liT* appears in (9.83).
Also, it is understood that the Jacobian factor is taken to be positive. .
FULL SYSTEM ANALYSIS 273 On integrating (9.85) with respect to the
elements of :, we find that the marginal posterior pdf for P and B is v(r,
S)lp'Cwl, (9.86) p(r, s[o,P) oc Inr'Czr + (s- From the form of (9.86) we see
that, if the conditional prior pdf for B, given P, p.(B I I') oc const, the
conditional posterior pdf for the elements of B, given I', is in the
generalized Student t form with mean 1} = (X'X)- xX' YI', which is to be
expected, since �P = XB + U for given I' is, as pointed out above, in the
form of a multivariate regression system. Further analysis of (9.86) for a
particular choice of�x(P, B) appears possible but has not yet been carried
fhrough. x If we use the relation Z = P'glF, we can rewrite the second line of
(9.85) in terms of P, B, and fZ as follows' (n ) (9.87) p(I', B, alo, J,) p(r,
B)lfzl-(+m+x"' exp - tr fill - x exp [-� tr(fT- xs)'(gr- xs)(r'ar)-q, since (B-
t})'X'X(B- t}) = (XB- XflP)'(XB- XflI')= (gr- xs)' x � (frP - XB), with F =
Xfl. Further, we assume that our prior pdf for P and B, px(I', B),
incorporates the prior identifying information that certain elements of (P, B)
are. equal to zero. If we denote the nonzero elements of (P, B), other than
the rn elements of P set equal to one in our normalization, by 8' = (i', ia',...,
15m'), with i' = (y', lid), = 1, 2,..., m, and let p(i) be the prior pdf, the
posterior pdf can be written as (9.88) P(b'gllD'P) oc p(8)[gl[-("+m+x)/' exp
(-trflf -x) x exp [- tr(, XBO'(, XB0(P, P,) ], where (P, B0 represents (P, B)
with identifying and normalizing conditions imposed. We shall now
develop an asymptotic expansion of (9.88). In the last factor 0n e rhs of
(9.88) write r,'ar,: (L + ar,)'(a + aa)� + (9.89) =+C, where Pr and
aronsistent estimates of P and , respectively, P = Pr- and , both having
elements of 0(n-), = ' = - P :a If px(P, B) implies zero identifying
restrictions, a normalization rule and a diffuse ::;:prior pdf on the remaining
parameters, the mode of (9.86) is located close to the FIML ;:%timate. ::

274 SIMULTANEOUS EQUATION ECONOMETRIC MODELS and C =


Fr'Fr - 'r'il' Then4' (9.90) (r;arr)- = where R = g-xCg - + -xC-C-X .... . On
substituting from (9.90) in the second exponential factor in (9.88), we have
exp [-� tr(fZP - XS)'(grr - xsr)(r;nro-q (9.91) = exp [-� tr(gI' - xB0'(gr -
xB02 - tr K] exp [-( 215)'(g - � I)( 215)] exp (-tr K), where K = (Fi" -
XB)'(FP - XBOR/2. 4� Further, if we complete the square on 15 in the last
line of (9.91) and expand exp (-tr K) as e ' -- 1 + we obtain (9.91)
proportional to x + x'/2! + x/3! +'", exp [-�(15 - (9.92) x [1 - tr K + �(tr
K) ' - {r(tr K) a +"'], where $ = ['(g-x � i):]-x,(-x � I). Using (9.92), we
find that the posterior pdf in (9.88) takes the following form' (9.93) (n )
p(15, lD, os exp tr ilgl -x x exp [-�(15 - )'M(15 - )] x [1 - tr K + �(tr K) ' -
{r(tr K) s +'.'], where M = '(S;-x � h):. 9. Equation 9.90 is obtained by
writing the rhs of (9.89) as ( + c)- = 9. - ( + c; - ) - and expanding (lr + C-x)-
x in a power series. Note that the elements of C are O(n- '). In going from
the second to the third line of (9.91) tr( trr - XB0'( tr - XB,),- = ( - 2S)'(- �
I,)( - has been used where and 2 have been defined in connection with
(9.81). 4 In the expression for K the elements of ( ?P, - XBr)'( I'I'r -- XBr) =
(flrr - Br)'X'X x (flit - B) are 0(1), since from llP = Br we have flPr - Br = -
Alii'r, where All = li -- fl. Thus flI' - B has elements of 0(n-m) because those
of All are of O(n -xt') and (flP - BO'X'X(flI'r - BO has elements of 0(1),
given that those of X'X are 0(n). FULL SYSTEM ANALYSIS 275 If only
If only the leading term of (9.93) is retained, we have⁴⁵

(9.94) p(δ, Ω|D, P) ∝ |Ω|^{-(n+m+1)/2} exp(-(n/2) tr Ω̂Ω⁻¹) × exp[-½(δ - δ̃)'M(δ - δ̃)].

We see that in (9.94) Ω and δ are independently distributed, with the elements of Ω having an inverted Wishart pdf and those of δ having a multivariate normal pdf, with mean vector δ̃ = [Ẑ'(Σ̂⁻¹ ⊗ I)Ẑ]⁻¹Ẑ'(Σ̂⁻¹ ⊗ I)ŷ and covariance matrix M⁻¹ = [Ẑ'(Σ̂⁻¹ ⊗ Iₙ)Ẑ]⁻¹, quantities analogous to the large-sample sampling theory 3SLS estimate and its large-sample covariance matrix estimate.

If, for example, (9.94) is used as a prior pdf in the analysis of another set of data, (Y*, X*), which is assumed to satisfy Y*Γᵣ = X*Bᵣ + U*, with the n* rows of U* independently and normally distributed with zero mean vector and m × m pds covariance matrix Σ, then, to the order of the approximation involved in (9.94), the posterior pdf is given by

(9.95) p(δ, Ω|D, D*, P) ∝ |Ω|^{-(n'+m+1)/2} exp(-(n'/2) tr Ω̄Ω⁻¹) × exp[-½(δ - δ̄)'M̄(δ - δ̄)],

where D* denotes the new data, n' = n + n*, Ω̄ = (nΩ̂ + n*Ω̂*)/(n + n*), n*Ω̂* = (Y* - X*Π̂*)'(Y* - X*Π̂*), with Π̂* = (X*'X*)⁻¹X*'Y*,

(9.96) δ̄ = M̄⁻¹(Mδ̃ + M*δ̃*),

with δ̃* = M*⁻¹Ẑ*'(Σ̂*⁻¹ ⊗ I)ŷ*, M* = Ẑ*'(Σ̂*⁻¹ ⊗ I_{n*})Ẑ*, and

(9.97) M̄⁻¹ = (M + M*)⁻¹.

The sample quantities Ẑ*, ŷ*, and δ̃* are based on the new data and are defined in precisely the same way as Ẑ, ŷ, and δ̃. From (9.95) and (9.96) we see that to this order of approximation the posterior mean of δ, δ̄ in (9.96), is a matrix weighted average of the two sample quantities δ̃ and δ̃*, with their respective precision matrices, M and M*, as weights. Also, M̄⁻¹ in (9.97) is the posterior covariance matrix for δ if we rely on the leading term in the expansion of the posterior pdf.

Since dependence on the leading term of the expansion (9.93) may not be an accurate enough approximation in many circumstances, we now indicate how further terms in the expansion can be taken into account. Let us assume that the prior pdf for δ in (9.93), p(δ), is given by

(9.98) p(δ) ∝ exp[-½(δ - δ₀)'A(δ - δ₀)],

⁴⁵ To this order of approximation the prior factor p(δ) must be omitted; that is, if we expand p(δ) about δ̃, p(δ) = p(δ̃) + R', where R' is the remainder, the elements of R' are of the same order as terms omitted in (9.94).

where δ₀ is the prior mean vector and A⁻¹ is the prior covariance matrix. On inserting this in (9.93) and completing the square on δ in the exponent, we have

(9.99) p(δ, Ω|D, P) ∝ |Ω|^{-(n+m+1)/2} exp(-(n/2) tr Ω̂Ω⁻¹) × exp[-½(δ - δ̆)'(M + A)(δ - δ̆)] × [1 - tr K + ½(tr K)² - (1/6)(tr K)³ + ⋯],

with δ̆ = (M + A)⁻¹(Mδ̃ + Aδ₀). Then, to evaluate the normalizing constant of (9.99), we can integrate with respect to δ and Ω, taking account of the terms involving powers of tr K. These integrations can be viewed as evaluating the expectation of tr K and its powers with respect to the inverted Wishart-multivariate normal pdf for Ω and δ shown as the leading factor in (9.99). Given that these integrations have been performed,⁴⁶ we have the normalizing constant for (9.99) and can use this normalized pdf to provide a better approximation to the posterior pdf than that provided by relying on just the leading term.
9.7 RESULTS OF SOME MONTE CARLO EXPERIMENTS

In this section we analyze a simple two-equation simultaneous-equation model from the Bayesian and sampling-theory approaches. The model has the property that almost all sampling-theory approaches, including limited-information maximum likelihood, two-stage least squares, full-information maximum likelihood, three-stage least squares, indirect least squares, etc., yield the same estimators for the coefficients of the model. Thus the experiments described compare the Bayesian approach with almost all well-known sampling-theory estimation techniques.

In the experiments samples of data were generated from a two-equation model under known conditions, and Bayesian and sampling-theory techniques were applied in the analysis of each generated sample. We then compared the relative performance of the two approaches in a number of trials. The rationale of this procedure is that we envisage investigators using the model on a number of occasions to analyze different sets of data. What we want to measure are various properties of the performance of the Bayesian and sampling-theory approaches in repeated trials.⁴⁷ It should be recognized that this criterion of "performance in repeated trials" is one that is commonly employed to rationalize sampling-theory approaches. We do not regard it as the ultimate or even the most appropriate criterion for judging alternative approaches. However, it does appear to have some relevance in the sense mentioned above and has certainly been given extremely heavy weight in the sampling-theory literature. With these remarks made, let us turn to the specific details and results of the experiments.

⁴⁶ Terms of a given order in n, say O(n⁻ᵃ), a > 0, are retained, and those of higher order of smallness in n are neglected.
⁴⁷ See H. V. Roberts, "Statistical Dogma: One Response to a Challenge," The American Statistician, 20, 25-27 (1966), for further comment on the relevance of Monte Carlo experimental results for appraising alternative approaches.

9.7.1 The Model and Its Specifications

The simple model we
experiments. 9.7.1 The Model and Its Specifications The simple model we
analyze is (9.100) yxt = yyo. t + uxt t = 1, 2,..., T, (9.101) Yae fix + uo. )
where Y.t and y. are observations on two endogenous variables, xe is an
observation on an exogenous variable, u.t and uo.t are disturbance terms,
and and are scalar parameters. The reduced-form equations for this model
are (9.102) (9.103) Yxt -' rrXt + xt Yat : rraXt d- Vat} t = 1,2,...,T, where rrx
= fit,, fro.: fit, and vxt and vat are reduced form disturbance terms. This is
the model we have used to generate data for our Monte Carlo experiments
under the conditions summarized in Table 9.1. In all three runs the
parameters y and fit were assigned values of 2.0 and 0.5, respectively. In
Run I the xt were obtained by independent drawings from a normal distri-
bution with mean zero and variance one, and in Runs II and III they were
similarly obtained from normal distributions with zero means and variances
equal to two and nine, respectively. In all runs uxt and uo.t had a bivariate
Table 9.1 CONDITIONS UNDER WHICH DATA WERE GENERATED
Run I Run II Run III � , = 2.0 , = 2.0 y = 2.0 i::,t: 0.5 /: o.5 / = o.5
xt:NID(0, 1) xt :NID(0, 2) xt :NID(0, 9) ', NID(O, O; x, u=, u) (0, O; xx, ,
=a) NID(O, O; x, uu, x=) : = 1.0 xx: 1.0 axx = 1.0 , aa: 4.0 ) aa: 4.0 = = 4.0
?a =-1.0 x=: 1.0 a: 1.0 ?T=20; N: 50 T=20; N= 50 T= 20; N= 50 T:40; N:50
T:40; N= 50 T:; N= 50 T:60; N= 50 T: 60; N= 50 T= 60; N= 50 :=I;N= 50
T= 100;N= 50 T= 100;N= 50

normal distribution with zero means and variances equal to 1.0 and 4.0, respectively. In Run I the covariance of u₁ₜ and u₂ₜ, σ₁₂, was equal to -1.0; thus the correlation between u₁ₜ and u₂ₜ was -½. In Runs II and III the covariance σ₁₂ was set equal to 1.0; thus the correlation between u₁ₜ and u₂ₜ on these runs was ½. For all runs we first generated N = 50 samples, each of size T = 20, then N = 50 new samples, each of size T = 40, then N = 50 new samples, each of size T = 60, and finally N = 50 new samples, each of size T = 100.

9.7.2 Sampling-Theory Analysis of the Model

For most
principles of sampling-theory estimation, including maximum likelihood, two-stage least squares, and three-stage least squares, the following is the sampling-theory estimator for γ:

(9.104) γ̂ = Σ ŷ₂ₜy₁ₜ / Σ ŷ₂ₜ²,

where ŷ₂ₜ = β̂xₜ and β̂ = Σ xₜy₂ₜ / Σ xₜ², with all summations extending from t = 1 to t = T. The quantity in (9.104) was computed for each of our samples. Also, in an effort to check the validity of sampling-theory confidence intervals, which are employed frequently in practice, we computed two sets of 95% confidence intervals constructed in the following ways: in the first instance we assumed that (γ̂ - γ)/s_γ̂ is normally distributed, where s_γ̂² = (Σ ŷ₂ₜ²)⁻¹s² with s² = Σ (y₁ₜ - γ̂y₂ₜ)²/(T - 1). On this assumption the probability statement we wished to check in our experiments is

(9.105) Pr{γ̂ - 1.96s_γ̂ ≤ γ ≤ γ̂ + 1.96s_γ̂} = 0.95.

The interval γ̂ ± 1.96s_γ̂ was computed for each sample. Also, we computed an interval based on the assumption that (γ̂ - γ)/s_γ̂ has a univariate Student t distribution with ν = T - 1 degrees of freedom. Here our interval is γ̂ ± ts_γ̂, where t was selected from the t-tables so that

(9.106) Pr{γ̂ - ts_γ̂ ≤ γ ≤ γ̂ + ts_γ̂} = 0.95

would be a valid probability statement if, in fact, (γ̂ - γ)/s_γ̂ had a t distribution with ν = T - 1 degrees of freedom, a result that, to the best of our knowledge, has not been established analytically. Since approximate intervals of this kind are widely used in econometrics, we think that it would be of interest to study their properties.
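A small simulation in the spirit of these experiments is easy to set up. The sketch below assumes Run I conditions from Table 9.1, computes the estimator (9.104) for N = 50 samples, and tallies the empirical coverage of the approximate interval in (9.105):

```python
import numpy as np

rng = np.random.default_rng(7)
gamma, beta, T, N = 2.0, 0.5, 20, 50
cov = np.array([[1.0, -1.0], [-1.0, 4.0]])       # sigma11, sigma12, sigma22 for Run I
hits, estimates = 0, []
for _ in range(N):
    x = rng.normal(0.0, 1.0, T)                  # x_t : NID(0, 1)
    u = rng.multivariate_normal([0.0, 0.0], cov, size=T)
    y2 = beta * x + u[:, 1]
    y1 = gamma * y2 + u[:, 0]
    beta_hat = (x * y2).sum() / (x * x).sum()
    y2_hat = beta_hat * x
    g = (y2_hat * y1).sum() / (y2_hat ** 2).sum()        # estimator (9.104)
    s2 = ((y1 - g * y2) ** 2).sum() / (T - 1)
    s_g = np.sqrt(s2 / (y2_hat ** 2).sum())
    hits += (g - 1.96 * s_g <= gamma <= g + 1.96 * s_g)  # interval (9.105)
    estimates.append(g)
print("mean estimate:", np.mean(estimates))
print("empirical coverage of nominal 95% interval:", hits / N)
```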
9.7.3 Bayesian Analysis of the Model

Viewing (9.102) and (9.103) as a simple bivariate regression model and using the results on the multivariate regression model in Chapter 8, based on diffuse prior distributions for π₁, π₂, and the distinct elements of the reduced form disturbance covariance matrix, we obtain the following posterior pdf for π₁ and π₂:

(9.107) p(π₁, π₂|y₁, y₂) ∝ [1 + (π₁ - π̂₁)²s¹¹ + 2(π₁ - π̂₁)(π₂ - π̂₂)s¹² + (π₂ - π̂₂)²s²²]^{-T/2},

a pdf in the bivariate Student t form, where π̂₁ and π̂₂ are the least squares quantities and sᵏˡ = bᵏˡ Σ xₜ², with bᵏˡ the (k, l)th element of [Σₜ (y_{kt} - π̂ₖxₜ)(y_{lt} - π̂ₗxₜ)]⁻¹, k, l = 1, 2. Now introduce the following transformation: π₁ = γβ, π₂ = β, which has Jacobian |β|. Then the posterior distribution of γ and β is

(9.108) p(γ, β|y₁, y₂) ∝ |β|[1 + (γβ - π̂₁)²s¹¹ + 2(γβ - π̂₁)(β - π̂₂)s¹² + (β - π̂₂)²s²²]^{-T/2}.

By standard methods we integrated β out of (9.108) to obtain⁴⁸

(9.109) p(γ|y₁, y₂) ∝ (b₀^{-ν/2}/b₂^{3/2}){b₁[1 - 2F(d)] - 2(b₀b₂/ν)^{1/2}F₁(d)},

where ν = T - 1, F(d) = ∫_{-∞}^{d} p(t) dt, F₁(d) = ∫_{-∞}^{d} t p(t) dt, p(t) is the Student t pdf with ν degrees of freedom, d = -b₁[ν/(b₀b₂)]^{1/2}, b₂ = γ²s¹¹ + 2γs¹² + s²², b₁ = γπ̂₁s¹¹ + (γπ̂₂ + π̂₁)s¹² + π̂₂s²², and b₀ = 1 + π̂₁²s¹¹ + 2π̂₁π̂₂s¹² + π̂₂²s²² - b₁²/b₂.
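Rather than coding the closed form (9.109) directly, the posterior pdf of γ can be obtained by integrating (9.108) over β numerically, which is a useful check. The sketch below does this on a grid for one simulated sample under Run I conditions; the grid limits are arbitrary choices for the illustration:

```python
import numpy as np

rng = np.random.default_rng(11)
T = 20
x = rng.normal(0.0, 1.0, T)
u = rng.multivariate_normal([0, 0], [[1.0, -1.0], [-1.0, 4.0]], size=T)
y2 = 0.5 * x + u[:, 1]
y1 = 2.0 * y2 + u[:, 0]

mx = (x * x).sum()
p1_hat = (x * y1).sum() / mx                     # least squares pi1_hat
p2_hat = (x * y2).sum() / mx                     # least squares pi2_hat
e1, e2 = y1 - p1_hat * x, y2 - p2_hat * x
Bmat = np.array([[(e1 * e1).sum(), (e1 * e2).sum()],
                 [(e2 * e1).sum(), (e2 * e2).sum()]])
S = mx * np.linalg.inv(Bmat)                     # the s^{kl} of (9.107)

gam = np.linspace(-1.0, 5.0, 600)
bet = np.linspace(p2_hat - 2.5, p2_hat + 2.5, 600)
G, Bt = np.meshgrid(gam, bet, indexing="ij")
d1, d2 = G * Bt - p1_hat, Bt - p2_hat
Q = 1.0 + d1 * d1 * S[0, 0] + 2.0 * d1 * d2 * S[0, 1] + d2 * d2 * S[1, 1]
joint = np.abs(Bt) * Q ** (-T / 2.0)             # kernel of (9.108)

db, dg = bet[1] - bet[0], gam[1] - gam[0]
p_gamma = joint.sum(axis=1) * db                 # integrate beta out numerically
p_gamma /= p_gamma.sum() * dg
print("posterior modal value of gamma:", gam[p_gamma.argmax()])
```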
Using standard numerical techniques and a high-speed electronic computer, we computed the normalizing constant for the posterior pdf shown in (9.109) and the complete posterior pdf for each sample we generated. We also had the computer plot each posterior pdf.⁴⁹ In addition, we computed 95% Bayesian confidence intervals for γ from each sample of data.

⁴⁸ For further work on the distribution of the ratio of correlated Student t variables, such as γ = π₁/π₂, see S. James Press, "The t-Ratio Distribution," J. Am. Statist. Assoc., 64, 242-252 (1969).
⁴⁹ For 50 samples, each of size 20, these computations and plots, as well as the sampling-theory computations, listings of data generated, and tabulations of results, took about a minute and a half on an IBM 7094 computer.

One interval was computed such that the area under each tail of the posterior pdf was 0.025. We refer to this interval as the "exact central" interval. Another interval, the shortest 95% interval, was also computed for each sample and is referred to as the "exact shortest" interval. The sampling performance of these Bayesian intervals is compared with that of the approximate sampling-theory intervals, shown above, in what follows.

[Figure 9.3 Frequency distribution of Bayesian modal values of posterior distributions for γ (Run I: T = 20).]

9.7.4 Experimental Results: Point Estimates
In Table 9.2 we present the distribution of Bayesian posterior modal values and sampling-theory estimates for Run I.⁵⁰ For the 50 samples, each of size T = 20, the distributions of estimates of γ, true value equal to 2.0, are presented in columns (2) and (3) of the table. It is seen that the distribution of Bayesian posterior modal values has a well-defined mode at the interval 1.900 to 2.099, which embraces the true value 2.0, whereas the distribution of sampling-theory estimates in column (3) is almost rectangular over the interval 1.300 to 2.499 (see Figures 9.3 and 9.4). Note also that two of the sampling-theory estimates are negative for this run with T = 20. As T is increased to 40, 60, and 100, we note from columns (4) to (9) of Table 9.2 that the distributions of Bayesian posterior modal values and sampling-theory estimates become more similar and more closely concentrated about the true value. We note, however, that even in samples of sizes 40, 60, and 100 "outliers" appear in the sampling-theory approach, a fact discussed below.

⁵⁰ See Table 9.1 for a description of conditions underlying Run I. Under our assumptions, the posterior mean of γ = π₁/π₂ does not exist.
of (Run I: T = 20). In Table 9.3 we present results obtained for Run II in
which the variance of xt was increased to two, compared with its value of
one in Run I. Also the correlation between u1t and u2t was changed to ½, compared with its value of −½ in Run I. With these changes it is seen from Table 9.3 that the distributions of the estimates are more sharply concentrated about the true value of γ than in Run I, for both approaches.
As in Run I, however, we see that for the smaller sample sizes, T = 20 and
T = 40, the Bayesian modal values are more highly concentrated about the
true value of γ. As in Run I, also, the sampling-theory approach has produced a number of outlying estimates uncharacteristic of the Bayesian approach. As regards "outliers" in the sampling-theory approach, we note that they have been encountered in other Monte Carlo experiments. Although other workers point to possible rounding errors and near singularity of moment matrices as possible explanations for their outliers, such explanations are not relevant for explaining the current "outliers." It just appears that the distribution of the estimator γ̂, shown in (9.104), is such that under the conditions underlying Runs I and II extreme values can be encountered with a nonnegligible probability. That the Bayesian approach is not characterized by extreme values for posterior modal values, at least in this set of experiments, is indeed an important result of the experiments. (See, for example, J. G. Cragg, loc. cit., and R. Summers, "A Capital Intensive Approach to the Small Sample Properties of Various Simultaneous Equation Estimators," Econometrica, 33, 1-41 (1965). For some recent analysis bearing on the distribution of the ratio of correlated normal random variables, which is what the estimator γ̂ is, see G. Marsaglia, "Ratios of Normal Variables and Ratios of Sums of Uniform Variables," J. Am. Statist. Assoc., 60, 193-204 (1965).)

Figure 9.5 Typical posterior probability density functions for γ (Run I: T = 20 and T = 60).

Shown in Figure 9.5 are the general shapes of the exact posterior distributions encountered in Run I. Generally speaking, for smaller sample sizes these distributions often departed from the normal, usually being more peaked, with fatter tails, and skewed. In large samples many more cases
were encountered in which the distributions were close to being normal and
closer to the approximating normal distribution discussed above. In Run III
the only change in the experimental set-up relative to Run II was to raise the variance of xt to nine, a change that improves the precision with which π can be estimated. As in Run II, the correlation between u1t and u2t, the structural disturbance terms, was set equal to ½. Raising the variance of xt to nine produced conditions in which it was possible to make much more precise inferences about γ in both the Bayesian and sampling-theory approaches. Shown in Table 9.4 are the distributions of Bayesian posterior modal values and sampling-theory estimates for Run III. Here the results are quite different from those shown in Table 9.2 for Run I. In Run III the distributions of estimates for each sample size are more similar, probably because the conditions for Run III are such that large-sample results take hold even for smaller T. In these experiments, also, the exact
posterior distributions were quite close to being normal in many instances
and were closer to the approximating normal distribution discussed above.
One point brought out by the results of Runs I, II, and III is that what we

regard as
a "large sample" is critically dependent on the features of the underlying model. For Run I we may say, somewhat loosely, that large-sample results take hold for sample sizes in the vicinity of 100 ± 30, whereas for Run III they appear to take hold even in samples of size T = 40. However, having the exact posterior distribution in the Bayesian approach makes the issue of "how large is large" less critical than in the sampling-theory approach, in which it is usual to rely on asymptotic theory to justify finite-sample inferences.

9.7.5 Experimental Results: Confidence Intervals

We now turn
to consider the performance of Bayesian and approximate sampling-theory confidence intervals. For each sample the intervals discussed above were computed. Then for each sample size the number of intervals covering the true value of γ was determined. This number, expressed as a percentage of 50, the number of trials, is reported in Table 9.5. For Runs I, II, and III the Bayesian confidence intervals perform remarkably well. With 50 trials it is impossible to get 47½ intervals that cover; thus 95.0% coverage is impossible. The actual figures reported in columns (2) and (3) indicate that the Bayesian intervals have relatively good sampling properties. (For some analysis of the sampling properties of Bayesian intervals, see B. L. Welch and H. W. Peers, "On Formulae for Confidence Points Based on Integrals of Weighted Likelihoods," Journal of the Royal Statistical Society, B25, 318-324 (1963), and D. J. Bartholomew, "A Comparison of Some Bayesian and Frequentist Inferences," Biometrika, 52, 19-35 (1965).) With respect to the performance of the approximate sampling-theory confidence intervals, the results reported in Table 9.5 indicate that the nominal 95% confidence level is not generally realized. The percentage of coverage in a number of instances was found to be significantly below 95. This indicates that inferences based on these approximate sampling-theory intervals may generally be erroneous in "small-sample" situations.

9.7.6 Concluding Remarks on the Monte Carlo Experiments

The experiments reported above
indicate that the differences in the sampling properties of Bayesian and
sampling-theory estimators in "small-sample" situations are quite striking.
Under the conditions we have examined Bayesian procedures produced
better results than sampling-theory estimation procedures. With respect to
point estimation Bayesian estimates tended to be more highly concentrated
about the true value of the parameter being estimated than sampling-theory
estimates, particularly in small-sample situations. In this connection it
appears relevant to note that the sampling-theory estimators considered are usually given a large-sample justification. Thus there is no assurance that these estimators will perform well in small-sample situations. (In the text of this chapter we have seen that certain sampling-theory estimates can be viewed as approximations to the mean or modal value of posterior pdf's.) With respect to interval estimation, intervals computed from posterior pdf's were found to have rather good sampling properties. On the other hand, approximate sampling-theory confidence intervals, as often used in practice, were found to be deficient in small-sample situations. (The finite-sample properties of these approximate procedures have not been analyzed, to the best of our knowledge; however, some Monte Carlo experiments investigating this problem have been reported in J. G. Cragg, loc. cit.) In large-sample situations, as theory predicts, Bayesian and sampling-theory procedures performed about equally well in terms of the criterion of performance in repeated samples.

Table 9.5 PERFORMANCE OF BAYESIAN AND SAMPLING-THEORY NOMINAL 95% CONFIDENCE INTERVALS FOR γ
(Percent of intervals covering true value)

Experiment        Bayesian          Bayesian          Sampling-Theory  Sampling-Theory
                  "Exact Central"   "Exact Shortest"  "Normal"         t
                  Interval          Interval          Interval         Interval
Run I:   T = 20   96.0              96.0              84.0             88.0
         T = 40   96.0              98.0              82.0             82.0
         T = 60   96.0              98.0              76.0             76.0
         T = 100  96.0              96.0              92.0             96.0
Run II:  T = 20   96.0              98.0              82.0             82.0
         T = 40   96.0              96.0              86.0             86.0
         T = 60   90.0              96.0              90.0             94.0
         T = 100  100.0             100.0             94.0             98.0
Run III: T = 20   92.0              96.0              84.0             84.0
         T = 40   100.0             100.0             94.0             94.0
         T = 60   96.0              96.0              98.0             100.0
         T = 100  94.0              96.0              88.0             90.0

(These intervals are defined in the text.)
QUESTIONS AND PROBLEMS

1. Consider the following two-equation supply and demand model:

Demand: pt = α1qt + α2x1t + u1t,    t = 1, 2, ..., T,
Supply: qt = β1pt−1 + β2x2t + u2t,

where the subscript t denotes the value of a variable in the tth period, pt and qt, price and quantity, respectively, are endogenous variables, x1t and x2t, income and a cost variable, respectively, are assumed exogenous, the α's and β's are structural parameters, and u1t and u2t are serially uncorrelated random disturbance terms. Assume that u1t and u2t are normally distributed with zero means, variances σ11 and σ22, and zero covariance, that is, σ12 = 0, for all t. Given the initial observation p0, write down the likelihood function and derive maximum likelihood estimates for the parameters, given sample data for p, q, and the x's. What is the information matrix for this system?
2. Analyze the supply and demand model in Problem 1 with the following diffuse prior pdf for the parameters: p(α1, α2, β1, β2, σ11, σ22) ∝ 1/σ11σ22, with −∞ < αi, βi < ∞ and 0 < σii < ∞, i = 1, 2.
3. Suppose in Problem 1 that the u1t's are autocorrelated and satisfy u1t = λu1t−1 + εt for all t, where λ is an unknown parameter and the εt's are NID(0, σε²). Show how maximum likelihood and Bayesian estimates of the parameters α1 and α2 can be obtained under this assumption regarding the u1t's.
4. If in Problem 1 σ12 ≠ 0, write down the likelihood function in terms of the α's, β's, σ11, σ22, and ρ = σ12/(σ11σ22)^{1/2}. Then, using the prior assumptions of Problem 2 and the assumption that ρ = ρ0, a given value, show how to obtain conditional posterior pdf's for the α's and β's, given that ρ = ρ0.
5. In connection with the simple Haavelmo
model shown in (9.36) to (9.37), formulate an informative beta-prior pdf for α, the marginal propensity to consume, and show how it can be employed, along with other prior assumptions, to obtain a posterior pdf for α.
6. In Problem 5, from the beta-prior pdf for α deduce the implied prior pdf for the multiplier μ = 1/(1 − α) and examine how its properties depend on the parameters of the prior pdf for α.
7. Use Haavelmo's data for the United States, 1929 to 1941, shown below, to compute the posterior pdf for α derived in Problem 5:

Year   ct     yt      Year   ct     yt
1929   $474   $534    1936   $463   $511
1930   439    478     1937   469    520
1931   399    440     1938   444    477
1932   350    372     1939   471    517
1933   364    381     1940   494    548
1934   392    419     1941   529    629
1935   416    449

(Figures are per-capita price-deflated personal consumption expenditures and disposable income.)
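As an illustration of the change of variables involved in Problem 6, the following sketch derives the implied prior for the multiplier from a beta prior on the marginal propensity to consume; the beta parameters used are hypothetical, not values suggested by the text:

```python
# A minimal sketch of the transformation in Problem 6: if alpha ~ Beta(a, b)
# on (0, 1), then mu = 1/(1 - alpha) has implied prior
#   p(mu) = p_alpha(1 - 1/mu) * |d alpha/d mu|, with |d alpha/d mu| = 1/mu^2,
# for 1 < mu < infinity. The parameters a, b below are illustrative only.
import numpy as np
from scipy.stats import beta

a, b = 8.0, 2.0                       # hypothetical beta-prior parameters
mu = np.linspace(1.01, 20.0, 500)     # multiplier grid
alpha = 1.0 - 1.0 / mu                # inverse transformation
p_mu = beta.pdf(alpha, a, b) / mu**2  # implied prior pdf for the multiplier
print(mu[np.argmax(p_mu)])            # mode of the implied prior
```

A prior tightly concentrated near alpha = 1 puts heavy mass on very large multipliers, which is the kind of property Problem 6 asks the reader to examine.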
8. Formulate a relatively diffuse prior pdf for the parameters of the following slightly expanded Haavelmo model: ct = β0 + αyt + γct−1 + ut, t = 1, 2, ..., T, and yt = ct + zt, where zt is assumed to be exogenous and the ut's are NID(0, σ²).
9. Using the prior pdf formulated in Problem 8 and Haavelmo's data, shown in Problem 7, derive and compute posterior pdf's for the parameters of the model in Problem 8, given the initial value for ct, c0 = 466, the 1928 value. In particular, examine the posterior pdf for γ to decide whether the information in the data and prior assumptions suggests a zero value for this parameter.
10. In Problem 9, from the joint posterior pdf for α and γ explain how to obtain the marginal posterior pdf for α/(1 − γ), the "long-run" marginal propensity to consume. Then compute the marginal posterior pdf for α/(1 − γ).
11. For the ith firm, i = 1, 2, ..., n, let y0i = log of output,
y1i = log of labor input, and y2i = log of capital input. If the production function is of the Cobb-Douglas (CD) form and the firm is assumed to maximize the mathematical expectation of profit, it has been shown that the following model for the observations (y0i, y1i, y2i) results (see A. Zellner, J. Kmenta, and J. Drèze, "Specification and Estimation of Cobb-Douglas Production Function Models," Econometrica, 34, 784-795 (1966), for details):

Production relation: y0i − α1y1i − α2y2i − α0 = u0i,
Condition on labor input: (α1 − 1)y1i + α2y2i − β1 = u1i,
Condition on capital input: α1y1i + (α2 − 1)y2i − β2 = u2i,

for i = 1, 2, ..., n, where α1 and α2 are the labor and capital coefficients in the CD function, respectively, α0, β1, and β2 are parameters, and the triplets (u0i, u1i, u2i) are assumed to be independently and normally distributed, with zero mean vector and common 3 × 3 pds covariance matrix Σ, given by

Σ = | σ00   0′  |
    | 0    Σ22 |

In this last expression σ00 is the variance of the u0i's, Σ22 is a 2 × 2 covariance matrix for u1i and u2i, and 0′ = (0, 0). Write down the likelihood function for the system and derive maximum likelihood estimates for the parameters.
12. Use the likelihood function in Problem 11 along with the following diffuse prior pdf for the parameters α′ = (α0, α1, α2), β′ = (β1, β2), σ00, and Σ22:

p(α, β, σ00, Σ22) ∝ p1(α, β)p2(σ00)p3(Σ22),

with p1(α, β) ∝ |1 − α1 − α2|, p2(σ00) ∝ 1/σ00, and p3(Σ22) ∝ |Σ22|^{−3/2}. Here we are assuming that the location parameters α and β are a priori independent of the scale parameters σ00 and Σ22. Further, application of Jeffreys' invariance theory (see Appendix to Chapter 2) provides the form for p1(α, β), shown above. Assuming that 0 < σ00 < ∞, 0 < |Σ22|, and α0, β1, and β2 range from −∞ to ∞, write down the joint posterior pdf for the parameters and then integrate to obtain the marginal posterior pdf for α0, α1, and α2, which is in the trivariate Student-t form centered at the maximum likelihood estimates if it is assumed that α1 and α2 both range from −∞ to ∞.
13. In Problem 12 provide an alternative prior pdf for α1 and α2 which reflects information suggested by economic theory, namely α1, α2 > 0, and show how

it can be employed in the analysis of the CD model given in Problem 11. How will such prior information affect the posterior pdf's for α1 and α2 and optimal Bayesian point estimates when the sample size n is small and when it is large?
14. Let y1t = y2tγ + u1t and y2t = xt′π + u2t, t = 1, 2, ..., T, where
y1t and y2t are endogenous variables, γ is a scalar parameter, xt′ is a 1 × k vector of given quantities with X′ = (x1, x2, ..., xT) a k × T matrix of rank k, π is a k × 1 vector of parameters, and u1t and u2t are disturbance terms. Assume that the pairs (u1t, u2t) are normally and independently distributed, each with zero mean vector and 2 × 2 pds covariance matrix Σ. Is the above system observationally equivalent to the following system: y1t = xt′πγ + v1t and y2t = xt′π + v2t, where the pairs (v1t, v2t) are assumed to be normally and independently distributed, each with zero mean vector and 2 × 2 pds covariance matrix Ω?
15. Suppose that we formulate the following prior pdf for parameters appearing in the second system of Problem 14: p(γ, π, Ω) ∝ p1(γ)|Ω|^{−3/2}, with −∞ < πi < ∞, i = 1, 2, ..., k, |Ω| > 0, and p1(γ) the marginal prior pdf for γ. What does this prior pdf imply about the prior pdf for Σ, the disturbance covariance matrix for the first system in Problem 14?
16. Using the prior pdf shown in Problem 15, analyze the system in Problem 14 to obtain a marginal posterior pdf for γ.
17. As in Section 9.5, analyze the first equation of a simultaneous equation system but with the assumption that the structural disturbance covariance matrix has the following form:

Σ = | σ11   0′  |
    | 0    Σ22 |

where Σ is an m × m pds matrix, σ11 is the common variance of the elements of u1, Σ22 is an (m − 1) × (m − 1) full matrix, and 0′ is a 1 × (m − 1) vector of zeros.
18. If, in connection with the system shown in (9.1), Γ and Σ are known to be block diagonal, Γ = diag(Γ1, Γ2, ..., ΓG) and Σ = diag(Σ1, Σ2, ..., ΣG), with Σi the covariance matrix for the disturbance terms appearing in the subset of equations with endogenous variable coefficient matrix Γi, i = 1, 2, ..., G, show how this leads to a factorization of the likelihood function.

CHAPTER X

On Comparing and Testing Hypotheses

In many circumstances there is the problem of
comparing alternative hypotheses; for example, we may be interested in
comparing Friedman's permanent income hypothesis (PIH) and the absolute
income hypothesis (AIH), or we may wish to compare the hypothesis that a
parameter is zero with the hypothesis that it is different from zero. Given
a precise statement of the hypotheses to be compared, the way in which we
actually make the comparison will depend on the purpose to be served by
our analysis, the state of our prior information, and whether we have an
explicitly formulated loss function. Each of these points is considered
below. As regards purpose of an analysis, a body of data may be analyzed
just to provide a revision of prior probabilities associated with alternative hypotheses (note that in the Bayesian approach it is considered meaningful to introduce probabilities associated with hypotheses); for example, if, initially, our prior probabilities for the PIH and AIH are each equal to ½, sample information, used in a way to be described below, may change these probabilities to ¾ for the PIH and ¼ for the AIH. An investigator may stop here with the conclusion that on the basis of initial prior probabilities (= ½) and of the information in the data the posterior odds in favor of the PIH are three to one; that is, ¾ ÷ ¼ = 3. Clearly, this result is useful and important. Further, it may be that the
investigator has no idea how his research findings are to be used. Thus he
cannot be or is not interested in formulating a decision problem with an
explicit loss function. However, by obtaining posterior probabilities
associated with alternative hypotheses, he has provided an important
ingredient for those interested in solving decision problems with explicitly
given loss functions. Further, analysis of additional sample data will result
in a revision of posterior probabilities. After much data have been analyzed it may be that the posterior probability associated with one of the hypotheses is close to one. In this situation we would say that this hypothesis is probably true; the use of the term "probably true" allows for the possibility of error. The process of revising prior probabilities associated with alternative hypotheses does not necessarily involve a decision to reject or accept these hypotheses, which is

why we have used
the term "comparing hypotheses" rather than "testing hypotheses." On the
other hand, we recognize that on many occasions the objective of an
analysis is to reach a decision, say accept or reject, with respect to alternative hypotheses; that is, we may wish to state that on the basis of the available information we can reject the AIH and accept the PIH. If we have an explicitly given loss function, the prescription "act to minimize expected loss" is usually employed to reach a decision. (This is in accord with the expected utility hypothesis. For an interesting discussion of this precept for consistent behavior see R. D. Luce and H. Raiffa, Games and Decisions. New York: Wiley, 1958.) If, however, we have no explicitly given loss function, there is some unavoidable degree of arbitrariness in our decision; that is, we may decide to accept the PIH and reject the AIH if the posterior odds in favor of the PIH are 20 to 1 or greater. One may well ask: why 20 to 1 rather than 30 to 1? No satisfactory answer to this question can be given when the consequences of the acts, reject or accept, are not or cannot be spelled out explicitly. (The same arbitrariness arises in the choice of a significance level in sampling-theory tests: why use the 5 per cent rather than the 1 per cent level of significance?) Thus, when no
explicit statement of the consequences of the acts, accept or reject, is given,
going beyond the reporting of posterior probabilities associated with
alternative hypotheses to reach the conclusion, reject or accept, will involve
some element of arbitrariness. The last general point to be considered is the
role of prior information in comparing hypotheses. Just as in estimation
problems, it will be seen that prior information can be incorporated in
analyses involving the comparison of hypotheses. As with estimation, the
amount and kind of prior information to be employed in an analysis will
depend on what we know and what we judge appropriate to incorporate into
the analysis. We recognize that there are situations in which we know very
little and thus want procedures for comparing hypotheses with the use of
little prior information. There are other circumstances, however, when we
have prior information, say from analyses of past samples of data, and wish
to incorporate it in our comparison of hypotheses. We shall see how this can
be done in the Bayesian approach.

10.1 POSTERIOR PROBABILITIES ASSOCIATED WITH HYPOTHESES

We now turn to the problem of using data to revise prior probabilities associated with hypotheses. (Much of what follows is due to Harold Jeffreys. See his Theory of Probability (3rd ed.). Oxford: Clarendon, 1961, Chapters 5 and 6.) As a first example let us consider two mutually exclusive and exhaustive hypotheses, H0 and H1. Initially, we assume that under H0 our observation vector y has a pdf p(y|θ = θ0) and under H1, p(y|φ = φ1), where θ0 and φ1 are specific values for the parameter vectors θ and φ, respectively. (In some problems θ = φ, and we consider H0: θ = θ0 and H1: θ = θ1 as our two hypotheses.) Further, let w be a dichotomous random variable such that

(10.1) w = 0 if H0 is true, and w = 1 if H1 is true.

Our prior probabilities associated with the hypotheses are p(H0) = p(w = 0) and p(H1) = p(w = 1), with p(H0) + p(H1) = p(w = 0) + p(w = 1) = 1. Now consider the joint pdf for y and w,

(10.2) p(y, w) = p(w)p(y|w) = p(y)p(w|y),

from which we obtain

(10.3) p(w|y) = p(w)p(y|w)/p(y),

where p(w|y) is the discrete posterior pdf for w, given the sample information, p(w) is the discrete prior pdf for w, p(y|w) is the conditional pdf for y, given w, and p(y) = p(y|w = 0)p(w = 0) + p(y|w = 1)p(w = 1) is the marginal pdf for y, assumed nonzero. Then from (10.3) we have for the posterior probability associated with H0

(10.4) p(H0|y) = p(w = 0|y) = p(w = 0)p(y|w = 0)/p(y) = p(H0)p(y|θ = θ0)/p(y),

and similarly for H1

(10.5) p(H1|y) = p(w = 1|y) = p(w = 1)p(y|w = 1)/p(y) = p(H1)p(y|φ = φ1)/p(y).

The expressions in (10.4) and (10.5) can be employed to compute posterior probabilities, given that we have prior probabilities and explicit functional forms for p(y|θ = θ0) and p(y|φ = φ1). Also, the posterior odds in favor of H0, denoted by K01, are given by

(10.6) K01 = p(H0|y)/p(H1|y) = [p(H0)/p(H1)][p(y|θ = θ0)/p(y|φ = φ1)].

(Note that K01 in (10.6) can be computed and would remain unchanged if we had more than two mutually exclusive hypotheses.)
exclusive hypotheses.

In (10.6) we see that the posterior odds are the product of the prior odds, p(H0)/p(H1), and the likelihood ratio, p(y|θ = θ0)/p(y|φ = φ1). As an example illustrating application of these concepts and operations let us assume that a coin has been fairly and independently tossed three times and that we observe two heads and one tail. Under hypothesis H0 we assume that the probability of a head is ½; under the hypothesis H1 we assume that the probability of a head is ¼. If the prior probabilities are p(H0) = p(H1) = ½, then the data, two heads in three tries, yield the following posterior odds from (10.6):

K01 = p(H0|y)/p(H1|y) = (½)(½)²(½) / [(½)(¼)²(¾)] = 8/3.

Thus the sample evidence changes our prior odds, 1/1, to 8/3 in favor of the hypothesis H0, namely, that the probability of a head is ½. Equivalently, the data have changed our prior probabilities from ½ to 8/11 and 3/11 for H0 and H1, respectively. To
indicate how prior odds of 1/1 would be modified by other possible outcomes, we show in Table 10.1 posterior odds associated with a range of possible outcomes. We see that for just two tosses the posterior odds are just 4/3 for H0, given that one head has appeared. Given two heads in two tosses, however, the odds are 4/1 in favor of H0. Note, too, how increasing the sample size affects the posterior odds, given various outcomes; for example, having two of four tosses result in a head appearing leads to posterior odds 16/9, which is somewhat greater than those associated with getting one head in two tosses, namely 4/3. Being able to state posterior odds and probabilities may be all that we want to do. There are other circumstances, however, in which we may want to take an action; that is, accept H0 or reject H0. (Here we specifically assume that the action "go on taking data" is not an option.) This is a two-action problem.

Table 10.1 POSTERIOR ODDS FOR H0 RELATIVE TO H1 (a)

Number of          Number of Heads Appearing
Trials        0        1       2       3      4      5
  2          4/9      4/3     4       --     --     --
  3          8/27     8/9     8/3     8      --     --
  4          16/81    16/27   16/9    16/3   16     --
  5          32/243   32/81   32/27   32/9   32/3   32

(a) H0 is the hypothesis that the probability of a head on a single toss is ½, whereas H1 asserts that this probability is ¼.
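The entries of Table 10.1 can be checked directly from (10.6). A minimal sketch, using exact rational arithmetic:

```python
# A minimal sketch reproducing Table 10.1: with equal prior probabilities the
# posterior odds reduce to the likelihood ratio of the two binomial
# hypotheses (head probability 1/2 under H0, 1/4 under H1).
from fractions import Fraction

def posterior_odds(heads, trials):
    p0, p1 = Fraction(1, 2), Fraction(1, 4)
    like0 = p0**heads * (1 - p0)**(trials - heads)
    like1 = p1**heads * (1 - p1)**(trials - heads)
    return like0 / like1          # prior odds of 1/1 assumed

print(posterior_odds(2, 3))       # Fraction(8, 3), as in the text
print(posterior_odds(2, 4))       # Fraction(16, 9), as in Table 10.1
```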
ASSOCIATED WITH HYPOTHESES 295 Further, we recognize that, by
assumption, there are two possible states of the world--Ho true or Hx true.
Thus we have a "two-state-two-action" problem. Let us assume that the
consequences of our actions are given by the following loss structure: State
of World Ho true Hx true Accept Ho L(Ho, o) = 0 L(Hx, o) Accept Hx
L(Ho, ) L(Hx, ) = 0 Action This particular loss structure is such that we
experience zero loss if our actions are in agreement with states of the world.
However, if we accept H0 when Hx is true, we experience a positive loss
L(Hx, o), where the first argument of L refers to the state of the world and
the second to our action; that is, o is short-hand notation for accept Ho.
With the above loss structure we can evaluate the consequences of our
actions, given that we have posterior probabilities for the hypotheses Ho
and Hx; that is, the expected loss associated with the action accept Ho is
(10.7) E(Zlro) = p(HolY) Z(Ho, ro) + p(HxlY) L(Hx, = p(HxIy)L(Hx, since
we have assumed L(Ho, ro) = 0. Similarly, (10.8) E(LIr0 = p(H0lY) Z(Ho,
+ p(Hxly) L(Hx, = p(Holy) Z(Ho, since we have assumed L(Hx, ) = O.
Having computed the expected losses in (10.7) and (10.8), we can compare
them and choose an action: (10.9) If E(LIo) < E(Llrx ), accept Ho, action 0,
(10.10) If E(L[x) < E(LIo ), accept Hx, action x. This proes a basis for
action � in accord with the implications of the expected utility hypothesis,
namely, that a decision maker should maximize expected utility (or,
equivalently, minimize expected loss) in order to be consistent. 0 If the
expected losses are equal, we could take either action and experience the
same consequences.

To see what acting on the basis of a comparison of expected losses implies in terms of sample information, we see from (10.7) and (10.8) that E(L|A0) will be less than E(L|A1) if, and only if,

(10.11) p(H1|y)L(H1, A0) < p(H0|y)L(H0, A1).

Then on substituting for p(H0|y) and p(H1|y) from (10.4) and (10.5) we have

(10.12) p(y|H0)/p(y|H1) > p(H1)L(H1, A0) / [p(H0)L(H0, A1)].

Thus we find that (10.12) is logically implied by the condition E(L|A0) < E(L|A1) and is an alternative way of stating the expected loss criterion for taking the action of accepting H0. In (10.12) we have a likelihood ratio, p(y|H0)/p(y|H1), which is compared with the ratio of prior expected losses. The higher the prior expected loss associated with accepting H0, p(H1)L(H1, A0), in relation to that associated with accepting H1, p(H0)L(H0, A1), the greater the sample evidence in favor of H0 must be, as reflected by the likelihood ratio on the lhs of (10.12), before H0 is accepted. This indeed appears to be a sensible procedure for determining the "critical value" in a likelihood-ratio test procedure; that is, a likelihood-ratio test procedure states: accept H0 if p(y|H0)/p(y|H1) > λ, where λ's value is determined by choice of the significance level for the test. Often the significance level and the associated value of λ are chosen with implicit consideration of the relative costs or losses associated with errors of the first and second kind. (An error of the first kind is accepting H1, given that H0 is true, while an error of the second kind is accepting H0 when H1 is true; L(H0, A1) and L(H1, A0) are the losses associated with errors of the first and second kind, respectively.) In the Bayesian approach explicit consideration is given to the loss structure. That this leads to a likelihood-ratio test procedure is indeed an interesting justification for use of this procedure in testing hypotheses. Further, that explicit consideration of the loss structure provides a natural choice of the critical value λ is certainly satisfying.

It is extremely interesting to investigate the implications of various loss structures. One particularly simple one is the "symmetric" loss structure:

(10.13) L(H0, A1) = L(H1, A0); L(H0, A0) = L(H1, A1) = 0.

For this loss structure errors of the first and second kind have equal associated losses. When it is appropriate to use such a loss structure, we see from (10.12) that the critical point in the likelihood-ratio test is just the prior odds for H1 in relation to H0. If the likelihood ratio is greater than the prior odds for H1, we accept H0, an action in accord with the expected utility hypothesis, given the symmetric loss function in (10.13). Equivalently, we can make a comparison of the expected losses in (10.7) and (10.8) under the symmetric loss structure. This shows that E(L|A0) < E(L|A1) if, and only if,

(10.14) p(H0|y) > p(H1|y) or p(H0|y)/p(H1|y) > 1.

Thus under the symmetric loss structure a comparison of the posterior probabilities will provide a basis for choosing between H0 and H1. Obviously, with other loss structures the specific prescription for action will be different but the
principles will be the same. To bring out salient points we have considered
alternative hypotheses that assign specific values to all parameters of the
pdf for the observations p(yIH), as in the example involving coin tossing.
This comparison of simple hypothe- ses is sometimes encountered in
practice. More frequently, we encounter so-called composite hypotheses;
for example, the hypothesis that the probability of a head is not equal to �.
No specific value is suggested. Rather, all possible values other than � are
consistent with the hypothesis. To compute posterior probabilities for these
hypotheses involves some extension of the above analysis, which we now
undertake. Again we consider two mutually exclusive and exhaustive
hypotheses, Ho and Hx, with prior probabilities p(Ho) = p(w = 0) and p(HO
= p(w = 1), where w is the random variable defined in (10.1). Let 0 denote
the parameter vector associated with hypothesis Ho under which the pdf for
the observation vector y is p(y[O) --p(yl TM = O, O) and q>, the parameter
vector associated with hypothesis Hx under which the pdf for y isp(ylq>) --
p(ylw = 1, 0). We have for the joint pdf for y, w, and � (10.15) or (10.16)
p(y, w, �) = p(y)p(w, �[y) = p(w, �)p(ylw, �) p(w, �IY)= p(w,
�)p(ylw, �) P(Y) _ p(w) p(�lw) p(ylw, �) P(Y) where p(w) i the prior
pdf for w and p(�l TM) is the conditional prior pdf for 0, given w. We
havq p(�l TM = 0) = p(0) and p(�lw = 1) = P(4), the prior pdf's for 0 and
4, respectively � Then the posterior probability associated with Ho can be
obtained from (10.16) by inserting w = 0 an d integrating with respect to 0;
that is p(w = O) f p(O) p(y[O) dO (10.17) p(Holy) = p(y)

298 ON COMPARING AND TESTING HYPOTHESES and similarly p(w


= 1) f p(q>) p(ylq) (10.18) p(HxlY) = p(y) ' These expressions can be
employed to compute the posterior probabilities associated with the
hypotheses, provided that the integrals converge. Also, the posterior odds in
favor of Ho are given by (10.19) Kox p(HolY) p(Ho) J'P(O)p(yIO) dO =
P(Hxly) = p(HO f p(q>) p(ylq>) where we have written p(Ho) = p(w = 0)
and p(Hx) = p(w = 1) as the prior probabilities associated with Ho and Hx,
respectively. From (10.19) we see that the posterior odds are equal to the
prior odds p(Ho)/p(H) times the ratio of averaged likelihoods with the prior
pdf's p(0) and p(q>) serving as the weighting functions. This contrasts with
the usual iikelihood-ratio testing procedure which involves taking the ratio
of maxi- mized likelihood functions under Ho and Hx, a procedure that
amounts to using maximum likelihood estimates as if they were true values
of the unknown parameters in forming a likelihood ratio appropriate for two
simple hypotheses. 10.2 ANALYZING HYPOTHESES WITH DIFFUSE
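For a concrete instance of (10.19), suppose H0 fixes the head probability at ½ while under H1 it has a uniform prior over (0, 1). A minimal sketch, with the averaged likelihood under H1 computed by numerical integration:

```python
# A minimal sketch of (10.19) for a composite alternative: the posterior odds
# are the prior odds times the ratio of the likelihood under H0 to the
# prior-averaged likelihood under H1.
import numpy as np

heads, trials = 2, 3
p_grid = np.linspace(0.0, 1.0, 10001)
like = p_grid**heads * (1 - p_grid)**(trials - heads)

like_h0 = 0.5**trials                 # p(y | H0), head probability fixed at 1/2
avg_like_h1 = np.trapz(like, p_grid)  # integral of p(p)p(y|p) dp with p(p) = 1
prior_odds = 1.0                      # p(H0)/p(H1)
k01 = prior_odds * like_h0 / avg_like_h1
print(k01)                            # about 1.5 for two heads in three tosses
```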
10.2 ANALYZING HYPOTHESES WITH DIFFUSE PRIOR PDF'S FOR PARAMETERS

In this section we consider Lindley's procedure for Bayesian tests of significance. (D. V. Lindley, Introduction to Probability and Statistics from a Bayesian Viewpoint, Part 2: Inference. Cambridge: University Press, 1965, p. 58 ff.) As Lindley emphasizes, his procedure is appropriate only when prior information is vague or diffuse; that is, if we are concerned with the hypothesis that a scalar parameter θ is equal to θ0, the value under the "null hypothesis," and the alternative hypothesis is θ ≠ θ0, the prior distribution in the neighborhood of the null value θ0 "... must be reasonably smooth for the tests to be sensible." (Ibid., p. 61.) This means that the situation is assumed to be such that we have no reason to believe that θ = θ0 more strongly than θ = θ1, where θ1 is any value for θ near θ0. We shall see that for many, but not all, problems Lindley's procedure leads to tests that are computationally equivalent to sampling-theory tests. However, the interpretation of Lindley's Bayesian tests of significance is fundamentally different from that of sampling-theory tests. It must also be recognized that when our prior information is neither vague nor diffuse, results will usually be importantly influenced by prior information, as demonstrated below.
In Lindley's procedure the posterior pdf for the parameter (or parameters) of a model is derived by using a diffuse prior pdf. Let the posterior pdf for the parameter θ be denoted by p(θ|y), where y denotes the sample information. Further assume that p(θ|y) is unimodal. To perform a Lindley significance test of the hypothesis θ = θ0, where θ0 is a suggested value for θ, at the α (say 0.05) level of significance, we construct an interval such that Pr{a < θ < b|y} = 1 − α. (We referred to such an interval as a Bayesian confidence interval in Chapter 2; as discussed there, we determine a and b such that b − a is minimized subject to the condition that Pr{a < θ < b|y} = 1 − α.) If θ0 falls in this interval, that is, a < θ0 < b, we accept the hypothesis θ = θ0 at the α level of significance; if θ0 falls outside the interval a to b, that is, θ0 < a or θ0 > b, we reject the hypothesis θ = θ0 at the α level of significance. The rationale for the Lindley procedure is that the posterior pdf for a parameter, say θ, gives us a basis for expressing beliefs about possible values of θ. If the value θ = θ0 is in a region in which the posterior probability density is not high, this fact leads one to suspect that this value for θ is not believable, and thus one rejects the hypothesis θ = θ0. Some particular features of Lindley's procedure deserve to be
emphasized. First, since it is based on the use of the likelihood function, all
the sample information is employed. This contrasts with certain test
procedures based on the distribution of a sample statistic which is not a
sufficient statistic. Second, we are judging the hypothesis θ = θ0 on the basis of posterior beliefs, as represented by the posterior pdf for θ. We are not judging it on the fact that some test statistic may assume a usual or unusual value, given that θ = θ0, as in usual sampling-theory test procedures. The level of significance for sampling-theory tests should not be interpreted as measuring the degree of belief that the hypothesis θ = θ0 is
valid, even though this interpretation is frequently encountered. Sampling
theorists do not consider it meaningful to assign probabilities to hypotheses.
Third, Lindley points out that "... the significance level is... an incomplete
expression of posterior beliefs." Usually an investigator will study
properties of a posterior pdf in detail and not rely solely on a significance
test to summarize his beliefs. Finally, it is worth emphasizing again that
Lindley suggests use of his procedure only when prior information is vague
or diffuse. When prior information is not vague, we shall use posterior odds
in comparing and testing hypotheses.

Example 10.1. Suppose that our observations y′ = (y1, y2, ..., yn) are independently drawn from a normal population with unknown mean μ and known standard deviation σ = σ0. Under what condition will we reject the hypothesis μ = 5 at the 0.05 level of significance, given diffuse prior beliefs about μ? Applying Bayes' theorem, the posterior pdf for μ is given by

p(μ|y) ∝ exp[−(n/2σ0²)(μ − ȳ)²],

where ȳ = Σ yi/n, the sample mean. Thus, a posteriori, μ is normally distributed with mean ȳ and variance σ0²/n. Then z = (μ − ȳ)√n/σ0 is a standardized normally distributed variable; that is, Ez = 0 and Var z = 1. By consulting tables of the normal distribution we find Pr{−1.96 < z < 1.96} = 0.95. A logically equivalent statement is

(10.20) Pr{ȳ − 1.96σ0/√n < μ < ȳ + 1.96σ0/√n | ȳ, σ0, n} = 0.95.

Then, if the value 5 lies in the interval ȳ ± 1.96σ0/√n, we accept the hypothesis μ = 5 at the 0.05 significance level. If not, we reject. In (10.20) it is important to emphasize that μ is considered random, whereas ȳ, which depends on the given sample information, is given, along with σ0 and n. This contrasts with the sampling-theory approach in which z′ = (ȳ − 5)√n/σ0 is considered random and the hypothesis μ = 5 is accepted at the 0.05 significance level if −1.96 < z′ < 1.96 and rejected if |z′| > 1.96; that is, accept if ȳ is in the interval 5 ± 1.96σ0/√n and reject if ȳ lies outside this interval. Note that this action is precisely equivalent to that taken on the basis of (10.20) when we follow Lindley's procedure. However, the interpretation and justification for the sampling-theory procedure are quite different from those for the Bayesian procedure.
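A minimal sketch of the accept/reject rule of Example 10.1; the sample values fed in are hypothetical:

```python
# A minimal sketch of the Lindley test in Example 10.1: with a diffuse prior,
# mu | y is N(ybar, sigma0^2/n), so we accept mu = 5 exactly when 5 lies
# inside the interval ybar +/- 1.96*sigma0/sqrt(n).
import math

def lindley_test_normal_mean(ybar, sigma0, n, mu0=5.0, z=1.96):
    half_width = z * sigma0 / math.sqrt(n)
    accept = (ybar - half_width) < mu0 < (ybar + half_width)
    return "accept" if accept else "reject"

print(lindley_test_normal_mean(ybar=5.3, sigma0=1.0, n=25))   # -> accept
print(lindley_test_normal_mean(ybar=5.5, sigma0=1.0, n=100))  # -> reject
```

As the text notes, the same accept/reject actions result from the sampling-theory z test, although the probability statement being made is entirely different.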
Example 10.2. In Chapter 3 it was found that with a diffuse prior pdf for the regression coefficients and the error terms' common standard deviation in the linear normal regression model, the marginal posterior pdf for a single regression coefficient, say β1, was in the univariate Student-t form; that is, a posteriori tν = (β1 − β̂1)/sβ̂1, where sβ̂1 denotes the standard error of β̂1, has a univariate Student-t pdf with ν = n − k degrees of freedom [see (3.39), Chapter 3]. If we wish to test the hypothesis β1 = 0 by using Lindley's procedure at, say, the 0.20 level of significance, we consult tables of the Student-t distribution for ν = n − k degrees of freedom to find c such that Pr{−c < tν < c} = 0.80. (Note that to be in accord with Lindley's assumptions the value β1 = 0 must not be in any sense an unusual value for β1. If, for example, a theory predicts that β1 = 0, it may very well be the case that an investigator will want to incorporate this information in his prior pdf, and thus Lindley's approach with a diffuse prior pdf would not be appropriate.) Given the definition of tν above, we have Pr{β̂1 − c sβ̂1 < β1 < β̂1 + c sβ̂1} = 0.80. Then, if the value β1 = 0 falls in the interval β̂1 ± c sβ̂1, we accept; otherwise we reject. As in Example 10.1, this leads to the same action as a sampling-theory test based on the random variable tν.

Example 10.3. Consider the posterior pdf for the autocorrelation
coefficient ρ in (4.19). Using numerical integration procedures, we can find the shortest interval such that Pr{a < ρ < b} = 1 − α. To test the hypothesis that ρ = ρ0, where ρ0 is a given value, we accept if a < ρ0 < b and reject otherwise at the α level of significance. In this instance there does not appear to be a simple sampling-theory test which yields comparable results.

In the above examples we have used Lindley's approach to test hypotheses about a single scalar parameter. If we have a joint hypothesis about two or more parameters, say θ, a Bayesian "highest posterior density" confidence region for θ is first obtained with probability content 1 − α. If our hypothesis is θ = θ0, where θ0 is a given vector, we accept if θ0 is contained in the confidence region and reject otherwise at the α level of significance.

Example 10.4. Consider the simple linear normal regression model in Section 3.1. Equation 3.18 and the associated distributional result provide us with what is needed to construct a Bayesian confidence region (ellipse) with any specified probability content, say 1 − α. Then, if our hypothesis is β1 = β10 and β2 = β20, where β10 and β20 are given values, we accept if the point (β10, β20) is within the Bayesian confidence region and reject otherwise at the α level of significance. Equivalently, we compute F in (3.18) with β1 = β10 and β2 = β20 and designate this computed value by F0. Since a posteriori F is distributed as F(2, ν), where ν = n − 2, we can consult tables of the F-distribution to find c such that Pr{F < c} = 1 − α. If F0 < c, we accept the hypothesis that β1 = β10 and β2 = β20 at the α significance level. If F0 > c, we reject the hypothesis at the α significance level.
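A minimal sketch along the lines of Example 10.2, taking the posterior mean and standard error of the coefficient as given; the numerical inputs are hypothetical:

```python
# A minimal sketch of Example 10.2: accept beta1 = 0 at the 0.20 level when 0
# falls inside the 80% posterior interval based on the Student-t pdf with
# n - k degrees of freedom.
from scipy.stats import t

def lindley_t_test(beta_hat, se_beta, n, k, level=0.20, beta0=0.0):
    c = t.ppf(1 - level / 2, df=n - k)       # Pr{-c < t < c} = 1 - level
    accept = abs(beta_hat - beta0) < c * se_beta
    return "accept" if accept else "reject"

print(lindley_t_test(beta_hat=0.8, se_beta=0.5, n=20, k=3))  # -> reject here
```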
In summary, Lindley's procedure for tests of significance is appropriate when prior information is vague or diffuse. Basically, the rationale for such tests is: accept when the suggested value for a parameter under the null hypothesis is in an interval in which the posterior probability density is high, and reject otherwise. No decision-theoretic justification appears available for this procedure. Rather, it is based entirely on what is reasonable a posteriori. For a number of problems Lindley's procedure leads to actions, accept or reject, which are identical to those flowing from sampling-theory test procedures. In some problems, however, for example, testing a hypothesis about an autocorrelation parameter, Lindley's procedure leads to easily computable results, whereas comparable sampling-theory results are not available. Finally, in large samples, when the likelihood function assumes a normal form,

with mean equal to the maximum likelihood estimate for a parameter vector θ and covariance matrix equal to the inverse of the estimated information matrix (see Chapter 2, Section 2.10), Lindley's testing procedure yields results that are computationally equivalent to large-sample sampling-theory tests which utilize the large-sample normality of maximum likelihood estimators centered at the true parameter vector with approximate covariance matrix given by the inverse of the estimated information matrix.

10.3 COMPARING AND TESTING HYPOTHESES WITH NONDIFFUSE PRIOR INFORMATION

In situations in which prior information is not
diffuse, it is important to take this fact into account in comparing and testing hypotheses. If we consider a simple hypothesis θ = θ0, where θ0 is a value suggested by a theory, we may well believe that θ0 is a more probable value for θ than are other possible values for θ. If this is the situation, then it is important to have test procedures that allow us to incorporate nondiffuse prior information. (Jeffreys, op. cit., p. 251, points out that often when the hypothesis θ = 0 is being tested, the mere fact that it has been suggested that the parameter's value is zero "... corresponds to some presumption that it is fairly small." Thus use of a diffuse prior pdf for θ would be inappropriate in this case.)

To illustrate how nondiffuse prior information can be introduced in comparing and testing hypotheses, assume that we have n observations y′ = (y1, y2, ..., yn) independently drawn from a normal population with unknown mean μ and known variance σ² = 1. Assume that our null hypothesis is H0: μ = 0 and that we have some reason for thinking that μ = 0 with probability π1 and μ ≠ 0 with probability 1 − π1. Further assume that the prior probability, 1 − π1, that μ ≠ 0 is spread out uniformly over the interval −M/2 to +M/2; that is, our prior beliefs about μ are as shown in Figure 10.1. Under these assumptions the prior odds ratio pertaining to the hypotheses H0: μ = 0 and H1: μ ≠ 0 is π1/(1 − π1). Under H0, the pdf for y, given μ = 0 and σ = 1, is

(10.21) p(y|μ = 0, σ = 1) = (√2π)^{−n} exp(−½ Σ yt²) = (√2π)^{−n} exp[−½(νs² + nȳ²)],

where ν = n − 1, νs² = Σ(yt − ȳ)², and ȳ = Σ yt/n. Under the hypothesis H1: μ ≠ 0, we have

(10.22) p(y|μ ≠ 0, σ = 1) = (√2π)^{−n} exp[−½ Σ(yt − μ)²] = (√2π)^{−n} exp{−½[νs² + n(μ − ȳ)²]}.
Figure 10.1 Prior pdf for μ: mass π1 at μ = 0, with the remaining probability spread uniformly over −M/2 to M/2 (area of rectangle = 1 − π1).

Now, as explained in Section 10.1, the posterior odds ratio K01, relating to the hypotheses H0: μ = 0 and H1: μ ≠ 0, is given by (10.19), which specializes in the present instance to

(10.23) K01 = [π1/(1 − π1)] exp[−(n/2)ȳ²] / {(1/M) ∫_{−M/2}^{M/2} exp[−(n/2)(μ − ȳ)²] dμ}.

(Since the prior pdf in the present problem is part continuous and part discrete, to be formally consistent with (10.19), (10.19) should be expressed in terms of Stieltjes integrals.) The quantity K01 can be readily computed, given values for π1, n, ȳ, and M. Several points regarding (10.23) should be appreciated. First, it gives us a basis for comparing the hypotheses H0: μ = 0 and H1: μ ≠ 0 which incorporates nondiffuse prior information as well as sample information. (Prior information in forms other than that shown in Figure 10.1 can easily be introduced.) Second, as explained in Section 10.1, we can choose between H0 and H1 to minimize expected loss if we are given losses associated with possible actions and states of the world. Third, as Lindley has demonstrated (see D. V. Lindley, "A Statistical Paradox," Biometrika, 44, 187-192 (1957); in M. S. Bartlett's comment on Lindley's paper, Biometrika, 44, 533-534 (1957), a minor error in Lindley's paper is reported), the result of using K01 in (10.23) to compare the hypotheses H0 and H1 can differ from what is obtained in a sampling-theory approach. It is to this point that we now turn.

Assume that ȳ is well within the interval −M/2 to M/2 so that

(10.24) ∫_{−M/2}^{M/2} exp[−(n/2)(μ − ȳ)²] dμ ≈ ∫_{−∞}^{∞} exp[−(n/2)(μ − ȳ)²] dμ = (2π/n)^{1/2}.

This approximation amounts to neglecting the area under the normal pdf to the right of M/2 and to the left of −M/2, which will be small if ȳ is well within the interval −M/2 to M/2. On substituting from (10.24) into (10.23),

we have

(10.25) K01 = [π1/(1 − π1)] M(n/2π)^{1/2} exp(−z²/2), where z = √n ȳ.

Now, if z ≥ 1.96, a sampling theorist would reject the hypothesis μ = 0 at the α = 0.05 significance level. On inserting z = 1.96 in (10.25), we see that the resulting expression for K01, the posterior odds, depends on the quantities π1, M, and n. For certain values of these quantities the odds in favor of H0 can be high, even though z = √n ȳ = 1.96. To illustrate this point assume that π1 = ½ and M = 1. Then K01 = (n/2π)^{1/2} exp(−z²/2), and we can table values of n, K01, and the posterior probability that μ = 0, given by K01/(1 + K01).

Table 10.2

n         K01      Pr{μ = 0 | √n ȳ = 1.96, π1 = ½, M = 1}
1         0.058    0.055
10        0.185    0.156
100       0.584    0.369
300       1.012    0.503
10,000    5.843    0.854
100,000   18.477   0.949

For large n the posterior probability that μ = 0 is close to one, even though √n ȳ = 1.96, a value that would lead to rejection of the hypothesis μ = 0 at the α = 0.05 significance level. This is Lindley's paradox, which illustrates very nicely that a sampling-theory test of significance can give results differing markedly from those obtained from a calculation of posterior probabilities which takes account of nondiffuse prior and sample information. In Table 10.2 the results diverge most markedly as n grows large. (If a sampling theorist were to adjust his significance level downward as n grows larger, which seems reasonable, zα would grow with n and tend to counteract somewhat the influence of the √n factor in the expression for K01.) Basically, the point to appreciate is that in general one minus a significance level in a sampling-theory test cannot be equated with a degree of belief in a hypothesis represented by a posterior probability.
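Table 10.2 can be reproduced directly from (10.25). A minimal sketch:

```python
# A minimal sketch of Lindley's paradox, from (10.25) with pi1 = 1/2, M = 1,
# and z = sqrt(n)*ybar held fixed at 1.96: K01 = sqrt(n/(2*pi))*exp(-z**2/2).
import math

z = 1.96
for n in (1, 10, 100, 300, 10_000, 100_000):
    k01 = math.sqrt(n / (2 * math.pi)) * math.exp(-z**2 / 2)
    print(n, round(k01, 3), round(k01 / (1 + k01), 3))
# n = 1 gives K01 of about 0.058; n = 100,000 gives K01 of about 18.5,
# i.e., Pr{mu = 0 | y} of about 0.949, matching Table 10.2.
```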
We shall now consider Lindley's problem by using a more general prior pdf for μ, following the analysis put forward by Jeffreys. (Jeffreys, op. cit., p. 246 ff.) Let us assume that our prior probabilities on the hypotheses H0: μ = 0 and H1: μ ≠ 0 are π1 and 1 − π1, respectively. Given that H1 is assumed to be true, let p(μ) be a continuous proper prior pdf for μ; that is, ∫ p(μ) dμ = 1 under H1. Let Pr{w = 0} denote the prior probability that H0 is true and Pr{w = 1} the prior probability that H1 is true, where w is the discrete random variable introduced in connection with (10.1). Using (10.16), we have

(10.26) p(w, μ|y) = p(w)p(μ|w)p(y|μ, w)/p(y).

Then the posterior probability that H0 is true, that is, w = 0, is, from (10.17),

(10.27) p(w = 0|y) = p(w = 0)p(y|μ = 0, w = 0)/p(y) = π1 p(y|μ = 0, w = 0)/p(y).

Note that, given H0 (or w = 0), p(μ|w = 0) = 1 for μ = 0 and 0 for μ ≠ 0. The posterior probability that H1 is true, that is, w = 1, is given by

(10.28) p(w = 1|y) = ∫ p(w = 1, μ|y) dμ = p(w = 1) ∫ p(μ)p(y|μ, w = 1) dμ / p(y) = (1 − π1) ∫ p(μ)p(y|μ, w = 1) dμ / p(y).

Then the posterior odds ratio K01 is

(10.29) K01 = p(w = 0|y)/p(w = 1|y) = π1 p(y|μ = 0, w = 0) / [(1 − π1) ∫ p(μ)p(y|μ, w = 1) dμ].

Under the above assumptions that the elements of y′ = (y1, y2, ..., yn) are independently drawn from a normal population with standard deviation σ = σ0, with σ0 known, (10.29) becomes

(10.30) K01 = [π1/(1 − π1)] exp[−(n/2σ0²)ȳ²] / ∫ p(μ) exp[−(n/2σ0²)(μ − ȳ)²] dμ.

This expression can be evaluated if π1 and p(μ) are given. Let us assume that in (10.30) π1 equals ½; that is, the probability that H0: μ = 0 is true is ½. Then π1/(1 − π1) = 1 and (10.30) becomes

(10.31) K01 = exp[−(n/2σ0²)ȳ²] / ∫ p(μ) exp[−(n/2σ0²)(μ − ȳ)²] dμ.

Along with Jeffreys, consider two extreme cases. First assume that there is a finite interval, say −a to a, such that ∫_{−a}^{a} p(μ) dμ = 1. If ȳ lies within the interval −a to a and σ0/√n is so large that n(μ − ȳ)²/2σ0² is small for −a < μ < a, we have both the numerator and the denominator of K01 in (10.31),

each approximately equal to one, and under these conditions K01 ≈ 1. Thus there is no discrimination between the hypotheses when the standard deviation σ0/√n associated with the mean ȳ is much greater than the range −a to a permitted under the hypothesis H1. The result K01 ≈ 1 under these conditions appears to be reasonable. As another extreme case, assume that σ0/√n is very small, so that exp[−(n/2σ0²)(μ − ȳ)²] is sharply peaked within the interval −a to a. The integral in the denominator of (10.31) is then approximately equal to (√2π σ0/√n)p(ȳ), where p(ȳ) is the prior pdf for μ evaluated at μ = ȳ. Then

(10.32) K01 ≈ [1/p(ȳ)] (n/2πσ0²)^{1/2} exp(−nȳ²/2σ0²).

As Jeffreys points out, if ȳ = 0, K01 is proportional to √n and, as n grows, K01 does also, which indicates support for the hypothesis H0: μ = 0. If |ȳ| >> σ0/√n, the exponential factor in (10.32) will be small and thus the observations tend to support H1: μ ≠ 0; that is, K01 tends to be small under this condition. Further, for given n there will be a value for √n|ȳ|/σ0 such that K01 = 1. Last, for values of ȳ such that |ȳ| < σ0/√n, as n grows, K01 grows. This is reasonable, since having |ȳ| less than its standard deviation should carry more weight for believing that μ = 0 when n is large than when n is small. (See Jeffreys, op. cit., Chapters 5 and 6, for further consideration of tests of significance.) Since nondiffuse prior information affects
posterior odds in both small and large samples, it is obvious that care must
be exercised in representing the prior information to be employed in an
analysis. That this information exerts an influence on posterior odds, even
in large samples, contrasts with the situation in estimation in which the
influence of nondogmatic prior information on the shape of a posterior pdf generally diminishes as the sample size grows.

10.4 COMPARING REGRESSION MODELS

Often in econometrics and elsewhere we have
the problem of comparing alternative regression models designed to explain the variation of a particular dependent variable; for example, in the work of Friedman and Meiselman (M. Friedman and D. Meiselman, "The Relative Stability of Velocity and the Investment Multiplier," in the Commission on Money and Credit volume, Stabilization Policies. Englewood Cliffs, N.J.: Prentice-Hall, 1963) two models were studied:

(10.33) Ct = v0 + v1Mt + u1t
(10.34) Ct = m0 + m1At + u2t,    t = 1, 2, ..., T,

where the subscript t denotes the value of a variable in the tth time period, Ct is consumption, Mt the money supply, At autonomous spending, and u1t and u2t error terms. The v's and m's are unknown regression parameters. To choose between (10.33) and (10.34) Friedman and Meiselman used a goodness-of-fit measure, R². It is of interest to determine conditions under which this criterion, namely, the choice of the model with the higher R², is compatible with minimizing expected loss in a Bayesian decision-theoretic approach. (This problem has been analyzed in H. Thornber, "Applications of Decision Theory to Econometrics," unpublished doctoral dissertation, University of Chicago, 1966, and in M. S. Geisel, "Comparing and Choosing Among Parametric Statistical Models: A Bayesian Analysis with Macroeconomic Applications," unpublished doctoral dissertation, University of Chicago, 1970. See also G. E. P. Box and W. J. Hill, "Discrimination Among Mechanistic Models," Technometrics, 9, 57-71 (1967).) In addition, we provide general Bayesian procedures for comparing and choosing models.
Assume that there are just two possible models for explaining the variation of a dependent variable; that is, we assume that the observation vector y′ = (y1, y2, ..., yn) is generated by

(10.35) M1: y = X1β1 + u1

or by

(10.36) M2: y = X2β2 + u2,

where M1 denotes the first model and M2 the second, X1 and X2 are each n × k matrices of given quantities, each with rank k, β1 and β2 are each k × 1 coefficient vectors with no common elements, and u1 and u2 are each n × 1 vectors of error terms. Given that M1 is correct, the elements of u1 are assumed to be normally and independently distributed, each with mean zero and common variance σ1². Similarly, in the case that M2 is correct, the elements of u2 are assumed to be normally and independently distributed, each with mean zero and common variance σ2².

As regards prior pdf's for the parameters βi and σi, i = 1, 2, we employ the following natural conjugate forms for i = 1, 2:

(10.37) p(βi, σi) = p(βi|σi)p(σi),

with

(10.38) p(βi|σi) = |Ci|^{1/2} [(2π)^{k/2} σi^k]^{−1} exp[−(βi − β̄i)′Ci(βi − β̄i)/2σi²]

and

(10.39) p(σi) = (K/σi^{qi+1}) exp(−qi s̄i²/2σi²),

where the normalizing constant K = 2(qi s̄i²/2)^{qi/2}/Γ(qi/2). In (10.38) we have assumed a proper normal prior pdf for the elements of βi, given σi, with prior mean vector β̄i and covariance matrix σi²Ci^{−1}; Ci, a k × k matrix to be assigned by the investigator, is assumed to be a positive definite symmetric matrix. In (10.39) the prior pdf for σi is in the inverted gamma form with parameters qi and s̄i to be assigned values by the investigator. For (10.39) to be proper, we need 0 < qi, s̄i < ∞, i = 1, 2.

With the above assumptions about the models M1 and M2 and the prior pdf's for the parameters, we can apply (10.19) to obtain posterior odds, K12, pertaining to the two models. If we let p(M1)/p(M2) denote the prior odds, then

(10.40) K12 = [p(M1)/p(M2)] ∫∫ p(β1|σ1)p(σ1)p(y|β1, σ1, M1) dβ1 dσ1 / ∫∫ p(β2|σ2)p(σ2)p(y|β2, σ2, M2) dβ2 dσ2,

where, for i = 1, 2, p(y|βi, σi, Mi), viewed as a function of βi and σi, is the likelihood function, given that Mi is assumed to be true; that is, for i = 1, 2,

(10.41) p(y|βi, σi, Mi) = (2π)^{−n/2} σi^{−n} exp{−(1/2σi²)[νi si² + (βi − β̂i)′Xi′Xi(βi − β̂i)]},

where β̂i = (Xi′Xi)^{−1}Xi′y, νi = n − k, and νi si² = (y − Xiβ̂i)′(y − Xiβ̂i). (Note that (10.40) can be expressed as K12 = [p(M1)/p(M2)][p(y|M1)/p(y|M2)], where p(y|Mi) is the marginal pdf for the observations, i = 1, 2, given that Mi is assumed to be true.) To evaluate K12 in (10.40), we must perform the indicated integrations; for the numerator we have

(10.42) ∫∫ (|C1|^{1/2} K1/[(2π)^{(n+k)/2} σ1^{n+k+q1+1}]) exp{−(1/2σ1²)[q1 s̄1² + ν1 s1² + (β1 − β̄1)′C1(β1 − β̄1) + (β1 − β̂1)′X1′X1(β1 − β̂1)]} dβ1 dσ1.

To integrate with respect to the elements of β1, complete the square on β1 in the exponent as follows:

(β1 − β̄1)′C1(β1 − β̄1) + (β1 − β̂1)′X1′X1(β1 − β̂1)
   = β1′A1β1 − 2β1′(C1β̄1 + X1′X1β̂1) + β̄1′C1β̄1 + β̂1′X1′X1β̂1
   = (β1 − β̃1)′A1(β1 − β̃1) + (β̄1 − β̃1)′C1(β̄1 − β̃1) + (β̂1 − β̃1)′X1′X1(β̂1 − β̃1),

where

(10.43) A1 = C1 + X1′X1 and β̃1 = A1^{−1}(C1β̄1 + X1′X1β̂1).

On substituting in (10.42) and integrating with respect to the elements of β1, using properties of the multivariate normal pdf, we obtain

(10.44) (|C1|/|A1|)^{1/2} (K1/(2π)^{n/2}) ∫ σ1^{−(n+q1+1)} exp{−(1/2σ1²)[q1 s̄1² + ν1 s1² + Q1a + Q1b]} dσ1,

where we have introduced the following definitions:

(10.45) Q1a = (β̄1 − β̃1)′C1(β̄1 − β̃1) and Q1b = (β̂1 − β̃1)′X1′X1(β̂1 − β̃1).

Then on integrating (10.44) with respect to σ1, the result is

(|C1|/|A1|)^{1/2} (K1/(2π)^{n/2}) 2^{(n+q1)/2−1} Γ[(n + q1)/2] (q1 s̄1² + ν1 s1² + Q1a + Q1b)^{−(n+q1)/2}.

(Note, for a, b > 0, ∫_0^∞ σ^{−(a+1)} exp(−b/2σ²) dσ = ½(2/b)^{a/2} Γ(a/2), where Γ denotes the gamma function.) When similar operations are performed to evaluate the integral in the denominator of the expression for K12 in (10.40), the result for K12 is

(10.46) K12 = [p(M1)/p(M2)] [(|C1|/|A1|)^{1/2}/(|C2|/|A2|)^{1/2}] × {(q1 s̄1²)^{q1/2} Γ[(n + q1)/2]/Γ(q1/2)} / {(q2 s̄2²)^{q2/2} Γ[(n + q2)/2]/Γ(q2/2)} × (q1 s̄1² + ν1 s1² + Q1a + Q1b)^{−(n+q1)/2} / (q2 s̄2² + ν2 s2² + Q2a + Q2b)^{−(n+q2)/2},

where A2 = C2 + X2′X2, and Q2a and Q2b are defined similarly to Q1a and Q1b in (10.45); that is,

(10.47) Q2a = (β̄2 − β̃2)′C2(β̄2 − β̃2) and Q2b = (β̂2 − β̃2)′X2′X2(β̂2 − β̃2),

with β̂2 = (X2′X2)^{−1}X2′y, β̃2 = A2^{−1}(C2β̄2 + X2′X2β̂2), and β̄2 the prior mean vector for β2. The expression for K12 in (10.46) can also be expressed as

(10.48) K12 = [p(M1)/p(M2)] [(|C1|/|A1|)/(|C2|/|A2|)]^{1/2} (δ2/δ1)^{n/2} [w1 f_{q1,n}(w1) / w2 f_{q2,n}(w2)],

where, for i = 1, 2, δi = (νi si² + Qia + Qib)/n, wi = s̄i²/δi, and f_{qi,n}(wi) denotes the ordinate of the F pdf with qi and n degrees of freedom at the value wi. We now turn to the interpretation of the expression for K12 in (10.48).

310 ON COMPARING AND TESTING HYPOTHESES 1. The first factor


p(Mx)/p(M.) is the prior odds ratio. If, for example, we had no reason to
believe more in one model than the other, we would take p(Mx) = p(M.) =
� and the odds ratio p(Mx)/p(M.) -- 1. 2.:The second factor involves the
ratios IGI/IAI and Icl/IA.l. Note from (10.38) and (10.43) that [c,I/l&l is a
measure of the precision (or information) in the prior pdf for 15, given t h in
relation to the posterior precision (or information), given th for i-- 1, 2. The
posterior precision, given % is proportional to [&[ = IG + X[X,I and thus
depends on C and the "design" matrix X[X. The posterior odds ratio Kx.
will be larger (smaller), the larger (smaller) IGI/I&I is relative to I This is
reason- able, since, ceteris paribus, we should favor the model with more
prior information, as measured by [GI/[&[, provided, of course, that the
prior information is in accord with sample information (on this point see
below). 3. The third factor in (10.48), namely, ($d$.) and 82 which reflect
what the sample information has to say about the good- ness of fit of the
models and the extent to which the prior information about the coefficient
vectors is in accord with the sample information; for example, 8x = (v& ' +
Q, + Qxb)/n. Now vsx 2 = (y - X[)'(y - X) is just the residual sum of
squares, a goodness of fit measure, and -- - - involves the difference
between the prior mean vector x and x, which is a "matrix" weighted
average of x and {x, where [x is (Xx'Xx)-xXx'yx, the sample estimate. Thus
the closer the agreement between x and x, the smaller Qx; Qxb = (x -
x)'Xx'Xx(x - x) depends on the difference between the sample estimate [x
and the average of x and x, namely ix. Similar con- siderations relate to 8o..
Thus, for given n, the larger 8x is relative to 8., due perhaps to poor fit (high
vsx 2) and/or incompatibility of prior and sample information about lSx
(high Qx, and Qx), the lower will be the posterior odds, Kx2, in favor of 4.
The final factor in (10.48), (10.49) wxf.n(w0 where w = &'/8,, i = 1, 2,
reveals the dependence of Kxo. on the prior information regarding trx and
fro. [see (10.39) for the prior pdf's for these parameters.] '6 Since w depends
on g,o., a prior parameter connected with the location of the prior pdf for {',
and &', a sample estimate of ,to., Kx. will be affected by divergences
between these two quantities. Further assume that g' = .' and qx = q.; that is,
we assign the same prior pdf's for tr and .. Note that the inverted gamma pdf
for % i = 1, 2, in (10.39) has a modal value at &x/td(1 + qO. Also, the prior
mean of ,' is qd'/(q, - 2), for q > 2. COMPARING REGRESSION MODELS
311 Then, should 8x = 8o., the factor in (10.49) assumes a value of one,
which appears reasonable. Its value is also one under the more general
conditions q = q. and '/8 = Next, it is interesting to determine the behavior
of the posterior odds K., given in (10.48), when n grows large. As n grows
large, 8t --> &', since Q:/n and Qo/n both approach zero. '* Assume further
that [Icl/l&l .-' [Co.[/[A2[] --+ 1 as n grows large, qx = qo. = q, and ' = o.
�' = '. Under these conditions and noting that (1/q)f,,,(w) approaches the X
pdf with q degrees of freedom when n is large, '6 we obtain the following
expression for the posterior odds: {sx�. -(, +,)l. exp (-q?�'/2sx�') (10.50)
Kx. = !s?/ exp (-qS'/2s. ') if we take p(Mx) = p(Mo.). The value of Kxo. is
vitally dependent on the relative sizes of the sample quantities sx ' and s.'.
�'� If s ' = s. ', Kx. = 11 if sx 2 > s. ', Kx. < 11 and if sff < s. �', Kx. > 1.
These results are intuitively pleasing. They relate, however, to situations in
which n is large and the other assumptions introduced above are satisfied.
Last it is useful to determine what happens as we let our prior information
become diffuse; that is JGI -> 0 a� and q = qx = q. -+ 0 with the
assumption that p(MO = p(Mo.). Under these conditions , --> , and thus Q:o
--> 0 and Qa--> 0. ax Thus 8--> v&'/n as our prior information about lSx
and becomes diffuse. Further we assume that, as ]C[-->0, i = 1, 2,
[JCxJ/JAxJ + JC.J/tAo. J] � --> 1. Last we assume that gxo. = o.o.. Under
these assumptions the o., We assume that limn-.o X(Xdn is finite, i = 1, 2.
9.8 That is, the pdf ofqw, which is (1/q)f,n(w), approaches [2m(qw) rn-
q/r(m) exp (-qw/2) as n grows large. 9.0 If q is very small so that the ratio
of exponentials in (10.50) is close to one, then K �/s)-t. ao Since [C[
equals the product of the roots of C, [C,[ 0 can result from having all the
roots of C 0, which is what we assume. ax Q,a = ($, - ,)'C,(, - $,) = z(P(CtP,
z, = z,'D,z,, where $, - $t = Ptzt with P, an orthogonal matrix such that
P(CtPi = D,, a diagonal matrix with the roots of Ct on the diagonal. If these
roots 0, Qa 0. With respect to St - as the roots of C 0, we have , = (x,'x, +
c,)-(c,$, + x,,,) = [ + (')-c,]-(,x,)-(c,, + ,,). Now let Hi be an orthogonal
matrix such that H((X(Xt)- CHt is a diagonal matrix, say Gt, with diagonal
elements of less than one. Then [I + (') - (I - G + G,: .... ) H' = I as the roots
of Ct 0. This leaves us with t = + (')-xC,. Now write (')-xC, = HH'(X')-
XCHH,, = which approaches zero as the elements of the diagonal matrix Gt
0, as they will do if the roots of C 0; that is, J GJ = J(X()- G J = L x &,�t,
where the &t's are the roots of (')-x and the i's are the roots of C. More
simply, if all the roots of Ci 0, then C a null matrix and the results above
follow immediately.

312 ON COMPARING AND TESTING HYPOTHESES expression for Kx.


in (10.48) becomes [$12 - n/2, (10.51) K. = which is a function 3�' of the
ratio &'/so?. If &'/so. �' = 1, Kxo. = 1, if &'/so. o. > 1, Kx. < 1, and, if
sx'/so? ' < 1, Kxo. > 1. Thus, as pointed out in Section 10.1, if our loss
function is symmetric, acting to minimize expected loss in the choice
between Mx and Mo. involves choosing the model with the higher posterior
probability. Under the present assumptions this is compatible with the rule
of choosing the model with the smaller so. (or higher R�). Of course, this
rule is appropriate only in the case of a symmetric loss function and diffuse
prior information or when n is large and other conditions are satisfied, as
explained above. 10.5 COMPARING DISTRIBUTED LAG MODELS, In
Chapter 7 the estimation of distributed-lag consumption function models
was considered. In this section we take up the problem of computing
posterior odds pertaining to alternative formulations with nondiffuse prior
information about the parameters incorporated? a As in Chapter 7, we
consider the following equation: (10.52) G = hCt_x d- (1 - A)kYt d- ut-
Aut_x, t = 1, 2,..., T. In 00.52) the subscript t denotes the value of a variable
in the tth time period, Ct and Y are price-deflated seasonally adjusted
personal-consumption expenditures and disposable income, respectively, 2t
and k are parameters, and ut is a disturbance term. Our alternative models
for the observations, the G's, taking Co as given, are described here: Mx:
Equation 10.52, with ut- 2tut_x = qt, t = 1, 2,..., T, where the qt's are
normally and independently distributed, each with mean zero and variance
xo.. M2: Equation 10.52, with u = o.t, t = 1, 2,..., T, where the o.t's are
normally and independently distributed, each with mean zero and variance
0'22� Note that, as long as ; % 0, Mx and Mo. are mutually exclusive. As
prior pdf's for the parameters of (10.52) we employ the prior pdf's aa
Thornber, loc. cit., and Geisel, 1oc. cit., arrive at the same result with
slightly different assumptions. aa These results are taken from A. Zellner
and M. S. Geisel, "Analysis of Distributed Lag Models with Applications to
Consumption Function Estimation," invited paper presented to the
Econometric Society, Amsterdam, September 1968, and published in
Econometrica, 38, 865-888 (1971). COMPARING DISTRIBUTED LAG
MODELS 313 discussed in Section 7.5. For convenience they are
reproduced below, with, in each instance, 0 < , < 1, 0 < k < 1, and 0 < < oo.
First Prior Pdf's: 1 (10.53) p(A, k, ,,) oc --, i = 1, 2. Second Prior Pdf' s :
(10.54) p(,, k, ,,) oc M(1 - ')X4k89(1 - k)9 i 1, 2. , o' t Third Prior Pdf's: A(1
- A)*ks�(1 - k) o (10.55) p(2, k, (,) cc ., i -- 1, 2. In each instance we
assume that A, k, and ( are independently distributed and we take a diffuse
prior pdf for (. With respect to A and k in (10.53), these parameters are
assumed to be uniformly distributed in the interval zero to :one. In (10.54)
and (10.55) independent beta prior pdf's are employed for ; and k. The beta
prior for k has mean 0.9 and variance 0.00089 in both (10.54) and (10.55).
The prior pdf for A has mean 0.7 and variance 0.0041 in (10.54) and mean
0.2 and variance 0.0146 in (10.55). The posterior odds ratio is given by
(10.56) , p(Mo.) fj'j' (cid, .o.) where p(Mx)/p(M.) is the prior odds ratio, C'=
(Q, Co.,..., C.), and p(C[;, k, (h) is the likelihood function, i = 1, 2, given
Co. The likelihood functions are given by (10.57) and I ( k, oc exp - (yl T 1
[C - - (1 - x [C - ;C_z - (1 - ,X)kY]} (10.58) p2(CIA, k, oc lal- ,? exp - 1
2(?' [C - ,XC_x - (1 - ,X)kY]'G -x x [C- ,XC_x- (1 - ,X)kY]},

314 where C_ ' ON COMPARING AND TESTING HYPOTHESES = (Co,


Q,. �., Q,-0, Y' = (Yx, Y., � � ', Yv) and -1 + A ' -A 1 + A 2 e e e -A 1 +
a T x T matrix with nonzero elements just in the three bands shown. On
inserting (10.57) and (10.58) in (10.56) and performing the integrations
with respect to trx and try., we have (10.59) /q,. = p(36) f [ p(, k){[c - c_ - (1
- p(M.) x [C - 2tC_x - (1 - A)kY]} -v/' d2t dk f f p(, )iGi-{[c - c_ - (1 - )/
�Y]'G - x [c - Ac_ - (1 - A)�]}-' an where/(h, k) denotes the prior pdf for
A and k. Bivariate numerical integra- tion was performed to evaluate the
double integrals in (10.59) for each of the prior pdf's shown above, with the
following results: Prior Odds p(M)/p(m.) = 1 p(Mx)/p(M.) = 1 p(M)/p(m.) =
1 Prior Pdf for ; and k Posterior Odds (10.53) Kx. = 1.62 x 108 (10.54) Kx.
= 6.32 x 108 (10.55) Kx. = 1.49 x 108 It is seen that in each instance the
information in the data a4 moves us from a prior odds ratio of one to a
posterior odds ratio overwhelmingly in favor of M. Note also that although
the Kx. values are high, no matter which of the three prior pdf's is
employed, the Kx. values show substantial variation as we change our prior
assumptions about A and k. As a second example of the comparison of
alternative distributed lag schemes by computation of posterior odds, we
take up the analysis of Solow's distributed lag model as which has a flexible
two-parameter weighting scheme. Solow's model for the observations y' =
(yx, y.,..., yv) is (10.60) yt = a,xt-t + ue, t = 1,2,...,T, a These are U.S.
quarterly data for the period 19471-1960IV taken from Z. Griliches, G. S.
Maddala, R. Lucas, and N. Wallace, "Notes on Estimated Aggregate
Quarterly Consumption Functions," Econometrica, 30, 491-500, pp. 499-
500 (1965). as R. M. Solow, "On a Family of Lag Distributions,"
Econometrica, 28, 393-406 (1960). Analyses of Solow's distributed lag
model from the Bayesian point of view have been reported in V. K. Chetty,
"Discrimination, Estimation, and Aggregation of Distributed Lag Models,"
manuscript, Columbia University, 1968, and in M. S. Geisel, loc. cit.
COMPARING DISTRIBUTED LAG MODELS 315 where xt-t is the value
of an exogenous variable in period t - i and the weights, the a[s, are given
by (10.61) oq=k( r+i- 1)(1-,),V, 0<,X< 1, r>0 aud i=0,1 2, where k, r, and h
are unknown parameters. It is seen that at is k times a factor that is precisely
in the form of probabilities associated with the Pascal dis- tribution (or with
r not an integer, the negative binomial distribution). Since Solow considers
positive integral values of r adequate for his purposes, we shall do so also.
He notes that the mean and variance of the Pascal distribu- tion are r2q(1 -
20 and r2q(1 - h) , respectivdy. us both the mean and variance increase with
r and h separately. Also, the mode is always less than the mean, and in this
sense the distribution is skewed to the right. As Solow points out, the larger
the h and the smaller the r, the greater the skewness. Note that if r = 1 we
have the special case of geometrically declining 's. On introducing a lag
operator L such that Lxt = xt- and Lxt = xt_ and noting that we can rewrite
(10.60) as (10.62) (1 - )yt = k(1 - A)x + (1 - AL)ue. In analyzing this
equation, we s11 follow Solow's first case in which he assumes that (1 -
AL)u = , t = 1, 2,..., T, and that the e's have zero means and common
variance ou.a Further, we assume that the els are normly and independently
distributed. The problem we pose is the following one. Given our
observations and prior assumptions about the parameters, what are the
posterior probabilities associated with these hypotheses: Hx :r = 1, H2: r =
2, Ha :r = 3, and H:r = 4 ? These four hypotheses are mutually exclusive
and are assumed to be the entire set of possibilities. The direct solution to
this problem is simply to compute the posterior pdf for r, a discrete pdf that
contains the information we require. Our four models are (10.63) Mx:r = 1,
ye = hye-x + k(1 - h)xe + (10.64) M:r = 2, y, = 2y,_ - y,_ + (1 - )x, + ,,
00.65) M,:r = 3, y, = 3y_ - 3y_ + *y,_, + ( - )*x, + , a Solow rationalizes this
assumption as follows: it is possible that an investigator will formulate the
model (1 - AL)ryt = (1 - A)rxe + ee as his basic model without reference to
(10.60) and the subsequent steps leading to (10.62). He suggests an analysis
of residuals to determine whether the q's are nonautocorrelated.

316 ON COMPARING AND TESTING HYPOTHESES and (10.66) M4:r


= 4, Ye = 4Ayt-x - 6A'Yt-u + 4A3Yt-3 - Mye_4 + k(1 - A)xe + qt. For given
r, say r = i, our prior pdf for the parameters is denoted by �(2, k, e]r = i),
where e is the standard deviation of ere, i = 1, 2, 3, 4. Then the posterior
odds ratio relating to, say, models Mx and M2 is given by Pr{r= f f .f = 1)l
dA dk dc (10.67) Kx2 = Pr(r = 2) f f f p(A, k, c21r = 2)l a d where Pr(r =
1}/Pr(r = 2} is the prior odds ratio and l denotes the likdihood function for
the parameters of M, i = 1, 2. From the posterior odds so computed it is
straightforward to obtain posterior probabilities? Geisel has applied a this
approach to the .analysis of U.S. quarterly dam in which Yt equals price-
deflated seasonally adjusted personal-consumption expenditures and xe
equals price-deflated seasonally adjusted personal- disposable income. His
prior pdf's for the parameters were taken as follows for i = 1, 2, 3 and 4: 1}
0<-,<, (10.68) p(A, A, .,Ir = i) m 0 < A, k < 1. Also, he assumed
Pr{r=i}/Pr{r=j}= 1 for i, j= 1,2,3,4. On sub- stituting from (10.68) in
(10.67), the integration with respect to the ,,'s can be done analytically. To
integrate with respect to A and k, bivariate numerical integration techniques
were employed. The results of his analysis, based on quarterly data, 1948 to
1967, are shown below: Posterior Value of r Probability 1 0.762 2 0.177 3
0.043 4 0.018 It is seen that the posterior probability associated with r = 1 is
quite a bit larger than those associated with other values of r. However, the
probability that r = 2, 0.177, is not negligible and thus the analysis points to
a possible departure from the model with geometrically declining weights (r
= 1)? a? If the posterior probabilities are denoted -, i = 1, 2, 3, 4, we have 1
-- 'x + 'a + 'a + w4 = wx(1 + wa/wx + wa/wx + w4/wx). Thus wx = 1/(1
+'Ko.x + Kax + K41), and so forth, where Ks = oo M. S. Geisel, loc. cit. He
assumed given initial values in formulating his likelihood functions. oo See
Geisel's and Chetty's work for results bearing on this issue. QUESTIONS
AND PROBLEMS 317 In concluding this chapter, it is pertinent to
emphasize that Bayesian methods for comparing and testing hypotheses
constitute a unified set of principles that is operational and applicable to a
broad range of problems. These methods permit the incorporation of prior
information when com- paring and testing hypotheses and, more important,
yield posterior proba- bilities associated with hypotheses, which are useful
in a wide range of circumstances. Last, when a choice has to be made, the
Bayesian-decision theoretic approach to testing is one that involves acting
to maximize expected utility in accord with some basic results in the
economic theory of choice. QUESTIONS AND PROBLEMS 1. Suppose
that we have n = 10 observations drawn independently from a normal
population with unknown mean t and standard deviation a = 1. If the sample
mean is 1.52, compute the posterior odds relating to the hypotheses Hx:t =
2.0 and Ha:t = 1.0 with each of the prior probabilities on these hypotheses
taken to be . 2. In connection with the hypotheses in Problem 1, assume the
following loss structure: State of World Action Hx true Ha true Accept Hx
0 4 Accept Ha 2 0 and compute and compare expected losses associated
with the actions Accept Hx and Accept Ha, by using posterior probabilities
obtained from results in Problem 1. 3. If we have n = 15 observations drawn
independently from a normal popula- tion with t = 1.0 and unknown
variance o. and s = __(y - )�./n = 2.2, obtain posterior odds relating to the
hypotheses Hx' rr a = 1.0 and Ha: rr ' = 2.5 by using equal prior
probabilities for the hypotheses. 4. If, in Problem 3, we considered a third
hypothesis, Ha: rr ' = 1.9, and assigned equal prior probabilities to Hx, Ha,
and Ha, would the posterior odds associated with Hx and Ha be changed
from what was obtained in Problem 3 ? Compare the posterior probabilities
relating to ,Hx, Ha, and Ha with those obtained in Problem 3. What is the
effect of increasing the number of mutually exclusive hypotheses
considered on the value of a posterior probability for one of them, given a
particular sample of data and equal .prior probabilities ? 5. Show how the
posterior odds for hypotheses Hx and Ha of Problem 1 can be computed
when the population standard deviation rr has an unknown value and we
use the following prior pdf: p(a I Vo, So a) = ka-% + x exp (-voso�'/2aa),

318 ON COMPARING AND TESTING HYPOTHESES 0 < a < o% where


vo and so: are positive parameters whose values are assigned by an
investigator and k = 2(VoS'/2)ota/F(Vo/2). 6. Discuss Lindley's paradox,
presented in Section 10.3, under the assumption that the sampling-theory
significance level for the test is increased as n grows in size. Why would it
be reasonable to change the significance level as n increases ? 7. Suppose
that we have n independent observations y' = (y, yo.,..., Yn) from a normal
population with unknown mean t and standard deviation a = 1. Consider the
hypotheses t = 0 and t 0. If, under t 0, we were to use the diffuse prior
pdfp(/x) oc constant, - oo < t < oo,'what would be the value of the posterior
odds associated with the two hypotheses ? In what sense is the result
obtained in agreement with the fact that the maximum likelihood estimate
for/x will have zero probability of assuming the value zero ? 8. Use
Lindley's approach to testing for situations in which prior information is
vague or diffuse to construct a test of the hypothesis 1'15 = c, where I is a k
x 1 vector of given constants, c is a given scalar constant, and I$ is a k x 1
vector of regression parameters appearing in a standard, linear, normal
multiple-regression equation, y = XI$ + u. 9. In connection with (10.40), it
was noted that the posterior odds pertaining to the regression models M and
Mo., shown in (10.35) to (10.36) can be expressed as p(Mx) P(Yl Mx) Ko.
= �( Mo.) p(y[Mo.) Under the assumption that Mx and Ma are the only
models under con- sideration, obtain posterior probabilities for Mx and M:
from the expression for Kxo.. 10. Explain how the posterior probabilities
for Mx and M: in Problem 9 can be employed to obtain a weighted average
of the mean of a future observation assumed to be generated from Mx in
(10.35) with probability equal to the posterior probability for Mx or from
Mo. with probability equal to the posterior probability for M:. 11. Express
Kx: in (10.51) in terms of Rxo. and Ro. a, the usual squared multiple-
correlation coefficients and note how Kx: depends on these quantities,
noting also that Kxo. is a relative measure of belief. Then, under the
assumption that the set of models under consideration includes just Mx and
Mo., obtain posterior probabilities for Mx and for Mo. in terms of Rxo. and
Ro.o. and examine their dependence on these latter quantities. 12. Explain
how informative prior pdf's for r, i = 1, 2, in the inverted gamma form can
be incorporated in the analysis of the posterior odds ratio shown in (10.56).
CHAPTER XI Analysis of Some Control Problems In control problems we
usually have the following elements' (a) a criterion function, (b) a model
containing some variables appearing in the criterion function, and (c) a
subset of the model's variables which can be controlled; for example, in the
economic theory of the firm the criterion function is usually taken to be the
profit function. This function depends on output and the prices of outputs
and inputs, which are usually incorporated in a model of the markets for
output and inputs. The variables assumed to be under control of the firm are
the levels of the inputs, say labor and capital services. Generally, we wish to
determine values of the control variables, consistent with possible
restrictions imposed by the model, to maximize (or minimize) the criterion
function. In the firm example values of the input variables, consistent with
the non-negativity requirements of the model and tech- nological
constraints, are sought to maximize profits. This problem can usually
be.solved without much difficulty for nonstochastic models. When,
however, the model is stochastic, and has unknown parameters which have
to be estimated, the problem becomes more difficult. In what follows we
analyze several problems of this kind. It will be seen that the Bayesian
approach is convenient for obtaining solutions, since it treats stochastic
elements and uncertainty about parameter values in a systematic and unified
manner. When control problems involve optimization over several time
periods with stochastic elements and uncertainty about parameter values
present, another basic complication appears. The setting of control variables
in one period importantly affects the information regarding parameter
values we have for other time periods. Thus in multiperiod control
problems involving random variables and uncertainty about parameter
values a full solution to the optimization problem will have to take account
of the flow of information about parameter values as we proceed through
time. Bayesian solutions to problems of this kind, called "adaptive control"
problems, provide a sequence of optimizing actions, settings of the control
variables through time, See, for example, M. Aoki, Optimization of
Stochastic Systems. New York: Academic, 1967, for a more extensive
coverage of problems and methods. 319

320 ANALYSIS OF SOME CONTROL PROBLEMS which not only take


account of what we learn from new data but also achieve the combined and
interdependent goals of control and learning about param- eters' values
most effectively. Thus a Bayesian solution to an adaptive control problem is
a simultaneous solution to a combined control and sequential design of
experiments problem. The plan of this chapter is as follows. In Section 11.1
we analyze several one-period control problems. Emphasis is placed on
how uncertainty about parameter values and the cost of changing control
variables affect solutions. In the next two sections we generalize some of
the one-period results to apply to the problems of controlling the outputs of
multiple and multivariate regression models. Sensitivity of results to the
form of the criterion function is explored in Section 11.4. In Section 11.5
we take up the analysis of a two- period problem, and in Section 11.6 two
multiperiod problems are considered. 11.1 SOME SIMPLE ONE PERIOD
CONTROL PROBLEMS Our first problem involves control of a simple
regression model; that is, we assume that the model generating our
observations is the simple regression model considered in Chapter 3 ':
(11.1) Yt = xt + ut, t = 1, 2,..., T. In (11.1) fi is an unknown parameter, Yt
and xt are the tth observations on the dependent and independent variables,
respectively, and ut is the tth random unobservable disturbance term. We
assume that the values of the independent variable can be controlled and
that the ut's are normally and independently distributed, each with zero
mean and unknown variance o a. In the first future period t -- T + 1 we
have, with z - Yv + x and w -- xv + x, (11.2) z = jSw + uv+x, where uv+x is
normally distributed with mean zero and variance tr ' and independent of all
previous disturbance terms. It is assumed that we have not yet observed z =-
- yv + x nor determined a value for w -- xv + x. To form our criterion
function we assume that we wish z to be close to a given target value,
denoted by a. The loss associated with being off target is assumed to be
given by the following quadratic loss function3: (11.3) L(z, a) = (z - a) '. '
To keep the analysis as simple as possible, we initially suppress the
intercept term. The analysis of the problem of controlling a simple
regression process, presented below, draws on the work presented in A.'
Zellner and M. S. Geisel, "Sensitivity of Control to Uncertainty and Form
of the Criterion Function," in D. G. Watts (Ed.), The Future of Statistics.
New York: Academic, 1968, pp. 269-283. o A quadratic utility function U -
- Co + 2cxz - c.z ' with cx and ca positive constants can be brought into the
following form by completing the square on z: U -- a0 - SOME SIMPLE
ONE PERIOD CONTROL PROBLEMS 321 Since z is random, L(z, a) is
random, and it is impossible to minimize a random function. Rather, we
pose our control problem to be minimization of the mathematical
expectation of L(z, a) with respect to the choice of w; that is (11.4) min
EL(z, a) -- min E(z - a) ' o is our problem in which z depends on the control
variable w, as shown in (11.2). The mathematical expectation of the loss
function is given by EL(z, a) = f-o L(z, a) p(zly, w) dz (11.5) = (z - a)'p(zly,
w)dz, where p(z[y, w) is the predictive pdf for z -= Yv + , given the sample
informa- tion y and the value of the control variable w -- xv + . From the
results in Chapter 3 p(z[y, w) is known to be in the following Student-t form
when we use a diffuse prior pdf 4 for fi and (11.6) p(zly, w) = r[(v + 1)/2]
g� [ g ] , 72Tr(v-55 where v = T- 1, /� = [__ xtYdS[_- xf', g = [f'(1 +
w'/[__ xf-)] -x, and vf' = [__ x (Yt - flxt) '. The mean of this pdf is w/ which
clearly depends on w. Also, since g depends on w, the "spread" of the pdf,
as well as its mean, depends on the value of the control variable w. Since
E(z - a) ' = E[z - Ez - (a - Ez)] ' = Var z + (a - Ez) ', we have s forv> 2 vsa (
w ' ) (11.7) E(z - a) a = v--_ 2 1 + 212 x a + (a - wt) a. Since (11.7) is
quadratic in w, it is easy to establish that the value of w, which minimizes
expected loss, denoted by w*, is 6 (11.8) w* ca(z - a) ', where ao = Co +
cx/c. and a = cx/c2. Since U is a decreasing function of (z- af', minimizing
the expectation of (z- a) e, our loss function in (11.3), will maximize
expected utility. That is, p(fl, ,) oc 1/ with -oo < fl < oo and 0 < < oo. The
above analysis can also be performed easily with a natural conjugate prior
pdf for fl and . From (11.6) t = x/j (z- w/) has mean zero and variance /( - 2).
Thus z has variance v/g(v - 2). This is essentially the result obtained by W.
D. Fisher, "Estimation in the Linear Decision Model," Intern. Econ. Rev., 3,
1-29 (1962), except that we have considered to be an unknown parameter.

322 ANALYSIS OF SOME CONTROL PROBLEMS where to �' =


�'mxx/' with mx = [= xe �' and 5 �' = vs�'/(v - 2). It is seen that w* is
equal to the product of two factors. The first, a/, is the target value divided
by the mean of the posterior pdf for/, namely, . The second factor is a
function of t; �' = �'rn,,/�'. Since g�'/rnx is the posterior variance for/,
to �. is the square of the coefficient of variation associated with the
posterior pdf for/. As the precision of estimation, measured by increases,
too. increases and the second factor in (11.8) approaches one; that is, if we
have a very precise estimate for/, (11.8) is approximately a/. To appreciate
how well a/ approximates w*, the following table is helpful: to: 1.0 2.0 3.0
4.0 5.0 (1 + 1/too.)-x: 0.50 0.80 0.90 0.94 0.96. It is seen that, even for to =
3.0, w* is 090 times a/fl. Thus, when the pre- cision of estimation is not too
high, use of the approximation a/ results in suboptimal settings for w and, of
course, higher expected loss (see below). At this point it is pertinent to point
out that if we went ahead conditional on / = , as appears to be the procedure
suggested by the "certainty equivalence" approach, 8 our 'loss function
would be approximated by (w - a)o. and the value of w, which minimizes
this expression, denoted by Wee, would be a (11.9) wee The certainty
equivalence solution is seen to be just the first factor of (11.8). The second
factor of (11.8), which reflects the precision with which/ has been
estimated, does not appear in (11.9). To demonstrate how use of the
suboptimal certainty equivalence value for w leads to higher expected loss
than the use of w* in (11.8), we calculate E(z - a)o. for w = w* and for w =
we from (11.7). The results are ao. (11.1o) �(�1w = w*) = + 1 + too. and
(11.11) ao. E(Llw = w,,)= go. + . too. 7 Note, too, that to xt2- = lm,,,,/s is
very nearly the same as the usual sampling theory statistic, m lm:,,/s, which
is used to test the hypothesis/t = 0. 8 The "certainty equivalence" approach
is described in H. A. Simon, "Dynamic Pro- gramming under Uncertainty
with a Quadratic Criterion Function," Econornetrica, 24, 74-81 (1956); H.
Theil, "A Note on Certainty Equivalence in Dynamic Planning,"
Econometrica, 25, 346-349 (1957); and C. Holt, J. F. Muth, F. Modigliani,
and H. A. Simon, Planning Production, Inventories and Work Force.
Englewood Cliffs, N.J.: Prentice-Hall, 1960. SOME SIMPLE ONE
PERIOD CONTROL PROBLEMS 323 The absolute increase in expected
loss associated with using w = }v rather than w = w*, the optimal value, is
(11.12) E(Llw = w**) - E(LIw = w*) = to.(1 + too.) It is seen that this
increase in expected loss depends not only on the pre- cision of estimation,
as measured by too., but also on the squared value of the target a '. Only
when a = 0 is there no increase in expected loss associated with use of w =
we,. � For target values far from the origin the contribution of a ' to the
expression in (11.12) can be substantial. To provide further information on
this last point Table 11.1 lists the relative expected losses REL given by the
following expression: (11.13) REL = E(LIw = w*) = [1 + (a/?)o./(1 + too.)]
E(L'Iw '-- w,e) [1 + (a/)2/to o.] It is seen from the figures in Table 11.1 that
the use of the optimal setting for w, namely, w* given in (11.8), results in a
reduction in expected loss vis d vis use of the approximate certainty
equivalence solution. The reduction in expected loss is greater, the smaller
to �', a measure of the precision with which / has been estimated, and the
greater (a/)o.. To illustrate application of this analysis let us take ye = Ye -
Ye-., the annual change in aggregate U.S. income and xe = Me - Me-., the
annual change in the U.S. money supply. Under the assumptions made in
connection with (11.1), our problem is to use data x� for 1921 to 1929,
along with a Table 11.1 TABULATION OF RELATIVE EXPECTED LOSS
(11.13) AS A FUNCTION OF (a/) AND to 2 Values of Values of to a (a/g)a
1.0 2.0 3.0 4.0 5.0 6.0 ... oo 0 1.00 1.00 1.00 1.00 1.00 1.00 2 0.75 0.83 0.90
0.94 0.95 0.96 4 0.60 0.78 0.86 0.91 0.93 0.94 6 0.57 0.75 0.83 0.90 0.92
0.93 8 0.56 0.73 0.82 0.88 0.1 0.92 10 0.54 0.72 0.81 0.87 0.90 0.91 0.50
0.67 0.75 0.80 0.85 0.86 1.00 1.00 1.00 1.00 1.00 1.00 : 1.00 This is
because we have assumed that the regression goes through the origin [see
(11.1)1. 0 The data are taken from M. Friedman and D. Meiselman, "The
Relative Stability of Monetary Velocity and the Investment Multiplier in the
United States, 1897-1958/'

324 ANALYSIS OF SOME CONTROL PROBLEMS diffuse prior pdf for


0 and , to find the change in the money supply for 1930, w = x,+x, to have z
= yz+x, income change for 1930, close to a target value of 10 billion
dollars. The symmetry of the quadratic loss function (z - 10) ' implies that
we consider overshoots involving inflation as serious as under- shoots
involving deflation. Based on the data for 1921 to 1929, we have (11.14) .t
= 2.0676xe, (o.883) where the figure in parentheses is the conventional
standard error, = 2.0676 and s ' = '= (ye - xt)'/v = 0.2681 x 108 with v = T- 1
= 8. Given these sample quantities, we can readily compute the optimal
setting for the 1930 change in the money supply and the associated
expected loss. For comparative purposes we also compute the certainty
equivalence approximate solution and its associated los. The results are w*
= 2.893; E(LI w = w*) = 55.29, Wee = 4.837; E(Llw = we0 = 60.03. In this
example w* and wee are quite different and use of w* results in about an
8% decrease in expected loss. Next, it is of interest to elaborate slightly the
simple quadratic loss function employed above to take account of possible
costs of changing the control variable. This will be done in the following
manner: (11.15) L = (z - where c is a known non-negative constant, z m
y,+X, W = X,+X, a is the target value, and x,, the setting for the control
variable in period T. Since w = x,+x, w - xv is the change in the control
variable in period T + 1. We assume that the cost of change is proportional
to (w - x,?'. Using the pdf for z in (11.6), we obtain the expected loss: EL =
Var z + (a - Ez?' + c(w - xv) ' (11.16) = '(1 + mWx2x)+ 9.( _
w)9'+c(w_xr)9.. Clearly the first term on the rhs of (11.16) will be
minimized if w = 0, the second, if w = a/t, and the third, if w = x,. On
determining the value of w, which minimizes EL in (11.16), we find (11.17)
w**= ,2fi'(/)+ q- cxv., in the Commission on Money and Credit Research
Study, Stabilization Policies, Engle- wood Cliffs, N.J.: Prentice-Hall, 1963.
Their money variable is currency in the hands of the public plus all
commercial bank deposits, and their income variable is consumption
expenditures plus their autonomous expenditures A. SOME SIMPLE ONE
PERIOD CONTROL PROBLEMS 325 a weighted average of the values 0,
a/t, and xv. Of course, if c = 0, (11.17) yields the result in (11.8) which can
be interpreted as a weighted average of 0 and a/l, namely, w* = l'(a/t)/( ' +
,'/mx:,). From (11.17), as c grows large, w**--- x,; that is, as change
becomes more costly, w** assumes a value closer to its initial value x,,
which results in a smaller change and less of a contribution to expected
loss. In general, for finite positive c the solution w** in (11.17) is always
between the solution for c = 0, w* shown in (11.8), and x; that is, for x <
w*, x < w** < w* and for x > w*, w* < w** < xv are inequalities satisfied
by the solutions if c has a positive value. Thus, with the cost of change
introduced, there is a tendency toward conservatism introduced in the sense
that the optimal change in the control variable, w** - xv, will be smaller in
absolute size with c > 0 than with c = 0, that is, with no cost of changing the
control variable. As another example of a one period control problem, let us
consider the problem of a profit-maximizing monopolist under the
assumption that he is uncertain about the parameters of his total cost
function and of his demand function. With r = profits, p = price, the control
variable, q = quantity produced, and C total costs, we have (11.18) r = pq -
C, where all variables' values are for the first future period, T q- 1. Further
assume that the monopolist's demand function and total cost function are
given by (11.19) qe = Oo + OlPe + ue, and t= 1,2,...,T, T+ 1 (11.20) Ct ----
ao q- a.qt + a9.qt 9' q- vt, t = 1, 2,..., T, T + 1, where the a's and O's are
unknown parameters, assumed random, and the u/s and vt's are normal,
independent, random-error terms with zero means and constant variances,
%' and %', respectively. Given past data on qt, �t, and _Pt and a prior pdf
for the parameters, it is possible to obtain posterior pdf's for the parameters
and predictive pdf's for C = Ce+ x and q = q,+x. The latter can be employed
to obtain the mathe- matical expectation of w,+ x = r, which is random
because of its dependence on q and C, both random. Then the optimal value
of p = p,+ x, price for period T + 1, can be deter'mined. An alternative
simpler and equivalent procedure for the present problem is to substitute
from (11.19) and (11.20) in (11.18) to obtain 01.21) = = p(0o + + u) - - + +
u) - + + u? - v,

326 ANALYSIS OF SOME CONTROL PROBLEMS where the 's, f's, u,


and v are random. The expectation of r is given by x (11.22) x + + + + +
wher the 's and a's are posterior means, o , , and o are the posterior
variances and covariance, respectively, of o and , and is the posterior mean
of . In taking the expectation of in (11.22), it is important to note that the
parameters of the demand function, the 's, are a posteriori 'dis- tributed
independently ot the parameters ot the cost function, the 's, since we have
assumed that the u[s and vt's in (11.19) and (11.20) are independently
distributed and further that our prior pdffs incorporat the assumption that
these two sets of parameters are independent. On differentiating (11.22)
with respect to the control variable, p, we obtain (11.23) dE' = go + 2fip - afi
- 2a.tfi(fio + fip) + P' + o]. ap The value ofp which sets this derivative equal
to zero, that is, the price that maximizes expected profits, is 1 [, - "+ 2a,o,(1
+ o0] (11.24) P* = 1 - a(1 + 40 ' where 4 = aa/ a and 4o = odo. The quantity
in (11.24) must be positive to be an acceptable solution. a Note that if we
had substituted tM mean values for the a's, O's, u, and v in (11.21) and
maximized with respect to p, the result would be (11.25) exactly what is
obtained from (11.24) by setting 4o and 4 both equal to zero. Alternatively,
4 will approach zero as information about grows, whereas aox could be zero
if o and are independent parameters. Under these special conditions in
(11.25) will approximate the optimal p* in (11.24). ZZNote that E(Oo +
OxP + u) = Eo- o + (Ox - x)P + o +zP + u] = (o - o) + v(O - ) + 2v(Oo - 0( -
) + 0o + v) + . z For a maximum, dE/dp and a are negative. If this is so, this
second derivative will be negative, given that z/a > ax + z. Since lal is often
quite small, this condition is reasonable. The dependence of the second-
order condition on the size of , a measure of uncertainty about Ox, is
surprising. za o will usually be large and positive, whereas x is negative and
often not too large in absolute value. Thus -o] will be large and positive.
Note, too, that a will be small and negative in most cases. SINGLE-
PERIOD CONTROL OF MULTIPLE REGRESSION PROCESSES 327
From (11.24) we see that as uncertainty about f, measured by (rx, decreases,
p* will fall, given/x, a. < 0. Thus, as the monopolist learns about the value
of fi in the sense that he has a posterior pdf for this parameter with a pro-
gressively declining variance, his profit-maximizing price falls. Also
(11.24) indicates an interesting dependence of p* on the posterior
covariance of fo and f. Given that a.fio < 0, the greater the value of (r0, the
lower is p*. As the above example indicates, allowing for imperfect
knowledge and stochastic elements modifies and enriches the results of
traditional economic theory. 4 11.2 SINGLE-PERIOD CONTROL OF
MULTIPLE REGRESSION PROCESSES Assume that our observation
vector y' -- (yx, y.,..., y,) is generated by a multiple-regression process
(11.26) y = XI + u, where X = (xx, x.,..., x0 is a T x k matrix with rank k of
observations of the past settings of k control variables, xs 15 is a k x 1
vector of unknown regression parameters, and u is a T x 1 vector of normal
and independent disturbance terms, each with mean zero and unknown
variance (F'. As prior pdf for the unknown parameters, we shall employ the
diffuse pdD 6 introduced in Chapter 3: 1 (11.27) p(15,(0oc-, 0<(r<oo, and -
-00 < f < 00, i= 1,2,...,k. Let z be a future value of the dependent variable,
say z =- y.+x, the first future value assumed to satisfy (11.28) z = W'15 +
U,+x, where u,+ is a normal disturbance term, distributed independently of
u, with zero mean and variance (F', and w', a 1 x k vector, denotes the 4 See
also R. M. Cyert and M. H. De Groot, "Bayesian Analysis and Duopoly
Theory," manuscript, Carnegie-Mellon University, April 1968; M. S.
Feldstein, "Production with Uncertain Technology: Some Econometric
Implications," manuscript, Harvard Univer- sity, 1969; and A. Zellner, J.
Kmenta, and J. Drze, "Specification and Estimation of Cobb-Douglas
Production Function Models," Econometrica, 34, 784-795 (1966). Here we
assume that all k independent variables can be controlled; see below for a
relaxation of this condition. 6 A natural conjugate prior pdf could easily De
employed instead of a diffuse prior in working the problems of this section.

328 ANALYSIS OF SOME CONTROL PROBLEMS settings of the


control variables in the (T q- 1)st period; that is, w' -- (x, + ...x. + 0. Now
suppose that we wish to choose the control vector w to keep z -- yv + close
to a target value, denoted by a, and that our loss function is (11.29) L(z, a) =
(z - a) a, the same squared error loss function employed in Section 11.1.
From results in Chapter 3 the predictive pdf for z -- yv + x, under the above
assumptions, is (11.30) p(zly, w) cc Iv + (z - w'lyH] -(v+ where [ = (X'X)-
XX'y, H = (1/s2)[1 + w'(X'X)-Xw] -x, v = T- k, and vs ' = (y - XI)'(y - X[).
The mean and variance of z are given by x7 (11.31) Ez = w' and (11.32)
�(z - �zy- = (v _ 2) = �.[1 + w'(X'X)-Xw], where 5' = vs'/(v - 2). Now
we shall take the expectation of the loss function in (11.29) and deter- mine
the value of w, the control vector, which minimizes expected loss. We have
E(z - a) ' = E[(z - Ez) - (a - Ez)] ' (11.33) = E(z- Ez) ' + (a- Ez) ' = '(1 +
w'(X'X)-Xw) + (a- w') '. Differentiating E(z - a) ' with respect to the
elements of w yields OE(z - a) ' = 2S.(X,X)_w _ 2a[ + 2Ol'w. aw The value
of w, say w*, which sets this derivative equal to zero satisfies (11.34)
['(X'X) - + ']w* = a - For the variance to exist we need v = T - k > 2.
SINGLE-PERIOD CONTROL OF MULTIPLE REGRESSION
PROCESSES 329 and thus x8 (11.35) w* = X'Xa + is the setting for w
which minimizes 8 expected loss. Multiplying both sides of (11.35) by ' and
rearranging terms, we have (11.36) a= w*'(1 + W[ ' , * in (11.36) can be
interpreted as the implied estimate for Substituting from (11.35) in (11.33),
we find (11.37) E(Llw for the expected loss associated with the optimal
setting for w, namely, w = w*. The first term in the expression for expected
loss, J', does not disappear as the sample size grows, whereas the second
term, 'a'/( ' + 'X'X) approaches zero as the sample size grows. 'x From
(11.36) we see also that l*, the implied optimal estimate, approaches l as the
sample size grows. Since in many problems not all independent variables
are under control, we extend the above analysis to the case in which X =
(XiX.), where X denotes observations on control variables and X. denotes
observations on variables not under control. Our regression model for the
sample observations is (11.38) y= Xl+u = + + u, 8 Note that [Sa(X'X) -x +
']- = (1/g')[X'X - X'X'X'X/(g + 'X'X)I and can be verified as follows with
Q= 'X'X: [g'(X'X)-X+ 'l(1/g')[X'X - X'X'X'X/(g + Q)I = I- 'X'X/(g + Q) +
'X'X/g - 'X'X'X'X/(g + Q) = I - 'X'X[l/( + Q) - 1/g + Q/ga(ga + Q)] =/. The
formula for the inverse of a matrix in the form encountered in (11.34),
given in C. R. Rao, Linear Statistical Inference and Its Applications. New
York: Wiley, 1965, p. 29, Example 2.8, is slightly in error. It should read:
(A + UV') -x = A - (A-xU)(V'A-x) - 1 + V'A-xU" where A is a nonsingular
matrix and U and V are two column vectors. Note that OE(z - a)'/Ow ' is a
positive definite matrix under our assumptions. 0 See W. D. Fisher, Ioc. cit.,
for some related results. Note that 'X'X = ,', grows in size as T grows.

330 ANALYSIS OF SOME CONTROL PROBLEMS and for the future


observation, z -- Yv + x, z=w'15 + v (11.39) = wx'15x + w.'15. + v, where
15' = (15'i15.') and all other variables are defined above with w' = (%' i w.')
and w2 is given. We make the same stochastic and prior assumptions made
in connection with (11.26) and (11.27). Let our loss function be (11.40) L.=
(z - a) where xxv denotes the setting for the control variables (the variables
in Xx) in period T and G is a positive definite symmetric matrix. The loss
function in (11.40) incorporates losses associated with being off the target
value a and costs of changing the control variables? Given that (11.30) is
the predictive pdf for z, we readily obtain the following expression for
expected loss: EL = S[1 + w'(X'X)-Xw] + (a - w'O) + (% - xxv)'G(wx -
xxv) [ lMX MXa(wx)] +(a-wx'x-w'a, (11.41) = 5 1 + (wx'w')[MX Ma] w +
- - where ' = ('') has been partitioned to correspond to the partitioning of X =
(XxXe) and similarly for {M Mx2 (X'X) -= \MO.X M.O.l' On
differentiating EL with respect to the elements of wx and solving for the
minimizing value wx = %*, we find wo. 15o.) �'Mo.wo. + Gxv] wx* =
[?'M xx + G + xx'l-X[lx(a- ,A _ (11.42) = (p-z_ P-'P[x(a - w') - Mw + Gxvl
1 + 11 x [{; i/lm = v_%(a- - -- ,,: I+Qx - +Qx where P = ?:M xx + G and Qx
= x'P-Xx: The first term on the rhs of the third line of (11.42) is similar in
form to that encountered in (11.35) except that P - replaces X'X/? , a - w:'O:
replaces a, and Qx replaces 'X'X/? . 29. Cost of changing the control
variables has usually been incorporated in loss functions. See, for example,
W. D. Fisher, loc. cit., and S. J. Press, "On Control of Bayesian Regression
Models," manuscript (undated). CONTROL OF MULTIVARIATE
NORMAL REGRESSION PROCESSES 331 The other term in the third
line of (11.42) reflects both the interdependence of wx and wo. in
determining the variance of z and the cost of moving the control variables.
If X. were orthogonal to Xx so that M xo. = 0 and, further, if G = 0, then
(11.42) reduces to Xx'Xxx(a - (11.43) %* = + " which is quite similar to
(11.35) in form. 11.3 CONTROL OF MULTIVARIATE NORMAL
REGRESSION PROCESSES o.a In this section we extend the ahalysis of
the preceding section to apply to the traditional multivariate regression
model ' considered in Chapter 8. Our model for the observations, Y = (yx,
yo.,..., Ym), a T x rn matrix, is given by Y=XB+ U (11.44) = XB + Xo.Bo.
+ U, where X is a T x /c matrix of rank k, B is a k x m matrix of unknown
parameters, and U is a T x m matrix of random disturbance terms. We have
partitioned X= (Xx!XO.), with Xx containing observations on variables
under control and Xo., those on variables not under control. The B matrix
has been partitioned to correspond to the partitioning of X. The first future
vector of observations, z' = (Yx,v + , Y.,v + ,. �., Ym,v + x), is assumed to
be generated by the same process generating Y; that is, z' = w'B + v' (11.45)
= wx'B. + wo.'Bo. + v', where w' = (%' ! wo.') is a 1 x k vector of values for
the independent variables in the future period T+ 1 and v'= (ux,v+x,
uo.,v+x,..., um,r+x) is the disturbance vector for period T + 1. In Chapter 8
(8.57) we have the predictive pdf for observations generated by a traditional
multivariate normal regression model when we employ :a For some
previous work on this problem from the Bayesian point of view, cf. W. D.
Fisher, loc. tit., S. J. Press, 1oc. cit., and A. Zellner and V. K. Chetty,
"Prediction and Decision Problems in Regression Models from the
Bayesian Point of View," J. Am. Statist. Assoc., 60, 608-616 (1965). :4 This
model can also be considered to be the "unrestricted" reduced form
equations of a "simultaneous equation" econometric model. If lagged
endogenous variables appear in the system, we assume that initial values for
these variables are given in forming the likelihood function. Also, such
variables are obviously not control variables.

332 ANALYSIS OF SOME CONTROL PROBLEMS diffuse prior pdf's for


the unknown parameters [see (8.8) and (8.9)]. In the case of one future
period (8.57) specializes to a pdf in the multivariate Student t form: (11.46)
�(z I Y) oc [v + h(z - J'w)'$-X(z - J'w)] -(+m)', wherev = T- (k - 1) - m,g =
(X'X)-X'Y,$=( Y- X/)'(Y- Xt)/v and h = I1 - w'(X'X + ww')-Xw]. Then we
note ' that Z(z'l r) = (11.47) and (11.48) v h_$ Var(zl Y) = v - 2 v [1 +
w'(X'X)-Xw]$. --2 Now let our loss function be given by '6 (11.49) L = (z -
a)'C(z - a), where a' = (ax, aa,..., am) is a vector of given target values, one
for each of the m dependent variablesf '7 and C is a positive definite
symmetric matrix. Then the expectation of the loss function is EL = E(z-
a)'C(z- a) (11.50) = E(z- Ez)'C(z- Ez) + (a - Ez)'C(a - Ez) = 2{g,[1 +
w'(X'X) -xw] + (am - w')(a,- w'[,)} c,, where the summation extends over e,
l = 1, 2,..., m, i, is the (e, l)th element of v$/( - 2), Ez = w'm, with m the eth
column of B, and e, is the (e, l)th element of C. In (11.50) we partition w' =
(wx'iwd) and (X'X) m, and , correspondingly; for example, '= ('i{2'), where
m is a column vector with the dimensions of wx. Then (11.50) becomes EL
= [/(1 + wx'Mnwx + 2wx 'M'w. + (11.51) w' - + (am- 9.s For both (11.47)
and (11.48) to exist we need v > 2. '6 Since it is straightforward to include a
quadratic term for the cost of changing the control variables, we do not add
this complication in the present analysis. Also, if, as Fisher, Ioc. tit., does,
we employ a cardinal utility function of the form U = 2b'z - z'Cz, where b is
a given vector and C, a positive definite symmetric matrix, it is wall known
that we can complete the square on z and write U = b'C-Xb - (z - C-Xb)'C(z
- C-Xb). If we let a = C-Xb, minimizing E(z - a)'C(z - a) is equivalent to
maximizing EU. In some problems we may have target values just for a
subset of the rn dependent variables, say m', with 1 _< m' < m.
SENSITIVITY OF CONTROL TO FORM OF LOSS FUNCTION 333
with (X'X) - = (M}, i,j = 1, 2. We can now differentiate (11.51) with respect
to the elements of wx and find the minimizing ' value wx = wx*. This
operation yields = [7 (2m,ta + + (11.52) x ( [(a - w')$, + (a- w') -
2SMXw]c); again all summations extend over a, l = 1, 2,..., m. Given the
sample quantities and the elements of C = (c), the optimal setting for wx*
can readily be computed. 11.4 SENSITIVITY OF CONTROL TO FORM
OF LOSS FUNCTION In the preceding sections we have analyzed several
control problems using symmetric quadratic loss functions. Now we present
the results of calcu- lations ' to illustrate how incorrect assumptions about
the form of the loss function affect results, an area that may be called
"robustness under changes of the loss function." o To provide some
numerical results T = 15 observations were generated from the following
simple regression model: (11.53) y = 2.0x + ut, t = 1, 2,..., 15; the u[s were
independently drawn from a normal distribution with zero mean and
variance = 9.0, and the xt's were independently drawn from a normal
distribution with mean zero and variance 0.64. The data so generated are.
shown in Table 11.2. Using the data in Table 11.2, we compute the
following sample quantitiesa: = xty 1.5885, Xt2 --' withy=T- 1 = 14. As
regards loss functions, functions' (11.54) f = (-lv) Y. (y- x) ' = 5.7351, we
first consider the following symmetric L(z, a) = I z - al m, = 0.5, 1, 2, and 4,
2 Note that the matrix of second derivatives of (11.51) with respect to the
elements of wx is positive definite; hence the value of wx which sets the
first matrix derivative equal to zero is a minimizing value. These are taken
from A. Zellner and M. S. Geisel, "Sensitivity of Control to Uncer- tainty
and Form of the Criterion Function," in D. G. Watts (editor), The Future of
Statistics. New York: Academic, 1968, pp. 269-283. 0 This phrase was
suggested by J. C. Kiefer. x The conventional standard error associated
with/ is s/(X xte) = 0.8259.

334 ANALYSIS OF SOME CONTROL PROBLEMS Table 11.2 DATA


GENERATED FROM SIMPLE REGRESSION MODEL (11.53) t yt xt t yt
xt 1 -0.039 -1.026 9 0.348 -0.300 2 2.730 1.542 10 4.428 0.924 3 0.997
-0.532 11 -0.723 0.699 4 -2.990 -0.253 12 -0.632 0.182 5 2.660 0.040 13 -
2.434 1.078 6 -0.624 -0.882 14 1.864 0.512 7 6.598 0.875 15 - 1.620 -0.438
8 -0.669 -0.066 where z = y.+x and'a = 4. In (11.54) we have included
square-root error (a = 0.5), absolute error (a = 1), and squared error ( = 2)
and quartic error ( = 4) loss functions. For each of them the following
integral was evaluated 3.: (11.55) f;oo L(z, a)p(zly, w) dz for various values
of w = xr+x, where p(zly, w) is the predictive pdf shown in (11.6). The
results of these calculations are shown in Table 11.3. From Table 11.3 we
see that the optimal setting for w in the squared-error loss function is w =
1.9144 and the associated loss is 10.533. Suppose, now, that we were
mistaken in assuming the loss function to be Ix - 412, for in fact it is Ix - al.
With w = 1.9144, our expected loss with the absolute-error loss function is
2.5475. This is close to the minimal expected loss, 2.5465, asso- ciated with
the optimal w value for the absolute-error loss function, namely w = 1.9584.
Similar results are summarized in Table 11.4. It is seen that for this problem
the optimal solution for the squared-error loss function is remarkably robust
under changes in the form of the loss function a3 as long as we restrict
ourselves to symmetric loss functions. To study robustness of solutions,
when there is a departure from sym- metry in the loss function, calculations
similar to those reported above have been carried through by using the
following loss functions: (11.56) L(z, a) fklz - a I for z > ;} k -- 0.25, 0.50,
0.75, 1.0, =/, [z - a[ for z < 1.5, 2.0, 3.0. In (11.56) we have a class of linear
loss functions which is asymmetric when k % 1. As above, we take a = 4.
o2 Numerical integration was employed. aa As can be seen from the figures
in Table 11.3, this conclusion does not hold for the approximate certainty
equivalence solution, w -- 4// -- 2.5181. SENSITIVITY OF CONTROL TO
FORM OF LOSS FUNCTION 335 Table 11.3 EXPECTED LOSS AS A
FUNCTION OF THE CONTROL VARIABLE SETTING AND FORM OF
THE LOSS FUNCTION Value of Form of Loss Function Control Variable,
w [z - 410'* [z - 41 [z - 412 Iz - 414 3.6780 1.7432 3.6031 20.853 1457.7
2.9895 1.5789 2.9680 14.370 731.98 2.5181 g 1.4992 2.6783 11.743 496.01
2.1751 1.4681 2.5666 10.759 412.52 2.1035 1.4651 2.5554 10.652 402.44
2.0364 1.4636 2.5490 10.582 395.05 2.0045 1.4632 2.5473 ...... 1.9812
1.4632 � 2.5467 ...... 1.9735 1.4632 2.5466 10.544 389.85 1.9584 1.4633
2.5465 � ...... 1.9144 b 1.4638 2.5475 10.533 � 386.42 1.8587 1.4653
2.5513 10.543 384.44 1.8061 1.4675 2.5576 10.572 383.67 1.7960 .........
383.64 o 1.7565 1.4704 2.5658 10.615 383.89 1.7095 1.4737 2.5757 10.672
384.93 1.5442 1.4906 2.6271 10.987 394.90 1.4080 1.5104 2.6887 11.383
410.36 w = 4//, certainty equivalence solution for [z - 41 '. w = (4/)[to9'/(1 +
to2)], optimal solution for Iz - 41 ' where to . = Minimal expected loss for
given loss function. For each of these loss functions the value of w, which
For each of these loss functions the value of w that minimizes (11.55) was found by successive numerical integrations. The results of these calculations are shown in Table 11.5. We note that, in contrast to what was found in the case of symmetric loss functions, the optimal solution for the quadratic loss function, denoted by wq, is not very robust to marked departures from symmetry; for example, when k = 3.0 the optimal setting for w is 1.0944, with an associated expected loss of 3.5272, whereas for wq = 1.9144 the expected loss is 4.1358. However, for values of k in the range 0.50 to 2.0 use of the quadratic solution does not result in raising the expected loss very much. Use of the approximate certainty equivalence solution wce = 2.5181 gives rise to quite an increase in expected loss in relation to minimal expected loss when k = 0.25 and k = 1.5, 2.0, and 3.0. For certain values of k use of wce leads to smaller expected loss than does wq. In summary, for the range of loss functions, problem, and data considered, it is found that the optimal solution for the quadratic case is quite robust to changes in the form of the loss function except in the case of considerable asymmetry.

Table 11.4  EXPECTED LOSSES ASSOCIATED WITH OPTIMAL CONTROL AND WITH USE OF THE SOLUTION FOR THE SQUARED-ERROR LOSS FUNCTION

                                      Form of Loss Function
Item                          |z − 4|^0.5   |z − 4|    |z − 4|²   |z − 4|⁴
Optimal value for w            1.9812       1.9584     1.9144     1.7960
Expected loss for optimal w    1.4632       2.5465     10.533     383.64
Expected loss for w = 1.9144   1.4638       2.5475     10.533     386.42

Table 11.5  COMPARISON OF OPTIMAL SOLUTIONS FOR ASYMMETRIC LINEAR LOSS FUNCTIONS WITH RESULTS OF EMPLOYING CERTAINTY EQUIVALENCE AND QUADRATIC CONTROL SETTINGS (a)

Slope Parameter    Optimal    Expected   Quadratic   Expected   Certainty     Expected
of Loss Function   Setting    Loss for   Setting     Loss for   Equivalence   Loss for
                   w = w*     w = w*     w = wq      w = wq     Setting wce   w = wce
k = 0.25           3.4778     1.5518     1.9144      1.9519     2.5181        1.6739
k = 0.50           2.6214     2.0059     1.9144      2.1504     2.5181        2.0087
k = 0.75           2.2321     2.3131     1.9144      2.3489     2.5181        2.3435
k = 1.0            1.9584     2.5465     1.9144      2.5475     2.5181        2.6783
k = 1.5            1.6226     2.8937     1.9144      2.9446     2.5181        3.3478
k = 2.0            1.3928     3.1511     1.9144      3.3416     2.5181        4.0174
k = 3.0            1.0944     3.5272     1.9144      4.1358     2.5181        5.3565

(a) Computations based on the generated data in Table 11.2. The loss function employed is L = |z − 4| for −∞ < z ≤ 4 and L = k|z − 4| for 4 < z < ∞.

11.5  TWO-PERIOD CONTROL OF THE MULTIPLE REGRESSION MODEL34
Here we are concerned with the problem of setting the values of independent variables in such a way that the dependent variable is kept close to a target value in not just one but several future periods. That we seek to achieve optimal control for several future periods adds new dimensions to our analysis. As has been recognized in the literature, learning and design considerations are involved in the multiperiod control problem; that is, as we proceed into the future, we get more sample information that permits us to learn more about the values of unknown parameters. Also, how much we learn about unknown parameter values will depend on the settings of control variables, a design-of-experiments consideration. We shall see that the solution to a multiperiod control problem provides an optimal sequence of actions that takes explicit account of control, learning, and design considerations. Since the future actions embodied in the solution involve adapting our actions to new data as they appear, the multiperiod problem is often referred to as an adaptive control problem.

34 The material in this section is based in part on A. Zellner, "On Controlling and Learning about a Normal Regression Model," manuscript, 1966, presented to the Information, Decision and Control Workshop, University of Chicago.

To make these considerations more concrete, let us consider a two-period problem. Our model for the observations is

(11.57)  y = Xβ + u,

where y' = (y₁, y₂, ..., y_T), X is a T × k matrix, with rank k, of observations on k control variables, β is a k × 1 vector of unknown coefficients, and u is a T × 1 normal disturbance vector with mean zero and covariance matrix σ²I_T. The future observation vector z' = (z₁, z₂) ≡ (y_{T+1}, y_{T+2}) is assumed to be generated as follows:

(11.58)  z = Wβ + v,

where W = (w₁, w₂)' is a 2 × k matrix of future settings for the control variables and v' = (u_{T+1}, u_{T+2}) denotes a vector of future disturbance terms, assumed to be distributed normally and independently with zero means and common variance σ² and independent of u. If we employ a diffuse prior pdf for the elements of β and σ, it has been shown in Chapter 3 that the predictive pdf for z is in the following bivariate Student t form:

(11.59)  p(z|y) ∝ [ν + (z − Wβ̂)'H(z − Wβ̂)]^{−(ν+2)/2},

where β̂ = (X'X)⁻¹X'y, ν = T − k, and H = [I + W(X'X)⁻¹W']⁻¹/s², with s² = (y − Xβ̂)'(y − Xβ̂)/ν. Let us assume that our loss function is given by35

(11.60)  L(z, a) = (z₁ − a₁)² + (z₂ − a₂)²,

where a₁ and a₂ are given target values for future periods 1 and 2, respectively. Then our problem is to formulate a procedure for determining the values of w₁ and w₂, where W' = (w₁, w₂), to minimize expected loss. Let us consider several possible approaches for solving this problem.

35 The analysis can be extended to apply to a loss function of the following form: L(z, a) = (z − a)'A(z − a), where A is any positive definite symmetric matrix.

Approach I. "Here and Now" Solution

Use the pdf in (11.59) to evaluate the expectation of the loss function in (11.60) and then minimize with respect to the elements of w₁ and w₂.36 The solution to this problem, termed a "here and now" solution, is appropriate when for some reason we must announce our actual settings for both w₁ and w₂ at the beginning of the first future period. The costs associated with being required to announce settings for w₁ and w₂ at the beginning of period T + 1 are the following:

1. We cannot use the information provided by z₁ = y_{T+1} to obtain an optimal setting for w₂.
2. We cannot take account of how our choice of w₁ will affect our determination of an optimal setting for w₂.

However, since there are situations in which a "here and now" solution is needed and also for comparative purposes, it is of interest to present the "here and now" solution. We have

(11.61)  EL(z, a) = E(z₁ − a₁)² + E(z₂ − a₂)²
                  = s̃²[1 + w₁'(X'X)⁻¹w₁] + (a₁ − w₁'β̂)² + s̃²[1 + w₂'(X'X)⁻¹w₂] + (a₂ − w₂'β̂)²,

where s̃² = νs²/(ν − 2). Then the optimal settings for w₁ and w₂ are given by

(11.62)  w₁* = X'Xβ̂a₁/(s̃² + β̂'X'Xβ̂)  and  w₂* = X'Xβ̂a₂/(s̃² + β̂'X'Xβ̂).

On inserting these values in (11.61), the expected loss is

(11.63)  EL(z, a|W = W*) = s̃²[1 + a₁²/(s̃² + Q)] + s̃²[1 + a₂²/(s̃² + Q)],

where Q = β̂'X'Xβ̂. As mentioned above, the solution for each future period is completely analogous to that presented in Section 11.2 for a one-period problem.

36 Since z₁ and z₂ have marginal pdf's in the univariate Student t form and the marginal pdf for z₁ does not involve w₂ and that for z₂ does not involve w₁, the present problem reduces to two one-period problems, each identical to one considered in Section 11.2 [see (11.26) and the results that follow it].
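As a concrete illustration, the sketch below computes the settings (11.62) and the expected loss (11.63) from simulated data; the sample size, coefficients, and target values are illustrative assumptions only:

    # A small sketch of the "here and now" solution (11.62)-(11.63),
    # using simulated data; all numerical inputs are illustrative.
    import numpy as np

    rng = np.random.default_rng(0)
    T, k = 20, 3
    X = rng.normal(size=(T, k))
    beta_true = np.array([1.0, -0.5, 2.0])
    y = X @ beta_true + rng.normal(size=T)

    XtX = X.T @ X
    beta_hat = np.linalg.solve(XtX, X.T @ y)
    nu = T - k
    s2 = (y - X @ beta_hat) @ (y - X @ beta_hat) / nu
    s2_tilde = nu * s2 / (nu - 2)               # s~2 = nu s2 / (nu - 2)
    Q = beta_hat @ XtX @ beta_hat

    a1, a2 = 4.0, 5.0                           # target values
    denom = s2_tilde + Q
    w1_star = XtX @ beta_hat * a1 / denom       # (11.62)
    w2_star = XtX @ beta_hat * a2 / denom
    exp_loss = s2_tilde * (2 + (a1**2 + a2**2) / denom)   # (11.63)
    print(w1_star, w2_star, exp_loss)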
Approach II. "Sequential Updating" Solution

Use the pdf in (11.59) to evaluate E(z₁ − a₁)² and minimize with respect to the elements of w₁. This provides a setting for w₁. Then, after the first future period has passed and z₁ = y_{T+1} has been observed, use p(z₂|z₁, y) to evaluate E(z₂ − a₂)² and minimize with respect to the elements of w₂. In this procedure the value of w₁ is determined without taking account of how it affects the determination of w₂, and on this count the solution is not optimal. However, in contrast to the "here and now" solution, the "sequential updating" solution does take account of information in period T + 1 to arrive at the setting for w₂ and, as will be seen, is often a good approximation to the optimal solution for the present problem.

In the sequential updating solution for the first future period we have

(11.64)  E(z₁ − a₁)² = s̃²[1 + w₁'(X'X)⁻¹w₁] + (a₁ − w₁'β̂)²

and

(11.65)  w₁* = X'Xβ̂a₁/(s̃² + β̂'X'Xβ̂).

The value for w₁ in (11.65) is precisely the same as in the "here and now" solution. For period T + 2 we take account of w₁ = w₁* and the new observation z₁ = y_{T+1}; that is, we take the expectation of (z₂ − a₂)² with respect to z₂, given z₁. This yields37

(11.66)  E[(z₂ − a₂)²|z₁, w₁] = s̃₂²(1 + w₂'M₂⁻¹w₂) + (a₂ − w₂'β̂₂)²,

where

(11.67)  M₂ = X'X + w₁w₁',

(11.68)  β̂₂ = M₂⁻¹(X'y + w₁z₁)
            = β̂ + (X'X)⁻¹w₁(z₁ − w₁'β̂)/[1 + w₁'(X'X)⁻¹w₁],

with β̂ = (X'X)⁻¹X'y, and

(11.69)  s̃₂² = [(y − Xβ̂)'(y − Xβ̂) + (z₁ − w₁'β̂)²(1 − w₁'M₂⁻¹w₁)]/(ν₂ − 2)
             = [(y − Xβ̂₂)'(y − Xβ̂₂) + (z₁ − w₁'β̂₂)²]/(ν₂ − 2),

with ν₂ = ν + 1, where ν = T − k. In (11.67) M₂ is the moment matrix for the independent variables at the beginning of T + 2, β̂₂ in (11.68) is the least squares quantity computed at the beginning of T + 2, and s̃₂² involves the sum of squared residuals at the beginning of T + 2.

37 See Appendix 1 of this chapter for the derivation of the result shown in (11.66).
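In code the updating relations (11.67) to (11.69) amount to a few lines; in the sketch below the function name and inputs are our own, with XtX = X'X and s2 the usual s²:

    # Sketch of the one-step updating relations (11.67)-(11.69); the inputs
    # are assumed to come from a fit like the one in the previous sketch.
    import numpy as np

    def sequential_update(XtX, beta_hat, nu, s2, w1, z1):
        """Return (M2, beta_hat2, s2_tilde2, nu2) after observing z1 at setting w1."""
        M2 = XtX + np.outer(w1, w1)                          # (11.67)
        e1 = z1 - w1 @ beta_hat
        Q1 = w1 @ np.linalg.solve(XtX, w1)
        beta_hat2 = beta_hat + np.linalg.solve(XtX, w1) * e1 / (1 + Q1)  # (11.68)
        nu2 = nu + 1
        s2_tilde2 = (nu * s2 + e1**2 / (1 + Q1)) / (nu2 - 2)  # (11.69)
        return M2, beta_hat2, s2_tilde2, nu2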

On minimizing (11.66) with respect to the elements of w₂, we obtain

(11.70)  w₂⁺ = M₂β̂₂a₂/(s̃₂² + β̂₂'M₂β̂₂),

which differs from the "here and now" setting shown in (11.62) in that (11.70) incorporates information pertaining to period T + 1, whereas w₂* in (11.62) does not. Further, since w₂⁺ depends on z₁, it cannot be evaluated until after the first future period has passed and z₁ has been observed. On substituting from (11.70) in (11.66), we find that the expected loss in period T + 2, conditional on z₁, w₁, and w₂ = w₂⁺, is

(11.71)  E[(z₂ − a₂)²|z₁, w₁, w₂ = w₂⁺] = s̃₂²[1 + a₂²/(s̃₂² + β̂₂'M₂β̂₂)].

Since the rhs of (11.71) depends on z₁, we can obtain its mean approximately by using the predictive pdf for z₁.38 The result is

(11.72)  E{s̃₂²[1 + a₂²/(s̃₂² + β̂₂'M₂β̂₂)]}
         ≅ s̃²{1 + a₂²/(s̃² + Q) − [(ν − 2)/(ν − 1)]a₂²[(w₁'β̂)² + s̃²w₁'(X'X)⁻¹w₁]/(s̃² + Q)²}.

If we now substitute w₁ = w₁*, with w₁* given in (11.65), we have expected loss for the second future period. With this substitution made in both (11.64) and (11.72), the total expected loss for the two periods for the sequential updating solution is approximately

(11.73)  EL ≅ s̃²[2 + (a₁² + a₂²)/(s̃² + Q)] − [(ν − 2)/(ν − 1)]s̃²a₁²a₂²Q/(s̃² + Q)³,

where Q = β̂'X'Xβ̂. On comparing (11.73) with (11.63), the expression for expected loss associated with the "here and now" solution, we see that the first two terms of (11.73) are identical to the two terms of (11.63). The last term in (11.73) represents a reduction of expected loss associated with using the information for
period T + 1 in determining the setting for w₂.

Approach III. Adaptive Control Solution

Here we describe the adaptive control procedure for obtaining a solution to our two-period problem.39

Step 1. We consider ourselves to be at the beginning of the second future period, T + 2, and use the conditional predictive pdf for z₂, given z₁, w₁, and the given sample information, y and X, to evaluate40

(11.74)  E(L|z₁, w₁, y, X) = g(z₁, w₁, w₂, y, X).

It is to be emphasized that (11.74) is valid for whatever value z₁ takes and for whatever value is given to w₁.

Step 2. Minimize the expression in (11.74) with respect to the elements of w₂. If ŵ₂ denotes the solution to this problem, we have

(11.75)  ŵ₂ = h(z₁, w₁, y, X).

From (11.75) we see that ŵ₂ depends on the as yet unobserved value of z₁ = y_{T+1}, the as yet undetermined value of w₁, and on the given sample information.41

Step 3. Substitute w₂ = ŵ₂ in (11.74) with ŵ₂ as given in (11.75). This leads to

(11.76)  E(L|z₁, w₁, y, X) = g(z₁, w₁, ŵ₂, y, X),

which is a function of z₁, w₁, and the given sample information.

Step 4. Take the expectation of (11.76) with respect to z₁ to obtain

(11.77)  E_{z₁} g(z₁, w₁, ŵ₂, y, X) = ḡ(w₁, y, X).

Step 5. Minimize (11.77) with respect to the elements of w₁ to obtain the optimal setting for w₁, say ŵ₁, given by

(11.78)  ŵ₁ = ŵ₁(y, X),

which is seen to depend on just the given sample information42 and hence can be computed at the beginning of the first future period.

The optimal setting for w₁, shown in (11.78), takes account of how the future information z₁ will be employed in optimizing in the second future period and how the first-period setting for w₁ will affect actions in the second future period. Further, the second-period setting, w₂ = ŵ₂, shown in (11.75), incorporates all the sample information available at the beginning of T + 2; that is, y, X, z₁ = y_{T+1}, and w₁ = ŵ₁, the optimal setting for w₁.

38 See Appendix 2 of this chapter for the derivation of (11.72).
39 Although we specialize it to our two-period regression problem, it should be recognized that the procedure is more generally applicable.
40 Also, (11.74) is conditioned by our given prior information about β and σ, diffuse in the present problem.
41 It should be emphasized that ŵ₂ cannot be given a specific value until we observe z₁ and are given a setting for w₁.
42 If we employ an informative prior pdf for β and σ, ŵ₁ would depend on parameters of the prior pdf as well as on sample information.
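The five steps can also be carried out by brute force. The sketch below does so for a one-regressor version of the problem, using Monte Carlo draws from the predictive pdf of z₁ for Step 4 and a bounded search for Step 5; the parameter values, draw count, and search bounds are illustrative assumptions, not part of the original analysis:

    # Brute-force sketch of Steps 1-5 for a one-regressor problem; all
    # numerical inputs, the draw count, and the search bounds are assumptions.
    import numpy as np
    from scipy.optimize import minimize_scalar

    rng = np.random.default_rng(1)
    ssx, beta_hat, s2, nu = 3.2, 1.6, 2.1, 14     # assumed sample quantities
    a1, a2 = 4.0, 4.0                             # target values
    s2t = nu * s2 / (nu - 2)
    tdraws = rng.standard_t(nu, size=4000)        # common draws for the search

    def step123(z1, w1):
        # Steps 1-3: minimized second-period loss (11.81), via (11.67)-(11.69)
        m2 = ssx + w1**2
        e1, q1 = z1 - w1 * beta_hat, w1**2 / ssx
        b2 = beta_hat + (w1 / ssx) * e1 / (1 + q1)
        s2t2 = (nu * s2 + e1**2 / (1 + q1)) / (nu - 1)
        return s2t2 * (1 + a2**2 / (s2t2 + b2**2 * m2))

    def step4(w1):
        # Step 4: average Step 3 over predictive draws of z1; add period-1 loss
        z1 = w1 * beta_hat + np.sqrt(s2 * (1 + w1**2 / ssx)) * tdraws
        first = s2t * (1 + w1**2 / ssx) + (a1 - w1 * beta_hat)**2
        return first + np.mean([step123(z, w1) for z in z1])

    w1_hat = minimize_scalar(step4, bounds=(0.5, 3.5), method="bounded").x  # Step 5
    print("approximate adaptive first-period setting:", w1_hat)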

We now turn to the problem of obtaining the adaptive control solution to the two-period regression control problem with the loss function given in (11.60) and the predictive pdf given in (11.59). Specializing Step 1, shown in (11.74), we have43

(11.79)  E[L(z, a)|z₁, w₁] = (z₁ − a₁)² + E[(z₂ − a₂)²|z₁, w₁]
                           = (z₁ − a₁)² + s̃₂²(1 + w₂'M₂⁻¹w₂) + (a₂ − w₂'β̂₂)²,

wherein (11.66) has been utilized and the quantities β̂₂, M₂, and s̃₂² have been defined in (11.67) to (11.69). Step 2 involves minimization of (11.79) with respect to the elements of w₂. This yields44

(11.80)  ŵ₂ = M₂β̂₂a₂/(s̃₂² + β̂₂'M₂β̂₂).

Since M₂, β̂₂, and s̃₂² depend on w₁ and z₁, ŵ₂ is a function of these quantities and cannot be computed until values for them are available. In Step 3 we substitute from (11.80) in (11.79) to obtain

(11.81)  E[L(z, a)|z₁, w₁, w₂ = ŵ₂] = (z₁ − a₁)² + s̃₂²[1 + a₂²/(s̃₂² + β̂₂'M₂β̂₂)],

which is a function of z₁ and w₁. Step 4 involves computation of the mean of (11.81) with respect to z₁. This computation yields the following approximate result45:

(11.82)  E[L(z, a)|w₁, w₂ = ŵ₂] ≅ s̃²[1 + w₁'(X'X)⁻¹w₁] + (a₁ − w₁'β̂)²
         + s̃²{1 + a₂²/(s̃² + Q) − [(ν − 2)/(ν − 1)]a₂²[(w₁'β̂)² + s̃²w₁'(X'X)⁻¹w₁]/(s̃² + Q)²},

where Q = β̂'X'Xβ̂. In Step 5 we minimize (11.82) with respect to the elements of w₁ to get our first-period setting. This calculation produces46

(11.83)  ŵ₁ = X'Xβ̂a₁/[s̃² + Q(1 − K)],  for 0 < K < 1,

43 In what follows it is understood that we are taking our sample information y and X as given.
44 Note that ŵ₂ in (11.80) is in the same form as w₂⁺ in (11.70). In evaluating ŵ₂, we employ a setting for w₁ which differs from that employed in evaluating w₂⁺.
45 The second term on the rhs of (11.81) is in exactly the same form as (11.71). Thus we can employ the results in Appendix 2 to get its approximate mean.
46 In (11.83) the condition 0 < K < 1 is needed in connection with the second-order condition for a minimum.
with K = [(ν − 2)/(ν − 1)]s̃²a₂²/(s̃² + Q)². It is seen that as K → 0 (11.83) approaches w₁*, the solution value for the first period for both the "here and now" approach and the "sequential updating" approach [see (11.62) and (11.65)]. The factor 1 − K in the rhs of (11.83) provides a modification to take account of how the first-period setting affects information about unknown parameters in solving the second-period problem.47 Finally, we can substitute from (11.83) in (11.82) to obtain the following approximate expression for expected loss, given the approximately optimal setting for w₁:

(11.84)  EL ≅ s̃²[2 + (a₁² + a₂²)/(s̃² + Q)] − [(ν − 2)/(ν − 1)]s̃²a₁²a₂²/(s̃² + Q)²,

where Q = β̂'X'Xβ̂ and terms of order T^(−q), q ≥ 3, have been dropped. On comparing the expected loss in (11.84) with that of the "sequential updating" solution in (11.73), we see that the first two terms are identical. The third terms differ and the expected loss in (11.84) is smaller.48 The difference in the present instance, however, involves terms of O(T⁻³) which will be small in many cases.

Table 11.6  FIRST-PERIOD SETTINGS AND TWO-PERIOD EXPECTED LOSSES FOR VARIOUS CONTROL SOLUTIONS (a)

Control Solution      First-Period Setting for w₁ (b)    Expected Two-Period Loss (c)

I. "Here and now"     w₁ = X'Xβ̂a₁/(s̃² + Q)              s̃²[2 + (a₁² + a₂²)/(s̃² + Q)]

II. "Sequential       w₁ = X'Xβ̂a₁/(s̃² + Q)              s̃²[2 + (a₁² + a₂²)/(s̃² + Q)]
    updating"                                              − [(ν − 2)/(ν − 1)]s̃²a₁²a₂²Q/(s̃² + Q)³

III. Adaptive         w₁ = X'Xβ̂a₁/[s̃² + Q(1 − K)]       s̃²[2 + (a₁² + a₂²)/(s̃² + Q)]
     control                                               − [(ν − 2)/(ν − 1)]s̃²a₁²a₂²/(s̃² + Q)²

(a) In the table Q = β̂'X'Xβ̂ and K = [(ν − 2)/(ν − 1)]s̃²a₂²/(s̃² + Q)².
(b) Approximate in III.
(c) Approximate in II and III. Terms of O(T⁻³) and higher order of smallness have been dropped.

In Table 11.6 some of the results obtained above are brought together. The main points to be appreciated are that the solutions presented can readily be applied in practice. Also, as already mentioned, there is a reduction in expected loss associated with the "sequential updating" and adaptive control solutions vis-à-vis the "here and now" solution. Next, a comparison of approximate expected losses for the "sequential updating" and adaptive control solutions indicates that the latter is smaller; however, the difference will often be small when T is not small and target values are not large. Last, these conclusions are presented in relation to a particular model and two-period problem and thus should not be crudely generalized to other models and problems.

47 Note that K is of O(T⁻²) and thus may often not be very different from zero.
48 Note that the ratio of the third term of (11.84) to that of (11.73) is (s̃² + Q)/Q > 1. Thus the expected loss in (11.84) is smaller than that in (11.73).

11.6  SOME MULTIPERIOD CONTROL PROBLEMS
First, it is rather straightforward to generalize the "sequential updating" solution of Section 11.5 to the case of controlling a regression model for q (q > 2) future periods rather than two. Assume that our sample data y and the future observations z' = (z₁, z₂, ..., z_q), with z_i = y_{T+i} (i = 1, 2, ..., q), are generated by a standard normal linear regression model.49 Further, assume that our prior information about parameters is represented by (11.27) and that our loss function is given by

(11.85)  L(z, a) = Σ_{i=1}^{q} (z_i − a_i)²,

where a' = (a₁, a₂, ..., a_q) is a vector whose elements are given target values. The future values z₁, z₂, ..., z_q satisfy

(11.86)  z_i = w_i'β + v_i,  i = 1, 2, ..., q,

where the w_i's are future settings of the control variables to be determined and the v_i's are normal and independent error terms, each with zero mean and common variance σ². The application of the "sequential updating" approach in the present instance leads to the setting for w₁ shown in (11.65) for the first future period:

(11.87)  w₁* = M₁β̂₁a₁/(s̃₁² + β̂₁'M₁β̂₁),

where the subscript 1 denotes quantities available and known at the beginning of the first future period; that is, M₁ = X'X, β̂₁ = M₁⁻¹X'y = M₁⁻¹m₁, and s̃₁² = (y − Xβ̂₁)'(y − Xβ̂₁)/(ν₁ − 2) = (g₁ − β̂₁'M₁β̂₁)/(ν₁ − 2), where g₁ = y'y and ν₁ = ν = T − k.

For the second future period we have observed z₁ and know the setting for w₁. Thus, on computing E_{z₂}[(z₂ − a₂)²|z₁] and optimizing with respect to the elements of w₂ as in (11.64) to (11.65), we are led to

(11.88)  w₂* = M₂β̂₂a₂/(s̃₂² + β̂₂'M₂β̂₂),

where the subscript 2 denotes quantities available and known at the beginning of the second future period; that is, M₂ = X'X + w₁*w₁*', β̂₂ = M₂⁻¹(X'y + w₁*z₁) = M₂⁻¹m₂, and s̃₂² = (g₂ − β̂₂'M₂β̂₂)/(ν₂ − 2), where g₂ = y'y + z₁² and ν₂ = ν₁ + 1 = ν + 1.

For the jth future period we have observed z₁, z₂, ..., z_{j−1} and know the settings for w₁, w₂, ..., w_{j−1}. On minimizing E_{z_j}(z_j − a_j)² with respect to the elements of w_j, we obtain

(11.89)  w_j* = M_jβ̂_ja_j/(s̃_j² + β̂_j'M_jβ̂_j),

where

(11.90)  M_j = X'X + Σ_{i=1}^{j−1} w_i*w_i*',

with

(11.91)  m_j = X'y + Σ_{i=1}^{j−1} w_i*z_i,  β̂_j = M_j⁻¹m_j,  and  s̃_j² = (g_j − β̂_j'M_jβ̂_j)/(ν_j − 2),

where g_j = y'y + Σ_{i=1}^{j−1} z_i² and ν_j = ν + j − 1. Thus (11.89), for j = 1, 2, ..., q, yields the sequence of "sequential updating" settings50 for the q control vectors, w₁, w₂, ..., w_q.

49 See (11.26) and the surrounding text for a description of the model under consideration. Below it is assumed that all independent variables can be controlled. Generalization to the case in which just a subset can be controlled is direct and thus is not presented.
50 For computational convenience relations in the form of (11.68) and (11.69) can be employed to update the β̂_j's and s̃_j²'s. In the second line of (11.68) note that [1 + w₁'(X'X)⁻¹w₁]⁻¹ = 1 − w₁'M₂⁻¹w₁, where M₂ = X'X + w₁w₁', and thus β̂₂ = β̂₁ + M₁⁻¹w₁(1 − w₁'M₂⁻¹w₁)(z₁ − w₁'β̂₁), where M₁ = X'X and β̂₁ = M₁⁻¹m₁ with m₁ = X'y.
Now we turn to one of several multiperiod problems considered by Prescott.51 He discusses control of a multivariate system with autoregressive features and obtains approximations to the adaptive control solution. His approximate solutions take account of uncertainty about parameter values and of new information as it becomes available, considerations which were seen to be important above. In addition, for a particular system he uses Monte Carlo techniques to evaluate the average losses associated with his approximate solutions and with several "certainty equivalence" or "linear decision rule" solutions which neglect uncertainty about parameter values. The average losses associated with Prescott's approximate adaptive control solutions are found to be quite a bit smaller than those associated with solutions that neglect uncertainty about parameter values. This finding attests to the importance of allowing for uncertainty about the values of parameters in solving control problems, particularly when parameter estimates are not very precise.

51 E. C. Prescott, Adaptive Decision Rules for Macro Economic Planning, doctoral dissertation, Graduate School of Industrial Administration, Carnegie-Mellon University, 1967.

The multivariate system considered by Prescott is given by

(11.92)  y_t = A_y y_{t−1} + A₀w_t + A₁x_t + u_t,  t = 1, 2, ..., q,

where y_t is an m × 1 dependent or endogenous variable vector, y_{t−1} is the dependent variable vector lagged one period,52 w_t is a k × 1 vector of control variables, x_t is a p × 1 vector of independent (or exogenous) variables not under control, and u_t is an m × 1 vector of random disturbance terms; A_y, A₀, and A₁ are coefficient matrices. Time is measured with t = 0 denoting the current period; thus t = 1 denotes the first future period, t = 2 the second, and so on.53 The disturbance vectors are assumed to be normally and independently distributed, each with zero mean vector and common positive definite symmetric covariance matrix. The loss function for the q future periods is assumed to have the following form:

(11.93)  L = Σ_{t=1}^{q} L_t(y_t),

where each L_t(y_t) is a quadratic in the elements of y_t.54

As new information becomes available, we learn more about the parameters of (11.92). To represent this learning process it is convenient to rewrite (11.92) as

(11.94)  y_t = Ad_t + u_t,  t = 1, 2, ..., q,

where A = (A_y ⋮ A₀ ⋮ A₁) and d_t' = (y_{t−1}' ⋮ w_t' ⋮ x_t'). As prior pdf for the elements of A at t = 0, Prescott considers, among others, independent multivariate normal pdf's for the rows of A; that is, if A_i is the transpose of the ith row of A,

(11.95)  p_{i1}(A_i|m_{i1}, H_{i1}) = MVN(m_{i1}, H_{i1}),  i = 1, 2, ..., m,

where m_{i1} is the prior mean vector and H_{i1} the prior precision matrix55 of the multivariate normal (MVN) prior pdf at the beginning of the first future period. The pdf in (11.95) would have the form of the posterior pdf based on T sample observations (t = −1, −2, ..., −T) if our initial (t = −T) prior pdf on the elements of A were diffuse and the covariance matrix for u_t were diagonal with known elements.56 After the first future period (t = 1) has passed, we can use the new information to update the prior pdf in (11.95) by an application of Bayes' theorem, which yields57

52 By suitable definition of y_t it is well known that higher order difference equation systems can be written in the form of (11.92). In this case, as well as in others, some of the coefficients of the system will be known with certainty.
53 With time so measured the sample period extends from t = −T + 1 to t = 0, T observations in all.
54 Prescott, op. cit., p. 17, notes that by suitable definition of y_t the loss functions L_t(y_t) can include lagged and unlagged endogenous, exogenous, and control variables. Of course, if y_t is so redefined, the system in (11.92) should be correspondingly redefined.
55 The prior precision matrix is just the inverse of the prior covariance matrix.
(11.96)  p_{i2}(A_i|I₂) ∝ exp[−½(A_i − m_{i1})'H_{i1}(A_i − m_{i1}) − ½(y_{i1} − d₁'A_i)²]
                        ∝ exp[−½(A_i − m_{i2})'H_{i2}(A_i − m_{i2})],

where p_{i2}(A_i|I₂) is the posterior pdf for the elements of A_i, given the information, denoted by I₂, available at the beginning of t = 2. In the second line of (11.96),

(11.97)  H_{i2} = H_{i1} + d₁d₁'

and

(11.98)  m_{i2} = H_{i2}⁻¹(H_{i1}m_{i1} + y_{i1}d₁);

H_{i2} and m_{i2} are the precision matrix and mean vector, respectively, for p_{i2}(A_i|I₂). Since the posterior pdf's p_{it}(A_i|I_t), t = 1, 2, ..., q, are all normal, (11.97) and (11.98) can be generalized by induction to

(11.99)  H_{i,t+1} = H_{it} + d_td_t'

and

(11.100)  m_{i,t+1} = H_{i,t+1}⁻¹(H_{it}m_{it} + y_{it}d_t),  t = 0, 1, 2, ..., q − 1,

where H_{it} and m_{it} are the precision matrix and mean vector, respectively, of p_{it}(A_i|I_t), a multivariate normal pdf. The relations (11.99) and (11.100) are useful for updating our pdf's for the elements of A_i as new observations become available.58
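In code the pair (11.99)-(11.100) is a one-line update per row of A; the sketch below, with names of our own choosing, shows the operation for a single row:

    # Sketch of the updating relations (11.99)-(11.100) for one row A_i of A;
    # H is the precision matrix and m the mean of its multivariate normal pdf.
    import numpy as np

    def update_row_posterior(H, m, d, y_i):
        """One Bayes update: H_new = H + d d',  m_new = H_new^{-1}(H m + y_i d)."""
        H_new = H + np.outer(d, d)                        # (11.99)
        m_new = np.linalg.solve(H_new, H @ m + y_i * d)   # (11.100)
        return H_new, m_new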
With this said about updating our prior information, we turn to our basic problem of selecting values sequentially for the control vector w_t in (11.92) for t = 1, 2, ..., q, to minimize the expected value of the loss function shown in (11.93).59

56 Other initial assumptions can lead to a posterior pdf in the form of (11.95); for example, we could initially (t = −T) have independent MVN prior pdf's for the elements of the A_i's, given the elements of a diagonal disturbance covariance matrix. Generalization to the case of an unknown and nondiagonal disturbance matrix is considered by Prescott.
57 Below, for convenience, we take the known diagonal disturbance covariance matrix to be the identity matrix; that is, Eu_tu_t' = I for all t. With known variances this can be achieved by rescaling the data.
58 Note that the predictive pdf for y_{it}, p_{it}(y_{it}|I_t), is given by p_{it}(y_{it}|I_t) = ∫ p_{it}(y_{it}|A_i)p_{it}(A_i|I_t) dA_i, where p_{it}(y_{it}|A_i) ∝ exp[−½(y_{it} − d_t'A_i)²] and p_{it}(A_i|I_t) is the multivariate normal pdf for the elements of A_i, given I_t, the information available at the beginning of period t.
59 We shall assume that a minimum exists.

Formally, let f_t denote the minimum expected value of the loss for periods t through q, inclusive; that is,60

(11.101)  f_t(I_t, y_{t−1}) = min E[Σ_{τ=t}^{q} L_τ(y_τ) | I_t, y_{t−1}].

Now Bellman's principle of optimality states: "An optimal policy [here sequential rules for selecting values of the control vector, the w_t's] has the property that whatever the initial state and initial decision are the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision."61 Thus by applying this principle we can write for the present problem

(11.102)  f_t(I_t, y_{t−1}) = min E[f_{t+1}(I_{t+1}, y_t) + L_t(y_t) | I_t, y_{t−1}]

for t = 1, 2, ..., q, with f_{q+1} ≡ 0. As we can see from (11.102), the optimal setting for w_t and optimal settings for future periods are determined by taking past settings and outcomes as given, whatever they may have been; thus the formal solution procedure, shown in (11.102), reflects the principle of optimality. In (11.102) we have a system of q functional equations which in theory can be solved62; for example, if

(11.103)  E[f_{t+1}(I_{t+1}, y_t) + L_t(y_t) | I_t, y_{t−1}]

is a quadratic function in y_{t−1} and w_t, the w_t which minimizes (11.103) will be linear in y_{t−1} and the function's minimal value is quadratic in y_{t−1}. Then, using (11.102), we can determine the function f_t(I_t, y_{t−1}) explicitly.63 If the values of the elements in the coefficient matrix A in (11.94) were all known with certainty, (11.103) would be quadratic in y_{t−1} and w_t and the approach described above could be employed. However, when some or all elements of A are unknown and have to be estimated, (11.103) is not quadratic in y_{t−1} and w_t and a different solution procedure is required.64

60 That f_t is a function of I_t, the information available at the beginning of period t, and of y_{t−1} is due to the fact that (11.92) is a Markov process in the vector y_t.
61 Quoted from R. Bellman and R. Kalaba, "Dynamic Programming and Adaptive Processes: Mathematical Foundations," reprinted from IRE Trans. Automatic Control, AC-5, No. 1 (January 1960), in R. Bellman and R. Kalaba (Eds.), Selected Papers on Mathematical Trends in Control Theory. New York: Dover, 1964, pp. 195-200, p. 69.
62 See, for example, Bellman and Kalaba, loc. cit., M. Freimer, "A Dynamic Programming Approach to Adaptive Control Processes," IRE Trans., AC-4, No. 2, 10-15 (1959), and M. Aoki, loc. cit., for solutions for particular problems.
63 Of course, (11.103) depends also on quantities that describe the available information I_t. Note, too, that I_{t+1} will depend on y_t as well as on other quantities pertaining to period t.
64 Note, for example, that the lhs of (11.82) is not quadratic in z₁ and w₁.
present problem, Prescott considers the first equation of (11.102)' (11.104)
A(I, Yo) = min E[f(I., Y0 + Lx(yx)llx, Yo]. w! Iffo.(Io., Y0 had a known
functional form, we could evaluate the expectation on the rhs of (11.104)
and find the minimizing value for wx. Indeed, this was done above for a
two-period problem [see Section 11.5, particularly (11.79) and the
subsequent analysis]. For three or more future periods, however, the
determination of an exact form for fo.(Io., yx) does not appear to be
possible. Since this is the case, Prescott introduces a function that
approximates fo.(Io., Y0. Given that Ia represents our information about the
unknown parameters for t = 2 and that this information is not updated for
subsequent periods, we can define the minimum value of the expected loss
for periods t through q as ht(Ia, yt_x), t = 2, 3,..., q. Then, applying the
optimality principle, we have (11.105) ht(Ia, Yt- x) = min E[ht + x(Ia, Yt) +
Lt(yt)]Ia, Yt- x] wt for t- 2, 3,...,q with hq+x-.= O. Given I., the system of
equations in (11.105) can be solved because the function h(Io., Yt-x) is
quadratic in Yt-x. In terms of (11.104), ho.(Io., Yx) is taken as an
approximation for fo.(Io., The setting for wx is obtained by minimizing
(11.106) E[h.(I., Yx) + Lx(yx)], Y where I. depends on Ix, wx, Yx, and xx.
Since I. depends on wx, the first term of (11.106) reflects the influence of
the first period setting of wx on the losses for subsequent periods? To
provide a practical computation procedure for medium to large systems, a
final approximation is introduced; that is, (11.106) is approximated by
(11.107) ha(Ia , Eyx) + E Lx(Yx), Yx where Eyx is the mean of the
predictive pdf for yx and Io. denotes the prior information for t = 2, with
Eyx replacing yx. Then (11.107) can be minimized with respect to the
elements of wx by using computer search procedures. As a convenient
initial value for wx, Prescott suggests the value of wx which mini- mizes
expected loss if the prior pdf for the first future period were never i:updated.
i Since Prescott's solution is an approximate one that incorporates an
:i.allowance for uncertainty about parameters and shows how current
control ': '.' It should be noted that this consideration is not taken into
account in the "sequential updating" approach described and applied above.

350 ANALYSIS OF SOME CONTROL PROBLEMS variable settings


affect future losses, it is of interest to consider its performance in relation to
other approaches. In this connection Prescott generated data from the
(11.108)
  ΔC  = 0.308ΔY + 0.194ΔC₋₁ + 0.408ΔM + 0.078ΔG + 87.7P₋₁ − 4797 + u₁,
  ΔI₁ = 0.278ΔY − 0.663I₁,₋₁ + 0.009Y*₋₁ + 0.168ΔG + 159P₋₁ − 6125 + u₂,
  ΔI₂ = 0.105ΔY − 0.220ΔR − 0.510I₂,₋₁ + 0.041Y*₋₁ + 92.8P₋₁ − 5996 + u₃,
  ΔR  = 0.111ΔY − 0.739ΔM + 0.318ΔM₋₁ + 0.187ΔG − 937 + u₄,
  ΔT  = 0.21ΔY,

where the time subscripts have been suppressed, Δ denotes the first-difference operator, for example, ΔC = C_t − C_{t−1}, and a subscript −1 denotes a variable lagged one period, for example, C₋₁ = C_{t−1}. Definitions of the variables follow:

  C   = personal consumption expenditures,
  I₁  = gross private investment less new construction expenditures,
  I₂  = new construction expenditures,
  I   = I₁ + I₂, gross private investment,
  G   = government purchases of goods and services,
  T − G = budgetary government surplus (or deficit if T − G is negative) on income and product account,
  T   = taxes,
  Y*  = Y + G − T,
  M   = currency and demand deposits adjusted, in the middle of the year,
  P   = GNP deflator,
  R   = yield of 20-year corporate bonds, annual percentage rate multiplied by 10,
  Y   = C + I + G,
  u_i = random disturbance term (i = 1, 2, 3, 4),

with variables measured in billions of current dollars. The variables assumed to be under control are G and M. All other variables are endogenous, except for P, which is assumed to be exogenous. The random disturbance terms were generated by independent drawings from normal distributions with mean zero and a different variance for each of the four

66 The model used is one developed and estimated by G. Chow in "Multiplier, Accelerator and Liquidity Preference in the Determination of National Income in the United States," IBM Research Report RC 1455 (1966).
disturbance terms.67 For purposes of generating the data the exogenous price level was assumed to have grown at 2% a year in a 12-observation "sample period"68 and at a 1.5% rate for the "future planning period" of eight years. Further, since information regarding M and G is needed to generate the data, Prescott used the following relations: ΔM_t = 2.1 + 2.2u_{5t} and ΔG_t = 4.1 + 4.0u_{6t}, where u_{5t} and u_{6t} are independently and normally distributed disturbance terms with zero means and unit variances.

The loss function, reflecting the goals of full employment, rapid economic growth, and price stability, is assumed to have the following form:

(11.109)  L = Σ_{t=1965}^{1972} [2(Y_t − Y_t°)² + (C_t − C_t°)² + (G_t − G_t°)² + (I_t − I_t°)²],

where Y_t°, C_t°, G_t°, and I_t° are given optimal or target values for the variables in period t.69 In (11.109) we have an eight-year planning horizon for which the first future period is 1965. Some calculations were performed by using a four-year horizon, 1965 to 1968.

Given data generated as explained above for the "sample" period, 1953 to 1964, the policy maker must decide on values of M and G for 1965 and then, on the basis of his augmented sample after 1965's data are available, on values for 1966, and so on. The following are several of the decision rules considered by Prescott:

1. Linear Decision Rules (LDR). The reduced-form equations of the system in (11.108) are estimated from the generated data by using classical least squares. The parameter estimates are regarded as being equal to the true parameter values, and the expectation of (11.109) is minimized to provide settings for M and G.

67 The variances, Var(u_i), i = 1, 2, 3, 4, were given values equal to their respective sample estimates.
68 The parameter values used to generate the data are equal to estimates obtained by G. Chow, loc. cit., based on actual data. In other respects the experimental set-up is designed to approximate conditions of the U.S. economy in the late 1950's and early 1960's. Observed values of variables in 1951 and 1952 were used as initial conditions in generating the data for 1953 to 1964, the "sample" period, and subsequent future periods.
69 The target values assigned were selected with an eye toward providing realism. An estimate of "capacity" GNP for 1964 was assumed to grow at 3.5% per year. Given that P grows at 1.5% per year, Y_t° was assumed to increase 5% per year. Other target values were obtained as follows: C_t° = 0.64Y_t°, I_t° = 0.16Y_t°, and G_t° = 0.20Y_t°.

2. Linear Decision Rules I and II (LDR-I and LDR-II). The two-stage least squares method is used to estimate parameters in the structural equations.70 If the coefficient estimate for ΔM in the consumption equation has the "wrong" a priori algebraic sign, the corresponding variable is deleted from the equation.71 Reduced-form coefficient estimates are obtained from the structural coefficient estimates and regarded as being the true values of the reduced-form coefficients. The procedure described above in (1) is then applied to obtain settings for M and G. LDR-II is the same as LDR-I except that the "sign test" is not used.

3. Adaptive Decision Rules (ADR). The system in (11.108) has the form of (11.92) and the control solution ADR is the approximate one developed by Prescott. The initial (1953) prior pdf for the parameters is diffuse. Further, the variances of disturbance terms are assumed to be equal to their respective sample estimates. This is an approximation that can be relaxed in applications.

4. Adaptive Decision Rules (ADR-0). Here the adaptive decision-rule procedure is one in which the decision maker sets the values of M and G for period t by using information available at that time but without allowing for the future updating of prior pdf's in subsequent periods. ADR-0 is what we have called the "sequential updating" approach.

5. Perfect Information Decision Rule (PIDR). This decision rule involves minimization of the expected value of the loss function (11.109) under the assumption that the decision maker knows the true values of the model's parameters used in the generation of the data. Of course, in practice these values are unknown and have to be estimated. However, it is of interest to compare the resulting loss associated with a solution under this assumption with those associated with an application of decision rules when the parameters are unknown and have to be estimated.

In Table 11.7 the losses actually experienced with different decision rules and several sets of data, generated as explained above, are presented.72 As Prescott concludes, "In every case, the procedure using adaptive decision rules performed better than LDR, clearly indicating the superiority of our analysis in this example."73 In many cases the margin of superiority is substantial.

70 These are not shown in the text.
71 This a priori "sign test" is frequently employed in practice and appears to have been used by Chow in constructing his model.
72 E. Prescott, loc. cit., presents additional results generally similar to those shown in Table 11.7.
73 E. Prescott, op. cit., p. 63.

Table 11.7  LOSSES ASSOCIATED WITH DIFFERENT DECISION RULES FOR VARIOUS GENERATED SETS OF DATA (a)

Four-Year Planning Horizon (1965-1968)

Data Set      PIDR    ADR    ADR-0   LDR-I   LDR-II   LDR
1.1            4.1    5.2     6.6                      5.5
1.2            4.0    5.2     5.6                      8.7
1.3            6.0    6.0     5.9                      8.7
1.4            2.8    6.0    10.0                     10.9
1.5            5.9    4.3    10.0                     10.6
Average for
1.1-1.5        4.6    6.2     6.7                      8.9

1.6                   4.2            10.0     7.1      5.9
1.7                   4.4            13.8    25.1      9.5
1.8                   3.9             2.8     4.4      5.7
1.9                   6.0            16.3     7.4      6.5
1.10                  6.6            12.8     8.6      9.0
Average for
1.6-1.10              5.0            11.1    10.5      7.3

Eight-Year Planning Horizon (1965-1972)

2.1            8.4   15.2    15.2
2.2            8.0    8.1     8.1
2.3            9.7    9.9     7.4
2.4           11.1   12.9    13.0
2.5            7.6    9.1     9.8
Average for
2.1-2.5        9.0   11.0    10.7

2.6                  15.0            11.3    14.7     18.0
2.7                  20.1            33.1    65.1     20.9
2.8                  11.0            71.0    32.0     16.4
2.9                  16.1            21.6    13.9     19.3
2.10                 20.0            26.7    18.1     21.1
Average for
2.6-2.10             16.4            32.5    28.7     19.1

(a) Taken from E. Prescott, op. cit., pp. 64-65. Where entries are not shown, they were not calculated.

Prescott points out that ADR performs better for the four-year horizon and ADR-0 does slightly better for the eight-year horizon. In discussing this difference he points to excessive experimentation when ADR is employed and the horizon is long.74 He suggests possible use of a moving horizon of three or four periods in connection with his approximate solution procedure.

The results in Table 11.7 point up the need to be extremely careful in determining "optimal" settings for control variables, particularly when sample information is not extensive. In addition, it must be realized that models representing economies and loss functions, such as that shown in (11.109), are approximations. Further research and experience with applications appear to be needed before it is possible to state whether adaptive control solutions will be "good enough" to aid in macroeconomic policy making.

74 As he points out, ibid., pp. 69-70, his approximate "... solution selected the current period's decision on the basis that all learning would occur from that observation and that the prior would not be updated subsequently. Therefore, ADR placed too great an emphasis on experimenting when the planning horizon was long."
APPENDIX 1  The Conditional Predictive Pdf for z₂, Given z₁

In this appendix we derive the conditional predictive pdf for z₂, given z₁, from the joint predictive pdf for these variables which, as shown in (11.59), is in the following bivariate Student t form when we employ a diffuse prior pdf for β and σ:

(1)  p(z|y) ∝ [ν + (z − Wβ̂)'H(z − Wβ̂)]^{−(ν+2)/2},

with all quantities as defined in connection with (11.59). Now we partition z − Wβ̂ as

(2)  z − Wβ̂ = (e₁, e₂)',  with e₁ = z₁ − w₁'β̂ and e₂ = z₂ − w₂'β̂,

and H = [I + W(X'X)⁻¹W']⁻¹/s² as

(3)  H = ( h₁₁  h₁₂ )
         ( h₁₂  h₂₂ ),

a 2 × 2 symmetric positive definite matrix. Then the quadratic form in (1) can be written as

     (z − Wβ̂)'H(z − Wβ̂) = h₂₂e₂² + 2h₁₂e₁e₂ + h₁₁e₁²
                         = h₂₂[e₂ + (h₁₂/h₂₂)e₁]² + [(h₁₁h₂₂ − h₁₂²)/h₂₂]e₁².

On introducing this result in (1), we have

(4)  p(z₁, z₂|y) ∝ {ν + e₁²/h¹¹ + h₂₂[e₂ + (h₁₂/h₂₂)e₁]²}^{−(ν+2)/2},

where e₁ and e₂ are as defined in (2) and h^{ij} denotes the (i, j)th element of H⁻¹; that is,

     H⁻¹ = [h^{ij}] = s²( 1 + w₁'(X'X)⁻¹w₁     w₁'(X'X)⁻¹w₂    )
                        ( w₂'(X'X)⁻¹w₁        1 + w₂'(X'X)⁻¹w₂ ).

To put (4) in a more appealing form, note that

(5)  e₂ + (h₁₂/h₂₂)e₁ = z₂ − w₂'β̂ − [w₂'(X'X)⁻¹w₁/(1 + w₁'(X'X)⁻¹w₁)](z₁ − w₁'β̂)
                      = z₂ − w₂'{β̂ + (X'X)⁻¹w₁(z₁ − w₁'β̂)/[1 + w₁'(X'X)⁻¹w₁]}.

The quantity in braces is equal to

(6)  β̂₂ = (X'X + w₁w₁')⁻¹(X'y + w₁z₁),

the least squares quantity computed on the basis of information available at the beginning of period T + 2. This is shown as follows75:

(7)  (X'X + w₁w₁')⁻¹(X'y + w₁z₁)
       = {(X'X)⁻¹ − (X'X)⁻¹w₁w₁'(X'X)⁻¹/[1 + w₁'(X'X)⁻¹w₁]}(X'y + w₁z₁)
       = β̂ + (X'X)⁻¹w₁(z₁ − w₁'β̂)/[1 + w₁'(X'X)⁻¹w₁],

which is just the quantity in braces in the second line of (5). Thus

(8)  e₂ + (h₁₂/h₂₂)e₁ = z₂ − w₂'β̂₂

and, given z₁, (4) can be written as

(9)  p(z₂|z₁, y) ∝ [1 + h₂₂(z₂ − w₂'β̂₂)²/(ν + e₁²/h¹¹)]^{−(ν+2)/2}.

We now obtain a more intelligible expression for the quantity h₂₂/(ν + e₁²/h¹¹) appearing in (9). From H = [I + W(X'X)⁻¹W']⁻¹/s² = [I − W(X'X + W'W)⁻¹W']/s² we have, with M₂ = X'X + w₁w₁',

     h₂₂ = [1 − w₂'(M₂ + w₂w₂')⁻¹w₂]/s² = 1/[(1 + w₂'M₂⁻¹w₂)s²].

75 The expression in braces in the first line of (7) is (X'X + w₁w₁')⁻¹.
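The inversion identity used in the first line of (7) is easy to verify numerically; the sketch below, with arbitrary illustrative inputs, checks it:

    # Quick numerical check of the matrix identity used in (7):
    # (X'X + w1 w1')^{-1} = (X'X)^{-1}
    #        - (X'X)^{-1} w1 w1' (X'X)^{-1} / (1 + w1'(X'X)^{-1} w1).
    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.normal(size=(12, 3))
    w1 = rng.normal(size=3)
    XtX = X.T @ X
    XtX_inv = np.linalg.inv(XtX)
    lhs = np.linalg.inv(XtX + np.outer(w1, w1))
    rhs = XtX_inv - XtX_inv @ np.outer(w1, w1) @ XtX_inv / (1 + w1 @ XtX_inv @ w1)
    assert np.allclose(lhs, rhs)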

Then

(10)  h₂₂/(ν + e₁²/h¹¹) = 1/{(1 + w₂'M₂⁻¹w₂)[νs² + (z₁ − w₁'β̂)²/(1 + w₁'(X'X)⁻¹w₁)]},

since h¹¹ = s²[1 + w₁'(X'X)⁻¹w₁]. From νs² = (y − Xβ̂)'(y − Xβ̂) and the expression for β̂₂ in the second line of (7), it is straightforward to show that

(11)  (y − Xβ̂₂)'(y − Xβ̂₂) + (z₁ − w₁'β̂₂)² = νs² + (z₁ − w₁'β̂)²/[1 + w₁'(X'X)⁻¹w₁].

Writing the lhs of (11), the residual sum of squares at the beginning of period T + 2, as ν₂s₂², with ν₂ = ν + 1, we find that the expression in (10) becomes 1/[ν₂s₂²(1 + w₂'M₂⁻¹w₂)], and the conditional pdf for z₂, given z₁, in (9) can be expressed as76

(12)  p(z₂|z₁, y) ∝ {1 + (z₂ − w₂'β̂₂)²/[ν₂s₂²(1 + w₂'M₂⁻¹w₂)]}^{−(ν₂+1)/2}.

This pdf is in the form of a univariate Student t pdf with ν₂ = ν + 1 degrees of freedom. The conditional mean of z₂, given z₁, is w₂'β̂₂, whereas its variance is ν₂s₂²(1 + w₂'M₂⁻¹w₂)/(ν₂ − 2), as stated in the text in connection with (11.66).

76 For the pdf to be proper we need ν₂ > 0. For the mean and variance to exist we need ν₂ > 1 and ν₂ > 2, respectively.
APPENDIX 2  Derivation of the Approximate Mean Given in (11.72)

In (11.72) we have the following expectation to evaluate:

     E_{z₁} s̃₂²[1 + a₂²/(s̃₂² + β̂₂'M₂β̂₂)],

where s̃₂², β̂₂, and M₂ have been defined in (11.67) to (11.69) and a₂ is the given target value for period T + 2. For the first term we have78

(1)  E s̃₂² = E[(y − Xβ̂)'(y − Xβ̂) + (z₁ − w₁'β̂)²(1 − w₁'M₂⁻¹w₁)]/(ν₂ − 2)
           = {(y − Xβ̂)'(y − Xβ̂) + s̃²[1 + w₁'(X'X)⁻¹w₁][1 − w₁'(X'X + w₁w₁')⁻¹w₁]}/(ν₂ − 2)
           = {νs² + s̃²(1 + Q₁)[1 − Q₁/(1 + Q₁)]}/(ν₂ − 2)
           = s̃²,

where ν₂ = ν + 1, s̃² = νs²/(ν − 2), νs² = (y − Xβ̂)'(y − Xβ̂), and Q₁ = w₁'(X'X)⁻¹w₁. Next we have to evaluate E a₂²s̃₂²/(s̃₂² + β̂₂'M₂β̂₂).79 Using (11.68) and (11.69), we can rewrite this last expression as

(2)  E  a₂²[s₀² + e₁²/((1 + Q₁)(ν − 1))] / {a₀ + β̂'w₁w₁'β̂ + 2w₁'β̂ e₁ + [Q₁ + 1/(ν − 1)]e₁²/(1 + Q₁)},

where e₁ = z₁ − w₁'β̂, a₀ = s₀² + β̂'X'Xβ̂, and s₀² = νs²/(ν − 1). Unfortunately, it does not appear possible to evaluate the expectation in (2) exactly.80 Therefore we approximate it by

(3)  (a₂²/a₀)[s₀² + s̃²/(ν − 1)] / {1 + [β̂'w₁w₁'β̂ + s̃²Q₁ + s̃²/(ν − 1)]/a₀}.

The expression in (3) is obtained by noting that (2) can be written in terms of e₁/a₀ and e₁²/a₀, random variables with finite means and variances, the latter being of O(T⁻²), since a₀ = s₀² + β̂'X'Xβ̂ is O(T). An approximation to the expectation in (2) is then obtained by replacing the random variables by their mean values,81 an approximation that here involves neglecting terms of O(T⁻³) and higher order of smallness. Since the denominator of (3) is one plus a term of O(T⁻¹), we can expand it to obtain82

(4)  (a₂²/a₀)[s₀² + s̃²/(ν − 1) − (s₀²/a₀)(β̂'w₁w₁'β̂ + s̃²Q₁)],

where terms of O(T⁻²) have been retained and those of higher order of smallness have been dropped. We can now add (1) and (4) to get our final result:

(5)  E ≅ s̃² + (a₂²/a₀)[s₀² + s̃²/(ν − 1) − (s₀²/a₀)(β̂'w₁w₁'β̂ + s̃²Q₁)]
       = s̃²{1 + (a₂²/a₀)[1 − ((ν − 2)/(ν − 1))(β̂'w₁w₁'β̂ + s̃²Q₁)/a₀]},

where in going from the first to the second line s₀² = [(ν − 2)/(ν − 1)]s̃², and hence s₀² + s̃²/(ν − 1) = s̃², has been used. Since a₀ = s₀² + β̂'X'Xβ̂ agrees with s̃² + β̂'X'Xβ̂ to the order retained and Q₁ = w₁'(X'X)⁻¹w₁, the expression in (5) is identical to that shown in (11.72).

78 In going from the second to the third line of (1), we use (X'X + w₁w₁')⁻¹ = (X'X)⁻¹ − (X'X)⁻¹w₁w₁'(X'X)⁻¹/[1 + w₁'(X'X)⁻¹w₁]. Thus w₁'(X'X + w₁w₁')⁻¹w₁ = Q₁ − Q₁²/(1 + Q₁) = Q₁/(1 + Q₁), where Q₁ = w₁'(X'X)⁻¹w₁.
79 Note that the denominator s̃₂² + β̂₂'M₂β̂₂ is positive with probability one.
80 This problem is similar to the problem of taking the expectation of the reciprocal of a positive constant plus a quadratic in a normal variable discussed by M. Aoki, who states that the problem cannot be solved exactly; op. cit., p. 113 ff.
81 See M. G. Kendall and A. Stuart, The Advanced Theory of Statistics, Vol. I. London: Griffin, 1958, p. 231 ff., for a discussion of this type of approximation to the mean of a function of random variables.
82 The expansion is of the form (1 + x)⁻¹ = 1 − x + ⋯, which requires 0 ≤ |x| < 1.
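For reference, the type of approximation cited from Kendall and Stuart can be written, for a scalar function g of a random variable X̃ with mean μ and variance σ², as

     E g(X̃) = g(μ) + ½ g''(μ)σ² + ⋯;

replacing the random variables in (2) by their means amounts to retaining the leading term g(μ), with the neglected terms being of the orders indicated above.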
QUESTIONS AND PROBLEMS

1. Explain why and how the expression for expected loss in equation (11.7) depends on Σ_{t=1}^{T} x_t².

2. Consider the loss function in (11.3): L(z, a) = (z − a)². With ẑ = wβ̂, the mean of the predictive pdf for z, show that E(z − a)² = const + (a − ẑ)² + O(T⁻¹), where O(T⁻¹) denotes a term of order T⁻¹.

3. In the analysis of the one-period control of the simple regression process in (11.1), with the loss function shown in (11.3), we employed a diffuse prior pdf for the parameters β and σ. Use the same loss function and an informative prior pdf for β and σ in the normal-gamma form, that is, p(β|σ)p₂(σ), where p(β|σ) is normal with prior mean β̄ and prior variance c²σ² and p₂(σ) ∝ σ^{−(ν₀+1)} exp(−ν₀s₀²/2σ²), where β̄, c, ν₀, and s₀ are assigned values that reflect prior information about β and σ, to obtain the optimal setting for the control variable w = x_{T+1}. Compare this value for w and the associated expected loss with the corresponding expressions in (11.8) and (11.10), obtained by using a diffuse prior pdf.

4. Under what circumstances will t₀² in (11.8) tend to be large so that w* will be approximately equal to a/β̂?

5. Would it be meaningful to consider the quantities s̃² and t₀² on the rhs of (11.10) as random and to evaluate the mean of (11.10) by using the pdf for s̃² and t₀²?

6. Discuss the extent to which the loss function (z − 10)², where z is the nominal annual change in U.S. aggregate income and 10 is a target value for z, is a satisfactory loss function for use in economic policy making.

7. If we have a loss function of unspecified form, say L(z, a), where a is a given quantity and z has the pdf shown in (11.6), explain how the mean of L, E_z L(z, a), assumed to exist, can be approximated by expanding L(z, a) in a Taylor series about the mean of z.

8. Use the expansion technique of Problem 7 to provide an approximation to E_z L(z, a), where L(z, a) is a "true" loss function, a is a given constant, and z has the pdf given in (11.6) of the text. Evaluate terms in the series when L(z, a) = (z − a)⁴, indicating the order in T of each term retained and of terms omitted.

9. Suppose that a "true" loss function is given by L(z, a) = (z − a)², where a is the target value and z = y_{T+1}, with predictive pdf given by (11.6). If, instead of the true loss function, we use L(z, a₀) = (z − a₀)², where a₀ ≠ a, compare the optimizing values of w = x_{T+1} and the expected losses associated with the use of L(z, a) and L(z, a₀).

10. In Problem 9 compare the consequences of decisions based on L(z, a₀) with a₀ = (1 + b)a with those based on L(z, a₀) with a₀ = (1 − b)a, where b is a given constant satisfying 0 < b < 1.

11. In connection with the loss function shown in (11.15) explore the consequences of using a loss function that misrepresents the cost of change; that is, L = (z − a)² + c₀(w − x_T)², where c₀ ≠ c.

12. Assume that the model in (11.1) has an intercept term; that is, y_t = β₀ + β₁x_t + u_t. With a diffuse prior pdf for the parameters, derive the predictive pdf for z = y_{T+1}, use it to obtain the mathematical expectation of the loss function in (11.15), and examine its three components. Then, analogous to the result in (11.17), show that the value of w = x_{T+1} which minimizes expected loss can be expressed as a weighted average of the values of w which minimize the three components of expected loss individually.

13. In the profit maximization problem analyzed in (11.18) and the following text the control variable was assumed to be price. Analyze the problem under the assumption that q_t, quantity produced, is the control variable and that the demand equation is p_t = γ₀ + γ₁q_t + ε_t, where γ₀ and γ₁ are parameters with unknown values and ε_t is a random error term. In this analysis supply needed assumptions and use (11.20) as the cost function.

14. In Problem 13 assume that there are costs associated with changing output q and that these costs can be approximated by a quadratic cost function. How is the solution to Problem 13 modified by the introduction of costs of changing q?

15. In connection with the model in (11.38), if some of the variables not under control are stochastic (e.g., one might be a measure of rainfall), explain how the problem of determining a value for w₁ in (11.39) to minimize the expectation of the loss function in (11.40) is affected. Then provide assumptions about the stochastic uncontrolled variables that permit determination of an optimizing value for w₁.

16. The expression in (11.73) provides a basis for comparing a sequential updating solution with a here and now solution. What is the order in T of the last term on the rhs of (11.73), which is the approximate reduction in loss associated with the sequential updating solution vis-à-vis the here and now solution?

17. Provide a critical appraisal of the economic considerations and implications of the loss function shown in (11.109).

CHAPTER XII

Conclusion

In the preceding chapters the Bayesian approach was applied in analyses of a broad range of models and problems. At this point it is useful to summarize some of the major attributes of the Bayesian approach to inference in econometrics.

1. As Jeffreys and others have emphasized, the Bayesian approach to inference complements very nicely the activities of researchers. A researcher is often concerned with how information in data modifies his beliefs about empirical phenomena. In the Bayesian approach to inference an investigator has operational techniques for determining how information in data modifies his beliefs; that is, initial beliefs represented by prior probabilities are combined by means of Bayes' theorem with information in data, incorporated in the likelihood function, to yield posterior probabilities relating to parameters or hypotheses. In a fundamental sense the Bayesian procedure for changing initial beliefs is a learning model of great value in accomplishing a major objective of science: learning from experience.

2. As regards estimation, we have seen that Bayes' theorem can be applied in analyses of all kinds of models to yield exact finite sample posterior pdf's for parameters. That one simple principle has such wide applicability is indeed appealing, since it obviates the need for the use of ad hoc procedures which are often needed in other systems of inference to get reasonable results. Further, in the area of point estimation, the Bayesian prescription (choice of the point estimate that minimizes expected loss) is a general operational principle in accord with the expected utility hypothesis. The theoretical appeal, generality, and practical aspects of this solution to the problem of point estimation stand in marked contrast to the many and varied rules for generating point estimates in sampling-theory approaches to inference, some of which have only a large-sample justification. That Bayesian estimators are admissible, consistent, and minimize average risk, when it exists, are additional features that commend their use in practice.

3. Bayesian methods for analyzing prediction problems are simple, operational, and generally applicable. Whatever the model, we derive the predictive pdf for future observations. This pdf enables us to make probability statements about future observations. For a given loss function involving prediction errors it is generally possible to obtain a point prediction that minimizes expected loss.

4. With respect to control and decision problems, we have seen that the Bayesian approach yields solutions that take account of uncertainty about parameters and the future values of random variables. That these difficulties are dealt with by a straightforward application of basic Bayesian principles is further testimony to their generality and fruitfulness.

5. In the Bayesian approach prior information about parameters or models can be readily and formally incorporated in analyses of estimation, prediction, control, and hypothesis-testing problems. This flexibility contrasts markedly with currently available sampling theory techniques. As seen in the analyses of the errors-in-the-variables and simultaneous equation models, prior information is required to identify parameters of interest. Sampling theorists usually introduce this information in the form of exact restrictions. In such situations Bayesians can introduce less restrictive prior information by choice of a suitable prior pdf and perform an analysis by using less restrictive prior information than the sampling theorist.1

6. The problem of nuisance parameters is solved quite straightforwardly and neatly in the Bayesian approach. Parameters not of interest to an investigator, so-called nuisance parameters, can be integrated out of a posterior pdf to obtain the marginal posterior pdf for the parameters of interest. In the sampling theory approach it is often the case that "optimal" estimators or test statistics depend on nuisance parameters whose values are unknown; for example, minimum variance linear unbiased estimators frequently depend on elements of the covariance matrix of disturbance terms. In many instances these unknown parameters are replaced by sample estimates. However, this procedure provides just an approximation to the optimal estimator and is usually justified by large-sample theory. That a Bayesian can generally integrate out nuisance parameters enables him to perform a finite sample analysis without having to rely on a large-sample justification.

7. The Bayesian approach is convenient for the analysis of effects of departures from specifying assumptions; that is, use of conditional posterior pdf's enables an investigator to determine how sensitive his inferences about a particular subset of parameters are to what is assumed about other parameters. In Chapter 4 this procedure was applied in the analysis of the regression model with autocorrelated errors, a departure from the assumption of independent error terms. In Chapter 9 the same approach was pursued to determine how inferences about the marginal propensity to consume are affected when the assumption that an investment variable is exogenous is relaxed.

8. In the Bayesian approach inferences about parameters, etc., can be made on the basis of the prior and sample information we have. There is no need to justify inference procedures in terms of their behavior in repeated, as yet unobserved, samples as is usually done in the sampling theory approach. This is not to say that properties of procedures in repeated samples are not of interest; in fact, Bayesian estimators have desirable sampling properties in that they are admissible and constructed to minimize average risk, when it exists. In a given analysis, however, the information currently available is used to make inferences in the Bayesian approach and there is no need to bring in considerations regarding performance in repeated samples.

9. Last, in the area of comparing and testing hypotheses the Bayesian approach is distinguished from sampling-theory approaches in that it associates probabilities with hypotheses and provides simple operational techniques for computing them. These posterior probabilities incorporate prior and sample information and represent degrees of belief. As seen in Chapter 10, such probabilities are useful in comparing non-nested models. Then, too, if one has explicit losses associated with actions such as accept or reject and possible states of the world, he can act to minimize expected loss in testing hypotheses.

From what has been said above and from the material presented in preceding chapters we see that the Bayesian approach to inference is a unified one that works well on a broad range of problems and models, both in small and large samples. The conclusion that emerges is that use of Bayesian methods can result in more fruitful and meaningful econometric analyses of a wide range of problems. This is not to say, however, that all problems associated with the Bayesian approach have been solved. In the preceding chapters several technical problems associated with analyses of posterior pdf's that remain to be solved were encountered. Also, more work is required to formulate, understand, and use a broader range of prior pdf's to represent our prior information, particularly in multiparameter problems. The technical aspects of developing and interpreting procedures for computing posterior odds for a broader range of models must be considered further. Additional results for and experience with adaptive control problems would be welcome. These problems, among others, deserve attention. It is hoped that the present volume will serve as a useful point of departure for research on these and other problems, research that will help to make the Bayesian approach even more useful than it is at present.

1 This point has been emphasized by Jacques Drèze.
APPENDIX A Properties of Several Important Univariate Pdf's Here we
provide properties of several important univariate pdf's which have
appeared at various points in the text. A.1 UNIVARIATE NORMAL (UN)
PDF A random variable, , is normally distributed if, and only if, its pdf has
the following form: [' ] 1 exp -(x- 0) a , -c < x < c. (A. 1) p(x]O, ) = %/2rr
This pdf has two parameters: a location parameter 0, -oo'< 0 < oo, and a
scale parameter ,, 0 < , < oo. It also has a single mode at x = 0 and is
symmetric about the point. Thus 0 is the median and the mean of the UN
pdf. x Given symmetry about x = 0, the odd-order moments about 0 are all
zero; that is, (A.2) t.r- E(. - O) �'r- = f -oo (x - O) �'- p(x[ O, ) dx = O, r =
1, 2, 3,...; for example, t -- 0 or E = 0. The even-order moments about the
mean are given by _= - o) = (x - o).p(xl o, (A.3) 2r2r - r(r + �), r = 1,2,...,
where I'(r + �) denotes the gamma function with argument r + � [see
(A.6) for a definition of this well-known and important function]. That x = 0
is the modal value of the UN pdf can be seen by inspection from (A.1). That
x = 0 is also median follows from the fact that (A.1) is a normaliied pdf
[i.e., p(xl O, o) dx = 1] and that it is symmetric about the single modal value
x = 0. 363

364 APPENDIX A Further, we note that by a change of variables z = (x -


0)/a, (A.1) can be brought into the standardized UN form:: 1 e_Zt2 --oo < z
< oo. (A.4) p(z) = x/w ' The moments of this pdf are easily obtained from
the expressions for the moments shown in (A.2) and (A.3). Proofs of
Properties of the UN Pdf First, we establish that (A.1) is a proper
normalized pdf; that is [_p(xlO, )dx = 1. Note that p(xlO, ) > 0 for all x such
that -oo < x < o. Now make the change of variable z = (x - O)/a to obtain
(A.4) and note that -oo P(xlO, ) dx = -o p(z) dz. Now, letting u = z�'/2, we
have 0 < u < oo and du = z dz, or dz = du/V-. Then � (A.5) 1 ffo =o. e- dz
= u-�e -'* du. The integral on the rhs of (A.5) is the gamma function with
argument �; that is, the gamma function, denoted by F(q), is defined as
(A.6) F(q) = uq-e -' du, 0 < q < 0% and thus the rhs of (A.5) is (1/X/7r) F(5
). Since it is shown in the calculus that F(5) = ff'S, the rhs of (A.5) is indeed
equal to one. Second, we show that the odd-order moments of x - 0 are all
zero, as shown in (A.2). This is equivalent to showing that E :r- = 0, r = 1,
2,..., since = ( - 0)/. We have E r - = z a- p(z) & + z a- p(z) &. Since z :r-,
with - < z < 0, is negative and p(-z)= p(z) from the symmetrical form of
(A.4), the first integral on the rhs can be expressed as the negative of the
second and thus their sum is zero, as was to be shown. Third, we derive the
expression for the even-order moments shown in (A.3). We shall obtain the
even-order moments of p(z) in (A.4) and from o. Note from z = (x - 0)/, dz
= dx/a and thus the factor 1/a in (A.1) does not appear in (A.4). a
Remember that in changing variables by u = �z 2 we have I_ p(z) dz = 2 '
p(u) du/ /2u, with the factor 2 appearing to take account of the area under
p(z) for both positive and negative values of z. PROPERTIES OF
SEVERAL IMPORTANT UNIVARIATE PDF'S 365 them obtain the
expression in (A.3). Let u = 5z�'; then � � z .,p(z) dz = 2 (2uyp(5-) du 2r
(A.7) = -. ur-Ae -u dt, t r vG r(r + 5), r - 1, 2,..., where P(r + 5) is the gamma
function in (A.6) with argument q = r + 5. Since z = (x -/0/, the even-order
moments, denoted to.r, in (A.3), are just a:r times the expression given in
(A.7) and the result shown in (A.3) is established. For convenience we
provide explicit expressions for the second and fourth moments shown in
(A.3)4: o. = ( - o): = r(1 + 5) = (A.8) and (A.9) m = E(- 0) * = r(2 + 5) = 3
*. Explicit expressions for higher order even moments can be obtained in a
similar fashion. As regards measures other than moments to characterize
properties. of univariate pdf's, we shall just take up measures of skewhess
and kurtosis. Measures of skewhess, that is, of departures from symmetry,
include K. Pearson's measure, shown in (A.10), and two others: mean-mode
(A. 10) Sk = ., (A.11) and (A.12) 3 2 /.1, 8 Of course, for the symmetric UN
pdf all of these measures have a zero value. With respect to kurtosis, the
following measure, the "excess," is often employed' (A. 13) yo. = rio. - 3, 4
Use is made of the following two properties of the gamma function: (i) I(q
+ 1) = qI(q), for q > 0, and (ii) I(�) = x/. In (A.9) property (i) is used twice.

366 APPENDIX A where fi2 = //2 '. The measure yo. assumes a value of
zero for the UN pdf; pdf's for which yo. = 0 are called mesokurtic, those for
which y2 > 0 are leptokurtic, and those for which 72 < 0, platykurtic. As
already pointed out, "... it was thought that leptokurtic curves were more
sharply peaked, and platykurtic curves more flat-topped, than the normal
curve. This, however, is not necessarily so and although the terms are useful
they are best regarded as describing the sign of y2 rather than the shape of
the curve. "5 A.2 THE UNIVARIATE STUDENT t (US t)Pdf A random
variable, , is distributed in the US t form if, and only if, it has the following
pdf: (A. 14) p(xlO, h,O= I'[(v + 1)/2] (�) [ h r(1/2) r(/2) 1 +-(x - 0) �-
where -oo < 0< oo, 0 < h < o%0< randwhere I' denotes the gamma function.
This pdf has three parameters, 0, h, and v. From inspection of (A. 14) it is
seen that the US t pdf has a single mode at x = 0 and is sym- metric about
the modal value x = 0. Thus x -- 0 is the median and mean (which exists for
v > lmsee below) of the US t pdf. The following expressions give the odd-
and even-order moments about the mean: (A.15) Io.r- ---- E( -- O) �'r - = f
, and 2r -- E( - O) 2r = (A.16) (x - o) �'- p(xl o, h, 0 dx = O) r= 1,2,..., v >
2r-l, j.o (x - o) 2' p(xl o, h, 0 dx r(1/2) r0/2) > 2r. As shown below, for the
existence of the (2r - 1)st moment about 0 we must have v > 2r - 1.
Similarly, for the existence of the 2rth-order moment about 0, v must satisfy
v > 2r. Given the symmetry of the US t pdf about x -- 0, all existing odd-
order moments in (A. 15) are zero. In particular, E(. - 0) = 0 so that E. = 0
which exists for v > 1. � With respect to the even-order M. G. Kendall and
A. Stuart, The Advanced Theory of Statistics, Vol. I. London: Griffin, 1958.
New York: Hafner, p. 86. However, for many distributions encountered in
practice a positive ya does mean a sharper peak with higher tails than if the
distribution were normal. o For v = 1 the US t pdf is identical to the Cauchy
pdf for which the first- and higher- order moments do not exist. (A. 17) and
PROPERTIES OF SEVERAL IMPORTANT UNIVARIATE PDF'S 367
moments in (A.16) the second- and fourth-order moments are given by 7 1
v /o. -= E(� - 0) �' = for v > 2, (A.18) /z4 -= E(: - 0) 4 = (v - 2)(v - 4) ' for
v > 4. Given that v > 2, the variance/o. exists and is seen to depend on v and
h. When v > 4, the fourth-order moment/4 exists and, as with/., depends on
just v and h. Since the US t pdf is symmetric about x = 0, the measures of
skewness, discussed in connection with the UN pdf, are all zero, provided,
of course, that moments on which they depend exist. With respect to
kurtosis, we have from (A. 17) and (A. 18) (A. 19) 4 3 =3 1 = -5' v>4. Thus,
for finite v the US t pdf is leptokurtic (yo. > 0), probably because it has
fatter tails than a UN pdf with mean 0 and variance 1/h. As v gets large, the
US t pdf assumes the shape of a UN pdfwith mean 0 and variance/zo. = 1/h.
8 We can obtain the standardized form of the US t pdf from (A.14) by the
following change of variable (^.20) t= v7 (x- 0), -o < t < o. Using (A.20),
we obtain F[(v+ 1)/2] ( (A.21) pOlO = Vl5 r-2) 1 + , --00 < t < 00, which,
with v > 0, is the proper normalized standard US t pdf. This pdf has a single
mode at t = 0 and is symmetric about this point. From (A.20) the moments
of f can be obtained from those of g - 0 when they exist. Proofs of
Properties of the US t Pdf To establish properties of the US t pdf we need
the following results from the calculus: 7 In obtaining these results,
repeated use is made of the relationship I'(q + 1) = qI'(q), q > 0, a
fundamental property of the gamma function. o For v about 30 these two
pdf's are similar; (A.17) and (A.18) show explicitly how and tz4 for the US
t are related to the corresponding moments of the limiting normal pdf; that
is, ta = 1/h and t4 = 3/h u.

(A.22) where 368 APPENDIX A 1. If f(v) I is continuous for a < v < oo and
lim_.o v'f(v) = A, a finite constant, for r > 1, then f If(v)[ dv < oo; that is,
the integral converges absolutely. 9 2. If g(v) is continuous for -oo < v < b
and lim_._.o (-v) r g(v) = c, a constant, for r > 1, then lb_ oo Ig(v)l dv < oo.
3. The relation connecting the beta function, denoted B(u, v), and the
gamma function is x� r(u) r(v) B(u,v) = P(u+ v)' 0 < u,v < o% (A.23) B(u,
v) = f2 x-X(1 - x) - dx, 0 < U,V <00. With these results stated, we first show
that for v > 0 the US t pdf in (A.21) is a proper normalized pdf. xx We note
thatp(tlv) > 0 for -oo < t < 0% and letting t' = t/x/v we can write (A.21) as -
-00 < t' < 00, (A.24) p(t'10 = a , (1 q- , with from the result given in (A.22).
Now make the change of variable z = 1/(1 + t�'), 0 < z < 1, to obtain '
(A.25) P(Z10 = B , zV"-x(1 - z)-6 dz T' 0<z<l. Noting that fp(t'[O dr' = 2 f
p(zlO dz, we have L [ (A.26) p(t'l) dt'= B , z,=-(1 - z)- & = 1, since the
integral on the rhs of (A.26) is just B(1/2, v/2), provided that > O. This
condition is required in order to have f p(t'10 dt' < [see (1) and (2)1. o Note
that I If()l dv < implies lf(v)dv <m; that is, absolute convergen implies
simple convergence. See, for example, D. V. Widder, Advanced Calculus.
New York: Prentice-Hall,. 1947, p. 271. See also p. 273 if. for proofs of (1)
and (2). xo Equation A.22 implies B(u, v) = B(v, u). xx Since (A.21) is
obtained from (A.14) by the change of variable in (A.20), showing that
(A.21) is a proper normalized pdf will imply that (A. 14) also has this
property. x= From z = 1/(1 + t'=), I&ldt'l 2t'/(1 + t'=? and t '= = (1 - z)/z;
thus [dt']&[ = z-%(1 - z)- /2. PROPERTIES OF SEVERAL IMPORTANT
UNIVARIATE PDF'S 369 The results for the odd-order moments in (A. 15)
can be established easily by considering (A.27) f-oo t�'r-X p(t'lv) dt', -oo <
t' < 0% with t' = t/5/' = x/-h (x - 0). For the integral in (A.27) to converge we
need 2r - 1 < by application of (1) and (2). Thus, if 2r - 1 < , the (2r - 1)st
odd-order moment exists and is zero by virtue of the symmetry of p(t'lO
about t' = 0. The expression for the even-order moments in (A. 16) is most
easily obtained by evaluating (A.28) f_ t '"r p(t'lO dt', -oo < t' < oo, with t'=
(x- 0). For the integral in. (A.28) to converge we need v > 2r. If this
condition is satisfied, we use the transformation z = 1/(1 + t''), or t '' = (1 -
z)]z [see (A.24)] to obtain L t'*P(t'lv) dt'= [B(,)] fo () 'zt (1- z) -6 dz (A.29)
= B , z't"-'-(1 - z) - dz B(v/2 - r, r + 1/2) P(r + 1/2) r0/2 - r) = = , >2r. B(1/2,
r(1/2) This gives the 2rth moment ofp(t'lv). From x - 0 = t' we obtain the
2rth moment ofp(x I 0, h, ), la9.r = E('- O) 'r P(r + 1/2) r(v/2- r) () r = r(1/2)
p(./2) , v > 2r, which is just (A. 16). A.3 THE GAMMA (G) AND X �'
Pdf's As the name implies, the G pdf is closely linked to the gamma
function. A random variable, ., is distributed according to the G distribution
if, and only if, its pdf is given by X a- 1 (A.30) p(xly , a) = p(a)y e -x", 0 <
x < oo, where a and y are strictly positive parameters; that is, a, y > 0. From
the form of (A.30) it is seen that y is a scale parameter. When a > 1, the pdf
has

370 n'PENDIX n a single mode 3 at x = y(a - 1). For small values of a the
pdf has a long tail to the right. As grows in size for any given value of y, the
pdf becomes more symmetric and approaches a normal form. The G pdf can
be brought into a standardized form by the change of variable z = x/y,
which results in 1 (A.31) p(zl) = p()z-e -, 0 < z < . From the definition of the
gamma function it is obvious that (A.31) is a proper normalized pdf and
that moments of all orders exist. The moments about zero, denoted by /, are
given by (A.32) r' = f z p(zla) & = r(r + a) P(a) ' r = 1,2, .... From (A.32) we
have for the first four moments about zero x (A.33) '=; a'=(1 +); o'=(2+a)(1
+); and ' = (3 + )(2 + )(1 + ). From these results it is seen that is the mean of
the G pdf and, surprisingly, also its variance. x* Further, for the third and
fourth moments about the mean we have = 2 and = 3(2 + ). Collecting these
results, we have (A.34) x' =, =, =2 and = 3(2+). Given that 0 < < , the
skewhess is always positive. Since the mode is located at z = - 1 for 1, we
have for Pearson's measure of skewness Sk = (mean-mode)/ = 1/. Clearly,
as grows in size, this measure of skewhess approaches zero. As regards
kurtosis, ya = 4/a a - 3 = 6/a, which also approaches zero as grows large.
That Sk 0 and ya 0 as is, of course, connoted with the fact that the G pdf
assumes a normal form as .x The X a pdf is a special case of the G pdf
(A.30) in which a = v/2 and y = 2; that is, the X pdf has the following
formX7: xV12 - i e- x12 (A.35) P(xlv) -- r(v/2)' 0 < x < xa For 0 < < 1 the
pdf has no mode. x4 In obtaining the expressions for the moments below,
we use the relation r(1 + q) = q I'(q) repeatedly. x5 The variance tzo. is in
general related to moments about zero by the following relation- ship: 2 =
to.' - t '2. For the G pdf 2 = (1 + ) - 2 = . x6 Moments, etc., for the
unstandardized G pdf in (A.30) are readily obtained from those for the
standardized pdf (A.31) by taking note of z = x/y. x7 Often (A.35) is written
with x = X 2. PROPERTIES OF SEVERAL IMPORTANT UNIVARIATE
PDF'S 371 where 0 < v. The parameter v is usually referred to as the
"number of degrees of freedom." Since (A.35) is a special case of (A.30),
the standardized form is given by (A.31) with replaced by v/2. Also, the
moments of the standardized form are given by (A.33) and (A.34), again
with = v/2. Since the standardized variable z is related to the unstandardized
variable x by z = x/y = x/2, moments of the unstandardized X a pdf in
(A.35) can be obtained easily from moments of the standardized X �' pdf.
For the reader's convenience we present the moments associated with
(A.35): (A.36) /x'= v; /a = 2v; /3 = 8v; and A very important property of the
X a pdf is that any sum of squared inde- pendent, standardized normal
random variables has a pdf in the X �' form; that is, if = g' + ga a +... + gff,
where the g's are independent, standard- ized normal random variables, then
� has a pdf in the form of (A.35) with V -- .19 A.4 THE INVERTED
GAMMA (IG) Pdf's The IG pdf is obtained from the G pdf in (A.30) by
letting y equal the positive square root of 1 Ix; that is, y = [ '[ and thus y2 =
1 Ix. With this change of variable the IG pdf is 2� 2 (A.37a) P(YlY, ) =
p(a)yy2+ e-, 0 < y < where y, a > 0. Since this pdf is encountered frequently
in connection with prior and posterior pdf's for a standard deviation, we
rewrite (A.37a), letting = y, = v/2, and y = 2/rs ' to obtain (A.37b) p(lv, s) =
2 (___) vo. 1 r(7/2) ,+ e -'�'�, 0 < < where v, s > 0. The pdf in (A.37b) has
a single mode at the following value �-x of (A.38) *moo = S[v--'-'-} '
Clearly, as v gets large, *moo--> s. o These are obtained from (A.34) with =
v/2 and z = x/2, where x is the X �' variable in (A.35), z is the standardized
variable in (A.31), and is the parameter in (A.31). x0 See M. G. Kendall
and A. Stuart, op. cit., pp. 246-247, for a proof of this result. 2o. Since
(A.37a) is obtained from a proper normalized pdf by a simple one-to-one
differentiable change of variable, it is a proper normalized pdf. 2 This is
easily established by taking the log of both sides of (A.37b) and then
finding the value of a for which log p(alv, s) achieves its largest value.

372 a, PENDIX A The moments of (A.37b), when they exist, are obtained
by evaluating the following integral: (A.39) btr' = c f; er-(+ X)e-S2m'2 de,
where 2 On letting y = vs'/2e ', (A.39) can be expressed as (A.40) [v$9.\ (r-
v)l v. f; i r' = �C..-. I y (v-r)19'- Xe- dy. The integral in (A.40) is the
gamma function. For it to converge we must have (A.41) v - r > 0, which is
the condition for existence of a moment of order r. Inserting the definition
of c in (A.40), we have F[(v - r)/2] (v_) rt' (A.42) //= F(v/2) , v > r, a
convenient expression for the moments about zero. The first four moments
are F[(v - 1)/2] [v\ (A.43) m '= [5] s, > 1, 1)() us ' (A.44) r(v/2) s�'= 2' > 2,
- 3)/2 ! % (A.45) I'(v/2) s3' v > 3, and I'(v/2 - 2) �'s4 = 4) s4' v > 4. (A.46)
/4'= ((v]) (v- 2)(v- It is seen from (A.43) that the mean/x' is intimately
related to the parameter s. As v gets large, x' s, which is also approximately
the modal value for large v (see above). In the calculus it is shown that with
0 < q < , as q , qO- (q + a)/(q + b) I for a and b finite. PROPERTIES OF
SEVERAL IMPORTANT UNIVARIATE PDF'S As regards moments about
the mean, '3 we have 373 t 2 = t2 -- /1 2, (A.47) vs ' - v - 2 /x'" v>2, i (A.48)
tza =/zo' - /x/o. -/x'o, v > 3, and (A.49) tz4 =/z' - 4tz'/o - 6/q'atza -/x', v > 4.
These formulas are useful when we have to evaluate higher order moments
of the IG pdf. The Pearson measure of skewhess for the IG pdf is given by
mean - mode Sk= (A.50) - L' - > 2. Since this measure is generally positive,
the IG pdf is skewed to the right. Clearly, as v gets large Sk O. For small to
moderate v the IG pdf has a rather long tail to the right? A.5 THE BETA
(B) Pdf A random variable, �, is said to be distributed according to beta
distri- bution if, and only if, its pdf has the following form: (A.51) p(xla, b,
c) = cB(a, b) 1 - , 0 < x < c, where a, b, c > 0 and B(a, b) denotes the beta
function, shown in (A.23), with arguments a and b. It is seen that the range
of the B pdf is 0 to c. By a change of variable, z = x/c, we can obtain the
standardized B pdf, 1 z_X( 1 _ z),_ x (A.52) p(zla, b) = B(a, b--} , 0 < z < 1,
which has a range zero to one. Some properties of (A.52) follow. That
(A.52) is a proper normalized pdf is established by observing that the pdf is
non-negative in 0 < z < 1 and that B(a, b) = o x z-X(1 - z) -x dz, aa See M.
G. Kendall and A. Stuart, op. cit., p. 56, for formulas connecting moments
about zero with moments about the mean. a Since it is rather
straightforward to obtain the pdf for a n, n = 2, 3,..., from that for in (A.37b)
and to establish its properties, we do not provide these results.

374 APPENDIX A which converges for all a, b p 0. If a p 2, (A.52) is


tangential to the abscissa at z -- 0, and if b p 2 it is tangential at z = 1. For a,
b p 1,�'6 the mode is a-1 (A.53) Z=o, = a+b-2' The first and higher
moments about zero for the standardized B pdf in (A.52) are ' = z+'-x(1 - z)
t'-x dz t B(a, b) B(r+a,b) F(r+a) F(a+b) (^.54) -- -- B(a, b) r(r + a + b) r'(a)
a(a + 1)(a + 2)...(a + r- 1) - (a+b)(a+b+ 1)...(a+b+r- 1)' r-- 1,2,..., where
(A.22) and the recurrence relation for the gamma function, I'(q + 1) - q I'(q),
have been employed. Thus the first three moments from (A.54) are t a
(A.55) t = ----3' a+ , a(a + 1) (A.56) t. -- (a + b)(a + b + 1)' , a(a + 1)(a + 2)
(A.57) / = (a + b)(a + b + 1)(a+b+25' and so forth. The mean and higher
moments are seen to depend simply on the parameters a and b. Further, the
variance is given by ab (A.58) ' = (a + b)'(a + b + 1)' As regards skewness,
Pearson's measure for a, b p 1 is (A.59) a/(a + b) - (a - 1)/(a+b-2) (b - a)/(a +
b)(a + b - 2). Thus, if b = a, $k = 0 and the pdf is symmetric. If b p a, there
is positive skewness, whereas if b ( a there is negative skewness? a5 For 0 <
a, b < 1 the pdf approaches oo as z -+ 0 or 1. ae Since the variable of the
unstandardized B pdf in (A.51) is related to that of the standardized B pdf in
(A.52) by x -- cz, the moments, etc., associated with (A.51) are easily
obtained. PROPERTIES OF SEVERAL IMPORTANT UNIVARIATE
PDF'S 375 A useful and important result connecting the standardized
gamma (G) and beta (B) pdf's follows. Let x and g. be two independent
random variables, each with a standardized G pdf with parameters and .,
respectively [see (A.31)]. Then the random variable = g/(g + .) has a
standardized B pdf with parameters ax and .; that is, the pdf for g = g/(g + .)
is z-(1 _ z)%-/B(ax, .)..7 This result is often used when x and g. are
independent sums of squares of independent standardized normal random
variables. Further, since the B pdf can be transformed to the F pdf, as
shown below, we can state on this basis that - /( + g.) has a pdf trans-
formable to an F pdf. A pdf closely associated with the B pdf is the beta
prime or inverted beta (IB) pdf? Its standardized form is obtained from the
standardized B pdf in (A.52) by letting z = 1/(1 .+ u) to obtain the IB pdf: 1
u b- l (A.60) p(u]a,b) = B(a,b)(1 + u) *+' 0 < u < 0% with a, b > 0. The
moments of this pdf are given by (A.61) , 1 f; u r+t-' tZr = B(a, b) (1 q- u) +
lu B(b + r, a- r) = B(a, b) ' r < a, a result that is obtained by a change of
variable u = 1/(1 + z) in (A.61) and noting that the result is in the form of a
standardized unnormalized B pdf. Then, from (A.61), (A.62) /&, = b a- 1' a
> 1, , b(b + 1) (A.63) /' = (a - 1)(a - 2)' a > 2, and so on. The variance is
b(a+ b- 1) (A.64) /' = (a - 1)'(a - 2)' a > 2. ' The joint pdf for gx and is the
product of their individual pdf's, since they are assumed to be independent;
that is, p(zx, z2lex, .) dzx dz = [I'(ex) I'()] - x zx0x - x x z%-Xe-%+)dzx dz..
Now change variables to v = zx + z and z = zx/(zx + z), with dzx dz -- v dz
dr, and integrate with respect to v from zero to infinity to obtain the result
above. 8 See, for example, J. F. Kenney and E. S. Keeping, Mathematics of
Statistics: Part Two (2nd ed.). Princeton, N.J.: Van Nostrand, 1951, pp. 95-
96, and H. Raiffa and R. Schlaifer, Applied Statistical Decision Theory,
Boston: Graduate School of Business Administration, Harvard University,
1961, pp. 220-221.

376 APPENDIX A The IB pdf has a single mode if b > 1 at b-1 (A.65)
Umoa = a + 1' Then Pearson's measure of skewness for the IB pdf is (A.66)
Sk= b/(a- 1)- (b- 1)/(a + 1) 2b+a-l[ a-2 j] = a+l 5(a+b-1 ' b>l and a>2, which
is positive and shows that the IB pdf usually has a long tail to the right.
Last, we can obtain an important alternative form of (A.60) by letting u =
y/c, with 0 < c, to yield 1 (y/cy '-' 0 < y < co, (A.67) P(Yla?b'c) = cB(a,b)(1
+ y/c) '+" where a, b, c > 0. Since y -- cu and we have already found the
moments associated with (A.60), the moments for (A.67) are directly
available from (A.61) to (A.64), that is the rth moment about zero is c%.'
with/r' given in (A.61). It will be seen in the next section that the F and US t
pdf's are special cases of (A.67). A.6 THE FISHER-SNEDECOR F PDF A
random variable, , is said to have an F distribution if, and only if, it has a
pdf in the following form' (A.68) p(xJux, vO = B ,' V'o.! (1 + (vdv.)x) (" +
where vx, v. > 0. It is seen that (A.68) is a special case of the IB pdf in
(A.67), where a = d2, b = vx/2, and c -- ./vx. The parameters vx and . are
usually referred to as degrees of freedom and (A.68) is called the F pdf with
x and . degrees of freedom. If vx/2 > 1, the F pdf has a single mode at
(A.69) vo.v/2- 1 Xmoa ,, vd2 + 1 The moments of the F pdf can, of course,
be obtained directly from those associated with the IB pdf shown in (A.62)
t.o (A.64). For easy reference we PROPERTIES OF SEVERAL
IMPORTANT UNIVARIATE PDF'S 377 list the moments of the F pdf: ,
(A.70) I = vx %/2 - 1" > 1, (A.71) /" = (v,./2- 1)(v,./2- 2)' > 2, and so on.
The variance of the F pdf is (Lt ' (d2)[( + 0/2 - 11 v. (A.72) = w/ (d2-
1)2(,,./2- )' 5 > 2. We.now review relations of the F pdf to several other
well-known pdf's. 1. If in the F pdf in (A.68) rx = 1 and we let t 2 -- x, the F
pdf is trans- formed to a standardized US t pdf with . degrees of freedom. 2.
If x and . are independent random variables with X 2 pdf's that have v and
% degrees of freedom, respectively, then = (x/vO/(z'a/vo.) has an F pdf with
vx and vo. degrees of freedom, provided vx, vo. > 0. 3. If � has the F pdf
in (A.68), then, when % --> co, the random variable vx� will have a Xo.
pdf with vx degrees of freedom. 4. If � has the F pdf in (A.68), then, when
vx --> co, the random variable v./ will have a X �' pdf with va degrees of
freedom. 5. If � has the F pdf in (A.68), then, when % --> co with vx = 1,
the random variable / will have a standardized UN pdf. 6. If and ao. are
independent random variables with IG pdf's in the form of (A.37b) and
parameters vx, sx, and vo., so., respectively, the random variable � =
(ax�Jx�')/(ao.�'/so.�) will have an F pdf with va and vx degrees of
freedom. Proposition (1) is established by making the change of variable t
�. = x in (A.68) and noting that with vx = 1 the resulting pdf is precisely
the standard- ized US t pdf with vo. degrees of freedom. Proposition (2) is
established by noting that the joint pdf for x and 3o. is 'p(zl, z2lvl, v2) -----
kzm-Xzata-Xe-% +a )/2, 0 < zx, z2 < co, with 1/k = 2%+a)t�'P(vx/2)P(vd2
). Now let v = zx/zo. and y = (zx + zO/2, which implies that zx = 2vy/(v +
1) and za = 2y/(v + 1). The Jacobian of this transformation is 4y/(v + 1) 2
and thus the pdf for v and y is 9vx/2 - 1 y (vx +va)12-le-lt 0 < v, y, <co. p(v,
ylvx, %) = 2% +va)/2k (1 + v)% +va)/2 , On integrating with respect to y,
the result is p(vl, = + rod2, vd2) (1 + v)% +'-)'" 0 < v < co. Finally, on
letting x = (vdvx)v, we obtain the F pdf in (A.68).

378 APPENDIX To prove Proposition (3) let z = vxx in (A.68) to obtain


r[(v + v.)/2] 1 zW '- p(zl. vo.) = r(d2)r(./2) .v - (1 + z/v.)( * and as vo. --->
oo with vx fixed the limit is -! ZVl/2-1�-z/2 0 < Z < 00, which is a x �'
pdf with vx degrees of freedom. Propositions (4) and (5) can be established
by using similar methods. Proposition (6) is established by writing the joint
pdf for x and o.' t vl$12 v2522 p(, .]vx, v., &, s.) = k(l + . + )- exp 2. 2--]
with = r(d2Tr(d2[l TI 0 2 < Now make the following change of variables A
= ,x'/*o. �' and 4 = crx to obtain .p(, [vx, v2, $x, $2)= k v2t2-x ( 12 ) 2 +
+x exp -- (vs + ,%s,ff) , 0 < &4', <m. On integrating with respect to , the
result is If we change variables by x = s2�')/s 2, the pdf for x will be
precisely the F pdf in (A.68). APPENDIX B Properties of Some
Multivariate Pdf's B.1 THE MULTIVARIATE NORMAL (MN) Pdf The
elements of a random vector i'= ('1, '2,''', 'ra) are said to be jointly normally
distributed if, and only if, they have a pdf in the following form: (B.1) p(xl0
, Y.) = IY'I (2rr)mta exp{- � (x - 0)' y.-x (x - 0)}, -oo <xt< o% i-- 1,2,...,m,
where x' = (x, x2,..., Xm), 0' = (0, 02,..., Ore), with -oo < 0, < o% i = 1, 2,...,
m, and Y. is an rn x rn positive definite symmetric (PDS) matrix. For
convenience the MN pdf is often written (B.2) p(xl0, V) = where IVl'
(2rr)m/2 exp{- � (x - 0)'V (x -0 )}, --oO < Xt < 0% i = 1, 2,..., m, (B.3) V -
- Z -x is PDS. That (B.2) is a proper normalized pdf can be shown by
observing that the pdf is positive over the region for which it is defined.
Also, (B.4) f_.,cxlo, 1, where dx = dx dxo.... dxm; (B.4) can be shown
easily by employing the following change of variables: (B.5) x - 0 = Cz
with C an rn x rn nonsingular symmetric matrix such that C'VC = I. Since V
is PDS, there exists an orthogonal matrix, say P, such that P' VP = D, where
D is an m x m diagonal matrix with the positive roots of V on the diagonal.
Then C = PD- is a nonsingular symmetric matrix such that C' VC = I. 379

380 APPENDIX B The Jacobian of the transformation in (B.5) is l C[ and


thus (B.2) can be expressed (B.6) p(z) = ICl 1171 exp{- � z'C'VCz} 1
exp{- � z'z), -oo < z < oo, i = 1, 2,... m. (2r)m. ' The pdf in (B.6) is in the
form of a product of m standardized, proper normalized UN pdf's. Thus
(B.6), integrated with respect to z, -oo ( z ( oo, i = 1, 2,..., m, is equal to 1,
which establishes (B.4). The pdf in (B.6) is usually referred to as a
standardized multivariate normal (SMN) pdf. a If the elements of an m x 1
random vector have a SMN pdf, it is clear from the form of (B.6) that they
are independently distributed with (B.7) E. = 0 and E..' = I; that is, each ,
has a zero mean and unit variance and all covariances, E,gj, i,j = 1, 2,..., rn,
i : j, are zero. From X - 0 = C. and the results in (B.7) we have E( - o) = CE.
= 0 and (B.9) �( - 0)( - 0)'= = CC'= V - =Z. 4 The result in (B.8) gives E =
0, as the mean vector of the MN pdf, whereas (B.9) yields V- x (or Z) as its
covariance matrix. We now derive the conditional pdf for 'x, given ., where
'' = (x'.') has a MN pdf given by (B.2). Partitioning x - 0 and V to
correspond to the. partitioning of ', that is x2 02 \172 17221 ' Note that I cI I
Vl = I C'VCl = 1. a Often the term "spherical" is used rather than
"standardized" to emphasize the fact that the contours of (B.6) are spherical
(or, with rn = 2, circular). 4 From C' VC -- I we have V -- (C')- xC- x and
thus V- x = CC'. PROPERTIES OF SOME MULTIVARIATE ?DF'S 381 we
can express the quadratic form in the exponent of (B.2) as (x - 0)'17(x - 0) =
(Xl - + (x2 (B.10) --- Ix I x [xx + (x2 ox)'17(xl - 00 + 2(x - 00'172(x2 - 02) -
02)'1722(x2 - 02) Ol + Vll-lV12(x2 - o2)1'Vll - ol + v11-1 v12(x2 - o2)1 -
o2)'(v22 - v2x Vll -lv12)(x2 - 02) Further, we have by completing the
square on xx. (B. 11) [V, = I Vx V' I 172x 1722 = 117111 11722 -
17211711-117121. On substituting from (B.10) and (B.11) into (B.2), we
can write (B.2) as the product of two factors p(x, x40, 17)= (.! \(2rr);i7. exp
{-�[xx - 0x + Vn-xVxa(x. - 0a)]'Vn x [x - ol + vll-lv2(x2 - x {11722-
17211711-117121 � x exp [-�(x2 - o2)'(17.2 - 17.11711-11712)(x2 -
02)]}, where ml and m2 are the number of elements in x and x2,
respectively, with m + m2 = m. Both factors in (B. 12) are in the form of
normalized MN pdf's. The first factor on the rhs of (B.12) is the conditional
pdf for xl, given xa, since in general we can write p(xx, xa[0, V) = p(xxlx.,
0, V)p(x.l 0, V). It is a MN pdf with mean vector (B.13a) zr(112) = Ol -
1711-11712(x2 - 02) and covariance matrix (B. 14a) CoV(alla25 = 1711-1.
Since V- Y'-L we can express (B.13) and (B.14) in terms of submatrices of
Z 5' (B.13b) zr(11:2) = Ol + z12z22-1(x2 - 02) That is, we partition Y.- x to
correspond to the partitioning of V, (Vll Vlq /Zll z12 v2 v22! = \z21 z221'
Then Vxx - x = (52xx)- x and Vxa = Z xa. If we partition Z correspondingly
as

382 A??�NDIX B and (B. 14b) Cov(l[2) = ]11 - Y"I2Y'22 - lye21' The
marginal pdf for xo. can be obtained from (B. 12) by integrating with
respect to the elements of x. Since x appears just in the first factor on the rhs
of (B.12), and this is in the form of a normalized MN pdf, integrating the
first factor with respect to the elements of xx yields 1. Thus the second
factor on the rhs of (B. 12) is the marginal pdf for xo.. It is a MN pdf with
mean vector 0o. and covariance matrix 6 (B.15) Cov(a.) = - By similar
operations the marginal pdf for x is found to be MN, with mean and
covariance matrix (Vx - Last, consider linear combinations of the elements
of ; that is (B.16) $x = Lx, where ' is an rn x 1 vector of normal random
variables with a MN pdf, as shown in (B.2), and Lx is a n x m matrix of
given quantities with n < rn of rank n. Then $x is an n x 1 vector whose
elements are linear combinations of the elements of . If n < m, write I.
(u.17) = = \wo.! \LJ with the matrix L an rn x m nonsingular matrix. Then E
= LE = L0, and we can write (B.18) - L0 = L(' - 0). Then, on noting that the
Jacobian of the transformation from ' to in (B.18) is Iz-l, we can obtain the
pdf for ' (B.19) p(w10, Y, L) = IZ'Zl - (2n)m/2 exp [-�(w - LO)'L-X'Z-XL-
X(w - L0)], we have Z x2 = -5.115.1222-1 and (Zxx) -x = 51xx -zaaa-xaz.
Thus Vxx -x = 5'.11 -- ,12Z22-1E21 and Vxx- x Vx2 = -5'.12'22-1. Since V
= Z-x, we have and thus Z.xVx2 4-22[/22 = I and Z2xVxx + E22Vl = 0.
The second of these relations yields Zx = -Z2Vo. x Vn -x which, when
substituted in the first, yields Z.o.: (V.2 - V2xVn-xV12) -x PROPERTIES
OF SOME MULTIVARIATE PDF'S 383 which is a MN pdf with mean
vector L0 and covariance matrix LZL'. Thus $ = L has a MN pdf. If we
partition w, as shown in (B.17), the marginal pdf for wx will be multivariate
normal with mean vector Lx0 and covariance matrix LxZLx', an application
of the general result associated with (B.12) which gives the marginal pdf for
a MN pdf. B.2 THE MULTIVARIATE STUDENT (MS) t Pdf A random
vector ': (x, o.,..., m) has elements distributed according to the MS t
distribution if, and only if, they have the following pdf: (B.20) p(x10, V, ,
rn) = + m)/2]l + (x - - - < x, < o,i= 1,2,...,m, where v > 0, V is an rn -oo < 0,
< oo, i= 1,2, is PD, the MS t pdf has symmetric about x = 0, v > 1, as shown
below. x rn PDS matrix, and 0' = (0x, 0o.,..., 0), with ..., m. Since the
quadratic form (x - 0)'V(x - 0) a single mode at x = 0. Further, since the pdf
is 0 is the mean of the MS t pdf which exists for The symmetry about 0
implies that odd-order moments about 0, when they exist, will all be zero.
The matrix of second- order moments about the mean exists for v > 2 and is
given by V- [v/(v - 2)]. To establish that (B.20) is a proper normalized pdf
we note that it is positive in the region for which it is defined. If we let
(B.21) x - 0: Cz, where C is an rn x rn nonsingular matrix such that C' VC =
Ira, the pdf for z, an rn x 1 vector, is * (B.22) p(zl,m) = v�'I'[( + m)/2] + -o
< z < oo, i- 1, 2,..., m, the standardized form of the MS t pdf. We show that
(B.22) is a normalized pdf 6 by making the following change of variables
from zx, zo.,..., Zm to 7 Note from C' VC: Ira, I VI A = I Cl - . The
Jacobian associated with the transforma- tion in (B.21) is [C[ and thus
[VI� times the JacobJan is equal to 1. Another way of showing this is to
observe that p(zlv ) can be written as p(zlv ) = p(z,z(,-x), v) p(Jrn_llZ(m_2),
v)... p(Jllv), where z&_y = (zx, z.,. . ., Zm-), j = 1, 2, ..., m - 1. Each factor
is in the form of a US t pdf and can be integrated by using the results of
Appendix A.

384 AVVENDIX B U, a 1, a9., . .., am-1, given by Zz = /,/S cosa l


cosao.'''cOSam-i z,. = uS cos ax cos ao. � � � cos am-o. sin am-x (B.23)
COS a 1 COS a9. � � � COS a m-I sin am-t + l Zm = US sin ax, where 0
< u < co, -r r/2 < at<rr/2 for i= 1,2,...,rn-2, and 0< am-x < 2rr. From
trigonometry (B.23) yields 2 _ Z2 2 +... - Zra 2 = ZtZ. (B.24) u = z Also,
the Jacobian of the transformation in (B.23) is � �u m'-x cos m-2 a x cos
m -a a.- � � cos am-.. Thus the pdf in (B.22) becomes le,.r[(v + m)/21 u
m"- (B.25) .v(u, a, a,.,..., am-l, m) = 2 rrm'"P(v/2) ( + u) <'+m>'" X COS m-
2 al COS m-3 a9....COSa m_2. Now (B.25) can be integrated with respect
to u and the ds by using f; umt,.- 1 (i) P(v/2)P(m/2) O, (B.26) (v + u) ('+m)''
du = v-- B , = v 't*' P[(v + m)/2] ' v > from (A.60), (B.27) n12 COS m-y-1 -
r[(m - j)/21 r[(m- j- 1)/2 + 1i' j = 1, 2,...,rn - 2, and (8.28) " dam- = 2ft. On
substituting from (B.26), (B.27), and (B.28) into (B.29) . . . .f p(u, a,, a=, . .
. , am- lv, m) du dax . . . dam- x, with the integrand given by (B.25), the
integral in (B.29) has a value of 1. Thus (B.25) and (B.20) are normalized
pdf's. Note from (B.25) and (B.26) that the normalized marginal pdf for u =
Z'Z is vV12 1,lm12 - 1 (B.30) p(ulv, m) = B(v/2, m/2)(v + u) ('+m)t" 0 < u
< co. 9 See, for example, M. G. Kendall and A. S. Stuart, The Advanced
Theory of Statistics, Vol. I. London: Griffin, 1958, p. 247. PROPERTIES
OF SOME MULTIVARIATE PDF'S 385 Letting u -- my, we have (B.31)
p(ylv, m)= B , (1 +(mir)y) <'+m)/"' O<y<co, which is an F pdf with rn and v
degrees of freedom [see (A.68)]. Thus the random variable y = 17/m =
.'Um, with . having a pdf given by the standardized MS t pdf in (B.22), has
an F pdf with rn and v degrees of freedom. Further, by writing (B.21) to
connect random variables we have (B.32) '. (x - 0)'(c-)'c-(x - 0) rn rn ( - o)'/(
- 0) m and thus the quadratic form (X - 0)'V(: - O)/m has an F pdf with m
and v degrees of freedom when 'i's pdf is the MS t pdf shown in (B.20). To
obtain expressions for the first and second moments associated with (B.20)
we determine moments associated with (B.22) and then use (8.21) to find
moments for (B.20). As regards existence of moments, consider the rth
moment about zero. To evaluate this moment we have to consider the
integral oo 21r (B.33) (a + zx�') (+m)t�" rn = 1, 2,..., with a -- v + 1"= o.
z 2. Using the tests for convergence described in Appendix A, we see that
(B.33) will converge for any rn if r + 1 < v + 1 or v > r. Thus for the first
moment to exist we need v > 1, for the second, v > 2, and so on. From the
symmetry of (B.22) we have (B.34) E. = 0, v > 1, and fromX-
0=C.,E=0forv> 1. To evaluate the second moments associated with (B.22)
consider (B.33) with r = . By letting u = zxO'/a (B.33) can be brought into
the following form: (B.35) 1 y0 uS 1 (+m-3 i) a (v+m-a)la (1 + 1,l)
(v+m)12 du = a(v+m_3)12 B v 2 ' ' Now (B.35) has to be integrated with
respect to z,., za,..., zm, which appear in the quantity a = v + __0., z,"; that
is, using (B.35), (B.36) Egx2= kB( v + m - 3 i)f-o f7 dza...dzm 2 ' , '" oo (v
+ Y?=2 z') ('+m-ae

386 APPENDIX B with k = vv/O'F[(v + m)1211'"lO'F(v/2), the


normalizing constant of (B.22). The integrand of (B.36) can be brought into
the form of a standardized MS t pdf and integrated o to yield (B.37) Ea = v
- 2' Since this argument can be applied separately to o., a,. �., ,, and Ej -- 0
for i % j, the covariance matrix for is (B.38) E' =----Z--v I,, v > 2. v--2
From i - 0 = CS the covariance matrix for i - 0 is tr(a - 0)( - 07= cEc' v v
V_n v > 2. (B.39) v'---- CC' = , - - v - 2 We now consider the marginal and
conditional pdf's associated with the MS t pdf in (B.20). To accomplish this
conveniently we let TM H V/v and rewrite (B.20) as (B.40) p(x10, H, v) =
r[(v + m)/21 ,,"""r(,,/2) IHlY'[1 + (x - 0)'H(x - 0)1 -<"'+'>'0', v>0. Now
partition (x- 0)'= [(x- 00'(x,.- 00'], where x- 0 is an rn x 1 vector and xo. -
0o. is an mo. x 1 vector with rn + mo. = m. Also let {i H= \io. io.d' xo The
integrand of (B.36) can be written as v' + - z, �' , -oo < z < , i = 2, 3,..., m,
with v' = v - 2. Then letting w = '' z, (B.36) becomes kB( v + rn -- 3 3
{v'Vlo. - dwo.. . . dwrn 2 ']\71 J" 'j (v' + Ypo. wF') = kB(V + m- 3,3'{v__"
vt�'-' =r-"�'I'("'/2) ' 2]\v] (v')q�'P[(v' + rn - 1)/21 = v' --oo < w{ < oo, i=
2, 3,...,m. From C'VC = I, V = (C')-'C -' or V - = CC'. With v > 0, H is PDS.
PROPERTIES OF SOME MULTIVARIATE PDF'S where the partitioning
of H corresponds to that of x - 0. Then x + (x - 0)'H(x - 0) = (B.41) 387 p(x,
xo.10, H, ) = (B.42) where k P[(v + mo.)/2] P[(m + 0/21 = ,,-uo.r(/2) and
ko. = - rrZo.P[(v + mo.)J2] The second line of (B.42) gives explicit
expressions for the marginal and conditional pdf's; that is (B.43) with p(xl,
xo.10, H, .) = p(xo.10, H, v) p(xlx. , 0, H, v), (B.44) and a p(xo.10, H, 0
=/qlH,.o. - HO.H-HO.I (1 + Qo.)(=+')/" ' ko.(1 + p(xxlx., 0, H, v)= [1 +
Qx.d(1 + with Q.o. and Qo. defined in connection with (B.41); (B.44) and
(B.45) show that in general the marginal and conditional pdf's associated
with a MS t pdf have the forms of MS t pdf's. 4 From (B.44) we have for
the mean and covariance matrix of the marginal pdf for x2 (B.46) Eio. =
0,,., v > 1, la The expression for the conditional MS t pdf, given in H. Raiffa
and R. Schlaifer, Applied Statistical Decision Theory, Boston: Graduate
School of Business Administra- tion, Harvard University, 1961, p. 258 is
erroneous. t If xo. in (B.44) and x in (B.45) are scalars, these pdf's are US t
pdf's. where Q.. and Qo. denote the first and second quadratic forms,
respectively, on the rhs of the second line of (B.41). Then, noting \Hi =
IHnI JHo.2 - Ho.Hn-Hio.I, we can express (B.40) as

388 APPENDIX B and (B.47) Var(.) = v>2, since H -- V/v. For the
conditional pdf in (B.45) we have E(xl,.) = 0x - Hxx-XHx,.(x. - (B.48) = 0x
- Vxx -x Vx.(x. - 0.), m2 + v > 1, and (B.49) Var(l=) = 1 (1 + Q.)H - m.+v-2
_ v (1 + Q.)Vxx -, m. + v > 2, -m.+v-2 where Q. = (x. - 0.)'(H.. - H.xHxx-
XHx.)(x. - 0.). Next consider linear combinations of the elements of a
random vector i with a MS t pdf, as shown in (B.40): (B.50) S = Li, where
L is an m x m nonsingular matrix. Then ES = L0. The Jacobian of the
transformation in (B.50) is IL-l and thus from (B.40) the pdf for S is P[(v +
m)/2] (B.51) p(w10, F, v) = IFlV2[1 + (w - mO)'F(w- mo)] rrm.i,(/2) '
where F = L-X'HL-x. Thus the elements of S have a MS t pdf with mean L0
and covariance matrix equal to (u - 2)-XLH-XL' = [/( - 2)]LV-XL '. The
marginal and conditional pdf's associated with (B.51) are easily obtained by
using (B.44) and (B.45) and, of course, will be in the MS t form. A single
linear combination of the elements of , say S, the first element of S, will
have a marginal US t pdf. Last, as is apparent from many examples cited,
the MS t is related to the MN and IG pdf's. Consider the joint pdf (B.52)
p(x, d0, V, 0 = g(xl 0, ', v) where g(x10, ,, V) denotes an m-dimensional
MN pdf with mean 0 and covariance matrix V-Zo and h(d0 denotes an IG
pdf with parameters v > 0 and s = 1 [see (A.37b)]; that is, (B.53) p(x, crl0,
V, v) k IVIV ( 1 [v + (x - 0)'V(x - 0)]}, = crm++ iexp PROPERTIES OF
SOME MULTIVARIATE PDF'S 389 where k is the normalizing constant.
Then, on integrating (B.53), with respect to , 0 to oo, we have (B.54) p(x10,
v, 0 = k'lvl[ + (x - 0)'V(x - 0)] -(m+,,,'., which is precisely in the form of
(B.20), with k' the normalizing constant. B.3 THE WISHART (W) Pdf The
m(m + 1)/2 distinct elements of an m x m PDS random matrix = {j} are
distributed according to the W distribution if, and only if, they have the
following pdf: (.55) where k- x p(AliZ, ,, m) = k izl,,. exp {- � tr Z-XA},
IAI > o, = 2 '%r'"(-x)t* 1-l[Lx P[( + 1 - i)/2], m < v, and Y. = [rqj], an m x m
PDS matrix. The pdf in (B.55) is defined for the region given by IAI > 0.
We denote the pdf in (B.55) by W(Y., v, m). Some properties of the W pdf
are listed below. 1. If ix, [.,..., [ are m x 1 mutually independent random
vectors, each with4 MN pdf, zero mean vector, and common PDS m x m
covariance matrix Y., the distinct elements of A = '5', where : = (., .,.,..., .,),
have a W(Y., v, m) pdf. Note that 55' = Y.= x [[' has diagonal elements
given by Y[__ x g,/', j = 1, 2,..., m, and off-diagonal elements given by Y.=
x 2sg{, j 4= k = 1, 2,..., m. Thus '/v = , the sample covariance matrix, and
the distinct elements of have a W [(1/05;, v, m] pdf. 2. The distinct elements
of a random matrix J, with the W(Y., v, m) pdf in (B.55), have the following
means, variances, and covariances: (B.56) E = (B.57) Var &j = v(% ' + and
(B.58) Cov(, at) = v(rhe% + cr,%e). Let us partition A and Y.
correspondingly as = { A= mx(Axx Ax., A_ \ A'x A''] ' m - mx\A.x A..] m -
m \Za

390 APPENDIX B where, in each instance, the (1, 1)th submatrix is of size
m x rn and the (2, 2)th submatrix is of size rn - rn x rn - rn. Further, let and
-1 -1 A11.. = (All - Ai.A.. A.l) -- A ll Z�g. = (Zn - Zl,.Z,.,.-lZ,.1) -1 = Z n.
Then the following properties of the W(Y, v, m) pdf are known to hold: 5 3.
The joint pdf for the distinct elements of Axx is W(Zxx, v, m 0. 4. The joint
pdffor the distinct elements of A.. is W [Zn.., v - (m - m0, 5. The marginal
pdf for r. = (B.59) h(rxalt, xa, v) = kx(1 - rxaa)('-a)/a(1 - pxo.�'),/�. with
kx = [(v - + i)], = and x5 f/ ay . (B.60) /(r) = (cosh y - tr)" (B.59) gives the
pdf for a sample correlation coefficient based on v pairs of observations
drawn independently from a bivariate normal pdf with zero means and 2 x 2
PDS covariance matrix Z. Property 1 is a fundamental relationship between
the MN pdf and the W pdf. It is established as follows? The joint pdf for v
normal, mutually independent rn x 1 vectors, ix, [a,..., iv, each with zero
mean vector and common PDS covariance matrix, Z, is (B.61) p(ZIS;, , m)
= 12;I (2rr),m/o. eXp {- � tr Z- xZZ'}, where Z = (z, z.,..., z0, an rn x v
matrix with rn < u. Now make the following transformation Z = TK, where
K is an rn x v matrix such that KK' -- Ira; that is, K is a semiorthogonal
matrix, and T is an rn x rn lower x$ The following well known properties
have been listed in S. Geisser, "Bayesian Estima- tion in Multivariate
Analysis," Ann. Math. Statistics, 36, 150-159 (1965). xo The "cosh"
function is defined by cosh u = (e u + e-U)/2. 't The derivation follows that
presented in S. N. Roy, Some Aspects of Multivariate Analysis. New York:
Wiley, 1957, p. 33. triangular matrix: PROPERTIES OF SOME
MULTIVARIATE PDF'S (B.62) T = ttxx 0 ... t21 t22 ' ' ' , tml tm9. ' ' ' mm
391 with ]T[ = 1-lP=x t, > 0. Note that KK' =Im places m(m + 1)/2
constraints on the elements of K. Thus there are really only urn - m(rn +
1)/2 inde- pendent elements in K. Choose an independent set of elements
from those in K, say (kxx, kxo., . . . , kx,,_. ), (k.x, k.., . . . , k..v_ 0, . . . ,
(kin, k., . . . , k.v_), and call this set K. Thus we can regard the
transformation Z = TK subject to KK' = Im as equivalent to the
transformation from the um elements of Z to the rn(rn + 1)/2 elements of T
and the vrn - rn(rn + 1)/2 elements of K. Then, on substituting Z = TK in
(B.61), we have (B.63) p(T, K,[Z, , m) = Jlzl (2rr)o. exp {- � tr 52- xrT'},
where J denotes the Jacobian of the transformation from the elements of Z
to those of T and K. To obtain an explicit expression for J we use the
following result TM: If y = f(x, x.,..., xv, xv + ,. �., xv+) for i = 1, 2,..., p,
where the xs's, j = 1, 2, ..., p + q, are subject to q constraints, f(xx, x.,..., xv,
xv + x, ...,x v+) = 0 for i=p + 1,p +2,...,p+q, then x� the JacobJan J
associated with the transformation from xx, x,.,..., xv to yx, y,.,..., Yv is
(B.64) J = I (f'f"''"fv'fv+"'"fv+q) + I(fv+,...,fv+q) l :. :. 0(4+. In applying this
result to the present problem, Z = TK takes the place of y = f and KK' - Im
= 0 takes the place off = 0. Further, the elements of K are to be associated
with xx, xo.,..., xv, whereas the remaining elements of K, denoted Ko, are to
be associated with xv + x,..., xv + . Then the Jacobian in (B.60) is [ cq(Z,
_KK') o(gg'- Ira) The explicit expression for the numerator of (B.65) is a�
(B.66) [O(Z, _KK')] = 2 t;_x. This result from the calculus is presented in S.
N. Roy, op. eit., p. 165. o It is assumed that the usual conditions for the
existence of the Jacobian, including the nonvanishing of the numerator and
denominator in (B.64), are satisfied. =o Cf. Roy, op. cit., pp. 170-174.

392 APPENDIX B Thus (B.63) becomes 2mF-f rn (B.67) p(T, z, m, v) = x


(2rr)Vm,.lS;lW. e- tr Z-TT' + Since . equals ,a(KK')[ . a(KD) I:, dK +
]a(KK')/a(KD)I:t integrated over the region KK'= Ira, rtVm/'-rm(m-Xn/4/1-
-l[n=x F[(v- i + 1)/2], the marginal distribution of the elements of T is
(B.68) p(rlz, m, v) = c with exp {- -}2 tr Z- rr'} Now the sample covariance
matrix & with m(m + 1)/2 distinct elements, is given by vS = ZZ' = TKK'T'
= TT'. Transform (B.68), which involves m(m + 1)/2 elements of T, to a pdf
for the distinct elements of S. This yields ". l-I= t- v m(m + P(SlE, v,m)=c
iZiv,o. 2l-iI,=tff-+exp{-�trvZ-S} (B.69) - 2m exp{-�trvY-xS} -- el exp {-
� trv Z -S}, where c - F � (/2) m/' i= :t 2 The pdf in (B.69) is W [(1/v)Z,
v, m], as was to be shown. By a simple change of variables the pdffor A =
vScan be obtained from (B.69) and is W(E, v, rn), the pdf given in (B.55)?
04 Cf. Roy, op. cit., p. 197. aa The JacobJan of the transformation from the
elements of T to the distinct elements of S is v "{m+>tv' + (2'" 1-ln= t-+).
Also, in the second line of (B.69), from vS = TT', we use ITI = ISl -- ''lSl .
=a Note that we have defined J = 2' = vg, with the v columns of assumed to
be independent. Frequently we have an n x rn random matrix where t' = (1...
1), a 1 x n vector of ones and ?' = (:z,.= .... , :m), with each column of ' an n
x I vector. The rows of ' are assumed to be independently and normally
distributed, each with a I x rn mean vector ix' and common rn x rn PDS
covariance matrix Y:. Although the rows of are idependent, the rows of the
residual matrix are not. Writing ' =tix' + O, where 0 is an n x rn matrix
whose rows are inde- pendently and normally distributed, each with zero
mean vector and common PDS covariance matrix Y:, we have 0 = (In -
t(t'0-t')O and PROPERTIES OF SOME MULTIVARIATE PDF'S 393 The
formulas for the moments, (B.56) to (B.58), have been obtained in the
literature * from those of the elements of ', since J = ', with having the MN
pdf in (B.61)' and Then Cov(a, at) = E(a. - = + and, for i = k and j = l,
Var(&) = v(,,,, + %'). Property 3 is most easily shown by partitioning 2' =
(2x'2o.') where x is an rnx x v random matrix with columns independently
and normally dis- tributed with zero vector mean and rnx x mx PDS
covariance matrix Zxx. Then, using (1), Txx = 2x2x' has a W(Zxx, v, rnx)
pdf. In the case that rnx = 1, W(*xx, v, 1) is in the form of a univariate
gamma (G) pdf, which, of course, can be transformed to a X �' pdf. Thus
the W pdf can be viewed as a multivariate generalization of the univariate G
pdf. To prove Property 4 let us write ,,T = 17'17, where 17 is an v x m(m <
v) random matrix with rows independently and normally distributed, each
with where In - t(t't)-xt ' is idempotent, with rank n - 1. Now let 0 = LI7,
where L is an n x n nonsingular orthogonal matrix such that L'[ln- t(t't)-
xt']L = Then 0'0 = P'L'[In - t(t't)-xt']Ll 7 = 17x'17, where 17'= (17x'ii n );
that is, 17x is an (n - 1) x rn matrix formed from 1 by deleting the last row.
Since the n - 1 rows of 17x are independently and normally distributed,
each with zero vector mean and common covariance matrix 1 (Note: EI7'P
= EO'L'LO = EO'O = 1 � In), it satisfies the con- ditions of Property 1.
Thus 0'0 = 17x'Fx has a W(Y., v, m) pdf with v = n - 1. 04 See, for example,
T. W. Anderson, An Introduction to Multivariate Statistical Analysis. New
York: Wiley, 1958, p. 161. The fourth-order moments required in the
derivation are given on p. 39 of Anderson's book.

394 APPENDIX B zero vector mean and common m x m PDS covariance


matrix Y. Then, partitioning 17 = (fi i fi.), we have where 17'17 and if,' . F.
are of size rn x m and rn. x rn., respectively, where m + rn. = m. Then Now,
given 17., if we let 17 = L, where L is a v x v orthogonal matrix such that '5
we have from the third line of (B.70) ,.a = 2'L'tL - Pa(P.' Pa) - Pa']L2 where
2xa is a v - m. x m submatrix of 2; that is, 2' = Therefore ,n.,. can be
expressed as where the rows of 2 are independently and normally
distributed, each with zero mean vector and covariance matrix Zn.a = Zn-
ZxaZa.-xZo.x. ae Thus, using Property 1, -'n.o. has a W(Zn.o., , - m., mO
pdf, where m - m = mo.. Property 5 is derived from the pdf for ,n = {&j} i,j
= 1, 2; that is, p(an, a,.,., axo.IZn, v), which is W(Z, 2, v), by expressing it
in terms of a.o., and r = axo./(anao.o.)A and integrating out an and ao.o.. '? '
For given l., Iv - l.(o.'P.)-xl.' is an idempotent matrix of rank v - m.. Thus it
has i, - rn. roots equal to 1 and rn. equal to 0. ' From = L2, E2x = 0. Further,
given la, E2x'L'L2x = Zx.a � Iv, where Zxx.. is the covariance matrix of
the rnx elements of any row of P, given the mo. elements in the
corresponding row of lo.. '7 See, for example, T. W. Anderson, op. cit., pp.
68-69, for details. Since the function Iv(0r) in (5) can be expressed as a
hypergeometric function, the pdf for ro. can be expressed as a rapidly
converging series; that is h(ro. lp., v) = (v - 1)I'0, ) (1 - r.o.)c-a>'(1 - v'5-; r(
+ �) (1 - where $v(pra) denotes the hypergeometric function, F(�, �; v +
�; (1 + rap)/2), which PROPERTIES OF SOME MULTIVARIATE PDF'S
395 B.4 THE INVERTED WISHART (IW) Pdf The m(m + 1)/2 distinct
elements of an rn x m PDS random matrix (J follow the IW distribution if,
and only if, they have the following pdf.s: (B.71) p(GlH, > 0, where k - =
2"/arr(,->/ I-i?= P[(v + 1 - i)/2], v > m, and H is an rn x rn PDS matrix. The
IW pdf is defined as (B.71) in the region IG I > 0 and zero elsewhere. If the
elements of (J have the pdf (B.71), we say that they have an IW(H, v, rn)
pdf. Some properties of (B.71) follow: 1. The joint pdf for the m(rn + 1)/2
distinct elements of G - = A is a W(H- , v, m) pdf. 2. Let G and H be
partitioned correspondingly as ml m 2 ma\Ga Gaa] H=\Ha Haa] where mx+
mo. = m. Then the joint pdf for the m(mx + 1)/2 distinct elements of Gn is
an IW(Hn, v - mo., tax) pdf. 3. When in (2) Gn is a scalar, say gxx, the pdf
for gn is as (B.72) P(gnlhn, v', m 0 = k g+a)/a e-%lan, 0 < g, with k =
(hn/2)'/a/r(,/2), with ' = - m + 1. 4. By virtue of (B.72) the moments of the
diagonal elents of can be obtained from those associated with a univariate
inverted gamma pdf. Property 1 is a fundamental one connecting the W and
IW pdf's. To establish it we -require the Jacobian of the transformation of
the m(m + 1)/2 distinct elements of G to the m(m + 1)/2 distinct elements of
A = G -. The is given by [r()] 2 il r(v + + i) 1=0 ,+ 8 -+2(+)(+) +.... See, for
example, H. Jeffreys, Theory ofProbabifty (3rd ed.), Oxford: Clarendon,
1961, p. 175, for the details of exprsing Iv(pt) in terms of a hypergeometric
function. 2 In the analysis of the multivariate regression model with a
diffuse prior pdf in Chapter 8 we found that the posterior pdf for the
disturbance covariance matrix is given by P(ZJY) m JZJ -v:/2 exp (- trZ-S).
This is in the form of (B.71), as can be seen by letting G = , H = S, and v' =
v + m + 1. Thus p(Jy) is an IW pdf. 2o Equation B.72 can be obtained from
(A.37b) by letting g = ,2 and h = vs 2. Of course, the positive square root of
g will have a pdf precisely in the form of (A.37b).

396 a,P�NDtX B Jacobian of the transformation is [A[- <' + x>oo and thus
(B.71) can be expressed in terms of A = G-X' p(AIH, , m) = k :i,+i exp {- i
tr AH}, IAI > 0. (B.73) lml,ll exp {- tr AH), IA[ > 0, with k given in
connection with (B.71). If, in (B.73), we define Z -x = H, it is seen that
(B.73) is in precisely the form of the W(Z, v, m) pdf shown in Property 2
can be established by noting that Gxx = (Axx - AxA-XAx)-x = A7. As
shown in the preceding section, if A has a W pdf, Axx. also has a W pdf.
Then Property 1 of the IW pdf can be employed to obtain the pdf for Gxx =
A; from that for Axx.. Given this result, the special case in which Gxx is a
scalar, gn, leads to the result in (B.72) which is Property 3. Property 4 is an
immediate consequence of Property 3. B.5 THE GENERALIZED
STUDENT t (GSt) The pq elements of a p x q random matrix, = {iru}, have
the GSt dis- tribution if, and only if, they have the following pdfa': i (B.74)
P(TIP , Q,n) = k iQ + T,PTl,t,., -oo < tj < oo, where k - = rr "t" I-IL-x P[(n -
p - i + 1)J2]Jl-I[-x P[(n - i + 1)J2], n > p + q - 1, and P and Q are PDS
matrices of sizes p x p and q x q, respec- tively. For convenience we denote
the pdf in (B.74) as T(P, Q, O, n), where ao To show that the Jacobian is
IAI-(re+x> write AG = I. Then (cA/cO)G + A(cG/cO) 0 or aG/00 = -
G(OA/OO)G. If 0 = au, we have ag/aau = -gg for ,/, i, and j = 1, 2,..., rn,
with/ < and j <_ i, since G and A are symmetric matrices and the trans-
formation from the elements of G to those of A involves just rn(rn + 1)/2
distinct elements of G. On forming the Jacobian matrix and taking its
determinant, we have IG] '+x = ]AI -('+x). See T. W. Anderson, op. tit., p.
162, for a derivation of this Jacobian which relies on properties of the W
pdf. Also on pp. 348 to 349 Anderson provides the Jacobian of the
transformation G-- A = G -x for the general case in which G and A are not
symmetric, a result not applicable to the present case in which G and A are
symmetric. ax Some call this pdf the matrix t pdf. See J. M. Dickey,
"Matricvariate Generalizations of the Multivariate t Distribution and the
Inverted Multivariate t Distribution," Ann. Math. Statistics, 38, 511-518
(1967); S. Geisser, "Bayesian Estimation in Multivariate Regression," cit.
supra; G. C. Tiao and A. Zellner, "On the Bayesian Estimation of
Multivariate Regression," tit. supra; and the references cited in these works
for further analysis of this pdf. a*. The pdf in (B.74) was encountered in
Chapter 8 in connection with the analysis of the multivariate regression
model. With a diffuse prior pdf the posterior pdf for the regression
coefficients was found to be p(B[ Y) oc JS + (B - ,)'X'X(B - 2)J-,,i9.. If we
let S - 12, P X'X, and T = B --/}, p(BJ Y) is exactly in the form of (B.74).
PROPERTIES OF SOME MULTIVARIATE PDF'S 397 0 appears to denote
that the mean of (B.74), by symmetry, is a zero matrix. Some properties of
(B.74) follow: 1. The pdf in (B.74) can be obtained as the marginal
distribution of p(G, T) = px(G)p.(TIG ), where px(G) denotes an inverted
Wishart pdf and p.(TIG ) denotes a multivariate normal pdf. Let qx q. Px P.
and with qx + q. = q and Px + P. = P and These quantities appear in the
properties of (B.74). �� 2. If T =' (Tx, T0, the conditional pdf for Tx,
given T., is a GSt pdf with parameters (p-x+ TaQaa-XTa,)-x, Qxx.a, T.Qaa-
XQax, n. The mean is Ta Qo. a - X Q ax. 3. If T' = (Xx, XO, the conditional
pdf for Xx, given Xa, is a GSt pdf with parameters Px, Q + Xa'Pa..X., Px-
xPxaXa, n. The mean is Pxx - Px .X'.. 4. If T = (Tx, T0, with Txp x qx and
T.p x q., the marginal pdf for is GSt with parameters P, Q,.,., 0, n - 5. If T' =
(Xx, X0, with Xxpx x q and X,.p,. x q, the marginal pdf for X. is GSt, with
parameters P=,..x, Q, 0, m - p. 6. If in (2) Tx is a p x 1 vector, the
conditional pdf for T, given T., is in the multivariate Student t form.
Similarly, if in (4) T. is a p x 1 vector, it has a marginal pdf in the
multivariate Student t form. 7. With T = (h, t=,..., t), (B.75) p(T) -
P(h)P(t4h)p(tolh, and each of the pdf's on the rhs of (B.75) is in the form of
a multivariate Student t pdf. To establish Property 1 write the IW pdf for the
q(q + 1)/2 distinct elements of G as (B.76) Iw(GIQ,,,,p) = exp{- � tr aa
Some of the following properties established in Chapter 8 draw on the
results in papers by J. M. Dickey, S. Geisser, and G. C. Tiao and A. Zellner,
cited in the preceding footnote.

398 A?ENmX B where Q is a q x q PDS matrix and k is the normalizing


constant, and the multivariate normal pdf for the elements of T, a p x q
matrix, given G, as � (B.77) MN(TIG, P) =/,.lPIq�-IG 1-,,2 exp {- k tr
T'PTG-}, where P is a p x p PDS matrix and kv. is a normalizing constant.
Then the joint pdf for the distinct elements of G and those of T, p(G, T), is
the product of (B.76) and (B.77); that is, (B.78) p(, T) = kk. iGl,++ + >,.
exp (- tr (Q + T'PT)G-}. Now note from the properties of the IW pdf that
(B.79) f IQ + T'PTI (+>0' 1 o>o I1 "+'++>0' exp{-�tr(Q + T'PT)G-}dG =
o' where ko is the normalizing constant of the IW pdf, p(G I Q + T'PT, v +
p, q). Using (B.79), we find that the integration of (B.78) with respect to the
elements of G over the region IGI > 0 yields (g.80) p(Y) = k&o. I Ql,.7, p, .
ko I Q + T'PT] {'+')0" -oo < ti < If we let v = n - p, we see that (B.80) is in
precisely the form of (B.74), a GSt pdf? Property 2 is most easily
established if we note that the GSt pdf can be written in the following
alternative form��: (B.81) p(rlP, Q,n) = k [p_ + TQ_T,i,t2, -oo < t{ <
Then P- + rQ-r'= - + (rT)[Q QWtT'/ = p- + rQr' + rQr' + rQr' + = p- + Ta[Q -
Qa(QU)-Qa]T ' + [T + TaQ(Qn)-]Qn[T + TaQa(Q)-] , -P- + rQ-r' + (r - - -
aQaa Qa)Q.a x (T- TaQaa-Qa) ', o Note that with T = (h" � t0, we can write
the MN pdf for the elements of T, given G as MN(TIG, P) = ko.[G- � P[
exp [-�(h', t2',..., t')G - � P(t'&',...,t')'] and that IG- :� PI� = IPI/o'IG I -
to.; see, for example, T. W. Anderson, op. cit., p. 348. a, We could transform
(B.76) to a Wishart pdf for ,4 = G -: and obtain (B.80) as the marginal pdf
of a Wishart pdf times a conditional MN pdf for T given ,4. ao See J. M.
Dickey, op. cit., p. 512. PROPERTIES OF SOME MULTIVARIATE PDF'S
where Q% i,j = 1,2, are submatrices of Q- x and Q.X(Qn)- = _ On
substituting this result in (B.81), we have p(Tx, TI Q, P, n) = k Il-(->/'IQI
(B.82) x i.p_ + TaQa_Ta, + (Tx - TQaa-Qa)Qna x (Tx- TQaa-XQx)'] From
(B.82) it is apparent that the conditional pdf for Tx, given T:, is in the GSt
form (B.81) with parameters (P-+ TQ::-XT:')-x, Qn., and Ta Qa- x Qx, n,
where the conditional mean is given by T Qaa- x Qa. Further, the marginal
pdf for Ta is obtained from (B.82) by integrating with respect to the
elements of Tx. Note that (B.82) can be expressed as p(Tx, TalQ, p,n) m
[]p-x + IP - + TQ T I->,1Qn.I-/ (B.83) x ]p_ + TaQa_Ta, + (Tx- TaQaa
Qa)Qx.a and that on integrating with respect to the elements of T the
second factor integrates to a numerical hctor. Thus the marginal pdf for the
elements of Ta is proportional to the first hctor on the rhs of (B.83) which is
in the GSt form with parameters P, Qa, 0, n - qx. This is Property 4.
Properties 3 and 5 can be established in the same way as Properties 2 and 4.
However, in proof of the former two properties, it is convenient to use the
form of the GSt pdf in (B.74). Property 6 follows from Property 4. Also, it
and Property 7 have been demonstrated in the text of Chapter 8. 399 Q..

APPENDIX C FORTRAN Programs for Numerical Integration Frequently


in statistics and other branches of applied mathematics we encounter
definite integrals which are either difficult or impossible to integrate
analytically. When this happens either of two procedures may be used.
First, the integrand may be approximated by a series expansion and
integration done termwise or, second, the integration may be done
numerically by the use of the trapezoidal rule, Simpson's rule, or gaussian
quadrature. The purpose of this writeup is to explain FORTRAN programs
for numerical integration of univariate and bivariate integrals that use
Simpson's rule. Simpson's rule is chosen because it combines reasonable
accuracy with simplicity. C.1 UNIVARIATE INTEGRATION General Let
f(x) be a function of x defined, continuous, and non-negative on the closed
interval a _< x _< b. The object of discussion is = that is, the area under the
curve f(x) and above the x-axis between a and b. We may transform f(x)
into a proper density function by p(x) = f(x). Then This appendix was
prepared by Martin S. Geisel. 400 FORTRAN PROGRAMS FOR
NUMERICAL INTEGRATION 401 A is called the normalizing constant
for f and is one of the quantities we may wish to find by numerical
integration. We may also wish to know various moments of the distribution
and the mathematical expectations of functions of x; for example, (mean), t.
= f (x - tx') ' p(x) dx (variance). The procedure described below is
applicable to all these problems. Simpsoh's Rule Divide [a, b] into n equal
parts, where n is an even integer. Label the end- points of the subintervals
x0, xx, x.,..., x_x, x and let y{ = f(x), i = 0, 1, 2,..., n. Consider the first three
points x0, xx, x. and the corresponding points on the curve y = f(x). If the
points on the curve are not collinear, there is a unique parabola g(x) with
axis parallel to the y-axis which passes through the three points. Write its
equation g(x) = a + b(x - xO + c(x - xO ', where a, b, and c are chosen so
that the three points lie on the parabola. Let Ax = x, -- x,_ i -- 1, 2,..., n.
Then, if Y0, Yx, Y. lie on the parabola, we have Y0 =a--bAx+c(Ax) ., Yl m
a, y. = a n u b Ax + c(Ax) a or b Ax + c(Ax) a = Ya - Yx, -b Ax + c(Ax) ' =
Yo - Yx, 2c(Ax) ' = Yo - 2yx + y.. Using the parabola as an approximation
to f(x) in Ix0, x.], we obtain ff(x) dx -' .f; [a + b(x - xO + c(x - x0 '] dx - [ax
+ �b(x - xx) ' + �c(x - xO = 2a Ax + c(Ax) � Ax = 3 (Yo + 4yx + y.),
wherein the values for a and c, shown above, have been used.

402 Repeat the procedure results to get Ax ff(x) dx -' 3" (Yo + 4y + 2y. +...
+ 2y,_. + 4y,_ + y,). This is Simpson's rule. APPENDIX C for [x., x4], [x4,
x0],..., [x_.,x] and sum the FORTRAN Programs for Simpsoh's Rule 1.
SimpsoWs Rule 1 In this program the user specifies the function to be
integrated by means of a function-defining statement. He enters the limits of
integration on a data card along with the value of a parameter which we call
TOL. The program then computes a first approximation to the integral and
then does Simpson's rule for 2, 4, 8, 16,... subintervals (at each step, only
the functional values corresponding to the new x-values are actually
computed). After a specified number of subintervals has been reached (16
in the example below) the result from the current computation of Simpson's
rule is compared with the result of the preceding computation. If the
absolute difference between the two is less than the value of TOL,
computation stops and the answer is printed out. If not, the number of
subintervals is doubled and Simpson's rule is computed again and the
procedure is repeated. If the user is not certain of the limits of integration
(e.g., he is approximating (-o% oo) by a finite interval), he simply enters
another data card which specifies the new limits of integration. The obvious
advantage to Simpson's Rule I is that it provides a good idea of the accuracy
of the results for most functions. Also, the ease of changing the limits of
integration is desirable at times. Furthermore, it requires very little
programming effort on the part of the user. On the other hand, use of the
function-defining statement makes it difficult to change the function within
a given run of the program (parameters of the function may, of course, be
changed as shown in the example below). Also, computer time adds up
rapidly as the number of subintervals increases. Finally, this procedure does
not extend easily to bivariate or multivariate integration.' Simpson's Rule I
is illustrated in the computation of the area under several normal densities.
Program statements are shown in Table C.1. 1. After the user's ID card
comes the XEQ card and the DIMENSION statement in which all arrays
are declared. 2. The next statement, which must appear before any
executable state- ments, is the function-defining statement. The following
rules are pertinent to construction of function-defining statements.
FORTRAN PROGRAMS FOR NUMERICAL INTEGRATION 403 (a)
The name of the function must have four to seven characters, the last of
which is F and the first of which is a letter other than X. The name must not
be the same as any of the computer's built-in functions. (b) There may be as
many arguments as the user wishes but they may not be subscripted in this
statement. (c) The following are functions on most computers that may be
useful: SQRTF(X) = LOGF(X) -- loge x EXPF(X) -- e NEXPF(X) = e -
ABSF(X) = Ixl GAMMAF(X) = P(x) LGAMAF(X) = loge P(x) FLOATF(I)
converts a number without a decimal to one with a decimal. Consult a
manual for the particular computer that is being used for information
regarding limitations on the arguments for these functions. 3. Statements I
to 4. Statement 1 instructs the computer to print 'the heading in Statement 2.
Statement 3 instructs the computer to read values of the variables listed
from a data card. They are punched according to the FORMAT of Statement
4. 4. Statements 5 to 8. These statements define the values of the arguments
(other than X) of FUNCF. 5. Statement 9. This statement calculates the
width of the subinterval for the first iteration. 6. Statements 10 to 15. These
statements perform the initial computa- tions for Simpson's rule. 7.
Statements 16 to 20. These statements finish the first computation of
Simpson's rule. 8. Statements 21 and 22. Statement 21 doubles the number
of sub- intervals and Statement 22 halves the width of the subinterval. 9.
Statement 23. This statement instructs the computer to repeat the above if
(the new value of) N is less than 16 and to go on if N is greater than or
equal to 16. 10. Statements 24 to 26. These statements compute Simpson's
rule again. 11. Statement 27. This statement tells the computer to continue
to State- ment 28 if N is less than or equal to 4000 but to go to Statement 39
if N is greater than 4000. This sets a limit on the amount of computation to
be done. 12. Statement 28. This statement computes the current result of
Simpson's rule (actually, three times the result). 13. Statement 29. In this
statement the difference between the current result of the Simpson's rule
computation and the preceding result is compared.

FORTRAN PROGRAMS FOR NUMERICAL INTEGRATION 405 z o


OOOOOOZ

00 z FORTRAN PROGRAMS FOR NUMERICAL INTEGRATION 407

408 APPENDIX C with TOL. If the difference is greater than TOL, the
computer goes to Statement 30. If it is less than or equal to TOL, it goes to
Statement 35. 14. Statements 30 to 34. These statements perform the initial
computations for the next computation of Simpson's rule. 15. Statements 35
to 38. Statement 35 computes the final result and State- ment 36 prints out
the results according to the FORMAT of Statement 37. Statement 38 sends
the computer to Statement 41, which is the last statement in the DO-loops
started in Statements 5 and 7. 16. Statements 39 to 40. These statements
print out the results (D(1) = three times the last result of Simpson's rule) in
the event that N is greater than 4000. 17. Statement 42. This statement
sends the computer back to Statement 3 which instructs the computer to
read another data card and proceed. If there are no more data cards,
computation stops. 18. Data Cards. These cards are punched according to
FORMAT Statement 4. The second card provides a check on the
computations done with the first data card by computing the tail area. 2.
Simpson's Rule H In this program the computation of Simpson's rule is
performed by the subprogram FUNCTION FUNC1 (UP, SL, MM).
Whenever FUNC1 is used, the user specifies the upper limit of integration
(UP), the lower limit (SL), and the number of subintervals (MM). The user
writes his own main program to compute the values of the integrand at the
endpoints of all the subintervals and to store them in an array (called W in
the example below). After the computation of the W array is completed the
next statement is of the form ANSWER = FUNC1 (.,., .). This is all that is
required to perform the computation of Simpson's rule. If another integral is
desired, the values of this integrand at the endpoints of its subintervals are
computed and placed in the W array and FUNC1 is used again. Naturally
the values of UP, SL, and MM may be changed each time. The main
advantage of Simpson's Rule II is that several different integrals may be
computed easily in one program. Also, as shown below, it can be extended
easily to multivariate integrations. However, it requires more pro-
gramming on the part of the user and within a single run provides no idea of
the accuracy of the results. Simpson's Rule II is illustrated in the
computation of the area under several normal densities and its first and
second moments. Program statements are shown in Table C.2. 1. After the
XEQ card and the DIMENSION statement the statement COMMON W is
included. It occupies a similar position in the FUNC1 FORTRAN
PROGRAMS FOR NUMERICAL INTEGRATION 409 subprogram. This
tells the computer that W in the subprogram is the same array as in the main
program. 2. Statements 1 and 2. Statement 1 prints out the title shown in
FORMAT Statement 2. 3. Statements 3 to 7. Values of SIGMA and XBAR
are computed here as well as the constant of the integrand� 4. Statements
8 to 12. A loop is set up to compute the various values of Z, the variable of
integration, and to compute the various integrantis. In Statement 9 we have
picked the lower limit of integration Z = -30. and the width of the
subinterval = . 30. The upper index of the DO (Statement 8) tells us the
number of subintervals (201 - 1 = 200)� These facts imply that the upper
limit of integration is + 30. 5. Statement 13. FUNC1 is used to compute the
area. 6. Statements 14 to 19. The values of the X (first moment) array are
placed in the W array and FUNC1 performs the integration for the first
moment (Statements 14 to 16). Similarly, the computation for the second
moment is done. 7. Statements 20 and 21. The results are printed out.
Statement 20 ends the loops started in Statements 3 and 5. After Statement
20 is completed, the computer returns to Statement 5 and the Statements 7
to 20 are executed again for the new value of ZBAR. After this is
completed, it returns to Statement 3 and does Statements 5 to 20 for the new
value of SIGMA. When this is done the computation stops and the program
ends via CALL EXIT and END. 8. FUNCTION FUNC1 subprogram
follows the main program as shown. Suppose we wished to check the
accuracy of our results by doubling the number of subintervals. The way the
program is now written we have lost the original entries in the W array and
would thus have to recompute them. We still,have the values of X and Y,
however. We can do the following: set up new arrays, XX and YY. Transfer
X(I) to XX(2 � I) and Y(I) to YY(2 � I). Then we.need only compute
XX(3), XX(5),... XX(401) and YY(3), YY(5), � .. YY(401). (i.e., only the
new values) and perform the integration by trans- ferring XX and YY to W.
The same procedure could be followed for the original values in the W
array by transferring them to another array. C.3 BIVARIATE
INTEGRATION General Here the problem is of the form

FORTRAN PROGRAMS FOR NUMERICAL INTEGRATION 411 + # + o


oo o .... I 0 000 0 � � m II oo o II o Z o o II o o o II II o II
ooooooooZoZoZ=oZ>o z o o O0 0 0 0

O O O O OOOOOO OOOOO I I I IIII 0 0000 000 000 0 I I I II II Z


.................... FORTRAN PROGRAMS FOR NUMERICAL
INTEGRATION 413 In calculus the above integral is evaluated by iterated
integration; that is, V-f(fif(x,y)dx)dy, where x, x. in general depend on y.
Here we use the same procedure; that is, we use Simpson's rule twice. The
objects of interest may be a proper density function p(x, y)= (1/V)f(x, y),
marginal density functions, and various moments. FORTRAN Program The
program used is a straightforward extension of Simpson's Rule II. We now
have a grid of points (x, yy) corresponding to the endpoints of the x-
subintervals and the endpoints of the y-subintervals. The procedure used to
find a volume is to fix x and evaluate the integrand at all y and use
Simpson's rule (FUNC1) to compute this integral. This is done for all x.
Application of Simpson's rule to these results then yields the desired
volume. We illustrate with a program for finding marginal posterior
densities of two parameters B and G, given some sample evidence. See the
listing in Table C.3. 1. Computation and definition of constants used in
evaluating the integrand. Here, some of the contants are computed from
data and read into the computer. 2. Definition of "current" B value (indexed
by I). 3. Definition of "current" G value (indexed by J). 4. Evaluation of
integrand for one G value with the value of B held constant. . Storage of
this value in a two-dimensional array P(I, J). On each "pass" an element
corresponding to the current B and G values is added to this array and
computer returns to "DO 10 ..." and repeats (3), (4), and (S) until an entire
row in P(I, J) has been computed, that is, does all values of G. 6. For given
B, Simpson's rule is used to compute the univariate integral over G and the
computer returns to "DO 20... ", repeating the above for the next value of B.
7. The results of the preceding integrations are put into the W array. 8.
Integration over B is performed to find the volume. 9. PB (J) is transformed
into a proper marginal density. 10. The first moment of this distribution is
found by use of Simpson's rule on elements of the form x.p(xO. 11.
Similarly, the variance is computed. 12. The results are printed. 13. In a like
manner the marginal posterior density of G is found by

414 APPENDIX C transferring a column of P(I, J) to the W array and using


Simpson's rule. Similarly, the mean and variance are found and the results
printed. 14. FUNCTION FUNC1 (UP, SL, MM)--Simpson's rule
computation. 15. Input data. C.4 COMMENTS The IBM 7094 computer,
for example, has approximately 32,000 storage locations. This places
definite limitations on the size of the P(I, J) array. If a plot of the bivariate
density is not desired (no printout for it was provided in the program
above), it is possible to do away with the P(I, J) array and thereby greatly
increase the maximum number of intervals the computer can handle.
Bibliography Aigner, D. J., and A. S. Goldberger, "On the Estimation of
Pareto's Law," Workshop Paper 6818, Social Systems Research Institute,
University of Wisconsin, Madison, 1968. Anderson, T. W., An Introduction
to Multivariate Statistical Analysis. New York: Wiley, 1958. Ando, A., and
G. M. Kaufman, "Bayesian Analysis of Reduced Form Systems,"
manuscript, MIT, 1964. , "Bayesian Analysis of the Independent
Multinormal Process--Neither Mean nor Precision Known," J. Am. Statist.
Assoc., 60, 347-353 (1965). Anscombe, F. J., "Bayesian Statistics," Am.
Statist., 15, 21-24 (1961). Aoki, M., Optimization of Stochastic Systems.
New York: Academic, 1967. Arrow, K., H. Chenery, B. Minhas, and R.
Solow, "Capital-Labor Substitution and Economic Efficiency," Rev. Econ.
Statist., 43, 225-250 (1961). Barlow, R., H. Brazer, and J. N. Morgan,
Economic Behavior oft,he Affluent. Washington, D.C.: Brookings
Institution, 1966. Barten, A. P., "Consumer Demand Functions under
Conditions of Almost Additive Preferences," Econometrica, 32, 1-38
(1964). Bartholomew, D. J., "A Comparison of Some Bayesian and
Frequentist Inferences," Biometrika, 52, 19-35 (1965). Bartlett, M. S., "A
Comment on D. V. Lindley's Statistical Paradox," Biometrika, 44, 533-534
(1957). Bayes, Rev. T., "An Essay Toward Solving a Problem in the
Doctrine of Chances," Pln'l. Trans. Roy. Soc. (London), 53, 370-418
(1763); reprinted in Biometrika, 45, 293-315 (1958), and Facsimiles of Two
Papers by Bayes (commentary by W. Edwards Deming). New York: Hafner,
1963. Bellman, R., Introduction to Matrix Analysis. New York: McGraw-
Hill, 1962. , and R. Kalaba, "Dynamic Programming and Adaptive
Processes: Mathematical Foundations," reprinted from IRE Trans. Altt.
Control,. AC-5 (1) (January 1960), in R. Bellman and R. Kalaba (Eds.),
Selected Papers on Mathematical Trends in Control Theory. New York:
Dover, 1964, pp. 195-200. Boot, J. C. G., and G. M. de Wit, "Investment
Demand: An Empirical Contribution to the Aggregation Problem," Intern.
Econ. Rev., 1, 3-30 (1960). Box, G. E. P., and D. R. Cox, "An Analysis of
Transformations," J. Roy. Statist. Soc., Series B, 26, 211-243 (1964). 415

416 BIBLIOGRAPHY Box, G. E. P., and W. J. Hill, "Discrimination


Among Mechanistic Models," Techno- metrics, 9, 57-71 (1967). Box, G. E.
P., and G. C. Tiao, "A Further Look at Robustness via Bayes Theorem,"
Biometrika, 49, 419-433 (1962). , "Multiparameter Problems from a
Bayesian Point of View," Ann. Math. Statist., 36, 1468-1482 (1965).
Brown, P. R., Some Aspects of Valuation in the Railroad Industry,
unpublished doctoral dissertation, University of Chicago, 1968. Carlson, F.
D., E. Sobel, and G. S. Watson, "Linear Relationships between Variables
Affected by Errors," Biometrics, 22, 252-267 (1966). Chetty, V. K.,
"Bayesian Analysis of Haavelmo's Models," Econometrica, 36, 582-602
(1968). , Bayesian Analysis of Some Simultaneous Equation Models and
Specification Errors, unpublished doctoral dissertation, University of
Wisconsin, Madison, 1966. , "Discrimination, Estimation and Aggregation
of Distributed Lag Models," manuscript, Columbia University, 1968. , "On
Pooling of Time Series and Cross-Section Data," Econometrica, 36, 279-
290 (1968). Chow, G., "Multiplier, Accelerator and Liquidity Preference in
the Determination of National Income in the United States," IBM Record
Rept., RC 1455 (1966). Cochran, W. G., "The Planning of Observational
Studies of Human Populations," J. Roy. Statist. Soc., Series A, Part 2, 234-
255 (1965). Copas, J. B., "Monte Carlo Results for Estimation in a Stable
Markov Time Series," J. Roy. Statist. Soc., Series A, No. 1, 110-116 (1966).
Cook, M. B., "Bivariate ,c-statistics and Cumulants of their Joint Sampling
Distribution," Biometrika, 38, 179-195 (1951). Cragg, J. G., "On the
Sensitivity of Simultaneous-Equations Estimators to the Stochastic
Assumptions of the Models," J. Am. Statist. Assoc., 61, 136-151 (1966).
"Small Sample Properties of Various Simultaneous Equation Estimators:
The Results of Some Monte Carlo Experiments," Research Memo 68,
Econometric Research Program, Princeton University, 408 pp., 1964.
Crockett, J., "Technical Note," in I. Friend and R. Jones (Eds.), Proceedings
of the Conference on Consumption and Savings, Vol. II. Philadelphia:
University of Pennsylvania, 1960. Cyert, R. M., and M. H. de Groot,
"Bayesian Analysis and Duopoly Theory," manuscript, Carnegie-Mellon
University, April 1968. Dickey, J. M., "Matricvariate Generalizations of the
Multivariate t Distribution and the Inverted Multivariate t Distribution,"
Ann. Math. Statist., 38, 511-518 (1967). Drze, J., "The Bayesian Approach
to Simultaneous Equations Estimation," Research Memorandum No. 67,
Technological Institute, Northwestern University, 1962. "Limited
Information Estimation from a Bayesian Viewpoint," CORE Dis- cussion
Paper 6816, University of Louvain, 1968. Feldstein, M. S., "Production
with Uncertain Technology: Some Econometric Implica- tions," manuscript,
Harvard University, 1969. Fieller, E. F., "The Distribution of the Index in a
Normal Bivariate Population," Biometrika, 24, 428-440 (1932).
BIBLIOGRAPHY 417 Fisher, F. M., "Dynamic Structure and Estimation in
Economy-wide Econometric Models," in J. S. Duesenberry'$t al., The
Brookings Quarterly Econometric Model of the United States. Chicago:
Rand-McNally, 1965, pp. 589-653. Fisher, W. D., "Estimation in the Linear
Decision Model," Intern. Econ. Rev., 3, 1-29 (1962). Frazer, F. A., W. J.
Duncan, and A. R. Collar, Elementary Matrices. Cambridge: Cambridge
University Press, 1963. Freimet, M., "A Dynamic Programming Approach
to Adaptive Control Processes," IRE Trans., AC-4, 2 (2), 10-15 (1959).
Friedman, M., and D. Meiselman, "The Relative Stability of Monetary
Velocity and the Investment Multiplier in the United States, 1897-1958," in
the Commission on Money and Credit Research Study, Stabilization
Policies. Englewood Cliffs, N.J.: Prentice-Hall, 1963. Fuller, W. A., and J.
E. Martin, "The Effects of Autocorrelated Errors on the Statistical
Estimation of Distributed Lag Models," J. Farm Econ., 44, 71-82 (1962).
Geary, R. C., "The Frequency Distribution of the Quotient of Two Normal
Variates," J. Roy. Statist. Soc., 93, 442-446 (1930). Geisel, M. S.,
"Comparing and Choosing Among Parametric Statistical Models: A
Bayesian Analysis with Macroeconomic Applications," unpublished
doctoral dissertation, University of Chicago, 1970. Geisser, S., "A Bayes
Approach for Combining Correlated Estimates," J. Am. Statist. Assoc., 60,
602-607 (1962). , "Bayesian Estimation in Multivariate Analysis," Ann.
Math. Statist., 36, 150-159 (1965). Goldberger, A. S., Econometric Theory.
New York: Wiley, 1964. Graybill, F. A., An Introduction to Linear
Statistical Models. New York: McGraw-Hill, 1961. Greville, T. N. E.,
"Some Applications of the Pseudoinverse of a Matrix," SIAM Relx, 2, 15-
22 (1960). , "The Pseudoinverse of a Rectangular or Singular Matrix and Its
Application to the Solution of Systems of Linear Equations," SIAM ReiJ.,
1, 38-43 (1959). Griliches, Z., "Distributed Lag Models: A Review Article,"
Econometrica, 35, 16-49 (1967). , G. S. Maddala, R. Lucas, and N. Wallace,
"Notes on Estimated Aggregate Quarterly Consumption Functions,"
Econometrica, 30, 491-500 (1965). Grunfeld, Y., "The Determinants of
Corporate Investment," unpublished doctoral dissertation, University of
Chicago, 1958. Haavelmo, T., "Methods of Measuring the Marginal
Propensity to Consume," J. Am. Statist. Assoc., 42, 105-122 (1947),
reprinted in Wm. C. Hood and T. C. Koopmans (Eds.), Studies in
Econometric Methods. New York: Wiley, 1953, pp. 75-91. Hadamard, J.,
The Psychology of lnIJention in the Mathematical Field. New York: Dover,
1945. Hanson, N. R., Patterns of Discouery. New York: Cambridge
University Press, 1958. Hartigan, J., "Invariant Prior Distributions," Ann.
Math. Statist., 35, 836-845 (1964). Hildreth, C., "Bayesian Statisticians and
Remote Clients," Econometrica, :31, 422-438 (1963).

418 BIBLIOGRAPHY Hildreth, C., and J. Y. Lu, "Demand Relations with


Autocorrelated Disturbances," Tech. Bull. 276. East Lansing, Mich.:
Michigan State University Agricultural Exper- iment Station, 1960.
Hobson, E. W., The Theory of Functions of a Real Variable and the Theory
of Fourier's Series, Vol. II. New York: Dover, 1957. Holt, C., J. F. Muth, F.
Modigliani, and H. A. Simon, Planning Production, Inventories and Work
Force. Englewood Cliffs, N.J.: Prentice-Hall, 1960. Hood, W. C., and T. C.
Koopmans (Eds.), Studies in Econometric Method. New York: Wiley, 1953.
James, W., and C. M. Stein, "Estimation with Quadratic Loss," in J.
Neyman (Ed.), Proceedings of the Fourth Berkeley Symposium on
Mathematical Statistics and Probability, Vol. I. Berkeley: University of
California Press, 1961. Jeffreys, H., Scientific Inference (2nd ed.).
Cambridge: Cambridge University Press, 1957. , Theory of Probability (3rd
ed.). Oxford: Clarendon, 1961 and 1966. Johnson, R. A., "An Asymptotic
Expansion for Posterior Distributions," Tech. Report No. 114, May 1967,
Department of Statistics, University of Wisconsin, Madison, Ann. Math.
Statist., 38, 1899-1906 (1967). Johnston, J., Econometric Methods. New
York: McGraw-Hill, 1963. Kakwani, N. C., "The Unbiasedness of Zellner's
Seemingly Unrelated Regression Equations Estimators," J. Am. Statist.
Assoc., 63, 141-142 (1967). Kendall, M. G., and A. Stuart, The Advanced
Theory of Statistics, Vol. I. London: Griffen, 1958. , The Advanced Theory
of Statistics, Vol. II. New York: Hafner, 1961 and 1966. Kenney, J. F., and
E. S. Keeping, Mathematics of Statistics: Part Two (2nd ed.). New York:
Van Nostrand, 1951. Kiefer, J., and J. Wolfowitz, "Consistency of the
Maximum Likelihood Estimator in the Presence of Infinitely Many
Incidental Parameters," Ann. Math. Statist., 27, 887-906 (1957). Kmenta, J.,
and R. F. Gilbert, "Small Sample Properties of Alternative Estimates of
Seemingly Unrelated Regressions," J. Am. Statist. Assoc., 63, 1180-1200
(1968). Koyck, L., Distributed Lags and Investment Analysis. Amsterdam:
North-Holland, 1954. Kullback, S., Information Theory and Statistics. New
York: Wiley, 1959. Le Cam, L. "Les Proprits Asymptotiques des Solutions
de Bayes," Publ. Inst. Statist., University of Paris, 7, 17-35 (1958). --, "On
Some Asymptotic Properties of Maximum Likelihood and Related Bayes
Estimates," Univ. Calif. Publs. Statist., 1, 277-330 (1953). Lindley, D. V.,
"The Use of Prior Probability Distributions in Statistical Inference and
Decisions," in J. Neyman (Ed.), Proc. Fourth Berkeley Syrup. Math. Statist.
and Probab., Vol. I, 1961, 453-468. , "Regression Lines and the Linear
Functional Relationship," J. Roy. Statistical Soc. (Supplement), 9, 218-244
(1947). , Introduction to Probability and Statistics from a Bayesian
Viewpoint, Part 2. Inference. Cambridge: Cambridge University Press,
1965. , "A Statistical Paradox," Biometrika, 44, 187-192 (1957).
BIBLIOGRAPHY 419 LindIcy, D. V., and G. M. E1-Sayyad, "The
Bayesian Estimation of a Linear Functional Relationship," J. Roy. Statist.
Sot., Series B, 30, 190-202 (1968). Lute, R. D., and H. Raiffa, Games and
Decisions. New York: Wiley, 1958. Madansky, A., "The Fitting of Straight
Lines When Both Variables are Subject to Error," J. Am. Statist. Assoc., 54,
173-205 (1959). Marsaglia, G., "Ratios of Normal Variables and Ratios of
Sums of Uniform Variables," J. Am. Statist. Assoc., 60, 193-204 (1965).
Miller, M. H., and F. Modigliani, "Some Estimates of the Cost of Capital to
the Electric Utility Industry, 1954-57," Am. Econ. Rev., 56, 333-391
(1966). Moore, E. H., GeneralAnalysis, Part I, Philadelphia: Mere. Am.
Phil. Soc., Vol. I, 1935. Moore, H., "Notes of Sculpture," in B. Ghiselin
(Ed.), The Creative Process. Mentor Book, 1952. Nagar, A. L., "The Bias
and Moment Matrix of the General k Class Estimators of the Parameters in
Simultaneous Equations," Econometrica, 27, 575-595 (1959). Nerlove, M.,
"A Tabular Survey of Macro-Econometric Models," Intern. Eeon. Rev., 7,
12%173 (1966). Neyman, J., and E. L. Scott, "Consistent Estimates Based
on Partially Consistent Observations," Econometrica, 16, 1-32 (1948).
Orcutt, G. H., and H. S. Winkour, Jr., "First Order Autoregression:
Inference, Estima- tion, and Prediction," Econometrica, 37, 1-14 (1969).
Pearson, K. (Ed.), Tables of the Incomplete Beta Function. Cambridge:
Cambridge University Press, 1948. , The Grammar of Science. London:
Everyman, 1938. Penrose, R., "A Generalized Inverse for Matrices," Proc.
Cambridge Phil. Sot., 51, 406-413 (1955). Plackett, R. L., "Current Trends
in Statistical Inference," J. Roy. Statist. Soc., Series A, 129, Part 2, 249-267
(1966). Popper, K. R., The Logic of Scientific Discovery. New York:
Science Editions, 1961. Prescott, E. C., Adaptive Decision Rules for Macro
Economic Planning, unpublished doctoral dissertation, Graduate School of
Industrial Administration, Carnegie- Mellon University, 1967. Press, S: J.,
"On Control of Bayesian Regression Models," manuscript (undated). , "The
t-Ratio Distribution," J. Am. Statist. Assoc., 64, 242-252 (1969). and A.
Zellner, "On Generalized Inverses and Prior Information in Regression
Analysis," manuscript, September 1968. Raiffa, H. A., and R. S. Schlaifer,
Applied Statistical Decision Theory, Boston: Graduate School of Business
Administration. Harvard University, 1961. Rao, C. R., "A Note on a
Generalized Inverse of a Matrix with Applications to Prob- lems in
Mathematical Statistics," J. Roy. Statist. Soc., Series B, 24, 152-158 (1962).
, Linear Statistical Inference and Its Applications. New York: Wiley, 1965.
Reichenbach, H., The Rise of Scientific Philosophy. Berkeley: University of
California Press, 1958. Reiersol, O., "Identifiability of a Linear Relation
Between Variables which are Subject to Error," Econometrica, 18, 375-389
(1950).

420 BIBLIOGRAPHY Richardson, D. H., "The Exact Distribution of a


Structural Coefficient Estimator," J. Am. Statist. Assoc., 63, 1214-1226
(1968). Roberts, H. V., "Statistical Dogma: One Response to a Challenge,"
Am. Statist., 20, 25-27 (1966). Rothenberg, T. J., "A Bayesian Analysis of
Simultaneous Equation Systems," Report 6315, Econometric Institute,
Netherlands School of Economics, Rotterdam, 1963. Roy, S. N., Some
Aspects of Multivariate Analysis. New York: Wiley, 1957. Samuelson, P.
A., "Interactions between the Multiplier Analysis and the Principle of
Acceleration," Rev. Econ. Statist., 21, 75-78 (1939). Savage, L. J.,
"Bayesian Statistics," in Decision and Information Processes. New York:
Macmillan, 1962. , "Subjective Probability and Statistical Practice," in L. J.
Savage et al., The Foundations of Statistical Inference. London and New
York: Methuen and Wiley, 1962, pp. 9-35. , "The Subjective Basis of
Statistical Practice," manuscript, University of Michigan, 1961. Sawa, T.,
"The Exact Sampling Distribution of Ordinary Least Squares and Two-
Stage Least Squares Estimators," J. Am. Statist. Assoc., 64, 923-937
(1969). Shannon, C. E., "The Mathematical Theory of Communication,"
Bell System Tech. J. (July-October 1948), reprinted in C. E. Shannon and
W. Weaver, The Mathematical Theory of Communication. Urbana:
University of Illinois Press, 1949, pp. 3-91. Simon, H. A., "Dynamic
Programming under Uncertainty with a Quadratic Criterion Function,"
Econometrica, 24, 74-81 (1956). Smirnov, N. V., Tables for the Distribution
and Density Functions of t-Distribution. New York: Pergamon, 1961.
Solow, R. M., "On a Family of Lag Distributions," Econometrica, 28, 393-
406 (1960). Stein, C. M., "Confidence Sets for the Mean of a Multivariate
Normal Distribution," J. Roy. Statist. Soc., Series B, 24, 165-285 (1962). ,
"Inadmissibility of the Usual Estimator for the Mean of a Multivariate
Normal Distribution, in J. Neyman (Ed.) Proceedings of the Third Berkeley
Symposium on Mathematical Statistics and Probability, Vol. I. Berkeley:
University of California Press, 1956. Stone, M., "Generalized Bayes
Decision Functions, Admissibility and the Exponential Family," Technical
Report 74, Department of Statistics, University of Wisconsin, Madison,
Ann. Math. Statist., 38, 618-622 (1967). Summers, R., "A Capital Intensive
Approach to the Small Sample Properties of Various Simultaneous Equation
Estimators," Econometrica, 33, 1-41 (1965). Swamy, P. A. V. B., Statistical
Inference in Random Coefficient Regression Models, unpublished doctoral
dissertation, University of Wisconsin, Madison, 1968. Theil, H., "A Note on
Certainty Equivalence in Dynamic Planning," Econometrica, 25, 346-349
(1957). "On the Use of Incomplete Prior Information in Regression
Analysis," J. Am. Statistical Assoc., 58, 401-414 (1962). , and J. C. G.
Boot, "The Final Form of Econometric Equation Systems," Rev. Intern.
Statist. Inst., 30, 136-152 (1962). BIBLIOGRAPHY 421 Theil, H., and A.
S. Goldberger, "On Pure and Mixed Statistical Estimation in Economics,"
Intern. Econ. Rev., 2, 65-78 (1961). Thornber, H., "Applications of
Decision Theory to Econometrics," unpublished doctoral dissertation,
University of Chicago, 1966. , "Bayes Addendum to Technical Report 6603
'Manual for B34T--A Stepwise Regression Program'," Graduate School of
Business, University of Chicago, September 1967. , "Finite Sample Monte
Carlo Studies: An Autoregressive Illustration," J. Am. Statist. Assoc? 62,
801-818 (1967). , "The Elasticity of Substitution: Properties of Alternative
Estimators," manu- script, University of Chicago, 1966. Tiao, G. C., and W.
Y. Tan, "Bayesian Analysis of Random-Effect Models in the Analysis of
Variance. I. Posterior Distribution of Variance Components," Biometrika,
52, 37-53 (1965). Tiao, G. C., and A. Zellner, "Bayes' Theorem and the Use
of Prior Knowledge in Regression Analysis," Biometrika, 51, 219-230
(1964). , "On the Bayesian Estimation of Multivariate Regression," J. Roy.
Statist. Soc., Series B, 26, 277-285 (1965). Tocher, K. D., "Discussion on
Mr. Box and Dr. Wilson's Paper" J. Roy. Statist. Soc., Series B, 13, 39-42
(1951). Welch, B. L., and H. W. Peers, "On Formulae for Confidence Points
Based on Integrals of Weighted Likelihoods," J. Roy. Statist. Soc., Series B,
25, 318-324 (1963). Widder, D. V., Advanced Calculus. Englewood Cliffs,
N.J.: Prentice-Hall, 1947. Wright, R. L., "A Bayesian Analysis of Linear
Functional Relations," manuscript, University of Michigan, 1969.
Zarembka, P., "Functional Form in the Demand for Money," Social Systems
Research Institute Workshop Paper, University of Wisconsin, Madison,
1966, J. Am. Statist. Assoc., 63, 502-511 (1968). Zellner, A., "An Efficient
Method of Estimating Seemingly Unrelated Regressions and Tests for
Aggregation Bias," J'. Am. Statist. Assoc., 57, 348-368 (1962). ,
"Estimators for Seemingly Unrelated Regression Equations: Some Exact
Finite Sample Results," J'. Am. Statist. Assoc., 58, 977-992 (1963). ,
"Bayesian Inference and Simultaneous Equation Econometric Models,"
paper 15resented to the First World Congress of the Econometric Society,
Rome, 1965. , "On Controlling and Learning about a Normal Regression
Model," manuscript, 1966, presented to Information, Decision and Control
Workshop, University of Chicago. , "On the Analysis of First Order
Autoregressive Models with Incomplete Data," Inter. Eton. Rev., 7, 72-76
(1966). "Estimation of Regression Relationships Containing Unobservable
Inde- pendent Variables," Intern. Econ. Rev., 11,441-454 (1970). (Ed.),
Readings in Economic Statistics and Econometrics. Boston: Little, Brown,
1968. , and V. K. Chetty, "Prediction and Decision Problems in Regression
Models from the Bayesian Point of View," J'. Am. Statist. Assoc., 60, 608-
616 (1965).

422 BIBLIOGRAPHY Zellner, A., and M. S. Geisel, "Analysis of


Distributed Lag Models with Applications to Consumption Function
Estimation," invited paper presented to the Econometric Society,
Amsterdam, September 1968, and published in Econometrica, 38, 865-888
(1970). , "Sensitivity of Control to Uncertainty and Form of the Criterion
Function," in D. G. Watts (Ed.), The Future of Statistics. New York:
Academic, 1968. Zellner, A., and D. S. Huang, "Further Properties of
Efficient Estimators for Seemingly Unrelated Regression Equations,"
Intern. Econ. Rev., 3, 300-313 (1962). Zellner, A., J. Kmenta, and J. Drze,
"Specification and Estimation of Cobb-Douglas Production Function
Models," Econornetrica, 34, 784-795 (1966). Zellner, A., and C. J. Park,
"Bayesian Analysis of a Class of Distributed Lag Models," Econometric
Ann. Indian Econ. J., 13, 432-444 (1965). Zellner, A., and N. S. Revanker,
"Generalized Production Functions," Social Systems Research Institute
Workshop Paper 6607, University of Wisconsin, Madison, 1966, Rev. Econ.
Studies, 36, 241-250 (1969). Zellner, A., and U. Sankar, "Errors in the
Variables," manuscript, 1967. Zellner, A., and H. Theil, "Three-Stage Least
Squares: Simultaneous Estimation of Simultaneous Equations,"
Econornetrica, 30, 54-78 (1962). Zellner, A., and G. C. Tiao, "Bayesian
Analysis of the Regression Model'with Auto- correlated Errors," J. Am.
Statist. Assoc., 59, 763-778 (1964). Author Index Aigner, D. J., 36, 37
Anderson, T. W., 156,233,235,393,394, 396,398 Ando, A., 75,233
Anscombe, F. J., 11 Aoki, M., 319,348,357 Aristotle, 2 Arrow, K., 3,169
Barlow, R., 37 Barnard, G. A., vii Batten, A. P., 224 Bartholomew, D. J.,
286 Bartlett, M. S., 303 Bayes, T., 11 Bellman, R., 231,348 Black, S., vii
Boot, J. C. G., 102,194,244 Box, G. E. P., vii, 27,46,162, 163,164,
165,167,168,169,173, 175, 178,179,307 Brazer, H., 37 Brown, P. R., 149
Carlson, F. D., 146 Chenery, H., 169 Chetty, V. K., vii,
108,169,233,258,259, 260,261,263,314,316,331 Chow, G., 350,351
Cochran, W. G., 3 Collar, A. R., 96 Cook, M. B., 112 Cooper, R. V., vii
Copas, J. B., 190 Cox, D. R., 162,163,164, 165,167,168, 169,
173,175,178,179 Cragg, J. G., 28, 287 Crockett, J., 145 Cyert, R. M., 327
de Finetti, B., 9, 11 de Groot, M. H., 327 423 Deming, W. E., 11 de Wit, G.
M., 102,244 Dickey, J. M., 233,396,397,398 Drze, J., vii, 177,257,267,289,
327,361 Duesenberry, J. S., 75 Duncan, W. J., 96 EbSayyad, G. M.,129,141
Feldstein, M. S., 327 Fieller, E. C., 128 Fisher, F. M., 75 Fisher, R. A., 3, 8
Fisher, W. D., 321,329,330, 331,332 Frazer, R. A., 96 Freimet, M., 348
Friedman, M., 123,261,291,306,232 Friend, I., 145 Fuller, W. A., 96,203
Geary, R. C., 128 Geisel, M. S., vii, 207,209,210,211, 307,312,314,
316,320, 333,400 Gelsset, S., 224,226,233,390, 396, 397 Gibbs, W., 8
Gilbert, R. F., 240,243 Goldberger, A. S., v, vii, 36, 37,146 Good, I. J., vii,
11 Graybill, F. A., 75, 81 Greville, T. N. E., 77 Griliches, Z., 200,208,314
Grunfeld, Y., 102 Guttman, I., vii Haavelmo, T., 64, 65, 89,258,260,
261,263 Hadamard, J., 5, 6 Hanson, N. R., 5 Hartigan, J., 49, 50, 255
Hildreth, C., 41, 96 Hill, W. J., 307

424 Holt, C., 322 Hood, W. C., 64,258 Houthakker, H. S., 159 Huang, D.S.,
240 Hume, D., 2 AUTHOR INDEX Modigliani, F., 145,322 Moore, E. H.,
77 Moore, H., 1 Morgan, J. N., 37 Muth, J. F., 322 James, W., 115 Jeffreys,
H., vii, 2,4, 5, 7, 8, 9, 11, 12, 20, 31, 33, 35, 40, 42, 43, 44, 45, 46, 47, 48,
49, 50, 51, 52, 53, 58,153, 190, 217,219, 225,226,228,255, 256,289,292,
302, 304,305,306, 360, 395 Johnson, R. A., 32 Johnston, J., 124 Jones, R.,
145 Kakwani, N. C., 243 Kalaba, R., 348 Kaufman, G. M., 75, 23,3
Keeping, E. S., 375 Kendall, M. G., 114, 115,118, 119, 127, 164,
176,357,366,371, 373,384 Kenney, J. F., 375 Kiefer, J. C., 127,333 Klein,
L. R., 194 Kmenta, J., 177,240, 243,289,327 Koopmans, T. C., 64,258
Koyck, L., 200 Kullback, S., 256 Kuznets, S., 5 Laub, P.M., vii Le Cam, L.,
31 Leenders, C. T., 271 LindIcy, D. V., vi, vii, 20, 21, 31, 33, 40, 53,
58,124, 129,141,298,299, 300, 301,303,304, 318 Lu, J. Y., 96 Lucas, R.,
208,314 Luce, R. D., 292 Mach, E., 5 Madansky, A., 126 Maddala, G. S.,
208, 314 Marsaglia, G., 128,281 Martin, J. E., 96,203 Meiselman, D.,
261,306,323 Miller, M. H., 145 Minhas, B., 169 Nerlove, M., 248 Neyman,
J., 31,115,118, 119 Noble, B., vi Ockham, W., 8 Orcutt, G. H., 190 Park, C.
J., vii, 203 Pearson, K., 1, 39,365,370, 373, 374,376 Peers, H. W., 286
Penrose, R., 77 Pierce, G. S., 5 Plackett, R. L., 42 Pointare, H., 5 Popper, K.
R., 3 Prescott, E. C., 345,346,347,349,350, 351,352,353,354 Press, S. M.,
vii, 75,279, 330, 331 Price, R., 11 Raiffa, H., vii, 21, 58, 75,292, 375,387
Ramsey, J. B., 163 Rao, C. R., 77,329 Reichenbach, H., 2 Reiersol, O., 128,
130 Renyi, A., 20 Revankar, N. S., vii, 176,177,179,183 Richard, J. F., vii
Richardson, D. H., 147 Roberts, H. V., vii, 276 Robinson, E. A. G., 159
Rothenberg, T. M., 239,258,271 Roy, S. N., 390, 391,392 Samuelson, P. A.,
194, 195 Sankar, U., vii, 131,133,154, 169 Savage, L. M., vii, 14, 46,226
Sawa, T., 147 Schlaifer, R., vii, 21, 58, 75,375,387 Scott, E., 118, 119
Shannon, C. E., 43 Sharma, D., vii Simon, H. A., 322 Smffnov, N. V., 27,
64 AUTHORINDEX 425 Sobel, E., 146 Solow, R., 169,314,315 Stein, C.
M., 115,140 Stone, M., vii, 26 Stuart, A., 114, 115,118, 119, 127,164, 176,
357,366,371,373,384 Summers, R., 281 Swamy, P. A. V. B., vii, 145 Tan,
W. Y., 129 Theil, H., 100, 194, 272, 322 Thornber, H., vii, 26, 95,169,
171,172, 190, 201,307,312 Tiao, G. C., vii, 27, 46, 96, 98, 112, 129, 224,
229,252, 396,397 Tocher, K. D., 230 Varga, R. S., 209 Wallace, N., 208,
314 Watson, G. S., 146 Watts, D. G., 320, 333 Weaver, W., 43 Welch, B. L.,
286 Widder, D. V., 368 Winkour, H. S., Jr., 190 Wolfowitz, J., 127 Wright,
R. L., 154 Zarembka, P., 164 Zellner, A., 75,96, 98, 112, 131,133,145, 154,
176,177, 179, 183, 191,194, 203,207,210, 211,224,229,233, 240, 243,252,
261,267,272,289, 312, 320, 327,331,333,336, 396,397

Subject Index Absolute error loss function, 24, 25, 333, 334 Adaptive
control solution, 340ff Aimon distributed lag technique, 221 Appraisal of
alternative control solutions, 322-324, 334-336, 343-344, 351-353
Approximate posterior pdf's, 33, 34, 47, 96, 101-106, 110-112, 238,240
Asymptotic expansions, 110-112 Autocorrelation, 86-97 Autoregressive
process, first order, 186-191, 216-220 fast order with incomplete data, 191-
193 second order, 194-200 Average risk, 26 Bayes biographical note, 11
Bayes-Laplace procedure, 41, 42 Bayesian estimator, 26-27 Bayes'
Theorem, 10-11, 13-14 several sets of data, 17-18 Beta function, 36%368
Beta pdf, 373-375 Beta prime pdf, 375-376 Binomial distribution, 3840
Bivariate moment-cumulant inversion formulas, 112 Box:Cox analysis of
transformation, 162ff Certainty equivalence approach, 322ff Chi-square
pdf, central, 370-371 noncentral, 156 Chow's model of U.S. economy, 350-
351 Cobb-Douglas production function, 69, 84, 162, 177,182, 289
Comparing hypotheses, 291ff Conditional posterior pdf, 21 Confidence
intervals and regions, 27-28 Constant elasticity of substitution production
function, 162, 169ff Constant returns to scale, 69, 173 427 Control
problems, adaptive control solution, 340.343 adaptive decision rules, 352
certainty equivalence solution, 322ff cost of changing control variable, 324-
325,330-331 here and now solution, 338, 343 linear decision rules, 351-352
money multiplier model, 323-3 24 monopoly problem, 325-327 multiperiod
problems, 344ff multiple regression process, 327-331, 336-343 multivariate
regression process, 331-333 one period problems, 320ff perfect information
decision rules, 352 sensitivity of solutions to form of loss function, 333-336
sequential updating solution, 338-340, 343 simple regression process,
320.325, 333-336 simultaneous equation process, 346ff two period
problem, 336-343 Convergence of integrals, 368 Correlation coefficient,
228 Covariance matrix, diffuse prior pdf, 225-227 posterior pdf in
multivariate regression, 227 Deductive inference, 24 Degrees of belief and
probability, 8-10 Direct probability, 13 Distributed lag models, 220ff Almon
technique, 221 application to consumption function, 207ff, 312-317
generalizations, 213ff Solow's family, 314-317 Drze's results on
identification, 257-258 Dynamic properties of models, 194ff

428 Errors in variables, 114ff Errors in variables model, Bayesian


estimation, 132-145, 150-154 functional form, 123-127, 132, 144
identification, 121,128 inconsistency of maximum likelihood estimator,
118, 120, 126-127 inequality constraints, 129, 135-136, 160 instrumental
variable approach, 146ff maximum likelihood estimation, 123-132 negative
variance estimator problem, 129 prior information and identification, 122,
128, 253-258 structural form, 127-132, 145 Estimable functions, 81
Estimator, 25 Expected loss, 24 Expected (or average) risk, 26 Expected
utility, 74 Expected utility hypothesis, 24,292 Fisher's information matrix,
47 F pdf, 376-378 Friedman-Meiselman problem, 306 Friedman's
consumption model, 123,160 F test, 298 Full information maximum
likelihood estimation, 270-271 Fully recursive model, 249,250-25 2
Gamma function, 364-365,368 Gamma pdf's, 370-371 Generalized
distributed lag models, 213ff Generalized inverses, 77ff Generalized least
squares, 101,242-243 Generalized (or matrix) Student t pdf, 229-233,396-
399 Generalized production functions, 162, 176ff Grouped data and Pareto
distribution, 35-38 Haavelmo consumption models, 258-264 Helmert's
transformation, 155 Highest posterior density intervals and regions, 27
Housing expenditure model, 221-223 Hypothesis testing, 29 lff decision
theoretic approach, 295-297 likelihood ratio, 294-296,298 ldmdley's
paradox, 303-305 Lindley's procedure, 298-302 SUBJECt INDEX
Hypothesis testing (continued) posterior odds, 293ff Identification, 121-122,
128-129,254-258 Incidental parameters, 119, 127-138, 145 Inductive
inference, 4-5 rules for a theory of, 7-11 Information matrix, 47 Information
measure, 43, 50 Initial conditions, 87-88, 188 Instrumental variable
approach, 146ff Integration, numerical, iii, 400ff Interval estimation, 27-28
Inverse probability, 13 Inverted gamma pdf's, 371-373 Inverted Wishart
pdf, 227-228, 395-396 Investment multiplier, 63-65 Jeffreys' prior pdf's, 41-
45, 47-53 Jeffreys' rules for induction, 7-11 Just4dentification, 257 Klein's
Model I, 194 Kurtosis, 365 Lamda matrix, 96 Large sample properties of
posterior odds, 304, 311 Large sample properties of posterior pdf's, 31-34
Least squares, 77, 147 Leptokurtic, 366 Likelihood function, 14
approximation to posterior pdf, 31-34 Likelihood principle, 14-15
Likelihood ratio, 294, 296, 298 Limited information Bayesian analysis,
265-270 Limited information maximum likelihood estimates, 268 Lindley's
paradox, 303-305 Lindley's testing procedure, 298-302 Linear asymmetric
loss functions, 334 Linearized models, 96-97 Locally uniform prior pdf, 45
Log-normal pdf, 56 Loss function, 24-25 Marginal distribution of the
observations, 28-29 Marginal posterior pdf, 24 SUBJECT INDEX 429
Means, problem of several, 52-53, 114-122 Mesokurtic, 366 Minimum
mean square error estimator, 56 Missing observations and autoregressive
model, 191ff Money demand function, 164ff Moore-Penrose generalized
inverse, 78 Monte Carlo experiments, 171-172,190, 276-287 Moving
average error term, 201 Multicollinearity, 75 Multi-period control problems,
344ff Multiple regression model, 65ff autocorrelation, 93-97 comparing
regression models, 306-312 control problems, 327-331,336-344 diffuse
prior pdf, 66 informative prior pdf, 70 likelihood function, 65 pooling cross
section and time series data, 108ff pooling data with differing variances,
98ff posterior pdf's for parameters with diffuse prior, 65ff posterior pdf's for
parameters with informative prior, 70-71 predictive pdf, 72ff singular
moment matrix, 75ff Multiplier-accelerator model, 194,221 Multivariate
pdf's, double Student t, 101, 110-112 generalized (or matrix) Student t, 229-
233,396-399 inverted Wishart, 227-228,.395-396 normal, 379-383 normal-
Student t, 99, 110 Student t, 383-389 Wishart, 389-394 Multivariate
regression, 224ff diffuse prior pdf, 225-227 exact restrictions, 236-238, 243
informative prior pdf, 238-240 likelihood function for traditional model,
224-225 likelihood function for "seemingly unrelated" model, 241 posterior
pdf's, "seemingly unrelated" model with diffuse prior, 242-243 traditional
model with diffuse prior, 226ff Multivariate regression (continued)
traditional model with informative prior, 238-240 predictive pdf for
traditional model, 233-236 Natural conjugate prior pdf, 21, 71, 75, 238-
239,307 Non-Bayesian techniques, 11 Nonlinear models, 87ff, 96-97,162ff
Normal equations, 77 Normal mean problem, 14-17, 20 Normal means
problems, 52-53, 114-122 Normal mean standard deviation problem, 21-23
Normal pdf, multivariate, 379-383 univariate, 363-366 Normal-Student t
pdf, 99, 111 Nuisance parameters, 22 Numerical integration, vi, 400ff
Ockham's rule, 3, 8 One period control problems, 320ff Optimality
principle, 348 Overidentification, 257-258 Pareto distribution, grouped data,
35-38 ungrouped data, 34-35 Pdf's: see Univariate pdf's and Multivariate
pdf's Performance in repeated samples, 27, 276-277 Permanent income
model, 123-124,160 Platykurtic, 366 Point estimates and estimators, 24-27
Point prediction, 30-31 Pooling cross-section and time series data, 108ff
Posterior odds, 293�f composite hypotheses, 296-298 consumption
models, 312-316 distributed lag models, 312-316 large sample properties,
304, 311 Lindley's paradox, 303-305 regressjoe models, 306-312 simple
hypotheses, 292-294 Posterior pdf, 14 marginal and conditional, 21-23 large
sample properties, 31-34 Precision parameter, 15 Predictive intervals and
regions, 30-31 Predictive pdf, 29-30

430 SUBJECT INDEX Principle of inverse probability, 13 Principle of


optimality, 348 Prior information, 9-11 Prior pdf, 14, 18-21, 41-53 data-
based, 18-19 improper, 20 invariance properties, 47-50 Jeffreys' prior pdf's,
41-45, 47-53 locally univorm, 45 minimal information, 51-53 natural
conjugate, 21, 71, 76, 238-239, 307 non-data based, 18-19 vague or diffuse,
20 Probability, 9-11 degree of reasonable belief, 9-10 direct, 13 frequency
and other definitions, 9 inverse, 13 Quadratic loss function, 24-25 Quartic
loss function, 334 Quasi-Bayesian estimator, 26 Simultaneous equation
model (continued) large sample approximations, 266-267, 270-272
likelihood function, full system approach, approach, 272 fully recursive
model, 250 limited information approach, 267 triangular model, 253 limited
information Bayesian analysis, 265-270 observational equivalence, 254-255
reduced form system, 256-258 triangular model, 249, 252-253 Skewness,
365 Solow's family of distributed lag models, 314ff Square root error loss
function, 333 Student t pdf, generalized (or matrix), 229-233, 396-399
multivariate, 383-389 univariate, 366-369 Structural equations, 248ff
Sufficient statistics, 20-21 Recursive model, 249,250-252 Reduced form
coefficients, 257 Reduced form system, 257-258 Reductive inference, 5-7
Regression model (see Multiple regression, Multivariate regression, and
Simple regression) Reporting results, 40-41 Returns to scale, 69, 173,174,
176ff Risk function, 25 Samuelson multiplier-accelerator model, 194
Second order autoregressive model, 194ff Simple regression, 58ff, 86-92,
320-325, 333-336 Simplicity, 3 Simultaneous equation model, 248ff Drze's
results on identification, 257-258 full system analysis, 270-276 fully
recursive model, 249, 250-252 Haavelmo consumption models, 258-264
identification and prior information, 253-258 T-test, 300 Teaching Bayesian
inference, vi-vii Testing hypotheses (see Hypothesis testing) Three-stage
least-squares, 272, 274,276 Time series models, distributed lag models,
200ff first order autoregressive model, 186-190, 216-220 first order
autoregressive model with incomplete data, 191-193 second order
autoregressive model, 194-200 simultaneous equation model, 248ff
Triangular models, 252-253 Two-stage least-squares, 146-147, 266-267,
276 Underidentification, 257-258 Uniform prior pdf, 41ff Unity of science,
1-2 Univariate pdf's, beta, 373-375 beta prime, 375-376 binomial, 38-40 F,
376-378 gamma, 369-370 inverted beta, 375-376 Univariate pdf's
(continued) inverted gamma, 371-373 log-normal, 56 noncentral x 2 , 156
normal, 363-366 Pareto, 34-38 Student t, 366-369 x 2 , 370-371 Unusual
facts, 5 Utility function, 332 SUBJECT INDEX Variance-covariance
matrix, diffuse prior pdf, 225-227 posterior pdf for multivariate regression,
227 Vague prior information, 20 Wishart pdf, 227,389-394 pdf, central,
370-371 noncentral, 156 431

You might also like