
HOW TO BE A BAYESIAN IN SAS:

MODEL SELECTION UNCERTAINTY


IN PROC LOGISTIC AND PROC GENMOD

Ernest S. Shtatland, Sara Moore, Inna Dashevsky, Irina Miroshnik, Emily Cain, Mary B. Barton

Harvard Medical School, Harvard Pilgrim Health Care, Boston, MA

ABSTRACT

The SAS system is known not to support any more or less developed Bayesian method. At the same time, a Bayesian framework is the ideal environment for resolving the problem of model selection uncertainty (which is important for getting proper inference based on the model), though at the price of very complex and time-consuming algorithms. In this presentation, which is a continuation of our SUGI'2000 paper, the possibility of avoiding the complexities of fully developed Bayesian methods is discussed. A Bayesian-like approach to resolving the problem of model selection uncertainty in PROC LOGISTIC and PROC GENMOD is developed, while staying completely within the maximum-likelihood methodology. Only standard elements of the output are used, such as the likelihood, the Akaike information criterion, the Schwarz information criterion, etc., or some equivalent R2 measures discussed in Shtatland, Moore & Barton (2000). The proposed approach uses averaging and improves the model selection process by taking model uncertainty into account. The average of a (usually small) number of "good" models is often better than any one model alone. The improvement is seen in terms of the quality of predictions, more realistic confidence intervals, etc. Applications to some medical studies are discussed.

MODEL SELECTION AND INFERENCE: FREQUENTIST APPROACH

Model selection is a fundamental task in data analysis, widely recognized as central to good inference. It is also a very complex matter, so it is not surprising that there is no definitive breakthrough in this field and still no widely accepted model building strategy. At the same time, in the research community there is a clear need for such a strategy. Many researchers have come to the conclusion that the appropriate model selection criteria should be specified in the protocol for any study, including clinical trials (Lindsey & Jones (1998)), and that model selection should be considered an integral part of inference. Until recently, the relationship between model selection and inference was a one-way street: first we search for a reasonable model (optimal or sub-optimal in some sense), and then, conditioning on that single choice, we make statistical inference (confidence intervals, etc.). In other words, we proceed in our inference as if our chosen model were the true one, which is almost always incorrect. As a result, we ignore the model uncertainty uncovered in the search, underestimate the total uncertainty, and work with confidence intervals that are too narrow. In short, we can be overly optimistic, which is dangerous.
MODEL SELECTION: LIKELIHOOD RATIO TEST AND STEPWISE REGRESSION

Currently, there are two basic approaches to model selection in SAS PROC LOGISTIC: the classical approach based primarily on the likelihood ratio test (LRT), and the approach based on the family of information criteria, such as the Akaike information criterion (AIC) and the Schwarz or Bayesian information criterion (SIC or BIC). The classical approach through the LRT, though still widely used, is unsatisfactory because of three basic disadvantages:
a) it works only if nested models are compared;
b) the asymptotic chi-square approximation may be poor for small sample sizes;
c) the LRT is inconsistent: it inherently favors larger models unduly.
The most popular implementation of the LRT idea is stepwise selection, which is realized in both the multiple linear regression and logistic regression cases and which, in principle, can be implemented in generalized linear modeling as a whole.

In SAS PROC LOGISTIC, the most commonly used model selection methods are three automatic procedures: forward selection, backward elimination, and stepwise regression, which is essentially a combination of the previous two. Ideally, we expect that the final model selected by each of these procedures would be the same. This does often happen, but it is not guaranteed. All of them are based on the "importance" of a covariate, defined in terms of the statistical significance of the coefficient of the variable (Hosmer & Lemeshow, pp. 106-107). Significance is assessed via the likelihood ratio chi-square test, and at any step of the stepwise procedure the most important covariate is the one that produces the largest change in the log-likelihood relative to the model without that covariate (in other words, the one that would result in the largest likelihood ratio statistic). Likewise, the most important explanatory variable is the one with the smallest P-value. However, it is well known that the P-values used in stepwise selection procedures are not P-values in the traditional hypothesis testing sense. They should rather be thought of as indicators of relative importance among explanatory variables.

None of these automatic procedures is foolproof, and numerous warnings have been issued to use them with care. When using stepwise selection techniques we capitalize on chance, because we perform many significance tests to compare different combinations of explanatory variables. As a result, completely unrelated variables can be chosen by chance alone, and a thorough analysis is needed that examines the substantive importance of the variables in addition to their statistical significance. Any stepwise selection identifies candidates for a model solely on statistical grounds. A common modification of the stepwise selection procedure is to begin with a model that already contains some known important covariates (the INCLUDE= option), irrespective of their statistical significance.

As a whole, the stepwise selection approach is a very convenient and powerful technique. Unfortunately, it is too often misused, especially when researchers rely heavily on the result of the stepwise search as a single choice without further exploratory analysis. According to Breiman (1992), such usage has long been "a quiet scandal in the statistical community". Note that stepwise techniques, and the LRT as a whole, do not address overfitting or underfitting problems. As a result, a stepwise model choice can be biased, and it does not provide confidence intervals with the proper coverage. Confidence intervals produced with a stepwise procedure are falsely narrow, and cross-validation or bootstrapping techniques are usually needed to get more realistic results. Unfortunately, these techniques are not implemented in SAS.
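As a toy illustration of the LRT mechanics described above (the log-likelihood values are invented, not taken from the paper), the statistic for one added covariate is G = -2(logL_null - logL_full), compared with a chi-square distribution on 1 degree of freedom; for 1 df the tail probability can be written with the complementary error function:

```python
import math

def lrt_pvalue_df1(loglik_null: float, loglik_full: float) -> float:
    """P-value of the likelihood ratio test for ONE added covariate.

    G = -2 * (logL_null - logL_full) is asymptotically chi-square with
    1 df; for 1 df, P(chi2_1 > G) = erfc(sqrt(G / 2)).
    """
    g = -2.0 * (loglik_null - loglik_full)
    return math.erfc(math.sqrt(g / 2.0))

# Hypothetical maximized log-likelihoods for two nested logistic models;
# G = 3.84 sits almost exactly at the conventional 5% critical value.
print(round(lrt_pvalue_df1(-520.0, -518.08), 3))
```

This recovers the familiar rule that a one-covariate LRT statistic near 3.84 corresponds to a P-value near 0.05.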
MODEL SELECTION AND INFORMATION CRITERIA

To overcome the disadvantages of the LRT mentioned above, the family of information criteria was introduced. The basic idea behind the information criteria is penalizing the likelihood for model complexity, that is, for the number of explanatory variables used in the model. The most popular members of this family are the Akaike information criterion (AIC) and the Schwarz information criterion (SIC). AIC must be credited with being the first widely known measure that attempts to address the divergent requirements of model complexity and estimation accuracy (fit, likelihood). AIC and SIC can be defined in two different forms. The "smaller-is-better" form is defined by the equations:

AIC = -2logL(M) + 2*K
SIC = -2logL(M) + (logN)*K        (1)

where logL(M) and logL(0) are the maximized log likelihoods for the fitted model and the "null" model containing only an intercept term, N is the sample size, and K is the number of covariates (including the intercept). The "larger-is-better" form uses the equations:

AIC = logL(M) - K
SIC = logL(M) - (logN/2)*K        (2)

Having these two forms may seem confusing, especially when they are implemented simultaneously in the same procedure (SAS PROC MIXED, for example). Using both forms can be explained by statistical tradition. In any case, there is no problem in it, since (1) and (2) are related to each other by a one-to-one mapping. We can add that the most general form of AIC-type information criteria is

IC(c) = -2logL(M) + c*K        (3)

If c = 0, (3) is equivalent to the classical likelihood statistic. If c = 1, (3) is equivalent to the GLIM goodness-of-fit procedure based on plotting the deviance against degrees of freedom (Smith and Spiegelhalter (1980)). If c = 2, IC is identical to AIC. And lastly, if c = logN, (3) is equivalent to SIC. The question of what value of the parameter c to choose is not easy. Atkinson (1981) suggests that the range between 2 and 5 or 6 may provide "a set of plausible initial models for further analysis".

AIC and SIC have some optimal properties providing a certain justification for choosing these information criteria out of the entire family (3). AIC is derived on prediction-error grounds and as such has some minimax properties for prediction over the experimental region, but larger values of c may be required for extrapolation (Atkinson (1981)). Striving predominantly for good prediction, AIC may tend to select too many covariates. From the prediction standpoint, occasionally retaining an unnecessary covariate is preferable to occasionally omitting a necessary one. It is known (Stone (1977)) that c = 2 is asymptotically equivalent to a cross-validation criterion, a very important property of AIC. Unlike AIC, the Bayesian information criterion is consistent: the probability of incorrectly choosing the bigger model converges to 0. SIC also arises automatically out of the minimum coding approach (Dawid (1992)). Also important is that AIC and SIC, unlike the LRT, can be used to compare nested as well as non-nested models.
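The bookkeeping behind equations (1)-(3) can be made concrete with a small Python sketch (the log-likelihood values are hypothetical). It also demonstrates the one-to-one mapping between the two forms: the "smaller-is-better" and "larger-is-better" versions of AIC always rank a set of models identically.

```python
import math

def ic(logL: float, K: int, c: float) -> float:
    """General 'smaller-is-better' criterion IC(c) = -2*logL + c*K (eq. 3).

    c = 2 gives AIC; c = log(N) gives SIC/BIC.
    """
    return -2.0 * logL + c * K

def aic_small(logL, K):          # eq. (1)
    return ic(logL, K, 2.0)

def sic_small(logL, K, N):       # eq. (1)
    return ic(logL, K, math.log(N))

def aic_large(logL, K):          # eq. (2): larger is better
    return logL - K

# AIC_small = -2 * AIC_large, so minimizing one is maximizing the other.
models = [(-250.0, 2), (-245.1, 3), (-244.9, 4)]   # hypothetical (logL, K)
by_small = min(models, key=lambda m: aic_small(*m))
by_large = max(models, key=lambda m: aic_large(*m))
assert by_small == by_large
print("AIC choice:", by_small)
```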
Some applied statisticians strongly believe that in the future the emphasis will shift from studying the effect of a single covariate (after correction for confounders) to building prognostic models for individual cases. See Van Houwelingen (1997): "Maybe Akaike's information criterion will take over completely from the P-values… This asks for a different view on statistical model building." Thus, we see that the strong properties of AIC and SIC are often mutually complementary: SIC is consistent while AIC is not; AIC is good for prediction while SIC is better for extrapolation; SIC often performs better when the true model is very simple, while for relatively complex models AIC is consistently more accurate. The philosophy underlying AIC is that "truth" is high-dimensional, requiring many (possibly infinitely many) explanatory variables. By contrast, working with SIC we assume that the true model is low-dimensional (Buckland, Burnham, and Augustin (1997)). An applied researcher has to be capable of maneuvering between these "AIC-SIC truths" and reconciling them. Here is an example of such heuristic reconciling: if we want to avoid overfitting a model, we should use the more conservative criterion, such as SIC, sometimes at the cost of underfitting the model for finite samples, which leads to a significant increase in bias; if we want to avoid underfitting a model, then we should use the more liberal AIC. We will see below that AIC and SIC are also mutually complementary from a different, Bayesian standpoint.

Summarizing, we come to the following conclusions. First, AIC and SIC have some optimal, mutually complementary properties, and on this ground should be chosen out of the entire family of information criteria (this choice is implemented in SAS PROC REG, PROC LOGISTIC, and PROC MIXED). Second, there is no single information criterion that can play the role of a panacea in model selection. As a whole, information criteria resolve (at least partly) some problems related to the classical LRT approach:
(a) information criteria work in both nested and non-nested cases;
(b) information criteria are not tests of significance; as such, they do not indicate that the better of two models is "significantly better", but at the same time they do not depend on asymptotic chi-square approximations, which may be poor for small sample sizes (although, asymptotically, the use of AIC is equivalent to a stepwise procedure with a critical level of 15.7% (Lindsey & Jones (1998)));
(c) at least some of the information criteria (SIC, for example) are consistent.

Information criteria, originally designed specifically for prediction purposes in time series, are much more widely used now. Regarding their use in biostatistics and health care applications, see Van Houwelingen (1997) and Lindsey & Jones (1998).

There are two serious problems related to the information criteria. First, they are not automated. Second, it is still assumed that we will arrive at a single model: AIC-optimal, SIC-optimal, etc. The first problem is technical. If we have p = 10 possible explanatory variables (a comparatively small number), then there are 2^10 = 1024 possible models to compare. If p = 20 (which is still rather moderate), then the number of possible models is about one million. Thus, finding the best AIC or SIC model by complete enumeration is usually impractical, and we need some shortcuts. One possible solution to this problem is to use the stepwise selection method with the level of significance for entry, SLENTRY, close to one, for example SLENTRY=0.95. In this case we will most likely get a sequence of models starting with the null model (with the intercept only) and ending with the full model with all explanatory variables included.
The models in this sequence will be ordered in the way that maximizes the increment in likelihood at each step. Note that we use the stepwise procedure in a way different from the typical one. Instead of getting a single stepwise pick for a small SLENTRY, say 0.05, we plan to work with the whole sequence of models and calculate AIC and SIC for each of them. Thus, instead of comparing the values of AIC or SIC for all possible models, we have to do this only for the models in the sequence (about 10 vs. 1024 models for p = 10, and about 20 vs. 1,000,000 for p = 20). This is a huge gain in efficiency. Moreover, the behavior of AIC and SIC on this sequence is very simple and easy to interpret: as the number of covariates grows, the values of both AIC and SIC first decrease and then increase, with one minimum. And this minimum corresponds exactly to the AIC or SIC pick.
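This search over the stepwise sequence can be sketched in Python (the log-likelihood sequence below is invented for illustration): locate the AIC and SIC minima along the nested sequence; the models between the two picks form what the next sections call the AIC-SIC window.

```python
import math

def aic(logL, K):
    return -2.0 * logL + 2.0 * K

def sic(logL, K, N):
    return -2.0 * logL + math.log(N) * K

# Hypothetical stepwise sequence: model k has K = k + 1 parameters
# (intercept + k covariates); logL rises with each added covariate,
# with diminishing returns.
N = 1000
logL_seq = [-690.0, -660.0, -645.0, -641.0, -639.5, -639.0, -638.8]

aic_vals = [aic(l, k + 1) for k, l in enumerate(logL_seq)]
sic_vals = [sic(l, k + 1, N) for k, l in enumerate(logL_seq)]

aic_pick = min(range(len(logL_seq)), key=aic_vals.__getitem__)
sic_pick = min(range(len(logL_seq)), key=sic_vals.__getitem__)

# SIC penalizes harder (log N > 2 for N > 7), so its pick is never larger.
window = list(range(sic_pick, aic_pick + 1))
print("AIC pick:", aic_pick, "SIC pick:", sic_pick, "window:", window)
```

Both criteria decrease and then increase along the sequence, so each minimum is found by a single pass over the K models rather than over all 2^p subsets.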
The second problem related to the information criteria is much deeper, because it is connected to our tradition of selecting a single model, optimal or sub-optimal according to some uniquely chosen criterion. To overcome this limitation, we have to turn to the Bayesian approach.

MODEL SELECTION AND THE BAYESIAN APPROACH

It is well known (see Draper (1995), Chatfield (1995), Kass & Raftery (1995)) that the Bayesian approach with averaging across models is the most natural environment for resolving the problem of model selection uncertainty, superior to bootstrapping. Note that neither the Bayesian approach nor bootstrapping is implemented in SAS. It is also known that the fully developed Bayesian approach has two disadvantages. First, it is assumed that we know the prior distributions, while they are usually unknown, and any assumptions about these distributions "are not checkable, however many data are collected" (Nelder (1999)). This is a very important disadvantage that deters many statisticians from becoming Bayesians. The second disadvantage is a technical one: the difficulty of calculating the Bayes factors (the Bayes factor in the simple vs. simple hypothesis testing setting can be defined as the odds in favor of one model over the competing model) and the number of terms for Bayes averaging, which can be enormous (Kass and Raftery (1995)). Many methods have been proposed to overcome these problems: the Occam's window approach, to minimize the number of models for averaging; the Markov chain Monte Carlo method, to average over all the models; the intrinsic Bayes factor approach of Berger and Pericchi; the fractional Bayes factor method of O'Hagan; etc. (see Kass and Raftery (1995)). All these methods are technically very complex, and they have not yet resolved the problem. Also, all these techniques are unavailable to SAS users.

Fortunately, there exists a "shortcut" method that allows us to resolve the problem of unknown priors on the one hand and the formidable calculations on the other. It can be done by using AIC and SIC simultaneously. As shown in Kass and Raftery (1995) (see also Akaike (1983)), model comparisons based on AIC are asymptotically equivalent to those based on Bayes factors under the assumption that the precision of the priors is comparable to that of the likelihood (in other words, only if the information in the prior increases at the same rate as the information in the likelihood). This situation is considered not very typical, though not impossible (Kass and Raftery (1995), Carlin and Louis (1996), pp. 48-49). Much more usual is the situation when the prior information is small relative to the information provided by the data. In this case SIC should be used. According to Kass and Wasserman (1995), exp(-SIC/2) provides a surprisingly good approximation to the Bayes factor when the amount of information in the prior is equal to that in one observation (at least when comparing nested models). Thus, AIC and SIC can emulate the Bayesian approach in the two extreme and opposite situations: when the priors are as important as the likelihood (i.e., the data), and when the priors are almost of no importance at all.
This is one more example of AIC and SIC being mutually complementary, this time from a Bayesian standpoint. It emphasizes the particular significance of AIC and SIC in the family of information criteria, and suggests that we should pay special attention to the segment of the stepwise sequence between the SIC pick and the AIC pick. We call this segment the AIC-SIC window. By the way, we can always add to this segment some models that are substantively important; it should be remembered that both stepwise regression and information criteria are based solely on statistical grounds. By using the AIC-SIC window we get some important benefits of the Bayesian approach without its disadvantages. There is a variety of ways to use the AIC-SIC window. If we need a quick and reasonably reliable prediction, we can use the largest model in this window: the AIC-optimal model. If we are interested in the best model for description and interpretation, we can use the smallest model: the SIC pick. According to Kass and Wasserman (1995), SIC may be preferable to the fully Bayesian techniques: the intrinsic Bayes factors of Berger and Pericchi and the fractional Bayes factors of O'Hagan. Kass and Raftery (1995) think that SIC may be used for reporting scientific results, with other analyses omitted but serving as background support. But we can also use the AIC-SIC window in a purely Bayesian way, through averaging. The promising Bayesian model averaging approach to coping with model uncertainty should appeal not only to Bayesians but also to any "broad-minded" statistician. The key to the success of this approach lies in not having to choose the single best model, but rather in averaging over a variety of plausible competing models (Chatfield (1995)). We can average the models from the AIC-SIC window with weights

wk = exp(-AICk/2) / Σi exp(-AICi/2)        (4)

or

wk = exp(-SICk/2) / Σi exp(-SICi/2)        (5)

We would like to emphasize one more time that working with the AIC-SIC window approach has important advantages over the fully developed Bayesian approach. Prior distributions are usually unknown and can only be hypothesized; this is a major problem. Bayes factors are inherently sensitive to errors in the specification of prior distributions; this is another major and very serious limitation. The AIC-SIC window approach works without reference to any prior distribution. Still another difficulty is that Bayesian methods may lead to intractable computational problems, and all widely available statistical packages (including SAS) use classical frequentist methods. Besides the fact that the fully developed Bayesian approach is unavailable in SAS, we can add that some areas of research, such as clinical trials and epidemiology, are especially resistant to Bayesian methods (Freedman (1996)). The AIC-SIC window approach is much simpler in terms of computation: it uses only standard elements of PROC LOGISTIC, such as the likelihood, stepwise selection, AIC, SIC, etc., augmented by some relatively simple calculations.

EXAMPLES OF USING THE AIC-SIC WINDOW

We will give two examples of applying the AIC-SIC window approach. In Barton et al. (2000), the authors study the dependence of mammography utilization in a prepaid health care system on socioeconomic factors. PROC LOGISTIC is used, with N = 1667 cases and K = 6 models in the stepwise sequence. In this example, AIC and SIC demonstrate a very uncommon agreement: both choose as optimal the same model, with covariates age, age-squared, and income.
Thus, the AIC-SIC window contains only one model, which makes further work on choosing a submodel in, or averaging over, the window unnecessary. This is a very atypical situation.

The second example is related to an application of Poisson regression (Barton, Moore, Polk et al. (1999)). In this case, the number of cases is N = 992 and the number of models in the sequence is K = 10. The AIC-optimal model contains 7 covariates (including the intercept). The SIC-optimal model is a submodel of the AIC pick, with 4 covariates. The AIC-SIC window contains 4 models. Note that in this case SIC may over-penalize, placing too much weight on parsimony. As a result, the SIC-optimal model does not contain some substantively significant explanatory variables.

SAS MACRO FOR AIC-SIC WINDOWS IN LOGISTIC AND POISSON REGRESSION

With the enhanced capabilities in Version 8 to output the resulting statistics for many SAS statistical procedures, it is not difficult to write a SAS macro for model selection based on the AIC-SIC window approach, combining stepwise selection, information criteria, and the Bayesian averaging approach. The macro builds the model in three steps:
(1) A stepwise regression (or its PROC GENMOD analog) with the probability of entry high enough to allow construction of a sequence of models, starting with the null model (the intercept only) and ending with the full model with all explanatory variables of interest;
(2) Comparing the values of AIC and SIC for each model of the stepwise sequence, finding the AIC- and SIC-optimal models in this sequence, and constructing the AIC-SIC window;
(3) Using the AIC-SIC window. If we are interested in description and interpretation, then we should use the simplest model and run the SIC-optimal regression. If we need a quick but more or less reliable prediction, we can use the largest model in this window and run the AIC-optimal regression. If we need a more accurate prediction, we might use averaging over the AIC-SIC window. Kass and Raftery (1995) show that model averaging by the Occam's window and Markov chain Monte Carlo methods gives consistently and substantially better predictions than predictions based on any one model alone, for several data sets. We can expect similar effects when averaging over the AIC-SIC window. Note that we can apply the averaging approach only to large enough data sets, because the Bayesian interpretation of AIC and SIC is justified only asymptotically. Note also that Bayesian averaging does not lead to a simple model: according to Chatfield (1995), though "even a simple average is often as good as anything, the user does not receive a simple model to describe the data." It is interesting to note that the stepwise pick with the default SLENTRY of 0.05 (which is typically used) is usually located half-way between the AIC-optimal and SIC-optimal models. Thus, the default pick is usually too large for interpretation and too small for prediction, and users who trust stepwise regression blindly make a rather bad choice for both purposes. It is one more manifestation of "a quiet scandal in the statistical community" (Breiman (1992)).

Thus, before using the macro, the user needs to clarify the objectives of the model: either data description and interpretation, or a quick and simple prediction, or perhaps a more accurate prediction. All of these goals can be achieved using the AIC-SIC window and the macro based on it.
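The averaging in step (3) uses the weights (4) (or (5) with SIC in place of AIC). A minimal Python sketch of the arithmetic, with invented AIC values and per-model predicted probabilities standing in for PROC LOGISTIC output:

```python
import math

def akaike_weights(ics):
    """Weights w_k = exp(-IC_k/2) / sum_i exp(-IC_i/2)  (eq. 4/5).

    Subtracting min(ics) first is an overflow guard; the shift cancels
    in the ratio and leaves the weights unchanged.
    """
    m = min(ics)
    raw = [math.exp(-(x - m) / 2.0) for x in ics]
    s = sum(raw)
    return [r / s for r in raw]

# Hypothetical AIC values for the models in an AIC-SIC window, and each
# model's predicted probability for one new subject:
aics = [1290.0, 1289.0, 1290.4]
preds = [0.31, 0.35, 0.34]

w = akaike_weights(aics)
averaged = sum(wk * p for wk, p in zip(w, preds))
print([round(x, 3) for x in w], round(averaged, 3))
```

The model with the smallest AIC gets the largest weight, but the competing models still contribute, which is exactly how the window averaging accounts for model uncertainty.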
CONCLUSIONS

In this paper, we propose a model selection approach that combines the advantages of stepwise regression, information criteria, and Bayesian averaging. The basic message of the paper is that we should not ignore the model uncertainty uncovered in the search, nor underestimate the total uncertainty, and that we can afford to take this uncertainty into account by considering only a small number of candidate models.

REFERENCES

Akaike, H. (1983). Information measures and model selection. Bulletin of the International Statistical Institute, 50, 277-290.

Atkinson, A. C. (1981). Likelihood ratios, posterior odds and information criteria. Journal of Econometrics, 16, 15-20.

Barton, M. B., Moore, S., Shtatland, E. S. & Bright, R. (2000). The relationship of household income to mammography utilization in a prepaid health care system. (submitted).

Barton, M. B., Moore, S., Polk, S., Shtatland, E. S., Elmore, J. G. & Fletcher, S. W. (1999). Anxiety and health care utilization after false positive mammograms (abstract). Journal of General Internal Medicine, 14, 9.

Breiman, L. (1992). The little bootstrap and other methods for dimensionality selection in regression: X-fixed prediction error. Journal of the American Statistical Association, 87, 738-754.

Buckland, S. T., Burnham, K. P. & Augustin, N. H. (1997). Model selection: an integral part of inference. Biometrics, 53, 603-618.

Carlin, B. P. & Louis, T. A. (1998). Bayes and Empirical Bayes Methods for Data Analysis. New York: Chapman & Hall/CRC.

Chatfield, C. (1995). Model uncertainty, data mining and statistical inference. Journal of the Royal Statistical Society, Series A, 158, 419-466.

Dawid, A. P. (1992). Prequential analysis, stochastic complexity and Bayesian inference. In Bayesian Statistics 4, eds. J. M. Bernardo et al. Oxford: Oxford Science Publications, 109-125.

Draper, D. (1995). Assessment and propagation of model uncertainty (with discussion). Journal of the Royal Statistical Society, Series B, 57, 45-97.

Freedman, L. (1996). Bayesian statistical methods (a natural way to assess clinical evidence). British Medical Journal, 313, 569-570.

Hosmer, D. W. & Lemeshow, S. (1989). Applied Logistic Regression. New York: John Wiley & Sons.

Kass, R. E. & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90, 773-795.

Kass, R. E. & Wasserman, L. (1995). A reference Bayesian test for nested hypotheses and its relationship to the Schwarz criterion. Journal of the American Statistical Association, 90, 928-934.

Lindsey, J. K. & Jones, B. (1998). Choosing among generalized linear models applied to medical data. Statistics in Medicine, 17, 59-68.
H. (1997). Model selection: an integral part of
Nelder, J. A. (1999). Statistics for the millennium (from statistics to statistical science). The Statistician, 48, 257-269.

SAS Institute Inc. (1997). SAS/STAT Software: Changes and Enhancements Through Release 6.12. Cary, NC: SAS Institute Inc.

Shtatland, E. S., Moore, S. L. & Barton, M. B. (2000). Why we need an R2 measure of fit (and not only one) in PROC LOGISTIC and PROC GENMOD. SUGI'2000 Proceedings, Cary, NC: SAS Institute Inc., 1338-1343.

Smith, A. F. M. & Spiegelhalter, D. J. (1980). Bayes factors and choice criteria for linear models. Journal of the Royal Statistical Society, Series B, 42, 213-220.

Stone, M. (1977). An asymptotic equivalence of choice of model by cross-validation and Akaike's criterion. Journal of the Royal Statistical Society, Series B, 39, 44-47.

Van Houwelingen, H. C. (1997). The future of biostatistics: expecting the unexpected. Statistics in Medicine, 16, 2773-2784.

CONTACT INFORMATION:

Ernest S. Shtatland
Department of Ambulatory Care and Prevention
Harvard Pilgrim Health Care & Harvard
Medical School
126 Brookline Avenue, Suite 200
Boston, MA 02215
tel: (617) 421-2671
email: [email protected]
