GLMConstrained
1 Introduction
Let $\hat\beta$ be the posterior mode of the distribution in (5) and $\hat\eta_i = x_i^t\hat\beta$, $i = 1, \dots, n$, the corresponding linear predictors. Denoting $l'(\hat\eta_i; y_i) = [\partial \log f(y_i;\eta_i)/\partial\eta_i]_{\eta_i=\hat\eta_i}$ and $l''(\hat\eta_i; y_i) = [\partial^2 \log f(y_i;\eta_i)/\partial\eta_i^2]_{\eta_i=\hat\eta_i}$, a second-order Taylor expansion of $\log f(y_i;\eta_i)$ around $\hat\eta_i$ as a function of $\eta_i$ gives

$$\log f(y_i;\eta_i) \approx -\frac{1}{2\sigma_i^2}(w_i - \eta_i)^2 + \text{constant},$$

where $w_i := \hat\eta_i - l'(\hat\eta_i;y_i)/l''(\hat\eta_i;y_i)$ and $\sigma_i^2 := -1/l''(\hat\eta_i;y_i)$. Therefore, the $i$th data point is “transformed” into an observation $w_i$ that is approximately normally distributed with mean $\eta_i$ and variance $\sigma_i^2$, $i = 1, \dots, n$. Now, if we define $w$ and $\Sigma_w$ as

$$w := [w_1, \dots, w_n]^t, \tag{6}$$
$$\Sigma_w := \mathrm{diag}[\sigma_1^2, \dots, \sigma_n^2], \tag{7}$$
In this work, a non-informative prior for $\beta$ with support in the region $T$ defined in (4) is considered, i.e.,

$$\pi(\beta) \propto \mathbf{1}_T(\beta). \tag{10}$$
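Hence, substituting (10) into (9), we obtain

$$\begin{aligned} \pi_a(\beta \mid w) &\propto \exp\left\{-\tfrac{1}{2}(w - X\beta)^t \Sigma_w^{-1} (w - X\beta)\right\} \mathbf{1}_T(\beta) \\ &\propto \exp\left\{-\tfrac{1}{2}(\beta - \hat\beta)^t X^t \Sigma_w^{-1} X (\beta - \hat\beta)\right\} \mathbf{1}_T(\beta), \end{aligned} \tag{11}$$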
Bayesian Inference of Constrained Parameters in Generalized Linear Models 5
where $w$ and $\Sigma_w$ are given in (6) and (7), respectively. To obtain $\hat\beta$, the mode of $\pi(\beta \mid y)$ in (5), we proceed as follows: starting from an initial estimate of the mode, we successively compute $\tilde\beta$ in (12) and $\Sigma_a$ in (13) until $\tilde\beta$ converges. This process is equivalent to solving the system of $p$ nonlinear equations $\partial \log \pi(\beta \mid y)/\partial\beta = 0$ with the Newton-Raphson method.
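The iteration described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' code. Equations (12) and (13) are not reproduced in this excerpt, so the sketch assumes the usual weighted least squares forms $\tilde\beta = (X^t\Sigma_w^{-1}X)^{-1}X^t\Sigma_w^{-1}w$ and $\Sigma_a = (X^t\Sigma_w^{-1}X)^{-1}$, recomputing $w$ and $\Sigma_w$ from (6) and (7) at each step. For illustration only, it also assumes a Poisson response with log link, for which $l'(\eta;y) = y - e^{\eta}$ and $l''(\eta;y) = -e^{\eta}$, so that $w_i = \hat\eta_i + (y_i - e^{\hat\eta_i})/e^{\hat\eta_i}$ and $\sigma_i^2 = e^{-\hat\eta_i}$. The function names `solve`, `working_response` and `newton_raphson_mode` are hypothetical.

```python
import math


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix (copy of A)
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):  # back substitution
        s = sum(M[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (M[k][n] - s) / M[k][k]
    return x


def working_response(eta, y):
    """Return (w_i, sigma_i^2) of (6)-(7) for an ASSUMED Poisson/log-link model."""
    d1 = y - math.exp(eta)   # l'(eta; y)
    d2 = -math.exp(eta)      # l''(eta; y)
    return eta - d1 / d2, -1.0 / d2


def newton_raphson_mode(X, y, beta0, tol=1e-10, max_iter=100):
    """Iterate beta_tilde = (X' S^-1 X)^-1 X' S^-1 w until it stops changing.

    Returns the mode estimate and Sigma_a = (X' S^-1 X)^-1 (assumed form of (13)).
    """
    p = len(beta0)
    beta = list(beta0)
    for _ in range(max_iter):
        eta = [sum(xij * bj for xij, bj in zip(xi, beta)) for xi in X]
        ws = [working_response(e, yi) for e, yi in zip(eta, y)]
        # A = X^t Sigma_w^{-1} X and b = X^t Sigma_w^{-1} w
        A = [[sum(xi[j] * xi[k] / s2 for xi, (_, s2) in zip(X, ws)) for k in range(p)]
             for j in range(p)]
        b = [sum(xi[j] * w / s2 for xi, (w, s2) in zip(X, ws)) for j in range(p)]
        new = solve(A, b)
        converged = max(abs(u - v) for u, v in zip(new, beta)) < tol
        beta = new
        if converged:
            break
    # Columns of A^{-1}; A is symmetric, so they are also its rows.
    eye = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    sigma_a = [solve(A, e) for e in eye]
    return beta, sigma_a


if __name__ == "__main__":
    X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]  # toy design, not the paper's data
    y = [1, 2, 4, 7]
    beta_hat, sigma_a = newton_raphson_mode(X, y, [0.0, 0.0])
    print(beta_hat, sigma_a)
```

At a fixed point the score $X^t(y - \mu)$ vanishes, which is why the weighted least squares update and Newton-Raphson coincide here; for non-canonical links the same code applies once `working_response` uses the corresponding $l'$ and $l''$.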
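To estimate $E[h(\beta)]$ with respect to the distribution in (5), we apply the importance sampling technique (Gilks and Wild (1992)), i.e., given draws $\beta_1, \dots, \beta_m$ from $\pi_a(\beta \mid w)$,

$$\hat E[h(\beta)] = \sum_{i=1}^{m} \omega_i h(\beta_i), \qquad \omega_i = \frac{\pi(\beta_i \mid y)/\pi_a(\beta_i \mid w)}{\sum_{j=1}^{m} \pi(\beta_j \mid y)/\pi_a(\beta_j \mid w)}.$$

Sampling $\beta$ from $\pi_a(\beta \mid w)$ with the standard Gibbs sampler (using the full conditionals), followed by importance sampling to estimate $E[h(\beta)]$, will be denoted GIS.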
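A minimal sketch of GIS follows, under assumptions that are not in the text. The region $T$ of (4) is taken to be the nonnegativity constraint $\beta_j \ge 0$ for every $j$ (the kind of constraint studied by McDonald and Diamond (1990)); the response is again assumed Poisson with log link; and $\hat\beta$ and the precision matrix $Q = X^t\Sigma_w^{-1}X$ are supplied by the user. Each Gibbs step draws $\beta_j$ from its full conditional under (11), a normal with mean $\hat\beta_j - Q_{jj}^{-1}\sum_{k\ne j}Q_{jk}(\beta_k - \hat\beta_k)$ and variance $1/Q_{jj}$ truncated to $[0,\infty)$, by inverse-CDF sampling; the weights $\omega_i$ are then computed on the log scale. Because the weights are self-normalized and every draw lies in $T$, the unknown normalizing constants of $\pi(\beta \mid y)$ and $\pi_a(\beta \mid w)$ cancel. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def rtnorm_lower(mean, sd, lower, rng):
    """Draw from N(mean, sd^2) truncated to [lower, inf) by inverse-CDF sampling.

    Simple but not robust far in the tail (clamping below keeps inv_cdf valid).
    """
    lo = STD_NORMAL.cdf((lower - mean) / sd)
    u = lo + (1.0 - lo) * rng.random()
    u = min(max(u, 1e-15), 1.0 - 1e-15)  # inv_cdf needs 0 < u < 1
    return max(lower, mean + sd * STD_NORMAL.inv_cdf(u))


def gibbs_truncated_normal(mode, Q, n_iter, rng, lower=0.0):
    """Gibbs path for N(mode, Q^{-1}) restricted to beta_j >= lower (assumed T)."""
    p = len(mode)
    beta = [max(m, lower) for m in mode]  # feasible starting point
    path = []
    for _ in range(n_iter):
        for j in range(p):
            shift = sum(Q[j][k] * (beta[k] - mode[k]) for k in range(p) if k != j)
            cond_mean = mode[j] - shift / Q[j][j]
            beta[j] = rtnorm_lower(cond_mean, 1.0 / math.sqrt(Q[j][j]), lower, rng)
        path.append(list(beta))
    return path


def log_lik_poisson(beta, X, y):
    """Sum of log f(y_i; eta_i) for y_i ~ Poisson(exp(x_i^t beta)), constants dropped."""
    total = 0.0
    for xi, yi in zip(X, y):
        eta = sum(a * b for a, b in zip(xi, beta))
        total += yi * eta - math.exp(eta)
    return total


def log_approx(beta, mode, Q):
    """log pi_a(beta | w) from (11), up to an additive constant, for beta in T."""
    d = [b - m for b, m in zip(beta, mode)]
    p = len(d)
    return -0.5 * sum(d[j] * Q[j][k] * d[k] for j in range(p) for k in range(p))


def importance_weights(path, X, y, mode, Q):
    """Self-normalized weights omega_i proportional to pi(beta_i|y) / pi_a(beta_i|w)."""
    log_r = [log_lik_poisson(b, X, y) - log_approx(b, mode, Q) for b in path]
    top = max(log_r)  # subtract the maximum before exponentiating (log-sum-exp)
    r = [math.exp(v - top) for v in log_r]
    total = sum(r)
    return [v / total for v in r]


def gis_estimate(h, path, weights):
    """E-hat[h(beta)] = sum_i omega_i h(beta_i)."""
    return sum(w * h(b) for w, b in zip(weights, path))


if __name__ == "__main__":
    rng = random.Random(0)
    X, y = [[1.0]] * 4, [1, 2, 3, 6]      # toy intercept-only data
    mode, Q = [math.log(3.0)], [[12.0]]   # mode and X^t Sigma_w^{-1} X for these data
    path = gibbs_truncated_normal(mode, Q, 2000, rng)
    w = importance_weights(path, X, y, mode, Q)
    print(gis_estimate(lambda b: b[0], path, w))
```

Inverse-CDF sampling is used here only because it needs nothing beyond the standard library; any exact univariate truncated-normal sampler could replace `rtnorm_lower`.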
4 Example
The data in Table 1, taken from Breslow et al. (1983), concern respiratory cancer deaths among a cohort of smelter workers exposed to airborne arsenic trioxide. “Obs” and “Exp” are the observed and expected numbers of respiratory cancer deaths for the $i$th subcohort, $i = 1, \dots, 40$. Each worker is classified according to birthplace (U.S. or foreign), level of moderate arsenic exposure (“0”, “< 1”, “1–4”, “5–14”, “15+” years), and level of heavy arsenic exposure (“0”, “< 1”, “1–4”, “5+” years), giving 2 × 5 × 4 = 40 subcohorts. Breslow et al. (1983) considered
6 Gabriel Rodrı́guez-Yam et al.
A GIS path of length 10,000 was obtained for $\pi_a(\beta \mid w)$. Figure 1 shows the autocorrelations of some components of $\beta$ along the GIS sample. Chen et al. (2000) observe that a slow decay in the autocorrelations suggests slow mixing within a chain and, usually, slow convergence to the posterior distribution, and vice versa. The autocorrelations in Figure 1 decay quickly, so good mixing and fast convergence of GIS are expected.
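The quantities plotted in Figure 1 are sample autocorrelations of each component along the path. A minimal sketch of that estimator is below; the function names `sample_acf` and `ar1_chain`, the default of 30 lags (matching the horizontal axes of Figure 1), and the AR(1) toy chains used to contrast slow and fast decay are illustrative assumptions, not the paper's code.

```python
import random


def sample_acf(x, max_lag=30):
    """Sample autocorrelations r_0, ..., r_max_lag of a 1-D chain (divisor n at every lag)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n  # lag-0 autocovariance
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / n / c0
            for k in range(max_lag + 1)]


def ar1_chain(phi, n, rng):
    """Toy AR(1) chain x_t = phi * x_{t-1} + e_t; larger phi means slower mixing."""
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    slow = sample_acf(ar1_chain(0.95, 10000, rng))
    fast = sample_acf(ar1_chain(0.10, 10000, rng))
    print(round(slow[10], 2), round(fast[10], 2))  # slow decay vs. fast decay at lag 10
```

Applied to each column of a GIS path, a profile like `fast` (near zero after a few lags) corresponds to the behaviour described for Figure 1, while a profile like `slow` would signal poor mixing.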
Table 2 gives the parameter estimates for the model in (14). Column 3 contains the (unconstrained) MLG estimates, column 4 the estimates obtained by McDonald and Diamond (1990), and column 5 the constrained parameter estimates obtained with the GIS path.
5 Conclusions
6 Acknowledgements
References
Breslow, N.E., Lubin, J.H., Marek, P. and Langholz, B. (1983). Multiplicative models and cohort analysis. Journal of the American Statistical Association, 78, 1–12.
Chen, M-H., Shao, Q-M. and Ibrahim, J.G. (2000). Monte Carlo methods in Bayesian computation. New York: Springer.
Dellaportas, P. and Smith, A.F.M. (1993). Bayesian inference for generalized linear and proportional hazards models via Gibbs sampling. Applied Statistics, 42, 443–459.
Dunson, D.B. and Neelon, B. (2002). Bayesian inference on order constrained parameters in generalized linear models. Biostatistics Branch, National Institute of Environmental Health Sciences, MD A3-03.
Fosdick, L.D. (1963). Monte Carlo calculations on the Ising lattice. Methods in Computational Physics, 1, 245–280.
Gelfand, A.E., Smith, A.F.M. and Lee, T.M. (1992). Bayesian analysis of constrained parameter and truncated data problems using Gibbs sampling. Journal of the American Statistical Association, 87, 523–532.
Gelman, A., Carlin, J., Stern, H. and Rubin, D. (2004). Bayesian data analysis, 2nd ed. New York: Chapman and Hall/CRC.
Geweke, J. (1996). Bayesian inference for linear models subject to linear inequality constraints. In: W.O. Johnson, J.C. Lee and A. Zellner (Eds.), Modeling and Prediction: Honouring Seymour Geisser (pp. 248–263). New York: Springer.
Geyer, C.J. (1991). Constrained maximum likelihood exemplified by isotonic convex logistic regression. Journal of the American Statistical Association, 86, 415.
Geyer, C.J. (1994). Estimating normalizing constants and reweighting mixtures in Markov chain Monte Carlo. Revision of Technical Report No. 568, School of Statistics, University of Minnesota.
Gilks, W.R. and Wild, P. (1992). Adaptive rejection sampling for Gibbs sampling. Applied Statistics, 41, 337–348.
Hastings, W.K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57, 97–109.
McDonald, J.W. and Diamond, I.D. (1990). On the fitting of generalized linear models with nonnegativity parameter constraints. Biometrics, 46, 201–206.
Nelder, J.A. and Wedderburn, R.W.M. (1972). Generalized linear models. Journal of the Royal Statistical Society, Series A, 135, 370–384.
Rodríguez-Yam, G. (2003). Estimation for state-space models and Bayesian regression analysis with parameter constraints. Ph.D. Dissertation, Colorado State University, Fort Collins, CO.
Table 1: Observed and expected numbers of deaths in 40 subcohorts defined by birthplace and cumulative years working in heavy and moderate arsenic areas.
[Figure 1: six autocorrelation panels for $\beta_0, \dots, \beta_5$; horizontal axis: lag (0 to 30); vertical axis: autocorrelation (−1.0 to 1.0).]
Fig. 1: Autocorrelation plots of some components of the GIS path of length 5000.