
Annals of the Institute of Statistical Mathematics manuscript No.

(will be inserted by the editor)

Bayesian Inference of Constrained Parameters in Generalized Linear Models

Gabriel Rodríguez-Yam · Luis Colorado-Martínez · Gustavo Ramírez-Valverde

Universidad Autónoma Chapingo
KM. 35.5 Carretera México-Texcoco, Chapingo. C.P. 56230
E-mail: [email protected]

Universidad Autónoma de Yucatán
C-48 B #207 Frac. Vivas, Tizimín, Yucatán, México. C.P. 97700
E-mail: [email protected]

Corresponding author: Colegio de Postgraduados
KM. 36.5 Carretera México-Texcoco, Montecillo, México. C.P. 56230
E-mail: [email protected]

Received: date / Revised: date

Abstract In this paper we propose a Bayesian model to analyze a Generalized Linear Model (GLM) with known scale parameter when the regression parameters are subject to a set of linear inequality constraints. The analysis is based on a normal approximation to the posterior distribution of the regression parameters together with importance sampling. To draw from this approximation, a truncated multivariate normal distribution, a Gibbs sampler is implemented. To illustrate the method, a cohort dataset previously analyzed in the literature is used.

Keywords Bayesian inference · Linear constrained parameters · Gibbs sampler · Truncated normal distribution · Normal approximations · Importance sampling

1 Introduction

Constrained parameter problems arise in a wide variety of applications, including bioassay, actuarial graduation, ordinal categorical data, response surfaces, reliability development testing, and variance component models (Gelfand et al. (1992)). Bayesian calculations can be implemented routinely for constrained parameters by means of Markov chain Monte Carlo (MCMC) methods. As an example, Gelfand et al. (1992) provide a discussion of the Gibbs sampler

structures arising from rather general formulations of Bayesian parametric versions of constrained parameter and truncated data problems. The implementation of this standard Gibbs sampler reduces to identifying the appropriate full conditional distributions and methods for drawing from them. Chen et al. (2000) use simulation-based methods via the reweighting mixtures of Geyer (1994) to compute posterior quantities of the desired Bayesian posterior distribution in a constrained parameter problem. Few publications have focused on the Bayesian analysis of constrained generalized linear models. Geweke (1996) considers a Bayesian approach to the constrained inference problem in linear regression models. Rodríguez-Yam (2003) proposes an efficient Gibbs sampler implementation for simulation of a multivariate normal random vector subject to linear inequality constraints and uses it to solve the constrained linear regression problem. Unlike Geweke's approach, this implementation imposes no limitation on the number of constraints. Dellaportas and Smith (1993) proposed a Gibbs sampler which makes systematic use of the adaptive rejection algorithm of Gilks and Wild (1992). Dunson and Neelon (2002) propose a general Bayesian approach to inference for order-constrained parameters in generalized linear models: instead of choosing a prior distribution with support on the constrained space, they map draws from an unconstrained posterior density using an isotonic regression transformation. In the frequentist approach, McDonald and Diamond (1990) use the Kuhn-Tucker conditions to obtain maximum likelihood estimates for a generalized linear model with nonnegativity constraints on some or all of the regression parameters. Geyer (1991) uses the parametric bootstrap to obtain maximum likelihood estimates when the regression parameters are subject to order restrictions.

In this paper we propose a Bayesian analysis of the constrained parameter problem in generalized linear models. First, the posterior distribution of the regression parameters is approximated by a truncated normal distribution; with a sample from this approximation, functionals of $\beta$ can be estimated using importance sampling. Hence, the analysis reduces to sampling from a truncated multivariate normal distribution. To do so, the standard Gibbs sampler proposed by Gelfand et al. (1992) is implemented. Our method provides a straightforward computational procedure for Bayesian inference and does not require numerical or analytical sophistication from the user.

The organization of this paper is as follows. In Section 2 we provide a Bayesian framework for generalized linear models where the regression parameters are subject to a set of linear inequality constraints. In Section 3 we provide a methodology for the analysis of these models, and in Section 4 the methodology is illustrated using a cohort dataset previously analyzed by Breslow et al. (1983) and McDonald and Diamond (1990).

2 Bayesian Generalized Linear Models with Constraints

Generalized linear models, introduced by Nelder and Wedderburn (1972), provide a unifying procedure that is widely used for regression analysis. These

models are intended to describe non-normal responses $y_i$, $i = 1, \dots, n$, which are assumed to be realizations of random variables $Y_i$, independently distributed with mean $\mu_i$ and pdf belonging to the one-parameter exponential family

$$f(y_i; \theta_i) = \exp\left\{ a^{-1}(\phi_i)\,\big(y_i \theta_i - \psi(\theta_i)\big) + c(y_i; \phi_i) \right\}, \qquad (1)$$

where the canonical parameters $\theta_i$, $i = 1, \dots, n$, are unknown and the $a(\phi_i)$ $(> 0)$, $i = 1, \dots, n$, are known. It is assumed that $\theta_i = h(x_i^t \beta)$, where $h$ is a strictly increasing, sufficiently smooth function, $\beta$ is a vector of $p$ unknown regression coefficients, and $x_i$, $i = 1, \dots, n$, is a vector of $p$ covariates. For simplicity, it is assumed that the $\phi_i$ are known and that $X$, the design matrix with rows $x_i^t$, $i = 1, \dots, n$, has rank $p$. The likelihood function, based on the data $y := (y_1, \dots, y_n)^t$, for the model in (1) is given by

$$L(\beta) \propto \exp\left[ \sum_{i=1}^{n} a^{-1}(\phi_i)\,\big[ y_i h(x_i^t \beta) - \psi(h(x_i^t \beta)) \big] \right]. \qquad (2)$$

It is assumed that the regression parameters $\beta_1, \dots, \beta_p$ are subject to a set of linear inequality constraints

$$B\beta \le b, \qquad (3)$$

where $B$ is a $q \times p$ matrix of known constants and $b$ is a $q$-vector of known constants. These linear inequality constraints define a convex subset of $\Re^p$ given by

$$T := \{\, \beta \in \Re^p \mid B\beta \le b \,\}. \qquad (4)$$

Let $\pi(\beta)$ be a prior for $\beta$ with support $T$; then the posterior distribution of $\beta$ is given by

$$\pi(\beta|y) \propto L(\beta)\,\pi(\beta)\,I_T(\beta). \qquad (5)$$

The posterior $\pi(\beta|y)$ is proper as long as $n \ge p$ and the rank of $X$ equals $p$. Statistically, in the absence of prior information, the first condition requires that there be at least as many data points as parameters, and the second condition requires that the columns of $X$ be linearly independent in order for all $p$ coefficients of $\beta$ to be uniquely identified by the data (Gelman et al. (2004)). Due to the constraints in (4), direct sampling from (5) can be difficult.
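To fix ideas, the posterior (5) can be evaluated up to a normalizing constant by adding the indicator of $T$ to the log-likelihood. The sketch below is our illustration, not code from the paper; the names `log_posterior` and `loglik` are our own, and it assumes a prior that is flat on $T$, the case developed in Section 3:

```python
import numpy as np

def log_posterior(beta, loglik, B, b):
    """Unnormalized log posterior (5) with a prior that is flat on
    T = {beta : B beta <= b}: log L(beta) inside T, -inf outside.
    `loglik` is any user-supplied function returning log L(beta)."""
    if np.any(B @ beta > b):        # beta outside the constraint region T
        return -np.inf              # indicator I_T(beta) = 0
    return loglik(beta)
```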

3 Gibbs Importance Sampling (GIS)

In this section a methodology for estimating $E[h(\beta)]$, where $h(\cdot)$ is a real function and the expectation is computed with respect to the posterior distribution in (5), is proposed. The main idea consists in approximating the likelihood $L(\beta)$ in (2) by a truncated normal distribution. An approximation $\pi_a(\beta|w)$ to $\pi(\beta|y)$ is obtained when $L(\beta)$ in (2) is replaced by this approximation. When either a normal or a non-informative prior is used, $\pi_a(\beta|w)$ is a truncated multivariate normal distribution. If a sample from $\pi_a(\beta|w)$ were available, importance sampling (Fosdick (1963), Hastings (1970)) could be used to estimate $E[h(\beta)]$. To sample from $\pi_a(\beta|w)$, the standard Gibbs sampler (Gelfand et al. (1992)) can be used.

3.1 Normal approximation for $L(\beta)$

Let $\hat\beta$ be the posterior mode of the distribution in (5) and $\hat\eta_i = x_i^t \hat\beta$ the corresponding vector of linear predictors. Denoting $l'(\hat\eta_i; y_i) = [\partial \log f(y_i; \eta_i)/\partial \eta_i]_{\eta_i = \hat\eta_i}$ and $l''(\hat\eta_i; y_i) = [\partial^2 \log f(y_i; \eta_i)/\partial \eta_i^2]_{\eta_i = \hat\eta_i}$, a second-order Taylor expansion of $\log f(y_i; \eta_i)$ around $\hat\eta_i$ as a function of $\eta_i$ gives

$$\log f(y_i; \eta_i) \approx -\frac{1}{2\sigma_i^2}(w_i - \eta_i)^2 + \text{constant},$$

where $w_i := \hat\eta_i - l'(\hat\eta_i; y_i)/l''(\hat\eta_i; y_i)$ and $\sigma_i^2 := -1/l''(\hat\eta_i; y_i)$. Therefore, the $i$th data point is "transformed" to an observation $w_i$, normally distributed with mean $\eta_i$ and variance $\sigma_i^2$, $i = 1, \dots, n$. Now, if we define $w$ and $\Sigma_w$ as

$$w := [w_1, \dots, w_n]^t, \qquad (6)$$
$$\Sigma_w := \mathrm{diag}[\sigma_1^2, \dots, \sigma_n^2], \qquad (7)$$

then we can combine the $w_i$ to approximate the entire likelihood by a linear regression model of the vector $w$ in (6) on the matrix of explanatory variables $X$, with known variance matrix given by (7). Thus, the likelihood function in (2) is approximated by $L_a(\beta)$ given by

$$L_a(\beta) := \exp\left\{ -\frac{1}{2}(w - X\beta)^t \Sigma_w^{-1} (w - X\beta) \right\} I_T(\beta). \qquad (8)$$

Now, replacing $L(\beta)$ in (5) by $L_a(\beta)$ in (8), the following approximation $\pi_a(\beta|w)$ to the posterior distribution $\pi(\beta|y)$ in (5) is obtained:

$$\pi_a(\beta|w) \propto \exp\left\{ -\frac{1}{2}(w - X\beta)^t \Sigma_w^{-1} (w - X\beta) \right\} \pi(\beta)\, I_T(\beta). \qquad (9)$$

In this work a non-informative prior for $\beta$, with support on the region $T$ defined in (4), is considered, i.e.,

$$\pi(\beta) \propto I_T(\beta). \qquad (10)$$

Hence, substituting (10) into (9), we obtain

$$\pi_a(\beta|w) \propto \exp\left\{ -\frac{1}{2}(w - X\beta)^t \Sigma_w^{-1} (w - X\beta) \right\} I_T(\beta) = \exp\left\{ -\frac{1}{2}(\beta - \hat\beta)^t X^t \Sigma_w^{-1} X (\beta - \hat\beta) \right\} I_T(\beta). \qquad (11)$$

Therefore, $\pi_a(\beta|w)$ is a truncated multivariate normal density with mean $\tilde\beta$ and covariance matrix $\Sigma_a$ given by

$$\tilde\beta = (X^t \Sigma_w^{-1} X)^{-1} X^t \Sigma_w^{-1} w, \qquad (12)$$
$$\Sigma_a = (X^t \Sigma_w^{-1} X)^{-1}, \qquad (13)$$

where $w$ and $\Sigma_w$ are given in (6) and (7), respectively. To obtain $\hat\beta$, the mode of $\pi(\beta|y)$ in (5), we proceed as follows: given an initial estimate $\hat\beta$ of the mode, we compute successively $\tilde\beta$ in (12) and $\Sigma_a$ in (13) until convergence of $\tilde\beta$. This process is equivalent to solving the system of $p$ nonlinear equations, $\partial \log \pi(\beta|y)/\partial \beta = 0$, using the Newton-Raphson method.
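As a concrete illustration of this iteration, the sketch below (our code, not the authors'; the function name, the offset argument, and the convergence tolerance are our choices) carries out the scheme for the Poisson model with canonical log link used in Section 4, where $l'(\eta_i; y_i) = y_i - e^{\eta_i}$ and $l''(\eta_i; y_i) = -e^{\eta_i}$, so that $w_i = \hat\eta_i + (y_i - e^{\hat\eta_i})/e^{\hat\eta_i}$ and $\sigma_i^2 = e^{-\hat\eta_i}$:

```python
import numpy as np

def posterior_mode_poisson(X, y, offset=None, tol=1e-8, max_iter=50):
    """Iterate (12)-(13) to the mode for a Poisson GLM with log link
    and flat prior; returns beta_tilde and Sigma_a at convergence."""
    n, p = X.shape
    offset = np.zeros(n) if offset is None else offset
    beta = np.zeros(p)
    for _ in range(max_iter):
        eta = offset + X @ beta            # linear predictors eta_i
        mu = np.exp(eta)                   # l' = y - mu, l'' = -mu
        w = eta + (y - mu) / mu - offset   # transformed data w_i, offset removed
        XtW = X.T * mu                     # X^t Sigma_w^{-1}, since sigma_i^2 = 1/mu_i
        beta_new = np.linalg.solve(XtW @ X, XtW @ w)   # eq. (12)
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    Sigma_a = np.linalg.inv(XtW @ X)       # eq. (13)
    return beta, Sigma_a
```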

3.2 Sampling from $\pi_a(\beta|w)$

To draw from the truncated normal approximation $\pi_a(\beta|w)$, the standard Gibbs sampler given in Gelfand et al. (1992) is implemented. In this case the full conditionals of the approximation $\pi_a(\beta|w)$ are truncated univariate normals, i.e., $\beta_j | \beta_{-j} \sim N_{T_j}(\beta_j^*, \sigma_{jj}^*)$, where

$$T_j = \{\, \beta_j \in \Re : b_j \beta_j \le b - B_{-j} \beta_{-j} \,\}, \quad \beta_j^* = \tilde\beta_j + \Sigma_{-j}^t \Sigma_{-jj}^{-1} (\beta_{-j} - \tilde\beta_{-j}), \quad \sigma_{jj}^* = \sigma_{jj} - \Sigma_{-j}^t \Sigma_{-jj}^{-1} \Sigma_{-j},$$

$j = 1, \dots, p$. Here $\tilde\beta_j$ is the $j$th element of $\tilde\beta$ in (12), $\tilde\beta_{-j}$ is the vector obtained from $\tilde\beta$ by removing the $j$th element, $\Sigma_{-j}$ is the $j$th column of $\Sigma_a$ in (13) without the $j$th entry, $\Sigma_{-jj}$ is the matrix obtained from $\Sigma_a$ by removing both the $j$th column and the $j$th row, $\sigma_{jj}$ is the $(j,j)$ element of $\Sigma_a$, $B_{-j}$ denotes the matrix obtained from $B$ in (3) by removing the $j$th column $b_j$, and $b$ is the vector defined in (3).
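A direct implementation of this sampler might look as follows (a sketch with our own naming and no attempt at efficiency; scipy's `truncnorm` handles the univariate truncated normal draws, and the starting point is assumed to lie in $T$):

```python
import numpy as np
from scipy.stats import truncnorm

def gibbs_tmvn(mean, cov, B, b, n_draws, beta0, rng=None):
    """Coordinate-wise Gibbs sampler for N(mean, cov) truncated to
    T = {beta : B beta <= b}, following Section 3.2."""
    rng = np.random.default_rng() if rng is None else rng
    p = len(mean)
    beta = np.array(beta0, dtype=float)
    draws = np.empty((n_draws, p))
    for s in range(n_draws):
        for j in range(p):
            idx = [k for k in range(p) if k != j]
            # conditional mean and variance of beta_j given beta_{-j}
            S_inv = np.linalg.inv(cov[np.ix_(idx, idx)])
            s_j = cov[idx, j]
            m = mean[j] + s_j @ S_inv @ (beta[idx] - mean[idx])
            v = cov[j, j] - s_j @ S_inv @ s_j
            # truncation interval T_j induced by B beta <= b
            r = b - B[:, idx] @ beta[idx]
            lo, hi = -np.inf, np.inf
            for k in range(B.shape[0]):
                if B[k, j] > 0:
                    hi = min(hi, r[k] / B[k, j])
                elif B[k, j] < 0:
                    lo = max(lo, r[k] / B[k, j])
            sd = np.sqrt(v)
            beta[j] = truncnorm.rvs((lo - m) / sd, (hi - m) / sd,
                                    loc=m, scale=sd, random_state=rng)
        draws[s] = beta
    return draws
```

Since the conditional moments depend only on $\tilde\beta$ and $\Sigma_a$, in practice they can be precomputed once per coordinate rather than recomputed at every sweep as done above for clarity.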

3.3 Estimating functionals of $\beta$

To estimate $E[h(\beta)]$ with respect to the distribution in (5), we apply the importance sampling technique (Fosdick (1963), Hastings (1970)), i.e.,

$$\hat E[h(\beta)] = \sum_{i=1}^{m} \omega_i h(\beta_i), \qquad \omega_i = \frac{\pi(\beta_i|y)/\pi_a(\beta_i|w)}{\sum_{l=1}^{m} \pi(\beta_l|y)/\pi_a(\beta_l|w)},$$

where $\beta_1, \dots, \beta_m$ is the Gibbs sample from $\pi_a(\beta|w)$. The standard Gibbs sampler implementation for $\beta$ (using full conditionals) followed by importance sampling to estimate $E[h(\beta)]$ will be denoted GIS.
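In code, the weights $\omega_i$ are best computed on the log scale (a sketch; the function and argument names are ours, and both densities may be supplied unnormalized since the weights are self-normalized):

```python
import numpy as np

def gis_estimate(h, draws, log_post, log_approx):
    """Importance-sampling step of GIS: reweight Gibbs draws from
    pi_a(beta|w) to estimate E[h(beta)] under pi(beta|y)."""
    lw = np.array([log_post(bb) - log_approx(bb) for bb in draws])
    lw -= lw.max()                  # guard against overflow in exp
    omega = np.exp(lw)
    omega /= omega.sum()            # normalized weights omega_i
    return sum(w_i * h(b_i) for w_i, b_i in zip(omega, draws))
```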

4 Example

The data in Table 1, taken from Breslow et al. (1983), concern respiratory cancer deaths among a cohort of smelter workers exposed to airborne arsenic trioxide. "Obs" and "Exp" are the observed and expected numbers of respiratory cancer deaths for the $i$th subcohort, $i = 1, \dots, 40$. Each worker is classified according to birthplace (U.S. or foreign), level of moderate arsenic exposure ("0", "< 1", "1-4", "5-14", "15+" years), and level of heavy arsenic exposure ("0", "< 1", "1-4", "5+" years). Breslow et al. (1983) considered

a Poisson generalized linear model for $Y_i$, the number of respiratory cancer deaths in the $i$th subcohort, where each exposure level was contrasted with the first "baseline" level of no exposure, i.e.,

$$Y_i \overset{\text{indep}}{\sim} \text{Poisson}(\lambda_i), \quad i = 1, \dots, 40, \qquad (14)$$

where $\log \lambda_i = \log(e_i) + \beta_0 + \beta_1 x_{1i} + \cdots + \beta_8 x_{8i}$, $e_i$ is the expected number of respiratory cancer deaths in the $i$th subcohort, $\log(e_i)$ is an offset, $\beta_0$ is the intercept, $\beta_1$ is a parameter contrasting the two levels of birthplace, with U.S. born as baseline, $\beta_2, \beta_3, \beta_4, \beta_5$ are the parameters of the moderate arsenic exposure factor, and $\beta_6, \beta_7, \beta_8$ are the parameters of the heavy arsenic exposure factor. Breslow et al. (1983) and McDonald and Diamond (1990) assume the following monotonic constraints on the exposure levels for both arsenic exposure factors:

$$0 \le \beta_2 \le \beta_3 \le \beta_4 \le \beta_5, \qquad 0 \le \beta_6 \le \beta_7 \le \beta_8. \qquad (15)$$
Breslow et al. (1983) noticed that the maximum likelihood estimates do not satisfy these monotonic constraints. To take them into account, McDonald and Diamond (1990) used a reparameterization that contrasts adjacent exposure levels, so that only nonnegativity constraints are needed for each arsenic exposure factor, and removed the contrasts that did not satisfy the nonnegativity constraints.

Our procedure GIS can be used to study this constrained GLM. To start, notice that the linear inequality constraints in (15) define the convex subset of $\Re^9$

$$T := \{\, \beta \in \Re^9 \mid B\beta \le b \,\}, \qquad (16)$$
where

$$B = \begin{pmatrix}
0 & 0 & -1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & -1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & -1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & -1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & -1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & -1
\end{pmatrix}, \qquad
b = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}.$$
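In numpy, this pair can be written down directly (a sketch matching the display above; columns index $\beta_0, \dots, \beta_8$):

```python
import numpy as np

# Constraint matrix B (7 x 9) and vector b for (15)-(16)
B = np.zeros((7, 9))
B[0, 2] = -1.0                  # -beta_2 <= 0
B[1, [2, 3]] = [1.0, -1.0]      #  beta_2 - beta_3 <= 0
B[2, [3, 4]] = [1.0, -1.0]      #  beta_3 - beta_4 <= 0
B[3, [4, 5]] = [1.0, -1.0]      #  beta_4 - beta_5 <= 0
B[4, 6] = -1.0                  # -beta_6 <= 0
B[5, [6, 7]] = [1.0, -1.0]      #  beta_6 - beta_7 <= 0
B[6, [7, 8]] = [1.0, -1.0]      #  beta_7 - beta_8 <= 0
b = np.zeros(7)
```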
Thus, the posterior distribution of $\beta$ with support in (16) is given by

$$\pi(\beta|y) \propto \exp\left\{ \sum_{i=1}^{40} y_i x_i^t \beta - \sum_{i=1}^{40} e_i \exp(x_i^t \beta) \right\} I_T(\beta),$$

and the mean $\tilde\beta$ and covariance matrix $\Sigma_a$ of the normal approximation $\pi_a(\beta|w)$ in (11) are given by

$$\tilde\beta = \begin{pmatrix} 0.53 \\ 0.73 \\ -0.27 \\ 0.55 \\ 0.22 \\ 0.90 \\ 0.46 \\ 0.18 \\ 1.24 \end{pmatrix}, \qquad
\Sigma_a = \begin{pmatrix}
0.02 & -0.01 & -0.01 & -0.01 & -0.01 & -0.01 & -0.01 & -0.01 & -0.01 \\
-0.01 & 0.03 & 0.00 & 0.00 & -0.01 & -0.01 & 0.00 & 0.00 & 0.00 \\
-0.01 & 0.00 & 0.09 & 0.01 & 0.01 & 0.01 & -0.01 & 0.00 & 0.00 \\
-0.01 & 0.00 & 0.01 & 0.07 & 0.01 & 0.01 & 0.00 & -0.01 & 0.00 \\
-0.01 & -0.01 & 0.01 & 0.01 & 0.12 & 0.02 & 0.00 & -0.01 & 0.00 \\
-0.01 & -0.01 & 0.01 & 0.01 & 0.02 & 0.06 & -0.01 & -0.01 & 0.01 \\
-0.01 & 0.00 & -0.01 & 0.00 & 0.00 & -0.01 & 0.09 & 0.01 & 0.01 \\
-0.01 & 0.00 & 0.00 & -0.01 & -0.01 & -0.01 & 0.01 & 0.21 & 0.01 \\
-0.01 & 0.00 & 0.00 & 0.00 & 0.00 & 0.01 & 0.01 & 0.01 & 0.09
\end{pmatrix}.$$

A GIS path of length 10,000 was obtained for $\pi_a(\beta|w)$. Figure 1 shows the autocorrelations of some components of $\beta$ for the GIS sample. Chen et al. (2000) observe that slow decay of the autocorrelations suggests slow mixing within a chain, and usually slow convergence to the posterior distribution, and vice versa. In Figure 1 we observe a fast decay of the autocorrelations, so good mixing and fast convergence of GIS are expected.

Table 2 reports parameter estimates for the model in (14). Column 3 gives the unconstrained maximum likelihood estimates, column 4 the estimates obtained by McDonald and Diamond (1990), and column 5 the constrained estimates obtained with the GIS path.
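Putting the pieces together, a hypothetical end-to-end run of GIS on this example might look as follows. This is our own assembly of the sketches above, not the authors' code; it assumes `X`, `y`, and `e` hold the design matrix, observed deaths, and expected deaths from Table 1, and uses the zero vector (which satisfies $B\beta \le 0$) as a feasible starting point:

```python
import numpy as np

# mode and covariance of the normal approximation (Section 3.1)
beta_hat, Sigma_a = posterior_mode_poisson(X, y, offset=np.log(e))

def loglik(bb):
    """Poisson log-likelihood (up to a constant) for model (14)."""
    eta = np.log(e) + X @ bb
    return float(y @ eta - np.exp(eta).sum())

# unnormalized log density of the truncated normal approximation (11)
Sa_inv = np.linalg.inv(Sigma_a)
log_approx = lambda bb: -0.5 * (bb - beta_hat) @ Sa_inv @ (bb - beta_hat)

# Gibbs path (Section 3.2) and importance-sampling estimate (Section 3.3)
draws = gibbs_tmvn(beta_hat, Sigma_a, B, b, n_draws=10_000,
                   beta0=np.zeros(9))
post_mean = gis_estimate(lambda bb: bb, draws,
                         lambda bb: log_posterior(bb, loglik, B, b),
                         log_approx)
```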

5 Conclusions

In this manuscript, a Bayesian procedure has been implemented to analyze generalized linear models with known scale parameter when the regression parameters are subject to a set of linear inequality constraints. The number of linear constraints can exceed the number of regression parameters. The method, denoted GIS, is based on sampling from the truncated normal approximation to the posterior distribution of the regression parameters; to sample from this approximation, the standard Gibbs sampler proposed by Gelfand et al. (1992) is used. To estimate functionals of the regression parameters, importance sampling is applied. In the numerical example, the Gibbs implementation shows good mixing and fast convergence.

6 Acknowledgements

This work was supported in part by PROMEP.

References

Breslow, N.E., Lubin, J.H., Marek, P. and Langholz, B. (1983). Multiplicative models and cohort analysis. Journal of the American Statistical Association, 78, 1-12.
Chen, M-H., Shao, Q-M. and Ibrahim, J.G. (2000). Monte Carlo Methods in Bayesian Computation. New York: Springer.
Dellaportas, P. and Smith, A.F.M. (1993). Bayesian inference for generalized linear and proportional hazards models via Gibbs sampling. Applied Statistics, 42, 443-459.
Dunson, D.B. and Neelon, B. (2002). Bayesian inference on order constrained parameters in generalized linear models. Biostatistics Branch, National Institute of Environmental Health Sciences, MD A3-03.
Fosdick, L.D. (1963). Monte Carlo calculations on the Ising lattice. Methods in Computational Physics, 1, 245-280.
Gelfand, A.E., Smith, A.F.M. and Lee, T.M. (1992). Bayesian analysis of constrained parameter and truncated data problems using Gibbs sampling. Journal of the American Statistical Association, 87, 523-532.
Gelman, A., Carlin, J., Stern, H. and Rubin, D. (2004). Bayesian Data Analysis, 2nd ed. New York: Chapman and Hall/CRC.
Geweke, J. (1996). Bayesian inference for linear models subject to linear inequality constraints. In: W.O. Johnson, J.C. Lee, and A. Zellner (Eds.), Modeling and Prediction: Honouring Seymour Geisser (pp. 248-263). New York: Springer.
Geyer, C.J. (1991). Constrained maximum likelihood exemplified by isotonic convex logistic regression. Journal of the American Statistical Association, 86, 415.
Geyer, C.J. (1994). Estimating normalizing constants and reweighting mixtures in Markov chain Monte Carlo. Revision of Technical Report No. 568, School of Statistics, University of Minnesota.
Gilks, W.R. and Wild, P. (1992). Adaptive rejection sampling for Gibbs sampling. Applied Statistics, 41, 337-348.
Hastings, W.K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57, 97-109.
McDonald, J.W. and Diamond, I.D. (1990). On the fitting of generalized linear models with nonnegativity parameter constraints. Biometrics, 46, 201-206.
Nelder, J.A. and Wedderburn, R.W.M. (1972). Generalized linear models. Journal of the Royal Statistical Society, Series A, 135, 370-384.
Rodríguez-Yam, G. (2003). Estimation for State-Space Models and Bayesian Regression Analysis with Parameter Constraints. Ph.D. dissertation, Colorado State University, Fort Collins, CO.

Table 1: Observed and expected numbers of deaths in 40 states defined by birthplace and cumulative years working in heavy and moderate arsenic areas.

Years moderate          Years heavy arsenic
arsenic            0            < 1          1-4          5+           Total
               Obs.  Exp.   Obs. Exp.   Obs. Exp.   Obs. Exp.   Obs.  Exp.
U.S. Born
0               28  20.86     2  1.21     3  0.56     6  0.60    39  23.23
< 1              7   4.91     2  0.76     1  0.20     2  0.29    12   6.16
1-4              8   3.10     4  0.33     1  0.10     2  0.11    15   3.64
5-14             4   1.58     0  0.12     0  0.08     0  0.01     4   1.79
15+              4   1.14     1  0.11     0  0.05     0  0.03     5   1.33
Totals          51  31.59     9  2.53     5  0.99    10  1.04    75  36.15
Foreign Born
0               33   7.34     1  0.39     0  0.11     2  0.28    36   8.12
< 1              2   1.31     0  0.10     0  0.02     0  0.05     2   1.48
1-4              4   0.91     0  0.01     0  0.07     0  0.04     4   1.03
5-14             6   1.05     0  0.02     0  0.13     0  0.04     6   1.24
15+             16   1.60     3  0.20     0  0.10     0  0.01    19   1.91
Totals          61  12.21     4  0.72     0  0.43     2  0.42    67  13.78

Table 2: Estimates of the unconstrained and constrained regression coefficients (s.e. in parentheses). "MD" denotes the estimates of McDonald and Diamond (1990), whose reparameterization contrasts adjacent lower exposure levels; blank entries correspond to contrasts removed from their fit.

Explanatory        Regression    Unconstrained    Constrained estimate
variable           coefficient   estimate         MD            GIS
Constant           β0             0.53 (0.14)     0.49 (0.13)   0.43 (0.13)
Birthplace         β1             0.73 (0.17)     0.72 (0.17)   0.71 (0.18)
Years moderate arsenic exposure
  < 1              β2            -0.27 (0.29)                   0.13 (0.10)
  1-4              β3             0.55 (0.26)     0.46 (0.21)   0.43 (0.18)
  5-14             β4             0.22 (0.34)                   0.60 (0.20)
  15+              β5             0.90 (0.24)     0.49 (0.28)   1.01 (0.22)
Years heavy arsenic exposure
  < 1              β6             0.46 (0.30)     0.35 (0.26)   0.30 (0.20)
  1-4              β7             0.18 (0.46)                   0.55 (0.24)
  5+               β8             1.24 (0.31)     0.89 (0.38)   1.25 (0.28)

[Figure 1 here: six panels, one for each of β0, β1, β2, β3, β4, and β5, plotting autocorrelation (-1.0 to 1.0) against lag (0 to 30).]

Fig. 1: Autocorrelation plots of some components of the GIS path of length 5000
