Banerjee 2016 Spatial Data Analysis
Spatial Data Analysis
Sudipto Banerjee
Access provided by 2605:8d80:560:8c1a:38c2:b7c3:d45f:3573 on 10/03/23. For personal use only.
PU37CH04-Banerjee ARI 22 February 2016 10:23
INTRODUCTION
The emergence of highly efficient geographical information systems (GIS) databases and
associated computational resources has transformed the way spatial or geographical data are collected,
stored, managed, and analyzed. Researchers in diverse disciplines within the physical, social, and
environmental sciences and in public health are increasingly faced with the task of analyzing data
that are geographically referenced and often presented as maps. Consequently, the past decade has
seen significant development in statistical modeling of complex spatial data; for a variety of
methods and applications, see the texts by Cressie (16), Webster & Oliver (41), Cromley & McLafferty
(18), Møller (32), Schabenberger & Gotway (37), Waller & Gotway (40), Cressie & Wikle (17),
and Banerjee et al. (5), among others.
Following convention, spatial data are often classified into one of three basic types:
point-referenced data, point pattern data, and areal data. Point-referenced data sets consist of variables
(e.g., outcomes and predictors) that are linked to a specific point location, customarily referenced by
Annu. Rev. Public Health 2016.37:47-60. Downloaded from www.annualreviews.org
a coordinate system (e.g., longitude-latitude, easting-northing). Point-referenced data sets are not
uncommon in environmental monitoring for public health, where pollutants are often measured
at fixed spatial locations or monitoring stations. The spatial locations are considered fixed, and
investigators are usually interested in the spatial distribution of the measurements and in predicting
their levels at new spatial locations. Point pattern data refer to situations where the spatial locations
themselves correspond to random events. Examples include locations being reported as sites of the
occurrence of a particular disease. Areal data consist of variables that are aggregated over regions
as counts or rates. Areal data are more common in public health applications, where geospatial
referencing is not performed at very fine scales, such as GPS locations of households or small
neighborhoods, to protect the privacy of human subjects.
The Annual Review of Public Health has published two excellent reviews on spatial analytic
methods by Rushton (36) and Auchincloss et al. (1). This review differs from the previous ARPH
articles because of its emphasis on the advances made in formal statistical modeling and inference
for spatial data. It is beyond the scope of a single article to review all such methods. The
aforementioned texts offer more comprehensive coverage. This review focuses primarily on areal
data analysis because areal data are most conspicuous in public health. In fact, point patterns
are often reported as areal aggregates, i.e., counts, rates, or other summaries over well-delineated
spatial regions such as counties, census tracts, or zip codes, and subsequently modeled as areal
data. Within this context, the review briefly discusses disease mapping for single diseases and for
multiple diseases that may be associated with each other, as well as modeling of areally referenced
survival data.
chance variability from genuine differences. Statistical models that allow a more accurate depiction
of true disease rates by borrowing information from neighboring regions will help mitigate the
effects of sparsely populated regions and deliver better inference.
Perhaps the most conspicuous manner of modeling spatial dependence is to introduce spatially
associated random effects within a Bayesian hierarchical setting [see, for example, Banerjee et al.
(5)]. The Bayesian modeling and inferential framework is flexible and extremely rich in its
capabilities to accommodate various scientific hypotheses and assumptions. In particular, it provides a
cohesive framework for combining complex data models and external knowledge or expert
opinion. This review discusses spatial modeling within a Bayesian context. The models and illustrations
that follow are produced using Markov chain Monte Carlo (MCMC) simulation methods. Again,
it is beyond the scope of this review to discuss MCMC algorithms. Details on established MCMC
and other computational algorithms for spatial data can be found in the books by Møller (32),
Gelman et al. (22), and Robert & Casella (35).
where i ∼ j denotes that region j is a neighbor of region i. The CAR structure (2) reduces to
the well-known intrinsic conditionally autoregressive (ICAR) model [described in Besag et al.
(10)] if α = 1 or an independence model if α = 0. The ICAR model induces local smoothing by
borrowing strength from the neighbors, whereas the independence model assumes independence
of spatial rates and induces global smoothing. The smoothing parameter α in the CAR prior (2)
controls the strength of spatial dependence among regions, though it has long been appreciated
that a fairly large α may be required to deliver significant spatial correlation [see Wall (39) for
details on this]. Other variants of CAR models have been developed and applied to public health
problems by Leroux et al. (30) and Dean et al. (19).
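The effect of α can be checked numerically. The sketch below assumes that the CAR precision matrix has the form τ(D − αW) used for the multivariate model in this review, with W a binary adjacency matrix and D a diagonal matrix of neighbor counts. The four-region map and the function name `car_precision` are hypothetical and serve only as an illustration.

```python
import numpy as np

def car_precision(W, alpha, tau=1.0):
    """Return tau * (D - alpha * W), an assumed form of the CAR precision.

    W is a symmetric 0/1 adjacency matrix; D is diagonal and holds the
    number of neighbors of each region (the row sums of W).
    """
    W = np.asarray(W, dtype=float)
    D = np.diag(W.sum(axis=1))
    return tau * (D - alpha * W)

# Hypothetical map: four regions arranged in a row (1-2-3-4).
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])

Q_indep = car_precision(W, alpha=0.0)   # diagonal: independence model
Q_proper = car_precision(W, alpha=0.9)  # positive definite: proper CAR
Q_icar = car_precision(W, alpha=1.0)    # singular: intrinsic CAR (ICAR)

min_eig_proper = np.linalg.eigvalsh(Q_proper).min()
min_eig_icar = np.linalg.eigvalsh(Q_icar).min()

# Correlation between regions 1 and 2 implied by the proper CAR.
cov = np.linalg.inv(Q_proper)
corr_12 = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])
```

With α = 0 the precision matrix is diagonal, and with α = 1 its smallest eigenvalue is zero, so the ICAR has no proper joint density. Inspecting `corr_12` for several values of α is one way to see how large α must be before the implied correlation between neighbors becomes appreciable.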
$$Y_{ij} \overset{\text{ind}}{\sim} \text{Poisson}\left(E_{ij}\, e^{x_{ij}^{\top} \beta_j + \phi_{ij}}\right), \quad i = 1, \ldots, n, \; j = 1, \ldots, p, \tag{4}$$
where each xij is a vector of region-specific explanatory variables for disease j having (possibly
region-specific) parameter coefficients β j . The key problem here is to specify rich and flexible
spatial distributions for the φ ij s.
Carlin & Banerjee (11) and Gelfand & Vounatsou (21) generalized the univariate CAR (2)
to a joint model for the random effects φ ij , which permits modeling of correlation among the
p diseases while maintaining spatial dependence for each of the diseases. These models were
subsequently subsumed by more general, and flexible, Bayesian hierarchical frameworks developed
and implemented by Jin et al. (27, 28).
The idea in Jin et al. (28) is best expounded with p = 2 diseases. Let φ 1 be the n × 1 vector of
spatial random effects for the first disease, and let φ 2 be the same for the second disease. Jin et al.
(28) specify a joint spatial model for φ 1 and φ 2 by specifying a conditional distribution of φ 1 given
φ 2 and a marginal distribution for φ 2 . To achieve spatial smoothing, we assume that both these
distributions are CARs. More precisely, we write the joint density as
$$p(\phi_1, \phi_2) = N\left(\phi_2 \mid 0, [\tau_2 (D - \alpha_2 W)]^{-1}\right) \times N\left(\phi_1 \mid (\eta_0 I + \eta_1 W)\phi_2, [\tau_1 (D - \alpha_1 W)]^{-1}\right), \tag{5}$$
where η0 and η1 are the bridging parameters that associate the spatial effect for disease 1 in region
i with the effect for disease 2 in region i (through η0) and with the effects for disease 2 in neighboring
regions (through η1); ρ1 and ρ2 (written α1 and α2 in Equation 5) are smoothing parameters associated
with the conditional distribution of φ1|φ2 and the marginal distribution of φ2, respectively; and τ1 and
τ2 scale the precision of φ1|φ2 and φ2, respectively. The model in
Equation 5 yields a legitimate probability density as long as the two CAR distributions on the
right-hand side are valid, which means that the two dispersion matrices for φ 1 |φ 2 and φ 2 must be
positive definite. Jin et al. (28) provide conditions for these matrices to be positive definite.
Models where the spatial random effects are specified as in Equation 5 are known as generalized
multivariate conditionally autoregressive (GMCAR) models. The specification in Equation 5
subsumes several special cases in the multivariate disease mapping literature. Setting ρ1 = ρ2 = ρ and
η1 = 0 produces a model in which the association between the two diseases remains the same
across the regions. If we assume ρ1 = ρ2 and η0 = η1 = 0, then we ignore dependence between
the multivariate components, and the model turns out to be equivalent to fitting two separate
univariate CAR models. Finally, if we assume ρ1 = ρ2 = 0, η0 = 0, and η1 = 0, then the model
becomes a simple bivariate normal model with no spatial association.
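The conditional-then-marginal construction in Equation 5 can be simulated directly. The following is a minimal sketch that draws φ2 from its marginal CAR and then φ1 from its conditional CAR given φ2. It is not the BUGS implementation used by Jin et al. (28); the four-region map, the parameter values, and the function name `sample_gmcar` are hypothetical.

```python
import numpy as np

def sample_gmcar(W, tau1, tau2, alpha1, alpha2, eta0, eta1, rng):
    """Draw one (phi1, phi2) pair from the two-disease model of Equation 5."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    D = np.diag(W.sum(axis=1))
    # Marginal CAR for phi2: N(0, [tau2 (D - alpha2 W)]^{-1}).
    cov2 = np.linalg.inv(tau2 * (D - alpha2 * W))
    phi2 = rng.multivariate_normal(np.zeros(n), cov2)
    # Conditional CAR for phi1 | phi2 with mean (eta0 I + eta1 W) phi2.
    mean1 = (eta0 * np.eye(n) + eta1 * W) @ phi2
    cov1 = np.linalg.inv(tau1 * (D - alpha1 * W))
    phi1 = rng.multivariate_normal(mean1, cov1)
    return phi1, phi2

# Hypothetical map: four regions arranged in a row.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])

rng = np.random.default_rng(0)
draws = [sample_gmcar(W, 1.0, 1.0, 0.5, 0.5, eta0=0.5, eta1=0.2, rng=rng)
         for _ in range(2000)]
phi1 = np.array([d[0] for d in draws])
phi2 = np.array([d[1] for d in draws])

# Empirical correlation between the two diseases in the first region.
same_region_corr = np.corrcoef(phi1[:, 0], phi2[:, 0])[0, 1]
```

With positive η0 and η1 the two sets of random effects are positively correlated within a region. Setting both to zero in the call above removes the dependence between diseases, which corresponds to the separate univariate CAR special case.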
The above approach is appealing for two diseases, or perhaps at most for three diseases, but using
it to model several diseases at once has its limitations. An inherent problem with these methods
is that their conditional specification imposes a potentially arbitrary order on the variables being
modeled, as they lead to different marginal distributions depending on the conditioning sequence
[i.e., whether to model p(φ1 |φ2 ) and then p(φ2 ), or p(φ2 |φ1 ) and then p(φ1 )]. This problem is
somewhat mitigated in certain (e.g., medical and environmental) contexts where a natural order
is reasonable, but in many disease mapping contexts this is not the case.
To obviate the ordering issue, Jin et al. (27) developed an order-free, joint framework for
multivariate areal modeling that allows versatile spatial structures, yet is computationally feasible
for many outcomes. These are called coregionalized MCAR models, named after linear models
of coregionalization in multivariate geostatistics [see, e.g., Wackernagel (38)]. The underlying
idea here is to develop richer spatial association models using linear transformations of much
simpler spatial distributions. The objective is to allow explicit smoothing of cross-covariances
without being hampered by conditional ordering. In particular, suppose we assume a common
proximity specification for each component of the random effects vector, φ. Then, we could
write φ = Aψ, where ψ_j, the jth component of ψ, is a univariate intrinsic CAR with precision
parameter τ_j^2 and the component CAR models are mutually independent. The matrix A represents
the linear transformation that maps independent CAR effects for each disease to correlated CAR
Illustration
We illustrate with a brief example from Jin et al. (28), who modeled the numbers of deaths due to
cancers of the lung and esophagus between 1991 and 1998 across the 87 counties in Minnesota.
The county-level maps of the raw standardized mortality ratios (i.e., SMRi j = Yi j /Ei j ) shown in
Figure 1 exhibit evidence of correlation both across space and between cancers, motivating use
of our proposed GMCAR models. The bottom row shows the smoothed maps obtained from the
GMCAR model specified using a CAR prior for the conditional distribution [lung|esophagus] and
another CAR for the marginal distribution [esophagus].
We fit the model in Equation 4 to this data set. To determine $E_{ij}$, we account for
each county’s age distribution by calculating the expected age-adjusted number of deaths due
to cancer j in county i as
$$E_{ij} = \sum_{k=1}^{m} \omega_{jk} N_{ik}, \quad i = 1, \ldots, 87, \; j = 1, 2,$$
where
$$\omega_{jk} = \frac{\sum_{i=1}^{87} D_{ijk}}{\sum_{i=1}^{87} N_{ik}}$$
is the age-specific death rate for cancer j and age group k over all Minnesota
counties, $D_{ijk}$ is the number of deaths in age group k for county i and cancer j, and $N_{ik}$ is the
total population at risk in age group k for county i. Jin et al. (28) conducted exploratory analysis on
the basis of least-squares estimation as well as formal Bayesian model comparison methods to show
that a GMCAR model specified using CAR distributions for [lung|esophagus] and [esophagus]
was preferable to modeling [esophagus|lung]. The GMCAR models are easily implemented in the
Bayesian modeling language BUGS (see http://www.biostat.umn.edu/~brad/software.html
for the code and the data). Figure 2 presents maps of the smoothed standardized mortality
ratios (SMRs) for lung and esophagus cancer in Minnesota from the GMCAR.
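The age-adjusted expected counts and the raw SMRs described above can be computed in a few lines. The sketch below uses a small made-up data set (three regions, two diseases, two age groups) rather than the Minnesota data, and the function name `expected_counts` is hypothetical.

```python
import numpy as np

def expected_counts(deaths, population):
    """Internally standardized expected deaths E[i, j].

    deaths:     shape (n_regions, n_diseases, n_age_groups), entries D_ijk
    population: shape (n_regions, n_age_groups), entries N_ik
    """
    deaths = np.asarray(deaths, dtype=float)
    population = np.asarray(population, dtype=float)
    # omega[j, k]: death rate for disease j, age group k, pooled over regions.
    omega = deaths.sum(axis=0) / population.sum(axis=0)
    # E[i, j] = sum_k omega[j, k] * N[i, k]
    return population @ omega.T

# Made-up counts for illustration only.
deaths = np.array([[[2, 8], [1, 3]],
                   [[1, 5], [0, 2]],
                   [[3, 12], [1, 4]]])
population = np.array([[1000, 500],
                       [800, 300],
                       [1500, 700]])

Y = deaths.sum(axis=2)            # observed deaths Y_ij
E = expected_counts(deaths, population)
smr = Y / E                       # raw SMR_ij = Y_ij / E_ij
```

Because the age-specific rates are estimated from the same regions, the expected counts for each disease sum to the observed total, so the raw SMRs are centered around 1.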
Jin et al. (28) also reported that the estimate of the parameter η1 was statistically significant
for the GMCAR with [lung|esophagus] and not significant in the reverse order. We also saw
that the posterior distribution of the linking parameters η0 and η1 had mostly positive support,
Figure 1
Maps of raw standardized mortality ratios (SMRs) of lung and esophagus cancer in Minnesota between 1991 and
1998. Panels: lung cancer (legend classes from 0.6296 to 1.3333) and esophagus cancer (legend classes from 0 to 2).
meaning that the two cancers had positive spatial correlation. This is also evident from the maps
of the posterior means of the SMRs for the two cancers under the full model shown in Figure 2.
Incidence of the two cancers is clearly strongly correlated, with higher fitted ratios extending
from the Twin Cities metro area (eastern side, about one-third of the way up) to the mining-
and tourism-oriented north and northeast, regions where conventional wisdom suggests that
cigarette smoking may be more common.
The GMCAR delivered point and 95% equal-tail interval estimates of 0.602 and (0.0267,
0.979) for ρ1, and 0.699 and (0.0802, 0.973) for ρ2. These are spatial parameters, but although their
values lie between 0 and 1, they are not “correlations” in the usual sense; the moderate point
estimates and wide intervals suggest a relatively modest degree of spatial association in
the random effects. Note also that in this setup, ρ2 measures spatial association in the esophagus
random effects φ2, whereas ρ1 measures spatial association in the lung random effects φ1 given the
esophagus random effects φ2. Turning to τ1 and τ2, under the GMCAR we obtained 32.65 (16.98,
66.71) and 13.73 (4.73, 38.05) as our point and interval estimates, respectively. Because these
parameters measure spatial precision for each disease, they suggest slightly more variability in the
Figure 2
Maps of posterior means of standardized mortality ratios (SMRs) of lung and esophagus cancer in Minnesota
between 1991 and 1998 from the generalized multivariate conditionally autoregressive (GMCAR) model
with conditioning order [lung|esophagus]. Panels: lung cancer (legend classes from 0.7212 to 1.2237) and
esophagus cancer (legend classes from 0.7446 to 1.1668).
esophagus random effects, although again comparison is difficult here because τ 2 is a marginal
precision for φ 2 whereas τ 1 is a conditional precision for φ 1 given φ 2 .
The past decade has seen much demand for the analysis of spatially referenced survival data.
When each subject can be referenced with respect to a clinical site or geographical region, we
might suspect that random effects corresponding to proximate regions will be similar in
magnitude. Models for spatially arranged survival data customarily introduce spatial frailties, such as in
Banerjee et al. (7). How these spatial frailties are introduced in survival models depends on the
specific model. We briefly discuss a few alternate spatial survival models. Apart from the spatial
distribution for the frailties, one needs to model a spatial hazard function with the understanding
that expected survival times (or hazard rates) will be more similar in neighboring regions, owing
to underlying factors (access to care, willingness of the population to seek care, etc.) that vary
spatially. This expectation is in contrast to the similarity observed among survival times from
subjects in proximate regions, which is not necessarily implied by spatially associated frailties.
Let T be the waiting time for a subject to experience an event (e.g., disease onset, relapse, death).
The subject’s survival function is defined as $S(t) = P(T \geq t)$ and the hazard function as
$h(t) = f(t)/S(t)$, where $f(t)$ is the probability density function of T. Let (i, j) index the j-th subject in
region i and let $\{(t_{ij}, \delta_{ij}) : i = 1, 2, \ldots, I;\ j = 1, 2, \ldots, n_i\}$ be observations from n subjects in a
study, where $t_{ij}$ indicates the time at which either subject (i, j) experienced the event or the subject
was censored. Associated with each $t_{ij}$ is an event indicator, $\delta_{ij}$, where $\delta_{ij} = 1$ if the event occurred
before the termination of the study and $\delta_{ij} = 0$ if the subject was censored. For right-censored
data, we have the likelihood
$$\prod_{j=1}^{n_i} f(t_{ij})^{\delta_{ij}} S(t_{ij})^{1-\delta_{ij}} = \prod_{j=1}^{n_i} h(t_{ij})^{\delta_{ij}} S(t_{ij}). \tag{6}$$
If $\delta_{ij} = 1$, then subject j contributes $f(t_{ij}) = h(t_{ij})S(t_{ij})$ to the likelihood, whereas if $\delta_{ij} = 0$, then
it contributes $S(t_{ij})$ to the likelihood. Cox & Oakes (15) provide the corresponding expressions
for left-censored and interval-censored data.
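On the log scale, the right-censored likelihood in Equation 6 is a sum of δ log h(t) + log S(t) over subjects. The sketch below evaluates it for one region under a constant hazard. The constant hazard is chosen only because its maximizer has a closed form and is not a model used in this review; the data and function names are hypothetical.

```python
import numpy as np

def right_censored_loglik(t, delta, log_hazard, log_survival):
    """Log of Equation 6: sum_j [delta_j * log h(t_j) + log S(t_j)]."""
    t = np.asarray(t, dtype=float)
    delta = np.asarray(delta, dtype=float)
    return float(np.sum(delta * log_hazard(t) + log_survival(t)))

# Made-up follow-up times; delta = 1 marks an observed event, 0 censoring.
t = np.array([2.0, 5.0, 3.5, 7.0, 1.2])
delta = np.array([1, 0, 1, 0, 1])

def exp_loglik(rate):
    # Constant hazard: h(t) = rate, S(t) = exp(-rate * t).
    return right_censored_loglik(
        t, delta,
        log_hazard=lambda s: np.full_like(s, np.log(rate)),
        log_survival=lambda s: -rate * s,
    )

# For a constant hazard the maximizer is (number of events) / (total time).
rate_hat = delta.sum() / t.sum()

# The same quantity written in the f^delta * S^(1 - delta) form.
direct = float(np.sum(delta * (np.log(rate_hat) - rate_hat * t)
                      + (1 - delta) * (-rate_hat * t)))
```

Censored subjects contribute only through the survival term, which is why they still carry information about the hazard even though no event was observed for them.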
Let xij be a p×1 vector of observed explanatory variables associated with subject (i, j). To account
for heterogeneity in the population, most survival models will introduce these explanatory variables
in Equation 6 in the hazard function. For example, the proportional hazards model stipulates that
$$h(t \mid x_{ij}) = h_0(t) \exp(x_{ij}^{\top} \beta), \tag{7}$$
where $h_0(t)$ is a baseline hazard function affected only multiplicatively by the exponential term
involving the explanatory variables. Another option is a “proportional odds” model (9), which
requires the survival function for subject (i, j) to satisfy
$$\frac{S(t \mid x_{ij})}{1 - S(t \mid x_{ij})} = \frac{S_0(t)}{1 - S_0(t)} \exp(x_{ij}^{\top} \beta). \tag{8}$$
Yet another alternative is the accelerated failure time model. Here, the survival function for
subject (i, j) is $S(t) = S_0(t/\gamma_{ij})$, where $S_0(t)$ is any parametric survival function and $\gamma_{ij} = \exp\{x_{ij}^{\top} \beta\}$.
The corresponding hazard function for subject (i, j) is $h(t) = h_0(t/\gamma_{ij})/\gamma_{ij}$, where $h_0(t)$ is the
hazard derived from $S_0(t)$. In each of the above situations, the hazard function can be modeled
using parametric or nonparametric statistical methods. The data-analytic settings where the above
specifications are appropriate, or not, have been comprehensively explored and documented in the
survival analysis literature. For example, the proportional odds model posits that the hazard ratio
approaches unity over time, i.e., the covariate effects on the hazards disappear over time, which
is clearly distinct from the proportional hazards model. The interpretation of the regression
component significantly differs. The term exp{x β} in the proportional odds model reflects the
change in the odds of survival (or failure, depending on the parameterization) given the observed
covariates or risk factors.
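The contrast between the three specifications can be seen numerically. The sketch below assumes a Weibull baseline S0(t) = exp(−t^1.5), chosen only for illustration, and a single linear predictor value of 1; all function names are hypothetical. It compares the hazard ratio over time under proportional hazards and under proportional odds, with the odds of survival parameterized as in Equation 8.

```python
import numpy as np

def baseline_survival(t, shape=1.5):
    """Hypothetical Weibull baseline S0(t) = exp(-t**shape)."""
    return np.exp(-np.asarray(t, dtype=float) ** shape)

def survival_ph(t, lin_pred):
    # Proportional hazards: h = h0 * exp(lin_pred), so S = S0 ** exp(lin_pred).
    return baseline_survival(t) ** np.exp(lin_pred)

def survival_po(t, lin_pred):
    # Proportional odds: S / (1 - S) = S0 / (1 - S0) * exp(lin_pred).
    s0 = baseline_survival(t)
    theta = np.exp(lin_pred)
    return theta * s0 / (1.0 - s0 + theta * s0)

def survival_aft(t, lin_pred):
    # Accelerated failure time: S(t) = S0(t / gamma), gamma = exp(lin_pred).
    return baseline_survival(np.asarray(t, dtype=float) / np.exp(lin_pred))

def hazard(surv_fn, t, lin_pred, eps=1e-6):
    # Numerical hazard h(t) = -d/dt log S(t) by central differences.
    upper = np.log(surv_fn(t + eps, lin_pred))
    lower = np.log(surv_fn(t - eps, lin_pred))
    return -(upper - lower) / (2.0 * eps)

t_grid = np.array([0.5, 1.0, 2.0, 3.0])
lin_pred = 1.0

# Hazard ratios relative to a subject with linear predictor 0.
hr_ph = hazard(survival_ph, t_grid, lin_pred) / hazard(survival_ph, t_grid, 0.0)
hr_po = hazard(survival_po, t_grid, lin_pred) / hazard(survival_po, t_grid, 0.0)
```

Under proportional hazards the ratio stays at exp(1) for every time point, whereas under proportional odds it moves toward 1 as t grows, which is the fading covariate effect described above.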
Li & Ryan (31) provided the basis for legitimate likelihood-based inference from
semiparametric spatial survival models. They proposed modeling the hazard function nonparametrically
and the spatially correlated frailties using different spatial covariance functions. These models
were applied to the East Boston Asthma Study to detect prognostic factors leading to childhood
asthma. Henderson et al. (23) proposed using multivariate Gamma distributions to investigate
spatial association and variation in the survival of acute myeloid leukemia patients in northern
England. Banerjee et al. (7) proposed a Bayesian hierarchical framework to introduce spatially
correlated frailties and compared performances between frailties modeled using Markov random
field and geostatistical covariance functions. Data from a large infant mortality study in the state
of Minnesota were analyzed. Subsequent papers explored Bayesian semiparametric modeling (2),
spatiotemporal modeling (3, 8), semiparametric proportional odds models with spatial frailties (6),
joint survival and longitudinal modeling with frailties (44), and parametric accelerated failure time
models (42). Finally, we refer the reader to Lawson et al. (29) for spatial survival models that do
not deploy spatial frailties.
In this section, we work with a two-parameter Weibull distribution specification for the density
function $f(t \mid \rho_i, \eta_{ij})$, where we allow the Weibull shape parameter ρ to vary across the regions, and η,
which may serve as a link to covariates in a regression setup, to vary across individuals. Therefore,
$$f(t \mid \rho_i, \eta_{ij}) = \rho_i t^{\rho_i - 1} \exp\left(\eta_{ij} - t^{\rho_i} \exp(\eta_{ij})\right).$$
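The Weibull density above implies the survival function S(t) = exp(−t^ρ exp(η)) and the hazard h(t) = ρ t^(ρ−1) exp(η). The sketch below codes the three functions and checks numerically that they are mutually consistent. The parameter values are arbitrary and the function names are hypothetical.

```python
import numpy as np

def weibull_survival(t, rho, eta):
    """S(t | rho, eta) = exp(-t**rho * exp(eta))."""
    t = np.asarray(t, dtype=float)
    return np.exp(-(t ** rho) * np.exp(eta))

def weibull_density(t, rho, eta):
    """f(t | rho, eta) = rho * t**(rho - 1) * exp(eta - t**rho * exp(eta))."""
    t = np.asarray(t, dtype=float)
    return rho * t ** (rho - 1.0) * np.exp(eta - (t ** rho) * np.exp(eta))

def weibull_hazard(t, rho, eta):
    """h(t) = f(t) / S(t) = rho * t**(rho - 1) * exp(eta)."""
    t = np.asarray(t, dtype=float)
    return rho * t ** (rho - 1.0) * np.exp(eta)

rho, eta = 1.5, -0.5              # arbitrary illustrative values
grid = np.linspace(0.0, 4.0, 4001)
f = weibull_density(grid, rho, eta)

# Trapezoidal integral of f over [0, 4]; should match 1 - S(4).
cdf_at_4 = float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(grid)))
```

Because η enters the hazard only through the factor exp(η), adding covariates or frailties to η gives a proportional hazards model with a Weibull baseline.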
Banerjee & Carlin (4) analyze smoking cessation data using interval-censored spatial cure rate
models. The outcome of interest is the time for a subject to relapse into smoking. Here, we observe
only a time interval $(t_{ijL}, t_{ijU})$ within which the event (smoking relapse) is known to have occurred.
For patients who did not resume smoking prior to the end of the study, we have $t_{ijU} = \infty$, yielding
the case of right-censoring at time point $t_{ijL}$. Thus we now set $\nu_{ij} = 1$ if subject ij is
interval-censored (i.e., the subject has experienced the event) and $\nu_{ij} = 0$ if the subject is right-censored.
Following Finkelstein (20), the general interval-censored cure rate likelihood is given by
$$\prod_{i=1}^{I} \prod_{j=1}^{n_i} [S(t_{ijL} \mid \rho_i, \eta_{ij})]^{N_{ij} - \nu_{ij}} \{N_{ij} [S(t_{ijL} \mid \rho_i, \eta_{ij}) - S(t_{ijU} \mid \rho_i, \eta_{ij})]\}^{\nu_{ij}}$$
$$= \prod_{i=1}^{I} \prod_{j=1}^{n_i} [S(t_{ijL} \mid \rho_i, \eta_{ij})]^{N_{ij}} \left\{ N_{ij} \left[ 1 - \frac{S(t_{ijU} \mid \rho_i, \eta_{ij})}{S(t_{ijL} \mid \rho_i, \eta_{ij})} \right] \right\}^{\nu_{ij}}.$$
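If $N_{ij} \overset{\text{iid}}{\sim} \text{Ber}(\theta_{ij})$, then the marginal likelihood obtained by summing over the $N_{ij}$s is
$L(\{(t_{ijL}, t_{ijU})\} \mid \{\rho_i\}, \{\theta_{ij}\}, \{\eta_{ij}\}, \{\nu_{ij}\})$ and can be written as
$$\prod_{i=1}^{I} \prod_{j=1}^{n_i} S^{*}(t_{ijL} \mid \theta_{ij}, \rho_i, \eta_{ij}) \left\{ 1 - \frac{S^{*}(t_{ijU} \mid \theta_{ij}, \rho_i, \eta_{ij})}{S^{*}(t_{ijL} \mid \theta_{ij}, \rho_i, \eta_{ij})} \right\}^{\nu_{ij}}. \tag{9}$$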
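The step from the conditional likelihood to Equation 9 can be verified for a single subject. The text does not define S∗ explicitly; the sketch below assumes S∗(t) = 1 − θ + θS(t), which is what summing over a Bernoulli(θ) latent N yields. That form is an inference from the formulas above and not a quotation, and all function names are hypothetical.

```python
import numpy as np

def conditional_lik(n_latent, nu, s_lower, s_upper):
    """One subject's contribution given the latent N (0 or 1).

    [S(t_L)]^(N - nu) * {N * [S(t_L) - S(t_U)]}^nu
    """
    return s_lower ** (n_latent - nu) * (n_latent * (s_lower - s_upper)) ** nu

def marginal_lik(theta, nu, s_lower, s_upper):
    """Sum the conditional contribution over N ~ Bernoulli(theta)."""
    return ((1.0 - theta) * conditional_lik(0, nu, s_lower, s_upper)
            + theta * conditional_lik(1, nu, s_lower, s_upper))

def s_star(theta, s):
    """Assumed population survival: 1 - theta + theta * S(t)."""
    return 1.0 - theta + theta * s

def eq9_term(theta, nu, s_lower, s_upper):
    """One factor of Equation 9 written in terms of S*."""
    sl = s_star(theta, s_lower)
    su = s_star(theta, s_upper)
    return sl * (1.0 - su / sl) ** nu

theta = 0.3
# Interval-censored subject (nu = 1): event between t_L and t_U.
interval_case = (marginal_lik(theta, 1, 0.8, 0.5), eq9_term(theta, 1, 0.8, 0.5))
# Right-censored subject (nu = 0): t_U is infinite, so S(t_U) = 0.
right_case = (marginal_lik(theta, 0, 0.8, 0.0), eq9_term(theta, 0, 0.8, 0.0))
```

Both pairs agree, so under this assumed S∗ the marginal likelihood has the same interval-censored form as before, with S replaced by S∗.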
As with the covariates, we introduce the frailties φ i through the Weibull link as intercept terms in
the log-relative risk; that is, we set ηi j = xij β + φi . Here we allow the φ i to be spatially correlated
across the regions; similarly we would like to permit the Weibull baseline hazard parameters, ρ i , to
be spatially correlated. A natural approach in both cases is to use a univariate CAR prior. Although
one may certainly employ separate, independent CAR priors on φ ≡ {φi } and ζ ≡ {log ρi }, another
option is to use a bivariate CAR model for the δi = {φi , ζi } = {φi , log ρi }. For further details, see
Banerjee & Carlin (4).
Illustration
We present part of a more elaborate analysis of data from a smoking cessation study reported
by Murray et al. (33), which is of particular relevance to studies of lung health and primary
cancer control. For our illustration here, we restrict attention to 223 subjects from 54 zip codes
in southeastern Minnesota. These subjects were all smokers at study entry and were randomized
into either a smoking intervention (SI) group or a usual care (UC) group, which received no
antismoking intervention. Over a consecutive five-year monitoring period between
1994 and 1998, each of these subjects was known to have quit smoking at least once.
The event of interest is whether they relapse into smoking (resume smoking). The
raw data revealed that 29.7% resumed smoking, producing an empirical cure fraction of 0.703.
Additional information available for each subject includes sex, years as a smoker, and the average
number of cigarettes smoked per day prior to the quit attempt.
As is not unusual in spatial data sets, the 54 zip codes that contributed the data were not
contiguous, which made it difficult to fit neighborhood-based models. Banerjee & Carlin (4)
considered 81 contiguous zip codes shown in Figure 3, which included the 54 dark-shaded regions
Figure 3
Map showing a missingness pattern for the smoking cessation data between 1994 and 1998 from 54 zip codes
in southeastern Minnesota: Lightly shaded regions are those having no responses.
that had patients in the data set; the 27 regions that did not contribute patients were treated as if
the data were missing.
Table 1 presents estimated posterior quantiles for the fixed effects β, cure fraction θ, and
hyperparameters. As expected, smoking intervention produces a significant decrease in the log
relative risk of relapse. Women seem to be more likely to relapse than men. This result is often
attributed to the (real or perceived) risk of weight gain following smoking cessation. The number
of cigarettes smoked per day seems to be less significant. Somewhat counterintuitively,
shorter-term smokers relapse sooner, which may be attributable to subjects
being better able to quit smoking as they age.
CONCLUDING REMARKS
This article has provided a glimpse of the different types of statistical spatial models available
for analyzing regionally aggregated data (or areal data) and the type of statistical inference that
is obtained from such models. Although the illustrations provided here aggregated the data over
a number of years and did not attempt to model associations across time, such associations can
also be modeled by allowing the spatial random effects to vary across time. Also, this review has
restricted attention to the CAR models, which are especially congruous with Bayesian statistical
inference. Other types of spatial dependence structures, such as simultaneous autoregressive (SAR)
models, are very popular, and perhaps better suited, for maximum-likelihood-based inference.
Comparisons between these models can be found in Wall (39). Several other variants of such
models, including spatiotemporal extensions, can be found in Banerjee et al. (5) and references
therein.
SUMMARY POINTS
1. Statistical modeling and scientific inference using spatially referenced data sets are
becoming increasingly common in public health research. Examples include disease
mapping and spatial survival analysis.
2. Researchers are formulating more complex spatially oriented hypotheses that require
FUTURE ISSUES
1. As the accessibility to GIS and related computational resources continues to expand,
spatial statisticians are encountering increasingly complex data sets with more demanding
research questions. The scope for spatial modeling and analysis within public health will
continue to expand, ushering in new domains of application.
2. A large part of methodological research will be devoted to the development of probability
models, estimation methods, and computational algorithms for analyzing such data sets.
3. Statistical methods for analyzing spatially referenced data sets are computationally
expensive and become infeasible for large data sets. As spatial data sets become larger,
statisticians are encountering so-called “big data” problems in geostatistics. This
area has started to garner much attention over the past five years or so and is seeing
increasing research activity with regard to statistical models, methods, and algorithms
for massive spatial data sets.
DISCLOSURE STATEMENT
The author is not aware of any affiliations, memberships, funding, or financial holdings that might
be perceived as affecting the objectivity of this review.
LITERATURE CITED
1. Auchincloss AH, Gebreab SY, Mair C, Diez Roux AV. 2012. A review of spatial methods in epidemiology,
2000–2010. Annu. Rev. Public Health 33:107–22
2. Banerjee S, Carlin B. 2002. Spatial semiparametric proportional hazards models for analyzing infant
mortality rates in Minnesota counties. In Case Studies in Bayesian Statistics, Vol. VI, ed. C Gatsonis,
R Kass, A Carriquiry, A Gelman, D Higdon, et al., pp. 137–52. New York: Springer
3. Banerjee S, Carlin B. 2003. Semiparametric spatiotemporal frailty modeling. Environmetrics 14:523–35
4. Banerjee S, Carlin B. 2004. Parametric spatial cure rate models for interval-censored time-to-relapse data.
Biometrics 60:268–75
5. Banerjee S, Carlin B, Gelfand A. 2014. Hierarchical Modeling and Analysis for Spatial Data. Boca Raton,
FL: Chapman and Hall/CRC Press. 2nd ed.
6. Banerjee S, Dey D. 2005. Semiparametric proportional odds model for spatially correlated survival data.
29. Lawson A, Choi J, Zhang J. 2014. Prior choice in discrete latent modeling of spatially referenced cancer
survival. Stat. Methods Med. Res. 23:183–200
30. Leroux B, Lei X, Breslow N. 1999. Estimation of disease rates in small areas: a new mixed model for spatial
dependence. In Statistical Models in Epidemiology, the Environment, and Clinical Trials, ed. ME Halloran, D
Berry, pp. 135–78. New York: Springer
31. Li Y, Ryan L. 2002. Modeling spatial survival data using semiparametric frailty models. Biometrics 58:287–97
32. Møller J, ed. 2003. Spatial Statistics and Computational Methods. New York: Springer
33. Murray R, Anthonisen N, Connett J, Wise R, Lindgren P, et al. 1998. Effects of multiple attempts to
quit smoking and relapses to smoking on pulmonary function. Lung Health Study Research Group.
J. Clin. Epidemiol. 51:1317–26
34. Othus M, Barlogie B, LeBlanc M, Crowley J. 2012. Cure models as a useful statistical tool for analyzing
36. Rushton G. 2003. Public health, GIS, and spatial analytic tools. Annu. Rev. Public Health 24:43–56
37. Schabenberger O, Gotway C. 2004. Statistical Methods for Spatial Data Analysis. Boca Raton, FL: Chapman
and Hall/CRC
38. Wackernagel H. 2003. Multivariate Geostatistics: An Introduction With Applications. New York: Springer.
3rd ed.
39. Wall M. 2004. A close look at the spatial structure implied by the CAR and SAR models. J. Stat. Plann.
Inference 121:311–24
40. Waller L, Gotway C. 2004. Applied Spatial Statistics for Public Health Data. New York: Wiley
41. Webster R, Oliver M. 2001. Geostatistics for Environmental Scientists. New York: Wiley
42. Zhang J, Lawson AB. 2011. Bayesian parametric accelerated failure time spatial model and its application
to prostate cancer. J. Appl. Stat. 38:591–603
43. Zhang Y, Hodges J, Banerjee S. 2009. Smoothed ANOVA with spatial effects as a competitor to MCAR
in multivariate spatial smoothing. Ann. Appl. Stat. 3:1805–30
44. Zhou H, Lawson AB, Hebert J, Slate E, Hill E. 2008. Joint spatial survival modelling for the date of
diagnosis and the vital outcome for prostate cancer. Stat. Med. 27:3612–28
Errata
An online log of corrections to Annual Review of Public Health articles may be found
at http://www.annualreviews.org/errata/publhealth