Poisson-Based Regression Analysis of Aggregate Crime Rates
1, 2000
This article introduces the use of regression models based on the Poisson distri-
bution as a tool for resolving common problems in analyzing aggregate crime
rates. When the population size of an aggregate unit is small relative to the
offense rate, crime rates must be computed from a small number of offenses.
Such data are ill-suited to least-squares analysis. Poisson-based regression models
of counts of offenses are preferable because they are built on assumptions about
error distributions that are consistent with the nature of event counts. A simple
elaboration transforms the Poisson model of offense counts to a model of per
capita offense rates. To demonstrate the use and advantages of this method, this
article presents analyses of juvenile arrest rates for robbery in 264 nonmetropoli-
tan counties in four states. The negative binomial variant of Poisson regression
effectively resolved difficulties that arise in ordinary least-squares analyses.
1. INTRODUCTION
The purpose of this paper is to introduce a statistical approach to ana-
lyzing aggregate crime rates that solves problems arising from small popu-
lations and low base-rates. In aggregate analyses, the units of the sample
are aggregations of individuals, such as neighborhoods, cities, and schools,
and the researcher is interested in explaining variation in crime rates across
those units. These crime rates are defined as the number of crime events
(e.g., arrests, victimizations, crimes known to the police) divided by the
population size, often reported as crimes per 100,000.
The standard approach to analyzing per capita rates such as these is to
use the computed rates for each aggregate unit (or a transformed version of
them) as the dependent variable in an ordinary least-squares (OLS)
regression. For reasons discussed below, this least-squares approach is inap-
propriate when rates for many of the units must be computed from small
1
Crime, Law and Justice Program, Department of Sociology, 201 Oswald Tower, Pennsylvania
State University, University Park, Pennsylvania 16802. e-mail: [email protected].
0748-4518/00/0300-0021$18.00/0 © 2000 Plenum Publishing Corporation
22 Osgood
$$P(Y_i = y_i) = \frac{e^{-\lambda_i}\,\lambda_i^{y_i}}{y_i!} \qquad (2)$$
Fig. 2. Poisson distribution of arrest rates for four population sizes, given a mean rate of 500 arrests per 100,000 population.
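Eq. (2) is easy to evaluate directly. The following minimal Python sketch (standard library only) computes Poisson probabilities for an expected count λ equal to the rate times the population size, at the mean rate of 500 per 100,000 used in Fig. 2. The population sizes in the loop are illustrative assumptions, not necessarily the four sizes plotted in the figure, and the function name is hypothetical.

```python
import math


def poisson_pmf(y: int, lam: float) -> float:
    """P(Y = y) for a Poisson count with mean lam, as in Eq. (2)."""
    # Evaluate on the log scale so that large y does not overflow y!.
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))


RATE = 500 / 100_000  # mean rate of 500 arrests per 100,000 (Fig. 2)

for n in (200, 1_000, 5_000, 20_000):  # illustrative population sizes
    lam = RATE * n      # expected number of arrests in the unit
    step = 100_000 / n  # observed rates can only be multiples of this
    print(f"n={n:>6}  expected count={lam:5.1f}  "
          f"P(0 arrests)={poisson_pmf(0, lam):.4f}  "
          f"rate step={step:.0f} per 100,000")
```

For a unit of 200 people, a single arrest moves the observed rate from 0 to 500 per 100,000, and more than a third of such units would be expected to record no arrests at all.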
Poisson for Aggregate Data 27
Next we must alter the basic Poisson regression model so that it pro-
vides an analysis of per capita crime rates rather than counts of crimes. If
λ_i is the expected number of crimes in a given aggregate unit, then λ_i/n_i
would be the corresponding per capita crime rate, where n_i is the population
size for that unit. With a bit of algebra, we can derive a variation of Eq. (1)
that is a model of per capita crime rates:
$$\ln\!\left(\frac{\lambda_i}{n_i}\right) = \sum_{k=0}^{K} \beta_k x_{ik}$$

$$\ln(\lambda_i) = \ln(n_i) + \sum_{k=0}^{K} \beta_k x_{ik} \qquad (3)$$
Thus, by adding the natural logarithm of the size of the population at risk
to the regression model of Eq. (1), and by giving that variable a fixed coef-
ficient of one, Poisson regression becomes an analysis of rates of events per
capita, rather than an analysis of counts of events. The same strategy can
be used to standardize event count models for other sources of variation
across cases, such as the length of the period of observation. Accordingly,
computer programs for Poisson regression routinely incorporate this
feature, usually under the name of an offset or exposure variable.
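Eq. (3) can be made concrete with a short sketch. The code below fits a Poisson regression by Newton-Raphson in pure Python, entering ln(n_i) as an offset whose coefficient is fixed at one. It is a teaching sketch under stated assumptions, not the software used for the analyses in this article; the function names and the six-unit data set are hypothetical.

```python
import math


def solve(a, b):
    """Solve the small linear system a x = b (Gaussian elimination, partial pivoting)."""
    m = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, m):
            f = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, m))
        x[r] = (aug[r][m] - s) / aug[r][r]
    return x


def fit_poisson_offset(y, covariates, n, iters=50, tol=1e-10):
    """Poisson regression of counts y with ln(n) as an offset (coefficient fixed at 1).

    covariates: one list of explanatory values per unit; an intercept is added.
    Returns [b0, b1, ...], the coefficients of the per capita rate model, Eq. (3).
    """
    X = [[1.0] + list(row) for row in covariates]
    p = len(X[0])
    offset = [math.log(ni) for ni in n]
    beta = [math.log(sum(y) / sum(n))] + [0.0] * (p - 1)  # start at the pooled rate
    for _ in range(iters):
        mu = [math.exp(o + sum(b * xj for b, xj in zip(beta, row)))
              for o, row in zip(offset, X)]
        # Newton-Raphson for the log link: score X'(y - mu), information X' diag(mu) X.
        score = [sum(row[j] * (yi - mi) for row, yi, mi in zip(X, y, mu))
                 for j in range(p)]
        info = [[sum(row[j] * row[k] * mi for row, mi in zip(X, mu)) for k in range(p)]
                for j in range(p)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Hypothetical units: population sizes, one covariate, and arrest counts.
n = [500, 2_000, 8_000, 30_000, 1_200, 15_000]
x = [[0.10], [0.30], [0.20], [0.50], [0.40], [0.15]]
y = [0, 3, 2, 12, 1, 4]
b0, b1 = fit_poisson_offset(y, x, n)
print(f"intercept={b0:.3f}  slope={b1:.3f}  (log per capita rate scale)")
```

Entering ln(n_i) as an ordinary covariate instead of as an offset would let its coefficient be estimated rather than fixed at one, which is how population size is treated in the analyses of Table II.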
A Poisson-based regression model that is standardized for the size of
the population at risk acknowledges the greater precision of rates based on
larger populations, thus addressing the problem of heterogeneity of error
variance discussed above. This becomes apparent when we translate the
known variance of the Poisson distribution to a standard deviation for the
corresponding crime rates. Because the variance of the Poisson distribution
is the mean count, λ, its standard deviation will be SD_λ = √λ. The mean
count of crimes, in turn, equals the underlying per capita crime rate, C,
times the size of the population: λ = Cn. When a variable is divided by a
constant, its standard deviation is also divided by that constant. Therefore,
it follows that the standard deviation of a crime rate, computed from a
population of size n, will be
$$SD_C = \frac{\sqrt{\lambda}}{n} = \frac{\sqrt{Cn}}{n} = \frac{\sqrt{C}\,\sqrt{n}}{n} = \frac{\sqrt{C}}{\sqrt{n}}$$
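The result SD_C = √C/√n can be checked numerically. In this minimal sketch the true rate of 500 per 100,000 is an assumption borrowed from Fig. 2, and the helper name is hypothetical; each hundredfold increase in population shrinks the standard deviation of the observed rate by a factor of ten.

```python
import math


def rate_sd(rate: float, n: int) -> float:
    """Poisson standard deviation of an observed per capita rate, sqrt(C / n)."""
    lam = rate * n              # expected count, lambda = C n
    return math.sqrt(lam) / n   # SD of the count divided by the constant n


C = 500 / 100_000  # assumed true rate, 500 per 100,000
for n in (100, 1_000, 10_000, 100_000):
    print(f"n={n:>7}: SD of observed rate = {rate_sd(C, n) * 100_000:6.1f} per 100,000")
```

With a population of 1,000, a true rate of 500 per 100,000 would be observed with a standard deviation of about 224 per 100,000, nearly half the rate itself.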
Fig. 3. Negative binomial distributions with mean count of 3, for four levels of residual variance.
$$P(Y_i = y_i) = \frac{\Gamma(y_i + \phi)}{y_i!\,\Gamma(\phi)} \cdot \frac{\phi^{\phi}\,\lambda_i^{y_i}}{(\phi + \lambda_i)^{\phi + y_i}} \qquad (4)$$
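Eq. (4) can be evaluated on the log scale with the log-gamma function. The sketch below checks numerically that the distribution has mean λ and variance λ + λ²/φ, so the overdispersion parameter α reported in Table II corresponds to 1/φ. The mean count of 3 matches Fig. 3; the values of φ are illustrative assumptions, not those used for the figure, and the function name is hypothetical.

```python
import math


def negbin_pmf(y: int, lam: float, phi: float) -> float:
    """Negative binomial P(Y = y) with mean lam and shape phi, as in Eq. (4)."""
    log_p = (math.lgamma(y + phi) - math.lgamma(y + 1) - math.lgamma(phi)
             + phi * math.log(phi) + y * math.log(lam)
             - (phi + y) * math.log(phi + lam))
    return math.exp(log_p)


lam = 3.0  # mean count of 3, as in Fig. 3
for phi in (0.5, 1.0, 5.0, 50.0):  # illustrative shape values
    probs = [negbin_pmf(y, lam, phi) for y in range(400)]  # tail beyond 400 is negligible
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    print(f"phi={phi:5.1f}  P(0)={probs[0]:.3f}  mean={mean:.3f}  "
          f"variance={var:.3f}  lam + lam**2/phi={lam + lam ** 2 / phi:.3f}")
```

As φ grows the residual variance vanishes and the probabilities approach those of the Poisson distribution in Eq. (2).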
2
To ensure that single cases did not have undue influence on our results, we recoded some
extreme values to values less deviant from the distribution as a whole. We set the maximum
for residential stability to 0.6 (formerly 0.76; three cases recoded), that for female-headed
households to 0.35 (formerly 0.42; four cases recoded), and that for unemployment to 0.12
(formerly 0.14; three cases recoded). This recoding had no substantive impact on the results,
and it increases our faith in their reliability.
assess only within-state relationships pooled across the states. Therefore the
model includes dummy variables representing states (with Florida serving
as the omitted reference category).
Table II. Five Statistical Models of Juvenile Arrest Rates for Robbery in Nonmetropolitan
Counties of Four States
                                                Statistical method
                                OLS,          OLS,          OLS,         Basic      Negative
Explanatory variable    rate/100,000   log(rate+1) log(rate+0.2)       Poisson      binomial
Log population at risk
  b                           11.220         0.749         1.102        1.501ᵃ        1.718ᵃ
  SE                           3.838         0.128         0.177         0.061         0.188
  t                            2.923         5.852         6.226         8.213         3.819
  P                            0.004         0.000         0.000         0.000         0.000
Residential instability
  b                           35.573         3.017         4.366         1.567         0.162
  SE                          48.790         1.628         2.255         0.567         2.026
  t                            0.729         1.853         1.936         2.764         0.080
  P                            0.467         0.065         0.054         0.005         0.936
Ethnic heterogeneity
  b                           63.839         2.461         3.325         2.069         2.861
  SE                          32.711         1.091         1.512         0.419         1.156
  t                            1.952         2.256         2.199         4.938         2.475
  P                            0.052         0.025         0.029         0.000         0.013
Female-headed households
  b                           22.765         0.533         0.192         3.919         3.739
  SE                          71.679         2.391         3.313         1.030         2.937
  t                            0.318         0.223         0.058         3.805         1.273
  P                            0.751         0.824         0.954         0.000         0.203
Poverty rate
  b                           39.474         1.405         2.181         0.499         0.021
  SE                          81.162         2.708         3.752         1.009         3.381
  t                            0.486         0.519         0.581         0.495         0.006
  P                            0.627         0.604         0.561         0.621         0.995
Unemployment
  b                          −42.658         5.246         8.137        −1.338         0.432
  SE                         203.957         6.804         9.428         1.810         6.568
  t                           −0.209         0.771         0.863        −0.739         0.066
  P                            0.834         0.441         0.389         0.466         0.948
Adjacent to metropolitan area
  b                           −3.944        −0.211        −0.267        −0.247        −0.458
  SE                           6.372         0.213         0.295         0.071         0.215
  t                           −0.619        −0.991        −0.905        −3.479        −2.130
  P                            0.537         0.322         0.365         0.000         0.034
Constant
  b                          −66.020        −6.645       −11.560       −13.750       −15.243
  SE                          43.732         1.459         2.022         0.630         1.722
  t                           −1.510        −4.554        −5.717       −21.825        −8.852
  P                            0.132         0.000         0.000         0.000         0.000
Model fit (MSE, α, and −2LL are method-specific criteria)
Baseline modelᵇ
  MSEᶜ                        1853.9         2.419         4.729
  αᵉ                                                                                   1.263
  R²                           0.226         0.328         0.332        0.484ᶠ        0.456ᶠ
  −2LLᵈ                                                                1584.5         950.8
Full model
  MSE                         1760.6         1.960         3.762
  α                                                                                    0.852
  R²                           0.284         0.471         0.483         0.585         0.548
  −2LL                                                                 1420.9         901.5
Spearman r                     0.653         0.708         0.710         0.671         0.687
Note: The models also included dummy variables representing differences between the four states.
ᵃ t and P values computed for difference of b from 1 rather than difference of b from 0.
ᵇ The baseline model controls for differences between states and, in the Poisson and negative
binomial models, includes log population at risk, with a fixed coefficient of 1.
ᶜ Mean squared error for the OLS regression models.
ᵈ −2 times the log likelihood for the Poisson and negative binomial models.
ᵉ Reflects residual variance in true crime rates, which is overdispersion beyond that expected
from a simple Poisson process.
ᶠ See footnote 5 for a description of the computation of R² values for the Poisson and negative
binomial analyses.
the variance in these robbery rates, which is 5.9% more than a baseline
model that includes only differences between states [F(7,253) = 2.97,
P = 0.005].
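The F statistic for the increase in R² over the baseline model follows the standard formula for a set of q added predictors. In this sketch q = 7 explanatory variables, and the 253 residual degrees of freedom correspond to 264 counties minus 11 parameters (seven explanatory variables, three state dummies, and a constant). Because Table II rounds R² to three decimals, the result only approximates the reported F of 2.97; the function name is hypothetical.

```python
def r2_change_f(r2_full: float, r2_base: float, q: int, df_resid: int) -> float:
    """F statistic for the gain in R^2 when q predictors are added to a model."""
    return ((r2_full - r2_base) / q) / ((1.0 - r2_full) / df_resid)


# OLS of untransformed rates, Table II: baseline R^2 = 0.226, full R^2 = 0.284.
f_stat = r2_change_f(0.284, 0.226, q=7, df_resid=253)
print(f"F(7, 253) = {f_stat:.2f}")  # about 2.93 with the rounded R^2 values
```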
There are several indications that this OLS model is very poorly suited
to the data. Though an arrest rate below zero would be impossible, this
model yielded negative fitted values for 42 cases, and these negative values
fall as much as 0.61 standard deviations below zero. Under this OLS model,
the two counties with the highest arrest rates constituted extreme outliers
with standardized residuals of 7.2 and 6.1, both far too large to be accept-
able at any sample size. These are strong indications that the fitted values
do not accurately track actual mean crime rates, so it is clear that a linear
model severely distorts the relationship between these explanatory variables
and county-level arrest rates.
The critical assumption for the accuracy of standard errors and signifi-
cance tests in OLS analysis is that the residual variance does not vary
systematically across cases, and White’s test for heteroscedasticity
(McClendon, 1994, pp. 178–181) provides a simple and direct means of testing
this assumption. This test involves an OLS regression analysis in which
the squared values of the residuals serve as the dependent variable, and the
fitted values of that regression will reflect mean levels of squared residuals.
The independent variables in this residual analysis can be any factors sus-
pected to be related to heterogeneity of the residuals. Because I expect that
residual variance will depend on population size, but not in a linear fashion,
I chose linear, squared, and cubed terms for population size as independent
variables. Using absolute values of residuals rather than squared residuals
as the dependent variable provided a better summary of the data. (When
squared, residuals of the outliers dominated the entire sample.) This analysis
indicated that the magnitude of residual variance varied widely by population
size: The squared values of the fitted absolute residuals ranged from
94 to 1162 [R² = 0.050, F(3,260) = 4.51, P = 0.004].3
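The residual check described above can be sketched as follows. Population size is rescaled to (0, 1] before forming the squared and cubed terms, which leaves R² unchanged but keeps the least-squares problem well conditioned. The function names and the hypothetical residuals, whose spread shrinks as 1/√n, are assumptions for illustration; in practice the residuals would come from the fitted OLS model.

```python
import math


def solve(a, b):
    """Solve the small linear system a x = b (Gaussian elimination, partial pivoting)."""
    m = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, m):
            f = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, m))
        x[r] = (aug[r][m] - s) / aug[r][r]
    return x


def ols_r2(y, cols):
    """R^2 from an OLS regression of y on an intercept plus the given columns."""
    X = [[1.0] + [c[i] for c in cols] for i in range(len(y))]
    p = len(X[0])
    xtx = [[sum(r[j] * r[k] for r in X) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(X, y)) for j in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, r)) for r in X]
    ybar = sum(y) / len(y)
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return 1.0 - ss_res / ss_tot


def abs_residual_test(residuals, pop):
    """Regress |residual| on linear, squared, and cubed population size.

    Returns (R^2, F), with F on (3, N - 4) degrees of freedom.
    """
    scale = max(pop)
    p1 = [n / scale for n in pop]
    cols = [p1, [v ** 2 for v in p1], [v ** 3 for v in p1]]
    r2 = ols_r2([abs(r) for r in residuals], cols)
    df = len(residuals) - 4
    f = (r2 / 3) / ((1.0 - r2) / df) if r2 < 1.0 else math.inf
    return r2, f


# Hypothetical residuals whose magnitude shrinks as 1 / sqrt(n).
pop = [500 * k for k in range(1, 41)]
resid = [(-1) ** k / math.sqrt(n) for k, n in enumerate(pop)]
r2, f = abs_residual_test(resid, pop)
print(f"R^2 = {r2:.3f}, F(3, {len(pop) - 4}) = {f:.2f}")
```

With the 264 counties of this study the degrees of freedom would be (3, 260), matching the test reported above.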
variance, which is a clear indication that the transformation puts the data
in a form that has a closer linear correspondence to these explanatory vari-
ables. Also, in this altered metric a larger share of the explained variance is
attributable to the explanatory variables rather than to differences between
states [increase in R² = 0.142, F(7,253) = 9.761, P < 0.001]. Second, the
range of the fitted values is not problematic under this model because nega-
tive fitted values correspond to logarithms of crime rates between zero and
one. Third, the transformation also reduces problems of outliers, with the
most extreme standardized residual for the OLS analysis of the transformed
arrest rates now 3.2. The change in metric means that the coefficients of
these first two analyses are not comparable, but the benefit of the improved
correspondence between model and data is apparent in the higher t values
for the three variables most strongly related to crime rates (population size,
residential instability, and ethnic heterogeneity).
Though the logarithmic transformation renders the data more suitable
for OLS analysis, it also makes apparent the inherent problems that require
Poisson-based regression. First, rather than solving the problem of hetero-
scedasticity, the error variance has become even more strongly related to
population size. The cubic model of the absolute residuals now accounts
for 10.8% of their variance [F(3,260) = 10.52, P < 0.001], with fitted values
corresponding to squared residuals that range from 0.01 to 1.98. Further-
more, the specific cases that constitute outliers also have changed. In the
OLS analysis of untransformed rates, the two most extreme outliers were
the counties with the highest crime rates, both of which have larger than
median juvenile populations. The most extreme outlier in the OLS analysis
of the transformed rates is the county with the smallest population that has any
recorded arrests. The OLS assumption of homogeneity of residual variance implies
that the predictive accuracy of the model is independent of population size.
As we would expect from the inevitable unreliability of crime rate estimates
based on small population sizes, that assumption is clearly in error.
Second, the discrete and skewed nature of crime rates for small popu-
lations presents a special problem for analyzing log transformed crime rates.
Observed rates of zero will be common for small populations, in which
case the transformation can be computed only after adding a constant. The
common choice of adding one is highly arbitrary. The value could as easily
be 1 per 1000 or 1 per 1,000,000 as the 1 per 100,000 used in the analysis
just discussed, yet the choice of this constant may drastically affect the
results. To see this, compare Columns 3 and 4 in Table II, which differ only
in that Column 3 reports an analysis resulting from adding a constant of 1
per 100,000 while the constant for Column 4 is 0.2 (corresponding to 1
arrest per 100,000 for the 5 years covered in the study, rather than for one
year). This arbitrary choice results in an increase of roughly 40% in most of
that, on average, differences between fitted and observed crime rates are
considerably larger than specified by the Poisson distribution. This overdis-
persion reflects some combination of unexplained variation in counties’ true
crime rates and positive dependence among crime events.
Comparing the fifth and sixth columns in Table II makes clear the
consequence of ignoring that overdispersion. The basic Poisson model gives
the impression of far greater precision in the estimated relationships than is
justified. The standard errors for the basic Poisson model average only
about one-third the size of the standard errors for the negative binomial, so
the basic Poisson model would produce highly erroneous significance tests
for the coefficients. In this example, the basic Poisson would lead us to
conclude that residential instability and female-headed households are sig-
nificantly related to rates of arrests of juveniles for robbery, while the nega-
tive binomial would not.
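The comparison of the fifth and sixth columns can be reproduced from the coefficients and standard errors in Table II. This sketch uses a normal approximation for the two-sided P values and, following footnote a of Table II, tests the log population coefficient against 1 rather than 0; the dictionary layout and helper names are simply convenient assumptions.

```python
import math

# (b, SE) for the basic Poisson and negative binomial columns of Table II.
TABLE_II = {
    "Log population at risk": ((1.501, 0.061), (1.718, 0.188)),
    "Residential instability": ((1.567, 0.567), (0.162, 2.026)),
    "Ethnic heterogeneity": ((2.069, 0.419), (2.861, 1.156)),
    "Female-headed households": ((3.919, 1.030), (3.739, 2.937)),
    "Poverty rate": ((0.499, 1.009), (0.021, 3.381)),
    "Unemployment": ((-1.338, 1.810), (0.432, 6.568)),
    "Adjacent to metropolitan area": ((-0.247, 0.071), (-0.458, 0.215)),
}


def two_sided_p(b: float, se: float, null: float = 0.0) -> float:
    """Normal-approximation two-sided P value for H0: coefficient == null."""
    z = (b - null) / se
    return math.erfc(abs(z) / math.sqrt(2))


def null_value(name: str) -> float:
    """Log population is tested against 1 (Table II, footnote a); others against 0."""
    return 1.0 if name == "Log population at risk" else 0.0


ratios = [pois[1] / nb[1] for pois, nb in TABLE_II.values()]
mean_ratio = sum(ratios) / len(ratios)
print(f"Mean SE ratio, basic Poisson / negative binomial: {mean_ratio:.2f}")

only_poisson = [
    name for name, (pois, nb) in TABLE_II.items()
    if two_sided_p(*pois, null_value(name)) < 0.05
    and two_sided_p(*nb, null_value(name)) >= 0.05
]
print("Significant under the basic Poisson only:", only_poisson)
```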
4
Standardized residuals can be computed on the basis of the variance of the negative binomial,
which is λ_i + αλ_i² (Gardner et al., 1995). Though the Poisson rapidly approaches the normal
distribution as λ increases, this is far less true of the negative binomial with the moderate
value of α obtained in this example. For instance, even with a value of λ as large as 32,
standardized residuals could differ from the normal deviates corresponding to negative
binomial probabilities by over 50%.
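The point of footnote 4 can be explored with a short sketch that standardizes a residual by the negative binomial variance and, for comparison, converts the exact upper-tail probability P(Y ≥ y) into the normal deviate with the same tail area. The choice λ = 32 follows the footnote and α = 0.852 is the full-model value from Table II; the counts examined and the function names are arbitrary illustrations, and the sketch makes no claim about the specific comparisons behind the footnote.

```python
import math
from statistics import NormalDist


def negbin_pmf(y: int, lam: float, alpha: float) -> float:
    """Negative binomial P(Y = y) with mean lam and variance lam + alpha * lam**2."""
    phi = 1.0 / alpha
    return math.exp(math.lgamma(y + phi) - math.lgamma(y + 1) - math.lgamma(phi)
                    + phi * math.log(phi / (phi + lam))
                    + y * math.log(lam / (phi + lam)))


def std_residual(y: int, lam: float, alpha: float) -> float:
    """Residual standardized by the negative binomial variance (footnote 4)."""
    return (y - lam) / math.sqrt(lam + alpha * lam ** 2)


def tail_deviate(y: int, lam: float, alpha: float) -> float:
    """Normal deviate whose upper-tail area equals the exact P(Y >= y)."""
    cdf_below = sum(negbin_pmf(k, lam, alpha) for k in range(y))  # P(Y <= y - 1)
    return NormalDist().inv_cdf(cdf_below)


lam, alpha = 32.0, 0.852
for y in (60, 100, 150):
    print(f"y={y:>3}  standardized residual={std_residual(y, lam, alpha):5.2f}  "
          f"tail-matched deviate={tail_deviate(y, lam, alpha):5.2f}")
```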
OLS models. For the Poisson-based analyses, the baseline model includes
not only dummy variables to control for differences between states, but also
the log of the population at risk with a fixed coefficient of one, as in Eq.
(3). This control for population size is necessary so that the regression will
be a model of per capita crime rates rather than a model of counts of crimes.
For the negative binomial, the full model yields a likelihood-ratio value of
49.4 in comparison to the baseline model, which is statistically significant
(df = 7, P < 0.001). Thus, we can conclude that the explanatory variables
account for more variation in crime rates than would be expected by chance
alone.
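The likelihood-ratio statistic is the difference between the baseline and full-model values of −2 times the log likelihood, referred to a chi-square distribution with degrees of freedom equal to the seven added explanatory variables. The sketch computes the chi-square tail probability with a series for the regularized incomplete gamma function, which is adequate for moderate statistics like this one; a statistical library routine would normally be used instead, and the function name is hypothetical. With the rounded values in Table II the statistic comes to 49.3 rather than the 49.4 reported in the text.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square(df) variable at x.

    Sums the series for the regularized lower incomplete gamma function
    P(a, z) with a = df / 2 and z = x / 2, then returns 1 - P(a, z).
    """
    a, z = df / 2.0, x / 2.0
    if z <= 0:
        return 1.0
    term = total = 1.0 / a
    n = 0
    while term > total * 1e-16:
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)


# -2 log likelihood for the negative binomial models in Table II.
lr = 950.8 - 901.5
print(f"LR = {lr:.1f}, df = 7, P = {chi2_sf(lr, 7):.1e}")
```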
By the conventional 0.05 standard of statistical significance, the nega-
tive binomial analysis indicates that higher juvenile arrest rates for robbery
are associated with larger populations at risk, greater ethnic heterogeneity,
and being adjacent to a metropolitan area.6 To interpret the regression coef-
ficients for these variables, we must take into account the logarithmic trans-
formation that intervenes between the linear model and fitted crime rates
[in Eqs. (1) and (3)]. Liao (1994) explains several useful strategies for inter-
preting these coefficients. One relatively straightforward approach to this
task follows the implication of Eqs. (1) and (3) that an increase of x in an
explanatory variable will multiply the fitted mean crime rate by exp(bx).
Thus, given the coefficient of 2.861 for ethnic heterogeneity in the negative
binomial model, an increase of 0.10 in ethnic heterogeneity would multiply the
rate by exp(0.286), which is 1.33. In plain English, a 0.10 increase in ethnic
heterogeneity is associated with a 33% increase in the juvenile arrest rate
for robbery. Because being adjacent to a metropolitan area is coded as a
dummy variable, an increase of one in this variable corresponds to the
contrast between adjacent and nonadjacent counties. Thus, the statistically
significant coefficient of −0.458 indicates that counties adjacent to
metropolitan areas have a 37% lower juvenile arrest rate for robbery than
those that are not, because exp(−0.458 × 1) equals 0.63. [This surprising
result does not replicate for analyses of other offenses reported by Osgood
and Chambers (2000).]
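The multiplicative interpretation amounts to one line of arithmetic per coefficient. This sketch reproduces the two calculations in the preceding paragraph from the negative binomial coefficients in Table II; the helper name is hypothetical.

```python
import math


def rate_multiplier(b: float, change: float) -> float:
    """Factor by which the fitted rate is multiplied when x increases by `change`."""
    return math.exp(b * change)


# Negative binomial coefficients from Table II.
print(round(rate_multiplier(2.861, 0.10), 2))   # ethnic heterogeneity, +0.10 -> 1.33
print(round(rate_multiplier(-0.458, 1.0), 2))   # adjacent vs. nonadjacent -> 0.63
```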
In interpreting the results for population size, we must take into
account the special role of this variable in Poisson-based analyses of aggre-
gate rates. When the coefficient for the log of the population at risk is fixed
at one [as in Eq. (3)], per capita crime rates are constant across counties
with different population sizes, controlling for the other explanatory vari-
ables. The analyses reported in Table II treat that coefficient as estimated
6
In the more extensive analyses reported by Osgood and Chambers (2000), population at risk,
ethnic heterogeneity, residential instability, and female-headed households proved to be
associated with most offenses, but adjacency to metropolitan areas did not.
rather than fixed, which allows for the possibility that crime rates differ with
population size. In this case, however, it is necessary to subtract the value
of one from this coefficient in order to determine its implications for the
relationship of population size to per capita crime rates. Similarly, the stat-
istical significance of the relationship is gauged by comparing the estimate
to the value of one, rather than to zero as is the usual case.7 A coefficient
greater than one would indicate that counties with larger populations have
higher per capita crime rates, while a coefficient less than one would indicate
the opposite. Thus, the coefficient of 1.718 from the negative binomial
analysis (0.718 once one is subtracted) agrees quite closely with the value
of 0.749 from the OLS analysis of the transformed crime rates. The first
indicates that a doubling of the population is associated with a 64% increase
in per capita robbery rates [exp(0.718 × ln 2) = 1.645], while the second
implies a 68% increase [exp(0.749 × ln 2) = 1.681].
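The doubling calculation, and the test against one described in footnote 7, can be written out as follows. The helper name is hypothetical; its flag distinguishes a count model, whose log-population coefficient must have one subtracted, from an OLS model of log rates, whose coefficient already refers to per capita rates.

```python
import math


def doubling_multiplier(b_log_pop: float, count_model: bool) -> float:
    """Multiplier on the per capita rate when the population at risk doubles."""
    # In a count model, a coefficient of 1 on ln(n) means a constant per capita
    # rate, so the per capita effect is b - 1; OLS of log rates needs no shift.
    slope = b_log_pop - 1.0 if count_model else b_log_pop
    return math.exp(slope * math.log(2))


print(round(doubling_multiplier(1.718, count_model=True), 3))   # negative binomial
print(round(doubling_multiplier(0.749, count_model=False), 3))  # OLS, log(rate + 1)

# Footnote 7: the count-model coefficient is tested against 1, not 0.
b, se = 1.718, 0.188
print(round((b - 1.0) / se, 3))  # compare with t = 3.819 in Table II
```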
I have argued that the coefficients and significance tests based on the
negative binomial (or another Poisson-based regression model that allows
for overdispersion) are preferable because the other models I have reviewed
rely on assumptions that are inconsistent with the data. Yet how much
difference does the choice of model make? We can get some idea by compar-
ing the coefficients and t values for the negative binomial analysis with those
for the other analyses in Table II. Other than the OLS analysis of untrans-
formed rates, all models specify a logarithmic relationship between fitted
values and mean crime rates, so coefficients have comparable meanings
across those models. In general, one would not expect an incorrect model
to introduce any systematic bias, so it is surprising that estimates for many
of the coefficients differ dramatically across the models. The absolute values
of coefficients for residential instability, poverty rate, and unemployment
are far larger in the OLS analyses than in the negative binomial analysis,
while the opposite is true for female-headed households. Differences of this
sort most likely are due to the role of population size in Poisson-based
analyses. OLS analyses place as much weight on small counties as on large
ones, but Poisson-based regression models expect error distributions in
small counties to have greater variance, with the consequence that results
are less influenced by small counties. This differential weighting has con-
siderable potential for changing results in a sample such as ours, where there
is a large range of population sizes.
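The differential weighting can be made explicit through the working weights of iteratively reweighted least squares. With a log link the weight of a case is μ²/V(μ), where μ is its expected count: μ for the basic Poisson and μ/(1 + αμ) for the negative binomial with variance μ + αμ², whereas OLS gives every county the same weight. The common rate of 500 per 100,000 and the population sizes below are illustrative assumptions, α = 0.852 is the full-model value from Table II, and the function name is hypothetical.

```python
def irls_weight(mu: float, alpha: float = 0.0) -> float:
    """Working weight mu**2 / V(mu) for a log-link count model with mean mu.

    alpha = 0 gives the basic Poisson (V = mu, weight = mu); alpha > 0 gives the
    negative binomial (V = mu + alpha * mu**2, weight = mu / (1 + alpha * mu)).
    """
    return mu / (1.0 + alpha * mu)


RATE = 500 / 100_000  # assumed common rate, 500 per 100,000
ALPHA = 0.852         # negative binomial alpha, full model in Table II
for n in (200, 2_000, 20_000, 200_000):  # illustrative population sizes
    mu = RATE * n
    print(f"n={n:>7}  Poisson weight={irls_weight(mu):8.2f}  "
          f"negative binomial weight={irls_weight(mu, ALPHA):5.2f}")
```

Under both models small counties receive less weight than large ones. The negative binomial weights level off near 1/α (about 1.17 here), so the largest counties do not dominate the fit the way they would under the basic Poisson.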
The standard errors for the negative binomial model are most similar
to those of the OLS analysis of log transformed rates, using the additive
constant of one. Even here, however, standard errors for four of the seven
7
In other words, the test statistic to be compared to the normal distribution is not the usual
b/SE_b, but rather (b − 1)/SE_b.
substantive variables are at least 20% larger in the negative binomial analy-
sis. There are far greater discrepancies in standard errors for the other
models, so it is clear that significance tests may be seriously affected by
applying an appropriate statistical model to aggregate data for small
populations.
4. CONCLUSIONS
Using Poisson-based regression models of offense counts to analyze per
capita offense rates is an important advance for research on aggregate crime
data. Standard analytical approaches require that data be highly aggregated
across either offense types or population units. Otherwise offense counts are
too small to generate per capita rates that have appropriate distributions
and sufficient accuracy to justify least-squares analysis. Poisson-based
regression models give researchers an appropriate means for more fine-
grained analysis. Poisson-based models are built on the assumption that the
underlying data take the form of nonnegative integer counts of events. This
is the case for crime rates, which are computed as offense counts divided by
population size. In our example analysis of juvenile arrest rates for robbery,
the Poisson-based negative binomial model provides a very good fit to the
data, while OLS analyses produce outliers and require arbitrary choices that
have a striking impact on results.
Poisson-based regression models free researchers to investigate a much
broader range of aggregate data because they are appropriate for smaller
population units and less common offenses. Yet these models are not magic.
The reason they are appropriate is that they recognize the limited amount
of information in small offense counts. The price one must pay in this trade-
off is that the smaller the offense counts, the larger the sample of aggregate
units needed to achieve adequate statistical power. For example, this sample
of 264 counties proved too small for a meaningful analysis of juvenile
homicide, the least common offense examined in this study (Osgood and
Chambers, 2000).
Though this article has concentrated on two of the most common
Poisson-based regression models, this approach to analyzing aggregate
crime rates can be implemented with virtually any of the Poisson-based
regression analyses. The numerous Poisson-based models reviewed by
Cameron and Trivedi (1998) offer many choices for finding a model with
assumptions that best match one’s data. Some models expand the range of
research questions that can be addressed, such as using finite-mixture mod-
els to identify homogeneous groups of counties. Other Poisson-based mod-
els have been developed for designs with repeated measures or nested data,
such as counties nested within states or multiple subpopulations nested
ACKNOWLEDGMENTS
This research was supported by Grant 94-JN-CX-0005 from the Office
of Juvenile Justice and Delinquency Prevention, Office of Justice Programs,
U.S. Department of Justice. The author thanks Jeff Chambers for his assist-
ance with this study, Chet Britt for comments on an early draft, and Gary
Melton and Susan Limber for their support of the entire project. Points of
view or opinions in this document are those of the authors and do not
necessarily represent the official position or policies of the U.S. Department
of Justice.
REFERENCES
Bailey, A. J., Sargent, J. D., Goodman, D. C., Freeman, J., and Brown, M. J. (1994). Poisoned
landscapes: The epidemiology of environmental lead exposure in Massachusetts. Soc. Sci.
Med. 39: 757–766.
Bryk, A. S., and Raudenbush, S. W. (1992). Hierarchical Linear Models: Applications and Data
Analysis Methods, Sage, Newbury Park, CA.
Bryk, A. S., Raudenbush, S. W., and Congdon, R. (1996). HLM: Hierarchical Linear and
Nonlinear Modeling with the HLM/2L and HLM/3L Programs, Scientific Software Inter-
national, Chicago.
Cameron, A. C., and Trivedi, P. K. (1998). Regression Analysis of Count Data, Cambridge
University Press, Cambridge.
Gardner, W., Mulvey, E. P., and Shaw, E. C. (1995). Regression analyses of counts and rates:
Poisson, overdispersed Poisson, and negative binomial. Psychol. Bull. 118: 392–405.
Greenberg, D. F. (1991). Modeling criminal careers. Criminology 29: 17–46.
Greene, W. H. (1995). LIMDEP: Version 7.0 Users Manual, Econometric Software, Plainview,
NY.
King, G. (1989). Unifying Political Methodology: The Likelihood Theory of Statistical Inference,
Cambridge University Press, Cambridge.
Liao, T. F. (1994). Interpreting Probability Models: Logit, Probit, and Other Generalized Linear
Models, Sage University Paper Series on Quantitative Applications in the Social Sciences,
07–101, Sage, Newbury Park, CA.
Maltz, M. D. (1994). Operations research in studying crime and justice: Its history and accom-
plishments. In Pollock, S. M., Rothkopf, M. H., and Barnett, A. (eds.), Operations
Research and the Public Sector, Volume 6 of Handbooks in Operations Research and
Management Science, North-Holland, Amsterdam, pp. 200–262.
McClendon, McK. J. (1994). Multiple Regression and Causal Analysis, F. E. Peacock, Itasca,
IL.
McCullagh, P., and Nelder, J. A. (1989). Generalized Linear Models, 2nd ed., Chapman and
Hall, London.
Nagin, D. S., and Land, K. C. (1993). Age, criminal careers, and population heterogeneity:
Specification and estimation of a nonparametric, mixed Poisson model. Criminology 31:
327–362.
Osgood, D. W., and Chambers, J. M. (2000). Social disorganization outside the metropolis:
An analysis of rural youth violence. Criminology 38: 81–115.
Rowe, D. C., Osgood, D. W., and Nicewander, W. A. (1990). A latent trait approach to
unifying criminal careers. Criminology 28: 237–270.
Sampson, R. J., Raudenbush, S. W., and Earls, F. (1997). Neighborhoods and violent crime:
A multilevel study of collective efficacy. Science 277: 918–924.
United States Department of Commerce (1992). Summary Tape Files 1 and 3, 1990 Census.
Warner, B. D., and Pierce, G. L. (1993). Reexamining social disorganization theory using calls
to the police as a measure of crime. Criminology 31: 493–517.