Applying Spatial Thinking in Social Scie
GeoJournal. Author manuscript; available in PMC 2011 January 1.
Published in final edited form as:
NIH-PA Author Manuscript
Abstract
Spatial methods that build upon Geographic Information Systems are spreading quickly across the
social sciences. This essay points out that the appropriate use of spatial tools requires more careful
thinking about spatial concepts. As easy as it is now to measure distance, it is increasingly important
to understand what we think it represents. To interpret spatial patterns, we need spatial theories. We
review here a number of key concepts as well as some of the methodological approaches that are
now at the disposal of researchers, and illustrate them with studies that reflect the very wide range
of problems that use these tools.
Keywords
GIS; Spatial methods; Spatial concepts
In the last 10 years there has been an explosion of interest in the applications of spatial concepts
and techniques in the social sciences (Voss 2007). The development has been especially intense
among those researchers who are used to working with data that are aggregated for a territorial
unit (a county, city, or neighborhood). It is a natural step to take advantage of the new
Geographic Information System (GIS) technologies that make it relatively easy to map those
data. More important, visualizing information on a map tends to bring up other questions about
how to understand the patterns. At this point, GIS gives way to a myriad of tools of spatial
analysis that are well established in geography and in some applied fields such as biostatistics
but that most social scientists are not yet familiar with.
their data or their question is of the same form as one of those reviewed here, which can then
be used as a template for their own analysis. We also have a theoretical purpose. Spatial analysis
brings into play concepts like proximity or access, isolation or exposure, neighborhoods and
boundaries, neighborhood effects, and diffusion. Informed use of spatial tools requires
familiarity with these concepts, and the presentation is organized to clarify the spatial thinking
that underlies them. We emphasize that very simple notions in the new spatial statistics toolbox,
like distance, turn out to be theoretically complex in their use. When there are spatial patterns,
there is usually more than one way that they could arise, so there is no shortcut for interpretation
of findings.
helped us perceive a spatial relationship, to notice where some kinds of observations lie in
relation to others. The map allows us to see (and allows a GIS program to calculate and store
in a data file) these relative distances. What do these distances mean to us?
A core presumption for geographers is that like things tend to be near to one another, and the
more alike, the nearer they are (Tobler’s First Law of Geography). This is the phenomenon of
spatial dependence on which much spatial analysis is built. Yet many different causal processes
can lead to spatial dependence. It can result from the possibility that, for better or worse, the
nearer two phenomena are to one another, the more likely they will come into contact or affect
one another. In studies of intergroup relations exposure is often presumed to be a step toward
reducing boundaries between groups, so it is positive. In epidemiological applications, in
contrast, exposure to others leads to infection. Either way, if the access or the risk is
consequential, it can make nearby phenomena more similar by virtue of exposure to the same
stimulus. That is, proximity causes similarity. This causal path can also operate in the opposite
direction. We can think of locations in terms of the access or risk that accompany them.
Proximity to a resource is thought of as better access; proximity to a hazard, as greater risk.
As people or organizations make locational choices, they may take into account their differential
proximity to resources or hazards, so the spatial pattern may be the result of their selectivity
in location. Similarity, in other words, can cause proximity.
How close is close enough to make a difference? Many spatial statistics require taking a position
on this question. It seems reasonable to assume that the relationship between two observations
declines monotonically with the distance between them, perhaps very rapidly and perhaps
disappearing entirely at some distance. Yet we rarely have enough information or theory to
specify more clearly the functional form of this decline. Geographers have defined a number
of options. It is popular to consider only adjacent observations to be close, or “spatially
dependent.” An alternative is to treat the “n” nearest neighbors as equally close, or to extend
the analysis to a larger number of neighbors but to give greater weight to the nearest ones in
the analysis. Rather than stipulate an answer to this question, some approaches directly
investigate how patterns of similarity are related to distance. For example, one could specify
a number of rings of different diameter (or band widths) around an observation and then
evaluate how similarity changes (presumably it declines) across these bands. The result is not
a single statistic, but a graph representing how the statistic varies at different distances.
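The neighbor definitions described above can be sketched in code. The following is a minimal illustration in Python, not a procedure taken from any of the studies reviewed here. The function names, the sample coordinates, the cutoff distance, and the inverse-distance form of the weights are all assumptions chosen for the example.

```python
import math

def pairwise_distances(points):
    """Return a full matrix of Euclidean distances between (x, y) points."""
    return [[math.dist(p, q) for q in points] for p in points]

def knn_weights(points, k):
    """Treat the k nearest neighbors of each point as equally close (weight 1/k)."""
    dist = pairwise_distances(points)
    n = len(points)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        for j in others[:k]:
            weights[i][j] = 1.0 / k
    return weights

def inverse_distance_weights(points, cutoff, power=1.0):
    """Weight neighbors by 1/d**power and ignore pairs farther apart than cutoff.

    Each row is rescaled to sum to 1 when the point has at least one neighbor.
    """
    dist = pairwise_distances(points)
    n = len(points)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and 0 < dist[i][j] <= cutoff:
                weights[i][j] = dist[i][j] ** -power
        total = sum(weights[i])
        if total > 0:
            weights[i] = [w / total for w in weights[i]]
    return weights

# Hypothetical observations placed along a line for easy checking.
points = [(0, 0), (1, 0), (3, 0), (7, 0)]
w_knn = knn_weights(points, k=2)
w_idw = inverse_distance_weights(points, cutoff=4.0)
```

Under the first rule the two nearest neighbors of the first point count equally. Under the second, the neighbor at distance 1 receives three times the weight of the neighbor at distance 3, and the point at distance 7 is ignored. Which rule is appropriate remains a substantive choice, as the text argues.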
The implication of these considerations is that even the simplest spatial measures reflecting
what we see on a map—the relative distances among observations—require substantive
interpretation. We will add one further complication. Suppose the important spatial patterns
are not smooth surfaces, but instead are better represented as discontinuous areas with frontiers
between them like the boundaries between two school districts. Imagine that on one side of the
line is district X and on the other side is district Y, potentially with great differences
between the two. Then we need a theory not of distance between locations but of socially
relevant geographic areas, or discrete places. As the following review will illustrate, this is not
an uncommon case. Spatially oriented social scientists often think in terms of neighborhoods
or zones, and though the boundaries between them may not be as explicit as political
boundaries, they may be very distinct. Especially when we ask about the consequences
of living in this place versus a different one, we are implicitly defining places. There are also
situations where the issue is not living a little nearer or a little further away, but in one place
or in another. How do we define the places and determine their boundaries? Sometimes the
definition is based on a measurement limitation—we have information about a given
administrative unit, and we choose to treat it as the relevant place even though it has no
substantive meaning. In such cases it may make sense to use tools that seek to convert discrete
data observations into smooth surfaces and deploy the usual methods of studying distance.
When we have choices about spatial scale, there is a temptation to try all of them. Indeed a
spatial pattern we observe at one scale may disappear at a smaller or larger scale, so searching
for a pattern is sometimes rewarded. A version of this approach is to hedge one’s bets by
conducting the analysis at one geographic scale but then taking into account the observations
in nearby units as another predictor (in spatial regressions, this is called “spatial lag”). But
when this is done without a clear theory, we do not know how to interpret the result when
neighboring places seem to matter. It could be that there are causal processes by which what
happens in one place is influenced by what happens in its neighbors. In that case one would
describe the result as showing diffusion across space, or perhaps one would comment that
places are fatefully embedded in a larger context. But alternatively the spatial dependence
revealed in the analysis may indicate that we conducted it at too narrow a geographic scale—
that a given area and the areas around it are really components of the same place. Ultimately,
as in so many methodological questions, the real issue is substantive and it cannot be settled
empirically.
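To make the idea of a spatial lag concrete, the sketch below computes, for each areal unit, the average value of a variable among its neighbors. It is a minimal illustration under assumed inputs: the four units, their neighbor lists, and the function names are hypothetical and are not drawn from any study discussed here.

```python
def row_standardized_weights(neighbors):
    """Build a weights matrix in which each unit's neighbors share a total weight of 1."""
    n = len(neighbors)
    weights = [[0.0] * n for _ in range(n)]
    for i, ids in enumerate(neighbors):
        for j in ids:
            weights[i][j] = 1.0 / len(ids)
    return weights

def spatial_lag(values, weights):
    """Weighted average of the values observed in each unit's neighbors."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]

# Four hypothetical areas arranged in a row: 0 - 1 - 2 - 3.
neighbors = [[1], [0, 2], [1, 3], [2]]
values = [10.0, 20.0, 30.0, 40.0]
lag = spatial_lag(values, row_standardized_weights(neighbors))
```

The resulting variable can be entered as an additional predictor in a regression. The computation itself says nothing about whether a significant coefficient reflects diffusion between places or a unit of analysis that is too small, which is the interpretive problem raised above.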
Community health
Research on population health often takes advantage of data on the incidence of disease, where
the address of victims is known. Converting street addresses to points on a GIS map
(geocoding) opens up many possibilities for how to analyze the spatial distribution of those
points. One study of dengue infection in a city in central Brazil (Siqueira et al. 2004) collected
survey data on 1,585 individuals. The survey included dengue infection status, medical
condition history, and socioeconomic and demographic characteristics. Another component
asked specifically whether there were zones of the city with significantly high concentrations
of infection, which could then be targeted for public health interventions. For this purpose they
used a technique called kernel density estimation that evaluates the proximity of infected
persons to other infected persons as compared to non-infected persons. Locations with
statistically significant clusters of infected persons can readily be mapped (as in Fig. 1). The
kernel estimate map reveals such clusters in the northwestern, eastern, and southeastern parts
of the city.
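The logic of a kernel density comparison can be sketched as follows. This is a simplified illustration, not the procedure used by Siqueira et al.: it assumes a Gaussian kernel with a fixed bandwidth, uses made-up coordinates, compares infected with non-infected persons through a simple ratio of densities, and omits any test of statistical significance.

```python
import math

def kernel_density(location, points, bandwidth):
    """Gaussian kernel density estimate at `location` from a list of (x, y) points."""
    if not points:
        return 0.0
    total = 0.0
    for p in points:
        d = math.dist(location, p)
        total += math.exp(-0.5 * (d / bandwidth) ** 2)
    # Normalize so the estimated surface integrates to 1 over the plane.
    return total / (len(points) * 2 * math.pi * bandwidth ** 2)

def density_ratio(location, cases, controls, bandwidth):
    """Ratio of case density to control density; values above 1 suggest a local excess."""
    case_density = kernel_density(location, cases, bandwidth)
    control_density = kernel_density(location, controls, bandwidth)
    return case_density / control_density if control_density > 0 else float("inf")

# Hypothetical coordinates: infected persons cluster near (1, 1).
cases = [(1, 1), (1.2, 0.9), (0.8, 1.1), (5, 5)]
controls = [(1, 1), (3, 3), (5, 5), (5.2, 4.8), (4.9, 5.1), (2, 4)]

ratio_near_cluster = density_ratio((1, 1), cases, controls, bandwidth=1.0)
ratio_elsewhere = density_ratio((5, 5), cases, controls, bandwidth=1.0)
```

Evaluating the ratio over a grid of locations would yield a surface that can be mapped. The bandwidth plays the same role as the distance cutoff discussed earlier: a larger value smooths over a wider area and can hide small clusters.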
Another step in the analysis investigated the source of the spatial clusters that were observed.
Are they random (wherever an infection occurs, for whatever reason, it tends to spread from
there) or caused by some underlying factor (something about the people living in a given area
or their environment)? Area-based indicators derived from census data were linked with
individual data. These variables included average income, population density, housing density,
and percent of households with an indoor water supply. There was no special effort to determine
the appropriate scale at which to measure neighborhood characteristics; like most researchers,
the Siqueira team applied administrative units from the Brazilian census. The resulting
contextual or multilevel analysis provided clues about the processes that caused spatial clusters.
the density of these cases at a given distance or spatial scale. The difference between the K-
functions for sick and healthy children is therefore a measure of clustering. Figure 2 graphs
the result, with the difference in K-functions on the y axis and distance on the x axis. The graph
seems to show that the difference is positive at every distance, and that it reaches a maximum
at a distance of about 4 km. One might conclude, then, that there is clustering at that scale. But
it turned out not to be statistically significant. The diagonal lines on the graph show a
“simulation envelope” within which K-function values could be expected to fall by chance
95% of the time. A nice feature of the K-function is that it can easily be recalculated many
times in a computer simulation, where leukemia cases are randomly assigned to children. That
is the basis for the test of statistical significance.
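The simulation logic described here can be sketched in a few lines. The example below is a minimal illustration with made-up coordinates: it computes a K-function without edge correction, takes the difference between cases and controls at one distance, and builds an approximate 95% envelope by repeatedly reassigning case labels at random. The function names, the number of simulations, and the study area are assumptions for the example.

```python
import math
import random

def k_function(points, radius, area):
    """Ripley's K without edge correction at a single distance."""
    n = len(points)
    if n < 2:
        return 0.0
    # Count ordered pairs of distinct points lying within `radius` of each other.
    pairs = sum(1 for i in range(n) for j in range(n)
                if i != j and math.dist(points[i], points[j]) <= radius)
    return area * pairs / (n * (n - 1))

def k_difference(cases, controls, radius, area):
    """Difference in K between cases and controls; positive values suggest case clustering."""
    return k_function(cases, radius, area) - k_function(controls, radius, area)

def simulation_envelope(cases, controls, radius, area, simulations=199, seed=0):
    """Approximate 95% envelope for the difference under random labeling."""
    rng = random.Random(seed)
    pooled = list(cases) + list(controls)
    diffs = []
    for _ in range(simulations):
        rng.shuffle(pooled)
        diffs.append(k_difference(pooled[:len(cases)], pooled[len(cases):], radius, area))
    diffs.sort()
    return diffs[int(0.025 * simulations)], diffs[int(0.975 * simulations)]

# Hypothetical locations in a 10 x 10 study area.
cases = [(1, 1), (1.1, 1), (1, 1.1), (1.1, 1.1), (5, 5)]
controls = [(0, 0), (2, 6), (4, 2), (6, 4), (8, 8), (3, 9), (9, 1), (7, 6)]

observed = k_difference(cases, controls, radius=0.5, area=100.0)
lower, upper = simulation_envelope(cases, controls, radius=0.5, area=100.0)
```

Repeating the calculation over a range of radii gives the kind of graph described in the text: the observed difference plotted against distance, with the envelope marking the values expected by chance. An observed difference that stays inside the envelope is not evidence of clustering.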
Unfortunately individual-level data (points) are often not available. Kelsall and Wakefield
(2002) had only aggregate data on colorectal cancer in 39 electoral wards in the U.K. West
Midlands district of Birmingham in 1989. They were aware that aggregated data, especially
for rare diseases and small geographic areas, are subject to much random variation between
extremely high and low values. And in this case they were unwilling to assume that the
administrative areas approximated “real” neighborhoods. Therefore they sought to convert the
aggregate data to an estimate of what the underlying point data might be like, if the ward-level
data were based on an underlying continuous risk surface of colorectal cancer. Their approach
assessed values in each electoral ward as well as in its neighbors, taking advantage of the fact
that neighboring areas tend to have similar values (the phenomenon referred to as spatial
dependence or spatial autocorrelation). Specifically they assumed that the risk at any point
could be approximated by a Gaussian random field model that has been much used in
geostatistics (Diggle et al. 1998). Although there appeared to be great variation in cancer rates
across electoral wards, once the data were “smoothed” in this way there was little spatial
variation.
neighborhoods, and the question was how close (in three distance intervals) various types of
neighborhoods were to a risky location. The key finding is that neighborhoods with high
proportions of Latino residents were most exposed.
A more sophisticated approach is to use a distance decay model, where “exposure” to a site is
assumed to decline with one’s distance from it. Downey (2006) used this method to
examine whether minority and lower income groups are disproportionately burdened by
environmental hazards in Detroit. He began with a map showing the geocoded location of
industrial facilities identified in the federal government’s 2000 Toxics Release Inventory. He
overlaid this map with a census tract map, and calculated the distance from every toxic facility
to each of many small grids within each tract. He then calculated the total hazard exposure for
each grid, taking into account these distances and also the volume of toxic emissions from each
facility, and aggregated the grid cells to calculate a total tract exposure. There are two hurdles
for this analysis. The first is that Downey did not know what variation there was in population
composition of the many grid cells within each tract. He chose to presume that they were all
the same (unlike Kelsall and Wakefield, who sought to model the variation). The second hurdle
was to assess how distance should be related to exposure—should exposure decline linearly
with distance, or should nearby facilities be counted even more heavily than more distant ones,
and is there some distance beyond which there is no exposure? Because there is no obvious
solution, Downey chose six different distance-decay functions and tested all of them. He used
multiple regression analysis to determine that the percent of black residents in a tract is
significantly related to toxic exposure, but only at distances of 1.5–2.5 miles. Black census
tracts tended to be near but not directly adjacent to toxic facilities. Without a stronger theory
about the expected distance band, the significance level of this finding is in doubt—if one tests
several cutoff points, there is a probability that at least one of them will appear to be significant
even if the distribution is random.
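The structure of such a distance-decay calculation can be sketched as follows. This is an illustration only: the three decay functions, the facility locations and emission volumes, and the grid cells are hypothetical, and they are not the six functions or the data that Downey used.

```python
import math

def linear_decay(d, cutoff):
    """Exposure falls linearly with distance and reaches zero at the cutoff."""
    return max(0.0, 1.0 - d / cutoff)

def exponential_decay(d, scale):
    """Exposure falls steeply near the facility and never quite reaches zero."""
    return math.exp(-d / scale)

def step_decay(d, cutoff):
    """Full exposure within the cutoff distance, none beyond it."""
    return 1.0 if d <= cutoff else 0.0

def cell_exposure(cell, facilities, decay):
    """Sum emissions from all facilities, each discounted by its distance to the cell."""
    return sum(emissions * decay(math.dist(cell, location))
               for location, emissions in facilities)

def tract_exposure(cells, facilities, decay):
    """Aggregate the exposure of the grid cells that make up one tract."""
    return sum(cell_exposure(cell, facilities, decay) for cell in cells)

# Hypothetical facilities as ((x, y), emission volume) and two grid cells in one tract.
facilities = [((0, 0), 100.0), ((4, 0), 50.0)]
cells = [(1, 0), (2, 0)]

linear_total = tract_exposure(cells, facilities, lambda d: linear_decay(d, cutoff=2.0))
step_total = tract_exposure(cells, facilities, lambda d: step_decay(d, cutoff=2.0))
exp_total = tract_exposure(cells, facilities, lambda d: exponential_decay(d, scale=1.0))
```

The same tract receives quite different exposure scores depending on the function chosen, which is why the choice matters. As a rough guide to the multiple-testing concern: if six tests at the 0.05 level were independent, the chance that at least one would appear significant by chance alone would be 1 - 0.95^6, or about 0.26. Tests based on overlapping distance bands are not independent, so this figure is only indicative.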
Pais and Elliott (2008) used similar methods to investigate the effects of another type of
environmental risk: three major hurricanes during the early 1990s. What population shifts in
neighborhoods (again operationalized as census tracts) were caused by wind damage? This
study relied on sophisticated climatological applications of GIS methods to estimate the
maximum wind speeds experienced in every census tract within the study region. The
researchers combined these estimates with information about the tract’s demographic
composition (population size, in-migration, and number of housing units) in 1990 (before the
storm) and 2000 (afterwards). Their regression procedure adds a special feature that qualifies
it as a “spatial regression.” To control for the fact that census tracts near one another tend to
have similar characteristics, and also tend to have suffered similar levels of wind damage, they
included a spatial error term in their model to correct for spatial autocorrelation. They also
specifically investigated several spatial factors. Most interesting, it turned out that there was
robust population growth in all the areas hit by these hurricanes, but especially in those areas
just outside the zone of greatest damage. There was, in a sense, displacement of resources to
nearby, less damaged zones.
Residential segregation
A common feature of all these studies is that they rely on an implicit concept of neighborhood,
reflected in clusters of points or in administratively defined wards or census tracts. Scholars
have given the definition of neighborhoods more explicit attention in studies of residential
segregation and its effects. Although there is a long history of segregation research that relies
on aggregate data for administrative units, it is only recently that demographers have attempted
systematically to take into account the spatial configuration of those units. An example is a
study by Reardon et al. (2008). These researchers point out that the geographic scale of
segregation may vary greatly from city to city. This notion is illustrated in Fig. 3. The figure
shows stylized patterns of segregation for a city that is 50% white and 50% black. Starting at
a random point within this city, the figure shows the percent of neighbors who are black within
a ring of a given radius. Region A, for example, has many small neighborhoods (areas with a
1–2 km radius) that are as low as 30% black or as high as 70% black, but at any 4 km radius
every neighborhood is 50% black. At another extreme, Region D has large neighborhoods
(with a radius of about 16 km) that are predominantly white or black. In both cases the city
could be described as racially segregated, but the spatial scale of segregation is quite different.
Reardon et al. proposed using a spatial information theory index as a measure of segregation.
Such indices are based on the density distribution of a particular group, measured in this case
at the block level. Like Downey’s risk exposure measure, it includes a distance decay function,
so that adjacent blocks are more strongly weighted than those farther away. And to assess
variation in the geographic scale of segregation the Reardon index counts blocks within four
different radii, ranging from 500 m (what they describe as a “walking neighborhood” scale) to
4 km (a much more macro scale). In the cities they study the level of segregation is consistently
higher at smaller scales, as one would expect. More interesting, some cities (Pittsburgh) have
most of their segregation at a small geographic scale (small racially homogeneous
neighborhoods near others with very different composition), while others (Atlanta) are
characterized by segregation at a large scale.
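The dependence of measured segregation on scale can be illustrated with a simplified entropy-based index. The sketch below is written in the spirit of a spatial information theory index but is not the Reardon et al. implementation: it assumes two groups, replaces their distance-decay weighting with a flat kernel (every block within the radius counts equally), and uses four made-up blocks.

```python
import math

def entropy(share):
    """Two-group entropy (natural log) for a group share between 0 and 1."""
    return -sum(q * math.log(q) for q in (share, 1 - share) if q > 0)

def spatial_h(blocks, radius):
    """Entropy-based segregation index for local environments of a given radius.

    blocks: list of (x, y, count_group_a, count_group_b).
    Returns 1 when every local environment is homogeneous and 0 when every
    local environment mirrors the overall composition. Assumes both groups
    are present overall, so that the overall entropy is not zero.
    """
    total = sum(a + b for _, _, a, b in blocks)
    overall_entropy = entropy(sum(a for _, _, a, _ in blocks) / total)
    weighted_local = 0.0
    for x, y, a, b in blocks:
        near = [(a2, b2) for x2, y2, a2, b2 in blocks
                if math.dist((x, y), (x2, y2)) <= radius]
        local_a = sum(a2 for a2, _ in near)
        local_total = sum(a2 + b2 for a2, b2 in near)
        weighted_local += (a + b) * entropy(local_a / local_total)
    return 1 - weighted_local / (total * overall_entropy)

# Four hypothetical blocks in a row: two of group A, then two of group B.
blocks = [(0, 0, 100, 0), (1, 0, 100, 0), (2, 0, 0, 100), (3, 0, 0, 100)]
profile = {radius: spatial_h(blocks, radius) for radius in (0.5, 1.0, 10.0)}
```

Plotting the index against the radius gives a segregation profile. A profile that drops quickly indicates segregation at a small geographic scale, while one that stays high at large radii indicates segregation at a large scale.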
This study emphasizes that the measure of segregation depends on what the researcher
considers to be a “local” neighborhood area. A related direction for research is to seek to define
explicitly the boundaries of neighborhoods, freed from the assumption that all neighborhoods
have the same scale. This is the goal of a project by Logan et al. (2002) that asked which group
members were likely to live within ethnic neighborhoods in New York and Los Angeles for
several largely immigrant groups, such as Chinese and Mexicans. Doing this required, as a first
step, identifying the neighborhoods. Maps of census tract data for both regions revealed that the
areas of concentration for these groups tended to extend over multiple tracts. But what criterion
could be used to determine where the concentration ended—at what point was the next adjacent
tract “outside” of the neighborhood?
This question is very similar to the one raised in health studies above (where are the areas of
significant spatial clustering?), but with a greater emphasis on establishing a boundary. Logan
employed an increasingly popular method of analyzing spatial clusters for aggregated data
(where the units of analysis, like tracts, are polygons). This is the local Moran’s I (Anselin
1995), which evaluates the spatial distribution of local area values on a single variable and
identifies locations where there are clusters of areas with high values whose neighbors are also
significantly high (and also areas with low values whose neighbors are also significantly low).
Logan treated high–high clusters (such as tracts with a high share of Chinese residents
surrounded by other highly Chinese tracts) as ethnic neighborhoods.
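The computation behind the local Moran's I can be sketched as follows. This minimal example assumes row-standardized neighbor weights and six made-up tracts arranged in a row. It flags high–high tracts from the signs of the deviations alone and omits the permutation test that would be needed to judge significance, so it is not a reproduction of the procedure in Logan et al.

```python
def local_morans_i(values, neighbors):
    """Return (local I, deviation, neighbor-average deviation) for each areal unit.

    neighbors[i] lists the indices of the units adjacent to unit i.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    variance = sum(d * d for d in dev) / n
    results = []
    for i in range(n):
        ids = neighbors[i]
        lag = sum(dev[j] for j in ids) / len(ids) if ids else 0.0
        results.append((dev[i] * lag / variance, dev[i], lag))
    return results

def high_high_units(values, neighbors):
    """Indices of units that are above the mean and whose neighbors average above the mean."""
    return [i for i, (_, dev, lag) in enumerate(local_morans_i(values, neighbors))
            if dev > 0 and lag > 0]

# Hypothetical share of one group in six tracts arranged in a row.
shares = [0.60, 0.70, 0.65, 0.10, 0.05, 0.10]
neighbors = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]

local_i = [stat for stat, _, _ in local_morans_i(shares, neighbors)]
candidate_neighborhood = high_high_units(shares, neighbors)
```

In this toy example the first three tracts form a high–high group. Note that the low-share tracts at the other end also have positive local statistics, because the statistic measures similarity to neighbors and does not distinguish high clusters from low ones without the sign of the deviation.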
Having identified the neighborhoods, Logan could then describe their characteristics, and also
estimate models of which group members lived in their group’s ethnic neighborhood vs. a less
segregated location. Findings for some groups did not match the expectations of standard
immigration theories, which posit that group members concentrate initially in neighborhoods
with lower costs but greater ethnic solidarity. For example, the Afro-Caribbean neighborhoods
of New York were in many respects more advantaged than the non-ethnic neighborhoods where
Afro-Caribbean people lived. And for some groups, it is the more affluent members who live
in ethnic neighborhoods, apparently because they have a preference for that living environment.
Neighborhood effects
Another rapidly growing interest among social scientists is whether neighborhoods have effects
on their residents. As already noted, most often neighborhoods are defined according to the
available administrative geography (like a census tract), although some progress has been made
in methods of combining small units into socially meaningful neighborhoods. But in rare cases
researchers have point data for individuals and therefore have greater flexibility in their
analysis. To investigate the relationship between neighborhood-level deprivation and
individuals’ mental disorders, Chaix et al. (2005) used data for all 65,830 residents aged 40–
59 years in Malmö, Sweden, geocoded at their place of residence. Figure 4 shows the spatial
distribution of predicted risk for mental disorder based on these data. This map is based on a
sophisticated “geoadditive logistic model,” where individual risk is predicted as a function of
their spatial location (see Wood 2004), and variation across individuals is “smoothed out.”
Clearly disorder was highly concentrated in neighborhoods in the northern part of the city.
How is this risk related to neighborhood deprivation? A standard approach would be to estimate
a multilevel model (Raudenbush and Bryk 2002), including some individual predictors (age,
gender, marital status, education, income) along with a measure of neighborhood income level
based on Malmö’s 100 administrative neighborhoods. Results from that approach, correcting
for spatial autocorrelation, showed significant effects of the mean neighborhood income.
Chaix’s team then took another step, reasoning that people might be especially affected by
their closest surroundings. Using their individual-level data, they drew their own “spatially
adaptive areas” around each person, experimenting with areas that included only the 25 nearest
neighbors, then the 100 closest, up to the 1,500 closest. The effect of income measured for
these areas was much stronger than when measured in administrative neighborhoods—about
twice as strong in the most tightly drawn areas.
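The idea of a spatially adaptive area can be sketched with a nearest-neighbor average. The example below is a minimal illustration with made-up locations and incomes. Whether the focal person is counted in his or her own area is an assumption here (the person is excluded), and the function name is hypothetical.

```python
import math

def adaptive_area_mean(index, locations, incomes, k):
    """Mean income among the k nearest neighbors of one person, excluding that person."""
    ranked = sorted((j for j in range(len(locations)) if j != index),
                    key=lambda j: math.dist(locations[index], locations[j]))
    nearest = ranked[:k]
    if not nearest:
        raise ValueError("at least one neighbor is required")
    return sum(incomes[j] for j in nearest) / len(nearest)

# Hypothetical residents along a street, with low incomes at one end and high at the other.
locations = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
incomes = [10.0, 20.0, 30.0, 100.0, 110.0]

tight_area = adaptive_area_mean(0, locations, incomes, k=2)
wide_area = adaptive_area_mean(0, locations, incomes, k=4)
```

For the first resident the contextual income is 25 when the area covers the two nearest neighbors and 65 when it covers four, so the measured context changes with the size of the area even though nothing about the person has changed.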
Researchers have begun asking broader questions about neighborhood effects, especially
whether people are affected not only by their own local neighborhood but also by characteristics
of larger surrounding areas. These are often called “spatial lag” effects. Chaix fitted a model
in which this lag effect could be estimated using the same sort of distance decay function
employed by Downey above. That is, it was assumed that adjacent neighborhoods had stronger
effects than more distant neighborhoods. It turned out, after controlling for individual
predictors and the effect of neighborhood income level, that there was significant association
in levels of mental disorder between neighborhoods that were as much as 700 m apart.
Land use
Spatial methods are natural in studies of land use, where a central question is how different
forms of agriculture or forage are distributed across a region. Remote sensing from satellites
is being exploited as a central data source. For example Chomitz and Gray (1996) conducted
a spatial analysis of land use in Belize between 1989 and 1992. The land use data are derived
from a land cover map based on satellite imagery. Remote sensing signals are coded into three
categories of land use: (1) “semisubsistence” agriculture, comprising milpa and other
nonmechanized annual cultivation; (2) “commercial farming”, comprising mostly pasture and
mechanized farming of annuals; and (3) “natural vegetation”, comprising forest, secondary
growth, wetlands, and natural savanna. In the rural area that they studied, they felt free to ignore
administrative boundaries entirely, and instead they placed a 1-km rectangular grid over the
territory and drew a sample of nearly 12,000 land points. To this they overlaid information on
the soil’s physical and chemical characteristics from a series of land resource assessments based
on a combination of aerial photography and field surveys. They also added the road network
from topographic maps.
Their purpose was to understand how soil characteristics and distance to market affect land
use. Like some other studies discussed above, their multivariate model included controls for
spatial autocorrelation. Shorter distance to market is associated with higher probabilities of land
being in semi-subsistence and commercial agriculture. Higher soil nitrogen and phosphorus
are related to higher probabilities of both types of agriculture, while an excessively low or high
pH is related to decreased probabilities of both types. Other geographic characteristics
associated with location (proximity to a river, slope of the land) also have significant effects.
Social scientists more often wish also to take population data into account. Pan et al. (2007)
conducted such a study to analyze the effect of changes in population size, density, and
distribution on forest cover in Ecuador’s northern Amazon region. A first round of survey data
were collected from migrant farmers in 1990, yielding a sample of 470 settler plots in 64
settlement sectors with interviews with both the economic head and the spouse. These surveys
covered a wide range of topics, including detail on land use, agricultural and non-agricultural
work, and household socioeconomic and demographic backgrounds. Then a follow-up survey
was conducted in 1999, providing an opportunity to examine changes in population structures
and land use. Additional community and spatial data were collected in 1999 and 2000.
Locations of all relevant community structures (e.g., markets, health care centers, community
centers, and schools), farms, and each household were geocoded using global positioning
system (GPS) receivers. Primary and secondary roads were also digitized.
Pan estimated an initial ordinary least squares (OLS) regression seeking to explain which farms
had experienced the greatest loss of forest cover. Diagnostic statistics (Moran’s I and Lagrange
multiplier tests) showed that there was a high degree of spatial autocorrelation (farms with
more forest loss tended to be near one another). One standard response would be to estimate
a spatial error model, where correlated errors across farms are accounted for by their proximity
to one another. This requires creating a spatial weights matrix that identifies distances between
every pair of cases, which fortunately has become relatively straightforward through the use
of GIS software. Another response, and one that worked well in this case, is to understand the
source of spatial dependence and control for it more directly. Here spatial dependence mostly
reflected clustering of farms within settlement areas, so it could be accommodated through a
random effects model (Snijders and Bosker 1999) that allows intercepts to vary by settlements.
This model helps to control for spatial dependence and clustering of farms within sectors. All
three types of models showed that an increase in population size is significantly related to a
decrease in forest cover and that deforestation rates are higher among more recently established
farms, but that proximity to markets has no independent effect.
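The diagnostic step can be illustrated with a global Moran's I computed on regression residuals. The sketch below is a minimal version that assumes a binary adjacency matrix and made-up residuals for four farms in a row. It returns only the statistic, without the significance test or the Lagrange multiplier diagnostics mentioned above.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    cross_products = sum(weights[i][j] * dev[i] * dev[j]
                         for i in range(n) for j in range(n))
    return (n / total_weight) * cross_products / sum(d * d for d in dev)

# Four hypothetical farms in a row; 1 marks adjacent farms.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered_residuals = [1.0, 1.0, -1.0, -1.0]    # similar residuals sit side by side
alternating_residuals = [1.0, -1.0, 1.0, -1.0]  # neighbors always differ

i_clustered = morans_i(clustered_residuals, adjacency)
i_alternating = morans_i(alternating_residuals, adjacency)
```

A clearly positive value on the residuals of an OLS model signals that nearby cases are more alike than the model accounts for, which is the situation that motivates a spatial error model or, as in this study, a random effects model for the settlement sectors.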
Fertility
We complete this review by turning to two of the most traditional demographic topics, fertility
and migration. Both phenomena are known to have a spatial structure, and understanding that
structure can lead to new conclusions about population processes.
A remarkable study by Skinner et al. (2000) used GIS techniques to analyze 1% microdata
from China’s 1990 census. Skinner showed that cities and towns in China can be arrayed on
both a core–periphery structure and an urban–rural continuum. The urban–rural distinction
categorizes 12,000 places in terms of size and volume of economic activity into classes ranging
from an “apex metropolis” (9 cities with an average of 4 million urban residents) to “central
towns” (nearly 3,000 places with an average of about 4,000 residents). The core–periphery
distinction classifies places into categories ranging from “inner core” (places with very high
levels of education, manufacturing employment, and economic productivity) to “far
periphery.” Note that this classification was clearly spatial, but the actual measures used to
produce it did not include distance. Figure 5 below illustrates this scheme for the Lower
Yangtze region centered on Shanghai. Skinner’s team then applied this model to the fertility
information from the 1990 census. They demonstrated a strong association between both
dimensions of regional structure and fertility rates—the share of women over age 30 with 3 or
more children ranges from less than 20% in the most urban/inner core places to over 65% in
the rural/far periphery. Additional analyses suggested that the fertility transition diffused over
time from the former to the latter type of place.
Skinner’s work shows a close affinity between demography and the analysis of regional
systems. Another study of fertility assesses the causal impact of spatial location on
contraceptive use. Entwisle et al. (1997) report on a long-term program of research on 51
villages in rural Thailand. The Nang Rong data included a full census of the villages conducted
in 1984 that gathered information on married women’s contraceptive use and method (e.g.,
pill, IUD, and sterilization). The data also included information about many characteristics of
communities that were believed to be related to reproductive choices or adoption of
innovations; hence a multilevel analysis could be conducted. Explicit spatial predictors were
the presence of a district health center in the village and proximity to Nang Rong’s main town.
Presence of a health center turned out to be positively related to being sterilized between 1984
and 1988 and also (among non-sterilized women) with using the pill vs. no birth control.
Proximity to the town was, as expected, positively related to being sterilized, but negatively
associated with using the pill.
Yet the most interesting spatial conclusion is the high degree of homogeneity within villages.
The pseudo-R2 for the model for choice of contraceptive was .146 with both individual and
community variables. Replacing the community variables with a set of 50 dummy variables to
represent villages—a maximal estimate of variation across villages—raises this to .332.
Evidently there is an unmeasured social process that causes differences between villages. Focus
group interviews in several villages led to the conclusion that contraceptive use is guided by
local social networks. Women within a village talk openly and often about family planning,
but there is less contact between women of different villages and less opportunity for intimate
discussion. Social scientists have only begun to study the spatial character of social networks
or their implications for population outcomes, though GIS methods have potential to facilitate
such work. Remarkably, here there seems to be a strong boundary around the village, so that
“distance” needs to be replaced with the concept of discrete “places.”
Migration
Migration is of course inherently about location, with an origin and a destination. Johnson et
al. (2005) used county-level age-specific net migration estimates over five decades (1950–
2000) to describe how patterns have evolved over time. They distinguished several types of
counties that were expected to have different migration experience, similar in some ways to
Skinner’s classification of Chinese cities and towns. These are metropolitan core counties
(those containing a central city), metropolitan non-core counties, formerly non-metropolitan
counties that were reclassified as metropolitan after 1963, and non-metropolitan counties
(distinguishing those with a recreational economy from those where agriculture predominates).
The analysis confirmed the continuation into the 1990s of distinct net migration “signature
patterns” for counties, although there was temporal variation in the overall volume of
migration. For example, in every decade the metropolitan core counties experienced the most net
in-migration by people in their 20s, while “new” metropolitan counties experienced substantial
losses in that age range but gains among people in their 30s and early 40s, plus children. The
core–periphery or urban–rural classification is implicitly spatial but deals with categories of
locations rather than distances.
In addition, Johnson’s team conducted an analysis of how counties that experienced net in-
migration or out-migration clustered together, this time taking distances explicitly into account.
They employed the same statistic, local Moran’s I, that Logan used to study ethnic
neighborhood clusters. Figure 6 illustrates the results for the decade of 1980–1990. It identifies
large, geographically contiguous regions of net in-migration (in particular, Florida and the
Southwest) and geographically contiguous regions of net out-migration (especially the Great
Plains). In most respects these are the same areas identified in other decades,
though with some changes over time. For example, the Southwestern cluster of in-migration in
the 1990s shifted away from California and toward New Mexico and Colorado.
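The local Moran's I statistic mentioned above can be sketched in a few lines. This is a minimal illustration, not the procedure Johnson's team actually ran: the six "counties," their net migration rates, and the simple chain of adjacency are all hypothetical, and the weights are row-standardized by assumption.

```python
import numpy as np

def local_morans_i(values, weights):
    """Local Moran's I for each area, using row-standardized weights."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float).copy()
    np.fill_diagonal(w, 0.0)                      # an area is not its own neighbor
    row_sums = w.sum(axis=1, keepdims=True)
    w = np.divide(w, row_sums, out=np.zeros_like(w), where=row_sums > 0)
    z = x - x.mean()                              # deviations from the overall mean
    m2 = (z ** 2).mean()                          # variance (divided by n)
    spatial_lag = w @ z                           # average deviation among neighbors
    return z * spatial_lag / m2

# Hypothetical data: six counties in a row, each adjacent to the next one.
net_migration = np.array([8.0, 6.0, 7.0, -5.0, -6.0, -7.0])
adjacency = np.zeros((6, 6))
for i in range(5):
    adjacency[i, i + 1] = adjacency[i + 1, i] = 1.0

local_i = local_morans_i(net_migration, adjacency)
print(np.round(local_i, 3))
```

Positive values mark counties whose neighbors resemble them (high surrounded by high, or low surrounded by low), which is what produces contiguous clusters on a map. Values near zero or below mark counties at the edge of a cluster or out of step with their neighbors. Judging which values are statistically unusual requires an additional inference step, such as a permutation test, that is omitted here.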
Migration can also be studied as an individual phenomenon, asking who moves and where they
move. Tolnay et al. (2005) used data from the 1920, 1940, and 1970 Census Public Use
Microdata Samples to investigate the distances traveled by men who moved from the Southern
states. Length of migration was measured as the distance between the center of the migrant’s
state of birth and the center of the migrant’s state of residence at the time of census enumeration.
Predictors are individual-level variables from the census. They found that white migrants
moved significantly farther than black migrants, which is related to the greater propensity for
white migrants to move west, rather than north in this period. In this case it appears that distance
is mainly a proxy for direction.
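A distance between state centers of the kind described above can be computed as a great-circle distance. The sketch below uses the haversine formula on a spherical Earth, which is an assumption (the source does not say how distance was calculated), and the state center coordinates are rough illustrative values rather than the ones used by Tolnay et al.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumes a spherical Earth

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

# Hypothetical, approximate state centers (latitude, longitude).
state_centers = {
    "Mississippi": (32.7, -89.7),
    "Illinois": (40.0, -89.2),
    "California": (37.2, -119.4),
}

north = great_circle_km(*state_centers["Mississippi"], *state_centers["Illinois"])
west = great_circle_km(*state_centers["Mississippi"], *state_centers["California"])
print(f"northward move: {north:.0f} km, westward move: {west:.0f} km")
```

With these illustrative coordinates the westward move is much longer than the northward one, which shows how a difference in typical direction can appear in the data as a difference in distance.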
A more local study by Wiseman and Virden (1977) examined intra-urban migration by persons
aged 60 years or older in Kansas City, Kansas. From an intensive individual-level survey they
obtained the most recent previous and present residential locations for the sample households
and plotted addresses on a map of the metropolitan area. They calculated three distance
measures: distance to Central Business District (CBD) from the present residence; distance to
the CBD from the former residence; and distance between the present and former residences.
They then could classify 76% of moves as being away from the CBD. Figure 7 displays the
moves by persons who left the CBD for more suburban locations. The figure also shows two
ellipses (Wong 1999). The smaller ellipse summarizes the pattern of initial locations of these
people; the larger one summarizes their destinations. These ellipses are useful for showing the
overall direction of movement and the greater spatial dispersion of destinations.
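Summary ellipses of this kind can be approximated from the covariance of the point coordinates. The sketch below is one common covariance-based formulation, offered as an assumption and not necessarily the formulation in Wong (1999); the origin and destination coordinates are hypothetical.

```python
import numpy as np

def deviational_ellipse(points):
    """Return center, semi-major axis, semi-minor axis, and orientation (degrees)."""
    pts = np.asarray(points, dtype=float)
    center = pts.mean(axis=0)
    cov = np.cov(pts, rowvar=False)               # 2 x 2 covariance of x and y
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    eigvals = np.clip(eigvals, 0.0, None)         # guard against tiny negative values
    major_direction = eigvecs[:, 1]
    angle = np.degrees(np.arctan2(major_direction[1], major_direction[0])) % 180.0
    return center, np.sqrt(eigvals[1]), np.sqrt(eigvals[0]), angle

# Hypothetical coordinates (km): tightly grouped origins, dispersed destinations.
origins = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1)]
destinations = [(4, 4), (8, 5), (6, 9), (2, 6), (5, 1)]

o_center, o_major, o_minor, _ = deviational_ellipse(origins)
d_center, d_major, d_minor, _ = deviational_ellipse(destinations)

shift = d_center - o_center                       # overall direction of movement
origin_area = np.pi * o_major * o_minor
destination_area = np.pi * d_major * d_minor
print(shift, origin_area, destination_area)
```

The displacement between the two centers summarizes the overall direction of movement, and the ratio of the two areas summarizes how much more dispersed the destinations are than the origins.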
Wiseman and Virden then used a step-wise multiple discriminant analysis to identify six
socioeconomic and attitudinal variables that most effectively distinguish inward from outward
movers. They found that the outward migrants tend to be wealthier, socially more active,
residentially more stable, and more likely to be home and auto owners, whereas the inward
movers are more likely to have an unstable residential history and to be relatively poorer,
socially inactive, and less likely to own a home or automobile.
Conclusion
This overview is intended to introduce some of the many ways in which spatial concepts and
methods are being incorporated into social science research. It illustrates some common
analytical tools, all of which rely heavily on GIS techniques. Once an area is systematically
mapped in a GIS, it is straightforward to create a visual display of features as points
(such as locations of individual events) and polygons (such as characteristics of census tracts
or counties). Researchers commonly use such maps as an exploratory and descriptive tool,
noticing patterns that would otherwise easily be missed and raising questions and hunches
about what social process underlies those patterns.
GIS techniques facilitate the creation of new spatial measures. These include simple measures
such as distances between two locations. They also include more complex measures such as
exposures, where proximity to many different locations has to be taken into account, along
with information about the features of those locations and assumptions about how exposure
decays with greater distance. Such new measures can be added to an existing dataset and
analyzed using any other methodology.
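An exposure measure of this kind can be sketched as a distance-weighted sum over source locations. The negative exponential decay function, the `decay_km` parameter, and all coordinates and intensities below are illustrative assumptions. In practice the choice of decay function is exactly the kind of theoretical decision that needs to be justified.

```python
import numpy as np

def distance_decay_exposure(homes, sources, intensity, decay_km=1.0):
    """Exposure of each home to all sources, weighted by exp(-distance / decay_km)."""
    homes = np.asarray(homes, dtype=float)
    sources = np.asarray(sources, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    diff = homes[:, None, :] - sources[None, :, :]    # pairwise coordinate differences
    dist = np.sqrt((diff ** 2).sum(axis=2))           # homes x sources distance matrix
    decay_weights = np.exp(-dist / decay_km)          # assumed decay function
    return decay_weights @ intensity

# Hypothetical planar coordinates in km and source intensities.
homes = [(0.0, 0.0), (10.0, 0.0)]
sources = [(0.0, 0.0), (1.0, 0.0)]
intensity = [1.0, 2.0]

exposure = distance_decay_exposure(homes, sources, intensity, decay_km=1.0)
print(exposure)
```

The resulting vector has one value per home and can be merged into an existing dataset as an ordinary variable.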
Very often there is spatial dependence in social science data, which can be seen visually on a
map and confirmed by analyses of spatial clustering. Statistical methods like those discussed
here are being developed to draw conclusions about spatial structures, such as the existence of
non-random clusters of population phenomena. Some of these are purely descriptive. Others,
like Moran’s I, have a statistical basis that allows the researcher to discern which clusters are
unlikely to be random. Approaches to estimating spatial regressions that correct for statistical
problems due to spatial autocorrelation are now widely used. A newer direction of research
seeks to build models of the sources of spatial dependence, such as diffusion from one place
to surrounding areas or effects of a wider local context on processes occurring within
neighborhoods.
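The idea of testing whether an observed spatial pattern is unlikely to be random can be illustrated with the global form of Moran's I and a permutation test. This is a minimal sketch under stated assumptions: the 5 x 5 grid, the rook (shared-edge) adjacency, and the north-south gradient in the values are all hypothetical, and the helper names are invented for the example.

```python
import numpy as np

def rook_weights(n_rows, n_cols):
    """Binary adjacency matrix for a grid where cells sharing an edge are neighbors."""
    n = n_rows * n_cols
    w = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            if c + 1 < n_cols:
                w[i, i + 1] = w[i + 1, i] = 1.0
            if r + 1 < n_rows:
                w[i, i + n_cols] = w[i + n_cols, i] = 1.0
    return w

def morans_i(values, weights):
    """Global Moran's I for a vector of values and a spatial weights matrix."""
    x = np.asarray(values, dtype=float)
    z = x - x.mean()
    return (len(x) / weights.sum()) * (z @ weights @ z) / (z @ z)

def permutation_test(values, weights, n_perm=999, seed=0):
    """One-sided pseudo p-value: how often shuffled data look as clustered as observed."""
    rng = np.random.default_rng(seed)
    observed = morans_i(values, weights)
    simulated = np.array(
        [morans_i(rng.permutation(values), weights) for _ in range(n_perm)]
    )
    p_value = (np.sum(simulated >= observed) + 1) / (n_perm + 1)
    return observed, p_value

weights = rook_weights(5, 5)
values = np.repeat(np.arange(5.0), 5)      # each grid row shares one value (a gradient)
observed, p_value = permutation_test(values, weights)
print(observed, p_value)
```

Shuffling the values across locations destroys any spatial structure while keeping the distribution of values intact, so the shuffled statistics show what Moran's I looks like when location does not matter.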
As researchers move beyond exploring spatial patterns to testing social theories, they also need
to clarify their understanding of space. That is the main point of this essay. The studies reviewed
here make reasonable use of a wide range of tools. Some of them are thoughtful in their
treatment of concepts like distance, exposure, and places. Others make choices that seem to
work, but are not clearly based on theory. Although many applications may seem to be
straightforward, their informed use requires thinking through questions such as what we mean
by proximity, how we expect the impacts of proximity to vary across shorter or longer distances,
and whether we conceive of space as a continuous surface or discontinuous places with more
or less clear boundaries. How do we define neighborhoods, and how do we distinguish between
the effects of adjacent neighborhoods and the misspecification of neighborhood scale? As more
scholars find new questions that require spatial methods, there will be growing demand for
approaches that are more finely attuned to the way problems are being thought about and
attacked.
References
Anselin L. Local indicators of spatial association—LISA. Geographical Analysis 1995;27(2):93–115.
Chaix B, Merlo J, Subramanian SV, Lynch J, Chauvin P. Comparison of a spatial perspective with the
multilevel analytical approach in neighborhood studies: The case of mental, behavioral disorders due
to psychoactive substance use in Malmö, Sweden, 2001. American Journal of Epidemiology 2005;162
(2):171–182. [PubMed: 15972939]
Chomitz KM, Gray DA. Roads, land use, and deforestation: A spatial model applied to Belize. The World
Bank Economic Review 1996;10(3):487–512.
Diggle PJ. Statistical analysis of spatial point patterns. Academic Press; New York: 1983.
Diggle PJ, Tawn JA, Moyeed RA. Model-based geostatistics. Journal of the Royal Statistical Society
Series C (Applied Statistics) 1998;47(3):299–350.
Tolnay SE, White KJC, Crowder KD, Adelman RM. Distances traveled during the great migration: An
analysis of racial differences among male migrants. Social Science History 2005;29(4):523–548.
Voss PR. Demography as a spatial social science. Population Research and Policy Review 2007;26(5):
457–476.
Wiseman RF, Virden M. Spatial and social dimensions of intraurban elderly migration. Economic
Geography 1977;53(1):1–13.
Wong DWS. Several fundamentals in implementing spatial statistics in GIS: Using centrographic
measures as examples. Geographic Information Sciences 1999;5(2):163–174.
Wood SN. Stable and efficient multiple smoothing parameter estimation for generalized additive models.
Journal of the American Statistical Association 2004;99(467):673–686.
Fig. 1.
Kernel estimate of dengue prevalence in Goiânia, Brazil, 2001
Fig. 2.
Difference between K functions (bold line) and simulation envelope (lighter lines) for
childhood leukemia and ‘population at risk’
Fig. 3.
Stylized racial distribution in four hypothetical regions
Fig. 4.
Smoothed map of the prevalence of mental disorders due to psychoactive substance use (top)
Fig. 5.
Core-periphery structure of the Lower Yangtze macroregion, showing high-order cities and
major waterways, 1990
Fig. 6.
Spatial autocorrelation of net migration in US counties
Fig. 7.
Moves showing origin and destination in the Kansas City area, for outwardly moving elderly
residents