1985 - Wartenberg - Multivariate Spatial Correlation A Method For Exploratory Geographical
METHODS
Spatial autocorrelation is defined in terms of univariate data observations.
Moran’s coefficient I (Moran 1948, 1950; Cliff and Ord 1981), for example, is the
weighted sum of the products of pairs of data observations, centered on the
mean of the observations, standardized to adjust for the variance of the
observations, and normalized by the total sum of the weights. Moran’s I is defined as follows:
Contribution No. 539 in Ecology and Evolution from the State University of New York, Stony Brook.
This work is part of a doctoral dissertation submitted to the Department of Ecology and Evolution, State
University of New York, Stony Brook. The author thanks Drs. R. R. Sokal, F. Rohlf, J. D. Thomson,
and R. C. Grimson for comments on this manuscript and Drs. N. Oden and R. W. Setzer for many hours
of helpful discussions. A reviewer pointed out the problem of negative eigenvalues and provided helpful
guidance. B. Thomson and D. DiGiovanni assisted with technical aspects of this study. This research was
supported by grant GM 2826202 from the National Institute of General Medical Sciences to R. R. Sokal.
$$I = \frac{n \sum_{(2)} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}{S_0 \sum_{i=1}^{n} (x_i - \bar{x})^2}$$
where
$S_0 = \sum_{(2)} w_{ij}$
$w_{ij}$ = weight for locality pair (i, j)
$x_i$ = observation at locality i
$\bar{x}$ = mean of the $x_i$'s
$\sum_{(2)} = \sum_{i=1}^{n} \sum_{j=1}^{n}$ for $i \neq j$.
where $\sum_{(2)} w_{ij} = 1.0$.
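A minimal NumPy sketch of Moran’s I as defined above may help make the formula concrete. The function name `morans_i` is hypothetical; it assumes a dense n-by-n weight matrix and zeroes its diagonal so that the double sum runs over i ≠ j only.

```python
import numpy as np


def morans_i(x, w):
    """Moran's I for n observations x and an n-by-n weight matrix w.

    The diagonal of w is set to zero, so the double sum runs over i != j only.
    """
    x = np.asarray(x, dtype=float)
    w = np.array(w, dtype=float)      # copy, so the caller's matrix is not modified
    np.fill_diagonal(w, 0.0)          # enforce the i != j restriction
    n = x.size
    dev = x - x.mean()                # x_i - x_bar
    s0 = w.sum()                      # S_0, the total of the weights
    numerator = n * (dev @ w @ dev)   # n * sum_(2) w_ij (x_i - x_bar)(x_j - x_bar)
    denominator = s0 * (dev @ dev)    # S_0 * sum_i (x_i - x_bar)^2
    return numerator / denominator


# Hypothetical example: four localities in a row, adjacent pairs weighted 1.0
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(morans_i([1, 2, 3, 4], w))     # about 0.333: neighbours are alike
print(morans_i([1, -1, 1, -1], w))   # -1.0: neighbours alternate
```

A steady trend gives positive autocorrelation (I = 1/3 here), while a perfectly alternating series gives I = -1.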
$$M = Z^{t} W Z, \tag{4}$$
15384632, 1985, 4, Downloaded from https://onlinelibrary.wiley.com/doi/10.1111/j.1538-4632.1985.tb00849.x by CAPES, Wiley Online Library on [19/02/2025]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
Daniel Wartenberg / 265
where
M is an m-by-m (variable by variable) spatial correlation matrix
Z is an n-by-m (location by variable) data matrix, centered and standardized by variable
$Z^t$ is the m-by-n (variable by location) transpose of Z
W is an n-by-n (locality by locality) weight matrix.
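The claim that the diagonal of M consists of Moran’s I coefficients can be checked directly under two conventions, both assumptions here: the off-diagonal weights sum to 1.0 (so $S_0 = 1$), and each column is scaled by its population standard deviation (divisor n, so $z^t z = n$). Then $I = (n/S_0)\,z^tWz / z^tz = z^tWz$, which is exactly a diagonal element of M. The helper name `spatial_correlation_matrix` is hypothetical.

```python
import numpy as np


def spatial_correlation_matrix(x, w):
    """M = Z^t W Z for an n-by-m data matrix x and an n-by-n weight matrix w.

    Assumed conventions (hypothetical helper, not the author's code): each column
    is centered and divided by its population standard deviation, so z^t z = n,
    and the off-diagonal weights are rescaled to sum to 1.0.
    """
    x = np.asarray(x, dtype=float)
    w = np.array(w, dtype=float)
    np.fill_diagonal(w, 0.0)
    w /= w.sum()                                  # sum_(2) w_ij = 1.0
    z = (x - x.mean(axis=0)) / x.std(axis=0)      # centered and standardized by variable
    return z.T @ w @ z                            # m-by-m, variable by variable


w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
x = np.column_stack([[1, 2, 3, 4], [1, -1, 1, -1]])
m = spatial_correlation_matrix(x, w)
print(np.diag(m))    # Moran's I of each column: about 0.333 and -1.0
```

The off-diagonal entries are the spatial cross-correlations between pairs of variables; M is symmetric whenever W is.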
Each coefficient in the matrix M is a Mantel-type coefficient (Mantel 1967). That
is, each coefficient is a general cross-product statistic among elements of two
matrices in which these elements are distances (or similarities) among pairs of
objects (Hubert, Golledge, and Costanzo 1981). The distributional properties of
each diagonal element of M are the same as for univariate autocorrelation values.
Indeed, the diagonal values are themselves Moran’s I coefficients. Each off-diagonal
element is, by analogy, a bivariate cross-correlation coefficient, the spatial correlation
of one variable with another variable calculated by summing the values over all
pairs of localities, and weighted as in the autocorrelations. One such coefficient
exists for each pair of variables. The expected value and variance of these
coefficients under a permutational hypothesis have been derived by Mantel (1967),
but their distribution is unknown. For large sample sizes, the distribution is often
asymptotically normal, but deviations from normality are not unusual. Klauber
(1975), developing a multivariate analytic approach similar to MSC, derived
expectation and variance equations for the cross-product statistics when more than
two samples exist. However, he himself notes the limitations of the use of these
statistics given the unusual distribution of the raw data (i.e., reciprocal distance). In
addition, all the problems of significance testing in PCA, such as multiple
comparisons and tests on successive factors after one is found to be significant,
apply equally well to MSC. Given these problems and the fact that the full
distributional properties of these coefficients have not been worked out, this
approach to assessing significance will not be addressed here any further.
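To illustrate what "under a permutational hypothesis" means (not as a significance test, which the text sets aside), the sketch below computes the mean and variance of one element of M over relabellings of the localities. It assumes standardized variables (population standard deviation) and a zero-diagonal W rescaled to sum to 1.0; under those conventions the exact permutation mean works out to $-r/(n-1)$, where r is the ordinary Pearson correlation of the two variables, and for a diagonal element (r = 1) this is the familiar $-1/(n-1)$ expectation of Moran’s I. The function name is hypothetical.

```python
import itertools

import numpy as np


def permutation_moments(z1, z2, w, n_perm=999, seed=0):
    """Mean and variance of the cross-product z1^t W z2 (one element of M) under
    random relabelling of localities.

    The same permutation is applied to both variables, so their ordinary
    correlation is unchanged and only the spatial arrangement is scrambled.
    For n <= 8 all n! permutations are enumerated exactly; otherwise n_perm
    random permutations are drawn (a Monte Carlo approximation).
    """
    z1 = np.asarray(z1, dtype=float)
    z2 = np.asarray(z2, dtype=float)
    n = z1.size
    if n <= 8:
        perms = (np.array(p) for p in itertools.permutations(range(n)))
    else:
        rng = np.random.default_rng(seed)
        perms = (rng.permutation(n) for _ in range(n_perm))
    values = np.array([z1[p] @ w @ z2[p] for p in perms])
    return values.mean(), values.var()


# Hypothetical example: four localities in a row, weights rescaled to sum to 1.0
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
w /= w.sum()
x = np.array([1.0, 2.0, 3.0, 4.0])
z = (x - x.mean()) / x.std()
mean, var = permutation_moments(z, z, w)
print(mean)    # -1/(n - 1) = -1/3, up to rounding
```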
An alternative derivation can be given in which the spatial correlation
methodology is thought of as a part of a generalized principal components analysis
(Appendix). This approach is relevant for statistical modeling of the covariance
structure of the data. However, a discussion of this issue is beyond the scope of this
paper and will be explored elsewhere.
This spatial correlation matrix, M, which is in quadratic form, can be decomposed
into orthogonal components using eigenvector analysis. These components, as in
PCA, reflect the distribution of variation, in this case spatially weighted variation,
throughout the multivariate data field. All statements made in reference to spatial
variance or spatial components use these terms for convenience by analogy with
PCA and should not be interpreted in the strict statistical sense. The first component
explains the maximum amount of variance that can be explained by a linear
combination of the variables. The second component explains the maximum
amount of residual variance (i.e., that not explained by the first component) that
can be explained by a linear combination of the original variables, while remaining
orthogonal to the first component. A third component can be extracted that is
orthogonal to the first two, and so on. Those components explaining the major
portions of the variance should depict the basic patterns of spatial patches and
trends, when mapped.
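The decomposition described above can be sketched in a few lines. This is an illustration under stated assumptions, not the author’s program: components are ordered by decreasing absolute eigenvalue (mirroring the absolute values reported in Table 5), each component’s share is its fraction of the summed absolute eigenvalues, and locality scores are the standardized data projected onto the leading eigenvectors. No oblique rotation is applied. The name `msc_components` is hypothetical.

```python
import numpy as np


def msc_components(m, z, k=2):
    """Eigenstructure of a symmetric spatial correlation matrix m, plus locality scores.

    Components are ordered by decreasing absolute eigenvalue (an assumption).
    'share' is each component's fraction of the summed absolute eigenvalues;
    'scores' projects the standardized n-by-m data z onto the first k eigenvectors.
    """
    vals, vecs = np.linalg.eigh(m)            # m is symmetric when W is symmetric
    order = np.argsort(-np.abs(vals))
    vals, vecs = vals[order], vecs[:, order]
    share = np.abs(vals) / np.abs(vals).sum()
    scores = z @ vecs[:, :k]                  # signs are arbitrary, as in PCA
    return vals, vecs, share, scores
```

`np.linalg.eigh` is used because M is symmetric; it returns eigenvalues in ascending order, hence the explicit reordering.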
An important difference between this approach and PCA must be pointed out.
Unlike R, the product-moment correlation matrix that is decomposed in PCA, M need
not be positive semidefinite. That is, M can have negative eigenvalues, which R cannot.
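One way to see why negative eigenvalues arise: a weight matrix with a zero diagonal has trace zero, so its eigenvalues sum to zero and some must be negative; a variable that alternates between neighbors loads on those negative directions. The small hypothetical example below (four localities in a row, weights rescaled to sum to 1.0) contrasts M with R for one alternating and one trending variable.

```python
import numpy as np

# Four localities in a row; adjacent pairs weighted equally, weights summing to 1.0
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
w /= w.sum()

# Column 1 alternates between neighbours (Moran's I = -1); column 2 is a trend
x = np.array([[1, 1], [-1, 2], [1, 3], [-1, 4]], dtype=float)
z = (x - x.mean(axis=0)) / x.std(axis=0)

m = z.T @ w @ z          # spatial correlation matrix
r = z.T @ z / len(z)     # product-moment correlation matrix

print(np.linalg.eigvalsh(m))   # one eigenvalue is negative for these data
print(np.linalg.eigvalsh(r))   # both eigenvalues are positive
```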
266 / Geographical Analysis
RESULTS
1. Sensitivity
Two types of patterns are simulated to evaluate the sensitivity of this proposed
technique for detecting simple spatial structure. The first is created by filling a
square grid of a specified size with random, normal (0,1) deviates for each of 10
variables. To the first variable, for the left half of the grid, a specified increment is
added to simulate a patch. Thus,
$$y_{ijk} = \begin{cases} \varepsilon_{ijk} + \mathrm{INC} & \text{for } k = 1 \text{ and } i \le I/2 \\ \varepsilon_{ijk} & \text{otherwise,} \end{cases}$$
where
i is the row index
j is the column index
k is the variable index
I is the number of rows
J is the number of columns
K is the number of variables (10 in this case)
$\varepsilon_{ijk}$ are random, N(0, 1), independent deviates
INC is the increment added to the $\varepsilon$'s
$y_{ijk}$ is the observed grid value.
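A sketch of the patch model, assuming NumPy’s `default_rng` for the N(0, 1) deviates and reading i ≤ I/2 as the first half of the rows (0-based rows 0 to I/2 - 1 when I is even). The function name `patch_model` and the row-major flattening of the grid into one row per locality are hypothetical choices.

```python
import numpy as np


def patch_model(n_rows, n_cols, n_vars=10, inc=1.0, seed=None):
    """Hypothetical generator for the patch model:
    y_ijk = eps_ijk + INC for k = 1 and i <= I/2, and y_ijk = eps_ijk otherwise.

    Returns an (I * J)-by-K data matrix (one row per locality, row-major order)
    and the 0-based (row, column) coordinates of each locality.
    """
    rng = np.random.default_rng(seed)
    y = rng.standard_normal((n_rows, n_cols, n_vars))   # eps_ijk ~ N(0, 1)
    y[: n_rows // 2, :, 0] += inc                       # patch: first half of the rows, variable 1
    rows, cols = np.divmod(np.arange(n_rows * n_cols), n_cols)
    return y.reshape(n_rows * n_cols, n_vars), np.column_stack([rows, cols])


y, coords = patch_model(6, 6, inc=1.0, seed=0)   # a 6-by-6 grid, as in Table 1
```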
One variable is spatially patterned and nine are not. Calculated next are the
values of Moran’s I for the patterned variable and the ratio of the first eigenvalue
to the sum of the absolute values of all the eigenvalues for both the spatial
correlation matrix for all variables and the Pearson product-moment correlation
matrix for all variables. The single spatial autocorrelation coefficient for the first
variable is an index of the univariate spatial structure for the variable with the
added increment. Spatial autocorrelation for the other variables should not be
significantly different from expectation and should not vary as the increment added
to the first variable changes. The ratios represent the relative magnitude of the first
eigenvalue in spatially weighted and unweighted correlation matrices, respectively,
to the total variance. They reflect the effectiveness of each method in detecting
structure of the one spatially patterned variable in an otherwise unpatterned
multivariate data set. For an increment of 0.0, no pattern should be detected. This
experiment was replicated 25 times for each set of parameter values. The means
and standard errors of the indexes are tabulated in Table 1 and Table 2 for grid
sizes 36 and 100, respectively, for increments ranging from 0.0 to 10.0.
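The two ratio indexes can be sketched as follows. The weighting scheme used for the simulated grids is not stated in this excerpt, so rook adjacency (cells sharing an edge) is an assumption, as is taking the "first" eigenvalue to be the one largest in absolute value. Both function names are hypothetical.

```python
import numpy as np


def rook_weights(coords):
    """Hypothetical weighting: 1.0 for grid cells that share an edge, else 0.0."""
    d = np.abs(coords[:, None, :] - coords[None, :, :]).sum(axis=-1)
    return (d == 1).astype(float)


def first_eigenvalue_ratios(y, w):
    """(spatial ratio, Pearson ratio): largest |eigenvalue| over the sum of |eigenvalues|,
    for M = Z^t W Z and for the product-moment correlation matrix R."""
    y = np.asarray(y, dtype=float)
    z = (y - y.mean(axis=0)) / y.std(axis=0)
    m = z.T @ (w / w.sum()) @ z
    r = z.T @ z / len(z)
    ratios = []
    for a in (m, r):
        vals = np.abs(np.linalg.eigvalsh(a))
        ratios.append(vals.max() / vals.sum())
    return tuple(ratios)
```

Repeating this over replicates (25 per parameter setting in the text) and averaging gives the kind of means and standard errors tabulated in Tables 1 and 2.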
Table 1 shows that for a 6-by-6 grid, an increment of 1.0 produces an increase in
Moran’s I and a slight change in the eigenstructure of the spatial correlation matrix.
Larger increments show marked changes in both of these. The Pearson product-
moment correlation matrix shows no overall change.
Table 2, for the 10-by-10 grid, shows a similar pattern, although the change
appears to begin earlier for Moran’s I, at an increment of 0.5. The change in
eigenvalue ratio for M is suggestive at an increment of 0.5 and marked at an
increment of 1.0. Thus, for these sample sizes, this technique is marginally able to
detect displacements of one standard deviation of the overall surface values. For
TABLE 1
Results of Simulation 1: The Spatial Structure of a Patch Model on a 6-by-6 Grid
              Spatial Ratio      Pearson Ratio      Moran’s I
Increment     Mean (%)   SE      Mean (%)   SE      Mean    SE

TABLE 2
Results of Simulation 2: The Spatial Structure of a Patch Model on a 10-by-10 Grid
larger height changes, the method depicts clear change. As sample size increases,
there is also a suggestion of increasing sensitivity.
For the second sensitivity test, a linearly increasing trend term is added to the
first variable of random normal deviates. The total displacement of the trends
ranges from 0.0 to 5.0 across the entire grid area along one axis. Thus,
$$y_{ijk} = \begin{cases} \mathrm{INC} \cdot \dfrac{i}{I} + \varepsilon_{ijk} & \text{for } k = 1 \\ \varepsilon_{ijk} & \text{for } k > 1 \end{cases}$$
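The trend model can be generated the same way. This hypothetical sketch indexes rows as i = 1, ..., I so that the last row receives the full increment INC; with that indexing the displacement between the first and last rows is INC (I - 1)/I. The name `trend_model` is illustrative only.

```python
import numpy as np


def trend_model(n_rows, n_cols, n_vars=10, inc=1.0, seed=None):
    """Hypothetical generator for the trend model:
    y_ijk = INC * i / I + eps_ijk for k = 1, and y_ijk = eps_ijk for k > 1.

    Rows are indexed i = 1..I, so the last row receives the full increment INC.
    Returns an (I * J)-by-K matrix with one row per locality (row-major order).
    """
    rng = np.random.default_rng(seed)
    y = rng.standard_normal((n_rows, n_cols, n_vars))   # eps_ijk ~ N(0, 1)
    i = np.arange(1, n_rows + 1)
    y[:, :, 0] += inc * (i / n_rows)[:, None]           # linear trend along the rows
    return y.reshape(n_rows * n_cols, n_vars)


y = trend_model(10, 10, inc=5.0, seed=0)   # a 10-by-10 grid, as in Table 4
```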
TABLE 4
Results of Simulation 4: The Spatial Structure of a Trend Model on a 10-by-10 Grid
              Spatial Ratio      Pearson Ratio      Moran’s I
Increment     Mean (%)   SE      Mean (%)   SE      Mean    SE
2. Accuracy
To assess the multivariate resolution of this method, two more simulated data sets
were created. These data sets have spatial patterns for more than one variable. The
sampling design is shown in Figure 1. The 36 localities are located on a square grid
and are divided into four groups: A, B, C, D. Each locality is assigned a random,
independent, normal (0,1) deviate for each of eight variables in each study. For the
first simulation in this set, an increment of 3.0 is added to all localities in sections A
and B for the first variable, and a like increment is added to all localities in sections
B and D for the second variable. Six additional spatially random variables, equivalent
to an increment of 0.0 in all sections, are included (Figs. 2A-2H). I then calculate
the M matrix and extract the eigenstructure. Two components account for over 98
percent of the variance. The eigenvectors are rotated obliquely to simple structure
using the Harris and Kaiser (1964) criterion, and the standardized data are then
projected onto these axes. The results are shown in Figures 2I and 2J. Figure 2I
shows the contrast between AC and BD, as in variable 2. Figure 2J shows the
contrast between AB and CD as in variable 1. The input data structure is clearly
revealed by these analytic results. A similar analysis by PCA with oblique rotation
of the first 2 components (Figs. 2K, 2L) reveals no discernible geographic patterns
and the first two components explain only 42 percent of the variance. (Since the
locality scores are projections of data observations onto eigenvectors, the definition
of positive and negative is arbitrary and can be reversed. It is the magnitude that is
of interest.)
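A hypothetical reconstruction of this first accuracy simulation is sketched below. The quadrant layout (A top left, B top right, C bottom left, D bottom right) is an assumption consistent with the AB-versus-CD and AC-versus-BD contrasts described, rook weights are an assumed weighting scheme, and the Harris-Kaiser oblique rotation is not reproduced; the sketch only reports the share of summed absolute eigenvalues carried by the two leading components of M.

```python
import numpy as np


def quadrant_design(inc=3.0, n_vars=8, seed=None):
    """Hypothetical version of the Figure 1 design on a 6-by-6 grid.

    Assumed layout: A = top left, B = top right, C = bottom left, D = bottom right.
    Variable 1 gets +inc in A and B; variable 2 gets +inc in B and D; the rest are noise.
    """
    rng = np.random.default_rng(seed)
    rows, cols = np.divmod(np.arange(36), 6)
    y = rng.standard_normal((36, n_vars))       # N(0, 1) deviates
    y[rows < 3, 0] += inc                       # sections A and B
    y[cols >= 3, 1] += inc                      # sections B and D
    return y, np.column_stack([rows, cols])


def first_two_share(y, coords):
    """Share of summed |eigenvalues| of M carried by the two largest components,
    using rook (edge-sharing) weights as an assumed weighting scheme."""
    d = np.abs(coords[:, None, :] - coords[None, :, :]).sum(axis=-1)
    w = (d == 1).astype(float)
    w /= w.sum()
    z = (y - y.mean(axis=0)) / y.std(axis=0)
    vals = np.abs(np.linalg.eigvalsh(z.T @ w @ z))
    return np.sort(vals)[::-1][:2].sum() / vals.sum()


y, coords = quadrant_design(seed=0)
print(first_two_share(y, coords))
```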
FIG. 1. The Sampling Design for the Simulation Experiments. There are 36 localities divided into 4
regions, A, B, C, and D.
[Figure 2, frame M axes: COMPONENT 1 (horizontal) and COMPONENT 2 (vertical), each from -1 to 1]
FIG. 2. Geographic Maps for the First Simulation. Frames A-H are input variables. Various values
are added to underlying random, N(0,1) deviates as described in the text. Frames I, J are maps of
rotated MSC scores, frames K, L are maps of rotated PCA scores. Frame M is a plot of the component
loadings for PCA and for multivariate correlation on the first two component axes, after oblique rotation.
The uppercase letters are the PCA loadings and the lowercase letters are the loadings from M. The
letters correspond sequentially to the variables (i.e., A and a are the first variable, B and b are the
second variable, etc.).
Another way to look at these data is to plot on the same set of axes the loadings
on the first two components for each variable for PCA and for MSC. One can then
assess how the relative position of each variable in this space changes, based on the
spatial weighting. Variables with strong spatial structure should remain far away
from the origin, although their orientation may change. Variables with weak spatial
structure should end up closer to the origin.
A plot of this type, in which loadings from PCA and MSC have been rotated
obliquely, is shown in Figure 2M. The uppercase letters represent the PCA loadings
and the lowercase letters the loadings from MSC. The solid lines depict the change
in position of the variables from the PCA solution to that for MSC, that is, that due
to spatial weighting. In this case, only the first (A) and second (B) variables are far
away from the origin for MSC while most variables are far away from the origin for
PCA. As A and B have spatial structure, by design, while none of the other
variables do, this representation is consistent with what we know about the
variables and emphasizes the spatial pattern.
The next simulation introduces terms with trends rather than patches. Again, the
grid in Figure 1 is filled with independent, random, normal (0,1) deviates for each
variable. The first variable is incremented from left to right, by values ranging from
0.0 to 3.0 (Fig. 3A). The second variable is similarly incremented from front to back
(Fig. 3B), while the next six variables are left spatially random (Figs.
3C-3H). The data are analyzed as above and yield two components that account
for 91 percent of the variance. The first is a front-to-back contrast (Fig. 3I); the
second is a left-to-right contrast (Fig. 3J). The PCA results for the same data (Figs.
3K, 3L) do not show distinct geographic patterns. The PCA solution explains only
37 percent of the variance.
The plot of the PCA and MSC loadings (Fig. 3M) is similar to that
for the first simulation. The first two variables (A and B) maintain their importance
in both types of analysis, although their orientation switches, while the other
variables lose some of their importance (i.e., end up closer to the origin) in the
spatially weighted case. The component scores are more informative than the
loadings, but the loadings generally are consistent with our knowledge of the data.
Additional simulations were run for more complex patterns and the results were
consistent with those reported here. In summary, in all simulations MSC depicted
the geographic pattern that was put in. PCA was much less effective at describing
these patterns. Plots of the component loadings helped describe the way in which
MSC was sensitive to geographic pattern.
3. HLA Human Blood Group Data
The next test of the proposed methodology is to analyze a real rather than a
simulated data set. The data I have chosen are gene frequencies of 21 alleles of the
HLA-A and HLA-B human blood systems measured in 58 European and Near
Eastern populations (localities). The geographic patterns of these data have been
studied by Menozzi et al. (1978), Sokal and Menozzi (1982), and Wartenberg
(1985a). Blood type characteristics are indicative of a population’s origin and
heritage. Differences in blood types between populations dissipate through
interbreeding. The expressed goal of Menozzi et al. (1978) was to map synthetic
variables, statistical composites of genetic (blood type) variables, from which to
infer the evolutionary history of the populations studied. They constructed these
synthetic variables using PCA. Since genetic distance between populations should
be proportional to the time of separation and inversely proportional to the
intermigration between them, the history of geographic movement should be
revealed from the study of these maps (Cavalli-Sforza and Bodmer 1971; Menozzi
et al. 1978). Using PCA, Menozzi et al. were able to summarize over half of the
[Figure 3, frame M: horizontal axis COMPONENT 1, from -1 to 1]
FIG. 3. Geographic Maps for the Third Simulation. Frames A-H are input variables. Various values
are added to underlying random, N(0,1) deviates as described in the text. Frames I, J are maps of
rotated MSC scores, frames K, L are maps of rotated PCA scores. Frame M is a plot of the component
loadings as in Figure 2.
found by Menozzi et al. (1978). Sokal and Menozzi (1982) emphasized the
relationships of the parameters of the geographic patterns (similarities of the
correlograms of the variables) to parameters of the multivariate pattern (correlations
between the variables). Their conclusions about the migrational history of early
European populations were consistent with those of Menozzi et al. (1978).
Wartenberg (1985a) studied the same data subset as Sokal and Menozzi (1982)
and applied the method of canonical trend surface analysis (CTS). By constructing
variance-covariance matrices of the genetic variables (blood type alleles) and the
geographic variables (coordinates, their squares and cross products) and taking the
joint eigenstructure, he constructed maps of the overall geographic patterns. These,
too, were consistent with the earlier analyses. He also showed, however, that if
additional data without geographic pattern were included in the analysis, only CTS
would be able to recover the underlying geographic information.
Details of the data set used in this study are given in Sokal and Menozzi (1982).
A map of the localities is shown in Figure 4. The spatial correlation matrix is
calculated using inverse distance squared weighting, and the eigenstructure extracted
(Table 5). From consideration of a scree plot (Cattell 1978), I retain two components
as most important. They account for 80.6 percent of the variance. The next few
components account for patterns of lesser importance (corresponding to a second
scree) and the final components correspond to the error variance (the first scree). I
obliquely rotate the first two components to simple structure using the Harris-Kaiser
criterion (Harris and Kaiser 1964) and project the standardized data onto these
axes. The resulting locality scores are shown in Figures 5A and 5B.
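Inverse distance squared weighting can be sketched as follows, assuming planar (projected) coordinates and Euclidean distance; this passage does not say how distances between localities were measured. The weights are rescaled to sum to 1.0, matching the convention under which the diagonal of M equals Moran’s I. The function name is hypothetical.

```python
import numpy as np


def inverse_distance_squared_weights(coords):
    """w_ij = 1 / d_ij^2 for i != j (zero diagonal), rescaled so the weights sum to 1.0.

    Assumes distinct localities with planar coordinates.
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)              # squared Euclidean distances
    w = np.zeros_like(d2)
    off = ~np.eye(len(coords), dtype=bool)     # off-diagonal mask
    w[off] = 1.0 / d2[off]
    return w / w.sum()


w = inverse_distance_squared_weights([[0, 0], [1, 0], [0, 2]])
```

Squaring the distance makes the weights fall off quickly, so near neighbours dominate each locality’s contribution to M.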
FIG. 4. A Map of Europe and the Near East Showing the 58 Localities Where the HLA Blood Group
Gene Frequencies Were Sampled.
All maps of the HLA data in this paper were produced by means of the SYMAP
computer contouring program (Dougenik and Sheehan 1979) using a Lambert
azimuthal equal-area projection, centered at 0 degrees latitude and 7.5 degrees
east longitude. Areas on such a map are proportional to the corresponding true areas
on the spherical Earth.
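The equal-area property can be checked numerically. The sketch below uses the standard spherical forward formulas for the Lambert azimuthal equal-area projection (not SYMAP’s own code), centered at 0 degrees latitude and 7.5 degrees east by default, and a unit-radius sphere; the function names are hypothetical. `area_scale` compares the mapped area of a tiny latitude-longitude cell with its area on the sphere, $\cos\varphi\,d\varphi\,d\lambda$.

```python
import numpy as np


def lambert_azimuthal_equal_area(lat_deg, lon_deg, lat0_deg=0.0, lon0_deg=7.5, radius=1.0):
    """Forward spherical Lambert azimuthal equal-area projection (standard formulas).

    Defaults center the map at 0 degrees latitude, 7.5 degrees east longitude.
    """
    phi, lam = np.radians(lat_deg), np.radians(lon_deg)
    phi0, lam0 = np.radians(lat0_deg), np.radians(lon0_deg)
    dlam = lam - lam0
    cos_c = np.sin(phi0) * np.sin(phi) + np.cos(phi0) * np.cos(phi) * np.cos(dlam)
    k = np.sqrt(2.0 / (1.0 + cos_c))          # undefined only at the antipode of the center
    x = radius * k * np.cos(phi) * np.sin(dlam)
    y = radius * k * (np.cos(phi0) * np.sin(phi) - np.sin(phi0) * np.cos(phi) * np.cos(dlam))
    return x, y


def area_scale(lat_deg, lon_deg, h=1e-6):
    """Map area of a tiny latitude-longitude cell divided by its area on the unit sphere."""
    def project(phi, lam):                    # radians in, map coordinates out
        return np.array(lambert_azimuthal_equal_area(np.degrees(phi), np.degrees(lam)))

    phi, lam = np.radians(lat_deg), np.radians(lon_deg)
    d_dphi = (project(phi + h, lam) - project(phi - h, lam)) / (2 * h)
    d_dlam = (project(phi, lam + h) - project(phi, lam - h)) / (2 * h)
    jacobian = abs(d_dphi[0] * d_dlam[1] - d_dphi[1] * d_dlam[0])
    return jacobian / np.cos(phi)             # unit-sphere cell area is cos(phi) dphi dlam


print(area_scale(60.0, 30.0))    # close to 1.0, as expected for an equal-area projection
```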
A clear north-south pattern exists across the entire map of the first component,
with various aberrations toward the center. Highest values are noted in Scandinavia,
TABLE 5
Results of MSC on HLA Data Set I: Absolute Values of the Eigenvalues of the Spatial Correlation Matrix M

Component                  Percentage
Number       Eigenvalue    of Total
1            7.65          61.75
2            2.33          18.81
3            0.64           5.15
4            0.56           4.51
5            0.44           3.57
6            0.25           2.06
7            0.15           1.18
8            0.12           0.95
9            0.07           0.60
10           0.04           0.32
11           0.03           0.21
12           0.02           0.21
13           0.02           0.17
14           0.01           0.12
15           0.01           0.11
16           0.01           0.11
17           0.01           0.06
18           0.01           0.06
19           0.00           0.03
20           0.00           0.02
21           0.00           0.02
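The "Percentage of Total" column appears to be each absolute eigenvalue as a percentage of the summed absolute eigenvalues. Assuming that reading, the arithmetic below recomputes the column from the rounded eigenvalues in Table 5 (agreement is within about 0.1 percentage points, the residue of two-decimal rounding) and confirms that the first two components together account for the 80.6 percent quoted in the text.

```python
import numpy as np

# Values transcribed from Table 5 (eigenvalues are rounded to two decimals there)
eigenvalues = np.array([7.65, 2.33, 0.64, 0.56, 0.44, 0.25, 0.15, 0.12, 0.07, 0.04,
                        0.03, 0.02, 0.02, 0.01, 0.01, 0.01, 0.01, 0.01, 0.00, 0.00, 0.00])
table_pct = np.array([61.75, 18.81, 5.15, 4.51, 3.57, 2.06, 1.18, 0.95, 0.60, 0.32,
                      0.21, 0.21, 0.17, 0.12, 0.11, 0.11, 0.06, 0.06, 0.03, 0.02, 0.02])

# Percentage of total = |eigenvalue| / sum of |eigenvalues| * 100
pct = 100 * np.abs(eigenvalues) / np.abs(eigenvalues).sum()
print(np.round(pct[:2], 2))            # [61.84 18.84]
print(round(table_pct[:2].sum(), 1))   # 80.6
```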
FIG. 5. Contour Maps of the PCA and MSC Rotated Component Scores from HLA Blood Group
Data Set I Analysis. Frame A is the first MSC component, frame B the second MSC component, for the
data alone. Frame C is the first PCA component, frame D the second PCA component.
TABLE 6
Spearman’s Rank Correlations of Scores on the First Two MSC Components for HLA Data Set I with Each
Other, with Scores on the First Two MSC Components for HLA Data Set II, and with Component Scores on
the First Three PCA Components and the First Three CTS Components

                            HLA Data Set I
Method                 Component 1   Component 2
MSC-HLA I
  Component 2             0.669
MSC-HLA II
  Component 1             0.774         0.834
  Component 2             0.480         0.794
Orthogonal PCA
  Component 1             0.973         0.707
  Component 2             0.005        -0.296
  Component 3            -0.237        -0.717
CTS
  Surface 1               0.914         0.611
  Surface 2               0.194         0.423
  Surface 3              -0.268        -0.401
[Figure 6 panel titles: A) FACTOR 1, B) FACTOR 2]
FIG. 6. Contour Maps of the PCA and MSC Rotated Component Scores from HLA Blood Group
Data Set II Analysis, with Random Noise. Frame A is the first MSC component, frame B the second
MSC component, for the data alone. Frame C is the first PCA component, frame D the second PCA
component.
are very similar to those discussed above. Most differences in the maps occurred in
areas of low data density, perhaps an artifact of the contouring program. The
correlations between these component scores and the corresponding scores from
data set I are all above 0.77 (Table 6). The technique is insensitive to spatially
unpatterned noise.
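The comparisons in Table 6 use Spearman’s rank correlation between two sets of locality scores. A minimal sketch, assuming no tied scores (reasonable for continuous component scores), computes it as the Pearson correlation of the ranks; the name `spearman_rho` is hypothetical. Because component signs are arbitrary, the magnitude of each correlation is what matters.

```python
import numpy as np


def spearman_rho(a, b):
    """Spearman's rank correlation: Pearson correlation of the ranks.

    Assumes no tied values, which is reasonable for continuous component scores.
    """
    rank_a = np.argsort(np.argsort(a)).astype(float)   # 0-based ranks
    rank_b = np.argsort(np.argsort(b)).astype(float)
    return np.corrcoef(rank_a, rank_b)[0, 1]


print(spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))   # approximately 0.8
```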
The PCA results for the HLA data set II (Figs. 6C, 6D) are not as resistant to
geographically random information. The first component is still similar to that
obtained with HLA data set I, but the second component is quite different.
4. Foraminifera Data
The final data set I analyze is a set of species abundances of 26 species of
Foraminifera sampled from the sediment core tops at 61 locations (Fig. 7) throughout
the Atlantic and Indian Oceans collected by Imbrie and Kipp (1971). The goal of
their original study was to derive statistically independent assemblages of species
that could be used in multiple regression analysis for paleoecological reconstruction
of climate. They discussed the geographic distribution of the species and argued
that components derived by PCA would be geographically coherent. They mapped
the component loadings (from a Q-mode analysis) which showed patterns
corresponding to the basic climatic regimes (i.e., polar, subpolar, subtropical, and
tropical) and circulation patterns (i.e., gyre margins, transitional zones) of the
oceans (see also Kipp 1976; Wartenberg 198%).
FIG. 7. A Map of the Atlantic and Indian Oceans Showing the 61 Localities Where Core Tops Were
Taken by Imbrie and Kipp (1971) and the Foraminifera Identified.
These data have also been examined by CTS (Wartenberg 1985a). The CTS
results showed a regional structure in the first and second components that was
similar to that summarized by the first four PCA axes. The next three CTS axes
showed more detailed geographic structure that is coincident with circulation and
biological production patterns.
Again, the patterns depicted by MSC are somewhat different from those derived
with other methods, although the broader features are recovered similarly in all
techniques. Two factors recover 69 percent of the variance. The subsequent 8
factors, the second scree, also appear to be indicative of pattern, but to a much
lesser degree. The rest of the higher-order components, the first scree, seem
unimportant. For simplicity, I will concentrate on the first two factors.
The first factor (Fig. 8A) shows a latitudinal zonation and is most highly
concentrated in the trade wind region of the North Atlantic Ocean. The pattern
falls off to the north and south, but there is a brief rise in the South Atlantic and
Indian Oceans, also in the trade wind region. The contrast between regions
corresponds well with climatic zones, as was depicted by the other methods of
analysis, but the pattern in the trade wind regions is strongest. The MSC surfaces
show finer-scale detail, with greater geographic relief, than the smooth surfaces
produced by CTS.
[Figure 8 panel titles: A) FACTOR 1, B) FACTOR 2]
FIG. 8. Contour Maps of the PCA and MSC Rotated Component Scores for the Foraminifera Species
Abundance Data from the Atlantic and Indian Oceans. Frame A is the first MSC component, frame B
the second MSC component. Frame C is the first PCA component, and frame D is the second PCA
component.
The second factor (Fig. 8B) has its highest values in the equatorial region of the
oceans. The values fall off towards the poles, with intermediate values (i.e., those
close to 0) in the trade wind regions. The variation in these intermediate areas was
described by the first component. There is also a suggestion of pattern corresponding
to the strong coastal margin currents along the eastern United States, western
Africa, and southeastern Africa.
Maps of the obliquely rotated PCA components show a fairly similar pattern
(Figs. 8C, 8D). As with the HLA data, since I have picked a data set with strong
geographic pattern, most analytic techniques will depict the same geographic
pattern. In general, the pattern is that of the climatic zonation noted in the other
analyses.
The joint picture from these two components (either the MSC or PCA) seems to
be a composite of those represented by other (orthogonal) methods. Much of the
information isolated into separate components by orthogonal PCA or separate
surfaces by CTS is merged into a more highly patterned representation. An
oblique representation is more complicated to interpret, but more economical in
depiction. Rank correlations of the component scores from different methods are
given in Table 7.
TABLE 7
Spearman's Rank Correlations between Scores on Components from MSC, PCA, and CTS for the
Foraminifera Data

                                Foram Data
Method / Component      MSC Component 1   MSC Component 2
MSC
  Component 2                0.209
Orthogonal PCA
  Component 1                0.798             0.657
  Component 2               -0.346             0.778
  Component 3               -0.272             0.149
CTS
  Surface 1                  0.550            -0.697
  Surface 2                  0.023            -0.534
  Surface 3                  0.108             0.290
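Spearman's coefficient, as used in Table 7, is the Pearson correlation of the ranks of two sets of component scores. The following minimal NumPy sketch assumes untied scores, so the classic 1 - 6 Σd² / (n(n² - 1)) formula applies; the function name spearman_rho and the five score values are hypothetical illustrations, not the foraminifera data.

```python
import numpy as np

def spearman_rho(a, b):
    """Spearman's rank correlation for two score vectors, assuming no ties."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # argsort of argsort gives 0-based ranks when all values are distinct
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    n = len(a)
    d = ra - rb
    # classic closed form, valid only in the absence of ties
    return 1.0 - 6.0 * np.sum(d**2) / (n * (n**2 - 1))

# hypothetical component scores at five sampling sites
msc_scores = [0.9, 0.1, -0.4, 1.3, -1.0]
pca_scores = [0.7, -0.2, 0.1, 1.1, -0.8]
print(spearman_rho(msc_scores, pca_scores))
```

With tied scores, average ranks would be needed instead; scipy.stats.spearmanr handles that case.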
APPENDIX
The Relation between Ordinary Least Squares, Generalized Least Squares, Principal
Components Analysis, and Multivariate Spatial Correlation
Consider the standard linear model
$$Y = X\beta + \varepsilon, \tag{A1}$$
where $\varepsilon \sim N(0, \sigma^2 I)$.
In regression analysis by ordinary least squares methods (OLS), we estimate $\beta$ as
$$\hat\beta = (X^t X)^{-1} X^t Y. \tag{A2}$$
In situations where there is covariance among the error terms, the standard linear
model is
$$Y = X\beta + \varepsilon, \tag{A3}$$
where $\varepsilon \sim N(0, \sigma^2 V)$.
In this case, to estimate $\beta$, we use generalized least squares methods (GLS):
$$\hat\beta = (X^t V^{-1} X)^{-1} X^t V^{-1} Y. \tag{A4}$$
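To make the OLS and GLS estimators concrete, here is a minimal NumPy sketch on simulated data. It assumes a hypothetical first-order autoregressive error covariance V_ij = 0.6^|i-j| for sites along a transect (with σ² = 1); the helper names ols and gls are illustrative. It solves the normal equations with np.linalg.solve rather than forming the inverses explicitly, which is numerically safer but algebraically the same.

```python
import numpy as np

def ols(X, Y):
    """OLS estimate: solves (X^t X) beta = X^t Y."""
    return np.linalg.solve(X.T @ X, X.T @ Y)

def gls(X, Y, V):
    """GLS estimate: solves (X^t V^-1 X) beta = X^t V^-1 Y."""
    VinvX = np.linalg.solve(V, X)   # V^-1 X without an explicit inverse
    VinvY = np.linalg.solve(V, Y)   # V^-1 Y
    return np.linalg.solve(X.T @ VinvX, X.T @ VinvY)

rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept plus one predictor
beta_true = np.array([1.0, 2.0])

# hypothetical error covariance for sites along a transect: V_ij = 0.6^|i-j|
idx = np.arange(n)
V = 0.6 ** np.abs(idx[:, None] - idx[None, :])
eps = np.linalg.cholesky(V) @ rng.normal(size=n)       # errors with covariance V
Y = X @ beta_true + eps

print("OLS:", ols(X, Y))
print("GLS:", gls(X, Y, V))
```

Setting V = I in gls recovers the OLS estimate, which is the sense in which OLS is a special case of GLS.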
By analogy, for principal components analysis (PCA), we assume a standard model:
$$X = Z F^t + \varepsilon, \tag{A5}$$
where $\varepsilon \sim N(0, \sigma^2 I)$.
We estimate $F$ by taking the eigenstructure of $R$,
$$R = X^t X. \tag{A6}$$
For generalized principal components analysis (GPCA), I propose the following
model:
$$X = Z F^t + \varepsilon, \tag{A7}$$
where $\varepsilon \sim N(0, \sigma^2 V)$.
We can estimate $F$ by taking the eigenstructure of $M$,
$$M = X^t V^{-1} X. \tag{A8}$$
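The two eigenproblems can be sketched side by side. This minimal NumPy example assumes standardized columns and a hypothetical transect covariance V_ij = 0.5^|i-j|; the loadings F are taken as the eigenvectors of R for PCA and of M for GPCA. The helper names sorted_eigh, pca_matrix, and gpca_matrix are illustrative. np.linalg.eigh is used because both matrices are symmetric.

```python
import numpy as np

def sorted_eigh(S):
    """Eigenvalues in descending order, with matching eigenvectors, of a symmetric matrix."""
    evals, evecs = np.linalg.eigh(S)   # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]
    return evals[order], evecs[:, order]

def pca_matrix(X):
    """R = X^t X."""
    return X.T @ X

def gpca_matrix(X, V):
    """M = X^t V^-1 X; solve() avoids forming V^-1 explicitly."""
    M = X.T @ np.linalg.solve(V, X)
    return (M + M.T) / 2.0             # remove round-off asymmetry

rng = np.random.default_rng(1)
n, p = 40, 3
X = rng.normal(size=(n, p))
X[:, 1] += X[:, 0]                           # induce correlation between two variables
X = (X - X.mean(axis=0)) / X.std(axis=0)     # standardize each column

# hypothetical error covariance for sites along a transect
idx = np.arange(n)
V = 0.5 ** np.abs(idx[:, None] - idx[None, :])

evals_R, F_R = sorted_eigh(pca_matrix(X))     # PCA loadings
evals_M, F_M = sorted_eigh(gpca_matrix(X, V)) # GPCA loadings
print(np.round(evals_R, 2), np.round(evals_M, 2))
```

When V = I, M and R coincide, so GPCA reduces to ordinary PCA.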
Note that in this derivation we do not have an arbitrary weight matrix $W$ as in the
derivation of Moran's $I$. Rather, I use a weight matrix called $V^{-1}$, as it is typically
282 / Geographical Analysis
$$V^{-1} = (I - \rho C). \tag{A9}$$
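Then in GLS,
$$\begin{aligned}
\hat\beta &= \left(X^t (I - \rho C) X\right)^{-1} \left(X^t (I - \rho C) Y\right) \\
&= \left(X^t (I - \rho C) X\right)^{-1} \left(X^t Y - \rho X^t C Y\right).
\end{aligned} \tag{A10}$$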
After appropriate normalizations, the rightmost term of the equation is the OLS
solution term minus a term for spatial covariance. Similarly, in GPCA,
$$M = X^t (I - \rho C) X = X^t X - \rho X^t C X. \tag{A11}$$
The first term on the right of the equation is the PCA solution and the second is the
MSC solution. Further development of this approach will be presented elsewhere.
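Both decompositions can be checked numerically. This minimal NumPy sketch assumes a hypothetical transect of n = 30 sites with binary contiguity C between consecutive sites and a hypothetical ρ = 0.3, small enough that I - ρC stays positive definite (the largest eigenvalue of this C is below 2). It verifies that the two forms of the GLS estimate agree, and that after normalizing C to weights W = C / S0 summing to 1.0, the second term of the GPCA matrix is ρ S0 times the MSC matrix Z^t W Z. None of these values come from the paper's data.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 30, 3

# hypothetical binary contiguity C: each site is a neighbour of the next along a transect
C = np.zeros((n, n))
i = np.arange(n - 1)
C[i, i + 1] = 1.0
C[i + 1, i] = 1.0
rho = 0.3                       # hypothetical; keeps I - rho*C positive definite here
Vinv = np.eye(n) - rho * C      # V^-1 = I - rho*C

# GLS: the two forms of the estimate agree
Xr = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept plus one predictor
Y = Xr @ np.array([1.0, 2.0]) + rng.normal(size=n)
A = Xr.T @ Vinv @ Xr
beta_first = np.linalg.solve(A, Xr.T @ Vinv @ Y)
beta_split = np.linalg.solve(A, Xr.T @ Y - rho * (Xr.T @ C @ Y))  # OLS term minus spatial term

# GPCA: PCA term minus a multiple of the MSC term
X = rng.normal(size=(n, p)).cumsum(axis=0)        # autocorrelated series along the transect
Xs = (X - X.mean(axis=0)) / X.std(axis=0)          # standardized data (the Z of Z^t W Z)
M = Xs.T @ Vinv @ Xs                               # GPCA matrix with V^-1 = I - rho*C
R = Xs.T @ Xs                                      # PCA cross-product matrix
S0 = C.sum()                                       # total of the binary weights
W = C / S0                                         # weights normalized to sum to 1.0
M_msc = Xs.T @ W @ Xs                              # MSC matrix Z^t W Z

print(np.allclose(beta_first, beta_split))
print(np.allclose(M, R - rho * S0 * M_msc))
```

Both checks should print True, since each is an exact algebraic identity up to floating-point round-off.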
LITERATURE CITED
Cattell, R. B. (1978). The Scientific Use of Factor Analysis. New York: Plenum Press.
Cavalli-Sforza, L. L., and W. F. Bodmer (1971). The Genetics of Human Populations. San Francisco:
W. H. Freeman.
Cliff, A. D., and J. K. Ord (1981). Spatial Processes: Models and Applications. London: Pion.
Cliff, N., and D. J. Krus (1976). “Interpretation of Canonical Analysis: Rotated vs. Unrotated Solutions.”
Psychometrika, 41, 35-42.
Crain, I. K., and K. Bhattacharyya (1967). “Treatment of Non-Equispaced Two-Dimensional Data with
a Digital Computer.” Geoexploration, 5, 173-94.
Dougenik, J. A., and D. E. Sheehan (1979). SYMAP User’s Reference Manual. Version 5.20. Cambridge,
Mass.: Laboratory for Computer Graphics and Spatial Analysis, Harvard University Graduate School
of Design.
Geary, R. C. (1954). “The Contiguity Ratio and Statistical Mapping.” The Incorporated Statistician, 5,
115-45.
Griffith, D. A. (1978). “A Spatially Adjusted ANOVA Model.” Geographical Analysis, 10, 296-301.
Harris, C. W., and H. F. Kaiser (1964). “Oblique Factor Analytic Solutions by Orthogonal
Transformations.” Psychometrika, 29, 347-62.
Hubert, L. J., R. G. Golledge, and C. M. Costanzo (1981). “Generalized Procedures for Evaluating
Spatial Autocorrelation.” Geographical Analysis, 13, 224-33.
Imbrie, J., and N. G. Kipp (1971). “A New Micropaleontological Method for Quantitative
Paleoclimatology: Application to a Late Pleistocene Caribbean Core.” In The Late Cenozoic Glacial
Ages, edited by K. K. Turekian, pp. 71-181. New Haven: Yale University Press.
Kipp, N. G. (1976). “New Transfer Function for Estimating Past Sea-Surface Conditions from Sea-Bed
Distribution of Planktonic Foraminiferal Assemblages in the North Atlantic.” In Investigation of Late
Quaternary Paleoceanography and Paleoclimatology, edited by R. M. Cline and J. D. Hays, pp. 3-41.
Geol. Soc. Amer. Memoir 145.
Klauber, M. R. (1975). “Space-Time Clustering Tests for More than Two Samples.” Biometrics, 31,
719-26.
Lebart, L. (1969). “Analyse Statistique de la Contiguïté.” Publication Institut Statistique de