Pierre Goovaerts
BioMedware, Inc
All content following this page was uploaded by Pierre Goovaerts on 19 August 2017.
Sample Support
By Pierre Goovaerts
Keywords: aggregation, kriging, variogram, scaling, simulation, indicators
Abstract: Sample support generally refers to the length, area, or volume associated
with a measurement or observation. The term originates in the field of geostatistics,
primarily from mining applications. To appreciate the issues associated with this term, this
entry provides (i) a more complete definition of sample support as used in geostatistics;
(ii) a description of the problems that arise in working with data of differing supports;
(iii) a brief overview of the solutions to change of support problems.
Sample support generally refers to the length, area, or volume associated with a measurement or observa-
tion. The term originates in the field of Geostatistics, primarily from mining applications. To appreciate
the issues associated with this term, this entry provides
1. a more complete definition of sample support as used in geostatistics;
2. a description of the problems that arise in working with data of differing supports;
3. a brief overview of the solutions to change of support problems.
1 Definition
In many applications of spatial data analysis, the data are measurements recorded at distinct points in
space. In other applications, the spatial variable of interest is inherently associated with a unit that has area
or volume (e.g., the permeability of a rock, the ozone concentration in the air, and the reflectance value of a
pixel in satellite images). Spatial aggregation is sometimes necessary to create or utilize meaningful units
for analysis (e.g., exposure units for soil pollution, management units in agriculture) or to make inference
about a region of interest (e.g., the average temperature of a lake and the grade of a block of ore). The term
support has come to mean simply the size or volume associated with each data value, but the complete
specification of this term also includes the geometrical size, shape, and spatial orientation of the regions
associated with the measurements[1]. Changing the support of a variable (often by averaging or aggregation)
creates a new variable. This new variable is related to the original one but has different statistical
and spatial properties. The problem of how the spatial variation in one variable relates to that of another
variable derived from it is called the change of support problem.
Update based on original article by Pierre Goovaerts, Wiley StatsRef: Statistics Reference Online © 2014 John Wiley & Sons, Ltd.
Wiley StatsRef: Statistics Reference Online, © 2014–2016 John Wiley & Sons, Ltd. 1
This article is © 2016 John Wiley & Sons, Ltd.
DOI: 10.1002/9781118445112.stat07754.pub2
2 Change of Support
The change of support problem can be traced back to Krige’s “regression effect” and subsequent corrections
used in mining blocks of ore in the 1950s[2] . Simple examples and illustrations of the effect of support on
a regionalized variable can be found in Ref. 3, and more complete discussions can be found in Refs 4–8.
Gotway and Young[9] provide an excellent review and the most comprehensive discussion of the topic of
change of support. Consider the spatial process (see Spatial Processes)
$$\{Z(s) : s \in D \subset \mathbb{R}^d\} \tag{1}$$
where $Z(s)$ represents the value of a random variable at a known (point) location $s$ (see Point Processes,
Spatial). Suppose that instead of observing a realization of this process, we collect data
$Z(B_1), Z(B_2), \ldots, Z(B_n)$ where
$$Z(B_i) = \frac{1}{|B_i|} \int_{B_i} Z(s)\,ds \tag{2}$$
and $|B_i|$ is the volume of $B_i \subset D$, $i = 1, 2, \ldots, n$. In geostatistics, $B_i$ is called the support of
$Z(B_i)$. For $B \subset D$, the problem of drawing inference on $Z(B)$ from data $Z(B_1), Z(B_2), \ldots, Z(B_n)$
is called the change of support problem.
The change of support problem arises from the fact that the distribution of Z(B) is different from that of
Z(s). One reason for this is simply due to averaging. When the same set of data is averaged over increasingly
larger areal units, the variance of the data tends to decrease. This is a well-known inferential problem in
statistics that has been documented both with theoretical models[10,11] and in empirical studies[12 – 14] .
However, with spatial data, this decrease in variance is moderated by positive autocorrelation among the
original observations and exacerbated by negative autocorrelation. The aggregation process itself induces
positive spatial autocorrelation, particularly if it is based on overlapping units (e.g., moving averages).
Thus, the distribution of $Z(B)$ is not only smoother than that of $Z(s)$; its spatial variation (as measured by,
for example, the variogram) also differs. For the point-support field of Figure 1a, the experimental variogram
is computed as
$$\hat{\gamma}(\mathbf{h}) = \frac{1}{2N(\mathbf{h})} \sum_{j=1}^{N(\mathbf{h})} [Z(s_j) - Z(s_j + \mathbf{h})]^2 \tag{3}$$
where N(h) is the number of data pairs with locations separated by a vector h. After averaging to create
a field with lower spatial resolution and four times fewer values (Figure 1b), the resulting variogram is
computed as
$$\hat{\gamma}_B(\mathbf{h}) = \frac{1}{2N(\mathbf{h})} \sum_{j=1}^{N(\mathbf{h})} [Z(B_j) - Z(B_j + \mathbf{h})]^2 \tag{4}$$
This variogram has a lower sill reflecting the smaller variance of the aggregated data Z(Bi ) relative to the
point data Z(si ).
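The effect of averaging on the variance and on the variogram can be reproduced numerically. The following Python sketch is only an illustration in the spirit of Figure 1, not the computation behind that figure: the simulated field (a 3 × 3 moving average of white noise), the grid size, the seed, and the helper names `simulate_field`, `aggregate`, `variogram`, and `flatten` are all assumptions made for this example.

```python
import random
from statistics import pvariance


def simulate_field(n, seed=0):
    """Return an n x n field with positive autocorrelation.

    Assumption for illustration: a 3 x 3 moving average of Gaussian white
    noise, scaled so that each pixel has unit variance.
    """
    rng = random.Random(seed)
    noise = [[rng.gauss(0.0, 1.0) for _ in range(n + 2)] for _ in range(n + 2)]
    return [[sum(noise[r + dr][c + dc] for dr in range(3) for dc in range(3)) / 3.0
             for c in range(n)] for r in range(n)]


def aggregate(field, k=2):
    """Average non-overlapping k x k groups of pixels (Equation 2 on a grid)."""
    m = len(field) // k
    return [[sum(field[k * r + i][k * c + j] for i in range(k) for j in range(k)) / k ** 2
             for c in range(m)] for r in range(m)]


def variogram(field, lag):
    """Experimental variogram (Equations 3 and 4) for a horizontal lag in pixels."""
    sq = [(row[c] - row[c + lag]) ** 2 for row in field for c in range(len(row) - lag)]
    return sum(sq) / (2 * len(sq))


def flatten(field):
    """Return all pixel values of a field as a flat list."""
    return [v for row in field for v in row]


point = simulate_field(64)     # pixel size 1 x 1
block = aggregate(point, 2)    # pixel size 2 x 2, four times fewer values

var_point = pvariance(flatten(point))
var_block = pvariance(flatten(block))

# Compare the two variograms at the same physical distances:
# a lag of 2*h fine pixels corresponds to a lag of h coarse pixels.
for h in (1, 2, 3):
    print(2 * h, round(variogram(point, 2 * h), 3), round(variogram(block, h), 3))
print("variance:", round(var_point, 3), "->", round(var_block, 3))
```

Whatever the field, the variance of the 2 × 2 averages cannot exceed that of the pixels, because the total variance splits into a between-block part and a within-block part. The individual variogram values, in contrast, depend on the simulated sample.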
There is another aspect to the change of support problem that is equally important. It arises
from alternative formations of the areal units leading to differences in unit shape at the same or similar
scales[15,16]. Averaging over regions of different shapes is analogous to smoothing different combinations
of spatial neighbors. Depending on the similarity of the neighbors, different regional averages may have
different statistical properties. Simply said, $Z(B_i)$ and $Z(B_j)$ can have different distributions even if the
volume of $B_i$ is the same as that of $B_j$. Thus, the complete specification of sample support, which includes
the shape and orientation of the units, is an important consideration in change of support problems.
Numerous terms have been introduced to describe one or more facets of the change of support problem
and particular solutions to it: the ecological inference problem; the modifiable areal unit problem
(see Modifiable Areal Unit Problem (MAUP)); spatial data transformations; the scaling problem; inference
between incompatible zonal systems; block kriging; area-to-area, area-to-point, and area-and-point
kriging; pycnophylactic geographic interpolation; the polygonal overlay problem; areal interpolation;
inference with spatially misaligned data; contour reaggregation; and multiscale spatial modeling.
Figure 1. Numerical example illustrating the impact of change of support on the variogram. (a) Simulated
random field with pixel size 1 × 1. (b) Results of averaging groups of four pixels in image (a) to create
a new random field with pixel size 2 × 2. (c) Variograms of the two random fields. Averaging reduces the
variance of the data, leading to a lower sill of the variogram. Variograms can be derived from each other
through a regularization or deconvolution.
The moments of $Z(B)$ are related to the moments of $Z(s)$. Thus, if $E[Z(s)] = \mu$, then $E[Z(B)] = \mu$, and
$$C(B_i, B_j) = \operatorname{cov}[Z(B_i), Z(B_j)] = \frac{1}{|B_i||B_j|} \int_{B_j} \int_{B_i} C(u, v)\,du\,dv \tag{5}$$
where $C(u, v) = \operatorname{cov}[Z(u), Z(v)]$, $u, v \in D$[5,6]. Note that the behavior of this covariance depends not only on
the point-support covariance but also on the specific blocks (and not just their volumes) being considered.
In practice, the integral in Equation (5) is computed by discretizing $B_i$ and $B_j$ into $N$ points, $\{u'_i\}$ and $\{v'_j\}$,
$$C(B_i, B_j) \approx \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} C(u'_i, v'_j) \tag{6}$$
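The discretized approximation of Equation (6) is short enough to sketch directly. The Python example below is a minimal illustration under stated assumptions: the exponential point-support covariance, its parameters, the rectangular blocks, and the helper names `exp_cov`, `discretize`, and `block_cov` are hypothetical choices, not part of the original entry. It compares two blocks of equal area but different shape.

```python
import math


def exp_cov(u, v, sill=1.0, a=1.0):
    """Assumed isotropic exponential point-support covariance C(u, v)."""
    return sill * math.exp(-math.dist(u, v) / a)


def discretize(x0, y0, width, height, nx, ny):
    """Centres of an nx x ny grid of cells covering a rectangular block."""
    return [(x0 + (i + 0.5) * width / nx, y0 + (j + 0.5) * height / ny)
            for i in range(nx) for j in range(ny)]


def block_cov(pts_i, pts_j, cov=exp_cov):
    """Equation (6): mean of C(u, v) over all pairs of discretizing points."""
    total = sum(cov(u, v) for u in pts_i for v in pts_j)
    return total / (len(pts_i) * len(pts_j))


# Two blocks with the same area (4 units) and the same number of points (16)
square = discretize(0.0, 0.0, 2.0, 2.0, 4, 4)   # 2 x 2 block
strip = discretize(0.0, 0.0, 4.0, 1.0, 8, 2)    # 4 x 1 block

c_square = block_cov(square, square)   # variance of Z(B) for the square
c_strip = block_cov(strip, strip)      # variance of Z(B) for the strip
print(round(c_square, 4), round(c_strip, 4))
```

Under this covariance model the elongated block has the smaller block variance, since its points are on average farther apart. This is one concrete instance of the dependence on the specific blocks and not just on their volumes. With a single discretizing point (N = 1), `block_cov` reduces to the point-support covariance.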
The relationship between the “block-support” covariance (i.e., that pertaining to support B) and the
“point-support” covariance (i.e., that pertaining to support s) is a fundamental aspect of solutions to change
of support problems. This relationship is illustrated for variograms in Figure 1c. Following Ref. 6, the point-
support and block-support variograms are theoretically related by the general formula
$$\gamma_B(\mathbf{h}) = \gamma(\mathbf{h}) - \gamma(B, B) \tag{7}$$
This relationship is based on the assumption that all the blocks have the same size and shape; hence the
within-block variogram 𝛾(B, B) is a constant. By analogy with Equation (6), the within-block variogram
value represents the mean value of the point-support variogram computed between any two arbitrary
points within block B. It is commonly approximated[3] using the following arithmetical average
$$\gamma(B, B) \approx \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \gamma(u'_i, v'_j) \tag{8}$$
Deriving the block-support variogram from the point-support model is called regularization, while the
reverse operation is known as deconvolution.
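Regularization with Equations (7) and (8) can be sketched in a few lines of Python. The exponential point-support variogram, the square block, the 4 × 4 discretization, and the function names `gamma_point`, `within_block_gamma`, and `gamma_block` are assumptions made for this illustration. Equation (7) is evaluated here only at lags larger than the block side, because at very short lags the difference can turn negative.

```python
import math


def gamma_point(h, sill=1.0, a=1.0):
    """Assumed exponential point-support variogram model."""
    return sill * (1.0 - math.exp(-h / a))


def within_block_gamma(side, n=4):
    """Equation (8): mean of gamma over all pairs of the n*n points
    discretizing a square block of the given side length."""
    step = side / n
    pts = [((i + 0.5) * step, (j + 0.5) * step) for i in range(n) for j in range(n)]
    total = sum(gamma_point(math.dist(u, v)) for u in pts for v in pts)
    return total / len(pts) ** 2


def gamma_block(h, side, n=4):
    """Equation (7): block-support variogram for blocks of equal size and shape."""
    return gamma_point(h) - within_block_gamma(side, n)


side = 2.0
g_bb = within_block_gamma(side)   # constant subtracted at every lag
for h in (4.0, 6.0, 8.0):         # lags well beyond the block side
    print(h, round(gamma_point(h), 3), round(gamma_block(h, side), 3))
```

Because the within-block term is a constant, the regularized variogram is the point-support variogram shifted downward, which is the lower sill seen in Figure 1c.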
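The linear predictor of $Z(B)$ from the data $Z(B_1), \ldots, Z(B_n)$ takes the form
$$\hat{Z}(B) = \sum_{i=1}^{n} \lambda_i Z(B_i) \tag{9}$$
where the weights $\{\lambda_i\}$ are chosen to minimize mean-squared prediction error subject to unbiasedness.
Quantity (9) is also known as the area-to-area (ATA) kriging predictor[17] and the operation of deriving the
value of $Z(B)$ from overlapping blocks $B_i$ is called side-scaling[18]. The optimal weights are obtained by solving
the equations
$$\begin{aligned}
\sum_{k=1}^{n} \lambda_k C(B_i, B_k) + m &= C(B_i, B), \quad i = 1, \ldots, n \\
\sum_{i=1}^{n} \lambda_i &= 1
\end{aligned} \tag{10}$$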
where $m$ is a Lagrange multiplier (see Lagrange Multipliers, Method of) from the constrained minimization,
and the necessary covariances are specified in Equation (6)[4,6]. The prediction mean-squared errors
are analogous to those associated with the (point) kriging predictor and can be found in Ref. 4. When the
data are of point support (i.e., $B_i = s_i$), this predictor is called the block kriging predictor[19] and the
operation is known as aggregation or upscaling. Conversely, when the prediction support is a point (i.e., $B = s$),
the operation is disaggregation or downscaling and this predictor is called the area-to-point (ATP) kriging
predictor[7,17,20,21]. The application of ATP kriging to regular blocks (e.g., pixels of an image) and to blocks of
irregular size and shape is illustrated in Figures 2 and 3.
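The kriging system (10) can be assembled and solved with a few helper functions. The sketch below is a minimal Python illustration, not a production implementation: the exponential covariance and its range, the square blocks, the data values, and the names `cov`, `square`, `block_cov`, `solve`, and `kriging_weights` are all assumptions introduced for this example. Block covariances come from the discretization of Equation (6), and a point target is treated as a support with a single discretizing point.

```python
import math


def cov(u, v, sill=1.0, a=2.0):
    """Assumed exponential point-support covariance."""
    return sill * math.exp(-math.dist(u, v) / a)


def square(x0, y0, side=1.0, n=4):
    """Discretize a square block into n*n points (cell centres)."""
    step = side / n
    return [(x0 + (i + 0.5) * step, y0 + (j + 0.5) * step)
            for i in range(n) for j in range(n)]


def block_cov(p, q):
    """Equation (6): average point covariance between two discretized supports."""
    return sum(cov(u, v) for u in p for v in q) / (len(p) * len(q))


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def kriging_weights(data_supports, target):
    """Build and solve system (10); return (weights, Lagrange multiplier m)."""
    n = len(data_supports)
    lhs = [[block_cov(bi, bk) for bk in data_supports] + [1.0] for bi in data_supports]
    lhs.append([1.0] * n + [0.0])                 # unbiasedness constraint
    rhs = [block_cov(bi, target) for bi in data_supports] + [1.0]
    sol = solve(lhs, rhs)
    return sol[:n], sol[n]


data_blocks = [square(0.0, 0.0), square(2.0, 0.0)]
values = [10.0, 20.0]

# Area-to-area prediction for a block lying midway between the two data blocks
weights, m = kriging_weights(data_blocks, square(1.0, 0.0))
z_block = sum(w * z for w, z in zip(weights, values))

# Area-to-point prediction: the target support is a single point (N = 1)
w_point, _ = kriging_weights(data_blocks, [(0.5, 0.5)])
z_point = sum(w * z for w, z in zip(w_point, values))
print(weights, z_block, w_point, z_point)
```

In this symmetric layout the two data blocks receive equal weight for the midway block, whereas the point at the centre of the first block gives that block the larger weight.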
Recently, the kriging predictor has been generalized to combine point-support data (e.g., soil samples)
with areal data (e.g., soil mapping units), leading to the so-called area-and-point kriging predictor[22] :
$$\hat{Z}(s) = \sum_{i=1}^{n} \lambda_i Z(B_i) + \sum_{j=1}^{n'} \lambda_j Z(s_j) \tag{11}$$
Figure 2. Numerical example illustrating the disaggregation of data with regular spatial support using
area-to-point kriging. (a) Simulated random field with coarse square pixels (11 × 10 image). (b) Values
estimated over pixels tenfold smaller using area-to-point kriging.
The kriging weights are the solution of a system of equations similar to expression (10):
$$\begin{aligned}
\sum_{k=1}^{n} \lambda_k C(B_i, B_k) + \sum_{k'=1}^{n'} \lambda_{k'} C(B_i, s_{k'}) + m &= C(B_i, s), \quad i = 1, \ldots, n \\
\sum_{k=1}^{n} \lambda_k C(s_j, B_k) + \sum_{k'=1}^{n'} \lambda_{k'} C(s_j, s_{k'}) + m &= C(s_j, s), \quad j = 1, \ldots, n' \\
\sum_{i=1}^{n} \lambda_i + \sum_{j=1}^{n'} \lambda_j &= 1
\end{aligned} \tag{12}$$
In addition to block-to-block covariances C(Bi , Bk ), solving system (Equation 12) requires block-to-point
covariances C(Bi , s) and point-to-point covariances C(sj , s), which can be computed using Equation (6)
where N = 1 for the point support.
To obtain the block-support covariances needed in Equations (10) and (12), $C(u, v)$ must be known or
estimated from available point-support data. When point-support data are not available, a parametric
model for $C(u, v)$ can be assumed, the parameters of which can be estimated by equating the theoretical
moments given by Equation (5) to the empirical moments inferred from available data[5]. This procedure,
known as deconvolution in the geostatistical literature, was developed analytically for regular
blocks in the 1950s (Equation 7), as this is the decision support typically used for mining operations.
The problem is more complex for blocks of different sizes and shapes. It has been tackled by several
approaches that include solving a set of integral equations[23], the simultaneous estimation of mean and
covariance functions from aggregated data using generalized estimating equations (see Generalized
Estimating Equations: Introduction)[18], and an iterative procedure that seeks the point-support model that
minimizes the difference between the theoretically regularized variogram model and the model fitted
to areal data[20]. Recent studies[24] have also stressed the uncertainty attached to the parameters of the
deconvoluted point-support models used in areal kriging, leading to the development of a Bayesian version
of area-to-point kriging.
Figure 3. Numerical example illustrating the disaggregation of data recorded over irregular spatial supports
(geological units) using area-to-point kriging. (a) Map of average topsoil concentration in cobalt
within geological units of varying size and shape (choropleth map). (b) Concentrations estimated over a
regular grid (isopleth map) using area-to-point kriging. Legend for both maps: [Co] (mg/kg), ten classes of
width 1.8 from 0.0 to 18.0.
An important property of the linear kriging predictor is its coherency (see Coherence – Basic)[17,20]. If
the same data are used to predict the value of $Z$ over the support $B$ and over the $N$ points discretizing $B$,
then the following equality between areal and point kriging estimates is satisfied:
$$\hat{Z}(B) = \frac{1}{N} \sum_{i=1}^{N} \hat{Z}(u'_i) \tag{13}$$
A similar relationship is found for the disaggregation of an areal datum $Z(B)$ into $N$ point-support
estimates using ATP kriging. In the example of Figure 2, the coherency property implies that the mean of 100
values estimated by ATP kriging within each 10 × 10 pixel equals the original value of the coarse pixel.
For the case of irregular spatial support, averaging the isopleth map of kriged concentration estimates
(Figure 3b) yields the choropleth map of areal data (Figure 3a) used as input to ATP kriging.
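The coherency property can be checked numerically on a small example. The Python sketch below works on a line rather than in the plane to stay short. The exponential covariance, the three data intervals, their values, and the helper names `cov`, `interval`, `mean_cov`, `solve`, and `predict` are assumptions made for illustration only. The property holds when the block covariances and the prediction points use the same discretization, as they do here.

```python
import math


def cov(u, v, a=3.0):
    """Assumed exponential point-support covariance on a line."""
    return math.exp(-abs(u - v) / a)


def interval(start, length, n=5):
    """Discretize the interval [start, start + length] into n midpoints."""
    return [start + (k + 0.5) * length / n for k in range(n)]


def mean_cov(p, q):
    """Equation (6): average covariance between two discretized supports."""
    return sum(cov(u, v) for u in p for v in q) / (len(p) * len(q))


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def predict(supports, values, target):
    """Kriging prediction (system 10) for a target support given as a list of points."""
    n = len(supports)
    lhs = [[mean_cov(bi, bk) for bk in supports] + [1.0] for bi in supports]
    lhs.append([1.0] * n + [0.0])
    rhs = [mean_cov(bi, target) for bi in supports] + [1.0]
    lam = solve(lhs, rhs)[:n]
    return sum(w * z for w, z in zip(lam, values))


blocks = [interval(0.0, 2.0), interval(2.0, 3.0), interval(5.0, 1.0)]
values = [4.0, 9.0, 6.0]

# Disaggregation: ATP estimates at the points discretizing the second data block
atp = [predict(blocks, values, [u]) for u in blocks[1]]
mean_atp = sum(atp) / len(atp)

# Equation (13) for a new block that is not one of the data supports
target = interval(1.0, 2.5)
z_block = predict(blocks, values, target)
z_points = [predict(blocks, values, [u]) for u in target]
mean_points = sum(z_points) / len(z_points)
print(mean_atp, z_block, mean_points)
```

Up to rounding error, `mean_atp` reproduces the areal datum of the second block, and `z_block` equals `mean_points`, because the right-hand side of the kriging system is linear in the target support.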
3.2 Nonlinear Geostatistics
In many applications, E[Z(B)|Z] is not linear in the data Z. In others, prediction of a nonlinear function of
Z(B) is of interest. These types of problems require more information about the conditional distribution
function of $Z(B)$, given the data, $F_B(z|Z) = \Pr[Z(B) \le z|Z]$, than that used for linear prediction. To solve the
change of support problem in nonlinear cases, we may be tempted to use the solutions of linear geostatistics
with indicator data $I(Z(s_1) \le z), I(Z(s_2) \le z), \ldots, I(Z(s_n) \le z)$, where
$$I(Z(s) \le z) = \begin{cases} 1, & \text{if } Z(s) \le z \\ 0, & \text{otherwise} \end{cases} \tag{14}$$
However, this approach yields a prediction of
$$I^*(B) = \frac{1}{|B|} \int_B I(Z(s) \le z)\,ds \tag{15}$$
which is the proportion of $B$ consisting of points where $Z(s)$ is at or below $z$. This quantity is not the
same[25] as
$$I(B) = \begin{cases} 1, & \text{if } Z(B) \le z \\ 0, & \text{otherwise} \end{cases} \tag{16}$$
which would provide an estimate of Pr[Z(B) ≤ z|Z], the probability that the average value of Z(s) over the
block B is at or below z. This latter quantity is the one of interest in change of support problems. The same
issue arises with any nonlinear function $g$ of $Z(s)$ because $g$ applied to the block average $Z(B)$ is
generally not the same as the block average of $g(Z(s))$. This is also true in the more general problem based
on data with supports $B_i$ that differ from support $B$.
With the exception of disjunctive kriging[26], direct estimation of block distributions in nonlinear cases
relies on Monte Carlo approximations[7,25,27,28]. In an approach suggested by Goovaerts[25], the block
is discretized and data $Z(u'_j)$ are simulated at each discretizing node (see Monte Carlo Simulation).
Simulated block values are then obtained by averaging the simulated values in the block and then
transformed into block indicator values using Equation (16). Finally, $\Pr[Z(B) \le z|Z]$ is estimated as the
average of these block indicator values. Such simulation approaches are similar to the Markov chain Monte
Carlo (see Markov Chain Monte Carlo (MCMC)) techniques used with Bayesian hierarchical models
(see Hierarchical Models – Theory) for nonlinear prediction and change of support problems[29,30].
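The simulation steps described above can be sketched as follows. This Python example is a deliberately simplified stand-in, not the method of Ref. 25: it uses unconditional simulation rather than simulation conditional on the data, and the lognormal model with equal correlation between nodes, the threshold, and the names `simulate_nodes` and `block_probabilities` are assumptions made for illustration. It contrasts the block indicator of Equation (16) with the proportion of Equation (15).

```python
import math
import random


def simulate_nodes(n_nodes, rho, rng):
    """One realization of Z at the discretizing nodes of a block.

    Assumption for illustration: lognormal values exp(Y), where the Gaussian
    Y has unit variance and the same correlation rho between any two nodes.
    """
    shared = rng.gauss(0.0, 1.0)
    return [math.exp(math.sqrt(rho) * shared
                     + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0))
            for _ in range(n_nodes)]


def block_probabilities(z, n_sim=2000, n_nodes=16, rho=0.5, seed=1):
    """Return (estimate of Pr[Z(B) <= z], mean proportion of nodes <= z)."""
    rng = random.Random(seed)
    block_hits = 0
    proportion_sum = 0.0
    for _ in range(n_sim):
        nodes = simulate_nodes(n_nodes, rho, rng)
        block_value = sum(nodes) / n_nodes          # simulated Z(B)
        if block_value <= z:                        # block indicator, Equation (16)
            block_hits += 1
        # proportion of the block at or below z, Equation (15)
        proportion_sum += sum(1 for v in nodes if v <= z) / n_nodes
    return block_hits / n_sim, proportion_sum / n_sim


# Threshold at the point-support median of the assumed lognormal model
p_block, p_prop = block_probabilities(z=1.0)
print(p_block, p_prop)
```

Under these assumptions the average proportion stays close to the point-support probability of 0.5, while the estimated probability for the block average is clearly lower, because averaging skewed values pulls the block value above the point median.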
The change of support problem is a difficult one, and sample support is still often overlooked in statistical
models and analyses. However, with the increased use of the internet and geographic information systems,
solutions afforded by geostatistical approaches and hierarchical models have greatly evolved these past few
years and should become routinely used once included in software packages, such as SpaceStat[31].
Related Articles
Spatial Covariance; Geostatistics, Model-Based; Natural Resources Modeling; Multivariate Kriging;
Kriging, Simple Indicator; Kriging, Asymptotic Theory; Kriging for Functional Data; Spatiotemporal
Risk Analysis; Spatial Risk Assessment; Spatial Distribution; Spatial Autocorrelation Coefficient,
Moran’s; Spatial Analysis in Ecology; S+SpatialStats; Point Processes, Spatial-Temporal.
References
[1] Olea, R.A. (ed.) (1991) Geostatistical Glossary and Multilingual Dictionary, Oxford University Press, New York.
[2] Krige, D.G. (1951) A statistical approach to some basic mine valuation problems on the Witwatersrand. J. Chem. Metal.
Mining Soc. South Africa, 52, 119–139.
[3] Armstrong, M. (1999) Basic Linear Geostatistics, Springer-Verlag, New York.
[4] Chiles, J.P. and Delfiner, P. (1999) Geostatistics: Modeling Spatial Uncertainty, John Wiley & Sons, Inc., New York.
[5] Cressie, N. (1993) Statistics for Spatial Data, John Wiley & Sons, Inc., New York.
[6] Journel, A.G. and Huijbregts, C.J. (1978) Mining Geostatistics, Academic Press, London.
[7] Zhang, J., Atkinson, P., and Goodchild, M.F. (2015) Scale in Spatial Information and Analysis, CRC Press, Boca Raton.
[8] Lloyd, C.D. (2014) Exploring Spatial Scale in Geography, John Wiley & Sons, Inc., New York.
[9] Gotway, C.A. and Young, L.J. (2002) Combining incompatible spatial data. J. Am. Stat. Assoc., 97, 632–648.
[10] Arbia, G. (1986) The modifiable areal unit problem and the spatial autocorrelation problem: towards a joint approach.
Metron, 44, 391–407.
[11] Cressie, N. (1996) Change of support and the modifiable areal unit problem. Geogr. Syst., 3, 159–180.
[12] Fairfield Smith, H. (1938) An empirical law describing heterogeneity in the yields of agricultural crops. J. Agric. Sci., 28,
1–23.
[13]
[14]
[15]
[16]
material. J. Am. Stat. Assoc. Suppl., 29, 169–170.
Speybroeck, N., Paraje, G., Prasad, A., et al. (2012) Inequality in human resources for health: measurement issues. Geogr.
[21] Kerry, R., Goovaerts, P., Rawlins, B.G., and Marchant, B.P. (2012) Disaggregation of legacy soil data using area to point
kriging for mapping soil organic carbon at the regional scale. Geoderma, 170, 347–358.
[22] Goovaerts, P. (2011) A coherent geostatistical approach for combining choropleth map and field data in the spatial
interpolation of soil properties. Eur. J. Soil Sci., 62 (3), 371–380.
[23] Mockus, A. (1998) Estimating dependencies from spatial averages. J. Comput. Graph. Stat., 7, 501–513.
[24] Truong, P.N., Heuvelink, G.B.M., and Pebesma, E. (2014) Bayesian area-to-point kriging using expert knowledge as
informative priors. Int. J. Appl. Earth Obs. Geoinf., 30 (1), 128–138.
[25] Goovaerts, P. (1997) Geostatistics for Natural Resources Evaluation, Oxford University Press, New York.
[26] Matheron, G. (1976) A simple substitute for conditional expectation: the disjunctive kriging, in Advanced Geostatistics in
the Mining Industry (eds M. Guarascio, M. David, and C. Huijbregts), Reidel, Dordrecht, pp. 221–236.
[27] Verly, G. (1993) The multi-Gaussian approach and its applications to the estimation of local reserves. J. Int. Assoc. Mathem.
Geol., 15, 259–286.
[28] Goovaerts, P. and Glass, G. (2014) Geostatistical modeling of the spatial distribution of surface soil arsenic around a
smelter. J. Jpn. Soc. Soil Phys., 128, 5–10.
[29] Diggle, P.J., Tawn, J.A., and Moyeed, R.A. (1998) Model-based geostatistics (with discussion). Appl. Stat., 47, 229–350.
[30] Mugglin, A.S. and Carlin, B.P. (1998) Hierarchical modeling in geographic information systems: population interpolation
over incompatible zones. J. Agric. Biol. Environ. Stat., 3, 117–130.
[31] Jacquez, G.M., Goovaerts, P., Kaufmann, A., and Rommel, R. (2014) SpaceStat 4.0 User Manual: Software for the Space-Time