Spatial Modeling Principles in Earth Sciences
Zekai Şen
Prof. Zekai Şen
İstanbul Technical University
Civil Engineering Faculty
Campus Maslak
34469 Istanbul
Turkey
[email protected]
Earth sciences phenomena have evolved in time and space jointly, but in practical
applications their records are available as temporal records, spatial measurements,
or both. If the records are available only spatially, then they constitute one realization of the regionalized variable (ReV), which has geometric locations given as longitudes (or eastings) and latitudes (or northings), together with a record of the earth sciences phenomenon at the same location. Hence, in practical applications a set of triplicate values (longitude, latitude, record) provides one realization out of the many possible realizations of the
ReV concerned. The worth of data in earth sciences and geology is very high since
most of the interpretations and decisions are based on their qualitative and quantita-
tive information content. This information is hidden in representative field samples,
which are analyzed for the extraction of numerical or descriptive characteristics.
These characteristics are referred to as data. Data collection in earth sciences is
difficult, expensive, and requires special care for accurately representing the ge-
ological phenomenon. Since all the various parameters necessary for the description and modeling of the geological event, such as bearing capacity, effective strength, porosity, hydraulic conductivity, and chemical contents, are hidden within each sample, each sample individually represents a specific point in space and time. A change of location leads to another realization, different from the others but statistically indistinguishable from them. This property provides a common basis
for the development of convenient prediction models for the ReVs. In general, the
collection of methodologies for modeling such a set of triplicates falls within the
geostatistical domain, which has been in practical use for the past four decades.
Kriging is the methodology that is used invariably in earth sciences for the regional
(spatial) prediction of spatial variability. Prior to its application, a descriptive function of spatial variability, namely the change of the squared difference of the ReV values at two sites with the distance between them, is constructed under the name of the semivariogram (SV), which describes the spatial variation in a quantitative manner. This is rather similar to the covariance in the classical time series analysis of stochastic processes. Since its origin in the 1960s, a few other alternatives, such as the cumulative SV (CSV), the point CSV (PCSV), and spatial dependence functions based on these concepts, have been developed and applied in different aspects of earth sciences. Each one of these techniques is explained in this book, and their various uses in modeling several earth sciences events, in fields such as seismology, meteorology, and hydrology, are presented. Various alternatives of
the Kriging methodology are presented, and the necessary steps in their application are set out in a rather simple manner. Simple spatial variation prediction methodologies are also reviewed with reference to up-to-date literature, and their connections to the most advanced spatial modeling methodologies are explained through basic concepts.
Spatial simulation methodologies in earth sciences are necessary to explore in-
herent variabilities such as those in fracture frequencies, spacing, rock quality des-
ignation, grain size distribution, and many similar random behaviors of the rock
and porous medium. Innovative methodologies are presented for simulation purposes, together with convenient applications.
The purpose of the book is to provide a comprehensive presentation of up-to-date
models for utilization in earth sciences and their applications. Some portions of the
textbook will deal with the material already covered in different books and in recent
literature like the top journals related to various topics of earth sciences. However,
a significant part of the book consists of original techniques developed and presented in the open literature by the author. Additionally, many unique physical
approaches, field cases, and sample interpretations will be presented prior to the
application of different models.
I could not have completed this work without the love, patience, support, and as-
sistance of my wife, Fatma Şen. I also extend my appreciation for my stay at the Faculty of Earth Sciences, Hydrogeology Department, where I gained field experience over many years, and later for my stay with the Saudi Geological Survey (SGS), where every facility was put at my disposal for scientific achievements within different aspects of earth sciences, so that many parts of this book could be materialized. In the meantime, I am also grateful to the Istanbul Technical University for giving me every opportunity to work in different aspects of earth sciences, including meteorology, atmospheric sciences, hydrology, and similar topics.
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 General . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Earth Sciences Phenomena . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Variability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.4 Determinism Versus Uncertainty . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.5 Earth, Environment, and Atmospheric Researches . . . . . . . . . . . . . . . 16
1.6 Random Field (RF) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.7 Regionalized Variable (ReV) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343
Chapter 1
Introduction
Abstract Earth sciences events have spatial, temporal, and spatio-temporal vari-
abilities depending on the scale and purpose of the assessments. For instance, in
geology, geophysics, rock mechanics, and similar fields the spatial variability is
dominant, whereas in hydrology, meteorology, etc., aspects the temporal variability
has preference; but in general, irrespective of time scale, all of the earth sciences
events have spatio-temporal variability. Plate tectonics has temporal variations over millions of years, whereas spatial variability exists at any time instant. Hence,
variability is one of the major factors in the modeling of earth sciences events for
their future behavior predictions or estimations at any given location where there is
no sampling information. Variability implies similarity at some instances, and therefore similar representative patterns play a significant role in earth sciences modeling.
1.1 General
Earth sciences assessments have been moving from qualitative toward quantitative studies, and the objectivity increases with this shift toward quantitative interpretations.
Any geological phenomenon can be viewed initially without detailed information
to have the following general characteristics:
1. It does not change with time, or at least not within a human lifetime, and therefore geological variations have a spatial character; these variations can be
presented in the best possible way by convenient maps. For example, geological
description leads to lithological variation of different rock types. The simplest of
such descriptions is the spatial variation of three basic rock types in any study area.
For instance, in Fig. 1.1 the sedimentary, igneous, and metamorphic rock lithology
is shown. With this simplest classification of rocks, the researcher is not able to look for different possibilities such as ore reserves, water sources, oil field prospects, etc. Oil reserves cannot occur in igneous or metamorphic rock units and, therefore, he/she has to restrict attention to the areas of sedimentary rocks.
On the other hand, a more detailed geological map can be obtained for different
rock types as shown in Fig. 1.2 for the Arabian Peninsula where the subunits of three
basic rock types are shown. Now, it is possible to look for regions of groundwater,
which is in connection with the atmosphere, i.e., where rainfall leads to direct infiltration. This information implies shallow groundwater possibilities, and consequently Quaternary sediments (wadi alluvium of the present geological epoch) can be delimited on such a map. Each of the above-mentioned maps shows the spatial variation of geological phenomena.
2. Geological units are neither uniform, nor isotropic, nor homogeneous in their horizontal and vertical extents. It is obvious from the maps in Figs. 1.1 and 1.2 that in
any direction (NS, EW, etc.) the geological variation is not constant. Of course, the
vertical direction will include succession of different rock types and subunits, which
are referred to as stratigraphic variation. Such a stratigraphic section is shown in Fig. 1.3, where neither the thicknesses of the units nor their slopes are constant.
It is possible to conclude from these two points that the spatial geometry of geo-
logical phenomena is not definite and, furthermore, its description is not possible
with Euclidean geometry, which is based on lines, planes, and volumes of regular
shapes. However, in practical calculations, the geometric dimensions are simplified
Fig. 1.1 Three rock type map (igneous, metamorphic, and sedimentary units)
(Fig. 1.2 map legend remnants: Tb, basalt flow; Q, differentiated sandstone.)
4. Isotropy implies uniformity along any direction, i.e., directional property con-
stancy. The homogeneity means constancy of any property at each point. These
properties can be satisfied in the artificial material produced by man, but natural
material such as rocks and any natural phenomenon in hydrology and meteorology
1.2 Earth Sciences Phenomena 7
cannot have these properties in the absolute sense. However, provided that the directional or point-wise variations are not appreciably different from each other, the geological medium can be considered homogeneous and isotropic on the average. In this last sentence the word “average” is the most widely used parameter in quantitative descriptions, but there are many other averages used in earth sciences evaluations. If no further specification accompanies this word, the arithmetic average is implied. The arithmetic average does not attach any weight or priority to any point or direction, i.e., it is an equal-weight average.
5. It can be concluded from the previous points that spatial variations cannot be deterministic in the sense of isotropy and homogeneity; therefore, they must be considered non-deterministic, which implies uncertainty and in turn means that spatial assessments and calculations cannot be adopted as final crisp values. At this stage, rather than well-founded deterministic mathematical rules of calculation and evaluation, it is necessary to deal with spatial assessments and evaluations by uncertainty methods, namely probability, statistics, and stochastic processes.
6. Apart from the geometry and material features, the earth sciences event media also include, in a non-deterministic way, tectonic effects such as fissures, fractures, faults, folds, and chemical solution cavities, which appear rather randomly.
Figure 1.5 indicates the appearance of these features in some of the geological
media.
From these explanations, it is a scientific truth that the earth sciences phenom-
ena cannot be studied with deterministic methodologies for meaningful and useful
interpretations or applications. Non-deterministic, i.e., uncertainty, techniques such as probability, statistics, and stochastic methodologies are more suitable for the reflection of any spatial behavior.
1.3 Variability
Spatial and temporal variability are the common occurrences in nature. The patterns
caused by the spatial variability of a phenomenon occur at many scales, ranging
from simple chemical reactions between activator and inhibitor reagents to the large-
scale structure of the universe. For example, spiral wave patterns are well known
in chemical reactions, hurricanes, galaxies, and simulated predator–prey systems.
Because common mechanisms may be responsible for generating patterns among
diverse physical, chemical, and biological systems, new pattern formation phenom-
ena are potentially of great scientific interest. The mechanisms that cause patterns
to evolve, however, are not well understood. An understanding of the causes and
consequences of pattern formation in nature will increase our understanding of processes such as the succession, spread, and persistence of species, and their management. With the ability to detect and describe different patterns comes the power to
discover the determinants of patterns and the mechanisms that generate, maintain,
modify, and destroy those patterns.
Spatial variability from one site to another leads to the concept of regional vari-
ability within the area. This variability determines the regional behavior as well as
the predictability of the precipitation amounts on the basis of which the interpreta-
tions are derived, provided that suitable techniques and models are identified. For
spatial variability the classical time series techniques yield useful interpretations, but
for equal-distance sampling only. However, a great deal of progress has been made
in the adaptation of statistical techniques to unevenly sampled data (North et al.,
1982). These techniques do indeed yield useful information, which is significantly
different from the information obtained by the use of semivariogram technique in
geostatistics (Chapter 4). A regular scatter of sites might not provide as much regional information as irregular sites, since earth sciences agents and surface features are almost always heterogeneous and anisotropic. Consequently, the following significant questions remain unanswered so far in the literature on spatial (regional) earth sciences assessments:
1) How to quantify from irregular site measurements whether the regional distribu-
tion is homogeneous, continuous, dependent, etc.?
2) How to model the heterogeneity so as to represent continuous variability within
the area concerned?
3) How to construct maps concerning the regional variability such that the estimates
are unbiased?
Earth sciences variables in any area show considerable spatial and temporal variations. The variation is brought about by differences in the type and scale of development of event-producing processes and is also influenced strongly by local or regional factors such as topographic elevation and atmospheric conditions (Wilson and Atwater, 1972). In practice, however, the variation of an event is considered to be significantly site- or, at the very least, area-dependent. In addition,
for most areas in the world each individual measurement site is assumed to be
representative of a very considerable area around it. Logically, measurement at any
individual site will have an area of influence around it, but there is no physical or
data-based objective criterion for the definition of such an area (Chapters 2 and 4).
The assumption of representativeness over a very considerable area is a dangerous one, especially at short distances and for severe conditions. For instance, there is no guarantee
that point rainfall will in any way produce a reliable guide to the rainfall of imme-
diate surrounding areas (Summer, 1988).
The spatial distribution of any event has always been an important factor in many earth sciences analyses. Reliable estimations of average values are essential for
adequate planning. The same is true for water balance calculations, groundwater
recharge boundaries, urban drainage purposes, and for climatic conditions. Hence,
in practice the basic problem is that of point representativeness; a subsequent problem is how to derive pictures of spatial relationships that are reasonably close to reality. In overcoming this problem, it is important to determine the spatial pattern based on a sparse and uneven network of measuring stations.
Variability is a general term used for a fundamental property of all events, which boils down to the more quantitative term “difference.” The equivalence of these
two words at least in the implications provides a bridge between the philosophical
thinking and the rational quantitative modeling of social, economical, and natural
events. The word difference as an expression for variability gives rise to various
algorithms that are currently in use in many disciplines. For instance, a global numerical value cannot explain the internal or external features of the concerned event beyond its scale, but the comparison of any two values leads to additional and detailed information, such as the difference (with the dimension of the numerical values) and the unit difference (slope). There are many categories of variability, such as geometric, kinematic, and dynamic types. In addition, the variability may be in regular (certain) or irregular (uncertain) domains.
The geometrical similarity can be split into two types, namely size corresponding
to scale and shape.
In Fig. 1.6a, three shapes are exactly the same but their scales are different. How-
ever, they cannot be considered as having different variability from each other. Espe-
cially, if the last shape in Fig. 1.6a is further downscaled, it approaches a point.
Downscaling or upscaling each one of these shapes helps to convert each to others
easily, and therefore there is no real variability between them. The only variability
is within each shape, because each point on the figure has different coordinates or
slopes (tangents). The variability in shape is different from the scale type, because the overall appearance of different objects may look almost the same apart from local differences, as in Fig. 1.6b. Since these shapes are not exactly the same, there are
relative deviations between the two. For the relative difference either of the shapes
may be taken as the basis and the other is considered as deviating from the first
one. Hence, the differences are relative, not absolute. This implies the significance of relative variability. The geometric difference is referred to as spatial variability in this book.
Fig. 1.7 Kinematics variability and similarity (velocity vectors V1–V5)
After the foregoing logical explanations, the two most significant operations for variability measurement appear as difference and division. Furthermore, these operations can be applied between two points for a given ReV or between two ReVs at a single point. Hence, if there are n measurement or sampling points, there are n(n − 1)/2 distinct pairs for distance-based variability sampling. The variability of ReVs is most often represented by the arithmetic or, preferably, weighted-average procedure. This indicates that, for a multitude of point variabilities, a summation operation provides a general representative value for the regional variability of ReVs.
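As a small illustration of these counts and of a representative average, the following Python sketch (not from the book; the coordinates and values are invented) enumerates the n(n − 1)/2 point pairs and forms a weighted average of the point values:

```python
from itertools import combinations
from math import hypot

# Hypothetical sampling points as (easting, northing, record) triplicates.
points = [(0.0, 0.0, 12.1), (3.0, 4.0, 14.7), (6.0, 0.0, 11.3), (3.0, 1.0, 13.0)]

# n points yield n(n - 1)/2 distinct pairs for distance-based variability.
pairs = list(combinations(range(len(points)), 2))
print(len(pairs))  # 4 points -> 6 pairs

for i, j in pairs:
    e1, n1, z1 = points[i]
    e2, n2, z2 = points[j]
    distance = hypot(e2 - e1, n2 - n1)
    half_sq_diff = 0.5 * (z1 - z2) ** 2  # squared-difference measure of variability
    print(f"pair ({i},{j}): distance = {distance:.2f}, half squared difference = {half_sq_diff:.3f}")

# A weighted average (equal weights here) gives one representative
# regional value for the ReV.
weights = [1.0] * len(points)
regional = sum(w * z for w, (_, _, z) in zip(weights, points)) / sum(weights)
print(f"regional representative value: {regional:.2f}")
```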
In general, any field, office, or theoretical study relating to natural phenomena that occur within the lithosphere, hydrosphere, or atmosphere is referred to as an earth sciences event. For instance, the lithological compositions in terms of stratigraphy,
historical evolutions, structural and tectonic dynamism are under continuous spa-
tial and temporal changes, and therefore their analysis, control, and predictions are
quite difficult. The difficulty arises not only from the insufficiency of the techniques but also, inherently, from the random behavior of the earth sciences elements in an area. In earth sciences concerning lithology, there are no significant temporal changes within the average human life, but the lithological characteristics may change even over, say, 1 cm distances. For instance, the occurrence of fracture systems, including fracture number, direction, dip, length, aperture, frequency, etc., is very complicated to measure; consequently, right from the field studies, uncertainties arise either from human ignorance or, to a significant extent, from the inherent randomness that exists in nature. Another
example is the rate, time, and direction of individual pebble movement along a river,
and this phenomenon cannot be predicted with any reasonable degree of determin-
ism. It is possible to state different premises and then conclusions deduced logically
from a set of premises, but all these are in terms of linguistic variables and state-
ments. It is the role of quantification methods to translate these verbal statements
into mathematical equations or inequalities so as to come closer to final conclusions
or decisions. Idealization, assumption, and simplification in earth sciences are more difficult than in the atmospheric sciences, where the sky does not have, say, fractures or discontinuities as the subsurface does. Hence, in earth sciences more than any other
physical or atmospheric sciences, the element of randomness is embedded in the
field, office, and theoretical studies. Unfortunately, geologists worked for many years in a descriptive manner and did not employ uncertainty techniques such as probability and statistics in their quantitative studies until about 1960. From then onward, statistical methods, and specifically the statistics developed for geology under the name of geostatistics, started to be used in many ore reserve, groundwater, petroleum, and similar studies.
The movement of the pebbles, sand, silt, and clay particles may be described as ran-
dom within the limits of environmental parameters. This randomness of movement
is interpreted as a result of chance events inherent in the transporting medium and
the transported material.
Fig. 1.9 Diurnal temperature variation
ReV phenomena evolve with time and space, and their appreciation as to the
occurrence, magnitude, and location is possible by observations and still better by
measurements along the time and/or space reference systems. The basic information
about the phenomenon is obtained from the measurements. Any measurement can
be considered as the trace of the phenomenon at a given time and location. Hence,
it should be specified by time and location, but its magnitude is not in the hands of the researcher. Initial observation of the phenomenon leads to descriptive data in non-numerical form that cannot be evaluated with certainty.
The use of uncertainty techniques such as probability, statistics, and stochastic methods in earth, environmental, and atmospheric sciences has increased rapidly since the 1960s, and many researchers, students, and teachers seek more training in these disciplines in order to deal with uncertainty in a better quantitative way. Many professional journals, books, and technical reports in the earth, environmental, and atmospheric science studies include significant parts on uncertainty techniques for dealing with uncertain natural phenomena. Yet relatively few scientists and engineers in these disciplines have a strong background in mathematics, and the question is how they can obtain sufficient knowledge of uncertainty methods, including probability, statistics, and stochastic processes, for describing natural phenomena and appreciating the arguments that they must read and digest for successful applications in making predictions and interpretations.
The inability to predict specific events might stem from the fact that nature is intrinsically random; randomness is used in a statistical sense to describe any phenomenon that is not predictable with any degree of certainty for a specific event. Deterministic phenomena, on the contrary, are those in which the outcomes of individual events are predictable with complete certainty under any given set of circumstances, provided that the required initial conditions are known. In general, nature is considered as random. Consequently, randomness has been suggested as the ultimate and most profound physical concept of nature.
Moreover, it is almost trivially true to claim that classical approaches (analytical, empirical, etc.) do not form a deterministic theory, if the claim simply means that actual measurements confirm the predictions of the theory only approximately or only within certain statistically expressed limits. Any theory formulated in terms of magnitudes capable of mathematically continuous variation must, in the nature of the case, be statistical and not quite deterministic in this sense, for the numerical values of physical magnitudes (such as permeability) obtained by laboratory or field measurement never form a mathematically continuous series; any set of values so obtained will show some dispersion around the values calculated from theory.
Nevertheless, a theory is labeled properly as a deterministic one if analysis of its
internal structure shows that the theoretical state of a system at one instant logically
determines a unique state of that system for any other instant. In this sense, and
with respect to the theoretically defined mechanical states of systems, mechanics
is unquestionably a deterministic theory. Consequently, when a predictor is intro-
duced into a philosophical discussion of determinism, it is not a human being but
a “superhuman intelligence.” Human predictors cannot settle the issue of determin-
ism because they are unable to predict physical events no matter what the world is
really like.
The uncertainty in the earth and atmospheric knowledge arises out of the
conviction that earth and atmospheric generalizations are immensely complicated
instantiations of abstract, and often universal, physical laws. Earth and atmospheric
generalizations always contain the assumptions of boundary and initial condi-
tions. In a way, the uncertainty in the predictions arises from the researcher's inability to know the initial and boundary conditions exactly; these conditions cannot be controlled with certainty. On the assumptions of physical theory, earth
and atmospherically significant configurations are regarded as highly complex. This
is true whether or not the “world” is deterministic. Physical laws, which are not
formulated as universal statements, may impose uncertainty directly upon earth and
atmospheric events as in the case of inferences based on the principles of radioactive
disintegration.
Earth and atmospheric sciences deal with spatial and temporal behaviors of nat-
ural phenomena at every scale for the purpose of predicting the future replicas
of the similar phenomena, which help to make significant decisions in planning,
management, operation, and maintenance of natural occurrences related to social,
environmental, and engineering activities. Since none of these phenomena can be fully accounted for by measurements, which themselves involve uncertain behaviors, their analysis, control, and prediction need uncertainty techniques for significant achievements in estimating future characteristics. Many natural phenomena cannot be monitored
at desired instances of time and locations in space and such restrictive time and loca-
tion limitations bring additional irregularity in the measurements. Consequently, the
analysis, in addition to uncertain temporal and spatial occurrences, has the prob-
lem of sampling the natural phenomenon at irregular sites and times. For instance,
floods, earthquakes, car accidents, fracture occurrences are all among the irregu-
larly distributed temporal and spatial events. Uncertainty and irregularity are the
common properties of natural phenomena measurements in earth and atmospheric
research, but analytical solutions through numerical approximations all require regularly available initial and boundary conditions that cannot be obtained without laying out regular measurement sites or time instances. In an uncertain environment any cause
will be associated with different effects, each with different level of possibilities.
Herein possibility means some preference index for the occurrence of each effect.
The greater the possibility index the more frequent the event occurrence. Figure 1.10
indicates the deterministic and uncertain cause–environment–effect system in a sim-
plified manner. In the deterministic case any single cause leads to a certain single effect, whereas in the uncertain domain a single cause (effect) might be associated with several effects (causes).
Fig. 1.10 Cause, environment, and effect triplicate (input, geological environment, output block diagrams; companion diagrams show the earth science environment in filtration, identification, and prediction settings)
Recently, the scientific evolution of methodologies has shown that the more researchers try to clarify the boundaries of their domain of interest, the more those boundaries become blurred with other domains of research. For instance, a hydrogeologist tries to model groundwater pollution, one of the modern nuisances of humanity so
far as the water resources are concerned; he/she needs information about the geo-
logical environment of the aquifers, meteorological and atmospheric conditions for
the groundwater recharge, in addition to social and human settlement environmental
issues for the pollution sources. Hence, many philosophies, basic logical deductions, methodologies, and approaches are common to different disciplines, and data processing is among the most important common topics, since it includes the same methodologies applicable to a diversity of disciplines. The way that earth,
environmental, and atmospheric scientists frame their questions varies enormously,
but the solution algorithms may include the same or at least similar procedures.
Any natural phenomenon or its similitude occurs extensively over a region, and
therefore its recordings or observations at different locations pose questions such as: are there relationships between the phenomena at various locations? In
such a question, the time appears as if it is frozen and the phenomenon concerned
is investigated over the area with its behavioral occurrences between the locations.
Answer to this question may be provided descriptively in linguistic, subjective, and
vague terms, which may be understood even by non-specialists in the discipline.
However, their quantification necessitates objective, regionally applicable methodologies for the ReV, which is one of the purposes of this book.
Another question that may be stated right at the beginning of research in the earth, environmental, and atmospheric sciences is: are places different in terms of the phenomena present there? Such questions are the source of many researchers'
interest in the subject. Scientific treatment and interpretation of even error-laden
data lead to significant practical knowledge concerning the oceans and atmosphere.
It is the prime duty of the earth scientist to filter out the meaningful portions of the data and to model the error part as random.
with respect to a shift of the system of points (Chapter 4). The same random field is called statistically homogeneous and isotropic if, in addition to being homogeneous in the sense indicated above, the pdfs are invariant with respect to an arbitrary rotation of the system of points (as of a solid body) and to a mirror reflection of the system with respect to an arbitrary plane passing through the origin of the coordinate system. In other words, the statistical moments depend upon the configuration of the system of points for which they are formed, but not upon the position of that system in space.
values of atmospheric pressure and rainfall at sites between the recording stations,
he did not propose a definite procedure for the distance, i.e., radius of influence.
However, in mining geology rapid and effective solutions are needed because of the
enormous costs incurred, and therefore it was in mining geology that the advances
in spatial analysis were made.
References
Cressman, G. P., 1959. An operational objective analysis system. Mon. Wea. Rev. 87(10), 367–374.
David, M., 1977. Geostatistical Ore Reserve Estimation. Elsevier Scientific Publishing Company,
Amsterdam, 364 pp.
Delhomme J. P., 1979. Kriging in the hydrosciences. Adv. Water Resour. 1, 251.
Gandin, L. S., 1965. Objective Analysis, Lectures on Numerical Short-range Weather Prediction,
WMO, Regional Training Seminar, pp. 633–677.
Isaaks, E. H., and Srivastava, R. M., 1989. An Introduction to Applied Geostatistics. Oxford Uni-
versity Press, Oxford, 561 pp.
Kolmogorov, A. N., 1941. Interpolation und Extrapolation von stationären zufälligen Folgen. Bull. Acad. Sci. USSR, Ser. Math. 5, 3–14.
Matheron, G., 1963. Principles of geostatistics. Econ. Geology 58, 1246–1266.
North, G. R., Bell, T. L., and Cahalan, R. F., 1982. Sampling errors in the estimation of empirical orthogonal functions. Mon. Wea. Rev. 110, 699–706.
Popper, K., 2002. The Logic of Scientific Discovery. Routledge Publishing Company, London and
New York, 479 pp.
Summer, G., 1988. Precipitation Process and Analysis. John Wiley and Sons, New York, 455 pp.
Wilson, J. W., and Atwater, M. A., 1972. Storm rainfall variability over Connecticut. J. Geophys.
Res. 77(21), 3950–3956.
Chapter 2
Data Types and Logical Processing Methods
Abstract Not only numerical but also linguistic data are necessary in the modeling
of earth sciences events. Measurements are sources of numerical data whereas
observations lead to linguistic data. Numerical data include randomness and errors
but linguistic data are rather fuzzy, which means that there are uncertainties in both
data types. Accordingly, the final model results as predictions or estimations include
errors that must be confined within ±5% limits in practical applications. Spatial
estimations can be obtained either on point basis or on sub-areal basis depend-
ing on the refinement of the problem at hand and purpose. In general, longitude
(easting), latitude (northing), and regionalized variable (ReV) value at this location
are necessary for a complete description and establishment of a point-wise spatial
model, where these three values are referred to as triplicate; but in the case of pixel
location its size is also necessary, which leads to four variables (quadruple) for the
description of ReV. Simple classical triangularization, polygonalization techniques
are used in addition to innovative percentage polygon methodology. Droughts are a
kind of spatial earth sciences with coverage area that can be modeled by probabilis-
tic approaches.
2.1 General
Scientific and engineering solutions can be given about any earth sciences phe-
nomena through relevant spatial modeling techniques, provided that representa-
tive data are available. However, in many cases it is difficult and expensive to
collect the field data, and therefore it is necessary to make the best use of avail-
able linguistic information, knowledge, and numerical data to estimate the spatial
(regional) behavior of the event with relevant parameter estimations and suitable
models. Available data provide numerical information at a set of finite points, but
the professional must fill in the gaps using information, knowledge, and understand-
ing about the phenomena with expert views. Data are the treasure of knowledge
and information leading to meaningful interpretations. Observations are also a potential source of information, providing rational and logical linguistic expressions in the form of premises. Data imply numerical measurements made with different instruments either in the field or in the laboratory. Observations are not numerical but rather verbal data that assist in describing and identifying the phenomenon concerned.
The development of data estimation methods can be traced back to Gauss (1809),
who suggested the deterministic least-squares technique and employed
it in a relatively simple orbit measurement problem. The next significant contri-
bution to the extensive subject of estimation theory occurred more than 100 years
later when Fisher (1912), working with pdf, introduced the approach of maximum
likelihood estimation. Later, Wiener (1942, 1949) set forth a procedure for the frequency-domain design of statistically optimal filters. The technique addressed the continuous-time problem in terms of correlation functions and the continuous filter impulse response. However, the Wiener solution does not lend itself very well to the corresponding discrete-data problem, nor is it easily extended to more complicated time-variable, multiple-input/output problems. It was limited to statistically stationary processes and provided optimal estimates only in the steady-state regime. In the same time period, Kolmogorov (1941) treated the discrete-time problem.
In this chapter, observation and data types are explained and their preliminary
simple logical treatments for useful spatial information deductions are presented
and applied through examples.
2.2 Observations
Observations provide information on the phenomenon through the sense organs; they cannot provide numerical measurements, and their expressions are linguistic (verbal) descriptions. In any study, the collection of such information is unavoidable, and it is very precious in the construction of conceptions and models for the control of the phenomenon concerned. Observations may be expressed rather subjectively by different persons, but experts may deduce the best set of verbal information. Depending on personal experience and background, an observation may instigate a different conceptualization and impression in each person. In a way, observations provide subjective information about the behavior of the phenomenon concerned. In
some branches of scientific applications, observational descriptions are the only
source of data that help for future predictions. Even though observations may be
achieved through some instruments, as long as their description remains in verbal terms they are not numerical data. Observations were very significant in the early stages of scientific and technological development, especially before the 17th century, and they have become important again in modern times through linguistic
implications and logical deductions explaining the fundamentals of any natural or
man-made event (Zadeh, 1965). For instance, in general, geological description of
rocks can be made by field observations, and concise linguistic categorizations are then prepared for others to understand, again linguistically. It is important to stress at this point that linguistic expressions of observations help to categorize the event. Although such a categorization sets forth crisp and mutually exclusive classes according to classical Aristotelian logic, fuzzy logic classification with mutually inclusive classes has more recently been suggested by Zadeh (1973) and is being used in ever more disciplines at an increasing rate. In many disciplines observations are
extremely important such as in geological sciences, medicine, social studies, phys-
iology, military movements, economics, social sciences, meteorology, engineering,
etc. At times they are more valuable than numerical data but, unfortunately, their role is almost forgotten because recent instrumentation and software work primarily with numerical data.
Example 2.1
What type of observational information can be obtained when one takes a hand specimen from a rock? Even a non-specialist in geology can deduce the following basic linguistic information from observation and inspection of the specimen through a combined use of the sense organs.
1) Shape: Regular, irregular, round, spiky, elongated, flat, etc. This information
can be supported for detailed knowledge, with the addition of adjectives such as
“rather,” “quite,” “extremely,” “moderately.” Note that these words imply fuzzy
information.
2) Color: Any color can be attached to the whole specimen, or different colors to different parts. Detailed information can be provided again by fuzzy adjectives such as “light,” “dark,” “grayish.”
3) Texture: The words for the expression of this feature are “porous,” “fissured,”
“fractured,” “sandy,” “gravelly,” “silty,” etc.
4) Taste: The previous descriptions are through the eye but the tongue can also
provide information as “saline,” “sour,” “sweet,” “brackish,” and so on.
5) Weight: It is possible to judge the approximate weight of the specimen and describe it with feelings such as “light,” “heavy,” “medium,” “very heavy,” “floatable,” and likewise other descriptions can also be specified.
6) Hardness: The relative hardness of two minerals is defined by scratching each with the other and seeing which one is gouged. Hardness is expressed on an arbitrary scale of ten standard minerals, arranged in the Mohs hardness scale and numbered in order of increasing hardness from 1 to 10. The hardness scale provides guidance for the classification of a hand specimen according to Table 2.1, where the verbal information is converted to a scale through numbers.
Example 2.2
Earthquake effects on structures can be described according to the guidance of Table 2.2, which is referred to as the Mercalli scale; the table gives an abbreviated description of the 12 levels of Modified Mercalli intensity.
It is important to notice that the linguistic descriptions and scales are neither time- nor space-dependent but have an event basis. No reference to any coordinate system is required apart from logical rules.
Evolution of any event takes place both in time and in space but, depending on the practical purpose, it can be viewed either temporally, spatially, or spatio-temporally. Accordingly, instruments yield numerical data based on either a time or a space reference system. In this book, only spatial data interpretations and treatment processes are presented. It is assumed herein that the spatial phenomena continuously cover the whole study area. Since the most elemental part of space is a point
in earth sciences, the basic sampling locations are points that may be scattered in
the study area either regularly or irregularly. Theoretically, there is an infinite number of points, but sampling all of them is not practically conceivable. There are two ways of sampling in conceivable studies. These are as follows:
In general, any point or pixel is located by a system of longitudes and latitudes. Hence, the whole earth surface is covered by quadrangles, which may be regarded as large-scale pixels. In practice, for Cartesian distance, area, and volume calculation purposes, longitudes and latitudes are converted to “easting” and “northing” values with respect to an initial reference point (see Figs. 2.1 and 2.2). This means that the elements of spatial point data include three variables, namely the easting, e, the northing, n, and the spatial variable measured at this location, say z. In short, we can show
the point data as a triple, {e, n, z}. Likewise, in addition to these triple values any
pixel includes its resolution size, r, which can be represented by a quadruple, {e, n,
z, r}. Even though the pixel size is small, one can practically calculate the number of pixels necessary to cover a given area.
Example 2.3
If a study region has an area of 45 km² and the pixel dimension is 100 × 100 m, what is the number of pixels for the representation of the whole region? The number can be calculated as 45 × 10⁶ m²/10⁴ m² = 4,500 pixels. This simple calculation indicates that
each pixel has an area of influence defined by its square area. However, it is not
possible to make the same calculation for point data, since a point does not have an
area of influence by itself. However, it is possible to define the area of influence for
each point based on a set of neighboring points, as will be explained later in this
chapter.
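The triple and quadruple representations, and the pixel count above, can be sketched in a few lines of Python (the record types and their field names are our own illustration, not the book's notation):

```python
from typing import NamedTuple

class PointRecord(NamedTuple):
    easting: float    # e
    northing: float   # n
    value: float      # z, the ReV record at this location

class PixelRecord(NamedTuple):
    easting: float
    northing: float
    value: float
    resolution: float  # r, pixel side length in meters

def pixel_count(region_area_km2: float, pixel_side_m: float) -> int:
    """Number of pixels needed to cover a region (the Example 2.3 logic)."""
    region_m2 = region_area_km2 * 1e6
    return round(region_m2 / pixel_side_m**2)

p = PointRecord(easting=543_210.0, northing=2_412_000.0, value=857.0)  # a triple {e, n, z}
px = PixelRecord(*p, resolution=100.0)                                 # a quadruple {e, n, z, r}
print(px)
print(pixel_count(45.0, 100.0))  # -> 4500, as in Example 2.3
```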
Irregularity is not necessarily related to randomness and has its own connota-
tion in earth and atmospheric sciences. A very brief distinction between these two
terms is that, although randomness is inherent in the behavior of natural events beyond human control, irregularity stems from human involvement in the measurement and description or definition of natural events. For instance, the locations of meteorological stations or groundwater wells are irregularly scattered in an area or space, and consequently they do not comply with any regular or random pattern. Once the irregularity is established, it does not change easily with time and space unless there is further human intervention. Another good distinction in the earth and
atmospheric sciences between regularity and irregularity can be considered in the
solution of differential equations by numerical techniques. The finite element method requires a definite and regular mesh to be laid over the solution domain of interest (study area). Boundary and initial conditions must be defined at a
regular set of nodes. Various types of regularity are shown in Fig. 2.3. In practi-
cal studies, measurements as initial conditions are available at a set of irregularly
scattered station locations and hence there is not a desirable match between these
locations and consequent regular nodes.
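To make the mismatch concrete, here is a minimal sketch (our own illustration, with invented station coordinates) that snaps irregularly scattered stations to the nearest nodes of a regular mesh:

```python
import numpy as np

# Hypothetical, irregularly scattered station locations (easting, northing).
stations = np.array([[1.2, 3.7], [4.9, 0.8], [2.6, 2.1]])

# Regular mesh nodes on a 6 x 5 grid with unit spacing.
xs, ys = np.meshgrid(np.arange(6.0), np.arange(5.0))
nodes = np.column_stack([xs.ravel(), ys.ravel()])

# For each station, find the nearest regular node (Euclidean distance);
# the offsets quantify the station/node mismatch discussed above.
for s in stations:
    d = np.hypot(nodes[:, 0] - s[0], nodes[:, 1] - s[1])
    k = d.argmin()
    print(f"station {s} -> nearest node {nodes[k]}, offset {d[k]:.2f}")
```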
Fig. 2.3 Various types of regularity (panels a–d)
2.4 Sampling
Fig. 2.4 Spatial sampling patterns: (a), (b) regular; (c), (d) small-scale random within regular sub-areas; (e) large-scale random; (f) clustered
At smaller scales the researchers have to lay down the set of points so as to sample the concerned phenomenon in a representative manner. There are different techniques for deciding the positions of the sampling points. If nothing is known beforehand, then it may seem logical to select the sampling points at the nodes or centers of a suitable mesh over the study area. This is the most uniform sampling procedure, as shown in Fig. 2.4.
These sampling patterns can be divided into three different categories as the reg-
ular, random, and aggregated or clustered. Figure 2.4a and b are of regular sampling
procedures. In Fig. 2.4c and d, the randomness is in small scales and the random pat-
tern remains within the sub-areal regular grids. In Fig. 2.4b although each sub-area
is sampled, in Fig. 2.4d only randomly chosen sub-areas are sampled. In Fig. 2.4a
the maximum distance between two neighboring points cannot be greater than twice the sub-area diagonal length. In Fig. 2.4c the distance can be several times the main diagonal length of the sub-area. Large-scale random sampling patterns are
given in Fig. 2.4e, where there is no restriction on the distance between the sampling
points. In Fig. 2.4f there are three clusters of the spatial sampling, each with random
sampling patterns. Such categorized samplings are possible depending on the areal
occurrence of the phenomenon studied. For instance, if ore, water, or oil deposits are isolated from each other in three neighboring areas, then cluster sampling patterns arise.
Another feature of spatial sampling points is their uniformity concerning the frequency of occurrence per unit area. If the density defined in this manner is equal in each sub-area, then the spatial sampling is uniform; otherwise, it is non-uniform. This
definition implies that the regular sampling points in Fig. 2.4a and b, in addition to
small-scale random pattern in Fig. 2.4c, are all uniform because there is one point
per sub-area.
Uniformity gains significance if there are many sampling points within each sub-
area. In geological or meteorological studies, sub-areas are quadrangles between
two successive longitudes and latitudes. For instance, such a situation is shown
in Fig. 2.5 where each quadrangle has random number of random sampling
points. Hence, the question is whether the sampling point distribution is uniform
or not?
The pixels present a regular mesh over a region, which is conceptually similar
to numerical solution of analytical models. The difference is that in the case of
pixels the measurements (brightness values) are known, but this is not the case in
numerical analysis where the values either at the center of each cell or at each node
are necessary for calculation.
A set of point data on an event of interest from a region, such as the elevations at 25 different sites, can be presented in three columns, two of which are location descriptions while the third includes the regional variable, elevation in this case (see Table 2.3). This table is an example of the triple values mentioned before.
Fig. 2.5 Quadrangle sampling (randomly scattered points over an easting–northing quadrangle mesh)
Table 2.3 Sample point records (in two panels): point number, easting, northing, and elevation (m)
In this table, the scatter of sample points can be obtained from the second and
third columns, which appear as in Fig. 2.6. The first appearance indicates that the
sampling points are irregularly distributed in the study area. However, it is not pos-
sible to say whether their distribution is random or there is a systematic correlation.
This question will be answered in Chapter 3.
This may not be a representative sampling set for the study area because there
are no sampling points in the upper right and lower left parts of the area.
Although the spatial data are understood as the sampling of a variable with
respect to longitudes and latitudes, it is also possible to consider another two
variables instead of longitude and latitude. For instance, similar to Table 2.3, in Table 2.4 calcium, chloride, and sodium variables are given in three columns. These values are taken at a single point in space at a set of time instances.
Fig. 2.6 Irregular sample points scatter (easting vs. northing)
In this case there are three alternatives for data treatment purposes. The following
similar questions can be asked:
1) Does one want the change of chloride with respect to calcium and sodium? or
2) Does one want the change of calcium with respect to chloride and sodium? or
3) Does one want the change of sodium with respect to calcium and chloride?
Of course, each of these cases presents a different scatter diagram, as shown in Fig. 2.7. In each case, the third variable plays the role of elevation, as in the preparation of topographic maps.
In each case, the triplicate {e, n, z} has {e, n}, {e, z}, and {n, z} alternatives, which are referred to as scatter diagrams.
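A tiny Python sketch of these pairings (with invented concentration values standing in for Table 2.4) makes the three alternatives explicit:

```python
# Hypothetical (calcium, chloride, sodium) records in ppm.
records = [(45.0, 96.0, 30.0), (70.0, 120.0, 42.0), (110.0, 180.0, 55.0)]

ca, cl, na = zip(*records)
# The three scatter-diagram alternatives of a triplicate, as in Fig. 2.7.
pairings = {
    "calcium-chloride": list(zip(ca, cl)),
    "calcium-sodium": list(zip(ca, na)),
    "chloride-sodium": list(zip(cl, na)),
}
for name, pts in pairings.items():
    print(name, pts)
```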
In this book spatial data imply at the minimum the existence of three points within the study area, because three points define the simplest spatial form, a plane trend. However, it must not be understood that only triplets are necessary for spatial modeling. The practical question is how many data are necessary for spatial data modeling. There is no easy answer to such a question. The ready answers appearing in the open literature, such as 12 or 30 data points, are not logical, but they may provide vague and practical solutions depending on the circumstances. If the purpose is to check whether the data conform to the normal probability distribution, then 12 data may be enough. On the other hand, 30 is a number based on the empirical observation that, in the representation of average meteorological conditions, a 30-year period is commonly accepted.
Fig. 2.7 Sample points scatter diagrams: (a) calcium-sodium; (b) calcium-chloride; (c) chloride-sodium (all in ppm)
In general, statistical parameters from finite length records are biased estimations
of population (very long samples) counterparts. Estimates whose population values
(expectations) are not equal to the parameter they estimate are said to be biased.
This is due to the lack of complete information concerning the phenomenon and,
consequently, the parameter will be under- or over-estimated with respect to its true
population value. In practice, the true population parameters, which are independent
of sample length, are not known, but it is possible to estimate them from the avail-
able finite length records. As mentioned before, as a rule of thumb it is necessary to have at least 30 values in order to have normal parameter estimations. However, a 30-year period cannot be equally valid for all random or regionalized variables (ReVs). It is very much a function of the correlation structure; in short, the greater the persistence (correlation), the smaller the necessary number of data. In practical applications, it is of great interest to know the number of data necessary to obtain a stable value of the average. This is equivalent to saying that the variance of the average parameter estimation must be constant. From this point, Cramer (1946) showed theoretically for normal independent ReVs that the variance, V_I(x̄), of the arithmetic averages for sample length n is

V_I(x̄) = σ²/n,     (2.1)
where σ² is the unknown population variance and n is the number of data. It also follows from the central limit theorem that the average of random samples has a normal pdf, with mean equal to the average of the data, x̄, and variance of the averages as given in Eq. (2.1). The square root of this expression is referred to as the standard error, e, of the estimate of the arithmetic mean. If the original data come from an independent ReV process with population mean, μ, and standard deviation, σ, then the finite sample averages will have the same arithmetic mean, with variance as in Eq. (2.1). This means that the sample average is an unbiased estimate of the arithmetic mean, with a standard error of
e = σ/√n,     (2.2)
which decreases with the square root of the sample length. ReVs are sampled over various space (or time) intervals. It is customary to take a 10% standard error (90% reliability, significance) level corresponding to the standard deviate x₉₀ from a standard Gaussian distribution, as shown in Fig. 2.8. This level separates the whole area under the standard normal pdf into two parts, the reliability and risk regions (Şen, 1998a). It is possible to obtain a practical chart relating the three variables in Eq. (2.2), which gives the required data number depending on the reliability level, as in Fig. 2.9.
Example 2.4
In an extensive area there is an unconfined groundwater aquifer in rather homogeneous sandstone. Field tests indicated that the radius of influence of each well is 650 m. How many wells must be drilled so that there will be no interference between adjacent wells in a total region of 5 km²? For the numerical answer, it is necessary first to calculate the area of influence of each well as 3.14 × (650 m)² = 1,326,650 m² = 1.3266 km². The number of wells can then be found as 5/1.3266 = 3.769 ≈ 4 wells.
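The same calculation in Python (a sketch; the helper function name is ours):

```python
import math

def non_interfering_wells(region_km2: float, radius_m: float) -> float:
    # Circular area of influence per well, converted from m^2 to km^2.
    influence_km2 = math.pi * radius_m**2 / 1e6
    return region_km2 / influence_km2

n = non_interfering_wells(5.0, 650.0)
# math.pi gives 3.767; the example's 3.14 approximation gives 3.769.
print(f"{n:.3f} -> about {round(n)} wells")
```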
Fig. 2.9 Sample number with the standard deviation and reliability (standard error) level; curves for n = 5, 10, 25, 50, 100, and 150
Example 2.5
In a region n = 10 well samples of electric conductivity (EC) in micromhos per
centimeter are recorded as given in Table 2.5. What is the representative number of
data?
Table 2.5 EC records (μmhos/cm)
Data number 1 2 3 4 5 6 7 8 9 10
EC 770 1020 997 790 750 760 765 850 1029 900
The arithmetic average and the standard deviation of the data values are 863.1
μmhos/cm and 114.82 μmhos/cm, respectively. The samples are assumed to have
spatial independence. Find the number of data for e = 10%.
Since the standard deviation for the given example is 114.82 μmhos/cm, at the 10% standard error level one can read the number of representative data from Fig. 2.9 as about 100.
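The sample statistics quoted above are easy to verify in Python (the chart look-up from Fig. 2.9 itself is not reproduced here):

```python
import statistics

ec = [770, 1020, 997, 790, 750, 760, 765, 850, 1029, 900]  # μmhos/cm

mean = statistics.fmean(ec)
std = statistics.stdev(ec)  # sample standard deviation (n - 1 divisor)
print(f"mean = {mean:.1f}, std = {std:.2f}")  # 863.1 and 114.82, as in the text

# Standard error of the mean for this n = 10 record, from Eq. (2.2):
print(f"standard error = {std / len(ec) ** 0.5:.2f}")  # σ/√n ≈ 36.31
```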
The smaller the sampling interval (distance), the greater the correlation between
nearby observations, and consequently the simple result in Eq. (2.2) cannot be
applied directly to the averages of dependent series. The physical phenomena within
the earth sciences (such as the atmosphere) that give rise to such features are non-linear
dynamic systems with limited predictability. Therefore, it is not convenient to have
a very persistent process in their systematic modeling, but rather lower-order pro-
cesses seem more convenient without any mathematical complications. For instance,
Schubert et al. (1992) proposed a first-order Markov process to provide a general
picture of the persistence-based model behaviors compared to the General Circu-
lation Model (GCM) in atmospheric sciences. First-order Markov processes have
short memory of the correlation function and therefore they are not sufficient in the
GCM. The same authors then attempted to improve the situation upon the statistical
model by fitting low-order univariate Auto-Regressive Integrated Moving Average
(ARIMA) models to the control run of the GCM. Detailed information concerning
these models is available from Box and Jenkins (1976) as a natural extension of
the first-order Markov model, and they are useful in modeling atmospheric behav-
iors (Chu and Katz, 1985). In addition to the finite sample length, the autocorrelation
structure of the process causes a further source of bias (Kendall, 1954; Quenouille,
1956).
In order to model persistence within earth sciences ReVs, herein, the ARIMA
(1,0,1) model is considered. It is the mixture of separate stochastic processes includ-
ing autoregressive and moving average models. The numbers in the argument as
(1,0,1) imply that this type of ARIMA model is composed of first-order autore-
gressive (Markov) and first-order moving average processes with zero order differ-
ence between successive values. Generally, the model is written mathematically as

$$X_i = \varphi X_{i-1} + \varepsilon_i - \theta\,\varepsilon_{i-1}, \qquad (2.3)$$
where φ and θ are the autoregressive and moving average parameters, respectively,
and εi is a zero-mean independent (white noise) random variable. The autocorrela-
tion structure of the same process is presented in terms of the model parameters as
(Box and Jenkins, 1976)
$$\rho_0 = 1,$$
$$\rho_1 = \frac{(\varphi - \theta)(1 - \varphi\theta)}{1 + \theta^2 - 2\varphi\theta}, \qquad (2.4)$$
$$\rho_i = \varphi\,\rho_{i-1} \quad (i \ge 2).$$
These expressions reduce to the white noise case when φ and θ are both zero;
to the first-order Markov process for θ = 0 and ρ1 = φ; and, finally, to the moving
average process for φ = 0. For this model, the variance of time averages can be
calculated by using the model parameters, which are the fundamental quantities related
to the statistical parameters that can be estimated from available data. In order to
illustrate this point, let us consider the data smoothed by a simple arithmetic
average over a sub-sample of length n from the complete record. Hence, the arithmetic
average for such a sub-sample length n is
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i. \qquad (2.5)$$
Taking first the square and then the expectation operator on both sides leads, after
some algebra, to

$$V(\bar{X}_n) = \frac{1}{n^2}\left[\sum_{i=1}^{n} E(X_i^2) + \sum_{i=1}^{n}\sum_{\substack{j=1 \\ j \ne i}}^{n} E(X_i X_j)\right] - \mu^2. \qquad (2.6)$$
The substitution of ARIMA (1,0,1) autocorrelation structure from Eq. (2.4) leads to
$$V_A(\bar{X}_n) = \frac{E(S^2)}{n^2}\left\{n + \frac{2\rho_1}{(1-\varphi)^2}\left[n(1-\varphi) - (1-\varphi^n)\right]\right\}. \qquad (2.8)$$
This expression provides a common basis for the change calculations by giving the
variance of the pdf of the ReV means for finite lengths, i.e., sub-samples, from a
complete data set. For large samples (n > 30) the distribution of the means converges to
a normal pdf due to the well-known central limit theorem in statistics. The square root
of Eq. (2.8) is equal to the standard deviation of such a normal pdf. Consequently, it
can be used for determining whether a small-sample mean value of a given ReV
is significantly different from its long-term mean value, μ, supposedly calculated
from the whole record. For this purpose, the following steps are necessary:
1) Identify the underlying stochastic or ReV model for the given earth sciences
phenomenon.
2) Find the theoretical variance of the averages by substituting the necessary model
parameters into Eq. (2.8).
3) Consider the population pdf of the given data averages as a normal pdf with
mean, μ, and standard deviation $\sigma_A = \sqrt{V_A(\bar{X}_n)}$.
4) Find the standard deviate, t, for the means calculated from a given sub-sample
length, n, as

$$t = \frac{\bar{X}_n - \mu}{\sigma_A}. \qquad (2.9)$$
For the first-order Markov process, similar expressions to Eq. (2.8) can be
obtained provided that φ is substituted by ρ1 , which leads to
$$V_A(\bar{X}_n) = \frac{E(S^2)}{n^2}\left\{n + \frac{2\rho_1}{(1-\rho_1)^2}\left[n(1-\rho_1) - (1-\rho_1^n)\right]\right\}. \qquad (2.10)$$
This expression reduces to Eq. (2.1) for ρ1 = 0, which corresponds to the case of
independent model as explained in the previous section.
Generally, even for very small samples (n < 30) one can use the Chebyshev inequality,
which states that the probability that a single average, selected at random, deviates
from the mean μ of the pdf by more than $\pm K\sqrt{V_A(\bar{X}_n)}$ is less than or equal to 1/K².
It can be expressed in mathematical form as

$$P\left(\left|\bar{X}_n - \mu\right| \ge K\sqrt{V_A(\bar{X}_n)}\right) \le \frac{1}{K^2}. \qquad (2.11)$$
This inequality yields upper and lower limits on the probability of a deviation of
given magnitude from the mean value. Hence, it is possible to find confidence intervals
on either side of the average value.
Example 2.6
Let us calculate the 95% confidence (reliability) limits for an ARIMA (1,0,1) model
with parameters φ = 0.6 and θ = 0.3 for a finite data length, n = 15. First of all, from
Eq. (2.4) one finds ρ1 = 0.34, the substitution of which with the other relevant
values into Eq. (2.8) gives $\sqrt{V_A(\bar{X}_n)} = 0.40\sigma$. A confidence level of 95% implies that
1/K² = 0.95, or K = 1.026, and accordingly $K\sqrt{V_A(\bar{X}_n)} = 0.41\sigma$. Consequently,
the confidence interval limits are μ ± 0.41σ. Finally, if the average calculated from
a finite data set with n = 15 lies outside these limits, then there is a trend
with the possibility of ReV change. Of course, for any desired small sample size the
confidence interval limits can be calculated easily according to the aforementioned
procedure.
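The calculations of this example can be verified with the sketch below, which codes Eqs. (2.4) and (2.8) directly and takes E(S²) = σ² = 1; the value θ = 0.3 is inferred here from the quoted ρ1 = 0.34, and the function names are illustrative:

```python
def rho1_arma11(phi, theta):
    """Lag-one autocorrelation of an ARIMA (1,0,1) process, Eq. (2.4)."""
    return (phi - theta) * (1 - phi * theta) / (1 + theta**2 - 2 * phi * theta)

def var_mean_arma11(n, phi, rho1, s2=1.0):
    """Variance of n-sample averages, Eq. (2.8), with E(S^2) = s2."""
    return (s2 / n**2) * (n + 2 * rho1 / (1 - phi) ** 2 * (n * (1 - phi) - (1 - phi**n)))

phi, theta, n = 0.6, 0.3, 15
rho1 = rho1_arma11(phi, theta)                    # about 0.34
sd_mean = var_mean_arma11(n, phi, rho1) ** 0.5    # about 0.40 (in units of sigma)
K = (1 / 0.95) ** 0.5                             # 1.026, from the text's 1/K^2 = 0.95
print(rho1, sd_mean, K * sd_mean)                 # confidence limits: mu +/- 0.41*sigma
```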
On the other hand, for the moving average case φ = 0, and hence from Eq. (2.4)
$\rho_1 = -\theta/(1 + \theta^2)$; its substitution into Eq. (2.8) leads to

$$V_A(\bar{X}_n) = \frac{\sigma^2}{n^2}\left[n - \frac{2(n-1)\theta}{1+\theta^2}\right]. \qquad (2.12)$$
The sample variance is defined as

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2. \qquad (2.13)$$
The appearance of n − 1 rather than n in the denominator of Eq. (2.13) is for
obtaining an unbiased variance estimate for independent processes. Indeed, taking the
expectation of both sides leads to E(S²) = σ². This indicates without hesitation
that it is possible to substitute σ² in Eq. (2.1) by its sample estimate for independent
ReVs. However, as will be shown in the sequel, this is not true for
dependent processes. It has already been shown by Şen (1974) that for dependent
processes
$$E(S^2) = \sigma^2\left[1 - \frac{2}{n(n-1)}\sum_{i=1}^{n}(n-i)\rho_i\right], \qquad (2.14)$$
where ρi represents the lag-i autocorrelation coefficient. This last expression shows that
there is a bias effect not only due to the finite sample length, n, but also, more
significantly, due to the autocorrelation structure of the ReV. Substitution of Eq. (2.4) into
Eq. (2.14) leads, after some algebraic manipulations, to
$$E(S^2) = \sigma^2\left\{1 - \frac{2\rho_1}{n(n-1)}\,\frac{n(1-\varphi) - (1-\varphi^n)}{(1-\varphi)^2}\right\}. \qquad (2.15)$$
The relative error, αr, of the variance estimation can be defined in percent as

$$\alpha_r = 100\,\frac{\sigma^2 - E(S^2)}{\sigma^2}.$$
Hence, substitution of Eq. (2.15) into this expression leads to the most general form
of the relative error for the ARIMA (1,0,1) model as
$$\alpha_r = \frac{2\rho_1}{n(n-1)}\,\frac{n(1-\varphi) - (1-\varphi^n)}{(1-\varphi)^2} \times 100. \qquad (2.16)$$
Fig. 2.10 Variance of average estimate relative error change with sample length (ρ1 = 0.1; curves for φ = 0 to 0.9)
Based on this expression, the change of the relative error, αr, with the sample length n
for a set of ρ1 values is presented as charts in terms of φ values in Fig. 2.10.
A close inspection shows an exponential decrease in all the charts with sample
length. Increase in ρ1 implies increase in the relative error percentage for a given
sample length. In these figures the Markov model case corresponds to the curves
where ρ1 = φ. These charts are useful tools for finding the equivalent independent
model length for a given dependent model length, provided that the relative error
percentage is given. For instance, when a 10% error level is acceptable in the first-order
Markov model small-sample mean variance estimation with ρ1 = 0.3, it is
possible to find from the chart in Fig. 2.10 that after at least eight samples the ReV will
effectively lose its dependence structure. However, for the same error level and dependence
coefficient, an ARIMA (1,0,1) model requires at least 11, 16, and 48 samples for
φ = 0.5, 0.7, and 0.9, respectively.
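These chart readings can be reproduced approximately by coding Eq. (2.16) and searching for the smallest sample length at the 10% error level; the results agree with the quoted lengths to within a sample or two, which is within chart-reading accuracy (helper names below are illustrative):

```python
def alpha_r(n, rho1, phi):
    """Relative error of the variance estimate, Eq. (2.16), in percent."""
    return 100 * 2 * rho1 * (n * (1 - phi) - (1 - phi**n)) / (n * (n - 1) * (1 - phi) ** 2)

def min_length(rho1, phi, target=10.0):
    """Smallest sample length n at which alpha_r falls to the target level."""
    n = 2
    while alpha_r(n, rho1, phi) > target:
        n += 1
    return n

# Markov case (phi = rho1 = 0.3) and ARIMA (1,0,1) cases with rho1 = 0.3.
for phi in (0.3, 0.5, 0.7, 0.9):
    print(phi, min_length(0.3, phi))
```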
These charts can also be used as indicators whether there is a trend in a given
ReV. For example, if the underlying generating mechanism is identified using the
procedures of Box and Jenkins (1976) as an ARIMA (1, 0, 1) model with ρ1 = 0.5
and φ = 0.7, then the error percentage is calculated from Eq. (2.16). If this error
percentage is greater than the theoretically calculated counterpart, then there is a
possibility of trend.
Another relative error for measuring any deviation from the independent model
with no long-term systematic features can be defined by considering Eqs. (2.1) and
(2.8) as
$$\beta = 100\,\frac{V_A(\bar{X}_n) - V_I(\bar{X}_n)}{V_A(\bar{X}_n)}.$$
This expression provides information about the necessary sample length concerning
the variance of climatic sub-sample averages. It is possible to write this expression
explicitly in terms of model parameters from Eqs. (2.1) and (2.8) as
$$\beta = 100\left\{1 - \frac{n}{\,n + \dfrac{2\rho_1}{(1-\varphi)^2}\left[n(1-\varphi) - (1-\varphi^n)\right]}\right\}. \qquad (2.17)$$
Finally, if the variance of short data and the variance of sub-sample averages are
calculated for the same sample length, then the relationship between these two types
of relative errors can be found from Eqs. (2.16) and (2.17) as
$$\beta = 100\left[1 - \frac{1}{1 + \dfrac{(n-1)\,\alpha_r}{100}}\right], \qquad (2.18)$$
which implies that β is equal to zero only when αr approaches zero. Otherwise,
for any given αr, there is an implied β-type error. The graphical representation of
Eq. (2.18) is given in Fig. 2.11. If any pair of αr, β, and n values is known, then the
third one can be determined from this chart easily.
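As a quick numerical check of Eq. (2.18), supplied here for illustration: with αr = 10% and n = 20, β = 100[1 − 1/(1 + 19 × 0.1)] ≈ 65.5%, so even a modest relative error in the variance estimate implies a substantial β-type error.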
Fig. 2.11 Graphical representation of Eq. (2.18), relating αr, β, and the sample length n
2.6 Regional Representation

Decision-making about which model to use for spatial estimation should be based on
simple criteria such as the number of data points, variability range, arithmetic average,
standard deviation, geometrical configuration of the sampling points, distances,
and, in general, distance-regional variability. Classical parameters such as the arithmetic
average, standard deviation, and skewness coefficient in terms of summary statistics
are readily known to the reader, and therefore they will not be elaborated in
this book (see Davis, 2002).
The range, RZ, of the data is the difference between the maximum, ZM, and the minimum, Zm, values:

$$R_Z = Z_M - Z_m. \qquad (2.19)$$

The corresponding maximum percentage error is

$$e_m = 100\,\frac{R_Z}{Z_M}. \qquad (2.20)$$

The representativeness of a single central value, ZC, for the whole set can be measured by the sum of square deviations,

$$SSD = \sum_{i=1}^{n}\left(Z_i - Z_C\right)^2, \qquad (2.21)$$

which can be expanded as

$$SSD = \sum_{i=1}^{n} Z_i^2 - 2Z_C\sum_{i=1}^{n} Z_i + n Z_C^2. \qquad (2.22)$$
Since the minimum value is sought, the derivative of this last expression with respect
to ZC leads to
$$\frac{\partial(SSD)}{\partial Z_C} = -2\sum_{i=1}^{n} Z_i + 2 n Z_C. \qquad (2.23)$$

Setting this derivative equal to zero and solving for ZC yields

$$Z_C = \frac{1}{n}\sum_{i=1}^{n} Z_i, \qquad (2.24)$$
which is equal to the arithmetic average, Z. This proves that irrespective of any
practical requirement, the use of arithmetic average as a representative value for
the regionalized variable is a mathematical necessity. Further interpretation of
Eq. (2.24) can be given after its explicit form as
$$Z_C = \frac{1}{n}Z_1 + \frac{1}{n}Z_2 + \frac{1}{n}Z_3 + \ldots + \frac{1}{n}Z_n, \qquad (2.25)$$

which means that the arithmetic average is the summation of the point data values multiplied
by a factor (1/n as a weighting factor), which may have different interpretations.
1) The factors may be considered as weights for each data point, and therefore
they may represent some specific feature of the data point apart from the data
point value. For instance, weight may be the influence area of the point or some
important coefficient such as the relationship of this data value with the available
points. In this latter case, the factors may be the function of correlation between
data point pairs in addition to distances. In Eq. (2.25) the factors are the same,
which implies the isotropic behavior of the regionalized phenomenon, which
is not the case in natural events, and therefore each factor is expected to be
different than others.
2) In Eq. (2.25) the equality of the factors to 1/n reminds one that, if there are n data
points with equally likely future occurrence chances, the
probability of occurrence of each data value is p = 1/n. This corresponds to the
random field case, where the occurrences are completely independent from each
other, i.e., there is no regional correlation within the regionalized variable. However,
in actual situations these probabilities of occurrence are not equal.
3) Another significant point from Eq. (2.25) is that the summation of the factors or
probabilities is always equal to 1. Furthermore, the factors are in percentages.
4) Finally, each factor is distance-independent whereas in any natural phenomenon
there are regional dependences, which may be simply expressed as inverse dis-
tance or inverse distance square weightings without natural consideration (see
Section 2.6.2).
$$Z_C = \sum_{i=1}^{n} \alpha_i Z_i, \qquad (2.26)$$

where the factors αi = Wi/WT are normalized weights; explicitly,

$$Z_C = \frac{1}{W_T}\sum_{i=1}^{n} W_i Z_i, \qquad (2.27)$$

where Wi is the weight attached to the i-th data point and WT represents the total weight
as the summation of all weights,

$$W_T = W_1 + W_2 + \ldots + W_n, \qquad (2.28)$$
with n number of data points. The weights that are assigned to the estima-
tion points according to the weighting function are adjusted to sum to 1.0.
Therefore, the weighting function actually assigns proportional weights and
expresses the relative influence of each estimation point. A widely used ver-
sion of the weighting process assigns a function whose exact form depends
upon the distance from the location being estimated and the most distant point
used in the estimation procedure. This inverse distance-squared weighting func-
tion is then scaled so that it extends from 1 to 0 over this distance. Equation
(2.26) is in the form of linear multiple regression and it furnishes the basis
of all the linear estimation models in spatial analysis as will be explained in
this book.
Logically, the use of the mode value is preferable because it is the most frequently
occurring data value within the given set of data points. The basis of the
mode is probabilistic rather than mathematical, and it carries the highest probability
of occurrence. A decision is to be made between the arithmetic average
and the mode value in practical applications. If the pdf of the regionalized
variable is symmetric, then the two concepts coincide. Otherwise,
the use of the mode would be preferable, but mathematically its calculation is not
easy, and therefore in statistical or mathematical modeling the arithmetic average
is used invariably. The reader should keep in mind that although the arithmetic average
yields the minimum error, it does not coincide with the most frequently occurring data
value. The arithmetic average renders the mathematical procedures into a
tractable form.
The general inverse distance estimate of the ReV at a grid node can be written as

$$Z_j = \frac{\sum_i Z_i / d_{ij}^{\alpha}}{\sum_i 1 / d_{ij}^{\alpha}}, \qquad (2.29)$$

where dij is the effective separation distance between grid node “j” and the neighboring
points “i,” the Zi's are the neighboring point ReV values, and α is the weighting
(smoothing) power. It is possible to derive special features from this general expres-
sion, which are well known in the literature. The last equation becomes equal to
inverse distance and inverse square distance for α = 1 and α = 2, respectively. The
slopes at the points used in the estimation procedure are weighted according to the
distances between the estimation node and the other points. Various inverse distance
functions are presented in Fig. 2.12.
Weighting is assigned to data through the use of a weighting power that controls
how the weighting factors drop off as distance from a grid node increases. The
greater the weighting power, the less effect points far from the grid node have during
interpolation. As the power increases, the grid node value approaches the value
of the nearest point. For a smaller power, the weights are more evenly distributed
among the neighboring data points. Normally, inverse distance function behaves as
an exact interpolator, but still does not take into consideration the inherent ReV
variability in the phenomenon concerned. In calculating a grid node, the weights
assigned to the data points are fractions and the sum of all the weights is equal to
1.0 (Eq. 2.28). When a particular observation is coincident with a grid node, the
distance between that observation and the grid node is 0.0, and that observation is
given a weight of 1.0 while all other observations are given weights of 0.0. Thus, the
Fig. 2.12 Inverse distance weighting functions (weight, w, versus distance, d, for α = 0.3, 0.5, 1, 2, and 3)
grid node is assigned the value of the coincident observation. Here α is a mechanism
for buffering this behavior. When one assigns a non-zero smoothing parameter, no
point is given an overwhelming weight so that no point is given a weighting factor
equal to 1.0.
One of the characteristics of inverse distance function is the generation of “bull’s-
eyes” surrounding the position of observations within the gridded area. One can
assign a smoothing parameter during inverse distance function to reduce the “bull’s-
eye” effect by smoothing the interpolated grid. Inverse distance function is a very
fast method for gridding. With less than 500 points, one can use all data search types
and gridding proceeds rapidly.
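A minimal inverse distance sketch along the lines of Eq. (2.29) is given below; folding the smoothing parameter δ into an effective distance is one common convention (gridding software may use a different form), and all names are illustrative:

```python
import math

def idw_estimate(node, points, values, alpha=2.0, delta=0.0):
    """Inverse distance weighted estimate of the ReV at a grid node.

    node: (x, y) of the grid node; points: list of (x, y) data locations;
    values: ReV values at those locations; alpha: weighting power;
    delta: optional smoothing parameter (delta = 0 gives exact interpolation).
    """
    weights = []
    for (x, y), z in zip(points, values):
        d = math.hypot(node[0] - x, node[1] - y)
        if d == 0.0 and delta == 0.0:
            return z  # a coincident observation receives the full weight of 1.0
        weights.append(1.0 / (d * d + delta * delta) ** (alpha / 2.0))
    return sum(w * z for w, z in zip(weights, values)) / sum(weights)

# Example: four stations and one grid node.
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
vals = [3.5, 2.9, 5.8, 4.2]
print(idw_estimate((0.4, 0.6), pts, vals, alpha=2.0))
```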
2.7 Sub-areal Partition

2.7.1 Triangularization
The number of data points is also important in deciding which spatial prediction
model should be adopted. Needless to say, spatial variability is describable by at
least three data points, which give the shape of a triangle; therefore, in the early
approaches before the advent of computers, the triangularization method was preferred
due to its simplicity. If there are n data points, it is possible to obtain n − 2 adjacent
triangles. For instance, five sampling points (A, B, C, D, and E) in Fig. 2.13 yield
three triangles. Each corner is a measurement station. The ReV within each triangle
may be represented as the arithmetic average of the data values at three apices. In
this manner, rather than the use of arithmetic average over all the study area, it is
partitioned into triangular sub-areas where the arithmetic averages are used. In this
manner, the amount of the error in the global arithmetic average usage is reduced
significantly.
The problem with triangularization is that one point record enters more than one
sub-area. The question is then whether it would not be better to define an influence
area for each data point separately, so that the whole study region is divided into a set
of sub-areas, each under the influence of a single data point.
In practice, given the scarcity of gauges and the spatial variability of ReV, for
instance, in the case of precipitation many storms completely miss several gauges
within a drainage area. Therefore, two basic tasks must be performed, namely the
assessment of the representativeness of point rainfall and picture derivations of spa-
tial patterns that reflect reality. Summer (1988) states, “In the ideal world, it should
be possible to track and then model, mathematically or statistically the passage of a
storm across an area with a great degree of accuracy and precision. In reality, this
is very difficult, and one must be content with generalized spatial models of storm
structure relating intensity or depth to storm area.” Kriging and stochastic methods
for the areal average estimation (AAE) based on the spatial correlation coefficient
are summarized by Bras and Rodriguez-Iturbe (1985). However, the use of these
methods needs recordings at many stations for the results to be reliable. Tabios and
Salas (1985) compared several AAE methods with rainfall variability and concluded
that the geostatistical methods of ordinary and universal Kriging (Chapter 4), with their
spatial correlation structure, are superior to Thiessen polygons, polynomial interpolation,
and inverse-distance weighting. Hevesi et al. (1992) suggested the use of multi-
variate geostatistical techniques for areal precipitation estimation in mountainous
terrain. Reliable estimates by these techniques are particularly difficult when the
areal coverage of stations is sparse or when precipitation characteristics vary greatly
with locations. Such situations frequently occur in arid regions due to sporadic
and haphazard meteorological occurrences. On the other hand, Kedem et al. (1990)
have shown, by considering satellite images and simple probability models, that the
higher the rainfall the smaller the affected area over large regions. All these methods
require high-speed computers and they are not as practical as conventional proce-
dures such as the arithmetic average, Thiessen polygons, or isohyetal map tech-
niques, which do not require much data (Chow, 1964).
An alternative AAE calculation method is presented by Akin (1971). It is
assumed that the precipitation over the sub-area varies linearly between the three-
corner (triangular) gauge points. Thus, at any point (x, y) interior to the n-th sub-area
the precipitation height Hn (x,y) is expressed as
Hn (x,y) = αn + βn x + γn y, (2.30)
where the constants follow from the measured depths Hi, Hj, and Hk at the three corners:

$$H_i(x,y) = \alpha_n + \beta_n x_i + \gamma_n y_i,$$
$$H_j(x,y) = \alpha_n + \beta_n x_j + \gamma_n y_j, \qquad (2.31)$$
$$H_k(x,y) = \alpha_n + \beta_n x_k + \gamma_n y_k.$$
The solution of the constants from these equations in terms of known quantities leads
to

$$\alpha_n = \left(a_i H_i + a_j H_j + a_k H_k\right)/2A_n,$$
$$\beta_n = \left(b_i H_i + b_j H_j + b_k H_k\right)/2A_n, \qquad (2.32)$$
$$\gamma_n = \left(c_i H_i + c_j H_j + c_k H_k\right)/2A_n,$$

where

$$a_i = x_j y_k - x_k y_j,$$
$$b_i = y_j - y_k, \qquad (2.33)$$
$$c_i = x_k - x_j,$$

with similar expressions for the j and k subscripts by cyclic permutation, and

$$A_n = \left(a_i + a_j + a_k\right)/2. \qquad (2.34)$$
The differential volume of rainfall at any point within the sub-area is defined as
dQ = H(x,y)dA. (2.35)
So that the total volume of rainfall associated with the sub-area becomes theoreti-
cally as
$$Q_n = \iint \left(\alpha_n + \beta_n x + \gamma_n y\right)dx\,dy, \qquad (2.36)$$

where the substitution of the above relevant expressions leads, after some algebra,
finally, to the volume of rainfall for the n-th sub-area as

$$Q_n = A_n\left[\alpha_n + \beta_n\left(x_i + x_j + x_k\right)/3 + \gamma_n\left(y_i + y_j + y_k\right)/3\right]. \qquad (2.37)$$
The total rainfall volume and the total area over the m sub-areas are

$$Q = \sum_{n=1}^{m} Q_n, \qquad (2.38)$$

$$A = \sum_{n=1}^{m} A_n. \qquad (2.39)$$
Finally, the ratio of Eq. (2.38) to Eq. (2.39) gives the AAE height over m sub-areas
as
$$\bar{H} = \frac{Q}{A}.$$
By means of this procedure, the AAE area and volume are easily calculated if the
gauge locations and rainfall amounts are known.
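Because the fitted surface in Eq. (2.30) is planar over each triangle, Eq. (2.37) is simply the triangle area times the plane value at the centroid, which equals the mean of the three corner depths. A minimal sketch of the procedure of Eqs. (2.30)–(2.39), with illustrative names and data, is:

```python
def triangle_volume(pi, pj, pk):
    """Area and rain volume over one triangular sub-area, Eqs. (2.34) and (2.37).

    Each argument is ((x, y), H): gauge coordinates and rainfall depth.
    """
    (xi, yi), hi = pi
    (xj, yj), hj = pj
    (xk, yk), hk = pk
    # Triangle area from the cross product (corners taken anti-clockwise).
    area = abs((xj - xi) * (yk - yi) - (xk - xi) * (yj - yi)) / 2.0
    # Planar surface: volume = area * mean of the corner depths, Eq. (2.37).
    return area, area * (hi + hj + hk) / 3.0

def areal_average(triangles):
    """AAE height as the ratio of Eq. (2.38) to Eq. (2.39)."""
    total_area = total_volume = 0.0
    for tri in triangles:
        a, q = triangle_volume(*tri)
        total_area += a
        total_volume += q
    return total_volume / total_area

# Two adjacent triangles sharing an edge (coordinates in km, depths in mm).
tris = [
    (((0, 0), 10.0), ((4, 0), 14.0), ((2, 3), 12.0)),
    (((4, 0), 14.0), ((6, 3), 8.0), ((2, 3), 12.0)),
]
print(areal_average(tris))
```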
The development is analogous to some elementary concepts used in finite ele-
ment analysis techniques. Consider a region of interest with station locations and
amounts that are plotted on a map, then a series of straight lines are drawn arbi-
trarily to connect every gauge points with the adjacent gauges. Straight lines are
drawn in anti-clockwise direction as shown in Fig. 2.14. These straight lines should
produce a series of triangles “not necessary to have the same shape.”
Each triangle area is known as sub-area and the corners of the triangles are shown
by [i, j, k]. Precipitation values at the corners are denoted as Hi , Hj , Hk (i, j, k = 1,
2, . . . , n), where n is the number of sub-areas. Triangularization of all the stations
for the northern part of Libya is shown in Fig. 2.15. On the other hand, Table 2.6
shows sub-areas for this study.
Fig. 2.14 Anti-clockwise triangularization of gauge points, with a differential area dA and precipitation height Hn(x,y) within a sub-triangle
Fig. 2.15 Triangularization of all precipitation stations for the northern part of Libya (point numbers 1–29; sub-area numbers 1–37)
Table 2.6 Corner stations (i, j, k) of each triangular sub-area

Sub-area   i   j   k      Sub-area   i   j   k
 1          1   2   3      20         15  16  29
 2          3   4   5      21         16  17  29
 3          3   5   1      22         17  18  29
 4          5   6   1      23         18  26  29
 5          6   7   1      24         18  27  26
 6          7   8  20      25         18  18  27
 7          7   8  20      26         19  20  27
 8          9  10  20      27         19   1   7
 9         10  21  20      28         19   7  20
10         20  11  21      29         20  21  25
11         11  12  21      30         21  22  28
12         12  22  21      31         22  23  28
13         12  13  22      32         15  25  24
14         13  23  22      33         28  23  25
15         13  24  23      34         26  25  29
16         13  14  24      35         26  28  25
17         14  25  24      36         27  28  26
18         14  15  25      37         20  28  27
19         15  29  25
2.8 Polygonizations
The idea is to surround each data point with a polygonal area of influence so that
the number of points will be equal to the number of sub-areas. In the case of n data
points, there will be n areas of influence.
These polygonal methods are all related to each other with simple differences, but the
basic logic is the same, and different names are used in different disciplines for the same
method. According to these polygon methods, the study area can be partitioned into
a set of convex polygons, each containing only one measurement point, such that
every point within a given polygon is closer to its measurement point than to any
other measurement point. Each polygon defines the area of influence around its
measurement point (Fig. 2.16). Each of these polygons is also called a Thiessen
polygon, whose boundaries define the area that is closer to its point relative to all
other points.
They are geometrically defined by the perpendicular bisectors of the lines
between all points. A Voronoi diagram is sometimes also known as a Dirichlet
(1850) tessellation, with cells that are Dirichlet regions, Thiessen or Voronoi poly-
gons (Dirichlet, 1850; Voronoi, 1907; Thiessen, 1912). On the other hand, the
Delaunay triangulation and the Voronoi diagram in 2D space are dual to each other in
the graph-theoretical sense. Voronoi diagrams are named after the Russian mathematician
Georgy Voronoi, who defined and studied the general n-dimensional case in 1908. Voronoi dia-
grams that are used in geophysics and meteorology to analyze spatially distributed
data (such as rainfall measurements) are called Thiessen polygons. In climatology,
Voronoi diagrams are used to calculate the rainfall of an area, based on a series
of point measurements. In this usage, they are generally referred to as Thiessen
polygons.
Thiessen method is quick to apply because once the sub-polygons are fixed with
a set of observation points, they remain the same all the time. The only change
occurs artificially when additional observation points are added to the available set
of measurement points. It is based on the hypothesis that for each point in the area,
Fig. 2.16 Polygonalization
the best estimate of the ReV is the measurement physically closest to that point. This
concept is implemented by drawing perpendicular bisectors to the straight lines connecting
each pair of measurement stations, which, with the consideration of the
watershed boundary, yields a set of closed areas known as Thiessen polygons. Based on the
given measurement stations, the sub-polygons are obtained according to the follow-
ing steps:
1) Connect each station to each nearby station with a straight line. These lines
cannot cross and should connect only the nearest stations. The end product is
several triangles.
2) Each side of the triangles is then bisected with a perpendicular line, thus forming
polygons around each station.
3) Using an appropriate method calculate the total area, A, and sub-areas repre-
sented by each polygon (A1 , A2 , . . ., An ). The number of sub-areas is equal to
the number of measurement locations (Eq. 2.28).
4) Calculate the areal average estimation (AAE) of ReV as ZC according to the
weighted average formulation in Eq. (2.27). In this equation, Wi ’s correspond
to the sub-area at measurement location i with measurement Zi and WT is the
total area.
Example 2.7
Consider the measurement locations A, B, C, and D as shown in Fig. 2.17, and let the
precipitation values be 3.5, 2.9, 5.8, and 4.2 mm, respectively. Draw the Thiessen
sub-polygons and calculate the areal average ReV value. The total area is given as
A = 50 km², and after the polygon partitioning each sub-polygon area is calculated
as AA = 12 km², AB = 17 km², AC = 13 km², and AD = 8 km².
Using Eq. (2.27), the Thiessen areal average ReV estimate for the entire area
becomes

$$Z_C = \frac{12 \times 3.5 + 17 \times 2.9 + 13 \times 5.8 + 8 \times 4.2}{50} = 4.006\ \text{mm}.$$
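The arithmetic of this example is a direct application of Eq. (2.27) and is easily checked in code (names illustrative):

```python
def weighted_average(weights, values):
    """Weighted average of Eq. (2.27): here the weights are sub-polygon areas."""
    return sum(w * z for w, z in zip(weights, values)) / sum(weights)

areas = [12.0, 17.0, 13.0, 8.0]   # km^2, stations A, B, C, D
rain = [3.5, 2.9, 5.8, 4.2]       # mm
print(weighted_average(areas, rain))   # about 4.01 mm
```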
Fig. 2.17 Measurement locations A, B, C, and D with their Thiessen sub-polygons

Since the Thiessen sub-polygons depend only on the measurement
location configuration, they remain the same as long as the measurement locations
do not change or there are no additional stations. However, it is logically plausible
to expect that the sub-areas should change in response to the spatial variation of the
phenomenon concerned. In other words, the partition should be based not only on
the measurement location network configuration but also on the ReV measurements
at each location.
The percentage weighted polygon (PWP) method is a new, simple, practical, and objective AAE method for determining the areal
average of a spatial event based on a sparse and/or irregular network of measurement
locations (Şen, 1998b). This method takes into account ReV measure-
ment percentage weightings for each station and also has geometrical advantages,
i.e., a better representation of the ReV on the study area compared to the conven-
tional Thiessen polygon method. For instance, ReV data such as precipitation show
a considerable spatial variation over any region, as explained by Huff and Neill
(1957), Stout (1960), Jackson (1972), and Summer (1988). Wilson and Atwater
(1972) suggested that this variation is due to differences in the type and scale of
precipitation-producing models, which are strongly influenced by local or regional
factors such as topography and wind direction. The AAE of the ReV over an area is
most conveniently determined from a well-sited network of measurement locations,
which show the local variations of ReV. For most areas, each measurement station is
assumed to represent a very considerable area around it. However, this is a restrictive
and frequently invalid assumption because of the spatial variation of ReV, especially
for short distances (durations), such as during severe storms (Summer, 1988). There
is no guarantee that point measurements provide reliable estimation for immediate
surrounding areas. Relief often leads to large variations in ReV over relatively short
distances. In the case of, say, precipitation if upward motion of air occurs uniformly
over thousands of square kilometers, the associated precipitation has usually light
or moderate intensity and may continue for a long time. On the other hand, con-
vective storms accompanied by compensating downdrafts (as best illustrated by the
thunderstorm) may be extremely intense, but their areal extent and local duration
are comparatively limited. In practice, given the scarcity of gauges and the spatial
variability of the ReV, many storms may miss several gauges completely. Therefore, two
basic tasks must be performed, namely, the assessment of the representativeness of
point measurements and the derivation of spatial patterns that reflect reality.
After deciding on the triangles, the following procedure is necessary for dividing
the study area into polygons, leading to the percentage weighting method. If the
precipitation values at the three apices of a triangle are A, B, and C, then their respective
percentages are

$$p_A = 100\,\frac{A}{A+B+C}, \qquad (2.40)$$
$$p_B = 100\,\frac{B}{A+B+C}, \qquad (2.41)$$
$$p_C = 100\,\frac{C}{A+B+C}. \qquad (2.42)$$
1) Draw lines between each adjacent pair of precipitation stations. Hence, a set of
triangles is obtained, which cover the study area.
2) For each triangle calculate the precipitation percentage at its apices according
to Eqs. (2.40, 2.41, and 2.42). Consider in each station that each apex has the
value of 100% with 0% on the opposite side.
3) Consider bi-sector, which connects an apex to the midpoint of the opposite
side, and graduate it into 100 equal pieces.
4) By making use of one precipitation percentage calculated in step 2, mark it
along the convenient bi-sector starting from the opposite side toward the apex.
5) Draw a parallel line from this marked point in step 4 to the side opposite to the
apex considered with its precipitation percentage.
6) Repeat steps 4 and 5 for the next precipitation percentage and find similar
parallel line this time to another opposite side.
7) The intersection of these two lines defines the key point for the triangle
considered.
8) In order to check the correctness of this key point, repeat steps 4 and 5 for the
remaining third precipitation percentage value. If the parallel line to the side
crosses through the aforementioned key point, then the procedure of finding
the key point for the triangle is complete. Otherwise, there is a mistake either
in the precipitation percentage calculations or in the location of marked points
along the bi-sectors in steps 3 through 6 inclusive.
9) Return to step 2 and repeat the process for the triangles constructed in step 1.
In this manner each triangle will have its key point. The location of this
point within the triangle depends on the percentages of recorded precipitation
amounts at the three adjacent apices. The greater the precipitation percentage
for an apex the closer the point will lie to this apex. It is not necessary that the
triangles resulting from a given set of stations in a network should be exactly
equilateral. However, in the Thiessen method, for an obtuse-angle triangle the
intersection of the three perpendicular bi-sectors occurs outside the triangular
area.
10) Key points at adjacent triangles are connected with each other to form poly-
gons, each including a single precipitation station.
11) The boundaries of polygons around the basin perimeter are defined by drawing
a perpendicular to the sides of triangles from the key points. Now, the division
of the whole basin area into sub areas is complete.
The triangular or polygon methods are preferred in practice when there are only a
few data points, say about eight to ten.
Example 2.8
A simple example of plotting on a triangular coordinate paper is presented in
Fig. 2.19 for an obtuse-angle triangle.
If the precipitation amounts at three apices are A = 12.5 cm, B = 20.1 cm,
and C = 7.4 cm, then the corresponding percentages from Eqs. (2.40, 2.41,
and 2.42) are pA = 31, pB = 50, and pC = 19. In Fig. 2.19, point D on the
A–A′ bi-sector corresponds to 31% of the A–A′ length starting from A′, which
lies on the B–C side of the triangle representing 0%. A parallel line to side
B–C is drawn from point D. Likewise, the next percentage is considered for
the precipitation amount at apex B. The bi-sector B–B′ is drawn starting from
point B′ on the side A–C toward B. On the bi-sector, point E corresponding
to 50% is depicted. A parallel line from this point to the side A–C is drawn.
Finally, the intersection of these two parallels at point F defines the “key point’’
for triangle ABC. The following steps are necessary for the implementation of
the PWP:
1) Three adjacent stations are considered such as in Fig. 2.19, where each apex
is coded by its longitude (XA , XB , and XC ), latitude (YA , YB , and YC ) and
precipitation value (ZA , ZB , and ZC ).
2) The slopes (mAB , mBC , and mCA ) of triangle sides are calculated by making
use of the apices coordinates.
Fig. 2.19 Key point determination for an obtuse-angle triangle: bi-sectors A–A′ and B–B′, parallels to sides BC and AC, and their intersection at the key point F
3) Determine the straight-line equations perpendicular to each of the sides but
crossing the opposite apex by analytical formulations. First of all, the
coordinates of the projection points, such as A′ and B′, of each apex on the opposite
side must be calculated. For instance, the coordinates XA′ and YA′ of point
A′ can be expressed in terms of known quantities as

$$X_{A'} = \frac{1}{m_{BC}^2 + 1}\left[m_{BC}^2 X_B + m_{BC}\left(Y_A - Y_B\right) + X_A\right], \qquad (2.43)$$

$$Y_{A'} = -\frac{1}{m_{BC}}\left(X_{A'} - X_A\right) + Y_A, \qquad (2.44)$$

where

$$m_{BC} = \frac{Y_B - Y_C}{X_B - X_C}. \qquad (2.45)$$

Similarly, the coordinates XB′ and YB′ of point B′ on the A–C side are

$$X_{B'} = \frac{1}{m_{CA}^2 + 1}\left[m_{CA}^2 X_C + m_{CA}\left(Y_B - Y_C\right) + X_B\right], \qquad (2.46)$$

$$Y_{B'} = -\frac{1}{m_{CA}}\left(X_{B'} - X_B\right) + Y_B, \qquad (2.47)$$

where

$$m_{CA} = \frac{Y_C - Y_A}{X_C - X_A}. \qquad (2.48)$$

A compact computational sketch of this projection step is given after this list.
4) The precipitation amounts (ZA, ZB, and ZC) are used in order to find the points
along the perpendiculars, starting from the side toward the apex, which divide
each one in the proportions λ1, λ2, and λ3. By use of these ratios and the previously
known quantities, the coordinates of the points A″, B″, and C″ are determined
in terms of the apex coordinates and these proportions.
5) On the other hand, from the precipitation measurements at the three apices of
any triangle, the total precipitation value becomes

$$T = Z_A + Z_B + Z_C,$$

so that

$$Z_A/T + Z_B/T + Z_C/T = 1.0,$$

or

$$\lambda_1 + \lambda_2 + \lambda_3 = 1.0,$$

where, for instance,

$$\lambda_2 = \frac{Z_B}{Z_A + Z_B + Z_C}, \qquad (2.53)$$

with similar expressions for λ1 and λ3.
6) Straight lines are drawn passing through point A″ parallel to the B–C side of
the triangle and, similarly, through B″ parallel to A–C. Let these parallel lines
be D1 and D2. The line D1, for instance, has the equation

$$Y_k = m_{BC}\left(X_k - X_{A''}\right) + Y_{A''}. \qquad (2.57)$$
10) The area of each polygon is calculated by one of the available convenient math-
ematical methods.
11) Multiplication of each polygon area by its representative station precipitation
value gives the volume of water that is gathered over this polygonal sub-area.
12) Summation of these volumes for all the relevant polygons that cover the whole
catchment area gives the total water volume which falls over the catchment
area.
13) Division of this total water volume by the total catchment area yields the AAE
value for the catchment.
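The geometric core of the PWP construction, projecting an apex onto the opposite side and computing the percentage proportions, can be sketched compactly in vector form. This is algebraically equivalent to the slope-based Eqs. (2.43)–(2.48) but avoids division by zero for vertical sides; all names are illustrative, and this is a sketch rather than the published implementation:

```python
def foot_of_perpendicular(apex, p1, p2):
    """Projection of an apex onto the opposite side, cf. Eqs. (2.43)-(2.48)."""
    (xa, ya), (x1, y1), (x2, y2) = apex, p1, p2
    dx, dy = x2 - x1, y2 - y1
    # Parameter t locates the foot along the side p1 -> p2.
    t = ((xa - x1) * dx + (ya - y1) * dy) / (dx * dx + dy * dy)
    return (x1 + t * dx, y1 + t * dy)

def proportions(za, zb, zc):
    """Percentage proportions of the three apices, Eq. (2.53) and its analogs."""
    total = za + zb + zc
    return za / total, zb / total, zc / total

# Example 2.8 values: the proportions 0.31, 0.50, 0.19 locate the key point.
print(proportions(12.5, 20.1, 7.4))
print(foot_of_perpendicular((0.0, 3.0), (-2.0, 0.0), (4.0, 0.0)))  # (0.0, 0.0)
```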
In general, the comparison of the Thiessen and PWP methods indicates the fol-
lowing significant points:
3) The size of any closed polygon for a station in the PWP method is always
smaller than in the corresponding Thiessen method. Hence, a more refined partition
of the catchment area into sub-areas is obtained, which implies that more refined
AAE values result from the use of the PWP method. In fact, in the PWP method the
station with the biggest precipitation record results in the smallest sub-area.
4) Variations in the precipitation amounts, due to some different storms, change
the polygonal partition in the PWP method, whereas Thiessen polygons remain
the same.
Example 2.9
The implementation of PWP method was applied first to spatial precipitation data
given by Wiesner (1970) and Bruce and Clark (1966) as in Fig. 2.20, where the geo-
metric configuration of precipitation stations are given for the same data according
to the Thiessen and PWP methods. In order to assess the reliability of these meth-
ods, the average sum of square deviations (ASSD) from the AAE value is calculated
for each method as
$$ASSD = \frac{1}{n}\sum_{i=1}^{n}\left(P_i - AAE\right)^2, \qquad (2.58)$$
Fig. 2.20 Precipitation data: (a) Wiesner, (b) Bruce and Clark (showing catchment boundaries, inter-station connections, Thiessen polygons, and percentage weighted polygons)
where n is the number of stations. The results for various methods are shown in
Table 2.9.
The PWP method yields smaller AAE and ASSD values than the Thiessen
method.
As mentioned earlier, with the Thiessen method, in the obtuse triangle BDF a per-
pendicular bi-sector’s intersection appears to the left of BF line (see Fig. 2.20a). By
considering precipitation amounts at stations B and F, it is not possible to expect a
value of about 4 inches. On the other hand, in the PWP method there is no such con-
fusion. Wiesner (1970) has shown, after completion of an isohyetal map for the same
precipitation data, that the storm peak occurs within a small area around station D in
Fig. 2.20a. This is also confirmed by the PWP method as shown in Fig. 2.20a. Again,
in the same figure Thiessen method yields only one closed polygon, whereas PWP
provides three such polygons. Extensions of Thiessen boundary polygons between
the pairs of stations F and I and E and I suffer from the same drawbacks in that it
is not possible to have precipitation values greater than 1.8 inches between F and I,
and more than 2.83 inches between E and I. Similar general interpretations are valid
also for Fig. 2.20b where there are two odd bi-sectors’ intersection points that lie
outside the triangles, resulting from the application of Thiessen method. These lie
to the left of line A–F and below line F–E. Comparatively, even in this figure, the
closed polygon from the PWP method is smaller in size than the Thiessen case. The
PWP method can be applied to local or regional scales due to the following reasons:
2) In mountainous areas the areal precipitation is usually underestimated, because
most rain gauge stations are located in valleys and do not catch very well the
heavy precipitation on the slopes or mountain peaks.
3) The proposed PWP method is very convenient in areas where the precipitation
stations are irregularly located.
Example 2.10
In order to determine the AAE of rainfall from ten different meteorological sta-
tions, the PWP method is applied together with the other conventional methods
(arithmetic mean, Thiessen polygon, isohyetal map technique) to the Southeastern
Anatolia Region of Turkey (Bayraktar et al., 2005). Total monthly rainfall values of
these stations in 1993 are used and presented in Table 2.10.
Maps at a scale of about 1/500,000 are used to draw the sub-areas, which are measured
with a planimeter. For each method, the AAE values are calculated with the help of
Eq. (2.27).
Application map of Thiessen method and areal values of Thiessen polygons are
merely given in Fig. 2.21, because the key points remain the same as long as the
meteorological stations do not change. Monthly isohyetal maps are prepared for
January, April, July, and October as in Fig. 2.22.
For the same months rainfall values and sub-areas are given in Fig. 2.23a,b,c,d.
In PWP method, which is the main purpose of this study, values of rainfall and per-
centage weightings are calculated for each of the three adjacent stations constituting
sub-triangles.
During the application, PWP rainfall values are calculated by considering Eqs.
(2.40, 2.41, and 2.42) used for the determination of sub-areas. PWP method cal-
culations and sub-areal values are given in Fig. 2.23a,b,c,d. For the comparison of
different methods all results are presented collectively in Table 2.11.
As can be seen from this application, the PWP method yields more reliable
results and smaller AAE values depending on the regional variability of the rainfall
amounts over the catchment area. For instance, in July, rainfall values have many
regional variations. This is due to the semi-arid characteristics of the study area and
Table 2.10 Total monthly rainfall values of the stations in 1993 (columns: Jan–Dec)
Fig. 2.21 Application map of Thiessen method (January, April, July, October 1993)
frequent occurrence of convective rainfalls, which are not expected to cover large
areas. It is noted that in July, Başkale station surrounding appears as the most rain-
fall reception area, which is represented by a sub-area of 1,545 km2 in the Thiessen
method, whereas by the sub-areas of 3,140 km2 and 519 km2 in the arithmetic mean
and the isohyetal map technique, respectively. On the other hand, it is represented
by a sub-area of 153 km2 when PWP method is considered. Hence, the represen-
tation of the most rainfall intercepting meteorology station with the smallest sub-
area gives rise to a smaller value of AAE for the catchment area during convective
storms. The monthly AAEs of the catchment area in this month according to iso-
hyetal map, Thiessen Polygon, and arithmetic mean techniques are estimated as 44,
41, and 54%, respectively. They are smaller than the PWP method AAE value. The
areal rainfall variability in October is comparatively less than July because of the
north polar maritime air mass penetration into the study area, which gives rise to
frontal rainfall occurrences. It is well known that the frontal rainfalls have more
extensive areal coverage than the convective rainfall types and consequently the
percentages are smaller in the PWP method calculations. This is tantamount to say-
ing that percentage calculations from the three adjacent stations are not very dif-
ferent as they are in the convective rainfall events. It is timely to state that similar
effects can be expected from the orogrophic rainfall occurrences to the convective
rainfall types in mountainous areas. Such distinctions cannot be considered in the
Thiessen approach, where the sub-areas are determined according to the geometric
configuration of the station locations, without the consideration of actually recorded
rainfall amounts at these stations. For example, in October, the AAE calculations
by using the PWP method yielded 13, 15, and 18% smaller values than the other three
conventional methods.

Fig. 2.22 Application map of isohyetal technique (a) January, (b) April, (c) July, (d) October

Fig. 2.23 Application map of PWP technique (a) January, (b) April, (c) July, (d) October

This is due to the lesser areal variability of rainfall in this
month than in July. Furthermore, the AAE amounts by use of the PWP method are
found to be 12, 14, and 14%, and on the average 13.5%, smaller than the annual
average rainfall amounts of the isohyetal map, Thiessen polygon, and arithmetic average
methods, respectively. The arithmetic average method yields in January a smaller
AAE than all the other methods, because this is the month in which the effect of frontal
rainfall is most dominant, with high rainfalls throughout the area except for low rainfalls
at a few stations, such as the Van and Başkale stations with 42 and 47 mm, respectively.
Last but not least, the arithmetic average method is affected by extreme rainfall values
more than any other method.
As frontal and more homogeneous types of rainfall are recorded during a
specific period, it is expected that the results of the PWP and Thiessen polygon
methodologies will approach each other, provided that there are comparable
rainfall records at each meteorological station. However, in practice the use of the PWP
method is recommended. In fact, whatever the circumstances, it is preferable to
adopt the PWP method, since it yields almost the same result as the Thiessen method
when the homogeneous type of rainfall conditions are satisfied.
2.9 Areal Coverage Probability

Fig. 2.24 Sub-division of time and space into basic units

Fig. 2.25 Dry and wet basic units along the time and space axes
If “dry” units are shown by black and “wet” units by white colors, then in
Fig. 2.25 there are four and five dry units along the time and space axes, respectively.
As the number of basic units increases, the “dry,” D, and “wet,” W, time and
spatial features start to become clearer, as in Fig. 2.26.
In this section precipitation phenomenon is modeled as a random field where
time-wise probabilities at a fixed site are referred to as PoP, and spatial probabilities
for a fixed time instant are ACP. The areal probability is, in fact, the fraction of the
area hit by rainfall. It does not provide a means for depicting which of the sub-areas
is going to be hit with precipitation event. However, it simply represents the esti-
mate of what the fractional coverage will be. Furthermore, it is clearly a conditional
probability being conditioned by whether at least one precipitation event actually
occurs in the sub-area during a certain time interval.
Fig. 2.26 Dry durations and areas
The PoP at any desired threshold value, x0 , such as standard 0.01 inches is equiv-
alent to the exceedence probability of this value. If the pdf of precipitation at site i
is denoted by fi (X), then the PoP, pi , at this site is
$$p_i = \int_{x_0}^{\infty} f_i(X)\,dX.$$
If the forecast area is thought in terms of n sample sites, then the average areal
probability p can be defined formally as
$$\bar{p} = \frac{1}{n}\sum_{i=1}^{n} p_i. \qquad (2.59)$$
In the special case where the point probability is uniform over the forecast area, p
is equal to the common value of PoP. Note that since p is a lumped value it does
not, in general, contain detailed information provided by the set of individual point
probabilities. However, if for some reason some of the sub-areas vary in size, then
each probability value must be percentage area weighted. It should be noted that the
areas associated with sub-areas need to be small enough that a single probability
value can be applied to each one.
In the most general case, none of the sites has an equal PoP, which implies that
the random field is heterogeneous. It is quite likely that probabilities might vary
from place to place even within a single area. In practice, in addition to the spatial
correlation variations, heterogeneous and anisotropic random fields arise in several
further situations.
Let the PoP and its complementary probability at the j-th site within a region of n
sites be given by pj and qj (j = 1, 2, . . ., n), respectively. They are point-wise mutually
exclusive and complementary, i.e., pj + qj = 1. The ACP, P(A = i), of exactly i
sites can be evaluated through an enumeration technique after the application of the
summation and multiplication rules of probability theory. Hence,

$$P(A = i) = \frac{1}{i!}\sum_{j_1=1}^{n}\ \sum_{\substack{j_2=1\\ j_2 \ne j_1}}^{n}\cdots\sum_{\substack{j_i=1\\ j_i \ne j_1,\ldots,j_{i-1}}}^{n} p_{j_1} p_{j_2}\cdots p_{j_i}\prod_{\substack{k=1\\ k \ne j_1,\ldots,j_i}}^{n} q_k, \qquad (2.60\text{--}2.62)$$

where Eqs. (2.61) and (2.62) denote equivalent, successively more compact forms of this
enumeration. The multiplication of the i summations includes all the possible
combinations of i precipitation occurrences at n sites, whereas the last multiplication
term corresponds to the possible no-precipitation combinations. For identical PoPs the
enumeration term simplifies to n(n − 1) · · · (n − i + 1)pⁱ and the last multiplication to
q^{n−i}; hence Eq. (2.62) reduces to

$$P(A = i) = \binom{n}{i} p^i q^{n-i}, \qquad (2.63)$$
which is actually the Binomial distribution with two-stage Bernoulli trials (Feller,
1967).
This is the well-known Binomial distribution with mean np and variance npq.
The probability, pA , that all the sites, hence area, are covered by precipitation at an
instant can be found from Eq. (2.62) as
$$p_A = \prod_{i=1}^{n} p_i. \qquad (2.64)$$
In general,

$$\bar{p} > p_A, \qquad (2.65)$$

and accordingly

$$\min(p_1, p_2, \ldots, p_n) < \bar{p} < \max(p_1, p_2, \ldots, p_n). \qquad (2.66)$$
Assuming that any one of the sites is equally likely to experience precipitation, the
conditional probability that only a certain group of k1 sites has precipitation can be
written as

$$P\left(X_1 > x_0, X_2 > x_0, \ldots, X_{k_1} > x_0 \mid A = i\right) = \left(\frac{i}{n}\right)^{k_1}.$$

By definition, the joint probability becomes

$$P\left(X_1 > x_0, X_2 > x_0, \ldots, X_{k_1} > x_0,\ A = i\right) = \left(\frac{i}{n}\right)^{k_1} P(A = i).$$
The conditional ACP of precipitation can then be obtained after dividing this last
expression by Eq. (2.67), which leads to

$$P\left(A = i \mid X_1 > x_0, X_2 > x_0, \ldots, X_{k_1} > x_0\right) = \left(\frac{i}{n\bar{p}}\right)^{k_1} P(A = i). \qquad (2.68)$$
Hence, in order to obtain the most general case of heterogeneous PoPs' conditional
areal coverage, it is necessary to substitute Eq. (2.62) into Eq. (2.68). On the other
hand, for homogeneously distributed PoPs, Eq. (2.68) takes its explicit form by the
substitution of Eq. (2.63):

$$P\left(A = i \mid X_1 > x_0, X_2 > x_0, \ldots, X_{k_1} > x_0\right) = \left(\frac{i}{np}\right)^{k_1}\binom{n}{i} p^i q^{n-i}. \qquad (2.69)$$
Since some initial information is given with certainty, the conditional probability in
Eq. (2.69) has more information content than the original unconditional distribution
in Eq. (2.63). This fact can be objectively shown by considering the variances of
pdfs in Eqs. (2.63) and (2.69), which are npq and (n–1)pq, respectively. Hence,
the conditional variance is smaller, and therefore conditional ACP is more certain.
However, the unconditional areal precipitation coverage expectation from Eq. (2.63)
is equal to np, whereas the conditional coverage area expectation is greater and can
be obtained from Eq. (2.68) as np + q.
Similarly, the conditional ACP of precipitation, given that a group of k2 sites
have no-precipitation, can be found for homogeneous PoPs as
$$P\left(A = i \mid X_1 < x_0, X_2 < x_0, \ldots, X_{k_2} < x_0\right) = \left(\frac{n-i}{nq}\right)^{k_2}\binom{n}{i} p^i q^{n-i}. \qquad (2.70)$$
Finally, the conditional ACP of precipitation, given that a group of k1 sites has
precipitation and another group of k2 sites has no precipitation, is obtained as

$$P\left(A = i \mid X_1 > x_0, \ldots, X_{k_1} > x_0,\ X_{k_1+1} < x_0, \ldots, X_{k_1+k_2} < x_0\right) = \left(\frac{i}{np}\right)^{k_1}\left(\frac{n-i}{nq}\right)^{k_2}\binom{n}{i} p^i q^{n-i}. \qquad (2.71)$$
The probability expressions in Eqs. (2.68, 2.69, 2.70, and 2.71) can be effectively
employed to study regional precipitation occurrence patterns.
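For homogeneous PoPs these expressions are easy to evaluate numerically. The sketch below codes Eqs. (2.63) and (2.69) for the case of one site known to be wet (k1 = 1, an assumption made here so the conditional pdf normalizes cleanly) and checks the conditional mean np + q quoted above; names are illustrative:

```python
from math import comb

def acp(n, p, i):
    """Unconditional ACP for homogeneous PoPs, Eq. (2.63): Binomial(n, p)."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)

def acp_given_one_wet(n, p, i):
    """Conditional ACP given one site with precipitation, Eq. (2.69), k1 = 1."""
    return (i / (n * p)) * acp(n, p, i)

n, p = 10, 0.3
q = 1.0 - p
cond = [acp_given_one_wet(n, p, i) for i in range(n + 1)]
print(abs(sum(cond) - 1.0) < 1e-12)            # a proper pdf
mean = sum(i * c for i, c in enumerate(cond))
print(mean, n * p + q)                         # both equal 3.7
```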
The probability of the maximum areal coverage, AM, of precipitation over m successive
time intervals can be written as

$$P(A_M = i) = \left[P(A \le i)\right]^m - \left[P(A \le i-1)\right]^m, \qquad (2.73)$$

where P(A ≤ i) can be evaluated, in general, from Eq. (2.62) for heterogeneous PoPs
by employing

$$P(A \le i) = \sum_{j=0}^{i} P(A = j). \qquad (2.72)$$
It is then possible to rewrite Eq. (2.73) for homogeneous PoPs from Eq. (2.63)
explicitly as
$$P(A_M = i) = \left[\sum_{j=0}^{i}\binom{n}{j} p^j q^{n-j}\right]^m - \left[\sum_{j=0}^{i-1}\binom{n}{j} p^j q^{n-j}\right]^m.$$
Hence, the probability P(AM = n) that the whole area is covered by precipitation
can be obtained from Eq. (2.73) as
P(AM = n) = 1 − (1 − pn )m . (2.74)
One can interpret from this expression that for small regions the number of sites
is also small, in which case the probability in Eq. (2.74) is not zero and there is a
chance for the whole area to be covered by precipitation. Similarly, the probability
of minimum areal coverage, Am , of precipitation can be written for homogeneous
PoPs as
$$P(A_m = i) = \left[\sum_{j=i}^{n}\binom{n}{j} p^j q^{n-j}\right]^m - \left[\sum_{j=i+1}^{n}\binom{n}{j} p^j q^{n-j}\right]^m. \qquad (2.75)$$
2.10 Spatio-Temporal Drought Theory and Analysis

Let an agricultural land be divided into m mutually exclusive sub-areas, each with
the same spatial and temporal drought chance. The Bernoulli distribution theory can
be employed to find the extent of the drought area, Ad, during a time interval, t. The
probability of n1 sub-areas being stricken by drought can be written according to the
Bernoulli distribution as (Feller, 1967)

$$P_t(A_d = n_1) = \binom{m}{n_1} p_r^{n_1} q_r^{m-n_1}, \qquad p_r + q_r = 1.0. \qquad (2.76)$$
This implies that out of m possible drought-prone sub-areas, n1 have deficit and
hence the areal coverage of drought is equal to n1 or in percentages n1 /m. For the
subsequent time interval t, there are (m–n1 ) drought-prone sub-areas. Assuming
that the evolution of possible deficit and surplus spells along time axis is indepen-
dent over mutually exclusive sub-areas, similar to the concept in Eq. (2.76), it is
possible to write for the second time interval that
$$P_{2t}(A_d = n_2) = \sum_{n_1=0}^{n_2}\binom{m}{n_1}\binom{m-n_1}{n_2-n_1}\, p_r^{n_1}\, p_r^{n_2-n_1}\, q_r^{m-n_1}\, q_r^{m-n_2}, \qquad (2.77)$$
where n2 is the total number of drought-affected sub-areas during the second time
interval. By considering Eq. (2.76), this expression can be rewritten succinctly in
the form of recurrence relationship as
$$P_{2t}(A_d = n_2) = \sum_{n_1=0}^{n_2} P_t(A_d = n_1)\,P(A_d = n_2 - n_1), \qquad (2.78)$$
and, in general, for the i-th time interval,

$$P_{it}(A_d = n_i) = \sum_{n_{i-1}=0}^{n_i} P_{(i-1)t}(A_d = n_{i-1})\,P(A_d = n_i - n_{i-1}). \qquad (2.79)$$
For i = 1 this expression reduces to its simplest case, which does not consider time
variability of drought occurrences as presented by Şen (1980) and experimentally
on digital computers by Tase (1976). Furthermore, the probability of agricultural
drought area to be equal to or less than a specific number of sub-areas j can be
evaluated from Eq. (2.79) according to
$$P_{it}(A_d \le j) = \sum_{k=0}^{j} P_{it}(A_d = k). \qquad (2.80)$$
The probability of having n1′ deficit sub-areas at the end of a time interval, given that
there are n1 deficit sub-areas at its beginning within the whole region, can be
expressed as

$$P_t\left(A_d = n_1' \mid A_d = n_1\right) = \binom{m}{n_1}\binom{n_1'}{n_1'-n_1}\, p_r^{n_1}\, p_t^{n_1'-n_1}\, q_r^{m-n_1}\, q_t^{n_1},$$

or, shortly,

$$P_t\left(A_d = n_1' \mid A_d = n_1\right) = P_t(A_d = n_1)\binom{n_1'}{n_1'-n_1}\, p_t^{n_1'-n_1}\, q_t^{n_1}. \qquad (2.81)$$
It should be noted that always n1′ ≥ n1, and the difference j = n1′ − n1 gives the
number of transitions. On the basis of Eq. (2.81), a general equation for the marginal
probability of observing n1′ deficit spells at the end of the same time interval becomes,
after simple algebra,

$$P_t\left(A_d = n_1'\right) = \sum_{k=0}^{m-n_1'} P_t\left(A_d = k + n_1'\right)\binom{k+n_1'}{k}\, p_t^{k}\, q_t^{n_1'}. \qquad (2.82)$$
Hence, the regional agricultural drought occurrences during the second time interval
follow similarly to this last expression, and generally, for the i-th step, it takes the
following form:
Fig. 2.27 Probabilities of areal drought coverage for the multi-seasonal model (m = 10, pr = 0.3, pt = 0.2; t = 1, 2, 3, 4, 5)
$$P_{it}\left(A_d = n_i'\right) = \sum_{k=0}^{m-n_i'} P_{it}\left(A_d = k + n_i'\right)\binom{k+n_i'}{k}\, p_t^{k}\, q_t^{n_i'}. \qquad (2.83)$$
Its validity has been verified using digital computers. The pdfs of areal agricultural
droughts for this model are shown in Fig. 2.27, with parameters m = 10, pr =
0.3, pt = 0.2 and i = 1, 2, 3, 4, and 5.
The probability functions exhibit almost symmetrical forms irrespective of time
intervals although they have very small positive skewness.
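A compact way to obtain such pdfs numerically is to iterate the accumulation logic behind Eqs. (2.76)–(2.79): sub-areas already in deficit remain counted, and each remaining sub-area turns to deficit with probability pr. The sketch below follows that reading (it is not the published algorithm); the mean check against the average coverage m(1 − qrⁱ), given later in Eq. (2.89), is exact:

```python
from math import comb

def drought_area_pdf(m, p_r, steps):
    """Areal drought coverage pdf after a number of time intervals,
    iterating the logic of Eqs. (2.76)-(2.79)."""
    q_r = 1.0 - p_r
    # First interval: Bernoulli trials over all m sub-areas, Eq. (2.76).
    pdf = [comb(m, n) * p_r**n * q_r ** (m - n) for n in range(m + 1)]
    for _ in range(steps - 1):
        new = [0.0] * (m + 1)
        for n1, prob in enumerate(pdf):
            # The remaining m - n1 sub-areas may newly turn to deficit.
            for j in range(m - n1 + 1):
                new[n1 + j] += prob * comb(m - n1, j) * p_r**j * q_r ** (m - n1 - j)
        pdf = new
    return pdf

pdf = drought_area_pdf(m=10, p_r=0.3, steps=5)
mean = sum(k * pk for k, pk in enumerate(pdf))
print(abs(mean - 10 * (1 - 0.7**5)) < 1e-9)   # matches E_i(A_d) = m(1 - q_r^i)
```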
Another version of the multi-seasonal model is of interest when the number of continuously deficit sub-areas is tracked along the whole observation period. In such a
case, the probability of drought area in the first time interval can be calculated from
Eq. (2.76). At the end of the second time interval, the probability of j sub-areas
with two successive deficits, given that already n1 sub-areas had SMC deficit in the
previous interval, can be expressed as
P_{2t}(A_d = j \mid A_d = n_1) = P_t(A_d = n_1) \binom{n_1}{j} p_t^{j} q_t^{n_1-j}. (2.84)
This expression yields the probability that n_1 sub-areas have deficit, out of which j sub-areas are hit by two successive deficits, i.e., there are (n_1 − j) sub-areas with one
deficit. Hence, the marginal probability of continuous deficit sub-area numbers is
P_{2t}(A_d = j) = \sum_{k=0}^{m-j} P_t(A_d = k + j) \binom{k+j}{j} p_t^{j} q_t^{k}.
In general, for the i-th time interval it is possible to write
P_{it}(A_d = j) = \sum_{k=0}^{m-j} P_{(i-1)t}(A_d = k + j) \binom{k+j}{j} p_t^{j} q_t^{k}. (2.85)
The numerical solutions of this expression are presented in Fig. 2.28 for
m = 10, pr = 0.3 and pt = 0.5. The probability distribution function is positively
skewed.
[Fig. 2.28 Probability distribution of the number of continuously deficit sub-areas for i = 2, 3, and 4]
The expectation and variance of the areal drought coverage at the i-th time interval are

E_i(A_d) = \sum_{k=0}^{m} k\, P_{it}(A_d = k), (2.86)

and

V_i(A_d) = \sum_{k=0}^{m} k^2\, P_{it}(A_d = k) - E_i^2(A_d). (2.87)
Substitution of Eq. (2.79) into Eq. (2.86) leads to drought-stricken average area
within the whole region as
E_i(A_d) = m p_r \sum_{k=0}^{i-1} q_r^{k}, (2.88)

or succinctly

E_i(A_d) = m\left(1 - q_r^{i}\right). (2.89)
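As a quick numerical illustration with values assumed here (not taken from the text): for m = 10 sub-areas and a surplus probability q_r = 0.7, Eq. (2.89) gives E_3(A_d) = 10(1 − 0.7³) = 10 × 0.657 ≈ 6.6 sub-areas stricken on average after three time intervals.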
[Fig. 2.29 Average drought coverage area versus number of sub-areas for q_r = 0.2, 0.4, 0.6, 0.8, and 0.9]
Figure 2.29 shows the change of the drought-stricken area percentage with the number of deficit sub-areas for given surplus probabilities, q_r.
In percentage terms, Eq. (2.89) gives the average areal drought coverage as

P_i(A) = \frac{E_i(A_d)}{m} = 1 - q_r^{i}. (2.90)

For regional drought variations in the first time interval (i = 1), Eq. (2.90) yields P_1(A) = p_r. On the other hand, for the whole area to be covered by drought, theoretically i → ∞ and therefore P_∞(A) = 1. It is obvious that the temporal agricultural drought percentage for a region of m sub-areas at any time i is p_r ≤ P_i(A) ≤ 1.
As the probability of deficit spell in a region increases, the average drought area attains its maximum value in a relatively shorter time, as can be written from Eq. (2.90) as

i = \frac{\ln\left[1 - P_i(A)\right]}{\ln q_r}. (2.91)
Hence, this expression provides the opportunity to estimate the average time period
that is required for a certain percentage of the region to be covered by drought.
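For instance, with an assumed surplus probability q_r = 0.9, the average time for half of the region to be covered by drought follows from Eq. (2.91) as i = ln(1 − 0.5)/ln 0.9 ≈ 6.6 time intervals.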
Figure 2.30 indicates the change of i with the surplus probability, q_r.
Furthermore, in practical applications the probability of deficit can be approximated empirically as 1/m or preferably as 1/(m+1). The substitution of these conditions into Eq. (2.91) gives

i = \frac{\ln\left[1 - P_i(A)\right]}{\ln\left[m/(m+1)\right]}. (2.92)
This expression confirms that regional drought durations are affected mainly by its
size rather than its shape, as was claimed by Tase and Yevjevich (1978).
[Fig. 2.30 Average areal drought coverage versus time for m = 10 and q_r = 0.2, 0.4, 0.6, and 0.8]
The next significant regional drought parameter is the variance, which is a mea-
sure of drought variability. In general, the smaller the variance the smaller is the
areal drought coverage percentage. The variance of the regional persistence model
can be found from Eqs. (2.76) and (2.79) after some algebra as
V_i(A_d) = m\left(1 - q_r^{i}\right) q_r^{i}. (2.93)
Furthermore, consideration of Eq. (2.90) together with this last expression yields the relationship between the percentages of average and variance drought coverage as

\frac{V_i(A_d)}{m} = P_i(A)\left[1 - P_i(A)\right]. (2.94)
Figure 2.31 shows the change of regional drought variance percentage with i
deficit affected number of sub-areas at different times for a given deficit probability,
qr = 0.7.
[Fig. 2.31 Variance of areal drought coverage versus time for i = 10 sub-areas and q_r = 0.2, 0.4, 0.6, 0.8, and 0.9]
E_1(A_d) = m (1 - q_r)(1 - p_t), (2.96)
which exposes explicitly the contribution of the regional and temporal dry spell
effects on the average areal drought. Figure 2.32 shows drought spatial and temporal
average variations for the given probabilities.
Finally, from the variance of the drought area coverage by simple spatio-temporal model considerations, it is possible to derive for the first time interval that

V_1(A_d) = m (1 - q_r)(1 - p_t)\left[1 - (1 - q_r)(1 - p_t)\right]. (2.97)
Fig. 2.32 Average areal drought coverage by considering spatio-temporal variations during the first time interval (m = 10; q_r = 0.6 and 0.8; abscissa: SMC deficit time probability)
[Figure: Variance of areal drought coverage, V_i(A_d), versus time i]
The main conclusions of the spatio-temporal drought analysis can be summarized as follows:

1) Drought occurrences are dependent on the regional and temporal dry and wet
spell probabilities, as well as size of the region considered.
2) Drought area distribution within a region without considering temporal probabil-
ities becomes negatively skewed as its size increases. Initially, it can be approx-
imated by a normal distribution. For multi-seasonal model the same distribution
has an approximate normal pdf, provided that continuous drought duration is not
considered. Otherwise, it is positively skewed.
3) Drought probabilities over a region are affected more by its size than by its shape.
References
Akin, J. E., 1971. Calculation of areal depth of precipitation. J. Hydrol. 12, 363–376.
Bayraktar, H., Turalioglu, F. S., and Şen, Z., 2005. The estimation of average areal precipitation by
percentage weighting polygon method in Southeastern Anatolia Region, Turkey. Atmos. Res.
73, 149–160.
Box, G. E. P., and Jenkins, G. M., 1976. Time Series Analysis: Forecasting and Control. Holden-Day, San Francisco.
Bras, R. L., and Rodriguez-Iturbe, I., 1985. Random Functions and Hydrology. Addison-Wesley
Publishing Co., Reading, MA, 599 pp.
Bruce, J. P., and Clark, R. H., 1966. Hydrometeorology. Pergamon Press, New York, 319 pp.
Chow, V. T., 1964. Handbook of Applied Hydrology. McGraw-Hill, New York.
Chu, P. S., and Katz, R. W., 1985. Modeling and forecasting the southern oscillation: a time-domain approach. Mon. Wea. Rev. 113, 1876–1888.
Cramer, H., 1946. Mathematical Methods of Statistics. Princeton University Press, New Jersey,
p. 213.
Davis, J. C., 2002. Statistics and Data Analysis in Geology. John Wiley and Sons, New York, 638 pp.
Dirichlet, G. L., 1850. Über die Reduktion der positiven quadratischen Formen mit drei unbes-
timmten ganzen Zahlen. Journal für die Reine und Angewandte Mathematik 40, 209–227.
Feller, W., 1967. An Introduction to Probability Theory and its Application. John Wiley and Sons,
New York, 509 pp.
Fisher, R. A., 1912. On an absolute criterion for fitting frequency curves. Messenger Math. 41, 155.
Gauss, K. F., 1809. Theory of the Motion of the Heavenly Bodies Moving About the Sun in Conic Sections. Dover, New York (1993).
Hevesi, J. A., Istok, J. D., and Flint, A. I., 1992. Precipitation estimation in mountainous terrain
using multi-variate geostatistics, Part I: Structural analysis. J. Appl. Meteorol. 31, 661–676.
Huff, F. A., and Neill, J. C., 1957. Areal representativeness of point rainfall. Trans. Amer. Geophys.
Union 38(3), 341–351.
Jackson, I. J., 1972. Mean daily rainfall intensity and number of rainy days over Tanzania. Geogr.
Ann. A. 54, 369–375.
Kedem, B., Chiu, L. S., and Karni, Z., 1990. An analysis of the threshold method for measuring area-average rainfall. J. Appl. Meteor. 29, 3–20.
Kendall, M. G., 1954. Note on the bias in the estimation of autocorrelation. Biometrika 42,
403–404.
Koch, G. S., and Link, R. F., 1971. Statistical Analysis of Geological Data. Dover Publications, New York, 375 pp.
Kolmogorov, A. N., 1941. Interpolation und Extrapolation von stationären zufälligen Folgen. Bull. Acad. Sci. USSR, Ser. Math. 5, 3–14.
Quenouille, M. H., 1956. Notes on bias in estimation. Biometrika 43, 353–360.
Schubert, S. D., Suarez, M. J., and Schemm, J. K., 1992. Persistence and predictability in a perfect model. J. Atmos. Sci. 49(5), 256–269.
Şen, Z., 1974. Small sample properties of stationary stochastic processes and Hurst phenomenon in hydrology. Unpublished Ph.D. Thesis, Imperial College of Science and Technology, University of London, 256 pp.
Şen, Z., 1980. Regional drought and flood frequency analysis: theoretical considerations. J. Hydrol.
46, 265–279.
Şen, Z., 1998a. Small sample estimation of the variance of time-averages in climatic time series.
Int. J. Climatol. 18, 1725–1732.
Şen, Z., 1998b. Average areal precipitation by percentage weighted polygon method. ASCE J.
Hydrol. Eng. 3(1), 69–76.
Şen, Z., 2002. İstatistik Veri İşleme Yöntemleri (Hidroloji ve Meteoroloji) (Statistical Data Treatment Methods – Hydrology and Meteorology). Su Vakfı Publications, 243 pp. (in Turkish).
Şen, Z., 2008. Kuraklik Afet ve Modern Modelleme Yontemleri (Drought Disaster and Modern
Modeling Methods). Turkish Water Foundation (in Turkish).
Stout, G. E., 1960. Studies of severe rainstorms in Illinois. J. Hydraul. Div. Proc. ASCE, HY4,
129–146.
Sumner, G., 1988. Precipitation: Process and Analysis. John Wiley and Sons, New York, 455 pp.
Tabios, G. O. III, and Salas, J. D., 1985. A comparative analysis of techniques for spatial interpo-
lation of precipitation. Water Resour. Bull. 21, 365–380.
Tase, N., 1976. Area-deficit-intensity characteristics of droughts. Hydrology Paper 87, Colorado
State University, Fort Collins.
Tase, N., and Yevjevich, Y., 1978. Effects of size and shape of a region on drought coverage.
Hydrol. Sci. Bull. 23(2), 203–213.
Thiessen, A. H., 1912. Precipitation averages for large areas. Mon. Wea. Rev. 39, 1082–1084.
Voronoi, G., 1907. Nouvelles applications des paramètres continus à la théorie des formes quadra-
tiques. Journal für die Reine und Angewandte Mathematik 133, 97–178.
Wiesner, C. J., 1970. Hydrometeorology. Chapman and Hall Ltd., London, 232 pp.
Wiener, N., 1942. The extrapolation, interpolation and smoothing of stationary time series. OSRD
370, Report to the Services 19, Research Project DIC-6037, MIT.
Wiener, N., 1949. The Extrapolation, Interpolation and Smoothing of Stationary Time Series. John
Wiley & Sons, Inc., New York.
Wilson, J. W., and Atwater, M. A., 1972. Storm rainfall variability over Connecticut. J. Geophys.
Res. 77(21), 3950–3956.
Zadeh, L. A., 1965. Fuzzy sets. Information Control 8, 338–353.
Zadeh, L.A., 1973. Outline of a new approach to the analysis of complex systems and decision
processes. IEEE Trans. Syst. Man. Cybern. 3, 28–44.
Chapter 3
Classical Spatial Variation Models
Abstract Attempts to model the spatial behavior of natural phenomena go back a long time, starting with intuitive simple models such as weighting functions, which are supposed to represent the regional dependence structure of the phenomenon concerned. Unfortunately, commonly employed weighting functions are not actually data-dependent, and hence they are applied invariably in each spatial prediction, which is not convenient since each spatial phenomenon has its own spatial dependence function. Spatial data distributions can be uniform, randomly uniform, homogeneous, isotropic, clustered, etc., which should be examined by a convenient test as described in the text. Besides, statistically it is also possible to depict the spatial variation through trend surface fitting by the least squares technique. Finally, in this chapter an adaptive least squares technique is suggested in the form of the Kalman filter for spatial estimation.
3.1 General
The spatial nature of earth sciences phenomena has impelled researchers to explore spatial statistical procedures, while classical statistics remains in service as usual. In general, any phenomenon with spatial variations is referred to as a ReV after Matheron (1963). ReVs fall between random fields, where the spatial variations are independent, and deterministic variability, depending on the spatial correlation value. The most significant methodology of spatial analysis is Kriging, which is discussed in Chapter 5, but prior to its development earth scientists used different empirical and geometrical rational approaches in assessing ReVs. Hence, in this chapter these rather preliminary and simple but effective methods will be discussed together with their drawbacks. Trend surface analysis is an offshoot of statistical regression, Kriging is related to time series analysis, and contouring is an extension of interpolation procedures.
[Figure: Sampling points A–F scattered over an area with a 0–10 km scale]
attached at a particular time and sampling point. Hence, the first question is how to deal with rather uncertainly (randomly) varying data. At times the data are random, sometimes chaotic, and in other cases irregular or very regular. These changes can be categorized into two broad classes as systematic and unsystematic. Systematic data yield mathematically depictable variations with time, space, or both. For instance, as the depth increases so does the temperature, and this is an example of systematic variation. Especially, if there is only one type of geological formation, then systematic variation becomes more pronounced. Otherwise, on the basis of
rather systematic variation on the average, there are unsystematic deviations, which
might be irregular or random. Systematic and unsystematic data components are
shown in Fig. 3.2.
In many studies systematic changes are referred to as the deterministic compo-
nents, which are due to systematic natural (geography, astronomy, climatology) fac-
tors that are explainable, and unsystematic variations are unexplainable or random
parts that need special treatment.
Fig. 3.2 Systematic and unsystematic components (data versus time)
◦ Z < 189
+ 189 < Z < 536
∗ 536 < Z < 936
x 936 < Z < 1310
1310 < Z < 1632
♦ Z > 1632
These symbols help to make two types of interpretations, namely within given class limits and smaller (or less) than given class limits — two mutually exclusive but collectively exhaustive groups. According to the attached symbols, one can deduce the following interpretations:
a) The majority of small ReVs are gathered at high easting but low northing regions.
Two exceptional clusters of the same type appear at low easting and northing
regions and also in high easting but medium and high northing regions. In short,
there are three clusters of small ReVs. Each cluster has its special dependence
feature different than the others. The majority of low ReV occurrences fall within this type of data.
b) The ReV variability with values between 189 and 536 takes place also at high
easting and low northing region, with the most concentrated cluster of ReVs less
than 189. In fact, these two groups can be assumed as one cluster in larger scale
as ReV values less than 536.
c) The ReV values that lie between 536 and 936 are scattered almost all over the
sampling area, but they do not fall within the major sampling area where easting
is high and northing is low. This group of sampling points has the maximum ReV
variability over the region in an independent manner. In other words, although ReVs less than 189 have high regional variability, at least they are gathered in three clusters.
d) Those ReV values between 936 and 1,310 show a single cluster within the
medium easting and northing region, without intermixing with other classes.
They show a directional extension along the northeast southwest trend in
Fig. 3.4.
e) The high values of ReV more than 1,310 are gathered at medium easting but
high northing region of the sampling area.
Fig. 3.3 Relative frequency diagram of ReV with (a) 5-class; (b) 6-class; (c) 7-class; (d) 8-class (frequency versus ReV in meters)
Fig. 3.4 Spatial pattern of each class (easting versus northing)
f) If the symbols are followed from the smallest to moderate and then to the biggest
ReV values in Fig. 3.4, it is clear that there is an increasing trend from low nor-
thing sampling locations toward high northing direction. In the middle northing
region along the easting direction, there is a hill with low values at low and high
easting regions.
The above interpretations of the ReV data scatter based on the sampling locations
yield clues that become very useful in actual modeling scheme. So far only linguistic
interpretations are deduced and they are the basic ingredients or rules in fuzzy logic
system modeling (Zadeh, 1968; Şen, 2004).
It is important to notice from the qualitative information that distinctive features
are depicted by considering the differences or dissimilarities between the scatter
points and the attached ReV values at each point. The first difference implies the
distances between the sampling points and the second is the difference between
the ReV values at two sites. Hence, in any quantitative study these two differences
should be taken into consideration. These differences are considered independently
from each other, but there may be regional dependence between them. For instance,
visualization of a trend in step f implies such a regional dependence. As will be
explained in Chapter 5, the relationship between the distance of two sampling points
and the difference in their ReV values can be depicted by semivariogram (SV)
concept.
2. Univariate data description: It is necessary and helpful to explore the uni-, bi-,
and multivariate statistical properties of the ReV data irrespective of sampling point
locations. The second step in geostatistics (or, for that matter, any statistical analy-
sis) is to describe the data using univariate (and if possible, bivariate or multivariate)
descriptive statistics. Among the univariate descriptive statistics are the arithmetic
mean, median, mode, standard deviation, skewness, kurtosis, etc.
These parameters can be obtained either from the frequency diagram (as in Fig.
3.3) or through the classical mathematical expression of the parameters (Koch and
Link, 1971). Often used parameters are presented in Table 3.2 for the ReV data in
Table 3.1.
In spatial analysis one should note that there is only one univariate parameter value for any given ReV variable.
Table 3.2 Univariate statistical parameters of the ReV data in Table 3.1

Parameter            Value
Minimum              3
Mode                 4
Median               125
Arithmetic mean      506.05
Maximum              1,869
Standard deviation   571.11
Skewness             0.89
Kurtosis             0.41
Table 3.3 Six ReV values with their easting and northing coordinates, and the corresponding univariate parameters

Easting (m)   Northing (m)   ReV
40.78         30.42          31
40.52         30.30          100
41.17         29.04          130
39.62         27.92          120
40.18         29.07          100
40.32         27.97          58

Parameter            Value
Minimum              31
Mode                 100
Median               100
Arithmetic average   89.83333
Maximum              130
Standard deviation   37.96007
Skewness             −0.78
Kurtosis             0.363526
As will be explained later in more detail, the univariate parameter values do not change with direction in the sampling area. Whatever the direction, the projection of the sampling point sequence and the distances will change, but the univariate statistical parameters will remain the same. This explains why the univariate parameters cannot be used directly in the search for spatial dependence. The spatial variation can be measured by considering comparatively the properties of at least two ReV values at two distinct sampling locations.
3. Bivariate data description: This can be achieved by considering at least two sampling points simultaneously and comparing their ReV values, either along a given direction or without regard to direction. In order to familiarize the reader with each one of these approaches, six ReV values with their easting and northing values are considered from Table 3.1 and presented in Table 3.3. The scatter of sampling points in this case is given in Fig. 3.5.
Fig. 3.5 ReV sampling locations (easting versus northing)
Fig. 3.6 (a) A–A direction; (b) B–B direction; (c) C–C direction; (d) Partial directional (Point 3) configuration; (e) Partial directional (Point 6) configuration; (f) Unidirectional (global) configuration
On the basis of the data in this table, the spatial relationship between the distance and the ReV values can be considered in three categories, namely punctual, directional, and global assessments. In the directional case all the sampling points are projected onto the desired number of directions, and the ReV values are considered not at the sampling points but at their projections along the chosen direction.
[Figure: ReV values versus distance along projected directions (panels b and c)]

3.5 Simple Uniformity Test
In the case of a sampling point scatter as in Fig. 3.4, it is necessary to decide whether the points in each sub-quadrangle (sub-area) are arranged with more or less the same uniformity. In the case of uniformity, there is no superiority among
sub-areas and each sub-area has the same likelihood of sampling point occurrences.
If there are n sampling points over the whole area and the number of sub-areas is k,
then the expected average uniform sampling number, UE , for each sub-area is
U_E = \frac{n}{k}. (3.1)
This indicates the average number of sampling points per sub-area. However, the
actual sampling point count, Ci , (i = 1, 2, . . ., k) in i-th sub-area is different from
each other and in general from the uniform sampling number. It is possible to check
the uniformity by the Chi-square test as
\chi^2 = \sum_{i=1}^{k} \frac{(C_i - U_E)^2}{U_E}. (3.2)
The Chi-square distribution has ν = k − 2 degrees of freedom, and one can find from the Chi-square distribution tables in any statistics textbook (Benjamin and Cornell, 1970) the critical Chi-square value, χ²_cr, that corresponds to this degree of freedom. If χ² ≤ χ²_cr, then the distribution of the points in each sub-area over the whole area is uniform.
Example 3.1
The positions of many earthquake measurement stations are given in Fig. 3.8, and
it has been divided into nine equal sub-areas. The question is whether the spatial
scatter of station locations is uniformly distributed in the area or not?
Fig. 3.8 Earthquake station location scatter (the area is divided into nine equal sub-areas numbered 1–9; easting and northing in km)
Table 3.4 Number of stations in each sub-area

Sub-area   Number of stations
1          7
2          15
3          8
4          7
5          6
6          12
7          4
8          5
9          12
Total      75
Table 3.4 indicates the number of stations in each sub-area where the station
locations on the boundary are considered as belonging to the right and upper sub-
areas.
According to Eq. (3.1), the expected average uniform number of stations in each sub-area is 75/9 = 8.33, which can be taken as a round number equal to 8. The application of Eq. (3.2) leads to χ² = 14. Since the degree of freedom is ν = 9 − 2 = 7, the critical χ²_cr value at the 5% significance level appears as 14.2 (i.e., χ² ≤ χ²_cr), and hence the distribution of the points is almost uniform.
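The following minimal Python sketch (an illustration added here, not part of the original text) reproduces the computation of Example 3.1 from Eqs. (3.1) and (3.2):

```python
# A minimal sketch of the uniformity test, using the counts of Table 3.4.
counts = [7, 15, 8, 7, 6, 12, 4, 5, 12]          # stations per sub-area
n, k = sum(counts), len(counts)

ue = round(n / k)                                 # Eq. (3.1): 75/9 = 8.33 -> 8
chi2 = sum((c - ue) ** 2 for c in counts) / ue    # Eq. (3.2)
print(ue, chi2)                                   # 8 and 14.0, below the critical 14.2
```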
3.6 Random Field

If m sampling points are scattered over a region of total area A, their average density is

\lambda = \frac{m}{A}, (3.3)

which lies between 0 and 1 exclusive. Now the total area is considered in terms of very small sub-areas, where the number of sub-areas, n, is much larger than the number, m, of sampling points (n ≫ m). This means that m/n goes to zero as the number of sub-areas increases. Each one of the n sub-areas can be considered as a pixel (see Fig. 2.2), which is very small compared to the total area. This makes it possible to consider that each pixel is almost equal to the influence area of one sampling point. It is, therefore, impossible to have two sampling points within each pixel. Hence, the area of influence for each sampling point can be calculated as
a = \frac{A}{n}, (3.4)
which is the pixel area. Since each sampling point has the occurrence probability
of λ per area, then in the influence area the probability of sampling occurrence
becomes
p_o = \lambda a = \lambda \frac{A}{n}. (3.5)
Accordingly, the probability of non-occurrence of the sampling point in a pixel can
be obtained simply as
p_n = 1 - p_o = 1 - \lambda \frac{A}{n}. (3.6)
If the question is to find k sampling sites in n pixels, then in the remaining n−k
pixels there will not be any sampling point. Since the spatial sampling point occurrences and non-occurrences are independent from each other, the product rule of probability theory gives the probability of k sampling point occurrences according to the Binomial pdf as
p_k = \binom{n}{k} (p_o)^k (p_n)^{n-k} = \binom{n}{k} \left(\lambda\frac{A}{n}\right)^k \left(1 - \lambda\frac{A}{n}\right)^{n-k}. (3.7)
After this line, the remaining work is the application of pure mathematical principles; as n goes to very large numbers (mathematically, positive infinity), Eq. (3.7) takes its new shape as

p_k = \frac{(\lambda A)^k e^{-\lambda A}}{k!}, (3.8)

which has the name of Poisson distribution in common statistics. This equation
requires only the rate of sampling points, λ (Eq. 3.3), the sampling point number, k, and the area of the concerned region. The combined value λA indicates the mean number of stations per quadrant. All these values are practically observable or calculable, and therefore it is possible to calculate p_k from Eq. (3.8). If for a given confidence level (90 or 95%) the critical probability value, p_cr, is found from the Poisson distribution tables in any textbook on statistics (Benjamin and Cornell, 1970), and if p_k < p_cr, then the sampling points are randomly distributed in the area.
However, in earth sciences the distribution of sampling points is exactly neither
random nor uniform.
Example 3.2
It is now time to search for the spatial random character of the same example given
in the previous section. As shown in Fig. 3.9 the whole area is divided into 9×9 =
81 quadrants.
[Fig. 3.9 The study area divided into 9 × 9 = 81 quadrants (easting and northing in km)]
The mean number of stations in each quadrant is λA = 75/81 = 0.926. The first
column in Table 3.5 shows the number of stations per quadrant (0, 1, 2, etc.), and the probability of occurrence calculations is given in the second column.
According to Eq. (3.2) one can calculate that χ 2 = 1.064. Since there are five
categories, the degree of freedom is ν = 5 – 2 = 3. The critical value of χ 2 for ν = 3
and the significance level of 5% is 7.81. The test statistics is less than this critical
value and so the hypothesis of randomness is acceptable.
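The following minimal Python sketch (added here for illustration; not from the original text) reproduces the Poisson randomness check of Example 3.2 using Eq. (3.8) and the counts of Table 3.5:

```python
# A minimal sketch of the Poisson test for 75 stations over 81 quadrants.
from math import exp, factorial

lam_a = 75 / 81                                   # mean stations per quadrant
p = [lam_a**k * exp(-lam_a) / factorial(k) for k in range(5)]   # Eq. (3.8)
theoretical = [round(81 * pk) for pk in p]        # expected quadrant counts
actual = [31, 29, 14, 6, 1]                       # observed counts (Table 3.5)
chi2 = sum((a - t) ** 2 / t for a, t in zip(actual, theoretical))
print(theoretical, round(chi2, 3))                # [32, 30, 14, 4, 1] and ~1.06
```

Using the rounded theoretical counts, the chi-square statistic comes out close to the value of 1.064 reported in the text.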
The mean number of stations per quadrant λA and its variance can be esti-
mated as
s^2 = \frac{\sum_{i=1}^{T} \left(r_i - m/T\right)^2}{T - 1}, (3.9)
where r_i is the number of stations per quadrant and T is the number of quadrants. It
is well known that in Poisson distribution the arithmetic average is equal to variance,
and by using this rule it is possible to make further interpretations. For instance, if
Table 3.5 Poisson probabilities (Eq. 3.8) and theoretical and actual numbers of quadrants with r stations

Number of stations   Eq. (3.8) calculations   Theoretical   Actual
0                    0.3961                   32            31
1                    0.3668                   30            29
2                    0.1698                   14            14
3                    0.0524                   4             6
4                    0.0121                   1             1
Total                0.9972                   81            81
the arithmetic average is greater (smaller) than the variance, the scatter of stations
is more uniform (clustered) than random. If the two parameters are equal to each
other, the scatter of stations accords with a complete random behavior. However,
at this stage it must be kept in mind that some sampling differences occur between
these two parameters and in practice it is not possible to have them equal in an exact
manner.
If the sampling points are clustered rather than random or uniform, their counts per quadrant can be described by the negative binomial distribution

P_k = \binom{n+k-1}{k} \left(\frac{p_o}{1+p_o}\right)^k \left(\frac{1}{1+p_o}\right)^n, (3.10)

with
λ = npo . (3.11)
The first term,

P_0 = \frac{1}{(1 + p_o)^n}, (3.12)
helps to calculate the subsequent probabilities. The clustering parameter can be cal-
culated as
n = \frac{(m/T)^2}{s^2 - m/T}, (3.14)
where s² is the variance in the number of occurrences per tract, and the mean is defined as

\lambda A = \frac{m}{T},

or

T = \frac{m}{\lambda A}. (3.15)
Example 3.3
The same example given in the previous section can be adopted for the application of
cluster test. It is already calculated that λA = 0.926. The variance, s2 , in the number
of occurrences per quadrangle (sub-area) is 0.897. With these values at hand, the
clustering effect from Eq. (3.14) becomes n = 27.74. Hence, from Eq. (3.11), p_o = 0.0323, and it is possible to calculate from Eq. (3.13) the probability that a given quadrant will have 0, 1, 2, 3, etc., stations, as shown in Table 3.6.
In order to test the theoretical values with the actual correspondences, it is nec-
essary to apply chi-square test, which yields χ 2 = 1.249. Since there are five cate-
gories, the degree of freedom is ν = 5 – 2 = 3. The critical value of χ 2 for ν = 3
and the significance level of 5% is 7.81. The test statistics is less than this critical
value and so the hypothesis of randomness is acceptable.
Table 3.6 Negative binomial distribution expected number of quadrants with r stations

Number of stations   Probability   Theoretical   Actual
0                    0.4140        33.5          31
1                    0.3593        29.1          29
2                    0.1616        13.1          14
3                    0.0501        4.0           6
4                    0.0120        1.0           1
Total                1.0020        81.0          81
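The following minimal Python sketch (an addition for illustration, not from the original text) evaluates the clustered-pattern model of Eqs. (3.10), (3.11), and (3.14). Note one labeled assumption: here the sample variance (0.897) is slightly below the mean (0.926), so the denominator of Eq. (3.14) is formally negative, and the absolute value is used to mirror the magnitude reported in Example 3.3; the small rounding differences mean the printed values are close to, but not identical with, Table 3.6.

```python
# A minimal sketch of the negative binomial (clustering) model.
from scipy.stats import nbinom

mean = 75 / 81                        # stations per quadrant (Example 3.3)
s2 = 0.897                            # variance per quadrant (Example 3.3)
n = mean**2 / abs(s2 - mean)          # Eq. (3.14); abs() is an assumption here
po = mean / n                         # from Eq. (3.11), lambda = n * po
probs = nbinom.pmf(range(5), n, 1 / (1 + po))   # Eq. (3.10) for k = 0, ..., 4
print([round(81 * float(pk), 1) for pk in probs])   # compare with Table 3.6
```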
3.8 Nearest Neighbor Analysis

For n points randomly scattered over an area A, the theoretically expected mean distance between nearest neighbors is

\bar{d}_T = \frac{1}{2}\sqrt{\frac{A}{n}}. (3.17)
The variance of the average distance between two points can be calculated ratio-
nally as
\sigma_T^2 = \frac{(4 - \pi)}{4\pi n^2} A. (3.18)
In the derivations of Eqs. (3.17) and (3.18), the area is assumed to be without boundary, i.e., very extensive. However, this is not the situation in practice, and therefore these statistical parameters are without areal extent restrictions and hence provide under-estimations. If the constants are worked out and the standard error of estimate is calculated, the result becomes
s_{Te} = \frac{0.26136}{\sqrt{n^2/A}}. (3.19)
If the number of sampling points is more than six, then the distribution of the average distance between nearest neighbors complies with a normal pdf. The mean and the variance of this pdf are given by Eqs. (3.17) and (3.19), respectively. Hence, for decision the standardized normal pdf value, x, can be obtained as

x = \frac{\bar{d} - \bar{d}_T}{s_{Te}}. (3.20)
As mentioned earlier, the theoretical Eqs. (3.17) and (3.18) are under-estimations, and therefore a correction factor less than 1 must be introduced. Many researchers have suggested different adjustments, but the one given by Donnelly (1978) has found frequent use. According to his extensive numerical simulation studies, the theoretical mean distance and its variance can be expressed as (Davis, 2002)
Fig. 3.10 Nearest-neighbor ratio statistics for six point patterns (a–f)
\bar{d}_T \approx \frac{1}{2}\sqrt{\frac{A}{n}} + \left(0.514 + \frac{0.412}{\sqrt{n}}\right)\frac{P}{n}, (3.21)
and

\sigma_T^2 \approx 0.0070\,\frac{A}{n^2} + 0.035\,P\,\frac{\sqrt{A}}{n^{2.5}}, (3.22)
where P is the perimeter of the rectangular map. The ratio, R_d, of the observed to the expected mean nearest-neighbor distance can be used to indicate the spatial pattern as
R_d = \frac{\bar{d}}{\bar{d}_T}. (3.23)
This ratio approaches zero when all the sampling points are very close to each other, with almost negligible average distance. Another extreme appears as R_d approaches 1, where a random field scatter of sampling points takes place. When the mean distance to the nearest neighbor is maximized, R_d takes its maximum value of 2.15. Figure 3.10 indicates sampling point distributions with different distance ratio indices. The distance ratio indices for Fig. 3.10a to f are equal to 2.15, 1.95, 1.20, 0.89, 0.31, and 0.11, respectively.
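The following minimal Python sketch (an addition for illustration, not from the original text) computes the R_d ratio of Eq. (3.23) with the Donnelly-corrected expectation of Eq. (3.21); the rectangle dimensions and the random points are assumed here for demonstration.

```python
# A minimal sketch of the nearest-neighbor ratio statistic.
import numpy as np
from scipy.spatial import cKDTree

def nn_ratio(points, width, height):
    """Observed over Donnelly-corrected expected mean nearest-neighbor distance."""
    n = len(points)
    area, perim = width * height, 2 * (width + height)
    dist, _ = cKDTree(points).query(points, k=2)   # k=2: each point and its nearest
    d_obs = dist[:, 1].mean()
    # Eq. (3.21): edge-corrected expected mean nearest-neighbor distance.
    d_exp = 0.5 * np.sqrt(area / n) + (0.514 + 0.412 / np.sqrt(n)) * perim / n
    return d_obs / d_exp                            # Eq. (3.23)

rng = np.random.default_rng(1)
pts = rng.uniform(0.0, 90.0, size=(75, 2))          # 75 hypothetical stations
print(round(nn_ratio(pts, 90.0, 90.0), 2))          # near 1 for a random pattern
```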
[Figure: (a) quadrant search and (b) octant search configurations, with control points (x) distributed around the estimation point]
Any constraints on the search for the nearest points, such as a quadrant or octant
requirement, will obviously expand the size of the neighborhood around the estima-
tion point. This occurs because some nearby control points are likely to be passed
over in favor of more distant points, in order to satisfy the requirement that only a
few points may be taken from a single sector. Unfortunately, the autocorrelation of
a surface decreases with increasing distance, so more remote points are less closely
related to the estimation point. This means the estimate may be poorer than if a
simple nearest neighbor search procedure is used.
Commonly used geometrical weighting functions have the following drawbacks:

(a) They do not take into consideration the natural variability of the regional variability features. For instance, in meteorology, Cressman (1959) weightings are given as
W(r_{i,E}) = \begin{cases} \dfrac{R^2 - r_{i,E}^2}{R^2 + r_{i,E}^2} & \text{for } r_{i,E} \le R \\[4pt] 0 & \text{for } r_{i,E} \ge R \end{cases}, (3.24)
where W(r_{i,E}) corresponds to W_i in Eq. (2.27); r_{i,E} is the distance between the estimation point and the other points; R is the radius of influence, which is determined subjectively by personal experience.
(b) Although weighting functions are considered universally applicable all over
the world, they may show unreliable variability for small areas. For instance,
within the same study area, neighboring sites may have quite different weight-
ing functions.
[Figure: The Cressman weighting function w = (R² − r²)/(R² + r²) and the Barnes weighting function w = exp[−4(r/R)²] plotted against r/R]
(c) Geometric weighting functions cannot reflect the morphology, i.e., the regional
variability of the phenomenon. They can only be considered as practical first
approximation tools.
The inclusion of α has alleviated the aforesaid drawbacks to some extent, but
its determination still presents difficulties in practical applications. Another form
of geometrical weighting function was proposed by Sasaki (1960) and Barnes
(1964) as
W(r_{i,E}) = \exp\left[-4\left(\frac{r_{i,E}}{R}\right)^2\right]. (3.26)
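As a small illustration (added here, not part of the original text), the following Python sketch implements the Cressman weights of Eq. (3.24) and the Barnes weights of Eq. (3.26) and uses them in a weight-normalized average; the coordinates, ReV values, and radius R are hypothetical.

```python
# A minimal sketch of geometric distance weighting for spatial estimation.
import numpy as np

def cressman(r, R):
    """Eq. (3.24): (R^2 - r^2)/(R^2 + r^2) inside the influence radius, else 0."""
    return np.where(r <= R, (R**2 - r**2) / (R**2 + r**2), 0.0)

def barnes(r, R):
    """Eq. (3.26): exp[-4 (r/R)^2]."""
    return np.exp(-4.0 * (r / R) ** 2)

def weighted_estimate(xy_obs, z_obs, xy_est, R, weight=cressman):
    """Weight-normalized average of the observations around the estimation point."""
    r = np.linalg.norm(xy_obs - xy_est, axis=1)
    w = weight(r, R)
    return float(np.sum(w * z_obs) / np.sum(w))

obs = np.array([[0.0, 0.0], [3.0, 1.0], [1.0, 4.0]])   # hypothetical site coordinates
z = np.array([10.0, 14.0, 12.0])                       # hypothetical ReV values
print(weighted_estimate(obs, z, np.array([1.0, 1.0]), R=6.0))
print(weighted_estimate(obs, z, np.array([1.0, 1.0]), R=6.0, weight=barnes))
```

Because the weights depend only on distance, both functions share the drawback noted above: they ignore the data-dependent regional structure of the phenomenon.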
In reality, it is expected that weighting functions should reflect the spatial depen-
dence behavior of the phenomenon. To this end, regional covariance and SV func-
tions are among the early alternatives for the weighting functions that take into
account the spatial correlation of the phenomenon considered. The former method
requires a set of assumptions such as the Gaussian distribution of the regionalized
variable. The latter technique, semivariogram (SV), does not always yield a clear
pattern of regional correlation structure (Şen, 1989).
3.10 Trend Surface Analysis

A commonly adopted quadratic trend surface model is

z = a_0 + a_1 x + a_2 y + a_3 x^2 + a_4 xy + a_5 y^2, (3.27)
where ai ’s are the model coefficients to be estimated from the available data recorded
at a set of geographical points within the study area. Although there are many sta-
tistical models, this model is appropriate for a smooth concave and convex surface.
The locations of local maximum and minimum can be obtained from Eq. (3.27) by
taking the partial derivative with respect to x and y, which yields
∂z
= a1 + 2a3 x + a4 y,
∂x
and
∂z
= a2 + a4 y + 2a5 y.
∂y
When these derivatives are set equal to zero, the solution for the extreme point becomes

x_e = \frac{a_2 a_4 - 2 a_1 a_5}{4 a_3 a_5 - a_4^2},

and

y_e = \frac{a_1 a_4 - 2 a_2 a_3}{4 a_3 a_5 - a_4^2}.
1) The basic variables are the geographical coordinates, which give the exact loca-
tions of the sampling points in the study area. For trend analysis the longitude
and latitude are converted to easting and northing values with respect to a com-
mon reference point. In trend analysis the choice of the reference point does not
make any difference in further calculations.
2) The trend component is the regional representation of the event in a rather deterministic manner, which has the form

\hat{z} = a_0 + a_1 x + a_2 y + a_3 x^2 + a_4 xy + a_5 y^2, (3.28)
1) Consider the main Eq. (3.27) and then take the arithmetic average of both sides,
which leads to
\overline{z} = a_0 + a_1 \overline{x} + a_2 \overline{y} + a_3 \overline{x^2} + a_4 \overline{xy} + a_5 \overline{y^2}. (3.29)
This is the first estimation expression where all the arithmetic averages can be
obtained from an available data set. Since there are six coefficients as unknowns,
it is necessary to obtain five more expressions.
2) Multiply both sides of the main equation by the first term variable on the right-
hand side and then take the arithmetic averages. This procedure yields finally,
\overline{zx} = a_0 \overline{x} + a_1 \overline{x^2} + a_2 \overline{xy} + a_3 \overline{x^3} + a_4 \overline{x^2 y} + a_5 \overline{xy^2}. (3.30)
3) Apply the same procedure as in the previous step, but this time multiplying both sides by the second independent variable, i.e., y, which gives

\overline{zy} = a_0 \overline{y} + a_1 \overline{xy} + a_2 \overline{y^2} + a_3 \overline{x^2 y} + a_4 \overline{xy^2} + a_5 \overline{y^3}. (3.31)
4) Multiply both sides by x2 and then take the arithmetic average of both sides,
which leads to
\overline{zx^2} = a_0 \overline{x^2} + a_1 \overline{x^3} + a_2 \overline{x^2 y} + a_3 \overline{x^4} + a_4 \overline{x^3 y} + a_5 \overline{x^2 y^2}. (3.32)
5) This time the multiplying variable is xy, and accordingly its use in the light of the aforementioned procedure results in

\overline{zxy} = a_0 \overline{xy} + a_1 \overline{x^2 y} + a_2 \overline{xy^2} + a_3 \overline{x^3 y} + a_4 \overline{x^2 y^2} + a_5 \overline{xy^3}. (3.33)
6) Finally, considering the last independent term variable as y², the same rule yields the following expression:

\overline{zy^2} = a_0 \overline{y^2} + a_1 \overline{xy^2} + a_2 \overline{y^3} + a_3 \overline{x^2 y^2} + a_4 \overline{xy^3} + a_5 \overline{y^4}. (3.34)
Hence, the necessary equations are obtained for the trend surface model coefficients
estimation. In matrix notation these equations can be written explicitly as
\begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \\ a_4 \\ a_5 \end{bmatrix} =
\begin{bmatrix}
1 & \overline{x} & \overline{y} & \overline{x^2} & \overline{xy} & \overline{y^2} \\
\overline{x} & \overline{x^2} & \overline{xy} & \overline{x^3} & \overline{x^2 y} & \overline{xy^2} \\
\overline{y} & \overline{xy} & \overline{y^2} & \overline{x^2 y} & \overline{xy^2} & \overline{y^3} \\
\overline{x^2} & \overline{x^3} & \overline{x^2 y} & \overline{x^4} & \overline{x^3 y} & \overline{x^2 y^2} \\
\overline{xy} & \overline{x^2 y} & \overline{xy^2} & \overline{x^3 y} & \overline{x^2 y^2} & \overline{xy^3} \\
\overline{y^2} & \overline{xy^2} & \overline{y^3} & \overline{x^2 y^2} & \overline{xy^3} & \overline{y^4}
\end{bmatrix}^{-1}
\begin{bmatrix} \overline{z} \\ \overline{xz} \\ \overline{yz} \\ \overline{x^2 z} \\ \overline{xyz} \\ \overline{y^2 z} \end{bmatrix}.
The right-hand side of this last expression includes various averages that are calculable from a given set of data, and therefore the parameter estimation is possible after taking the inverse of the square matrix. This matrix is symmetrical around the main diagonal. It is also possible to construct the column titles of the calculation table by considering these averages, as in Table 3.7. The arithmetic average of each column gives the elements of each matrix on the right-hand side of Eqs. (3.29), (3.30), (3.31), (3.32), (3.33), and (3.34).
Example 3.4
A set of earthquake records are given for some part of Turkey, and the magnitude is
required to be related to easting (x) and northing (y) coordinates for the prediction of
Richter magnitude. The first three columns of Table 3.8 give the easting, northing,
and earthquake magnitude values, respectively. For the sake of argument, a simple linear trend surface equation is adopted as
z = a + bx + cy, (3.35)
where a, b, and c are the trend surface parameters. According to the steps given
above, the rest of the table is prepared accordingly.
There are three unknowns, and it is necessary to obtain three equations in the light of practical trend surface calculations. The necessary equations follow simply from the last row of averages in Table 3.8.
Table 3.8 Trend surface calculation table (columns: number, x, y, z, xz, x², xy, yz, y²)
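The following minimal Python sketch (an addition for illustration; the coordinates and magnitudes are hypothetical and the worked values of Table 3.8 are not used) shows the averaging procedure for the linear trend of Eq. (3.35):

```python
# A minimal sketch of the averaging (normal equations) method for
# fitting z = a + b x + c y, mirroring the table-of-averages approach.
import numpy as np

x = np.array([36.5, 38.0, 39.2, 40.1, 41.3, 42.0])   # hypothetical easting
y = np.array([37.1, 38.4, 36.9, 39.5, 38.2, 40.0])   # hypothetical northing
z = np.array([4.1, 4.6, 4.3, 5.2, 4.9, 5.5])         # hypothetical magnitude

# Column averages (x, y, z, xz, x^2, xy, yz, y^2) assemble the 3x3 system.
A = np.array([
    [1.0,      x.mean(),       y.mean()],
    [x.mean(), (x * x).mean(), (x * y).mean()],
    [y.mean(), (x * y).mean(), (y * y).mean()],
])
b = np.array([z.mean(), (x * z).mean(), (y * z).mean()])

a_coef, b_coef, c_coef = np.linalg.solve(A, b)   # trend surface parameters
print(a_coef, b_coef, c_coef)
```

This is the 3 × 3 analog of the 6 × 6 matrix equation given above for the quadratic surface.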
Fig. 3.13 KF iterative procedure (compute Kalman gain; compute error covariance for updated estimate)
Z_k = H_k X_k + V_k, (3.37)

where V_k is another white noise with zero mean and variance R, and H is the measurement parameter.
In order to improve the prior estimate X̂_{k/k−1}, the noisy measurement at time k, Z_k, is used as

\hat{X}_{k/k} = \hat{X}_{k/k-1} + K_k \left(Z_k - H_k \hat{X}_{k/k-1}\right), (3.38)

where X̂_{k/k} is the updated estimate and K_k is the Kalman gain. Notice that (Z_k − H_k X̂_{k/k−1}) is just the error in estimating Z_k. For deciding on the value of K, the variance of the error must be computed as
" # ( )2
E (Xk − X̂k/k )2 = E Xk − X̂k/k−1 − Kk (Zk − Hk X̂k/k−1 )
" #2 (3.39)
= E (1 − Kk Hk )(Xk − X̂k/k−1 ) + Kk Vk
= Pk/k−1 (1 − Kk Hk )2 + RK2k ,
In these calculations X_k is a column matrix with many components. Then Eqs. (3.35, 3.36, 3.37, 3.38, 3.39, 3.40, and 3.41) become matrix equations, and the simplicity as well as the intuitive logic of the KF become obscured. The covariance matrices for the W_{k−1} and V_k vectors are given by

E\left[W_k W_i^T\right] = \begin{cases} Q_k, & i = k \\ 0, & i \ne k \end{cases},

E\left[V_k V_i^T\right] = \begin{cases} R_k, & i = k \\ 0, & i \ne k \end{cases},

E\left[W_k V_i^T\right] = 0 \quad \text{for all } k \text{ and } i.
The time update (predictor) and measurement update (corrector) relationships are summarized in the following steps:
1) Enter prior estimate X̂k/k−1 , which is based on all our knowledge about the pro-
cess prior to time tk−1 , and also suggest the error covariance matrix associated
with it Pk/k−1 .
2) Compute the Kalman gain as Kk = Pk/k−1 HTk (Hk Pk/k−1 HTk + Rk )−1 .
3) Update the estimate with measurement Zk as X̂k/k = X̂k/k−1 + Kk (Zk −
Hk X̂k/k−1 ).
4) Compute error covariance for updated estimate as Pk/k = Pk/k−1 − Kk Hk Pk/k−1 .
5) Project ahead the updated estimate X̂_{k/k} and the error covariance matrix associated with it, P_{k/k}, to use as the prior estimate for the next time step: X̂_{k+1/k} = Φ_{k+1,k} X̂_{k/k} and finally P_{k+1/k} = Φ_{k+1,k} P_{k/k} Φ_{k+1,k}^T + Q_k.
Once the loop is entered as shown in Fig. 3.13, it can be continued as long as necessary. Initially, when the model parameters are only rough estimates, the gain matrix ensures that the measurement data are highly influential in estimating the state parameters. Then, as confidence in the accuracy of the parameters grows with each iteration, the gain matrix values decrease, causing the influence of the measurement data in updating the parameters and the associated error to reduce.
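The following minimal Python sketch (an addition for illustration, not the book's multisite implementation) runs the five steps above in their simplest scalar form; the model parameters phi, h, q, r and the synthetic rainfall series are assumed values chosen for demonstration.

```python
# A minimal scalar KF sketch following steps 1-5 above.
import numpy as np

phi, h, q, r = 1.0, 1.0, 25.0, 400.0   # assumed transition, measurement, noise variances
x_hat, p = 800.0, 1.0e4                # step 1: prior estimate and large initial variance

rng = np.random.default_rng(0)
z_obs = 850.0 + 100.0 * rng.standard_normal(10)   # synthetic annual rainfall (mm)
for z in z_obs:
    k_gain = p * h / (h * p * h + r)              # step 2: Kalman gain
    x_hat = x_hat + k_gain * (z - h * x_hat)      # step 3: update with measurement
    p = p - k_gain * h * p                        # step 4: updated error variance
    x_hat, p = phi * x_hat, phi * p * phi + q     # step 5: project ahead
    print(round(x_hat, 1), round(p, 1))           # the gain shrinks as p stabilizes
```

Running the loop shows the behavior described in the text: the error variance p falls from its large initial value and settles, so the measurements progressively lose influence on the updated estimate.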
The geographic locations (latitude, longitude, and elevation in meters above mean sea level) of the precipitation stations are presented in Table 3.9.
Precipitation is characterized by variability in space and time. In addition, there are many factors affecting the magnitude and distribution of precipitation, such as the elevation of the station above mean sea level, various air mass movements, moisture, temperature,
pressure, and topography. The magnitude and distribution of precipitation vary from place to place and from time to time, even in small areas. The multisite KF model developed in this chapter is applied to multisite precipitation modeling, which illustrates some interesting points in the annual precipitation pattern.
Figure 3.16 provides the observed and estimated annual rainfall values at Ada-
pazarı from 1956 to 1984.
It is to be noticed from this figure that the observed and estimated values follow
each other closely, which indicates that KF provides an efficient method for model-
ing of annual rainfall. Some statistical parameters of annual observed and estimated
rainfall values during the time period (1956–1985) are summarized in Table 3.10.
From another point of view, Fig. 3.17 provides the observed and estimated annual rainfall values at 52 selected stations in Turkey for 1985. From Fig. 3.17 and Table
3.10, again as noticed before, in the case of one station (Adapazarı), the observed
and estimated values follow each other closely, which indicates that KF provides an
efficient method for modeling of annual rainfall in both time and space dimension.
[Figure: Map of the study area showing the station locations (distances in km; scale bar 0–200 km)]
Table 3.9 Station locations and elevation above mean sea level
No. Station No. Station name Latitude (N) Longitude (E) Elevation (m)
[Fig. 3.16 Observed and estimated annual rainfall at Adapazarı versus time (1955–1985)]
Contour maps of observed and estimated annual rainfall for 1956–1985 and per-
centage errors of estimated annual rainfall are presented in Figs. 3.18 and 3.19,
respectively. In Fig. 3.18, dashed lines indicate the estimated annual average precip-
itation contours.
According to the areal values of observed and estimated annual rainfalls, the mul-
tisite KF method has a slight tendency toward under-estimation. Standard deviation
of estimated value is smaller than that of observed one (Fig. 3.20). Its mean is less
variable and, therefore, more smoothed than the observed values.
Furthermore, Fig. 3.17 proves that the estimated values of annual rainfall at most
of the sites in the study area are close to the observed values, especially in the part
where more stations are available, such as in the northwestern part of Turkey. The percentage errors of the estimated values vary from −6 at station number 52 (underestimation) to 6 at station number 49 (overestimation), with an overall average of about 0.121%.
The magnitude and distribution of precipitation vary from place to place and
from time to time even in small areas. Describing and predicting the precipitation
variability in space and/or time are the fundamental requirements for a wide variety
of human activities and water project designs. In this chapter, a KF technique has
been developed for the prediction of annual precipitation amounts at a multiple site.
In this manner, the precipitation amount at any year and stations is predicted by
considering all the available stations interrelationships.
Once a model has been selected, then KF processing requires specification
of initial state vector, error covariance matrix associated with this initial state
vector, system noise covariance, measurement noise covariance, state transition
matrix, and connection matrix. Most of this information should be based on
Table 3.10 Statistics parameters of observed and estimated annual rainfall values (1956–1985)
No. Station name Mean (mm) St. Dev. (mm) Minimum (mm) Maximum (mm) Range (mm) Skewness
Obs. Est. Obs. Est. Obs. Est. Obs. Est. Obs. Est. Obs. Est.
1 Adapazarı 835.5 834.6 183.4 135.3 606.3 611.7 1,564 1,332 957.7 720.3 2.18615 1.50665
2 Ali Fuat Paşa – Adapazarı 660.6 661 107.7 85.1 484.9 519.1 922.5 851.1 437.6 332 0.36170 0.13996
3 Ağrı 506.7 504.7 91.9 66.6 352 389.1 693.1 612.7 341.1 223.6 0.13271 −0.0978
4 Adana 690.4 688.7 221.3 184.4 319.4 360 1,265 1,149 945.6 789 0.86862 0.66023
5 Bahçeköy – İstanbul 1,287 1,284 478.6 354.7 848.1 866.2 2,698 2,325 1,850 1,459 1.73873 1.56783
6 Bolu 549.2 549.2 83.3 59.3 377.4 429.1 716.8 659.7 339.4 230.6 0.14283 0.16372
7 Balıkesir 606.7 606.1 163.6 126.3 362.6 385.6 1,192 1,036 829.4 650.4 1.59256 1.25458
8 Bursa 680.6 680.1 109.1 82.7 447.8 521.1 871 835 423.2 313.9 0.16262 −0.0062
9 Bandırma – Balıkesir 719 718.9 149.3 115 456.2 494.7 1,086 988.2 629.8 493.5 0.71735 0.55033
10 Bilecik 445.1 444.5 75.4 57.5 320.6 332.8 624.7 563.1 304.1 230.3 0.62528 0.39746
11 Bitlis 1,164 1,160 287.5 235.8 564.5 734.2 1,866 1,738 1,301 1,004 0.27270 0.51462
12 Çanakkale 636.6 640.4 139.9 108.9 414.8 459.8 977.7 868.5 562.9 408.7 0.64583 0.45769
13 Çorlu – Tekirdağ 570.6 571.1 104.1 81.5 431.2 456.4 801.3 759.9 370.1 303.5 0.58876 0.64617
14 Çorum 432.1 430.7 79.7 59.1 285.5 296.8 606.7 542.9 321.2 246.1 −0.05457 −0.2570
15 Diyarbakır 490 489 134.5 99.3 146.3 276 748.8 671 602.5 395 −0.10185 0.24572
16 Dursun bey – Balıkesir 779.1 782.2 171.6 130.6 479.3 575 1,199 1,079 719.7 504 0.36409 0.15470
17 Edremit – Balikesir 696.2 697.2 144.1 111.6 512.9 548.4 1,175 1,061 662.1 512.6 1.36238 1.2798
18 Erzurum 410.1 409.8 85.3 61.3 291.1 308.1 638.5 551.8 347.4 243.7 0.92707 0.49036
19 Edirne 593.2 594.4 102.2 72.1 430.5 496.6 863.6 792.2 433.1 295.6 0.85735 0.87714
20 Eskişehir 393.1 391.8 94.8 51.7 215.8 220.1 524 486.8 326.2 266.7 −0.18598 −0.9474
21 Florya – Istanbul 656.4 655.5 117.3 89.3 500.8 540 1,026 942.6 525.2 402.6 1.02088 1.07883
22 Gökçeada – Çanakkale 792.5 796.4 212.2 162.8 483 543.8 1,451 1,235 968 691.2 1.28517 0.92635
23 Göztepe – Istanbul 698.7 696.7 127.4 98.3 538.8 559 1,047 990.9 508.2 431.9 0.87054 0.94386
24 İpsala – Edirne 612.5 616.3 98.7 77.5 386.5 434.7 808.9 763.9 422.4 329.2 0.15228 0.11054
25 İzmit 766.6 764.9 150.2 119.9 554.9 557.2 1,088 1,012 533.1 454.8 0.37969 0.12170
26 Kartal – İstanbul 651.1 649.5 118.5 90.9 475.6 496.9 871.9 832.4 396.3 335.5 0.16123 0.05385
27 Kumköy – İstanbul 796.1 793.2 185.8 143.7 519.5 581.9 1,278 1,145 758.5 563.1 1.02989 0.88983
28 Kirklareli 589.4 592.5 133.9 97.5 371.5 417.5 1,000 876.9 628.5 459.4 1.17951 0.99290
29 Kandilli 827.7 825.5 148.7 117.5 600.5 631.5 1,230 1,143 629.5 511.5 0.54675 0.42092
30 Kars 470.5 472.2 106.1 78.2 298.5 361.7 718.8 661.1 420.3 299.4 0.75373 0.81958
31 Luleburgaz – Kırklareli 652.3 655.4 182.2 138.8 399.7 437.4 1,360 1,159 960.3 721.6 2.00788 1.57797
32 Siirt 684.5 682.4 183.4 141.5 420.8 474.5 1,229 1,060 808.2 585.5 1.02095 0.91065
33 Şile – İstanbul 800.8 796.2 251.5 211.6 454.6 491.5 1,697 1,537 1,242 1,045 1.74568 1.66792
34 Sinop 640.2 639.7 134.2 109.6 414.7 455.2 990.4 917.1 575.7 461.9 0.83072 0.77820
35 Tekirdağ 604.2 606.6 192.6 141.2 405.2 434.8 1,464 1,192 1,059 757.2 3.1807 2.50102
36 Yalova – İstanbul 770.4 769.2 266.9 202.7 473.2 484.7 1,959 1,617 1,486 1,132 3.216 2.60607
37 Yozgat 565.8 562.6 109.2 86 391 412.9 858.2 771.3 467.2 358.4 0.72179 0.44662
38 Van 377.9 377.9 62.7 47.6 267.9 292.5 486 461.1 218.1 168.6 −0.1035 0.04482
39 Afyon 408.6 407.7 89.5 69 239 259.3 618 576.6 379 317.3 0.14329 −0.0192
40 Kayseri – Erkilet 364.8 365.2 60 45.3 263 278.6 535 475.3 272 196.7 0.75670 0.2949
41 Isparta 557.6 557.8 160 122.7 332 371.4 968 891.5 636 520.1 0.70164 0.85167
42 Konya 326.1 325 77.6 63.7 193 225 545 483.7 352 258.7 0.95732 0.66484
43 Muğla 1,165 1,164 298.3 229.3 658 767.4 1,805 1,672 1,147 904.6 0.53785 0.44423
44 Samsun 693.7 694.2 129.4 103.2 442.2 503.5 1,011 955 568.8 451.5 0.45975 0.72549
45 Antalya 1,074 1,072 293.4 228.3 554 608.7 1,914 1,747 1,360 1,138 0.29616 0.28473
46 Giresun 1,240 1,244 153.2 125.5 1,057 1,090 1,679 1,668 622 578 1.17704 1.63857
47 Kastamonu 457.6 455 82.2 67.3 308 312.1 617 574.9 309 262.8 0.17054 −0.1840
48 Sivas 411.8 410.1 79.2 56.4 286 298.8 575 519.8 289 221 0.03905 0.09282
49 Erzincan 368.8 367.6 71.7 56.8 257 277.9 563 504.1 306 226.2 0.86580 0.62331
50 Sarıyer – İstanbul 796.6 794.3 150.8 130.6 574.9 597.8 1,171 1,102 596.1 504.2 0.91685 0.72847
51 Iğdır 251.9 253.2 77.3 55.5 114.5 154.9 501.2 422.9 386.7 268 1.0145 1.00863
52 Malatya – Erhav 402.7 403.1 100 72.1 248 292.1 593 533.9 345 241.8 0.42067 0.33824
Fig. 3.17 Observed and estimated annual rainfall values at selected stations in Turkey for 1985
physical understanding and on all the previous knowledge about the process prior
to tk-1 . If little historical information is available to specify the above matrices, then
KF may be started with very little objective information and adapted as the data
become available. However, the less the initial information, greater diagonal ele-
ments should be selected in the covariance matrices. In this manner, the algorithm
will have flexibility to adjust itself to sensible values in a relatively short space of
time.
The average amount of rainfall values at the selected stations are used as the ele-
ments of initial state vector. Sufficiently great diagonal elements of error covariance
matrix are needed with initial state vector provided in the initial moment. Then, the
prediction error covariance steadily decreases with time and arrives at a stable value
after some steps, indicating the efficiency of the prediction algorithm.
After the initial-state descriptions are read as first step, then the Kalman gain
matrix for the one-step prediction can be computed, with necessary assumptions, as
the connection matrix is unity, i.e., all stations are reporting their observations. The
diagonal elements of the measurement noise covariance matrix are taken smaller than those of the system covariance matrix, because the observed values are relatively
noise-free compared with the errors which result from the system.
Initially, when the model parameters are only rough estimates, with little objec-
tive information, the Kalman gain matrix ensures that the measurement data are
highly influential in estimating the state parameters. Then, as confidence in the accu-
racy of the parameters grows with each iteration, the gain matrix values decrease,
causing the influence of the measurement data in updating the parameters and the associated error to reduce.
Fig. 3.18 Contour maps of observed (solid lines) and estimated (dashed lines) annual precipitation for 1960, 1965, 1970, 1975, 1980, and 1985 (distances in km)
Fig. 3.19 Contour maps of the percentage error of estimated annual rainfall for 1960, 1965, 1970, 1975, 1980, and 1985 (distances in km)
[Fig. 3.20 Standard deviations of observed and estimated annual rainfall versus time (1955–1985)]
KF provides an efficient method for modeling of annual rainfall in both time and
space dimensions.
The estimated values of annual rainfall at most of the sites in the study area are close to the observed values, especially in the parts where more stations are available, such as in the northwestern part of Turkey. The percentage errors of the estimated values vary from −6 (underestimation) to 6 (overestimation), with an overall average of about 0.121%.
References
Barnes, S. L., 1964. A technique for maximizing details in numerical weather map analysis. J.
Appl. Meteor. 3, 396–409.
Benjamin, J. R., and Cornell, C. A., 1970. Probability Statistics and Decision Making in Civil
Engineering. McGraw-Hill Book Inc., New York.
Bergthorsson, P., and Döös, B. R., 1955. Numerical weather map analysis. Tellus 7, 329–340.
Brown, R. G., and Hwang, P. Y. C., 1992. Introduction to Random Signals and Applied Kalman Filtering (Second edition). John Wiley & Sons, New York.
Cressman, G. P., 1959. An operational objective analysis system. Mon. Wea. Rev. 87, 367–374.
Davis, J. C., 2002. Statistics and Data Analysis in Geology. John Wiley and Sons, New York, 638 pp.
Dee, D. P., 1991. Simplification of Kalman filter for meteorological data assimilation. Q. J. R.
Meteorol. Soc. 117, 365–384.
Donnelly, K. P., 1978. Simulations to determine the variance and edge effect of total nearest neigh-
bor distance. In: Hodder, I. (Ed.), Simulation Studies in Archaeology. Cambridge University
Press, Cambridge, UK, pp. 91–95.
Gandin, L. S., 1963. Objective Analysis of Meteorological Fields. Hydromet Press, Leningrad,
242 pp.
Gilchrist, B., and Cressman, G. P., 1954. An experiment in objective analysis. Tellus 6, 309–318.
Goodin, W. R., McRea, G. J., and Seinfeld, J. H., 1979. A comparison of interpolation methods for
sparse data: application to wind and concentration fields. J. Appl. Meteor. 18, 761–771.
Harrison, P. J., and Stevens, C. F., 1975. Bayesian forecasting. University of Warwick, Working
Paper No. 13.
Kalman, R. E., 1960. A new approach to linear filtering and prediction problems. Trans ASME,
Ser. D, J. Basic Eng. 82, 35–45.
Kalman, R. E., and Bucy, R., 1961. New results in linear filtering and prediction theory. Trans
ASME, Ser. D, J. Basic Eng. 83, 95–108.
Koch, S., and Link, R. F., 1971. Statistical Analysis of Geological Data. Dover Publications, New
York, 375 pp.
Koch, S. E., DesJardins, M., and Kocin, P. J., 1983. An iterative Barnes objective map analysis
scheme for use with satellite and conventional data. J. Appl. Meteor. 22, 1487–1503.
Latif, A. M., 1999. A Kalman filter approach to multisite precipitation modeling in meteorology.
Unpublished Ph.D. thesis, Istanbul Technical University, 125 pp.
Matheron, G., 1963. Principles of geostatistics. Econ. Geol. 58, 1246–1266.
Matheron, G., 1965. Les Variables Régionalisées et leur Estimation. Masson, Paris, 306 pp.
Sasaki, Y., 1960. An objective analysis for determining initial conditions for the primitive equa-
tions. Tech. Rep. 60-16T.
Şen, Z., 1989. Cumulative semivariogram models of regionalized variables. Math. Geol. 21,
891–903.
Şen, Z., 2004. Fuzzy Logic and System Models in Water Sciences. Turkish Water Foundation
Publication, Istanbul, Turkey, 315 pp.
Zadeh, L. A., 1968. Fuzzy algorithms. Inform. Control 12, 94–102.
Chapter 4
Spatial Dependence Measures
4.1 General
Uncertainty within earth sciences data and geologic maps is rarely analyzed or dis-
cussed. This is undoubtedly due, in part, to the difficulty of analyzing large sets of
paper files. The increasing use of computers for storing, retrieving, and serving large data sets has made it easier to analyze these data systematically and to prepare maps and models from them.
General error theory can be a useful method for characterizing the uncertainty
within spatial data. For most data, errors can be due to inaccuracies in record keep-
ing, description, identification of behavior and location, generalization, and corre-
lation. While analysis by error theory can be employed to evaluate the uncertainty
within the data used in mapping, other methods can be used to more fully evalu-
ate the uncertainty within specific maps. As already mentioned in Chapter 2, the
area of influence method was developed to use three map characteristics (longi-
tude, latitude, elevation) for estimating uncertainty within the maps of earth sci-
ences. This method uses the spatial distribution of data points, the probability of
mis-identification of the targeted unit, and the size of the targeted geologic feature
to calculate the probability that additional, unidentified targets can exist. Insight
gained from the use of error theory and the area of influence methods can be used
to describe the uncertainty included in spatial maps.
Earth scientists need maps, even ones based on few samples, for far-reaching decisions in executing certain projects. It is, therefore, necessary for them to be able to visualize the subject preliminarily through a limited amount of data, which can then be extended in more effective directions as additional data arrive. Although ready-made mapping software exists, if the user is not familiar with the basic principles of the implemented methodology, the conclusions may be biased at the very least. At the early stages of any study, maps are needed to convey spatial relationships efficiently without any involved mathematical formulation. In
general, maps relate three variables (triplets, as mentioned in Chapter 2); most of the time two of them are the longitude and latitude, or geographical coordinates, of the sample points, and therefore the maps show the spatial variability of a single variable.
However, in this chapter imaginary maps of three different earth science vari-
ables are also presented for better logical arguments, interpretations, and conclu-
sions. Maps help to perceive large-scale spatial relationships easily on a small piece
of paper.
Maps are based on point data measurements, the distances between the points, and the density of points. Since most natural phenomena are continuous, maps are representations of a finite number of measurement sites and their continuous surface expressions. Hence, the more numerous and the better scattered the measurement points within the study area, the better the map represents the natural phenomenon. The most common examples are topographic maps with continuous contour lines of elevation, which are derived from a set of discrete location surveys. In the drawing of maps, not only the measurements but also the artistic talent and skills of the expert are taken into consideration. Hence, subjective biases entered the domain of mapping in many early maps, but more objective mapping methodologies have recently been developed and objectivity has been enhanced.
The reliability of contour maps is directly dependent on the total density of the
sampling points as well as on their uniform distribution (Chapter 3). However, in practice uniformity of the sampling points is seldom encountered, and maps are prepared in spite of this shortcoming. For instance, numerical, statistical, or prob-
abilistic versions of weather prediction methods are based on the data available
from irregularly distributed sites within synoptic regions. The very success of such
4.2 Isotropy, Anisotropy, and Homogeneity
Usually, points closer to the grid node are given more weight than points farther
from the grid node. If, as in the example above, the points in one direction have
more similarity than the points in another direction, it is advantageous to give points
in a specific direction more weight in determining the value of a grid node. The rel-
ative weighting is defined by the anisotropy ratio. The underlying physical process
producing the data as well as the sample spacing of the data are important in the
decision of whether or not to reset the default anisotropy settings.
Anisotropy is also useful when data sets have fundamentally different units along
different dimensions. For example, consider plotting a flood profile along a river.
The x coordinates are locations, measured in km along the river channel. The t coor-
dinates are time, measured in days. The Z(x, t) values are river depth as a function of
location and time. Clearly in this case, the x and t coordinates would not be plotted
on a common scale, because one is distance and the other is time (Fig. 4.1). One
unit of x does not equal one unit of t. While the resulting map can be displayed with
changes in scaling, it may be necessary to apply anisotropy as well.
Another example of anisotropy might be employed for an isotherm map (equal
temperature lines, contour map) of average daily temperature over a region.
Although the X and Y coordinates (Easting, say X, and Northing, say Y) are measured in the same units, the temperature tends to be very similar along east–west lines (X lines), whereas along north–south lines (Y lines) it tends to change more quickly (getting colder as one heads towards the north) (see Fig. 4.2). In this case, in gridding the
data, it would be advantageous to give more weights to data along the east–west
axis than along the north–south axis. When interpolating a grid node, observations
that lie in an east–west direction are given greater weight than observations lying at an
equivalent distance in the north–south direction.
In the most general case, anisotropy can be visualized as an ellipse. The ellipse
is specified by the lengths of its two orthogonal axes (major and minor) and by
an orientation angle, θ. The orientation angle is defined as the counterclockwise angle between the positive X axis and, for instance, the minor axis (see Fig. 4.3). Since the
ellipse is defined in this manner, an ellipse can be defined with more than one set of
parameters.
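To make this convention concrete, the following minimal Python sketch (all names are hypothetical, and it assumes the orientation angle is measured counterclockwise from the positive X axis to the major axis) converts a geographic separation into the effective anisotropic distance that a gridding routine could pass to its weighting function.

```python
import math

def effective_distance(dx, dy, ratio=2.0, theta_deg=0.0):
    """Anisotropic (effective) separation between two points.

    The separation vector (dx, dy) is rotated into the ellipse's
    principal axes; theta_deg is taken here as the counterclockwise
    angle from the positive X axis to the major axis. The minor-axis
    component is then stretched by the anisotropy ratio (major/minor),
    so points along the major axis appear "closer" and receive more
    weight in gridding.
    """
    t = math.radians(theta_deg)
    along = dx * math.cos(t) + dy * math.sin(t)    # major-axis component
    across = -dx * math.sin(t) + dy * math.cos(t)  # minor-axis component
    return math.hypot(along, ratio * across)

# A 100-km east-west separation versus a 100-km north-south one, with an
# east-west major axis and a mild ratio of 2 (cf. the isotherm example):
print(effective_distance(100.0, 0.0))  # 100.0: favored direction
print(effective_distance(0.0, 100.0))  # 200.0: down-weighted direction
```

With such a measure, the east–west neighbors of the isotherm example above automatically receive the larger weights.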
For most of the gridding methods, the relative lengths of the axes are more impor-
tant than the actual lengths of the axes. The relative lengths are expressed as a ratio in the anisotropy group. The ratio is defined as the major axis divided by the minor axis; if it is equal to 1, the ellipse takes the form of a circle. The angle is the counterclockwise angle between the positive X axis and the minor axis. A small diagram in the anisotropy group displays the ellipse graphically.
An anisotropy ratio less than 2 is considered mild, while an anisotropy ratio greater
than 4 is considered severe. Typically, when the anisotropy ratio is greater than 3,
its effect is clearly visible on grid-based maps. The angle is the preferred orientation
(direction) of the major axis in degrees.
An example where an anisotropy ratio is appropriate is an oceanographic survey
to determine water temperature at varying depths. Assume the data are collected
every 1,000 m along a survey line, and temperatures are taken every 10 m in depth
at each sample location. With this type of data set in mind, consider the problem of
creating a grid file. When computing the weights to assign to the data points, closer data points get greater weights than points farther away. A temperature at 10 m depth at one location is likely to resemble the temperature at the same depth at the neighboring location 1,000 m away along the survey line far more than the temperature only a few tens of meters deeper; a large anisotropy ratio that stretches distance in the depth direction is therefore appropriate.
4.3 Spatial Dependence Function
The first step is referred to as the objective analysis and the second one is the spa-
tial modeling phase. Certainly, a sound objective analysis is the primary prerequisite
of successful modeling. For instance, meteorologists strive for effective interpola-
tion in order to enhance their mesoscale analysis and forecasts. Objective analysis
studies of meteorological variables started with the work by Panofsky (1949). He
attempted to produce contour lines of upper-wind movements by fitting third-order
polynomials by the least squares method to the observations at irregular sites. The least squares method leads to predicted field variables that depend strongly on the distribution of data points when a suitable polynomial is fitted to the full grid. Optimum analysis procedures were introduced to meteorology by Eliassen
(1954) and Gandin (1963). These techniques employ historical data about the struc-
ture of the atmosphere to determine the weights to be applied to the observations.
Here, the implied assumption is that the observations are spatially correlated. Con-
sequently, observations that are close to each other are highly correlated; hence, as
the observations get farther apart the spatial dependence decreases. It is a logical consequence to expect a regional dependence function as in Fig. 4.4, assuming that
at zero distance the dependence is equal to 1 and then onward there is a continuous
decrease or decreasing fluctuations depending on the ReV behavior.
In this figure there are three spatial dependence functions (SDFs), labeled A, B, and C. Logically, A and B indicate rather homogeneous and isotropic regional behavior of the ReV, whereas C has local differences at various distances. However, all of them
decrease down to zero SDF value. The distance between the origin and the point
where the SDF is almost equal to zero shows the radius of influence as R1 or R2 .
Provided that the ReV behavior is isotropic (independent of direction), the area of influence can be taken as a circle around each station, with radius equal to the radius of influence. These are subjective and expert views about the spatial depen-
dence structure of any ReV. Their objective counterparts can be obtained from a
[Fig. 4.4 Spatial dependence functions A, B, and C versus distance, with radii of influence R1 and R2]
set of spatial data, as will be explained later in this chapter. The spatial predictions are then made by considering a spatial model with a domain bounded by the radius of influence. For instance, Gilchrist and Cressman (1954) reduced the domain of polyno-
mial fitting to small areas surrounding each node with a parabola. Bergthorsson and
Döös (1955) proposed the basis of successive correction methods, which do not rely only on interpolation to obtain grid point values; instead, a preliminary guess
field is initially specified at the grid points (Chapter 5). Cressman (1959) developed
a number of further correction versions based on reported data falling within a spec-
ified distance R from each grid point. The value of R is decreased with successive
scans (1,500, 750, 500 km, etc.) and the resulting field of the latest scan is taken
as the new approximation. Barnes (1964) summarized the development of a conver-
gent weighted-averaging analysis scheme, which can be used to obtain any desired
amount of detail in the analysis of a set of randomly spaced data. The scheme is
based on the supposition that the two-dimensional distribution of a ReV can be rep-
resented by the summation of an infinite number of independent waves, i.e., Fourier
integral representation. A comparison of existing objective methods up to 1979 for
sparse data is provided by Goodin et al. (1979). Their study indicated that fitting a second-degree polynomial to each triangular sub-region in the plane, with each data point weighted according to its distance from the sub-region, provides a compromise
between accuracy and computational cost. Koch et al. (1983) presented an extension
of the Barnes method, which is designed for an interactive computer scheme. Such
a scheme allows real-time assessment both of the quality of the resulting analyses
and of the impact of satellite-derived data upon various earth sciences data sets.
However, all of the aforementioned objective methods have the following common
drawbacks.
1) They are rather mechanical without any physical foundation, but rely on the
regional configuration of irregular sites. Any change in site configuration leads
to different results although the same ReV is sampled.
2) They do not take into consideration the spatial covariance or correlation struc-
ture within the ReV concerned.
3) They have constant radius of influence without any directional variations.
Hence, spatial anisotropy of observed fields is ignored. Although some
anisotropic distance function formulations have been proposed by Inman (1970) and
Shenfield and Bayer (1974), all of them were developed with no explicit quantitative
reference to the anisotropy of observed field structure of the ReV.
According to Thiebaux and Pedder’s (1987) assessment of the work done by
Bergthorsson and Döös, “the most obvious disadvantage of simple inverse distance-
weighing schemes is that they fail to take into account the spatial distribution of
observations relative to each other.” Two observations equidistant from a grid point are given the same weight regardless of the relative values at the measurement sites.
This may lead to large operational biases in grid point data when some observations
are much closer together than others within the area of influence.
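The weight-generating functions behind these schemes are simple enough to state directly; the sketch below implements a Cressman (1959)-type weight and a Barnes (1964)-type Gaussian weight in Python. The function names and the numerical values are illustrative assumptions, not notation from the original papers.

```python
import numpy as np

def cressman_weight(d, R):
    """Cressman-type weight (R^2 - d^2)/(R^2 + d^2) within the radius of
    influence R, and zero beyond it."""
    d = np.asarray(d, dtype=float)
    w = (R**2 - d**2) / (R**2 + d**2)
    return np.where(d < R, w, 0.0)

def barnes_weight(d, kappa):
    """Barnes-type Gaussian weight exp(-d^2/kappa); kappa fixes the
    smoothing length scale."""
    d = np.asarray(d, dtype=float)
    return np.exp(-(d**2) / kappa)

# Weights for observations 0-1500 km from a grid node; successive scans
# would shrink R (e.g., 1500, 750, 500 km) to recover finer detail.
d = np.array([0.0, 250.0, 500.0, 1000.0, 1500.0])
print(cressman_weight(d, R=1500.0).round(3))
print(barnes_weight(d, kappa=500.0**2).round(3))
```

Both weights depend on distance alone, which is precisely the drawback criticized above: two equidistant observations always receive identical weights.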
Especially after the 1980s, many researchers concentrated on the spatial covariance and correlation structures of ReVs. Lorenc (1981) developed a
methodology whereby, first of all, the grid points in a sub-region are analyzed simul-
taneously using the same set of observations and then sub-areas are combined to
produce the whole study area analysis. Some papers are concerned with the deter-
mination of unknown parameters of the other covariance functions or SDFs, which
provide the required weightings for ReV data assimilation. Along this line, the idea proposed by Bratseth (1986) depends on the incorporation of the ReV covariances into the objective analysis. His analysis caused a resurgence of the succes-
sive correction method in which the optimal analysis solution is approached. His
method uses the correlation function for the forecast errors to derive weights that
are reduced in regions of higher data density. Later, Sashegyi et al. (1993) employed
his methodology for the numerical analysis of the data collected during the Genesis of Atlantic Lows Experiment (GALE). Practical conclusions of Bratseth’s approach
have been reported by Franke (1988) and Seaman (1988).
On the other hand, Buzzi et al. (1991) described a simple and economic method
for reducing the errors that can result from the irregular distribution of data points
during an objective analysis. They have demonstrated that a simple iterative method
for correcting the analysis generated by an isotropic distance-weighting scheme
applied to a heterogeneous spatial distribution of observations cannot improve anal-
ysis accuracy, but also results in an actual frequency response that approximates
closely the theoretical response of the predicted weight-generating function. They
have shown that in the case of heterogeneous spatial sampling, a Barnes analysis
could produce an unrealistic interpolation of the sampled field even when this is
reasonably well resolved by error-free observations. Iteration of a single correction
algorithm led to the method of successive correction (Daley, 1991). The method of
successive correction has been applied as a means to tune adaptively the a posteriori
weights. Objective analysis schemes are practical attempts to minimize the estimation variance (Thiebaux and Pedder, 1987).
Pedder (1993) provided a suitable formulation of the successive correction scheme based on multiple iterations with a constant influence scale, which provides a more effective approach for estimating ReVs from scattered observations than the more conventional Barnes method, which usually involves varying the influence scale between iterations. Recently, Dee (1995) presented a simple scheme for online esti-
mation of covariance parameters in statistical data assimilation systems. The basis of
the methodology is a maximum-likelihood approach in which estimates are obtained
through a single batch of simultaneous observations. Simple and adaptive Kalman
filtering techniques are used for explicit calculation of forecast error covariance
(Chapter 3). However, the computational cost of the scheme is rather high.
Field measurements of ReVs such as ore grades, chemical constitutions in ground
water, fracture spacing, porosity, permeability, aquifer thickness, dip and strike of
a structure are dependent on the relative positions of measurement points within
the study area. Measurements of a given variable at a set of points provide some
insight into the spatial variability. This variability determines the ReV behavior as
well as its predictability. In general, the larger the variability, the more heteroge-
neous is the ReV environment and as a result the number of measurements required
to model, simulate, estimate, and predict the ReV is expected to be large. Large
variability implies also that the degree of dependence might be rather small even
for data whose locations are close to each other. A logical interpretation of such a
situation may be that either the region was subjected to natural phenomena, such as
tectonics, volcanism, deposition, erosion, recharge, or climate change, or later to human activities such as pollution, groundwater abstraction, and mining.
However, many types of ReVs are known to be spatially related in that the closer
their positions, the greater is their dependence. For instance, spatial dependence is
especially pronounced in hydrogeological data due to groundwater flow as a result of
the hydrological cycle, which homogenizes the distribution of chemical constituents
within the heterogeneous mineral distribution in geological formations.
The factors of ReVs are sampled at irregular measurement points within an area
at regular or irregular time intervals. No doubt, these factors show continuous variations with respect to other variables such as temperature and distance. Furthermore,
temporal and spatial ReV evolutions are controlled by temporal and spatial correla-
tion structures within the ReV itself. As long as the factors are sampled at regular
time intervals, the whole theory of time series is sufficient in their temporal mod-
eling, simulation, and prediction. The problem is with their spatial constructions
and the transfer of information available at irregular sites to regular grid nodes or
to any desired point. Provided that the structure of spatial dependence of the ReV
concerned is depicted effectively, then any future study such as the numerical pre-
dictions based on these sites will be successful. In order to achieve such a task it is
necessary and sufficient to derive the change of spatial correlation for the ReV data
with distance.
In order to quantify the degree of variability within spatial data, variance tech-
niques can be used in addition to classical autocorrelation methods (Box and Jenk-
ins, 1976). However, these methods are not directly helpful in accounting for the spatial dependence or for the variability in terms of sample positions. The drawbacks
are due to either non-normal (asymmetric) distribution of data and/or irregularity
of sampling positions. However, the semivariogram (SV) technique, developed by
Matheron (1963, 1971) and used by many researchers (Clark, 1979; Cooley, 1979;
David, 1977; Myers et al., 1982; Journel, 1985; Aboufirassi and Marino, 1984;
Hoeksema and Kitanidis, 1984; Carr et al., 1985) in diverse fields such as geology,
mining, hydrology, earthquake prediction, groundwater, can be used to character-
ize spatial variability and hence the SDF. The SV is a prerequisite for best linear
unbiased prediction of ReVs through the use of Kriging techniques (Krige, 1982;
Journel and Huijbregts, 1978; David, 1977).
By definition, the spatial correlation function (SCF), ρij, between stations i and j takes values between −1 and +1 and can be calculated from the available historical data as

\rho_{ij} = \frac{\overline{(Z_{oi} - Z_i)(Z_{oj} - Z_j)}}{\sqrt{\overline{(Z_{oi} - Z_i)^2}\;\overline{(Z_{oj} - Z_j)^2}}}, \qquad (4.1)
where over bars indicate time averages over a long sequence of past observations,
Zoi and Zoj represent observed precipitation amounts at these stations, and finally, Zi
and Zj are the climatological mean precipitations. Furthermore, ρij is associated with the horizontal distance Dij between stations i and j. Consequently, if there are n stations, then there will be m = n(n−1)/2 pairs of distances and the cor-
responding correlation coefficients. Their plot results in a scatter diagram that indi-
cates the SCF pattern for the regional rainfall amounts considered as a random field.
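This pairwise computation is easy to automate. The hedged Python sketch below (the function name and the synthetic data are hypothetical) forms the m = n(n − 1)/2 interstation distances and the matching correlation coefficients of Eq. (4.1), whose scatter against distance is the empirical SCF.

```python
import numpy as np

def scf_scatter(coords, records):
    """Empirical SCF scatter for n stations.

    coords  : (n, 2) station coordinates (e.g., km easting/northing).
    records : (n, T) array; row i is the record observed at station i.
    Returns the n(n-1)/2 interstation distances and the corresponding
    product-moment correlation coefficients of Eq. (4.1).
    """
    coords = np.asarray(coords, float)
    rho = np.corrcoef(np.asarray(records, float))  # time-averaged correlations
    dists, corrs = [], []
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            dists.append(np.hypot(*(coords[i] - coords[j])))
            corrs.append(rho[i, j])
    return np.array(dists), np.array(corrs)

# Synthetic illustration: 4 stations, 120 months of records -> 6 pairs.
rng = np.random.default_rng(1)
coords = rng.uniform(0.0, 1600.0, size=(4, 2))
records = rng.normal(size=(4, 120))
d, r = scf_scatter(coords, records)
print(d.round(1), r.round(2))
```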
Figure 4.5 presents such scatter diagrams of empirical SCFs concerning monthly
rainfall amounts (Şen and Habib, 2001). At a first glance, it is obvious from this fig-
ure that there are great scatters at any given distance in the correlation coefficients,
and unfortunately, one cannot easily identify a functional trend. The scatter can be
averaged out by computing mean correlation coefficient over relatively short dis-
tance intervals (Thiebaux and Pedder, 1987). The following significant points can
be deduced from these SCFs.
[Fig. 4.5 Scatter diagrams of empirical SCFs for monthly rainfall amounts: correlation versus distance (0–1600 km), one panel per month]
4) The correlation function measures the variation around the arithmetic average
values of the measurements at individual sites. However, in the spatial variability
calculations, a measure of relative variability between two sites is necessary.
5) Especially for the last two points, the SV (Matheron, 1963) and cumulative SV (CSV) (Şen, 1989) concepts were developed, and their modifications such as the point
CSV (PCSV) are presented and used for the regional assessment of earth sci-
ences data.
For instance, Barros and Estevan (1983) presented a method for evaluating wind
power potential from a 3-month long wind record at a site and data from a regional
network of wind systems. Their key assumption was that “wind speed has some
degree of spatial correlation,” which is a logical conclusion, but they failed to
present an effective method for the objective calculation of the spatial variability,
except by employing cross- and auto-correlation techniques. Their statement does
not provide an objective measure of spatial correlation. Skibin (1984) raised the
following questions.
1) What is “a reasonable spatial correlation?” Are the correlation coefficients
between the weekly averages of wind speed a good measure of it? Answers
to these questions are necessary by any objective method. For instance, PCSV
technique can be employed for this purpose.
2) Do the weekly averages represent the actual ones?
3) How applicable are the results obtained by the use of spatial correlation coefficients to the siting of wind generators?
In deciding about the effectiveness of the wind speed measurement at a site
around its vicinity, the topographic and climatic conditions must be taken into con-
sideration. The smaller the area of influence, the more homogeneous the orographic, weather, and climatologic features are and, consequently, the simpler is the model. However, large areas more than 1,000 km in radius around any site may contain different climates with different troughs and ridges, and high and low pressure areas
with varying intensities. Furthermore, in heterogeneous regions with varying surface
properties (such as land–sea–lake–river interfaces) and variable roughness param-
eters, the local wind profile and wind potential can be affected significantly. The
wind energy potential and the groundwater availability are highly sensitive to height
variations of hills, valleys, and plains (Şen, 1995). The reasons for wind speed vari-
ations are not only of orographic origin but also of different flow regimes (i.e., anabatic–katabatic influences compared with hilltop conditions, upwind compared
with leeside sites, flow separation effects). All these effects will lose their influence
farther away from the siting point. It can be expected that a smaller distance from the site corresponds to a larger correlation. Here, again, it is obvious that the spa-
tial dependence decreases with distance as in Fig. 4.5 (correlation property). Barros
and Estevan (1983) noticed that a small region had higher correlation coefficients
between the sites. On the contrary, the spatial independence increases with the dis-
tance (SV property).
Barchet and Davis (1983) have stated that better estimates are obtained when
the radius of influence is about 200 km from the site. However, this information is
4.5 Semivariogram Regional Dependence Measure
4.5.1 SV Philosophy
The very basic definition of the SV says that it is the half squared-difference variation of the ReV with distance. ReV theory does not use the autocorrelation, but instead
uses a related property called the SV to express the degree of relationship between
measurement points in a region. The SV is defined simply as half-square (variance)
of the differences between all possible point pairs spaced a constant distance, d,
apart. The SV at a distance d = 0 should be zero, because there are no differences
(variance) between points that are compared to themselves. The magnitude of the
SV between points depends on the distance between the points. The smaller the
distance the smaller is the SV, and at larger distances SV value is larger. The SV
is a practical measure of average spatial changes. The underlying principle is that,
on the average, two observations closer together are more similar than two obser-
vations farther apart. This is a general statement where the directional changes are
not considered. The plot of the half squared-difference values as a function of separation distance is referred to as the SV. As points are compared to increasingly distant points, the SV increases.
The simplest and most common form of ReV is a triplet, and therefore it is illuminating first to consider the surface in 3D and then, according to the SV definition, to infer the corresponding SV shape intuitively by a mental experiment.
1) Continuously deterministic uniform spatial data: If the ReV is a deterministic
horizontal surface of homogeneous, isotropic, and uniform data as in Fig. 4.7, then the average half-squared difference of such data is zero at every distance, as in Fig. 4.8.
2) Discontinuously deterministic partially uniform spatial data: The continuity in
Fig. 4.7 is disrupted by a discontinuous feature (cliff, fault, facies change,
boundary, etc.) as in Fig. 4.9.
[Figs. 4.7 and 4.9: sketches of a uniform surface Z(x, y) and of a discontinuous surface with upper and lower levels ZH(x, y) and ZL(x, y)]
The resulting SV is expected to take the shape in Fig. 4.10, where there is a non-zero value at the origin. Such a jump at the origin indicates discontinuity in the ReV.
[Fig. 4.10 SV with a non-zero (nugget) value equal to [ZH(x, y) − ZL(x, y)]² at the origin, with related surface and SV sketches]
The SV in this spatial random event case is equivalent to the expectation of Eq. (4.2), which, after expansion and application of the expectation operator E(.) to both sides, leads to

E[\gamma(d)] = E\!\left[ Z_H^2(x, y) \right] - 2 E\!\left[ Z_H(x, y) Z_L(x, y) \right] + E\!\left[ Z_L^2(x, y) \right].

Since the ReV is assumed spatially independent with zero mean (expectation), the second term of this expression is equal to zero and the other terms are each equal to the variance, \sigma^2, of the spatial event. Finally, this last expression yields E[\gamma(d)] = 2\sigma^2.
In order to have the SV expectation equal to the variance in practical applications,
it is defined as the half-square difference instead of squared-difference as in Eq.
(4.2). Consequently, the SV of an independent ReV appears as having a sill value
similar to Fig. 4.10, but this time the sill value is equal to the spatial variance of
the ReV.
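This expectation argument can be verified numerically. The short sketch below (a synthetic illustration, not data from the text) computes the half mean squared difference of a spatially independent series and shows that it stays near the variance σ² at every lag.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(0.0, 2.0, size=100_000)  # independent ReV, sigma^2 = 4

def sv_at_lag(z, lag):
    """Half the mean squared difference of all pairs `lag` apart."""
    diffs = z[lag:] - z[:-lag]
    return 0.5 * np.mean(diffs**2)

for lag in (1, 5, 50):
    print(lag, round(sv_at_lag(z, lag), 3))  # each close to sigma^2 = 4
print("variance:", round(z.var(), 3))
```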
4.5.2 SV Definition
The SV is the basic geostatistical tool for visualizing, interpreting, modeling, and
exploiting the regional dependence in an ReV. It is well known that even though the
measurement sites are irregularly distributed, one can find central statistical param-
eters such as mean, median, mode, variance, skewness, but they do not yield any
detailed information about the phenomenon concerned. The greater the variance the
greater is the variability, but unfortunately this is a global interpretation without
detailed useful information. The structural variability in any phenomenon within an
area can best be measured by comparing the relative change between two sites. For
instance, if any two sites, a distance d apart, have measured concentration values Zi and Zi+d, then the relative variability can simply be written as (Zi − Zi+d). How-
ever, similar to Taylor (1915) theory concerning turbulence, the squared-difference,
(Zi − Zi+d)², represents this relative change in the best possible way. This squared difference first appeared in the Russian literature as the “structure function” of the ReV. It subsumes the assumption that the smaller the distance, d, the smaller
will be the structure function. Large variability implies that the degree of depen-
dence among earth sciences records might be rather small even for sites close to
each other.
In order to quantify the degree of spatial variability, variance and correlation tech-
niques have been frequently used in the literature. However, these methods cannot
account correctly for the spatial dependence due to either non-normal pdfs and/or
irregularity of sampling positions.
The classical SV technique was proposed by Matheron (1963) to eliminate the aforementioned drawbacks. Mathematically, it is defined as a version of Eq. (4.2) by considering all of the available sites within the study area as (Matheron, 1963; Clark, 1979)

\gamma(d) = \frac{1}{2 n_d} \sum_{k=1}^{n_d} (Z_i - Z_{i+d})^2, \qquad (4.3)
where n_d is the number of data pairs a distance d apart and k is the pair counter. This can be expanded by considering the regional arithmetic average, \bar{Z}, of the ReV as follows:

\gamma(d) = \frac{1}{2 n_d} \sum_{k=1}^{n_d} \left[ \left( Z_i - \bar{Z} \right) - \left( Z_{i+d} - \bar{Z} \right) \right]^2
          = \frac{1}{2 n_d} \sum_{k=1}^{n_d} \left[ \left( Z_i - \bar{Z} \right)^2 - 2 \left( Z_i - \bar{Z} \right)\left( Z_{i+d} - \bar{Z} \right) + \left( Z_{i+d} - \bar{Z} \right)^2 \right].
The elegance of this formulation is that the ReV pdf is not important in obtaining the SV; furthermore, it is effective for regularly spaced data points. It is to be recalled, herein,
that the classical variogram, autocorrelation, and autorun techniques (Şen, 1978)
all require equally spaced data values. Due to the irregularly spaced point sources,
the use of classical techniques is highly questionable, except that these techniques
might provide biased approximate results only. The SV technique, although suitable
for irregularly spaced data, has practical difficulties as summarized by Şen (1989).
Among such difficulties is the grouping of distance data into classes of equal or
variable lengths for SV construction, but the result appears in an inconsistent pattern
and does not have the non-decreasing form expected in theory. As the name implies, the SV, γ(d), is a measure of the spatial dependence of an ReV.
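A minimal Python sketch of this computation for irregularly scattered sites is given below; it groups the pair distances into classes of equal width, which is one of the subjective choices discussed in this section, and drops classes with too few pairs. All names, the bin width, and the synthetic data are illustrative assumptions.

```python
import numpy as np

def sample_sv(coords, z, bin_width, min_pairs=30):
    """Sample SV of Eq. (4.3) from irregularly spaced sites.

    All n(n-1)/2 pairs are formed, their separations are grouped into
    distance classes of width `bin_width`, and half the mean squared
    difference is reported at each class midpoint. Classes with fewer
    than `min_pairs` pairs are dropped as unreliable.
    """
    coords = np.asarray(coords, float)
    z = np.asarray(z, float)
    iu, ju = np.triu_indices(len(z), k=1)
    d = np.hypot(*(coords[iu] - coords[ju]).T)
    hsq = 0.5 * (z[iu] - z[ju]) ** 2  # half-squared differences
    edges = np.arange(0.0, d.max() + bin_width, bin_width)
    mids, gammas = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sel = (d >= lo) & (d < hi)
        if sel.sum() >= min_pairs:
            mids.append(0.5 * (lo + hi))
            gammas.append(hsq[sel].mean())
    return np.array(mids), np.array(gammas)

# Synthetic illustration: 200 random sites, a smooth ReV plus noise.
rng = np.random.default_rng(7)
xy = rng.uniform(0.0, 100.0, size=(200, 2))
zv = np.sin(xy[:, 0] / 15.0) + 0.2 * rng.normal(size=200)
dmid, gam = sample_sv(xy, zv, bin_width=5.0)
print(dmid[:4], gam[:4].round(3))
```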
Due to independence, any cross-multiplication of Zi and Zj will be equal to zero on average, and hence the SV is equivalent to the regional variance, σ², as explained in the previous section. Figure 4.16 shows this mental-experiment SV as a horizontal straight line. Hence, at every distance the SV is dominated by the sill value only. Expert reasoning about the SV models in the previous figures helps to elaborate some fundamental
and further points as follows.
1) If the ReV is continuous without any discontinuity, then the SV should start
from the origin, which means that at zero distance SV is also zero (Figs. 4.8 and
4.12).
2) If there is any discontinuity within the ReV, then at zero distance a non-zero
value of the SV appears as in Figs. 4.10, 4.14, and 4.16.
3) If there is an extensive spatial dependence, then the SV has increasing values at
large distances (Figs. 4.12 and 4.14).
4) When the spatial dependence is not existent, then the SV has a constant non-
zero value equal to the regional variance of the ReV at all distances as in
Fig. 4.16.
5) In the light of all that has been explained so far, it is logically and rationally obvious that, in the case of a spatial dependence structure in the ReV, the SV should start from zero at zero distance and then reach the regional variance value as a constant at large distances. The SV increases as the distance increases until, at a certain distance away from a point, it equals the variance around the average value of the ReV and will therefore no longer increase, causing a flat (stabilization) region to occur on the SV, which is called the sill (Fig. 4.17). The horizontal stabilization level of a sample SV is referred to as its sill. The distance at which the horizontal SV portion starts is named the range, R (radius of influence or dependence length), beyond which there is no spatial (regional) dependence between data points. Only within this range are locations related to each other; hence, all measurement locations in this region are the nearest neighbors that must be considered in the estimation process. This implies that the ReV has a limited areal extent over which the spatial dependence decreases or independence increases in the SV sense, as in Fig. 4.17.
The classical SV is used to quantify and model spatial correlations. It reflects
the idea that closer points have more regional dependence than distant points. In
general, spatial prediction is a methodology that embeds the spatial dependence in the model structure.
6) At some distance, called the range, the SV will become approximately equal
to the variance of the ReV itself (see Fig. 4.17). This is the greatest distance
over which the value at a point on the surface is related to the value at another
point. The range defines the maximum neighborhood over which control points
should be selected to estimate a grid node, to take advantage of the statistical
correlation among the observations. In the circumstance where the grid node
and the observations are spaced so that all distances exceed the range, Kriging
produces the same estimate as classical statistics, which is equal to the mean
value.
7) However, natural data most often have preferred orientations, and as a result ReV values may change more over the same distance in one direction than in another (Fig. 4.3). Hence, in addition to distance, the SV becomes a function of
direction (Fig. 4.18).
[Fig. 4.17 Classical global SV and its elements (nugget effect, sill, range)]
[Fig. 4.18 Classical directional SV: (a) major axis, (b) minor axis]
The last point is helpful for the identification of regional isotropy or anisotropy.
For the Kriging application the convenient composition of these parameters must be
identified through a theoretical SV. Whether a given sample SV is stationary or not
can be decided from its behavior at large distances. If the large distance portion of
the SV approaches a horizontal line, then it is stationary, which means intuitively
that there are rather small fluctuations with almost the same variance at every corner
of the region.
If the SV is generated from paired points selected just based on distance (with
no directional component), then it is called isotropic (iso means the same; tropic
refers to direction) or omni-directional. In this case, the lag-distance measure is a
scalar and the SV represents the average of all pairs of data without regard to their
orientation or direction. A standardized SV is created by dividing each SV value by the overall sample variance, which allows SVs from different data sets on the same entity to be compared directly.
On the other hand, SVs from points that are paired based on direction and dis-
tance are called anisotropic (meaning not isotropic). In this case, the lag measure is
a vector. The SVs in this case are calculated for data that are in a particular direction
as explained in Section 4.3. The regularity and continuity of the ReV of a natural
phenomenon are represented by the behavior of SV near the origin. In SV mod-
els with sill (Fig. 4.17), the horizontal distance between the origin and the end of
SV reflects the zone where the spatial dependence and the influence of one value
on the other occur, and beyond this distance the ReVs Z(x) and Z(x+d) are inde-
pendent from each other. Furthermore, SVs, which increase at least as rapidly as
d2 for large distances d indicate the presence of drift (trend), i.e., non-stationary
mathematical expectation. Plot of SV graphs for different directions gives valuable
1) Each distance lag (d) class must be represented by at least 30–50 pairs of points.
2) The SV should only be plotted out to about half the width of the sampling space
in any direction.
8) Add a cone structure with direction equal to the major direction plus 90◦ , and
model the SV results in this direction.
9) If the data are isotropic, choose the omni-directional SV as the major direction.
4.5.3 SV Limitations
The SV model mathematically specifies the spatial variability of the data set and
after its identification the spatial interpolation weights, which are applied to data
points during the grid node calculations, are direct functions of the Kriging model
(Chapter 5). In order to determine the estimation value, all measurements within the SV range are assigned weights depending on the distances of the neighboring points, using the SV. These weights and measurements are then used to calculate the esti-
mation value through Kriging modeling. Useful and definite discussions on the
practicalities and limitations of the classical SV have been given by Şen (1989) as
follows.
1) The classical SV, γ(d), for any distance, d, is defined as the half-squared-
difference of two measurements separated by this distance. As d varies from
zero to the maximum possible distance within the study area, the relationship
of the half-square-difference to the separation distance emerges as a theoretical
function, which is called the SV. The sample SV is an estimate of this theoretical
function calculated from a finite number, n, of samples. The sample SV can be
estimated reliably for small distances when the distribution of sampling points
within the region is regular. As the distance increases, the number of data pairs
for calculation of SV decreases, which implies less reliable estimation at large
distances.
2) In various disciplines of the earth sciences, the sampling positions are irregu-
larly distributed in the region, and therefore, an unbiased estimate of SV is not
possible. Some distances occur more frequently than others, and accordingly
their SV estimates are more reliable than others. Hence, a heterogeneous reli-
ability dominates the sample SV. Consequently, the sample SV may have ups
and downs even at small distances. Such a situation gives rise to inconsistencies
and/or experimental fluctuations with the classical SV models, which are, by
definition, non-decreasing functions, i.e., a continuous increase with distance is
their main property. In order to give a consistent form to the sample SV, different
researchers have used different subjective procedures.
a) Journel and Huijbregts (1978) advised grouping of data into distance classes
of equal length in order to construct a sample SV. However, the grouping
of data pairs into classes causes a smoothing of the sample SV relative to
the underlying theoretical SV. If a number of distances fall within a certain
class, then the average of half-squared-differences within this class is taken
as the representative half-squared-difference for the mid-class point. The
effect of outliers is partially damped, but not completely smoothed out by
the averaging operation.
b) To reduce the variability in the sample SV, Myers et al. (1982) grouped the
observed distances between samples into variable length classes. The class
size is determined such that a constant number of sample pairs fall in each
class. The mean values of distances and half-squared-differences are used
for the classes as a representative point of sample SV. Even this procedure
resulted in an inconsistent pattern of sample SV (Myers et al., 1982), for
some choices of the number, m, of pairs falling within each class. However,
it was observed by Myers et al. that choosing m = 1,000 gave a discernible
shape. The choice of constant number of pairs is subjective and, in addition,
the averaging procedures smooth out the variability within the experimental
SV. As a result, the sample SV provides a distorted view of the variable in
that it does not provide, for instance, higher-frequency (short wave length)
variations. However, such short-wavelength variations, if they exist, are so
small that they can be safely ignored.
The above procedures have two basic common properties, namely predetermi-
nation of a constant number of pairs or distinctive class lengths and the arithmetic
averaging procedure for half-squared-differences as well as the distances. The for-
mer needs a decision, which in most cases is subjective, whereas the latter can lead
to unrepresentative SV values. In classical statistics, only in the case of symmetrically distributed data is the mean value the best estimator; otherwise the median becomes superior. Moreover, the mean value is sensitive to outliers. The following
points are important in the interpretation of any sample SV.
1) The SV has the lowest value at the smallest lag distances (d) and increases with
distance, leveling off at the sill, which is equivalent to the overall regional vari-
ance of the available sample data. It is the total vertical scale of the SV (Nugget
effect + sum of all component scales). However, linear, logarithmic, and power
SVs do not have a sill.
2) The range is the average distance (lag) within which the samples remain spa-
tially dependent, and it corresponds to the distance at which the SV values level
off. Some SV models do not have a length parameter; e.g., the linear model has
a slope instead.
3) The nugget is the SV value at which the model appears to intercept the ordinate.
It quantifies the sampling and assaying errors and the short-scale variability (i.e.,
spatial variation that occurs at distances shorter than the sample spacing). It rep-
resents two often co-occurring sources of variability.
a) All unaccounted for spatial variability at distances smaller than the smallest
sampling distance.
b) Experimental error, often referred to as the human nugget. According to
Liebhold et al. (1993), interpretations made from SVs depend on the size
of the nugget because the difference between the nugget and the sill (if there
is one) represents the proportion of the total sample variance that can be
modeled as spatial variability.
4.6 Sample SV
In practice, one is unlikely to get SVs that look like the one shown in Fig. 4.17.
Instead, patterns such as those in Fig. 4.19 are more common.
Important practical information in the interpretation and application of any sam-
ple SV is to consider only about d/3 of the horizontal distance axis values from the
origin as reliable.
A digression is taken in this book regarding the calculation of sample SVs. Instead of easting- and northing-based SVs, it is also possible to construct SVs based on triple variables. In the following, different triplets are assessed for their SV shapes and interpretations. For instance, Fig. 4.20 shows the chloride change with respect to calcium and magnesium in 3D, and various sample SVs along different directions are presented in Fig. 4.21.
This figure indicates that the change of the chloride data with the respective independent variables (magnesium and calcium) is of clumped type without a leveling effect. It is possible to consider Fig. 4.20 as having two parts, namely an almost linear trend (drift) and fluctuations around it. In such a case a neighborhood definition
and weights assignments become impossible. Therefore, the ReV is divided into
two parts, the residual and the drift. The drift is the weighted average of points
within the neighborhood around the estimation value. The residual is the difference
between the ReV and the drift.
The residual is a stationary ReV in itself and hence allows construction of an
SV. However, once again the problem of not being able to define a neighborhood
arises. Therefore, an arbitrary neighborhood is chosen from which a drift can be
calculated. The calculation includes the points within the assumed neighborhood
[Fig. 4.19 Common forms of sample SVs: (a) random; (b) uniform; (c) clumped with leveling; (d) clumped without leveling]
[Fig. 4.20 3D variation of chloride (ppm) with calcium (ppm) and magnesium (ppm)]
and a corresponding coefficient for each point, which will be explained in more
detail in the Kriging section. The only variable left in the calculation is the SV;
however, no SVs exist from which to obtain the SV. Therefore, a reasonable SV is
assumed and compared to the resultant residual SV. If the two are the same, then
the assumptions made about the neighborhood and SV are correct, and regional
estimation can be made. If they differ, then another SV must be used until they
become the same. It is possible to identify from an SV the optimum distance after
which regional dependence is zero or constant. By definition, the SV is half the
variance of the difference between all possible points at a constant distance apart.
The existence of an underlying trend implies that the sample SV is expected to resemble the ideal case in Fig. 4.12.
The 0° and 30° directional SVs in Fig. 4.21a and b have such a trend, whereas the directional sample SVs along 60° and 90° are more or less of random type, provided that only d/3 of the distance axis (0–15) is considered.
It is clear that there is practically no nugget effect in these sample SVs, as expected from Fig. 4.20, where the trend surface has no discontinuity of the kind in Fig. 4.12. Similarly, the 3D representation of bicarbonate variation with calcium
and magnesium is presented in Fig. 4.22, with a global (without different directions)
sample SV in Fig. 4.23.
[Fig. 4.21 Sample SVs of chloride (Column C: Cl) along the 0°, 30°, 60°, and 90° directions, each with 10° tolerance: semivariogram versus distance (0–40)]
Visual inspection of Fig. 4.22 indicates that a trend surface is embedded in the
ReV. This trend surface is almost horizontal with a very small angle, and accordingly
its global sample SV in Fig. 4.23 has an almost horizontal sector within d/3.
Total dissolved solids (TDS) measurement is an indicator of the water quality
variation with respect to calcium and magnesium, and its variation is presented in
Fig. 4.24, with the corresponding global SV in Fig. 4.25.
4.7 Theoretical SV
Useful discussion on the computation of the SV in one or two dimensions has been
given by Clark (1979) and Hohn (1988). The SV as computed from the data will tend to be rather lumpy, and the more irregular the data, the less regular it will appear. Whatever the extent of the lumpiness, the graph of γ(d) may often be linked to one or
[Fig. 4.22 3D variation of bicarbonate (ppm) with calcium (ppm) and magnesium (ppm)]
[Fig. 4.23 Global sample SV of bicarbonate (Column C: HCO3; direction 0.0, tolerance 90.0), with the reliability range marked; nugget = 345, slope = 10.3, anisotropy = 1.0]
other of a small number of theoretical (ideal) and simple curves that relate γ (d) to
d, which are referred to as the theoretical SVs. These theoretical curves are mod-
els of SV that have been defined on theoretical and sample basis. The fitting of
the theoretical SV curve from a set of functions to the experimentally available
one derived from real data has been developed into the art of “structural analy-
sis” discussed in detail by a number of researchers (Journel and Huijbregts, 1978;
Myers et al., 1982; Hohn, 1988). The main theoretical SV types are the linear, spherical, exponential, Gaussian, and cubic types, as will be explained in detail later. These
[Fig. 4.24 3D variation of TDS (ppm) with calcium (ppm) and magnesium (ppm)]
[Fig. 4.25 Global sample SV of TDS (Column C: TDS; direction 0.0, tolerance 90.0), with the reliability range marked]
functions express rather well qualitatively the characteristics of ReV and act as a
quantified summary of the structural information, which is then channeled into the
estimation procedures of the natural (geologic, hydrologic, meteorological, atmo-
spheric, etc.) phenomena.
In order to apply Kriging modeling to an ReV, the first step is to obtain the sample SV from the available data and then to match this sample SV to a suitable theoretical mathematical function, which describes the relationship between the difference of values and distance. Several different types of functions can
be used, each with a different form to the distance function. In addition to its shape
or form, the SV model is described by three parameters, namely nugget, sill, and
range. Nugget shows how much variance is observed at a distance of zero. It shows
up because there may be variation at distances shorter than the sample spacing or
because there are errors in the measurements. It corresponds to the existence of a discontinuity feature in the ReV. The sill shows how much variation is observed when the SV
levels off at large distances. Most SVs become constant at large distances, provided
that the ReV has spatial dependence without any systematic trend component. Once two points are far enough apart, there is no relationship between their ReV values. The range shows how far one has to go in distance before the SV levels
off to the sill. At distances less than the range an ReV will be said to be spatially
dependent and beyond the range distance the ReV has no effect and there is no
spatial dependence. Finally, if the directional SVs are very different, one may need
to specify an anisotropy parameter. For instance, the N-S SV may be different from
the E-W SV (there is a different range, sill, nugget value, and rather different shape).
The development of an appropriate SV model for a data set requires the under-
standing and application of advanced statistical concepts and tools, which is the
science of SV modeling. In addition, the development of an appropriate SV model
for a data set requires knowledge of the tricks, traps, pitfalls, and approximations
inherent in fitting a theoretical SV model to real world data, which is the art of SV
modeling. Skill with the science and art are both necessary for success.
The development of an appropriate SV model requires numerous correct deci-
sions. These decisions can only be properly addressed with an intimate knowledge
of the data at hand, and a competent understanding of the data genesis (i.e., the
underlying processes from which the data are drawn).
Several SV models have been developed to describe the various underlying spa-
tial patterns in data. Examples of isotropic models, including the spherical, exponential, linear, power, and Gaussian models, are used as input to the Kriging ReV estimation process (Chapter 5).
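Before the individual models are taken up, the sketch below collects several of these families as plain Python functions. The parameterizations follow the equations given later in this section where available; the spherical form is written in its common textbook shape, which is an assumption here since its defining equation is not reproduced in this excerpt.

```python
import numpy as np

def linear_sv(d, beta):
    """Linear model: gamma(d) = beta * d (no sill)."""
    return beta * np.asarray(d, float)

def exponential_sv(d, alpha, beta):
    """Exponential model, Eq. (4.7): alpha * (1 - exp(-beta * d))."""
    return alpha * (1.0 - np.exp(-beta * np.asarray(d, float)))

def gaussian_sv(d, alpha, beta):
    """Gaussian model, Eq. (4.8): alpha * (1 - exp(-beta * d**2))."""
    return alpha * (1.0 - np.exp(-beta * np.asarray(d, float) ** 2))

def power_sv(d, alpha, m):
    """Power model, Eq. (4.11): alpha * |d|**m with 0 < m < 2."""
    return alpha * np.abs(np.asarray(d, float)) ** m

def spherical_sv(d, sill, R):
    """Spherical model in its common textbook form: rises as
    1.5(d/R) - 0.5(d/R)**3 and stays at the sill beyond the range R."""
    r = np.minimum(np.asarray(d, float) / R, 1.0)
    return sill * (1.5 * r - 0.5 * r**3)

d = np.linspace(0.0, 10.0, 6)
print(exponential_sv(d, alpha=1.0, beta=1.0).round(3))
print(spherical_sv(d, sill=1.0, R=5.0).round(3))
```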
The random field that this SV represents is not continuous (Fig. 4.15). No matter
what the distance is (small or large), each ReV is completely independent and dif-
ferent from the others. In this case the spatial analysis methodology reduces to probability principles only (Fig. 4.16). The special case of this nugget SV occurs when there is
not any spatial variability but a uniform ReV value at each point (see Figs. 4.7 and
4.8). The mathematical model is then γ(d) = 0 for all d values.
4.7.2 Linear SV
If the ReV does not have any discontinuity, then its simplest model is given as

\gamma(d) = \beta d, \qquad (4.5)

where β is the only model parameter, with the meaning of the slope of the straight line (see Fig. 4.26).
A more general form of the theoretical linear SV is the mixture of the nugget and linear SVs. It postulates a linear relationship between the half-squared difference and the distance as

\gamma_s(d) = \alpha + \beta d,

where α is the nugget value, or, more generally,

\gamma_n(d) = \alpha + \beta d^n.
[Fig. 4.26 Linear SV model for β = 0.3, 0.5, 1, 3, and 5]
[Fig. 4.27 Linear SV model with a nugget for β = 0.3, 0.5, 1, 3, and 5]
4.7.3 Exponential SV
The mathematical expression of theoretical exponential SV model is given as
(Cressie, 1993)
\gamma(d) = \alpha \left( 1 - e^{-\beta d} \right), \qquad (4.7)
where the two model parameters are α > 0 and β > 0. This model is used com-
monly in hydrological studies. Figure 4.28 shows exponential models with different
variance values and β = 1. It does not have a nugget value and hence represents
continuous ReVs, which are stationary, because at large distances the exponential SV approaches a horizontal asymptote.
4.7.4 Gaussian SV
Its general expression is given in the following mathematical form (Pannatier, 1996):

\gamma(d) = \alpha \left( 1 - e^{-\beta d^2} \right), \qquad (4.8)
where α > 0 and β > 0 are the two parameters and according to their values it takes
different forms. Figure 4.29 shows a set of Gaussian SV with different α values and
β = 1. Since at large distances there is a horizontal portion, the representative ReV
is also stationary.
At small distances the Gaussian model appears proportional to d², which implies that the ReV is smooth enough to be differentiable, i.e., the slope of the SV tends to a well-defined limit as the distance between the two points tends to zero.
[Fig. 4.28 Exponential SV model for α = 1, 3, 5 and β = 1]
Fig. 4.29 Gaussian SV model (γ(d) versus distance, d, for different α values with β = 1)
4.7.5 Quadratic SV
where the single model parameter is α > 0. Its shape is given in Fig. 4.30 for different α values. As α increases, the sill value decreases, which means that the fluctuations become smaller. It represents stationary ReVs with non-differentiable properties.
4.7.6 Rational Quadratic SV
γ(d) = αd²/(1 + d²), (4.10)
where α > 0 is the model parameter. It behaves similarly to the Gaussian model at small distances, which implies that the ReV is differentiable and rather continuous. An increase in the α parameter value causes an increase in the sill level, which further shows that the fluctuations of the ReV about its average level become bigger (see Fig. 4.31).
Fig. 4.31 Rational quadratic SV model (γ(d) versus distance, d, for α = 1, 3, and 5)
4.7.7 Power SV
Its general form is presented by Pannatier (1996) as
γ(d) = α|d|^m, (4.11)
where α > 0 and 0 < m < 2 are the model parameters. Convex and concave forms
result, respectively, for 0 < m < 1, and 1 < m < 2 as in Fig. 4.32. Besides, it reduces
Fig. 4.32 Power SV models (γ(d) versus distance, d): (a) α = 1, m = 0.1, 0.3, 0.5, 0.7, 0.9; (b) α = 1, m = 1.1, 1.3, 1.5, 1.7, 1.9
to a linear model (Eq. 4.5) for m = 1. Depending on whether m > 1 (or m < 1), the SV represents a differentiable (or non-differentiable) ReV. Additionally, none of these theoretical SVs approaches a sill value, and therefore the represented ReV does not have the stationarity property. These SVs describe the same appearance of realizations at every scale, and therefore they are referred to as self-similar ReVs; this is because they appear as straight lines with different slopes on double-logarithmic paper. The forms of the different SVs resulting from Eq. (4.11) are shown in Fig. 4.32.
4.7.8 Hole Effect SV
where again α > 0 is the model parameter (see Fig. 4.33). This model describes ReVs whose excursions above the mean tend to be compensated by excursions below the mean. It exhibits linear behavior at small distances, which implies that the corresponding ReVs are continuous but not differentiable, i.e., less smooth than the realizations of a differentiable random field.
Fig. 4.33 Hole effect SV model (γ(d) versus distance, d, for α = 0.5, 1, and 3)
4.7.9 Spherical SV
Fig. 4.34 Spherical SV model (γ(d) versus distance, d, for α = 0.3, 0.5, 1, 3, and 5)
4.7.10 Logarithmic SV
Fig. 4.35 Logarithmic SV model (γ(d) versus distance, d)
4.8 Cumulative Semivariogram
The CSV method proposed by Şen (1989) as an alternative to the classical SV technique of Matheron (1965) has various advantages over conventional procedures in depicting regional variability and, hence, the spatial dependence structure. The CSV is defined similarly to the SV; the only difference is that successive cumulative summations are adopted. The CSV has all of the advantages claimed for the SV and, besides, it provides a more objective way of fitting theoretical models to the regional dependence behavior of the regionalized variable. Furthermore, standardization of the CSV provides a basis for identifying regional stationary stochastic models (Şen, 1992). The CSV is a graph that shows the variation of successive half-squared difference summations with distance. Hence, a non-decreasing CSV function is obtained, which exhibits various significant clues about the regional behavior of the ReV and provides a measure of spatial dependence. The CSV can be obtained from a given set of ReV data by executing the following steps.
1) Calculate the distances, di,j, between all possible pairs of measurement sites.
2) For each distance, di,j calculate the corresponding half-squared differences, Di,j ,
of the ReV data. For instance, if the ReV has values of Zi and Zj at two distinct
sites at distance di,j apart, then the half-squared difference is
Di,j = (1/2)(Zi − Zj)². (4.15)
3) Rank the distances in ascending order and sum the corresponding half-squared differences successively; the sample CSV value at the m-th ranked distance is
γc(dm) = Σ (i = 1 to m) Di. (4.16)
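These steps translate directly into a short routine. The following is a minimal Python sketch, assuming planar coordinates; the double loop over pairs is acceptable for the modest numbers of sites typical of such studies.

import numpy as np

def sample_csv(x, y, z):
    # Step 1: distances between all possible pairs of sites
    # Step 2: half-squared differences, Eq. (4.15)
    # Step 3: rank by distance and accumulate, Eq. (4.16)
    x, y, z = map(np.asarray, (x, y, z))
    dist, hsd = [], []
    n = len(z)
    for i in range(n):
        for j in range(i + 1, n):
            dist.append(np.hypot(x[i] - x[j], y[i] - y[j]))
            hsd.append(0.5 * (z[i] - z[j]) ** 2)
    order = np.argsort(dist)
    return np.asarray(dist)[order], np.cumsum(np.asarray(hsd)[order])

# a few hypothetical sites for illustration
d, gamma_c = sample_csv([0, 1, 3, 6], [0, 2, 1, 5], [10.0, 12.0, 9.0, 15.0])
print(d)
print(gamma_c)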
Fig. 4.36 Sample CSV (γc(d) versus distance, km)
In his discussion of the practical difficulties of the classical SV, Şen (1989) noted that SV estimations are reliable at small distances only when the sampling points are regularly distributed within the region. However, in various disciplines of the geological sciences the sampling positions are irregularly distributed over the region (as with earthquake epicenters), so that some distances occur more frequently than others; thus, a heterogeneous reliability dominates the sample SV. In order to overcome these shortcomings, he developed the CSV technique on the basis of the ReV theory.
This technique possesses all of the objective properties of the classical SVs and, in
addition, it helps to identify the hidden and local dependencies within a region. It is
defined as the successive summation of half-squared differences, which are ranked
according to the ascending order of distances extracted from all possible pairs of
sample locations within a region. Mathematically, CSV is expressed as
k " #
γc (dk ) = d di (k = 1,2,...,m), (4.17)
i=1
where γc (dk ) is the value of the k-th ordered distance CSV value and superscript i
indicates the rank. The procedure of sample CSV calculations as well as its model
form and equations have been given by Şen (1989) as in Fig. II.12. These models
are counterpart to those of classical SV models, but with different interpretations of
the model parameters. The attributes and advantages of CSV can be summarized as
follows.
1) The CSV is a non-decreasing function; however, there may be local flat portions, implying constancy of the regionalized variable over a certain distance, i.e., the same values have been observed at two locations h apart.
2) The slope of the theoretical CSV at any distance is an indicator of the depen-
dence between pairs of regionalized variables separated by that distance.
3) The sample CSV reflects even smaller dependencies between data pairs,
which are not possible to detect with classical SV due to the averaging
procedure.
4) The sample CSV is straightforward in applications and free of subjectivity, because there is no need for an a priori selection of distance classes. In fact, the actual distances are employed in the construction of the sample CSV rather than the class mid-point distances.
5) The CSV model may be used for irregularly distributed sample positions within
the study region.
6) The underlying model for any regionalized variable can be detected by plotting the cumulative half-squared differences versus distance on arithmetic, semi-logarithmic, or double-logarithmic paper. The appearance of the sample CSV points as a straight line on one of these papers confirms the type of model. Such an opportunity is missing in the samples of the classical SV.
7) Model parameter estimates are obtained from the slope and intercept values of
the straight line.
The CSV proposed in the previous section is applied to the transmissivity, total dis-
solved solids, and piezometric level records in the Wasia sandstone aquifer in the
eastern part of the Kingdom of Saudi Arabia (Şen, 1989). A complete hydrogeolog-
ical study of this area has been performed recently by Subyani (1987).
GAMA3 software developed for computing the classical SV by Journel and
Huijbregts (1978) has been applied to groundwater variables such as transmissiv-
ity, piezometric level, and total dissolved solids from the Wasia sandstone aquifer.
The resulting sample SV and sample CSV plots are presented in Figs. 4.37, 4.38,
and 4.39. It is clear from these figures that the half-squared-difference points are
scattered in such a way that it is not possible to distinguish a clear pattern in the
sample SVs, which suffer from fluctuations even at small distances. Comparisons
of the sample SVs in these figures with the sample CSVs indicate that the latter are
more orderly and have distinctive non-decreasing patterns.
Fig. 4.37 Sample SV and CSV of transmissivity in the Wasia sandstone aquifer (γc(d) in 10⁻³ m⁴/s² versus distance, km)
Fig. 4.38 Sample SV and CSV of total dissolved solids in the Wasia sandstone aquifer (γc(d) in (mg/l)² versus distance, km)
Fig. 4.39 Sample SV and CSV of piezometric level in the Wasia sandstone aquifer (γc(d) in m² versus distance, km)
A sample CSV often yields more or less a straight line for large distances, which
corresponds to the sill concept in the classical SV. Furthermore, the sample CSV
starts as a curve with different curvatures over some distance domain before it
becomes almost a straight line. The length of the distance domain over which the
sample CSV occurs as a curve is a counterpart of the range in the classical SV.
Hence, it is straightforward to determine the range from the sample CSV. The piezo-
metric level sample CSV in Fig. 4.39 shows an initial range portion, which has zero
half-squared-differences for about 10 km. Such a portion implies physically that the
piezometric level does not change significantly within distances less than 60 km.
In fact, the Wasia aquifer has remained free of any tectonic movements, it is exten-
sive, and the recharge is negligible, but it is discharged by local well groups that are
situated at large distances from each other (Powers et al., 1966).
γc(d) = ∫₀^d γ(u) du, (4.18)
or, through differentiation,
γ(d) = dγc(u)/du |u=d. (4.19)
Therefore, a CSV counterpart may be found for any given classical SV using Eq.
(4.18). Furthermore, Eq. (4.19) indicates that the theoretical classical SV value at
any distance is equal to the slope of the theoretical CSV at the same distance. In the
following, models which have been used previously for SVs by many researchers
will be assessed from the CSV point of view.
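The pair of relations in Eqs. (4.18) and (4.19) can be checked numerically: the sketch below integrates an exponential SV (illustrative parameter values) to its CSV by the trapezoidal rule and differentiates the result back to recover the SV.

import numpy as np

alpha, beta = 5.0, 1.0                                  # illustrative values
d = np.linspace(0.0, 10.0, 1001)
gamma = alpha * (1.0 - np.exp(-beta * d))               # exponential SV
# CSV by cumulative trapezoidal integration, Eq. (4.18)
gamma_c = np.concatenate(
    ([0.0], np.cumsum(0.5 * (gamma[1:] + gamma[:-1]) * np.diff(d))))
# recover the SV as the slope of the CSV, Eq. (4.19)
gamma_back = np.gradient(gamma_c, d)
print(np.max(np.abs(gamma_back - gamma)))               # small discretization error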
The linear CSV model corresponds to a pure nugget effect in the classical SV, the most extreme case, which rarely happens in earth sciences domains. This model postulates a linear relationship between the cumulative half-squared difference and the distance as
γc(d) = α + βd, (4.20)
in which α and β are the model parameters (Fig. 4.40a). The sample CSV of a regionalized variable that abides by this model will appear as a straight line on arithmetic paper.
In fact, α is the intercept on the CSV axis and β is the slope of this straight
line. This slope corresponds to the sill value in the classical SV, which represents a
pure nugget effect (Şen, 1989). Furthermore, β represents exactly the variance of the
underlying random field. Hence, the smaller the slope of the straight line, the smaller
the random fluctuation in the ReV. If the slope is equal to zero, theoretically, this
indicates a completely deterministic uniform variation in the ReV. The sample CSV scatter diagram and the regression line fitted to the pH values measured at 71 sample locations within the Umm Er Radhuma limestone aquifer in the Eastern Province, Saudi Arabia (Fig. 4.41), have the form
Fig. 4.40 Theoretical CSVs: (a) linear, (b) power, (c) exponential, (d) logarithmic
Fig. 4.41 Sample CSV scatter diagram and fitted linear model of the pH data (γc(d) versus distance, km)
γc(d) = −0.213 + 1.144d,
i.e., the parameter estimates are α = −0.213 and β = 1.144. The hydrochemical data were presented by Şen and Al-Dakheel (1985) for major anions and cations.
The power model of the CSV is given as
γc(d) = αd^β, (4.21)
in which α is the scale parameter and β is the shape parameter. Because 0 < β < 2 for a theoretical SV from the power family (Journel and Huijbregts, 1978, p. 165), parameter β for the theoretical CSV in Eq. (4.21) is restricted to the range 1 < β < 3. The derivative of Eq. (4.21) also yields a power form for the classical SV. Obviously, the use of double-logarithmic paper facilitates parameter estimation. Sulfate concentrations in the Umm Er Radhuma aquifer groundwater show a more or less straight-line pattern on double-logarithmic paper (Fig. 4.42).
Fig. 4.42 Sample CSV and fitted power model of the sulfate concentrations on double-logarithmic paper (γc(d) versus distance, km)
The mathematical expression of this straight line, found by the regression technique, is
log γc(d) = 0.46 + 0.84 log d;
hence, the parameter estimates are log α = 0.46, i.e., α = 2.88, and β = 0.84. The original form of this model prior to transformation can be written as γc(d) = 2.88d^0.84.
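Because the power CSV plots as a straight line on double-logarithmic paper, α and β follow from a single linear regression of log γc(d) on log d. A minimal Python sketch with hypothetical sample values (not the sulfate data):

import numpy as np

d = np.array([1.0, 2.0, 4.0, 8.0, 16.0])                # hypothetical distances (km)
gamma_c = np.array([2.9, 5.3, 9.4, 17.2, 30.8])         # hypothetical sample CSV
beta_hat, log_alpha = np.polyfit(np.log10(d), np.log10(gamma_c), 1)
alpha_hat = 10.0 ** log_alpha
print(alpha_hat, beta_hat)                              # gamma_c(d) ~ alpha * d**beta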
The exponential model of the CSV is given as
γc(d) = αe^(βd), (4.22)
where α and β are scale and shape parameters, respectively. The main difference of this model from the others is that it has a non-zero value at zero distance, i.e., it has a nugget effect. Forms of different CSVs resulting from Eq. (4.22) are shown in Fig. 4.40c. The sample CSV can be checked for concordance with this model by plotting log γc(d) vs. d on semi-logarithmic paper. If the sample points appear as a straight line, the exponential model is the generating mechanism of the regional variability within the regionalized variable. The slope of this line directly yields an estimate of β, whereas the intercept on the γc(d) axis leads to an estimate of α. This model does not have a unique classical SV counterpart in the geostatistical literature.
The sample CSV for bicarbonate concentrations in the Umm Er Radhuma aquifer
appears as a straight line on semi-logarithmic paper (Fig. 4.43). The appearance of this straight line implies that the convenient model for the bicarbonate concentrations in this aquifer is of the exponential type. From the regression line of this scatter diagram,
Fig. 4.43 Sample CSV and fitted exponential model of bicarbonate concentrations on semi-logarithmic paper (γc(d) versus distance, km)
the model parameter estimates are found as log α = −0.86, i.e., α = 0.14, and β = 0.079. Hence, the original form of the model can be written as γc(d) = 0.14e^(0.079d).
The logarithmic model of the CSV is given as
γc(d) = α + β log d, (4.23)
in which α and β are the two model parameters. This model differs from the exponential one in that it has an intercept on the distance axis, similar to the sample CSV for the piezometric level (Fig. 4.39). Different forms of the logarithmic model are presented in Fig. 4.40d. The model can be depicted from a sample CSV plotted on semi-logarithmic paper as γc(d) vs. log d. If the sample points appear as a straight line, the validity of the logarithmic model is confirmed. The slope of this straight line is equal to β, and the cumulative half-squared difference corresponding to d = 1
yields the estimate of α. Such a model is similar to what is referred to in classical SV terminology as the De Wijsian model (DeWijs, 1972).
Other models for the CSV can be constructed from classical SV models through Eq. (4.18). For instance, the exponential model of the classical SV,
γ(d) = α(1 − e^(−βd)), (4.24)
in which α and β are the model parameters, has through Eq. (4.18) the CSV counterpart
γc(d) = αd − (α/β)(1 − e^(−βd)). (4.25)
A close inspection of Eq. (4.25) indicates that for large distances (α/β)exp(−βd) ≈ 0; consequently, at large distances this model appears as a straight line (on arithmetic paper) whose slope is an estimate of α. In addition, the back-extrapolation of this straight line intersects the γc(d) axis at −α/β, i.e., its intercept magnitude is α/β. Provided that α is known from the slope at large distances, this intercept yields the estimate of β. These α and β values are the parameters of the classical exponential SV model. This last example shows that the CSV method may help to estimate the parameters of the classical SV by simple graphical procedures.
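This graphical procedure amounts to fitting a straight line over the large-distance portion of the sample CSV; a minimal sketch with illustrative parameter values rather than field data:

import numpy as np

alpha, beta = 3.0, 0.8                                  # illustrative true values
d = np.linspace(0.5, 20.0, 40)
gamma_c = alpha * d - (alpha / beta) * (1.0 - np.exp(-beta * d))   # Eq. (4.25)
slope, intercept = np.polyfit(d[-10:], gamma_c[-10:], 1)  # straight-line tail
alpha_hat = slope                                       # slope estimates alpha
beta_hat = -alpha_hat / intercept                       # intercept equals -alpha/beta
print(alpha_hat, beta_hat)                              # approximately 3.0 and 0.8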
A similar construction applies to the Gaussian SV model, whose CSV counterpart can be expressed through a function φ(d, β), the area under the normal probability density function (with zero mean and variance 1/β) from 0 to d; α can again be estimated as the slope of the straight-line portion at large distances. Here,
φ(d, β) = √(β/(2π)) ∫₀^d exp(−βu²/2) du. (4.27)
Last but not least, it is also possible to have different types of CSV models along different directions for the same earth sciences phenomenon. In such a situation, there is structural heterogeneity within the phenomenon.
Various theoretical CSV models are fitted to the sample CSV by the simple least-squares technique. A weighted or generalized least-squares approach would probably be preferable, because the sample CSV values are correlated and do not have equal variances. Future research should be directed toward how to implement such fitting procedures.
4.9 Point Cumulative Semivariogram
The sample PCSV at any desired site can be obtained by executing the following steps.
1) Calculate the arithmetic average, Z̄, and the standard deviation, ZS, from the available data. Standardize the data according to the following formulation:
zi = (Zi − Z̄)/ZS. (4.28)
2) Calculate the distances between the desired site and the remaining sites. If there are n sites, the number of distances is n−1, di (i = 1, 2, . . . , n−1).
3) For each pair calculate the squared differences as (zc − zi)², where zc and zi are the standardized ReV values at the concerned and the i-th sites, respectively. Consequently, there are (n−1) squared differences.
4) Rank the distances in ascending order and plot distances di versus correspond-
ing successive cumulative sums of half-squared differences. Hence, a non-
decreasing function is obtained similar to Fig. 4.36, which is named as the sam-
ple PCSV for the desired site.
All these steps imply that the PCSV, γ(dc ) for site C can be expressed as
γ(dc) = (1/2) Σ (i = 1 to n−1) (zc − zi)². (4.29)
5) Application of these steps in turn for each site leads to n sample PCSVs of ReV.
These sample PCSVs are the potential information sources in describing the ReV
characteristics around each site. Among these characteristics are the radius of influ-
ence, spatial dependence, and structural behavior of the regionalized variable near
the site such as the nugget (sudden changes) and sill effects, and heterogeneity as
will be explained in the following section.
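A minimal Python sketch of these PCSV steps, assuming planar coordinates (the function name is illustrative):

import numpy as np

def sample_pcsv(x, y, z, c):
    # PCSV at site index c: standardize (Eq. 4.28), compute distances to
    # the remaining sites, rank them, and accumulate the half-squared
    # differences of the standardized values, Eq. (4.29)
    x, y, z = map(np.asarray, (x, y, z))
    zs = (z - z.mean()) / z.std()                       # step 1
    dist = np.hypot(x - x[c], y - y[c])                 # step 2
    hsd = 0.5 * (zs[c] - zs) ** 2                       # step 3
    mask = np.arange(len(zs)) != c                      # exclude the site itself
    order = np.argsort(dist[mask])
    return dist[mask][order], np.cumsum(hsd[mask][order])   # step 4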
Example 4.1
The PCSV proposed in the previous section is applied to seismic data from Turkey
(Erdik et al., 1985). Conventional probabilistic procedures are used to construct the seismic hazard maps of Turkey. Seismic hazard is defined as the probability of occurrence of ground motion, due to an earthquake, capable of causing significant loss of value through damage or destruction at a given site within a definite time period. The PCSV technique is applied to all sites; some of the sample
PCSVs are presented in Fig. 4.44, and after grouping, the major sample trends of
PCSV appeared as shown in Fig. 4.45. Consideration of these figures individually
or comparatively leads to the following significant interpretations, which can be
implemented in any future seismic hazard mapping models of Turkey (Şen, 1997).
1) Individual seismic PCSVs have rather different appearance from each other,
which indicates that the regional seismic distribution in Turkey is heteroge-
neous. It is a necessary and sufficient requirement for areal heterogeneity that
the sample PCSV should exhibit different patterns within the limits of sam-
pling error at different sites. A first glance through the whole sample PCSVs
gave the impression that, in general, there are ten distinctive categories of them
within Turkey. These categories are labeled alphabetically from A to J as shown
in Fig. 4.45. Each of these categories has different features, which reflect the
seismic record behavior around the station concerned. For instance, the sample PCSVs in category F (see Fig. 4.45) have an initial part with convex curvature (at small distances), followed by either a single or multiple broken straight lines at large distances. On the other hand, the sample PCSVs in category D do not have any curvature but many broken straight lines, and they have an intercept on the horizontal distance axis. An abundance of broken straight lines indicates the heterogeneity involved around the station concerned at different distances.
In category B the sample PCSVs display a single straight line, indicating that there are rather homogeneous areas of influence around these stations.
The stations within each category can be regarded as homogeneous collec-
tively, and hence it is clear that the whole study area has about ten distinct
homogeneity regions of seismic variation. Further useful interpretations about
the sample PCSVs can be listed as follows:
2) Some of the sample PCSVs do not pass through the origin. This is tantamount to saying that the seismic occurrences at these sites cannot be considered as regionally smooth processes; rather, the seismic event is under the control of some local and/or regional geological factors. This further implies that uniform conditions do not prevail in the spatial seismic occurrences, but rather a complex combination of a multitude of tectonic events. Last but not least, if the sample PCSV has an intercept on the vertical axis, it implies the existence of a nugget effect in the regional variability at the site concerned (Fig. 4.45 J).
3) Some of the sample PCSVs have intercepts, R0, on the horizontal (distance)
axis (see categories C, F, and I). This means that at distances less than R0 the
sample PCSV value is equal to zero; hence, from Eq. (4.29), zc ≅ zi, implying structural control within the regional seismic event. Furthermore, in general, big seismic values follow big seismic values and small ones follow small seismic values, i.e., there are isolated islands of high or low seismicity around the site.
Fig. 4.44 Sample PCSVs at representative sites (point CSV versus distance, km)
Fig. 4.45 Categories of the major sample PCSV trends, A through J (point CSV versus distance)
4) Each sample PCSV fluctuates about a straight line at large distances. The existence of straight-line portions in the PCSV implies that seismic activities at large distances are independent of each other. This is equivalent to the regional sill property in the classical SV, but the PCSV provides information about the sill at individual sites (see Fig. 4.45 J). These portions correspond to horizontal segments at large distances in the classical SV as defined by Matheron (1963). Furthermore, this is the only range where the classical formulations keep their validity.
5) A group of sample PCSVs passes through the origin (see categories A, B, D, E, G, and H in Fig. 4.45). Such a property of the PCSV diagram implies the continuity of seismic effects from the site outward. Continuity means that there are no nugget effects or discontinuities within the regional seismic variable at the site concerned.
6) Some of the sample PCSVs have “curvature portions” over a moderate range of distances (categories E and G). In fact, such a range corresponds to the distance scale as defined in turbulent flow by Taylor (1915). After this range, the PCSVs converge to straight lines (Şen, 1989). The initial curvature implies that the seismicity at these sites has regional dependencies, which weaken toward the end of the curvature distance range (Şen, 1989). Since the curvatures are convex, there is positive regional structural dependence (Şen, 1992). Furthermore, the curvature implies that the seismicity has areal structural dependence.
7) In some of the sample PCSVs there is no curvature part at all (categories B and F). Such a situation is valid in cases where the regional seismic distributions arise predominantly from the activities of external factors only. Furthermore, there is no structural correlation, i.e., the seismic phenomena evolve randomly over the region concerned.
8) As suggested by Şen (1992), the sample CSVs help to identify the underlying generating mechanism of a regional phenomenon. Likewise, the sample PCSVs provide clues about the seismicity-generating mechanism around the site concerned. For instance, if the sample PCSV passes through the origin and has straight-line portions only, then the regional phenomenon concerned complies with an independent (white-noise) process with no regional dependence at all. However, when the sample PCSV is in the form of a straight line but does not pass through the origin, then a moving-average process is the underlying generating mechanism of the regional variability (Şen, 1992). The PCSVs in category F have such a property, and therefore it is possible to conclude that moving-average mechanisms are dominant at these sites.
9) In the case of a single straight line following a curved initial portion, the slope of the long-distance straight-line portion is related to the regional standard deviation of the underlying generating mechanism.
10) As mentioned above, the PCSV is an indicator of the cumulative similarity of the seismic variation at one station with the other stations. Practically, if two PCSVs at different sites follow the same pattern within the limits of sampling errors, then they are said to be similar. Such a similarity implies homogeneity between the two sites. They may be partially similar to each other at some distances if the PCSV values at those distances are close to each other.
Station no.  Distance (km) at the first PCSV similarity level  Distance (km) at the second PCSV similarity level
1 180 305
2 300 1110
3 440 940
4 405 1295
5 300 1110
6 295 150
7 500 800
8 540 760
9 595 1120
12 370 600
13 180 435
14 280 435
20 470 1310
21 325 520
22 460 1065
23 220 500
24 200 280
30 580 1460
31 250 500
32 400 1070
33 305 880
34 350 1035
40 360 975
41 95 115
42 450 735
43 405 900
50 370 1165
51 220 470
52 430 650
53 280 420
60 115 295
61 300 420
62 320 1025
63 400 1205
70 705 1050
71 210 395
72 300 750
73 275 700
80 100 205
81 405 1110
82 620 800
83 215 500
91 200 645
92 200 500
93 320 1135
100 200 280
101 325 580
102 220 620
103 230 680
104 120 195
105 110 290
106 500 645
The relevant similarity maps are shown in Figs. 4.46 and 4.47. For a fixed PCSV level, the smaller the distance, the stronger the seismic activity effect at the site and the weaker the regional dependence; i.e., the site has relatively more intense seismic occurrences than the other sites or regions. For instance, the map in Fig. 4.46 shows intense seismic occurrences in the eastern part of Turkey, surrounded by the 120 contour line. The next most intense seismic variations are observed in the central, northeastern, and western portions of the country, where contour lines of 280 occur. However, the least sensitive locations are in the southern parts, with similarity contours of about 600.
On the other hand, at the higher level of similarity presented in Fig. 4.47, the study area seems more heterogeneous, but the major distinctive regional zones of Fig. 4.46 remain the same.
The principles of the PCSV have been explained and applied to seismic data from Turkey. PCSVs of this type provide detailed information about a regional variable at and near the measurement sites as well as among the sites. The main purpose of the CSV technique is to check the heterogeneity of the regionalized variable. If the empirical point CSVs at different sites have similar patterns within a certain error band, such as 5–10%, then the regionalized variable is homogeneous; otherwise heterogeneity exists. The point CSV concept brings an additional opportunity to make spatial variability interpretations at each site rather than regionally. Interpretations of the relevant PCSV at any site provide useful information concerning the smoothness, structural control, regional dependence, continuity, and radius of influence. The PCSV methodology proposed herein is applied to the scattered seismic variation over Turkey. Finally, a similarity map is obtained, which provides a basis for regional heterogeneity assessments.
Fig. 4.46 Seismic similarity map of Turkey at the first PCSV level (25°E–45°E, 35°N)
Fig. 4.47 Seismic similarity map of Turkey at the higher PCSV level (25°E–45°E, 35°N)
Example 4.2
In order to explain the experimental CSV and the SDFs derived from it, the monthly rainfall amounts at a set of stations in the northwestern part of Turkey are considered, each with at least 30 years of records (Şen, 1997). Some of the CSVs are presented in Fig. 4.49 with the corresponding classical and calculated SDF values in Fig. 4.50.
Fig. 4.49 Monthly rainfall sample CSVs (CSV in mm² versus distance, km) for January through December
1) They all reach an almost horizontal CSV value at large distances, which means that after a certain distance there is no regional effect of one station on the rainfall amounts of the other stations. This distance corresponds to R in Eqs. (3.24), (3.25), and (3.26).
2) Initially, all the CSVs have an intercept on the horizontal distance axis at about 5 km. This corresponds to almost the smallest distance between the stations.
3) The smallest and the greatest CSV values at large distances occur during July and October, respectively, which are the transition months in this region from the Black Sea to the Mediterranean Sea climate in July and vice versa in October.
On the other hand, Fig. 4.50 includes the geometric weighting functions already given in Fig. 3.31 for the sake of comparison. The following interpretations and conclusions can be drawn from these figures for each month.
1) In January, the experimental CSV weighting function does not conform to any of the classical models. Initially, at small distances, it lies above all the models and then comes closer to the exponential model, but only up to about 0.30 dimensionless distance, deviating from it thereafter.
Fig. 4.50 Monthly experimental CSV weighting functions compared with the classical models of Eqs. (3.21) and (3.25) (weighting versus dimensionless distance)
2) In February, the initial portion perhaps conforms to the exponential model, but later it comes closer to the power model.
3) The March experimental CSV follows the power-law weighting function at almost every distance.
4) A pattern similar to March repeats itself in April. In May, although initially there is a portion abiding by the power model, it then converts to the exponential model. In June, the experimental CSV weighting function comes even closer to the exponential weighting function.
5) During September, very small and big distances come closer to the ratio and power models, respectively, but for moderate distances it has an exponential form. So a mixture of three models appears as a hybrid model in representing the regional variability. Such a phenomenon cannot be captured by considering only one of the classically available models. In the remaining months, similar interpretations are valid.
These discussions indicate that the classical geometric weighting models are not fully justified for the whole of the meteorological phenomenon, but they are good first approximations. They cannot be valid for the whole regional variability in any study area.
Example 4.3 For the implementation of the SDF and the spatial estimation methodology, the iron (Fe) percentage (%) concentration data set used by Clark (1979, p. 95, Table 4.5) is adopted. Although there are 50 sampling points, only 21 sites are given in Table 4.2, in order to allow comparison with the fuzzy clustering results of Pham (1997), who considered only 21 sampling points from Clark's table. Figure 4.51 shows the locations of the sites within the study area.
It is obvious that these sites are rather uniformly distributed in a representative manner over the whole study area.
The CSV is given in Fig. 4.52. The maximum CSV, corresponding to 350 m, is 4,250; accordingly, the SDF is obtained and the results are presented in Fig. 4.53.
Similar to all regional estimation procedures, the weighted average formulation in Eq. (2.27) is used together with the weights obtained from the SDF in Fig. 4.53.
In order to assess the validity of the proposed weighted average procedure, a cross-validation technique is used. Accordingly, the data value at one site is assumed unknown and is removed from the data set. This removed value is then estimated from the remaining data by using the SDF together with Eq. (2.27). This procedure is repeated for all the sites, noting that a data value removed for the estimation at its own location is returned to the set for the estimation at another location.
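A minimal sketch of this leave-one-out procedure, in which sdf_weight is a hypothetical stand-in for the weight read off the SDF at a given distance, and Eq. (2.27) is applied in its weighted-average form:

import numpy as np

def cross_validate(x, y, z, sdf_weight):
    # Estimate each site from all the others with SDF-based weights,
    # then report the relative errors in percent
    x, y, z = map(np.asarray, (x, y, z))
    n = len(z)
    est = np.empty(n)
    for k in range(n):
        m = np.arange(n) != k                           # remove site k
        d = np.hypot(x[m] - x[k], y[m] - y[k])
        w = sdf_weight(d)                               # hypothetical SDF weights
        est[k] = np.sum(w * z[m]) / np.sum(w)           # weighted average
    return est, 100.0 * np.abs(est - z) / z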
Fig. 4.51 Locations of the sampling sites (easting versus northing, km)
Fig. 4.52 Sample cumulative semivariogram of the Fe % concentrations (CSV versus distance)
There are two procedures for the estimation of the site Fe % concentrations. In the first one, all the other sites are considered for their simultaneous contributions; therefore, in the estimation of any site's Fe % concentration, all the distances from this site to the others (n−1 sites) are measured from the map in Fig. 4.51. Subsequently, these distances are entered into Fig. 4.53 on the horizontal axis, and the corresponding RDF weights are found from the vertical axis for each distance. In this manner, all the sites are treated equally by the same procedure, and hence, instead of the measured values, their estimations through the SDF and the cross-validation procedure are calculated. Column (3) of Table 4.2 shows the estimated Fe % concentrations; the corresponding relative errors, calculated as the ratio of the absolute difference between the measured and the estimated values to the measured value, multiplied by 100, are shown in column (4).
Fig. 4.53 SDF of the Fe % concentrations (1 − dimensionless CSV versus distance, km)
Relative error = 100 × |measured − estimated| / measured.
It is obvious that for almost half of the sites the relative error is more than 10%, which indicates the unsuitability of the procedure proposed so far. For the sites with extreme Fe % concentrations, the relative errors are very high. However, the averages of the measured and estimated values are very close, with a 2% relative error. On this basis, it may be concluded that the proposed procedure yields reasonable values on the average but fails to estimate individual site values. This procedure takes into account the contributions of all the sites in the estimation and disregards the concept of the radius of influence.
In order to assess visually the correspondence between the measured and the estimated values, the Fe % concentrations are presented in Fig. 4.54a against the site number sequence along the horizontal axis. Unfortunately, consideration of all the sites in this regional estimation procedure, using Eq. (2.27) and the SDF for the weight calculation, yields average estimations that do not represent the high and low points satisfactorily, as is obvious from Fig. 4.54a. On the other hand, Table 4.3 also shows the results of the inverse distance square and four fuzzy clustering procedures as suggested by Pham (1997). The comparison of the average relative errors in the last row of this table indicates that the SDF method has the smallest relative error percentage. It is even better than the five-cluster case, which was stated as the best solution by Pham (1997).
In order to improve the representativeness of the Fe % concentration regionalized-variable estimations at the sites, an adaptive new technique is suggested herein, which not only estimates the regional value at a site but also provides the number of nearest sites that should be considered for the best possible regional estimation.
Table 4.3 Relative errors (%): Sample No. (1), RDF (2), inverse square distance (3), fuzzy clustering (4)
Accordingly, the radius of influence is defined as the distance between the estimation site and the farthest of the adjacent sites considered in the regional estimation procedure. The following steps are necessary for the application of this adaptive procedure.
(a) Take any site for cross-validation and apply Eq. (2.27) by considering the nearest site only. Such a selection is redundant and corresponds to the assumption that, if only the nearest site measurement is considered, the regional estimation will be equal to that same value. This means that in such a calculation the radius of influence is at its minimum, equal to the distance between the estimation site and the nearest site.
(b) Consider now the two nearest sites to the estimation site and apply the RDF weighting method according to Eq. (2.27). Consideration of two sites increases the radius of influence to the distance between the estimation site and the second nearest site, and the estimation will assume a weighted value of the two nearest sites. Since the weights and measurements are positive numbers, the estimated value will lie between the measurements at the two nearest sites. There will be a squared estimation error, the square of the difference between the measured and estimated values.
Fig. 4.54 Measured and estimated Fe % concentrations: (a) whole points, (b) adaptive method
(c) Repeat the same procedure with the three nearest stations and calculate the square error likewise. Subsequently, it is possible to continue with 4, 5, . . . , (n−1) nearest sites and to calculate the corresponding
Table 4.4 Adaptive cross-validation calculations at site 14: number of nearest sites (1), estimation (2), square error (3)
2 31.94 1.58
3 29.46 13.96
4 30.75 6.00
5 32.42 0.60
6 32.21 0.97
7 32.38 0.67
8 32.82 0.14
9 32.59 0.37
10 32.72 0.23
11 33.21 0.00
12 33.12 0.01
13 33.48 0.08
14 33.80 0.36
15 33.69 0.24
16 33.63 0.18
17 33.57 0.13
18 33.55 0.12
19 33.48 0.08
20 33.51 0.09
square error. The case with the least square error yields the number of nearest sites for the best regional Fe % concentration estimation. The distance of the farthest site in such a situation corresponds to the radius of influence. As an example, only the site-14 calculations are presented in detail in Table 4.4. It is obvious that when Eq. (2.27) is applied by considering the 11 nearest sites, the estimation square error becomes the least, with a radius of influence equal to 127.47 m.
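Steps (a)-(c) amount to a one-dimensional search over the number of nearest sites; the following minimal sketch again relies on the hypothetical sdf_weight function for the SDF weights:

import numpy as np

def adaptive_estimate(x, y, z, k, sdf_weight):
    # Increase the number of nearest sites one at a time and keep the
    # count that minimizes the squared cross-validation error at site k;
    # the distance to the farthest site used is the radius of influence
    x, y, z = map(np.asarray, (x, y, z))
    others = np.flatnonzero(np.arange(len(z)) != k)
    d = np.hypot(x[others] - x[k], y[others] - y[k])
    order = others[np.argsort(d)]
    d_sorted = np.sort(d)
    best_err, best_est, radius = np.inf, None, None
    for m in range(1, len(order) + 1):
        w = sdf_weight(d_sorted[:m])
        est = np.sum(w * z[order[:m]]) / np.sum(w)
        err = (est - z[k]) ** 2
        if err < best_err:
            best_err, best_est, radius = err, est, d_sorted[m - 1]
    return best_est, radius, best_err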
From these calculations the following points can be deduced.
1) The adaptive estimation procedure gives an average iron ore concentration value similar to the average of the measurements, with a 2.31% error. Hence, similar to all the previous methods, the adaptive estimation also gives reasonable average values.
2) Comparison of the average relative error of the adaptive method with the average relative errors in Table 4.3 shows clearly that the adaptive method, with a 4.74% error, is the best among all the approaches; the reduction in the relative error implies that deviations from the average level are taken into account effectively. Figure 4.54b presents the adaptive estimations together with the measured values for the Pham (1997) data set. If Figs. 4.54a and b are compared, it is obvious that the deviations are better accounted for by the adaptive method.
3) The adaptive approach provides the radius of influence for each station, as shown in the last column of Table 4.5. The average radius of influence is about 88.5 m, with maximum and minimum values of about 223 and 13 m, respectively.
By making use of the radii of influence from Table 4.5, it is possible to construct an equal-radii regional map, as shown in Fig. 4.55.
Fig. 4.55 Equal radius-of-influence map (easting versus northing)
Table 4.6 Spatial interpolation sites and estimates: Site No. (1), Easting (2), Northing (3), Radius of influence (m) (4), Estimated Fe (%) (5)
1 36 136 63 30.02
2 27 90 172 32.71
3 86 116 97 30.13
4 45 29 200 33.81
5 95 47 118 34.88
6 186 23 68 38.49
7 218 58 50 32.63
8 222 129 95 31.77
9 268 98 80 32.35
10 327 134 93 33.47
11 272 43 58 28.99
12 340 40 80 32.30
13 331 65 77 32.78
14 327 87 70 33.08
15 368 96 103 34.37
From this map one can read the relevant radius of influence for any desired point within the study area. Once this radius is determined, a circle centered at the prediction point is drawn. The measurement sites within this circle are taken into consideration in the application of Eq. (2.27) for regional estimation through the SDF weights.
Fig. 4.56 RDFs of the measured (21 sites) and estimated (15 sites) Fe % concentrations (RDF versus dimensionless distance)
After completing the cross-validation procedure and the map of the radius of influence, the spatial interpolation procedure with RDF usage can be presented as follows.
1) Select any number, say 15, of spatially scattered points within the study area, as shown in Fig. 4.51. These sites are locations without measurements. For the sake of argument, they are selected rather arbitrarily, with the easting and northing coordinates shown in Table 4.6.
2) The radius of influence of each site is determined from the map in Fig. 4.55 and written in the fourth column of Table 4.6.
3) Consideration of the radius of influence for each site defines the number of measurement sites within this radius, which are the basis for the Fe % concentration estimation through Eq. (2.27) at this site. Hence, the measurement sites that will be considered in the spatial interpolation of the Fe % concentration at the site are identified.
Fig. 4.57 Measured versus estimated Fe % concentrations: SDF without radius of influence, inverse distance square, and fuzzy clustering with 3, 4, 5, and 6 groups
4) Subsequently, the distances between the interpolation site and the effective measurement sites are calculated.
5) Entering these distances on the horizontal axis of the SDF (Fig. 4.53) yields the weights on the vertical axis.
6) Substitution of all the relevant values into Eq. (2.27) provides the Fe % concentration estimates at each site, which are shown in the last column of Table 4.6.
7) In order to check the reliability of the estimations, the question is now whether these spatial estimations yield almost the same SDF or not. For this purpose, the SDF calculation steps are applied to the data in Table 4.6.
8) Figure 4.56 shows the resulting SDFs for the measured and estimated Fe % concentrations. The maximum relative difference between these two SDFs is less than 5%, which confirms the practical validity of the SDF adaptive estimation procedure proposed herein.
It is now appropriate to compare all the methods with the measured data on a Cartesian coordinate system, with the measured values on the horizontal axes and the model outputs on the vertical axes, as in Figs. 4.57 and 4.58.
Fig. 4.58 Measured versus estimated Fe % concentrations by the RDF with radius of influence
References
Aboufirassi, M., and Marino, M. A., 1984. A geostatistically based approach to the iden-
tification of aquifer transmissivities in Yolo Basin, California. Math. Geol. 16(26),
125–137.
Alfano, M., 1984. Statistical inference of the semivariogram and the quadratic model. In: G. Verly,
and others (Eds.), Geostatistics for Natural Resources Characterization. Reidel, Dordrecht,
Holland, pp. 45–53.
Barchet, W. R., and Davis, W. E., 1983. Estimating long-term mean winds from short-term wind
data. Technical Report Number PNL-4785, Pacific Northwest Lab. Richland, WA (USA),
31 pp.
Barnes, R. J., 1991. The variogram sill and the sample variance. Math. Geol. 23(4), 673–678.
Barnes, S. L., 1964. A technique for maximizing details in numerical weather map analysis. J.
Appl. Meteor. 3, 396–409.
Barros, V. R., and Estevan, E. A., 1983. On the evaluation of wind power from short wind records.
J. Appl. Meteor. 22, 1116–1123.
Benjamin, J. R., and Cornell, C. A., 1970. Probability Statistics and Decision Making in Civil
Engineering. McGraw-Hill Book Inc., New York.
Bergman, K., 1979. Multivariate analysis of temperatures and winds using optimum interpolation.
Monthly Weather Rev. 107, 1432–1444.
Bergthorsson, P., and Döös, B. R., 1955. Numerical weather map analysis. Tellus 7, 329–340.
Box, G. E. P., and Jenkins, G. M., 1976. Time Series Analysis, Control and Forecasting. Holden
Day, San Francisco.
Bratseth, A. M., 1986. Statistical interpretation by means of successive corrections. Tellus 38A,
438–447.
Buzzi, A., Gomis, D., Pedder, M. A., and Alonsa, S., 1991. A method to reduce the adverse impact
that inhomogeneous station distributions have on spatial interpolations. Mon. Wea. Rev. 119,
2465–2491.
Carr, J. R., Bailey, R. E., and Deng, E. D., 1985. Use of indicator variograms for enhanced spatial
analysis. Math. Geol. 17(8), 797–812.
Clark, I., 1979. The semivariogram – Part 1. Eng. Min. J. 180(7), 90–94.
Cooley, R. L., 1979. A method of estimating parameters and assessing reliability for models of
steady state groundwater flow, 2, Applications of statistical analysis. Water Resour. Res. 15,
603–617.
Cressie, N. A. C., 1993. Statistics for Spatial Data (revised edition). Wiley, New York, 900 pp.
Cressman, G. P., 1959. An operational objective analysis system. Mon. Wea. Rev. 87, 367–374.
Daley, R., 1991. Atmospheric Data Analysis. Cambridge University Press, Cambridge, 475 pp.
David, M., 1977. Geostatistical Ore Reserve Estimation. Elsevier, New York, 340 pp.
Davis, J., 1986. Statistics and Data Analysis in Geology. John Wiley & Sons, Inc., New York, 560 pp.
Dee, D. P., 1995. On-line estimation of error covariance parameters for atmospheric data assimila-
tion. Mon. Wea. Rev. 123, 1128–1145.
DeWijs, H. J., 1972. Method of successive differences applied to mine sampling: Trans. Inst. Min.
Metal. Sect. A, Min. Industry 81, 78–81.
Eddy, A., 1964. The objective analysis of horizontal wind divergence fields. Quart. J. Roy. Meteo-
rol. Soc. 90, 424–440.
Eliassen, A., 1954. Provisional report on calculation of spatial covariance and autocorrelation of the
pressure field, Report No. 5 (Oslo: Videnskaps-Akademiet Institut for Vaer og Klimaforskning).
Erdik, M., Doyuran, V., Akkaş, N., and Gülkan, P., 1985. A probabilistic assessment of the seismic
hazard in Turkey. Tectonophysics 117(3/4), 295–330.
Franke, R., 1988. Statistical interpretation by iteration. Mon. Wea. Rev. 116, 961–963.
Gandin, L. S., 1963. Objective Analysis of Meteorological Fields. Leningrad, Gidromet; Translated
from Russian, Jerusalem, Israel Program for Scientific Translations, 1965, 242 pp.
Gilchrist, B., and Cressman, G. P., 1954. An experiment in objective analysis. Tellus 6, 309–318.
Goodin, W. R., McRea, G. J., and Seinfeld, J. H., 1979. A comparison of interpolation meth-
ods for sparse data: application to wind and concentration fields. J. Appl. Meteor. 18,
761–771.
Gustafsson, N., 1981. A review of methods for objective analysis. In: L. Bengtsson, M. Ghil, and E.
Källén (Eds.), Dynamic Meteorology: Data Assimilation Methods. Springer-Verlag, pp. 17–76.
Hoeksema, R. J., and Kitandis, P. K., 1984. An application of the geostatistical approach to
the inverse problem in two dimensional groundwater modeling. Water Resour. Res. 20(7),
1003–1020.
Hohn, M. E., 1988. Geostatistics and Petroleum Geology. Van Nostrand Reinhold, New York, NY.
Inman, R. L., 1970. Operational objective analysis schemes at the National Severe Storms Forecast
Center. U.S. National Severe Storms Laboratory Tech. Circular 10, Norman, OK, 50 pp.
Journel, A. G., and Huijbregts, C. I., 1978. Mining Geostatistics. Academic Press, London, 710 pp.
Journel, A. G., 1985. The deterministic side of geostatistics. Math. Geol. 17(1), 1–15.
Kitanidis, P. K. 1997. Introduction to Geostatistics: Applications in Hydrogeology. Cambridge
University Press, Cambridge, 249 pp.
Koch, S. E., DesJardins, M., and Kocin, P. J. 1983. An iterative Barnes objective map analysis
scheme for use with satellite and conventional data. J. Appl. Meteor. 22, 1487–1503.
Krige, D. G., 1982. Geostatistical case studies of the advantages of log-normal, De Wijsian Kriging
with mean for a base metal mine and a gold mine. Math. Geol. 14(6), 547–555.
Kruger, H. B., 1969a. General and special approaches to the problem of objective analysis of
meteorological variable. Quart. J. Roy. Meteorol. Soc. 95(403), 21–39, January.
Kruger, H. B., 1969b. Objective analysis of pressure height data for the tropics. Monthly Weather
Rev. 94(4), 237–257.
Liebhold, A. M., Halverson, J., and Elmes, G., 1993. Quantitative analysis of the invasion of gypsy
moth in North America. J. Biogeogr. 19, 513–520.
Lorenc, A. C., 1981. A global three-dimensional multivariate statistical interpolation scheme.
Monthly Weather Rev. 109, 701–721.
Matheron, G., 1963. Principles of geostatistics. Econ. Geol. 58, 1246–1266.
Matheron, G., 1965. Les Variables Regionalisees et leur Estimation. Masson, Paris, 306 pp.
Matheron, G., 1971. The theory of regionalized variables and its applications. Ecole de Mines,
Fontainbleau, France.
Myers, D. E., Begovich, C. L., Butz, T. R., and Kane, V. E., 1982. Variogram models for regional
groundwater geochemical data: Math. Geol. 14(6), 629–644.
Pannatier, Y., 1996. VARIOWIN: Software for Spatial Data Analysis in 2D. Springer-Verlag, New
York, NY.
Panofsky, H. A., 1949. Objective weather map analysis. J. Meteor. 6, 386–392.
Pedder, M. A., 1993. Interpolation and filtering of spatial observations using successive correla-
tions and Gaussian filters. Mon. Wea. Rev. 121, 2889–2902.
Perrie, W., and Toulany, B., 1989. Correlations of sea level pressure field for objective analysis.
Mon. Wea. Rev. 117, 1965–1974.
Pham, T. D., 1997. Grade estimation using fuzzy-set algorithms. Math. Geol. 29(2), 291–305.
Powers, R. W., Ramirez, L. F., Redmond, C. D., and Elberg, E. L., 1966. Geology of the Arabian
Peninsula. Sedimentary Geology of Saudi Arabia. U.S. Geol. Survey, Prof Paper 560-D, 1–47,
New York.
Rutherford, I. D., 1976. An operational three-dimensional multivariate statistical objective analysis
scheme. Proceedings of the JOC Study Group, Conference on four-dimensional Data Assimi-
lation, Paris, November 17–21,1975. The GARP Programme on Numerical Experimentation,
Report No. 11, January 1976.
Sashegyi, K. D., Harms, D. E., Madala, R. V., and Raman, S., 1993. Application of the Brat-
seth scheme for the analysis of GALE data using a mesoscale model. Mon. Wea. Rev. 121,
2331–2350.
Schlatter, T. W., 1975. Some experiments with a multivariate statistical objective analysis scheme.
Monthly Weather Rev. 103, 246–257.
Seaman, R. S., 1988. Some real data tests of the interpolation accuracy of Bratseth’s successive
correction method. Tellus 40A, 173–176.
Şen, Z., 1978. Autorun analysis of hydrologic time series. J. Hydrol. 36, 75–85.
Şen, Z., 1989. Cumulative semivariogram models of regionalized variables. Int. J. Math. Geol.
21(3), 891–903.
Şen, Z., 1992. Standard cumulative semivariograms of stationary stochastic processes and regional
correlation: Math. Geol. 24, 417–435.
Şen, Z., 1995. Applied Hydrogeology for Scientists and Engineers. CRC Lewis Publishers, Boca
Raton, FL, 496 pp.
Şen, Z., 1997. Objective Analysis by cumulative semivariogram technique and its application in
Turkey. J.Appl. Meteorol. 36(12), 1712–1724.
Şen, Z., 2002. İstatistik Veri İşleme Yöntemleri (Hidroloji ve Meteoroloji). Su Vakfı Yayınları, 243 pp.
Şen, Z., and Habib, Z. Z., 1998. Point cumulative semivariogram of areal precipitation in moun-
tainous regions. J. Hydrol. 205, 81–91.
Şen, Z., and Habib Z., 2001. Monthly spatial rainfall correlation functions and interpretations for
Turkey. Hydrol. Sci. J. 46(5), 829–829.
Şen, Z., 2008. Solar Energy Fundamentals and Modeling Techniques. Atmosphere, Environment,
Climate Change and Renewable Energy. Springer Verlag, 276 pp.
Shenfield, L., and Bayer, A. E., 1974. The utilization of an urban air pollution model in air man-
agement. Proc. 15th Meeting of the Expert Panel on Air Pollution Modeling, NATO Committee
on the Challenges to Modern Society, Brussels, Belgium, 35 pp.
Skibin, D., 1984. A simple method for determining the standard deviation of wind direction,
J. Atmos. Oceanic Tech., 1, 101–102.
Subyani, A. M., 1987. Hydrogeology of the Wasia aquifer and its geostatistical modeling. Unpub-
lished M.Sc. Thesis, Faculty of Earth Sciences, King Abdulaziz University, 170 pp.
Taylor, G. I., 1915. Eddy motion in the atmosphere. Phil. Trans. Roy. Soc. A 215, 1.
Thiebaux, H. J., and Pedder, M. A., 1987. Spatial Objective Analysis, Academic Press, London,
299 pp.
Chapter 5
Spatial Modeling
Abstract In general, spatial variability is concerned with different values for any
property, which is measured at a set of irregularly distributed geographic locations
in an area. The aim is to construct a regional model on the basis of measurement
locations with records and then to use this model for regional estimations at any
desired point within the area.
Earth sciences phenomena vary both in time and in space, and their sampling is based on the measurement stations' configuration. In many practical applications, measured data are seldom available at the point of interest; consequently, the only way to transfer the solar irradiation data from the measurement sites to the estimation point is through regional interpolation techniques using powerful models. The spatial variability is most commonly measured through the recorded solar irradiation time series at individual points. The relative variability between the stations (the difference of simultaneous values between each pair of stations) is commonly treated by the spatial autocorrelation function, which is used for inter-station dependence based on a set of restrictions.
Spatial variability is the main feature of regionalized variables, which are very common in the physical sciences. In practical applications, the spatial variation rates of the phenomenon concerned are of great significance in fields such as solar engineering, agriculture, remote sensing, and other earth and planetary sciences. A set of measurement stations during a fixed time interval (hour, day, month, etc.) provides records of the regionalized variable at irregular sites, and there are few methodologies to deal with this type of scattered data. There are various difficulties in making spatial estimations, originating not only from the regionalized random behavior but also from the irregular site configuration.
The optimum interpolation modeling technique is presented for the spatio-temporal prediction of a regionalized variable (ReV), with an application to precipitation data from Turkey. The Kriging method is explained with simple basic concepts and graphics, and then various Kriging application methodologies are explained for spatial data modeling. The distinctions between the simple, ordinary, universal, and block Kriging alternatives are presented in detail. Finally, a triple diagram mapping methodology for spatial modeling is presented, with applications at five different climate locations.
5.1 General
Once the spatial dependence structure of the ReV has been identified and its model parameters estimated, the model is ready for temporal and/or spatial predictions (estimations) at locations where data measurements are not available.
The first spatial model was due to Student (1907), who summed up the number of particles per unit area instead of analyzing the spatial positions of the particles in a liquid. He divided the 1 mm² area of a hemacytometer into 400 equal squares and counted the yeast cells. Later, Fisher (1935) used spatial analysis in agricultural studies. Yates (1938) investigated the influence of spatial correlation on the randomization process, where completely random regionalized variables are considered for modeling purposes.
This chapter is concerned with the spatial prediction of the ReV, and two of the most important procedures, namely the optimum interpolation and Kriging models, are presented in detail.
The spatial interpolation of earth sciences data aims at estimating the ReV value at a given site based on nearby observation (measurement) sites (Fig. 5.1). Most often the input variables are longitudes and latitudes (easting and northing), and the ReV is one of the earth sciences variables, i.e., the triplicate mentioned in Chapter 2.
This is a problem of operational earth sciences that is regularly encountered in
the spatial estimation procedures. The 2D statistically optimum interpolation mod-
els are useful in the analysis and modeling of ReVs, where spatial dependence func-
tions (SDF) are used for depicting the radius of influence in the case of isotropic
ReVs or the area of influence in the case of anisotropic ReV variability (Chapters 2 and 4). One of the fundamental, effective, and objective analyses is the transformation of measurements from irregularly distributed sites to regular gridded networks for use in numerical prediction schemes or mapping methodologies through spatial models such as the Kriging methodology, which assigns a spatial estimation value to any point without measurement through a suitable estimation procedure. Hence, in practice, describing the spatial variability of earth science ReVs by objective analysis serves two purposes.
Fig. 5.1 Measurement sites, grid sites, radius of influence, and the estimation site
1) A regular grid of nodes serves as the initial data for numerical forecast models
or mapping methodologies.
2) Coordinates of a rather large number of iso-line points help to construct equal-value lines (contour lines), such as isohyetal maps.
In any spatial prediction, the problems are with spatial (areal and elevation) mod-
eling and with the transfer of information from available irregular measurement
sites to regular grid nodes or to any desired point of interest. In general, the ReV at any location bears some relationship to nearby locations. The strength of this relationship normally decreases as the distance between the locations increases (Chapter 4, Section 4.10). There are various spatial data modeling techniques for data interpolation from the measurement sites to any desired point, as follows (Schlatter, 1988).
1) Surface fitting methods: The first objective analysis method in meteorology was
the surface fitting type devised by Panofsky (1949). In this method, the analysis
value is represented as a continuous mathematical function, which fits irregularly
scattered measurements. Among these methods are polynomial interpolation (Panofsky, 1949), orthogonal polynomials (Dixon, 1969, 1976), splines (Fritsch, 1971), and spectral approaches (Flattery, 1970).
2) Empirical linear interpolations: The value of any variable at a particular location
is estimated as a weighted sum of observations. Among such interpolation tech-
niques are the iterative successive correction methods (Cressman, 1959; Barnes,
1964), which are already explained in Chapters 2 and 4.
3) Statistical objective analysis: These are estimation methods at any desired point
where spatial correlation (dependence) structure determines the weights appli-
cable to each observation. The major approaches in this category are the optimal
interpolation (Gandin, 1963), the covariance models for atmospheric variables (Buell, 1958), adaptive filtering (Kalman, 1960; Şen, 1980, 1983), and, more recently, the CSV method (Şen, 1997).
4) Variational techniques: These include more mathematical abstractions than other
methods and two of them are the incorporation of dynamic constraints (Sasaki,
1958) and the fitting models to data at different times (Ledimet and Talagrand,
1986).
5) Geostatistical approaches: These models are based on the classical SV and Krig-
ing techniques (Matheron, 1963; Clark, 1979; Journel and Huijbregts, 1978).
Various alternatives of the SV are presented by different authors (see Chapter 4).
6) Objective modeling: The analysis of ReV produces the optimum solution in the
sense that interpolation error is minimized on the average. This method allows
for the extraction of as much useful information as possible from the measure-
ments. The problems associated with optimum interpolation analysis can be
summarized as follows.
a) It requires knowledge of the covariance, which is often not known and thus must be estimated from the available data. Establishing such an estimate is often fraught with difficulty, as a host of local factors are involved.
b) Essentially, one must determine the priority about which station measure-
ments are significantly correlated with the value at the point of estimation
(interpolation), i.e., one must determine a region of influence around the
interpolation point as in Fig. 5.1.
$$Z_k^a = \frac{\sum_{i=1}^{n} W_i Z_i^o}{\sum_{i=1}^{n} W_i}, \qquad (5.1)$$
where $Z_k^a$ is the estimated value and the $W_i$'s show the weightings between the i-th site and the grid point k (Fig. 5.1). In the literature, all weighting functions that are
proposed by various researchers appear as functions of the distance between the
sites only. The major drawbacks of such weighting functions are already presented
in Chapter 2.
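As a quick illustration of Eq. (5.1), the short Python sketch below computes a weighted-average estimate at a grid point; the inverse-square distance weighting and all coordinates and values are illustrative assumptions, not quantities from the text.

```python
import numpy as np

# Hypothetical measurement sites (x, y) and their records; the inverse-square
# distance weighting W_i = 1 / d_i**2 is just one common distance-only choice.
sites = np.array([[2.0, 3.0], [7.0, 8.0], [5.0, 1.0]])
values = np.array([12.4, 9.8, 15.1])
grid_k = np.array([4.0, 4.0])

d = np.linalg.norm(sites - grid_k, axis=1)  # distances to grid point k
W = 1.0 / d**2                              # distance-only weights

# Eq. (5.1): weighted average of the observations at grid point k
Z_ak = (W @ values) / W.sum()
print(f"Estimated value at grid point k: {Z_ak:.2f}")
```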
OIM assumes that analyses of data are represented by first guess data plus cor-
rections, which are linear combinations of differences between the first guess and
the observed data. The coefficients of linear combination are determined statistically
so as to minimize the expected square error of data. The coefficients of linear com-
bination are expressed by error covariance matrices of observed and forecast data
(estimated data), when forecast data are used as “first guess.” Thus the covariance
matrix has a great influence on the final estimation. It is assumed that measurements
are spatially correlated and such correlations are calculated as in Chapter 4 (Figs. 4.5
and 4.6). This implies that measurements are close together in clusters, i.e., highly
correlated, and that as they get farther apart they become independent similar to
what has been explained in the SV definition in Chapter 4. Although the method
presented here is a fully 2D version of optimum interpolation, a similar approach is applicable in three dimensions. The main properties, advantages, and limitations of the OIM can be summarized as follows.
1) One of the significant advantages in the use of OIM is the ability to estimate a
variability at any site from the observations at adjacent sites.
2) The interpolation weights depend on the statistical structure of the ReVs mea-
sured as a sequence of time series and not on the individual measurement values
at a given time.
3) The expected analysis errors are produced as a function of the distribution and
accuracy of the data.
4) It is more expensive computationally than other commonly used methods; it is computer intensive as far as the number of computations and the amount of computer storage are concerned.
5) The correlation (dependence) models require a long history of the first guess field for the accurate determination of the empirical coefficients.
6) The statistical error functions are estimates, not exact values.
Furthermore, the basic assumptions that are embedded in any OIM can be sum-
marized as follows.
1) The measurements have a spatial dependence, which implies that as long as they
are close together the spatial dependence is high, otherwise they become more
independent.
2) There is no dependence (no correlation) between the measurements and the first
guess field errors.
3) OIM as described here may be applied to any scalar field when the correlation
and error patterns for that field are known.
4) The field to be analyzed is statistically homogeneous and isotropic.
Consider any grid point, k, with n observations, $Z_i^o$ (i = 1, ..., n), around it, which will be used to calculate the analyzed value, $Z_k^a$, at this grid point. The main idea is that $Z_k^a$ is determined as the sum of a first guess value, $Z_k^g$, at grid point k plus a linear combination of the deviations of the observed values from the first guess values, $(Z_i^o - Z_i^g)$ (see Fig. 5.2). This is to say that

$$Z_k^a = Z_k^g + \sum_{i=1}^{n} W_i \left( Z_i^o - Z_i^g \right), \qquad (5.2)$$
where n is the number of stations (in other words, it may also be considered as
the number of influencing stations). The Wi ’s are the interpolation weights; as intu-
itively mentioned above, one would expect the Wi ’s to be positive and decrease
monotonically with increasing distance from the grid point, k.
Assume that the true value, $Z_k^T$, can be subtracted from both sides of the above equation. Usually, the true value is not known, but some knowledge may be available
Fig. 5.2 Definition sketch of a grid point k and a measurement site i on a grid with spacings ΔX and ΔY, showing the true ($Z^T$), observed ($Z^o$), first guess ($Z^g$), and analyzed ($Z^a$) values
about its statistical parameters, and that will prove useful in the following method.
With these considerations, Eq. (5.2) becomes

$$\left( Z_k^T - Z_k^a \right) = \left( Z_k^T - Z_k^g \right) - \sum_{i=1}^{n} W_i \left( Z_i^o - Z_i^g \right). \qquad (5.3)$$
The values of the guess fields at each observation location may be determined
by using bilinear interpolation. The difference between each observation and corre-
sponding guess values may be computed as
$$Z_i^o - Z_i^g = \left( Z_i^o - Z_i^T \right) + \left( Z_i^T - Z_i^g \right) = \tilde{Z}_i + \hat{Z}_i, \qquad (5.4)$$

where the deviation of the true value $Z_i^T$ from the first guess value at the observation station is $\hat{Z}_i$ (guess error at station i); the deviation of the true value from the first guess at the grid point is $\hat{Z}_k$ (guess error at grid point k); and, finally, $\tilde{Z}_i$ denotes the deviation of the true value from the observed value at station i (the observational error at station i). These can be written as
$$\hat{Z}_i = \left( Z_i^T - Z_i^g \right), \quad \hat{Z}_k = \left( Z_k^T - Z_k^g \right), \quad \tilde{Z}_i = \left( Z_i^o - Z_i^T \right). \qquad (5.5)$$
By substituting Eqs. (5.4) and (5.5) into Eq. (5.3), one can obtain
$$\left( Z_k^T - Z_k^a \right) = \hat{Z}_k - \sum_{i=1}^{n} W_i \left( \tilde{Z}_i + \hat{Z}_i \right). \qquad (5.6)$$
Furthermore, the interpolation weights $W_i$ are obtained under the condition that the mean square error (MSE) of interpolation is minimum, which corresponds to the method of least squares. This condition can be expressed as

$$E = \overline{\left( Z_k^T - Z_k^a \right)^2} = \min, \qquad (5.7)$$

where the over bar denotes an ensemble average in the case of a large grid point
number. Insertion of Eq. (5.6) into Eq. (5.7) and evaluation of the square term
leads to
$$E = \overline{\left[ \hat{Z}_k - \sum_{i=1}^{n} W_i \left( \tilde{Z}_i + \hat{Z}_i \right) \right]^2} = \overline{\hat{Z}_k^2} + \sum_{i=1}^{n} \sum_{j=1}^{n} W_i W_j \overline{\left( \tilde{Z}_i + \hat{Z}_i \right) \left( \tilde{Z}_j + \hat{Z}_j \right)} - 2 \sum_{i=1}^{n} W_i \left( \overline{\hat{Z}_k \tilde{Z}_i} + \overline{\hat{Z}_k \hat{Z}_i} \right)$$

$$= \overline{\hat{Z}_k^2} + \sum_{i=1}^{n} \sum_{j=1}^{n} W_i W_j \left( \overline{\tilde{Z}_i \tilde{Z}_j} + \overline{\tilde{Z}_i \hat{Z}_j} + \overline{\hat{Z}_i \tilde{Z}_j} + \overline{\hat{Z}_i \hat{Z}_j} \right) - 2 \sum_{i=1}^{n} W_i \left( \overline{\hat{Z}_k \tilde{Z}_i} + \overline{\hat{Z}_k \hat{Z}_i} \right), \qquad (5.8)$$
Since the observational errors are assumed to be independent of the first guess errors (assumption 2 above), the cross-product averages vanish and the expression reduces to

$$E = \overline{\hat{Z}_k^2} + \sum_{i=1}^{n} \sum_{j=1}^{n} W_i W_j \left( \overline{\tilde{Z}_i \tilde{Z}_j} + \overline{\hat{Z}_i \hat{Z}_j} \right) - 2 \sum_{i=1}^{n} W_i \overline{\hat{Z}_k \hat{Z}_i}, \qquad (5.10)$$
or, in covariance notation,

$$E = \gamma_{kk} + \sum_{i=1}^{n} \sum_{j=1}^{n} W_i W_j \left( \gamma_{ij} + \tilde{\gamma}_{ij} \right) - 2 \sum_{i=1}^{n} W_i \gamma_{ik}. \qquad (5.11)$$

Here

$$\gamma_{kk} = \overline{\left( Z_k^T - Z_k^g \right)^2}, \qquad (5.12)$$

which represents the variance of the guess error at the grid point k;

$$\gamma_{ij} = \overline{\left( Z_i^T - Z_i^g \right) \left( Z_j^T - Z_j^g \right)} \qquad (5.13)$$

is the covariance of the guess error at location i with the guess error at location j; and

$$\gamma_{ki} = \overline{\left( Z_k^T - Z_k^g \right) \left( Z_i^T - Z_i^g \right)}, \qquad (5.14)$$
which is the covariance of the guess error at the grid point k with the guess error at location i. Finally,

$$\tilde{\gamma}_{ij} = \overline{\tilde{Z}_i \tilde{Z}_j} = \overline{\left( Z_i^o - Z_i^T \right) \left( Z_j^o - Z_j^T \right)}. \qquad (5.15)$$

This is the covariance of the observation error at location i with the observation error at station j.
The statistical interpolation is a minimum variance estimation procedure and
attempts to minimize the expected analysis error variance. The problem is to find
the weights $W_i$ that minimize the variance. Hence, differentiating Eq. (5.11) with respect to each of the weights $W_i$ and equating the result to zero, after the necessary algebraic manipulations, leads succinctly to
$$\sum_{i=1}^{n} W_i \left( \gamma_{ij} + \tilde{\gamma}_{ij} \right) = \gamma_{kj}, \qquad j = 1, \ldots, n. \qquad (5.16)$$
Interpolation of Eq. (5.2) using weights from Eq. (5.16) is called “optimum interpo-
lation.” However, these weights are optimal only if the observation and first guess
error covariance are correct. On the other hand, if the assumed observation and first
guess error covariance are not correct, then Eq. (5.7) is not minimized strictly. In this
case, the interpolation weights in Eq. (5.10) are not optimal and it is called statistical
interpolation.
Multiplication of Eq. (5.16) by $W_j$ and summation for j = 1, ..., n leads to

$$\sum_{i=1}^{n} \sum_{j=1}^{n} \left( \gamma_{ij} + \tilde{\gamma}_{ij} \right) W_i W_j = \sum_{j=1}^{n} W_j \gamma_{kj}. \qquad (5.17)$$
Finally, subtraction of this expression from Eq. (5.11) gives the minimum interpo-
lation error as
$$E = \gamma_{kk} - \sum_{i=1}^{n} W_i \gamma_{ik}. \qquad (5.18)$$
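The system in Eq. (5.16) and the minimum error in Eq. (5.18) reduce to a few lines of linear algebra. The following minimal Python sketch assumes an exponential guess-error covariance model and uncorrelated observation errors; the station coordinates and all parameter values are illustrative assumptions, not taken from the text.

```python
import numpy as np

# Hypothetical station coordinates (km) and a grid point k; the exponential
# guess-error covariance gamma(d) = exp(-d / 200) and the observation error
# variance of 0.1 are illustrative assumptions only.
stations = np.array([[10.0, 20.0], [80.0, 45.0], [40.0, 90.0], [120.0, 30.0]])
grid_k = np.array([60.0, 50.0])

def guess_cov(d):
    """Assumed guess-error covariance as a function of separation distance."""
    return np.exp(-d / 200.0)

# Pairwise station distances and distances from stations to the grid point
d_ij = np.linalg.norm(stations[:, None, :] - stations[None, :, :], axis=2)
d_ki = np.linalg.norm(stations - grid_k, axis=1)

gamma_ij = guess_cov(d_ij)                  # guess-error covariances, Eq. (5.13)
gamma_tilde = 0.1 * np.eye(len(stations))   # observation-error covariances, Eq. (5.15)
gamma_kj = guess_cov(d_ki)                  # grid-to-station covariances, Eq. (5.14)

# Optimum interpolation weights from Eq. (5.16)
W = np.linalg.solve(gamma_ij + gamma_tilde, gamma_kj)

# Minimum interpolation error from Eq. (5.18), with gamma_kk = guess_cov(0) = 1
E = guess_cov(0.0) - W @ gamma_kj
print("weights:", W.round(4), " minimum error:", round(E, 4))
```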
Fig. 5.3 Location map of the study area in Turkey, bounded by the Black Sea, the Aegean Sea, and the Mediterranean Sea, with the Sivas station marked and distance circles at 100-km intervals (axes in km)
Fifty-two stations with monthly rainfall measurements are selected for the 30-year period from 1960 to 1990. The spatial scatter of these measurement sites is given in Fig. 5.4, which shows an irregular pattern. In general, the transfer of information from the measurement sites to grid points is necessary for any type of modeling, whether a numerical solution or mapping of the ReVs. Here, each site has a monthly precipitation time series; the same treatment applies equally to monthly average groundwater levels, groundwater quality values (calcium, magnesium, sodium, potassium, sulfate, bicarbonate, chloride, nitrate, total dissolved solids, electric conductivity, etc.), and the like.
In practical applications of the optimum interpolation such as the analysis of rain-
fall, one uses the climatological mean as the first guess value. Hence, the following
expression is considered for the interpolation point value (Habib, 1999).
Fig. 5.4 Spatial distributions of the data points (measurement sites and grid points; easting versus northing)
$$Z_k^a = \overline{Z}_k + \sum_{i=1}^{n} W_i \left( Z_i^o - \overline{Z}_i \right), \qquad (5.19)$$
where $Z_k^a$ and $Z_i^o$ are the calculated and observed rainfall values, and $\overline{Z}_k$ and $\overline{Z}_i$ are the corresponding arithmetic averages (climatological means) at the interpolation point k and the observation stations i = 1, 2, ..., n; the $W_i$'s are the interpolation weights. In order to calculate these weights, the interpolation formula can be obtained by multiplying both sides of Eq. (5.19) by $(Z_j^o - \overline{Z}_j)$ and taking the expectation, leading to
$$\sum_{i=1}^{n} W_i \rho_{ij} = \rho_{kj}, \qquad (5.20)$$
where $\rho_{ij}$ is the spatial correlation coefficient between stations i and j, and $\rho_{kj}$ that between stations j and k. This equation is valid when $E(Z_k) = E(Z_k^a)$, which implies an unbiased estimator. In short, the interpolation weights, $W_i$, are dependent
on the statistical structure of the spatial correlation function (SCF) of the rainfall
records at irregular sites. Once the SCFs are obtained from the available data, then
the value of ρkj can be read from this function depending on the distance between k
and j, and, consequently, the only unknowns in Eq. (5.20) are the weights that can be
calculated from the set of n linear equations. The correlation functions are presented
in the previous chapter (Figs. 4.5 and 4.6). The expected analysis error, $\varepsilon_{kj}$, at grid point k, which results from using the information at location j, can be expressed as (Habib, 1999)

$$\varepsilon_{kj} = 1 - \sum_{i=1}^{n} W_i \rho_{ki} = 1 - \rho_{kj}. \qquad (5.21)$$
Most often in practice, $0 < \rho_{kj} < 1$ and therefore $0 \le \varepsilon_{kj} \le 1$. It is obvious that the expected error does not depend directly on the observed values but again on the spatial statistical structure of the rainfall amounts. In the light of the aforementioned discussion, the following OIM steps are necessary for practical applications.
Fig. 5.5 Flow chart of the OIM application steps, looping over the stations (i = 1, 2, ...) and repeating until the computation is complete for all grid points
The sample SCF values are represented by the negative-exponential model

$$R(d) = a + b e^{-d/c}, \qquad (5.22)$$

where R(d) represents the theoretical SCF and d is the distance between the i-th and j-th stations; a, b, and c are the model parameters. These parameters
are determined by fitting a mathematical function to the array of computed corre-
lation versus corresponding distances between measurement sites. Each one of the
monthly average SCF is fitted to this model by employing the ordinary least squares
regression approach. The resulting parameter values for each month are presented
in Table 5.1, in addition to the overall monthly average parameter values.
All the monthly spatial correlation functions (SCF), say for the Sivas station (see Fig. 5.3), are confined between the January and July curves, as one can see from Fig. 5.6.

Table 5.1 Monthly values of the SCF model parameters a, b, and c
Fig. 5.6 Spatial correlation function of Sivas city for different months
It is also clear from this figure that the summer months have lower SCF values than the winter months, and that the maximum spatial continuity appears in the winter months. This is due to the fact that in this region cyclonic (frontal) rainfall types are dominant in winter, whereas the convective type of rather local rainfall occurs in the summer season. Of course, orographic rainfall effects play a more effective role in the winter season than in summer; however, it is not possible to identify this type of rainfall from the SCFs.
At large distances, February has a strongly persistent SCF, which implies that in this month cyclonic rainfall types prevail, because they cover large areas. Similar trends are also observed in April and May. On the other hand, comparatively steeper SCFs indicate convective rainfall types, which appear over rather smaller areas during the summer months.
It is possible to show the variation of the SDF for any measurement station as the center of concentric contours. The extreme and average SCFs for Sivas city are shown in Fig. 5.7.
The benefit from these figures is that at any given month of the year and a given
correlation level, say 0.050, it is possible to determine the influence area around the
station.

Fig. 5.7 Maximum, average, and minimum spatial correlation functions for Sivas city (correlation versus distance, in km)

On the average, the minimum correlation value for 200 km (other distances
have similar interpretations) appears in July and the maximum correlation value is
in January. It is also clear from this figure that summer months have less spatial
correlation values than winter months. Table 5.2 shows the values of the percentage
of the variance of observational error, which has a minimum of 14.32 in March and
a maximum of 50.46 in November. Similarly, the correlation of the observed values
varies between a minimum of 0.4975 in November and a maximum of 0.8567 in
March.
It is possible to find the correlation, ρ(d), of the true (first guess) values of the meteorological variables at distance d as

$$\rho(d) = \frac{R(d)}{R(0)}, \qquad (5.23)$$
where R(d) and R(0) are the correlations of the observed values at d and at zero
distances, respectively. This equation is used to calculate the correlation of the first
guess (true) values of the ReVs. The negative-exponential model in Eq. (5.22) at
zero distance can be defined as
$$R(0) = a + b, \qquad (5.24)$$
and, finally, by substituting Eqs. (5.22) and (5.24) into Eq. (5.23) it is possible to obtain

$$\rho(d) = \frac{a + b e^{-d/c}}{a + b}. \qquad (5.25)$$
This expression can be used for the indirect evaluation of the correlation of the
first guess (true) values of ReVs by means of extrapolating the correlation of the
observed values at zero distance d = 0 as shown in Fig. 5.8, or by using Eq. (5.23)
or Eq. (5.25) for finding the correlation of the true values.
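The fitting of Eq. (5.22) and the normalization of Eq. (5.25) can be sketched in Python as follows; the sample correlation–distance pairs below are synthetic placeholders, not the Sivas values of Table 5.1.

```python
import numpy as np
from scipy.optimize import curve_fit

# Synthetic sample SCF values (distance in km, correlation); placeholders only.
d_obs = np.array([50.0, 100.0, 200.0, 400.0, 600.0, 800.0])
r_obs = np.array([0.78, 0.70, 0.55, 0.35, 0.22, 0.15])

def scf_model(d, a, b, c):
    """Negative-exponential SCF, Eq. (5.22): R(d) = a + b * exp(-d / c)."""
    return a + b * np.exp(-d / c)

# Ordinary least squares fit of the model parameters
(a, b, c), _ = curve_fit(scf_model, d_obs, r_obs, p0=(0.1, 0.8, 300.0))

# First-guess (true-value) correlation, Eq. (5.25): rho(d) = R(d) / R(0)
def rho(d):
    return scf_model(d, a, b, c) / (a + b)

print(f"a = {a:.3f}, b = {b:.3f}, c = {c:.1f}, rho(200 km) = {rho(200.0):.3f}")
```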
Fig. 5.8 Extrapolation of the observed-value correlation function, R(d), toward zero distance, where ρ(0) = 1
The equal-correlation contours (Fig. 5.9) are very dense from March to October. The reason for such behavior is that the rainfall occurrences are comparatively more sporadic (i.e., regionally random) during the March–October period. This is also because the convective rainfall occurrences appear almost independently from each other. However,
areal continuity of winter rainfall is a signature of cyclonic weather movements.
Such continuity results in comparatively very small expected error amounts, say
for instance in January (Fig. 5.9), where errors vary between 0.05 and 0.15 over
the Anatolia within Turkey. On the contrary, in July (Fig. 5.9) error band varies
between 0.20 and 0.70. Table 5.3 shows the relation between expected mean square
error (MSE) and months.
One can note that the maximum expected error appears in summer months (for
example, in July 0.3480), because the rainfall has more discontinuous ReV as
explained above. On the other hand, the minimum expected error is in winter months
(for example, in January 0.0193), because the rainfall in this season is areally exten-
sive and more continuous.
Fig. 5.9 Equal correlation lines between Sivas city and other stations
Table 5.3 Monthly expected mean square errors (MSE)

Month      Expected MSE
January    0.0193
February   0.0935
March      0.0834
April      0.1700
May        0.2500
June       0.1000
July       0.3480
August     0.1200
September  0.0657
October    0.0994
November   0.0238
December   0.0463
Average    0.1141
A further practical point is related to the number of observations permitted to influence each grid point. The choice of a search strategy that controls which stations are included in the interpolation procedure is an important consideration in any approach to OIMs. The most common approach in choosing the stations that contribute to the interpolation is to define a search neighborhood within which all available stations will be used. Herein, a simple search strategy is adopted using all stations within a circular search neighborhood with a limited radius of influence.
Meleschko and Prigodich (1964) have shown that the interpolation error reaches
a minimum at about six to eight measurements and shows no further improvement
with the inclusion of more sites. In order to test this idea with the data at hand, the change of the expected MSE is plotted versus the number of neighboring stations for each month; such graphs turn out to be very similar to each other.
It is objectively seen from Fig. 5.10 that, on the average, the result does not change significantly above four influencing stations. For larger station numbers, the expected error mean square remains at almost the same minimum level.

Fig. 5.10 Mean square of expected error versus the number of influencing stations
Table 5.4 Statistical comparison of the observed and estimated values

Statistic   Observed   Estimated
n           52         52
Mean        76.6       74.70
Minimum     5          0
Maximum     361        266.9
Range       356        267
Variance    3407.60    2905.61
Std. dev.   58.37      53.90
Fig. 5.11 Scatter of the estimations versus the observed values (mm)
The monthly mean measurements and estimations are very close to each other, with less than 1% error. Although there are more discrepancies for the monthly standard deviations (generally about 2.5%), high errors appear in November and December.
Fig. 5.12 Observed and estimated monthly rainfall values (mm) and their standard deviations for months 1–12
1) Establish the theoretical basis for expressing the structural properties of a natural
phenomenon in a useful mathematical form as SV or CSV.
2) Provide a particular means for solving various problems of estimation, such as the Kriging methodology, which guarantees a solution to the estimation problem and deals with the ReV by using the probabilistic theory of random functions.
By now, however, there are a number of excellent books on the subject, includ-
ing both introductory (Clark, 1979) and advanced aspects (David, 1977; Journel
and Huijbregts, 1978). The classical parametric and non-parametric geostatistics
are introduced recently for application in the field of earthquake ground motion
evaluation (Glass, 1978; Carr and Glass, 1984, 1985; Carr et al., 1985).
Almost all variables encountered in the earth sciences can be regarded as ReVs. For seismic zonation this is an ideal framework for describing earthquake ground motions. Each observation of ground motion can be considered simply as a unique realization of a ReV, which adequately represents local random behavior tempered by global attenuation. Furthermore, provided that a valid SV can be developed, Kriging is well suited for the estimation process, resulting in data regionalization.
Kriging uses the fitted SV model in calculating estimates at a set of grid nodes. These Kriging estimates are best-linear-unbiased estimates (BLUE) of the ReV at a set of locations, provided that the surface is stationary and the correct form of the theoretical SV has been determined (Chapter 4).
In many disciplines such as petroleum exploration, mining, groundwater, and
water pollution analysis, data are available at a set of predetermined spatial locations
(water and oil wells, meteorology stations, etc.). The purpose is to make regional
estimation at any location based on the available data at these locations. It is often necessary to have maps based on a regular grid, and the estimations are used to produce 2D contour maps or 3D surface plots. In theory, only the Kriging method of grid generation can produce the best estimates (in the sense of being unbiased and
having minimum error). In practice, the effectiveness of the technique depends on
the correct specification of several parameters that describe the SV and the model
of the drift (regional trend). However, because Kriging is robust, even with a naive
selection of parameters it will do no worse than conventional grid estimation proce-
dures (Chapter 2).
The price that must be paid for optimality in estimation is computational com-
plexity compared to the techniques presented in Chapter 3. A large set of simultane-
ous equations must be solved for every grid node estimated by Kriging. Therefore,
computer run times will be significantly longer if a map is produced by Kriging
rather than by conventional gridding. Kriging can be computationally very intense but is increasingly available in software packages, and it is the best method for many purposes. In addition, an extensive prior study of the data must be made to test for stationarity, determine the form of the SV, set the neighborhood size, and select the proper order of the drift, if any. These properties are not independent
and because the system is underdetermined, trial-and-error experimentation may be
necessary to determine the best combination. For this reason, to warrant the addi-
tional costs of analysis and processing, Kriging probably should be applied in those
instances where the best possible estimates of the surface are essential, the data are
of reasonably good quality, and the estimates of the error are needed.
1) The arithmetic average, m (the expected value of any term), of the ReV is the same all over the area,

$$E \left[ Z(x) \right] = m, \qquad (5.26)$$

or, equivalently,

$$E \left[ Z(x) - Z(x + d) \right] = 0. \qquad (5.27)$$
2) The spatial covariance of the ReV is the same all over the field of interest.

The validity of this expression holds especially if the ReV has a Gaussian (normal) pdf. Estimation of the SV is preferable to estimation of the covariance, because the experimental SV does not require a prior estimate of the population mean. Besides, the SV calculation requires a set of measurements with no time variation. Under the same condition, the relationship between the model autocovariance, ρ(d), and the SV of a standardized ReV,

$$\rho(d) = 1 - \gamma(d), \qquad (5.31)$$

is used in practice. For instance, a non-linear estimator can be built by prior transformation of the data: disjunctive Kriging (Matheron, 1971), lognormal Kriging (Krige, 1951), and indicator Kriging (Journel, 1983) are examples of non-linear estimators.
All Kriging techniques are based on simple linear models of the form

$$Z_E = \lambda_1 Z_1 + \lambda_2 Z_2 + \cdots + \lambda_n Z_n, \qquad (5.32)$$

where $Z_E$ is the estimator of the true value at location E, the $Z_i$ are the measurements, and the $\lambda_i$ are the weights allocated to each observation such that

$$\sum_{i=1}^{n} \lambda_i = 1. \qquad (5.33)$$
1) If one has a model for the SV (or CSV), the minimum-variance estimate can be produced using the Kriging technique.
2) If the proper models are used for the SV (or CSV) and the system is set up correctly, then there is always a unique solution to the Kriging system.
3) If one has regular sampling, and hence the same sampling set up at many dif-
ferent positions within the region, it is not necessary to recalculate the Kriging
system each time.
4) Kriging is not limited to a single point estimation of the given magnitude Z, but
can also be used
a) to estimate the mean value, Z, on a given block, e.g., on the mesh of model
or a sub-domain of any shape of watershed,
b) to obtain the estimation variance of magnitude Z, i.e., roughly the confidence
interval of this estimation,
c) to locate the best situation for a new measurement point, e.g., by minimizing
the overall uncertainty in the field under consideration.
5) Kriging is advantageous because it considers the following points explicitly.
a) The number and spatial configuration of the observation points within the study region
For instance, in an earthquake ground motion study, Kriging can be used for
evaluating the earthquake ground motion hazard within the region of interest and
for estimating the tripartite earthquake response spectra at a site of interest. It is
already stated that potential errors in the data collection may lead to the nugget effect (Chapter 4), which becomes evident especially in the sample SV, and its existence causes a smoothing operation through Kriging and less confidence in individual data points relative to the overall trend of the data. It has the same units as the SV. There
are two components that give rise to nugget effect, namely variances due to error,
σe2 , and separation, σs2 . The latter variance is a measure of variation that occurs at
separation distances of less than the typical nearest neighbor sample spacing. The
more the random fluctuation of the same data at a given location, the greater is the
error variance and the less is the prediction reliability. Consequently, Kriging tends
to smooth the surface, and therefore it is not a perfect estimator.
Once the degree of spatial dependence has been established, then the SV can be
used to interpolate values for points not measured through the process of Kriging,
which is an interpolation method that uses the sample (empirical, experimental)
SV to weight sample points based on their locations in space relative to the point
value that is to be estimated. Therefore, a first step in Kriging is to fit a theoretical function to the sample SV, which then describes the theoretical SV. Kriging has many things
in common with traditional point-interpolation methods such as inverse distance
(or square) weighting, triangulation, polygonization, etc. methodologies, which are
explained in Chapter 2. The result of Kriging is an interpolated surface map of ReVs.
There are several types of Kriging methodology.
4) Block Kriging: Its interpolations are based on values in a particular finite area.
It is a more accurate and intensive computation that uses point estimates within
a block to derive an average estimate for the block.
5) Co-Kriging is a modification of ordinary Kriging that relies on the fact that many
phenomena are multivariate and that the primary variable of interest is under-
sampled. Co-Kriging estimation is done by minimizing the variance of the esti-
mation error using the cross-correlation (dependence) between several variables.
Estimates are derived for both the primary and the secondary variables. The co-
Kriging technique is a modification of the simpler technique of Kriging. It is
used to merge two variables or more. Estimation of co-Kriging contains a pri-
mary variable of interest and one or more secondary variables. Improvement in
the interpolation of one variable by using other variables is important (David,
1977; Journel and Huijbregts, 1978; Myers et al., 1982). There are two steps in
co-Kriging estimation.
a) Evaluation of the cross-correlation (or co-SVs) between variables to obtain
information about continuity and dependencies.
b) Construction of contour maps for the primary variable.
Seo (1998) used linear co-Kriging to interpret rainfall data from a set of rain
gages and radar information. He concludes that the consistency of the improvement
by gage-radar estimation makes co-Kriging an attractive tool in rainfall estimation.
Martinez (1996) applied co-Kriging to improve the accuracy of evapo-transpiration
estimation over a regular grid by including the effects of topography.
Among other spatial estimation techniques, the Kriging methodology has the following advantages. In the application of the Kriging algorithm, these enter through four successive essential steps, as follows.
1) When computing the interpolation weights, the algorithm considers the spacing
between the point of estimation and the data sites. The algorithm considers also
the inter-data spacing, which allows for declustering.
2) When computing the interpolation weights, the algorithm considers the inherent
length scale of the data. For example, there are regions where the topography
varies much more slowly than some other regions. If two points’ elevations at
the same distance are considered in these two different topographies, then in the
slowly changing regional elevation case it would be reasonable to assume a lin-
ear variation between these two observations, while in the other region such an
assumed linear variation would be unrealistic. The algorithm adjusts the inter-
polation weights accordingly.
3) When computing the interpolation weights, the algorithm considers the inherent
trustworthiness of the data. If the data measurements are exceedingly precise and
accurate, the interpolated surface goes through each and every observed value. If
the data measurements are suspect, the interpolated surface may not go through
an observed value, especially if a particular value is in stark disagreement with
neighboring observed values. This is an issue of data repeatability.
4) Natural phenomena are created by physical processes, which have often pre-
ferred orientations. For example, at the mouth of a river the coarse material set-
tles out fastest, while the finer material takes longer to settle. Thus, the closer
one is to the shoreline the coarser the sediments, while the further from the
shoreline the finer the sediments. When computing the interpolation weights,
the algorithm incorporates this natural anisotropy. When interpolating at a point,
an observation 100 m away but in a direction parallel to the shoreline is more
likely to be similar to the value at the interpolation point than is an equidistant
observation in a direction perpendicular to the shoreline.
The last three items incorporate something about the underlying process from
which the observations are taken. The length scale, data repeatability, and anisotropy
are not a function of the data locations. These enter into the Kriging algorithm via the SV (or CSV): the length scale is given by the SV range (or slope), the data repeatability is specified by the nugget effect, and the anisotropy by the directional behavior of the SV (Chapter 4).
Simple Kriging rests on the following assumptions.
1) The spatial sampling points are representatives of the ReV at a set of given locations with measurement values.
2) The ReV is considered as a second-order random field variable with mean, vari-
ance, and SV.
3) The mean of ReV is known, which limits the application of this Kriging model-
ing alternative severely.
In practical applications there are many cases where the areal mean of the ReV is known, and hence direct application of the simple Kriging methodology becomes possible.
Fig. 5.14 Measurement sites $Z_1, Z_2, \ldots, Z_n$ (dots) and the estimation site $Z_E$ (scale 0–10 km)
$$Z_E = \overline{Z} + \sum_{i=1}^{n} \lambda_i \left( Z_i - \overline{Z} \right), \qquad (5.34)$$
where $\overline{Z}$ indicates the constant regional mean value of the ReV. If there are n neighboring sites for the estimation calculation, then there will be $n^2$ elements in the covariance (or SV) matrix, with the variances on the main diagonal. Due to the diagonal symmetry, the number of distinct off-diagonal elements in the matrix is equal to n(n − 1)/2, as follows.
$$C = \begin{bmatrix} \operatorname{var}(z_1) & \operatorname{cov}(z_1, z_2) & \cdots & \operatorname{cov}(z_1, z_n) \\ \operatorname{cov}(z_2, z_1) & \operatorname{var}(z_2) & \cdots & \operatorname{cov}(z_2, z_n) \\ \vdots & \vdots & \ddots & \vdots \\ \operatorname{cov}(z_n, z_1) & \operatorname{cov}(z_n, z_2) & \cdots & \operatorname{var}(z_n) \end{bmatrix} \qquad (5.35)$$
In this matrix each element also depends on the distance (relative distance) between the two sites. It is obvious that $\operatorname{Cov}(z_i, z_j) = \operatorname{Cov}(z_j, z_i)$ and, furthermore, $\operatorname{Cov}(z_i, z_i) = \sigma_i^2$, which is the variance at site i. However, if the ReV is standardized with the constant regional mean, $\overline{Z}$, and variance, $\sigma_Z^2$, then $\operatorname{Cov}(z_i, z_i) = 1$ and the covariance corresponds to the dependence (correlation) coefficient, $\rho(z_1, z_2) = \operatorname{cov}(z_1, z_2)$. For a standardized ReV, Eq. (5.35) takes the following form.
$$\rho = \begin{bmatrix} 1 & \rho(z_1, z_2) & \cdots & \rho(z_1, z_n) \\ \rho(z_2, z_1) & 1 & \cdots & \rho(z_2, z_n) \\ \vdots & \vdots & \ddots & \vdots \\ \rho(z_n, z_1) & \rho(z_n, z_2) & \cdots & 1 \end{bmatrix} \qquad (5.36)$$
This is the regional correlation matrix for ReV. Similar to this matrix one can write
also the distance matrix, D, between these n sites, with zero distances along the main
diagonal as follows.
$$D = \begin{bmatrix} 0 & \operatorname{dis}(z_1, z_2) & \cdots & \operatorname{dis}(z_1, z_n) \\ \operatorname{dis}(z_2, z_1) & 0 & \cdots & \operatorname{dis}(z_2, z_n) \\ \vdots & \vdots & \ddots & \vdots \\ \operatorname{dis}(z_n, z_1) & \operatorname{dis}(z_n, z_2) & \cdots & 0 \end{bmatrix} \qquad (5.37)$$
This matrix is also symmetrical with respect to the main diagonal. Hence, both the
covariance and the distance matrices provide information in the form of, say, upper
triangular matrices. The plot of distance matrix values on the horizontal axis versus
corresponding regional correlation coefficients from the correlation matrix provides
a general shape as in Fig. 5.15, which may be referred to as the regional dependence
(correlation) function. Logically, as the distance increases the correlation coefficient
between the ReV values decreases, and therefore Fig. 5.15 has a decreasing trend
with distance and theoretically this function should be asymptotic to the horizontal
axis. Consideration of Eq. (5.31) with unit variance yields the corresponding SVs in
the same figure.
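A minimal Python sketch of this construction is given below; the coordinates and records are purely synthetic, so the resulting pairs will not show the decay of Fig. 5.15 that real ReV records would produce, but the mechanics of building D (Eq. 5.37) and the correlation matrix (Eq. 5.36) are the same.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical configuration: 6 sites with coordinates (km) and 120 monthly
# records each; both the coordinates and the records are synthetic.
coords = rng.uniform(0.0, 100.0, size=(6, 2))
records = rng.standard_normal((6, 120))

# Distance matrix D, Eq. (5.37), and regional correlation matrix, Eq. (5.36)
D = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
R = np.corrcoef(records)

# The upper-triangular pairs give the (distance, correlation) scatter whose
# trend is the regional dependence function sketched in Fig. 5.15.
iu = np.triu_indices_from(D, k=1)
for dist, corr in sorted(zip(D[iu], R[iu]))[:5]:
    print(f"d = {dist:6.1f} km, rho = {corr:+.3f}")
```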
The general expression in Eq. (5.34) for the simple Kriging can be rewritten for
a standardized ReV as
Fig. 5.15 Regional dependence (correlation) function, ρ(d), and the corresponding SV, γ(d), versus distance
$$z_E = \sum_{i=1}^{n} \lambda_i z_i. \qquad (5.38)$$
Here there are n unknowns and accordingly n equations are necessary for the simul-
taneous solution. For this purpose, both sides of Eq. (5.38) are multiplied by each
measurement ReV variable and then the averages (expectations) are taken. The
resultant set of equations becomes
$$\begin{aligned} \sum_{i=1}^{n} \lambda_i \rho(z_i, z_1) &= \rho(z_E, z_1) \\ \sum_{i=1}^{n} \lambda_i \rho(z_i, z_2) &= \rho(z_E, z_2) \\ &\;\;\vdots \\ \sum_{i=1}^{n} \lambda_i \rho(z_i, z_n) &= \rho(z_E, z_n) \end{aligned} \qquad (5.39)$$
In order to bring this set of simultaneous equations into a matrix form, the following
additional succinct vector definitions are necessary. The unknown column vector is
$$\boldsymbol{\lambda} = \begin{bmatrix} \lambda_1 \\ \lambda_2 \\ \vdots \\ \lambda_n \end{bmatrix} \qquad (5.40)$$
Finally, the right-hand side of Eq. (5.39) represents the known part, say, the column vector, B, which is defined as

$$B = \begin{bmatrix} \rho(z_E, z_1) \\ \rho(z_E, z_2) \\ \vdots \\ \rho(z_E, z_n) \end{bmatrix}. \qquad (5.41)$$

With the correlation matrix C of the measurement sites, the simultaneous equations take the compact form $C \boldsymbol{\lambda} = B$, and hence

$$\boldsymbol{\lambda} = C^{-1} B. \qquad (5.42)$$
After the determination of the weighting values, $\lambda_i$, from this last expression, their substitution into Eq. (5.38) leads to the estimation of the standardized ReV, which is then converted to the non-standardized (original) ReV as

$$Z_E = \overline{Z} + \sigma_Z z_E, \qquad (5.43)$$

where $z_E$ is the standardized estimation value with zero mean and unit variance.
The simple Kriging procedure can also be applied by considering the relation-
ship between the SV and the covariance function as given in Eq. (5.31). However,
the application of this transition between the covariance and the corresponding SV
will be reliable only in the case of the normally distributed ReVs. Otherwise, the
results obtained from the use of the covariance will be biased. The reader can see
the difference by applying the simple Kriging once with the covariance and then
with the SV functions of the same set of ReV measurements. The simple Kriging is equivalent to the multiple regression procedure, where the covariance is used for the parameter estimation.
All that has been explained in this section is based on the covariance function for the depiction of spatial dependence. Since, for a standardized ReV, the covariance and the SV functions are related according to Eq. (5.31) as ρ(d) = 1 − γ(d), the replacement of the covariance terms in all the equations of this section provides an alternative spatial modeling of the ReV based on the SV. The variance of the estimation in the case of covariance use is

$$\sigma_E^2 = 1 - \boldsymbol{\lambda}^T B. \qquad (5.44)$$

When the SV is used for the spatial modeling, the same estimation variance becomes

$$\sigma_E^2 = \boldsymbol{\lambda}^T B. \qquad (5.45)$$
A critique of the simple Kriging is that it depends on the preservation of the statistical property of the covariance (or SV) function in the final estimations. In other words, the spatial estimation is achieved in such a way that the overall spatial dependence function (SDF) of the ReV is preserved throughout the procedure. Unfortunately, neither the cross-validation nor the unbiasedness procedures are applied explicitly in the simple Kriging procedure.
Example 5.1
The earthquake magnitude measurements at five stations are presented in Table 5.5
with the positions in Fig. 5.16. The spatial estimation, ZE , will be obtained from all
the five measurement sites.
The distance matrix between each pair of data is calculated with the following
results.
$$D = \begin{bmatrix} 0 & & & & \\ 0.534689 & 0 & & & \\ 0.349005 & 0.326195 & 0 & & \\ 0.664208 & 0.185368 & 0.370756 & 0 & \\ 0.483628 & 1.015764 & 0.772863 & 1.129264 & 0 \end{bmatrix}$$
Fig. 5.16 Locations of the measurement sites $Z_1$–$Z_5$ and the estimation site $Z_E$ (easting 24.5–24.7 km, northing 39.6–41 km)
On the other hand, the distances between the estimation site, $Z_E$, and the measurement sites are calculated in the same manner.
There are two different ways to calculate the SV values in practical works: either from a given small sample, as in Table 5.5, without knowing the basic structure of the sample SV, or after defining the structural form of the sample SV from a large number of data, preferably more than 30 data values. The former approach gives a matrix whose elements are the half-squared differences subtracted from 1 according to Eq. (5.46); for the earthquake magnitudes in Table 5.5 it becomes
$$\gamma = \begin{bmatrix} 0 & & & & \\ 1 - 0.00125 & 0 & & & \\ 1 - 0.05445 & 1 - 0.03920 & 0 & & \\ 1 - 0.03920 & 1 - 0.02645 & 1 - 0.00125 & 0 & \\ 1 - 0.02000 & 1 - 0.01125 & 1 - 0.00845 & 1 - 0.00320 & 0 \end{bmatrix}$$
It is not possible to estimate the SV values in this vector because the earthquake value at the prediction location is not known. Therefore, it is necessary to know the global SV, which would depend on many location records, and it is assumed herein that an a priori structural analysis had produced the sample SV model as a linear model (the numerical values below imply $\gamma(d) = 0.015 + 0.1\,d$).
Now one can calculate the SV value from the distances between the estimation point
and other surrounding points in Fig. 5.16, which leads to
$$B = \begin{bmatrix} 1 - 0.0447151 \\ 1 - 0.0434038 \\ 1 - 0.0242392 \\ 1 - 0.0524746 \\ 1 - 0.0904616 \end{bmatrix}$$
Of course, it is now possible to calculate the matrix in Eq. (5.46) according to the distance matrix above by using the large-sample SV equation, which leads to

$$\gamma = \begin{bmatrix} 0 & & & & \\ 1 - 0.0684689 & 0 & & & \\ 1 - 0.0499005 & 1 - 0.0476195 & 0 & & \\ 1 - 0.0814208 & 1 - 0.0335368 & 1 - 0.0520756 & 0 & \\ 1 - 0.0633628 & 1 - 0.1165764 & 1 - 0.0922863 & 1 - 0.1279264 & 0 \end{bmatrix}$$
The final solution can be found by taking the inverse of this matrix, which appears as follows.

$$\gamma^{-1} = \begin{bmatrix} -0.7747 & 0.3017 & 0.2867 & 0.2763 & 0.2058 \\ 0.3017 & -0.7773 & 0.2822 & 0.2041 & 0.2866 \\ 0.2867 & 0.2822 & -0.7600 & 0.2391 & 0.2558 \\ 0.2763 & 0.2041 & 0.2391 & -0.9102 & 0.4471 \\ 0.2058 & 0.2866 & 0.2558 & 0.4471 & -0.9829 \end{bmatrix}$$
Hence, application of Eq. (5.42) leads to the final weight values as

$$\boldsymbol{\lambda} = \begin{bmatrix} 0.2801 \\ 0.2668 \\ 0.2641 \\ 0.2387 \\ 0.2526 \end{bmatrix}$$
It is now possible to calculate the prediction value from Eq. (5.32), which gives $Z_E = 2.65$.
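The matrix algebra of this example is easy to verify with a few lines of Python; the matrices below are transcribed from the example as printed (the earthquake magnitudes of Table 5.5 are not reproduced in the text, so only the weights are checked here).

```python
import numpy as np

# Symmetric matrix of the example: off-diagonal entries are 1 - gamma(d)
# from the large-sample linear SV model; the printed diagonal is 0.
g = np.array([
    [0.0,       0.0684689, 0.0499005, 0.0814208, 0.0633628],
    [0.0684689, 0.0,       0.0476195, 0.0335368, 0.1165764],
    [0.0499005, 0.0476195, 0.0,       0.0520756, 0.0922863],
    [0.0814208, 0.0335368, 0.0520756, 0.0,       0.1279264],
    [0.0633628, 0.1165764, 0.0922863, 0.1279264, 0.0],
])
C = 1.0 - g               # entries printed as "1 - ..." in the example
np.fill_diagonal(C, 0.0)  # keep the zero diagonal as printed

# Right-hand-side vector between the estimation site and the five stations
B = 1.0 - np.array([0.0447151, 0.0434038, 0.0242392, 0.0524746, 0.0904616])

lam = np.linalg.solve(C, B)  # Eq. (5.42): lambda = C^-1 B
print("weights:", lam.round(4))
# Expected to be close to (0.2801, 0.2668, 0.2641, 0.2387, 0.2526); with the
# Table 5.5 magnitudes, Eq. (5.32) then gives Z_E = 2.65.
```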
5.7 Ordinary Kriging

This procedure shares the first two assumptions of the simple Kriging, but conflicts with the third one: the regional mean is assumed to be constant but unknown. Hence, it is not possible to apply the standardization procedure to the ReV measurements, because the mean value is unknown. An expression similar to Eq. (5.34) can be written, but with the consideration of the unknown regional mean value, m, as
$$Z_E = m + \sum_{i=1}^{n} \lambda_i \left( Z_i - m \right). \qquad (5.48)$$
Comparison of this with Eq. (5.34) indicates that, rather than the standardized ReV variables, the non-standardized ReVs are used. If both sides of this last expression are arithmetically averaged, one can then obtain

$$m = m + m \left( \sum_{i=1}^{n} \lambda_i - 1 \right),$$

which yields the restrictive condition, as has already been given by Eq. (5.33).
Hence, the first rule in the ordinary Kriging is that the summation of all the weights
ought to be equal to 1. It indicates that the Kriging weights are independent of the
ReV average. This is referred to as the unbiasedness principle in the Kriging litera-
ture. It is possible to write Eq. (5.48) as
$$Z_E = m \left( 1 - \sum_{i=1}^{n} \lambda_i \right) + \sum_{i=1}^{n} \lambda_i Z_i, \qquad (5.49)$$
where the parenthesis in the first term on the right-hand side is equal to zero by definition. Equation (5.33) acts as a condition on the main estimation expression in Eq. (5.48), which can be rewritten with the estimation error term, $\varepsilon_E$, as

$$\left( Z_E - m \right) = \sum_{i=1}^{n} \lambda_i \left( Z_i - m \right) + \varepsilon_E.$$
In the following, this estimation error variance will be minimized. Writing the estimation error as the subject,

$$\varepsilon_E = \left( Z_E - m \right) - \sum_{i=1}^{n} \lambda_i \left( Z_i - m \right),$$

and squaring both sides gives

$$\varepsilon_E^2 = \left( Z_E - m \right)^2 - 2 \sum_{i=1}^{n} \lambda_i \left( Z_i - m \right) \left( Z_E - m \right) + \sum_{j=1}^{n} \sum_{i=1}^{n} \lambda_j \left( Z_j - m \right) \lambda_i \left( Z_i - m \right).$$
If the estimation is made n times, then the average error square (variance) will be obtained as

$$\frac{1}{n} \sum_{l=1}^{n} \left( \varepsilon_E^2 \right)_l = \frac{1}{n} \sum_{l=1}^{n} \left( Z_E - m \right)_l^2 - 2 \sum_{i=1}^{n} \lambda_i \left[ \frac{1}{n} \sum_{l=1}^{n} \left( Z_i - m \right)_l \left( Z_E - m \right)_l \right] + \sum_{j=1}^{n} \sum_{i=1}^{n} \lambda_j \lambda_i \left[ \frac{1}{n} \sum_{l=1}^{n} \left( Z_j - m \right)_l \left( Z_i - m \right)_l \right].$$
The first average in the big brackets on the right-hand side is equal to the estima-
tion variance, σE2 , the second term average is equivalent to estimation-measurement
covariance, Cov (ZE , Zi ) and the last term is the covariance between two measure-
ments, Cov (Zi , Zj ). In fact, the average terms based on n values are equivalent to
their respective expectations as n goes to infinity theoretically. The last expression
can be written with these new covariance values as
$$\overline{\varepsilon_E^2} = \sigma_E^2 - 2 \sum_{i=1}^{n} \lambda_i \operatorname{Cov}\left( Z_E, Z_i \right) + \sum_{j=1}^{n} \sum_{i=1}^{n} \lambda_j \lambda_i \operatorname{Cov}\left( Z_i, Z_j \right). \qquad (5.50)$$
This expression must be minimized subject to the condition in Eq. (5.33), and therefore the minimization equation can be written with the Lagrange multiplier, μ, as

$$\overline{\varepsilon_E^2} = \sigma_E^2 - 2 \sum_{i=1}^{n} \lambda_i \operatorname{Cov}\left( Z_E, Z_i \right) + \sum_{j=1}^{n} \sum_{i=1}^{n} \lambda_j \lambda_i \operatorname{Cov}\left( Z_i, Z_j \right) + \mu \left( \sum_{i=1}^{n} \lambda_i - 1 \right). \qquad (5.51)$$
This can be written in matrix and vector form after the definition of the following quantities:

$$C = \begin{bmatrix} \operatorname{cov}(z_1, z_1) & \operatorname{cov}(z_1, z_2) & \cdots & \operatorname{cov}(z_1, z_n) & 1 \\ \operatorname{cov}(z_2, z_1) & \operatorname{cov}(z_2, z_2) & \cdots & \operatorname{cov}(z_2, z_n) & 1 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ \operatorname{cov}(z_n, z_1) & \operatorname{cov}(z_n, z_2) & \cdots & \operatorname{cov}(z_n, z_n) & 1 \\ 1 & 1 & \cdots & 1 & 0 \end{bmatrix}, \qquad (5.52)$$

the augmented weight vector

$$\boldsymbol{\lambda} = \begin{bmatrix} \lambda_1 & \lambda_2 & \cdots & \lambda_n & \mu \end{bmatrix}^{T}, \qquad (5.53)$$
and finally

$$B = \begin{bmatrix} \operatorname{cov}(z_E, z_1) \\ \operatorname{cov}(z_E, z_2) \\ \vdots \\ \operatorname{cov}(z_E, z_n) \\ 1 \end{bmatrix}. \qquad (5.54)$$
With these notations, the ordinary Kriging system becomes

$$C \boldsymbol{\lambda} = B, \qquad (5.55)$$

so that the ordinary Kriging weights may be estimated, by using either the covariance or the SV values, from

$$\boldsymbol{\lambda} = C^{-1} B. \qquad (5.56)$$

The estimation at point E is then obtained as

$$Z_E = \sum_{i=1}^{n} \lambda_i Z_i, \qquad (5.57)$$

with the corresponding estimation variance

$$\sigma_E^2 = \sum_{i=1}^{n} \lambda_i \gamma\left( z_E, z_i \right) - \mu. \qquad (5.58)$$
The estimate and estimation error depend on the weights chosen. Ideally, Krig-
ing tries to choose the optimal weights that produce the minimum estimation error.
In order to derive the necessary equations for Kriging, extensive calculus use is
required, which is not included here; however, information about the derivation
can be found in various textbooks, such as by Clark (1979) and Olea (1975). Opti-
mal weights produce unbiased estimates and have a minimum estimation variance,
which are obtained by solving a set of simultaneous equations. In the case of SV the
corresponding expressions to Eqs. (5.52, 5.53 and 5.54) are
$$\begin{bmatrix} \gamma(z_1, z_1) & \gamma(z_1, z_2) & \cdots & \gamma(z_1, z_n) & 1 \\ \gamma(z_2, z_1) & \gamma(z_2, z_2) & \cdots & \gamma(z_2, z_n) & 1 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ \gamma(z_n, z_1) & \gamma(z_n, z_2) & \cdots & \gamma(z_n, z_n) & 1 \\ 1 & 1 & \cdots & 1 & 0 \end{bmatrix}, \qquad (5.59)$$
$$\boldsymbol{\lambda} = \begin{bmatrix} \lambda_1 \\ \lambda_2 \\ \vdots \\ \lambda_n \\ -\mu \end{bmatrix}, \qquad (5.60)$$
$$B = \begin{bmatrix} \gamma(z_E, z_1) \\ \gamma(z_E, z_2) \\ \vdots \\ \gamma(z_E, z_n) \\ 1 \end{bmatrix}, \qquad (5.61)$$

respectively.
Example 5.2
Ordinary Kriging methodology is demonstrated by considering groundwater levels
at five sites as given in Table 5.6 with easting, northing, and groundwater level
elevation values. The graphical locations are shown in Fig. 5.17.
The distances between each pair of sites are given in the following matrix.

$$D = \begin{bmatrix} 0 & & & & \\ 3.512834 & 0 & & & \\ 2.236068 & 4.338202 & 0 & & \\ 1.860108 & 2.154066 & 2.19545 & 0 & \\ 1.529706 & 5.035871 & 2.731300 & 3.310589 & 0 \end{bmatrix}$$
Table 5.6 Easting, northing, and groundwater level elevations of the sample sites

Site  Easting (km)  Northing (km)  Groundwater level (m)
Z1    4.2           3.0            89
Z2    3.7           6.5            75
Z3    1.8           2.6            98
Z4    2.9           4.5            81
Z5    4.3           1.5            105
ZE    3.2           3.0            91.27 (estimated)
Fig. 5.17 Sample geographical locations (easting and northing in km)
The distances between the estimation point, $Z_E$, and the other sites are calculated in the same manner. For large sample sizes, the structure of the SV is determined as a linear function.
Likewise, from Eq. (5.61), using the distances between the estimation site and the other sites, it is possible to obtain

$$B = \begin{bmatrix} 3.34 \\ 13.74 \\ 5.83 \\ 6.11 \\ 7.37 \\ 1 \end{bmatrix}.$$
The solution of the system in Eq. (5.56) then gives

$$\boldsymbol{\lambda} = \begin{bmatrix} 0.4916 \\ -0.0275 \\ 0.2676 \\ 0.2004 \\ 0.0679 \\ -0.5255 \end{bmatrix}.$$
One can evaluate the estimation value from Eq. (5.57), which leads to $Z_E$ = 91.27 m. The Kriging estimation variance is the weighted sum of the SVs for the distances from the points to the estimation location, which can be calculated from Eq. (5.58), leading to $\sigma_E^2$ = 4.02 m². This implies that the standard error of the estimation is $\sigma_E$ = 2.00 m.
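The ordinary Kriging calculation of this example can be reproduced with the sketch below; the pairwise SV values are the ones that reappear in the coefficients matrix of Example 5.3, and the bordered system follows Eqs. (5.59), (5.61), and (5.56).

```python
import numpy as np

# Ordinary Kriging matrix: pairwise SV values for the five wells, bordered
# by the unbiasedness row/column of ones, as in Eq. (5.59).
G = np.array([
    [0.0,   13.68, 8.78,  7.37,  6.11,  1.0],
    [13.68, 0.0,   16.78, 8.47,  19.93, 1.0],
    [8.78,  16.78, 0.0,   8.86,  10.68, 1.0],
    [7.37,  8.47,  8.86,  0.0,   12.88, 1.0],
    [6.11,  19.93, 10.68, 12.88, 0.0,   1.0],
    [1.0,   1.0,   1.0,   1.0,   1.0,   0.0],
])
# SV values between the wells and the estimation site, Eq. (5.61)
B = np.array([3.34, 13.74, 5.83, 6.11, 7.37, 1.0])

sol = np.linalg.solve(G, B)  # Eq. (5.56): five weights plus the multiplier term
lam = sol[:5]                # Kriging weights (they sum to 1)

Z = np.array([89.0, 75.0, 98.0, 81.0, 105.0])  # levels from Table 5.6
Z_E = lam @ Z                # Eq. (5.57): about 91.27 m
var_E = sol @ B              # estimation variance, Eq. (5.58): about 4.02 m^2
print(f"weights: {lam.round(4)}, Z_E = {Z_E:.2f} m, var_E = {var_E:.2f} m^2")
```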
It is possible to make such calculations, with the estimation and its estimation variance, at every point. For this purpose it is useful to carry out a cross-validation by considering each measurement as if it did not exist; the Kriging methodology then gives the estimation and the error variance at that point. Hence one can construct two maps for the Krig-
ing estimates as a best guess of the mapped variable configuration and an error
map showing the confidence envelope that surrounds this estimation. All these
are based on the measurement and estimation sites configuration and distances
between measurements used in the estimation process, and on the degree of spa-
tial continuity of the ReV as expressed by the spatial covariance and preferably SV
models.
The difference between the trend surface, as defined in Chapter 3, Section 3.10, and the drift is that the drift covers a partial, local area, whereas the trend surface extends over the whole ReV variability domain. The underlying drift component can be removed from the original measurements of the concerned ReV through different methodologies.
Among these are linear and non-linear trend surface fitting, double-dimensional
Fourier analysis, etc. These methodologies depict the drift component, and its sub-
traction from the original measurements leaves the residuals with almost zero arith-
metic average. Under the statistical theory that includes universal Kriging, a single-
valued, continuous, mapable property is called an ReV and is considered to consist
of two parts, a drift, or expected value, and a residual, or deviation from the drift.
The drift may be modeled by a local polynomial function within a neighborhood,
which is analogous to a local trend surface. If the drift is removed, the residual sur-
face can be regarded as first-order stationary in a statistical sense. Hence, again the
simple or ordinary Kriging models can be used for the residual data set. After the
estimation of residuals again, the summation with the convenient drift values, the
original ReV can be estimated.
Apart from this rather complicated and piecewise application, it is also possible
to develop universal Kriging equations including another condition of drift in the
derivations with another Lagrange multiplier. Hence, the original measurements can
be used directly in the calculations. For example, similar set of equations can be
derived for the universal Kriging to Eqs. (5.57, 5.58, and 5.59) and Eq. (5.53) as
$$C = \begin{bmatrix} \gamma(z_1, z_1) & \gamma(z_1, z_2) & \cdots & \gamma(z_1, z_k) & 1 & L_{1,1} & L_{2,1} \\ \gamma(z_2, z_1) & \gamma(z_2, z_2) & \cdots & \gamma(z_2, z_k) & 1 & L_{1,2} & L_{2,2} \\ \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots \\ \gamma(z_k, z_1) & \gamma(z_k, z_2) & \cdots & \gamma(z_k, z_k) & 1 & L_{1,k} & L_{2,k} \\ 1 & 1 & \cdots & 1 & 0 & 0 & 0 \\ L_{1,1} & L_{1,2} & \cdots & L_{1,k} & 0 & 0 & 0 \\ L_{2,1} & L_{2,2} & \cdots & L_{2,k} & 0 & 0 & 0 \end{bmatrix} \qquad (5.62)$$
$$\boldsymbol{\lambda} = \begin{bmatrix} \lambda_1 \\ \lambda_2 \\ \vdots \\ \lambda_k \\ \mu_0 \\ \mu_1 \\ \mu_2 \end{bmatrix}, \qquad (5.63)$$
and finally

$$B = \begin{bmatrix} \gamma(z_E, z_1) \\ \gamma(z_E, z_2) \\ \vdots \\ \gamma(z_E, z_k) \\ 1 \\ L_{1,E} \\ L_{2,E} \end{bmatrix}, \qquad (5.64)$$
respectively. Herein, $z_i$ represents the vector of the coordinates of point i, while $L_{1,i}$ ($L_{2,i}$) is the scalar value representing the location of this point along the horizontal (vertical) coordinate axis, 1 (2), for instance, the east–west (north–south) direction; $L_{1,E}$ and $L_{2,E}$ are the corresponding coordinates of the estimation point. The vector of universal Kriging weights is found by Eq. (5.56), except that C and B are given by Eqs. (5.62) and (5.64). The measurement vector that includes the k nearest values is given as
$$M = \begin{bmatrix} Z_1 \\ Z_2 \\ \vdots \\ Z_k \\ 0 \\ 0 \\ 0 \end{bmatrix}. \qquad (5.65)$$
In this vector there are (k + d + 1) elements, where the last d+1 elements that corre-
spond to the Lagrange multipliers are zero.
Universal Kriging is a procedure that can be used to estimate values of a surface
at the nodes of a regular grid from irregularly spaced data points. If the surface
is second-order stationary, or can be made stationary by some transformation, the
spatial autocorrelation will express the degree of dependence between all locations
on the surface, and most particularly between observations and grid nodes.
Example 5.3
The problem given in Example 5.2 can be extended for the application of Universal
Kriging. By making use of Table 5.6, the numerical form of the coefficients matrix
can be obtained similar to the previous example but with additional location val-
ues as
$$\gamma = \begin{bmatrix} 0 & 13.68 & 8.78 & 7.37 & 6.11 & 1 & 4.2 & 3.0 \\ 13.68 & 0 & 16.78 & 8.47 & 19.93 & 1 & 3.7 & 6.5 \\ 8.78 & 16.78 & 0 & 8.86 & 10.68 & 1 & 1.8 & 2.6 \\ 7.37 & 8.47 & 8.86 & 0 & 12.88 & 1 & 2.9 & 4.5 \\ 6.11 & 19.93 & 10.68 & 12.88 & 0 & 1 & 4.3 & 1.5 \\ 1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 \\ 4.2 & 3.7 & 1.8 & 2.9 & 4.3 & 0 & 0 & 0 \\ 3.0 & 6.5 & 2.6 & 4.5 & 1.5 & 0 & 0 & 0 \end{bmatrix}.$$
The inverse of this matrix is

$$\gamma^{-1} = \begin{bmatrix} -0.1248 & 0.0212 & -0.0192 & 0.0342 & 0.0887 & -0.9626 & 0.2474 & 0.0031 \\ 0.0212 & -0.0328 & -0.0222 & 0.0552 & -0.0184 & -0.5517 & 0.0815 & 0.1934 \\ -0.0192 & -0.0222 & -0.0252 & 0.0559 & 0.0108 & 1.7447 & -0.3575 & -0.0652 \\ 0.0342 & 0.0522 & 0.0559 & -0.1246 & -0.0177 & 0.2611 & -0.1285 & 0.0333 \\ 0.0887 & -0.0184 & 0.0108 & -0.0177 & -0.0634 & 0.5085 & 0.1571 & -0.1646 \\ -0.9626 & -0.5517 & 1.7447 & 0.2611 & 0.5085 & 40.1311 & -8.5379 & -5.6915 \\ 0.2474 & 0.0815 & -0.3575 & -0.1285 & 0.1571 & -8.5379 & 2.5455 & -0.0474 \\ 0.0031 & 0.1934 & -0.0652 & 0.0333 & -0.1646 & -6.6915 & -0.0474 & 1.5258 \end{bmatrix}.$$
On the other hand, the right-hand-side vector in Eq. (5.64) becomes as follows.

$$B = \begin{bmatrix} 3.34 \\ 13.74 \\ 5.83 \\ 6.11 \\ 7.37 \\ 1 \\ 3.2 \\ 3.0 \end{bmatrix}$$
The universal Kriging weight vector can be found, after the necessary algebraic calculations according to Eq. (5.56), as

$$\boldsymbol{\lambda} = \begin{bmatrix} 0.4625 \\ -0.0362 \\ 0.3094 \\ 0.2157 \\ 0.0486 \\ 0.4546 \\ -0.2998 \\ 0.0123 \end{bmatrix}.$$
Finally, after all these calculations the estimation value can be obtained from
Eq. (5.57) as ZE = 91.34 m.
One can notice that, for the given example, there is no major difference between the ordinary and universal Kriging methodologies. Ordinary Kriging, however, in common with other weighted-averaging methods, does not extrapolate well beyond the convex hull of the control points. That is, most estimated values will lie on the slopes of the surface, and the highest and lowest points on the surface usually will be defined by the estimation (control) points.
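The universal Kriging computation of Example 5.3 can be reproduced in the same manner by bordering the ordinary Kriging matrix with the site coordinates, as in Eqs. (5.62) and (5.64); the following is a sketch of that calculation.

```python
import numpy as np

# Site coordinates (drift terms) and pairwise SV values from Example 5.3
east = np.array([4.2, 3.7, 1.8, 2.9, 4.3])
north = np.array([3.0, 6.5, 2.6, 4.5, 1.5])
G5 = np.array([
    [0.0,   13.68, 8.78,  7.37,  6.11],
    [13.68, 0.0,   16.78, 8.47,  19.93],
    [8.78,  16.78, 0.0,   8.86,  10.68],
    [7.37,  8.47,  8.86,  0.0,   12.88],
    [6.11,  19.93, 10.68, 12.88, 0.0],
])
ones = np.ones(5)

# Bordered coefficients matrix of Eq. (5.62)
C = np.block([
    [G5, ones[:, None], east[:, None], north[:, None]],
    [np.vstack([ones, east, north]), np.zeros((3, 3))],
])
# Right-hand side of Eq. (5.64): SVs to Z_E, then 1 and Z_E's coordinates
B = np.array([3.34, 13.74, 5.83, 6.11, 7.37, 1.0, 3.2, 3.0])

sol = np.linalg.solve(C, B)
lam = sol[:5]  # universal Kriging weights
Z = np.array([89.0, 75.0, 98.0, 81.0, 105.0])  # levels from Table 5.6
print(f"weights: {lam.round(4)}, Z_E = {lam @ Z:.2f} m")  # about 91.34 m
```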
5.9 Block Kriging

Fig. 5.18 Block Kriging configuration: a block discretized into points 1, 2, 3, 4, ..., 16 (panels a and b), with the distances $d_{1,i}, d_{2,i}, \ldots, d_{16,i}$ from the discretization points to a measurement site i

5.10 Triple Diagram Model
Since gradual (trend) or abrupt (shift) climatic change questions have gained particular attention in recent years, most of the research on lake level changes is concerned with the meteorological factors of temperature and precipitation. Along this line of research, Hubert et al. (1989), Vannitseni and Demaree (1991), and Sneyers (1992) used statistical methods to show that temperature, pressure, and flow series in Africa and Europe have altered several times during the present century. On the other hand, as stated by Slivitzky and Mathier (1993), most of the modelling of level and flow series on the Great Lakes has assumed stationarity of the time series, using either Markov or ARIMA (AutoRegressive Integrated Moving Average) processes as presented by Box and Jenkins (1976). These models may work on lags of one, two, three, or more, but they consider only the linear structure in the lake level fluctuations. Since lake level fluctuations do not have the stationarity property, classical models such as Markov and ARIMA processes cannot simulate lake levels reliably. Multivariate models using monthly lake level variables failed to adequately reproduce the statistical properties and persistence of basin supplies (Loucks, 1989; Iruine and Eberthardt, 1992). On the other hand, spectral analysis of water levels pointed to
the possibility of significant trends in lake level hydrological variables (Privalsky,
1990). Almost all these scientific studies relied significantly on the presence of an
autocorrelation coefficient as an indicator of long-term persistence in lake level time
series. However, many researchers have shown that shifts in average lake level might
introduce unrealistic and spurious autocorrelations. This is the main reason why the
classical stochastic and statistical models often fail to reproduce the statistical prop-
erties. However, Mathier et al. (1992) were able to reproduce adequately the statis-
tical properties of a shifting-mean model. In the following sequel a version of the
Kriging methodology is adopted and used for the lake level estimations. For this
purpose, the world’s largest soda lake, Lake Van, on the Anatolian high plateau in
eastern Turkey (38.5°N and 43°E), is adopted for application (Fig. 5.19). The Lake Van
area has very severe winters with frequent temperatures below 0°C. Most of the
precipitation falls during the winter season in the form of snow, and toward the end of
spring heavy rainfalls occur. High runoff rates occur in spring during snowmelt, and
more than 80% of the annual discharge reaches the lake during this period. The summer
period (July to September) is warm and dry with average temperatures of 20°C.
Diurnal temperature variations are about 20°C.
Human beings can visualize variations in at most three dimensions. The
best configuration and interpretation of such variations can be achieved in a three-
dimensional Cartesian coordinate system through contour maps. Generally, maps
are regarded as the variation of a variable by location variables that are either longi-
tudes and latitudes or eastings and northings (Isaaks and Srivastava, 1989; Cressie,
1993; Kitanidis, 1997). Hence, it is possible to estimate the concerned (mapped)
variable value for a given pair of location variables. Since one wants to predict the
current lake level from previous records, it is suggested that two previous records
replace the two location variables. In this manner, it is possible to map the current
values of a variable based on two previous values of the same variable. The first
step in any Kriging methodology prior to mapping is to determine the sample SV,
which guides the choice of the theoretical model that will be employed in the classical Kriging
modelling. For this purpose, the scatter of SV values versus distance is obtained for
lag-one, -two, and -three. In order to depict the general trend of the scatter diagram,
the distance range is divided into nine intervals, and the average of the SV values
that fall within each interval is considered as the representative SV value within
the mid-point distance of that interval as suggested by Myers et al. (1982). Dif-
ferent theoretical SV models such as linear, power, spherical, and Gaussian types
are tried for the best fit, and at the end the Gaussian SV is seen to have the best
match with the sample SV trend (see Fig. 5.20). The Gaussian model is the most
suitable in all lags and the properties of fitted Gaussian SV model are presented in
Table 5.7.
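The binning-and-averaging procedure described above can be sketched as follows. This is a minimal illustration, not the exact computation behind Table 5.7: the function names, the nine equal-width intervals, and the no-nugget Gaussian model form are assumptions made here.

```python
import numpy as np
from scipy.optimize import curve_fit

def experimental_sv(x, y, z, n_bins=9):
    # Half-squared differences for every pair of points, grouped into equal
    # distance intervals; each bin average is assigned to the interval's
    # mid-point distance, as described in the text (after Myers et al., 1982).
    d, g = [], []
    for i in range(len(z)):
        for j in range(i + 1, len(z)):
            d.append(np.hypot(x[i] - x[j], y[i] - y[j]))
            g.append(0.5 * (z[i] - z[j]) ** 2)
    d, g = np.array(d), np.array(g)
    edges = np.linspace(0.0, d.max(), n_bins + 1)
    mids, sv = [], []
    for k in range(n_bins):
        sel = (d >= edges[k]) & (d < edges[k + 1])
        if sel.any():
            mids.append(0.5 * (edges[k] + edges[k + 1]))
            sv.append(g[sel].mean())
    return np.array(mids), np.array(sv)

def gaussian_sv(h, sill, a):
    # Gaussian semivariogram model without a nugget term.
    return sill * (1.0 - np.exp(-(h / a) ** 2))

# Usage, once coordinates x, y and values z are available:
# h_mid, sv = experimental_sv(x, y, z)
# (sill, a), _ = curve_fit(gaussian_sv, h_mid, sv, p0=[sv.max(), h_mid.mean()])
```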
[Fig. 5.20: sample and fitted theoretical Gaussian SVs for lag-one, lag-two, and lag-three (semivariance versus distance).]
Lake Van water level records are used for the implementation of the Kriging
methodology so as to obtain triple diagrams that give the common behavior of three
variables, which are taken consecutively from the historical time series data. The first
two variables represent the two past lake levels and the third one indicates the present
lake level. Hence, the model has three parts, namely observations (recorded time
series) as input, the triple diagram as response, and the output as prediction. It is possible
to consider lags between the successive data at one, two, three, etc. intervals.
Such an approach is very similar to a second-order Markov process, which can be
expressed as

$$H_i = \alpha H_{i-1} + \beta H_{i-2} + \varepsilon_i, \qquad (5.66)$$
where Hi , Hi−1 , and Hi−2 are the three consecutive lake levels; α and β are model
parameters; and finally, εi is the error term. The application of such a model requires,
prior to any prediction procedure, the parameter estimations from the available data.
Furthermore, its application is possible under the light of a set of assumptions,
which includes linearity, normality (Gaussian distribution of the residuals, i.e., εi ’s),
variance constancy, ergodicity, and independence of residuals. The triple diagram
replaces Eq. (5.66), without any restriction, in the form of a map. Such a map presents
the appearance of the natural relationship between three consecutive time values of the
same variable.
In order to apply the triple diagram approach, it is necessary to divide the data
into training and testing parts. Herein, the past 24 months (two years) are left for
the test (prediction) whereas all other data values are employed for training, which
yields the triple diagram model (TDM) as in Fig. 5.21.
Prior to any prediction, it is possible to draw the following interpretations from
these figures.
1) In the case of lag-one there is a strong relationship between Hi−1 and Hi−2 with
increasing contour values of Hi along almost 45◦ line (see Fig. 5.21a). The small
Hi values are concentrated at small Hi−1 and Hi−2 values; this implies the clus-
tering of small values of the three consecutive lake levels. Similarly, high lake
level values of the three consecutive levels also constitute high values cluster.
This means that small values follow small values and high values follow high
values, which indicates positive correlations. Local variations in the contour
lines appear at either low (high) Hi−1 or high (low) Hi−2 values. Consequently,
better predictions can be expected within a certain band around the 45◦ line
(Fig. 5.22). It is possible to deduce the following set of logical rules from Fig.
5.21b.
IF Hi−1 is low and Hi−2 is low, THEN Hi is low,
IF Hi−1 is medium low and Hi−2 is medium, THEN Hi is medium,
IF Hi−1 is high and Hi−2 is high, THEN Hi is high.
These rules can be used in a fuzzy logic inference system, as suggested by Zadeh
(1968); a minimal computational sketch of such rule evaluation is given after this list.
Fig. 5.21 Lake level TDMs (a) lag-one, (b) lag-two, (c) lag-three
[Fig. 5.22: test-period predicted versus observed lake levels (cm), scattered around the 45° line.]
2) In Fig. 5.21b the variations in the contour lines become very distinctive and
rather haphazard compared to Fig. 5.21a. This implies that with the increment
in the lag value, present time lake level prediction will have more relative error.
There is also a distinctive 45◦ line, but with comparatively narrower band of
certainty around it.
3) Finally, at lag-three case (Fig. 5.21c) the contour pattern takes even more hap-
hazard variation. This implies increase in the relative error of predictions.
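The following is a minimal sketch of how the three IF–THEN rules above could be evaluated numerically in a Mamdani-style scheme. The triangular membership break-points and the simplified "medium" rule are arbitrary illustrative assumptions, not values taken from the TDM.

```python
import numpy as np

def tri(x, a, b, c):
    # Triangular membership function rising from a to a peak at b and
    # falling back to zero at c.
    return float(np.maximum(0.0, np.minimum((x - a) / (b - a), (c - x) / (c - b))))

def predict_level(h1, h2):
    # Each rule fires with the minimum of its two antecedent memberships
    # (for H_{i-1} and H_{i-2}); the crisp output is the firing-strength-
    # weighted mean of the peak levels of the consequent sets.
    sets = {'low': (50, 120, 250), 'medium': (150, 250, 350), 'high': (250, 380, 450)}
    weights = {name: min(tri(h1, *abc), tri(h2, *abc)) for name, abc in sets.items()}
    total = sum(weights.values())
    if total == 0.0:
        return float('nan')  # inputs outside all membership supports
    return sum(w * sets[name][1] for name, w in weights.items()) / total

# Example: two past levels of 180 and 210 cm yield a present-level estimate.
print(predict_level(180.0, 210.0))
```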
Şen et al. (1999, 2000) identified suitable models and estimates for lake level
fluctuations and their parameters for trend, periodic, and stochastic parts. A second-
order Markov model is found suitable for the stochastic part. As explained before,
TDM of lake levels can replace the second-order Markov process. In this manner, it
is not necessary to use first- and second-order autocorrelation coefficients in order
to take into account more persistence. In order to make predictions for the past
24 months that are not used in the triple diagram constructions in Fig. 5.21, it is
necessary to enter Hi−1 and Hi−2 for each month on the vertical and horizontal axes,
respectively. The prediction value of Hi can be either read from this map approxi-
mately, or calculated by using Kriging prediction equations. The prediction results
are shown in Table 5.8 with corresponding relative error amounts. Individual errors
are slightly greater than 10%, but the overall prediction relative error percentage is
about 4.83%.

Fig. 5.23 Observed and predicted lake levels (lake level in cm versus month): (a) lag-one, (b) lag-two, (c) lag-three
Figure 5.23 indicates the observed and predicted Hi values. It is obvious that they
follow each other very closely, and on the average observed and predicted lake level
series have almost the same statistical parameters.
The triple diagram model depicts even the increasing trend, which is not possi-
ble directly with the second-order Markov process. During the prediction procedure
there is no special treatment of trend, but even so it is modeled successfully. How-
ever, in any stochastic or statistical modeling, it is first necessary to make trend
analysis and separate it from the original data. In order to further show the verifi-
cation of the triple diagram approach for lake level predictions, in Fig. 5.22 the test
data are plotted versus the predictions. It is obvious that almost all the points are
around 45◦ line and hence the model is not biased. Predictions are successful at low
or high values.
5.11 Regional Rainfall Pattern Description

The mean annual and seasonal rainfall records in the southwest of Saudi Arabia are
adapted from the reports published by the Hydrology Division, Ministry of Agricul-
ture and Water in Saudi Arabia, and Al-Jerash (1985). Rainfall records at 63 stations
from 1971 to 1990 are selected for the Kriging application (Fig. 5.24). These stations
are chosen based on four criteria (Subyani, 2004, 2005).
[Fig. 5.24: location map of the study area in the southwest of the Arabian Peninsula along the Red Sea coast, with towns including Taif, Turabah, Tathlith, Abha, and Najran.]
Descriptive statistics for the 63 stations are listed in Table 5.9 for these three
regions. Altogether, 25 stations are located within the coast, 24 within the mountains,
and 14 within the leeside.
Problems in the data, such as non-normality, trend, and outliers, should be fixed
before developing any kind of model. Normality of the sample data distribution
is known to improve the results from Kriging. Transformation is very important
to make the data more symmetric, linear, and constant in variance. Since annual
and seasonal rainfall data are considered, it is pragmatic to find one transformation,
which works reasonably well for all. The Box-Cox transformation is widely used
and can be easily managed so that the skewness of transformed data Z(x,t) becomes
close to zero (Salas, 1993).
However, rainfall histogram in arid regions, as stated by Hevesi et al. (1992),
behaves as lognormal distribution. Hence, the transformation as Y = ln Z(x)
is applied for determining approximately normal annual and seasonal data. It is
accepted as normally distributed if the computed Kolmogorov–Smirnov statistic
(Dmax) is less than the corresponding critical value. The critical value for the 5%
level of significance is D0.05 = 0.171, which is greater than the Dmax of the transformed
data. Thus, the null hypothesis of normality of the transformed data cannot be rejected
at the 0.05 level of significance. Further investigation can be done by visually observing
the normal probability plots, and most of the data lie on a straight line for the trans-
formed rainfall values. In addition, the skewness coefficients are reduced close to
zero (Subyani, 2005). Table 5.10 shows the statistics and normality test for original
and transformed annual and seasonal data.
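A compact sketch of this transformation-and-test step, using hypothetical station values and SciPy's one-sample Kolmogorov–Smirnov test, could read as follows; the array contents are purely illustrative.

```python
import numpy as np
from scipy import stats

# Log-transform and Kolmogorov-Smirnov normality check; `rain` stands in
# for one season's station rainfall values (mm) and is illustrative only.
rain = np.array([12.0, 35.0, 80.0, 150.0, 40.0, 22.0, 95.0, 60.0])
y = np.log(rain)                                   # Y = ln Z(x)

# Compare the standardized transformed data against the standard normal law.
d_max, p_value = stats.kstest((y - y.mean()) / y.std(ddof=1), 'norm')
print(d_max, p_value)   # accept normality if d_max is below the critical D
```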
The back-transformed value, i.e., exp(Y(x)), is a biased predictor. However, the
unbiased expression for the Kriging estimates Z(x) is given by Aboufirassi and
Mariño (1984), Gilbert (1985), and Deutsch and Journel (1992) as

$$Z^*(x) = \exp\left[Y^*(x) + \sigma_y^2/2\right], \qquad (5.67)$$

where Z*(x) is the estimate in the original data units (millimeters), Y*(x) is the Kriging
estimate of the natural logarithm, and σy² is the lognormal Kriging variance. The estimation variance is given as

$$\sigma_{Z^*}^2 = Z^{*2}\left[\exp(\sigma_y^2) - 1\right]. \qquad (5.68)$$
These two last expressions are used for constructing the rainfall isohyetals and
their variances. Figure 5.25 shows the sample and fitted SV for the natural log of
average annual rainfall (LnAAR). An isotropic spherical model with no nugget, but
with a sill equal to the sample variance of 0.48, and a range of 110 km, is selected
as the best representation of the spatial structure.
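The two back-transformation expressions (Eqs. 5.67 and 5.68) and the fitted spherical structure can be sketched as below. The standard no-nugget spherical form is an assumption here, since the text quotes only the sill (0.48) and range (110 km) of the fitted model.

```python
import numpy as np

def spherical_sv(h, sill=0.48, a=110.0):
    # Spherical semivariogram model (no nugget) with the sill and range
    # quoted in the text for LnAAR; standard spherical form assumed.
    h = np.asarray(h, dtype=float)
    g = sill * (1.5 * h / a - 0.5 * (h / a) ** 3)
    return np.where(h < a, g, sill)

def back_transform(y_star, var_y):
    # Unbiased lognormal Kriging back-transformation, Eq. (5.67),
    # together with its estimation variance, Eq. (5.68).
    z_star = np.exp(y_star + 0.5 * var_y)
    var_z = z_star ** 2 * np.expm1(var_y)
    return z_star, var_z

print(spherical_sv([0.0, 55.0, 110.0, 200.0]))
print(back_transform(np.log(120.0), 0.2))
```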
Table 5.10 Descriptive statistics and normality test for annual and seasonal data
Season Mean (mm) Median (mm) St. Dev. (mm) Skewness CV(%) K-S (Dm )
[Fig. 5.25: experimental variogram of LnAAR and fitted spherical model with range = 110 km and sill = 0.48 (γ(h) versus distance in km).]
Table 5.11 SV cross-validation results (season, model, N, MEE, RMSE)

[Fig. 5.26: experimental variograms of Ln P and fitted exponential models for the seasonal data, e.g., range = 70 km, sill = 0.67, nugget = 0.05, and range = 90 km, sill = 0.95, nugget = 0.1 (γ(h) versus distance in km).]
The kriged isohyets for annual rainfall show a rapid increase in average from the
Red Sea shore line up to the mountains and a gradual decrease to the north and east
parts of the study area (Fig. 5.27a).
Orographic effects are produced toward the mountain area with the maximum
kriged estimate exceeding 350 mm/year. In the east and northeast parts of the
study area, Kriging estimates are around 100 mm/year. In the northern part with
a moderate elevation reaching more than 1,000 m, Kriging estimates also exceed
100 mm/year. This figure reflects the topographic variation similar to Fig. 5.24 with
annual rainfall that generally increases with elevation.
Kriging variances indicate similar behavior to the average annual rainfall esti-
mates. Small values near the clusters of stations in the mountain area (Fig. 5.27b)
indicate high estimation accuracy, whereas large values in the north, east, and north-
east indicate low estimation accuracy areas owing to the scarcity of sample loca-
tions. Generally, high estimation variances appear in areas lacking data.
In winter (December–February), rainfall is associated most of the time with moist
and cold air of northerly Mediterranean origin, which is coupled with the local
effects of the Red Sea convergence zone and the Scarp mountains, as well as orographic
rainfall occurrences. Figure 5.28a shows that the Kriging estimates exceed 120 mm
in the middle and northern section of the mountainous areas.
However, they do not exceed 30 mm in the south of Tihamah, because there is no
Mediterranean or Red Sea effect, and the elevation is not high enough for orographic
rainfall occurrence. On the other hand, the southern part of the study area receives
less than 100 mm of rainfall due to the absence of monsoons. The Plateau area (east
Fig. 5.27 Isohyetal map of Kriging for annual rainfall (mm)
and northeast parts of the study area) receives less than 20 mm, because it is located
in the shadow (lee-side) area. Kriging estimation variance map has a similar trend
throughout the study area as shown in Fig. 5.28b.
[Fig. 5.28: Kriging estimates (a) and estimation variances (b) for winter rainfall (mm).]
[Fig. 5.29: Kriging estimates (a) and estimation variances (b) for spring rainfall (mm).]
In spring (March–May), the whole region comes under the influence of south-
east monsoon air stream flow, the Red Sea convergence zone, and Mediterranean
depression, which distribute the rainfall in all regions. The Kriging estimates give
more detailed information about the rainfall distribution as shown in Fig. 5.29a.
Rainfall in this figure increases gradually from the Red Sea coast (40 mm) to
the mountain where the highest amount of rainfall falls (more than 160 mm) and
decreases to the plateau area, which receives about 100 mm. Generally, the south-
west region of the Arabian Peninsula receives the highest amount of rainfall during
the spring season compared to other seasons. This high amount of rainfall is a result
of increasing African–Mediterranean interaction effect, where rainfall occurs oro-
graphically in the mountains and southeast monsoon effect where the Plateau and
eastern slope receive more rainfall than the Red Sea coast. Kriging variances show
an increase in estimation accuracy as shown in Fig. 5.29b.
In summer (June–August), the southwest monsoon flow from the Indian Ocean
and the Arabian Sea is the predominant factor, which increases the rainfall along
the Scarp mountains and low elevation areas in the south of the study area. Kriging
estimates for summer season exceed 120 mm in the mountains and 160 mm in the
foothills near the Yemen border at the southwestern corner of the Arabian peninsula
(Fig. 5.30a).
Rainfall decreases toward the northern part of the study area due to its distance
from the monsoon effect, even though this area has a high elevation. Moreover,
the Kriging variances show no change in estimation accuracy in the foothills and
mountains, but they cannot be calculated with reliability in the plateau areas due to
the paucity of data (see Fig. 5.30b).
[Fig. 5.30: Kriging estimates (a) and estimation variances (b) for summer rainfall (mm).]
In fall (September–November), the local diurnal circulation and the southern air
stream weaken. In other words, it is a transition period from summer to winter and,
in general, the area receives a small amount of rainfall. The Kriging estimation in the
foothills and the mountains in the southern part of the study area shows that they
receive a higher amount of rainfall than the northern areas, similarly to the fall mon-
soon flow effects as shown in Fig. 5.31a. The Kriging variances show an increase in
estimation accuracy in the northern part of the study area, whereas there is no clear
change in the southern part as shown in Fig. 5.31b.

[Fig. 5.31: Kriging estimates (a) and estimation variances (b) for fall rainfall (mm).]

Fig. 5.32 Spatiotemporal Kriging maps for rainfall in southwest Saudi Arabia (winter, spring, summer, and fall panels along the Red Sea)
Generally, rainfall is predominant in the northern mountain areas during winter as
a result of the Mediterranean effect, and it is widespread in all regions during spring
because of the local diurnal circulation effects. Orographic conditions are clear in
winter and spring seasons. This orographic factor is also clear from the appearance
of the nugget effect in the exponential models in both winter and spring seasons.
During summer, rainfall moves toward the south due to the monsoon flow effect,
with its southwesterly wind. However, during fall, as a transition season, the area
comes under the influence of monsoon as well as the local diurnal circulations.
Figure 5.32 illustrates these spatio-seasonal variations of rainfall in the southwest
of Saudi Arabia.
The Kriging estimation variances are also investigated concerning the spatial and
temporal variation of rainfall in the study area for these four seasons. In space varia-
tion, the small value or high estimation accuracy of Kriging variances occurs in the
mountainous areas in all seasons. Towards the east, north, and northeast of the study
area, there is a consistent increase in variances implying low estimation accuracy.
In time variation, Kriging variance increases from winter to fall. These variations in
space and time are due to several factors such as
The description of the rainfall variability in space and/or in time is among the fun-
damental requirements for a wide variety of human activities and water resources
project design, management, and operation. Geostatistical methods are applied to
develop new maps for the prediction of rainfall over different seasons. The assigned
objectives of this study are to predict the magnitude and variation of the rainfall
in space as well as during different time periods. These techniques are applied to
rainfall data gathered from meteorological station network covering the southwest
region of the Arabian Peninsula. Rainfall in this area is characterized by high varia-
tion in spatial and temporal distributions.
References
Aboufirassi, M., and Mariño, M. A., 1984. A geostatistically based approach to the identification
of aquifer transmissivities in Yolo Basin, California. Math. Geol. 16(26), 125–137.
Al-Jerash, M., 1985. Climatic subdivisions in Saudi Arabia: an application of principle component
analysis, J. Climate 5, 307–323.
Barnes, S. L., 1964. A technique for maximizing details in numerical weather map analysis. J.
Appl. Meteor. 3, 396–409.
Box, G. E. P., and Jenkins, G. M., 1976. Time Series Analysis: Forecasting and Control. Holden-Day, San Francisco, 560 pp.
Buell, C. E., 1958. The correlation between wind and height on an isobaric surface II: summer data.
J. Meteorol. 15(12), 502–512.
Carr, J., and Glass, C., 1984. Estimation of earthquake response spectra using Kriging. In: G. Verly
et al. (Eds.), Geostatistics for Natural Resources Characterization. Proceedings of the 2nd
NATO Advanced Study Institute of Geostatistics for Natural Resources Characterization: Part
2, D. Reidel, Dordrecht, Holland, pp. 745–752.
Carr, J., Bailey, R., and Deng, E., 1985. Use of indicator variogram for an enhanced spatial analysis.
J. Math. Geol. 18, 197–213.
Carr, J., and Glass, C., 1985. Treatment of earthquake ground-motion using regionalized variables.
J. Math. Geol. 17, 221–241.
Clark, I., 1979. Practical Geostatistics. Applied Science Publishers, London, 281 pp.
Cressie, N. A. C., 1993. Statistics for Spatial Data (revised edition). Wiley, New York, 900 pp.
Cressman, G. P., 1959. An operational objective analysis system. Monthly Weather Rev. 87(10),
367–374.
David, M., 1977. Geostatistical Ore Reserve Estimation. Elsevier Scientific Publishing Company,
364 pp.
Davis, J. C., 1986. Statistics and Data Analysis in Geology. John Wiley & Sons, Inc., New York,
646 pp.
Delhomme, J. P., 1978. Kriging in the hydrosciences. Adv. Water Resour. 1(5), 251–266.
Deutsch, C. V., and Journel, A. G., 1992. GSLIB Geostatistical Software Library and User’s Guide.
Oxford University Press, New York, TIC: 224174.
Dixon, R., 1969. Orthogonal polynomial as a basis for objective analysis. Meteorological Office,
Scientific Paper No. 30, Her Majesty’s Stationery Office, London, 20 pp.
Dixon, R., 1976. An objective analysis using orthogonal polynomials. Proceedings of the JOC
Study Group Conference on four-dimensional Data Assimilation, Paris, GARP Program on
Numerical Experimentation, Report No. 11, pp. 73–85.
Eddy, A., 1964. The objective analysis of horizontal wind divergence fields. Quart. J. Roy. Meteo-
rol. Soc. 90, 424–440.
Eliassen, A., 1954. Provisional report on spatial covariance and autocorrelation of the pressure
field. Inst. Weather and Climate Research Academy of Science Oslo Report No. 5.
Fisher, R. A., 1935. The Design of Experiments. Oliver and Boyd, Edinburg.
Flattery, T. W., 1970. Spectral models for global analysis and forecasting, Proceedings of the Sixth
AWS Technical Exchange Conference, U. S. Naval Academy, 21–21 September, Air Weather
Service Technical Report 242, pp. 42–54.
Fritsch, J. M., 1971. Objective analysis of a two-dimensional data field by the cubic spline tech-
nique. Monthly Weather Rev. 99(5), 1122–1143.
Gandin, L. S., 1963. Objective Analysis of Meteorological Fields. Leningrad, Gidromet; Translated
from Russian, Jerusalem, Israel Program for Scientific Translations, 1965, 242 pp.
Gandin, L. S., 1965. Objective Analysis, Lectures on Numerical Short-range Weather Prediction,
WMO, Regional Training Seminar, pp. 633–677.
Gilbert, S., 1985. Introduction to Applied Mathematics. Wellesley-Cambridge Press, Wellesley,
MA 02181.
Glass, C. E., 1978. Application of regionalized variables to micro-zonation. Proc. 2nd Int. Conf.
Micro-zonation for Safer Construction 1, 509–512.
Habib, Z., 1999. Optimum interpolation modeling supported by the cumulative semivariogram
for spatial-temporal meteorological variables. Unpublished Ph.D. Thesis, Istanbul Technical
University, 156 pp.
Hevesi, J. A., Istok, J. D., and Flint, A. L., 1992. Precipitation estimation in mountainous terrain
using multivariate geostatistics, Part I. Structural analysis. J. App. Met. 31, 661–676.
Hubert, P., Carbonnel, J. P., and Chaouche, A., 1989. Segmentation des séries hydrométéorologiques: application à des séries de précipitations et de débits de l’Afrique de l’ouest. J. Hydrol.
110, 349–367.
Irvine, K. N., and Eberhardt, A. K., 1992. Multiplicative seasonal ARIMA models for Lake Erie
and Lake Ontario water levels. Water Res. Bull. 28(3), 385–396.
Isaaks, E. H., and Srivastava, R. M., 1989. An Introduction to Applied Geostatistics. Oxford Uni-
versity Press, Oxford, 561 pp.
Journel, A. G., and Huijbregts, C. I., 1978. Mining Geostatistics. Academic Press, London, 710 pp.
Journel, A. G., 1983. Non-parametric estimation of spatial distribution. Math. Geol. 15(13),
445–468.
Kalman, R. E., 1960. A new Approach to linear filtering and prediction problem. J. Basic Eng.
Series D 82(1), 35–45.
Kitanidis, P. K. 1997. Introduction to Geostatistics: Applications in Hydrogeology. Cambridge
University Press, Cambridge, 249 pp.
Krige, D. G., 1951. A statistical approach to some basic mine valuation problems on the Witwatersrand. J. Chem. Metall. Min. Soc. South Africa 52, 119–139.
Ledimet, F., and Talagrand, O., 1986. Variational algorithms for analysis and assimilation of mete-
orological observation: theoretical aspects. Tellus 38A(2), 97–110.
Lorenc, A. C., 1981. A global three-dimensional multivariate statistical interpolation scheme.
Monthly Weather Rev. 109, 701–721.
Loucks, F. D., 1989. Modeling the Great Lakes hydrologic-hydraulic system. Ph D Thesis, Univer-
sity of Wisconsin, Madison.
Marsily, G. D., 1986. Quantitative Hydrogeology, Groundwater Hydrology for Engineers. Aca-
demic Press, New York.
Martinez, C. A., 1996. Multivariate geostatistical analysis of evapo-transpiration and precipitation
in mountainous terrain. J. Hydrol. 174, 19–35.
Matheron, G., 1963. Principles of geostatistics. Econ. Geol. 58, 1246–1266.
Matheron, G., 1971. The theory of regionalized variables and its applications. Ecole de Mines.
Fontainbleau, France.
Mathier, L., Fagherazzi, L., Rasson, J. C., and Bobée, B., 1992. Great Lakes net basin supply simulation by a stochastic approach. INRS-Eau, Rapport Scientifique 362, INRS-Eau, Sainte-Foy, 95 pp.
Meleschko, V. P., and Prigodich, A. E., 1964. Objective analysis of humidity and temperature.
Trudy Sympoziuma Pochislenyum Melodam Prognoza Pagody, Gidrometeoizdat, Moscow,
U.S.S.R.
Myers, D. E., Begovich, C. L., Butz, T. R., and Kane, V. E., 1982. Variogram models for regional
groundwater chemical data. Math. Geol. 14, 629–644.
Olea, R. A., 1975. Optimum mapping techniques using regionalized variable theory: series on
spatial analysis no. 3. Lawrence, Kansas, Kansas Geological Survey, 137 pp.
Olea, R. A., 1999. Geostatistics for Engineers and Earth Scientists. Kluwer Academic Publishers,
Boston, MA, 303 pp.
Panofsky, H. A., 1949. Objective weather map analysis. J. Meteorol. 6, 386–392.
Privalsky, V., 1990. Statistical analysis and predictability of Lake Erie water level variations. In: H. C. Hartmann and M. J. Donahue (Eds.), Proc. Great Lakes Water Level Forecasting and Statistics Symposium. Great Lakes Commission, Ann Arbor, Michigan, pp. 255–264.
Salas, J. D., 1993. Analysis and modeling of hydrologic time series. In: D. R. Maidment (Ed.), Handbook of Hydrology. McGraw-Hill, New York, pp. 19.1–19.72.
Sasaki, Y., 1958. An objective analysis based on variational method. J. Meteorol. Soc. Japan 36(3),
77–88.
Schlatter, T. W., and Branstator, G. W., 1987. Experiments with a three-dimensional statistical
objective analysis scheme using FGGE data. Monthly Weather Rev. 115, 272–296.
Schlatter, T. W., 1988. Past and present trends in the objective analysis of meteorological data for now-casting and numerical forecasting. Eighth Conference on Numerical Weather Prediction, American Meteorological Society, pp. J9–J25.
Seo, D. J., 1998. Real-time estimation of rainfall fields using radar rainfall and rain gage data. J.
Hydrol. 208(1–2), 37–52.
Şen, Z., 1980. Adaptive Fourier analysis of periodic-stochastic hydrologic series. J. Hydrol. 46,
239–249.
Şen, Z., 1983. Predictive hydrologic models in hydrology. Nordic Hydrol. 14, 19–32.
Şen, Z., 1997. Objective analysis of cumulative semivariogram technique and its application in
Turkey. J. Appl. Meteorol. 36, 1712–1720.
Şen, Z., Kadıoğlu, M., and Batur, E., 1999. Cluster regression model and level fluctuation features
of Van Lake, Turkey. Ann. Geophysicae, 17, 273–279.
Şen, Z., and Habib, Z., 2000. Spatial precipitation assessment with elevation by using point cumu-
lative semivariogram technique. Water Resour. Management 14, 311–325.
Şen, Z., Kadıoğlu, M., and Batur, E., 2000. Stochastic modeling of the Van Lake monthly fluctua-
tions in Turkey. Theor. Appl. Climatol. 65, 99–110.
Şen, Z., 2008. Wadi Hydrology. CRC Lewis Publishers, Boca Raton, 386 pp.
Slivitzky, M., and Mathier, L., 1993. Climatic changes during the 20th century on the Lauren-
tian Great Lakes and their impacts on hydrologic regime. NATO Advanced Study Institute,
Deauaille, France.
Sneyers, R., 1992. On the use of statistical analysis for the objective determination of climate
change. Meteorol Z. 1, 247–256.
Student, 1907. On the error of counting with a hemacytometer. Biometrika 5, 351–360.
Subyani, A. M., 2004. Geostatistical study of annual and seasonal mean rainfall patterns in south-
west Saudi Arabia. Hydrol. Sci. J. 49(5), 803–817.
Subyani, A. M., 2005. Hydrochemical identification and salinity problem of groundwater in Wadi
Yalamlam basin, Western Saudi Arabia. J. Arid Environ. 60(1), 53–66.
Vannitseni, S., and Demaree, G., 1991. Détection et modélisation des sécheresses au Sahel: proposition d’une nouvelle méthodologie. Hydrol. Continent. 6(2), 155–171.
Yates, F., 1938. The comparative advantages of systematic and randomized arrangements in the
design of agricultural and biological experiments. Biometrika 30, 444–466.
Zadeh, L. A., 1968. Probability measures of fuzzy events. J. Math. Anal. Appl. 23, 421–427.
Chapter 6
Spatial Simulation
6.1 General
Spatial simulation models of geological phenomena, such as ore grades,
groundwater level elevations, porosity, chemical compositions of different lithological
units, and fracture spacing, are bound to become increasingly important due to their
ability to model the underlying generating mechanism of these phenomena. The first
approximations in quantifying the geological phenomena were rather deterministic
and they did not take into account any factor of chance in their descriptions. In fact,
the variability in these phenomena was known to geologists for many decades, but
due to the deterministic training the solutions sought were also deterministic. For
instance, at most the arithmetic average of the data concerned is calculated and
then this value is treated as the best estimator of the phenomenon. This way of
calculation gave rise to over-estimations, which consequently made geologists aware
of the fact that the variability within the data should definitely be taken into con-
sideration (Krige, 1951). This awareness led geologists to consider the frequency
distribution functions of any spatial variable. Hence, a new trend of statistical meth-
ods in analyzing the geological data was started (Krumbein, 1970; Agterberg, 1975).
It is a mere application of the statistical methods to process geologic data.
Later, it was recognized that, in addition to variability inherent in the data, the
interdependence either within the data themselves or with other variables is very
important due to the continuity of spatial variables. In order to account for the depen-
dence either the correlation techniques (Matern, 1960; Switzer, 1965) or the SVs are
employed (Matheron, 1965).
As a consequence of the aforementioned developments, the geologists started to
think about a convenient model to simulate the spatial variables so as to be able to
control them in the case of any change as well as in assessing the risks associated
with the data. Of course, developments of digital computers had a great impact on
this trend since without computers any simulation study is very tedious, needs great
patience, and is rather time-consuming, i.e., not practical.
Although simulation studies in other branches of science, like engineering and
economics, started a long time ago, the simulation of geological phenomena has lagged
behind. This is due to the fact that in other disciplines simulation is needed only
along one axis, for instance, the time axis. Therefore, the simulation of a one-dimensional
along one axis, for instance, time axis. Therefore, the simulation of one dimensional
variable is extensively available in the literature (Box and Jenkins, 1970).
General simulation models are possible for the generation of anisotropic as well
as isotropic synthetic patterns in 1D, 2D, or 3D, which have significance for the
purpose of modeling geologic properties such as ore grades, reservoir porosity,
mineral distribution, fracture spacing, aperture, orientation. General procedures for
such simulations by the autoregressive processes (including Markov processes) are
given for the model parameters estimation and synthetic pattern generation. The
model works on the square net basis and generates sequential pattern, first along
any desired direction for 1D simulation and then 2D patterns are constructed with
reference to two orthogonal 1D sequences. Applications to synthetic 2D pattern are
shown for isotropic cases with different model parameters. The extension of the model
to 3D space is readily available.
6.2 3D Autoregressive Model

A simple 3D autoregressive model can be written as

$$x_{i,j,k} = \alpha\,x_{i-1,j,k} + \beta\,x_{i,j-1,k} + \gamma\,x_{i,j,k-1} + \varepsilon_{i,j,k}, \qquad (6.1)$$

where $x_{i,j,k}$ is the spatial variable at a point with coordinates i, j, and k with respect
to a reference point as in Fig. 6.1. Herein, α, β, and γ are the model parameters and $\varepsilon_{i,j,k}$ is a random component
with zero mean and variance $\sigma_\varepsilon^2$, and it is independent from $x_{i-1,j,k}$, $x_{i,j-1,k}$, and
$x_{i,j,k-1}$. This model has four parameters, namely α, β, γ, and $\sigma_\varepsilon^2$, to be determined
from an available set of data. In Eq. (6.1) the geologic variable $x_{i,j,k}$ is a standard
variable with zero mean and unit variance. This does not cause any loss of generality
in simulation studies, since after the simulation of standard variables their final form
can be found as

$$X_{i,j,k} = \mu_{i,j,k} + \sigma_{i,j,k}\,x_{i,j,k}, \qquad (6.2)$$
in which Xi,j,k is the spatial variable at position (i, j, k) with mean, μi,j,k , and stan-
dard deviation, σi,j,k . Standardization procedure renders the original variable into
a second-order stationary variable (Şen, 1979b).

[Fig. 6.1: definition of the lag-one autocorrelations ρi(1), ρj(1), ρk(1) and the cross-correlations ρ′i, ρ′j, ρ′k along the i, j, and k axes of the cubic net around xi,j,k.]

Any simulation model has three
major stages in arriving at its final goal. These are

1) Identification of the model and estimation of its parameters from the available
set of data.
2) Checking the model validity: if the model is valid, then the sequence of ε_{i,j,k} should have an independent
(completely independent) structure. This stage is referred to as the diagnostic
checking for model suitability.
3) Usage of the suitable model to generate equally likely synthetic sets of data that
are statistically indistinguishable from the original data, i.e., they should have
the same distribution functions and average statistical parameters. This corre-
sponds to the simulation of the underlying geological phenomenon. In real-time
estimations, this is equivalent to prediction stage.
in which ρi (1), ρj (1), and ρk (1) are the lag-one autocorrelation coefficients along the
i, j, and k axes, respectively. It is obvious that Eq. (6.4) reduces to Sharp and Aroian
(1985) simulation expression if α = β = γ = φ and with their notations ρi (1) =
ρ100, ρj(1) = ρ010, ρk(1) = ρ001, and σε² = σa². Multiplication of both sides in Eq.
(6.1) by $x_{i-1,j,k}$, $x_{i,j-1,k}$, and $x_{i,j,k-1}$, respectively, and taking expectations leads to
three additional equations as

$$\rho_i(1) = \alpha + \beta\rho_k + \gamma\rho_j,$$
$$\rho_j(1) = \alpha\rho_k + \beta + \gamma\rho_i,$$
$$\rho_k(1) = \alpha\rho_j + \beta\rho_i + \gamma,$$

or implicitly as
AX = C, (6.10)
[Fig. 6.2: isotropic correlation structure, with autocorrelations ρi(1) = ρj(1) = ρk(1) = ρ and cross-correlations ρi = ρj = ρk = ρd independent of direction.]
where A is a (4×4) matrix of coefficients with its elements estimated from a given
set of data, hence it is known and dependent on the available data only; X is the
unknowns vector and includes the model parameters only; and, finally C is the vector
of lag-one autocorrelations. The solution of Eq. (6.10) requires matrix inversion of
A, after which the solution can be written as

$$X = A^{-1}C. \qquad (6.11)$$
$$\alpha = \frac{\rho_i(1) + (\rho_k\rho_i - \rho_j)\,\rho_k(1) + (\rho_j\rho_i - \rho_k)\,\rho_j(1) - \rho_i^2\,\rho_i(1)}{1 - (\rho_i^2 + \rho_j^2 + \rho_k^2) + 2\rho_i\rho_j\rho_k}, \qquad (6.13)$$

$$\beta = \frac{\rho_j(1) + (\rho_i\rho_j - \rho_k)\,\rho_i(1) + (\rho_j\rho_k - \rho_i)\,\rho_k(1) - \rho_j^2\,\rho_j(1)}{1 - (\rho_i^2 + \rho_j^2 + \rho_k^2) + 2\rho_i\rho_j\rho_k}, \qquad (6.14)$$

and

$$\gamma = \frac{\rho_k(1) + (\rho_j\rho_k - \rho_i)\,\rho_j(1) + (\rho_k\rho_i - \rho_j)\,\rho_i(1) - \rho_k^2\,\rho_k(1)}{1 - (\rho_i^2 + \rho_j^2 + \rho_k^2) + 2\rho_i\rho_j\rho_k}. \qquad (6.15)$$
It is clear from Eqs. (6.13, 6.14, and 6.15) that the model parameters (α, β, and γ)
are functions of the correlation structure of the geological phenomenon concerned.
There are three autocorrelation and three cross-correlation coefficients to be esti-
mated from the data. Hence, the total number of statistical parameters including
the average value and the variance to be extracted from the data is equal to eight,
provided that the geological phenomenon considered is homogeneous.
[Fig. 6.3: anisotropic versus isotropic correlation structure in the plane; in the isotropic case the diagonal correlation is ρ′d = (√(1 + 8ρ²) − 1)/2.]
In an isotropic medium the autocorrelations and cross-correlations are equal to each other among themselves. This is tantamount
to saying that ρi(1) = ρj(1) = ρk(1) = ρ and ρi = ρj = ρk = ρd, where ρ and
ρd are the isotropic serial and diagonal correlation coefficients. In other words, these
correlations do not depend on axial directions, as shown in Fig. 6.3. Furthermore, in
an isotropic medium the autocorrelation along any direction depends only on
the variability along this direction. Hence, the dimensions of cubic blocks in spatial
simulation become significant. For the sake of simplicity with no loss of generality,
herein, the dimensions are assumed as units. Hence, the isotropic autocorrelation,
ρ, in Fig. 6.3 should be interpreted as the correlation between the adjacent corner
values of the phenomenon on a square mesh.
In 2D simulation this value will reduce into a square unit as shown in Fig. 6.4.
The model parameter estimations are found from Eqs. (6.13) and (6.14) by substituting
the isotropic medium auto- and cross-correlations, i.e., ρk = ρk(1) = 0,
ρi(1) = ρj(1) = ρ, and ρi = ρj = ρd, which gives

$$\alpha = \frac{\rho}{1 + \rho_d}, \qquad (6.16)$$

and β has a value identical to α. With this, the 2D model mathematical expression
becomes

$$x_{i,j} = \alpha\,(x_{i-1,j} + x_{i,j-1}) + \varepsilon_{i,j}, \qquad (6.17)$$

in which the diagonal correlation is related to the serial correlation as

$$\rho_d = \frac{\sqrt{1 + 8\rho^2} - 1}{2}. \qquad (6.18)$$
The variance of the independent random variable, ε_{i,j}, can be found from Eq.
(6.9) with the necessary substitutions as

$$\sigma_\varepsilon^2 = 1 - \frac{2\rho^2}{1 + \rho_d}. \qquad (6.19)$$
1) If the diagonal correlation is unity (ρd = 1), then Eqs. (6.16) and (6.19) reduce to
the 1D case for which α = ρ and σε 2 = 1 – ρ2 . These correspond to the properties
of lag-one Markov process.
2) For zero autocorrelation (ρ = 0), it is possible to see that α = β = 0; ρd = 0 and
σε 2 = 1. These properties imply independent random variable with zero mean
and unit variance.
For the square net with dimensions (n×n), the generation procedure requires
first the generation of n normally distributed independent random variables ε1,j (j =
1, 2, . . ., n) with zero mean and variance equal to (1 – ρ²). Subsequently, these
variables are converted into an auto-correlated row sequence by means of the first-order
Markov process as

$$x_{1,j} = \rho\,x_{1,j-1} + \varepsilon_{1,j} \qquad (j = 2, 3, \ldots, n). \qquad (6.20)$$

The first column sequence is also generated in the same way, except that initially
(n–1) normally distributed independent variables with zero mean and variance equal
to (1–ρ²) are generated. It is important that the first row value, ε1,1, is used as the
initial value for the column Markov model, which is expressed as

$$x_{i,1} = \rho\,x_{i-1,1} + \varepsilon_{i,1} \qquad (i = 2, 3, \ldots, n), \qquad (6.21)$$

in which x1,1 is taken equal to ε1,1. Then the remaining rows (or columns) are generated
through the use of the 2D model as

$$x_{i,j} = \alpha\,(x_{i-1,j} + x_{i,j-1}) + \varepsilon_{i,j} \qquad (i, j = 2, 3, \ldots, n). \qquad (6.22)$$
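A minimal sketch of this three-step generation procedure on an (n × n) net, under the isotropic relations (6.16), (6.18), and (6.19), might read as follows; the function name and the chosen net size are illustrative.

```python
import numpy as np

def simulate_isotropic_2d(n, rho, seed=0):
    # 2D isotropic autoregressive pattern on an (n x n) square net, following
    # Eqs. (6.16), (6.18), (6.19) and the row/column/interior generation
    # steps (6.20)-(6.22) described in the text.
    rng = np.random.default_rng(seed)
    rho_d = (np.sqrt(1.0 + 8.0 * rho**2) - 1.0) / 2.0   # Eq. (6.18)
    alpha = rho / (1.0 + rho_d)                         # Eq. (6.16)
    var_e = 1.0 - 2.0 * rho**2 / (1.0 + rho_d)          # Eq. (6.19)

    x = np.zeros((n, n))
    sd1 = np.sqrt(1.0 - rho**2)
    x[0, 0] = rng.normal(0.0, sd1)
    for j in range(1, n):        # first row, lag-one Markov, Eq. (6.20)
        x[0, j] = rho * x[0, j - 1] + rng.normal(0.0, sd1)
    for i in range(1, n):        # first column, lag-one Markov, Eq. (6.21)
        x[i, 0] = rho * x[i - 1, 0] + rng.normal(0.0, sd1)
    for i in range(1, n):        # interior points, 2D model, Eq. (6.22)
        for j in range(1, n):
            x[i, j] = alpha * (x[i - 1, j] + x[i, j - 1]) \
                      + rng.normal(0.0, np.sqrt(var_e))
    return x

field = simulate_isotropic_2d(50, rho=0.5)
print(field.mean(), field.std())   # approximately 0 and 1 for a valid model
```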
6.2.3 Extension to 3D
For the isotropic 3D case, similar substitutions lead to the model parameter

$$\alpha = \frac{\rho\,(\rho_d - 1)}{2\rho_d^2 - \rho_d - 1}. \qquad (6.23)$$
Fig. 6.5 Two-dimensional isotropic pattern for (a) ρ = 0.1; (b) ρ = 0.2; (c) ρ = 0.3; (d) ρ = 0.4;
(e) ρ = 0.5
The substitution of the parameters into Eq. (6.5) leads, after some algebra, to the
variance of the 3D independent variables as

$$\sigma_\varepsilon^2 = 1 - \frac{3\rho^2(\rho_d - 1)}{2\rho_d^2 - \rho_d - 1}. \qquad (6.25)$$
6.3 Rock Quality Designation Simulation

[Figure: intact lengths along a scanline.]

Rock quality designation (RQD) is defined as the ratio of the sum of intact
lengths, each of which is greater than a pre-designated threshold value, to the total
length of the scanline.
length of scanline. Due to its relative simplicity, RQD has been used extensively in
the rock classification for engineering purposes. RQD or its slight modifications
have been employed directly in a variety of engineering applications. For instance,
Piteau (1970) has used RQD for rock slope stability, Barton et al. (1974) in the
design of tunnel support, Louis and Pernot (1972) in the analysis of dam foundation
permeability, Cording and Mahar (1978) in underground chamber design in
rocks, and Bieniawski (1974) and Kulhawy (1978) in estimating the strength of rock
materials.
Most researchers in the literature have so far concentrated on scanline
measurement evaluations either analytically (Hudson and Priest, 1979; Priest and Hudson,
1976, 1981; Şen and Kazi, 1984; Şen, 1984; Kazi and Şen, 1985), empirically
(Cruden, 1977; Wallis and King, 1980), or by simulation on digital computers (Goodman
and Smith, 1980; Şen, 1990b). In order to alleviate some drawbacks in
RQD, Şen (1990a) has proposed the concepts of rock quality percentage (RQP) and
rock quality risk (RQR).
Although the assessment of rock quality based on these measurements can be made by visual
inspection, any quantitative method is always better in unifying different opinions
about the same rock mass. Therefore, the method of RQD was adopted by Deere
(1964, 1968) and was expressed as

$$\mathrm{RQD} = \frac{100}{L}\sum_{i=1}^{n} L_i, \qquad (6.26)$$

in which $L_i$ is an intact length greater than the threshold value and L is the scanline
length. In practice, RQD would have to be measured along different
directions and at different places, which is quite tedious and not practical. Therefore,
the question arises how to determine the RQD distribution function from the basic
pdf of the intact lengths. It has been already shown by Şen (1990a) that an analytical
derivation of the RQD distribution function is almost impossible, and therefore the
only way to obtain it is by numerical methods using Monte Carlo techniques.
Simulation of stochastic variables is rather similar to numerical solution
methods in mathematics. That is to say, provided that the underlying properties of a
phenomenon are known, simulation gives a way of reaching the desired goal
numerically. The desired goal here is the RQD distribution. For such a simulation
study, the following steps must be considered.
1) Determine the underlying pdfs of the intact lengths within a rock mass. In previous
studies, this pdf has been taken to be either the negative exponential or the lognormal
pdf, which have the mathematical forms

$$f(x) = \lambda e^{-\lambda x} \qquad (0 < x < \infty), \qquad (6.27)$$

and

$$f(x) = \frac{1}{x\sqrt{2\pi}\,\sigma_{\mathrm{Ln}\,x}}\exp\left[-\frac{1}{2}\left(\frac{\mathrm{Ln}\,(x/m_x)}{\sigma_{\mathrm{Ln}\,x}}\right)^2\right] \qquad (0 < x < \infty). \qquad (6.28)$$

Uniformly distributed random numbers, u, between 0 and 1 are then converted
into negative exponentially distributed intact lengths, s, through the inverse transformation

$$s = -\frac{2.3}{\lambda}\log u = -\frac{1}{\lambda}\,\mathrm{Ln}\,u. \qquad (6.29)$$
However, for the lognormal distribution, first the uniformly distributed pairs of
variables (u1 and u2) are transformed into normally distributed random variable
pairs (s1 and s2) with a procedure already presented by Hammersley and
Handscomb (1964) as

$$s_1 = \sqrt{-2\,\mathrm{Ln}\,u_1}\,\cos(2\pi u_2), \qquad (6.30)$$

$$s_2 = \sqrt{-2\,\mathrm{Ln}\,u_1}\,\sin(2\pi u_2). \qquad (6.31)$$

Herein, u1 and u2 are uniformly distributed random variables within the range
of 0 to 1. The values of s are then transformed into logarithmically distributed
intact lengths x using

$$x = e^{s}. \qquad (6.32)$$
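A Monte Carlo sketch of the resulting RQD distribution for negative exponentially distributed intact lengths follows; the scanline length, threshold, and run count are illustrative assumptions, not values from the study.

```python
import numpy as np

def simulate_rqd(lam, L=10.0, t=0.1, n_runs=5000, seed=0):
    # Monte Carlo RQD distribution: fill a scanline of length L with
    # negative exponentially distributed intact lengths (mean 1/lam),
    # truncate the last piece at the scanline end, and evaluate Eq. (6.26).
    rng = np.random.default_rng(seed)
    rqds = np.empty(n_runs)
    for r in range(n_runs):
        pieces, total = [], 0.0
        while total < L:
            x = rng.exponential(1.0 / lam)   # inverse transform, Eq. (6.29)
            pieces.append(x)
            total += x
        pieces[-1] -= total - L              # keep the scanline length exactly L
        pieces = np.array(pieces)
        rqds[r] = 100.0 * pieces[pieces > t].sum() / L
    return rqds

# Average of 10 discontinuities per unit length with a 0.1-unit threshold:
sample = simulate_rqd(lam=10.0)
print(sample.mean(), np.percentile(sample, [10, 50, 90]))
```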
The cumulative pdfs of RQD values resulting from negative exponentially distributed
intact lengths with a set of average fracture numbers are presented in
Fig. 6.7 for five different threshold values.
On the same figure, classification of RQD values is also shown. There is a sim-
ilarity between the grain size distribution of granular rocks and these graphs which
show the rock quality distribution of the fractured hard rocks. Field experiences
show that any fractured rock is heterogeneous and accordingly more than one type
of rock quality exists within the same rock mass. The percentages of these different
qualities can be found quantitatively from Fig. 6.7. Inspection of this figure leads to
the following significant conclusions.
1) The smaller the average number of discontinuities, the smaller is the range,
which indicates the uniformity of the rock quality. For instance, invariably what-
ever the threshold value the rock has an excellent quality when the average num-
ber of discontinuities is equal to unity.
2) As the average number of discontinuities increases, the rock becomes heteroge-
neous. For instance, in Fig. 6.7 when the threshold value is 0.05, the curve that
represents 20 average intact lengths has three different qualities, namely excel-
lent, good, and fair portions. It is clear from the same curve that the majority
of values are in the “good” quality zone whereas other qualities are less likely
to occur.
3) An increase in the average number of discontinuities leads to deteriorating rock
qualities as shown in Fig. 6.8.
4) As the threshold value decreases, the rock quality increases (see Fig. 6.8). On the
other hand, for a given threshold value the deterioration rate in the rock quality
is higher at small discontinuity numbers. For instance, at 0.20 truncation level
the reduction in the rock quality is almost 12% between discontinuity numbers
2 and 4 whereas 3% from 18 to 20.
In addition, the pdfs of RQD for a set of discontinuity numbers and different
threshold values are presented in Fig. 6.9 for negative exponentially distributed intact
lengths.
Fig. 6.7 RQD chart for negative exponential pdf with (a) t = 0.05; (b) t = 0.15; (c) t = 0.20
Fig. 6.8 RQD threshold value discontinuity number chart (negative exponential pdf)
One of the most striking properties of these pdfs is that irrespective of the discon-
tinuity number and threshold value, they are invariably symmetric. The positions of
the maximum points on any one of these curves along the RQD axis shows the most
likely rock quality within the rock mass.
Due to the aforementioned symmetry, the average RQD value coincides with the
most likely rock quality value. This point indicates that the average RQD value is
equal to the maximum likelihood estimation of the averages resulting from these
pdfs. This value corresponds with the classical RQD value as defined by Deere.
Besides, the mean, mode, and median values of the RQD are equal to each other.
This last statement suggests that the pdf of RQD within a rock mass can be approx-
imated by a normal distribution as
$$f(r) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{1}{2}\left(\frac{r-\mu}{\sigma}\right)^2\right], \qquad (6.33)$$
Fig. 6.9 RQD negative exponential pdfs with (a) t = 0.05; (b) t = 0.15; (c) t = 0.20
in which r is a dummy variable representing RQD; μ and σ are the population mean
and standard deviation of RQD, respectively. For the negative exponential distribu-
tion, Şen and Kazi (1984) have shown already that the RQD has the same mean and
standard deviation value, which are expressible as
Fig. 6.10 RQD description chart for lognormal pdf: (a) t = 0.05, m = 0.0, s = 1.0; (b) t = 0.15, m = 0.0, s = 1.0; (c) t = 0.25, m = 0.0, s = 1.0
Finally, the cumulative pdfs of RQD for the lognormal intact length distribution are
given in Fig. 6.10.
All of the conclusions for the negative exponential pdfs are equally valid for these
curves. In addition, comparisons of various graphs in Fig. 6.10 indicate different
standard deviations, and an increase in the intact length standard deviation improves
the rock quality designation. In other words, the less uniform the fracture spacing
the stronger the rock mass. The following significant conclusions can be drawn from
this study.
1) Any rock mass might have different rock qualities at the same time in different
directions.
2) Rock quality deteriorates with increase of the average number of discontinu-
ities for any intact length distribution, but an increase in the standard deviation
of log-normal pdf of intact lengths leads to improvements in the rock quality
designations.
3) The pdfs of RQD are unimodal in any case, but symmetrical for the negative
exponentially distributed intact lengths.
4) Any dominant type of rock quality has almost the same maximum frequency,
confined within 0.20–0.25 for threshold levels more than or equal to 0.10 m.
All of the aforementioned studies have a common point in that they give an RQD
estimation without consideration of the intact length correlation. However, it is a fact
that even on the same outcrop of the rock, there might be correlated intact lengths
along scanlines taken at various orientations (Eissa and Şen, 1990). Although RQD
calculations according to Deere’s (1964) definition, directly from the scanline measurements,
implicitly account for the intact length correlation, the analytical formulations
unfortunately account for this correlation neither implicitly nor explicitly.
Accordingly, for correlated intact lengths the existing analytical results are in
error. It is obvious then that the comparison of analytical and empirical RQD esti-
mations is possible accurately only for the cases of independent intact lengths. Oth-
erwise, such a comparison is meaningless.
An important factor in the analysis of rock quality assessments from discon-
tinuity measurements along a scanline is the correlation of the intact lengths. The
autorun model and first-order autorun coefficient are proposed as a method of objec-
tively quantifying the intact length correlation structure and discontinuity occur-
rences within a rock mass (Şen, 1978, 1984).
Any straight line through the rock mass encounters a random number of discontinuities.
An intact length is defined as the length of scanline or drill-core between
two successive discontinuities. In general, if there are n+1 discontinuities, the number
of intact lengths is n, provided that the start and end of the scanline are at discontinuities.
A first step in rock mass classification is to consider two types of intact lengths,
namely those whose lengths are greater than a pre-designated threshold value or otherwise
(Eq. 6.26).
For the sake of convenience, alternative intact lengths will be grouped into two
sets as elements ai , (i = 1, 2, . . ., k) in set A and bj (j = 1, 2, . . ., l) in set B, where
k and l are the number of intact lengths in each set. It is obvious that k+l = n, which
is the total number of intact lengths. Furthermore, in an alternate sequence either
k = l–1 or l = k–1; however, practically one may assume with no loss of generality
that k = l = n/2. In short, the intact lengths along a scanline will be an alternative
combination of elements from two sets, namely A = {a1, a2, . . ., ak} and B = {b1,
b2, . . ., bl}, as shown in Fig. 6.11a. In such a combination, the correlation structure of
the sequence a1, b1, a2, b2, . . ., a_{n/2}, b_{n/2} (see Fig. 6.11b) is of utmost importance in addition
to various statistical descriptions of intact lengths.
Besides, it may well be that the intact lengths in set A have a different pdf than those in set B.
However, this point lies outside the scope of this chapter. Of course, the assumption
of uncorrelated intact lengths simplifies the analytical derivation of RQD, but at a
sacrifice of precision. Due to such an assumption, there is no term representing the
correlation of intact lengths in any RQD formulation available so far in
the literature. The major elements that affect the RQD calculations are as
follows:
Fig. 6.11 Alternative intact length concepts: (a) intact lengths of sets A and B; (b) their alternating sequence along a scanline; (c) the equivalent head–tail representation, e.g., HHHTTTTTTHHTTTTTHHHHHHTTT . . .
3) Correlation structure: Any significant correlation affects not only the discontinuity
occurrences but also the intact lengths. This element is invariably ignored
in previous RQD studies in the literature; only some indirect procedures
have been proposed for accounting for the intact length correlation
(Higgs, 1984; Şen, 1991).
4) Threshold value: It is a fixed value, such as 0.1 m (4 inches), below which the intact
lengths are not considered in the RQD calculations.
The first two elements are stochastic variables, and in nature they are both serially
and cross-correlated. Hence, the probabilistic laws of these stochastic parts lead
to meaningful analytical expressions for RQD only after the consideration of cor-
relation structure. For instance, logically any increase in the correlation will imply
the occurrence of longer intact lengths along a scanline than the case where the
intact lengths are independent. It also implies that the number of discontinuities
decreases with increasing correlation. Hence, in general, the existence of relatively
longer intact lengths, (or lesser number of discontinuities) along a scanline implies
improvement in its quality. Consequently, the key in the analytical RQD formulation
for correlated intact lengths is the expression of correlation by an objective measure,
which is adopted herein conveniently as the autorun coefficient.
The lag-k autorun coefficient is defined as

$$r_k = \frac{2 n_k}{n - k}, \qquad (6.35)$$
in which $n_k$ is the number of overlapping successions of the same type of event lag-k
apart, and n is the number of unit intact lengths. From the definition, it is obvious that
0 < r_k < 1. In the case of purely independent observations, whatever the underlying
pdf, Eq. (6.35) becomes equal to 0.5. Therefore, 0.5 shows the fact that two
observations separated by lag k are independent from each other. On the other hand,
if the observations are perfectly correlated, then r_k = 1.0.
The autorun coefficient application is very suitable for binary types of data; therefore,
prior to its application the variable concerned, such as the intact length, must
be rendered into a binary form. For such a purpose, the analogy suggested by Priest
and Hudson (1976) as an unbiased coin-tossing sequence of heads and tails will be
adopted herein for alternating intact lengths, where a head represents a unit length
of intact rock of type A and a tail represents a unit length of type B. With such an
analogy, the scanline in Fig. 6.11b can be considered as a sequence of heads and
analogy, the scanline in Fig. 6.11b can be considered as a sequence of heads and
tails (see Fig. 6.11c). The following significant points emerge in such an analogy:
1) An uninterrupted succession of heads (tails) represents an intact length of type A (B).
2) Each appearance of alternate successive events, i.e., a head–tail or tail–head succession, corresponds to a discontinuity. It is obvious that two successive head–head or tail–tail events represent two units from the overall intact lengths. These explanations indicate the suitability of the lag-one autorun coefficient, r1, in quantifying the intact length correlation structure.
3) The percentage of heads (tails) along a scanline is equal to the probability of a type A (B) intact length. Let these probabilities be denoted by p and q, respectively; then obviously p + q = 1. In terms of the total lengths LA (LB) of the set A (B) intact lengths, the probability can be expressed as p = LA/L (q = LB/L). A computational sketch of this coding is given below.
Assuming uncorrelated intact lengths, Priest and Hudson (1976) presented the expected number of intact lengths along a scanline as

E(n) = L/E(x),  (6.37)
in which L is the scanline length. On the other hand, RQD as it appears in Eq. (6.26) is equivalent to the summation of a random number of random variables; first by taking the expectations of both sides and then by considering Eq. (6.37), one can write
E(RQD) = (100/L) E(n) E(x∗) = 100 E(x∗)/E(x),  (6.38)
in which E(x∗) is the expectation of intact lengths greater than a threshold value, t. Due to the fact that E(x∗) < E(x), the ratio of expectations in the expression always assumes a value between 0 and 1, and hence E(RQD) lies between 0 and 100. The expectations on the right-hand side of Eq. (6.38) can be found provided that the pdfs of the random variables concerned are known.
It can be shown, similar to autorun modeling (Şen, 1985), that the pdfs of k successive heads and tails are of geometric distribution types as

P(nh = k) = (1 − r1) r1^(k−1),  (6.39)

and

P(nt = k) = (p/q)(1 − r1) [1 − (p/q)(1 − r1)]^(k−1),  (6.40)

with the respective expectations

E(nh) = 1/(1 − r1),  (6.41)

and

E(nt) = q/[p(1 − r1)].  (6.42)
As mentioned above, the number of set A intact lengths is one less or one more than the number of set B intact lengths. In other words, practically they may be assumed equal, and therefore each type of intact length has a probability of occurrence equal to 0.5. With this information, the overall expectation of intact lengths, E(x), without distinction between sets A and B, can be seen to be E(x) = 0.5E(nh) + 0.5E(nt), which, by consideration of Eqs. (6.41) and (6.42), yields
E(x) = 1/[2p(1 − r1)],  (6.43)
or from Eq. (6.37) one can find the expected number of discontinuities as

E(n)r1 = 2Lp(1 − r1),  (6.44)

where the subscript signifies the correlatedness of the intact lengths. The corresponding average discontinuity rate per unit length is

λr1 = 2p(1 − r1).  (6.45)

The probability P(k) of k discontinuity occurrences along a scanline of length x at an average rate of λr1 becomes, according to the Poisson process,
P(k) = e^(−2xp(1−r1)) [2xp(1 − r1)]^k / k!.  (6.46)
Fig. 6.12 Comparison of simulated and analytical E(RQD) values for autorun coefficients r1 = 0.2–0.8
Since the interest lies in the discontinuity spacing pdf, by considering the distance, d, from one discontinuity to the following one, one can write that P(d ≤ x) = 1 − P(k = 0), and hence substitution of Eq. (6.45) leads to the cumulative PDF as

P(d ≤ x) = 1 − e^(−2xp(1−r1)).
By taking its derivative with respect to x, the pdf, f(x), of intact lengths can be derived as

f(x) = 2p(1 − r1) e^(−2xp(1−r1)).

Subsequently, the expectation of intact lengths that are greater than t can be found according to

E(x∗) = ∫ from t to ∞ of x f(x) dx,
which leads, after substitution of f(x), to
E(x∗) = {[1 + 2p(1 − r1)t]/[2p(1 − r1)]} e^(−2p(1−r1)t).  (6.47)
Finally, the substitution of Eqs. (6.43) and (6.47) into Eq. (6.38) leads to

E(RQD) = 100 [1 + 2p(1 − r1)t] e^(−2p(1−r1)t).  (6.48)
For independent intact lengths, r1 = 0.5 and the occurrences of unit intact lengths comply with the binomial distribution, which leads to a geometric intact length distribution with E(x) = 1/p or λ = p = 0.5, and therefore Eq. (6.48) becomes identical to what was suggested by Priest and Hudson (1976) in Eq. (6.36). In the light of the aforementioned discussions, one can rewrite Eq. (6.48) as

E(RQD) = 100 (1 + λr1 t) e^(−λr1 t).  (6.49)
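A short numerical sketch of Eq. (6.48) shows how E(RQD) grows with the autorun coefficient for a fixed threshold; the function name and all parameter values below are illustrative assumptions only:

import math

def expected_rqd(p, r1, t):
    # Eq. (6.48): E(RQD) = 100*(1 + lam*t)*exp(-lam*t), with the correlated
    # discontinuity rate lam = 2*p*(1 - r1); t is the threshold expressed
    # in the same units as the unit intact lengths.
    lam = 2.0 * p * (1.0 - r1)
    return 100.0 * (1.0 + lam * t) * math.exp(-lam * t)

# For r1 = 0.5 (independence) this reduces to the Priest-Hudson form;
# larger r1 values give a smaller lam and hence a larger E(RQD).
for r1 in (0.5, 0.6, 0.7, 0.8):
    print(r1, round(expected_rqd(p=0.5, r1=r1, t=0.1), 2))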
The validity of this formula is checked with an extensive Monte Carlo simulation technique by using the autorun model for generating correlated intact lengths, as proposed by Şen (1985). First of all, estimates of the average intact lengths of sets A and B are calculated as

n̄A = (1/mA) \sum_{i=1}^{mA} (nA)i,  (6.50)

and

n̄B = (1/mB) \sum_{i=1}^{mB} (nB)i,  (6.51)
respectively. Herein, mA and mB are the numbers of intact lengths, and (nA)i and (nB)i are the i-th intact lengths in sets A and B, respectively. The geometric pdf parameter, which is the first-order autorun coefficient, can be estimated from Eq. (6.41) as

rA = (n̄A − 1)/n̄A.  (6.52)

Similarly, the geometric pdf parameter, rB, for set B intact lengths turns out to be

rB = (n̄B − 1)/n̄B.  (6.53)
The following conclusions can be drawn from the simulation results:
1) The formulation provided by Priest and Hudson (1976) for E(RQD) yields underestimated results if the intact lengths are positively correlated, which is the case in most natural rocks, as will be presented in the applications section below.
2) An increase in the correlation structure gives rise to an increase in the E(RQD) values.
3) Relatively better RQD values are obtained for the same number of discontinuities but correlated intact lengths.
4) The difference between the dependent and the independent intact length RQD values is relatively less significant at small λ values than at big λ values. In fact, at the 5% relative error level the correlated intact lengths do not lead to significantly different RQD values provided that λ < 3, and at the 10% error level when λ < 10. In Fig. 6.12 upper and lower confidence limits at 5 and 10% significance levels are shown around the Priest and Hudson independent intact length solution. It is obvious that for small average numbers of discontinuities their solution gives confident RQD estimates even though the intact lengths are dependent. However, for big average numbers of discontinuities the significance of intact length correlation becomes very pronounced in the RQD estimations.
6.3.2.3 Applications
The applications of the methodology developed herein are carried out for the field
data from various parts of the world. The first field data are recorded along the
exposed outcrop surfaces of granitic rocks in the western part of the Kingdom of
Saudi Arabia. An extensive geological field survey by Otaibi (1990) showed that the area consists of one rock unit, which is a granite of light pink color on fresh surfaces and dark brown on weathered surfaces, medium to coarse grained and equi-granular. This area was selected since it has a good combination of well-exposed bedrock and a relatively simple fracture pattern. Three sets of fracture orientations can be seen distinctively in this area. Each one of the fracture sets is measured by a scanline perpendicular to the fracture traces. The fracture measurements are carried out at
three sites. These sites are selected such that they give rather random characteriza-
tion of the fracture geometry, i.e., they are quite independent from each other. In
order to be able to apply the methodology developed herein, the relevant values are
calculated and presented in Table 6.2.
It is obvious that the use of the independent intact length RQD formulation does not yield significant deviations from the dependent intact length case. This is due to two major reasons. First of all, the average numbers of discontinuities are all less than 3, and therefore, as already explained in the previous section, even if the intact lengths are strongly correlated there will not be a practically significant difference, i.e., the relative error will be less than 5%. In addition to this major reason, the autorun coefficients are rather close to 0.5, which also confirms the closeness of the E(RQD) results calculated by either Eq. (6.36) or Eq. (6.48).
The second set of data for the implementation of the methodology are extracted
from a previous study by Ryckes (1984) on modes of failure and stability of rock
slopes in Tytherington quarry, which is located halfway between Thornbury and Tytherington, about 16 km north of Bristol in the county of Avon, England. The Paleozoic rocks of the lower coal series in the area are affected by many movements in the past, which led to faults, folds, and unconformities, each of which led to a different pattern of intact lengths. Due to these different mechanisms, it is not possible to expect that these intact lengths have an independent correlation structure. In order to
depict the regional discontinuity pattern in this area, three scanlines were set up at
different directions. The bedding plane orientation is almost horizontal for the first
scanline, which will be referred to as SL1. The second scanline, SL2, has a direction of
20◦ toward southwest whereas the third, SL3, has an intermediate inclination to the
former. The necessary parameters as well as the E(RQD) calculations are presented
in Table 6.3.
The major difference between Tables 6.2 and 6.3 is that the average discontinuity numbers of the scanlines in Table 6.3 are far bigger than those of the Saudi Arabian measurements; moreover, the autorun coefficients in Table 6.3 are invariably less than 0.5, indicating that the intact lengths are negatively correlated. Consequently, the Priest and Hudson formulation (Eq. 6.36) gives overestimations.
6.4 RQD and Correlated Intact Length Simulation

Generally, in the open rock mechanics literature persistence implies the areal extent or size of a discontinuity within a plane (Brown, 1981). It can be quantified crudely by observing the discontinuity trace lengths on the surfaces of exposures. However, the type of persistence herein is related to the sequential occurrences of intact lengths, which may constitute obvious clusters. Within the context of this chapter, persistence can be defined as the tendency of short intact lengths to follow short intact lengths and of long intact lengths to follow long intact lengths. As Priest and Hudson (1976) note, a high frequency of low spacing values occurs within clusters and a low frequency of high spacing values occurs between clusters.
The simplest ways of expressing persistence in a sequence of observations are through either the classical serial correlation coefficient or the autorun function (Şen, 1978). Although the former requires the measurements to be normally distributed, the latter is robust and applicable whatever the distribution. However, for normally distributed intact lengths both give exactly the same result. In fact, in such a situation the autorun coefficient r is convertible to the correlation coefficient ρ by ρ = sin[π(r − 0.5)]. As a consequence, only for normally distributed intact lengths can one use the autorun and autocorrelation terminologies interchangeably.
In the following sequel, the intact length persistence is quantified with the help of the autorun coefficient rk for lag-k defined as in Eq. (6.35). In order to apply this formulation, the following steps must be executed on any scanline as presented in Fig. 6.14:
Fig. 6.14 A scanline (a), its binary coding (b), and its division into intervals of length Δs (c)
1) By considering the threshold value equal to 0.1 m (or 4 in) as proposed by Deere (1964), the effective intact lengths are assigned a uniform value of +1 and the defective lengths a value of 0 (see Fig. 6.14b). In fact, the resulting modified scanline has zones of square waves, which are separated by zero-valued intervals.
2) Divide the scanline into finite intervals of fixed length Δs as shown in Fig. 6.14c, where the interval length is adopted as 1 cm, which is convenient for any practical purpose. The number of such intervals along the scanline is denoted by n. The ratio of the number n1 of intervals within the effective lengths to the total interval number n is equivalent to the estimate of RQD, i.e., RQD = 100(n1/n).
3) Find the number nk of overlapping pairs of 1s lag-k apart and estimate rk from Eq. (6.35) for different lags, obtaining a sequence r1, r2, r3, r4, . . . (a computational sketch of these steps follows below).
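The three steps above can be sketched in Python as follows; this is a minimal illustration in which the intact length values, helper names, and interval handling are assumptions for demonstration only:

import numpy as np

def scanline_to_binary(intact_lengths, threshold=0.10, ds=0.01):
    # Steps (1)-(2): every ds-interval inside an intact length above the
    # threshold is coded +1 (effective); all remaining intervals are 0.
    cells = []
    for length in intact_lengths:
        value = 1 if length >= threshold else 0
        cells.extend([value] * int(round(length / ds)))
    return np.array(cells)

def rqd_and_autorun(binary, max_lag=5):
    # Step (3): RQD = 100*n1/n and the autorun sequence r1, r2, ... (Eq. 6.35).
    n = len(binary)
    rqd = 100.0 * binary.sum() / n
    runs = [2.0 * np.sum(binary[:-k] & binary[k:]) / (n - k)
            for k in range(1, max_lag + 1)]
    return rqd, runs

# Hypothetical intact lengths (m) between successive discontinuities
lengths = [0.42, 0.05, 0.31, 0.08, 0.55, 0.12, 0.03, 0.27]
rqd, runs = rqd_and_autorun(scanline_to_binary(lengths))
print(round(rqd, 1), [round(r, 2) for r in runs])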
The plot of the autorun coefficient versus the lag value gives rise to graphs that represent the persistence existing in the original intact length sequence. A very significant result at this point is that if practically all of the autorun coefficients stay at the constant independence level, then the intact lengths originate from an independent random process and hence the calculation of RQD as defined by Deere is reliable; otherwise, persistence appears, implying clusters in intact lengths and unreliability in the RQD calculations.
The applications of the abovementioned persistence procedure to actual data are achieved by considering field measurements from south England near Bristol (Ryckes, 1984) and measurements from the western province of Saudi Arabia collected by the author.
The Paleozoic rocks of the lower coal series in the area are affected by various earth movements in the past, which led to faults, folds, and unconformities. The locations of the intact length measurement sites are on the axis of a syncline, which inclines toward the south. The bedding plane orientation is almost horizontal for the first scanline (SL1). The second scanline (SL2) has a direction of 20° toward southwest, and the third scanline (SL3) has an intermediate inclination to the former. Further detailed information on these scanline measurements can be found in Ryckes (1984). The fourth scanline (SL4) is an illustrative example adopted from Brady and Brown (1985), whereas the fifth scanline (SL5) is a representative scanline measurement in crystalline dioritic rocks in the southwest of Saudi Arabia.
Fig. 6.15 Autorun functions of the scanlines: (a) SL1; (b) SL2; (c) SL3; (d) SL4; (e) SL5
A striking conflict appears from this table as far as the rock qualities on RQD and persistence bases are concerned. Although the same quality conclusion is arrived at for the fourth scanline, the other scanlines are in conflict to some extent. Hence a dilemma arises as to which one is to be chosen for decision-making. The view taken in this research is that persistence-based qualities should be preferred over the RQD-based descriptions, since the former takes into consideration not only the mean and variance of intact lengths but also their correlation structure, i.e., clustering effects.
Provided that the autorun functions are available, the question arises as to what type of dependent process best represents the intact length occurrences within a rock mass. A straightforward answer to this question can be given, provided that the autocorrelation structures of different theoretical stochastic processes are known beforehand. For this purpose three stochastic processes will be described herein, namely the independent random process, the lag-one Markov process, and the autoregressive integrated moving average process, ARIMA(1, 1).
Fig. 6.16 Autocorrelation structures of different processes: (a) independent process; (b) Markov process (ρ = 0.9); (c) ARIMA(1, 1) process (φ = 0.1, θ = 0.1)
In the case of the independent random process, two parameters, namely the mean and standard deviation, are enough to describe the phenomenon completely. In fact, all of the RQD, RQR, RQP, etc., have been based on the understanding that the phenomenon has an independent structure (Şen, 1990a). Since there have been so many studies in the past concerning this process, its repetition is avoided herein.
The lag-one Markov process generates intact lengths according to

li = μ + ρ(li−1 − μ) + σ(1 − ρ²)^(1/2) εi,  (6.55)

in which μ, σ, and ρ are the mean, standard deviation, and first-order serial correlation coefficient, respectively, of the intact lengths li and, finally, εi is a normal random shock with zero mean and unit variance. The autocorrelation structure of this process is given as
ρ0 = 1,
ρi = ρ^i (i = 1, 2, . . .),  (6.56)
The ARIMA(1, 1) process can be expressed as

li − μ = φ(li−1 − μ) + εi − θεi−1,  (6.57)

in which φ and θ are model parameters and εi is again a normal random process with zero mean and unit variance. The autocorrelation structure of this model is given as

ρ0 = 1,
ρ1 = (1 − φθ)(φ − θ)/(1 + θ² − 2φθ),  (6.58)
ρk = φρk−1 (for k ≥ 2),
which is drawn for the same set of parameters in Fig. 6.16c. Comparison of the graphs in Fig. 6.16 indicates that only ARIMA(1, 1) processes lead to significantly more persistent correlations at large lags than the other processes.
Fig. 6.17 Sample autorun functions of the scanlines and fitted Markov and ARIMA(1, 1) models: (a) SL1; (b) SL2; (c) SL3; (d) SL4; (e) SL5
Table 6.5 Fitted model parameters ρ, φ, θ, μ, and σ for each scanline

Fig. 6.18 RQD–persistence relations for the Markov model with threshold values t = 5, 10, and 15 cm
The RQD values are not very sensitive to the persistence parameter for a threshold value of 0.10 m when ρ < 0.5. However, for ρ > 0.5 the sensitivity increases enormously, and therefore the classical RQD calculations may not be reliable, especially when 0.5 < ρ < 1.0, provided that the underlying generating mechanism of intact lengths is of Markov type. On the other hand, Fig. 6.18 also indicates that the rock quality improves if a low threshold value such as 0.05 m is adopted for the basic RQD definition. Furthermore, the comparison of the three curves in the same figure shows that the rock quality improvement from threshold value 0.15 to 0.10 m is less than that from 0.10 to 0.05 m.
The same type of simulation, but for the ARIMA(1, 1) model, leads to the standard RQD–persistence relationships in Figs. 6.19 and 6.20, which present only two samples from an infinite number of such relationships.
Fig. 6.19 and Fig. 6.20 RQD–persistence relations for the ARIMA(1, 1) model with threshold values t = 5, 10, and 15 cm
A first comparison of Figs. 6.19 and 6.20 with Fig. 6.18 shows that the ARIMA(1, 1) process implies better rock quality for the same threshold value. This is indeed the logically expected result from the previous discussions. On the other hand, the sensitivity of RQD to persistence parameter changes decreases significantly for ρ > 0.5 but increases for ρ < 0.5. In other words, the curvature of the curves is very small compared with the Markov process, to the extent that even in some cases with small threshold values such as 0.05 m the RQD–persistence relationship appears as a straight line. Furthermore, comparison of Fig. 6.19 with Fig. 6.20 indicates that an increase in the φ value means an improvement in the rock quality, which is due to the fact that large φ values imply longer intact lengths.
In order to see the effects of the genuine parameters, i.e., the mean and variance of the scanlines, a second set of simulations with the same models is performed on digital computers. Some of the representative results for SL1, SL2, and SL3 are presented in Fig. 6.21. A general conclusion from these figures is that the mean and standard deviation of the intact lengths are not sufficient for calculating RQD values. For instance, consideration of Fig. 6.21 leads to the conclusion that the least RQD value of almost 20 will appear provided that the intact lengths along SL1 occur independently (ρ = 0). However, in addition to these two basic parameters, the value of φ plays a dominant role by causing an increment in the RQD value. Similar statements are valid for the other scanlines. Besides, comparison of Fig. 6.21 with Figs. 6.19 and 6.20 shows that as the mean and standard deviation values increase, the sensitivity to the persistence parameter decreases. It is therefore plausible to conclude that the longer and less variable the intact lengths become, the more reliable will be the classical RQD calculations by Deere's original definition.
Fig. 6.21 RQD–persistence relations for (a) SL1; (b) SL2; (c) SL3 (curves for φ = 0.1, 0.3, 0.5, 0.7, and 0.9, with the Markov model for comparison)

The main theme of this study was to emphasize that in practical studies the RQD calculations are not reliable without consideration of the intact length correlation structure. It is a well-known fact that the classical RQD evaluations are based only on the mean value percentage of the intact lengths that are greater than a threshold value. However, in the light of the simulation study performed in this chapter, the following important points are worth noticing.
1) The persistence structure of intact lengths gives rise to additional rock quality increments.
2) There exists a proportional relationship between the RQD and the persistence parameter, which is adopted as the first-order autocorrelation coefficient in this study.
3) It is possible to identify the underlying generating mechanism of the intact lengths by comparing the experimental autocorrelation functions with the theoretical ones of stochastic processes, such as the Markov or ARIMA(1, 1) processes.
6.5 Autorun Simulation of Porous Material

Any porous medium will have either a solid or a void at each point in space. Quantitatively, a solid point is represented by −1 and a void point by +1. Figure 6.22 indicates voids as white squares and solids as black patches, and such a spatial distribution of digitized numbers is a binary random field or ReV.
The characteristic function of the medium, ξ(x), can be defined as

ξ(x) = +1 (x ∈ A) or ξ(x) = −1 (x ∈ Ac),  (6.59)

where ∈ means "belongs to" and A and Ac are the sets of voids and solids, respectively, within a porous medium bulk volume, V. These two sets are mutually exclusive and complementary, so it follows that

A ∪ Ac = V  (6.60)

and

A ∩ Ac = ∅,  (6.61)

where ∪ and ∩ are the union and intersection of sets, respectively, and ∅ is the empty set.
set. The expression “random function of coordinate” must be understood in the sense
that at each point of the 3D space, the value ξ(x) is a random variable, and conse-
quently it cannot be predicted exactly. The values of ξ(x) are subject to a certain pdf.
If the pdf is invariant with respect to a shift of the system of points, then the ReV
and corresponding porous medium are homogeneous. The same ReV is statistically
homogeneous and isotropic when the pdfs are invariant with respect to an arbitrary
rotation of the system points (such as a solid body) and to a mirror reflection of the
system with respect to an arbitrary plane passing through the origin of the coordi-
nate system. In other words, the statistical moments depend upon the configuration
of the grain–void system for which they are formed, but not upon the position of the
system in space. In practical terms, the porous medium is isotropic if the properties
of any point are the same in all directions from that point. The medium is of hetero-
geneous composition if its nature or properties of isotropy or anisotropy vary from
one point to another in the medium. If the porous medium is statistically homoge-
neous and isotropic, then the moments do not depend on any preferred direction
within the medium.
Along a sampling line through the medium, the line characteristic function can be defined as

f(s) = +1 (void at s) or f(s) = −1 (solid at s),  (6.62)

where s is the distance along the sampling line from any arbitrary origin. The graphical representation of f(s) forms a square wave as it passes alternately from void to grain (Fig. 6.22). This function represents one of the possible realizations of the
ensemble of the porous medium. It reflects the size distribution of solids and voids,
their orientation, and packing within the sandstone. In practice, f(s) may be deter-
mined by preparing a high-contrast photographic image of a thin section whose
pores have been filled with epoxy.
In order to treat f(s) with classical time-series techniques, such as autocovariance, spectral, autorun analysis, and so forth, it must be defined as a random variable set at n points equally spaced Δs apart along the sampling line. Hence, nΔs is equal to the sampling line length. The ordered set of f(s) values at i = 1, 2, . . ., n is called a stochastic process. If the pdf is the same for all i, the stochastic process is said to be weakly stationary. In addition, strict stationarity implies that all of the possible joint pdfs are functions of the distance between pairs. However, in practice, strictly stationary processes are not usually encountered. Non-stationarities in sedimentary rock units may arise from the presence of distinctive layering or gradual grading. In general, sedimentary units are considered as stochastically stationary if the pattern of variation in a property is similar in each sampled area of a bed.
The autorun function can be defined as the conditional probability

r(kΔs) = P[f(s + kΔs) > m | f(s) > m],  (6.63)

where k is referred to as the lag and m as the truncation level, taken in most cases as the median value. From the definition, it is obvious that values of the autorun function vary between 0 and +1 for any lag-k. Let p be the probability of +1 in a characteristic function (see Eq. 6.62) of infinite length. For a line characteristic function the truncation level can be taken as any value between +1 and −1, exclusively.
For independent processes all of the autorun values at different lags are equal to p.
However, for any process the following autorun function properties are valid:

r(0) = +1,
0 < r(kΔs) < +1,  (6.64)
r(∞) = p.
The small-sample estimate of the lag-k autorun coefficient from finite-length characteristic functions can be obtained, similar to Eq. (6.35), as

r(kΔs) = 2nk/(n − k).  (6.65)

For normally distributed processes the autorun function is related to the autocorrelation function as

r(kΔs) = 1/2 + (1/π) arcsin ρkΔs,  (6.66)
where ρkΔs is the lag-k autocorrelation coefficient. Assuming a first-order Markov process, where ρkΔs = ρΔs^k, this last expression gives autorun functions as shown in Fig. 6.24. It is obvious that for this case 0 ≤ ρkΔs ≤ +1 corresponds to 0.5 ≤ r(kΔs) ≤ +1. Figure 6.24a–d represents persistence, and the autorun function converges to 0.5 asymptotically. These continuous decreases indicate the stationary nature of the underlying sample characteristic function. On the other hand, negative dependence is characterized by autorun functions falling below 0.5. In Fig. 6.24, rk stands for r(kΔs) and r for r(Δs).
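Equation (6.66) is easy to tabulate; the following sketch, in which the parameter values are assumed purely for illustration, reproduces the decay of the autorun function toward 0.5 for a first-order Markov process:

import math

def autorun_from_autocorrelation(rho_k):
    # Eq. (6.66): r(k*ds) = 1/2 + arcsin(rho_k)/pi, valid for normally
    # distributed processes truncated at the median (p = 0.5).
    return 0.5 + math.asin(rho_k) / math.pi

rho1 = 0.9  # lag-one autocorrelation of the assumed Markov process
for k in (1, 2, 5, 10, 20):
    print(k, round(autorun_from_autocorrelation(rho1 ** k), 3))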
The autorun coefficient is robust and is not dependent on any particular distribution function. Furthermore, it is very convenient for binary random variables. The physical interpretation of the autocovariance function may be difficult or impossible to make (Jenkins and Watts, 1968). In contrast, the autorun function has physical meaning as conditional probabilities and is usually simple to interpret.

Fig. 6.24 Autorun functions of dependent processes for different r(Δs) values and p = 0.5

The asymptotic value of the autorun function corresponds to the porosity of the porous
medium. However, in practice autorun coefficients for relatively large lags estimated
from Eq. (6.65) yield the approximate porosity.
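In practical terms, both the porosity and the specific surface estimate of Eq. (6.70) below can be obtained from a digitized sampling line; the following sketch uses the small-sample estimator of Eq. (6.65), and the synthetic independent medium and Δs value are assumptions for illustration only:

import numpy as np

def porosity_and_specific_surface(binary, ds):
    # p is estimated as the fraction of void (+1) points; r(ds) follows
    # from Eq. (6.65); sigma follows from Eq. (6.70) with r(0) = 1.
    p = binary.mean()
    n = len(binary)
    n1 = np.sum((binary[:-1] == 1) & (binary[1:] == 1))
    r1 = 2.0 * n1 / (n - 1)
    return p, 2.0 * (1.0 - r1) * p / ds

rng = np.random.default_rng(42)
binary = (rng.random(10_000) < 0.3).astype(int)  # independent medium, p = 0.3
p, sigma = porosity_and_specific_surface(binary, ds=0.01)  # ds in cm
print(round(p, 3), round(sigma, 1))  # porosity and specific surface (1/cm)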
Another very important physical parameter that is directly related to the autorun
function is the specific surface, σ. The specific surface is defined as the ratio of
the total surface, S , of solids (or voids) to the bulk volume of the porous medium
concerned. Hence, generally
σ = S/V.  (6.67)
where S is here the set of points in the porous medium resting on the grain surface and Sc is the complementary set including the remaining points. Hence, the crossings between +1 and −1 along a sampling line are important in determining the solid surface points; the expected number of such crossings among n points is

T = 2n[1 − r(Δs)]p.  (6.69)
Here T represents the number of solid surface points along any sampling line, provided that the porous medium is isotropic and homogeneous. Hence, the estimate of the specific surface σ̂ along such a line can be determined after dividing Eq. (6.69) by the length of this line, nΔs, as

σ̂ = 2[1 − r(Δs)]p/Δs,

or more conveniently

σ̂ = 2[r(0) − r(Δs)]p/Δs.  (6.70)
It is important to notice at this stage that the ratio in Eq. (6.70) is the slope of the autorun function at the origin. A similar relationship has been obtained with the autocorrelation function slope at the origin by Watson (1955). The population (asymptotic) value of the specific surface can be found from Eq. (6.70) as Δs → 0, which leads to

σ = −2r′(0)p,  (6.71)

where r′(0) is the derivative (slope) of the autorun function at the origin. For independent processes r′(0) = −∞, and for completely dependent processes r′(0) = 0.
It is therefore expected theoretically that the specific surface of isotropic and homogeneous materials may take any positive value. Since processes in earth sciences fall between the two aforementioned extremes, their specific surface values fall between zero and infinity. It has been shown by Cramer (1938) that the specific surface of
zero and infinity. It has been shown by Cramer (1938) that the specific surface of
sandstones varies in the range of 150–320 cm−1 . The finer the sandstone, the greater
the number of crossings on any sampling line, and therefore the slope of the autorun
function will be greater, leading to greater specific surfaces. It thus becomes very
obvious that fine materials will exhibit much greater specific surface than will coarse
materials. Some fine materials contain an enormous grain surface area per unit vol-
ume. The generalization of Eq. (6.71) for any preferred direction, say α, within the
anisotropic porous medium is possible after dividing it by the surface area of the
sphere with unit radius. Hence, Eq. (6.71) becomes
σ = −(1/2π) ∫ r′α(0) pα dα,  (6.72)
where r′α(0) and pα are the derivative of the autorun function at the origin and the porosity along the preferred sampling direction α, respectively. The numerical calculation of Eq. (6.72) can be achieved by taking various thin sections in different directions.
Since the porous medium has been regarded as a realization of a stochastic process,
it is necessary to develop a model for generating pore structure that is more rep-
resentative than those based deterministically on assemblages of spheres or tubes.
However, the spherical beads and capillary tube models are assumed for analytical
purposes in solving for fluid flow on the scale of a few pores. Therefore, they are
not convenient for generating the stochastic characteristics of the porous medium.
A simulation model of the grain-void size distribution can be achieved through the
autorun technique. Although such a model does not give, on the average, new infor-
mation about the medium, it helps to generate all of the possible line characteristic function realizations of the medium. The first simulation model in this direction has
been proposed by Roach (1968) for independent processes.
However, in general, the porous medium composition of voids and solids has
dependent structure. This is tantamount to having clusters of voids and/or solids;
that is, as a general tendency, voids follow voids and solids follow solids. The first autorun coefficient, r(Δs), provides a criterion to decide whether the porous medium composition has an independent structure, that is, whether the occurrence of any void or solid does not affect the others. If r(Δs) is significantly different from the porosity, then dependence and, therefore, clustering exists. When r(Δs) > p a positive dependence exists, which means physically that the clustering of voids is predominant. On the contrary, for r(Δs) < p clustering of solids is effective.
The probability of having at least two successive voids is r(Δs), of having three successive voids r²(Δs), and in general of having at least j successive voids is

r^(j−1)(Δs).  (6.73)

The probability that a void run is terminated by a solid is

1 − r(Δs).  (6.74)

Hence, the multiplication of Eqs. (6.73) and (6.74) yields the probability of having j uninterrupted successive voids on an infinite sampling line as

P(nν = j) = [1 − r(Δs)] r^(j−1)(Δs).  (6.75)

It indicates that void lengths are geometrically distributed with parameter r(Δs). The expectation of void length in an infinite characteristic function can be found from Eq. (6.75) as

E(nν) = 1/[1 − r(Δs)].  (6.76)
For independent processes, p = r(Δs), and this equation gives E(nν) = 1/(1 − p), which has been presented by Feller (1967). The purpose of a simulation model is to generate statistically indistinguishable synthetic characteristic functions from the observed line characteristic function. In other words, on the average, statistical parameters such as the mean and variance must be preserved in the synthetic characteristic functions. It is possible to generate geometrically distributed void lengths, ν, with parameter r(Δs) as

ν = 1 + log ξ / log r(Δs),  (6.77)

where ξ is a uniform random number between 0 and 1. Similarly, geometrically distributed solid lengths, g, can be generated as

g = 1 + log ξ / log rg(Δs),  (6.78)
where rg(Δs) is the first autorun coefficient calculated from the line characteristic function after its multiplication by −1. Since the void and solid lengths occur alternately on a sampling line, the initial length can either be selected randomly or according to the final length type on the observed sampling line, and then void and solid lengths are arranged in sequence. However, whichever way is adopted does not make any difference in the final product.
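A minimal sketch of this alternating generation scheme follows Eqs. (6.77) and (6.78) directly; the function name, starting-type choice, and parameter values below are assumptions for illustration:

import math
import random

def generate_line(n_points, r_void, r_grain, seed=1):
    # Alternate geometrically distributed void (1) and solid (0) lengths,
    # Eqs. (6.77)-(6.78): length = 1 + int(log(xi)/log(r)), xi ~ U(0, 1).
    rng = random.Random(seed)
    line, is_void = [], True
    while len(line) < n_points:
        r = r_void if is_void else r_grain
        xi = 1.0 - rng.random()  # uniform in (0, 1], avoids log(0)
        length = 1 + int(math.log(xi) / math.log(r))
        line.extend([1 if is_void else 0] * length)
        is_void = not is_void
    return line[:n_points]

line = generate_line(10_000, r_void=0.7, r_grain=0.7)
print(round(sum(line) / len(line), 3))  # equal parameters give porosity near 0.5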
The aforementioned modeling technique for the porous media based on the
autorun model has been implemented on a digital computer for parameters p = 0.5;
r(s) = rg (s) = 0.5, 0.7, and 0.9. Very long sequences of synthetic line character-
istic functions (10,000 points) have been generated, but three samples with length
of 125 points each are presented in Figs. 6.25a, 6.26a, and 6.27a, together with
their sample autorun functions in Figs. 6.25b, 6.26b, and 6.27b, respectively. Fig-
ure 6.25a represents the independent structure of void-solid occurrences resulting
from the autorun model, whereas Figs. 6.26a and 6.27a are samples of dependent
structure.
It is clear that, on the average, the lengths of voids or grains are shorter in Fig. 6.25a than in the others; in addition, Fig. 6.25a is richer in the number of crossings than the others. As the autorun coefficient increases, these lengths increase, whereas the number of crossings decreases. Line characteristic functions in Figs. 6.25a, 6.26a, and
6.27a are the possible realizations of fine, medium, and coarse-grained sandstones,
respectively.
On the other hand, generalization of the stochastic model proposed by Roach
(1968) for finite sampling lines can be achieved by the autorun coefficient so as
to cover dependent porous medium configuration. For this purpose, the following
conditional probabilities of the combinations of two successive events at equally
spaced points along a sampling line can be written as
P(ν|ν) = r(Δs),
P(g|ν) = 1 − r(Δs),
P(g|g) = 1 − [p/(1 − p)][1 − r(Δs)],  (6.79)
P(ν|g) = [p/(1 − p)][1 − r(Δs)].
Fig. 6.28 (a–d) Number of voids per unit length, Nν/N, versus the autorun coefficient r(Δs) for void lengths of 1Δs, 2Δs, and 3Δs at porosities (a) p = 0.1; (b) p = 0.2; (c) p = 0.3; (d) p = 0.4
On the average, the probability P(ν) that a point lies in a void space is equal to p, whereas the probability P(g) that a point lies within a solid space is q = 1 − p. The void–solid sequence probability is given in Eq. (6.79) as P(ν|g), which defines a crossing point in the line characteristic function. The probability that a solid is followed by a void, P(g, ν), is equal to P(g)P(ν|g), and explicitly

P(g, ν) = p[1 − r(Δs)].
However, the probability P(g, ν, g) of having a void length equal to 1Δs can be obtained as P(g, ν)P(g|ν), or as

P(g, ν, g) = p[1 − r(Δs)]².  (6.80)

Similarly, in general, the probability of having a void length equal to nΔs becomes

p[1 − r(Δs)]² r^(n−1)(Δs).  (6.81)

The number of voids, Nν, of length nΔs in a set of N equally spaced points along the sampling line can be obtained as

Nν = Np[1 − r(Δs)]² r^(n−1)(Δs).  (6.82)

For the independent process, r(Δs) = p, and then Eq. (6.82) yields the same result as originally proposed by Roach (1968). Furthermore, a similar expression for the solids can be found as

Ng = N(1 − p)[1 − rg(Δs)]² rg^(n−1)(Δs).  (6.83)
Figure 6.28 shows the change in the number of voids per unit length, Nν/N, with the autorun coefficient for given void lengths of 1Δs, 2Δs, and 3Δs at different porosities, p = 0.1, 0.2, 0.3, and 0.4. It is obvious that the number of voids with length 1Δs decreases continuously with the increase of the autorun coefficient. However, such a relationship is not valid for other void lengths, because the number of voids with lengths other than 1Δs has maximum values for autorun coefficients other than zero. For instance, this maximum occurs at about r(Δs) = 0.3 for p = 0.3. Furthermore, whatever the porous medium parameters are, in the long run the number of voids of any length will be greater than the number of voids with larger lengths. It can be concluded from Fig. 6.28 that the difference between the numbers of voids per unit length for
voids of length 1Δs and the others is relatively large at low autorun coefficients. For a line characteristic function with N = 10,000 points, p = 0.3, and r(Δs) = 0.2, the numbers of voids for void lengths of 1Δs, 2Δs, and 3Δs are 1920, 384, and 77, whereas with r(Δs) = 0.8 they are 120, 96, and 77, respectively.
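A quick numerical check of Eq. (6.82), as reconstructed above, reproduces these counts; the function name is an assumption for illustration:

def expected_void_count(N, p, r, n):
    # Eq. (6.82): N_v = N * p * (1 - r)**2 * r**(n - 1), the expected number
    # of voids of length n intervals among N equally spaced points.
    return N * p * (1.0 - r) ** 2 * r ** (n - 1)

for r in (0.2, 0.8):
    print(r, [round(expected_void_count(10_000, 0.3, r, n)) for n in (1, 2, 3)])
# r = 0.2 -> [1920, 384, 77]; r = 0.8 -> [120, 96, 77]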
6.6 CSV Technique for Identification of Intact Length Correlation Structure

The different rock quality description indices presented in the rock mechanics literature, which are based on intact lengths, assume that the occurrences of discontinuities and, accordingly, intact lengths are independent from each other. Unfortunately, the validity of such an assumption is questionable, and therefore the final equations for rock quality assessments need to be validated prior to their use.
Otherwise, the results can be regarded only as initial approximations, which
might be rough estimations of the rock quality. It is rather obvious that due to dif-
ferent geological phenomena the rock discontinuities appear in local cluster forms,
which shows that the intact lengths cannot be considered as independent from each
other. In order to check whether the intact lengths along a scanline are indepen-
dent or dependent of each other, the CSV method is presented herein in detail. The basic concept of the standard CSV is presented and so simplified that in the case of independent intact length occurrences its variation with the total intact length appears as a straight line passing through the origin. Any systematic deviation from the straight line implies that the intact lengths are dependent, and therefore the classical rock quality description index must not be used; instead, measures that take into account the correlation of intact lengths must be preferred in rock quality studies. Besides, theoretical CSV models are developed for the independent,
Markov, and ARIMA(1, 1) processes. These models provide a general guide in the
identification of intact length correlation structure as well as the numerical value of
such a correlation.
The application of the methodology developed, herein, is presented for scanline
measurements at Tytherington quarry in England. The Markovian type of correlation
structure is identified for all of the scanlines from this quarry.
Mechanical and/or hydraulic behavior of jointed rock masses requires an accurate representation of joint geometry, including intact and trace lengths, orientation,
intensity, etc. The spacing between two successive joints along an arbitrary sam-
pling line, namely scanline, is referred to as the intact length. Most of the mechani-
cal behavior as well as the quality classification of jointed rock masses are directly
related to the intact length occurrences along different directions. According to
Terzaghi (1965), the intact lengths along the mean pole direction can be regarded
as the true spacing, whereas along any other directions they can be expressed as
a function of the true spacing and the angle between the sampling and the mean
pole directions. In case of several joint sets, within the same rock mass, it is rather
difficult to measure true spacing for an individual set. In practice, however, scanline
measurements include intact lengths between joints belonging to different sets.
Many investigators have examined empirical joint spacing distributions based on
measured intact lengths, and generally two different types of theoretical distribution
functions were proposed, namely lognormal (Steffen, 1975; Bridges, 1975; Barton,
1977; Şen and Kazi, 1984) and negative exponential distribution (Call et al., 1976;
Priest and Hudson, 1976; Einstein and Baecher, 1983; Wallis and King, 1980).
Besides, Gamma distribution of intact lengths has been proposed by Şen (1984).
However, it seems that the negative exponential distribution appears to be the most
frequently used one due to its simplicity. In order to decide whether the intact lengths
fit the negative exponential or lognormal distribution, most often the conclusions are
based on visual comparison, but Baecher et al. (1977) have used goodness-of-fit tests
in the decision-making about the best distribution. For such a test the primary requi-
site is the histogram of intact lengths. However, any test of goodness-of-fit ignores
the internal structure in the intact lengths occurrence along a scanline, i.e., the intact
length occurrences are assumed to have a complete random behavior without any
dependence on each other. We cannot really assume that nature is nice enough to
present us with independent identically distributed intact lengths. In nature the cre-
ation of fractures depend on many interactive phenomenanon such as pressure, tem-
perature, volcanic activity, earthquakes. As a consequence, prior to any assessment
of intact lengths with the classical techniques, one should confirm their random
occurrences. All of the aforementioned studies imply that the intact lengths occur
randomly, which might not be the case in actual situations and subsequently overes-
timations result in the evaluations. Hence, a key question in the intact length evalu-
ation is whether they have an independent structure or not? Provided that the inde-
pendence is verified, then the use of classical techniques leads to reliable answers.
Otherwise, new techniques should be devised so as to take into consideration the
dependence structure of intact lengths. This point has been observed first by Higgs
(1984), who suggested an empirical technique called “the profile area method”
by which the dependence structure within the intact length measurements series
is taken into consideration indirectly but in an effective way. Later, Şen (1990b)
provided analytical formulation for the profile area method, which showed explicitly
that the rock quality measures are not only functions of simple statistical parameters
such as the mean and variance values but additionally the serial correlation coeffi-
cient. The classical techniques appear as a special case of the analytical formula-
tions when the serial correlation coefficient is practically equal to zero. On the other
hand, Eissa and Şen (1990) made an extensive computer simulation by using Monte
Carlo techniques so as to generate serially dependent intact lengths according to a
very simple Markov model. The calculations of RQD for different serial correlation
coefficients indicated that increase in correlation coefficient results in increase in
rock quality.
However, neither the empirical method (Higgs, 1984) nor the simulation technique (Eissa and Şen, 1990) is suitable for practical purposes. It is therefore suggested herein to use the CSV concept, as proposed by Şen (1989a), in evaluating the intact length occurrences and accordingly the rock quality classification. The
CSV technique provides graphs that give additional interpretations in intact length
occurrences, and especially in their regional behaviors.
Intact length measurements in the field are dependent on the relative positions of
the scanline within the study area. The variability of intact lengths along a scan-
line leads to the concept of regional variability of intact length with the change in
scanline orientation. This variability determines the regional behavior as well as the
predictability of the intact lengths, on the basis of which the rock quality can be assessed. Large variability implies that the degree of dependence of intact lengths on each other might be rather small even for scanlines close to each other. Such variability may be the product of any one of the active geological phenomena, such as tectonic, volcanic, depositional, erosional, or recharge activities.
In order to quantify the degree of variability of regional changes, variance tech-
niques have been used already (Higgs, 1984). On the other hand, Eissa and Şen
(1990) and Şen (1991) have employed autocorrelation methods in the intact length
assessment. However, these methods cannot account for the regional dependence
due to either non-normal distribution and/or irregularity of sampling positions (Şen,
1978).
The SV method has been proposed by Matheron (1963) to get rid of the aforementioned drawbacks. Its elegance lies in the fact that the regionalized variable pdf is not important in obtaining the SV, and furthermore it is effective for irregular data positions. It is to be recalled herein that the classical variogram, autocorrelation, and autorun techniques all require equally spaced data values. However, the discontinuities along a scanline are irregularly spaced, and therefore the use of classical techniques is highly questionable; at best these techniques provide approximate results only. Although the SV technique is suitable for irregularly spaced data, it has practical difficulties, as summarized by Şen (1989a). Among such difficulties is the grouping of distance data into classes of equal or variable lengths for SV construction; the resulting SV has an inconsistent pattern and does not possess a non-decreasing form.
However, the CSV always gives, with the same data, a non-decreasing pattern without grouping of distances, requiring only their arrangement in ascending order. By this arrangement each one of the distances is considered individually in the regional variability of the intact lengths. In general, the CSV, γc(dk), is defined as a successive summation of squared differences as

γc(dk) = \sum_{i=1}^{k} d(di) = (1/2) \sum_{i=1}^{k} (Zi − Zi−1)²,  (6.84)
in which d(di) indicates the half-squared difference at the i-th position in the ordered intact length arrangement, where subscript i indicates the rank, and Zi is the intact length corresponding to rank i. Taking the expectation of both sides of Eq. (6.84) gives

E[γc(dk)] = (1/2) \sum_{i=1}^{k} [E(Zi²) − 2E(ZiZi−1) + E(Zi−1²)].  (6.85)
Let us assume that the intact lengths are second-order stationary, which implies that E(Zi²) = E(Zi−1²) = σ² and E(ZiZi−1) = cov(Zi, Zi−1), where cov(Zi, Zi−1) indicates the covariance between Zi and Zi−1 and, finally, σ² is the variance of intact lengths. It is to be noted that second-order stationarity implies that the statistical parameters are independent of i, i.e., of the rank. Consideration of these conditions and the substitution of the relevant values into Eq. (6.85) lead to

E[γc(dk)] = σ² [k − \sum_{i=1}^{k} cov(Zi, Zi−1)/σ²],  (6.86)
or, since from statistical time series analysis (Box and Jenkins, 1970) the ratio term in the parenthesis is defined as the autocorrelation coefficient, ρi, Eq. (6.86) can be rewritten succinctly as

E[γc(dk)] = σ² (k − \sum_{i=1}^{k} ρi),  (6.87)
in which ρi represents the correlation within the intact length sequence and this
expression is a general formulation of the theoretical CSV. The following specific
models of CSV can be derived from the available stochastic processes in the litera-
ture as employed by Eissa and Şen (1990).
1. Independent model CSV: This is the most widely used assumption in the rock
quality assessments that have appeared in the literature (Priest and Hudson, 1976;
Hudson and Priest, 1979; Şen, 1984, 1990b). The uncorrelatedness of intact lengths implies ρi = 0, and consequently the simplest model of all emerges as
E[γc(dk)] = kσ²,  (6.88)
which means that in the CSV model of such intact length occurrences the variance alone plays a role. The graphical representation of the independent CSV model appears as a straight line that passes through the origin, as shown in Fig. 6.29. The slope of this straight line gives the variance of intact lengths.
As the variance value becomes smaller and smaller, the independent process CSV
becomes closer to the horizontal axis, which represents the distances. For instance,
in the case of equal intact lengths along a scanline, the variance becomes zero and
hence the horizontal axis represents the CSV. Otherwise, for large variances the
CSV model becomes closer to the vertical axis, indicating significant and random
differences between successive intact lengths.
2. Markov model CSV: When the intact lengths are serially dependent and the correlation coefficient decreases according to a power law with increasing lag, their generating mechanism is the Markov process, which has the autocorrelation structure ρ0 = 1 and ρi = ρ1^i (i = 1, 2, . . .), where ρ1 is the lag-one correlation coefficient, which may assume any value between −1 and +1. For this model the CSV becomes
E[γc(dk)] = [k − ρ1(1 − ρ1^k)/(1 − ρ1)] σ²,  (6.89)

which reduces to Eq. (6.88) for ρ1 = 0. A positive ρ1 value means that, on the average, long intact lengths follow long intact lengths and short intact lengths follow short intact lengths. On the other hand, a negative ρ1 implies that long intact lengths follow
short intact lengths, or vice versa.

Fig. 6.29 Theoretical standard CSV models for Markovian intact lengths (ρ1 = 0.5, 0.7, and 0.9)

Equation (6.89) has a power form for small k values but becomes a straight line for large k values as
E[γc(dk)] = k[1 − ρ1/(1 − ρ1)]σ².  (6.90)
Division of both sides by σ² gives rise to the definition of the standard CSV, γ(k) = E[γc(dk)]/σ², as

γ(k) = k[1 − ρ1/(1 − ρ1)].  (6.91)
3. ARIMA(1, 1) model CSV: The autocorrelation structure of this process is

ρ0 = 1,
ρ1 = (1 − φθ)(φ − θ)/(1 + θ² − 2φθ),
ρi = φ^(i−1) ρ1 (i ≥ 2).

The ARIMA model represents intact lengths that are more persistent than in the Markov model case. The substitution of this autocorrelation structure into Eq. (6.87) yields, after the necessary algebraic manipulation,
E[γc(dk)] = [k − ρ1(1 − φ^k)/(1 − φ)] σ²,  (6.92)

where φ is the model parameter representing extra persistence over the Markov model.

Fig. 6.30 Theoretical standard CSV models for ARIMA intact lengths
The standard CSV model is

γ(k) = k − ρ1(1 − φ^k)/(1 − φ),  (6.93)

which provides power-type curves for small k values; but as k becomes bigger and bigger, it asymptotically approaches a straight-line portion that can be expressed as

γ(k) = k − kρ1/(1 − φ).  (6.94)
The graphical representations of the ARIMA(1, 1) model standard CSVs are presented in Fig. 6.30. The comparison of this figure with the Markovian standard CSVs in Fig. 6.29 indicates that in the case of the ARIMA(1, 1) model the approach of the CSV to a straight line appears at bigger distances. It is obvious that the ARIMA model CSV reduces to the Markov and independent model cases for φ = ρ1 and ρ1 = 0, respectively. Equation (6.94) helps to identify the correlation structure of intact lengths. In order to find the
(6.94) helps to identify the correlation structure of intact lengths. In order to find the
correlation coefficient, it is sufficient to take the derivative of Eq. (6.94) with respect
to k, which leads to
γc = dγ(k)/dk = 1 − ρ1/(1 − φ),  (6.95)
in which γc indicates the slope of the standard CSV at large distances. For the independent process case ρ1 = 0 and hence γc = 1.
This is the case where the RQD formulations in terms of the average discontinuity number, as given by Priest and Hudson (1976) or Şen (1984) for different distributions, are valid. Otherwise, deviation from the straight line that passes through the origin on a standard CSV plot indicates the invalidity of all these formulations, since the intact lengths are correlated.
If the correlation structure is of Markovian type, i.e., φ = ρ1 , then Eq. (6.95) can
be rearranged for the first-order (lag-one) autocorrelation coefficient estimation as
ρ1 = (1 − γc)/(2 − γc).  (6.96)
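The identification procedure of Eqs. (6.84), (6.95), and (6.96) can be sketched as follows; the intact length values are made up, the choice of fitting the final half of the standard CSV is an assumption, and a Markovian structure is presumed when applying Eq. (6.96):

import numpy as np

def standard_csv(z):
    # Sample standard CSV: cumulative half-squared successive differences
    # (Eq. 6.84) divided by the variance of the intact lengths.
    return np.cumsum(0.5 * np.diff(z) ** 2) / np.var(z)

def rho1_estimate(std_csv):
    # Slope gamma_c of the late straight-line portion (Eq. 6.95), then
    # rho1 = (1 - gamma_c)/(2 - gamma_c) for a Markov structure (Eq. 6.96).
    k = np.arange(1, len(std_csv) + 1)
    start = len(k) // 2  # fit only the final, straight half
    gamma_c = np.polyfit(k[start:], std_csv[start:], 1)[0]
    return (1.0 - gamma_c) / (2.0 - gamma_c)

# Hypothetical intact lengths (cm), ordered along the scanline
z = np.array([42.0, 35.0, 51.0, 12.0, 18.0, 9.0, 75.0, 60.0, 44.0,
              15.0, 22.0, 30.0, 8.0, 11.0, 58.0, 66.0, 40.0, 25.0])
print(round(rho1_estimate(standard_csv(z)), 2))

A value near zero indicates independent intact lengths (γc close to 1), whereas clearly positive values point to Markovian dependence.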
Example 6.1
The field data for the implementation of the methodology developed herein are
extracted from a previous study by Ryckes (1984) on modes of failure and the sta-
bility of rock slopes. The scanline measurements are carried out at the Tytherington
quarry. The Paleozoic rocks of the lower coal series in the area were affected by various earth movements in the past, which led to faults, folds, and unconformities, each of which produced a different pattern of discontinuities, i.e., intact length distributions.

Fig. 6.31 Sample and theoretical pdfs for intact lengths along (a) SL1; (b) SL11; (c) SL12; (d) SL2; (e) SL3

Due to these different mechanisms it is not possible
to expect that the intact length sequence along any scanline in this area has inde-
pendent correlation structure. In order to depict the regional discontinuity pattern
in this area, three scanlines were set up at different directions. The bedding plane
orientation is almost horizontal for the first scanline, which will be referred to herein
as SL1. The second scanline, SL2, has a direction of 20◦ toward southwest whereas
the third scanline, SL3, has an intermediate inclination to the former. All of the
scanlines were set up horizontally as two joint sets were recognized to be vertical.
The geometrical and statistical summaries of these three scanlines are presented in
Table 6.6. Due to its heterogeneous structure the first scanline will be examined in
two subsets, which will be referred to as SL11 and SL12.
The frequency distribution functions for the scanlines are shown in Fig. 6.31. The theoretical pdfs have been fitted to the experimental counterparts by using χ² tests. For five different scanlines, two types of intact length distributions, namely
negative exponential and logarithmic normal, emerge as representative. This dif-
ference in the intact length distribution indicates very clearly that their occurrence
within the rock mass is heterogeneous, i.e., direction dependent, and consequently
even from this observation one can qualitatively conclude that intact lengths have
dependent structure. However, classical statistical tests, such as the one used for dis-
tribution function identification, do not give any clue about the correlation structure
of intact lengths.
Eissa and Şen (1990) suggested the use of the correlation function technique in intact length structure exploration. However, correlation techniques are valid only for normally distributed intact lengths (Şen, 1978). As shown above, none of the intact lengths at Tytherington quarry are distributed normally, and consequently the use of correlation techniques is not meaningful. The non-normal intact lengths, especially log-normally distributed ones, can be transformed into the normal distribution easily, but the transformation of the negative exponential distribution poses great difficulties. Nevertheless, even if the transformation is possible, the transformed intact lengths will not reflect the genuine properties of the original lengths.
Fig. 6.32 Sample standard CSVs of (a) SL1; (b) SL11; (c) SL12; (d) SL2; (e) SL3, with the initial range R and the vertical distance d indicated on each plot
As mentioned in the previous section, the CSVs are robust and valid for any pdf. In fact, the central limit theorem of classical statistics states that, whatever the underlying probability distribution of a random variable, its successive summations or averages will have a normal distribution. The experimental CSVs for each one of the scanlines considered in this study are presented in Fig. 6.32. At first glance one can observe the following significant points.
1) None of the CSVs appear as a straight line passing through the origin. This is
tantamount to saying that the intact length occurrences along any scanline cannot
be considered as independent processes, but they all emerge from dependent
processes. This further implies that in the creation of discontinuities within the rock mass uniform conditions did not prevail, but rather a complex combination of a multitude of geological events, such as tectonics, cooling, and volcanic activities, took place jointly.

Fig. 6.33 (a) A fracture set with 30° orientation from the north; (b) two fracture sets with 0° and 30° orientations from the north; (c) three fracture sets with 0°, 30°, and 130° orientations from the north
2) The initial portions of each experimental CSVs shifts toward the distance axis.
Such a shift implies the existence of positive correlation between successive
intact lengths. It further implies that, in general, big intact lengths follow big
intact lengths and small intact lengths follow small intact lengths. Otherwise, if
negative correlation should prevail in the occurrence of intact lengths, then the
initial CSV portion would shift toward the vertical axis.
3) Each experimental CSV comes to fluctuate about a straight line at large distance values. As stated before, the existence of such a straight-line portion in the CSV implies that, over the distance range of this portion, the intact lengths are independent of each other. This is the only range where the classical rock quality designation formulations keep their validity. Local deviations from the straight line indicate hidden or minor dependencies in the intact length evolution.
4) Over an initial distance range, R, the experimental CSV appears as a curve. This range is defined quantitatively as the distance between the origin and the abscissa of the initial point of the late straight-line portion mentioned in point (3). It is worth mentioning here that, as long as the threshold value necessary in the rock quality designation (RQD) calculation is greater than this range value, the calculation is theoretically sound and valid. Otherwise, threshold values less than this range value will give unreliable RQD values. Hence, the CSV provides a criterion for checking the validity of classical RQD values based on 0.1 m (4 in.). Such a valuable criterion cannot be obtained with the correlation techniques, which are valid only over the distance domain where the CSV appears as a straight line.
5) The existence of a straight line at large distances identifies the underlying generating mechanism of the intact lengths as a Markov process. It was already mentioned in the previous section that, if the curvature continues even at reduced rates at large distances, then an ARIMA(1, 1) process becomes effective. It is observed that all the scanlines considered in this chapter have the Markovian type of dependence, i.e., correlation.
6) The slope of the large-distance straight-line portion is related to the standard deviation of the underlying intact lengths. It is possible to take this slope as the population standard deviation of the intact lengths.
7) The vertical distance between the late-distance straight line and the line drawn parallel to it through the origin reflects the magnitude of the correlation coefficient of the intact lengths. Obviously, the smaller this distance, the smaller the intact length correlation. In the light of the above points, the relevant numerical values concerning the initial range, R, the vertical distance, d, the slope, γc, of the final straight line, and the correlation coefficient estimated from Eq. (5.13) are summarized in Table 6.7.
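For readers who wish to extract these quantities numerically, the following minimal sketch fits the late straight-line portion of an experimental CSV and reports its slope, γc, and the vertical distance, d, to the parallel line through the origin. The trial range, the synthetic CSV values, and the function name are illustrative assumptions, and Eq. (5.13) itself is not reproduced here.

import numpy as np

def csv_tail_parameters(d, csv, r_guess):
    """Fit the late straight-line portion of an experimental CSV beyond a
    trial range r_guess and return its slope (gamma_c) and the vertical
    distance to the parallel line through the origin. A sketch only."""
    d = np.asarray(d, dtype=float)
    csv = np.asarray(csv, dtype=float)
    tail = d >= r_guess                        # points past the initial curvature
    slope, intercept = np.polyfit(d[tail], csv[tail], 1)
    # The parallel line through the origin is y = slope * d, so the vertical
    # offset between the two lines is the magnitude of the fitted intercept.
    return slope, abs(intercept)

# Hypothetical experimental CSV: an initial curve joined to a straight line.
d = np.linspace(0.0, 400.0, 201)
csv = np.where(d < 80.0, 0.004 * d**2, 0.64 * d - 25.6)
gamma_c, d_vert = csv_tail_parameters(d, csv, r_guess=80.0)
print("slope gamma_c =", gamma_c, "; vertical distance d =", d_vert)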
It is obvious from Table 6.7 that the correlation coefficient along each scanline has more or less the same magnitude, and therefore the intact length dependence may be regarded as regionally isotropic. This implies that only one type of model, already identified as Markovian, can be used in intact length description. The use of this model in intact length generation is outside the scope of this chapter. Last but not least, the following particular points for each scanline can be inferred from a comparison of the CSV graphs in Fig. 6.32.
1) The CSVs of SL1, SL2, and SL3 have a similar pattern in that they are composed of an initial curved portion and a large-distance straight line. However, local deviations around this straight line are rather more persistent in the case of SL1 than in the others. This indicates that the intact lengths along SL1 have some secondary internal structure. In order to discover this structure, SL1 is divided into two mutually exclusive portions, namely SL11 and SL12.
2) The CSVs for SL11 and SL12 in Fig. 6.32b and c are distinctly different from the others. They exhibit not a single straight line but two successive straight lines. This implies that along the SL1 direction there is a secondary geological event that plays a role in the occurrence of discontinuities along this scanline. A reliable explanation on this point can be arrived at only after a detailed geological study of the quarry considered.
One of the most recent and significant questions about the intact lengths is whether they occur independently or dependently along a given scanline. In the past, without consideration of this question, all of the rock quality classifications derived from intact length properties were based on the assumption that the intact lengths are identically and independently distributed. The difficulty was due to the lack of a reliable technique for quantifying the correlation structure of intact lengths. In this chapter, however, the standard CSV is proposed as a practical tool for measuring intact length correlations. The CSV calculations are straightforward, without any difficulty or ambiguity. The CSV graphs show the change of the half squared-differences between intact lengths with ordered distance. The intact lengths are independent only when the standard CSV variation with distance appears as a straight line that passes through the origin. Otherwise, they are dependent, and according to the dependence structure the standard CSV graphs take different shapes. They have, however, some common properties: at small distances they appear as curves, whereas at large distances straight lines again dominate, but their extensions do not pass through the origin. The slope of the straight-line portions on the standard CSV plot makes it possible to calculate the intact length correlation coefficient. For instance, if this slope is close to unity, then and only then can the intact lengths be assumed independent, and consequently the theoretical RQD relationships with the average number of discontinuities can be used reliably in any rock evaluation project. The application of the methodology developed in this chapter has been performed for the field data obtained at Tytherington quarry in England.
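As a computational illustration of the standard CSV just described, the sketch below sums the half squared-differences of all sample pairs in order of increasing separation. Treating "standard" as dividing the intact lengths by their sample standard deviation is an assumption here, as are the synthetic scanline data and the zero-intercept slope check.

import numpy as np

def standard_csv(positions, values):
    """Experimental standard CSV: half squared differences of all sample
    pairs, ordered by pair separation and summed cumulatively (a sketch
    of the description above; values are standardized first)."""
    positions = np.asarray(positions, dtype=float)
    values = np.asarray(values, dtype=float)
    values = values / values.std()                # "standard" CSV (assumption)
    i, j = np.triu_indices(len(values), k=1)      # all distinct pairs
    d = np.abs(positions[i] - positions[j])       # pair separations
    half_sq = 0.5 * (values[i] - values[j]) ** 2  # half squared differences
    order = np.argsort(d)                         # rank pairs by distance
    return d[order], np.cumsum(half_sq[order])

# Hypothetical scanline: exponential intact lengths, positions at midpoints.
rng = np.random.default_rng(1)
intact = rng.exponential(scale=0.2, size=50)
midpoints = np.cumsum(intact) - intact / 2.0

d, csv = standard_csv(midpoints, intact)

# Independence check: a CSV of independent lengths plots as a straight
# line through the origin; an initial shift toward the distance axis
# suggests positive correlation between successive intact lengths.
slope = np.sum(d * csv) / np.sum(d * d)           # zero-intercept least squares
print("zero-intercept slope of the CSV:", slope)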
In general, the dimensions, orientation, trace lengths, apertures, and other geome-
chanical properties of each fracture are randomly distributed in nature. Therefore,
their replications through Monte Carlo simulations lead to realizations of a possible
portion of the fractured medium with the same statistical properties as the observed
ones. Once a representative simulation model is developed, it is then possible to
obtain the geomechanical properties of the fracture network in detail. Herein, the
main geomechanical property is the directional RQD. In order to achieve such a
goal in this chapter, the discontinuity network properties are assumed and defined
by the following characteristics.
1) The discontinuities within a fracture network are assumed to occur randomly, with their midpoint coordinates obeying the uniform pdf within a unit frame of 1 × 1. It is possible to enlarge such a basic frame by suitable horizontal and vertical extensions.
6.7 Multidirectional RQD Simulation
In its simplest form, the RQD is defined as the percentage of the intact length summation, S, over pieces greater than a threshold value such as 0.1 m (4 in.), along a scanline of length L, as

RQD = 100 (S/L). (6.97)
The rock mass classification is already given in Table 6.1. The practical difficulties in using Eq. (6.97) are as follows.
1) It gives a single sample value of RQD along one direction. A practical difficulty may arise in trying to obtain different scanline directions so as to assess the possible heterogeneities within the rock mass. For instance, along a highway cut or tunnel excavation, only longitudinal scanlines can be measured for RQD calculations. Lateral scanlines are possible only in large-diameter tunnels and large highway cuts. Therefore, it is necessary to set up a fracture network model from the available scanline measurements; by using this model, many scanlines in a multitude of desired directions can then be taken synthetically.
2) Most often in nature, whether because of narrow rock surface exposure, weathering, or geomorphologic features, it may not be possible to take measurements along rather long scanlines. This gives rise to another source of error or bias in the RQD calculations according to Eq. (5.1) (Şen and Kazi, 1984). However, a well-defined representative fracture network model provides unlimited opportunities for making scanline measurements as long as desired.
3) The RQD calculations from Eq. (6.97) do not give any insight into the intact length distribution. As mentioned by Şen (1984), different distributions lead to different formulations of the RQD calculation. However, in a fracture network model as developed in this chapter, the intact length distributions are known, and therefore one can make more accurate RQD estimations.
4) In a scanline measurement, different sets of fractures are measured irrespective of their individual probability distribution functions. Therefore, it is not possible to assess the effect of each set separately from the others. However, in a fracture network model such assessments can be made rather easily, and the contribution of each fracture set to the RQD value can be quantified individually.
5) Along the scanline adopted in the field, there may occur only intact lengths that are all greater (or smaller) than the threshold value. In such a situation, Eq. (6.97) classifies the rock mass as either excellent (or very poor). In order to account completely for all possible intact lengths, the population distribution of fractures in every direction should be known. Such a task can be achieved easily once the fracture network model is established for the rock mass, as sketched below.
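The following sketch illustrates points (1)-(5) by generating a crude fracture network in the unit frame and measuring a directional RQD along a synthetic scanline, in the spirit of Eq. (6.97). The fixed trace length, the single-orientation sets, the centre-anchored scanline, and the function names are simplifying assumptions rather than the chapter's full network model.

import numpy as np

rng = np.random.default_rng(7)

def fracture_set(n, orientation_deg, trace_len=0.4):
    """One fracture set: uniform random midpoints in the unit frame, a
    common orientation measured from north, and a fixed trace length
    (the fixed length is a simplifying assumption)."""
    mid = rng.uniform(0.0, 1.0, size=(n, 2))          # midpoint coordinates
    a = np.radians(orientation_deg)
    half = 0.5 * trace_len * np.array([np.sin(a), np.cos(a)])
    return mid - half, mid + half                     # segment end points

def directional_rqd(sets, direction_deg, threshold=0.1):
    """Directional RQD via Eq. (6.97) along a unit-length synthetic
    scanline through the centre of the frame. The 0.1 threshold assumes
    the unit frame is 1 m on a side."""
    a = np.radians(direction_deg)
    u = np.array([np.sin(a), np.cos(a)])              # scanline unit vector
    p0 = np.array([0.5, 0.5]) - 0.5 * u               # scanline start point
    hits = [0.0, 1.0]                                 # scanline ends (L = 1)
    for starts, ends in sets:
        for s0, s1 in zip(starts, ends):
            # Solve p0 + t*u = s0 + s*(s1 - s0) for the crossing point.
            m = np.column_stack((u, s0 - s1))
            if abs(np.linalg.det(m)) < 1e-12:         # parallel: no crossing
                continue
            t, s = np.linalg.solve(m, s0 - p0)
            if 0.0 <= t <= 1.0 and 0.0 <= s <= 1.0:
                hits.append(t)
    intact = np.diff(np.sort(np.array(hits)))         # intact lengths
    return 100.0 * intact[intact > threshold].sum()   # Eq. (6.97) with L = 1

sets = [fracture_set(40, 30.0)]                       # single set at 30 degrees
for theta in (0.0, 45.0, 90.0):
    print(f"RQD along {theta:5.1f} deg: {directional_rqd(sets, theta):5.1f}")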
dx / d ln P(x) = −1/λ. (6.100)

This equation shows a linear relationship between ln P(x) and x, with the slope of the straight line equal to 1/λ. Consequently, the following points can be recommended to a practicing geologist for finding the value of λ.
The only practical difficulty in this procedure is encountered in step (1), in calculating the cumulative probability. There appear to be two distinct approaches. One is the usual way of constructing a histogram for the intact lengths and then transforming it into a cumulative distribution. However, this application raises practical difficulties, especially when the number of intact lengths is small, which is almost always the case in field surveys. Another way of obtaining the cumulative probability is an empirical approach based on the rank, m, of any intact length within a set of available data of size n. Hence, the cumulative probability can be expressed as
expressed as
m
P (x) = . (6.101)
n+1
This empirical approach is used very often in practical studies, and it causes no difficulty in applications. The RQD analysis based on this empirical law, together with the generated intact lengths, is given in the following section.
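As a concrete illustration of Eqs. (6.100) and (6.101), the sketch below estimates λ from a sample of intact lengths. It assumes that P(x) in the semi-logarithmic plots denotes the probability of exceeding x, so that ln P(x) is linear in x with slope −λ; the sample, seed, and function name are hypothetical.

import numpy as np

def estimate_lambda(intact_lengths):
    """Estimate lambda of the negative exponential distribution from the
    empirical cumulative probability of Eq. (6.101) and the log-linear
    relation behind Eq. (6.100)."""
    x = np.sort(np.asarray(intact_lengths, dtype=float))[::-1]  # descending
    n = len(x)
    m = np.arange(1, n + 1)                # rank of each intact length
    p = m / (n + 1.0)                      # Eq. (6.101), read as exceedance
    # ln P(x) = -lambda * x, so a straight-line fit of ln P against x
    # has slope -lambda (cf. Eq. (6.100)).
    slope, _ = np.polyfit(x, np.log(p), 1)
    return -slope

rng = np.random.default_rng(3)
sample = rng.exponential(scale=1.0 / 9.434, size=200)  # true lambda as in Fig. 6.34c
print("estimated lambda:", estimate_lambda(sample))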
Fig. 6.34 Cumulative probability plots of intact lengths (log scale, intact length in ×10−2 m) at the 0.1 level. (A) single set at 30°; (B) two sets at 0° and 30°; (C) three sets at 0°, 30°, and 130°, with fitted slope 1/λ = 0.106, i.e., λ = 9.434
scatter becomes less. The best straight lines are fitted through the scatter of points, and their slopes are found by considering a full cycle on the logarithmic axis. The corresponding length on the horizontal axis is equal to the slope, i.e., the 1/λ parameter. This slope calculation procedure is presented in Fig. 6.34C. The results are summarized in Table 6.9.
As is obvious from Table 6.9, the best rock quality is obtained when only one fracture set is considered. As expected, an increase in the number of fracture sets causes a decrease in the RQD value. There is a practically significant difference between the RQD values based on a single fracture set and those based on two or three fracture sets. The rock quality is of an excellent type when one set of fractures is considered, whereas the quality deteriorates and becomes very good and fair for two and three fracture sets, respectively. The last row of Table 6.9 gives the average RQD values for all the scanlines considered in this study. It is obvious from a comparison of these values that, when two sets are considered, the RQD deterioration relative to one set is almost 15%, whereas for three sets it is almost 30%.
References
Agterberg, F., 1975. New problems at the interface between geostatistics and geology. Geostat. 75,
403–421.
Baczynski, N. R. P., 1980. Zonal concept for spatial distribution of fractures in rock. In Proceedings of the Third Australian and New Zealand Conference on Geomechanics. Wellington, New Zealand, pp. 29–33.
Baecher, G. B., Lanney, N. A., and Einstein, H. H., 1977. Statistical description of rock properties
and sampling. 18th U.S. Symposium on Rock Mechanics, Keystone, Colorado, pp. 5C1-1–
5C1-8.
Barton, C. M., 1977. Geotechnical analysis of rock structure and fabric in C.S.A. Mine, Cobar,
New South Wales. Applied Geomechanics, Technical Paper 24, CSIRO, Australia.
Barton, N., Lien, R., and Lunde, J., 1974. Engineering classification of rock masses for the design of tunnel support. Rock Mech. 6(4), 189–236.
Bieniawski, Z. T., 1974. Geomechanic classification of rock masses and its application in tun-
neling. Proceedings of the 3rd International Congress on Rock Mechanics, Denver, Colorado,
September 1974, Vol. 2A, pp. 27–32.
Box, G. E. P., and Jenkins, G. M., 1970. Time Series Analysis, Forecasting and Control. Holden-Day, San Francisco, 475 pp.
Brady, B. H. G., and Brown, E. T., 1985. Rock Mechanics for Underground Mining. Allen & Unwin, London.
Bridges, M. C., 1975. Presentation of fracture data for rock mechanics. Second Australia-New Zealand Conference on Geomechanics, Brisbane, Australia, pp. 144–148.
Brown, E. T., 1981. Rock Characterization Testing and Monitoring. Pergamon Press, London,
221 pp.
Call, R. B., Savely, J., and Nicholas, D. E., 1976. Estimation of joint set characteristics from surface mapping data. 17th U.S. Symposium on Rock Mechanics, pp. 2B21–2B29.
Cliff, A. D., and Ord, J. K., 1973. Spatial Autocorrelation. Pion, London.
Coates, D. F., 1964. Classification of rocks for rock mechanics. Int. J. Rock Mech. Min. Sci.
Geomech. Abst. 1(3), 421–429.
Cording, E. J., and Mahar, J. W., 1978. Index properties and observations for design of chambers
in rock. Eng. Geol. 12, 113–142.
Cramér, H., 1938. Random Variables and Probability Distributions. Cambridge University Press, Cambridge.
Cruden, D. M., 1977. Describing the size of discontinuities. Int. J. Rock Mech. Min. Sci. Geomech.
Abstr. 14, 133–137.
Deere, D. U., 1964. Technical description of rock cores for engineering purposes. Rock Mech.
Eng. Geol. 7(7), 16–22.
Deere, D. U., 1968. Geologic considerations. In: K. G. Stagg and O. C. Zienkiewicz (Eds.), Rock
Mechanics in Engineering Practice. John Wiley and Sons, New York, pp. 4–20.
Ege, J. R., 1987. Core index. A numerical core-logging procedure for estimating rock quality. US Geol. Surv. 954, 1–15.
Einstein, H. H., and Baecher, G. B., 1983. Probabilistic and statistical methods in engineering geology, specific methods and examples. Part I: Exploration. Rock Mech. Rock Eng. 16, 39–72.
Eissa, E. A., and Şen, Z., 1990. Intact length persistence in relation to rock quality designation. Int. J. Rock Mech. Min. Sci. Geomech. Abstr. 28, 411–419.
Feller, W., 1967. An Introduction to Probability Theory and its Applications. John Wiley and Sons, New York, 509 pp.
Goodman, R. E., 1976. Methods of Geological Engineering in Discontinuous Rocks. West, San Francisco, CA.
Goodman, R. E., and Smith, H. R., 1980. RQD and fracture spacing. J. Geotech. Eng. Division,
ASCE 106(GT2), 191–193.
Hammersley, J. M., and Handscomb, D. C., 1964. Monte Carlo Methods. Methuen and Co. Ltd., London, 178 pp.
Higgs, N. B., 1984. The profile-area and fracture frequency methods: two quantitative procedures
for evaluating fracture spacing in drill core. Bull. Assoc. Eng. Geol. 21(3), 377–386.
Hudson, J. A., and Priest, S. D., 1979. Discontinuities and rock mass geometry. Int. J. Rock Mech.
Min. Sci. Geomech. Abstr. 16, 339–362.
Hudson, J. A., and Priest, S. D., 1983. Discontinuity frequency in rock mass. Int. J. Rock Mech. Min. Sci. Geomech. Abstr. 20(2), 73–89.
International Society for Rock Mechanics, 1978. Standardization of laboratory and field tests. Sug-
gested methods for the quantitative description of discontinuities in rock masses. Int. J. Rock
Mech. Min. Sci. Geomech. Abstr. 15, 319–368.
Jenkins, G. M., and Watts, D. G., 1968. Spectral Analysis and Its Applications. Holden-Day, San Francisco, 523 pp.
Journel, A. G., 1974. Simulation conditionnelle de gisements miniers – théorie et pratique. Thèse de Docteur-Ingénieur, Université de Nancy.
Kazi, A., and Şen, Z., 1985. Volumetric RQD; An index of rock quality. Proceedings of the Inter-
national Symposium on Fundamentals of Rock Joints, Bjorkliden, pp. 99–102.
Krige, D. G., 1951. A statistical approach to some basic mine valuation problems on the Witwatersrand. J. Chem. Metall. Min. Soc. South Africa 52, 119–139.
Krumbein, W. C., 1970. Geological models in transition to geostatistics. Geostatistics. Plenum
Press, New York, pp. 143–161.
Kulhawy, F. H., 1978. Geomechanical model for rock foundation settlement. J. Geotech. Eng.
Division, ASCE 104(GT2), Proc Paper 13547, 211–228.
Long, J. C. S., 1983. Investigation of equivalent porous medium permeability in networks of dis-
continuity fractures. Unpublished Ph. D. Thesis, University of California, Berkeley, CA.
Louis, C., and Pernot, M., 1972. Three-dimensional investigation of flow conditions of Grand
Maison Dam Site. Proceedings of Symposium International Society of Rock Mechanics, Per-
colation through Fissured Rocks.
Matérn, B., 1960. Spatial variation. Meddelanden från Statens Skogsforskningsinstitut, 144 pp.
Matheron, G., 1963. Principles of geostatistics. Econ. Geol. 58, 1246–1266.
Matheron, G., 1965. Les variables régionalisées et leur estimation. Masson et Cie, Paris, 306 pp.
Mood, A. M., 1940. The distribution theory of runs. Ann. Math. Stat. 11, 427–432.
Otaibi, A., 1990. Geotechnical investigation and engineering geological maps of Al-Nagabah area.
Unpublished Thesis, Faculty of Earth Sciences, King Abdulaziz University, Jeddah.
Piteau, D. R., 1970. Analysis of the genesis and characteristics of jointing in the Nchanga Open
pit for the purpose of ultimately assessing the slope stability. Report, Nchanga Consolidated
Copper Mines Ltd.
Priest, S. D., and Hudson, J., 1976. Discontinuity spacing in rock. Int. J. Rock Mech. Min. Sci.
Geomech. Abstr. 13, 135–148.
Priest, S. D., and Hudson, J. A., 1981. Estimation of discontinuity spacing and trace length using
scanline surveys. Int. J. Rock Mech. Min. Sci. Geomech. Abstr. 18(3), 183–197.
Rice, S. O., 1945. Mathematical analysis of random noise. Bell. Sys. Tech. J. 24, 46–156.
Roach, S. A., 1968. The Theory of Random Clumping. Methuen, London.
Rouleau, A., 1984. Statistical characterization and numerical simulation of a fracture system – Application to groundwater in the Stripa Granite. Unpublished Ph.D. Thesis, University of Waterloo, Ontario, Canada.
Rouleau, A., and Gale, J. E., 1985. Statistical characterization of the fracture system in the Stripa Granite, Sweden. Int. J. Rock Mech. Min. Sci. Geomech. Abstr. 22(6), 353–367.
Ryckes, K. D., 1984. A routine method to evaluate the modes of failure and the stability of rock
slopes. Unpublished Thesis, Imperial College of Science, Technology and Medicine, 246 pp.
Şen, Z., 1974. Small sample properties of stationary stochastic models and the Hurst phenomenon
in Hydrology. Unpublished Ph.D. Thesis, Imperial College of Science and Technology, Uni-
versity of London, 286 pp.
Şen, Z., 1978. Autorun analysis of hydrologic time series. J. Hydrol. 36, 75–85.
Şen, Z., 1979a. Application of autorun test to hydrologic data. J. Hydrol. 42, 1–7.
Şen, Z., 1979b. Effect of periodic parameters on the autocorrelation structure of hydrologic series.
Water Resour. Res. 15(6), 1639–1642.
Şen, Z., 1980. Statistical analysis of hydrologic critical droughts. J. Hydraul. Div. ASCE, Proc.
Pap. 14134, 106(HY1), 99–115.
Şen, Z., 1984. RQD models and fracture spacing. J. Geotech. Eng. Amer. Soc. Civ. Eng. 110(2),
203–216.
Şen, Z., and Kazi, A., 1984. Discontinuity spacing and RQD estimates from finite length scanlines.
Int. J. Rock Mech. Min. Sci. Geomech. Abstr. 21(4), 203–212.
Şen, Z., 1985. Autorun model for synthetic flow generation. J. Hydrol. 81, 157–170.
Şen, Z., 1989a. Cumulative semivariogram models of regionalized variables. Int. J. Math. Geol.
21(3), 891–903.
Şen, Z., 1989b. Comment on “The generation of multi-dimensional autoregressive series by her-
ringbone method”. Math. Geol. 21(2), 267–268.
Şen, Z., 1990a. RQP, RQR and fracture spacing: Technical note. Int. J. Rock Mech. Min. Sci. Geomech. Abstr. 22, 135–137.
Şen, Z., 1990b. Spatial simulation of geologic variables. J. Math. Geol. 22(2), 175–188.
Şen, Z., 1991. The profile-area method for statistical evaluation of rock mass quality. Bull. Assoc.
Eng. Geol. XXVIII, 351–357.
Sharp, W. E., and Aroian, L. A., 1985. The generation of multidimensional autoregressive series
by herringbone method. Math. Geol. 17, 67–79.
Smith, L., and Schwartz, F. W., 1984. An analysis of the influence of fracture geometry on mass transport in fractured media. Water Resour. Res. 20(9), 1241–1252.
Steffen, O., 1975. Recent developments in the interpretation of data from joint surveys in rock
masses. Sixth Regional Conference for Africa on Soil Mechanics and Foundations, II, pp.
17–26.
Switzer, P., 1965. A random set process with a Markovian property. Ann. Math. Stat. 36,
1859–1863.
Terzaghi, K. C., 1946. Introduction to tunnel geology: Rock defects and loads on tunnel supports. In: Proctor, R. V., and White, T. L., Rock Tunnelling with Steel Supports. The Commercial Shearing and Stamping Co., Youngstown, Ohio, pp. 15–99.
Terzaghi, R. D., 1965. Sources of error in joint surveys. Géotechnique 15, 287–304.
Wallis, P. F., and King, M. S., 1980. Discontinuity spacing in a crystalline rock. Int. J. Rock Mech.
Min. Sci. Geomech. Abstr. 17, 63–66.
Watson, G. S., 1955. Serial correlation in regression analysis I. Biometrika 42, 327–341.
Index

  empirical and theoretical SV, 252
  interpretations, 253, 255
  lag-one lake level prediction, 255
  lag-one model verification, 254
  lake level TDMs, 254
  location map, 251
  observed and predicted lake levels, 256
  theoretical Gaussian SV parameters, 252
  universal Kriging, 245–247
Spatial pattern, 85–86
Spatial prediction study, steps, 129
Spatial simulation, 272–273
  autorun simulation of porous material, 310–311
  autorun analysis of sandstone, 312–316
  autorun modeling of porous media, 316–321
  line characteristic function of porous medium, 312
  CSV technique, 321–323
  intact length CSV, 323–324
  theoretical CSV model, 324–333
  3D autoregressive model, 273–274
  2D uniform model parameters, 276–279
  extension to 3D, 279–281
  parameters estimation, 274–276
  multidirectional RQD simulation, 333–334
  fracture network model, 334–335
  RQD analysis, 335–337
  RQD simulation results, 338–340
  rock quality designation simulation, 281
  dependent intact lengths, 290–300
  independent intact lengths, 281–290
  RQD and intact length simulation, 300–303
  proposed models of persistence, 303–305
  simulation of intact lengths, 305–310
Spatial variability, 8, 9, 203
Specific surface, 314
Srivastava, R. M., 18, 222, 249, 251
Statistical interpolation, 211
  see also Optimum interpolation
Statistical objective analysis, 206
Steffen, O., 322
Stevens, C. F., 111
Stochastic process, 312
Stout, G. E., 55
Stratigraphic variation, 4
Student, 205
Subyani, A. M., 167, 256, 259
Successive correction method, 133, 134
Summer, G., 9, 47, 55
Surface fitting methods, 206
SV, see Semivariogram (SV)
Switzer, P., 272

T
Tabios, G. O., 47
Talagrand, O., 206
Tase, N., 74, 77
Taylor, G. I., 144, 179
Temporal variations, 84
Terzaghi, K. C., 281
Terzaghi, R. D., 321
Theoretical CSV models
  exponential, 172–173
  Gaussian, 174–175
  linear, 169–171
  logarithmic, 173–174
  power, 171–172
Thiebaux, H. J., 129, 133, 134, 136
Thiessen, A. H., 52, 54
Thiessen polygons, 52, 53
  obtaining sub-polygons, 53
Time series, 84
Toulany, B., 165
Trend surface, 106–107
  calculation table, 109
  variables, 107
Triangularization, 47–52
Triple diagram method (TDM), 252

U
Uncertainty techniques, 14
Uniformity test, 93–94
Universal Kriging, 228, 245–247
  procedural structure, 246

V
Vannitseni, S., 250
Variability, 1
  deterministic and stochastic variations, 11
  geometric similarities, 10
  kinematics similarity and, 10
  quantitative consideration, 11
  spatial, 8, 9
Variational techniques, 206
Void length, 317
Voronoi, G., 52
W
Wallis, P. F., 282, 322
Watts, D. G., 313
Weighted-averaging analysis scheme, 104
Wiener, N., 22
Wiesner, C. J., 61, 62
Wilson, J. W., 8, 55

Y
Yates, F., 205
Yevjevich, Y., 77

Z
Zadeh, L. A., 23, 90, 253